Steve Loughran <stevel@cloudera.com> committed on 13 Feb

HADOOP-16823. Large DeleteObject requests are their own Thundering Herd.

Contributed by Steve Loughran.

During S3A rename() and delete() calls, the list of objects to delete is
built up into batches of a thousand and then POSTed in a single large
DeleteObjects request.

But as the IO capacity allowed on an S3 partition may be only 3500 writes
per second *and* each entry in that POST counts as a single write, one of
those POSTs alone can trigger throttling on an already loaded S3 directory
tree. That throttling can trigger backoff and retry with the same
thousand-entry POST, and so recreate the exact same problem.
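
As a rough illustration of the arithmetic (assuming the full 3500
writes/second budget were available to this one caller):

  1 x 1000-entry DeleteObjects POST = 1000 writes, ~29% of a second's capacity
  1 x  250-entry DeleteObjects POST =  250 writes,  ~7% of a second's capacity

A retried full-size POST adds another thousand writes, which is why the
retry itself can keep the partition throttled.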

Fixes

* Page size for delete object requests is set in
  fs.s3a.bulk.delete.page.size; the default is 250. (See the
  configuration sketch after this list.)

* The property fs.s3a.experimental.aws.s3.throttling (default = true)
  can be set to false to disable the throttle-retry logic in the AWS
  client SDK; it is then all handled in the S3A client. This gives
  more visibility into when operations are being throttled.

* Bulk delete throttling events are logged to the
  org.apache.hadoop.fs.s3a.throttled log at INFO; if this appears
  often, choose a smaller page size.

* The metric "store_io_throttled" adds the entire count of delete
  requests when a single DeleteObjects request is throttled.

* A new quantile, "store_io_throttle_rate", can track throttling
  load over time.

* DynamoDB metastore throttle-resilience issues have also been
  identified and fixed. Note: the fs.s3a.experimental.aws.s3.throttling
  flag does not apply to DDB IO precisely because there may still be
  lurking issues there and it is safest to rely on the DynamoDB client
  SDK.
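
A minimal sketch (not part of this change) of how these options might be
set and the throttle counter read back afterwards. The bucket name and
path are placeholders; the property names and the "store_io_throttled"
metric name are those described above, and the statistics key may vary
between releases.

  import java.net.URI;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.fs.StorageStatistics;

  public class S3ABulkDeleteTuning {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Smaller pages mean more DeleteObjects POSTs, but each POST
      // consumes less of the partition's write capacity.
      conf.setInt("fs.s3a.bulk.delete.page.size", 250);
      // Let the S3A client, not the AWS SDK, handle throttle retries so
      // throttling is visible in the S3A logs and metrics.
      conf.setBoolean("fs.s3a.experimental.aws.s3.throttling", false);

      // "example-bucket" and the path below are placeholders.
      FileSystem fs = FileSystem.get(URI.create("s3a://example-bucket/"), conf);
      try {
        fs.delete(new Path("s3a://example-bucket/tmp/large-tree"), true);

        // Read back the throttle counter; the key is the metric name
        // quoted in this change.
        StorageStatistics stats = fs.getStorageStatistics();
        Long throttled = stats.getLong("store_io_throttled");
        System.out.println("store_io_throttled = " + throttled);
      } finally {
        fs.close();
      }
    }
  }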

Change-Id: I00f85cdd94fc008864d060533f6bd4870263fd84
