HADOOP-16988. Remove source code from branch-2. (aajisaka via jhung)

This closes #1959

  1. … 10846 more files in changeset.
HADOOP-16319. S3A Etag tests fail with default encryption enabled on bucket.

Contributed by Ben Roling.

ETag values are unpredictable with some S3 encryption algorithms.

Skip ITestS3AMiscOperations tests which make assertions about etags when default encryption on a bucket is enabled.

When testing with an AWS account which lacks the privilege for a call to getBucketEncryption(), we don't skip the tests. In the event of failure, developers get to expand the permissions of the account or relax default encryption settings.

  1. … 1 more file in changeset.
HADOOP-16823. Large DeleteObject requests are their own Thundering Herd.

Contributed by Steve Loughran.

During S3A rename() and delete() calls, the list of objects to delete is built up into batches of a thousand and then POSTed in a single large DeleteObjects request.

But as the IO capacity allowed on an S3 partition may only be 3500 writes per second *and* each entry in that POST counts as a single write, one of those posts alone can trigger throttling on an already loaded S3 directory tree. That can trigger backoff and retry, with the same thousand-entry POST, and so recreate the exact same problem.

Fixes

* The page size for delete object requests is set in fs.s3a.bulk.delete.page.size; the default is 250.
* The property fs.s3a.experimental.aws.s3.throttling (default=true) can be set to false to disable throttle retry logic in the AWS client SDK; it is then all handled in the S3A client. This gives more visibility into when operations are being throttled.
* Bulk delete throttling events are logged to the org.apache.hadoop.fs.s3a.throttled log at INFO; if this appears often, choose a smaller page size.
* The metric "store_io_throttled" adds the entire count of delete requests when a single DeleteObjects request is throttled.
* A new quantile, "store_io_throttle_rate", can track throttling load over time.
* DynamoDB metastore throttle resilience issues have also been identified and fixed. Note: the fs.s3a.experimental.aws.s3.throttling flag does not apply to DDB IO, precisely because there may still be lurking issues there and it is safest to rely on the DynamoDB client SDK.
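The two tuning properties described here can be set in a Hadoop configuration file; a minimal sketch (values shown are the defaults stated in this commit message):

```xml
<!-- Sketch of the delete/throttle tuning described above.
     Values shown are the documented defaults. -->
<property>
  <name>fs.s3a.bulk.delete.page.size</name>
  <value>250</value>
</property>
<property>
  <name>fs.s3a.experimental.aws.s3.throttling</name>
  <value>true</value>
</property>
```

Lowering the page size trades more DeleteObjects requests for a smaller write-count spike per request.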

Change-Id: I00f85cdd94fc008864d060533f6bd4870263fd84

  1. … 26 more files in changeset.
HADOOP-16832. S3Guard testing doc: Add required parameters for S3Guard testing in IDE. (#1822). Contributed by Mukund Thakur.

HADOOP-16732. S3Guard to support encrypted DynamoDB table (#1752). Contributed by Mingliang Liu.

  1. … 8 more files in changeset.
HADOOP-16758. Refine testing.md to tell user better how to use auth-keys.xml (#1753)

Contributed by Mingliang Liu

HADOOP-16384: S3A: Avoid inconsistencies between DDB and S3.

Contributed by Steve Loughran

Contains

- HADOOP-16397. Hadoop S3Guard Prune command to support a -tombstone option.
- HADOOP-16406. ITestDynamoDBMetadataStore.testProvisionTable times out intermittently.

This patch doesn't fix the underlying problem, but it

* changes some tests to clean up better
* does a lot more logging of operations against DDB, if enabled
* adds an entry point to dump the state of the metastore and S3 tables (a precursor to fsck)
* adds a purge entry point to help clean up after a test run has got a store into a mess
* adds a -tombstone option to the s3guard prune command to only clear tombstones

The outcome is that tests should pass consistently, and if problems occur we have better diagnostics.
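As a sketch of how the new prune option might be invoked (the bucket name is illustrative, and the age flags are the standard prune arguments rather than anything this patch adds):

```
# Clear only tombstone markers older than one day from the S3Guard
# table backing this bucket (bucket name is hypothetical).
hadoop s3guard prune -tombstone -days 1 s3a://example-bucket/
```

Without -tombstone, prune removes aged file entries as well as tombstones.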

Change-Id: I3eca3f5529d7f6fec398c0ff0472919f08f054eb

  1. … 35 more files in changeset.
HADOOP-15847. S3Guard testConcurrentTableCreations to set R/W capacity == 0

Contributed by lqjaclee

Change-Id: I4a4d5b29f2677c188799479e4db38f07fa0591d1

  1. … 2 more files in changeset.
HADOOP-16117. Update AWS SDK to 1.11.563.

Contributed by Steve Loughran.

Change-Id: I7c46ed2a6378e1370f567acf4cdcfeb93e43fa13

  1. … 2 more files in changeset.
HADOOP-16085. S3Guard: use object version or etags to protect against inconsistent read after replace/overwrite.

Contributed by Ben Roling.

S3Guard will now track the etag of uploaded files and, if an S3 bucket is versioned, the object version. You can then control how to react to a mismatch between the data in the DynamoDB table and that in the store: warn, fail, or, when using versions, return the original value.

This adds two new columns to the table: etag and version. This is transparent to older S3A clients, but when such clients add/update data in the S3Guard table, they will not add these values. As a result, the etag/version checks will not work with files uploaded by older clients.

For a consistent experience, upgrade all clients to use the latest Hadoop version.
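One way to express the mismatch handling described above is through the S3A change-detection properties; a sketch, assuming the property names and values documented for this feature (verify against your Hadoop version):

```xml
<!-- Sketch: how to react when the etag/version in the S3Guard table
     disagrees with the object in S3. "source" selects etag or versionid
     tracking; "mode" may be server, client, warn or none. -->
<property>
  <name>fs.s3a.change.detection.source</name>
  <value>etag</value>
</property>
<property>
  <name>fs.s3a.change.detection.mode</name>
  <value>warn</value>
</property>
```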

  1. … 55 more files in changeset.
HADOOP-16252. Add prefix to dynamo tables in tests.

Contributed by Ben Roling.

  1. … 7 more files in changeset.
HADOOP-15999. S3Guard: Better support for out-of-band operations.

Author: Gabor Bota

  1. … 6 more files in changeset.
HADOOP-16124. Extend documentation in testing.md about S3 endpoint constants.

Contributed by Adam Antal.

(cherry picked from commit c0427c84dddf942529dfdfc5cc7a3e25e3f12c5e)

HADOOP-15229. Add FileSystem builder-based openFile() API to match createFile(); S3A to implement S3 Select through this API.

The new openFile() API is asynchronous, and implemented across FileSystem and FileContext. The MapReduce V2 inputs are moved to this API, and you can set must/may options to pass in.

This is more useful for setting things like the s3a seek policy than for S3 Select, as the existing input format/record readers can't handle S3 Select output where the stream is shorter than the file length, and splitting plain text is suboptimal. Future work is needed there.

In the meantime, any/all filesystem connectors are now free to add their own filesystem-specific configuration parameters which can be set in jobs and used to set filesystem input stream options (seek policy, retry, encryption secrets, etc.).
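As a sketch of how the builder API reads (the option key and path are illustrative; method names follow the builder interface this change introduces, so check them against your Hadoop version):

```java
import java.util.concurrent.CompletableFuture;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Open a file asynchronously. opt() passes a "may" option the connector is
// free to ignore; must() would make it mandatory. build() returns a future
// rather than blocking on the open.
CompletableFuture<FSDataInputStream> future =
    fs.openFile(new Path("s3a://bucket/data.csv"))
      .opt("fs.s3a.experimental.input.fadvise", "random")  // hint only
      .build();
try (FSDataInputStream in = future.get()) {
  // read as usual
}
```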

Contributed by Steve Loughran

  1. … 71 more files in changeset.
HADOOP-16065. -Ddynamodb should be -Ddynamo in AWS SDK testing document.

(cherry picked from commit 3c60303ac59d3b6cc375e7ac10214fc36d330fa4)

HADOOP-14556. S3A to support Delegation Tokens.

Contributed by Steve Loughran and Daryn Sharp.

  1. … 101 more files in changeset.
HADOOP-16027. [DOC] Effective use of FS instances during S3A integration tests. Contributed by Gabor Bota.

Revert "HADOOP-14556. S3A to support Delegation Tokens."

This reverts commit d7152332b32a575c3a92e3f4c44b95e58462528d.

  1. … 104 more files in changeset.
HADOOP-14556. S3A to support Delegation Tokens.

Contributed by Steve Loughran.

  1. … 104 more files in changeset.
HADOOP-15987. ITestDynamoDBMetadataStore should check if table configured properly. Contributed by Gabor Bota.

  1. … 1 more file in changeset.
HADOOP-15926. Document upgrading the section in NOTICE.txt when upgrading the version of AWS SDK. Contributed by Dinesh Chitlangia.

(cherry picked from commit 66b1335bb3a9a6f3a3db455540c973d4a85bef73)

HADOOP-15426. Make S3Guard client resilient to DDB throttle events and network failures. (Contributed by Steve Loughran)

  1. … 16 more files in changeset.