Commits by Steve Loughran <stevel@cloudera.com> in hadoop

HADOOP-16547. Make sure that s3guard prune sets up the FS (#1402). Contributed by Steve Loughran.

Change-Id: Iaf71561cef6c797a3c66fed110faf08da6cac361

HADOOP-16490. Avoid/handle cached 404s during S3A file creation.

Contributed by Steve Loughran.

This patch avoids issuing any HEAD path request when creating a file with overwrite=true, so 404s will not end up in the S3 load balancers unless someone calls getFileStatus/exists/isFile in their own code.
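
As a caller-side illustration (a minimal sketch, assuming a hypothetical bucket and path; this is not S3A's internals), the overwrite form of create() needs no existence probe, while an explicit exists() call can still seed the cache:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class OverwriteCreateSketch {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(
            URI.create("s3a://example-bucket/"), new Configuration());
        Path path = new Path("/output/part-0000");

        // An overwriting create now issues no HEAD on the path, so S3A
        // itself cannot seed the load balancers' 404 cache here.
        try (FSDataOutputStream out = fs.create(path, true)) {
          out.writeUTF("data");
        }

        // By contrast, probing first can cache a 404 if the file is absent:
        // fs.exists(path);  // HEAD path
      }
    }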

The Hadoop FsShell CommandWithDestination class is modified to not register uncreated files for deleteOnExit(), because that calls exists() and so can place the 404 in the cache, even after S3A is patched to not do it itself.

Because S3Guard knows when a file should be present, it adds a special FileNotFound retry policy, independently configurable from other retry policies; it is also exponential, but with different parameters. This is because every HEAD request will refresh any 404 cached in the S3 load balancers. It's not enough to retry: we have to have a suitable gap between attempts to (hopefully) ensure any cached entry will be gone.

The options and values are:

fs.s3a.s3guard.consistency.retry.interval: 2s
fs.s3a.s3guard.consistency.retry.limit: 7
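
A minimal sketch of setting these programmatically through the standard Hadoop Configuration API (the values shown are the defaults quoted above):

    import org.apache.hadoop.conf.Configuration;

    public class ConsistencyRetrySketch {
      public static Configuration tune() {
        Configuration conf = new Configuration();
        // Gap between attempts: long enough for a cached 404 to expire.
        conf.set("fs.s3a.s3guard.consistency.retry.interval", "2s");
        // Number of attempts before giving up.
        conf.setInt("fs.s3a.s3guard.consistency.retry.limit", 7);
        return conf;
      }
    }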

The S3A copy() method used during rename() raises a RemoteFileChangedException which is not caught, so it is not downgraded to a false return value. Thus: when a rename is unrecoverable, this fact is propagated.

Copy operations without S3Guard lack the confidence that the file exists, so they don't retry the same way: they fail fast with a different error message. However, because create(path, overwrite=false) no longer does a HEAD on the path, we can at least be confident that S3A itself is not creating those cached 404 markers.
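
A hedged sketch of what this means for callers, with illustrative paths; RemoteFileChangedException is an IOException subclass, so it surfaces through rename()'s normal signature:

    import java.io.IOException;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.s3a.RemoteFileChangedException;

    public class RenameSketch {
      static boolean rename(FileSystem fs, Path src, Path dest) throws IOException {
        try {
          // A false return still covers the "ordinary" failure cases...
          return fs.rename(src, dest);
        } catch (RemoteFileChangedException e) {
          // ...but an unrecoverable copy during the rename now surfaces as
          // an exception rather than being downgraded to false.
          throw e;
        }
      }
    }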

Change-Id: Ia7807faad8b9a8546836cb19f816cccf17cca26d

HADOOP-16430. S3AFilesystem.delete to incrementally update s3guard with deletions

Contributed by Steve Loughran.

This overlaps the scanning for directory entries with batched calls to S3 DELETE and updates of the S3Guard tables.

It also uses S3Guard to list the files to delete, so it finds newly created files even when S3 listings are not consistent.

For paths which the client considers S3Guard to be authoritative, we also do a recursive LIST of the store and delete those files; this is to find unindexed files and to guarantee that the delete(path, true) call really does delete everything underneath.
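
A minimal sketch of the batching pattern described, written against the AWS SDK v1 bulk-delete call; this illustrates the idea, not the actual S3A code (the 1000-key page size is S3's bulk-delete limit):

    import java.util.ArrayList;
    import java.util.List;
    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.model.DeleteObjectsRequest;
    import com.amazonaws.services.s3.model.DeleteObjectsRequest.KeyVersion;

    public class BatchedDeleteSketch {
      static void deleteBatched(AmazonS3 s3, String bucket, Iterable<String> keys) {
        final int pageSize = 1000;  // S3 bulk-delete hard limit per request
        List<KeyVersion> page = new ArrayList<>(pageSize);
        for (String key : keys) {
          page.add(new KeyVersion(key));
          if (page.size() == pageSize) {
            // Issue the DELETE (and, in S3A, the metastore update) per
            // batch, overlapping with the ongoing directory scan.
            s3.deleteObjects(new DeleteObjectsRequest(bucket).withKeys(page));
            page = new ArrayList<>(pageSize);
          }
        }
        if (!page.isEmpty()) {
          s3.deleteObjects(new DeleteObjectsRequest(bucket).withKeys(page));
        }
      }
    }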

Change-Id: Ice2f6e940c506e0b3a78fa534a99721b1698708e

HADOOP-16470. Make last AWS credential provider in default auth chain EC2ContainerCredentialsProviderWrapper.

Contributed by Steve Loughran.

Contains HADOOP-16471. Restore (documented) fs.s3a.SharedInstanceProfileCredentialsProvider.
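
For illustration, a hedged sketch of pinning the provider chain by hand rather than relying on the default; the SDK and S3A class names are real, but the exact ordering shown is an assumption:

    import org.apache.hadoop.conf.Configuration;

    public class AuthChainSketch {
      public static Configuration pinProviders() {
        Configuration conf = new Configuration();
        // End the chain with the container/instance wrapper, as the
        // default chain now does.
        conf.set("fs.s3a.aws.credentials.provider",
            "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider,"
            + "com.amazonaws.auth.EnvironmentVariableCredentialsProvider,"
            + "com.amazonaws.auth.EC2ContainerCredentialsProviderWrapper");
        return conf;
      }
    }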

Change-Id: I06b99b57459cac80bf743c5c54f04e59bb54c2f8

HADOOP-16500 S3ADelegationTokens to only log at debug on startup (#1269). Contributed by Steve Loughran.

Change-Id: Ifafc15f32791911976d7ebc36fb6e8853f59ed41

HADOOP-16481. ITestS3GuardDDBRootOperations.test_300_MetastorePrune needs to set region. (#1209). Contributed by Steve Loughran.

HADOOP-16499. S3A retry policy to be exponential (#1246). Contributed by Steve Loughran.

HADOOP-16472. findbugs warning on LocalMetadataStore.ttlTimeProvider sync

Contributed by Steve Loughran.

Made the setter and addAncestors synchronized.
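
A minimal sketch of the shape of the fix, with hypothetical stand-in names for LocalMetadataStore and its field; the point is that reads and writes of the shared field now hold the same lock:

    // Illustrative only; names are stand-ins, not the real S3Guard types.
    interface TimeProvider { long getNow(); }

    class StoreSketch {
      private TimeProvider ttlTimeProvider;  // shared mutable state

      // Writer and readers now hold the same instance lock, which is what
      // silences the findbugs inconsistent-synchronization warning.
      synchronized void setTtlTimeProvider(TimeProvider p) {
        this.ttlTimeProvider = p;
      }

      synchronized void addAncestors() {
        long now = ttlTimeProvider.getNow();  // consistent read under the lock
        // ... update ancestor entries using 'now' ...
      }
    }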

Change-Id: Ib362c66d1b8c9124eca7db9a44274ac08d0b3be6

HADOOP-15237. In KMS docs there should be one space between KMS_LOG and NOTE.

Contributed by Snigdhanjali Mishra.

Change-Id: I3abaa658c8786f8afa802ccbb629551313431b0e

HADOOP-16380. S3Guard to determine empty directory status for all non-root directories.

Contributed by Steve Loughran and Gabor Bota.

This:

* Asks S3Guard to determine the empty directory status.
* Has S3A's root directory rm("/") command always return false (as abfs does).
* Documents that object stores MAY do this.
* Overloads ContractTestUtils.assertDeleted to let assertions declare that the source directory does not need to exist. This stops inconsistencies in directory listings from failing a root test.

It avoids a recent regression (HADOOP-16279) where, if there was a tombstone above the first element found in a directory listing, the directory would be considered empty when in fact there were child entries. That could downgrade an rm(path, recursive) to a no-op, while also confusing rename(src, dest), as dest could be mistaken for an empty directory and so permit the copy above it, rather than reject it with "destination path exists and is not empty".
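
A hedged sketch of the root-delete contract from the caller's side (the bucket name is illustrative):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RootDeleteSketch {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(
            URI.create("s3a://example-bucket/"), new Configuration());
        // Deleting the root itself now always reports false on S3A,
        // matching abfs: the bucket root cannot be deleted.
        boolean deleted = fs.delete(new Path("/"), true);
        System.out.println("root delete returned " + deleted);  // false
      }
    }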

Change-Id: I136a3d1a5a48a67e6155d790a40ff558d0d2c108

Revert "HDFS-9913. DistCp to add -useTrash to move deleted files to Trash."

Reverting due to test failures if ~/.Trash is not present during test setup.

This reverts commit ee3115f488ce8e44bffac15af9c646190bf67b88.

Change-Id: Icbeeb261570b9131ff99d765ac0945c335b26658

HADOOP-16384: S3A: Avoid inconsistencies between DDB and S3.

Contributed by Steve Loughran

Contains:

- HADOOP-16397. Hadoop S3Guard Prune command to support a -tombstone option.
- HADOOP-16406. ITestDynamoDBMetadataStore.testProvisionTable times out intermittently.

This patch doesn't fix the underlying problem, but it:

* changes some tests to clean up better
* does a lot more logging of operations against DDB, if enabled
* adds an entry point to dump the state of the metastore and S3 tables (a precursor to fsck)
* adds a purge entry point to help clean up after a test run has got a store into a mess
* adds a -tombstone option to the s3guard prune command to clear only tombstones

The outcome is that tests should pass consistently, and if problems occur we have better diagnostics.
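
For example (hedged: the bucket and age are illustrative; prune's age flags predate this change), day-old tombstones could be cleared with:

    hadoop s3guard prune -tombstone -days 1 s3a://example-bucket/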

Change-Id: I3eca3f5529d7f6fec398c0ff0472919f08f054eb

HADOOP-16357. TeraSort Job failing on S3 DirectoryStagingCommitter: destination path exists.

Contributed by Steve Loughran.

This patch:

* changes the default for the staging committer to append, as we get for the classic FileOutputFormat committer
* adds a check for the dest path being a file, not a dir
* adds tests for this
* changes AbstractCommitTerasortIT to not use the simple parser, so it fails if the file is present
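
A minimal sketch of selecting this behaviour explicitly, assuming the standard staging committer properties:

    import org.apache.hadoop.conf.Configuration;

    public class CommitterConflictSketch {
      public static Configuration appendMode() {
        Configuration conf = new Configuration();
        conf.set("fs.s3a.committer.name", "directory");
        // "append" is now the default, matching the classic
        // FileOutputFormat committer; set it explicitly here.
        conf.set("fs.s3a.committer.staging.conflict-mode", "append");
        return conf;
      }
    }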

Change-Id: Id53742958ed1cf321ff96c9063505d64f3254f53

HADOOP-16393. S3Guard init command uses global settings, not those of target bucket.

Contributed by Steve Loughran.

Change-Id: I226a91ab8d7758340f8d221aa80a7abf9a0d3e8f

HADOOP-15183. S3Guard store becomes inconsistent after partial failure of rename.

Contributed by Steve Loughran.

Change-Id: I825b0bc36be960475d2d259b1cdab45ae1bb78eb

HADOOP-15563. S3Guard to support creating on-demand DDB tables.

Contributed by Steve Loughran
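
A hedged sketch of requesting such a table, assuming the convention that zero provisioned capacity selects on-demand billing:

    import org.apache.hadoop.conf.Configuration;

    public class OnDemandTableSketch {
      public static Configuration onDemand() {
        Configuration conf = new Configuration();
        conf.setBoolean("fs.s3a.s3guard.ddb.table.create", true);
        // Zero provisioned read/write capacity requests an
        // on-demand (pay-per-request) DynamoDB table.
        conf.setInt("fs.s3a.s3guard.ddb.table.capacity.read", 0);
        conf.setInt("fs.s3a.s3guard.ddb.table.capacity.write", 0);
        return conf;
      }
    }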

Change-Id: I2262b5b9f52e42ded8ed6f50fd39756f96e77087

Revert "HADOOP-16344. Make DurationInfo public unstable."

This reverts commit 829848ba2e3e04e3b7bf5a02e0379470eec0809e.

Change-Id: Ied91250e191b2ba701a8fc697c78b3756ce76be8

HADOOP-16117. Update AWS SDK to 1.11.563.

Contributed by Steve Loughran.

Change-Id: I7c46ed2a6378e1370f567acf4cdcfeb93e43fa13

Revert "HADOOP-16050: s3a SSL connections should use OpenSSL"

This reverts commit b067f8acaa79b1230336900a5c62ba465b2adb28.

Change-Id: I584b050a56c0e6f70b11fa3f7db00d5ac46e7dd8

Revert "HADOOP-16321: ITestS3ASSL+TestOpenSSLSocketFactory failing with java.lang.UnsatisfiedLinkErrors"

This reverts commit 5906268f0dd63a93eb591ddccf70d23b15e5c2ed.

HADOOP-16266. Add more fine-grained processing time metrics to the RPC layer -follow-on patch.

This follow-on patch to HADOOP-16266 fixes the problem where logs were filling with stack traces because the timeout passed down to select was in nanos, whereas the API expected millis.
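
A minimal sketch of this class of fix, with hypothetical variable names; converting explicitly removes the unit mismatch:

    import java.util.concurrent.TimeUnit;

    public class TimeoutUnitsSketch {
      public static void main(String[] args) {
        long timeoutNanos = TimeUnit.MILLISECONDS.toNanos(500);
        // select() and friends take milliseconds; convert explicitly
        // instead of passing the nanosecond value straight through.
        long timeoutMillis = TimeUnit.NANOSECONDS.toMillis(timeoutNanos);
        System.out.println(timeoutMillis);  // 500
      }
    }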

Contributed by Erik Krogen.

Change-Id: I5c6e9ddf68127b1d7e0ca0e179d036eb9941e445

HADOOP-16332. Remove S3A dependency on http core.

Contributed by Steve Loughran.

Change-Id: I53209c993a405fefdb5e1b692d5a56d027d3b845