HADOOP-16988. Remove source code from branch-2. (aajisaka via jhung)

This closes #1959

    • -152 +0 ./AbstractContractAppendTest.java
    • -112 +0 ./AbstractContractConcatTest.java
    • -335 +0 ./AbstractContractCreateTest.java
    • -122 +0 ./AbstractContractDeleteTest.java
    • -644 +0 ./AbstractContractGetFileStatusTest.java
    • -186 +0 ./AbstractContractMkdirTest.java
    • -288 +0 ./AbstractContractRenameTest.java
    • -227 +0 ./AbstractContractRootDirectoryTest.java
    • -61 +0 ./AbstractContractSetTimesTest.java
    • -378 +0 ./AbstractFSContractTestBase.java
  … 10832 more files in changeset.
HADOOP-14630 Contract Tests to verify create, mkdirs and rename under a file is forbidden

Contributed by Steve Loughran.

Not all stores do complete validation here; in particular the S3A connector does not: walking the entire directory tree to check whether a parent path is a file significantly slows things down.

This check does take place in S3A mkdirs(), which walks backwards up the list of parent paths until it finds a directory (success) or a file (failure).

In practice, production applications invariably create destination directories before writing one or more files into them; restricting the check purely to the mkdirs() call delivers a significant speedup while implicitly including the checks.
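A minimal sketch of that parent walk, with illustrative names (this is not the actual S3A implementation):

    import java.io.FileNotFoundException;
    import java.io.IOException;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.ParentNotDirectoryException;
    import org.apache.hadoop.fs.Path;

    public final class MkdirsParentCheck {
      // Walk backwards up the parent paths until a directory (success)
      // or a file (failure) is found.
      static void checkNoParentIsFile(FileSystem fs, Path dir) throws IOException {
        for (Path p = dir.getParent(); p != null; p = p.getParent()) {
          try {
            FileStatus st = fs.getFileStatus(p);
            if (st.isDirectory()) {
              return;  // found a directory: nothing above it can be a file
            }
            throw new ParentNotDirectoryException(p + " is a file");
          } catch (FileNotFoundException e) {
            // no entry at this level; keep walking up
          }
        }
      }
    }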

Change-Id: I2c9df748e92b5655232e7d888d896f1868806eb0

    • -12 +116 ./AbstractContractCreateTest.java
  … 7 more files in changeset.
HDFS-13404. Addendum: RBF: TestRouterWebHDFSContractAppend.testRenameFileBeingAppended fail. Contributed by Takanobu Asanuma.

(cherry picked from commit b52fd05d42d9a76f6936a5d86c23fcd66244fe3d)

Conflicts:

hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractAppendTest.java

HADOOP-16859: ABFS: Add unbuffer support to ABFS connector.

Contributed by Sahil Takiar

    • -20 +49 ./AbstractContractUnbufferTest.java
  … 4 more files in changeset.
HADOOP-16823. Large DeleteObject requests are their own Thundering Herd.

Contributed by Steve Loughran.

During S3A rename() and delete() calls, the list of objects to delete is built up into batches of a thousand and then POSTed in a single large DeleteObjects request.

But as the IO capacity allowed on an S3 partition may only be 3500 writes per second *and* each entry in that POST counts as a single write, one of those POSTs alone can trigger throttling on an already loaded S3 directory tree. That can trigger backoff and retry with the same thousand-entry POST, and so recreate the exact same problem.

Fixes

* Page size for delete object requests is set in fs.s3a.bulk.delete.page.size; the default is 250 (see the configuration sketch after this list).

* The property fs.s3a.experimental.aws.s3.throttling (default=true) can be set to false to disable throttle retry logic in the AWS client SDK; it is all handled in the S3A client. This gives more visibility into when operations are being throttled.

* Bulk delete throttling events are logged to the org.apache.hadoop.fs.s3a.throttled log at INFO; if this appears often then choose a smaller page size.

* The metric "store_io_throttled" adds the entire count of delete requests when a single DeleteObjects request is throttled.

* A new quantile, "store_io_throttle_rate", can track throttling load over time.

* DynamoDB metastore throttle resilience issues have also been identified and fixed. Note: the fs.s3a.experimental.aws.s3.throttling flag does not apply to DDB IO, precisely because there may still be lurking issues there and it is safest to rely on the DynamoDB client SDK.
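A sketch of tuning the two options named above from Java; the property names are from this change, the values are illustrative only:

    import org.apache.hadoop.conf.Configuration;

    public final class BulkDeleteTuning {
      public static Configuration tuned() {
        Configuration conf = new Configuration();
        // Smaller pages spread one big DeleteObjects POST over several
        // smaller ones, reducing the chance of throttling (default: 250).
        conf.setInt("fs.s3a.bulk.delete.page.size", 100);
        // Let the S3A client rather than the AWS SDK handle throttle
        // retries, for more visibility into throttling events.
        conf.setBoolean("fs.s3a.experimental.aws.s3.throttling", false);
        return conf;
      }
    }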

Change-Id: I00f85cdd94fc008864d060533f6bd4870263fd84

  … 25 more files in changeset.
HADOOP-16759. Filesystem openFile() builder to take a FileStatus param (#1761). Contributed by Steve Loughran

* Enhanced builder + FS spec

* S3A FS to use this to skip HEAD on open

* and to use version/etag when opening the file; works with S3AFileStatus and S3ALocatedFileStatus
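A sketch of the builder in use; reusing a status from an earlier listing is what lets S3A skip the HEAD (the surrounding method is illustrative):

    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;

    public final class OpenFileExample {
      static void openWithKnownStatus(FileSystem fs, FileStatus st) throws Exception {
        // Pass the FileStatus from an earlier listing into the builder;
        // S3A can then open the object without a fresh HEAD request.
        try (FSDataInputStream in = fs.openFile(st.getPath())
            .withFileStatus(st)
            .build()       // returns a CompletableFuture<FSDataInputStream>
            .get()) {
          in.read();
        }
      }
    }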

  … 17 more files in changeset.
HADOOP-16697. Tune/audit S3A authoritative mode.

Contains:

HADOOP-16474. S3Guard ProgressiveRenameTracker to mark destination directory as authoritative on success.

HADOOP-16684. S3Guard bucket-info to list a bit more about authoritative paths.

HADOOP-16722. S3GuardTool to support FilterFileSystem.

This patch improves the marking of newly created/imported directory trees in S3Guard DynamoDB tables as authoritative.

Specific changes:

* Renamed directories are marked as authoritative if the entire operation succeeded (HADOOP-16474).

* When updating parent table entries as part of any table write, there's no overwriting of their authoritative flag.

s3guard import changes:

* New -verbose flag to print out what is going on.

* The "s3guard import" command lets you declare that a directory tree is to be marked as authoritative:

hadoop s3guard import -authoritative -verbose s3a://bucket/path

When importing a listing and a file is found, the import tool queries the metastore and only updates the entry if the file is different from before, where different means a new timestamp, etag, or length. S3Guard can get timestamp differences due to clock skew in PUT operations.

As the recursive list performed by the import command doesn't retrieve the versionID, the existing entry may in fact be more complete.

When updating an existing entry due to clock skew, the existing version ID is propagated to the new entry (note: the etags must match; this is needed to deal with inconsistent listings).

There is a new s3guard command to audit an S3Guard bucket/path's authoritative state:

hadoop s3guard authoritative -check-config s3a://bucket/path

This is primarily for testing/auditing.

The s3guard bucket-info command also provides some more details on the authoritative state of a store (HADOOP-16684).

Change-Id: I58001341c04f6f3597fcb4fcb1581ccefeb77d91

  … 30 more files in changeset.
HADOOP-16685: FileSystem#listStatusIterator does not check if given path exists (#1695)

(cherry picked from commit 3161813482868e42befb618d6f5687d8ffed0e5c)
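A sketch of the contract being fixed: the call itself, not the first iteration, should raise FileNotFoundException for a missing path (the test shape is illustrative):

    import java.io.FileNotFoundException;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public final class ListIteratorContract {
      static void expectMissingPathToFail(FileSystem fs) throws Exception {
        try {
          fs.listStatusIterator(new Path("/no/such/path"));
          throw new AssertionError("expected FileNotFoundException");
        } catch (FileNotFoundException expected) {
          // correct: the path is checked when the iterator is created
        }
      }
    }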

    • -0 +8 ./AbstractContractGetFileStatusTest.java
  … 1 more file in changeset.
HADOOP-15097. AbstractContractDeleteTest::testDeleteNonEmptyDirRecursive with misleading path. Contributed by Xieming Li.

(cherry picked from commit 92c28c100ee1aa414948cd510321ad13cb8639bc)

(cherry picked from commit 81060b341371a79ec0c240c66e252e2af88b4301)

Revert "HADOOP-15870. S3AInputStream.remainingInFile should use nextReadPos."

This reverts commit 7a4b3d42c4e36e468c2a46fd48036a6fed547853.

The patch broke TestRouterWebHDFSContractSeek, as it turns out that WebHDFSInputStream.available() is always 0.

  … 3 more files in changeset.
HADOOP-15870. S3AInputStream.remainingInFile should use nextReadPos.

Contributed by lqjacklee.
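A sketch of the change the title describes, with illustrative field names (not the actual S3AInputStream internals):

    // Illustrative only: measure what is left from the position the next
    // read() will use, not from the wrapped stream's current offset.
    public synchronized long remainingInFile() {
      return contentLength - nextReadPos;
    }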

Change-Id: I32bb00a683102e7ff8ff8ce0b8d9c3195ca7381c

  … 3 more files in changeset.
HADOOP-15691 Add PathCapabilities to FileSystem and FileContext.

Contributed by Steve Loughran.

This complements the StreamCapabilities interface by allowing applications to probe whether a specific path on a specific instance of a FileSystem client offers a specific capability.

This is intended to allow applications to determine:

* Whether a method is implemented before calling it and dealing with UnsupportedOperationException.

* Whether a specific feature is believed to be available in the remote store.

As well as a common set of capabilities defined in CommonPathCapabilities, file systems are free to add their own capabilities, prefixed with "fs." + schema + ".". The plan is to identify and document more capabilities, and for file systems which add new features to always declare the availability of those features.

Note

* The remote store is not expected to be checked for the feature; it is more a check of the client API and the client's configuration/knowledge of the state of the remote system.

* Permissions are not checked.
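A sketch of probing before calling, using one of the common capability constants (the helper method is illustrative):

    import java.io.IOException;
    import org.apache.hadoop.fs.CommonPathCapabilities;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public final class CapabilityProbe {
      static boolean canAppend(FileSystem fs, Path path) throws IOException {
        // Ask the client whether append is believed to be available,
        // instead of catching UnsupportedOperationException later.
        return fs.hasPathCapability(path, CommonPathCapabilities.FS_APPEND);
      }
    }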

Change-Id: I80bfebe94f4a8bdad8f3ac055495735b824968f5

  … 37 more files in changeset.
HADOOP-16490. Avoid/handle cached 404s during S3A file creation.

Contributed by Steve Loughran.

This patch avoids issuing any HEAD path request when creating a file with overwrite=true, so 404s will not end up in the S3 load balancers unless someone calls getFileStatus/exists/isFile in their own code.

The Hadoop FsShell CommandWithDestination class is modified to not register uncreated files for deleteOnExit(), because that calls exists() and so can place the 404 in the cache, even after S3A is patched to not do it itself.

Because S3Guard knows when a file should be present, it adds a special FileNotFound retry policy independently configurable from other retry policies; it is also exponential, but with different parameters. This is because every HEAD request will refresh any 404 cached in the S3 load balancers. It's not enough to retry: we have to have a suitable gap between attempts to (hopefully) ensure any cached entry will be gone.

The options and values are:

fs.s3a.s3guard.consistency.retry.interval: 2s

fs.s3a.s3guard.consistency.retry.limit: 7
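A sketch of overriding the two options from Java; the values shown are illustrative, not recommendations:

    import org.apache.hadoop.conf.Configuration;

    public final class ConsistencyRetryTuning {
      public static Configuration tuned() {
        Configuration conf = new Configuration();
        // Longer gaps give a cached 404 in the load balancers more time
        // to expire before the next HEAD refreshes it.
        conf.set("fs.s3a.s3guard.consistency.retry.interval", "4s");
        conf.setInt("fs.s3a.s3guard.consistency.retry.limit", 7);
        return conf;
      }
    }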

The S3A copy() method used during rename() raises a RemoteFileChangedException which is not caught, so it is not downgraded to false. Thus: when a rename is unrecoverable, this fact is propagated.

Copy operations without S3Guard lack the confidence that the file exists, so don't retry the same way: they fail fast with a different error message. However, because create(path, overwrite=false) no longer does HEAD path, we can at least be confident that S3A itself is not creating those cached 404 markers.

Change-Id: Ia7807faad8b9a8546836cb19f816cccf17cca26d

  … 24 more files in changeset.
HADOOP-16430. S3AFilesystem.delete to incrementally update s3guard with deletions

Contributed by Steve Loughran.

This overlaps the scanning for directory entries with batched calls to S3 DELETE and updates of the S3Guard tables. It also uses S3Guard to list the files to delete, so it finds newly created files even when S3 listings are not consistent.

For paths which the client considers S3Guard authoritative, we also do a recursive LIST of the store and delete those files; this is to find unindexed files and to guarantee that the delete(path, true) call really does delete everything underneath.

Change-Id: Ice2f6e940c506e0b3a78fa534a99721b1698708e

    • -4 +8 ./AbstractContractGetFileStatusTest.java
  … 41 more files in changeset.
HADOOP-16380. S3Guard to determine empty directory status for all non-root directories.

Contributed by Steve Loughran and Gabor Bota.

This

* Asks S3Guard to determine the empty directory status.

* Has S3A's root directory rm("/") command always return false (as abfs does); see the sketch below.

* Documents that object stores MAY do this.

* Overloads ContractTestUtils.assertDeleted to let assertions declare that the source directory does not need to exist. This stops inconsistencies in directory listings failing a root test.

It avoids a recent regression (HADOOP-16279) where, if there was a tombstone above the first element found in a directory listing, the directory would be considered empty when in fact there were child entries. That could downgrade an rm(path, recursive) to a no-op, while also confusing rename(src, dest), as dest could be mistaken for an empty directory and so permit the copy above it rather than rejecting it with "destination path exists and is not empty".
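A sketch of the root-directory behaviour described above (the assertion shape is illustrative):

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public final class RootDeleteBehaviour {
      static void rootDeleteIsANoOp(FileSystem fs) throws java.io.IOException {
        // S3A (like abfs) now returns false rather than deleting
        // everything under the store's root.
        boolean deleted = fs.delete(new Path("/"), true);
        assert !deleted : "rm -r / should return false on object stores";
      }
    }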

Change-Id: I136a3d1a5a48a67e6155d790a40ff558d0d2c108

    • -1 +1 ./AbstractContractRootDirectoryTest.java
  … 8 more files in changeset.
HADOOP-16384: S3A: Avoid inconsistencies between DDB and S3.

Contributed by Steve Loughran

Contains

- HADOOP-16397. Hadoop S3Guard Prune command to support a -tombstone option.

- HADOOP-16406. ITestDynamoDBMetadataStore.testProvisionTable times out intermittently.

This patch doesn't fix the underlying problem but it

* changes some tests to clean up better

* does a lot more logging of operations against DDB, if enabled

* adds an entry point to dump the state of the metastore and S3 tables (precursor to fsck)

* adds a purge entry point to help clean up after a test run has got a store into a mess

* s3guard prune command adds a -tombstone option to only clear tombstones (example below)

The outcome is that tests should pass consistently and if problems occur we have better diagnostics.
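For example, clearing only tombstones older than a week (the -days age argument is an existing prune option; the path is illustrative):

hadoop s3guard prune -tombstone -days 7 s3a://bucket/path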

Change-Id: I3eca3f5529d7f6fec398c0ff0472919f08f054eb

    • -18 +53 ./AbstractContractRootDirectoryTest.java
  … 34 more files in changeset.
HADOOP-15183. S3Guard store becomes inconsistent after partial failure of rename.

Contributed by Steve Loughran.

Change-Id: I825b0bc36be960475d2d259b1cdab45ae1bb78eb

  … 70 more files in changeset.
HDFS-13404. Addendum: RBF: TestRouterWebHDFSContractAppend.testRenameFileBeingAppended fail. Contributed by Takanobu Asanuma.

HADOOP-16205 Backport ABFS driver from trunk to branch 2.0: Fix build and test failures.

Contributed by Yuan Gao.

    • -3 +19 ./AbstractContractGetFileStatusTest.java
  … 13 more files in changeset.
HADOOP-14747. S3AInputStream to implement CanUnbuffer.

Author: Sahil Takiar <stakiar@cloudera.com>
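A sketch of the pattern this enables; the capability probe guards stores whose streams don't support it (the helper method is illustrative):

    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.StreamCapabilities;

    public final class UnbufferExample {
      static void releaseResources(FSDataInputStream in) {
        // Drop buffers/connections during a long pause between reads;
        // the stream stays open and usable afterwards.
        if (in.hasCapability(StreamCapabilities.UNBUFFER)) {
          in.unbuffer();
        }
      }
    }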

    • -0 +125 ./AbstractContractUnbufferTest.java
  … 8 more files in changeset.
HADOOP-16233. S3AFileStatus to declare that isEncrypted() is always true (#685)

This is needed to fix up some confusion about caching of job.addCache() handling of S3A paths and all parent dirs: the files are downloaded by the NM without using the DTs of the user submitting the job. This means that when you submit jobs to an EC2 cluster with lower IAM permissions than the user, cached resources don't get downloaded and the job doesn't start.

Production code changes:

* S3AFileStatus adds "true" to the superclass's encrypted flag during construction.

Tests

* Base AbstractContractOpenTest can control whether zero byte files created in tests are encrypted. Not done via an XML attribute, just a subclass point. Thoughts?

* Verify that the filecache considers paths not to have the permissions which trigger reduced-privilege downloads.

* And extend ITestDelegatedMRJob to test a completely different bucket (open street map), to verify that cached resources do get their tokens picked up

Docs:

* Advise FS developers to say all files are encrypted. It's otherwise harmless and it'll stop other people seeing impossible-to-debug error messages on app launch.
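A sketch of what callers observe after this change (the probe method is illustrative):

    import org.apache.hadoop.fs.FileStatus;

    public final class EncryptedFlagProbe {
      static boolean reportedEncrypted(FileStatus st) {
        // Per this change, S3AFileStatus always reports true here.
        return st.isEncrypted();
      }
    }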

Contributed by Steve Loughran.

Change-Id: Ifaae4c9d735ccc5eafeebd2584b65daf2d4e5da3

  … 4 more files in changeset.