HADOOP-16458. LocatedFileStatusFetcher.getFileStatuses failing intermittently with S3

Contributed by Steve Loughran.

Includes

-S3A glob scans don't bother trying to resolve symlinks

-stack traces don't get lost in getFileStatuses() when exceptions are wrapped

-debug level logging of what is up in Globber

-Contains HADOOP-13373. Add S3A implementation of FSMainOperationsBaseTest.

-ITestRestrictedReadAccess tests incomplete read access to files.

This adds a builder API for constructing globbers which other stores can use

so that they too can skip symlink resolution when not needed.

Change-Id: I23bcdb2783d6bd77cf168fdc165b1b4b334d91c7

  1. … 11 more files in changeset.
HADOOP-16602. mvn package fails in hadoop-aws.

Contributed by Xieming Li.

Follow-up to HADOOP-16445

Change-Id: I72c62d55b734a0f67556844f398ef4a50d9ea585

HADOOP-15691 Add PathCapabilities to FileSystem and FileContext.

Contributed by Steve Loughran.

This complements the StreamCapabilities interface by allowing applications to probe whether a specific path on a specific instance of a FileSystem client

offers a specific capability.

This is intended to allow applications to determine

* Whether a method is implemented before calling it and dealing with UnsupportedOperationException.

* Whether a specific feature is believed to be available in the remote store.

As well as a common set of capabilities defined in CommonPathCapabilities,

file systems are free to add their own capabilities, prefixed with

"fs." + scheme + "."

The plan is to identify and document more capabilities, and for file systems which add new features to always declare the availability of those features.

Note

* The remote store is not expected to be checked for the feature;

It is more a check of client API and the client's configuration/knowledge

of the state of the remote system.

* Permissions are not checked.
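
The probe-before-call pattern described above can be sketched as follows. This is a simplified stand-in, not the Hadoop classes: PathCapabilityProbe, chooseWriteMode and the "demo" scheme are hypothetical, and the capability strings are illustrative only.

```java
import java.util.Locale;
import java.util.Set;

/** Simplified stand-in for a filesystem client; not the Hadoop FileSystem API. */
class PathCapabilityProbe {

  /** Illustrative capability names (assumptions for this sketch). */
  static final String APPEND = "fs.capability.paths.append";
  /** Store-specific capability: "fs." + scheme + "." prefix, with a made-up scheme "demo". */
  static final String DEMO_FAST_RENAME = "fs.demo.capability.fast-rename";

  private final Set<String> supported;

  PathCapabilityProbe(Set<String> supported) {
    this.supported = supported;
  }

  /**
   * Answers from client-side knowledge only: the remote store is not
   * contacted and permissions on the path are not checked.
   */
  boolean hasPathCapability(String path, String capability) {
    return supported.contains(capability.toLowerCase(Locale.ROOT));
  }

  /** Probe first, rather than calling and handling UnsupportedOperationException. */
  String chooseWriteMode(String path) {
    return hasPathCapability(path, APPEND) ? "append" : "rewrite";
  }
}
```

A caller would construct the probe with whatever capabilities its client declares and branch on the answer; an unknown capability simply yields false.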

Change-Id: I80bfebe94f4a8bdad8f3ac055495735b824968f5

  1. … 34 more files in changeset.
HADOOP-16445. Allow separate custom signing algorithms for S3 and DDB (#1332)

    • -0
    • +99
    ./s3a/SignerManager.java
    • -1
    • +3
    ./s3a/s3guard/DynamoDBClientFactory.java
  1. … 4 more files in changeset.
HADOOP-16547. make sure that s3guard prune sets up the FS (#1402). Contributed by Steve Loughran.

Change-Id: Iaf71561cef6c797a3c66fed110faf08da6cac361

  1. … 1 more file in changeset.
HADOOP-16565. Region must be provided when requesting session credentials or SdkClientException will be thrown (#1454). Contributed by Gabor Bota.

    • -9
    • +23
    ./s3a/auth/MarshalledCredentialBinding.java
  1. … 1 more file in changeset.
HADOOP-16371: Option to disable GCM for SSL connections when running on Java 8.

Contributed by Sahil Takiar.

This moves the SSLSocketFactoryEx class from hadoop-azure into hadoop-common

as the DelegatingSSLSocketFactory and binds the S3A connector to it so that

it can avoid using those HTTPS algorithms which are underperformant on Java 8.
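
As a rough illustration of avoiding GCM, a socket factory can filter the cipher suite list before enabling it. The class name and the "_GCM_" substring test below are assumptions made for this sketch, not a copy of DelegatingSSLSocketFactory.

```java
import java.util.Arrays;

/** Hypothetical helper: drops GCM-based suites from a cipher suite list. */
class GcmFilterSketch {

  /** Returns the suites whose names do not contain "_GCM_", preserving order. */
  static String[] withoutGcm(String[] suites) {
    return Arrays.stream(suites)
        .filter(name -> !name.contains("_GCM_"))
        .toArray(String[]::new);
  }
}
```

The filtered array would then be passed to something like SSLSocket.setEnabledCipherSuites() on each socket the factory creates.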

Change-Id: Ie9e6ac24deac1aa05e136e08899620efa7d22abd

    • -0
    • +113
    ./s3a/impl/NetworkBinding.java
  1. … 13 more files in changeset.
HADOOP-16566. S3Guard fsck: Use org.apache.hadoop.util.StopWatch instead of com.google.common.base.Stopwatch (#1433). Contributed by Gabor Bota.

Change-Id: Ied43ef1522dfc6a1210d6fc58c38d8208824931b

HADOOP-16423. S3Guard fsck: Check metadata consistency between S3 and metadatastore (log) (#1208). Contributed by Gabor Bota.

Change-Id: I6bbb331b6c0a41c61043e482b95504fda8a50596

    • -0
    • +483
    ./s3a/s3guard/S3GuardFsck.java
    • -0
    • +346
    ./s3a/s3guard/S3GuardFsckViolationHandler.java
  1. … 5 more files in changeset.
HADOOP-16490. Avoid/handle cached 404s during S3A file creation.

Contributed by Steve Loughran.

This patch avoids issuing any HEAD path request when creating a file with overwrite=true,

so 404s will not end up in the S3 load balancers unless someone calls getFileStatus/exists/isFile

in their own code.

The Hadoop FsShell CommandWithDestination class is modified to not register uncreated files

for deleteOnExit(), because that calls exists() and so can place the 404 in the cache, even

after S3A is patched to not do it itself.

Because S3Guard knows when a file should be present, it adds a special FileNotFound retry policy

independently configurable from other retry policies; it is also exponential, but with

different parameters. This is because every HEAD request will refresh any 404 cached in

the S3 Load Balancers. It's not enough to retry: we have to have a suitable gap between

attempts to (hopefully) ensure any cached entry will be gone.

The options and values are:

fs.s3a.s3guard.consistency.retry.interval: 2s

fs.s3a.s3guard.consistency.retry.limit: 7
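
To see how the gaps between attempts grow with these two values, here is a minimal sketch that assumes plain doubling from the base interval with no jitter. The real retry policy may randomize or cap the delays; the class and method names are hypothetical.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical calculator for an exponential retry schedule. */
class ExponentialRetrySketch {

  /** Delay before each retry: interval, 2x interval, 4x interval, ... (assumed, no jitter). */
  static List<Duration> delays(Duration interval, int limit) {
    List<Duration> out = new ArrayList<>();
    Duration next = interval;
    for (int i = 0; i < limit; i++) {
      out.add(next);
      next = next.multipliedBy(2);
    }
    return out;
  }

  /** Sum of all delays: the longest time spent waiting before giving up. */
  static Duration total(List<Duration> delays) {
    Duration sum = Duration.ZERO;
    for (Duration d : delays) {
      sum = sum.plus(d);
    }
    return sum;
  }

  public static void main(String[] args) {
    List<Duration> schedule = delays(Duration.ofSeconds(2), 7);
    System.out.println(schedule + " total=" + total(schedule));
  }
}
```

Under that doubling assumption, an interval of 2s and a limit of 7 give a final gap of 128s and a worst-case wait of 254s.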

The S3A copy() method used during rename() raises a RemoteFileChangedException which is not caught

so not downgraded to false. Thus: when a rename is unrecoverable, this fact is propagated.

Copy operations without S3Guard lack the confidence that the file exists, so don't retry the same way:

it will fail fast with a different error message. However, because create(path, overwrite=true) no

longer does HEAD path, we can at least be confident that S3A itself is not creating those cached

404 markers.

Change-Id: Ia7807faad8b9a8546836cb19f816cccf17cca26d

    • -1
    • +15
    ./s3a/RemoteFileChangedException.java
    • -2
    • +30
    ./s3a/S3GuardExistsRetryPolicy.java
    • -2
    • +21
    ./s3a/impl/ChangeDetectionPolicy.java
    • -0
    • +44
    ./s3a/impl/StatusProbeEnum.java
  1. … 16 more files in changeset.
HADOOP-16554. mvn javadoc:javadoc fails in hadoop-aws.

Contributed by Xieming Li.

Change-Id: I78e88b5b1ae4702446d2bdd3e2faa3e10b45aef0

HADOOP-16430. S3AFilesystem.delete to incrementally update s3guard with deletions

Contributed by Steve Loughran.

This overlaps the scanning for directory entries with batched calls to S3 DELETE and updates of the S3Guard tables.

It also uses S3Guard to list the files to delete, so it finds newly created files even when S3 listings are not consistent.

For paths which the client considers S3Guard to be authoritative, we also do a recursive LIST of the store and delete files; this is to find unindexed files and to guarantee that the delete(path, true) call really does delete everything underneath.
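
The batching side of this can be sketched as below. It shows only how a listing is consumed page by page, with each page handed to a bulk-delete callback; it is sequential, so it does not show the overlap of listing and deletion, and all names are hypothetical rather than taken from DeleteOperation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: delete keys in pages while the listing is still being read. */
class IncrementalDeleteSketch {

  /**
   * Reads keys from the listing and invokes deleteBatch once per full page,
   * plus once for any final partial page. Returns the number of batches issued.
   */
  static int deleteIncrementally(Iterator<String> listing, int pageSize,
      Consumer<List<String>> deleteBatch) {
    if (pageSize <= 0) {
      throw new IllegalArgumentException("pageSize must be positive");
    }
    List<String> page = new ArrayList<>(pageSize);
    int batches = 0;
    while (listing.hasNext()) {
      page.add(listing.next());
      if (page.size() == pageSize) {
        // Stand-in for one bulk DELETE request followed by a metadata store update.
        deleteBatch.accept(new ArrayList<>(page));
        page.clear();
        batches++;
      }
    }
    if (!page.isEmpty()) {
      deleteBatch.accept(new ArrayList<>(page));
      batches++;
    }
    return batches;
  }
}
```

In a real client the callback would be submitted to an executor so that the next page of the listing is fetched while the previous batch is being deleted.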

Change-Id: Ice2f6e940c506e0b3a78fa534a99721b1698708e

    • -10
    • +22
    ./s3a/InconsistentAmazonS3Client.java
    • -0
    • +577
    ./s3a/impl/DeleteOperation.java
    • -0
    • +69
    ./s3a/impl/ExecutingStoreOperation.java
    • -2
    • +8
    ./s3a/impl/MultiObjectDeleteSupport.java
    • -0
    • +198
    ./s3a/impl/OperationCallbacks.java
    • -139
    • +25
    ./s3a/impl/RenameOperation.java
  1. … 28 more files in changeset.
HADOOP-16416. mark DynamoDBMetadataStore.deleteTrackingValueMap as final. Contributed by kevin su.

Signed-off-by: Wei-Chiu Chuang <weichiu@apache.org>

    • -2
    • +2
    ./s3a/s3guard/DynamoDBMetadataStore.java
HADOOP-16470. Make last AWS credential provider in default auth chain EC2ContainerCredentialsProviderWrapper.

Contributed by Steve Loughran.

Contains HADOOP-16471. Restore (documented) fs.s3a.SharedInstanceProfileCredentialsProvider.

Change-Id: I06b99b57459cac80bf743c5c54f04e59bb54c2f8

    • -0
    • +44
    ./s3a/SharedInstanceCredentialProvider.java
    • -14
    • +20
    ./s3a/auth/IAMInstanceCredentialsProvider.java
  1. … 2 more files in changeset.
HADOOP-16500 S3ADelegationTokens to only log at debug on startup (#1269). Contributed by Steve Loughran.

Change-Id: Ifafc15f32791911976d7ebc36fb6e8853f59ed41

HADOOP-16499. S3A retry policy to be exponential (#1246). Contributed by Steve Loughran.

  1. … 10 more files in changeset.
HADOOP-16472. findbugs warning on LocalMetadataStore.ttlTimeProvider sync

Contributed by Steve Loughran.

Made the setter and addAncestors synchronized.

Change-Id: Ib362c66d1b8c9124eca7db9a44274ac08d0b3be6

HADOOP-16433. S3Guard: Filter expired entries and tombstones when listing with MetadataStore.listChildren().

Contributed by Gabor Bota.

This pulls the tracking of the lastUpdated timestamp of metadata entries up from the DDB metastore into all s3guard stores, and then uses this to filter out expired tombstones from listings.
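
A minimal sketch of that filtering, assuming each entry carries a deleted flag and a lastUpdated timestamp in milliseconds, and that an entry expires once lastUpdated plus the TTL is no later than the current time. The Entry type and method names are hypothetical, not the S3Guard classes.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch: drop expired tombstones from a child listing. */
class TombstoneFilterSketch {

  /** Simplified metadata entry: a path, a tombstone flag and a last-updated time. */
  static final class Entry {
    final String path;
    final boolean deleted;
    final long lastUpdated;

    Entry(String path, boolean deleted, long lastUpdated) {
      this.path = path;
      this.deleted = deleted;
      this.lastUpdated = lastUpdated;
    }
  }

  /** An entry is treated as expired when its age has reached the TTL (assumed rule). */
  static boolean isExpired(Entry e, long ttlMillis, long nowMillis) {
    return e.lastUpdated + ttlMillis <= nowMillis;
  }

  /** Keeps every entry except tombstones whose TTL has run out. */
  static List<Entry> filterListing(List<Entry> children, long ttlMillis, long nowMillis) {
    return children.stream()
        .filter(e -> !(e.deleted && isExpired(e, ttlMillis, nowMillis)))
        .collect(Collectors.toList());
  }
}
```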

Change-Id: I80f121236b49c75a024116f65a3ef29d3580b462

    • -10
    • +31
    ./s3a/s3guard/DirListingMetadata.java
    • -8
    • +15
    ./s3a/s3guard/DynamoDBMetadataStore.java
    • -21
    • +29
    ./s3a/s3guard/LocalMetadataStore.java
  1. … 6 more files in changeset.
HADOOP-16380. S3Guard to determine empty directory status for all non-root directories.

Contributed by Steve Loughran and Gabor Bota.

This

* Asks S3Guard to determine the empty directory status.

* Has S3A's root directory rm("/") command always return false (as abfs does)

* Documents that object stores MAY do this

* Overloads ContractTestUtils.assertDeleted to let assertions declare that the source directory does not need to exist. This stops inconsistencies in directory listings failing a root test.

It avoids a recent regression (HADOOP-16279) where if there was a tombstone above the first element found in a directory listing, the directory would be considered empty, when in fact there were child entries. That could downgrade an rm(path, recursive) to a no-op, while also confusing rename(src, dest), as dest could be mistaken for an empty directory and so permit the copy above it, rather than reject it with "destination path exists and is not empty".
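
The invariant at stake can be illustrated with a small sketch: a directory counts as empty only if every listed child is covered by a tombstone, so a single tombstone must not end the check early. This is an illustration of the rule, not the S3A or S3Guard code; the names are hypothetical.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of an empty-directory check that accounts for tombstones. */
class EmptyDirSketch {

  /**
   * Returns true only when no live child exists, that is, every listed
   * child path has a matching tombstone. All children are examined.
   */
  static boolean isEmpty(List<String> listedChildren, Set<String> tombstones) {
    for (String child : listedChildren) {
      if (!tombstones.contains(child)) {
        return false; // a live entry exists, so the directory is not empty
      }
    }
    return true;
  }
}
```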

Change-Id: I136a3d1a5a48a67e6155d790a40ff558d0d2c108

  1. … 8 more files in changeset.
HADOOP-13868. [s3a] New default for S3A multi-part configuration (#1125)

  1. … 2 more files in changeset.
HADOOP-16383. Pass ITtlTimeProvider instance in initialize method in MetadataStore interface. Contributed by Gabor Bota. (#1009)

    • -1
    • +1
    ./s3a/impl/MultiObjectDeleteSupport.java
    • -37
    • +36
    ./s3a/s3guard/DynamoDBMetadataStore.java
    • -16
    • +23
    ./s3a/s3guard/LocalMetadataStore.java
    • -19
    • +21
    ./s3a/s3guard/MetadataStore.java
    • -6
    • +10
    ./s3a/s3guard/NullMetadataStore.java
  1. … 9 more files in changeset.
HADOOP-15729. [s3a] Allow core threads to time out. (#1075)

  1. … 1 more file in changeset.
HADOOP-16384: S3A: Avoid inconsistencies between DDB and S3.

Contributed by Steve Loughran

Contains

- HADOOP-16397. Hadoop S3Guard Prune command to support a -tombstone option.

- HADOOP-16406. ITestDynamoDBMetadataStore.testProvisionTable times out intermittently

This patch doesn't fix the underlying problem but it

* changes some tests to clean up better

* does a lot more logging of operations against DDB, if enabled

* adds an entry point to dump the state of the metastore and s3 tables (precursor to fsck)

* adds a purge entry point to help clean up after a test run has got a store into a mess

* s3guard prune command adds -tombstone option to only clear tombstones

The outcome is that tests should pass consistently and if problems occur we have better diagnostics.

Change-Id: I3eca3f5529d7f6fec398c0ff0472919f08f054eb

    • -0
    • +223
    ./s3a/s3guard/AbstractS3GuardDynamoDBDiagnostic.java
    • -0
    • +787
    ./s3a/s3guard/DumpS3GuardDynamoTable.java
    • -58
    • +307
    ./s3a/s3guard/DynamoDBMetadataStore.java
    • -2
    • +37
    ./s3a/s3guard/PathMetadataDynamoDBTranslation.java
    • -2
    • +2
    ./s3a/s3guard/PathOrderComparators.java
    • -0
    • +248
    ./s3a/s3guard/PurgeS3GuardDynamoTable.java
    • -0
    • +241
    ./s3a/s3guard/S3GuardTableAccess.java
  1. … 21 more files in changeset.
HADOOP-16357. TeraSort Job failing on S3 DirectoryStagingCommitter: destination path exists.

Contributed by Steve Loughran.

This patch

* changes the default for the staging committer to append, as we get for the classic FileOutputFormat committer

* adds a check for the dest path being a file not a dir

* adds tests for this

* Changes AbstractCommitTerasortIT to not use the simple parser, so it fails if the file is present.

Change-Id: Id53742958ed1cf321ff96c9063505d64f3254f53

    • -3
    • +4
    ./s3a/commit/staging/StagingCommitter.java
  1. … 11 more files in changeset.
HADOOP-16393. S3Guard init command uses global settings, not those of target bucket.

Contributed by Steve Loughran.

Change-Id: I226a91ab8d7758340f8d221aa80a7abf9a0d3e8f

  1. … 1 more file in changeset.
HADOOP-16409. Allow authoritative mode on non-qualified paths. Contributed by Sean Mackrory

    • -0
    • +1
    ./s3a/s3guard/DynamoDBMetadataStore.java
  1. … 1 more file in changeset.
HADOOP-16396. Allow authoritative mode on a subdirectory. (#1043)

    • -1
    • +0
    ./s3a/s3guard/DynamoDBMetadataStore.java
  1. … 4 more files in changeset.
HADOOP-16390. escape javadoc in S3AUtils public methods

Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>

HADOOP-15183. S3Guard store becomes inconsistent after partial failure of rename.

Contributed by Steve Loughran.

Change-Id: I825b0bc36be960475d2d259b1cdab45ae1bb78eb

    • -12
    • +30
    ./s3a/commit/AbstractS3ACommitter.java
    • -12
    • +143
    ./s3a/commit/CommitOperations.java
    • -2
    • +5
    ./s3a/commit/magic/MagicS3GuardCommitter.java
    • -3
    • +9
    ./s3a/commit/staging/StagingCommitter.java
    • -0
    • +49
    ./s3a/impl/AbstractStoreOperation.java
    • -0
    • +126
    ./s3a/impl/CallableSupplier.java
    • -0
    • +74
    ./s3a/impl/ContextAccessors.java
  1. … 56 more files in changeset.
HADOOP-16379: S3AInputStream.unbuffer should merge input stream stats into fs-wide stats

Contributed by Sahil Takiar

Change-Id: I2bcfaaea00d12c633757069402dcd0b91a5f5c05

  1. … 1 more file in changeset.