HADOOP-16357. TeraSort Job failing on S3 DirectoryStagingCommitter: destination path exists.

Contributed by Steve Loughran.

This patch

* changes the default for the staging committer to append, as we get for the classic FileOutputFormat committer

* adds a check for the dest path being a file not a dir

* adds tests for this

* changes AbstractCommitTerasortIT to not use the simple parser, so it fails if the file is present.
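
The behaviour described in the list above can be pictured with a small sketch. It is a simplified model and not the StagingCommitter source: the class, enum and method names are all hypothetical, and it assumes that a file at the destination is rejected whatever the conflict mode. The three mode names follow the documented values of the `fs.s3a.committer.staging.conflict-mode` option (fail, append, replace).

```java
import java.util.Locale;

/** Hypothetical model of the destination check; not the StagingCommitter source. */
final class ConflictModeSketch {

  /** Mirrors the documented option values: fail, append, replace. */
  enum ConflictMode { FAIL, APPEND, REPLACE }

  /** What a probe of the destination path found (illustrative). */
  enum DestState { ABSENT, DIRECTORY, FILE }

  /** What the job may do with the destination (illustrative). */
  enum Action { CREATE_NEW, ADD_TO_EXISTING, REPLACE_EXISTING }

  /** An unset or blank option falls back to append, the new default. */
  static ConflictMode parse(String configured) {
    if (configured == null || configured.trim().isEmpty()) {
      return ConflictMode.APPEND;
    }
    return ConflictMode.valueOf(configured.trim().toUpperCase(Locale.ROOT));
  }

  /** Decide what to do, given the mode and the state of the destination. */
  static Action resolve(ConflictMode mode, DestState dest) {
    if (dest == DestState.FILE) {
      // Assumption: a file where a directory is expected is always an error.
      throw new IllegalStateException("Destination path is a file, not a directory");
    }
    if (dest == DestState.ABSENT) {
      return Action.CREATE_NEW;
    }
    switch (mode) {
      case FAIL:
        throw new IllegalStateException("Destination path exists");
      case REPLACE:
        return Action.REPLACE_EXISTING;
      default:
        return Action.ADD_TO_EXISTING;
    }
  }

  public static void main(String[] args) {
    // With no option set, an existing directory is appended to.
    System.out.println(resolve(parse(null), DestState.DIRECTORY));
    System.out.println(resolve(parse("replace"), DestState.DIRECTORY));
  }
}
```

With append as the default, a job whose output directory already exists no longer fails at setup unless the option is explicitly set to fail.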

Change-Id: Id53742958ed1cf321ff96c9063505d64f3254f53

    • -3
    • +4
    ./s3a/commit/staging/StagingCommitter.java
  1. … 11 more files in changeset.
HADOOP-16393. S3Guard init command uses global settings, not those of target bucket.

Contributed by Steve Loughran.

Change-Id: I226a91ab8d7758340f8d221aa80a7abf9a0d3e8f

  1. … 1 more file in changeset.
HADOOP-16409. Allow authoritative mode on non-qualified paths. Contributed by Sean Mackrory

    • -0
    • +1
    ./s3a/s3guard/DynamoDBMetadataStore.java
  1. … 1 more file in changeset.
HADOOP-16396. Allow authoritative mode on a subdirectory. (#1043)

    • -1
    • +0
    ./s3a/s3guard/DynamoDBMetadataStore.java
  1. … 4 more files in changeset.
HADOOP-16390. escape javadoc in S3AUtils public methods

Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>

HADOOP-15183. S3Guard store becomes inconsistent after partial failure of rename.

Contributed by Steve Loughran.

Change-Id: I825b0bc36be960475d2d259b1cdab45ae1bb78eb

    • -12
    • +30
    ./s3a/commit/AbstractS3ACommitter.java
    • -12
    • +143
    ./s3a/commit/CommitOperations.java
    • -2
    • +5
    ./s3a/commit/magic/MagicS3GuardCommitter.java
    • -3
    • +9
    ./s3a/commit/staging/StagingCommitter.java
    • -0
    • +49
    ./s3a/impl/AbstractStoreOperation.java
    • -0
    • +126
    ./s3a/impl/CallableSupplier.java
    • -0
    • +74
    ./s3a/impl/ContextAccessors.java
  1. … 56 more files in changeset.
HADOOP-16379: S3AInputStream.unbuffer should merge input stream stats into fs-wide stats

Contributed by Sahil Takiar

Change-Id: I2bcfaaea00d12c633757069402dcd0b91a5f5c05

  1. … 1 more file in changeset.
HADOOP-16279. S3Guard: Implement time-based (TTL) expiry for entries (and tombstones).

Contributed by Gabor Bota.

Change-Id: I73a2d2861901dedfe7a0e783b310fbb95e7c1af9
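
As a rough illustration of time-based expiry, the sketch below treats an entry as expired once its last-updated timestamp plus the TTL is no later than the current time. It is not the S3Guard implementation: the class name, the use of `LongSupplier` as a clock, and the exact comparison are assumptions made for the example. Passing the clock in mirrors the idea of a pluggable time provider, which makes expiry testable.

```java
import java.util.function.LongSupplier;

/** Hypothetical TTL check; names and comparison are illustrative only. */
final class TtlExpirySketch {

  private final LongSupplier clock;   // source of "now" in milliseconds
  private final long ttlMillis;       // how long an entry stays valid

  TtlExpirySketch(LongSupplier clock, long ttlMillis) {
    this.clock = clock;
    this.ttlMillis = ttlMillis;
  }

  /** True when the entry (or tombstone) should no longer be trusted. */
  boolean isExpired(long lastUpdatedMillis) {
    return lastUpdatedMillis + ttlMillis <= clock.getAsLong();
  }

  public static void main(String[] args) {
    // A 15 minute TTL against the system clock (the value is an example).
    TtlExpirySketch ttl = new TtlExpirySketch(System::currentTimeMillis, 15 * 60 * 1000L);
    long justNow = System.currentTimeMillis();
    System.out.println("fresh entry expired?  " + ttl.isExpired(justNow));
    System.out.println("epoch entry expired?  " + ttl.isExpired(0L));
  }
}
```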

    • -24
    • +75
    ./s3a/s3guard/DynamoDBMetadataStore.java
    • -0
    • +34
    ./s3a/s3guard/ITtlTimeProvider.java
    • -39
    • +72
    ./s3a/s3guard/LocalMetadataStore.java
    • -18
    • +69
    ./s3a/s3guard/MetadataStore.java
  1. … 12 more files in changeset.
HADOOP-15563. S3Guard to support creating on-demand DDB tables.

Contributed by Steve Loughran

Change-Id: I2262b5b9f52e42ded8ed6f50fd39756f96e77087

    • -13
    • +48
    ./s3a/s3guard/DynamoDBMetadataStore.java
  1. … 8 more files in changeset.
Revert "HADOOP-16050: s3a SSL connections should use OpenSSL"

This reverts commit b067f8acaa79b1230336900a5c62ba465b2adb28.

Change-Id: I584b050a56c0e6f70b11fa3f7db00d5ac46e7dd8

  1. … 13 more files in changeset.
HADOOP-16332. Remove S3A dependency on http core.

Contributed by Steve Loughran.

Change-Id: I53209c993a405fefdb5e1b692d5a56d027d3b845

  1. … 2 more files in changeset.
HADOOP-16085. S3Guard: use object version or etags to protect against inconsistent read after replace/overwrite.

Contributed by Ben Roling.

S3Guard will now track the etag of uploaded files and, if an S3 bucket is versioned, the object version.

You can then control how to react to a mismatch between the data in the DynamoDB table and that in the store: warn, fail, or, when using versions, return the original value.

This adds two new columns to the table: etag and version.

This is transparent to older S3A clients, but when such clients add/update data to the S3Guard table, they will not add these values. As a result, the etag/version checks will not work with files uploaded by older clients.

For a consistent experience, upgrade all clients to use the latest Hadoop version.
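
The warn/fail reaction to a mismatch, and the gap left by older clients, can be sketched as follows. This is a minimal model under stated assumptions and not the S3A code: every name in it is hypothetical, `IllegalStateException` stands in for whatever exception the real client raises, and the version-based "return the original value" path is left out.

```java
/** Hypothetical etag comparison; not the S3A change-detection source. */
final class ChangeDetectionSketch {

  /** The two reactions modelled here. */
  enum Mode { WARN, FAIL }

  /** Result of comparing the recorded etag with the one in the store. */
  enum Outcome { MATCH, MISMATCH_WARNED, NOT_CHECKED }

  static Outcome check(String recordedEtag, String storeEtag, Mode mode) {
    if (recordedEtag == null) {
      // Entry written by an older client: no etag recorded, nothing to compare.
      return Outcome.NOT_CHECKED;
    }
    if (recordedEtag.equals(storeEtag)) {
      return Outcome.MATCH;
    }
    if (mode == Mode.FAIL) {
      throw new IllegalStateException(
          "etag mismatch: table has " + recordedEtag + ", store has " + storeEtag);
    }
    System.err.println("WARN: etag mismatch, reading anyway");
    return Outcome.MISMATCH_WARNED;
  }

  public static void main(String[] args) {
    System.out.println(check("abc", "abc", Mode.FAIL));
    System.out.println(check("abc", "xyz", Mode.WARN));
    System.out.println(check(null, "xyz", Mode.FAIL));
  }
}
```

The third call shows why upgrading all clients matters: an entry without a recorded etag passes through unchecked even in fail mode.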

    • -0
    • +20
    ./s3a/RemoteFileChangedException.java
    • -0
    • +63
    ./s3a/S3ALocatedFileStatus.java
    • -10
    • +138
    ./s3a/impl/ChangeDetectionPolicy.java
    • -14
    • +133
    ./s3a/impl/ChangeTracker.java
    • -0
    • +80
    ./s3a/impl/CopyOutcome.java
  1. … 41 more files in changeset.
HADOOP-16278. With S3A Filesystem, Long Running services End up Doing lot of GC and eventually die.

Contributed by Rajat Khandelwal

(cherry picked from commit 591ca698230f25217c10c7549aff8097baa11f1e)

HADOOP-16278. With S3A Filesystem, Long Running services End up Doing lot of GC and eventually die.

Contributed by Rajat Khandelwal

HADOOP-16221. S3Guard: add option to fail operation on metadata write failure.

    • -0
    • +40
    ./s3a/MetadataPersistenceException.java
  1. … 3 more files in changeset.
HADOOP-14747. S3AInputStream to implement CanUnbuffer.

Author: Sahil Takiar <stakiar@cloudera.com>

  1. … 9 more files in changeset.
HADOOP-16118. S3Guard to support on-demand DDB tables.

This is the first step for on-demand operations: things recognize when they are using on-demand tables, as do the tests.

Contributed by Steve Loughran.

    • -0
    • +24
    ./s3a/s3guard/DynamoDBMetadataStore.java
  1. … 4 more files in changeset.
HADOOP-16050: s3a SSL connections should use OpenSSL

(cherry picked from commit aebf229c175dfa19fff3b31e9e67596f6c6124fa)

  1. … 13 more files in changeset.
HADOOP-11572. s3a delete() operation fails during a concurrent delete of child entries. Contributed by Steve Loughran.

(cherry picked from commit 2ac5aab8d725f761a9f9723471a4426f6b5d78c4)

  1. … 2 more files in changeset.
HADOOP-16197 S3AUtils.translateException to map CredentialInitializationException to AccessDeniedException

Contributed by Steve Loughran.

Change-Id: Ie98ca5210bf0009f297edbcacf1fc6dfe5ea70cd.

HADOOP-16233. S3AFileStatus to declare that isEncrypted() is always true (#685)

This is needed to fix up some confusion about how job.addCache() handles S3A paths and all their parent dirs: the files are downloaded by the NM without using the DTs of the user submitting the job. This means that when you submit jobs to an EC2 cluster with lower IAM permissions than the user, cached resources don't get downloaded and the job doesn't start.

Production code changes:

* S3AFileStatus adds "true" to the superclass's encrypted flag during construction.

Tests

* Base AbstractContractOpenTest can control whether zero byte files created in tests are encrypted. Not done via an XML attribute, just a subclass point. Thoughts?

* Verify that the filecache considers paths to not have the permissions which trigger reduce-privilege downloads

* And extend ITestDelegatedMRJob to test a completely different bucket (open street map), to verify that cached resources do get their tokens picked up

Docs:

* Advise FS developers to say all files are encrypted. It's otherwise harmless and it'll stop other people seeing impossible to debug error messages on app launch.

Contributed by Steve Loughran.

Change-Id: Ifaae4c9d735ccc5eafeebd2584b65daf2d4e5da3

  1. … 4 more files in changeset.
HADOOP-16233. S3AFileStatus to declare that isEncrypted() is always true (#685)

This is needed to fix up some confusion about how job.addCache() handles S3A paths and all their parent dirs: the files are downloaded by the NM without using the DTs of the user submitting the job. This means that when you submit jobs to an EC2 cluster with lower IAM permissions than the user, cached resources don't get downloaded and the job doesn't start.

Production code changes:

* S3AFileStatus adds "true" to the superclass's encrypted flag during construction.

Tests

* Base AbstractContractOpenTest can control whether zero byte files created in tests are encrypted. Not done via an XML attribute, just a subclass point. Thoughts?

* Verify that the filecache considers paths to not have the permissions which trigger reduce-privilege downloads

* And extend ITestDelegatedMRJob to test a completely different bucket (open street map), to verify that cached resources do get their tokens picked up

Docs:

* Advise FS developers to say all files are encrypted. It's otherwise harmless and it'll stop other people seeing impossible to debug error messages on app launch.

Contributed by Steve Loughran.

Change-Id: Ifaae4c9d735ccc5eafeebd2584b65daf2d4e5da3

(cherry picked from commit 366186d9990ef9059b6ac9a19ad24310d6f36d04)

  1. … 4 more files in changeset.
HADOOP-16233. S3AFileStatus to declare that isEncrypted() is always true (#685)

This is needed to fix up some confusion about how job.addCache() handles S3A paths and all their parent dirs: the files are downloaded by the NM without using the DTs of the user submitting the job. This means that when you submit jobs to an EC2 cluster with lower IAM permissions than the user, cached resources don't get downloaded and the job doesn't start.

Production code changes:

* S3AFileStatus adds "true" to the superclass's encrypted flag during construction.

Tests

* Base AbstractContractOpenTest can control whether zero byte files created in tests are encrypted. Not done via an XML attribute, just a subclass point. Thoughts?

* Verify that the filecache considers paths to not have the permissions which trigger reduce-privilege downloads

* And extend ITestDelegatedMRJob to test a completely different bucket (open street map), to verify that cached resources do get their tokens picked up

Docs:

* Advise FS developers to say all files are encrypted. It's otherwise harmless and it'll stop other people seeing impossible to debug error messages on app launch.

Contributed by Steve Loughran.

Change-Id: Ifaae4c9d735ccc5eafeebd2584b65daf2d4e5da3

  1. … 6 more files in changeset.
HADOOP-15999. S3Guard: Better support for out-of-band operations.

Author: Gabor Bota

  1. … 4 more files in changeset.
HADOOP-16186. S3Guard: NPE in DynamoDBMetadataStore.lambda$listChildren.

Author: Gabor Bota

    • -11
    • +31
    ./s3a/s3guard/DynamoDBMetadataStore.java
  1. … 1 more file in changeset.
HADOOP-16201: S3AFileSystem#innerMkdirs builds needless lists (#636)

HADOOP-16195 MarshalledCredentials toString

Change-Id: I4f1bdd2be0d5760c5501dce6edb6122499108b53

HADOOP-16055. Upgrade AWS SDK to 1.11.271 in branch-2.

Contains HADOOP-12705 Upgrade Jackson 2.2.3 to 2.7.8.

This change was required to address license compatibility issues with the JSON parser in the older AWS SDKs.

A consequence of this is that the version of Jackson 2 shipped is now 2.7.8.

Author: Akira Ajisaka <aajisaka@apache.org>

  1. … 5 more files in changeset.