HADOOP-11572. s3a delete() operation fails during a concurrent delete of child entries. Contributed by Steve Loughran.

(cherry picked from commit 2ac5aab8d725f761a9f9723471a4426f6b5d78c4)

  1. … 2 more files in changeset.
HADOOP-16197 S3AUtils.translateException to map CredentialInitializationException to AccessDeniedException

Contributed by Steve Loughran.

Change-Id: Ie98ca5210bf0009f297edbcacf1fc6dfe5ea70cd.

HADOOP-16233. S3AFileStatus to declare that isEncrypted() is always true (#685)

This is needed to fix up some confusion about how job.addCache() handles S3A paths and all their parent dirs: the files are downloaded by the NM without using the DTs of the user submitting the job. This means that when you submit jobs to an EC2 cluster with lower IAM permissions than the user, cached resources don't get downloaded and the job doesn't start.

Production code changes:

* S3AFileStatus sets the superclass's encrypted flag to "true" during construction.

Tests:

* Base AbstractContractOpenTest can control whether zero-byte files created in tests are encrypted. Not done via an XML attribute, just a subclass point. Thoughts?

* Verify that the filecache considers paths to not have the permissions which trigger reduced-privilege downloads

* And extend ITestDelegatedMRJob to test a completely different bucket (open street map), to verify that cached resources do get their tokens picked up

Docs:

* Advise FS developers to say all files are encrypted. It's otherwise harmless and it'll stop other people seeing impossible-to-debug error messages on app launch.

Contributed by Steve Loughran.

Change-Id: Ifaae4c9d735ccc5eafeebd2584b65daf2d4e5da3

  1. … 4 more files in changeset.
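The HADOOP-16233 entry above turns on how a cached resource is classified before download. The following is a minimal sketch of that decision, assuming that an encrypted file is never treated as public and that a public resource is the one fetched without the submitting user's tokens. StatusSketch and CacheVisibilitySketch are hypothetical names for illustration only; they are not Hadoop classes, and the real permission checks are more detailed than a single boolean.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a file or directory status; not Hadoop's FileStatus. */
final class StatusSketch {
    final boolean worldAccessible; // "other" may read the file or traverse the directory
    final boolean encrypted;       // what isEncrypted() would report

    StatusSketch(boolean worldAccessible, boolean encrypted) {
        this.worldAccessible = worldAccessible;
        this.encrypted = encrypted;
    }
}

/** Hypothetical model of the public/private choice for a cached resource. */
final class CacheVisibilitySketch {

    /**
     * Returns true when the resource would be treated as public, that is,
     * downloaded without the submitting user's delegation tokens (assumption).
     */
    static boolean isPublic(StatusSketch file, List<StatusSketch> ancestors) {
        // An encrypted or non-world-readable file is never public.
        if (file.encrypted || !file.worldAccessible) {
            return false;
        }
        // Every parent directory must also be open to everyone.
        for (StatusSketch dir : ancestors) {
            if (!dir.worldAccessible) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<StatusSketch> parents = Arrays.asList(new StatusSketch(true, false));
        // Same permissions, differing only in the encrypted flag.
        System.out.println(isPublic(new StatusSketch(true, false), parents));
        System.out.println(isPublic(new StatusSketch(true, true), parents));
    }
}
```

Under these assumptions, reporting every file as encrypted pushes all S3A resources onto the private path, where the user's tokens are used for the download.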
HADOOP-16233. S3AFileStatus to declare that isEncrypted() is always true (#685)

This is needed to fix up some confusion about how job.addCache() handles S3A paths and all their parent dirs: the files are downloaded by the NM without using the DTs of the user submitting the job. This means that when you submit jobs to an EC2 cluster with lower IAM permissions than the user, cached resources don't get downloaded and the job doesn't start.

Production code changes:

* S3AFileStatus sets the superclass's encrypted flag to "true" during construction.

Tests:

* Base AbstractContractOpenTest can control whether zero-byte files created in tests are encrypted. Not done via an XML attribute, just a subclass point. Thoughts?

* Verify that the filecache considers paths to not have the permissions which trigger reduced-privilege downloads

* And extend ITestDelegatedMRJob to test a completely different bucket (open street map), to verify that cached resources do get their tokens picked up

Docs:

* Advise FS developers to say all files are encrypted. It's otherwise harmless and it'll stop other people seeing impossible-to-debug error messages on app launch.

Contributed by Steve Loughran.

Change-Id: Ifaae4c9d735ccc5eafeebd2584b65daf2d4e5da3

(cherry picked from commit 366186d9990ef9059b6ac9a19ad24310d6f36d04)

  1. … 4 more files in changeset.
HADOOP-16233. S3AFileStatus to declare that isEncrypted() is always true (#685)

This is needed to fix up some confusion about how job.addCache() handles S3A paths and all their parent dirs: the files are downloaded by the NM without using the DTs of the user submitting the job. This means that when you submit jobs to an EC2 cluster with lower IAM permissions than the user, cached resources don't get downloaded and the job doesn't start.

Production code changes:

* S3AFileStatus sets the superclass's encrypted flag to "true" during construction.

Tests:

* Base AbstractContractOpenTest can control whether zero-byte files created in tests are encrypted. Not done via an XML attribute, just a subclass point. Thoughts?

* Verify that the filecache considers paths to not have the permissions which trigger reduced-privilege downloads

* And extend ITestDelegatedMRJob to test a completely different bucket (open street map), to verify that cached resources do get their tokens picked up

Docs:

* Advise FS developers to say all files are encrypted. It's otherwise harmless and it'll stop other people seeing impossible-to-debug error messages on app launch.

Contributed by Steve Loughran.

Change-Id: Ifaae4c9d735ccc5eafeebd2584b65daf2d4e5da3

  1. … 6 more files in changeset.
HADOOP-15999. S3Guard: Better support for out-of-band operations.

Author: Gabor Bota

  1. … 4 more files in changeset.
HADOOP-16186. S3Guard: NPE in DynamoDBMetadataStore.lambda$listChildren.

Author: Gabor Bota

    • -11
    • +31
    ./s3a/s3guard/DynamoDBMetadataStore.java
  1. … 1 more file in changeset.
HADOOP-16201: S3AFileSystem#innerMkdirs builds needless lists (#636)

HADOOP-16195 MarshalledCredentials toString

Change-Id: I4f1bdd2be0d5760c5501dce6edb6122499108b53

HADOOP-16055. Upgrade AWS SDK to 1.11.271 in branch-2.

Contains HADOOP-12705 Upgrade Jackson 2.2.3 to 2.7.8.

This change was required to address license compatibility issues with the JSON parser in the older AWS SDKs.

A consequence of this is that the version of Jackson 2 shipped is now 2.7.8.

Author: Akira Ajisaka <aajisaka@apache.org>

  1. … 5 more files in changeset.
HADOOP-15625. S3A input stream to use etags/version number to detect changed source files.

Author: Ben Roling <ben.roling@gmail.com>

Initial patch from Brahma Reddy Battula.

    • -0
    • +44
    ./s3a/NoVersionAttributeException.java
    • -0
    • +49
    ./s3a/RemoteFileChangedException.java
    • -0
    • +376
    ./s3a/impl/ChangeDetectionPolicy.java
    • -0
    • +196
    ./s3a/impl/ChangeTracker.java
    • -0
    • +42
    ./s3a/impl/LogExactlyOnce.java
    • -0
    • +30
    ./s3a/impl/package-info.java
  1. … 7 more files in changeset.
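The HADOOP-15625 entry above describes detecting a changed source file from its etag or version number. A minimal sketch of that idea, assuming the tracker remembers the first attribute it sees and rejects any later response that differs: EtagTrackerSketch and RemoteChangedSketchException are hypothetical names, loosely modelled on the file names listed in the changeset, and do not reproduce the actual ChangeTracker code.

```java
import java.io.IOException;

/** Hypothetical exception raised when the remote object changed mid-read. */
final class RemoteChangedSketchException extends IOException {
    RemoteChangedSketchException(String message) {
        super(message);
    }
}

/** Hypothetical tracker: pins the first etag seen and compares later ones to it. */
final class EtagTrackerSketch {
    private String expected; // null until the first response arrives

    /** Call once per response from the store, passing the etag it returned. */
    void onResponse(String etag) throws IOException {
        if (etag == null) {
            // Without an attribute to compare, a change cannot be detected.
            throw new IOException("response carried no etag");
        }
        if (expected == null) {
            expected = etag; // first read: remember what the file looked like
            return;
        }
        if (!expected.equals(etag)) {
            throw new RemoteChangedSketchException(
                "etag changed from " + expected + " to " + etag);
        }
    }

    public static void main(String[] args) throws IOException {
        EtagTrackerSketch tracker = new EtagTrackerSketch();
        tracker.onResponse("v1"); // pins "v1"
        tracker.onResponse("v1"); // unchanged, accepted
        try {
            tracker.onResponse("v2"); // the object was overwritten
        } catch (RemoteChangedSketchException e) {
            System.out.println("detected: " + e.getMessage());
        }
    }
}
```

Whether a mismatch should fail the read or only log a warning is a policy choice; this sketch always fails.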
HADOOP-16109. Parquet reading S3AFileSystem causes EOF (#589)

Nobody gets seek right. No matter how many times they think they have.

Reproducible test from: Dave Christianson

Fixed seek() logic: Steve Loughran

Change-Id: I39b87f3d5daa98f65de2c0a44e348821a4930573

(cherry picked from commit 9b8044d00b0edb0a597c6fd768e9be6a96da74da)

  1. … 4 more files in changeset.
HADOOP-16109. Parquet reading S3AFileSystem causes EOF (#589)

Nobody gets seek right. No matter how many times they think they have.

Reproducible test from: Dave Christianson

Fixed seek() logic: Steve Loughran

Change-Id: I39b87f3d5daa98f65de2c0a44e348821a4930573

  1. … 4 more files in changeset.
HADOOP-16109. Parquet reading S3AFileSystem causes EOF

Nobody gets seek right. No matter how many times they think they have.

Reproducible test from: Dave Christianson

Fixed seek() logic: Steve Loughran

  1. … 3 more files in changeset.
HADOOP-16109. Parquet reading S3AFileSystem causes EOF

Nobody gets seek right. No matter how many times they think they have.

Reproducible test from: Dave Christianson

Fixed seek() logic: Steve Loughran

  1. … 2 more files in changeset.
HADOOP-16093. Move DurationInfo from hadoop-aws to hadoop-common org.apache.hadoop.util.

Contributed by Abhishek Modi

    • -0
    • +1
    ./s3a/commit/AbstractS3ACommitter.java
    • -1
    • +1
    ./s3a/commit/magic/MagicS3GuardCommitter.java
    • -1
    • +1
    ./s3a/commit/staging/StagingCommitter.java
  1. … 11 more files in changeset.
HADOOP-15843. s3guard bucket-info command to not print a stack trace on bucket-not-found.

Contributed by Adam Antal.

(Revised patch applied after stevel committed the wrong one; that has been reverted)

  1. … 2 more files in changeset.
Revert "HADOOP-15843. s3guard bucket-info command to not print a stack trace on bucket-not-found."

This reverts commit c4a00d1ad3d3cfc02a6a4e1e04353678f2d588e1.

  1. … 2 more files in changeset.
HADOOP-16098. Fix javadoc warnings in hadoop-aws. Contributed by Masatake Iwasaki.

    • -0
    • +2
    ./s3a/TemporaryAWSCredentialsProvider.java
    • -0
    • +1
    ./s3a/auth/AbstractSessionCredentialsProvider.java
    • -0
    • +3
    ./s3a/auth/MarshalledCredentialBinding.java
  1. … 3 more files in changeset.
HADOOP-15229. Add FileSystem builder-based openFile() API to match createFile(); S3A to implement S3 Select through this API.

The new openFile() API is asynchronous, and implemented across FileSystem and FileContext.

The MapReduce V2 inputs are moved to this API, and you can actually set must/may options to pass in.

This is more useful for setting things like s3a seek policy than for S3 select, as the existing input format/record readers can't handle S3 select output where the stream is shorter than the file length, and splitting plain text is suboptimal. Future work is needed there.

In the meantime, any/all filesystem connectors are now free to add their own filesystem-specific configuration parameters which can be set in jobs and used to set filesystem input stream options (seek policy, retry, encryption secrets, etc).

Contributed by Steve Loughran

    • -0
    • +53
    ./s3a/InternalConstants.java
    • -0
    • +77
    ./s3a/select/InternalSelectConstants.java
    • -0
    • +431
    ./s3a/select/SelectBinding.java
    • -0
    • +296
    ./s3a/select/SelectConstants.java
    • -0
    • +457
    ./s3a/select/SelectInputStream.java
  1. … 57 more files in changeset.
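The HADOOP-15229 entry above mentions an asynchronous, builder-based open with must/may options. The sketch below models that pattern with the standard library only, assuming that an unrecognised optional key is ignored while an unrecognised mandatory key fails the build. OpenBuilderSketch and the `sketch.*` option keys are hypothetical; this is not the Hadoop builder class, and the future here resolves to the effective options rather than to an input stream.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

/** Hypothetical model of a builder that takes optional and mandatory options. */
final class OpenBuilderSketch {
    private final Set<String> known; // option keys this "connector" understands
    private final Map<String, String> optional = new HashMap<>();
    private final Map<String, String> mandatory = new HashMap<>();

    OpenBuilderSketch(Set<String> known) {
        this.known = known;
    }

    /** A "may" option: applied if understood, silently skipped otherwise. */
    OpenBuilderSketch opt(String key, String value) {
        optional.put(key, value);
        return this;
    }

    /** A "must" option: the build fails if the key is not understood. */
    OpenBuilderSketch must(String key, String value) {
        mandatory.put(key, value);
        return this;
    }

    /** Validates eagerly, then completes asynchronously with the effective options. */
    CompletableFuture<Map<String, String>> build() {
        for (String key : mandatory.keySet()) {
            if (!known.contains(key)) {
                throw new IllegalArgumentException("unsupported mandatory option: " + key);
            }
        }
        Map<String, String> effective = new HashMap<>();
        optional.forEach((k, v) -> {
            if (known.contains(k)) {
                effective.put(k, v);
            }
        });
        effective.putAll(mandatory);
        return CompletableFuture.supplyAsync(() -> effective);
    }

    public static void main(String[] args) {
        Set<String> known = new HashSet<>(Arrays.asList("sketch.seek.policy"));
        Map<String, String> options = new OpenBuilderSketch(known)
            .opt("sketch.other.store.hint", "ignored here")
            .must("sketch.seek.policy", "random")
            .build()
            .join(); // block until the asynchronous open completes
        System.out.println(options);
    }
}
```

The split lets a job carry hints for several connectors at once: each one applies the keys it understands and skips the rest, while a caller that depends on an option can insist on it.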
HDFS-13713. Add specification of Multipart Upload API to FS specification, with contract tests.

Contributed by Ewan Higgs and Steve Loughran.

(cherry picked from commit c1d24f848345f6d34a2ac2d570d49e9787a0df6a)

  1. … 10 more files in changeset.
HADOOP-14556. S3A to support Delegation Tokens.

Contributed by Steve Loughran and Daryn Sharp.

    • -6
    • +74
    ./s3a/AWSCredentialProviderList.java
    • -9
    • +24
    ./s3a/SimpleAWSCredentialsProvider.java
    • -37
    • +52
    ./s3a/TemporaryAWSCredentialsProvider.java
    • -0
    • +70
    ./s3a/auth/AbstractAWSCredentialProvider.java
    • -0
    • +170
    ./s3a/auth/AbstractSessionCredentialsProvider.java
  1. … 87 more files in changeset.
HADOOP-15843. s3guard bucket-info command to not print a stack trace on bucket-not-found.

Contributed by Adam Antal.

  1. … 2 more files in changeset.
Revert "HADOOP-14556. S3A to support Delegation Tokens."

This reverts commit d7152332b32a575c3a92e3f4c44b95e58462528d.

    • -74
    • +6
    ./s3a/AWSCredentialProviderList.java
    • -24
    • +9
    ./s3a/SimpleAWSCredentialsProvider.java
    • -52
    • +37
    ./s3a/TemporaryAWSCredentialsProvider.java
    • -70
    • +0
    ./s3a/auth/AbstractAWSCredentialProvider.java
    • -170
    • +0
    ./s3a/auth/AbstractSessionCredentialsProvider.java
  1. … 90 more files in changeset.
HADOOP-14556. S3A to support Delegation Tokens.

Contributed by Steve Loughran.

    • -6
    • +74
    ./s3a/AWSCredentialProviderList.java
    • -9
    • +24
    ./s3a/SimpleAWSCredentialsProvider.java
    • -37
    • +52
    ./s3a/TemporaryAWSCredentialsProvider.java
    • -0
    • +70
    ./s3a/auth/AbstractAWSCredentialProvider.java
    • -0
    • +170
    ./s3a/auth/AbstractSessionCredentialsProvider.java
  1. … 90 more files in changeset.