steve loughran <stevel@apache.org> in hadoop

HADOOP-16478. S3Guard bucket-info fails if the caller lacks s3:GetBucketLocation.

Contributed by Steve Loughran.

Includes HADOOP-16651. S3 getBucketLocation() can return "US" for us-east.

Change-Id: Ifc0dca76e51495ed1a8fc0f077b86bf125deff40
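
The "US" quirk noted in HADOOP-16651 can be illustrated with a small normalisation step. This is a minimal sketch, not the patch itself: the class and method names (`BucketRegions`, `normalize`) are hypothetical, and it assumes that a missing value or the legacy value "US" should both be read as `us-east-1`.

```java
/** Hypothetical helper; the names are illustrative and not taken from the Hadoop source. */
final class BucketRegions {
  private BucketRegions() {
  }

  /**
   * Assumption of this sketch: a null, empty or legacy "US" location
   * all refer to the us-east-1 region.
   */
  static String normalize(String location) {
    if (location == null || location.isEmpty() || "US".equals(location)) {
      return "us-east-1";
    }
    // Any other value is passed through unchanged.
    return location;
  }

  public static void main(String[] args) {
    System.out.println(normalize("US"));
    System.out.println(normalize("eu-west-1"));
  }
}
```

Any caller that compares or prints the region would then see one consistent name, whichever form the store returned.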

HADOOP-16635. S3A "directories only" scan still does a HEAD.

Contributed by Steve Loughran.

Change-Id: I5e41d7f721364c392e1f4344db83dfa8c5aa06ce

Revert "HADOOP-15870. S3AInputStream.remainingInFile should use nextReadPos."

This reverts commit 7a4b3d42c4e36e468c2a46fd48036a6fed547853.

The patch broke TestRouterWebHDFSContractSeek, as it turns out that WebHDFSInputStream.available() is always 0.

HADOOP-16118. S3Guard to support on-demand DDB tables.

This is the first step towards on-demand operation: the code now recognizes when it is using an on-demand table, as do the tests.

Contributed by Steve Loughran.

HADOOP-16197 S3AUtils.translateException to map CredentialInitializationException to AccessDeniedException

Contributed by Steve Loughran.

Change-Id: Ie98ca5210bf0009f297edbcacf1fc6dfe5ea70cd.
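
The mapping named in HADOOP-16197 might look roughly like the sketch below. It is an illustration under stated assumptions rather than the real `S3AUtils.translateException`: `CredentialInitializationException` is redefined here as a stand-in so the example compiles on its own, and `ExceptionTranslation.translate` and its parameters are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.AccessDeniedException;

/** Stand-in for the credential failure; hypothetical, declared only so the sketch is self-contained. */
class CredentialInitializationException extends RuntimeException {
  CredentialInitializationException(String message) {
    super(message);
  }
}

/** Hypothetical translator from unchecked failures to IOExceptions. */
final class ExceptionTranslation {
  private ExceptionTranslation() {
  }

  static IOException translate(String operation, String path, RuntimeException e) {
    if (e instanceof CredentialInitializationException) {
      // Credential problems surface as an access-denied failure,
      // with the original exception kept as the cause.
      AccessDeniedException denied =
          new AccessDeniedException(path, null, operation + ": " + e.getMessage());
      denied.initCause(e);
      return denied;
    }
    // Everything else becomes a generic IOException in this sketch.
    return new IOException(operation + " on " + path + ": " + e, e);
  }
}
```

Because `java.nio.file.AccessDeniedException` is an `IOException`, callers that already handle access failures need no new catch clause.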

HADOOP-16233. S3AFileStatus to declare that isEncrypted() is always true (#685)

This is needed to fix up some confusion about how job.addCache() handles S3A paths and all their parent dirs: the files are downloaded by the NM without using the DTs of the user submitting the job. This means that when you submit jobs to an EC2 cluster with lower IAM permissions than the user, cached resources don't get downloaded and the job doesn't start.

Production code changes:

* S3AFileStatus passes "true" as the superclass's encrypted flag during construction.

Tests

* Base AbstractContractOpenTest can control whether zero byte files created in tests are encrypted. Not done via an XML attribute, just a subclass point. Thoughts?

* Verify that the filecache considers paths to not have the permissions which trigger reduced-privilege downloads

* And extend ITestDelegatedMRJob to test a completely different bucket (open street map), to verify that cached resources do get their tokens picked up

Docs:

* Advise FS developers to say all files are encrypted. It's otherwise harmless and it'll stop other people seeing impossible-to-debug error messages on app launch.

Contributed by Steve Loughran.

Change-Id: Ifaae4c9d735ccc5eafeebd2584b65daf2d4e5da3
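
The production change described in HADOOP-16233 amounts to a status subclass that always sets the encrypted flag. A minimal sketch of that shape follows. `BaseStatus` and `StoreStatus` are hypothetical stand-ins for the real FileStatus and S3AFileStatus classes, whose constructors take more arguments.

```java
/** Hypothetical stand-in for a generic file status carrying an "encrypted" flag. */
class BaseStatus {
  private final String path;
  private final boolean encrypted;

  BaseStatus(String path, boolean encrypted) {
    this.path = path;
    this.encrypted = encrypted;
  }

  String getPath() {
    return path;
  }

  boolean isEncrypted() {
    return encrypted;
  }
}

/** Hypothetical object-store status: every file is declared to be encrypted. */
final class StoreStatus extends BaseStatus {
  StoreStatus(String path) {
    // Always hand "true" to the superclass; callers cannot override it.
    super(path, true);
  }
}
```

Fixing the flag in the constructor, rather than overriding the getter, keeps any superclass logic that reads the field consistent with what `isEncrypted()` reports.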

HADOOP-16233. S3AFileStatus to declare that isEncrypted() is always true (#685)

This is needed to fix up some confusion about how job.addCache() handles S3A paths and all their parent dirs: the files are downloaded by the NM without using the DTs of the user submitting the job. This means that when you submit jobs to an EC2 cluster with lower IAM permissions than the user, cached resources don't get downloaded and the job doesn't start.

Production code changes:

* S3AFileStatus passes "true" as the superclass's encrypted flag during construction.

Tests

* Base AbstractContractOpenTest can control whether zero byte files created in tests are encrypted. Not done via an XML attribute, just a subclass point. Thoughts?

* Verify that the filecache considers paths to not have the permissions which trigger reduced-privilege downloads

* And extend ITestDelegatedMRJob to test a completely different bucket (open street map), to verify that cached resources do get their tokens picked up

Docs:

* Advise FS developers to say all files are encrypted. It's otherwise harmless and it'll stop other people seeing impossible-to-debug error messages on app launch.

Contributed by Steve Loughran.

Change-Id: Ifaae4c9d735ccc5eafeebd2584b65daf2d4e5da3

(cherry picked from commit 366186d9990ef9059b6ac9a19ad24310d6f36d04)

HADOOP-16218. Findbugs warning of null param to non-nullable method in Configuration with Guava update. (#655)

Change-Id: I461e518ce9a4730b91a8138ad55b39e9a4b0a4b8

HADOOP-16058. S3A tests to include Terasort.

Contributed by Steve Loughran.

This includes

- HADOOP-15890. Some S3A committer tests don't match ITest* pattern; don't run in maven

- MAPREDUCE-7090. BigMapOutput example doesn't work with paths off cluster fs

- MAPREDUCE-7091. Terasort on S3A to switch to new committers

- MAPREDUCE-7092. MR examples to work better against cloud stores

HADOOP-16195 MarshalledCredentials toString

Change-Id: I4f1bdd2be0d5760c5501dce6edb6122499108b53

HADOOP-16109. Parquet reading S3AFileSystem causes EOF (#589)

Nobody gets seek right. No matter how many times they think they have.

Reproducible test from: Dave Christianson

Fixed seek() logic: Steve Loughran

Change-Id: I39b87f3d5daa98f65de2c0a44e348821a4930573

(cherry picked from commit 9b8044d00b0edb0a597c6fd768e9be6a96da74da)

HADOOP-16109. Parquet reading S3AFileSystem causes EOF (#589)

Nobody gets seek right. No matter how many times they think they have.

Reproducible test from: Dave Christianson

Fixed seek() logic: Steve Loughran

Change-Id: I39b87f3d5daa98f65de2c0a44e348821a4930573

HADOOP-16109. Parquet reading S3AFileSystem causes EOF

Nobody gets seek right. No matter how many times they think they have.

Reproducible test from: Dave Christianson

Fixed seek() logic: Steve Loughran

HADOOP-16068. ABFS Authentication and Delegation Token plugins to optionally be bound to specific URI of the store.

Contributed by Steve Loughran.

HADOOP-16149 hadoop-mapreduce-client-app build not converging due to transitive dependencies

Change-Id: If95b12b223770b057041f99f0f8fd8ba370c377f

HADOOP-16105. WASB in secure mode does not set connectingUsingSAS.

Contributed by Steve Loughran.

(cherry picked from commit 9cb2f470b759bbe7609a00e8f8f72779e2daae80)

HADOOP-16105. WASB in secure mode does not set connectingUsingSAS.

Contributed by Steve Loughran.

Revert "HADOOP-15843. s3guard bucket-info command to not print a stack trace on bucket-not-found."

This reverts commit c4a00d1ad3d3cfc02a6a4e1e04353678f2d588e1.

Revert "HADOOP-15954. ABFS: Enable owner and group conversion for MSI and login user using OAuth."

(accidentally mixed in two patches)

This reverts commit fa8cd1bf28f5b81849ba351a2d7225fbc580350d.

HADOOP-15229. Add FileSystem builder-based openFile() API to match createFile(); S3A to implement S3 Select through this API.

The new openFile() API is asynchronous, and implemented across FileSystem and FileContext.

The MapReduce V2 inputs are moved to this API, and you can actually set must/may options to pass in.

This is more useful for setting things like s3a seek policy than for S3 select, as the existing input format/record readers can't handle S3 select output where the stream is shorter than the file length, and splitting plain text is suboptimal. Future work is needed there.

In the meantime, any/all filesystem connectors are now free to add their own filesystem-specific configuration parameters which can be set in jobs and used to set filesystem input stream options (seek policy, retry, encryption secrets, etc).

Contributed by Steve Loughran
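
The must/may split and the asynchronous result described in HADOOP-15229 can be sketched with plain JDK classes. This is only an illustration of the pattern, not the Hadoop API: `OpenFileSketch`, the option key `example.seek.policy` and the string standing in for the opened stream are all hypothetical, and rejecting unknown mandatory options eagerly in `build()` is a choice made for this sketch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

/** Hypothetical builder mimicking optional ("may") and mandatory ("must") open options. */
final class OpenFileSketch {
  /** Assumption: the only key this imaginary connector understands. */
  private static final Set<String> KNOWN = Collections.singleton("example.seek.policy");

  private final String path;
  private final Map<String, String> optional = new HashMap<>();
  private final Map<String, String> mandatory = new HashMap<>();

  OpenFileSketch(String path) {
    this.path = path;
  }

  /** Optional setting: silently ignored if the connector does not know the key. */
  OpenFileSketch opt(String key, String value) {
    optional.put(key, value);
    return this;
  }

  /** Mandatory setting: the open fails if the connector does not know the key. */
  OpenFileSketch must(String key, String value) {
    mandatory.put(key, value);
    return this;
  }

  /** Validates mandatory options, then completes the "open" asynchronously. */
  CompletableFuture<String> build() {
    for (String key : mandatory.keySet()) {
      if (!KNOWN.contains(key)) {
        throw new IllegalArgumentException("Unsupported mandatory option: " + key);
      }
    }
    // A mandatory value wins over an optional one; "normal" is the sketch's default.
    String policy = mandatory.getOrDefault("example.seek.policy",
        optional.getOrDefault("example.seek.policy", "normal"));
    return CompletableFuture.supplyAsync(
        () -> "opened " + path + " with seek policy " + policy);
  }
}
```

A caller would chain the options and wait on the future, for example `new OpenFileSketch("s3a://bucket/data.csv").opt("example.seek.policy", "random").build().join()`.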

HDFS-13713. Add specification of Multipart Upload API to FS specification, with contract tests.

Contributed by Ewan Higgs and Steve Loughran.

(cherry picked from commit c1d24f848345f6d34a2ac2d570d49e9787a0df6a)

HADOOP-16079. Token.toString faulting if any token listed can't load.

Contributed by Steve Loughran.

(cherry picked from commit 7f46d13dac8cf85b094f41b3dd68e02c69e5afbc)