HADOOP-16490. Avoid/handle cached 404s during S3A file creation. Contributed by Steve Loughran.
This patch avoids issuing any HEAD path request when creating a file with overwrite=true, so 404s will not end up in the S3 load balancers unless someone calls getFileStatus/exists/isFile in their own code.

The Hadoop FsShell CommandWithDestination class is modified to not register uncreated files for deleteOnExit(), because that calls exists() and so can place the 404 in the cache, even after S3A is patched to not do it itself.

Because S3Guard knows when a file should be present, it adds a special FileNotFound retry policy, independently configurable from the other retry policies; it is also exponential, but with different parameters. This is because every HEAD request will refresh any 404 cached in the S3 load balancers. It is not enough to retry: there has to be a suitable gap between attempts to (hopefully) ensure any cached entry will be gone.

The options and values are:

fs.s3a.s3guard.consistency.retry.interval: 2s
fs.s3a.s3guard.consistency.retry.limit: 7

The S3A copy() method used during rename() raises a RemoteFileChangedException, which is not caught and so is not downgraded to false. Thus: when a rename is unrecoverable, that fact is propagated.

Copy operations without S3Guard lack the confidence that the file exists, so they do not retry the same way: they fail fast with a different error message. However, because create(path, overwrite=false) no longer does HEAD path, we can at least be confident that S3A itself is not creating those cached 404 markers.

Change-Id: Ia7807faad8b9a8546836cb19f816cccf17cca26d
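
Illustration only (not part of the patch): the consistency retry options named above can be tuned through the standard Hadoop Configuration API. The key names come from this change; the values, class name, and bucket are hypothetical.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class S3GuardConsistencyRetryExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Wait longer between FileNotFound retries than the 2s listed above,
        // giving cached 404 entries in the S3 load balancers more time to expire.
        conf.set("fs.s3a.s3guard.consistency.retry.interval", "5s");
        // Allow more attempts than the limit of 7 listed above.
        conf.setInt("fs.s3a.s3guard.consistency.retry.limit", 10);
        // Hypothetical bucket; any s3a:// URI would do.
        FileSystem fs = FileSystem.get(new Path("s3a://example-bucket/").toUri(), conf);
        System.out.println("Using filesystem: " + fs.getUri());
      }
    }

The same keys can equally be set in core-site.xml or per-bucket configuration.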