DRILL-4203: Parquet File. Date is stored wrongly
- Added a new extra field to the parquet meta info: "is.date.correct = true";
- Removed the unnecessary double conversion of the value with the Julian day;
- Added the ability to correct corrupted dates for parquet files that have an old (second version) metadata cache file as well.
This closes #595

DRILL-4203: Fix date values written in parquet files created by Drill

Drill wrote non-standard dates into parquet files in all releases before 1.9.0. Drill itself has always read these values back correctly, but external tools such as Spark see corrupted values for every date written by Drill.
This change fixes the Drill parquet writer so that it stores dates in the format given in the parquet specification.
To maintain compatibility with old files, the parquet reader code has been updated to check for the old format and automatically shift corrupted values to their correct ones.
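
The reader-side check described above can be illustrated with a minimal sketch. It is not the Drill reader code. It assumes the corruption is a fixed forward shift of twice the Julian day number of the Unix epoch (suggested by the "double conversion" mentioned in the earlier commit) and uses a hypothetical threshold of 5,000 years after the epoch to tell the two formats apart. All names here are made up for the illustration.

```python
from datetime import date

UNIX_EPOCH = date(1970, 1, 1)

# Julian day number of 1970-01-01.
JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588

# ASSUMPTION: old files hold the correct day count plus this shift.
ASSUMED_SHIFT = 2 * JULIAN_DAY_OF_UNIX_EPOCH

# ASSUMPTION: hypothetical cut-off; anything later is treated as corrupt.
ASSUMED_THRESHOLD = (date(6970, 1, 1) - UNIX_EPOCH).days


def to_days(d: date) -> int:
    """Days since the Unix epoch, the int32 form of a parquet DATE."""
    return (d - UNIX_EPOCH).days


def corrupt(days: int) -> int:
    """Simulate the value an old writer would have stored."""
    return days + ASSUMED_SHIFT


def autocorrect(days: int) -> int:
    """Shift a value back only when it lies beyond the threshold."""
    if days > ASSUMED_THRESHOLD:
        return days - ASSUMED_SHIFT
    return days
```

Under these assumptions a correct value passes through unchanged, while a shifted value is restored to the original day count, including dates before 1970.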
The test cases included here should ensure that all files produced by historical versions of Drill continue to return the same values they returned in previous releases. For compatibility with external tools, any old files with corrupted dates can be rewritten using the CREATE TABLE AS command, because the writer now produces only specification-compliant values, even when the data is read from older corrupt files.
While the old behavior was a consistent shift into a range unlikely to be used in a modern database (over 10,000 years in the future), the shifted values are still valid dates. Because such values may have been written into files intentionally, and we cannot always be certain from the metadata whether Drill produced a file, an option is included to turn off the auto-correction. Use of this option is expected to be extremely rare, but it is included for completeness.
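
A rough sketch of why the shifted range is so far out, and of what turning the correction off means. It assumes the same shift of twice the Julian day number of the Unix epoch; the `auto_correct` flag and the threshold are hypothetical stand-ins for illustration, not the actual Drill option or constant.

```python
JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588           # Julian day number of 1970-01-01
ASSUMED_SHIFT = 2 * JULIAN_DAY_OF_UNIX_EPOCH   # ASSUMPTION: size of the corruption
DAYS_PER_GREGORIAN_YEAR = 365.2425
ASSUMED_THRESHOLD = round(5000 * DAYS_PER_GREGORIAN_YEAR)  # hypothetical cut-off


def shift_in_years() -> float:
    """Approximate size of the assumed shift, expressed in years."""
    return ASSUMED_SHIFT / DAYS_PER_GREGORIAN_YEAR


def read_date(stored_days: int, auto_correct: bool = True) -> int:
    """Return the day count a reader would report for a stored value.

    With auto_correct off, far-future values are passed through as-is,
    which is what a user with intentional far-future dates would want.
    """
    if auto_correct and stored_days > ASSUMED_THRESHOLD:
        return stored_days - ASSUMED_SHIFT
    return stored_days
```

With this assumed shift, `shift_in_years()` comes out at a little over 13,000 years, which is consistent with the "over 10,000 years" figure above.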
This patch was originally written against version 1.5.0; during rebasing, the corruption threshold was updated to 1.9.0.