jason altekruse <altekrusejason@gmail.com> in drill

DRILL-4551: Implement new functions (cot, regex_matches, split_part, isdate)
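
For illustration, a minimal JDBC sketch exercising the new functions; the function spellings come from the commit title, while the connection URL, argument choices, and result handling are assumptions:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class NewFunctionsExample {
      public static void main(String[] args) throws Exception {
        // Assumes the Drill JDBC driver is on the classpath and a Drillbit runs locally.
        try (Connection conn = DriverManager.getConnection("jdbc:drill:drillbit=localhost");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT cot(0.5), regex_matches('abc', '^a.*'), "
                     + "split_part('a|b|c', '|', 2), isdate('2016-01-01') "
                     + "FROM (VALUES(1))")) {
          while (rs.next()) {
            System.out.println(rs.getObject(1) + " | " + rs.getObject(2)
                + " | " + rs.getObject(3) + " | " + rs.getObject(4));
          }
        }
      }
    }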

DRILL-4482: Fix Avro nested field selection regression

Update some of the Avro tests to properly verify their results;

others still need to be fixed and will be addressed in DRILL-4110.

Closes #419

DRILL-4437: Operator unit test framework

Closes #394

DRILL-4442: Move getSV2 and getSV4 methods to VectorAccessible

Moved up one level from their previous location in RecordBatch. Most implementations

already provide these methods because they implement RecordBatch rather than

VectorAccessible itself. Added an UnsupportedOperationException to the others.
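
A simplified sketch of the shape of this change; the selection-vector types and the container class below are stand-ins, not Drill's real classes:

    // Sketch only: simplified stand-ins to show the shape of the change.
    class SelectionVector2 {}
    class SelectionVector4 {}

    interface VectorAccessible {
      SelectionVector2 getSV2();
      SelectionVector4 getSV4();
    }

    interface RecordBatch extends VectorAccessible {
      // Existing RecordBatch implementations already supply getSV2()/getSV4().
    }

    // A hypothetical VectorAccessible with no selection vector: rather than
    // lacking the methods entirely, it now throws UnsupportedOperationException.
    class PlainVectorContainer implements VectorAccessible {
      @Override
      public SelectionVector2 getSV2() {
        throw new UnsupportedOperationException("This container has no SV2");
      }

      @Override
      public SelectionVector4 getSV4() {
        throw new UnsupportedOperationException("This container has no SV4");
      }
    }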

DRILL-4441: Fix varchar data read out of Avro filtering incorrectly due to metadata bug

The precision of the varchar datatype was not being set, causing inconsistent

truncation of values to the default length of 1. Fixed the same issue with varbinary.

The test framework was previously taking a String as the baseline for a binary value,

which cannot express all possible values. Fixed the test to instead use a byte array.

This required updating the Hive tests that were using the old method of specifying

baselines with a String.
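
For illustration, a hedged sketch of what a byte-array baseline looks like; it assumes Drill's testBuilder() fluent test API and the BaseTestQuery base class, and the query, column name, and literal bytes are made up:

    import org.apache.drill.BaseTestQuery;
    import org.junit.Test;

    // Sketch only: assumes Drill's test framework (BaseTestQuery and its testBuilder()).
    public class TestVarBinaryBaseline extends BaseTestQuery {
      @Test
      public void varbinaryBaselineAsBytes() throws Exception {
        testBuilder()
            .sqlQuery("SELECT bin_col FROM cp.`avro/sample.avro`") // illustrative query
            .unOrdered()
            .baselineColumns("bin_col")
            // Binary baselines are expressed as byte arrays instead of Strings so
            // that values that are not valid UTF-8 can still be verified.
            .baselineValues(new Object[] { new byte[] { 0x00, (byte) 0xFF, 0x42 } })
            .go();
      }
    }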

Fixed the cast to varbinary when reading from a data source with a schema; this was

needed to write a test.

Updated patch to remove varchar lengths from table creation.

This issue was fixed more generally by DRILL-4465, which provides a default

type length for varchar and varbinary during the setup of Calcite. This update now

just provides tests to verify the fix in this case.

Closes #393

DRILL-4448: Clean up deserialization of orderings in sorts

Fix sort operator deserialization and validation to respect existing

contract specified in the tests.

Update version to 1.6.0-SNAPSHOT

DRILL-4383: Allow custom configurations to be specified for a FileSystem plugin

add an example s3 plugin, disabled by default
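
Conceptually, the plugin's new custom key/value pairs are layered onto the Hadoop Configuration used to create the FileSystem. A minimal sketch under that assumption; the s3a property names and the config map shown here are illustrative, not taken from the commit:

    import java.net.URI;
    import java.util.LinkedHashMap;
    import java.util.Map;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class CustomFsConfigSketch {
      // Apply the plugin's custom key/value pairs on top of the Hadoop defaults.
      public static FileSystem createFs(Map<String, String> pluginConfig, String connection)
          throws Exception {
        Configuration fsConf = new Configuration();
        if (pluginConfig != null) {
          for (Map.Entry<String, String> entry : pluginConfig.entrySet()) {
            fsConf.set(entry.getKey(), entry.getValue());
          }
        }
        return FileSystem.get(new URI(connection), fsConf);
      }

      public static void main(String[] args) throws Exception {
        // Example values only; in Drill these would come from the storage plugin configuration.
        Map<String, String> config = new LinkedHashMap<>();
        config.put("fs.s3a.access.key", "ACCESS_KEY");
        config.put("fs.s3a.secret.key", "SECRET_KEY");
        FileSystem fs = createFs(config, "s3a://my-bucket");
        System.out.println(fs.getUri());
      }
    }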

Closes #375

DRILL-4445: Standardize the Physical and Logical plan nodes to use Lists instead of arrays for their inputs

Remove some extra translation logic used to move between the

two representations.
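
A toy sketch of the standardized pattern; the node and property names here are hypothetical, and only the List-typed children with a single @JsonCreator constructor is the point:

    import java.util.List;

    import com.fasterxml.jackson.annotation.JsonCreator;
    import com.fasterxml.jackson.annotation.JsonProperty;

    // Hypothetical plan node: children are carried as a List rather than an array,
    // so no translation between the two representations is needed.
    public class ExamplePlanNode {
      private final List<ExamplePlanNode> children;

      @JsonCreator
      public ExamplePlanNode(@JsonProperty("children") List<ExamplePlanNode> children) {
        this.children = children;
      }

      public List<ExamplePlanNode> getChildren() {
        return children;
      }
    }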

TODO - look back at the Join logical node; it has two @JsonCreator annotations,

but only one will be used. It is not clear whether the choice between them is

documented behavior, so we should just fix it on our end.

[maven-release-plugin] prepare release drill-1.5.0

Temporary fix for build issue with generated dependency-reduced-pom.xml

This generated pom file was being discovered and maven was trying to

run the target directory in jdbc-all as a submodule.

This change reverts to the default output location (the module root)

and adds corresponding .gitignore and RAT exclude entries.

More investigation of why this became an issue when we added the

maven-enforcer plugin to the module (and only appears when running a release)

will come in DRILL-4336.

Also updated the integration test for the jdbc-all jar with a small

path change, as changing the location of the dependency-reduced-pom.xml

actually changed the directory the test was being executed from.

DRILL-4375: Fix the maven release profile

This generated pom file was being discovered and maven was trying to

run the target directory in jdbc-all as a submodule.

This change reverts to the default output location (the module root)

and adds corresponding .gitignore and RAT exclude entries. NOTE:

this is considered bad practice as generated files should appear in

the target directory and be removed upon a maven clean. This default

location is considered to be a known shortcoming of the shade plugin.

Also updated the integration test for the jdbc-all jar with a small

path change, as changing the location of the dependency-reduced-pom.xml

actually changed the directory the test was being executed from.

Closes #402

DRILL-4128: Fix NPE when calling getString on a JDBC ResultSet when the type is not varchar
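
For context, the kind of call that previously hit the NPE; a minimal JDBC sketch in which the connection URL and query are illustrative:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class GetStringOnNonVarchar {
      public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:drill:drillbit=localhost");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT CAST(1 AS INTEGER) AS n FROM (VALUES(1))")) {
          while (rs.next()) {
            // Per JDBC, getString() should convert non-varchar types to their
            // string form instead of throwing a NullPointerException.
            System.out.println(rs.getString("n"));
          }
        }
      }
    }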

Adding Jason's GPG key

DRILL-4322: Add underlying exception message when IOException causes DROP TABLE failure
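
The essence of the change is to surface the IOException's own message rather than a generic failure; a generic Java sketch, not Drill's actual error-handling API (the helper below is hypothetical):

    import java.io.IOException;

    public class DropTableErrorSketch {
      // Hypothetical helper: drop a table directory and report the real cause on failure.
      static void dropTable(String tableName) {
        try {
          deleteTableLocation(tableName);
        } catch (IOException e) {
          // Include the underlying message (e.g. a permission error) so the user
          // sees why the DROP TABLE failed, and keep the cause for the logs.
          throw new RuntimeException(
              "Failed to drop table [" + tableName + "]: " + e.getMessage(), e);
        }
      }

      private static void deleteTableLocation(String tableName) throws IOException {
        throw new IOException("Permission denied: " + tableName); // stand-in failure
      }
    }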

This closes #344

DRILL-2653: Improve web UI experience when there is an error in a storage plugin configuration

Fixed the success message and made the error messages red

This closes #343

DRILL-4203: Fix DrillVersionInfo to make it provide a valid version number even during the unit tests.

This is now a build-time generated class, rather than one that looks on the

classpath for META-INF files.

This pattern for file generation with parameters passed from the POM files

was borrowed from parquet-mr.
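
A stripped-down sketch of the build-time generation approach; the real generator lives at common/src/main/java/org/apache/drill/version/Generator.java, and the generated package, class layout, and argument handling below are simplified assumptions:

    import java.io.File;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;

    // Build-time generator: the POM passes the project version as an argument and a
    // small DrillVersionInfo class is written under generated sources, so the version
    // is available even when no META-INF files are on the classpath (e.g. in unit tests).
    public class VersionInfoGenerator {
      public static void main(String[] args) throws IOException {
        String outputDir = args[0]; // e.g. target/generated-sources
        String version = args[1];   // e.g. 1.5.0-SNAPSHOT, supplied by the POM
        File dir = new File(outputDir, "org/apache/drill/common/util");
        if (!dir.mkdirs() && !dir.isDirectory()) {
          throw new IOException("Could not create " + dir);
        }
        String source =
            "package org.apache.drill.common.util;\n\n"
            + "public final class DrillVersionInfo {\n"
            + "  public static String getVersion() { return \"" + version + "\"; }\n"
            + "}\n";
        Files.write(new File(dir, "DrillVersionInfo.java").toPath(),
            source.getBytes(StandardCharsets.UTF_8));
      }
    }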

DRILL-4241: Fixing the build, make RAT and checkstyle happy.

DRILL-4203: Fix date values written in parquet files created by Drill

Drill was writing non-standard dates into parquet files for all releases

before 1.9.0. The values have been read correctly by Drill, but

external tools like Spark reading the files will see corrupted values for

all dates that have been written by Drill.

This change corrects the behavior of the Drill parquet writer to correctly

store dates in the format given in the parquet specification.

To maintain compatibility with old files, the parquet reader code has

been updated to check for the old format and automatically shift the

corrupted values into corrected ones.

The test cases included here should ensure that all files produced by

historical versions of Drill will continue to return the same values they

had in previous releases. For compatibility with external tools, any old

files with corrupted dates can be re-written using the CREATE TABLE AS

command (as the writer will now only produce the specification-compliant

values, even when reading out of older corrupted files).

While the old behavior was a consistent shift into a range unlikely

to be used in a modern database (over 10,000 years in the future), these are still

valid date values. In the case where these may have been written into

files intentionally, and we cannot be certain from the metadata if Drill

produced the files, an option is included to turn off the auto-correction.

Use of this option is assumed to be extremely unlikely, but it is included

for completeness.
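
A rough sketch of the kind of check-and-shift correction described above. The constants are assumptions for illustration: the shift shown corresponds to adding the Unix-epoch Julian day number twice, which would land stored values roughly 13,000 years in the future, consistent with the "over 10,000 years" note above.

    // Illustrative only: the constants are assumptions, not Drill's actual values.
    public class ParquetDateCorrectionSketch {
      // Julian day number of the Unix epoch (1970-01-01).
      private static final int JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH = 2440588;
      // The old writer effectively added this offset twice.
      private static final int CORRUPT_DATE_SHIFT = 2 * JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH;
      // Any stored value beyond this is treated as a corrupted Drill-written date.
      private static final int CORRUPTION_THRESHOLD = CORRUPT_DATE_SHIFT;

      // Correct a date stored as days-since-epoch, if it looks corrupted and
      // auto-correction has not been turned off.
      static int maybeCorrect(int storedDays, boolean autoCorrect) {
        if (autoCorrect && storedDays > CORRUPTION_THRESHOLD) {
          return storedDays - CORRUPT_DATE_SHIFT;
        }
        return storedDays; // already specification-compliant, or correction disabled
      }
    }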

This patch was originally written against version 1.5.0; when rebasing,

the corruption threshold was updated to 1.9.0.

Added regenerated binary files, updated metadata cache files accordingly.

One small fix in the ParquetGroupScan to accommodate changes in master that changed

when metadata is read.

Tests for bugs revealed by the regression suite.

Fix drill version number in metadata file generation

Add note about parquet file migration in 1.3

DRILL-4056: Fix corruption bug reading string data out of Avro

- Fixed an issue where a byte array was read without considering its length (see the sketch after this list).

- Removed use of unnecessary Holder objects.

- Added a restriction on the batch size produced by a single call to next.

- Added some basic result verification to the Avro tests.

This closes #266
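
The core of the length fix, sketched against Avro's Utf8 type; the value-vector details are omitted, and readString below is just a stand-in for copying the bytes into the vector:

    import java.util.Arrays;

    import org.apache.avro.util.Utf8;

    public class AvroStringLengthSketch {
      // Copy only the valid portion of the Utf8's backing buffer.
      static byte[] readString(Utf8 value) {
        byte[] backing = value.getBytes();  // may be longer than the actual content
        int length = value.getByteLength(); // number of valid bytes
        // Previously the whole backing array was used, picking up stale bytes
        // left over from earlier (longer) values, since Avro reuses the buffer.
        return Arrays.copyOf(backing, length);
      }

      public static void main(String[] args) {
        Utf8 reused = new Utf8("a much longer earlier value");
        reused.set("short"); // Avro reuses the backing buffer
        System.out.println(new String(readString(reused))); // prints "short"
      }
    }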

DRILL-4048: Fix reading required dictionary encoded varbinary data in parquet files after recent update

The fix itself was small; this update is a little larger than necessary because I was hoping to create

a unit test by modifying the one I had added in the earlier patch with the version upgrade.

Unfortunately we don't have a good way to generate Parquet files with required columns from

unit tests right now. So I just added a smaller subset of the binary file that was posted on

the JIRA issue. The refactoring of the earlier test was still useful for readability,

so I kept it in.

DRILL-3876: Avoid an extra copy of the original list when flattening

This only fixes a basic case; a more complete refactoring of the rewrite rule could avoid copies in cases with multiple flattens, which will be addressed in DRILL-3899.

close apache/drill#187

DRILL-3773: Fix Mongo FieldSelection

The Mongo plugin was previously rewriting a complex (multi-level) column reference as a simple selection of the top-level field.

This changeset does not change that behavior in terms of the filter sent to Mongo, but it adds the originally selected column to the list that will be read by the JSON reader once the data is returned from Mongo.

This means we will be requesting more data from Mongo than necessary (as we were previously), but it leverages the existing functionality in the JSON reader to grab only the sub-selection actually requested in the query. This allows difficult schema changes to be avoided by projecting only the columns that do not change schema.

This also fixes a FieldSelection bug that caused wrong results when selecting a nested column together with its parent, and adds unit tests for it.

DRILL-4028: Update Drill to leverage latest version of Parquet library.

- Remove references to the shaded version of a Jackson @JsonCreator annotation from parquet, replace with proper fasterxml version.

- Fixing imports using the wrong parquet packages after rebase.

- Fixing issues with the Drill parquet read and write paths after merging the Drill parquet fork back into mainline.

- Fixed the issue with the writer; it needed to flush the RecordConsumer in the ParquetRecordWriter.

- Consolidate page reading code

- Added some test to print out some additional context when an ordered comparison of two datasets fails in a test.

- Fix up parquet API usage in Hive Module.

- Adding a unit test to read and write all types in parquet; the decimal types and interval year have some issues.

- Use direct codec factory from new package in the parquet library now that it has been moved.

- Moving the test for Direct Codec Factory out of the Drill source as the class itself has been moved.

- Small fix after consolidating two different ByteBuffer based implementations of BytesInput.

- Small fixes to accommodate interface changes.

- Small changes to remove direct references to DirectCodecFactory, this class is not accessible outside of parquet, but an instance with the same contract is now accessible with a new factory method on CodecFactory.

- Fixed failing test using miniDFS when reading a larger parquet file.

This closes #236

DRILL-1904 - Part 1: (Build fix) Spaces at the end of lines in docs caused a check style violation