DRILL-6810: Disable NULL_IF_NULL NullHandling for functions with ComplexWriter

closes #1509

  1. … 14 more files in changeset.
DRILL-6656: Disallow extra semicolons and multiple statements on the same line.

closes #1415

  1. … 144 more files in changeset.
DRILL-6386: Remove unused imports and star imports.

  1. … 228 more files in changeset.
DRILL-6320: Fixed license headers.

closes #1207

  1. … 2052 more files in changeset.
DRILL-6340 Output Batch Control in Project using the RecordBatchSizer

Changes required to implement Output Batch Sizing in Project using the RecordBatchSizer.

closes #1302

  1. … 42 more files in changeset.
DRILL-5919: Add non-numeric support for JSON processing

1. Added two session options store.json.reader.non_numeric_numbers and store.json.reader.non_numeric_numbers that allow reading/writing NaN and Infinity as numbers. By default these options are set to true.

2. Extended the signature of the convert_toJSON and convert_fromJSON functions by adding a second optional parameter that enables/disables reading/writing NaN and Infinity. By default it is set to true.

3. Added unit tests with NaN and Infinity values for math and aggregate functions.

4. Replaced JsonReader's constructors with a builder.

This closes #1026
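
The effect of the option described in item 1 can be pictured with a small stand-alone sketch. The class NonNumericTokens and its method parseNumber are hypothetical names used only for illustration; they are not Drill or Jackson APIs, and the sketch is not a JSON validator. Only Double.parseDouble is a real JDK call.

```java
public class NonNumericTokens {

    // Hypothetical helper: parse a numeric token, accepting NaN and
    // Infinity only when the flag is on (mirrors the idea of the option).
    static double parseNumber(String token, boolean allowNonNumeric) {
        boolean nonNumeric = token.equals("NaN")
                || token.equals("Infinity")
                || token.equals("-Infinity");
        if (nonNumeric && !allowNonNumeric) {
            throw new NumberFormatException("Non-numeric token not allowed: " + token);
        }
        // Double.parseDouble understands "NaN", "Infinity" and "-Infinity".
        return Double.parseDouble(token);
    }

    public static void main(String[] args) {
        System.out.println(parseNumber("NaN", true));
        System.out.println(parseNumber("-Infinity", true));
        System.out.println(parseNumber("1.5", false));
    }
}
```

With the flag off, the same NaN token is rejected with a NumberFormatException instead of being read as a number.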

  1. … 15 more files in changeset.
DRILL-5034: Select timestamp from hive generated parquet always return in UTC

- TIMESTAMP_IMPALA function is reverted to retain local timezone

- TIMESTAMP_IMPALA_LOCALTIMEZONE is deleted

- Retain local timezone for the INT96 timestamp values in the parquet files while the PARQUET_READER_INT96_AS_TIMESTAMP option is on

Minor changes according to the review

Fix for the test, which relies on a particular timezone

close #656

  1. … 6 more files in changeset.
Revert "DRILL-4373: Drill and Hive have incompatible timestamp representations in parquet - added sys/sess option "store.parquet.int96_as_timestamp"; - added int96 to timestamp converter for both readers; - added unit tests;"

This reverts commit 7e7214b40784668d1599f265067f789aedb6cf86.

    • -25
    • +10
    ./ConvertFromImpalaTimestamp.java
  1. … 14 more files in changeset.
DRILL-4373: Drill and Hive have incompatible timestamp representations in parquet - added sys/sess option "store.parquet.int96_as_timestamp"; - added int96 to timestamp converter for both readers; - added unit tests;

This closes #600
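
For orientation, here is a minimal sketch of an INT96-to-timestamp conversion. It assumes the layout commonly used by Hive and Impala: 8 bytes of nanoseconds within the day followed by a 4-byte Julian day number, both little-endian. The class Int96Timestamp and its methods are hypothetical and are not the code in ConvertFromImpalaTimestamp.java.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Int96Timestamp {
    // Julian day number of 1970-01-01, the Unix epoch.
    static final long JULIAN_DAY_OF_EPOCH = 2440588L;
    static final long MILLIS_PER_DAY = 86_400_000L;
    static final long NANOS_PER_MILLI = 1_000_000L;

    // Convert a 12-byte INT96 value to milliseconds since the Unix epoch.
    // No time zone adjustment is applied in this sketch.
    static long toEpochMillis(byte[] int96) {
        if (int96.length != 12) {
            throw new IllegalArgumentException("INT96 value must be 12 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong(); // first 8 bytes
        int julianDay = buf.getInt();    // last 4 bytes
        return (julianDay - JULIAN_DAY_OF_EPOCH) * MILLIS_PER_DAY
                + nanosOfDay / NANOS_PER_MILLI;
    }

    // Helper for building sample values under the same assumed layout.
    static byte[] encode(long nanosOfDay, int julianDay) {
        return ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN)
                .putLong(nanosOfDay).putInt(julianDay).array();
    }

    public static void main(String[] args) {
        // One millisecond past midnight on the day after the epoch.
        System.out.println(toEpochMillis(encode(1_000_000L, 2440589)));
    }
}
```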

    • -10
    • +25
    ./ConvertFromImpalaTimestamp.java
  1. … 15 more files in changeset.
DRILL-2908: Fix Parquet for var length vectors where encoding changes across pages. Add unit tests. Add option to make parquet page size and dictionary page size configurable at session level. This closes #162

  1. … 13 more files in changeset.
DRILL-3364: Prune scan range if the filter is on the leading field with byte comparable encoding

The change adds support to perform row-key range pruning when the row-key prefix is interpreted as UINT4_BE, TIMESTAMP_EPOCH_BE, TIME_EPOCH_BE, DATE_EPOCH_BE, or UINT8_BE encoded.

Testing done: added unit tests for the new feature and ran all existing unit tests to make sure there is no regression.
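
Why the big-endian encodings qualify as byte comparable can be shown with a short sketch: for unsigned values written big-endian, an unsigned byte-by-byte comparison of the encoded keys agrees with numeric order, which is what lets a filter on the prefix become a key range. The class ByteComparableKeys and its methods are hypothetical illustrations, not the UInt4BEConvertTo implementation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteComparableKeys {

    // Encode a 4-byte value with the most significant byte first.
    static byte[] encodeUInt4BE(int value) {
        return ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN).putInt(value).array();
    }

    // Little-endian encoding, shown only for contrast.
    static byte[] encodeUInt4LE(int value) {
        return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(value).array();
    }

    // Lexicographic comparison treating each byte as unsigned,
    // the way a sorted key-value store typically orders row keys.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        // Big-endian: byte order matches numeric order (1 < 256).
        System.out.println(compareBytes(encodeUInt4BE(1), encodeUInt4BE(256)) < 0);
        // Little-endian: byte order disagrees with numeric order.
        System.out.println(compareBytes(encodeUInt4LE(1), encodeUInt4LE(256)) < 0);
    }
}
```

Because the little-endian form sorts 1 after 256, a scan range derived from it would not match the numeric filter, so it cannot be pruned the same way.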

    • -0
    • +45
    ./TimeStampEpochBEConvertFrom.java
    • -0
    • +54
    ./TimeStampEpochBEConvertTo.java
    • -0
    • +47
    ./TimeStampEpochConvertFrom.java
    • -0
    • +55
    ./TimeStampEpochConvertTo.java
    • -0
    • +45
    ./UInt4BEConvertFrom.java
    • -0
    • +54
    ./UInt4BEConvertTo.java
    • -0
    • +46
    ./UInt4ConvertFrom.java
    • -0
    • +55
    ./UInt4ConvertTo.java
  1. … 5 more files in changeset.
DRILL-2976: Part 2 - disable extended json in the default convert_toJSON method, the functionality is still accessible in a new convert_toComplexJSON method.

Added unit tests for both versions of the function (in the case of the simple json version I added result verification to an old test)

Address Mehant's review comments.

Fix failing unit test to turn on now disabled option for using extended types in written json files.

  1. … 3 more files in changeset.
DRILL-1460: part 2 - make convert_fromJSON function consistent with the default reader behavior and do not read all numbers as double.

DRILL-2356: Fix round function for exact input types

    • -0
    • +193
    ./RoundFunctions.java
  1. … 2 more files in changeset.
DRILL-2908: Enable reading the Int 96 type from parquet files.

Column chunk metadata can be out of order relative to the column ordering in the schema, even though both are exposed as lists that look like they should correspond, so we have to build our own map between the column names and their indexes in the list.
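
The name-to-index map mentioned above can be sketched as follows. Plain strings stand in for the Parquet metadata objects, and the class ColumnIndexMap and method indexByName are hypothetical names, not part of Drill or the Parquet library.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ColumnIndexMap {

    // Map each column name to its position in the column chunk list,
    // so lookups no longer depend on the schema's ordering.
    static Map<String, Integer> indexByName(List<String> chunkColumnNames) {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < chunkColumnNames.size(); i++) {
            index.put(chunkColumnNames.get(i), i);
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> schemaOrder = Arrays.asList("a", "b", "c");
        List<String> chunkOrder = Arrays.asList("c", "a", "b"); // out of order
        Map<String, Integer> index = indexByName(chunkOrder);
        for (String column : schemaOrder) {
            System.out.println(column + " -> chunk " + index.get(column));
        }
    }
}
```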

Support for varbinary reading and int96 reading in the new reader.

Support the second version of the page header; the Java library will only dictionary-encode fixed-length byte arrays when the writer version is set to 2.0.

Looks to be working in the vectorized reader, need a test case.

Fixed complex reader, was using the wrong field to figure out the length to read.

Conflicts:

exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/NullableFixedByteAlignedReaders.java

exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ParquetFixedWidthDictionaryReaders.java

exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ParquetToDrillTypeConverter.java

exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet2/DrillParquetGroupConverter.java

UDF for reading impala timestamps from varbinary

Fix for reading fixed binary and int96 columns in the vectorized parquet reader.

Conflicts:

exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/NullableFixedByteAlignedReaders.java

Fix for a bug reading fixed binary and int 96 data out of parquet when the data is plain encoded.

    • -0
    • +50
    ./ConvertFromImpalaTimestamp.java
  1. … 9 more files in changeset.
DRILL-2695: Add support for large IN conditions through the use of the Values operator. Update JSON reader to support reading Extended JSON. Update JSON writer to support writing extended JSON data. Update JSON reader to automatically unwrap a file that includes a single top-level array (used by values). Update Options manager to use getOption(<Type>Validator) to directly retrieve typed value. Remove JSON rewinding. Add support for CONVERT_TO( [], 'SIMPLEJSON') to disable extended types as part of udf use.

  1. … 64 more files in changeset.
DRILL-1460: Implement "read_numbers_as_double" option for JSON reader

Conflicts:

contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoRecordReader.java

exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java

exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/conv/JsonConvertFrom.java

exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java

exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/JSONRecordReader.java

exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java

Conflicts:

exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java

exec/java-exec/src/test/java/org/apache/drill/exec/store/json/TestJsonRecordReader.java

  1. … 7 more files in changeset.
DRILL-2143: Part 1 - just remove RecordBatch from UDF setup method.

Disable date/time functions as they can no longer access incoming recordbatch to get query start time or timezone.

During rebase DateTypeFunctions.CurrentDate was updated to include a commented-out version of the fix from DRILL-2372; this will be fixed in the new patch for part 2 of 2143.

Remove RecordBatch from the setup of new UDFs.

  1. … 110 more files in changeset.
DRILL-1774: Update JSON Reader to do single pass reading and better use Jackson's interning. Also improve projection pushdown support.

  1. … 29 more files in changeset.
Remove extraneous System.out.print statements.

  1. … 6 more files in changeset.
DRILL-1541: Add big endian version of convert to/from for double and float

    • -0
    • +47
    ./DoubleBEConvertFrom.java
    • -0
    • +55
    ./DoubleBEConvertTo.java
    • -0
    • +47
    ./FloatBEConvertFrom.java
    • -0
    • +55
    ./FloatBEConvertTo.java
  1. … 1 more file in changeset.
DRILL-1333: Flatten operator for allowing more complex queries against repeated data.

  1. … 34 more files in changeset.
DRILL-1400: JsonConvertFrom should use the correct work buffer from JsonReader

  1. … 1 more file in changeset.
DRILL-634: Cleanup/organize Java imports and trailing whitespaces from Drill code

  1. … 755 more files in changeset.
DRILL-1309: Implement ProjectPastFilterPushdown and update DrillScanRel cost model so that star query is more expensive than exclusive column projection. Various fixes affecting record readers to handle `*` column as well as fixes to some test cases.

exclude parquet files from rat check

  1. … 32 more files in changeset.
DRILL-1313: All text mode for json reader

The current implementation handles nulls that appear while in text mode differently depending on whether they appear in lists or maps. This allows a null where a list or map is expected to act the same way it does without text mode enabled. For an expected map it just assumes that the field didn't exist, in which case the leaves below become null filled, and for a list it will default to showing an empty list.

If we are actually inside of a list, a null in JSON will be treated the same as the string "null", which improves over the previous behavior of just dropping the null value altogether, as we do not currently support null values within any of the repeated primitive vectors.
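
The two list-related rules above can be sketched in a few lines. This is an illustration under simplifying assumptions (a JSON list is modeled as a List of String, with Java null standing for a JSON null); the class AllTextModeNulls and method textModeList are hypothetical and not part of Drill's JsonReader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AllTextModeNulls {

    // Apply the list rules described above:
    //  - a null where a list is expected becomes an empty list;
    //  - a null element inside a list becomes the string "null".
    static List<String> textModeList(List<String> raw) {
        List<String> out = new ArrayList<>();
        if (raw == null) {
            return out; // null in place of a list: show an empty list
        }
        for (String element : raw) {
            out.add(element == null ? "null" : element);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(textModeList(Arrays.asList("1", null, "abc")));
        System.out.println(textModeList(null));
    }
}
```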

Patch has been rebased on top of merge branch.

  1. … 12 more files in changeset.
DRILL-1283: JSON project pushdown.

Allows for users to avoid reading columns of a JSON file, including those that include elements of JSON that drill does not currently support. This can be used to query a subset of an existing file while avoiding elements like schema changes in some columns or nulls in lists that are currently not compatible with Drill.

Patch was revised based on Hanifi's review comments, and then rebased off of the merge branch.

  1. … 22 more files in changeset.
DRILL-1138: Explicit casting to boolean fails

+ Renamed ConvertUtil to ByteBufUtil

  1. … 27 more files in changeset.
Switch to DrillBuf. Add @Inject DrillBuf. Move comparison functions to memory sensitive ones. Add scalar replacement functionality for value holders. Simplify date parsing function. Add local compiled code caching.

  1. … 199 more files in changeset.
DRILL-935: Run-time code generation support for function which decodes string/varbinary into complex JSON object.

    • -0
    • +101
    ./JsonConvertFrom.java
  1. … 23 more files in changeset.