DRILL-7254: Read Hive union w/o nulls

    • -0
    • +12
    ./templates/AbstractFieldWriter.java
    • -0
    • +212
    ./templates/UnionVectorListWriter.java
    • -0
    • +268
    ./templates/UnionVectorWriter.java
  1. … 13 more files in changeset.
DRILL-7373: Fix problems involving reading from DICT type

- Fixed FieldIdUtil to resolve reading from DICT for some complex cases;

- Optimized reading from DICT by key: when the schema is known (e.g. when reading from Hive tables), an Object of the appropriate type is passed to the DictReader#find(...) and DictReader#read(...) methods instead of being generated on the fly from the int or String path and the key type;

- Fixed an error when accessing a value by a nonexistent key in an Avro table.

  1. … 8 more files in changeset.
DRILL-7376: Drill ignores Hive schema for MaprDB tables when group scan has star column

  1. … 3 more files in changeset.
DRILL-7252: Read Hive map using Dict<K,V> vector

  1. … 16 more files in changeset.
DRILL-7341: Vector reAlloc may fail after exchange

closes #1838

    • -1
    • +8
    ./templates/VariableLengthVectors.java
  1. … 2 more files in changeset.
DRILL-7337: Add vararg UDFs support

  1. … 37 more files in changeset.
DRILL-7315: Revise precision and scale order in the method arguments

    • -4
    • +4
    ./templates/AbstractPromotableFieldWriter.java
  1. … 16 more files in changeset.
DRILL-7313: Use Hive schema for MaprDB native reader when field was empty

- Added all_text_mode option for Hive MaprDB JSON

- Improved logic to convert Hive's schema into Drill's

- Added unit tests for schema conversion

  1. … 27 more files in changeset.
DRILL-7253: Read Hive struct w/o nulls

  1. … 17 more files in changeset.
DRILL-7258: Remove field width limit for text reader

The V2 text reader enforced a limit of 64K characters when using column headers, but not when using the columns[] array. The V3 reader enforced the 64K limit in both cases.

This patch removes the limit in both cases. The limit now is the 16MB vector size limit. With headers, no one column can exceed 16MB. With the columns[] array, no one row can exceed 16MB. (The 16MB limit is set by the Netty memory allocator.)

Added an "appendBytes()" method to the scalar column writer which adds additional bytes to those already written for a specific column or array element value. The method is implemented for VarChar, Var16Char and VarBinary vectors. It throws an exception for all other types.

When used with a type conversion shim, the appendBytes() method throws an exception. This should be OK because the previous setBytes() call should already have failed: a huge value is not acceptable for numeric or date type conversions.

Added unit tests of the append feature, and for the append feature in the batch overflow case (when appending bytes causes the vector or batch to overflow). Also added tests to verify the lack of column width limit with the text reader, both with and without headers.
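
The set-then-append behavior described in this message can be illustrated with a minimal sketch. It is not Drill's column writer: the classes `VarWidthWriter` and `IntWriter`, and the `save()` and `getString()` helpers, are hypothetical stand-ins that only mirror the described contract (append adds to bytes already written for a value; other types reject the call).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for a scalar column writer. Not Drill code. */
public class AppendBytesSketch {

  /** Var-width writer: collects one byte[] value per row. */
  static class VarWidthWriter {
    private final List<byte[]> rows = new ArrayList<>();
    private ByteArrayOutputStream current; // value being built for the current row

    /** Starts the value for the current row, replacing anything written so far. */
    void setBytes(byte[] value, int len) {
      current = new ByteArrayOutputStream();
      current.write(value, 0, len);
    }

    /** Adds more bytes to the value already started with setBytes(). */
    void appendBytes(byte[] value, int len) {
      if (current == null) {
        throw new IllegalStateException("appendBytes() called before setBytes()");
      }
      current.write(value, 0, len);
    }

    /** Finishes the current row (hypothetical helper for this sketch). */
    void save() {
      rows.add(current == null ? new byte[0] : current.toByteArray());
      current = null;
    }

    String getString(int row) {
      return new String(rows.get(row), StandardCharsets.UTF_8);
    }
  }

  /** Fixed-width writer: appending makes no sense, so the call is rejected. */
  static class IntWriter {
    void appendBytes(byte[] value, int len) {
      throw new UnsupportedOperationException("appendBytes() is only for var-width types");
    }
  }

  public static void main(String[] args) {
    VarWidthWriter writer = new VarWidthWriter();
    byte[] first = "very long ".getBytes(StandardCharsets.UTF_8);
    byte[] second = "field value".getBytes(StandardCharsets.UTF_8);
    writer.setBytes(first, first.length);      // first chunk of the field
    writer.appendBytes(second, second.length); // remainder of the same field
    writer.save();
    System.out.println(writer.getString(0));   // prints "very long field value"
  }
}
```

A reader that hands over a large field in chunks can call the set method once and then append for each later chunk, so no per-field size cap is needed in the reader itself.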

closes #1802

  1. … 24 more files in changeset.
DRILL-7257: Set nullable var-width vector lastSet value

This turns out to be due to a subtle issue with variable-width nullable vectors. Such vectors have a lastSet attribute in the Mutator class. When using "transfer pairs" to copy values, the code decides to zero-fill from the lastSet value to the record count. The row set framework did not set this value, meaning that the RemovingRecordBatch zero-filled the dir0 column when it chose to use transfer pairs rather than copying values. The use of transfer pairs occurs when all rows in a batch pass the filter prior to the removing record batch.

Modified the nullable vector writer to properly set the lastSet value at the end of each batch. Added a unit test to verify the value is set correctly.

Includes a bit of code clean-up.
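
The failure mode described above can be reproduced in miniature. The sketch below is an assumption-laden model, not Drill's vector code: `VarWidthVectorSketch`, `writeDirect()` and `fillFromLastSet()` are hypothetical names, the null-bits vector is omitted, and the fill step stands in for whatever the transfer path does. It only shows why a stale lastSet wipes values that were written directly.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical, heavily simplified model of a var-width vector. Not Drill code. */
public class LastSetSketch {

  static class VarWidthVectorSketch {
    final int[] offsets = new int[9];   // value i occupies data[offsets[i] .. offsets[i + 1])
    final byte[] data = new byte[64];
    int lastSet = -1;                   // last slot the vector believes was written

    /** Writes a value the way an external writer might: lastSet is left untouched. */
    void writeDirect(int index, String value) {
      byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
      System.arraycopy(bytes, 0, data, offsets[index], bytes.length);
      offsets[index + 1] = offsets[index] + bytes.length;
    }

    /** Empty-fills every slot after lastSet up to the record count. */
    void fillFromLastSet(int recordCount) {
      for (int i = lastSet + 1; i < recordCount; i++) {
        offsets[i + 1] = offsets[i];    // slot i becomes a zero-length value
      }
      lastSet = recordCount - 1;
    }

    String get(int index) {
      int start = offsets[index];
      return new String(data, start, offsets[index + 1] - start, StandardCharsets.UTF_8);
    }
  }

  /** Writes two values, optionally records lastSet, then runs the fill step. */
  static String[] writeBatch(boolean setLastSet) {
    VarWidthVectorSketch vector = new VarWidthVectorSketch();
    vector.writeDirect(0, "2019");
    vector.writeDirect(1, "2020");
    if (setLastSet) {
      vector.lastSet = 1;               // the fix: record the last written slot at end of batch
    }
    vector.fillFromLastSet(2);
    return new String[] { vector.get(0), vector.get(1) };
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(writeBatch(false))); // both values come back empty
    System.out.println(Arrays.toString(writeBatch(true)));  // prints [2019, 2020]
  }
}
```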

    • -11
    • +14
    ./templates/NullableValueVectors.java
  1. … 8 more files in changeset.
DRILL-7251: Read Hive array w/o nulls

1. HiveFieldConverter replaced by Hive writers for primitives

2. Created HiveValueWriterFactory and HiveListWriter to implement arrays support

4. Readers generation replaced by HiveDefaultRecordReader and HiveTextRecordReader

5. Several reader initializers replaced by one

6. Added method to repeated vardecimal writer

7. Minor fix for array column in View

  1. … 53 more files in changeset.
DRILL-7143: Support default value for empty columns

Modifies the prior work to add default values for columns. The prior work added defaults when the entire column is missing from a reader (the old Nullable Int column). The Row Set mechanism now will also "fill empty" slots with the default value.

Added default support for the column writers. The writers automatically obtain the default value from the column schema. The default can also be set explicitly on the column writer.

Updated the null column mechanism to use this feature rather than the ad-hoc implementation in the prior commit.

Semantics changed a bit. Only Required columns take a default. The default value is ignored for nullable columns since nullable columns already have a default: NULL.
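
As a rough illustration of the "fill empty" semantics stated above, the following sketch backfills skipped rows with a default for a required column and with NULL for a nullable one. The class `IntColumnWriterSketch` and its methods are hypothetical and far simpler than the real column writers; in particular, it assumes rows are written in increasing order.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of default-value back-filling. Not Drill code. */
public class FillEmptySketch {

  static class IntColumnWriterSketch {
    private final Integer fillValue;    // value used for slots the reader skipped
    private final List<Integer> values = new ArrayList<>();

    IntColumnWriterSketch(boolean nullable, int defaultValue) {
      // Nullable columns ignore the default: their fill value is always NULL.
      this.fillValue = nullable ? null : Integer.valueOf(defaultValue);
    }

    /** Writes a value at the given row, back-filling any rows skipped before it. */
    void setInt(int row, int value) {
      if (row < values.size()) {
        throw new IllegalArgumentException("rows must be written in increasing order");
      }
      fillEmpties(row);
      values.add(value);
    }

    /** Ends the batch, back-filling trailing rows that were never written. */
    void endBatch(int rowCount) {
      fillEmpties(rowCount);
    }

    private void fillEmpties(int upTo) {
      while (values.size() < upTo) {
        values.add(fillValue);
      }
    }

    List<Integer> values() {
      return values;
    }
  }

  /** Writes rows 0 and 3 of a five-row batch and returns the resulting column. */
  static List<Integer> demo(boolean nullable) {
    IntColumnWriterSketch writer = new IntColumnWriterSketch(nullable, -1);
    writer.setInt(0, 10);
    writer.setInt(3, 40);
    writer.endBatch(5);
    return writer.values();
  }

  public static void main(String[] args) {
    System.out.println(demo(false)); // prints [10, -1, -1, 40, -1]
    System.out.println(demo(true));  // prints [10, null, null, 40, null]
  }
}
```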

Other changes:

* Updated the CSV-with-schema tests to illustrate the new behavior.

* Made multiple fixes for Boolean and Decimal columns and added unit tests.

* Upgraded FreeMarker to version 2.3.28 to allow use of the continue statement.

* Reimplemented the Bit column reader and writer to use the BitVector directly since this vector is rather special.

* Added get/set Boolean methods for column accessors

* Moved the BooleanType class to the common package

* Added more CSV unit tests to explore decimal types, booleans, and defaults

* Added special handling for blank fields in from-string conversions

* Added options to the conversion factory to specify blank-handling behavior. CSV uses a mapping of blanks to null (nullable) or default value (non-nullable).

closes #1726

    • -111
    • +145
    ./templates/ColumnAccessors.java
  1. … 72 more files in changeset.
DRILL-7096: Develop vector for canonical Map<K,V>

- Added new type DICT;

- Created value vectors for the type for single and repeated modes;

- Implemented corresponding FieldReaders and FieldWriters;

- Made changes in EvaluationVisitor to be able to read values from the map by key;

- Made changes to DrillParquetGroupConverter to be able to read Parquet's MAP type;

- Added an option `store.parquet.reader.enable_map_support` to disable reading MAP type as DICT from Parquet files;

- Updated AvroRecordReader to use new DICT type for Avro's MAP;

- Added support of the new type to ParquetRecordWriter.

    • -2
    • +29
    ./templates/AbstractFieldReader.java
    • -0
    • +34
    ./templates/AbstractFieldWriter.java
    • -0
    • +5
    ./templates/AbstractPromotableFieldWriter.java
    • -0
    • +151
    ./templates/RepeatedDictWriter.java
  1. … 94 more files in changeset.
DRILL-7011: Support schema in scan framework

* Adds schema support to the row set-based scan framework and to the "V3" text reader based on that framework.

* Adding the schema made clear that passing options as a long list of constructor arguments was not sustainable. Refactored code to use a builder pattern instead.

* Added support for default values in the "null column loader", which required adding a "setValue" method to the column accessors.

* Added unit tests for all new or changed functionality. See TestCsvWithSchema for the overall test of the entire integrated mechanism.

* Added tests for explicit projection with schema

* Better handling of date/time in column accessors

* Converted recent column metadata work from Java 8 date/time to Joda.

* Added more CSV-with-schema unit tests

* Removed the ID fields from "resolved columns", used "instanceof" instead.

* Added wildcard projection with an output schema. Handles both "lenient" and "strict" schemas.

* Tagged projection columns with their output schema, when available.

* Scan projection added modes for wildcard with an output schema. The reader projection added support for merging reader and output schemas.

* Includes refactoring of scan operator tests (the test file grew too large).

* Renamed some classes to avoid confusing reader schemas with output schemas.

* Added unit tests for the new functionality.

* Added "lenient" wildcard with schema test for CSV

* Added more type conversions: string-to-bit, many-to-string

* Fixed bug in column writer for VarDecimal

* Added missing unit tests, and fixed bugs, in Bit column reader/writer

* Cleaned up a number of unneeded "SuppressWarnings"
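
The move from long constructor argument lists to a builder, mentioned in the list above, follows a familiar shape. The sketch below is generic and does not reproduce the scan framework's classes: `ScanConfigSketch` and every option name in it (`providedSchema`, `strictSchema`, `hasHeaders`, `maxBatchRows`) are hypothetical, chosen only to show how optional settings and cross-option validation fit the pattern.

```java
/** Hypothetical options object built with a builder. Not Drill code. */
public class ScanConfigSketch {
  final String providedSchema;  // null means no schema was supplied
  final boolean strictSchema;
  final boolean hasHeaders;
  final int maxBatchRows;

  private ScanConfigSketch(Builder builder) {
    this.providedSchema = builder.providedSchema;
    this.strictSchema = builder.strictSchema;
    this.hasHeaders = builder.hasHeaders;
    this.maxBatchRows = builder.maxBatchRows;
  }

  static Builder builder() {
    return new Builder();
  }

  static class Builder {
    private String providedSchema;
    private boolean strictSchema;
    private boolean hasHeaders = true;  // defaults live in one place
    private int maxBatchRows = 4096;

    Builder providedSchema(String schema) {
      this.providedSchema = schema;
      return this;
    }

    Builder strictSchema(boolean strict) {
      this.strictSchema = strict;
      return this;
    }

    Builder hasHeaders(boolean headers) {
      this.hasHeaders = headers;
      return this;
    }

    Builder maxBatchRows(int rows) {
      this.maxBatchRows = rows;
      return this;
    }

    /** Validates option combinations once, then creates the immutable config. */
    ScanConfigSketch build() {
      if (strictSchema && providedSchema == null) {
        throw new IllegalStateException("a strict schema requires a provided schema");
      }
      return new ScanConfigSketch(this);
    }
  }

  public static void main(String[] args) {
    ScanConfigSketch config = ScanConfigSketch.builder()
        .providedSchema("id INT, name VARCHAR")
        .strictSchema(true)
        .build();
    System.out.println(config.providedSchema + " strict=" + config.strictSchema
        + " headers=" + config.hasHeaders + " maxRows=" + config.maxBatchRows);
  }
}
```

Adding a new option then means one new builder method with a default, rather than another positional constructor parameter at every call site.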

closes #1711

    • -36
    • +147
    ./templates/ColumnAccessors.java
  1. … 224 more files in changeset.
DRILL-7086: Output schema for row set mechanism

Enhances the row set mechanism to take an "output schema" that describes the vectors to create. The "input schema" describes the type that the reader would like to write. A conversion mechanism inserts a conversion shim to convert from the input to output type.

Provides a set of implicit type conversions, including string-to-date/time conversions which use the new format property stored in column metadata. Includes unit tests for the new functionality.
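
The idea of a conversion shim sitting between the reader's input type and the vector's output type can be sketched as a small decorator. Everything named here is hypothetical (`ScalarWriterSketch`, `DateColumnWriter`, `StringToDateShim`, `writerFor`), and the sketch uses `java.time` for brevity, which is an assumption and not a statement about which date library the real code uses.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a string-to-date conversion shim. Not Drill code. */
public class ConversionShimSketch {

  /** Minimal writer interface: what a reader is allowed to call. */
  interface ScalarWriterSketch {
    void setString(String value);
    void setDate(LocalDate value);
  }

  /** Writer for the output type (DATE): accepts dates only. */
  static class DateColumnWriter implements ScalarWriterSketch {
    final List<LocalDate> values = new ArrayList<>();

    @Override
    public void setDate(LocalDate value) {
      values.add(value);
    }

    @Override
    public void setString(String value) {
      throw new UnsupportedOperationException("DATE column needs a conversion shim for strings");
    }
  }

  /** Shim: accepts the input type (string) and forwards the converted value. */
  static class StringToDateShim implements ScalarWriterSketch {
    private final ScalarWriterSketch target;
    private final DateTimeFormatter formatter;

    StringToDateShim(ScalarWriterSketch target, String format) {
      this.target = target;
      this.formatter = DateTimeFormatter.ofPattern(format); // format comes from column metadata
    }

    @Override
    public void setString(String value) {
      target.setDate(LocalDate.parse(value, formatter));
    }

    @Override
    public void setDate(LocalDate value) {
      target.setDate(value);
    }
  }

  /** Inserts a shim only when the input type differs from the output type. */
  static ScalarWriterSketch writerFor(String inputType, DateColumnWriter column, String format) {
    return "VARCHAR".equals(inputType) ? new StringToDateShim(column, format) : column;
  }

  public static void main(String[] args) {
    DateColumnWriter column = new DateColumnWriter();
    ScalarWriterSketch writer = writerFor("VARCHAR", column, "MM/dd/yyyy");
    writer.setString("03/15/2019");     // the reader only knows strings
    System.out.println(column.values);  // prints [2019-03-15]
  }
}
```

The reader keeps writing the type it naturally produces, and the choice of shim is made once per column when the writers are created.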

closes #1690

    • -22
    • +25
    ./templates/BasicTypeHelper.java
  1. … 64 more files in changeset.
DRILL-6533: Allow using literal values in functions which expect FieldReader instead of ValueHolder

closes #1617

  1. … 2 more files in changeset.
DRILL-6962: Function coalesce returns an Error when none of the columns in coalesce exist in a parquet file

- Updated UntypedNullVector to hold value count when vector is allocated and transferred to another one;

- Updated RecordBatchLoader and DrillCursor to handle the case when only UntypedNull values are present in RecordBatch (special case when data buffer is null but actual values are present);

- Added functions to cast UntypedNull value to other types for use in UDFs;

- Moved UntypedReader, UntypedHolderReaderImpl and UntypedReaderImpl from org.apache.drill.exec.vector.complex.impl to org.apache.drill.exec.vector package.

closes #1614

  1. … 15 more files in changeset.
DRILL-6797: Fix UntypedNull handling for complex types

  1. … 12 more files in changeset.
DRILL-6783: CAST string literal as INTERVAL MONTH/YEAR works inconsistently when selecting from a table with multiple rows

close apache/drill#1496

  1. … 1 more file in changeset.
DRILL-6422: Replace guava imports with shaded ones

  1. … 981 more files in changeset.
DRILL-6676: Add Union, List and Repeated List types to Result Set Loader

Adds required functionality to the list and repeated list vectors.

Row set accessor changes

Adds a "variant" type to model both unions and (non-repeated) lists (which can act as a repeated union, among other things).

Adds union, list and repeated list support to the result set loader and associated classes.

Copied much of the general documentation from my private Wiki into Markdown files.

closes #1429

  1. … 67 more files in changeset.
DRILL-6596: Fix fillEmpties and set methods for Nullable variable length vectors to not use emptyByteArray

closes #1377

    • -16
    • +6
    ./templates/NullableValueVectors.java
DRILL-6578: Handle query cancellation in Parquet reader

closes #1360

    • -1
    • +3
    ./templates/VariableLengthVectors.java
  1. … 1 more file in changeset.
DRILL-6530: JVM crash with a query involving multiple json files with one file having a schema change of one column from string to list

This closes #1343

DRILL-6147: Adding Columnar Parquet Batch Sizing functionality

closes #1330

    • -1
    • +11
    ./templates/VariableLengthVectors.java
  1. … 42 more files in changeset.
DRILL-6461: Added basic data correctness tests for hash agg, and improved operator unit testing framework.

git closes #1344

  1. … 35 more files in changeset.
DRILL-6421: Refactor DecimalUtility and CoreDecimalUtility classes

closes #1267

  1. … 12 more files in changeset.
DRILL-6402: Repeated Value Vectors copyFrom methods are not updating the value count and writer index correctly for values vector

DRILL-6242 Use java.time.Local{Date|Time|DateTime} for Drill Date, Time, Timestamp types. (#3)

close apache/drill#1247

* DRILL-6242 - Use java.time.Local{Date|Time|DateTime} classes to hold values from corresponding Drill date, time, and timestamp types.

Conflicts:

exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/ExtendedJsonOutput.java

Fix merge conflicts and check style.

  1. … 40 more files in changeset.