DRILL-7518: Support INT_64 for nullable INT64 in Parquet

closes #1952

DRILL-5983: Add missing nullable Parquet readers for INT and UINT logical types

closes #1866

    • -6
    • +57
    ./NullableFixedByteAlignedReaders.java
    • -35
    • +37
    ./ParquetFixedWidthDictionaryReaders.java
DRILL-4517: Support reading empty Parquet files

1. Modified the flat and complex Parquet readers to output only the schema when the requested number of records to read is 0. In this case the readers are not initialized, which improves performance.

2. Allowed reading the requested number of rows instead of all rows in the row group (DRILL-6528).

3. Fixed determination of the number of nulls in the row group (fixed the IsPredicate#isAllNulls method).

4. Allowed reading empty Parquet files by adding an empty / fake row group.

5. General refactoring and unit tests.

6. Parquet tests categorization.

closes #1839
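
The first two items of the DRILL-4517 message (schema-only output when zero records are requested, and reading only the requested number of rows) can be illustrated with a minimal sketch. Everything in it is hypothetical: SchemaOnlyScanSketch, Batch, scan and readerInitCount are illustrative names, not Drill classes, and the counter merely stands in for the cost of initializing column readers.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names come from the Drill code base.
class SchemaOnlyScanSketch {

    /** Result of one scan call: the column names plus the number of rows produced. */
    static final class Batch {
        final List<String> schema;
        final int rowCount;

        Batch(List<String> schema, int rowCount) {
            this.schema = schema;
            this.rowCount = rowCount;
        }
    }

    /** Counts how often the simulated (expensive) reader setup runs. */
    static int readerInitCount = 0;

    static Batch scan(List<String> schema, int rowsInRowGroup, int requestedRows) {
        if (requestedRows == 0) {
            // Schema-only path: return the schema without initializing any reader.
            return new Batch(schema, 0);
        }
        readerInitCount++; // stands in for initializing the column readers
        // Read only what was requested, and never more than the row group holds.
        return new Batch(schema, Math.min(requestedRows, rowsInRowGroup));
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("id", "name");
        Batch schemaOnly = scan(schema, 100, 0);
        Batch limited = scan(schema, 100, 10);
        System.out.println(schemaOnly.schema + " rows=" + schemaOnly.rowCount);
        System.out.println("rows=" + limited.rowCount + " inits=" + readerInitCount);
    }
}
```

The point of the early return is that a query needing only the schema never pays the reader setup cost.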

  1. … 43 more files in changeset.
DRILL-7062: Initial implementation of run-time rowgroup pruning

closes #1738

    • -1
    • +2
    ./batchsizing/RecordBatchSizerManager.java
  1. … 22 more files in changeset.
Fixed IllegalStateException while reading Parquet data

DRILL-7155: Create a standard logging message for batch sizes generated by individual operators. This is needed for QA verification of the Batch Size feature DRILL-6238.

closes #1716

    • -2
    • +1
    ./batchsizing/RecordBatchSizerManager.java
  1. … 10 more files in changeset.
DRILL-7011: Support schema in scan framework

* Adds schema support to the row set-based scan framework and to the "V3" text reader based on that framework.

* Adding the schema made clear that passing options as a long list of constructor arguments was not sustainable. Refactored code to use a builder pattern instead.

* Added support for default values in the "null column loader", which required adding a "setValue" method to the column accessors.

* Added unit tests for all new or changed functionality. See TestCsvWithSchema for the overall test of the entire integrated mechanism.

* Added tests for explicit projection with schema

* Better handling of date/time in column accessors

* Converted recent column metadata work from Java 8 date/time to Joda.

* Added more CSV-with-schema unit tests

* Removed the ID fields from "resolved columns", used "instanceof" instead.

* Added wildcard projection with an output schema. Handles both "lenient" and "strict" schemas.

* Tagged projection columns with their output schema, when available.

* Scan projection added modes for wildcard with an output schema. The reader projection added support for merging reader and output schemas.

* Includes refactoring of scan operator tests (the test file grew too large).

* Renamed some classes to avoid confusing reader schemas with output schemas.

* Added unit tests for the new functionality.

* Added "lenient" wildcard with schema test for CSV

* Added more type conversions: string-to-bit, many-to-string

* Fixed bug in column writer for VarDecimal

* Added missing unit tests, and fixed bugs, in Bit column reader/writer

* Cleaned up a number of unneeded "SuppressWarnings"

closes #1711
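
The DRILL-7011 message notes that a long list of constructor arguments was replaced with a builder pattern. A minimal sketch of that general idea follows. ScanOptions and its three fields are hypothetical names invented for illustration and do not correspond to the actual classes or options in the scan framework.

```java
// Hypothetical sketch of a builder replacing a long constructor argument list.
// ScanOptions and its fields are illustrative only, not Drill's API.
final class ScanOptions {
    final boolean strictSchema;
    final String nullDefault;
    final int batchSizeLimit;

    private ScanOptions(Builder b) {
        this.strictSchema = b.strictSchema;
        this.nullDefault = b.nullDefault;
        this.batchSizeLimit = b.batchSizeLimit;
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        // Defaults live in one place, so callers set only what they need.
        private boolean strictSchema = false;
        private String nullDefault = null;
        private int batchSizeLimit = 4096;

        Builder strictSchema(boolean value) {
            this.strictSchema = value;
            return this;
        }

        Builder nullDefault(String value) {
            this.nullDefault = value;
            return this;
        }

        Builder batchSizeLimit(int value) {
            this.batchSizeLimit = value;
            return this;
        }

        ScanOptions build() {
            // Validate once, at build time, instead of in every constructor overload.
            if (batchSizeLimit <= 0) {
                throw new IllegalArgumentException("batchSizeLimit must be positive");
            }
            return new ScanOptions(this);
        }
    }

    public static void main(String[] args) {
        ScanOptions options = ScanOptions.builder()
                .strictSchema(true)
                .nullDefault("n/a")
                .build();
        System.out.println(options.strictSchema + " " + options.nullDefault
                + " " + options.batchSizeLimit);
    }
}
```

Adding a new option then means adding one field and one setter, rather than changing every call site of a constructor.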

  1. … 223 more files in changeset.
DRILL-7100: Fixed IllegalArgumentException when reading Parquet data

    • -2
    • +2
    ./batchsizing/BatchOverflowOptimizer.java
    • -23
    • +23
    ./batchsizing/BatchSizingMemoryUtil.java
    • -23
    • +26
    ./batchsizing/RecordBatchSizerManager.java
  1. … 2 more files in changeset.
DRILL-5603: Replace String file paths to Hadoop Path

- replaced all String path representation with org.apache.hadoop.fs.Path

- added PathSerDe.Se JSON serializer

- refactoring of DFSPartitionLocation code by leveraging existing listPartitionValues() functionality

closes #1657

  1. … 83 more files in changeset.
DRILL-7018: Fixed Parquet buffer overflow when reading timestamp column

close apache/drill#1630

    • -2
    • +3
    ./NullableFixedByteAlignedReaders.java
DRILL-6874: Close input stream after AsyncPageReaderTask is completed

close apache/drill#1565

DRILL-6410: Fixed memory leak in flat Parquet reader

  1. … 3 more files in changeset.
DRILL-6724: Dump operator context to logs when error occurs during query execution

closes #1455

  1. … 102 more files in changeset.
DRILL-6422: Replace guava imports with shaded ones

    • -2
    • +2
    ./NullableFixedByteAlignedReaders.java
    • -2
    • +2
    ./ParquetFixedWidthDictionaryReaders.java
  1. … 970 more files in changeset.
DRILL-6709: Extended the batch stats utility to other operators

closes #1444

  1. … 15 more files in changeset.
DRILL-6685: Fixed exception when reading Parquet data

    • -0
    • +18
    ./VarLenAbstractPageEntryReader.java
  1. … 2 more files in changeset.
DRILL-6670: Align Parquet TIMESTAMP_MICROS logical type handling with earlier versions + minor fixes

closes #1428

  1. … 14 more files in changeset.
DRILL-6656: Disallow extra semicolons and multiple statements on the same line.

closes #1415

  1. … 144 more files in changeset.
DRILL-5495: convert_from function on top of int96 data results in ArrayIndexOutOfBoundsException

    • -4
    • +3
    ./NullableFixedByteAlignedReaders.java
DRILL-5797: Use Parquet new reader on all non-complex columns queries

  1. … 5 more files in changeset.
DRILL-6579: Added sanity checks to the Parquet reader to avoid infinite loops

closes #1361

DRILL-6570: Fixed IndexOutofBoundException in Parquet Reader

DRILL-6560: Enhanced the batch statistics logging enablement

closes #1355

    • -6
    • +11
    ./batchsizing/RecordBatchOverflow.java
    • -17
    • +28
    ./batchsizing/RecordBatchSizerManager.java
  1. … 5 more files in changeset.
DRILL-6147: Adding Columnar Parquet Batch Sizing functionality

closes #1330

    • -32
    • +32
    ./NullableFixedByteAlignedReaders.java
    • -22
    • +22
    ./ParquetFixedWidthDictionaryReaders.java
  1. … 29 more files in changeset.
DRILL-6421: Refactor DecimalUtility and CoreDecimalUtility classes

closes #1267

  1. … 13 more files in changeset.
DRILL-6353: Upgrade Parquet MR dependencies

closes #1259

  1. … 17 more files in changeset.
DRILL-6386: Remove unused imports and star imports.

  1. … 231 more files in changeset.
DRILL-5846: Improve parquet performance for Flat Data Types

closes #1060

    • -0
    • +265
    ./VarLenAbstractEntryReader.java
    • -0
    • +163
    ./VarLenBulkPageReader.java
    • -0
    • +172
    ./VarLenColumnBulkEntry.java
    • -0
    • +570
    ./VarLenColumnBulkInput.java
    • -0
    • +104
    ./VarLenEntryDictionaryReader.java
    • -0
    • +134
    ./VarLenEntryReader.java
    • -0
    • +75
    ./VarLenFixedEntryReader.java
    • -0
    • +132
    ./VarLenNullableDictionaryReader.java
    • -0
    • +166
    ./VarLenNullableEntryReader.java
    • -0
    • +91
    ./VarLenNullableFixedEntryReader.java
  1. … 12 more files in changeset.
DRILL-6320: Fixed license headers.

closes #1207

    • -1
    • +1
    ./NullableFixedByteAlignedReaders.java
    • -2
    • +2
    ./ParquetFixedWidthDictionaryReaders.java
  1. … 2052 more files in changeset.
DRILL-6094: Decimal data type enhancements

Add ExprVisitors for VARDECIMAL

Modify writers/readers to support VARDECIMAL

- Added usage of VarDecimal for parquet, hive, maprdb, jdbc;

- Added options to store decimals as int32 and int64 or fixed_len_byte_array or binary;

Add UDFs for VARDECIMAL data type

- modify type inference rules

- remove UDFs for obsolete DECIMAL types

Enable DECIMAL data type by default

Add unit tests for DECIMAL data type

Fix mapping for NLJ when literal with non-primitive type is used in join conditions

Refresh protobuf C++ source files

Changes in C++ files

Add support for decimal logical type in Avro.

Add support for date, time and timestamp logical types.

Update Avro version to 1.8.2.
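
The DRILL-6094 message mentions options to store decimals as int32, int64, fixed_len_byte_array or binary. As a rough illustration of why precision drives that choice, the sketch below picks the smallest physical representation that can hold a given decimal precision, using the limits from the Parquet format (up to 9 digits fit in INT32, up to 18 in INT64). DecimalStorageSketch, choose and minBytes are hypothetical names, and this is not how Drill's options are actually wired.

```java
import java.math.BigInteger;

// Hypothetical sketch; names are illustrative and not taken from Drill.
class DecimalStorageSketch {

    enum PhysicalType { INT32, INT64, FIXED_LEN_BYTE_ARRAY }

    /** Smallest physical type able to hold an unscaled value with the given precision. */
    static PhysicalType choose(int precision) {
        if (precision < 1) {
            throw new IllegalArgumentException("precision must be at least 1");
        }
        if (precision <= 9) {
            return PhysicalType.INT32;   // 10^9 - 1 fits in a signed 32-bit integer
        }
        if (precision <= 18) {
            return PhysicalType.INT64;   // 10^18 - 1 fits in a signed 64-bit integer
        }
        return PhysicalType.FIXED_LEN_BYTE_ARRAY;
    }

    /** Minimum number of two's-complement bytes for any value of the given precision. */
    static int minBytes(int precision) {
        BigInteger maxUnscaled = BigInteger.TEN.pow(precision).subtract(BigInteger.ONE);
        // bitLength() excludes the sign bit, so add one before rounding up to whole bytes.
        return (maxUnscaled.bitLength() + 1 + 7) / 8;
    }

    public static void main(String[] args) {
        for (int precision : new int[] {5, 9, 10, 18, 19, 38}) {
            System.out.println(precision + " -> " + choose(precision)
                    + ", " + minBytes(precision) + " bytes");
        }
    }
}
```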

    • -72
    • +58
    ./NullableFixedByteAlignedReaders.java
    • -41
    • +56
    ./ParquetFixedWidthDictionaryReaders.java
  1. … 196 more files in changeset.