DRILL-7011: Support schema in scan framework

* Adds schema support to the row set-based scan framework and to the "V3" text reader based on that framework.

* Adding the schema made clear that passing options as a long list of constructor arguments was not sustainable. Refactored code to use a builder pattern instead.

* Added support for default values in the "null column loader", which required adding a "setValue" method to the column accessors.

* Added unit tests for all new or changed functionality. See TestCsvWithSchema for the overall test of the entire integrated mechanism.

* Added tests for explicit projection with schema

* Better handling of date/time in column accessors

* Converted recent column metadata work from Java 8 date/time to Joda.

* Added more CSV-with-schema unit tests

* Removed the ID fields from "resolved columns"; used "instanceof" checks instead.

* Added wildcard projection with an output schema. Handles both "lenient" and "strict" schemas.

* Tagged projection columns with their output schema, when available.

* Scan projection added modes for wildcard with an output schema. The reader projection added support for merging reader and output schemas.

* Includes refactoring of scan operator tests (the test file had grown too large).

* Renamed some classes to avoid confusing reader schemas with output schemas.

* Added unit tests for the new functionality.

* Added "lenient" wildcard with schema test for CSV

* Added more type conversions: string-to-bit, many-to-string

* Fixed bug in column writer for VarDecimal

* Added missing unit tests for, and fixed bugs in, the Bit column reader/writer

* Cleaned up a number of unneeded "SuppressWarnings" annotations

closes #1711

  1. … 224 more files in changeset.
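The builder-pattern refactoring described in DRILL-7011 (replacing a long list of constructor arguments) can be illustrated with a minimal sketch. The class and method names below (ScanOptions, project, outputSchema, batchRowLimit) and the default row limit are hypothetical and do not reflect the actual Drill scan framework API; the output schema is modeled as a plain string for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical options object for a scan; not Drill's actual API.
final class ScanOptions {
  private final List<String> projection;
  private final String outputSchema;  // null when no output schema is provided
  private final int batchRowLimit;

  private ScanOptions(Builder builder) {
    this.projection = Collections.unmodifiableList(new ArrayList<>(builder.projection));
    this.outputSchema = builder.outputSchema;
    this.batchRowLimit = builder.batchRowLimit;
  }

  List<String> projection() { return projection; }
  String outputSchema() { return outputSchema; }
  int batchRowLimit() { return batchRowLimit; }

  static Builder builder() { return new Builder(); }

  // Each setter returns the builder, so callers name only the options they care about
  // instead of passing a long, order-sensitive list of constructor arguments.
  static final class Builder {
    private final List<String> projection = new ArrayList<>();
    private String outputSchema;
    private int batchRowLimit = 4096;  // hypothetical default

    Builder project(String column) {
      projection.add(column);
      return this;
    }

    Builder outputSchema(String schema) {
      this.outputSchema = schema;
      return this;
    }

    Builder batchRowLimit(int limit) {
      if (limit <= 0) {
        throw new IllegalArgumentException("batchRowLimit must be positive: " + limit);
      }
      this.batchRowLimit = limit;
      return this;
    }

    ScanOptions build() {
      return new ScanOptions(this);
    }
  }
}

class ScanOptionsDemo {
  public static void main(String[] args) {
    ScanOptions options = ScanOptions.builder()
        .project("a")
        .project("b")
        .outputSchema("(a INT, b VARCHAR)")
        .build();
    System.out.println(options.projection());     // [a, b]
    System.out.println(options.batchRowLimit());  // 4096
  }
}
```

Adding a new option in this style means adding one builder method with a default, rather than changing every constructor call site.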
DRILL-6320: Fixed license headers.

closes #1207

  1. … 2066 more files in changeset.
DRILL-5355: Misc. code cleanup closes #784

  1. … 23 more files in changeset.
DRILL-3987: (MOVE) Extract key vector, field reader, complex/field writer classes.

    • -89
    • +0
    ./SingleLikeRepeatedMapReaderImpl.java
  1. … 176 more files in changeset.
DRILL-2288: Fix ScanBatch violation of IterOutcome protocol and downstream chain of bugs.

Increments:

2288: Pt. 1 Core: Added unit test. [Drill2288GetColumnsMetadataWhenNoRowsTest, empty.json]

2288: Pt. 1 Core: Changed HBase test table #1's # of regions from 1 to 2. [HBaseTestsSuite]

Also added TODO(DRILL-3954) comment about # of regions.

2288: Pt. 2 Core: Documented IterOutcome much more clearly. [RecordBatch]

Also edited some related Javadoc.

2288: Pt. 2 Hyg.: Edited doc., added @Override, etc. [AbstractRecordBatch, RecordBatch]

Purged unused SetupOutcome.

Added @Override.

Edited comments.

Converted some comments to doc. comments.

2288: Pt. 3 Core&Hyg.: Added validation of IterOutcome sequence. [IteratorValidatorBatchIterator]

Also:

Renamed internal members for clarity.

Added comments.
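The kind of sequence check added to IteratorValidatorBatchIterator in Pt. 3 can be sketched as a small state machine. This is a simplified illustration, not Drill's validator: the enum below is a hypothetical subset of IterOutcome, and the sketch enforces only two assumed rules, that OK may not appear before the first OK_NEW_SCHEMA and that no outcome may follow NONE or STOP.

```java
import java.util.Arrays;
import java.util.List;

class IterOutcomeSequenceCheck {

  // Hypothetical subset of outcomes; Drill's real IterOutcome enum differs.
  enum Outcome { NONE, OK, OK_NEW_SCHEMA, STOP, NOT_YET }

  private boolean seenSchema = false;  // has any OK_NEW_SCHEMA been returned yet?
  private boolean terminated = false;  // has NONE or STOP been returned?

  /** Checks one outcome against the assumed rules; throws IllegalStateException on a violation. */
  void validate(Outcome outcome) {
    if (terminated) {
      throw new IllegalStateException("outcome " + outcome + " returned after NONE/STOP");
    }
    switch (outcome) {
      case OK_NEW_SCHEMA:
        seenSchema = true;
        break;
      case OK:
        if (!seenSchema) {
          throw new IllegalStateException("OK returned before any OK_NEW_SCHEMA");
        }
        break;
      case NONE:
      case STOP:
        terminated = true;
        break;
      case NOT_YET:
        break;
    }
  }

  /** Returns true if the whole sequence satisfies the simplified rules above. */
  static boolean isValid(List<Outcome> outcomes) {
    IterOutcomeSequenceCheck check = new IterOutcomeSequenceCheck();
    try {
      for (Outcome o : outcomes) {
        check.validate(o);
      }
      return true;
    } catch (IllegalStateException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(isValid(Arrays.asList(Outcome.OK_NEW_SCHEMA, Outcome.OK, Outcome.NONE))); // true
    System.out.println(isValid(Arrays.asList(Outcome.OK, Outcome.NONE)));                       // false
    System.out.println(isValid(Arrays.asList(Outcome.NONE, Outcome.OK)));                       // false
  }
}
```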

2288: Pt. 4 Core: Fixed a NONE -> OK_NEW_SCHEMA in ScanBatch.next(). [ScanBatch]

(With nearby comments.)

2288: Pt. 4 Hyg.: Edited comments, reordered, whitespace. [ScanBatch]

Reordered.

Added comments.

Aligned.

2288: Pt. 4 Core+: Fixed UnionAllRecordBatch to handle the IterOutcome sequence correctly (3659). [UnionAllRecordBatch]

2288: Pt. 5 Core: Fixed ScanBatch.Mutator.isNewSchema() to stop spurious "new schema" reports (fixed a short-circuiting OR so that the resetting method is always called). [ScanBatch]
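One plausible reading of the Pt. 5 fix is sketched below: when a flag check and a "get and reset" call are combined with a short-circuiting ||, the reset is skipped whenever the left operand is already true, so a stale flag produces a spurious "new schema" report on the next batch. The classes and fields here are hypothetical stand-ins, not the actual ScanBatch.Mutator or SchemaChangeCallBack code.

```java
// Hypothetical flag holder with a "get and reset" method.
final class SchemaChangeFlag {
  private boolean changed;

  void markChanged() { changed = true; }

  /** Returns whether a change was recorded, then clears the flag. */
  boolean getChangedAndReset() {
    boolean result = changed;
    changed = false;
    return result;
  }
}

// Hypothetical mutator showing the buggy and fixed forms side by side.
final class MutatorSketch {
  boolean schemaChanged;  // set when the mutator itself adds a field
  final SchemaChangeFlag callBack = new SchemaChangeFlag();

  // Buggy: if schemaChanged is true, || short-circuits and the callback flag
  // is never reset, so it leaks into the next call.
  boolean isNewSchemaBuggy() {
    boolean result = schemaChanged || callBack.getChangedAndReset();
    schemaChanged = false;
    return result;
  }

  // Fixed: always call the resetting method, then combine the results.
  boolean isNewSchemaFixed() {
    boolean callBackChanged = callBack.getChangedAndReset();
    boolean result = schemaChanged || callBackChanged;
    schemaChanged = false;
    return result;
  }
}

class MutatorSketchDemo {
  public static void main(String[] args) {
    MutatorSketch buggy = new MutatorSketch();
    buggy.schemaChanged = true;
    buggy.callBack.markChanged();
    System.out.println(buggy.isNewSchemaBuggy());  // true
    System.out.println(buggy.isNewSchemaBuggy());  // true (spurious: stale callback flag)

    MutatorSketch fixed = new MutatorSketch();
    fixed.schemaChanged = true;
    fixed.callBack.markChanged();
    System.out.println(fixed.isNewSchemaFixed());  // true
    System.out.println(fixed.isNewSchemaFixed());  // false
  }
}
```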

2288: Pt. 5 Hyg.: Renamed, edited comments, reordered. [ScanBatch, SchemaChangeCallBack, AbstractSingleRecordBatch]

Renamed getSchemaChange -> getSchemaChangedAndReset.

Renamed schemaChange -> schemaChanged.

Added doc. comments.

Aligned.

2288: Pt. 6 Core: Avoided dummy Null.IntVec. column in JsonReader when not needed (MapWriter.isEmptyMap()). [JsonReader, 3 vector files]

2288: Pt. 6 Hyg.: Edited comments, message. Fixed message formatting. [RecordReader, JSONFormatPlugin, JSONRecordReader, AbstractMapVector, JsonReader]

Fixed message formatting.

Edited comments.

Edited message.

Fixed spurious line break.

2288: Pt. 7 Core: Added column families in HBaseRecordReader* to avoid dummy Null.IntVec. clash. [HBaseRecordReader]

2288: Pt. 8 Core.1: Cleared recordCount in OrderedPartitionRecordBatch.innerNext(). [OrderedPartitionRecordBatch]

2288: Pt. 8 Core.2: Cleared recordCount in ProjectRecordBatch.innerNext. [ProjectRecordBatch]

2288: Pt. 8 Core.3: Cleared recordCount in TopNBatch.innerNext. [TopNBatch]

2288: Pt. 9 Core: Had UnorderedReceiverBatch reset RecordBatchLoader's record count. [UnorderedReceiverBatch, RecordBatchLoader]

2288: Pt. 9 Hyg.: Added comments. [RecordBatchLoader]

2288: Pt. 10 Core: Worked around mismatched map child vectors in MapVector.getObject(). [MapVector]

2288: Pt. 11 Core: Added OK_NEW_SCHEMA schema comparison for HashAgg. [HashAggTemplate]

2288: Pt. 12 Core: Fixed memory leak in BaseTestQuery's printing.

Fixed bad skipping of RecordBatchLoader.clear(...) and QueryDataBatch.load(...) for zero-row batches in printResult(...).

Also, dropped suppression of call to VectorUtil.showVectorAccessibleContent(...) (so zero-row batches are as visible as others).

2288: Pt. 13 Core: Fixed test that used unhandled periods in column alias identifiers.

2288: Misc.: Added # of rows to showVectorAccessibleContent's output. [VectorUtil]

2288: Misc.: Added simple/partial toString() [VectorContainer, AbstractRecordReader, JSONRecordReader, BaseValueVector, FieldSelection, AbstractBaseWriter]

2288: Misc. Hyg.: Added doc. comments to VectorContainer. [VectorContainer]

2288: Misc. Hyg.: Edited comment. [DrillStringUtils]

2288: Misc. Hyg.: Clarified message for unhandled identifier containing period.

2288: Pt. 3 Core&Hyg. Upd.: Added schema comparison result to logging. [IteratorValidatorBatchIterator]

2288: Pt. 7 Core Upd.: Handled HBase columns too re NullableIntVectors. [HBaseRecordReader, TestTableGenerator, TestHBaseFilterPushDown]

Created map-child vectors for requested columns.

Added unit test method testDummyColumnsAreAvoided, adding a new row to the test table and updating some row counts.

2288: Pt. 7 Hyg. Upd.: Edited comment. [HBaseRecordReader]

2288: Pt. 11 Core Upd.: REVERTED all of bad OK_NEW_SCHEMA schema comparison for HashAgg. [HashAggTemplate]

This reverts commit 0939660f4620c03da97f4e1bf25a27514e6d0b81.

2288: Pt. 6 Core Upd.: Added isEmptyMap override in new (just-rebased-in) PromotableWriter. [PromotableWriter]

Adjusted definition and default implementation of isEmptyMap (to handle MongoDB storage plugin's use of JsonReader).

2288: Pt. 6 Hyg. Upd.: Purged old atLeastOneWrite flag. [JsonReader]

2288: Pt. 14: Disabled newly failing test testNestedFlatten().

  1. … 38 more files in changeset.
DRILL-3229: Miscellaneous Union-type fixes

closes #207

closes #180

  1. … 34 more files in changeset.
DRILL-3232: Promotable writer

    • -0
    • +188
    ./PromotableWriter.java
  1. … 28 more files in changeset.
DRILL-3229: Implement Union type vector

    • -0
    • +90
    ./UnionListReader.java
  1. … 51 more files in changeset.
DRILL-1942-hygiene

- Formatting

- @Overrides

- finals

- some AutoCloseable additions

- new isCancelled() abstract method on FragmentManager, implemented on subclasses

Added missing new abstract method isCancelled()

Close apache/drill#120

  1. … 24 more files in changeset.
DRILL-1942-templates: template changes with a few related dependencies. This closes #108

  1. … 13 more files in changeset.
DRILL-3353: Fix dropping nested fields

Use the SchemaChangeCallBack in more places to track schema changes

Reset the ephemeral transfer pair when making a new transfer pair for Map or RepeatedMap

  1. … 17 more files in changeset.
DRILL-2292: CTAS broken when we have repeated maps

  1. … 6 more files in changeset.
DRILL-2375: implement reader reset mechanism and reset reader before accessing it during projection

  1. … 3 more files in changeset.
DRILL-2695: Add support for large IN conditions through the use of the Values operator.

- Update JSON reader to support reading Extended JSON.

- Update JSON writer to support writing Extended JSON data.

- Update JSON reader to automatically unwrap a file that includes a single top-level array (used by Values).

- Update Options manager to use getOption(<Type>Validator) to directly retrieve typed value.

- Remove JSON rewinding.

- Add support for CONVERT_TO([], 'SIMPLEJSON') to disable extended types as part of UDF use.

  1. … 60 more files in changeset.
DRILL-2118: Inform the user with a user-friendly error message if kvgen fails due to heterogeneous types

  1. … 3 more files in changeset.
DRILL-2280: Refactor ValueVector interface & add an abstract ValueVector implementation

  1. … 21 more files in changeset.
DRILL-1960: Automatic reallocation

  1. … 62 more files in changeset.
DRILL-1885: Fix a problem in ordinal-to-vector mapping that reports incorrect results or fails a query; refactor code to eliminate redundancy; support case-insensitive vector lookup with case-sensitive result reporting; rely on fields in the order that they show up in the schema while copying vectors

  1. … 12 more files in changeset.
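The DRILL-1885 combination of case-insensitive vector lookup with case-sensitive result reporting, in schema order, can be sketched with a case-insensitive TreeMap that maps names to ordinals, alongside lists that keep the original spelling. The class below is hypothetical and only illustrates the idea; it is not Drill's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical field container, not Drill's implementation.
final class CaseInsensitiveFields<V> {
  // Lookup key ignores case; the value is the field's ordinal (its position in schema order).
  private final Map<String, Integer> ordinalByName = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
  private final List<String> names = new ArrayList<>();  // original spelling, schema order
  private final List<V> values = new ArrayList<>();

  /** Adds a new field, or replaces the value of an existing field whose name matches ignoring case. */
  void put(String name, V value) {
    Integer ordinal = ordinalByName.get(name);
    if (ordinal == null) {
      ordinalByName.put(name, names.size());
      names.add(name);
      values.add(value);
    } else {
      values.set(ordinal, value);  // keeps the first-seen spelling and position
    }
  }

  /** Case-insensitive lookup; returns null if no field matches. */
  V get(String name) {
    Integer ordinal = ordinalByName.get(name);
    return ordinal == null ? null : values.get(ordinal);
  }

  V getByOrdinal(int ordinal) {
    return values.get(ordinal);
  }

  /** Field names exactly as first added, in schema order. */
  List<String> fieldNames() {
    return Collections.unmodifiableList(new ArrayList<>(names));
  }
}

class CaseInsensitiveFieldsDemo {
  public static void main(String[] args) {
    CaseInsensitiveFields<String> fields = new CaseInsensitiveFields<>();
    fields.put("EmpName", "vector-1");
    fields.put("salary", "vector-2");
    System.out.println(fields.get("empname"));  // vector-1
    System.out.println(fields.fieldNames());    // [EmpName, salary]
  }
}
```

Keeping ordinals separate from the lookup map means that iterating by ordinal follows schema order, independent of how the case-insensitive map sorts its keys.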
DRILL-1814: Check if writer state is OK before performing copy in copyAsValue() and copyAsField()

  1. … 1 more file in changeset.
DRILL-1764: return max value capacity if vector has no children

  1. … 6 more files in changeset.
DRILL-1547: Require writers to explicitly check buffer bounds to avoid IndexOutOfBounds errors; make the writer hierarchy stop immediately in case of a write error

  1. … 14 more files in changeset.
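The DRILL-1547 approach, checking buffer bounds explicitly and stopping the writer hierarchy at the first failed write, can be sketched as follows. Both classes are hypothetical and use a plain int array in place of a Drill value vector; the sketch does not attempt to handle the partially written row that remains after a failure.

```java
// Hypothetical fixed-capacity writer: it checks bounds itself instead of
// relying on an IndexOutOfBoundsException from the underlying buffer.
final class BoundedIntWriter {
  private final int[] buffer;
  private int position;
  private boolean ok = true;  // becomes false after the first failed write

  BoundedIntWriter(int capacity) {
    buffer = new int[capacity];
  }

  boolean ok() { return ok; }

  int count() { return position; }

  /** Writes the value if there is room; otherwise records the failure and returns false. */
  boolean write(int value) {
    if (!ok || position >= buffer.length) {
      ok = false;
      return false;
    }
    buffer[position++] = value;
    return true;
  }
}

// Hypothetical parent writer that writes one value per child and stops at the first failure.
final class RowWriter {
  private final BoundedIntWriter[] children;

  RowWriter(BoundedIntWriter... children) {
    this.children = children;
  }

  /** Returns false as soon as one child write fails; later children are not touched. */
  boolean writeRow(int... values) {
    if (values.length != children.length) {
      throw new IllegalArgumentException("expected " + children.length + " values");
    }
    for (int i = 0; i < children.length; i++) {
      if (!children[i].write(values[i])) {
        return false;
      }
    }
    return true;
  }
}

class RowWriterDemo {
  public static void main(String[] args) {
    BoundedIntWriter a = new BoundedIntWriter(1);
    BoundedIntWriter b = new BoundedIntWriter(2);
    RowWriter row = new RowWriter(a, b);
    System.out.println(row.writeRow(1, 2));  // true
    System.out.println(row.writeRow(3, 4));  // false: a is full, so b is never written
    System.out.println(b.count());           // 1
  }
}
```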
DRILL-1333: Flatten operator for allowing more complex queries against repeated data.

  1. … 34 more files in changeset.
DRILL-1402: Add check-style rules for trailing space, TABs and blocks without braces

  1. … 439 more files in changeset.
DRILL-634: Clean up/organize Java imports and remove trailing whitespace from Drill code

  1. … 763 more files in changeset.
DRILL-1324: Add mechanism to detect schema changes when adding a new primitive vector in a Map, RepeatedMap, RepeatedList vector

  1. … 12 more files in changeset.
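The DRILL-1324 mechanism, reporting a schema change when a container such as a map adds a new child vector, can be sketched with a callback that fires only when a child is created rather than reused. The names below are hypothetical; getSchemaChangedAndReset mirrors the method name from the DRILL-2288 rename listed earlier in this log, but these classes are not Drill's MapVector or SchemaChangeCallBack.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical callback that remembers whether any schema change was reported.
final class ChangeCallBack {
  private boolean changed;

  void doWork() { changed = true; }

  /** Returns whether a change was reported since the last call, then clears the flag. */
  boolean getSchemaChangedAndReset() {
    boolean result = changed;
    changed = false;
    return result;
  }
}

// Hypothetical map-like container of named child "vectors".
final class MapContainer {
  private final Map<String, Object> children = new LinkedHashMap<>();
  private final ChangeCallBack callBack;

  MapContainer(ChangeCallBack callBack) {
    this.callBack = callBack;
  }

  /** Returns the existing child, or creates one and reports a schema change. */
  Object addOrGet(String name, Supplier<Object> factory) {
    Object child = children.get(name);
    if (child == null) {
      child = factory.get();
      children.put(name, child);
      callBack.doWork();  // a new child means the container's schema has changed
    }
    return child;
  }
}

class MapContainerDemo {
  public static void main(String[] args) {
    ChangeCallBack callBack = new ChangeCallBack();
    MapContainer map = new MapContainer(callBack);
    map.addOrGet("a", StringBuilder::new);
    System.out.println(callBack.getSchemaChangedAndReset());  // true
    map.addOrGet("a", StringBuilder::new);
    System.out.println(callBack.getSchemaChangedAndReset());  // false
  }
}
```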
DRILL-1283: JSON project pushdown.

Allows users to avoid reading columns of a JSON file, including those that contain JSON elements that Drill does not currently support. This can be used to query a subset of an existing file while avoiding elements like schema changes in some columns or nulls in lists that are currently not compatible with Drill.

Patch was revised based on Hanifi's review comments, and then rebased off of the merge branch.

  1. … 21 more files in changeset.
DRILL-1252: Implement Complex parquet and json writers

  1. … 17 more files in changeset.
DRILL-968: Use checkstyle plugin to prevent inadvertent use of shaded Guava classes

+ Disallow non-static '*' imports in handwritten code.

+ Updated the current code to be in compliance.

+ Run 'rat' plugin in 'validate' phase.

  1. … 102 more files in changeset.
DRILL-935: Run-time code generation support for a function that decodes string/varbinary into a complex JSON object.

  1. … 23 more files in changeset.
DRILL-927: Run-time code generation support for reading Complex Type.

Fix in RepeatedMapVector.

    • -0
    • +89
    ./SingleLikeRepeatedMapReaderImpl.java
  1. … 21 more files in changeset.