Clone Tools
  • last updated 10 mins ago
Constraints
Constraints: committers
 
Constraints: files
Constraints: dates
DRILL-7437: Storage Plugin for Generic HTTP REST API

  1. … 35 more files in changeset.
DRILL-7530: Fix class names in loggers

1. Fix incorrect class names for loggers.

2. Minor code cleanup.

closes #1957

  1. … 55 more files in changeset.
DRILL-7441: Fix issues with fillEmpties, offset vectors

Fixes subtle issues with offset vectors and "fill empties"

logic.

Drill has an informal standard that if a batch has no rows, then

offset vectors within that batch should have zero size. Contrast

this with batches of size 1 that should have offset vectors of

size 2. Changed to enforce this rule throughout.

Nullable, repeated and variable-width vectors have "fill empties"

logic that is used in two places: when setting the value count and

when preparing to write a new value. The current logic is not

quite right for either case. Added tests and fixed the code to

properly handle each case.

Revised the batch validator to enforce the offset-vector length of 0 for

0-sized batches rule. The result was much simpler code.

Added tools to easily print a batch, restoring some code that

was recently lost when the RowSet classes were moved.

Code cleanup in all files touched.

Added logic to "dirty" allocated buffers when testing to ensure

logic is not sensitive to the "pristine" state of new buffers.

Added logic to the column writers to enforce the zero-size-batch rule

for offset vectors. Added unit tests for this case.

Fixed the column writers to set the "lastSet" mutator value for

nullable types since other code relies on this value.

Removed the "setCount" field in nullable vectors: turns out

it is not actually used.

closes #1896

  1. … 43 more files in changeset.
DRILL-7436: Fix record count, vector structure issues in several operators

Adds additional vector checks to the BatchValidator.

Enables checking for the following operators:

* FilterRecordBatch

* PartitionLimitRecordBatch

* UnnestRecordBatch

* HashAggBatch

* RemovingRecordBatch

Fixes vector count issues for each of these.

Fixes empty-batch (record count = 0) handling in several of the

above operators. Added a method to VectorContainer to correctly

create an empty batch. (An empty batch, counter-intuitively,

needs vectors allocated to hold the 0 value in the first

position of each offset vector.)

Disables verbose logging for MongoDB tests. Details are written to

the log rather than the console.

Disables two invalid Mongo tests. See DRILL-7428.

Adjusts the expression tree materializer to not add the LATE type

to Union vectors. (See DRILL-7435.)

Ensures that Union vectors contain valid vectors for each subtype.

The present fix is a work-around, see DRILL-7434 for a better

long-term fix.

Cleans up code formatting and other minor issues in each file touched

during the fixes in this PR.

  1. … 36 more files in changeset.
DRILL-7359: Add support for DICT type in RowSet Framework

closes #1870

  1. … 82 more files in changeset.
DRILL-7376: Drill ignores Hive schema for MaprDB tables when group scan has star column

  1. … 3 more files in changeset.
DRILL-7369: Schema for MaprDB tables is not used for the case when several fields are queried

closes #1852

DRILL-7362: COUNT(*) on JSON with outer list results in JsonParse error

closes #1849

  1. … 3 more files in changeset.
DRILL-7313: Use Hive schema for MaprDB native reader when field was empty

- Added all_text_mode option for hive maprDB Json

- Improved logic to convert Hive's schema into Drill's one

- Added unit tests for schema conversion

  1. … 27 more files in changeset.
DRILL-7011: Support schema in scan framework

* Adds schema support to the row set-based scan framework and to the "V3" text reader based on that framework.

* Adding the schema made clear that passing options as a long list of constructor arguments was not sustainable. Refactored code to use a builder pattern instead.

* Added support for default values in the "null column loader", which required adding a "setValue" method to the column accessors.

* Added unit tests for all new or changed functionality. See TestCsvWithSchema for the overall test of the entire integrated mechanism.

* Added tests for explicit projection with schema

* Better handling of date/time in column accessors

* Converted recent column metadata work from Java 8 date/time to Joda.

* Added more CSV-with-schema unit tests

* Removed the ID fields from "resolved columns", used "instanceof" instead.

* Added wildcard projection with an output schema. Handles both "lenient" and "strict" schemas.

* Tagged projection columns with their output schema, when available.

* Scan projection added modes for wildcard with an output schema. The reader projection added support for merging reader and output schemas.

* Includes refactoring of scan operator tests (the test file grew too large.)

* Renamed some classes to avoid confusing reader schemas with output schemas.

* Added unit tests for the new functionality.

* Added "lenient" wildcard with schema test for CSV

* Added more type conversions: string-to-bit, many-to-string

* Fixed bug in column writer for VarDecimal

* Added missing unit tests, and fixed bugs, in Bit column reader/writer

* Cleaned up a number of unneded "SuppressWarnings"

closes #1711

  1. … 223 more files in changeset.
DRILL-7060: Support JsonParser Feature 'ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER' (#1663)

  1. … 7 more files in changeset.
DRILL-6724: Dump operator context to logs when error occurs during query execution

closes #1455

  1. … 102 more files in changeset.
DRILL-6422: Replace guava imports with shaded ones

  1. … 980 more files in changeset.
DRILL-6386: Remove unused imports and star imports.

  1. … 228 more files in changeset.
DRILL-6242 Use java.time.Local{Date|Time|DateTime} for Drill Date, Time, Timestamp types. (#3)

close apache/drill#1247

* DRILL-6242 - Use java.time.Local{Date|Time|DateTime} classes to hold values from corresponding Drill date, time, and timestamp types.

Conflicts:

exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/ExtendedJsonOutput.java

Fix merge conflicts and check style.

  1. … 43 more files in changeset.
DRILL-6320: Fixed license headers.

closes #1207

  1. … 2055 more files in changeset.
DRILL-6118: Handle item star columns during project / filter push down and directory pruning

1. Added DrillFilterItemStarReWriterRule to re-write item star fields to regular field references.

2. Refactored DrillPushProjectIntoScanRule to handle item star fields, factored out helper classes and methods from PreUitl.class.

3. Fixed issue with dynamic star usage (after Calcite upgrade old usage of star was still present, replaced WILDCARD -> DYNAMIC_STAR for clarity).

4. Added unit tests to check project / filter push down and directory pruning with item star.

  1. … 26 more files in changeset.
DRILL-6049: Misc. hygiene and code cleanup changes

close apache/drill#1085

  1. … 123 more files in changeset.
DRILL-5919: Add non-numeric support for JSON processing

1. Added two session options store.json.reader.non_numeric_numbers and store.json.reader.non_numeric_numbers that allow to read/write NaN and Infinity as numbers. By default these options

are set to true.

2. Extended signature of convert_toJSON and convert_fromJSON functions by adding second optional parameter

that enables/disables read/write NaN and Infinity. By default it is set true.

3. Added unit tests with nan, infitity values for math and aggregate functions

4. Replaced JsonReader's constructors with builder.

This closes #1026

  1. … 16 more files in changeset.
DRILL-5864: Selecting a non-existing field from a MapR-DB JSON table fails with NPE.

    • -0
    • +94
    ./JsonReaderUtils.java
  1. … 4 more files in changeset.
DRILL-5355: Misc. code cleanup closes #784

  1. … 23 more files in changeset.
DRILL-3562: Query fails when using flatten on JSON data where some documents have an empty array

closes #713

  1. … 2 more files in changeset.
DRILL-4653: Malformed JSON should not stop the entire query from progressing

This closes #518

  1. … 8 more files in changeset.
DRILL-4479: For empty fields under all_text_mode enabled (a) use varchar for the default columns and (b) ensure we create fields corresponding to all columns.

close apache/drill#420

  1. … 3 more files in changeset.
DRILL-4184: Support variable length decimal fields in parquet

  1. … 68 more files in changeset.
DRILL-4382: Remove dependency on drill-logical from vector package

  1. … 80 more files in changeset.
DRILL-2288: Fix ScanBatch violation of IterOutcome protocol and downstream chain of bugs.

Increments:

2288: Pt. 1 Core: Added unit test. [Drill2288GetColumnsMetadataWhenNoRowsTest, empty.json]

2288: Pt. 1 Core: Changed HBase test table #1's # of regions from 1 to 2. [HBaseTestsSuite]

Also added TODO(DRILL-3954) comment about # of regions.

2288: Pt. 2 Core: Documented IterOutcome much more clearly. [RecordBatch]

Also edited some related Javadoc.

2288: Pt. 2 Hyg.: Edited doc., added @Override, etc. [AbstractRecordBatch, RecordBatch]

Purged unused SetupOutcome.

Added @Override.

Edited comments.

Fix some comments to doc. comments.

2288: Pt. 3 Core&Hyg.: Added validation of IterOutcome sequence. [IteratorValidatorBatchIterator]

Also:

Renamed internal members for clarity.

Added comments.

2288: Pt. 4 Core: Fixed a NONE -> OK_NEW_SCHEMA in ScanBatch.next(). [ScanBatch]

(With nearby comments.)

2288: Pt. 4 Hyg.: Edited comments, reordered, whitespace. [ScanBatch]

Reordered

Added comments.

Aligned.

2288: Pt. 4 Core+: Fixed UnionAllRecordBatch to receive IterOutcome sequence right. (3659) [UnionAllRecordBatch]

2288: Pt. 5 Core: Fixed ScanBatch.Mutator.isNewSchema() to stop spurious "new schema" reports (fix short-circuit OR, to call resetting method right). [ScanBatch]

2288: Pt. 5 Hyg.: Renamed, edited comments, reordered. [ScanBatch, SchemaChangeCallBack, AbstractSingleRecordBatch]

Renamed getSchemaChange -> getSchemaChangedAndReset.

Renamed schemaChange -> schemaChanged.

Added doc. comments.

Aligned.

2288: Pt. 6 Core: Avoided dummy Null.IntVec. column in JsonReader when not needed (MapWriter.isEmptyMap()). [JsonReader, 3 vector files]

2288: Pt. 6 Hyg.: Edited comments, message. Fixed message formatting. [RecordReader, JSONFormatPlugin, JSONRecordReader, AbstractMapVector, JsonReader]

Fixed message formatting.

Edited comments.

Edited message.

Fixed spurious line break.

2288: Pt. 7 Core: Added column families in HBaseRecordReader* to avoid dummy Null.IntVec. clash. [HBaseRecordReader]

2288: Pt. 8 Core.1: Cleared recordCount in OrderedPartitionRecordBatch.innerNext(). [OrderedPartitionRecordBatch]

2288: Pt. 8 Core.2: Cleared recordCount in ProjectRecordBatch.innerNext. [ProjectRecordBatch]

2288: Pt. 8 Core.3: Cleared recordCount in TopNBatch.innerNext. [TopNBatch]

2288: Pt. 9 Core: Had UnorderedReceiverBatch reset RecordBatchLoader's record count. [UnorderedReceiverBatch, RecordBatchLoader]

2288: Pt. 9 Hyg.: Added comments. [RecordBatchLoader]

2288: Pt. 10 Core: Worked around mismatched map child vectors in MapVector.getObject(). [MapVector]

2288: Pt. 11 Core: Added OK_NEW_SCHEMA schema comparison for HashAgg. [HashAggTemplate]

2288: Pt. 12 Core: Fixed memory leak in BaseTestQuery's printing.

Fixed bad skipping of RecordBatchLoader.clear(...) and

QueryDataBatch.load(...) for zero-row batches in printResult(...).

Also, dropped suppression of call to

VectorUtil.showVectorAccessibleContent(...) (so zero-row batches are

as visible as others).

2288: Pt. 13 Core: Fixed test that used unhandled periods in column alias identifiers.

2288: Misc.: Added # of rows to showVectorAccessibleContent's output. [VectorUtil]

2288: Misc.: Added simple/partial toString() [VectorContainer, AbstractRecordReader, JSONRecordReader, BaseValueVector, FieldSelection, AbstractBaseWriter]

2288: Misc. Hyg.: Added doc. comments to VectorContainer. [VectorContainer]

2288: Misc. Hyg.: Edited comment. [DrillStringUtils]

2288: Misc. Hyg.: Clarified message for unhandled identifier containing period.

2288: Pt. 3 Core&Hyg. Upd.: Added schema comparison result to logging. [IteratorValidatorBatchIterator]

2288: Pt. 7 Core Upd.: Handled HBase columns too re NullableIntVectors. [HBaseRecordReader, TestTableGenerator, TestHBaseFilterPushDown]

Created map-child vectors for requested columns.

Added unit test method testDummyColumnsAreAvoided, adding new row to test table,

updated some row counts.

2288: Pt. 7 Hyg. Upd.: Edited comment. [HBaseRecordReader]

2288: Pt. 11 Core Upd.: REVERTED all of bad OK_NEW_SCHEMA schema comparison for HashAgg. [HashAggTemplate]

This reverts commit 0939660f4620c03da97f4e1bf25a27514e6d0b81.

2288: Pt. 6 Core Upd.: Added isEmptyMap override in new (just-rebased-in) PromotableWriter. [PromotableWriter]

Adjusted definition and default implementation of isEmptyMap (to handle MongoDB

storage plugin's use of JsonReader).

2288: Pt. 6 Hyg. Upd.: Purged old atLeastOneWrite flag. [JsonReader]

2288: Pt. 14: Disabled newly dying test testNestedFlatten().

  1. … 38 more files in changeset.
DRILL-3229: Miscellaneous Union-type fixes

closes #207

closes #180

  1. … 38 more files in changeset.
DRILL-3229: Implement Union type vector

  1. … 53 more files in changeset.
DRILL-3773: Fix Mongo FieldSelection

Mongo plugin was previously rewriting a complex (multi-level) column reference as a simple selection of the top level field.

This changeset does not change this behavior in terms of the filter sent to mongo, but it add the original selected column to the list that will be read in by the JSON reader once that data is returned from mongo.

What this means is that we will be requesting more data from mongo that necessary (as we were previously), but this will be leveraging the existing functionality in the JSON reader to grab only the sub-selection actually requested in the query. This allows for difficult schema changes to be avoided by projecting only columns without schema changes.

This also fixes and adds unit tests for FieldSelection that cause wrong results when selecting a nested column and its parent.

  1. … 7 more files in changeset.