Clone Tools
  • last updated a few minutes ago
Constraints
Constraints: committers
 
Constraints: files
Constraints: dates
DRILL-6418: Handle Schema change in Unnest And Lateral for unnest field / non-unnest field

Note: Changed Lateral to handle non-empty right batch with OK_NEW_SCHEMA

closes #1271

  1. … 10 more files in changeset.
DRILL-6320: Fixed license headers.

closes #1207

  1. … 2064 more files in changeset.
DRILL-6327: Update unary operators to handle IterOutcome.EMIT Note: Handles for Non-Blocking Unary operators (like Filter/Project/etc) with EMIT Iter.Outcome

closes #1240

  1. … 16 more files in changeset.
DRILL-6340 Output Batch Control in Project using the RecordBatchSizer

Changes required to implement Output Batch Sizing in Project using the RecordBatchSizer.

closes #1302

    • -0
    • +46
    ./OutputSizeEstimateConstants.java
    • -0
    • +147
    ./OutputWidthExpression.java
    • -0
    • +278
    ./OutputWidthVisitor.java
    • -0
    • +37
    ./OutputWidthVisitorState.java
    • -0
    • +310
    ./ProjectMemoryManager.java
  1. … 49 more files in changeset.
DRILL-5730: Mock testing improvements and interface improvements

closes #1045

  1. … 221 more files in changeset.
DRILL-6118: Handle item star columns during project / filter push down and directory pruning

1. Added DrillFilterItemStarReWriterRule to re-write item star fields to regular field references.

2. Refactored DrillPushProjectIntoScanRule to handle item star fields, factored out helper classes and methods from PreUitl.class.

3. Fixed issue with dynamic star usage (after Calcite upgrade old usage of star was still present, replaced WILDCARD -> DYNAMIC_STAR for clarity).

4. Added unit tests to check project / filter push down and directory pruning with item star.

  1. … 26 more files in changeset.
DRILL-6049: Misc. hygiene and code cleanup changes

close apache/drill#1085

  1. … 123 more files in changeset.
DRILL-5783, DRILL-5841, DRILL-5894: Rationalize test temp directories

This change includes:

DRILL-5783:

- A unit test is created for the priority queue in the TopN operator.

- The code generation classes passed around a completely unused function registry reference in some places so it is removed.

- The priority queue had unused parameters for some of its methods so it is removed.

DRILL-5841:

- Created standardized temp directory classes DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher. And updated all unit tests to use them.

DRILL-5894:

- Removed the dfs_test storage plugin for tests and replaced it with the already existing dfs storage plugin.

Misc:

- General code cleanup.

- Removed unnecessary use of String.format in the tests.

This closes #984

  1. … 365 more files in changeset.
DRILL-4735: ConvertCountToDirectScan rule enhancements

1. ConvertCountToDirectScan rule will be applicable for 2 or more COUNT aggregates.

To achieve this DynamicPojoRecordReader was added which accepts any number of columns,

on the contrary with PojoRecordReader which depends on class fields.

AbstractPojoRecordReader class was added to factor out common logic for these two readers.

2. ConvertCountToDirectScan will distinguish between missing, directory and implicit columns.

For missing columns count will be set 0, for implicit to the total records count

since implicit columns are based on files and there is no data without a file.

If directory column will be encountered, rule won't be applied.

CountsCollector class was introduced to encapsulate counts collection logic.

3. MetadataDirectGroupScan class was introduced to indicate to the user when metadata was used

during calculation and for which files it was applied.

DRILL-4735: Changes after code review.

close #900

  1. … 22 more files in changeset.
DRILL-4264: Allow field names to include dots

  1. … 98 more files in changeset.
DRILL-5546: Handle schema change exception failure caused by empty input or empty batch.

1. Modify ScanBatch's logic when it iterates list of RecordReader.

1) Skip RecordReader if it returns 0 row && present same schema. A new schema (by calling Mutator.isNewSchema() ) means either a new top level field is added, or a field in a nested field is added, or an existing field type is changed.

2) Implicit columns are presumed to have constant schema, and are added to outgoing container before any regular column is added in.

3) ScanBatch will return NONE directly (called as "fast NONE"), if all its RecordReaders haver empty input and thus are skipped, in stead of returing OK_NEW_SCHEMA first.

2. Modify IteratorValidatorBatchIterator to allow

1) fast NONE ( before seeing a OK_NEW_SCHEMA)

2) batch with empty list of columns.

2. Modify JsonRecordReader when it get 0 row. Do not insert a nullable-int column for 0 row input. Together with ScanBatch, Drill will skip empty json files.

3. Modify binary operators such as join, union to handle fast none for either one side or both sides. Abstract the logic in AbstractBinaryRecordBatch, except for MergeJoin as its implementation is quite different from others.

4. Fix and refactor union all operator.

1) Correct union operator hanndling 0 input rows. Previously, it will ignore inputs with 0 row and put nullable-int into output schema, which causes various of schema change issue in down-stream operator. The new behavior is to take schema with 0 into account

in determining the output schema, in the same way with > 0 input rows. By doing that, we ensure Union operator will not behave like a schema-lossy operator.

2) Add a UnionInputIterator to simplify the logic to iterate the left/right inputs, removing significant chunk of duplicate codes in previous implementation.

The new union all operator reduces the code size into half, comparing the old one.

5. Introduce UntypedNullVector to handle convertFromJson() function, when the input batch contains 0 row.

Problem: The function convertFromJSon() is different from other regular functions in that it only knows the output schema after evaluation is performed. When input has 0 row, Drill essentially does not have

a way to know the output type, and previously will assume Map type. That works under the assumption other operators like Union would ignore batch with 0 row, which is no longer

the case in the current implementation.

Solution: Use MinorType.NULL at the output type for convertFromJSON() when input contains 0 row. The new UntypedNullVector is used to represent a column with MinorType.NULL.

6. HBaseGroupScan convert star column into list of row_key and column family. HBaseRecordReader should reject column star since it expectes star has been converted somewhere else.

In HBase a column family always has map type, and a non-rowkey column always has nullable varbinary type, this ensures that HBaseRecordReader across different HBase regions will have the same top level schema, even if the region is

empty or prune all the rows due to filter pushdown optimization. In other words, we will not see different top level schema from different HBaseRecordReader for the same table.

However, such change will not be able to handle hard schema change : c1 exists in cf1 in one region, but not in another region. Further work is required to handle hard schema change.

7. Modify scan cost estimation when the query involves * column. This is to remove the planning randomness since previously two different operators could have same cost.

8. Add a new flag 'outputProj' to Project operator, to indicate if Project is for the query's final output. Such Project is added by TopProjectVisitor, to handle fast NONE when all the inputs to the query are empty

and are skipped.

1) column star is replaced with empty list

2) regular column reference is replaced with nullable-int column

3) An expression will go through ExpressionTreeMaterializer, and use the type of materialized expression as the output type

4) Return an OK_NEW_SCHEMA with the schema using the above logic, then return a NONE to down-stream operator.

9. Add unit test to test operators handling empty input.

10. Add unit test to test query when inputs are all empty.

DRILL-5546: Revise code based on review comments.

Handle implicit column in scan batch. Change interface in ScanBatch's constructor.

1) Ensure either the implicit column list is empty, or all the reader has the same set of implicit columns.

2) We could skip the implicit columns when check if there is a schema change coming from record reader.

3) ScanBatch accept a list in stead of iterator, since we may need go through the implicit column list multiple times, and verify the size of two lists are same.

ScanBatch code review comments. Add more unit tests.

Share code path in ProjectBatch to handle normal setupNewSchema() and handleNullInput().

- Move SimpleRecordBatch out of TopNBatch to make it sharable across different places.

- Add Unit test verify schema for star column query against multilevel tables.

Unit test framework change

- Fix memory leak in unit test framework.

- Allow SchemaTestBuilder to pass in BatchSchema.

close #906

  1. … 67 more files in changeset.
DRILL-5399: Fix race condition in DrillComplexWriterFuncHolder

  1. … 10 more files in changeset.
DRILL-5419: Calculate return string length for literals & some string functions

1. Revisited calculation logic for string literals and some string functions

(cast, upper, lower, initcap, reverse, concat, concat operator, rpad, lpad, case statement,

coalesce, first_value, last_value, lag, lead).

Synchronized return type length calculation logic between limit 0 and regular queries.

2. Deprecated width and changed it to precision for string types in MajorType.

3. Revisited FunctionScope and splitted it into FunctionScope and ReturnType.

FunctionScope will indicate only function usage in term of number of in / out rows, (n -> 1, 1 -> 1, 1->n).

New annotation in UDFs ReturnType will indicate which return type strategy should be used.

4. Changed MAX_VARCHAR_LENGTH from 65536 to 65535.

5. Updated calculation of precision and display size for INTERVALYEAR & INTERVALDAY.

6. Refactored part of function code-gen logic (ValueReference, WorkspaceReference, FunctionAttributes, DrillFuncHolder).

This closes #819

  1. … 78 more files in changeset.
DRILL-5355: Misc. code cleanup closes #784

  1. … 23 more files in changeset.
DRILL-5116: Enable generated code debugging in each Drill operator

DRILL-5052 added the ability to debug generated code. The reviewer suggested

permitting the technique to be used for all Drill operators. This PR provides

the required fixes. Most were small changes, others dealt with the rather

clever way that the existing byte-code merge converted static nested classes

to non-static inner classes, with the way that constructors were inserted

at the byte-code level and so on. See the JIRA for the details.

This code passed the unit tests twice: once with the traditional byte-code

manipulations, a second time using "plain-old Java" code compilation.

Plain-old Java is turned off by default, but can be turned on for all

operators with a single config change: see the JIRA for info. Consider

the plain-old Java option to be experimental: very handy for debugging,

perhaps not quite tested enough for production use.

close apache/drill#716

  1. … 61 more files in changeset.
DRILL-4715: Fix java compilation error in run-time generated code when query has large number of expressions.

Refactor unit test in drillbit context initialization and pass in option manager.

close apache/drill#521

  1. … 53 more files in changeset.
DRILL-4679: When convert() functions are present, ensure that ProjectRecordBatch produces a schema even for empty result set.

Add unit tests

Modify doAlloc() to accept record count parameter (addresses review comment)

  1. … 2 more files in changeset.
DRILL-3474: Add implicit file columns support

  1. … 8 more files in changeset.
DRILL-3474: Add implicit file columns support

  1. … 8 more files in changeset.
DRILL-4382: Remove dependency on drill-logical from vector package

  1. … 80 more files in changeset.
DRILL-3581: Upgrade HPPC to 0.7.1

  1. … 14 more files in changeset.
DRILL-3987: (REFACTOR) Common and Vector modules building.

- Extract Accountor interface from Implementation

- Separate FMPP modules to separate out Vector Needs versus external needs

- Separate out Vector classes from those that are VectorAccessible.

- Cleanup Memory Exception hiearchy

  1. … 105 more files in changeset.
DRILL-2288: Fix ScanBatch violation of IterOutcome protocol and downstream chain of bugs.

Increments:

2288: Pt. 1 Core: Added unit test. [Drill2288GetColumnsMetadataWhenNoRowsTest, empty.json]

2288: Pt. 1 Core: Changed HBase test table #1's # of regions from 1 to 2. [HBaseTestsSuite]

Also added TODO(DRILL-3954) comment about # of regions.

2288: Pt. 2 Core: Documented IterOutcome much more clearly. [RecordBatch]

Also edited some related Javadoc.

2288: Pt. 2 Hyg.: Edited doc., added @Override, etc. [AbstractRecordBatch, RecordBatch]

Purged unused SetupOutcome.

Added @Override.

Edited comments.

Fix some comments to doc. comments.

2288: Pt. 3 Core&Hyg.: Added validation of IterOutcome sequence. [IteratorValidatorBatchIterator]

Also:

Renamed internal members for clarity.

Added comments.

2288: Pt. 4 Core: Fixed a NONE -> OK_NEW_SCHEMA in ScanBatch.next(). [ScanBatch]

(With nearby comments.)

2288: Pt. 4 Hyg.: Edited comments, reordered, whitespace. [ScanBatch]

Reordered

Added comments.

Aligned.

2288: Pt. 4 Core+: Fixed UnionAllRecordBatch to receive IterOutcome sequence right. (3659) [UnionAllRecordBatch]

2288: Pt. 5 Core: Fixed ScanBatch.Mutator.isNewSchema() to stop spurious "new schema" reports (fix short-circuit OR, to call resetting method right). [ScanBatch]

2288: Pt. 5 Hyg.: Renamed, edited comments, reordered. [ScanBatch, SchemaChangeCallBack, AbstractSingleRecordBatch]

Renamed getSchemaChange -> getSchemaChangedAndReset.

Renamed schemaChange -> schemaChanged.

Added doc. comments.

Aligned.

2288: Pt. 6 Core: Avoided dummy Null.IntVec. column in JsonReader when not needed (MapWriter.isEmptyMap()). [JsonReader, 3 vector files]

2288: Pt. 6 Hyg.: Edited comments, message. Fixed message formatting. [RecordReader, JSONFormatPlugin, JSONRecordReader, AbstractMapVector, JsonReader]

Fixed message formatting.

Edited comments.

Edited message.

Fixed spurious line break.

2288: Pt. 7 Core: Added column families in HBaseRecordReader* to avoid dummy Null.IntVec. clash. [HBaseRecordReader]

2288: Pt. 8 Core.1: Cleared recordCount in OrderedPartitionRecordBatch.innerNext(). [OrderedPartitionRecordBatch]

2288: Pt. 8 Core.2: Cleared recordCount in ProjectRecordBatch.innerNext. [ProjectRecordBatch]

2288: Pt. 8 Core.3: Cleared recordCount in TopNBatch.innerNext. [TopNBatch]

2288: Pt. 9 Core: Had UnorderedReceiverBatch reset RecordBatchLoader's record count. [UnorderedReceiverBatch, RecordBatchLoader]

2288: Pt. 9 Hyg.: Added comments. [RecordBatchLoader]

2288: Pt. 10 Core: Worked around mismatched map child vectors in MapVector.getObject(). [MapVector]

2288: Pt. 11 Core: Added OK_NEW_SCHEMA schema comparison for HashAgg. [HashAggTemplate]

2288: Pt. 12 Core: Fixed memory leak in BaseTestQuery's printing.

Fixed bad skipping of RecordBatchLoader.clear(...) and

QueryDataBatch.load(...) for zero-row batches in printResult(...).

Also, dropped suppression of call to

VectorUtil.showVectorAccessibleContent(...) (so zero-row batches are

as visible as others).

2288: Pt. 13 Core: Fixed test that used unhandled periods in column alias identifiers.

2288: Misc.: Added # of rows to showVectorAccessibleContent's output. [VectorUtil]

2288: Misc.: Added simple/partial toString() [VectorContainer, AbstractRecordReader, JSONRecordReader, BaseValueVector, FieldSelection, AbstractBaseWriter]

2288: Misc. Hyg.: Added doc. comments to VectorContainer. [VectorContainer]

2288: Misc. Hyg.: Edited comment. [DrillStringUtils]

2288: Misc. Hyg.: Clarified message for unhandled identifier containing period.

2288: Pt. 3 Core&Hyg. Upd.: Added schema comparison result to logging. [IteratorValidatorBatchIterator]

2288: Pt. 7 Core Upd.: Handled HBase columns too re NullableIntVectors. [HBaseRecordReader, TestTableGenerator, TestHBaseFilterPushDown]

Created map-child vectors for requested columns.

Added unit test method testDummyColumnsAreAvoided, adding new row to test table,

updated some row counts.

2288: Pt. 7 Hyg. Upd.: Edited comment. [HBaseRecordReader]

2288: Pt. 11 Core Upd.: REVERTED all of bad OK_NEW_SCHEMA schema comparison for HashAgg. [HashAggTemplate]

This reverts commit 0939660f4620c03da97f4e1bf25a27514e6d0b81.

2288: Pt. 6 Core Upd.: Added isEmptyMap override in new (just-rebased-in) PromotableWriter. [PromotableWriter]

Adjusted definition and default implementation of isEmptyMap (to handle MongoDB

storage plugin's use of JsonReader).

2288: Pt. 6 Hyg. Upd.: Purged old atLeastOneWrite flag. [JsonReader]

2288: Pt. 14: Disabled newly dying test testNestedFlatten().

  1. … 39 more files in changeset.
DRILL-3912: Common subexpression elimination

Closes #189

  1. … 12 more files in changeset.
DRILL-3233: Expression handling for Union types

  1. … 20 more files in changeset.
DRILL-3229: Implement Union type vector

  1. … 54 more files in changeset.
DRILL-3353: Fix dropping nested fields

Use the SchemaChangeCallBack in more places to track schema changes

Reset the ephemeral transfer pair when making a new transfer pair for Map or RepeatedMap

  1. … 17 more files in changeset.
DRILL-2053: Fix incorrect query result when join CTE, by making Project operator use case-insensitive matching for columns in input data stream.

  1. … 1 more file in changeset.
DRILL-2150: Create an abstraction for repeated value vectors.

  1. … 35 more files in changeset.
DRILL-2757: Verify operators correctly handle low memory conditions and cancellations

includes:

DRILL-2816: system error does not display the original Exception message

DRILL-2893: ScanBatch throws a NullPointerException instead of returning OUT_OF_MEMORY

DRILL-2894: FixedValueVectors shouldn't set it's data buffer to null when it fails to allocate it

DRILL-2895: AbstractRecordBatch.buildSchema() should properly handle OUT_OF_MEMORY outcome

DRILL-2905: RootExec implementations should properly handle IterOutcome.OUT_OF_MEMORY

DRILL-2920: properly handle OutOfMemoryException

DRILL-2947: AllocationHelper.allocateNew() doesn't have a consistent behavior when it can't allocate

also:

- added UserException.memoryError() with a pre assigned error message

- injection site in ScanBatch and unit test that runs various tpch queries and injects

an exception in the ScanBatch that will cause an OUT_OF_MEMORY outcome to be sent

  1. … 36 more files in changeset.