DRILL-7576: Fail fast for operator errors

Converts operators to fail with a UserException rather than using the STOP iterator status. The result is clearer error messages and simpler code.
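A minimal sketch of the fail-fast pattern, under simplifying assumptions: `Outcome`, `OperatorFailure`, and `DividingOperator` are hypothetical stand-ins, not Drill's `IterOutcome` or `UserException`. Instead of returning a STOP-style status and leaving the caller to discover what went wrong, the operator throws an exception that carries the failure context.

```java
import java.util.List;

// Hypothetical simplified outcome; Drill's real IterOutcome has more states.
enum Outcome { OK, NONE }

// Hypothetical stand-in for a user-facing error such as Drill's UserException.
class OperatorFailure extends RuntimeException {
  OperatorFailure(String message, Throwable cause) {
    super(message, cause);
  }
}

class DividingOperator {
  private final List<Integer> divisors;
  private int row = 0;
  private int lastResult;

  DividingOperator(List<Integer> divisors) {
    this.divisors = divisors;
  }

  // Returns OK for each processed row and NONE at end of input. On error it
  // throws immediately (fail fast) instead of returning a STOP-like status.
  Outcome next() {
    if (row >= divisors.size()) {
      return Outcome.NONE;
    }
    int current = row++;
    try {
      lastResult = 100 / divisors.get(current);
    } catch (ArithmeticException e) {
      throw new OperatorFailure("Division failed at row " + current, e);
    }
    return Outcome.OK;
  }
}
```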

closes #1975

  1. … 66 more files in changeset.
DRILL-7324: Final set of "batch count" fixes

Final set of fixes for batch count/record count issues. Enables vector checking for all operators.

closes #1912

  1. … 19 more files in changeset.
DRILL-7456: Batch count fixes for 12 operators

Enables batch validation for 12 additional operators:

* MergingRecordBatch

* OrderedPartitionRecordBatch

* RangePartitionRecordBatch

* TraceRecordBatch

* UnionAllRecordBatch

* UnorderedReceiverBatch

* UnpivotMapsRecordBatch

* WindowFrameRecordBatch

* TopNBatch

* HashJoinBatch

* ExternalSortBatch

* WriterRecordBatch

Fixes issues found with those checks so that this set of operators passes all checks.

Includes code cleanup in many files touched during this work.

closes #1906

  1. … 43 more files in changeset.
DRILL-7436: Fix record count, vector structure issues in several operators

Adds additional vector checks to the BatchValidator.

Enables checking for the following operators:

* FilterRecordBatch

* PartitionLimitRecordBatch

* UnnestRecordBatch

* HashAggBatch

* RemovingRecordBatch

Fixes vector count issues for each of these.
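The kind of record count check described above can be sketched as follows. This is a simplified illustration, not Drill's `BatchValidator`: `BatchCountCheck` is hypothetical and models a batch only as a map from vector name to the value count that vector reports.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified check in the spirit of a batch validator:
// every vector must report the same value count as the batch's record count.
class BatchCountCheck {
  static List<String> validate(int recordCount, Map<String, Integer> valueCounts) {
    List<String> errors = new ArrayList<>();
    for (Map.Entry<String, Integer> entry : valueCounts.entrySet()) {
      int valueCount = entry.getValue();
      if (valueCount != recordCount) {
        errors.add(entry.getKey() + ": value count " + valueCount
            + " != record count " + recordCount);
      }
    }
    return errors;
  }
}
```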

Fixes empty-batch (record count = 0) handling in several of the above operators. Added a method to VectorContainer to correctly create an empty batch. (An empty batch, counter-intuitively, needs vectors allocated to hold the 0 value in the first position of each offset vector.)
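The point about empty batches can be illustrated with a toy variable-width column. This is a sketch under stated assumptions: `OffsetColumnSketch` is hypothetical and keeps offsets in a plain `int[]` rather than a Drill offset vector. Row `i` spans `offsets[i]` to `offsets[i + 1]`, so even a column with zero rows needs one offset entry holding 0.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical variable-width column: data bytes plus rowCount + 1 offsets.
class OffsetColumnSketch {
  private byte[] data = new byte[0];
  private int[] offsets;
  private int rowCount;

  // An empty batch still allocates the offset array with 0 at position 0,
  // so offsets[rowCount] is always a valid index.
  static OffsetColumnSketch empty() {
    OffsetColumnSketch col = new OffsetColumnSketch();
    col.offsets = new int[] {0};
    col.rowCount = 0;
    return col;
  }

  void append(String value) {
    byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
    int start = offsets[rowCount];
    data = Arrays.copyOf(data, start + bytes.length);
    System.arraycopy(bytes, 0, data, start, bytes.length);
    offsets = Arrays.copyOf(offsets, rowCount + 2);
    offsets[rowCount + 1] = start + bytes.length;
    rowCount++;
  }

  String get(int row) {
    int start = offsets[row];
    return new String(data, start, offsets[row + 1] - start, StandardCharsets.UTF_8);
  }

  int rowCount() { return rowCount; }

  // Total data size; valid even for an empty column thanks to offsets[0].
  int totalBytes() { return offsets[rowCount] - offsets[0]; }
}
```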

Disables verbose logging for MongoDB tests. Details are written to the log rather than the console.

Disables two invalid Mongo tests. See DRILL-7428.

Adjusts the expression tree materializer to not add the LATE type to Union vectors. (See DRILL-7435.)

Ensures that Union vectors contain valid vectors for each subtype. The present fix is a work-around; see DRILL-7434 for a better long-term fix.

Cleans up code formatting and other minor issues in each file touched during the fixes in this PR.

  1. … 35 more files in changeset.
DRILL-7257: Set nullable var-width vector lastSet value

This turns out to be due to a subtle issue with variable-width nullable vectors. Such vectors have a lastSet attribute in the Mutator class. When using "transfer pairs" to copy values, the code somehow decides to zero-fill from the lastSet value to the record count. The row set framework did not set this value, meaning that the RemovingRecordBatch zero-filled the dir0 column when it chose to use transfer pairs rather than copying values. Transfer pairs are used when all rows in a batch pass the filter prior to the removing record batch.

Modified the nullable vector writer to properly set the lastSet value at the end of each batch. Added a unit test to verify the value is set correctly.
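The lastSet interaction can be modeled with a toy mutator. `LastSetSketch` is hypothetical: it tracks only offsets (no data bytes or null bits) and assumes, per the description above, that setting the value count zero-fills every row after lastSet.

```java
// Hypothetical simplification of a nullable variable-width vector's mutator.
// offsets has capacity + 1 entries; row i spans offsets[i]..offsets[i + 1].
class LastSetSketch {
  private final int[] offsets;
  private int lastSet = -1;

  LastSetSketch(int capacity) {
    offsets = new int[capacity + 1];
  }

  // Records a value of the given length at the given row (data bytes omitted).
  void write(int row, int length) {
    offsets[row + 1] = offsets[row] + length;
  }

  // The fix, in spirit: the writer records the last row written at batch end.
  void setLastSet(int row) {
    lastSet = row;
  }

  // Models the "fill empties" step: rows after lastSet become zero-length.
  void setValueCount(int count) {
    for (int row = lastSet + 1; row < count; row++) {
      offsets[row + 1] = offsets[row];
    }
  }

  int length(int row) {
    return offsets[row + 1] - offsets[row];
  }
}
```

Without the `setLastSet` call, `setValueCount` wipes values that were already written; with it, they survive.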

Includes a bit of code clean-up.

  1. … 7 more files in changeset.
DRILL-7011: Support schema in scan framework

* Adds schema support to the row set-based scan framework and to the "V3" text reader based on that framework.

* Adding the schema made clear that passing options as a long list of constructor arguments was not sustainable. Refactored code to use a builder pattern instead.

* Added support for default values in the "null column loader", which required adding a "setValue" method to the column accessors.

* Added unit tests for all new or changed functionality. See TestCsvWithSchema for the overall test of the entire integrated mechanism.

* Added tests for explicit projection with schema

* Better handling of date/time in column accessors

* Converted recent column metadata work from Java 8 date/time to Joda.

* Added more CSV-with-schema unit tests

* Removed the ID fields from "resolved columns", used "instanceof" instead.

* Added wildcard projection with an output schema. Handles both "lenient" and "strict" schemas.

* Tagged projection columns with their output schema, when available.

* Scan projection added modes for wildcard with an output schema. The reader projection added support for merging reader and output schemas.

* Includes refactoring of scan operator tests (the test file grew too large).

* Renamed some classes to avoid confusing reader schemas with output schemas.

* Added unit tests for the new functionality.

* Added "lenient" wildcard with schema test for CSV

* Added more type conversions: string-to-bit, many-to-string

* Fixed bug in column writer for VarDecimal

* Added missing unit tests for, and fixed bugs in, the Bit column reader/writer

* Cleaned up a number of unneeded "SuppressWarnings"
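The difference between "lenient" and "strict" wildcard projection with an output schema can be sketched as a column-list expansion. This assumes, as a simplification, that a strict schema limits `SELECT *` to the schema's columns while a lenient one also passes through extra columns the reader finds. `WildcardSketch` is hypothetical and ignores types and case-insensitive name matching.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of wildcard (SELECT *) expansion with an output schema.
class WildcardSketch {
  static List<String> expand(List<String> outputSchema, List<String> readerColumns,
                             boolean strict) {
    // Output-schema columns always come first, in schema order.
    Set<String> columns = new LinkedHashSet<>(outputSchema);
    if (!strict) {
      // Lenient: also keep reader columns the schema does not mention.
      columns.addAll(readerColumns);
    }
    return new ArrayList<>(columns);
  }
}
```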

closes #1711

  1. … 224 more files in changeset.
DRILL-6724: Dump operator context to logs when error occurs during query execution

closes #1455

  1. … 102 more files in changeset.
DRILL-6422: Replace guava imports with shaded ones

  1. … 983 more files in changeset.
DRILL-6687: Updated with review comments

  1. … 1 more file in changeset.
DRILL-6687: Improve RemovingRecordBatch to do transfer when all records need to be copied

Adds an optimization in SelectionVector2 that enables RemovingRecordBatch to transfer ValueVectors from the incoming to the output container when all records need to be copied. Modified FilterRecordBatch and LimitRecordBatch to support this optimization.
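The transfer-versus-copy decision can be sketched as below. The names are hypothetical, and this sketch scans the selection vector directly, whereas the change described above has FilterRecordBatch and LimitRecordBatch signal the case through SelectionVector2. A plain `int[]` stands in for a value vector, and a "transfer" is modeled as handing back the same array.

```java
// Hypothetical sketch: transfer when the selection keeps every row in order,
// otherwise copy the selected rows one by one.
class RemoverSketch {
  static boolean selectsAllInOrder(int[] sv2, int selectedCount, int incomingRowCount) {
    if (selectedCount != incomingRowCount) {
      return false;
    }
    for (int i = 0; i < selectedCount; i++) {
      if (sv2[i] != i) {
        return false;
      }
    }
    return true;
  }

  // Returns the same array for a transfer, or a new array for a copy.
  static int[] remove(int[] column, int[] sv2, int selectedCount) {
    if (selectsAllInOrder(sv2, selectedCount, column.length)) {
      return column; // transfer: no per-row copy
    }
    int[] out = new int[selectedCount];
    for (int i = 0; i < selectedCount; i++) {
      out[i] = column[sv2[i]];
    }
    return out;
  }
}
```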

    ./GenericCopierFactory.java (+54, -0)
    ./StraightCopier.java (+69, -0)
  1. … 10 more files in changeset.
DRILL-6653: Unsupported Schema change exception where there is no schema change

closes #1422

  1. … 1 more file in changeset.
DRILL-6461: Added basic data correctness tests for hash agg, and improved operator unit testing framework.

closes #1344

  1. … 31 more files in changeset.
DRILL-6385: Support JPPD feature

  1. … 63 more files in changeset.
DRILL-6446: Support for EMIT outcome in TopN

- Added comments for TopNBatch and PriorityQueueTemplate.

- Added support for schema changes across next() calls with a HyperVector in the incoming container. This is achieved by adding a new method in HyperVectorWrapper that simply updates the vector[] array holding multiple vectors with the provided input ValueVector array, and by modifying the RemovingRecordBatch GenericSV4Copier to hold a reference to a VectorWrapper instead of a ValueVector[] for each column in the incoming batch.

- Handled empty batches. There are two cases: empty batches at the beginning with an EMIT outcome, and empty batches between consecutive EMIT outcomes after some batches with data and an EMIT outcome have already been received.

Note: In the first case, an empty batch only returned the EMIT outcome without properly creating the output container and SV4 vector. As a result, if the first batch with an EMIT outcome was empty, TopN would return an empty batch with SV mode NONE, and if a later batch then arrived with records and an EMIT outcome, TopN would generate an output batch with OK_NEW_SCHEMA (since TopN always generates its first output batch with records as OK_NEW_SCHEMA, because it returns output in SV4 mode). Now suppose both batches with an EMIT outcome were produced after processing the first 2 rows of a single input batch. This is a problem, as it simulates a schema change across rows of the same incoming batch, which can never happen.

Note: In the second case of empty batches, the priority queue will not be null but will be uninitialized.

- Optimized to send the EMIT outcome with the output batch that holds all the data to return for the current iteration, rather than sending it with OK followed by an empty batch with an EMIT outcome.

closes #1293

  1. … 7 more files in changeset.
DRILL-6386: Remove unused imports and star imports.

  1. … 231 more files in changeset.
DRILL-6320: Fixed license headers.

closes #1207

  1. … 2066 more files in changeset.
DRILL-6327: Update unary operators to handle IterOutcome.EMIT

Note: Handles non-blocking unary operators (like Filter/Project/etc.) with the EMIT IterOutcome.

closes #1240

  1. … 16 more files in changeset.
DRILL-5730: Mock testing improvements and interface improvements

closes #1045

  1. … 222 more files in changeset.
DRILL-5993: Used generic copiers in the selection vector remover, and implemented testing improvements for RowSets and codegen.

closes #1057

    ./AbstractCopier.java (+95, -0)
    ./AbstractSV2Copier.java (+53, -0)
    ./AbstractSV4Copier.java (+53, -0)
    ./GenericCopier.java (+96, -0)
  1. … 17 more files in changeset.
DRILL-5993: Adding generic copiers which do not require codegen

    ./GenericSV2Copier.java (+56, -0)
    ./GenericSV4Copier.java (+58, -0)
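A rough sketch of codegen-free copying driven by selection vectors. `ColumnReader` and `GenericCopierSketch` are hypothetical; the SV4 layout assumed here (batch index in the upper 16 bits, row index in the lower 16 bits) is intended to match Drill's SelectionVector4 but should be checked against the source. The copier loops over a generic column interface instead of relying on generated, type-specialized code.

```java
import java.util.List;

// Hypothetical read-only column; stands in for a value vector.
interface ColumnReader {
  Object get(int row);
}

class GenericCopierSketch {
  // SV2: each entry is a row index into a single incoming batch.
  static Object[] copySv2(ColumnReader source, int[] sv2, int count) {
    Object[] out = new Object[count];
    for (int i = 0; i < count; i++) {
      out[i] = source.get(sv2[i]);
    }
    return out;
  }

  // SV4: each entry packs a batch index (upper 16 bits) and a row index (lower 16 bits).
  static Object[] copySv4(List<ColumnReader> batches, int[] sv4, int count) {
    Object[] out = new Object[count];
    for (int i = 0; i < count; i++) {
      int batchIndex = sv4[i] >>> 16;
      int rowIndex = sv4[i] & 0xFFFF;
      out[i] = batches.get(batchIndex).get(rowIndex);
    }
    return out;
  }
}
```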
DRILL-5783, DRILL-5841, DRILL-5894: Rationalize test temp directories

This change includes:

DRILL-5783:

- A unit test is created for the priority queue in the TopN operator.

- The code generation classes passed around a completely unused function registry reference in some places, so it was removed.

- The priority queue had unused parameters for some of its methods, so they were removed.

DRILL-5841:

- Created standardized temp directory classes DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher, and updated all unit tests to use them.

DRILL-5894:

- Removed the dfs_test storage plugin for tests and replaced it with the already existing dfs storage plugin.

Misc:

- General code cleanup.

- Removed unnecessary use of String.format in the tests.

This closes #984

  1. … 365 more files in changeset.
DRILL-5325: Unit tests for the managed sort

Uses the sub-operator test framework (DRILL-5318), including the test row set abstraction (DRILL-5323), to enable unit testing of the “managed” external sort. This PR allows early review of the code, but cannot be pulled until the dependencies (mentioned above) are pulled.

Refactors the external sort code into small chunks that can be unit tested, then “wraps” that code in tests for all interesting data types, record batch sizes, and so on.

Refactors some of the operator definitions to more easily allow programmatic setup in the unit tests.

Fixes a number of bugs discovered by the unit tests. The biggest changes were in the new code: the code that computes spilling and merging based on memory levels.

Otherwise, although GitHub will show many changed files, most of the changes simply move blocks of code around to create smaller units that can be tested independently.

Includes a refactoring of the code that does spilling, along with a complete set of low-level unit tests.

Excludes long-running sort tests.

Defines a test category for long-running tests.

First attempt to provide a way to run such tests from Maven.

closes #808

  1. … 50 more files in changeset.
DRILL-5116: Enable generated code debugging in each Drill operator

DRILL-5052 added the ability to debug generated code. The reviewer suggested permitting the technique to be used for all Drill operators. This PR provides the required fixes. Most were small changes; others dealt with the rather clever way that the existing byte-code merge converted static nested classes to non-static inner classes, with the way that constructors were inserted at the byte-code level, and so on. See the JIRA for the details.

This code passed the unit tests twice: once with the traditional byte-code manipulations, and a second time using "plain-old Java" code compilation.

Plain-old Java is turned off by default, but can be turned on for all operators with a single config change: see the JIRA for info. Consider the plain-old Java option to be experimental: very handy for debugging, perhaps not quite tested enough for production use.

close apache/drill#716

  1. … 59 more files in changeset.
DRILL-4715: Fix Java compilation error in run-time generated code when query has large number of expressions.

Refactor unit test in drillbit context initialization and pass in option manager.

close apache/drill#521

  1. … 53 more files in changeset.
DRILL-4182: TopN schema change support.

  1. … 7 more files in changeset.
DRILL-3987: (REFACTOR) Common and Vector modules building.

- Extract Accountor interface from Implementation

- Separate FMPP modules to separate out Vector Needs versus external needs

- Separate out Vector classes from those that are VectorAccessible.

- Cleanup Memory Exception hierarchy

  1. … 105 more files in changeset.
DRILL-1942-hygiene:

- add AutoCloseable to many classes

- minor fixes

- formatting

this closes #133

  1. … 30 more files in changeset.
DRILL-3353: Fix dropping nested fields

Use the SchemaChangeCallBack in more places to track schema changes

Reset the ephemeral transfer pair when making a new transfer pair for Map or RepeatedMap

  1. … 17 more files in changeset.
DRILL-2757: Verify operators correctly handle low memory conditions and cancellations

includes:

DRILL-2816: system error does not display the original Exception message

DRILL-2893: ScanBatch throws a NullPointerException instead of returning OUT_OF_MEMORY

DRILL-2894: FixedValueVectors shouldn't set its data buffer to null when it fails to allocate it

DRILL-2895: AbstractRecordBatch.buildSchema() should properly handle OUT_OF_MEMORY outcome

DRILL-2905: RootExec implementations should properly handle IterOutcome.OUT_OF_MEMORY

DRILL-2920: properly handle OutOfMemoryException

DRILL-2947: AllocationHelper.allocateNew() doesn't have a consistent behavior when it can't allocate

also:

- added UserException.memoryError() with a pre-assigned error message

- added an injection site in ScanBatch and a unit test that runs various TPC-H queries and injects an exception in the ScanBatch that will cause an OUT_OF_MEMORY outcome to be sent

  1. … 36 more files in changeset.
DRILL-2826: Simplify and centralize Operator Cleanup

- Remove cleanup method from RecordBatch interface

- Make FragmentContext responsible for creating and closing OperatorContexts

- Make OperatorContext an abstract class and the impl only available to FragmentContext

- Make RecordBatch closing the responsibility of the RootExec

- Make all closes suppressing closes to maximize memory release on failure

- Add new CloseableRecordBatch interface used by RootExec

- Make RootExec AutoCloseable

- Update RecordBatchCreator to return CloseableRecordBatches so that RootExec can maintain a list

- Generate the list of operators through a change in ImplCreator
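The "suppressing close" idea can be sketched with plain `AutoCloseable`: close every resource even if some fail, throw the first failure, and attach later failures to it as suppressed exceptions so no cleanup step is skipped. `SuppressingCloser` is a hypothetical helper for illustration, not Drill's own utility.

```java
import java.util.List;

// Hypothetical helper: closes all resources, never stopping at the first error.
class SuppressingCloser {
  static void closeAll(List<? extends AutoCloseable> resources) throws Exception {
    Exception first = null;
    for (AutoCloseable resource : resources) {
      try {
        resource.close();
      } catch (Exception e) {
        if (first == null) {
          first = e;              // remember the primary failure
        } else {
          first.addSuppressed(e); // keep later failures for diagnosis
        }
      }
    }
    if (first != null) {
      throw first;
    }
  }
}
```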

  1. … 94 more files in changeset.