DRILL-7725: Updates to the EVF2 framework

* Supports internal implicit columns

* Better support for standard conversions

* Handle several reader corner cases

* Simplified file reader

closes #2073

    • -1
    • +2
    ./scan/TestScanOperExecOuputSchema.java
    • -1
    • +1
    ./scan/TestScanOrchestratorImplicitColumns.java
    • -92
    • +95
    ./scan/convert/TestDirectConverter.java
    • -45
    • +7
    ./scan/project/TestSchemaSmoothing.java
    • -0
    • +124
    ./scan/v3/TestFixedReceiver.java
    • -46
    • +38
    ./scan/v3/file/TestFileDescrip.java
  1. … 67 more files in changeset.
DRILL-7701: EVF V2 Scan Framework

Revises the scan framework to use the revised schema resolution

introduced in DRILL-7696.

    • -0
    • +59
    ./scan/v3/BaseMockBatchReader.java
    • -0
    • +204
    ./scan/v3/BaseScanTest.java
    • -0
    • +93
    ./scan/v3/ScanFixture.java
    • -0
    • +533
    ./scan/v3/TestScanBasics.java
    • -0
    • +275
    ./scan/v3/TestScanEarlySchema.java
    • -0
    • +378
    ./scan/v3/TestScanLateSchema.java
    • -0
    • +329
    ./scan/v3/TestScanOuputSchema.java
    • -0
    • +203
    ./scan/v3/TestScanOverflow.java
    • -0
    • +115
    ./scan/v3/file/BaseFileScanTest.java
    • -0
    • +375
    ./scan/v3/file/TestFileScan.java
    • -0
    • +262
    ./scan/v3/file/TestFileScanLifecycle.java
    • -0
    • +109
    ./scan/v3/file/TestImplicitColumnLoader.java
    • -0
    • +468
    ./scan/v3/lifecycle/BaseTestScanLifecycle.java
    • -0
    • +298
    ./scan/v3/lifecycle/TestMissingColumnLoader.java
  1. … 43 more files in changeset.
DRILL-6168: Revise format plugin table functions

Allows table functions to inherit properties from a

defined format plugin.

Also DRILL-7612: enforces immutability for all format plugins.

  1. … 46 more files in changeset.
DRILL-7696: EVF v2 scan schema resolution

Provides the mechanism to resolve the scan schema from a

projection list, provided schema, early reader schema and

actual reader schema.

    • -33
    • +0
    ./scan/project/TestNullColumnLoader.java
    • -0
    • +58
    ./scan/v3/file/FileScanUtils.java
    • -0
    • +38
    ./scan/v3/file/MockFileNames.java
    • -0
    • +149
    ./scan/v3/file/TestFileDescrip.java
    • -0
    • +555
    ./scan/v3/file/TestImplicitColumnResolver.java
    • -0
    • +71
    ./scan/v3/schema/BaseTestSchemaTracker.java
    • -0
    • +179
    ./scan/v3/schema/TestDynamicSchemaFilter.java
    • -0
    • +357
    ./scan/v3/schema/TestProjectedPath.java
    • -0
    • +472
    ./scan/v3/schema/TestProjectionParser.java
    • -0
    • +470
    ./scan/v3/schema/TestScanSchemaTracker.java
    • -0
    • +1052
    ./scan/v3/schema/TestScanSchemaTrackerMaps.java
    • -0
    • +276
    ./scan/v3/schema/TestSchemaTrackerDefined.java
    • -0
    • +172
    ./scan/v3/schema/TestSchemaTrackerEarlyReaderSchema.java
    • -0
    • +609
    ./scan/v3/schema/TestSchemaTrackerInputSchema.java
  1. … 61 more files in changeset.
DRILL-7640: EVF-based JSON Loader

Builds on the JSON structure parser and several other PRs

to provide an enhanced, robust mechanism to read JSON data

into value vectors via the EVF. This is not the JSON reader,

rather it is the "V2" version of the JsonProcessor which

does the actual JSON parsing/loading work.

closes #2023

  1. … 41 more files in changeset.
DRILL-7330: Implement metadata usage for all format plugins

    • -11
    • +49
    ./scan/TestScanOrchestratorImplicitColumns.java
  1. … 58 more files in changeset.
DRILL-7601: Shift column conversion to reader from scan framework

Allows the column writers to be generic, moves scan-specific

conversions into each reader where needed, implemented in

a reader-specific way.

Adds a revised way of handling projections in the result set

loader that is not coupled with conversion, as the prior

design was.

Updates the CSV, Avro, Log and HDF5 readers.

closes #1993

    • -329
    • +0
    ./scan/TestFileMetadataColumnParser.java
    • -341
    • +0
    ./scan/TestFileMetadataProjection.java
    • -0
    • +329
    ./scan/TestImplicitColumnParser.java
    • -0
    • +328
    ./scan/TestImplicitColumnProjection.java
    • -44
    • +70
    ./scan/TestScanOperExecOuputSchema.java
    • -15
    • +15
    ./scan/TestScanOrchestratorEarlySchema.java
    • -0
    • +437
    ./scan/TestScanOrchestratorImplicitColumns.java
  1. … 206 more files in changeset.
DRILL-7583: Remove STOP status from operator outcome

Now that all operators have been converted to throw

exceptions on error conditions, the STOP status is

unused. This patch removes the STOP status and the

related kill() and killIncoming() methods. The

"kill" methods are replaced by "cancel" methods which

handle "normal" case cancellation, such as for

LIMIT.

closes #1981

    • -1
    • +1
    ./join/TestLateralJoinCorrectness.java
    • -24
    • +30
    ./limit/TestLimitBatchEmitOutcome.java
    • -5
    • +4
    ./protocol/TestOperatorRecordBatch.java
    • -15
    • +15
    ./unnest/MockLateralJoinBatch.java
    • -3
    • +2
    ./unnest/TestUnnestWithLateralCorrectness.java
  1. … 68 more files in changeset.
DRILL-7576: Fail fast for operator errors

Converts operators to fail with a UserException rather than using

the STOP iterator status. The result is clearer error messages

and simpler code.
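
A minimal sketch of the difference between the two styles, not Drill's actual operator code. `OperatorFailure` is a hypothetical stand-in for a user-facing exception such as UserException, and both parse methods are invented for illustration.

```java
public class FailFastSketch {

  /** Hypothetical stand-in for a user-facing exception; not a Drill class. */
  static class OperatorFailure extends RuntimeException {
    OperatorFailure(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /** Status-code style (illustrative): the error is folded into a status. */
  enum Status { OK, STOP }

  static Status parseWithStatus(String field) {
    try {
      Integer.parseInt(field);
      return Status.OK;
    } catch (NumberFormatException e) {
      // The caller only sees STOP; the cause and context are lost here.
      return Status.STOP;
    }
  }

  /** Fail-fast style: throw at the point of failure with full context. */
  static int parseFailFast(String column, String field) {
    try {
      return Integer.parseInt(field);
    } catch (NumberFormatException e) {
      throw new OperatorFailure(
          "Column " + column + ": cannot convert '" + field + "' to INT", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(parseWithStatus("abc"));
    System.out.println(parseFailFast("a", "42"));
  }
}
```

With the exception style, the caller needs no status check after every call, and the message names the column and value that failed.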

closes #1975

    • -0
    • +132
    ./TestStackAnalyzer.java
    • -15
    • +17
    ./partitionsender/TestPartitionSender.java
    • -36
    • +19
    ./unnest/TestUnnestWithLateralCorrectness.java
  1. … 62 more files in changeset.
DRILL-7574: Generalize the projection parser

Adds support for multi-dimensional arrays, and columns

projected as both an array and a map.

closes #1974

    • -18
    • +11
    ./scan/TestColumnsArrayParser.java
    • -100
    • +0
    ./scan/TestFileMetadataColumnParser.java
    • -109
    • +1
    ./scan/TestScanOrchestratorEarlySchema.java
    • -1
    • +0
    ./scan/project/TestSchemaSmoothing.java
  1. … 41 more files in changeset.
DRILL-7530: Fix class names in loggers

1. Fix incorrect class names for loggers.

2. Minor code cleanup.

closes #1957

    • -1
    • +0
    ./join/TestLateralJoinCorrectnessBatchProcessing.java
    • -1
    • +4
    ./protocol/TestOperatorRecordBatch.java
  1. … 52 more files in changeset.
DRILL-7507: Convert fragment interrupts to exceptions

Modifies fragment interrupt handling to throw a specialized

exception, rather than relying on the complex and cumbersome

STOP iterator status.

closes #1949

    • -1
    • +1
    ./unnest/TestUnnestWithLateralCorrectness.java
  1. … 14 more files in changeset.
DRILL-7506: Simplify code gen error handling

Pushes code gen error handling close to the code gen itself to

allow clearer error messages. Doing so avoids the need to bubble

code gen exceptions up the call stack, resulting in cleaner

operator code.

closes #1948

    • -4
    • +5
    ./join/TestLateralJoinCorrectness.java
    • -51
    • +59
    ./project/TestProjectEmitOutcome.java
  1. … 37 more files in changeset.
DRILL-7502: Invalid codegen for typeof() with UNION

Also fixes DRILL-6362: typeof() reports NULL for primitive

columns with a NULL value.

typeof() is meant to return "NULL" if a UNION has a NULL

value, but the column type when known, such as for non-UNION

columns.
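
The intended rule can be sketched as follows. This is an illustration only, not the generated code: the `MinorType` enum here is a reduced, hypothetical copy, and returning the current value's type for a non-null UNION is an assumption not stated in the commit message.

```java
public class TypeOfSketch {

  /** Reduced, hypothetical type enum for illustration. */
  enum MinorType { INT, VARCHAR, UNION }

  /**
   * @param columnType declared type of the column
   * @param valueType  type of the current value (only meaningful for UNION)
   * @param isNull     whether the current value is NULL
   */
  static String typeOf(MinorType columnType, MinorType valueType, boolean isNull) {
    if (columnType != MinorType.UNION) {
      // Non-UNION column: the type is known even when the value is NULL.
      return columnType.name();
    }
    // UNION column: a NULL value has no type to report.
    // Assumption: a non-null UNION value reports its own type.
    return isNull ? "NULL" : valueType.name();
  }

  public static void main(String[] args) {
    System.out.println(typeOf(MinorType.INT, MinorType.INT, true));
    System.out.println(typeOf(MinorType.UNION, null, true));
  }
}
```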

Also fixes DRILL-7499: sqltypeof() function with an array returns

"ARRAY", not type. This was due to treating REPEATED like LIST.

Handling of the Union vector in code gen is problematic

with about three special cases. Existing code handled two

of the cases. This change handles the third case.

Figuring out the change required poking around quite a bit

of unclear code. Added comments and restructuring to make

that code a bit more clear.

The fix modified code gen for the Union Holder. It can now

"go back in time" to add the union reader at the point we

need it.

closes #1945

    • -13
    • +13
    ./TopN/TestTopNSchemaChanges.java
  1. … 53 more files in changeset.
DRILL-7487: Removes the unused OUT_OF_MEMORY iterator status

See JIRA ticket for full explanation.

closes #1930

    • -95
    • +1
    ./join/TestLateralJoinCorrectness.java
    • -28
    • +27
    ./unnest/TestUnnestCorrectness.java
    • -28
    • +27
    ./unnest/TestUnnestWithLateralCorrectness.java
  1. … 38 more files in changeset.
DRILL-6832: Removes the old "unmanaged" external sort

When the "managed" external sort was implemented a couple

of years back, we retained the original "unmanaged" version

out of an abundance of caution. The new version is now

battle tested and it is time to retire the original one.

closes #1929

    • -0
    • +133
    ./xsort/SortTestUtilities.java
    • -0
    • +377
    ./xsort/TestCopier.java
    • -0
    • +191
    ./xsort/TestExternalSortExec.java
    • -0
    • +726
    ./xsort/TestExternalSortInternals.java
    • -0
    • +196
    ./xsort/TestLenientAllocation.java
    • -0
    • +116
    ./xsort/TestShortArrays.java
    • -57
    • +9
    ./xsort/TestSimpleExternalSort.java
    • -0
    • +727
    ./xsort/TestSortEmitOutcome.java
    • -0
    • +626
    ./xsort/TestSortImpl.java
    • -26
    • +2
    ./xsort/TestSortSpillWithException.java
    • -0
    • +658
    ./xsort/TestSorter.java
    • -133
    • +0
    ./xsort/managed/SortTestUtilities.java
    • -191
    • +0
    ./xsort/managed/TestExternalSortExec.java
  1. … 50 more files in changeset.
DRILL-7393: Revisit Drill tests to ensure that patching is executed before any test run

- Added BaseTest with patchers and extended all tests from it.

- Added a test to the java-exec module to ensure that all tests there inherit from BaseTest.

- Revised exception handling in the patchers; it is now individual for each patching method.

closes #1910

    • -1
    • +2
    ./common/HashTableAllocationTrackerTest.java
    • -1
    • +2
    ./join/TestBatchSizePredictorImpl.java
    • -1
    • +2
    ./join/TestBuildSidePartitioningImpl.java
    • -1
    • +2
    ./join/TestHashJoinHelperSizeCalculatorImpl.java
    • -1
    • +2
    ./join/TestHashJoinMemoryCalculator.java
    • -1
    • +2
    ./join/TestHashTableSizeCalculatorConservativeImpl.java
    • -1
    • +2
    ./join/TestHashTableSizeCalculatorLeanImpl.java
    • -1
    • +2
    ./join/TestPostBuildCalculationsImpl.java
    • -1
    • +2
    ./svremover/AbstractGenericCopierTest.java
  1. … 128 more files in changeset.
DRILL-7456: Batch count fixes for 12 operators

Enables batch validation for 12 additional operators:

* MergingRecordBatch

* OrderedPartitionRecordBatch

* RangePartitionRecordBatch

* TraceRecordBatch

* UnionAllRecordBatch

* UnorderedReceiverBatch

* UnpivotMapsRecordBatch

* WindowFrameRecordBatch

* TopNBatch

* HashJoinBatch

* ExternalSortBatch

* WriterRecordBatch

Fixes issues found with those checks so that this set of

operators passes all checks.

Includes code cleanup in many files touched during this

work.

closes #1906

  1. … 43 more files in changeset.
DRILL-7450: Improve performance for ANALYZE command

- Implement two-phase aggregation for the lowest metadata aggregate to optimize performance

- Allow using complex functions with hash aggregate

- Use hash aggregation for PHASE_1of2 for ANALYZE to reduce memory usage and avoid sorting non-aggregated data

- Add sort above hash aggregation to fix correctness of merge exchange and stream aggregate
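
The general two-phase pattern can be sketched as below. This is a simplified illustration using a count aggregate over string keys, not the planner change itself. The class and method names are hypothetical, and the `TreeMap` merely stands in for the sort placed above the hash aggregation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class TwoPhaseAggSketch {

  /** Phase 1: each fragment hash-aggregates its own rows; no input sort needed. */
  static Map<String, Long> phase1(List<String> keys) {
    Map<String, Long> partial = new HashMap<>();
    for (String key : keys) {
      partial.merge(key, 1L, Long::sum);
    }
    return partial;
  }

  /** Phase 2: merge the partial counts; the sorted map yields ordered output. */
  static SortedMap<String, Long> phase2(List<Map<String, Long>> partials) {
    SortedMap<String, Long> total = new TreeMap<>();
    for (Map<String, Long> partial : partials) {
      partial.forEach((key, count) -> total.merge(key, count, Long::sum));
    }
    return total;
  }

  public static void main(String[] args) {
    Map<String, Long> fragment1 = phase1(Arrays.asList("a", "b", "a"));
    Map<String, Long> fragment2 = phase1(Arrays.asList("b", "c"));
    System.out.println(phase2(Arrays.asList(fragment1, fragment2)));
  }
}
```

Only the already-reduced partial results cross between the phases, which is why the first phase can avoid sorting the non-aggregated data.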

closes #1907

    • -63
    • +241
    ./agg/TestAggWithAnyValue.java
    • -102
    • +103
    ./agg/TestHashAggEmitOutcome.java
  1. … 58 more files in changeset.
DRILL-7442: Create multi-batch row set reader

Adds a ResultSetReader that works across multiple batches

in a result set. Reuses the same row set and readers if

schema is unchanged, creates a new set if the schema changes.
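
The reuse rule described above can be sketched as follows. This is not the ResultSetReader API: the class, the `Reader` type, and the use of a list of column names as the schema are all hypothetical simplifications.

```java
import java.util.Arrays;
import java.util.List;

public class ReaderReuseSketch {

  /** Hypothetical reader bound to one schema. */
  static class Reader {
    final List<String> schema;

    Reader(List<String> schema) {
      this.schema = schema;
    }
  }

  private Reader reader;
  private int buildCount;

  /** Returns the existing reader if the schema is unchanged, else a new one. */
  Reader readerFor(List<String> batchSchema) {
    if (reader == null || !reader.schema.equals(batchSchema)) {
      reader = new Reader(batchSchema);
      buildCount++;
    }
    return reader;
  }

  int buildCount() {
    return buildCount;
  }

  public static void main(String[] args) {
    ReaderReuseSketch sketch = new ReaderReuseSketch();
    sketch.readerFor(Arrays.asList("a", "b"));
    sketch.readerFor(Arrays.asList("a", "b"));      // same schema: reused
    sketch.readerFor(Arrays.asList("a", "b", "c")); // schema change: rebuilt
    System.out.println(sketch.buildCount());
  }
}
```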

Adds a unit test for the result set reader.

Adds a "rebind" capability to the row set readers to focus

on new buffers under an existing set of vectors. Used when

a new batch arrives, if the schema is unchanged.

Extends row set classes to be aware of the BatchAccessor class

which encapsulates a container and optional selection vector,

and tracks schema changes.

Moves row set tests into the same package as the row sets.

(Row set classes were moved a while back, but the tests were

not moved.)

Renames some BatchAccessor methods.

closes #1897

    • -2
    • +2
    ./protocol/TestOperatorRecordBatch.java
    • -16
    • +16
    ./scan/TestFileScanFramework.java
    • -17
    • +17
    ./scan/TestScanOperExecEarlySchema.java
    • -20
    • +20
    ./scan/TestScanOperExecLateSchema.java
    • -8
    • +8
    ./scan/TestScanOperExecOuputSchema.java
  1. … 55 more files in changeset.
DRILL-7445: Create batch copier based on result set framework

The result set framework now provides both a reader and writer.

This PR provides a copier that copies batches using this

framework. Such a copier can:

- Copy selected records

- Copy all records, such as for an SV2 or SV4

The copier uses the result set loader to create uniformly-sized

output batches from input batches of any size. It does this

by merging or splitting input batches as needed.
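
The merge-or-split idea can be sketched as below, assuming batches are plain lists of rows. The real copier works on value vectors through the result set loader; the `rebatch` method and its row-count target are hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RebatchSketch {

  /** Regroups rows from input batches of any size into batches of targetSize. */
  static <T> List<List<T>> rebatch(List<List<T>> input, int targetSize) {
    if (targetSize <= 0) {
      throw new IllegalArgumentException("targetSize must be positive");
    }
    List<List<T>> output = new ArrayList<>();
    List<T> current = new ArrayList<>(targetSize);
    for (List<T> batch : input) {
      for (T row : batch) {
        current.add(row);
        if (current.size() == targetSize) {
          // Output batch is full: emit it and start a new one.
          output.add(current);
          current = new ArrayList<>(targetSize);
        }
      }
    }
    if (!current.isEmpty()) {
      output.add(current);  // final, possibly short, batch
    }
    return output;
  }

  public static void main(String[] args) {
    List<List<Integer>> input = Arrays.asList(
        Arrays.asList(1, 2),
        Arrays.asList(3, 4, 5, 6, 7),
        Arrays.asList(8));
    System.out.println(rebatch(input, 3));
  }
}
```

Small inputs are merged and large inputs are split, so every output batch except possibly the last has the same row count.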

Since the result set reader handles both SV2 and SV4s, the

copier can filter or reorder rows based on the SV associated

with the input batch.

This version assumes a single stream of input batches, and handles

any schema changes in that input by creating output batches

that track the input schema. This would be used in, say, the

selection vector remover. A different design is needed for merging,

such as in the merging receiver.

Adds a "copy" method to the column writers. Copy is implemented

by doing a direct memory copy from source to destination vectors.

A unit test verifies functionality for various use cases

and data types.

closes #1899

    • -5
    • +4
    ./protocol/TestOperatorRecordBatch.java
  1. … 39 more files in changeset.
DRILL-7441: Fix issues with fillEmpties, offset vectors

Fixes subtle issues with offset vectors and "fill empties"

logic.

Drill has an informal standard that if a batch has no rows, then

offset vectors within that batch should have zero size. Contrast

this with batches of size 1 that should have offset vectors of

size 2. Changed to enforce this rule throughout.
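
The rule can be illustrated with a small sketch that uses plain `int[]` arrays in place of Drill's offset vectors. The helper names are hypothetical, and string length in characters stands in for the byte width of each value.

```java
import java.util.Arrays;

public class OffsetVectorRuleSketch {

  /** Builds offsets for variable-width values under the zero-size rule. */
  static int[] buildOffsets(String[] values) {
    if (values.length == 0) {
      return new int[0];  // zero rows: zero-size offset vector
    }
    int[] offsets = new int[values.length + 1];  // n rows: n + 1 offsets
    for (int i = 0; i < values.length; i++) {
      // Character count stands in for the byte width of the value.
      offsets[i + 1] = offsets[i] + values[i].length();
    }
    return offsets;
  }

  /** Checks the length rule, the leading 0, and that offsets never decrease. */
  static boolean isValid(int rowCount, int[] offsets) {
    int expected = rowCount == 0 ? 0 : rowCount + 1;
    if (offsets.length != expected) {
      return false;
    }
    if (rowCount == 0) {
      return true;
    }
    if (offsets[0] != 0) {
      return false;
    }
    for (int i = 1; i < offsets.length; i++) {
      if (offsets[i] < offsets[i - 1]) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(buildOffsets(new String[] {"ab", "", "cde"})));
  }
}
```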

Nullable, repeated and variable-width vectors have "fill empties"

logic that is used in two places: when setting the value count and

when preparing to write a new value. The current logic is not

quite right for either case. Added tests and fixed the code to

properly handle each case.

Revised the batch validator to enforce the rule that 0-sized batches have

offset vectors of length 0. The result was much simpler code.

Added tools to easily print a batch, restoring some code that

was recently lost when the RowSet classes were moved.

Code cleanup in all files touched.

Added logic to "dirty" allocated buffers when testing to ensure

logic is not sensitive to the "pristine" state of new buffers.

Added logic to the column writers to enforce the zero-size-batch rule

for offset vectors. Added unit tests for this case.

Fixed the column writers to set the "lastSet" mutator value for

nullable types since other code relies on this value.

Removed the "setCount" field in nullable vectors: turns out

it is not actually used.

closes #1896

    • -11
    • +11
    ./join/TestMergeJoinAdvanced.java
    • -12
    • +11
    ./writer/TestCorruptParquetDateCorrection.java
  1. … 40 more files in changeset.
DRILL-7439: Batch count fixes for six additional operators

Enables vector checks, and fixes batch count and vector issues for:

* StreamingAggBatch

* RuntimeFilterRecordBatch

* FlattenRecordBatch

* MergeJoinBatch

* NestedLoopJoinBatch

* LimitRecordBatch

Also fixes a zero-size batch validity issue for the CSV reader when

all files contain no data.

Includes code cleanup for files touched in this PR.

closes #1893

  1. … 21 more files in changeset.
DRILL-7436: Fix record count, vector structure issues in several operators

Adds additional vector checks to the BatchValidator.

Enables checking for the following operators:

* FilterRecordBatch

* PartitionLimitRecordBatch

* UnnestRecordBatch

* HashAggBatch

* RemovingRecordBatch

Fixes vector count issues for each of these.

Fixes empty-batch (record count = 0) handling in several of the

above operators. Added a method to VectorContainer to correctly

create an empty batch. (An empty batch, counter-intuitively,

needs vectors allocated to hold the 0 value in the first

position of each offset vector.)

Disables verbose logging for MongoDB tests. Details are written to

the log rather than the console.

Disables two invalid Mongo tests. See DRILL-7428.

Adjusts the expression tree materializer to not add the LATE type

to Union vectors. (See DRILL-7435.)

Ensures that Union vectors contain valid vectors for each subtype.

The present fix is a work-around, see DRILL-7434 for a better

long-term fix.

Cleans up code formatting and other minor issues in each file touched

during the fixes in this PR.

  1. … 36 more files in changeset.
DRILL-7424: Project operator fails to set the container row count

Enabled the "batch validator" for the Project operator. Ran tests.

Exceptions occurred because, in some paths, the Project operator

fails to set the container row count.

Fixes the project operator. Cleans up formatting issues in files

touched during the investigation. Cleaned up batch-related issues

in Project.

  1. … 8 more files in changeset.
DRILL-7414: EVF incorrectly sets buffer writer index after rollover

Enabling the vector validator on the "new" scan operator, in cases

in which overflow occurs, identified that the DrillBuf writer index

was not properly set for repeated vectors.

Enables such checking, adds unit tests, and fixes the writer index

issue.

closes #1878

  1. … 5 more files in changeset.
DRILL-7403: Validate batch checks, vector integrity in unit tests

Enhances the existing record batch checks to check all the various

batch record counts, and to more fully validate all vector types.

This code revealed that virtually all record batches have

problems: they omit setting some record count or other, or they

introduce some form of vector corruption.

Since we want things to work as we make fixes, this change enables

the checks for only one record batch: the "new" scan. Others are

to come as they are fixed.

closes #1871

    • -70
    • +48
    ./validate/TestBatchValidator.java
  1. … 3 more files in changeset.
DRILL-6096: Provide mechanism to configure text writer configuration

1. The format plugin configuration can specify line and field delimiters, quotes and escape characters.

2. System / session options can specify whether the writer should add headers and force quotes.

closes #1873

    • -0
    • +264
    ./writer/TestTextWriter.java
  1. … 19 more files in changeset.
DRILL-7359: Add support for DICT type in RowSet Framework

closes #1870

    • -12
    • +12
    ./scan/project/projSet/TestProjectionSet.java
  1. … 82 more files in changeset.
DRILL-7326: Support repeated lists for CTAS parquet format

closes #1844

  1. … 4 more files in changeset.