DRILL-7441: Fix issues with fillEmpties, offset vectors

Fixes subtle issues with offset vectors and "fill empties"

logic.

Drill has an informal standard that if a batch has no rows, then

offset vectors within that batch should have zero size. Contrast

this with batches of size 1 that should have offset vectors of

size 2. Changed to enforce this rule throughout.
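
As an illustration of the rule above (not Drill code), here is a minimal sketch that builds an offsets array for a batch of variable-width values. The class and method names are hypothetical, and a plain int array stands in for an offset vector.

```java
import java.nio.charset.StandardCharsets;

public class OffsetSketch {

    // Hypothetical helper: returns the offsets for a batch of string values.
    // A batch with no rows gets a zero-length offsets array; a batch with
    // n rows gets n + 1 entries, the first of which is 0.
    public static int[] buildOffsets(String[] values) {
        if (values.length == 0) {
            return new int[0];
        }
        int[] offsets = new int[values.length + 1];
        offsets[0] = 0;
        for (int i = 0; i < values.length; i++) {
            int width = values[i].getBytes(StandardCharsets.UTF_8).length;
            // Each entry marks the end of value i and the start of value i + 1.
            offsets[i + 1] = offsets[i] + width;
        }
        return offsets;
    }
}
```

With this sketch, a one-row batch holding "abc" yields the two offsets 0 and 3, while an empty batch yields no offsets at all.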

Nullable, repeated and variable-width vectors have "fill empties"

logic that is used in two places: when setting the value count and

when preparing to write a new value. The current logic is not

quite right for either case. Added tests and fixed the code to

properly handle each case.
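
The two "fill empties" cases can be sketched as follows. This is a simplified model under stated assumptions, not the actual Drill mutator: the class name, the plain int array, and the method signatures are hypothetical; only the "lastSet" idea comes from the commit message.

```java
import java.util.Arrays;

public class FillEmptiesSketch {
    private final int[] offsets;   // offsets[0] is always 0
    private int lastSet = -1;      // index of the last value written or filled

    public FillEmptiesSketch(int capacity) {
        offsets = new int[capacity + 1];
    }

    // Back-fills skipped positions so that offsets[upTo] is valid.
    // Each skipped value is treated as zero-width.
    private void fillEmpties(int upTo) {
        for (int i = lastSet + 1; i < upTo; i++) {
            offsets[i + 1] = offsets[i];
        }
        lastSet = Math.max(lastSet, upTo - 1);
    }

    // Case 1: preparing to write a new value at the given index.
    public void write(int index, int width) {
        fillEmpties(index);
        offsets[index + 1] = offsets[index] + width;
        lastSet = index;
    }

    // Case 2: setting the value count at the end of the batch.
    public int[] setValueCount(int count) {
        if (count == 0) {
            return new int[0];     // zero-size-batch rule
        }
        fillEmpties(count);
        return Arrays.copyOf(offsets, count + 1);
    }
}
```

In this model, writing a 5-byte value at index 2 and then setting the value count to 5 produces offsets 0, 0, 0, 5, 5, 5: positions 0, 1, 3 and 4 were never written, so each is back-filled as an empty value.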

Revised the batch validator to enforce the rule that offset vectors

have length 0 in 0-sized batches. The result was much simpler code.

Added tools to easily print a batch, restoring some code that

was recently lost when the RowSet classes were moved.

Code cleanup in all files touched.

Added logic to "dirty" allocated buffers when testing to ensure

logic is not sensitive to the "pristine" state of new buffers.
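
A minimal sketch of the "dirty buffer" idea, assuming a plain java.nio.ByteBuffer as a stand-in for Drill's own buffer type. The class name, method name, and fill pattern are hypothetical.

```java
import java.nio.ByteBuffer;

public class DirtyBufferSketch {
    // Arbitrary non-zero pattern (an assumption, not Drill's actual value).
    public static final byte DIRTY = (byte) 0xDE;

    // Allocates a buffer and overwrites every byte so that code under test
    // cannot rely on fresh memory being zero-filled.
    public static ByteBuffer allocateDirty(int size) {
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (int i = 0; i < size; i++) {
            buf.put(i, DIRTY);
        }
        return buf;
    }
}
```

Code that forgets to write the leading 0 of an offset vector, for example, would read the dirty pattern instead of a zero and fail visibly in tests.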

Added logic to the column writers to enforce the zero-size-batch rule

for offset vectors. Added unit tests for this case.

Fixed the column writers to set the "lastSet" mutator value for

nullable types since other code relies on this value.

Removed the "setCount" field in nullable vectors: turns out

it is not actually used.

closes #1896

  1. … 43 more files in changeset.
DRILL-7436: Fix record count, vector structure issues in several operators

Adds additional vector checks to the BatchValidator.

Enables checking for the following operators:

* FilterRecordBatch

* PartitionLimitRecordBatch

* UnnestRecordBatch

* HashAggBatch

* RemovingRecordBatch

Fixes vector count issues for each of these.

Fixes empty-batch (record count = 0) handling in several of the

above operators. Added a method to VectorContainer to correctly

create an empty batch. (An empty batch, counter-intuitively,

needs vectors allocated to hold the 0 value in the first

position of each offset vector.)

Disables verbose logging for MongoDB tests. Details are written to

the log rather than the console.

Disables two invalid Mongo tests. See DRILL-7428.

Adjusts the expression tree materializer to not add the LATE type

to Union vectors. (See DRILL-7435.)

Ensures that Union vectors contain valid vectors for each subtype.

The present fix is a work-around; see DRILL-7434 for a better

long-term fix.

Cleans up code formatting and other minor issues in each file touched

during the fixes in this PR.

  1. … 36 more files in changeset.
DRILL-7403: Validate batch checks, vector integrity in unit tests

Enhances the existing record batch checks to check all the various

batch record counts, and to more fully validate all vector types.

This code revealed that virtually all record batches have

problems: they omit setting some record count or other, or they

introduce some form of vector corruption.

Since we want things to work as we make fixes, this change enables

the checks for only one record batch: the "new" scan. Others are

to come as they are fixed.

closes #1871

  1. … 3 more files in changeset.
DRILL-7350: Move RowSet related classes from test folder

  1. … 292 more files in changeset.
DRILL-7310: Move schema-related classes from exec module to be able to use them in metastore module

closes #1816

  1. … 102 more files in changeset.
DRILL-6952: Host compliant text reader on the row set framework

The result set loader allows controlling batch sizes. The new scan framework

built on top of that framework handles projection, implicit columns, null

columns and more. This commit converts the "new" ("compliant") text reader

to use the new framework. Options select the use of the V2 ("new") or V3

(row-set based) versions. Unit tests demonstrate V3 functionality.

closes #1683

  1. … 58 more files in changeset.
DRILL-6901: Move schema builder to src/main

Moves the SchemaBuilder class out of the src/test namespace into the src/main namespace. Specifically, into the existing record.metadata package.

Many files changed in this move. Corrected two minor issues: import of the wrong Arrays class and unnecessary annotations.

  1. … 89 more files in changeset.
DRILL-6386: Remove unused imports and star imports.

  1. … 231 more files in changeset.
DRILL-6320: Fixed license headers.

closes #1207

  1. … 2066 more files in changeset.
DRILL-6210: Enhanced test schema utilities

closes #1150

  1. … 55 more files in changeset.
DRILL-6027:

- Added memory calculator

- Added unit tests and docs.

- Fixed IOB caused by output vector allocation.

- Don't double count records that were spilled in HashJoin

  1. … 55 more files in changeset.
DRILL-5783, DRILL-5841, DRILL-5894: Rationalize test temp directories

This change includes:

DRILL-5783:

- A unit test is created for the priority queue in the TopN operator.

- The code generation classes passed around a completely unused function registry reference in some places so it is removed.

- The priority queue had unused parameters for some of its methods so they are removed.

DRILL-5841:

- Created standardized temp directory classes DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher, and updated all unit tests to use them.

DRILL-5894:

- Removed the dfs_test storage plugin for tests and replaced it with the already existing dfs storage plugin.

Misc:

- General code cleanup.

- Removed unnecessary use of String.format in the tests.

This closes #984

  1. … 365 more files in changeset.
DRILL-5832: Change OperatorFixture to use system option manager

- Rename FixtureBuilder to ClusterFixtureBuilder

- Provide alternative way to reset system/session options

- Fix for DRILL-5833: random failure in TestParquetWriter

- Provide strict, but clear, errors for missing options

closes #970

  1. … 51 more files in changeset.
DRILL-5657: Size-aware vector writer structure

- Vector and accessor layer

- Row Set layer

- Tuple and column models

- Revised write-time metadata

- "Result set loader" layer

this closes #914

  1. … 187 more files in changeset.
DRILL-5504: Add vector validator to diagnose offset vector issues

Validates offset vectors in VarChar and repeated vectors. Validates the

special case of repeated VarChar vectors (two layers of offsets.)
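
The kind of check involved can be sketched as below. This is an illustration only, not the validator added by this commit: the class name, method signature, and the exact set of checks are assumptions, and a plain int array stands in for an offset vector.

```java
public class OffsetValidatorSketch {

    // Returns null if the offsets look consistent, else a description of the
    // first problem found. dataLength is the size of the data the offsets
    // index into (bytes for VarChar, inner values for a repeated vector).
    public static String validate(int[] offsets, int dataLength) {
        if (offsets.length == 0) {
            return null;   // nothing to check for an empty offsets array
        }
        if (offsets[0] != 0) {
            return "first offset is " + offsets[0] + ", expected 0";
        }
        for (int i = 1; i < offsets.length; i++) {
            if (offsets[i] < offsets[i - 1]) {
                return "offset " + i + " decreases from "
                    + offsets[i - 1] + " to " + offsets[i];
            }
        }
        int last = offsets[offsets.length - 1];
        if (last > dataLength) {
            return "last offset " + last + " exceeds data length " + dataLength;
        }
        return null;
    }
}
```

For a repeated VarChar, the same check could be applied twice in this model: once to the outer offsets against the inner value count, and once to the inner offsets against the byte length of the data.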

Provides two new session variables to turn on validation. One enables

the existing operator (iterator) validation, the other adds vector

validation. This allows validation to occur in a “production” Drill

(without restarting Drill with assertions, as previously required.)

Unit tests validate the validator. Another test validates the

integration, but requires manual steps, so is ignored by default.

This version is first-cut: all work is done within a single class.

This allows back-porting to an earlier version to solve a specific issue. A

revision should move some of the work into generated code (or refactor

vectors to allow outside access), since offset vectors appear on each

subclass, not on a base class that would allow generic operations.

* Added boot-time options to allow enabling vector validation in Maven

unit tests.

* Code cleanup per suggestions.

* Additional (manual) tests for boot-time options and default options.

closes #832

    ./TestBatchValidator.java (-0, +323)
    ./TestValidationOptions.java (-0, +135)
  1. … 8 more files in changeset.