DRILL-7418: MetadataDirectGroupScan improvements

1. Replaced files listing with selection root information to reduce query plan size in MetadataDirectGroupScan.

2. Fixed MetadataDirectGroupScan ser / de issues.

3. Added PlanMatcher to QueryBuilder for more convenient plan matching.

4. Rewrote TestConvertCountToDirectScan to use ClusterTest.

5. Refactoring and code clean up.

  1. … 11 more files in changeset.
DRILL-7412: Minor unit test improvements

Many tests intentionally trigger errors. A debug-only log setting

sent those errors to stdout. The resulting stack dumps simply cluttered

the test output, so error output to the console is now disabled.

Drill can apply bounds checks to vectors. Tests run via Maven

enable bounds checking. Now, bounds checking is also enabled in

"debug mode" (when assertions are enabled, as in an IDE.)

Drill contains two test frameworks. The older BaseTestQuery was

marked as deprecated, but many tests still use it and are unlikely

to be changed soon. So, removed the deprecated marker to reduce the

number of spurious warnings.

Also includes a number of minor clean-ups.

closes #1876

  1. … 15 more files in changeset.
DRILL-7402: Suppress batch dumps for expected failures in tests

Drill provides a way to dump the last few batches when an error

occurs. However, in tests, we often deliberately cause something

to fail. In this case, the batch dump is unnecessary.

This enhancement adds a config property, disabled in tests, that

controls the dump activity. The option is enabled in the one test

that needs it enabled.

closes #1872

  1. … 3 more files in changeset.
DRILL-6096: Provide mechanism to configure text writer configuration

1. Format plugin configuration allows specifying line and field delimiters, quotes and escape characters.

2. System / session options allow specifying whether the writer should add headers or force quotes.

closes #1873

  1. … 19 more files in changeset.
DRILL-7358: Fix COUNT(*) for empty text files

Fixes a subtle error when a text file has a header (and so has a

schema), but is in a COUNT(*) query, so that no columns are

projected. Ensures that, in this case, an empty schema is

treated as a valid result set.

Tests: updated CSV tests to include this case.

closes #1867

  1. … 10 more files in changeset.
DRILL-7368: Fix Iceberg Metastore failure when filter column contains nulls

  1. … 9 more files in changeset.
DRILL-7350: Move RowSet related classes from test folder

  1. … 278 more files in changeset.
DRILL-7310: Move schema-related classes from exec module to be able to use them in metastore module

closes #1816

    • -7
    • +14
    ./rowSet/test/TestRepeatedListAccessors.java
    • -3
    • +2
    ./rowSet/test/TestVariantAccessors.java
  1. … 99 more files in changeset.
DRILL-7306: Disable schema-only batch for new scan framework

The EVF framework is set up to return a "fast schema" empty batch

with only schema as its first batch because, when the code was

written, it seemed that's how we wanted operators to work. However,

DRILL-7305 notes that many operators cannot handle empty batches.

Since the empty-batch bugs show that Drill does not, in fact,

provide a "fast schema" batch, this ticket asks to disable the

feature in the new scan framework. The feature is disabled with

a config option; it can be re-enabled if ever it is needed.

SQL differentiates between two subtle cases, and both are

supported by this change.

1. Empty results: the query found a schema, but no rows

are returned. If no reader returns any rows, but at

least one reader provides a schema, then the scan

returns an empty batch with the schema.

2. Null results: the query found no schema or rows. No

schema is returned. If no reader returns rows or

schema, then the scan returns no batch: it instead

immediately returns a DONE status.

For CSV, an empty file with headers returns the null result set

(because we don't know the schema.) An empty CSV file without headers

returns an empty result set because we do know the schema: it will

always be the columns array.
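
The two cases above can be sketched as a small decision function. This is only an illustrative model: the `ReaderResult` and `Outcome` types are hypothetical and are not Drill classes, and a missing schema is represented here as `null`.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative model only; these types are hypothetical, not Drill classes.
final class ScanOutcomeSketch {
  enum Outcome { ROWS, EMPTY_BATCH_WITH_SCHEMA, DONE_NO_BATCH }

  /** What one reader produced: a schema (null if unknown) and a row count. */
  static final class ReaderResult {
    final List<String> schema;
    final int rowCount;

    ReaderResult(List<String> schema, int rowCount) {
      this.schema = schema;
      this.rowCount = rowCount;
    }
  }

  /** Decides what the scan as a whole returns downstream. */
  static Outcome resolve(List<ReaderResult> readers) {
    boolean sawSchema = false;
    for (ReaderResult r : readers) {
      if (r.rowCount > 0) {
        return Outcome.ROWS;             // ordinary case: data batches
      }
      if (r.schema != null) {
        sawSchema = true;                // no rows, but a schema is known
      }
    }
    // Empty result: schema but no rows. Null result: neither.
    return sawSchema ? Outcome.EMPTY_BATCH_WITH_SCHEMA : Outcome.DONE_NO_BATCH;
  }

  public static void main(String[] args) {
    // Empty CSV file that should have had headers: no schema can be inferred.
    ReaderResult emptyWithHeaders = new ReaderResult(null, 0);
    // Empty CSV file without headers: the schema is the columns array.
    ReaderResult emptyNoHeaders = new ReaderResult(Arrays.asList("columns"), 0);
    System.out.println(resolve(Arrays.asList(emptyWithHeaders)));
    System.out.println(resolve(Arrays.asList(emptyNoHeaders)));
  }
}
```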

Old tests validate the original schema-batch mode; new tests were

added to validate the no-schema-batch mode.

  1. … 42 more files in changeset.
DRILL-7258: Remove field width limit for text reader

The V2 text reader enforced a limit of 64K characters when using

column headers, but not when using the columns[] array. The V3 reader

enforced the 64K limit in both cases.

This patch removes the limit in both cases. The limit now is the

16MB vector size limit. With headers, no one column can exceed 16MB.

With the columns[] array, no one row can exceed 16MB. (The 16MB

limit is set by the Netty memory allocator.)

Added an "appendBytes()" method to the scalar column writer which adds

additional bytes to those already written for a specific column or

array element value. The method is implemented for VarChar, Var16Char

and VarBinary vectors. It throws an exception for all other types.
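
A minimal sketch of the append idea, assuming a simplified writer interface. The names below (`ScalarWriterSketch`, `VarCharWriterSketch`, `FixedWidthWriterSketch`) are hypothetical and do not reproduce Drill's column writer API; the point is only that an append extends the value most recently started, while fixed-width types reject the call.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a scalar writer with an append operation.
final class AppendBytesSketch {
  interface ScalarWriterSketch {
    void setBytes(byte[] value, int len);     // start a new value
    void appendBytes(byte[] value, int len);  // extend the current value
  }

  /** Variable-width writer: one shared buffer plus the start offset of each value. */
  static final class VarCharWriterSketch implements ScalarWriterSketch {
    private final ByteArrayOutputStream data = new ByteArrayOutputStream();
    private final List<Integer> startOffsets = new ArrayList<>();

    @Override
    public void setBytes(byte[] value, int len) {
      startOffsets.add(data.size());
      data.write(value, 0, len);
    }

    @Override
    public void appendBytes(byte[] value, int len) {
      if (startOffsets.isEmpty()) {
        throw new IllegalStateException("no value to append to");
      }
      data.write(value, 0, len);  // the current value simply grows
    }

    String valueAt(int index) {
      int start = startOffsets.get(index);
      int end = index + 1 < startOffsets.size() ? startOffsets.get(index + 1) : data.size();
      return new String(data.toByteArray(), start, end - start, StandardCharsets.UTF_8);
    }
  }

  /** Fixed-width writer: byte-oriented calls make no sense, so they throw. */
  static final class FixedWidthWriterSketch implements ScalarWriterSketch {
    @Override
    public void setBytes(byte[] value, int len) {
      throw new UnsupportedOperationException("setBytes on fixed-width column");
    }

    @Override
    public void appendBytes(byte[] value, int len) {
      throw new UnsupportedOperationException("appendBytes on fixed-width column");
    }
  }

  public static void main(String[] args) {
    VarCharWriterSketch writer = new VarCharWriterSketch();
    byte[] head = "hello, ".getBytes(StandardCharsets.UTF_8);
    byte[] tail = "world".getBytes(StandardCharsets.UTF_8);
    writer.setBytes(head, head.length);
    writer.appendBytes(tail, tail.length);
    System.out.println(writer.valueAt(0));
  }
}
```

A reader that hits its internal buffer limit mid-field could use this pattern to flush the first part with the set call and the remainder with one or more append calls.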

When used with a type conversion shim, the appendBytes() method throws

an exception. This should be OK because the previous setBytes() call should

already have failed: a huge value is not acceptable for numeric or date

type conversions.

Added unit tests of the append feature, and for the append feature in

the batch overflow case (when appending bytes causes the vector or

batch to overflow.) Also added tests to verify the lack of column width

limit with the text reader, both with and without headers.

closes #1802

    • -0
    • +3
    ./rowSet/test/TestFixedWidthWriter.java
    • -0
    • +89
    ./rowSet/test/TestScalarAccessors.java
  1. … 21 more files in changeset.
DRILL-7278: Refactor result set loader projection mechanism

Drill 1.16 added an enhanced scan framework based on the row set

mechanisms, and a "provisioned schema" feature built on top

of that framework. Conversion of the log reader plugin to use

the framework identified additional features we wish to add,

such as marking a column as "special" (not expanded in a wildcard

query.)

This work identified that the code added for provisioned schemas in

Drill 1.16 worked, but is a bit overly complex, making it hard to add

the desired new feature.

This patch refactors the "reader" projection code:

* Create a "projection set" mechanism that the reader can query to ask,

"the caller just added a column. Should it be projected or not?"

* Unifies the type conversion mechanism added as part of provisioned

schemas.

* Added the "special column" property for both "reader" and "provided"

schemas.

* Verified that provisioned schemas work with maps (at least on the scan

framework side.)

* Replaced the previous "schema transformer" mechanism with a new "type

conversion" mechanism that unifies type conversion, provided schemas

and an optional custom type conversion mechanism.

* Column writers can report if they are projected. Moved this query

from metadata to the column writer itself.

* Extended and clarified documentation of the feature.

* Revised and/or added unit tests.
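
The "projection set" question described in the list above might look roughly like the sketch below. The class and method names are hypothetical, not Drill's. It also assumes that a "special" column is still projected when the query names it explicitly, and that column names are matched case-insensitively; the commit message states only that special columns are not expanded in a wildcard query.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical projection set: answers "should this new column be projected?"
final class ProjectionSetSketch {
  private final boolean wildcard;
  private final Set<String> explicitColumns = new HashSet<>();

  private ProjectionSetSketch(boolean wildcard, String... columns) {
    this.wildcard = wildcard;
    for (String c : columns) {
      explicitColumns.add(c.toLowerCase(Locale.ROOT));
    }
  }

  /** Models SELECT * */
  static ProjectionSetSketch wildcard() {
    return new ProjectionSetSketch(true);
  }

  /** Models SELECT a, b, ... */
  static ProjectionSetSketch explicit(String... columns) {
    return new ProjectionSetSketch(false, columns);
  }

  /**
   * Called by the reader when it adds a column.
   * Assumption: special columns are skipped by the wildcard but can
   * still be requested by name.
   */
  boolean isProjected(String columnName, boolean special) {
    if (wildcard) {
      return !special;
    }
    return explicitColumns.contains(columnName.toLowerCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    ProjectionSetSketch star = ProjectionSetSketch.wildcard();
    System.out.println(star.isProjected("message", false));
    System.out.println(star.isProjected("_raw", true));
    ProjectionSetSketch named = ProjectionSetSketch.explicit("_raw");
    System.out.println(named.isProjected("_raw", true));
  }
}
```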

closes #1797

  1. … 72 more files in changeset.
DRILL-7257: Set nullable var-width vector lastSet value

Turns out this is due to a subtle issue with variable-width nullable

vectors. Such vectors have a lastSet attribute in the Mutator class.

When using "transfer pairs" to copy values, the code somehow decides

to zero-fill from the lastSet value to the record count. The row set

framework did not set this value, meaning that the RemovingRecordBatch

zero-filled the dir0 column when it chose to use transfer pairs rather

than copying values. The use of transfer pairs occurs when all rows in

a batch pass the filter prior to the removing record batch.

Modified the nullable vector writer to properly set the lastSet value at

the end of each batch. Added a unit test to verify the value is set

correctly.
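
The failure mode can be illustrated with a deliberately simplified model of a variable-width vector. This is not Drill's vector code: the class below is hypothetical and keeps only an offsets array and a lastSet marker, which is enough to show why a stale lastSet causes written values to be overwritten with empty ones when the value count is set.

```java
// Simplified, hypothetical model of a variable-width vector with a lastSet marker.
final class LastSetSketch {
  final int[] offsets;                       // value i spans offsets[i]..offsets[i + 1]
  final StringBuilder data = new StringBuilder();
  int lastSet = -1;                          // index of the last value known to be written

  LastSetSketch(int capacity) {
    offsets = new int[capacity + 1];
  }

  /** Sequential direct write that does NOT maintain lastSet, as a bulk writer might. */
  void writeDirect(int index, String value) {
    data.append(value);
    offsets[index + 1] = data.length();
  }

  /** Marks every slot after lastSet as empty, up to valueCount. */
  void setValueCount(int valueCount) {
    for (int i = lastSet + 1; i < valueCount; i++) {
      offsets[i + 1] = offsets[i];           // zero-length value
    }
    lastSet = valueCount - 1;
  }

  String get(int index) {
    return data.substring(offsets[index], offsets[index + 1]);
  }

  public static void main(String[] args) {
    LastSetSketch broken = new LastSetSketch(2);
    broken.writeDirect(0, "a");
    broken.writeDirect(1, "bb");
    broken.setValueCount(2);                 // lastSet still -1: both values wiped
    System.out.println("[" + broken.get(0) + "] [" + broken.get(1) + "]");

    LastSetSketch fixed = new LastSetSketch(2);
    fixed.writeDirect(0, "a");
    fixed.writeDirect(1, "bb");
    fixed.lastSet = 1;                       // the fix: record lastSet at end of batch
    fixed.setValueCount(2);
    System.out.println("[" + fixed.get(0) + "] [" + fixed.get(1) + "]");
  }
}
```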

Includes a bit of code clean-up.

    • -4
    • +9
    ./rowSet/test/TestScalarAccessors.java
  1. … 8 more files in changeset.
DRILL-7251: Read Hive array w/o nulls

1. HiveFieldConverter replaced by Hive writers for primitives

2. Created HiveValueWriterFactory and HiveListWriter to implement arrays support

4. Readers generation replaced by HiveDefaultRecordReader and HiveTextRecordReader

5. Several reader initializers replaced by one

6. Added method to repeated vardecimal writer

7. Minor fix for array column in View

  1. … 53 more files in changeset.
DRILL-4782 / DRILL-7139: Fix DATE_ADD and TO_TIME functions

- cast function for the day interval changed to round milliseconds to complete days

- ToDateTypeFunctions#toTime now returns milliseconds of day

- updated how DayInterval subtracts and adds, to follow the cast function logic
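
As a rough illustration of "milliseconds of day", the sketch below reduces an epoch-millisecond timestamp to its offset within a day. The helper name is hypothetical and the input is assumed to be UTC epoch milliseconds; this is not the code from ToDateTypeFunctions. `Math.floorMod` is used so that timestamps before the epoch still yield a non-negative result.

```java
// Hypothetical helper; assumes the input is UTC epoch milliseconds.
final class TimeOfDaySketch {
  static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

  /** Returns the milliseconds elapsed since midnight, in the range 0..86_399_999. */
  static int millisOfDay(long epochMillis) {
    return (int) Math.floorMod(epochMillis, MILLIS_PER_DAY);
  }

  public static void main(String[] args) {
    // One full day plus 1 hour, 2 minutes, 3 seconds.
    long ts = MILLIS_PER_DAY + (1 * 3600 + 2 * 60 + 3) * 1000L;
    System.out.println(millisOfDay(ts));
  }
}
```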

UT core updates:

- added vectorValue function to the queryBuilder to simplify retrieving value of the vector

- refactored singleton query result functions at queryBuilder

  1. … 6 more files in changeset.
DRILL-7143: Support default value for empty columns

Modifies the prior work to add default values for columns. The prior work added defaults

when the entire column is missing from a reader (the old Nullable Int column). The Row

Set mechanism now will also "fill empty" slots with the default value.

Added default support for the column writers. The writers automatically obtain the

default value from the column schema. The default can also be set explicitly on

the column writer.
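
A minimal sketch of "fill empties" with a default, using a hypothetical INT column writer backed by a list rather than a vector. It models the behavior this commit message describes: skipped slots in a required column receive the default, while a nullable column ignores the default and receives NULL. Falling back to 0 when a required column has no default is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical column writer; a List stands in for the value vector.
final class FillEmptiesSketch {
  private final Integer fillValue;
  private final List<Integer> values = new ArrayList<>();

  FillEmptiesSketch(boolean nullable, Integer defaultValue) {
    if (nullable) {
      this.fillValue = null;              // default ignored: NULL is the fill value
    } else {
      // Assumption: a required column with no default falls back to 0.
      this.fillValue = defaultValue == null ? Integer.valueOf(0) : defaultValue;
    }
  }

  /** Writes a value at rowIndex, first filling any rows skipped since the last write. */
  void set(int rowIndex, int value) {
    fillEmpties(rowIndex);
    values.add(value);
  }

  /** Pads the column out to the batch row count. */
  void endBatch(int rowCount) {
    fillEmpties(rowCount);
  }

  private void fillEmpties(int upTo) {
    while (values.size() < upTo) {
      values.add(fillValue);
    }
  }

  List<Integer> values() {
    return values;
  }

  public static void main(String[] args) {
    FillEmptiesSketch required = new FillEmptiesSketch(false, 7);
    required.set(0, 1);
    required.set(3, 4);                   // rows 1 and 2 were skipped
    required.endBatch(5);                 // row 4 never written
    System.out.println(required.values());
  }
}
```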

Updated the null column mechanism to use this feature rather than the ad-hoc

implementation in the prior commit.

Semantics changed a bit. Only Required columns take a default. The default value

is ignored for nullable columns since nullable columns already have a default: NULL.

Other changes:

* Updated the CSV-with-schema tests to illustrate the new behavior.

* Made multiple fixes for Boolean and Decimal columns and added unit tests.

* Upgraded FreeMarker to version 2.3.28 to allow use of the continue statement.

* Reimplemented the Bit column reader and writer to use the BitVector directly since this vector is rather special.

* Added get/set Boolean methods for column accessors

* Moved the BooleanType class to the common package

* Added more CSV unit tests to explore decimal types, booleans, and defaults

* Add special handling for blank fields in from-string conversions

* Added options to the conversion factory to specify blank-handling behavior.

CSV uses a mapping of blanks to null (nullable) or default value (non-nullable)
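
The blank-handling rule for CSV stated above can be sketched as a small from-string conversion. The method below is hypothetical and covers only an INT target; treating whitespace-only fields as blank is an assumption of the sketch.

```java
// Hypothetical from-string conversion with CSV-style blank handling.
final class BlankHandlingSketch {
  /**
   * Converts a text field to an Integer.
   * Blank field: null for a nullable column, the default for a required one.
   */
  static Integer toInt(String field, boolean nullable, int defaultValue) {
    if (field == null || field.trim().isEmpty()) {
      return nullable ? null : Integer.valueOf(defaultValue);
    }
    return Integer.valueOf(field.trim());
  }

  public static void main(String[] args) {
    System.out.println(toInt("", true, 10));    // nullable column, blank field
    System.out.println(toInt("", false, 10));   // required column, blank field
    System.out.println(toInt(" 42 ", false, 10));
  }
}
```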

closes #1726

    • -182
    • +0
    ./rowSet/test/DummyWriterTest.java
    • -4
    • +184
    ./rowSet/test/TestColumnConverter.java
    • -0
    • +182
    ./rowSet/test/TestDummyWriter.java
    • -24
    • +209
    ./rowSet/test/TestFillEmpties.java
    • -2
    • +109
    ./rowSet/test/TestScalarAccessors.java
    • -4
    • +104
    ./rowSet/test/TestSchemaBuilder.java
  1. … 64 more files in changeset.
DRILL-7096: Develop vector for canonical Map<K,V>

- Added new type DICT;

- Created value vectors for the type for single and repeated modes;

- Implemented corresponding FieldReaders and FieldWriters;

- Made changes in EvaluationVisitor to be able to read values from the map by key;

- Made changes to DrillParquetGroupConverter to be able to read Parquet's MAP type;

- Added an option `store.parquet.reader.enable_map_support` to disable reading MAP type as DICT from Parquet files;

- Updated AvroRecordReader to use new DICT type for Avro's MAP;

- Added support of the new type to ParquetRecordWriter.

  1. … 107 more files in changeset.
DRILL-7011: Support schema in scan framework

* Adds schema support to the row set-based scan framework and to the "V3" text reader based on that framework.

* Adding the schema made clear that passing options as a long list of constructor arguments was not sustainable. Refactored code to use a builder pattern instead.

* Added support for default values in the "null column loader", which required adding a "setValue" method to the column accessors.

* Added unit tests for all new or changed functionality. See TestCsvWithSchema for the overall test of the entire integrated mechanism.

* Added tests for explicit projection with schema

* Better handling of date/time in column accessors

* Converted recent column metadata work from Java 8 date/time to Joda.

* Added more CSV-with-schema unit tests

* Removed the ID fields from "resolved columns", used "instanceof" instead.

* Added wildcard projection with an output schema. Handles both "lenient" and "strict" schemas.

* Tagged projection columns with their output schema, when available.

* Scan projection added modes for wildcard with an output schema. The reader projection added support for merging reader and output schemas.

* Includes refactoring of scan operator tests (the test file grew too large.)

* Renamed some classes to avoid confusing reader schemas with output schemas.

* Added unit tests for the new functionality.

* Added "lenient" wildcard with schema test for CSV

* Added more type conversions: string-to-bit, many-to-string

* Fixed bug in column writer for VarDecimal

* Added missing unit tests, and fixed bugs, in Bit column reader/writer

* Cleaned up a number of unneeded "SuppressWarnings"
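
Two ideas from the list above, the builder pattern for options and wildcard expansion against an output schema, can be sketched together. All names here are hypothetical. The semantics are an assumption of the sketch, since the commit message does not define them: a "strict" schema limits the wildcard to the schema's columns, while a "lenient" schema also admits extra columns discovered by the reader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical scan options built with a builder instead of a long constructor.
final class ScanOptionsSketch {
  private final List<String> outputSchema;
  private final boolean strictSchema;

  private ScanOptionsSketch(Builder builder) {
    this.outputSchema = new ArrayList<>(builder.outputSchema);
    this.strictSchema = builder.strictSchema;
  }

  static Builder builder() {
    return new Builder();
  }

  static final class Builder {
    private List<String> outputSchema = new ArrayList<>();
    private boolean strictSchema;          // lenient unless stated otherwise

    Builder outputSchema(String... columns) {
      this.outputSchema = Arrays.asList(columns);
      return this;
    }

    Builder strictSchema(boolean strict) {
      this.strictSchema = strict;
      return this;
    }

    ScanOptionsSketch build() {
      return new ScanOptionsSketch(this);
    }
  }

  /**
   * Expands SELECT * given the columns the reader discovered.
   * Assumed semantics: schema columns come first; reader-only columns are
   * appended only when the schema is lenient.
   */
  List<String> expandWildcard(List<String> readerColumns) {
    Set<String> result = new LinkedHashSet<>(outputSchema);
    if (!strictSchema) {
      result.addAll(readerColumns);
    }
    return new ArrayList<>(result);
  }

  public static void main(String[] args) {
    List<String> readerColumns = Arrays.asList("b", "c");
    ScanOptionsSketch strict = builder().outputSchema("a", "b").strictSchema(true).build();
    ScanOptionsSketch lenient = builder().outputSchema("a", "b").build();
    System.out.println(strict.expandWildcard(readerColumns));
    System.out.println(lenient.expandWildcard(readerColumns));
  }
}
```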

closes #1711

    • -21
    • +253
    ./rowSet/test/TestColumnConverter.java
    • -2
    • +0
    ./rowSet/test/TestIndirectReaders.java
    • -112
    • +234
    ./rowSet/test/TestScalarAccessors.java
    • -2
    • +0
    ./rowSet/test/TestVariantAccessors.java
  1. … 210 more files in changeset.
DRILL-7086: Output schema for row set mechanism

Enhances the row set mechanism to take an "output schema" that describes the vectors to

create. The "input schema" describes the type that the reader would like to write. A

conversion mechanism inserts a conversion shim to convert from the input to output type.

Provides a set of implicit type conversions, including string-to-date/time conversions

which use the new format property stored in column metadata. Includes unit tests for

the new functionality.
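
A conversion shim of the kind described above can be sketched as a writer that wraps another writer. The interface and class names are hypothetical and far smaller than Drill's column writer abstractions. The reader writes its input type (a string), and the shim converts and forwards to the writer for the output type (INT).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, minimal writer abstraction used to illustrate a conversion shim.
final class ConversionShimSketch {
  interface ColumnWriterSketch {
    void setInt(int value);
    void setString(String value);
  }

  /** Base writer for an INT output column; a List stands in for the vector. */
  static final class IntColumnWriter implements ColumnWriterSketch {
    final List<Integer> values = new ArrayList<>();

    @Override
    public void setInt(int value) {
      values.add(value);
    }

    @Override
    public void setString(String value) {
      throw new UnsupportedOperationException("INT column cannot accept a string");
    }
  }

  /** Shim inserted when the reader's input type is string but the output type is INT. */
  static final class StringToIntShim implements ColumnWriterSketch {
    private final ColumnWriterSketch base;

    StringToIntShim(ColumnWriterSketch base) {
      this.base = base;
    }

    @Override
    public void setInt(int value) {
      base.setInt(value);                            // already the output type
    }

    @Override
    public void setString(String value) {
      base.setInt(Integer.parseInt(value.trim()));   // convert, then delegate
    }
  }

  public static void main(String[] args) {
    IntColumnWriter base = new IntColumnWriter();
    ColumnWriterSketch writer = new StringToIntShim(base);
    writer.setString(" 42 ");
    writer.setInt(7);
    System.out.println(base.values);
  }
}
```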

closes #1690

    • -20
    • +490
    ./rowSet/test/TestColumnConverter.java
  1. … 61 more files in changeset.
DRILL-7051: Upgrade jetty

- upgrade Jetty dependencies to 9.3 version

- adaptation to the new Jetty API (SessionHandler, LoginService, AbstractLoginService)

- add JavaDocs and code refactoring

closes #1681

  1. … 9 more files in changeset.
DRILL-7056: Drill fails with NPE when starting in distributed mode & 31010 port is used

closes #1656

  1. … 1 more file in changeset.
DRILL-6952: Host compliant text reader on the row set framework

The result set loader allows controlling batch sizes. The new scan framework

built on top of that framework handles projection, implicit columns, null

columns and more. This commit converts the "new" ("compliant") text reader

to use the new framework. Options select the use of the V2 ("new") or V3

(row-set based) versions. Unit tests demonstrate V3 functionality.

closes #1683

  1. … 57 more files in changeset.
DRILL-5603: Replace String file paths to Hadoop Path

- replaced all String path representation with org.apache.hadoop.fs.Path

- added PathSerDe.Se JSON serializer

- refactoring of DFSPartitionLocation code by leveraging existing listPartitionValues() functionality

closes #1657

  1. … 82 more files in changeset.
DRILL-7024: Refactor ColumnWriter to simplify type-conversion shim

DRILL-7006 added a type conversion "shim" within the row set framework. Basically, we insert a "shim" column writer that takes data in one form (String, say), and does reader-specific conversions to a target format (INT, say).

The code works fine, but the shim class ends up needing to override a bunch of methods which it then passes along to the base writer. This PR refactors the code so that the conversion shim is simpler.

closes #1633

    • -0
    • +150
    ./rowSet/test/TestColumnConverter.java
    • -145
    • +0
    ./rowSet/test/TestColumnConvertor.java
    • -1
    • +4
    ./rowSet/test/TestFixedWidthWriter.java
    • -0
    • +3
    ./rowSet/test/TestHyperVectorReaders.java
    • -0
    • +3
    ./rowSet/test/TestIndirectReaders.java
    • -1
    • +4
    ./rowSet/test/TestOffsetVectorWriter.java
    • -0
    • +3
    ./rowSet/test/TestRepeatedListAccessors.java
    • -0
    • +3
    ./rowSet/test/TestRowSetComparison.java
    • -1
    • +3
    ./rowSet/test/TestScalarAccessors.java
    • -2
    • +4
    ./rowSet/test/TestVariableWidthWriter.java
    • -0
    • +4
    ./rowSet/test/TestVariantAccessors.java
  1. … 53 more files in changeset.
DRILL-7007: Use verify method in row set tests

Many of the early RowSet-based tests used the pattern:

new RowSetComparison(expected).verifyAndClearAll(result);

Revise this to use the simplified form:

RowSetUtilities.verify(expected, result);

The original form is retained when tests use additional functionality, such as the ability to perform multiple verifications on the same expected batch.

closes #1624

    • -3
    • +2
    ./rowSet/test/TestIndirectReaders.java
  1. … 9 more files in changeset.
DRILL-7006: Add type conversion to row writers

Modifies the column metadata and writer abstractions to allow a type conversion "shim" to be specified as part of the schema, then inserted as part of the row set writer. Allows, say, setting an Int or Date from a string, parsing the string to obtain the proper data type to store in the vector.

Type conversion not yet supported in the result set loader: some additional complexity needs to be resolved.

Adds unit tests for this functionality. Refactors some existing tests to remove rough edges.

closes #1623

    • -211
    • +0
    ./rowSet/TestRowSetComparison.java
    • -0
    • +145
    ./rowSet/test/TestColumnConvertor.java
    • -0
    • +838
    ./rowSet/test/TestRowSet.java
    • -0
    • +214
    ./rowSet/test/TestRowSetComparison.java
  1. … 10 more files in changeset.
DRILL-6903: SchemaBuilder code improvements

1. ColumnBuilder: setPrecisionAndScale method

2. SchemaContainer: addColumn method parameter AbstractColumnMetadata was changed to ColumnMetadata

3. MapBuilder / RepeatedListBuilder / UnionBuilder: added constructors without parent, made buildColumn method public

4. TupleMetadata: added toMetadataList method

5. Other refactoring

    • -3
    • +82
    ./rowSet/test/TestSchemaBuilder.java
    • -11
    • +10
    ./rowSet/test/TestVariantAccessors.java
  1. … 24 more files in changeset.
DRILL-6936: TestGracefulShutdown.gracefulShutdownThreadShouldBeInitializedBeforeClosingDrillbit fails if loopback address is set in hosts

closes #1589

DRILL-6919: Fix compilation error in TestGracefulShutdown class for mapr profile

  1. … 1 more file in changeset.
DRILL-6912: NPE when other drillbit is already running

closes #1577

  1. … 1 more file in changeset.
DRILL-6901: Move schema builder to src/main

Moves the SchemaBuilder class out of the src/test name space into the src/main namespace. Specifically, into the existing record.metadata package.

Many files changed in this move. Corrected two minor issues: import of the wrong Arrays class and unnecessary annotations.

    • -96
    • +0
    ./rowSet/schema/RepeatedListBuilder.java
    • -214
    • +0
    ./rowSet/schema/SchemaBuilder.java
    • -29
    • +0
    ./rowSet/schema/SchemaContainer.java
    • -166
    • +0
    ./rowSet/schema/TupleBuilder.java
    • -103
    • +0
    ./rowSet/schema/UnionBuilder.java
  1. … 75 more files in changeset.