DRILL-7414: EVF incorrectly sets buffer writer index after rollover

Enabling the vector validator on the "new" scan operator, in cases

in which overflow occurs, identified that the DrillBuf writer index

was not properly set for repeated vectors.

Enables such checking, adds unit tests, and fixes the writer index

issue.

closes #1878

    • -42
    • +42
    ./drill/exec/physical/impl/MockRecordBatch.java
  1. … 4 more files in changeset.
DRILL-7412: Minor unit test improvements

Many tests intentionally trigger errors. A debug-only log setting

sent those errors to stdout. The resulting stack dumps simply cluttered

the test output, so disabled error output to the console.

Drill can apply bounds checks to vectors. Tests run via Maven

enable bounds checking. Now, bounds checking is also enabled in

"debug mode" (when assertions are enabled, as in an IDE.)

Drill contains two test frameworks. The older BaseTestQuery was

marked as deprecated, but many tests still use it and are unlikely

to be changed soon. So, removed the deprecated marker to reduce the

number of spurious warnings.

Also includes a number of minor clean-ups.

closes #1876

    • -2
    • +3
    ./drill/exec/coord/zk/TestEphemeralStore.java
    • -4
    • +3
    ./drill/exec/record/vector/TestValueVector.java
  1. … 13 more files in changeset.
DRILL-5674: Support ZIP compression

1. Added ZipCodec implementation which can read / write a single file.

2. Revisited Drill plugin formats to ensure 'openPossiblyCompressedStream' method is used in those which support compression.

3. Added unit tests.

4. General refactoring.
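
As a rough illustration of what "read / write a single file" means for a ZIP archive, the sketch below uses only the JDK's java.util.zip classes. It is not the ZipCodec from this change; the class name ZipSingleFileSketch and the entry name data.csv are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not Drill code: a ZIP archive holding exactly one entry.
final class ZipSingleFileSketch {

  // Writes the payload as the only entry of a new in-memory ZIP archive.
  static byte[] compress(String entryName, byte[] payload) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
      zip.putNextEntry(new ZipEntry(entryName));
      zip.write(payload);
      zip.closeEntry();
    }
    // Closing the ZipOutputStream above finishes the archive.
    return bytes.toByteArray();
  }

  // Reads the first entry of the archive and ignores any others.
  static byte[] decompress(byte[] archive) throws IOException {
    try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
      if (zip.getNextEntry() == null) {
        throw new IOException("ZIP archive contains no entries");
      }
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buffer = new byte[4096];
      int n;
      // read() returns -1 at the end of the current entry.
      while ((n = zip.read(buffer)) != -1) {
        out.write(buffer, 0, n);
      }
      return out.toByteArray();
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] original = "a,b\n1,2\n".getBytes(StandardCharsets.UTF_8);
    byte[] restored = decompress(compress("data.csv", original));
    System.out.print(new String(restored, StandardCharsets.UTF_8));
  }
}
```

Reading only the first entry keeps the stream-in, stream-out shape that a compression codec needs; an archive with several entries would need a different abstraction.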

    • -0
    • +111
    ./drill/exec/store/dfs/TestCompressedFiles.java
  1. … 17 more files in changeset.
DRILL-7402: Suppress batch dumps for expected failures in tests

Drill provides a way to dump the last few batches when an error

occurs. However, in tests, we often deliberately cause something

to fail. In this case, the batch dump is unnecessary.

This enhancement adds a config property, disabled in tests, that

controls the dump activity. The option is enabled in the one test

that needs it enabled.

closes #1872

    • -1
    • +1
    ./drill/test/ClusterFixtureBuilder.java
  1. … 2 more files in changeset.
DRILL-7403: Validate batch checks, vector integrity in unit tests

Enhances the existing record batch checks to check all the various

batch record counts, and to more fully validate all vector types.

This code revealed that virtually all record batches have

problems: they omit setting some record count or other, or they

introduce some form of vector corruption.

Since we want things to work as we make fixes, this change enables

the checks for only one record batch: the "new" scan. Others are

to come as they are fixed.

closes #1871

  1. … 3 more files in changeset.
DRILL-7385: Convert PCAP Format Plugin to EVF

    • -0
    • +103
    ./drill/exec/store/pcap/TestPcapEVFReader.java
  1. … 7 more files in changeset.
DRILL-6096: Provide mechanism to configure text writer configuration

1. The format plugin configuration can specify line and field delimiters, and quote and escape characters.

2. System / session options specify whether the writer should add headers and force quotes.
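
To make the effect of these settings concrete, here is a minimal sketch of a delimited-text row formatter driven by the same kinds of settings. It is an illustration under assumed names (TextWriterSketch and its fields are hypothetical), not the writer added by this change, and it leaves out header generation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical formatter, not Drill code: shows how the settings interact.
final class TextWriterSketch {
  private final String fieldDelimiter;
  private final String lineDelimiter;
  private final char quote;
  private final char escape;
  private final boolean forceQuotes;

  TextWriterSketch(String fieldDelimiter, String lineDelimiter,
                   char quote, char escape, boolean forceQuotes) {
    this.fieldDelimiter = fieldDelimiter;
    this.lineDelimiter = lineDelimiter;
    this.quote = quote;
    this.escape = escape;
    this.forceQuotes = forceQuotes;
  }

  // Quotes a field when forced, or when it contains a special character.
  String formatField(String value) {
    boolean needsQuotes = forceQuotes
        || value.contains(fieldDelimiter)
        || value.contains(lineDelimiter)
        || value.indexOf(quote) >= 0
        || value.indexOf(escape) >= 0;
    if (!needsQuotes) {
      return value;
    }
    StringBuilder sb = new StringBuilder().append(quote);
    for (char c : value.toCharArray()) {
      // Inside quotes, prefix quote and escape characters with the escape.
      if (c == quote || c == escape) {
        sb.append(escape);
      }
      sb.append(c);
    }
    return sb.append(quote).toString();
  }

  // Joins the formatted fields and terminates the row.
  String formatRow(List<String> values) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < values.size(); i++) {
      if (i > 0) {
        sb.append(fieldDelimiter);
      }
      sb.append(formatField(values.get(i)));
    }
    return sb.append(lineDelimiter).toString();
  }

  public static void main(String[] args) {
    TextWriterSketch csv = new TextWriterSketch(",", "\n", '"', '"', false);
    System.out.print(csv.formatRow(Arrays.asList("a", "b,c", "say \"hi\"")));
  }
}
```

With the quote character doubling as the escape character, the sketch produces the conventional CSV form in which an embedded quote is written twice.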

closes #1873

    • -0
    • +264
    ./drill/exec/physical/impl/writer/TestTextWriter.java
    • -27
    • +67
    ./drill/test/ClusterFixture.java
  1. … 17 more files in changeset.
DRILL-7377: Nested schemas for dynamic EVF columns

The Result Set Loader (part of EVF) allows adding columns up-front

before reading rows (so-called "early schema.") Such schemas allow

nested columns (maps with members, repeated lists with a type, etc.)

The Result Set Loader also allows adding columns dynamically

while loading data (so-called "late schema".) Previously, the code

assumed that columns would be added top-down: first the map, then

the map's contents, etc.

Charles found a need to allow adding a nested column (a repeated

list with a declared list type.)

This patch revises the code to use the same mechanism in both the

early- and late-schema cases, allowing adding nested columns at

any time.

Testing: Added a new unit test case for the repeated list late

schema with content case.

  1. … 5 more files in changeset.
DRILL-7358: Fix COUNT(*) for empty text files

Fixes a subtle error when a text file has a header (and so has a

schema), but is in a COUNT(*) query, so that no columns are

projected. Ensures that, in this case, an empty schema is

treated as a valid result set.

Tests: updated CSV tests to include this case.

closes #1867

  1. … 9 more files in changeset.
DRILL-7357: Expose Drill Metastore data through information_schema

1. Add additional columns to TABLES and COLUMNS tables.

2. Add PARTITIONS table.

3. General refactoring to adjust information_schema data retrieval from multiple sources.

closes #1860

    • -7
    • +17
    ./drill/exec/sql/TestInfoSchema.java
    • -0
    • +401
    ./drill/exec/sql/TestInfoSchemaWithMetastore.java
  1. … 28 more files in changeset.
DRILL-7373: Fix problems involving reading from DICT type

- Fixed FieldIdUtil to resolve reading from DICT for some complex cases;

- optimized reading from DICT given a key by passing an appropriate Object type to DictReader#find(...) and DictReader#read(...) methods when schema is known (e.g. when reading from Hive tables) instead of generating it on the fly based on int or String path and key type;

- fixed error when accessing a value by a nonexistent key in an Avro table.

    • -0
    • +16
    ./drill/exec/store/avro/AvroFormatTest.java
  1. … 10 more files in changeset.
DRILL-7368: Fix Iceberg Metastore failure when filter column contains nulls

    • -1
    • +1
    ./drill/test/PhysicalOpUnitTestBase.java
  1. … 9 more files in changeset.
DRILL-7168: Implement ALTER SCHEMA ADD / REMOVE commands

    • -0
    • +64
    ./drill/exec/record/metadata/TestTupleSchema.java
  1. … 14 more files in changeset.
DRILL-7362: COUNT(*) on JSON with outer list results in JsonParse error

closes #1849

  1. … 3 more files in changeset.
DRILL-7326: Support repeated lists for CTAS parquet format

closes #1844

  1. … 4 more files in changeset.
DRILL-7350: Move RowSet related classes from test folder

    • -2
    • +2
    ./drill/exec/DrillSeparatePlanningTest.java
    • -4
    • +4
    ./drill/exec/cache/TestBatchSerialization.java
  1. … 278 more files in changeset.
DRILL-4517: Support reading empty Parquet files

1. Modified flat and complex parquet readers to output schema only when requested number of records to read is 0. In this case readers are not initialized to improve performance.

2. Allowed reading requested number of rows instead of all rows in the row group (DRILL-6528).

3. Fixed issue with determining the number of nulls in the row group (fixed IsPredicate#isAllNulls method).

4. Allowed reading empty parquet files by adding an empty / fake row group.

5. General refactoring and unit tests.

6. Parquet tests categorization.

closes #1839

    • -1
    • +2
    ./drill/exec/store/mock/TestMockRowReader.java
    • -0
    • +417
    ./drill/exec/store/parquet/TestEmptyParquet.java
    • -0
    • +4
    ./drill/exec/store/parquet/TestParquetScan.java
  1. … 34 more files in changeset.
DRILL-7337: Add vararg UDFs support

    • -0
    • +36
    ./drill/exec/compile/TestClassTransformation.java
    • -0
    • +325
    ./drill/exec/fn/impl/TestVarArgFunctions.java
    • -0
    • +126
    ./drill/exec/fn/impl/testing/CountArgumentsAggFunctions.java
    • -0
    • +90
    ./drill/exec/fn/impl/testing/CountArgumentsFunctions.java
    • -0
    • +65
    ./drill/exec/fn/impl/testing/InvalidVarargFunctions.java
    • -0
    • +42
    ./drill/exec/fn/impl/testing/VarArgAddFunction.java
    • -0
    • +275
    ./drill/exec/fn/impl/testing/VarCharConcatFunctions.java
  1. … 30 more files in changeset.
DRILL-7335: Fix error when reading csv file with headers only

closes #1834

  1. … 1 more file in changeset.
DRILL-7332: Allow parsing empty schema

closes #1828

  1. … 2 more files in changeset.
DRILL-7327: Log Regex Plugin Won't Recognize Schema

The previous commit revised the plugin config classes to work

with table functions. That caused Jackson to stop working for

the classes. Fixed those issues and added unit tests.

closes #1827

    • -15
    • +65
    ./drill/exec/store/log/TestLogReader.java
  1. … 4 more files in changeset.
DRILL-7205: Drill fails to start when authentication is disabled

closes #1824

  1. … 1 more file in changeset.
DRILL-7314: Use TupleMetadata instead of concrete implementation

1. Add ser / de implementation for TupleMetadata interface based on types.

2. Replace TupleSchema usage where possible.

3. Move patcher classes into commons.

4. Upgrade some dependencies and general refactoring.

    • -0
    • +47
    ./drill/exec/record/metadata/TestTupleSchema.java
    • -6
    • +7
    ./drill/exec/store/pcapng/TestPcapngHeaders.java
  1. … 36 more files in changeset.
DRILL-7315: Revise precision and scale order in the method arguments

    • -5
    • +5
    ./drill/exec/fn/impl/TestCastFunctions.java
  1. … 26 more files in changeset.
DRILL-7307: casthigh for decimal type can lead to the issues with VarDecimalHolder

- Fixed code-gen for VarDecimal type

- Fixed code-gen issue with nullable holders for simple cast functions

with constants passed as arguments.

- Code-gen now honors the DataType.Optional type defined by UDF for

NULL-IF-NULL functions.

    • -0
    • +34
    ./drill/exec/sql/TestSimpleCastFunctions.java
  1. … 9 more files in changeset.
DRILL-7310: Move schema-related classes from exec module to be able to use them in metastore module

closes #1816

    • -20
    • +41
    ./drill/exec/TestEmptyInputSql.java
    • -17
    • +14
    ./drill/exec/cache/TestBatchSerialization.java
    • -2
    • +5
    ./drill/exec/fn/impl/TestCastFunctions.java
  1. … 88 more files in changeset.
DRILL-7306: Disable schema-only batch for new scan framework

The EVF framework is set up to return a "fast schema" empty batch

with only schema as its first batch because, when the code was

written, it seemed that's how we wanted operators to work. However,

DRILL-7305 notes that many operators cannot handle empty batches.

Since the empty-batch bugs show that Drill does not, in fact,

provide a "fast schema" batch, this ticket asks to disable the

feature in the new scan framework. The feature is disabled with

a config option; it can be re-enabled if ever it is needed.

SQL differentiates between two subtle cases, and both are

supported by this change.

1. Empty results: the query found a schema, but no rows

are returned. If no reader returns any rows, but at

least one reader provides a schema, then the scan

returns an empty batch with the schema.

2. Null results: the query found no schema or rows. No

schema is returned. If no reader returns rows or

schema, then the scan returns no batch: it instead

immediately returns a DONE status.

For CSV, an empty file with headers returns the null result set

(because we don't know the schema.) An empty CSV file without headers

returns an empty result set because we do know the schema: it will

always be the columns array.

Old tests validate the original schema-batch mode; new tests were

added to validate the no-schema-batch mode.
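
The two cases described above can be sketched as a small decision function. This is only an illustration of the stated rule, not the scan framework's code; ScanResultSketch, ReaderResult, and the Outcome names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model, not Drill code: what a scan returns once all readers finish.
final class ScanResultSketch {

  enum Outcome { BATCHES_WITH_ROWS, EMPTY_BATCH_WITH_SCHEMA, DONE_NO_BATCH }

  // Hypothetical summary of what a single reader produced.
  static final class ReaderResult {
    final boolean hasSchema;
    final int rowCount;

    ReaderResult(boolean hasSchema, int rowCount) {
      this.hasSchema = hasSchema;
      this.rowCount = rowCount;
    }
  }

  static Outcome classify(List<ReaderResult> readers) {
    boolean sawSchema = false;
    for (ReaderResult reader : readers) {
      if (reader.rowCount > 0) {
        // At least one reader has rows: the ordinary, non-empty case.
        return Outcome.BATCHES_WITH_ROWS;
      }
      sawSchema |= reader.hasSchema;
    }
    // No rows anywhere. A schema from any reader gives the "empty results"
    // case; otherwise it is the "null results" case, with no batch at all.
    return sawSchema ? Outcome.EMPTY_BATCH_WITH_SCHEMA : Outcome.DONE_NO_BATCH;
  }

  public static void main(String[] args) {
    // One reader with a schema but no rows, one reader with nothing.
    List<ReaderResult> readers = Arrays.asList(
        new ReaderResult(true, 0), new ReaderResult(false, 0));
    System.out.println(classify(readers));
    System.out.println(classify(Collections.<ReaderResult>emptyList()));
  }
}
```

In these terms, the CSV examples map to a reader with no schema and no rows (empty file read with headers) and a reader with a schema and no rows (empty file read without headers).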

    • -9
    • +2
    ./drill/TestSchemaWithTableFunction.java
  1. … 30 more files in changeset.
DRILL-7302: Bump Apache Avro to 1.9.0

Apache Avro 1.9.0 brings a lot of new features:

Deprecate Joda-Time in favor of Java8 JSR310 and setting it as default

Remove support for Hadoop 1.x

Move from Jackson 1.x to 2.9

Add ZStandard Codec

Lots of updates on the dependencies to fix CVE's

Remove Jackson classes from public API

Apache Avro is built by default with Java 8

Apache Avro is compiled and tested with Java 11 to guarantee compatibility

Apache Avro MapReduce is compiled and tested with Hadoop 3

Apache Avro is now leaner, multiple dependencies were removed: guava, paranamer, commons-codec, and commons-logging

and many, many more!

close apache/drill#1812

  1. … 3 more files in changeset.
DRILL-7297: Query hangs in planning stage when Error is thrown

close apache/drill#1811

    • -0
    • +42
    ./drill/exec/fn/impl/testing/CustomErrorFunction.java
  1. … 1 more file in changeset.
DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

  1. … 119 more files in changeset.