drill

DRILL-7341: Vector reAlloc may fail after exchange

closes #1838

DRILL-4517: Support reading empty Parquet files

1. Modified flat and complex parquet readers to output only the schema when the requested number of records to read is 0. In this case the readers are not initialized, which improves performance.

2. Allowed reading requested number of rows instead of all rows in the row group (DRILL-6528).

3. Fixed an issue with determining the null count in the row group (fixed IsPredicate#isAllNulls method).

4. Allowed reading empty parquet files by adding an empty (fake) row group.

5. General refactoring and unit tests.

6. Parquet tests categorization.

closes #1839

    • binary
    /contrib/storage-hive/core/src/test/resources/empty.parquet
  1. … 34 more files in changeset.
DRILL-7337: Add vararg UDFs support

  1. … 23 more files in changeset.
DRILL-7335: Fix error when reading csv file with headers only

closes #1834

DRILL-7334: Update Iceberg Metastore Parquet write mode

closes #1832

    • -9
    • +10
    /metastore/iceberg-metastore/README.md
DRILL-7332: Allow parsing empty schema

closes #1828

DRILL-7331: Drill Iceberg Metastore metadata expiration

closes #1831

    • -7
    • +18
    /metastore/iceberg-metastore/README.md
DRILL-7327: Log Regex Plugin Won't Recognize Schema

The previous commit revised the plugin config classes to work

with table functions. That caused Jackson to stop working for

the classes. Fixed those issues and added unit tests.

closes #1827

    • -0
    • +396
    /exec/java-exec/src/test/resources/regex/firewall.ssdlog
DRILL-6961: Handle exceptions during queries to information_schema

closes #1833

DRILL-7084: ResultSet getObject method throws not implemented exception if the column type is NULL

closes #1825

    • binary
    /exec/jdbc/src/test/resources/testGetObjectNull.parquet
DRILL-7205: Drill fails to start when authentication is disabled

closes #1824

DRILL-7314: Use TupleMetadata instead of concrete implementation

1. Add serialization / deserialization (ser / de) implementation for TupleMetadata interface based on types.

2. Replace TupleSchema usage where possible.

3. Move patcher classes into commons.

4. Upgrade some dependencies and general refactoring.

  1. … 26 more files in changeset.
DRILL-7317: Close ClassLoaders used for udf jars uploading when closing FunctionImplementationRegistry

- Fix issue with caching DrillMergeProjectRule and FunctionImplementationRegistry when different drillbits are started within the same JVM

DRILL-7316: Move classes from org.apache.drill.metastore into org.apache.drill.exec.metastore package in java-exec module

  1. … 18 more files in changeset.
DRILL-7315: Revise precision and scale order in the method arguments

  1. … 14 more files in changeset.
DRILL-7307: casthigh for decimal type can lead to the issues with VarDecimalHolder

- Fixed code-gen for VarDecimal type

- Fixed code-gen issue with nullable holders for simple cast functions

with passed constants as arguments.

- Code-gen now honors the DataType.Optional type defined by a UDF for

NULL-IF-NULL functions.

DRILL-7313: Use Hive schema for MaprDB native reader when field was empty

- Added all_text_mode option for hive maprDB Json

- Improved logic to convert Hive's schema into Drill's one

- Added unit tests for schema conversion

  1. … 13 more files in changeset.
DRILL-7310: Move schema-related classes from exec module to be able to use them in metastore module

closes #1816

  1. … 88 more files in changeset.
DRILL-7273: Introduce operators for handling metadata

closes #1886

    • -0
    • +119
    /docs/dev/MetastoreAnalyze.md
  1. … 142 more files in changeset.
DRILL-7174: Expose complex to Json control in the Drill C++ Client

closes #1814

    • -0
    • +55
    /contrib/native/client/src/test/DrillClientTest.cpp
DRILL-6711: Use jitpack repository for Drill Calcite project artifacts instead of repository.mapr.com

closes #1815

    • -0
    • +26
    /docs/dev/Calcite.md
DRILL-7306: Disable schema-only batch for new scan framework

The EVF framework is set up to return a "fast schema" empty batch

with only schema as its first batch because, when the code was

written, it seemed that's how we wanted operators to work. However,

DRILL-7305 notes that many operators cannot handle empty batches.

Since the empty-batch bugs show that Drill does not, in fact,

provide a "fast schema" batch, this ticket asks to disable the

feature in the new scan framework. The feature is disabled with

a config option; it can be re-enabled if ever it is needed.

SQL differentiates between two subtle cases, and both are

supported by this change.

1. Empty results: the query found a schema, but no rows

are returned. If no reader returns any rows, but at

least one reader provides a schema, then the scan

returns an empty batch with the schema.

2. Null results: the query found no schema or rows. No

schema is returned. If no reader returns rows or

schema, then the scan returns no batch: it instead

immediately returns a DONE status.

For CSV, an empty file with headers returns the null result set

(because we don't know the schema). An empty CSV file without headers

returns an empty result set because we do know the schema: it will

always be the columns array.
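
The two cases above can be sketched as a small decision function. This is only an illustration of the rule as the commit message states it, not Drill's scan code: the class `ScanResultSketch`, the `ReaderResult` holder, and the `ScanOutcome` names are all hypothetical, and a `null` schema stands in for "the reader could not determine a schema".

```java
import java.util.Arrays;
import java.util.List;

public class ScanResultSketch {

    /** Hypothetical outcomes of a scan; these names are not Drill's. */
    public enum ScanOutcome { ROWS, EMPTY_BATCH_WITH_SCHEMA, DONE_NO_BATCH }

    /** Hypothetical summary of what one reader produced. */
    public static final class ReaderResult {
        final List<String> schema; // null means the reader found no schema
        final long rowCount;

        public ReaderResult(List<String> schema, long rowCount) {
            this.schema = schema;
            this.rowCount = rowCount;
        }
    }

    /** Applies the rule from the commit message to a set of reader results. */
    public static ScanOutcome classify(List<ReaderResult> readers) {
        boolean anyRows = false;
        boolean anySchema = false;
        for (ReaderResult r : readers) {
            anyRows |= r.rowCount > 0;
            anySchema |= r.schema != null;
        }
        if (anyRows) {
            return ScanOutcome.ROWS;
        }
        // No rows: an empty result if some reader knew the schema,
        // otherwise a null result (no batch at all, just DONE).
        return anySchema ? ScanOutcome.EMPTY_BATCH_WITH_SCHEMA
                         : ScanOutcome.DONE_NO_BATCH;
    }

    public static void main(String[] args) {
        // Empty CSV file read with headers: no header line, so no schema.
        ReaderResult emptyWithHeaders = new ReaderResult(null, 0);
        // Empty CSV file read without headers: schema is the columns array.
        ReaderResult emptyNoHeaders = new ReaderResult(Arrays.asList("columns"), 0);

        System.out.println(classify(Arrays.asList(emptyWithHeaders)));
        System.out.println(classify(Arrays.asList(emptyNoHeaders)));
    }
}
```

In this sketch a single reader with a known schema is enough to turn a null result into an empty result, which matches the "at least one reader provides a schema" wording.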

Old tests validate the original schema-batch mode; new tests were

added to validate the no-schema-batch mode.

  1. … 30 more files in changeset.
DRILL-7302: Bump Apache Avro to 1.9.0

Apache Avro 1.9.0 brings a lot of new features:

Deprecate Joda-Time in favor of Java 8 JSR-310 and set it as the default

Remove support for Hadoop 1.x

Move from Jackson 1.x to 2.9

Add ZStandard Codec

Lots of dependency updates to fix CVEs

Remove Jackson classes from public API

Apache Avro is built by default with Java 8

Apache Avro is compiled and tested with Java 11 to guarantee compatibility

Apache Avro MapReduce is compiled and tested with Hadoop 3

Apache Avro is now leaner, multiple dependencies were removed: guava, paranamer, commons-codec, and commons-logging

and many, many more!

close apache/drill#1812

DRILL-7297: Query hangs in planning stage when Error is thrown

close apache/drill#1811

DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

  1. … 105 more files in changeset.
DRILL-7253: Read Hive struct w/o nulls

  1. … 3 more files in changeset.
DRILL-6951: Merge row set based mock data source

The mock data source is used in several tests to generate a large volume

of sample data, such as when testing spilling. The mock data source also

lets us try new plugin features in a very simple context. During the

development of the row set framework, the mock data source was converted

to use the new framework to verify functionality. This commit upgrades

the mock data source with that work.

The work changes none of the functionality. It does, however, improve

memory usage. Batches are limited, by default, to 10 MB in size. The row

set framework minimizes internal fragmentation in the largest vector.

(Previously, internal fragmentation averaged 25% but could be as high as

50%.)
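
The 25% average and 50% ceiling quoted above are what one would expect if each vector buffer is rounded up to the next power of two. That rounding policy is an assumption made for this sketch, since the commit message does not state it, and the class and method names below are hypothetical. The arithmetic is: a vector holding `used` bytes occupies `allocated` bytes, and the wasted fraction is `(allocated - used) / allocated`.

```java
public class FragmentationSketch {

    /** Rounds a size up to the next power of two (assumed allocation policy). */
    public static long roundUpToPowerOfTwo(long bytes) {
        if (bytes <= 1) {
            return 1;
        }
        return Long.highestOneBit(bytes - 1) << 1;
    }

    /** Fraction of the allocated buffer that holds no data. */
    public static double internalFragmentation(long usedBytes) {
        long allocated = roundUpToPowerOfTwo(usedBytes);
        return (double) (allocated - usedBytes) / allocated;
    }

    public static void main(String[] args) {
        // An exact power of two wastes nothing; one byte more wastes almost half.
        long[] samples = {1024, 1025, 1536, 2048};
        for (long used : samples) {
            System.out.printf("used=%d allocated=%d wasted=%.1f%%%n",
                used, roundUpToPowerOfTwo(used), 100 * internalFragmentation(used));
        }
    }
}
```

Under this assumption, data sizes spread evenly between two powers of two waste a quarter of the buffer on average, and a size just past a power of two wastes just under half.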

As it turns out, the hash aggregate tests depended on the internal

fragmentation: without it, the hash agg no longer spilled for the same

row count. Adjusted the generated row counts to recreate a data volume

that caused spilling.

One test in particular always failed due to assertions in the hash agg

code. These seem to be true bugs and are described in DRILL-7301. After

multiple failed attempts to get the test to work, it was disabled until

DRILL-7301 is fixed.

Added a new unit test to sanity check the mock data source. (No test

already existed for this functionality except as verified via other unit

tests.)

  1. … 7 more files in changeset.
DRILL-7272: Drill Metastore Read / Write API and Drill Iceberg Metastore implementation

1. Drill Metastore Read / Write API.

2. Drill Iceberg Metastore implementation in iceberg-metastore module.

3. Patches Guava Preconditions class for Apache Iceberg.

4. General refactoring.

5. Unit tests.

6. Documentation.

    • -0
    • +3
    /distribution/src/assemble/component.xml
    • -0
    • +178
    /metastore/iceberg-metastore/README.md
    • -0
    • +188
    /metastore/iceberg-metastore/pom.xml
  1. … 83 more files in changeset.
DRILL-7156: Support empty Parquet files creation

closes #1836

DRILL-7294: Regenerate protobufs

  1. … 94 more files in changeset.