DRILL-6604: Upgrade Drill Hive client to Hive3.1 version

closes #2038

  1. … 18 more files in changeset.
DRILL-7620: Fix plugin mutability issues

A recent commit made the plugin registry more strict about

the rule that, once a plugin is registered, it must be

immutable. A flaw enforcing that rule in the UI put the

registry in an inconsistent state.

Also

* Registry-specific errors

* Push more operations from UI layer into registry

* Clean up semantics of "resolve" for plugins

* Add more unit tests

* Better handling of "bad" plugins

* Force plugin names to lower case

* Fix comparison bugs in some format plugins

  1. … 100 more files in changeset.
DRILL-7590: Refactor plugin registry

Major cleanup of the plugin registry to split it into components

in preparation for a proper plugin API.

Better coordinates the named and ephemeral plugin caches.

Cleans up the registry API. Sharpens rules for modifying

plugin configs.

closes #1988

  1. … 163 more files in changeset.
DRILL-7502: Invalid codegen for typeof() with UNION

Also fixes DRILL-6362: typeof() reports NULL for primitive

columns with a NULL value.

typeof() is meant to return "NULL" if a UNION has a NULL

value, but the column type when known, such as for non-UNION

columns.

Also fixes DRILL-7499: sqltypeof() function with an array returns

"ARRAY", not type. This was due to treating REPEATED like LIST.

Handling of the Union vector in code gen is problematic

with about three special cases. Existing code handled two

of the cases. This change handles the third case.

Figuring out the change required poking around quite a bit

of unclear code. Added comments and restructuring to make

that code a bit more clear.

The fix modified code gen for the Union Holder. It can now

"go back in time" to add the union reader at the point we

need it.

closes #1945

  1. … 53 more files in changeset.
DRILL-7483: Add support for 12 and 13 java versions

closes #1935

    • -0
    • +32
    ./HiveClusterTest.java
  1. … 15 more files in changeset.
DRILL-7441: Fix issues with fillEmpties, offset vectors

Fixes subtle issues with offset vectors and "fill empties"

logic.

Drill has an informal standard that if a batch has no rows, then

offset vectors within that batch should have zero size. Contrast

this with batches of size 1 that should have offset vectors of

size 2. Changed to enforce this rule throughout.
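
The sizing rule above can be illustrated with a minimal sketch. The class and method names below are hypothetical and are not Drill's vector API; character lengths stand in for byte lengths.

```java
// Hypothetical illustration of the offset-vector sizing rule; not Drill's API.
final class OffsetVectorRule {

  /** Number of offset entries expected for a batch with the given row count. */
  static int expectedOffsetCount(int rowCount) {
    if (rowCount < 0) {
      throw new IllegalArgumentException("rowCount must be >= 0");
    }
    // An empty batch has an empty offset vector; otherwise one entry per row plus one.
    return rowCount == 0 ? 0 : rowCount + 1;
  }

  /**
   * Builds offsets for variable-width values: entry i is the start of value i,
   * and the final entry is the total length of all values.
   */
  static int[] buildOffsets(String[] values) {
    int[] offsets = new int[expectedOffsetCount(values.length)];
    int position = 0;
    for (int i = 0; i < values.length; i++) {
      offsets[i] = position;
      position += values[i].length();
    }
    if (values.length > 0) {
      offsets[values.length] = position;
    }
    return offsets;
  }
}
```

An empty value (a zero-length string here) repeats the previous offset, which is the state that "fill empties" logic has to produce for rows that were never written.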

Nullable, repeated and variable-width vectors have "fill empties"

logic that is used in two places: when setting the value count and

when preparing to write a new value. The current logic is not

quite right for either case. Added tests and fixed the code to

properly handle each case.

Revised the batch validator to enforce the rule that offset vectors

have length 0 for 0-sized batches. The result was much simpler code.

Added tools to easily print a batch, restoring some code that

was recently lost when the RowSet classes were moved.

Code cleanup in all files touched.

Added logic to "dirty" allocated buffers when testing to ensure

logic is not sensitive to the "pristine" state of new buffers.

Added logic to the column writers to enforce the zero-size-batch rule

for offset vectors. Added unit tests for this case.

Fixed the column writers to set the "lastSet" mutator value for

nullable types since other code relies on this value.

Removed the "setCount" field in nullable vectors: turns out

it is not actually used.

closes #1896

  1. … 43 more files in changeset.
DRILL-7440: Failure during loading of RepeatedCount functions

closes #1894

    • -0
    • +22
    ./complex_types/TestHiveArrays.java
  1. … 3 more files in changeset.
DRILL-7406: Update Calcite to 1.21.0

1. DRILL-7386 - added tests to TestHiveStructs.

2. DRILL-4527 - the DrillAvgVarianceConvertlet can't be removed without test failures.

3. DRILL-6215 - switched to prepared statement in JdbcRecordReader.

4. DRILL-6905 - added test into TestExampleQueries.

5. DRILL-7415 - Fixed jdbc show tables when 2 tables with same name are present in different schemas.

6. DRILL-7340 - Fixed jdbc filter pushdown when several jdbc datasources are enabled.

7. Split SqlConverter into multiple source files.

8. Minor refactorings for jdbc and other places.

closes #1940

    • -0
    • +21
    ./complex_types/TestHiveStructs.java
  1. … 54 more files in changeset.
DRILL-7254: Read Hive union w/o nulls

    • -1
    • +24
    ./complex_types/TestHiveArrays.java
    • -0
    • +22
    ./complex_types/TestHiveStructs.java
    • -0
    • +117
    ./complex_types/TestHiveUnions.java
  1. … 17 more files in changeset.
DRILL-7387: Failed to get value by int key from map nested into struct

    • -0
    • +12
    ./complex_types/TestHiveStructs.java
  1. … 1 more file in changeset.
DRILL-7380: Query of a field inside of an array of structs returns null

1. Fixed parquet reader projection for Logical lists (DrillParquetReader.java)

2. Fixed projection pushdown for RexFieldAccess (ProjectFieldsVisitor.java)

3. DrillParquetReader.getProjection(...) split into several methods

4. Added javadocs for PathSegment and SchemaPath

    • -5
    • +37
    ./complex_types/TestHiveArrays.java
    • -7
    • +98
    ./complex_types/TestHiveStructs.java
  1. … 5 more files in changeset.
DRILL-7357: Expose Drill Metastore data through information_schema

1. Add additional columns to TABLES and COLUMNS tables.

2. Add PARTITIONS table.

3. General refactoring to adjust information_schema data retrieval from multiple sources.

closes #1860

  1. … 33 more files in changeset.
DRILL-7252: Read Hive map using Dict<K,V> vector

    • -25
    • +108
    ./complex_types/TestHiveArrays.java
    • -0
    • +792
    ./complex_types/TestHiveMaps.java
    • -40
    • +40
    ./complex_types/TestHiveStructs.java
  1. … 13 more files in changeset.
DRILL-4517: Support reading empty Parquet files

1. Modified flat and complex parquet readers to output schema only when requested number of records to read is 0. In this case readers are not initialized to improve performance.

2. Allowed reading requested number of rows instead of all rows in the row group (DRILL-6528).

3. Fixed determination of the number of nulls in the row group (fixed IsPredicate#isAllNulls method).

4. Allowed reading empty parquet files by adding an empty / fake row group.

5. General refactoring and unit tests.

6. Parquet tests categorization.

closes #1839

  1. … 47 more files in changeset.
DRILL-7253: Read Hive struct w/o nulls

    • -7
    • +133
    ./complex_types/TestHiveArrays.java
    • -0
    • +354
    ./complex_types/TestHiveStructs.java
  1. … 16 more files in changeset.
DRILL-7268: Read Hive array with parquet native reader

1. Fixed preserving of group originalType for projected schema

in DrillParquetReader

2. Added reading of LIST logical type to DrillParquetGroupConverter.

An intermediate noop converter is used to skip writing for the next nested

repeated field after the parent field is recognized as LIST. For this,

skipRepeated 'true' is passed to the child converter's constructor.

close apache/drill#1805

    • -880
    • +559
    ./complex_types/TestHiveArrays.java
  1. … 6 more files in changeset.
DRILL-7251: Read Hive array w/o nulls

1. HiveFieldConverter replaced by Hive writers for primitives

2. Created HiveValueWriterFactory and HiveListWriter to implement arrays support

4. Readers generation replaced by HiveDefaultRecordReader and HiveTextRecordReader

5. Several reader initializers replaced by one

6. Added method to repeated vardecimal writer

7. Minor fix for array column in View

    • -0
    • +1778
    ./complex_types/TestHiveArrays.java
  1. … 52 more files in changeset.
DRILL-6977: Improve Hive tests configuration

1. HiveTestBase data initialization moved to static block

to be initialized once for all derivatives.

2. Extracted Hive driver and storage plugin management from HiveTestDataGenerator

to HiveTestFixture class. This increased cohesion of generator and

added loose coupling between hive test configuration and data generation

tasks.

3. Replaced usage of Guava ImmutableLists with TestBaseViewSupport

helper methods by using standard JDK collections.

closes #1613

    • -0
    • +295
    ./HiveTestFixture.java
  1. … 5 more files in changeset.
DRILL-540: Allow querying hive views in Drill

1. Added DrillHiveViewTable which allows construction of DrillViewTable based

on Hive metadata

2. Added initialization of DrillHiveViewTable in HiveSchemaFactory

3. Extracted conversion of Hive data types from DrillHiveTable

to HiveToRelDataTypeConverter

4. Removed throwing of UnsupportedOperationException from HiveStoragePlugin

5. Added TestHiveViewsSupport and authorization tests

6. Added closeSilently() method to AutoCloseables

closes #1559

    • -0
    • +233
    ./TestHiveViewsSupport.java
  1. … 14 more files in changeset.
DRILL-6422: Replace guava imports with shaded ones

  1. … 983 more files in changeset.
DRILL-6492: Ensure schema / workspace case insensitivity in Drill

1. StoragePluginsRegistryImpl was updated:

a. for backward compatibility, to convert all existing storage plugin names to lower case at init; in case of duplicates, a warning is logged and the duplicate is skipped.

b. to wrap persistent plugins registry into case insensitive store wrapper (CaseInsensitivePersistentStore) to ensure all given keys are converted into lower case when performing insert, update, delete, search operations.

c. to load system storage plugins dynamically by @SystemStorage annotation.

2. StoragePlugins class was updated to store storage plugin configs by name in a case insensitive map.

3. SchemaUtilities.searchSchemaTree method was updated to convert all schema names into lower case to ensure that they are matched case insensitively (all schemas are stored in Drill in lower case).

4. FileSystemConfig was updated to store workspaces by name in case insensitive hash map.

5. All plugin schema factories now extend AbstractSchemaFactory to ensure that the given schema name is converted to lower case.

6. New method areTableNamesAreCaseInsensitive was added to AbstractSchema to indicate if schema table names are case insensitive. By default, false. A schema implementation is responsible for case insensitive table name search if it supports it. Currently, information_schema, sys and hive do so.

7. System storage plugins (information_schema, sys) were refactored to ensure their schema and table names are case insensitive; also the annotation @SystemPlugin and an additional constructor were added to allow system plugins to be loaded dynamically by the storage plugin registry during the init phase.

8. MetadataProvider was updated to convert all schema filter conditions into lower case to ensure schema would be matched case insensitively.

9. ShowSchemasHandler, ShowTablesHandler, DescribeTableHandler were updated to ensure schema / table names (depending on whether the schema supports case insensitive table names) would be found case insensitively.

git closes #1439
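
The key-normalizing wrapper described in this change can be sketched as follows. This is only an illustration under stated assumptions: LowerCaseKeyStore is a hypothetical class backed by an in-memory map, not Drill's CaseInsensitivePersistentStore, and the use of Locale.ROOT is a choice made for this sketch.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical in-memory stand-in for a case insensitive store wrapper.
final class LowerCaseKeyStore<V> {
  private final Map<String, V> delegate = new HashMap<>();

  // Every key is converted to lower case before it reaches the backing map.
  private static String normalize(String key) {
    return key.toLowerCase(Locale.ROOT);
  }

  V get(String key) {
    return delegate.get(normalize(key));
  }

  void put(String key, V value) {
    delegate.put(normalize(key), value);
  }

  /**
   * Stores the value only if no entry exists under the normalized key.
   * Returns false for a duplicate so the caller can log a warning and skip it.
   * Assumes non-null values.
   */
  boolean putIfAbsent(String key, V value) {
    return delegate.putIfAbsent(normalize(key), value) == null;
  }

  V remove(String key) {
    return delegate.remove(normalize(key));
  }

  boolean contains(String key) {
    return delegate.containsKey(normalize(key));
  }

  int size() {
    return delegate.size();
  }
}
```

With this shape, loading existing names through putIfAbsent mirrors the backward-compatibility step: "DFS" and "dfs" collapse to one entry, and the second one is reported as a duplicate.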

  1. … 54 more files in changeset.
DRILL-6575: Add store.hive.conf.properties option to allow set Hive properties at session level

closes #1365

  1. … 20 more files in changeset.
DRILL-6242 Use java.time.Local{Date|Time|DateTime} for Drill Date, Time, Timestamp types. (#3)

close apache/drill#1247

* DRILL-6242 - Use java.time.Local{Date|Time|DateTime} classes to hold values from corresponding Drill date, time, and timestamp types.

Conflicts:

exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/ExtendedJsonOutput.java

Fix merge conflicts and check style.
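
As a rough illustration of holding such values in java.time classes, the sketch below converts raw values to LocalDate, LocalTime and LocalDateTime. It assumes that dates and timestamps arrive as milliseconds since the epoch in UTC and times as milliseconds of the day; the class and method names are hypothetical and are not taken from the Drill code base.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.ZoneOffset;

// Hypothetical helper; the millisecond-based inputs are an assumption of this sketch.
final class LocalTemporalSketch {

  /** Timestamp given as milliseconds since 1970-01-01T00:00 UTC. */
  static LocalDateTime toTimestamp(long epochMillis) {
    return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), ZoneOffset.UTC);
  }

  /** Date given as milliseconds since the epoch; the time-of-day part is dropped. */
  static LocalDate toDate(long epochMillis) {
    return toTimestamp(epochMillis).toLocalDate();
  }

  /** Time given as milliseconds since midnight. */
  static LocalTime toTime(int millisOfDay) {
    return LocalTime.ofNanoOfDay(millisOfDay * 1_000_000L);
  }
}
```

Local types carry no time zone, so the values read back exactly as stored regardless of the JVM's default zone.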

  1. … 46 more files in changeset.
DRILL-6320: Fixed license headers.

closes #1207

  1. … 2063 more files in changeset.
DRILL-6331: Revisit Hive Drill native parquet implementation to be exposed to Drill optimizations (filter / limit push down, count to direct scan)

1. Factored out common logic for Drill parquet reader and Hive Drill native parquet readers: AbstractParquetGroupScan, AbstractParquetRowGroupScan, AbstractParquetScanBatchCreator.

2. Rules that previously worked only with ParquetGroupScan can now be applied to any class that extends AbstractParquetGroupScan: DrillFilterItemStarReWriterRule, ParquetPruneScanRule, PruneScanRule.

3. Hive populates partition values based on information returned from the Hive metastore. Drill populates partition values based on the path difference between the selection root and the actual file path.

Previously, ColumnExplorer populated partition values based on the Drill approach. Since ColumnExplorer now also populates values for parquet files from Hive tables,

the `populateImplicitColumns` method logic was changed to populate partition columns based only on the given partition values.

4. Refactored ParquetPartitionDescriptor to be responsible for populating partition values rather than storing this logic in parquet group scan class.

5. Metadata class was moved to separate metadata package (org.apache.drill.exec.store.parquet.metadata). Factored out several inner classes to improve code readability.

6. Collected all Drill native parquet reader unit tests into one class TestHiveDrillNativeParquetReader, also added new tests to cover new functionality.

7. Reduced excessive logging when parquet files metadata is read
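
The path-difference approach mentioned in item 3 can be sketched as below. This is an illustration only: it uses java.nio.file.Path and a hypothetical helper class, and is not the code used by ColumnExplorer or the parquet group scan classes.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing partition values derived from directory names.
final class PathPartitionSketch {

  /**
   * Returns the directory names that lie between the selection root and the file,
   * in order. Each directory level corresponds to one partition value.
   */
  static List<String> partitionValues(Path selectionRoot, Path file) {
    if (!file.startsWith(selectionRoot)) {
      throw new IllegalArgumentException(file + " is not under " + selectionRoot);
    }
    Path relative = selectionRoot.relativize(file);
    List<String> values = new ArrayList<>();
    // The last name element is the file itself, so it is excluded.
    for (int i = 0; i < relative.getNameCount() - 1; i++) {
      values.add(relative.getName(i).toString());
    }
    return values;
  }
}
```

For a root of /data/logs and a file /data/logs/2019/01/a.parquet, the helper yields the two values 2019 and 01; a file directly under the root yields none. Hive tables do not follow this scheme, which is why the change above takes the partition values as given instead.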

closes #1214

  1. … 62 more files in changeset.
DRILL-6130: Fix NPE during physical plan submission for various storage plugins

1. Fixed ser / de issues for Hive, Kafka, Hbase plugins.

2. Added physical plan submission unit test for all storage plugins in contrib module.

3. Refactoring.

closes #1108

  1. … 26 more files in changeset.
DRILL-5730: Mock testing improvements and interface improvements

closes #1045

  1. … 223 more files in changeset.
DRILL-5978: Updating of Apache and MapR Hive libraries to 2.3.2 and 2.1.2-mapr-1710 versions respectively

* Improvements to allow reading of Hive bucketed transactional ORC tables;

* Updating hive properties for tests and resolving dependencies and API conflicts:

- Fix for "hive.metastore.schema.verification": to avoid MetaException(message: Version information

not found in metastore), see https://cwiki.apache.org/confluence/display/Hive/Hive+Schema+Tool,

the METASTORE_SCHEMA_VERIFICATION="false" property is added

- Added the METASTORE_AUTO_CREATE_ALL="true" property to tests, because some additional

tables are necessary in the Hive metastore

- Disabled Calcite CBO (Hive's CalcitePlanner) for tests with the HIVE_CBO_ENABLED="false" property, because it conflicts

with Drill's Calcite version in Drill unit tests

- jackson and parquet libraries are relocated in hive-exec-shade module

- org.apache.parquet:parquet-column Drill version is added to "hive-exec" to

allow using the Parquet empty group at the MessageType level (PARQUET-278)

- Removing of commons-codec exclusion from hive core. This dependency is

necessary for hive-exec and hive-metastore.

- Setting Hive internal properties for transactional scan:

HiveConf.HIVE_TRANSACTIONAL_TABLE_SCAN and for schema evolution: HiveConf.HIVE_SCHEMA_EVOLUTION,

IOConstants.SCHEMA_EVOLUTION_COLUMNS, IOConstants.SCHEMA_EVOLUTION_COLUMNS_TYPES

- "io.dropwizard.metrics:metrics-core" with the latest 4.0.2 version is added to the dependencyManagement block in the Drill root POM

- Exclusion of "hive-exec" in "hive-hbase-handler" is already in Drill root dependencyManagement POM

- Hive Calcite libraries are excluded (Calcite CBO was disabled)

- "jackson-core" dependency is added to DependencyManagement block in Drill root POM file

- For MapR Hive 2.1 client older "com.fasterxml.jackson.core:jackson-databind" is included

- "log4j:log4j" dependency is excluded from "hive-exec", "hive-metastore", "hive-hbase-handler".

close apache/drill#1111

  1. … 14 more files in changeset.
DRILL-5783, DRILL-5841, DRILL-5894: Rationalize test temp directories

This change includes:

DRILL-5783:

- A unit test is created for the priority queue in the TopN operator.

- The code generation classes passed around a completely unused function registry reference in some places, so it was removed.

- The priority queue had unused parameters for some of its methods, so they were removed.

DRILL-5841:

- Created standardized temp directory classes DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher, and updated all unit tests to use them.

DRILL-5894:

- Removed the dfs_test storage plugin for tests and replaced it with the already existing dfs storage plugin.

Misc:

- General code cleanup.

- Removed unnecessary use of String.format in the tests.

This closes #984

  1. … 363 more files in changeset.
DRILL-5941: Skip header / footer improvements for Hive storage plugin

Overview:

1. When a table has a header / footer, process input splits of the same file in one reader (bug fix for DRILL-5941).

2. Apply skip header logic during reader initialization only once to avoid checks during reading the data (DRILL-5106).

3. Apply skip footer logic only when the footer is greater than 0; otherwise default processing will be done without buffering data in a queue (DRILL-5106).

Code changes:

1. AbstractReadersInitializer was introduced to factor out common logic during readers initialization.

It will have two implementations:

a. Default (each input split group gets its own reader);

b. Empty (for empty tables);

2. AbstractRecordsInspector was introduced to improve performance when the table footer is less than or equal to 0.

It will have two implementations:

a. Default (records will be processed one by one without buffering);

b. SkipFooter (queue will be used to buffer N records that should be skipped at the end of file processing).

3. When text table has header / footer each table file should be read as one unit. When file is being read as several input splits, they should be grouped.

For this purpose LogicalInputSplit class was introduced which replaced InputSplitWrapper class. New class stores list of grouped input splits and returns information about splits on group level.

Please note, during planning input splits are grouped only when data is being read from a text table that has a header / footer; otherwise each input split is treated separately.

4. Allow HiveAbstractReader to have multiple input splits instead of one.
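
The queue-based footer skipping described in code change 2 can be sketched as follows. The class and method names are hypothetical and this is not the AbstractRecordsInspector implementation; it only shows the buffering idea, with the footer size given as a number of trailing records.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration of skipping the last N records with a bounded queue.
final class SkipFooterSketch {

  /** Returns all records except the last footerCount ones. Records must be non-null. */
  static <T> List<T> readSkippingFooter(Iterator<T> records, int footerCount) {
    List<T> output = new ArrayList<>();
    if (footerCount <= 0) {
      // Default path: no footer, so records pass through one by one without buffering.
      while (records.hasNext()) {
        output.add(records.next());
      }
      return output;
    }
    // Skip-footer path: hold back the most recent footerCount records.
    Deque<T> buffer = new ArrayDeque<>(footerCount + 1);
    while (records.hasNext()) {
      buffer.addLast(records.next());
      if (buffer.size() > footerCount) {
        // The oldest buffered record can no longer be part of the footer.
        output.add(buffer.removeFirst());
      }
    }
    // Whatever remains in the buffer is the footer and is dropped.
    return output;
  }
}
```

The sketch also shows why a file with a footer has to be read as one unit: the footer is only identifiable once the end of the whole file is reached, not the end of an individual input split.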

This closes #1030

  1. … 20 more files in changeset.