DRILL-7357: Expose Drill Metastore data through information_schema

1. Add additional columns to TABLES and COLUMNS tables.

2. Add PARTITIONS table.

3. General refactoring to adjust information_schema data retrieval from multiple sources.

closes #1860

    • -1
    • +1
    ./exec/hive/TestInfoSchemaOnHiveStorage.java
  1. … 33 more files in changeset.
DRILL-7252: Read Hive map using Dict<K,V> vector

    • -0
    • +792
    ./exec/hive/complex_types/TestHiveMaps.java
  1. … 13 more files in changeset.
DRILL-4517: Support reading empty Parquet files

1. Modified flat and complex parquet readers to output schema only when requested number of records to read is 0. In this case readers are not initialized to improve performance.

2. Allowed reading requested number of rows instead of all rows in the row group (DRILL-6528).

3. Fixed determination of the number of nulls in the row group (fixed IsPredicate#isAllNulls method).

4. Allowed reading empty parquet files via adding empty / fake row group.

5. General refactoring and unit tests.

6. Parquet tests categorization.

closes #1839

    • -2
    • +14
    ./exec/TestHiveDrillNativeParquetReader.java
    • -0
    • +2
    ./exec/hive/TestInfoSchemaOnHiveStorage.java
    • -2
    • +14
    ./exec/store/hive/HiveTestDataGenerator.java
  1. … 45 more files in changeset.
DRILL-7313: Use Hive schema for MaprDB native reader when field was empty

- Added all_text_mode option for hive maprDB Json

- Improved logic to convert Hive's schema into Drill's one

- Added unit tests for schema conversion

    • -0
    • +178
    ./exec/store/hive/schema/TestSchemaConversion.java
  1. … 27 more files in changeset.
DRILL-7253: Read Hive struct w/o nulls

    • -0
    • +354
    ./exec/hive/complex_types/TestHiveStructs.java
  1. … 16 more files in changeset.
DRILL-7268: Read Hive array with parquet native reader

1. Fixed preserving of group originalType for projected schema in DrillParquetReader

2. Added reading of LIST logical type to DrillParquetGroupConverter. An intermediate no-op converter is used to skip writing for the next nested repeated field after the parent field is recognized as LIST. For this, skipRepeated 'true' is passed to the child converter's constructor.

close apache/drill#1805

  1. … 6 more files in changeset.
DRILL-7251: Read Hive array w/o nulls

1. HiveFieldConverter replaced by Hive writers for primitives

2. Created HiveValueWriterFactory and HiveListWriter to implement arrays support

4. Readers generation replaced by HiveDefaultRecordReader and HiveTextRecordReader

5. Few reader initializers replaced by one

6. Added method to repeated vardecimal writer

7. Minor fix for array column in View

    • -0
    • +1778
    ./exec/hive/complex_types/TestHiveArrays.java
  1. … 52 more files in changeset.
DRILL-7115: Improve Hive schema show tables performance

1. To make SHOW TABLES for a Hive schema much faster, the additional Drill feature of showing only accessible tables when Storage-Based authorization is enabled was sacrificed. The behaviour now matches Hive/Beeline: all tables are shown regardless of accessibility. For details about previous SHOW TABLES results, see the description of DRILL-540.

2. In HiveDatabaseSchema, implemented a faster getTableNamesAndTypes() method and removed bulk-related code.

3. Deprecated bulk-related options and removed bulk code from AbstractSchema and DrillHiveMetastoreClient.

4. For 8000 Hive tables the query returned in 1.8 seconds; for a combination of 4000 tables and 8000 views it returned in 2.3 seconds. Note that after the first query table names are cached, and subsequent queries complete in less than 1 second.

5. Refactored WorkspaceSchemaFactory's getTableNamesAndTypes() method to reuse the existing getViews() method.

6. Refactored DrillHiveMetastoreClient. Classes were unnested and enclosed within the client package with restricted visibility. The cache value type was also updated to avoid unnecessary back-and-forth List to Set conversions. Client creation methods were moved to a separate class, so the new package exposes only the factory and client classes.

closes #1706

  1. … 20 more files in changeset.
DRILL-2326: Fix scalar replacement for the case when static method which does not return values is called

- Fix check for return function value to handle the case when created object is returned without assigning it to the local variable

closes #1687

    • -18
    • +17
    ./exec/fn/hive/TestInbuiltHiveUDFs.java
  1. … 3 more files in changeset.
DRILL-6927: Avoid double conversion from Impala timestamp when Hive native Parquet reader is used

closes #1655

    • -0
    • +18
    ./exec/TestHiveDrillNativeParquetReader.java
  1. … 1 more file in changeset.
DRILL-7200: Update Calcite to 1.19.0 / 1.20.0

    • -1
    • +2
    ./exec/fn/hive/TestInbuiltHiveUDFs.java
  1. … 46 more files in changeset.
DRILL-6977: Improve Hive tests configuration

1. HiveTestBase data initialization moved to static block to be initialized once for all derivatives.

2. Extracted Hive driver and storage plugin management from HiveTestDataGenerator to HiveTestFixture class. This increased cohesion of generator and added loose coupling between hive test configuration and data generation tasks.

3. Replaced usage of Guava ImmutableLists with TestBaseViewSupport helper methods by using standard JDK collections.

closes #1613

    • -0
    • +295
    ./exec/hive/HiveTestFixture.java
    • -16
    • +18
    ./exec/hive/TestHiveStorage.java
    • -23
    • +24
    ./exec/sql/hive/TestViewSupportOnHiveTables.java
    • -122
    • +15
    ./exec/store/hive/HiveTestDataGenerator.java
  1. … 2 more files in changeset.
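Item 1 of DRILL-6977 relies on a standard Java idiom: a static initializer in a base class runs once per class load, however many subclasses are used afterwards. The following is a minimal sketch of that idiom only. The names StaticInitDemo, SharedTestBase, FirstSuite and SecondSuite are hypothetical and do not come from the Drill code base, and the counter merely stands in for the real data generation.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; not Drill's HiveTestBase.
final class StaticInitDemo {

    // Stand-in for a shared test base class.
    abstract static class SharedTestBase {
        // Counts how many times the (pretend) expensive setup ran.
        static final AtomicInteger INIT_COUNT = new AtomicInteger();

        // Runs once, when the JVM initializes SharedTestBase,
        // regardless of how many subclasses are instantiated later.
        static {
            INIT_COUNT.incrementAndGet(); // stands in for generating test data
        }
    }

    static final class FirstSuite extends SharedTestBase { }

    static final class SecondSuite extends SharedTestBase { }

    // Instantiates both subclasses and reports how often the setup ran.
    static int runSuites() {
        new FirstSuite();
        new SecondSuite();
        return SharedTestBase.INIT_COUNT.get();
    }

    public static void main(String[] args) {
        System.out.println("setup ran " + runSuites() + " time(s)");
    }
}
```

The trade-off of this idiom is that a failure inside the static block surfaces as an ExceptionInInitializerError for every derived test class, so the setup should report its errors clearly.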
DRILL-4456: Add Hive translate UDF

closes #1527

    • -0
    • +13
    ./exec/fn/hive/TestInbuiltHiveUDFs.java
  1. … 1 more file in changeset.
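For readers unfamiliar with the function added in DRILL-4456: Hive's translate(input, from, to) replaces each character of input that occurs in from with the character at the same position in to. The sketch below approximates that behaviour in plain Java. It is not the Hive or Drill implementation. The class name TranslateSketch is hypothetical, and the edge-case handling (a character in from with no counterpart in to is deleted, the first mapping wins for duplicates, and matching is per UTF-16 char rather than per code point) is a simplifying assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of translate() semantics; not Hive's GenericUDF code.
final class TranslateSketch {

    static String translate(String input, String from, String to) {
        // Maps a source character to its replacement; a null value means "delete".
        Map<Character, Character> mapping = new HashMap<>();
        for (int i = 0; i < from.length(); i++) {
            char source = from.charAt(i);
            if (mapping.containsKey(source)) {
                continue; // assumption: the first occurrence in 'from' wins
            }
            Character target = i < to.length() ? Character.valueOf(to.charAt(i)) : null;
            mapping.put(source, target);
        }

        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (!mapping.containsKey(c)) {
                out.append(c);                      // unmapped: copy through
            } else {
                Character target = mapping.get(c);
                if (target != null) {
                    out.append(target.charValue()); // mapped: replace
                }                                   // mapped to null: drop
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("abcdef", "adc", "19"));
    }
}
```

With these assumptions, 'a' becomes '1', 'd' becomes '9', and 'c' is removed because the to string has no third character.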
DRILL-6744: Support varchar and decimal push down

1. Added enableStringsSignedMinMax parquet format plugin config and store.parquet.reader.strings_signed_min_max session option to control reading binary statistics for files generated by Parquet versions prior to 1.10.0.

2. Added ParquetReaderConfig to store configuration needed during reading parquet statistics or files.

3. Provided mechanism to enable varchar / decimal filter push down.

4. Added VersionUtil to compare Drill versions in string representation.

5. Added appropriate unit tests.

closes #1537

    • -0
    • +41
    ./exec/TestHiveDrillNativeParquetReader.java
    • -5
    • +5
    ./exec/store/hive/HiveTestDataGenerator.java
  1. … 40 more files in changeset.
DRILL-540: Allow querying hive views in Drill

1. Added DrillHiveViewTable which allows construction of DrillViewTable based on Hive metadata

2. Added initialization of DrillHiveViewTable in HiveSchemaFactory

3. Extracted conversion of Hive data types from DrillHiveTable to HiveToRelDataTypeConverter

4. Removed throwing of UnsupportedOperationException from HiveStoragePlugin

5. Added TestHiveViewsSupport and authorization tests

6. Added closeSilently() method to AutoCloseables

closes #1559

    • -0
    • +233
    ./exec/hive/TestHiveViewsSupport.java
    • -2
    • +12
    ./exec/hive/TestInfoSchemaOnHiveStorage.java
    • -4
    • +10
    ./exec/store/hive/HiveTestDataGenerator.java
  1. … 9 more files in changeset.
DRILL-6422: Replace guava imports with shaded ones

    • -1
    • +1
    ./exec/fn/hive/TestInbuiltHiveUDFs.java
    • -1
    • +1
    ./exec/hive/TestInfoSchemaOnHiveStorage.java
    • -2
    • +2
    ./exec/store/hive/HiveTestDataGenerator.java
  1. … 976 more files in changeset.
DRILL-6492: Ensure schema / workspace case insensitivity in Drill

1. StoragePluginsRegistryImpl was updated:

a. for backward compatibility at init to convert all existing storage plugins names to lower case, in case of duplicates, to log warning and skip the duplicate.

b. to wrap persistent plugins registry into case insensitive store wrapper (CaseInsensitivePersistentStore) to ensure all given keys are converted into lower case when performing insert, update, delete, search operations.

c. to load system storage plugins dynamically by @SystemStorage annotation.

2. StoragePlugins class was updated to store storage plugin configs by name in a case insensitive map.

3. SchemaUtilities.searchSchemaTree method was updated to convert all schema names into lower case to ensure that they are matched case insensitively (all schemas are stored in Drill in lower case).

4. FileSystemConfig was updated to store workspaces by name in case insensitive hash map.

5. All plugin schema factories now extend AbstractSchemaFactory to ensure that the given schema name is converted to lower case.

6. New method areTableNamesAreCaseInsensitive was added to AbstractSchema to indicate whether schema table names are case insensitive. By default, false. A schema implementation that supports case insensitive table names is responsible for performing the case insensitive search itself. Currently, information_schema, sys and hive do so.

7. System storage plugins (information_schema, sys) were refactored to ensure their schema and table names are case insensitive. The annotation @SystemPlugin and an additional constructor were also added to allow system plugins to be loaded dynamically by the storage plugin registry during the init phase.

8. MetadataProvider was updated to convert all schema filter conditions into lower case to ensure the schema is matched case insensitively.

9. ShowSchemasHandler, ShowTablesHandler, DescribeTableHandler were updated to ensure schema / table names are found case insensitively (for table names, this depends on whether the schema supports case insensitive table names).

git closes #1439

    • -1
    • +1
    ./exec/hive/TestInfoSchemaOnHiveStorage.java
  1. … 51 more files in changeset.
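The approach described in items 1a and 1b of DRILL-6492 (lower-case every key on insert, update, delete and search, and skip names that collide after lower-casing) can be sketched as below. This is an illustration under stated assumptions, not Drill's CaseInsensitivePersistentStore: the class LowerCaseKeyStore, its methods, and the in-memory LinkedHashMap backing are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical in-memory sketch of a store whose keys are case insensitive.
final class LowerCaseKeyStore<V> {
    private final Map<String, V> delegate = new LinkedHashMap<>();

    // Locale.ROOT keeps lower-casing independent of the JVM's default locale.
    private static String normalize(String key) {
        return key.toLowerCase(Locale.ROOT);
    }

    V put(String key, V value) { return delegate.put(normalize(key), value); }

    V get(String key) { return delegate.get(normalize(key)); }

    V remove(String key) { return delegate.remove(normalize(key)); }

    boolean contains(String key) { return delegate.containsKey(normalize(key)); }

    int size() { return delegate.size(); }

    // Loads pre-existing entries; a name that collides with an earlier one
    // after lower-casing is reported and skipped, so the first entry wins.
    static <V> LowerCaseKeyStore<V> fromExisting(Map<String, V> existing) {
        LowerCaseKeyStore<V> store = new LowerCaseKeyStore<>();
        for (Map.Entry<String, V> entry : existing.entrySet()) {
            if (store.contains(entry.getKey())) {
                System.err.println("Skipping duplicate name: " + entry.getKey());
                continue;
            }
            store.put(entry.getKey(), entry.getValue());
        }
        return store;
    }
}
```

Normalizing at the store boundary means callers never need to remember to lower-case names themselves, which is the property the commit description is after.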
DRILL-6656: Disallow extra semicolons and multiple statements on the same line.

closes #1415

    • -2
    • +4
    ./exec/fn/hive/TestSampleHiveUDFs.java
  1. … 144 more files in changeset.
DRILL-6575: Add store.hive.conf.properties option to allow set Hive properties at session level

closes #1365

    • -0
    • +17
    ./exec/TestHiveDrillNativeParquetReader.java
    • -0
    • +2
    ./exec/hive/TestInfoSchemaOnHiveStorage.java
    • -35
    • +45
    ./exec/store/hive/HiveTestDataGenerator.java
  1. … 18 more files in changeset.
DRILL-6454: Native MapR DB plugin support for Hive MapR-DB json table

closes #1314

    • -2
    • +2
    ./exec/TestHiveDrillNativeParquetReader.java
  1. … 16 more files in changeset.
DRILL-6438: Remove excess logging from the tests. - Removed usages of System.out and System.err from the tests and replaced them with loggers

closes #1284

    • -3
    • +2
    ./exec/test/Drill2130StorageHiveCoreHamcrestConfigurationTest.java
  1. … 89 more files in changeset.
DRILL-6242 Use java.time.Local{Date|Time|DateTime} for Drill Date, Time, Timestamp types. (#3)

close apache/drill#1247

* DRILL-6242 - Use java.time.Local{Date|Time|DateTime} classes to hold values from corresponding Drill date, time, and timestamp types.

Conflicts:

exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/ExtendedJsonOutput.java

Fix merge conflicts and check style.

    • -11
    • +9
    ./exec/TestHiveDrillNativeParquetReader.java
    • -7
    • +8
    ./exec/fn/hive/TestInbuiltHiveUDFs.java
    • -19
    • +18
    ./exec/hive/TestHiveStorage.java
  1. … 44 more files in changeset.
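To illustrate what holding values in java.time.Local{Date|Time|DateTime} looks like, here is a minimal sketch. It assumes, purely for illustration, that a date or timestamp arrives as milliseconds since the Unix epoch interpreted in UTC and that a time arrives as milliseconds of the day. The class LocalTemporalSketch and its methods are hypothetical and are not taken from the Drill code base.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.ZoneOffset;

// Hypothetical conversion helpers; the input encodings are assumptions.
final class LocalTemporalSketch {

    // Timestamp: epoch milliseconds, interpreted in UTC (assumption).
    static LocalDateTime toLocalDateTime(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis)
                .atOffset(ZoneOffset.UTC)
                .toLocalDateTime();
    }

    // Date: epoch milliseconds truncated to the calendar day in UTC (assumption).
    static LocalDate toLocalDate(long epochMillis) {
        return toLocalDateTime(epochMillis).toLocalDate();
    }

    // Time: milliseconds since midnight (assumption).
    static LocalTime toLocalTime(int millisOfDay) {
        return LocalTime.ofNanoOfDay(millisOfDay * 1_000_000L);
    }

    public static void main(String[] args) {
        System.out.println(toLocalDateTime(0L));
        System.out.println(toLocalDate(86_400_000L));
        System.out.println(toLocalTime(3_600_000));
    }
}
```

The Local* classes carry no time zone, so a value reads the same on every machine once it has been converted.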
DRILL-6173: Support transitive closure during filter push down and partition pruning

closes #1216

    • -0
    • +15
    ./exec/TestHivePartitionPruning.java
  1. … 35 more files in changeset.
DRILL-6320: Fixed license headers.

closes #1207

    • -1
    • +1
    ./exec/fn/hive/TestSampleHiveUDFs.java
    • -1
    • +1
    ./exec/hive/TestInfoSchemaOnHiveStorage.java
    • -1
    • +0
    ./exec/store/hive/HiveTestDataGenerator.java
  1. … 2052 more files in changeset.
DRILL-6331: Revisit Hive Drill native parquet implementation to be exposed to Drill optimizations (filter / limit push down, count to direct scan)

1. Factored out common logic for Drill parquet reader and Hive Drill native parquet readers: AbstractParquetGroupScan, AbstractParquetRowGroupScan, AbstractParquetScanBatchCreator.

2. Rules that worked previously only with ParquetGroupScan, now can be applied for any class that extends AbstractParquetGroupScan: DrillFilterItemStarReWriterRule, ParquetPruneScanRule, PruneScanRule.

3. Hive populates partition values based on information returned from the Hive metastore. Drill populates partition values based on the path difference between the selection root and the actual file path.

Previously, ColumnExplorer populated partition values using the Drill approach. Since ColumnExplorer now also populates values for parquet files from Hive tables, the `populateImplicitColumns` method logic was changed to populate partition columns based only on the given partition values.

4. Refactored ParquetPartitionDescriptor to be responsible for populating partition values rather than storing this logic in parquet group scan class.

5. Metadata class was moved to a separate metadata package (org.apache.drill.exec.store.parquet.metadata). Factored out several inner classes to improve code readability.

6. Collected all Drill native parquet reader unit tests into one class TestHiveDrillNativeParquetReader, also added new tests to cover new functionality.

7. Reduced excessive logging when parquet files metadata is read

closes #1214

    • -0
    • +248
    ./exec/TestHiveDrillNativeParquetReader.java
    • -19
    • +0
    ./exec/TestHivePartitionPruning.java
    • -13
    • +0
    ./exec/TestHiveProjectPushDown.java
    • -189
    • +26
    ./exec/hive/TestHiveStorage.java
    • -2
    • +4
    ./exec/hive/TestInfoSchemaOnHiveStorage.java
    • -21
    • +61
    ./exec/store/hive/HiveTestDataGenerator.java
  1. … 57 more files in changeset.
DRILL-6130: Fix NPE during physical plan submission for various storage plugins

1. Fixed ser / de issues for Hive, Kafka, Hbase plugins.

2. Added physical plan submission unit test for all storage plugins in contrib module.

3. Refactoring.

closes #1108

  1. … 26 more files in changeset.
DRILL-6106: Use valueOf method instead of constructor since valueOf has a higher performance by caching frequently requested values.

closes #1099

  1. … 11 more files in changeset.
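The caching that DRILL-6106 refers to is documented JDK behaviour: Integer.valueOf always caches values in the range -128 to 127, and Boolean.valueOf returns the shared Boolean.TRUE or Boolean.FALSE instance, whereas a constructor allocates a new object each time. A small sketch that makes the cache visible through reference comparison (the class name ValueOfCacheSketch is hypothetical):

```java
// Hypothetical demo class; shows JDK wrapper caching, not Drill code.
final class ValueOfCacheSketch {

    // True when two valueOf calls hand back the very same object.
    // Guaranteed for values in -128..127; larger values depend on the JVM.
    static boolean sameInstance(int value) {
        Integer first = Integer.valueOf(value);
        Integer second = Integer.valueOf(value);
        return first == second; // deliberate reference comparison
    }

    // Boolean.valueOf never allocates: it returns the shared constant.
    static boolean sameBooleanInstance() {
        return Boolean.valueOf(true) == Boolean.TRUE;
    }

    public static void main(String[] args) {
        System.out.println("Integer 127 cached: " + sameInstance(127));
        System.out.println("Boolean cached: " + sameBooleanInstance());
    }
}
```

Reference comparison is used here only to expose the cache. Application code should still compare wrapper values with equals().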
DRILL-5730: Mock testing improvements and interface improvements

closes #1045

  1. … 223 more files in changeset.
DRILL-5989: Categorize some tests to speed up smoke tests. Made Travis run tests.

closes #1053

  1. … 31 more files in changeset.
DRILL-5978: Updating of Apache and MapR Hive libraries to 2.3.2 and 2.1.2-mapr-1710 versions respectively

* Improvements to allow reading Hive bucketed transactional ORC tables;

* Updating Hive properties for tests and resolving dependency and API conflicts:

- Fix for "hive.metastore.schema.verification", MetaException(message: Version information not found in metastore), see https://cwiki.apache.org/confluence/display/Hive/Hive+Schema+Tool The METASTORE_SCHEMA_VERIFICATION="false" property is added

- Added the METASTORE_AUTO_CREATE_ALL="true" property to tests, because some additional tables are necessary in the Hive metastore

- Disabled Calcite CBO (Hive's CalcitePlanner) for tests with the HIVE_CBO_ENABLED="false" property, because it conflicts with Drill's Calcite version in Drill unit tests

- jackson and parquet libraries are relocated in the hive-exec-shade module

- The Drill version of org.apache.parquet:parquet-column is added to "hive-exec" to allow using a Parquet empty group at the MessageType level (PARQUET-278)

- Removed the commons-codec exclusion from hive core. This dependency is necessary for hive-exec and hive-metastore.

- Set Hive internal properties for transactional scan: HiveConf.HIVE_TRANSACTIONAL_TABLE_SCAN, and for schema evolution: HiveConf.HIVE_SCHEMA_EVOLUTION, IOConstants.SCHEMA_EVOLUTION_COLUMNS, IOConstants.SCHEMA_EVOLUTION_COLUMNS_TYPES

- "io.dropwizard.metrics:metrics-core" with the latest version, 4.0.2, is added to the dependencyManagement block in the Drill root POM

- Exclusion of "hive-exec" in "hive-hbase-handler" is already in the Drill root dependencyManagement POM

- Hive Calcite libraries are excluded (Calcite CBO was disabled)

- "jackson-core" dependency is added to the dependencyManagement block in the Drill root POM file

- For the MapR Hive 2.1 client, an older "com.fasterxml.jackson.core:jackson-databind" is included

- "log4j:log4j" dependency is excluded from "hive-exec", "hive-metastore", "hive-hbase-handler".

close apache/drill#1111

    • -0
    • +3
    ./exec/store/hive/HiveTestDataGenerator.java
  1. … 10 more files in changeset.