DRILL-6927: Avoid double conversion from impala timestamp when hive native parquet reader is used closes #1655

    • -13
    • +19
    ./ConvertHiveParquetScanToDrillParquetScan.java
  1. … 1 more file in changeset.
DRILL-6929: Exclude maprfs jar for default profile closes #1586

    • -180
    • +0
    ./ConvertHiveMapRDBJsonScanToDrillMapRDBJsonScan.java
  1. … 7 more files in changeset.
DRILL-6744: Support varchar and decimal push down

1. Added enableStringsSignedMinMax parquet format plugin config and store.parquet.reader.strings_signed_min_max session option to control reading binary statistics for files generated by Parquet versions prior to 1.10.0.

2. Added ParquetReaderConfig to store configuration needed during reading parquet statistics or files.

3. Provided mechanism to enable varchar / decimal filter push down.

4. Added VersionUtil to compare Drill versions in string representation.

5. Added appropriate unit tests.

closes #1537
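
Item 4 above mentions comparing Drill versions held as strings. The following is a minimal sketch of how such a comparison could work, assuming dotted numeric versions with an optional suffix such as -SNAPSHOT. The class and method names are hypothetical and are not taken from Drill's VersionUtil.

```java
final class VersionCompareSketch {

  // Parses "major.minor.patch[-suffix]" into three numbers; missing parts count as 0.
  static int[] parse(String version) {
    String numeric = version.split("-", 2)[0]; // drop a suffix such as -SNAPSHOT
    String[] parts = numeric.split("\\.");
    int[] result = new int[3];
    for (int i = 0; i < result.length && i < parts.length; i++) {
      result[i] = Integer.parseInt(parts[i].trim());
    }
    return result;
  }

  // Returns a negative number, zero, or a positive number, like Comparator.compare.
  static int compare(String left, String right) {
    int[] a = parse(left);
    int[] b = parse(right);
    for (int i = 0; i < a.length; i++) {
      if (a[i] != b[i]) {
        return Integer.compare(a[i], b[i]);
      }
    }
    return 0;
  }
}
```

Comparing the parts numerically matters because a plain string comparison would order "1.9.0" after "1.14.0".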

    • -3
    • +7
    ./ConvertHiveParquetScanToDrillParquetScan.java
  1. … 41 more files in changeset.
DRILL-6614: Allow usage of MapRDBFormatPlugin for HiveStoragePlugin

    • -8
    • +2
    ./ConvertHiveMapRDBJsonScanToDrillMapRDBJsonScan.java
  1. … 6 more files in changeset.
DRILL-6575: Add store.hive.conf.properties option to allow setting Hive properties at session level

closes #1365
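
A rough sketch of the idea behind a session-level override, assuming the option value is a string of newline-separated key=value pairs and that session values take precedence over plugin-level ones. Both assumptions, and the class and method names, are illustrative only and are not taken from the Drill source.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class SessionHivePropertiesSketch {

  // Returns plugin-level properties overlaid with the properties parsed
  // from the session option value (assumed format: key=value per line).
  static Properties merge(Properties pluginLevel, String sessionValue) {
    Properties merged = new Properties();
    merged.putAll(pluginLevel);
    Properties session = new Properties();
    try {
      session.load(new StringReader(sessionValue));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    merged.putAll(session); // session entries win on key collisions
    return merged;
  }
}
```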

    • -3
    • +3
    ./ConvertHiveMapRDBJsonScanToDrillMapRDBJsonScan.java
    • -10
    • +11
    ./ConvertHiveParquetScanToDrillParquetScan.java
  1. … 20 more files in changeset.
DRILL-6454: Native MapR DB plugin support for Hive MapR-DB json table

closes #1314

    • -0
    • +186
    ./ConvertHiveMapRDBJsonScanToDrillMapRDBJsonScan.java
    • -111
    • +5
    ./ConvertHiveParquetScanToDrillParquetScan.java
  1. … 15 more files in changeset.
DRILL-6386: Remove unused imports and star imports.

    • -1
    • +0
    ./HivePushPartitionFilterIntoScan.java
  1. … 231 more files in changeset.
DRILL-6320: Fixed license headers.

closes #1207

    • -2
    • +1
    ./HivePushPartitionFilterIntoScan.java
  1. … 2066 more files in changeset.
DRILL-6331: Revisit Hive Drill native parquet implementation to be exposed to Drill optimizations (filter / limit push down, count to direct scan)

1. Factored out common logic for Drill parquet reader and Hive Drill native parquet readers: AbstractParquetGroupScan, AbstractParquetRowGroupScan, AbstractParquetScanBatchCreator.

2. Rules that previously worked only with ParquetGroupScan can now be applied to any class that extends AbstractParquetGroupScan: DrillFilterItemStarReWriterRule, ParquetPruneScanRule, PruneScanRule.

3. Hive populates partition values based on information returned from the Hive metastore. Drill populates partition values based on the path difference between the selection root and the actual file path.

Previously, ColumnExplorer populated partition values using the Drill approach. Since ColumnExplorer now also populates values for parquet files from Hive tables, the `populateImplicitColumns` method logic was changed to populate partition columns based only on the given partition values.

4. Refactored ParquetPartitionDescriptor to be responsible for populating partition values rather than storing this logic in parquet group scan class.

5. Metadata class was moved to a separate metadata package (org.apache.drill.exec.store.parquet.metadata). Factored out several inner classes to improve code readability.

6. Collected all Drill native parquet reader unit tests into one class TestHiveDrillNativeParquetReader, also added new tests to cover new functionality.

7. Reduced excessive logging when parquet files metadata is read

closes #1214
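
Item 3 above describes the Drill approach as deriving partition values from the path difference between the selection root and the file path. A minimal sketch of that idea, assuming "/"-separated paths; the class and method names are hypothetical and this is not the ColumnExplorer code.

```java
import java.util.ArrayList;
import java.util.List;

final class PathPartitionSketch {

  // Returns the directory names that sit between selectionRoot and the file itself.
  static List<String> partitionValues(String selectionRoot, String filePath) {
    String root = selectionRoot.endsWith("/")
        ? selectionRoot.substring(0, selectionRoot.length() - 1)
        : selectionRoot;
    if (!filePath.startsWith(root + "/")) {
      throw new IllegalArgumentException(filePath + " is not under " + selectionRoot);
    }
    String[] segments = filePath.substring(root.length() + 1).split("/");
    List<String> values = new ArrayList<>();
    // The last segment is the file name, so it is not a partition value.
    for (int i = 0; i < segments.length - 1; i++) {
      values.add(segments[i]);
    }
    return values;
  }
}
```

For a Hive table the directory layout need not match the partition keys, which is why the commit passes explicit partition values instead of relying on this kind of path arithmetic.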

    • -17
    • +40
    ./ConvertHiveParquetScanToDrillParquetScan.java
  1. … 64 more files in changeset.
DRILL-6381: (Part 3) Planner and Execution implementation to support Secondary Indexes

  1. Index Planning Rules and Plan generators

    - DbScanToIndexScanRule: Top level physical planning rule that drives index planning for several relational algebra patterns.

    - DbScanSortRemovalRule: Physical planning rule for index planning for Sort-based operations.

    - Plan Generators: Covering, Non-Covering and Intersect physical plan generators.

    - Support planning with functional indexes such as CAST functions.

    - Enhance PlannerSettings with several configuration options for indexes.

  2. Index Selection and Statistics

    - An IndexSelector that supports cost-based index selection of covering and non-covering indexes using statistics and collation properties.

    - Costing of index intersection for comparison with single-index plans.

  3. Planning and execution operators

    - Support RangePartitioning physical operator during query planning and execution.

    - Support RowKeyJoin physical operator during query planning and execution.

    - HashTable and HashJoin changes to support RowKeyJoin and Index Intersection.

    - Enhance Materializer to keep track of subscan association with a particular rowkey join.

  4. Index Planning utilities

    - Utility classes to perform RexNode analysis, including conversion to and from SchemaPath.

    - Utility class to analyze filter condition and an input collation to determine output collation.

    - Helper classes to maintain index contexts for logical and physical planning phase.

    - IndexPlanUtils utility class for various helper methods.

  5. Miscellaneous

    - Separate physical rel for DirectScan.

    - Modify LimitExchangeTranspose rule to handle SingleMergeExchange.

    - MD-3880: Return correct status from RangePartitionRecordBatch setupNewSchema

Co-authored-by: Aman Sinha <asinha@maprtech.com>

Co-authored-by: chunhui-shi <cshi@maprtech.com>

Co-authored-by: Gautam Parai <gparai@maprtech.com>

Co-authored-by: Padma Penumarthy <ppenumar97@yahoo.com>

Co-authored-by: Hanumath Rao Maduri <hmaduri@maprtech.com>

Conflicts:

exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/HashJoinPOP.java

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTable.java

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableTemplate.java

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/Materializer.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillMergeProjectRule.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushProjectIntoScanRule.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillScanRel.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/BroadcastExchangePrel.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/DrillDistributionTrait.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PrelUtil.java

exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java

exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetPushDownFilter.java

exec/java-exec/src/main/resources/drill-module.conf

logical/src/main/java/org/apache/drill/common/logical/StoragePluginConfig.java

Resolve merge conflicts and compilation issues.

    • -1
    • +1
    ./ConvertHiveMapRDBJsonScanToDrillMapRDBJsonScan.java
  1. … 93 more files in changeset.
DRILL-6130: Fix NPE during physical plan submission for various storage plugins

1. Fixed ser / de issues for Hive, Kafka, Hbase plugins.

2. Added physical plan submission unit test for all storage plugins in contrib module.

3. Refactoring.

closes #1108

    • -7
    • +7
    ./ConvertHiveParquetScanToDrillParquetScan.java
  1. … 26 more files in changeset.
DRILL-4264: Allow field names to include dots

    • -2
    • +2
    ./ConvertHiveParquetScanToDrillParquetScan.java
  1. … 98 more files in changeset.
DRILL-5043: Function that returns a unique id per session/connection similar to MySQL's CONNECTION_ID() #685

    • -2
    • +2
    ./ConvertHiveParquetScanToDrillParquetScan.java
  1. … 27 more files in changeset.
DRILL-5034: Select timestamp from hive generated parquet always return in UTC

- TIMESTAMP_IMPALA function is reverted to retain local timezone

- TIMESTAMP_IMPALA_LOCALTIMEZONE is deleted

- Retain local timezone for the INT96 timestamp values in the parquet files while PARQUET_READER_INT96_AS_TIMESTAMP option is on

Minor changes according to the review

Fix for the test, which relies on particular timezone

close #656

    • -1
    • +1
    ./ConvertHiveParquetScanToDrillParquetScan.java
  1. … 6 more files in changeset.
Revert "DRILL-4373: Drill and Hive have incompatible timestamp representations in parquet - added sys/sess option "store.parquet.int96_as_timestamp"; - added int96 to timestamp converter for both readers; - added unit tests;"

This reverts commit 7e7214b40784668d1599f265067f789aedb6cf86.

    • -2
    • +1
    ./ConvertHiveParquetScanToDrillParquetScan.java
  1. … 14 more files in changeset.
DRILL-5032: Drill query on hive parquet table failed with OutOfMemoryError: Java heap space

close apache/drill#654

    • -7
    • +7
    ./ConvertHiveParquetScanToDrillParquetScan.java
  1. … 22 more files in changeset.
DRILL-4373: Drill and Hive have incompatible timestamp representations in parquet - added sys/sess option "store.parquet.int96_as_timestamp"; - added int96 to timestamp converter for both readers; - added unit tests;

This closes #600
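
For context on the INT96-to-timestamp conversion mentioned above: Hive and Impala write Parquet timestamps as 12 bytes, with the nanoseconds within the day in the first 8 bytes and the Julian day number in the last 4, both little-endian. The sketch below converts such a value to epoch milliseconds. It is an illustration with hypothetical names, not Drill's converter, and it does not address the timezone handling discussed in DRILL-5034.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class Int96TimestampSketch {

  // Julian day number of the Unix epoch, 1970-01-01.
  static final long JULIAN_DAY_OF_EPOCH = 2_440_588L;
  static final long MILLIS_PER_DAY = 86_400_000L;
  static final long NANOS_PER_MILLI = 1_000_000L;

  // Converts a 12-byte INT96 value to milliseconds since the Unix epoch.
  static long toEpochMillis(byte[] int96) {
    if (int96.length != 12) {
      throw new IllegalArgumentException("INT96 value must be 12 bytes");
    }
    ByteBuffer buffer = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
    long nanosOfDay = buffer.getLong(); // bytes 0..7
    int julianDay = buffer.getInt();    // bytes 8..11
    return (julianDay - JULIAN_DAY_OF_EPOCH) * MILLIS_PER_DAY
        + nanosOfDay / NANOS_PER_MILLI;
  }

  // Builds an INT96 value; only here to exercise the converter.
  static byte[] encode(int julianDay, long nanosOfDay) {
    return ByteBuffer.allocate(12)
        .order(ByteOrder.LITTLE_ENDIAN)
        .putLong(nanosOfDay)
        .putInt(julianDay)
        .array();
  }
}
```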

    • -1
    • +2
    ./ConvertHiveParquetScanToDrillParquetScan.java
  1. … 15 more files in changeset.
DRILL-3745: Hive CHAR not supported

    • -0
    • +5
    ./ConvertHiveParquetScanToDrillParquetScan.java
  1. … 11 more files in changeset.
DRILL-4323: Handle skipAll query when use HiveDrillNativeParquetScan

Do not add Project when no column needs to be read from Scan (e.g., select count(*) from hive.table)

    • -3
    • +6
    ./ConvertHiveParquetScanToDrillParquetScan.java
  1. … 3 more files in changeset.
DRILL-4327: Fix rawtypes warnings in drill codebase

Fixing most rawtypes warning issues in drill modules.

Closes #347

    • -4
    • +4
    ./ConvertHiveParquetScanToDrillParquetScan.java
  1. … 77 more files in changeset.
DRILL-4256: Create HiveConf per HiveStoragePlugin and reuse it wherever needed.

Creating new instances of HiveConf is very costly; we should avoid creating new ones as much as possible.

Also get rid of hiveConfigOverride and use HiveConf in HiveStoragePlugin wherever we need the HiveConf.

    • -6
    • +15
    ./ConvertHiveParquetScanToDrillParquetScan.java
  1. … 13 more files in changeset.
DRILL-2517: Move directory-based partition pruning to Calcite logical planning phase.

1) Make the directory-based pruning rule work in both the Calcite logical and Drill logical planning phases.

2) Only apply directory-based pruning in logical phase when there is no metadata cache.

3) Make FileSelection constructor public, since FileSelection.create() would modify selectionRoot.

    • -7
    • +8
    ./HivePushPartitionFilterIntoScan.java
  1. … 16 more files in changeset.
DRILL-3739: (part 2) Fix issues in reading Hive tables with StorageHandler configuration (eg. Hive-HBase tables)

    • -6
    • +17
    ./ConvertHiveParquetScanToDrillParquetScan.java
  1. … 4 more files in changeset.
DRILL-4194: Improve performance of the HiveScan metadata fetch operation

+ Use the stats (numRows) stored in Hive metastore whenever available to calculate the costs for planning purposes

+ Delay the costly operation of loading InputSplits until needed. When InputSplits are loaded, cache them at query level to speed up subsequent access.

this closes #301
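
The "delay until needed, then cache" pattern described above can be sketched as a memoizing holder. This is a generic illustration with hypothetical names, assuming the expensive load is expressed as a Supplier; it is not the HiveScan implementation.

```java
import java.util.function.Supplier;

final class LazySplitsSketch<T> {

  private final Supplier<T> loader;
  private T cached;
  private boolean loaded;

  LazySplitsSketch(Supplier<T> loader) {
    this.loader = loader;
  }

  // Runs the loader on the first call only; later calls return the cached value.
  synchronized T get() {
    if (!loaded) {
      cached = loader.get();
      loaded = true;
    }
    return cached;
  }
}
```

Constructing the holder costs nothing, so planning steps that only need row-count statistics never trigger the load.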

    • -1
    • +2
    ./ConvertHiveParquetScanToDrillParquetScan.java
  1. … 4 more files in changeset.
DRILL-2517: (Prototype from Mehant) Move directory based partition pruning to logical phase.

    • -4
    • +5
    ./HivePushPartitionFilterIntoScan.java
  1. … 6 more files in changeset.
DRILL-3765: Move partitioning pruning to HepPlanner to avoid the performance overhead for redundant rule execution.

Add fall back option in planner.

close apache/drill#255

    • -2
    • +11
    ./HivePushPartitionFilterIntoScan.java
  1. … 9 more files in changeset.
DRILL-3938: Support reading from Hive tables that have schema altered after the creation

Also:

+ Remove "redoRecord" logic which is not needed after "automatic reallocation" (DRILL-1960) changes.

+ Remove HiveTestRecordReader. This is incomplete in implementation and not used anywhere. It is currently just a burden to maintain with changes in its superclass HiveRecordReader

    • -1
    • +14
    ./ConvertHiveParquetScanToDrillParquetScan.java
  1. … 6 more files in changeset.
DRILL-3209: Add support for reading Hive parquet tables using Drill native parquet reader

    • -0
    • +292
    ./ConvertHiveParquetScanToDrillParquetScan.java
  1. … 15 more files in changeset.
DRILL-3579: Fix issues in reading Hive tables with partition value __HIVE_DEFAULT_PARTITION__

Also:

1) Currently the code that interprets partition values in string format to the appropriate type is duplicated in HiveRecordReader and HivePartitionDescriptor. Refactor the code into a common place, HiveUtilities.

2) Add tests to test deserialization of partitions of all supported types.
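
A minimal sketch of interpreting a partition value string as a typed value, assuming that the __HIVE_DEFAULT_PARTITION__ name stands for a null partition key. The type enum is a hypothetical subset chosen for illustration and the code is not taken from HiveUtilities.

```java
final class PartitionValueSketch {

  // Partition name from the commit title; assumed here to represent a null key.
  static final String HIVE_DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__";

  // Hypothetical subset of partition column types, for illustration only.
  enum Type { INT, BIGINT, DOUBLE, BOOLEAN, STRING }

  // Converts the string form of a partition value to a typed Java object.
  static Object convert(String value, Type type) {
    if (value == null || HIVE_DEFAULT_PARTITION.equals(value)) {
      return null; // checked first, so the marker is never parsed as a number
    }
    switch (type) {
      case INT:
        return Integer.valueOf(value);
      case BIGINT:
        return Long.valueOf(value);
      case DOUBLE:
        return Double.valueOf(value);
      case BOOLEAN:
        return Boolean.valueOf(value);
      case STRING:
        return value;
      default:
        throw new IllegalArgumentException("Unsupported type: " + type);
    }
  }
}
```

Checking for the default partition name before parsing avoids a NumberFormatException on numeric partition columns.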

    • -4
    • +8
    ./HivePushPartitionFilterIntoScan.java
  1. … 9 more files in changeset.
DRILL-3121: Add support for interpreter based partition pruning for Hive tables. Remove the old partition pruning logic.

    • -100
    • +50
    ./HivePushPartitionFilterIntoScan.java
  1. … 10 more files in changeset.