DRILL-7337: Add vararg UDFs support

  1. … 37 more files in changeset.
DRILL-6927: Avoid double conversion from impala timestamp when hive native parquet reader is used closes #1655

    • -13
    • +19
    ./logical/ConvertHiveParquetScanToDrillParquetScan.java
  1. … 1 more file in changeset.
DRILL-5603: Replace String file paths to Hadoop Path

- replaced all String path representation with org.apache.hadoop.fs.Path

- added PathSerDe.Se JSON serializer

- refactoring of DFSPartitionLocation code by leveraging existing listPartitionValues() functionality

closes #1657

  1. … 82 more files in changeset.
DRILL-6929: Exclude maprfs jar for default profile closes #1586

    • -180
    • +0
    ./logical/ConvertHiveMapRDBJsonScanToDrillMapRDBJsonScan.java
  1. … 7 more files in changeset.
DRILL-6744: Support varchar and decimal push down

1. Added the enableStringsSignedMinMax parquet format plugin config and the store.parquet.reader.strings_signed_min_max session option to control reading binary statistics for files generated by Parquet versions prior to 1.10.0.

2. Added ParquetReaderConfig to store the configuration needed when reading parquet statistics or files.

3. Provided a mechanism to enable varchar / decimal filter push down.

4. Added VersionUtil to compare Drill versions in string representation.

5. Added appropriate unit tests.

closes #1537

    • -3
    • +7
    ./logical/ConvertHiveParquetScanToDrillParquetScan.java
  1. … 41 more files in changeset.
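Item 4 of DRILL-6744 adds a VersionUtil for comparing Drill versions given as strings. The following is a minimal sketch of such a comparison, assuming dotted numeric components with an optional qualifier after `-` (as in `1.15.0-SNAPSHOT`) that is ignored; the class and method names are hypothetical and do not reproduce Drill's VersionUtil API.

```java
import java.util.Arrays;

// Hypothetical helper, not Drill's actual VersionUtil:
// compares version strings such as "1.14.0" and "1.15.0-SNAPSHOT".
final class VersionCompareSketch {

  private VersionCompareSketch() {}

  // Parses "1.15.0-SNAPSHOT" into [1, 15, 0]; any qualifier after '-' is ignored (an assumption).
  static int[] parse(String version) {
    String numeric = version.split("-", 2)[0];
    return Arrays.stream(numeric.split("\\."))
        .mapToInt(Integer::parseInt)
        .toArray();
  }

  // Negative, zero, or positive as v1 is lower than, equal to, or higher than v2.
  // Missing trailing components count as 0, so "1.15" equals "1.15.0".
  static int compare(String v1, String v2) {
    int[] a = parse(v1);
    int[] b = parse(v2);
    int length = Math.max(a.length, b.length);
    for (int i = 0; i < length; i++) {
      int x = i < a.length ? a[i] : 0;
      int y = i < b.length ? b[i] : 0;
      if (x != y) {
        return Integer.compare(x, y);
      }
    }
    return 0;
  }
}
```

Comparing components as integers is the point of such a helper: a plain string comparison would rank `1.9.0` above `1.10.0`.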
DRILL-6422: Replace guava imports with shaded ones

  1. … 983 more files in changeset.
DRILL-6614: Allow usage of MapRDBFormatPlugin for HiveStoragePlugin

    • -8
    • +2
    ./logical/ConvertHiveMapRDBJsonScanToDrillMapRDBJsonScan.java
  1. … 6 more files in changeset.
DRILL-6575: Add store.hive.conf.properties option to allow set Hive properties at session level

closes #1365

    • -3
    • +3
    ./logical/ConvertHiveMapRDBJsonScanToDrillMapRDBJsonScan.java
    • -10
    • +11
    ./logical/ConvertHiveParquetScanToDrillParquetScan.java
  1. … 20 more files in changeset.
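DRILL-6575 lets users pass Hive properties at session level through the `store.hive.conf.properties` option. Assuming the option value holds one `key=value` pair per line (an assumption about the format, not something this log states), a minimal sketch can parse it with `java.util.Properties`; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical sketch: turns a session option value such as
// "hive.mapred.supports.subdirectories=true\nmapred.input.dir.recursive=true"
// into a map of Hive property overrides. The newline-separated format is an assumption.
final class HiveSessionPropertiesSketch {

  private HiveSessionPropertiesSketch() {}

  static Map<String, String> parse(String optionValue) {
    Properties properties = new Properties();
    try {
      // Properties.load understands "key=value" lines, blank lines and comment lines.
      properties.load(new StringReader(optionValue));
    } catch (IOException e) {
      // A StringReader performs no real I/O, but load() declares IOException.
      throw new UncheckedIOException(e);
    }
    Map<String, String> result = new TreeMap<>();
    for (String name : properties.stringPropertyNames()) {
      result.put(name, properties.getProperty(name));
    }
    return result;
  }
}
```

`Properties.load` also skips lines starting with `#` or `!`, which keeps a hand-written option value tolerant of comments.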
DRILL-6454: Native MapR DB plugin support for Hive MapR-DB json table

closes #1314

    • -0
    • +186
    ./logical/ConvertHiveMapRDBJsonScanToDrillMapRDBJsonScan.java
    • -111
    • +5
    ./logical/ConvertHiveParquetScanToDrillParquetScan.java
  1. … 15 more files in changeset.
DRILL-6386: Remove unused imports and star imports.

    • -1
    • +0
    ./logical/HivePushPartitionFilterIntoScan.java
  1. … 230 more files in changeset.
DRILL-6320: Fixed license headers.

closes #1207

    • -1
    • +1
    ./HiveUDFOperatorWithoutInference.java
    • -2
    • +1
    ./logical/HivePushPartitionFilterIntoScan.java
  1. … 2063 more files in changeset.
DRILL-6331: Revisit Hive Drill native parquet implementation to be exposed to Drill optimizations (filter / limit push down, count to direct scan)

1. Factored out common logic for Drill parquet reader and Hive Drill native parquet readers: AbstractParquetGroupScan, AbstractParquetRowGroupScan, AbstractParquetScanBatchCreator.

2. Rules that previously worked only with ParquetGroupScan can now be applied to any class that extends AbstractParquetGroupScan: DrillFilterItemStarReWriterRule, ParquetPruneScanRule, PruneScanRule.

3. Hive populates partition values based on information returned from the Hive metastore, while Drill populates them based on the path difference between the selection root and the actual file path.

Previously, ColumnExplorer populated partition values using the Drill approach. Since ColumnExplorer now also populates values for parquet files from Hive tables, the `populateImplicitColumns` method logic was changed to populate partition columns based only on the given partition values.

4. Refactored ParquetPartitionDescriptor to be responsible for populating partition values rather than storing this logic in parquet group scan class.

5. The Metadata class was moved to a separate metadata package (org.apache.drill.exec.store.parquet.metadata). Factored out several inner classes to improve code readability.

6. Collected all Drill native parquet reader unit tests into one class, TestHiveDrillNativeParquetReader, and added new tests to cover the new functionality.

7. Reduced excessive logging when parquet file metadata is read.

closes #1214

    • -17
    • +40
    ./logical/ConvertHiveParquetScanToDrillParquetScan.java
  1. … 64 more files in changeset.
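Item 3 of DRILL-6331 contrasts Hive's metastore-provided partition values with Drill's approach of deriving them from the path difference between the selection root and the file path. A minimal sketch of the Drill-style derivation follows; it uses `java.nio.file.Path` instead of Hadoop's `Path`, and the class and method names are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of Drill-style implicit partition columns: each directory
// between the selection root and the file becomes dir0, dir1, ...
final class PartitionFromPathSketch {

  private PartitionFromPathSketch() {}

  static Map<String, String> partitionValues(String selectionRoot, String filePath) {
    Path root = Paths.get(selectionRoot);
    Path file = Paths.get(filePath);
    // Directory part relative to the root, e.g. "2018/Q1" for "/data/2018/Q1/f.parquet".
    Path relativeDir = root.relativize(file).getParent();
    Map<String, String> values = new LinkedHashMap<>();
    if (relativeDir == null) {
      return values; // file sits directly under the root: no partition values
    }
    for (int i = 0; i < relativeDir.getNameCount(); i++) {
      values.put("dir" + i, relativeDir.getName(i).toString());
    }
    return values;
  }
}
```

For Hive tables the values come from the metastore instead, which is why the commit makes `populateImplicitColumns` rely only on the partition values it is given.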
DRILL-6381: (Part 3) Planner and Execution implementation to support Secondary Indexes

  1. Index Planning Rules and Plan generators

    - DbScanToIndexScanRule: Top level physical planning rule that drives index planning for several relational algebra patterns.

    - DbScanSortRemovalRule: Physical planning rule for index planning for Sort-based operations.

    - Plan Generators: Covering, Non-Covering and Intersect physical plan generators.

    - Support planning with functional indexes such as CAST functions.

    - Enhance PlannerSettings with several configuration options for indexes.

  2. Index Selection and Statistics

    - An IndexSelector that supports cost-based index selection of covering and non-covering indexes using statistics and collation properties.

    - Costing of index intersection for comparison with single-index plans.

  3. Planning and execution operators

    - Support RangePartitioning physical operator during query planning and execution.

    - Support RowKeyJoin physical operator during query planning and execution.

    - HashTable and HashJoin changes to support RowKeyJoin and Index Intersection.

    - Enhance Materializer to keep track of subscan association with a particular rowkey join.

  4. Index Planning utilities

    - Utility classes to perform RexNode analysis, including conversion to and from SchemaPath.

    - Utility class to analyze filter condition and an input collation to determine output collation.

    - Helper classes to maintain index contexts for logical and physical planning phase.

    - IndexPlanUtils utility class for various helper methods.

  5. Miscellaneous

    - Separate physical rel for DirectScan.

    - Modify LimitExchangeTranspose rule to handle SingleMergeExchange.

- MD-3880: Return correct status from RangePartitionRecordBatch setupNewSchema

Co-authored-by: Aman Sinha <asinha@maprtech.com>

Co-authored-by: chunhui-shi <cshi@maprtech.com>

Co-authored-by: Gautam Parai <gparai@maprtech.com>

Co-authored-by: Padma Penumarthy <ppenumar97@yahoo.com>

Co-authored-by: Hanumath Rao Maduri <hmaduri@maprtech.com>

Conflicts:

exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/HashJoinPOP.java

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTable.java

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableTemplate.java

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/Materializer.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillMergeProjectRule.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushProjectIntoScanRule.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillScanRel.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/BroadcastExchangePrel.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/DrillDistributionTrait.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PrelUtil.java

exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java

exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetPushDownFilter.java

exec/java-exec/src/main/resources/drill-module.conf

logical/src/main/java/org/apache/drill/common/logical/StoragePluginConfig.java

Resolve merge conflicts and compilation issues.

    • -1
    • +1
    ./logical/ConvertHiveMapRDBJsonScanToDrillMapRDBJsonScan.java
  1. … 93 more files in changeset.
DRILL-6130: Fix NPE during physical plan submission for various storage plugins

1. Fixed ser / de issues for Hive, Kafka, HBase plugins.

2. Added physical plan submission unit tests for all storage plugins in the contrib module.

3. Refactoring.

closes #1108

    • -7
    • +7
    ./logical/ConvertHiveParquetScanToDrillParquetScan.java
  1. … 25 more files in changeset.
DRILL-4264: Allow field names to include dots

    • -2
    • +2
    ./logical/ConvertHiveParquetScanToDrillParquetScan.java
  1. … 98 more files in changeset.
DRILL-5043: Function that returns a unique id per session/connection similar to MySQL's CONNECTION_ID() #685

    • -2
    • +2
    ./logical/ConvertHiveParquetScanToDrillParquetScan.java
  1. … 27 more files in changeset.
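A unique id per session, in the spirit of MySQL's `CONNECTION_ID()` named in DRILL-5043, can be produced by handing out values from a shared counter as each session is created. The sketch below uses a hypothetical `SessionSketch` class and is not Drill's implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: each session receives a unique, increasing id from a
// process-wide counter, similar in spirit to MySQL's CONNECTION_ID().
final class SessionSketch {

  private static final AtomicInteger NEXT_ID = new AtomicInteger();

  private final int connectionId;

  SessionSketch() {
    // incrementAndGet is atomic, so sessions created concurrently never share an id.
    this.connectionId = NEXT_ID.incrementAndGet();
  }

  // The value a CONNECTION_ID()-style function would return for this session.
  int connectionId() {
    return connectionId;
  }
}
```

A per-process counter only guarantees uniqueness within one JVM; across several nodes it would have to be combined with something that identifies the node.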
DRILL-5034: Select timestamp from hive generated parquet always return in UTC

- The TIMESTAMP_IMPALA function is reverted to retain the local timezone

- TIMESTAMP_IMPALA_LOCALTIMEZONE is deleted

- Retain the local timezone for INT96 timestamp values in parquet files while the PARQUET_READER_INT96_AS_TIMESTAMP option is on

Minor changes according to the review

Fix for the test, which relies on a particular timezone

close #656

    • -1
    • +1
    ./logical/ConvertHiveParquetScanToDrillParquetScan.java
  1. … 6 more files in changeset.
Revert "DRILL-4373: Drill and Hive have incompatible timestamp representations in parquet - added sys/sess option "store.parquet.int96_as_timestamp"; - added int96 to timestamp converter for both readers; - added unit tests;"

This reverts commit 7e7214b40784668d1599f265067f789aedb6cf86.

    • -2
    • +1
    ./logical/ConvertHiveParquetScanToDrillParquetScan.java
  1. … 14 more files in changeset.
DRILL-5032: Drill query on hive parquet table failed with OutOfMemoryError: Java heap space

close apache/drill#654

    • -7
    • +7
    ./logical/ConvertHiveParquetScanToDrillParquetScan.java
  1. … 21 more files in changeset.
DRILL-4373: Drill and Hive have incompatible timestamp representations in parquet

- added sys/sess option "store.parquet.int96_as_timestamp";

- added int96 to timestamp converter for both readers;

- added unit tests;

This closes #600

    • -1
    • +2
    ./logical/ConvertHiveParquetScanToDrillParquetScan.java
  1. … 15 more files in changeset.
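DRILL-4373 adds an INT96-to-timestamp converter. INT96 timestamps written by Hive and Impala are 12 bytes: 8 little-endian bytes of nanoseconds within the day, then 4 little-endian bytes holding the Julian day number. A minimal sketch of decoding that layout to a UTC `Instant` is below; the class name is hypothetical, and the local-timezone handling that DRILL-5034 and DRILL-6927 deal with is deliberately left out.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: converts a 12-byte Parquet INT96 timestamp (Impala/Hive layout:
// 8 bytes nanos-of-day, then 4 bytes Julian day, both little-endian) to a UTC Instant.
final class Int96TimestampSketch {

  // Julian day number of the Unix epoch, 1970-01-01.
  static final long JULIAN_DAY_OF_EPOCH = 2_440_588L;

  private Int96TimestampSketch() {}

  static Instant toInstant(byte[] int96) {
    if (int96.length != 12) {
      throw new IllegalArgumentException("INT96 value must be 12 bytes");
    }
    ByteBuffer buffer = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
    long nanosOfDay = buffer.getLong(); // bytes 0-7
    int julianDay = buffer.getInt();    // bytes 8-11
    long epochDay = julianDay - JULIAN_DAY_OF_EPOCH;
    long epochSeconds = TimeUnit.DAYS.toSeconds(epochDay) + nanosOfDay / 1_000_000_000L;
    return Instant.ofEpochSecond(epochSeconds, nanosOfDay % 1_000_000_000L);
  }

  // Inverse helper; floor division keeps pre-1970 instants correct.
  static byte[] fromInstant(Instant instant) {
    long epochDay = Math.floorDiv(instant.getEpochSecond(), 86_400L);
    long secondOfDay = Math.floorMod(instant.getEpochSecond(), 86_400L);
    long nanosOfDay = secondOfDay * 1_000_000_000L + instant.getNano();
    return ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN)
        .putLong(nanosOfDay)
        .putInt((int) (epochDay + JULIAN_DAY_OF_EPOCH))
        .array();
  }
}
```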
DRILL-4530: Optimize partition pruning with metadata caching for the single partition case.

- Enhance PruneScanRule to detect single partitions based on referenced dirs in the filter.

- Keep a new status of EXPANDED_PARTIAL for FileSelection.

- Create separate .directories metadata file to prune directories first before files.

- Introduce cacheFileRoot attribute to keep track of the parent directory of the cache file after partition pruning.

Check if prefix components are non-null the very first time single partition info is initialized.

Add separate interface method to create scan using a cacheFileRoot.

Create filenames list with unique names using fileSet if available. Add several unit tests.

Populate only fileSet when expanding using the metadata cache.

Remove cacheFileRoot parameter from FileGroupScan's clone() method and instead leverage it from FileSelection.

Keep track of whether all partitions were previously pruned and process this state where needed.

close apache/drill#519

  1. … 33 more files in changeset.
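DRILL-4530 detects a single partition from the directories referenced in the filter. One hypothetical way to express that check: walk `dir0`, `dir1`, ... and keep going while the filter pins each level to exactly one value; the values collected form the deepest directory that all matching files share, in the spirit of the cacheFileRoot attribute mentioned in that commit. The sketch assumes the filter has already been reduced to a map from directory level to the set of values it allows; the names are illustrative and not Drill's PruneScanRule API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: given, for each directory level (0 = dir0), the set of values
// the filter allows, return the path prefix shared by every matching partition.
final class SinglePartitionSketch {

  private SinglePartitionSketch() {}

  static Optional<String> singlePartitionPrefix(Map<Integer, Set<String>> valuesByLevel) {
    List<String> prefix = new ArrayList<>();
    // Stop at the first level that is unreferenced or allows more than one value.
    for (int level = 0; ; level++) {
      Set<String> values = valuesByLevel.get(level);
      if (values == null || values.size() != 1) {
        break;
      }
      prefix.add(values.iterator().next());
    }
    // If dir0 itself is not pinned, the filter does not narrow the scan to one directory.
    return prefix.isEmpty() ? Optional.empty() : Optional.of(String.join("/", prefix));
  }
}
```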
DRILL-4372: (continued) Support for Window functions: CUME_DIST, DENSE_RANK, PERCENT_RANK, RANK, ROW_NUMBER, NTILE, LEAD, LAG, FIRST_VALUE, LAST_VALUE

    • -0
    • +44
    ./HiveUDFOperatorWithoutInference.java
  1. … 23 more files in changeset.
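Several of the window functions listed in DRILL-4372 differ only in how they treat ties. A minimal sketch over an already-sorted list of ORDER BY keys (one window partition, no frame clause) shows the standard SQL semantics of ROW_NUMBER, RANK and DENSE_RANK; it is illustrative only and is not how Drill's window operator is implemented.

```java
import java.util.List;

// Illustrative sketch of standard SQL ranking semantics for one window partition
// whose rows are already sorted by the ORDER BY key.
final class RankingSketch {

  private RankingSketch() {}

  // ROW_NUMBER: 1, 2, 3, ... regardless of ties.
  static int[] rowNumber(List<Integer> sortedKeys) {
    int[] out = new int[sortedKeys.size()];
    for (int i = 0; i < out.length; i++) {
      out[i] = i + 1;
    }
    return out;
  }

  // RANK: ties share a rank and the next distinct key skips ahead (1, 1, 3).
  static int[] rank(List<Integer> sortedKeys) {
    int[] out = new int[sortedKeys.size()];
    for (int i = 0; i < out.length; i++) {
      boolean tie = i > 0 && sortedKeys.get(i).equals(sortedKeys.get(i - 1));
      out[i] = tie ? out[i - 1] : i + 1;
    }
    return out;
  }

  // DENSE_RANK: ties share a rank with no gaps (1, 1, 2).
  static int[] denseRank(List<Integer> sortedKeys) {
    int[] out = new int[sortedKeys.size()];
    for (int i = 0; i < out.length; i++) {
      boolean tie = i > 0 && sortedKeys.get(i).equals(sortedKeys.get(i - 1));
      out[i] = i == 0 ? 1 : (tie ? out[i - 1] : out[i - 1] + 1);
    }
    return out;
  }
}
```

For keys 10, 20, 20, 30 these give 1, 2, 3, 4; 1, 2, 2, 4; and 1, 2, 2, 3. PERCENT_RANK is derived from RANK as (rank - 1) / (rows - 1).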
DRILL-4372: (continued) Add option to disable/enable function output type inference

    • -0
    • +44
    ./HiveUDFOperatorNotInfer.java
  1. … 15 more files in changeset.
DRILL-4372: (continued) Type inference for HiveUDFs

  1. … 2 more files in changeset.
DRILL-4589: Reduce planning time for file system partition pruning by reducing filter evaluation overhead

  1. … 10 more files in changeset.
DRILL-3745: Hive CHAR not supported

    • -0
    • +5
    ./logical/ConvertHiveParquetScanToDrillParquetScan.java
  1. … 11 more files in changeset.
DRILL-4323: Handle skipAll query when use HiveDrillNativeParquetScan

Do not add a Project when no columns need to be read from the Scan (e.g., select count(*) from hive.table)

    • -3
    • +6
    ./logical/ConvertHiveParquetScanToDrillParquetScan.java
  1. … 3 more files in changeset.
DRILL-4327: Fix rawtypes warnings in drill codebase

Fixing most rawtypes warning issues in drill modules.

Closes #347

    • -4
    • +4
    ./logical/ConvertHiveParquetScanToDrillParquetScan.java
  1. … 77 more files in changeset.
DRILL-4256: Create HiveConf per HiveStoragePlugin and reuse it wherever needed.

Creating new instances of HiveConf() is very costly, so we should avoid creating new ones as much as possible.

Also get rid of hiveConfigOverride and use the HiveConf in HiveStoragePlugin wherever we need a HiveConf.

    • -6
    • +15
    ./logical/ConvertHiveParquetScanToDrillParquetScan.java
  1. … 12 more files in changeset.
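DRILL-4256 replaces repeated `new HiveConf()` calls with one instance per storage plugin. The pattern is to build the expensive object once, in the plugin, and hand the same reference to every caller. The sketch below uses a hypothetical `ExpensiveConf` stand-in (a map of properties) so it runs without Hive on the classpath; only the build-once-and-share logic is meant to carry over.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an expensive configuration object such as HiveConf.
final class ExpensiveConf {
  static final AtomicInteger CONSTRUCTED = new AtomicInteger();
  final Map<String, String> properties = new HashMap<>();

  ExpensiveConf(Map<String, String> overrides) {
    CONSTRUCTED.incrementAndGet(); // counts constructions to make reuse observable
    properties.putAll(overrides);
  }
}

// Hypothetical plugin that builds its configuration once and shares it.
final class StoragePluginSketch {
  private final ExpensiveConf conf;

  StoragePluginSketch(Map<String, String> configProps) {
    // Built exactly once per plugin instance instead of once per call site.
    this.conf = new ExpensiveConf(configProps);
  }

  // Scans, rules and metastore clients ask the plugin rather than constructing their own.
  ExpensiveConf getConf() {
    return conf;
  }
}
```

Because every caller shares one instance, code that needs different properties should work on a copy rather than mutate the shared object.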
DRILL-2517: Move directory-based partition pruning to Calcite logical planning phase.

1) Make the directory-based pruning rule work in both the Calcite logical and Drill logical planning phases.

2) Only apply directory-based pruning in the logical phase when there is no metadata cache.

3) Make FileSelection constructor public, since FileSelection.create() would modify selectionRoot.

    • -7
    • +8
    ./logical/HivePushPartitionFilterIntoScan.java
  1. … 15 more files in changeset.