Clone Tools
  • last updated 25 mins ago
Constraints
Constraints: committers
 
Constraints: files
Constraints: dates
DRILL-6724: Dump operator context to logs when error occurs during query execution

closes #1455

  1. … 97 more files in changeset.
DRILL-6422: Replace guava imports with shaded ones

  1. … 974 more files in changeset.
DRILL-6656: Disallow extra semicolons and multiple statements on the same line.

closes #1415

  1. … 143 more files in changeset.
DRILL-6617: Changing name of implicit RowId column from implicitColumn to implicitRIDColumn.

closes #1401

  1. … 20 more files in changeset.
DRILL-6639: Exception happens while displaying operator profiles for some queries

  1. … 16 more files in changeset.
DRILL-6639: Exception happens while displaying operator profiles for some queries

closes #1404

  1. … 16 more files in changeset.
DRILL-6635: PartitionLimit for Lateral/Unnest PartitionLimitBatch initial implementation Add unit tests for PartitionLimitBatch

    • -0
    • +62
    ./PartitionLimit.java
  1. … 6 more files in changeset.
DRILL-6617: Planner Side changed to propagate $drill_implicit_field$ information.

  1. … 35 more files in changeset.
DRILL-6503: Performance improvements in lateral

closes #1328

  1. … 3 more files in changeset.
DRILL-6545: Projection Push down into Lateral Join operator.

closes #1347

  1. … 14 more files in changeset.
DRILL-6385: Support JPPD feature

    • -0
    • +58
    ./RuntimeFilterPOP.java
  1. … 62 more files in changeset.
DRILL-6386: Remove unused imports and star imports.

  1. … 224 more files in changeset.
DRILL-6321: Lateral Join and Unnest - rules, options, logical plan supports

Included changes:

* Add planner.enable_unnest_lateral option. Default value set to false.

* Enable FilterCorrectRule

* Add support to logical plan

* Fix rebase errors for DRILL-6321 commits

  1. … 18 more files in changeset.
DRILL-6027: - Added fallback option for HashJoin. - No copy of incoming for single partition, and avoid HT resize. - Fix memory leak when cancelling while spill file is read - get correct schema when probe side is empty - Re-create the HashJoinProbe

  1. … 40 more files in changeset.
DRILL-6422: Update guava to 23.0 and shade it

- Fix compilation errors for new version of Guava.

- Remove usage of deprecated API

- Shade guava and add dependencies to the shaded version

- Ban unshaded package

- Introduce drill-shaded module and move guava-shaded under it

- Add methods to convert shaded guava lists to the unshaded ones

- Add instruction for publishing artifacts to the Apache repository

  1. … 81 more files in changeset.
DRILL-6320: Fixed license headers.

closes #1207

  1. … 2052 more files in changeset.
DRILL-6321: Lateral Join and Unnest - initial implementation for parser and planning

  1. … 25 more files in changeset.
DRILL-6284: Add operator metrics for batch sizing for flatten

  1. … 6 more files in changeset.
DRILL-6381: (Part 3) Planner and Execution implementation to support Secondary Indexes

  1. Index Planning Rules and Plan generators

    - DbScanToIndexScanRule: Top level physical planning rule that drives index planning for several relational algebra patterns.

- DbScanSortRemovalRule: Physical planning rule for index planning for Sort-based operations.

    - Plan Generators: Covering, Non-Covering and Intersect physical plan generators.

    - Support planning with functional indexes such as CAST functions.

    - Enhance PlannerSettings with several configuration options for indexes.

  2. Index Selection and Statistics

    - An IndexSelector that support cost-based index selection of covering and non-covering indexes using statistics and collation properties.

    - Costing of index intersection for comparison with single-index plans.

  3. Planning and execution operators

    - Support RangePartitioning physical operator during query planning and execution.

    - Support RowKeyJoin physical operator during query planning and execution.

    - HashTable and HashJoin changes to support RowKeyJoin and Index Intersection.

    - Enhance Materializer to keep track of subscan association with a particular rowkey join.

  4. Index Planning utilities

    - Utility classes to perform RexNode analysis, including conversion to and from SchemaPath.

    - Utility class to analyze filter condition and an input collation to determine output collation.

    - Helper classes to maintain index contexts for logical and physical planning phase.

    - IndexPlanUtils utility class for various helper methods.

  5. Miscellaneous

    - Separate physical rel for DirectScan.

    - Modify LimitExchangeTranspose rule to handle SingleMergeExchange.

- MD-3880: Return correct status from RangePartitionRecordBatch setupNewSchema

Co-authored-by: Aman Sinha <asinha@maprtech.com>

Co-authored-by: chunhui-shi <cshi@maprtech.com>

Co-authored-by: Gautam Parai <gparai@maprtech.com>

Co-authored-by: Padma Penumarthy <ppenumar97@yahoo.com>

Co-authored-by: Hanumath Rao Maduri <hmaduri@maprtech.com>

Conflicts:

exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/HashJoinPOP.java

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTable.java

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableTemplate.java

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/Materializer.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillMergeProjectRule.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushProjectIntoScanRule.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillScanRel.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/BroadcastExchangePrel.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/DrillDistributionTrait.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PrelUtil.java

exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java

exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetPushDownFilter.java

exec/java-exec/src/main/resources/drill-module.conf

logical/src/main/java/org/apache/drill/common/logical/StoragePluginConfig.java

Resolve merge comflicts and compilation issues.

    • -0
    • +60
    ./RangePartitionExchange.java
    • -0
    • +96
    ./RowKeyJoinPOP.java
  1. … 90 more files in changeset.
DRILL-6324: Unnest - Add tests with real Unnest and real Lateral.

Code cleanup, more comments, fix license headers, and more logging.

Refactor Unnest to allow setting in incoming batch after construction

fix compilation after rebase

This closes #1223

  1. … 13 more files in changeset.
DRILL-6323: Lateral Join - Refactor Join PopConfigs

  1. … 3 more files in changeset.
DRILL-6115: SingleMergeExchange is not scaling up when many minor fragments are allocated for a query.

DRILL-6115: Refactoring the existing code.

close apache/drill#1110

    • -0
    • +52
    ./OrderedMuxExchange.java
  1. … 15 more files in changeset.
DRILL-6027: - Added memory claculator - Added unit tests and docs. - Fixed IOB caused by output vector allocation. - Don't double count records that were spilled in HashJoin

  1. … 55 more files in changeset.
DRILL-6381: (Part 1) Secondary Index framework

  1. Secondary Index planning interfaces and abstract classes like DBGroupScan, DbSubScan, IndexDecriptor etc.

  2. Statistics and Cost model interfaces/classes: PluginCost, Statistics, StatisticsPayload, AbstractIndexStatistics

  3. ScanBatch and RecordReader to support repeatable scan

  4. Secondary Index execution related interfaces: RangePartitionSender, RowKeyJoin, PartitionFunction

5. MD-3979: Query using cast index plan fails with NPE

Co-authored-by: Aman Sinha <asinha@maprtech.com>

Co-authored-by: chunhui-shi <cshi@maprtech.com>

Co-authored-by: Gautam Parai <gparai@maprtech.com>

Co-authored-by: Padma Penumarthy <ppenumar97@yahoo.com>

Co-authored-by: Hanumath Rao Maduri <hmaduri@maprtech.com>

Conflicts:

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillTable.java

protocol/src/main/java/org/apache/drill/exec/proto/UserBitShared.java

protocol/src/main/java/org/apache/drill/exec/proto/beans/CoreOperatorType.java

protocol/src/main/protobuf/UserBitShared.proto

    • -0
    • +73
    ./RangePartitionSender.java
  1. … 43 more files in changeset.
DRILL-6324: Unnest - Initial Implementation

- Based on Flatten

- Implement unnestRecords in UnnestTemplate

- Remove unnecessary code inherited from Flatten/Project. Add schema change handling.

- Fix build failure after rebase since RecordBatchSizer used by UNNEST was relocated to a different package

- Add unit tests

- Handling of input row splitting across multiple batches. Also do not kill incoming in killIncoming.

- Schema change generated by Unnest

  1. … 10 more files in changeset.
DRILL-6323: Lateral Join - Initial implementation

    • -0
    • +55
    ./LateralJoinPOP.java
  1. … 10 more files in changeset.
DRILL-6882: Handle the cases where RowKeyJoin's left pipeline being called multiple times.

close apache/drill#1562

  1. … 4 more files in changeset.
DRILL-5830: Resolve regressions to MapR DB from DRILL-5546

- Back out HBase changes

- Code cleanup

- Test utilities

- Fix for DRILL-5829

closes #968

  1. … 22 more files in changeset.
DRILL-5457: Spill implementation for Hash Aggregate

closes #822

  1. … 34 more files in changeset.
DRILL-5546: Handle schema change exception failure caused by empty input or empty batch.

1. Modify ScanBatch's logic when it iterates list of RecordReader.

1) Skip RecordReader if it returns 0 row && present same schema. A new schema (by calling Mutator.isNewSchema() ) means either a new top level field is added, or a field in a nested field is added, or an existing field type is changed.

2) Implicit columns are presumed to have constant schema, and are added to outgoing container before any regular column is added in.

3) ScanBatch will return NONE directly (called as "fast NONE"), if all its RecordReaders haver empty input and thus are skipped, in stead of returing OK_NEW_SCHEMA first.

2. Modify IteratorValidatorBatchIterator to allow

1) fast NONE ( before seeing a OK_NEW_SCHEMA)

2) batch with empty list of columns.

2. Modify JsonRecordReader when it get 0 row. Do not insert a nullable-int column for 0 row input. Together with ScanBatch, Drill will skip empty json files.

3. Modify binary operators such as join, union to handle fast none for either one side or both sides. Abstract the logic in AbstractBinaryRecordBatch, except for MergeJoin as its implementation is quite different from others.

4. Fix and refactor union all operator.

1) Correct union operator hanndling 0 input rows. Previously, it will ignore inputs with 0 row and put nullable-int into output schema, which causes various of schema change issue in down-stream operator. The new behavior is to take schema with 0 into account

in determining the output schema, in the same way with > 0 input rows. By doing that, we ensure Union operator will not behave like a schema-lossy operator.

2) Add a UnionInputIterator to simplify the logic to iterate the left/right inputs, removing significant chunk of duplicate codes in previous implementation.

The new union all operator reduces the code size into half, comparing the old one.

5. Introduce UntypedNullVector to handle convertFromJson() function, when the input batch contains 0 row.

Problem: The function convertFromJSon() is different from other regular functions in that it only knows the output schema after evaluation is performed. When input has 0 row, Drill essentially does not have

a way to know the output type, and previously will assume Map type. That works under the assumption other operators like Union would ignore batch with 0 row, which is no longer

the case in the current implementation.

Solution: Use MinorType.NULL at the output type for convertFromJSON() when input contains 0 row. The new UntypedNullVector is used to represent a column with MinorType.NULL.

6. HBaseGroupScan convert star column into list of row_key and column family. HBaseRecordReader should reject column star since it expectes star has been converted somewhere else.

In HBase a column family always has map type, and a non-rowkey column always has nullable varbinary type, this ensures that HBaseRecordReader across different HBase regions will have the same top level schema, even if the region is

empty or prune all the rows due to filter pushdown optimization. In other words, we will not see different top level schema from different HBaseRecordReader for the same table.

However, such change will not be able to handle hard schema change : c1 exists in cf1 in one region, but not in another region. Further work is required to handle hard schema change.

7. Modify scan cost estimation when the query involves * column. This is to remove the planning randomness since previously two different operators could have same cost.

8. Add a new flag 'outputProj' to Project operator, to indicate if Project is for the query's final output. Such Project is added by TopProjectVisitor, to handle fast NONE when all the inputs to the query are empty

and are skipped.

1) column star is replaced with empty list

2) regular column reference is replaced with nullable-int column

3) An expression will go through ExpressionTreeMaterializer, and use the type of materialized expression as the output type

4) Return an OK_NEW_SCHEMA with the schema using the above logic, then return a NONE to down-stream operator.

9. Add unit test to test operators handling empty input.

10. Add unit test to test query when inputs are all empty.

DRILL-5546: Revise code based on review comments.

Handle implicit column in scan batch. Change interface in ScanBatch's constructor.

1) Ensure either the implicit column list is empty, or all the reader has the same set of implicit columns.

2) We could skip the implicit columns when check if there is a schema change coming from record reader.

3) ScanBatch accept a list in stead of iterator, since we may need go through the implicit column list multiple times, and verify the size of two lists are same.

ScanBatch code review comments. Add more unit tests.

Share code path in ProjectBatch to handle normal setupNewSchema() and handleNullInput().

- Move SimpleRecordBatch out of TopNBatch to make it sharable across different places.

- Add Unit test verify schema for star column query against multilevel tables.

Unit test framework change

- Fix memory leak in unit test framework.

- Allow SchemaTestBuilder to pass in BatchSchema.

close #906

  1. … 67 more files in changeset.