DRILL-6961: Handle exceptions during queries to information_schema

closes #1833

  1. … 7 more files in changeset.
DRILL-6918: Skip ensureAtLeastOneField when there are no records

If none of the project / filter columns exist in the vector, ensureAtLeastOneField (or the Scan operator) adds at least one field as a nullable integer (or a nullable varchar if `allTextmode` is enabled).

The downstream Filter operator would then go on to fail with `NumberFormatException` because it tries to convert empty fields to integers.

Since ensureAtLeastOneField is called after reading all the messages in a batch, it can be skipped if the batch is empty.
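The guard described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual KafkaRecordReader code: `SketchBatch`, `SketchReader`, `readBatch` and the `placeholder` column name are hypothetical stand-ins, and column types are modelled as plain strings.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a batch writer; not Drill's actual API. */
final class SketchBatch {
    final Map<String, String> fieldTypes = new LinkedHashMap<>();
    int recordCount = 0;

    /** Adds a placeholder nullable column if no projected column exists yet. */
    void ensureAtLeastOneField(boolean allTextMode) {
        if (fieldTypes.isEmpty()) {
            fieldTypes.put("placeholder", allTextMode ? "NULLABLE VARCHAR" : "NULLABLE INT");
        }
    }
}

/** Hypothetical reader showing where the empty-batch check sits. */
final class SketchReader {
    /** Reads all messages into a batch, then adds the placeholder only if the batch is non-empty. */
    static SketchBatch readBatch(List<String> messages, boolean allTextMode) {
        SketchBatch batch = new SketchBatch();
        for (String message : messages) {
            batch.recordCount++; // a real reader would decode the message into vectors here
        }
        // Assumed shape of the fix: skip the placeholder column for empty batches,
        // so a downstream filter never sees an artificial integer column.
        if (batch.recordCount > 0) {
            batch.ensureAtLeastOneField(allTextMode);
        }
        return batch;
    }
}
```

With an empty message list the returned batch has no fields at all, so nothing downstream attempts to convert an empty placeholder value.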

closes #1595

    • -1
    • +3
    ./drill/exec/store/kafka/KafkaRecordReader.java
DRILL-6381: Address code review comments (part 3).

DRILL-6381: Add missing joinControl logic for INTERSECT_DISTINCT.

- Modified HashJoin's probe phase to process INTERSECT_DISTINCT.

- NOTE: For the build phase, the functionality will be the same as for SemiJoin when it is added later.

DRILL-6381: Address code review comment for intersect_distinct.

DRILL-6381: Rebase on latest master and fix compilation issues.

DRILL-6381: Generate protobuf files for C++ native client.

DRILL-6381: Use shaded Guava classes. Add more comments and Javadoc.

  1. … 34 more files in changeset.
DRILL-6773: The renamed schema with aliases is not shown for queries on empty directories

closes #1492

  1. … 17 more files in changeset.
DRILL-6724: Dump operator context to logs when error occurs during query execution

closes #1455

    • -7
    • +19
    ./drill/exec/store/kafka/KafkaRecordReader.java
  1. … 101 more files in changeset.
DRILL-6422: Replace guava imports with shaded ones

    • -5
    • +5
    ./drill/exec/store/kafka/KafkaGroupScan.java
    • -2
    • +2
    ./drill/exec/store/kafka/KafkaNodeProcessor.java
    • -3
    • +3
    ./drill/exec/store/kafka/KafkaRecordReader.java
    • -2
    • +2
    ./drill/exec/store/kafka/KafkaStoragePlugin.java
    • -1
    • +1
    ./drill/exec/store/kafka/KafkaSubScan.java
    • -2
    • +2
    ./drill/exec/store/kafka/MessageIterator.java
  1. … 974 more files in changeset.
DRILL-6492: Ensure schema / workspace case insensitivity in Drill

1. StoragePluginsRegistryImpl was updated:

a. for backward compatibility, to convert all existing storage plugin names to lower case at init; in case of duplicates, a warning is logged and the duplicate is skipped.

b. to wrap persistent plugins registry into case insensitive store wrapper (CaseInsensitivePersistentStore) to ensure all given keys are converted into lower case when performing insert, update, delete, search operations.

c. to load system storage plugins dynamically by @SystemStorage annotation.

2. StoragePlugins class was updated to store storage plugin configs by name in a case insensitive map.

3. SchemaUtilities.searchSchemaTree method was updated to convert all schema names into lower case to ensure that they are matched case insensitively (all schemas are stored in Drill in lower case).

4. FileSystemConfig was updated to store workspaces by name in case insensitive hash map.

5. All plugin schema factories now extend AbstractSchemaFactory to ensure that the given schema name is converted to lower case.

6. New method areTableNamesAreCaseInsensitive was added to AbstractSchema to indicate whether schema table names are case insensitive. By default, it returns false. A schema implementation is responsible for case insensitive table name search if it supports one. Currently, information_schema, sys and hive do so.

7. System storage plugins (information_schema, sys) were refactored to ensure their schema and table names are case insensitive; also, the annotation @SystemPlugin and an additional constructor were added to allow system plugins to be loaded dynamically by the storage plugin registry during the init phase.

8. MetadataProvider was updated to convert all schema filter conditions into lower case to ensure the schema is matched case insensitively.

9. ShowSchemasHandler, ShowTablesHandler, DescribeTableHandler were updated to ensure schema / table names (the latter depends on whether the schema supports case insensitive table names) are found case insensitively.
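Points 1a and 1b above (lower-casing keys on every operation, and skipping duplicates during migration) can be sketched roughly as below. This is an illustrative sketch, not the CaseInsensitivePersistentStore from the changeset: the class name `CaseInsensitiveStoreSketch`, its methods, and the use of an in-memory map instead of a persistent store are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical in-memory wrapper that normalizes every key to lower case. */
final class CaseInsensitiveStoreSketch<V> {
    private final Map<String, V> delegate = new LinkedHashMap<>();

    private static String key(String name) {
        return name.toLowerCase(Locale.ROOT);
    }

    void put(String name, V value) { delegate.put(key(name), value); }

    V get(String name) { return delegate.get(key(name)); }

    boolean contains(String name) { return delegate.containsKey(key(name)); }

    V delete(String name) { return delegate.remove(key(name)); }

    int size() { return delegate.size(); }

    /**
     * Migrates existing entries: names are lower-cased, and if two names
     * collide after lower-casing, the later one is skipped with a warning.
     */
    static <V> CaseInsensitiveStoreSketch<V> migrate(Map<String, V> existing) {
        CaseInsensitiveStoreSketch<V> store = new CaseInsensitiveStoreSketch<>();
        for (Map.Entry<String, V> entry : existing.entrySet()) {
            if (store.contains(entry.getKey())) {
                System.err.println("Skipping duplicate plugin name: " + entry.getKey());
                continue;
            }
            store.put(entry.getKey(), entry.getValue());
        }
        return store;
    }
}
```

Using `Locale.ROOT` for the conversion is a choice made for this sketch; it keeps the normalization independent of the JVM's default locale.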

git closes #1439

  1. … 55 more files in changeset.
DRILL-6656: Disallow extra semicolons and multiple statements on the same line.

closes #1415

  1. … 144 more files in changeset.
DRILL-6386: Remove unused imports and star imports.

    • -1
    • +0
    ./drill/exec/store/kafka/KafkaRecordReader.java
  1. … 231 more files in changeset.
DRILL-5977: Implement Filter Pushdown in Drill-Kafka plugin

closes #1272

    • -38
    • +52
    ./drill/exec/store/kafka/KafkaGroupScan.java
    • -0
    • +186
    ./drill/exec/store/kafka/KafkaNodeProcessor.java
    • -0
    • +100
    ./drill/exec/store/kafka/KafkaPartitionScanSpec.java
    • -0
    • +345
    ./drill/exec/store/kafka/KafkaPartitionScanSpecBuilder.java
    • -0
    • +81
    ./drill/exec/store/kafka/KafkaPushDownFilterIntoScan.java
    • -3
    • +2
    ./drill/exec/store/kafka/KafkaRecordReader.java
    • -1
    • +1
    ./drill/exec/store/kafka/KafkaStoragePlugin.java
    • -68
    • +4
    ./drill/exec/store/kafka/KafkaSubScan.java
    • -2
    • +1
    ./drill/exec/store/kafka/MessageIterator.java
  1. … 5 more files in changeset.
DRILL-6381: (Part 3) Planner and Execution implementation to support Secondary Indexes

  1. Index Planning Rules and Plan generators

    - DbScanToIndexScanRule: Top level physical planning rule that drives index planning for several relational algebra patterns.

    - DbScanSortRemovalRule: Physical planning rule for index planning for Sort-based operations.

    - Plan Generators: Covering, Non-Covering and Intersect physical plan generators.

    - Support planning with functional indexes such as CAST functions.

    - Enhance PlannerSettings with several configuration options for indexes.

  2. Index Selection and Statistics

    - An IndexSelector that supports cost-based index selection of covering and non-covering indexes using statistics and collation properties.

    - Costing of index intersection for comparison with single-index plans.

  3. Planning and execution operators

    - Support RangePartitioning physical operator during query planning and execution.

    - Support RowKeyJoin physical operator during query planning and execution.

    - HashTable and HashJoin changes to support RowKeyJoin and Index Intersection.

    - Enhance Materializer to keep track of subscan association with a particular rowkey join.

  4. Index Planning utilities

    - Utility classes to perform RexNode analysis, including conversion to and from SchemaPath.

    - Utility class to analyze filter condition and an input collation to determine output collation.

    - Helper classes to maintain index contexts for logical and physical planning phase.

    - IndexPlanUtils utility class for various helper methods.

  5. Miscellaneous

    - Separate physical rel for DirectScan.

    - Modify LimitExchangeTranspose rule to handle SingleMergeExchange.

    - MD-3880: Return correct status from RangePartitionRecordBatch setupNewSchema

Co-authored-by: Aman Sinha <asinha@maprtech.com>

Co-authored-by: chunhui-shi <cshi@maprtech.com>

Co-authored-by: Gautam Parai <gparai@maprtech.com>

Co-authored-by: Padma Penumarthy <ppenumar97@yahoo.com>

Co-authored-by: Hanumath Rao Maduri <hmaduri@maprtech.com>

Conflicts:

exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/HashJoinPOP.java

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTable.java

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableTemplate.java

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/Materializer.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillMergeProjectRule.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushProjectIntoScanRule.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillScanRel.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/BroadcastExchangePrel.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/DrillDistributionTrait.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PrelUtil.java

exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java

exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetPushDownFilter.java

exec/java-exec/src/main/resources/drill-module.conf

logical/src/main/java/org/apache/drill/common/logical/StoragePluginConfig.java

Resolve merge conflicts and compilation issues.

  1. … 93 more files in changeset.
DRILL-6164: Heap memory leak during parquet scan and OOM

closes #1122

  1. … 15 more files in changeset.
DRILL-6436: Storage Plugin to have name and context moved to AbstractStoragePlugin

closes #1282

    • -6
    • +1
    ./drill/exec/store/kafka/KafkaStoragePlugin.java
  1. … 11 more files in changeset.
DRILL-6130: Fix NPE during physical plan submission for various storage plugins

1. Fixed ser / de issues for Hive, Kafka, Hbase plugins.

2. Added physical plan submission unit test for all storage plugins in contrib module.

3. Refactoring.

closes #1108

    • -19
    • +21
    ./drill/exec/store/kafka/KafkaGroupScan.java
    • -29
    • +27
    ./drill/exec/store/kafka/KafkaSubScan.java
  1. … 25 more files in changeset.
DRILL-5730: Mock testing improvements and interface improvements

closes #1045

    • -7
    • +6
    ./drill/exec/store/kafka/KafkaRecordReader.java
  1. … 222 more files in changeset.
DRILL-6049: Misc. hygiene and code cleanup changes

close apache/drill#1085

    • -1
    • +1
    ./drill/exec/store/kafka/KafkaRecordReader.java
  1. … 123 more files in changeset.
DRILL-4779: Kafka storage plugin (Kamesh Bhallamudi & Anil Kumar Batchu)

closes #1052

    • -0
    • +319
    ./drill/exec/store/kafka/KafkaGroupScan.java
    • -0
    • +145
    ./drill/exec/store/kafka/KafkaRecordReader.java
    • -0
    • +55
    ./drill/exec/store/kafka/KafkaScanBatchCreator.java
    • -0
    • +40
    ./drill/exec/store/kafka/KafkaScanSpec.java
    • -0
    • +100
    ./drill/exec/store/kafka/KafkaStoragePlugin.java
    • -0
    • +78
    ./drill/exec/store/kafka/KafkaStoragePluginConfig.java
    • -0
    • +177
    ./drill/exec/store/kafka/KafkaSubScan.java
    • -0
    • +114
    ./drill/exec/store/kafka/MessageIterator.java
    • -0
    • +37
    ./drill/exec/store/kafka/MetaDataField.java
    • -0
    • +104
    ./drill/exec/store/kafka/decoders/JsonMessageReader.java
    • -0
    • +45
    ./drill/exec/store/kafka/decoders/MessageReader.java
    • -0
    • +63
    ./drill/exec/store/kafka/decoders/MessageReaderFactory.java
    • -0
    • +24
    ./drill/exec/store/kafka/package-info.java
    • -0
    • +86
    ./drill/exec/store/kafka/schema/KafkaMessageSchema.java
    • -0
    • +45
    ./drill/exec/store/kafka/schema/KafkaSchemaFactory.java
  1. … 22 more files in changeset.
DRILL-5919: Add non-numeric support for JSON processing

1. Added two session options store.json.reader.non_numeric_numbers and store.json.reader.non_numeric_numbers that allow reading/writing NaN and Infinity as numbers. By default these options are set to true.

2. Extended the signatures of the convert_toJSON and convert_fromJSON functions by adding a second optional parameter that enables/disables reading/writing NaN and Infinity. By default it is set to true.

3. Added unit tests with NaN and Infinity values for math and aggregate functions.

4. Replaced JsonReader's constructors with builder.
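The effect of such an on/off option can be sketched as follows. This is not Drill's JsonReader or its builder: `NonNumericSketch`, `parseNumber` and `writeNumber` are hypothetical helpers, and writing the value as a quoted string when the option is off is an assumption made for this sketch, not behaviour confirmed by the commit.

```java
/** Hypothetical helpers showing a flag that permits NaN/Infinity as JSON numbers. */
final class NonNumericSketch {

    /** Parses a numeric token; NaN and +/-Infinity are accepted only when allowed. */
    static double parseNumber(String token, boolean allowNonNumeric) {
        switch (token) {
            case "NaN":
            case "Infinity":
            case "-Infinity":
                if (!allowNonNumeric) {
                    throw new NumberFormatException("Non-numeric token not allowed: " + token);
                }
                return Double.parseDouble(token);
            default:
                return Double.parseDouble(token);
        }
    }

    /**
     * Renders a double as a JSON token. When non-numeric numbers are not allowed,
     * NaN/Infinity are written as quoted strings (an assumption of this sketch).
     */
    static String writeNumber(double value, boolean allowNonNumeric) {
        if (Double.isNaN(value) || Double.isInfinite(value)) {
            return allowNonNumeric ? Double.toString(value) : "\"" + value + "\"";
        }
        return Double.toString(value);
    }
}
```

For example, `parseNumber("NaN", true)` yields `Double.NaN`, while `parseNumber("NaN", false)` throws a `NumberFormatException`.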

This closes #1026

  1. … 17 more files in changeset.
DRILL-1328: Support table statistics - Part 2

Add support for avg row-width and major type statistics.

Parallelize the ANALYZE implementation and stats UDF implementation to improve stats collection performance.

Update/fix rowcount, selectivity and ndv computations to improve plan costing.

Add options for configuring collection/usage of statistics.

Add new APIs and implementation for stats writer (as a precursor to Drill Metastore APIs).

Fix several stats/costing related issues identified while running TPC-H and TPC-DS queries.

Add support for CPU sampling and nested scalar columns.

Add more testcases for collection and usage of statistics and fix remaining unit/functional test failures.

Thanks to Venki Korukanti (@vkorukanti) for the description below (modified to account for new changes). He graciously agreed to rebase the patch onto the latest master, fixed a few issues, and added a few tests.

FUNCS: Statistics functions as UDFs:

Separate

Currently using FieldReader to ensure consistent output type so that Unpivot doesn't get confused. All stats columns should be Nullable, so that stats functions can return NULL when N/A.

* custom versions of "count" that always return BigInt

* HyperLogLog based NDV that returns BigInt that works only on VarChars

* HyperLogLog with binary output that only works on VarChars

OPS: Updated protobufs for new ops

OPS: Implemented StatisticsMerge

OPS: Implemented StatisticsUnpivot

ANALYZE: AnalyzeTable functionality

* JavaCC syntax more-or-less copied from LucidDB.

* (Basic) AnalyzePrule: DrillAnalyzeRel -> UnpivotPrel StatsMergePrel FilterPrel(for sampling) StatsAggPrel ScanPrel

ANALYZE: Add getMetadataTable() to AbstractSchema

USAGE: Change field access in QueryWrapper

USAGE: Add getDrillTable() to DrillScanRelBase and ScanPrel

* since ScanPrel does not inherit from DrillScanRelBase, this requires adding a DrillTable to the constructor

* This is done so that a custom ReflectiveRelMetadataProvider can access the DrillTable associated with Logical/Physical scans.

USAGE: Attach DrillStatsTable to DrillTable.

* DrillStatsTable represents the data scanned from a corresponding ".stats.drill" table

* In order to avoid doing query execution right after the ".stats.drill" table is found, metadata is not actually collected until the MaterializationVisitor is used.

** Currently, the metadata source must be a string (so that a SQL query can be created). Doing this with a table is probably more complicated.

** Query is set up to extract only the most recent statistics results for each column.
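The "most recent statistics results for each column" selection can be illustrated in plain Java. The commit does this with a SQL query over the ".stats.drill" table; the sketch below only mirrors the selection logic in memory, and `StatRow`, its fields (`column`, `computed`, `ndv`) and `LatestStatsSketch` are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical statistics record: one row per column per ANALYZE run. */
final class StatRow {
    final String column;
    final long computed; // assumed timestamp or run id of the ANALYZE run
    final long ndv;      // example statistic (number of distinct values)

    StatRow(String column, long computed, long ndv) {
        this.column = column;
        this.computed = computed;
        this.ndv = ndv;
    }
}

final class LatestStatsSketch {
    /** Keeps, for each column, only the row with the greatest 'computed' value. */
    static Map<String, StatRow> latestPerColumn(List<StatRow> rows) {
        Map<String, StatRow> latest = new TreeMap<>();
        for (StatRow row : rows) {
            latest.merge(row.column, row,
                (current, candidate) -> candidate.computed > current.computed ? candidate : current);
        }
        return latest;
    }
}
```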

closes #729

    • -0
    • +1
    ./drill/exec/store/kafka/KafkaGroupScan.java
  1. … 143 more files in changeset.