Clone Tools
  • last updated 19 mins ago
Constraints
Constraints: committers
 
Constraints: files
Constraints: dates
DRILL-7196: Queries are still runnable on disabled plugins

- Storage client is not created anymore for disabled plugins

- GET "/storage/{name}.json" endpoint now working with

plugin configuration directly, without client instantination.

It have increased UI responsitivity.

- Hbase and mongo base test classes refactored to honor enabled

plugin attribute

- Fixed path contructor for mongo test datasets:

Now it is cross-platform

- Fixed test json files format which using plugin definitions

- Code cleanup

  1. … 106 more files in changeset.
DRILL-6381: Address code review comments (part 3).

DRILL-6381: Add missing joinControl logic for INTERSECT_DISTINCT.

- Modified HashJoin's probe phase to process INTERSECT_DISTINCT.

- NOTE: For build phase, the functionality will be same as for SemiJoin when it is added later.

DRILL-6381: Address code review comment for intersect_distinct.

DRILL-6381: Rebase on latest master and fix compilation issues.

DRILL-6381: Generate protobuf files for C++ native client.

DRILL-6381: Use shaded Guava classes. Add more comments and Javadoc.

  1. … 34 more files in changeset.
DRILL-6773: The renamed schema with aliases is not shown for queries on empty directories

closes #1492

  1. … 17 more files in changeset.
DRILL-6724: Dump operator context to logs when error occurs during query execution

closes #1455

  1. … 102 more files in changeset.
DRILL-6422: Replace guava imports with shaded ones

    • -1
    • +1
    ./apache/drill/exec/store/hbase/HBaseUtils.java
  1. … 971 more files in changeset.
DRILL-6492: Ensure schema / workspace case insensitivity in Drill

1. StoragePluginsRegistryImpl was updated:

a. for backward compatibility at init to convert all existing storage plugins names to lower case, in case of duplicates, to log warning and skip the duplicate.

b. to wrap persistent plugins registry into case insensitive store wrapper (CaseInsensitivePersistentStore) to ensure all given keys are converted into lower case when performing insert, update, delete, search operations.

c. to load system storage plugins dynamically by @SystemStorage annotation.

2. StoragePlugins class was updated to stored storage plugins configs by name in case insensitive map.

3. SchemaUtilities.searchSchemaTree method was updated to convert all schema names into lower case to ensure that are they are matched case insensitively (all schemas are stored in Drill in lower case).

4. FileSystemConfig was updated to store workspaces by name in case insensitive hash map.

5. All plugins schema factories are now extend AbstractSchemaFactory to ensure that given schema name is converted to lower case.

6. New method areTableNamesAreCaseInsensitive was added to AbstractSchema to indicate if schema tables names are case insensitive. By default, false. Schema implementation is responsible for table names case insensitive search in case it supports one. Currently, information_schema, sys and hive do so.

7. System storage plugins (information_schema, sys) were refactored to ensure their schema, table names are case insensitive, also the annotation @SystemPlugin and additional constructor were added to allow dynamically load system plugins at storage plugin registry during init phase.

8. MetadataProvider was updated to concert all schema filter conditions into lower case to ensure schema would be matched case insensitively.

9. ShowSchemasHandler, ShowTablesHandler, DescribeTableHandler were updated to ensure schema / tables names (this depends if schema supports case insensitive table names) would be found case insensitively.

git closes #1439

  1. … 55 more files in changeset.
DRILL-6656: Disallow extra semicolons and multiple statements on the same line.

closes #1415

  1. … 143 more files in changeset.
DRILL-6489: Fix filter push down for Hbase & Mapr-DB binary tables when convert function is used in a view

  1. … 1 more file in changeset.
DRILL-4020: The not-equal operator returns incorrect results when used on the HBase row key

- Added a condition that checks if the filter to the scan specification doesn't have NOT_EQUAL operator

- Added testFilterPushDownRowKeyNotEqual() to TestHBaseFilterPushDown

closes #309

  1. … 1 more file in changeset.
Revert "DRILL-4020: The not-equal operator returns incorrect results when used on the HBase row key"

This reverts commit 0d5eda83fe34928ff60629e6a4903d43a1d82582.

  1. … 1 more file in changeset.
DRILL-6442: Adjust Hbase disk cost & row count estimation when filter push down is applied

closes #1288

DRILL-6386: Remove unused imports and star imports.

  1. … 231 more files in changeset.
DRILL-6320: Fixed license headers.

closes #1207

  1. … 2052 more files in changeset.
DRILL-6381: (Part 3) Planner and Execution implementation to support Secondary Indexes

  1. Index Planning Rules and Plan generators

    - DbScanToIndexScanRule: Top level physical planning rule that drives index planning for several relational algebra patterns.

- DbScanSortRemovalRule: Physical planning rule for index planning for Sort-based operations.

    - Plan Generators: Covering, Non-Covering and Intersect physical plan generators.

    - Support planning with functional indexes such as CAST functions.

    - Enhance PlannerSettings with several configuration options for indexes.

  2. Index Selection and Statistics

    - An IndexSelector that support cost-based index selection of covering and non-covering indexes using statistics and collation properties.

    - Costing of index intersection for comparison with single-index plans.

  3. Planning and execution operators

    - Support RangePartitioning physical operator during query planning and execution.

    - Support RowKeyJoin physical operator during query planning and execution.

    - HashTable and HashJoin changes to support RowKeyJoin and Index Intersection.

    - Enhance Materializer to keep track of subscan association with a particular rowkey join.

  4. Index Planning utilities

    - Utility classes to perform RexNode analysis, including conversion to and from SchemaPath.

    - Utility class to analyze filter condition and an input collation to determine output collation.

    - Helper classes to maintain index contexts for logical and physical planning phase.

    - IndexPlanUtils utility class for various helper methods.

  5. Miscellaneous

    - Separate physical rel for DirectScan.

    - Modify LimitExchangeTranspose rule to handle SingleMergeExchange.

- MD-3880: Return correct status from RangePartitionRecordBatch setupNewSchema

Co-authored-by: Aman Sinha <asinha@maprtech.com>

Co-authored-by: chunhui-shi <cshi@maprtech.com>

Co-authored-by: Gautam Parai <gparai@maprtech.com>

Co-authored-by: Padma Penumarthy <ppenumar97@yahoo.com>

Co-authored-by: Hanumath Rao Maduri <hmaduri@maprtech.com>

Conflicts:

exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/HashJoinPOP.java

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTable.java

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableTemplate.java

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/Materializer.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillMergeProjectRule.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushProjectIntoScanRule.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillScanRel.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/BroadcastExchangePrel.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/DrillDistributionTrait.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PrelUtil.java

exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java

exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetPushDownFilter.java

exec/java-exec/src/main/resources/drill-module.conf

logical/src/main/java/org/apache/drill/common/logical/StoragePluginConfig.java

Resolve merge comflicts and compilation issues.

  1. … 93 more files in changeset.
DRILL-6164: Heap memory leak during parquet scan and OOM

closes #1122

  1. … 15 more files in changeset.
DRILL-6436: Storage Plugin to have name and context moved to AbstractStoragePlugin

closes #1282

  1. … 11 more files in changeset.
DRILL-6130: Fix NPE during physical plan submission for various storage plugins

1. Fixed ser / de issues for Hive, Kafka, Hbase plugins.

2. Added physical plan submission unit test for all storage plugins in contrib module.

3. Refactoring.

closes #1108

  1. … 25 more files in changeset.
DRILL-5730: Mock testing improvements and interface improvements

closes #1045

  1. … 222 more files in changeset.
DRILL-3993: Changes after code review.

  1. … 4 more files in changeset.
DRILL-3993: Changes to support Calcite 1.15.

Fix AssertionError: type mismatch for tests with aggregate functions.

Fix VARIANCE agg function

Remove using deprecated Subtype enum

Fix 'Failure while loading table a in database hbase' error

Fix 'Field ordinal 1 is invalid for type '(DrillRecordRow[*])'' unit test failures

  1. … 17 more files in changeset.
DRILL-5896: Handle HBase columns vector creation in the HBaseRecordReader

closes #1005

DRILL-5865: fix build

closes #986

DRILL-5743: Handling column family and column scan for hbase

closes #975

  1. … 4 more files in changeset.
DRILL-5775: Select * query on a maprdb binary table fails

- The common HBase/MapR-DB_binary verifyColumns() method;

- MapRDBBinaryTable is introduced for the purpose of expanding the wildcard on the planning stage;

- AbstractHBaseDrillTable class for MapRDBBinaryTable and DrillHBaseTable.

closes #973

    • -0
    • +79
    ./apache/drill/exec/store/hbase/AbstractHBaseDrillTable.java
    • -1
    • +26
    ./apache/drill/exec/store/hbase/HBaseUtils.java
  1. … 4 more files in changeset.
DRILL-5830: Resolve regressions to MapR DB from DRILL-5546

- Back out HBase changes

- Code cleanup

- Test utilities

- Fix for DRILL-5829

closes #968

  1. … 17 more files in changeset.
DRILL-4264: Allow field names to include dots

  1. … 97 more files in changeset.
DRILL-5546: Handle schema change exception failure caused by empty input or empty batch.

1. Modify ScanBatch's logic when it iterates list of RecordReader.

1) Skip RecordReader if it returns 0 row && present same schema. A new schema (by calling Mutator.isNewSchema() ) means either a new top level field is added, or a field in a nested field is added, or an existing field type is changed.

2) Implicit columns are presumed to have constant schema, and are added to outgoing container before any regular column is added in.

3) ScanBatch will return NONE directly (called as "fast NONE"), if all its RecordReaders haver empty input and thus are skipped, in stead of returing OK_NEW_SCHEMA first.

2. Modify IteratorValidatorBatchIterator to allow

1) fast NONE ( before seeing a OK_NEW_SCHEMA)

2) batch with empty list of columns.

2. Modify JsonRecordReader when it get 0 row. Do not insert a nullable-int column for 0 row input. Together with ScanBatch, Drill will skip empty json files.

3. Modify binary operators such as join, union to handle fast none for either one side or both sides. Abstract the logic in AbstractBinaryRecordBatch, except for MergeJoin as its implementation is quite different from others.

4. Fix and refactor union all operator.

1) Correct union operator hanndling 0 input rows. Previously, it will ignore inputs with 0 row and put nullable-int into output schema, which causes various of schema change issue in down-stream operator. The new behavior is to take schema with 0 into account

in determining the output schema, in the same way with > 0 input rows. By doing that, we ensure Union operator will not behave like a schema-lossy operator.

2) Add a UnionInputIterator to simplify the logic to iterate the left/right inputs, removing significant chunk of duplicate codes in previous implementation.

The new union all operator reduces the code size into half, comparing the old one.

5. Introduce UntypedNullVector to handle convertFromJson() function, when the input batch contains 0 row.

Problem: The function convertFromJSon() is different from other regular functions in that it only knows the output schema after evaluation is performed. When input has 0 row, Drill essentially does not have

a way to know the output type, and previously will assume Map type. That works under the assumption other operators like Union would ignore batch with 0 row, which is no longer

the case in the current implementation.

Solution: Use MinorType.NULL at the output type for convertFromJSON() when input contains 0 row. The new UntypedNullVector is used to represent a column with MinorType.NULL.

6. HBaseGroupScan convert star column into list of row_key and column family. HBaseRecordReader should reject column star since it expectes star has been converted somewhere else.

In HBase a column family always has map type, and a non-rowkey column always has nullable varbinary type, this ensures that HBaseRecordReader across different HBase regions will have the same top level schema, even if the region is

empty or prune all the rows due to filter pushdown optimization. In other words, we will not see different top level schema from different HBaseRecordReader for the same table.

However, such change will not be able to handle hard schema change : c1 exists in cf1 in one region, but not in another region. Further work is required to handle hard schema change.

7. Modify scan cost estimation when the query involves * column. This is to remove the planning randomness since previously two different operators could have same cost.

8. Add a new flag 'outputProj' to Project operator, to indicate if Project is for the query's final output. Such Project is added by TopProjectVisitor, to handle fast NONE when all the inputs to the query are empty

and are skipped.

1) column star is replaced with empty list

2) regular column reference is replaced with nullable-int column

3) An expression will go through ExpressionTreeMaterializer, and use the type of materialized expression as the output type

4) Return an OK_NEW_SCHEMA with the schema using the above logic, then return a NONE to down-stream operator.

9. Add unit test to test operators handling empty input.

10. Add unit test to test query when inputs are all empty.

DRILL-5546: Revise code based on review comments.

Handle implicit column in scan batch. Change interface in ScanBatch's constructor.

1) Ensure either the implicit column list is empty, or all the reader has the same set of implicit columns.

2) We could skip the implicit columns when check if there is a schema change coming from record reader.

3) ScanBatch accept a list in stead of iterator, since we may need go through the implicit column list multiple times, and verify the size of two lists are same.

ScanBatch code review comments. Add more unit tests.

Share code path in ProjectBatch to handle normal setupNewSchema() and handleNullInput().

- Move SimpleRecordBatch out of TopNBatch to make it sharable across different places.

- Add Unit test verify schema for star column query against multilevel tables.

Unit test framework change

- Fix memory leak in unit test framework.

- Allow SchemaTestBuilder to pass in BatchSchema.

close #906

  1. … 65 more files in changeset.
DRILL-5516: Limit memory usage for Hbase reader

close apache/drill#839

DRILL-4963: Fix issues with dynamically loaded overloaded functions

close #701

  1. … 26 more files in changeset.
DRILL-4199: Add Support for HBase 1.X

Highlights of the changes:

* Replaced the old HBase APIs (HBaseAdmin/HTable) with the new HBase 1.1 APIs (Connection/Admin/Table).

* Added HBaseConnectionManager class which which manages the life-cycle of HBase connections inside a Drillbit process.

* Updated HBase dependencies version to 1.1.3 and 1.1.1-mapr-1602-m7-5.1.0 for default and "mapr" profiles respectively.

* Added `commons-logging` dependency in the `provided` scope to allow HBase test cluster to come up for Unit tests.

* Relaxed banned dependency rule for `commons-logging` library for `storage-hbase` module alone, in provided scope only.

* Removed the use of many deprecated APIs throughout the modules code.

* Added some missing test to HBase storage plugin's test suit.

* Move the GuavaPatcher code to main code execution path.

* Log a message if GuavaPatcher fails instead of exiting.

All unit tests are green.

Closes #443

    • -0
    • +109
    ./apache/drill/exec/store/hbase/HBaseConnectionManager.java
  1. … 26 more files in changeset.