Clone Tools
  • last updated a few minutes ago
Constraints
Constraints: committers
 
Constraints: files
Constraints: dates
DRILL-7472: Fix ser / de for sys and information_schema schemas queries

closes #1925

    • -3
    • +2
    ./InfoSchemaPushFilterIntoRecordGenerator.java
  1. … 7 more files in changeset.
DRILL-7481: Fix raw type warnings in Iceberg Metastore and related classes

closes #1924

  1. … 4 more files in changeset.
DRILL-7471: DESCRIBE TABLE command fails with ClassCastException when Metastore is enabled

  1. … 8 more files in changeset.
DRILL-5844: Incorrect values of TABLE_TYPE returned from method DatabaseMetaData.getTables of JDBC API

closes #1904

  1. … 4 more files in changeset.
DRILL-7406: Update Calcite to 1.21.0

1. DRILL-7386 - added tests to TestHiveStructs.

2. DRILL-4527 - the DrillAvgVarianceConvertlet can't be removed without test failures.

3. DRILL-6215 - switched to prepared statement in JdbcRecordReader.

4. DRILL-6905 - added test into TestExampleQueries.

5. DRILL-7415 - Fixed jdbc show tables when 2 tables with same name are present in different schemas.

6. DRILL-7340 - Fixed jdbc filter pushdown when few jdbc datasources enabled.

7. Split SqlConverter into multiple source files.

8. Minor refactorings for jdbc and other places.

closes #1940

  1. … 54 more files in changeset.
DRILL-7359: Add support for DICT type in RowSet Framework

closes #1870

  1. … 82 more files in changeset.
DRILL-7357: Expose Drill Metastore data through information_schema

1. Add additional columns to TABLES and COLUMNS tables.

2. Add PARTITIONS table.

3. General refactoring to adjust information_schema data retrieval from multiple sources.

closes #1860

    • -0
    • +216
    ./FilterEvaluator.java
    • -13
    • +12
    ./InfoSchemaPushFilterIntoRecordGenerator.java
    • -309
    • +77
    ./InfoSchemaRecordGenerator.java
    • -0
    • +413
    ./RecordCollector.java
  1. … 19 more files in changeset.
DRILL-6961: Handle exceptions during queries to information_schema

closes #1833

  1. … 7 more files in changeset.
DRILL-7115: Improve Hive schema show tables performance

1. To make SHOW TABLES for Hive schema work much faster, additional Drill

feature of showing only accesible tables when Storage-Based authorization

is enabled was sacrificed. Now the behaviour matches to Hive/Beeline, all

tables will be shown despite of accessibility. For details about previous

show tables results, check description of DRILL-540.

2. In HiveDatabaseSchema implemented faster getTableNamesAndTypes() method

and removed bulk related code.

3. Deprecated bulk related options and removed bulk code from AbstractSchema,

DrillHiveMetastoreClient.

4. For 8000 Hive tables query returned in 1.8 seconds, for combination of

4000 tables and 8000 views query returned in 2.3 seconds. Note, that

after first query table names will be cached and next queries will perform

in less than 1 sec.

5. Refactored WorkspaceSchemaFactory's getTableNamesAndTypes()

method to reuse existing getViews() method.

6. DrillHiveMetastoreClient was refactored. Classes were unnested and enclosed

within client package with restricted visibility. Also was updated cache

values type to avoid unnecessarry List to Set back and forth conversions.

Client creation methods moved to separate class. So the new package

exposes only factory and client class.

closes #1706

  1. … 20 more files in changeset.
DRILL-6931: File listing: fix issue for S3 directory objects and improve performance for recursive listing closes #1590

  1. … 2 more files in changeset.
DRILL-6858: Add functionality to list directories / files with exceptions suppression

1. Add listDirectoriesSafe, listFilesSafe, listAllSafe in FileSystemUtil and DrillFileSystemUtil classes.

2. Use FileSystemUtil.listAllSafe during listing files in show files command and information_schema.files table.

closes #1547

  1. … 7 more files in changeset.
DRILL-6381: Address code review comments (part 3).

DRILL-6381: Add missing joinControl logic for INTERSECT_DISTINCT.

- Modified HashJoin's probe phase to process INTERSECT_DISTINCT.

- NOTE: For build phase, the functionality will be same as for SemiJoin when it is added later.

DRILL-6381: Address code review comment for intersect_distinct.

DRILL-6381: Rebase on latest master and fix compilation issues.

DRILL-6381: Generate protobuf files for C++ native client.

DRILL-6381: Use shaded Guava classes. Add more comments and Javadoc.

    • -1
    • +1
    ./InfoSchemaPushFilterIntoRecordGenerator.java
  1. … 34 more files in changeset.
DRILL-6773: The renamed schema with aliases is not shown for queries on empty directories

closes #1492

    • -1
    • +1
    ./InfoSchemaPushFilterIntoRecordGenerator.java
  1. … 17 more files in changeset.
DRILL-6753: Fix show files command to return result the same way as before

1. Add ACCESS_TIME column.

2. Re-write show files command to return result using ShowFilesCommandResult to maintain backward compatibility.

closes #1477

  1. … 5 more files in changeset.
DRILL-6422: Replace guava imports with shaded ones

    • -1
    • +1
    ./InfoSchemaPushFilterIntoRecordGenerator.java
  1. … 977 more files in changeset.
DRILL-6492: Ensure schema / workspace case insensitivity in Drill

1. StoragePluginsRegistryImpl was updated:

a. for backward compatibility at init to convert all existing storage plugins names to lower case, in case of duplicates, to log warning and skip the duplicate.

b. to wrap persistent plugins registry into case insensitive store wrapper (CaseInsensitivePersistentStore) to ensure all given keys are converted into lower case when performing insert, update, delete, search operations.

c. to load system storage plugins dynamically by @SystemStorage annotation.

2. StoragePlugins class was updated to stored storage plugins configs by name in case insensitive map.

3. SchemaUtilities.searchSchemaTree method was updated to convert all schema names into lower case to ensure that are they are matched case insensitively (all schemas are stored in Drill in lower case).

4. FileSystemConfig was updated to store workspaces by name in case insensitive hash map.

5. All plugins schema factories are now extend AbstractSchemaFactory to ensure that given schema name is converted to lower case.

6. New method areTableNamesAreCaseInsensitive was added to AbstractSchema to indicate if schema tables names are case insensitive. By default, false. Schema implementation is responsible for table names case insensitive search in case it supports one. Currently, information_schema, sys and hive do so.

7. System storage plugins (information_schema, sys) were refactored to ensure their schema, table names are case insensitive, also the annotation @SystemPlugin and additional constructor were added to allow dynamically load system plugins at storage plugin registry during init phase.

8. MetadataProvider was updated to concert all schema filter conditions into lower case to ensure schema would be matched case insensitively.

9. ShowSchemasHandler, ShowTablesHandler, DescribeTableHandler were updated to ensure schema / tables names (this depends if schema supports case insensitive table names) would be found case insensitively.

git closes #1439

  1. … 53 more files in changeset.
DRILL-6656: Disallow extra semicolons and multiple statements on the same line.

closes #1415

  1. … 144 more files in changeset.
DRILL-6680: Expose show files command into INFORMATION_SCHEMA

  1. … 16 more files in changeset.
DRILL-6386: Remove unused imports and star imports.

  1. … 230 more files in changeset.
DRILL-6320: Fixed license headers.

closes #1207

    • -2
    • +1
    ./InfoSchemaPushFilterIntoRecordGenerator.java
  1. … 2053 more files in changeset.
DRILL-6331: Revisit Hive Drill native parquet implementation to be exposed to Drill optimizations (filter / limit push down, count to direct scan)

1. Factored out common logic for Drill parquet reader and Hive Drill native parquet readers: AbstractParquetGroupScan, AbstractParquetRowGroupScan, AbstractParquetScanBatchCreator.

2. Rules that worked previously only with ParquetGroupScan, now can be applied for any class that extends AbstractParquetGroupScan: DrillFilterItemStarReWriterRule, ParquetPruneScanRule, PruneScanRule.

3. Hive populated partition values based on information returned from Hive metastore. Drill populates partition values based on path difference between selection root and actual file path.

Before ColumnExplorer populated partition values based on Drill approach. Since now ColumnExplorer populates values for parquet files from Hive tables,

`populateImplicitColumns` method logic was changed to populated partition columns only based on given partition values.

4. Refactored ParquetPartitionDescriptor to be responsible for populating partition values rather than storing this logic in parquet group scan class.

5. Metadata class was moved to separate metadata package (org.apache.drill.exec.store.parquet.metadata). Factored out several inner classed to improve code readability.

6. Collected all Drill native parquet reader unit tests into one class TestHiveDrillNativeParquetReader, also added new tests to cover new functionality.

7. Reduced excessive logging when parquet files metadata is read

closes #1214

  1. … 64 more files in changeset.
DRILL-6381: (Part 3) Planner and Execution implementation to support Secondary Indexes

  1. Index Planning Rules and Plan generators

    - DbScanToIndexScanRule: Top level physical planning rule that drives index planning for several relational algebra patterns.

- DbScanSortRemovalRule: Physical planning rule for index planning for Sort-based operations.

    - Plan Generators: Covering, Non-Covering and Intersect physical plan generators.

    - Support planning with functional indexes such as CAST functions.

    - Enhance PlannerSettings with several configuration options for indexes.

  2. Index Selection and Statistics

    - An IndexSelector that support cost-based index selection of covering and non-covering indexes using statistics and collation properties.

    - Costing of index intersection for comparison with single-index plans.

  3. Planning and execution operators

    - Support RangePartitioning physical operator during query planning and execution.

    - Support RowKeyJoin physical operator during query planning and execution.

    - HashTable and HashJoin changes to support RowKeyJoin and Index Intersection.

    - Enhance Materializer to keep track of subscan association with a particular rowkey join.

  4. Index Planning utilities

    - Utility classes to perform RexNode analysis, including conversion to and from SchemaPath.

    - Utility class to analyze filter condition and an input collation to determine output collation.

    - Helper classes to maintain index contexts for logical and physical planning phase.

    - IndexPlanUtils utility class for various helper methods.

  5. Miscellaneous

    - Separate physical rel for DirectScan.

    - Modify LimitExchangeTranspose rule to handle SingleMergeExchange.

- MD-3880: Return correct status from RangePartitionRecordBatch setupNewSchema

Co-authored-by: Aman Sinha <asinha@maprtech.com>

Co-authored-by: chunhui-shi <cshi@maprtech.com>

Co-authored-by: Gautam Parai <gparai@maprtech.com>

Co-authored-by: Padma Penumarthy <ppenumar97@yahoo.com>

Co-authored-by: Hanumath Rao Maduri <hmaduri@maprtech.com>

Conflicts:

exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/HashJoinPOP.java

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTable.java

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableTemplate.java

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/Materializer.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillMergeProjectRule.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushProjectIntoScanRule.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillScanRel.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/BroadcastExchangePrel.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/DrillDistributionTrait.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PrelUtil.java

exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java

exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetPushDownFilter.java

exec/java-exec/src/main/resources/drill-module.conf

logical/src/main/java/org/apache/drill/common/logical/StoragePluginConfig.java

Resolve merge comflicts and compilation issues.

    • -1
    • +1
    ./InfoSchemaPushFilterIntoRecordGenerator.java
  1. … 93 more files in changeset.
DRILL-6436: Storage Plugin to have name and context moved to AbstractStoragePlugin

closes #1282

  1. … 11 more files in changeset.
DRILL-5730: Mock testing improvements and interface improvements

closes #1045

  1. … 223 more files in changeset.
REVERTED: DRILL-5089

Dynamically load schema of storage plugin only when needed for every query

  1. … 15 more files in changeset.
DRILL-5972: Slow performance for query on INFORMATION_SCHEMA.TABLE

closes #1038

DRILL-3993: Fix unit test failures connected with support Calcite 1.13

- Use root schema as default for describe table statement.

Fix TestOpenTSDBPlugin.testDescribe() and TestInfoSchemaOnHiveStorage.varCharMaxLengthAndDecimalPrecisionInInfoSchema() unit tests.

- Modify expected results for tests:

TestPreparedStatementProvider.invalidQueryValidationError();

TestProjectPushDown.testTPCH1();

TestProjectPushDown.testTPCH3();

TestStorageBasedHiveAuthorization.selectUser1_db_u0_only();

TestStorageBasedHiveAuthorization.selectUser0_db_u1g1_only()

- Fix TestCTAS.whenTableQueryColumnHasStarAndTableFiledListIsSpecified(), TestViewSupport.createViewWhenViewQueryColumnHasStarAndViewFiledListIsSpecified(), TestInbuiltHiveUDFs.testIf(), testDisableUtf8SupportInQueryString unit tests.

- Fix UnsupportedOperationException and NPE for jdbc tests.

- Fix AssertionError: Conversion to relational algebra failed to preserve datatypes

*DrillCompoundIdentifier:

According to the changes, made in [CALCITE-546], star Identifier is replaced by empty string during parsing the query. Since Drill uses its own DrillCompoundIdentifier, it should also replace star by empty string before creating SqlIdentifier instance to avoid further errors connected with star column. see SqlIdentifier.isStar() method.

*SqlConverter:

In [CALCITE-1417] added simplification of expressions which should be projected every time when a new project rel node is created using RelBuilder. It causes assertion errors connected with types nullability. This hook was set to false to avoid project expressions simplification. See usage of this hook and RelBuilder.project() method.

In Drill the type nullability of the function depends on only the nullability of its arguments. In some cases, a function may return null value even if it had non-nullable arguments. When Calice simplifies expressions, it checks that the type of the result is the same as the type of the expression. Otherwise, makeCast() method is called. But when a function returns null literal, this cast does nothing, even when the function has a non-nullable type. So to avoid this issue, method makeCast() was overridden.

*DrillAvgVarianceConvertlet:

Problem with sum0 and specific changes in old Calcite (it is CALCITE-777). (see HistogramShuttle.visitCall method) Changes were made to avoid changes in Calcite.

*SqlConverter, DescribeTableHandler, ShowTablesHandler:

New Calcite tries to combine both default and specified workspaces during the query validation. In some cases, for example, when describe table statement is used, Calcite tries to find INFORMATION_SCHEMA in the schema used as default. When it does not find the schema, it tries to find a table with such name. For some storage plugins, such as opentsdb and hbase, when a table was not found, the error is thrown, and the query fails. To avoid this issue, default schema was changed to root schema for validation stage for describe table and show tables queries.

  1. … 17 more files in changeset.
DRILL-5089: Dynamically load schema of storage plugin only when needed for every query

For each query, loading all storage plugins and loading all workspaces under file system plugins is not needed.

This patch use DynamicRootSchema as the root schema for Drill. Which loads correspondent storage only when needed.

infoschema to read full schema information and load second level schema accordingly.

for workspaces under the same Filesyetm, no need to create FileSystem for each workspace.

use fs.access API to check permission which is available after HDFS 2.6 except for windows + local file system case.

Add unit tests to test with a broken mock storage: with a storage that will throw Exception in regiterSchema method,

all queries even on good storages shall fail without this fix(Drill still load all schemas from all storages).

This closes #1032

  1. … 15 more files in changeset.
DRILL-5089: Dynamically load schema of storage plugin only when needed for every query

For each query, loading all storage plugins and loading all workspaces under file system plugins is not needed.

This patch use DynamicRootSchema as the root schema for Drill. Which loads correspondent storage only when needed.

infoschema to read full schema information and load second level schema accordingly.

for workspaces under the same Filesyetm, no need to create FileSystem for each workspace.

use fs.access API to check permission which is available after HDFS 2.6 except for windows + local file system case.

Add unit tests to test with a broken mock storage: with a storage that will throw Exception in regiterSchema method,

all queries even on good storages shall fail without this fix(Drill still load all schemas from all storages).

(cherry picked from commit a66d1d7)

  1. … 15 more files in changeset.
DRILL-3993: Changes to support Calcite 1.13

- fixed all compiling errors (main changes were: Maven changes, chenges RelNode -> RelRoot, implementing some new methods from updated interfaces, chenges some literals, logger changes);

- fixed unexpected column errors, validation errors and assertion errors after Calcite update;

- fixed describe table/schema statement according to updated logic;

- added fixes with time-intervals;

- changed precision of BINARY to 65536 (was 1048576) according to updated logic (Calcite overrides bigger precision to own maxPrecision);

- ignored some incorrect tests with DRILL-3244;

- changed "Table not found" message to "Object not found within" according to new Calcite changes.

  1. … 70 more files in changeset.