DRILL-7471: DESCRIBE TABLE command fails with ClassCastException when Metastore is enabled

    • -3
    • +3
    ./ConvertMetadataAggregateToDirectScanRule.java
  1. … 8 more files in changeset.
DRILL-7454: Convert Avro to EVF

1. Replaced old format implementation with EVF.

2. Updated and added Avro tests and improved their performance.

3. Code refactoring.

closes #1951

  1. … 32 more files in changeset.
DRILL-7450: Improve performance for ANALYZE command

- Implement two-phase aggregation for the lowest metadata aggregate to optimize performance

- Allow using complex functions with hash aggregate

- Use hash aggregation for PHASE_1of2 for ANALYZE to reduce memory usage and avoid sorting non-aggregated data

- Add sort above hash aggregation to fix correctness of merge exchange and stream aggregate

closes #1907
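
The two-phase aggregation described above can be illustrated with a minimal sketch. This is not Drill's implementation: the class and method names are hypothetical, and a plain row count stands in for the metadata aggregate. Phase 1 hash-aggregates each fragment's rows without sorting them, and phase 2 merges the partial results in key order, standing in for the sort added above the hash aggregation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names come from Drill.
class TwoPhaseCountSketch {

  // Phase 1: each fragment hash-aggregates its own rows, so the raw
  // (non-aggregated) rows never have to be sorted.
  static Map<String, Long> partialCounts(List<String> keys) {
    Map<String, Long> partial = new HashMap<>();
    for (String key : keys) {
      partial.merge(key, 1L, Long::sum);
    }
    return partial;
  }

  // Phase 2: combine the partial results. A TreeMap keeps the keys ordered,
  // standing in for the sort that a merge exchange and a stream aggregate
  // expect on their input.
  static Map<String, Long> mergeCounts(List<Map<String, Long>> partials) {
    Map<String, Long> total = new TreeMap<>();
    for (Map<String, Long> partial : partials) {
      partial.forEach((key, count) -> total.merge(key, count, Long::sum));
    }
    return total;
  }

  public static void main(String[] args) {
    List<Map<String, Long>> partials = new ArrayList<>();
    partials.add(partialCounts(Arrays.asList("x", "y", "x")));
    partials.add(partialCounts(Arrays.asList("y", "x")));
    System.out.println(mergeCounts(partials)); // prints {x=3, y=2}
  }
}
```

Only the small per-key partial results reach the second phase, which is where the memory saving in this sketch comes from.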

    • -0
    • +271
    ./ConvertMetadataAggregateToDirectScanRule.java
  1. … 58 more files in changeset.
DRILL-7418: MetadataDirectGroupScan improvements

1. Replaced files listing with selection root information to reduce query plan size in MetadataDirectGroupScan.

2. Fixed MetadataDirectGroupScan ser / de issues.

3. Added PlanMatcher to QueryBuilder for more convenient plan matching.

4. Rewrote TestConvertCountToDirectScan to use ClusterTest.

5. Refactoring and code cleanup.

    • -15
    • +12
    ./ConvertCountToDirectScanRule.java
  1. … 12 more files in changeset.
DRILL-7406: Update Calcite to 1.21.0

1. DRILL-7386 - added tests to TestHiveStructs.

2. DRILL-4527 - the DrillAvgVarianceConvertlet can't be removed without test failures.

3. DRILL-6215 - switched to prepared statement in JdbcRecordReader.

4. DRILL-6905 - added test into TestExampleQueries.

5. DRILL-7415 - Fixed JDBC SHOW TABLES when two tables with the same name are present in different schemas.

6. DRILL-7340 - Fixed JDBC filter pushdown when multiple JDBC data sources are enabled.

7. Split SqlConverter into multiple source files.

8. Minor refactoring in JDBC and other places.

closes #1940

  1. … 54 more files in changeset.
DRILL-5674: Support ZIP compression

1. Added a ZipCodec implementation that can read / write a single file.

2. Revisited Drill plugin formats to ensure the 'openPossiblyCompressedStream' method is used in those that support compression.

3. Added unit tests.

4. General refactoring.
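
As a rough illustration of the single-file ZIP handling mentioned in item 1, the sketch below round-trips one entry using only the JDK's java.util.zip classes. It is not the ZipCodec from this commit: the class name SingleEntryZipSketch and its methods are hypothetical, and it works on in-memory byte arrays rather than Hadoop streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper; it only demonstrates the "one file per archive" idea.
class SingleEntryZipSketch {

  // Writes the data as the only entry of a new ZIP archive.
  static byte[] compress(String entryName, byte[] data) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
      zip.putNextEntry(new ZipEntry(entryName));
      zip.write(data);
      zip.closeEntry();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Reads the first entry of the archive and ignores any further entries.
  static byte[] decompressFirstEntry(byte[] archive) {
    try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
      if (zip.getNextEntry() == null) {
        throw new IOException("ZIP archive has no entries");
      }
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buffer = new byte[4096];
      int read;
      while ((read = zip.read(buffer)) != -1) {
        out.write(buffer, 0, read);
      }
      return out.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] original = "id,name\n1,drill\n".getBytes(StandardCharsets.UTF_8);
    byte[] restored = decompressFirstEntry(compress("data.csv", original));
    System.out.print(new String(restored, StandardCharsets.UTF_8));
  }
}
```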

  1. … 17 more files in changeset.
DRILL-7373: Fix problems involving reading from DICT type

- Fixed FieldIdUtil to resolve reading from DICT for some complex cases;

- Optimized reading from DICT given a key by passing an appropriate Object type to DictReader#find(...) and DictReader#read(...) methods when the schema is known (e.g. when reading from Hive tables) instead of generating it on the fly based on int or String path and key type;

- Fixed error when accessing a value by a non-existent key in an Avro table.

  1. … 10 more files in changeset.
DRILL-7317: Close ClassLoaders used for udf jars uploading when closing FunctionImplementationRegistry

- Fix issue with caching DrillMergeProjectRule and FunctionImplementationRegistry when different drillbits are started within the same JVM

  1. … 3 more files in changeset.
DRILL-7316: Move classes from org.apache.drill.metastore into org.apache.drill.exec.metastore package in java-exec module

  1. … 31 more files in changeset.
DRILL-7315: Revise precision and scale order in the method arguments

  1. … 28 more files in changeset.
DRILL-7273: Introduce operators for handling metadata

closes #1886

    • -0
    • +74
    ./MetadataAggRel.java
    • -0
    • +85
    ./MetadataControllerRel.java
    • -0
    • +73
    ./MetadataHandlerRel.java
  1. … 151 more files in changeset.
DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

  1. … 117 more files in changeset.
DRILL-7253: Read Hive struct w/o nulls

  1. … 17 more files in changeset.
DRILL-7238: Fixed ConvertCountToDirectScan to handle non-existent columns

closes #1781

  1. … 1 more file in changeset.
DRILL-7183: TPCDS query 10, 35, 69 take longer with sf 1000 when Statistics are disabled. This commit reverts the changes done for DRILL-6997.

  1. … 4 more files in changeset.
DRILL-7098: File Metadata Metastore Plugin

closes #1754

  1. … 58 more files in changeset.
DRILL-7166: Count query with wildcard should skip reading of metadata summary file

  1. … 1 more file in changeset.
DRILL-7064: Leverage the summary metadata for plain COUNT aggregates.

Add unit test

Modify MetadataDirectGroupScan to track summary file information and use in unit test.

Conflicts:

exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/metadata/Metadata.java

exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/metadata/Metadata_V4.java

Fix NPE for DrillTable to account for non-eligible tables.

Fix bug with direct scan after directory pruning. Add unit test.

Address review comments.

closes #1736
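
The idea of answering plain COUNT aggregates from summary metadata can be sketched as follows. This is an assumption-laden illustration, not the code from this commit: the class SummaryCountSketch and its fields are hypothetical, and it assumes the summary holds a total row count plus per-column null counts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a metadata summary; not Drill's actual classes.
class SummaryCountSketch {
  private final long totalRowCount;
  private final Map<String, Long> nullCounts = new HashMap<>();

  SummaryCountSketch(long totalRowCount) {
    this.totalRowCount = totalRowCount;
  }

  SummaryCountSketch withNullCount(String column, long nulls) {
    nullCounts.put(column, nulls);
    return this;
  }

  // COUNT(*) counts every row, so the stored total answers it directly.
  long countStar() {
    return totalRowCount;
  }

  // COUNT(column) counts non-null values. Returns -1 when the summary has no
  // statistics for the column, meaning the caller must fall back to a scan.
  long countColumn(String column) {
    Long nulls = nullCounts.get(column);
    return nulls == null ? -1 : totalRowCount - nulls;
  }

  public static void main(String[] args) {
    SummaryCountSketch summary = new SummaryCountSketch(100).withNullCount("a", 7);
    System.out.println(summary.countStar());
    System.out.println(summary.countColumn("a"));
  }
}
```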

    • -0
    • +305
    ./ConvertCountToDirectScanRule.java
  1. … 8 more files in changeset.
DRILL-6965: Implement schema table function parameter

1. Added a common schema table function parameter which can be used as a single unit or together with format plugin table function parameters.

2. Allowed creating a schema without columns, in case the user only needs to indicate table properties.

3. Added unit tests.

closes #1777

  1. … 30 more files in changeset.
DRILL-7089: Implement caching for TableMetadataProvider at query level and adapt statistics to use Drill metastore API

closes #1728

  1. … 48 more files in changeset.
DRILL-7095: Expose table schema (TupleMetadata) to physical operator (EasySubScan)

1. Add system / session option store.table.use_schema_file to control if file schema can be used during query execution. False by default.

2. Added methods to the StoragePlugin interface that allow creating a Group Scan with a provided table schema.

3. EasyGroupScan and EasySubScan now contain the table schema and are able to serialize / deserialize it along with other scan properties.

4. DrillTable, which is the main entry point for schema provisioning, has a method to store the schema and later uses it to create the physical scan.

5. WorkspaceSchema, when returning a Drill table instance, gets the table schema from the table root if it is available and store.table.use_schema_file is set to true.

This PR is the next step for the Schema Provisioning project, which currently exposes the schema only for the text reader.

closes #1696

  1. … 15 more files in changeset.
DRILL-6852: Adapt current Parquet Metadata cache implementation to use Drill Metastore API

Co-authored-by: Volodymyr Vysotskyi <vvovyk@gmail.com>

Co-authored-by: Vitalii Diravka <vitalii@apache.org>

close apache/drill#1646

  1. … 65 more files in changeset.
DRILL-7072: Query with semi join fails for JDBC storage plugin

closes #1674

  1. … 1 more file in changeset.
DRILL-7200: Update Calcite to 1.19.0 / 1.20.0

    • -2
    • +2
    ./DrillProjectPushIntoLateralJoinRule.java
  1. … 36 more files in changeset.
DRILL-5603: Replace String file paths with Hadoop Path

- replaced all String path representations with org.apache.hadoop.fs.Path

- added PathSerDe.Se JSON serializer

- refactoring of DFSPartitionLocation code by leveraging existing listPartitionValues() functionality

closes #1657

  1. … 83 more files in changeset.
DRILL-7038: Queries on partitioned columns scan the entire datasets

- Added a new optimizer rule that checks whether a query references only directory columns and has a DISTINCT or GROUP BY operation. If the condition holds, instead of scanning the full file set, the following is performed:

1) if there is a metadata cache file, the directories are read from it,

2) otherwise the directories are gathered from the selection object (PartitionLocation).

Finally, the Scan node is transformed into a DrillValuesRel (containing constant literals) with the gathered values, so no scan is performed.

closes #1640
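
The optimization above amounts to deriving the distinct directory values from path information alone. The sketch below shows that idea for the first directory level only; it is a simplified illustration, not the rule added by this commit. The class DirectoryValuesSketch, its method, and the sample paths are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration; it works on plain strings, not on PartitionLocation.
class DirectoryValuesSketch {

  // Collects the distinct first-level directory names below the selection
  // root without opening any of the files.
  static SortedSet<String> distinctFirstLevelDirs(String selectionRoot, List<String> filePaths) {
    String prefix = selectionRoot.endsWith("/") ? selectionRoot : selectionRoot + "/";
    SortedSet<String> values = new TreeSet<>();
    for (String path : filePaths) {
      if (!path.startsWith(prefix)) {
        continue; // not under the selection root
      }
      String relative = path.substring(prefix.length());
      int slash = relative.indexOf('/');
      if (slash > 0) {
        values.add(relative.substring(0, slash));
      }
    }
    return values;
  }

  public static void main(String[] args) {
    List<String> files = Arrays.asList(
        "/data/sales/2018/a.parquet",
        "/data/sales/2018/b.parquet",
        "/data/sales/2019/c.parquet");
    System.out.println(distinctFirstLevelDirs("/data/sales", files)); // prints [2018, 2019]
  }
}
```

The resulting values play the role of the constant literals that replace the scan, so in this sketch no file contents are read at any point.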

  1. … 6 more files in changeset.
DRILL-7019: Add check for redundant imports

close apache/drill#1629

  1. … 22 more files in changeset.
DRILL-6910: Allow applying DrillPushProjectIntoScanRule at physical phase

closes #1619

    • -17
    • +94
    ./DrillPushProjectIntoScanRule.java
  1. … 2 more files in changeset.
DRILL-6997: Semijoin is changing the join ordering for some tpcds queries.

close apache/drill#1620

    • -0
    • +183
    ./DrillSemiJoinRule.java
  1. … 5 more files in changeset.
DRILL-6959: Fix loss of precision when casting time and timestamp literals in filter condition

closes #1607

  1. … 1 more file in changeset.