DRILL-7620: Fix plugin mutability issues

A recent commit made the plugin registry stricter about

the rule that, once a plugin is registered, it must be

immutable. A flaw in how the UI enforced that rule put the

registry in an inconsistent state.

Also

* Registry-specific errors

* Push more operations from UI layer into registry

* Clean up semantics of "resolve" for plugins

* Add more unit tests

* Better handling of "bad" plugins

* Force plugin names to lower case

* Fix comparison bugs in some format plugins
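
The two rules named above (registered plugins are immutable, and plugin names are forced to lower case) can be illustrated with a minimal sketch. The class names `SketchPluginConfig` and `SketchPluginRegistry` are hypothetical and are not Drill's actual registry classes; the sketch only assumes that "modifying" a plugin means registering a new config object rather than mutating the registered one.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a plugin config; not Drill's actual class.
final class SketchPluginConfig {
    private final String type;
    private final boolean enabled;

    SketchPluginConfig(String type, boolean enabled) {
        this.type = type;
        this.enabled = enabled;
    }

    String type() { return type; }

    boolean enabled() { return enabled; }

    // A "change" produces a new object; the registered instance is never mutated.
    SketchPluginConfig withEnabled(boolean newEnabled) {
        return new SketchPluginConfig(type, newEnabled);
    }
}

// Hypothetical registry keyed by the lower-cased plugin name.
final class SketchPluginRegistry {
    private final Map<String, SketchPluginConfig> plugins = new ConcurrentHashMap<>();

    // Names are normalized once, on every entry point, so "DFS" and "dfs" collide.
    static String normalize(String name) {
        return name.trim().toLowerCase(Locale.ROOT);
    }

    void put(String name, SketchPluginConfig config) {
        plugins.put(normalize(name), config);
    }

    SketchPluginConfig get(String name) {
        return plugins.get(normalize(name));
    }
}
```

In this sketch a UI layer would call `put` with the result of `withEnabled(...)` instead of editing the object it got from `get`, which is one way to keep the registry's view consistent.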

  1. … 101 more files in changeset.
DRILL-7590: Refactor plugin registry

Major cleanup of the plugin registry to split it into components

in preparation for a proper plugin API.

Better coordinates the named and ephemeral plugin caches.

Cleans up the registry API. Sharpens rules for modifying

plugin configs.

closes #1988

  1. … 158 more files in changeset.
DRILL-7467: Jdbc plugin enhancements and fixes

1. Added logic to close data source when plugin is closed.

2. Added disabled jdbc plugin template to the bootstrap storage plugins.

3. Added new jdbc storage plugin configuration property sourceParameters which would allow setting data source parameters described in BasicDataSource Configuration Parameters.

4. Upgraded commons-dbcp2 version and added it to the dependency management section in common pom.xml.

closes #1956

  1. … 17 more files in changeset.
DRILL-7454: Convert Avro to EVF

1. Replaced old format implementation with EVF.

2. Updated, added and improved performance for Avro tests.

3. Code refactoring.

closes #1951

  1. … 31 more files in changeset.
DRILL-7450: Improve performance for ANALYZE command

- Implement two-phase aggregation for the lowest metadata aggregate to optimize performance

- Allow using complex functions with hash aggregate

- Use hash aggregation for PHASE_1of2 for ANALYZE to reduce memory usage and avoid sorting non-aggregated data

- Add sort above hash aggregation to fix correctness of merge exchange and stream aggregate

closes #1907
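
As a rough illustration of the two-phase idea described above, the sketch below counts rows per key: phase one hash-aggregates each partition locally without sorting its input, and phase two merges the partial results. The class `TwoPhaseCountSketch` is hypothetical and is not Drill operator code; the `TreeMap` in phase two merely stands in for the ordering that a sort above the hash aggregate would provide to downstream order-sensitive operators.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical two-phase COUNT(*) GROUP BY key, for illustration only.
final class TwoPhaseCountSketch {

    // Phase 1: hash-aggregate one partition. The input does not need to be sorted.
    static Map<String, Long> phaseOne(List<String> partition) {
        Map<String, Long> partial = new HashMap<>();
        for (String key : partition) {
            partial.merge(key, 1L, Long::sum);
        }
        return partial;
    }

    // Phase 2: merge the partial counts. A TreeMap yields key-ordered output,
    // standing in for an explicit sort step.
    static Map<String, Long> phaseTwo(List<Map<String, Long>> partials) {
        Map<String, Long> total = new TreeMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((key, count) -> total.merge(key, count, Long::sum));
        }
        return total;
    }
}
```

Only one entry per distinct key leaves each partition, so the data moved between phases is bounded by the number of groups rather than the number of rows.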

  1. … 59 more files in changeset.
DRILL-7273: Introduce operators for handling metadata

closes #1886

    • ./data/MetadataAggregate.java (-0, +38)
    • ./data/MetadataController.java (-0, +50)
    • ./data/MetadataHandler.java (-0, +38)
  1. … 154 more files in changeset.
DRILL-1328: Support table statistics

  1. … 50 more files in changeset.
DRILL-6798: Planner changes to support semi-join.

    • ./data/LogicalSemiJoin.java (-0, +52)
  1. … 19 more files in changeset.
DRILL-6422: Replace guava imports with shaded ones

  1. … 970 more files in changeset.
DRILL-6494: Drill Plugins Handler

- Storage Plugins Handler service is used at the Drill start-up stage and it updates storage plugins configs from

storage-plugins-override.conf file. If plugins configs are present in the persistence store - they are updated,

otherwise bootstrap plugins are updated and the result configs are loaded to persistence store. If the enabled

status is absent in the storage-plugins-override.conf file, the last plugin config enabled status persists.

- The 'drill.exec.storage.action_on_plugins_override_file' boot option is added. It specifies the action to

perform on the storage-plugins-override.conf file after the storage plugin configs are updated successfully.

Possible values are: "none" (default), "rename" and "remove".

- The "NULL" issue with updating Hive plugin config by REST is solved. But clients are still being instantiated for disabled

plugins - DRILL-6412.

- "org.honton.chas.hocon:jackson-dataformat-hocon" library is added for proper deserialization of the HOCON conf file

- additional refactoring: "com.typesafe:config" and "org.apache.commons:commons-lang3" are placed into DependencyManagement

block with proper versions; correct properties for metrics in "drill-override-example.conf" are specified

closes #1345
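
The three documented values of the override-file action ("none", "rename", "remove") could be modeled roughly as below. This is a sketch under stated assumptions, not the handler's actual code: the enum `OverrideFileAction` is hypothetical, and the `.applied` suffix used for the rename is invented for illustration because the commit message does not say what the file is renamed to.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

// Hypothetical enum mirroring the three documented option values.
enum OverrideFileAction {
    NONE, RENAME, REMOVE;

    // Accepts the lower-case option strings: "none", "rename", "remove".
    static OverrideFileAction parse(String value) {
        return valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    // Applies the action to the override file after a successful config update.
    void apply(Path overrideFile) {
        try {
            switch (this) {
                case RENAME: {
                    // The ".applied" suffix is an assumption made for this sketch.
                    Path target = overrideFile.resolveSibling(
                            overrideFile.getFileName() + ".applied");
                    Files.move(overrideFile, target);
                    break;
                }
                case REMOVE:
                    Files.deleteIfExists(overrideFile);
                    break;
                default:
                    // NONE: leave the file in place.
                    break;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```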

  1. … 34 more files in changeset.
DRILL-6424: Updating FasterXML Jackson libraries

closes #1274

  1. … 4 more files in changeset.
DRILL-6386: Remove unused imports and star imports.

  1. … 228 more files in changeset.
DRILL-6389: Fixed building javadocs - Added documentation about how to build javadocs - Fixed some of the javadoc warnings

closes #1276

  1. … 65 more files in changeset.
DRILL-6321: Lateral Join and Unnest - rules, options, logical plan supports

Included changes:

* Add planner.enable_unnest_lateral option. Default value set to false.

* Enable FilterCorrectRule

* Add support to logical plan

* Fix rebase errors for DRILL-6321 commits

  1. … 17 more files in changeset.
DRILL-6422: Update guava to 23.0 and shade it

- Fix compilation errors for new version of Guava.

- Remove usage of deprecated API

- Shade guava and add dependencies to the shaded version

- Ban unshaded package

- Introduce drill-shaded module and move guava-shaded under it

- Add methods to convert shaded guava lists to the unshaded ones

- Add instruction for publishing artifacts to the Apache repository

  1. … 82 more files in changeset.
DRILL-6320: Fixed license headers.

closes #1207

  1. … 2052 more files in changeset.
DRILL-6321: Lateral Join and Unnest - initial implementation for parser and planning

    • ./data/LateralJoin.java (-0, +60)
  1. … 23 more files in changeset.
DRILL-6381: (Part 3) Planner and Execution implementation to support Secondary Indexes

  1. Index Planning Rules and Plan generators

    - DbScanToIndexScanRule: Top level physical planning rule that drives index planning for several relational algebra patterns.

    - DbScanSortRemovalRule: Physical planning rule for index planning for Sort-based operations.

    - Plan Generators: Covering, Non-Covering and Intersect physical plan generators.

    - Support planning with functional indexes such as CAST functions.

    - Enhance PlannerSettings with several configuration options for indexes.

  2. Index Selection and Statistics

    - An IndexSelector that supports cost-based index selection of covering and non-covering indexes using statistics and collation properties.

    - Costing of index intersection for comparison with single-index plans.

  3. Planning and execution operators

    - Support RangePartitioning physical operator during query planning and execution.

    - Support RowKeyJoin physical operator during query planning and execution.

    - HashTable and HashJoin changes to support RowKeyJoin and Index Intersection.

    - Enhance Materializer to keep track of subscan association with a particular rowkey join.

  4. Index Planning utilities

    - Utility classes to perform RexNode analysis, including conversion to and from SchemaPath.

    - Utility class to analyze filter condition and an input collation to determine output collation.

    - Helper classes to maintain index contexts for logical and physical planning phase.

    - IndexPlanUtils utility class for various helper methods.

  5. Miscellaneous

    - Separate physical rel for DirectScan.

    - Modify LimitExchangeTranspose rule to handle SingleMergeExchange.

    - MD-3880: Return correct status from RangePartitionRecordBatch setupNewSchema

Co-authored-by: Aman Sinha <asinha@maprtech.com>

Co-authored-by: chunhui-shi <cshi@maprtech.com>

Co-authored-by: Gautam Parai <gparai@maprtech.com>

Co-authored-by: Padma Penumarthy <ppenumar97@yahoo.com>

Co-authored-by: Hanumath Rao Maduri <hmaduri@maprtech.com>

Conflicts:

exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/HashJoinPOP.java

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTable.java

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableTemplate.java

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/Materializer.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillMergeProjectRule.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushProjectIntoScanRule.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillScanRel.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/BroadcastExchangePrel.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/DrillDistributionTrait.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java

exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PrelUtil.java

exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java

exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetPushDownFilter.java

exec/java-exec/src/main/resources/drill-module.conf

logical/src/main/java/org/apache/drill/common/logical/StoragePluginConfig.java

Resolve merge conflicts and compilation issues.

  1. … 93 more files in changeset.
DRILL-6049: Misc. hygiene and code cleanup changes

close apache/drill#1085

  1. … 123 more files in changeset.
DRILL-5325: Unit tests for the managed sort

Uses the sub-operator test framework (DRILL-5318), including the test

row set abstraction (DRILL-5323) to enable unit testing of the

“managed” external sort. This PR allows early review of the code, but

cannot be pulled until the dependencies (mentioned above) are pulled.

Refactors the external sort code into small chunks that can be unit

tested, then “wraps” that code in tests for all interesting data types,

record batch sizes, and so on.

Refactors some of the operator definitions to more easily allow

programmatic setup in the unit tests.

Fixes a number of bugs discovered by the unit tests. The biggest

changes were in the new code: the code that computes spilling and

merging based on memory levels.

Otherwise, although GitHub will show many files changed, most of the

changes are simply moving blocks of code around to create smaller units

that can be tested independently.

Includes a refactoring of the code that does spilling, along with a

complete set of low-level unit tests.

Excludes long-running sort tests.

Defines a test category for long-running tests.

First attempt to provide a way to run such tests from Maven.

closes #808

  1. … 50 more files in changeset.
DRILL-5104: Foreman should not set external sort memory for a physical plan

Physical plans include a plan for memory allocations. However, the code

path in Foreman replans external sort memory, even for a physical plan.

This makes it impossible to use a physical plan to test memory

configuration.

This change avoids changing memory settings in a physical plan, while

preserving the adjustments for logical plans or SQL queries.

Revised to put a property in the plan itself. Old plans, and those

generated from SQL, will have memory allocations applied. Plans

marked as already "resource management" planned will be used as-is.

Includes a unit test that demonstrates the new behavior.

close apache/drill#703

  1. … 5 more files in changeset.
DRILL-4448: Clean up deserialization of orderings in sorts

Fix sort operator deserialization and validation to respect existing

contract specified in the tests.

  1. … 1 more file in changeset.
DRILL-4445: Standardize the Physical and Logical plan nodes to use Lists instead of arrays for their inputs

Remove some extra translation logic used to move between the

two representations.

TODO - look back at the Join logical node: it has two JsonCreator annotations,

but only one will be used. Not sure whether the choice between them

is documented behavior; we should just fix it on our end.

  1. … 20 more files in changeset.
DRILL-4327: Fix rawtypes warnings in drill codebase

Fixing most rawtypes warning issues in drill modules.

Closes #347

  1. … 75 more files in changeset.
DRILL-4278: Heap memory leak issues

- Fix issue where WorkspaceConfig was not returning consistent hashCode()s for equal objects.

- Fix issue where we were misusing the recycler, causing object reference leaks

This closes #331.
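
The first fix concerns the general Java contract that equal objects must return equal hash codes. A minimal sketch of a config class that honors it follows; `SketchWorkspaceConfig` and its two fields are hypothetical and do not reproduce Drill's WorkspaceConfig. One plausible way a violation leaks heap is that a hash-based cache keyed by such objects never finds an existing entry and keeps adding duplicates.

```java
import java.util.Objects;

// Hypothetical config class; the fields are assumptions made for this sketch.
final class SketchWorkspaceConfig {
    private final String location;
    private final boolean writable;

    SketchWorkspaceConfig(String location, boolean writable) {
        this.location = location;
        this.writable = writable;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof SketchWorkspaceConfig)) {
            return false;
        }
        SketchWorkspaceConfig that = (SketchWorkspaceConfig) other;
        return writable == that.writable && Objects.equals(location, that.location);
    }

    // Derived from exactly the fields used in equals(), so equal objects
    // always produce the same hash code.
    @Override
    public int hashCode() {
        return Objects.hash(location, writable);
    }
}
```

With this in place, two separately constructed but equal configs map to the same key in a `HashMap`, so a cache holds one entry instead of growing on every lookup.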

  1. … 5 more files in changeset.
DRILL-3987: (CLEANUP) Final cleanups to get complete working build/distribution

- small cleanups

- move Hook to drill-adbc

- update distribution assembly to include new modules

This closes #250

  1. … 31 more files in changeset.
DRILL-3987: (MOVE) Move logical expressions and operators out of common. Move to new drill-logical model.

    • ./FormatPluginConfig.java (-0, +40)
    • ./FormatPluginConfigBase.java (-0, +44)
    • ./LogicalPlan.java (-0, +122)
    • ./LogicalPlanBuilder.java (-0, +56)
    • ./PlanProperties.java (-0, +121)
    • ./StoragePluginConfig.java (-0, +43)
    • ./StoragePluginConfigBase.java (-0, +37)
    • ./UnexpectedOperatorType.java (-0, +35)
    • ./ValidationError.java (-0, +43)
    • ./data/AbstractBuilder.java (-0, +46)
    • ./data/AbstractSingleBuilder.java (-0, +42)
    • ./data/GroupingAggregate.java (-0, +101)
  1. … 199 more files in changeset.
DRILL-1328: Support table statistics - Part 2

Add support for avg row-width and major type statistics.

Parallelize the ANALYZE implementation and stats UDF implementation to improve stats collection performance.

Update/fix rowcount, selectivity and ndv computations to improve plan costing.

Add options for configuring collection/usage of statistics.

Add new APIs and implementation for stats writer (as a precursor to Drill Metastore APIs).

Fix several stats/costing related issues identified while running TPC-H and TPC-DS queries.

Add support for CPU sampling and nested scalar columns.

Add more testcases for collection and usage of statistics and fix remaining unit/functional test failures.

Thanks to Venki Korukanti (@vkorukanti) for the description below (modified to account for new changes). He graciously agreed to rebase the patch to latest master, fixed a few issues, and added a few tests.

FUNCS: Statistics functions as UDFs:

Separate

Currently using FieldReader to ensure consistent output type so that Unpivot doesn't get confused. All stats columns should be Nullable, so that stats functions can return NULL when N/A.

* custom versions of "count" that always return BigInt

* HyperLogLog based NDV that returns BigInt that works only on VarChars

* HyperLogLog with binary output that only works on VarChars

OPS: Updated protobufs for new ops

OPS: Implemented StatisticsMerge

OPS: Implemented StatisticsUnpivot

ANALYZE: AnalyzeTable functionality

* JavaCC syntax more-or-less copied from LucidDB.

* (Basic) AnalyzePrule: DrillAnalyzeRel -> UnpivotPrel StatsMergePrel FilterPrel(for sampling) StatsAggPrel ScanPrel

ANALYZE: Add getMetadataTable() to AbstractSchema

USAGE: Change field access in QueryWrapper

USAGE: Add getDrillTable() to DrillScanRelBase and ScanPrel

* since ScanPrel does not inherit from DrillScanRelBase, this requires adding a DrillTable to the constructor

* This is done so that a custom ReflectiveRelMetadataProvider can access the DrillTable associated with Logical/Physical scans.

USAGE: Attach DrillStatsTable to DrillTable.

* DrillStatsTable represents the data scanned from a corresponding ".stats.drill" table

* In order to avoid doing query execution right after the ".stats.drill" table is found, metadata is not actually collected until the MaterializationVisitor is used.

** Currently, the metadata source must be a string (so that a SQL query can be created). Doing this with a table is probably more complicated.

** Query is set up to extract only the most recent statistics results for each column.

closes #729

  1. … 143 more files in changeset.