aman sinha <asinha@maprtech.com> in drill

DRILL-7242: Handle additional boundary cases and compute better estimates when popular values span multiple buckets.

Address review comments.

close apache/drill#1785

DRILL-7228: Upgrade to a newer version of t-digest to address inaccuracies in histogram buckets.

closes #1774

DRILL-7187: Improve selectivity estimation of BETWEEN predicates and arbitrary combination of range predicates.

Address review comments.

Modify unit test expected rowcount after rebasing.

close apache/drill#1772

DRILL-7152: During histogram creation handle the case when all values of a column are NULLs.

close apache/drill#1730

DRILL-7064: Leverage the summary metadata for plain COUNT aggregates.

Add unit test

Modify MetadataDirectGroupScan to track summary file information and use in unit test.

Conflicts:

exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/metadata/Metadata.java

exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/metadata/Metadata_V4.java

Fix NPE for DrillTable to account for non-eligible tables.

Fix bug with direct scan after directory pruning. Add unit test.

Address review comments.

closes #1736

DRILL-7119: Compute range predicate selectivity using histograms.

Address code review comments. Add unit test for histogram usage.

close apache/drill#1733
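
As a rough illustration of the idea behind DRILL-7119, the sketch below estimates range selectivity from an equi-depth histogram by counting whole buckets inside the range and interpolating linearly within partially covered buckets. It is written in Python for brevity and is not Drill's Java code; the function name `range_selectivity`, the boundary-list layout, and the linear-interpolation rule are assumptions made for this example.

```python
from bisect import bisect_right


def range_selectivity(boundaries, low, high):
    """Estimate the fraction of non-null rows with low <= value <= high.

    boundaries: sorted bucket endpoints of an equi-depth histogram,
    n + 1 endpoints for n buckets, each bucket holding 1/n of the rows.
    (Hypothetical layout chosen for this sketch.)
    """
    num_buckets = len(boundaries) - 1

    def cdf(v):
        # Fraction of rows estimated to lie at or below v.
        if v <= boundaries[0]:
            return 0.0
        if v >= boundaries[-1]:
            return 1.0
        # Index of the last endpoint <= v. Because v < boundaries[-1],
        # the next endpoint exists and is strictly greater than v.
        i = bisect_right(boundaries, v) - 1
        lo, hi = boundaries[i], boundaries[i + 1]
        within = (v - lo) / (hi - lo)  # assumes uniform spread inside a bucket
        return (i + within) / num_buckets

    return max(0.0, cdf(high) - cdf(low))
```

With boundaries `[0, 10, 20, 40, 100]` (four buckets, each holding a quarter of the rows), `range_selectivity(boundaries, 5, 30)` evaluates to 0.5: half of the first bucket, all of the second, and half of the third.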

DRILL-7113: Fix creation of filter conditions for IS NULL and IS NOT NULL for MapR-DB format plugin

close apache/drill#1704

DRILL-7117: Support creation of equi-depth histogram for selected data types.

Support int/bigint/float4/float8, time/timestamp/date and boolean.

Build the histogram from the t-digest byte array and serialize as JSON string.

More changes for serialization/deserialization.

Add code-gen stubs (empty) for VarChar/VarBinary types.

Address review comments (part 1). Add unit test.

Address review comments (part 2) for sampling.

close apache/drill#1715
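
To make the notion of an equi-depth histogram concrete, here is a minimal Python sketch that picks bucket boundaries so that each bucket holds roughly the same number of rows, then serializes them as a JSON string. Note the substitution: the commit builds the histogram from a t-digest byte array, whereas this sketch sorts the exact values instead. The function names and the `"buckets"` JSON field are hypothetical and are not taken from Drill.

```python
import json


def equi_depth_boundaries(values, num_buckets):
    """Return num_buckets + 1 boundaries over the non-null values.

    Uses an exact sort rather than a t-digest (assumption for this sketch).
    Returns an empty list when every value is None.
    """
    data = sorted(v for v in values if v is not None)
    if not data:
        return []
    n = len(data)
    # Boundary k is the value at the k/num_buckets quantile position.
    return [data[min(n - 1, (k * n) // num_buckets)]
            for k in range(num_buckets + 1)]


def histogram_to_json(boundaries):
    # "buckets" is a made-up field name used only for illustration.
    return json.dumps({"buckets": boundaries})
```

For the values 1 through 8 and four buckets, the boundaries come out as `[1, 3, 5, 7, 8]`, so each bucket covers two rows.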

DRILL-6818: Add descriptions to secondary index options.

closes #1545

DRILL-6381: Address code review comments (part 3).

DRILL-6381: Add missing joinControl logic for INTERSECT_DISTINCT.

- Modified HashJoin's probe phase to process INTERSECT_DISTINCT.

- NOTE: For the build phase, the functionality will be the same as for SemiJoin when it is added later.

DRILL-6381: Address code review comment for intersect_distinct.

DRILL-6381: Rebase on latest master and fix compilation issues.

DRILL-6381: Generate protobuf files for C++ native client.

DRILL-6381: Use shaded Guava classes. Add more comments and Javadoc.

DRILL-6381: Address review comments (part 2): fix formatting issues and add javadoc.

DRILL-6381: Address code review comments.

DRILL-6381: (Part 5) Update Javadoc for a few interfaces.

DRILL-5086: Fix conversion of min and max values to appropriate data type.

close apache/drill#674

DRILL-4877: If pruning was not applicable only keep the selectionRoot in the entries field.

DRILL-4857: Maintain pruning status and populate ParquetGroupScan's entries field with only the selection root if no partition pruning was done.

close apache/drill#575

DRILL-4833: Insert exchanges on the inputs of union-all such that the parent and children can be independently parallelized.

Add planner option to enable/disable distribution for union-all.

close apache/drill#566

DRILL-4846: Fix a few performance issues for metadata access:

- Create a MetadataContext that can be shared among multiple invocations of the Metadata APIs.

- Check directory modification time only if not previously checked.

- Remove a redundant call for metadata read.

- Add more logging.

- Consolidate a couple of metadata methods.

close apache/drill#569
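
The first two items in the list above amount to memoizing the modification-time check in an object shared across calls. A minimal Python sketch of that pattern follows; the class here only borrows the name `MetadataContext` from the commit message, and its fields, method name, and injectable `mtime_fn` parameter are assumptions, not Drill's actual API.

```python
import os


class MetadataContext:
    """Hypothetical stand-in that remembers which directories were checked."""

    def __init__(self, mtime_fn=os.path.getmtime):
        self._mtime_fn = mtime_fn  # injectable so the check can be faked in tests
        self._status = {}          # directory -> True if modified after the cache

    def is_modified(self, directory, cache_mtime):
        # Consult the file system only the first time a directory is seen.
        if directory not in self._status:
            self._status[directory] = self._mtime_fn(directory) > cache_mtime
        return self._status[directory]
```

Passing the same context object to every metadata call within one planning session means each directory is checked at most once, however many times it is referenced.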

DRILL-4786: Read the metadata cache file from the least common ancestor directory when multiple partitions are selected.

close apache/drill#553
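
The least common ancestor of several selected partition directories can be computed as their longest shared path prefix, taken component by component. The Python sketch below shows this with the standard library; the function name and the `cache_file_name` parameter are hypothetical, and the real cache file name is left to the caller.

```python
import posixpath


def cache_file_for(selected_dirs, cache_file_name):
    """Return the cache file path under the least common ancestor directory.

    selected_dirs: absolute POSIX-style paths of the selected partitions
    (must be non-empty). cache_file_name is supplied by the caller.
    """
    ancestor = posixpath.commonpath(selected_dirs)
    return posixpath.join(ancestor, cache_file_name)
```

For example, selecting `/tbl/2016/Q1` and `/tbl/2016/Q2` yields a cache file under `/tbl/2016`, while selecting `/tbl/2016/Q1` and `/tbl/2015/Q4` yields one under `/tbl`.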

DRILL-4794: Fix a premature exit of the outer loop during pruning.

close apache/drill#550

Update version to 1.8.0-SNAPSHOT.

[maven-release-plugin] prepare release drill-1.7.0

Added Aman's GPG key.

DRILL-4693: Ensure final column re-ordering is done if any select list expression is convert_fromjson.

close apache/drill#508

DRILL-4679: When convert() functions are present, ensure that ProjectRecordBatch produces a schema even for empty result set.

Add unit tests

Modify doAlloc() to accept record count parameter (addresses review comment)

DRILL-4530: Optimize partition pruning with metadata caching for the single partition case.

- Enhance PruneScanRule to detect single partitions based on referenced dirs in the filter.

- Keep a new status of EXPANDED_PARTIAL for FileSelection.

- Create separate .directories metadata file to prune directories first before files.

- Introduce cacheFileRoot attribute to keep track of the parent directory of the cache file after partition pruning.

Check if prefix components are non-null the very first time single partition info is initialized.

Add separate interface method to create scan using a cacheFileRoot.

Create filenames list with unique names using fileSet if available. Add several unit tests.

Populate only fileSet when expanding using the metadata cache.

Remove cacheFileRoot parameter from FileGroupScan's clone() method and instead leverage it from FileSelection.

Keep track of whether all partitions were previously pruned and process this state where needed.

close apache/drill#519

DRILL-4479: For empty fields when all_text_mode is enabled, (a) use varchar for the default columns and (b) ensure we create fields corresponding to all columns.

close apache/drill#420

DRILL-4287: During initial DrillTable creation don't read the metadata cache file; instead do it during ParquetGroupScan.

Maintain state in FileSelection to keep track of whether certain operations have been done on that selection.

Remove ParquetFileSelection since its only purpose was to carry the metadata cache information which is not needed anymore.

Conflicts:

exec/java-exec/src/main/java/org/apache/drill/exec/planner/FileSystemPartitionDescriptor.java

exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFileSelection.java

Resolve issues after rebasing:

1) JsonIgnore fileSelection in ParquetGroupScan

2) FileSystemPartitionDescriptor change.

Conflicts:

exec/java-exec/src/main/java/org/apache/drill/exec/planner/FileSystemPartitionDescriptor.java

DRILL-4287: Address code review comments and follow-up changes after rebasing:

- In FileSelection: updated call to the Stopwatch, set all flags appropriately in minusDirectories(), modify supportDirPruning()

- In ParquetGroupScan: Simplify directory checking in constructor, set the parquetTableMetadata field after reading metadata cache.

- Fix unit tests to use an alias for the reserved dir<N> columns as partition-by columns.

More follow-up changes:

- Get rid of fileSelection attribute in ParquetGroupScan

- Initialize entries after expanding the selection when metadata cache is used

- For non-metadata cache, don't do any expansion in the constructor; let init() handle it

- In FileSystemPartitionDescriptor, the createPartitionSublists is modified to check for parquet scan

When reading from the metadata cache, ensure the selection root does not contain the scheme and authority prefix. Minor refactoring.

Address code review comments and fix a bug. Simplify FileSelection state management based on review comment.

close apache/drill#376

DRILL-4147: Change union-all's output distribution trait to ANY.

Additional unit tests.

Address review comments.

close apache/drill#555