DRILL-7278: Refactor result set loader projection mechanism

Drill 1.16 added an enhanced scan framework based on the row set

mechanisms, and a "provisioned schema" feature built on top

of that framework. Conversion of the log reader plugin to use

the framework identified additional features we wish to add,

such as marking a column as "special" (not expanded in a wildcard

query).

This work identified that the code added for provisioned schemas in

Drill 1.16 worked, but is overly complex, making it hard to add

the desired new feature.

This patch refactors the "reader" projection code:

* Created a "projection set" mechanism that the reader can query to ask,

"The caller just added a column. Should it be projected or not?"

* Unified the type conversion mechanism added as part of provisioned

schemas.

* Added the "special column" property for both "reader" and "provided"

schemas.

* Verified that provisioned schemas work with maps (at least on the scan

framework side).

* Replaced the previous "schema transformer" mechanism with a new "type

conversion" mechanism that unifies type conversion, provided schemas

and an optional custom type conversion mechanism.

* Column writers can report if they are projected. Moved this query

from metadata to the column writer itself.

* Extended and clarified documentation of the feature.

* Revised and/or added unit tests.

closes #1797

  1. … 58 more files in changeset.
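
As an illustration of the projection-set query described for DRILL-7278, here is a minimal Java sketch. All names (ProjectionSetSketch, WildcardProjectionSketch, ExplicitProjectionSketch) are hypothetical and are not Drill's actual classes. The sketch also assumes that a "special" column is still returned when it is named explicitly, which the commit message does not state.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical interface: the reader asks whether a newly added column
// should be projected. Not Drill's real API.
interface ProjectionSetSketch {
  boolean isProjected(String columnName);
}

// Wildcard (SELECT *) projection: every column is projected except those
// marked "special".
final class WildcardProjectionSketch implements ProjectionSetSketch {
  private final Set<String> specialColumns;

  WildcardProjectionSketch(String... specialColumns) {
    this.specialColumns = new HashSet<>(Arrays.asList(specialColumns));
  }

  @Override
  public boolean isProjected(String columnName) {
    return !specialColumns.contains(columnName);
  }
}

// Explicit projection: only the named columns are projected.
// Assumption: naming a special column explicitly projects it.
final class ExplicitProjectionSketch implements ProjectionSetSketch {
  private final Set<String> requested;

  ExplicitProjectionSketch(String... requested) {
    this.requested = new HashSet<>(Arrays.asList(requested));
  }

  @Override
  public boolean isProjected(String columnName) {
    return requested.contains(columnName);
  }
}
```

A reader would call isProjected() each time it discovers a column and create a real writer only when the answer is true. Column names are compared case-sensitively here for brevity.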
DRILL-7257: Set nullable var-width vector lastSet value

Turns out this is due to a subtle issue with variable-width nullable

vectors. Such vectors have a lastSet attribute in the Mutator class.

When using "transfer pairs" to copy values, the code somehow decides

to zero-fill from the lastSet value to the record count. The row set

framework did not set this value, meaning that the RemovingRecordBatch

zero-filled the dir0 column when it chose to use transfer pairs rather

than copying values. The use of transfer pairs occurs when all rows in

a batch pass the filter prior to the removing record batch.

Modified the nullable vector writer to properly set the lastSet value at

the end of each batch. Added a unit test to verify the value is set

correctly.

Includes a bit of code clean-up.

  1. … 8 more files in changeset.
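
The effect of a stale lastSet value in DRILL-7257 can be pictured with a simplified model of a variable-width offsets array. This is only a sketch under assumptions: the class and method names are hypothetical, the array stands in for Drill's offset vector, and the fill loop is a plausible reading of "fill from lastSet to the record count", not Drill's actual code.

```java
// Simplified model: value i occupies bytes offsets[i] .. offsets[i + 1].
final class LastSetSketch {

  // Marks every value after lastSet as empty by repeating the end offset
  // of the last value that was written.
  static void fillEmpties(int[] offsets, int lastSet, int valueCount) {
    for (int i = lastSet + 1; i < valueCount; i++) {
      offsets[i + 1] = offsets[lastSet + 1];
    }
  }
}
```

In this model, if three values were written but lastSet was left at -1, the fill overwrites all three end offsets with 0 and every value reads back as empty. Setting lastSet to 2 at the end of the batch leaves the offsets untouched.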
DRILL-7181: Improve V3 text reader (row set) error messages

Adds an error context to the User Error mechanism. The context allows

information to be passed through an intermediate layer and applied when

errors are raised in lower-level code, without the need for that

low-level code to know the details of the error context information.
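
The error context idea above might look roughly like the following Java sketch. The names (ErrorContextSketch, LowLevelParserSketch) and the message format are hypothetical; Drill's User Error mechanism has its own builder API, which is not reproduced here.

```java
// Hypothetical context: appends details that only the upper layer knows.
interface ErrorContextSketch {
  void addContext(StringBuilder message);
}

// Low-level code holds the context but knows nothing about its contents.
final class LowLevelParserSketch {
  private final ErrorContextSketch context;

  LowLevelParserSketch(ErrorContextSketch context) {
    this.context = context;
  }

  int parseInt(String text) {
    try {
      return Integer.parseInt(text);
    } catch (NumberFormatException e) {
      StringBuilder message = new StringBuilder("Invalid integer: ").append(text);
      // The context adds its details when the error is raised.
      context.addContext(message);
      throw new IllegalStateException(message.toString(), e);
    }
  }
}
```

The upper layer (for example, a scan framework that knows the file name) creates the context once and passes it down, so the parser's error messages gain that detail without the parser depending on it.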

Modifies the scan framework and V3 text plugin to use the framework to

improve error messages.

Refines how the `columns` column can be used with the text reader. If

headers are used, then `columns` is just another column. An error is

raised, however, if `columns[x]` is used when headers are enabled.

Added another builder abstraction where a constructor argument list

became too long.

Added the drill file system and split to the file schema negotiator

to simplify reader construction.

Added additional unit tests to fully define the `columns` column

behavior.

  1. … 22 more files in changeset.
DRILL-7250: Query with CTE fails when its name matches to the table name without access

  1. … 3 more files in changeset.
DRILL-7251: Read Hive array w/o nulls

1. HiveFieldConverter replaced by Hive writers for primitives

2. Created HiveValueWriterFactory and HiveListWriter to implement array support

4. Readers generation replaced by HiveDefaultRecordReader and HiveTextRecordReader

5. Few reader initializers replaced by one

6. Added method to repeated vardecimal writer

7. Minor fix for array column in View

  1. … 53 more files in changeset.
DRILL-7196: Queries are still runnable on disabled plugins

- Storage client is no longer created for disabled plugins

- GET "/storage/{name}.json" endpoint now works with

plugin configuration directly, without client instantiation.

This has improved UI responsiveness.

- HBase and Mongo base test classes refactored to honor the enabled

plugin attribute

- Fixed path constructor for Mongo test datasets:

it is now cross-platform

- Fixed the format of test JSON files that use plugin definitions

- Code cleanup

    • -34
    • +35
    ./resources/agg/hashagg/q7_1.json
    • -34
    • +35
    ./resources/agg/hashagg/q7_2.json
    • -33
    • +34
    ./resources/agg/hashagg/q7_3.json
    • -47
    • +48
    ./resources/agg/hashagg/q8_1.json
    • -33
    • +31
    ./resources/common/test_hashtable1.json
    • -35
    • +32
    ./resources/decimal/cast_decimal_int.json
    • -54
    • +52
    ./resources/decimal/cast_decimal_vardecimal.json
    • -35
    • +32
    ./resources/decimal/cast_int_decimal.json
    • -42
    • +35
    ./resources/decimal/cast_simple_decimal.json
    • -38
    • +35
    ./resources/decimal/cast_vardecimal_decimal.json
  1. … 92 more files in changeset.
DRILL-7242: Handle additional boundary cases and compute better estimates when popular values span multiple buckets.

Address review comments.

close apache/drill#1785

  1. … 3 more files in changeset.
DRILL-7237: Fix single_value aggregate function for variable length types

- Add implementations of single_value for complex data types

closes #1782

    • -3
    • +2
    ./java/org/apache/drill/PlanTestBase.java
    • -0
    • +15
    ./java/org/apache/drill/TestExampleQueries.java
  1. … 10 more files in changeset.
DRILL-4782 / DRILL-7139: Fix DATE_ADD and TO_TIME functions

- cast function for the day interval changed to round milliseconds to complete days

- ToDateTypeFunctions#toTime now returns milliseconds of day

- updated how DayInterval subtracts and adds, to follow the cast function logic
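
To make "milliseconds of day" concrete, here is a small Java sketch. It assumes the input is a count of milliseconds since the epoch; the class and method names are hypothetical and this is not the code in ToDateTypeFunctions.

```java
// Hypothetical helper, not Drill's implementation.
final class TimeOfDaySketch {
  static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

  // Returns the offset within the day, in the range [0, MILLIS_PER_DAY).
  // Math.floorMod keeps the result non-negative for pre-epoch inputs.
  static long millisOfDay(long epochMillis) {
    return Math.floorMod(epochMillis, MILLIS_PER_DAY);
  }
}
```

For example, an input of one day plus one hour yields one hour (3,600,000 ms), and an input of -1 yields the last millisecond of the previous day.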

UT core updates:

- added vectorValue function to the queryBuilder to simplify retrieving the value of a vector

- refactored singleton query result functions at queryBuilder

    • -43
    • +111
    ./java/org/apache/drill/test/QueryBuilder.java
  1. … 3 more files in changeset.
DRILL-7238: Fixed ConvertCountToDirectScan to handle non-existent columns

closes #1781

  1. … 1 more file in changeset.
DRILL-7227: Fix predicate check in DrillRelOptUtil.analyzeSimpleEquiJoin

closes #1775

  1. … 2 more files in changeset.
DRILL-7225: Fixed merging ColumnTypeInfo for files with different schemas

closes #1773

  1. … 1 more file in changeset.
DRILL-7050: RexNode convert exception in sub-query

closes #1770

    • -1
    • +24
    ./java/org/apache/drill/TestCorrelation.java
    • -0
    • +19
    ./java/org/apache/drill/exec/sql/TestCTTAS.java
  1. … 2 more files in changeset.
DRILL-7187: Improve selectivity estimation of BETWEEN predicates and arbitrary combination of range predicates.

Address review comments.

Modify unit test expected rowcount after rebasing.

close apache/drill#1772

  1. … 5 more files in changeset.
DRILL-7164: KafkaFilterPushdownTest is sometimes failing to pattern match correctly

closes #1760

    • -16
    • +32
    ./java/org/apache/drill/PlanTestBase.java
  1. … 1 more file in changeset.
DRILL-7183: TPCDS query 10, 35, 69 take longer with sf 1000 when Statistics are disabled. This commit reverts the changes done for DRILL-6997.

  1. … 5 more files in changeset.
DRILL-6988. Utility of the too long error message when syntax error

- Added a Drill wrapper around SqlParseException to customize the messages produced by Calcite

- Fix Drill SQL parse exception formatter to calculate proper position for "^" character

closes #1753

  1. … 3 more files in changeset.
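
The "^" positioning mentioned for DRILL-6988 can be sketched as follows. This is an illustrative Java helper, not Drill's formatter: the class name is hypothetical, and it assumes the parser reports a 1-based line and column.

```java
// Hypothetical helper: echoes the SQL text and places a caret under the
// reported error position.
final class CaretSketch {

  static String mark(String sql, int line, int column) {
    String[] lines = sql.split("\n", -1);
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < lines.length; i++) {
      out.append(lines[i]).append('\n');
      if (i == line - 1) {
        // Pad with (column - 1) spaces so the caret sits under the column.
        for (int j = 1; j < column; j++) {
          out.append(' ');
        }
        out.append("^\n");
      }
    }
    return out.toString();
  }
}
```

Calling CaretSketch.mark("SELECT * FRM t", 1, 10) prints the statement followed by a line with the caret under the "F" of "FRM".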
DRILL-6974: SET option command modification

- ALTER ... RESET ... and ALTER ... SET ... sub-parsers separated to 2

different SqlCall classes with same parent SqlSetOption

- parserImpls modified to handle new syntax of ALTER... SET...

expression:

a) ALTER ... SET option.name - option.value - setting option value

b) ALTER ... SET option.name - display option value

- Handler for SqlSetOption separated to SetOptionHandler and

ResetOptionHandler for better representation of handled statements

- Base abstract class AbstractSqlSetHandler created to avoid repeating

the shared implementation of common functions

- SetOptionHandler covered with unit tests for each statement

form.

Fix issues stated in the review

closes #1763

  1. … 9 more files in changeset.
DRILL-7167: Implemented DESCRIBE TABLE statement

- altered parser implementation to honor DESCRIBE TABLE syntax

- extended test coverage to check the new statement

closes #1747

  1. … 1 more file in changeset.
DRILL-7171: Create metadata directories cache file in the leaf level directories to support ConvertCountToDirectScan optimization.

closes #1748

  1. … 1 more file in changeset.
DRILL-7166: Count query with wildcard should skip reading of metadata summary file

  1. … 1 more file in changeset.
DRILL-7159: Fix typeString method to return correct name for MAP (aka STRUCT)

closes #1741

  1. … 2 more files in changeset.
DRILL-7049: REST API returns the toString of byte arrays (VARBINARY types)

closes #1739

  1. … 1 more file in changeset.
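
The symptom named in DRILL-7049 is easy to reproduce in plain Java: calling toString() on a byte array yields an identity string rather than the content. The sketch below contrasts that with decoding the bytes. The class name is hypothetical, and decoding as UTF-8 is only an assumption for illustration; the commit title does not say how the REST API renders VARBINARY after the fix.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not Drill's REST code.
final class BinaryRenderSketch {

  // What the bug looks like: an identity string that starts with "[B@".
  static String naive(byte[] value) {
    return value.toString();
  }

  // One possible rendering (assumption): decode the bytes as UTF-8 text.
  static String decoded(byte[] value) {
    return new String(value, StandardCharsets.UTF_8);
  }
}
```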
DRILL-7152: During histogram creation handle the case when all values of a column are NULLs.

close apache/drill#1730

  1. … 1 more file in changeset.
DRILL-7045: Updates to address review comments

closes #7134

DRILL-7146: Query failing with NPE when ZK queue is enabled.

  1. … 1 more file in changeset.
DRILL-7143: Support default value for empty columns

Modifies the prior work to add default values for columns. The prior work added defaults

when the entire column is missing from a reader (the old Nullable Int column). The Row

Set mechanism now will also "fill empty" slots with the default value.

Added default support for the column writers. The writers automatically obtain the

default value from the column schema. The default can also be set explicitly on

the column writer.

Updated the null column mechanism to use this feature rather than the ad-hoc

implementation in the prior commit.

Semantics changed a bit. Only Required columns take a default. The default value

is ignored for nullable columns since nullable columns already have a file default: NULL.
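
The semantics above can be sketched in a few lines of Java. This is a toy model, not the column writer code: the class name is hypothetical, and a null array element stands in for a slot the reader never wrote.

```java
// Hypothetical model of "fill empty" behavior for a single INT column.
final class FillEmptySketch {

  // Unwritten slots (null) become the default for required columns and
  // stay NULL for nullable columns, where the default is ignored.
  static Integer[] fillEmpties(Integer[] written, boolean nullable, Integer defaultValue) {
    Integer[] out = new Integer[written.length];
    for (int i = 0; i < written.length; i++) {
      if (written[i] != null) {
        out[i] = written[i];
      } else {
        out[i] = nullable ? null : defaultValue;
      }
    }
    return out;
  }
}
```

With a default of 0, the slots {1, unwritten, 3} become {1, 0, 3} for a required column and {1, NULL, 3} for a nullable one.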

Other changes:

* Updated the CSV-with-schema tests to illustrate the new behavior.

* Made multiple fixes for Boolean and Decimal columns and added unit tests.

* Upgraded FreeMarker to version 2.3.28 to allow use of the continue statement.

* Reimplemented the Bit column reader and writer to use the BitVector directly since this vector is rather special.

* Added get/set Boolean methods for column accessors

* Moved the BooleanType class to the common package

* Added more CSV unit tests to explore decimal types, booleans, and defaults

* Added special handling for blank fields in from-string conversions

* Added options to the conversion factory to specify blank-handling behavior.

CSV uses a mapping of blanks to null (nullable) or default value (non-nullable)
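
The CSV blank-handling rule in the last item could be modeled like this. The sketch is illustrative only: the class and method names are hypothetical, it handles a single INT column, and it treats a field that is empty after trimming as blank, which is an assumption.

```java
// Hypothetical from-string conversion with blank handling.
final class BlankHandlingSketch {

  // Blank maps to NULL for nullable columns and to the default value for
  // non-nullable columns; anything else is parsed as an integer.
  static Integer convert(String field, boolean nullable, int defaultValue) {
    String trimmed = field.trim();
    if (trimmed.isEmpty()) {
      if (nullable) {
        return null;
      }
      return defaultValue;
    }
    return Integer.parseInt(trimmed);
  }
}
```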

closes #1726

    • -1
    • +1
    ./java/org/apache/drill/test/ClusterTest.java
    • -0
    • +182
    ./java/org/apache/drill/test/rowSet/test/TestDummyWriter.java
  1. … 58 more files in changeset.
DRILL-7140: RM: Drillbits fail with "No enum constant org.apache.drill.exec.resourcemgr.config.selectionpolicy.QueueSelectionPolicy.SelectionPolicy.bestfit"

closes #1720

  1. … 1 more file in changeset.
DRILL-7138: Implement command to describe schema for table

closes #1719

    • -0
    • +112
    ./java/org/apache/drill/TestSchemaCommands.java
  1. … 6 more files in changeset.
DRILL-7062: Initial implementation of run-time rowgroup pruning

closes #1738

  1. … 23 more files in changeset.