drill

DRILL-2180: Enable star column to work along with complex expressions

    • -0
    • +26
    /exec/java-exec/src/test/resources/store/text/sample.json
DRILL-2466: Fix "<a>.VARCHAR -> <b>.NVARCHAR" to "-> <b>.VARCHAR" (Types.h).

- Fixed the mapping of TypeProtos.MinorType.VARCHAR, which pointed to java.sql.Types.NVARCHAR, to point to java.sql.Types.VARCHAR.

- Also renamed getSqlType to getJdbcType, getSqlTypeName to getSqlTypeName.
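
A minimal sketch of the corrected mapping, for illustration only. The enum `MinorTypeSketch` and the class `JdbcTypeSketch` are hypothetical stand-ins (not Drill's actual `TypeProtos.MinorType` or `Types` code), and the INT and BIGINT cases are assumptions added just to give the switch some context. Only the VARCHAR case reflects the fix described above.

```java
import java.sql.Types;

// Hypothetical, trimmed-down stand-in for TypeProtos.MinorType.
enum MinorTypeSketch { INT, BIGINT, VARCHAR }

final class JdbcTypeSketch {
    /** Returns the java.sql.Types code assumed for the given minor type. */
    static int getJdbcType(MinorTypeSketch type) {
        switch (type) {
            case INT:     return Types.INTEGER;  // assumed mapping
            case BIGINT:  return Types.BIGINT;   // assumed mapping
            case VARCHAR: return Types.VARCHAR;  // was Types.NVARCHAR before the fix
            default:
                throw new UnsupportedOperationException("Unmapped type: " + type);
        }
    }
}
```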

DRILL-2453: Handle the case where incoming has no schema in PartitionSender.

DRILL-2311: In ProjectRecordBatch, ensure that the output name of a column from the incoming record batch is unique even if the column does not need to be classified
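
One way to guarantee unique output names can be sketched as follows. This is an assumption-laden illustration, not the logic used in ProjectRecordBatch: the class `UniqueNamesSketch`, the method `uniquify`, and the numeric-suffix scheme are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class UniqueNamesSketch {
    /** Returns the names in order, appending a numeric suffix to any repeat. */
    static List<String> uniquify(List<String> names) {
        Set<String> used = new HashSet<>();
        List<String> out = new ArrayList<>();
        for (String name : names) {
            String candidate = name;
            int suffix = 0;
            // Set.add returns false when the candidate is already taken.
            while (!used.add(candidate)) {
                candidate = name + suffix++;
            }
            out.add(candidate);
        }
        return out;
    }
}
```

With this sketch, the input `a, b, a` yields `a, b, a0`, and `a, a0, a` yields `a, a0, a1`.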

DRILL-2730: Use different paths for ExternalSort spills

DRILL-2467: Fix "datatype" to "datetype" for test Hive DATE column.

DRILL-1957: Support nested loop join planning in order to enable NOT-IN, Inequality, Cartesian, uncorrelated EXISTS planning.

Add support for nested loop join planning where right input is scalar and is broadcast.

Add check for scalar subquery for NLJ. Add support for creating a Filter-NLJ plan.

Rebase on the branch with Jinfeng's Calcite rebasing work.

Conflicts:

exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java

Add unit tests for NLJoin.

Added test for inequality join.

Tests with BroadcastExchange, with HJ/MJ disabled.

Fix filter push down for NL joins by modifying row count computation for joins with always true conditions. Rebase on master. Refactor unit tests.

Improved checking of preconditions for NL join.

Handle the case where scalar aggregate is a child of Filter.

DRILL-1957: Support nested loop join planning in order to enable NOT-IN, Inequality, EXISTS planning.

Better checks for cartesian and inequality joins. Rebase on latest master.

Refactor costing for logical join. Add tests. Enable more TPC-H tests.

Remove the check for cartesian join from DrillJoinRel constructor.

Clear left and right keys before calling splitJoinCondition.

Address review comments: Remove redundant call to getJoinCategory. Added comment in DrillRuleSet.

  1. … 5 more files in changeset.
DRILL-2106: Fix SplitUpComplexExpression rule to correctly detect last used column reference in the project expression

    • -0
    • +12
    /exec/java-exec/src/test/resources/flatten/drill-2106-result.json
DRILL-3200: Add Window functions: ROW_NUMBER, RANK, PERCENT_RANK, DENSE_RANK and CUME_DIST

- enum WindowFrameRecordBatch.WindowFunction to handle supported window functions and their corresponding output MajorType

- renamed WindowFrameTemplate -> DefaultFrameTemplate, cleaned the template to handle the default frame efficiently:

. a batch can be processed as soon as we find the last peer row of its last row

. once a batch is processed it can be safely released => we can transfer its value vectors to the container instead of copying them

- DefaultFrameTemplate.Partition tracks the current window frame and computes the following window functions automatically: row_number, rank, dense_rank, percent_rank, cume_dist. It doesn't need to aggregate the value vectors to compute these window functions

- updated TestWindowFrame to check the results of row_number, rank, dense_rank, percent_rank and cume_dist in various cases

. added a debug config option to MSorter to control the size of batches. This is needed by TestWindowFrame so it can use small test data files (20 rows per batch)

. removed contrib/data/window-test-data

- WindowFrameRecordBatch properly releases saved batches if the query stops prematurely

- GenerateTestData can be used to generate test data for the window function unit tests [it's a work in progress and can be either improved to make it developer friendly or removed from the final patch]

- using newly created WindowDataBatch in place of RecordDataBatch, to expose FragmentContext and VectorAccessible (fixes DRILL-3218)

- window.enable is true by default

    • -64
    • +0
    /contrib/data/window-test-data/pom.xml
  1. … 32 more files in changeset.
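
The five ranking functions named in DRILL-3200 can be computed from peer-group boundaries alone, without aggregating column values. The sketch below shows this using the standard SQL definitions over a single partition held in one array. It is not the DefaultFrameTemplate code: the class `RankingSketch`, the `int[]` key representation, and the output layout are assumptions made for illustration.

```java
final class RankingSketch {
    /**
     * For one partition already sorted by its ORDER BY key, returns per row:
     * {row_number, rank, dense_rank, percent_rank, cume_dist}.
     */
    static double[][] compute(int[] sortedKeys) {
        int n = sortedKeys.length;
        double[][] out = new double[n][5];
        int denseRank = 0;
        int i = 0;
        while (i < n) {
            // Find the end of the peer group [i, j): rows sharing the same key.
            int j = i;
            while (j < n && sortedKeys[j] == sortedKeys[i]) {
                j++;
            }
            int rank = i + 1;   // 1 + number of rows before the first peer
            denseRank++;        // one step per distinct peer group
            for (int r = i; r < j; r++) {
                out[r][0] = r + 1;
                out[r][1] = rank;
                out[r][2] = denseRank;
                out[r][3] = (n == 1) ? 0.0 : (rank - 1) / (double) (n - 1);
                out[r][4] = j / (double) n;  // rows up to and including the last peer
            }
            i = j;
        }
        return out;
    }
}
```

For the keys `10, 20, 20, 30`, the two rows with key 20 both get rank 2, dense_rank 2, percent_rank 1/3 and cume_dist 0.75, while the last row gets rank 4 and dense_rank 3.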
DRILL-2448: Enable standard implicit cast between Varchar and Varbinary rather than outdated special case in softEquals.

This is necessary to allow the interpreted expression system to evaluate these functions in the same manner as the code-generation based expression evaluation system does today.

DRILL-2446: Improvement in finding Drill log dir

DRILL-2441: For outer-join, if there is any inequality condition, Cartesian-Join exception will be thrown

DRILL-2128.1: Preparatory changes: Labeled result cols.; formatted SQL. [MetaImpl]

DRILL-2128.2: Fixed DatabaseMetaData.getColumns's DATA_TYPE, TYPE_NAME.

- Created basic test for DATA_TYPE and TYPE_NAME.

- Fixed DATA_TYPE: Added mapping from type name/descriptor strings from INFORMATION_SCHEMA.COLUMNS.DATA_TYPE to java.sql.Types.* integer type codes for DatabaseMetaData.getColumns's DATA_TYPE.

- Fixed TYPE_NAME: Added TYPE_NAME returning type name/descriptor strings from INFORMATION_SCHEMA.COLUMNS.DATA_TYPE

- Added FIXMEs for some missing/misnamed/wrong fields. (See DRILL-2420.)

DRILL-2397, new data types doc, misc other fixes

    • -16
    • +23
    /_docs/connect/002-plugin-conf.md
    • -20
    • +11
    /_docs/data-sources/001-hive-types.md
    • -6
    • +6
    /_docs/data-sources/003-parquet-ref.md
    • binary
    /_docs/img/connect-plugin.png
    • -60
    • +103
    /_docs/sql-ref/001-data-types.md
    • -12
    • +16
    /_docs/sql-ref/002-lexical-structure.md
    • -0
    • +77
    /_docs/sql-ref/data-types/002-disparate-data-types.md
DRILL-2225: Fix missing PartitionSenderRootExec stats.

DRILL-2413: FileSystemPlugin refactoring: avoid sharing DrillFileSystem across schemas

  1. … 19 more files in changeset.
DRILL-2414: Give proper error message if Union-All is applied on schema-less tables

DRILL-1833: Avoid storing view names in PStore cache

...always rely on view files in schema location for listing views.

DRILL-2060: Constant folding rule

2060 update - Constant folding work completed.

Fix issue with date, time and timestamp literal creation.

Fix literal creation during expression interpretation to match nullability of incoming expression.

Fix decimal literals in interpreted expression eval.

Disable test with an exposed planning bug when the project instance of the constant folding rule is enabled. The rule is not actually influencing the final plan when the rule is firing and making expression reductions. This is due to our current cost model for project, which just counts the number of expressions and does not consider expression complexity. The issues have been logged in DRILL-2218 for further investigation; they do not need to be solved to merge the other constant folding rules and all of the interpreted expression work that has been done.

Get rid of clutter in RuleSets, explanation has been moved to the 2218 JIRA.

Belongs with 2060, fix constant expression executor to use the new constant expression interpreter interface that returns a ValueHolder instead of a ValueVector with a single value filled in.

2060 update - change test baseline due to new column ordering (no functional or performance impacting changes to plan)

2060 - address Aman's comments.

add test ignore - DRILL-2218

Baseline update for project pushdown test (only column ordering on a scan, no functional or performance impacting plan changes)

Turn back on project instance.

Small casting bug in constant executor.

Don't fold hive UDFs.

Modify DrillBuf to allow a BufferManager to be the owning context for a DrillBuf.

TODO - refactor to remove remaining common code from OperatorContext and FragmentContext, have them both use the new BufferManager.

Add system option for disabling constant folding.

2060 update - test option to disable constant folding.

Update RuleSets to actually allow turning the constant folding rules on and off, as well as establish a general pattern for turning logical rules on and off, similar to how some physical rules can be already.

Change the estimated row count in EasyGroupScan to report a number of files in the case where the file size indicates an estimated total count of 0 records. Allows very small files to be pruned.

Fix folding expressions that result in null after refactoring the interpreted expression evaluation to return a ValueHolder in the case of a constant expression. Previously a value vector was returned in the same manner as the interpreter can still do when given an input VectorAccessible and an expression that may contain field references. Calling getObject on the output vector previously gracefully handled nulls as they were passed into the Calcite API to create literals. This process has to be a bit more manual now.

Address Jinfeng's review comments.

A few more review comments.

Disable cost calculation change, complete fix will come in 2553.

Throw a runtime exception if there is an error materializing the expression; as the same materialization will take place at query execution time, we should fail early.

Add a test that does prune appropriately, still have a test for the outstanding issue tracked in DRILL-2553.

Small fix for test to properly set session option and set it back after completion.

Fixing comment that was garbled somehow.

small fix for case where expression returns a null result during constant folding.

Add a little defensive code to give a good error message if a type that does not appear in the mapping from Drill to Calcite types attempts to be folded into a null value.

  1. … 7 more files in changeset.
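
The general idea behind constant folding can be sketched on a toy expression tree: subtrees made only of literals are evaluated once at planning time, while anything touching a column reference is left alone. This is not the rule added in DRILL-2060, which works against Calcite expressions and Drill's interpreted evaluator. Every class name below is hypothetical, and addition is the only operator modelled.

```java
final class ConstantFoldingSketch {
    static abstract class Expr {}

    /** A literal value known at planning time. */
    static final class Lit extends Expr {
        final long value;
        Lit(long value) { this.value = value; }
    }

    /** A column reference, only known at execution time. */
    static final class Col extends Expr {
        final String name;
        Col(String name) { this.name = name; }
    }

    static final class Add extends Expr {
        final Expr left, right;
        Add(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    /** Folds bottom-up: an Add of two literals collapses into one literal. */
    static Expr fold(Expr e) {
        if (!(e instanceof Add)) {
            return e;  // literals and column references are already minimal
        }
        Add add = (Add) e;
        Expr l = fold(add.left);
        Expr r = fold(add.right);
        if (l instanceof Lit && r instanceof Lit) {
            return new Lit(((Lit) l).value + ((Lit) r).value);
        }
        return new Add(l, r);
    }
}
```

In this sketch, `1 + (2 + 3)` folds to the literal `6`, while `x + (2 + 3)` becomes `x + 5` because the column reference blocks folding of the outer node.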
DRILL-2406: part 2 - Allow interpreted expression evaluation at planning time.

Changes needed after rebase to expose function determinism to calcite appropriately.

Address Jacques' review comments.

Address Chris' review comments.

Make things work now that BufferManager is AutoCloseable.

Fixes tests that were creating plan fragments directly to create their own query start time, as this information is now passed along from QueryContext during standard query initialization (this enables the query start time and timezone to be available to planning time expression evaluation).

Fix docs in BufferManager.

Update UDF interface to track determinism rather than randomness.

DRILL-412: FoodMart data (account.json) cause JsonParseException

DRILL-367: FoodMart data (category.json) packaged with Drill does not conform with JSON specification

Obsolete Pentaho repo

DRILL-2695: Add support for large IN conditions through the use of the Values operator. Update JSON reader to support reading extended JSON. Update JSON writer to support writing extended JSON data. Update JSON reader to automatically unwrap a file that includes a single top-level array (used by Values). Update Options manager to use getOption(<Type>Validator) to directly retrieve typed value. Remove JSON rewinding. Add support for CONVERT_TO( [], 'SIMPLEJSON') to disable extended types as part of UDF use.

  1. … 51 more files in changeset.
DRILL-2275: Added support to get information about current cluster memory and threads

+ SystemRecordReader reads a SystemRecord e.g. MemoryRecord

+ Added generic data type for static tables

+ GroupScan can enforce width to be maximum width on ExcessiveExchangeRemover

+ GroupScan has minimum width for SimpleParallelizer

  1. … 8 more files in changeset.
DRILL-2010: MergeJoin: Store/restore the right batch state when exiting join loop due to output batch full.

DRILL-2381 lexical structure plus fixes

    • -59
    • +41
    /_docs/data-sources/003-parquet-ref.md
    • -209
    • +201
    /_docs/data-sources/004-json-ref.md
    • -0
    • +141
    /_docs/sql-ref/002-lexical-structure.md
    • -0
    • +70
    /_docs/sql-ref/003-operators.md
    • -0
    • +186
    /_docs/sql-ref/004-functions.md
    • -10
    • +0
    /_docs/sql-ref/004-nest-functions.md
    • -0
    • +10
    /_docs/sql-ref/005-nest-functions.md
    • -0
    • +9
    /_docs/sql-ref/006-cmd-summary.md
    • -0
    • +16
    /_docs/sql-ref/007-reserved-wds.md
DRILL-2358: Ensure DrillScanRel differentiates skip-all, scan-all & scan-some in a backward compatible fashion

DRILL-2220: Complex reader unable to read FIXED_LEN_BYTE_ARRAY types in parquet file

DRILL-2402: Update hash functions to use seed strategy as opposed to xor strategy.

Also: Simplify and consolidate expression materialization.
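
The difference between the two strategies in DRILL-2402 can be sketched with a toy hash. With XOR, each column is hashed independently and the results are XORed, so equal columns cancel out and column order is lost. With a seed, the hash of one column is fed in as the seed for the next. The class `HashCombineSketch`, the method names, and the `31 * seed + value` mixing step are all assumptions chosen for readability; they are not Drill's hash functions.

```java
final class HashCombineSketch {
    /** Toy seeded hash; a real implementation would use a stronger mixer. */
    static int hash32(int value, int seed) {
        return 31 * seed + Integer.hashCode(value);
    }

    /** XOR strategy: hash each column on its own, then XOR the results. */
    static int xorCombine(int a, int b) {
        return hash32(a, 0) ^ hash32(b, 0);
    }

    /** Seed strategy: the hash of the first column seeds the second. */
    static int seedCombine(int a, int b) {
        return hash32(b, hash32(a, 0));
    }
}
```

In this sketch `xorCombine(7, 7)` is 0 and `xorCombine(1, 2)` equals `xorCombine(2, 1)`, whereas `seedCombine(1, 2)` is 33, `seedCombine(2, 1)` is 63 and `seedCombine(7, 7)` is 224.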