DRILL-7759: Code compilation exception for queries containing (untyped) NULL

  1. … 3 more files in changeset.
DRILL-7739: Allow implicit casts from required to nullable data type

closes #2080

  1. … 2 more files in changeset.
DRILL-7622: Compilation error when using HLL / TDigest with `group by` clause

closes #2009

  1. … 1 more file in changeset.
DRILL-7590: Refactor plugin registry

Major cleanup of the plugin registry to split it into components in preparation for a proper plugin API. Better coordinates the named and ephemeral plugin caches. Cleans up the registry API. Sharpens rules for modifying plugin configs.

closes #1988

    • -8
    • +10
    ./fn/FunctionImplementationRegistry.java
    • -12
    • +12
    ./fn/registry/FunctionRegistryHolder.java
    • -3
    • +7
    ./fn/registry/LocalFunctionRegistry.java
    • -6
    • +8
    ./fn/registry/RemoteFunctionRegistry.java
  1. … 160 more files in changeset.
DRILL-7634: Rollup of code cleanup changes

Collection of code cleanup changes. The most significant is to create constants for function names.

closes #2020

    • -17
    • +16
    ./ExpressionTreeMaterializer.java
    • -2
    • +2
    ./fn/FunctionImplementationRegistry.java
    • -24
    • +20
    ./fn/interpreter/InterpreterEvaluator.java
  1. … 116 more files in changeset.
DRILL-7530: Fix class names in loggers

1. Fix incorrect class names for loggers.

2. Minor code cleanup.

closes #1957

  1. … 55 more files in changeset.
DRILL-7506: Simplify code gen error handling

Pushes code gen error handling close to the code gen itself to allow clearer error messages. Doing so avoids the need to bubble code gen exceptions up the call stack, resulting in cleaner operator code.

closes #1948

  1. … 40 more files in changeset.
DRILL-7502: Invalid codegen for typeof() with UNION

Also fixes DRILL-6362: typeof() reports NULL for primitive columns with a NULL value.

typeof() is meant to return "NULL" only when a UNION holds a NULL value. When the column type is known, as for non-UNION columns, it should return that type.
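
The intended typeof() result can be sketched in plain Java. This is a minimal illustration, not Drill's generated code: the class TypeOfSketch, the method typeOf, and its parameters are hypothetical names introduced here, and returning the current value's type for a non-NULL UNION is an assumption about the existing behavior.

```java
public class TypeOfSketch {

    /**
     * Hypothetical model of the intended typeof() result.
     *
     * @param declaredType the column's declared type, e.g. "INT" or "UNION"
     * @param runtimeType  for a UNION, the type of the current value, or
     *                     null if the value is NULL; ignored for other columns
     */
    public static String typeOf(String declaredType, String runtimeType) {
        if (!"UNION".equals(declaredType)) {
            // Non-UNION column: the type is known even when the value is NULL.
            return declaredType;
        }
        // UNION column: report the current value's type (assumed), or "NULL".
        return runtimeType == null ? "NULL" : runtimeType;
    }

    public static void main(String[] args) {
        System.out.println(typeOf("INT", null));        // non-UNION column holding NULL
        System.out.println(typeOf("UNION", null));      // UNION holding NULL
        System.out.println(typeOf("UNION", "VARCHAR")); // UNION holding a VARCHAR
    }
}
```

The first call models the DRILL-6362 case: a primitive column holding NULL should still report its own type.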

Also fixes DRILL-7499: sqltypeof() function with an array returns "ARRAY", not type. This was due to treating REPEATED like LIST.

Handling of the Union vector in code gen is problematic, with about three special cases. Existing code handled two of the cases. This change handles the third case.

Figuring out the change required poking around quite a bit of unclear code. Added comments and restructuring to make that code a bit clearer.

The fix modified code gen for the Union Holder. It can now "go back in time" to add the union reader at the point we need it.

closes #1945

    • -85
    • +117
    ./ExpressionTreeMaterializer.java
  1. … 40 more files in changeset.
DRILL-7473: Parquet reader failed to get field of repeated map

closes #1933

  1. … 5 more files in changeset.
DRILL-7479: Partial fixes for metadata parameterized type issues

See DRILL-7479 and DRILL-7480 for an explanation. Adds generic type parameters where needed to avoid the need to suppress warnings. However, type parameters are probably not needed at all and should be removed in the future for reasons explained in DRILL-7480.

closes #1923

  1. … 35 more files in changeset.
DRILL-7450: Improve performance for ANALYZE command

- Implement two-phase aggregation for the lowest metadata aggregate to optimize performance

- Allow using complex functions with hash aggregate

- Use hash aggregation for PHASE_1of2 for ANALYZE to reduce memory usage and avoid sorting non-aggregated data

- Add sort above hash aggregation to fix correctness of merge exchange and stream aggregate
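
The two-phase idea in the list above can be sketched as follows. This is a simplified model under stated assumptions, not Drill's operator code: TwoPhaseAggSketch, phase1, and phase2 are hypothetical names, the aggregate is a plain sum, and the sort, merge exchange, and stream aggregate mentioned above are not modeled.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TwoPhaseAggSketch {

    /** Phase 1: hash-aggregate one partition's rows into partial sums per key. */
    public static Map<String, Long> phase1(List<Map.Entry<String, Long>> rows) {
        Map<String, Long> partial = new HashMap<>();
        for (Map.Entry<String, Long> row : rows) {
            // No sorting of the input is needed: rows are grouped by hashing the key.
            partial.merge(row.getKey(), row.getValue(), Long::sum);
        }
        return partial;
    }

    /** Phase 2: merge the partial results of all partitions into final sums. */
    public static Map<String, Long> phase2(List<Map<String, Long>> partials) {
        // TreeMap is used only to make the iteration order deterministic.
        Map<String, Long> result = new TreeMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((key, sum) -> result.merge(key, sum, Long::sum));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> p1 = phase1(List.of(
                Map.entry("a", 1L), Map.entry("b", 2L), Map.entry("a", 3L)));
        Map<String, Long> p2 = phase1(List.of(Map.entry("a", 10L)));
        System.out.println(phase2(List.of(p1, p2)));
    }
}
```

Each partition forwards at most one partial row per key to the second phase, so less data has to be held and exchanged than when all raw rows are sent to a single aggregate.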

closes #1907

    • -12
    • +28
    ./fn/DrillComplexWriterAggFuncHolder.java
  1. … 56 more files in changeset.
DRILL-7440: Failure during loading of RepeatedCount functions

closes #1894

  1. … 3 more files in changeset.
DRILL-7436: Fix record count, vector structure issues in several operators

Adds additional vector checks to the BatchValidator.

Enables checking for the following operators:

* FilterRecordBatch

* PartitionLimitRecordBatch

* UnnestRecordBatch

* HashAggBatch

* RemovingRecordBatch

Fixes vector count issues for each of these.

Fixes empty-batch (record count = 0) handling in several of the above operators. Added a method to VectorContainer to correctly create an empty batch. (An empty batch, counter-intuitively, needs vectors allocated to hold the 0 value in the first position of each offset vector.)
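
Why an empty batch still needs an allocated offset vector can be seen in a simplified model. The sketch below uses a plain int array instead of Drill's value vectors, and the names OffsetVectorSketch and buildOffsets are hypothetical. It assumes the usual layout in which a column of n variable-width values carries n + 1 offsets.

```java
import java.nio.charset.StandardCharsets;

public class OffsetVectorSketch {

    /**
     * Builds the offsets for a variable-width column: entry i is the start of
     * value i and the final entry is the end of the last value. A column of
     * n values therefore needs n + 1 offsets, and the first is always 0.
     */
    public static int[] buildOffsets(String[] values) {
        int[] offsets = new int[values.length + 1];
        offsets[0] = 0; // present even when there are no values at all
        for (int i = 0; i < values.length; i++) {
            int width = values[i].getBytes(StandardCharsets.UTF_8).length;
            offsets[i + 1] = offsets[i] + width;
        }
        return offsets;
    }

    public static void main(String[] args) {
        System.out.println(buildOffsets(new String[0]).length);         // empty batch
        System.out.println(buildOffsets(new String[] {"ab", "c"}).length);
    }
}
```

With zero values the array still has one entry, the leading 0, which is the slot the commit message says must be allocated.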

Disables verbose logging for MongoDB tests. Details are written to the log rather than the console.

Disables two invalid Mongo tests. See DRILL-7428.

Adjusts the expression tree materializer to not add the LATE type to Union vectors. (See DRILL-7435.)

Ensures that Union vectors contain valid vectors for each subtype. The present fix is a work-around; see DRILL-7434 for a better long-term fix.

Cleans up code formatting and other minor issues in each file touched during the fixes in this PR.

    • -44
    • +73
    ./ExpressionTreeMaterializer.java
    • -16
    • +18
    ./annotations/FunctionTemplate.java
  1. … 31 more files in changeset.
DRILL-7424: Project operator fails to set the container row count

Enabled the "batch validator" for the Project operator. Ran tests. Exceptions occurred because, in some paths, the Project operator fails to set the container row count.

Fixes the project operator. Cleans up formatting issues in files touched during the investigation. Cleaned up batch-related issues in Project.

    • -6
    • +9
    ./fn/impl/AbstractSqlPatternMatcher.java
    • -39
    • +39
    ./fn/impl/StringFunctionHelpers.java
  1. … 7 more files in changeset.
DRILL-7254: Read Hive union w/o nulls

  1. … 20 more files in changeset.
DRILL-7387: Failed to get value by int key from map nested into struct

  1. … 2 more files in changeset.
DRILL-7373: Fix problems involving reading from DICT type

- Fixed FieldIdUtil to resolve reading from DICT for some complex cases;

- Optimized reading from DICT by key: when the schema is known (e.g. when reading from Hive tables), an appropriately typed Object is passed to the DictReader#find(...) and DictReader#read(...) methods instead of being generated on the fly from an int or String path and the key type;

- Fixed an error when accessing a value by a nonexistent key in an Avro table.

  1. … 10 more files in changeset.
DRILL-4517: Support reading empty Parquet files

1. Modified the flat and complex Parquet readers to output only the schema when the requested number of records to read is 0. In this case the readers are not initialized, which improves performance.

2. Allowed reading the requested number of rows instead of all rows in the row group (DRILL-6528).

3. Fixed determination of the number of nulls in the row group (fixed the IsPredicate#isAllNulls method).

4. Allowed reading empty Parquet files by adding an empty / fake row group.

5. General refactoring and unit tests.

6. Parquet tests categorization.

closes #1839

  1. … 48 more files in changeset.
DRILL-7337: Add vararg UDFs support

    • -14
    • +14
    ./ExpressionTreeMaterializer.java
    • -0
    • +49
    ./fn/impl/CollectToListFunction.java
    • -4
    • +12
    ./fn/interpreter/InterpreterEvaluator.java
    • -13
    • +19
    ./fn/registry/LocalFunctionRegistry.java
  1. … 26 more files in changeset.
DRILL-7317: Close ClassLoaders used for udf jars uploading when closing FunctionImplementationRegistry

- Fix issue with caching DrillMergeProjectRule and FunctionImplementationRegistry when different drillbits are started within the same JVM

    • -1
    • +2
    ./fn/FunctionImplementationRegistry.java
    • -14
    • +61
    ./fn/registry/FunctionRegistryHolder.java
    • -1
    • +6
    ./fn/registry/LocalFunctionRegistry.java
  1. … 1 more file in changeset.
DRILL-7315: Revise precision and scale order in the method arguments

    • -2
    • +2
    ./fn/interpreter/InterpreterEvaluator.java
  1. … 27 more files in changeset.
DRILL-7307: casthigh for decimal type can lead to the issues with VarDecimalHolder

- Fixed code-gen for VarDecimal type

- Fixed code-gen issue with nullable holders for simple cast functions with constants passed as arguments.

- Code-gen now honors the DataType.Optional type defined by UDF for NULL-IF-NULL functions.

    • -1
    • +2
    ./fn/DrillComplexWriterAggFuncHolder.java
    • -1
    • +3
    ./fn/DrillComplexWriterFuncHolder.java
  1. … 3 more files in changeset.
DRILL-7273: Introduce operators for handling metadata

closes #1886

    • -1
    • +1
    ./fn/DrillComplexWriterAggFuncHolder.java
    • -2
    • +2
    ./fn/impl/AggregateErrorFunctions.java
    • -0
    • +77
    ./fn/impl/CollectListMapsAggFunction.java
    • -0
    • +68
    ./fn/impl/CollectToListVarcharAggFunction.java
    • -0
    • +55
    ./fn/impl/ParentPathFunction.java
    • -0
    • +251
    ./fn/impl/SchemaFunctions.java
  1. … 150 more files in changeset.
DRILL-7271: Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore

  1. … 116 more files in changeset.
DRILL-7253: Read Hive struct w/o nulls

    • -0
    • +57
    ./fn/impl/RowConstructorFunction.java
  1. … 17 more files in changeset.
DRILL-7228: Upgrade to a newer version of t-digest to address inaccuracies in histogram buckets.

closes #1774

  1. … 3 more files in changeset.
DRILL-7098: File Metadata Metastore Plugin

closes #1754

  1. … 57 more files in changeset.
DRILL-7152: During histogram creation handle the case when all values of a column are NULLs.

close apache/drill#1730

    • -128
    • +192
    ./fn/impl/TDigestFunctions.java
  1. … 1 more file in changeset.
DRILL-7143: Support default value for empty columns

Modifies the prior work to add default values for columns. The prior work added defaults when the entire column is missing from a reader (the old Nullable Int column). The Row Set mechanism now will also "fill empty" slots with the default value.

Added default support for the column writers. The writers automatically obtain the default value from the column schema. The default can also be set explicitly on the column writer.

Updated the null column mechanism to use this feature rather than the ad-hoc implementation in the prior commit.

Semantics changed a bit. Only Required columns take a default. The default value is ignored for nullable columns since nullable columns already have a default: NULL.
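
The revised semantics can be illustrated with a small model. This is a sketch under stated assumptions, not the Row Set implementation: FillEmptySketch and fillEmpty are hypothetical names, and an unwritten slot is modeled as a null entry in an Integer array.

```java
import java.util.Arrays;

public class FillEmptySketch {

    /**
     * Fills unwritten slots (modeled here as null entries) of a column.
     * Required (non-nullable) columns take the default value. For nullable
     * columns the default is ignored and the slot stays NULL.
     */
    public static Integer[] fillEmpty(Integer[] slots, boolean nullable, int defaultValue) {
        Integer[] filled = Arrays.copyOf(slots, slots.length);
        if (nullable) {
            return filled; // nullable: empty slots remain NULL
        }
        for (int i = 0; i < filled.length; i++) {
            if (filled[i] == null) {
                filled[i] = defaultValue; // required: substitute the default
            }
        }
        return filled;
    }

    public static void main(String[] args) {
        Integer[] slots = {1, null, 3};
        System.out.println(Arrays.toString(fillEmpty(slots, false, -1))); // required column
        System.out.println(Arrays.toString(fillEmpty(slots, true, -1)));  // nullable column
    }
}
```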

Other changes:

* Updated the CSV-with-schema tests to illustrate the new behavior.

* Made multiple fixes for Boolean and Decimal columns and added unit tests.

* Upgraded FreeMarker to version 2.3.28 to allow use of the continue statement.

* Reimplemented the Bit column reader and writer to use the BitVector directly since this vector is rather special.

* Added get/set Boolean methods for column accessors.

* Moved the BooleanType class to the common package.

* Added more CSV unit tests to explore decimal types, booleans, and defaults.

* Added special handling for blank fields in from-string conversions.

* Added options to the conversion factory to specify blank-handling behavior. CSV uses a mapping of blanks to null (nullable) or default value (non-nullable).

closes #1726

  1. … 71 more files in changeset.
DRILL-7096: Develop vector for canonical Map<K,V>

- Added new type DICT;

- Created value vectors for the type for single and repeated modes;

- Implemented corresponding FieldReaders and FieldWriters;

- Made changes in EvaluationVisitor to be able to read values from the map by key;

- Made changes to DrillParquetGroupConverter to be able to read Parquet's MAP type;

- Added an option `store.parquet.reader.enable_map_support` to disable reading MAP type as DICT from Parquet files;

- Updated AvroRecordReader to use new DICT type for Avro's MAP;

- Added support of the new type to ParquetRecordWriter.

  1. … 106 more files in changeset.