DRILL-6173: Support transitive closure during filter push down and partition pruning

closes #1216
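
As a rough illustration of what transitive closure means for filter push down: if a query states that t1.k = t2.k and t1.k = 5, then t2.k = 5 can be inferred and pushed to the scan of t2 as well. The sketch below is not Drill's planner code; the class name, method names, and the string-based column representation are all hypothetical and only show the inference step.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: infer constant filters through column equalities.
public class TransitiveFilterSketch {

    // equalColumns holds pairs {left, right} meaning "left = right".
    // knownConstants maps a column to the constant it is compared with.
    public static Map<String, Integer> inferConstants(List<String[]> equalColumns,
                                                      Map<String, Integer> knownConstants) {
        Map<String, Integer> result = new LinkedHashMap<>(knownConstants);
        boolean changed = true;
        // Repeat until no new filter can be derived (fixed point).
        while (changed) {
            changed = false;
            for (String[] pair : equalColumns) {
                changed |= copy(result, pair[0], pair[1]);
                changed |= copy(result, pair[1], pair[0]);
            }
        }
        return result;
    }

    // Copies the constant of "from" to "to" when only "from" has one.
    private static boolean copy(Map<String, Integer> constants, String from, String to) {
        if (constants.containsKey(from) && !constants.containsKey(to)) {
            constants.put(to, constants.get(from));
            return true;
        }
        return false;
    }
}
```

With the derived filter available on the partition column of every joined table, partition pruning can apply to each of them rather than only to the table named in the original filter.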

    • -0
    • +15
    ./exec/TestHivePartitionPruning.java
  1. … 35 more files in changeset.
DRILL-6320: Fixed license headers.

closes #1207

    • -1
    • +1
    ./exec/fn/hive/TestSampleHiveUDFs.java
    • -1
    • +1
    ./exec/hive/TestInfoSchemaOnHiveStorage.java
    • -1
    • +0
    ./exec/store/hive/HiveTestDataGenerator.java
  1. … 2052 more files in changeset.
DRILL-6331: Revisit Hive Drill native parquet implementation to be exposed to Drill optimizations (filter / limit push down, count to direct scan)

1. Factored out common logic for Drill parquet reader and Hive Drill native parquet readers: AbstractParquetGroupScan, AbstractParquetRowGroupScan, AbstractParquetScanBatchCreator.

2. Rules that worked previously only with ParquetGroupScan, now can be applied for any class that extends AbstractParquetGroupScan: DrillFilterItemStarReWriterRule, ParquetPruneScanRule, PruneScanRule.

3. Hive populates partition values based on information returned from the Hive metastore. Drill populates partition values based on the path difference between the selection root and the actual file path.

Previously, ColumnExplorer populated partition values using the Drill approach. Since ColumnExplorer now also populates values for parquet files from Hive tables, the `populateImplicitColumns` method logic was changed to populate partition columns based only on the given partition values.

4. Refactored ParquetPartitionDescriptor to be responsible for populating partition values rather than storing this logic in parquet group scan class.

5. Metadata class was moved to a separate metadata package (org.apache.drill.exec.store.parquet.metadata). Factored out several inner classes to improve code readability.

6. Collected all Drill native parquet reader unit tests into one class TestHiveDrillNativeParquetReader, also added new tests to cover new functionality.

7. Reduced excessive logging when parquet files metadata is read
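
Item 3 above contrasts two ways of obtaining partition values. The Drill approach (path difference between selection root and file path) can be sketched as follows. This is only an illustration: it uses java.nio.file.Path as a stand-in for the file system path type, and the class and method names are hypothetical, not taken from ColumnExplorer or ParquetPartitionDescriptor.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: derive partition values from the directory levels
// that lie between the selection root and the data file.
public class PathPartitionValues {

    public static List<String> partitionValues(String selectionRoot, String filePath) {
        Path root = Paths.get(selectionRoot);
        Path file = Paths.get(filePath);
        // For root /warehouse/orders and file /warehouse/orders/2018/Q1/data.parquet
        // the relative path is 2018/Q1/data.parquet.
        Path relative = root.relativize(file);
        List<String> values = new ArrayList<>();
        // Every name element except the last one (the file name) is a directory level.
        for (int i = 0; i < relative.getNameCount() - 1; i++) {
            values.add(relative.getName(i).toString());
        }
        return values;
    }
}
```

In Drill these directory levels surface as the implicit dir0, dir1, ... columns. For Hive tables the values come from the metastore instead, which is why the populating logic had to accept partition values supplied by the caller.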

closes #1214

    • -0
    • +248
    ./exec/TestHiveDrillNativeParquetReader.java
    • -19
    • +0
    ./exec/TestHivePartitionPruning.java
    • -13
    • +0
    ./exec/TestHiveProjectPushDown.java
    • -189
    • +26
    ./exec/hive/TestHiveStorage.java
    • -2
    • +4
    ./exec/hive/TestInfoSchemaOnHiveStorage.java
    • -21
    • +61
    ./exec/store/hive/HiveTestDataGenerator.java
  1. … 57 more files in changeset.
DRILL-6130: Fix NPE during physical plan submission for various storage plugins

1. Fixed ser / de issues for Hive, Kafka, Hbase plugins.

2. Added physical plan submission unit test for all storage plugins in contrib module.

3. Refactoring.

closes #1108

  1. … 26 more files in changeset.
DRILL-6106: Use valueOf method instead of constructor since valueOf has a higher performance by caching frequently requested values.

closes #1099
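
A small sketch of the caching behaviour the commit title refers to. The Javadoc for Integer.valueOf states that values in the range -128 to 127 are always cached, so repeated calls return the same object instead of allocating a new one. The class and method names below are illustrative only.

```java
// Illustrative sketch: Integer.valueOf reuses cached instances for small values.
public class ValueOfCaching {

    // Returns true when two independent valueOf calls yield the same object.
    public static boolean sameInstance(int value) {
        Integer first = Integer.valueOf(value);
        Integer second = Integer.valueOf(value);
        // Identity comparison is intentional here: it checks object reuse,
        // not numeric equality.
        return first == second;
    }

    public static void main(String[] args) {
        System.out.println(sameInstance(100));
        System.out.println(Boolean.valueOf(true) == Boolean.TRUE);
    }
}
```

A constructor call always allocates a fresh wrapper object, which is the overhead that switching to valueOf avoids.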

  1. … 11 more files in changeset.
DRILL-5730: Mock testing improvements and interface improvements

closes #1045

  1. … 223 more files in changeset.
DRILL-5989: Categorize some tests to speed up smoke tests. Made travis run tests.

closes #1053

  1. … 31 more files in changeset.
DRILL-5978: Updating of Apache and MapR Hive libraries to 2.3.2 and 2.1.2-mapr-1710 versions respectively

* Improvements to allow reading Hive bucketed transactional ORC tables;

* Updating Hive properties for tests and resolving dependencies and API conflicts:

- Fix for "hive.metastore.schema.verification", MetaException(message: Version information not found in metastore) (see https://cwiki.apache.org/confluence/display/Hive/Hive+Schema+Tool ): the METASTORE_SCHEMA_VERIFICATION="false" property is added

- Added the METASTORE_AUTO_CREATE_ALL="true" property to tests, because some additional tables are necessary in the Hive metastore

- Disabled Calcite CBO (Hive's CalcitePlanner) for tests with the HIVE_CBO_ENABLED="false" property, because it conflicts with Drill's Calcite version in Drill unit tests

- jackson and parquet libraries are relocated in hive-exec-shade module

- The Drill version of org.apache.parquet:parquet-column is added to "hive-exec" to allow using a Parquet empty group on the MessageType level (PARQUET-278)

- Removed the commons-codec exclusion from hive core. This dependency is necessary for hive-exec and hive-metastore.

- Setting Hive internal properties for transactional scan: HiveConf.HIVE_TRANSACTIONAL_TABLE_SCAN, and for schema evolution: HiveConf.HIVE_SCHEMA_EVOLUTION, IOConstants.SCHEMA_EVOLUTION_COLUMNS, IOConstants.SCHEMA_EVOLUTION_COLUMNS_TYPES

- "io.dropwizard.metrics:metrics-core" with last 4.0.2 version is added to dependencyManagement block in Drill root POM

- Exclusion of "hive-exec" in "hive-hbase-handler" is already in Drill root dependencyManagement POM

- Hive Calcite libraries are excluded (Calcite CBO was disabled)

- "jackson-core" dependency is added to DependencyManagement block in Drill root POM file

- For MapR Hive 2.1 client older "com.fasterxml.jackson.core:jackson-databind" is included

- "log4j:log4j" dependency is excluded from "hive-exec", "hive-metastore", "hive-hbase-handler".

close apache/drill#1111

    • -0
    • +3
    ./exec/store/hive/HiveTestDataGenerator.java
  1. … 10 more files in changeset.
DRILL-5783, DRILL-5841, DRILL-5894: Rationalize test temp directories

This change includes:

DRILL-5783:

- A unit test is created for the priority queue in the TopN operator.

- The code generation classes passed around a completely unused function registry reference in some places so it is removed.

- The priority queue had unused parameters for some of its methods, so they are removed.

DRILL-5841:

- Created standardized temp directory classes DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher, and updated all unit tests to use them.

DRILL-5894:

- Removed the dfs_test storage plugin for tests and replaced it with the already existing dfs storage plugin.

Misc:

- General code cleanup.

- Removed unnecessary use of String.format in the tests.

This closes #984

    • -2
    • +2
    ./exec/fn/hive/TestInbuiltHiveUDFs.java
    • -4
    • +1
    ./exec/hive/TestInfoSchemaOnHiveStorage.java
    • -9
    • +37
    ./exec/store/hive/HiveTestDataGenerator.java
  1. … 358 more files in changeset.
DRILL-5941: Skip header / footer improvements for Hive storage plugin

Overview:

1. When a table has a header / footer, process input splits of the same file in one reader (bug fix for DRILL-5941).

2. Apply skip header logic during reader initialization only once to avoid checks during reading the data (DRILL-5106).

3. Apply skip footer logic only when the footer count is more than 0; otherwise default processing will be done without buffering data in a queue (DRILL-5106).
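
The skip footer idea from item 3 can be sketched as below: when the footer count is positive, a queue holds back the last N records so they are never emitted; when it is 0 or less, records pass straight through without buffering. This is a minimal sketch under assumed names (SkipFooterSketch, readSkippingFooter), not the AbstractRecordsInspector code itself.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of footer skipping with a bounded buffer.
public class SkipFooterSketch {

    public static <T> List<T> readSkippingFooter(Iterable<T> records, int footerCount) {
        List<T> output = new ArrayList<>();
        if (footerCount <= 0) {
            // Default processing: no buffering at all.
            for (T record : records) {
                output.add(record);
            }
            return output;
        }
        // The queue always holds the most recent footerCount records.
        // Note: ArrayDeque does not accept null elements.
        Deque<T> buffer = new ArrayDeque<>(footerCount + 1);
        for (T record : records) {
            buffer.addLast(record);
            if (buffer.size() > footerCount) {
                // The oldest buffered record can no longer be part of the footer.
                output.add(buffer.removeFirst());
            }
        }
        // Whatever is still buffered at end of input is the footer and is dropped.
        return output;
    }
}
```

The buffer never grows beyond footerCount + 1 elements, so memory use depends on the footer size rather than on the file size.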

Code changes:

1. AbstractReadersInitializer was introduced to factor out common logic during readers initialization.

It will have two implementations:

a. Default (each input split group gets its own reader);

b. Empty (for empty tables);

2. AbstractRecordsInspector was introduced to improve performance when the table's footer count is less than or equal to 0.

It will have two implementations:

a. Default (records will be processed one by one without buffering);

b. SkipFooter (queue will be used to buffer N records that should be skipped in the end of file processing).

3. When text table has header / footer each table file should be read as one unit. When file is being read as several input splits, they should be grouped.

For this purpose LogicalInputSplit class was introduced which replaced InputSplitWrapper class. New class stores list of grouped input splits and returns information about splits on group level.

Please note that during planning, input splits are grouped only when data is being read from a text table that has a header / footer; otherwise each input split is treated separately.

4. Allow HiveAbstractReader to have multiple input splits instead of one.
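
The grouping rule described in item 3 can be sketched as follows: splits that belong to the same file form one group when the table has a header / footer, and every split stands alone otherwise. The Split class and the group method are hypothetical simplifications for illustration; they do not reproduce LogicalInputSplit or Hadoop's InputSplit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: group input splits per file when header / footer handling is needed.
public class SplitGroupingSketch {

    // Simplified stand-in for a file input split.
    public static final class Split {
        public final String path;
        public final long start;
        public final long length;

        public Split(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }
    }

    public static List<List<Split>> group(List<Split> splits, boolean hasHeaderOrFooter) {
        List<List<Split>> groups = new ArrayList<>();
        if (!hasHeaderOrFooter) {
            // Each split is treated separately.
            for (Split split : splits) {
                groups.add(Collections.singletonList(split));
            }
            return groups;
        }
        // Keep the order in which files are first seen.
        Map<String, List<Split>> byFile = new LinkedHashMap<>();
        for (Split split : splits) {
            byFile.computeIfAbsent(split.path, p -> new ArrayList<>()).add(split);
        }
        groups.addAll(byFile.values());
        return groups;
    }
}
```

Reading a whole file through one reader is what makes it possible to skip the header once at the start and the footer once at the end, instead of per split.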

This closes #1030

    • -12
    • +22
    ./exec/hive/TestHiveStorage.java
    • -6
    • +8
    ./exec/hive/TestInfoSchemaOnHiveStorage.java
    • -27
    • +47
    ./exec/store/hive/HiveTestDataGenerator.java
  1. … 18 more files in changeset.
DRILL-3993: Fix unit test failures connected with support Calcite 1.13

- Use root schema as default for describe table statement.

Fix TestOpenTSDBPlugin.testDescribe() and TestInfoSchemaOnHiveStorage.varCharMaxLengthAndDecimalPrecisionInInfoSchema() unit tests.

- Modify expected results for tests:

TestPreparedStatementProvider.invalidQueryValidationError();

TestProjectPushDown.testTPCH1();

TestProjectPushDown.testTPCH3();

TestStorageBasedHiveAuthorization.selectUser1_db_u0_only();

TestStorageBasedHiveAuthorization.selectUser0_db_u1g1_only()

- Fix TestCTAS.whenTableQueryColumnHasStarAndTableFiledListIsSpecified(), TestViewSupport.createViewWhenViewQueryColumnHasStarAndViewFiledListIsSpecified(), TestInbuiltHiveUDFs.testIf(), testDisableUtf8SupportInQueryString unit tests.

- Fix UnsupportedOperationException and NPE for jdbc tests.

- Fix AssertionError: Conversion to relational algebra failed to preserve datatypes

*DrillCompoundIdentifier:

According to the changes, made in [CALCITE-546], star Identifier is replaced by empty string during parsing the query. Since Drill uses its own DrillCompoundIdentifier, it should also replace star by empty string before creating SqlIdentifier instance to avoid further errors connected with star column. see SqlIdentifier.isStar() method.

*SqlConverter:

[CALCITE-1417] added simplification of projected expressions, which is applied every time a new project rel node is created using RelBuilder. It causes assertion errors connected with type nullability. This hook was set to false to avoid project expressions simplification. See usage of this hook and RelBuilder.project() method.

In Drill, the type nullability of a function depends only on the nullability of its arguments. In some cases, a function may return a null value even if it has non-nullable arguments. When Calcite simplifies expressions, it checks that the type of the result is the same as the type of the expression; otherwise, the makeCast() method is called. But when a function returns a null literal, this cast does nothing, even when the function has a non-nullable type. To avoid this issue, the makeCast() method was overridden.

*DrillAvgVarianceConvertlet:

Problem with sum0 and specific changes in old Calcite (it is CALCITE-777). (see HistogramShuttle.visitCall method) Changes were made to avoid changes in Calcite.

*SqlConverter, DescribeTableHandler, ShowTablesHandler:

New Calcite tries to combine both default and specified workspaces during the query validation. In some cases, for example, when describe table statement is used, Calcite tries to find INFORMATION_SCHEMA in the schema used as default. When it does not find the schema, it tries to find a table with such name. For some storage plugins, such as opentsdb and hbase, when a table was not found, the error is thrown, and the query fails. To avoid this issue, default schema was changed to root schema for validation stage for describe table and show tables queries.

    • -1
    • +1
    ./exec/fn/hive/TestInbuiltHiveUDFs.java
  1. … 16 more files in changeset.
DRILL-5832: Change OperatorFixture to use system option manager

- Rename FixtureBuilder to ClusterFixtureBuilder

- Provide alternative way to reset system/session options

- Fix for DRILL-5833: random failure in TestParquetWriter

- Provide strict, but clear, errors for missing options

closes #970

  1. … 51 more files in changeset.
DRILL-5772: Enable UTF-8 support in query string by default

1. Bump up Drill Calcite version to include CALCITE-2014 changes.

2. Add saffron.properties file to the Drill conf folder.

3. Add appropriate unit tests.

closes #936

  1. … 6 more files in changeset.
DRILL-5752 this change includes:

1. Increased test parallelism and fixed associated bugs

2. Added test categories and categorized tests appropriately

- Don't exclude anything by default

- Increase test timeout

- Fixed flaky test

closes #940

    • -0
    • +4
    ./exec/fn/hive/TestInbuiltHiveUDFs.java
    • -0
    • +4
    ./exec/fn/hive/TestSampleHiveUDFs.java
    • -0
    • +4
    ./exec/hive/TestInfoSchemaOnHiveStorage.java
    • -1
    • +1
    ./exec/test/Drill2130StorageHiveCoreHamcrestConfigurationTest.java
  1. … 254 more files in changeset.
DRILL-5002: Using hive's date functions on top of date column gives wrong results for local time-zone

closes #937

    • -1
    • +42
    ./exec/fn/hive/TestInbuiltHiveUDFs.java
  1. … 2 more files in changeset.
DRILL-5723: Added System Internal Options That can be Modified at Runtime

Changes include:

1. Addition of internal options.

2. Refactoring of OptionManagers and OptionValidators.

3. Fixed ambiguity in the meaning of an option type, and changed its name to accessibleScopes.

4. Updated javadocs in the Option System classes.

5. Added RestClientFixture for testing the Rest API.

6. Fixed flaky test in TestExceptionInjection caused by race condition.

7. Fixed various tests which started zookeeper but failed to shut it down at the end of tests.

8. Added port hunting to the Drill Webserver for testing

9. Fixed various flaky tests

10. Fix compile issue
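
Item 8 mentions port hunting for the web server in tests. One common way to do this, shown here only as an assumption about the general technique and not as Drill's implementation, is to try successive ports starting from a preferred one until a bind succeeds. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical sketch: find the first free port at or above a starting port.
public class PortHuntingSketch {

    // Returns a bound ServerSocket; the caller is responsible for closing it.
    public static ServerSocket bindFirstFree(int startPort, int maxAttempts) throws IOException {
        IOException lastFailure = null;
        for (int i = 0; i < maxAttempts; i++) {
            int candidate = startPort + i;
            try {
                return new ServerSocket(candidate);
            } catch (IOException e) {
                // Port is busy (or otherwise unavailable); try the next one.
                lastFailure = e;
            }
        }
        throw new IOException("No free port found in " + maxAttempts
            + " attempts starting at " + startPort, lastFailure);
    }
}
```

This lets several test JVMs that each start a server run in parallel on one machine without colliding on a fixed port.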

closes #923

    • -1
    • +1
    ./exec/fn/hive/TestInbuiltHiveUDFs.java
  1. … 84 more files in changeset.
DRILL-3250: Drill fails to compare multi-byte characters from hive table

- A small refactoring of original fix of this issue (DRILL-4039);

- Added test for the fix.

  1. … 3 more files in changeset.
DRILL-5459: Extend physical operator test framework to test mini plans consisting of multiple operators.

This closes #823

    • -2
    • +2
    ./exec/store/hive/HiveTestDataGenerator.java
  1. … 16 more files in changeset.
DRILL-5419: Calculate return string length for literals & some string functions

1. Revisited calculation logic for string literals and some string functions

(cast, upper, lower, initcap, reverse, concat, concat operator, rpad, lpad, case statement,

coalesce, first_value, last_value, lag, lead).

Synchronized return type length calculation logic between limit 0 and regular queries.

2. Deprecated width and changed it to precision for string types in MajorType.

3. Revisited FunctionScope and split it into FunctionScope and ReturnType.

FunctionScope will indicate only function usage in terms of number of in / out rows (n -> 1, 1 -> 1, 1 -> n).

New annotation in UDFs ReturnType will indicate which return type strategy should be used.

4. Changed MAX_VARCHAR_LENGTH from 65536 to 65535.

5. Updated calculation of precision and display size for INTERVALYEAR & INTERVALDAY.

6. Refactored part of function code-gen logic (ValueReference, WorkspaceReference, FunctionAttributes, DrillFuncHolder).
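
To give a feel for what a return length calculation looks like, here is a sketch for concat. The rule used (sum of the argument lengths, capped at the maximum varchar length) is an assumption made for illustration; the commit message only says that the logic was revisited and that MAX_VARCHAR_LENGTH is now 65535. The class and method names are hypothetical.

```java
// Hypothetical sketch: return length of concat as the capped sum of argument lengths.
public class VarcharLengthSketch {

    // Value taken from item 4 of the commit message.
    public static final int MAX_VARCHAR_LENGTH = 65535;

    public static int concatLength(int... argumentLengths) {
        // Use long to avoid int overflow when summing large lengths.
        long total = 0;
        for (int length : argumentLengths) {
            total += length;
        }
        return (int) Math.min(total, MAX_VARCHAR_LENGTH);
    }
}
```

Computing the length from the argument types alone is what allows a limit 0 query to report the same column type as the regular query without reading any data.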

This closes #819

    • -2
    • +3
    ./exec/hive/TestInfoSchemaOnHiveStorage.java
  1. … 77 more files in changeset.
DRILL-4868: Fix how Hive functions set DrillBuf.

This closes #695

    • -0
    • +34
    ./exec/fn/hive/TestInbuiltHiveUDFs.java
  1. … 3 more files in changeset.
DRILL-5032: Drill query on hive parquet table failed with OutOfMemoryError: Java heap space

close apache/drill#654

    • -1
    • +28
    ./exec/TestHivePartitionPruning.java
    • -0
    • +2
    ./exec/hive/TestInfoSchemaOnHiveStorage.java
    • -0
    • +9
    ./exec/store/hive/HiveTestDataGenerator.java
    • -0
    • +111
    ./exec/store/hive/schema/TestColumnListCache.java
  1. … 19 more files in changeset.
DRILL-4826: Query against INFORMATION_SCHEMA.TABLES degrades as the number of views increases

This closes #592

    • -0
    • +34
    ./exec/hive/TestInfoSchemaOnHiveStorage.java
  1. … 8 more files in changeset.
DRILL-4618: Correct the usage of random flag in Hive function registry

+ Function visitor should not use previous function holder if this function is non-deterministic
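
The reasoning behind this fix can be sketched in isolation: reusing a previously computed result for an identical expression is safe only when the function is deterministic, because a function such as rand() must be evaluated again on each occurrence. The code below is a hypothetical model of that rule with invented names; it is not the Drill function visitor.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: reuse earlier results only for deterministic functions.
public class HolderReuseSketch {

    private final Map<String, Object> previous = new HashMap<>();

    public Object evaluate(String expressionKey, boolean deterministic, Supplier<Object> compute) {
        if (!deterministic) {
            // Never reuse: each occurrence must produce its own value.
            return compute.get();
        }
        // Deterministic: identical expressions can share one result.
        return previous.computeIfAbsent(expressionKey, key -> compute.get());
    }
}
```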

closes #509

    • -0
    • +10
    ./exec/fn/hive/TestInbuiltHiveUDFs.java
  1. … 4 more files in changeset.
DRILL-4673: Implement "DROP TABLE IF EXISTS" for drill to prevent FAILED status on command return

- implement DROP TABLE IF EXISTS and DROP VIEW IF EXISTS;

- added unit test for DROP TABLE IF EXISTS;

- added unit test for DROP VIEW IF EXISTS;

- added unit test for "IF" hive UDF.

This closes #541

    • -1
    • +11
    ./exec/fn/hive/TestInbuiltHiveUDFs.java
  1. … 10 more files in changeset.
DRILL-4577: Construct a specific path for querying all the tables from a hive database

  1. … 8 more files in changeset.
DRILL-3623: For limit 0 queries, optionally use a shorter execution path when result column types are known

+ "planner.enable_limit0_optimization" option is disabled by default

+ Print plan in PlanTestBase if TEST_QUERY_PRINTING_SILENT is set

+ Fix DrillTestWrapper to verify expected and actual schema

+ Correct the schema of results in TestInbuiltHiveUDFs#testXpath_Double

This closes #405

    • -1
    • +1
    ./exec/fn/hive/TestInbuiltHiveUDFs.java
  1. … 12 more files in changeset.
DRILL-4372: (continued) Support for Window functions:

- CUME_DIST

- DENSE_RANK

- PERCENT_RANK

- RANK

- ROW_NUMBER

- NTILE

- LEAD

- LAG

- FIRST_VALUE

- LAST_VALUE

    • -19
    • +18
    ./exec/fn/hive/TestInbuiltHiveUDFs.java
  1. … 25 more files in changeset.
DRILL-4459: Resolve SchemaChangeException while querying hive json table

- Replace drill var16char to varchar datatype for hive string datatype

- Change testGenericUDF() and testUDF() to use VarChar instead of Var16Char

- Add unit test for hive GET_JSON_OBJECT UDF

closes #431

    • -25
    • +26
    ./exec/fn/hive/TestHiveUDFs.java
    • -0
    • +14
    ./exec/fn/hive/TestInbuiltHiveUDFs.java
    • -0
    • +1
    ./exec/hive/TestInfoSchemaOnHiveStorage.java
    • -0
    • +6
    ./exec/store/hive/HiveTestDataGenerator.java
  1. … 8 more files in changeset.
DRILL-4372: (continued) Type inference for HiveUDFs

    • -0
    • +28
    ./exec/fn/hive/TestInbuiltHiveUDFs.java
  1. … 2 more files in changeset.
DRILL-4441: Fix varchar data read out of Avro filtering incorrectly due to metadata bug

The precision of the Varchar datatype was not being set causing inconsistent

truncation of values to the default length of 1. Fixed the same issue with varbinary.

The test framework was previously taking a string as the baseline for a binary value,

which cannot express all possible values. Fixed the test to instead use a byte array.

This required updating the hive tests that were using the old method of specifying

baselines with a String.
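
Why a String baseline cannot express every binary value can be shown with a short round trip: bytes that are not valid in the chosen charset are replaced during decoding, so the original bytes cannot be recovered. The sketch uses UTF-8 as an example charset, and its class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative sketch: arbitrary bytes do not survive a trip through String.
public class BinaryBaselineSketch {

    public static byte[] roundTripThroughString(byte[] raw) {
        // Malformed input is replaced with U+FFFD while decoding.
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static boolean survivesRoundTrip(byte[] raw) {
        return Arrays.equals(raw, roundTripThroughString(raw));
    }
}
```

Plain ASCII text survives the round trip, while a value such as the single byte 0xFF does not, which is why a byte array is the safer baseline type for varbinary columns.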

Fix cast to varbinary when reading from a data source with schema needed for writing

a test.

Updated patch to remove varchar lengths from table creation.

This issue was fixed more generally by DRILL-4465, which provides a default

type length for varchar and varbinary during the setup of calcite. This update now

just provides tests to verify the fix in this case.

Closes #393

    • -1
    • +1
    ./exec/fn/hive/TestInbuiltHiveUDFs.java
  1. … 4 more files in changeset.