DRILL-6604: Upgrade Drill Hive client to Hive3.1 version

closes #2038

    • -0
    • +38
    ./data/Hive2DateTypes.tdd
    • -0
    • +38
    ./data/Hive3DateTypes.tdd
    • -6
    • +18
    ./templates/ObjectInspectorHelper.java
    • -11
    • +26
    ./templates/ObjectInspectors.java
  1. … 13 more files in changeset.
DRILL-7592: Add missing licenses and update plugins exclusion list and fix licenses

closes #1989

  1. … 84 more files in changeset.
DRILL-7463: Apache license is not added to the generated classes

closes #1916

  1. … 2 more files in changeset.
DRILL-7251: Read Hive array w/o nulls

1. HiveFieldConverter replaced by Hive writers for primitives

2. Created HiveValueWriterFactory and HiveListWriter to implement arrays support

4. Readers generation replaced by HiveDefaultRecordReader and HiveTextRecordReader

5. Few reader initializers replaced by one

6. Added method to repeated vardecimal writer

7. Minor fix for array column in View

    • -178
    • +0
    ./templates/HiveRecordReaders.java
  1. … 51 more files in changeset.
DRILL-6422: Replace guava imports with shaded ones

    • -2
    • +2
    ./templates/ObjectInspectorHelper.java
  1. … 984 more files in changeset.
DRILL-6320: Fixed license headers.

closes #1207

    • -2
    • +1
    ./templates/ObjectInspectorHelper.java
  1. … 2063 more files in changeset.
DRILL-6094: Decimal data type enhancements

Add ExprVisitors for VARDECIMAL

Modify writers/readers to support VARDECIMAL

- Added usage of VarDecimal for parquet, hive, maprdb, jdbc;

- Added options to store decimals as int32 and int64 or fixed_len_byte_array or binary;

Add UDFs for VARDECIMAL data type

- modify type inference rules

- remove UDFs for obsolete DECIMAL types

Enable DECIMAL data type by default

Add unit tests for DECIMAL data type

Fix mapping for NLJ when literal with non-primitive type is used in join conditions

Refresh protobuf C++ source files

Changes in C++ files

Add support for decimal logical type in Avro.

Add support for date, time and timestamp logical types.

Update Avro version to 1.8.2.

    • -22
    • +22
    ./templates/ObjectInspectors.java
  1. … 200 more files in changeset.
DRILL-6106: Use valueOf method instead of constructor since valueOf has a higher performance by caching frequently requested values.

closes #1099
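
The caching behaviour this commit message relies on can be illustrated with a minimal sketch. The class and method names below (ValueOfSketch, sameCachedInstance) are hypothetical and are not taken from the Drill changeset; the sketch only shows that the JDK's valueOf factory methods may return shared instances, whereas a constructor call always allocates a new object.

```java
public class ValueOfSketch {

    // Hypothetical helper: true when two valueOf calls return the same object.
    // Integer.valueOf is documented to always cache values in [-128, 127].
    static boolean sameCachedInstance(int v) {
        Integer a = Integer.valueOf(v);
        Integer b = Integer.valueOf(v);
        return a == b; // reference comparison on purpose
    }

    // Boolean.valueOf never allocates: it returns Boolean.TRUE or Boolean.FALSE.
    static boolean booleanIsShared(boolean v) {
        return Boolean.valueOf(v) == (v ? Boolean.TRUE : Boolean.FALSE);
    }

    public static void main(String[] args) {
        System.out.println(sameCachedInstance(100));  // true
        System.out.println(booleanIsShared(true));    // true
    }
}
```

Outside the documented cache range the result of a reference comparison is not guaranteed, so callers should still compare boxed values with equals.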

  1. … 11 more files in changeset.
DRILL-5730: Mock testing improvements and interface improvements

closes #1045

  1. … 223 more files in changeset.
DRILL-5978: Updating of Apache and MapR Hive libraries to 2.3.2 and 2.1.2-mapr-1710 versions respectively

* Improvements to allow reading Hive bucketed transactional ORC tables;

* Updating hive properties for tests and resolving dependencies and API conflicts:

- Fix for "hive.metastore.schema.verification", MetaException(message: Version information not found in metastore) https://cwiki.apache.org/confluence/display/Hive/Hive+Schema+Tool METASTORE_SCHEMA_VERIFICATION="false" property is added

- Added METASTORE_AUTO_CREATE_ALL="true" property to tests, because some additional tables are necessary in Hive metastore

- Disabling Calcite CBO (Hive's CalcitePlanner) for tests, because it is in conflict with Drill's Calcite version for Drill unit tests. HIVE_CBO_ENABLED="false" property

- jackson and parquet libraries are relocated in hive-exec-shade module

- org.apache.parquet:parquet-column Drill version is added to "hive-exec" to allow using Parquet empty group on MessageType level (PARQUET-278)

- Removing commons-codec exclusion from hive core. This dependency is necessary for hive-exec and hive-metastore.

- Setting Hive internal properties for transactional scan: HiveConf.HIVE_TRANSACTIONAL_TABLE_SCAN and for schema evolution: HiveConf.HIVE_SCHEMA_EVOLUTION, IOConstants.SCHEMA_EVOLUTION_COLUMNS, IOConstants.SCHEMA_EVOLUTION_COLUMNS_TYPES

- "io.dropwizard.metrics:metrics-core" with the latest 4.0.2 version is added to dependencyManagement block in Drill root POM

- Exclusion of "hive-exec" in "hive-hbase-handler" is already in Drill root dependencyManagement POM

- Hive Calcite libraries are excluded (Calcite CBO was disabled)

- "jackson-core" dependency is added to DependencyManagement block in Drill root POM file

- For MapR Hive 2.1 client older "com.fasterxml.jackson.core:jackson-databind" is included

- "log4j:log4j" dependency is excluded from "hive-exec", "hive-metastore", "hive-hbase-handler".

close apache/drill#1111

  1. … 14 more files in changeset.
DRILL-5941: Skip header / footer improvements for Hive storage plugin

Overview:

1. When a table has a header / footer, process input splits of the same file in one reader (bug fix for DRILL-5941).

2. Apply skip header logic during reader initialization only once to avoid checks while reading the data (DRILL-5106).

3. Apply skip footer logic only when the footer is more than 0, otherwise default processing will be done without buffering data in a queue (DRILL-5106).

Code changes:

1. AbstractReadersInitializer was introduced to factor out common logic during readers initialization.

It will have two implementations:

a. Default (each input split group gets its own reader);

b. Empty (for empty tables);

2. AbstractRecordsInspector was introduced to improve performance when the table's footer is less than or equal to 0.

It will have two implementations:

a. Default (records will be processed one by one without buffering);

b. SkipFooter (queue will be used to buffer N records that should be skipped in the end of file processing).

3. When a text table has a header / footer, each table file should be read as one unit. When a file is being read as several input splits, they should be grouped.

For this purpose the LogicalInputSplit class was introduced, which replaced the InputSplitWrapper class. The new class stores a list of grouped input splits and returns information about splits on group level.

Please note that during planning, input splits are grouped only when data is being read from a text table that has a header / footer; otherwise each input split is treated separately.

4. Allow HiveAbstractReader to have multiple input splits instead of one.

This closes #1030
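
The queue-based footer skipping described above can be sketched as follows. This is an illustration under stated assumptions, not the code from this changeset: the class SkipFooterSketch and the method skipFooter are hypothetical names, and records are modelled as plain list elements rather than Hive records. The idea is that a record is released only once more than N records are buffered, so the last N records never leave the queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

public class SkipFooterSketch {

    // Hypothetical helper: returns all records except the last footerCount ones.
    // Records must be non-null because ArrayDeque rejects null elements.
    static <T> List<T> skipFooter(Iterable<T> records, int footerCount) {
        List<T> out = new ArrayList<>();
        if (footerCount <= 0) {
            // Default path: no footer, so process records one by one without buffering.
            for (T record : records) {
                out.add(record);
            }
            return out;
        }
        // Skip-footer path: hold back up to footerCount records at any time.
        Queue<T> buffer = new ArrayDeque<>(footerCount + 1);
        for (T record : records) {
            buffer.add(record);
            if (buffer.size() > footerCount) {
                // The oldest buffered record can no longer be part of the footer.
                out.add(buffer.poll());
            }
        }
        // Whatever remains in the buffer is the footer and is dropped.
        return out;
    }

    public static void main(String[] args) {
        List<String> file = Arrays.asList("r1", "r2", "r3", "f1", "f2");
        System.out.println(skipFooter(file, 2)); // [r1, r2, r3]
    }
}
```

Keeping the no-footer case on a separate path mirrors the motivation given in the overview: tables without a footer pay no buffering cost.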

    • -174
    • +55
    ./templates/HiveRecordReaders.java
  1. … 20 more files in changeset.
DRILL-5002: Using hive's date functions on top of date column gives wrong results for local time-zone

closes #937

    • -1
    • +5
    ./templates/ObjectInspectorHelper.java
  1. … 1 more file in changeset.
DRILL-4868: fix how Hive functions set DrillBuf.

This closes #695

    • -35
    • +22
    ./templates/ObjectInspectorHelper.java
  1. … 3 more files in changeset.
DRILL-4982: Separate Hive reader classes for different data formats to improve performance.

1. Separating Hive reader classes allows optimizations to be applied to each class individually. This separation effectively avoids the performance degradation of scan.

2. Do not apply the skip footer/header mechanism on most Hive formats. This skip mechanism introduces extra checks on each incoming record.

close apache/drill#638

    • -0
    • +50
    ./data/HiveFormats.tdd
    • -0
    • +300
    ./templates/HiveRecordReaders.java
  1. … 4 more files in changeset.
DRILL-5032: Drill query on hive parquet table failed with OutOfMemoryError: Java heap space

close apache/drill#654

  1. … 22 more files in changeset.
DRILL-4459: Resolve SchemaChangeException while querying hive json table

- Replace Drill var16char with varchar datatype for Hive string datatype

- Change testGenericUDF() and testUDF() to use VarChar instead of Var16Char

- Add unit test for hive GET_JSON_OBJECT UDF

closes #431

    • -8
    • +18
    ./templates/ObjectInspectorHelper.java
    • -10
    • +10
    ./templates/ObjectInspectors.java
  1. … 9 more files in changeset.
DRILL-3745: Hive CHAR not supported

    • -1
    • +16
    ./templates/ObjectInspectorHelper.java
  1. … 9 more files in changeset.
DRILL-3273: Pass an empty DeferredObject to Hive UDFs for null argument value

+ Handle nulls in ObjectInspector implementations for Drill types.

    • -130
    • +189
    ./templates/ObjectInspectors.java
  1. … 6 more files in changeset.
DRILL-1347: Update Hive storage plugin to Hive version 0.13.1 from current version 0.12.0.

  1. … 9 more files in changeset.
Fix issues with Hive function generation to support DrillBuf

    • -5
    • +12
    ./templates/ObjectInspectorHelper.java
  1. … 1 more file in changeset.
DRILL-1192: Hive Scalar UDFs: Add Date, TimeStamp and Decimal type support

Also includes the following refactoring:

+ Minimize the number of variables in HiveTypes.tdd

+ Make use of Hive TypeEntries and Hive AbstractPrimitiveObjectInspector to simplify Drill ObjectInspectors implementations.

Test:

+ Add Hive UDF test implementations and testcases to cover all supported types (passing data into Hive UDF and reading data returned from Hive UDF).

    • -12
    • +33
    ./templates/ObjectInspectorHelper.java
    • -48
    • +175
    ./templates/ObjectInspectors.java
  1. … 12 more files in changeset.
Switch to DrillBuf

Add @Inject DrillBuf

Move comparison functions to memory sensitive ones

Add scalar replacement functionality for value holders

Simplify date parsing function

Add local compiled code caching

  1. … 213 more files in changeset.
DRILL-1024: Move hive storage code out of 'exec/java-exec' into 'contrib/storage-hive' module.

+ Create two modules in contrib/storage-hive

++ contrib/storage-hive/hive-exec-shade: creates shaded hive-exec.jar

++ contrib/storage-hive/core: contains Hive storage code (schema, record reader and functions)

+ Update TestHiveUDFs.java to use BaseTestQuery instead of SimpleRootExec

    • -0
    • +100
    ./data/HiveTypes.tdd
    • -0
    • +18
    ./includes/license.ftl
    • -0
    • +192
    ./templates/ObjectInspectorHelper.java
    • -0
    • +114
    ./templates/ObjectInspectors.java
  1. … 66 more files in changeset.