drill

DRILL-5694: Do not allow queries to access paths outside the current workspace root

closes #1050
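The rule in DRILL-5694 (no access outside the workspace root) can be sketched with `java.nio.file`: resolve the requested path against the workspace root, normalize it so `..` segments collapse, and reject anything that no longer starts with the root. This is a minimal illustration that uses a hypothetical class `WorkspacePathGuard` and a made-up root `/data/workspace`. It is not Drill's implementation, and a production check would also have to consider symbolic links (for example via `Path.toRealPath()`).

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper illustrating a workspace-root containment check.
class WorkspacePathGuard {

    /**
     * Resolves userPath against root and throws if the normalized result
     * escapes the root (e.g. "../x", or an absolute path elsewhere).
     */
    static Path resolveInsideRoot(Path root, String userPath) {
        Path normalizedRoot = root.toAbsolutePath().normalize();
        Path resolved = normalizedRoot.resolve(userPath).normalize();
        // Path.startsWith compares whole name elements, so "/data/workspace2"
        // does not count as being inside "/data/workspace".
        if (!resolved.startsWith(normalizedRoot)) {
            throw new IllegalArgumentException(
                    "Path " + userPath + " is outside the workspace root " + normalizedRoot);
        }
        return resolved;
    }

    public static void main(String[] args) {
        Path root = Paths.get("/data/workspace");
        System.out.println(resolveInsideRoot(root, "sales/2017.csv"));
        try {
            resolveInsideRoot(root, "../../etc/passwd");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```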

DRILL-5867: List profiles in pages rather than a long verbose listing

Leverage existing DataTables libraries to paginate a long pre-fetched list of profiles for listing.

As an added benefit, users can also search the list through a search field.

Minor change made to the display text for prefetching of profiles (DRILL-5259) so that it is not confused with what this commit adds to the UI.

This closes #1029
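DRILL-5867 does the paging and searching client-side with the DataTables library. As a language-neutral illustration of the same idea (filter a pre-fetched list, then slice out one page), here is a small Java sketch; the class `ProfilePager` and its methods are hypothetical and are not part of Drill.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical class: Drill's Web UI does this in the browser with DataTables;
// this only illustrates the search-then-paginate logic on a pre-fetched list.
class ProfilePager {

    /** Keeps the profiles whose text contains the search term, ignoring case. */
    static List<String> search(List<String> profiles, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        return profiles.stream()
                .filter(p -> p.toLowerCase(Locale.ROOT).contains(needle))
                .collect(Collectors.toList());
    }

    /** Returns the 0-based page `page` of size `pageSize`; empty once past the end. */
    static List<String> page(List<String> items, int page, int pageSize) {
        if (page < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("page must be >= 0 and pageSize > 0");
        }
        // Use long arithmetic so large page numbers cannot overflow.
        int from = (int) Math.min((long) page * pageSize, items.size());
        int to = (int) Math.min((long) from + pageSize, items.size());
        return items.subList(from, to);
    }

    public static void main(String[] args) {
        List<String> prefetched = Arrays.asList(
                "q1 SELECT a", "q2 select b", "q3 CREATE", "q4 DROP", "q5 SELECT c");
        List<String> matches = search(prefetched, "select");
        System.out.println(page(matches, 0, 2)); // first page of search results
    }
}
```

Because the whole list is already in memory, both operations are cheap; the trade-off is that the number of pre-fetched profiles bounds what can be searched.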

throttling edits

DRILL-5962: Add function STAsJSON to extend GIS support

This closes #1036

throttling doc add

DRILL-5801: Gantt chart (fragment timeline) enhancements

1. Labelled X and Y axes on the Gantt Chart that expresses the fragments' timelines

2. Support mouse hover to reveal major fragment ID

This closes #1035

dots in column names - lexical structure

DRILL-5960: Add ST_AsGeoJSON functionality from PostGIS

This closes #1034

DRILL-5978: Updating of Apache and MapR Hive libraries to 2.3.2 and 2.1.2-mapr-1710 versions respectively

* Improvements to allow reading Hive bucketed transactional ORC tables;

* Updating hive properties for tests and resolving dependencies and API conflicts:

- Fix for "hive.metastore.schema.verification", MetaException(message: Version information not found in metastore) https://cwiki.apache.org/confluence/display/Hive/Hive+Schema+Tool

METASTORE_SCHEMA_VERIFICATION="false" property is added

- Added the METASTORE_AUTO_CREATE_ALL="true" property to tests, because some additional tables are necessary in the Hive metastore

- Disabling Calcite CBO (Hive's CalcitePlanner) for tests, because it conflicts with Drill's Calcite version in Drill unit tests: HIVE_CBO_ENABLED="false" property

- jackson and parquet libraries are relocated in hive-exec-shade module

- org.apache.parquet:parquet-column Drill version is added to "hive-exec" to allow using a Parquet empty group at the MessageType level (PARQUET-278)

- Removing the commons-codec exclusion from hive core. This dependency is necessary for hive-exec and hive-metastore.

- Setting Hive internal properties for transactional scan (HiveConf.HIVE_TRANSACTIONAL_TABLE_SCAN) and for schema evolution (HiveConf.HIVE_SCHEMA_EVOLUTION, IOConstants.SCHEMA_EVOLUTION_COLUMNS, IOConstants.SCHEMA_EVOLUTION_COLUMNS_TYPES)

- "io.dropwizard.metrics:metrics-core" at the latest version, 4.0.2, is added to the dependencyManagement block in the Drill root POM

- Exclusion of "hive-exec" in "hive-hbase-handler" is already in Drill root dependencyManagement POM

- Hive Calcite libraries are excluded (Calcite CBO was disabled)

- "jackson-core" dependency is added to the dependencyManagement block in the Drill root POM file

- For MapR Hive 2.1 client older "com.fasterxml.jackson.core:jackson-databind" is included

- "log4j:log4j" dependency is excluded from "hive-exec", "hive-metastore", "hive-hbase-handler".

close apache/drill#1111
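The three test-only Hive overrides listed above can be summarized as key/value pairs. The sketch below is an assumption-laden illustration: the class `HiveTestOverrides` is hypothetical, and the keys are the Hive 2.x property names that, to my understanding, back the `HiveConf.ConfVars` constants named in the commit (`METASTORE_SCHEMA_VERIFICATION`, `METASTORE_AUTO_CREATE_ALL`, `HIVE_CBO_ENABLED`). It uses only a plain `Map`, so it runs without Hive on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; keys are assumed to be the property names behind the
// HiveConf.ConfVars constants mentioned in DRILL-5978.
class HiveTestOverrides {

    static Map<String, String> testOverrides() {
        Map<String, String> conf = new LinkedHashMap<>();
        // METASTORE_SCHEMA_VERIFICATION: avoid "Version information not found in metastore"
        conf.put("hive.metastore.schema.verification", "false");
        // METASTORE_AUTO_CREATE_ALL: let the test metastore create the extra tables it needs
        conf.put("datanucleus.schema.autoCreateAll", "true");
        // HIVE_CBO_ENABLED: turn off Hive's CalcitePlanner so it does not clash with Drill's Calcite
        conf.put("hive.cbo.enable", "false");
        return conf;
    }

    public static void main(String[] args) {
        // With Hive on the classpath, each entry could be applied with hiveConf.set(key, value).
        testOverrides().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```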

DRILL-5919: Add non-numeric support for JSON processing

1. Added two session options store.json.reader.non_numeric_numbers and store.json.reader.non_numeric_numbers that allow reading/writing NaN and Infinity as numbers. By default these options are set to true.

2. Extended the signature of the convert_toJSON and convert_fromJSON functions by adding a second optional parameter that enables/disables reading/writing NaN and Infinity. By default it is set to true.

3. Added unit tests with NaN and Infinity values for math and aggregate functions

4. Replaced JsonReader's constructors with a builder.

This closes #1026
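The two ideas in DRILL-5919 (an on/off switch for NaN and Infinity, and a builder in place of constructors) can be illustrated together. This is a stdlib-only sketch: `NonNumericJsonSketch`, `ReaderConfig`, `allowNanInf` and `parseNumber` are hypothetical names and do not reflect Drill's JsonReader API. It relies on `Double.parseDouble` accepting the literals `NaN`, `Infinity` and `-Infinity`, and it is not a strict JSON number validator.

```java
// Hypothetical sketch; names are illustrative only, not Drill's API.
class NonNumericJsonSketch {

    /** Immutable reader settings created through a builder. */
    static final class ReaderConfig {
        final boolean allowNanInf;

        private ReaderConfig(Builder b) {
            this.allowNanInf = b.allowNanInf;
        }

        static Builder builder() {
            return new Builder();
        }

        static final class Builder {
            private boolean allowNanInf = true; // the commit states the default is true

            Builder allowNanInf(boolean allow) {
                this.allowNanInf = allow;
                return this;
            }

            ReaderConfig build() {
                return new ReaderConfig(this);
            }
        }
    }

    /** Parses a number token, accepting NaN/Infinity only when the config allows it. */
    static double parseNumber(String token, ReaderConfig config) {
        switch (token) {
            case "NaN":
            case "Infinity":
            case "-Infinity":
                if (!config.allowNanInf) {
                    throw new IllegalArgumentException("Non-numeric number not allowed: " + token);
                }
                return Double.parseDouble(token); // yields NaN or +/-Infinity
            default:
                return Double.parseDouble(token);
        }
    }

    public static void main(String[] args) {
        ReaderConfig lenient = ReaderConfig.builder().build();
        System.out.println(parseNumber("Infinity", lenient));
    }
}
```

A builder keeps call sites readable as more optional flags like this one accumulate, which is the usual motivation for replacing telescoping constructors.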

DRILL-5926: The TestValueVector tests would run out of memory. Increased the MaxDirectMemorySize for the forked test processes in the pom to avoid this.

DRILL-5922:

- The QueryContext was never closed when the Foreman finished, so its child allocator was never closed. Now it is.

- The PlanSplitter created a QueryContext temporarily to construct an RPC message but never closed it. Now the temporary QueryContext is closed.

- The waitForExit method was error-prone. Changed it to use the standard condition variable pattern.

- Fixed timeouts in graceful shutdown tests
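The "standard condition variable pattern" for a wait-with-timeout looks roughly like this in Java: hold the monitor, loop while the condition is false (which handles spurious wake-ups), and recompute the remaining time after each wake-up. The class `ExitWaiter` and its methods are hypothetical; this is a generic sketch of the pattern, not Drill's `waitForExit` code.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical class showing a timed wait on a condition guarded by a monitor.
class ExitWaiter {
    private final Object lock = new Object();
    private boolean exited = false; // guarded by lock

    /** Called when the work finishes; wakes every waiting thread. */
    void markExited() {
        synchronized (lock) {
            exited = true;
            lock.notifyAll();
        }
    }

    /** Returns true once markExited() has been called, false if the timeout elapses first. */
    boolean waitForExit(long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        synchronized (lock) {
            while (!exited) { // re-check after every wake-up, spurious or not
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    return false;
                }
                // Performs lock.wait(...) for at most the remaining time.
                TimeUnit.NANOSECONDS.timedWait(lock, remainingNanos);
            }
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExitWaiter waiter = new ExitWaiter();
        Thread worker = new Thread(waiter::markExited); // simulate a worker that finishes
        worker.start();
        System.out.println("exited in time: " + waiter.waitForExit(5_000));
        worker.join();
    }
}
```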

DRILL-5923: Display name for query state

closes #1021

DRILL-5783, DRILL-5841, DRILL-5894: Rationalize test temp directories

This change includes:

DRILL-5783:

- A unit test is created for the priority queue in the TopN operator.

- The code generation classes passed around a completely unused function registry reference in some places, so it was removed.

- The priority queue had unused parameters for some of its methods, so they were removed.

DRILL-5841:

- Created standardized temp directory classes DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher, and updated all unit tests to use them.

DRILL-5894:

- Removed the dfs_test storage plugin for tests and replaced it with the already existing dfs storage plugin.

Misc:

- General code cleanup.

- Removed unnecessary use of String.format in the tests.

This closes #984

DRILL-5941: Skip header / footer improvements for Hive storage plugin

Overview:

1. When a table has a header / footer, process input splits of the same file in one reader (bug fix for DRILL-5941).

2. Apply skip header logic during reader initialization only once to avoid checks during reading the data (DRILL-5106).

3. Apply skip footer logic only when the footer is more than 0; otherwise default processing will be done without buffering data in a queue (DRILL-5106).

Code changes:

1. AbstractReadersInitializer was introduced to factor out common logic during readers initialization.

It will have two implementations:

a. Default (each input split group gets its own reader);

b. Empty (for empty tables);

2. AbstractRecordsInspector was introduced to improve performance when the table's footer count is less than or equal to 0.

It will have two implementations:

a. Default (records will be processed one by one without buffering);

b. SkipFooter (a queue will be used to buffer N records that should be skipped at the end of file processing).

3. When a text table has a header / footer, each table file should be read as one unit. When a file is being read as several input splits, those splits should be grouped.

For this purpose, the LogicalInputSplit class was introduced, replacing the InputSplitWrapper class. The new class stores a list of grouped input splits and returns information about the splits at the group level.

Note that during planning, input splits are grouped only when data is being read from a text table that has a header / footer; otherwise each input split is treated separately.

4. Allow HiveAbstractReader to have multiple input splits instead of one.

This closes #1030
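The header/footer handling above can be illustrated on a single file's records: skip the header once before the main loop, pass records straight through when there is no footer, and otherwise hold back the last N records in a queue. It also shows why a file must be read as one unit: only the reader that sees the true end of the file knows which records are the footer. The class and method names are hypothetical and this is not Drill's HiveAbstractReader code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of skip-header / skip-footer over one file's records.
class SkipHeaderFooterSketch {

    static List<String> read(Iterator<String> records, int header, int footer) {
        // Skip the header once, up front, instead of checking on every record.
        for (int i = 0; i < header && records.hasNext(); i++) {
            records.next();
        }
        List<String> out = new ArrayList<>();
        if (footer <= 0) {
            // Default path: no buffering at all.
            while (records.hasNext()) {
                out.add(records.next());
            }
            return out;
        }
        // Skip-footer path: a record is emitted only once `footer` newer records exist.
        Deque<String> buffer = new ArrayDeque<>(footer + 1);
        while (records.hasNext()) {
            buffer.addLast(records.next());
            if (buffer.size() > footer) {
                out.add(buffer.removeFirst());
            }
        }
        return out; // the records left in the buffer are the footer and are dropped
    }

    public static void main(String[] args) {
        List<String> file = Arrays.asList("header", "r1", "r2", "r3", "footer1", "footer2");
        System.out.println(read(file.iterator(), 1, 2));
    }
}
```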

DRILL-5921: Display counter metrics in table

closes #1020

DRILL-5917: Ban org.json:json library in Drill

This closes #1031

DRILL-5943: Avoid the strong check introduced by DRILL-5582 for PLAIN mechanism

This closes #1028

DRILL-5936: Refactor MergingRecordBatch based on code inspection

This closes #1025

DRILL-3993: Fix unit test failures related to Calcite 1.13 support

- Use root schema as default for describe table statement.

Fix TestOpenTSDBPlugin.testDescribe() and TestInfoSchemaOnHiveStorage.varCharMaxLengthAndDecimalPrecisionInInfoSchema() unit tests.

- Modify expected results for tests:

TestPreparedStatementProvider.invalidQueryValidationError();

TestProjectPushDown.testTPCH1();

TestProjectPushDown.testTPCH3();

TestStorageBasedHiveAuthorization.selectUser1_db_u0_only();

TestStorageBasedHiveAuthorization.selectUser0_db_u1g1_only()

- Fix TestCTAS.whenTableQueryColumnHasStarAndTableFiledListIsSpecified(), TestViewSupport.createViewWhenViewQueryColumnHasStarAndViewFiledListIsSpecified(), TestInbuiltHiveUDFs.testIf(), testDisableUtf8SupportInQueryString unit tests.

- Fix UnsupportedOperationException and NPE for JDBC tests.

- Fix AssertionError: Conversion to relational algebra failed to preserve datatypes

*DrillCompoundIdentifier:

According to the changes made in [CALCITE-546], a star identifier is replaced by an empty string when the query is parsed. Since Drill uses its own DrillCompoundIdentifier, it should also replace the star with an empty string before creating the SqlIdentifier instance, to avoid further errors related to the star column. See the SqlIdentifier.isStar() method.

*SqlConverter:

[CALCITE-1417] added simplification of projected expressions every time a new project rel node is created using RelBuilder. This causes assertion errors related to type nullability. This hook was set to false to avoid simplification of project expressions. See the usage of this hook and the RelBuilder.project() method.

In Drill, the type nullability of a function depends only on the nullability of its arguments. In some cases, a function may return a null value even if it has non-nullable arguments. When Calcite simplifies expressions, it checks that the type of the result is the same as the type of the expression; otherwise, the makeCast() method is called. But when a function returns a null literal, this cast does nothing, even when the function has a non-nullable type. To avoid this issue, the makeCast() method was overridden.

*DrillAvgVarianceConvertlet:

Fixes a problem with sum0 and specific changes in the old Calcite version (CALCITE-777; see the HistogramShuttle.visitCall method). These changes were made in Drill to avoid modifying Calcite.

*SqlConverter, DescribeTableHandler, ShowTablesHandler:

The new Calcite version tries to combine both the default and the specified workspaces during query validation. In some cases, for example when a describe table statement is used, Calcite tries to find INFORMATION_SCHEMA in the schema used as the default. When it does not find the schema, it tries to find a table with that name. For some storage plugins, such as OpenTSDB and HBase, an error is thrown when a table is not found, and the query fails. To avoid this issue, the default schema was changed to the root schema for the validation stage of describe table and show tables queries.
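The DrillCompoundIdentifier change can be pictured as a small normalization over identifier name segments: a `*` segment becomes `""`, which is how the description above says Calcite represents a star after CALCITE-546, and a star check then looks at the last segment. This is a hypothetical stdlib sketch; `StarIdentifierSketch`, `normalizeStar` and `isStar` are illustrative names, and the last-segment check reflects my understanding of `SqlIdentifier.isStar()` rather than a verified copy of it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of star normalization on compound identifier segments.
class StarIdentifierSketch {

    /** Replaces every "*" segment with "" before an identifier is built from the names. */
    static List<String> normalizeStar(List<String> names) {
        List<String> result = new ArrayList<>(names.size());
        for (String name : names) {
            result.add("*".equals(name) ? "" : name);
        }
        return result;
    }

    /** Treats the identifier as a star when its last segment is the empty string. */
    static boolean isStar(List<String> names) {
        return !names.isEmpty() && names.get(names.size() - 1).isEmpty();
    }

    public static void main(String[] args) {
        List<String> parsed = Arrays.asList("t", "*");
        List<String> normalized = normalizeStar(parsed);
        System.out.println(normalized + " isStar=" + isStar(normalized));
    }
}
```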

DRILL-5089: Dynamically load schema of storage plugin only when needed for every query

For each query, loading all storage plugins and loading all workspaces under file system plugins is not needed.

This patch uses DynamicRootSchema as the root schema for Drill; it loads the corresponding storage only when needed.

InfoSchema reads the full schema information and loads second-level schemas accordingly.

For workspaces under the same file system, there is no need to create a FileSystem for each workspace.

Use the fs.access API to check permissions; it is available since HDFS 2.6, except for the Windows + local file system case.

Add unit tests with a broken mock storage, that is, a storage that throws an exception in its registerSchema method. Without this fix, all queries, even on good storages, would fail (Drill still loaded all schemas from all storages).

This closes #1032
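The on-demand loading idea behind DynamicRootSchema can be sketched with a map that loads a sub-schema the first time a query asks for it. Drill's real class builds on Calcite's schema classes; this sketch only uses `ConcurrentHashMap.computeIfAbsent`, and `LazyRootSchemaSketch`, `getSubSchema` and `loadedCount` are hypothetical names. The test mirrors the scenario from the commit: a broken plugin that throws when loaded does not affect queries against a healthy one, because it is never touched.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: sub-schemas are loaded lazily, one plugin at a time.
class LazyRootSchemaSketch<S> {
    private final Function<String, S> loader; // loads one storage plugin's schema by name
    private final Map<String, S> loaded = new ConcurrentHashMap<>();

    LazyRootSchemaSketch(Function<String, S> loader) {
        this.loader = loader;
    }

    /** Loads the named sub-schema on first access only; other plugins are never loaded. */
    S getSubSchema(String name) {
        // If the loader throws, the exception propagates and nothing is cached.
        return loaded.computeIfAbsent(name, loader);
    }

    int loadedCount() {
        return loaded.size();
    }

    public static void main(String[] args) {
        LazyRootSchemaSketch<String> root = new LazyRootSchemaSketch<>(name -> {
            if (name.equals("broken")) {
                throw new IllegalStateException("plugin failed to register its schema");
            }
            return "schema:" + name;
        });
        System.out.println(root.getSubSchema("dfs") + ", loaded=" + root.loadedCount());
    }
}
```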

DRILL-5089: Dynamically load schema of storage plugin only when needed for every query

For each query, loading all storage plugins and loading all workspaces under file system plugins is not needed.

This patch uses DynamicRootSchema as the root schema for Drill; it loads the corresponding storage only when needed.

InfoSchema reads the full schema information and loads second-level schemas accordingly.

For workspaces under the same file system, there is no need to create a FileSystem for each workspace.

Use the fs.access API to check permissions; it is available since HDFS 2.6, except for the Windows + local file system case.

Add unit tests with a broken mock storage, that is, a storage that throws an exception in its registerSchema method. Without this fix, all queries, even on good storages, would fail (Drill still loaded all schemas from all storages).

(cherry picked from commit a66d1d7)

DRILL-3993: Fix failing tests after Calcite update

- fixed temporary table errors according to the updated logic;

- fixed errors when selecting from an HBase table with the schema name in the query (example: "SELECT row_key FROM hbase.TestTableNullStr") while in the hbase schema (after "USE hbase"). Added a test for it;

- added a fix for views that were created on Calcite 1.4, and a test for it.

DRILL-5909: Added new Counter metrics

closes #1019

DRILL-5834: Add Networking Functions

close apache/drill#1018

DRILL-5924: native-client: Support user-specified CXX_FLAGS

This closes #1022

DRILL-5911: Upgrade esri-geometry-api version to 2.0.0 to avoid dependency on org.json library

closes #1012

DRILL-5910: Logging exception when custom AuthenticatorFactory not found

closes #1013

DRILL-5822: The query with "SELECT *" with "ORDER BY" clause and `planner.slice_target`=1 doesn't preserve column order

- The commit for DRILL-847 is outdated. There is no need to canonicalize the batch or container, since RecordBatchLoader swallows the "schema change" for now if two batches have different column ordering.

closes #1017

DRILL-5771: Fix serDe errors for format plugins

1. Fix various serde issues for format plugins described in DRILL-5771.

2. Throw a meaningful exception instead of an NPE when the table is not found and a table function is used.

3. Added unit tests for all format plugins to ensure serde is checked (the physical plan is generated in JSON format and then submitted).

4. Fix physical plan submission on Windows (DRILL-4640).

This closes #1014