drill

DRILL-7236: SqlLine 1.8 upgrade

closes #1804

edit docs

DRILL-7258: Remove field width limit for text reader

The V2 text reader enforced a limit of 64K characters when using

column headers, but not when using the columns[] array. The V3 reader

enforced the 64K limit in both cases.

This patch removes the limit in both cases. The limit now is the

16MB vector size limit. With headers, no one column can exceed 16MB.

With the columns[] array, no one row can exceed 16MB. (The 16MB

limit is set by the Netty memory allocator.)

Added an "appendBytes()" method to the scalar column writer which adds

additional bytes to those already written for a specific column or

array element value. The method is implemented for VarChar, Var16Char

and VarBinary vectors. It throws an exception for all other types.

When used with a type conversion shim, the appendBytes() method throws

an exception. This should be acceptable because the preceding setBytes() call

should already have failed: a huge value is not valid for numeric or date

type conversions.
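
The set-then-append behavior described above can be pictured with a minimal sketch. The class `VarWidthWriterSketch` and its methods are hypothetical stand-ins written for illustration; they are not Drill's column writer classes, and only model the idea that appendBytes() extends the value started by setBytes() for the same row.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a variable-width column writer.
// It is not Drill's writer; it only models set-then-append semantics.
final class VarWidthWriterSketch {
  private final List<byte[]> rows = new ArrayList<>();
  private final ByteArrayOutputStream current = new ByteArrayOutputStream();

  // Starts the value for the current row, replacing anything written so far.
  void setBytes(byte[] src, int len) {
    current.reset();
    current.write(src, 0, len);
  }

  // Adds more bytes to the value already written for the current row.
  void appendBytes(byte[] src, int len) {
    current.write(src, 0, len);
  }

  // Finishes the current row and makes its value readable.
  void saveRow() {
    rows.add(current.toByteArray());
    current.reset();
  }

  String getString(int row) {
    return new String(rows.get(row), StandardCharsets.UTF_8);
  }

  // Writes one row in two chunks and returns the stored value.
  static String demo() {
    VarWidthWriterSketch w = new VarWidthWriterSketch();
    byte[] a = "first chunk, ".getBytes(StandardCharsets.UTF_8);
    byte[] b = "second chunk".getBytes(StandardCharsets.UTF_8);
    w.setBytes(a, a.length);
    w.appendBytes(b, b.length);
    w.saveRow();
    return w.getString(0);
  }
}
```

A reader that receives a long field in pieces could call setBytes() for the first piece and appendBytes() for each later piece, which is presumably why the text reader no longer needs a fixed field width limit.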

Added unit tests of the append feature, and for the append feature in

the batch overflow case (when appending bytes causes the vector or

batch to overflow). Also added tests to verify the lack of column width

limit with the text reader, both with and without headers.

closes #1802

  1. … 10 more files in changeset.
DRILL-7279: Enable provided schema for text files without headers

* Allows a provided schema for text files without headers. The

provided schema columns replace the `columns` column that is

normally used.

* Allows customizing text format properties using table properties.

The table properties "override" properties set in the plugin config.

* Added unit tests for the newly supported use cases.

* Fixed bug in quote escape handling.
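
The "override" relationship between table properties and the plugin config can be sketched as a simple map overlay. This is an illustration only: the class name and the option keys (`fieldDelimiter`, `extractHeader`) are assumptions chosen for the example, not the property names Drill actually uses.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "table properties override plugin config".
final class TextFormatOptionsSketch {
  // Returns plugin-config values overlaid with any table-level properties.
  static Map<String, String> effectiveOptions(Map<String, String> pluginConfig,
                                              Map<String, String> tableProps) {
    Map<String, String> merged = new HashMap<>(pluginConfig);
    merged.putAll(tableProps);  // table properties win on key collisions
    return merged;
  }

  static Map<String, String> demo() {
    Map<String, String> plugin = new HashMap<>();
    plugin.put("fieldDelimiter", ",");    // hypothetical key names
    plugin.put("extractHeader", "true");
    Map<String, String> table = new HashMap<>();
    table.put("extractHeader", "false");  // overrides the plugin setting
    return effectiveOptions(plugin, table);
  }
}
```

In this sketch, options that the table does not mention keep their plugin-level values.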

closes #1798

DRILL-7261: Simplify Easy framework config for new scan

Most format plugins are created using the Easy format plugin. A recent

change added support for the "row set" scan framework. After converting

the text and log reader plugins, it became clear that the setup code

could be made simpler.

* Add the user name to the "file scan" framework.

* Pass the file system, split and user name to the batch reader via

the "schema negotiator" rather than via the constructor.

* Create the traditional "scan batch" scan or the new row-set scan via

functions instead of classes.

* Add Easy config option and method to choose the kind of scan

framework.

* Add Easy config options for some newer options such as whether the

plugin supports statistics.

Simplified reader creation

* The batch reader can be created just by overriding a method.

* A default error context is provided if the plugin does not provide

one.

Tested by running all unit tests for the CSV reader which is based on

the new framework, and by testing the converted log reader (that reader

is not part of this commit.)

closes #1796

Updates for new MapR-DB format plugin options; release notes update

DRILL-7276: Fixed an XSS vulnerability in Drill Web-UI query profile page

DRILL-7267: Add Slack Link in Documentation

DRILL-7278: Refactor result set loader projection mechanism

Drill 1.16 added an enhanced scan framework based on the row set

mechanisms, and a "provisioned schema" feature built on top

of that framework. Conversion of the log reader plugin to use

the framework identified additional features we wish to add,

such as marking a column as "special" (not expanded in a wildcard

query.)

This work identified that the code added for provisioned schemas in

Drill 1.16 works but is overly complex, making it hard to add

the desired new feature.

This patch refactors the "reader" projection code:

* Create a "projection set" mechanism that the reader can query to ask,

"the caller just added a column. Should it be projected or not?"

* Unifies the type conversion mechanism added as part of provisioned

schemas.

* Added the "special column" property for both "reader" and "provided"

schemas.

* Verified that provisioned schemas work with maps (at least on the scan

framework side.)

* Replaced the previous "schema transformer" mechanism with a new "type

conversion" mechanism that unifies type conversion, provided schemas

and an optional custom type conversion mechanism.

* Column writers can report if they are projected. Moved this query

from metadata to the column writer itself.

* Extended and clarified documentation of the feature.

* Revised and/or added unit tests.
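
The "projection set" idea in the list above, including the "special column" rule, might look roughly like the following sketch. The interface name, method signature and case-insensitive matching are assumptions made for illustration; Drill's actual classes differ.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; names do not match Drill's actual classes.
interface ProjectionSetSketch {
  // Answers: "the reader just added this column; should it be projected?"
  boolean isProjected(String columnName, boolean special);

  // Wildcard query: everything is projected except "special" columns.
  static ProjectionSetSketch wildcard() {
    return (name, special) -> !special;
  }

  // Explicit column list: only the named columns are projected.
  // Case-insensitive matching is an assumption of this sketch.
  static ProjectionSetSketch explicit(String... names) {
    Set<String> wanted = new HashSet<>();
    for (String n : names) {
      wanted.add(n.toLowerCase(Locale.ROOT));
    }
    return (name, special) -> wanted.contains(name.toLowerCase(Locale.ROOT));
  }
}
```

Under these assumptions a special column is hidden from a wildcard query but is still returned when the query names it explicitly.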

closes #1797

  1. … 58 more files in changeset.
DRILL-7257: Set nullable var-width vector lastSet value

The underlying bug is due to a subtle issue with variable-width nullable

vectors. Such vectors have a lastSet attribute in the Mutator class.

When using "transfer pairs" to copy values, the code somehow decides

to zero-fill from the lastSet value to the record count. The row set

framework did not set this value, meaning that the RemovingRecordBatch

zero-filled the dir0 column when it chose to use transfer pairs rather

than copying values. The use of transfer pairs occurs when all rows in

a batch pass the filter prior to the removing record batch.

Modified the nullable vector writer to properly set the lastSet value at

the end of each batch. Added a unit test to verify the value is set

correctly.
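
To see why a stale lastSet value wipes out data, consider this simplified model of a variable-width vector's offsets. The class `OffsetsSketch` is hypothetical and is not Drill's vector or Mutator code; it only assumes that "filling empties" repeats the previous end offset for every position after lastSet.

```java
// Simplified model of a variable-width vector's offsets; not Drill's code.
final class OffsetsSketch {
  final int[] offsets;   // value i occupies bytes offsets[i] .. offsets[i + 1]
  int lastSet = -1;      // index of the last value known to be written

  OffsetsSketch(int[] offsets) {
    this.offsets = offsets.clone();
  }

  // Marks every value after lastSet as empty by repeating the previous
  // end offset, then records that all positions are now accounted for.
  void fillEmpties(int valueCount) {
    for (int i = lastSet + 1; i < valueCount; i++) {
      offsets[i + 1] = offsets[i];
    }
    lastSet = valueCount - 1;
  }

  int length(int index) {
    return offsets[index + 1] - offsets[index];
  }

  // Builds three 2-byte values, applies the given lastSet, fills empties,
  // and returns the resulting value lengths.
  static int[] lengthsAfterFill(int lastSet) {
    OffsetsSketch v = new OffsetsSketch(new int[] {0, 2, 4, 6});
    v.lastSet = lastSet;
    v.fillEmpties(3);
    return new int[] {v.length(0), v.length(1), v.length(2)};
  }
}
```

In this model, leaving lastSet at -1 turns all three values into empty values, while setting it to 2 (the last row written) leaves them untouched.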

Includes a bit of code clean-up.

DRILL-7181: Improve V3 text reader (row set) error messages

Adds an error context to the User Error mechanism. The context allows

information to be passed through an intermediate layer and applied when

errors are raised in lower-level code, without the need for that

low-level code to know the details of the error context information.

Modifies the scan framework and V3 text plugin to use the framework to

improve error messages.
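
The error context idea can be sketched as follows. All names here (`ErrorContextSketch`, `FileErrorContext`, `LowLevelParser`) are hypothetical and chosen for illustration; they are not the types added by this commit.

```java
// Hypothetical sketch of the idea; names are illustrative, not Drill's API.
interface ErrorContextSketch {
  // Appends whatever the upper layer knows about the current operation.
  void addContext(StringBuilder message);
}

// Context created by an upper layer that knows the plugin and file.
final class FileErrorContext implements ErrorContextSketch {
  private final String plugin;
  private final String file;

  FileErrorContext(String plugin, String file) {
    this.plugin = plugin;
    this.file = file;
  }

  @Override
  public void addContext(StringBuilder message) {
    message.append(" [plugin=").append(plugin)
           .append(", file=").append(file).append(']');
  }
}

// Low-level code: raises errors without knowing what the context holds.
final class LowLevelParser {
  private final ErrorContextSketch context;

  LowLevelParser(ErrorContextSketch context) {
    this.context = context;
  }

  void requireHeader(String line) {
    if (line.isEmpty()) {
      StringBuilder msg = new StringBuilder("Header line is empty");
      context.addContext(msg);
      throw new IllegalStateException(msg.toString());
    }
  }
}
```

The parser only holds a reference to the context, so new details can be added to error messages without changing the low-level code.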

Refines how the `columns` column can be used with the text reader. If

headers are used, then `columns` is just another column. An error is

raised, however, if `columns[x]` is used when headers are enabled.

Added another builder abstraction where a constructor argument list

became too long.

Added the drill file system and split to the file schema negotiator

to simplify reader construction.

Added additional unit tests to fully define the `columns` column

behavior.

  1. … 20 more files in changeset.
DRILL-7204: Add proper validation when creating plugin

- Added validation for an empty plugin name.

- Added URL encoding for the plugin name, so plugins with special characters can be accessed without issues.

- Replaced alerts with modal windows.

- Added a confirmation dialog when disabling a plugin on Update page.
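
As an illustration of why the plugin name needs encoding before it is placed in a URL, here is a small Java sketch. The actual change may well live in the Web UI's JavaScript; the class and method names below are hypothetical, and only `java.net.URLEncoder` is a real API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper; not part of Drill.
final class PluginNameEncoderSketch {
  // Encodes a plugin name for use as a single URL path segment.
  static String encodePathSegment(String pluginName) {
    try {
      // URLEncoder targets form encoding, so spaces come back as '+';
      // rewrite them as %20 for use inside a URL path. A literal '+'
      // in the input is already encoded as %2B, so this is safe.
      return URLEncoder.encode(pluginName, "UTF-8").replace("+", "%20");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException("UTF-8 is always supported", e);
    }
  }
}
```

Without encoding, a name containing `/` or a space would change the shape of the request path instead of identifying the plugin.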

DRILL-7250: Query with CTE fails when its name matches the name of a table the user cannot access

DRILL-7251: Read Hive array w/o nulls

1. HiveFieldConverter replaced by Hive writers for primitives

2. Created HiveValueWriterFactory and HiveListWriter to implement arrays support

4. Readers generation replaced by HiveDefaultRecordReader and HiveTextRecordReader

5. Few reader initializers replaced by one

6. Added method to repeated vardecimal writer

7. Minor fix for array column in View

  1. … 39 more files in changeset.
DRILL-7196: Queries are still runnable on disabled plugins

- Storage client is not created anymore for disabled plugins

- GET "/storage/{name}.json" endpoint now works with

the plugin configuration directly, without client instantiation.

This improves UI responsiveness.

- Hbase and mongo base test classes refactored to honor enabled

plugin attribute

- Fixed path constructor for mongo test datasets:

It is now cross-platform

- Fixed the format of test JSON files that use plugin definitions

- Code cleanup

    /common/src/test/resources/basic_physical.json (-25, +26)
    /common/src/test/resources/dsort-logical.json (-24, +24)
    /common/src/test/resources/jdbc_plan.json (-2, +2)
    /common/src/test/resources/simple_plan.json (-78, +81)
  1. … 92 more files in changeset.
DRILL-7242: Handle additional boundary cases and compute better estimates when popular values span multiple buckets.

Address review comments.

close apache/drill#1785

DRILL-7245: Cap NDV at row count after applying filters

closes #1786
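
The cap named in the commit title amounts to a one-line rule: the number of distinct values (NDV) in a column cannot exceed the number of rows that survive the filters. A minimal sketch, with a hypothetical class and method name that are not Drill's planner code:

```java
// Hypothetical sketch of the capping rule; not Drill's planner code.
final class NdvEstimateSketch {
  // A column cannot have more distinct values than there are rows left
  // after filtering, so the estimate is capped at the filtered row count.
  static double capNdv(double ndv, double filteredRowCount) {
    return Math.min(ndv, filteredRowCount);
  }
}
```

For example, an NDV estimate of 1000 with 50 rows remaining is reduced to 50, while an estimate of 10 is left unchanged.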

add setMaxRows method info to JDBC driver page

DRILL-7240: Catch runtime pruning filter-match exceptions and do not prune these rowgroups

closes #1783

DRILL-7237: Fix single_value aggregate function for variable length types

- Add implementations of single_value for complex data types

closes #1782

    /exec/java-exec/src/main/codegen/config.fmpp (-1, +0)
DRILL-4782 / DRILL-7139: Fix DATE_ADD and TO_TIME functions

- cast function for the day interval changed to round milliseconds to complete days

- ToDateTypeFunctions#toTime now returns milliseconds of day

- updated how DayInterval subtracts and adds, to follow the cast function logic

Unit test core updates:

- added vectorValue function to the queryBuilder to simplify retrieving a vector value

- refactored singleton query result functions in queryBuilder
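
"Milliseconds of day" can be illustrated with a short sketch. This is an assumption about what the term means here (the offset of an instant within its day, ignoring time zones); the class below is hypothetical and is not the code in ToDateTypeFunctions.

```java
// Hypothetical sketch of "milliseconds of day"; not Drill's implementation.
final class TimeOfDaySketch {
  static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

  // Returns the offset within the day for an epoch-millisecond value.
  // floorMod keeps the result non-negative for instants before 1970.
  static long millisOfDay(long epochMillis) {
    return Math.floorMod(epochMillis, MILLIS_PER_DAY);
  }
}
```

With this definition, one day plus one hour maps to 3,600,000 ms, and the instant one millisecond before the epoch maps to 86,399,999 ms.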

edit date

DRILL-7238: Fixed ConvertCountToDirectScan to handle non-existent columns

closes #1781

edit download page to include link to how to verify downloaded files

edit index page for meetup 2019

edit 2019 meetup blog

edit for meetup blog

add blog for meetup and link to meetup

    /blog/_posts/2019-05-02-drill-user-meetup.md (-0, +8)
update what's new page

edits