drill

Reorganize content for securing drill updates

    • -268
    • +0
    /_docs/configure-drill/076-configuring-user-impersonation-with-hive-authorization.md
    • -111
    • +0
    /_docs/configure-drill/078-configuring-web-ui-and-rest-api-security.md
Securing Drill-Kerberos Auth Feature Docs

    • -0
    • +9
    /_docs/configure-drill/031-securing-drill.md
    • -0
    • +29
    /_docs/configure-drill/securing-drill/030-roles-privileges.md
    • -0
    • +268
    /_docs/configure-drill/securing-drill/060-configuring-user-impersonation-with-hive-authorization.md
    • binary
    /_docs/img/kerberauthprocess.png
    • binary
    /_docs/img/kerberclientserver.png
    • binary
    /_docs/img/plainauthprocess.png
    • binary
    /_docs/img/securecommunicationpaths.png
remove DRILL-5270 from RN list

fixed link to CTTAS page from blog

updates to _data

Link updates to point to Drill 1.10

    • -4
    • +4
    /_docs/install/047-installing-drill-on-the-cluster.md
    • -2
    • +2
    /_docs/tutorials/050-analyzing-highly-dynamic-datasets.md
DRILL-5356: Refactor Parquet Record Reader

The Parquet reader is Drill's premier data source and has worked very well for many years. As with any piece of code, it has grown in complexity over that time and has become hard to understand and maintain.

In work on another project, we found that Parquet is accidentally creating "low density" batches: record batches with little actual data compared to the amount of memory allocated. We'd like to fix that.
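As an illustration of the idea (hypothetical code, not Drill's implementation; the class and method names are invented), batch "density" can be thought of as the ratio of actual data bytes to allocated bytes:

```java
// Hypothetical sketch, not Drill code: a record batch is "low density"
// when the bytes of real record data are small relative to the memory
// allocated for its value vectors.
public class BatchDensity {

    // Fraction of the allocated memory that holds actual record data.
    static double density(long dataBytes, long allocatedBytes) {
        return (double) dataBytes / allocatedBytes;
    }

    public static void main(String[] args) {
        // 4 KB of records sitting in a 256 KB allocation: about 1.6% dense.
        System.out.println(density(4 * 1024, 256 * 1024));
    }
}
```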

However, the current complexity of the reader code creates a barrier to making improvements: the code is so complex that it is often better to leave bugs unfixed than to risk spending large amounts of time struggling to make small changes.

This commit helps revitalize the Parquet reader. Functionality is identical to the code in master, but the code has been pulled apart into various classes, each of which focuses on one part of the task: building up a schema, keeping track of read state, a strategy for reading various combinations of records, and so on. The idea is that it is easier to understand several small, focused classes than one huge, complex class. Indeed, the idea of small, focused classes is common in the industry; it is nothing new.

Unit tests pass with the change. Since no logic has changed (we only moved lines of code), passing tests are a good indication that everything still works.

Also includes fixes based on review comments.

closes #789

    • -0
    • +20
    /exec/java-exec/src/test/resources/parquet/expected/bogus.csv
    • -0
    • +20
    /exec/java-exec/src/test/resources/parquet/expected/star.csv
Drill 1.10

    • -86
    • +86
    /_docs/configure-drill/079-configuring-drill-to-read-web-server-logs.md
update docs for 1.10

docs for the Drill 1.10 release

    • -0
    • +86
    /_docs/configure-drill/079-configuring-drill-to-read-web-server-logs.md
    • -0
    • +29
    /blog/_posts/2017-03-15-drill-1.10-released.md
DRILL-5318: Sub-operator test fixture

This commit depends on:

* DRILL-5323

This PR cannot be accepted (or built) until the above is pulled and this PR is rebased on top of it. The PR is issued now so that reviews can be done in parallel.

Provides the following:

* A new OperatorFixture to set up all the objects needed to test at the sub-operator level. This relies on the refactoring to create the required interfaces.
* Pulls the config builder code out of the cluster fixture builder so that configs can be built for sub-operator tests.
* Modifies the QueryBuilder test tool to run a query and get back one of the new row set objects, allowing direct inspection of data returned from a query.
* Modifies the cluster fixture to create a JDBC connection to the test cluster. (Use requires putting the Drill JDBC project on the test class path, since exec does not depend on JDBC.)

Created a common subclass for the cluster and operator fixtures to abstract out the allocator and config. Also provides temp directory support to the operator fixture.
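As a rough sketch of the fixture-builder pattern described above (the names FixtureBuilder, OperatorFixture, and the option key are hypothetical, not Drill's actual API):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a sub-operator test fixture: a builder collects
// config options, then the fixture hands them to the code under test and
// cleans up when closed. Names are illustrative, not Drill's API.
public class FixtureSketch {

    static final class OperatorFixture implements AutoCloseable {
        final Map<String, Object> config;
        OperatorFixture(Map<String, Object> config) { this.config = config; }
        Object option(String key) { return config.get(key); }
        @Override public void close() { /* release allocator, temp dirs, etc. */ }
    }

    static final class FixtureBuilder {
        private final Map<String, Object> config = new HashMap<>();
        FixtureBuilder configProperty(String key, Object value) {
            config.put(key, value);
            return this;
        }
        OperatorFixture build() { return new OperatorFixture(config); }
    }

    public static void main(String[] args) throws Exception {
        try (OperatorFixture fixture = new FixtureBuilder()
                .configProperty("drill.memory.max", 512L)  // hypothetical key
                .build()) {
            System.out.println(fixture.option("drill.memory.max"));
        }
    }
}
```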

Merged with DRILL-5415 (Improve Fixture Builder to configure client properties).

Moved row set tests here from DRILL-5323 so that DRILL-5323 is self-contained. (The tests depend on the fixtures defined here.)

Added comments where needed.

Puts code back as it was prior to a code review comment. The code is redundant, but necessarily so due to code that is specific to several primitive types.

closes #788

DRILL-5323: Test tools for row sets

Provide test tools to create, populate, and compare row sets.

To simplify tests, we need a TestRowSet concept that wraps a VectorContainer and provides easy ways to:

- Define a schema for the row set.
- Create a set of vectors that implement the schema.
- Populate the row set with test data via code.
- Add an SV2 to the row set.
- Pass the row set to operator components (such as generated code blocks).
- Examine the contents of a row set.
- Compare the results of the operation with an expected result set.
- Dispose of the underlying direct memory when work is done.
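The tools above can be sketched in miniature, with entirely hypothetical names (not Drill's actual classes), as a builder that assembles a row set in code and compares it to an expected result:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the "row set" test-tool idea: define a schema,
// populate rows via code, then compare an actual row set against an
// expected one. Not Drill's real vector-backed implementation.
public class RowSetSketch {

    static final class RowSet {
        final List<String> schema;
        final List<Object[]> rows = new ArrayList<>();

        RowSet(String... cols) { schema = List.of(cols); }

        RowSet addRow(Object... values) { rows.add(values); return this; }

        // True when both schema and every row match, in order.
        boolean matches(RowSet other) {
            if (!schema.equals(other.schema) || rows.size() != other.rows.size()) {
                return false;
            }
            for (int i = 0; i < rows.size(); i++) {
                if (!Arrays.equals(rows.get(i), other.rows.get(i))) return false;
            }
            return true;
        }
    }

    public static void main(String[] args) {
        RowSet actual = new RowSet("a", "b").addRow(10, "x").addRow(20, "y");
        RowSet expected = new RowSet("a", "b").addRow(10, "x").addRow(20, "y");
        System.out.println(actual.matches(expected));
    }
}
```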

This code builds on that in DRILL-5324 to provide a complete row set API. See DRILL-5318 for the spec.

Note: this code can be reviewed as-is, but cannot be committed until after DRILL-5324 is committed: this code has compile-time dependencies on that code. This PR will be rebased once DRILL-5324 is pulled into master.

Handles maps and intervals.

The row set schema is refined to provide two forms of schema. A physical schema shows the nested structure of the data, with maps expanding into their contents.

Updates the row set schema builder to easily build a schema with maps.

An access schema shows the row “flattened” to include just scalar (non-map) columns, with all columns at a single level and dotted names identifying nested fields. This form makes for very simple access.

Then provides tools for reading and writing batches with maps by presenting the flattened view to the row reader and writer.

HyperVectors have a very complex structure for maps. The hyper row set implementation takes a first crack at mapping that structure into the standardized row set format.

Also provides a handy way to set an INTERVAL column from an int. There is no good mapping from an int to an interval, so an arbitrary convention is used. This convention is not generally useful, but it is very handy for quickly generating test data.

As before, this is a partial PR. The code here still depends on DRILL-5324 to provide the column accessors needed by the row reader and writer.

All this code is getting rather complex, so this commit includes a unit test of the schema and row set code.

Revisions to support arrays

Arrays require a somewhat different API. Refactored to allow arrays to appear as a field type.

While refactoring, moved interfaces to more logical locations. Added more comments.

Rejiggered the row set schema to provide both a physical and a flattened (access) schema, both driven from the original batch schema.

Pushed some accessor and writer classes into the accessor layer. Added tests for arrays. Also added more comments where needed.

Moved tests to DRILL-5318

The test classes previously here depend on the new “operator fixture”. To provide a non-cyclic check-in order, the tests moved to the PR with the fixtures so that this PR is clear of dependencies. The tests were reviewed in the context of DRILL-5318.

Also pulls in batch sizer support for map fields, which is required by the tests.

closes #785

Doc updates for Drill 1.10

    • -0
    • +307
    /_docs/rn/001-1.10.0-rn.md
DRILL-5355: Misc. code cleanup closes #784

    • … 9 more files in changeset.
Update version to 1.11.0-SNAPSHOT

    • -1
    • +1
    /contrib/data/tpch-sample-data/pom.xml
    • … 13 more files in changeset.
add CTTAS doc for Drill 1.10 release

DRILL-5352: Profile parser printing for multi fragments

Enhances the recently added ProfileParser to display run times for queries that contain multiple fragments. (The original version handled just a single fragment.)

Prints the query in “classic” mode if it is linear, or in the new semi-indented mode if the query forms a tree.

Also cleans up formatting, removing spaces between parens.

Fixes from review:

* Fixed process time percent.
* Added support for getting operator profiles in a multi-fragment query.

close apache/drill#782

post content for AD 1.10 release

    • binary
    /_docs/img/jdbc_connection_tries.png
    • binary
    /_docs/img/multiple_drill_versions.jpg
    • -0
    • +26
    /_docs/install/070-identifying-multiple-drill-versions-in-a-cluster.md
Test-specific column accessor implementation. Provides a simplified, unified set of access methods for value vectors, specifically for writing simple, compact unit test code.

* Interfaces for column readers and writers
* Interfaces for tuple (row and map) readers and writers
* Generated implementations
* Base implementation used by the generated code
* Factory class to create the proper reader or writer given a major type (type and cardinality)
* Utilities for generic access, type conversions, etc.

Many vector types can be mapped to an int for get and set. One key exception is the decimal types: decimals, by definition, require a different representation. In Java, that is `BigDecimal`. Added get, set, and setSafe accessors as required for each decimal type that uses `BigDecimal` to hold data.
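The int-versus-`BigDecimal` distinction can be sketched as follows (hypothetical names, not Drill's generated accessor classes):

```java
import java.math.BigDecimal;

// Hypothetical sketch of the accessor idea: most types fit an int-based
// get/set, but decimal types need BigDecimal to preserve precision and
// scale. Names are illustrative, not Drill's generated accessor API.
public class AccessorSketch {

    interface ScalarWriter {
        void setInt(int value);
        void setDecimal(BigDecimal value);
    }

    static final class DecimalColumn implements ScalarWriter {
        BigDecimal value;

        // An int can be widened into a decimal without loss...
        @Override public void setInt(int v) { value = BigDecimal.valueOf(v); }

        // ...but a decimal must be stored as BigDecimal, not squeezed into an int.
        @Override public void setDecimal(BigDecimal v) { value = v; }
    }

    public static void main(String[] args) {
        DecimalColumn col = new DecimalColumn();
        col.setDecimal(new BigDecimal("123.45"));
        System.out.println(col.value);  // scale is preserved
    }
}
```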

The generated code builds on the `valueVectorTypes.tdd` file, adding additional properties needed to generate the accessors.

The PR also includes a number of code cleanups done while reviewing existing code. In particular, `DecimalUtility` was very roughly formatted and thus hard to follow.

Supports Drill’s interval types (INTERVAL, INTERVALDAY, INTERVALYEAR) in the form of the Joda interval class.

Adds support for Map vectors. Maps are treated as nested tuples and are expanded out to create a flattened row in the schema. The accessors then access rows using the flattened column index or the combined name (“a.b”).
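As a hedged sketch of the flattened-name scheme (hypothetical code, not Drill's accessor layer), nested maps can be expanded into dotted column names:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: expand nested maps into a flat row whose column
// names use dotted paths, e.g. {"a": {"b": 1}} becomes {"a.b": 1}.
public class FlattenSketch {

    static Map<String, Object> flatten(String prefix, Map<String, Object> row) {
        Map<String, Object> flat = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : row.entrySet()) {
            String name = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            if (e.getValue() instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> nested = (Map<String, Object>) e.getValue();
                flat.putAll(flatten(name, nested));  // recurse into the map
            } else {
                flat.put(name, e.getValue());        // scalar: keep as-is
            }
        }
        return flat;
    }

    public static void main(String[] args) {
        Map<String, Object> nested = new LinkedHashMap<>();
        nested.put("b", 1);
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("a", nested);
        row.put("c", 2);
        System.out.println(flatten("", row));  // {a.b=1, c=2}
    }
}
```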

Supports arrays via a writer interface that appends values as written, and an indexed, random-access reader interface.

Removed the HTTP log parser from the JDBC jar to keep the JDBC jar from getting too big.

close apache/drill#783

    • -0
    • +331
    /exec/vector/src/main/codegen/templates/ColumnAccessors.java
    • … 21 more files in changeset.
DRILL-5344: External sort priority queue copier fails with an empty batch

Unit tests showed that the “priority queue copier” does not handle an empty batch. This has not been an issue because code elsewhere in the sort specifically works around it. This fix resolves the issue at the source to avoid the need for future work-arounds.
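A minimal sketch of the idea behind the fix (hypothetical code, not the sort's actual copier): handle the empty batch inside the copy routine itself, rather than making every caller work around it:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a copy step that handles an empty input batch
// itself, so callers no longer need a special-case work-around.
public class CopierSketch {

    // Copies up to `limit` records from the batch; an empty batch simply
    // yields an empty output instead of failing.
    static List<Integer> copy(List<Integer> batch, int limit) {
        List<Integer> out = new ArrayList<>();
        if (batch.isEmpty()) {
            return out;  // resolve the empty case at the source
        }
        for (int i = 0; i < Math.min(limit, batch.size()); i++) {
            out.add(batch.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(copy(List.of(), 10));        // []
        System.out.println(copy(List.of(3, 1, 2), 2));  // [3, 1]
    }
}
```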

closes #778

DRILL-5349: Fix TestParquetWriter unit tests when synchronous parquet reader is used.

close apache/drill#780

DRILL-5330: NPE in FunctionImplementationRegistry

Fixes:

* DRILL-5330: NPE in FunctionImplementationRegistry.functionReplacement()
* DRILL-5331: NPE in FunctionImplementationRegistry.findDrillFunction() if dynamic UDFs are disabled

When running in a unit test, the dynamic UDF (DUDF) mechanism is not available. When running in production, the DUDF mechanism is available, but may be disabled.

One confusing aspect of this code is that the function registry is given the option manager, but the option manager is not yet valid (not yet initialized) in the function registry constructor. So, we cannot access the option manager in the function registry constructor.

In any event, the existing system options cannot be used to disable DUDF support. For obscure reasons, DUDF support is always enabled, even when disabled by the user. Instead, for DRILL-5331, we added a config option to "really" disable DUDFs. The property is set only for tests and disables DUDF support.

Note that, in the future, this option could be generalized to "off, read-only, on" to capture the full set of DUDF modes. But, for now, just turning it off is sufficient.

For DRILL-5330, we use an existing option validator rather than accessing the raw option directly.

Also includes a bit of code cleanup in the class in question. The result is that the code now works when used in a sub-operator unit test.

close apache/drill#777

[maven-release-plugin] prepare release drill-1.10.0

    • -1
    • +1
    /contrib/data/tpch-sample-data/pom.xml
    • … 13 more files in changeset.
DRILL-5165: For limit all case, no need to push down limit to scan

DRILL-5326: Unit test failures related to the SERVER_METADATA

- adding the sql type name for the "GENERIC_OBJECT";
- changing "NullCollation" in the "ServerMetaProvider" to the correct default value;
- changing RpcType to GET_SERVER_META in the appropriate ServerMethod

close #775

DRILL-5316: Check drillbits size before we attempt to access the vector element

close apache/drill#772

DRILL-5315: Address small typo in the comment in drillClient.hpp closes #771

DRILL-4335: Apache Drill should support network encryption.

NOTE: This pull request provides support for on-wire encryption using the SASL framework. The communication channel covered is:

1) The C++ Drill Client and Drillbit channel.

close apache/drill#809

    • -3
    • +68
    /contrib/native/client/src/protobuf/User.pb.h
Add Arina to team list

DRILL-4678: Tune metadata by generating a dispatcher at runtime

Main code changes are in the Calcite library. Updates Drill's Calcite version to 1.4.0-drill-r20.

close #793

    • … 25 more files in changeset.