DRILL-5723: Added System Internal Options That can be Modified at Runtime

Changes include:

1. Addition of internal options.

2. Refactoring of OptionManagers and OptionValidators.

3. Fixed ambiguity in the meaning of an option type, and renamed it to accessibleScopes.

4. Updated javadocs in the Option System classes.

5. Added RestClientFixture for testing the REST API.

6. Fixed a flaky test in TestExceptionInjection caused by a race condition.

7. Fixed various tests which started ZooKeeper but failed to shut it down at the end.

8. Added port hunting to the Drill web server for testing.

9. Fixed various flaky tests.

10. Fixed a compile issue.

closes #923
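A minimal sketch of the accessibleScopes idea from item 3 (all names below are illustrative stand-ins, not Drill's actual option classes):

    import java.util.EnumSet;

    // Sketch: each option declares the scopes at which it may be set; the
    // option manager rejects attempts to set it anywhere else.
    class AccessibleScopesSketch {
      enum OptionScope { INTERNAL, SYSTEM, SESSION, QUERY }

      static class OptionDefinition {
        final String name;
        final EnumSet<OptionScope> accessibleScopes;

        OptionDefinition(String name, EnumSet<OptionScope> accessibleScopes) {
          this.name = name;
          this.accessibleScopes = accessibleScopes;
        }

        void checkSet(OptionScope requested) {
          if (!accessibleScopes.contains(requested)) {
            throw new IllegalArgumentException(
                "Option " + name + " cannot be set at scope " + requested);
          }
        }
      }

      public static void main(String[] args) {
        // A hypothetical internal option, settable only at SYSTEM scope.
        OptionDefinition opt = new OptionDefinition(
            "internal.sample.option", EnumSet.of(OptionScope.SYSTEM));
        opt.checkSet(OptionScope.SYSTEM);   // allowed
        opt.checkSet(OptionScope.SESSION);  // throws IllegalArgumentException
      }
    }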

… 85 more files in changeset.
Update version to 1.12.0-SNAPSHOT

… 27 more files in changeset.
[maven-release-plugin] prepare release drill-1.11.0

… 27 more files in changeset.
DRILL-5431: SSL Support (Java) - Java client/server SSL implementation

Also enables OpenSSL support.

Also fixes exclusions and the java-exec POM file to eliminate netty-tcnative as a transitive dependency of all projects.
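For reference, a minimal sketch of the client side of such a channel using plain JSSE (generic Java, not Drill's connection code; the host is illustrative, 31010 is Drill's default user port):

    import javax.net.ssl.SSLSocket;
    import javax.net.ssl.SSLSocketFactory;

    public class TlsClientSketch {
      public static void main(String[] args) throws Exception {
        // Trust material comes from the JRE's default cacerts or
        // -Djavax.net.ssl.trustStore=...; a client that only verifies the
        // server needs no key material of its own.
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (SSLSocket socket =
                 (SSLSocket) factory.createSocket("drillbit.example.com", 31010)) {
          socket.startHandshake();  // force the TLS handshake before sending bytes
          System.out.println("Negotiated " + socket.getSession().getProtocol());
        }
      }
    }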

… 30 more files in changeset.
DRILL-5560: Create configuration file for distribution-specific configuration

This closes #848

… 1 more file in changeset.
DRILL-5546: Handle schema change exception failure caused by empty input or empty batch.

1. Modify ScanBatch's logic when it iterates its list of RecordReaders:

1) Skip a RecordReader if it returns 0 rows and presents the same schema. A new schema (detected by calling Mutator.isNewSchema()) means that a new top-level field was added, a field in a nested field was added, or an existing field's type was changed.

2) Implicit columns are presumed to have a constant schema and are added to the outgoing container before any regular column is added.

3) ScanBatch returns NONE directly (a "fast NONE") if all of its RecordReaders have empty input and are thus skipped, instead of returning OK_NEW_SCHEMA first (see the sketch after this entry).

2. Modify IteratorValidatorBatchIterator to allow:

1) a fast NONE (before seeing an OK_NEW_SCHEMA);

2) a batch with an empty list of columns.

3. Modify JsonRecordReader when it gets 0 rows: do not insert a nullable-int column for 0-row input. Together with the ScanBatch change, Drill will skip empty JSON files.

4. Modify binary operators such as join and union to handle a fast NONE from either one side or both sides. Abstract the logic into AbstractBinaryRecordBatch, except for MergeJoin, as its implementation is quite different from the others.

5. Fix and refactor the union all operator:

1) Correct the union operator's handling of 0 input rows. Previously it would ignore inputs with 0 rows and put nullable-int into the output schema, which caused various schema change issues in downstream operators. The new behavior is to take 0-row schemas into account when determining the output schema, in the same way as inputs with > 0 rows. By doing that, we ensure the union operator does not behave like a schema-lossy operator.

2) Add a UnionInputIterator to simplify the logic of iterating over the left/right inputs, removing a significant chunk of duplicated code from the previous implementation.

The new union all operator reduces the code size by half compared to the old one.

6. Introduce UntypedNullVector to handle the convertFromJSON() function when the input batch contains 0 rows.

Problem: convertFromJSON() differs from regular functions in that it only knows its output schema after evaluation is performed. When the input has 0 rows, Drill essentially has no way to know the output type, and previously assumed Map type. That worked under the assumption that other operators like union would ignore batches with 0 rows, which is no longer the case in the current implementation.

Solution: Use MinorType.NULL as the output type for convertFromJSON() when the input contains 0 rows. The new UntypedNullVector is used to represent a column of MinorType.NULL.

7. HBaseGroupScan converts a star column into the list of row_key and column families. HBaseRecordReader rejects a star column, since it expects the star to have been converted elsewhere.

In HBase, a column family always has map type and a non-rowkey column always has nullable varbinary type. This ensures that HBaseRecordReaders across different HBase regions have the same top-level schema, even if a region is empty or all of its rows are pruned by filter pushdown optimization. In other words, we will not see different top-level schemas from different HBaseRecordReaders for the same table.

However, this change cannot handle a hard schema change: c1 exists in cf1 in one region but not in another. Further work is required to handle hard schema changes.

8. Modify scan cost estimation when the query involves a star column. This removes planning randomness, since previously two different operators could have the same cost.

9. Add a new flag 'outputProj' to the Project operator to indicate whether the Project produces the query's final output. Such a Project is added by TopProjectVisitor to handle a fast NONE when all inputs to the query are empty and are skipped:

1) a star column is replaced with an empty list;

2) a regular column reference is replaced with a nullable-int column;

3) an expression goes through ExpressionTreeMaterializer, and the type of the materialized expression is used as the output type;

4) an OK_NEW_SCHEMA is returned with the schema built by the above logic, then a NONE is returned to the downstream operator.

10. Add unit tests for operators handling empty input.

11. Add unit tests for queries whose inputs are all empty.

DRILL-5546: Revise code based on review comments.

Handle implicit columns in ScanBatch; change the interface of ScanBatch's constructor:

1) Ensure that either the implicit column list is empty, or all readers have the same set of implicit columns.

2) Skip the implicit columns when checking whether a schema change is coming from the record reader.

3) ScanBatch accepts a list instead of an iterator, since we may need to go through the implicit column list multiple times and verify that the sizes of the two lists are the same.

Address ScanBatch code review comments. Add more unit tests.

Share the code path in ProjectBatch that handles normal setupNewSchema() and handleNullInput().

- Move SimpleRecordBatch out of TopNBatch to make it sharable across different places.

- Add a unit test verifying the schema for star-column queries against multilevel tables.

Unit test framework changes:

- Fix a memory leak in the unit test framework.

- Allow SchemaTestBuilder to pass in a BatchSchema.

close #906
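A simplified sketch of the "fast NONE" scan loop from item 1 (method and type names are illustrative, not ScanBatch's actual code):

    import java.util.List;

    class FastNoneSketch {
      enum IterOutcome { OK_NEW_SCHEMA, NONE }

      interface RecordReader {
        int next();              // rows read into the current batch
        boolean isNewSchema();   // stand-in for Mutator.isNewSchema()
      }

      IterOutcome scan(List<RecordReader> readers) {
        for (RecordReader reader : readers) {
          int rows = reader.next();
          if (rows == 0 && !reader.isNewSchema()) {
            continue;                        // 0 rows, same schema: skip the reader
          }
          return IterOutcome.OK_NEW_SCHEMA;  // first non-empty batch announces its schema
        }
        return IterOutcome.NONE;             // every reader was empty: "fast NONE",
                                             // no OK_NEW_SCHEMA is ever emitted
      }
    }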

… 67 more files in changeset.
DRILL-5512: Standardize error handling in ScanBatch

Standardizes error handling to throw a UserException. Prior code threw various exceptions, called the fail() method, or returned a variety of status codes.

closes #838
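The standardized pattern looks roughly like this (a sketch assuming Drill's UserException builder API; the message and context strings are illustrative):

    import org.apache.drill.common.exceptions.UserException;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    class ScanErrorSketch {
      private static final Logger logger = LoggerFactory.getLogger(ScanErrorSketch.class);

      void nextBatch() {
        try {
          // ... advance the current reader ...
        } catch (Exception e) {
          // Instead of fail(), a raw exception, or a status code:
          throw UserException.dataReadError(e)
              .message("Error reading records from scan")
              .addContext("Reader", "example-reader")  // illustrative context
              .build(logger);                          // logs, then returns the UserException
        }
      }
    }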

… 5 more files in changeset.
DRILL-5325: Unit tests for the managed sort

Uses the sub-operator test framework (DRILL-5318), including the test row set abstraction (DRILL-5323), to enable unit testing of the “managed” external sort. This PR allows early review of the code, but cannot be pulled until the dependencies (mentioned above) are pulled.

Refactors the external sort code into small chunks that can be unit tested, then “wraps” that code in tests for all interesting data types, record batch sizes, and so on.

Refactors some of the operator definitions to more easily allow programmatic setup in the unit tests.

Fixes a number of bugs discovered by the unit tests. The biggest changes were in the new code: the code that computes spilling and merging based on memory levels.

Otherwise, although GitHub will show many files changed, most of the changes simply move blocks of code around to create smaller units that can be tested independently.

Includes a refactoring of the code that does spilling, along with a complete set of low-level unit tests.

Excludes long-running sort tests. Defines a test category for long-running tests, and makes a first attempt at providing a way to run such tests from Maven.

closes #808
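The category mechanism is plain JUnit 4; a sketch matching the added SecondaryTest.java marker interface (the test class and method are illustrative):

    import org.junit.Test;
    import org.junit.experimental.categories.Category;

    // The added file is just a marker interface with no members.
    interface SecondaryTest { }

    class ExternalSortStressTest {
      @Test
      @Category(SecondaryTest.class)  // long-running: excluded from the default run
      public void sortsVeryLargeInput() { /* ... */ }
    }

Maven Surefire can then exclude the category by default (via its excludedGroups parameter) and run it on demand.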

• +70 -0 ./src/main/java/org/apache/drill/test/SecondaryTest.java
… 50 more files in changeset.
DRILL-5419: Calculate return string length for literals & some string functions

1. Revisited the return-length calculation logic for string literals and some string functions (cast, upper, lower, initcap, reverse, concat, the concat operator, rpad, lpad, case statements, coalesce, first_value, last_value, lag, lead). Synchronized the return-type length calculation logic between limit-0 and regular queries.

2. Deprecated width and changed it to precision for string types in MajorType.

3. Revisited FunctionScope and split it into FunctionScope and ReturnType. FunctionScope now indicates only function usage in terms of the number of in/out rows (n -> 1, 1 -> 1, 1 -> n). The new UDF annotation ReturnType indicates which return-type strategy should be used.

4. Changed MAX_VARCHAR_LENGTH from 65536 to 65535.

5. Updated the calculation of precision and display size for INTERVALYEAR & INTERVALDAY.

6. Refactored part of the function code-gen logic (ValueReference, WorkspaceReference, FunctionAttributes, DrillFuncHolder).

This closes #819
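As a worked example of the length arithmetic: the output precision of concat is the sum of the input precisions, capped at the new maximum (a sketch; the cap matches item 4, the rest is illustrative):

    class VarcharPrecisionSketch {
      static final int MAX_VARCHAR_LENGTH = 65535;  // was 65536 before this change

      static int concatPrecision(int left, int right) {
        long sum = (long) left + right;             // widen before capping to avoid overflow
        return (int) Math.min(sum, MAX_VARCHAR_LENGTH);
      }

      public static void main(String[] args) {
        System.out.println(concatPrecision(40000, 40000));  // 65535, not 80000
      }
    }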

… 78 more files in changeset.
Update version to 1.11.0-SNAPSHOT

… 27 more files in changeset.
[maven-release-plugin] prepare release drill-1.10.0

… 27 more files in changeset.
DRILL-5326: Unit test failures related to the SERVER_METADATA

- adding the SQL type name for "GENERIC_OBJECT";

- changing "NullCollation" in the "ServerMetaProvider" to the correct default value;

- changing RpcType to GET_SERVER_META in the appropriate ServerMethod

close #775

… 3 more files in changeset.
DRILL-5260: Extend "Cluster Fixture" test framework

- Config option to suppress printing of CSV and other output. (Allows printing for single tests, not printing when running from Maven.)

- Parsing of query profiles to extract plan and run time information.

- Fix bug in log fixture when enabling logging for a package.

- Improved ZK support.

- Set up the new CTTAS default temporary workspace for tests.

- Clean up persistent storage files on disk to avoid CTTAS startup failures.

- Provides a set of examples for how to use the cluster fixture.

closes #753
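Typical usage looks roughly like this (a sketch assuming the org.apache.drill.test fixture API of this era; method names and the query are illustrative and may differ by version):

    import org.apache.drill.test.ClientFixture;
    import org.apache.drill.test.ClusterFixture;
    import org.apache.drill.test.FixtureBuilder;

    public class ClusterFixtureExampleSketch {
      public void runExample() throws Exception {
        FixtureBuilder builder = ClusterFixture.builder()
            .maxParallelization(1);               // per-test config override
        try (ClusterFixture cluster = builder.build();
             ClientFixture client = cluster.clientFixture()) {
          client.queryBuilder()
              .sql("SELECT * FROM cp.`employee.json` LIMIT 3")
              .printCsv();                        // the suppressible CSV printing
        }
      }
    }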

… 17 more files in changeset.
DRILL-4335: Apache Drill should support network encryption.

NOTE: This pull request provides support for on-wire encryption using the SASL framework. The communication channels covered are:

1) Between the Drill JDBC client and the Drillbit.

2) Between Drillbit and Drillbit, i.e. the control/data channels.

3) A UI change to show on which network channels encryption is enabled, and the number of encrypted/unencrypted connections for user/control/data channels.

close apache/drill#773
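The feature is switched on per channel in drill-override.conf; a sketch (the key names are assumed from the Drill security documentation for this feature, so verify them against your release):

    # Assumed configuration keys; illustrative only.
    drill.exec.security: {
      user.encryption.sasl.enabled: true,  # user (client) channel
      bit.encryption.sasl.enabled: true    # control/data channels between Drillbits
    }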

… 60 more files in changeset.
DRILL-4280: CORE (user to bit authentication, Java)

+ Add logic for authentication in UserClient and UserServer, with backward compatibility in both directions

+ Add abstract extensions to ServerConnection and ClientConnection

+ Add concrete extensions to the abstract connections: BitToUserConnection and UserToBitConnection

+ Add ConnectionConfig interface with abstract and concrete implementations to encapsulate configuration for server-side connections

+ Encapsulate all requests handled by UserServer in UserServerRequestHandler

+ Clear UserSession when a connection is closed, either by user or bit

+ Add DrillProperties to encapsulate all connection properties used during connection time
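From a client's point of view, the connection-time properties that DrillProperties encapsulates look like this (standard JDBC; host and credentials are illustrative):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.Properties;

    public class AuthenticatedConnectSketch {
      public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("user", "alice");       // carried in the auth handshake
        props.setProperty("password", "secret");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:drill:zk=zk.example.com:2181", props)) {
          System.out.println("connected: " + !conn.isClosed());
        }
      }
    }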

… 15 more files in changeset.
DRILL-4280: HYGIENE

+ Do not recreate DrillConfig object in PamUserAuthenticator

+ Add new factory method to CaseInsensitiveMap

+ Clean documentation

… 2 more files in changeset.
DRILL-4280: CORE (service login)

+ Support Drillbit login to KDC using Hadoop's UserGroupInformation library

+ Set hostname in BootstrapContext

+ Use process user's short name in ImpersonationUtil

+ Add KerberosUtil class
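A sketch of the keytab-based service login via Hadoop's UserGroupInformation (the principal and keytab path are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.security.UserGroupInformation;

    public class ServiceLoginSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        UserGroupInformation.loginUserFromKeytab(
            "drill/host.example.com@EXAMPLE.COM",  // service principal in the KDC
            "/etc/drill/conf/drill.keytab");       // keytab on the local disk
        // The short name ("drill") is what ImpersonationUtil would use.
        System.out.println(UserGroupInformation.getLoginUser().getShortUserName());
      }
    }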

• +93 -0 ./src/main/java/org/apache/drill/common/KerberosUtil.java
… 4 more files in changeset.
Update version to 1.10.0-SNAPSHOT

… 27 more files in changeset.
DRILL-5056: Fix UserException to write full message to log

A case occurred in which the External Sort failed during spilling. All that was written to the log was:

2016-11-18 ... INFO o.a.d.e.p.i.xsort.ExternalSortBatch - User Error Occurred

Modified the logging code to provide more information to aid in tracking down problems that occur in the field.

closes #665
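The gist of the fix, sketched with plain SLF4J (the actual call sites are in Drill's error-handling paths):

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    class VerboseErrorLogSketch {
      private static final Logger logger = LoggerFactory.getLogger(VerboseErrorLogSketch.class);

      void report(Exception e) {
        // Before: logger.info("User Error Occurred");           // marker line only
        logger.error("User Error Occurred: " + e.getMessage(), e);  // message, cause, stack trace
      }
    }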

[maven-release-plugin] prepare release drill-1.9.0

… 27 more files in changeset.
DRILL-4994: Add back JDBC prepared statement for older servers

When the JDBC client is connected to an older Drill server, it always attempted to use server-side prepared statements, with no fallback.

With this change, the client checks the server version and falls back to the previous client-side prepared statement (which is still limited to only executing queries and does not provide metadata).

close #613
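The fallback decision reduces to a version comparison, as in this sketch (the cutoff value and names are assumed for illustration; the real comparison lives in the added Version class):

    class PrepareFallbackSketch {
      // Assumed-for-illustration: first server version with server-side prepare.
      static final int MIN_MAJOR = 1, MIN_MINOR = 8, MIN_PATCH = 0;

      static boolean supportsServerPrepare(int major, int minor, int patch) {
        if (major != MIN_MAJOR) return major > MIN_MAJOR;
        if (minor != MIN_MINOR) return minor > MIN_MINOR;
        return patch >= MIN_PATCH;
      }

      static String prepareMode(int major, int minor, int patch) {
        return supportsServerPrepare(major, minor, patch)
            ? "server-side prepared statement (with metadata)"
            : "client-side prepared statement (execute-only, no metadata)";
      }
    }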

• +157 -0 ./src/main/java/org/apache/drill/common/Version.java
• +103 -0 ./src/test/java/org/apache/drill/common/TestVersion.java
… 22 more files in changeset.
DRILL-1950: Parquet rowgroup level filter pushdown in query planning time.

Implements Parquet rowgroup-level filter pushdown. The pushdown is performed in Drill's physical planning phase.

Only a local filter, which refers to columns in a single table, qualifies for pushdown. A filter may qualify if it is a simple comparison filter, or a compound "and/or" filter consisting of simple comparison filters. Data types allowed in comparison filters are int, bigint, float, double, date, timestamp, and time. Comparison operators are =, !=, <, <=, >, >=. Operands must be a column of one of the above data types, an explicit or implicit cast function, or a constant expression.

This closes #637
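The rowgroup-level decision compares filter constants against each rowgroup's min/max statistics; a simplified sketch for a single "col > constant" predicate on longs (illustrative, not Drill's planner code):

    class RowGroupPruneSketch {
      static class ColumnStats {
        final long min, max;  // per-rowgroup statistics from the Parquet footer
        ColumnStats(long min, long max) { this.min = min; this.max = max; }
      }

      // true = no row in this rowgroup can satisfy "col > constant",
      // so the planner drops the rowgroup from the scan.
      static boolean pruneGreaterThan(ColumnStats stats, long constant) {
        return stats.max <= constant;
      }

      public static void main(String[] args) {
        System.out.println(pruneGreaterThan(new ColumnStats(0, 100), 500));   // true: prune
        System.out.println(pruneGreaterThan(new ColumnStats(0, 1000), 500));  // false: scan
      }
    }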

… 43 more files in changeset.
DRILL-4969: Basic implementation of display size for column metadata

Add a simple implementation of display size, based on the ODBC table available at: https://msdn.microsoft.com/en-us/library/ms713974(v=vs.85).aspx

closes #632
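A sketch of what such a mapping looks like for a few types (the values follow the linked ODBC table as commonly quoted; treat them as illustrative here):

    class DisplaySizeSketch {
      static int displaySize(String type, int precision) {
        switch (type) {
          case "INTEGER": return 11;         // sign + up to 10 digits
          case "BIGINT":  return 20;         // sign + up to 19 digits
          case "DATE":    return 10;         // yyyy-mm-dd
          case "VARCHAR": return precision;  // declared column length
          default:        return 0;          // unknown/unsupported in this sketch
        }
      }
    }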

… 5 more files in changeset.
DRILL-4369: Exchange name and version infos during handshake

There's no name and version exchanged between client and server over the User RPC channel.

On the client side, having access to the server name and version is useful to expose it to the user (through JDBC or ODBC APIs like DatabaseMetaData#getDatabaseProductVersion()), or to implement a fallback strategy when some recent APIs are not available (like the metadata API).

On the server side, having access to the client version might be useful for audit purposes, and eventually to implement a fallback strategy if it doesn't require an RPC version change.

this closes #622
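On the client side, the exchanged info surfaces through standard JDBC metadata, roughly as in this sketch (the URL is illustrative):

    import java.sql.Connection;
    import java.sql.DatabaseMetaData;
    import java.sql.DriverManager;

    public class ServerVersionSketch {
      public static void main(String[] args) throws Exception {
        try (Connection conn =
                 DriverManager.getConnection("jdbc:drill:drillbit=localhost")) {
          DatabaseMetaData md = conn.getMetaData();
          System.out.println(md.getDatabaseProductName() + " "
              + md.getDatabaseProductVersion());  // populated from the handshake exchange
        }
      }
    }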

… 21 more files in changeset.
DRILL-4945: Report INTERVAL exact type as column type name

closes #618

… 1 more file in changeset.
DRILL-4800: Various fixes:

- Fix buffer underflow exception in BufferedDirectBufInputStream.

- Fix writer index for int64 dictionary-encoded types.

- Added logging to help debugging.

- Fix memory leaks.

- Work around issues with InputStream.available() (do not use hasRemainder; remove the check for EOF in BufferedDirectBufInputStream.read()).

- Finalize defaults.

- Remove commented-out code.

- Addressed review comments.

This closes #611
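The available() workaround boils down to the classic read-until-full loop, sketched here in generic Java:

    import java.io.IOException;
    import java.io.InputStream;

    class ReadFullySketch {
      // available() may return 0 even when bytes remain, so never use it to
      // detect EOF; loop on read() until the buffer fills or read() returns -1.
      static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
          int n = in.read(buf, total, buf.length - total);
          if (n < 0) break;  // true EOF
          total += n;
        }
        return total;        // may be short only at end of stream
      }
    }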

… 11 more files in changeset.
Update version to 1.9.0-SNAPSHOT.

… 26 more files in changeset.
DRILL-4862: Fix binary_string to use an injected buffer as out buffer

+ Previously, binary_string used the input buffer as the output buffer, so after calling binary_string the original content was destroyed, and other expressions/functions that needed to access the original input buffer got wrong results.

+ This fix also sets readerIndex and writerIndex correctly on the output buffer; otherwise the consumer of the output buffer will hit issues.

closes #604
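Sketched with Netty's ByteBuf API, the two halves of the fix look like this (the actual decoding step is elided):

    import io.netty.buffer.ByteBuf;

    class InjectedOutputBufferSketch {
      static ByteBuf copyOut(ByteBuf in, ByteBuf out) {
        int len = in.readableBytes();
        out.clear();                                 // reset both indices on the injected buffer
        out.writeBytes(in, in.readerIndex(), len);   // input buffer is left untouched
        // Now out.readerIndex() == 0 and out.writerIndex() == len, so the
        // consumer sees exactly the produced bytes.
        return out;
      }
    }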

… 3 more files in changeset.
[maven-release-plugin] prepare release drill-1.8.0

… 26 more files in changeset.
DRILL-4800: Parallelize column reading:

- Read/decode fixed-width fields in parallel.

- Decode variable-length columns in parallel.

- Use a simplified decompress method for Gzip and Snappy decompression; this avoids a concurrency issue with Parquet decompression (it's also faster).

- Stress-test Parquet read/write.

- The parallel column reader is disabled by default (it may perform less well under higher concurrency).
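A sketch of the per-column parallelism (generic executor usage; the real reader's task types are Drill-specific):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    class ParallelColumnDecodeSketch {
      static void decodeAll(List<Runnable> columnDecoders) throws InterruptedException {
        ExecutorService pool =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
          List<Callable<Void>> tasks = new ArrayList<>();
          for (Runnable decoder : columnDecoders) {
            tasks.add(() -> { decoder.run(); return null; });  // one task per column
          }
          pool.invokeAll(tasks);  // blocks until every column batch is decoded
        } finally {
          pool.shutdown();
        }
      }
    }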

… 10 more files in changeset.