Clone Tools
  • last updated a few minutes ago
Constraints
Constraints: committers
 
Constraints: files
Constraints: dates
DRILL-6422: Replace guava imports with shaded ones

  1. … 984 more files in changeset.
DRILL-6438: Remove excess logging form the tests. - Removed usages of System.out and System.err from the test and replaced with loggers

closes #1284

  1. … 90 more files in changeset.
DRILL-6422: Update guava to 23.0 and shade it

- Fix compilation errors for new version of Guava.

- Remove usage of deprecated API

- Shade guava and add dependencies to the shaded version

- Ban unshaded package

- Introduce drill-shaded module and move guava-shaded under it

- Add methods to convert shaded guava lists to the unshaded ones

- Add instruction for publishing artifacts to the Apache repository

  1. … 82 more files in changeset.
DRILL-5730: Mock testing improvements and interface improvements

closes #1045

  1. … 223 more files in changeset.
DRILL-5783, DRILL-5841, DRILL-5894: Rationalize test temp directories

This change includes:

DRILL-5783:

- A unit test is created for the priority queue in the TopN operator.

- The code generation classes passed around a completely unused function registry reference in some places so it is removed.

- The priority queue had unused parameters for some of its methods so it is removed.

DRILL-5841:

- Created standardized temp directory classes DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher. And updated all unit tests to use them.

DRILL-5894:

- Removed the dfs_test storage plugin for tests and replaced it with the already existing dfs storage plugin.

Misc:

- General code cleanup.

- Removed unnecessary use of String.format in the tests.

This closes #984

  1. … 365 more files in changeset.
DRILL-5752 this change includes:

1. Increased test parallelism and fixed associated bugs

2. Added test categories and categorized tests appropriately

- Don't exclude anything by default

- Increase test timeout

- Fixed flakey test

closes #940

  1. … 265 more files in changeset.
DRILL-5546: Handle schema change exception failure caused by empty input or empty batch.

1. Modify ScanBatch's logic when it iterates list of RecordReader.

1) Skip RecordReader if it returns 0 row && present same schema. A new schema (by calling Mutator.isNewSchema() ) means either a new top level field is added, or a field in a nested field is added, or an existing field type is changed.

2) Implicit columns are presumed to have constant schema, and are added to outgoing container before any regular column is added in.

3) ScanBatch will return NONE directly (called as "fast NONE"), if all its RecordReaders haver empty input and thus are skipped, in stead of returing OK_NEW_SCHEMA first.

2. Modify IteratorValidatorBatchIterator to allow

1) fast NONE ( before seeing a OK_NEW_SCHEMA)

2) batch with empty list of columns.

2. Modify JsonRecordReader when it get 0 row. Do not insert a nullable-int column for 0 row input. Together with ScanBatch, Drill will skip empty json files.

3. Modify binary operators such as join, union to handle fast none for either one side or both sides. Abstract the logic in AbstractBinaryRecordBatch, except for MergeJoin as its implementation is quite different from others.

4. Fix and refactor union all operator.

1) Correct union operator hanndling 0 input rows. Previously, it will ignore inputs with 0 row and put nullable-int into output schema, which causes various of schema change issue in down-stream operator. The new behavior is to take schema with 0 into account

in determining the output schema, in the same way with > 0 input rows. By doing that, we ensure Union operator will not behave like a schema-lossy operator.

2) Add a UnionInputIterator to simplify the logic to iterate the left/right inputs, removing significant chunk of duplicate codes in previous implementation.

The new union all operator reduces the code size into half, comparing the old one.

5. Introduce UntypedNullVector to handle convertFromJson() function, when the input batch contains 0 row.

Problem: The function convertFromJSon() is different from other regular functions in that it only knows the output schema after evaluation is performed. When input has 0 row, Drill essentially does not have

a way to know the output type, and previously will assume Map type. That works under the assumption other operators like Union would ignore batch with 0 row, which is no longer

the case in the current implementation.

Solution: Use MinorType.NULL at the output type for convertFromJSON() when input contains 0 row. The new UntypedNullVector is used to represent a column with MinorType.NULL.

6. HBaseGroupScan convert star column into list of row_key and column family. HBaseRecordReader should reject column star since it expectes star has been converted somewhere else.

In HBase a column family always has map type, and a non-rowkey column always has nullable varbinary type, this ensures that HBaseRecordReader across different HBase regions will have the same top level schema, even if the region is

empty or prune all the rows due to filter pushdown optimization. In other words, we will not see different top level schema from different HBaseRecordReader for the same table.

However, such change will not be able to handle hard schema change : c1 exists in cf1 in one region, but not in another region. Further work is required to handle hard schema change.

7. Modify scan cost estimation when the query involves * column. This is to remove the planning randomness since previously two different operators could have same cost.

8. Add a new flag 'outputProj' to Project operator, to indicate if Project is for the query's final output. Such Project is added by TopProjectVisitor, to handle fast NONE when all the inputs to the query are empty

and are skipped.

1) column star is replaced with empty list

2) regular column reference is replaced with nullable-int column

3) An expression will go through ExpressionTreeMaterializer, and use the type of materialized expression as the output type

4) Return an OK_NEW_SCHEMA with the schema using the above logic, then return a NONE to down-stream operator.

9. Add unit test to test operators handling empty input.

10. Add unit test to test query when inputs are all empty.

DRILL-5546: Revise code based on review comments.

Handle implicit column in scan batch. Change interface in ScanBatch's constructor.

1) Ensure either the implicit column list is empty, or all the reader has the same set of implicit columns.

2) We could skip the implicit columns when check if there is a schema change coming from record reader.

3) ScanBatch accept a list in stead of iterator, since we may need go through the implicit column list multiple times, and verify the size of two lists are same.

ScanBatch code review comments. Add more unit tests.

Share code path in ProjectBatch to handle normal setupNewSchema() and handleNullInput().

- Move SimpleRecordBatch out of TopNBatch to make it sharable across different places.

- Add Unit test verify schema for star column query against multilevel tables.

Unit test framework change

- Fix memory leak in unit test framework.

- Allow SchemaTestBuilder to pass in BatchSchema.

close #906

  1. … 67 more files in changeset.
DRILL-5485: Remove WebServer dependency on DrillClient

1. Added WebUserConnection/AnonWebUserConnection and their providers for Authenticated and Anonymous web users.

2. Updated to store the UserSession, BufferAllocator and other session states inside the HttpSession of Jetty instead

of storing in DrillUserPrincipal. For each request now a new instance of WebUserConnection will be created. However

for authenticated users the UserSession and other states will be re-used whereas for Anonymous Users it will created

for each request and later re-cycled after query execution.

close #829

  1. … 46 more files in changeset.
DRILL-5116: Enable generated code debugging in each Drill operator

DRILL-5052 added the ability to debug generated code. The reviewer suggested

permitting the technique to be used for all Drill operators. This PR provides

the required fixes. Most were small changes, others dealt with the rather

clever way that the existing byte-code merge converted static nested classes

to non-static inner classes, with the way that constructors were inserted

at the byte-code level and so on. See the JIRA for the details.

This code passed the unit tests twice: once with the traditional byte-code

manipulations, a second time using "plain-old Java" code compilation.

Plain-old Java is turned off by default, but can be turned on for all

operators with a single config change: see the JIRA for info. Consider

the plain-old Java option to be experimental: very handy for debugging,

perhaps not quite tested enough for production use.

close apache/drill#716

  1. … 62 more files in changeset.
DRILL-4715: Fix java compilation error in run-time generated code when query has large number of expressions.

Refactor unit test in drillbit context initialization and pass in option manager.

close apache/drill#521

  1. … 53 more files in changeset.
DRILL-3742: Classpath scanning and build improvement

Makes the classpath scanning a build time class discovery

Makes the fmpp generation incremental

Removes some slowness in DrillBit closing

Reduces the build time by 30%

This closes #148

  1. … 143 more files in changeset.
DRILL-3598: use a factory to create the root allocator. - made the constructor for TopLevelAllocator package private to enforce this

Delete a test that had been commented out for over a year, it no longer compiles due to interface changes and there is plenty of other testing for hash aggregate.

  1. … 43 more files in changeset.
DRILL-634: Cleanup/organize Java imports and trailing whitespaces from Drill code

  1. … 769 more files in changeset.
DRILL-1166: Session option to enable/disable debug information in runtime generated Java code

+ By default, debug options are enabled but can be disabled by setting the session option `exec.java_compiler_debug` to false.

+ Allow the defaults for compiler options to be set through configuration.

  1. … 23 more files in changeset.
Switch to DrillBuf Add @Inject DrillBuf Move comparison functions to memory sensitive ones Add scalar replacement functionality for value holders Simplify date parsing function Add local compiled code caching

  1. … 213 more files in changeset.
DRILL-733: Reduce memory footprint of unit tests

  1. … 33 more files in changeset.
Make tests extend shared base class. Add additional tracking in base class around memory usage per test.

  1. … 57 more files in changeset.
DRILL-334: Subdivide Drillbit control and data messages. Add support for socket backpressure. Add TopLevel and Child memory allocator with debug mode to capture memory leaks. Various memory leak fixes to get build to complete.

Also includes fixes from reviews by Tim.

  1. … 212 more files in changeset.
DRILL-274: Spooling batch buffer

  1. … 38 more files in changeset.
DRILL-312: Modularize org.apache.drill.exec.physical.impl.ImplCreator using operator creator registry

  1. … 17 more files in changeset.
DRILL-300: Move to "com.codahale.metrics" from "com.yammer.metrics"

  1. … 24 more files in changeset.
DRILL-221 Add license header to all files

  1. … 829 more files in changeset.
DRILL-165: Reorganize directories (moves only)

    • -0
    • +68
    ./TestSimpleUnion.java
  1. … 1739 more files in changeset.