DRILL-7442: Create multi-batch row set reader

Adds a ResultSetReader that works across multiple batches
in a result set. Reuses the same row set and readers if
schema is unchanged, creates a new set if the schema changes.

Adds a unit test for the result set reader.

Adds a "rebind" capability to the row set readers to focus
on new buffers under an existing set of vectors. Used when
a new batch arrives, if the schema is unchanged.
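
The reuse-or-recreate behavior described above can be sketched roughly as follows. This is an illustration only, not Drill's ResultSetReader: the class BatchReaderCache, its nested Reader, and the methods startBatch and rebind are hypothetical names, and a schema is reduced to a list of column names.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchReaderCache {

    /** Hypothetical stand-in for a row set reader built for one schema. */
    public static final class Reader {
        public final List<String> schema;
        private List<Object[]> rows = new ArrayList<>();

        Reader(List<String> schema) {
            this.schema = schema;
        }

        /** Points the existing reader at the data of a newly arrived batch. */
        void rebind(List<Object[]> newRows) {
            this.rows = newRows;
        }

        public int rowCount() {
            return rows.size();
        }
    }

    private Reader current;
    private int readersCreated;

    /** Reuses the current reader when the schema matches; otherwise builds a new one. */
    public Reader startBatch(List<String> schema, List<Object[]> rows) {
        if (current == null || !current.schema.equals(schema)) {
            current = new Reader(new ArrayList<>(schema));
            readersCreated++;
        }
        current.rebind(rows);
        return current;
    }

    public int readersCreated() {
        return readersCreated;
    }
}
```

Two batches with the same column list share one Reader instance in this sketch; a batch with a different column list causes a new Reader to be created.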

Extends row set classes to be aware of the BatchAccessor class
which encapsulates a container and optional selection vector,
and tracks schema changes.

Moves row set tests into the same package as the row sets.
(Row set classes were moved a while back, but the tests were
not moved.)

Renames some BatchAccessor methods.

closes #1897

  1. … 62 more files in changeset.
DRILL-7441: Fix issues with fillEmpties, offset vectors

Fixes subtle issues with offset vectors and "fill empties"
logic.

Drill has an informal standard that if a batch has no rows, then
offset vectors within that batch should have zero size. Contrast
this with batches of size 1 that should have offset vectors of
size 2. Changed to enforce this rule throughout.
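
The size rule stated above (0 rows gives 0 offsets, 1 row gives 2 offsets) can be written down as a small sketch. The class OffsetVectorSize and its methods are hypothetical and use a plain int array in place of a real offset vector.

```java
import java.nio.charset.StandardCharsets;

public class OffsetVectorSize {

    /** Number of offset entries for a batch of rowCount rows under the rule above. */
    public static int offsetCount(int rowCount) {
        return rowCount == 0 ? 0 : rowCount + 1;
    }

    /** Builds byte offsets for variable-width values; an empty batch yields an empty array. */
    public static int[] buildOffsets(String[] values) {
        if (values.length == 0) {
            return new int[0];
        }
        int[] offsets = new int[values.length + 1];
        for (int i = 0; i < values.length; i++) {
            // Entry i + 1 holds the end position of value i.
            offsets[i + 1] = offsets[i] + values[i].getBytes(StandardCharsets.UTF_8).length;
        }
        return offsets;
    }
}
```

For the values "ab", "" and "cde" the sketch produces four offsets (0, 2, 2, 5), one more than the number of rows.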

Nullable, repeated and variable-width vectors have "fill empties"
logic that is used in two places: when setting the value count and
when preparing to write a new value. The current logic is not
quite right for either case. Added tests and fixed the code to
properly handle each case.
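
As a rough illustration of the two "fill empties" cases named above, the sketch below back-fills offsets for skipped positions both before a write and when the value count is set. It is not Drill's vector code: FillEmptiesSketch and its methods are hypothetical, and only the name lastSet is borrowed from the commit text.

```java
import java.util.Arrays;

public class FillEmptiesSketch {
    private final int[] offsets;   // offsets[i + 1] is the end position of value i
    private int lastSet = -1;      // index of the last position that has a valid end offset

    public FillEmptiesSketch(int capacity) {
        offsets = new int[capacity + 1];
    }

    /** Marks positions lastSet + 1 .. upTo - 1 as empty by repeating the previous end offset. */
    private void fillEmpties(int upTo) {
        for (int i = lastSet + 1; i < upTo; i++) {
            offsets[i + 1] = offsets[i];
        }
        if (upTo - 1 > lastSet) {
            lastSet = upTo - 1;
        }
    }

    /** Case 1: preparing to write a new value at index. */
    public void set(int index, int length) {
        fillEmpties(index);
        offsets[index + 1] = offsets[index] + length;
        lastSet = index;
    }

    /** Case 2: setting the value count; a zero count yields zero offsets. */
    public int[] setValueCount(int count) {
        if (count == 0) {
            return new int[0];
        }
        fillEmpties(count);
        return Arrays.copyOf(offsets, count + 1);
    }
}
```

Writing a 3-byte value at index 0 and a 2-byte value at index 3, then setting the count to 5, leaves positions 1, 2 and 4 as empty values with repeated offsets.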

Revised the batch validator to enforce the rule that offset vectors
have length 0 for 0-sized batches. The result was much simpler code.

Added tools to easily print a batch, restoring some code that
was recently lost when the RowSet classes were moved.

Code cleanup in all files touched.

Added logic to "dirty" allocated buffers when testing to ensure
logic is not sensitive to the "pristine" state of new buffers.
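
The idea of "dirtying" buffers in tests can be sketched with plain java.nio buffers. This is an assumption-laden illustration, not Drill's allocator: DirtyBufferSketch, allocateDirty and writeOffsets are hypothetical names, and the fill byte 0xDE is arbitrary.

```java
import java.nio.ByteBuffer;

public class DirtyBufferSketch {

    /** Test-only allocation: fills the new buffer with a non-zero pattern. */
    public static ByteBuffer allocateDirty(int size) {
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (int i = 0; i < size; i++) {
            buf.put(i, (byte) 0xDE);
        }
        return buf;
    }

    /**
     * Writes int offsets for the given value lengths. The leading 0 is written
     * explicitly, so the result does not depend on the buffer starting zeroed.
     */
    public static void writeOffsets(ByteBuffer buf, int[] lengths) {
        int offset = 0;
        buf.putInt(0, offset);
        for (int i = 0; i < lengths.length; i++) {
            offset += lengths[i];
            buf.putInt((i + 1) * Integer.BYTES, offset);
        }
    }
}
```

Code that silently assumed offset 0 was already zero would read garbage from a dirtied buffer, which is the kind of hidden dependency such a test setup is meant to expose.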

Added logic to the column writers to enforce the zero-size-batch rule
for offset vectors. Added unit tests for this case.

Fixed the column writers to set the "lastSet" mutator value for
nullable types since other code relies on this value.

Removed the "setCount" field in nullable vectors: turns out
it is not actually used.

closes #1896

  1. … 43 more files in changeset.
DRILL-7412: Minor unit test improvements

Many tests intentionally trigger errors. A debug-only log setting
sent those errors to stdout. The resulting stack dumps simply cluttered
the test output, so disabled error output to the console.

Drill can apply bounds checks to vectors. Tests run via Maven
enable bounds checking. Now, bounds checking is also enabled in
"debug mode" (when assertions are enabled, as in an IDE).

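One common way to tie a check to "assertions are enabled" in Java is the assert-with-side-effect idiom sketched below. How Drill actually wires this up is not shown in the commit text; BoundsCheckSketch and its methods are hypothetical, and the explicit flag stands in for whatever setting the Maven build uses.

```java
public class BoundsCheckSketch {

    /** Returns true only when the JVM runs with assertions enabled (for example, -ea). */
    public static boolean assertionsEnabled() {
        boolean enabled = false;
        // The assignment inside the assert executes only when assertions are on.
        assert enabled = true;
        return enabled;
    }

    /** Bounds checking is on if requested explicitly or if running in "debug mode". */
    public static boolean boundsCheckingEnabled(boolean explicitlyRequested) {
        return explicitlyRequested || assertionsEnabled();
    }

    /** Reads a value, applying an explicit range check only when checking is enabled. */
    public static int get(int[] data, int index, boolean check) {
        if (check && (index < 0 || index >= data.length)) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + data.length);
        }
        return data[index];
    }
}
```
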
Drill contains two test frameworks. The older BaseTestQuery was
marked as deprecated, but many tests still use it and are unlikely
to be changed soon. So, removed the deprecated marker to reduce the
number of spurious warnings.

Also includes a number of minor clean-ups.

closes #1876

  1. … 17 more files in changeset.
DRILL-7100: Fixed IllegalArgumentException when reading Parquet data

  1. … 4 more files in changeset.
DRILL-7019: Add check for redundant imports

close apache/drill#1629

    • -2
    • +0
    ./io/netty/buffer/UnsafeDirectLittleEndian.java
  1. … 23 more files in changeset.
DRILL-6422: Replace guava imports with shaded ones

    • -2
    • +2
    ./org/apache/drill/exec/memory/Accountant.java
  1. … 979 more files in changeset.
DRILL-6656: Disallow extra semicolons and multiple statements on the same line.

closes #1415

  1. … 144 more files in changeset.
DRILL-6468: CatastrophicFailures should not do a graceful shutdown of drill when terminating the JVM.

closes #1306

  1. … 6 more files in changeset.
DRILL-6386: Remove unused imports and star imports.

  1. … 231 more files in changeset.
DRILL-6389: Fixed building javadocs - Added documentation about how to build javadocs - Fixed some of the javadoc warnings

closes #1276

    • -2
    • +2
    ./org/apache/drill/exec/ops/BufferManager.java
  1. … 62 more files in changeset.
DRILL-5846: Improve parquet performance for Flat Data Types

closes #1060

  1. … 26 more files in changeset.
DRILL-6320: Fixed license headers.

closes #1207

    • -1
    • +1
    ./io/netty/buffer/ExpandableByteBuf.java
    • -2
    • +1
    ./io/netty/buffer/UnsafeDirectLittleEndian.java
    • -2
    • +2
    ./org/apache/drill/exec/ops/BufferManager.java
    • -1
    • +1
    ./org/apache/drill/exec/util/Pointer.java
  1. … 2052 more files in changeset.
DRILL-6053: Avoid excessive locking in LocalPersistentStore

closes #1163

  1. … 18 more files in changeset.
DRILL-6230: Extend row set readers to handle hyper vectors

closes #1161

  1. … 64 more files in changeset.
DRILL-6202: Deprecate usage of IndexOutOfBoundsException to re-alloc vectors

closes #1144

  1. … 9 more files in changeset.
DRILL-6102: Fix ConcurrentModificationException in the BaseAllocator's print method

closes #1100

DRILL-6080: Sort incorrectly limits batch size to 65535 records

closes #1090

* Sort incorrectly limits batch size to 65535 records rather than 65536.

* This PR also includes a few code cleanup items.

* Fix for overflow in offset vector in row set writer

* Performance tool update

* Replace "unsafe" methods with "set" methods

* Also fixes an indexing issue with nullable writers

* Removed debug & timing code

* Increase strictness for batch size
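
The off-by-one in the list above (65535 versus 65536) is easy to see if one assumes a 16-bit unsigned row index, such as a two-byte selection vector entry would hold. That assumption, and the names in BatchLimitSketch, are illustrative and not taken from the commit.

```java
public class BatchLimitSketch {

    /** Largest value a 16-bit unsigned index can hold: 65535. */
    public static final int MAX_INDEX = Character.MAX_VALUE;

    /** Indexes 0..65535 address 65536 distinct records. */
    public static final int MAX_RECORDS = MAX_INDEX + 1;

    /** Correct limit: a batch may hold up to MAX_RECORDS records. */
    public static boolean fits(int recordCount) {
        return recordCount >= 0 && recordCount <= MAX_RECORDS;
    }

    /** The off-by-one variant: confuses the largest index with the largest count. */
    public static boolean fitsOffByOne(int recordCount) {
        return recordCount >= 0 && recordCount <= MAX_INDEX;
    }
}
```

A batch of exactly 65536 records passes fits but is rejected by fitsOffByOne, which mirrors the bug described in the first bullet.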

  1. … 10 more files in changeset.
DRILL-6004: Direct buffer bounds checking should be disabled by default

This closes #1070

  1. … 6 more files in changeset.
DRILL-5783, DRILL-5841, DRILL-5894: Rationalize test temp directories

This change includes:

DRILL-5783:

- A unit test is created for the priority queue in the TopN operator.

- The code generation classes passed around a completely unused function registry reference in some places so it is removed.

- The priority queue had unused parameters for some of its methods so it is removed.

DRILL-5841:

- Created standardized temp directory classes DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher. And updated all unit tests to use them.

DRILL-5894:

- Removed the dfs_test storage plugin for tests and replaced it with the already existing dfs storage plugin.

Misc:

- General code cleanup.

- Removed unnecessary use of String.format in the tests.

This closes #984

  1. … 365 more files in changeset.
DRILL-5830: Resolve regressions to MapR DB from DRILL-5546

- Back out HBase changes

- Code cleanup

- Test utilities

- Fix for DRILL-5829

closes #968

  1. … 22 more files in changeset.
DRILL-5808: Reduce memory allocator strictness for "managed" operators

closes #958

    • -4
    • +94
    ./org/apache/drill/exec/memory/Accountant.java
  1. … 3 more files in changeset.
DRILL-5694: Handle OOM in HashAggr by spill and retry, reserve memory, spinner

  1. … 20 more files in changeset.
DRILL-5657: Size-aware vector writer structure

- Vector and accessor layer

- Row Set layer

- Tuple and column models

- Revised write-time metadata

- "Result set loader" layer

this closes #914

    • -29
    • +33
    ./io/netty/buffer/PooledByteBufAllocatorL.java
    • -35
    • +54
    ./org/apache/drill/exec/memory/AllocationManager.java
  1. … 185 more files in changeset.
DRILL-5723: Added System Internal Options That can be Modified at Runtime

Changes include:

1. Addition of internal options.

2. Refactoring of OptionManagers and OptionValidators.

3. Fixed ambiguity in the meaning of an option type, and changed its name to accessibleScopes.

4. Updated javadocs in the Option System classes.

5. Added RestClientFixture for testing the Rest API.

6. Fixed flaky test in TestExceptionInjection caused by race condition.

7. Fixed various tests which started zookeeper but failed to shut it down at the end of tests.

8. Added port hunting to the Drill Webserver for testing

9. Fixed various flaky tests

10. Fix compile issue
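
"Port hunting" (item 8 above) generally means trying successive ports until one can be bound. The sketch below shows that general technique with java.net.ServerSocket; it is not the Drill Webserver's implementation, and PortHunter and bindFirstFree are hypothetical names.

```java
import java.io.IOException;
import java.net.ServerSocket;

public class PortHunter {

    /**
     * Tries startPort, startPort + 1, ... and returns a socket bound to the
     * first port that is free. Returning the bound socket (rather than just
     * the port number) avoids a race with other processes.
     */
    public static ServerSocket bindFirstFree(int startPort, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            int port = startPort + i;
            try {
                return new ServerSocket(port);
            } catch (IOException e) {
                // Port is taken or otherwise unusable; try the next one.
            }
        }
        throw new IllegalStateException(
            "No free port found in " + maxAttempts + " attempts starting at " + startPort);
    }
}
```

This lets several test servers start on the same machine without colliding on a fixed port.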

closes #923

    • -1
    • +1
    ./org/apache/drill/exec/util/Pointer.java
  1. … 85 more files in changeset.
DRILL-5431: Upgrade Netty to 4.0.47

    • -9
    • +31
    ./io/netty/buffer/UnsafeDirectLittleEndian.java
  1. … 1 more file in changeset.
DRILL-5517: Size-aware set methods in value vectors

Please see DRILL-5517 for an explanation.

Also includes a workaround for DRILL-5529.

Implements a setEmpties method for repeated and non-nullable
variable-width types in support of the revised column accessors.

Unit test included. Without the setEmpties call, the tests fail with
vector corruption. With the call, things work properly.

closes #840

    • -19
    • +56
    ./io/netty/buffer/UnsafeDirectLittleEndian.java
  1. … 24 more files in changeset.
DRILL-5325: Unit tests for the managed sort

Uses the sub-operator test framework (DRILL-5318), including the test
row set abstraction (DRILL-5323) to enable unit testing of the
“managed” external sort. This PR allows early review of the code, but
cannot be pulled until the dependencies (mentioned above) are pulled.

Refactors the external sort code into small chunks that can be unit
tested, then “wraps” that code in tests for all interesting data types,
record batch sizes, and so on.

Refactors some of the operator definitions to more easily allow
programmatic setup in the unit tests.

Fixes a number of bugs discovered by the unit tests. The biggest
changes were in the new code: the code that computes spilling and
merging based on memory levels.

Otherwise, although GitHub will show many files changed, most of the
changes are simply moving blocks of code around to create smaller units
that can be tested independently.

Includes a refactoring of the code that does spilling, along with a
complete set of low-level unit tests.

Excludes long-running sort tests.

Defines a test category for long-running tests.

First attempt to provide a way to run such tests from Maven.

closes #808

  1. … 50 more files in changeset.
DRILL-5601: Rollup of external sort fixes and improvements

- DRILL-5513: Managed External Sort : OOM error during the merge phase

- DRILL-5519: Sort fails to spill and results in an OOM

- DRILL-5522: OOM during the merge and spill process of the managed external sort

- DRILL-5594: Excessive buffer reallocations during merge phase of external sort

- DRILL-5597: Incorrect "bits" vector allocation in nullable vectors allocateNew()

- DRILL-5602: Repeated List Vector fails to initialize the offset vector

- DRILL-5617: Spill file name collisions when spill file is on a shared file system

- DRILL-5445: bug in repeated map vector deserialization

- Workaround for DRILL-5656: Streaming Agg Batch forces sort to retain in-memory batches past NONE

- Fixes for the "record batch sizer" to handle UNION, MAP, LIST types

Fixes a longstanding bug in the deserialization of a repeated map
vector read from a spill file. A few minor code cleanups also.

All of the bugs have to do with handling low-memory conditions, and with
correctly estimating the sizes of vectors, even when those vectors come
from the spill file or from an exchange. Hence, the changes for all of
the above issues are interrelated.

Also includes some fixes for tests:

* Certain tests require the ability to enforce the output size of the
memory merge/sort. Restored this option.

* Resolve issue with TestDrillbitResilience

* A particular test injects a fault during in-memory sort, but used a
single-batch input (which does not need the merge phase).
Rather than introduce a new config property (the earlier solution),
altered the test to use input that returns more than one batch.

Two fixes for BasicPhysicalOpUnitTest, a test that uses JMockit to create a “fake”
fragment context.

1. No drillbit endpoint is available, so the SpillSet change that adds
the node to the spill path failed. Solution was to omit the node path
segment in such tests.

2. The Dynamic UDF registry is null causing a crash. This has nothing
to do with sort. Perhaps some pre-existing error? Anyway, added a check
for this condition.
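
The node-in-spill-path idea (DRILL-5617, and fix 1 above) can be sketched as building a directory name that includes a per-node segment when one is known and omits it otherwise. SpillPathSketch, spillDir and the example node and query identifiers are hypothetical; the real SpillSet layout is not shown in the commit text.

```java
public class SpillPathSketch {

    /**
     * Builds a spill directory path. Including a node segment keeps files from
     * different nodes apart on a shared file system; when no endpoint is
     * available (nodeId is null or empty), the segment is simply omitted.
     */
    public static String spillDir(String baseDir, String nodeId, String queryId) {
        StringBuilder path = new StringBuilder(baseDir);
        if (nodeId != null && !nodeId.isEmpty()) {
            path.append('/').append(nodeId);
        }
        path.append('/').append(queryId);
        return path.toString();
    }
}
```

With a node id, two nodes spilling the same query under the same base directory get distinct directories; without one, the path falls back to base directory plus query id.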

close #860

  1. … 61 more files in changeset.
DRILL-5385: Vector serializer fails to read saved SV2

Unit testing revealed that the VectorAccessorSerializable class claims
to serialize SV2s, but, in fact, does not. Actually, it writes them,
but does not read them, resulting in corrupted data on read.

Fortunately, no code appears to serialize sv2s at present. Still, it is
a bug and needs to be fixed.

First task is to add serialization code for the sv2.

That revealed that the recently-added code to save DrillBufs using a
shared buffer had a bug: it relied on the writer index to know how much
data is in the buffer. Turns out sv2 buffers don’t set this index. So,
new versions of the write function take a write length.
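
The writer-index problem described above has a close analogue in java.nio: absolute puts do not advance a ByteBuffer's position, so a writer that trusts the position sees no data. The sketch below uses that analogue only; it does not use DrillBuf, and WriteLengthSketch and its methods are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

public class WriteLengthSketch {

    /** Buggy variant: trusts the buffer's position to say how much data it holds. */
    public static byte[] writeUsingPosition(ByteBuffer buf) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < buf.position(); i++) {
            out.write(buf.get(i));
        }
        return out.toByteArray();
    }

    /** Fixed variant: the caller states how many bytes to write. */
    public static byte[] write(ByteBuffer buf, int length) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < length; i++) {
            out.write(buf.get(i));
        }
        return out.toByteArray();
    }
}
```

If two 2-byte entries are stored with absolute puts, the position stays at 0: writeUsingPosition emits nothing, while write with an explicit length of 4 emits all four bytes.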

Then, closer inspection of the read code revealed duplicated code. So,
DrillBuf allocation moved into a version of the read function that now
does reading and DrillBuf allocation.

Turns out that value vectors, but not SV2s, can be built from a
DrillBuf. Added a matching constructor to the SV2 class.

Finally, cleaned up the code a bit to make it easier to follow. Also
allowed test code to access the handy timer already present in the code.

closes #800

  1. … 3 more files in changeset.
DRILL-5275: Sort spill is slow due to repeated allocations

Rather than create a heap buffer per vector when writing and reading,
the revised code creates a single, shared buffer used for all I/O
within a particular container. This improves performance by reducing GC
and CPU costs during I/Os.

Move I/O buffer, and methods to allocator

Allows the buffer to be shared. Especially in the sort, this is
important, as the sort may have many serializations open at once.
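
The shared-buffer approach described above can be sketched as one reusable heap array that serves every vector written, instead of a fresh array per vector. This is a simplified illustration with java.nio direct buffers standing in for vectors; SharedIoBuffer and its methods are hypothetical and are not the allocator API the commit adds.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

public class SharedIoBuffer {
    private byte[] ioBuffer = new byte[0];
    private int allocations;

    /** Returns the shared heap buffer, growing it only when it is too small. */
    private byte[] bufferOfAtLeast(int size) {
        if (ioBuffer.length < size) {
            ioBuffer = new byte[size];
            allocations++;
        }
        return ioBuffer;
    }

    /** Copies the buffer's remaining bytes through the shared heap buffer to the stream. */
    public void write(ByteBuffer vectorData, OutputStream out) {
        int length = vectorData.remaining();
        byte[] buf = bufferOfAtLeast(length);
        // duplicate() leaves the caller's position and limit untouched.
        vectorData.duplicate().get(buf, 0, length);
        try {
            out.write(buf, 0, length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Number of heap arrays allocated so far. */
    public int allocations() {
        return allocations;
    }
}
```

Writing buffers of 16, 8 and 16 bytes through one SharedIoBuffer allocates a single heap array, where a per-vector scheme would allocate three.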

closes #754

  1. … 1 more file in changeset.