DRILL-7441: Fix issues with fillEmpties, offset vectors

Fixes subtle issues with offset vectors and "fill empties"


Drill has an informal standard that if a batch has no rows, then

offset vectors within that batch should have zero size. Contrast

this with batches of size 1 that should have offset vectors of

size 2. Changed to enforce this rule throughout.
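
The sizing rule described above can be sketched as follows. This is a minimal illustration, not Drill's code: the class `OffsetVectorRule` and its methods are hypothetical names, and a plain `int[]` stands in for an offset vector.

```java
// Hypothetical helper, not part of Drill: illustrates the sizing rule only.
final class OffsetVectorRule {
    private OffsetVectorRule() {}

    /** Number of offset entries for a batch with the given row count. */
    static int offsetEntryCount(int rowCount) {
        if (rowCount < 0) {
            throw new IllegalArgumentException("negative row count: " + rowCount);
        }
        // Empty batch: no offsets at all. Otherwise one start offset per row
        // plus one final end offset.
        return rowCount == 0 ? 0 : rowCount + 1;
    }

    /** Builds offsets from per-row value lengths, following the same rule. */
    static int[] buildOffsets(int[] lengths) {
        int[] offsets = new int[offsetEntryCount(lengths.length)];
        for (int i = 0; i < lengths.length; i++) {
            offsets[i + 1] = offsets[i] + lengths[i];
        }
        return offsets;
    }
}
```

With this rule a one-row batch yields two entries (start and end), while an empty batch yields an empty array rather than a single zero entry.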

Nullable, repeated and variable-width vectors have "fill empties"

logic that is used in two places: when setting the value count and

when preparing to write a new value. The current logic is not

quite right for either case. Added tests and fixed the code to

properly handle each case.
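
As a rough sketch of the two "fill empties" cases, consider a variable-width column tracked by an offsets array. Everything here is an assumption made for illustration: `FillEmptiesSketch`, its `lastSet` field, and the use of an `int[]` are stand-ins, not Drill's vector classes. The array is deliberately filled with garbage first so the logic cannot rely on zeroed memory.

```java
import java.util.Arrays;

// Hypothetical model of "fill empties" for a variable-width column.
final class FillEmptiesSketch {
    private final int[] offsets;  // offsets[row + 1] is the end offset of 'row'
    private int lastSet = -1;     // index of the last row that has an offset

    FillEmptiesSketch(int capacity) {
        offsets = new int[capacity + 1];
        Arrays.fill(offsets, -1); // "dirty" the buffer on purpose
        offsets[0] = 0;           // only the first entry is initialized
    }

    /** Gives every skipped row before 'upTo' a zero-length value. */
    private void fillEmpties(int upTo) {
        for (int row = lastSet + 1; row < upTo; row++) {
            offsets[row + 1] = offsets[row];
        }
        lastSet = Math.max(lastSet, upTo - 1);
    }

    /** Case 1: preparing to write a new value at 'row'. */
    void write(int row, int length) {
        fillEmpties(row);
        offsets[row + 1] = offsets[row] + length;
        lastSet = row;
    }

    /** Case 2: setting the value count at the end of the batch. */
    int[] finish(int rowCount) {
        fillEmpties(rowCount);
        return rowCount == 0 ? new int[0] : Arrays.copyOf(offsets, rowCount + 1);
    }
}
```

The two callers differ only in the bound they pass: a write fills rows strictly before the target row, while setting the value count fills every row up to the end of the batch.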

Revised the batch validator to enforce the rule that offset vectors have

length 0 in 0-sized batches. The result was much simpler code.

Added tools to easily print a batch, restoring some code that

was recently lost when the RowSet classes were moved.

Code cleanup in all files touched.

Added logic to "dirty" allocated buffers when testing to ensure

logic is not sensitive to the "pristine" state of new buffers.

Added logic to the column writers to enforce the zero-size-batch rule

for offset vectors. Added unit tests for this case.

Fixed the column writers to set the "lastSet" mutator value for

nullable types since other code relies on this value.

Removed the "setCount" field in nullable vectors: turns out

it is not actually used.

closes #1896

  1. … 43 more files in changeset.
DRILL-7412: Minor unit test improvements

Many tests intentionally trigger errors. A debug-only log setting

sent those errors to stdout. The resulting stack dumps simply cluttered

the test output, so disabled error output to the console.

Drill can apply bounds checks to vectors. Tests run via Maven

enable bounds checking. Now, bounds checking is also enabled in

"debug mode" (when assertions are enabled, as in an IDE.)
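
One common way to tie a check to "debug mode" is the standard Java idiom for detecting whether assertions are enabled. The sketch below is an assumption about how such a switch could look, not Drill's implementation: the class `DebugChecks` and the system property `example.bounds_checking` are hypothetical.

```java
// Hypothetical sketch: enable bounds checks explicitly or when running with -ea.
final class DebugChecks {
    /** True when this class was loaded with assertions enabled (-ea). */
    static final boolean ASSERTIONS_ENABLED = detectAssertions();

    private DebugChecks() {}

    private static boolean detectAssertions() {
        boolean enabled = false;
        // The assignment inside the assert runs only if assertions are on.
        assert enabled = true;
        return enabled;
    }

    /** Checks are on if the (hypothetical) property is set or in debug mode. */
    static boolean boundsCheckingEnabled() {
        return Boolean.getBoolean("example.bounds_checking") || ASSERTIONS_ENABLED;
    }

    /** Validates an index only when bounds checking is enabled. */
    static void checkIndex(int index, int length) {
        if (boundsCheckingEnabled() && (index < 0 || index >= length)) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
    }
}
```

IDEs typically launch tests with `-ea`, so a switch of this shape turns the checks on there without any extra configuration.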

Drill contains two test frameworks. The older BaseTestQuery was

marked as deprecated, but many tests still use it and are unlikely

to be changed soon. So, removed the deprecated marker to reduce the

number of spurious warnings.

Also includes a number of minor clean-ups.

closes #1876

  1. … 17 more files in changeset.
DRILL-7100: Fixed IllegalArgumentException when reading Parquet data

  1. … 4 more files in changeset.
DRILL-6422: Replace guava imports with shaded ones

  1. … 980 more files in changeset.
DRILL-6656: Disallow extra semicolons and multiple statements on the same line.

closes #1415

  1. … 144 more files in changeset.
DRILL-6386: Remove unused imports and star imports.

  1. … 231 more files in changeset.
DRILL-6389: Fixed building javadocs

- Added documentation about how to build javadocs

- Fixed some of the javadoc warnings

closes #1276

  1. … 64 more files in changeset.
DRILL-6320: Fixed license headers.

closes #1207

  1. … 2059 more files in changeset.
DRILL-6053: Avoid excessive locking in LocalPersistentStore

closes #1163

  1. … 18 more files in changeset.
DRILL-6102: Fix ConcurrentModificationException in the BaseAllocator's print method

closes #1100

DRILL-6004: Direct buffer bounds checking should be disabled by default

This closes #1070

  1. … 7 more files in changeset.
DRILL-5783, DRILL-5841, DRILL-5894: Rationalize test temp directories

This change includes:


- A unit test is created for the priority queue in the TopN operator.

- The code generation classes passed around a completely unused function registry reference in some places so it is removed.

- The priority queue had unused parameters for some of its methods so they are removed.


- Created standardized temp directory classes DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher. And updated all unit tests to use them.


- Removed the dfs_test storage plugin for tests and replaced it with the already existing dfs storage plugin.


- General code cleanup.

- Removed unnecessary use of String.format in the tests.

This closes #984

  1. … 365 more files in changeset.
DRILL-5808: Reduce memory allocator strictness for "managed" operators

closes #958

  1. … 3 more files in changeset.
DRILL-5694: Handle OOM in HashAggr by spill and retry, reserve memory, spinner

  1. … 20 more files in changeset.
DRILL-5657: Size-aware vector writer structure

- Vector and accessor layer

- Row Set layer

- Tuple and column models

- Revised write-time metadata

- "Result set loader" layer

this closes #914

  1. … 187 more files in changeset.
DRILL-5325: Unit tests for the managed sort

Uses the sub-operator test framework (DRILL-5318), including the test

row set abstraction (DRILL-5323) to enable unit testing of the

“managed” external sort. This PR allows early review of the code, but

cannot be pulled until the dependencies (mentioned above) are pulled.

Refactors the external sort code into small chunks that can be unit

tested, then “wraps” that code in tests for all interesting data types,

record batch sizes, and so on.

Refactors some of the operator definitions to more easily allow

programmatic setup in the unit tests.

Fixes a number of bugs discovered by the unit tests. The biggest

changes were in the new code: the code that computes spilling and

merging based on memory levels.

Otherwise, although GitHub will show many files changed, most of the

changes are simply moving blocks of code around to create smaller units

that can be tested independently.

Includes a refactoring of the code that does spilling, along with a

complete set of low-level unit tests.

Excludes long-running sort tests.

Defines a test category for long-running tests.

First attempt to provide a way to run such tests from Maven.

closes #808

  1. … 50 more files in changeset.
DRILL-5601: Rollup of external sort fixes and improvements

- DRILL-5513: Managed External Sort : OOM error during the merge phase

- DRILL-5519: Sort fails to spill and results in an OOM

- DRILL-5522: OOM during the merge and spill process of the managed external sort

- DRILL-5594: Excessive buffer reallocations during merge phase of external sort

- DRILL-5597: Incorrect "bits" vector allocation in nullable vectors allocateNew()

- DRILL-5602: Repeated List Vector fails to initialize the offset vector

- DRILL-5617: Spill file name collisions when spill file is on a shared file system

- DRILL-5445: bug in repeated map vector deserialization

- Workaround for DRILL-5656: Streaming Agg Batch forces sort to retain in-memory batches past NONE

- Fixes for the "record batch sizer" to handle UNION, MAP, and LIST types

Fixes a longstanding bug in the deserialization of a repeated map

vector read from a spill file. A few minor code cleanups also.

All of the bugs have to do with handling low-memory conditions, and with

correctly estimating the sizes of vectors, even when those vectors come

from the spill file or from an exchange. Hence, the changes for all of

the above issues are interrelated.

Also includes some fixes for tests:

* Certain tests require the ability to enforce the output size of the

memory merge/sort. Restored this option.

* Resolve issue with TestDrillbitResilience

* A particular test injects a fault during in-memory sort, but used a

single-batch input (which does not need the merge phase.)

Rather than introduce a new config property (the earlier solution),

altered the test to use input that returns more than one batch.

Two fixes for BasicPhysicalOpUnitTest, a test that uses JMockit to create a “fake”

fragment context.

1. No drillbit endpoint is available, so the SpillSet change that adds

the node to the spill path failed. Solution was to omit the node path

segment in such tests.

2. The Dynamic UDF registry is null causing a crash. This has nothing

to do with sort. Perhaps some pre-existing error? Anyway, added a check

for this condition.

close #860

  1. … 62 more files in changeset.
DRILL-5385: Vector serializer fails to read saved SV2

Unit testing revealed that the VectorAccessorSerializable class claims

to serialize SV2s, but, in fact, does not. Actually, it writes them,

but does not read them, resulting in corrupted data on read.

Fortunately, no code appears to serialize sv2s at present. Still, it is

a bug and needs to be fixed.

First task is to add serialization code for the sv2.

That revealed that the recently-added code to save DrillBufs using a

shared buffer had a bug: it relied on the writer index to know how much

data is in the buffer. Turns out sv2 buffers don’t set this index. So,

new versions of the write function take a write length.

Then, closer inspection of the read code revealed duplicated code. So,

DrillBuf allocation moved into a version of the read function that now

does reading and DrillBuf allocation.

Turns out that value vectors, but not SV2s, can be built from a

DrillBuf. Added a matching constructor to the SV2 class.

Finally, cleaned up the code a bit to make it easier to follow. Also

allowed test code to access the handy timer already present in the code.

closes #800

  1. … 3 more files in changeset.
DRILL-5275: Sort spill is slow due to repeated allocations

Rather than create a heap buffer per vector when writing and reading,

the revised code creates a single, shared buffer used for all I/O

within a particular container. This improves performance by reducing GC

and CPU costs during I/Os.
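
The idea of one shared heap buffer for all writes can be sketched roughly as below. This is an illustrative assumption, not the code from the change: `SharedBufferWriter` is a hypothetical class, a `ByteBuffer` stands in for a vector's direct memory, and the caller passes the byte count explicitly.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

// Hypothetical sketch: one heap buffer reused for every vector written.
final class SharedBufferWriter {
    private final byte[] ioBuffer; // allocated once, shared by all writes

    SharedBufferWriter(int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("buffer size must be positive");
        }
        this.ioBuffer = new byte[bufferSize];
    }

    /** Copies the first 'length' bytes of src to out in buffer-sized chunks. */
    void write(ByteBuffer src, int length, OutputStream out) throws IOException {
        if (length < 0 || length > src.capacity()) {
            throw new IllegalArgumentException("bad length: " + length);
        }
        ByteBuffer view = src.duplicate(); // leave the caller's position alone
        view.clear();                      // position 0, limit = capacity
        int remaining = length;
        while (remaining > 0) {
            int chunk = Math.min(remaining, ioBuffer.length);
            view.get(ioBuffer, 0, chunk);
            out.write(ioBuffer, 0, chunk);
            remaining -= chunk;
        }
    }

    public static void main(String[] args) throws IOException {
        SharedBufferWriter writer = new SharedBufferWriter(4);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writer.write(ByteBuffer.allocateDirect(10), 10, out);
        writer.write(ByteBuffer.allocateDirect(3), 3, out);
        System.out.println(out.size()); // prints 13
    }
}
```

Both writes in `main` go through the same four-byte array, so no per-vector heap buffer is created regardless of how many vectors are written.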

Move I/O buffer, and methods to allocator

Allows the buffer to be shared. Especially in the sort, this is

important, as the sort may have many serializations open at once.

closes #754

  1. … 1 more file in changeset.
DRILL-5080: Memory-managed version of external sort

Please see JIRA entry for reasons for revision, design spec and list of


This PR covers the changes to the external sort itself. Tests for this

operator require the test framework in DRILL-5126 and the mock data

source in DRILL-5152. Tests for this operator will be issued as a

separate PR once those two dependencies are committed.

Until then, the new operator is disabled by default. It can be enabled

using drill.sort.external.disable_managed: false.

The operator now spills before receiving a new batch. Revised memory calcs and

merge calcs to make them a bit clearer and provide more margin of error

for the power-of-two allocations used when allocating vectors.
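
To see why power-of-two allocation calls for extra margin, here is a small sketch of rounding a request up to the next power of two. The class `PowerOfTwo` is hypothetical and only illustrates the arithmetic, not Drill's allocator.

```java
// Hypothetical helper: rounds an allocation request up to a power of two.
final class PowerOfTwo {
    private PowerOfTwo() {}

    static int roundUp(int size) {
        if (size <= 1) {
            return 1;
        }
        if (size > (1 << 30)) {
            throw new IllegalArgumentException("too large: " + size);
        }
        // highestOneBit(size - 1) is the largest power of two below 'size'
        // (or half of 'size' when it is already a power of two).
        return Integer.highestOneBit(size - 1) << 1;
    }

    public static void main(String[] args) {
        System.out.println(roundUp(1024)); // prints 1024
        System.out.println(roundUp(1025)); // prints 2048
    }
}
```

A request one byte over a power of two nearly doubles the memory actually reserved, which is the kind of overshoot a memory estimate has to leave room for.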

We have two external sort implementations, but only one operator code

for both. They can use only one Metrics enum between them. When adding

new metrics to the new version, didn’t add matching metrics to the old

one. This fixes that issue. (The issue will go away once the old one is


Revised memory calculations to reflect limit of 16 MB per vector.

Current revision limits to 16 MB per output batch to be safe. Next

revision will enforce per-vector limits to allow the overall batch to

be larger when possible.

Also simplified the merge-time calculations.

Original code provided only crude methods to learn the size of a record

batch. Adds a "RecordBatchSizer" to provide detailed analysis so the

sort can know the amount of memory used to buffer a batch, the number

of rows, and the expected row width once the rows are copied to a

spill file or the output.

Moved generic spill classes to a separate package.

Created parameters for spill batch size and merge batch size. Separated

these values in code. Deprecated the min, max spill parameters as they

no longer add much value. Minor code rearranging.

Bug fix

Fixes a corner case of merging spilled files in a low-memory condition.

Fixes from code review

close apache/drill#717

  1. … 21 more files in changeset.
DRILL-4654: Add new metrics to the MetricRegistry

+ New metrics:

- drill.queries.enqueued

number of queries that have been submitted to the drillbit but have

not started

- drill.queries.running

number of running queries for which this drillbit is the foreman

- drill.queries.completed

number of completed queries (or cancelled or failed) for which this

drillbit was the foreman

- drill.fragments.running

number of query fragments that are running in the drillbit

- drill.allocator.root.used

amount of memory used in bytes by the internal memory allocator

- drill.allocator.root.peak

peak amount of memory used in bytes by the internal memory allocator

- fd.usage

ratio of used to total file descriptors (on *nix systems)

+ Rename "current" to "used" for RPC allocator current memory usage to

follow convention

+ Borrow SystemPropertyUtil class from Netty

+ Configure DrillMetrics through system properties

+ Remove unused methods and imports

closes #495

  1. … 12 more files in changeset.
DRILL-4131: Move RPC allocators under Drill's root allocator & accounting

- Allow settings to be set to ensure RPC reservation and maximums (currently unset by default). Defaults set in drill-module.conf

- Add new metrics to report RPC layer memory consumption.

- Check for memory leaks from RPC layer at shutdown.

- Add a multi-Drillbit single JVM safe DrillMetrics.register()

- Remove invalid verifyAllocator checks while RPC connection (and PING/PONG) are maintained

This closes #327.

  1. … 11 more files in changeset.
DRILL-4246: Fix Allocator concurrency bug and improve error detection

- Rename the internal DrillBuf field to udle to better express its purpose.

- Rename AllocatorManager to AllocationManager to better express its purpose.

- Address situation where dangling ledger could be transferred into while it was being released by protecting association and release inside the AllocationManager.

- Add allocator assertions to ensure allocator operations are done while the allocator is open.

- Simplify AllocationManager locking model.

- Exclude HDFS reference to netty-all

- Improve debugging messages for allocators (and fix debug message bugs)

This closes #323.

  1. … 3 more files in changeset.
DRILL-4134: Allocator Improvements

- make Allocator mostly lockless

- change BaseAllocator maps to direct references

- add documentation around memory management model

- move transfer and ownership methods to DrillBuf

- Improve debug messaging.

- Fix/revert sort changes

- Remove unused fragment limit flag

- Add time to HistoricalLog events

- Remove reservation amount from RootAllocator constructor (since not allowed)

- Fix concurrency issue where allocator is closing at same moment as incoming batch transfer, causing leaked memory and/or query failure.

- Add new AutoCloseables.close(Iterable<AutoCloseable>)

- Remove extraneous DataResponseHandler and Impl (and update TestBitRpc to use smarter mock of FragmentManager)

- Remove the concept of poison pill record batches, using instead FragmentContext.isOverMemoryLimit()

- Update incoming data batches so that they are transferred under protection of a close lock

- Improve field names in IncomingBuffers and move synchronization to collectors as opposed to IncomingBuffers (also change decrementing to decrementToZero rather than two part check).

This closes #238.
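
A helper that closes a whole collection of resources, such as the `close(Iterable<AutoCloseable>)` method mentioned in the list above, might look roughly like this. The sketch is an assumption about the general technique, not Drill's implementation: `CloseAll` is a hypothetical class, and the policy of throwing the first failure with later ones attached as suppressed exceptions is one reasonable choice among several.

```java
// Hypothetical sketch: close every resource even if some closes fail.
final class CloseAll {
    private CloseAll() {}

    static void close(Iterable<? extends AutoCloseable> closeables) throws Exception {
        Exception first = null;
        for (AutoCloseable closeable : closeables) {
            if (closeable == null) {
                continue; // tolerate null entries
            }
            try {
                closeable.close();
            } catch (Exception e) {
                if (first == null) {
                    first = e;              // remember the first failure
                } else {
                    first.addSuppressed(e); // keep later failures attached
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}
```

Continuing past a failed close matters for memory accounting: stopping at the first exception would leave the remaining resources open and could be reported later as a leak.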

  1. … 106 more files in changeset.
DRILL-4134: Add new allocator

  1. … 28 more files in changeset.
DRILL-3987: (CLEANUP) Final cleanups to get complete working build/distribution

- small cleanups

- move Hook to drill-adbc

- update distribution assembly to include new modules

This closes #250

  1. … 31 more files in changeset.
DRILL-3987: (REFACTOR) Extract BoundsChecking check from AssertionUtil. Remove unused file.

  1. … 7 more files in changeset.
DRILL-3987: (MOVE) Extract RPC, memory-base and memory-impl as separate modules.

  1. … 139 more files in changeset.