DRILL-7487: Removes the unused OUT_OF_MEMORY iterator status

See JIRA ticket for full explanation.

closes #1930

  1. … 42 more files in changeset.
DRILL-6832: Removes the old "unmanaged" external sort

When the "managed" external sort was implemented a couple of years back, we retained the original "unmanaged" version out of an abundance of caution. The new version is now battle tested and it is time to retire the original one.

closes #1929

    ./PriorityQueueCopierTemplate.java: -163, +0
    ./PriorityQueueCopierWrapper.java: -383, +0
  1. … 50 more files in changeset.
DRILL-7436: Fix record count, vector structure issues in several operators

Adds additional vector checks to the BatchValidator.

Enables checking for the following operators:

* FilterRecordBatch

* PartitionLimitRecordBatch

* UnnestRecordBatch

* HashAggBatch

* RemovingRecordBatch

Fixes vector count issues for each of these.
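The kind of check described above can be sketched in plain Java. This is a minimal, hypothetical model, not Drill's BatchValidator API: it assumes each column simply reports a value count, and it flags any column whose count differs from the batch's record count, which is the class of "vector count issues" these fixes address.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified column: it only reports how many values it holds.
// Drill's real BatchValidator inspects value vectors, which are not modeled here.
final class ColumnSketch {
  final String name;
  final int valueCount;

  ColumnSketch(String name, int valueCount) {
    this.name = name;
    this.valueCount = valueCount;
  }
}

final class BatchCheckSketch {
  private BatchCheckSketch() {}

  /** Describes every column whose value count differs from the batch record count. */
  static List<String> mismatchedColumns(int recordCount, List<ColumnSketch> columns) {
    List<String> problems = new ArrayList<>();
    for (ColumnSketch column : columns) {
      if (column.valueCount != recordCount) {
        problems.add(column.name + ": expected " + recordCount
            + " values, found " + column.valueCount);
      }
    }
    return problems;
  }
}
```

An empty result means every column agrees with the batch's record count; each entry otherwise names one offending column.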

Fixes empty-batch (record count = 0) handling in several of the above operators. Added a method to VectorContainer to correctly create an empty batch. (An empty batch, counter-intuitively, needs vectors allocated to hold the 0 value in the first position of each offset vector.)
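The offset-vector remark can be illustrated with a simplified, hypothetical variable-width column built on plain Java arrays (not Drill's value vector or VectorContainer classes). Value i spans offsets[i] to offsets[i + 1], so n values need n + 1 offsets; an empty column therefore still needs a one-entry offset array holding 0.

```java
import java.util.Arrays;

// Hypothetical variable-width column. Value i occupies bytes
// [offsets[i], offsets[i + 1]) of data, so n values need n + 1 offsets.
final class VarWidthColumnSketch {
  private int[] offsets;
  private byte[] data;
  private int valueCount;

  /** An "empty batch" column: zero values, but one offset slot holding 0. */
  static VarWidthColumnSketch empty() {
    VarWidthColumnSketch col = new VarWidthColumnSketch();
    col.offsets = new int[] {0};
    col.data = new byte[0];
    col.valueCount = 0;
    return col;
  }

  /** Appends a value; relies on offsets[valueCount] already being valid. */
  void append(byte[] value) {
    int start = offsets[valueCount];
    int end = start + value.length;
    if (end > data.length) {
      data = Arrays.copyOf(data, Math.max(end, data.length * 2));
    }
    System.arraycopy(value, 0, data, start, value.length);
    if (valueCount + 2 > offsets.length) {
      offsets = Arrays.copyOf(offsets, Math.max(valueCount + 2, offsets.length * 2));
    }
    offsets[valueCount + 1] = end;
    valueCount++;
  }

  int valueCount() { return valueCount; }

  int offsetAt(int i) { return offsets[i]; }

  byte[] get(int i) { return Arrays.copyOfRange(data, offsets[i], offsets[i + 1]); }
}
```

Without the initial 0 entry, the first append would have no start offset to read, which is why an "empty" batch still needs an allocation.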

Disables verbose logging for MongoDB tests. Details are written to the log rather than the console.

Disables two invalid Mongo tests. See DRILL-7428.

Adjusts the expression tree materializer to not add the LATE type to Union vectors. (See DRILL-7435.)

Ensures that Union vectors contain valid vectors for each subtype.

The present fix is a work-around; see DRILL-7434 for a better long-term fix.

Cleans up code formatting and other minor issues in each file touched during the fixes in this PR.

  1. … 36 more files in changeset.
DRILL-7011: Support schema in scan framework

* Adds schema support to the row set-based scan framework and to the "V3" text reader based on that framework.

* Adding the schema made clear that passing options as a long list of constructor arguments was not sustainable. Refactored code to use a builder pattern instead.

* Added support for default values in the "null column loader", which required adding a "setValue" method to the column accessors.

* Added unit tests for all new or changed functionality. See TestCsvWithSchema for the overall test of the entire integrated mechanism.

* Added tests for explicit projection with schema

* Better handling of date/time in column accessors

* Converted recent column metadata work from Java 8 date/time to Joda.

* Added more CSV-with-schema unit tests

* Removed the ID fields from "resolved columns", used "instanceof" instead.

* Added wildcard projection with an output schema. Handles both "lenient" and "strict" schemas.

* Tagged projection columns with their output schema, when available.

* Scan projection added modes for wildcard with an output schema. The reader projection added support for merging reader and output schemas.

* Includes refactoring of scan operator tests (the test file grew too large).

* Renamed some classes to avoid confusing reader schemas with output schemas.

* Added unit tests for the new functionality.

* Added "lenient" wildcard with schema test for CSV

* Added more type conversions: string-to-bit, many-to-string

* Fixed bug in column writer for VarDecimal

* Added missing unit tests, and fixed bugs, in Bit column reader/writer

* Cleaned up a number of unneeded "SuppressWarnings"
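The move from long constructor argument lists to a builder can be sketched as follows. The class, its fields, and the default value are hypothetical and only illustrate the pattern; they are not the actual scan framework classes from this change.

```java
import java.util.Objects;

// Hypothetical options object for a scan; field names and defaults are
// illustrative only and are not Drill's actual scan framework API.
final class ScanOptionsSketch {
  final String tableName;
  final String outputSchema;   // null when no schema is provided
  final boolean strictSchema;  // false = "lenient", true = "strict"
  final int batchRowLimit;

  private ScanOptionsSketch(Builder b) {
    this.tableName = b.tableName;
    this.outputSchema = b.outputSchema;
    this.strictSchema = b.strictSchema;
    this.batchRowLimit = b.batchRowLimit;
  }

  static Builder builder(String tableName) {
    return new Builder(tableName);
  }

  static final class Builder {
    private final String tableName;
    private String outputSchema;
    private boolean strictSchema;
    private int batchRowLimit = 4096; // hypothetical default

    private Builder(String tableName) {
      this.tableName = Objects.requireNonNull(tableName, "tableName");
    }

    Builder outputSchema(String schema) {
      this.outputSchema = schema;
      return this;
    }

    Builder strictSchema(boolean strict) {
      this.strictSchema = strict;
      return this;
    }

    Builder batchRowLimit(int limit) {
      if (limit <= 0) {
        throw new IllegalArgumentException("batchRowLimit must be positive");
      }
      this.batchRowLimit = limit;
      return this;
    }

    ScanOptionsSketch build() {
      // Cross-field validation lives in one place instead of in every constructor.
      if (strictSchema && outputSchema == null) {
        throw new IllegalStateException("a strict schema requires an output schema");
      }
      return new ScanOptionsSketch(this);
    }
  }
}
```

Optional settings keep their defaults unless a caller sets them, and adding a new option means adding one builder method rather than another constructor parameter at every call site.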

closes #1711

  1. … 220 more files in changeset.
DRILL-6724: Dump operator context to logs when error occurs during query execution

closes #1455

  1. … 100 more files in changeset.
DRILL-6422: Replace guava imports with shaded ones

  1. … 978 more files in changeset.
DRILL-6656: Disallow extra semicolons and multiple statements on the same line.

closes #1415

  1. … 144 more files in changeset.
DRILL-6516: Fix memory leak issue with Sort and StreamingAgg together

DRILL-6498: Support for EMIT outcome in ExternalSortBatch

* DRILL-6498: Support for EMIT outcome in ExternalSortBatch

* Updated TestTopNEmitOutcome to use RowSetComparison for comparing expected and actual output batches produced

closes #1323

  1. … 4 more files in changeset.
DRILL-6512: Remove unnecessary processing overhead from RecordBatchSizer

closes #1341

  1. … 11 more files in changeset.
DRILL-6435: MappingSet is stateful, so it can't be shared between threads

  1. … 8 more files in changeset.
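The commit title does not say how the sharing was removed. One general remedy for a stateful helper is to give each thread its own instance. The sketch below uses ThreadLocal for that, with a hypothetical StatefulMappingSketch class standing in for MappingSet. Creating a fresh instance per operator or per code-generation pass would work equally well.

```java
// Hypothetical stand-in for a stateful helper such as MappingSet: callers
// move a cursor in and out while generating code, so the object has state.
final class StatefulMappingSketch {
  private int depth;

  void enterChild() { depth++; }

  void exitChild() { depth--; }

  int depth() { return depth; }
}

// One instance per thread: calls made on one thread cannot disturb the
// cursor that another thread is using.
final class PerThreadMappings {
  private static final ThreadLocal<StatefulMappingSketch> MAPPINGS =
      ThreadLocal.withInitial(StatefulMappingSketch::new);

  private PerThreadMappings() {}

  static StatefulMappingSketch get() {
    return MAPPINGS.get();
  }
}
```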
DRILL-6386: Remove unused imports and star imports.

  1. … 231 more files in changeset.
DRILL-6389: Fixed building javadocs

- Added documentation about how to build javadocs

- Fixed some of the javadoc warnings

closes #1276

  1. … 65 more files in changeset.
DRILL-6333: Fixed Quotation marks

Initial step to making the source-code ready for Javadoc generation

This closes #1229

  1. … 6 more files in changeset.
DRILL-6180: Use System Option "output_batch_size" for External Sort

closes #1129

  1. … 3 more files in changeset.
DRILL-6138: Move RecordBatchSizer to org.apache.drill.exec.record package

This closes #1115

  1. … 10 more files in changeset.
DRILL-5730: Mock testing improvements and interface improvements

closes #1045

  1. … 221 more files in changeset.
DRILL-6049: Misc. hygiene and code cleanup changes

close apache/drill#1085

  1. … 118 more files in changeset.
DRILL-6030: Managed sort should minimize number of batches in a k-way merge

This closes #1075

  1. … 2 more files in changeset.
DRILL-6002: Avoid memory copy from direct buffer to heap while spilling to local disk

close apache/drill#1058

  1. … 4 more files in changeset.
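The idea behind DRILL-6002 can be sketched with plain NIO, assuming a java.nio direct ByteBuffer rather than Drill's own buffer classes. Writing the buffer to a FileChannel lets the channel consume the off-heap bytes directly (in OpenJDK, no heap copy is made for a direct buffer), whereas an OutputStream-based path first needs the data copied into a heap byte[]. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class DirectSpillSketch {
  private DirectSpillSketch() {}

  /**
   * Appends the remaining bytes of a (typically direct) buffer to a file
   * through a FileChannel, without an intermediate heap byte[] copy.
   */
  static long spill(ByteBuffer buffer, Path file) throws IOException {
    try (FileChannel channel = FileChannel.open(file,
        StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
      long written = 0;
      // A single write may be partial, so loop until the buffer is drained.
      while (buffer.hasRemaining()) {
        written += channel.write(buffer);
      }
      return written;
    }
  }
}
```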
DRILL-5783, DRILL-5841, DRILL-5894: Rationalize test temp directories

This change includes:

DRILL-5783:

- A unit test is created for the priority queue in the TopN operator.

- The code generation classes passed around a completely unused function registry reference in some places, so it was removed.

- The priority queue had unused parameters for some of its methods, so they were removed.

DRILL-5841:

- Created standardized temp directory classes DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher, and updated all unit tests to use them.

DRILL-5894:

- Removed the dfs_test storage plugin for tests and replaced it with the already existing dfs storage plugin.

Misc:

- General code cleanup.

- Removed unnecessary use of String.format in the tests.

This closes #984

  1. … 363 more files in changeset.
DRILL-5842: Refactor fragment, operator contexts

This closes #978

  1. … 26 more files in changeset.
DRILL-5808: Reduce memory allocator strictness for "managed" operators

closes #958

  1. … 6 more files in changeset.
DRILL-5694: Handle OOM in HashAggr by spill and retry, reserve memory, spinner

  1. … 20 more files in changeset.
DRILL-5443: Rollup of external sort fixes

- DRILL-5758: the “record batch sizer” did not handle repeated columns correctly.

- Enabled managed sort by default

- Fix check style warning

- Fix for DRILL-5670

Estimation for the size of a spill batch read from disk was off. For some reason, Drill needs an amount of memory 2x the data size. The previous estimate was 1.5x. That error, accumulated over 47 columns, was enough to cause an OOM.

- Code cleanup discovered during the investigation.

- Exception if reAlloc tries to double a zero-size vector

- DRILL-5804: Fixes issues with zero-length vector allocations.

- Better estimates array cardinality when it is fractional.

- Uses fractional cardinality to allocate new arrays.

- Prevents an infinite loop on reAlloc if the array starts empty.

- Fixed unit test issue

- Change batch size variables from int to long
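The zero-size reAlloc problem and the use of fractional cardinality can be sketched with a hypothetical growable byte buffer. If reAlloc simply doubles the current size, an empty buffer stays empty and a "grow until it fits" loop never terminates. This sketch starts from a minimum size instead; throwing an exception, as one of the bullets above describes, is the other option. MIN_ALLOCATION and all method names are assumptions, not Drill's vector API.

```java
import java.util.Arrays;

// Hypothetical growable buffer; not Drill's value vector API.
final class GrowableBufferSketch {
  // Hypothetical floor used when growing from an empty allocation.
  static final int MIN_ALLOCATION = 256;

  private byte[] data;
  private int reAllocCount;

  GrowableBufferSketch(int initialSize) {
    data = new byte[initialSize];
  }

  int capacity() { return data.length; }

  int reAllocCount() { return reAllocCount; }

  /** Doubles the capacity; an empty buffer jumps to MIN_ALLOCATION, since 0 * 2 == 0. */
  void reAlloc() {
    int newSize = data.length == 0 ? MIN_ALLOCATION : Math.multiplyExact(data.length, 2);
    data = Arrays.copyOf(data, newSize);
    reAllocCount++;
  }

  /** Grows until at least {@code needed} bytes fit; terminates even when starting empty. */
  void ensureCapacity(int needed) {
    while (data.length < needed) {
      reAlloc();
    }
  }

  /** Array slots for rowCount rows with a (possibly fractional) average cardinality, rounded up. */
  static int elementsFor(int rowCount, double avgCardinality) {
    return (int) Math.ceil(rowCount * avgCardinality);
  }
}
```

Rounding up in elementsFor means that, for example, 3 rows averaging 0.5 elements each still get 2 slots rather than 1.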

closes #932

  1. … 14 more files in changeset.
DRILL-4264: Allow field names to include dots

  1. … 98 more files in changeset.
DRILL-5457: Spill implementation for Hash Aggregate

closes #822

  1. … 35 more files in changeset.
DRILL-5325: Unit tests for the managed sort

Uses the sub-operator test framework (DRILL-5318), including the test row set abstraction (DRILL-5323) to enable unit testing of the “managed” external sort. This PR allows early review of the code, but cannot be pulled until the dependencies (mentioned above) are pulled.

Refactors the external sort code into small chunks that can be unit tested, then “wraps” that code in tests for all interesting data types, record batch sizes, and so on.

Refactors some of the operator definitions to more easily allow programmatic setup in the unit tests.

Fixes a number of bugs discovered by the unit tests. The biggest changes were in the new code: the code that computes spilling and merging based on memory levels.

Otherwise, although GitHub will show many changed files, most of the changes are simply moving blocks of code around to create smaller units that can be tested independently.

Includes a refactoring of the code that does spilling, along with a complete set of low-level unit tests.

Excludes long-running sort tests.

Defines a test category for long-running tests.

First attempt to provide a way to run such tests from Maven.

closes #808

    ./BaseSortWrapper.java: -0, +90
    ./BufferedBatches.java: -0, +232
    ./MergeSortWrapper.java: -0, +261
    ./PriorityQueueCopierTemplate.java: -41, +33
    ./PriorityQueueCopierWrapper.java: -0, +341
    ./SortConfig.java: -0, +121
  1. … 36 more files in changeset.
DRILL-5601: Rollup of external sort fixes and improvements

- DRILL-5513: Managed External Sort : OOM error during the merge phase

- DRILL-5519: Sort fails to spill and results in an OOM

- DRILL-5522: OOM during the merge and spill process of the managed external sort

- DRILL-5594: Excessive buffer reallocations during merge phase of external sort

- DRILL-5597: Incorrect "bits" vector allocation in nullable vectors allocateNew()

- DRILL-5602: Repeated List Vector fails to initialize the offset vector

- DRILL-5617: Spill file name collisions when spill file is on a shared file system

- DRILL-5445: bug in repeated map vector deserialization

- Workaround for DRILL-5656: Streaming Agg Batch forces sort to retain in-memory batches past NONE

- Fixes for the "record batch sizer" to handle UNION, MAP, and LIST types

Fixes a longstanding bug in the deserialization of a repeated map vector read from a spill file. Also includes a few minor code cleanups.

All of the bugs have to do with handling low-memory conditions, and with correctly estimating the sizes of vectors, even when those vectors come from the spill file or from an exchange. Hence, the changes for all of the above issues are interrelated.

Also includes some fixes for tests:

* Certain tests require the ability to enforce the output size of the memory merge/sort. Restored this option.

* Resolve issue with TestDrillbitResilience

* A particular test injects a fault during in-memory sort, but uses a single-batch input (which does not need the merge phase).

Rather than introduce a new config property (the earlier solution), altered the test to use input that returns more than one batch.

Two fixes for BasicPhysicalOpUnitTest, a test that uses JMockit to create a “fake” fragment context:

1. No drillbit endpoint is available, so the SpillSet change that adds the node to the spill path failed. The solution was to omit the node path segment in such tests.

2. The Dynamic UDF registry is null, causing a crash. This has nothing to do with sort and may be a pre-existing error. In any case, added a check for this condition.
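The SpillSet change mentioned in item 1, which adds the node to the spill path (apparently to avoid the shared-file-system collisions of DRILL-5617), can be sketched as follows. The directory layout and parameter names are hypothetical, not SpillSet's actual scheme: a node segment is added when one is known, so Drillbits sharing a file system write to different directories, and it is omitted when no endpoint exists, as in the mocked test context.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class SpillPathSketch {
  private SpillPathSketch() {}

  /**
   * Builds a hypothetical spill directory. A non-empty nodeId adds a
   * per-node segment; a null or empty nodeId (no endpoint) omits it.
   */
  static Path spillDir(String baseDir, String nodeId, String queryId,
                       int fragmentId, int operatorId) {
    Path dir = Paths.get(baseDir);
    if (nodeId != null && !nodeId.isEmpty()) {
      dir = dir.resolve(nodeId);
    }
    return dir.resolve(queryId + "_frag" + fragmentId + "_op" + operatorId);
  }
}
```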

close #860

    ./PriorityQueueCopierWrapper.java: -21, +32
  1. … 51 more files in changeset.
DRILL-5344: External sort priority queue copier fails with an empty batch

Unit tests showed that the “priority queue copier” does not handle an empty batch. This has not been an issue because code elsewhere in the sort specifically works around this issue. This fix resolves the issue at the source to avoid the need for future work-arounds.

closes #778
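A priority-queue k-way merge that tolerates empty inputs can be sketched in plain Java. This is a hypothetical model over sorted int arrays, not the actual PriorityQueueCopier, which works with record batches. The key point is the guard that skips empty batches when seeding the heap, so the merge never reads a first row that does not exist.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical k-way merge over sorted int "batches"; not Drill's copier.
final class KWayMergeSketch {
  private KWayMergeSketch() {}

  static List<Integer> merge(List<int[]> sortedBatches) {
    // Heap entries are {value, batchIndex, rowIndex}, ordered by value.
    PriorityQueue<int[]> heap = new PriorityQueue<>((x, y) -> Integer.compare(x[0], y[0]));
    for (int b = 0; b < sortedBatches.size(); b++) {
      int[] batch = sortedBatches.get(b);
      if (batch.length > 0) { // empty-batch guard: nothing to seed from this input
        heap.add(new int[] {batch[0], b, 0});
      }
    }
    List<Integer> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      int[] top = heap.poll();
      out.add(top[0]);
      int[] batch = sortedBatches.get(top[1]);
      int next = top[2] + 1;
      if (next < batch.length) {
        heap.add(new int[] {batch[next], top[1], next});
      }
    }
    return out;
  }
}
```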