DRILL-7436: Fix record count, vector structure issues in several operators

Adds additional vector checks to the BatchValidator.

Enables checking for the following operators:

* FilterRecordBatch

* PartitionLimitRecordBatch

* UnnestRecordBatch

* HashAggBatch

* RemovingRecordBatch

Fixes vector count issues for each of these.

Fixes empty-batch (record count = 0) handling in several of the above operators. Added a method to VectorContainer to correctly create an empty batch. (An empty batch, counter-intuitively, needs vectors allocated to hold the 0 value in the first position of each offset vector.)
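
The offset-vector point can be illustrated with a minimal sketch in plain Java. It does not use Drill's vector classes: `OffsetSketch`, `buildOffsets`, and `dataLength` are hypothetical names. It assumes the common layout in which a variable-width column of n values carries n + 1 offsets, so even zero values need one entry holding 0.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical illustration only; not Drill's VectorContainer or vector API. */
final class OffsetSketch {
  private OffsetSketch() {}

  /** Builds the offsets for the given values: n values yield n + 1 offsets. */
  static int[] buildOffsets(List<String> values) {
    // Java zero-fills the array, so offsets[0] is 0 even when there are no values.
    int[] offsets = new int[values.size() + 1];
    for (int i = 0; i < values.size(); i++) {
      int length = values.get(i).getBytes(StandardCharsets.UTF_8).length;
      offsets[i + 1] = offsets[i] + length;
    }
    return offsets;
  }

  /** Total data bytes described by the offsets; reads the last entry. */
  static int dataLength(int[] offsets) {
    return offsets[offsets.length - 1];
  }
}
```

A reader such as `dataLength` indexes the last offset, so it would fail on a zero-length array. Under the assumed layout, that is why an empty batch still needs its offset storage allocated.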

Disables verbose logging for MongoDB tests. Details are written to the log rather than the console.

Disables two invalid Mongo tests. See DRILL-7428.

Adjusts the expression tree materializer to not add the LATE type to Union vectors. (See DRILL-7435.)

Ensures that Union vectors contain valid vectors for each subtype. The present fix is a work-around; see DRILL-7434 for a better long-term fix.

Cleans up code formatting and other minor issues in each file touched during the fixes in this PR.

    • -21
    • +27
    ./managed/ExternalSortBatch.java
  1. … 36 more files in changeset.
DRILL-7011: Support schema in scan framework

* Adds schema support to the row set-based scan framework and to the "V3" text reader based on that framework.

* Adding the schema made clear that passing options as a long list of constructor arguments was not sustainable. Refactored code to use a builder pattern instead.

* Added support for default values in the "null column loader", which required adding a "setValue" method to the column accessors.

* Added unit tests for all new or changed functionality. See TestCsvWithSchema for the overall test of the entire integrated mechanism.

* Added tests for explicit projection with schema

* Better handling of date/time in column accessors

* Converted recent column metadata work from Java 8 date/time to Joda.

* Added more CSV-with-schema unit tests

* Removed the ID fields from "resolved columns", used "instanceof" instead.

* Added wildcard projection with an output schema. Handles both "lenient" and "strict" schemas.

* Tagged projection columns with their output schema, when available.

* Scan projection added modes for wildcard with an output schema. The reader projection added support for merging reader and output schemas.

* Includes refactoring of scan operator tests (the test file grew too large).

* Renamed some classes to avoid confusing reader schemas with output schemas.

* Added unit tests for the new functionality.

* Added "lenient" wildcard with schema test for CSV

* Added more type conversions: string-to-bit, many-to-string

* Fixed bug in column writer for VarDecimal

* Added missing unit tests, and fixed bugs, in Bit column reader/writer

* Cleaned up a number of unneeded "SuppressWarnings"

closes #1711
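
The move from long constructor argument lists to a builder, mentioned in the bullets above, might look roughly like the following. This is a generic sketch of the builder pattern, not code from the scan framework: `ScanOptionsSketch`, its fields, and the default of 4096 are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical options holder; the names do not come from the Drill code base. */
final class ScanOptionsSketch {
  private final List<String> projection;
  private final boolean strictSchema;
  private final int batchRowLimit;

  private ScanOptionsSketch(Builder b) {
    // Copy the list so later builder changes cannot alter this instance.
    this.projection = Collections.unmodifiableList(new ArrayList<>(b.projection));
    this.strictSchema = b.strictSchema;
    this.batchRowLimit = b.batchRowLimit;
  }

  List<String> projection() { return projection; }
  boolean strictSchema() { return strictSchema; }
  int batchRowLimit() { return batchRowLimit; }

  static Builder builder() { return new Builder(); }

  /** Collects options by name, so new options do not change any signature. */
  static final class Builder {
    private final List<String> projection = new ArrayList<>();
    private boolean strictSchema;      // defaults to lenient in this sketch
    private int batchRowLimit = 4096;  // arbitrary default for the sketch

    Builder project(String column) {
      projection.add(column);
      return this;
    }

    Builder strictSchema(boolean strict) {
      this.strictSchema = strict;
      return this;
    }

    Builder batchRowLimit(int limit) {
      if (limit <= 0) {
        throw new IllegalArgumentException("limit must be positive");
      }
      this.batchRowLimit = limit;
      return this;
    }

    ScanOptionsSketch build() { return new ScanOptionsSketch(this); }
  }
}
```

A caller would write something like `ScanOptionsSketch.builder().project("a").strictSchema(true).build()`, setting only the options it cares about and leaving the rest at their defaults.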

    • -1
    • +0
    ./managed/PriorityQueueCopierTemplate.java
    • -1
    • +0
    ./managed/PriorityQueueCopierWrapper.java
  1. … 218 more files in changeset.
DRILL-6724: Dump operator context to logs when error occurs during query execution

closes #1455

  1. … 97 more files in changeset.
DRILL-6422: Replace guava imports with shaded ones

    • -1
    • +1
    ./managed/PriorityQueueCopierWrapper.java
  1. … 973 more files in changeset.
DRILL-6566: Reduce Hash Agg Batch size and estimate when low available memory (#1438)

DRILL-6566: Reduce Hash Agg Batch size and estimate when mem available is low

  1. … 10 more files in changeset.
DRILL-6656: Disallow extra semicolons and multiple statements on the same line.

closes #1415

  1. … 144 more files in changeset.
DRILL-5365: Enforced DrillFileSystem immutability.

closes #1296

  1. … 2 more files in changeset.
DRILL-6516: Fix memory leak issue with Sort and StreamingAgg together

    • -26
    • +28
    ./managed/ExternalSortBatch.java
DRILL-6498: Support for EMIT outcome in ExternalSortBatch

* Updated TestTopNEmitOutcome to use RowSetComparison for comparing expected and actual output batches produced

closes #1323

    • -60
    • +181
    ./managed/ExternalSortBatch.java
    • -0
    • +30
    ./managed/PriorityQueueCopierWrapper.java
  1. … 4 more files in changeset.
DRILL-6512: Remove unnecessary processing overhead from RecordBatchSizer

closes #1341

  1. … 11 more files in changeset.
DRILL-6435: MappingSet is stateful, so it can't be shared between threads

    • -1
    • +1
    ./managed/PriorityQueueCopierWrapper.java
  1. … 7 more files in changeset.
DRILL-6386: Remove unused imports and star imports.

  1. … 231 more files in changeset.
DRILL-6389: Fixed building javadocs - Added documentation about how to build javadocs - Fixed some of the javadoc warnings

closes #1276

  1. … 64 more files in changeset.
DRILL-6333: Fixed Quotation marks

Initial step to making the source-code ready for Javadoc generation

This closes #1229

  1. … 6 more files in changeset.
DRILL-6320: Fixed license headers.

closes #1207

  1. … 2062 more files in changeset.
DRILL-6180: Use System Option "output_batch_size" for External Sort

closes #1129

  1. … 3 more files in changeset.
DRILL-6138: Move RecordBatchSizer to org.apache.drill.exec.record package

This closes #1115

  1. … 10 more files in changeset.
DRILL-5730: Mock testing improvements and interface improvements

closes #1045

  1. … 216 more files in changeset.
DRILL-6080: Sort incorrectly limits batch size to 65535 records

closes #1090

* Sort incorrectly limits batch size to 65535 records rather than 65536.

* This PR also includes a few code cleanup items.

* Fix for overflow in offset vector in row set writer

* Performance tool update

* Replace "unsafe" methods with "set" methods

* Also fixes an indexing issue with nullable writers

* Removed debug & timing code

* Increase strictness for batch size
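
The 65535-versus-65536 distinction is a classic off-by-one. The sketch below assumes the limit is tied to a two-byte unsigned record index, which this commit message does not state. `BatchLimitSketch` and its methods are hypothetical names.

```java
/** Hypothetical illustration of the off-by-one; not taken from the sort code. */
final class BatchLimitSketch {
  private BatchLimitSketch() {}

  /** A two-byte unsigned index covers 0..65535, which is 65536 distinct rows. */
  static final int MAX_ROWS = Character.MAX_VALUE + 1;

  /** Off-by-one variant: uses the largest index as if it were the row count. */
  static int clampOffByOne(int requested) {
    return Math.min(requested, Character.MAX_VALUE);
  }

  /** Corrected variant: the row count is the largest index plus one. */
  static int clamp(int requested) {
    return Math.min(requested, MAX_ROWS);
  }
}
```

The two variants differ only when the requested size reaches the limit, so a bug of this shape can stay hidden for smaller batches.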

  1. … 10 more files in changeset.
DRILL-6049: Misc. hygiene and code cleanup changes

close apache/drill#1085

    • -1
    • +1
    ./managed/PriorityQueueCopierWrapper.java
  1. … 118 more files in changeset.
DRILL-6030: Managed sort should minimize number of batches in a k-way merge

This closes #1075

  1. … 2 more files in changeset.
DRILL-6002: Avoid memory copy from direct buffer to heap while spilling to local disk

close apache/drill#1058

  1. … 4 more files in changeset.
DRILL-5783, DRILL-5841, DRILL-5894: Rationalize test temp directories

This change includes:

DRILL-5783:

- A unit test is created for the priority queue in the TopN operator.

- The code generation classes passed around a completely unused function registry reference in some places so it is removed.

- The priority queue had unused parameters for some of its methods so they are removed.

DRILL-5841:

- Created standardized temp directory classes DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher, and updated all unit tests to use them.

DRILL-5894:

- Removed the dfs_test storage plugin for tests and replaced it with the already existing dfs storage plugin.

Misc:

- General code cleanup.

- Removed unnecessary use of String.format in the tests.

This closes #984

    • -3
    • +1
    ./managed/PriorityQueueCopierWrapper.java
  1. … 362 more files in changeset.
DRILL-5842: Refactor fragment, operator contexts

This closes #978

    • -4
    • +6
    ./managed/PriorityQueueCopierWrapper.java
  1. … 24 more files in changeset.
DRILL-5808: Reduce memory allocator strictness for "managed" operators

closes #958

  1. … 6 more files in changeset.
DRILL-5694: Handle OOM in HashAggr by spill and retry, reserve memory, spinner

  1. … 20 more files in changeset.
DRILL-5443: Rollup of external sort fixes

- DRILL-5758: the “record batch sizer” did not handle repeated columns correctly.

- Enabled managed sort by default

- Fix check style warning

- Fix for DRILL-5670

Estimation for size of spill batch read from disk was off. For some reason, Drill needs an amount of memory 2x the data size. The previous estimate was 1.5x. That error, accumulated over 47 columns, was enough to cause an OOM.

- Code cleanup discovered during the investigation.

- Exception if reAlloc tries to double a zero-size vector

- DRILL-5804: Fixes issues with zero-length vector allocations.

- Estimates array cardinality better when it is fractional.

- Uses fractional cardinality to allocate new arrays.

- Prevents an infinite loop on reAlloc if the array starts empty.

- Fixed unit test issue

- Change batch size variables from int to long
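
To make the spill-batch estimate from the DRILL-5670 item concrete, here is a small arithmetic sketch. Only the 1.5x and 2x factors and the 47 columns come from the message above. The 1 MiB per column used in the usage note, and the names `SpillEstimateSketch`, `estimate`, and `shortfall`, are hypothetical.

```java
/** Hypothetical arithmetic illustration; not the sort's memory manager. */
final class SpillEstimateSketch {
  private SpillEstimateSketch() {}

  /** Memory budgeted for a batch holding dataBytes of data at the given factor. */
  static long estimate(long dataBytes, double factor) {
    return (long) Math.ceil(dataBytes * factor);
  }

  /** Bytes by which a 1.5x budget falls short of a 2x requirement. */
  static long shortfall(int columns, long bytesPerColumn) {
    long dataBytes = columns * bytesPerColumn;
    return estimate(dataBytes, 2.0) - estimate(dataBytes, 1.5);
  }
}
```

With 47 columns of 1 MiB each, `shortfall(47, 1L << 20)` is 24,641,536 bytes (23.5 MiB), half the data size. The gap grows with every column added.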

closes #932

    • -2
    • +4
    ./managed/PriorityQueueCopierWrapper.java
  1. … 14 more files in changeset.
DRILL-4264: Allow field names to include dots

    • -1
    • +1
    ./managed/PriorityQueueCopierWrapper.java
  1. … 97 more files in changeset.
DRILL-5457: Spill implementation for Hash Aggregate

closes #822

  1. … 35 more files in changeset.
DRILL-5601: Rollup of external sort fixes and improvements

- DRILL-5513: Managed External Sort : OOM error during the merge phase

- DRILL-5519: Sort fails to spill and results in an OOM

- DRILL-5522: OOM during the merge and spill process of the managed external sort

- DRILL-5594: Excessive buffer reallocations during merge phase of external sort

- DRILL-5597: Incorrect "bits" vector allocation in nullable vectors allocateNew()

- DRILL-5602: Repeated List Vector fails to initialize the offset vector

- DRILL-5617: Spill file name collisions when spill file is on a shared file system

- DRILL-5445: bug in repeated map vector deserialization

- Workaround for DRILL-5656: Streaming Agg Batch forces sort to retain in-memory batches past NONE

- Fixes for the "record batch sizer" to handle UNION, MAP, and LIST types

Fixes a longstanding bug in the deserialization of a repeated map vector read from a spill file. A few minor code cleanups also.

All of the bugs have to do with handling low-memory conditions, and with correctly estimating the sizes of vectors, even when those vectors come from the spill file or from an exchange. Hence, the changes for all of the above issues are interrelated.

Also includes some fixes for tests:

* Certain tests require the ability to enforce the output size of the memory merge/sort. Restored this option.

* Resolve issue with TestDrillbitResilience

* A particular test injects a fault during in-memory sort, but used a single-batch input (which does not need the merge phase.) Rather than introduce a new config property (the earlier solution), altered the test to use input that returns more than one batch.

Two fixes for BasicPhysicalOpUnitTest, a test that uses JMockit to create a “fake” fragment context:

1. No drillbit endpoint is available, so the SpillSet change that adds the node to the spill path failed. Solution was to omit the node path segment in such tests.

2. The Dynamic UDF registry is null, causing a crash. This has nothing to do with sort. Perhaps some pre-existing error? Anyway, added a check for this condition.

close #860

    • -1
    • +0
    ./managed/PriorityQueueCopierTemplate.java
    • -21
    • +32
    ./managed/PriorityQueueCopierWrapper.java
    • -121
    • +402
    ./managed/SortMemoryManager.java
  1. … 50 more files in changeset.