Clone Tools
  • last updated 20 mins ago
Constraints: committers
Constraints: files
Constraints: dates
DRILL-5730: Mock testing improvements and interface improvements

closes #1045

  1. … 222 more files in changeset.
DRILL-3993: Changes after code review.

  1. … 4 more files in changeset.
DRILL-3993: Changes to support Calcite 1.15.

Fix AssertionError: type mismatch for tests with aggregate functions.

Fix VARIANCE agg function

Remove using deprecated Subtype enum

Fix 'Failure while loading table a in database hbase' error

Fix 'Field ordinal 1 is invalid for type '(DrillRecordRow[*])'' unit test failures

  1. … 17 more files in changeset.
DRILL-5896: Handle HBase columns vector creation in the HBaseRecordReader

closes #1005

DRILL-5865: fix build

closes #986

DRILL-5743: Handling column family and column scan for hbase

closes #975

  1. … 4 more files in changeset.
DRILL-5775: Select * query on a maprdb binary table fails

- The common HBase/MapR-DB_binary verifyColumns() method;

- MapRDBBinaryTable is introduced for the purpose of expanding the wildcard on the planning stage;

- AbstractHBaseDrillTable class for MapRDBBinaryTable and DrillHBaseTable.

closes #973

    • -0
    • +79
    • -1
    • +26
  1. … 4 more files in changeset.
DRILL-5830: Resolve regressions to MapR DB from DRILL-5546

- Back out HBase changes

- Code cleanup

- Test utilities

- Fix for DRILL-5829

closes #968

  1. … 17 more files in changeset.
DRILL-4264: Allow field names to include dots

  1. … 97 more files in changeset.
DRILL-5546: Handle schema change exception failure caused by empty input or empty batch.

1. Modify ScanBatch's logic when it iterates list of RecordReader.

1) Skip RecordReader if it returns 0 row && present same schema. A new schema (by calling Mutator.isNewSchema() ) means either a new top level field is added, or a field in a nested field is added, or an existing field type is changed.

2) Implicit columns are presumed to have constant schema, and are added to outgoing container before any regular column is added in.

3) ScanBatch will return NONE directly (called as "fast NONE"), if all its RecordReaders haver empty input and thus are skipped, in stead of returing OK_NEW_SCHEMA first.

2. Modify IteratorValidatorBatchIterator to allow

1) fast NONE ( before seeing a OK_NEW_SCHEMA)

2) batch with empty list of columns.

2. Modify JsonRecordReader when it get 0 row. Do not insert a nullable-int column for 0 row input. Together with ScanBatch, Drill will skip empty json files.

3. Modify binary operators such as join, union to handle fast none for either one side or both sides. Abstract the logic in AbstractBinaryRecordBatch, except for MergeJoin as its implementation is quite different from others.

4. Fix and refactor union all operator.

1) Correct union operator hanndling 0 input rows. Previously, it will ignore inputs with 0 row and put nullable-int into output schema, which causes various of schema change issue in down-stream operator. The new behavior is to take schema with 0 into account

in determining the output schema, in the same way with > 0 input rows. By doing that, we ensure Union operator will not behave like a schema-lossy operator.

2) Add a UnionInputIterator to simplify the logic to iterate the left/right inputs, removing significant chunk of duplicate codes in previous implementation.

The new union all operator reduces the code size into half, comparing the old one.

5. Introduce UntypedNullVector to handle convertFromJson() function, when the input batch contains 0 row.

Problem: The function convertFromJSon() is different from other regular functions in that it only knows the output schema after evaluation is performed. When input has 0 row, Drill essentially does not have

a way to know the output type, and previously will assume Map type. That works under the assumption other operators like Union would ignore batch with 0 row, which is no longer

the case in the current implementation.

Solution: Use MinorType.NULL at the output type for convertFromJSON() when input contains 0 row. The new UntypedNullVector is used to represent a column with MinorType.NULL.

6. HBaseGroupScan convert star column into list of row_key and column family. HBaseRecordReader should reject column star since it expectes star has been converted somewhere else.

In HBase a column family always has map type, and a non-rowkey column always has nullable varbinary type, this ensures that HBaseRecordReader across different HBase regions will have the same top level schema, even if the region is

empty or prune all the rows due to filter pushdown optimization. In other words, we will not see different top level schema from different HBaseRecordReader for the same table.

However, such change will not be able to handle hard schema change : c1 exists in cf1 in one region, but not in another region. Further work is required to handle hard schema change.

7. Modify scan cost estimation when the query involves * column. This is to remove the planning randomness since previously two different operators could have same cost.

8. Add a new flag 'outputProj' to Project operator, to indicate if Project is for the query's final output. Such Project is added by TopProjectVisitor, to handle fast NONE when all the inputs to the query are empty

and are skipped.

1) column star is replaced with empty list

2) regular column reference is replaced with nullable-int column

3) An expression will go through ExpressionTreeMaterializer, and use the type of materialized expression as the output type

4) Return an OK_NEW_SCHEMA with the schema using the above logic, then return a NONE to down-stream operator.

9. Add unit test to test operators handling empty input.

10. Add unit test to test query when inputs are all empty.

DRILL-5546: Revise code based on review comments.

Handle implicit column in scan batch. Change interface in ScanBatch's constructor.

1) Ensure either the implicit column list is empty, or all the reader has the same set of implicit columns.

2) We could skip the implicit columns when check if there is a schema change coming from record reader.

3) ScanBatch accept a list in stead of iterator, since we may need go through the implicit column list multiple times, and verify the size of two lists are same.

ScanBatch code review comments. Add more unit tests.

Share code path in ProjectBatch to handle normal setupNewSchema() and handleNullInput().

- Move SimpleRecordBatch out of TopNBatch to make it sharable across different places.

- Add Unit test verify schema for star column query against multilevel tables.

Unit test framework change

- Fix memory leak in unit test framework.

- Allow SchemaTestBuilder to pass in BatchSchema.

close #906

  1. … 65 more files in changeset.
DRILL-5516: Limit memory usage for Hbase reader

close apache/drill#839

DRILL-4963: Fix issues with dynamically loaded overloaded functions

close #701

  1. … 26 more files in changeset.
DRILL-4199: Add Support for HBase 1.X

Highlights of the changes:

* Replaced the old HBase APIs (HBaseAdmin/HTable) with the new HBase 1.1 APIs (Connection/Admin/Table).

* Added HBaseConnectionManager class which which manages the life-cycle of HBase connections inside a Drillbit process.

* Updated HBase dependencies version to 1.1.3 and 1.1.1-mapr-1602-m7-5.1.0 for default and "mapr" profiles respectively.

* Added `commons-logging` dependency in the `provided` scope to allow HBase test cluster to come up for Unit tests.

* Relaxed banned dependency rule for `commons-logging` library for `storage-hbase` module alone, in provided scope only.

* Removed the use of many deprecated APIs throughout the modules code.

* Added some missing test to HBase storage plugin's test suit.

* Move the GuavaPatcher code to main code execution path.

* Log a message if GuavaPatcher fails instead of exiting.

All unit tests are green.

Closes #443

    • -0
    • +109
  1. … 26 more files in changeset.
DRILL-4275: create TransientStore for short-lived objects; refactor PersistentStore to introduce pagination mechanism

    • -0
    • +238
  1. … 97 more files in changeset.
DRILL-4387: GroupScan or ScanBatchCreator should not use star column in case of skipAll query.

The skipAll query should be handled in RecordReader.

  1. … 5 more files in changeset.
DRILL-4382: Remove dependency on drill-logical from vector package

  1. … 80 more files in changeset.
DRILL-4198: Enhance StoragePlugin interface to expose logical space rules for planning purpose

Also move Hive partition pruning rules to logical storage plugin rulesets.

this closes #300

  1. … 9 more files in changeset.
DRILL-4020: The not-equal operator returns incorrect results when used on the HBase row key

- Added a condition that checks if the filter to the scan specification doesn't have NOT_EQUAL operator

- Added testFilterPushDownRowKeyNotEqual() to TestHBaseFilterPushDown

This closes #309

  1. … 1 more file in changeset.
DRILL-2288: Fix ScanBatch violation of IterOutcome protocol and downstream chain of bugs.


2288: Pt. 1 Core: Added unit test. [Drill2288GetColumnsMetadataWhenNoRowsTest, empty.json]

2288: Pt. 1 Core: Changed HBase test table #1's # of regions from 1 to 2. [HBaseTestsSuite]

Also added TODO(DRILL-3954) comment about # of regions.

2288: Pt. 2 Core: Documented IterOutcome much more clearly. [RecordBatch]

Also edited some related Javadoc.

2288: Pt. 2 Hyg.: Edited doc., added @Override, etc. [AbstractRecordBatch, RecordBatch]

Purged unused SetupOutcome.

Added @Override.

Edited comments.

Fix some comments to doc. comments.

2288: Pt. 3 Core&Hyg.: Added validation of IterOutcome sequence. [IteratorValidatorBatchIterator]


Renamed internal members for clarity.

Added comments.

2288: Pt. 4 Core: Fixed a NONE -> OK_NEW_SCHEMA in [ScanBatch]

(With nearby comments.)

2288: Pt. 4 Hyg.: Edited comments, reordered, whitespace. [ScanBatch]


Added comments.


2288: Pt. 4 Core+: Fixed UnionAllRecordBatch to receive IterOutcome sequence right. (3659) [UnionAllRecordBatch]

2288: Pt. 5 Core: Fixed ScanBatch.Mutator.isNewSchema() to stop spurious "new schema" reports (fix short-circuit OR, to call resetting method right). [ScanBatch]

2288: Pt. 5 Hyg.: Renamed, edited comments, reordered. [ScanBatch, SchemaChangeCallBack, AbstractSingleRecordBatch]

Renamed getSchemaChange -> getSchemaChangedAndReset.

Renamed schemaChange -> schemaChanged.

Added doc. comments.


2288: Pt. 6 Core: Avoided dummy Null.IntVec. column in JsonReader when not needed (MapWriter.isEmptyMap()). [JsonReader, 3 vector files]

2288: Pt. 6 Hyg.: Edited comments, message. Fixed message formatting. [RecordReader, JSONFormatPlugin, JSONRecordReader, AbstractMapVector, JsonReader]

Fixed message formatting.

Edited comments.

Edited message.

Fixed spurious line break.

2288: Pt. 7 Core: Added column families in HBaseRecordReader* to avoid dummy Null.IntVec. clash. [HBaseRecordReader]

2288: Pt. 8 Core.1: Cleared recordCount in OrderedPartitionRecordBatch.innerNext(). [OrderedPartitionRecordBatch]

2288: Pt. 8 Core.2: Cleared recordCount in ProjectRecordBatch.innerNext. [ProjectRecordBatch]

2288: Pt. 8 Core.3: Cleared recordCount in TopNBatch.innerNext. [TopNBatch]

2288: Pt. 9 Core: Had UnorderedReceiverBatch reset RecordBatchLoader's record count. [UnorderedReceiverBatch, RecordBatchLoader]

2288: Pt. 9 Hyg.: Added comments. [RecordBatchLoader]

2288: Pt. 10 Core: Worked around mismatched map child vectors in MapVector.getObject(). [MapVector]

2288: Pt. 11 Core: Added OK_NEW_SCHEMA schema comparison for HashAgg. [HashAggTemplate]

2288: Pt. 12 Core: Fixed memory leak in BaseTestQuery's printing.

Fixed bad skipping of RecordBatchLoader.clear(...) and

QueryDataBatch.load(...) for zero-row batches in printResult(...).

Also, dropped suppression of call to

VectorUtil.showVectorAccessibleContent(...) (so zero-row batches are

as visible as others).

2288: Pt. 13 Core: Fixed test that used unhandled periods in column alias identifiers.

2288: Misc.: Added # of rows to showVectorAccessibleContent's output. [VectorUtil]

2288: Misc.: Added simple/partial toString() [VectorContainer, AbstractRecordReader, JSONRecordReader, BaseValueVector, FieldSelection, AbstractBaseWriter]

2288: Misc. Hyg.: Added doc. comments to VectorContainer. [VectorContainer]

2288: Misc. Hyg.: Edited comment. [DrillStringUtils]

2288: Misc. Hyg.: Clarified message for unhandled identifier containing period.

2288: Pt. 3 Core&Hyg. Upd.: Added schema comparison result to logging. [IteratorValidatorBatchIterator]

2288: Pt. 7 Core Upd.: Handled HBase columns too re NullableIntVectors. [HBaseRecordReader, TestTableGenerator, TestHBaseFilterPushDown]

Created map-child vectors for requested columns.

Added unit test method testDummyColumnsAreAvoided, adding new row to test table,

updated some row counts.

2288: Pt. 7 Hyg. Upd.: Edited comment. [HBaseRecordReader]

2288: Pt. 11 Core Upd.: REVERTED all of bad OK_NEW_SCHEMA schema comparison for HashAgg. [HashAggTemplate]

This reverts commit 0939660f4620c03da97f4e1bf25a27514e6d0b81.

2288: Pt. 6 Core Upd.: Added isEmptyMap override in new (just-rebased-in) PromotableWriter. [PromotableWriter]

Adjusted definition and default implementation of isEmptyMap (to handle MongoDB

storage plugin's use of JsonReader).

2288: Pt. 6 Hyg. Upd.: Purged old atLeastOneWrite flag. [JsonReader]

2288: Pt. 14: Disabled newly dying test testNestedFlatten().

  1. … 39 more files in changeset.
DRILL-2583, DRILL-3428: Catch exceptions, and throw UserException#dataReadError with more context. This closes #161

+ Added convenient method to UserException for String.format(...)

  1. … 2 more files in changeset.
DRILL-3581: Upgrade to Guava 18.0

- Replace Stopwatch constructors with .createStarted() or .createUnstarted()

- Stop using InputSupplier and Closeables.closeQuietly

- Clean up quiet closes to log or (preferably) propagate.

- Add log4j to enforcer exclusions.

- Update HBaseTestSuite to add patching of Closeables.closeQuietly() and Stopwatch legacy methods. Only needed when running HBaseMiniCluster.

- Remove log4j from HBase's pom to provide exception logging.

- Remove log4j from Hive's shaded pom.

- Update Catastrophic failures to use the same pattern to ensure reporting.

- Update test framework to avoid trying IPv6 resolution. (This removes 90s pause from HBase startup in my tests)

This closes #361.

This closes #157.

  1. … 64 more files in changeset.
DRILL-1942-readers: - add extends AutoCloseable to RecordReader, and rename cleanup() to close(). - fix many warnings - formatting fixes


- renamed cleanup() to close in the new JdbcRecordReader

Close apache/drill#154

  1. … 15 more files in changeset.
DRILL-4028: Update Drill to leverage latest version of Parquet library.

- Remove references to the shaded version of a Jackson @JsonCreator annotation from parquet, replace with proper fasterxml version.

- Fixing imports using the wrong parquet packages after rebase.

- Fixing issues with Drill parquet read a write path after merging the Drill parquet fork back into mainline.

- Fixed the issue with the writer, needed to flush the RecordConsumer in the ParquetRecordWriter.

- Consolidate page reading code

- Added some test to print out some additional context when an ordered comparison of two datasets fails in a test.

- Fix up parquet API usage in Hive Module.

- Adding unit test to read a write all types in parquet, the decimal types and interval year have some issues.

- Use direct codec factory from new package in the parquet library now that it has been moved.

- Moving the test for Direct Codec Factory out of the Drill source as the class itself has been moved.

- Small fix after consolidating two different ByteBuffer based implementations of BytesInput.

- Small fixes to accommodate interface changes.

- Small changes to remove direct references to DirectCodecFactory, this class is not accessible outside of parquet, but an instance with the same contract is now accessible with a new factory method on CodecFactory.

- Fixed failing test using miniDFS when reading a larger parquet file.

This closes #236

  1. … 55 more files in changeset.
DRILL-3621: Fix incorrect result if HBase filter contains row_key "or" filter or in list filter

Add unit test for row_key "or" filter and row_key in list filter.

Modify expected results for couple of existing unit tests, by specifying more strict regex pattern.

Add one row in Hbase test table, per review comment.

  1. … 4 more files in changeset.
DRILL-3492: Add support for encoding/decoding of to/from OrderedBytes format


This change allows encoding/decoding of data from/to 'double', 'float',

'bigint', 'int' and 'utf8' data types to/from OrderedBytes format.

It also allows for OrderedByte encoded row-keys to be stored in

ascending as well as descending order.

The following JIRA added the OrderedBytes encoding to HBase:

This encoding scheme will preserve the sort-order of the native

data-type when it is stored as sorted byte arrays on disk.

Thus, it will help the HBase storage plugin if the row-keys have been

encoded in OrderedBytes format.

This functionality allows us to prune the scan ranges, thus reading much

lesser data from the server.

Testing Done:

Added a new unit-test class which

derives from class.

Also add new test cases to TestHBaseFilterPushDown class that will test

if we were able to push-down filters correctly and if the results are


DRILL-3492 - * Remove repeated allocations of byte arrays and PositionedByteRange objects

on heap(as suggested by Jason).

* Remove OrderedBytes encode/decode operations on UTF8 types.

Reasons -

1. These operations are slow and incur a lot of heap allocations

2. UTF8 types maintain their natural sort order when stored as binary arrays.

DRILL-3492 - Remove test code that creates test tables with UTF8 OrderedByte encoding.

  1. … 5 more files in changeset.
DRILL-3364: Prune scan range if the filter is on the leading field with byte comparable encoding

The change adds support to perform row-key range pruning when the row-key prefix


UINT8_BE encoded.

Testing Done: Added a unit-tests for the new feature, also ran all existing

unit-tests to make sure there is no regression.

  1. … 11 more files in changeset.
DRILL-1651: Push filter past project with ITEM operator into HBase scan.

  1. … 1 more file in changeset.
DRILL-3500: Add OptimizerRulesContext which exposes information needed by storage plugin specific optimizer rules Add FunctionLookupContext to enable materializing function calls without having access to the entire function registry

  1. … 15 more files in changeset.
DRILL-2006: Updated Text reader. Increases variations of text files Drill can work with.

Text reader is heavily inspired by uniVocity parser although it is now byte based and customized for Drill's memory representations.

Also updated the RecordReader interface so that OperatorContext is presented at setup time rather than being a separate call.

  1. … 43 more files in changeset.
DRILL-2848: Provide option to disable decimal data type Disable casting to decimal Disable reading decimal from Parquet Disable reading decimal from Hive Add unit tests Modify existing tests to enable decimal data type

  1. … 38 more files in changeset.