DRILL-5516: Limit memory usage for HBase reader

close apache/drill#839

DRILL-4963: Fix issues with dynamically loaded overloaded functions

close #701

  1. … 26 more files in changeset.
DRILL-4199: Add Support for HBase 1.X

Highlights of the changes:

* Replaced the old HBase APIs (HBaseAdmin/HTable) with the new HBase 1.1 APIs (Connection/Admin/Table).

* Added HBaseConnectionManager class which manages the life-cycle of HBase connections inside a Drillbit process.

* Updated HBase dependencies version to 1.1.3 and 1.1.1-mapr-1602-m7-5.1.0 for default and "mapr" profiles respectively.

* Added `commons-logging` dependency in the `provided` scope to allow HBase test cluster to come up for Unit tests.

* Relaxed banned dependency rule for `commons-logging` library for `storage-hbase` module alone, in provided scope only.

* Removed the use of many deprecated APIs throughout the module's code.

* Added some missing tests to the HBase storage plugin's test suite.

* Move the GuavaPatcher code to main code execution path.

* Log a message if GuavaPatcher fails instead of exiting.

All unit tests are green.

Closes #443

    • -0
    • +109
    ./apache/drill/exec/store/hbase/HBaseConnectionManager.java
  1. … 26 more files in changeset.
DRILL-4275: create TransientStore for short-lived objects; refactor PersistentStore to introduce pagination mechanism

    • -0
    • +238
    ./apache/drill/exec/store/hbase/config/HBasePersistentStore.java
  1. … 97 more files in changeset.
DRILL-4387: GroupScan or ScanBatchCreator should not use star column in case of skipAll query.

The skipAll query should be handled in RecordReader.

  1. … 5 more files in changeset.
DRILL-4382: Remove dependency on drill-logical from vector package

  1. … 80 more files in changeset.
DRILL-4198: Enhance StoragePlugin interface to expose logical space rules for planning purpose

Also move Hive partition pruning rules to logical storage plugin rulesets.

this closes #300

  1. … 9 more files in changeset.
DRILL-4020: The not-equal operator returns incorrect results when used on the HBase row key

- Added a condition that checks that the filter pushed into the scan specification does not use the NOT_EQUAL operator

- Added testFilterPushDownRowKeyNotEqual() to TestHBaseFilterPushDown

This closes #309

  1. … 1 more file in changeset.
DRILL-2288: Fix ScanBatch violation of IterOutcome protocol and downstream chain of bugs.

Increments:

2288: Pt. 1 Core: Added unit test. [Drill2288GetColumnsMetadataWhenNoRowsTest, empty.json]

2288: Pt. 1 Core: Changed HBase test table #1's # of regions from 1 to 2. [HBaseTestsSuite]

Also added TODO(DRILL-3954) comment about # of regions.

2288: Pt. 2 Core: Documented IterOutcome much more clearly. [RecordBatch]

Also edited some related Javadoc.

2288: Pt. 2 Hyg.: Edited doc., added @Override, etc. [AbstractRecordBatch, RecordBatch]

Purged unused SetupOutcome.

Added @Override.

Edited comments.

Fix some comments to doc. comments.

2288: Pt. 3 Core&Hyg.: Added validation of IterOutcome sequence. [IteratorValidatorBatchIterator]

Also:

Renamed internal members for clarity.

Added comments.
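
The idea of validating an outcome sequence can be sketched as a small state machine. This is only an illustration under assumed, simplified rules (a schema must be reported before any data, and nothing may follow the terminal NONE); the `Outcome` enum and `SequenceValidator` class are hypothetical and are not Drill's IteratorValidatorBatchIterator.

```java
import java.util.List;

// Hypothetical, reduced outcome set; Drill's real IterOutcome has more values.
enum Outcome { OK_NEW_SCHEMA, OK, NONE }

class SequenceValidator {
    private boolean sawSchema;
    private boolean finished;

    // Throws IllegalStateException on the first outcome that breaks the
    // assumed rules: a schema must precede data, and nothing may follow NONE.
    void accept(Outcome outcome) {
        if (finished) {
            throw new IllegalStateException(outcome + " after NONE");
        }
        switch (outcome) {
            case OK_NEW_SCHEMA:
                sawSchema = true;
                break;
            case OK:
                if (!sawSchema) {
                    throw new IllegalStateException("OK before OK_NEW_SCHEMA");
                }
                break;
            case NONE:
                finished = true;
                break;
        }
    }

    // Convenience wrapper: true if the whole sequence passes validation.
    static boolean isValid(List<Outcome> sequence) {
        SequenceValidator validator = new SequenceValidator();
        try {
            for (Outcome outcome : sequence) {
                validator.accept(outcome);
            }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

Under these assumed rules, a sequence such as NONE followed by OK_NEW_SCHEMA is rejected at the second outcome.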

2288: Pt. 4 Core: Fixed a NONE -> OK_NEW_SCHEMA in ScanBatch.next(). [ScanBatch]

(With nearby comments.)

2288: Pt. 4 Hyg.: Edited comments, reordered, whitespace. [ScanBatch]

Reordered

Added comments.

Aligned.

2288: Pt. 4 Core+: Fixed UnionAllRecordBatch to receive IterOutcome sequence right. (3659) [UnionAllRecordBatch]

2288: Pt. 5 Core: Fixed ScanBatch.Mutator.isNewSchema() to stop spurious "new schema" reports (fix short-circuit OR, to call resetting method right). [ScanBatch]
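
The short-circuit problem described in Pt. 5 can be reproduced in isolation. The sketch below is not Drill's code; `ChangeFlag`, `getAndReset`, and both `isNewSchema*` methods are hypothetical names chosen for illustration. With `||`, the right-hand operand is skipped whenever the left-hand one is true, so a flag that should have been reset stays set and is reported again later.

```java
// Hypothetical stand-in for a callback that records a schema change
// and clears the record when it is read.
class ChangeFlag {
    private boolean changed;

    void mark() { changed = true; }

    // Returns the current flag and resets it (a read with a side effect).
    boolean getAndReset() {
        boolean result = changed;
        changed = false;
        return result;
    }
}

class ShortCircuitDemo {
    // Buggy variant: when localChange is true, || skips getAndReset(),
    // so the flag stays set and is reported again on the next call.
    static boolean isNewSchemaBuggy(boolean localChange, ChangeFlag flag) {
        return localChange || flag.getAndReset();
    }

    // Fixed variant: call the resetting method first, unconditionally.
    static boolean isNewSchemaFixed(boolean localChange, ChangeFlag flag) {
        boolean callbackChange = flag.getAndReset();
        return localChange || callbackChange;
    }

    public static void main(String[] args) {
        ChangeFlag a = new ChangeFlag();
        a.mark();
        isNewSchemaBuggy(true, a);
        // No real change happened since the last call, yet one is reported.
        System.out.println("buggy second call: " + isNewSchemaBuggy(false, a));

        ChangeFlag b = new ChangeFlag();
        b.mark();
        isNewSchemaFixed(true, b);
        System.out.println("fixed second call: " + isNewSchemaFixed(false, b));
    }
}
```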

2288: Pt. 5 Hyg.: Renamed, edited comments, reordered. [ScanBatch, SchemaChangeCallBack, AbstractSingleRecordBatch]

Renamed getSchemaChange -> getSchemaChangedAndReset.

Renamed schemaChange -> schemaChanged.

Added doc. comments.

Aligned.

2288: Pt. 6 Core: Avoided dummy Null.IntVec. column in JsonReader when not needed (MapWriter.isEmptyMap()). [JsonReader, 3 vector files]

2288: Pt. 6 Hyg.: Edited comments, message. Fixed message formatting. [RecordReader, JSONFormatPlugin, JSONRecordReader, AbstractMapVector, JsonReader]

Fixed message formatting.

Edited comments.

Edited message.

Fixed spurious line break.

2288: Pt. 7 Core: Added column families in HBaseRecordReader* to avoid dummy Null.IntVec. clash. [HBaseRecordReader]

2288: Pt. 8 Core.1: Cleared recordCount in OrderedPartitionRecordBatch.innerNext(). [OrderedPartitionRecordBatch]

2288: Pt. 8 Core.2: Cleared recordCount in ProjectRecordBatch.innerNext. [ProjectRecordBatch]

2288: Pt. 8 Core.3: Cleared recordCount in TopNBatch.innerNext. [TopNBatch]

2288: Pt. 9 Core: Had UnorderedReceiverBatch reset RecordBatchLoader's record count. [UnorderedReceiverBatch, RecordBatchLoader]

2288: Pt. 9 Hyg.: Added comments. [RecordBatchLoader]

2288: Pt. 10 Core: Worked around mismatched map child vectors in MapVector.getObject(). [MapVector]

2288: Pt. 11 Core: Added OK_NEW_SCHEMA schema comparison for HashAgg. [HashAggTemplate]

2288: Pt. 12 Core: Fixed memory leak in BaseTestQuery's printing.

Fixed bad skipping of RecordBatchLoader.clear(...) and

QueryDataBatch.load(...) for zero-row batches in printResult(...).

Also, dropped suppression of call to

VectorUtil.showVectorAccessibleContent(...) (so zero-row batches are

as visible as others).

2288: Pt. 13 Core: Fixed test that used unhandled periods in column alias identifiers.

2288: Misc.: Added # of rows to showVectorAccessibleContent's output. [VectorUtil]

2288: Misc.: Added simple/partial toString() [VectorContainer, AbstractRecordReader, JSONRecordReader, BaseValueVector, FieldSelection, AbstractBaseWriter]

2288: Misc. Hyg.: Added doc. comments to VectorContainer. [VectorContainer]

2288: Misc. Hyg.: Edited comment. [DrillStringUtils]

2288: Misc. Hyg.: Clarified message for unhandled identifier containing period.

2288: Pt. 3 Core&Hyg. Upd.: Added schema comparison result to logging. [IteratorValidatorBatchIterator]

2288: Pt. 7 Core Upd.: Handled HBase columns too re NullableIntVectors. [HBaseRecordReader, TestTableGenerator, TestHBaseFilterPushDown]

Created map-child vectors for requested columns.

Added unit test method testDummyColumnsAreAvoided, adding new row to test table,

updated some row counts.

2288: Pt. 7 Hyg. Upd.: Edited comment. [HBaseRecordReader]

2288: Pt. 11 Core Upd.: REVERTED all of bad OK_NEW_SCHEMA schema comparison for HashAgg. [HashAggTemplate]

This reverts commit 0939660f4620c03da97f4e1bf25a27514e6d0b81.

2288: Pt. 6 Core Upd.: Added isEmptyMap override in new (just-rebased-in) PromotableWriter. [PromotableWriter]

Adjusted definition and default implementation of isEmptyMap (to handle MongoDB

storage plugin's use of JsonReader).

2288: Pt. 6 Hyg. Upd.: Purged old atLeastOneWrite flag. [JsonReader]

2288: Pt. 14: Disabled newly dying test testNestedFlatten().

  1. … 39 more files in changeset.
DRILL-2583, DRILL-3428: Catch exceptions, and throw UserException#dataReadError with more context. This closes #161

+ Added convenient method to UserException for String.format(...)

  1. … 2 more files in changeset.
DRILL-3581: Upgrade to Guava 18.0

- Replace Stopwatch constructors with .createStarted() or .createUnstarted()

- Stop using InputSupplier and Closeables.closeQuietly

- Clean up quiet closes to log or (preferably) propagate.

- Add log4j to enforcer exclusions.

- Update HBaseTestSuite to add patching of Closeables.closeQuietly() and Stopwatch legacy methods. Only needed when running HBaseMiniCluster.

- Remove log4j from HBase's pom to provide exception logging.

- Remove log4j from Hive's shaded pom.

- Update Catastrophic failures to use the same pattern to ensure reporting.

- Update test framework to avoid trying IPv6 resolution. (This removes 90s pause from HBase startup in my tests)

This closes #361.

This closes #157.

  1. … 64 more files in changeset.
DRILL-1942-readers:

- add extends AutoCloseable to RecordReader, and rename cleanup() to close().

- fix many warnings

- formatting fixes

DRILL-1942-readers:

- renamed cleanup() to close in the new JdbcRecordReader

Close apache/drill#154

  1. … 15 more files in changeset.
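
What the AutoCloseable change in DRILL-1942 enables can be shown with a minimal sketch. `SimpleReader`, `CountingReader`, and `ReaderDemo` are hypothetical names, not Drill classes, and narrowing `close()` so that it declares no checked exception is a choice made for this example only.

```java
// Hypothetical reader interface; Drill's RecordReader has more methods.
interface SimpleReader extends AutoCloseable {
    int next();      // number of records read, 0 when exhausted

    @Override
    void close();    // narrowed for this sketch: no checked exception
}

class CountingReader implements SimpleReader {
    private int remaining;
    boolean closed;

    CountingReader(int batches) { this.remaining = batches; }

    @Override
    public int next() { return remaining-- > 0 ? 1 : 0; }

    @Override
    public void close() { closed = true; }
}

class ReaderDemo {
    // Reads every batch; try-with-resources calls close() even if next() throws.
    static int drain(CountingReader reader) {
        int total = 0;
        try (SimpleReader r = reader) {
            int n;
            while ((n = r.next()) > 0) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        CountingReader reader = new CountingReader(3);
        System.out.println(drain(reader) + " records, closed=" + reader.closed);
    }
}
```

With a separate cleanup() method, the caller has to remember a finally block; with close() on an AutoCloseable, the language construct handles it.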
DRILL-4028: Update Drill to leverage latest version of Parquet library.

- Remove references to the shaded version of a Jackson @JsonCreator annotation from parquet, replace with proper fasterxml version.

- Fixing imports using the wrong parquet packages after rebase.

- Fixing issues with Drill parquet read and write paths after merging the Drill parquet fork back into mainline.

- Fixed the issue with the writer, needed to flush the RecordConsumer in the ParquetRecordWriter.

- Consolidate page reading code

- Added some test code to print out additional context when an ordered comparison of two datasets fails in a test.

- Fix up parquet API usage in Hive Module.

- Adding unit test to read and write all types in parquet; the decimal types and interval year have some issues.

- Use direct codec factory from new package in the parquet library now that it has been moved.

- Moving the test for Direct Codec Factory out of the Drill source as the class itself has been moved.

- Small fix after consolidating two different ByteBuffer based implementations of BytesInput.

- Small fixes to accommodate interface changes.

- Small changes to remove direct references to DirectCodecFactory. This class is not accessible outside of parquet, but an instance with the same contract is now accessible with a new factory method on CodecFactory.

- Fixed failing test using miniDFS when reading a larger parquet file.

This closes #236

  1. … 55 more files in changeset.
DRILL-3621: Fix incorrect result if HBase filter contains row_key "or" filter or in list filter

Add unit test for row_key "or" filter and row_key in list filter.

Modify expected results for a couple of existing unit tests by specifying a stricter regex pattern.

Add one row to the HBase test table, per review comment.

  1. … 4 more files in changeset.
DRILL-3492: Add support for encoding/decoding of to/from OrderedBytes format

Description:

This change allows encoding/decoding of data from/to 'double', 'float',

'bigint', 'int' and 'utf8' data types to/from OrderedBytes format.

It also allows for OrderedByte encoded row-keys to be stored in

ascending as well as descending order.

The following JIRA added the OrderedBytes encoding to HBase:

https://issues.apache.org/jira/browse/HBASE-8201

This encoding scheme will preserve the sort-order of the native

data-type when it is stored as sorted byte arrays on disk.

Thus, it will help the HBase storage plugin if the row-keys have been

encoded in OrderedBytes format.

This functionality allows us to prune the scan ranges, thus reading much

less data from the server.
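
Why an order-preserving encoding makes range pruning possible can be seen in a much simplified sketch. This is not the HBase OrderedBytes format; it is a plain 4-byte big-endian encoding with the sign bit flipped, and `OrderPreservingInt` is a hypothetical class used only for illustration. The point is that comparing the encoded bytes as unsigned values gives the same order as comparing the original integers.

```java
class OrderPreservingInt {
    // Flip the sign bit, then write big-endian. Negative values then sort
    // before positive ones under unsigned byte-wise comparison.
    static byte[] encode(int value) {
        int flipped = value ^ 0x80000000;
        return new byte[] {
            (byte) (flipped >>> 24),
            (byte) (flipped >>> 16),
            (byte) (flipped >>> 8),
            (byte) flipped
        };
    }

    // Inverse of encode(): reassemble big-endian, then flip the sign bit back.
    static int decode(byte[] bytes) {
        int flipped = ((bytes[0] & 0xFF) << 24)
                    | ((bytes[1] & 0xFF) << 16)
                    | ((bytes[2] & 0xFF) << 8)
                    |  (bytes[3] & 0xFF);
        return flipped ^ 0x80000000;
    }

    // Unsigned lexicographic comparison, the order a sorted byte-array
    // store would use for row keys.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Compares two integers through their encoded forms.
    static int compareEncoded(int a, int b) {
        return compareUnsigned(encode(a), encode(b));
    }
}
```

Because byte order and value order agree, a filter such as `key >= x` can be turned into a start row for the scan instead of being evaluated on every row.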

Testing Done:

Added a new unit-test class TestOrderedBytesConvertFunctions.java which

derives from TestConvertFunctions.java class.

Also add new test cases to TestHBaseFilterPushDown class that will test

if we were able to push-down filters correctly and if the results are

correct.

DRILL-3492 - * Remove repeated allocations of byte arrays and PositionedByteRange objects

on heap (as suggested by Jason).

* Remove OrderedBytes encode/decode operations on UTF8 types.

Reasons -

1. These operations are slow and incur a lot of heap allocations

2. UTF8 types maintain their natural sort order when stored as binary arrays.

DRILL-3492 - Remove test code that creates test tables with UTF8 OrderedByte encoding.

  1. … 5 more files in changeset.
DRILL-3364: Prune scan range if the filter is on the leading field with byte comparable encoding

The change adds support to perform row-key range pruning when the row-key prefix

is interpreted as UINT4_BE, TIMESTAMP_EPOCH_BE, TIME_EPOCH_BE, DATE_EPOCH_BE,

UINT8_BE encoded.
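
A rough sketch of prefix-based pruning for a 4-byte unsigned big-endian prefix follows. `PrefixRange` and its methods are hypothetical and do not reflect the plugin's actual classes; the sketch assumes a filter of the form `lo <= prefix < hi` and shows how it maps to an inclusive start row and an exclusive stop row.

```java
class PrefixRange {
    final byte[] startRow; // inclusive lower bound
    final byte[] stopRow;  // exclusive upper bound

    private PrefixRange(byte[] startRow, byte[] stopRow) {
        this.startRow = startRow;
        this.stopRow = stopRow;
    }

    // Encodes the low 32 bits of value as a 4-byte unsigned big-endian prefix.
    static byte[] uint4be(long value) {
        return new byte[] {
            (byte) (value >>> 24), (byte) (value >>> 16),
            (byte) (value >>> 8), (byte) value
        };
    }

    // Scan range for the filter lo <= prefix < hi (both must fit in 32 bits).
    static PrefixRange forFilter(long lo, long hi) {
        return new PrefixRange(uint4be(lo), uint4be(hi));
    }

    // True if rowKey sorts within [startRow, stopRow) in unsigned byte order.
    boolean contains(byte[] rowKey) {
        return compare(rowKey, startRow) >= 0 && compare(rowKey, stopRow) < 0;
    }

    // Unsigned lexicographic comparison; a shorter key sorts before any
    // longer key that it is a prefix of.
    private static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

Row keys may carry extra bytes after the prefix; they still fall inside the range because the comparison is decided by the leading four bytes.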

Testing Done: Added unit tests for the new feature; also ran all existing

unit-tests to make sure there is no regression.

  1. … 11 more files in changeset.
DRILL-1651: Push filter past project with ITEM operator into HBase scan.

  1. … 1 more file in changeset.
DRILL-3500: Add OptimizerRulesContext which exposes information needed by storage plugin specific optimizer rules

Add FunctionLookupContext to enable materializing function calls without having access to the entire function registry

  1. … 15 more files in changeset.
DRILL-2006: Updated Text reader. Increases variations of text files Drill can work with.

Text reader is heavily inspired by uniVocity parser although it is now byte based and customized for Drill's memory representations.

Also updated the RecordReader interface so that OperatorContext is presented at setup time rather than being a separate call.

  1. … 43 more files in changeset.
DRILL-2848: Provide option to disable decimal data type

Disable casting to decimal

Disable reading decimal from Parquet

Disable reading decimal from Hive

Add unit tests

Modify existing tests to enable decimal data type

  1. … 38 more files in changeset.
DRILL-2826: Simplify and centralize Operator Cleanup

- Remove cleanup method from RecordBatch interface

- Make OperatorContext creation and closing the management of FragmentContext

- Make OperatorContext an abstract class and the impl only available to FragmentContext

- Make RecordBatch closing the responsibility of the RootExec

- Make all closes be suppressing closes to maximize memory release in failure

- Add new CloseableRecordBatch interface used by RootExec

- Make RootExec AutoCloseable

- Update RecordBatchCreator to return CloseableRecordBatches so that RootExec can maintain list

- Generate list of operators through change in ImplCreator

  1. … 95 more files in changeset.
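
The "suppressing closes" idea from DRILL-2826 can be sketched as follows. `SuppressingCloser` is a hypothetical helper, not the utility Drill uses: every resource is closed even when an earlier close fails, the first failure is rethrown at the end, and later failures are attached to it through `Throwable.addSuppressed`.

```java
import java.util.List;

class SuppressingCloser {
    // Closes every resource even if some fail. The first failure is
    // rethrown after all closes have run; later ones are attached to it
    // as suppressed exceptions.
    static void closeAll(List<? extends AutoCloseable> resources) throws Exception {
        Exception first = null;
        for (AutoCloseable resource : resources) {
            try {
                resource.close();
            } catch (Exception e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}
```

Stopping at the first failed close would leave the remaining resources, and the memory they hold, unreleased.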
DRILL-2567: CONVERT_FROM in where clause causes the query to fail in planning phase

Set the writeIndex of ByteBuf returned by Unpooled.wrappedBuffer() to 0.

+ Added a unit test to exercise the code path.

  1. … 1 more file in changeset.
DRILL-2514: Add support for impersonation in FileSystem storage plugin.

  1. … 66 more files in changeset.
DRILL-2413: FileSystemPlugin refactoring: avoid sharing DrillFileSystem across schemas

  1. … 32 more files in changeset.
DRILL-1690: Issue with using HBase plugin to access row_key only

DRILL-2173: Partition queries to drive dynamic pruning

Adds new interface on the QueryContext as well as individual schemas for exploring partitions of tables.

Adds injectable type for partition explorer for use in UDFs. This is hooked into both expression

materialization and interpreted evaluation. The FragmentContext throws an exception to tell users to turn on

constant folding if a UDF that uses the PartitionExplorer makes it past planning.

2173 update - Address Chris' review comments.

Change the PartitionExplorer to return an Iterable<String> instead of String[]

Add interface level description to PartitionExplorer and StoragePluginPartitionExplorer.

New inner class in FileSystemPlugin to fulfill the new Iterable interface for partitions.

Formatting/cleanup fixes

Clean up error reporting code in MaxDir UDF. Remove method to get a string from a DrillBuf, as it was already defined in StringFunctionHelpers. Add new utility method to specifically convert a VarCharHolder to a string to remove some boilerplate.

Fixed an errant copy paste in a comment and removed unused imports.

Fix docs in FileSystemPlugin, belongs with the 2173 changes.

Fix references in Javadoc to properly use @link instead of @see.

2173 fixes, correctly return an empty list of sub-partitions if the path requested in the partition explorer interface is a file. Fix a few docs.

More 2173, finishing Chris' comments

2173 update - Add validation for PartitionExplorer injectable in UdfUtilities.

small change to fix refactored unit tests.

cleanup for 2173

Fix maxdir UDF so it can compile in runtime generated code as well as the interpreted expression system (needed to fully qualify classes and interfaces). It still fails to execute, as we prevent requesting a schema from a non-root fragment. We do not expect these types of functions to ever be used without constant folding so this should not be an issue.

Update error message in the case where the partition explorer is being used outside of planning.

Adding FreeMarker-generated maxdir, imaxdir, mindir and imindir

remove import that violates build checks, fix typo in new test class name

Separate out SubDirectoryList from FileSystemSchemaFactory.

Fix unit test to correctly test all four functions.

Update partition explorer to take List instead of Collection. As the lists are used in parallel it should be explicit that these are expected to be ordered (which Collections do not guarantee).

Drop the extra file generated due to the header in the FreeMarker template, fix a typo and remove an unused import.

  1. … 19 more files in changeset.
DRILL-2090: Update HBase storage plugin to support HBase 0.98

    • -10
    • +8
    ./apache/drill/exec/store/hbase/HBaseUtils.java
  1. … 6 more files in changeset.
DRILL-1960: Automatic reallocation

  1. … 64 more files in changeset.
DRILL-1947: Cache PStore/EStore instances rather than recreating on each need. As part of this, make sure that PStoreConfig doesn't use identity equality.

  1. … 17 more files in changeset.
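
Why caching in DRILL-1947 depends on the config not using identity equality can be shown with a small sketch. `StoreConfig` and `StoreCache` are hypothetical stand-ins, not the PStoreConfig or provider classes from Drill. If the key only had identity equality, two configs built with the same values would each create their own store and the cache would never hit.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical config key; equality is by value so equal configs share a store.
final class StoreConfig {
    private final String name;

    StoreConfig(String name) { this.name = name; }

    @Override
    public boolean equals(Object o) {
        return o instanceof StoreConfig && name.equals(((StoreConfig) o).name);
    }

    @Override
    public int hashCode() { return Objects.hash(name); }
}

class StoreCache {
    private final Map<StoreConfig, Object> stores = new ConcurrentHashMap<>();
    int created; // counts factory invocations, for illustration only

    // Returns the cached store for an equal config, creating it only once.
    Object getStore(StoreConfig config) {
        return stores.computeIfAbsent(config, c -> {
            created++;
            return new Object();
        });
    }
}
```

Two separately constructed `StoreConfig("profiles")` instances resolve to the same cached object here; removing the `equals`/`hashCode` overrides would make each lookup create a new one.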