DRILL-7350: Move RowSet related classes from test folder

  1. … 291 more files in changeset.
DRILL-7306: Disable schema-only batch for new scan framework

The EVF framework is set up to return a "fast schema" empty batch

with only schema as its first batch because, when the code was

written, it seemed that's how we wanted operators to work. However,

DRILL-7305 notes that many operators cannot handle empty batches.

Since the empty-batch bugs show that Drill does not, in fact,

provide a "fast schema" batch, this ticket asks to disable the

feature in the new scan framework. The feature is disabled with

a config option; it can be re-enabled if ever it is needed.

SQL differentiates between two subtle cases, and both are

supported by this change.

1. Empty results: the query found a schema, but no rows

are returned. If no reader returns any rows, but at

least one reader provides a schema, then the scan

returns an empty batch with the schema.

2. Null results: the query found no schema or rows. No

schema is returned. If no reader returns rows or

schema, then the scan returns no batch: it instead

immediately returns a DONE status.

For CSV, an empty file read with headers enabled returns the null result

set (because we don't know the schema). An empty CSV file without headers

returns an empty result set because we do know the schema: it will

always be the columns array.
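
The two cases above can be sketched as a small decision function. This is only an illustration of the rule as described in this commit message, not the scan framework's code: `ScanOutcomeSketch`, `ReaderResult`, `Outcome` and `decide` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical names throughout: an illustration, not Drill's scan API.
final class ScanOutcomeSketch {

  enum Outcome { BATCHES_WITH_ROWS, EMPTY_BATCH_WITH_SCHEMA, DONE_NO_BATCH }

  /** What one reader produced: whether it found a schema, and how many rows. */
  static final class ReaderResult {
    final boolean hasSchema;
    final long rowCount;

    ReaderResult(boolean hasSchema, long rowCount) {
      this.hasSchema = hasSchema;
      this.rowCount = rowCount;
    }
  }

  /** Applies the empty-result versus null-result rule to a set of readers. */
  static Outcome decide(List<ReaderResult> readers) {
    boolean anySchema = false;
    for (ReaderResult r : readers) {
      if (r.rowCount > 0) {
        return Outcome.BATCHES_WITH_ROWS;   // normal case: data batches
      }
      anySchema |= r.hasSchema;
    }
    // No rows anywhere: the schema decides between "empty" and "null".
    return anySchema ? Outcome.EMPTY_BATCH_WITH_SCHEMA : Outcome.DONE_NO_BATCH;
  }

  public static void main(String[] args) {
    // Empty CSV file, headers enabled: no header line, so no schema.
    System.out.println(decide(Arrays.asList(new ReaderResult(false, 0))));
    // Empty CSV file, headers disabled: the schema is the columns array.
    System.out.println(decide(Arrays.asList(new ReaderResult(true, 0))));
  }
}
```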

Old tests validate the original schema-batch mode; new tests were

added to validate the no-schema-batch mode.

  1. … 44 more files in changeset.
DRILL-7261: Simplify Easy framework config for new scan

Most format plugins are created using the Easy format plugin. A recent

change added support for the "row set" scan framework. After converting

the text and log reader plugins, it became clear that the setup code

could be made simpler.

* Add the user name to the "file scan" framework.

* Pass the file system, split and user name to the batch reader via

the "schema negotiator" rather than via the constructor.

* Create the traditional "scan batch" scan or the new row-set scan via

functions instead of classes.

* Add Easy config option and method to choose the kind of scan

framework.

* Add Easy config options for some newer options such as whether the

plugin supports statistics.

Simplified reader creation

* The batch reader can be created just by overriding a method.

* A default error context is provided if the plugin does not provide

one.

Tested by running all unit tests for the CSV reader which is based on

the new framework, and by testing the converted log reader (that reader

is not part of this commit).

closes #1796

  1. … 6 more files in changeset.
DRILL-7181: Improve V3 text reader (row set) error messages

Adds an error context to the User Error mechanism. The context allows

information to be passed through an intermediate layer and applied when

errors are raised in lower-level code, without the need for that

low-level code to know the details of the error context information.
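
A minimal sketch of the idea, assuming a context object that an upper layer creates and a lower layer merely passes along when it raises an error. All names here (`ErrorContextSketch`, `ErrorContext`, `ContextualException`, `parseField`, `fileContext`) are hypothetical and are not the User Error API itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical names: a sketch of the pattern, not Drill's UserException API.
final class ErrorContextSketch {

  /** Supplied by an upper layer; adds details the lower layer knows nothing about. */
  interface ErrorContext {
    void addContext(Map<String, String> details);
  }

  /** Exception that lets the context decorate it at creation time. */
  static final class ContextualException extends RuntimeException {
    final Map<String, String> details = new LinkedHashMap<>();

    ContextualException(String message, ErrorContext context) {
      super(message);
      context.addContext(details);
    }

    @Override
    public String getMessage() {
      return super.getMessage() + " " + details;
    }
  }

  /** Low-level code: it only forwards the context, never inspects it. */
  static int parseField(String text, ErrorContext context) {
    try {
      return Integer.parseInt(text);
    } catch (NumberFormatException e) {
      throw new ContextualException("Cannot parse value: " + text, context);
    }
  }

  /** Upper layer: knows which file and plugin are being read. */
  static ErrorContext fileContext(String fileName, String plugin) {
    return details -> {
      details.put("File", fileName);
      details.put("Plugin", plugin);
    };
  }

  public static void main(String[] args) {
    try {
      parseField("abc", fileContext("example.csv", "text"));
    } catch (ContextualException e) {
      System.out.println(e.getMessage());
    }
  }
}
```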

Modifies the scan framework and V3 text plugin to use the framework to

improve error messages.

Refines how the `columns` column can be used with the text reader. If

headers are used, then `columns` is just another column. An error is

raised, however, if `columns[x]` is used when headers are enabled.
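
The rule for `columns` can be sketched as follows. This is an illustration under assumptions: the names (`ColumnsRuleSketch`, `Kind`, `classifyColumnsRef`) are hypothetical, the exception type is a stand-in, and the no-headers branch relies on the columns-array behavior of the text reader rather than on anything stated in this commit message.

```java
// Hypothetical names: illustrates the rule only, not the text reader's code.
final class ColumnsRuleSketch {

  enum Kind { ORDINARY_COLUMN, COLUMNS_ARRAY }

  /**
   * Classifies a projected reference to `columns`.
   *
   * @param indexed        true for `columns[x]`, false for plain `columns`
   * @param headersEnabled true if the reader takes column names from a header line
   */
  static Kind classifyColumnsRef(boolean indexed, boolean headersEnabled) {
    if (headersEnabled) {
      if (indexed) {
        // Stand-in exception type; the real reader raises a user error.
        throw new IllegalArgumentException(
            "columns[x] cannot be used when headers are enabled");
      }
      return Kind.ORDINARY_COLUMN;   // `columns` is just another column name
    }
    // Assumption: without headers, `columns` is the array of all fields.
    return Kind.COLUMNS_ARRAY;
  }

  public static void main(String[] args) {
    System.out.println(classifyColumnsRef(false, true));
    System.out.println(classifyColumnsRef(true, false));
  }
}
```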

Added another builder abstraction where a constructor argument list

became too long.

Added the drill file system and split to the file schema negotiator

to simplify reader construction.

Added additional unit tests to fully define the `columns` column

behavior.

  1. … 34 more files in changeset.
DRILL-7011: Support schema in scan framework

* Adds schema support to the row set-based scan framework and to the "V3" text reader based on that framework.

* Adding the schema made clear that passing options as a long list of constructor arguments was not sustainable. Refactored code to use a builder pattern instead.

* Added support for default values in the "null column loader", which required adding a "setValue" method to the column accessors.

* Added unit tests for all new or changed functionality. See TestCsvWithSchema for the overall test of the entire integrated mechanism.

* Added tests for explicit projection with schema

* Better handling of date/time in column accessors

* Converted recent column metadata work from Java 8 date/time to Joda.

* Added more CSV-with-schema unit tests

* Removed the ID fields from "resolved columns", used "instanceof" instead.

* Added wildcard projection with an output schema. Handles both "lenient" and "strict" schemas.

* Tagged projection columns with their output schema, when available.

* Scan projection added modes for wildcard with an output schema. The reader projection added support for merging reader and output schemas.

* Includes refactoring of scan operator tests (the test file grew too large).

* Renamed some classes to avoid confusing reader schemas with output schemas.

* Added unit tests for the new functionality.

* Added "lenient" wildcard with schema test for CSV

* Added more type conversions: string-to-bit, many-to-string

* Fixed bug in column writer for VarDecimal

* Added missing unit tests, and fixed bugs, in Bit column reader/writer

* Cleaned up a number of unneeded "SuppressWarnings"
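
The move from a long constructor argument list to a builder, mentioned in the list above, might look roughly like this. It is a sketch only: `ScanOptionsSketch` and its fields are hypothetical and do not reflect the options the framework's own builders carry.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical option holder: illustrates the builder pattern only.
final class ScanOptionsSketch {
  final List<String> projection;
  final String outputSchema;     // null when no schema is provided
  final boolean strictSchema;
  final String userName;

  private ScanOptionsSketch(Builder b) {
    this.projection = Collections.unmodifiableList(new ArrayList<>(b.projection));
    this.outputSchema = b.outputSchema;
    this.strictSchema = b.strictSchema;
    this.userName = b.userName;
  }

  static Builder builder() {
    return new Builder();
  }

  static final class Builder {
    private final List<String> projection = new ArrayList<>();
    private String outputSchema;
    private boolean strictSchema;          // defaults to lenient
    private String userName = "anonymous"; // assumed default for the sketch

    Builder project(String column) {
      projection.add(column);
      return this;
    }

    Builder outputSchema(String schema, boolean strict) {
      this.outputSchema = schema;
      this.strictSchema = strict;
      return this;
    }

    Builder userName(String name) {
      this.userName = name;
      return this;
    }

    ScanOptionsSketch build() {
      return new ScanOptionsSketch(this);
    }
  }

  public static void main(String[] args) {
    ScanOptionsSketch options = ScanOptionsSketch.builder()
        .project("a")
        .project("b")
        .outputSchema("a INT, b VARCHAR", true)
        .build();
    System.out.println(options.projection + " strict=" + options.strictSchema);
  }
}
```

Each new option becomes one more builder method with a default, so existing call sites do not change when an option is added.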

closes #1711

  1. … 218 more files in changeset.
DRILL-7074: Scan framework fixes and enhancements

Roll-up of fixes and enhancements that emerged from the effort to host the CSV reader on the new framework.

closes #1676

  1. … 38 more files in changeset.
DRILL-6952: Host compliant text reader on the row set framework

The result set loader allows controlling batch sizes. The new scan framework

built on top of that framework handles projection, implicit columns, null

columns and more. This commit converts the "new" ("compliant") text reader

to use the new framework. Options select the use of the V2 ("new") or V3

(row-set based) versions. Unit tests demonstrate V3 functionality.

closes #1683

  1. … 56 more files in changeset.
DRILL-5603: Replace String file paths to Hadoop Path

- replaced all String path representation with org.apache.hadoop.fs.Path

- added PathSerDe.Se JSON serializer

- refactoring of DFSPartitionLocation code by leveraging existing listPartitionValues() functionality

closes #1657

  1. … 82 more files in changeset.
DRILL-6950: Row set-based scan framework

Adds the "plumbing" that connects the scan operator to the result set loader and the scan projection framework. See the various package-info.java files for the technical details. Also adds a large number of tests.

This PR does not yet introduce an actual scan operator: that will follow in subsequent PRs.

closes #1618

    ./BaseFileScanFramework.java (+166, -0)
    ./FileMetadataColumn.java (+75, -0)
    ./FileMetadataColumnDefn.java (+59, -0)
    ./FileMetadataColumnsParser.java (+120, -0)
    ./FileMetadataManager.java (+223, -0)
    ./FileScanFramework.java (+78, -0)
    ./MetadataColumn.java (+47, -0)
    ./PartitionColumn.java (+63, -0)
  1. … 52 more files in changeset.
DRILL-6540: Upgrade to HADOOP-3.0.3 libraries

- accommodate apache and mapr profiles with hadoop 3.0 libraries

- update HBase version

- fix jdbc-all woodstox dependency

- unban Apache commons-logging dependency

  1. … 10 more files in changeset.