DRILL-6168: Revise format plugin table functions

Allows table functions to inherit properties from a

defined format plugin.

Also DRILL-7612: enforces immutability for all format plugins.

  1. … 46 more files in changeset.
DRILL-7620: Fix plugin mutability issues

A recent commit made the plugin registry more strict about

the rule that, once a plugin is registered, it must be

immutable. A flaw in how the UI enforced that rule put the

registry in an inconsistent state.

Also

* Registry-specific errors

* Push more operations from UI layer into registry

* Clean up semantics of "resolve" for plugins

* Add more unit tests

* Better handling of "bad" plugins

* Force plugin names to lower case

* Fix comparison bugs in some format plugins
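
The immutability rule, lower-cased names, and value-based config comparison listed above can be illustrated with a minimal sketch. Every name here (PluginRegistrySketch, Config, withEnabled, normalize) is hypothetical and is not taken from Drill's registry code; the sketch only assumes that a "change" to a registered config is expressed as a new object.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: none of these names come from Drill's plugin registry.
final class PluginRegistrySketch {

  /** Immutable stand-in for a plugin config: final fields, no setters. */
  static final class Config {
    final String type;
    final boolean enabled;

    Config(String type, boolean enabled) {
      this.type = Objects.requireNonNull(type);
      this.enabled = enabled;
    }

    /** A "change" yields a new object; the registered instance is never modified. */
    Config withEnabled(boolean newValue) {
      return new Config(type, newValue);
    }

    // Value-based comparison over every field, so equal configs compare equal.
    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof Config)) {
        return false;
      }
      Config other = (Config) o;
      return enabled == other.enabled && type.equals(other.type);
    }

    @Override
    public int hashCode() {
      return Objects.hash(type, enabled);
    }
  }

  private final Map<String, Config> plugins = new ConcurrentHashMap<>();

  /** Names are forced to lower case so "DFS" and "dfs" name the same plugin. */
  static String normalize(String name) {
    return name.toLowerCase(Locale.ROOT);
  }

  void put(String name, Config config) {
    plugins.put(normalize(name), config);
  }

  Config get(String name) {
    return plugins.get(normalize(name));
  }
}
```

In this sketch, updating a plugin means calling put again with a new Config; code that still holds the old instance never sees it change.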

  1. … 101 more files in changeset.
DRILL-7601: Shift column conversion to reader from scan framework

Allows the column writers to be generic and moves scan-specific

conversions into each reader where needed, implemented in

a reader-specific way.

Adds a revised way of handling projections in the result set

loader that is not coupled with conversion, as the prior

design was.

Updates the CSV, Avro, Log and HDF5 readers.

closes #1993

  1. … 220 more files in changeset.
DRILL-7590: Refactor plugin registry

Major cleanup of the plugin registry to split it into components

in preparation for a proper plugin API.

Better coordinates the named and ephemeral plugin caches.

Cleans up the registry API. Sharpens rules for modifying

plugin configs.

closes #1988

  1. … 163 more files in changeset.
DRILL-7350: Move RowSet related classes from test folder

  1. … 292 more files in changeset.
DRILL-7327: Log Regex Plugin Won't Recognize Schema

The previous commit revised the plugin config classes to work

with table functions. That caused Jackson to stop working for

the classes. Fixed those issues and added unit tests.

closes #1827

  1. … 4 more files in changeset.
DRILL-7310: Move schema-related classes from exec module to be able to use them in metastore module

closes #1816

  1. … 102 more files in changeset.
DRILL-7306: Disable schema-only batch for new scan framework

The EVF framework is set up to return a "fast schema" empty batch

with only schema as its first batch because, when the code was

written, it seemed that's how we wanted operators to work. However,

DRILL-7305 notes that many operators cannot handle empty batches.

Since the empty-batch bugs show that Drill does not, in fact,

provide a "fast schema" batch, this ticket asks to disable the

feature in the new scan framework. The feature is disabled with

a config option; it can be re-enabled if ever it is needed.

SQL differentiates between two subtle cases, and both are

supported by this change.

1. Empty results: the query found a schema, but no rows

are returned. If no reader returns any rows, but at

least one reader provides a schema, then the scan

returns an empty batch with the schema.

2. Null results: the query found no schema or rows. No

schema is returned. If no reader returns rows or

schema, then the scan returns no batch: it instead

immediately returns a DONE status.

For CSV, an empty file with headers returns the null result set

(because we don't know the schema.) An empty CSV file without headers

returns an empty result set because we do know the schema: it will

always be the columns array.
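
The two cases above can be sketched as a small decision function. This is only an illustration: ScanOutcomeSketch, Outcome, resolveNoRows and emptyCsvSchema are hypothetical names, not classes or methods from the scan framework, and a schema is modeled simply as a list of column names.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; these names are not part of Drill's scan framework.
final class ScanOutcomeSketch {

  /** What the scan hands downstream when no reader produced any rows. */
  enum Outcome { EMPTY_BATCH_WITH_SCHEMA, DONE_WITHOUT_BATCH }

  /**
   * Each element stands for one reader that returned no rows: present if that
   * reader still discovered a schema, empty if it did not.
   */
  static Outcome resolveNoRows(List<Optional<List<String>>> readerSchemas) {
    boolean anySchema = readerSchemas.stream().anyMatch(Optional::isPresent);
    // Empty result: at least one schema. Null result: no schema at all.
    return anySchema ? Outcome.EMPTY_BATCH_WITH_SCHEMA : Outcome.DONE_WITHOUT_BATCH;
  }

  /**
   * The CSV cases from the text: an empty file read with headers has no header
   * line to supply a schema; without headers the schema is the columns array.
   */
  static Optional<List<String>> emptyCsvSchema(boolean extractHeaders) {
    return extractHeaders ? Optional.empty() : Optional.of(List.of("columns"));
  }
}
```

Under these assumptions, a scan over a single empty CSV file yields DONE_WITHOUT_BATCH when headers are enabled and EMPTY_BATCH_WITH_SCHEMA when they are not.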

Old tests validate the original schema-batch mode; new tests were

added to validate the no-schema-batch mode.

  1. … 44 more files in changeset.
DRILL-7293: Convert the regex ("log") plugin to use EVF

Converts the log format plugin (which uses a regex for parsing) to work

with the Enhanced Vector Framework (EVF).

User-visible behavior changes added to the README file.

* Use the plugin config object to pass config to the Easy framework.

* Use the EVF scan mechanism in place of the legacy "ScanBatch"

mechanism.

* Minor code and README cleanup.

* Replace ad-hoc type conversion with builtin conversions

The provided schema support in the enhanced vector framework (EVF)

provides automatic conversions from VARCHAR to most types. The log

format plugin was created before EVF was available and provided its own

conversion mechanism. This commit removes the ad-hoc conversion code and

instead uses the log plugin config schema information to create an

"output schema" just as if it was provided by the provided schema

framework.

Because we need the schema in the plugin (rather than the reader), moved

the schema-parsing code out of the reader into the plugin. The plugin

creates two schemas: an "output schema" with the desired output types,

and a "reader schema" that uses only VARCHAR. This causes the EVF to

perform conversions.

* Enable provided schema support

Allows the user to specify types using either the format config (as

previously) or a provided schema. If a schema is provided, it will match

columns using names specified in the format config.

The provided schema can specify both types and modes (nullable or not

null.)

If a schema is provided, then the types specified in the plugin config

are ignored. No attempt is made to merge schemas.

If a schema is provided, but a column is omitted from the schema, the

type defaults to VARCHAR.

* Added ability to specify regex in table properties

Allows the user to specify the regex, and the column schema,

using a CREATE SCHEMA statement. The README file provides the details.

Unit tests demonstrate and verify the functionality.

* Used the custom error context provided by EVF to enhance the log format

reader error messages.

* Added user name to default EVF error context

* Added support for table functions

Can set the regex and maxErrors fields, but not the schema.

Schema will default to "field_0", "field_1", etc. of type

VARCHAR.

* Added unit tests to verify the functionality.

* Added a check, and a test, for a regex with no groups.

* Added columns array support

When the log regex plugin is given no schema, it previously

created a list of columns "field_0", "field_1", etc. After

this change, the plugin instead follows the pattern set by

the text plugin: it will place all fields into the columns

array. (The two special fields are still separate.)

A few adjustments were necessary to the columns array

framework to allow use of the special columns along with

the `columns` column.

Modified unit tests and the README to reflect this change.

The change should be backward compatible because few users

are likely relying on the dummy field names.

Added unit tests to verify that schema-based table

functions work. A test shows that, due to the unfortunate

config property name "schema", users of this plugin cannot

combine a config table function with the schema attribute

in the way promised in DRILL-6965.
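
Three of the behaviors described above (the check for a regex with no groups, placing all fields into the columns array when no schema is given, and defaulting omitted columns to VARCHAR) can be sketched with java.util.regex alone. LogRegexSketch, toColumns and resolveType are hypothetical names, not the plugin's API; requiring the whole line to match and modeling a provided schema as a name-to-type map are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; class and method names are not Drill's.
final class LogRegexSketch {
  private final Pattern pattern;

  LogRegexSketch(String regex) {
    this.pattern = Pattern.compile(regex);
    // groupCount() reports the capturing groups declared in the pattern.
    if (pattern.matcher("").groupCount() == 0) {
      throw new IllegalArgumentException("Regex has no groups: " + regex);
    }
  }

  /** With no schema, every group value goes into one columns array, in group order. */
  Optional<List<String>> toColumns(String line) {
    Matcher m = pattern.matcher(line);
    if (!m.matches()) { // assumption: the whole line must match
      return Optional.empty();
    }
    List<String> columns = new ArrayList<>();
    for (int i = 1; i <= m.groupCount(); i++) {
      columns.add(m.group(i)); // null if an optional group did not participate
    }
    return Optional.of(columns);
  }

  /** A column omitted from the provided schema falls back to VARCHAR. */
  static String resolveType(Map<String, String> providedSchema, String column) {
    return providedSchema.getOrDefault(column, "VARCHAR");
  }
}
```

For example, a LogRegexSketch built from the pattern `(\d{4}) (\w+) (.*)` turns the line `2019 INFO started` into the three-element array `2019`, `INFO`, `started`, while a line that does not match yields an empty Optional.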

  1. … 18 more files in changeset.
DRILL-7278: Refactor result set loader projection mechanism

Drill 1.16 added an enhanced scan framework based on the row set

mechanisms, and a "provisioned schema" feature built on top

of that framework. Conversion of the log reader plugin to use

the framework identified additional features we wish to add,

such as marking a column as "special" (not expanded in a wildcard

query.)

This work showed that the code added for provisioned schemas in

Drill 1.16 works, but is overly complex, making it hard to add

the desired new feature.

This patch refactors the "reader" projection code:

* Create a "projection set" mechanism that the reader can query to ask,

"the caller just added a column. Should it be projected or not?"

* Unifies the type conversion mechanism added as part of provisioned

schemas.

* Added the "special column" property for both "reader" and "provided"

schemas.

* Verified that provisioned schemas work with maps (at least on the scan

framework side.)

* Replaced the previous "schema transformer" mechanism with a new "type

conversion" mechanism that unifies type conversion, provided schemas

and an optional custom type conversion mechanism.

* Column writers can report if they are projected. Moved this query

from metadata to the column writer itself.

* Extended and clarified documentation of the feature.

* Revised and/or added unit tests.
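
The "projection set" question from the first bullet, combined with the "special column" property, can be sketched as follows. ProjectionSetSketch and its methods are hypothetical and do not mirror the actual interfaces in the result set loader. The sketch assumes exact name matching and assumes that a special column is still projected when a query names it explicitly.

```java
import java.util.Set;

// Hypothetical sketch; not the result set loader's actual projection classes.
final class ProjectionSetSketch {
  private final boolean wildcard;
  // For a wildcard query: the special columns. Otherwise: the projected columns.
  private final Set<String> names;

  private ProjectionSetSketch(boolean wildcard, Set<String> names) {
    this.wildcard = wildcard;
    this.names = Set.copyOf(names);
  }

  /** SELECT *: every column except those marked special. */
  static ProjectionSetSketch wildcard(Set<String> specialColumns) {
    return new ProjectionSetSketch(true, specialColumns);
  }

  /** SELECT a, b: only the named columns (assumed to include special ones if named). */
  static ProjectionSetSketch explicit(Set<String> projectedColumns) {
    return new ProjectionSetSketch(false, projectedColumns);
  }

  /** The reader asks: "I just added this column. Should it be projected?" */
  boolean isProjected(String column) {
    return wildcard ? !names.contains(column) : names.contains(column);
  }
}
```

A reader would call isProjected once per column it adds and skip writing values for columns that return false.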

closes #1797

  1. … 72 more files in changeset.
DRILL-7007: Use verify method in row set tests

Many of the early RowSet-based tests used the pattern:

new RowSetComparison(expected).verifyAndClearAll(result);

Revise this to use the simplified form:

RowSetUtilities.verify(expected, result);

The original form is retained when tests use additional functionality, such as the ability to perform multiple verifications on the same expected batch.

closes #1624

  1. … 9 more files in changeset.
DRILL-6970: Fix issue with logregex format plugin where drillbuf was overflowing

closes #1673

  1. … 2 more files in changeset.
DRILL-6901: Move schema builder to src/main

Moves the SchemaBuilder class out of the src/test namespace into the src/main namespace, specifically into the existing record.metadata package.

Many files changed in this move. Corrected two minor issues: import of the wrong Arrays class and unnecessary annotations.

  1. … 89 more files in changeset.
DRILL-6104: Add Log/Regex Format Plugin

closes #1114

    • -0
    • +366
    ./TestLogReader.java
  1. … 13 more files in changeset.