Paul Rogers <par0328@yahoo.com> committed on 24 Jun
DRILL-7306: Disable schema-only batch for new scan framework

The EVF framework is set up to return a "fast schema" empty batch
with only schema as its first batch because, when the code was
written, it seemed that's how we wanted operators to work. However,
DRILL-7305 notes that many operators cannot handle empty batches.
Since the empty-batch bugs show that Drill does not, in fact,
provide a "fast schema" batch, this ticket asks to disable the
feature in the new scan framework. The feature is disabled with
a config option; it can be re-enabled if ever it is needed.

SQL differentiates between two subtle cases, and both are
supported by this change.

1. Empty results: the query found a schema, but no rows
   are returned. If no reader returns any rows, but at
   least one reader provides a schema, then the scan
   returns an empty batch with the schema.

2. Null results: the query found no schema or rows. No
   schema is returned. If no reader returns rows or
   schema, then the scan returns no batch: it instead
   immediately returns a DONE status.
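
The two cases above can be sketched as a small decision function. This
is only an illustration of the rule as described in this message, not
the scan framework's actual code: the class, enum, and method names
(ScanOutcomeSketch, Outcome, ReaderResult, decide) are hypothetical and
do not exist in Drill.

```java
import java.util.List;

// Illustrative sketch only: every name here is hypothetical, not Drill's API.
final class ScanOutcomeSketch {

  /** What the scan reports downstream once every reader has been tried. */
  enum Outcome { DATA_BATCHES, EMPTY_BATCH_WITH_SCHEMA, DONE_WITHOUT_BATCH }

  /** Hypothetical summary of what a single reader produced. */
  static final class ReaderResult {
    final boolean hasSchema;
    final long rowCount;

    ReaderResult(boolean hasSchema, long rowCount) {
      this.hasSchema = hasSchema;
      this.rowCount = rowCount;
    }
  }

  static Outcome decide(List<ReaderResult> readers) {
    boolean anySchema = false;
    for (ReaderResult r : readers) {
      if (r.rowCount > 0) {
        return Outcome.DATA_BATCHES;  // ordinary case: rows flow as usual
      }
      anySchema |= r.hasSchema;
    }
    // No reader returned rows: case 1 if any schema was seen, else case 2.
    return anySchema ? Outcome.EMPTY_BATCH_WITH_SCHEMA
                     : Outcome.DONE_WITHOUT_BATCH;
  }
}
```

In this sketch a scan with no readers at all falls into case 2, since
neither rows nor schema were found.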

For CSV, an empty file read with headers returns the null result set
(because there is no header line, so we don't know the schema). An
empty CSV file without headers returns an empty result set because we
do know the schema: it will always be the columns array.

Old tests validate the original schema-batch mode; new tests were
added to validate the no-schema-batch mode.
