Paul Rogers <par0328@yahoo.com> committed on 19 Jun
DRILL-6951: Merge row set based mock data source

The mock data source is used in several tests to generate a large volume
of sample data, such as when testing spilling. The mock data source also
lets us try new plugin features in a very simple context. During the
development of the row set framework, the mock data source was converted
to use the new framework to verify functionality. This commit upgrades
the mock data source with that work.

The work changes none of the functionality. It does, however, improve
memory usage. Batches are limited, by default, to 10 MB in size. The row
set framework minimizes internal fragmentation in the largest vector.
(Previously, internal fragmentation averaged 25% but could be as high as
50%.)
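
The 25% and 50% figures can be reproduced with a little arithmetic. The
sketch below is illustrative only and is not part of this commit. It
assumes (the message above does not say so) that a vector's buffer is
rounded up to the next power of two, and the helper names are
hypothetical. Under that assumption, a buffer that has just doubled is
barely more than half full, which gives the worst case of nearly 50%
waste. If the used size is spread evenly between half full and full, the
average waste comes to 25%.

```python
def next_power_of_two(n: int) -> int:
    """Return the smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def internal_fragmentation(used_bytes: int) -> float:
    """Fraction of the buffer left unused.

    Assumption: the buffer is allocated at the next power of two at or
    above the number of bytes actually used.
    """
    allocated = next_power_of_two(used_bytes)
    return (allocated - used_bytes) / allocated


if __name__ == "__main__":
    mib = 1024 * 1024
    # Exactly full, three-quarters full, and just past a doubling.
    for used in (8 * mib, 6 * mib, 4 * mib + 1):
        waste = internal_fragmentation(used)
        print(f"{used:>9} bytes used -> {waste:.1%} of the buffer unused")
```

A vector holding 6 MiB of data in an 8 MiB buffer wastes exactly a
quarter of the allocation, and one holding a single byte more than 4 MiB
wastes just under half.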

As it turns out, the hash aggregate tests depended on the internal
fragmentation: without it, the hash agg no longer spilled for the same
row count. Adjusted the generated row counts to recreate a data volume
that caused spilling.

One test in particular always failed due to assertions in the hash agg
code. These appear to be true bugs and are described in DRILL-7301. After
multiple failed attempts to get the test to work, it was disabled until
DRILL-7301 is fixed.

Added a new unit test to sanity check the mock data source. (Previously,
this functionality had no test of its own; it was verified only
indirectly via other unit tests.)
