adeneche <adeneche@gmail.com> committed on 13 Mar 15
DRILL-3200: Add Window functions: ROW_NUMBER, RANK, PERCENT_RANK, DENSE_RANK and CUME_DIST

- added enum WindowFrameRecordBatch.WindowFunction to handle the supported window functions and their corresponding output MajorType

- renamed WindowFrameTemplate -> DefaultFrameTemplate, cleaned the template to handle the default frame efficiently:

. a batch can be processed as soon as we find the last peer row of its last row

. once a batch is processed it can be safely released => we can transfer its value vectors to the container instead of copying them

- DefaultFrameTemplate.Partition tracks the current window frame and computes the following window functions automatically: row_number, rank, dense_rank, percent_rank, cume_dist. It doesn't need to aggregate the value vectors to compute these window functions

- updated TestWindowFrame to check the results of row_number, rank, dense_rank, percent_rank and cume_dist in various cases

. added a debug config option to MSorter to control the size of batches. This is needed by TestWindowFrame so it can use small test data files (20 rows per batch)

. removed contrib/data/window-test-data

- WindowFrameRecordBatch properly releases saved batches if the query stops prematurely

- GenerateTestData can be used to generate test data for the window function unit tests [it's a work in progress and can either be improved to make it developer-friendly or removed from the final patch]

- using newly created WindowDataBatch in place of RecordDataBatch, to expose FragmentContext and VectorAccessible (fixes DRILL-3218)

- window.enable is true by default
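
For reference, the ranking functions listed above (row_number, rank, dense_rank, percent_rank, cume_dist) can all be derived from row positions and peer-group boundaries inside a sorted partition, with no aggregation over column values. The sketch below illustrates that idea using the standard SQL definitions. It is not the code of DefaultFrameTemplate.Partition: the class RankingSketch, its Result holder, and the use of a plain int[] of sort keys in place of value vectors are all assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in Drill.
public class RankingSketch {

  /** Per-row results for one partition; index i is the i-th row in sort order. */
  public static final class Result {
    public final long[] rowNumber;
    public final long[] rank;
    public final long[] denseRank;
    public final double[] percentRank;
    public final double[] cumeDist;

    Result(int n) {
      rowNumber = new long[n];
      rank = new long[n];
      denseRank = new long[n];
      percentRank = new double[n];
      cumeDist = new double[n];
    }
  }

  /**
   * Computes the five ranking functions for a single partition.
   * Assumption: sortedKeys holds the ORDER BY key of each row, already sorted,
   * and rows with equal keys are peers.
   */
  public static Result compute(int[] sortedKeys) {
    int n = sortedKeys.length;
    Result r = new Result(n);
    int start = 0;        // index of the first row of the current peer group
    long denseRank = 0;   // number of peer groups seen so far

    while (start < n) {
      // Find the last peer row of the row at 'start'.
      int end = start;
      while (end + 1 < n && sortedKeys[end + 1] == sortedKeys[start]) {
        end++;
      }
      denseRank++;

      for (int i = start; i <= end; i++) {
        r.rowNumber[i] = i + 1;                 // position in the partition
        r.rank[i] = start + 1;                  // rows before the group, plus one
        r.denseRank[i] = denseRank;             // ordinal of the peer group
        r.percentRank[i] = (n == 1) ? 0.0 : (double) start / (n - 1); // (rank - 1) / (n - 1)
        r.cumeDist[i] = (double) (end + 1) / n; // rows up to the last peer / n
      }
      start = end + 1;
    }
    return r;
  }

  public static void main(String[] args) {
    Result r = compute(new int[] {10, 20, 20, 30});
    System.out.println(Arrays.toString(r.rank));     // [1, 2, 2, 4]
    System.out.println(Arrays.toString(r.cumeDist)); // [0.25, 0.75, 0.75, 1.0]
  }
}
```

Note that cume_dist for a row depends on the position of its last peer, which fits the observation above that a batch can only be processed once the last peer row of its last row has been found.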

