Merge fixes.

  1. … 19 more files in changeset.
DRILL-933: Remove old physical operator cost & size concepts, add automatic size-based parallelization

  1. … 81 more files in changeset.
DRILL-836: Drill needs to return complex types (e.g., map and array) as a JSON string

  1. … 17 more files in changeset.
DRILL-968: Use checkstyle plugin to prevent inadvertent use of shaded Guava classes

+ Disallow non-static '*' imports in handwritten code.

+ Updated the current code to be in compliance.

+ Run 'rat' plugin in 'validate' phase.

  1. … 102 more files in changeset.
DRILL-600: Support planning for Union-All. Added infrastructure for planning Union-Distinct (not enabled yet).

  1. … 18 more files in changeset.
Remove references to jcommander's copy of Guava's Lists class.

  1. … 31 more files in changeset.
status changes

  1. … 62 more files in changeset.
Fix bug in MergeJoin when there are repeating values across left batches. Ensure join type is specified. Add join type to some physical plans in unit test cases.

Re-enable a testcase for MergeJoin.

  1. … 6 more files in changeset.
DRILL-707: Replace ValueAllocator with allocateNewSafe() in SVR. WIP.

remove valueallocator in SVR.

SV for Limit OP.

Selection vector remover. More WIP.

code clean up.

reverse rule change.

  1. … 5 more files in changeset.
DRILL-679: Support create table as query (CTAS) (contd.).

Continuation to e19606593f3173d8f82ca3074186e9ca7a960ce2.

Refactor the writer interfaces and align them with the reader interfaces at the storage and file format level.

  1. … 48 more files in changeset.
DRILL-679: Support create table as query (CTAS).

  1. … 47 more files in changeset.
DRILL-578: Performance fixes in Hash Join.

Includes a few minor fixes for the case where hash join receives empty batches on the build or probe side.

  1. … 6 more files in changeset.
DRILL-557: Fix mismatch between Jackson annotations and field names.

  1. … 12 more files in changeset.
DRILL-576: Add costing: new plans for joins and aggregations, including distributions.

- Utilize GroupScan getSize() for costing.

- Add cleanup() methods to MergeJoinBatch and HashJoinBatch.

- Don't match hash aggr rule if number of grouping cols is 0.

- Fix initialization of maxOccupiedIndex in HashAggr and HashTable.

- Fix less-than comparison for cost when row counts are the same.

- Improve fragment identification for better debugging.

  1. … 48 more files in changeset.
fix merge join to handle case where right batch has zero records

  1. … 4 more files in changeset.
Fix the duplicate field names in join operator. Work in progress for column star.

  1. … 3 more files in changeset.
DRILL-335: Implement Hash Aggregation

1. Implementation of the hash aggregation execution operator - this has two main parts: the HashAggTemplate and the HashAggBatch.

2. Implementation of a hash table which is used by the hash aggregation. The hash table has two main parts: the HashTableTemplate and the ChainedHashTable. The hash table internally uses the notion of 'BatchHolder' to keep track of all keys that can fit within one batch of 64K values. New BatchHolder objects are created as needed. Each BatchHolder has its own vector container. The HashAggregate also has a similar structure and it keeps track of the workspace variables.

(NOTE: An initial design document for the hash aggregation and hash table is already attached to DRILL-335. The document has not yet been updated with the latest implementation; I will try to do that in the near future.)

3. Jinfeng's changes to use workspace vectors in the generated code for aggregate functions (previously, for streaming aggregate we only needed to maintain workspace variable for 1 running group; however for hash aggregate we need to maintain it for all groups).

4. Fix for Drill-318: because of #3 above, the previous fix for Drill-318 is not valid anymore. I modified the template generation code for the aggregate functions such that they conform to the new infrastructure.

5. The original AggTemplate, AggBatch and Aggregator classes have been renamed to StreamingAggTemplate, StreamingAggBatch and StreamingAggregator, respectively, to differentiate them from hash aggregation. These appear as new files, but the code in them has not changed.

I have run several tests manually as part of TestHashAggr. These tests use TPC-H data and in particular a relatively large 'Orders' table. However, I have not yet packaged the tests to run as part of JUnit since the location and size of the parquet files still need to be figured out. I will continue to work on that.
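
The BatchHolder idea in item 2 and the per-group workspace values in item 3 can be sketched roughly as follows. This is an illustration only, not the code from this changeset: the class BatchedSumAggregator, its methods, and the use of java.util.HashMap for key lookup are all assumptions made for the example (the changeset implements its own hash table). Each holder has a fixed capacity, a new holder is created when the last one fills up, and every group gets one workspace slot (a running sum here) addressed by a global index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of batch-wise group storage; not Drill's HashAggTemplate or ChainedHashTable. */
final class BatchedSumAggregator {

    /** Fixed-capacity holder: group keys plus one workspace value (running sum) per group. */
    static final class BatchHolder {
        final long[] keys;
        final long[] sums;
        int count; // number of groups stored in this holder so far

        BatchHolder(int capacity) {
            keys = new long[capacity];
            sums = new long[capacity];
        }
    }

    private final int batchCapacity;
    private final List<BatchHolder> holders = new ArrayList<>();
    // Assumption: a plain HashMap stands in for the real hash table's key lookup.
    private final Map<Long, Integer> keyToIndex = new HashMap<>();

    BatchedSumAggregator(int batchCapacity) {
        if (batchCapacity <= 0) {
            throw new IllegalArgumentException("batchCapacity must be positive");
        }
        this.batchCapacity = batchCapacity;
    }

    /** Adds value to the running sum of key's group, creating the group if needed. */
    void add(long key, long value) {
        Integer index = keyToIndex.get(key);
        if (index == null) {
            index = insertKey(key);
            keyToIndex.put(key, index);
        }
        // Global index = holder number * capacity + offset within the holder.
        holders.get(index / batchCapacity).sums[index % batchCapacity] += value;
    }

    /** Appends a new group, allocating a fresh holder when the last one is full. */
    private int insertKey(long key) {
        if (holders.isEmpty() || holders.get(holders.size() - 1).count == batchCapacity) {
            holders.add(new BatchHolder(batchCapacity));
        }
        int holderIndex = holders.size() - 1;
        BatchHolder holder = holders.get(holderIndex);
        int offset = holder.count++;
        holder.keys[offset] = key;
        return holderIndex * batchCapacity + offset;
    }

    /** Returns the running sum for key, or throws if the group does not exist. */
    long sum(long key) {
        Integer index = keyToIndex.get(key);
        if (index == null) {
            throw new IllegalArgumentException("unknown group key: " + key);
        }
        return holders.get(index / batchCapacity).sums[index % batchCapacity];
    }

    int groupCount() {
        return keyToIndex.size();
    }

    int holderCount() {
        return holders.size();
    }
}
```

With a capacity of 65536 this mirrors the "one batch of 64K values" granularity mentioned above; a small capacity such as 2 makes the holder rollover easy to observe in a test.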

  1. … 48 more files in changeset.
DRILL-450: Add exchange rules, move from BasicOptimizer to Optiq

  1. … 120 more files in changeset.
DRILL-505: Hash Join

Support for left outer, right outer and full joins

Support for multiple join conditions

Add following tests

- Multiple condition join

- Join on JSON scan

- Multi batch join

- Simple equality join

  1. … 19 more files in changeset.
DRILL-386: Implement External Sort operator

  1. … 38 more files in changeset.
DRILL-385: Implement Top-N sort operator

  1. … 19 more files in changeset.
DRILL-257: Move SQL parsing to server side.

- Switch to Avatica based JDBC driver.

- Update QuerySubmitter to support SQL queries.

- Update SqlAccesors to support getObject().

- Remove ref, clean up SQL packages some.

- Various performance fixes.

- Update result set so that the first set of results must be returned before control is returned to the client, to allow metadata to populate for aggressive tools like sqlline.

- Move timeout functionality to TestTools.

- Update Expression materializer so that it will return a nullable int if a field is not found.

- Update Project record batch to support simple wildcard queries.

- Update JSON record reader test to expect VarCharVector.getObject to return a String rather than a byte[].

  1. … 305 more files in changeset.
DRILL-334: Subdivide Drillbit control and data messages. Add support for socket backpressure. Add TopLevel and Child memory allocator with debug mode to capture memory leaks. Various memory leak fixes to get build to complete.

Also includes fixes from reviews by Tim.

  1. … 212 more files in changeset.
DRILL-281: Add broadcast sender

  1. … 10 more files in changeset.
DRILL-229: N-WAY merging receiver

  1. … 13 more files in changeset.
DRILL-254: Add iterator validator and correct interface violations

  1. … 11 more files in changeset.
DRILL-256: Revised patch

  1. … 21 more files in changeset.
DRILL-230: Addressing comments in code review, abstract out references to HazelCache and add comments

  1. … 33 more files in changeset.
DRILL-230: Build a sampling range partitioner

  1. … 44 more files in changeset.
DRILL-221: Add license header to all files

  1. … 815 more files in changeset.