Change folder structure for Java repackage

Change only the folders, not the files, for our package name change.
This will break the build and needs to be followed by a change to
the package name in all of the source files. However, performing
the folder move and the file change in two steps lets Git understand
that the files are the same, and lets us track revisions across
those files.

Change-Id: I08aff75e25ac7c6298c32cf3402febbc4a318c2a

Reviewed-on: https://asterix-gerrit.ics.uci.edu/307

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Chris Hillery <ceej@lambda.nu>

  1. … 3879 more files in changeset.
Add Apache RAT License Auditor plugin, and fix missing licenses

Change-Id: I39d92ec6654c73b4e6b8ba76dd66770bb60c7b79

Reviewed-on: https://asterix-gerrit.ics.uci.edu/260

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Chris Hillery <ceej@lambda.nu>

Reviewed-by: Till Westmann <tillw@apache.org>

  1. … 32 more files in changeset.
VariableSizeFrame(VSizeFrame) support for Hyracks.

This patch replaces Frame/Accessor/Appender with the new API, which
supports BigObject.

ExternalSorter, TopKSorter, and ExternalGroupSorter have been
implemented to support big objects.

Group-by and Join should also work with BigObject, but they will break
the memory budget when they encounter a big object. I will fix the
memory problem later in a separate CR.

The design of the frame allocation is described here:

https://docs.google.com/presentation/d/15h9iQf5OYsgGZoQTbGHkj1yS2G9q2fd0s1lDAD1EJq0/edit?usp=sharing

Suggested review order:

Patch 12: It includes all of the sorting operators.

Patch 13: It applies the new IFrame API to all Hyracks code.

Patch 14: Some bug fixes to pass all Asterix's tests.

Patch 15: Skip it!

Patch 16: Some bug fixes to the Asterix's tests in small frame setting.

Later Patch: address the comments

Change-Id: I2e08692078683f6f2cf17387e39037ad851fc05b

Reviewed-on: https://asterix-gerrit.ics.uci.edu/234

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Yingyi Bu <buyingyi@gmail.com>

  1. … 205 more files in changeset.
Fix the HashFunction Bug in OptimizedHybridHashJoinOperatorDescriptor

The HashFunction used for InMemoryHashJoin is not updated with the level
when the OptimizedHybridHashJoin switches to InMemoryHashJoin. As a
result, it behaves like a NestedLoopJoin after the 2nd round.

This patch fixes that.

Change-Id: Id25c85b7fadbb6bb969d0d94a51c60ac2573938e

Reviewed-on: https://asterix-gerrit.ics.uci.edu/285

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Pouria Pirzadeh <pouria.pirzadeh@gmail.com>
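
The failure mode described in the hash join fix above can be illustrated with a minimal sketch. It is not the Hyracks code: the class `LevelAwareHashSketch`, the toy bit-slicing `hash(key, level)` function, and the fan-out of 8 are assumptions made for illustration. If the in-memory table reuses the level that produced a partition, every key of that partition lands in one bucket and probing degenerates into a scan; advancing the level spreads the keys again.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LevelAwareHashSketch {
    // Assumed fan-out: partitions per level, and buckets in the in-memory table.
    static final int FANOUT = 8;

    // Toy level-aware hash (hypothetical): each level reads a different 3-bit group of the key.
    static int hash(int key, int level) {
        return (key >>> (3 * level)) & (FANOUT - 1);
    }

    // Keys that the partitioning pass at the given level sends to the given partition.
    static List<Integer> partitionOf(int[] keys, int level, int partition) {
        List<Integer> out = new ArrayList<>();
        for (int k : keys) {
            if (hash(k, level) == partition) {
                out.add(k);
            }
        }
        return out;
    }

    // Number of distinct buckets the keys occupy when hashed at the given level.
    static int distinctBuckets(List<Integer> keys, int level) {
        Set<Integer> used = new HashSet<>();
        for (int k : keys) {
            used.add(hash(k, level));
        }
        return used.size();
    }

    public static void main(String[] args) {
        int[] keys = new int[1024];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = i;
        }
        List<Integer> spilled = partitionOf(keys, 0, 0);
        System.out.println("buckets with stale level 0: " + distinctBuckets(spilled, 0));
        System.out.println("buckets with level 1:       " + distinctBuckets(spilled, 1));
    }
}
```

With the stale level, the 128 keys of the partition share a single bucket; with the next level they use all 8.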

Add a flag for LSM-based indices to indicate whether to force pages to disk devices during flush and merge.

Change-Id: I988716c03cffe30b008e144d3a478ee25e367212

Reviewed-on: https://asterix-gerrit.ics.uci.edu/240

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Young-Seok Kim <kisskys@gmail.com>

    ./ics/hyracks/dataflow/std/file/FileSplit.java (-0, +5)
  1. … 80 more files in changeset.
Range connector update with order by hint.

Change-Id: Iec1fbd79f62bfeef2081858bdfab3ff894f63e03

Reviewed-on: https://asterix-gerrit.ics.uci.edu/253

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Ildar Absalyamov <ildar.absalyamov@gmail.com>

  1. … 12 more files in changeset.
Issue 867: Handle delimited files using CR-only line separators

Also simplify record- and field-counting logic.

Change-Id: Ie28abda93fc9e5996008fac8b60aaf906df49cb7

Reviewed-on: https://asterix-gerrit.ics.uci.edu/246

Reviewed-by: Ian Maxon <imaxon@uci.edu>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Preston Carman <ecarm002@ucr.edu>
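
As an illustration of the CR-only handling in Issue 867, here is a minimal sketch of record splitting that accepts LF, CR LF, and a lone CR as separators. It is not the parser changed by this commit; `LineSeparatorSketch` and `splitRecords` are hypothetical names, and the input is assumed to fit in a single string.

```java
import java.util.ArrayList;
import java.util.List;

public class LineSeparatorSketch {
    // Splits data into records, treating "\n", "\r\n", and a lone "\r" as one separator each.
    static List<String> splitRecords(String data) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            if (c == '\r' || c == '\n') {
                records.add(current.toString());
                current.setLength(0);
                // Consume the LF of a CR LF pair so it does not produce an empty record.
                if (c == '\r' && i + 1 < data.length() && data.charAt(i + 1) == '\n') {
                    i++;
                }
            } else {
                current.append(c);
            }
        }
        // A final record without a trailing separator still counts.
        if (current.length() > 0) {
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        System.out.println(splitRecords("a,b\rc,d\re,f"));
        System.out.println(splitRecords("a\r\nb\nc\r"));
    }
}
```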

- Fixed Type Casting issue

- Reorganized duplicated internal class in the DelimitedDataParser and DelimitedDataParserFactory

- Prevented a user from creating an inverted index on a dataset with a variable-length PK

Change-Id: Ice2027388bbd1641e94fa97118e0a3b2b6415461

Reviewed-on: http://fulliautomatix.ics.uci.edu:8443/231

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Till Westmann <westmann@gmail.com>

avoid duplication of Pointable code in SerializerDeserializer

Change-Id: Ia98985fc994e48d7d6a37dfaade0178b6644d836

Reviewed-on: http://fulliautomatix.ics.uci.edu:8443/221

Reviewed-by: Yingyi Bu <buyingyi@gmail.com>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

  1. … 39 more files in changeset.
- Fixed Type Casting issue

- Reorganized duplicated internal class in the DelimitedDataParser and DelimitedDataParserFactory

- Prevented a user from creating an inverted index on a dataset with a variable-length PK

Change-Id: Ic5606501223b8d860b49a258ff49afacd7d76b9a

Reviewed-on: http://fulliautomatix.ics.uci.edu:8443/191

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Till Westmann <westmann@gmail.com>

  1. … 52 more files in changeset.
Adding hash join logging comments.

commit 513c3a7899dc64af3c3cdec96fad9093a4ca2c5f

Merge: b27e9b5 82609d9

Author: Eldon Carman <ecarm002@ucr.edu>

Date: Thu Feb 5 12:47:52 2015 -0800

Adding hash join logging comments.

Change-Id: Iade2c53436e5ae82c31305d6f618c780cd72568b

Reviewed-on: http://fulliautomatix.ics.uci.edu:8443/219

Reviewed-by: Ian Maxon <imaxon@uci.edu>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Pouria Pirzadeh <pouria.pirzadeh@gmail.com>

  1. … 3 more files in changeset.
Moved MaterializeOperator and NestedSubplanToJoinRule to Hyracks.

Change-Id: I74f62bc26706fc72c1baf05f27ce8cdf219cb778

Reviewed-on: http://fulliautomatix.ics.uci.edu:8443/168

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Till Westmann <westmann@gmail.com>

  1. … 19 more files in changeset.
This change list includes several fixes:

1. Adds a rule to push subplan into group-by

2. Adds a rule to eliminate subplan with input cardinality one

3. Fixes the nested running aggregate runtime

4. Adds a wrapper of FrameTupleAppender to internally flush full frames. A TODO item is to clean up existing usages of FrameTupleAppender to use the wrapper, which makes the code simpler.

Change-Id: I647f9bce2f40700b18bdcad1fa64fb8f0a26838b

Reviewed-on: http://fulliautomatix.ics.uci.edu:8443/149

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Preston Carman <ecarm002@ucr.edu>

Reviewed-by: Till Westmann <westmann@gmail.com>

  1. … 9 more files in changeset.
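
The idea behind the FrameTupleAppender wrapper in this change list can be sketched as follows. This is a minimal illustration, not the Hyracks API: `FrameWriter`, `FlushingAppender`, string tuples, and a capacity counted in tuples are all assumptions. The point is that callers only append; the wrapper decides when a full frame goes downstream.

```java
import java.util.ArrayList;
import java.util.List;

public class FlushingAppenderSketch {
    /** Hypothetical downstream consumer of completed frames. */
    interface FrameWriter {
        void nextFrame(List<String> tuples);
    }

    /** Hypothetical wrapper: flushes a full frame internally before appending the next tuple. */
    static class FlushingAppender {
        private final int frameCapacity;
        private final FrameWriter writer;
        private final List<String> frame = new ArrayList<>();

        FlushingAppender(int frameCapacity, FrameWriter writer) {
            if (frameCapacity < 1) {
                throw new IllegalArgumentException("frameCapacity must be >= 1");
            }
            this.frameCapacity = frameCapacity;
            this.writer = writer;
        }

        void append(String tuple) {
            // The caller never sees a "frame full" condition.
            if (frame.size() == frameCapacity) {
                flush();
            }
            frame.add(tuple);
        }

        void flush() {
            if (frame.isEmpty()) {
                return;
            }
            writer.nextFrame(new ArrayList<>(frame));
            frame.clear();
        }
    }

    public static void main(String[] args) {
        List<List<String>> emitted = new ArrayList<>();
        FlushingAppender appender = new FlushingAppender(2, emitted::add);
        for (String t : new String[] {"a", "b", "c", "d", "e"}) {
            appender.append(t);
        }
        appender.flush(); // push the last, partially filled frame
        System.out.println(emitted);
    }
}
```

Without such a wrapper, every call site has to repeat the "append failed, flush, retry" sequence itself, which is the duplication the TODO refers to.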
Fixed CSV parser to recognize quote and delimiter inside a string.

Change-Id: Iac102286ff90d2b4cc54b1183fa024dec006c3b3

Reviewed-on: http://fulliautomatix.ics.uci.edu:8443/134

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Till Westmann <westmann@gmail.com>

  1. … 2 more files in changeset.
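
A minimal sketch of what recognizing quotes and delimiters inside a string involves. It is not the parser from this commit: `CsvFieldSketch` and `parseLine` are hypothetical, and treating a doubled quote as an escaped quote is an assumption borrowed from common CSV practice, not something the commit message states.

```java
import java.util.ArrayList;
import java.util.List;

public class CsvFieldSketch {
    // Splits one record into fields; delimiters inside quoted strings do not split.
    static List<String> parseLine(String line, char delimiter, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == quote) {
                    if (i + 1 < line.length() && line.charAt(i + 1) == quote) {
                        field.append(quote); // assumed escape: doubled quote is a literal quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    field.append(c);         // delimiter here is ordinary data
                }
            } else if (c == quote) {
                inQuotes = true;
            } else if (c == delimiter) {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("1,\"Smith, John\",\"say \"\"hi\"\"\"", ',', '"'));
    }
}
```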
- Added Tokenize Operator in addition to the bulkload operator changes that were made by Zachary Heilbron. The tokenize operator is only added to the logical plan when bulk-loading the data.

- Each secondary index is now updated in a separate branch by using the replicate operator.

- Sink Operator now accepts multiple inputs.

- Fixed the bulk-load so that it correctly produces the auto-generated PK.

Change-Id: Ifb591754dba5eb4a9207edaa4e658f4cc745893a

Reviewed-on: http://fulliautomatix.ics.uci.edu:8443/78

Reviewed-by: Young-Seok Kim <kisskys@gmail.com>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

    ./ics/hyracks/dataflow/std/misc/SinkOperatorDescriptor.java (-0, +41)
  1. … 53 more files in changeset.
Added replicate operator with materialization

be more aggressive in finding shared plans in ExtractCommonOperatorRule

- find all the isomorphic subgraphs instead of just the ones on join build branches

- while expanding candidates handle the operators with multiple inputs

- analyze the DAG to find all the operators that can be co-scheduled, and infer the dependencies between clusters

- based on the dependencies, decide which outputs of a replicate operator need materialization

- if the shared branch needs materialization, and it consists of only trivial operators (such as assign, unnest, datasource scan), that branch is discarded from the candidates

- modified the replicate operator descriptor to materialize the input if needed, and read from the materialized file for the outputs that require materialization

- removed redundant decor variables in group-by

- fixed a bug in computing live variables for the unnest-map operator: if the operator does not propagate inputs, those input variables should no longer be live

- fixed a bug in ComplexUnnestToProductRule

Change-Id: If221d1507844f9409bf1163f93b0c04ef5848578

Reviewed-on: http://fulliautomatix.ics.uci.edu:8443/86

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Yingyi Bu <buyingyi@gmail.com>

    ./ics/hyracks/dataflow/std/misc/MaterializerTaskState.java (-0, +86)
  1. … 41 more files in changeset.
Several bug fixes in HHJ, NLJ, and tokenizer

- in HHJ handle the case when it spills and skipInMemoryHJ is set to false,

- check for memsize in NLJ and correctly set memsize in HHJ,

- make counthashed-ngram-token() skip the bits for length & type

Change-Id: I908345f993019b0bfd0ac0bcb3e497a42295b623

Reviewed-on: http://fulliautomatix.ics.uci.edu:8443/96

Reviewed-by: Pouria Pirzadeh <pouria.pirzadeh@gmail.com>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

  1. … 1 more file in changeset.
Several major changes in hyracks:

1. reduced CC/NC communications for reporting partition request and availability; partition request/availability are only reported for the case of send-side materialized (without pipelining) policies in case of task re-attempt

2. changed buffer cache to dynamically allocate memory based on needs instead of pre-allocating

3. changed each network channel to lazily allocate memory based on needs, and changed materialized connectors to lazily allocate files based on needs

4. changed several major CCNCCFunctions to use non-java serde

5. added a sort-based group-by operator which pushes group-by aggregations into an external sort

6. made external sort a stable sort

Items 1, 3, and 4 reduce the job overhead.

Item 2 reduces unnecessary NC resource consumption, such as memory and files.

Items 5 and 6 are improvements to runtime operators.

One change in algebricks:

-- implemented a rule to push group-by aggregation into sort, i.e., using the sort-based gby operator

Several important changes in pregelix:

-- remove static states in vertex

-- directly check the halt bit without deserialization

-- optimize the sort algorithm by packing yet-another 2-byte normalized key into the tPointers array

Change-Id: Id696f9a9f1647b4a025b8b33d20b3a89127c60d6

Reviewed-on: http://fulliautomatix.ics.uci.edu:8443/35

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Till Westmann <westmann@gmail.com>

  1. … 270 more files in changeset.
small improvements

- scan disk files less often

- some small cleanup

Merge branch 'master' into yingyi/fullstack_fix

  1. … 1 more file in changeset.
Fixed a bug with an unclosed running aggregation runtime; fixed an issue with two adjacent exchange operators (connectors) when a duplicate sort operator is removed.

  1. … 5 more files in changeset.
checkpoint: addressed Vinayak's review comments

  1. … 1 more file in changeset.
checkpoint: added support for running aggregation using the group-by runtime. The Aggregator interface is also updated to handle both accumulating and running aggregation.

  1. … 11 more files in changeset.
support both quick sort and merge sort as in-memory sort algorithms; merge sort is the default

    ./ics/hyracks/dataflow/std/sort/Algorithm.java (-0, +21)
    ./ics/hyracks/dataflow/std/sort/FrameSorterMergeSort.java (-0, +253)
    ./ics/hyracks/dataflow/std/sort/FrameSorterQuickSort.java (-0, +252)
    ./ics/hyracks/dataflow/std/sort/IFrameSorter.java (-0, +37)
  1. … 7 more files in changeset.
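
A rough sketch of selecting between the two in-memory sort algorithms with merge sort as the default. None of this is taken from the files listed above: `SorterSelectionSketch`, the `SortAlgorithm` enum, and the `{key, payload}` tuple layout are hypothetical. One plausible reason to prefer merge sort as the default is that it keeps tuples with equal keys in arrival order, which this quick sort does not guarantee.

```java
import java.util.Arrays;

public class SorterSelectionSketch {
    enum SortAlgorithm { QUICK_SORT, MERGE_SORT }

    // Assumed default, mirroring the commit message.
    static final SortAlgorithm DEFAULT_ALGORITHM = SortAlgorithm.MERGE_SORT;

    // Sorts tuples of the form {key, payload} by key, in place.
    static void sort(int[][] tuples, SortAlgorithm algorithm) {
        if (algorithm == SortAlgorithm.QUICK_SORT) {
            quickSort(tuples, 0, tuples.length - 1);
        } else {
            mergeSort(tuples, new int[tuples.length][], 0, tuples.length);
        }
    }

    // Hoare-style quick sort over the inclusive range [lo, hi]; not stable.
    static void quickSort(int[][] a, int lo, int hi) {
        if (lo >= hi) {
            return;
        }
        int pivot = a[lo + (hi - lo) / 2][0];
        int i = lo;
        int j = hi;
        while (i <= j) {
            while (a[i][0] < pivot) i++;
            while (a[j][0] > pivot) j--;
            if (i <= j) {
                int[] t = a[i];
                a[i] = a[j];
                a[j] = t;
                i++;
                j--;
            }
        }
        quickSort(a, lo, j);
        quickSort(a, i, hi);
    }

    // Top-down merge sort over the half-open range [from, to); stable.
    static void mergeSort(int[][] a, int[][] tmp, int from, int to) {
        if (to - from < 2) {
            return;
        }
        int mid = (from + to) >>> 1;
        mergeSort(a, tmp, from, mid);
        mergeSort(a, tmp, mid, to);
        int l = from;
        int r = mid;
        int k = from;
        while (l < mid && r < to) {
            // Take from the right run only when strictly smaller, so ties keep their order.
            tmp[k++] = (a[r][0] < a[l][0]) ? a[r++] : a[l++];
        }
        while (l < mid) tmp[k++] = a[l++];
        while (r < to) tmp[k++] = a[r++];
        System.arraycopy(tmp, from, a, from, to - from);
    }

    public static void main(String[] args) {
        int[][] tuples = {{2, 0}, {1, 1}, {2, 2}, {1, 3}, {0, 4}};
        sort(tuples, DEFAULT_ALGORITHM);
        System.out.println(Arrays.deepToString(tuples));
    }
}
```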
revert error message printing

  1. … 1 more file in changeset.
fix the cache miss problem in the sort merge reader

  1. … 14 more files in changeset.
revert the referenceentry change

address Vinayak's code review comments

  1. … 17 more files in changeset.
avoid ByteBuffer.getInt() calls in FrameSorter and FrameTupleAccessor to improve performance

  1. … 2 more files in changeset.
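
One way to avoid ByteBuffer.getInt() is to decode the integer straight from the backing byte array. The sketch below is an assumption about the general technique, not the code from this commit: `IntReadSketch.getInt` is a hypothetical helper, and it assumes a heap-backed buffer in the default big-endian byte order. Whether it is actually faster depends on the JVM and should be measured.

```java
import java.nio.ByteBuffer;

public class IntReadSketch {
    // Decodes a big-endian 32-bit integer directly from a byte array.
    static int getInt(byte[] bytes, int offset) {
        return ((bytes[offset] & 0xff) << 24)
                | ((bytes[offset + 1] & 0xff) << 16)
                | ((bytes[offset + 2] & 0xff) << 8)
                | (bytes[offset + 3] & 0xff);
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(8); // heap buffer, big-endian by default
        buffer.putInt(0, 0xCAFEBABE);
        buffer.putInt(4, 42);
        byte[] backing = buffer.array();
        System.out.println(getInt(backing, 0) == buffer.getInt(0));
        System.out.println(getInt(backing, 4) == buffer.getInt(4));
    }
}
```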
code clean-up before commit