[ASTERIXDB-2488][COMP] Support aggregate window functions

- user model changes: yes

- storage format changes: no

- interface changes: no

Details:

- Implement aggregate window functions:

    agg_func() OVER (frame_var AS)? (PARTITION BY ... ORDER BY ... frame_spec)

  where agg_func is a SQL/SQL++ aggregate function

- Fix percent_rank() to always return 0 for the first tuple

- Fix ntile() to handle NULL argument

- Log query after each rewrite rule in SqlppQueryRewriter

- Implement toString() for ADayTimeDuration, fix it for AYearMonthDuration

- Add seek() method to RunFileReader
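The window-function semantics above can be illustrated with a minimal Java sketch. It is not AsterixDB's runtime code. It assumes a single partition whose rows are already sorted by the ORDER BY key and a ROWS frame running from the partition start to the current row. The class and method names (`WindowSketch`, `runningSum`, `percentRank`) are hypothetical. `percentRank` uses the standard (rank - 1) / (n - 1) definition and returns 0 for the first tuple. This also covers a single-row partition, where the formula would otherwise be 0/0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of window-function semantics, not AsterixDB code.
public class WindowSketch {

    /** sum(v) OVER (ORDER BY ... ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
     *  for one partition whose rows are already sorted by the ORDER BY key. */
    static List<Long> runningSum(List<Long> sortedPartition) {
        List<Long> out = new ArrayList<>();
        long acc = 0;
        for (long v : sortedPartition) {
            acc += v;        // the frame grows by one row per tuple
            out.add(acc);
        }
        return out;
    }

    /** percent_rank() for one partition, given each tuple's 1-based rank (ties share a rank). */
    static List<Double> percentRank(List<Integer> ranks) {
        int n = ranks.size();
        List<Double> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if (i == 0) {
                out.add(0.0);  // first tuple is always 0, even when n == 1 (avoids 0/0)
            } else {
                out.add((ranks.get(i) - 1) / (double) (n - 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(runningSum(List.of(3L, 1L, 4L)));   // [3, 4, 8]
        System.out.println(percentRank(List.of(1)));           // [0.0]
        System.out.println(percentRank(List.of(1, 2, 2, 4)));
    }
}
```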

Change-Id: If0f71118a04c2dbd3462070673d52e67f076b7e1

Reviewed-on: https://asterix-gerrit.ics.uci.edu/3049

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Contrib: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Ali Alsuliman <ali.al.solaiman@gmail.com>

    ./leftouterjoin-probe-pidx-with-join-edit-distance-check-idx_01.plan (-7, +7)
    ./leftouterjoin-probe-pidx-with-join-edit-distance-check-idx_01_ps.plan (-14, +14)
    ./leftouterjoin-probe-pidx-with-join-jaccard-check-idx_01.plan (-19, +19)
    ./leftouterjoin-probe-pidx-with-join-jaccard-check-idx_01_ps.plan (-38, +38)
    … 840 more files in changeset.
[ASTERIXDB-2286][COMP][FUN][HYR] Parallel Sort Optimization

- user model changes: yes

- storage format changes: no

- interface changes: yes

Details:

- new plan for the sort operation, which includes sampling and replicating the stream of data to be sorted. The sort-merge connector is removed from the plan, and the sorted result now resides in multiple partitions.

- new optimization rule to check whether full parallel sort is applicable.

- new Forward operator to read the replicated sort input stream and to receive the output of the sampling.

- new sequential merge connector to merge a globally ordered result residing in multiple partitions (in addition to the connector's partition computer).

- "asterix-lang-aql/pom.xml" is changed as a result of refactoring code related to range map handling.

- new private sampling functions (local & global) that generate the range map object, along with their type computers.
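The new sort plan can be sketched as a simplified, single-process Java example. It is not the actual Hyracks operators or connectors. The sketch makes four assumptions. First, a sample of the sort keys is sorted and numPartitions - 1 evenly spaced split points are picked as the "range map". Second, each tuple is routed to the partition whose key range contains it. Third, each partition is sorted locally. Fourth, because the partitions are globally ordered, a sequential merge only needs to concatenate them in partition order. The names `pickSplitPoints`, `partitionOf`, and `parallelSort` are hypothetical, and the sample is assumed to be non-empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, single-process illustration of range-partitioned parallel sort.
public class RangeSortSketch {

    /** Pick numPartitions - 1 split points from a (non-empty) sample: the "range map". */
    static int[] pickSplitPoints(List<Integer> sample, int numPartitions) {
        List<Integer> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        int[] splits = new int[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            splits[i - 1] = sorted.get(i * sorted.size() / numPartitions);
        }
        return splits;
    }

    /** Keys below splits[0] go to partition 0; keys >= splits[i] go to partition i + 1 or later. */
    static int partitionOf(int key, int[] splits) {
        int pos = Arrays.binarySearch(splits, key);
        // binarySearch returns -(insertionPoint) - 1 when the key is not a split point
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    /** Route tuples by range, sort each partition, then merge sequentially. */
    static List<Integer> parallelSort(List<Integer> input, List<Integer> sample, int numPartitions) {
        int[] splits = pickSplitPoints(sample, numPartitions);
        List<List<Integer>> partitions = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            partitions.add(new ArrayList<>());
        }
        for (int key : input) {
            partitions.get(partitionOf(key, splits)).add(key);
        }
        List<Integer> result = new ArrayList<>();
        for (List<Integer> part : partitions) {
            Collections.sort(part);  // independent per partition (parallel in a real system)
            result.addAll(part);     // sequential merge: no cross-partition comparisons needed
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(42, 7, 19, 3, 88, 25, 61, 10, 55, 33);
        List<Integer> sample = List.of(7, 25, 55, 88);  // stands in for the sampled keys
        System.out.println(parallelSort(data, sample, 3));
        // [3, 7, 10, 19, 25, 33, 42, 55, 61, 88]
    }
}
```

The sequential merge is cheap only because range partitioning already guarantees that every key in partition i precedes every key in partition i + 1.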

user model changes:

- new compiler property is added to enable and disable parallel sort.

interface changes:

- "ILogicalOperatorVisitor.java" includes Forward Operator.

- "ITuplePartitionComputer.java" includes initialize() so that a partitioner can perform initialization. FieldRangePartitionComputerFactory uses it to pick a range map.

- "ITuplePartitionComputerFactory.java": createPartitioner() is changed to createPartitioner(IHyracksTaskContext hyracksTaskContext). The context is needed to transfer the range map.

Change-Id: I73e128029a46f45e6b68c23dfb9310d5de10582f

Reviewed-on: https://asterix-gerrit.ics.uci.edu/2393

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Contrib: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Dmitry Lychagin <dmitry.lychagin@couchbase.com>

    ./leftouterjoin-probe-pidx-with-join-edit-distance-check-idx_01_ps.plan (-0, +165)
    ./leftouterjoin-probe-pidx-with-join-jaccard-check-idx_01_ps.plan (-0, +353)
    ./ngram-contains_ps.plan (-0, +54)
    … 355 more files in changeset.
[NO ISSUE][COMP][RT] Enable multiway similarity joins

- Enable the FuzzyJoinRule that transforms a nested-loop similarity-join plan into a three-stage similarity join.

- Modify FuzzyJoinRuleCollections.

- Add the ExtractCommonExpressionRule to extract common expressions in the star-like multiple similarity join substitutions.

- Add the InlineSubplanInputForNestedTupleSourceRule to translate the subplan generated from the similarity-function-derived substitution into a join in the case of nested schemas.

- Use similarity-jaccard-prefix to enable the pp+ join strategy.

- Use the right side to build the heavy hash join on the prefix tokens from both sides.

- Add RemoveAssign/Variables/AggRules to iteratively remove unused assigns/variables once FuzzyJoinRule is applied in each round.

- Add three new optimization cases for multiway similarity joins:

  - link-like multiway similarity joins

  - star-like multiway similarity joins

  - hybrid multiway similarity joins combining both styles.

- Add a check for whether a similarity function is applied in a SELECT over an existing similarity join.

- Change the inverted-index-based similarity join to the three-stage similarity join for efficiency reasons.
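The prefix-based strategy above builds on standard prefix filtering for set-similarity joins. The Java sketch below is background under stated assumptions, not the plan that FuzzyJoinRule generates. It assumes that the tokens in every record are pre-sorted by one global order; alphabetical order stands in here for the usual rarest-first order. It also relies on a property of that setup: two records can reach Jaccard similarity ≥ t only if their prefixes of length |r| - ⌈t·|r|⌉ + 1 share a token. Following the commit message, the right side builds a hash table on its prefix tokens and the left side probes it with its own prefix tokens. Each candidate is then verified with the exact Jaccard value. All names (`PrefixFilterSketch`, `prefixLength`, `similarityJoin`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of a prefix-filtered Jaccard similarity join.
public class PrefixFilterSketch {

    /** Exact Jaccard similarity |a ∩ b| / |a ∪ b| (0 for two empty sets). */
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
    }

    /** Prefix length such that jaccard >= t implies a shared prefix token (assumes 0 < t <= 1). */
    static int prefixLength(int size, double t) {
        return Math.min(size, size - (int) Math.ceil(t * size) + 1);
    }

    /** Tokens are assumed to be sorted already by one global token order. */
    static List<String> prefix(List<String> tokens, double t) {
        return tokens.subList(0, prefixLength(tokens.size(), t));
    }

    /** Returns (leftIndex, rightIndex) pairs whose Jaccard similarity is >= t. */
    static Set<List<Integer>> similarityJoin(List<List<String>> left, List<List<String>> right, double t) {
        // Build side (right): map each prefix token to the records having it in their prefix.
        Map<String, List<Integer>> index = new HashMap<>();
        for (int j = 0; j < right.size(); j++) {
            for (String tok : prefix(right.get(j), t)) {
                index.computeIfAbsent(tok, k -> new ArrayList<>()).add(j);
            }
        }
        // Probe side (left): candidates share a prefix token; verify with the exact value.
        Set<List<Integer>> result = new LinkedHashSet<>();  // drops pairs found via several tokens
        for (int i = 0; i < left.size(); i++) {
            for (String tok : prefix(left.get(i), t)) {
                for (int j : index.getOrDefault(tok, List.of())) {
                    if (jaccard(new HashSet<>(left.get(i)), new HashSet<>(right.get(j))) >= t) {
                        result.add(List.of(i, j));
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> left = List.of(List.of("a", "b", "c", "d"));
        List<List<String>> right = List.of(List.of("a", "b", "c", "e"), List.of("e", "f", "g", "h"));
        System.out.println(similarityJoin(left, right, 0.5)); // [[0, 0]]
    }
}
```

The second right record shares no prefix token with the left record, so it is pruned without computing its similarity.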

Change-Id: I8736f104905eeda763d39709e002c2b9629278cc

Reviewed-on: https://asterix-gerrit.ics.uci.edu/1076

Sonar-Qube: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Contrib: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Dmitry Lychagin <dmitry.lychagin@couchbase.com>

Reviewed-by: Taewoo Kim <wangsaeu@gmail.com>

    ./leftouterjoin-probe-pidx-with-join-jaccard-check-idx_01.plan (-21, +145)
    ./ngram-fuzzyeq-jaccard_01.plan (-21, +136)
    ./ngram-fuzzyeq-jaccard_02.plan (-21, +136)
    ./ngram-fuzzyeq-jaccard_03.plan (-22, +136)
    ./ngram-jaccard-check_01.plan (-21, +136)
    ./ngram-jaccard-check_02.plan (-21, +136)
    ./ngram-jaccard-check_03.plan (-22, +136)
    ./ngram-jaccard-check_04.plan (-22, +138)
    ./word-fuzzyeq-jaccard_01.plan (-21, +136)
    ./word-fuzzyeq-jaccard_02.plan (-21, +136)
    … 246 more files in changeset.
[ASTERIXDB-2412][COMP] ExtractCommonExpressionsRule fix

- user model changes: no

- storage format changes: no

- interface changes: no

Details:

ExtractCommonExpressionsRule should not handle a JOIN by turning it into a Cartesian product + SELECT, since that adds extra overhead. Also, blindly adding a SELECT without checking for GROUP-BY and other operators in between could cause type inference errors.

Change-Id: I20e1fa161c42e0494c7ca587b8bffdc80d656058

Reviewed-on: https://asterix-gerrit.ics.uci.edu/2770

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Contrib: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Dmitry Lychagin <dmitry.lychagin@couchbase.com>

    ./ngram-edit-distance-check_04.plan (-23, +24)
    ./olist-edit-distance-check_04.plan (-23, +24)
    … 17 more files in changeset.
[ASTERIXDB-2366][TEST] Optimizer tests cleanup for SQL++

- user model changes: no

- storage format changes: no

- interface changes: no

Details:

The current OptimizerTest doesn't actually use the SQL++ test cases, and the existing SQL++ test cases also have various issues.

This patch cleans up the subset of test cases that failed because of variable name changes in the resulting query plans.

Change-Id: I8dbe67d6376d517a4919e8478a6e88326b3e1cc0

Reviewed-on: https://asterix-gerrit.ics.uci.edu/2591

Sonar-Qube: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Contrib: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Dmitry Lychagin <dmitry.lychagin@couchbase.com>

    ./leftouterjoin-probe-pidx-with-join-edit-distance-check-idx_01.plan (-7, +7)
    ./leftouterjoin-probe-pidx-with-join-jaccard-check-idx_01.plan (-7, +7)
    … 445 more files in changeset.
[ASTERIXDB-2119][COMP] Fix variable ordering of projects

- user model changes: no

- storage format changes: no

- interface changes: no

Details:

The current IntroduceProjectsRule implementation uses a HashSet to compute the projected variables, which makes the ordering of the output variables unpredictable. This patch fixes this behavior by using a LinkedHashSet, so that the projected variables keep the same ordering as the original variables.
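Plain Java collections show the difference this fix relies on. A `HashSet` iterates in an order determined by hash codes and table capacity, while a `LinkedHashSet` iterates in insertion order. The sketch below is not the IntroduceProjectsRule code. Strings such as `$$12` are hypothetical stand-ins for logical variables, and `projectVariables` is a hypothetical helper.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration: order-preserving projection of variables.
public class ProjectOrderSketch {

    /** Keep only the variables still needed, preserving their original order. */
    static List<String> projectVariables(List<String> originalVars, Set<String> used) {
        Set<String> projected = new LinkedHashSet<>();  // iteration order = insertion order
        for (String v : originalVars) {
            if (used.contains(v)) {
                projected.add(v);
            }
        }
        return List.copyOf(projected);
    }

    public static void main(String[] args) {
        List<String> original = List.of("$$12", "$$3", "$$7", "$$1");
        Set<String> used = new HashSet<>(List.of("$$1", "$$7", "$$12"));
        System.out.println(projectVariables(original, used)); // [$$12, $$7, $$1]
        // With new HashSet<>() in projectVariables, the order would follow hash codes,
        // so it would not be guaranteed to match the order of `original`.
    }
}
```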

Change-Id: Id96a5fe048dd11b7f2e97f4d4a802736ba5ba003

Reviewed-on: https://asterix-gerrit.ics.uci.edu/2043

Sonar-Qube: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Contrib: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Taewoo Kim <wangsaeu@gmail.com>

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

    ./leftouterjoin-probe-pidx-with-join-edit-distance-check-idx_01.plan (-19, +17)
    ./ngram-edit-distance-check_04.plan (-22, +21)
    ./olist-edit-distance-check_04.plan (-22, +21)
    … 5 more files in changeset.
Changed the physical tag of ReplicatePOperator (SPLIT -> REPLICATE)

Change-Id: Ic298f90c5bc9875cea1017aff17a524214596b1e

Reviewed-on: https://asterix-gerrit.ics.uci.edu/1219

Sonar-Qube: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Till Westmann <tillw@apache.org>

    ./leftouterjoin-probe-pidx-with-join-edit-distance-check-idx_01.plan (-5, +5)
    ./leftouterjoin-probe-pidx-with-join-jaccard-check-idx_01.plan (-2, +2)
    … 79 more files in changeset.
ASTERIXDB-1487: fix the wrong plan when we prune the selective branch.

1. Add a test case for ASTERIXDB-1487 in which only a single join branch is required.

2. Disable join branch pruning when an UNNESTMAP follows a DATASOURCESCAN.

- We need to prune the join branch when it is NOT required by the upstream operators and its generated join key is derived from the same DATASOURCE as the other branch.

- We SHOULD NOT prune the join branch if there exists a selective operator (UNNESTMAP, LOUNNESTMAP, LIMIT, SELECT) located between the join operator and DATASOURCESCAN.
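The pruning condition above can be written as a short predicate. This minimal sketch assumes a simplified linear chain of operators between the join and the data-source scan. The `Op` enum and `canPruneBranch` method are hypothetical and model only the operators named in the message, not the Algebricks operator classes.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the join-branch pruning condition.
public class JoinBranchPruningSketch {

    enum Op { ASSIGN, PROJECT, UNNEST_MAP, LEFT_OUTER_UNNEST_MAP, LIMIT, SELECT, DATASOURCE_SCAN }

    /** Operators that may drop tuples, so removing their branch could change the result. */
    static final Set<Op> SELECTIVE =
            EnumSet.of(Op.UNNEST_MAP, Op.LEFT_OUTER_UNNEST_MAP, Op.LIMIT, Op.SELECT);

    /**
     * @param branch            operators from just below the join down to the scan
     * @param requiredUpstream  whether upstream operators need this branch's output
     * @param keyFromSameSource whether the branch's join key derives from the other branch's data source
     */
    static boolean canPruneBranch(List<Op> branch, boolean requiredUpstream, boolean keyFromSameSource) {
        if (requiredUpstream || !keyFromSameSource) {
            return false;
        }
        for (Op op : branch) {
            if (op == Op.DATASOURCE_SCAN) {
                break;          // reached the scan without meeting a selective operator
            }
            if (SELECTIVE.contains(op)) {
                return false;   // a selective operator sits between the join and the scan
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(canPruneBranch(List.of(Op.ASSIGN, Op.DATASOURCE_SCAN), false, true));     // true
        System.out.println(canPruneBranch(List.of(Op.UNNEST_MAP, Op.DATASOURCE_SCAN), false, true)); // false
    }
}
```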

Change-Id: I1aef69a2278853fd9f8020da6639331b367ed5ad

Reviewed-on: https://asterix-gerrit.ics.uci.edu/1119

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Yingyi Bu <buyingyi@gmail.com>

    … 9 more files in changeset.
ASTERIXDB-1572 and ASTERIXDB-1591: fix and regression tests.

- push aggregates into subplans;

- fix recursive variable mapping in subquery decorrelation.

Change-Id: I7092dd2fa7c9193ff919b27464854936f48261b0

Reviewed-on: https://asterix-gerrit.ics.uci.edu/1161

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Till Westmann <tillw@apache.org>

    … 30 more files in changeset.
ASTERIXDB-1407: let the build branch broadcast for NestedLoopJoin.

- Change the broadcast branch;

- Fix a bug in SuperActivityOperatorNodePushable;

- Fix jobbuilder to use a fixed location (within a query) for operators with a "count=1" constraint;

- Fix OptimizerTest to generate the same directory structure for actual files as for expected files.

- Update the test query plans.

Change-Id: I0988624406d2f7460f0ee5ac7b4829d81e48c652

Reviewed-on: https://asterix-gerrit.ics.uci.edu/828

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Jianfeng Jia <jianfeng.jia@gmail.com>

    ./leftouterjoin-probe-pidx-with-join-edit-distance-check-idx_01.plan (-2, +2)
    … 83 more files in changeset.