[NO ISSUE][COMP][RT] Enable multiway similarity joins

- Enable the FuzzyJoinRule that transforms a nested-loop-similarity-join plan to a three-stage-similarity join.
- Modify FuzzyJoinRuleCollections.
- Add the ExtractCommonExpressionRule to extract common expressions in the star-like multiple similarity join substitutions.
- Add the InlineSubplanInputForNestedTupleSourceRule to translate the generated subplan from the similarity function-derived substitution into a join in case of nested schemas.
- Use similarity-jaccard-prefix to enable the pp+ join strategy.
- Use the right side to build the heavy hash join on the prefix tokens from both sides.
- Add RemoveAssign/Variables/AggRules to iteratively remove unused assign/vars once FuzzyJoinRule is applied in each round.
- Add three new optimization cases for multiway similarity joins:
  - link-like multiway similarity joins
  - star-like multiway similarity joins
  - hybrid multiway similarity joins with both styles of similarity joins
- Add a check whether a similarity function is on a select over an existing similarity join.
- Change the inverted-index-based similarity join to the three-stage-similarity join due to efficiency considerations.
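
The commit message names similarity-jaccard-prefix but does not show how prefix filtering works. As a generic illustration only (not the code in this changeset), the sketch below uses the standard prefix-filtering bound: for a Jaccard threshold t and a token set of size n sorted by one global token order, two sets can reach similarity t only if their first n - ceil(t * n) + 1 tokens share at least one token. The class name `JaccardPrefixSketch`, the method names, and the use of integer token ranks are assumptions made for the example.

```java
// Hypothetical sketch of Jaccard prefix filtering; not taken from the changeset.
final class JaccardPrefixSketch {

    // Standard prefix length for Jaccard threshold t on a set of size n:
    // n - ceil(t * n) + 1 (0 for an empty set).
    static int prefixLength(int setSize, double threshold) {
        if (setSize == 0) {
            return 0;
        }
        return setSize - (int) Math.ceil(threshold * setSize) + 1;
    }

    // x and y hold distinct token ranks sorted ascending by the same global order.
    // Returns true if the two prefixes share a token, i.e. the pair is a candidate
    // that still needs exact verification.
    static boolean prefixesOverlap(int[] x, int[] y, double threshold) {
        int px = prefixLength(x.length, threshold);
        int py = prefixLength(y.length, threshold);
        int i = 0;
        int j = 0;
        while (i < px && j < py) {
            if (x[i] == y[j]) {
                return true;
            }
            if (x[i] < y[j]) {
                i++;
            } else {
                j++;
            }
        }
        return false;
    }
}
```

The filter is necessary but not sufficient: a pair whose prefixes overlap may still fall below the threshold, so a verification step computing the exact similarity is needed afterwards.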

Change-Id: I8736f104905eeda763d39709e002c2b9629278cc

Reviewed-on: https://asterix-gerrit.ics.uci.edu/1076

Sonar-Qube: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Contrib: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Dmitry Lychagin <dmitry.lychagin@couchbase.com>

Reviewed-by: Taewoo Kim <wangsaeu@gmail.com>

    ./similarity/SimilarityFilters.java (-11, +10)
    ./similarity/SimilarityFiltersJaccard.java (-87, +54)
    … 259 more files in changeset.
[ASTERIXDB-2256] Reformat sources using code format template

Change-Id: I4faa141c1a8c9700d5e9ac50b839acc9d1eede73

Reviewed-on: https://asterix-gerrit.ics.uci.edu/2310

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Contrib: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Murtadha Hubail <mhubail@apache.org>

    ./similarity/SimilarityFiltersJaccard.java (-7, +7)
    ./similarity/SimilarityMetricJaccard.java (-2, +2)
    ./tokenizer/TokenizerBufferedFactory.java (-1, +2)
    … 977 more files in changeset.
[NO ISSUE][HYR][*DB] Minor refactoring / address SonarQube comments

Change-Id: Icf10b6df0fdc006675d8f0da6fd06d50200c6b6a

Reviewed-on: https://asterix-gerrit.ics.uci.edu/2098

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Contrib: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Murtadha Hubail <mhubail@apache.org>

    … 59 more files in changeset.
ASTERIXDB-1778: Optimize the edit-distance-check function

- Only calculate 2 * (threshold + 1) cells, rather than all cells per row.
- Terminate the calculation steps early when it becomes obvious that the possible edit-distance value is greater than the given threshold. There is no reason to compute all cells in the 2-dimensional array.
- Move the location of IListIterator to Hyracks since we now have a CharacterIterator in a String. Change the name to ISequenceIterator.
- Add the section for the function in the manual.
- Remove the letter-counting filtering method since it is only applicable to strings in the ASCII range (0 ~ 127).
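
The first two points can be illustrated with a minimal sketch of a banded edit-distance check. This is not the code from SimilarityMetricEditDistance.java; the class name `EditDistanceCheckSketch`, the method signature, and the convention of returning -1 when the threshold is exceeded are assumptions for the example. Each row only visits the cells within `threshold` of the diagonal (at most 2 * threshold + 1 cells, in line with the bound stated above), and the loop stops as soon as every cell in the current band exceeds the threshold.

```java
// Hypothetical sketch of a thresholded (banded) edit-distance check.
final class EditDistanceCheckSketch {

    // Returns the edit distance between a and b if it is <= threshold, otherwise -1.
    static int editDistanceCheck(CharSequence a, CharSequence b, int threshold) {
        if (threshold < 0) {
            throw new IllegalArgumentException("threshold must be non-negative");
        }
        int n = a.length();
        int m = b.length();
        // The length difference is a lower bound on the edit distance.
        if (Math.abs(n - m) > threshold) {
            return -1;
        }
        final int inf = Integer.MAX_VALUE / 2; // marks cells outside the band
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];

        // Row 0: only the first threshold + 1 cells are inside the band.
        int top = Math.min(m, threshold);
        for (int j = 0; j <= top; j++) {
            prev[j] = j;
        }
        if (top < m) {
            prev[top + 1] = inf;
        }

        for (int i = 1; i <= n; i++) {
            int lo = Math.max(1, i - threshold);
            int hi = Math.min(m, i + threshold);
            // Left neighbour of the band: column 0 holds i, anything else is outside.
            curr[lo - 1] = (lo == 1) ? i : inf;
            int rowMin = curr[lo - 1];
            for (int j = lo; j <= hi; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(prev[j - 1] + cost,
                        Math.min(prev[j] + 1, curr[j - 1] + 1));
                curr[j] = Math.min(best, inf);
                rowMin = Math.min(rowMin, curr[j]);
            }
            // Right neighbour of the band, read by the next row.
            if (hi < m) {
                curr[hi + 1] = inf;
            }
            // Early termination: values never decrease along a path, so no later
            // row can bring the distance back under the threshold.
            if (rowMin > threshold) {
                return -1;
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m] <= threshold ? prev[m] : -1;
    }
}
```

For example, `editDistanceCheck("kitten", "sitting", 3)` returns 3, while the same call with a threshold of 2 returns -1 without filling the full table.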

Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977

Reviewed-on: https://asterix-gerrit.ics.uci.edu/1481

Sonar-Qube: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

BAD: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Jianfeng Jia <jianfeng.jia@gmail.com>

    ./similarity/IGenericSimilarityMetric.java (-5, +25)
    ./similarity/SimilarityMetricEditDistance.java (-180, +106)
    ./similarity/SimilarityMetricJaccard.java (-19, +4)
    … 10 more files in changeset.
Misc Cleanup, SonarQube Fixes

Change-Id: If87126cdd435067a50087e339522a36021fbc2c0

Reviewed-on: https://asterix-gerrit.ics.uci.edu/1108

Sonar-Qube: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Till Westmann <tillw@apache.org>

    ./recordgroup/RecordGroupLengthCount.java (-2, +1)
    … 9 more files in changeset.
remove AsterixRuntimeException

Change-Id: Ica9d828bffceabe3b614f68886bc860e34f593b4

Reviewed-on: https://asterix-gerrit.ics.uci.edu/856

Tested-by: Michael Blow <michael.blow@couchbase.com>

Reviewed-by: Michael Blow <michael.blow@couchbase.com>

    … 8 more files in changeset.