[ASTERIXDB-2555][RT][COMP] Make hash join use logical comparison

- user model changes: no

- storage format changes: no

- interface changes: no

Details:

This patch changes the hash join operator to use the join condition

to evaluate if tuples are equal when joining. Binary physical comparators

have been removed. The join condition evaluator is in TuplePairEvaluator.

- extracted TuplePairEvaluatorFactory out of nested loop join class

into a separate class so that it is shared among nested loop and

hash join.

- switched from FrameTuplePairComparator to ITuplePairComparator in

OptimizedHybridHashJoin and InMemoryHashJoin.

- moved debugging code from OptimizedHybridHashJoin into a separate

class, JoinUtil.

- temporarily made the logical comparison of multisets use raw binary

comparison instead of returning null until the logic is implemented.

- made IBinaryBooleanInspector a functional interface and updated

the implementations.

- updated record and array test cases to reflect the new

behaviour of hash join where logical comparison could produce null.

Also, updated sorting, group by and distinct test cases since

the input data has been modified.

- added two new input files arrays1nulls.adm & arrays2nulls.adm

to be used by the open dataset. The previous arrays1.adm & arrays2.adm

are used by the closed dataset since it cannot accept arrays with

null values.
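
The behaviour change described above can be illustrated with a minimal sketch. It is not the AsterixDB implementation: the class, the Tri enum, the PairEvaluator interface and the null-handling rule in logicalEquals are all hypothetical stand-ins, with keys modelled as lists of integers. It only shows why a pair that matches under raw binary comparison may no longer join once the condition is evaluated logically and yields null.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types are AsterixDB or Hyracks classes.
final class LogicalHashJoinSketch {
    /** Three-valued outcome of a comparison. */
    enum Tri { TRUE, FALSE, NULL }

    /** Stand-in for a join-condition evaluator applied to one pair of keys. */
    interface PairEvaluator {
        Tri evaluate(List<Integer> left, List<Integer> right);
    }

    /** Raw equality: keys match whenever their contents are identical, nulls included. */
    static Tri binaryEquals(List<Integer> left, List<Integer> right) {
        return left.equals(right) ? Tri.TRUE : Tri.FALSE;
    }

    /** Assumed logical equality: a null element makes the result unknown unless a mismatch is found. */
    static Tri logicalEquals(List<Integer> left, List<Integer> right) {
        if (left.size() != right.size()) {
            return Tri.FALSE;
        }
        boolean sawNull = false;
        for (int i = 0; i < left.size(); i++) {
            Integer l = left.get(i);
            Integer r = right.get(i);
            if (l == null || r == null) {
                sawNull = true;
            } else if (!l.equals(r)) {
                return Tri.FALSE;
            }
        }
        return sawNull ? Tri.NULL : Tri.TRUE;
    }

    /** Builds a hash table on one input, probes with the other, and counts accepted pairs. */
    static int countMatches(List<List<Integer>> build, List<List<Integer>> probe, PairEvaluator condition) {
        Map<Integer, List<List<Integer>>> table = new HashMap<>();
        for (List<Integer> key : build) {
            table.computeIfAbsent(key.hashCode(), h -> new ArrayList<>()).add(key);
        }
        int matches = 0;
        for (List<Integer> key : probe) {
            for (List<Integer> candidate : table.getOrDefault(key.hashCode(), new ArrayList<>())) {
                // Only a definite TRUE joins the pair; FALSE and NULL both reject it.
                if (condition.evaluate(candidate, key) == Tri.TRUE) {
                    matches++;
                }
            }
        }
        return matches;
    }
}
```

Joining the rows [1, 2] and [1, null] against themselves gives two matches with binaryEquals but only one with logicalEquals, because [1, null] compared with [1, null] evaluates to NULL in this sketch.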

Change-Id: If1834967fdd913fdc76003f09636b2450d07cd5e

Reviewed-on: https://asterix-gerrit.ics.uci.edu/3387

Contrib: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Dmitry Lychagin <dmitry.lychagin@couchbase.com>

Reviewed-by: Murtadha Hubail <mhubail@apache.org>

  1. … 48 more files in changeset.
[ASTERIXDB-2554][HYR] Add UTF8 and byte array comparator factories

- user model changes: no

- storage format changes: no

- interface changes: no

Details:

Add comparator factories for UTF8StringPointable, UTF8StringLowercasePointable,

UTF8StringLowercaseTokenPointable and ByteArrayPointable instead of using

PointableBinaryComparatorFactory, a wrapping factory that will create a factory

each time (which also creates a comparator each time).
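
As a rough illustration of the idea, a dedicated factory can be a stateless singleton that hands out a shared comparator, so nothing is wrapped or re-created per use. The interfaces below are simplified stand-ins written for this sketch, not the Hyracks interfaces, and the unsigned byte-wise ordering is an assumption.

```java
// Hypothetical sketch: simplified stand-ins, not the Hyracks comparator API.
final class ComparatorFactorySketch {
    interface BinaryComparator {
        int compare(byte[] a, byte[] b);
    }

    interface BinaryComparatorFactory {
        BinaryComparator createBinaryComparator();
    }

    /** Dedicated factory for byte arrays: one shared instance, no wrapping factory. */
    static final class ByteArrayComparatorFactory implements BinaryComparatorFactory {
        static final ByteArrayComparatorFactory INSTANCE = new ByteArrayComparatorFactory();

        private ByteArrayComparatorFactory() {
        }

        @Override
        public BinaryComparator createBinaryComparator() {
            // The comparator is stateless, so a method reference is sufficient.
            return ByteArrayComparatorFactory::compareUnsigned;
        }

        /** Lexicographic comparison that treats each byte as unsigned. */
        static int compareUnsigned(byte[] a, byte[] b) {
            int n = Math.min(a.length, b.length);
            for (int i = 0; i < n; i++) {
                int diff = (a[i] & 0xff) - (b[i] & 0xff);
                if (diff != 0) {
                    return diff;
                }
            }
            return Integer.compare(a.length, b.length);
        }
    }
}
```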

Change-Id: Ied6a29210a3dc1ba9fd553fb0a67ff4340e4571f

Reviewed-on: https://asterix-gerrit.ics.uci.edu/3355

Contrib: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Ali Alsuliman <ali.al.solaiman@gmail.com>

  1. … 47 more files in changeset.
[ASTERIXDB-2516][RT] Move primitive comparators to Hyracks and make singleton

- user model changes: no

- storage format changes: no

- interface changes: no

Details:

- moved 2 comparators (boolean and long) from asterix to hyracks.

- added byte, short, integer, float and double comparator

factories to Hyracks to replace PointableBinaryComparatorFactory.

- removed checking lengths of 0 from PointableBinaryComparatorFactory.

- changed tests to use the primitive factories.

Change-Id: If15dc4e0dd0db942a4cadb15abbe56cbfe617b48

Reviewed-on: https://asterix-gerrit.ics.uci.edu/3294

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Contrib: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Dmitry Lychagin <dmitry.lychagin@couchbase.com>

  1. … 52 more files in changeset.
[ASTERIXDB-2523][RT][COMP] add support for hashing array fields

- user model changes: no

- storage format changes: no

- interface changes: no

Details:

Add support for hashing array fields.

- Modified AMurmurHash3BinaryHashFunctionFamily and extracted the hashing function

into a private named hashing function "GenericHashFunction". Added hashing arrays.

- Modified hash join to include generating hash functions for the right branch

since now hash functions are type-dependent and cannot use the same hash functions

generated for the left branch.

- Added test cases.

Change-Id: Ibd0dc7f270730140226f54445705822049f5c863

Reviewed-on: https://asterix-gerrit.ics.uci.edu/3241

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Till Westmann <tillw@apache.org>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Contrib: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

  1. … 54 more files in changeset.
[ASTERIXDB-2516][RT] prepare physical comparators for deep comparison

- user model changes: no

- storage format changes: no

- interface changes: yes

Details:

This change is to make physical comparators type-aware in order to do

deep comparison of complex types like arrays and records. The IAType

is propagated to the comparators.

- added new methods in IBinaryComparatorFactoryProvider to accept the

type of left and right inputs for operations like hash join where

the join key types come from different dataset sources.

- defaulted some array functions to use the old comparator behaviour temporarily

until complex comparison is implemented.

- modified AObjectAscBinaryComparatorFactory & AObjectDescBinaryComparatorFactory to

create a comparator with IAType information. Changed the serialization/deserialization

of their instances to take care of the newly added fields since they are not

present in old instances.

Change-Id: I02011e7151398d5f5f9ba9c1e1db6518484b9fe5

Reviewed-on: https://asterix-gerrit.ics.uci.edu/3229

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Contrib: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Dmitry Lychagin <dmitry.lychagin@couchbase.com>

Reviewed-by: Till Westmann <tillw@apache.org>

  1. … 24 more files in changeset.
[NO ISSUE][OTH] Fix hyracks-api Dependences

- user model changes: no

- storage format changes: no

- interface changes: no

Details:

- Ensure hyracks-api module depends only on hyracks-util. This way

new APIs can be added to hyracks-api and used on all other modules

without facing cyclic dependency issues.

Change-Id: I7f4329b3dad99c256fb2e10a7863aaca41990ce0

Reviewed-on: https://asterix-gerrit.ics.uci.edu/3047

Reviewed-by: Murtadha Hubail <mhubail@apache.org>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Till Westmann <tillw@apache.org>

Contrib: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

  1. … 45 more files in changeset.
[ASTERIXDB-2256] Reformat sources using code format template

Change-Id: I4faa141c1a8c9700d5e9ac50b839acc9d1eede73

Reviewed-on: https://asterix-gerrit.ics.uci.edu/2310

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Contrib: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Murtadha Hubail <mhubail@apache.org>

  1. … 982 more files in changeset.
[ASTERIXDB-2149] Enable multiple normalized keys in sort

- user model changes: no

- storage format changes: no

- interface changes: yes. The interface of sort is changed.

Currently, during the (in-memory) sort, we use an int normalized key to

speed up comparisons by avoiding random memory accesses. However, this

technique is inefficient if the first 4 bytes of the sorting keys are

not distinctive. From a performance point of view, it's better to use

longer normalized keys when possible (2-3x improvements).

This is enabled by this patch by:

- Allowing multiple normalized keys during sort, and the length of each

normalized key can be longer (multiple integers).

- Enable memory budgeting of pointer directories as well during sort

(but for performance, we still use int[], instead of byte[] from frame).

The next patch will enable the AsterixDB layer to use this feature to

speed up sort performance.
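
The normalized-key technique can be sketched as follows. This is an illustration under assumptions, not the Hyracks sort: keys are UTF-8 strings, each normalized key packs the leading bytes big-endian into one or more ints, and the full key is compared only when the normalized keys tie. All names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of sorting with multi-int normalized keys.
final class NormalizedKeySortSketch {
    /** Packs the first 4 * words bytes of the key into ints, big-endian and zero-padded. */
    static int[] normalize(byte[] key, int words) {
        int[] nk = new int[words];
        for (int i = 0; i < words * 4; i++) {
            int b = i < key.length ? key[i] & 0xff : 0;
            nk[i / 4] |= b << (24 - 8 * (i % 4));
        }
        return nk;
    }

    /** Compares normalized keys first and falls back to the full keys only on a tie. */
    static int compare(int[] nkA, byte[] a, int[] nkB, byte[] b) {
        for (int i = 0; i < nkA.length; i++) {
            int c = Integer.compareUnsigned(nkA[i], nkB[i]);
            if (c != 0) {
                return c;
            }
        }
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    /** Sorts the input by unsigned byte order, using `words` ints of normalized key per entry. */
    static String[] sort(String[] input, int words) {
        int n = input.length;
        byte[][] keys = new byte[n][];
        int[][] nks = new int[n][];
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            keys[i] = input[i].getBytes(StandardCharsets.UTF_8);
            nks[i] = normalize(keys[i], words);
            order[i] = i;
        }
        Arrays.sort(order, (x, y) -> compare(nks[x], keys[x], nks[y], keys[y]));
        String[] out = new String[n];
        for (int i = 0; i < n; i++) {
            out[i] = input[order[i]];
        }
        return out;
    }
}
```

With keys such as "prefix-a" and "prefix-b", a single int ties on "pref" and forces the full comparison, while two ints cover all 8 bytes and already decide the order; the sorted result is the same either way.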

Change-Id: I4354242ff731b4b006b8446b58f65873047dde78

Reviewed-on: https://asterix-gerrit.ics.uci.edu/2127

Sonar-Qube: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Contrib: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: abdullah alamoudi <bamousaa@gmail.com>

  1. … 29 more files in changeset.
ASTERIXDB-1556, ASTERIXDB-1733: Hash Group By and Hash Join conform to the memory budget

- External Hash Group By and Hash Join now conform to the memory budget (compiler.groupmemory and compiler.joinmemory)

- For Optimized Hybrid Hash Join, we calculate the expected hash table size when the build phase is done and

try to spill one or more partitions if the free space can't afford the hash table size.

- For External Hash Group By, the number of hash entries (hash table size) is calculated based on

an estimation of the aggregated tuple size and possible hash values for the given field size in that tuple.

- Garbage Collection feature has been added to SerializableHashTable. For external hash group-by,

whenever we spill a data partition to the disk, we also check the ratio of garbage in the hash table.

If it's greater than the given threshold, we conduct a GC on Hash Table.
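
A minimal sketch of the two checks described above, under stated assumptions: the hash table cost is modelled as a fixed number of bytes per resident tuple, the largest resident partition is spilled first, and the garbage ratio is garbage slots divided by total slots. None of these names or policies are taken from the Hyracks code.

```java
// Hypothetical sketch of budget-driven spilling and hash table garbage collection checks.
final class SpillPolicySketch {
    /**
     * Returns how many partitions must be spilled (largest first) so that the
     * expected hash table fits into the memory left over by resident partitions.
     */
    static int partitionsToSpill(long[] partitionBytes, long[] partitionTuples,
            long budgetBytes, long entryBytesPerTuple) {
        boolean[] spilled = new boolean[partitionBytes.length];
        long usedBytes = 0;
        long residentTuples = 0;
        for (int i = 0; i < partitionBytes.length; i++) {
            usedBytes += partitionBytes[i];
            residentTuples += partitionTuples[i];
        }
        int count = 0;
        while (count < partitionBytes.length
                && residentTuples * entryBytesPerTuple > budgetBytes - usedBytes) {
            // Pick the largest partition that is still in memory as the victim.
            int victim = -1;
            for (int i = 0; i < partitionBytes.length; i++) {
                if (!spilled[i] && (victim < 0 || partitionBytes[i] > partitionBytes[victim])) {
                    victim = i;
                }
            }
            spilled[victim] = true;
            usedBytes -= partitionBytes[victim];
            residentTuples -= partitionTuples[victim];
            count++;
        }
        return count;
    }

    /** Returns true when the share of garbage slots in the hash table exceeds the threshold. */
    static boolean shouldCollect(long garbageSlots, long totalSlots, double threshold) {
        return totalSlots > 0 && (double) garbageSlots / totalSlots > threshold;
    }
}
```

For example, with partitions of 400, 300 and 200 bytes holding 40, 30 and 20 tuples, a 1000-byte budget and 4 bytes per hash entry, the table needs 360 bytes but only 100 are free, so spilling the 400-byte partition is enough.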

Change-Id: I2b323e9a2141b4c1dd1652a360d2d9354d3bc3f5

Reviewed-on: https://asterix-gerrit.ics.uci.edu/1056

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

BAD: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Yingyi Bu <buyingyi@gmail.com>

  1. … 41 more files in changeset.
Continue Cleaning Up File References and Splits

1. Make FileSplit an abstract class with two subclasses;

Managed and Unmanaged. A Managed FileSplit can be mapped

in a new subclass MappedFileSplit that maps a relative path to an

IO device. UnmanagedFileSplit is for files outside the io devices.

2. Remove all usages of absolute paths in file split in test cases. The

only remaining place is the write statement.

3. Fix some of the hidden issues in the tests that were working because

of our use of the absolute paths.

4. Revert the decision of selecting the IO device to the CC.

Change-Id: I166af8f9b3a2257f94d7b05db94888fb7cb4c79e

Reviewed-on: https://asterix-gerrit.ics.uci.edu/1359

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: abdullah alamoudi <bamousaa@gmail.com>

  1. … 251 more files in changeset.
ASTERIXDB-1736: Remove Grace Hash Join (not being used)

- Removed Grace Hash Join that is not currently being used

since we always use Optimized Hybrid Hash Join.

Change-Id: I16e9e4c73d7851f18a48c2715a6bc5c903b74eba

Reviewed-on: https://asterix-gerrit.ics.uci.edu/1353

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Yingyi Bu <buyingyi@gmail.com>

  1. … 5 more files in changeset.
Cleanup FileSplit and FileReference

This change gives FileSplit and FileReference specific meanings to

avoid confusion between absolute vs. relative, local vs. global, and inside

an IO device vs. outside IO devices.

In addition, it enables better abstraction of global partitions and

delegates the responsibility of choosing which partition goes to which

IO device to the IO Manager through the introduction of FileDeviceComputer.

In detail:

Previously, the LocalResource in Hyracks had a partition (storage partition),

even though there is no such concept in Hyracks. This scope leak is bad. In addition,

the local resource had a name and a path. They were always the same, so

the name was removed.

The storage partition was instead moved to the AsterixDB implementation of the

serialized object in the local resource.

With all of these changes, the cluster controller (compiler) only needs to

know about partitions and relative paths. It doesn't need to worry about

heterogeneous node setups and different IO device configurations. For file

assignment to IO devices, a new interface (IFileDeviceComputer) was

introduced which can be overridden by applications to have their own

strategy for distributing files among IO devices.
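
To make the idea of a pluggable distribution strategy concrete, here is a small sketch. The interface name FileDeviceComputer, its single method and both strategies are hypothetical and written only for illustration; the actual IFileDeviceComputer signature is not shown in this message.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of pluggable file-to-device assignment strategies.
final class FileDeviceSketch {
    /** Stand-in for a strategy that maps a relative path to one of the IO devices. */
    interface FileDeviceComputer {
        int computeDevice(String relativePath, int deviceCount);
    }

    /** Stable assignment: the same relative path always lands on the same device. */
    static final FileDeviceComputer HASHED =
            (path, deviceCount) -> Math.floorMod(path.hashCode(), deviceCount);

    /** Round-robin assignment: spreads files evenly regardless of their paths. */
    static FileDeviceComputer roundRobin() {
        AtomicInteger next = new AtomicInteger();
        return (path, deviceCount) -> Math.floorMod(next.getAndIncrement(), deviceCount);
    }
}
```

An application that wants files of one dataset kept together could supply its own implementation, for instance one that hashes only the leading directory of the relative path.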

Change-Id: I4fac508bf9af5a3bed41a3cf4464d2cbfecf2f61

Reviewed-on: https://asterix-gerrit.ics.uci.edu/1352

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: abdullah alamoudi <bamousaa@gmail.com>

  1. … 284 more files in changeset.
Move Hyracks to subfolder

    • ./client/Common.java (+130, -0)
    • ./client/Groupby.java (+207, -0)
    • ./client/Join.java (+266, -0)
    • ./client/Sort.java (+169, -0)
  1. … 4424 more files in changeset.