
javierjia <jianfeng.jia@gmail.com> in asterixdb

ASTERIXDB-1164: Fix the race condition in CodePointToStringDescriptor

The following commits from your working branch will be included:

commit 3074d479468f5f1d512e48c03eb209a45e482f2d

Author: JavierJia <jianfeng.jia@gmail.com>

Date: Wed Nov 11 20:40:00 2015 -0800

Fix the race condition in CodePointToStringDescriptor

Change-Id: I7c440731798e2ec8a4f0ab51be06ef7032835193

Reviewed-on: https://asterix-gerrit.ics.uci.edu/484

Reviewed-by: Cameron Samak <csamak@apache.org>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Till Westmann <tillw@apache.org>

ASTERIXDB-1102: VarSize Encoding to store length of String and ByteArray

This patch changes the encoding format used to store the length of variable-length types (e.g. String, ByteArray) from a fixed-size encoding (2 bytes) to a variable-size encoding (1 to 5 bytes).

This resolves issue 1102 by allowing Strings longer than 64K to be stored. In the common case of short strings (length <= 127), it also saves one byte per string.
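One common variable-size length scheme can be sketched as follows. This is a minimal sketch assuming 7 payload bits per byte, with the high bit as a "more bytes follow" flag and the most significant group written first; VarLenLength and its methods are hypothetical names, and the exact byte layout of the Hyracks encoder may differ. Under this scheme a length up to 127 fits in 1 byte (one byte less than the old fixed 2-byte field), and any non-negative Java int fits in at most 5 bytes, since 31 bits need ceil(31/7) = 5 groups.

```java
// Hypothetical sketch: 7 data bits per byte, high bit (0x80) marks "more bytes follow".
// The byte order (most significant group first) is an assumption, not the Hyracks format.
final class VarLenLength {
    private VarLenLength() {}

    /** Number of bytes needed to encode a non-negative length. */
    static int bytesRequired(int length) {
        if (length < 0) {
            throw new IllegalArgumentException("negative length");
        }
        int n = 1;
        while ((length >>>= 7) != 0) {
            n++;
        }
        return n;
    }

    /** Writes the length into buf starting at offset; returns the number of bytes written. */
    static int encode(int length, byte[] buf, int offset) {
        int n = bytesRequired(length);
        for (int i = 0; i < n; i++) {
            int shift = 7 * (n - 1 - i);
            int group = (length >>> shift) & 0x7F;
            // Every byte except the last carries the continuation bit.
            buf[offset + i] = (byte) (i < n - 1 ? group | 0x80 : group);
        }
        return n;
    }

    /** Reads a length previously written by encode. */
    static int decode(byte[] buf, int offset) {
        int value = 0;
        int i = offset;
        while (true) {
            int b = buf[i++] & 0xFF;
            value = (value << 7) | (b & 0x7F);
            if ((b & 0x80) == 0) {
                return value;
            }
        }
    }
}
```

Decoding needs no separate length-of-length field: the reader consumes bytes until it finds one with the high bit clear.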

Some important changes include:

1. UTF8StringSerDer and ByteArraySerDer are no longer singleton instances. I need some state to speed up serialization and avoid object creation. Fortunately, 99% of the serializers were already obtained through factories; the remaining 1% have been fixed.
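Why a serializer with internal state cannot remain a shared singleton can be sketched as follows. ReusingStringSerDe, SerDeFactory, and the 4-byte length prefix are hypothetical and only illustrate the pattern; they are not the AsterixDB classes or storage format. The scratch buffer is mutable, so two threads sharing one instance could overwrite each other's bytes; handing out a fresh instance per caller through a factory keeps the reuse benefit without the sharing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stateful SerDe: the scratch buffer is reused across calls,
// so an instance must not be shared between concurrent callers.
final class ReusingStringSerDe {
    private byte[] scratch = new byte[64];

    void serialize(String s, DataOutput out) throws IOException {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(utf8.length); // illustrative 4-byte length prefix
        out.write(utf8);
    }

    String deserialize(DataInput in) throws IOException {
        int len = in.readInt();
        if (scratch.length < len) {
            // Grow once, then keep reusing the larger buffer on later calls.
            scratch = new byte[Math.max(len, scratch.length * 2)];
        }
        in.readFully(scratch, 0, len);
        return new String(scratch, 0, len, StandardCharsets.UTF_8);
    }
}

// Hypothetical factory: each caller (e.g. each operator instance) gets its own SerDe.
final class SerDeFactory {
    private SerDeFactory() {}

    static ReusingStringSerDe create() {
        return new ReusingStringSerDe();
    }

    /** Demo helper: serializes then deserializes s with one instance, in memory. */
    static String roundTrip(ReusingStringSerDe serde, String s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            serde.serialize(s, new DataOutputStream(bytes));
            return serde.deserialize(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```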

Separately, for test support, ExecutionTest can now produce an only.xml file that stores the runtime tests from test.xml that failed in the previous run. This speeds up the debugging process.

Change-Id: I41fff780f5c071742ef10129d83c8f945d5886d7

Reviewed-on: https://asterix-gerrit.ics.uci.edu/450

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Jianfeng Jia <jianfeng.jia@gmail.com>

    /asterix-app/data/big-object/order.tbl.verylong.big (+1500, -0)
    … 308 more files in changeset.
ASTERIXDB-1102: VarSize Encoding to store length of String and ByteArray

This patch changes the encoding format used to store the length of variable-length types (e.g. String, ByteArray) from a fixed-size encoding (2 bytes) to a variable-size encoding (1 to 5 bytes).

This resolves issue 1102 by allowing Strings longer than 64K to be stored. In the common case of short strings (length <= 127), it also saves one byte per string.

Some important changes include:

1. Add a hyracks-util package to consolidate all Hyracks-independent utility functions. This reduces the chance of having duplicate utilities in different packages.

2. Move part of the Asterix string functions down into the Hyracks UTF8StringPointable object, which benefits other dependent projects such as VXQuery.

Change-Id: I7e95df0f06984b784ebac2c84b97e56a50207d27

Reviewed-on: https://asterix-gerrit.ics.uci.edu/449

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Taewoo Kim <wangsaeu@gmail.com>

Reviewed-by: Jianfeng Jia <jianfeng.jia@gmail.com>

    /algebricks/algebricks-examples/pom.xml (+8, -1)
    … 109 more files in changeset.
VariableSizeFrame(VSizeFrame) support for Hyracks.

This patch replaces Frame/Accessor/Appender with a new API that supports BigObject.
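The core idea of a variable-size frame can be sketched as a buffer whose capacity is always a whole multiple of a minimum frame size and which is enlarged when a big object does not fit. This is a minimal sketch under that assumption: GrowableFrame and its methods are hypothetical and are not the Hyracks IFrame or VSizeFrame API; the design document linked in this commit describes the actual allocation policy.

```java
import java.nio.ByteBuffer;

// Hypothetical variable-size frame: capacity is always a whole multiple of
// minFrameSize and grows when a tuple (e.g. a big object) does not fit.
final class GrowableFrame {
    private final int minFrameSize;
    private ByteBuffer buffer;

    GrowableFrame(int minFrameSize) {
        if (minFrameSize <= 0) {
            throw new IllegalArgumentException("minFrameSize must be positive");
        }
        this.minFrameSize = minFrameSize;
        this.buffer = ByteBuffer.allocate(minFrameSize);
    }

    /** Smallest multiple of minFrameSize (at least one frame) that holds `bytes`. */
    static int alignedSize(int bytes, int minFrameSize) {
        int frames = (bytes + minFrameSize - 1) / minFrameSize; // ceiling division
        return Math.max(1, frames) * minFrameSize;
    }

    /**
     * Ensures room for `bytes` in total, copying the bytes written so far.
     * Assumes the buffer is in write mode. Returns true if the frame grew.
     */
    boolean ensureCapacity(int bytes) {
        if (bytes <= buffer.capacity()) {
            return false;
        }
        ByteBuffer bigger = ByteBuffer.allocate(alignedSize(bytes, minFrameSize));
        buffer.flip();      // limit = bytes written so far, position = 0
        bigger.put(buffer); // copy them; bigger keeps the same write position
        buffer = bigger;
        return true;
    }

    int capacity() {
        return buffer.capacity();
    }

    ByteBuffer buffer() {
        return buffer;
    }
}
```

Aligning capacities to whole frames keeps memory accounting in frame units; as the commit message notes, a single big object can still push an operator past its memory budget.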

The ExternalSorter, TopKSorter, and ExternalGroupSorter have been implemented to support big objects.

Group-by and join should also work with BigObject, but they will break the memory budget when they encounter a big object. I will fix the memory problem later in a separate CR.

The design of the frame allocation is described here: https://docs.google.com/presentation/d/15h9iQf5OYsgGZoQTbGHkj1yS2G9q2fd0s1lDAD1EJq0/edit?usp=sharing

Suggested review order:

Patch 12: It includes all of the sorting operators.
Patch 13: It applies the new IFrame API to all Hyracks code.
Patch 14: Bug fixes to pass all Asterix tests.
Patch 15: Skip it!
Patch 16: Bug fixes for the Asterix tests in the small-frame setting.
Later patches: address the review comments.

Change-Id: I2e08692078683f6f2cf17387e39037ad851fc05b

Reviewed-on: https://asterix-gerrit.ics.uci.edu/234

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Yingyi Bu <buyingyi@gmail.com>

    … 205 more files in changeset.
VariableSizeFrame(VSizeFrame) support for Asterix (Runtime Only)

Apply the API changes from https://asterix-gerrit.ics.uci.edu/#/c/234/ to the Asterix level.

Change-Id: I5459e877707a1494fc1bebf03d4457a7427e9e0f

Reviewed-on: https://asterix-gerrit.ics.uci.edu/259

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Yingyi Bu <buyingyi@gmail.com>

    /asterix-app/data/big-object/customer.tbl.big (+150, -0)
    /asterix-app/data/big-object/lineitem.tbl.big (+6005, -0)
    /asterix-app/data/big-object/order.tbl.big (+1500, -0)
    /asterix-app/src/test/resources/runtimets/only.xml (+23, -0)
    … 29 more files in changeset.
Fix the HashFunction Bug in OptimizedHybridHashJoinOperatorDescriptor


The HashFunction used for InMemoryHashJoin was not updated with the recursion level when OptimizedHybridHashJoin switched to InMemoryHashJoin. As a result, the join degenerated into a NestedLoopJoin after the second round.

This patch fixes that.
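Why reusing the level-0 hash function degrades the join can be illustrated with a small sketch. It assumes a hypothetical family of hash functions indexed by recursion level; LeveledHash is not the Hyracks API, and the mixing constants are the MurmurHash3 32-bit finalizer, chosen only for illustration. Tuples that landed in the same partition at one level agree on that level's hash, so in the worst case (same function, same modulus) hashing them again puts them all in one bucket, and every probe becomes a scan of that bucket.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical family of hash functions indexed by recursion level.
// The mixing steps are the MurmurHash3 32-bit finalizer, used only for illustration.
final class LeveledHash {
    private LeveledHash() {}

    static int hash(int key, int level) {
        int h = key ^ (0x9E3779B9 * (level + 1)); // level-dependent seed
        h ^= h >>> 16;
        h *= 0x85EBCA6B;
        h ^= h >>> 13;
        h *= 0xC2B2AE35;
        h ^= h >>> 16;
        return h & Integer.MAX_VALUE; // force non-negative
    }

    static int bucket(int key, int level, int numBuckets) {
        return hash(key, level) % numBuckets;
    }

    /** Number of distinct buckets the keys occupy under the given level's hash. */
    static int distinctBuckets(List<Integer> keys, int level, int numBuckets) {
        Set<Integer> used = new HashSet<>();
        for (int key : keys) {
            used.add(bucket(key, level, numBuckets));
        }
        return used.size();
    }
}
```

With keys chosen so that they all fall into bucket 0 at level 0, distinctBuckets(keys, 0, n) is 1 for the same n, while the level-1 hash normally spreads them across many buckets; passing the current level to the in-memory join's hash function is what restores that spread.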

Change-Id: Id25c85b7fadbb6bb969d0d94a51c60ac2573938e

Reviewed-on: https://asterix-gerrit.ics.uci.edu/285

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Pouria Pirzadeh <pouria.pirzadeh@gmail.com>

Add the Binary data type and corresponding helper functions to Asterix.

The binary data type is implemented as a byte array. Its storage format follows that of the String type: 2 bytes for the length, followed by the byte contents.

Binary values are constructed with hex("") or base64(""), which pass a hex string or base64 string into Asterix. For output, we use the hex("") format.

The parse-[hex|base64](string) functions parse the corresponding hex or base64 string into the binary type. The print-[hex|base64](binary) functions print a binary value as a hex or base64 string.
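The round trip between the textual forms and the underlying byte array can be sketched with the JDK's own codecs. BinaryCodecs and its methods are hypothetical names, not AsterixDB functions; the sketch assumes the standard base64 alphabet and upper-case hex output, and it requires Java 17 or later for java.util.HexFormat.

```java
import java.util.Base64;
import java.util.HexFormat;

// Hypothetical helpers mirroring the described semantics (Java 17+ for HexFormat).
final class BinaryCodecs {
    private BinaryCodecs() {}

    // Like hex("") / parse-hex: accepts upper- or lower-case digits; odd length is rejected.
    static byte[] parseHex(String s) {
        return HexFormat.of().parseHex(s);
    }

    // Like base64("") / parse-base64: standard alphabet, '=' padding optional.
    static byte[] parseBase64(String s) {
        return Base64.getDecoder().decode(s);
    }

    // Like print-hex; upper-case output is an assumption.
    static String printHex(byte[] b) {
        return HexFormat.of().withUpperCase().formatHex(b);
    }

    // Like print-base64.
    static String printBase64(byte[] b) {
        return Base64.getEncoder().encodeToString(b);
    }
}
```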

The sub-binary(binary, offset, [length]) function works the same way as substring(string, offset, [length]).

The find-binary(srcbinary, targetbinary, [start-offset]) function finds the position of targetbinary in srcbinary.
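The intended semantics of sub-binary and find-binary can be sketched on plain byte arrays. This sketch assumes 0-based offsets, clamping of out-of-range offsets and lengths, and -1 as the not-found result; those choices, and the names BinaryFunctions, subBinary, and findBinary, are assumptions rather than the documented AsterixDB behavior.

```java
import java.util.Arrays;

// Hypothetical sketch: 0-based offsets, clamped ranges, -1 when not found.
final class BinaryFunctions {
    private BinaryFunctions() {}

    /** sub-binary(binary, offset, length): clamps offset and length to the available bytes. */
    static byte[] subBinary(byte[] src, int offset, int length) {
        int start = Math.max(0, Math.min(offset, src.length));
        int end = start + Math.min(Math.max(0, length), src.length - start);
        return Arrays.copyOfRange(src, start, end);
    }

    /** sub-binary(binary, offset): everything from offset to the end. */
    static byte[] subBinary(byte[] src, int offset) {
        return subBinary(src, offset, src.length);
    }

    /** find-binary(src, target, startOffset): first match at or after startOffset, or -1. */
    static int findBinary(byte[] src, byte[] target, int startOffset) {
        for (int i = Math.max(0, startOffset); i + target.length <= src.length; i++) {
            if (Arrays.equals(src, i, i + target.length, target, 0, target.length)) {
                return i;
            }
        }
        return -1;
    }

    /** find-binary(src, target): search from the beginning. */
    static int findBinary(byte[] src, byte[] target) {
        return findBinary(src, target, 0);
    }
}
```

findBinary is a naive scan that compares the target at every start position, which is adequate for short binaries; a production version could use a dedicated string-search algorithm.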

Change-Id: I5ecf0cc115c44070fb5c1fc5b0ec12a95d4243a4

Reviewed-on: http://fulliautomatix.ics.uci.edu:8443/175

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Yingyi Bu <buyingyi@gmail.com>

    /asterix-app/data/adm-load/binary_type.adm (+1, -0)
    /asterix-app/data/adm-load/usermd5.adm (+25, -0)
    /asterix-app/data/adm-load/usermd5copy.adm (+20, -0)
    /asterix-app/data/nontagged/allData.json (+17, -2)
    … 113 more files in changeset.
Add ByteArrayPointable datatype.

Change-Id: Iebb5add2363d0f72dcd66ac139339ccf834a9df1

Reviewed-on: http://fulliautomatix.ics.uci.edu:8443/174

Reviewed-by: Yingyi Bu <buyingyi@gmail.com>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>