Make LSM bulkload append-only and write-once.

Allows for usage of LSM indexes with underlying storage that is append-only.

This also results in a small improvement for LSM component bulk load speed.

- Tree metadata (filters, etc.) now lies at the back of the tree file in append-only mode.

-- Note that you should *never* pass the append-only flag on bulk-load if the tree is ever to be modified in place.

- Append-only operations bypass the buffer cache for writes, but use the buffer cache for memory allocation and reads.

- Addresses ASTERIXDB-1059

Change-Id: I80fb891b5310252143854a336b591bf3f8cd4ba7

Reviewed-on: https://asterix-gerrit.ics.uci.edu/255

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Young-Seok Kim <kisskys@gmail.com>

Reviewed-by: Murtadha Hubail <hubailmor@gmail.com>

  1. … 92 more files in changeset.
ASTERIXDB-1058: Lazy LSM memory components allocation

Change-Id: I476e756f8d71260ea614c8c072fc9503053866c9

Reviewed-on: https://asterix-gerrit.ics.uci.edu/405

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Ian Maxon <imaxon@apache.org>

Reviewed-by: Young-Seok Kim <kisskys@gmail.com>

  1. … 10 more files in changeset.
ASTERIXDB-1102: VarSize Encoding to store length of String and ByteArray

This patch changes the encoding that stores the length of variable-length types (e.g. String, ByteArray) from a fixed-size encoding (2 bytes) to a variable-size encoding (1 to 5 bytes).

It resolves issue 1102 by allowing a String longer than 64K to be stored. In the common case of a short string (length <= 127), it also saves one byte per string.
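
As an illustration of the 1-to-5-byte length encoding described above, here is a minimal sketch of a 7-bits-per-byte variable-length integer encoding. The class and method names (`VarLenSketch`, `encodedSize`, `encode`, `decode`) are hypothetical, and the byte order and continuation-bit convention are assumptions; the actual Hyracks encoding may differ in these details.

```java
// Hypothetical sketch: not the Hyracks implementation.
final class VarLenSketch {

    // Number of bytes needed for a non-negative length: 1 to 5.
    static int encodedSize(int length) {
        int size = 1;
        while ((length >>>= 7) != 0) {
            size++;
        }
        return size;
    }

    // Encodes the length in 7-bit groups, most significant group first
    // (assumed order). Every byte except the last has its high bit set.
    static byte[] encode(int length) {
        int size = encodedSize(length);
        byte[] out = new byte[size];
        for (int i = size - 1; i >= 0; i--) {
            int group = length & 0x7F;
            out[i] = (byte) (i == size - 1 ? group : group | 0x80);
            length >>>= 7;
        }
        return out;
    }

    // Decodes a length starting at offset 'start'; stops at the first
    // byte whose high bit is clear.
    static int decode(byte[] in, int start) {
        int value = 0;
        int pos = start;
        while (true) {
            int b = in[pos++] & 0xFF;
            value = (value << 7) | (b & 0x7F);
            if ((b & 0x80) == 0) {
                return value;
            }
        }
    }
}
```

Under this scheme a length of at most 127 fits in one byte, which is where the one-byte saving over the fixed 2-byte encoding comes from, and five bytes are enough for any non-negative 32-bit length.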

Some important changes include:

1. Add a hyracks-util package to consolidate all the Hyracks-independent utility functions. This reduces the chance of having duplicate utilities in different packages.

2. Move parts of the Asterix string functions down to the Hyracks UTF8StringPointable object, which will benefit other dependents, such as VXQuery.

Change-Id: I7e95df0f06984b784ebac2c84b97e56a50207d27

Reviewed-on: https://asterix-gerrit.ics.uci.edu/449

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Taewoo Kim <wangsaeu@gmail.com>

Reviewed-by: Jianfeng Jia <jianfeng.jia@gmail.com>

  1. … 114 more files in changeset.
Moved LSMOperationType to LSM API package

Change-Id: Ib6f0b7373388fc88605188e5a8089bd183d23af1

Reviewed-on: https://asterix-gerrit.ics.uci.edu/447

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Yingyi Bu <buyingyi@gmail.com>

  1. … 23 more files in changeset.
ASTERIXDB-139: Add temp workspace files deletion to IOManager

This change includes the following:

- Add a method to delete temp workspace files (WAF)

- Expose LSMComponents files suffixes to Asterix

Change-Id: I760074764755e7aee100ff33c14b13bf4b29ec2e

Reviewed-on: https://asterix-gerrit.ics.uci.edu/337

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Yingyi Bu <buyingyi@gmail.com>

  1. … 21 more files in changeset.
Change license headers

Change-Id: I98b18f24a20dcd8dc75e828e47fb0ab88179a5be

Reviewed-on: https://asterix-gerrit.ics.uci.edu/386

Reviewed-by: Till Westmann <tillw@apache.org>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

  1. … 2055 more files in changeset.
Change folder structure for Java repackage

Change only the folders, not the files, for our package name change.

This will break the build, and needs to be followed by a change to

the package name in all of the source files. However performing

the folder move and file change in two steps lets Git understand

that the files are the same, and lets us track revisions across

those files.

Change-Id: I08aff75e25ac7c6298c32cf3402febbc4a318c2a

Reviewed-on: https://asterix-gerrit.ics.uci.edu/307

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Chris Hillery <ceej@lambda.nu>

  1. … 3879 more files in changeset.
Optimized the binary tokenizer - get the total number of tokens

Change-Id: Ifa9a18a43a097766da22633bb48371ffc78406ae

Reviewed-on: https://asterix-gerrit.ics.uci.edu/348

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Young-Seok Kim <kisskys@gmail.com>

Introducing data replication API to LSM indexes

Change-Id: I80565fc9d74e30440d2df5917911904ba8f33c25

Reviewed-on: https://asterix-gerrit.ics.uci.edu/322

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: abdullah alamoudi <bamousaa@gmail.com>

  1. … 37 more files in changeset.
Change Java package from edu.uci.ics to org.apache

Change-Id: I99172d856e88954b00cf7cfb24d33bb400f53994

Reviewed-on: https://asterix-gerrit.ics.uci.edu/308

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Till Westmann <tillw@apache.org>

  1. … 2019 more files in changeset.
VariableSizeFrame(VSizeFrame) support for Hyracks.

This patch replaces Frame/Accessor/Appender with a new API that supports BigObject.

The ExternalSorter/TopKSorter/ExternalGroupSorter have been implemented to support big objects.

The Groupby and Join should also work with BigObject, but they will exceed the memory budget when they encounter a big object. I will fix the memory problem later in a separate CR.

The design of the frame allocation is described here: https://docs.google.com/presentation/d/15h9iQf5OYsgGZoQTbGHkj1yS2G9q2fd0s1lDAD1EJq0/edit?usp=sharing

Suggest review order:

Patch 12: It includes all of the sorting operators.

Patch 13: It applies the new IFrame API to all Hyracks code.

Patch 14: Some bug fixes to pass all of Asterix's tests.

Patch 15: Skip it!

Patch 16: Some bug fixes for Asterix's tests in the small frame setting.

Later patches: address the comments.

Change-Id: I2e08692078683f6f2cf17387e39037ad851fc05b

Reviewed-on: https://asterix-gerrit.ics.uci.edu/234

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Yingyi Bu <buyingyi@gmail.com>

  1. … 212 more files in changeset.
Algebricks fix for issue 873.

Change-Id: I78a4a30638d6cc5681b5410046fff6345b515291

Reviewed-on: https://asterix-gerrit.ics.uci.edu/266

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Wenhai Li <lwhaymail@yahoo.com>

Reviewed-by: Ildar Absalyamov <ildar.absalyamov@gmail.com>

  1. … 4 more files in changeset.
1. Fix the "writerCount!=0 during component flushing" issue.

2. Fix the duplicate LSM disk component file name issue by avoiding duplicate timestamps for different components.

Note that this change includes https://asterix-gerrit.ics.uci.edu/#/c/268/.
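
The commit does not say how duplicate timestamps are avoided. One common approach, sketched here purely as an assumption, is to hand out strictly increasing timestamps so that two components created within the same millisecond still get distinct values. The class `UniqueTimestampSketch` and its `next` method are hypothetical names, not part of the AsterixDB or Hyracks code.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: not the fix applied in this change.
final class UniqueTimestampSketch {

    // Last timestamp handed out; starts below any realistic clock value.
    private final AtomicLong last = new AtomicLong(0L);

    // Returns the supplied clock value if it is newer than the last
    // timestamp issued, otherwise the last timestamp plus one. The result
    // is strictly increasing, so no two callers get the same value.
    long next(long nowMillis) {
        return last.updateAndGet(prev -> Math.max(prev + 1, nowMillis));
    }
}
```

A caller would pass `System.currentTimeMillis()` as `nowMillis` and derive the component file name from the returned value.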

Change-Id: I805eab33603f52e19a1b76f1c315f9b75b6e3c03

Reviewed-on: https://asterix-gerrit.ics.uci.edu/278

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Murtadha Hubail <hubailmor@gmail.com>

Reviewed-by: Young-Seok Kim <kisskys@gmail.com>

  1. … 17 more files in changeset.
Add a flag for LSM-based indices to indicate whether to force pages to disk devices during flush and merge.
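
To illustrate what such a flag controls, the sketch below writes a page with plain Java NIO and calls `FileChannel.force` only when a `durable` flag is set. This is an assumption-laden illustration, not the Hyracks buffer cache code: the class `DurableWriteSketch`, the `writePage` and `demo` methods, and the `durable` parameter name are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: not the Hyracks implementation.
final class DurableWriteSketch {

    // Appends one page to the file. When 'durable' is true, the channel is
    // forced so the data reaches the storage device before returning.
    static void writePage(Path file, byte[] page, boolean durable) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.wrap(page);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            if (durable) {
                ch.force(true); // flush file content and metadata
            }
        }
    }

    // Writes two 4-byte pages to a temporary file, one forced and one not,
    // and returns the resulting file size in bytes.
    static long demo() {
        try {
            Path file = Files.createTempFile("lsm-page-sketch", ".bin");
            try {
                writePage(file, new byte[] {1, 2, 3, 4}, true);
                writePage(file, new byte[] {5, 6, 7, 8}, false);
                return Files.size(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Skipping the force call trades durability for speed: the write returns sooner, but the page may still sit in operating system buffers if the machine fails.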

Change-Id: I988716c03cffe30b008e144d3a478ee25e367212

Reviewed-on: https://asterix-gerrit.ics.uci.edu/240

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Young-Seok Kim <kisskys@gmail.com>

  1. … 73 more files in changeset.
avoid duplication of Pointable code in SerializerDeserializer

Change-Id: Ia98985fc994e48d7d6a37dfaade0178b6644d836

Reviewed-on: http://fulliautomatix.ics.uci.edu:8443/221

Reviewed-by: Yingyi Bu <buyingyi@gmail.com>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

  1. … 38 more files in changeset.
- Fixed Type Casting issue

- Reorganized duplicated internal class in the DelimitedDataParser and DelimitedDataParserFactory

- Prevented a user from creating an inverted index on a dataset with a variable-length PK

Change-Id: Ic5606501223b8d860b49a258ff49afacd7d76b9a

Reviewed-on: http://fulliautomatix.ics.uci.edu:8443/191

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Till Westmann <westmann@gmail.com>

  1. … 65 more files in changeset.
Fix for issue 771 - removed FieldLengthIgnoring comparators and added field length check in Pointable comparator

Change-Id: Icac725fb54db21f1aa37ae0db545fbdebb651b14

Reviewed-on: http://fulliautomatix.ics.uci.edu:8443/200

Reviewed-by: Young-Seok Kim <kisskys@gmail.com>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

  1. … 7 more files in changeset.
- Added Tokenize Operator in addition to the bulkload operator changes that were made by Zachary Heilbron. The tokenize operator is only added to the logical plan when bulk-loading the data.

- Each secondary index is now updated in a separate branch by using the replicate operator.

- Sink Operator now accepts multiple inputs.

- Fixed the bulk-load so that it correctly produces auto-generated PK.

Change-Id: Ifb591754dba5eb4a9207edaa4e658f4cc745893a

Reviewed-on: http://fulliautomatix.ics.uci.edu:8443/78

Reviewed-by: Young-Seok Kim <kisskys@gmail.com>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

  1. … 52 more files in changeset.
Changes to allow having the no-merge policy as an option in asterix.

Change-Id: I573b6a09185d51df1ec115edc38a89bd029574d5

Reviewed-on: http://fulliautomatix.ics.uci.edu:8443/107

Reviewed-by: abdullah alamoudi <bamousaa@gmail.com>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

  1. … 19 more files in changeset.
Several bug fixes in HHJ, NLJ, and tokenizer

- in HHJ handle the case when it spills and skipInMemoryHJ is set to false,

- check for memsize in NLJ and correctly set memsize in HHJ,

- make counthashed-ngram-token() skip the bits for length & type

Change-Id: I908345f993019b0bfd0ac0bcb3e497a42295b623

Reviewed-on: http://fulliautomatix.ics.uci.edu:8443/96

Reviewed-by: Pouria Pirzadeh <pouria.pirzadeh@gmail.com>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

  1. … 2 more files in changeset.
Added LSM component-level filters for all indexes.

Change-Id: I898cf885c9f88feae85c99799a00fd8ec036efea

Reviewed-on: http://fulliautomatix.ics.uci.edu:8443/81

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Yingyi Bu <buyingyi@gmail.com>

  1. … 117 more files in changeset.
Adding external indexes

On the Hyracks side, this change includes the following:

1. The addition of three indexes:

a) external b-tree index

b) external r-tree index

c) external b-tree with buddy b-tree index

2. Creating an additional logical operator in Algebricks for performing lookup operations over external data, and modifying the different visitors to work with this operator

3. Added copyright header to all new files

Change-Id: Iecfbd86f06aff3caaf3a9652b63420666745ebb9

Reviewed-on: http://fulliautomatix.ics.uci.edu:8443/69

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Zachary Heilbron <zheilbron@gmail.com>

Reviewed-by: Sattam Alsubaiee <salsubaiee@gmail.com>

Reviewed-by: Ian Maxon <imaxon@uci.edu>

  1. … 63 more files in changeset.
fixed issue 731, 740, and more

commit 8911cc529e72e2bb544d9b472d6e10f173d173af

Author: Young-Seok <kisskys@gmail.com>

Date: Sun May 18 11:28:28 2014 -0700

another fix for picking available index for leftouterjoin plan

commit 9bce43087615fee53613467a027833dd53e190f9

Merge: c8e85ac efab69f

Author: Young-Seok <kisskys@gmail.com>

Date: Sun May 11 22:22:10 2014 -0700

merged master to kisskys/left-outer-join-issue branch

commit c8e85aca31545c13b2a02ff6dc259943e2cf66ad

Author: Young-Seok <kisskys@gmail.com>

Date: Sun May 11 20:17:17 2014 -0700

changes for left-outer-join to pick available indexes

Change-Id: Ib0fc186bc9388802f95445edee92c428b3bb69cc

Reviewed-on: http://fulliautomatix.ics.uci.edu:8443/34

Reviewed-by: Inci Cetindil <icetindil@gmail.com>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

  1. … 50 more files in changeset.
making edit-distance-contains() work with lists

addressed code review comments

Various bug and performance fixes for the lsm indexes.

  1. … 8 more files in changeset.
added a new search modifier for fuzzy contains queries

Fixed inverted index bulkLoading issue by copying missing data from lastTupleBuilder to lastTuple

fix merge issue 131 in LSM R-Tree and LSM Inverted Index

  1. … 1 more file in changeset.
Fixing Methods signature

  1. … 3 more files in changeset.