[NO ISSUE][RT] Improve PreclusteredGroupWriter

- user model changes: no

- storage format changes: no

- interface changes: no

Details:

- Modified PreclusteredGroupWriter to only save group fields

from the last tuple in a frame instead of the whole frame

- move PermutingFrameTupleReference and PermutingTupleReference

from 'hyracks-storage-am-common' to 'hyracks-dataflow-common'

Change-Id: Ic75de2e6b64d0aacaf48096ecc9d47fc8e95c9cf

Reviewed-on: https://asterix-gerrit.ics.uci.edu/3351

Contrib: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Ali Alsuliman <ali.al.solaiman@gmail.com>

  1. … 37 more files in changeset.
[NO ISSUE][STO] Ensure First Component ID is Initialized

- user model changes: no

- storage format changes: no

- interface changes: yes

Details:

- Initialize the component id generator from the primary

index checkpoint, if it exists, as soon as it is created.

- Ensure the first component id is passed to all indexes.

Change-Id: I246f9373f950e2f9a2c63f86746462e42a3f1c62

Reviewed-on: https://asterix-gerrit.ics.uci.edu/2948

Reviewed-by: abdullah alamoudi <bamousaa@gmail.com>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

    • -14
    • +11
    ./LSMIOOperationCallbackTest.java
  1. … 13 more files in changeset.
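The initialization described in the entry above can be sketched roughly as follows. This is a minimal illustration and not the AsterixDB implementation: the class `ComponentIdGeneratorSketch`, its method names, and the choice of 0 as the first id of a fresh index are assumptions made for the example.

```java
import java.util.OptionalLong;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a component id generator; not the AsterixDB class.
final class ComponentIdGeneratorSketch {
    private final AtomicLong lastId;

    // The argument is empty when no primary index checkpoint exists yet.
    ComponentIdGeneratorSketch(OptionalLong lastValidIdFromCheckpoint) {
        // Assumption: on a fresh index the first id handed out is 0.
        this.lastId = new AtomicLong(lastValidIdFromCheckpoint.orElse(-1L));
    }

    // Returns the next id; ids handed out by one generator strictly increase.
    long nextId() {
        return lastId.incrementAndGet();
    }
}
```

Seeding the generator at construction time, rather than lazily, is what lets the first id be known before any index asks for it.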
[ASTERIXDB-2444][STO] Avoid Using System Clock in Storage

- user model changes: no

- storage format changes: yes

- interface changes: yes

Details:

- Replace the usage of system clock timestamps in LSM

index components file names by a sequencer. The next

sequence id to use is determined by checking the list

of existing components on disk. Note that due to a

rollback, an index checkpoint file may have a last valid

component sequence which is greater than what is on disk.

This should not cause any issues since only components

that have a sequence greater than the one that appears in the

checkpoint will be deleted.

- Replace the usage of system clock timestamps in LSM

index components ids by a monotonically increasing

sequencer. The sequencer is initialized after restarts

by the last valid component id that appears in the

index checkpoint.

- Refactor the logic to generate flush/merge file names.

- Refactor the logic to check invalid components.

- Adapt test cases to new naming format.

Change-Id: I9dff8ffb38ce8064a199d03b070ed1f5b924b8a4

Reviewed-on: https://asterix-gerrit.ics.uci.edu/2927

Sonar-Qube: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Murtadha Hubail <mhubail@apache.org>

Reviewed-by: abdullah alamoudi <bamousaa@gmail.com>

    • -21
    • +18
    ./LSMIOOperationCallbackTest.java
  1. … 23 more files in changeset.
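The file-name sequencer from the ASTERIXDB-2444 entry above can be illustrated with a small sketch. Everything here is hypothetical: the `<start>_<end>_<suffix>` name layout, the class `ComponentFileSequencerSketch`, and its method names are assumptions chosen for the example, not the format or API used by the change.

```java
import java.util.List;

// Hypothetical sketch; the "<start>_<end>_<suffix>" layout is an assumption.
final class ComponentFileSequencerSketch {

    // Highest end sequence among the given file names, or -1 if none parse.
    static long lastSequenceOnDisk(List<String> fileNames) {
        long max = -1L;
        for (String name : fileNames) {
            String[] parts = name.split("_");
            if (parts.length < 3) {
                continue; // not a component file under the assumed layout
            }
            try {
                max = Math.max(max, Long.parseLong(parts[1]));
            } catch (NumberFormatException e) {
                // unrelated file; ignore it
            }
        }
        return max;
    }

    // A flushed component covers a single new sequence.
    static String nextFlushFileName(List<String> fileNames, String suffix) {
        long next = lastSequenceOnDisk(fileNames) + 1;
        return next + "_" + next + "_" + suffix;
    }

    // A merged component covers the range of the components it replaces.
    static String mergeFileName(long firstSeq, long lastSeq, String suffix) {
        return firstSeq + "_" + lastSeq + "_" + suffix;
    }

    // Only components newer than the checkpoint's last valid sequence are invalid.
    static boolean isInvalid(long componentEndSeq, long checkpointLastValidSeq) {
        return componentEndSeq > checkpointLastValidSeq;
    }
}
```

Under this rule a checkpoint whose last valid sequence is ahead of the disk (for example after a rollback) marks nothing as invalid, which matches the note in the commit message.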
[NO ISSUE][STO] Misc Storage Fixes and Improvements

- user model changes: no

- storage format changes: no

- interface changes: yes

Details:

- This change introduces some improvements to storage

operations.

- Local RecoveryManager is now extensible.

- Bulk loaders now call the IO callback similar to

Flushes, making them less special and creating a

unified lifecycle for adding an index component.

- As a result, the IndexCheckpointManager doesn't need

to have a special treatment for components loaded

through the bulk load operation.

- Component ids have been added to the index checkpoint

files.

- Cleanup for the code of local recovery for failed flush

operations.

- Ensure that after local recovery of flushes, primary

and secondary indexes have the same index for mutable

memory component.

- The use of WAIT logs to ensure in-flight flushes

are scheduled didn't work as expected. A new log type

WAIT_FOR_FLUSHES was introduced to achieve the expected

behavior.

- The local test framework was made extensible to support

more use cases.

- Test cases were added for component ids in checkpoint files.

The following scenarios were covered:

- Primary and secondary both have values when a flush is

scheduled.

- Primary has values but not secondary when a flush is

scheduled.

- Primary is empty and an index is created through bulk

load.

- Primary has a single component and secondary is created

through bulk load.

- Primary has multiple components and secondary is created

through bulk load.

- Each primary opTracker now keeps a list of ongoing flushes.

- FlushDataset now waits only for flushes and

not all io operations.

- Previously, we had many flushes scheduled on open datasets.

This was not detected but after this change, a failure

is thrown in such cases.

- Flush operations don't need to extend the comparable

interface anymore since they are FIFO per index.

Change-Id: If24c9baaac2b79e7d1acf47fa2601767388ce988

Reviewed-on: https://asterix-gerrit.ics.uci.edu/2632

Sonar-Qube: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Contrib: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Murtadha Hubail <mhubail@apache.org>

  1. … 88 more files in changeset.
[NO ISSUE][STO] Add consistency to flush lifecycle

- user model changes: no

- storage format changes: yes

- renamed AbstractLSMIOOperationCallbackFactory

to LSMIOOperationCallbackFactory

- useless classes have been removed.

- LSMBTreeIOOperationCallbackFactory

- LSMBTreeWithBuddyIOOperationCallbackFactory

- LSMInvertedIndexIOOperationCallbackFactory

- LSMRTreeIOOperationCallbackFactory

- interface changes: yes

Details:

- Previously, flushes had different lifecycles depending

on the memory component state

- not allocated

- allocated

- modified

- In certain cases, flush operations are skipped altogether

- IO Operation callbacks became complicated and difficult

to maintain since calls are done differently in different

cases.

- In certain cases, afterFinalize is called on the IO

Operation callbacks even if beforeOperation was never

called.

- In this change, flushes go through the same lifecycle

events regardless of the state of the memory component.

- In addition, primary and secondary memory components

would reside in different virtual buffer caches due

to skipped flushes, or due to having the secondary

index created when the primary index's memory component

is residing on the virtual buffer cache with index !=0.

- Moreover, when flushes are lagging and all memory

components are being flushed, search operations assume

the oldest of the memory components is the newest and

produce incorrect results.

- In addition, in case of a failed flush of a component,

the IO scheduler would skip it and flush the next

component. This would produce a bad state on disk.

- In this change, a failed flush can be retried. Otherwise,

all future flushes of the component fail due to the failure

of the previously failed flush.

- Previously, when a component fails to modify an index due

to flush failures, it assumes disk is full.

- With this change, the modification failure reports the

original cause of the failed flush.

Change-Id: I29f7992ec6c0f71c5b63d45800b2fb590d651e4b

Reviewed-on: https://asterix-gerrit.ics.uci.edu/2584

Reviewed-by: Murtadha Hubail <mhubail@apache.org>

Tested-by: Murtadha Hubail <mhubail@apache.org>

    • -315
    • +0
    ./AbstractLSMIOOperationCallbackTest.java
    • -39
    • +0
    ./LSMBTreeIOOperationCallbackTest.java
    • -39
    • +0
    ./LSMBTreeWithBuddyIOOperationCallbackTest.java
    • -0
    • +217
    ./LSMIOOperationCallbackTest.java
    • -39
    • +0
    ./LSMInvertedIndexIOOperationCallbackTest.java
    • -39
    • +0
    ./LSMRTreeIOOperationCallbackTest.java
    • -0
    • +56
    ./TestFlushOperation.java
    • -0
    • +183
    ./TestLSMIndexAccessor.java
    • -12
    • +13
    ./TestLSMIndexOperationContext.java
  1. … 153 more files in changeset.
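The unified flush lifecycle described in the entry above can be pictured with the following sketch. It is a simplified illustration under stated assumptions: `FlushLifecycleSketch`, `IoCallback`, and `RecordingCallback` are hypothetical stand-ins, and the rule that only a modified component has data to write is an assumption of the example rather than something the commit message spells out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a flush that always goes through the same events.
final class FlushLifecycleSketch {
    enum MemoryComponentState { NOT_ALLOCATED, ALLOCATED, MODIFIED }

    // Simplified stand-in for an IO operation callback.
    interface IoCallback {
        void beforeOperation();
        void afterOperation();
        void afterFinalize();
    }

    // Records the order of lifecycle events so it can be inspected.
    static final class RecordingCallback implements IoCallback {
        final List<String> events = new ArrayList<>();
        @Override public void beforeOperation() { events.add("beforeOperation"); }
        @Override public void afterOperation() { events.add("afterOperation"); }
        @Override public void afterFinalize() { events.add("afterFinalize"); }
    }

    // The callback sequence is identical for every memory component state.
    static void flush(MemoryComponentState state, IoCallback callback, Runnable writeComponent) {
        callback.beforeOperation();
        if (state == MemoryComponentState.MODIFIED) {
            writeComponent.run(); // assumption: only a modified component has data to write
        }
        callback.afterOperation();
        callback.afterFinalize();
    }
}
```

Because `afterFinalize` is only reachable after `beforeOperation`, a callback never sees a finalize event for an operation it was not told about, which is the inconsistency the change set out to remove.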
[ASTERIXDB-1952][TX][IDX] Filter logs pt.2

- user model changes: no

- storage format changes: yes

- interface changes: yes

Details:

- Add a log type specifically for filters

- Only log change when filter actually widens

- Stop logging of index + filter tuple during modification

- Redo index and filter tuples separately via their logs

Change-Id: Ie9e7795d9c8c212e8610dcb9bb5d26ec9fbbee8a

Reviewed-on: https://asterix-gerrit.ics.uci.edu/1857

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Contrib: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Ian Maxon <imaxon@apache.org>

    • -1
    • +25
    ./TestLSMIndexOperationContext.java
  1. … 46 more files in changeset.
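The "only log change when filter actually widens" rule in the entry above can be illustrated as follows. This is a sketch under assumptions: it models the filter as a min/max range over `long` values, and `RangeFilterSketch` and its `update` method are hypothetical names, not the AsterixDB filter API.

```java
// Hypothetical min/max filter; update() reports whether a filter log record is needed.
final class RangeFilterSketch {
    private boolean empty = true;
    private long min;
    private long max;

    // Returns true only when the value falls outside the current range,
    // i.e. when the filter widens and the change would have to be logged.
    boolean update(long value) {
        if (empty) {
            min = value;
            max = value;
            empty = false;
            return true;
        }
        boolean widened = false;
        if (value < min) {
            min = value;
            widened = true;
        }
        if (value > max) {
            max = value;
            widened = true;
        }
        return widened;
    }

    long min() { return min; }

    long max() { return max; }
}
```

Values that land inside the existing range leave the filter untouched, so no filter log record is produced for them.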
[NO ISSUE][STO] Improve the LSMIOOperationCallback interface.

- user model changes: no

- storage format changes: no

- interface changes: yes

+ ILSMIndexOperationContext.getIoOperationType()

+ ILSMIndexOperationContext.getNewComponent()

* before, after, and finalize

calls of ILSMIOOperationCallback now take

ILSMIndexOperationContext as a parameter

Details:

- Before, some calls to ILSMIOOperationCallback

take just an enum LSMIOOperationType, some of them

take an enum and a component object. These sometimes don't

provide enough information to different implementations of

the callback that might be interested in more than that.

- Having the operation context object passed allows for

better exchange of information between different callers

and callees throughout the IO operation.

Change-Id: Ib7120c40a1a2256ed528dfd2e5853db9dba247c6

Reviewed-on: https://asterix-gerrit.ics.uci.edu/2455

Sonar-Qube: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Contrib: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: abdullah alamoudi <bamousaa@gmail.com>

    • -44
    • +68
    ./AbstractLSMIOOperationCallbackTest.java
    • -0
    • +177
    ./TestLSMIndexOperationContext.java
  1. … 17 more files in changeset.
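A rough sketch of the callback shape described in the entry above: the callback receives one context object instead of an enum plus an optional component. The accessor names `getIoOperationType()` and `getNewComponent()` come from the commit message; the surrounding types (`OperationContext`, `SimpleContext`, `TracingCallback`) and the use of a `String` to represent a component are simplifications assumed for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified types; not the AsterixDB interfaces.
final class OperationContextSketch {
    enum IoOperationType { FLUSH, MERGE, LOAD }

    interface OperationContext {
        IoOperationType getIoOperationType();
        String getNewComponent(); // simplified: the component is just a name here
    }

    // Every lifecycle call receives the same context object.
    interface IoCallback {
        void beforeOperation(OperationContext ctx);
        void afterOperation(OperationContext ctx);
        void afterFinalize(OperationContext ctx);
    }

    static final class SimpleContext implements OperationContext {
        private final IoOperationType type;
        private final String newComponent;

        SimpleContext(IoOperationType type, String newComponent) {
            this.type = type;
            this.newComponent = newComponent;
        }

        @Override public IoOperationType getIoOperationType() { return type; }
        @Override public String getNewComponent() { return newComponent; }
    }

    // Example implementation that reads whatever it needs from the context.
    static final class TracingCallback implements IoCallback {
        final List<String> trace = new ArrayList<>();

        @Override public void beforeOperation(OperationContext ctx) {
            trace.add("before " + ctx.getIoOperationType());
        }

        @Override public void afterOperation(OperationContext ctx) {
            trace.add("after " + ctx.getIoOperationType() + " -> " + ctx.getNewComponent());
        }

        @Override public void afterFinalize(OperationContext ctx) {
            trace.add("finalize " + ctx.getIoOperationType());
        }
    }
}
```

New information can later be exposed through the context without changing the callback signatures again.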
[ASTERIXDB-2188] Ensure recovery of component ids

- user model changes: no

- storage format changes: yes.

Flush log record format changes.

- interface changes: no

Details:

- Add flush component ids to the flush log record. Upon

seeing a flush log record during recovery, schedule

a flush to all indexes in this partition s.t. LSN>maxDiskLSN

to ensure component ids are properly maintained upon

failed flushes.

- Add a test case to ensure the correctness of the recovery logic

of component ids

Change-Id: I8c1fc2b209cfb9d3dafa216771d2b7032eb99e75

Reviewed-on: https://asterix-gerrit.ics.uci.edu/2408

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Contrib: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: abdullah alamoudi <bamousaa@gmail.com>

    • -3
    • +5
    ./AbstractLSMIOOperationCallbackTest.java
  1. … 17 more files in changeset.
[NO ISSUE][STO] Introduce Index Checkpoints

- user model changes: no

- storage format changes: yes

- Add index checkpoints.

- Use index checkpoint to determine low watermark

during recovery.

- interface changes: yes

- Introduce IIndexCheckpointManager for managing

indexes checkpoints.

- Introduce IIndexCheckpointProvider for tracking

IIndexCheckpointManager references.

Details:

- Unify LSM flush/merge operations completion order.

- Introduce index checkpoints which contains:

- Index low watermark.

- Latest valid LSM component

- Mapping between master replica and local replica.

- Use index checkpoints instead of LSM component metadata

for identifying low watermark in recovery.

- Use index checkpoints in replication instead of overwriting

LSN byte offset in replica component metadata.

- Replace LSN_MAP used in replication by index checkpoints.

- Replace NIO Files.find by Commons FileUtils.listFiles to

avoid NoSuchFileException on any file deletion.

Change-Id: Ib22800002bf8ea3660242e599b3f5f20678301a8

Reviewed-on: https://asterix-gerrit.ics.uci.edu/2200

Sonar-Qube: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: abdullah alamoudi <bamousaa@gmail.com>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Contrib: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Till Westmann <tillw@apache.org>

    • -25
    • +47
    ./AbstractLSMIOOperationCallbackTest.java
    • -2
    • +4
    ./LSMBTreeIOOperationCallbackTest.java
    • -2
    • +4
    ./LSMBTreeWithBuddyIOOperationCallbackTest.java
    • -2
    • +4
    ./LSMInvertedIndexIOOperationCallbackTest.java
    • -2
    • +4
    ./LSMRTreeIOOperationCallbackTest.java
  1. … 43 more files in changeset.
[ASTERIXDB-2161] Fix component id manage lifecycle

- user model changes: no

- storage format changes: no

- interface changes: yes. The interface of LSMIOOperationCallback

is changed

Details:

- The current way of managing component ids is not correct

in the presence of multiple partitions sharing the same primary op

tracker. It's possible when a partition is empty/being flushed,

the next flush is scheduled by another partition, which

will disturb the partition. This patch fixes this by

using the same logic of maintaining flushed LSNs to maintain

component id.

- Extend recycle memory component interface to indicate whether it

switches the new component or not.

- Also fixes [ASTERIXDB-2168] to ensure we do not miss latest flushed

LSNs by advancing io callback before finishing flush

Change-Id: Ifc35184c4d431db9af71cab302439e165ee55f54

Reviewed-on: https://asterix-gerrit.ics.uci.edu/2153

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Contrib: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: abdullah alamoudi <bamousaa@gmail.com>

    • -0
    • +267
    ./AbstractLSMIOOperationCallbackTest.java
    • -60
    • +7
    ./LSMBTreeIOOperationCallbackTest.java
    • -60
    • +7
    ./LSMBTreeWithBuddyIOOperationCallbackTest.java
    • -60
    • +7
    ./LSMInvertedIndexIOOperationCallbackTest.java
    • -60
    • +7
    ./LSMRTreeIOOperationCallbackTest.java
  1. … 10 more files in changeset.
[ASTERIXDB-2115] Add Component Ids to LSM Indexes

- user model changes: no

- storage format changes: no

- interface changes: yes

Details:

- Add LSMComponentId to all LSM components. Component Ids are managed

through IO operation callbacks.

- For memory component, its ID is reset every time it's recycled.

- For disk component, its ID is copied from the source component(s)

during flush/merge

- For indexes of a dataset, we need to guarantee all their memory

components receive the same ID. This is achieved using a shared

component Id generator.

- Fix memory component recycled callback, make sure it's called only

when we've indeed recycled the memory component

A design wiki for this patch: https://cwiki.apache.org/confluence/display/

ASTERIXDB/Component+Id-based+secondary-to-primary+index+acceleration

Change-Id: I8aec6261a84a0729ce35f4b1cb708be299ddb98d

Reviewed-on: https://asterix-gerrit.ics.uci.edu/2025

Sonar-Qube: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: abdullah alamoudi <bamousaa@gmail.com>

Contrib: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

    • -2
    • +5
    ./LSMBTreeIOOperationCallbackTest.java
    • -2
    • +5
    ./LSMBTreeWithBuddyIOOperationCallbackTest.java
    • -2
    • +5
    ./LSMInvertedIndexIOOperationCallbackTest.java
    • -2
    • +5
    ./LSMRTreeIOOperationCallbackTest.java
  1. … 59 more files in changeset.
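The id rules listed in the ASTERIXDB-2115 entry above can be sketched as follows. This is an illustration under assumptions, not the patch itself: the `[minId, maxId]` representation of an id, the class and method names, and the rule that a merged component covers the range of its sources are all choices made for the example.

```java
import java.util.List;

// Hypothetical sketch of component ids shared across the indexes of a dataset.
final class SharedComponentIdSketch {

    // Assumption: an id is a range, so a merged component can cover its sources.
    static final class ComponentId {
        final long minId;
        final long maxId;

        ComponentId(long minId, long maxId) {
            this.minId = minId;
            this.maxId = maxId;
        }
    }

    // One generator is shared by the primary and all secondary indexes.
    static final class IdGenerator {
        private long current = 0L;

        synchronized ComponentId getId() { return new ComponentId(current, current); }

        synchronized void refresh() { current++; }
    }

    static final class MemoryComponent {
        ComponentId id;

        // The id is reset from the shared generator each time the component is recycled.
        void recycle(IdGenerator generator) { id = generator.getId(); }
    }

    // A flushed disk component copies the id of its source memory component.
    static ComponentId flushedId(MemoryComponent source) { return source.id; }

    // A merged disk component covers the ids of all of its sources.
    static ComponentId mergedId(List<ComponentId> sources) {
        if (sources.isEmpty()) {
            throw new IllegalArgumentException("merge needs at least one source");
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (ComponentId c : sources) {
            min = Math.min(min, c.minId);
            max = Math.max(max, c.maxId);
        }
        return new ComponentId(min, max);
    }
}
```

Because both indexes read from the same generator between refreshes, their memory components end up with identical ids without any direct coordination between the indexes.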
[ASTERIXDB-1764][STO] Ensure LOAD follow same lifecycle with merge/flush

- user model changes: no

- storage format changes: no

- interface change: no

Details:

- Ensure ioOperationCallbacks are properly called for bulk loaded

component

- Add Load type to LSMIOOperationType to distinguish bulk loaded

component from flush component

- Change ILSMIOOperationCallback to use LSMIOOperationType instead of

LSMOperationType, because this callback only targets LSM IO

operations

Change-Id: Ib9ecf7292c5dbaf8638d159decc6e6faf79de58b

Reviewed-on: https://asterix-gerrit.ics.uci.edu/2131

Sonar-Qube: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Contrib: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: abdullah alamoudi <bamousaa@gmail.com>

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

    • -9
    • +9
    ./LSMBTreeIOOperationCallbackTest.java
    • -9
    • +9
    ./LSMBTreeWithBuddyIOOperationCallbackTest.java
    • -9
    • +9
    ./LSMInvertedIndexIOOperationCallbackTest.java
    • -9
    • +9
    ./LSMRTreeIOOperationCallbackTest.java
  1. … 19 more files in changeset.
[NO ISSUE][STO] Add a callback on recycling of memory components

- user model changes: no

- storage format changes: no

- interface change: yes

- ILSMIOOperationCallbackFactory.createIoOpCallback now takes

the ILSMIndex as a parameter.

- Remove ILSMIOOperationCallback.setNumOfMutableComponents

The callback can find out the number of mutable components

on instantiation since the lsm index is now passed.

- ILSMIOOperationCallback.allocated was added.

It gets called whenever a memory component is allocated.

- ILSMIOOperationCallback.recycled was added.

It gets called whenever a memory component is recycled.

- ILSMIndex.hasMemoryComponent is replaced with

ILSMIndex.getNumberOfMemoryComponents

Change-Id: I578ffd7ef17784034c94f3c0d23cd5094e39f6e0

Reviewed-on: https://asterix-gerrit.ics.uci.edu/2126

Sonar-Qube: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Contrib: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Reviewed-by: Murtadha Hubail <mhubail@apache.org>

    • -4
    • +7
    ./LSMBTreeIOOperationCallbackTest.java
    • -4
    • +7
    ./LSMBTreeWithBuddyIOOperationCallbackTest.java
    • -4
    • +7
    ./LSMInvertedIndexIOOperationCallbackTest.java
    • -4
    • +7
    ./LSMRTreeIOOperationCallbackTest.java
  1. … 102 more files in changeset.
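The interface changes in the entry above can be pictured with a small sketch. The names `createIoOpCallback`, `allocated`, `recycled`, and `getNumberOfMemoryComponents` come from the commit message; their parameter lists are not given there, so the signatures below (an `int` component index, a minimal `Index` interface) are assumptions made for illustration only.

```java
// Hypothetical, simplified shapes; the real signatures are not shown in the commit message.
final class RecycleCallbackSketch {

    interface Index {
        int getNumberOfMemoryComponents();
    }

    interface IoCallback {
        void allocated(int componentIndex); // assumption: components addressed by index
        void recycled(int componentIndex);
    }

    // Sizes its bookkeeping from the index it is created for,
    // so no separate setNumOfMutableComponents call is needed.
    static final class CountingCallback implements IoCallback {
        final boolean[] allocatedFlags;
        final int[] recycleCounts;

        CountingCallback(Index index) {
            int n = index.getNumberOfMemoryComponents();
            this.allocatedFlags = new boolean[n];
            this.recycleCounts = new int[n];
        }

        @Override public void allocated(int componentIndex) {
            allocatedFlags[componentIndex] = true;
        }

        @Override public void recycled(int componentIndex) {
            recycleCounts[componentIndex]++;
        }
    }

    // The factory now receives the index when creating the callback.
    static IoCallback createIoOpCallback(Index index) {
        return new CountingCallback(index);
    }
}
```

Passing the index to the factory lets each callback discover the number of mutable components at instantiation, which is why the setter could be removed.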
ASTERIXDB-1917: FLUSH_LSN for disk components is not correctly set

-Fixed a bug that FLUSH_LSN for flushed disk components is not

correctly set (not increasing) when an NC has multiple partitions.

-Added LSMIOOperationCallback unit tests to cover this bug

Change-Id: If438e34f8f612458d81f618eea04c0c72c49a9fe

Reviewed-on: https://asterix-gerrit.ics.uci.edu/1771

Reviewed-by: abdullah alamoudi <bamousaa@gmail.com>

Sonar-Qube: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

BAD: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

    • -0
    • +84
    ./LSMBTreeIOOperationCallbackTest.java
    • -0
    • +84
    ./LSMBTreeWithBuddyIOOperationCallbackTest.java
    • -0
    • +84
    ./LSMInvertedIndexIOOperationCallbackTest.java
    • -0
    • +84
    ./LSMRTreeIOOperationCallbackTest.java
  1. … 1 more file in changeset.