Khaled Bouaziz <khaled.bouaziz@hp.com> committed on 05 Feb 15
Bulk unload optimization using snapshot scan

Resubmitting after facing git issues.

   The changes consist of:

   *Implementing the snapshot scan optimization in the Trafodion scan operator

   *Changes to bulk unload to use the new snapshot scan

   *Changes to scripts and permissions (using ACLs)

   *Rework based on review

   Details:

   *Snapshot Scan:

   ----------------------

   **Added support for snapshot scan to Trafodion scan

   **The scan expects the HBase snapshots to be created before the query runs.
    When used with bulk unload, the snapshots can be created by bulk unload.

   **The snapshot scan implementation can also be used without bulk unload. To use
     the snapshot scan outside bulk unload, set the following CQDs:

      cqd TRAF_TABLE_SNAPSHOT_SCAN 'on';

      -- the snapshot name is the table name concatenated with the suffix string
      cqd TRAF_TABLE_SNAPSHOT_SCAN_SNAP_SUFFIX 'suffix-string';

      -- temporary directory needed by the HBase snapshot scan
      cqd TRAF_TABLE_SNAPSHOT_SCAN_TMP_LOCATION '/bulkload/temp_scan_dir/';

   **Snapshot scan can be used with table scans, index scans, etc.
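
   The CQD-driven behavior above can be pictured with the small Python sketch below. It is not Trafodion code. The ScanPlan type, the plan_table_scan function, the 'off' default, the upper-casing of the table name and the '<TABLE>_<suffix>' naming (borrowed from the CAT.SCH.TABLE1_SNAP111 bulk unload example) are all assumptions made for illustration. The sketch only shows how the three CQDs could decide between a regular scan and a snapshot scan.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ScanPlan:
    """Hypothetical summary of how one table would be scanned."""
    kind: str                           # "regular" or "snapshot"
    snapshot_name: Optional[str] = None
    tmp_location: Optional[str] = None


def plan_table_scan(table: str, cqds: Mapping[str, str]) -> ScanPlan:
    """Choose regular vs. snapshot scan from CQD values (illustrative only)."""
    # Assumption: the snapshot scan is off unless the CQD is explicitly 'on'.
    if cqds.get("TRAF_TABLE_SNAPSHOT_SCAN", "off").lower() != "on":
        return ScanPlan(kind="regular")

    suffix = cqds.get("TRAF_TABLE_SNAPSHOT_SCAN_SNAP_SUFFIX")
    tmp_dir = cqds.get("TRAF_TABLE_SNAPSHOT_SCAN_TMP_LOCATION")
    if not suffix or not tmp_dir:
        # Hypothetical check: a snapshot suffix and a temp directory are both needed.
        raise ValueError("snapshot scan needs a suffix and a temp location")

    # Assumption: snapshot name = upper-cased table name + '_' + suffix.
    return ScanPlan(
        kind="snapshot",
        snapshot_name=f"{table.upper()}_{suffix}",
        tmp_location=tmp_dir,
    )
```

   With the three CQD values used in the sqlci transcript under Examples, plan_table_scan("seabase.tt10", cqds) would return a snapshot plan for the hypothetical name SEABASE.TT10_SNAP777. Without TRAF_TABLE_SNAPSHOT_SCAN set to 'on', it falls back to a regular scan.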

   *Bulk unload utility:

   -------------------------------

   **The bulk unload optimization comes from the newly added support for snapshot
      scan. By default, bulk unload uses the regular scan; when snapshot scan is
      specified, it uses the snapshot scan instead.

   **To use snapshot scan with bulk unload, specify the new options in the bulk
     unload syntax: NEW|EXISTING SNAPSHOT HAVING SUFFIX QUOTED_STRING

   ***Using NEW in the above syntax means the bulk unload tool will create new
      snapshots, while using EXISTING means bulk unload expects the snapshots to
      exist already.

   ***The snapshot names are based on the table names in the SELECT statement:
       each snapshot name starts with the table name and ends with the QUOTED_STRING suffix.

   ***For example, for the statement

        unload with NEW SNAPSHOT HAVING SUFFIX 'SNAP111' into 'tmp' select * from cat.sch.table1;

      the unload utility will create the snapshot CAT.SCH.TABLE1_SNAP111, and for

        unload with EXISTING SNAPSHOT HAVING SUFFIX 'SNAP111' into 'tmp' select * from cat.sch.table1;

      the unload utility will expect the snapshot CAT.SCH.TABLE1_SNAP111 to exist
      already. Otherwise, an error is produced.

   ***If these new options are not used in the syntax, bulk unload uses the
      regular scan instead of the snapshot scan.

   **Bulk unload queries the explain plan virtual table to get the list of
     Trafodion tables that will be scanned. Depending on the option used (NEW or
     EXISTING), it then either creates the snapshots for those tables or verifies
     that they already exist.
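
   The NEW/EXISTING flow described above can be sketched in Python as follows. This is a hypothetical illustration, not the bulk unload implementation. The explain row layout (OPERATOR and TNAME keys), the scan operator names, the in-memory set standing in for HBase snapshots, and both function names are assumptions. The sketch only models the control flow: find the scanned tables, derive one snapshot name per table, then create or verify it.

```python
from typing import Iterable, List, Mapping, Optional, Set


def scanned_tables(explain_rows: Iterable[Mapping[str, str]]) -> List[str]:
    """Collect distinct table names from scan operators in explain output.

    Assumption: each row is a dict with 'OPERATOR' and 'TNAME' keys and the
    scan operators are named as below; the real explain schema may differ.
    """
    scan_ops = {"TRAFODION_SCAN", "TRAFODION_INDEX_SCAN"}  # hypothetical names
    tables: List[str] = []
    for row in explain_rows:
        name = row.get("TNAME")
        if row.get("OPERATOR") in scan_ops and name and name not in tables:
            tables.append(name)
    return tables


def prepare_snapshots(
    tables: Iterable[str],
    mode: Optional[str],   # "NEW", "EXISTING", or None (no snapshot option given)
    suffix: str,
    snapshots: Set[str],   # stand-in for the snapshots that exist in HBase
) -> List[str]:
    """Create (NEW) or verify (EXISTING) one snapshot per scanned table."""
    if mode is None:
        return []          # no snapshot option: regular scan, nothing to prepare
    names: List[str] = []
    for table in tables:
        snap = f"{table.upper()}_{suffix}"  # assumed '<TABLE>_<suffix>' naming
        if mode == "NEW":
            snapshots.add(snap)             # stand-in for creating the snapshot
        elif mode == "EXISTING":
            if snap not in snapshots:
                raise LookupError(f"snapshot {snap} does not exist")
        else:
            raise ValueError(f"unknown snapshot mode: {mode}")
        names.append(snap)
    return names
```

   Chained together, prepare_snapshots(scanned_tables(rows), "NEW", "SNAP111", snapshots) yields one snapshot name per scanned table. Passing None as the mode mirrors the default regular-scan path, where no snapshots are touched.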

   *Configuration changes

   --------------------------------

   **Enable ACLs in HDFS (dfs.namenode.acls.enabled set to true in hdfs-site.xml)

   *Testing

   --------

   **All developer regression tests were run and all passed

   **Bulk unload and snapshot scan were tested on the cluster

   *Examples:

   **Example of using snapshot scan without bulk unload
     (the snapshot must be created first):

     >>cqd TRAF_TABLE_SNAPSHOT_SCAN 'on';
     --- SQL operation complete.
     >>cqd TRAF_TABLE_SNAPSHOT_SCAN_SNAP_SUFFIX 'SNAP777';
     --- SQL operation complete.
     >>cqd TRAF_TABLE_SNAPSHOT_SCAN_TMP_LOCATION '/bulkload/temp_scan_dir/';
     --- SQL operation complete.
     >>select [first 5] c1,c2 from tt10;

     C1                     C2
     ---------------------  --------------------
                       .00                     0
                       .01                     1
                       .02                     2
                       .03                     3
                       .04                     4

     --- 5 row(s) selected.

   **Example of using snapshot scan with unload:

      UNLOAD
      WITH PURGEDATA FROM TARGET
      NEW SNAPSHOT HAVING SUFFIX 'SNAP778'
      INTO '/bulkload/unload_TT14_3' select * from seabase.TT20;

Change-Id: Idb1d1807850787c6717ab0aa604dfc9a37f43dce
