Adding more run-time memory allocations from NAHeap

This set of changes moves some of the string vector variables in the HBase access operators from the standard string template to our NAList and NAString (or HbaseStr for row IDs). In the process, the objects are now allocated from our NAHeap instead of the system heap. This makes it easier to track memory usage and detect leaks.
In addition, a change in ExHbaseAccessTcb::setupListOfColNames() avoids unnecessary allocations by populating the columns list only when it is empty. The Google profiling tools helped us identify this problem.
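The "populate only when empty" guard can be illustrated with a minimal sketch. It is written in Python rather than the operator's C++, and the class and member names (`AccessTcbSketch`, `list_of_col_names`, `populate_count`) are hypothetical stand-ins, not the actual ExHbaseAccessTcb code. It only shows the idea of filling the list once and reusing it on later calls.

```python
class AccessTcbSketch:
    """Hypothetical stand-in for an access operator that caches column names."""

    def __init__(self, columns):
        self._columns = list(columns)   # source of the column names
        self.list_of_col_names = []     # cached list, filled on first use
        self.populate_count = 0         # how many times the list was built

    def setup_list_of_col_names(self):
        # Populate only when the cached list is empty; otherwise reuse it
        # and skip the per-call allocations.
        if not self.list_of_col_names:
            self.list_of_col_names = list(self._columns)
            self.populate_count += 1
        return self.list_of_col_names
```

Calling `setup_list_of_col_names()` repeatedly on the same object returns the same list object after the first call, so the cost of building it is paid once.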
Also removed the ExHbaseAccessDeleteTcb operator, which was not used.

Bulk unload optimization using snapshot scan

Resubmitting after facing git issues.
The changes consist of:
*Implementing the snapshot scan optimization in the Trafodion scan operator
*Changes to bulk unload to use the new snapshot scan
*Changes to scripts and permissions (using ACLs)
*Rework based on review
Details:
*Snapshot Scan:
----------------------
**Added support for snapshot scan to the Trafodion scan operator
**The scan expects the HBase snapshots to be created before the query runs. When used with bulk unload, the snapshots can be created by bulk unload
**The snapshot scan implementation can be used without bulk unload. To use snapshot scan outside bulk unload, set the CQDs below:
cqd TRAF_TABLE_SNAPSHOT_SCAN 'on';
-- the snapshot name will be the table name concatenated with the suffix-string
cqd TRAF_TABLE_SNAPSHOT_SCAN_SNAP_SUFFIX 'suffix-string';
-- temp dir needed for the hbase snapshotscan
cqd TRAF_TABLE_SNAPSHOT_SCAN_TMP_LOCATION '/bulkload/temp_scan_dir/';
**Snapshot scan can be used with table scans, index scans, etc.
*Bulk unload utility:
-------------------------------
**The bulk unload optimization relies on the newly added support for snapshot scan. By default, bulk unload uses the regular scan; when snapshot scan is specified, it uses snapshot scan instead
**To use snapshot scan with bulk unload, specify the new option in the bulk unload syntax: NEW|EXISTING SNAPSHOT HAVING SUFFIX QUOTED_STRING
***NEW means the bulk unload tool creates new snapshots, while EXISTING means bulk unload expects the snapshots to exist already
***The snapshot names are based on the table names in the select statement. Each snapshot name must start with the table name and end with the suffix QUOTED_STRING
***For example, for "unload with NEW SNAPSHOT HAVING SUFFIX 'SNAP111' into 'tmp' select * from cat.sch.table1;" the unload utility creates a snapshot CAT.SCH.TABLE1_SNAP111, and for "unload with EXISTING SNAPSHOT HAVING SUFFIX 'SNAP111' into 'tmp' select * from cat.sch.table1;" the unload utility expects a snapshot CAT.SCH.TABLE1_SNAP111 to exist already; otherwise an error is produced
***If this new option is not used in the syntax, bulk unload uses the regular scan instead of snapshot scan
**Bulk unload queries the explain plan virtual table to get the list of Trafodion tables that will be scanned and, depending on the option, either creates the snapshots for those tables or verifies that they already exist
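The NEW versus EXISTING handling described above could be sketched roughly as follows. This is an illustrative Python sketch, not the utility's code: the function names, the `existing_snapshots` set standing in for the HBase snapshot catalog, and the naming rule (upper-cased table name, underscore, suffix, inferred from the CAT.SCH.TABLE1_SNAP111 example) are all assumptions.

```python
def snapshot_name(table_name, suffix):
    # Assumption: upper-cased table name, underscore, suffix, as in the
    # CAT.SCH.TABLE1_SNAP111 example.
    return f"{table_name.upper()}_{suffix}"


def prepare_snapshots(tables, suffix, mode, existing_snapshots):
    """Return the snapshot names to scan; mode is 'NEW' or 'EXISTING'."""
    if mode not in ("NEW", "EXISTING"):
        raise ValueError(f"unknown snapshot mode: {mode}")
    names = []
    for table in tables:
        name = snapshot_name(table, suffix)
        if mode == "NEW":
            # Stand-in for creating the snapshot. What happens when a
            # snapshot of that name already exists is not specified here.
            existing_snapshots.add(name)
        elif name not in existing_snapshots:
            # EXISTING: a missing snapshot is an error.
            raise LookupError(f"snapshot {name} does not exist")
        names.append(name)
    return names
```

In this sketch the `tables` argument plays the role of the table list that bulk unload obtains from the explain plan virtual table.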
*Configuration changes
--------------------------------
**Enable ACLs in HDFS
*Testing
--------
**All developer regression tests were run and all passed
**Bulk unload and snapshot scan were tested on the cluster
*Examples:
**Example of using snapshot scan without bulk unload (we need to create the snapshot first)