C++ run-time interface for TMUDFs

blueprint cmp-tmudf-compile-time-interface

- Support for C++ run-time interface:
  - A new language, C++, is added to langman. The existing LanguageManagerC handles both C and C++.
  - Two new parameter styles were added, the C++ and Java object-oriented parameter styles. Routines written in C++ use the new object-oriented C++ parameter style. The compiler interface is only supported for that style (and in the future for the Java object-oriented style).
  - Added one more compile time interface, the "completeDescription()" call in the generator. Added logic to extract the UDRPlanInfo of the optimal plan.
- Changes to UDRInvocationInfo and UDRPlanInfo classes:
  - UDRInvocationInfo and UDRPlanInfo objects can now be serialized, and they are added to generated plans as part of the UDR TDB.
  - Split TableInfo into TupleInfo and TableInfo classes. TupleInfo is now the common base class for describing both parameters and input/output tables.
  - TypeInfo now has offsets for data, null indicator and varchar indicator.
  - New get<type> and set<type> methods on class TupleInfo, to be used at compile time for parameters and at runtime for parameters, input tables and output tables.
  - Added a "call phase" member, to be able to throw exceptions when certain methods are called at the wrong time (e.g. trying to modify compile time members at runtime).
- Routine class in langman now has a new subclass, LmRoutineCppObj, and a new method, invokeRoutineMethod, that is used to invoke the object-oriented methods, requiring UDRInvocationInfo and UDRPlanInfo as parameters.
- Fixed some executor issues with error handling for UDFs. Error handling is still not very well supported.
- Emitting the EOD row in the UDF is no longer required, and is no longer supported or even possible.
- UDRPlanInfo is now part of the physical properties, so that we can extract it from the optimal plan.
- Disabled TMUDF as the inner of a nested join, for now. We might support this "routine join" at a later time.
- regress/udr/TEST001:
  - SESSIONIZE_STATIC remains in C, but other TMUDFs are now rewritten in C++ (the runtime part that was not yet in C++)
  - SESSIONIZE_DYNAMIC is now the same as the example on the wiki
- regress/udr/TEST002: Added some tests for the event log reader UDF, but can't add the part that copies a sample log file, since in Jenkins we don't have $MY_SQROOT set. Tried the test on my workstation, though. Steve tells me $MY_SQROOT should be available, so in a future checkin I'll enable this code again.
- For patch set 2: Removed fix for LP bug 1420539 and addressed other review comments.

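The "call phase" check described in the UDRInvocationInfo changes above can be illustrated with a minimal, self-contained sketch. All names here (CallPhase, InvocationSketch, setOutputColumnName, enterRunTime) are hypothetical and chosen only for illustration; they are not the actual UDRInvocationInfo members, and the real classes may track more phases than the two shown.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical names throughout; this is NOT the Trafodion UDR API.
enum class CallPhase { CompileTime, RunTime };

class InvocationSketch {
public:
    // Switch the object from compile time to run time.
    void enterRunTime() { phase_ = CallPhase::RunTime; }

    // Compile-time-only setter: rejects calls made in the wrong phase.
    void setOutputColumnName(const std::string& name) {
        requirePhase(CallPhase::CompileTime, "setOutputColumnName");
        outputColumnName_ = name;
    }

    // Readable in any phase.
    const std::string& outputColumnName() const { return outputColumnName_; }

private:
    // Throws if the current phase is not the one the caller requires.
    void requirePhase(CallPhase expected, const char* method) const {
        if (phase_ != expected) {
            throw std::logic_error(std::string(method) +
                                   " called in the wrong call phase");
        }
    }

    CallPhase phase_ = CallPhase::CompileTime;
    std::string outputColumnName_;
};

// Usage: the setter works at compile time, the same call at run time throws.
bool runtimeModificationIsRejected() {
    InvocationSketch info;
    info.setOutputColumnName("SESSION_ID");          // allowed at compile time
    assert(info.outputColumnName() == "SESSION_ID");
    info.enterRunTime();
    try {
        info.setOutputColumnName("OTHER");           // wrong phase
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```

Keeping the phase as a member of the object, rather than as a global flag, means a serialized object could carry its own phase along with it; whether the real classes do so is not stated above.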
Bulk unload optimization using snapshot scan

Resubmitting after facing git issues.

The changes consist of:
*Implementing the snapshot scan optimization in the Trafodion scan operator
*Changes to bulk unload to use the new snapshot scan
*Changes to scripts and permissions (using ACLs)
*Rework based on review

Details:
*Snapshot Scan:
----------------------
**Added support for snapshot scan to the Trafodion scan
**The scan expects the HBase snapshots themselves to be created before running the query. When used with bulk unload, the snapshots can be created by bulk unload.
**The snapshot scan implementation can be used without bulk unload. To use snapshot scan outside bulk unload, set the CQDs below:
cqd TRAF_TABLE_SNAPSHOT_SCAN 'on';
-- the snapshot name will be the table name concatenated with the suffix-string
cqd TRAF_TABLE_SNAPSHOT_SCAN_SNAP_SUFFIX 'suffix-string';
-- temp dir needed for the hbase snapshot scan
cqd TRAF_TABLE_SNAPSHOT_SCAN_TMP_LOCATION '/bulkload/temp_scan_dir/';
**Snapshot scan can be used with table scans, index scans, etc.

*Bulk unload utility:
-------------------------------
**The bulk unload optimization is due to the newly added support for snapshot scan. By default bulk unload uses the regular scan, but when snapshot scan is specified it uses snapshot scan instead.
**To use snapshot scan with bulk unload, specify the new options in the bulk unload syntax:
NEW|EXISTING SNAPSHOT HAVING SUFFIX QUOTED_STRING
***NEW in the above syntax means the bulk unload tool will create new snapshots, while EXISTING means bulk unload expects the snapshots to exist already.
***The snapshot names are based on the table names in the select statement. The snapshot name needs to start with the table name and have the suffix QUOTED_STRING.
***For example, for "unload with NEW SNAPSHOT HAVING SUFFIX 'SNAP111' into 'tmp' select * from cat.sch.table1;" the unload utility will create a snapshot CAT.SCH.TABLE1_SNAP111, and for "unload with EXISTING SNAPSHOT HAVING SUFFIX 'SNAP111' into 'tmp' select * from cat.sch.table1;" the unload utility will expect a snapshot CAT.SCH.TABLE1_SNAP111 to exist already. Otherwise an error is produced.
***If these newly added options are not used in the syntax, bulk unload will use the regular scan instead of snapshot scan.
**Bulk unload queries the explain plan virtual table to get the list of Trafodion tables that will be scanned and, depending on the case, either creates the snapshots for those tables or verifies that they already exist.

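The snapshot naming rule in the NEW/EXISTING SNAPSHOT example above can be sketched as a small helper. This is an illustration only: snapshotNameFor is a hypothetical function, not part of Trafodion, and the upper-casing and the "_" separator are inferred from the single CAT.SCH.TABLE1_SNAP111 example rather than from the implementation.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper, NOT Trafodion code. It derives the snapshot name that
// the example implies: upper-cased qualified table name + "_" + suffix.
// Both the upper-casing and the "_" separator are assumptions taken from the
// CAT.SCH.TABLE1_SNAP111 example; delimited identifiers are not handled.
std::string snapshotNameFor(const std::string& qualifiedTable,
                            const std::string& suffix) {
    assert(!qualifiedTable.empty() && !suffix.empty());

    std::string name;
    name.reserve(qualifiedTable.size() + 1 + suffix.size());
    for (unsigned char c : qualifiedTable) {
        name.push_back(static_cast<char>(std::toupper(c)));
    }
    name.push_back('_');
    name += suffix;   // the suffix is used exactly as written
    return name;
}
```

With these assumptions, snapshotNameFor("cat.sch.table1", "SNAP111") yields CAT.SCH.TABLE1_SNAP111, the name used in the example. With NEW that is the snapshot bulk unload would create, and with EXISTING the one it would expect to find.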
*Configuration changes
--------------------------------
**Enable ACLs in HDFS

*Testing
--------
**All developer regression tests were run and all passed
**Bulk unload and snapshot scan were tested on the cluster

*Examples:
**Example of using snapshot scan without bulk unload (the snapshot must be created first):
--- SQL operation complete.
>>select [first 5] c1,c2 from tt10;

C1                     C2
---------------------  --------------------

                  .00                     0
                  .01                     1
                  .02                     2
                  .03                     3
                  .04                     4

--- 5 row(s) selected.

**Example of using snapshot scan with unload:
UNLOAD WITH PURGEDATA FROM TARGET NEW SNAPSHOT HAVING SUFFIX 'SNAP778' INTO '/bulkload/unload_TT14_3' select * from seabase.TT20;