khaled bouaziz <khaled.bouaziz@hp.com> in Trafodion

Fast Transport fix

fix for LP#1444575. This checkin addresses an issue with the size of the buffer into which we get the row before converting it to delimited format. The buffer in this case is a single-row buffer.

Change-Id: I33ad4bb0a5f2f84b8f56983b76b1b9ba73c9f6f6

hive/test002 fix

This fix is related to "additional changes to support ALIGNED row format": OFFSET_SIZE in ExpAlignedFormat.h changed from 2 to 4 bytes.

Change-Id: I941a692e602f103f4234a7a8b5d3a8e9a24ad739

hive/test018 fix

- The "hadoop fs - du -s ..." command returns diffrent results on

development workstation and the test machines making the test

fail in the development environment. This checkin fixes the issue.

Change-Id: I6925d719d02e369235fc0ff30aab4e3ce108dfd5

hive/test018 fixes

Disabling select from hive table temporarily to avoid the hang issue (there is an LP 1436333 for the hang issue) and to let the test finish.

Change-Id: Idfe5b1a980a169b4b7da8db18717ada0a23eb105

hive/test015 fixes:

- Disabling upsert using load temporarily to avoid the hang issue (there is an LP 1417337 for the hang issue).

- Reducing the amount of data to make the test run faster.

Change-Id: I403942f8014e22f9c391a45773afd8a03ccabbc1

    • -209
    • +129
    /sql/regress/hive/EXPECTED015
adding backup and restore paths to package

Backup and restore scripts are not being packaged. This delivery fixes the packaging issue.

Change-Id: I43641f4bab618ef4822d50671668f008e3c1bf3c

fix in the tdb to return the correct name for bulk load

fix so that the correct name for the bulk load preparation phase is returned in the output

Change-Id: I8e93c9b5bb953c647f2b669ea966e5b9db5db434

Snapshot Scan changes

The changes in this delivery include:

- decoupling the snapshot scan from the bulk unload feature. Setup of the temporary space and folders before running the query, and cleanup afterwards, used to be done by the bulk unload operator because snapshot scan was specific to bulk unload. In order to make snapshot scan independent of bulk unload and usable in any query, the setup and cleanup tasks are now done by the query itself at run time (the scan and root operators).

- caching of the snapshot information in NATable to optimize compilation time.

Rework for caching: when the user sets TRAF_TABLE_SNAPSHOT_SCAN to LATEST, we flush the metadata and then set caching back on so that the metadata gets cached again. If newer snapshots are created after setting the CQD, they won't be seen while the old ones are still cached, unless the user issues a command/CQD to invalidate or flush the cache. One way of doing that is to issue "cqd TRAF_TABLE_SNAPSHOT_SCAN 'latest';" again.

-code cleanup

Below is a description of the CQDs used with snapshot scan (a short usage sketch follows the list):

TRAF_TABLE_SNAPSHOT_SCAN

This CQD can be set to:

NONE --> (default) Snapshot scan is disabled and regular scan is used.

SUFFIX --> Snapshot scan is enabled for bulk unload (bulk unload behavior is not changed).

LATEST --> Snapshot scan is enabled independently of bulk unload, and the latest snapshot is used if it exists. If no snapshot exists, the regular scan is used. For this phase of the project the user needs to create the snapshots using hbase shell or other tools. In the next phase of the project, new commands to create, delete and manage snapshots will be added.

TRAF_TABLE_SNAPSHOT_SCAN_SNAP_SUFFIX

This CQD is used with bulk unload, and its value is used to build the snapshot name as the table name followed by the suffix string.

TRAF_TABLE_SNAPSHOT_SCAN_TABLE_SIZE_THRESHOLD

When the estimated table size is below the threshold (in MBs) defined by this CQD, the regular scan is used instead of snapshot scan. This CQD does not apply to bulk unload, which maintains the old behavior.

TRAF_TABLE_SNAPSHOT_SCAN_TIMEOUT

The timeout beyond which we give up trying to create the snapshot scanner.

TRAF_TABLE_SNAPSHOT_SCAN_TMP_LOCATION

Location for temporary links and files produced by snapshot scan.
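
As a usage sketch only (the table trafodion.sch.t1 is hypothetical, and a snapshot of it is assumed to have been created beforehand, e.g. from the hbase shell), enabling snapshot scan independently of bulk unload could look like:

cqd TRAF_TABLE_SNAPSHOT_SCAN 'LATEST';
cqd TRAF_TABLE_SNAPSHOT_SCAN_TMP_LOCATION '/bulkload/temp_scan_dir/';
select count(*) from trafodion.sch.t1; -- uses the latest snapshot of T1 if one exists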

Change-Id: Ifede88bdf36049bac8452a7522b413fac2205251

    • -13
    • +37
    /sql/comexe/ComTdbHbaseAccess.cpp
    • -1
    • +107
    /sql/executor/HBaseClient_JNI.cpp
    • -44
    • +1
    /sql/executor/SequenceFileReader.cpp
… 30 more files in changeset.
Full backup and restore utilities

Shell scripts to perform full offline backup and restore of all trafodion tables, including metadata tables. Trafodion needs to be shut down before doing the backup or restore operations.

The backup operation takes snapshots of all the trafodion tables and exports them using the ExportSnapshot MapReduce job to an HDFS location. The restore operation also uses the ExportSnapshot MapReduce job to import the snapshots back from the HDFS location and then restores the tables.

The backup and restore were tested on clusters with Cloudera and Hortonworks distributions. They were also tested on the development workstations.

The Blueprint can be found at:

https://blueprints.launchpad.net/trafodion/+spec/full-backup-restore

Change-Id: I120a34eb4eb94e286577b4e6dc529ca528f0b846

Bulk Unload and Snapshot Scan fix

hive/test018 is failing in the test environment, and it looks like it is a classpath issue. The issue does not happen on the clusters or dev workstations, though.

This fix is specific to the Cloudera distribution. Other distributions will be addressed later.

Change-Id: I72ba8e8b85c9b6f0b5da96e55701d6f638154aaa

Bulk unload and snapshot scan

+ adding testware files that were not delivered in the first checkin

Change-Id: I87c3592cb34a12dc3e30a58c74e45582749b1807

    • -0
    • +24
    /sql/regress/hive/TEST018_create_hbase_objects.hbase
    • -0
    • +24
    /sql/regress/hive/TEST018_drop_hbase_objects.hbase
    • -0
    • +29
    /sql/regress/tools/regrhbase.ksh
Bulk unload optimization using snapshot scan

resubmitting after facing git issues

The changes consist of:

* implementing the snapshot scan optimization in the Trafodion scan operator

* changes to bulk unload to use the new snapshot scan

* changes to scripts and permissions (using ACLs)

* rework based on review

Details:

*Snapshot Scan:

----------------------

**Added support for snapshot scan to Trafodion scan

**The scan expects the hbase snapshots themselves to be created before running the query. When used with bulk unload, the snapshots can be created by bulk unload.

**The snapshot scan implementation can be used without bulk unload. To use the snapshot scan outside bulk unload we need to use the CQDs below:

cqd TRAF_TABLE_SNAPSHOT_SCAN 'on';

-- the snapshot name will be the table name concatenated with the suffix-string

cqd TRAF_TABLE_SNAPSHOT_SCAN_SNAP_SUFFIX 'suffix-string';

-- temp dir needed for the hbase snapshot scan

cqd TRAF_TABLE_SNAPSHOT_SCAN_TMP_LOCATION '/bulkload/temp_scan_dir/';

**snapshot scan can be used with table scans, index scans, etc.

*Bulk unload utility:

-------------------------------

**The bulk unload optimization is due to the newly added support for snapshot scan. By default bulk unload uses the regular scan, but when snapshot scan is specified it will use snapshot scan instead of regular scan.

**To use snapshot scan with bulk unload we need to specify the new options in the bulk unload syntax: NEW|EXISTING SNAPSHOT HAVING SUFFIX QUOTED_STRING

***Using NEW in the above syntax means the bulk unload tool will create new snapshots, while using EXISTING means bulk unload expects the snapshots to exist already.

***The snapshot names are based on the table names in the select statement. The snapshot name needs to start with the table name and have the suffix QUOTED-STRING.

***For example, for "unload with NEW SNAPSHOT HAVING SUFFIX 'SNAP111' into 'tmp' select * from cat.sch.table1;" the unload utility will create a snapshot CAT.SCH.TABLE1_SNAP111, and for "unload with EXISTING SNAPSHOT HAVING SUFFIX 'SNAP111' into 'tmp' select * from cat.sch.table1;" the unload utility will expect a snapshot CAT.SCH.TABLE1_SNAP111 to exist already. Otherwise an error is produced.

***If this newly added option is not used in the syntax, bulk unload will use the regular scan instead of snapshot scan.

**Bulk unload queries the explain plan virtual table to get the list of Trafodion tables that will be scanned and, depending on the case, either creates the snapshots for those tables or verifies that they already exist.

*Configuration changes

--------------------------------

**Enable ACLs in HDFS

*Testing

--------

**All developer regression tests were run and all passed

**bulk unload and snapshot scan were tested on the cluster

*Examples:

**Example of using snapshot scan without bulk unload (we need to create the snapshot first):

>>cqd TRAF_TABLE_SNAPSHOT_SCAN 'on';

--- SQL operation complete.

>>cqd TRAF_TABLE_SNAPSHOT_SCAN_SNAP_SUFFIX 'SNAP777';

--- SQL operation complete.

>>cqd TRAF_TABLE_SNAPSHOT_SCAN_TMP_LOCATION '/bulkload/temp_scan_dir/';

--- SQL operation complete.

>>select [first 5] c1,c2 from tt10;

C1                    C2

--------------------- --------------------

                  .00                    0
                  .01                    1
                  .02                    2
                  .03                    3
                  .04                    4

--- 5 row(s) selected.

**Example of using snapshot scan with unload:

UNLOAD

WITH PURGEDATA FROM TARGET

NEW SNAPSHOT HAVING SUFFIX 'SNAP778'

INTO '/bulkload/unload_TT14_3' select * from seabase.TT20;

Change-Id: Idb1d1807850787c6717ab0aa604dfc9a37f43dce

    • -187
    • +528
    /sql/executor/ExExeUtilLoad.cpp
    • -11
    • +39
    /sql/executor/HBaseClient_JNI.cpp
    • -102
    • +266
    /sql/executor/HTableClient.java
    • -10
    • +282
    /sql/executor/SequenceFileReader.cpp
… 21 more files in changeset.
Bug fixes for bulk unload

- 1398981 unload with union query fails with ERROR[4066]

- 1392481 UNLOAD query involving ORDER BY returns ERROR[4001]

- By addressing these 2 issues, the parser problem we faced before, where we could not use a query_expression with UNLOAD, is now also fixed without any additional conflicts (both cases are sketched below).
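
As an illustration only (table names and target paths below are hypothetical), both of these now work:

unload into '/bulkload/unload_union'
select c1 from trafodion.sch.t1 union select c1 from trafodion.sch.t2;

unload into '/bulkload/unload_sorted'
select c1 from trafodion.sch.t1 order by c1;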

Change-Id: Ia3a724ac21f77e596a4db559781de5146fe208c6

fix for hive/test018

Query caching seems to cause hive/test018 to fail. Disabling it for now till the query caching issue is fixed.

bug 1396386

Change-Id: Ife834050ce8f7d7b968488f5a5ff3f189ac2b666

fix for bug 1391643

- fix for bug 1391643

- rework based on preliminary review by Mike

Change-Id: Icbc4dd6a3ee71c228c2006017030b71508fa0b6f

fixes for bulk unload issues

- fix for bug 1391641. (unload WITH DELIMITER 255 returns ERROR[15001]).

- fix for bug 1387377. (unload with [delimiter | record_separator] 0 should err).

- fix for bug 1389857. (trafci unload reports incorrect number of rows unloaded).
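
For illustration (paths and table names are hypothetical): a delimiter value of 255 is now accepted, while 0 raises an error:

unload with delimiter 255 into '/bulkload/unload_t1' select * from trafodion.sch.t1;
unload with delimiter 0 into '/bulkload/unload_t1' select * from trafodion.sch.t1; -- now returns an error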

Change-Id: I3599728426464e81178c5904563b68aa78502a0b

Bulk unload fixes and rework

- rework

- fix for bug 1387377

Change-Id: I7ad6115ab50f291e2ad97a042ec2b8fbc9d256bf

Bulk load/unload fixes

- changes to sqenvcom.sh to support native compression for Cloudera and Hortonworks distributions (tested on cluster)

- rework from previous checkin

- fix for bug 1387202, which caused bulk unload to hang when the target location is invalid

Change-Id: Ia6046dfb2b5ff2f986b8306c26a991991a3da780

Bulk Load fixes

- fix for bug 1383849. Releasing the bulk load objects once load is done.

- bulk load now uses CIF by default. This does not apply to populating indexes using bulk load.

- fix for hive/test015 so it does not fail on the test machines

Change-Id: Iaafe8de8eb60352b0d4c644e9da0d84a4068688c

    • -0
    • +10
    /sql/sqlcomp/CmpSeabaseDDLindex.cpp
Fix for LP bug 1376306

- With this fix, bulk loading salted tables and indexes now generates parallel plans. Both salted base tables and salted indexes were tested.

- if the attempt_esp_parallelism CQD is set to off, an error is returned (see the sketch below)

- also removed unneeded variables from sqenvcom.sh
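
A minimal sketch, assuming hypothetical table names (trafodion.sch.salted_t is a salted Trafodion table, hive.hive.src a source table):

cqd attempt_esp_parallelism 'OFF';
load into trafodion.sch.salted_t select * from hive.hive.src; -- returns an error
cqd attempt_esp_parallelism 'ON';
load into trafodion.sch.salted_t select * from hive.hive.src; -- generates a parallel plan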

Change-Id: I2a85d902070a4f35e3fe54b426a4277afaa60399

Bulk Unload feature

Blueprint can be found at:

https://blueprints.launchpad.net/trafodion/+spec/bulkunload

Change-Id: I395bd720e8952db0fcd04cb26cccab4d4877eae1

    • -650
    • +571
    /sql/executor/ExFastTransport.cpp
    • -349
    • +0
    /sql/executor/ExFastTransportIO.cpp
    • -358
    • +0
    /sql/executor/ExFastTransportIO.h
    • -25
    • +59
    /sql/executor/SequenceFileReader.cpp
… 23 more files in changeset.
set TRAF_LOAD_USE_FOR_STATS back to OFF

- this CQD seems to have been set to ON during the .98 merge by mistake

Change-Id: I7d2bf99598a471b4c48dcc40bc063e610c763987

removing quasi secure mode from bulk load

In HBase .98, permissions on /bulkload will be controlled using Access Control Lists (ACLs). Quasi secure mode was introduced because HBase .94 did not have ACLs.

Change-Id: I1427d6e7ab639417010875b10a01b60696790dee

    • -83
    • +0
    /sql/executor/TrafBulkLoadClient.java
making update statistics use upsert by default

Change-Id: I20d3d9658e755be3b3808627bfc6996658d00527

Porting TESTTOK2 to trafodion

Change-Id: Ib1bd3a1f5d612cf6a5c718b733edf503cb095852

    • -0
    • +44
    /sql/regress/compGeneral/EXPECTEDTOK2.LINUX
    • -0
    • +28
    /sql/regress/compGeneral/FILTERTOK2
    • -0
    • +18
    /sql/regress/compGeneral/TESTTOK2
    • -0
    • +48
    /sql/regress/compGeneral/TESTTOK2.sh
Bulk Extract fixes

Change-Id: Ic2d6369c0dfc61cd647715befa810c0442124d91

Bulk Unload (prototype) fixes

fixes for the parser and other issues

Change-Id: If0e3a508bf45c9b34d84083e4fb3906734b5db73

    • -8
    • +11
    /sql/executor/SequenceFileReader.cpp
Bulk Unload/extract (prototype)

These changes add support for a bulk unload/extract (prototype) to unload data from trafodion tables into HDFS. Bulk unload unloads data in either compressed (gzip) or uncompressed format. When specified, compression takes place before writing data buffers to files. Once the data unloading is done, the files are merged into one single file. If compression is specified, the data are unloaded into files in compressed format and then merged in compressed format. Otherwise they are unloaded in uncompressed format and merged into one uncompressed file.

The unload syntax is:

UNLOAD [[WITH option] [, option] ...] INTO <hive-table> query_expression

Where

*<hive-table> is a hive table.

and

*option can be:

- PURGEDATA FROM TARGET: When this option is specified, the files under the <hive-table> are deleted.

- COMPRESSION GZIP: When this option is specified, Gzip compression is used. The compression takes place in the hive-insert node and data is written to disk in compressed format (Gzip is the only supported compression for now).

- MERGE FILE <merged-file-path>: When this option is specified, the unloaded files are merged into one single file <merged-file-path>. If compression is specified, the data is unloaded in compressed format and the merged file will be in compressed format also.

- NO OUTPUT: If this option is specified, then no status message is displayed.
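
A sketch combining these options, following the grammar above (the hive table hive.hive.t1, the merged-file path, and the source table are all hypothetical):

UNLOAD WITH PURGEDATA FROM TARGET, COMPRESSION GZIP, MERGE FILE '/bulkload/t1_merged.gz'
INTO hive.hive.t1 select * from trafodion.sch.t1;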

Change-Id: Ifd6f543d867f29ee752bcb80020c5ad6c16b7277

    • -82
    • +139
    /sql/executor/ExFastTransport.cpp
    • -0
    • +218
    /sql/executor/SequenceFileReader.cpp
… 16 more files in changeset.
Bulk load and other changes

changes include:

- Setting the LIBHDFS_OPTS variable to Xmx2048m to limit the amount of virtual memory used when reading hive tables to 2GB.

- Making update statistics use bulk load by default

- Changing bulk load to return the number of rows loaded

- fix for bug 1359872 where create index hangs

Change-Id: Ic8e36cfef43ed2ce7c2c2469c1b9c315a761ee31

Removing checks on indexes and constraints for upsert using load

Upsert using load can handle indexes and constraints. No need to produce an error when the target table has indexes or constraints (a sketch follows below).

bug 1359316
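
A minimal sketch with hypothetical names, showing that upsert using load is no longer rejected when the target table has a secondary index:

create index t1_idx on trafodion.sch.t1(c2);
upsert using load into trafodion.sch.t1 select * from hive.hive.src;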

Change-Id: I8ed03512fc59e3670fd3e962e6be572f811466cd