Changes in Patchset 2

Fixed issues found during review.

Most of the changes are related to disabling this change for unique indexes.
When a unique index is found, it alone is disabled during the load. Other
indexes remain online and are handled as described below. Once the base table
and regular indexes have been loaded, unique indexes are loaded from scratch
using a new command "populate all unique indexes on <tab-name>". A similar
command, "alter table <tab-name> disable all unique indexes", is used to
disable all unique indexes on a table at the start of the load.
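As an illustration, the load now internally issues a sequence like the
following (the table and source names here are hypothetical, not taken from
the change itself):

alter table t1 disable all unique indexes;   -- at the start of the load
load into t1 select * from hive.hive.src;    -- base table and regular indexes
populate all unique indexes on t1;           -- unique indexes rebuilt afterwards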

The change to the CQD setting allow_incompatible_assignment is unrelated and
fixes an issue related to loading timestamp types from Hive. The odb change
gets rid of minor warnings.

Thanks to all three reviewers for their helpful comments.

-----------------------------------

Adding support for incremental index maintenance during bulk load.

Previously, when bulk loading into a table with indexes, the indexes were
first disabled, the base table was loaded, and then the indexes were populated
from scratch one by one. This could take a long time when the table had
significant data prior to the load.

Using a design by Hans, this change allows indexes to be loaded in the same
query tree as the base table. The query tree looks like this:

Root
 |
NestedJoin
 /      \
Sort     Traf_load_prep (into index1)
 |
Exchange
 |
NestedJoin
 /      \
Sort     Traf_load_prep (i.e. bulk insert) (into base table)
 |
Exchange
 |
Hive scan

This design and change set allows multiple indexes to be on the same tree;
only one index is shown here for simplicity. LOAD CLEANUP and LOAD COMPLETE
statements also now perform these tasks for the base table along with all
enabled indexes.

This change is enabled by default: if a table has indexes, they will be
incrementally maintained during bulk load.

The WITH NO POPULATE INDEX option has been removed.

A new option, WITH REBUILD INDEXES, has been added. With this option we get
the old behaviour of disabling all indexes before the load into the table and
then populating all of them from scratch.
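For example (table and source names are illustrative):

-- new default: indexes are maintained incrementally during the load
load into t1 select * from hive.hive.src;

-- old behaviour: disable all indexes, load, then repopulate from scratch
load with rebuild indexes into t1 select * from hive.hive.src;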

Change-Id: Ib5491649e753b81e573d96dfe438c2cf8481ceca

Enabling Bulk load and Hive Scan error logging/skip feature

Also fixed the hanging issue with Hive scan (the ExHdfsScan operator) when
there is an error in data conversion. ExHbaseAccessBulkLoadPrepSQTcb was not
releasing all the resources when there was an error or when the last buffer
had some rows.

The error logging/skip feature can be enabled in Hive scan using CQDs and in
bulk load using command-line options.

For Hive scan:

CQD TRAF_LOAD_CONTINUE_ON_ERROR 'ON' to skip errors
CQD TRAF_LOAD_LOG_ERROR_ROWS 'ON' to log the error rows in HDFS files

For bulk load:

LOAD WITH CONTINUE ON ERROR [TO <location>] to skip error rows
LOAD WITH LOG ERROR ROWS to log the error rows in HDFS files

The default parent error logging directory in HDFS is /bulkload/logs. The
error rows are logged in the subdirectory ERR_<date>_<time>; a separate HDFS
file is created in this directory for every process/operator involved in the
bulk load. Error rows in Hive scan are logged in
<sourceHiveTableName>_hive_scan_err_<inst_id>, and error rows in bulk upsert
are logged in <destTrafTableName>_traf_upsert_err_<inst_id>.

Bulk load can also be aborted after a certain number of error rows are seen,
using the LOAD WITH LOG ERROR ROWS, STOP AFTER <n> ERROR ROWS option.
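A hypothetical invocation combining these options (table and source names are
illustrative):

load with log error rows, stop after 100 error rows
into t1 select * from hive.hive.src;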

Change-Id: Ief44ebb9ff74b0cef2587705158094165fca07d3

Expected file change for failing hive test.

A previous checkin enhanced an error message and caused this diff, so the
expected file just had to be updated.

Change-Id: I1cc5133e7bf971b98fc6e79f7cad05553eed6b46

(cherry picked from commit 233d4ca7bf16d5bb629be069e87a2f06e5956c02)

hive/test002 fix

This fix is related to "additional changes to support ALIGNED row format."

OFFSET_SIZE in ExpAlignedFormat.h changed from 2 to 4 bytes.

Change-Id: I941a692e602f103f4234a7a8b5d3a8e9a24ad739

hive/test018 fix

- The "hadoop fs - du -s ..." command returns diffrent results on

development workstation and the test machines making the test

fail in the development environment. This checkin fixes the issue.

Change-Id: I6925d719d02e369235fc0ff30aab4e3ce108dfd5

hive/test018 fixes

Disabling select from Hive table temporarily to avoid the hang issue
(there is an LP 1436333 for the hang issue) and let the test finish.

Change-Id: Idfe5b1a980a169b4b7da8db18717ada0a23eb105

hive/test015 fixes:

- Disabling upsert using load temporarily to avoid the hang issue
(there is an LP 1417337 for the hang issue).
- Reducing the amount of data to make the test run faster.

Change-Id: I403942f8014e22f9c391a45773afd8a03ccabbc1

Trafodion Metadata Cleanup command support.

Various changes to support the cleanup command have been added. A separate
external spec contains the details.

Summary of syntax:

cleanup [ table t | index i | sequence s | object o ] [, uid <value>]
cleanup [ private | shared ] schema sch
cleanup uid <value>
cleanup metadata, check, return details

In addition, a new command to get the names of various hbase objects has also
been added:

get [ all | user | system | external ] hbase objects;
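Some illustrative invocations (the object name and UID are hypothetical):

cleanup table t1;
cleanup uid 123456;
cleanup metadata, check, return details;
get user hbase objects;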

Change-Id: I93f1f45e7fd78091bacd7c9f166420edd7c1abee

Snapshot Scan changes

The changes in this delivery include:

- decoupling the snapshot scan from the bulk unload feature. Setup of the
temporary space and folders before running the query, and cleanup afterwards,
used to be done by the bulk unload operator because snapshot scan was specific
to bulk unload. In order to make snapshot scan independent from bulk unload
and usable in any query, the setup and cleanup tasks are now done by the query
itself at run time (by the scan and root operators).

- caching of the snapshot information in NATable to optimize compilation time.
Rework for caching: when the user sets TRAF_TABLE_SNAPSHOT_SCAN to LATEST we
flush the metadata and then set caching back on so that metadata gets cached
again. If newer snapshots are created after setting the CQD they won't be seen
while the old ones are still cached, unless the user issues a command/CQD to
invalidate or flush the cache. One way of doing that is to issue
"cqd TRAF_TABLE_SNAPSHOT_SCAN 'latest';" again.

- code cleanup

Below is a description of the CQDs used with snapshot scan:

TRAF_TABLE_SNAPSHOT_SCAN
This CQD can be set to:
NONE --> (default) Snapshot scan is disabled and regular scan is used.
SUFFIX --> Snapshot scan is enabled for the bulk unload (bulk unload behavior
is not changed).
LATEST --> Snapshot scan is enabled independently from bulk unload and the
latest snapshot is used if it exists. If no snapshot exists the regular scan
is used. For this phase of the project the user needs to create the snapshots
using the hbase shell or other tools. In the next phase of the project new
commands to create, delete and manage snapshots will be added.

TRAF_TABLE_SNAPSHOT_SCAN_SNAP_SUFFIX
This CQD is used with bulk unload; its value is used to build the snapshot
name as the table name followed by the suffix string.

TRAF_TABLE_SNAPSHOT_SCAN_TABLE_SIZE_THRESHOLD
When the estimated table size is below the threshold (in MBs) defined by this
CQD, the regular scan is used instead of snapshot scan. This CQD does not
apply to bulk unload, which maintains the old behavior.

TRAF_TABLE_SNAPSHOT_SCAN_TIMEOUT
The timeout beyond which we give up trying to create the snapshot scanner.

TRAF_TABLE_SNAPSHOT_SCAN_TMP_LOCATION
Location for temporary links and files produced by snapshot scan.
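For example, to run a query against the latest existing snapshot (the table
name is illustrative; the snapshot must have been created beforehand):

cqd TRAF_TABLE_SNAPSHOT_SCAN 'LATEST';
cqd TRAF_TABLE_SNAPSHOT_SCAN_TMP_LOCATION '/bulkload/temp_scan_dir/';
select count(*) from trafodion.sch.t1;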

Change-Id: Ifede88bdf36049bac8452a7522b413fac2205251

Cleanup hive reader thread when canceling scan

Cancel processing, whether triggered by [FIRST N], error handling, SQL cancel,
or any other reason, can cause the main executor thread to abruptly stop
interacting with the reader threads. This change fixes a hang caused by a
reader thread waiting for the main thread to give it an empty buffer, after
the main thread has finished the canceled query.

Change-Id: Ib41a8a0036b7aab8dedf7d10ee55eb2007f7c265

Closes-Bug: #1425661

Bulk unload and snapshot scan

+ adding testware files that were not delivered in the first checkin

Change-Id: I87c3592cb34a12dc3e30a58c74e45582749b1807

New files:
./TEST018_create_hbase_objects.hbase (+24 lines)
./TEST018_drop_hbase_objects.hbase (+24 lines)
Bulk unload optimization using snapshot scan

resubmitting after facing git issues

The changes consist of:

*implementing the snapshot scan optimization in the Trafodion scan operator
*changes to bulk unload to use the new snapshot scan
*changes to scripts and permissions (using ACLs)
*rework based on review

Details:

*Snapshot Scan:

----------------------

**Added support for snapshot scan to the Trafodion scan operator.
**The scan expects the HBase snapshots themselves to be created before running
the query. When used with bulk unload, the snapshots can be created by bulk
unload.
**The snapshot scan implementation can be used without bulk unload. To use
the snapshot scan outside bulk unload we need to use the CQDs below:

cqd TRAF_TABLE_SNAPSHOT_SCAN 'on';
-- the snapshot name will be the table name concatenated with the suffix-string
cqd TRAF_TABLE_SNAPSHOT_SCAN_SNAP_SUFFIX 'suffix-string';
-- temp dir needed for the hbase snapshot scan
cqd TRAF_TABLE_SNAPSHOT_SCAN_TMP_LOCATION '/bulkload/temp_scan_dir/';

**Snapshot scan can be used with table scans, index scans, etc.

*Bulk unload utility :

-------------------------------

**The bulk unload optimization is due to the newly added support for snapshot
scan. By default bulk unload uses the regular scan, but when snapshot scan is
specified it will use snapshot scan instead of regular scan.

**To use snapshot scan with bulk unload we need to specify the new options in
the bulk unload syntax: NEW|EXISTING SNAPSHOT HAVING SUFFIX QUOTED_STRING
***Using NEW in the above syntax means the bulk unload tool will create new
snapshots, while using EXISTING means bulk unload expects the snapshots to
exist already.

***The snapshot names are based on the table names in the select statement.
The snapshot name needs to start with the table name and have the suffix
QUOTED-STRING.
***For example, for "unload with NEW SNAPSHOT HAVING SUFFIX 'SNAP111' into
'tmp' select from cat.sch.table1;" the unload utility will create a snapshot
CAT.SCH.TABLE1_SNAP111, and for "unload with EXISTING SNAPSHOT HAVING SUFFIX
'SNAP111' into 'tmp' select from cat.sch.table1;" the unload utility will
expect a snapshot CAT.SCH.TABLE1_SNAP111 to exist already; otherwise an error
is produced.

***If this newly added option is not used in the syntax, bulk unload will use
the regular scan instead of snapshot scan.
**Bulk unload queries the explain plan virtual table to get the list of
Trafodion tables that will be scanned, and based on the case it either creates
the snapshots for those tables or verifies whether they already exist.

*Configuration changes

--------------------------------

**Enable ACLs in HDFS

*Testing

--------

**All developer regression tests were run and all passed
**Bulk unload and snapshot scan were tested on the cluster

*Examples:

**Example of using snapshot scan without bulk unload:

(we need to create the snapshot first)

>>cqd TRAF_TABLE_SNAPSHOT_SCAN 'on';

--- SQL operation complete.

>>cqd TRAF_TABLE_SNAPSHOT_SCAN_SNAP_SUFFIX 'SNAP777';

--- SQL operation complete.

>>cqd TRAF_TABLE_SNAPSHOT_SCAN_TMP_LOCATION '/bulkload/temp_scan_dir/';

--- SQL operation complete.

>>select [first 5] c1,c2 from tt10;

C1                    C2
--------------------- --------------------
.00                   0
.01                   1
.02                   2
.03                   3
.04                   4

--- 5 row(s) selected.

**Example of using snapshot scan with unload:

UNLOAD

WITH PURGEDATA FROM TARGET

NEW SNAPSHOT HAVING SUFFIX 'SNAP778'

INTO '/bulkload/unload_TT14_3' select * from seabase.TT20 ;

Change-Id: Idb1d1807850787c6717ab0aa604dfc9a37f43dce

CREATE SCHEMA OBJECTS fix

INITIALIZE TRAFODION, CREATE SCHEMA OBJECTS would fail if a schema to be
created was not a regular identifier. Now names are delimited.

Also fixed a REVOKE ROLE problem where an uninitialized variable could lead
to a core.

Updated HIVE tests to include CREATE SCHEMA commands.

Change-Id: I90572addc8c00d77fb9a736e165cd3e8d8b420f2

Bug fixes for bulk unload

- 1398981: unload with union query fails with ERROR[4066]
- 1392481: UNLOAD query involving ORDER BY returns ERROR[4001]
- By addressing these 2 issues, the parser problem we faced before, where we
could not use query_expression with UNLOAD, is now also fixed without any
additional conflicts.

Change-Id: Ia3a724ac21f77e596a4db559781de5146fe208c6

fix for hive/test018

Query caching seems to cause hive/test018 to fail. Disabling it for now until
the query caching issue is fixed.

bug 1396386

Change-Id: Ife834050ce8f7d7b968488f5a5ff3f189ac2b666

fix for bug 1391643

-fix for bug 1391643

-rework based on preliminary review by Mike

Change-Id: Icbc4dd6a3ee71c228c2006017030b71508fa0b6f

fixes for bulk unload issues

- fix for bug 1391641 (unload WITH DELIMITER 255 returns ERROR[15001]).
- fix for bug 1387377 (unload with [delimiter | record_separator] 0 should err).
- fix for bug 1389857 (trafci unload reports incorrect number of rows unloaded).

Change-Id: I3599728426464e81178c5904563b68aa78502a0b

Initial changes for ORC file support.

Access to ORC (optimized row columnar) format tables is not enabled by

default yet. This checkin is initial and infrastructure changes for

that support.

Change-Id: I683c1b63c502dd4d2c736181952cb40f9f299cfd

Bulk unload fixes and rework

- rework

- fix for bug 1387377

Change-Id: I7ad6115ab50f291e2ad97a042ec2b8fbc9d256bf

Bulk load/unload fixes

- changes to sqenvcom.sh to support native compression for Cloudera and
Hortonworks distributions (tested on a cluster)
- rework from previous checkin.
- fix for bug 1387202, which caused bulk unload to hang when the target
location is invalid.

Change-Id: Ia6046dfb2b5ff2f986b8306c26a991991a3da780

Bulk Load fixes

- fix for bug 1383849: releasing the bulk load objects once the load is done.

- bulk load now uses CIF by default. This does not apply to populating

indexes using bulk load.

- fix for hive/test015 so it does not fail on the test machines

Change-Id: Iaafe8de8eb60352b0d4c644e9da0d84a4068688c

Fix for LP bug 1376306

- With this fix, bulk loading salted tables and indexes now generates parallel
plans. Both salted base tables and salted indexes were tested.
- If the attempt_esp_parallelism CQD is set to off, an error is returned.
- Also removed unneeded variables from sqenvcom.sh.

Change-Id: I2a85d902070a4f35e3fe54b426a4277afaa60399

Bulk Unload feature

Blueprint can be found at:

https://blueprints.launchpad.net/trafodion/+spec/bulkunload

Change-Id: I395bd720e8952db0fcd04cb26cccab4d4877eae1

New file: ./TEST003_create_hive_tables.hive (+311 lines)
removing quasi secure mode from bulk load

In HBase 0.98, permissions on /bulkload will be controlled using Access
Control Lists (ACLs). Quasi secure mode was introduced because HBase 0.94 did
not have ACLs.

Change-Id: I1427d6e7ab639417010875b10a01b60696790dee

Delivery of the Migration to HBase 0.98 branch

Change-Id: I410b90e0730f5d16f2e86a63cbffe4abaf9daa5d

Bulk Extract fixes

Change-Id: Ic2d6369c0dfc61cd647715befa810c0442124d91

Bulk Unload (prototype) fixes

fixes for the parser and other issues

Change-Id: If0e3a508bf45c9b34d84083e4fb3906734b5db73

Bulk Unload/extract (prototype)

These changes add support for a bulk unload/extract (prototype) to unload data
from Trafodion tables into HDFS. Bulk unload unloads data in either compressed
(gzip) or uncompressed format. When specified, compression takes place before
writing data buffers to files. Once the data unloading is done the files are
merged into one single file. If compression is specified the data are unloaded
into files in compressed format and then merged in compressed format;
otherwise they are unloaded in uncompressed format and merged into one
uncompressed file.

The unload syntax is:

UNLOAD [[WITH option] [, option] ...] INTO <hive-table> query_expression

Where

*<hive-table> is a hive table.

and

*option can be:

- PURGEDATA FROM TARGET: When this option is specified the files under the
<hive-table> are deleted.
- COMPRESSION GZIP: When this option is specified, gzip compression is used.
The compression takes place in the hive-insert node and data is written to
disk in compressed format (gzip is the only supported compression for now).
- MERGE FILE <merged-file-path>: When this option is specified the files
unloaded are merged into one single file <merged-file-path>. If compression
is specified the data is unloaded in compressed format and the merged file
will be in compressed format also.
- NO OUTPUT: If this option is specified then no status message is displayed.
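A hypothetical invocation combining several options (the table names are
illustrative, and whether every option combination is supported by the
prototype is not stated here):

UNLOAD WITH PURGEDATA FROM TARGET, COMPRESSION GZIP,
MERGE FILE '/bulkload/merged_t1.gz'
INTO hive.hive.target_table select * from trafodion.sch.t1;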

Change-Id: Ifd6f543d867f29ee752bcb80020c5ad6c16b7277

New file: ./TEST018_create_hive_tables.hive (+68 lines)
Bulk load and other changes

changes include:

- Setting the LIBHDFS_OPTS variable to -Xmx2048m to limit the amount of
virtual memory used when reading hive tables to 2GB.

- Making update statistics use bulk load by default

- Changing bulk load to return the number of rows loaded

- fix for bug 1359872 where create index hangs

Change-Id: Ic8e36cfef43ed2ce7c2c2469c1b9c315a761ee31
