Changes in Patchset 2

Fixed issues found during review.

Most of the changes are related to disabling this feature for unique indexes.

When unique indexes are found, they alone are disabled during the load.

Other indexes remain online and are handled as described below. Once the base
table and regular indexes have been loaded, unique indexes are loaded from
scratch using a new command, "populate all unique indexes on <tab-name>".
A similar command, "alter table <tab-name> disable all unique indexes",
is used to disable all unique indexes on a table at the start of the load.
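
For illustration, the sequence for a table with a unique index might look
like this (table name t1 is hypothetical; command wording as described above):

  -- disable only the unique indexes before the load starts
  alter table t1 disable all unique indexes;
  -- ... base table and regular indexes are loaded incrementally ...
  -- rebuild the unique indexes from scratch afterwards
  populate all unique indexes on t1;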

The CQD change for the setting allow_incompatible_assignment is unrelated;
it fixes an issue with loading timestamp types from Hive.

The odb change gets rid of minor warnings.

Thanks to all three reviewers for their helpful comments.

-----------------------------------

Adding support for incremental index maintenance during bulk load.

Previously, when bulk loading into a table with indexes, the indexes were
first disabled, the base table was loaded, and then the indexes were populated
from scratch one by one. This could take a long time when the table had
significant data prior to the load.

Using a design by Hans, this change allows indexes to be loaded in the same
query tree as the base table. The query tree looks like this:

Root
  |
NestedJoin
  /  \
Sort  Traf_load_prep (into index1)
  |
Exchange
  |
NestedJoin
  /  \
Sort  Traf_load_prep (i.e. bulk insert) (into base table)
  |
Exchange
  |
Hive scan

This design and change set allow multiple indexes to be on the same tree;
only one index is shown here for simplicity. LOAD CLEANUP and LOAD COMPLETE
statements also now perform these tasks for the base table along with all
enabled indexes.

This change is enabled by default. If a table has indexes, they will be
incrementally maintained during bulk load.

The WITH NO POPULATE INDEX option has been removed.

A new option, WITH REBUILD INDEXES, has been added. With this option we get
the old behaviour of disabling all indexes before the load into the table and
then populating all of them from scratch.
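
For illustration, assuming a hypothetical Trafodion target table t1 and Hive
source table hsrc:

  -- default: indexes on t1 are maintained incrementally in the same
  -- query tree as the base-table load
  load into t1 select * from hive.hive.hsrc;

  -- old behaviour: disable all indexes, load the base table, then
  -- populate every index from scratch
  load with rebuild indexes into t1 select * from hive.hive.hsrc;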

Change-Id: Ib5491649e753b81e573d96dfe438c2cf8481ceca

… 35 more files in changeset.
Enabling Bulk load and Hive Scan error logging/skip feature

Also fixed the hanging issue with Hive scan (the ExHdfsScan operator) when
there is an error in data conversion.

ExHbaseAccessBulkLoadPrepSQTcb was not releasing all of its resources when
there was an error or when the last buffer had some rows.

The error logging/skip feature can be enabled in Hive scan using CQDs and in
bulk load using command-line options.

For Hive scan:
- CQD TRAF_LOAD_CONTINUE_ON_ERROR 'ON' to skip error rows
- CQD TRAF_LOAD_LOG_ERROR_ROWS 'ON' to log the error rows in HDFS files

For bulk load:
- LOAD WITH CONTINUE ON ERROR [TO <location>] to skip error rows
- LOAD WITH LOG ERROR ROWS to log the error rows in HDFS files
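
For example, both settings could be turned on for a session like this
(a sketch; CQD names as listed above):

  cqd TRAF_LOAD_CONTINUE_ON_ERROR 'ON';  -- skip error rows
  cqd TRAF_LOAD_LOG_ERROR_ROWS 'ON';     -- log error rows to HDFS files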

The default parent error logging directory in HDFS is /bulkload/logs. The
error rows are logged in a subdirectory named ERR_<date>_<time>. A separate
HDFS file is created in this directory for every process/operator involved
in the bulk load.

Error rows in Hive scan are logged in
<sourceHiveTableName>_hive_scan_err_<inst_id>.
Error rows in bulk upsert are logged in
<destTrafTableName>_traf_upsert_err_<inst_id>.

Bulk load can also be aborted after a certain number of error rows are seen,
using the LOAD WITH LOG ERROR ROWS, STOP AFTER <n> ERROR ROWS option.
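
For illustration, a load that logs error rows and gives up after 100 of them
might look like this (table names t1 and hsrc are hypothetical):

  load with log error rows, stop after 100 error rows
    into t1 select * from hive.hive.hsrc;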

Change-Id: Ief44ebb9ff74b0cef2587705158094165fca07d3

… 33 more files in changeset.
hive/test015 fixes:

-Disabling upsert using load temporarily to avoid the hang issue
(LP bug 1417337 tracks the hang issue).

-Reducing the amount of data to make the test run faster.

Change-Id: I403942f8014e22f9c391a45773afd8a03ccabbc1

… 1 more file in changeset.
fixes for bulk unload issues

- fix for bug 1391641. (unload WITH DELIMITER 255 returns ERROR[15001]).

- fix for bug 1387377. (unload with [delimiter | record_separator] 0 should return an error).

- fix bug 1389857. (trafci unload reports incorrect number of rows unloaded).

Change-Id: I3599728426464e81178c5904563b68aa78502a0b

… 3 more files in changeset.
Fix for LP bug 1376306

- With this fix, bulk loading salted tables and indexes now generates parallel
plans. Both salted base tables and salted indexes were tested.
- if the attempt_esp_parallelism CQD is set to off, an error is returned (see
the sketch after this list)

- also removed unneeded variables from sqenvcom.sh
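
The sketch referenced above (table names are hypothetical; the exact error
message is not shown):

  cqd attempt_esp_parallelism 'OFF';
  -- with the CQD off, bulk load into a salted table returns an error
  -- instead of silently producing a serial plan
  load into salted_t1 select * from hive.hive.hsrc;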

Change-Id: I2a85d902070a4f35e3fe54b426a4277afaa60399

… 7 more files in changeset.
Bulk Unload feature

Blueprint can be found at:

https://blueprints.launchpad.net/trafodion/+spec/bulkunload

Change-Id: I395bd720e8952db0fcd04cb26cccab4d4877eae1

… 37 more files in changeset.
Delivery of the Migration to HBase 0.98 branch

Change-Id: I410b90e0730f5d16f2e86a63cbffe4abaf9daa5d

… 290 more files in changeset.
Bulk Extract fixes

Change-Id: Ic2d6369c0dfc61cd647715befa810c0442124d91

… 3 more files in changeset.
Bulk Unload (prototype) fixes

fixes for the parser and other issues

Change-Id: If0e3a508bf45c9b34d84083e4fb3906734b5db73

… 10 more files in changeset.
Bulk Unload/extract (prototype)

These changes add support for a bulk unload/extract (prototype) to unload data
from Trafodion tables into HDFS. Bulk unload unloads data in either compressed
(gzip) or uncompressed format. When specified, compression takes place before
writing data buffers to files. Once the data unloading is done, the files are
merged into one single file. If compression is specified, the data is unloaded
into files in compressed format and then merged in compressed format.
Otherwise it is unloaded in uncompressed format and merged into one
uncompressed file.

The unload syntax is:

UNLOAD [[WITH option] [, option] ...] INTO <hive-table> query_expression

where

* <hive-table> is a Hive table

and

* option can be:

- PURGEDATA FROM TARGET: when this option is specified, the files under
<hive-table> are deleted
- COMPRESSION GZIP: when this option is specified, gzip compression is used.
The compression takes place in the hive-insert node and data is written to
disk in compressed format (gzip is the only supported compression for now)
- MERGE FILE <merged-file-path>: when this option is specified, the unloaded
files are merged into one single file, <merged-file-path>. If compression is
specified, the data is unloaded in compressed format and the merged file will
be in compressed format as well
- NO OUTPUT: if this option is specified, no status message is displayed
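
For illustration, a hypothetical unload combining several of these options
(all table and file names are made up):

  unload with purgedata from target,
    compression gzip,
    merge file '/bulkload/customers_merged.gz'
    into hive.hive.target_tab
    select * from trafodion.sch.customers;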

Change-Id: Ifd6f543d867f29ee752bcb80020c5ad6c16b7277

… 30 more files in changeset.
Bulk load and other changes

changes include:

- Setting the LIBHDFS_OPTS variable to -Xmx2048m to limit the amount of
virtual memory used when reading Hive tables to 2 GB.

- Making update statistics use bulk load by default

- Changing bulk load to return the number of rows loaded

- fix for bug 1359872 where create index hangs

Change-Id: Ic8e36cfef43ed2ce7c2c2469c1b9c315a761ee31

… 11 more files in changeset.
Removing checks on indexes and constraints for upsert using load

Upsert using load can handle indexes and constraints, so there is no need to
produce an error when the target table has indexes or constraints.

bug 1359316

Change-Id: I8ed03512fc59e3670fd3e962e6be572f811466cd

… 2 more files in changeset.
Compression support with bulk load

Change-Id: Ia492c0f5801d51e6d8c3cd7faa98b9ffcd2314ca

… 2 more files in changeset.
Trafodion bulk load changes

The changes include:

-A way to specify the maximum size of the HFiles, beyond which a file will
be split.
-Adding the "upsert using load …" statement to run under the load utility
so that it can take advantage of disabling and populating indexes and so on.
The syntax is: load with upsert using load into <trafodion table> select ...
from <table>. "Upsert using load" can still be used separately from the load
utility (see the sketch after this list).

-Checks in the compiler to make sure indexes and constraints are disabled

before running the "upsert using load" statement

-Moving seabase tests 015 and 017 to the hive suite as they use hive tables.
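
The sketch referenced above, with hypothetical table names:

  load with upsert using load into trafodion.sch.t1
    select * from hive.hive.hsrc;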

Change-Id: I80303e4471d2179718e050c98d954ef56cd4cc4f

… 26 more files in changeset.