Clone
Selvaganesan <selva.ganesan@hp.com>
committed
on 14 May 15
Enabling Bulk load and Hive Scan error logging/skip feature
Also Fixed the hanging issue with Hive scan (ExHdfsScan operator) when there
is… Show more
Enabling Bulk load and Hive Scan error logging/skip  feature

Also Fixed the hanging issue with Hive scan (ExHdfsScan operator) when there

is an error in data conversion.

ExHbaseAccessBulkLoadPrepSQTcb was not releasing all the resources when there

is an error or when the last buffer had some rows.

Error logging/skip feature can be enabled in

hive scan using CQDs and in bulk load using the command line options.

For Hive Scan

CQD TRAF_LOAD_CONTINUE_ON_ERROR ‘ON’ to skip errors

CQD TRAF_LOAD_LOG_ERROR_ROWS ‘ON’ to log the error rows in Hdfs files.

For Bulk load

LOAD WITH CONTINUE ON ERROR [TO <location>] – to skip error rows

LOAD WITH LOG ERROR ROWS – to log the error rows in hdfs files.

The default parent error logging directory in hdfs is /bulkload/logs. The error

rows are logged in subdirectory ERR_<date>_<time>. A separate hdfs file is

created for every process/operator involved in the bulk load in this directory.

Error rows in hive scan are logged in

   <sourceHiveTableName>_hive_scan_err_<inst_id>

Error rows in bulk upsert are logged in

   <destTrafTableName>_traf_upsert_err_<inst_id>

Bulk load can also aborted after a certain number of error rows are seen using

LOAD WITH LOG ERROR ROWS, STOP AFTER <n> ERROR ROWS option

Change-Id: Ief44ebb9ff74b0cef2587705158094165fca07d3

Show less

default + 8 more