ExHbaseIUD.cpp

Reworked fix for 1452424 VSBB scan cause query to return wrong result

The assumption that rowID should be exactly equal to the calculated

key length is not correct. The Trafodion SQL engine uses an extra null byte

to ensure that the row ID is not found in case of data conversion errors.

Hence the direct ROWID buffer format is changed to accommodate this.

Now the format is

numRowIds + rowIDSuffix + rowId + rowIDSuffix + rowId + …

If rowIDSuffix is '0', the subsequent rowID is of length = passed rowID len.

If rowIDSuffix is '1', the subsequent rowID is of length = passed rowID len + 1.
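The buffer layout described above can be sketched in C++ as follows. This is only an illustration, not the code from the changeset: the helper names `packRowIds` and `unpackRowIds`, the 2-byte big-endian encoding of numRowIds, and the use of the ASCII characters '0' and '1' for rowIDSuffix are assumptions made for the example.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: builds numRowIds + (rowIDSuffix + rowId)*.
// ASSUMPTION: numRowIds is written as 2 bytes, big-endian.
std::string packRowIds(const std::vector<std::string>& rowIds, std::size_t keyLen) {
    if (rowIds.size() > 0xFFFF) throw std::length_error("too many row IDs");
    std::string buf;
    buf.push_back(static_cast<char>((rowIds.size() >> 8) & 0xFF));
    buf.push_back(static_cast<char>(rowIds.size() & 0xFF));
    for (const std::string& id : rowIds) {
        if (id.size() == keyLen) {
            buf.push_back('0');          // row ID has exactly the passed length
        } else if (id.size() == keyLen + 1) {
            buf.push_back('1');          // row ID carries the extra null byte
        } else {
            throw std::invalid_argument("row ID length must be keyLen or keyLen + 1");
        }
        buf.append(id);
    }
    return buf;
}

// Hypothetical helper: the reader uses the suffix to decide how many bytes
// to consume, so one longer row ID no longer shifts the ones after it.
std::vector<std::string> unpackRowIds(const std::string& buf, std::size_t keyLen) {
    if (buf.size() < 2) throw std::invalid_argument("buffer too short");
    std::size_t count = (static_cast<std::size_t>(static_cast<unsigned char>(buf[0])) << 8) |
                        static_cast<std::size_t>(static_cast<unsigned char>(buf[1]));
    std::size_t pos = 2;
    std::vector<std::string> rowIds;
    rowIds.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        if (pos >= buf.size()) throw std::invalid_argument("truncated buffer");
        char suffix = buf[pos++];
        if (suffix != '0' && suffix != '1') throw std::invalid_argument("bad rowIDSuffix");
        std::size_t len = keyLen + (suffix == '1' ? 1 : 0);
        if (buf.size() - pos < len) throw std::invalid_argument("truncated row ID");
        rowIds.emplace_back(buf, pos, len);  // copies len bytes, nulls included
        pos += len;
    }
    return rowIds;
}
```

Because every row ID is preceded by its own suffix, a row ID that carries the extra byte affects only its own length and not the parsing of later entries.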

Change-Id: I07a283895f6f9c652b3f933bcf0330b69ee2d300

  1. … 4 more files in changeset.
Additional fix for 1452424 VSBB scan cause query to return wrong result

VSBB scan was returning wrong results randomly even for the same query.

An extra null byte was added in the row id at times while setting up the

unique key. This caused a shift in row id parsing at java layer leading

to row not found for all row ids after this extra byte.

In addition, VSBB select was not getting GET_EOD or GET_NOMORE queue

entries at the end of the query. Hence, the rowset was never getting

closed causing a resource leak.

Also, retained the other changes that helped in debugging this issue.

Change-Id: I6a807bddc8edaff2f4140931d4f228e94badcc05

  1. … 4 more files in changeset.
Move core into subdir to combine repos

  1. … 10768 more files in changeset.
Move core into subdir to combine repos

  1. … 10622 more files in changeset.
Move core into subdir to combine repos

Use: git log --follow -- <file>

to view file history thru renames.

  1. … 10837 more files in changeset.
Changes in Patchset2

Fixed issues found during review.

Most of the changes are related to disabling this change for unique indexes.

When unique indexes are found, they alone are disabled during the load.

Other indexes are online and are handled as described below. Once the base

table and regular indexes have been loaded, unique indexes are loaded from

scratch using a new command "populate all unique indexes on <tab-name>".

A similar command "alter table <tab-name> disable all unique indexes"

is used to disable all unique indexes on a table at the start of load.

Cqd change setting allow_incompatible_assignment is unrelated and fixes an

issue related to loading timestamp types from hive.

Odb change gets rid of minor warnings.

Thanks to all three reviewers for their helpful comments.

-----------------------------------

Adding support for incremental index maintenance during bulk load.

Previously, when bulk loading into a table with indexes, the indexes were first

disabled, the base table was loaded and then the indexes were populated from

scratch one by one. This could take a long time when the table had significant

data prior to the load.

Using a design by Hans, this change allows indexes to be loaded in the same

query tree as the base table. The query tree looks like this:

                 Root
                  |
              NestedJoin
              /        \
          Sort          Traf_load_prep (into index1)
           |
        Exchange
           |
       NestedJoin
       /        \
   Sort          Traf_load_prep (i.e. bulk insert) (into base table)
    |
 Exchange
    |
 Hive scan

This design and change set allows multiple indexes to be on the same tree.

Only one index is shown here for simplicity. LOAD CLEANUP and LOAD COMPLETE

statements also now perform these tasks for the base table along with all

enabled indexes.

This change is enabled by default. If a table has indexes it will be

incrementally maintained during bulk load.

The WITH NO POPULATE INDEX option has been removed.

A new option WITH REBUILD INDEXES has been added. With this option we get

the old behaviour of disabling all indexes before the load into the table and

then populating all of them from scratch.

Change-Id: Ib5491649e753b81e573d96dfe438c2cf8481ceca

  1. … 35 more files in changeset.
Fix for 1452424 vsbb scan/delete cause query to return wrong result

VSBB update/delete were not tracking the number of rows in the

buffer. This has been corrected.

Change-Id: I2a89ccd9a84832c4771481de2ee8503e912ce0d8

  1. … 2 more files in changeset.
Enabling Bulk load and Hive Scan error logging/skip feature

Also fixed the hanging issue with Hive scan (ExHdfsScan operator) when there

is an error in data conversion.

ExHbaseAccessBulkLoadPrepSQTcb was not releasing all the resources when there

is an error or when the last buffer had some rows.

Error logging/skip feature can be enabled in

hive scan using CQDs and in bulk load using the command line options.

For Hive Scan

CQD TRAF_LOAD_CONTINUE_ON_ERROR 'ON' to skip errors

CQD TRAF_LOAD_LOG_ERROR_ROWS 'ON' to log the error rows in hdfs files.

For Bulk load

LOAD WITH CONTINUE ON ERROR [TO <location>] – to skip error rows

LOAD WITH LOG ERROR ROWS – to log the error rows in hdfs files.

The default parent error logging directory in hdfs is /bulkload/logs. The error

rows are logged in subdirectory ERR_<date>_<time>. A separate hdfs file is

created for every process/operator involved in the bulk load in this directory.

Error rows in hive scan are logged in

<sourceHiveTableName>_hive_scan_err_<inst_id>

Error rows in bulk upsert are logged in

<destTrafTableName>_traf_upsert_err_<inst_id>

Bulk load can also be aborted after a certain number of error rows are seen using the

LOAD WITH LOG ERROR ROWS, STOP AFTER <n> ERROR ROWS option
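The error file naming described above can be illustrated with a small C++ helper. This is a sketch for orientation only: the function name `errorLogFile` is hypothetical, and the exact text used for `<date>`, `<time>` and `<inst_id>` is not specified in this message, so they are passed in as opaque strings.

```cpp
#include <string>

// Hypothetical helper: composes
//   <parentDir>/ERR_<date>_<time>/<tableName>_<kind>_err_<instId>
// where kind is "hive_scan" or "traf_upsert" as listed in the message above.
// ASSUMPTION: date, time and instId are already formatted by the caller.
std::string errorLogFile(const std::string& parentDir,
                         const std::string& date,
                         const std::string& time,
                         const std::string& tableName,
                         const std::string& kind,
                         const std::string& instId) {
    return parentDir + "/ERR_" + date + "_" + time + "/" +
           tableName + "_" + kind + "_err_" + instId;
}
```

With the default parent directory `/bulkload/logs`, each process or operator involved in the load would get its own file under the same `ERR_<date>_<time>` subdirectory.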

Change-Id: Ief44ebb9ff74b0cef2587705158094165fca07d3

  1. … 33 more files in changeset.
Changes to enable Rowset select - Fix for bug 1423327

HBase always returns an empty result set when the row is not found. Trafodion

is changed to exploit this concept to project no data in a rowset select.

The optimizer can now choose a plan involving Rowset Select

wherever possible. This can result in plan changes for queries:

nested join plan instead of hash join,

vsbb delete instead of delete,

vsbb insert instead of regular insert.

A new CQD HBASE_ROWSET_VSBB_SIZE is now added to control the hbase rowset size.

The default value is 1000.
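As a rough illustration of what a rowset size limit means, the following C++ sketch splits a list of row IDs into rowsets of at most a given size. The function name `splitIntoRowsets` is hypothetical and this is not the executor's code; only the default of 1000 is taken from the message above.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: groups row IDs into rowsets of at most rowsetSize
// entries, so that each rowset can be sent in one request.
std::vector<std::vector<std::string>> splitIntoRowsets(
        const std::vector<std::string>& rowIds,
        std::size_t rowsetSize = 1000) {
    std::vector<std::vector<std::string>> rowsets;
    if (rowsetSize == 0) return rowsets;  // nothing sensible to build
    for (std::size_t start = 0; start < rowIds.size(); start += rowsetSize) {
        std::size_t end = std::min(start + rowsetSize, rowIds.size());
        rowsets.emplace_back(rowIds.begin() + static_cast<std::ptrdiff_t>(start),
                             rowIds.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return rowsets;
}
```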

Change-Id: Id76c2e6abe01f2d1a7b6387f917825cac2004081

  1. … 19 more files in changeset.
Eliminate manual steps in load/ustat integration

The fix achieves full integration of the bulk load utility with

Update Statistics. The Hive backing sample table is now created

automatically (formerly, we only wrote the HDFS files to be

used by the Hive external table), the correct sampling percentage

for the sample table is calculated, and the ustat command is

launched from the executor as one of the steps in execution of

the bulk load utility.

Change-Id: I9d5600c65f0752cbc7386b1c78cd10a091903015

Closes-Bug: #1436939

  1. … 26 more files in changeset.
New ustat algorithm and bulk load integration

This is the initial phase of the Update Statistics change to use

counting Bloom filters and a Hive backing sample table created

during bulk load and amended whenever a new HFile for a table

is created. All changes are currently disabled pending some

needed fixes and testing.

blueprint ustat-bulk-load

Change-Id: I32af5ce110b0f6359daa5d49a3b787ab518295fa

  1. … 16 more files in changeset.
Adding more run-time memory allocations from NAHeap

This set of changes moves some of the string vector variables in HBase

access operators from the standard string template to our NAList and

NAString (or HbaseStr for row IDs). In the process, allocations of the

objects will be from our NAHeap instead of the system heap. This would

help us track memory usage and detect leaks more easily.

In addition, a change in ExHbaseAccessTcb::setupListOfColNames()

prevents unnecessary allocations to populate the columns list unless it

is empty. The Google profiling tools helped us identify this

problem.

Also, removed the ExHbaseAccessDeleteTcb operator, which was not used.

Change-Id: I87ab674ab8e3d291f2fc9563718d88de537ae96b

  1. … 10 more files in changeset.
Manageability changes - event mgmt and stats publication

Implements changes to support event management using log4cpp.

Configuration files are located in the $MY_SQROOT/conf folder and all log

files are located in the $MY_SQROOT/logs folder.

For more information see the blueprint at:

https://blueprints.launchpad.net/trafodion/+spec/eventmanagement

Implements changes for publication of statistics to repository. For more

information see the blueprint at:

https://blueprints.launchpad.net/trafodion/+spec/repositorymetrics

Note:

In this initial delivery, publication of statistics is disabled by

default and can be enabled via a DCS property. This code has been

reviewed internally prior to merging with mainline.

Documentation:

https://wiki.trafodion.org/wiki/index.php/Trafodion_Manageability

Included timestamp to be part of the primarykey for metric aggregation

table

Addressed some of the comments and incorporated Anoop's change for

repository

Changed the queryBuf size in sql/sqlcomp/CmpSeabaseDDLrepos.cpp to 20000

Modified the sql/regress/seabase/EXPECTED024

Change-Id: I517575233c10b2a8683cdd1d53a2eec96d7c2a6f

  1. … 781 more files in changeset.
SQL syntax to cancel executing query, phase 3

This change fixes some problems with subset DELETE and

UPDATE statements which prevented them from responding

to CANCEL. It addresses an identical potential issue in

SELECT statements with predicates that reject large

numbers of rows.

The change also allows an envvar, SQL_NO_REGISTER_CANCEL,

which if set to 1, prevents queries from registering with

the cancel broker. It can be used to debug performance

regressions.
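The environment variable check described above could look roughly like this in C++. The function name `shouldRegisterWithCancelBroker` is hypothetical; only the variable name SQL_NO_REGISTER_CANCEL and the value 1 come from the message.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper: a query registers with the cancel broker unless
// SQL_NO_REGISTER_CANCEL is set to exactly "1".
bool shouldRegisterWithCancelBroker() {
    const char* value = std::getenv("SQL_NO_REGISTER_CANCEL");
    bool optedOut = (value != nullptr) && (std::strcmp(value, "1") == 0);
    return !optedOut;
}
```

Treating an unset variable, or any value other than "1", as "register as usual" keeps the default behaviour unchanged.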

The change also adds test cases to the regression test

for UPDATE, DELETE, INSERT and UPSERT WITH LOAD.

Change-Id: I86977c3985db4f56f2d4a0e89051970cec2c9411

Implements: blueprint sql-query-cancel

  1. … 6 more files in changeset.
Native external hbase table access (select, IUD) changes.

-- IUD on external hbase tables is now enabled by default

-- predicates on native hbase tables can now be pushed down to

hbase region server

-- traf varchar col maxlength is now 200K by default,

can be changed by cqd max_character_col_size

-- executor handles column values length greater than 32K during

move to/from JNI

-- error is correctly returned if data retrieved from hbase exceeds expected

max row length

-- hbase column_create function now takes an expression/param as its

column name operand

Change-Id: Ieb3fcabfebaa22008eff2a049fc1e2000e68861e

  1. … 46 more files in changeset.
Various LP fixes, bugs and code cleanup.

-- removed obsolete code (label create/alter/delete, get disk/label/buffer stats,

dp2 scan)

-- metadata structs are now created as classes and initialized during

creation. LP 1394649

-- warnings are now being returned from compiler to executor after DDL operations.

-- duplicate constraint names now return error.

-- handle NOT ENFORCED constraints: give warning during creation and not enforce

during use. LP 1361784

-- drop all indexes (enabled and disabled indexes) on a table during drop table

and schema now works. LP 1384380

-- drop constraint on disabled index succeeds. LP 1384479

-- string truncation error is now returned if default value doesn't fit in

column. LP 1394780

-- fixed issue where a failure during multiple constraints creation in a create

stmt was not cleaning up metadata. LP 1389871

-- update where current of is now supported. LP 1324679

Change-Id: Iec1b0b4fc6a8161a33b7f69228c0f1e3f441f330

  1. … 54 more files in changeset.
Enabling runtime stats for hbase tables and operators

This is the third set of changes to collect the runtime stats info. Part

of it addresses the comments and suggestions from the last review.

1) Instead of passing the hbase access stats entry to every htable

call, set the pointer in the EXP hbase interface layer with the first init

call in the tcb work methods (not the task work methods), then

eventually to the htable JNI layer from getHTableClient()

(sql/exp/ExpHbaseInterface.cpp).

2) Rewrite the construction of the hbase operator names into one

method and use it to display both tdb contents and tcb stats.

3) Populate the hbase I/O bytes counter for both read and insert

(sql/executor/HBaseClient_JNI.cpp).

4) Fix the problem where parsing the stats variable text string could go

beyond the end of the string (getSubstrInfo() in

sql/executor/ExExeUtilGetStats.cpp).

Change-Id: I62618b57894039bc1ca5bc0f3c9b89efec5cc42e

  1. … 15 more files in changeset.
Identity column and sequence numbers support.

Added support for IDENTITY columns.

Finished sequence numbers functionality.

Bug fixes and perf enhancements in those areas.

This code has been pre-reviewed by Joanie C.

Change-Id: I0445bc9765b60becb9adf8c053c05344395aecaa

  1. … 94 more files in changeset.
Initial changes for ORC file support.

Access to ORC (optimized row columnar) format tables is not enabled by

default yet. This checkin contains the initial infrastructure changes for

that support.

Change-Id: I683c1b63c502dd4d2c736181952cb40f9f299cfd

  1. … 53 more files in changeset.
Enabling runtime stats for hbase tables and operators

This is the second set of changes to collect the runtime stats info for

hbase tables and operators. It contains:

1) Stats for hbase IUD operations

2) Moved incActualRowsReturned() call to

ExHbaseAccessTcb::moveRowToUpQueue()

3) Added Hbase call counter

4) Display full hbase operator names instead of generic

"EX_HBASE_ACCESS" for hbase operator runtime stats

Change-Id: I94d727c897876a429b588f9acb3fec465dd56fe5

  1. … 11 more files in changeset.
Changes to support OSS poc.

This checkin contains multiple changes that were added to support OSS poc.

These changes are enabled through a special cqd mode_special_4 and not

yet externalized for general use.

A separate spec contains details of these changes.

These changes have been contributed and pre-reviewed by Suresh, Jim C,

Ravisha, Mike H, Selva and Khaled.

All dev regressions have been run and passed.

Change-Id: I2281c1b4ce7e7e6a251bbea3bf6dc391168f3ca3

  1. … 143 more files in changeset.
Bulk Load fixes

- fix for bug 1383849: releasing the bulk load objects once load is done.

- bulk load now uses CIF by default. This does not apply to populating

indexes using bulk load.

- fix for hive/test015 so it does not fail on the test machines

Change-Id: Iaafe8de8eb60352b0d4c644e9da0d84a4068688c

  1. … 13 more files in changeset.
Native Hbase access improvements

Native hbase access via Trafodion SQL engine now utilizes the enhancements

made for Trafodion tables, such as pushing multiple rows to JNI and pre-fetch.

Also removed the dependency on the ResultIterator for the native hbase access.

This reduces yet another object created in HTableClient and improves java

memory usage on the client side.

Cleaned up state changes in the HBase access operators and removed

redundant code in the JNI layer.

Change-Id: I70ab52917aac64b68b3816b8ad834842a4d8745e

  1. … 8 more files in changeset.
Fix for bug 1360493 - Use correct row ID length

Closes-Bug: 1360493

Change to use the key length value, if explicitly set, as hbase table row ID length.

Also, add the tests that uncover this bug to the regression test set.

Change-Id: I29c47348ef3242d557c42f5ce4e92f7b7fe6b2b8

  1. … 3 more files in changeset.
Pre-fetch cells from Hbase

Pre-fetch is enabled via a parameter in HTableClient.startScan method. Pre-fetch

is not done for unique and batch Trafodion operations and all native

Hbase table access. Pre-fetch is currently disabled for non-unique UMD

Trafodion operations.

startScan method invokes pre-fetch to Hbase in a different thread. When the

fetchRows method is called, pre-fetch completes, passes cell info to JNI and

invokes pre-fetch if there are more rows to be fetched.
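The pre-fetch pattern described above (start a fetch in another thread, then on each `fetchRows` call collect the finished batch and immediately start the next one) can be sketched in C++ with `std::async`. This is only an analogy: the actual change lives in the Java HTableClient, and the class name `PrefetchingScanner`, the `Batch` alias and the `fetchFn` callback are all hypothetical.

```cpp
#include <functional>
#include <future>
#include <string>
#include <utility>
#include <vector>

using Batch = std::vector<std::string>;

// Hypothetical sketch of a scanner that always keeps one fetch in flight.
class PrefetchingScanner {
public:
    // fetchFn stands in for the remote read; an empty batch means end of data.
    explicit PrefetchingScanner(std::function<Batch()> fetchFn)
        : fetchFn_(std::move(fetchFn)) {}

    // Kick off the first fetch in a separate thread.
    void startScan() {
        pending_ = std::async(std::launch::async, fetchFn_);
    }

    // Wait for the in-flight fetch, start the next one if more rows may
    // exist, and hand the completed batch to the caller.
    Batch fetchRows() {
        if (!pending_.valid()) return Batch();
        Batch batch = pending_.get();
        if (!batch.empty()) {
            pending_ = std::async(std::launch::async, fetchFn_);
        }
        return batch;
    }

private:
    std::function<Batch()> fetchFn_;
    std::future<Batch> pending_;
};
```

While the caller processes one batch, the next one is already being fetched, which is where the overlap in response time would come from. Only one fetch is ever outstanding, so `fetchFn` does not need to be thread-safe against itself.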

We have observed around 45% reduction in response time to fetch 12 million

rows of a sixteen-partition table in a node via a single process.

Change-Id: I3c81e182663fddd08a2fc873a39302b179850c92

  1. … 6 more files in changeset.
Cleanup scan related functions

Incorporated the comments from change id 331

Changed ExpHbaseInterface_JNI::fetchAllRows method to use the new scan methods

Removed the scan related methods that are no longer used

Triggered the cleanup of Java objects at the time of releaseHTableClient.

Increased the default maximum java heap size to 1024MB from 512MB.

Change-Id: I851bcfa266504f609fdbcba6f2a5e9e6dd2937d3

  1. … 9 more files in changeset.
Reducing the path length for scan, single row get, batch get operations

Earlier, the Key-Values of each row were buffered as a ByteBuffer and shipped to the JNI side. There were 2 C->java transitions to get the Hbase row into tuple format.

Now, we avoid this copy on the Java side and ship the length and offsets of

the Key-Value for all rows in a batch along with reference to the

Key-Value buffer. The data is directly moved from Hbase Buffers to

row format buffer. There are only 2 C->java transitions per batch of rows.

This has shown 3-4 times reduction in path length in Trafodion SQL processes

while scanning a table with 12 million rows.
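The idea of shipping offsets and lengths plus a reference to the Key-Value buffer, instead of copying each value, can be sketched in C++ as below. The types `CellRef` and `Slice` and the function `sliceCells` are hypothetical names for illustration; the real code moves data between Java and C++ through JNI, which this sketch does not model.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical descriptor: where one value lives inside the shared buffer.
struct CellRef {
    std::size_t offset;
    std::size_t length;
};

// Hypothetical non-owning view of one value; no bytes are copied.
struct Slice {
    const char* data;
    std::size_t size;
};

// Resolve every (offset, length) pair against the shared buffer, with bounds
// checks, so the caller can move each value directly into its row buffer.
std::vector<Slice> sliceCells(const char* kvBuffer, std::size_t kvSize,
                              const std::vector<CellRef>& cells) {
    std::vector<Slice> slices;
    slices.reserve(cells.size());
    for (const CellRef& cell : cells) {
        if (cell.offset > kvSize || cell.length > kvSize - cell.offset) {
            throw std::out_of_range("cell lies outside the key-value buffer");
        }
        slices.push_back(Slice{kvBuffer + cell.offset, cell.length});
    }
    return slices;
}
```

The slices are only valid while the underlying buffer is alive, which is the usual trade-off of avoiding the intermediate copy.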

Trafodion SQL processes dump core when ex_assert function is called or when it

exits with non-zero value. These processes will also dump core when an

environment variable ABORT_ON_ERROR is set in release mode.

Fix for seabase/TEST010 failure with the patchSet 1

Rowset Select was sometimes dumping core because a wrong tuple was used to calculate the row length.

Change-Id: I0ec4669b54971b6a0c699c0fa0662c85f68bd25d

  1. … 13 more files in changeset.
Merge "fix for LP 1359906"

fix for LP 1359906

Change-Id: I3a8e40b25d3ce3a9261cfd999c116d1b6538d84d

  1. … 1 more file in changeset.
Bulk load and other changes

changes include:

- Setting the LIBHDFS_OPTS variable to Xmx2048m to limit the

amount of virtual memory used when reading hive tables to 2GB.

- Making update statistics use bulk load by default

- Changing bulk load to return the number of rows loaded

- fix for bug 1359872 where create index hangs

Change-Id: Ic8e36cfef43ed2ce7c2c2469c1b9c315a761ee31

  1. … 11 more files in changeset.