GenRelExeUtil.cpp

Fixes from review to sqvers

commit 04f3812f112a5629a563f02d7e72c5fa503c6a8d

Author: Sandhya Sundaresan <sandhya.sundaresan@hp.com>

Date: Sun Jun 14 04:23:21 2015 +0000

Preliminary checkin of lob support for external files. Inserts from http

files, hdfs files and lob local files are supported. Added support for

new extract syntax. Extract from lob

columns to hdfs files has been added. More work is needed to support

binary files and very large files. The current limit is 1G.

Also fixed some error handling issues.

Fixed some substring warning issues in the lobtostring/stringtolob

functions.

Added references and interfaces to curl library that is needed to read external http

files.

More work needed before this support can be used

Change-Id: Ieacaa3e4b7fa2a040764888c90ef0b029f107a8b

Change-Id: Ife3caf13041f8106a999d06808b69e5b6a348a6b

  1. … 29 more files in changeset.
Move core into subdir to combine repos

  1. … 10768 more files in changeset.
Move core into subdir to combine repos

  1. … 10622 more files in changeset.
Move core into subdir to combine repos

Use: git log --follow -- <file>

to view file history through renames.

  1. … 10837 more files in changeset.
Rework for incremental IM during bulk load

Address comments by Hans and fix 1 regression failure

A regression failure in executor/test013 was caused by how external

names are used with volatile indexes. This has been fixed in GenRelExeUtil.cpp.

The parser change suggested could not be made due to increasing conflicts.

Thank you for the feedback.

Change-Id: Icdf5dbbf90673d44d5d0ccb58086266520fcf5c3

  1. … 5 more files in changeset.
Changes in Patchset2

Fixed issues found during review.

Most of the changes are related to disabling this change for unique indexes.

When unique indexes are found, they alone are disabled during the load.

Other indexes are online and are handled as described below. Once the base

table and regular indexes have been loaded, unique indexes are loaded from

scratch using a new command "populate all unique indexes on <tab-name>".

A similar command "alter table <tab-name> disable all unique indexes"

is used to disable all unique indexes on a table at the start of load.
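The ordering described above can be sketched as a small helper that lists the statements in sequence. This is only an illustration: the two quoted commands come from the commit message, while the function name, its parameters, and the `<load statement>` placeholder are hypothetical.

```python
def bulk_load_statements(table, load_stmt, has_unique_indexes):
    """Return the ordered statements for a load, following the commit text.

    Only the two index commands are taken from the commit message; the
    function itself and the load_stmt placeholder are illustrative.
    """
    statements = []
    if has_unique_indexes:
        # Unique indexes are disabled at the start of the load.
        statements.append(f"alter table {table} disable all unique indexes")
    # The base table and the regular (non-unique) indexes are loaded together.
    statements.append(load_stmt)
    if has_unique_indexes:
        # Unique indexes are rebuilt from scratch once the load has finished.
        statements.append(f"populate all unique indexes on {table}")
    return statements


plan = bulk_load_statements("t1", "<load statement>", True)
for stmt in plan:
    print(stmt)
```

Tables without unique indexes get a plan containing only the load statement itself.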

Cqd change setting allow_incompatible_assignment is unrelated and fixes an

issue related to loading timestamp types from hive.

Odb change gets rid of minor warnings.

Thanks to all three reviewers for their helpful comments.

-----------------------------------

Adding support for incremental index maintenance during bulk load.

Previously, when bulk loading into a table with indexes, the indexes were first

disabled, the base table was loaded, and then the indexes were populated from

scratch one by one. This could take a long time when the table had significant

data prior to the load.

Using a design by Hans, this change allows indexes to be loaded in the same

query tree as the base table. The query tree looks like this:

              Root
               |
           NestedJoin
            /      \
        Sort        Traf_load_prep (into index1)
          |
      Exchange
          |
      NestedJoin
       /      \
   Sort        Traf_load_prep (i.e. bulk insert) (into base table)
     |
 Exchange
     |
 Hive scan

This design and change set allows multiple indexes to be on the same tree.

Only one index is shown here for simplicity. LOAD CLEANUP and LOAD COMPLETE

statements also now perform these tasks for the base table along with all

enabled indexes.
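To make the shape of the tree with several indexes concrete, here is a minimal sketch that stacks one NestedJoin per load target. The `Node` class and the `load_tree` and `render` helpers are hypothetical and are not Trafodion's operator classes; only the operator names come from the diagram.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)


def load_tree(targets):
    """Build the nested-join chain: one NestedJoin per load target.

    `targets` lists the base table first, then each enabled index.
    """
    source = Node("Hive scan")
    for target in targets:
        # Left child feeds rows; right child writes them to the target.
        left = Node("Sort", [Node("Exchange", [source])])
        right = Node(f"Traf_load_prep (into {target})")
        source = Node("NestedJoin", [left, right])
    return Node("Root", [source])


def render(node, depth=0):
    """Return the tree as indented lines, parents before children."""
    lines = ["  " * depth + node.name]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


tree = load_tree(["base table", "index1", "index2"])
print("\n".join(render(tree)))
```

Each additional index adds one more NestedJoin level above the previous one, so the base-table load always sits at the bottom of the chain.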

This change is enabled by default. If a table has indexes, they will be

incrementally maintained during bulk load.

The WITH NO POPULATE INDEX option has been removed.

A new option WITH REBUILD INDEXES has been added. With this option we get

the old behaviour of disabling all indexes before loading into the table and

then populating all of them from scratch.

Change-Id: Ib5491649e753b81e573d96dfe438c2cf8481ceca

  1. … 35 more files in changeset.
Merge "Enabling Bulk load and Hive Scan error logging/skip feature"

  1. … 6 more files in changeset.
various lp and other fixes, details below.

-- added support for self referencing constraints

-- limit clause can now be specified as a param

(select * from t limit ?)

-- lp 1448261. alter table add identity col is not allowed and now

returns an error

-- error is returned if a specified constraint in an alter/create statement

exists on any table

-- lp 1447343. cannot have more than one identity column.

-- embedded compiler is now used to get priv info during invoke/showddl.

-- auth info is not reread if already initialized

-- sequence value function is now cacheable

-- lp 1448257. inserts in volatile table with identity column now work

-- lp 1447346. inserts with identity col default now work if inserted

in a salted table.

-- only one compiler is now needed to process ddl operations with or

without authorization enabled

-- query cache in embedded compiler is now cleared if user id changes

-- pre-created default schema 'SEABASE' can no longer be dropped

-- default schema 'SCH' is automatically created if running regressions

and it doesn't exist.

-- improvements in regressions run.

-- regression runs no longer call a script from another sqlci session

to init auth, create default schema

and insert into defaults table before every regr script

-- switched the order of regression runs

-- updates from review comments.

Change-Id: Ifb96d9c45b7ef60c67aedbeefd40889fb902a131

  1. … 69 more files in changeset.
Enabling Bulk load and Hive Scan error logging/skip feature

Also fixed the hanging issue with Hive scan (ExHdfsScan operator) when there

is an error in data conversion.

ExHbaseAccessBulkLoadPrepSQTcb was not releasing all the resources when there

is an error or when the last buffer had some rows.

Error logging/skip feature can be enabled in

hive scan using CQDs and in bulk load using the command line options.

For Hive Scan

CQD TRAF_LOAD_CONTINUE_ON_ERROR 'ON' to skip errors

CQD TRAF_LOAD_LOG_ERROR_ROWS 'ON' to log the error rows in hdfs files.

For Bulk load

LOAD WITH CONTINUE ON ERROR [TO <location>] -- to skip error rows

LOAD WITH LOG ERROR ROWS -- to log the error rows in hdfs files.

The default parent error logging directory in hdfs is /bulkload/logs. The error

rows are logged in subdirectory ERR_<date>_<time>. A separate hdfs file is

created for every process/operator involved in the bulk load in this directory.

Error rows in hive scan are logged in

<sourceHiveTableName>_hive_scan_err_<inst_id>

Error rows in bulk upsert are logged in

<destTrafTableName>_traf_upsert_err_<inst_id>

Bulk load can also be aborted after a certain number of error rows are seen using

the LOAD WITH LOG ERROR ROWS, STOP AFTER <n> ERROR ROWS option.
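The naming scheme and the skip/stop behaviour can be illustrated with a short sketch. The file and directory patterns follow the commit text; the digit layout of `<date>_<time>`, the helper names, and the choice to abort as soon as the n-th error row is seen are assumptions made for the example.

```python
from datetime import datetime


def error_log_dir(now, parent="/bulkload/logs"):
    # The commit names the pattern ERR_<date>_<time>; the digit layout
    # used here (YYYYMMDD_HHMMSS) is an assumption.
    return f"{parent}/ERR_{now:%Y%m%d}_{now:%H%M%S}"


def hive_scan_log(source_table, inst_id):
    return f"{source_table}_hive_scan_err_{inst_id}"


def upsert_log(dest_table, inst_id):
    return f"{dest_table}_traf_upsert_err_{inst_id}"


def load_rows(rows, convert, continue_on_error=True, stop_after=None):
    """Convert rows, skipping bad ones; abort once stop_after errors are seen."""
    loaded, errors = [], []
    for row in rows:
        try:
            loaded.append(convert(row))
        except ValueError:
            if not continue_on_error:
                raise
            errors.append(row)  # these rows would go to the error log file
            if stop_after is not None and len(errors) >= stop_after:
                raise RuntimeError(f"aborting load after {len(errors)} error rows")
    return loaded, errors


loaded, errors = load_rows(["1", "x", "3"], int)
print(loaded, errors)
print(error_log_dir(datetime(2015, 4, 1, 12, 30, 0)))
```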

Change-Id: Ief44ebb9ff74b0cef2587705158094165fca07d3

  1. … 33 more files in changeset.
LOAD and UNLOAD privilege check fixes

1437078 - LOAD fails with error 4481 even if user has priv

This problem happens because the table definition cached in NATableCache is

not being refreshed with the new values:

Generally, when a query is compiled and the user does not have privilege(s), a

call to checkPrivileges (called during binding) returns a special privilege

error. After compilation completes, the compiler (CmpMain::sqlcomp) checks to

see if a privilege error occurred. If so, the NATable entry is removed and the

request is recompiled. If a privilege error occurs the second time, the

privilege error is reported and the latest cached NATable structure is retained.

In the case of LOAD, the privilege checks are performed in the generator;

therefore checkPrivileges is not being called, the special privilege error is

not reported and the cached NATable entry is not being refreshed.

The fix moves authorization checks from the generator into the binder -

specifically checkPrivileges. A bindNode method was added to the bulk loader

code to verify privileges. The bindNode method checks to see if the user has the

MANAGE_LOAD privilege. If so, no additional checks are required. If not,

bindNode sets up the privilege structure (stoi) and saves it in the binder work

area. Later, checkPrivileges is called and privileges checked as required.
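The recompile-on-privilege-error behaviour described above can be sketched as follows. This is a simplified model and not the code in CmpMain::sqlcomp: the dictionary cache, `PrivilegeError`, and the callback parameters are stand-ins invented for the example.

```python
class PrivilegeError(Exception):
    """Stand-in for the special privilege error returned by checkPrivileges."""


def compile_with_retry(table, natable_cache, load_natable, check_privileges):
    """Compile once; on a privilege error, evict the cached entry and retry once."""
    for attempt in range(2):
        if table not in natable_cache:
            natable_cache[table] = load_natable(table)  # refresh from metadata
        try:
            check_privileges(natable_cache[table])
            return natable_cache[table]
        except PrivilegeError:
            if attempt == 1:
                raise  # second failure: report it, keep the refreshed entry
            del natable_cache[table]  # possibly stale entry: drop and recompile


# A cache holding privileges from before a GRANT was issued.
cache = {"T1": {"privs": set()}}
metadata = {"T1": {"privs": {"INSERT"}}}


def check(entry):
    if "INSERT" not in entry["privs"]:
        raise PrivilegeError("4481")


entry = compile_with_retry("T1", cache, lambda t: dict(metadata[t]), check)
print(entry)
```

Because LOAD previously checked privileges in the generator, it never raised the special error and so never reached the eviction step; moving the check into the binder puts it on this path.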

1305015 - User with SELECT and INSERT privs unable to UNLOAD

This problem occurs during the generator phase when privileges are being

checked. When an unload statement is parsed, the parser creates the

ExeUtilHBaseBulkUnload class and sets the table name to DUMMY. When the

privilege checks are later performed, they are run against the DUMMY table, which does not

exist.

The fix moves authorization checks from the generation phase into the binder.

A bindNode method was added to the bulk unload code to verify privileges. The

bindNode code first checks to see if the user has the MANAGE_LOAD privilege.

If so, no additional checks are required. If not, it grabs the query expression

attached to the ExeUtilHBaseBulkUnLoad class and binds it. Binding the query

expression calls checkPrivileges and reports any violations.

This change requires that the query expression created during parsing be stored

in a new class member.
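A rough model of the decision made in the unload bindNode is shown below. It assumes that binding the query expression amounts to checking SELECT on each table the query references; the function name and the set-based privilege representation are hypothetical.

```python
def bind_unload(user_privs, query_tables, table_privs):
    """Return the tables that fail the privilege check; empty if allowed.

    user_privs:   component privileges held by the user
    query_tables: tables referenced by the unload's query expression
    table_privs:  table name -> object privileges the user holds on it
    """
    if "MANAGE_LOAD" in user_privs:
        return []  # no additional checks are required
    # Otherwise bind the stored query expression, which checks the tables
    # it really references (never the parser's placeholder DUMMY table).
    return [t for t in query_tables if "SELECT" not in table_privs.get(t, set())]


print(bind_unload({"MANAGE_LOAD"}, ["T1"], {}))
print(bind_unload(set(), ["T1", "T2"], {"T1": {"SELECT", "INSERT"}}))
```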

Other fixes related to load and unload:

While fixing the above issues, a problem was found when trying to load a table

with indexes if the user had MANAGE_LOAD privilege. A check was added to index

code to allow the operation to proceed.

The load code is not checking privileges on the source table.

1438896 Internal error during create or replace view

Not found errors can be returned, so the error check was changed to look for

STATUS_ERROR only.

Change-Id: I00b08eca6678b9c1a0f84848536de3bc93735853

  1. … 4 more files in changeset.
Eliminate manual steps in load/ustat integration

The fix achieves full integration of the bulk load utility with

Update Statistics. The Hive backing sample table is now created

automatically (formerly, we only wrote the HDFS files to be

used by the Hive external table), the correct sampling percentage

for the sample table is calculated, and the ustat command is

launched from the executor as one of the steps in execution of

the bulk load utility.

Change-Id: I9d5600c65f0752cbc7386b1c78cd10a091903015

Closes-Bug: #1436939

  1. … 26 more files in changeset.
metadata fixes and 'sqlmp' code cleanup

-- NATable struct for metadata was being created multiple

times whenever information for a new table was read

from metadata. That has been fixed.

-- an 'initialize trafodion, drop' followed by 'initialize traf'

from the same session was failing due to priv info not getting

reset. This would show up if 'initialize authorization' was

done earlier. That has been fixed.

-- code cleanup mostly related to sqlmp legacy code and reference.

Change-Id: I346e3f3bbc6c7784b38e7e2e1f11d487854c281c

  1. … 54 more files in changeset.
explain enhancements and fixes.

-- support to return explain details from packed explain plan.

select * from table(explain(null, 'EXPLAIN_PLAN=<packed-plan>'))

This enables caller to retrieve the packed explain plan, ship

it to another process and then format it there.

DSM will be using this functionality.

-- sqlci syntax added to test explain enhancement functionality:

get qid for statement s;

store explain for s in repository;

set qid <qid> for s;

-- new tests added to seabase/TEST011

-- some bug fixes to handle 4 byte lengths for

explain plan greater than short max.

-- changed err enums CLI_DESC_NOT_EXSISTS and

CLI_STMT_NOT_EXSISTS to the right EXISTS spelling

(this is just for you, Dave).

-- added missing copyrights
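The item about 4 byte lengths can be illustrated with a generic length-prefix example. It does not reproduce the layout of the packed explain plan, which the commit message does not describe; it only shows why a signed 2-byte length field fails once the payload grows past short max.

```python
import struct

SHORT_MAX = 32767  # largest value a signed 2-byte length can hold


def pack_plan(plan: bytes) -> bytes:
    """Prefix a payload with a 4-byte unsigned little-endian length."""
    return struct.pack("<I", len(plan)) + plan


def unpack_plan(buf: bytes) -> bytes:
    """Read the 4-byte length prefix and return the payload that follows."""
    (length,) = struct.unpack_from("<I", buf, 0)
    return buf[4:4 + length]


big_plan = b"x" * (SHORT_MAX + 1)

# A signed 2-byte length field cannot represent this size.
try:
    struct.pack("<h", len(big_plan))
    fits_in_short = True
except struct.error:
    fits_in_short = False

print(fits_in_short)
print(len(unpack_plan(pack_plan(big_plan))))
```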

Change-Id: Ic60758fe49790516be125cca7f7e23fe1265feb7

  1. … 32 more files in changeset.
New ustat algorithm and bulk load integration

This is the initial phase of the Update Statistics change to use

counting Bloom filters and a Hive backing sample table created

during bulk load and amended whenever a new HFile for a table

is created. All changes are currently disabled pending some

needed fixes and testing.
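For readers unfamiliar with the data structure, a counting Bloom filter replaces each bit of an ordinary Bloom filter with a counter, so it can give an upper bound on how often a value was seen. The sketch below is a generic textbook version, not the Update Statistics implementation; the sizes and the salted SHA-256 hashing are arbitrary choices.

```python
import hashlib


class CountingBloomFilter:
    def __init__(self, num_counters=1024, num_hashes=3):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes

    def _slots(self, item):
        # Derive num_hashes slot indexes by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.counters)

    def add(self, item):
        for slot in self._slots(item):
            self.counters[slot] += 1

    def remove(self, item):
        # Only safe for items that were actually added.
        for slot in self._slots(item):
            self.counters[slot] -= 1

    def estimate(self, item):
        """Upper bound on how many times item was added (0 means absent)."""
        return min(self.counters[slot] for slot in self._slots(item))


cbf = CountingBloomFilter()
for value in ["a", "b", "a", "a"]:
    cbf.add(value)
print(cbf.estimate("a"), cbf.estimate("b"))
```

Hash collisions can only inflate an estimate, never lower it, which is why the result is an upper bound rather than an exact frequency.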

blueprint ustat-bulk-load

Change-Id: I32af5ce110b0f6359daa5d49a3b787ab518295fa

  1. … 16 more files in changeset.
repository upgrade and explain enhancements. Details below.

-- repository upgrade infrastructure integrated

with 'initialize trafodion, upgrade'

-- fields have been added/removed/modified in repos tables

-- new repos table has been added

-- upgrade enhanced to only upgrade subset of components that

need to be upgraded. One or more of metadata, repos, views, priv.

-- metadata version now tracks release version and indicates

the traf release where upgrade was needed.

Current metadata version will be 1.1.0.

-- packed explain data is now returned to caller so it could be

stored in repository

-- explain info from repository for a query id could be retrieved

and displayed

explain qid <query-id>;

explain options 'f' qid <query-id>;

select * from table(explain(null, 'EXPLAIN_QID=<query-id>'));

-- a sql query could be explained and returned in relational format.

select * from table(explain(null, 'EXPLAIN_STMT=<query-str>'));

Users only need to have select permission on tables referenced

in query-str.

-- explain and statistics tables could be invoked

invoke table(explain(null, null));

invoke table(statistics(null, null));

Squashed commit of the following:

commit 09004ff7260b771378bc1a29f0ecb82e5e0c6100

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Mon Mar 16 12:24:03 2015 -0700

repos and explain, #12

Change-Id: Ia08151b5c7087b6b7daa5b662df2a584e5a7b2a1

commit 46f05ada264094626e45445d3875ccf876cad802

Merge: 3393587 6b3a4c4

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Sun Mar 15 21:27:48 2015 -0700

Merge remote branch 'gerrit/master' into fix1

commit 3393587de319a3ad43d7d360cd74501e8b8103e4

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Sun Mar 15 21:26:30 2015 -0700

repos and explain, #11

Change-Id: I895a4b4766e0a92b7472a6b347f0bb3ee07fa9b6

commit cf6ffee549b37ffbcd6ce750a1fdbf7f0c410d61

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Sun Mar 15 16:07:22 2015 -0700

repos and explain, #10

Change-Id: I19e182ec6600525cec986f173afaf68a44e04af2

commit e331eec68feea59d9fa11e0154e3093762120eb2

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Fri Mar 13 18:19:03 2015 -0700

repos and explain, #9

Change-Id: I783c75513a432f718d2cd7c6030d410c419a79c8

commit 9e9da221eb45b760963741657fe9cb910a753c84

Merge: 20b64e5 9eb7f37

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Fri Mar 13 14:52:09 2015 -0700

Merge remote branch 'gerrit/master' into fix1

Conflicts:

sql/common/ComSmallDefs.h

sql/executor/ExExeUtilCli.h

sql/generator/GenRelExeUtil.cpp

sql/sqlcomp/CmpSeabaseDDLcommon.cpp

sql/sqlcomp/CmpSeabaseDDLupgrade.cpp

sql/sqlcomp/CmpSeabaseDDLupgrade.h

Change-Id: Id21b10ff35ad506f409a83a8d955c2e643a2bcf5

commit 20b64e5552d8f614586ed843aad76d87d3473e2e

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Fri Mar 13 13:44:15 2015 -0700

repos and explain, #8

Change-Id: I2b49d962f08a22ca6c8b437a583855e2344f513c

commit 500b7a9486f23fff6b1c17bb6e79c2f614a7e3e6

Merge: f2c050e fc1c31d

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Fri Mar 13 08:07:34 2015 -0700

Merge remote branch 'gerrit/master' into fix1

commit f2c050ea048b409b7d6769172e16c4ab26267bd0

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Fri Mar 13 08:07:15 2015 -0700

repos and explain, #7

Change-Id: I0bc04b6e8f147af1defbfd023c3700e26cc5b6f1

commit fabdb7dadd34238a8a227e880aa868ce484dc580

Merge: 59bb689 74640d7

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Thu Mar 12 09:01:20 2015 -0700

Merge remote branch 'gerrit/master' into fix1

commit 59bb6893437abe3999a53188503c60cf47633458

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Thu Mar 12 09:00:44 2015 -0700

repos and explain, #7

Change-Id: I6595a04ad59c1f705a04beee18c877cb81db0fac

commit a86353b45436126915fd2ba74c9af53f8bfd4d19

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Mon Mar 9 16:21:16 2015 -0700

repos and explain, #6

Change-Id: Ie4fe2b2bece7de3697eb8ada0d1cb1067b89bb88

commit 61272e0d6107536ab8b691415770e856267b6da5

Merge: 1708cb4 ca5dfb4

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Mon Mar 9 15:52:43 2015 -0700

Merge remote branch 'gerrit/master' into fix1

commit 1708cb44511809bf8e63800d226ac8ea34916726

Merge: 1284f4f 31f25ec

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Fri Mar 6 14:55:13 2015 -0800

Merge remote branch 'gerrit/master' into fix1

commit 1284f4f78f1f43657f29899cc8b9a96c02f37815

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Fri Mar 6 14:54:54 2015 -0800

commit: repos and explain, #6

Change-Id: I38d796fd4ab58bf29fa62c4bd5492ce44de9c8a0

commit f639911a1ea4247ae4c297bbbd904b84c7587b3a

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Mon Mar 2 06:31:21 2015 -0800

repos and explain, #5

Change-Id: Ie97ba3cac8dc78784d0a7eef57176e51b9e16224

commit da69fe9595d4ad5c9adf3b32ae8f848a0e6be964

Merge: 9b95b8c a5caeef

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Mon Mar 2 06:31:00 2015 -0800

Merge remote branch 'gerrit/master' into fix1

commit 9b95b8c3240a4186704b8b7f2ff8cefebf211732

Merge: ea2f687 8cd8a92

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Thu Feb 26 08:50:25 2015 -0800

Merge remote branch 'gerrit/master' into fix1

commit ea2f6871b7c7803d0f1e3fa664b474e069a0befd

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Thu Feb 26 08:50:03 2015 -0800

commit: repos and explain, #5

Change-Id: I44d8f87b3652e6300e41302d3a3e88b9b18652a3

commit 522a47eb991df2245005c693e4d034235adbfdc5

Merge: d986988 f42694e

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Wed Feb 25 13:13:02 2015 -0800

Merge remote branch 'gerrit/master' into fix1

commit d98698856a4be479a7603b0fb79210e80e88e1c1

Merge: 5841e47 2aad3ef

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Wed Feb 18 14:00:24 2015 -0800

Merge remote branch 'gerrit/master' into fix1

commit 5841e47f1e3f9a018424298a92cd134f4430e6bf

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Wed Feb 18 13:57:46 2015 -0800

repos and explain, #4

Change-Id: I01f53003069598381743c9be80b405ed37cefe6d

commit e7b5966d8ce653a57b8128c0b24c15fc05737668

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Wed Feb 11 11:28:20 2015 -0800

repos and explain, #3

Change-Id: Iff148ef175a7641ea37e27e60435ccf8f613e6db

commit aba98dc8cfda62d4e708cd892ed626f2a3051eb0

Merge: 66f1926 19a0c32

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Tue Feb 10 14:09:58 2015 -0800

Merge remote branch 'gerrit/master' into fix1

commit 66f1926d279f2b408d436b0aa53a3a6a5388901e

Merge: 9d0ef2a b68f59b

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Mon Feb 9 10:52:35 2015 -0800

Merge remote branch 'gerrit/master' into fix1

Conflicts:

sql/sqlcomp/CmpSeabaseDDLcommon.cpp

sql/sqlcomp/CmpSeabaseDDLtable.cpp

Change-Id: I6348fec3aece2baad9e9570749f43c086416219e

commit 9d0ef2aa01a62ea3e4e0ab35f8ec9687ef3336ef

Merge: aac268d 2cd8fbb

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Fri Feb 6 14:15:49 2015 -0800

Merge remote branch 'gerrit/master' into fix1

commit aac268d097afbe506dcf76de4d8f49652a1d4df7

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Tue Feb 3 16:31:17 2015 -0800

repos and explain, #2

Change-Id: Ia2a50159e577cd6ff8110dca35236fa90ea0d260

commit 2e6160411c8facb448e78d387fa3227e7b759c5d

Merge: c6e2fe5 566706d

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Tue Feb 3 15:21:03 2015 -0800

Merge remote branch 'gerrit/master' into fix1

commit c6e2fe529f9d1e59d3eb26cb145b260d81814443

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Tue Feb 3 15:16:36 2015 -0800

repository updated and explain changes, #1

Change-Id: I5daf66a2064b46ad65f1200579385965f7f56808

Change-Id: I0211d754524b8c06a05bb7ade20f884ec91340d2

  1. … 58 more files in changeset.
Trafodion Metadata Cleanup command support.

Various changes to support the cleanup command have been added.

A separate external spec contains the details.

Summary of syntax:

cleanup [ table t | index i | sequence s | object o] [, uid <value>]

cleanup [private | shared] schema sch

cleanup uid <value>

cleanup metadata, check, return details

In addition, a new command to get names of various hbase objects

has also been added:

get [ all | user | system | external ] hbase objects;

Change-Id: I93f1f45e7fd78091bacd7c9f166420edd7c1abee

  1. … 79 more files in changeset.
Snapshot Scan changes

The changes in this delivery include:

-decoupling the snapshot scan from the bulk unload feature. Setup of the

temporary space and folders before running the query and cleanup afterwards

used to be done by the bulk unload operator because snapshot scan was specific

to bulk unload. In order to make snapshot scan independent from bulk unload

and usable in any query, the setup and cleanup tasks are now done by the query

itself at run time (the scan and root operators).

-caching of the snapshot information in NATable to optimize compilation time

Rework for caching: when the user sets TRAF_TABLE_SNAPSHOT_SCAN to LATEST

we flush the metadata and then we set the caching back to on so that metadata

gets cached again. If newer snapshots are created after setting the cqd they

won't be seen if they are already cached unless the user issues a command/cqd

to invalidate or flush the cache. One way of doing that is to issue

"cqd TRAF_TABLE_SNAPSHOT_SCAN 'latest';" again.

-code cleanup

Below is a description of the CQDs used with snapshot scan:

TRAF_TABLE_SNAPSHOT_SCAN

This CQD can be set to:

NONE --> (default) Snapshot scan is disabled and regular scan is used.

SUFFIX --> Snapshot scan is enabled for the bulk unload (bulk unload

behavior is not changed)

LATEST --> Snapshot Scan is enabled independently from bulk unload and

the latest snapshot is used if it exists. If no snapshot exists

the regular scan is used. For this phase of the project the user

needs to create the snapshots using hbase shell or other tools.

In the next phase of the project, new commands to create,

delete and manage snapshots will be added.

TRAF_TABLE_SNAPSHOT_SCAN_SNAP_SUFFIX

This CQD is used with bulk unload and its value is used to build the

snapshot name as the table name followed by the suffix string

TRAF_TABLE_SNAPSHOT_SCAN_TABLE_SIZE_THRESHOLD

When the estimated table size is below the threshold (in MBs) defined by

this CQD the regular scan is used instead of snapshot scan. This CQD

does not apply to bulk unload which maintains the old behavior

TRAF_TABLE_SNAPSHOT_SCAN_TIMEOUT

The timeout beyond which we give up trying to create the snapshot scanner

TRAF_TABLE_SNAPSHOT_SCAN_TMP_LOCATION

Location for temporary links and files produced by snapshot scan
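The interaction of the mode and threshold CQDs can be summarised in a small decision function. This is a simplified reading of the descriptions above, not the compiler's logic; the function and parameter names are hypothetical, and SUFFIX mode is reduced to "snapshot scan for bulk unload only".

```python
def choose_scan(mode, est_table_size_mb, threshold_mb, latest_snapshot=None,
                is_bulk_unload=False):
    """Return "snapshot" or "regular" for one table scan.

    mode mirrors TRAF_TABLE_SNAPSHOT_SCAN: "NONE", "SUFFIX" or "LATEST".
    """
    if mode == "NONE":
        return "regular"
    if mode == "SUFFIX":
        # Enabled for bulk unload only; the size threshold does not apply.
        return "snapshot" if is_bulk_unload else "regular"
    if mode == "LATEST":
        if latest_snapshot is None:
            return "regular"  # no snapshot exists
        if est_table_size_mb < threshold_mb:
            return "regular"  # estimated size is below the threshold
        return "snapshot"
    raise ValueError(f"unknown mode {mode!r}")


print(choose_scan("LATEST", 5000, 1000, latest_snapshot="snap1"))
print(choose_scan("LATEST", 500, 1000, latest_snapshot="snap1"))
```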

Change-Id: Ifede88bdf36049bac8452a7522b413fac2205251

  1. … 44 more files in changeset.
OSIM (Optimizer Simulator) redesign 1.

Simulate query plan generation of a production cluster on a dev workstation

by collecting information from the production cluster and restoring it on the dev workstation.

--running on production clusters, collect table DDLs, statistics, CQDs, to osim-directory,

--the directory path can be either full (absolute) or relative.

osim capture location '<osim-directory>'[, force];

--running queries on cluster

osim capture stop;

--restore DDLs, CQDs, statistics and cluster information.

osim load from '<osim-directory>';

--setup runtime information, like cpu number, node number.

osim simulate start|continue '<osim-directory>';

Change-Id: I30882e87a6ea0f08c9aa64685705eebebcbb3bf0

  1. … 38 more files in changeset.
Merge "Change to avoid placing large scan results in RegionServer cache"

  1. … 9 more files in changeset.
Remove code and cqds related to Thrift interface

ExpHbaseInterface_Thrift class was removed a few months ago. Completing

that cleanup work. exp/Hbase_types.{cpp,h} still remain. These are Thrift

generated files but we use the structs/classes generated for JNI access.

Change-Id: I7bc2ead6cc8d6025fb38f86fbdf7ed452807c445

  1. … 19 more files in changeset.
Change to avoid placing large scan results in RegionServer cache

By default the result of every Scan and Get request to HBase is placed in

the RegionServer cache. When a scan returns a lot of rows this can lead to

cache thrashing, causing results which are being shared by other queries

to be flushed out. This change uses cardinality estimates and hbase row size

estimates along with the configured region server cache size to

determine when such thrashing may occur. Heap size for region server is

specified through a cqd HBASE_REGION_SERVER_MAX_HEAP_SIZE. The units are in MB.

The fraction of this heap allocated to block cache is read from the config

file once per session through a lightweight (no disk access) JNI call. The

heuristic used is approximate as it does not consider total number of region

servers or that sometimes a scan may be concentrated in one or a few region

servers. We simply do not place rows in RS cache, if the memory used by all

rows in a scan will exceed the cache in a single RS. Change can be overridden

with cqd HBASE_CACHE_BLOCKS 'ON'. The default is now SYSTEM. Change applies

to both count(*) coproc plans and regular scans.
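The heuristic amounts to comparing the estimated size of the scan result with the block cache of a single RegionServer. A minimal sketch follows, assuming the comparison is a plain product of the estimates; the function name, the parameter names, and the example fraction of 0.4 are illustrative and not taken from the commit.

```python
def cache_blocks(setting, est_rows, est_row_bytes, rs_max_heap_mb,
                 block_cache_fraction):
    """Decide whether scan results should go into the RegionServer cache.

    setting mirrors HBASE_CACHE_BLOCKS; only "ON" and "SYSTEM" are modelled.
    """
    if setting == "ON":
        return True  # explicit override of the heuristic
    if setting != "SYSTEM":
        raise ValueError(f"setting not modelled in this sketch: {setting!r}")
    # SYSTEM: skip the cache when the scan alone would overflow the block
    # cache of a single RegionServer.
    scan_bytes = est_rows * est_row_bytes
    cache_bytes = rs_max_heap_mb * 1024 * 1024 * block_cache_fraction
    return scan_bytes <= cache_bytes


print(cache_blocks("SYSTEM", 1_000, 100, 1024, 0.4))
print(cache_blocks("SYSTEM", 10_000_000, 100, 1024, 0.4))
```

As the commit notes, this ignores how many RegionServers share the scan, so it errs on the side of bypassing the cache for large scans.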

Change-Id: I0afc8da44df981c1dffa3a77fb3efe2f806c3af1

  1. … 20 more files in changeset.
Bulk unload optimization using snapshot scan

resubmitting after facing git issues

The changes consist of:

*implementing the snapshot scan optimization in the Trafodion scan operator

*changes to bulk unload to use the new snapshot scan.

*Changes to scripts and permissions (using ACLS)

*Rework based on review

Details:

*Snapshot Scan:

----------------------

**Added support for snapshot scan to Trafodion scan

**The scan expects the hbase snapshots themselves to be created before running

the query. When used with bulk unload the snapshots can be created by bulk unload

**The snapshot scan implementation can be used without the bulk-unload. To use

the snapshot scan outside bulk-unload we need to use the below cqds

cqd TRAF_TABLE_SNAPSHOT_SCAN 'on';

-- the snapshot name will be the table name concatenated with the suffix-string

cqd TRAF_TABLE_SNAPSHOT_SCAN_SNAP_SUFFIX 'suffix-string';

-- temp dir needed for the hbase snapshot scan

cqd TRAF_TABLE_SNAPSHOT_SCAN_TMP_LOCATION '/bulkload/temp_scan_dir/';

**snapshot scan can be used with table scan, index scans etc…

*Bulk unload utility :

-------------------------------

**The bulk unload optimization is due to the newly added support for snapshot scan.

By default bulk unload uses the regular scan. But when snapshot scan is

specified it will use snapshot scan instead of regular scan

**To use snapshot scan with Bulk unload we need to specify the new options in

the bulk unload syntax : NEW|EXISTING SNAPSHOT HAVING SUFFIX QUOTED_STRING

***using NEW in the above syntax means the bulk unload tool will create new

snapshots while using EXISTING means bulk unload expects the snapshot to

exist already.

***The snapshot names are based on the table names in the select statement. The

snapshot name needs to start with table name and have a suffix QUOTED-STRING

***For example for "unload with NEW SNAPSHOT HAVING SUFFIX 'SNAP111' into 'tmp'

select from cat.sch.table1;" the unload utility will create a snapshot

CAT.SCH.TABLE1_SNAP111; and for "unload with EXISTING SNAPSHOT HAVING SUFFIX

'SNAP111' into 'tmp' select from cat.sch.table1;" the unload utility will

expect a snapshot CAT.SCH.TABLE1_SNAP111 to exist already. Otherwise

an error is produced.

***If these newly added options are not used in the syntax, bulk unload will use

the regular scan instead of snapshot scan

**The bulk unload queries the explain plan virtual table to get the list of

Trafodion tables that will be scanned and based on the case it either creates

the snapshots for those tables or verifies whether they already exist
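The NEW and EXISTING options can be modelled with two small helpers. The naming rule (upper-cased table name, underscore, suffix) is inferred from the CAT.SCH.TABLE1_SNAP111 example in the commit message; the helper names, the `existing` set, and the exception type are hypothetical.

```python
def snapshot_name(table, suffix):
    # Upper-cased table name + "_" + suffix, inferred from the
    # CAT.SCH.TABLE1_SNAP111 example in the commit message.
    return f"{table.upper()}_{suffix}"


def prepare_snapshots(tables, suffix, mode, existing):
    """Return snapshots to create (NEW) or raise if one is missing (EXISTING)."""
    names = [snapshot_name(t, suffix) for t in tables]
    if mode == "NEW":
        return names
    if mode == "EXISTING":
        missing = [n for n in names if n not in existing]
        if missing:
            raise LookupError(f"snapshot(s) not found: {missing}")
        return []  # nothing to create, all snapshots are already there
    raise ValueError(f"unknown mode {mode!r}")


print(prepare_snapshots(["cat.sch.table1"], "SNAP111", "NEW", set()))
```

In the real utility the table list comes from querying the explain plan virtual table; here it is simply passed in.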

*Configuration changes

--------------------------------

**Enable ACLs in hdfs


*Testing

--------

**All developer regression tests were run and all passed

**bulk unload and snapshot scan were tested on the cluster

*Examples:

**Example of using snapshot scan without bulk unload:

(we need to create the snapshot first )

>>cqd TRAF_TABLE_SNAPSHOT_SCAN 'on';

--- SQL operation complete.

>>cqd TRAF_TABLE_SNAPSHOT_SCAN_SNAP_SUFFIX 'SNAP777';

--- SQL operation complete.

>>cqd TRAF_TABLE_SNAPSHOT_SCAN_TMP_LOCATION '/bulkload/temp_scan_dir/';

--- SQL operation complete.

>>select [first 5] c1,c2 from tt10;

C1 C2

--------------------- --------------------

.00 0

.01 1

.02 2

.03 3

.04 4

--- 5 row(s) selected.

**Example of using snapshot scan with unload:

UNLOAD

WITH PURGEDATA FROM TARGET

NEW SNAPSHOT HAVING SUFFIX 'SNAP778'

INTO '/bulkload/unload_TT14_3' select * from seabase.TT20 ;

Change-Id: Idb1d1807850787c6717ab0aa604dfc9a37f43dce

  1. … 35 more files in changeset.
LP and other fixes. Details below.

-- LP 1408504: Any sql operation done when trafodion is uninitialized or

needs to be upgraded will return error. Until now, some commands

(like get schemas, invoke) were not returning an error.

-- LP 1408506: metadata upgrade was not handling repository tables

and was failing. That has been fixed.

-- index related commands (create, populate) now run in multiple phases.

Metadata update within a xn and row population without a xn.

-- common methods have been added that can be called to begin and

end transactions.

-- HYBRID_QUERY_CACHE cqd is now off by default

Change-Id: I99ef2548998b1a6830d4332db09080df5bcfc1c1

  1. … 9 more files in changeset.
ANSI Schema changes

ANSI Schema

Implements the changes to support ANSI schemas. For more information

see the blueprint at:

https://blueprints.launchpad.net/trafodion/+spec/security-ansi-schemas

The syntax changes for REGISTER USER and CREATE ROLE were not

implemented in this delivery.

NOTE: This code was reviewed internally prior to merging with the

main branch.

Change-Id: I1c7937dbcd067e792dcacb65f12c43e4f84a25ad

Change-Id: I98395eeef1e8bde424d9e83f96928358f0b1991b

  1. … 75 more files in changeset.
Various changes, details listed below.

-- fixed error msg 1429 text

-- added code to set objectUID & owner for metadata, histogram and

sequence tables during creation of metadata structs for these objects.

-- removed previously added code in binder that computed objectUID

for sequence.

-- Updated method lookupObjectUid to call an existing method

to get objectuid.

-- removed obsolete code for reorg, replicate and load

Change-Id: I60d161cfa72bcc674dc6c64e3a07237c7522ee6c

  1. … 28 more files in changeset.
Fixes and removal of obsolete code.

-- LP 1400556 'get tables in schema' is not supported on external

hbase tables. An error is now returned.

-- LP 1400553 Insert into external hbase tables in _ROW_ format must use

column_create function and VALUES clause to create rows.

An error is returned otherwise.

-- a bug in the boundary case where the sequence increment value

was one less than largeint max has been fixed.

-- error message to indicate what options can be used during alter sequence

has been updated

-- create table as select stmt now returns an error if running within a user

transaction. This is the same behavior as other DDL operations.

This will be

removed once we have transaction support for DDL stmts.

-- create table as select now uses non-transactional 'upsert using load' to

populate target table instead of transactional 'insert...select' stmt.

-- hive/test020 has been enabled. This tests for access to ORC files.

-- obsolete sidetree insert and NVT user load code has been removed.

Change-Id: I14d321deaa52321777acd1d8ca55420f1e973367

  1. … 31 more files in changeset.
Merge "Authorization checks for DDL & utilities"

  1. … 1 more file in changeset.
Authorization checks for DDL & utilities

Fixed issues from code comments.

LOAD/UNLOAD authorization checks:

Code was added during code generation to make sure the user has privileges.

If the user has the necessary privileges, then the EXEUTIL parser flag is

turned on to avoid further privilege checks. When load/unload

completes, the parser flag is reset.
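The set-then-reset pattern for the parser flag can be sketched with a context manager. This is only an analogy: the real flag is a bit in the compiler's parser flags, whereas `parser_flags` here is a plain set and `exeutil_flag` is a hypothetical helper.

```python
from contextlib import contextmanager

parser_flags = set()  # stand-in for the session's parser flag word


@contextmanager
def exeutil_flag(user_has_privs):
    """Turn the EXEUTIL flag on for the duration of a load/unload."""
    if not user_has_privs:
        raise PermissionError("user lacks the privileges for this utility")
    was_set = "EXEUTIL" in parser_flags
    parser_flags.add("EXEUTIL")  # later privilege checks see the flag and skip
    try:
        yield
    finally:
        if not was_set:
            parser_flags.discard("EXEUTIL")  # reset even if the load fails


with exeutil_flag(user_has_privs=True):
    print("during load:", "EXEUTIL" in parser_flags)
print("after load:", "EXEUTIL" in parser_flags)
```

Resetting in a `finally` block mirrors the requirement that the flag must not stay on after the utility finishes, whether it succeeded or not.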

Update/showstats Statistics authorization checks:

Added a new error message

Changed hs_globals to support a new isAuthorized method and store

parser flags when class is instantiated and reset them when done

Changed hs_cli.cpp to use new IF NOT EXISTS syntax when creating

histogram tables, make owner of histogram tables DB__ROOT

(will need to adjust when schema privileges happen), and clean up

CreateHistTables method to remove old authorization mechanism

Changed hs_update.cpp which controls the update and showstats operation

to add authorization checks

Purgedata and populate index changes:

Changed CmpSeabaseDDLcommon.cpp to check privileges for purgedata

Changed CmpSeabaseDDLindex.cpp to check privileges for popindex

Additional component privileges and checks:

Added support for new component privileges in PrivMgrMD.h/.cpp

Added support for MANAGE_COMPONENTS

Added support for CREATE_INDEX and DROP_INDEX component privs

Fixes from last delivery that were postponed:

Context.cpp - fix for previous code review

CmpSeabaseDDLtable - added calls to deallocEHI

PrivMgrMD - fixed wording in a comment

Miscellaneous changes:

ComUser - added new convenience method - isRootUserID()

NATable.cpp (setupPrivInfo) to always set up privInfo_ and to call

the embedded compiler while extracting privileges

Privilege adjustments to take advantage of privInfo stored in NATable:

Added code to mark and rewind errors in diags.

Fix for LP bug 1392895

Change-Id: I6f7245ae7e66086769c0e92d901399c99e8f2af3

  1. … 33 more files in changeset.
Fix performance regression due to QI for DDL

This check-in omits object UIDs from query plans for the tables

SB_HISTOGRAMS and SB_HISTOGRAMS_INTERVALS. Previously, when the

code generator tried to add the object UIDs for these, it had to

make a special query to the metadata, since the corresponding

internal cached structure omitted object UIDs when they were

created via methods like Generator::createVirtualTableDesc. The

special query to lookup these object UIDs was shown to be

responsible for a large pathlength regression.

Change-Id: Id5046c5c55a4fc8dd2ba3f891449ea87d35a5534

Closes-Bug: #1398600

  1. … 7 more files in changeset.
Merge "removal of obsolete code that is no longer valid and unused in Trafodion"

  1. … 9 more files in changeset.