DefaultConstants.h

Merge branch 'master' of https://github.com/trafodion/core

Conflicts:

sqf/sqenvcom.sh

sql/nskgmake/Makerules.linux

sql/nskgmake/qms/Makefile

sql/nskgmake/sqlci/Makefile

Change-Id: I2589648e978c247c96f6c914e689010916b04037

… 6 more files in changeset.
initial support for returning multiple versions and column timestamps

This feature is not yet externalized.

Support added to:

-- return multiple versions of rows

-- select * from table {versions N | MAX | ALL}

-- get hbase timestamp of a column

-- select hbase_timestamp(col) from t;

-- get version number of a column.

-- select hbase_version(col) from t
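
A hedged sqlci sketch of the new syntax (table and column names are hypothetical; the feature is not yet externalized):

select * from t1 {versions 2};

select hbase_timestamp(c1), hbase_version(c1) from t1;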

Change-Id: I37921681fc606a22c19d2c0cb87a35dee5491e1e

… 48 more files in changeset.
Fixes from review to sqvers

commit 04f3812f112a5629a563f02d7e72c5fa503c6a8d

Author: Sandhya Sundaresan <sandhya.sundaresan@hp.com>

Date: Sun Jun 14 04:23:21 2015 +0000

Preliminary checkin of LOB support for external files. Inserts from HTTP

files, HDFS files, and local LOB files are supported. Added support for the

new extract syntax; extract from LOB

columns to HDFS files has been added. More work is needed to support

binary files and very large files; the current limit is 1 GB.

Also fixed some error handling issues.

Fixed some substring warning issues in the lobtostring/stringtolob

functions.

Added references and interfaces to the curl library, which is needed to read external HTTP

files.

More work is needed before this support can be used.

Change-Id: Ieacaa3e4b7fa2a040764888c90ef0b029f107a8b

Change-Id: Ife3caf13041f8106a999d06808b69e5b6a348a6b

… 29 more files in changeset.
Migrate from log4cpp to log4cxx

This change is a wholesale removal of log4cpp from the source tree.

log4cxx is an external library installed via RPM, or built by the user

into the default /usr/lib64 and /usr/include directories. Some of the

QRLogger and CommonLogger code was changed to use the new log4cxx

APIs.

Change-Id: I248bac0a8ffbfea6cbc1ba847867b30638892eae

… 208 more files in changeset.
Code change for ESP colocation strategy described in LP 1464306

Change-Id: I838bc57541bf753e3bad90ae7776e870958c930a

… 14 more files in changeset.
Costing and statistics compiler interfaces for UDFs

blueprint cmp-tmudf-compile-time-interface

bug 1433192

This change adds compiler interfaces for UDFs that give information

about statistics of the result table and also a cost estimate. It also

has more code for the upcoming Java UDF feature, retrieving updated

invocation infos and returning them back to the executor/compiler C++

code.

Description of the changes in more detail:

- Addressed remaining review comments from my last checkin,

https://review.trafodion.org/1655

- Make sure that user-generated exceptions during deallocation of

a routine are reported. These happen in the destructor of the

object derived from tmudr::UDR. For Java, we may need a deallocate

method.

- Java and JNI code to serialize the updated UDRInvocationInfo and

UDRPlanInfo object after calling the user code and return them back

through the JNI interface to the calling C++ code.

- The cost method source files had some inline methods defined in

the .cpp file and used an include file that included other .cpp

files. Make didn't pick up changes made in these files. Removed

this code and changed it to regular methods and inlines.

- Replaced some Context * parameters in costing with PlanWorkSpace *,

to be able to get to UDF-related info that's stored in a special

PlanWorkSpace.

- Changed the behavior of isBigMemoryOperator() for TMUDFs. If the

UDF writer specifies the DoP for the UDF invocation, then consider

it a BMO.

- If possible, synthesize the HASH2 partitioning function of a TMUDF's

child as the partitioning function of the UDF. This can be done if

the partitioning key gets passed through the UDF.

- Statistics interface for TMUDFs:

- TMUDF now populates statistics field in the UDRInvocationInfo

object and calls the describeStatistics() method.

- Added an estimated # of partitions for partitioned input tables

of TMUDFs. Also changed row count methods to "estimated" row count.

- Added code to incorporate the information on row count and UEC

provided by the UDF writer into the statistics of the TMUDF. This code

is not suitable as the default implementation

of describeStatistics(). Therefore, the default implementation of

describeStatistics() does nothing, but the compiler applies some

heuristics in case the UDF writer provides no statistics.

- Changed the cost method for TMUDFs to incorporate an estimated cost

per row from the UDF writer. There is no special compiler interface

call to ask for the cost; it can be set from the

describeDesiredDegreeOfParallelism() call and, once supported, from

the describePlanProperties() call. Note that we don't have immediate

plans to support describePlanProperties(); that might come after 2.0.

Patch Set 3: Addressed Dave's review comments.

Patch Set 4: Fixed misplaced copyright in expected file.

Change-Id: Ia9ae076b7ae1fc2968c3d253d6d2d0e1d9a2ea40

… 45 more files in changeset.
various fixes and enhancements, details below.

-- improved DDL performance by not invalidating internal create/alter

operations.

-- added an optimization during CREATE INDEX to not go through

'upsert using load' processing if source table is empty.

-- added support for ISO datetime format (2015-06-01T07:35:20Z)

-- added support for RESET option to ALTER SEQUENCE and IDENTITY.

This will reset generated seq num to the START VALUE.

-- added support for cqd TRAF_STRING_AUTO_TRUNCATE.

If set, strings will be automatically truncated during insert/update.

-- fixed sqlci to pass in correct varchar param len indicator (2 or 4 bytes).

-- changed sizeof(short) to correct vcindlen (2 or 4 bytes)

-- removed some NA_SHADOWCALLS defines
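
A hedged sqlci sketch of a few of these items (table, column, and sequence names are hypothetical):

cqd TRAF_STRING_AUTO_TRUNCATE 'ON';

insert into t1 values ('2015-06-01T07:35:20Z', 'a string that may be truncated');

alter sequence seq1 reset;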

Change-Id: Ie6715435d9c210ae6c2db4ff6bc0545c1b196979

… 39 more files in changeset.
Merge "Avoid scanner timeout for Update Statistics"

… 3 more files in changeset.
Avoid scanner timeout for Update Statistics

For performance reasons, Update Stats pushes sampling down into HBase,

using a filter that returns only randomly selected rows. When the

sampling rate is very low, as is the case when the default sampling

protocol (which includes a sample limit of a million rows) is used on

a very large table, a long time can be taken in the region server

before returning to Trafodion, with the resultant risk of an

OutOfOrderScannerNextException. To avoid these timeouts, this fix

reduces the scanner cache size (the number of rows accumulated before

returning) used by a given scan based on the sampling rate. If an

adequate return time cannot be achieved in this manner without

going below the scanner cache minimum prescribed by the

HBASE_NUM_CACHE_ROWS_MIN cqd, then the scanner cache reduction is

complemented by a modification of the sampling rate used in HBase.

The sampling rate used in HBase is increased, but the overall rate

is maintained by doing supplementary sampling of the returned rows in

Trafodion. For example, if the original sampling rate is .000001,

and reducing the scanner cache to the minimum still results in an

excessive average time spent in the region server, the sampling

may be split into a .00001 rate in HBase and a .01 rate in Trafodion,

resulting in the same effective .000001 overall rate.
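
A hedged sqlci sketch of the knobs involved (values and table name are illustrative only):

cqd HBASE_NUM_CACHE_ROWS_MIN '100';

update statistics for table t1 on every column sample;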

Change-Id: Id05ab5063c2c119c21b5c6c002ba9554501bb4e1

Closes-Bug: #1391271

… 6 more files in changeset.
Configuring hbase option MAX_VERSION via SQL
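
A hedged sketch of the intended usage (the HBASE_OPTIONS clause spelling is assumed; table name is hypothetical):

create table t1 (c1 int not null primary key, c2 int)

hbase_options (max_versions = '3');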

Change-Id: I88041d539b24de1289c15654151f5320b67eb289

… 11 more files in changeset.
Merge "Enabling Bulk load and Hive Scan error logging/skip feature"

… 6 more files in changeset.
various lp and other fixes, details below.

-- added support for self-referencing constraints (see the sketch after this list)

-- limit clause can now be specified as a param

(select * from t limit ?)

-- lp 1448261. alter table add identity col is not allowed and now

returns an error

-- error is returned if a specified constraint in an alter/create statement

exists on any table

-- lp 1447343. cannot have more than one identity column.

-- embedded compiler is now used to get priv info during invoke/showddl.

-- auth info is not reread if already initialized

-- sequence value function is now cacheable

-- lp 1448257. inserts in volatile table with identity column now work

-- lp 1447346. inserts with identity col default now work if inserted

in a salted table.

-- only one compiler is now needed to process ddl operations with or

without authorization enabled

-- query cache in embedded compiler is now cleared if user id changes

-- pre-created default schema 'SEABASE' can no longer be dropped

-- default schema 'SCH' is automatically created if running regressions

and it doesn't exist.

-- improvements in regression runs.

-- regression runs no longer call a script from another sqlci session

to init auth, create default schema

and insert into defaults table before every regr script

-- switched the order of regression runs

-- updates from review comments.
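
A minimal sketch of the self-referencing constraint and the parameterized limit clause (names are hypothetical; sqlci named-parameter form assumed):

create table emp (empnum int not null primary key, mgrnum int);

alter table emp add constraint emp_mgr_fk foreign key (mgrnum) references emp(empnum);

prepare s from select * from emp limit ?p;

set param ?p 10;

execute s;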

Change-Id: Ifb96d9c45b7ef60c67aedbeefd40889fb902a131

… 69 more files in changeset.
Enabling Bulk load and Hive Scan error logging/skip feature

Also fixed the hanging issue with Hive scan (ExHdfsScan operator) when there

is an error in data conversion.

ExHbaseAccessBulkLoadPrepSQTcb was not releasing all the resources when there

is an error or when the last buffer had some rows.

Error logging/skip feature can be enabled in

hive scan using CQDs and in bulk load using the command line options.

For Hive Scan

CQD TRAF_LOAD_CONTINUE_ON_ERROR 'ON' to skip errors

CQD TRAF_LOAD_LOG_ERROR_ROWS 'ON' to log the error rows in HDFS files.

For Bulk load

LOAD WITH CONTINUE ON ERROR [TO <location>] -- to skip error rows

LOAD WITH LOG ERROR ROWS -- to log the error rows in HDFS files.

The default parent error logging directory in hdfs is /bulkload/logs. The error

rows are logged in subdirectory ERR_<date>_<time>. A separate hdfs file is

created for every process/operator involved in the bulk load in this directory.

Error rows in hive scan are logged in

<sourceHiveTableName>_hive_scan_err_<inst_id>

Error rows in bulk upsert are logged in

<destTrafTableName>_traf_upsert_err_<inst_id>

Bulk load can also be aborted after a certain number of error rows are seen using

LOAD WITH LOG ERROR ROWS, STOP AFTER <n> ERROR ROWS option
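
A hedged sqlci sketch of both forms (names and the error-row count are illustrative):

-- Hive scan: skip and log error rows via CQDs

cqd TRAF_LOAD_CONTINUE_ON_ERROR 'ON';

cqd TRAF_LOAD_LOG_ERROR_ROWS 'ON';

-- Bulk load: equivalent command-line options

load with log error rows, stop after 100 error rows into trafodion.sch.t1 select * from hive.hive.t1;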

Change-Id: Ief44ebb9ff74b0cef2587705158094165fca07d3

… 33 more files in changeset.
get indexLevel and blockSize from HBase metadata to use in costing code.

Change-Id: I7b30364ec83a763d3391ddc39e12adec2ca1bd00

… 9 more files in changeset.
Remove some dead code

Remove dead code concerned with constraint and schema labels.

This is an anachronism from pre-open-source versions of the code.

Most of the code removed is in the compiler, with a small amount

of cli and executor code removed.

Change-Id: Ic8a833bb15d1ca9a0e2e2683f2d4644b44c4f96b

… 13 more files in changeset.
Changes to enable Rowset select - Fix for bug 1423327

HBase always returns an empty result set when the row is not found. Trafodion

is changed to exploit this behavior to project no data in a rowset select.

The optimizer has now been enabled to choose a plan involving Rowset Select

wherever possible. This can result in plan changes for queries:

nested join plan instead of hash join,

vsbb delete instead of delete,

vsbb insert instead of regular insert.

A new CQD HBASE_ROWSET_VSBB_SIZE is now added to control the hbase rowset size.

The default value is 1000.
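
A hedged sketch of overriding the new CQD (the value shown is illustrative):

cqd HBASE_ROWSET_VSBB_SIZE '5000';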

Change-Id: Id76c2e6abe01f2d1a7b6387f917825cac2004081

… 19 more files in changeset.
additional changes to support ALIGNED row format.

This feature is not externalized yet.

Change-Id: Idbf19022916d437bb7bb69019194de5057cbcb65

… 21 more files in changeset.
DDL Transactions, sql cqd, end to end create test

Change-Id: I4122b2f7d0aea13c61fdf5b19349d88a00569c51

… 9 more files in changeset.
perf enhancement for ddl operations.

DDL operations where objects (tables, views) had a large number

of columns were running slowly due to single-row inserts into

the metadata COLUMNS table. This showed up during DSM repository

creation, which had 150 tables and 500 views.

Changes done:

-- added code to do rowwise rowsets

-- enhanced metadata COLUMNS inserts to use rwrs upserts

-- changed metadata calls to use upsert instead of insert

-- fixed a cleanup bug

Change-Id: I07b619598e05eab80ec965ac0194614b73ecde57

… 12 more files in changeset.
New ustat algorithm and bulk load integration

This is the initial phase of the Update Statistics change to use

counting Bloom filters and a Hive backing sample table created

during bulk load and amended whenever a new HFile for a table

is created. All changes are currently disabled pending some

needed fixes and testing.

blueprint ustat-bulk-load

Change-Id: I32af5ce110b0f6359daa5d49a3b787ab518295fa

… 16 more files in changeset.
repository upgrade and explain enhancements. Details below.

-- repository upgrade infrastructure integrated

with 'initialize trafodion, upgrade'

-- fields have been added/removed/modified in repos tables

-- new repos table has been added

-- upgrade enhanced to only upgrade subset of components that

need to be upgraded. One or more of metadata, repos, views, priv.

-- metadata version now tracks release version and indicates

the traf release where upgrade was needed.

Current metadata version will be 1.1.0.

-- packed explain data is now returned to the caller so it can be

stored in the repository

-- explain info from the repository for a query id can be retrieved

and displayed:

explain qid <query-id>;

explain options 'f' qid <query-id>;

select * from table(explain(null, 'EXPLAIN_QID=<query-id>'));

-- a sql query can be explained and returned in relational format:

select * from table(explain(null, 'EXPLAIN_STMT=<query-str>'));

Users only need to have select permission on tables referenced

in query-str.

-- explain and statistics tables can be invoked:

invoke table(explain(null, null));

invoke table(statistics(null, null));
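
A hedged end-to-end sketch (the query text is illustrative; substitute a real query id in the QID form):

select * from table(explain(null, 'EXPLAIN_STMT=select count(*) from t1'));

explain qid <query-id>;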

Squashed commit of the following:

commit 09004ff7260b771378bc1a29f0ecb82e5e0c6100

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Mon Mar 16 12:24:03 2015 -0700

repos and explain, #12

Change-Id: Ia08151b5c7087b6b7daa5b662df2a584e5a7b2a1

commit 46f05ada264094626e45445d3875ccf876cad802

Merge: 3393587 6b3a4c4

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Sun Mar 15 21:27:48 2015 -0700

Merge remote branch 'gerrit/master' into fix1

commit 3393587de319a3ad43d7d360cd74501e8b8103e4

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Sun Mar 15 21:26:30 2015 -0700

repos and explain, #11

Change-Id: I895a4b4766e0a92b7472a6b347f0bb3ee07fa9b6

commit cf6ffee549b37ffbcd6ce750a1fdbf7f0c410d61

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Sun Mar 15 16:07:22 2015 -0700

repos and explain, #10

Change-Id: I19e182ec6600525cec986f173afaf68a44e04af2

commit e331eec68feea59d9fa11e0154e3093762120eb2

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Fri Mar 13 18:19:03 2015 -0700

repos and explain, #9

Change-Id: I783c75513a432f718d2cd7c6030d410c419a79c8

commit 9e9da221eb45b760963741657fe9cb910a753c84

Merge: 20b64e5 9eb7f37

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Fri Mar 13 14:52:09 2015 -0700

Merge remote branch 'gerrit/master' into fix1

Conflicts:

sql/common/ComSmallDefs.h

sql/executor/ExExeUtilCli.h

sql/generator/GenRelExeUtil.cpp

sql/sqlcomp/CmpSeabaseDDLcommon.cpp

sql/sqlcomp/CmpSeabaseDDLupgrade.cpp

sql/sqlcomp/CmpSeabaseDDLupgrade.h

Change-Id: Id21b10ff35ad506f409a83a8d955c2e643a2bcf5

commit 20b64e5552d8f614586ed843aad76d87d3473e2e

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Fri Mar 13 13:44:15 2015 -0700

repos and explain, #8

Change-Id: I2b49d962f08a22ca6c8b437a583855e2344f513c

commit 500b7a9486f23fff6b1c17bb6e79c2f614a7e3e6

Merge: f2c050e fc1c31d

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Fri Mar 13 08:07:34 2015 -0700

Merge remote branch 'gerrit/master' into fix1

commit f2c050ea048b409b7d6769172e16c4ab26267bd0

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Fri Mar 13 08:07:15 2015 -0700

repos and explain, #7

Change-Id: I0bc04b6e8f147af1defbfd023c3700e26cc5b6f1

commit fabdb7dadd34238a8a227e880aa868ce484dc580

Merge: 59bb689 74640d7

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Thu Mar 12 09:01:20 2015 -0700

Merge remote branch 'gerrit/master' into fix1

commit 59bb6893437abe3999a53188503c60cf47633458

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Thu Mar 12 09:00:44 2015 -0700

repos and explain, #7

Change-Id: I6595a04ad59c1f705a04beee18c877cb81db0fac

commit a86353b45436126915fd2ba74c9af53f8bfd4d19

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Mon Mar 9 16:21:16 2015 -0700

repos and explain, #6

Change-Id: Ie4fe2b2bece7de3697eb8ada0d1cb1067b89bb88

commit 61272e0d6107536ab8b691415770e856267b6da5

Merge: 1708cb4 ca5dfb4

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Mon Mar 9 15:52:43 2015 -0700

Merge remote branch 'gerrit/master' into fix1

commit 1708cb44511809bf8e63800d226ac8ea34916726

Merge: 1284f4f 31f25ec

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Fri Mar 6 14:55:13 2015 -0800

Merge remote branch 'gerrit/master' into fix1

commit 1284f4f78f1f43657f29899cc8b9a96c02f37815

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Fri Mar 6 14:54:54 2015 -0800

commit: repos and explain, #6

Change-Id: I38d796fd4ab58bf29fa62c4bd5492ce44de9c8a0

commit f639911a1ea4247ae4c297bbbd904b84c7587b3a

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Mon Mar 2 06:31:21 2015 -0800

repos and explain, #5

Change-Id: Ie97ba3cac8dc78784d0a7eef57176e51b9e16224

commit da69fe9595d4ad5c9adf3b32ae8f848a0e6be964

Merge: 9b95b8c a5caeef

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Mon Mar 2 06:31:00 2015 -0800

Merge remote branch 'gerrit/master' into fix1

commit 9b95b8c3240a4186704b8b7f2ff8cefebf211732

Merge: ea2f687 8cd8a92

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Thu Feb 26 08:50:25 2015 -0800

Merge remote branch 'gerrit/master' into fix1

commit ea2f6871b7c7803d0f1e3fa664b474e069a0befd

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Thu Feb 26 08:50:03 2015 -0800

commit: repos and explain, #5

Change-Id: I44d8f87b3652e6300e41302d3a3e88b9b18652a3

commit 522a47eb991df2245005c693e4d034235adbfdc5

Merge: d986988 f42694e

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Wed Feb 25 13:13:02 2015 -0800

Merge remote branch 'gerrit/master' into fix1

commit d98698856a4be479a7603b0fb79210e80e88e1c1

Merge: 5841e47 2aad3ef

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Wed Feb 18 14:00:24 2015 -0800

Merge remote branch 'gerrit/master' into fix1

commit 5841e47f1e3f9a018424298a92cd134f4430e6bf

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Wed Feb 18 13:57:46 2015 -0800

repos and explain, #4

Change-Id: I01f53003069598381743c9be80b405ed37cefe6d

commit e7b5966d8ce653a57b8128c0b24c15fc05737668

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Wed Feb 11 11:28:20 2015 -0800

repos and explain, #3

Change-Id: Iff148ef175a7641ea37e27e60435ccf8f613e6db

commit aba98dc8cfda62d4e708cd892ed626f2a3051eb0

Merge: 66f1926 19a0c32

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Tue Feb 10 14:09:58 2015 -0800

Merge remote branch 'gerrit/master' into fix1

commit 66f1926d279f2b408d436b0aa53a3a6a5388901e

Merge: 9d0ef2a b68f59b

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Mon Feb 9 10:52:35 2015 -0800

Merge remote branch 'gerrit/master' into fix1

Conflicts:

sql/sqlcomp/CmpSeabaseDDLcommon.cpp

sql/sqlcomp/CmpSeabaseDDLtable.cpp

Change-Id: I6348fec3aece2baad9e9570749f43c086416219e

commit 9d0ef2aa01a62ea3e4e0ab35f8ec9687ef3336ef

Merge: aac268d 2cd8fbb

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Fri Feb 6 14:15:49 2015 -0800

Merge remote branch 'gerrit/master' into fix1

commit aac268d097afbe506dcf76de4d8f49652a1d4df7

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Tue Feb 3 16:31:17 2015 -0800

repos and explain, #2

Change-Id: Ia2a50159e577cd6ff8110dca35236fa90ea0d260

commit 2e6160411c8facb448e78d387fa3227e7b759c5d

Merge: c6e2fe5 566706d

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Tue Feb 3 15:21:03 2015 -0800

Merge remote branch 'gerrit/master' into fix1

commit c6e2fe529f9d1e59d3eb26cb145b260d81814443

Author: Anoop Sharma <anoop.sharma@hp.com>

Date: Tue Feb 3 15:16:36 2015 -0800

repository updated and explain changes, #1

Change-Id: I5daf66a2064b46ad65f1200579385965f7f56808

Change-Id: I0211d754524b8c06a05bb7ade20f884ec91340d2

… 58 more files in changeset.
Add new PCode Expression Cache feature.

This new cache is maintained by the SQL Compiler. The purpose of this

cache is to avoid the fairly expensive logic involved in transforming

unoptimized PCode to optimized PCode and, where applicable, to also avoid

the logic involved in transforming optimized PCode to a Native

Expression. This cache is accessed ONLY by the SQL Compiler code.

NOTES:

* This is the second attempt to check in this code. The first attempt had to

be abandoned as other developers made changes which prevented

automatic merging.

* This code has been pre-reviewed by Justin, Qifan, Selva, Mike,

Ravisha, Suresh, and Dave B. Many thanks to them for various

suggestions. Most of those suggestions have been incorporated into

this delivery. A few are left for future improvements.

* There is one instance of this new cache per CmpContext.

* There are 5 new CQDs used to control this cache. To be effective for

all instances of the cache, these need to be set in the system

defaults table. The CQD command given to sqlci will affect only the

instance of the cache for the current CmpContext.

The 5 CQDs are:

PCODE_EXPR_CACHE_ENABLED - set to 0 to disable the cache. Default is 1

PCODE_EXPR_CACHE_SIZE - max size in bytes. Default is 2,000,000.

PCODE_EXPR_CACHE_CMP_ONLY - Compare Only mode - useful to QA and

Development only.

PCODE_EXPR_CACHE_DEBUG - set to 1 to enable debug mode. Default is 0

PCODE_DEBUG_LOGDIR - pathname of existing directory where debug log

files will be placed -- one log file per cache

instance. Log files are designed to be easily

imported into an Excel Spreadsheet. No default.
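
A hedged sqlci sketch of these CQDs (values and the log directory are illustrative):

cqd PCODE_EXPR_CACHE_SIZE '4000000';

cqd PCODE_EXPR_CACHE_DEBUG '1';

cqd PCODE_DEBUG_LOGDIR '/tmp/pcode_logs';

cqd PCODE_EXPR_CACHE_ENABLED '0'; -- disables the cache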

* Also included are a small number of changes to the Native Expressions

feature to (a) Use the new PCODE_DEBUG_LOGDIR cqd to specify where to

put the Native Expressions debug log files, (b) measure cpu-time

rather than wall-clock time for measuring how long it took to produce

a Native Expression, and (c) add a CQD named PCODE_NE_ENABLED so we

can easily disable the Native Expressions feature [though there is

currently no known reason for doing so.]

Change-Id: I58f833f63099743ff6c1107acdff94fe8aef4b70

… 14 more files in changeset.
Snapshot Scan changes

The changes in this delivery include:

-decoupling the snapshot scan from the bulk unload feature. Setup of the

temporary space and folders before running the query and cleanup afterwards

used to be done by the bulk unload operator because snapshot scan was specific

to bulk unload. In order to make snapshot scan independent from bulk unload

and usable in any query, the setup and cleanup tasks are now done by the query

itself at run time (the scan and root operators).

-caching of the snapshot information in NATable to optimize compilation time

Rework for caching: when the user sets TRAF_TABLE_SNAPSHOT_SCAN to LATEST

we flush the metadata and then we set the caching back to on so that metadata

gets cached again. If newer snapshots are created after setting the cqd they

won't be seen if they are already cached, unless the user issues a command/cqd

to invalidate or flush the cache. One way to do that is to issue

"cqd TRAF_TABLE_SNAPSHOT_SCAN 'latest';" again

-code cleanup

Below is a description of the CQDs used with snapshot scan:

TRAF_TABLE_SNAPSHOT_SCAN

this CQD can be set to:

NONE --> (default) Snapshot scan is disabled and regular scan is used.

SUFFIX --> Snapshot scan is enabled for the bulk unload (bulk unload

behavior is not changed)

LATEST --> Snapshot Scan is enabled independently from bulk unload and

the latest snapshot is used if it exists. If no snapshot exists

the regular scan is used. For this phase of the project the user

needs to create the snapshots using hbase shell or other tools.

In the next phase of the project, new commands to create,

delete and manage snapshots will be added.

TRAF_TABLE_SNAPSHOT_SCAN_SNAP_SUFFIX

This CQD is used with bulk unload and its value is used to build the

snapshot name as the table name followed by the suffix string

TRAF_TABLE_SNAPSHOT_SCAN_TABLE_SIZE_THRESHOLD

When the estimated table size is below the threshold (in MBs) defined by

this CQD the regular scan is used instead of snapshot scan. This CQD

does not apply to bulk unload which maintains the old behavior

TRAF_TABLE_SNAPSHOT_SCAN_TIMEOUT

The timeout beyond which we give up trying to create the snapshot scanner

TRAF_TABLE_SNAPSHOT_SCAN_TMP_LOCATION

Location for temporary links and files produced by snapshot scan
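
A hedged sqlci sketch of using snapshot scan outside bulk unload (assumes a snapshot for t1 already exists; the path is illustrative):

cqd TRAF_TABLE_SNAPSHOT_SCAN 'LATEST';

cqd TRAF_TABLE_SNAPSHOT_SCAN_TMP_LOCATION '/bulkload/temp_scan_dir/';

select count(*) from t1;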

Change-Id: Ifede88bdf36049bac8452a7522b413fac2205251

… 44 more files in changeset.
Enable authorization by default for regress, plus

Patch 1:

Added TEST138 to catman1 - skipped files

Fixed wording in the traf_authentication_setup script from reviewer comments.

Original delivery:

change 1 - Enable authorization during development regression tests

change 2 - Added support for create schema IF NOT EXISTS and drop schema IF EXISTS

change 3 - Changed traf_authentication_setup script to support a new installation option

change 1 - Enable authorization during development regression tests

Authorization will be enabled during regression runs

Since regressions run mostly as DB__ROOT, there should be few visible differences.

Developers may see GRANT statements displayed as part of SHOWDDL requests.

This can be controlled by a new CQD: SHOWDDL_DISPLAY_PRIVILEGE_GRANTS

SHOWDDL_DISPLAY_PRIVILEGE_GRANTS

ON - display GRANTS if authorization is enabled

OFF - do not display GRANTS

SYSTEM

if running with SQLMX_REGRESS set, do not display grants

otherwise, display grants
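
A hedged sketch of the new CQD in use (object name is hypothetical):

cqd SHOWDDL_DISPLAY_PRIVILEGE_GRANTS 'OFF';

showddl t1;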

regress/tools/init_sb_regr.sql -- execute initialize authorization

regress/tools/runregr_catman1.ksh -- turn on TEST138

regress/catman1 -- various test and expected files to set the new SHOWDDL CQD

"Initialize authorization, drop;" can be performed to disable authorization

files:

sql/regress/catman1/EXPECTED135

sql/regress/catman1/EXPECTED137

sql/regress/catman1/EXPECTED138

sql/regress/catman1/TEST133

sql/regress/catman1/TEST135

sql/regress/catman1/TEST136

sql/regress/catman1/TEST137

sql/regress/catman1/TEST138

sql/regress/catman1/TEST139

sql/regress/tools/init_sb_regr.sql

sql/regress/tools/runregr_catman1.ksh

sql/sqlcomp/CmpDescribe.cpp

sql/sqlcomp/CmpSeabaseDDLauth.cpp

sql/sqlcomp/DefaultConstants.h

sql/sqlcomp/nadefaults.cpp

change 2: Added support for create schema IF NOT EXISTS and drop schema IF EXISTS

Added support for the new schema syntax. Changed update stats for HIVE tables to use this syntax.
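
A minimal sketch of the new syntax (schema name is hypothetical):

create schema if not exists mysch;

drop schema if exists mysch;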

files:

sql/parser/StmtDDLCreate.cpp

sql/parser/StmtDDLCreateSchema.h

sql/parser/StmtDDLDrop.cpp

sql/parser/StmtDDLDropSchema.h

sql/parser/sqlparser.y

sql/sqlcomp/CmpSeabaseDDL.h

sql/sqlcomp/CmpSeabaseDDLcommon.cpp

sql/sqlcomp/CmpSeabaseDDLschema.cpp

sql/ustat/hs_globals.cpp

change 3: Changed traf_authentication_setup script

This file was changed to support a new option "--setup" that only enables authentication

This will be used by the installation script when the customer chooses not to

initialize trafodion.

sqf/sql/scripts/traf_authentication_setup

traf_authentication_setup --help

This script enables or disables security features for Trafodion

Usage: traf_authentication_setup [options]

Options:

--file <loc> Optional location of the OpenLDAP configuration file

--help Prints this message

--off Disables authentication and authorization

--on Enables authentication and authorization

--setup Enables authentication

--status Returns status of authentication enablement

Change-Id: Ia9a66364a6d74955a0833088874e0aaca044eae3

… 24 more files in changeset.
C++ run-time interface for TMUDFs

blueprint cmp-tmudf-compile-time-interface

- Support for C++ run-time interface:

- A new language, C++, is added to langman; the existing

LanguageManagerC handles both C and C++

- Two new parameter styles were added, the C++ and Java

object-oriented parameter styles. Routines written in C++

use the new object-oriented C++ parameter style. The compiler

interface is only supported for that style (and in the future

for the Java object-oriented style).

- Also added one more compile time interface, the "completeDescription()"

call in the generator. Added logic to extract the UDRPlanInfo of

the optimal plan.

- Changes to UDRInvocationInfo and UDRPlanInfo classes:

- UDRInvocationInfo and UDRPlanInfo objects can now be serialized

and they are added to generated plans, as part of the UDR TDB.

- Split TableInfo into TupleInfo and TableInfo classes. TupleInfo

is now the common base class for describing both parameters and

input/output tables.

- TypeInfo now has offsets for data, null indicator and varchar

indicator.

- New get<type> and set<type> methods on class TupleInfo, to be

used at compile time for parameters and at runtime for parameters,

input and output tables.

- Added a "call phase" member, to be able to throw exceptions when

certain methods are called at the wrong time (e.g. trying to modify

compile time members at runtime).

- Routine class in langman now has a new subclass, LmRoutineCppObj

and a new method, invokeRoutineMethod, that is used to invoke

the object-oriented methods, requiring UDRInvocationInfo and

UDRPlanInfo as parameters.

- Fixed some executor issues with error handling for UDFs; this is

still not very well supported

- Emitting the EOD row in the UDF is no longer required, and no longer

supported or even possible.

- UDRPlanInfo is now part of the physical properties, so that we

can extract it from the optimal plan.

- Disabling TMUDF as the inner of a nested join - for now.

We might support this "routine join" at a later time.

- regress/udr/TEST001:

- SESSIONIZE_STATIC remains in C, but other TMUDFs are now

rewritten in C++ (the runtime part that was not yet in C++)

- SESSIONIZE_DYNAMIC is now the same as the example on the wiki

- regress/udr/TEST002: Added some tests for event log reader UDF,

but can't add the part that copies a sample log file, since

in Jenkins, we don't have $MY_SQROOT set. Tried the test on my

workstation, though. Steve tells me $MY_SQROOT should be available,

so in a future checkin I'll enable this code again.

- For patch set 2: Removed fix for LP bug 1420539 and addressed

other review comments.
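
A hedged sqlci sketch of invoking such a TMUDF (names and arguments loosely follow the sessionize example and are assumed, not verified):

select * from UDF(sessionize_dynamic(TABLE(select * from clicks partition by userid order by ts), 'TS', 60));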

Change-Id: I008ad68a8f25f1aaee94e1c45bbf097a267129bb

… 73 more files in changeset.
Merge "Change to avoid placing large scan results in RegionServer cache"

… 9 more files in changeset.
Remove code and cqds related to Thrift interface

The ExpHbaseInterface_Thrift class was removed a few months ago; this completes

that cleanup work. exp/Hbase_types.{cpp,h} still remain. These are

Thrift-generated files, but we use the structs/classes they define for JNI access.

Change-Id: I7bc2ead6cc8d6025fb38f86fbdf7ed452807c445

… 19 more files in changeset.
Change to avoid placing large scan results in RegionServer cache

By default the result of every Scan and Get request to HBase is placed in

the RegionServer cache. When a scan returns a lot of rows this can lead to

cache thrashing, causing results which are being shared by other queries

to be flushed out. This change uses cardinality estimates and hbase row size

estimates, along with the configured region server cache size, to

determine when such thrashing may occur. Heap size for region server is

specified through a cqd HBASE_REGION_SERVER_MAX_HEAP_SIZE. The units are in MB.

The fraction of this heap allocated to block cache is read from the config

file once per session through a lightweight (no disk access) JNI call. The

heuristic used is approximate, as it does not consider the total number of region

servers, or that sometimes a scan may be concentrated in one or a few region

servers. We simply do not place rows in the RS cache if the memory used by all

rows in a scan would exceed the cache in a single RS. The change can be overridden

with cqd HBASE_CACHE_BLOCKS 'ON'. The default is now SYSTEM. Change applies

to both count(*) coproc plans and regular scans.
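
A hedged sqlci sketch of the CQDs involved (the heap size, in MB, is illustrative):

cqd HBASE_REGION_SERVER_MAX_HEAP_SIZE '8192';

cqd HBASE_CACHE_BLOCKS 'ON'; -- override; default is now SYSTEM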

Change-Id: I0afc8da44df981c1dffa3a77fb3efe2f806c3af1

… 20 more files in changeset.
Bulk unload optimization using snapshot scan

resubmitting after facing git issues

The changes consist of:

*implementing the snapshot scan optimization in the Trafodion scan operator

*changes to the bulk unload changes to use the new snapshot scan.

*Changes to scripts and permissions (using ACLS)

*Rework based on review

Details:

*Snapshot Scan:

----------------------

**Added support for snapshot scan to Trafodion scan

**The scan expects the hbase snapshots themselves to be created before running

the query. When used with bulk unload, the snapshots can be created by bulk unload.

**The snapshot scan implementation can be used without the bulk-unload. To use

the snapshot scan outside bulk-unload we need to use the below cqds

cqd TRAF_TABLE_SNAPSHOT_SCAN 'on';

-- the snapshot name will be the table name concatenated with the suffix-string

cqd TRAF_TABLE_SNAPSHOT_SCAN_SNAP_SUFFIX 'suffix-string';

-- temp dir needed for the hbase snapshot scan

cqd TRAF_TABLE_SNAPSHOT_SCAN_TMP_LOCATION '/bulkload/temp_scan_dir/';

**snapshot scan can be used with table scans, index scans, etc.

*Bulk unload utility:

-------------------------------

**The bulk unload optimization is due to the newly added support for snapshot scan.

By default bulk unload uses the regular scan. But when snapshot scan is

specified it will use snapshot scan instead of regular scan

**To use snapshot scan with bulk unload we need to specify the new options in

the bulk unload syntax: NEW|EXISTING SNAPSHOT HAVING SUFFIX QUOTED_STRING

***using NEW in the above syntax means the bulk unload tool will create new

snapshots, while using EXISTING means bulk unload expects the snapshots to

exist already.

***The snapshot names are based on the table names in the select statement. The

snapshot name needs to start with the table name and have a suffix QUOTED-STRING

***For example, for "unload with NEW SNAPSHOT HAVING SUFFIX 'SNAP111' into 'tmp'

select * from cat.sch.table1;" the unload utility will create a snapshot

CAT.SCH.TABLE1_SNAP111, and for "unload with EXISTING SNAPSHOT HAVING SUFFIX

'SNAP111' into 'tmp' select * from cat.sch.table1;" the unload utility will

expect a snapshot CAT.SCH.TABLE1_SNAP111 to exist already; otherwise

an error is produced.

**If these newly added options are not used in the syntax, bulk unload will use

the regular scan instead of snapshot scan

**The bulk unload queries the explain plan virtual table to get the list of

Trafodion tables that will be scanned, and based on the case it either creates

the snapshots for those tables or verifies whether they already exist.

*Configuration changes

--------------------------------

**Enable ACLs in hdfs


*Testing

--------

**All developer regression tests were run and all passed

**bulk unload and snapshot scan were tested on the cluster

*Examples:

**Example of using snapshot scan without bulk unload:

(we need to create the snapshot first)

>>cqd TRAF_TABLE_SNAPSHOT_SCAN 'on';

--- SQL operation complete.

>>cqd TRAF_TABLE_SNAPSHOT_SCAN_SNAP_SUFFIX 'SNAP777';

--- SQL operation complete.

>>cqd TRAF_TABLE_SNAPSHOT_SCAN_TMP_LOCATION '/bulkload/temp_scan_dir/';

--- SQL operation complete.

>>select [first 5] c1,c2 from tt10;

C1 C2

--------------------- --------------------

.00 0

.01 1

.02 2

.03 3

.04 4

--- 5 row(s) selected.

**Example of using snapshot scan with unload:

UNLOAD

WITH PURGEDATA FROM TARGET

NEW SNAPSHOT HAVING SUFFIX 'SNAP778'

INTO '/bulkload/unload_TT14_3' select * from seabase.TT20 ;

Change-Id: Idb1d1807850787c6717ab0aa604dfc9a37f43dce

… 35 more files in changeset.
port skew buster to Trafodion

1. add simplified TEST062

2. reuse cached partitioning expression when only doVarCharCast

is the same as when the expression was created in

TableHashPartitioningFunction::createPartitioningExpressionImp().

3. rework

4. fix a time monitor bug reporting incorrect processor time (CPU time

computed from clock() calls)

5. comment out the assert on the size of NATable cache not decreasing.

This is to fix regression failure with seabase/TEST020. Selva will check

in a formal fix that allocates space for NATable objects from a single

heap, and then reenables this assert check.

Change-Id: I9eeee4f36ba8678e90e0ac68a85bfc733599d932

… 26 more files in changeset.