various lp and other fixes, details below.

-- added support for self referencing constraints

-- limit clause can now be specified as a param

(select * from t limit ?)

-- lp 1448261. alter table add identity col is not allowed and now

returns an error

-- error is returned if a specified constraint in an alter/create statement

exists on any table

-- lp 1447343. cannot have more than one identity column.

-- embedded compiler is now used to get priv info during invoke/showddl.

-- auth info is not reread if already initialized

-- sequence value function is now cacheable

-- lp 1448257. inserts in volatile table with identity column now work

-- lp 1447346. inserts with identity col default now work if inserted

in a salted table.

-- only one compiler is now needed to process ddl operations with or

without authorization enabled

-- query cache in embedded compiler is now cleared if user id changes

-- pre-created default schema 'SEABASE' can no longer be dropped

-- default schema 'SCH' is automatically created if running regressions

and it doesn't exist.

-- improvements in regressions run.

-- regression runs no longer call a script from another sqlci session

to init auth, create default schema

and insert into defaults table before every regr script

-- switched the order of regression runs

-- updates from review comments.

Change-Id: Ifb96d9c45b7ef60c67aedbeefd40889fb902a131

Phase 2 for log reader TMUDF

blueprint cmp-tmudf-compile-time-interface

Log reader TMUDF is mostly working now.

Still need to set cqd NUM_PARALLEL_ESPS '<num of nodes>' on clusters.

Still needs more work and more testing.

Still seeing some issues with non-ASCII characters.

// SQL Syntax to invoke this function:


// select * from udf(event_log_reader( [options] ));


// The optional [options] argument is a character constant. The

// following options are supported:

// f: add file name output columns (see below)

// t: turn on tracing

// d: loop in the runtime code, to be able to attach a debugger

// (debug build only)

// p: force parallel execution on workstation environment with

// virtual nodes (debug build only)

More detailed explanation of changes:

- PredefUdrReadfile.cpp: Work on event log reader TMUDF

- sqludr.*: New method to add formal parameters, allows TMUDF to

accept optional parameters.

- OptPhysRelExpr.cpp:

Made some changes for TMUDFs with arity 0 to avoid asserts

and to be able to call okToAttemptESPParallelism in method

RelExpr::synthPhysicalProperty(). This is needed for leaf

operators (arity 0) that want to initiate parallel execution

and TMUDFs seem to be first in that situation.

Changed TableMappingUDF::synthPhysicalProperty to generate

a partitioning function with multiple partitions (and no

partitioning key, so far) if required.

- ExUdr.cpp,








Addressed review comments from last phase, got rid of ALLOW_UDF CQD

- Rel*.h


OptPhysRelExpr.cpp (has other changes as well)

Simple but messy change to add one more parameter to


Change-Id: I5549e47c0f019beefd4ec1695ae7abf8c3bd43e3

Computed column key predicates for MDAM

Moving the generation of computed column predicates out of the

SearchKey logic and making it available as a static method on

class ScanKey. This allows us to compute these predicates before

we create the Disjuncts data structure that is used in a file

scan, where it will go into a SearchKey or an MdamKey.

Also fixing a bug that stopped after the first predicate found

on a computed column, so it failed to produce both a begin and

an end key value when selecting a range of values

(removed a "break" in ScanKey::createComputedColumnPredicates)
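To illustrate the effect of the removed "break", here is a minimal Python sketch. It is not the actual ScanKey code: the predicate tuples, the function name derive_computed_preds and the stop_after_first flag are hypothetical, and the computed column expression is assumed to be monotonically non-decreasing.

```python
# Hypothetical model: a predicate is a (column, operator, constant) tuple.
# A strict comparison on the base column loosens to a non-strict one on the
# computed column, because a non-decreasing expression can map different
# inputs to the same output.
LOOSENED = {">": ">=", ">=": ">=", "<": "<=", "<=": "<=", "=": "="}


def derive_computed_preds(preds, base_col, comp_col, expr, stop_after_first=False):
    """Derive predicates on comp_col = expr(base_col) from predicates on base_col."""
    derived = []
    for col, op, value in preds:
        if col == base_col and op in LOOSENED:
            derived.append((comp_col, LOOSENED[op], expr(value)))
            if stop_after_first:
                # Models the old behavior: stop at the first predicate found.
                break
    return derived


# Range predicate on "ts"; the computed column is assumed to be ts // 100.
range_preds = [("ts", ">=", 1200), ("ts", "<=", 1799)]
old = derive_computed_preds(range_preds, "ts", "div", lambda v: v // 100, True)
new = derive_computed_preds(range_preds, "ts", "div", lambda v: v // 100)
```

With the early exit, only the begin-key predicate on the computed column is derived; without it, both the begin and the end predicate are available.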

Change set 2: Addressed reviewer comments. Moved computation of

computed preds to Scan::addIndexInfo, and moved the ValueIdSet

that stores these preds from FileScan to Scan.

Change-Id: I4297d789ded8522eb67d5441ac281657ff90e774

Support for divisioning (multi-temperature data)

This is the initial support for divisioning. See

blueprint cmp-divisioning for more information:

Also, this change fixes the following LaunchPad bugs:

Bug 1388458 insert using primary key default value into a salted

table asserts in generator

Bug 1385543 salt clause on a table with large number of primary

key columns returns error

Bug 1392450 Internal error 2005 when querying a Hive table with

an unsupported data type

In addition, it changes the following behavior:

- The _SALT_ column now gets added as the last column in the

CREATE TABLE statement, rather than the first column after

SYSKEY. The position of _SALT_ in the clustering key does

not change. This will cause some differences in INVOKE and

in the column number assigned to columns.

- For CREATE TABLE LIKE, the defaults of the WITH clauses

are changing. CREATE TABLE LIKE now copies constraints,

SALT and DIVISION clauses by default. The WITH CONSTRAINTS

clause is now the default and should no longer be used.


DIVISIONING clauses are supported.

- For CREATE INDEX ... SALT LIKE TABLE, we now give a

warning instead of an error if the table is not salted.

- Also added an optimization for BETWEEN predicates. If

part or all of them can be converted to an equals predicate,

we do this now. Example:

(a,b,c,d) between (1,2,3,4) and (1,2,5,6)

is transformed into

a=1 and b=2 and (c,d) between (3,4) and (5,6).
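The BETWEEN rewrite above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code in NormItemExpr.cpp; the function name split_between and the tuple representation are assumptions, and NULL handling is ignored.

```python
def split_between(cols, low, high):
    """Split (cols) BETWEEN (low) AND (high) into equality predicates on the
    leading columns whose bounds are equal, plus a shorter BETWEEN on the rest.

    Returns (equals, rest): equals is a list of (column, value) pairs and
    rest is (cols, low, high) for the remaining columns, or None if every
    column turned into an equality predicate.
    """
    n = 0
    while n < len(cols) and low[n] == high[n]:
        n += 1
    equals = list(zip(cols[:n], low[:n]))
    rest = (cols[n:], low[n:], high[n:]) if n < len(cols) else None
    return equals, rest


equals, rest = split_between(("a", "b", "c", "d"), (1, 2, 3, 4), (1, 2, 5, 6))
# equals holds a=1 and b=2; rest describes (c,d) between (3,4) and (5,6)
```

The rewrite is valid because row-value comparison is lexicographic: as long as the low and high bounds agree on a leading column, that column can only take that one value.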

More detailed description of changes:

- arkcmp/CmoStoredProc.cpp


+ other files

Using the new FLAGS column in the COLUMNS metadata table to store

whether a column is a salt or divisioning column. Note that since

there may be existing salted tables without this flag set, the flag

is so far only reliable for divisioning columns.

- comexe/ComTdb.h




Changed the column class field in struct

ComTdbVirtTableColumnInfo from a string to the corresponding

enum. Sorry, this caused lots of small changes (deleting "_LIT"

from the initializers). Also added the column flags.

- executor/hiveHook.cpp: Added a check for partitioned tables

(having multiple SDs). This is part of the fix for

bug 1353632.

- GenRelUpdate.cpp: When generating the key encoding expression

for an insert inside a MERGE operation, we assumed the new

record expression was in the order of the key columns. Added

a step to sort by key column, so we can pass the expression

in any order.

- optimizer/ItemExpr.cpp


Added a named NATypeToItem item expression.

This is used to do a primitive "bind" operation of an item expression

when processing a DDL statement. Specifically, to bind the DIVISION BY

clause in a CREATE TABLE statement.

- optimizer/ItemFunc.h

optimizer/SynthType.cpp: The DDL time "binder" gets expressions as

they come out of the parser, e.g. a ZZZBinderFunction. Need to add

type synthesis for some cases of the ZZZBinderFunction.

- optimizer/NATable.cpp

Removing some dead code. Adding an error message when we encounter

a Hive column type we can't handle yet. Bug 1392450.

- optimizer/TableDesc.*

Method TableDesc::validateDivisionByClauseForDDL() got moved

to CmpSeabaseDDL::validateDivisionByExprForDDL().

- optimizer/NormItemExpr.cpp

BETWEEN transformation described above.

- optimizer/ValueDesc.cpp

Avoid hard-coding the "_SALT_" name. Added a comment about the

possibility of using the flag in the future.

- parser

Lots of small changes for salt and divisioning option changes.

Simplifying the syntax for salt options somewhat. I think the older

syntax was so complex because it needed to record the starting and

ending position of the divisioning clause, something we don't need


- regress: Adding new test

- sqlcomp/CmpDescribe.cpp: Support for describing DIVISION BY clause

and also supporting the new WITHOUT SALT | DIVISION options

for CREATE TABLE LIKE, which relies on the describe feature.

- sqlcomp/CmpSeabaseDDLcommon.cpp


+ Handling the new column flags and making sure they are not

confused with the HBase column flags (e.g. for serialization).

+ Setting the new COLUMNS.FLAGS when writing metadata.

+ Also, writing the computed column text to the TEXT table.

+ For DROP TABLE, unconditionally deleting TEXT rows, since the

table could contain computed columns.

+ When building ColInfoArray, check system column flags, since

system columns can now appear at any position.

+ Add method to "bind" an item expression during DDL processing

without going through the full binder. This replaces any column

reference with a named NATypeToItem node, since all we really

need is the type and the name for unparsing.

+ Method TableDesc::validateDivisionByClauseForDDL() got moved

to CmpSeabaseDDL::validateDivisionByExprForDDL() with some minor

adjustments, since it used to be called on a bound ItemExpr, now

it gets called on something that came out of the parser and went

through the DDL time "binder".

- sqlcomp/CmpSeabaseDDLindex.cpp:

Support for CREATE INDEX ... DIVISION LIKE TABLE. If this is

set, add the division columns in front of the index key, otherwise


- sqlcomp/CmpSeabaseDDLtable.cpp:

+ Code to make sure column flags and column class is set and propagated.

+ Fix for bug 1385543: Now that we use the TEXT table for computed

column text, we no longer have a length limit. This is true for both

divisioning and salt expressions.

+ When processing the column list in seabaseCreateTable() we have a

bit of a chicken and egg problem: We need the column list to validate

the DIVISION BY expressions, but the DIVISION BY columns need to be part

of the column list. So, we do this a first time without divisioning

columns, then we add those, and produce the final list in a second


+ getTextFromMD method now takes a sub-id as an input parameter. That's

the column number for computed column text.

+ read computed column text from the TEXT table. Note: This also needs

to handle older tables where the computed column text is stored in

the default value.

Change-Id: I7c3ebe39a950c1d01f31855bdc92cbb98e5eb275

Index-join scan trimming heuristics rework II

Change-Id: I9cf63be7967a012559e27dc4ca950bb28b8ccd3b

Changes to support OSS poc.

This checkin contains multiple changes that were added to support OSS poc.

These changes are enabled through a special cqd mode_special_4 and not

yet externalized for general use.

A separate spec contains details of these changes.

These changes have been contributed and pre-reviewed by Suresh, Jim C,

Ravisha, Mike H, Selva and Khaled.

All dev regressions have been run and passed.

Change-Id: I2281c1b4ce7e7e6a251bbea3bf6dc391168f3ca3

Bug 1343615: Duplicated rows for parallel scan on salted table

- In preCodeGen, add partitioning key predicates to scan node

if it uses a single subset key and HASH2 partitioning function

- Handle partitioning key preds in FileScan::preCodeGen, move

code from HbaseAccess::preCodeGen.

- Make a special partitioning key predicate for salted tables

with a HASH2 function: "_SALT_" between :pivLo and :pivHi

This will lead to an efficient access path for the ESP to read

only the data it is supposed to read.

- Salted tables have a HASH2 partitioning key that does not

include the "_SALT_" column. So, the partitioning key is

not a prefix of the clustering key. However, we need to apply

the partitioning key predicates to the clustering key of the

table, since that's the only key we have. This is different

from a "partition access" node. See

TableHashPartitioningFunction::createSearchKey() in file


- Moved a method to create a partitioning key predicate of

the form <some expr> between :pivLo and :pivHi up in the

class hierarchy, to be able to use it in both

HashPartitioningFunction and TableHashPartitioningFunction
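As a rough illustration of why a predicate of the form "_SALT_" between :pivLo and :pivHi avoids duplicate rows, the following Python sketch hands each ESP a contiguous, non-overlapping range of salt values. The function name salt_ranges and the even-split formula are assumptions made for this example; the actual assignment done by the partitioning function may differ.

```python
def salt_ranges(num_salt_buckets, num_esps):
    """Return one inclusive (piv_lo, piv_hi) salt range per ESP.

    The ranges are contiguous and do not overlap, so every salt value is
    read by exactly one ESP. If there are more ESPs than salt buckets,
    some ESPs get an empty range (piv_hi < piv_lo).
    """
    ranges = []
    for esp in range(num_esps):
        piv_lo = esp * num_salt_buckets // num_esps
        piv_hi = (esp + 1) * num_salt_buckets // num_esps - 1
        ranges.append((piv_lo, piv_hi))
    return ranges


def esps_reading(salt_value, ranges):
    """List the ESPs whose range contains the given salt value."""
    return [i for i, (lo, hi) in enumerate(ranges) if lo <= salt_value <= hi]


ranges = salt_ranges(10, 4)
```

Because "_SALT_" is the leading clustering key column in this scenario, each range translates into a single key range that the ESP can scan directly.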

Also added support for the KeyPrefixRegionSplitPolicy. This might

be useful in the future when we push things like GROUP BY into the

region servers. It can ensure that keys with the same prefix of a

given length stay in the same region. Example for a table with

this split policy:

-- make sure all the line items for an order (first 4 bytes of the key)

-- stay within the same region

create table lineitems(orderno int not null,

lineno int not null,

comment char(10),

primary key (orderno, lineno))

hbase_options ( SPLIT_POLICY = 'org.apache.hadoop.hbase.regionserver.KeyPrefixRegionSplitPolicy',


Removed ENCODE_ON_DISK table property because it is deprecated and does nothing.

Patch set 2: Changed comment in ExpHbaseDefs.h.

Change-Id: I57fafe2f854475261313abcf5bd2c81013f43756

Launchpad fixes_2

-- hbase access errors were not being returned during process bringup phase.

arkcmp/CmpContext.*, bin/SqlciErrors.txt

sqlcomp/CmpSeabaseDDLcommon.cpp, CmpSeabaseDDLtable.cpp, nadefaults.cpp

launchpad #1343061

-- showddl now shows hbase options, salted partitions.

-- hbase_options and salt info is kept in metadata.

comexe/ComTdb.h, optimizer/NATable.cpp, NATable.h


sqlcat/desc.h, sqlcomp/CmpDescribe.cpp, CmpSeabaseDDLcommon.cpp

CmpSeabaseDDL.h, CmpSeabaseDDLTable.cpp

launchpad #1342465

launchpad #1342996

-- added support for update where current of cursor.

generator/GenRelScan.cpp, GenRelUpdate.cpp,

launchpad #1324679

-- check constrs in update and views

generator/GenRelUpdate.cpp, optimizer/BindRelExpr.cpp

Launchpad #1320034, #1321479

-- long column name returned assertion.


launchpad #1324689

-- purgedata does not recreate with the original salt and hbase options.

optimizer/RelExeUtil.cpp, sqlcomp/CmpSeabaseDDLcommon.cpp

launchpad #1322400

-- long keys got assertion failure:


launchpad #1332607

-- cqd hide_indexes was causing constraint creation to fail.


launchpad #1340385

-- current_timestamp/time functions are now non-nullable.

Change-Id: Ib3a071894d11d0e3719b98f0cfddfc5ce8624519

Merge "Bug 1315567 bug in salted tables with descending VARCHAR columns"

Bug 1315567 bug in salted tables with descending VARCHAR columns

This fixes several issues related to VARCHARs, UCS2 and UTF8 chars and

varchars, when used in the SALT clause. Most of the issues only

occur when using a DESCending key column, since that involves non-ASCII

characters. Summary of changes:

- The logic involves creating string literals for min/max values of

columns. Create those literals as UTF8 character strings, so that we

can represent characters of non-ASCII or ISO character sets.

- When generating the max value for a UTF8 column, generate valid UTF8

characters (0xFF does not occur in valid UTF8 and causes an error when

converting it while it's used in expressions).

- Change the lexer to use something other than the max UCS2 character

0xFFFF as the EOF constant. Use a non-allowed UCS2 character instead.

- Fix some issues with the DECODE function when it operates on varchars.

Pass a separate pointer to the varchar length field of the result, like

it is done for other expressions. Note, this is not done for the operand.
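The point about 0xFF bytes can be checked with a short Python sketch. It assumes the highest Unicode code point U+10FFFF as the "max" character purely for illustration; the commit does not state which character the new code actually generates, and the helper names are hypothetical.

```python
def max_utf8_value(num_chars, max_code_point=0x10FFFF):
    """Build a 'max' string of num_chars characters as valid UTF-8 bytes.

    max_code_point is an assumption for this sketch, not the value used
    by the real min/maxRepresentableValue methods.
    """
    return (chr(max_code_point) * num_chars).encode("utf-8")


def is_valid_utf8(data):
    """Return True if the byte string decodes as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


old_style = b"\xff" * 4          # all-0xFF filler, never valid in UTF-8
new_style = max_utf8_value(1)    # four bytes encoding one valid character
```

A byte-wise maximum made of 0xFF cannot be converted or fed back into a parser that expects UTF-8, while a string built from valid characters can.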

More detailed description of changes for reviewers:


- The min/maxRepresentableValue method now generates a string literal

that should be ready to feed to the parser. This string literal is

always in UTF-8 and it uses a charset prefix to indicate the actual

type's charset. Example: _UCS2'abc'. The old code returned the actual

string (no quotes) in the type's charset. That value is still available.

- When creating the max representable value for UTF8, generate valid

UTF8 characters, not 0xFF bytes. This allows us to feed the max value

back into the parser.

- New virtual method to create an equivalent char type from a varchar


- in decodeKeyValue, pass in a separate pointer to the varchar length field,

like it is done for other function evaluators. This is for the result

varchar length field (the source is really a string of bytes).

- removed an unused function to avoid having to change it


- Now that NAType::min/maxRepresentableValue() methods return a literal

that can be parsed, there is no more need to create a parsable string


- When creating SQL expressions, make sure they are created in UTF8, to

be able to use non-ISO88591 characters.


- Min key values with zeroes in them didn't get copied, including length

fields that were multiples of 256.


- When converting a binary region boundary value into text, use UTF-8

as the target charset


- Handle case where text boundary values are not specified and make

most conservative assumption in that case


- This caller of NAType::min/maxRepresentableValue needed to be changed

for the new behavior of these methods (use the string buffer in the

type's charset instead of the UTF-8 string literal)


- When parsing max values for UCS2 character, the lexer saw 0xFFFF characters

and interpreted those as EOF. Changed the EOF character to some value

that is not valid UCS2, according to the standard.

7/20: Rework for comments from Qifan and Dave.

Change-Id: If4aa698393d0f204c839efe40087a1696069d277

Enables HASH2 partitioning representation of salted tables + 2 reworks.

Change-Id: Id2c32fd99a18dea0c4384e49356e67502eaf4127

Closes-Bug: #1336450

Various Launchpad and other fixes.

-- metadata and statistics tables will no longer be created with

serialization attribute even if cqd hbase_serialization is set to ON.

also, reenabled HBASE_SERIALIZATION for regressions run

(sqlcomp/CmpSeabaseDDLcommon.cpp, regress/tools/sbdefs)

-- rowwise hbase rows from native hbase tables are now being created correctly

in all cases.


exp/exp_function.*, optimizer/BindItemExpr.cpp, ItemExpr.cpp

-- IUD and SELECT execution state is now being correctly initialized at the

beginning of a run. Multiple executions were failing otherwise.

(executor/ExHbaseIUD.cpp, ExHbaseSelect.cpp)

-- sign is now allowed in an interval literal

(generator/GenItemFunc.cpp, GenRelScan.cpp, ItemFunc.h, ValueDesc.cpp)

-- location value being returned from updates was not being set correctly in

some cases. That has been fixed


-- self referencing updates were not returning the right values due to

the Halloween problem. It is fixed by transforming the update into insert/delete.


-- purgedata now returns an error if issued on hbase, hive, neoview tables,

or on a view.

(optimizer/RelExeUtil.cpp, sqlcomp/CmpSeabaseDDLcommon.cpp)

-- referencing and referenced columns in a foreign key are now enforced

to have the same datatype attributes


-- drop schema now works with delimited schema names

-- in some cases, a create constraint failure was not dropping the table on

which the constraint was being created.

that has been fixed.


-- some additional infra changes for Traf as a mysql storage engine.

(cli/*, executor/ExExeUtilCli.*)
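For the self-referencing update item above, this Python simulation sketches why an in-place update of a key column can see its own changes, and why splitting the statement into a delete and an insert avoids that. The table model (a sorted list of unique keys), the function names and the statement "update t set k = k + delta where k < bound" are all assumptions for illustration, not the executor's implementation.

```python
import bisect


def update_in_place(keys, delta, bound):
    """Scan in key order and move each qualifying row immediately.

    A moved row lands ahead of the scan position and is visited again,
    so it can be updated more than once.
    """
    table = sorted(keys)
    pos = float("-inf")
    while True:
        i = bisect.bisect_right(table, pos)  # next key after scan position
        if i == len(table):
            break
        pos = table[i]
        if pos < bound:
            table.pop(i)
            bisect.insort(table, pos + delta)
    return table


def update_as_delete_insert(keys, delta, bound):
    """Select all qualifying rows first, delete them, then insert new rows."""
    table = sorted(keys)
    selected = [k for k in table if k < bound]
    remaining = [k for k in table if k >= bound]
    return sorted(remaining + [k + delta for k in selected])


wrong = update_in_place([1, 12], 10, 25)          # rows get updated repeatedly
right = update_as_delete_insert([1, 12], 10, 25)  # each row updated once
```

In the in-place variant the two rows keep moving until they no longer satisfy the predicate; the delete/insert variant updates each original row exactly once.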

Change-Id: I94d5eb13c826efdf44ba10c04ac52a671f86553e

Source file cleanup

Change-Id: Iba4e0059e8a0f996ea18f8089a87338bfe1930a9

