Clone
Hans Zeller <hans.zeller@hp.com>
committed
on 26 Nov 14
Support for divisioning (multi-temperature data)
This is the initial support for divisioning. See
blueprint cmp-divisioning for more informa… Show more
Support for divisioning (multi-temperature data)

This is the initial support for divisioning. See

blueprint cmp-divisioning for more information:

https://blueprints.launchpad.net/trafodion/+spec/cmp-divisioning

Also, this change fixes the following LaunchPad bugs:

Bug 1388458 insert using primary key default value into a salted

           table asserts in generator

Bug 1385543 salt clause on a table with large number of primary

           key columns returns error

Bug 1392450 Internal error 2005 when querying a Hive table with

           an unsupported data type

In addition, it changes the following behavior:

- The _SALT_ column now gets added as the last column in the

 CREATE TABLE statement, rather than the first column after

 SYSKEY. The position of _SALT_ in the clustering key does

 not change. This will cause some differences in INVOKE and

 in the column number assigned to columns.

- For CREATE TABLE LIKE, the defaults of the WITH clauses

 are changing. CREATE TABLE LIKE now copies constraints,

 SALT and DIVISION clauses by default. The WITH CONSTRAINTS

 clause is now the default and should no longer be used.

 Instead, WITHOUT CONSTRAINTS, WITHOUT SALT and WITHOUT

 DIVISIONING clauses are supported.

- For CREATE INDEX ... SALT LIKE TABLE, we now give a

 warning instead of an error if the table is not salted.

- Also added an optimization for BETWEEN predicates. If

 part or all of them can be converted to an equals predicate,

 we do this now. Example:

 (a,b,c,d) between (1,2,3,4) and (1,2,5,6)

 is transformed into

 a=1 and b=2 and (c,d) between (3,4) and (5,6).

More detailed description of changes:

- arkcmp/CmoStoredProc.cpp

 sqlcat/desc.h

 + other files

 Using the new FLAGS column in the COLUMNS metadata table to store

 whether a column is a salt or divisioning column. Note that since

 there may be existing salted tables without this flag set, the flag

 is so far only reliable for divisioning columns.

- comexe/ComTdb.h

 comexe/*.h

 generator/Generator.cpp

 sqlcomp/CmpSeabaseDDLmd.h:

 Changed the column class field in struct

 ComTdbVirtTableColumnInfo from a string to the corresponding

 enum. Sorry, this caused lots of small changes (deleting "_LIT"

 from the initializers). Also added the column flags.

- executor/hiveHook.cpp: Added a check for partitioned tables

 (having multiple SDs). This is part of the fix for

 bug 1353632.

- GenRelUpdate.cpp: When generating the key encoding expression

 for an insert inside a MERGE operation, we assumed the new

 record expression was in the order of the key columns. Added

 a step to sort by key column, so we can pass the expression

 in any order.

- optimizer/ItemExpr.cpp

 optimizer/ItemNAType.h:

 Added a named NATypeToItem item expression.

 This is used to do a primitive "bind" operation of an item expression

 when processing a DDL statement. Specifically, to bind the DIVISION BY

 clause in a CREATE TABLE statement.

- optimizer/ItemFunc.h

 optimizer/SynthType.cpp: The DDL time "binder" gets expressions as

 they come out of the parser, e.g. a ZZZBinderFunction. Need to add

 type synthesis for some cases of the ZZZBinderFunction.

- optimizer/NATable.cpp

 Removing some dead code. Adding an error message when we encounter

 a Hive column type we can't handle yet. Bug 1392450.

- optimizer/TableDesc.*

 Method TableDesc::validateDivisionByClauseForDDL() got moved

 to CmpSeabaseDDL::validateDivisionByExprForDDL().

- optimizer/NormItemExpr.cpp

 BETWEEN transformation described above.

- optimizer/ValueDesc.cpp

 Avoid hard-codeing the "_SALT_" name and adding a comment about

 possibility to use the flag in the future.

- parser

 Lots of small changes for salt and divisioning option changes.

 Simplifying the syntax for salt options somewhat. I think the older

 syntax was so complex because it needed to record the starting and

 ending position of the divisioning clause, something we don't need

 anymore.

- regress: Adding new test

- sqlcomp/CmpDescribe.cpp: Support for describing DIVISION BY clause

 and also supporting the new WITHOUT SALT | DIVISION options

 for CREATE TABLE LIKE, which relies on the describe feature.

- sqlcomp/CmpSeabaseDDLcommon.cpp

 sqlcomp/CmpSeabaseDDL.h

 + Handling the new column flags and making sure they are not

   confused with the HBase column flags (e.g. for serialization).

 + Setting the new COLUMNS.FLAGS when writing metadata.

 + Also, writing the computed column text to the TEXT table.

 + For DROP TABLE, unconditionally deleting TEXT rows, since the

   table could contain computed columns.

 + When building ColInfoArray, check system column flags, since

   system columns can now appear at any position.

 + Add method to "bind" an item expression during DDL processing

   without going through the full binder. This replaces any column

   reference with a named NATypeToItem node, since all we really

   need is the type and the name for unparsing.

 + Method TableDesc::validateDivisionByClauseForDDL() got moved

   to CmpSeabaseDDL::validateDivisionByExprForDDL() with some minor

   adjustments, since it used to be called on a bound ItemExpr, now

   it gets called on something that came out of the parser and went

   through the DDL time "binder".

- sqlcomp/CmpSeabaseDDLindex.cpp:

 Support for CREATE INDEX ... DIVISION LIKE TABLE. If this is

 set, add the division columns in front of the index key, otherwise

 don't.

- sqlcomp/CmpSeabaseDDLtable.cpp:

 + Code to make sure column flags and column class is set and propagated.

 + Fix for bug 1385543: Now that we use the TEXT table for computed

   column text, we no longer have a length limit. This is true for both

   divisioning and salt expressions.

 + When processing the column list in seabaseCreateTable() we have a

   bit of a chicken and egg problem: We need the column list to validate

   the DIVISION BY expressions, but the DIVISION BY columns need to be part

   of the column list. So, we do this a first time without divisioning

   columns, then we add those, and produce the final list in a second

   iteration.

 + getTextFromMD method now takes a sub-id as an input parameter. That's

   the column number for computed column text.

 + read computed column text from the TEXT table. Note: This also needs

   to handle older tables where the computed column text is stored in

   the default value.

Change-Id: I7c3ebe39a950c1d01f31855bdc92cbb98e5eb275

Show less

default + 10 more