Also, this change fixes the following LaunchPad bugs:
Bug 1388458 insert using primary key default value into a salted table asserts in generator
Bug 1385543 salt clause on a table with large number of primary key columns returns error
Bug 1392450 Internal error 2005 when querying a Hive table with an unsupported data type
In addition, it changes the following behavior:
- The _SALT_ column now gets added as the last column in the CREATE TABLE statement, rather than the first column after SYSKEY. The position of _SALT_ in the clustering key does not change. This will cause some differences in INVOKE and in the column number assigned to columns.
- For CREATE TABLE LIKE, the defaults of the WITH clauses are changing. CREATE TABLE LIKE now copies constraints, SALT and DIVISION clauses by default. The WITH CONSTRAINTS clause is now the default and should no longer be used. Instead, WITHOUT CONSTRAINTS, WITHOUT SALT and WITHOUT DIVISIONING clauses are supported.
- For CREATE INDEX ... SALT LIKE TABLE, we now give a warning instead of an error if the table is not salted.
- Also added an optimization for BETWEEN predicates. If part or all of them can be converted to an equals predicate, we do this now. Example: (a,b,c,d) between (1,2,3,4) and (1,2,5,6) is transformed into a=1 and b=2 and (c,d) between (3,4) and (5,6).
More detailed description of changes:
- arkcmp/CmoStoredProc.cpp sqlcat/desc.h + other files
Using the new FLAGS column in the COLUMNS metadata table to store whether a column is a salt or divisioning column. Note that since there may be existing salted tables without this flag set, the flag is so far only reliable for divisioning columns.
- comexe/ComTdb.h comexe/*.h generator/Generator.cpp sqlcomp/CmpSeabaseDDLmd.h: Changed the column class field in struct ComTdbVirtTableColumnInfo from a string to the corresponding enum. Sorry, this caused lots of small changes (deleting "_LIT" from the initializers). Also added the column flags.
- executor/hiveHook.cpp: Added a check for partitioned tables (having multiple SDs). This is part of the fix for bug 1353632.
- GenRelUpdate.cpp: When generating the key encoding expression for an insert inside a MERGE operation, we assumed the new record expression was in the order of the key columns. Added a step to sort by key column, so we can pass the expression in any order.
- optimizer/ItemExpr.cpp optimizer/ItemNAType.h: Added a named NATypeToItem item expression. This is used to do a primitive "bind" operation of an item expression when processing a DDL statement. Specifically, to bind the DIVISION BY clause in a CREATE TABLE statement.
- optimizer/ItemFunc.h optimizer/SynthType.cpp: The DDL time "binder" gets expressions as they come out of the parser, e.g. a ZZZBinderFunction. Need to add type synthesis for some cases of the ZZZBinderFunction.
- optimizer/NATable.cpp Removing some dead code. Adding an error message when we encounter a Hive column type we can't handle yet. Bug 1392450.
- optimizer/TableDesc.* Method TableDesc::validateDivisionByClauseForDDL() got moved to CmpSeabaseDDL::validateDivisionByExprForDDL().
- optimizer/NormItemExpr.cpp BETWEEN transformation described above.
- optimizer/ValueDesc.cpp Avoid hard-codeing the "_SALT_" name and adding a comment about possibility to use the flag in the future.
- parser Lots of small changes for salt and divisioning option changes. Simplifying the syntax for salt options somewhat. I think the older syntax was so complex because it needed to record the starting and ending position of the divisioning clause, something we don't need anymore.
- regress: Adding new test
- sqlcomp/CmpDescribe.cpp: Support for describing DIVISION BY clause and also supporting the new WITHOUT SALT | DIVISION options for CREATE TABLE LIKE, which relies on the describe feature.
- sqlcomp/CmpSeabaseDDLcommon.cpp sqlcomp/CmpSeabaseDDL.h + Handling the new column flags and making sure they are not confused with the HBase column flags (e.g. for serialization). + Setting the new COLUMNS.FLAGS when writing metadata. + Also, writing the computed column text to the TEXT table. + For DROP TABLE, unconditionally deleting TEXT rows, since the table could contain computed columns. + When building ColInfoArray, check system column flags, since system columns can now appear at any position. + Add method to "bind" an item expression during DDL processing without going through the full binder. This replaces any column reference with a named NATypeToItem node, since all we really need is the type and the name for unparsing. + Method TableDesc::validateDivisionByClauseForDDL() got moved to CmpSeabaseDDL::validateDivisionByExprForDDL() with some minor adjustments, since it used to be called on a bound ItemExpr, now it gets called on something that came out of the parser and went through the DDL time "binder".
- sqlcomp/CmpSeabaseDDLindex.cpp: Support for CREATE INDEX ... DIVISION LIKE TABLE. If this is set, add the division columns in front of the index key, otherwise don't.
- sqlcomp/CmpSeabaseDDLtable.cpp: + Code to make sure column flags and column class is set and propagated. + Fix for bug 1385543: Now that we use the TEXT table for computed column text, we no longer have a length limit. This is true for both divisioning and salt expressions. + When processing the column list in seabaseCreateTable() we have a bit of a chicken and egg problem: We need the column list to validate the DIVISION BY expressions, but the DIVISION BY columns need to be part of the column list. So, we do this a first time without divisioning columns, then we add those, and produce the final list in a second iteration. + getTextFromMD method now takes a sub-id as an input parameter. That's the column number for computed column text. + read computed column text from the TEXT table. Note: This also needs to handle older tables where the computed column text is stored in the default value.
fix cardinality errors for better DoP for OE Problems fixed are as follows. 1. TableAnalysis::getLocalPredsOnPrefixOfList() prematurely breaks out of the loop to collect predicates on key columns. 2. TableDesc::getSaltColumnAsSet() does not return the result in the form of base columns valueIds. 3. When the dop is estimated lower than the low bound of ESPs, the low bound value is used. 4. SimpleFileScanOptimizer::computeSingleSubsetSize() incorrectly applies the computed predicates on salted column which reduces the subset size by CQD HIST_NO_STATS_UEC (default to 2). A subset size specifies the # of rows processed.