Fix for bug 1442932 and bug 1442966, encoding for varchar Submitting this before finishing regressions on workstation, in the interest of time.
Key encodings for VARCHAR values used to put a varchar length indicator in front of the encoded value. The value was the max. length of the varchar and the indicator was 2 or 4 bytes long, depending on the length of the indicator in the source field. That length used to depend only on the max number of bytes in the field, for >32767 bytes we would use a 4 byte VC length indicator.
Now, with the introduction of long rows, the varchar indicator length for varchars in aligned rows is always 4 bytes, regardless of the character length. This causes a problem for the key encoding.
We could have computed the encoded VC indicator length from the field length. Anoop suggested a better solution, not to include the VC indicator at all, since that is unnecessary. Note that for HBase row keys stored on disk, we already remove the VC indicator by converting such keys from varchar to fixed char. Therefore, the issue happens only for encoding needed in a query, for example when sorting or in a merge join or union.
Description of the fix:
1. Change CompEncode::synthType not to include the VC length indicator in the encoded buffer. This change also includes some minor code clean-up.
2. Change the assert in CompEncode::codeGen not to include the VC indicator length anymore.
3. Changes in ex_function_encode::encodeKeyValue(): a) Read 2 and 4 byte VC length indicators for VARCHAR/NVARCHAR. b) Small code cleanup, don't copy buffer for case-insensitive encode, since that is not necessary. c) Don't write max length as VC length indicator into target and adjust target offsets accordingly (for VARCHAR/NVARCHAR).
4. Other changes in sql/exp/exp_function.cpp: d) Handle 2 and 4 byte VC len indicators in hash function and Hive hash function (problems unrelated to LP bugs fixed). e) Add some asserts for cases where we assume VC length indicator is a 2 byte integer.
CompDecode is not yet changed. Filed bug 1444134 to do that for the next release, since that change is less urgent.
Patch set 2: Copyright notice changes only. Patch set 3: Updated expected regression test file that prints out encoded key in hex.
Also, this change fixes the following LaunchPad bugs:
Bug 1388458 insert using primary key default value into a salted table asserts in generator
Bug 1385543 salt clause on a table with large number of primary key columns returns error
Bug 1392450 Internal error 2005 when querying a Hive table with an unsupported data type
In addition, it changes the following behavior:
- The _SALT_ column now gets added as the last column in the CREATE TABLE statement, rather than the first column after SYSKEY. The position of _SALT_ in the clustering key does not change. This will cause some differences in INVOKE and in the column number assigned to columns.
- For CREATE TABLE LIKE, the defaults of the WITH clauses are changing. CREATE TABLE LIKE now copies constraints, SALT and DIVISION clauses by default. The WITH CONSTRAINTS clause is now the default and should no longer be used. Instead, WITHOUT CONSTRAINTS, WITHOUT SALT and WITHOUT DIVISIONING clauses are supported.
- For CREATE INDEX ... SALT LIKE TABLE, we now give a warning instead of an error if the table is not salted.
- Also added an optimization for BETWEEN predicates. If part or all of them can be converted to an equals predicate, we do this now. Example: (a,b,c,d) between (1,2,3,4) and (1,2,5,6) is transformed into a=1 and b=2 and (c,d) between (3,4) and (5,6).
More detailed description of changes:
- arkcmp/CmoStoredProc.cpp sqlcat/desc.h + other files
Using the new FLAGS column in the COLUMNS metadata table to store whether a column is a salt or divisioning column. Note that since there may be existing salted tables without this flag set, the flag is so far only reliable for divisioning columns.
- comexe/ComTdb.h comexe/*.h generator/Generator.cpp sqlcomp/CmpSeabaseDDLmd.h: Changed the column class field in struct ComTdbVirtTableColumnInfo from a string to the corresponding enum. Sorry, this caused lots of small changes (deleting "_LIT" from the initializers). Also added the column flags.
- executor/hiveHook.cpp: Added a check for partitioned tables (having multiple SDs). This is part of the fix for bug 1353632.
- GenRelUpdate.cpp: When generating the key encoding expression for an insert inside a MERGE operation, we assumed the new record expression was in the order of the key columns. Added a step to sort by key column, so we can pass the expression in any order.
- optimizer/ItemExpr.cpp optimizer/ItemNAType.h: Added a named NATypeToItem item expression. This is used to do a primitive "bind" operation of an item expression when processing a DDL statement. Specifically, to bind the DIVISION BY clause in a CREATE TABLE statement.
- optimizer/ItemFunc.h optimizer/SynthType.cpp: The DDL time "binder" gets expressions as they come out of the parser, e.g. a ZZZBinderFunction. Need to add type synthesis for some cases of the ZZZBinderFunction.
- optimizer/NATable.cpp Removing some dead code. Adding an error message when we encounter a Hive column type we can't handle yet. Bug 1392450.
- optimizer/TableDesc.* Method TableDesc::validateDivisionByClauseForDDL() got moved to CmpSeabaseDDL::validateDivisionByExprForDDL().
- optimizer/NormItemExpr.cpp BETWEEN transformation described above.
- optimizer/ValueDesc.cpp Avoid hard-codeing the "_SALT_" name and adding a comment about possibility to use the flag in the future.
- parser Lots of small changes for salt and divisioning option changes. Simplifying the syntax for salt options somewhat. I think the older syntax was so complex because it needed to record the starting and ending position of the divisioning clause, something we don't need anymore.
- regress: Adding new test
- sqlcomp/CmpDescribe.cpp: Support for describing DIVISION BY clause and also supporting the new WITHOUT SALT | DIVISION options for CREATE TABLE LIKE, which relies on the describe feature.
- sqlcomp/CmpSeabaseDDLcommon.cpp sqlcomp/CmpSeabaseDDL.h + Handling the new column flags and making sure they are not confused with the HBase column flags (e.g. for serialization). + Setting the new COLUMNS.FLAGS when writing metadata. + Also, writing the computed column text to the TEXT table. + For DROP TABLE, unconditionally deleting TEXT rows, since the table could contain computed columns. + When building ColInfoArray, check system column flags, since system columns can now appear at any position. + Add method to "bind" an item expression during DDL processing without going through the full binder. This replaces any column reference with a named NATypeToItem node, since all we really need is the type and the name for unparsing. + Method TableDesc::validateDivisionByClauseForDDL() got moved to CmpSeabaseDDL::validateDivisionByExprForDDL() with some minor adjustments, since it used to be called on a bound ItemExpr, now it gets called on something that came out of the parser and went through the DDL time "binder".
- sqlcomp/CmpSeabaseDDLindex.cpp: Support for CREATE INDEX ... DIVISION LIKE TABLE. If this is set, add the division columns in front of the index key, otherwise don't.
- sqlcomp/CmpSeabaseDDLtable.cpp: + Code to make sure column flags and column class is set and propagated. + Fix for bug 1385543: Now that we use the TEXT table for computed column text, we no longer have a length limit. This is true for both divisioning and salt expressions. + When processing the column list in seabaseCreateTable() we have a bit of a chicken and egg problem: We need the column list to validate the DIVISION BY expressions, but the DIVISION BY columns need to be part of the column list. So, we do this a first time without divisioning columns, then we add those, and produce the final list in a second iteration. + getTextFromMD method now takes a sub-id as an input parameter. That's the column number for computed column text. + read computed column text from the TEXT table. Note: This also needs to handle older tables where the computed column text is stored in the default value.