Bug 1315567 bug in salted tables with descending VARCHAR columns

This fixes several issues related to VARCHARs, UCS2 and UTF8 chars and varchars, when used in the SALT clause. Most of the issues only occur when using a DESCending key column, since that involves non-ASCII characters.

Summary of changes:
- The logic involves creating string literals for min/max values of columns. Create those literals as UTF8 character strings, so that we can represent characters that are not in the ASCII or ISO88591 character sets.
- When generating the max value for a UTF8 column, generate valid UTF8 characters (0xFF does not occur in valid UTF8 and causes a conversion error when the value is used in expressions).
- Change the lexer to use something other than the max UCS2 character 0xFFFF as the EOF constant. The new constant is a value that is not an allowed UCS2 character.
- Fix some issues with the DECODE function when it operates on varchars. Pass a separate pointer to the varchar length field of the result, as is done for other expressions. Note that this is not done for the operand.
More detailed description of changes for reviewers:
common/CharType.cpp:
- The min/maxRepresentableValue method now generates a string literal that should be ready to feed to the parser. This string literal is always in UTF-8 and it uses a charset prefix to indicate the actual type's charset. Example: _UCS2'abc'. The old code returned the actual string (no quotes) in the type's charset. That value is still available.
- When creating the max representable value for UTF8, generate valid UTF8 characters, not 0xFF bytes. This allows us to feed the max value back into the parser.
- New virtual method to create an equivalent char type from a varchar.
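To illustrate why a max value built from valid UTF-8 never contains 0xFF, here is a minimal sketch. It assumes the max value is built from the highest Unicode code point, U+10FFFF, and that the literal uses a `_UTF8` prefix; both are assumptions for illustration, and `encodeUtf8` and `maxUtf8Literal` are hypothetical names, not functions from CharType.cpp.

```cpp
#include <string>

// Encode one Unicode code point (at most U+10FFFF, not a surrogate) as UTF-8.
std::string encodeUtf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: build a quoted literal with a charset prefix that
// holds numChars copies of U+10FFFF (assumed here to be the max character).
std::string maxUtf8Literal(int numChars) {
    std::string literal = "_UTF8'";
    for (int i = 0; i < numChars; ++i) {
        literal += encodeUtf8(0x10FFFF);
    }
    literal += "'";
    return literal;
}
```

U+10FFFF encodes as the four bytes F4 8F BF BF, so the resulting literal stays valid UTF-8 and can be passed back through a UTF-8 conversion, which a run of 0xFF bytes cannot.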
exp/exp_function.cpp:
- In decodeKeyValue, pass in a separate pointer to the varchar length field, like it is done for other function evaluators. This is for the result varchar length field (the source is really a string of bytes).
- Removed an unused function to avoid having to change it.
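The idea of a separate result length pointer can be sketched as follows. This is not the real decodeKeyValue signature: `decodeVarcharKey` and all of its parameters are hypothetical, and the sketch assumes that a descending key is stored as the bitwise complement of the original bytes and that the varchar length field is 2 bytes wide.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of decoding an encoded key into a varchar result.
//   source       encoded key bytes (a plain byte string, no length prefix)
//   sourceLen    number of encoded bytes
//   descending   true if the bytes are assumed to be stored complemented
//   resultData   buffer that receives the decoded characters
//   resultVarLen separate pointer to the result's varchar length field
void decodeVarcharKey(const unsigned char* source, std::size_t sourceLen,
                      bool descending, char* resultData,
                      std::uint16_t* resultVarLen) {
    for (std::size_t i = 0; i < sourceLen; ++i) {
        unsigned char b = source[i];
        if (descending) {
            b = static_cast<unsigned char>(~b);  // undo the assumed complement
        }
        resultData[i] = static_cast<char>(b);
    }
    // The length goes through its own pointer instead of being written
    // at a fixed offset in front of resultData.
    *resultVarLen = static_cast<std::uint16_t>(sourceLen);
}
```

Under the complement assumption, the ASCII bytes of 'a' and 'b' (0x61, 0x62) are stored as 0x9E and 0x9D, which shows how a descending key can contain non-ASCII byte values even when the column data is plain ASCII.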
optimizer/EncodedKeyValue.cpp:
- Now that NAType::min/maxRepresentableValue() methods return a literal that can be parsed, there is no more need to create a parsable string here.
- When creating SQL expressions, make sure they are created in UTF8, to be able to use non-ISO88591 characters.
optimizer/HBaseSearchSpec.cpp:
- Fix a bug where min key values containing zero bytes were not copied. This included length fields whose value was a multiple of 256.
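The commit does not say how the copy failed. One common cause of this symptom is copying binary data with NUL-terminated string semantics, which the following sketch illustrates. The helper names are hypothetical, and the 2-byte little-endian length prefix in the usage note is an assumption.

```cpp
#include <cstddef>
#include <string>

// Copy with C-string semantics: stops at the first zero byte.
std::string copyAsCString(const char* data) {
    return std::string(data);
}

// Copy with an explicit length: embedded zero bytes are preserved.
std::string copyWithLength(const char* data, std::size_t len) {
    return std::string(data, len);
}
```

For a key that starts with the length 256 stored as the bytes 0x00 0x01, `copyAsCString` returns an empty string because the first byte is zero, while `copyWithLength` keeps every byte. Any length that is a multiple of 256 has a zero low byte, which matches the case described above.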
optimizer/NATable.cpp:
- When converting a binary region boundary value into text, use UTF-8 as the target charset.
optimizer/PartFunc.cpp:
- Handle case where text boundary values are not specified and make most conservative assumption in that case.
optimizer/ValueDesc.cpp:
- This caller of NAType::min/maxRepresentableValue needed to be changed for the new behavior of these methods (use the string buffer in the type's charset instead of the UTF-8 string literal).
parser/ulexer.cpp:
- When parsing max values for UCS2 characters, the lexer saw 0xFFFF characters and interpreted those as EOF. Changed the EOF character to a value that is not valid UCS2, according to the standard.
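The collision between data and the EOF marker can be reproduced with a small sketch. `scanUntilEof` is a hypothetical stand-in for the lexer's input loop, not code from ulexer.cpp, and it takes the marker as a parameter because the commit message does not state which value was chosen.

```cpp
#include <cstddef>
#include <string>

// Count the UCS2 code units consumed before the scan hits the EOF marker
// or the end of the buffer, whichever comes first.
std::size_t scanUntilEof(const std::u16string& input, char16_t eofMarker) {
    std::size_t consumed = 0;
    while (consumed < input.size() && input[consumed] != eofMarker) {
        ++consumed;
    }
    return consumed;
}
```

For a quoted literal made of two 0xFFFF code units, a marker of 0xFFFF ends the scan right after the opening quote. With any marker that cannot appear in the input, the whole literal is consumed.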