PartFunc.cpp

Bug 1315567 bug in salted tables with descending VARCHAR columns

This fixes several issues related to VARCHARs and to UCS2 and UTF8 chars
and varchars when they are used in the SALT clause. Most of the issues
occur only when using a DESCending key column, since that involves
non-ASCII characters. Summary of changes:

- The logic involves creating string literals for min/max values of
  columns. Create those literals as UTF8 character strings, so that we
  can represent characters outside the ASCII or ISO88591 character sets.
- When generating the max value for a UTF8 column, generate valid UTF8
  characters (0xFF does not occur in valid UTF8 and causes an error when
  it is converted while being used in expressions).
- Change the lexer to use something other than the max UCS2 character
  0xFFFF as the EOF constant. Use a value that is not an allowed UCS2
  character instead.
- Fix some issues with the DECODE function when it operates on varchars.
  Pass a separate pointer to the varchar length field of the result, as
  is done for other expressions. Note that this is not done for the
  operand.

More detailed description of changes for reviewers:

common/CharType.cpp:

- The min/maxRepresentableValue methods now generate a string literal
  that should be ready to feed to the parser. This string literal is
  always in UTF-8 and uses a charset prefix to indicate the actual
  type's charset. Example: _UCS2'abc'. The old code returned the actual
  string (no quotes) in the type's charset. That value is still
  available.
- When creating the max representable value for UTF8, generate valid
  UTF8 characters, not 0xFF bytes. This allows us to feed the max value
  back into the parser.
- New virtual method to create an equivalent char type from a varchar.
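
The two CharType.cpp changes above can be illustrated with a minimal sketch. It is not the Trafodion code: the helper names `encodeUtf8`, `maxUtf8Value` and `makeCharLiteral` are hypothetical, and the choice of U+10FFFF as the "max" character is an assumption made for illustration. The point is only that a max value built from well-formed UTF-8 never contains a 0xFF byte, and that the result can be wrapped in a charset-prefixed literal such as _UCS2'abc'.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
std::string encodeUtf8(char32_t cp) {
    // Reject values beyond U+10FFFF and the surrogate range.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: a "max value" of numChars characters, built from
// the highest code point (an assumption) instead of raw 0xFF bytes.
std::string maxUtf8Value(std::size_t numChars) {
    const std::string maxChar = encodeUtf8(0x10FFFF);  // bytes F4 8F BF BF
    std::string out;
    for (std::size_t i = 0; i < numChars; ++i) out += maxChar;
    return out;
}

// Hypothetical helper: wrap UTF-8 text in a charset-prefixed SQL literal,
// doubling embedded single quotes as standard SQL requires.
std::string makeCharLiteral(const std::string& charset, const std::string& utf8Text) {
    std::string lit = "_" + charset + "'";
    for (char c : utf8Text) {
        lit += c;
        if (c == '\'') lit += '\'';
    }
    lit += "'";
    return lit;
}
```

For example, `makeCharLiteral("UCS2", "abc")` yields the literal form quoted in the description, and `makeCharLiteral("UTF8", maxUtf8Value(n))` gives a max-value literal that contains only well-formed UTF-8.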

exp/exp_function.cpp:

- In decodeKeyValue, pass in a separate pointer to the varchar length
  field, like it is done for other function evaluators. This is for the
  result varchar length field (the source is really a string of bytes).
- Removed an unused function to avoid having to change it.
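
As a rough illustration of the first point, the sketch below writes a result through two separate pointers, one for the data bytes and one for the varchar length field, instead of assuming where the length field sits relative to the data. The function name `storeVarcharResult`, its signature and the 2-byte length field are assumptions for this example, not the actual decodeKeyValue interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical sketch: store a decoded value as a varchar result.
// resultData   points at the data area of the result.
// resultVarLen points at the (assumed 2-byte) length field of the result,
//              passed separately from the data pointer.
bool storeVarcharResult(const char* src, std::size_t srcLen,
                        char* resultData, std::size_t resultMaxLen,
                        char* resultVarLen) {
    if (srcLen > resultMaxLen || srcLen > UINT16_MAX) {
        return false;  // value does not fit into the result
    }
    // The source is a plain string of bytes, so copy by explicit length.
    std::memcpy(resultData, src, srcLen);
    // Write the actual length through the separate length pointer;
    // memcpy avoids alignment assumptions about the length field.
    const std::uint16_t len = static_cast<std::uint16_t>(srcLen);
    std::memcpy(resultVarLen, &len, sizeof len);
    return true;
}
```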

optimizer/EncodedKeyValue.cpp:

- Now that NAType::min/maxRepresentableValue() methods return a literal
  that can be parsed, there is no longer any need to create a parsable
  string here.
- When creating SQL expressions, make sure they are created in UTF8, to
  be able to use non-ISO88591 characters.

optimizer/HBaseSearchSpec.cpp:

- Min key values with zeroes in them didn't get copied, including length
  fields that were multiples of 256.
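
The kind of problem described here can be reproduced with a small sketch. It assumes, for illustration only, a key that starts with a 2-byte little-endian length field; the helper names are hypothetical and this is not the HBaseSearchSpec.cpp code. A length of 256 has a zero low byte, so a copy that treats the buffer as a C string stops immediately, while a length-aware copy keeps every byte.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: build a binary key consisting of a 2-byte
// little-endian length field (an assumed layout) followed by payload bytes.
std::string makeVarcharKey(std::uint16_t len, char fill) {
    std::string key;
    key += static_cast<char>(len & 0xFF);  // low byte, zero for multiples of 256
    key += static_cast<char>(len >> 8);    // high byte
    key.append(len, fill);
    return key;
}

// Problematic pattern: treats the key as a NUL-terminated C string,
// so the copy ends at the first zero byte.
std::string copyAsCString(const char* buf) {
    return std::string(buf);
}

// Length-aware pattern: copies exactly n bytes, zero bytes included.
std::string copyWithLength(const char* buf, std::size_t n) {
    return std::string(buf, n);
}
```

With `makeVarcharKey(256, 'x')`, `copyAsCString` returns an empty string because the first byte of the length field is zero, whereas `copyWithLength` reproduces the full key.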

optimizer/NATable.cpp:

- When converting a binary region boundary value into text, use UTF-8
  as the target charset.

optimizer/PartFunc.cpp:

- Handle the case where text boundary values are not specified and make
  the most conservative assumption in that case.

optimizer/ValueDesc.cpp:

- This caller of NAType::min/maxRepresentableValue needed to be changed
  for the new behavior of these methods (use the string buffer in the
  type's charset instead of the UTF-8 string literal).

parser/ulexer.cpp:

- When parsing max values for UCS2 characters, the lexer saw 0xFFFF
  characters and interpreted those as EOF. Changed the EOF character to
  some value that is not valid UCS2, according to the standard.
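
The collision between data and the EOF marker can be sketched as follows. This is a simplified stand-in for a sentinel-based scanner, not the ulexer.cpp code, and the function name is hypothetical. The replacement marker used in the usage note (0xD800, an unpaired surrogate) is purely illustrative; the value actually chosen in this change is not stated here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch: count the code units a sentinel-based scanner
// consumes before it believes it has reached end of input. Such a scanner
// reports eofMarker when the input runs out, so a data unit equal to
// eofMarker is indistinguishable from EOF and ends the scan early.
std::size_t unitsBeforeEof(const std::u16string& input, char16_t eofMarker) {
    std::size_t n = 0;
    while (n < input.size() && input[n] != eofMarker) {
        ++n;
    }
    return n;
}
```

For a max value made of four 0xFFFF units, `unitsBeforeEof(value, 0xFFFF)` returns 0, meaning the scanner stops before reading anything, while a marker that cannot appear as a character (for example `0xD800` in this sketch) lets all four units through.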

7/20: Rework for comments from Qifan and Dave.

Change-Id: If4aa698393d0f204c839efe40087a1696069d277

  1. … 12 more files in changeset.
Initial code drop of Trafodion

    ./PartFunc.cpp (+6027, -0)
  1. … 4886 more files in changeset.