PartFunc.cpp

Bug 1343615: Duplicated rows for parallel scan on salted table

- In preCodeGen, add partitioning key predicates to the scan node
  if it uses a single subset key and a HASH2 partitioning function.

- Handle partitioning key preds in FileScan::preCodeGen, move
  code from HbaseAccess::preCodeGen.

- Make a special partitioning key predicate for salted tables
  with a HASH2 function: "_SALT_" between :pivLo and :pivHi.
  This will lead to an efficient access path for the ESP to read
  only the data it is supposed to read.

- Salted tables have a HASH2 partitioning key that does not
  include the "_SALT_" column. So, the partitioning key is
  not a prefix of the clustering key. However, we need to apply
  the partitioning key predicates to the clustering key of the
  table, since that's the only key we have. This is different
  from a "partition access" node. See
  TableHashPartitioningFunction::createSearchKey() in file
  PartFunc.cpp.

- Moved a method to create a partitioning key predicate of
  the form <some expr> between :pivLo and :pivHi up in the
  class hierarchy, to be able to use it in both
  HashPartitioningFunction and TableHashPartitioningFunction.
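
To illustrate the "_SALT_" between :pivLo and :pivHi predicate, here is a
minimal sketch of how each ESP could be handed its own contiguous range of
salt values. It is not the code in PartFunc.cpp: SaltRange, saltRangeForEsp
and saltPredicateText are hypothetical names, and the even split by integer
division is an assumption made for the example.

```cpp
#include <cassert>
#include <string>

// Inclusive range of salt values assigned to one ESP (hypothetical type).
struct SaltRange {
  int lo;  // value that would be bound to :pivLo
  int hi;  // value that would be bound to :pivHi
};

// Hypothetical helper: split numSaltBuckets salt values into numEsps
// contiguous ranges, as evenly as integer division allows.
SaltRange saltRangeForEsp(int espIndex, int numEsps, int numSaltBuckets) {
  assert(numEsps > 0 && numEsps <= numSaltBuckets);
  assert(espIndex >= 0 && espIndex < numEsps);
  // 64-bit intermediates avoid overflow in the multiplication.
  long long lo = static_cast<long long>(espIndex) * numSaltBuckets / numEsps;
  long long next =
      static_cast<long long>(espIndex + 1) * numSaltBuckets / numEsps;
  return SaltRange{static_cast<int>(lo), static_cast<int>(next) - 1};
}

// Render the predicate text with the range values filled in.
std::string saltPredicateText(const SaltRange &r) {
  return "\"_SALT_\" between " + std::to_string(r.lo) + " and " +
         std::to_string(r.hi);
}
```

Because the ranges are disjoint and cover every salt value, no row is read
by two ESPs, which is the property the duplicated-rows bug violated.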

Also added support for the KeyPrefixRegionSplitPolicy. This might
be useful in the future when we push things like GROUP BY into the
region servers. It can ensure that keys with the same prefix of a
given length stay in the same region. Example for a table with
this split policy:

  -- make sure all the line items for an order (first 4 bytes of the key)
  -- stay within the same region
  create table lineitems(orderno int not null,
                         lineno int not null,
                         comment char(10),
                         primary key (orderno, lineno))
    hbase_options ( SPLIT_POLICY = 'org.apache.hadoop.hbase.regionserver.KeyPrefixRegionSplitPolicy',
                    PREFIX_LENGTH_KEY = '4');
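
A small sketch of why a prefix-based split keeps all line items of an order
together, assuming the policy truncates a candidate split point to the
configured prefix length. The helpers encodeKey, truncateSplitPoint and
regionOf are hypothetical, and the big-endian key encoding is a
simplification for the example, not the actual key encoding.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <string>

// Hypothetical key encoding: two 4-byte big-endian integers, so that
// byte-wise comparison orders keys by (orderno, lineno).
std::string encodeKey(std::uint32_t orderno, std::uint32_t lineno) {
  std::string key;
  for (std::uint32_t v : {orderno, lineno}) {
    for (int shift = 24; shift >= 0; shift -= 8) {
      key.push_back(static_cast<char>((v >> shift) & 0xFF));
    }
  }
  return key;
}

// Cut a candidate split point down to at most prefixLength bytes.
std::string truncateSplitPoint(const std::string &candidate,
                               std::size_t prefixLength) {
  return candidate.substr(0, std::min(prefixLength, candidate.size()));
}

// Region 0 holds keys below the split point, region 1 holds the rest.
int regionOf(const std::string &key, const std::string &splitPoint) {
  return key < splitPoint ? 0 : 1;
}
```

With a split point taken in the middle of order 7, the untruncated split
separates line items 1 and 9 of that order, while the 4-byte prefix split
moves the boundary to the start of the order.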

Removed ENCODE_ON_DISK table property because it is deprecated and does nothing.

Patch set 2: Changed comment in ExpHbaseDefs.h.

Change-Id: I57fafe2f854475261313abcf5bd2c81013f43756

  1. … 7 more files in changeset.
Bug 1315567 bug in salted tables with descending VARCHAR columns

This fixes several issues related to VARCHARs, UCS2 and UTF8 chars and
varchars, when used in the SALT clause. Most of the issues only
occur when using a DESCending key column, since that involves non-ASCII
characters. Summary of changes:

- The logic involves creating string literals for min/max values of
  columns. Create those literals as UTF8 character strings, so that we
  can represent characters outside the ASCII or ISO88591 character sets.

- When generating the max value for a UTF8 column, generate valid UTF8
  characters (0xFF does not occur in valid UTF8 and causes an error when
  converting it while it's used in expressions).

- Change the lexer to use something other than the max UCS2 character
  0xFFFF as the EOF constant. Use a value that is not an allowed UCS2
  character instead.

- Fix some issues with the DECODE function when it operates on varchars.
  Pass a separate pointer to the varchar length field of the result, like
  it is done for other expressions. Note, this is not done for the operand.

More detailed description of changes for reviewers:

common/CharType.cpp:

- The min/maxRepresentableValue method now generates a string literal
  that should be ready to feed to the parser. This string literal is
  always in UTF-8 and it uses a charset prefix to indicate the actual
  type's charset. Example: _UCS2'abc'. The old code returned the actual
  string (no quotes) in the type's charset. That value is still available.

- When creating the max representable value for UTF8, generate valid
  UTF8 characters, not 0xFF bytes. This allows us to feed the max value
  back into the parser.

- New virtual method to create an equivalent char type from a varchar.
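
A sketch of the UTF8 max-value idea, under the assumption that the highest
Unicode code point U+10FFFF (bytes F4 8F BF BF) is used as the filler
character; this message does not say which character the real code picks.
maxUtf8Value and hasNeverValidUtf8Byte are hypothetical helpers.

```cpp
#include <cstddef>
#include <string>

// Build a "max" string of numChars characters out of valid UTF-8 only.
// Assumption: U+10FFFF, encoded as F4 8F BF BF, is the filler character.
std::string maxUtf8Value(std::size_t numChars) {
  std::string result;
  result.reserve(numChars * 4);
  for (std::size_t i = 0; i < numChars; ++i) {
    result.append("\xF4\x8F\xBF\xBF", 4);
  }
  return result;
}

// True if the buffer contains a byte that can never appear in well-formed
// UTF-8: 0xC0, 0xC1, or anything in 0xF5..0xFF.
bool hasNeverValidUtf8Byte(const std::string &bytes) {
  for (char c : bytes) {
    unsigned char b = static_cast<unsigned char>(c);
    if (b == 0xC0 || b == 0xC1 || b >= 0xF5) {
      return true;
    }
  }
  return false;
}
```

A buffer filled with 0xFF bytes fails this check, while the value built from
U+10FFFF passes and can therefore be converted and parsed like any other
UTF8 literal.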

exp/exp_function.cpp:

- In decodeKeyValue, pass in a separate pointer to the varchar length field,
  like it is done for other function evaluators. This is for the result
  varchar length field (the source is really a string of bytes).

- Removed an unused function to avoid having to change it.

optimizer/EncodedKeyValue.cpp:

- Now that NAType::min/maxRepresentableValue() methods return a literal
  that can be parsed, there is no more need to create a parsable string
  here.

- When creating SQL expressions, make sure they are created in UTF8, to
  be able to use non-ISO88591 characters.

optimizer/HBaseSearchSpec.cpp:

- Min key values with zeroes in them didn't get copied, including length
  fields that were multiples of 256.
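
The symptom above is typical of copying a binary key as if it were a
zero-terminated C string. The sketch below only illustrates that class of
bug and assumes this was the cause; it is not the code from
HBaseSearchSpec.cpp, and both function names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Problematic pattern: treats the key as a C string, so the copy stops at
// the first zero byte.
std::string copyKeyAsCString(const char *key) {
  return std::string(key);
}

// Length-aware copy: keeps every byte, including embedded zeroes.
std::string copyKeyWithLength(const char *key, std::size_t len) {
  return std::string(key, len);
}
```

A two-byte length field holding a multiple of 256 has a zero low byte, so
depending on byte order the first pattern stops at or right after the
length field.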

optimizer/NATable.cpp:

- When converting a binary region boundary value into text, use UTF-8
  as the target charset.

optimizer/PartFunc.cpp:

- Handle case where text boundary values are not specified and make
  most conservative assumption in that case.

optimizer/ValueDesc.cpp:

- This caller of NAType::min/maxRepresentableValue needed to be changed
  for the new behavior of these methods (use the string buffer in the
  type's charset instead of the UTF-8 string literal).

parser/ulexer.cpp:

- When parsing max values for UCS2 characters, the lexer saw 0xFFFF characters
  and interpreted those as EOF. Changed the EOF character to some value
  that is not valid UCS2, according to the standard.

7/20: Rework for comments from Qifan and Dave.

Change-Id: If4aa698393d0f204c839efe40087a1696069d277

  1. … 12 more files in changeset.
Initial code drop of Trafodion

    • -0
    • +6027
    ./PartFunc.cpp
  1. … 4886 more files in changeset.