PartFunc.cpp

Bug 1376922 Union query on a view returns wrong results

Qifan investigated this bug and found the problem in replacePivs().

Rather than forcing the parent's partitioning function on the child, the fix takes the child's function and only replaces the PIVs (partition input values) in it.

Additional changes:

- The replacePivs() method used a ValueIdSet for the PIVs. This should be a list, since we often refer to multiple PIVs by their position in the list.

- We don't need the code in replacePivs() that fixes predicates in scan nodes, since we call replacePivs() before calling preCodeGen() on the child, and therefore the child node does not yet have predicates that refer to PIVs.

- We don't need to replace the partitioning expression anymore, since it does not refer to any PIVs and we leave the partitioning key predicates almost unchanged.

- Fixing a small, unrelated issue: when sourcing in sqenv.sh twice, it reported an error message because a shell variable did not get initialized to an empty string (workstation environment only).
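The core of the fix, keeping the child's partitioning function and substituting only its PIVs by position, can be sketched as below. This is a minimal illustration under stated assumptions, not Trafodion code: `PartFuncSketch`, `Tokens`, and `replacePivsByPosition` are hypothetical, predicates are modeled as token lists instead of ValueId trees, and the new PIVs are assumed to be supplied by the caller (for example, from the parent).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical predicate representation: a list of tokens, e.g.
// {"_SALT_", "between", ":pivLo", "and", ":pivHi"}.
using Tokens = std::vector<std::string>;

// Hypothetical, simplified partitioning function: an ordered list of PIVs
// plus key predicates that refer to those PIVs.
struct PartFuncSketch {
  std::vector<std::string> pivs;  // position matters: pivs[i] is the i-th PIV
  std::vector<Tokens> predicates;
};

// Keep the child's partitioning function and only substitute its PIVs:
// the i-th PIV of the child is replaced by the i-th entry of newPivs.
PartFuncSketch replacePivsByPosition(const PartFuncSketch& child,
                                     const std::vector<std::string>& newPivs) {
  if (newPivs.size() != child.pivs.size())
    throw std::invalid_argument("PIV lists must have the same length");

  // Map each old PIV to its position in the child's list.
  std::map<std::string, std::size_t> position;
  for (std::size_t i = 0; i < child.pivs.size(); ++i) position[child.pivs[i]] = i;

  PartFuncSketch result = child;  // the child's function structure is kept
  result.pivs = newPivs;
  for (Tokens& pred : result.predicates)
    for (std::string& tok : pred) {
      auto it = position.find(tok);
      if (it != position.end()) tok = newPivs[it->second];
    }
  return result;
}
```

The positional match is why an ordered list fits better than a set here: with a `std::set<std::string>`, the PIVs would be visited in sorted order, so ":pivHi" would come before ":pivLo" and a positional match would swap the bounds.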

Change-Id: Id8a20c0d958d8ce13edd59849a1418d252b5691d

(3 more files in changeset)
Enabling HASH2 partitioning of salted tables

- In FileScan::preCodeGen(), make sure we add partitioning key predicates in all three cases: a) MDAM, b) with an existing search key, c) without a search key.

- Make sure we don’t do the HBase “constant keys” optimization when we have partitioning key predicates (HBaseAccess::preCodeGen()).

- Since the partition input values for a HASH2 function are actual hash values, the key predicate needs to call the Hash2Distrib function to compute the salt value (PartitioningFunction::createBetweenPartitioningKeyPredicates()).

- When we replace an existing search key with a new one for the partitioning key predicates, try to include the existing predicates as well (TableHashPartitioningFunction::createSearchKey()).
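Why the predicate has to go through Hash2Distrib can be illustrated with a short sketch. It assumes, for illustration only, that Hash2Distrib maps a 32-bit hash value to one of `n` buckets by scaling, i.e. `(hash * n) >> 32`; the function names are hypothetical. Because that mapping never decreases as the hash value grows, a PIV range `[pivLo, pivHi]` of hash values translates into a contiguous range of salt values.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Assumed HASH2 bucket mapping: scale a 32-bit hash value into [0, numBuckets).
std::uint32_t hash2DistribSketch(std::uint32_t hashValue, std::uint32_t numBuckets) {
  return static_cast<std::uint32_t>(
      (static_cast<std::uint64_t>(hashValue) * numBuckets) >> 32);
}

// Translate a PIV range of hash values into the inclusive salt range that a
// predicate of the form "_SALT_ between lo and hi" would need.
std::pair<std::uint32_t, std::uint32_t> saltRangeForPivs(
    std::uint32_t pivLo, std::uint32_t pivHi, std::uint32_t numSaltBuckets) {
  return {hash2DistribSketch(pivLo, numSaltBuckets),
          hash2DistribSketch(pivHi, numSaltBuckets)};
}
```

For example, with 4 salt buckets, PIVs covering hash values 0x40000000 through 0x7FFFFFFF map to salt value 1 only.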

Change-Id: I092ae85653f320d1d26273a15da4e0ac6b0ae2bc

(8 more files in changeset)
Bug 1343615: Duplicated rows for parallel scan on salted table

- In preCodeGen, add partitioning key predicates to the scan node if it uses a single subset key and a HASH2 partitioning function.

- Handle partitioning key predicates in FileScan::preCodeGen, moving the code from HbaseAccess::preCodeGen.

- Make a special partitioning key predicate for salted tables with a HASH2 function: "_SALT_" between :pivLo and :pivHi. This will lead to an efficient access path for the ESP to read only the data it is supposed to read.

- Salted tables have a HASH2 partitioning key that does not include the "_SALT_" column, so the partitioning key is not a prefix of the clustering key. However, we need to apply the partitioning key predicates to the clustering key of the table, since that's the only key we have. This is different from a "partition access" node. See TableHashPartitioningFunction::createSearchKey() in file PartFunc.cpp.

- Moved a method to create a partitioning key predicate of the form <some expr> between :pivLo and :pivHi up in the class hierarchy, to be able to use it in both HashPartitioningFunction and TableHashPartitioningFunction.
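The property the "_SALT_" between :pivLo and :pivHi predicate is meant to provide can be sketched as follows: if the ESPs' salt ranges are contiguous and disjoint and together cover all salt values, each row qualifies for exactly one ESP, so no row is returned twice. The even split of salt buckets across ESPs and all names below are hypothetical, chosen only to illustrate that property.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Inclusive range of "_SALT_" values that one ESP is allowed to read.
struct SaltRange { std::uint32_t lo, hi; };

// Hypothetical assignment of numSalt salt buckets to numEsps ESPs:
// contiguous, non-overlapping ranges covering every bucket exactly once.
// Assumes 0 < numEsps <= numSalt.
std::vector<SaltRange> assignSaltRanges(std::uint32_t numSalt, std::uint32_t numEsps) {
  std::vector<SaltRange> ranges;
  for (std::uint32_t esp = 0; esp < numEsps; ++esp) {
    std::uint32_t lo = esp * numSalt / numEsps;
    std::uint32_t hi = (esp + 1) * numSalt / numEsps - 1;
    ranges.push_back({lo, hi});
  }
  return ranges;
}

// Evaluates "_SALT_ between :pivLo and :pivHi" for one ESP's range.
bool qualifies(std::uint32_t salt, const SaltRange& r) {
  return r.lo <= salt && salt <= r.hi;
}

// Number of ESPs that would return a row with the given salt value.
int readersOf(std::uint32_t salt, const std::vector<SaltRange>& ranges) {
  int n = 0;
  for (const SaltRange& r : ranges) n += qualifies(salt, r) ? 1 : 0;
  return n;
}
```

With 8 salt buckets and 3 ESPs, the ranges come out as [0,1], [2,4] and [5,7].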

Also added support for the KeyPrefixRegionSplitPolicy. This might be useful in the future when we push things like GROUP BY into the region servers. It can ensure that keys with the same prefix of a given length stay in the same region. Example for a table with this split policy:

    -- make sure all the line items for an order (first 4 bytes of the key)
    -- stay within the same region
    create table lineitems(orderno int not null,
                           lineno int not null,
                           comment char(10),
                           primary key (orderno, lineno))
      hbase_options (SPLIT_POLICY = 'org.apache.hadoop.hbase.regionserver.KeyPrefixRegionSplitPolicy',
                     PREFIX_LENGTH_KEY = '4');
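HBase's KeyPrefixRegionSplitPolicy keeps rows with a common key prefix together by truncating a region's split point to the configured prefix length. The sketch below mimics that for the lineitems example. The INT key encoding (big-endian with the sign bit flipped) is an assumption made so that byte order matches numeric order, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Assumed key encoding for a signed 4-byte INT key column: big-endian with
// the sign bit flipped, so unsigned byte-wise comparison follows numeric order.
std::string encodeIntKey(std::int32_t v) {
  std::uint32_t u = static_cast<std::uint32_t>(v) ^ 0x80000000u;
  std::string out(4, '\0');
  for (int i = 3; i >= 0; --i) {
    out[static_cast<std::size_t>(i)] = static_cast<char>(u & 0xFFu);
    u >>= 8;
  }
  return out;
}

// Row key for the lineitems example: orderno followed by lineno.
std::string rowKey(std::int32_t orderno, std::int32_t lineno) {
  return encodeIntKey(orderno) + encodeIntKey(lineno);
}

// Idea behind the key-prefix split policy: the split point chosen for a
// region is cut down to its first prefixLength bytes.
std::string truncateSplitPoint(const std::string& splitPoint, std::size_t prefixLength) {
  return splitPoint.substr(0, prefixLength);  // substr clamps to the string length
}

// After a split, rows with key >= splitPoint go to the upper daughter region.
// std::string compares bytes as unsigned char, like HBase row keys.
bool inUpperRegion(const std::string& key, const std::string& splitPoint) {
  return key >= splitPoint;
}
```

Without the truncation, splitting at the full key of (7, 3) would put line 1 of order 7 in the lower region and line 3 in the upper one. With a 4-byte prefix, the split point becomes just the encoded orderno 7, and every line of that order lands in the upper region.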

Removed ENCODE_ON_DISK table property because it is deprecated and does nothing.

Patch set 2: Changed comment in ExpHbaseDefs.h.

Change-Id: I57fafe2f854475261313abcf5bd2c81013f43756

(7 more files in changeset)
Bug 1315567 bug in salted tables with descending VARCHAR columns

This fixes several issues related to VARCHARs, UCS2 and UTF8 chars and varchars, when used in the SALT clause. Most of the issues only occur when using a DESCending key column, since that involves non-ASCII characters. Summary of changes:

- The logic involves creating string literals for min/max values of columns. Create those literals as UTF8 character strings, so that we can represent characters outside the ASCII and ISO88591 character sets.

- When generating the max value for a UTF8 column, generate valid UTF8 characters (0xFF does not occur in valid UTF8 and causes a conversion error when the value is used in expressions).

- Change the lexer to use something other than the max UCS2 character 0xFFFF as the EOF constant; it now uses a value that is not an allowed UCS2 character.

- Fix some issues with the DECODE function when it operates on varchars. Pass a separate pointer to the varchar length field of the result, as is done for other expressions. Note that this is not done for the operand.
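The 0xFF problem can be illustrated with a short sketch: the bytes 0xC0, 0xC1 and 0xF5 through 0xFF never occur in well-formed UTF-8, so a max value made of 0xFF bytes cannot be converted, while a string of real characters can. Using U+10FFFF (the highest Unicode code point, which also has the highest UTF-8 byte sequence) as the max character is an assumption for illustration; the commit does not say which character the new code generates, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Encode one Unicode code point as UTF-8.
// Assumes cp <= 0x10FFFF and cp is not a surrogate (0xD800..0xDFFF).
std::string encodeUtf8(char32_t cp) {
  std::string out;
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return out;
}

// Bytes 0xC0, 0xC1 and 0xF5..0xFF never appear in well-formed UTF-8.
bool isNeverValidUtf8Byte(unsigned char b) {
  return b == 0xC0 || b == 0xC1 || b >= 0xF5;
}

// Hypothetical max value for a UTF8 column of maxChars characters, built
// from the highest code point instead of raw 0xFF bytes.
std::string maxUtf8Value(std::size_t maxChars) {
  std::string out;
  for (std::size_t i = 0; i < maxChars; ++i) out += encodeUtf8(U'\U0010FFFF');
  return out;
}
```

Each U+10FFFF character encodes as the four bytes F4 8F BF BF, none of which is an invalid UTF-8 byte.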

More detailed description of changes for reviewers:

common/CharType.cpp:

- The min/maxRepresentableValue methods now generate a string literal that should be ready to feed to the parser. This string literal is always in UTF-8, and it uses a charset prefix to indicate the actual type's charset. Example: _UCS2'abc'. The old code returned the actual string (no quotes) in the type's charset; that value is still available.

- When creating the max representable value for UTF8, generate valid UTF8 characters, not 0xFF bytes. This allows us to feed the max value back into the parser.

- New virtual method to create an equivalent CHAR type from a VARCHAR.
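A sketch of the literal format described above: UTF-8 text wrapped in quotes, preceded by a prefix naming the type's charset. It assumes standard SQL quoting, where an embedded single quote is written as two single quotes. `makeCharsetLiteral` is a hypothetical helper, not the CharType.cpp method.

```cpp
#include <cassert>
#include <string>

// Build a parser-ready character literal such as _UCS2'abc'.
// utf8Text is assumed to be UTF-8; embedded single quotes are doubled.
std::string makeCharsetLiteral(const std::string& charsetName,
                               const std::string& utf8Text) {
  std::string out = "_" + charsetName + "'";
  for (char c : utf8Text) {
    if (c == '\'')
      out += "''";
    else
      out += c;
  }
  out += '\'';
  return out;
}
```

For example, `makeCharsetLiteral("UCS2", "abc")` yields `_UCS2'abc'`.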

exp/exp_function.cpp:

- In decodeKeyValue, pass in a separate pointer to the varchar length field, as is done for other function evaluators. This is for the result's varchar length field (the source is really a string of bytes).

- Removed an unused function to avoid having to change it.
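The calling convention described for decodeKeyValue, with the result's data and its varchar length field passed as separate pointers while the operand is just bytes, can be sketched as below. Two assumptions are labeled here and in the code: a DESCending key column is taken to be stored with every byte complemented (which would also explain why DESC columns involve non-ASCII bytes), and the length field is taken to be 2 bytes. The function name and signature are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical evaluator: the operand is a plain string of bytes; the result
// data and the result's varchar length field are separate output pointers.
// Assumption: DESC key columns store every byte complemented.
// Assumption: the varchar length field is 2 bytes wide.
void decodeDescKeyToVarchar(const unsigned char* src, std::size_t srcLen,
                            char* resultData, std::uint16_t* resultVarLen) {
  for (std::size_t i = 0; i < srcLen; ++i)
    resultData[i] = static_cast<char>(static_cast<unsigned char>(~src[i]));
  *resultVarLen = static_cast<std::uint16_t>(srcLen);
}
```

Complementing the ASCII bytes 'a' (0x61) and 'b' (0x62) gives 0x9E and 0x9D, so decoding those two bytes yields "ab" with a length of 2.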

optimizer/EncodedKeyValue.cpp:

- Now that NAType::min/maxRepresentableValue() methods return a literal that can be parsed, there is no more need to create a parsable string here.

- When creating SQL expressions, make sure they are created in UTF8, to be able to use non-ISO88591 characters.

optimizer/HBaseSearchSpec.cpp:

- Min key values containing zero bytes didn't get copied; this included length fields whose values were multiples of 256 (their low-order byte is zero).

optimizer/NATable.cpp:

- When converting a binary region boundary value into text, use UTF-8 as the target charset.

optimizer/PartFunc.cpp:

- Handle the case where text boundary values are not specified, and make the most conservative assumption in that case.

optimizer/ValueDesc.cpp:

- This caller of NAType::min/maxRepresentableValue needed to be changed for the new behavior of these methods (it now uses the string buffer in the type's charset instead of the UTF-8 string literal).

parser/ulexer.cpp:

- When parsing max values for UCS2 characters, the lexer saw 0xFFFF characters and interpreted them as EOF. Changed the EOF character to a value that is not valid UCS2, according to the standard.
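The lexer problem, and why a sentinel outside the set of valid UCS2 characters fixes it, can be sketched as follows. The replacement value 0xDFFF (a lone low surrogate, which is not a valid UCS2 character) is a hypothetical choice for illustration; the commit does not state which value ulexer.cpp uses, and the names below are not Trafodion's.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Old approach: 0xFFFF doubles as the EOF marker, so a real U+FFFF in the
// input (for example, in a generated max value) ends the scan early.
constexpr char16_t kOldEof = 0xFFFF;

// Hypothetical replacement: a lone low surrogate, which is not a valid
// UCS2 character and so cannot collide with a character in the input.
constexpr char16_t kNewEof = 0xDFFF;

// Count the characters before the EOF marker.
std::size_t countUntilEof(const std::vector<char16_t>& buf, char16_t eof) {
  std::size_t n = 0;
  while (n < buf.size() && buf[n] != eof) ++n;
  return n;
}
```

With input consisting of 'a' followed by two U+FFFF characters, the old sentinel stops after one character, while the new one sees all three.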

7/20: Rework for comments from Qifan and Dave.

Change-Id: If4aa698393d0f204c839efe40087a1696069d277

(12 more files in changeset)
Initial code drop of Trafodion

./PartFunc.cpp (+6027, -0)
(4886 more files in changeset)