Hans Zeller <hans.zeller@hp.com> committed on 19 Aug 14
Bug 1343615: Duplicated rows for parallel scan on salted table

- In preCodeGen, add partitioning key predicates to scan node
  if it uses a single subset key and HASH2 partitioning function
- Handle partitioning key preds in FileScan::preCodeGen, move
  code from HbaseAccess::preCodeGen.
- Make a special partitioning key predicate for salted tables
  with a HASH2 function: "_SALT_" between :pivLo and :pivHi
  This will lead to an efficient access path for the ESP to read
  only the data it is supposed to read.
- Salted tables have a HASH2 partitioning key that does not
  include the "_SALT_" column. So, the partitioning key is
  not a prefix of the clustering key. However, we need to apply
  the partitioning key predicates to the clustering key of the
  table, since that's the only key we have. This is different
  from a "partition access" node. See
  TableHashPartitioningFunction::createSearchKey() in file
  PartFunc.cpp.
- Moved a method to create a partitioning key predicate of
  the form <some expr> between :pivLo and :pivHi up in the
  class hierarchy, to be able to use it in both
  HashPartitioningFunction and TableHashPartitioningFunction
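
To illustrate why a "_SALT_" between :pivLo and :pivHi predicate prevents duplicated rows, here is a minimal sketch, not the code in this change. It assumes the salt buckets are divided evenly into contiguous ranges, one per ESP; the actual assignment in PartFunc.cpp may differ. The names SaltRange, assignSaltRanges and saltPredicate are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical illustration only; none of these names exist in the code base.
struct SaltRange {
    uint32_t lo;  // plays the role of :pivLo
    uint32_t hi;  // plays the role of :pivHi (inclusive)
};

// Split the salt values 0 .. numSaltBuckets-1 into numEsps contiguous,
// non-overlapping ranges. Assumption: an even split; the real partitioning
// function may group buckets differently.
std::vector<SaltRange> assignSaltRanges(uint32_t numSaltBuckets, uint32_t numEsps) {
    assert(numEsps > 0 && numEsps <= numSaltBuckets);
    std::vector<SaltRange> ranges;
    ranges.reserve(numEsps);
    for (uint32_t esp = 0; esp < numEsps; ++esp) {
        // 64-bit arithmetic avoids overflow in the intermediate product.
        uint32_t lo = static_cast<uint32_t>(
            static_cast<uint64_t>(esp) * numSaltBuckets / numEsps);
        uint32_t next = static_cast<uint32_t>(
            static_cast<uint64_t>(esp + 1) * numSaltBuckets / numEsps);
        ranges.push_back({lo, next - 1});
    }
    return ranges;
}

// Render the predicate one ESP would apply to the clustering key.
std::string saltPredicate(const SaltRange& r) {
    return "\"_SALT_\" between " + std::to_string(r.lo) +
           " and " + std::to_string(r.hi);
}
```

With 8 salt buckets and 4 ESPs, the second ESP would get the range 2 to 3. Because the ranges cover every salt value exactly once, no row is read by two ESPs.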

Also added support for the KeyPrefixRegionSplitPolicy. This might
be useful in the future when we push things like GROUP BY into the
region servers. It can ensure that keys with the same prefix of a
given length stay in the same region. Example for a table with
this split policy:

-- make sure all the line items for an order (first 4 bytes of the key)
-- stay within the same region
create table lineitems(orderno int not null,
                       lineno int not null,
                       comment char(10),
                       primary key (orderno, lineno))
hbase_options ( SPLIT_POLICY = 'org.apache.hadoop.hbase.regionserver.KeyPrefixRegionSplitPolicy',
                PREFIX_LENGTH_KEY = '4');
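
The effect of the prefix length can be sketched as follows. This is a simplified model, not HBase or Trafodion code: it assumes the policy works by truncating a candidate split point to the configured prefix length, and it uses readable ASCII keys instead of the real binary key encoding. The helper names truncateSplitPoint and inUpperRegion are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: cut a candidate split point down to the prefix length,
// so that a region boundary can only fall between two different prefixes.
std::string truncateSplitPoint(const std::string& candidate, std::size_t prefixLength) {
    return candidate.substr(0, std::min(prefixLength, candidate.size()));
}

// Hypothetical helper: true if the row key sorts at or after the split point,
// i.e. it would land in the upper of the two regions created by the split.
bool inUpperRegion(const std::string& rowKey, const std::string& splitPoint) {
    return rowKey >= splitPoint;
}
```

For example, with a 4-byte prefix the candidate split point "00420005" becomes "0042", so the keys "00420001" and "00420007" (two line items of order 0042) end up in the same region. With the untruncated split point they would be separated.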

Removed ENCODE_ON_DISK table property because it is deprecated and does nothing.

Patch set 2: Changed comment in ExpHbaseDefs.h.

Change-Id: I57fafe2f854475261313abcf5bd2c81013f43756
