Bug 1343615: Duplicated rows for parallel scan on salted table

- In preCodeGen, add partitioning key predicates to scan node if it
  uses a single subset key and HASH2 partitioning function
- Handle partitioning key preds in FileScan::preCodeGen, move code
  from HbaseAccess::preCodeGen.
- Make a special partitioning key predicate for salted tables with
  a HASH2 function:
    "_SALT_" between :pivLo and :pivHi
  This will lead to an efficient access path for the ESP to read
  only the data it is supposed to read.
- Salted tables have a HASH2 partitioning key that does not include
  the "_SALT_" column. So, the partitioning key is not a prefix of
  the clustering key. However, we need to apply the partitioning
  key predicates to the clustering key of the table, since that's
  the only key we have. This is different from a "partition access"
  node. See TableHashPartitioningFunction::createSearchKey() in
  file PartFunc.cpp.
- Moved a method to create a partitioning key predicate of the form
    <some expr> between :pivLo and :pivHi
  up in the class hierarchy, to be able to use it in both
  HashPartitioningFunction and TableHashPartitioningFunction
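To illustrate why a per-ESP range predicate on "_SALT_" avoids duplicated rows, here is a minimal Python sketch. It assumes salt values 0..N-1 and an even, contiguous split of those values across ESPs; the function names esp_salt_ranges and salt_predicate are hypothetical, and the actual :pivLo/:pivHi values are produced by the partitioning function, not by this code.

```python
def esp_salt_ranges(num_buckets, num_esps):
    """Return one inclusive (lo, hi) salt range per ESP.

    Assumption: salt values are 0 .. num_buckets-1 and are divided
    into contiguous, non-overlapping ranges. A range with hi < lo is
    empty, so that ESP reads no rows.
    """
    if num_buckets <= 0 or num_esps <= 0:
        raise ValueError("num_buckets and num_esps must be positive")
    ranges = []
    for esp in range(num_esps):
        lo = esp * num_buckets // num_esps
        hi = (esp + 1) * num_buckets // num_esps - 1
        ranges.append((lo, hi))
    return ranges


def salt_predicate(lo, hi):
    """Render the predicate with the range values filled in."""
    return '"_SALT_" between {} and {}'.format(lo, hi)


if __name__ == "__main__":
    # Each ESP gets its own range; no salt value appears twice.
    for esp, (lo, hi) in enumerate(esp_salt_ranges(8, 3)):
        print(esp, salt_predicate(lo, hi))
```

Because the ranges are disjoint and together cover every salt value, each row is read by exactly one ESP, which is the property the missing predicate failed to guarantee.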
Also added support for the KeyPrefixRegionSplitPolicy. This might be useful in the future when we push things like GROUP BY into the region servers. It can ensure that keys with the same prefix of a given length stay in the same region. Example for a table with this split policy:
  -- make sure all the line items for an order (first 4 bytes of the key)
  -- stay within the same region
  create table lineitems(orderno int not null,
                         lineno int not null,
                         comment char(10),
                         primary key (orderno, lineno))
    hbase_options (
      SPLIT_POLICY = 'org.apache.hadoop.hbase.regionserver.KeyPrefixRegionSplitPolicy',
      PREFIX_LENGTH_KEY = '4');
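The effect of the 4-byte prefix can be sketched in Python. This is only an illustration: it assumes each int key column is encoded as 4 big-endian bytes with the sign bit flipped (so byte order matches numeric order), which may not match the actual row key encoding, and the helper names are hypothetical.

```python
import struct

PREFIX_LENGTH = 4  # corresponds to PREFIX_LENGTH_KEY = '4' above


def encode_int(value):
    """Assumed encoding: 4 bytes, big-endian, sign bit flipped."""
    return struct.pack(">I", (value & 0xFFFFFFFF) ^ 0x80000000)


def row_key(orderno, lineno):
    """Concatenate the encoded primary key columns."""
    return encode_int(orderno) + encode_int(lineno)


def region_prefix(key, prefix_length=PREFIX_LENGTH):
    """The part of the key the split policy keeps together."""
    return key[:prefix_length]


if __name__ == "__main__":
    keys = [row_key(42, lineno) for lineno in (1, 2, 3)]
    # All line items of order 42 share one prefix, so a region
    # split point can never fall between them.
    print({region_prefix(k) for k in keys})
```

Under this assumed encoding the first 4 bytes are exactly the orderno column, so a split on prefix boundaries keeps every line item of an order in one region.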
Removed ENCODE_ON_DISK table property because it is deprecated and does nothing.