Clone
Barry Fritchman <barry.fritchman@hp.com>
committed
on 27 May 15
Avoid scanner timeout for Update Statistics
For performance reasons, Update Stats pushes sampling down into HBase,
using a filter that retur… Show more
Avoid scanner timeout for Update Statistics

For performance reasons, Update Stats pushes sampling down into HBase,

using a filter that returns only randomly selected rows. When the

sampling rate is very low, as is the case when the default sampling

protocol (which includes a sample limit of a million rows) is used on

a very large table, a long time can be taken in the region server

before returning to Trafodion, with the resultant risk of an

OutOfOrderScannerNextException. To avoid these timeouts, this fix

reduces the scanner cache size (the number of rows accumulated before

returning) used by a given scan based on the sampling rate. If an

adequate return time can not be achieved in this manner without

going below the scanner cache minimum prescribed by the

HBASE_NUM_CACHE_ROWS_MIN cqd, then the scanner cache reduction is

complemented by a modification of the sampling rate used in HBase.

The sampling rate used in HBase is increased, but the overall rate

is maintained by doing supplementary sampling of the returned rows in

Trafodion. For example, if the original sampling rate is .000001,

and reducing the scanner cache to the minimum still results in an

excessive average time spent in the region server, the sampling

may be split into a .00001 rate in HBase and a .01 rate in Trafodion,

resulting in the same effective .000001 overall rate.

Change-Id: Id05ab5063c2c119c21b5c6c002ba9554501bb4e1

Closes-Bug: #1391271

Show less

default + 8 more