Clone
Barry Fritchman <barry.fritchman@hp.com>
committed
on 01 Jul 14
Update Statistics performance improved by sampling in HBase
Update Statistics is much slower on HBase tables than it was for
Seaquest. A rec… Show more
Update Statistics performance improved by sampling in HBase

Update Statistics is much slower on HBase tables than it was for

Seaquest. A recent performance analysis revealed that much of the

deficit is due to the time spent retrieving the data from HBase that

is used to derive the histograms. Typically, Update Statistics uses a

1% random sample of a table’s rows for this purpose.  All rows of the

table are retrieved from HBase, and the random selection of which rows

to use is done in Trafodion.

To reduce the number of rows flowing from Hbase to Trafodion for queries

using a SAMPLE clause that specifies random sampling, the sampling logic

was pushed into the HBase layer using a RandomRowFilter, one of the

built-in filtersrovided by HBase. In the typical case of a 1% sample,

this reduces the number of rows passed from HBase to Trafodion by 99%.

After the fix was implemented, Update Stats on various tables was 2 to 4

times faster than before, when using a 1% random sample.

Change-Id: Icd40e4db1dde444dec76165c215596755afae96c

Show less

default + 10 more