Update Statistics performance improved by sampling in HBase

Update Statistics is much slower on HBase tables than it was for Seaquest. A recent performance analysis revealed that much of the deficit is due to the time spent retrieving the data from HBase that is used to derive the histograms. Typically, Update Statistics uses a 1% random sample of a table’s rows for this purpose. Before this change, all rows of the table were retrieved from HBase, and the random selection of which rows to use was done in Trafodion.
To reduce the number of rows flowing from HBase to Trafodion for queries using a SAMPLE clause that specifies random sampling, the sampling logic was pushed into the HBase layer using a RandomRowFilter, one of the built-in filters provided by HBase. In the typical case of a 1% sample, this reduces the number of rows passed from HBase to Trafodion by about 99%. After the fix was implemented, Update Statistics on various tables ran 2 to 4 times faster than before when using a 1% random sample.
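
The effect of moving the random selection to the storage layer can be illustrated with a small simulation. This is only a sketch, not Trafodion or HBase code: the function names (server_side_sample, rows_transferred) are hypothetical, and it assumes the filter keeps each row independently with a fixed probability, which is how HBase's RandomRowFilter treats its chance parameter.

```python
import random


def server_side_sample(rows, chance, rng):
    """Keep each row independently with probability `chance`.

    Assumption: this mirrors the per-row coin flip of a random row filter.
    """
    if chance <= 0:
        return []
    if chance >= 1:
        return list(rows)
    return [row for row in rows if rng.random() < chance]


def rows_transferred(total_rows, chance, pushed_down, seed=0):
    """Return (rows sent to the SQL layer, rows in the final sample).

    `pushed_down=False` models the old behavior: every row is transferred
    and the sample is chosen afterwards. `pushed_down=True` models sampling
    in the storage layer, so only the sampled rows are transferred.
    """
    rng = random.Random(seed)
    rows = range(total_rows)
    if pushed_down:
        transferred = server_side_sample(rows, chance, rng)
        sample = transferred
    else:
        transferred = list(rows)
        sample = server_side_sample(transferred, chance, rng)
    return len(transferred), len(sample)


if __name__ == "__main__":
    for pushed_down in (False, True):
        sent, sampled = rows_transferred(100_000, 0.01, pushed_down)
        print(f"pushed_down={pushed_down}: transferred={sent}, sample={sampled}")
```

In both modes the sample has roughly the same size, but with the selection pushed down only the sampled rows cross the boundary between the two layers. Because each row is chosen independently, the sample size is approximately, not exactly, 1% of the table.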