Enabling runtime stats for HBase operators

This is the first set of changes to collect runtime statistics for HBase tables and operators. It contains:
1) Populate the estimated row count in the HBase access TDB.
2) Collect the HBase access time and accessed row count at the JNI layer (currently for select operations only).
Partially reviewed by Mike H. and Selva G.
Removed the part that divides the estimated rows by the number of ESPs, based on the review comments.
Update Statistics performance improved by sampling in HBase

Update Statistics is much slower on HBase tables than it was for Seaquest. A recent performance analysis revealed that much of the deficit comes from the time spent retrieving the data from HBase that is used to derive the histograms. Typically, Update Statistics uses a 1% random sample of a table’s rows for this purpose. Before this change, all rows of the table were retrieved from HBase, and the random selection of which rows to use was done in Trafodion.
To reduce the number of rows flowing from HBase to Trafodion for queries using a SAMPLE clause that specifies random sampling, the sampling logic was pushed into the HBase layer using a RandomRowFilter, one of the built-in filters provided by HBase. In the typical case of a 1% sample, this reduces the number of rows passed from HBase to Trafodion by about 99%. After the fix was implemented, Update Statistics on various tables ran 2 to 4 times faster than before when using a 1% random sample.
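
The effect of moving the random selection next to the data can be illustrated with a small simulation. This is only a sketch in plain Python, not the Trafodion or HBase code: the function names (`random_row_filter`, `sample_client_side`, `sample_server_side`) are hypothetical, and the filter is assumed to keep each row independently with the given probability, which is the behavior described for RandomRowFilter.

```python
import random


def random_row_filter(rows, chance, rng):
    """Yield each row independently with probability `chance`.

    Hypothetical stand-in for a per-row random filter; not the HBase class.
    """
    for row in rows:
        if rng.random() < chance:
            yield row


def sample_client_side(table, chance, rng):
    """Sampling done by the consumer: every row is transferred, then filtered."""
    transferred = list(table)  # all rows cross the storage boundary
    sample = list(random_row_filter(transferred, chance, rng))
    return sample, len(transferred)


def sample_server_side(table, chance, rng):
    """Sampling done at the storage layer: only surviving rows are transferred."""
    transferred = list(random_row_filter(table, chance, rng))
    return transferred, len(transferred)


if __name__ == "__main__":
    table = range(1_000_000)  # assumed table size, for illustration only
    _, before = sample_client_side(table, 0.01, random.Random(1))
    _, after = sample_server_side(table, 0.01, random.Random(1))
    print(f"rows transferred, sampling in consumer: {before}")
    print(f"rows transferred, sampling at storage:  {after}")
```

With the same random seed both variants select the same rows; the only difference is how many rows have to be shipped before the selection is made.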