Provide quick row count estimation for Ustat

Update Statistics needs an estimate of the cardinality of an HBase table. Until now this estimate has come from an internal query that selects count(*) from the table. That query incurred significant overhead for large tables, and it occasionally failed with error 8448 because of a known coprocessor problem. This fix instead accesses the table's HFiles through the FileSystem interface and reads the EntryCount field in the trailer block of each file. A sample of the initial data blocks is read to estimate the number of KeyValues missing because of nulls and the number of non-PUT KeyValues. The number of rows is then estimated by dividing the adjusted count by the number of columns in the table. The MemStore of each of the table's regions is also checked to get the table's total storage outside of HFiles, and the number of rows in memory is estimated from the total MemStore size and the size-to-row-count ratio of the HFiles.
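The estimation arithmetic can be sketched as follows. This is a minimal, hypothetical sketch, not the Ustat implementation: the names RowCountEstimator, SampleStats, estimateHFileRows, and estimateMemStoreRows are illustrative, and the code assumes the EntryCount values, sample counts, column count, and storage sizes have already been collected. It also assumes the sample reports how many KeyValues, PUT KeyValues, and distinct rows it saw, and that a row with no nulls has exactly one KeyValue per column.

```java
import java.util.List;

/** Hypothetical sketch of the row count arithmetic; inputs are gathered elsewhere. */
final class RowCountEstimator {

    /** Counts observed while sampling the initial data blocks (assumed structure). */
    static final class SampleStats {
        final long keyValues; // all KeyValues read in the sample
        final long puts;      // PUT KeyValues among them
        final long rows;      // distinct rows seen in the sample

        SampleStats(long keyValues, long puts, long rows) {
            this.keyValues = keyValues;
            this.puts = puts;
            this.rows = rows;
        }
    }

    /** Estimate rows stored in HFiles from the per-file EntryCount values. */
    static double estimateHFileRows(List<Long> entryCounts, SampleStats sample, int numColumns) {
        long totalEntries = 0;
        for (long count : entryCounts) {
            totalEntries += count;
        }
        if (sample.keyValues == 0 || sample.puts == 0 || numColumns <= 0) {
            return 0.0;
        }
        // Remove non-PUT KeyValues (e.g. delete markers) in the proportion seen in the sample.
        double estimatedPuts = (double) totalEntries * sample.puts / sample.keyValues;
        // KeyValues absent because of nulls: a complete row would carry numColumns of them.
        long missingInSample = Math.max(0L, (long) numColumns * sample.rows - sample.puts);
        double estimatedMissing = estimatedPuts * missingInSample / sample.puts;
        // The adjusted count divided by the column count gives the row estimate.
        return (estimatedPuts + estimatedMissing) / numColumns;
    }

    /** Scale total MemStore size by the HFiles' rows-per-byte ratio. */
    static double estimateMemStoreRows(long memStoreBytes, double hfileRows, long hfileBytes) {
        if (hfileBytes <= 0) {
            return 0.0;
        }
        return memStoreBytes * hfileRows / hfileBytes;
    }
}
```

For example, EntryCount values of 400 and 600, a sample of 100 KeyValues containing 90 PUTs spread over 30 rows, and 4 columns give an estimate of 300 rows in the HFiles; a 600-byte MemStore against 3000 bytes of HFiles then adds about 60 rows.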
Pre-fetch cells from HBase

Pre-fetch is enabled via a parameter of the HTableClient.startScan method. Pre-fetch is not done for unique or batch Trafodion operations, or for any access to native HBase tables. It is also currently disabled for non-unique UMD Trafodion operations.
The startScan method issues the pre-fetch to HBase in a separate thread. When the fetchRows method is called, it waits for the pre-fetch to complete, passes the cell information to JNI, and starts another pre-fetch if there are more rows to fetch.
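The startScan/fetchRows hand-off amounts to double buffering: one batch is fetched in the background while the caller consumes the previous one. The Java sketch below illustrates that pattern under stated assumptions and is not the HTableClient code. BatchSource and PrefetchingScanner are hypothetical names, rows are plain strings rather than HBase cells, returning a batch stands in for passing cell information to JNI, and "more rows to fetch" is assumed to mean that the last batch came back full.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical source of row batches, standing in for the HBase scanner. */
interface BatchSource {
    /** Returns up to maxRows rows, or an empty list when the scan is exhausted. */
    List<String> nextBatch(int maxRows);
}

/** Hypothetical scanner mirroring the startScan/fetchRows pre-fetch flow. */
final class PrefetchingScanner implements AutoCloseable {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private BatchSource source;
    private int batchSize;
    private boolean prefetch;
    private Future<List<String>> pending; // outstanding background fetch, or null

    /** Starts a scan; with prefetch enabled, the first batch is requested on another thread. */
    void startScan(BatchSource source, int batchSize, boolean prefetch) {
        this.source = source;
        this.batchSize = batchSize;
        this.prefetch = prefetch;
        this.pending = prefetch ? executor.submit(() -> source.nextBatch(batchSize)) : null;
    }

    /** Returns the next batch (handed to JNI in the real code); an empty list means done. */
    List<String> fetchRows() throws InterruptedException, ExecutionException {
        if (!prefetch) {
            return source.nextBatch(batchSize); // synchronous fetch, no background thread
        }
        if (pending == null) {
            return new ArrayList<>(); // the previous batch was the last one
        }
        List<String> batch = pending.get(); // wait for the background fetch to complete
        // A full batch suggests more rows remain, so start the next fetch right away.
        pending = batch.size() == batchSize
                ? executor.submit(() -> source.nextBatch(batchSize))
                : null;
        return batch;
    }

    @Override
    public void close() {
        executor.shutdownNow();
    }
}
```

A single-thread executor keeps every read of the source on one thread, and because fetchRows waits on the outstanding Future before submitting the next one, at most one fetch is in flight at any time.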
We observed about a 45% reduction in the response time to fetch 12 million rows from a sixteen-partition table on one node using a single process.