Estimate HBase row count when stats not available

When statistics are not available for a table, give the optimizer a better estimate of its cardinality than the default value by reading summary information from the trailer block of the table's HFiles.

Provide quick row count estimation for Ustat

Update Statistics needs an estimate of the cardinality of an HBase table. Until now it obtained one by running an internal query that selects count(*) from the table. This incurred significant overhead for large files, and occasionally resulted in an 8448 error due to a known coprocessor problem.

This fix instead accesses the HFiles through the FileSystem interface and reads the EntryCount field in the trailer block of each file. A sample of the initial data blocks is read to determine the expected number of KeyValues missing due to nulls and the number of non-PUT KeyValues. The number of rows is estimated by dividing the adjusted count by the number of columns in the table.

The MemStore of each of the table's regions is checked to get the total storage for the table outside of HFiles. The number of rows in memory is then estimated from the total MemStore size and the size-to-rowcount ratio of the HFiles.
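
The arithmetic described above can be sketched roughly as follows. This is only an illustration and not the code in this change: the function name, its parameters, and the exact way the sampled counts are turned into an adjustment are all assumptions. It also takes the EntryCount values, file sizes, and MemStore size as plain numbers rather than reading them from HBase.

```python
def estimate_row_count(entry_counts, hfile_bytes, memstore_bytes, num_columns,
                       sampled_kvs=0, sampled_non_put=0, sampled_missing=0):
    """Hypothetical sketch of the row count estimate (all names are assumptions).

    entry_counts    -- EntryCount value read from the trailer of each HFile
    hfile_bytes     -- total size of the HFiles, in bytes
    memstore_bytes  -- total MemStore size across the table's regions, in bytes
    num_columns     -- number of columns in the table
    sampled_kvs     -- KeyValues seen in the sampled data blocks
    sampled_non_put -- how many of the sampled KeyValues were not PUTs
    sampled_missing -- cells absent from the sampled rows because of nulls
    """
    if num_columns <= 0:
        raise ValueError("num_columns must be positive")

    total_entries = sum(entry_counts)

    if sampled_kvs > 0:
        # Assumed adjustment: drop the non-PUT share of the entries and add
        # back the share of cells that nulls left out, scaled from the sample.
        sampled_puts = sampled_kvs - sampled_non_put
        adjusted = total_entries * (sampled_puts + sampled_missing) / sampled_kvs
    else:
        # No sample available: use the raw entry count unchanged.
        adjusted = float(total_entries)

    # One KeyValue per column per row, so divide by the column count.
    hfile_rows = adjusted / num_columns

    # Apply the HFiles' size-to-rowcount ratio to the MemStore size.
    if hfile_bytes > 0:
        memstore_rows = memstore_bytes * hfile_rows / hfile_bytes
    else:
        memstore_rows = 0.0

    return round(hfile_rows + memstore_rows)


if __name__ == "__main__":
    # Two HFiles with 1200 and 800 entries, a 4-column table, and a sample of
    # 100 KeyValues containing 20 non-PUTs and 10 cells missing due to nulls.
    print(estimate_row_count([1200, 800], 90_000, 9_000, 4,
                             sampled_kvs=100, sampled_non_put=20,
                             sampled_missing=10))  # prints 495
```

In this example the 2000 entries are adjusted to 1800, which gives 450 rows in HFiles; the MemStore holds one tenth as many bytes as the HFiles, which adds an estimated 45 rows.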