Provide quick row count estimation for Ustat
Update Statistics needs an estimate of the cardinality of an HBase table. Until now, that estimate came from an internal query that selected count(*) from the table. This incurred significant overhead for large tables and occasionally resulted in an 8448 error caused by a known coprocessor problem. The approach implemented by this fix is to access the HFiles through the FileSystem interface and read the EntryCount field in the trailer block of each file. Some of the initial data blocks are sampled to estimate the number of KeyValues missing because of nulls and the number of non-PUT KeyValues. The number of rows is then estimated by dividing the adjusted count by the number of columns in the table. The MemStore of each of the table's regions is also checked to get the total storage for the table outside of HFiles, and the number of rows in memory is estimated from the total MemStore size and the size-to-rowcount ratio of the HFiles.
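The arithmetic described above can be sketched roughly as follows. This is an illustration only, not the Ustat code: the function name, its parameters, and the way the sampled counts are scaled up to the whole table are assumptions made for the example.

```python
def estimate_row_count(total_entries, hfile_bytes, memstore_bytes,
                       num_columns, sample_kvs, sample_non_put,
                       sample_missing):
    """Rough row count estimate from HFile trailer counts and a block sample.

    total_entries  -- sum of EntryCount over all HFile trailers
    hfile_bytes    -- total size of the HFiles
    memstore_bytes -- total MemStore size over all regions of the table
    num_columns    -- number of columns in the table
    sample_kvs     -- KeyValues seen in the sampled data blocks
    sample_non_put -- sampled KeyValues that are not PUTs
    sample_missing -- KeyValues estimated absent from the sample due to nulls
    All names and the scaling rule are hypothetical.
    """
    if num_columns <= 0:
        raise ValueError("num_columns must be positive")
    if total_entries == 0 or hfile_bytes == 0 or sample_kvs == 0:
        # Assumption: without HFile data there is no size-to-rowcount
        # ratio, so this sketch cannot estimate the in-memory rows either.
        return 0

    # Scale the sample to the whole table: drop the non-PUT share and add
    # back the share of KeyValues that nulls left out.
    adjusted = total_entries * (sample_kvs - sample_non_put + sample_missing) / sample_kvs

    # Each row contributes one KeyValue per column.
    hfile_rows = adjusted / num_columns

    # Apply the HFile rows-per-byte ratio to the MemStore size.
    memstore_rows = hfile_rows * memstore_bytes / hfile_bytes

    return round(hfile_rows + memstore_rows)
```

For example, with 1,000,000 entries in 100 MB of HFiles, 10 MB in MemStore, 5 columns, and a sample of 1,000 KeyValues containing 50 non-PUTs and 100 estimated missing values, the sketch yields 210,000 rows on disk plus 21,000 in memory.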
Fix histograms for primary key of salted tables
Users attempting to use Update Statistics to create a multi-column histogram (MC) corresponding to the primary key of a salted table may be unaware that the "_SALT_" column is implicitly prepended to the key declared in the Create Table statement, and so omit it. With this fix, Update Statistics detects a request for a multi-column histogram that specifies the primary key columns (or a prefix of the full key), adds _SALT_ if it is missing, and orders the MC columns to match the order of the columns in the primary key.
The change affects only salted tables, and is applied only if neither the ON EVERY KEY nor the ON EVERY COLUMN clause is present, because an MC matching the full primary key is generated automatically in those cases.
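The column-list adjustment for an explicitly requested MC can be sketched as below. This is a minimal illustration under stated assumptions, not the Update Statistics implementation: the function name and arguments are hypothetical, column names are assumed to be already normalized to one case, and the primary key list is assumed to exclude _SALT_.

```python
SALT = "_SALT_"

def normalize_mc_columns(requested, key_columns):
    """Return the column list to use for a requested MC on a salted table.

    requested   -- columns named in the MC request, in the user's order
    key_columns -- primary key columns in key order, without _SALT_
    Both the name and the signature are hypothetical.
    """
    if len(requested) < 2:
        # A single column is not a multi-column histogram.
        return list(requested)

    wanted = [c for c in requested if c != SALT]
    prefix = key_columns[:len(wanted)]

    # Only adjust when the request names exactly the columns of a key
    # prefix (or of the full key), in any order.
    if not wanted or len(set(wanted)) != len(wanted) or set(wanted) != set(prefix):
        return list(requested)

    # Add _SALT_ if missing and follow the primary key column order.
    return [SALT] + prefix
```

With a key of (A, B, C), a request for (B, A) becomes (_SALT_, A, B), while a request for (A, C) is left unchanged because it is not a prefix of the key.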
A second part of this fix applies to cases where ON EVERY KEY or ON EVERY COLUMN is specified in an Update Statistics statement on a salted table. By default, MCs corresponding to subsets of the primary key will no longer be generated automatically in this case. The cqd USTAT_ADD_SALTED_KEY_PREFIXES_FOR_MC may be set to 'ON' to cause MCs for subsets of the primary key to be generated.