Avoid scanner timeout for Update Statistics

For performance reasons, Update Stats pushes sampling down into HBase, using a filter that returns only randomly selected rows. When the sampling rate is very low, as happens when the default sampling protocol (which includes a sample limit of a million rows) is used on a very large table, the region server can take a long time before returning to Trafodion, risking an OutOfOrderScannerNextException.

To avoid these timeouts, this fix reduces the scanner cache size (the number of rows accumulated before returning) used by a given scan, based on the sampling rate. If an adequate return time cannot be achieved this way without going below the scanner cache minimum prescribed by the HBASE_NUM_CACHE_ROWS_MIN cqd, the scanner cache reduction is complemented by a change to the sampling rate used in HBase: that rate is increased, and the overall rate is maintained by supplementary sampling of the returned rows in Trafodion. For example, if the original sampling rate is .000001, and reducing the scanner cache to the minimum still results in an excessive average time spent in the region server, the sampling may be split into a .0001 rate in HBase and a .01 rate in Trafodion, resulting in the same effective .000001 overall rate.
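
The cache reduction and rate split can be illustrated with a minimal sketch. This is not the Trafodion implementation: the function name, its parameters, and the cost model (time in the region server is taken to be proportional to rows scanned, so filling a cache of n rows at rate r scans about n / r rows) are all assumptions made for illustration.

```python
import math


def plan_sampling(overall_rate, default_cache_rows, min_cache_rows,
                  max_rows_scanned_per_call):
    """Return (cache_rows, hbase_rate, trafodion_rate).

    Hypothetical model: filling a scanner cache of n rows at sampling
    rate r makes the region server scan roughly n / r rows, so the cache
    is sized to keep that under max_rows_scanned_per_call.
    """
    if not 0 < overall_rate <= 1:
        raise ValueError("overall_rate must be in (0, 1]")
    if default_cache_rows < min_cache_rows:
        raise ValueError("default cache must not be below the minimum")

    # Step 1: shrink the scanner cache in proportion to the sampling rate.
    cache_rows = min(default_cache_rows,
                     math.floor(max_rows_scanned_per_call * overall_rate))
    if cache_rows >= min_cache_rows:
        # The smaller cache is enough; no supplementary sampling needed.
        return cache_rows, overall_rate, 1.0

    # Step 2: the cache is pinned at its minimum, so raise the rate used
    # in HBase and make up the difference by sampling again in Trafodion.
    hbase_rate = min(1.0, min_cache_rows / max_rows_scanned_per_call)
    trafodion_rate = overall_rate / hbase_rate
    return min_cache_rows, hbase_rate, trafodion_rate


if __name__ == "__main__":
    cache, hbase, traf = plan_sampling(
        overall_rate=1e-6, default_cache_rows=10_000,
        min_cache_rows=100, max_rows_scanned_per_call=1_000_000)
    print(cache, hbase, traf, hbase * traf)
```

The invariant the sketch preserves is that the product of the two rates equals the requested overall rate, so the effective sample size is unchanged while each scanner call does a bounded amount of work.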
Changes to enable Rowset select - Fix for bug 1423327

HBase always returns an empty result set when the row is not found. Trafodion now exploits this behavior to project no data for such rows in a rowset select.
The optimizer can now choose a plan involving Rowset Select wherever possible. This can change the plans chosen for queries: a nested join plan instead of a hash join, a VSBB delete instead of a regular delete, or a VSBB insert instead of a regular insert.
A new CQD, HBASE_ROWSET_VSBB_SIZE, has been added to control the HBase rowset size. The default value is 1000.
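
The rowset behavior described above can be pictured with a small sketch. It is a toy model, not Trafodion or HBase code: the table is a plain dictionary, `rowset_select` is a hypothetical name, and the only details taken from the notes are that keys are fetched in rowsets of a configurable size (default 1000) and that an empty result means the row was not found and projects no data.

```python
from itertools import islice


def rowset_select(table, keys, rowset_size=1000):
    """Yield (key, row) for each key present in `table`.

    Keys are fetched in batches of `rowset_size`. A missing key produces
    an empty result, which is skipped instead of being treated as an error.
    """
    key_iter = iter(keys)
    while True:
        batch = list(islice(key_iter, rowset_size))
        if not batch:
            return
        # One simulated batched get: an empty dict stands in for the
        # empty result set returned for a row that does not exist.
        results = [table.get(key, {}) for key in batch]
        for key, row in zip(batch, results):
            if row:  # empty result => row not found => project nothing
                yield key, row


if __name__ == "__main__":
    table = {1: {"a": "x"}, 3: {"a": "z"}}
    for key, row in rowset_select(table, [1, 2, 3], rowset_size=2):
        print(key, row)
```

Because a missing row simply contributes nothing to the output, the caller needs no per-key existence check, which is what makes the batched form usable as a drop-in for a sequence of single-row lookups.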