Update Statistics performance improved by sampling in HBase

Update Statistics is much slower on HBase tables than it was on Seaquest. A recent performance analysis revealed that much of the deficit is due to the time spent retrieving from HBase the data used to derive the histograms. Typically, Update Statistics uses a 1% random sample of a table's rows for this purpose. Previously, all rows of the table were retrieved from HBase, and the random selection of which rows to use was done in Trafodion.
To reduce the number of rows flowing from HBase to Trafodion for queries using a SAMPLE clause that specifies random sampling, the sampling logic was pushed down into the HBase layer using RandomRowFilter, one of the built-in filters provided by HBase. In the typical case of a 1% sample, this reduces the number of rows passed from HBase to Trafodion by 99%. After the fix was implemented, Update Statistics on various tables ran 2 to 4 times faster than before when using a 1% random sample.
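HBase's RandomRowFilter takes a chance parameter and includes each row independently with that probability, so a 1% sample passes roughly 1 in 100 rows on to the client. A minimal, self-contained sketch of that Bernoulli selection (plain Java with no HBase dependency; the class and method names here are illustrative, not from the Trafodion source):

```java
import java.util.Random;

public class BernoulliSample {
    // Mimics RandomRowFilter semantics: keep each row with probability `chance`.
    static int sample(int totalRows, float chance, long seed) {
        Random rng = new Random(seed);
        int kept = 0;
        for (int i = 0; i < totalRows; i++) {
            // Each row is an independent coin flip; no row order or key is consulted.
            if (rng.nextFloat() < chance) {
                kept++;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // With chance = 0.01, roughly 1% of rows survive the filter;
        // the other ~99% never leave the HBase region server.
        System.out.println(sample(1_000_000, 0.01f, 42L));
    }
}
```

Because the filter runs inside the region server, the 99% of rows it rejects are never serialized or sent over the network, which is where the 2x-4x speedup comes from.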
Trafodion bulk load changes (disabled), fixing a conflict in install_local_hadoop
These changes include:
* Changes to the install scripts (install_local_hadoop and other files):
** HBase now runs on top of HDFS instead of the local file system. This change may require running install_local_hadoop again when you rebase and initialize Trafodion again.
** You may lose your tables. If you have tables that you need to keep, please use extract to save the data before you rebase, then load it back after you rebase.
* A coprocessor to support a secure way of doing a load using hidden folders (also works with non-secure HBase). Secure load is disabled by default.
* Recovery using snapshots (disabled by default): when enabled, a snapshot is taken before the load starts and is restored if something goes wrong; otherwise it is deleted after the data is loaded.
* Changes to the Makefiles to build the coprocessors with Java 7 and Java 6.
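The snapshot-based recovery item above follows a take/restore-or-delete pattern: capture the table state before the load, roll back to it on failure, and discard it on success. A self-contained sketch of that control flow (plain Java lists stand in for an HBase table and its snapshot; all names are illustrative, not the actual Trafodion coprocessor code):

```java
import java.util.ArrayList;
import java.util.List;

public class SnapshotRecovery {
    // Loads `newRows` into `table`, simulating the bulk-load recovery flow:
    // a "snapshot" is taken first, restored on failure, discarded on success.
    static void load(List<String> table, List<String> newRows, boolean failMidway) {
        // Take the snapshot before the load starts.
        List<String> snapshot = new ArrayList<>(table);
        try {
            for (int i = 0; i < newRows.size(); i++) {
                if (failMidway && i == newRows.size() / 2) {
                    throw new RuntimeException("simulated load failure");
                }
                table.add(newRows.get(i));
            }
            // Load succeeded: the snapshot is no longer needed (deleted).
            snapshot = null;
        } catch (RuntimeException e) {
            // Load failed: restore the table from the snapshot.
            table.clear();
            table.addAll(snapshot);
        }
    }
}
```

In real HBase the same flow would use the admin snapshot, restore_snapshot, and delete_snapshot operations against the target table; the sketch only shows why a failed load leaves the table exactly as it was.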