Avoid placing large scan results in the RegionServer cache

By default, the result of every Scan and Get request to HBase is placed in the RegionServer cache. When a scan returns many rows, this can cause cache thrashing: results that are being shared by other queries get flushed out. This change uses cardinality estimates and HBase row size estimates, together with the configured size of the RegionServer cache, to determine when such thrashing may occur.

The RegionServer heap size is specified through the cqd HBASE_REGION_SERVER_MAX_HEAP_SIZE, in MB. The fraction of this heap allocated to the block cache is read from the config file once per session through a lightweight (no disk access) JNI call.

The heuristic is approximate: it does not consider the total number of RegionServers, or that a scan may sometimes be concentrated in one or a few of them. We simply do not place rows in the RS cache if the memory used by all rows in a scan would exceed the cache of a single RS.

The change can be overridden with cqd HBASE_CACHE_BLOCKS 'ON'; the default is now SYSTEM. It applies to both count(*) coproc plans and regular scans.
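
The decision described above can be sketched as follows. This is a minimal illustration only, not the actual compiler code: the function name, parameter names, and the handling of an 'OFF' setting are assumptions made for this example, and the block cache fraction is passed in rather than read from the HBase config file.

```python
def should_cache_blocks(est_rows, est_row_size_bytes, rs_max_heap_mb,
                        block_cache_fraction, cache_blocks_setting="SYSTEM"):
    """Return True if scan results should be placed in the RS block cache.

    All names here are hypothetical; they only mirror the inputs named in
    the change description (cardinality estimate, row size estimate,
    HBASE_REGION_SERVER_MAX_HEAP_SIZE in MB, block cache fraction,
    HBASE_CACHE_BLOCKS).
    """
    setting = cache_blocks_setting.upper()
    if setting == "ON":
        # Explicit override: always cache, regardless of the estimates.
        return True
    if setting == "OFF":
        # Assumption for this sketch: 'OFF' disables caching entirely.
        return False

    # SYSTEM: compare the estimated scan footprint with the block cache
    # of a single RegionServer (heap in MB times the cache fraction).
    cache_bytes = rs_max_heap_mb * 1024 * 1024 * block_cache_fraction
    scan_bytes = est_rows * est_row_size_bytes
    return scan_bytes <= cache_bytes
```

For example, with a 1024 MB heap and a cache fraction of 0.4, a scan estimated at 10 million rows of 100 bytes each (about 1 GB) exceeds the roughly 410 MB cache and would bypass it, while a 1000-row scan would still be cached.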

Reworked fix for LP bug 1404951

The scan cache size for an MDAM probe is now set to the HBase default of 100. Setting it to values like 1 or 2 resulted in intermittent failures. The cqd COMP_BOOL_184 can be set ON to get a cache size of 1 for MDAM probes. The root cause of the intermittent failures will be investigated later.