Clone
suresh.subbiah@hp.com <suresh.subbiah@hp.com>
committed
on 08 Feb 15
Change to avoid placing large scan results in RegionServer cache
By default the result of every Scan and Get request to HBase is placed in
t… Show more
Change to avoid placing large scan results in RegionServer cache

By default the result of every Scan and Get request to HBase is placed in

the RegionServer cache. When a scan returns a lot of rows this can lead to

cache thrashing, causing results which are being shared by other queries

to be flushed out. This change uses cardinality estimates and hbase row size

estimates along with the configured size of region server cache size to

determine when such thrashing may occur. Heap size for region server is

specified through a cqd HBASE_REGION_SERVER_MAX_HEAP_SIZE. The units are in MB.

The fraction of this heap allocated to block cache is read from the config

file once per session through a lightweight (no disk access) JNI call. The

hueristic used is approximate as it does not consider total number of region

servers or that sometimes a scan may be concentrated in one or a few region

servers. We simply do not place rows in RS cache, if the memory used by all

rows in a scan will exceed the cache in a single RS. Change can be overridden

with cqd HBASE_CACHE_BLOCKS 'ON'. The default is now SYSTEM. Change applies

to both count(*) coproc plans and regular scans.

Change-Id: I0afc8da44df981c1dffa3a77fb3efe2f806c3af1

Show less

default + 9 more