Compaction Techniques & Crash Recovery in HBase
Compaction in HBase

The recommended maximum region size is 10 to 20 GB. For HBase clusters running version 0.90.x, the recommended maximum region size is 4 GB and the default is 256 MB.

Compaction is the process by which HBase cleans up its own storage files. HBase is a distributed data store optimized for read performance, and reads are fastest when each column family is stored in a single file. During heavy writes that is not always possible, because every memstore flush adds another HFile. For that reason HBase tries to combine all HFiles into a single large HFile, which reduces the maximum number of disk seeks a read needs. This process is known as compaction. Compactions can also cause HBase to block writes in order to prevent JVM heap exhaustion.

There are two types of compaction: minor compaction and major compaction. Both take time to merge the files and both generate network traffic. To limit the impact of that traffic, compaction is generally scheduled during low peak ...
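The difference between the two compaction types can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not HBase's actual implementation: the names `minor_compact` and `major_compact`, the `(row_key, timestamp, value)` cell layout, the use of `None` as a delete marker, and the "pick the smallest files" rule are all hypothetical simplifications. Real HBase selects files with a configurable compaction policy.

```python
import heapq

# Hypothetical model: an "HFile" is a list of (row_key, timestamp, value) cells,
# sorted by row_key and then by newest timestamp first.
# A value of None stands in for a delete marker (tombstone).

def _sort_key(cell):
    row, ts, _ = cell
    return (row, -ts)  # same order the files are already sorted in

def minor_compact(hfiles, max_files=3):
    """Merge the `max_files` smallest files into one; keep every cell."""
    if len(hfiles) < 2:
        return list(hfiles)
    by_size = sorted(hfiles, key=len)
    picked, rest = by_size[:max_files], by_size[max_files:]
    merged = list(heapq.merge(*picked, key=_sort_key))
    return rest + [merged]

def major_compact(hfiles):
    """Merge all files into one; keep the newest cell per row, drop deletes."""
    out = []
    last_row = None
    for row, ts, value in heapq.merge(*hfiles, key=_sort_key):
        if row == last_row:
            continue  # an older version of a row we already handled
        last_row = row
        if value is not None:  # a delete marker removes the row entirely
            out.append((row, ts, value))
    return [out]

# Four small files, as might accumulate after several flushes.
f1 = [("a", 1, "x"), ("c", 1, "y")]
f2 = [("b", 2, "z")]
f3 = [("a", 3, None), ("c", 4, "y2")]  # deletes row "a", updates row "c"
f4 = [("d", 5, "w"), ("e", 5, "v"), ("f", 5, "u")]

after_minor = minor_compact([f1, f2, f3, f4])
after_major = major_compact([f1, f2, f3, f4])
```

In this model a read has to consult four files before compaction, two after the minor compaction, and one after the major compaction. Only the major compaction discards the deleted row and the superseded version of row "c".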