Understanding & Tuning Compaction Algorithms Nicolas Spiegelberg Software Engineer, Facebook HBase Users Group, January 23, 2013
Agenda
1 Background
2 Compactions in HBase
3 Compactions: Other System Algorithms
4 Parting Thoughts

Compactions: Background

Log Structured Merge Tree
[Diagram: a Server holds Shards (#1, #2, ....); each Shard holds ColumnFamilies (#1, #2, ....); each ColumnFamily has a Memstore that flushes to HFiles]
Data in HFile is sorted; has block index for efficient retrieval

About LSMT
Write Algorithms are relatively-trivial
▪ Write new, immutable file
▪ Avoid stalls
Read Algorithms are varied
▪ Compaction
▪ Server-side Filters
▪ Block Index
▪ Bloom Filter

Compactions: Intro
Critical for Read Performance
▪ Merge N files
▪ Reduces read IO when earlier filters don’t help enough
▪ The most complicated part of an LSMT
▪ What & when to select
[Diagram: HFiles, Merge]

Compactions: Disclaimers
Assumptions
▪ Only general algorithms included
  ▪ Coprocessors available for some common apps
▪ Assume a relatively-stable R+W workload

Compactions in HBase

Sigma Compaction
Default algorithm in HBase 0.90
#1. File selection based on summation of sizes.
    size[i] < (size[0] + size[1] + … + size[i-1]) * C
#2. Compact only if at least N eligible files found.
+ trivial implementation
+ minimal overwrites
- non-deterministic latency
- files have variable lifetime
- no …

Compactions: Configuration
All Compaction Algorithms
▪ hbase.hstore.compaction.ratio
▪ hbase.hstore.compaction.min
▪ hbase.hregion.majorcompaction
▪ hbase.offpeak.start.hour
▪ hbase.offpeak.end.hour
▪ hbase.hstore.compaction.ratio.offpeak

Tiered Compaction
Default algorithm in BigTable/HBase
#1. File selection based on size relative to a pivot:
    size[i] * C >= size[p] <= size[k] / C :: i < p < k
#2. Compact only if at least N eligible files found.
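For comparison with the pivot rule above, the Sigma selection rule from the earlier slide can be sketched in a few lines of Python. This is an illustration of the two steps as written on the slide, not HBase's actual implementation: the function name `sigma_select`, the parameter names, and the assumption that `sizes` is already in the index order the slide's formula uses are all hypothetical. Each file is tested on its own against the sum of every file before it.

```python
from typing import List


def sigma_select(sizes: List[int], ratio: float, min_files: int) -> List[int]:
    """Return indices of files eligible under the slide's Sigma rule.

    Hypothetical helper: `sizes` is assumed to be ordered as in the
    slide's formula, `ratio` plays the role of C, `min_files` of N.
    """
    eligible = []
    running_sum = 0  # size[0] + ... + size[i-1]
    for i, size in enumerate(sizes):
        # Step #1: size[i] < (size[0] + ... + size[i-1]) * C.
        # The first file has nothing before it, so it never qualifies.
        if i > 0 and size < running_sum * ratio:
            eligible.append(i)
        running_sum += size
    # Step #2: compact only if at least N eligible files were found.
    return eligible if len(eligible) >= min_files else []


if __name__ == "__main__":
    # Made-up file sizes (arbitrary units), for illustration only.
    print(sigma_select([10, 10, 15, 100, 20], ratio=1.2, min_files=2))
```

With these made-up sizes, the 100-unit file is skipped because it is larger than 1.2 times the 35 units that precede it, while the smaller files around it qualify. Raising `min_files` above the number of qualifying files makes the function return an empty selection, which corresponds to step #2 declining to compact.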
(groups files into “tiers”)
+ trivial implementation
+ more deterministic behavior
+ medium size files are warm
- more file seeks necessary
- still write-biased
- no incremental benefit

Compactions: Configuration
Tiered Compaction
▪ Enable: “hbase.hstore.compaction.CompactionPolicy”
▪ Default.NumCompactionTiers
▪ Default.Tier.X
  ▪ MaxSize
  ▪ MaxAgeInDisk

Compactions: Work Queues
▪ Problem: Starvation
▪ Solution:
  ▪ Handle Large & Small Compactions Differently
  ▪ Allow a configurable “throttle” to determine which queue

Compactions: Configuration
Compaction Work Queues
▪ hbase.regionserver.thread.compaction.small
▪ hbase.regionserver.thread.compaction.large
▪ hbase.regionserver.thread.compaction.throttle / “ThrottlePoint”

Compactions: Other Algorithms

Leveled Compaction
Default algorithm in LevelDB
#1. Bucket into tiers of magnitude difference (~10x)
#2. Shard the compaction across files (not just block index)
#3. Compact only the shard that goes over a certain size
+ optimized for read-heavy use
+ faster compaction turnaround
+ easy to cache-on-compact
- complicated algorithm
- heavy rewrites on write-dominated use
- time range filters less effective

Time-Series Compaction
▪ Log-structured Merge Tree
▪ Time-ordered Data Storage!
▪ Time-Series Compaction
▪ Implement with Coprocessor
▪ Time-boundary Based
▪ Shard HFiles on Hour, Day, etc…
▪ Time-series data optimized
▪ Write-biased query optimized
[Diagram: flush to HFiles, sharded by hour…, day…]

Parting Thoughts

Compactions: Associated JIRAs
▪ 0.90: Sigma Compactions (HBASE-3209)
▪ 0.92: Multi-Threaded Compactions (HBASE-1476)
▪ 0.96: Tier-based Compaction (HBASE-6371 & 7055)
▪ Future: Make Compactions Pluggable (HBASE-7516), Leveled Compaction (HBASE-7519)

Compactions: High Level Thoughts
Variables
▪ Disk IO on HFile Read
▪ Disk & Network IO on Compaction (R+W)

Compactions: High Level Thoughts
Related Questions
▪ Is data mutated or appended?
  ▪ Mutates benefit from lazy seeks but cause disk bloat
  ▪ HFile reduction is less useful as Row queries are larger
▪ Are you missing critical filters?
  ▪ Explicit vs. Implicit Requests
  ▪ Cache on write/compact (CacheConfig)
  ▪ Time Range / Column Filter
  ▪ Bloom Filters: non-trivial decision, need to measure

Thanks! Questions?