Understanding & Tuning Compaction Algorithms Nicolas Spiegelberg Software Engineer, Facebook HBase Users Group, January 23, 2013

Agenda
1. Background
2. Compactions in HBase
3. Compactions: Other System Algorithms
4. Parting Thoughts
Compactions: Background
Log Structured Merge Tree
[Diagram: a server hosts shards (#1, #2, ...); each shard holds column families (#1, #2, ...); each column family has a Memstore that is flushed to HFiles]
Data in an HFile is sorted and has a block index for efficient retrieval
About LSMT
Write algorithms are relatively trivial
▪ Write new, immutable file
▪ Avoid stalls
Read algorithms are varied
▪ Compaction
▪ Server-side Filters
▪ Block Index
▪ Bloom Filter
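To make the write and read sides above concrete, here is a minimal toy sketch in Python. It is not HBase code: the class name `TinyLSM`, the `flush_threshold` parameter, and the in-memory "files" are all assumptions made for illustration. Writes go to an in-memory store that is flushed as a new immutable, sorted file; reads check the in-memory store and then the files from newest to oldest.

```python
class TinyLSM:
    """Toy log-structured store: in-memory buffer plus immutable sorted files."""

    def __init__(self, flush_threshold=2):
        self.memstore = {}            # mutable, in-memory writes
        self.files = []               # immutable sorted files, newest last
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Writing is simple: dump the buffer as one new sorted, immutable file.
        if self.memstore:
            self.files.append(tuple(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, key):
        # Reading is the hard part: every file may have to be consulted.
        if key in self.memstore:
            return self.memstore[key]
        for sorted_file in reversed(self.files):   # newest file first
            for k, v in sorted_file:
                if k == key:
                    return v
        return None


db = TinyLSM(flush_threshold=2)
db.put("a", 1)
db.put("b", 2)   # triggers a flush
db.put("a", 3)
db.put("c", 4)   # triggers a second flush
```

Each flush adds one more file that a read may have to touch, which is the cost compaction is meant to reduce.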
Compactions: Intro
Critical for Read Performance
▪ Merge N files
▪ Reduces read IO when earlier filters don’t help enough
▪ The most complicated part of an LSMT
▪ What & when to select
[Diagram: HFiles being merged]
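The "merge N files" step can be sketched as a k-way merge of sorted runs. This is a simplified illustration, not the HBase implementation: the `(key, seq, value)` tuple layout and the function name `merge_runs` are assumptions, and deletes (tombstones) are ignored.

```python
import heapq


def merge_runs(runs):
    """Merge sorted runs of (key, seq, value), keeping the highest-seq entry per key.

    Each run is assumed to be sorted by key, with newer entries (higher seq)
    first among equal keys.
    """
    merged = heapq.merge(*runs, key=lambda cell: (cell[0], -cell[1]))
    out = []
    for key, seq, value in merged:
        if out and out[-1][0] == key:
            continue  # a newer entry for this key was already kept
        out.append((key, seq, value))
    return out


older = [("a", 1, "a-old"), ("c", 1, "c-old")]
newer = [("a", 2, "a-new"), ("b", 2, "b-new")]
compacted = merge_runs([older, newer])
```

After the merge, a read for any key touches one file instead of two.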
Compactions: Disclaimers
Assumptions
▪ Only general algorithms included
  ▪ Coprocessors available for some common apps
▪ Assume a relatively stable R+W workload
Compactions in HBase
Sigma Compaction
Default algorithm in HBase 0.90
#1. File selection based on summation of sizes.
size[i] < (size[0] + size[1] + … + size[i-1]) * C
#2. Compact only if at least N eligible files found.
+ trivial implementation
+ minimal overwrites
- non-deterministic latency
- files have variable lifetime
- no
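The two selection rules can be sketched as follows. This is one reading of the slide's inequality, not the HBase source: it assumes `sizes[0]` is the newest file, that the selection is the contiguous run of files up to the last index satisfying the inequality, and that `ratio` and `min_files` stand in for C and N. The function name and the default values are illustrative.

```python
def select_sigma(sizes, ratio=1.2, min_files=3):
    """Return indices of files to compact, or [] if fewer than min_files qualify.

    sizes[0] is assumed to be the newest file; higher indices are older.
    """
    running = 0          # size[0] + ... + size[i-1]
    last_eligible = -1
    for i, size in enumerate(sizes):
        if size < running * ratio:      # rule #1: compare against the running sum
            last_eligible = i
        running += size
    selected = list(range(last_eligible + 1))
    return selected if len(selected) >= min_files else []   # rule #2


small_files = select_sigma([10, 11, 20, 500])
spread_out = select_sigma([10, 100, 1000])
```

In the first call the 500-unit file is left alone because it dwarfs the sum of the files before it; in the second, no file qualifies, so nothing is compacted.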
Compactions: Configuration
All Compaction Algorithms
▪ hbase.hstore.compaction.ratio
▪ hbase.hstore.compaction.min
▪ hbase.hregion.majorcompaction
▪ hbase.offpeak.start.hour
▪ hbase.offpeak.end.hour
▪ hbase.hstore.compaction.ratio.offpeak
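The off-peak settings above suggest that a different ratio applies during a configured window of hours. A minimal sketch of that idea, assuming the window is half-open (`start_hour` inclusive, `end_hour` exclusive) and may wrap past midnight; the function name and argument values are illustrative, not HBase's API:

```python
def effective_ratio(hour, ratio, offpeak_ratio, start_hour, end_hour):
    """Pick the compaction ratio for an hour of day (0-23)."""
    if start_hour <= end_hour:
        off_peak = start_hour <= hour < end_hour
    else:
        # Window wraps past midnight, e.g. 22 -> 6.
        off_peak = hour >= start_hour or hour < end_hour
    return offpeak_ratio if off_peak else ratio


night = effective_ratio(23, ratio=1.2, offpeak_ratio=5.0, start_hour=22, end_hour=6)
noon = effective_ratio(12, ratio=1.2, offpeak_ratio=5.0, start_hour=22, end_hour=6)
```

A larger off-peak ratio makes more files eligible, so heavier compactions are pushed into quiet hours.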
Tiered Compaction
Default algorithm in BigTable/HBase
#1. File selection based on size relative to a pivot:
size[i] * C >= size[p] <= size[k] / C :: i < p < k
#2. Compact only if at least N eligible files found.
(groups files into “tiers”)
+ trivial implementation
+ more deterministic behavior
+ medium size files are warm
- more file seeks necessary
- still write-biased
- no incremental benefit
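The pivot inequality on this slide is terse, so the following is only a loose sketch of the "tiers" idea and not the HBase tier-based policy: files are sorted by size, a new tier starts whenever a file is more than `c` times larger than the smallest file in the current tier, and a tier is a compaction candidate once it holds at least `min_files` files. File age and ordering are ignored, and all names are hypothetical.

```python
def group_into_tiers(sizes, c=2.0):
    """Group file sizes into tiers of similar magnitude (within a factor c)."""
    tiers = []
    for size in sorted(sizes):
        if tiers and size <= tiers[-1][0] * c:
            tiers[-1].append(size)      # close enough to the tier's smallest file
        else:
            tiers.append([size])        # start a new tier
    return tiers


def tiers_to_compact(sizes, c=2.0, min_files=3):
    """Return the tiers holding at least min_files files."""
    return [tier for tier in group_into_tiers(sizes, c) if len(tier) >= min_files]


sizes = [100, 10, 1000, 12, 110, 15]
tiers = group_into_tiers(sizes)
candidates = tiers_to_compact(sizes)
```

Only the tier of small files is compacted here; the medium and large files are left untouched until their own tiers fill up.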
Compactions: Configuration
Tiered Compaction
▪ Enable: “hbase.hstore.compaction.CompactionPolicy”
▪ Default.NumCompactionTiers
▪ Default.Tier.X
  ▪ MaxSize
  ▪ MaxAgeInDisk
Compactions: Work Queues
▪ Problem: Starvation
▪ Solution:
  ▪ Handle Large & Small Compactions Differently
  ▪ Allow a configurable “throttle” to determine which queue
Compactions: Configuration
Compaction Work Queues
▪ hbase.regionserver.thread.compaction.small
▪ hbase.regionserver.thread.compaction.large
▪ hbase.regionserver.thread.compaction.throttle / “ThrottlePoint”
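The two-queue idea can be sketched as a routing decision on the total size of the files being compacted. This is an illustration only: `route_compaction`, the `deque` queues, and the byte values are assumptions, and the real thread pools are not modeled.

```python
from collections import deque


def route_compaction(file_sizes, throttle_bytes, small_queue, large_queue):
    """Queue a compaction request by total size; return the queue name used."""
    total = sum(file_sizes)
    if total > throttle_bytes:
        large_queue.append(file_sizes)
        return "large"
    small_queue.append(file_sizes)
    return "small"


small, large = deque(), deque()
first = route_compaction([10, 20], throttle_bytes=100, small_queue=small, large_queue=large)
second = route_compaction([80, 40], throttle_bytes=100, small_queue=small, large_queue=large)
```

Because small requests never wait behind a large one, a long-running compaction cannot starve the quick ones.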
Compactions: Other Algorithms
Leveled Compaction
Default algorithm in LevelDB
#1. Bucket into tiers of magnitude difference (~10x)
#2. Shard the compaction across files (not just block index)
#3. Compact only the shard that goes over a certain size
+ optimized for read-heavy use
+ faster compaction turnaround
+ easy to cache-on-compact
- complicated algorithm
- heavy rewrites on write-dominated use
- time range filters less effective
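Rule #1 and the size trigger can be sketched as per-level size limits that grow by roughly 10x. This is a simplified illustration and not LevelDB's code: `base_bytes`, `fanout`, and both function names are assumptions, and the choice of which files inside an oversized level get rewritten is not modeled.

```python
def level_limits(base_bytes, levels, fanout=10):
    """Size limit per level, each level `fanout` times larger than the last."""
    return [base_bytes * fanout ** i for i in range(levels)]


def levels_over_limit(level_sizes, base_bytes, fanout=10):
    """Return the indices of levels whose total size exceeds their limit."""
    limits = level_limits(base_bytes, len(level_sizes), fanout)
    return [i for i, (size, limit) in enumerate(zip(level_sizes, limits)) if size > limit]


limits = level_limits(base_bytes=10, levels=3)
oversized = levels_over_limit([5, 150, 900], base_bytes=10)
```

Only the oversized level needs work, which is where the faster turnaround comes from; the cost is that data is rewritten each time it moves down a level.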
Time-Series Compaction
▪ Log-structured Merge Tree
▪ Time-ordered Data Storage!
[Diagram: flush to HFiles]
▪ Time-Series Compaction
  ▪ Implement with Coprocessor
  ▪ Time-boundary Based
  ▪ Shard HFiles on Hour, Day, etc…
▪ Time-series data optimized
▪ Write-biased query optimized
[Diagram: HFiles sharded by day, hour, …]
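Sharding on a time boundary can be sketched by mapping each timestamp to an hour or day bucket. This is an illustration only, not a coprocessor: the `(timestamp, value)` cell layout, the UTC bucket labels, and the function names are assumptions.

```python
from datetime import datetime, timezone


def time_bucket(ts_seconds, granularity="hour"):
    """Return the label of the hour or day bucket (UTC) containing a timestamp."""
    dt = datetime.fromtimestamp(ts_seconds, tz=timezone.utc)
    if granularity == "hour":
        return dt.strftime("%Y-%m-%dT%H")
    if granularity == "day":
        return dt.strftime("%Y-%m-%d")
    raise ValueError("granularity must be 'hour' or 'day'")


def shard_by_time(cells, granularity="hour"):
    """Group (timestamp, value) cells into one shard per time bucket."""
    shards = {}
    for ts, value in cells:
        shards.setdefault(time_bucket(ts, granularity), []).append((ts, value))
    return shards


cells = [(0, "a"), (10, "b"), (3600, "c")]
hourly = shard_by_time(cells, "hour")
daily = shard_by_time(cells, "day")
```

A query for a recent time window then only has to open the shards whose bucket overlaps that window.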
Parting Thoughts
Compactions: Associated JIRAs
▪ 0.90: Sigma Compactions (HBASE-3209)
▪ 0.92: Multi-Threaded Compactions (HBASE-1476)
▪ 0.96: Tier-based Compaction (HBASE-6371 & 7055)
▪ Future:
  ▪ Make Compactions Pluggable (HBASE-7516)
  ▪ Leveled Compaction (HBASE-7519)
Compactions: High Level Thoughts
Variables
▪ Disk IO on HFile Read
▪ Disk & Network IO on Compaction (R+W)
Compactions: High Level Thoughts
Related Questions
▪ Is the data mutated or appended?
  ▪ Mutates benefit from lazy seeks but cause disk bloat
  ▪ HFile reduction is less useful as row queries get larger
▪ Are you missing critical filters?
  ▪ Explicit vs. Implicit Requests
  ▪ Cache on write/compact (CacheConfig)
  ▪ Time Range / Column Filter
  ▪ Bloom Filters: non-trivial decision, need to measure
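As an example of a filter that avoids file reads without any compaction, a time-range check can skip files whose timestamps cannot match the query. A minimal sketch, assuming each file records its minimum and maximum timestamp and the query range is half-open (start inclusive, end exclusive); the tuple layout and function name are hypothetical.

```python
def files_to_read(files, query_start, query_end):
    """Return names of files whose [min_ts, max_ts] overlaps [query_start, query_end).

    files is a list of (name, min_ts, max_ts) tuples.
    """
    return [
        name
        for name, min_ts, max_ts in files
        if min_ts < query_end and max_ts >= query_start
    ]


files = [("f1", 0, 99), ("f2", 100, 199), ("f3", 150, 400)]
needed = files_to_read(files, query_start=100, query_end=150)
```

Here two of the three files are skipped outright, which is the kind of saving worth measuring before tuning compaction to reduce the file count.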
Thanks! Questions?