Slide Transcript
Efficient Locking Techniques for Databases on Modern Hardware
Hideaki Kimura (Brown University / Microsoft Jim Gray Systems Lab), Goetz Graefe and Harumi Kuno (Hewlett-Packard Laboratories)
ADMS'12
Slides/papers available on request. Email us:
Traditional DBMS on Modern Hardware
Traditional DBMSs are optimized for the magnetic-disk bottleneck: the assumption is that disk I/O costs dwarf all other costs, the rest being useful work. Then what's this?
[Fig.: instructions and cycles for the New Order transaction, from S. Harizopoulos et al., SIGMOD'08]
Context of This Paper
- Foster B-trees: achieved up to a 6x overall speed-up
- Consolidation Array, Flush-Pipeline: Shore-MT/Aether [Johnson et al.'10]
- This paper builds on both. Work in progress.
Our Prior Work: Foster B-trees [TODS'12]
- Foster relationship, fence keys
- Simple prefix compression, poor-man's normalized keys
- Efficient yet exhaustive verification
Implemented by modifying Shore-MT and compared with it: 2-3x speed-up under low latch contention, 6x under high latch contention (on Sun Niagara; tested without locks, only latches).
Talk Overview
1) Key range locks with higher concurrency: combines fence keys and Graefe lock modes
2) Lightweight intent lock: extremely scalable and fast
3) Scalable deadlock detection: the Dreadlocks algorithm applied to databases
4) Serializable early lock release: releases all kinds of locks yet stays serializable, allowing read-only transactions to bypass logging
1. Key Range Lock
[Fig.: keys 10, 20, 30 and the gaps between them. SELECT Key=10 takes S on key 10; SELECT Key=15 must lock a gap; UPDATE Key=30 takes X on key 30; SELECT Key=20~25 must lock a key plus a gap.]
Mohan et al.: locks the neighboring key. Lomet et al.: adds a few new lock modes (e.g., RangeX-S), but still lacks a few lock modes, resulting in lower concurrency.
Our Key Range Locking
- Graefe lock modes: all 3x3 = 9 combinations of range and key modes
- Use fence keys to lock on page boundaries
- Create a ghost record (pseudo-deleted record) before insertion, as a separate transaction
[Fig.: a page with low fence key D and high fence key F, holding keys EA, EB, ..., EZ]
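The combined lock modes above can be sketched as pairs over the classic N/S/X matrix. This is a minimal illustration of the idea, not the paper's implementation; the mode names and structure are assumptions for demonstration:

```python
# Sketch: Graefe lock modes as (gap, key) pairs over base modes N/S/X.
# Two combined modes are compatible exactly when both their gap parts
# and their key parts are compatible in the classic matrix.

BASE_COMPAT = {  # classic N/S/X compatibility matrix
    ('N', 'N'): True, ('N', 'S'): True, ('N', 'X'): True,
    ('S', 'N'): True, ('S', 'S'): True, ('S', 'X'): False,
    ('X', 'N'): True, ('X', 'S'): False, ('X', 'X'): False,
}

MODES = [(g, k) for g in 'NSX' for k in 'NSX']  # all 3*3 = 9 modes

def compatible(m1, m2):
    """Combined modes are compatible iff gap AND key parts both are."""
    (g1, k1), (g2, k2) = m1, m2
    return BASE_COMPAT[(g1, g2)] and BASE_COMPAT[(k1, k2)]

# A gap-only reader ('S','N') does not conflict with a key-only
# writer ('N','X'): this is the combination Lomet's scheme lacks.
assert compatible(('S', 'N'), ('N', 'X'))
# But ('S','X') vs ('N','X') conflicts on the key part.
assert not compatible(('S', 'X'), ('N', 'X'))
```

Deriving all nine modes from one 3x3 base matrix is what makes the scheme uniform: no special-cased mode list is needed.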
2. Intent Lock
Coarse-level locking (e.g., table, database) [Gray et al.]: intent locks (IS/IX) and absolute locks (X/S/SIX). Saves overhead for large scan/write transactions: just one absolute lock.
Intent Lock: Physical Contention
Logically, the intent locks (IS/IX) on DB-1, VOL-1, and IND-1 are all compatible with each other; the only real conflicts are the S/X locks on Key-A and Key-B. Physically, however, every transaction still appends to the same lock queues for DB-1, VOL-1, and IND-1, so the coarse-level queues themselves become a point of contention.
[Fig.: lock queues for DB-1, VOL-1, IND-1 filled with IS/IX entries, and lock queues for Key-A, Key-B holding the actual S/X conflicts]
Lightweight Intent Lock
Replace the coarse-level lock queues with simple counters: for each coarse resource (DB-1, VOL-1, IND-1), keep one counter per lock mode. No lock queue, no mutex. Key locks (S/X on Key-A, Key-B) keep their ordinary lock queues.
[Fig.: counter tables for DB1, VOL1, IND1 with intent-mode counts of 1 and absolute-mode counts of 0, next to ordinary lock queues for Key-A and Key-B]
Intent Lock: Summary
- Extremely lightweight for scalability: just a set of counters, no queue
- Only a spinlock; a mutex is taken only when an absolute lock is requested
- Timeout to avoid deadlocks
- Separate from the main lock table:

                      Physical Contention   Required Functionality
  Main Lock Table     Low                   High
  Intent Lock Table   High                  Low
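The counter scheme can be sketched as follows. This is a hypothetical simplification, not the paper's code: Python's `threading.Lock` stands in for the spinlock, a retry loop with timeout stands in for the mutex path taken by absolute-lock requesters, and SIX is omitted for brevity:

```python
import threading
import time

# Standard compatibility: which currently-held modes a request tolerates.
COMPAT = {
    'IS': {'IS', 'IX', 'S'},
    'IX': {'IS', 'IX'},
    'S':  {'IS', 'S'},
    'X':  set(),
}

class LightweightIntentLock:
    """One per coarse resource: counters only, no lock queue."""
    def __init__(self):
        self.counts = {'IS': 0, 'IX': 0, 'S': 0, 'X': 0}
        self.spin = threading.Lock()  # stands in for a spinlock

    def acquire(self, mode, timeout=0.1):
        deadline = time.monotonic() + timeout
        while True:
            with self.spin:
                held = {m for m, c in self.counts.items() if c > 0}
                if held <= COMPAT[mode]:   # every current holder compatible
                    self.counts[mode] += 1
                    return True
            if time.monotonic() > deadline:
                return False               # timeout avoids deadlock
            time.sleep(0.001)

    def release(self, mode):
        with self.spin:
            self.counts[mode] -= 1

lock = LightweightIntentLock()
assert lock.acquire('IS')                  # intent locks just bump counters
assert lock.acquire('IX')
assert not lock.acquire('X', timeout=0.02) # absolute lock must wait, times out
```

Because intent requests only increment a counter under a short critical section, the common case never touches a queue, which is the source of the scalability claim.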
3. Deadlock Handling
Traditional approaches each have drawbacks:
- Deadlock prevention (e.g., wound-wait/wait-die): causes many false positives
- Deadlock detection (cycle detection): infrequent checks cause delays; frequent or immediate checks do not scale on many cores
- Timeout: false positives, delays, and hard to configure
Solution: Dreadlocks [Koskinen et al.'08]
- Immediate deadlock detection
- Local spin: scalable and low overhead
- Almost* no false positives (*a few, due to the Bloom filter)
More details in the paper, including issues specific to databases: lock modes, queues, and upgrades; avoiding pure spinning to save CPU cycles; deadlock resolution for the flush pipeline.
4. Early Lock Release [DeWitt et al.'84] [Johnson et al.'10]
In a group-commit flush pipeline, each transaction locks, issues its commit request, waits for the log flush (10ms or more), and only then unlocks. While committing transactions hold their S (read) and X (write) locks through the flush wait, the transactions behind them (T1, T2, ..., T1000) accumulate more and more locks, waits, and deadlocks.
[Fig.: resources A, B, C held in S/X modes by transactions T1...T5 while they wait in the group-commit flush pipeline]
Prior Work: Aether [Johnson et al. VLDB'10]
- First implementation of ELR in a DBMS; significant speed-up (10x) on many-cores
- Simply releases locks on commit request: "... [must hold] until both their own and their predecessor's log records have reached the disk. Serial log implementations preserve this property naturally, ..."
- Problem: a read-only transaction bypasses logging, so the serial log does not order it after the transactions it depends on.
[Fig.: serial log with LSNs 10 (T1: Write), 11 (T1: Commit, released early via ELR), 12 (T2: Commit); T2 is dependent on T1]
Anomaly of Prior ELR Technique
T2 updates D to 20 and requests commit; with ELR it releases its X lock on D before its commit record is durable. T1 then reads D ("D is 20!") and commits, writing no log record of its own. If the system crashes before T2's commit record reaches the disk, T1 has already reported a value that never durably existed.
[Fig.: lock queue on "D" with T2:X followed by T1:S; event timeline of latest vs. durable LSN as T2 updates D, T2 requests commit, T1 reads D, and T1 commits while the durable LSN still lags behind]
Naïve Solutions
- Make read-only transactions wait for the flush: orders of magnitude higher latency (a short read-only query takes microseconds; a disk flush takes milliseconds).
- Do not release X locks in ELR (S-only ELR): concurrency as low as no ELR, since after all, virtually every lock wait involves an X lock.
Safe SX-ELR: X-Release Tag
When a transaction releases an X lock early at commit request, it leaves a tag on the lock queue: the LSN of its commit log record. Each reader tracks the maximum tag it has observed (max-tag) and, at its own commit, waits only until the durable LSN reaches that max-tag. A reader that saw only untagged queues exits immediately.
[Fig.: T2 updates D and, at commit request (LSN 3), tags D's lock queue; T1 reads D, picks up max-tag = 3, and waits at commit until the durable LSN reaches 3; T3 reads untagged E ("E is 5", max-tag = 0) and commits immediately; T1 and T2 commit once the flush completes]
Safe SX-ELR: Summary
- Serializable yet highly concurrent: safely releases all kinds of locks
- Most read-only transactions exit quickly; only the necessary threads are made to wait
- Low overhead: just an LSN comparison
- Applicable to coarse locks via a self-tag and a descendant-tag: SIX/IX update the descendant-tag, X updates the self-tag; IS/IX check the self-tag, S/X/SIX check both
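The tag mechanism reduces to a single LSN comparison at commit. A toy sketch of the idea (class and method names are assumptions, not the paper's code):

```python
# Sketch of the SX-ELR X-release tag: an early-released X lock tags
# its lock queue with the releaser's commit LSN; readers track the
# maximum tag seen and may finish once the durable LSN catches up.

class LockQueue:
    def __init__(self):
        self.tag = 0            # commit LSN of the last early-released X

class Transaction:
    def __init__(self):
        self.max_tag = 0        # highest tag observed while reading

    def read(self, queue):
        self.max_tag = max(self.max_tag, queue.tag)

    def release_x_early(self, queue, commit_lsn):
        queue.tag = max(queue.tag, commit_lsn)

def can_commit(txn, durable_lsn):
    """A read-only transaction exits once durable LSN >= its max tag."""
    return durable_lsn >= txn.max_tag

# T2 updates D and, at commit request (LSN 3), tags D's lock queue.
d, e = LockQueue(), LockQueue()
t2 = Transaction(); t2.release_x_early(d, commit_lsn=3)
# T1 reads the tagged D: it must wait until the durable LSN reaches 3.
t1 = Transaction(); t1.read(d)
assert not can_commit(t1, durable_lsn=0)
assert can_commit(t1, durable_lsn=3)
# T3 reads only untagged E (max-tag = 0): it commits immediately.
t3 = Transaction(); t3.read(e)
assert can_commit(t3, durable_lsn=0)
```

This mirrors the slide's scenario: only T1, which actually read tainted data, waits for the flush; T3 exits at once.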
Experiments
- Workload: TPC-B, 250MB of data, fits in the buffer pool
- Hardware: Sun Niagara (64 hardware contexts) and HP Z600 (6 cores, SSD drive)
- Software: Foster B-trees (modified Shore-MT) vs. original Shore-MT, with and without each technique; fully ACID, serializable mode
Key Range Locks
Z600, 6 threads, AVG & 95% on 20 runs
Lightweight Intent Lock
Sun Niagara, 60 threads, AVG & 95% on 20 runs
Dreadlocks vs Traditional
Sun Niagara, AVG on 20 runs
Early Lock Release (ELR)
Z600, 6 threads, AVG & 95% on 20 runs; HDD log vs. SSD log.
SX-ELR performs 5x faster; S-only ELR isn't useful. With all improvements combined, ~50x faster.
Related Work
- ARIES/KVL, ARIES/IM [Mohan et al.]
- Key range locking [Lomet'93]
- Shore-MT at EPFL/CMU/UW-Madison: Speculative Lock Inheritance [Johnson et al.'09], Aether [Johnson et al.'10]
- Dreadlocks [Koskinen and Herlihy'08]
- H-Store at Brown/MIT
Wrap up
Locking is a bottleneck on modern hardware. We revisited all aspects of database locking:
1. Graefe lock modes
2. Lightweight intent locks
3. Dreadlocks
4. Early lock release
All together, a significant speed-up (~50x). Future work: buffer pool.
Reserved: Locking Details
Transactional Processing
High concurrency, very short latency, fully ACID-compliant, relatively small data.
[Fig.: the number of digital transactions keeps growing (1972-2012) while CPU clock speed has flattened; hence modern many-core hardware]
Many-Cores and Contentions
Logical contention (conflicting operations on a shared resource) and physical contention (mutexes or spinlocks guarding critical sections). Adding more cores doesn't help; it even worsens the contention!
[Fig.: many threads contending on one critical section]
Background: Fence Keys
Fence keys define the key range of each page: every page stores a low fence and a high fence that bound its keys, matching the separator keys in its parent.
[Fig.: a root page covering A~ to ~Z with separators A, M, V; child pages covering A~ to ~C, ~E, ~M, and so on]
Key-Range Lock Mode [Lomet'93]
Adds a few new lock modes, each consisting of two parts: a range part and a key part. For keys 10, 20, 30:
- RangeX-S: X on the range, S on the key
- RangeI-N: instant* X lock on the range, nothing on the key (*an instant-duration lock)
- RangeS-S: S on the range, S on the key (RangeN-S would suffice)
But it still lacks a few lock modes.
Example: Missing Lock Modes
SELECT Key=15 only needs to lock the gap between 10 and 20, i.e., RangeS-N, but that mode is missing, so it must take RangeS-S on key 20. A concurrent UPDATE Key=20 needs X on key 20 and is blocked, although the two operations do not logically conflict. In general, a mode RangeA-B is needed for every combination of range mode A and key mode B.
Graefe Lock Modes
All 3x3 = 9 combinations of range and key modes, including the new* ones. (*) S ≡ SS, X ≡ XX. (**) Ours locks the key prior to the range, while SQL Server uses next-key locking: next-key RangeS-N ≈ prior-key NS.
LIL: Lock-Request Protocol
LIL: Lock-Release Protocol
Dreadlocks [Koskinen et al.'08]
Each thread publishes a digest*: the set of threads it transitively waits for. A thread A waiting for B repeatedly (1) checks whether B's digest contains A itself (if so, deadlock!) and (2) otherwise adds B's digest to its own. Digests propagate around any wait cycle, e.g., C waits for D waits for E waits for C, so every deadlock is detected within a few spins, while a mere live wait (A waits for B) never triggers detection.
(*) actually a Bloom filter (bit-vector).
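The digest propagation above can be sketched in a few lines. This toy version uses plain Python sets where the real algorithm uses Bloom-filter bit-vectors (which is where the rare false positives come from); the function name is an assumption:

```python
# Sketch of Dreadlocks digest propagation with plain sets.

def wait_step(me, waits_for, digests):
    """One spin iteration of thread `me` waiting on `waits_for[me]`.
    Returns True if a deadlock involving `me` is detected."""
    other = waits_for.get(me)
    if other is None:
        return False                # not waiting: nothing to do
    if me in digests[other]:        # 1. does it contain me? -> cycle
        return True
    digests[me] |= digests[other]   # 2. add its digest to my own
    return False

# Cycle C -> D -> E -> C: each digest starts as {self} and grows one
# hop per spin round until some thread sees itself downstream.
digests = {t: {t} for t in 'CDE'}
waits_for = {'C': 'D', 'D': 'E', 'E': 'C'}
detected = False
for _ in range(3):                  # digests propagate one hop per round
    for t in 'CDE':
        if wait_step(t, waits_for, digests):
            detected = True
assert detected
```

Note that detection is purely local: each thread reads only its waitee's digest while spinning, so no global wait-for graph is ever built.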
Naïve Solution: Check Page-LSN?
Could T1 simply remember the LSN of each page it reads and exit once the durable LSN reaches it? No. Suppose T2 updates page D (log record 1: T2, D, 10 to 20), then page Z (log record 2: T2, Z, 20 to 10), then commits (log record 3: T2, Commit). If T1 reads D and exits as soon as durable-LSN ≥ 1, T2's commit record may still be lost in a crash. A read-only transaction can exit only after the commit logs of its dependents become durable.
Deadlock Victim & Flush Pipeline
Victim & Flush Pipeline (Cont'd)
Dreadlocks + Backoff on Sleep
TPC-B, lazy commit, SSD, transaction-chain max 100k
Related Work: H-Store/VoltDB
Differences: disk-based DB vs. pure main-memory DB; shared-everything vs. shared-nothing within each node (note: both are shared-nothing across nodes).
- Foster B-trees/Shore-MT: more accessible RAM per CPU; keeps latches and locks but improves them.
- VoltDB: simplicity and best-case performance, but distributed transactions; gets rid of latches.
Both are interesting directions.
Reserved: Foster B-tree Slides
Latch Contention in B-trees
1. Root-to-leaf EX latch
2. Next/prev pointers
Foster B-trees Architecture
1. Fence keys: each page stores low and high fences bounding its key range.
2. Foster relationship: a temporary parent-child link from a page to its newly split sibling, later adopted by the real parent (cf. B-link tree [Lehman et al.'81]).
[Fig.: a tree with fence keys on every page and a foster child chained off a leaf]
More on Fence Keys
- Efficient prefix compression: with low fence "AAF" and high fence "AAP", the common prefix "AA" is truncated, so key "AAI31" is stored as "I31".
- Poor man's normalized keys: the slot array stores the first bytes of each truncated key (e.g., "I3", "J1") next to the pointer to the full tuple ("I31", xxx), speeding up comparisons.
- Powerful B-tree verification: efficient yet exhaustive.
- Simpler and more scalable B-tree: no tree latch, B-tree code size halved, clean key-range locking.
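The prefix-truncation and slot-array idea can be sketched directly from the slide's example. This is an illustrative toy, not Foster B-trees code; the `Page` class and its layout are assumptions:

```python
# Sketch: fence-key prefix truncation plus poor man's normalized keys.

def common_prefix(low, high):
    """Longest shared prefix of the low and high fence keys."""
    i = 0
    while i < min(len(low), len(high)) and low[i] == high[i]:
        i += 1
    return low[:i]

class Page:
    def __init__(self, low_fence, high_fence):
        # Every key on this page shares the fences' common prefix,
        # so it can be stripped from every stored key.
        self.prefix = common_prefix(low_fence, high_fence)
        self.slots = []  # (poor man's key, truncated key, payload)

    def insert(self, key, payload):
        truncated = key[len(self.prefix):]  # drop the shared prefix
        poor_mans = truncated[:2]           # first bytes, for fast compares
        self.slots.append((poor_mans, truncated, payload))

# Low fence "AAF" and high fence "AAP" share prefix "AA",
# so key "AAI31" is stored as "I31" with poor man's key "I3".
page = Page("AAF", "AAP")
page.insert("AAI31", "tuple-xxx")
assert page.slots[0][0] == "I3"
assert page.slots[0][1] == "I31"
```

Comparisons usually resolve on the short poor man's key in the slot array and only fall back to the full truncated key on a tie, which keeps searches cache-friendly.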
B-tree Lookup Speed-up
No locks; SELECT-only workload.
Insert-Intensive Case
6-7x speed-up. The bottleneck shifts from latch contention to log-buffer contention; we will port the "Consolidation Array" [Johnson et al.] to address it.
Chain length: Mixed 1 Thread
Eager-Opportunistic
B-tree Verification
[Fig.: verification cost by verification type, comparing no verification against fence-key-based verification]