Transcript Slides

Efficient Locking Techniques for Databases on Modern Hardware

Hideaki Kimura #*, Goetz Graefe +, Harumi Kuno +
# Brown University   * Microsoft Jim Gray Systems Lab   + Hewlett-Packard Laboratories

ADMS'12

Slides/papers available on request. Email us: [email protected], [email protected], [email protected]

Traditional DBMS on Modern Hardware

Optimized for the magnetic-disk bottleneck.

[Figure: instructions and cycles for the New Order transaction (S. Harizopoulos et al., SIGMOD'08), broken into disk I/O costs, other costs, and useful work. "Then what's this?" points at the large non-I/O costs.]

2/26

Context of This Paper

Achieved up to 6x overall speed-up.

- Foster B-trees
- Consolidation Array, Flush Pipeline (Shore-MT/Aether) [Johnson et al.'10]
- This paper
- Work in progress

3/26

Our Prior Work: Foster B-trees [TODS'12]

- Foster relationship
- Fence keys
- Simple prefix compression
- Poor man's normalized keys
- Efficient yet exhaustive verification

Implemented by modifying Shore-MT and compared with it: 2-3x speed-up under low latch contention, 6x speed-up under high latch contention. On Sun Niagara; tested without locks, only latches.

4/26

Talk Overview

1) Key Range Locks with Higher Concurrency: combines fence keys and Graefe lock modes
2) Lightweight Intent Lock: extremely scalable and fast
3) Scalable Deadlock Detection: Dreadlocks algorithm applied to databases
4) Serializable Early Lock Release: serializable all-kinds ELR that allows read-only transactions to bypass logging

5/26

1. Key Range Lock

[Figure: index keys 10, 20, 30 with gaps between them. SELECT Key=10 takes S on key 10; UPDATE Key=30 takes X on key 30; SELECT Key=15 and SELECT Key=20~25 must also lock the gaps.]

- Mohan et al.: locks the neighboring key.
- Lomet et al.: adds a few new lock modes (e.g., RangeX-S).

Still lacks a few lock modes, resulting in lower concurrency.

6/26

Our Key Range Locking

- Graefe lock modes: all 3*3 = 9 modes (see the sketch after this slide).
- Use fence keys to lock on page boundaries. [Figure: page with low fence key D, records EA, EB, ..., EZ, and high fence key F.]
- Create a ghost record (pseudo-deleted record) before insertion, as a separate transaction.

7/26
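As a rough illustration of the 3*3 lock-mode structure, here is a minimal sketch under our own assumptions (names and the compatibility rule are illustrative, not the Shore-MT code): every key-range lock pairs a mode on the key itself with a mode on the adjacent gap, and two requests conflict only if either component conflicts.

```cpp
// Minimal sketch of the 3*3 = 9 key-range lock modes: each lock is a pair
// (mode on the key itself, mode on the adjacent gap), and two requests are
// compatible iff both components are compatible.
#include <cassert>

enum Mode { N = 0, S = 1, X = 2 };            // N = no lock on that component

// Classic S/X compatibility for a single component.
inline bool component_compatible(Mode a, Mode b) {
    if (a == N || b == N) return true;        // "no lock" conflicts with nothing
    return a == S && b == S;                  // S-S is the only other compatible pair
}

struct KeyRangeLock {
    Mode key;   // lock on the key value
    Mode gap;   // lock on the neighboring gap (range)
};

inline bool compatible(const KeyRangeLock& a, const KeyRangeLock& b) {
    return component_compatible(a.key, b.key) &&
           component_compatible(a.gap, b.gap);
}

int main() {
    KeyRangeLock point_read {S, N};   // SELECT Key=10: S on the key, nothing on the gap
    KeyRangeLock gap_read   {N, S};   // SELECT Key=15: only the gap needs S
    KeyRangeLock key_update {X, N};   // UPDATE Key=20: X on the key only
    // The extra modes pay off: a gap-only reader and a key-only writer on the
    // same entry do not conflict, which coarser mode sets cannot express.
    assert(compatible(gap_read, key_update));
    assert(!compatible(point_read, key_update));
    return 0;
}
```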

2. Intent Lock

- Coarse-level locking (e.g., table, database) [Gray et al.]
- Intent locks (IS/IX) and absolute locks (X/S/SIX)
- Saves overhead for large scan/write transactions (just one absolute lock)

8/26

Intent Lock: Physical Contention

[Figure: logical vs. physical view. Logically, each transaction holds only IS/IX on DB-1, VOL-1, and IND-1 plus S or X on Key-A/Key-B; physically, every transaction appends to the same lock queues for those coarse resources, so the intent locks become a point of physical contention.]

9/26

Lightweight Intent Lock

[Figure: logical vs. physical view. The coarse resources (DB-1, VOL-1, IND-1) keep only per-mode counters (IS/IX/S/X counts): no lock queue, no mutex. Lock queues remain only for the key locks (Key-A: S, Key-B: X).]

10/26

Intent Lock: Summary

- Extremely lightweight, for scalability
- Just a set of counters, no queue
- Only a spinlock; a mutex is used only when an absolute lock is requested
- Timeout to avoid deadlock
- Separate from the main lock table

                    Physical contention   Required functionality
Main lock table            Low                    High
Intent lock table          High                   Low

(A counter-based sketch follows this slide.)

11/26
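A minimal counter-based sketch of the idea, under assumptions: it is not the Shore-MT code, and instead of parking a conflicting absolute-lock request on a mutex as the slide describes, it simply retries until a timeout.

```cpp
// Minimal sketch of a lightweight intent lock: per coarse resource, only
// counters of IS/IX/S/X holders guarded by a spinlock, with no lock queue.
// Conflicting requests retry until a timeout (the real design waits on a
// mutex only for absolute locks; this sketch just polls).
#include <atomic>
#include <chrono>
#include <thread>

struct LightweightIntentLock {
    enum Mode { IS, IX, S, X };

    std::atomic<bool> latch{false};                       // spinlock, no mutex, no queue
    int is_count = 0, ix_count = 0, s_count = 0, x_count = 0;

    bool try_acquire(Mode m) {
        while (latch.exchange(true, std::memory_order_acquire)) { /* spin */ }
        bool ok = false;
        switch (m) {
            case IS: ok = (x_count == 0);                  if (ok) ++is_count; break;
            case IX: ok = (x_count == 0 && s_count == 0);  if (ok) ++ix_count; break;
            case S:  ok = (x_count == 0 && ix_count == 0); if (ok) ++s_count;  break;
            case X:  ok = (is_count + ix_count + s_count + x_count == 0);
                     if (ok) ++x_count; break;
        }
        latch.store(false, std::memory_order_release);
        return ok;
    }

    // No queue means no lock-wait graph here: conflicts are resolved by
    // retrying, and deadlocks involving only intent locks end via timeout.
    bool acquire(Mode m, std::chrono::milliseconds timeout) {
        auto deadline = std::chrono::steady_clock::now() + timeout;
        while (!try_acquire(m)) {
            if (std::chrono::steady_clock::now() > deadline) return false;
            std::this_thread::yield();
        }
        return true;
    }

    void release(Mode m) {
        while (latch.exchange(true, std::memory_order_acquire)) { /* spin */ }
        switch (m) {
            case IS: --is_count; break;
            case IX: --ix_count; break;
            case S:  --s_count;  break;
            case X:  --x_count;  break;
        }
        latch.store(false, std::memory_order_release);
    }
};

int main() {
    using L = LightweightIntentLock;
    L index_lock;                                          // e.g., one per index
    bool a = index_lock.acquire(L::IS, std::chrono::milliseconds(10));
    bool b = index_lock.acquire(L::IX, std::chrono::milliseconds(10));
    // An absolute X lock conflicts with the intent holders and times out.
    bool c = index_lock.acquire(L::X, std::chrono::milliseconds(10));
    index_lock.release(L::IS);
    index_lock.release(L::IX);
    return (a && b && !c) ? 0 : 1;
}
```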

3. Deadlock Handling

Traditional approaches have drawbacks:
- Deadlock prevention (e.g., wound-wait/wait-die) can cause many false positives
- Deadlock detection (cycle detection):
  - infrequent checks: delay
  - frequent/immediate checks: not scalable on many cores
- Timeout: false positives, delays, hard to configure

12/26

Solution: Dreadlocks [Koskinen et al.'08]

- Immediate deadlock detection
- Local spin: scalable and low-overhead
- Almost* no false positives (* due to the Bloom filter)
- More details in the paper

Issues specific to databases:
- Lock modes, queues, and upgrades
- Avoid pure spinning to save CPU cycles
- Deadlock resolution for the flush pipeline

(A sketch of the digest propagation follows this slide.)

13/26
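To make the digest idea concrete, here is a minimal single-process sketch of Dreadlocks-style detection (our own illustration, not the paper's code): each waiter folds the digest of the thread it waits on into its own digest, and declares deadlock when its own bit comes back around.

```cpp
// Minimal sketch of Dreadlocks-style deadlock detection: each thread keeps a
// digest (here a 64-bit Bloom filter with a single trivial hash) of the
// threads it transitively waits for; a waiter spins locally, folding in its
// holder's digest, and declares deadlock when its own bit appears there.
#include <cstdint>
#include <cstdio>
#include <vector>

struct Waiter {
    int id;
    int waits_for = -1;       // index of the thread it waits on, -1 if none
    uint64_t digest = 0;      // Bloom filter of (transitively) waited-for threads
};

uint64_t bit_of(int id) { return uint64_t{1} << (id % 64); }

// One local-spin step: check whether my own bit has propagated back around
// (a cycle), otherwise publish my holder's digest plus the holder itself.
bool step_detects_deadlock(std::vector<Waiter>& t, int i) {
    Waiter& self = t[i];
    if (self.waits_for < 0) return false;
    const Waiter& holder = t[self.waits_for];
    if (holder.digest & bit_of(self.id)) return true;   // my bit came back: deadlock
    self.digest = bit_of(holder.id) | holder.digest;    // publish transitive wait-set
    return false;
}

int main() {
    // Threads 0 -> 1 -> 2 -> 0 form a cycle; each thread only ever reads the
    // digest of the single thread it waits on (local spinning, no global graph).
    std::vector<Waiter> t = {{0}, {1}, {2}};
    t[0].waits_for = 1; t[1].waits_for = 2; t[2].waits_for = 0;
    for (int round = 0; round < 4; ++round)
        for (int i = 0; i < 3; ++i)
            if (step_detects_deadlock(t, i))
                std::printf("thread %d detects the deadlock in round %d\n", i, round);
    return 0;
}
```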

4. Early Lock Release [DeWitt et al.'84] [Johnson et al.'10]

[Figure: transactions T1 ... T1000 holding S and X locks on resources A, B, C. With a group-commit flush pipeline, each transaction keeps its locks from the commit request through the flush wait (10 ms or more) until unlock, so more and more locks, waits, and deadlocks pile up.]

14/26

Prior Work: Aether [Johnson et al. VLDB'10]

- First implementation of ELR in a DBMS
- Significant speed-up (10x) on many-core
- Simply releases locks on commit request

"… [must hold] until both their own and their predecessor's log records have reached the disk. Serial log implementations preserve this property naturally, …"

[Figure: serial log with LSNs 10, 11, 12; T1: Write, T1: Commit, ELR, dependent T2: Commit. Problem: a read-only transaction bypasses logging, so the serial log does not order it after its dependents.]

15/26


Anomaly of Prior ELR Technique

Lock queue on key D: T2:X, then T1:S.

Events: (1) T2 writes D=20; (2) T2 issues its commit request and, with ELR, releases the X lock before its commit record is durable; (3) T1 reads D and sees 20; (4) T1 commits and answers "D is 20!" to the user; (5) only later does T2's commit record become durable. T1 is read-only and writes no log record, so nothing makes it wait; if the system crashes between (4) and (5), T1 has returned data from a transaction that never committed.

16/26

Naïve Solutions

- Make read-only transactions wait for the log flush: orders of magnitude higher latency
  - short read-only query: microseconds
  - disk flush: milliseconds
- Do not release X locks in ELR (S-only ELR): concurrency as low as no ELR; after all, every lock wait involves an X lock

17/26


Safe SX-ELR: X-Release Tag

[Timeline: lock queue on key D holds T2:X then T1:S; lock queue on key E holds T3:S.
1. T2 writes D=20. 2. T1 asks to read D and waits. 3. T2 issues its commit request, releases its X lock early, and leaves its commit LSN as a tag on the queue (max-tag = 3). 4. T1 reads D, picking up max-tag = 3. 5. T1 issues its commit request; it must wait until the durable LSN reaches 3. 6. T3 reads E (max-tag = 0) and commits immediately: "E is 5". 7. Once the log is durable, T1 and T2 commit.]

18/26

Safe SX-ELR: Summary

- Serializable yet highly concurrent: safely releases all kinds of locks
- Most read-only transactions exit quickly; only the threads that must wait do wait
- Low overhead: just an LSN comparison
- Applicable to coarse locks via self-tags and descendant-tags:
  SIX/IX: update the descendant-tag; X: update the self-tag;
  IS/IX: check the self-tag; S/X/SIX: check both

(A minimal sketch of the key-lock tags follows this slide.)

19/26
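A minimal sketch of the key-lock part of this scheme, with assumed names (the coarse-lock self/descendant tags are left out): the early X release stamps the lock queue with the releaser's commit LSN, later lock holders remember the largest tag they saw, and a read-only transaction waits at commit only if that tag is not yet durable.

```cpp
// Minimal sketch of Safe SX-ELR tags for key locks: an early-releasing
// updater leaves its commit LSN as a tag on the lock queue; transactions
// remember the maximum tag of any queue they were granted a lock from; a
// read-only transaction skips the flush wait iff the log is already durable
// up to that tag. Names are illustrative, not the paper's code.
#include <algorithm>
#include <cstdint>

using Lsn = uint64_t;

struct LockQueue {
    Lsn x_release_tag = 0;   // commit LSN of the last X holder that released early
};

struct Txn {
    Lsn  max_observed_tag = 0;
    bool wrote_anything   = false;
};

// Called when a committing updater releases its X lock early (ELR).
void elr_release_x(LockQueue& q, Lsn commit_lsn) {
    q.x_release_tag = std::max(q.x_release_tag, commit_lsn);
}

// Called whenever a transaction is granted a lock from this queue.
void on_lock_granted(Txn& t, const LockQueue& q) {
    t.max_observed_tag = std::max(t.max_observed_tag, q.x_release_tag);
}

// At commit: an updater flushes as usual; a read-only transaction waits only
// if it read something whose writer's commit record is not yet durable.
bool must_wait_for_flush(const Txn& t, Lsn durable_lsn) {
    if (t.wrote_anything) return true;
    return t.max_observed_tag > durable_lsn;   // just an LSN comparison
}

int main() {
    LockQueue key_d;                        // lock queue for key "D"
    Txn t1;                                 // read-only transaction
    elr_release_x(key_d, /*commit_lsn=*/3); // T2 commit-requests and releases X early
    on_lock_granted(t1, key_d);             // T1 then reads D and picks up tag 3
    // With durable LSN 2, T1 must still wait; with durable LSN 3 it may exit.
    bool waits_now   = must_wait_for_flush(t1, /*durable_lsn=*/2);
    bool waits_later = must_wait_for_flush(t1, /*durable_lsn=*/3);
    return (waits_now && !waits_later) ? 0 : 1;
}
```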

Experiments

- TPC-B: 250MB of data, fits in the buffer pool
- Hardware:
  - Sun Niagara: 64 hardware contexts
  - HP Z600: 6 cores, SSD drive
- Software:
  - Foster B-trees (modified) in Shore-MT (original), with/without each technique
  - Fully ACID, serializable mode

20/26

Key Range Locks

Z600, 6-Threads, AVG & 95% on 20 Runs 21/26

Lightweight Intent Lock

Sun Niagara, 60 threads, AVG & 95% on 20 Runs 22/26

Dreadlocks vs Traditional

Sun Niagara, AVG on 20 Runs 23/26

Early Lock Release (ELR)

Z600, 6 threads, AVG & 95% on 20 runs; HDD log vs. SSD log.

- SX-ELR performs 5x faster.
- S-only ELR isn't useful.
- All improvements combined: ~50x faster.

24/26

Related Work

- ARIES/KVL and ARIES/IM [Mohan et al.]
- Key range locking [Lomet'93]
- Shore-MT at EPFL/CMU/UW-Madison
- Speculative Lock Inheritance [Johnson et al.'09]
- Aether [Johnson et al.'10]
- Dreadlocks [Koskinen and Herlihy'08]
- H-Store at Brown/MIT

25/26

Wrap up

- Locking as a bottleneck on modern hardware
- Revisited all aspects of database locking:
  1. Graefe lock modes
  2. Lightweight intent lock
  3. Dreadlocks
  4. Early lock release
- All together, significant speed-up (~50x)
- Future work: buffer pool

26/26


Reserved: Locking Details

28/26

Transactional Processing

- High concurrency
- Very short latency
- Fully ACID-compliant
- Relatively small data

[Chart: number of digital transactions vs. CPU clock speed, 1972-2012; modern hardware.]

29/26

Many-Cores and Contentions

- Logical contention
- Physical contention: mutex or spinlock around a shared resource / critical section

More cores don't help here; they even worsen it!

30/26

Background: Fence Keys

Fence keys define the key range of each page.

[Figure: a root page with fences A~ and ~Z and separators A, M, V; child pages with fences such as A~ / ~C and C~ / ~E.]

(A small verification sketch using fence keys follows this slide.)

31/26
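To show what "define key ranges in each page" buys, here is a minimal sketch with an assumed structure (not the Foster B-tree code) of the purely local check that fence keys enable: every record lies between its page's fences, and a child's fences match exactly the range its parent delegates to it.

```cpp
// Minimal sketch of fence-key-based verification: each page stores low and
// high fence keys; every record must lie inside them, and each child's
// fences must equal the range bounded by the parent's separators. The check
// is local to a parent and its children, yet covers the whole tree.
#include <string>
#include <vector>

struct Page {
    std::string low_fence, high_fence;     // page covers [low_fence, high_fence)
    std::vector<std::string> keys;         // separators (inner) or records (leaf)
    std::vector<const Page*> children;     // empty for leaf pages
};

bool verify(const Page& p) {
    for (const auto& k : p.keys)
        if (k < p.low_fence || k >= p.high_fence) return false;   // key out of range
    if (p.children.empty()) return true;                          // leaf: done
    if (p.children.size() != p.keys.size() + 1) return false;     // malformed fan-out
    for (size_t i = 0; i < p.children.size(); ++i) {
        const Page& c = *p.children[i];
        const std::string& lo = (i == 0) ? p.low_fence : p.keys[i - 1];
        const std::string& hi = (i == p.keys.size()) ? p.high_fence : p.keys[i];
        if (c.low_fence != lo || c.high_fence != hi) return false; // fences must match
        if (!verify(c)) return false;
    }
    return true;
}

int main() {
    Page left  {"A", "C", {"AB"}, {}};
    Page right {"C", "E", {"CD"}, {}};
    Page root  {"A", "E", {"C"}, {&left, &right}};
    return verify(root) ? 0 : 1;
}
```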

Key-Range Lock Mode [Lomet'93]

- Adds a few new lock modes
- Each mode consists of 2 parts: the range and the key

[Figure: keys 10, 20, 30. RangeX-S: X on the range, S on the key. RangeI-N: instant X lock on the range, nothing on the key. RangeS-S: S on the range, S on the key (also RangeN-S).]

- But it still lacks a few lock modes

32/26

Example: Missing Lock Modes

[Figure: keys 10, 20, 30. SELECT Key=15 would ideally take "RangeS-N?" on key 20 (S on the preceding range, nothing on the key itself), while UPDATE Key=20 takes X on key 20. Because Lomet's set has no such RangeA-B combination, the reader must also lock the key, and the two operations conflict unnecessarily.]

33/26

Graefe Lock Modes

[Table: all nine combinations of range mode and key mode; * marks the new lock modes. (*) S ≡ SS, X ≡ XX.]

34/26

(**) Ours locks the key prior to the range, while SQL Server uses next-key locking: next-key locking's RangeS-N ≈ prior-key locking's NS.

35/26

LIL: Lock-Request Protocol

36/26

LIL: Lock-Release Protocol

37/26

Dreadlocks [Koskinen et al.'08]

[Figure: thread A waits for B (live, no cycle), while C, D, and E wait in a cycle (deadlock). Each thread keeps a digest* of the threads it transitively waits for, e.g., {B} or {E, C}. On each spin it (1) checks whether the digest of the thread it waits on contains itself (if so: deadlock!), and (2) otherwise adds that digest to its own. (*) The digest is actually a Bloom filter (bit vector).]

38/26

Naïve Solution: Check Page-LSN?

[Figure: pages for D and Z with their page LSNs; log buffer: (1) T2: D, 10 → 20; (2) T2: Z, 20 → 10; (3) T2: Commit. Can T1 exit immediately once durable-LSN ≥ 1? No: a read-only transaction can exit only after the commit log record of its dependents becomes durable, not merely the update record it read.]

39/26

Deadlock Victim & Flush Pipeline 40/26

Victim & Flush Pipeline (Cont'd)

41/26

Dreadlock + Backoff on Sleep

TPC-B, Lazy commit, SSD, Xct-chain max 100k 42/26

Related Work: H-Store/VoltDB

Differences (Foster B-trees/Shore-MT vs. VoltDB):
- Disk-based DB ↔ pure main-memory DB
- Shared-everything ↔ shared-nothing within each node (note: both are shared-nothing across nodes, with distributed transactions)

Pros / cons: accessible RAM per CPU; simplicity and best-case performance.

Our direction: keep locks and latches, but improve them. VoltDB's direction: get rid of latches. Both are interesting directions.

43/26

Reserved: Foster B-tree Slides

44/26

Latch Contention in B-trees

1. Root-leaf EX Latch 2. Next/Prev Pointers

45/26

Foster B-trees Architecture

[Figure: pages with fence keys (root A~ / ~Z with separators A, M, V; children such as A~ / ~C and C~ / ~E).]

1. Fence keys
2. Foster relationship (cf. B-link tree [Lehman et al.'81]); see the sketch after this slide

46/26
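A minimal sketch of the foster relationship under our own assumed structure (not the Foster B-tree code, and with the adoption position simplified): a split first links the new sibling as the old page's foster child, and the parent adopts it later in a separate, small step.

```cpp
// Minimal sketch of the foster relationship: when a page splits, the new
// sibling first hangs off the old page as its "foster child" (reachable only
// through that page); the real parent adopts it later in its own small step,
// so no operation ever touches more than two nodes at a time.
#include <memory>
#include <string>
#include <vector>

struct Node {
    std::string low_fence, high_fence;            // key range of this node
    std::vector<std::string> keys;                 // records (leaf) or separators
    std::vector<std::unique_ptr<Node>> children;   // empty for leaves
    std::unique_ptr<Node> foster_child;            // split-off sibling, not yet adopted
};

// Split: move the upper half of the keys into a new node that becomes this
// node's foster child; only these two nodes are modified.
void split(Node& n) {
    auto sibling = std::make_unique<Node>();
    size_t mid = n.keys.size() / 2;
    sibling->keys.assign(n.keys.begin() + mid, n.keys.end());
    n.keys.resize(mid);
    sibling->low_fence    = sibling->keys.front();
    sibling->high_fence   = n.high_fence;
    n.high_fence          = sibling->low_fence;        // n now ends where the sibling starts
    sibling->foster_child = std::move(n.foster_child); // keep an existing foster chain
    n.foster_child        = std::move(sibling);
}

// Adoption: later and independently, the parent takes over the foster child
// as a regular child (separator placement simplified to push_back here).
void adopt(Node& parent, Node& child) {
    if (!child.foster_child) return;
    parent.keys.push_back(child.foster_child->low_fence);
    parent.children.push_back(std::move(child.foster_child));
}

int main() {
    Node root;
    root.low_fence = "A"; root.high_fence = "Z";
    auto leaf = std::make_unique<Node>();
    leaf->low_fence = "A"; leaf->high_fence = "Z";
    leaf->keys = {"B", "D", "F", "H"};
    Node* l = leaf.get();
    root.children.push_back(std::move(leaf));

    split(*l);            // leaf keeps {B, D}; foster child holds {F, H}
    adopt(root, *l);      // root gains separator "F" and the adopted child
    return (root.children.size() == 2 && root.keys.size() == 1) ? 0 : 1;
}
```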

More on Fence Keys

- Efficient prefix compression: with low fence "AAF" and high fence "AAP", key "AAI31" is stored as "I31"
- Poor man's normalized keys in the slot array (e.g., "I3", "J1" alongside tuples such as "I31", xxx)
- Powerful B-tree verification: efficient yet exhaustive
- Simpler and more scalable B-tree: no tree latch, B-tree code size halved
- Key range locking

(A sketch of the first two points follows this slide.)

47/26
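A minimal sketch of those first two points, with names and byte widths assumed for illustration: the common prefix of the fence keys is stripped from stored keys, and the first bytes of each truncated key are kept in the slot array as a poor man's normalized key so most comparisons avoid touching the tuple.

```cpp
// Minimal sketch of fence-key prefix compression plus poor man's normalized
// keys: the prefix shared by the low and high fence keys is stored once per
// page and stripped from every key ("AAI31" -> "I31"); the first two bytes of
// the truncated key sit in the slot array for cheap comparisons.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>

// Length of the prefix shared by the low and high fence keys.
size_t common_prefix_len(const std::string& low, const std::string& high) {
    size_t n = std::min(low.size(), high.size()), i = 0;
    while (i < n && low[i] == high[i]) ++i;
    return i;
}

struct Slot {
    uint16_t poor_mans_key;   // first 2 bytes of the truncated key
    std::string rest;         // remainder (stands in for the stored tuple)
};

Slot make_slot(const std::string& key, size_t prefix_len) {
    std::string truncated = key.substr(prefix_len);          // e.g. "AAI31" -> "I31"
    uint16_t pmk = 0;
    for (size_t i = 0; i < 2 && i < truncated.size(); ++i)
        pmk = uint16_t((pmk << 8) | uint8_t(truncated[i]));
    return {pmk, truncated};
}

// Compare a probe key against a slot; touch the full key only on a tie of
// the poor man's normalized key.
int compare(const std::string& probe, const Slot& s, size_t prefix_len) {
    Slot p = make_slot(probe, prefix_len);
    if (p.poor_mans_key != s.poor_mans_key)
        return p.poor_mans_key < s.poor_mans_key ? -1 : 1;
    return p.rest.compare(s.rest);
}

int main() {
    const std::string low = "AAF", high = "AAP";   // fence keys of the page
    size_t plen = common_prefix_len(low, high);    // common prefix "AA"
    Slot s = make_slot("AAI31", plen);             // stored as "I31"
    return compare("AAI31", s, plen) == 0 ? 0 : 1;
}
```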

B-tree lookup speed-up

No locks; SELECT-only workload.

48/26

Insert-Intensive Case

[Chart: 6-7x speed-up. Annotations: latch-contention bottleneck; log-buffer contention bottleneck (will port the "Consolidation Array" [Johnson et al.]).]

49/26

Chain length: Mixed 1 Thread

50/26

Eager-Opportunistic

51/26

B-tree Verification

[Chart: verification results by verification type (None vs. Fence Key).]

52/26