Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution


Ravi Rajwar and James R. Goodman
Presented by Yang Liu
CPS221 Spring 2008
Outline

Why elide locks?

How to elide locks? – SLE

Why does SLE work correctly?

How to implement SLE?

How is performance improved?
Why elide locks?

Locks force serialization
Not all dynamic lock acquisitions are required for correctness
How to elide locks?
- Atomicity conditions

Within a speculatively executing critical section:
  Read data is not modified by another thread before the section ends
  Written data is not accessed by another thread before the section ends
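
The two atomicity conditions above can be written as a small predicate. This is only a sketch: the `SpeculativeSection` class, the `violates_atomicity` function, and the example addresses are hypothetical names chosen for illustration, and real hardware would track cache lines rather than Python sets.

```python
from dataclasses import dataclass, field


@dataclass
class SpeculativeSection:
    """Addresses touched by one thread inside a speculative critical section."""
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)


def violates_atomicity(section, remote_addr, remote_is_write):
    """Return True if a remote access breaks either atomicity condition."""
    # Condition 1: data we read must not be modified by another thread.
    if remote_is_write and remote_addr in section.read_set:
        return True
    # Condition 2: data we wrote must not be read or written by another thread.
    if remote_addr in section.write_set:
        return True
    return False


section = SpeculativeSection(read_set={0x100, 0x108}, write_set={0x108})
checks = [
    violates_atomicity(section, 0x100, remote_is_write=False),  # shared read is fine
    violates_atomicity(section, 0x100, remote_is_write=True),   # our read data modified
    violates_atomicity(section, 0x108, remote_is_write=False),  # our written data read
    violates_atomicity(section, 0x200, remote_is_write=True),   # unrelated address
]
print(checks)  # [False, True, True, False]
```

Note that two threads reading the same data never conflict, which is why critical sections that only read shared state can run concurrently once the lock is elided.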
How to elide locks?
- Eliding silent pairs
How to elide locks? - Steps

Predict that the lock release store will follow (silent pair)
Predict atomicity and elide the lock acquire
Execute the critical section speculatively and buffer results
If atomicity cannot be maintained, trigger misspeculation, recover, and explicitly acquire the lock
If the lock release store is seen, elide the lock release, commit state, and exit the section
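
The steps above can be sketched as a single-threaded software model. Everything here is an assumption made for illustration: `run_with_sle`, the `conflict` callback that stands in for a detected atomicity violation, and the dictionary used as memory. SLE itself is a hardware mechanism, so this shows only the control flow.

```python
FREE, HELD = 0, 1


def run_with_sle(memory, lock_addr, body, conflict):
    """Run body(read, write) under a simplified model of the SLE steps.

    conflict is a hypothetical callback reporting whether another thread
    violated atomicity while the section ran speculatively.
    """
    assert memory[lock_addr] == FREE, "sketch assumes the lock is free on entry"

    # Elide the acquire: the lock word is left untouched and all writes
    # made by the critical section go to a private buffer.
    buffer = {}

    def spec_read(addr):
        return buffer.get(addr, memory[addr])

    def spec_write(addr, value):
        buffer[addr] = value

    body(spec_read, spec_write)

    if not conflict():
        # Release store reached with atomicity intact: elide it and commit.
        memory.update(buffer)
        return "elided"

    # Misspeculation: discard buffered state and really acquire the lock.
    buffer.clear()
    memory[lock_addr] = HELD

    def locked_write(addr, value):
        memory[addr] = value

    body(lambda addr: memory[addr], locked_write)
    memory[lock_addr] = FREE
    return "locked"


def increment(read, write):
    write("counter", read("counter") + 1)


memory = {"lock": FREE, "counter": 0}
first = run_with_sle(memory, "lock", increment, conflict=lambda: False)
second = run_with_sle(memory, "lock", increment, conflict=lambda: True)
print(first, second, memory)  # elided locked {'lock': 0, 'counter': 2}
```

In the successful case the lock word is never written, so other threads never see the lock as held.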
How to elide locks? - Example
[Figures: example with the "Lock Acquire" and "Lock Release" points labeled]
Why does SLE work correctly?

Two predictions
  Predict lock release store
    Resolved by monitoring the memory location of the stores
  Predict memory operation atomicity
    Resolved by checking atomicity conditions using cache coherence mechanisms
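
As a rough illustration of how the first prediction could be resolved, the sketch below remembers the address and pre-acquire value of an elided lock and watches later stores for one that restores that value. The `ElisionTracker` class and its method names are hypothetical, and matching on address plus value is a simplifying assumption about what "monitoring the memory location" involves.

```python
class ElisionTracker:
    """Tracks one elided lock acquire and watches for the matching release store."""

    def __init__(self):
        self.lock_addr = None
        self.free_value = None

    def elide_acquire(self, lock_addr, value_before_acquire):
        # Remember where the acquire would have written and what was there before.
        self.lock_addr = lock_addr
        self.free_value = value_before_acquire

    def observe_store(self, addr, value):
        """Return True if this store completes the silent pair."""
        if self.lock_addr is None:
            return False
        if addr == self.lock_addr and value == self.free_value:
            # The release restores the original value, so the pair is silent.
            self.lock_addr = None
            return True
        return False


tracker = ElisionTracker()
tracker.elide_acquire(lock_addr=0x40, value_before_acquire=0)
results = [
    tracker.observe_store(0x80, 0),  # store to ordinary data
    tracker.observe_store(0x40, 0),  # restores the free value: the release
    tracker.observe_store(0x40, 0),  # no elision is active any more
]
print(results)  # [False, True, False]
```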
How to implement SLE? – Four Aspects

Initiating speculation
  Filter, index, confidence metric
Buffering speculative state
  Speculative register state
    ROB
    Register checkpointing
  Speculative memory state
    Augmented write-buffers
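
The buffering options above can be modeled in a few lines. The sketch below combines a single register checkpoint with a write buffer that forwards speculative stores to later loads. `SpeculativeCore` and its methods are hypothetical names, and a dictionary stands in for the augmented write-buffer.

```python
class SpeculativeCore:
    """Toy model: one register checkpoint plus a write buffer for speculative stores."""

    def __init__(self, registers, memory):
        self.registers = dict(registers)
        self.memory = memory
        self.checkpoint = None
        self.write_buffer = {}

    def begin(self):
        # Save architectural registers so they can be restored on misspeculation.
        self.checkpoint = dict(self.registers)
        self.write_buffer = {}

    def store(self, addr, value):
        # Speculative stores stay in the buffer; repeated stores to one address merge.
        self.write_buffer[addr] = value

    def load(self, addr):
        # Loads see the thread's own speculative stores first.
        return self.write_buffer.get(addr, self.memory[addr])

    def commit(self):
        self.memory.update(self.write_buffer)
        self.write_buffer = {}
        self.checkpoint = None

    def abort(self):
        self.registers = self.checkpoint
        self.write_buffer = {}
        self.checkpoint = None


memory = {"x": 1}
core = SpeculativeCore({"r1": 0}, memory)

core.begin()
core.registers["r1"] = core.load("x") + 41
core.store("x", core.registers["r1"])
seen_inside = core.load("x")
core.abort()
after_abort = (core.registers["r1"], memory["x"])

core.begin()
core.store("x", 7)
core.commit()
after_commit = memory["x"]
print(seen_inside, after_abort, after_commit)  # 42 (0, 1) 7
```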
How to implement SLE? – Four Aspects

Misspeculation conditions and detection
  Atomicity violations
    ROB
    Register checkpointing with access bit
  Violations due to limited resources
    Finite cache/write-buffer/ROB size
    Uncached accesses or events
Committing speculative memory state
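
The misspeculation conditions listed above can be summarized as a small decision function. This is a sketch under stated assumptions: the event names, the `detect_misspeculation` function, and the use of sets for access bits and buffered lines are all hypothetical, chosen only to make the categories concrete.

```python
def detect_misspeculation(event, line, accessed, written, buffer_capacity):
    """Return a reason string if the event forces misspeculation, else None.

    accessed: cache lines whose (hypothetical) access bit is set
    written:  lines with speculative stores held in the write buffer
    """
    # Atomicity violation: another thread modifies data we accessed.
    if event == "remote_write" and line in accessed:
        return "atomicity violation"
    # Atomicity violation: another thread reads data we speculatively wrote.
    if event == "remote_read" and line in written:
        return "atomicity violation"
    # Resource limit: a store to a new line does not fit in the write buffer.
    if event == "local_store" and line not in written and len(written) >= buffer_capacity:
        return "resource limit"
    # Uncached accesses cannot be buffered or undone.
    if event == "uncached_access":
        return "uncached access"
    return None


accessed = {"A", "B"}
written = {"B"}
reasons = [
    detect_misspeculation("remote_read", "A", accessed, written, 1),
    detect_misspeculation("remote_write", "A", accessed, written, 1),
    detect_misspeculation("remote_read", "B", accessed, written, 1),
    detect_misspeculation("local_store", "C", accessed, written, 1),
    detect_misspeculation("local_store", "B", accessed, written, 1),
    detect_misspeculation("uncached_access", None, accessed, written, 1),
]
print(reasons)
```

Resource-limit misspeculations do not indicate a data conflict; they only mean the section was too large to buffer, so the fallback is simply to acquire the lock.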
How to implement SLE?
How is performance improved?
- Evaluation methodology

Three multiprocessor systems

CMP/SMP/DSM

A simple microbenchmark and six applications

Single register checkpoint

32-entry lock predictor indexed by PC
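
A PC-indexed predictor of the size mentioned above might look like the sketch below. The saturating confidence counter, its threshold, and the index function (which assumes 4-byte instruction alignment) are illustrative assumptions, not details taken from the slides; `LockPredictor` is a hypothetical name.

```python
class LockPredictor:
    """Toy 32-entry table of saturating confidence counters, indexed by PC."""

    def __init__(self, entries=32, threshold=2, max_conf=3):
        self.entries = entries
        self.threshold = threshold
        self.max_conf = max_conf
        # Assumption: start optimistic so every lock is tried at least once.
        self.table = [max_conf] * entries

    def _index(self, pc):
        # Assumption: instructions are 4-byte aligned, so drop the low two bits.
        return (pc >> 2) % self.entries

    def should_elide(self, pc):
        return self.table[self._index(pc)] >= self.threshold

    def update(self, pc, success):
        i = self._index(pc)
        if success:
            self.table[i] = min(self.max_conf, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


predictor = LockPredictor()
pc = 0x4000
before = predictor.should_elide(pc)
predictor.update(pc, success=False)
predictor.update(pc, success=False)
after_failures = predictor.should_elide(pc)
predictor.update(pc, success=True)
after_success = predictor.should_elide(pc)
other_lock = predictor.should_elide(0x4004)  # maps to a different entry
print(before, after_failures, after_success, other_lock)  # True False True True
```

The confidence counter keeps a lock that repeatedly misspeculates from being elided again until it has shown some successes.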
How is performance improved?
- Microbenchmark result
How is performance improved?
- Application result
Some thoughts

The idea is good
  Removes unnecessary serialization
  Makes programmers’ work easier
No results are reported for the ROB-based implementation
Why were the benchmarks padded to reduce false sharing?
  Shouldn’t SLE handle this?