Scheduling-based TM Contention Management
A survey talk
Danny Hendler
Ben-Gurion University
Danny Hendler
3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome
TM: a research toy?
“Why STM can be more than a research toy.” [Dragojevic et al., 2009]
“TM: Why it is only a research toy.” [Cascaval et al., 2008]
There is consensus that TM performance must be improved. Specifically:
“In some workloads, performance degraded when we used too many concurrent
threads. One possible alternative to improving performance in these cases
would be to modify the thread scheduler so it avoids running more
concurrent threads than is optimal for a given workload, based on the information
provided by the STM runtime.” [Dragojevic et al.]
TM scheduling: rationale
Transactional threads controlled by TM-aware scheduler
o Kernel-level, user-level
Richer “tool-box” for reducing and/or preventing transaction conflicts
Improve performance under high contention
“Conventional” (non-scheduling) contention management
[Figure: non-TM-scheduled threads run on the TM system. Contention detection hands each conflict to the contention manager, which arbitrates: abort/retry, wait, or proceed.]
“Polymorphic contention management”, [Guerraoui, Herlihy & Pochon, DISC'05]
Example contention managers: Suicide, Polite, Karma, Greedy, Aggressive, Polka
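To make the arbitration step concrete, here is a minimal sketch of a contention manager as a function that is handed two conflicting transactions and returns a decision. The `Tx` and `Decision` types and all function names are hypothetical. The timestamp rule is only loosely inspired by Greedy and is a simplification (it aborts rather than waits), so it should not be read as the published algorithm.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ABORT_SELF = "abort_self"
    ABORT_OTHER = "abort_other"


@dataclass
class Tx:
    tx_id: int
    start_time: int  # logical timestamp taken when the transaction first started


def suicide(attacker: Tx, victim: Tx) -> Decision:
    # Suicide: the transaction that detects the conflict aborts itself.
    return Decision.ABORT_SELF


def aggressive(attacker: Tx, victim: Tx) -> Decision:
    # Aggressive: the detecting transaction always aborts the other one.
    return Decision.ABORT_OTHER


def older_wins(attacker: Tx, victim: Tx) -> Decision:
    # Simplified timestamp rule (assumption): the older transaction wins.
    if attacker.start_time < victim.start_time:
        return Decision.ABORT_OTHER
    return Decision.ABORT_SELF
```

None of these policies controls when the loser runs again, which is the gap that scheduling-based contention management targets.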
Conventional Contention Management is often problematic
Loser resumes execution after a waiting period
May resume execution too early
May resume execution too late
Repeated collisions occur under high contention
Livelocks
Performance may become worse than single lock
Scheduling-based CM to the rescue.
Talk outline
Preliminaries
The first TM schedulers
Later user-land work
Kernel support
The first TM schedulers
“Adaptive Transaction Scheduling for transactional memory systems”
[Yoo & Lee, SPAA'08]
“CAR-STM: Scheduling-based collision avoidance and resolution for software transactional memory”
[Dolev, Hendler & Suissa, PODC'08]
“Steal-on-abort: dynamic transaction reordering to reduce conflicts in transactional memory”
[Ansari, Jarvis, Kirkham, Kotselidis, Lujan and Watson, HiPEAC'09]
Adaptive Transaction Scheduling (ATS)
(Yoo & Lee, SPAA'08)
A single scheduling queue
Per-thread Contention Intensity (CI) computed
Adaptive mechanism
CI below threshold: transaction begins normally
CI above threshold: transaction serialized (queued)
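The threshold rule above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes contention intensity is an exponentially weighted moving average of abort (1) and commit (0) outcomes, and the class name, `alpha`, and `threshold` values are all hypothetical.

```python
from collections import deque


class AdaptiveScheduler:
    """Per-thread contention intensity (CI) with one shared serialization queue."""

    def __init__(self, alpha=0.5, threshold=0.5):
        self.alpha = alpha          # weight given to history (assumed value)
        self.threshold = threshold  # CI level above which we serialize (assumed value)
        self.ci = {}                # thread id -> current CI
        self.queue = deque()        # the single scheduling queue

    def record_outcome(self, thread_id, aborted):
        # Assumed update rule: CI = alpha * CI + (1 - alpha) * outcome.
        outcome = 1.0 if aborted else 0.0
        old = self.ci.get(thread_id, 0.0)
        self.ci[thread_id] = self.alpha * old + (1 - self.alpha) * outcome
        return self.ci[thread_id]

    def begin(self, thread_id):
        # Below threshold: start normally. Above threshold: enqueue.
        if self.ci.get(thread_id, 0.0) > self.threshold:
            self.queue.append(thread_id)
            return "queued"
        return "run"
```

With these values, two consecutive aborts push a thread's CI to 0.75 and its next transaction is queued; one commit afterwards brings CI back to 0.375 and the thread starts normally again.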
ATS: adaptive parallelism control
[Figure: Behavior of a Queue-Based Scheduler. The timeline flows from top to bottom and shows executing and queued transactions alongside the contention intensity (an average of all the CIs from running threads) and its threshold.]
Timeline 1: Transactions begin execution without resorting to the scheduler
Timeline 2: As contention starts to increase, some transactions call the scheduler
Timeline 3: As more transactions get serialized, contention intensity starts to decrease
Timeline 4: Contention intensity subsides below threshold
Timeline 5: More transactions start without the scheduler to exploit more parallelism
ATS adaptively varies the number of concurrent transactions according to the dynamic contention feedback
Yoo & Lee, Transaction Scheduling.
CAR-STM (Collision Avoidance and Resolution for STM)
(Dolev, Hendler & Suissa, PODC'08)
Per-core transaction queues
Serialize conflicting transactions
Contention avoidance: attempt to avoid even first collision
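A rough sketch of per-core queues with serialization and avoidance follows. Several details are assumptions made for illustration: that a loser is re-queued behind the winner on the winner's core, that the dispatcher falls back to the shortest queue, and that the application supplies a `likely_conflict` predicate. All names are hypothetical.

```python
from collections import deque


class PerCoreQueues:
    def __init__(self, num_cores, likely_conflict):
        self.queues = [deque() for _ in range(num_cores)]
        # likely_conflict(tx_a, tx_b) -> bool is an application-supplied hint (assumption).
        self.likely_conflict = likely_conflict

    def dispatch(self, tx):
        # Avoidance: co-locate tx with a queued transaction it is likely to
        # collide with, so the two never run concurrently.
        for core, queue in enumerate(self.queues):
            if any(self.likely_conflict(tx, other) for other in queue):
                queue.append(tx)
                return core
        # Otherwise pick the shortest queue (ties go to the lowest core id).
        core = min(range(len(self.queues)), key=lambda c: len(self.queues[c]))
        self.queues[core].append(tx)
        return core

    def serialize_after(self, loser_tx, winner_core):
        # Resolution: the loser is re-queued on the winner's core, so it
        # cannot collide with the same winner again.
        self.queues[winner_core].append(loser_tx)

    def next_tx(self, core):
        # Each queue is drained sequentially by a single thread on its core.
        return self.queues[core].popleft() if self.queues[core] else None
```

Because each queue is drained by one thread, any two transactions placed in the same queue are serialized by construction.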
CAR-STM high-level architecture
[Figure: transaction threads pass each transaction, together with its T-Info, to the dispatcher, which works with the collision avoider to place it in one of k transaction queues (transaction queue #1 … #k). Each queue is served by a TQ thread on its own core (Core #1 … Core #k). A serializing contention manager resolves collisions.]
Execution time: STMBench7*
R/W dominated workloads
(*) “STMBench7: a benchmark for STM”, [Guerraoui, Kapalka & Vitek, EuroSys'07]
Shortcomings of first TM schedulers
May restrict parallelism too much
ATS: a single serialization queue
CAR-STM: at most a single transactional thread per core
o High overheads even in the absence of contention
Talk outline
Preliminaries
The first TM schedulers
Later user-land work
Kernel support
A low-overhead serializer
Avoid repeated collisions while minimizing over-serialization
No per-core queues
Adaptive
• “On the impact of serializing contention management on STM performance”,
[Heber, Hendler & Suissa, OPODIS'09]
• “Scheduling support for TM contention management”,
[Maldonado, Felber, Fedorova, Hendler, Lawall, Marlier, Muller & Suissa PPoPP'10]
A low-overhead serializer (cont'd)
[Figure: transactional threads, each with its own condition variable]
A low-overhead serializer (cont'd)
Collision between a winner t and a loser t':
1) t identifies a collision
2) t calls contention manager: ABORT_OTHER
3) t changes status of t' to ABORT (writes that t is winner)
4) t' identifies it was aborted
5) t' rolls back transaction and goes to sleep on the condition variable of t
6) Eventually t commits and broadcasts on its condition variable…
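The six-step handshake between a winner t and a loser t' can be sketched with Python's `threading.Condition`. This is a simplified illustration, not the authors' implementation: each thread runs a single transaction, the `committed` flag stands in for whatever the real system uses to avoid missed wake-ups, and all names are hypothetical.

```python
import threading


class TxThread:
    def __init__(self, name):
        self.name = name
        self.cond = threading.Condition()  # one condition variable per thread
        self.committed = False             # set when this thread's transaction commits
        self.winner = None                 # set by the thread that aborted us


def abort_other(winner, loser):
    # Steps 2-3: the winner marks the loser as aborted and records itself as winner.
    loser.winner = winner


def wait_for_winner(loser):
    # Steps 4-5: the loser has rolled back and sleeps on the winner's condition variable.
    winner = loser.winner
    with winner.cond:
        while not winner.committed:  # loop guards against spurious wake-ups
            winner.cond.wait()
    loser.winner = None              # free to retry the transaction now


def commit(thread):
    # Step 6: the winner commits and wakes every thread waiting on it.
    with thread.cond:
        thread.committed = True
        thread.cond.notify_all()
```

Checking `committed` under the condition's lock means a loser that arrives after the winner has already committed does not go to sleep at all.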
A low-overhead serializer (cont'd)
Stabilization mechanism
Algorithm is adaptive
o Serializing mode / “Conventional” mode
Prevents “mode-oscillations”:
o Shifting to serialization-mode reduces perceived contention
o Should use two thresholds
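The two-threshold idea is ordinary hysteresis: enter serializing mode only above a high mark and leave it only below a low mark, so that the drop in perceived contention caused by serialization itself does not immediately flip the mode back. A minimal sketch, with hypothetical names and threshold values:

```python
class ModeSwitch:
    def __init__(self, low=0.2, high=0.6):
        if not low < high:
            raise ValueError("low threshold must be below high threshold")
        self.low = low
        self.high = high
        self.serializing = False  # start in conventional mode

    def update(self, contention):
        # Switch on only above `high`; switch off only below `low`.
        if not self.serializing and contention > self.high:
            self.serializing = True
        elif self.serializing and contention < self.low:
            self.serializing = False
        return self.serializing
```

With a single threshold at 0.5, a contention reading that falls from 0.7 to 0.4 after serialization kicks in would switch the mode straight back; here the mode holds until the reading falls below 0.2.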
A low-overhead serializer (cont'd)
Experimental evaluation
CAR-STM over-serializes compared with low-overhead serializer
Throughput of conventional CM algorithms is low
Stabilization mechanism helps
“Shrink” – collision prevention/avoidance
Predicts future accesses based on past accesses
o Read-set predicted based on past few committed/aborted TXs
(temporal locality)
o Write-set predicted based on immediately preceding aborted TX
Serialize with a probability proportional to the number of threads
currently serialized (serialization affinity), if thread's success rate is
low and a collision is predicted
• “Preventing versus curing: avoiding conflicts in transactional memories”,
[Dragojevic, Guerraoui, Singh and Singh, PODC'09]
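The decision rule described above can be sketched roughly as follows. Several choices here are assumptions made for illustration and are not taken from the paper: the read set is predicted as the plain union of recent read sets, a collision is "predicted" when the predicted accesses overlap addresses currently being written by other transactions, and the probability is `(num_serialized + 1) / num_threads` so that the first thread can be serialized at all. All names and the success threshold are hypothetical.

```python
import random


def predict_read_set(recent_read_sets):
    # Temporal locality (assumed form): union of the last few transactions' read sets.
    predicted = set()
    for read_set in recent_read_sets:
        predicted |= read_set
    return predicted


def should_serialize(success_rate, predicted_reads, predicted_writes,
                     active_writes, num_serialized, num_threads,
                     rand=random.random, success_threshold=0.5):
    # Do not serialize while the thread is mostly committing successfully.
    if success_rate >= success_threshold:
        return False
    # Serialize only if a collision is predicted.
    if not ((predicted_reads | predicted_writes) & active_writes):
        return False
    # Serialization affinity: the more threads already serialized,
    # the more likely this one joins them (the +1 is an assumption).
    probability = min(1.0, (num_serialized + 1) / num_threads)
    return rand() < probability
```

Passing `rand` as a parameter keeps the rule deterministic under test; in normal use the default `random.random` applies.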
“Shrink” – collision prevention (cont'd)
Don't serialize when contention is low
Serialize only if contention is high &
a collision is “predicted”
Update statistics, release lock if you own it
“Shrink” – experimental evaluation
STMBench7, read-write workload
Talk outline
Preliminaries
The first TM schedulers
Later user-land work
Kernel support
Scheduling Support for Transactional Memory Contention Management
Implement CM scheduling support in the kernel scheduler
(Linux & OpenSolaris)
(Strict) serialization
Soft serialization
Time-slice extension
Different mechanisms for communication between user-level STM library and kernel scheduler
• “Scheduling support for TM contention management”,
[Maldonado, Felber, Fedorova, Hendler, Lawall, Marlier, Muller & Suissa PPoPP'10]
TM Library / Kernel Communication via Shared Memory Segment (Ser-k algorithm)
User code notifies kernel on events such as: transaction start, commit
and abort (in which case thread yields)
Kernel code handles moving thread between ready and blocked queues
Soft Serialization
Instead of blocking, reduce loser thread priority and yield
Efficient in scenarios where loser transactions may take a
different execution path when retrying
Priority should be restored upon commit or when conflicting
transactions terminate
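As a user-space simulation of the idea (the actual mechanism lives in the kernel scheduler), the sketch below demotes a loser instead of blocking it and restores its priority on commit or when the conflicting transaction ends. The class, method names, and priority values are hypothetical.

```python
class SoftSerializingScheduler:
    def __init__(self, thread_ids, base_priority=10):
        self.base = {tid: base_priority for tid in thread_ids}
        self.current = dict(self.base)

    def on_abort(self, loser, penalty=5):
        # Demote the loser; it stays runnable, unlike strict serialization.
        self.current[loser] = self.base[loser] - penalty

    def on_commit(self, tid):
        # Restore priority once the thread's own transaction commits.
        self.current[tid] = self.base[tid]

    def on_conflicting_tx_done(self, loser):
        # Restore priority when the transaction it conflicted with terminates.
        self.current[loser] = self.base[loser]

    def pick_next(self, ready):
        # Highest priority first; ties go to the earliest thread in `ready`.
        return max(ready, key=lambda tid: self.current[tid])
```

A demoted thread still runs when nothing else is ready, which is what lets a retry that takes a different execution path make progress without waiting for the winner.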
Time-slice extension
Preemption in the midst of a transaction increases conflict
“window of vulnerability”
Defer preemption of transactional threads
Avoid CPU monopolization by bounding number of extensions and
yielding after commit
May be combined with serialization/soft serialization
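The bounded-extension policy can be sketched as a small simulation. The hook names (`on_timer_tick`, `on_commit`), the `Thread` record, and the bound of two extensions are hypothetical; a real implementation would sit inside the kernel's preemption path.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    in_transaction: bool = False
    extensions_used: int = 0


class TimeSliceExtender:
    def __init__(self, max_extensions=2):
        self.max_extensions = max_extensions  # bound that prevents CPU monopolization

    def on_timer_tick(self, thread):
        """Return True if the thread should be preempted now."""
        if thread.in_transaction and thread.extensions_used < self.max_extensions:
            thread.extensions_used += 1  # defer preemption for one more slice
            return False
        return True

    def on_commit(self, thread):
        """Return True if the thread should yield right after committing."""
        thread.in_transaction = False
        must_yield = thread.extensions_used > 0  # give back the borrowed time
        thread.extensions_used = 0
        return must_yield
```

A thread that never needed an extension is not asked to yield on commit, so the common case pays nothing extra.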
Evaluation (STMBench7, 16-core AMD Opteron)
Conventional CM deteriorates when threads>cores
Serializing by local spinning is efficient as long as threads ≤ cores
Evaluation - STMBench7 throughput
Serializing by sleeping on condition var is best when threads>cores, since system call overhead is negligible (long transactions)
All strict serialization schemes significantly reduce aborts
Additional TM scheduling work
“Transactional scheduling for read-dominated workloads”
[Attiya & Milani, OPODIS'09]
“Taking the heat off transactions: dynamic selection of pessimistic concurrency control”
[Sonmez, Harris, Cristal, Unsal & Valero, IPDPS'09]
“Proactive transaction scheduling for contention management”
[Blake, Dreslinski & Mudge, MICRO'09]
“Improving performance by reducing aborts in HTM”
[Ansari, Khan, Lujan, Kotselidis, Kirkham and Watson, HiPEAC'10]
“Window-based greedy contention management for TM”
[Sharma, Estrade & Busch, DC'10]
“On Transaction Scheduling in distributed TM systems”
[Kim & Ravindran, 2010]
“Kernel-assisted Scheduling and Deadline Support for STM”
[Maldonado, Marlier, Felber, Lawall, Muller & Riviere, DSN'11]
“Adaptive thread scheduling techniques for improving scalability of STM”
[Chan, Lam & Wang, 2011]
…
Scheduling support for TM
Conclusions & future work
Scheduling-based CM results in improved throughput under
high contention
o Overhead is negligible when contention is low
Lightweight kernel support can improve performance and
efficiency for some workloads
Dynamically selecting best CM algorithm for
workload at hand is a challenging research
direction
o Machine learning?
Thank you.