Scheduling-based TM Contention Management
A survey talk
Danny Hendler
Ben-Gurion University
Danny Hendler
3rd workshop on the Theory of Transactional Memory, Sep 22-23, 2011, Rome
TM: a research toy?
“Why STM can be more than a research toy.” [Dragojevic et al., 2009]
“TM: Why it is only a research toy.” [Cascaval et al., 2008]
There is consensus that TM performance must be improved. Specifically:
“In some workloads, performance degraded when we used too many concurrent
threads. One possible alternative to improving performance in these cases
would be to modify the thread scheduler so it avoids running more
concurrent threads than is optimal for a given workload, based on the information
provided by the STM runtime.” [Dragojevic et al.]
TM scheduling: rationale
• Transactional threads controlled by TM-aware scheduler
o Kernel-level, user-level
• Richer “tool-box” for reducing and/or preventing transaction conflicts
Improve performance under high contention
“Conventional” (non-scheduling) contention management
[Diagram: non TM-scheduled threads run on the TM system; contention detection invokes the contention manager, which arbitrates: abort/retry, wait, or proceed.]
“Polymorphic contention management”, [Guerraoui, Herlihy & Pochon, DISC'05]
Contention managers: Suicide, Polite, Karma, Greedy, Aggressive, Polka
Conventional Contention Management is often problematic
• Loser resumes execution after a waiting period
o May resume execution too early
o May resume execution too late
• Repeated collisions occur under high contention
o Livelocks
o Performance may become worse than single lock
Scheduling-based CM to the rescue.
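The loser's waiting period can be illustrated with a minimal sketch, assuming a randomized exponential backoff in the spirit of a Polite-style manager. The function name `backoff_delay` and the parameters `base`, `cap` and `rng` are hypothetical and are not taken from the talk.

```python
import random


def backoff_delay(attempt, base=1e-6, cap=1e-3, rng=random.random):
    """Return how long (in seconds) a loser waits before its next retry.

    Assumption: the wait doubles with every failed attempt, is capped,
    and is randomized so that colliding losers do not retry in lockstep.
    """
    return min(cap, base * 2 ** attempt) * rng()


if __name__ == "__main__":
    for attempt in range(5):
        print(attempt, backoff_delay(attempt))
```

The delay depends only on the retry count, not on when the winner actually finishes, which is why the loser may resume too early or too late.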
Talk outline
• Preliminaries
• The first TM schedulers
• Later user-land work
• Kernel support
The first TM schedulers
• “Adaptive Transaction Scheduling for transactional memory systems” [Yoo & Lee, SPAA'08]
• “CAR-STM: Scheduling-based collision avoidance and resolution for software transactional memory” [Dolev, Hendler & Suissa, PODC'08]
• “Steal-on-abort: dynamic transaction reordering to reduce conflicts in transactional memory” [Ansari, Jarvis, Kirkham, Kotselidis, Lujan and Watson, HiPEAC'09]
Adaptive Transaction Scheduling (ATS)
(Yoo & Lee, SPAA'08)
• A single scheduling queue
• Per-thread Contention Intensity (CI) computed
• Adaptive mechanism
• CI below threshold → transaction begins normally
• CI above threshold → transaction serialized (queued)
ATS: adaptive parallelism control
[Figure: behavior of a queue-based scheduler. The timeline flows from top to bottom and shows executing and queued transactions against the contention intensity (an average of all the CIs from running threads) and its threshold.]
Timeline 1: Transactions begin execution without resorting to the scheduler
Timeline 2: As contention starts to increase, some transactions call the scheduler
Timeline 3: As more transactions get serialized, contention intensity starts to decrease
Timeline 4: Contention intensity subsides below threshold
Timeline 5: More transactions start without the scheduler to exploit more parallelism
ATS adaptively varies the number of concurrent transactions according to the dynamic contention feedback.
Yoo & Lee, Transaction Scheduling.
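The threshold rule can be sketched as follows, assuming each thread keeps its CI as an exponentially weighted moving average of its recent abort outcomes. The class name `AtsThread`, the weight `alpha` and the default values are illustrative assumptions, not values stated in the talk.

```python
class AtsThread:
    """Per-thread contention intensity (CI) tracker (hypothetical names)."""

    def __init__(self, alpha=0.5, threshold=0.5):
        self.alpha = alpha          # weight given to the CI history (assumed)
        self.threshold = threshold  # CI above this value means "use the queue"
        self.ci = 0.0

    def record(self, aborted):
        """Fold the outcome of the latest transaction attempt into the CI."""
        outcome = 1.0 if aborted else 0.0
        self.ci = self.alpha * self.ci + (1.0 - self.alpha) * outcome

    def must_queue(self):
        """True if the next transaction should go through the single queue."""
        return self.ci > self.threshold


if __name__ == "__main__":
    thread = AtsThread()
    for aborted in (True, True, False, False):
        thread.record(aborted)
        print(thread.ci, thread.must_queue())
```

Because commits pull the CI back down, a thread that was serialized returns to starting transactions normally once contention subsides, matching timelines 4 and 5.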
CAR-STM (Collision Avoidance and Resolution for STM)
(Dolev, Hendler & Suissa, PODC'08)
• Per-core transaction queues
• Serialize conflicting transactions
• Contention avoidance: attempt to avoid even the first collision
CAR-STM high-level architecture
[Diagram components: transaction threads, T-Info, dispatcher, collision avoider, serializing contention manager, and transaction queues #1 to #k on cores #1 to #k, each with its own TQ thread.]
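A minimal sketch of serialization with per-core queues, under the assumption that the loser of a collision is moved to the back of the winner's queue so that the two can never run concurrently again. The class `PerCoreQueues` and its method names are hypothetical; the dispatcher and collision avoider are not modeled.

```python
from collections import deque


class PerCoreQueues:
    """One transaction queue per core (illustrative model only)."""

    def __init__(self, num_cores):
        self.queues = [deque() for _ in range(num_cores)]
        self.core_of = {}  # transaction -> index of the queue holding it

    def dispatch(self, tx, core):
        """Place a new transaction on the queue of the given core."""
        self.queues[core].append(tx)
        self.core_of[tx] = core

    def on_collision(self, winner, loser):
        """Serialize the loser behind the winner (assumed policy)."""
        src, dst = self.core_of[loser], self.core_of[winner]
        if src == dst:
            return  # same queue: the two are already serialized
        self.queues[src].remove(loser)
        self.queues[dst].append(loser)
        self.core_of[loser] = dst


if __name__ == "__main__":
    q = PerCoreQueues(2)
    q.dispatch("tx_a", 0)
    q.dispatch("tx_b", 1)
    q.on_collision(winner="tx_a", loser="tx_b")
    print([list(queue) for queue in q.queues])
```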
Execution time: STMBench7*, R/W dominated workloads
(*) “STMBench7: a benchmark for STM”, [Guerraoui, Kapalka & Vitek, EuroSys'07]
Shortcomings of first TM schedulers
• May restrict parallelism too much
• ATS: a single serialization queue
• CAR-STM: at most a single transactional thread per core
o High overheads even in the absence of contention
Talk outline
• Preliminaries
• The first TM schedulers
• Later user-land work
• Kernel support
A low-overhead serializer
• Avoid repeated collisions while minimizing over-serialization
• No per-core queues
• Adaptive
• “On the impact of serializing contention management on STM performance”, [Heber, Hendler & Suissa, OPODIS'09]
• “Scheduling support for TM contention management”, [Maldonado, Felber, Fedorova, Hendler, Lawall, Marlier, Muller & Suissa, PPoPP'10]
A low-overhead serializer (cont'd)
[Diagram: transactional threads and their condition variables.]
A low-overhead serializer (cont'd)
1) t identifies a collision
2) t calls contention manager: ABORT_OTHER
3) t changes status of t' to ABORT (writes that t is winner)
4) t' identifies it was aborted
A low-overhead serializer (cont'd)
5) t' rolls back transaction and goes to sleep on the condition variable of t
6) Eventually t commits and broadcasts on its condition variable…
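Steps 5 and 6 can be sketched with Python's `threading.Condition`. The class `TxThread`, the helper `wait_for_winner` and the `committed` flag are hypothetical names introduced for illustration; a real STM would track commits per transaction rather than with a single flag per thread.

```python
import threading


class TxThread:
    """Hypothetical per-thread descriptor with its own condition variable."""

    def __init__(self, name):
        self.name = name
        self.cond = threading.Condition()
        self.committed = False

    def commit(self):
        # Step 6: the winner commits and broadcasts on its condition variable.
        with self.cond:
            self.committed = True
            self.cond.notify_all()


def wait_for_winner(winner):
    # Step 5: the loser has rolled back and sleeps on the winner's
    # condition variable until the winner has committed.
    with winner.cond:
        while not winner.committed:
            winner.cond.wait()


def demo():
    log = []
    t = TxThread("t")

    def loser():
        wait_for_winner(t)
        log.append("t' retries")

    worker = threading.Thread(target=loser)
    worker.start()
    log.append("t commits")
    t.commit()
    worker.join()
    return log


if __name__ == "__main__":
    print(demo())
```

Checking the flag in a loop guards against spurious wakeups and against the case where the winner commits before the loser starts waiting.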
A low-overhead serializer (cont'd)
Stabilization mechanism
• Algorithm is adaptive
o Serializing mode / “Conventional” mode
• Prevents “mode-oscillations”:
o Shifting to serialization-mode reduces perceived contention
o Should use two thresholds
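The two-threshold idea is a form of hysteresis. The sketch below assumes contention is summarized as a number between 0 and 1; the class name `ModeSwitch` and the threshold values are illustrative assumptions.

```python
class ModeSwitch:
    """Switch between conventional and serializing mode with hysteresis."""

    def __init__(self, high=0.6, low=0.3):
        if not low < high:
            raise ValueError("low threshold must be below high threshold")
        self.high = high  # enter serializing mode above this level
        self.low = low    # return to conventional mode below this level
        self.serializing = False

    def update(self, contention):
        """Feed the latest contention estimate; return True if serializing."""
        if not self.serializing and contention > self.high:
            self.serializing = True
        elif self.serializing and contention < self.low:
            self.serializing = False
        return self.serializing


if __name__ == "__main__":
    switch = ModeSwitch()
    for level in (0.2, 0.7, 0.5, 0.4, 0.2, 0.5):
        print(level, switch.update(level))
```

With a single threshold, the drop in perceived contention caused by serializing would immediately flip the mode back; the gap between `low` and `high` absorbs that drop.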
A low-overhead serializer (cont'd)
Experimental evaluation
• Throughput of conventional CM algorithms is low
• CAR-STM over-serializes compared with the low-overhead serializer
• Stabilization mechanism helps
“Shrink” – collision prevention/avoidance
• Predicts future accesses based on past accesses
o Read-set predicted based on past few committed/aborted TXs (temporal locality)
o Write-set predicted based on immediately preceding aborted TX
• Serialize with a probability proportional to the number of threads currently serialized (serialization affinity), if thread's success rate is low and a collision is predicted
• “Preventing versus curing: avoiding conflicts in transactional memories”, [Dragojevic, Guerraoui, Singh and Singh, PODC'09]
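The serialization decision can be sketched as below. All names and constants are hypothetical. Two points are assumptions rather than details from the talk: a collision is "predicted" when the predicted read/write sets overlap the write sets of running transactions, and one is added to the serialized-thread count so that the very first thread can serialize at all.

```python
import random


def predicts_collision(predicted_reads, predicted_writes, active_writes):
    """True if any predicted access overlaps a running transaction's writes."""
    return bool((predicted_reads | predicted_writes) & active_writes)


def should_serialize(success_rate, collision_predicted, num_serialized,
                     num_threads, success_threshold=0.5, rng=random.random):
    """Decide whether the next transaction should start serialized."""
    if success_rate >= success_threshold or not collision_predicted:
        return False  # contention is low, or no collision is expected
    # Serialization affinity: the more threads are already serialized,
    # the more likely this one is to join them (the +1 is an assumption).
    probability = min(1.0, (num_serialized + 1) / num_threads)
    return rng() < probability


if __name__ == "__main__":
    hit = predicts_collision({"x"}, {"y"}, {"y", "z"})
    print(hit, should_serialize(0.2, hit, num_serialized=3, num_threads=8))
```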
“Shrink” – collision prevention (cont'd)
• Don't serialize when contention is low
• Serialize only if contention is high & a collision is “predicted”
• Update statistics, release lock if you own it
“Shrink” – experimental evaluation
STMBench7, read-write workload
Talk outline
• Preliminaries
• The first TM schedulers
• Later user-land work
• Kernel support
Scheduling Support for Transactional Memory Contention Management
• Implement CM scheduling support in the kernel scheduler (Linux & OpenSolaris)
• (Strict) serialization
• Soft serialization
• Time-slice extension
• Different mechanisms for communication between user-level STM library and kernel scheduler
• “Scheduling support for TM contention management”, [Maldonado, Felber, Fedorova, Hendler, Lawall, Marlier, Muller & Suissa, PPoPP'10]
TM Library / Kernel Communication via Shared Memory Segment (Ser-k algorithm)
• User code notifies kernel of events such as transaction start, commit and abort (in which case the thread yields)
• Kernel code handles moving the thread between ready and blocked queues
Soft Serialization
• Instead of blocking, reduce loser thread priority and yield
• Efficient in scenarios where loser transactions may take a different execution path when retrying
• Priority should be restored upon commit or when conflicting transactions terminate
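A toy model of soft serialization, assuming a run queue in which a lower number means higher priority. The class `ToyRunQueue`, its methods and the `penalty` value are hypothetical and do not correspond to any real kernel interface.

```python
class ToyRunQueue:
    """Toy priority run queue: the loser stays runnable, just demoted."""

    def __init__(self):
        self.priority = {}  # thread -> current priority (lower runs first)
        self.saved = {}     # thread -> priority to restore later

    def add(self, thread, priority=0):
        self.priority[thread] = priority

    def on_abort(self, loser, penalty=10):
        """Demote the loser instead of blocking it."""
        self.saved.setdefault(loser, self.priority[loser])
        self.priority[loser] += penalty

    def restore(self, thread):
        """Call on commit, or when the conflicting transactions terminate."""
        if thread in self.saved:
            self.priority[thread] = self.saved.pop(thread)

    def run_order(self):
        return sorted(self.priority, key=lambda t: (self.priority[t], t))


if __name__ == "__main__":
    rq = ToyRunQueue()
    rq.add("a")
    rq.add("b")
    rq.on_abort("a")
    print(rq.run_order())
    rq.restore("a")
    print(rq.run_order())
```

Unlike strict serialization, the demoted thread can still be scheduled when nothing else is runnable, so a retry that takes a different execution path is not held back unnecessarily.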
Time-slice extension
• Preemption in the midst of a transaction increases conflict “window of vulnerability”
• Defer preemption of transactional threads
• Avoid CPU monopolization by bounding the number of extensions and yielding after commit
• May be combined with serialization/soft serialization
Evaluation (STMBench7, 16-core AMD Opteron)
• Conventional CM deteriorates when threads > cores
• Serializing by local spinning is efficient as long as threads ≤ cores
Evaluation - STMBench7 throughput
• Serializing by sleeping on a condition variable is best when threads > cores, since system call overhead is negligible (long transactions)
• All strict serialization schemes significantly reduce aborts
Additional TM scheduling work
• “Transactional scheduling for read-dominated workloads” [Attiya & Milani, OPODIS'09]
• “Taking the heat off transactions: dynamic selection of pessimistic concurrency control” [Sonmez, Harris, Cristal, Unsal & Valero, IPDPS'09]
• “Proactive transaction scheduling for contention management” [Blake, Dreslinski & Mudge, MICRO'09]
• “Improving performance by reducing aborts in HTM” [Ansari, Khan, Lujan, Kotselidis, Kirkham and Watson, HiPEAC'10]
• “Window-based greedy contention management for TM” [Sharma, Estrade & Busch, DC'10]
• “On Transaction Scheduling in distributed TM systems” [Kim & Ravindran, 2010]
• “Kernel-assisted Scheduling and Deadline Support for STM” [Maldonado, Marlier, Felber, Lawall, Muller & Riviere, DSN'11]
• “Adaptive thread scheduling techniques for improving scalability of STM” [Chan, Lam & Wang, 2011]
• …
Scheduling support for TM
Conclusions & future work
• Scheduling-based CM results in improved throughput under high contention
o Overhead is negligible when contention is low
• Lightweight kernel support can improve performance and efficiency for some workloads
• Dynamically selecting the best CM algorithm for the workload at hand is a challenging research direction
o Machine learning?
Thank you.