1-HTM_large.pptx

Hardware-based Transactional Memory
Supporting Large Transactions
Anvesh Komuravelli
Abe Othman
Kanat Tangwongsan
Concurrent Programs: handle with care

Thread 1
lock_acquire(critical_zone);
obj.x = 7;
find_primes();
// intrusion test
if (obj.x != 7) fireMissiles();
lock_release(critical_zone);

Thread 2
do_stuff();
obj.x = 42;

Lock-based Approaches
• Deadlock
• Starvation
• Complex Program
Transactional Memory

Thread 1
x_begin();
obj.x = 7;
find_primes();
// intrusion test
if (obj.x != 7) fireMissiles();
x_finish();

Thread 2
do_stuff();
obj.x = 42;

• Atomicity in the face of concurrency.
• Consistency across the whole system.
• Isolation from other transactions.

Programmer: enclose instructions in a transaction.
System: execute transactions concurrently and, on a conflict, do something intelligent (e.g., abort, restart).
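
The programmer/system split above can be sketched in software. This is a minimal sketch, assuming an optimistic scheme with per-variable version numbers and a buffered write set. The slides are about hardware TM, so `TVar`, `Transaction`, `ConflictError`, and `atomically` are hypothetical names used only for illustration. They do not come from any of the systems discussed here.

```python
import threading


class ConflictError(Exception):
    """Raised at commit time when something the transaction read has changed."""


class TVar:
    """A shared variable with a version counter (hypothetical helper)."""

    def __init__(self, value):
        self.value = value
        self.version = 0


# Serializes commits only; transaction bodies still run concurrently.
_commit_lock = threading.Lock()


class Transaction:
    def __init__(self):
        self.read_versions = {}  # TVar -> version seen at first read
        self.write_buffer = {}   # TVar -> tentative value, invisible to others

    def read(self, var):
        if var in self.write_buffer:
            return self.write_buffer[var]
        # Record the version before reading the value, so a racing commit
        # can only cause a (safe) abort later, never a silent stale read.
        self.read_versions.setdefault(var, var.version)
        return var.value

    def write(self, var, value):
        self.write_buffer[var] = value

    def commit(self):
        with _commit_lock:
            # Validate: abort if any variable we read was committed to since.
            for var, seen in self.read_versions.items():
                if var.version != seen:
                    raise ConflictError()
            # Publish all buffered writes together.
            for var, value in self.write_buffer.items():
                var.value = value
                var.version += 1


def atomically(body, max_retries=1000):
    """Run body(tx) as a transaction, restarting it when a conflict is detected."""
    for _ in range(max_retries):
        tx = Transaction()
        try:
            result = body(tx)
            tx.commit()
            return result
        except ConflictError:
            continue  # "do something intelligent": here, simply restart
    raise RuntimeError("transaction kept aborting")
```

In this sketch an abort is cheap because tentative writes live only in `write_buffer`. A hardware scheme gets the same effect by keeping speculative lines in the cache until commit.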
Different strokes for different folks
[Figure: two plots of transaction footprint versus Percentile of Transactions (ticks include 50th and 80th) for ANL, Java, and Pthreads. Y-axes: Read Set Size in Kbytes and Write Set Size in Kbytes, logarithmic scale.]
Common case: 98% of transactions fit in L1 => handle them in hardware
Fast… Easy conflict detection… Easy commit and abort
Challenges & Opportunities
What to do with the remaining 2%?
Goal: Hide platform/resource limitations from programmers
VTM – Virtual Transactional Memory
• On overflow, use process’s virtual memory
• Tracking at cache-line granularity
• Per process state (tag and store virtual addresses)
• Flatten nested transactions
• Implemented in specialized hardware (dedicated cache, search logic, …)
• Drawbacks?
– Modifications to hardware. Costly?
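
The overflow idea in the bullets above can be modeled in a few lines. This is a rough sketch under stated assumptions: a 64-byte cache line, a fixed number of hardware-tracked lines, and a plain dictionary standing in for the overflow structure kept in the process's virtual memory. `OverflowingTxState` and its methods are hypothetical names. VTM's real hardware structures are not described in these slides and are not modeled here.

```python
LINE_BYTES = 64  # assumed cache-line size


def line_of(vaddr):
    """Map a virtual address to its cache-line number."""
    return vaddr // LINE_BYTES


class OverflowingTxState:
    """Read/write tracking for one transaction, at cache-line granularity."""

    def __init__(self, hw_lines):
        self.hw_lines = hw_lines  # how many lines the hardware can track
        self.in_cache = {}        # line -> "R" or "W" (stands in for cache bits)
        self.overflow = {}        # line -> "R" or "W" (stands in for virtual memory)

    def access(self, vaddr, is_write):
        line = line_of(vaddr)
        # Already tracked somewhere: upgrade to "W" if this is a write.
        for table in (self.in_cache, self.overflow):
            if line in table:
                if is_write:
                    table[line] = "W"
                return
        mode = "W" if is_write else "R"
        if len(self.in_cache) < self.hw_lines:
            self.in_cache[line] = mode
        else:
            # Simplification: the new line spills. Real hardware would
            # evict some existing line instead.
            self.overflow[line] = mode

    def conflicts_with(self, vaddr, is_write):
        """Would another thread's access to vaddr conflict with this transaction?"""
        line = line_of(vaddr)
        held = self.in_cache.get(line) or self.overflow.get(line)
        return held is not None and (is_write or held == "W")
```

With `hw_lines=2`, a third distinct line lands in `overflow`, yet conflict checks against it still work. That is the property virtualization is meant to preserve.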
XTM – eXtended Transactional Memory
• “Complete TM Virtualization without complex hardware”
• Page table per transaction
• Allows arbitrary nesting – no flattening
• The only hardware support – raise an exception on overflow
• Drawbacks?
– Page granularity on overflows
– Potentially higher memory usage than VTM
– Software commit is costlier than VTM’s hardware commit – can stall other transactions of the process
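
The first two drawbacks can be made concrete with a small calculation. This is a minimal sketch, assuming 64-byte cache lines, 4 KB pages, and one private copy per written granule. The helper names `conflict` and `buffered_bytes` are hypothetical and only illustrate the effect of granularity.

```python
LINE_BYTES = 64    # assumed cache-line size
PAGE_BYTES = 4096  # assumed page size


def conflict(writes_a, accesses_b, granule):
    """True if A's writes and B's accesses touch a common granule."""
    a = {addr // granule for addr in writes_a}
    b = {addr // granule for addr in accesses_b}
    return not a.isdisjoint(b)


def buffered_bytes(writes, granule):
    """Bytes of private copies, assuming one copy per written granule."""
    return len({addr // granule for addr in writes}) * granule


# Two transactions touching different lines of the same page.
a_writes = [0x1000]
b_reads = [0x1800]
line_conflict = conflict(a_writes, b_reads, LINE_BYTES)  # no shared line
page_conflict = conflict(a_writes, b_reads, PAGE_BYTES)  # same page: false conflict

# Three scattered one-word writes.
scattered = [0x1000, 0x3000, 0x5000]
line_cost = buffered_bytes(scattered, LINE_BYTES)
page_cost = buffered_bytes(scattered, PAGE_BYTES)
```

Under these assumptions the same access pattern conflicts only at page granularity, and the scattered writes need 64 times more buffer space with pages than with lines.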
Comparing the approaches
[Figure: Normalized Execution Time of VTM, XTM-e, XTM-g, and XTM on tomcatv [37.7%], volrend [0.01%], radix [0.26%], micro-P10 [39.2%], micro-P20 [60.3%], and micro-P30 [60.8%]. Each bar is broken down into Versioning, Validation, Commit, Violations, Idle, and Useful. Annotated values: 8.3 and 2.5.]
An observation
• Small transactions get things done in the hardware
• Large transactions spill the buffers and TM switches to virtual mode
• What about transactions whose size varies?
– What if everything fits again in the buffers?
– Can we switch back to hardware mode?
Towards improving virtualization
• Permissions-only cache – significantly reduces the chance of overflowing the buffers
– At the cost of a little extra hardware
• Large transactions, already assumed to be infrequent, become even rarer
• Large transactions are serialized and handled one at a time.
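
The one-at-a-time rule in the last bullet can be pictured as a single gate that only one overflowed transaction may hold. This is a minimal sketch, assuming a first-come policy with no fairness guarantee. `OverflowGate` and its methods are hypothetical names and are not taken from the design described in the slides.

```python
class OverflowGate:
    """Lets at most one transaction run in overflowed mode at a time."""

    def __init__(self):
        self.owner = None  # id of the transaction currently overflowed
        self.waiting = []  # ids of transactions that were turned away

    def try_overflow(self, tx):
        """Return True if tx may enter (or stay in) overflowed mode."""
        if self.owner is None or self.owner == tx:
            self.owner = tx
            if tx in self.waiting:
                self.waiting.remove(tx)
            return True
        if tx not in self.waiting:
            self.waiting.append(tx)
        return False

    def finish(self, tx):
        """Called when tx commits or aborts; frees the gate if tx held it."""
        if self.owner == tx:
            self.owner = None
```

Nothing in this sketch decides which waiting transaction goes next, so a transaction that keeps losing the race can wait indefinitely.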
Towards improving virtualization
Do we always have only a few large transactions?
• For now: yes
• In the future: maybe not
• I/O and blocking system calls might need to be atomic
• How do the earlier discussed approaches fare?
– VTM – complex hardware
– XTM – complications with OS and page granularity
– OneTM – can lead to starvation!
TokenTM
• Uses tokens to monitor memory blocks
– To read, you get a token
– To write, you need to get every token
• Rigorous bookkeeping – blocks are tracked in
caches, memory and disk
• Handles large transactions gracefully
– Except for conflicts, transaction speed is
unaffected by large transactions in other threads
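
The token rule above (one token to read, every token to write) can be sketched as a small bookkeeping class. This is a minimal sketch, assuming a fixed token count per block and a dictionary of holders. `TokenBlock` and its methods are hypothetical names. How TokenTM actually stores tokens in caches, memory, and disk is not modeled.

```python
class TokenBlock:
    """Token bookkeeping for a single memory block."""

    def __init__(self, total_tokens):
        self.total = total_tokens
        self.held = {}  # transaction id -> number of tokens it holds

    def free(self):
        return self.total - sum(self.held.values())

    def acquire_read(self, tx):
        """A reader needs one token."""
        if self.held.get(tx, 0) >= 1:
            return True
        if self.free() >= 1:
            self.held[tx] = 1
            return True
        return False  # conflict: a writer holds every token

    def acquire_write(self, tx):
        """A writer needs every token, counting any it already holds."""
        mine = self.held.get(tx, 0)
        if mine == self.total:
            return True
        if self.free() + mine == self.total:
            self.held[tx] = self.total
            return True
        return False  # conflict: some other transaction holds a token

    def release(self, tx):
        """Return all of tx's tokens at commit or abort."""
        self.held.pop(tx, None)
```

Readers coexist because each takes a single token. A writer excludes everyone else because no token is left over once it holds them all.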
TokenTM Downsides
• Small transactions suffer(?)
– L1 cache sized transactions can work at hardware speed… BUT:
• Need flash-clear and flash-OR circuits in L1 cache
• Requires a very involved ad hoc representation
• …or taking a 3% overhead hit
• Optimizes the rare large case to the detriment of the frequent small case?
Conclusion
• Sun Research’s Transactional Memory Spotlight:
More recent proposals for “unbounded” HTM aim to overcome these disadvantages, but Sun Labs researchers came to the conclusion that the proposals were sufficiently complex and risky that they were unlikely to be adopted in mainstream commercial processor designs in the near future.