Transcript ppt

The read-copy-update mechanism for supporting real-time applications
on shared-memory multiprocessor systems with Linux
Guniguntala et al.

Publish-Subscribe
◦ insertion
◦ reader-writer synchronization

Wait for pre-existing readers to complete
◦ deletion
◦ change – wait for readers – free
◦ safe memory reclamation

Maintain multiple versions of update objects
◦ for readers
Starting List
New Node
Copy B to B’ and Modify
Move A.Next to B’
B still visible to pre-existing readers, but not to new readers
Readers complete, remove B
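The update sequence above can be sketched with a plain singly linked list. This is a single-threaded illustration of the copy-modify-publish ordering, not real RCU; the function and field names are mine:

```c
#include <stdlib.h>

struct node { int val; struct node *next; };

/* Replace the node after 'prev' with an updated copy, RCU-style:
 * 1. copy the old node, 2. modify the copy, 3. swing prev->next.
 * Pre-existing readers may still hold a pointer to the old node,
 * so it is only freed after a grace period (omitted here). */
struct node *replace_next(struct node *prev, int new_val)
{
    struct node *old = prev->next;
    struct node *copy = malloc(sizeof(*copy));
    *copy = *old;            /* copy B to B' */
    copy->val = new_val;     /* modify B' */
    prev->next = copy;       /* move A.next to B' (publish) */
    return old;              /* caller frees after readers finish */
}
```

New readers traversing from A now see B', while a pre-existing reader still holding B can keep following B->next safely, since the old node is left intact.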
RCU Semantics: A First Attempt
McKenney & Walpole

Reader-Writer
◦ rcu_assign_pointer()
p->a = 1;
p->b = 2;
p->c = 3;
rcu_assign_pointer(gp, p);
◦ rcu_dereference()
◦ Memory barriers embedded
in API
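As a rough user-space sketch, rcu_assign_pointer() behaves like a store with release ordering and rcu_dereference() like a dependency-ordered load; C11 atomics can stand in for the kernel macros (the publish/subscribe names are mine):

```c
#include <stdatomic.h>
#include <stdlib.h>

struct foo { int a, b, c; };

/* gp plays the role of the global RCU-protected pointer. */
static _Atomic(struct foo *) gp;

/* Stand-in for rcu_assign_pointer(): the release store orders the
 * initialization of *p before the pointer becomes visible. */
static void publish(struct foo *p)
{
    p->a = 1;
    p->b = 2;
    p->c = 3;
    atomic_store_explicit(&gp, p, memory_order_release);
}

/* Stand-in for rcu_dereference(): memory_order_consume provides the
 * data-dependency ordering the real macro relies on. */
static struct foo *subscribe(void)
{
    return atomic_load_explicit(&gp, memory_order_consume);
}
```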

Writer-Collection
◦ synchronize_rcu() blocks the
caller until safe to collect
◦ call_rcu() is an asynchronous
call for collection
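The two collection styles can be mimicked with a toy callback queue. Here grace-period detection is collapsed into an explicit grace_period_end() call, which is my own simplification; a blocking synchronize would simply wait for that point:

```c
#include <stddef.h>

struct rcu_head { struct rcu_head *next; void (*func)(struct rcu_head *); };

static struct rcu_head *pending;   /* callbacks awaiting a grace period */

/* Asynchronous collection, in the style of call_rcu():
 * queue the callback and return immediately. */
static void toy_call_rcu(struct rcu_head *h, void (*func)(struct rcu_head *))
{
    h->func = func;
    h->next = pending;
    pending = h;
}

/* Invoked when a grace period ends; runs every queued callback. */
static void grace_period_end(void)
{
    while (pending) {
        struct rcu_head *h = pending;
        pending = h->next;
        h->func(h);
    }
}

/* Sample callback: just count how many objects were "freed". */
static int toy_freed;
static void count_free(struct rcu_head *h) { (void)h; toy_freed++; }
```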

Reader-Collection (?)

General issues in non-blocking & swap-free

Non-blocking queue
◦ When is it safe to free memory?
◦ Memory reclamation tracking can be relatively costly
◦ Expensive atomic operations / memory barriers required
◦ Atomic operation expense
 CAS (15-25 clock cycles on P4)
◦ Retry on contention

Non-blocking synchronization
◦ Atomic operation expense
 store_conditional
◦ Data structure copy expense

With interactions between reader, writer, and
collector, when is it time to reclaim memory?
◦ Writer identifies what to collect and triggers
collection (synchronously or asynchronously)
◦ Readers (indirectly) indicate when to collect by no
longer referencing the removed object

One solution for collector:
◦ Track copies of global pointer into thread-local
memory
 Each thread maintains a list of its currently active
pointers
◦ Collector checks the thread-local list prior to
memory reclamation

Sounds a lot like the hazard pointer!
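A minimal single-threaded sketch of that collector check, with fixed-size slots standing in for the per-thread lists (all names are mine):

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_THREADS       4
#define SLOTS_PER_THREAD  2

/* Each thread records the global pointers it is currently using. */
static void *active[MAX_THREADS][SLOTS_PER_THREAD];

static void announce(int tid, int slot, void *p) { active[tid][slot] = p; }
static void retire(int tid, int slot)            { active[tid][slot] = NULL; }

/* Collector side: reclaiming p is safe only if no thread's
 * thread-local list still contains it. */
static bool safe_to_reclaim(void *p)
{
    for (int t = 0; t < MAX_THREADS; t++)
        for (int s = 0; s < SLOTS_PER_THREAD; s++)
            if (active[t][s] == p)
                return false;
    return true;
}
```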

Hazard Pointer Disadvantages:
◦ Requires manual identification of hazard references
◦ Expensive on the read path
 Requires two memory barriers on the read path
 Copy of the global pointer to local reference
 Entry of hazard pointer into the list
 Every read thread incurs this extra overhead as the
cost for correct memory reclamation. Expensive for
many-reader situations

RCU -> Collection based on ‘quiescent state’
◦ Threads prevent the occurrence of a quiescent state
while their local memory is alive
◦ Collector indirectly observes state of all threads to
infer when safe to reclaim memory
◦ The definition chosen for ‘quiescent state’ will
significantly impact performance
 Best choice: Infer by operations that occur anyway

Reader-Collection
◦ rcu_read_lock()
◦ rcu_read_unlock()
◦ read-side critical section
rcu_read_lock();
retval = rcu_dereference(gbl_foo)->a;
rcu_read_unlock();
return retval;

Non-preemptible kernel
◦ Programming convention is
to avoid yielding in the read-side critical section
◦ Memory reclamation on
voluntary context switch
◦ rcu_read_lock/unlock calls
do nothing in a non-preemptible kernel

‘Simple case’: Non-preemptible kernel
◦ All threads use read-side critical section with no voluntary yield
 no context switch within a read-side critical section
◦ Collector observes all CPUs to determine when all threads have
undergone a context switch
 Indicates a pass through a quiescent state
 All previous read-side critical sections are now guaranteed to have
exited
 New readers no longer have visibility of the removed object
◦ Safe, but conservative and imprecise; degrades real-time response
 Detection of the quiescent state occurs after the last reader use
 Collector waits for all readers to finish, even if not all readers were
accessing the memory to be reclaimed
 Delays real-time response due to the refusal to yield within a read-side
critical section
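The context-switch-based detection can be sketched in user space: the collector snapshots per-CPU context-switch counters when the grace period starts and waits until every counter has advanced. This is a simulation, not kernel code, and the names are mine:

```c
#include <stdbool.h>

#define NCPUS 4

/* Bumped on every context switch on that CPU. */
static unsigned long ctxt_switches[NCPUS];

static void context_switch(int cpu) { ctxt_switches[cpu]++; }

/* Collector side: record the counters when the grace period starts... */
static void snapshot(unsigned long snap[NCPUS])
{
    for (int c = 0; c < NCPUS; c++)
        snap[c] = ctxt_switches[c];
}

/* ...and declare it over once every CPU has switched at least once,
 * since a read-side critical section never spans a context switch. */
static bool grace_period_done(const unsigned long snap[NCPUS])
{
    for (int c = 0; c < NCPUS; c++)
        if (ctxt_switches[c] == snap[c])
            return false;
    return true;
}
```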

Read-side critical section
◦ In a preemptible kernel, readers could otherwise be preempted
within their read-side critical section
◦ Preemption is disabled on entry and re-enabled on exit

Memory freed using synchronize_sched()
◦ Counts scheduler preemptions

Benefits and trade-offs
◦ Allows use of RCU with preemptible kernel
◦ Read-side critical sections won't be preempted by RT
events, with negative consequences for RT responsiveness
◦ Additional read-side work to disable/enable preemption

Global counter
◦ Atomic increment in rcu_read_lock()
◦ Atomic decrement in rcu_read_unlock()

Quiescent state defined as global counter=0

Not practical
◦ As CPU count increases, counter may never reach 0

Use two-element array as counter
◦ Atomically increment/decrement as
matched pair of ‘current’ and ‘last’
counter
◦ Grace period starts – swap sense of
‘current’ and ‘last’, proceed to only
decrement the ‘last’ counter
◦ Counter eventually reaches 0, marking
end of grace period
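A single-threaded sketch of the two-element scheme, with the atomic operations elided for clarity (the function names are mine):

```c
#include <stdbool.h>

static int counter[2];   /* the 'current' and 'last' reader counts */
static int cur;          /* index new readers increment on entry */

/* Reader entry: join the 'current' counter, remember which one. */
static int reader_enter(void)
{
    int idx = cur;
    counter[idx]++;       /* an atomic increment in the real scheme */
    return idx;
}

/* Reader exit: decrement the same counter that was incremented. */
static void reader_exit(int idx) { counter[idx]--; }

/* Collector: swap the sense of 'current'; new readers use the new
 * element, and the grace period ends when the old one drains to 0. */
static int start_grace_period(void)
{
    int old = cur;
    cur = 1 - cur;
    return old;
}

static bool grace_period_over(int old) { return counter[old] == 0; }
```

Note how a reader that enters after the flip lands on the other counter, so it cannot hold the grace period open.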

High overhead due to memory
contention / cache misses




◦ 2xN array of counters, N = thread
count (2 per thread)
◦ Global index selects the current counter
◦ Updated by rcu_read_lock() and
rcu_read_unlock()
◦ Requires a grace-period detection state
machine

Improves read-side performance
◦ Avoids cache misses
◦ Does not require (expensive) atomic instructions
◦ Does not require (expensive) memory barriers

Requires state-machine for grace period
detection

Indefinite delays in read-side critical sections
◦ Extend the grace period
◦ Exhaust memory, since no collection can occur
◦ Writers cannot allocate memory
◦ Need to prevent low-priority reader threads from being
indefinitely preempted

Priority boosting would work, but it is relatively
expensive and not required for every reader

Solution is to defer priority boosting
◦ Preempted read-side critical threads are added to a list
◦ The list serves as an 'aging' tracker

Issue List

Global definition of grace period
◦ A single delayed thread in a read-side critical section
can stall memory reclamation for everyone
◦ The stall occurs even though the reader's data is unrelated
to the memory being reclaimed

SRCU Control Block
◦ Reader/updater invocations share defined control
blocks
◦ Readers won't block reclamation for unrelated
control blocks

idx = srcu_read_lock(&scb);
/* read-side critical section */
srcu_read_unlock(&scb, idx);

/* collection */
synchronize_srcu(&scb);
RCU Performance Comparisons
◦ Fast concurrent reads
◦ Relatively slow writers
◦ Preemption & RT support requires increased read-side work