
Chapter 3
KERNEL SYNCHRONIZATION PRIMITIVES
Chen Xiangqun
[email protected]
Outline
 Synchronization & hardware support
 Mutex locks
 Reader/writer (RW) locks
 Dispatcher locks
 Semaphores
Synchronization & Systems Architectures
Synchronization
 Supports the SMP shared memory architecture
 Maintains the integrity of critical data
 Code reading or writing critical data must first
acquire the appropriate lock
 Once the data operation is completed, the holder releases the lock
Parallel Systems Architectures
 SMP — Symmetric multiprocessor:
shared memory model, single kernel image
 MPP — Massively parallel processor:
message-based model, multiple kernel images
 NUMA/ccNUMA (NUMA and cache-coherent NUMA):
shared memory model, single kernel image
Hardware Considerations
 What is a lock?
 A piece of data at a specific location
 The simplest lock is a single-byte location
 A lock is set, or held: lock value = 0xFF
 A lock is available: lock value = 0x00
 The first consideration: instructions suitable for locking code
 the processor must offer a byte-level test-and-set instruction
 The second consideration: the visibility of a lock's state
when it is examined by executing kernel threads
 The lock state must be globally visible to all
processors on the system
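
As a rough illustration of this byte-level locking scheme, here is a minimal user-space sketch in C, using the GCC/Clang __atomic builtins as a stand-in for a hardware test-and-set instruction; the function names are hypothetical and this is not kernel code:

    #include <sched.h>

    static unsigned char lock_byte = 0x00;      /* 0x00 = available */

    void byte_lock_acquire(unsigned char *lp)
    {
        /* Atomically store a nonzero value and return the previous
         * contents -- the builtin's analogue of ldstub writing 0xFF. */
        while (__atomic_test_and_set(lp, __ATOMIC_ACQUIRE))
            sched_yield();                      /* lock held: retry */
    }

    void byte_lock_release(unsigned char *lp)
    {
        __atomic_clear(lp, __ATOMIC_RELEASE);   /* back to 0x00 */
    }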
Atomic Instructions
 Atomic instruction property:
no other store operation is allowed
between the load and the store of the instruction
 Three UltraSPARC processor instructions
guarantee atomic behaviour:
 ldstub (load and store unsigned byte)
 cas (compare and swap)
 swap (swap byte locations)
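
The semantics of ldstub and cas can be sketched as plain C functions; in hardware each function body executes as one indivisible operation (these models are illustrative only):

    #include <stdint.h>

    /* ldstub: return the old byte and store 0xFF, atomically */
    uint8_t ldstub_model(uint8_t *addr)
    {
        uint8_t old = *addr;
        *addr = 0xFF;
        return old;
    }

    /* cas: store nv only if the current value equals cmp; either way,
     * return the value found at addr so the caller can tell what happened */
    uint64_t cas_model(uint64_t *addr, uint64_t cmp, uint64_t nv)
    {
        uint64_t old = *addr;
        if (old == cmp)
            *addr = nv;
        return old;
    }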
ldstub instruction (diagram)
cas instruction (diagram)
Data flow levels in a processor (diagram)
Total Store Ordering (TSO) model
 SPARC processors implement the Total Store
Ordering (TSO) model
 Rules for loads and stores:
 Loads are blocking and are ordered with
respect to other loads
 Stores are ordered with respect to other
stores; stores cannot bypass earlier loads
 Atomic load-stores (the ldstub and cas
instructions) are ordered with respect to loads
Mutex and RW lock implementation
 On UltraSPARC-based systems:
use the ldstub and cas instructions
 On Intel x86:
use the cmpxchgl
(compare/exchange long) instruction
 Memory barrier (membar) instructions
constrain the ordering of memory access
operations (loads and stores)
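
A sketch of the ordering constraint a membar provides, using C11 atomics in user space for illustration: the release fence keeps the protected store from being reordered past the store that frees the (hypothetical) lock byte:

    #include <stdatomic.h>

    int shared_data;                 /* protected by lock_byte */
    atomic_uchar lock_byte;

    void release_lock(void)
    {
        shared_data = 42;                            /* protected store */
        atomic_thread_fence(memory_order_release);   /* membar analogue */
        atomic_store_explicit(&lock_byte, 0x00,
            memory_order_relaxed);                   /* lock now free   */
    }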
Synchronization Objects
 mutex lock: commonly used in Solaris;
exclusive read and write access to data
 reader/writer (RW) lock:
multiple readers allowed,
but only one writer at a time
 dispatcher lock:
a special type of mutex lock,
used by the kernel dispatcher
 semaphores:
gate access to a finite number of resources
Synchronization Process
 When a thread attempts to acquire a lock,
it encounters one of two possible lock states:
 free (available)
 not free (owned, held)
 Turnstiles:
a sleep queue for threads blocking on locks
 When a thread ends its operation, it releases the lock;
two possible cases:
 No waiters: the lock is simply released
 Threads are waiting for the lock (waiters);
three cases:
 release the lock and wake up the blocking threads;
the first thread to execute acquires the lock
 select a thread from the turnstile,
based on priority or sleep time,
and wake up only that thread
 select the next lock owner;
the lock owner hands the lock off to the selected thread
Solaris Locks diagram
Synchronization Object Operations Vector
 Mutex locks, reader/writer locks, and semaphores
define an operations vector linked to kernel threads
that are blocking on the object
 The object's operations vector is a data structure
that exports a subset of object functions required
for kthreads sleeping on the lock
One vector per synchronization object type:
 mutex_sobj_ops for mutex locks
 rw_sobj_ops for reader/writer locks
 sema_sobj_ops for semaphores
Each vector exports three functions:
 owner function:
returns the thread ID that owns the object
 unsleep function:
transitions a kernel thread out of the sleep state
 change_pri function:
changes the thread priority, for priority inheritance
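
The shape of such an operations vector can be sketched in C as follows; the stand-in typedefs and exact member signatures are assumptions, not the verbatim Solaris definitions:

    typedef struct kthread kthread_t;   /* stand-in for the kernel type */
    typedef short pri_t;                /* stand-in priority type       */

    typedef struct sobj_ops {
        int        sobj_type;                        /* mutex, RW, sema   */
        kthread_t *(*sobj_owner)(void *sobj);        /* owning thread     */
        void       (*sobj_unsleep)(kthread_t *t);    /* leave sleep state */
        void       (*sobj_change_pri)(kthread_t *t,  /* priority          */
                                      pri_t pri);    /*   inheritance     */
    } sobj_ops_t;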
Outline
 Synchronization & hardware support
 Mutex locks
 Reader/writer (RW) locks
 Dispatcher locks
 Semaphores
Mutex Locks
 Two options for a thread to acquire a mutex lock that is being held:
 spin:
the thread enters a tight loop,
attempting to acquire the lock in each pass
 block:
the thread is placed on a sleep queue;
when the lock is released, a wakeup is sent to the thread
 The benefit of spinning:
no context-switch overhead, and fast lock acquisition
 spinning's downside: consuming CPU cycles while waiting
 The benefit of blocking:
frees the processor for other threads
 disadvantage: requires context switching,
and a little more lock-acquisition latency
Lock granularity
 A very coarse level
 a simple table-level mutex
 if a thread needs a process structure,
it must acquire the process table mutex
 advantages: simplicity and minimal overhead
 disadvantage: only one thread at a time can
manipulate the process table (a hot lock)
 A finer level
 a lock per process table entry
 multiple threads can manipulate different process
structures at the same time
 disadvantages: more complexity,
greater chance of deadlock, and more overhead
 Solaris implements relatively fine-grained locking
(the two levels are sketched below)
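
The contrast between the two granularities can be sketched in Solaris driver style; proc_entry_t and the function name here are hypothetical:

    #include <sys/mutex.h>

    kmutex_t proc_table_lock;          /* coarse: one lock for the table */

    typedef struct proc_entry {
        kmutex_t pe_lock;              /* fine: one lock per table entry */
        /* ... per-process data ... */
    } proc_entry_t;

    void update_entry(proc_entry_t *pe)
    {
        mutex_enter(&pe->pe_lock);     /* other entries stay accessible  */
        /* manipulate only this process structure */
        mutex_exit(&pe->pe_lock);
    }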
Two types of mutex locks
 spin locks:
 spin in a tight loop if a desired lock is being held
 adaptive locks:
 dynamically either spin or block,
depending on the state of the holder
 If the lock holder is running,
the thread wanting the lock will spin
 If the lock holder is not running,
the thread wanting the lock will block
Benefits of the two types of mutex locks
 If the thread holding a lock is running:
 the lock will be released very soon,
so spinning is cheap
 If the lock holder is not running:
 running it again involves a context switch,
so simply block and free up the processor for others
 High-level interrupt context does not allow context switching:
 adaptive locks can context-switch,
so only spin locks are used in high-level interrupt handlers
(the adaptive policy is sketched below)
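
The adaptive spin-or-block policy amounts to the following loop; the three helper functions are hypothetical stand-ins for the real lock internals:

    typedef struct kmutex kmutex_t;

    extern int  try_cas_acquire(kmutex_t *mp);    /* cas on the owner field */
    extern int  holder_is_running(kmutex_t *mp);  /* is the owner on a CPU? */
    extern void block_on_turnstile(kmutex_t *mp); /* sleep until a wakeup   */

    void adaptive_acquire(kmutex_t *mp)
    {
        for (;;) {
            if (try_cas_acquire(mp))
                return;             /* got the lock */
            if (holder_is_running(mp))
                continue;           /* owner on a CPU: spin, release is soon */
            block_on_turnstile(mp); /* owner off CPU: free this processor    */
        }
    }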
Functions of mutex locks
 mutex_init(): the lock is initialized; called only once
 the lock type, spin or adaptive, is determined here
 most common type passed:
MUTEX_DEFAULT
 in rare cases the type passed is:
MUTEX_SPIN
 mutex_enter():
acquires a lock;
executes a compare-and-swap instruction,
testing for a zero value
 mutex_vector_enter():
entered if the lock is held or is a spin lock
 mutex_exit(): releases the lock
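
A driver-style usage sketch of these functions, following the mutex(9F) interfaces; the lock and counter names are hypothetical:

    #include <sys/types.h>
    #include <sys/mutex.h>

    static kmutex_t  stats_lock;
    static uint64_t  stats_count;      /* data guarded by stats_lock */

    void stats_setup(void)
    {
        /* MUTEX_DEFAULT lets mutex_init() pick the lock type */
        mutex_init(&stats_lock, NULL, MUTEX_DEFAULT, NULL);
    }

    void stats_bump(void)
    {
        mutex_enter(&stats_lock);      /* compare-and-swap on m_owner */
        stats_count++;
        mutex_exit(&stats_lock);
    }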
Adaptive Locks Implementation
 m_owner field:
 holds the address of the thread that owns the
lock (the kthread pointer)
 serves as the actual lock
 successful lock acquisition:
 the acquiring thread's kthread pointer is placed in the
m_owner field of the target lock
 threads attempting to get the held lock (waiters):
 bit 0 of m_owner is set
 Note: kthread pointer values do not require bit 0
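
How m_owner doubles as both the lock and the waiters flag can be sketched as follows; the structure and macro names are illustrative:

    #include <stdint.h>

    #define MUTEX_WAITERS 0x1UL            /* bit 0: threads are blocking */

    typedef struct adaptive_mutex {
        uintptr_t m_owner;                 /* 0 = free, else the owner's
                                              kthread pointer             */
    } adaptive_mutex_t;

    uintptr_t owner_of(adaptive_mutex_t *mp)
    {
        /* kthread pointers are aligned, so bit 0 is free for the flag */
        return (mp->m_owner & ~MUTEX_WAITERS);
    }

    int has_waiters(adaptive_mutex_t *mp)
    {
        return ((mp->m_owner & MUTEX_WAITERS) != 0);
    }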
Spin Locks Implementation
 m_spinlock:
the actual mutex lock bits
 m_dummylock:
the lock-held test looks for a zero value
 for spin locks:
since m_dummylock is set to all 1's in mutex_init(),
the test will fail
 for an adaptive lock:
the test will also fail
 m_oldspl:
set to the priority level the processor was running at
 m_minspl:
stores the interrupt priority level set when the lock was initialized
Called from interrupts & device drivers
 An interrupt block cookie is added to the argument list
when mutex_init() is called from a device driver
or kernel module that generates interrupts
 mutex_init() checks the interrupt block cookie:
 If mutex_init() was called from a device driver:
the lock type is set to spin
 otherwise:
an adaptive lock is initialized
 Interrupt levels above 10 on SPARC:
 require spin locks
 m_dummylock is set to all 1's (0xFF)
Function of mutex_vector_enter() (diagram)
When the lock is released
 When a thread has finished working:
 it calls mutex_exit() to release the lock
 With no threads waiting for the lock:
 mutex_vector_exit() handles the simple case:
clearing the lock fields (m_owner) and returning
 For a spin lock release:
 the lock field is cleared and the processor is
returned to its previous PIL level
 For an adaptive lock release with waiters:
 a waiter is selected from the turnstile (if there
is more than one waiter)
 its state is changed from sleeping to runnable
 it is placed on a dispatch queue to execute and get the lock
Outline
 Synchronization & hardware support
 Mutex locks
 Reader/writer (RW) locks
 Dispatcher locks
 Semaphores
Reader/Writer Locks
 Multiple threads may read the data at the same time,
but only one writer is allowed
 While a writer is holding the lock,
no readers are allowed
 Basic mechanics of RW locks:
 rw_init():
initialization
 rw_enter():
acquires the lock; implemented in assembly code
 rw_exit():
releases the lock; implemented in assembly code
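
A usage sketch of these interfaces in rwlock(9F) style; the lock name and slot functions are hypothetical:

    #include <sys/types.h>
    #include <sys/rwlock.h>

    static krwlock_t table_rwlock;

    void table_setup(void)
    {
        rw_init(&table_rwlock, NULL, RW_DEFAULT, NULL);
    }

    uint64_t table_read(uint64_t *slot)
    {
        uint64_t v;

        rw_enter(&table_rwlock, RW_READER);  /* many readers may hold it */
        v = *slot;
        rw_exit(&table_rwlock);
        return (v);
    }

    void table_write(uint64_t *slot, uint64_t v)
    {
        rw_enter(&table_rwlock, RW_WRITER);  /* exclusive access */
        *slot = v;
        rw_exit(&table_rwlock);
    }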
Solaris 7 Reader/Writer Lock Structure
 bit 0: wait bit, set when threads are waiting for the lock
 bit 1: wrwant bit,
set when at least one thread is waiting for a write lock
 bit 2: wrlock, the actual write lock;
determines the meaning of the high-order bits
 If the write lock is held (bit 2 set):
the upper bits contain a pointer to
the kernel thread holding the write lock
 If bit 2 is clear:
the upper bits contain a count of the threads
holding the lock as a read lock
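
Decoding the lock word can be sketched as follows; the bit masks mirror the text, while the shift used for the reader count is an assumption:

    #include <stdint.h>

    #define RW_WAIT    0x1UL    /* bit 0: a thread is waiting    */
    #define RW_WRWANT  0x2UL    /* bit 1: a writer is waiting    */
    #define RW_WRLOCK  0x4UL    /* bit 2: the write lock is held */
    #define RW_FLAGS   (RW_WAIT | RW_WRWANT | RW_WRLOCK)

    /* Returns the writer's kthread pointer, or the reader hold count */
    uintptr_t rw_upper_bits(uintptr_t lockword)
    {
        if (lockword & RW_WRLOCK)
            return (lockword & ~RW_FLAGS);  /* kthread pointer */
        return (lockword >> 3);             /* read-hold count */
    }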
Simple cases for rw_enter()
 The simple cases:
 the write lock is wanted and available, or
 the read lock is wanted, the write lock is not held,
and no threads are waiting for the write lock
 Acquisition of the write lock:
 bit 2: set
 upper bits: set to the kernel thread pointer
 Acquisition of a read lock:
 upper bits: incremented
 The write lock being held causes:
 a lock request to fail
 a call to the rw_enter_sleep() function
 A thread waiting for the write lock causes:
 a read lock request to fail
 a call to the rw_enter_sleep() function,
which tests to see if the lock is available
Case of rw_enter_sleep()
 Tests to see if the lock is available
 When the lock is available:
 the caller gets the lock
 the lockstat(1M) statistics are updated
 the code returns
 When the lock is not available:
 the turnstile code is called,
putting the calling thread to sleep
 With a turnstile now available:
 the lock's availability is tested again
 if the lock is still held,
the thread is set to the sleep state
and placed on a turnstile
Changes to the RW lock structure
 In the RW lock structure:
 the wait bit is set for a waiting reader
 or the wrwant bit is set for a thread wanting the
write lock, before blocking
 In the cpu_sysinfo structure, two counters:
 failures to get a read lock: rw_rdfails
 failures to get a write lock: rw_wrfails
 incremented just prior to the turnstile call
that puts the thread on a turnstile sleep queue
 mpstat(1M) sums the counters
and displays the fails-per-second in the srw column
Case of rw_exit()
 Called when a thread is ready to release the lock
 Simple case (no waiters): rw_exit_wakeup() retests for the simple case
 if the holder was a writer:
the wrlock bit is cleared
 if there is one less reader:
the hold count field is decremented
 If waiters are present when the lock is released:
 lock ownership is transferred directly to the next writer
or to a group of readers
 Who gets the lock next? The algorithm's requirements:
 balanced system performance
 minimize the possibility of starvation
 offer the lock to the write waiter on the turnstile
 save the pointer to the next write waiter on the turnstile,
if one exists
Case of a writer releasing the write lock
 When there are both waiting readers and writers:
 readers with the same or higher priority than the
highest-priority blocked writer are granted the read lock
 those readers are woken up by turnstile_wakeup()
 if a reader thread is of lower priority,
it inherits the priority of the writer that released the lock
Operation of the lock ownership handoff
 there is no lock owner for read locks
 the hold count in the lock is set to reflect the number
of readers coming off the turnstile
 a wakeup is issued for each reader
Case of a reader holding the lock
 If a writer comes along:
 an exiting reader always grants the lock
to a waiting writer,
even if there are higher-priority readers blocked
 the wrwant bit is set
to signify that a writer is waiting for the lock,
so subsequent readers cannot get the lock
 It is still possible for a reader
to free the lock to waiting readers,
when a reader executes rw_exit_wakeup()
Turnstiles and Priority Inheritance
 A turnstile:
 a data abstraction of a FIFO queue, with a fairness policy
 encapsulates the sleep queues
and priority inheritance information
for mutex locks and RW locks
 addresses the priority inversion problem:
 a higher-priority thread can will its priority to the
lower-priority thread that is holding the resource
 so the thread holding the resource gets a higher priority
and is scheduled to run sooner
 it thus releases the resource sooner, and the original priority
is then returned to the thread
Priority Adjustment
 Kernel threads in the timeshare and interactive
scheduling classes have their priorities adjusted based on:
 the amount of time the thread runs on a processor
 sleep time (blocking)
 whether the thread is preempted
 Threads in the real-time class are fixed priority:
 the priorities are never changed,
unless explicitly changed
through programming interfaces
Turnstiles structure diagram
turnstile_table[] structure
 Turnstiles form a hash table;
each entry in the array is a turnstile_chain,
the beginning of a linked list of turnstiles
 The array is indexed via a hash function
on the address of the synchronization object
 Each chain has:
 tc_lock: each entry has its own lock,
statically initialized at boot time
 Each turnstile has:
 ts_next: an active list
 ts_free: a free list
 waiters: a count of threads waiting on the sync object
 ts_sobj: a pointer to the synchronization object
 a thread pointer linking to a kernel thread that received a
priority boost through priority inheritance
 two sleep queues: one for readers, one for writers
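
The hash-table layout can be sketched as follows; the table size, hash shift, and tc_first name are illustrative rather than the kernel's actual values:

    #include <stdint.h>

    typedef struct turnstile turnstile_t;     /* body omitted in sketch */

    typedef struct turnstile_chain {
        turnstile_t   *tc_first;   /* head of this chain's turnstile list */
        unsigned char  tc_lock;    /* per-chain lock, initialized at boot */
    } turnstile_chain_t;

    #define TC_TABLESIZE 128       /* illustrative table size */

    turnstile_chain_t turnstile_table[TC_TABLESIZE];

    /* Index by hashing on the synchronization object's address */
    #define TURNSTILE_CHAIN(sobj) \
        (&turnstile_table[((uintptr_t)(sobj) >> 4) % TC_TABLESIZE])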
How turnstiles work: turnstile_lookup()
 A turnstile is required when a thread must
block on a synchronization object:
 look up the turnstile for the synchronization object
in turnstile_table[],
hashing on the address of the lock
 if a turnstile already exists,
the lookup yields the correct turnstile
 If no kthreads are waiting for the lock,
turnstile_lookup() returns a null value
 If the blocking code must be called,
turnstile_block() is entered,
to place the kernel thread on a sleep queue
associated with the turnstile for the lock
How turnstiles work: turnstile_block()
 Pointers were determined by turnstile_lookup()
 If the turnstile pointer is null:
 the turnstile linked to by the kernel thread's t_ts pointer is used
 If the turnstile pointer is not null:
 at least one kthread is already waiting on the lock
 the code sets up the pointer links
 and places the kthread's turnstile on the free list
 Set the thread into the sleep state: ts_sleep()
 ts_waiters: incremented
 t_wchan: set to the address of the lock
 t_sobj_ops: set to the address of the lock's operations vector:
the owner, unsleep, and change_priority functions
 sleepq_insert(): places the thread on the sleep queue
associated with the turnstile
How turnstiles work: priority inversion
 The necessary priority changes are made here
 The priority inheritance rules:
 if the priority of the lock holder is less than the
priority of the requesting thread:
 the requesting thread's priority is "willed" to the holder
 the holder's t_epri field is set to the new priority
 the inheritor pointer in the turnstile is linked to the kernel thread
 All threads on the blocking chain are
potential inheritors,
based on their priority relative to the calling thread
How turnstiles work:
swtch() & turnstile_wakeup()
 A call to swtch():
 another thread is removed from a dispatch queue
 and context-switched onto a processor
 If threads are blocking on the lock:
 the lock exit routine results in a turnstile_wakeup() call
 turnstile_wakeup() is the reverse of turnstile_block()
 threads that inherited a better priority have that
priority waived
 each thread is removed from the sleep queue and
given a turnstile from the chain's free list
 Once a thread is unlinked from the sleep queue:
 the scheduling class wakeup code is entered
 the thread is put back on a processor's dispatch queue
Outline
 Synchronization & hardware support
 Mutex locks
 Reader/writer (RW) locks
 Dispatcher locks
 Semaphores
Dispatcher Locks
 The kernel dispatcher:
 selects and context-switches threads
 manipulates the dispatch queues
 Two lock types:
 simple spin locks
 locks that raise the interrupt priority of the processor
 Once the interrupt handler code completes:
 swtch() determines if the interrupted thread is a pinned thread
 If so, resume_from_intr() restarts the pinned thread to
resume execution
 With high-level interrupts (above level 10 on SPARC):
 the processor goes directly into a handler
 which executes in the context of the thread that was running
when the interrupt arrived
Dispatcher Locks Implementation
 Definition: a 1-byte data item
 To acquire a dispatcher spin lock: call lock_set()
 the fundamental algorithm:
when an attempt to get the lock finds it held,
a spin loop checks the lock,
attempting to acquire it in each pass
 To also raise the PIL: call lock_set_spl()
 raises the PIL of the calling processor to 10,
saving the current PIL of the calling processor
 uses the same spin algorithm as the spin lock
 the processor's PIL remains elevated until the lock is released
 after the lock is released,
the calling processor is restored to the level at which it was executing
(a sketch of this behavior follows)
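
A sketch of the lock_set_spl() behavior just described; splr()/splx() stand in for the kernel's PIL routines, and the GCC __atomic builtins model the spin on the 1-byte lock:

    extern int  splr(int pil);      /* raise PIL, return the previous PIL */
    extern void splx(int pil);      /* restore a saved PIL                */

    typedef unsigned char disp_lock_t;      /* 1-byte dispatcher lock */

    int disp_lock_acquire(disp_lock_t *lp)
    {
        int old_pil = splr(10);             /* raise PIL to 10, save old */

        while (__atomic_test_and_set(lp, __ATOMIC_ACQUIRE))
            ;                               /* spin until the lock frees */
        return (old_pil);                   /* PIL stays raised while held */
    }

    void disp_lock_release(disp_lock_t *lp, int old_pil)
    {
        __atomic_clear(lp, __ATOMIC_RELEASE);
        splx(old_pil);                      /* drop back to the saved PIL */
    }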
Outline
 Synchronization & hardware support
 Mutex locks
 Reader/writer (RW) locks
 Dispatcher locks
 Semaphores
Kernel Semaphores
 Synchronize access to a sharable resource
 Semaphore operations:
 the P operation attempts to acquire the semaphore
 the V operation releases it
 the semaphore value is initialized
to the number of shared resources
 when a process needs a resource,
the value is decremented,
indicating one less resource available
 when the process is finished with the resource,
the semaphore value is incremented
 a semaphore value of 0: no resources are available
 the calling process blocks until
another process finishes with a resource and frees it
Functions for semaphores
 sema_init():
 the initialization routine; sets the count value
 the s_slpq pointer is set to NULL
 sema_p(): the P operation; attempts to acquire the semaphore
 semaphore count > 0: a resource is available
 the count is decremented
 the code returns to the caller
 semaphore count = 0: a resource is not available
 sema_v(): the V operation; releases a semaphore
 sema_held(): test function
 sema_destroy(): destroy function
 just nulls the s_slpq pointer
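
A usage sketch of these interfaces in semaphore(9F) style; NBUFS and the buffer routines are hypothetical:

    #include <sys/ksynch.h>

    #define NBUFS 4                /* number of shareable buffers */

    static ksema_t buf_sema;

    void bufs_setup(void)
    {
        /* count starts at the number of available resources */
        sema_init(&buf_sema, NBUFS, NULL, SEMA_DRIVER, NULL);
    }

    void buf_use(void)
    {
        sema_p(&buf_sema);         /* decrement; block while count is 0 */
        /* ... use one buffer ... */
        sema_v(&buf_sema);         /* increment; may wake a waiter */
    }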
Reference
 Jim Mauro and Richard McDougall, Solaris Internals: Core Kernel
Components, Sun Microsystems Press, 2000
End
[email protected]