Transcript Document
Chapter 3
KERNEL
SYNCHRONIZATION
PRIMITIVES
Chen Xiangqun
[email protected]
Outline
Synchronization & hardware support
Mutex locks
Reader/writer (RW) locks
Dispatcher locks
Semaphores
Synchronization & Systems Architectures
Synchronization
Supports the SMP shared memory architecture
Maintains the integrity of critical data
Code reading or writing the data must acquire the appropriate
lock
Once the data operation is complete, the holder releases the lock
Parallel Systems Architectures
SMP — Symmetric multiprocessor
with a shared memory model, single kernel image
MPP — Massively parallel processor; message-based model, multiple kernel images
NUMA/ccNUMA (NUMA and cache-coherent NUMA)
Shared memory model, single kernel image
Hardware Considerations
What is a lock?
The first consideration
A piece of data at a specific location
The simplest lock is a single-byte memory location
A lock is set, or held
lock value = 0xFF
A lock is available
lock value = 0x00
Instructions suitable for locking code:
processors offer a byte-level test-and-set instruction
The second consideration
The visibility of a lock’s state
when it is examined by executing kernel threads
The lock state must be globally visible to all
processors on the system
Atomic instruction
Atomic instruction properties:
no other store operation is allowed
between the load and the store of the atomic instruction
Three UltraSPARC processor instructions
guarantee atomic behaviour:
ldstub
(load and store unsigned byte)
cas (compare and swap)
swap (swap byte locations)
ldstub instruction
cas instruction
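As a rough sketch of the semantics these instructions provide, the following C fragment uses GCC/Clang __atomic builtins as stand-ins for ldstub and cas; the builtins and helper names are assumptions for illustration, not SPARC instructions or kernel code.

#include <stdint.h>

/* ldstub-style primitive: atomically store 0xFF into the lock byte
 * and return the value that was there before. */
static uint8_t
ldstub_like(volatile uint8_t *lockp)
{
        return __atomic_exchange_n(lockp, 0xFF, __ATOMIC_ACQUIRE);
}

/* cas-style primitive: store newval only if the location still holds
 * expected; return the value observed before the operation. */
static uintptr_t
cas_like(volatile uintptr_t *addr, uintptr_t expected, uintptr_t newval)
{
        __atomic_compare_exchange_n(addr, &expected, newval, 0,
            __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
        return expected;        /* old value, whether or not the swap happened */
}

/* A byte lock built on the ldstub-style primitive:
 * 0x00 means available, 0xFF means set (held). */
static void
byte_lock_acquire(volatile uint8_t *lockp)
{
        while (ldstub_like(lockp) != 0x00)
                ;               /* previous value was not "free": keep trying */
}

static void
byte_lock_release(volatile uint8_t *lockp)
{
        __atomic_store_n(lockp, 0x00, __ATOMIC_RELEASE);
}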
Data flow levels in a processor
Total Store Ordering (TSO) model
SPARC processors offer Total Store Ordering
(TSO) model
Rules for loads and stores:
Loads are blocking and are ordered with
respect to other loads
Stores are ordered with respect to other
stores. Stores cannot bypass earlier loads
Atomic load-stores (ldstub and cas
instructions) are ordered with respect to
loads
How mutex and RW locks are implemented:
on UltraSPARC-based systems:
use the ldstub and cas instructions
on Intel x86:
use the cmpxchgl
(compare/exchange long) instruction
Memory
barrier (membar) instructions
constrain the ordering of memory access
operations (loads and stores); see the sketch below
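A portable C11 sketch of the kind of ordering constraint a memory barrier expresses; atomic_thread_fence() here is a stand-in, not the SPARC membar instruction, and under TSO the store-store ordering shown is already guaranteed by the hardware.

#include <stdatomic.h>

int data;               /* payload written by one processor, read by another */
atomic_int ready;       /* flag signalling that the payload is valid */

void
producer(void)
{
        data = 42;
        /* Constrain ordering: the store to data must become visible
         * before the store to ready. TSO already orders store-store,
         * but weaker models (and the compiler) need the fence. */
        atomic_thread_fence(memory_order_release);
        atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

int
consumer(void)
{
        while (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
                ;       /* wait until the flag is observed set */
        atomic_thread_fence(memory_order_acquire);
        return data;    /* guaranteed to see the value stored above */
}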
Synchronization Objects
mutex lock: commonly used in Solaris
exclusive read and write access to data
reader/writer (RW) lock:
multiple readers allowable
only one writer allowed at a time
dispatcher lock:
special type of mutex lock
used by the kernel dispatcher
semaphores:
access to a finite number of resources
Synchronization Process
When a thread attempts to acquire a lock,
it encounters one of two possible lock states:
free (available)
not free (owned, held)
Turnstiles:
a sleep queue for threads blocking on locks
When a thread finishes its operation, it releases the lock
two possible cases:
No waiters: the lock is simply released
Threads are waiting for the lock (waiters)
three cases:
release the lock and wake up all blocking threads;
the first thread to execute acquires the lock
select a thread from the turnstile,
based on priority or sleep time,
and wake up only that thread
select the next lock owner;
the lock owner hands the lock off to the selected thread
Solaris Locks diagram
Synchronization Object Operations Vector
mutex locks, reader/writer locks, and semaphores
define an operations vector linked to kernel threads
that are blocking on the object
The object’s operations vector is a data structure
that exports a subset of object functions required
for kthreads sleeping on the lock
Synchronization object types:
mutex_sobj_ops for mutex locks
rw_sobj_ops
for reader/writer locks
sema_sobj_ops for semaphores
Each vector provides three functions (sketched below):
owner function:
returns the thread ID that owns the object
unsleep function:
transitions a kernel thread out of a sleep state
change_pri function:
changes thread priority, for priority inheritance
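A hypothetical C rendering of the shape of such an operations vector; the type and field names below are illustrative, not the kernel's actual definitions.

/* Forward declaration standing in for the kernel thread structure. */
typedef struct kthread kthread_t;

/* Illustrative synchronization object operations vector. */
typedef struct sync_obj_ops {
        int        sobj_type;                        /* mutex, RW lock, or semaphore */
        kthread_t *(*sobj_owner)(void *sobj);        /* thread that owns the object */
        void       (*sobj_unsleep)(kthread_t *t);    /* take a thread out of sleep */
        void       (*sobj_change_pri)(kthread_t *t,  /* adjust priority for */
                        int new_pri);                /* priority inheritance */
} sync_obj_ops_t;

/* Each lock type would point at its own vector: one instance for mutex
 * locks, one for RW locks, and one for semaphores. */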
Outline
Synchronization & hardware support
Mutex locks
Reader/writer (RW) locks
Dispatcher locks
Semaphores
Mutex Locks
A thread attempting to acquire a mutex lock that is being held can either:
spin:
the thread enters a tight loop,
attempting to acquire the lock on each pass
block:
the thread is placed on a sleep queue;
when the lock is released, a wakeup is sent to the thread
The benefit of spinning:
no context-switching overhead and fast lock acquisition
Spinning's downside: it consumes CPU cycles while waiting
The benefit of blocking:
frees the processor for other threads
disadvantage: requires context switching
and adds a little lock acquisition latency
Lock granularity
A very coarse level
a simple table-level mutex
if a thread needs to manipulate a process,
it must acquire the process table mutex
advantages: simplicity and minimal overhead
disadvantage: only one thread at a time can
manipulate the process table (a hot lock)
A finer level
a lock-per-process table entry
multiple threads can manipulate different process
structures at the same time
disadvantages: more complex
chances of deadlock and more overhead
Solaris implements relatively fine-grained locking
Two types of mutex locks
spin locks:
spin in a tight loop if the desired lock is being held
adaptive locks:
dynamically either spin or block,
depending on the state of the holder
If the lock holder is running,
the thread wanting the lock will spin
If the lock holder is not running,
the thread wanting the lock will block
Benefits of two types of mutex locks
If a thread holding a lock is running,
the lock will likely be released very soon
If a lock holder is not running,
a context switch would be needed anyway,
so simply block and free up the processor for others
Context switching is not allowed in high-level interrupt
context
adaptive locks can do context switching,
so only spin locks are used in high-level interrupt handlers
(the spin-or-block decision is sketched below)
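A schematic C sketch of the spin-or-block decision an adaptive mutex makes; the type and the helper functions are placeholders invented for the example, not kernel interfaces.

/* Placeholder lock type and helpers, declared only to make the
 * decision logic concrete. */
typedef struct adaptive_mutex {
        volatile void *m_owner;                 /* owning thread, or NULL if free */
} adaptive_mutex_t;

extern int  try_acquire(adaptive_mutex_t *mp);        /* CAS NULL -> current thread */
extern int  owner_is_running(adaptive_mutex_t *mp);   /* is the holder on a CPU? */
extern void block_on_turnstile(adaptive_mutex_t *mp); /* sleep until woken */

void
adaptive_acquire(adaptive_mutex_t *mp)
{
        while (!try_acquire(mp)) {
                if (owner_is_running(mp))
                        continue;               /* holder is running: spin, since
                                                 * it should release the lock soon */
                block_on_turnstile(mp);         /* holder not running: block and
                                                 * free the processor for others */
        }
}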
Functions of mutex locks
mutex_init(): initializes the lock; called only once
the lock type, spin or adaptive, is determined here
most common type:
MUTEX_DEFAULT
in rare cases the type passed is:
MUTEX_SPIN
mutex_enter():
acquires the lock
executes a compare-and-swap instruction,
testing for a zero value
mutex_vector_enter():
entered if the lock is held or is a spin lock
mutex_exit(): releases the lock (usage sketched below)
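A minimal sketch of how kernel code typically uses these interfaces; the data structure being protected is invented for the example, while the mutex calls are the ones named above.

#include <sys/types.h>
#include <sys/ksynch.h>         /* kmutex_t, mutex_init(), mutex_enter(), mutex_exit() */

/* Hypothetical structure whose counter is the critical data. */
static struct {
        kmutex_t ms_lock;       /* protects ms_count */
        uint64_t ms_count;
} stats;

void
stats_setup(void)
{
        /* Called once. MUTEX_DEFAULT lets the kernel choose the lock
         * type (normally adaptive); MUTEX_SPIN is passed only for
         * locks used from high-level interrupt context. */
        mutex_init(&stats.ms_lock, NULL, MUTEX_DEFAULT, NULL);
}

void
stats_bump(void)
{
        mutex_enter(&stats.ms_lock);    /* fast path is a compare-and-swap */
        stats.ms_count++;               /* critical section */
        mutex_exit(&stats.ms_lock);     /* release; wakes a waiter if needed */
}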
Adaptive Locks Implementation
m_owner field:
holds the address of the thread that owns the
lock (the kthread pointer)
serves as the actual lock
successful lock acquisition:
the acquiring thread's kthread pointer is placed in the m_owner field of
the target lock
if threads are attempting to get the lock (waiters):
bit 0 of m_owner is set
Note: kthread pointer values never use bit 0, so it is free to serve as the
waiters flag (see the sketch below)
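One way to picture the m_owner encoding: kthread structures are aligned, so bit 0 of a kthread pointer is always zero and can double as the waiters flag. The macros below are illustrative, not the kernel's definitions.

#include <stdint.h>

#define MUTEX_WAITERS   0x1UL   /* bit 0: threads are blocked on this lock */

/* Owner pointer with bit 0 masked off. */
static inline void *
mutex_owner_of(uintptr_t m_owner)
{
        return (void *)(m_owner & ~MUTEX_WAITERS);
}

/* Nonzero if any thread is waiting on the lock. */
static inline int
mutex_has_waiters(uintptr_t m_owner)
{
        return (m_owner & MUTEX_WAITERS) != 0;
}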
Spin locks Implementation
m_spinlock:
actual mutex lock bits
m_dummylock:
the test for a lock being held tests for a zero value
for spin locks:
since m_dummylock is set to all 1's in mutex_init(), the test always fails
for an adaptive lock that is held:
the test fails as well
(either way, the code falls through to mutex_vector_enter())
m_oldspl:
set to the priority level of the running processor
m_minspl:
stores interrupt priority level when lock initialized
Locks used from interrupt handlers & device drivers:
an interrupt block cookie is added to the argument list
when mutex_init() is called from a device driver
or a kernel module that generates interrupts
mutex_init() checks the interrupt block cookie
If mutex_init() is called from a device driver whose interrupt block
cookie indicates a high-level interrupt:
the lock type is set to spin
otherwise:
an adaptive lock is initialized
interrupt levels above 10 on SPARC:
require spin locks
m_dummylock: set to all 1’s (0xFF)
Lock release: mutex_exit() and mutex_vector_exit()
When the lock is released,
that is, when a thread has finished working with the protected data:
with no threads waiting for the lock,
mutex_vector_exit() handles the simple case,
clearing the lock fields (m_owner) and returning
When a spin lock is released:
mutex_exit() is called to release the lock;
the lock field is cleared and the processor is returned to its previous PIL
When an adaptive lock with waiters is released:
a waiter is selected from the turnstile (if there is more than one waiter),
its state is changed from sleeping to runnable,
and it is placed on a dispatch queue so it can execute and get the lock
Outline
Synchronization & hardware support
Mutex locks
Reader/writer (RW) locks
Dispatcher locks
Semaphores
Reader/Writer Locks
Multiple threads can read the data at the same time,
but only one writer is allowed
While a writer is holding the lock
no readers are allowed
Basic mechanics of RW locks:
rw_init():
initialization
rw_enter():
to acquire the lock, in assembly code
rw_exit():
to release the lock, in assembly code
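A minimal usage sketch of the kernel RW lock interfaces named above; the table structure being protected is invented for the example.

#include <sys/types.h>
#include <sys/ksynch.h>         /* krwlock_t, rw_init(), rw_enter(), rw_exit() */

/* Hypothetical table: many readers scan it, a single writer updates it. */
static struct {
        krwlock_t mt_rwlock;
        int       mt_nentries;
} table;

void
table_setup(void)
{
        rw_init(&table.mt_rwlock, NULL, RW_DEFAULT, NULL);
}

int
table_count(void)               /* readers may run concurrently */
{
        int n;

        rw_enter(&table.mt_rwlock, RW_READER);
        n = table.mt_nentries;
        rw_exit(&table.mt_rwlock);
        return (n);
}

void
table_add(void)                 /* the writer excludes readers and other writers */
{
        rw_enter(&table.mt_rwlock, RW_WRITER);
        table.mt_nentries++;
        rw_exit(&table.mt_rwlock);
}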
Solaris 7 Reader/Writer Lock Structure
bit 0: wait bit, set when threads are waiting for the lock
bit 1: wrwant bit
at least one thread is waiting for a write lock
bit 2: wrlock, actual write lock
determines the meaning of high-order bits
If write lock is held (bit 2 set):
upper bits contain a pointer to
the kernel thread holding the write lock
If bit 2 is clear:
upper bits contain the number of threads
holding the lock as a read lock
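The bit layout described above can be pictured with a few helper macros; the names and the shift amount are illustrative, not the kernel's definitions.

#include <stdint.h>

#define RW_WAIT         0x1UL   /* bit 0: a thread is waiting for the lock */
#define RW_WRWANT       0x2UL   /* bit 1: a thread is waiting for the write lock */
#define RW_WRLOCK       0x4UL   /* bit 2: the write lock is held */
#define RW_FLAG_BITS    (RW_WAIT | RW_WRWANT | RW_WRLOCK)

/* If the write lock is held, the upper bits are the owning kthread pointer
 * (its low bits are zero because the structure is aligned). */
static inline void *
rw_write_owner(uintptr_t lockword)
{
        return (lockword & RW_WRLOCK) ? (void *)(lockword & ~RW_FLAG_BITS) : (void *)0;
}

/* If the write lock is not held, the upper bits count the read holders. */
static inline uintptr_t
rw_reader_count(uintptr_t lockword)
{
        return (lockword & RW_WRLOCK) ? 0 : (lockword >> 3);
}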
Simple cases for rw_enter()
The write lock is wanted and available, or
the read lock is wanted, the write lock is not held,
and no threads are waiting for the write lock
acquisition of the write lock:
bit 2 is set
and the upper bits are set to the kernel thread pointer
acquisition of a read lock:
the upper bits (the hold count) are incremented
If the write lock is being held, causing a lock request to fail,
or a thread is waiting for a write lock, causing a read lock request to fail:
rw_enter_sleep() is called,
which tests to see if the lock is available
Case of rw_enter_sleep()
Tests to see if the lock is available
when the lock is available:
the caller gets the lock,
the lockstat(1M) statistics are updated,
and the code returns
when the lock is not available:
the turnstile code is called,
putting the calling thread to sleep
with a turnstile now available:
the lock availability is tested again
when the lock is still held:
the thread is set to a sleep state
and placed on the turnstile
Changes of R/W Locks structure
in the RW lock structure:
the wait bit is set for a reader that is waiting,
or the wrwant bit is set for a blocking thread that wants the write lock
in the cpu_sysinfo structure's two counters:
failures to get a read lock: rw_rdfails
failures to get a write lock: rw_wrfails
incremented just prior to the turnstile call
that places the thread on a turnstile sleep queue
mpstat(1M) sums the counters
and displays the fails-per-second in the srw column
Case of rw_exit()
called when a thread is ready to release the lock
simple case (no waiters): rw_exit_wakeup() retests and releases the lock:
if the holder was a writer:
the wrlock bit is cleared
if there is one less reader:
the hold count field is decremented
if readers or writers are blocking when the lock is released:
lock ownership is transferred directly to the next writer
or to a group of readers
who gets the lock next? the algorithm must provide:
balanced system performance
and minimize the possibility of starvation
if waiters are present when the lock is released:
the lock is offered to the write waiter on the turnstile,
and a pointer to the next write waiter on the turnstile is saved,
if one exists
Case of a writer releasing write lock
When there are both waiting readers and writers:
readers that have the same or higher priority than the
highest-priority blocked writer are granted the read lock
the readers are woken up by turnstile_wakeup()
if a reader thread is of lower priority,
it inherits the priority of the writer that released the lock
The lock ownership handoff operates by:
noting there is no single lock owner for read locks,
setting the hold count in the lock to reflect the number
of readers coming off the turnstile,
and issuing the wakeup of each reader
Case of a reader holding lock
If a writer comes along while readers hold the lock:
an exiting reader always grants the lock
to a waiting writer,
even if there are higher-priority readers blocked
the wrwant bit is set
to signify that a writer is waiting for the lock,
so subsequent readers cannot get the lock
It is still possible for a reader to
free the lock to waiting readers
when the reader executes rw_exit_wakeup()
Turnstiles and Priority Inheritance
A turnstile:
a data abstraction built on a FIFO queue (a fairness policy)
encapsulates sleep queues
and priority inheritance information
for mutex locks and RW locks
addresses the priority inversion problem:
a higher-priority thread can will its priority to the
lower-priority thread that is holding the resource,
so the thread holding the resource runs at the higher priority,
is scheduled to run sooner,
and thus releases the resource, after which the original priority
is returned to each thread
Priorities Adjustment
Kernel threads in the timeshare and interactive
scheduling classes have their priorities adjusted based on:
the amount of time the threads spend running on a processor
sleep time (blocking)
whether they are preempted
Threads in the real-time class are fixed priority:
the priorities are never changed
unless explicitly changed
through programming interfaces
Turnstiles structure diagram
turnstile_table[] structure
Turnstiles live in a hash table
each entry in the array is a turnstile_chain:
the beginning of a linked list of turnstiles
The array is indexed via a hash function
on the address of the synchronization object
Each chain has:
tc_lock: each entry has its own lock,
statically initialized at boot time
Each turnstile has:
ts_next: an active list
ts_free: a free list
waiters: a count of threads waiting on the sync object
ts_sobj: a pointer to the synchronization object
a thread pointer linking to a kernel thread that received a
priority boost through priority inheritance
two sleep queues: one for readers, one for writers
How turnstile works: turnstile_lookup()
A turnstile is required when a thread
blocks on a synchronization object
turnstile_lookup() looks up the turnstile for the synchronization object
in turnstile_table[]
by hashing on the address of the lock (see the sketch below)
if a turnstile already exists,
the correct turnstile is returned
If no kthreads are waiting for the lock,
turnstile_lookup() returns a null value
If the blocking code must be called,
then turnstile_block() is entered
to place the kernel thread on a sleep queue
associated with the turnstile for the lock
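A schematic sketch of the lookup step; the table size, hash function, and structure layout are assumptions made for illustration and only loosely follow the fields listed earlier.

#include <stdint.h>
#include <stddef.h>

#define TURNSTILE_HASH_SIZE 128                 /* illustrative size only */

typedef struct turnstile {
        struct turnstile *ts_next;              /* next turnstile on the active list */
        void             *ts_sobj;              /* the synchronization object (lock) */
        int               ts_waiters;           /* threads sleeping on this turnstile */
} turnstile_t;

typedef struct turnstile_chain {
        turnstile_t *tc_first;                  /* head of this chain's active list */
        /* tc_lock, which protects the chain, is omitted from the sketch */
} turnstile_chain_t;

static turnstile_chain_t turnstile_table[TURNSTILE_HASH_SIZE];

/* Hash on the lock address to pick a chain, then walk the active list
 * looking for a turnstile already associated with that lock. */
turnstile_t *
turnstile_lookup_sketch(void *sobj)
{
        size_t idx = ((uintptr_t)sobj >> 4) % TURNSTILE_HASH_SIZE;
        turnstile_t *ts;

        for (ts = turnstile_table[idx].tc_first; ts != NULL; ts = ts->ts_next) {
                if (ts->ts_sobj == sobj)
                        return ts;              /* kthreads already waiting here */
        }
        return NULL;                            /* no waiters yet for this lock */
}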
How turnstile works: turnstile_block()
The turnstile pointer was determined by turnstile_lookup()
If the turnstile pointer is null:
the turnstile linked to by the kernel thread's t_ts pointer is used
If the turnstile pointer is not null:
at least one kthread is already waiting on the lock;
the code sets up the pointer links
and places the kthread's turnstile on the free list
The thread is then put into a sleep state
via the scheduling-class sleep function (e.g., ts_sleep())
ts_waiters: incremented
t_wchan: set to the address of the lock
t_sobj_ops: set to the address of the lock's operations vector,
providing the owner, unsleep, and change_priority functions
sleepq_insert(): places the thread on the sleep queue associated
with the turnstile
How turnstile works: priority inversion
The necessary priority changes are made here
The priority inheritance rules:
if the priority of the lock holder is less than the
priority of the requesting thread,
the requesting thread's priority is "willed" to the holder
the holder's t_epri field is set to the new priority
the inheritor pointer in the turnstile is linked to the kernel thread
All threads on the blocking chain are
potential inheritors,
based on their priority relative to the calling thread
How turnstile works:
swtch() & turnstile_wakeup()
A call to swtch() is made:
another thread is removed from a dispatch queue
and context-switched onto a processor
If threads are blocking on the lock,
the lock exit routine results in a turnstile_wakeup() call
turnstile_wakeup() does the reverse of turnstile_block():
threads that inherited a better priority have that
priority waived,
the thread is removed from the sleep queue and
given a turnstile from the chain's free list
Once the thread is unlinked from the sleep queue,
the scheduling class wakeup code is entered
and the thread is put back on a processor's dispatch queue
Outline
Synchronization & hardware support
Mutex locks
Reader/writer (RW) locks
Dispatcher locks
Semaphores
Dispatcher Locks
The kernel dispatcher:
selects and context-switches threads
and manipulates the dispatch queues
Two lock types:
simple spin locks
locks that raise the interrupt priority level of the processor
Once the interrupt handler completes (or blocks),
swtch() determines whether the interrupted thread is a pinned thread;
if so, resume_from_intr() restarts the pinned thread so it
resumes execution
With high-level interrupts (above level 10 on SPARC),
the processor goes directly into a handler, which
executes in the context of the thread that was running when
the interrupt arrived
Dispatcher Locks Implementation
Definition: a 1-byte data item
To acquire a dispatcher spin lock: call lock_set()
fundamental algorithm:
when an attempt to get the lock finds the lock held,
a spin loop is used, checking the lock
and attempting to acquire it on each pass
To acquire a lock that also raises the PIL: call lock_set_spl()
saves the current PIL of the calling processor,
raises the PIL of the calling processor to 10,
and uses the same spin algorithm as the plain spin lock
the processor PIL remains raised until the lock is released
after the lock is released,
the calling processor is restored to the level at which it was executing
(both flavors are sketched below)
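A schematic C sketch of the two dispatcher lock flavors; the PIL helpers and the atomic builtin are stand-ins for illustration, not the kernel's lock_set()/lock_set_spl() code.

#include <stdint.h>

typedef volatile uint8_t disp_lock_t;   /* 1-byte lock, as described above */

/* Stand-ins for PIL manipulation; not kernel routine names. */
extern int  raise_pil(int new_pil);     /* raise the PIL, return the old level */
extern void restore_pil(int old_pil);

/* lock_set()-style acquisition: spin with test-and-set until the byte
 * goes from 0 (free) to held. */
static void
disp_lock_acquire(disp_lock_t *lp)
{
        while (__atomic_exchange_n(lp, 0xFF, __ATOMIC_ACQUIRE) != 0)
                ;                       /* still held: keep spinning */
}

/* lock_set_spl()-style acquisition: save the caller's PIL, raise to 10,
 * then spin for the lock with the same algorithm. */
static void
disp_lock_acquire_spl(disp_lock_t *lp, int *saved_pil)
{
        *saved_pil = raise_pil(10);     /* block dispatcher-level interrupts */
        disp_lock_acquire(lp);
}

/* Release: clear the byte, then drop back to the saved PIL. */
static void
disp_lock_release(disp_lock_t *lp, int saved_pil)
{
        __atomic_store_n(lp, 0, __ATOMIC_RELEASE);
        restore_pil(saved_pil);
}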
Outline
Synchronization & hardware support
Mutex locks
Reader/writer (RW) locks
Dispatcher locks
Semaphores
Kernel Semaphores
Synchronizes access to a finite number of sharable resources
Semaphore:
P operation attempts to acquire the semaphore
V operation releases it
semaphore value is initialized
to the number of shared resources
when a process needs a resource,
the value is decremented
to indicate one less available resource
when the process is finished with the resource,
the semaphore value is incremented
A semaphore value of 0 means no resources are available;
the calling process blocks until
another process finishes with a resource and frees it
Functions for semaphores
sema_init():
initialization routine; sets the count value
and sets the s_slpq pointer to NULL
sema_p(): P operation, attempts to acquire the semaphore
if the semaphore count > 0, a resource is available:
the count is decremented
and the code returns to the caller
if the semaphore count = 0, a resource is not available, and the caller blocks
sema_v(): V operation, releases the semaphore
sema_held(): test function
sema_destroy(): destroy function;
just nulls the s_slpq pointer
(usage of these interfaces is sketched below)
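A minimal usage sketch of these semaphore interfaces guarding a pool of identical resources; the buffer pool itself is invented for the example.

#include <sys/types.h>
#include <sys/ksynch.h>         /* ksema_t, sema_init(), sema_p(), sema_v() */

#define NBUFS   4               /* hypothetical pool of four identical buffers */

static ksema_t buf_sema;

void
buf_pool_setup(void)
{
        /* The count starts at the number of available resources. */
        sema_init(&buf_sema, NBUFS, NULL, SEMA_DRIVER, NULL);
}

void
buf_use(void)
{
        sema_p(&buf_sema);      /* P: blocks while the count is 0 (no buffers free) */
        /* ... use one buffer from the pool ... */
        sema_v(&buf_sema);      /* V: return the buffer, waking a waiter if any */
}

void
buf_pool_teardown(void)
{
        sema_destroy(&buf_sema);
}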
Reference
Jim Mauro and Richard McDougall, Solaris Internals: Core Kernel
Components, Sun Microsystems Press, 2000.
End
•[email protected]