Transcript Document
Chapter 3: KERNEL SYNCHRONIZATION PRIMITIVES
Chen Xiangqun <[email protected]>

Outline
- Synchronization & hardware support
- Mutex locks
- Reader/writer (RW) locks
- Dispatcher locks
- Semaphores

Synchronization & Systems Architectures
- Synchronization supports the SMP shared memory architecture.
- It maintains the integrity of critical data:
  - Code that reads or writes the data must first acquire the appropriate lock.
  - Once the data operation is complete, the holder releases the lock.

Parallel Systems Architectures
- SMP: symmetric multiprocessor with a shared memory model and a single kernel image.
- MPP: message-based model with multiple kernel images.
- NUMA/ccNUMA (NUMA and cache-coherent NUMA): shared memory model, single kernel image.

Hardware Considerations: What Is a Lock?
- The first consideration: a lock is a piece of data at a specific location.
  - The simplest lock is a single byte location.
  - A lock that is set, or held, has lock value 0xFF.
  - A lock that is available has lock value 0x00.
  - Instructions suitable for locking code: the processor must offer a byte-level test-and-set instruction.
- The second consideration: the visibility of a lock's state when it is examined by executing kernel threads.
  - The lock state must be globally visible to all processors on the system.

Atomic Instructions
- Atomic instruction property: no other store operation is allowed between the load and the store of the executing instruction.
- Three UltraSPARC processor instructions guarantee atomic behavior:
  - ldstub (load and store unsigned byte)
  - cas (compare and swap)
  - swap (swap byte locations)

[Figure: ldstub instruction]
[Figure: cas instruction]
[Figure: data flow levels in a processor]
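The ldstub-style test-and-set above can be made concrete with a small user-level sketch in C, assuming GCC's __sync atomic builtins as a stand-in for the hardware instruction. The lock encoding (0x00 free, 0xFF held) matches the slides; the type and function names are hypothetical, and this is an illustration, not the Solaris kernel code.

/*
 * A minimal user-level sketch of the byte-sized test-and-set lock
 * described above: 0x00 = free, 0xFF = held. GCC's __sync builtins
 * stand in for the ldstub instruction; the names are hypothetical.
 */
#include <stdint.h>

typedef volatile uint8_t simple_lock_t;

/* Atomically store 0xFF and test the previous value (ldstub-style). */
static int
simple_lock_try(simple_lock_t *lp)
{
	/* The builtin returns the old value; 0x00 means we acquired it. */
	return (__sync_lock_test_and_set(lp, 0xFF) == 0x00);
}

static void
simple_lock(simple_lock_t *lp)
{
	while (!simple_lock_try(lp))
		;	/* spin: retry the test-and-set until it succeeds */
}

static void
simple_unlock(simple_lock_t *lp)
{
	__sync_lock_release(lp);	/* store 0x00 with release ordering */
}

Note that the release store matters as much as the test-and-set: the memory-ordering rules discussed next are what make the unlock visible to spinning processors.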
Total Store Ordering (TSO) Model
- SPARC processors offer a Total Store Ordering (TSO) memory model, with the following load and store rules:
  - Loads are blocking and are ordered with respect to other loads.
  - Stores are ordered with respect to other stores.
  - Stores cannot bypass earlier loads.
  - Atomic load-stores (the ldstub and cas instructions) are ordered with respect to loads.
- Mutex and RW locks are implemented:
  - on UltraSPARC-based systems, with the ldstub and cas instructions
  - on Intel x86, with the cmpxchgl (compare/exchange long) instruction
- Memory barrier (membar) instructions constrain the ordering of memory access operations (loads and stores).

Synchronization Objects
- Mutex lock: the most commonly used lock in Solaris; grants exclusive read and write access to data.
- Reader/writer (RW) lock: multiple readers are allowed, but only one writer at a time.
- Dispatcher lock: a special type of mutex lock used by the kernel dispatcher.
- Semaphore: controls access to a finite number of resources.

Synchronization Process
- When a thread attempts to acquire a lock, it encounters one of two possible lock states:
  - free (available)
  - not free (owned, held)
- A turnstile is a sleep queue for threads blocking on locks.
- When a thread ends its operation, it releases the lock. Two possible cases:
  - No waiters: the lock is simply released.
  - Threads are waiting for the lock (waiters); three possible strategies:
    - Release the lock and wake up the blocking threads; the first thread to execute acquires the lock.
    - Select a thread from the turnstile, based on priority or sleep time, and wake up only that thread.
    - Select the next lock owner and have the current owner hand the lock off to the selected thread.

[Figure: Solaris locks]

Synchronization Object Operations Vector
- Mutex locks, reader/writer locks, and semaphores define an operations vector that is linked to the kernel threads blocking on the object.
- The operations vector is a data structure that exports the subset of object functions required for kthreads sleeping on the lock.
- Synchronization object types:
  - mutex_sobj_ops for mutex locks
  - rw_sobj_ops for reader/writer locks
  - sema_sobj_ops for semaphores
- Each vector holds three functions:
  - owner: returns the ID of the thread that owns the object
  - unsleep: transitions a kernel thread out of a sleep state
  - change_pri: changes the thread's priority, in support of priority inheritance

Mutex Locks
- A thread attempting to acquire a mutex lock that is being held can either:
  - spin: the thread enters a tight loop, attempting to acquire the lock in each pass, or
  - block: the thread is placed on a sleep queue, and a wakeup is sent to it when the lock is released.
- The benefit of spinning is the absence of context-switch overhead and fast lock acquisition; its downside is consuming CPU cycles while spinning.
- The benefit of blocking is freeing the processor for other threads; its disadvantages are the required context switching and a little more lock acquisition latency.

Lock Granularity
- Very coarse: a simple table-level mutex; if a thread needs a process, it must acquire the process table mutex.
  - Advantages: simplicity and minimal overhead.
  - Disadvantage: only one thread at a time can manipulate the process table (a hot lock).
- Finer: a lock per process table entry, so multiple threads can manipulate different process structures at the same time.
  - Disadvantages: more complexity, more chances of deadlock, and more overhead.
- Solaris implements relatively fine-grained locking.

Two Types of Mutex Locks
- Spin locks: spin in a tight loop when the desired lock is being held.
- Adaptive locks: dynamically either spin or block, depending on the state of the holder:
  - If the lock holder is running, the thread wanting the lock spins.
  - If the holder is not running, the thread wanting the lock blocks.
- The rationale: if the holder is running, the lock will likely be released very soon, so spinning is cheap; if the holder is not running, a context switch would be involved anyway, so the thread simply blocks and frees the processor for other threads.
- High-level interrupt context does not allow a context switch; since adaptive locks can context-switch, only spin locks are used in high-level interrupt handlers.

Mutex Lock Functions
- mutex_init(): initializes the lock; called only once. The lock type, spin or adaptive, is determined here. The most common type passed is MUTEX_DEFAULT; in rare cases MUTEX_SPIN is passed.
- mutex_enter(): acquires the lock; executes a compare-and-swap instruction, testing for a zero value.
- mutex_vector_enter(): entered from mutex_enter() if the lock is held or is a spin lock.
- mutex_exit(): releases the lock.

Adaptive Lock Implementation
- The m_owner field holds the address of the thread that owns the lock (the kthread pointer) and serves as the actual lock.
- On successful lock acquisition, the acquiring thread's kthread pointer is placed in the m_owner field of the target lock.
- If threads are blocked attempting to get the lock (waiters), bit 0 of m_owner is set.
- Note that kthread pointer values never require bit 0, which leaves it free to serve as the waiters bit.

Spin Lock Implementation
- m_spinlock: the actual mutex lock bits.
- m_dummylock: set to all 1's (0xFF) in mutex_init(). The fast-path test for a lock tests for a zero value; for a spin lock the test always fails, because m_dummylock is never zero. For a held adaptive lock the test fails as well, since m_owner is nonzero. Either failure forces the slow path, mutex_vector_enter().
- m_oldspl: set to the priority level the processor was running at before acquiring the lock.
- m_minspl: stores the interrupt priority level set when the lock was initialized.

Initialization from Interrupts and Device Drivers
- An interrupt block cookie is added to the argument list when mutex_init() is called from a device driver or a kernel module that generates interrupts.
- mutex_init() checks the interrupt block cookie: if the call comes from a device driver, the lock type is set to spin and m_dummylock is set to all 1's (0xFF); otherwise, an adaptive lock is initialized.
- Interrupt levels above 10 on SPARC require spin locks.

[Figure: operation of mutex_vector_enter()]

Lock Release
- When a thread has finished its work, it releases the lock; mutex_vector_exit() is entered.
- Simple case, no threads waiting for the lock: the lock fields (m_owner) are cleared and the code returns.
- Releasing a spin lock: mutex_exit() clears the lock field, and the processor is returned to its previous PIL.
- Releasing an adaptive lock with waiters: a waiter is selected from the turnstile (there may be more than one waiter); its state is changed from sleeping to runnable, and it is placed on a dispatch queue so it can execute and take the lock.
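The m_owner encoding above lends itself to a compare-and-swap fast path. Below is a hedged user-level sketch of that idea in C, assuming GCC atomic builtins; the type and function names are hypothetical, and the slow path (mutex_vector_enter(), which spins on a running holder or blocks through a turnstile) is omitted. This is an illustration, not the Solaris source.

/*
 * Sketch of the adaptive-mutex fast path: m_owner is both the lock
 * and the owner identity. 0 means free; otherwise it holds the
 * owner's kthread pointer, with bit 0 doubling as the waiters flag
 * (kthread pointers are aligned, so bit 0 is always free).
 */
#include <stdint.h>

#define MUTEX_WAITERS ((uintptr_t)0x1)

typedef struct adaptive_mutex {
	volatile uintptr_t m_owner;	/* owner kthread pointer | waiters bit */
} adaptive_mutex_t;

/* Fast path: compare-and-swap 0 (free) to our kthread pointer. */
static int
mutex_try_enter(adaptive_mutex_t *mp, uintptr_t curthread)
{
	return (__sync_bool_compare_and_swap(&mp->m_owner, 0, curthread));
}

/* Mask off the waiters bit to recover the owner's kthread pointer. */
static uintptr_t
mutex_owner(adaptive_mutex_t *mp)
{
	return (mp->m_owner & ~MUTEX_WAITERS);
}

/* Bit 0 set means threads are blocked waiting for this lock. */
static int
mutex_has_waiters(adaptive_mutex_t *mp)
{
	return ((mp->m_owner & MUTEX_WAITERS) != 0);
}

Packing the owner pointer and the lock into one word is what makes the adaptive policy cheap: a blocked acquirer can read m_owner and check whether the owner is currently running before deciding to spin or sleep.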
Reader/Writer Locks
- RW locks allow multiple threads to read data at the same time, but only one writer.
- While a writer holds the lock, no readers are allowed.
- Basic mechanics of RW locks:
  - rw_init(): initialization.
  - rw_enter(): acquires the lock; implemented in assembly code.
  - rw_exit(): releases the lock; implemented in assembly code.

Solaris 7 Reader/Writer Lock Structure
- bit 0: the wait bit; set when threads are waiting for the lock.
- bit 1: the wrwant bit; set when at least one thread is waiting for a write lock.
- bit 2: wrlock, the actual write lock; it determines the meaning of the high-order bits:
  - If the write lock is held (bit 2 set), the upper bits contain a pointer to the kernel thread holding the write lock.
  - If bit 2 is clear, the upper bits contain a count of the threads holding the lock as a read lock.

Simple Cases for rw_enter()
- The write lock is wanted and available: the write lock is acquired by setting bit 2 and writing the caller's kernel thread pointer into the upper bits.
- The read lock is wanted, the write lock is not held, and no threads are waiting for the write lock: a read lock is acquired by incrementing the reader count in the upper bits.
- The write lock being held causes a lock request to fail; a thread waiting for the write lock causes a read lock request to fail. Either condition leads to a call to rw_enter_sleep(), which tests again to see if the lock is available. (A sketch of this lock-word encoding follows below.)
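As promised above, here is a hedged C sketch of the Solaris 7 lock-word encoding: bit 0 = wait, bit 1 = wrwant, bit 2 = wrlock, with the upper bits holding either the writer's kthread pointer or the reader hold count. The constants and names are illustrative, the sleep paths are omitted, and this is not the Solaris implementation.

/*
 * One machine word encodes the whole RW lock state.
 */
#include <stdint.h>

#define RW_WAIT		((uintptr_t)0x1)	/* threads are waiting */
#define RW_WRWANT	((uintptr_t)0x2)	/* a writer is waiting */
#define RW_WRLOCK	((uintptr_t)0x4)	/* write lock is held */
#define RW_READ_INCR	((uintptr_t)0x8)	/* one reader, in the upper bits */

typedef struct krwlock_sketch {
	volatile uintptr_t rw_word;	/* wait | wrwant | wrlock | holders */
} krwlock_sketch_t;

/*
 * Reader fast path: fails if the write lock is held or wanted,
 * otherwise atomically bumps the hold count in the upper bits.
 */
static int
rw_read_try(krwlock_sketch_t *rw)
{
	uintptr_t old = rw->rw_word;

	if (old & (RW_WRLOCK | RW_WRWANT))
		return (0);	/* fail: caller would go to rw_enter_sleep() */
	return (__sync_bool_compare_and_swap(&rw->rw_word,
	    old, old + RW_READ_INCR));
}

/*
 * Writer fast path: the whole word must be zero (free, no waiters).
 * The kthread pointer must be at least 8-byte aligned so its low
 * three bits are free for the wait/wrwant/wrlock flags.
 */
static int
rw_write_try(krwlock_sketch_t *rw, uintptr_t curthread)
{
	return (__sync_bool_compare_and_swap(&rw->rw_word,
	    0, curthread | RW_WRLOCK));
}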
Case of rw_enter_sleep()
- Tests to see if the lock is available.
- When the lock is available: the caller gets the lock, the lockstat(1M) statistics are updated, and the code returns.
- When the lock is not available: the turnstile code is called to put the calling thread to sleep.
  - With a turnstile now held, the lock's availability is tested one more time.
  - If the lock is still held, the thread is set to the sleep state and placed on a turnstile.

Bookkeeping for a Blocked RW Lock Request
- In the RW lock structure: the wait bit is set for a waiting reader, or the wrwant bit is set for a thread wanting the write lock.
- In the cpu_sysinfo structure, two counters record lock failures:
  - rw_rdfails: failures to get a read lock
  - rw_wrfails: failures to get a write lock
- The counters are incremented just prior to the turnstile call that puts the thread on a turnstile sleep queue. mpstat(1M) sums the counters and displays the fails-per-second in its srw column.

Case of rw_exit()
- Called when a thread is ready to release the lock.
- Simple case (no waiters), handled by rw_exit_wakeup() after a retest:
  - If the holder was a writer, the wrlock bit is cleared.
  - If the holder was a reader, the hold count field is decremented by one.
- With more than one reader and writers blocking, who gets the lock next? When the lock is released, ownership is transferred directly to the next writer or to a group of readers. The algorithm must balance system performance while minimizing the possibility of starvation.
- If waiters are present when the lock is released:
  - the lock is offered to the write waiter on the turnstile
  - a pointer to the next write waiter on the turnstile is saved, if one exists

Case of a Writer Releasing the Write Lock
- With both readers and writers waiting: readers that have the same or higher priority than the highest-priority blocked writer are granted the read lock.
- Those readers are woken up by turnstile_wakeup(); if a reader thread is of lower priority, it inherits the priority of the writer that released the lock.
- Operation of the lock ownership handoff (there is no single lock owner for read locks):
  - set the hold count in the lock to reflect the number of readers coming off the turnstile
  - issue the wakeup of each reader

Case of a Reader Holding the Lock
- If a writer comes along, the wrwant bit is set to signify that a writer is waiting for the lock, and subsequent readers cannot get the lock.
- An exiting reader always grants the lock to a waiting writer, even if there are higher-priority readers blocked.
- It is still possible for a reader executing rw_exit_wakeup() to free the lock to waiting readers.

Turnstiles and Priority Inheritance
- A turnstile is a data abstraction built around a FIFO queue (a fairness policy); it encapsulates the sleep queues and priority inheritance information for mutex locks and RW locks.
- Priority inheritance addresses the priority inversion problem: a higher-priority thread can will its priority to the lower-priority thread holding the resource. The thread holding the resource then has the higher priority, is scheduled to run sooner, and thus releases the resource sooner, after which the original priority is returned to it.

Priority Adjustment
- Kernel threads in the timeshare and interactive scheduling classes have their priorities adjusted based on:
  - the amount of time the thread spends running on a processor
  - sleep time (blocking)
  - whether the thread is preempted
- Threads in the real-time class have fixed priorities: they never change unless explicitly changed through programming interfaces.

[Figure: turnstile structure]

The turnstile_table[] Structure
- Turnstiles live in a hash table, turnstile_table[]; each entry is a turnstile_chain, the beginning of a linked list of turnstiles.
- The array is indexed via a hash function on the address of the synchronization object.
- Each chain has its own lock, tc_lock, statically initialized at boot time.
- Each turnstile has:
  - ts_next: an active list
  - ts_free: a free list
  - ts_waiters: a count of the threads waiting on the synchronization object
  - ts_sobj: a pointer to the synchronization object
  - an inheritor pointer linking to a kernel thread that received a priority boost through priority inheritance
  - two sleep queues: one for readers, one for writers

How Turnstiles Work: turnstile_lookup()
- A turnstile is required when a thread blocks on a synchronization object.
- The turnstile for the object is looked up in turnstile_table[] by hashing on the address of the lock; if a turnstile already exists, the lookup yields the correct turnstile. (A sketch of the lookup follows this section.)
- If no kthreads are waiting for the lock, turnstile_lookup() returns a null value.
- If the blocking code must be called, turnstile_block() is entered to place the kernel thread on a sleep queue associated with the turnstile for the lock.

How Turnstiles Work: turnstile_block()
- Works with the turnstile pointer determined by turnstile_lookup().
- If the turnstile pointer is null, the turnstile linked to by the kernel thread's own t_ts pointer is used.
- If the turnstile pointer is not null, at least one kthread is already waiting on the lock: the code sets up the pointer links and places the kthread's own turnstile on the free list.
- The thread is then put to sleep:
  - the thread is set to the sleep state through ts_sleep()
  - ts_waiters is incremented
  - t_wchan is set to the address of the lock
  - t_sobj_ops is set to the address of the lock's operations vector (the owner, unsleep, and change_priority functions)
  - sleepq_insert() places the thread on the sleep queue associated with the turnstile

How Turnstiles Work: Priority Inheritance
- turnstile_block() also makes the necessary priority changes, following the priority inheritance rules:
  - If the priority of the lock holder is less than the priority of the requesting thread, the requesting thread's priority is "willed" to the holder.
  - The holder's t_epri field is set to the new priority, and the inheritor pointer in the turnstile is linked to the kernel thread.
- All threads on the blocking chain are potential inheritors, based on their priority relative to the calling thread.

How Turnstiles Work: swtch() and turnstile_wakeup()
- After the thread blocks, a call to swtch() removes another thread from a dispatch queue and context-switches it onto the processor.
- If threads are blocking on a lock when it is released, the lock's exit routine results in a turnstile_wakeup() call.
- turnstile_wakeup() is the reverse of turnstile_block():
  - threads that inherited a better priority have that priority waived
  - the thread is removed from the sleep queue and given a turnstile from the chain's free list
  - once the thread is unlinked from the sleep queue, the scheduling-class wakeup code is entered, and the thread is put back on a processor's dispatch queue
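As promised above, here is a hedged C sketch of the hashed turnstile_table[] lookup. The table size, hash shift, and helper names are invented for illustration, chain locking is elided, and this is not the Solaris implementation.

/*
 * Sketch of turnstile_table[]: a hash table of chains, indexed by
 * hashing the address of the blocked-on synchronization object.
 */
#include <stddef.h>
#include <stdint.h>

#define TURNSTILE_HASH_SIZE	128	/* hypothetical table size */

typedef struct turnstile {
	struct turnstile *ts_next;	/* chain of active turnstiles */
	struct turnstile *ts_free;	/* chain of free turnstiles */
	void		*ts_sobj;	/* blocked-on synchronization object */
	int		ts_waiters;	/* count of waiting threads */
	/* ... inheritor pointer and two sleep queues (readers, writers) */
} turnstile_t;

typedef struct turnstile_chain {
	turnstile_t	*tc_first;	/* head of this chain's active list */
	/* tc_lock would live here; each chain has its own lock */
} turnstile_chain_t;

static turnstile_chain_t turnstile_table[TURNSTILE_HASH_SIZE];

/* Hash on the lock's address; shift off low bits that are always zero. */
static unsigned int
turnstile_hash(void *sobj)
{
	return (((uintptr_t)sobj >> 4) & (TURNSTILE_HASH_SIZE - 1));
}

/*
 * Walk the hashed chain looking for an existing turnstile for this
 * object; NULL means no thread is waiting on it yet.
 */
static turnstile_t *
turnstile_lookup_sketch(void *sobj)
{
	turnstile_chain_t *tc = &turnstile_table[turnstile_hash(sobj)];
	turnstile_t *ts;

	/* (the real code acquires tc_lock before walking the chain) */
	for (ts = tc->tc_first; ts != NULL; ts = ts->ts_next)
		if (ts->ts_sobj == sobj)
			return (ts);
	return (NULL);
}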
Dispatcher Locks
- The kernel dispatcher handles the selection and context switching of threads, and manipulates the dispatch queues.
- It uses two lock types:
  - simple spin locks
  - locks that raise the interrupt priority level (PIL) of the processor
- Once the interrupt code completes, swtch() determines whether the interrupted thread is a pinned thread; if so, resume_from_intr() restarts the pinned thread so it resumes execution.
- With high-level interrupts (above level 10 on SPARC), the processor goes directly into a handler, which executes in the context of the thread that was running when the interrupt arrived.

Dispatcher Lock Implementation
- Definition: a 1-byte data item.
- To acquire a dispatcher spin lock, lock_set() is called. The fundamental algorithm: when an acquisition attempt finds the lock held, a spin loop is used, checking the lock and attempting to acquire it in each pass.
- For the second lock type, lock_set_spl() is called:
  - it saves the current PIL of the calling processor and raises the PIL to 10
  - it then spins with the same algorithm as the spin lock
  - the processor's PIL remains raised until the lock is released; once the lock is released, the calling processor is restored to the level at which it was executing

Kernel Semaphores
- Semaphores synchronize access to a sharable resource.
- The P operation attempts to acquire the semaphore; the V operation releases it.
- The semaphore value is initialized to the number of shared resources:
  - When a process needs a resource, the value is decremented, indicating one less available resource.
  - When the process is finished with the resource, the semaphore value is incremented.
  - A semaphore value of 0 means no resources are available; the calling process blocks until another process finishes with a resource and frees it.

Semaphore Functions
- sema_init(): the semaphore initialization routine; sets the count value, and the s_slpq pointer is set to NULL.
- sema_p(): the P operation; attempts to acquire the semaphore:
  - semaphore count > 0 (a resource is available): the count is decremented, and the code returns to the caller
  - semaphore count = 0 (no resource is available): the caller blocks
- sema_v(): the V operation; releases the semaphore.
- sema_held(): a test function.
- sema_destroy(): the destroy function; it just nulls the s_slpq pointer.
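To make the P/V semantics above concrete, here is a hedged user-level sketch in C built on POSIX threads. The kernel's sema_p()/sema_v() use the kernel's own sleep queues rather than a condition variable; only the counting behavior is mirrored here, and all names are hypothetical.

/*
 * User-level analogue of a counting semaphore: the count tracks
 * available resources; P blocks at zero, V wakes one blocked caller.
 */
#include <pthread.h>

typedef struct ksema_sketch {
	pthread_mutex_t	s_lock;
	pthread_cond_t	s_cv;
	int		s_count;	/* number of available resources */
} ksema_sketch_t;

static void
sema_init_sketch(ksema_sketch_t *sp, int count)
{
	pthread_mutex_init(&sp->s_lock, NULL);
	pthread_cond_init(&sp->s_cv, NULL);
	sp->s_count = count;	/* initialized to the number of resources */
}

/* P operation: block while no resources are available, then take one. */
static void
sema_p_sketch(ksema_sketch_t *sp)
{
	pthread_mutex_lock(&sp->s_lock);
	while (sp->s_count == 0)
		pthread_cond_wait(&sp->s_cv, &sp->s_lock);
	sp->s_count--;		/* one less resource available */
	pthread_mutex_unlock(&sp->s_lock);
}

/* V operation: return a resource and wake one blocked thread. */
static void
sema_v_sketch(ksema_sketch_t *sp)
{
	pthread_mutex_lock(&sp->s_lock);
	sp->s_count++;
	pthread_cond_signal(&sp->s_cv);
	pthread_mutex_unlock(&sp->s_lock);
}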