Concurrency and Race Conditions
Linux Kernel Programming
CIS 4930/COP 5641
MOTIVATION: EXAMPLE PITFALL IN SCULL
Pitfalls in scull
Race condition: result of uncontrolled access to shared data
if (!dptr->data[s_pos]) {
    dptr->data[s_pos] = kzalloc(quantum, GFP_KERNEL);
    if (!dptr->data[s_pos]) {
        goto out;
    }
}
Pitfalls in scull
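Race condition: result of uncontrolled access to shared data
if (!dptr->data[s_pos]) {
    dptr->data[s_pos] = kmalloc(quantum, GFP_KERNEL);
    if (!dptr->data[s_pos]) {
        goto out;
    }
}
Memory leak: if two threads both see a NULL pointer, both allocate, and the second assignment overwrites the first pointer, which is then lost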
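The fix for the check-then-allocate race can be sketched in user space. This is only an analogue under stated assumptions: struct slots, get_slot() and the allocations counter are hypothetical names, a pthread mutex stands in for the kernel mutex, and calloc() stands in for kzalloc(). The point is that the NULL check and the assignment sit inside the same critical section.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NSLOTS 4

/* Hypothetical stand-in for the scull quantum array. */
struct slots {
    pthread_mutex_t lock;
    void *data[NSLOTS];
    int allocations;        /* counts successful allocations, for demonstration */
};

static struct slots table = { .lock = PTHREAD_MUTEX_INITIALIZER };

/* Return the buffer for slot s_pos, allocating it on first use.
 * The check and the assignment happen under the same lock, so two
 * callers can never both observe NULL and both allocate. */
static void *get_slot(struct slots *t, int s_pos, size_t quantum)
{
    void *p;

    pthread_mutex_lock(&t->lock);
    if (!t->data[s_pos]) {
        t->data[s_pos] = calloc(1, quantum);
        if (t->data[s_pos])
            t->allocations++;
    }
    p = t->data[s_pos];
    pthread_mutex_unlock(&t->lock);
    return p;               /* NULL only if the allocation failed */
}
```

Calling get_slot() twice for the same slot returns the same pointer and performs one allocation, which is the property the unprotected version cannot guarantee.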
MANAGING CONCURRENCY
Concurrency and Its Management
Sources of concurrency
Multiple user-space processes
Multiple CPUs
Device interrupts
Timers
Some guiding principles
Try to avoid concurrent access entirely
E.g., avoid global variables
Apply locking and mutual exclusion principles
Implications to device drivers
Use sufficient concurrency mechanisms (depending on context)
No object can be made available to the kernel until it can function properly
References to such objects must be tracked for proper removal
Avoid “roll your own” solutions
Managing Concurrency
Atomic operation: all or nothing from the perspective of other threads
Critical section: code executed by only one thread at a time
Not all critical sections are the same
Access from interrupt handlers
Latency constraints
Lock Design Considerations
Context
Can another thread be scheduled on the current processor?
Assumptions of kernel operation
Breaking assumptions will break code that relies on them
Time expected to wait for lock
Considerations
Amount of time lock is expected to be held
Amount of expected contention
Long
Other threads can make better use of the processor
Short
Time to switch to another thread will be longer than just waiting a short amount of time
Kernel Locking Implementations
mutex
Sleep if lock cannot be acquired immediately
Allow other threads to use the processor
spinlock
Continuously try to grab the lock
Generally do not allow sleeping
Why?
MUTEX
Mutex Implementation
Architecture-dependent code
Optimizations
Initialization
DEFINE_MUTEX(name)
void mutex_init(struct mutex *lock);
Various routines
void mutex_lock(struct mutex *lock);
int mutex_lock_interruptible(struct mutex *lock);
int mutex_lock_killable(struct mutex *lock);
void mutex_unlock(struct mutex *lock);
Using mutexes in scull
scull_dev structure revisited
struct scull_dev {
    struct scull_qset *data;  /* Pointer to first quantum set */
    int quantum;              /* the current quantum size */
    int qset;                 /* the current array size */
    unsigned long size;       /* amount of data stored here */
    unsigned int access_key;  /* used by sculluid & scullpriv */
    struct mutex mutex;       /* mutual exclusion */
    struct cdev cdev;         /* Char device structure */
};
Using mutexes in scull
scull_dev initialization
for (i = 0; i < scull_nr_devs; i++) {
    scull_devices[i].quantum = scull_quantum;
    scull_devices[i].qset = scull_qset;
    mutex_init(&scull_devices[i].mutex); /* before cdev_add */
    scull_setup_cdev(&scull_devices[i], i);
}
Using mutexes in scull
scull_write()
if (mutex_lock_interruptible(&dev->mutex))
    return -ERESTARTSYS;
scull_write ends with
out:
    mutex_unlock(&dev->mutex);
    return retval;
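The "lock on entry, single exit through out:" shape of scull_write() can be sketched in user space. Everything here is an assumption for illustration: struct demo_dev and demo_write() are hypothetical, and pthread_mutex_lock() is not interruptible the way mutex_lock_interruptible() is, so the early return only mirrors the structure.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical user-space stand-in for struct scull_dev. */
struct demo_dev {
    pthread_mutex_t mutex;
    char buf[32];
    size_t size;
};

/* Copy count bytes into the device buffer at offset pos.
 * Every path after the lock is taken leaves through "out". */
static long demo_write(struct demo_dev *dev, const char *src,
                       size_t count, size_t pos)
{
    long retval = -ENOSPC;

    if (pthread_mutex_lock(&dev->mutex))
        return -EINTR;      /* lock not taken: nothing to release */

    if (pos >= sizeof(dev->buf))
        goto out;           /* the error path still releases the lock */
    if (count > sizeof(dev->buf) - pos)
        count = sizeof(dev->buf) - pos;   /* partial write */

    memcpy(dev->buf + pos, src, count);
    if (dev->size < pos + count)
        dev->size = pos + count;
    retval = (long)count;

out:
    pthread_mutex_unlock(&dev->mutex);
    return retval;
}
```

Because the failing call also passes through out:, a later call on the same device can still take the mutex instead of blocking forever.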
mutex_lock_interruptible() returns nonzero
If the system call can be resubmitted
Undo visible changes, if any, and restart (return -ERESTARTSYS)
Otherwise return -EINTR
E.g., could not undo changes
Restartable system call
Automatic restarting of certain interrupted system calls
Retry with same arguments (values)
Simplifies user-space programming for dealing with "interrupted system call"
POSIX permits an implementation to restart system calls, but it is not required.
SUS defines the SA_RESTART flag to provide a means by which an application can request that interrupted system calls be restarted.
http://pubs.opengroup.org/onlinepubs/009604499/functions/sigaction.html
return -ERESTARTSYS
Restartable system call
Arguments may need to be modified
return -ERESTART_RESTARTBLOCK
Specify callback function to modify arguments
http://lwn.net/Articles/17744/
Userspace write() and kernelspace *_interruptible()
From POSIX man page
If write() is interrupted by a signal before it writes any data, it shall return -1 with errno set to [EINTR].
If write() is interrupted by a signal after it successfully writes some data, it shall return the number of bytes written.
http://pubs.opengroup.org/onlinepubs/009604499/functions/sigaction.html
mutex_lock_killable()
mutex_lock()
Process assumes that it cannot be interrupted by a signal
Breaking that assumption breaks the user-kernel space interface
If the process receives a fatal signal and mutex_lock() never returns, the result is an immortal process
mutex_lock_killable()
Assumptions/expectations do not apply if the process receives a fatal signal
The process that made the system call never returns to user space
Does not break the assumption, since the process does not continue
http://lwn.net/Articles/288056/
MUTEX USAGE AS COMPLETION (ERROR)
https://lkml.org/lkml/2013/12/2/997
General Pattern
refcount variable for deciding which thread performs cleanup
Usage
Initialize shared object
Set refcount to number of concurrent threads
Start multiple threads
Last thread cleans up
<do stuff>
mutex_lock(&obj->lock);
dead = !--obj->refcount;
mutex_unlock(&obj->lock);
if (dead)
    free(obj);
fs/pipe.c
__pipe_lock(pipe);
…
spin_lock(&inode->i_lock);
if (!--pipe->files) {
    inode->i_pipe = NULL;
    kill = 1;
}
spin_unlock(&inode->i_lock);
__pipe_unlock(pipe);
if (kill)
    free_pipe_info(pipe);
CPU 1
mutex_lock(&obj->lock);
dead = !--obj->refcount;
// refcount was 2, is now 1, dead = 0.
mutex_unlock(&obj->lock);
__mutex_fastpath_unlock()
fastpath fails (because mutex is nonpositive)
__mutex_unlock_slowpath:
if (__mutex_slowpath_needs_to_unlock())
    atomic_set(&lock->count, 1);
CPU 2
mutex_lock(&obj->lock);
// blocks on obj->lock, goes to slowpath
// mutex is negative, CPU2 is in optimistic spinning mode in __mutex_lock_common
if ((atomic_read(&lock->count) == 1) &&
    (atomic_cmpxchg(&lock->count, 1, 0) == 1)) {
.. and now CPU2 owns the mutex, and goes on
dead = !--obj->refcount;
// refcount was 1, is now 0, dead = 1.
mutex_unlock(&obj->lock);
if (dead)
    free(obj);
CPU 1
but in the meantime, CPU1 is busy still unlocking:
if (!list_empty(&lock->wait_list)) {
// the lock lives inside obj, which CPU2 may already have freed
Conclusion
Mutex serializes what is inside the mutex, but not necessarily the lock ITSELF
Use spinlocks and/or atomic ref counts
"don't use mutexes to implement completions"
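The "atomic ref counts" alternative can be sketched with C11 atomics in user space. struct obj, obj_new() and obj_put() are hypothetical names used only for this illustration; in kernel code the analogous building blocks would be helpers such as atomic_dec_and_test() or kref. The key property is that the decrement is the last access every thread makes to the object, except for the one thread that sees the count reach zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted object. */
struct obj {
    atomic_int refcount;
    int payload;
};

static struct obj *obj_new(int nrefs)
{
    struct obj *o = malloc(sizeof(*o));

    if (!o)
        return NULL;
    atomic_init(&o->refcount, nrefs);
    o->payload = 0;
    return o;
}

/* Drop one reference. Returns true if this call released the last
 * reference and freed the object. No lock inside the object is
 * touched after the decrement, so nothing is left to race with free(). */
static bool obj_put(struct obj *o)
{
    /* atomic_fetch_sub returns the value held before the subtraction. */
    if (atomic_fetch_sub(&o->refcount, 1) == 1) {
        free(o);
        return true;
    }
    return false;
}
```

With a count of 2, the first obj_put() returns false and the second returns true and frees the object, whichever threads make the calls.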
COMPLETIONS
Completions
Start and wait for operation to complete (outside current thread)
Common pattern in kernel programming
E.g., wait for initialization to complete
Reasons to use instead of mutexes
Wake up multiple threads
More efficient
More meaningful syntax
Subtle races with mutex implementation code
Cleanup of mutex itself
http://lkml.iu.edu//hypermail/linux/kernel/0107.3/0674.html
https://lkml.org/lkml/2008/4/11/323
completions
#include <linux/completion.h>
Completions
To create a completion
DECLARE_COMPLETION(my_completion);
Or
struct completion my_completion;
init_completion(&my_completion);
To wait for the completion, call
void wait_for_completion(struct completion *c);
int wait_for_completion_interruptible(struct completion *c);
unsigned long wait_for_completion_timeout(struct completion *c, unsigned long timeout);
Completions
To signal a completion event, call one of the following
/* wake up one waiting thread */
void complete(struct completion *c);
/* wake up all waiting threads */
/* need to call INIT_COMPLETION(c) (reinit_completion(&c) since Linux 3.13) to reuse the completion structure */
void complete_all(struct completion *c);
Completions
Example: misc-modules/complete.c
DECLARE_COMPLETION(comp);
ssize_t complete_read(struct file *filp, char __user *buf,
                      size_t count, loff_t *pos) {
    printk(KERN_DEBUG "process %i (%s) going to sleep\n",
           current->pid, current->comm);
    wait_for_completion(&comp);
    printk(KERN_DEBUG "awoken %i (%s)\n", current->pid,
           current->comm);
    return 0; /* EOF */
}
Completions
Example
ssize_t complete_write(struct file *filp,
                       const char __user *buf, size_t count,
                       loff_t *pos) {
    printk(KERN_DEBUG
           "process %i (%s) awakening the readers...\n",
           current->pid, current->comm);
    complete(&comp);
    return count; /* succeed, to avoid retrial */
}
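The behaviour of complete() and wait_for_completion() can be approximated in user space with a counter guarded by a mutex and a condition variable. This is a rough analogue, not the kernel implementation: struct demo_completion and the demo_ functions are hypothetical names. It shows why a completion signalled before the waiter arrives is not lost, since the event is recorded in the counter.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical user-space analogue of struct completion. */
struct demo_completion {
    pthread_mutex_t lock;
    pthread_cond_t wait;
    unsigned int done;      /* number of pending "complete" events */
};

#define DEMO_COMPLETION_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 }

/* Block until an event is available, then consume it. */
static void demo_wait_for_completion(struct demo_completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (c->done == 0)    /* loop guards against spurious wakeups */
        pthread_cond_wait(&c->wait, &c->lock);
    c->done--;
    pthread_mutex_unlock(&c->lock);
}

/* Record one event and wake one waiter. */
static void demo_complete(struct demo_completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done++;
    pthread_cond_signal(&c->wait);
    pthread_mutex_unlock(&c->lock);
}
```

If demo_complete() runs first, a later demo_wait_for_completion() returns immediately and consumes the recorded event.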
SPINLOCKS
Spinlocks
Generally used in code that should not sleep (e.g., interrupt handlers)
Usually implemented as a single bit
If the lock is available, the bit is set and the code continues
If the lock is taken, the code enters a tight loop
Repeatedly checks the lock until it becomes available
Spinlocks
Actual implementation varies for different architectures
Protect a process from other CPUs and interrupts
Usually does nothing on uniprocessor machines
Exception: changing the IRQ masking status
Introduction to Spinlock API
#include <linux/spinlock.h>
To initialize, declare
DEFINE_SPINLOCK(my_lock); /* older kernels: spinlock_t my_lock = SPIN_LOCK_UNLOCKED; */
Or call
void spin_lock_init(spinlock_t *lock);
To acquire a lock, call
void spin_lock(spinlock_t *lock);
Spinlock waits are uninterruptible
To release a lock, call
void spin_unlock(spinlock_t *lock);
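The "single bit plus tight loop" idea behind a spinlock can be sketched with a C11 atomic_flag. This is a user-space illustration only: struct demo_spinlock and the demo_spin_ functions are hypothetical, and the real kernel spinlock is architecture-specific and also deals with preemption and interrupts, which this sketch ignores.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space test-and-set spinlock. */
struct demo_spinlock {
    atomic_flag locked;
};

#define DEMO_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* One attempt: returns true if the lock was free and is now held. */
static bool demo_spin_trylock(struct demo_spinlock *l)
{
    /* test_and_set returns the previous value: false means it was free. */
    return !atomic_flag_test_and_set_explicit(&l->locked,
                                              memory_order_acquire);
}

static void demo_spin_lock(struct demo_spinlock *l)
{
    while (!demo_spin_trylock(l))
        ;   /* tight loop: keep retrying until the holder releases */
}

static void demo_spin_unlock(struct demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

A second acquisition attempt fails while the lock is held and succeeds after demo_spin_unlock(), which is the whole protocol.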
Spinlocks and Atomic Context
While holding a spinlock, be atomic
Do not sleep or relinquish the processor
Examples of calls that can sleep
Copying data to or from user space
User-space page may be on disk (swapped out)…
Memory allocation
Memory might not be available
Disable interrupts (on the local CPU) as needed
Hold spinlocks for the minimum time possible
The Spinlock Functions
Four functions to acquire a spinlock
void spin_lock(spinlock_t *lock);
/* disables interrupts on the local CPU and saves the previous interrupt state in flags */
void spin_lock_irqsave(spinlock_t *lock, unsigned long flags);
/* disables interrupts on the local CPU; use only if no other code disabled interrupts */
void spin_lock_irq(spinlock_t *lock);
/* disables software interrupts (e.g. tasklets); leaves hardware interrupts enabled */
void spin_lock_bh(spinlock_t *lock);
The Spinlock Functions
Four functions to release a spinlock
void spin_unlock(spinlock_t *lock);
/* need to use the same flags variable for locking */
/* need to call spin_lock_irqsave and spin_unlock_irqrestore in the same function, or your code may break on some architectures */
void spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags);
void spin_unlock_irq(spinlock_t *lock);
void spin_unlock_bh(spinlock_t *lock);
Locking Traps
It is very hard to manage concurrency
What can possibly go wrong?
Ambiguous Rules
Shared data structure D, protected by lock L
function A() {
    lock(&L);
    /* call function B() that accesses D */
    unlock(&L);
}
If function B() calls lock(&L), we have a deadlock
Ambiguous Rules
Solution
Have clear entry points to access data structures
Document assumptions about locking
Lock Ordering Rules
function A() {
    lock(&L1);
    lock(&L2);
    /* access D */
    unlock(&L2);
    unlock(&L1);
}
function B() {
    lock(&L2);
    lock(&L1);
    /* access D */
    unlock(&L1);
    unlock(&L2);
}
- Multiple locks should always be acquired in the same order
- Easier said than done
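One common way to enforce "always the same order" when two locks of the same kind are needed is to order them by address. The sketch below is a user-space illustration with pthread mutexes; first_in_order(), lock_pair() and unlock_pair() are hypothetical helpers, and the scheme assumes the two mutexes are distinct.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Pick which of two distinct mutexes comes first in the global order.
 * The order used here is simply the lower address. */
static pthread_mutex_t *first_in_order(pthread_mutex_t *a, pthread_mutex_t *b)
{
    return ((uintptr_t)a < (uintptr_t)b) ? a : b;
}

/* Acquire both mutexes in the global order, whatever the argument order. */
static void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_t *first = first_in_order(a, b);
    pthread_mutex_t *second = (first == a) ? b : a;

    pthread_mutex_lock(first);
    pthread_mutex_lock(second);
}

/* Release order does not affect deadlock freedom. */
static void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}
```

With these helpers, a caller passing (L1, L2) and a caller passing (L2, L1) end up taking the locks in the same sequence, so the A()/B() deadlock above cannot occur.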
Lock Ordering Rules
function A() {
    lock(&L1);
    X();
    unlock(&L1);
}
function B() {
    lock(&L2);
    Y();
    unlock(&L2);
}
function X() {
    lock(&L2);
    /* access D */
    unlock(&L2);
}
function Y() {
    lock(&L1);
    /* access D */
    unlock(&L1);
}
Lock Ordering Rules of Thumb
Take a lock that is local to your code before taking a lock belonging to a more central part of the kernel
Lock of central kernel code likely has more users (more contention)
Obtain the mutex first before taking the spinlock
Grabbing a mutex (which can sleep) inside a spinlock can lead to deadlocks
Fine- Versus Coarse-Grained Locking
Coarse-grained locking
Poor concurrency
Fine-grained locking
Need to know which one to acquire
And in which order to acquire them
At the device driver level
Start with coarse-grained locking
Refine the granularity as contention arises
Can enable lockstat to check lock holding time
BKL
Kernel used to have “big kernel lock”
Giant spinlock introduced in Linux 2.0
Only one CPU could be executing locked
kernel code at any time
BKL has been removed
https://lwn.net/Articles/384855/
https://www.linux.com/learn/tutorials/447301:whats-new-in-linux-2639-ding-dong-the-big-kernel-lock-is-dead
Alternatives to Locking
Lock-free algorithms
Atomic variables
Bit operations
seqlocks
Read-copy-update (RCU)
Lock-Free Algorithms
Circular buffer
Producer places data into one end of an array
When the end of the array is reached, the producer wraps back
Consumer removes data from the other end
Lock-Free Algorithms
Producer and consumer can access buffer concurrently without race conditions
Always store the value before updating the index into the array
Need to make sure that producer/consumer indices do not overrun each other
A generic circular buffer is available
See <linux/kfifo.h>
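The circular buffer rules above can be sketched for one producer and one consumer using C11 atomics. This is a user-space illustration, not kfifo: struct ring, ring_put(), ring_get() and RING_SIZE are hypothetical, and the design keeps one slot empty so that "full" and "empty" can be told apart.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SIZE 8   /* hypothetical capacity; one slot is kept empty */

struct ring {
    int buf[RING_SIZE];
    atomic_uint head;   /* next slot to write, advanced only by the producer */
    atomic_uint tail;   /* next slot to read, advanced only by the consumer */
};

/* Producer side. Returns false if the buffer is full. */
static bool ring_put(struct ring *r, int value)
{
    unsigned int head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned int next = (head + 1) % RING_SIZE;

    if (next == atomic_load_explicit(&r->tail, memory_order_acquire))
        return false;               /* would overrun the consumer */
    r->buf[head] = value;           /* store the value first ... */
    atomic_store_explicit(&r->head, next,
                          memory_order_release);  /* ... then publish the index */
    return true;
}

/* Consumer side. Returns false if the buffer is empty. */
static bool ring_get(struct ring *r, int *value)
{
    unsigned int tail = atomic_load_explicit(&r->tail, memory_order_relaxed);

    if (tail == atomic_load_explicit(&r->head, memory_order_acquire))
        return false;               /* nothing to read */
    *value = r->buf[tail];
    atomic_store_explicit(&r->tail, (tail + 1) % RING_SIZE,
                          memory_order_release);
    return true;
}
```

The release store on the index paired with the acquire load on the other side is what makes "value before index" visible in that order to the other thread. The scheme is only safe with exactly one producer and one consumer.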
ATOMIC VARIABLES
Atomic Variables
If the shared resource is an integer value
Locking is overkill (if supported by processor)
The kernel provides atomic types
atomic_t - integer
atomic64_t - 64-bit integer
Both types must be accessed through special functions (See <linux/atomic.h>; <asm/atomic.h> in older kernels)
SMP safe
Atomic Variables
Atomic operations
atomic_sub(amount, &account1);
atomic_add(amount, &account2);
Each call is atomic, but the pair is not: higher-level locking must be used if both must take effect as one step
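The two-account case can be sketched in user space with C11 atomics and a pthread mutex. The names transfer(), total_balance() and transfer_lock are hypothetical. Each atomic update is indivisible on its own, but only the surrounding lock keeps a reader from seeing the state between the subtraction and the addition.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_int account1 = 100;
static atomic_int account2 = 0;
static pthread_mutex_t transfer_lock = PTHREAD_MUTEX_INITIALIZER;

/* Move amount from account1 to account2 as one indivisible step. */
static void transfer(int amount)
{
    pthread_mutex_lock(&transfer_lock);
    atomic_fetch_sub(&account1, amount);
    /* Without the lock, a reader arriving here would see money missing. */
    atomic_fetch_add(&account2, amount);
    pthread_mutex_unlock(&transfer_lock);
}

/* Read the combined balance under the same lock, so it is never
 * observed between the two updates. */
static int total_balance(void)
{
    int total;

    pthread_mutex_lock(&transfer_lock);
    total = atomic_load(&account1) + atomic_load(&account2);
    pthread_mutex_unlock(&transfer_lock);
    return total;
}
```

The invariant being protected is the sum of the two accounts, which no single atomic operation can cover.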
Bit Operations
Atomic bit operations
See <linux/bitops.h> (<asm/bitops.h> in older kernels)
SMP safe
OTHER SYNCHRONIZATION MECHANISMS
Read-Copy-Update (RCU)
Assumptions
Reads are common
Writes are rare
Resources accessed via pointers
All references to those resources held by atomic code
Read-Copy-Update
Basic idea
The writing thread makes a copy
Make changes to the copy
Switch a few pointers to commit changes
Deallocate the old version when all references to the old version are gone
EVEN MORE…
seqlocks
sequential lock
Designed to protect small, simple, and frequently accessed resources
Write access is rare
Must obtain an exclusive lock (spinlock)
Allow readers free access to the resource
Lockless
Operation
Check for collisions with writers
Retry as needed
Not for protecting pointers
seqlocks
Expected non-blocking reader usage:
do {
    seq = read_seqbegin(&foo);
    ...
} while (read_seqretry(&foo, seq));
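The collision check behind this reader loop can be sketched with a sequence counter in user space. This is an illustration under assumptions, not the kernel seqlock: struct demo_seq and the demo_ functions are hypothetical, a single writer is assumed (the kernel version serializes writers with a spinlock), and the protected fields are made atomic so the sketch has no C-level data race.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical seqlock-protected pair of values; single writer assumed. */
struct demo_seq {
    atomic_uint seq;      /* odd while a write is in progress */
    atomic_int x, y;      /* protected data */
};

static unsigned int demo_read_begin(struct demo_seq *s)
{
    unsigned int seq;

    while ((seq = atomic_load(&s->seq)) & 1)
        ;   /* writer active: wait for an even value */
    return seq;
}

/* True if a writer ran since demo_read_begin() returned start. */
static bool demo_read_retry(struct demo_seq *s, unsigned int start)
{
    return atomic_load(&s->seq) != start;
}

static void demo_seq_write(struct demo_seq *s, int x, int y)
{
    atomic_fetch_add(&s->seq, 1);   /* now odd: readers will retry */
    atomic_store(&s->x, x);
    atomic_store(&s->y, y);
    atomic_fetch_add(&s->seq, 1);   /* even again: update is complete */
}

/* Reader loop in the same shape as the read_seqbegin/read_seqretry idiom. */
static int demo_read_sum(struct demo_seq *s)
{
    unsigned int seq;
    int sum;

    do {
        seq = demo_read_begin(s);
        sum = atomic_load(&s->x) + atomic_load(&s->y);
    } while (demo_read_retry(s, seq));
    return sum;
}
```

Readers never block the writer. They only pay with a retry when a write overlaps their read, which is why the scheme suits data that is read often and written rarely.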
lglock (local/global locks)
Fast per-cpu access
Allows access to other cpu data (slow)
Implementation
per-CPU array of spinlocks
Can only be declared as global variables to
avoid overhead and keep things simple
http://lwn.net/Articles/401738/
brlocks
commit 0f6ed63b170778b9c93fb0ae4017f110c9ee6416
Date: Sat Oct 5 14:19:39 2013 -0400
no need to keep brlock macros anymore...
Reader/Writer Semaphores
Allow multiple concurrent readers
Single writer (for infrequent writes)
Too many writers can lead to reader starvation (unbounded waiting)
#include <linux/rwsem.h>
The trylock variants do not follow the usual return value convention
E.g., down_read_trylock() returns 1 if successful
Not interruptible
Reader/Writer Spinlocks
Analogous to the reader/writer semaphores
Allow multiple readers to enter a critical section
Provide exclusive access for writers
#include <linux/spinlock.h>
Reader/Writer Spinlocks
To declare and initialize, there are two ways
/* static way */
DEFINE_RWLOCK(my_rwlock); /* older kernels: rwlock_t my_rwlock = RW_LOCK_UNLOCKED; */
/* dynamic way */
rwlock_t my_rwlock;
rwlock_init(&my_rwlock);
Reader/Writer Spinlocks
Similar functions are available
void read_lock(rwlock_t *lock);
void read_lock_irqsave(rwlock_t *lock, unsigned long flags);
void read_lock_irq(rwlock_t *lock);
void read_lock_bh(rwlock_t *lock);
void read_unlock(rwlock_t *lock);
void read_unlock_irqrestore(rwlock_t *lock, unsigned long flags);
void read_unlock_irq(rwlock_t *lock);
void read_unlock_bh(rwlock_t *lock);
Reader/Writer Spinlocks
Similar functions are available
void write_lock(rwlock_t *lock);
void write_lock_irqsave(rwlock_t *lock, unsigned long flags);
void write_lock_irq(rwlock_t *lock);
void write_lock_bh(rwlock_t *lock);
void write_unlock(rwlock_t *lock);
void write_unlock_irqrestore(rwlock_t *lock, unsigned long flags);
void write_unlock_irq(rwlock_t *lock);
void write_unlock_bh(rwlock_t *lock);