Operating Systems - Binghamton University
Operating Systems
Course Review & Extensions
Distributed Systems
Fault Tolerance
© 2011, D. J. Foreman
Byzantine Generals Problem
Byzantine failures = arbitrary failures
■ Crashes, incorrect results, etc.
A problem for fault-tolerant dist. systems
System Rules:
a. All loyal generals apply same action plan
b. Small # of traitors cannot force a bad plan
c. Every system must receive the same info
■ N=total # of systems
■ T=# “traitors” (i.e., failing systems)
Basic solution requires N>3T (i.e., N>=3T+1)
The dilemma
[Figure: three processes P1, P2 and P3 exchanging messages; the message labels shown are 1, 1 and 0.]
The dilemma – pt 2
[Figure: the same three processes P1, P2 and P3; the message labels shown are 0, 1 and 1.]
Who sent the bad message, P1 or P3????
Application of the algorithm
Two stages
1. All Pi send messages to n-2 other Pi
   ■ Not back to msg originator
2. All Pi decide on an action
3. There are T rounds of msgs
For N processes,
Pi sends N-1 messages in Round 1,
then (N-1)*(N-2) in Round 2,
and (N-1)*(N-2)*(N-3) in Round 3, etc.
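The growth in message volume is easier to see when the per-round counts above are tabulated. The following is a minimal sketch; the function names `tolerates` and `messages_per_round` are illustrative and do not come from the slides.

```python
def tolerates(n: int, t: int) -> bool:
    """Basic solution requires N > 3T (i.e., N >= 3T + 1)."""
    return n > 3 * t


def messages_per_round(n: int, rounds: int) -> list:
    """Round 1: N-1, round 2: (N-1)(N-2), round 3: (N-1)(N-2)(N-3), ..."""
    counts = []
    product = 1
    for r in range(1, rounds + 1):
        product *= (n - r)      # one more factor per round
        counts.append(product)
    return counts


print(tolerates(7, 2))            # 7 > 3*2, so two traitors can be tolerated
print(messages_per_round(7, 3))   # [6, 30, 120]
```

With N=7 the third round alone already needs 120 messages, which is why the basic solution is only practical for small N and T.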
Distributed Systems
Mutual Exclusion
Algorithms
Timestamp mechanisms
Lamport’s Algorithm -1
Assumptions:
1. Request R from Pi is time-stamped (Ti,i), where Ti=Ci, which is Pi’s CPU time
2. Pi has a request queue RQi ordered by >=
[see Algorithm on next slide]
Verification:
1. Rule 3b & the assumption that R are received in order guarantees that Pi has learned about all requests preceding R
2. Since >= totally orders Rn, rule 3a provides mutual exclusion
N-1 requests, N-1 replies, N-1 releases
Lamport’s Algorithm - pt 2
Rules:
1. Pi puts Req on RQi & sends Req to all other Pj
2. When Pj gets Req, Pj puts it on RQj & acks
3. Pi is allowed access when a&b are true:
   a. Pi’s own Req is at the front of RQi
   b. Pi has received a msg with Tj>Ti from every other Pj
4. To release resource, Pi pulls its Req from RQi & sends time-stamped RELEASE to all other Pj
5. When Pj receives the RELEASE, Pj pulls R from its own RQj
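The five rules can be exercised with a small in-memory simulation. This is only a sketch under stated assumptions: messages are delivered instantly and in order by direct method calls, clocks are simple counters, and the class and method names (`Process`, `may_enter`, and so on) are invented for illustration.

```python
import heapq


class Process:
    def __init__(self, pid, n):
        self.pid = pid
        self.clock = 0
        self.queue = []          # RQi, a heap ordered by (T, pid)
        self.latest = {j: 0 for j in range(n) if j != pid}  # newest stamp seen per peer
        self.my_req = None

    def _tick(self, seen=0):
        self.clock = max(self.clock, seen) + 1
        return self.clock

    def request(self, peers):
        # Rule 1: queue own request and send it to every other process.
        self.my_req = (self._tick(), self.pid)
        heapq.heappush(self.queue, self.my_req)
        for p in peers:
            ack_time = p.on_request(self.my_req)      # Rule 2 runs at the receiver
            self.latest[p.pid] = max(self.latest[p.pid], ack_time)
            self._tick(ack_time)

    def on_request(self, req):
        heapq.heappush(self.queue, req)
        self.latest[req[1]] = max(self.latest[req[1]], req[0])
        return self._tick(req[0])                     # time stamp carried by the ack

    def may_enter(self):
        # Rule 3: (a) own request heads the queue, (b) every peer has sent something later.
        return (self.my_req is not None
                and self.queue[0] == self.my_req
                and all(t > self.my_req[0] for t in self.latest.values()))

    def release(self, peers):
        # Rules 4-5: drop own request locally, then tell every peer to drop it too.
        req, self.my_req = self.my_req, None
        self.queue.remove(req)
        heapq.heapify(self.queue)
        stamp = self._tick()
        for p in peers:
            p.on_release(req, stamp)

    def on_release(self, req, stamp):
        self.queue.remove(req)
        heapq.heapify(self.queue)
        self.latest[req[1]] = max(self.latest[req[1]], stamp)
        self._tick(stamp)


p0, p1, p2 = (Process(i, 3) for i in range(3))
p0.request([p1, p2])
p1.request([p0, p2])
first = (p0.may_enter(), p1.may_enter())    # P0 stamped its request earlier
p0.release([p1, p2])
second = (p0.may_enter(), p1.may_enter())   # now P1's request heads every queue
```

Because the pair (T, pid) is compared as a tuple, two requests with the same time stamp are still totally ordered by process number.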
Ricart & Agrawala’s Algorithm
More efficient than Lamport’s algorithm
(needs only 2(N-1) messages)
Rules:
1. Pi puts Req on RQi & sends Req to all other Pj
2. When Pj gets Req
   a. If Pj is not also requesting, Pj Replies to Pi
   b. If Pj IS also requesting, and (Tj,j)<(Ti,i), keep (Ti,i) pending, else Reply to Pi
3. When Pi gets Reply from all Pj, Req is granted
4. When Pi releases the resource, it sends Reply for all pending Req’s
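The decision Pj makes in rule 2 fits in a few lines. The sketch below is an illustration only: the function names are made up, and the Lamport figure of 3(N-1) is taken from the "N-1 requests, N-1 replies, N-1 releases" count on the earlier slide.

```python
def on_request(me_requesting: bool, my_stamp: tuple, their_stamp: tuple) -> str:
    """What Pj does with a Req stamped their_stamp = (Ti, i)."""
    if not me_requesting:
        return "reply"      # rule 2a: Pj is not competing
    if my_stamp < their_stamp:
        return "defer"      # rule 2b: Pj's own request is older, so Pi's Req is kept pending
    return "reply"          # Pi's request is older, so Pi goes first


def messages_per_entry(n: int) -> dict:
    """Messages needed for one critical-section entry with N processes."""
    return {"lamport": 3 * (n - 1), "ricart_agrawala": 2 * (n - 1)}


print(on_request(True, (3, 1), (3, 2)))   # defer: same time, lower process number wins
print(messages_per_entry(5))
```

The saving comes from folding the RELEASE into the deferred Reply, so no separate release message is needed.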
Locking
Locking Mechanisms
Implies need for structuring of transactions
Constraints are required
Request rules for transactions, T:
■ Exclusive access
  • Granted only if no other T has ANY type of lock on the object
■ Shared lock
  • Granted if no other T has an Exclusive lock
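The two request rules amount to a small compatibility check. A minimal sketch, assuming lock modes are encoded as the strings "S" (shared) and "X" (exclusive); the function name `can_grant` is illustrative.

```python
def can_grant(requested: str, held_by_others: list) -> bool:
    """Decide whether a lock request on one object can be granted.

    held_by_others lists the modes ("S" or "X") held by OTHER transactions.
    """
    if requested == "X":
        return len(held_by_others) == 0     # no other T holds ANY lock
    if requested == "S":
        return "X" not in held_by_others    # no other T holds an exclusive lock
    raise ValueError(f"unknown lock mode: {requested}")


print(can_grant("S", ["S", "S"]))   # True: readers can share
print(can_grant("X", ["S"]))        # False: a reader blocks a writer
```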
Transactions
Well-formed
■ Reads only if it has a shared or exclusive lock
■ Writes only if it has an exclusive lock
Two phase
■ Does not request a lock after releasing a lock
Strong two phase
■ All unlocks are done at the end of T
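Both two-phase properties can be checked mechanically from the order in which a transaction issues its operations. The sketch below assumes a transaction is represented as a list of (action, object) pairs; that representation and the function names are illustrative.

```python
def is_two_phase(ops) -> bool:
    """True if T never requests a lock after it has released one."""
    released = False
    for action, _obj in ops:
        if action == "unlock":
            released = True
        elif action == "lock" and released:
            return False
    return True


def is_strong_two_phase(ops) -> bool:
    """True if all unlocks come at the end of T (only unlocks follow the first unlock)."""
    seen_unlock = False
    for action, _obj in ops:
        if action == "unlock":
            seen_unlock = True
        elif seen_unlock:
            return False
    return True


t1 = [("lock", "a"), ("lock", "b"), ("unlock", "a"), ("write", "b"), ("unlock", "b")]
print(is_two_phase(t1), is_strong_two_phase(t1))   # True False
```

Transaction `t1` is two phase because it takes no lock after its first unlock, but it is not strong two phase because it still writes `b` after releasing `a`.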
Basics
Mutual Exclusion
Atomic actions
“Appear” as if done in parallel
■ “Could” be interruptible
■ Places where done:
  • Hardware – machine state switching
  • Kernel code –
    – Semaphores, mutexes, condvars
    – Machine state switching
  • Library code – when library manages switching
    – Semaphores, mutexes, condvars
    – Thread state switching
Threads
Two types
■ Library-supported
  • Atomic actions occur inside the library functions
  • May use kernel-supported atomic actions
  • May be supplied with system or added on (Linux+pthreads vs. Windows+pthreads)
  • Thread blocking is dependent on library design
■ Kernel-supported
  • Atomic actions occur inside kernel code
  • Thread blocking done by kernel
Mutual Exclusion
Mechanism for critical section safety
Semaphores
■ Binary
■ Counting
■ Any thread can signal
Mutexes
■ Only locker can unlock
Monitors
■ Use condition variables and mutexes
■ Like a “class” in C++/Java
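The difference between "any thread can signal" and "only locker can unlock" can be observed directly. The sketch below uses Python's `threading.Semaphore` and, as a stand-in for an owner-checked mutex, `threading.RLock`, which refuses a release from a thread that does not own it. The helper `run_in_thread` is a made-up name for this illustration.

```python
import threading


def run_in_thread(fn):
    """Run fn in a separate thread and return the exception it raised, if any."""
    caught = []

    def target():
        try:
            fn()
        except Exception as exc:
            caught.append(exc)

    t = threading.Thread(target=target)
    t.start()
    t.join()
    return caught[0] if caught else None


# Semaphore: a different thread signals, and the main thread can then proceed.
sem = threading.Semaphore(0)
err_sem = run_in_thread(sem.release)
signaled = sem.acquire(blocking=False)

# Owner-checked lock: the main thread locks, another thread tries to unlock.
owner_lock = threading.RLock()
owner_lock.acquire()
err_lock = run_in_thread(owner_lock.release)
owner_lock.release()    # only the locker can unlock

print(err_sem, signaled, type(err_lock).__name__)
```

Note that Python's plain `threading.Lock` does not check ownership, which is why `RLock` is used here to mimic the mutex rule from the slide.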
Addresses & pointers
Pointer specifies a memory address
■ “Could” be a virtual address (when is it not?)
■ Must be translated to a “real” address
■ What is a pointer inside the kernel?
■ How does the kernel access user space?
O/S Work Flow
1. Initialize then create a “main process”
2. Display a user interface
3. Wait for an interrupt. 2 cases:
   a. CPU is idle ∴ NO instructions are processing
   b. A waiting process is allowed to run
4. h/w applies interrupt voltage
5. Processor switches to handler in kernel-mode
6. Interrupt is handled
7. Scheduler is called, then either:
   a. Wait for an interrupt (go back to 3a)
   b. Resume a ready process (from 3b above)
Linux architecture
[Figure: layered diagram, top to bottom]
App 1, App 2, ..., App n
Kernel space:
■ System Call Interface
■ Kernel Subsystems (e.g., I/O)
■ Device drivers
Hardware
Interrupt handling -1
Interrupt handlers are asynchronous
■ May interrupt other interrupt handlers
■ May run with current interrupt-level disabled
■ May run with all interrupts disabled
■ May be timing dependent
■ Must be fast
■ MUST NOT BLOCK!!!
Divided into 2 parts
■ “top halves” (Interrupt handler)
■ “bottom halves” (leftover code from top half)
Interrupt handling -2
Top halves
■ Asynchronous
■ Ack receipt (“talks” to h/w)
■ Copies data to/from h/w
■ Non-interruptible
■ MUST BE short & FAST!
Bottom halves
■ Deferred to “later” (i.e., when system is not busy)
■ Interrupts enabled
■ May be long, slow
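The split between the two halves can be pictured as a queue between a short producer and a slow consumer. The following is a user-space analogy only, not kernel code; the names `top_half`, `run_bottom_halves`, `pending` and `log` are invented for the sketch.

```python
from collections import deque

pending = deque()   # work deferred by top halves
log = []


def top_half(device_data):
    """Short and fast: acknowledge, copy the data, defer everything else."""
    log.append("ack")                    # "talks" to the (simulated) h/w
    pending.append(bytes(device_data))   # copy data out of the h/w buffer


def run_bottom_halves():
    """Runs later, with interrupts enabled; allowed to be long and slow."""
    while pending:
        data = pending.popleft()
        log.append(f"processed {len(data)} bytes")


top_half(b"abc")
top_half(b"de")
run_bottom_halves()
print(log)
```

Both acknowledgements are logged before any processing happens, which mirrors the idea that the expensive work waits until the system is less busy.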
Interrupt handling -3
Original design:
■ 32 BH’s
■ Main Int Hdlr sets a bit to get one called
■ No extensibility
■ Globally sync’d (cannot run 2 at same time)
New design (2.6 kernels)
■ Introduced use of queues
■ Softirq’s & tasklets
■ Replace the BH mechanism
  • but the work is still deferred, still called BH’s
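The original design, with a fixed table of 32 bottom halves selected by setting a bit, can be mimicked with a bitmask. This is a user-space sketch of the idea only; `register`, `mark_pending` and `run_pending` are hypothetical names, not the kernel's interface.

```python
NUM_BH = 32
handlers = [None] * NUM_BH   # one fixed slot per bottom half: no extensibility
pending_mask = 0             # bit n set means bottom half n should run


def register(slot, fn):
    handlers[slot] = fn


def mark_pending(slot):
    """What the main interrupt handler does: just set a bit."""
    global pending_mask
    pending_mask |= 1 << slot


def run_pending():
    """Run every marked bottom half, one at a time, in slot order."""
    global pending_mask
    ran = []
    for slot in range(NUM_BH):
        if pending_mask & (1 << slot) and handlers[slot] is not None:
            pending_mask &= ~(1 << slot)
            handlers[slot]()
            ran.append(slot)
    return ran


calls = []
register(3, lambda: calls.append("slot 3"))
register(7, lambda: calls.append("slot 7"))
mark_pending(7)
mark_pending(3)
order = run_pending()
print(order, calls)
```

Running the handlers from a single loop is what makes this scheme globally synchronized: two bottom halves can never execute at the same time, which is one of the limitations the queue-based design removed.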
System Designs
Why are these good/bad ideas?
■ Pageable kernel memory (found in some Unix’s, not in Linux)
■ Monolithic static kernels (Unix, VM, MVS)
■ Interruptible interrupt handlers
How Kernel differs from user apps
No C library calls allowed (why?)
GNU C or newer Intel compilers only
ISO C99 extensions allowed
Hard to use floating point
■ fp mode switch in PC’s
Small fixed-size stack on PC’s (8KB or 16KB for 32- or 64-bit machine)
Synch is a major concern (why?)
Portability is a design point
Synchronization (in the kernel)
Must support SMP (Symmetric MultiProcessing)
Interrupts are asynchronous
Parts of the kernel are preemptable (interruptible)
Kernel must protect itself AND users
Kernel threads
(threads that run in & for the kernel)
No address space (MM pointer=null)
No context switch to user-space to run
Schedulable!
Interruptible!
E.g.,
■ pdflush (dirty-page write-back)
■ ksoftirqd (the kernel soft IRQ daemon)
■ (see next page for details)
Kernel thread examples & notes
pdflush (dirty page write-back)
■ Free RAM drops below a threshold
■ Dirty data grows older than a threshold
■ Page-writes are queued
■ Handled when threshold is passed
ksoftirqd (the soft IRQ daemon)
■ Queuing TCPIP packets
■ Handled after hard IRQ’s and at sched.c
■ NOT preemptable!!!
Page Replacement Algorithms
The Clock
Init: Create a circular list of frames, set ptr to newest
Do_page_fault()
{ ptr=ptr->next
  If (no criterion used) victim found    // ≈ FIFO
  Else if (Referenced==0) then
    If (Dirty==1) schedule for cleaning
    Else {victim found}                  // ≈ LRU
}
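A runnable version of the Referenced/Dirty branch of the pseudocode above might look like the following. It is a sketch with two assumptions that are not on the slide: a referenced frame has its bit cleared and is skipped (the usual second chance), and if every frame turns out to be dirty, the first frame scheduled for cleaning is treated as already written back. The names `Frame` and `clock_victim` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    page: int
    referenced: bool = False
    dirty: bool = False


def clock_victim(frames, ptr):
    """Advance the clock hand from ptr; return (victim_index, indexes_scheduled_for_cleaning)."""
    to_clean = []
    n = len(frames)
    for _ in range(2 * n):               # two sweeps are enough to clear every R bit
        ptr = (ptr + 1) % n
        f = frames[ptr]
        if f.referenced:
            f.referenced = False         # second chance (assumed, not on the slide)
        elif f.dirty:
            if ptr not in to_clean:
                to_clean.append(ptr)     # schedule for cleaning, keep looking
        else:
            return ptr, to_clean         # unreferenced and clean: victim found
    # Every frame was dirty: assume the first scheduled write-back has completed.
    frames[to_clean[0]].dirty = False
    return to_clean[0], to_clean


frames = [Frame(0, referenced=True), Frame(1, dirty=True), Frame(2)]
print(clock_victim(frames, 2))   # (2, [1])
```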
WSClock
If (R==1)
  LR[f]=Process CPU Time (PT)
  R=0
Else //not ref’d in last cycle
  if PT-LR[f]>T {victim found} // page is older than T
*T is working set window size (units=time)
LR is array of Last Reference times
Note: LRU needs hw to set time at each reference
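The WSClock test can be written as one pass of the clock hand. This is a minimal sketch: frames are plain dictionaries with an `R` bit and a `last_ref` time, `now` stands for the process CPU time, and `tau` stands for the window size T; all of these names are assumptions made for the example.

```python
def wsclock_scan(frames, hand, now, tau):
    """Sweep once from hand; return the index of a victim frame, or None."""
    n = len(frames)
    for _ in range(n):
        hand = (hand + 1) % n
        f = frames[hand]
        if f["R"]:
            f["last_ref"] = now          # referenced in the last cycle: record the time
            f["R"] = False
        elif now - f["last_ref"] > tau:
            return hand                  # not referenced and older than the window
    return None


frames = [
    {"R": True, "last_ref": 0},
    {"R": False, "last_ref": 90},
    {"R": False, "last_ref": 10},
]
print(wsclock_scan(frames, hand=2, now=100, tau=50))   # 2
```

Frame 0 is saved by its R bit, frame 1 is only 10 time units old, and frame 2 has gone 90 units without a reference, so it falls outside the working set window.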
Paging: Direct mapping
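[Figure: direct-mapped translation. The computed (virtual) address in user space is split into page bits (page # p) and offset bits. Start of page table + p selects the Page Table entry P, which holds frame F. The frame bits and the offset bits together form the computed (real) address.]
F=pagetable[P]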
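The lookup F=pagetable[P] can be traced with ordinary bit operations. A minimal sketch, assuming a 12-bit offset (4 KB pages) and a page table held as a Python list; `OFFSET_BITS` and `translate` are illustrative names.

```python
OFFSET_BITS = 12   # assumed page size of 4 KB


def translate(vaddr, page_table):
    """Split the virtual address, index the table, rebuild the real address."""
    p = vaddr >> OFFSET_BITS                     # page bits
    offset = vaddr & ((1 << OFFSET_BITS) - 1)    # offset bits pass through unchanged
    f = page_table[p]                            # F = pagetable[P]
    return (f << OFFSET_BITS) | offset


page_table = [5, 9, 2]
print(hex(translate(0x1ABC, page_table)))   # 0x9abc
```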
Paging: associative table
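[Figure: associative-table translation. The computed (virtual) address in user space is split into page bits (page # p) and offset bits. p is compared with the Virt. Page # (VPN) column of every table entry; the matching entry P supplies Frame # F. The frame bits and the offset bits together form the computed (real) address.]
All compares are simultaneous
F≠pagetable[P]
F=match(VPN(p))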
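In software the associative match can only be imitated, because the hardware compares every entry at once. The sketch below uses a sequential loop as a stand-in for that parallel compare; the 12-bit offset and the names `translate_associative` and `entries` are assumptions for the example.

```python
OFFSET_BITS = 12   # assumed page size of 4 KB


def translate_associative(vaddr, entries):
    """entries is a list of (vpn, frame) pairs in no particular order."""
    p = vaddr >> OFFSET_BITS
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    for vpn, frame in entries:           # stand-in for the simultaneous compares
        if vpn == p:                     # F = match(VPN(p))
            return (frame << OFFSET_BITS) | offset
    return None                          # no entry matched


entries = [(7, 2), (1, 9)]
print(hex(translate_associative(0x1ABC, entries)))   # 0x9abc
```

Unlike the direct-mapped table, the position of an entry carries no meaning here, which is why F≠pagetable[P].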
Paging-inverted table
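[Figure: inverted-table translation. The computed (virtual) address in user space is split into page bits (page # p) and offset bits. Start of page table + hash(p) selects the Page Table entry, which holds frame F. The frame bits and the offset bits together form the computed (real) address.]
F=pagetable[hash(P)]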
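The hashed lookup can be sketched in the same style. Several things here are assumptions rather than part of the slide: the hash is a simple modulo, each slot stores the page number next to the frame so a wrong match can be detected, and a collision is simply reported as a miss instead of being chained.

```python
OFFSET_BITS = 12   # assumed page size of 4 KB
TABLE_SIZE = 8     # assumed number of table slots


def slot_for(p):
    return p % TABLE_SIZE    # hypothetical hash function


def insert(table, p, frame):
    table[slot_for(p)] = (p, frame)


def translate_hashed(vaddr, table):
    p = vaddr >> OFFSET_BITS
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    entry = table[slot_for(p)]           # F = pagetable[hash(P)]
    if entry is None or entry[0] != p:
        return None                      # empty slot or collision: a miss in this sketch
    return (entry[1] << OFFSET_BITS) | offset


table = [None] * TABLE_SIZE
insert(table, 0x12, 5)
print(hex(translate_hashed(0x12345, table)))   # 0x5345
```

The table size is independent of the size of the virtual address space, which is the main attraction of this layout.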
Paging effects
[Figure: total execution time of Pi and interfault time, plotted against # frames allocated to Pi, up to the number of pages in Pi.]
Allocating more frames to Pi increases i.f. time (less paging occurs for Pi)
Linux page replacement
Variant of Clock algorithm
2 linked lists
■ Active pages (A) (referenced recently)
  • Never used as victims
■ Inactive pages (I)
■ Most recently used at head of each list
New page-> inactive, marked Ref’d
MM checks all pages, if Ref’d, R=1
If inactive & R=1 already,
■ move to head(A)
■ R=0
Periodic check to move from A to I
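The two-list scheme can be modelled with two deques. This is a simplified sketch of the rules on the slide, not the kernel's implementation; the class `TwoListPages` and its method names are invented, and victims are assumed to come from the tail of the inactive list.

```python
from collections import deque


class TwoListPages:
    def __init__(self):
        self.active = deque()     # index 0 is the head (most recently used)
        self.inactive = deque()
        self.ref = {}             # page -> R bit

    def add(self, page):
        """New page -> inactive, marked Ref'd."""
        self.inactive.appendleft(page)
        self.ref[page] = True

    def touch(self, page):
        """A reference to a page that is inactive with R=1 already promotes it."""
        if page in self.inactive and self.ref[page]:
            self.inactive.remove(page)
            self.active.appendleft(page)   # move to head(A)
            self.ref[page] = False         # R=0
        else:
            self.ref[page] = True

    def demote(self, count=1):
        """Periodic check: move pages from the tail of A to the head of I."""
        for _ in range(min(count, len(self.active))):
            self.inactive.appendleft(self.active.pop())

    def victim(self):
        """Active pages are never victims; take the tail of the inactive list."""
        page = self.inactive.pop()
        del self.ref[page]
        return page


mm = TwoListPages()
for page in (1, 2, 3):
    mm.add(page)
mm.touch(2)
print(list(mm.active), list(mm.inactive))   # [2] [3, 1]
```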