UCDavis, ecs150
Fall 2007
Operating System
ecs150 Fall 2007
#4: Memory Management
(chapter 5)
Dr. S. Felix Wu
Computer Science Department
University of California, Davis
http://www.cs.ucdavis.edu/~wu/
[email protected]
10/25/2007
[Figure: virtual memory segments of a process and where their pages come from]
Process segments: text, data, BSS, user stack, args/env, kernel.
Executable programs live on a file volume.
Fetches for clean text or data are typically fill-from-file.
Initial references to user stack and BSS are satisfied by zero-fill on demand.
Modified (dirty) pages are pushed to backing store (swap) on eviction.
Paged-out pages are fetched from backing store when needed.

Logical vs. Physical Address
The concept of a logical address space that is
bound to a separate physical address space is
central to proper memory management.
– Logical address – generated by the CPU; also referred
to as virtual address.
– Physical address – address seen by the memory unit.

Logical and physical addresses are the same in
compile-time and load-time address-binding
schemes; logical (virtual) and physical addresses
differ in execution-time address-binding scheme.
Memory-Management Unit
(MMU)



Hardware device that maps
virtual to physical address.
In MMU scheme, the value in the
relocation register is added to
every address generated by a user
process at the time it is sent to
memory.
The user program deals with
logical addresses; it never sees
the real physical addresses.
[Figure: the CPU sends a virtual address to the MMU, the MMU sends the physical address to memory, and data flows between memory and the CPU]

Paging: Page and Frame
Logical address space of a process can be noncontiguous;
process is allocated physical memory whenever the latter is
available.
Divide physical memory into fixed-sized blocks called frames
(size is power of 2, between 512 bytes and 8192 bytes).
Divide logical memory into blocks of same size called pages.
Keep track of all free frames.
To run a program of size n pages, need to find n free frames and
load program.
Set up a page table to translate logical to physical addresses.
Internal fragmentation.
[Figure: physical memory frames]
[Figure: Address Translation Architecture]
Address Translation Scheme

Address generated by CPU is divided into:
– Page number (p) – used as an index into a page
table which contains base address of each page
in physical memory.
– Page offset (d) – combined with base address to
define the physical memory address that is sent
to the memory unit.
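
The split into page number and offset can be sketched in a few lines of C. This is a minimal illustration, assuming 4K pages (12 offset bits) and a tiny single-level table; `page_table`, `NPAGES`, and `translate` are hypothetical names, not taken from any real kernel.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                        /* assumed 4K pages: 12 offset bits */
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1u)
#define NPAGES     16u                        /* tiny hypothetical address space */

/* page_table[p] holds the frame number that backs page p. */
static uint32_t page_table[NPAGES];

/* Translate a virtual address: p indexes the table, d is carried over. */
static uint32_t translate(uint32_t vaddr)
{
    uint32_t p = vaddr >> PAGE_SHIFT;         /* page number */
    uint32_t d = vaddr & PAGE_MASK;           /* page offset */
    assert(p < NPAGES);
    return (page_table[p] << PAGE_SHIFT) | d; /* frame base combined with offset */
}
```

With `page_table[3] = 7`, the virtual address 0x3ABC maps to the physical address 0x7ABC: only the page number changes, the offset passes through untouched.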
[Figure: Virtual Memory, with the mapping done in the MMU; the kernel region is shared by all user processes]
[Figure: how an executable file maps into virtual memory and physical memory]
Executable file (program sections): header, text, idata, wdata, symbol table, etc.
Virtual memory (big; process segments): text, data, BSS, user stack, args/env, kernel.
Physical memory (small): physical page frames, filled by page fetch and emptied by pageout/eviction to backing storage.
How to represent the virtual-to-physical translations? (MAPPING in MMU)
Paging
Advantages?
 Disadvantages?




Fragmentation
External Fragmentation – total memory space exists to
satisfy a request, but it is not contiguous.
Internal Fragmentation – allocated memory may be
slightly larger than requested memory; this size difference
is memory internal to a partition, but not being used.
Reduce external fragmentation by compaction
– Shuffle memory contents to place all free memory together in one
large block.
– Compaction is possible only if relocation is dynamic, and is done
at execution time.
– I/O problem:
  Latch job in memory while it is involved in I/O.
  Do I/O only into OS buffers.
Page size?
Page Table Size?
32-bit address bus: 2^32 bytes of address space
1 page = 4K bytes = 2^12 bytes
2^32 / 2^12 = 2^20 pages
Page table: 2^20 entries x 4 bytes = 2^22 bytes = 4 MB
256M bytes main memory
Page Table Entry
Fields: caching disabled, referenced, modified, protection, present/absent, page frame number
[Figure: Free Frames, before allocation and after allocation]
Page Faults

Page table access
Load the missing page (replace one)
Re-access the page table.

How large is the page table?

– 2^32 address space, 4K (2^12) page size.
– How many entries? 2^20 entries (1 M).
– If 2^46, you need to access both the segment table and
the page table…. (2^26 MB or 2^16 GB, i.e. 64 TB)

Cache the page table!!
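
The page-table sizes quoted on these slides follow from one shift. The helper below is a small sketch of that arithmetic; the function names are hypothetical, and the 4-byte entry size used in the usage note is an assumption (it is what makes a 2^20-entry table come to 4 MB).

```c
#include <assert.h>
#include <stdint.h>

/* Number of entries in a flat page table: one per virtual page. */
static uint64_t pt_entries(unsigned addr_bits, unsigned page_shift)
{
    assert(addr_bits >= page_shift && addr_bits - page_shift < 64);
    return (uint64_t)1 << (addr_bits - page_shift);
}

/* Table size in bytes for a given (assumed) page-table-entry size. */
static uint64_t pt_bytes(unsigned addr_bits, unsigned page_shift,
                         unsigned pte_bytes)
{
    return pt_entries(addr_bits, page_shift) * pte_bytes;
}
```

For a 32-bit address space with 4K pages, `pt_entries(32, 12)` is 2^20 and `pt_bytes(32, 12, 4)` is 4 MB per process; for 64 bits, `pt_entries(64, 12)` is 2^52, which is why a flat table stops being practical.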

Page Faults
Hardware Trap
– /usr/src/sys/i386/i386/trap.c

VM page fault handler: vm_fault()
– /usr/src/sys/vm/vm_fault.c
/usr/src/sys/vm/vm_map.h
On the hard disk or Cache – Page Faults
How to implement?
Implementation of Page Table





Page table is kept in main memory.
Page-table base register (PTBR) points to the page table.
Page-table length register (PTLR) indicates size of the
page table.
In this scheme every data/instruction access requires two
memory accesses. One for the page table and one for the
data/instruction.
The two memory access problem can be solved by the use
of a special fast-lookup hardware cache called associative
memory or translation look-aside buffers (TLBs)
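
A software model makes the two-access problem and the TLB shortcut concrete. This is only a sketch under stated assumptions: a 4-entry TLB, round-robin refill, and hypothetical names throughout. Real hardware compares all entries in parallel rather than looping.

```c
#include <assert.h>
#include <stdint.h>

#define TLB_SIZE 4          /* assumed: a very small TLB */
#define NPAGES   256        /* assumed: tiny page table */

struct tlb_entry { int valid; uint32_t vpage; uint32_t frame; };

static struct tlb_entry tlb[TLB_SIZE];
static uint32_t page_table[NPAGES];     /* lives in main memory */
static unsigned next_victim;            /* round-robin refill (assumed policy) */
static unsigned tlb_hits, tlb_misses;

/* Return the frame for vpage, consulting the TLB before the page table. */
static uint32_t lookup_frame(uint32_t vpage)
{
    assert(vpage < NPAGES);
    for (int i = 0; i < TLB_SIZE; i++) {      /* hardware does this in parallel */
        if (tlb[i].valid && tlb[i].vpage == vpage) {
            tlb_hits++;
            return tlb[i].frame;              /* hit: no page-table access */
        }
    }
    tlb_misses++;
    uint32_t frame = page_table[vpage];       /* miss: the extra memory access */
    tlb[next_victim] = (struct tlb_entry){ 1, vpage, frame };
    next_victim = (next_victim + 1) % TLB_SIZE;
    return frame;
}
```

The first reference to a page pays for the page-table access and fills a TLB slot; repeated references to the same page are then served from the TLB alone.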
Two Issues
Virtual Address Access Overhead
 The size of the page table

TLB (Translation Lookaside Buffer)

Associative Memory:
– expensive, but fast -- parallel searching

TLB: select a small number of page table
entries and store them in TLB
virt-page  modified  protection  page frame
140        1         RW          31
20         0         RX          38
130        1         RW          29
129        1         RW          62

Associative Memory
Associative memory – parallel search
Page #
Frame #
Address translation (A´, A´´)
– If A´ is in associative register, get frame # out.
– Otherwise get frame # from page table in memory
[Figure: Paging Hardware With TLB]
TLB Miss versus Page Fault
Hardware or Software

TLB is part of MMU (hardware):
– Automated page table entry (pte) update
– OS handling TLB misses

Why software????
– Reduce HW complexity
– Flexibility in Paging/TLB content management
for different applications

Inverted Page Table (iPT)
2^64 address space with 4K pages
– page table: 2^52 entries, i.e. millions of gigabytes

One entry per page of real memory.
– 128 MB with 4K pages ==> 2^15 entries

Disadvantage:
– For every memory access, we need to search
the whole table (or its hash chain).
[Figure: Page Table]
[Figure: Inverted Page Table]

Brainstorming
How to design an “inverted page table”
such that we can do it “faster”?
Hashed Page Tables

Common in address spaces > 32 bits.

The virtual page number is hashed into a page
table. This page table contains a chain of elements
hashing to the same location.

Virtual page numbers are compared in this chain
searching for a match. If a match is found, the
corresponding physical frame is extracted.
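
The chained lookup described above can be sketched as follows. This is a minimal illustration, assuming a modulo hash and 64 buckets; `hpt_insert`, `hpt_lookup`, and the node layout are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS 64u        /* assumed table size */

/* One chain element: a virtual page number and the frame that backs it. */
struct hpt_node {
    uint64_t vpage;
    uint64_t frame;
    struct hpt_node *next;
};

static struct hpt_node *buckets[NBUCKETS];

static unsigned hpt_hash(uint64_t vpage)
{
    return (unsigned)(vpage % NBUCKETS);    /* assumed trivial hash */
}

/* Add a mapping at the head of its chain. Returns 0 on success, -1 on failure. */
static int hpt_insert(uint64_t vpage, uint64_t frame)
{
    struct hpt_node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    unsigned h = hpt_hash(vpage);
    n->vpage = vpage;
    n->frame = frame;
    n->next = buckets[h];
    buckets[h] = n;
    return 0;
}

/* Walk the chain comparing virtual page numbers. Returns 1 on a match. */
static int hpt_lookup(uint64_t vpage, uint64_t *frame)
{
    assert(frame != NULL);
    for (struct hpt_node *n = buckets[hpt_hash(vpage)]; n != NULL; n = n->next) {
        if (n->vpage == vpage) {
            *frame = n->frame;
            return 1;
        }
    }
    return 0;                               /* no mapping: page fault */
}
```

Pages 5 and 69 collide in this table (both hash to bucket 5), so looking up either one walks the same chain and relies on the virtual page number comparison to pick the right frame.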
[Figure: the virtual page# is hashed to a chain of (virtual page#, physical page#) entries]

iPT/Hash Performance
Issues
still do TLB (hw/sw)
– if we can hit the TLB, we do NOT need to
access the iPT and hash.

caching the iPT and/or Hash Table??
– any benefits under regular on-demand caching
schemes?

hardware support for iPT/Hash

Paging Virtual Memory
CPU address-ability: 32 bits -- 2^32 bytes!!
– 2^32 is 4 Giga bytes (un-segmented).
– Pentium II can support up to 2^46 (64 Tera) bytes
  32 bits – address, 14 bits – segment#, 2 bits – protection.
Very large addressable space (64 bits), and
relatively smaller physical memory available…
– Let the programs/processes enjoy a much larger virtual
space!!
[Figure: VM with 1 Segment, mapping in MMU]
[Figure: Eventually… ??? (mapping in MMU)]

On-Demand Paging
On-demand paging:
– we have to kick someone out…. But which
one?
– Triggered by page faults.

Loading in advance. (Predictive/Proactive)
– try to avoid page fault at all.

Demand Paging
On a page fault the OS:
– Save user registers and process state.
– Determine that exception was page fault.
– Find a free page frame.
– Issue read from disk to free page frame.
– Wait for seek and latency, then transfer the page
into memory.
– Restore process state and resume execution.
Page Replacement
1. Find the location of the desired page on disk.
2. Find a free frame:
   - If there is a free frame, use it.
   - If there is no free frame, use a page replacement algorithm to select a victim frame.
3. Read the desired page into the (newly) free frame. Update the page and frame tables.
4. Restart the process.
Page Replacement
Algorithms

minimize page-fault rate
Page Replacement
Optimal
 FIFO
 Least Recently Used (LRU)
 Not Recently Used (NRU)
 Second Chance
 Clock Paging

Optimal
Estimate the time of each page's next reference.
Evict the page whose next reference is farthest in the future.

LRU

an implementation issue
– We need to keep track of the last access (or modification)
time of each page
– timestamp: 32 bits

How to implement LRU efficiently?

LRU Approximation
Reference bit (one-bit timestamp)
– With each page associate a bit, initially = 0
– When page is referenced bit set to 1.
– Replace the one which is 0 (if one exists). We do
not know the order, however.

Second chance
– Need reference bit.
– Clock replacement.
– If the page to be replaced (in clock order) has
reference bit = 1, then:
  set reference bit 0.
  leave page in memory.
  replace next page (in clock order), subject to same
  rules.
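
The second-chance rule can be simulated directly. The sketch below is one common formulation, under assumptions not stated on the slides: a page's reference bit is set when it is loaded, the hand advances past the frame it just filled, and page numbers are non-negative (so -1 can mark an empty frame). `clock_faults` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

#define MAXFRAMES 16

/* Count page faults for clock (second-chance) replacement. */
static int clock_faults(const int *refs, size_t n, int nframes)
{
    int page[MAXFRAMES], refbit[MAXFRAMES];
    int hand = 0, faults = 0;

    assert(nframes > 0 && nframes <= MAXFRAMES);
    for (int f = 0; f < nframes; f++) {
        page[f] = -1;                   /* empty frame */
        refbit[f] = 0;
    }
    for (size_t t = 0; t < n; t++) {
        int hit = 0;
        for (int f = 0; f < nframes; f++) {
            if (page[f] == refs[t]) {
                refbit[f] = 1;          /* referenced: earns a second chance */
                hit = 1;
                break;
            }
        }
        if (hit)
            continue;
        faults++;
        while (refbit[hand]) {          /* bit = 1: clear it and move on */
            refbit[hand] = 0;
            hand = (hand + 1) % nframes;
        }
        page[hand] = refs[t];           /* bit = 0: this frame is the victim */
        refbit[hand] = 1;
        hand = (hand + 1) % nframes;
    }
    return faults;
}
```

No periodic clearing is needed: bits are cleared only as the hand sweeps past them, so a page survives one full sweep after its last reference.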
NRU
Not Recently Used
 Clear the bits every 20 milliseconds.

referenced modified
What is the problem??
Page Replacement??
Efficient Approximation of LRU
 No periodic refreshing


How to do that?
Second Chance/Clock Paging
Do not need any “periodic” bit clearing
 Have a “current candidate pointer” moving
along the “clock”
 Choose the first page with zero flag(s)

[Figure: Clock Pages animation, six frames showing pages A through I entering and leaving the clock]
Evaluation
Metric: the page-fault rate.
 Evaluate algorithm by running it on a
particular string of memory references
(reference string) and computing the
number of page faults on that string.
 In all our examples, the reference string is
2, 3, 2, 1, 5, 2, 4, 5, 3, 2, 5, 2.
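
Counting faults on a reference string is easy to automate. The sketch below is a small simulator, assuming faults on initially empty frames are counted and ties are broken by lowest frame index; `count_faults` and the `policy` enum are hypothetical names. OPT here looks ahead in the string, which is only possible in a simulation.

```c
#include <assert.h>
#include <stddef.h>

#define MAXFRAMES 16

enum policy { FIFO, LRU, OPT };

/* Count page faults for one policy on a reference string. */
static int count_faults(enum policy pol, const int *refs, size_t n, int nframes)
{
    int page[MAXFRAMES];
    size_t loaded[MAXFRAMES], used[MAXFRAMES];  /* load time, last-use time */
    int inuse = 0, faults = 0;

    assert(nframes > 0 && nframes <= MAXFRAMES);
    for (size_t t = 0; t < n; t++) {
        int hit = -1;
        for (int f = 0; f < inuse; f++)
            if (page[f] == refs[t])
                hit = f;
        if (hit >= 0) {
            used[hit] = t;
            continue;
        }
        faults++;
        int victim = 0;
        if (inuse < nframes) {
            victim = inuse++;                   /* a free frame is available */
        } else {
            size_t best = 0;
            for (int f = 0; f < nframes; f++) {
                size_t key;                     /* larger key = better victim */
                if (pol == FIFO) {
                    key = n - loaded[f];        /* oldest load */
                } else if (pol == LRU) {
                    key = n - used[f];          /* oldest use */
                } else {
                    size_t next = t + 1;        /* OPT: farthest next use */
                    while (next < n && refs[next] != page[f])
                        next++;
                    key = next;                 /* n if never used again */
                }
                if (key > best) {
                    best = key;
                    victim = f;
                }
            }
        }
        page[victim] = refs[t];
        loaded[victim] = used[victim] = t;
    }
    return faults;
}
```

On the reference string above with 3 frames, this simulator reports 9 faults for FIFO, 7 for LRU, and 6 for OPT.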

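FIFO with 3 physical pages
Reference string: 2, 3, 2, 1, 5, 2, 4, 5, 3, 2, 5, 2
Frame contents after each of the first six references (evicted page in parentheses):
ref      2    3    2    1    5      2
frame 1  2    2    2    2    5 (2)  5
frame 2       3    3    3    3      2 (3)
frame 3                 1    1      1
Page Faults: continue the trace for the rest of the string and count them.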
Page Replacement
2, 3, 2, 1, 5, 2, 4, 5, 3, 2, 5, 2
 OPT/LRU/FIFO/CLOCK and 3 pages
 how many page faults?


Thrashing
If a process does not have “enough” pages, the
page-fault rate is very high. This leads to:
– low CPU utilization.
– operating system thinks that it needs to increase the
degree of multiprogramming.
– another process added to the system.

Thrashing: a process is busy swapping pages in
and out.

Thrashing
Why does paging work?
Locality model
– Process migrates from one locality to another.
– Localities may overlap.

Why does thrashing occur?
Sum of the sizes of the localities > total memory size
How to Handle Thrashing?

Brainstorming!!
[Figure: Locality In A Memory-Reference Pattern]
FreeBSD VM
/usr/src/sys/vm/vm_map.h
How to implement?
Text
Initialized Data (Copy on Write)
Uninitialized Data (Zero-Fill): Anonymous Object
Stack (Zero-Fill): Anonymous Object
Page-level Allocation
• Kernel maintains a list of free physical pages.
• Two principal clients:
the paging system
the kernel memory allocator
Memory allocation
[Figure: physical pages are handed out by the page-level allocator to two clients]
– Kernel Memory Allocator: network buffers, data structures, temp storage
– Paging system: process, buffer cache
[Figure: kernel memory layout: kernel text, initialized/un-initialized data, kernel malloc, kernel I/O, network buffer]
Why is the Kernel MA special?

Typical request is for less than 1 page
Originally, the kernel used statically allocated, fixed-size
tables, but this is too limiting
Kernel requires a general-purpose allocator for both
large and small chunks of memory.
Handles memory requests from kernel modules, not
user-level applications
– pathname translation routine, STREAMS or I/O buffers,
zombie structures, table entries (proc structure etc)
KMA Requirements
utilization factor = requested/required memory
– Useful metric that factors in fragmentation.
– 50% considered good
KMA must be fast since extensively used
Simple API similar to malloc and free.
– desirable to free portions of allocated space, this is different from typical
user space malloc and free interface
Properly aligned allocations: for example 4 byte alignment
Support burst-usage patterns
Interaction with paging system – able to borrow pages from
paging system if running low



KMA Schemes
Resource Map Allocator
Simple Power-of-Two Free Lists
The McKusick-Karels Allocator
– FreeBSD
The Buddy System
– Linux
SVR4 Lazy Buddy Allocator
Mach-OSF/1 Zone Allocator
Solaris Slab Allocator
– FreeBSD, Linux, Solaris





Resource Map Allocator
Resource map is a set of <base,size> pairs that monitor areas of
free memory
Initially, pool described by a single map entry =
<pool_starting_address, pool_size>
Allocations result in pool fragmenting with one map entry for
each contiguous free region
Entries sorted in order of increasing base address
Requests satisfied using one of three policies:
– First fit – Allocates from the first free region with sufficient space.
Used in UNIX; fastest, but fragmentation is a concern
– Best fit – Allocates from smallest that satisfies request. May leave
several regions that are too small to be useful
– Worst fit - Allocates from largest region unless perfect fit is found. Goal
is to leave behind larger regions after allocation
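
A first-fit resource map with coalescing can be sketched as below. This is a minimal illustration, not any kernel's actual code: the `struct rmap` type, the fixed capacity `RM_MAX`, and the explicit map argument to `rmalloc`/`rmfree` are assumptions made to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RM_MAX 64                       /* assumed static map capacity */

struct rm_entry { size_t base, size; }; /* one <base,size> free region */

struct rmap {
    struct rm_entry e[RM_MAX];          /* sorted by increasing base */
    int n;
};

static void rm_init(struct rmap *m, size_t base, size_t size)
{
    m->n = 1;
    m->e[0].base = base;
    m->e[0].size = size;
}

/* First fit: returns the base of the allocated region, or (size_t)-1. */
static size_t rmalloc(struct rmap *m, size_t size)
{
    assert(size > 0);
    for (int i = 0; i < m->n; i++) {
        if (m->e[i].size < size)
            continue;
        size_t base = m->e[i].base;
        m->e[i].base += size;           /* carve from the front of the region */
        m->e[i].size -= size;
        if (m->e[i].size == 0) {        /* region used up: drop the entry */
            memmove(&m->e[i], &m->e[i + 1],
                    (size_t)(m->n - i - 1) * sizeof m->e[0]);
            m->n--;
        }
        return base;
    }
    return (size_t)-1;
}

/* Free [base, base+size), coalescing with adjacent free regions.
 * Returns 0 on success, -1 if the static map would overflow. */
static int rmfree(struct rmap *m, size_t base, size_t size)
{
    int i = 0;
    while (i < m->n && m->e[i].base < base)
        i++;                            /* keep the map sorted by base */
    int prev = i > 0 && m->e[i - 1].base + m->e[i - 1].size == base;
    int next = i < m->n && base + size == m->e[i].base;
    if (prev && next) {                 /* bridges two free regions */
        m->e[i - 1].size += size + m->e[i].size;
        memmove(&m->e[i], &m->e[i + 1],
                (size_t)(m->n - i - 1) * sizeof m->e[0]);
        m->n--;
    } else if (prev) {
        m->e[i - 1].size += size;
    } else if (next) {
        m->e[i].base = base;
        m->e[i].size += size;
    } else {
        if (m->n == RM_MAX)
            return -1;
        memmove(&m->e[i + 1], &m->e[i], (size_t)(m->n - i) * sizeof m->e[0]);
        m->e[i].base = base;
        m->e[i].size = size;
        m->n++;
    }
    return 0;
}
```

Because the caller passes both base and size to `rmfree`, it can release just part of an earlier allocation, which is the property that distinguishes this interface from malloc/free. The linear scans in both functions are the cost noted for this scheme.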
offset_t rmalloc(size)
void rmfree(base, size)
Initially: <0,1024>
After rmalloc(256), rmalloc(320), rmfree(256,128): <256,128> <576,448>
After rmfree(128,128): <128,256> <576,448>
Later, after further allocations and frees (a fragmented map): <128,32> <288,64> <544,128> <832,32>
Resource Map -Good/Bad

Advantages:
– simple, easy to implement
– not restricted to memory allocation, any collection
of objects that are sequentially ordered and require
allocation and freeing in contiguous chunks.
– Can allocate exact size within any alignment
restrictions. Thus no internal fragmentation.
– Client may release portion of allocated memory.
– adjacent free regions are coalesced
• Disadvantages:
– Map may become highly fragmented, resulting in low
utilization. Poor for performing large requests.
– Resource map size increases with fragmentation
  static table will overflow
  dynamic table needs its own allocator
– Map must be sorted for free region coalescing. Sorting
operations are expensive.
– Requires linear search of map to find free region
that matches allocation request.
– Difficult to return borrowed pages to paging system.
Simple Power of Twos

has been used to implement malloc() and free() in the
user-level C library (libc).

Do you know how it is implemented?
Uses a set of free lists with each list storing a particular
size of buffer. Buffer sizes are a power of two.
Each buffer has a one word header
– when free, header stores pointer to next free list element
– when allocated, header stores pointer to associated free list
(where it is returned to when freed). Alternatively, header may
contain size of buffer

How to allocate?
– char *ptr = (char *) malloc(100);
How to free?
– char *ptr = (char *) malloc(100);
– free(ptr);
Extra FOUR bytes for a pointer or size
Free: pointer to the next free block
Used: size


free list
One word header per buffer (pointer)
– malloc(X): size = roundup(X + sizeof(header))
– roundup(Y) = 2^n, where 2^(n-1) < Y <= 2^n

free(buf) must free entire buffer.
Simple and reasonably fast
– eliminates linear searches and external fragmentation.
– Bounded time for allocations when buffers are
available
Familiar API
Simple to share buffers between kernel modules
since freeing a buffer does not require knowing
its size


Rounding requests to power of 2 results in wasted
memory and poor utilization.
– aggravated by requiring buffer headers since it is not unusual
for memory requests to already be a power-of-two.



no provision for coalescing free buffers since buffer
sizes are generally fixed.
no provision for borrowing pages from paging system
although some implementations do this.
no provision for returning unused buffers to page
allocator
Simple Power of Two
void *malloc(size_t size)
{
    int ndx = 0;                     /* free list index */
    size_t bufsize = 1 << MINPOWER;  /* size of smallest buffer */

    size += 4;                       /* add room for the one-word header */
    assert(size <= MAXBUFSIZE);
    while (bufsize < size) {         /* find the smallest bucket that fits */
        ndx++;
        bufsize <<= 1;
    }
    /* ndx is the index on the freelist array from which a buffer
     * will be allocated */
}
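
To show the header doing its job in both states, here is a self-contained sketch of the whole scheme. It is an illustration under assumptions, not the libc implementation: the names `p2_malloc`/`p2_free`, the bucket range, and the static arena standing in for pages borrowed from the paging system are all hypothetical. The header is one pointer-sized word, so it is 8 bytes rather than 4 on a 64-bit machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MINPOWER   5                    /* assumed smallest buffer: 32 bytes */
#define NLISTS     8                    /* assumed buckets: 32 .. 4096 bytes */
#define MAXBUFSIZE ((size_t)1 << (MINPOWER + NLISTS - 1))
#define ARENA_SIZE (64 * 1024)

/* One-word header: next-free pointer while free, list index while in use. */
union header {
    union header *next;
    uintptr_t ndx;
};

static union header *freelist[NLISTS];
static _Alignas(max_align_t) unsigned char arena[ARENA_SIZE];
static size_t arena_used;

/* Carve a fresh buffer from the arena (stand-in for borrowing pages). */
static union header *grow(size_t bufsize)
{
    if (arena_used + bufsize > ARENA_SIZE)
        return NULL;
    union header *h = (union header *)(arena + arena_used);
    arena_used += bufsize;
    return h;
}

static void *p2_malloc(size_t size)
{
    int ndx = 0;
    size_t bufsize = (size_t)1 << MINPOWER;

    size += sizeof(union header);       /* room for the header */
    assert(size <= MAXBUFSIZE);
    while (bufsize < size) {            /* round up to a power of two */
        ndx++;
        bufsize <<= 1;
    }
    union header *h = freelist[ndx];
    if (h != NULL)
        freelist[ndx] = h->next;        /* pop from the free list */
    else if ((h = grow(bufsize)) == NULL)
        return NULL;
    h->ndx = (uintptr_t)ndx;            /* remember which list owns it */
    return h + 1;                       /* caller's memory follows the header */
}

static void p2_free(void *buf)
{
    union header *h = (union header *)buf - 1;
    int ndx = (int)h->ndx;              /* no size argument needed */
    h->next = freelist[ndx];            /* push back on its own list */
    freelist[ndx] = h;
}
```

A request for 100 bytes lands in the 128-byte bucket, and freeing then reallocating the same size returns the same buffer. A request for exactly 128 bytes would spill into the 256-byte bucket because of the header, which is the waste noted on the disadvantages slide.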
Can we eliminate the need for the Extra FOUR bytes?
McKusick-Karels Allocator
/usr/src/sys/kern/kern_malloc.c





Improved power-of-two implementation
All buffers within a page must be of equal size
Adds page usage array, kmemsizes[], to manage pages
Managed memory must be contiguous pages
Does not require buffer headers to indicate buffer size.
When freeing memory, free(buff) simply masks off the
low-order bits to get the page address (actually the
page offset = pg), which is used as an index into the
kmemsizes array.
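
The per-page size lookup can be sketched as follows. This is a simplified model in the spirit of the description above, not the contents of kern_malloc.c: `mk_alloc`, `mk_free`, `mk_bufsize`, the 8-page pool, and the use of a plain size in `kmemsizes[]` are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  ((size_t)1 << PAGE_SHIFT)
#define NPAGES     8                    /* assumed small contiguous pool */
#define MINPOWER   4                    /* assumed smallest buffer: 16 bytes */
#define NLISTS     (PAGE_SHIFT - MINPOWER + 1)

struct freebuf { struct freebuf *next; };   /* link stored inside a free buffer */

static _Alignas(max_align_t) unsigned char kmem[NPAGES * PAGE_SIZE];
static size_t kmemsizes[NPAGES];        /* buffer size used by each page */
static struct freebuf *freelist[NLISTS];
static size_t next_page;

static int list_index(size_t size)
{
    int ndx = 0;
    while (((size_t)1 << (MINPOWER + ndx)) < size)
        ndx++;
    return ndx;
}

static void *mk_alloc(size_t size)
{
    assert(size > 0 && size <= PAGE_SIZE);
    int ndx = list_index(size);
    size_t bufsize = (size_t)1 << (MINPOWER + ndx);

    if (freelist[ndx] == NULL) {        /* dedicate a new page to this size */
        if (next_page == NPAGES)
            return NULL;
        unsigned char *page = kmem + next_page * PAGE_SIZE;
        kmemsizes[next_page++] = bufsize;
        for (size_t off = 0; off < PAGE_SIZE; off += bufsize) {
            struct freebuf *b = (struct freebuf *)(page + off);
            b->next = freelist[ndx];
            freelist[ndx] = b;
        }
    }
    struct freebuf *b = freelist[ndx];
    freelist[ndx] = b->next;
    return b;                           /* no header in front of the buffer */
}

/* Recover a buffer's size from its address alone. */
static size_t mk_bufsize(const void *buf)
{
    size_t pg = (size_t)((const unsigned char *)buf - kmem) >> PAGE_SHIFT;
    assert(pg < NPAGES);
    return kmemsizes[pg];
}

static void mk_free(void *buf)
{
    int ndx = list_index(mk_bufsize(buf));
    struct freebuf *b = buf;
    b->next = freelist[ndx];
    freelist[ndx] = b;
}
```

Since the size comes from the page's entry in `kmemsizes[]`, a 256-byte request really occupies 256 bytes, and a free buffer can hold its next pointer in its own storage.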
1 page = 2^12 (4K) bytes
Separate into 16 blocks of 2^8 bytes
Separate into 64 blocks of 2^6 bytes
On-Demand Page/kmem allocation
How would we know the size of this piece of memory?
free(ptr);
How to point to the next free block?
Used blocks: check the page#
Free blocks: pointer
McKusick-Karels Allocator
• Disadvantages:
 similar drawbacks to simple power-of-twos allocator
 vulnerable to burst-usage patterns since no provision
for moving buffers between lists
• Advantages:
 eliminates space wastage in common case where
allocation request is a power-of-two
 optimizes round-up computation and eliminates it if
size is known at compile time
The Buddy System

Another interesting power-of-2 memory
allocation used in Linux Kernel
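
A compact model of buddy allocation helps when following the splitting and merging. The sketch below assumes a 1024-byte pool tracked in 32-byte chunks, keeps the free lists as flag arrays, and always uses the lower half of a split block; the names are hypothetical and this is not the Linux implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MIN_SHIFT 5                     /* 32-byte chunks */
#define MAX_ORDER 5                     /* 32 << 5 = 1024-byte pool */
#define NCHUNKS   (1u << MAX_ORDER)

/* is_free[k][i] != 0: block i of size (32 << k) is on free list k. */
static unsigned char is_free[MAX_ORDER + 1][NCHUNKS];
static uint32_t bitmap;                 /* bit c set: chunk c is in use */

static void buddy_init(void)
{
    for (int k = 0; k <= MAX_ORDER; k++)
        for (unsigned i = 0; i < NCHUNKS; i++)
            is_free[k][i] = 0;
    is_free[MAX_ORDER][0] = 1;          /* one free block covering the pool */
    bitmap = 0;
}

static int order_for(size_t size)
{
    int k = 0;
    while (((size_t)1 << (MIN_SHIFT + k)) < size)
        k++;
    return k;
}

/* Bitmap mask covering block idx of order k. */
static uint32_t chunk_mask(unsigned idx, int k)
{
    unsigned n = 1u << k;               /* chunks per block */
    uint32_t ones = (n == 32) ? 0xFFFFFFFFu : ((1u << n) - 1u);
    return ones << (idx * n);
}

/* Returns the byte offset of the allocated block, or -1 if none fits. */
static long buddy_alloc(size_t size)
{
    assert(size > 0 && size <= ((size_t)1 << (MIN_SHIFT + MAX_ORDER)));
    int want = order_for(size);
    for (int k = want; k <= MAX_ORDER; k++) {
        for (unsigned i = 0; i < (NCHUNKS >> k); i++) {
            if (!is_free[k][i])
                continue;
            is_free[k][i] = 0;
            while (k > want) {          /* split: keep lower half, free upper */
                k--;
                i <<= 1;
                is_free[k][i + 1] = 1;
            }
            bitmap |= chunk_mask(i, want);
            return (long)i << (MIN_SHIFT + want);
        }
    }
    return -1;
}

/* The size is needed to find the block's order, and hence its buddy. */
static void buddy_release(long offset, size_t size)
{
    int k = order_for(size);
    unsigned i = (unsigned)(offset >> (MIN_SHIFT + k));
    bitmap &= ~chunk_mask(i, k);
    while (k < MAX_ORDER && is_free[k][i ^ 1u]) {   /* buddy free: merge */
        is_free[k][i ^ 1u] = 0;
        i >>= 1;
        k++;
    }
    is_free[k][i] = 1;
}
```

A block's buddy is found by flipping the lowest bit of its index at that order (`i ^ 1`), which is why release must be told the size: the same offset names different blocks at different orders.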
Buddy System example
Pool: 1024 bytes (offsets 0 to 1023); free lists for block sizes 32, 64, 128, 256, 512.
Bitmap: one bit per 32-byte chunk (1 = in use, 0 = free).
Requests: allocate(256), allocate(128), allocate(64), allocate(128), release(C, 128), release(D, 64)

Step                 Blocks                     Bitmap
initial              one free 1024-byte block   00000000 00000000 00000000 00000000
allocate(256) -> B   B, B', A'                  11111111 00000000 00000000 00000000
allocate(128) -> C   B, C, C', A'               11111111 11110000 00000000 00000000
allocate(64)  -> D   B, C, D, D', A'            11111111 11111100 00000000 00000000
allocate(128) -> F   B, C, D, D', F, F', E'     11111111 11111100 11110000 00000000
release(C, 128)      B, C, D, D', F, F', E'     11111111 00001100 11110000 00000000
release(D, 64)       B, B', F, F', E'           11111111 00000000 11110000 00000000

Splitting: the pool splits into A and A' (512 each); A into B and B' (256 each); B' into C and C' (128 each); C' into D and D' (64 each); A' into E and E' (256 each); E into F and F' (128 each).
Releasing a block: why "SIZE"? The size identifies the block and its buddy.
Merging free blocks: on release(D, 64), D merges with D' back into C', which merges with the free C back into B'.
sizeof(struct proc)?
452 bytes
 How should we allocate the memory?

– Power of 2 --- 512 bytes, not so bad
– IF (Internal Fragmentation): 60 bytes or 12%
struct with 300 bytes
IF?
 How many pages (4K per page) needed for
hosting 16 entries?

“Slab”
One or more pages for one slab
 One slab dedicated to ONE TYPE of
objects (with the same size)

– Breaking the power-of-2 rule
– Example, a 2-pages slab can hold 27 entities of
300 bytes (versus 16 entities using 512 bytes
blocks).
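
The 27-versus-16 comparison is plain integer arithmetic, sketched below. The helpers are hypothetical and ignore any per-slab bookkeeping a real slab allocator keeps, so they give an upper bound on objects per slab.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest power of two that is >= n (n >= 1). */
static size_t roundup_pow2(size_t n)
{
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* How many objects of obj_size fit in a slab of slab_bytes,
 * ignoring slab bookkeeping overhead (an assumption of this sketch). */
static size_t objs_per_slab(size_t slab_bytes, size_t obj_size)
{
    assert(obj_size > 0);
    return slab_bytes / obj_size;
}
```

For a 2-page (8192-byte) slab, `objs_per_slab(8192, 300)` is 27, while a power-of-two allocator first rounds 300 up to 512 and fits only `objs_per_slab(8192, roundup_pow2(300))`, which is 16.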
[Figure: Slab Allocator]
Design of Slab Allocator
cachep = kmem_cache_create (name, size, align, ctor, dtor);
[Figure: slab allocator layers]
Back end: page-level allocator
Caches: vnode cache, proc cache, mbuf cache, msgb cache
Front end: objects in use by the kernel (vnode, proc, mbuf, msgb)
“mmap”

Memory Mapped File
– Read/write versus direct memory access
– Sharing a file among multiple processes
Two modes: Shared or Private
– Applications?
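
The difference between the two modes shows up as soon as a mapping is written. The sketch below uses the POSIX calls `mkstemp`, `mmap`, `msync`, `munmap`, and `pread`; the helper `write_via_mapping`, the temp-file path, and the one-byte file are assumptions made for illustration, and it expects a POSIX system with a writable /tmp.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create a 1-byte temp file containing 'a', store 'X' through a mapping
 * made with the given flags (MAP_SHARED or MAP_PRIVATE), then read the
 * byte back from the file with pread(). Returns the byte, or -1 on error. */
static int write_via_mapping(int flags)
{
    char path[] = "/tmp/mmapdemoXXXXXX";    /* hypothetical temp-file template */
    int fd = mkstemp(path);
    int result = -1;

    if (fd < 0)
        return -1;
    unlink(path);                           /* file lives until fd is closed */
    if (write(fd, "a", 1) == 1) {
        char *p = mmap(NULL, 1, PROT_READ | PROT_WRITE, flags, fd, 0);
        if (p != MAP_FAILED) {
            assert(p[0] == 'a');            /* file contents visible as memory */
            p[0] = 'X';                     /* plain store, no write() call */
            msync(p, 1, MS_SYNC);           /* push shared changes to the file */
            munmap(p, 1);
            char c;
            if (pread(fd, &c, 1, 0) == 1)
                result = (unsigned char)c;
        }
    }
    close(fd);
    return result;
}
```

With `MAP_SHARED` the store reaches the file, so the read-back yields 'X' and other processes mapping the file would see it. With `MAP_PRIVATE` the store goes to a copy-on-write page, so the file still holds 'a'; that is the property a debugger can rely on when it patches a private mapping of a program's text.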
[Figure: FORK]
[Figure: Private Mapping: Debugging]