55:035 Computer Architecture and Organization
Lecture 8: Virtual Memory
Outline

Virtual Memory
- Basics
- Address Translation
- Cache vs. VM
- Paging
- Replacement
- TLBs
- Segmentation
- Page Tables
The Full Memory Hierarchy

[Figure: the memory hierarchy, from the fast, small, costly upper level to the large, slow, cheap lower level.]

Level           Capacity     Access Time              Cost
CPU Registers   100s Bytes   <10s ns                  --
Cache           K Bytes      10-100 ns                1-0.1 cents/bit
Main Memory     M Bytes      200 ns-500 ns            $.0001-.00001 cents/bit
Disk            G Bytes      10 ms (10,000,000 ns)    10^-5 - 10^-6 cents/bit
Tape            infinite     sec-min                  10^-8 cents/bit

Staging/transfer units between levels:
- Registers <-> Cache: instruction operands (1-8 bytes), managed by the program/compiler
- Cache <-> Main Memory: blocks (8-128 bytes), managed by the cache controller
- Main Memory <-> Disk: pages (4K-16K bytes), managed by the OS
- Disk <-> Tape: files (Mbytes), managed by the user/operator
Virtual Memory

Some facts of computer life…
- Computers run lots of processes simultaneously
- There is no full address space of memory for each process
- Many processes must share smaller amounts of physical memory

Virtual memory is the answer!
- It divides physical memory into blocks and assigns them to different processes
Virtual Memory

- Virtual memory (VM) allows main memory (DRAM) to act like a cache for secondary storage (magnetic disk).
- VM address translation provides a mapping from the virtual address of the processor to the physical address in main memory or on disk.
- The compiler assigns data to a "virtual" address. The VA is translated to a real/physical address somewhere in memory. (This allows any program to run anywhere; where it runs is determined by the particular machine and OS.)
VM Benefits

VM provides the following benefits:
- Allows multiple programs to share the same physical memory
- Allows programmers to write code as though they have a very large amount of main memory
- Automatically handles bringing in data from disk
Virtual Memory Basics

- Programs reference "virtual" addresses in a non-existent memory
  - These are then translated into real "physical" addresses
  - The virtual address space may be bigger than the physical address space
- Divide physical memory into blocks, called pages
  - Anywhere from 512 bytes to 16 MB (4 KB is typical)
- Virtual-to-physical translation is done by an indexed table lookup
  - Add another cache for recent translations (the TLB)
- Invisible to the programmer
  - Looks to your application like you have a lot of memory!
VM: Page Mapping

[Figure: pages of Process 1's and Process 2's virtual address spaces map to page frames in physical memory, or to locations on disk.]
VM: Address Translation

[Figure: a 32-bit virtual address is split into a 20-bit virtual page number and a 12-bit page offset (log2 of the page size). The virtual page number indexes a per-process page table, located by a page table base register; each entry holds a physical page number plus valid, protection, dirty, and reference bits. The physical page number is concatenated with the page offset to form the address sent to physical memory.]
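As a concrete illustration of the split in the figure, here is a minimal Python sketch. The 20/12 bit widths follow the slide; the page-table contents are hypothetical, made up for the example:

```python
PAGE_OFFSET_BITS = 12            # log2 of the 4 KB page size
PAGE_SIZE = 1 << PAGE_OFFSET_BITS

def translate(va, page_table):
    """Split a 32-bit virtual address and look up the physical page number."""
    vpn = va >> PAGE_OFFSET_BITS          # upper 20 bits: virtual page number
    offset = va & (PAGE_SIZE - 1)         # lower 12 bits: page offset
    entry = page_table[vpn]
    if not entry["valid"]:
        raise RuntimeError("page fault")  # page is on disk, not in memory
    return (entry["ppn"] << PAGE_OFFSET_BITS) | offset

# Hypothetical page table: VPN 0x10 maps to physical page 0x22
page_table = {0x10: {"valid": True, "ppn": 0x22}}
print(hex(translate(0x10020, page_table)))   # -> 0x22020
```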
Example of Virtual Memory

- Relieves the problem of making a program that was too large to fit in physical memory... well... fit!
- Allows a program to run in any location in physical memory (called relocation)
  - Really useful, as you might want to run the same program on lots of machines…

[Figure: the logical program is in contiguous VA space at virtual addresses 0, 4K, 8K, and 12K; it consists of 4 pages: A, B, C, D. Of the 4 pages, 3 (C, A, B) are in physical main memory and 1 (D) is located on the disk.]
Cache Terms vs. VM Terms

So, some definitions/"analogies":
- A "page" or "segment" of memory is analogous to a "block" in a cache
- A "page fault" or "address fault" is analogous to a cache miss
  - So, if we go to main memory ("real"/physical memory) and our data isn't there, we need to get it from disk…
More Definitions and Cache Comparisons

These are more definitions than analogies…
- With VM, the CPU produces "virtual addresses" that are translated by a combination of HW/SW to "physical addresses"
- The "physical addresses" access main memory
- The process described above is called "memory mapping" or "address translation"
Cache vs. VM Comparisons (1/2)

Parameter           First-level cache      Virtual memory
Block (page) size   16-128 bytes           4096-65,536 bytes
Hit time            1-2 clock cycles       40-100 clock cycles
Miss penalty        8-100 clock cycles     700,000-6,000,000 clock cycles
  (Access time)     (6-60 clock cycles)    (500,000-4,000,000 clock cycles)
  (Transfer time)   (2-40 clock cycles)    (200,000-2,000,000 clock cycles)
Miss rate           0.5-10%                0.00001-0.001%
Data memory size    0.016-1 MB             4 MB-4 GB
Cache vs. VM Comparisons (2/2)

- Replacement policy:
  - Replacement on cache misses is primarily controlled by hardware
  - Replacement with VM (i.e., which page do I replace?) is usually controlled by the OS
    - Because of the bigger miss penalty, we want to make the right choice
- Sizes:
  - The size of the processor address determines the size of VM
  - Cache size is independent of the processor address size
Virtual Memory

Timing's tough with virtual memory:

    AMAT = Tmem + (1 - h) * Tdisk
         = 100 ns + (1 - h) * 25,000,000 ns

h (the hit rate) has to be incredibly (almost unattainably) close to perfect for this to work. So: VM is a "cache", but an odd one.
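To see how unforgiving this equation is, here is a quick sketch that evaluates it for a few hit rates (the 100 ns and 25,000,000 ns figures are taken from the slide):

```python
T_MEM = 100              # main-memory access time in ns (from the slide)
T_DISK = 25_000_000      # disk access time in ns (from the slide)

def amat(hit_rate):
    """Average memory access time for a given hit rate."""
    return T_MEM + (1 - hit_rate) * T_DISK

for h in (0.99, 0.999, 0.99999, 0.9999999):
    print(f"h = {h}: AMAT = {amat(h):,.1f} ns")
# Even at h = 0.99999 the average access costs ~350 ns, 3.5x the DRAM
# latency; only near-perfect hit rates keep VM practical.
```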
Paging Hardware

[Figure: a 32-bit address from the CPU is split into a page number and an offset; the page number indexes the page table to obtain a frame number, which is combined with the offset to address physical memory.]

- How big is a page?
- How big is the page table?
Address Translation in a Paging System

[Figure: the running program issues a virtual address (page # and offset). A page table pointer register is added to the page # to select a page table entry in main memory; the frame # from that entry is combined with the offset to address the page frame in main memory.]
How Big is a Page Table?

Suppose:
- a 32-bit architecture
- a page size of 4 kilobytes

Therefore the 32-bit address

    0000 0000 0000 0000 0000 | 0000 0000 0000

splits into a page number of 2^20 possible values (upper 20 bits) and an offset of 2^12 bytes (lower 12 bits).
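Following the slide's numbers, a back-of-the-envelope sketch of the table size (the 4-byte, one-word entry size is an assumption; the slide does not specify it):

```python
ADDR_BITS = 32
PAGE_BITS = 12                       # 4 KB pages
PTE_BYTES = 4                        # assumed page table entry size (one word)

entries = 1 << (ADDR_BITS - PAGE_BITS)   # 2^20 virtual pages
table_bytes = entries * PTE_BYTES
print(f"{entries:,} entries -> {table_bytes >> 20} MB per process")
# 1,048,576 entries -> 4 MB per process, and that's one table per process!
```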
Test Yourself

A processor asks for the contents of virtual memory address 0x10020. The paging scheme in use breaks this into a VPN of 0x10 and an offset of 0x020.

PTR (a CPU register that holds the address of the page table) has a value of 0x100, indicating that this process's page table starts at location 0x100.

The machine uses word addressing and the page table entries are each one word long.

    PTR = 0x100    VPN = 0x010    OFFSET = 0x020
Test Yourself

ADDR      CONTENTS
0x00000   0x00000
0x00100   0x00010
0x00110   0x00022
0x00120   0x00045
0x00130   0x00078
0x00145   0x00010
0x10000   0x03333
0x10020   0x04444
0x22000   0x01111
0x22020   0x02222
0x45000   0x05555
0x45020   0x06666

    PTR = 0x100    VPN = 0x010    OFFSET = 0x020

What is the physical address calculated?
1. 10020
2. 22020
3. 45000
4. 45020
5. none of the above
Test Yourself

(Same memory contents and register values as the previous slide: PTR = 0x100, VPN = 0x010, OFFSET = 0x020.)

- What is the physical address calculated?
- What is the contents of this address returned to the processor?
- How many memory accesses in total were required to obtain the contents of the desired address?
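One way to check your answers is to replay the two-step lookup in code. This minimal sketch uses the memory contents from the table above, with word addressing and one-word PTEs as the slide states:

```python
# Memory contents from the slide (word-addressed).
mem = {
    0x00000: 0x00000, 0x00100: 0x00010, 0x00110: 0x00022,
    0x00120: 0x00045, 0x00130: 0x00078, 0x00145: 0x00010,
    0x10000: 0x03333, 0x10020: 0x04444, 0x22000: 0x01111,
    0x22020: 0x02222, 0x45000: 0x05555, 0x45020: 0x06666,
}

PTR, VPN, OFFSET = 0x100, 0x010, 0x020

pte = mem[PTR + VPN]             # memory access #1: fetch the page table entry
frame = pte                      # the PTE holds the frame number
pa = (frame << 12) | OFFSET      # concatenate frame number and offset
data = mem[pa]                   # memory access #2: fetch the actual data
print(hex(pa), hex(data))        # -> 0x22020 0x2222, using two memory accesses
```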
Another Example

[Figure: a 16-byte logical memory holding a-p, with 4-byte pages, is mapped through a page table to a 32-byte physical memory. The page table maps page 0 -> frame 5 (101), page 1 -> frame 6 (110), page 2 -> frame 1 (001), page 3 -> frame 2 (010). So i-l land at physical bytes 4-7, m-p at 8-11, a-d at 20-23, and e-h at 24-27.]
Replacement Policies
Block Replacement

Which block should be replaced on a virtual memory miss?
- Again, we'll stick with the strategy that it's a good thing to eliminate page faults
- Therefore, we want to replace the LRU block
  - Many machines use a "use" or "reference" bit
  - Periodically reset
  - Gives the OS an estimate of which pages are referenced (see the sketch below)
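A minimal sketch of the reference-bit approximation described above (the data structures and the 8-bit history register are illustrative assumptions in the style of the classic "aging" scheme, not a particular OS's implementation):

```python
class Page:
    def __init__(self):
        self.ref = False      # hardware sets this on every access
        self.history = 0      # OS-maintained usage history (8-bit shift reg)

def touch(page):
    page.ref = True           # what the paging hardware does on a reference

def os_scan(pages):
    """Run periodically: fold each reference bit into the history, then reset."""
    for p in pages:
        p.history = ((p.history >> 1) | (0x80 if p.ref else 0)) & 0xFF
        p.ref = False

def pick_victim(pages):
    """Smallest history ~= least recently used page."""
    return min(pages, key=lambda p: p.history)
```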
Writing a Block

What happens on a write?
- We don't even want to think about a write-through policy!
  - The time involved with accesses, VM, the hard disk, etc. is so great that this is not practical
- Instead, a write-back policy is used, with a dirty bit to tell if a block has been written
Mechanism vs. Policy

Mechanism:
- paging hardware
- trap on page fault

Policy:
- fetch policy: when should we bring in the pages of a process?
  1. Load all pages at the start of the process
  2. Load only on demand: "demand paging"
- replacement policy: which page should we evict given a shortage of frames?
Replacement Policy

Given a full physical memory, which page should we evict? What policy?
- Random
- FIFO: First-In-First-Out
- LRU: Least-Recently-Used
- MRU: Most-Recently-Used
- OPT: optimal (evict the page that will not be used for the farthest time in the future)
Replacement Policy Simulation

Example sequence of page numbers:

    0 1 2 3 42 2 37 1 2 3

- FIFO?
- LRU?
- OPT?
- How do you keep track of LRU info? (another data structure question; see the sketch below)
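A small sketch that replays the slide's reference string under FIFO and LRU. The frame count of 3 is an assumption (the slide doesn't fix one), and OPT is left out since it needs the future, which is exactly what the exercise asks you to reason about:

```python
from collections import OrderedDict

REFS = [0, 1, 2, 3, 42, 2, 37, 1, 2, 3]   # sequence from the slide
FRAMES = 3                                 # assumed physical frame count

def fifo(refs, frames):
    resident, order, faults = set(), [], 0
    for p in refs:
        if p not in resident:
            faults += 1
            if len(resident) == frames:
                resident.discard(order.pop(0))   # evict the oldest arrival
            resident.add(p)
            order.append(p)
    return faults

def lru(refs, frames):
    resident, faults = OrderedDict(), 0           # insertion order = recency
    for p in refs:
        if p in resident:
            resident.move_to_end(p)               # mark most recently used
        else:
            faults += 1
            if len(resident) == frames:
                resident.popitem(last=False)      # evict least recently used
            resident[p] = True
    return faults

print("FIFO faults:", fifo(REFS, FRAMES), "LRU faults:", lru(REFS, FRAMES))
```

Note the data-structure answer embedded here: an ordered dictionary (or equivalently a linked list moved-to-front on each access) is one common way to keep exact LRU information.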
Page Tables and Lookups…

1. It's slow! We've turned every access to memory into two accesses to memory
   - Solution: add a specialized "cache" called a "translation lookaside buffer (TLB)" inside the processor
2. It's still huge!
   - Even worse: we're ultimately going to have a page table for every process. Suppose 1024 processes; that's 4 GB of page tables!
Paging/VM (1/3)

[Figure: the CPU issues virtual address 42; the page table, kept by the operating system, maps it to physical address 356 in physical memory. Invalid entries (marked i) refer to pages that live on disk.]
Paging/VM (2/3)

[Figure: the same translation of 42 to 356, but now the page table itself is placed in physical memory.]

Place the page table in physical memory.
However: this doubles the time per memory access!!
Paging/VM (3/3)

[Figure: the same system, with a cache for translations sitting next to the CPU.]

A special-purpose cache for translations.
Historically called the TLB: Translation Lookaside Buffer.
Translation Cache

Just like any other cache, the TLB can be organized as fully associative, set associative, or direct mapped.

TLBs are usually small, typically not more than 128-256 entries even on high-end machines. This permits fully associative lookup on these machines. Most mid-range machines use small n-way set-associative organizations.

Note: 128-256 entries times 4 KB-16 KB/entry is only 512 KB-4 MB; the L2 cache is often bigger than the "span" of the TLB.

[Figure: translation with a TLB. The CPU's VA goes to a TLB lookup; on a hit, the PA goes to the cache; on a TLB miss, the full translation is performed first. On a cache miss, the PA goes to main memory, which returns the data.]
Translation Cache

A way to speed up translation is to use a special cache of recently used page table entries. This has many names, but the most frequently used is Translation Lookaside Buffer, or TLB.

[Figure: a TLB entry holds a tag (the virtual page #) plus the physical frame # and dirty, reference, valid, and access bits.]

It is really just a cache (a special-purpose cache) on the page table mappings, with a TLB access time comparable to cache access time (much less than main memory access time).
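A toy fully associative TLB in the spirit of the slides. The capacity and the simple oldest-entry eviction are illustrative choices, not a specific machine's design:

```python
class TLB:
    """Tiny fully associative TLB with oldest-fill eviction (illustrative)."""
    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = {}                 # vpn -> ppn mapping

    def lookup(self, vpn):
        return self.entries.get(vpn)      # None means TLB miss

    def fill(self, vpn, ppn):
        if len(self.entries) >= self.capacity:
            self.entries.pop(next(iter(self.entries)))   # evict oldest fill
        self.entries[vpn] = ppn

def translate(va, tlb, page_table, page_bits=12):
    vpn, offset = va >> page_bits, va & ((1 << page_bits) - 1)
    ppn = tlb.lookup(vpn)
    if ppn is None:                       # TLB miss: walk the page table
        ppn = page_table[vpn]             # (would page-fault if unmapped)
        tlb.fill(vpn, ppn)                # cache the translation for next time
    return (ppn << page_bits) | offset
```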
An Example of a TLB

[Figure: a fully associative TLB producing a 34-bit physical address. The virtual address supplies a 30-bit page frame address, used as the tag, and a 13-bit page offset. Each entry holds valid (<1>), read (<2>), and write (<2>) permission bits for read/write policies and permissions, a 30-bit tag, and a 21-bit physical address. A 32:1 mux selects the matching entry; the high-order 21 bits from the entry are concatenated with the low-order 13 offset bits to form the physical address.]
The "Big Picture" and TLBs

- Address translation is usually on the critical path…
  - …which determines the clock cycle time of the microprocessor
- Even in the simplest cache, TLB values must be read and compared
- The TLB is usually smaller and faster than the cache-address-tag memory
  - This way multiple TLB reads don't increase the cache hit time
- TLB accesses are usually pipelined because they're so important!
The "Big Picture" and TLBs

[Flow chart: a virtual address goes to a TLB access. On a TLB miss, stall and try to read the entry from the page table; a page fault means replacing a page from disk. On a TLB hit, a read tries the cache: a cache miss stalls, a cache hit delivers data to the CPU. On a write, the corresponding bit is set in the TLB and the cache/buffer memory write proceeds.]
Pages are Cached in a Virtual Memory System

We can ask the same four questions we did about caches.

- Q1: Block Placement
  - The choice: lower miss rates with complex placement, or vice versa
  - The miss penalty is huge, so choose a low miss rate ==> place a page anywhere in physical memory
  - Similar to a fully associative cache model
- Q2: Block Addressing - use an additional data structure
  - Fixed-size pages - use a page table
  - Virtual page number ==> physical page number, then concatenate the offset
  - A tag bit indicates presence in main memory
Normal Page Tables

- Size is the number of virtual pages
- Purpose is to hold the translation of VPN to PPN
  - Permits ease of page relocation
  - Make sure to keep tags to indicate that a page is mapped
- Potential problem:
  - Consider a 32-bit virtual address and 4 KB pages
  - 4 GB / 4 KB = 1 M words required just for the page table!
  - Might have to page in the page table…
    - Consider how the problem gets worse on 64-bit machines with even larger virtual address spaces!
  - Might use multi-level page tables (see the sketch below)
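A minimal two-level sketch of the multi-level idea. The 10/10/12 bit split is the common choice for 32-bit machines with 4 KB pages; it is an assumption here, not something the slide specifies:

```python
# Two-level translation: only the directory and the second-level tables
# actually in use need to exist, so sparse address spaces stay cheap.
L2_BITS, L1_BITS, OFF_BITS = 10, 10, 12           # assumed 10/10/12 split

def walk(va, directory):
    top = va >> (L1_BITS + OFF_BITS)              # index into the directory
    mid = (va >> OFF_BITS) & ((1 << L1_BITS) - 1) # index into 2nd-level table
    off = va & ((1 << OFF_BITS) - 1)
    table = directory.get(top)
    if table is None or mid not in table:
        raise RuntimeError("page fault")
    return (table[mid] << OFF_BITS) | off

# Hypothetical mapping: VA 0x00403123 -> frame 0x22
directory = {0x001: {0x003: 0x22}}
print(hex(walk(0x00403123, directory)))           # -> 0x22123
```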
Inverted Page Tables

- Similar to a set-associative mechanism
- Make the page table reflect the number of physical pages (not virtual)
- Use a hash mechanism
  - virtual page number ==> hashed page number, an index into the inverted page table
  - Compare the virtual page number with the tag to make sure it is the one you want
  - If yes: check that it is in memory - OK if yes; if not, page fault
  - If not: miss
    - Go to the full page table on disk to get the new entry
    - This implies 2 disk accesses in the worst case
    - It trades an increased worst-case penalty for a decrease in the capacity-induced miss rate, since there is now more room for real pages with the smaller page table
Inverted Page Table

[Figure: the page number is hashed to index the inverted page table, which only stores entries for pages in physical memory. Each entry holds a page tag, a frame number, and a valid bit; if the stored page tag matches (=) and the entry is valid (OK), the frame number is concatenated with the offset.]
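A toy sketch of the hashed lookup in the figure. The table size and the use of Python's built-in hash are illustrative; a real design also needs a way to handle hash collisions, such as chaining:

```python
N_FRAMES = 1024                      # one entry per physical frame

def ipt_lookup(vpn, ipt):
    """ipt is a list of (tag_vpn, frame, valid) slots indexed by a hash."""
    slot = ipt[hash(vpn) % N_FRAMES]
    if slot is not None:
        tag, frame, valid = slot
        if tag == vpn and valid:     # tag match confirms it's the right page
            return frame
    return None                      # miss: consult the full table on disk

ipt = [None] * N_FRAMES
ipt[hash(0x10) % N_FRAMES] = (0x10, 0x22, True)   # hypothetical entry
print(ipt_lookup(0x10, ipt))                      # -> 0x22
```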
Address Translation Reality

- The translation process using page tables takes too long!
- Use a cache to hold recent translations
  - Translation Lookaside Buffer
    - Typically 8-1024 entries
    - Block size same as a page table entry (1 or 2 words)
    - Only holds translations for pages in memory
    - 1-cycle hit time
    - Highly or fully associative
    - Miss rate < 1%
    - A miss goes to main memory (where the whole page table lives)
    - Must be purged on a process switch
Back to the 4 Questions

- Q3: Block Replacement (pages in physical memory)
  - LRU is best
    - So use it to minimize the horrible miss penalty
  - However, real LRU is expensive, so approximate it (as in the reference-bit sketch earlier):
    - The page table contains a use tag
    - On access, the use tag is set
    - The OS checks them every so often, records what it sees, and resets them all
    - On a miss, the OS decides who has been used the least
- Basic strategy: the miss penalty is so huge, you can spend a few OS cycles to help reduce the miss rate
Last Question

- Q4: Write Policy
  - Always write-back
    - Due to the access time of the disk
    - So, you need to keep tags to show when pages are dirty and need to be written back to disk when they're swapped out
    - Anything else is pretty silly
    - Remember - the disk is SLOW!
Page Sizes

An architectural choice.

Large pages are good:
- reduces page table size
- amortizes the long disk access
- if spatial locality is good, then the hit rate will improve

Large pages are bad:
- more internal fragmentation
  - if everything is random, each structure's last page is only half full
  - half of bigger is still bigger
  - if there are 3 structures per process (text, heap, and control stack), then 1.5 pages are wasted per process
- process start-up time takes longer
  - since at least 1 page of each type is required prior to start
  - the transfer-time penalty aspect is higher
More on TLBs

- The TLB must be on chip
  - otherwise it is worthless
  - small TLBs are worthless anyway
  - large TLBs are expensive
    - high associativity is likely
  - ==> the price of CPUs is going up!
    - OK as long as performance goes up faster
Selecting a Page Size

Reasons for a larger page size:
- Page table size is inversely proportional to the page size; memory is therefore saved
- A fast cache hit time is easy when cache size < page size (VA caches); a bigger page makes this feasible as cache size grows
- Transferring larger pages to or from secondary storage, possibly over a network, is more efficient
- The number of TLB entries is restricted by clock cycle time, so a larger page size maps more memory, thereby reducing TLB misses

Reasons for a smaller page size:
- Want to avoid internal fragmentation: don't waste storage; data must be contiguous within a page
- Quicker process start for small processes - don't need to bring in more memory than needed
Memory Protection

- With multiprogramming, a computer is shared by several programs or processes running concurrently
  - Need to provide protection
  - Need to allow sharing
- Mechanisms for providing protection (see the sketch after this list):
  - Provide base and bound registers: Base <= Address <= Bound
  - Provide both user and supervisor (operating system) modes
  - Provide CPU state that the user can read, but cannot write
    - base and bound registers, user/supervisor bit, exception bits
  - Provide a method to go from user to supervisor mode and vice versa
    - system call: user to supervisor
    - system return: supervisor to user
  - Provide permissions for each page or segment in memory
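A minimal sketch of the base-and-bound check from the list above. The register values are made up for illustration; a real machine performs this comparison in hardware on every access and traps to the OS on failure:

```python
BASE, BOUND = 0x1000, 0x1FFF        # hypothetical per-process registers

def check_access(addr):
    """Allow the access only if Base <= addr <= Bound."""
    if not (BASE <= addr <= BOUND):
        raise MemoryError(f"protection fault at {addr:#x}")  # trap to the OS
    return addr

check_access(0x1400)    # fine: inside the process's region
# check_access(0x2400)  # would raise a protection fault
```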
Pitfall: Address Space Too Small

- One of the biggest mistakes that can be made when designing an architecture is to devote too few bits to the address
  - Address size limits the size of virtual memory
  - Difficult to change, since many components depend on it (e.g., PC, registers, effective-address calculations)
- As program size increases, larger and larger address sizes are needed:
  - 8 bit:  Intel 8080   (1975)
  - 16 bit: Intel 8086   (1978)
  - 24 bit: Intel 80286  (1982)
  - 32 bit: Intel 80386  (1985)
  - 64 bit: Intel Merced (1998)
Virtual Memory Summary

- Virtual memory (VM) allows main memory (DRAM) to act like a cache for secondary storage (magnetic disk).
- The large miss penalty of virtual memory leads to different strategies from cache:
  - fully associative, TLB + PT, LRU, write-back
- Designed as:
  - paged: fixed-size blocks
  - segmented: variable-size blocks
  - hybrid: segmented paging or multiple page sizes
- Avoid small address sizes
Summary 2: Typical Choices

Option                 TLB                  L1 Cache          L2 Cache           VM (page)
Block Size             4-8 bytes (1 PTE)    4-32 bytes        32-256 bytes       4K-16K bytes
Hit Time               1 cycle              1-2 cycles        6-15 cycles        10-100 cycles
Miss Penalty           10-30 cycles         8-66 cycles       30-200 cycles      700K-6M cycles
Local Miss Rate        0.1-2%               0.5-20%           13-15%             0.00001-0.001%
Size                   32B-8KB              1-128 KB          256KB-16MB         --
Backing Store          L1 Cache             L2 Cache          DRAM               Disks
Q1: Block Placement    Fully or set assoc.  DM                DM or SA           Fully associative
Q2: Block ID           Tag/block            Tag/block         Tag/block          Table
Q3: Block Replacement  Random (not last)    N.A. for DM       Random (if SA)     LRU/LFU
Q4: Writes             Flush on PTE write   Through or back   Write-back         Write-back