Transcript: Multicore Programming
Virtual Memory
Next in memory hierarchy
Motivations:
  To remove the programming burden of a small, limited amount of main memory
  To allow efficient and safe sharing of memory among multiple programs
  The amount of main memory vs. the total amount of memory required by all the programs:
    only a fraction of the total required memory is active at any time
    bring only those active parts into physical memory
  Each program uses its own address space
  VM implements the mapping of a program's virtual address space to physical addresses
Motivations
  Use physical DRAM as a cache for the disk
    Address space of a process can exceed physical memory size
    Sum of address spaces of multiple processes can exceed physical memory
    Only "active" code and data is actually in memory
  Simplify memory management
    Multiple processes resident in main memory
    Each process with its own address space
    Allocate more memory to a process as needed
    Provide virtually contiguous memory space
  Provide protection
    One process can't interfere with another, because they operate in different address spaces
    User process cannot access privileged information
    Different sections of address spaces have different permissions
Motivation 1: Caching
  Use DRAM as a cache for the disk
  Full address space is quite large:
    32-bit addresses: 4 billion (~4 × 10^9) bytes
    64-bit addresses: 16 quintillion (~16 × 10^18) bytes
  Disk storage is ~300X cheaper than DRAM storage
    160 GB of DRAM: ~$32,000
    160 GB of disk: ~$100
  To access large amounts of data in a cost-effective manner, the bulk of the data must be stored on disk
  [Figure: memory hierarchy with costs — SRAM (4 MB: ~$500), DRAM (1 GB: ~$200), disk (160 GB: ~$100)]
DRAM vs. Disk
  DRAM vs. disk is more extreme than SRAM vs. DRAM
  Access latencies:
    DRAM: ~10X slower than SRAM
    Disk: ~100,000X slower than DRAM
  Importance of exploiting spatial locality:
    Disk: first byte is ~100,000X slower than successive bytes
    DRAM: ~4X improvement for fast page-mode vs. regular accesses
  Bottom line:
    Design decisions made for DRAM caches are driven by the enormous cost of misses
  [Figure: memory hierarchy with latencies — SRAM (4 MB), DRAM (1 GB: 10X slower), disk (160 GB: 100,000X slower)]
Cache for Disk
  Design parameters
    Line (block) size? — Large, since disk is better at transferring large blocks
    Associativity? (Block placement) — High, to minimize miss rate
    Block identification? — Using tables
    Write through or write back? (Write strategy) — Write back, since we can't afford to perform small writes to disk
    Block replacement? — Evict the block least likely to be used in the future, approximated by LRU (Least Recently Used)
  Impact of design decisions for DRAM cache (main memory)
    Miss rate: extremely low, << 1%
    Hit time: match DRAM performance
    Miss latency: very high, ~20 ms
    Tag storage overhead: low, relative to block size
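The LRU replacement policy above can be sketched in a few lines of Python. This is an illustration only (the frame count and page numbers are made up, and a real OS only approximates LRU in hardware/software):

```python
from collections import OrderedDict

class LRUPageCache:
    """Tracks which virtual pages are resident; evicts the least recently used."""
    def __init__(self, capacity):
        self.capacity = capacity          # number of physical page frames
        self.pages = OrderedDict()        # resident pages, oldest first

    def access(self, vpage):
        """Returns 'hit' or 'miss'; on a miss, evicts the LRU page if full."""
        if vpage in self.pages:
            self.pages.move_to_end(vpage) # mark as most recently used
            return "hit"
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        self.pages[vpage] = True
        return "miss"

cache = LRUPageCache(capacity=2)
results = [cache.access(p) for p in [1, 2, 1, 3, 2]]
# With 2 frames: pages 1 and 2 miss, 1 hits, then 3 and 2 each evict the LRU page.
```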
Locating an Object
  DRAM Cache
    Each allocated page of virtual memory has an entry in the page table
    Mapping from virtual pages to physical pages
    Page table entry exists even if the page is not in memory
      Specifies the disk address
      Page fault handler places a page from disk in DRAM
    OS retrieves information
  [Figure: page table mapping object names to locations — e.g., object X's entry points to a slot in "Main Memory" holding its data, while object J's entry says "On Disk" with its disk address]
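The lookup on this slide can be sketched as a table whose entries either name a main-memory slot or a disk address. All concrete names and values below are illustrative, not a real OS layout:

```python
# Page table: every allocated object/page has an entry, even if it is on disk.
page_table = {
    "X": {"location": "memory", "frame": 1},    # resident: entry names a memory slot
    "J": {"location": "disk", "disk_addr": 0},  # not resident: entry holds a disk address
}
main_memory = {0: 243, 1: 17}   # contents of "Main Memory" slots

def lookup(name):
    entry = page_table[name]
    if entry["location"] == "memory":
        return main_memory[entry["frame"]]
    # The fault handler would place the page in DRAM here; we just report it.
    raise KeyError(f"page fault: {name} is on disk at address {entry['disk_addr']}")

value = lookup("X")      # resident object: data comes straight from memory
try:
    lookup("J")          # on-disk object: triggers the fault path
    fault = None
except KeyError as e:
    fault = str(e)
```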
System with Physical Memory Only
  Examples
    Most Cray machines, early PCs, nearly all embedded systems, etc.
  Addresses generated by the CPU correspond directly to bytes in physical memory
  [Figure: CPU issuing physical addresses 0 to N-1 directly into memory]
System with Virtual Memory
  Examples
    Workstations, servers, modern PCs, etc.
  Address translation: hardware converts virtual addresses to physical addresses via an OS-managed lookup table (page table)
  [Figure: CPU issues virtual addresses 0 to N-1; the page table maps them to physical addresses 0 to P-1 in memory, or to locations on disk]
Page Fault (Cache Miss)
  What if an object is on disk rather than in memory?
    Page table entry indicates the virtual address is not in memory
    OS exception handler is invoked to move data from disk into memory
      Current process suspends; others can resume
      OS has full control over placement, etc.
  [Figure: before the fault, the page table entry points to disk; after the fault, the entry points to the page's new location in memory]
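The fault sequence above can be sketched as a toy demand-paging loop. Everything here is illustrative (two frames, invented page numbers, a naive victim choice standing in for "OS has full control over placement"):

```python
# Minimal demand-paging sketch; not a real OS API.
page_table = {}                   # vpage -> frame number (resident pages only)
free_frames = [0, 1]              # two physical page frames
disk = {7: "A", 8: "B", 9: "C"}   # pages currently living on disk

def access(vpage):
    if vpage in page_table:       # translation succeeds: page is resident
        return page_table[vpage]
    # Page fault: the OS exception handler moves data from disk into memory.
    if not free_frames:
        victim, frame = next(iter(page_table.items()))  # OS picks a victim
        del page_table[victim]    # victim's entry now points back to disk
        free_frames.append(frame)
    frame = free_frames.pop()
    _data = disk[vpage]           # "read" the page contents from disk
    page_table[vpage] = frame     # after the fault, the entry points to memory
    return frame

f1 = access(7)    # fault: page 7 is loaded into a free frame
f2 = access(7)    # hit: same frame, no disk access
```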
Servicing a Page Fault
  (1) Processor signals the I/O controller
    Read block of length P starting at disk address X and store it starting at memory address Y
  (2) Read occurs
    Direct Memory Access (DMA)
    Under control of the I/O controller
  (3) I/O controller signals completion
    Interrupts the processor
    OS resumes the suspended process
  [Figure: the processor initiates the block read; the DMA transfer moves data from disk across the memory-I/O bus into memory; the I/O controller interrupts the processor when the read is done]
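The three numbered steps can be sketched as a toy event sequence. The function names and addresses are purely illustrative; in a real system the OS drives a device driver and the transfer happens in hardware:

```python
# Toy trace of the three steps of servicing a page fault.
log = []

def initiate_block_read(disk_addr, mem_addr, length):
    # (1) Processor signals the controller: read `length` bytes at `disk_addr`,
    #     store them starting at `mem_addr`.
    log.append("1: processor signals controller (initiate block read)")
    dma_transfer(disk_addr, mem_addr, length)

def dma_transfer(disk_addr, mem_addr, length):
    # (2) The transfer runs under control of the I/O controller, not the CPU.
    log.append("2: DMA transfer disk -> memory")
    read_done()

def read_done():
    # (3) Completion is signalled by interrupting the processor;
    #     the OS then resumes the suspended process.
    log.append("3: interrupt: read done, OS resumes suspended process")

initiate_block_read(disk_addr=0x10, mem_addr=0x2000, length=4096)
```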
Motivation 2: Management
  Multiple processes can reside in physical memory
  Simplifying linking
    Each process sees the same basic format of its memory image
  [Figure: Linux/x86 process memory image, from high to low addresses — kernel virtual memory (invisible to user code); stack (%esp); memory-mapped region for shared libraries; runtime heap (via malloc, grown with the "brk" ptr); uninitialized data (.bss); initialized data (.data); program text (.text); forbidden region at address 0]
Virtual Address Space
  Virtual and physical address spaces are divided into equal-sized blocks
    Blocks are called "pages" (both virtual and physical)
  Each process has its own virtual address space (separate page table)
  OS controls how virtual pages are assigned to physical memory
  Two processes can share physical pages by mapping them in their page tables (e.g., read-only library code)
  [Figure: process 1's virtual pages VP 1 and VP 2 and process 2's virtual pages map into physical pages (PP 2, PP 7, PP 10) of the physical address space (DRAM); one physical page is mapped by both processes, e.g., read-only library code]
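Sharing via page tables can be sketched as two per-process mappings that happen to name the same physical page. The exact page numbers below are illustrative:

```python
# Two separate per-process page tables (separate virtual address spaces).
# Both map one virtual page to the same physical page, e.g. read-only
# library code; the other mappings are private.
page_table_p1 = {"VP 1": "PP 2",  "VP 2": "PP 7"}
page_table_p2 = {"VP 1": "PP 10", "VP 2": "PP 7"}   # VP 2 maps to the shared page

# Physical pages mapped by both processes:
shared = set(page_table_p1.values()) & set(page_table_p2.values())
```

Note that the shared page can sit at different virtual addresses in each process; only the physical page is common.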
Memory Management
  Allocating, deallocating, and moving memory
    Can be done by manipulating page tables
  Allocating contiguous chunks of memory
    Can map a contiguous range of virtual addresses to disjoint ranges of physical addresses
  Loading executable binaries
    Just fix up page tables for the process
    Data in the binaries are paged in on demand
  Protection
    Store protection information in page table entries
    Usually checked by hardware
Motivation 3: Protection
  Page table entry contains access rights information
  Hardware enforces this protection (traps into the OS if a violation occurs)

  Page tables (one per process):

    Process i:   SUP?  Read?  Write?  Physical Addr
      VP 0:      No    Yes    No      PP 9
      VP 1:      No    Yes    Yes     PP 4
      VP 2:      Yes   No     No      XXXXXXX

    Process j:   SUP?  Read?  Write?  Physical Addr
      VP 0:      No    Yes    Yes     PP 6
      VP 1:      Yes   Yes    No      PP 9
      VP 2:      No    No     No      XXXXXXX
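The hardware check implied by these page-table entries can be sketched as a small function over the SUP/Read/Write bits. The dictionary layout is an illustration, not a real PTE format:

```python
# Protection bits for two of process i's pages (values from the slide's table).
pte_vp0 = {"sup": False, "read": True,  "write": False, "ppn": 9}   # VP 0
pte_vp2 = {"sup": True,  "read": False, "write": False, "ppn": None}  # VP 2

def check_access(pte, op, user_mode):
    """Returns 'ok' if the access is allowed, 'trap' on a protection violation."""
    if pte["sup"] and user_mode:
        return "trap"            # supervisor-only page touched by user code
    if op == "read" and not pte["read"]:
        return "trap"
    if op == "write" and not pte["write"]:
        return "trap"
    return "ok"

r = check_access(pte_vp0, "read",  user_mode=True)   # read-only page: allowed
w = check_access(pte_vp0, "write", user_mode=True)   # write to read-only page
s = check_access(pte_vp2, "read",  user_mode=True)   # user touches SUP page
```

On a "trap" result, real hardware transfers control to the OS, which typically kills the offending process.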
VM Address Translation
  Page size = 2^12 bytes = 4 KB
  # Physical pages = 2^18 => 2^(12+18) bytes = 1 GB main memory
  Virtual address space = 2^32 bytes = 4 GB
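The sizes above multiply out as follows (a quick check of the slide's arithmetic):

```python
# Sizes from the slide: 2^12-byte pages, 2^18 physical pages, 32-bit virtual addresses.
page_size      = 2 ** 12                       # 4 KB per page
num_phys_pages = 2 ** 18                       # number of physical pages
phys_mem       = page_size * num_phys_pages    # 2^(12+18) = 2^30 bytes = 1 GB
virt_space     = 2 ** 32                       # 2^32 bytes = 4 GB

# The virtual address space is 4x larger than physical memory,
# so not every virtual page can be resident at once.
ratio = virt_space // phys_mem
```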
VM Address Translation: Hit
  [Figure: the processor issues virtual address a; the hardware address translation mechanism — part of the on-chip memory management unit (MMU) — produces physical address a', which goes to main memory]
VM Address Translation: Miss
  [Figure: the processor issues virtual address a; the hardware address translation mechanism (part of the on-chip MMU) detects a page fault and invokes the fault handler; the OS transfers the page from secondary memory into main memory (only on a miss), after which translation yields physical address a']
Page Table
  Virtual address (n bits): bits n-1..p hold the virtual page number (VPN); bits p-1..0 hold the page offset
  The page table base register locates the table; the VPN acts as the table index
  Each entry holds a valid bit, access bits, and a physical page number (PPN); if valid = 0, the page is not in memory
  Physical address (m bits): the PPN (bits m-1..p) concatenated with the unchanged page offset (bits p-1..0)
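The VPN/offset split described above is plain bit manipulation. A sketch for 2^12-byte pages (the page-table contents are invented for illustration):

```python
# Bit-level address translation: the VPN indexes the page table,
# the page offset passes through to the physical address unchanged.
PAGE_OFFSET_BITS = 12                      # 4 KB pages

page_table = {
    0x00005: (True,  0x12345),             # vpn -> (valid, ppn)
    0x00006: (False, None),                # valid = 0: page not in memory
}

def translate(vaddr):
    vpn    = vaddr >> PAGE_OFFSET_BITS
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    valid, ppn = page_table[vpn]
    if not valid:
        raise KeyError("page fault")       # valid = 0 => page not in memory
    return (ppn << PAGE_OFFSET_BITS) | offset

pa = translate(0x5ABC)                     # VPN 0x5, offset 0xABC
try:
    translate(0x6000)                      # VPN 0x6 has valid = 0
    fault = False
except KeyError:
    fault = True
```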
Multi-Level Page Table
  Given:
    4 KB (2^12) page size
    32-bit address space
    4-byte PTE
  Problem:
    Would need a 4 MB page table! (2^20 entries × 4 bytes)
  Common solution:
    Multi-level page tables
    e.g., 2-level table (P6)
      Level 1 table: 1024 entries, each of which points to a Level 2 page table
      Level 2 table: 1024 entries, each of which points to a page
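The two-level walk splits the 32-bit address into a 10-bit Level 1 index, a 10-bit Level 2 index, and a 12-bit page offset (1024 × 1024 × 4 KB = 4 GB). The table contents below are invented for illustration:

```python
# Two-level page-table walk for a 32-bit virtual address.
def split(vaddr):
    l1  = (vaddr >> 22) & 0x3FF    # top 10 bits: Level 1 index
    l2  = (vaddr >> 12) & 0x3FF    # next 10 bits: Level 2 index
    off = vaddr & 0xFFF            # low 12 bits: page offset
    return l1, l2, off

level2 = {3: 0x77}                 # Level 2 entry: index -> physical page number
level1 = {1: level2}               # Level 1 entry: index -> a Level 2 table
                                   # (missing entries = whole 4 MB region unmapped)

def walk(vaddr):
    l1, l2, off = split(vaddr)
    ppn = level1[l1][l2]           # two memory references instead of one
    return (ppn << 12) | off

pa = walk((1 << 22) | (3 << 12) | 0x1A)
```

Each table has 1024 four-byte entries, i.e. exactly 4 KB, so every level fits in one page, and Level 2 tables for unused regions need never be allocated.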
TLB
  "Translation Lookaside Buffer" (TLB)
    Small hardware cache in the MMU
    Maps virtual page numbers to physical page numbers
    Contains complete page table entries for a small number of pages
  [Figure: the CPU sends a virtual address to the TLB; on a TLB hit, translation yields the physical address, which goes to the cache; on a TLB miss the page table is consulted, and on a cache miss main memory supplies the data]
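A tiny fully associative TLB in front of the page table can be sketched as follows. The entry count, page-table contents, and LRU refill policy are illustrative choices, not a specific hardware design:

```python
from collections import OrderedDict

page_table = {v: v + 100 for v in range(8)}   # vpn -> ppn (invented mapping)
tlb = OrderedDict()                           # small cache of recent translations
TLB_ENTRIES = 2
hits = misses = 0

def lookup(vpn):
    global hits, misses
    if vpn in tlb:                    # TLB hit: no page-table access needed
        hits += 1
        tlb.move_to_end(vpn)
        return tlb[vpn]
    misses += 1                       # TLB miss: walk the page table, then refill
    if len(tlb) >= TLB_ENTRIES:
        tlb.popitem(last=False)       # evict the least recently used entry
    tlb[vpn] = page_table[vpn]
    return tlb[vpn]

for v in [1, 1, 2, 1, 3]:             # repeated pages hit; new pages miss
    lookup(v)
```

Because programs reuse pages heavily (locality), even a very small TLB catches most translations.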
Address Translation with TLB
  [Figure: the virtual address splits into a virtual page number and page offset; the VPN is matched against the TLB's valid/tag/physical-page-number entries; on a TLB hit, the resulting physical address splits into a cache tag, index, and byte offset to access the cache, whose own tag match (valid + tag =) yields a cache hit and the data]
MIPS R2000 TLB
  [Figure: the 32-bit virtual address splits into a 20-bit virtual page number (bits 31-12) and a 12-bit page offset (bits 11-0); each TLB entry holds valid and dirty bits, a tag, and a 20-bit physical page number; on a TLB hit, the physical page number concatenated with the page offset forms the physical address, which splits into a 16-bit physical address tag, a 14-bit cache index, and a 2-bit byte offset to access the cache, producing 32-bit data on a cache hit]
VM Summary
Programmer’s View
Large “flat” address space
Process “owns” machine
Can allocate large blocks of contiguous addresses
Has private address space
Unaffected by behavior of other processes
System View
Virtual address space is created by mapping virtual pages to a set of physical pages
Need not be contiguous
Allocated dynamically
Enforce protection during address translation
OS manages many processes simultaneously
Continually switching among processes
Especially when one must wait for resources (e.g., disk I/O to handle page faults)