Transcript: Multicore Programming
Virtual Memory

- The next level in the memory hierarchy
- Motivations:
  - to remove the programming burden of a small, limited amount of main memory
  - to allow efficient and safe sharing of memory among multiple programs
- The amount of main memory vs. the total amount of memory required by all the programs: only a fraction of the total required memory is active at any time, so bring only the active parts into physical memory
- Each program uses its own address space
- VM implements the mapping of a program's virtual address space to physical addresses

Motivations

- Use physical DRAM as a cache for the disk
  - The address space of one process can exceed the physical memory size
  - The sum of the address spaces of multiple processes can exceed physical memory
  - Only "active" code and data are actually in memory
- Simplify memory management
  - Multiple processes are resident in main memory, each with its own address space
  - Allocate more memory to a process as needed
  - Provide a virtually contiguous memory space
- Provide protection
  - One process can't interfere with another, because they operate in different address spaces
  - A user process cannot access privileged information
  - Different sections of an address space have different permissions

Motivation 1: Caching

- Use DRAM as a cache for the disk
- The full address space is quite large:
  - 32-bit addresses: ~4 billion (~4 x 10^9) bytes
  - 64-bit addresses: ~16 quintillion (~1.6 x 10^19) bytes
- Disk storage is ~300X cheaper than DRAM storage:
  - 160 GB of DRAM: ~$32,000
  - 160 GB of disk: ~$100
- To access large amounts of data in a cost-effective manner, the bulk of the data must be stored on disk
[Figure: memory hierarchy with rough prices: SRAM (4 MB: ~$500), DRAM (1 GB: ~$200), disk (160 GB: ~$100)]

DRAM vs. Disk

- DRAM vs. disk is more extreme than SRAM vs. DRAM
- Access latencies:
  - DRAM: ~10X slower than SRAM
  - Disk: ~100,000X slower than DRAM
- Importance of exploiting spatial locality:
  - Disk: the first byte is ~100,000X slower than successive bytes
  - DRAM: ~4X improvement for fast page mode vs. regular accesses
- Bottom line: design decisions for DRAM caches are driven by the enormous cost of misses

Cache for Disk

Design parameters:
- Line (block) size? Large, since disk is better at transferring large blocks
- Associativity (block placement)? High, to minimize the miss rate
- Block identification? Using tables
- Write-through or write-back (write strategy)? Write back, since we can't afford to perform small writes to disk
- Block replacement? Ideally, evict the block not to be used in the future; in practice, based on LRU (Least Recently Used)

Impact of these design decisions on the DRAM cache (main memory):
- Miss rate: extremely low, << 1%
- Hit time: must match DRAM performance
- Miss latency: very high, ~20 ms
- Tag storage overhead: low, relative to the block size

Locating an Object in the DRAM Cache

- Each allocated page of virtual memory has an entry in the page table
- The page table maps virtual pages to physical pages
- A page table entry exists even if the page is not in memory; in that case it specifies the disk address
- The page fault handler places the page from disk into DRAM; the OS retrieves the needed information (see the sketch below)
[Figure: a page table maps object names to locations, e.g., D in main memory (value 243), X in main memory (value 17), J on disk]

System with Physical Memory Only

- Examples: most Cray machines, early PCs, nearly all embedded systems, etc.
- Addresses generated by the CPU correspond directly to bytes in physical memory
[Figure: the CPU issues physical addresses 0 .. N-1 straight to memory]
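To make the lookup from "Locating an Object" concrete before moving on, here is a minimal C sketch. It is illustrative only: the pte_t layout, the locate() helper, and the sample table contents are assumptions for this transcript, not any real OS's structures.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical page table entry, one per virtual page. If the page is
 * resident, 'location' is a physical page number; otherwise it is the
 * disk address where the page lives. */
typedef struct {
    bool     resident;   /* is the page cached in DRAM? */
    uint64_t location;   /* PPN if resident, disk address if not */
} pte_t;

/* Look up a virtual page number; a miss here is a page fault. */
uint64_t locate(const pte_t *page_table, uint64_t vpn) {
    const pte_t *pte = &page_table[vpn];
    if (pte->resident)
        return pte->location;  /* hit: page is in DRAM */
    /* a real fault handler would read the page from disk into DRAM,
     * update the PTE, and retry the access */
    printf("page fault: VPN %" PRIu64 " is on disk at %" PRIu64 "\n",
           vpn, pte->location);
    return UINT64_MAX;
}

int main(void) {
    pte_t table[4] = {
        { true,  0   },   /* VP 0 resident in physical page 0 */
        { false, 105 },   /* VP 1 out on disk */
    };
    locate(table, 0);     /* hit */
    locate(table, 1);     /* fault */
    return 0;
}
```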
System with Virtual Memory

- Examples: workstations, servers, modern PCs, etc.
- Address translation: hardware converts virtual addresses to physical addresses via an OS-managed lookup table (the page table)
[Figure: the CPU issues virtual addresses 0 .. N-1; the page table maps them to physical addresses 0 .. P-1 in memory, or to locations on disk]

Page Fault (Cache Miss)

- What if an object is on disk rather than in memory?
- The page table entry indicates that the virtual address is not in memory
- An OS exception handler is invoked to move the data from disk into memory
  - The current process suspends; others can resume
  - The OS has full control over placement, etc.
[Figure: before the fault, the page table entry points to disk; after the fault, it points to the page newly placed in memory]

Servicing a Page Fault

- (1) The processor signals the I/O controller: read a block of length P starting at disk address X and store it starting at memory address Y
- (2) The read occurs as a Direct Memory Access (DMA) transfer, under control of the I/O controller
- (3) The I/O controller signals completion: it interrupts the processor, and the OS resumes the suspended process
(These three steps are sketched in code after the slides below.)

Motivation 2: Memory Management

- Multiple processes can reside in physical memory
- Simplifying linking: each process sees the same basic format of its memory image
[Figure: Linux/x86 process memory image, from low to high addresses: a forbidden region, program text (.text), initialized data (.data), uninitialized data (.bss), the runtime heap (via malloc, bounded by the "brk" pointer), the memory-mapped region for shared libraries, the stack (growing down from %esp), and kernel virtual memory (invisible to user code)]

Separate Virtual Address Spaces

- Virtual and physical address spaces are divided into equal-sized blocks, called "pages" (both virtual and physical)
- Each process has its own virtual address space (and its own page table)
- The OS controls how virtual pages are assigned to physical memory
- Two processes can share physical pages by mapping them in both page tables (e.g., read-only library code)
[Figure: process 1 maps VP 1 to PP 2 and VP 2 to PP 7; process 2 maps VP 1 to PP 10 and VP 2 to PP 7, so PP 7 is shared]

Memory Management

- Allocating, deallocating, and moving memory: can be done by manipulating page tables
- Allocating contiguous chunks of memory: a contiguous range of virtual addresses can map to disjoint ranges of physical addresses
- Loading executable binaries: just fix the page tables for the process; data in the binaries is paged in on demand
- Protection: store protection information in the page table entries, usually checked by hardware

Motivation 3: Protection

- A page table entry contains access-rights information
- Hardware enforces this protection (traps into the OS if a violation occurs), for example:

             SUP?  Read?  Write?  Physical Addr
  Process i:
    VP 0:    No    Yes    No      PP 9
    VP 1:    No    Yes    Yes     PP 4
    VP 2:    Yes   No     No      XXXXXXX
  Process j:
    VP 0:    No    Yes    Yes     PP 6
    VP 1:    Yes   Yes    No      PP 9
    VP 2:    No    No     No      XXXXXXX

(The check implied by this table is sketched in code below.)

VM Address Translation

- Page size = 2^12 bytes = 4 KB
- Number of physical pages = 2^18, so main memory = 2^(12+18) = 2^30 bytes = 1 GB
- Virtual address space = 2^32 bytes = 4 GB
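The three steps from "Servicing a Page Fault" above can be summarized in a hedged C sketch. The arrays and service_page_fault() are invented stand-ins, and memcpy plays the role of the DMA engine; in a real system the transfer is performed by the I/O controller, not the CPU.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Invented stand-ins for the hardware involved in the transfer. */
static uint8_t disk[16][PAGE_SIZE];   /* disk blocks */
static uint8_t dram[4][PAGE_SIZE];    /* physical page frames */

void service_page_fault(unsigned disk_block, unsigned frame) {
    /* (1) Processor signals the controller: read a block of length
     *     PAGE_SIZE at disk address disk_block, store it at frame. */
    printf("(1) initiate block read: disk block %u -> frame %u\n",
           disk_block, frame);

    /* (2) The read occurs by DMA under the I/O controller's control;
     *     memcpy stands in for the DMA engine here. */
    memcpy(dram[frame], disk[disk_block], PAGE_SIZE);

    /* (3) The controller interrupts the processor; the OS can now
     *     resume the suspended process. */
    printf("(3) read done: interrupt processor, resume process\n");
}

int main(void) {
    service_page_fault(5, 2);
    return 0;
}
```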
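And here is a minimal sketch of the hardware check implied by the protection table above, assuming a hypothetical pte_t with the SUP/Read/Write bits; a real MMU would trap into the OS on a violation rather than return a flag.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical PTE carrying the access-rights bits from the table. */
typedef struct {
    bool     sup;    /* accessible in supervisor mode only? */
    bool     read;   /* readable? */
    bool     write;  /* writable? */
    uint32_t ppn;    /* physical page number */
} pte_t;

/* The check the hardware performs on every access; a violation
 * traps into the OS, modeled here as returning false. */
bool access_ok(const pte_t *pte, bool user_mode, bool is_write) {
    if (pte->sup && user_mode) return false;  /* privileged page */
    return is_write ? pte->write : pte->read;
}

int main(void) {
    /* Process i, VP 1 from the table: SUP = No, Read = Yes, Write = Yes. */
    pte_t vp1 = { .sup = false, .read = true, .write = true, .ppn = 4 };
    printf("user write allowed? %d\n", access_ok(&vp1, true, true)); /* 1 */
    return 0;
}
```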
VM Address Translation: Hit

- The processor issues a virtual address a
- The hardware address-translation mechanism (part of the on-chip memory management unit, MMU) converts it to a physical address a' for main memory

VM Address Translation: Miss

- On a miss, the hardware raises a page fault and the fault handler runs
- The OS performs the transfer from secondary memory to main memory (only on a miss), and the translation then completes as in the hit case

Page Table

- A page table base register points to the table; the virtual page number (VPN) acts as the table index
- Virtual address (n bits): VPN in bits n-1 .. p, page offset in bits p-1 .. 0
- Each entry holds a valid bit, access bits, and a physical page number (PPN); if valid = 0, the page is not in memory
- Physical address (m bits): PPN in bits m-1 .. p, concatenated with the unchanged page offset in bits p-1 .. 0
(The address arithmetic is sketched in code at the end of this section.)

Multi-Level Page Table

- Given: 4 KB (2^12) page size, a 32-bit address space, and 4-byte PTEs
- Problem: a single-level table would need 2^20 x 4 bytes = 4 MB of page table per process!
- Common solution: multi-level page tables, e.g., a 2-level table (as in the Intel P6):
  - Level 1 table: 1024 entries, each of which points to a Level 2 page table
  - Level 2 table: 1024 entries, each of which points to a page
(A table-walk sketch appears at the end of this section.)

TLB

- A "Translation Lookaside Buffer" (TLB) is a small hardware cache in the MMU
- It maps virtual page numbers to physical page numbers
- It contains complete page table entries for a small number of pages (see the lookup sketch at the end of this section)
[Figure: the CPU sends the virtual address to the TLB; on a hit, the physical address goes straight to the cache/main memory; on a miss, the page table in main memory supplies the translation]

Address Translation with a TLB

[Figure: the virtual address splits into VPN and page offset; the VPN is compared against the TLB's valid bits and tags; on a TLB hit, the PPN plus the page offset forms the physical address, which is then split into cache tag, index, and byte offset for the cache lookup]

MIPS R2000 TLB

[Figure: the 32-bit virtual address splits into a 20-bit virtual page number and a 12-bit page offset; each TLB entry holds valid and dirty bits, a 20-bit tag, and a 20-bit physical page number; the resulting 32-bit physical address splits into a 16-bit physical address tag, a 14-bit cache index, and a 2-bit byte offset, and the cache returns 32-bit data on a hit]

VM Summary

Programmer's view:
- A large "flat" address space: can allocate large blocks of contiguous addresses
- The process "owns" the machine: it has a private address space, unaffected by the behavior of other processes

System view:
- The virtual address space is created by mapping to a set of pages; it need not be contiguous and is allocated dynamically
- Protection is enforced during address translation
- The OS manages many processes simultaneously, continually switching among them, especially when one must wait for resources (e.g., disk I/O to handle page faults)
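To ground the Page Table slide above, here is a small C sketch of the address arithmetic, using the sizes from the "VM Address Translation" slide (4 KB pages, 32-bit virtual addresses); the helper names are illustrative only.

```c
#include <stdint.h>
#include <stdio.h>

/* Sizes from the slides: page size = 2^12 = 4 KB, so the low 12 bits
 * of an address are the page offset and the rest are the VPN. */
#define PAGE_BITS 12

uint32_t vpn(uint32_t va)    { return va >> PAGE_BITS; }
uint32_t offset(uint32_t va) { return va & ((1u << PAGE_BITS) - 1); }

/* Translation replaces the VPN with the PPN and keeps the offset. */
uint32_t make_pa(uint32_t ppn, uint32_t off) {
    return (ppn << PAGE_BITS) | off;
}

int main(void) {
    uint32_t va = 0x12345678;
    /* VPN = 0x12345, offset = 0x678 */
    printf("VPN = 0x%x, offset = 0x%x\n", vpn(va), offset(va));
    return 0;
}
```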
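The 2-level organization from the Multi-Level Page Table slide can likewise be sketched as a table walk. The structures below are simplified assumptions (a real PTE also carries access bits); the 10/10/12 bit split matches the 1024-entry tables and 4 KB pages described above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Layout from the slide: 10-bit level-1 index, 10-bit level-2 index,
 * 12-bit page offset; each table has 1024 entries. */
typedef struct { uint32_t valid : 1, ppn : 20; } pte_t;
typedef struct { pte_t *tables[1024]; } l1_table_t;  /* NULL = absent */

/* Returns the physical address, or -1 standing in for a page fault. */
int64_t walk(const l1_table_t *l1, uint32_t va) {
    uint32_t i1  = (va >> 22) & 0x3FF;  /* bits 31..22: L1 index */
    uint32_t i2  = (va >> 12) & 0x3FF;  /* bits 21..12: L2 index */
    uint32_t off = va & 0xFFF;          /* bits 11..0: page offset */
    const pte_t *l2 = l1->tables[i1];
    if (l2 == NULL || !l2[i2].valid)
        return -1;                      /* page fault */
    return ((int64_t)l2[i2].ppn << 12) | off;
}

int main(void) {
    static l1_table_t l1;               /* all L2 pointers start NULL */
    static pte_t l2[1024];
    l2[3] = (pte_t){ .valid = 1, .ppn = 7 };
    l1.tables[1] = l2;                  /* map the region around 0x00400000 */
    /* VA 0x00403ABC: i1 = 1, i2 = 3, off = 0xABC -> PA 0x7ABC */
    printf("PA = 0x%llx\n", (long long)walk(&l1, 0x00403ABC));
    return 0;
}
```

The space saving comes from allocating Level 2 tables lazily: address-space regions that are never touched cost only a NULL pointer in the Level 1 table.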
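Finally, a sketch of the lookup from the TLB slides: try the small fully associative cache first, and fall back to the page-table walk only on a miss. The entry count, the naive refill policy, and the page_table_walk() stub are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TLB_ENTRIES 16   /* small, fully associative, as on the slide */

typedef struct {
    bool     valid;
    uint32_t vpn;   /* tag: which virtual page this entry maps */
    uint32_t ppn;
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];

/* Stub standing in for the full page-table walk sketched above. */
static uint32_t page_table_walk(uint32_t vpn) { return vpn & 0x3FFFF; }

uint32_t translate(uint32_t va) {
    uint32_t vpn = va >> 12, off = va & 0xFFF;
    for (int i = 0; i < TLB_ENTRIES; i++)
        if (tlb[i].valid && tlb[i].vpn == vpn)
            return (tlb[i].ppn << 12) | off;        /* TLB hit */
    uint32_t ppn = page_table_walk(vpn);            /* TLB miss */
    tlb_entry_t *victim = &tlb[vpn % TLB_ENTRIES];  /* naive replacement */
    *victim = (tlb_entry_t){ .valid = true, .vpn = vpn, .ppn = ppn };
    return (ppn << 12) | off;
}

int main(void) {
    printf("PA = 0x%x (miss)\n", translate(0x00403ABC));
    printf("PA = 0x%x (hit)\n",  translate(0x00403ABC));
    return 0;
}
```

Because the TLB holds complete page table entries, a hit avoids the memory accesses of the table walk entirely; only a miss pays for the lookup in main memory.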