Transcript: Multicore Programming
Virtual Memory

- The next level in the memory hierarchy
- Motivations:
  - to remove the programming burden of a small, limited amount of main memory
  - to allow efficient and safe sharing of memory among multiple programs
- The amount of main memory vs. the total amount of memory required by all the programs: only a fraction of the total required memory is active at any time, so bring only the active parts into physical memory
- Each program uses its own address space
- VM implements the mapping of a program's virtual address space to physical addresses

Motivations

- Use physical DRAM as a cache for the disk
  - The address space of one process can exceed the physical memory size
  - The sum of the address spaces of multiple processes can exceed physical memory
  - Only "active" code and data are actually in memory
- Simplify memory management
  - Multiple processes are resident in main memory, each with its own address space
  - Allocate more memory to a process as needed
  - Provide a virtually contiguous memory space
- Provide protection
  - One process can't interfere with another, because they operate in different address spaces
  - A user process cannot access privileged information
  - Different sections of an address space have different permissions

Motivation 1: Caching

- Use DRAM as a cache for the disk
- The full address space is quite large:
  - 32-bit addresses: ~4 billion (~4 x 10^9) bytes
  - 64-bit addresses: ~16 quintillion (~1.6 x 10^19) bytes
- Disk storage is ~300X cheaper than DRAM storage:
  - 160 GB of DRAM: ~$32,000
  - 160 GB of disk: ~$100
- To access large amounts of data in a cost-effective manner, the bulk of the data must be stored on disk
[Figure: memory hierarchy with rough prices: SRAM (4 MB: ~$500), DRAM (1 GB: ~$200), disk (160 GB: ~$100)]

DRAM vs. Disk

- DRAM vs. disk is more extreme than SRAM vs. DRAM
- Access latencies:
  - DRAM: ~10X slower than SRAM
  - Disk: ~100,000X slower than DRAM
- Importance of exploiting spatial locality:
  - Disk: the first byte is ~100,000X slower than successive bytes
  - DRAM: ~4X improvement for fast page mode vs. regular accesses
- Bottom line: design decisions for DRAM caches are driven by the enormous cost of misses

Cache for Disk

Design parameters:
- Line (block) size? Large, since disk is better at transferring large blocks
- Associativity (block placement)? High, to minimize the miss rate
- Block identification? Using tables
- Write-through or write-back (write strategy)? Write back, since we can't afford to perform small writes to disk
- Block replacement? Ideally, evict the block not to be used in the future; in practice, based on LRU (Least Recently Used)

Impact of these design decisions on the DRAM cache (main memory):
- Miss rate: extremely low, << 1%
- Hit time: must match DRAM performance
- Miss latency: very high, ~20 ms
- Tag storage overhead: low, relative to the block size

Locating an Object in the DRAM Cache

- Each allocated page of virtual memory has an entry in the page table
- The page table maps virtual pages to physical pages
- A page table entry exists even if the page is not in memory; in that case it specifies the disk address
- The page fault handler places the page from disk into DRAM; the OS retrieves the needed information (see the sketch below)
[Figure: a page table maps object names to locations, e.g., D in main memory (value 243), X in main memory (value 17), J on disk]

System with Physical Memory Only

- Examples: most Cray machines, early PCs, nearly all embedded systems, etc.
- Addresses generated by the CPU correspond directly to bytes in physical memory
[Figure: the CPU issues physical addresses 0 .. N-1 straight to memory]
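To make the lookup from "Locating an Object" concrete before moving on, here is a minimal C sketch. It is illustrative only: the pte_t layout, the locate() helper, and the sample table contents are assumptions for this transcript, not any real OS's structures.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical page table entry, one per virtual page. If the page is
 * resident, 'location' is a physical page number; otherwise it is the
 * disk address where the page lives. */
typedef struct {
    bool     resident;   /* is the page cached in DRAM? */
    uint64_t location;   /* PPN if resident, disk address if not */
} pte_t;

/* Look up a virtual page number; a miss here is a page fault. */
uint64_t locate(const pte_t *page_table, uint64_t vpn) {
    const pte_t *pte = &page_table[vpn];
    if (pte->resident)
        return pte->location;  /* hit: page is in DRAM */
    /* a real fault handler would read the page from disk into DRAM,
     * update the PTE, and retry the access */
    printf("page fault: VPN %" PRIu64 " is on disk at %" PRIu64 "\n",
           vpn, pte->location);
    return UINT64_MAX;
}

int main(void) {
    pte_t table[4] = {
        { true,  0   },   /* VP 0 resident in physical page 0 */
        { false, 105 },   /* VP 1 out on disk */
    };
    locate(table, 0);     /* hit */
    locate(table, 1);     /* fault */
    return 0;
}
```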
System with Virtual Memory

- Examples: workstations, servers, modern PCs, etc.
- Address translation: hardware converts virtual addresses to physical addresses via an OS-managed lookup table (the page table)
[Figure: the CPU issues virtual addresses 0 .. N-1; the page table maps them to physical addresses 0 .. P-1 in memory, or to locations on disk]

Page Fault (Cache Miss)

- What if an object is on disk rather than in memory?
- The page table entry indicates that the virtual address is not in memory
- An OS exception handler is invoked to move the data from disk into memory
  - The current process suspends; others can resume
  - The OS has full control over placement, etc.
[Figure: before the fault, the page table entry points to disk; after the fault, it points to the page newly placed in memory]

Servicing a Page Fault

- (1) The processor signals the I/O controller: read a block of length P starting at disk address X and store it starting at memory address Y
- (2) The read occurs as a Direct Memory Access (DMA) transfer, under control of the I/O controller
- (3) The I/O controller signals completion: it interrupts the processor, and the OS resumes the suspended process
(These three steps are sketched in code after the slides below.)

Motivation 2: Memory Management

- Multiple processes can reside in physical memory
- Simplifying linking: each process sees the same basic format of its memory image
[Figure: Linux/x86 process memory image, from low to high addresses: a forbidden region, program text (.text), initialized data (.data), uninitialized data (.bss), the runtime heap (via malloc, bounded by the "brk" pointer), the memory-mapped region for shared libraries, the stack (growing down from %esp), and kernel virtual memory (invisible to user code)]

Separate Virtual Address Spaces

- Virtual and physical address spaces are divided into equal-sized blocks, called "pages" (both virtual and physical)
- Each process has its own virtual address space (and its own page table)
- The OS controls how virtual pages are assigned to physical memory
- Two processes can share physical pages by mapping them in both page tables (e.g., read-only library code)
[Figure: process 1 maps VP 1 to PP 2 and VP 2 to PP 7; process 2 maps VP 1 to PP 10 and VP 2 to PP 7, so PP 7 is shared]

Memory Management

- Allocating, deallocating, and moving memory: can be done by manipulating page tables
- Allocating contiguous chunks of memory: a contiguous range of virtual addresses can map to disjoint ranges of physical addresses
- Loading executable binaries: just fix the page tables for the process; data in the binaries is paged in on demand
- Protection: store protection information in the page table entries, usually checked by hardware

Motivation 3: Protection

- A page table entry contains access-rights information
- Hardware enforces this protection (traps into the OS if a violation occurs), for example:

             SUP?  Read?  Write?  Physical Addr
  Process i:
    VP 0:    No    Yes    No      PP 9
    VP 1:    No    Yes    Yes     PP 4
    VP 2:    Yes   No     No      XXXXXXX
  Process j:
    VP 0:    No    Yes    Yes     PP 6
    VP 1:    Yes   Yes    No      PP 9
    VP 2:    No    No     No      XXXXXXX

(The check implied by this table is sketched in code below.)

VM Address Translation

- Page size = 2^12 bytes = 4 KB
- Number of physical pages = 2^18, so main memory = 2^(12+18) = 2^30 bytes = 1 GB
- Virtual address space = 2^32 bytes = 4 GB
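The three steps from "Servicing a Page Fault" above can be summarized in a hedged C sketch. The arrays and service_page_fault() are invented stand-ins, and memcpy plays the role of the DMA engine; in a real system the transfer is performed by the I/O controller, not the CPU.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Invented stand-ins for the hardware involved in the transfer. */
static uint8_t disk[16][PAGE_SIZE];   /* disk blocks */
static uint8_t dram[4][PAGE_SIZE];    /* physical page frames */

void service_page_fault(unsigned disk_block, unsigned frame) {
    /* (1) Processor signals the controller: read a block of length
     *     PAGE_SIZE at disk address disk_block, store it at frame. */
    printf("(1) initiate block read: disk block %u -> frame %u\n",
           disk_block, frame);

    /* (2) The read occurs by DMA under the I/O controller's control;
     *     memcpy stands in for the DMA engine here. */
    memcpy(dram[frame], disk[disk_block], PAGE_SIZE);

    /* (3) The controller interrupts the processor; the OS can now
     *     resume the suspended process. */
    printf("(3) read done: interrupt processor, resume process\n");
}

int main(void) {
    service_page_fault(5, 2);
    return 0;
}
```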
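And here is a minimal sketch of the hardware check implied by the protection table above, assuming a hypothetical pte_t with the SUP/Read/Write bits; a real MMU would trap into the OS on a violation rather than return a flag.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical PTE carrying the access-rights bits from the table. */
typedef struct {
    bool     sup;    /* accessible in supervisor mode only? */
    bool     read;   /* readable? */
    bool     write;  /* writable? */
    uint32_t ppn;    /* physical page number */
} pte_t;

/* The check the hardware performs on every access; a violation
 * traps into the OS, modeled here as returning false. */
bool access_ok(const pte_t *pte, bool user_mode, bool is_write) {
    if (pte->sup && user_mode) return false;  /* privileged page */
    return is_write ? pte->write : pte->read;
}

int main(void) {
    /* Process i, VP 1 from the table: SUP = No, Read = Yes, Write = Yes. */
    pte_t vp1 = { .sup = false, .read = true, .write = true, .ppn = 4 };
    printf("user write allowed? %d\n", access_ok(&vp1, true, true)); /* 1 */
    return 0;
}
```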
VM Address Translation: Hit

- The processor issues a virtual address a
- The hardware address-translation mechanism (part of the on-chip memory management unit, MMU) converts it to a physical address a' for main memory

VM Address Translation: Miss

- On a miss, the hardware raises a page fault and the fault handler runs
- The OS performs the transfer from secondary memory to main memory (only on a miss), and the translation then completes as in the hit case

Page Table

- A page table base register points to the table; the virtual page number (VPN) acts as the table index
- Virtual address (n bits): VPN in bits n-1 .. p, page offset in bits p-1 .. 0
- Each entry holds a valid bit, access bits, and a physical page number (PPN); if valid = 0, the page is not in memory
- Physical address (m bits): PPN in bits m-1 .. p, concatenated with the unchanged page offset in bits p-1 .. 0
(The address arithmetic is sketched in code at the end of this section.)

Multi-Level Page Table

- Given: 4 KB (2^12) page size, a 32-bit address space, and 4-byte PTEs
- Problem: a single-level table would need 2^20 x 4 bytes = 4 MB of page table per process!
- Common solution: multi-level page tables, e.g., a 2-level table (as in the Intel P6):
  - Level 1 table: 1024 entries, each of which points to a Level 2 page table
  - Level 2 table: 1024 entries, each of which points to a page
(A table-walk sketch appears at the end of this section.)

TLB

- A "Translation Lookaside Buffer" (TLB) is a small hardware cache in the MMU
- It maps virtual page numbers to physical page numbers
- It contains complete page table entries for a small number of pages (see the lookup sketch at the end of this section)
[Figure: the CPU sends the virtual address to the TLB; on a hit, the physical address goes straight to the cache/main memory; on a miss, the page table in main memory supplies the translation]

Address Translation with a TLB

[Figure: the virtual address splits into VPN and page offset; the VPN is compared against the TLB's valid bits and tags; on a TLB hit, the PPN plus the page offset forms the physical address, which is then split into cache tag, index, and byte offset for the cache lookup]

MIPS R2000 TLB

[Figure: the 32-bit virtual address splits into a 20-bit virtual page number and a 12-bit page offset; each TLB entry holds valid and dirty bits, a 20-bit tag, and a 20-bit physical page number; the resulting 32-bit physical address splits into a 16-bit physical address tag, a 14-bit cache index, and a 2-bit byte offset, and the cache returns 32-bit data on a hit]

VM Summary

Programmer's view:
- A large "flat" address space: can allocate large blocks of contiguous addresses
- The process "owns" the machine: it has a private address space, unaffected by the behavior of other processes

System view:
- The virtual address space is created by mapping to a set of pages; it need not be contiguous and is allocated dynamically
- Protection is enforced during address translation
- The OS manages many processes simultaneously, continually switching among them, especially when one must wait for resources (e.g., disk I/O to handle page faults)
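To ground the Page Table slide above, here is a small C sketch of the address arithmetic, using the sizes from the "VM Address Translation" slide (4 KB pages, 32-bit virtual addresses); the helper names are illustrative only.

```c
#include <stdint.h>
#include <stdio.h>

/* Sizes from the slides: page size = 2^12 = 4 KB, so the low 12 bits
 * of an address are the page offset and the rest are the VPN. */
#define PAGE_BITS 12

uint32_t vpn(uint32_t va)    { return va >> PAGE_BITS; }
uint32_t offset(uint32_t va) { return va & ((1u << PAGE_BITS) - 1); }

/* Translation replaces the VPN with the PPN and keeps the offset. */
uint32_t make_pa(uint32_t ppn, uint32_t off) {
    return (ppn << PAGE_BITS) | off;
}

int main(void) {
    uint32_t va = 0x12345678;
    /* VPN = 0x12345, offset = 0x678 */
    printf("VPN = 0x%x, offset = 0x%x\n", vpn(va), offset(va));
    return 0;
}
```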
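The 2-level organization from the Multi-Level Page Table slide can likewise be sketched as a table walk. The structures below are simplified assumptions (a real PTE also carries access bits); the 10/10/12 bit split matches the 1024-entry tables and 4 KB pages described above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Layout from the slide: 10-bit level-1 index, 10-bit level-2 index,
 * 12-bit page offset; each table has 1024 entries. */
typedef struct { uint32_t valid : 1, ppn : 20; } pte_t;
typedef struct { pte_t *tables[1024]; } l1_table_t;  /* NULL = absent */

/* Returns the physical address, or -1 standing in for a page fault. */
int64_t walk(const l1_table_t *l1, uint32_t va) {
    uint32_t i1  = (va >> 22) & 0x3FF;  /* bits 31..22: L1 index */
    uint32_t i2  = (va >> 12) & 0x3FF;  /* bits 21..12: L2 index */
    uint32_t off = va & 0xFFF;          /* bits 11..0: page offset */
    const pte_t *l2 = l1->tables[i1];
    if (l2 == NULL || !l2[i2].valid)
        return -1;                      /* page fault */
    return ((int64_t)l2[i2].ppn << 12) | off;
}

int main(void) {
    static l1_table_t l1;               /* all L2 pointers start NULL */
    static pte_t l2[1024];
    l2[3] = (pte_t){ .valid = 1, .ppn = 7 };
    l1.tables[1] = l2;                  /* map the region around 0x00400000 */
    /* VA 0x00403ABC: i1 = 1, i2 = 3, off = 0xABC -> PA 0x7ABC */
    printf("PA = 0x%llx\n", (long long)walk(&l1, 0x00403ABC));
    return 0;
}
```

The space saving comes from allocating Level 2 tables lazily: address-space regions that are never touched cost only a NULL pointer in the Level 1 table.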
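Finally, a sketch of the lookup from the TLB slides: try the small fully associative cache first, and fall back to the page-table walk only on a miss. The entry count, the naive refill policy, and the page_table_walk() stub are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TLB_ENTRIES 16   /* small, fully associative, as on the slide */

typedef struct {
    bool     valid;
    uint32_t vpn;   /* tag: which virtual page this entry maps */
    uint32_t ppn;
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];

/* Stub standing in for the full page-table walk sketched above. */
static uint32_t page_table_walk(uint32_t vpn) { return vpn & 0x3FFFF; }

uint32_t translate(uint32_t va) {
    uint32_t vpn = va >> 12, off = va & 0xFFF;
    for (int i = 0; i < TLB_ENTRIES; i++)
        if (tlb[i].valid && tlb[i].vpn == vpn)
            return (tlb[i].ppn << 12) | off;        /* TLB hit */
    uint32_t ppn = page_table_walk(vpn);            /* TLB miss */
    tlb_entry_t *victim = &tlb[vpn % TLB_ENTRIES];  /* naive replacement */
    *victim = (tlb_entry_t){ .valid = true, .vpn = vpn, .ppn = ppn };
    return (ppn << 12) | off;
}

int main(void) {
    printf("PA = 0x%x (miss)\n", translate(0x00403ABC));
    printf("PA = 0x%x (hit)\n",  translate(0x00403ABC));
    return 0;
}
```

Because the TLB holds complete page table entries, a hit avoids the memory accesses of the table walk entirely; only a miss pays for the lookup in main memory.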