
CS740 Virtual Memory October 2, 2000

Topics

• Motivations for VM
• Address Translation
• Accelerating with TLBs
• Alpha 21X64 memory system

Motivation 1: DRAM a “Cache” for Disk

The full address space is quite large:

• 32-bit addresses: ~4,000,000,000 (4 billion) bytes
• 64-bit addresses: ~16,000,000,000,000,000,000 (16 quintillion) bytes

Disk storage is ~30X cheaper than DRAM storage

• 8 GB of DRAM: ~$12,000
• 8 GB of disk: ~$400

To access large amounts of data in a cost-effective manner, the bulk of the data must be stored on disk.

[Figure: storage pyramid — SRAM: 4 MB ≈ $400, DRAM: 256 MB ≈ $400, Disk: 8 GB ≈ $400.]

– 2 –

Levels in Memory Hierarchy

[Figure: memory hierarchy — CPU registers, cache, main memory, and disk, with 8 B transfers between registers and cache, 32 B blocks between cache and memory, and 8 KB pages between memory and disk. The SRAM cache caches DRAM; virtual memory treats DRAM as a cache for disk.]

              Register   Cache        Memory     Disk memory
size:         200 B      32 KB–4 MB   128 MB     20 GB
speed:        3 ns       6 ns         60 ns      8 ms
$/Mbyte:                 $100/MB      $1.50/MB   $0.05/MB
block size:   8 B        32 B         8 KB

Each level is larger, slower, and cheaper per byte than the one above it.

– 3 –

DRAM vs. SRAM as a “Cache”

DRAM vs. disk is more extreme than SRAM vs. DRAM

• access latencies:
  – DRAM is ~10X slower than SRAM
  – disk is ~100,000X slower than DRAM
• importance of exploiting spatial locality:
  – first byte is ~100,000X slower than successive bytes on disk
    » vs. ~4X improvement for page-mode vs. regular accesses to DRAM
• “cache” size:
  – main memory is ~100X larger than an SRAM cache
• addressing for disk is based on sector address, not memory address


– 4 –

Impact of These Properties on Design

If DRAM were organized like an SRAM cache, how would we set the following design parameters?

• Line size?
• Associativity?
• Replacement policy (if associative)?
• Write-through or write-back?

What would the impact of these choices be on:

• miss rate
• hit time
• miss latency
• tag overhead

– 5 –

Locating an Object in a “Cache”

1. Search for matching tag

SRAM cache

[Figure: the object name X is compared (= X?) against every tag in the “cache”; the tags (D, X, …, J) sit alongside the data (243, 17, …, 105) in slots 0 … N–1.]

2. Use indirection to look up actual object location

virtual memory

[Figure: the object name X indexes a lookup table that records each object's location (D: 0, …, X: 1, …, J: N–1); that location then selects the slot in the “cache” holding the data (243, 17, …, 105).]

– 6 –
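To make the contrast above concrete, here is a small C sketch added for this transcript (not from the original slides): the first function performs the tag search an SRAM cache does in hardware, the second performs the table-based indirection that virtual memory uses. The struct layout, array sizes, and names are illustrative only.

    #include <stdint.h>
    #include <stddef.h>

    #define N 8                      /* number of "cache" slots (illustrative) */

    /* Scheme 1: SRAM-style cache -- search every slot for a matching tag. */
    typedef struct { uint32_t tag; int valid; int data; } Slot;

    int cache_lookup(const Slot cache[N], uint32_t name, int *data)
    {
        for (size_t i = 0; i < N; i++) {          /* compare name against each tag */
            if (cache[i].valid && cache[i].tag == name) {
                *data = cache[i].data;
                return 1;                          /* hit */
            }
        }
        return 0;                                  /* miss */
    }

    /* Scheme 2: virtual-memory style -- a lookup table maps the name directly
     * to the slot that holds the object, so no tag search is needed. */
    int table_lookup(const int location[], const int slots[], uint32_t name)
    {
        return slots[location[name]];              /* one level of indirection */
    }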

A System with Physical Memory Only

Examples:

most Cray machines, early PCs, nearly all embedded systems, etc.

[Figure: the CPU issues Store 0x10 and Load 0xf0 directly to memory locations 0 … N–1.]

The CPU's load and store addresses are used directly to access memory.

– 7 –

A System with Virtual Memory

Examples:

workstations, servers, modern PCs, etc.

[Figure: the CPU issues virtual addresses (Store 0x10, Load 0xf0); a page table maps virtual addresses 0 … N–1 to physical memory addresses 0 … P–1 or to locations on disk.]

Address Translation: the hardware converts virtual addresses into physical addresses via an OS-managed lookup table (the page table)

– 8 –

Page Faults (Similar to “Cache Misses”)

What if an object is on disk rather than in memory?

• Page table entry indicates that the virtual address is not in memory
• An OS trap handler is invoked, moving data from disk into memory
  – current process suspends, others can resume
  – OS has full control over placement, etc.

[Figure: the CPU issues Load 0x05 and Store 0xf8 as virtual addresses; the page table maps each either to a physical memory address (0 … P–1) or to a location on disk.]

– 9 –

Servicing a Page Fault

Processor Signals Controller

• Read block of length P starting at disk address X and store it starting at memory address Y

Read Occurs

• Direct Memory Access (DMA)
• Under control of I/O controller

I/O Controller Signals Completion

• Interrupt processor
• Can resume suspended process

[Figure: (1) the processor initiates the block read over the memory–I/O bus; (2) the I/O controller performs a DMA transfer from disk to memory; (3) the controller signals “read done” with an interrupt.]

– 10 –

Motivation #2: Memory Management

Multiple processes can reside in physical memory.

How do we resolve address conflicts?

e.g., what if two different Alpha processes access their stacks at address 0x11fffff80 at the same time?

[Figure: (virtual) memory image for an Alpha process, from top to bottom — Reserved (0000 03FF 8000 0000), Not yet allocated, Dynamic Data, Static Data ($gp), Text (Code), Stack ($sp, near 0000 0001 2000 0000), Not yet allocated (down to 0000 0000 0001 0000), Reserved.]

– 11 –

Soln: Separate Virtual Addr. Spaces

• Virtual and physical address spaces divided into equal-sized blocks
  – “Pages” (both virtual and physical)
• Each process has its own virtual address space
  – operating system controls how virtual pages are assigned to physical memory

[Figure: address translation maps Process 1's virtual pages VP 1 and VP 2 and Process 2's virtual pages VP 1 and VP 2 onto physical pages PP 2, PP 10, and a shared PP 7 holding read-only library code.]

– 12 –

Contrast: Macintosh Memory Model

Does not use traditional virtual memory

[Figure: processes P1 and P2 each have their own pointer table in the shared address space; program references go through “handles” in these tables to objects A, B, C, D, and E.]

All program objects accessed through “handles”

• indirect reference through pointer table
• objects stored in shared global address space

– 13 –

Macintosh Memory Management

Allocation / Deallocation

Similar to free-list management of malloc/free

Compaction

• Can move any object and just update the (unique) pointer in the pointer table

[Figure: the same handle-based structure after compaction — objects A–E have been moved within the shared address space, and each pointer-table entry now points to the new location.]

– 14 –

Mac vs. VM-Based Memory Mgmt

Allocating, deallocating, and moving memory:

can be accomplished by both techniques

Block sizes:

• Mac: variable-sized
  – may be very small or very large
• VM: fixed-size
  – size is equal to one page (8 KB in Alpha)

Allocating contiguous chunks of memory:

• Mac: contiguous allocation is required
• VM: can map contiguous range of virtual addresses to disjoint ranges of physical addresses

Protection?

• Mac: “wild write” by one process can corrupt another’s data

– 15 –

Motivation #3: Protection

Page table entry contains access rights information

hardware enforces this protection (trap into OS if violation occurs)

Process i:   Read?   Write?   Physical Addr
  VP 0:      Yes     No       PP 9
  VP 1:      Yes     Yes      PP 4
  VP 2:      No      No       XXXXXXX

Process j:   Read?   Write?   Physical Addr
  VP 0:      Yes     Yes      PP 6
  VP 1:      Yes     No       PP 9
  VP 2:      No      No       XXXXXXX

– 16 –

VM Address Translation

V = {0, 1, . . . , N–1}   virtual address space
P = {0, 1, . . . , M–1}   physical address space      N > M

MAP: V → P ∪ {∅}   address mapping function
  MAP(a) = a'  if data at virtual address a is present at physical address a' in P
  MAP(a) = ∅   if data at virtual address a is not present in P

[Figure: the processor presents virtual address a to the address-translation mechanism; on a hit, physical address a' goes to main memory. A missing item causes a fault; the fault handler has the OS transfer the data from secondary memory into main memory (only on a miss).]

– 17 –

VM Address Translation

Parameters

• P = 2^p = page size (bytes). Typically 1 KB–16 KB
• N = 2^n = Virtual address limit
• M = 2^m = Physical address limit

[Figure: the virtual address (bits n–1 … 0) splits into a virtual page number (bits n–1 … p) and a page offset (bits p–1 … 0); address translation replaces the virtual page number with a physical page number (bits m–1 … p) and leaves the page offset unchanged.]

Notice that the page offset bits don't change as a result of translation

– 18 –
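To make the bit-field picture above concrete, here is a short C sketch added for this transcript. It splits a virtual address into VPN and page offset and rebuilds the physical address from a translated PPN; PAGE_BITS = 13 matches the Alpha's 8 KB pages, while the example address and PPN are made up for illustration.

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_BITS   13u                       /* p: 8 KB pages => 13 offset bits */
    #define PAGE_SIZE   (1u << PAGE_BITS)
    #define OFFSET_MASK (PAGE_SIZE - 1)

    int main(void)
    {
        uint64_t va  = 0x120003ABCull;            /* example virtual address   */
        uint64_t vpn = va >> PAGE_BITS;           /* virtual page number       */
        uint64_t off = va & OFFSET_MASK;          /* page offset (unchanged)   */

        uint64_t ppn = 0x1234;                    /* pretend translation result */
        uint64_t pa  = (ppn << PAGE_BITS) | off;  /* physical address           */

        printf("VPN=0x%llx offset=0x%llx PA=0x%llx\n",
               (unsigned long long)vpn, (unsigned long long)off,
               (unsigned long long)pa);
        return 0;
    }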

Page Tables

[Figure: a page table indexed by virtual page number; each entry holds a valid bit (e.g., 1 1 0 1 0 1 1 1 1 0) and a physical page or disk address — valid entries point into physical memory, invalid ones point to disk storage.]

– 19 –

Address Translation via Page Table

[Figure: the page table base register plus the VPN (acting as table index) selects a page table entry with valid, access, and physical page number fields; if valid = 0 the page is not in memory. The physical page number (bits m–1 … p) is combined with the unchanged page offset (bits p–1 … 0) to form the physical address.]

– 20 –

Page Table Operation

Translation

• Separate (set of) page table(s) per process
• VPN forms index into page table

Computing Physical Address

• Page Table Entry (PTE) provides information about page
  – if (Valid bit = 1) then page in memory
    » Use physical page number (PPN) to construct address
  – if (Valid bit = 0) then page in secondary memory
    » Page fault
    » Must load into main memory before continuing

Checking Protection

• Access rights field indicates allowable access
  – e.g., read-only, read-write, execute-only
  – typically support multiple protection modes (e.g., kernel vs. user)
• Protection violation fault if the access lacks the necessary permission (see the sketch after this slide)

– 21 –
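The translation and protection checks just listed fit in a few lines of C. This is a hedged sketch added for this transcript: the PTE layout is hypothetical, and a single flat page table is used for simplicity (the Alpha's real table is multi-level, as shown later).

    #include <stdint.h>
    #include <stdbool.h>

    #define PAGE_BITS 13u

    /* Hypothetical PTE layout, for illustration only. */
    typedef struct {
        uint32_t ppn;          /* physical page number          */
        bool     valid;        /* page resident in main memory? */
        bool     readable;
        bool     writable;
    } PTE;

    typedef enum { XLATE_OK, XLATE_PAGE_FAULT, XLATE_PROT_FAULT } XlateResult;

    /* Index the page table with the VPN, check the valid bit, check access
     * rights, then splice the PPN and the unchanged offset together. */
    XlateResult translate(const PTE *page_table, uint64_t va, bool is_write,
                          uint64_t *pa)
    {
        uint64_t vpn    = va >> PAGE_BITS;
        uint64_t offset = va & ((1u << PAGE_BITS) - 1);
        PTE pte = page_table[vpn];

        if (!pte.valid)
            return XLATE_PAGE_FAULT;               /* OS must load page from disk */
        if ((is_write && !pte.writable) || (!is_write && !pte.readable))
            return XLATE_PROT_FAULT;               /* protection violation        */

        *pa = ((uint64_t)pte.ppn << PAGE_BITS) | offset;
        return XLATE_OK;
    }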

Integrating VM and Cache

[Figure: the CPU issues a virtual address (VA); translation produces a physical address (PA) used to look up the cache; a hit returns data to the CPU, a miss goes to main memory.]

Most Caches “Physically Addressed”

• Accessed by physical addresses
• Allows multiple processes to have blocks in cache at same time
• Allows multiple processes to share pages
• Cache doesn’t need to be concerned with protection issues
  – Access rights checked as part of address translation

Perform Address Translation Before Cache Lookup

• But this could involve a memory access itself
• Of course, page table entries can also become cached

– 22 –

Speeding up Translation with a TLB

“Translation Lookaside Buffer” (TLB)

• Small, usually fully associative cache
• Maps virtual page numbers to physical page numbers
• Contains complete page table entries for small number of pages

[Figure: the CPU's VA goes to a TLB lookup; on a TLB hit the PA goes directly to the cache; on a TLB miss the full translation is performed first, then the cache is accessed, with main memory behind it.]

(a sketch of this lookup order follows the slide)

– 23 –
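A minimal C sketch of the hit/miss ordering shown above, added for this transcript. The entry layout, the simplistic replacement policy, and the stub page-table walk are all illustrative assumptions; only the 32-entry size echoes the 21064 DTLB mentioned on the next slides.

    #include <stdint.h>
    #include <stdbool.h>

    #define PAGE_BITS   13u
    #define TLB_ENTRIES 32u            /* e.g., the 21064 DTLB holds 32 PTEs */

    typedef struct { uint64_t vpn; uint64_t ppn; bool valid; } TLBEntry;

    static TLBEntry tlb[TLB_ENTRIES];

    /* Stub page-table walk for illustration: identity-map every page. */
    static uint64_t walk_page_table(uint64_t vpn) { return vpn; }

    uint64_t translate_with_tlb(uint64_t va)
    {
        uint64_t vpn    = va >> PAGE_BITS;
        uint64_t offset = va & ((1u << PAGE_BITS) - 1);

        /* Fully associative lookup: compare the VPN against every entry. */
        for (unsigned i = 0; i < TLB_ENTRIES; i++) {
            if (tlb[i].valid && tlb[i].vpn == vpn)
                return (tlb[i].ppn << PAGE_BITS) | offset;   /* TLB hit */
        }

        /* TLB miss: do the (slow) walk, then cache the PTE for next time. */
        uint64_t ppn = walk_page_table(vpn);
        unsigned victim = (unsigned)(vpn % TLB_ENTRIES);     /* simplistic replacement */
        tlb[victim] = (TLBEntry){ .vpn = vpn, .ppn = ppn, .valid = true };
        return (ppn << PAGE_BITS) | offset;
    }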

Address Translation with a TLB

[Figure: the virtual page number (bits N–1 … p), optionally together with a process ID, is compared against the TLB tags; each TLB entry holds valid and dirty bits, a tag, and a physical page number. On a TLB hit, the physical page number plus the page offset form the physical address, which is then split into cache tag, index, and byte offset; a cache tag match yields a cache hit and the data.]

– 24 –

Alpha AXP 21064 TLB

• page size: 8 KB
• placement: fully associative
• hit time: 1 clock
• miss penalty: 20 clocks
• TLB size: ITLB 8 PTEs, DTLB 32 PTEs

– 25 –

TLB-Process Interactions

TLB Translates Virtual Addresses

But the virtual address space changes on every context switch

Could Flush TLB

• Every time we perform a context switch
• Refill for new process by a series of TLB misses
• ~100 clock cycles each

Could Include Process ID Tag with TLB Entry

• Identifies which address space is being accessed
• OK even when sharing physical pages

– 26 –
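The two options above can be sketched side by side in C. This is added for this transcript; the asid field name, its width, and the entry count are assumptions, not the 21064's actual format.

    #include <stdint.h>
    #include <stdbool.h>
    #include <string.h>

    #define TLB_ENTRIES 32u

    typedef struct {
        uint64_t vpn;
        uint64_t ppn;
        uint8_t  asid;     /* process ID tag: identifies which address space */
        bool     valid;
    } TLBEntry;

    static TLBEntry tlb[TLB_ENTRIES];

    /* Option 1: flush the whole TLB on every context switch. */
    void tlb_flush(void)
    {
        memset(tlb, 0, sizeof tlb);   /* next process refills via TLB misses */
    }

    /* Option 2: tag entries with an ASID so entries from several processes
     * can coexist; a hit requires both the VPN and the ASID to match. */
    bool tlb_lookup(uint64_t vpn, uint8_t current_asid, uint64_t *ppn)
    {
        for (unsigned i = 0; i < TLB_ENTRIES; i++) {
            if (tlb[i].valid && tlb[i].asid == current_asid && tlb[i].vpn == vpn) {
                *ppn = tlb[i].ppn;
                return true;
            }
        }
        return false;
    }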

Virtually-Indexed Cache

[Figure: the VA's offset bits index the cache (producing candidate data and a tag) while the TLB lookup proceeds in parallel to produce the PA; the PA is compared against the cache tag to decide a hit.]

Cache Index Determined from Virtual Address

• Can begin cache indexing and TLB lookup at the same time

Cache Physically Addressed

• Cache tag indicates physical address
• Compare with TLB result to see if they match
  – Only then is it considered a hit

– 27 –

Generating Index from Virtual Address

[Figure: a 32-bit virtual address with the virtual page number in bits 31 … 12 and the page offset in bits 11 … 0 maps to a physical address with the physical page number in bits 29 … 12 and the same page offset; the cache index is taken from the page-offset bits.]

Size cache so that index is determined by page offset

• Can increase associativity to allow larger cache
• E.g., early PowerPCs had 32 KB cache
  – 8-way associative, 4 KB page size

Page Coloring

• Make sure lower k bits of VPN match those of PPN
• Page replacement becomes set associative
• Number of sets = 2^k

(a sketch of the cache-sizing arithmetic follows this slide)

– 28 –
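The sizing rule above reduces to one inequality: the index comes entirely from the page offset when one cache way fits in a page (cache size / associativity ≤ page size); otherwise the low VPN bits feed the index and page coloring is needed. This small C sketch, added for this transcript, checks the PowerPC example; the 32 B block size is an assumption not given on the slide.

    #include <stdio.h>

    int main(void)
    {
        unsigned cache_size = 32 * 1024;   /* 32 KB cache (early PowerPC example) */
        unsigned ways       = 8;           /* 8-way set associative               */
        unsigned block      = 32;          /* assumed block size in bytes         */
        unsigned page_size  = 4 * 1024;    /* 4 KB pages                          */

        unsigned way_size = cache_size / ways;   /* bytes indexed per way */
        unsigned sets     = way_size / block;    /* number of sets        */

        /* Index bits all come from the page offset iff one way fits in a page. */
        printf("way size = %u bytes, sets = %u, %s\n", way_size, sets,
               way_size <= page_size ? "index fits in page offset"
                                     : "needs page coloring (match low VPN bits)");
        return 0;
    }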

Alpha Physical Addresses

Model

Model   Bits   Max. Size
21064   34     16 GB
21164   40     1 TB
21264   44     16 TB

Why a 1 TB (or More) Address Space?

• At $1.00/MB, would cost $1 million for 1 TB
• Would require 131,072 memory chips, each with 256 Megabits
• Current uniprocessor models limited to 2 GB

– 29 –

Massively-Parallel Machines

Example: Cray T3E

• Up to 2048 Alpha 21164 processors
• Up to 2 GB memory / processor
• 8 TB physical address space!

Logical Structure

[Figure: many processors (P), each with a memory (M), connected by an interconnection network.]

• Many processors sharing large global address space
• Any processor can reference any physical address
• VM system manages allocation, sharing, and protection among processors

Physical Structure

• Memory distributed over many processor modules
• Messages routed over interconnection network to perform remote reads & writes

– 30 –

Alpha Virtual Addresses

Page Size

Currently 8KB

Page Tables

• Each table fits in a single page
• Page Table Entry is 8 bytes
  – 4 bytes: physical page number
  – Other bytes: valid bit, access information, etc.
• An 8 KB page can hold 1024 PTEs

Alpha Virtual Address

• Based on 3-level paging structure

[Figure: virtual address fields — level 1 (10 bits), level 2 (10 bits), level 3 (10 bits), page offset (13 bits).]

• Each level indexes into a page table
• Allows 43-bit virtual addresses with an 8 KB page size

– 31 –

Alpha Page Table Structure

[Figure: tree of page tables — the level 1 page table points to level 2 page tables, which point to level 3 page tables, which point to physical pages.]

Tree Structure

• Node degree: 1024
• Depth = 3

Nice Features

• No need to enforce contiguous page layout
• Dynamically grow tree as memory needs increase

– 32 –

Mapping an Alpha 21064 Virtual Address

[Figure: each of the three 10-bit level indexes selects an 8-byte PTE within a 1024-entry page table; the final PTE supplies a 21-bit physical page number that is combined with the 13-bit page offset to form the physical address. PTE size: 8 bytes; page table size: 1024 PTEs.]

– 33 –
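A hedged C sketch of the field extraction just described, added for this transcript. The function and variable names are not from the slides; it simply pulls out the three 10-bit indexes and the 13-bit offset, and its output can be checked against the worked examples on the later "Partitioning Addresses" slide.

    #include <stdint.h>
    #include <stdio.h>

    #define LEVEL_BITS  10u
    #define LEVEL_MASK  ((1ull << LEVEL_BITS) - 1)
    #define OFFSET_BITS 13u

    /* Extract the three level indexes and the page offset from a Seg0 VA. */
    static void split(uint64_t va, unsigned idx[3], uint64_t *offset)
    {
        *offset = va & ((1ull << OFFSET_BITS) - 1);
        idx[2]  = (unsigned)((va >> OFFSET_BITS) & LEVEL_MASK);                     /* level 3 */
        idx[1]  = (unsigned)((va >> (OFFSET_BITS + LEVEL_BITS)) & LEVEL_MASK);      /* level 2 */
        idx[0]  = (unsigned)((va >> (OFFSET_BITS + 2 * LEVEL_BITS)) & LEVEL_MASK);  /* level 1 */
    }

    int main(void)
    {
        /* The three example addresses from the "Partitioning Addresses" slide. */
        uint64_t vas[3] = { 0x120000000ull, 0x140000000ull, 0x3FF80000000ull };
        for (int i = 0; i < 3; i++) {
            unsigned idx[3]; uint64_t off;
            split(vas[i], idx, &off);
            printf("VA 0x%011llx -> L1=%u L2=%u L3=%u offset=0x%llx\n",
                   (unsigned long long)vas[i], idx[0], idx[1], idx[2],
                   (unsigned long long)off);
        }
        return 0;
    }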

Virtual Address Ranges

Binary Address     Segment   Purpose
1…1 11 xxxx…xxx    seg1      Kernel accessible virtual addresses
                             – Information maintained by OS but not to be accessed by user
1…1 10 xxxx…xxx    kseg      Kernel accessible physical addresses
                             – No address translation performed
                             – Used by OS to indicate physical addresses
0…0 0x xxxx…xxx    seg0      User accessible virtual addresses
                             – Only part accessible by user program

Address Patterns

• Must have high-order bits all 0’s or all 1’s
  – Currently 64 – 43 = 21 wasted bits in each virtual address
• Prevents programmers from sticking extra information into the unused bits
  – Could lead to problems when the virtual address space is expanded in the future

– 34 –

Alpha Seg0 Memory Layout

[Figure: Seg0 layout from the top down — Reserved / shared libraries (3FF 8000 0000 up to 3FF FFFF FFFF), Not yet allocated, Dynamic Data, Static Data (from 001 4000 0000), Not used, Text (Code) (from 001 2000 0000), Stack ($sp, growing down), Not yet allocated, Reserved (below 000 0001 0000).]

Regions

• Data
  – Static: space for global variables
    » Allocation determined at compile time
    » Access via, e.g., $gp
  – Dynamic: space for runtime allocation, e.g., using malloc
• Text
  – Stores machine code for program
• Stack
  – Implements runtime stack
  – Access via $sp
• Reserved
  – Used by operating system
    » shared libraries, process info, etc.

– 35 –

Alpha Seg0 Memory Allocation

Address Range

• User code can access memory locations in the range 0x0000000000010000 to 0x000003FFFFFFFFFF
• Nearly a 2^42 ≈ 4.398 × 10^12 byte range

• In practice, programs access far fewer

Dynamic Memory Allocation

• Virtual memory system only allocates blocks of memory as needed
• As stack reaches lower addresses, add to lower allocation
• As break moves toward higher addresses, add to upper allocation
  – Due to calls to malloc, calloc, etc.

[Figure: Seg0 regions — Shared Libraries (read only), gap, Dynamic Data (upper allocation, bounded by the break), Static Data, gap, Text (Code), Stack (lower allocation, bounded by the current minimum $sp).]

– 36 –

Minimal Page Table Configuration

User-Accessible Pages

• VP4: Shared Library
  – Read only to prevent undesirable interprocess interactions
  – Near top of Seg0 address space
• VP3: Data
  – Both static & dynamic
  – Grows upward from virtual address 0x140000000
• VP2: Text
  – Read only to prevent corrupting code
• VP1: Stack
  – Grows downward from virtual address 0x120000000

[Figure: the four mapped 8 KB pages — VP4: 0000 03FF 8000 0000–0000 03FF 8000 1FFF, VP3: 0000 0001 4000 0000–0000 0001 4000 1FFF, VP2: 0000 0001 2000 0000–0000 0001 2000 1FFF, VP1: 0000 0001 1FFF E000–0000 0001 1FFF FFFF.]

– 37 –

Partitioning Addresses

Address 0x001 2000 0000
  0000000000 1001000000 0000000000 0000000000000
  Level 1: 0   Level 2: 576   Level 3: 0

Address 0x001 4000 0000
  0000000000 1010000000 0000000000 0000000000000
  Level 1: 0   Level 2: 640   Level 3: 0

Address 0x3FF 8000 0000
  0111111111 1100000000 0000000000 0000000000000
  Level 1: 511   Level 2: 768   Level 3: 0

– 38 –


Mapping Minimal Configuration

[Figure: the level 1 page table (entries 0 and 511) points to level 2 page tables; level 2 entries 576, 640, and 768 point to level 3 page tables whose entries (0 and 1023 shown) map the four pages — VP1: 0000 0001 1FFF E000–0000 0001 1FFF FFFF, VP2: 0000 0001 2000 0000–0000 0001 2000 1FFF, VP3: 0000 0001 4000 0000–0000 0001 4000 1FFF, VP4: 0000 03FF 8000 0000–0000 03FF 8000 1FFF.]

– 39 –

Increasing Heap Allocation

Without More Page Tables

• Could allocate 1023 additional pages
• Would give ~8 MB heap space

Adding Page Tables

• Must add a new page table with each additional 8 MB increment

Maximum Allocation

• Our Alphas limit user to 1 GB data segment
• Limit stack to 32 MB

[Figure: the level 3 page table for the data region maps VP3 (0000 0001 4000 0000–0000 0001 4000 1FFF) in entry 0 and, in entries 1 … 1023, further pages from 0000 0001 4000 2000 up to 0000 0001 407F FFFF.]

– 40 –

Expanding Alpha Address Space

Increase Page Size

• Increasing page size 2X increases virtual address space 16X
  – 1 bit for page offset, 1 bit for each level index

[Figure: virtual address fields when the page size grows by 2^k — level 1 (10+k bits), level 2 (10+k bits), level 3 (10+k bits), page offset (13+k bits).]

Physical Memory Limits

• Cannot be larger than kseg
  – VA bits – 2 ≥ PA bits
• Cannot be larger than 32 + page offset bits
  – Since PTE only has 32 bits for PPN

Configurations

Page Size   VA Size   PA Size
8K          43        41
16K         47        45
32K         51        47
64K         55        48

– 41 –
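The configurations table above follows directly from the two limits just stated: VA bits = 3 × (10 + k) + (13 + k) and PA bits = min(VA − 2, 32 + page-offset bits), where the page size is 8 KB shifted left by k. This short C sketch, added for this transcript, reproduces the table.

    #include <stdio.h>

    int main(void)
    {
        /* k = extra page-size bits beyond 8 KB: page size = 8 KB << k. */
        for (unsigned k = 0; k <= 3; k++) {
            unsigned offset_bits = 13 + k;
            unsigned va_bits     = 3 * (10 + k) + offset_bits;  /* three indexes + offset */
            unsigned kseg_limit  = va_bits - 2;                 /* kseg constraint        */
            unsigned pte_limit   = 32 + offset_bits;            /* 32-bit PPN in the PTE  */
            unsigned pa_bits = kseg_limit < pte_limit ? kseg_limit : pte_limit;
            printf("%2uK pages: VA = %u bits, PA = %u bits\n",
                   8u << k, va_bits, pa_bits);
        }
        return 0;
    }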

Main Theme

Programmer’s View

• Large “flat” address space
  – Can allocate large blocks of contiguous addresses
• Process “owns” machine
  – Has private address space
  – Unaffected by behavior of other processes

System View

• User virtual address space created by mapping to set of pages
  – Need not be contiguous
  – Allocated dynamically
  – Enforce protection during address translation
• OS manages many processes simultaneously
  – Continually switching among processes
  – Especially when one must wait for a resource
    » E.g., disk I/O to handle page fault

– 42 –