Transcript Outline

Introduction to Computer Organization and Architecture

Lecture 9 By Juthawut Chantharamalee http://dusithost.dusit.ac.th/~juthawut_cha/home.htm

Outline

• Virtual Memory
• Basics
• Address Translation
• Cache vs VM
• Paging
• Replacement
• TLBs
• Segmentation
• Page Tables

The Full Memory Hierarchy

Level           Capacity     Access Time              Cost
CPU Registers   100s Bytes   <10s ns
Cache           K Bytes      10-100 ns                1-0.1 cents/bit
Main Memory     M Bytes      200 ns-500 ns            $.0001-.00001 cents/bit
Disk            G Bytes      10 ms (10,000,000 ns)    10^-5 - 10^-6 cents/bit
Tape            infinite     sec-min                  10^-8

Staging transfer units between levels:

Between             Transfer unit      Managed by       Size
Registers / Cache   Instr. operands    prog./compiler   1-8 bytes
Cache / Memory      Blocks             cache cntl       8-128 bytes
Memory / Disk       Pages              OS               4K-16K bytes
Disk / Tape         Files              user/operator    Mbytes

Upper levels are faster; lower levels are larger.

Virtual Memory

• Some facts of computer life…
  - Computers run lots of processes simultaneously
  - There is not enough physical memory to give each process a full address space
  - Must share smaller amounts of physical memory among many processes
• Virtual memory is the answer!
  - Divides physical memory into blocks and assigns them to different processes

Virtual Memory

• Virtual memory (VM) allows main memory (DRAM) to act like a cache for secondary storage (magnetic disk).
• VM address translation provides a mapping from the virtual address of the processor to the physical address in main memory or on disk.

The compiler assigns data to a “virtual” address (VA).

The VA is translated to a real (physical) address somewhere in memory. This allows any program to run anywhere; where it runs is determined by the particular machine and OS.

VM Benefit

• VM provides the following benefits
  - Allows multiple programs to share the same physical memory
  - Allows programmers to write code as though they have a very large amount of main memory
  - Automatically handles bringing in data from disk

Virtual Memory Basics

• Programs reference “virtual” addresses in a non-existent memory
  - These are then translated into real “physical” addresses
  - Virtual address space may be bigger than physical address space
• Divide physical memory into blocks, called pages
  - Page sizes range from 512 bytes to 16 MB (4 KB typical)
• Virtual-to-physical translation by indexed table lookup
  - Add another cache for recent translations (the TLB)
• Invisible to the programmer
  - Looks to your application like you have a lot of memory!

VM: Page Mapping

[Figure: pages of Process 1's and Process 2's virtual address spaces map to page frames in physical memory or to disk.]

VM: Address Translation

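[Figure: the virtual address is split into a 20-bit virtual page number and a 12-bit page offset (12 = log2 of the page size). The page table base plus the virtual page number selects an entry in the per-process page table. Each entry holds a valid bit, protection bits, a dirty bit, a reference bit, and the physical page number. The physical page number joined with the page offset forms the address sent to physical memory.]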
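The split described in the figure can be sketched in a few lines of Python. This is a minimal illustration, not a model of any particular machine: the function names, the dictionary-based page table, and the `valid`/`ppn` field names are assumptions made for the example; only the 20-bit/12-bit split comes from the slide.

```python
PAGE_OFFSET_BITS = 12                      # log2(4096), as on the slide
OFFSET_MASK = (1 << PAGE_OFFSET_BITS) - 1  # low 12 bits


def split_virtual_address(va):
    """Return (virtual page number, page offset) for a 32-bit address."""
    return va >> PAGE_OFFSET_BITS, va & OFFSET_MASK


def translate(va, page_table):
    """Translate va using a hypothetical page table: vpn -> {'valid', 'ppn'}."""
    vpn, offset = split_virtual_address(va)
    entry = page_table.get(vpn)
    if entry is None or not entry["valid"]:
        # In real hardware this would trap to the OS as a page fault.
        raise LookupError(f"page fault on VPN {vpn:#x}")
    # The offset is copied unchanged; only the page number is replaced.
    return (entry["ppn"] << PAGE_OFFSET_BITS) | offset
```

For example, with the entry `{0x12345: {"valid": True, "ppn": 0x42}}`, the address `0x12345678` splits into VPN `0x12345` and offset `0x678`, and translates to `0x42678`.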

Example of virtual memory

• Relieves the problem of making a program that was too large to fit in physical memory… well, fit!
• Allows a program to run in any location in physical memory (called relocation)
  - Really useful, as you might want to run the same program on lots of machines…

[Figure: pages A, B, C, and D occupy consecutive virtual addresses; physical main memory (addresses 0 to 28K in 4K steps) holds three of them in non-contiguous frames, and the remaining page is on disk.]

The logical program is in contiguous VA space; here it consists of 4 pages: A, B, C, D. Physically, 3 of the 4 pages are in main memory and 1 is located on the disk.

Cache terms vs. VM terms

So, some definitions/“analogies”
• A “page” or “segment” of memory is analogous to a “block” in a cache
• A “page fault” or “address fault” is analogous to a cache miss
  - So, if we go to main memory (“real”/physical memory) and our data isn’t there, we need to get it from disk…

More definitions and cache comparisons

• These are more definitions than analogies…
  - With VM, the CPU produces “virtual addresses” that are translated by a combination of HW/SW to “physical addresses”
  - The “physical addresses” access main memory
• The process described above is called “memory mapping” or “address translation”

Cache VS. VM comparisons (1/2)

Parameter            First-level cache      Virtual memory
Block (page) size    12-128 bytes           4096-65,536 bytes
Hit time             1-2 clock cycles       40-100 clock cycles
Miss penalty         8-100 clock cycles     700,000-6,000,000 clock cycles
  (Access time)      (6-60 clock cycles)    (500,000-4,000,000 clock cycles)
  (Transfer time)    (2-40 clock cycles)    (200,000-2,000,000 clock cycles)
Miss rate            0.5-10%                0.00001-0.001%
Data memory size     0.016-1 MB             4 MB-4 GB

Cache VS. VM comparisons (2/2)

• Replacement policy:
  - Replacement on cache misses is primarily controlled by hardware
  - Replacement with VM (i.e. which page do I replace?) is usually controlled by the OS
  - Because of the bigger miss penalty, we want to make the right choice
• Sizes:
  - Size of processor address determines size of VM
  - Cache size is independent of processor address size

Virtual Memory

• Timing’s tough with virtual memory:
  - AMAT = T_mem + (1 - h) × T_disk
  - AMAT = 100 ns + (1 - h) × 25,000,000 ns
• h (hit rate) has to be incredibly (almost unattainably) close to perfect for this to work
• So: VM is a “cache”, but an odd one.
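To see how close to perfect the hit rate has to be, the formula can be evaluated directly. The sketch below uses the slide's 100 ns and 25,000,000 ns figures; the function names and the idea of a target "slowdown" relative to pure memory access are assumptions added for illustration.

```python
T_MEM_NS = 100            # main memory access time from the slide
T_DISK_NS = 25_000_000    # disk access time from the slide


def amat_ns(hit_rate):
    """Average memory access time: T_mem + (1 - h) * T_disk."""
    return T_MEM_NS + (1.0 - hit_rate) * T_DISK_NS


def hit_rate_for_slowdown(slowdown):
    """Hit rate h at which AMAT equals slowdown * T_mem (hypothetical helper)."""
    return 1.0 - (slowdown - 1.0) * T_MEM_NS / T_DISK_NS


for h in (0.99, 0.9999, 0.999999):
    print(f"h = {h}: AMAT = {amat_ns(h):,.0f} ns")
print(f"h needed for a 10% slowdown: {hit_rate_for_slowdown(1.1):.7f}")
```

Even a 99% hit rate gives an AMAT of roughly 250,100 ns, about 2,500 times slower than main memory alone, which is why the slide calls VM an odd kind of cache.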

Paging Hardware

• How big is a page?
• How big is the page table?

[Figure: paging hardware. The CPU issues a 32-bit address (page, offset); the page table maps the page to a frame; the resulting 32-bit address (frame, offset) goes to physical memory.]

Address Translation in a Paging System

[Figure: address translation in a paging system. The program issues a virtual address (page #, offset). The page table pointer register plus the page # selects a page table entry holding the frame #. The frame # and the offset locate the word within a page frame in main memory.]

How big is a page table?

• Suppose
  - 32-bit architecture
  - Page size 4 kilobytes
• Therefore: a 32-bit address splits into a 20-bit page number (2^20 pages) and a 12-bit offset (2^12 bytes per page)

Test Yourself

A processor asks for the contents of virtual memory address 0x10020. The paging scheme in use breaks this into a VPN of 0x10 and an offset of 0x020.

PTR (a CPU register that holds the address of the page table) has a value of 0x100 indicating that this process’s page table starts at location 0x100.

The machine uses word addressing and the page table entries are each one word long.

PTR = 0x100. Memory reference: VPN = 0x010, OFFSET = 0x020.

Test Yourself

ADDR      CONTENTS
0x00000   0x00000
0x00100   0x00010
0x00110   0x00022
0x00120   0x00045
0x00130   0x00078
0x00145   0x00010
0x10000   0x03333
0x10020   0x04444
0x22000   0x01111
0x22020   0x02222
0x45000   0x05555
0x45020   0x06666

PTR = 0x100. Memory reference: VPN = 0x010, OFFSET = 0x020.

• What is the physical address calculated?
  1. 10020
  2. 22020
  3. 45000
  4. 45020
  5. none of the above

Test Yourself

(Same memory contents, PTR, and memory reference as on the previous slide.)

• What is the physical address calculated?
• What are the contents of this address returned to the processor?
• How many memory accesses in total were required to obtain the contents of the desired address?
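One way to check an answer is to replay the lookup in code. The sketch below is built on two assumptions: the address/contents pairs are read from the flattened slide table by matching the two columns in order, and each page table entry holds a frame number that replaces the VPN. The names `MEMORY`, `OFFSET_BITS`, and `load` are invented for the example.

```python
# Memory image as (address -> contents); pairing inferred from the slide table.
MEMORY = {
    0x00000: 0x00000, 0x00100: 0x00010, 0x00110: 0x00022, 0x00120: 0x00045,
    0x00130: 0x00078, 0x00145: 0x00010, 0x10000: 0x03333, 0x10020: 0x04444,
    0x22000: 0x01111, 0x22020: 0x02222, 0x45000: 0x05555, 0x45020: 0x06666,
}
OFFSET_BITS = 12  # 0x10020 -> VPN 0x10, offset 0x020


def load(va, ptr, memory):
    """Return (physical address, contents, number of memory accesses)."""
    accesses = 0
    vpn = va >> OFFSET_BITS
    offset = va & ((1 << OFFSET_BITS) - 1)
    # Word addressing, one word per entry: the entry lives at PTR + VPN.
    frame = memory[ptr + vpn]
    accesses += 1
    physical = (frame << OFFSET_BITS) | offset
    contents = memory[physical]
    accesses += 1
    return physical, contents, accesses


pa, data, n = load(0x10020, 0x100, MEMORY)
print(hex(pa), hex(data), n)
```

Under those assumptions the page table entry is read from 0x110, the physical address comes out as 0x22020, the contents are 0x02222, and two memory accesses are needed: one for the page table entry and one for the data.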

Another Example

Logical memory (16 locations, 4 per page):

Page   Addresses   Contents
0      0-3         a b c d
1      4-7         e f g h
2      8-11        i j k l
3      12-15       m n o p

Page table:

Page      Frame
0 (00)    5 (101)
1 (01)    6 (110)
2 (10)    1 (001)
3 (11)    2 (010)

Physical memory (32 locations, 4 per frame):

Frame   Addresses   Contents
1       4-7         i j k l
2       8-11        m n o p
5       20-23       a b c d
6       24-27       e f g h

Example from the figure: logical address 01 01 (page 01, offset 01) maps to physical address 110 01 (frame 110, offset 01).

Replacement policies


Block replacement

• Which block should be replaced on a virtual memory miss?
  - Again, we’ll stick with the strategy that it’s a good thing to eliminate page faults
  - Therefore, we want to replace the LRU block
    - Many machines use a “use” or “reference” bit
    - Periodically reset
    - Gives the OS an estimation of which pages are referenced

Writing a block

• What happens on a write?
  - We don’t even want to think about a write-through policy!
    - The time for accesses that involve VM, the hard disk, etc. is so great that this is not practical
  - Instead, a write-back policy is used, with a dirty bit to tell if a block has been written

Mechanism vs. Policy

• Mechanism:
  - paging hardware
  - trap on page fault
• Policy:
  - fetch policy: when should we bring in the pages of a process?
    1. load all pages at the start of the process
    2. load only on demand: “demand paging”
  - replacement policy: which page should we evict given a shortage of frames?

Replacement Policy

• Given a full physical memory, which page should we evict?
• What policy?
  - Random
  - FIFO: First-in-first-out
  - LRU: Least-Recently-Used
  - MRU: Most-Recently-Used
  - OPT: evict the page that will not be used for the longest time in the future

Replacement Policy Simulation

• Example sequence of page numbers
  - 0 1 2 3 42 2 37 1 2 3
• FIFO?
• LRU?
• OPT?
• How do you keep track of LRU info? (another data structure question)
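The three policies can be compared with a small simulator. The sketch below makes two assumptions that are not on the slide: the reference sequence is read literally as the ten page numbers printed above, and physical memory has 3 frames. For the LRU bookkeeping it uses Python's `OrderedDict`, which is one possible answer to the data structure question, not the only one.

```python
from collections import OrderedDict, deque

REFS = [0, 1, 2, 3, 42, 2, 37, 1, 2, 3]  # sequence as printed on the slide
FRAMES = 3                                # assumed number of physical frames


def fifo_faults(refs, frames):
    resident, queue, faults = set(), deque(), 0
    for page in refs:
        if page in resident:
            continue
        faults += 1
        if len(resident) == frames:
            resident.discard(queue.popleft())  # evict the oldest arrival
        resident.add(page)
        queue.append(page)
    return faults


def lru_faults(refs, frames):
    resident, faults = OrderedDict(), 0        # least recently used first
    for page in refs:
        if page in resident:
            resident.move_to_end(page)         # mark as most recently used
            continue
        faults += 1
        if len(resident) == frames:
            resident.popitem(last=False)       # evict least recently used
        resident[page] = True
    return faults


def opt_faults(refs, frames):
    resident, faults = set(), 0
    for i, page in enumerate(refs):
        if page in resident:
            continue
        faults += 1
        if len(resident) == frames:
            future = refs[i + 1:]

            def next_use(p):
                # Pages never used again sort as "farthest in the future".
                return future.index(p) if p in future else len(future)

            resident.discard(max(resident, key=next_use))
        resident.add(page)
    return faults


print(fifo_faults(REFS, FRAMES), lru_faults(REFS, FRAMES), opt_faults(REFS, FRAMES))
```

With those assumptions the fault counts are 9 for FIFO, 8 for LRU, and 7 for OPT. OPT needs knowledge of future references, so it serves as a lower bound rather than a practical policy.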

Page tables and lookups…

1. It’s slow! We’ve turned every access to memory into two accesses to memory
   - Solution: add a specialized “cache” called a “translation lookaside buffer (TLB)” inside the processor
2. It’s still huge!
   - Even worse: we’re ultimately going to have a page table for every process. Suppose 1024 processes: that’s 4 GB of page tables!
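The 4 GB figure can be reproduced with a short calculation. The slide does not state the page table entry size; the sketch assumes 4 bytes per entry (one 32-bit word), which is the value that makes the slide's total come out. The function name is invented for the example.

```python
def page_table_bytes(va_bits, page_bytes, pte_bytes):
    """Size of a flat, single-level page table for one process."""
    entries = 2 ** va_bits // page_bytes   # one entry per virtual page
    return entries * pte_bytes


PTE_BYTES = 4  # assumption: one 32-bit word per entry
per_process = page_table_bytes(32, 4096, PTE_BYTES)
print(per_process // 2**20, "MB per process")
print(1024 * per_process // 2**30, "GB for 1024 processes")
```

A 32-bit address space with 4 KB pages has 2^20 virtual pages, so each process needs a 4 MB table under this assumption, and 1024 processes need 4 GB.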

Paging/VM (1/3)

[Figure: operating system, CPU, page table, physical memory, and disk. The CPU issues address (42, 356); the page table maps page 42 to frame i; physical memory is accessed at (i, 356).]

Paging/VM (2/3)

[Figure: same system as on the previous slide. The CPU issues address (42, 356); the page table maps page 42 to frame i; physical memory is accessed at (i, 356).]

• Place the page table in physical memory
• However: this doubles the time per memory access!

Paging/VM (3/3)

[Figure: same system as on the previous slide, with a cache added between the CPU and the page table.]

• Special-purpose cache for translations
• Historically called the TLB: Translation Lookaside Buffer

Translation Cache

Just like any other cache, the TLB can be organized as fully associative, set associative, or direct mapped.

TLBs are usually small, typically not more than 128-256 entries even on high-end machines. This permits fully associative lookup on these machines. Most mid-range machines use small n-way set-associative organizations.

Note: 128-256 entries times 4KB-16KB per entry is only 512KB-4MB, so the L2 cache is often bigger than the “span” of the TLB.
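The “span” arithmetic in the note is simply entries times page size. A tiny sketch, with a function name chosen for the example:

```python
KB = 1024
MB = 1024 * KB


def tlb_span_bytes(entries, page_bytes):
    """Memory covered by a TLB when every entry maps a distinct page."""
    return entries * page_bytes


smallest = tlb_span_bytes(128, 4 * KB)    # fewest entries, smallest pages
largest = tlb_span_bytes(256, 16 * KB)    # most entries, largest pages
print(smallest // KB, "KB to", largest // MB, "MB")
```

This gives the 512 KB to 4 MB range quoted in the note.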

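[Figure: translation with a TLB. The CPU sends a virtual address (VA) to the TLB lookup. On a TLB hit, the access proceeds to the cache; on a TLB miss, the full translation is performed first. On a cache hit, data returns to the CPU; on a cache miss, the access goes to main memory.]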

Translation Cache

A way to speed up translation is to use a special cache of recently used page table entries. This has many names, but the most frequently used is Translation Lookaside Buffer, or TLB.

TLB entry fields: Virtual Page # (the tag) | Physical Frame # | Dirty | Ref | Valid | Access

• Really just a cache (a special-purpose cache) on the page table mappings
• TLB access time is comparable to cache access time (much less than main memory access time)
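A TLB can be pictured as a small cache sitting in front of the page table. The sketch below is a toy model under stated assumptions: it is fully associative, uses LRU replacement (real TLBs often use random or other policies), stores only the VPN-to-frame mapping without the dirty, ref, valid, and access bits, and models the page table as a plain dictionary. The class and method names are invented for the example.

```python
from collections import OrderedDict


class TLB:
    """Toy fully associative TLB with LRU replacement (an assumption)."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()   # vpn -> frame, least recently used first
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn, page_table):
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)
            return self.entries[vpn]
        # Miss: walk the page table, which costs an extra memory access.
        self.misses += 1
        frame = page_table[vpn]        # a KeyError here stands in for a page fault
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)
        self.entries[vpn] = frame
        return frame

    def flush(self):
        """Discard all translations, e.g. on a process switch."""
        self.entries.clear()


page_table = {1: 10, 2: 20, 3: 30}
tlb = TLB(capacity=2)
for vpn in (1, 1, 2, 3, 1):
    tlb.lookup(vpn, page_table)
print(tlb.hits, tlb.misses)
```

In this run only the second reference to page 1 hits; the last reference misses because page 1 was evicted when page 3 was loaded into the two-entry TLB.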

An example of a TLB

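[Figure: example TLB. The virtual address consists of a 30-bit page frame address and a 13-bit page offset. Each TLB entry holds a valid bit V <1>, read and write permission fields R <2> and W <2> (read/write policies and permissions), a 30-bit tag, and a 21-bit physical address. A 32:1 mux selects the 21 high-order physical address bits of the matching entry, which are joined with the 13 low-order offset bits to form a 34-bit physical address. The numbers 1-4 in the figure mark the steps of the lookup.]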

The “big picture” and TLBs

• Address translation is usually on the critical path…
  - …which determines the clock cycle time of the µP
• Even in the simplest cache, TLB values must be read and compared
• TLB is usually smaller and faster than the cache address-tag memory
  - This way multiple TLB reads don’t increase the cache hit time
• TLB accesses are usually pipelined because it’s so important!

The “big picture” and TLBs

[Flowchart: handling a memory reference with a TLB and a cache]
1. Virtual address → TLB access. TLB hit?
2. No (TLB miss stall): try to read the entry from the page table. Page fault?
   - Yes: replace page from disk
   - No: set the entry in the TLB
3. Yes: is the access a write?
   - Yes: cache/buffer memory write
   - No: try to read from cache. Cache hit?
     - No: cache miss stall
     - Yes: deliver data to CPU

Pages are Cached in a Virtual Memory System

• Can ask the same four questions we did about caches
• Q1: Block Placement
  - choice: lower miss rates and complex placement, or vice versa
    - miss penalty is huge
    - so choose low miss rate ==> place page anywhere in physical memory
    - similar to fully associative cache model
• Q2: Block Addressing - use additional data structure
  - fixed size pages - use a page table
    - virtual page number ==> physical page number and concatenate offset
    - tag bit to indicate presence in main memory

Normal Page Tables

• Size is number of virtual pages
• Purpose is to hold the translation of VPN to PPN
  - Permits ease of page relocation
  - Make sure to keep tags to indicate page is mapped
• Potential problem:
  - Consider a 32-bit virtual address and 4 KB pages
  - 4 GB / 4 KB = 1 MW (about one million words) required just for the page table!
  - Might have to page in the page table…
  - Consider how the problem gets worse on 64-bit machines with even larger virtual address spaces!
  - Might have multi-level page tables

Inverted Page Tables

• Similar to a set-associative mechanism
• Make the page table reflect the # of physical pages (not virtual)
• Use a hash mechanism
  - virtual page number ==> HPN index into inverted page table
• Compare virtual page number with the tag to make sure it is the one you want
• If yes
  - check to see that it is in memory: OK if yes, page fault if not
• If not: miss
  - go to full page table on disk to get new entry
  - implies 2 disk accesses in the worst case
  - trades increased worst-case penalty for a decrease in capacity-induced miss rate, since there is now more room for real pages with a smaller page table
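The lookup steps above can be sketched as a small hashed table with a slow fallback. This is a toy model, not a description of any real design: the hash is a plain modulo, a colliding entry simply overwrites the old one, the “full page table on disk” is a dictionary, and all names are invented for the example.

```python
class InvertedPageTable:
    """Toy hashed page table with one tagged entry per slot."""

    def __init__(self, num_slots, full_page_table):
        self.slots = [None] * num_slots   # each slot: (vpn tag, frame, in_memory)
        self.full = full_page_table       # vpn -> (frame, in_memory); stands in for disk
        self.full_table_reads = 0         # counts the slow fallback lookups

    def translate(self, vpn):
        index = vpn % len(self.slots)     # toy hash function (assumption)
        entry = self.slots[index]
        if entry is None or entry[0] != vpn:
            # Tag mismatch: fetch the entry from the full page table.
            self.full_table_reads += 1
            frame, in_memory = self.full[vpn]
            entry = (vpn, frame, in_memory)
            self.slots[index] = entry
        _, frame, in_memory = entry
        if not in_memory:
            raise LookupError(f"page fault on VPN {vpn}")
        return frame


full = {3: (7, True), 11: (2, True), 5: (0, False)}
ipt = InvertedPageTable(num_slots=8, full_page_table=full)
print(ipt.translate(3), ipt.translate(3), ipt.translate(11), ipt.full_table_reads)
```

Here VPNs 3 and 11 hash to the same slot, so translating 11 displaces the entry for 3. The table stays small because its size follows the number of slots rather than the number of virtual pages, at the cost of the occasional slow fallback.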

Inverted Page Table

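[Figure: inverted page table. The page number of the virtual address (page, offset) is hashed to index the table; each entry holds a page tag, a frame, and a valid bit V. If the stored page equals the requested page, the translation is OK and the frame is joined with the offset.]

• Only store entries for pages in physical memory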

Address Translation Reality

• The translation process using page tables takes too long!
• Use a cache to hold recent translations
  - Translation Lookaside Buffer
    - Typically 8-1024 entries
    - Block size same as a page table entry (1 or 2 words)
    - Only holds translations for pages in memory
    - 1 cycle hit time
    - Highly or fully associative
    - Miss rate < 1%
    - Miss goes to main memory (where the whole page table lives)
    - Must be purged on a process switch

Back to the 4 Questions

• Q3: Block Replacement (pages in physical memory)
  - LRU is best
    - So use it to minimize the horrible miss penalty
  - However, real LRU is expensive
    - Page table contains a use tag
    - On access the use tag is set
    - OS checks them every so often, records what it sees, and resets them all
    - On a miss, the OS decides who has been used the least
  - Basic strategy: miss penalty is so huge, you can spend a few OS cycles to help reduce the miss rate
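The use-tag scheme can be sketched as follows. The slide only says that the OS records what it sees; keeping an 8-bit shift-register history per page (the classic “aging” approximation of LRU) is an assumption made for this example, as are all the names.

```python
class UseBitTracker:
    """Approximate LRU from periodically sampled use bits (aging, assumed)."""

    def __init__(self, pages):
        self.use_bit = {p: False for p in pages}
        self.history = {p: 0 for p in pages}   # 8-bit history, newest bit on top

    def access(self, page):
        # Set by hardware on every access to the page.
        self.use_bit[page] = True

    def sample(self):
        # Periodic OS pass: record each use bit, then reset it.
        for page, used in self.use_bit.items():
            self.history[page] = (self.history[page] >> 1) | (0x80 if used else 0)
            self.use_bit[page] = False

    def victim(self):
        # The smallest history value belongs to the least recently used page.
        return min(self.history, key=self.history.get)


tracker = UseBitTracker(["A", "B", "C"])
tracker.access("A")
tracker.access("B")
tracker.sample()
tracker.access("A")
tracker.sample()
print(tracker.victim())
```

After these two sampling intervals page C has never been referenced, so it is chosen for eviction. The OS pays a small periodic cost instead of updating exact LRU order on every memory access.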

Last Question

• Q4: Write Policy
  - Always write-back
    - Due to the access time of the disk
    - So, you need to keep tags to show when pages are dirty and need to be written back to disk when they’re swapped out
  - Anything else is pretty silly
  - Remember: the disk is SLOW!

Page Sizes

• An architectural choice
• Large pages are good:
  - reduces page table size
  - amortizes the long disk access
  - if spatial locality is good then hit rate will improve
• Large pages are bad:
  - more internal fragmentation
    - if everything is random, each structure’s last page is only half full
    - half of bigger is still bigger
    - if there are 3 structures per process (text, heap, and control stack), then 1.5 pages are wasted for each process
  - process start-up time takes longer
    - since at least 1 page of each type is required prior to start
    - transfer time penalty aspect is higher

More on TLBs

• The TLB must be on chip
  - otherwise it is worthless
  - small TLBs are worthless anyway
  - large TLBs are expensive
    - high associativity is likely
• ==> Price of CPUs is going up!
  - OK as long as performance goes up faster

Selecting a Page Size

• Reasons for larger page size
  - Page table size is inversely proportional to the page size; therefore memory is saved
  - Fast cache hit time is easy when cache size < page size (VA caches); a bigger page makes this feasible as cache size grows
  - Transferring larger pages to or from secondary storage, possibly over a network, is more efficient
  - The number of TLB entries is restricted by clock cycle time, so a larger page size maps more memory, thereby reducing TLB misses
• Reasons for a smaller page size
  - Want to avoid internal fragmentation: don’t waste storage; data must be contiguous within a page
  - Quicker process start for small processes: don’t need to bring in more memory than needed

Memory Protection

• With multiprogramming, a computer is shared by several programs or processes running concurrently
  - Need to provide protection
  - Need to allow sharing
• Mechanisms for providing protection
  - Provide Base and Bound registers: Base ≤ Address ≤ Bound
  - Provide both user and supervisor (operating system) modes
  - Provide CPU state that the user can read, but cannot write
    - Base and bound registers, user/supervisor bit, exception bits
  - Provide method to go from user to supervisor mode and vice versa
    - system call: user to supervisor
    - system return: supervisor to user
  - Provide permissions for each page or segment in memory
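The base-and-bound check, together with the rule that only supervisor mode may change the registers, can be sketched as below. The dictionary used for CPU state, the exception class, and the function names are all assumptions made for the example; real hardware performs these checks on every access without software involvement.

```python
class ProtectionFault(Exception):
    """Raised when an access or a register write violates protection."""


def check_access(cpu_state, address):
    """Allow the access only if Base <= Address <= Bound."""
    if not (cpu_state["base"] <= address <= cpu_state["bound"]):
        raise ProtectionFault(f"address {address:#x} outside base/bound")
    return address


def set_base_bound(cpu_state, base, bound):
    """User code may read base/bound, but only supervisor mode may write them."""
    if cpu_state["mode"] != "supervisor":
        raise ProtectionFault("base/bound are writable in supervisor mode only")
    cpu_state["base"], cpu_state["bound"] = base, bound


cpu = {"mode": "supervisor", "base": 0, "bound": 0}
set_base_bound(cpu, 0x1000, 0x1FFF)   # OS sets up the region for a process
cpu["mode"] = "user"                  # system return: supervisor to user
print(hex(check_access(cpu, 0x1234)))
```

An access to 0x2000 from this process would raise `ProtectionFault`, as would any attempt by the user-mode process to call `set_base_bound` itself.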

Pitfall: Address space too small

• One of the biggest mistakes that can be made when designing an architecture is to devote too few bits to the address
  - address size limits the size of virtual memory
  - difficult to change since many components depend on it (e.g., PC, registers, effective-address calculations)
• As program size increases, larger and larger address sizes are needed
  - 8 bit: Intel 8080 (1975)
  - 16 bit: Intel 8086 (1978)
  - 24 bit: Intel 80286 (1982)
  - 32 bit: Intel 80386 (1985)
  - 64 bit: Intel Merced (1998)

Virtual Memory Summary

• Virtual memory (VM) allows main memory (DRAM) to act like a cache for secondary storage (magnetic disk).
• The large miss penalty of virtual memory leads to different strategies from cache
  - Fully associative, TB + PT (translation buffer plus page table), LRU, Write-back
• Designed as
  - paged: fixed size blocks
  - segmented: variable size blocks
  - hybrid: segmented paging or multiple page sizes
• Avoid small address size

Summary 2: Typical Choices

Option                  TLB                        L1 Cache          L2 Cache          VM (page)
Block Size              4-8 bytes (1 PTE)          4-32 bytes        32-256 bytes      4k-16k bytes
Hit Time                1 cycle                    1-2 cycles        6-15 cycles       10-100 cycles
Miss Penalty            10-30 cycles               8-66 cycles       30-200 cycles     700k-6M cycles
Local Miss Rate         .1-2%                      .5-20%            13-15%            .00001-.001%
Size                    32B-8KB                    1-128 KB          256KB-16MB        (not given)
Backing Store           L1 Cache                   L2 Cache          DRAM              Disks
Q1: Block Placement     Fully or set associative   DM                DM or SA          Fully associative
Q2: Block ID            Tag/block                  Tag/block         Tag/block         Table
Q3: Block Replacement   Random (not last)          N.A. for DM       Random (if SA)    LRU/LFU
Q4: Writes              Flush on PTE write         Through or back   Write-back        Write-back

The End Lecture 9