Computer Organization - Murray State University

Computer Organization
CSC 405
Multi-Level Memory
Memory Hierarchies

              CPU Registers   Cache        Main Memory    Secondary Storage
Speed         very fast       very fast    fast           slow
Technology    SRAM            SRAM         DRAM           mag/optical
Size          tiny            small        large          very large
Capacity      128 B-4 KB      32 KB-4 MB   4 MB-512 MB    100 MB-16 GB
The limitations of technology prevent memory from being at once cheap,
fast, and large. The fact that memory requests are non-random provides the
opportunity to significantly enhance performance with a memory hierarchy.
Multi-Level Memory Organization
Multi-level memories are used to improve performance in computer systems. We want
transparent (no user/programmer management required), large, high-speed memory.
Very large memories have historically been slower for a number of reasons: the
cost of memory increases with the speed of access, and the speed of access is a
function of the size of the memory.
Set-associative memory access is very fast for cache memory, mainly because the
cache is small compared to main memory (RAM).
As machine speeds increase we are quickly approaching a speed limit we will not
likely be able to break: the speed of light (3x10^8 m/s). Approximately how far
does light travel in a nanosecond (1x10^-9 s)? This distance (~30 cm) is the
popular notion for how fast signals move in a wire, but signals in a wire
actually propagate at only about 1/10 the speed of light. Therefore, we should
not expect to obtain access times that would require a signal to travel farther
than 3 cm in a nanosecond.
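Working out the arithmetic:

$$d = c \, t = (3 \times 10^{8}\ \mathrm{m/s})(1 \times 10^{-9}\ \mathrm{s}) = 0.3\ \mathrm{m} = 30\ \mathrm{cm},$$

and at one tenth that speed a signal covers only about 3 cm per nanosecond.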
Currently, the best multi-level memory access times are ~2 ns for L1 cache
(memory on the CPU), ~4 ns for L2 cache, and ~10 ns for RAM (due to the system bus).*
*These numbers will be outdated by the time this page is published.
By now you should be working diligently on the following homework problem:
A 450MHz Pentium with 32KBytes L1 cache, 128MBytes RAM, and a 133MHz system bus runs a program with an
average working set size of 80KBytes. While in a working set the program has a 0.9997 probability that the next
memory request will be from this working set and a 0.9 probability that the next memory request will be the next
instruction/data value in memory (i.e. 10% of the time a request is from a random memory address in the working
set). (Note: when the program changes working sets, it will begin making memory requests from the new working
set with 0.9997 probability.)
Your task...
(1) Determine how much (if any) performance improvement could be achieved by adding a 256-KByte L2 cache (access
speed = 450/2 MHz) to the processor.
(2) Determine what size memory blocks should be moved between cache and RAM.
(3) Give an outline of a memory caching strategy that makes sense.
It is strongly recommended that you take the time to investigate the details of cache
memory, especially the operation of the cache controller chip-set. As a guide, try to
answer the following questions about the operation of the Pentium II/III and the related
cache controller.
When there is a cache miss, how many words of memory does the CPU need to be able to continue processing?
When the cache memory is replaced, how many words are transferred? (assume 4K).
What hardware component is responsible for the transfer of blocks of memory to cache?
Given the stats listed in the problem, what will happen to the hit-ratio while a new block of memory is being loaded
into cache?
Effective Latency
A useful performance parameter is the effective latency. If the needed word is
found in a level of the hierarchy, it is a hit; if a request must be sent to the
next lower level, the request is said to miss. If the latency in the case of a
hit is $L_{hit}$ and the latency in the case of a miss is $L_{miss}$, the
effective latency for that level in the hierarchy can be determined from the
hit ratio $H_{ratio}$:

$$L_{eff} = L_{hit} \cdot H_{ratio} + L_{miss} \cdot (1 - H_{ratio})$$
[Figure: hierarchy example — Cache: $L_{hit}$ = 2 ns, $H_{ratio}$ = 0.94, $L_{miss}$ = 20 ns; Main memory: $L_{hit}$ = 10 ns, $H_{ratio}$ = 0.998, $L_{miss}$ = 20 µs; misses at main memory go to secondary storage.]
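Plugging the figure's numbers into the formula, and reading each level's miss
latency as the cost of going one level down:

$$L_{eff}^{cache} = (2\ \mathrm{ns})(0.94) + (20\ \mathrm{ns})(1 - 0.94) = 3.08\ \mathrm{ns}$$

$$L_{eff}^{main} = (10\ \mathrm{ns})(0.998) + (20{,}000\ \mathrm{ns})(1 - 0.998) = 49.98\ \mathrm{ns}$$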
Bandwidth
Another measure of performance is bandwidth, which is the rate at which
information can be transferred from the memory system. If R is the number
of requests that the memory can service simultaneously, then

$$BW = \frac{R}{L}$$
For processors that can produce multiple memory requests, it is important
not only to reduce latency but also to increase bandwidth by designing a
memory system that is capable of servicing multiple requests simultaneously.
Memory hierarchies provide decreased average latency and reduced bandwidth requirements, whereas parallel or interleaved memories provide
higher bandwidth.
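As a minimal sketch of the relation, here BW = R/L is evaluated in Python; the
value R = 4 is an assumed number of simultaneous requests, and the 3.08 ns
effective latency is carried over from the cache example above:

```python
def bandwidth(R, latency_s):
    # BW = R / L: R requests serviced simultaneously, each taking L seconds.
    return R / latency_s

L_eff = 3.08e-9  # effective cache latency from the worked example above (seconds)
R = 4            # assumed number of requests serviced simultaneously (illustrative)
print(f"{bandwidth(R, L_eff):.2e} requests/second")  # ~1.30e+09
```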
Components of Cache Memory
A cache contains redundant copies of portions of the memory address space, which
is wholly contained in the main memory. Caches have much shorter latency times
than main memory. A cache often comprises two memories: a data memory and a tag
memory. The address of each cache line contained in the data memory is stored in
the tag memory, along with the state. To achieve low latency, all memory tags
must be compared concurrently; the construction can be simplified by direct
mapping, in which each memory address maps to a single cache location, as shown.
This simplification is achieved at the cost of a lower hit ratio.

The basic unit of construction of a semiconductor memory system is the bank or
module. A single bank can service only one request at a time; while it is busy
servicing a request, the bank cannot service another, so a memory system that
must service multiple requests concurrently comprises multiple banks.

[Figure: cache organization — each cache entry holds state, tag, and data; the incoming address is split into tag, index, and offset fields; the index is decoded to select an entry; incoming and stored tags are compared to select the data word and signal hit/miss.]
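To make the tag/index/offset flow concrete, here is a minimal sketch of a
direct-mapped cache lookup in Python. The 32-bit address split (16-bit tag,
12-bit index, 4-bit offset) and the cache geometry are assumed for illustration,
not taken from any particular machine:

```python
LINE_SIZE = 16    # bytes per line (2**4 -> 4 offset bits; assumed)
NUM_LINES = 4096  # lines in the cache (2**12 -> 12 index bits; assumed)

# Each entry holds (valid/state, tag, data); all lines start invalid.
cache = [(False, 0, None)] * NUM_LINES

def split_address(addr):
    # Split a 32-bit address into tag, index, and offset fields.
    offset = addr & (LINE_SIZE - 1)
    index = (addr // LINE_SIZE) % NUM_LINES
    tag = addr >> 16  # remaining high-order bits
    return tag, index, offset

def lookup(addr):
    # Direct mapping: the index selects exactly one line, then tags are compared.
    tag, index, _ = split_address(addr)
    valid, stored_tag, data = cache[index]
    if valid and stored_tag == tag:
        return "hit", data
    return "miss", None  # the cache controller must fetch the line from RAM

tag, index, _ = split_address(0x0001_2340)
cache[index] = (True, tag, "line data")  # simulate a fill after a miss
print(lookup(0x0001_2340))  # ('hit', 'line data')
print(lookup(0x0009_2340))  # same index, different tag -> ('miss', None)
```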
Methods of Memory Allocation
Fixed Partition Allocation - each job is
allocated the same amount of memory.
Jobs larger than the allocated space
are split into a number of segments or
overlays and brought into real memory
as needed during program execution.
Jobs smaller than the allocated space
leave portions of the memory unused.
Variable Partition Allocation - each job
is allocated exactly the amount of space
it needs to run to completion. There
could still be a limit in size forcing the
use of overlays for large jobs.
[Figure: fixed partitions, each leaving unused memory, vs. variable partitions allocated from available memory.]
Memory Fragmentation
Although the use of variable-sized partitions improves memory utilization,
memory fragments can still develop as jobs come and go in a variable partition
memory system.
memory compaction - this is
the process of moving jobs in
memory together in order to
open contiguous blocks of
memory large enough to be
useful.
coalescing holes - scanning
memory in order to recognize
that adjacent blocks of
available memory are contiguous and to redefine them as
single blocks.
[Figure: compaction vs. coalescing.]
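A minimal sketch of coalescing in Python, assuming the free list is a list of
(start, size) pairs; the representation is illustrative, not any particular
OS's allocator:

```python
def coalesce(holes):
    # Merge adjacent free blocks; holes is a list of (start, size) pairs.
    if not holes:
        return []
    holes = sorted(holes)                    # scan memory in address order
    merged = [holes[0]]
    for start, size in holes[1:]:
        last_start, last_size = merged[-1]
        if last_start + last_size == start:  # contiguous with the previous hole
            merged[-1] = (last_start, last_size + size)
        else:
            merged.append((start, size))
    return merged

print(coalesce([(300, 20), (0, 100), (100, 50)]))  # [(0, 150), (300, 20)]
```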
Job Placement Strategies
First Fit - place the job in the first available
memory location large enough to hold it.
Best Fit - place the job in the available
memory block whose size is closest to the job
size.
Worst Fit - place the job in the available
memory block whose size is the most different
(largest). A sketch of all three follows.
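Here are the three strategies in Python, as a minimal sketch over the same
(start, size) free-list representation (illustrative only); each function
returns the index of the chosen hole, or None if the job does not fit:

```python
def first_fit(holes, job):
    # Take the first hole large enough for the job.
    for i, (start, size) in enumerate(holes):
        if size >= job:
            return i
    return None

def best_fit(holes, job):
    # Take the smallest hole that still fits (size closest to the job size).
    fits = [(size, i) for i, (start, size) in enumerate(holes) if size >= job]
    return min(fits)[1] if fits else None

def worst_fit(holes, job):
    # Take the largest hole (size most different from the job size).
    fits = [(size, i) for i, (start, size) in enumerate(holes) if size >= job]
    return max(fits)[1] if fits else None

holes = [(0, 40), (100, 75), (300, 50)]
print(first_fit(holes, 50), best_fit(holes, 50), worst_fit(holes, 50))  # 1 2 1
```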
Basic Concepts of Virtual Storage
In general, virtual memory management is the function of the OS that permits a
program which is larger than available memory to operate without additional
control or consideration on the part of the user or application programmer.
[Figure: virtual storage vs. real storage — a memory block allocated to the process resides in real storage; the rest of the process remains in secondary storage.]
The key to controlling virtual storage is disassociating the addresses referenced
in a running process from the addresses available in primary memory.
Pure Paging
An OS that supports only fixed block-size virtual memory management is called a
pure paging system. A virtual address in a pure paging system is an ordered
pair (p,d) where p is the page number and d is the displacement within the page.
[Figure: pure paging address translation — the virtual address is the pair (p, d): page # p and displacement d; p indexes the page map table, located at base address b; each PMT entry holds a secondary storage address, a reference bit r, a state s, and the page frame # p'; the real address is (p', d).]
How many entries are there
in the page map table?
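A minimal sketch of the translation in Python, assuming 4-KByte pages and a tiny
page map table with made-up contents:

```python
PAGE_SIZE = 4096  # bytes per page (assumed)

# Page map table: page # -> page frame # (None means the page is not resident).
pmt = {0: 7, 1: 3, 2: None}

def translate(vaddr):
    p, d = divmod(vaddr, PAGE_SIZE)   # split the virtual address into (p, d)
    frame = pmt.get(p)
    if frame is None:
        raise LookupError("page fault: page %d must be loaded" % p)
    return frame * PAGE_SIZE + d      # the real address is (p', d)

print(hex(translate(4100)))  # page 1, displacement 4 -> frame 3 -> 0x3004
```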
Paging with Associative Mapping
[Figure: paging with associative mapping — the associative map (AM), implemented in hardware, is tried first for page # p; on an AM miss the page map table (PMT) at base address b is used; either path yields page frame # p', and the real address is (p', d).]
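A minimal sketch of the AM-first lookup in Python; the dictionary standing in
for the hardware associative map, and its contents, are purely illustrative:

```python
PAGE_SIZE = 4096

am = {1: 3}               # small associative map: page # -> frame # (tried first)
pmt = {0: 7, 1: 3, 2: 9}  # full page map table (used on an AM miss)

def translate(vaddr):
    p, d = divmod(vaddr, PAGE_SIZE)
    frame = am.get(p)     # try the associative map first
    if frame is None:
        frame = pmt[p]    # fall back to the PMT on an AM miss
        am[p] = frame     # cache the translation for next time
    return frame * PAGE_SIZE + d

print(hex(translate(2 * PAGE_SIZE + 8)))  # AM miss -> PMT -> frame 9 -> 0x9008
```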
Segmentation
If an OS allocates arbitrarily sized memory blocks to suit the needs of processes,
we refer to this as segmentation. In a virtual segmentation system, a
virtual address is an ordered pair (s,d) where s is the segment number and d is
the displacement within that segment. Only processes with their current segments
in primary memory may run.
[Figure: segmentation address translation — virtual address (s, d): segment # s and displacement d; s indexes the segment map table at base address b; the entry holds the segment start address s', and the real address is s' + d.]
Since segment sizes are set by the program size, we cannot simply transfer the
displacement address bits to the low-order end of the real address. The address
of the start of the segment in real memory (s') must be added to the
displacement d.
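A minimal sketch in Python; the segment map table contents (base s' and length
per segment) are made-up values:

```python
# Segment map table: segment # -> (start address s', segment length).
smt = {0: (0x10000, 0x4000), 1: (0x52000, 0x1000)}

def translate(s, d):
    base, length = smt[s]
    if d >= length:
        raise IndexError("displacement outside segment %d" % s)
    return base + d  # real address = s' + d (an addition, not a bit shift)

print(hex(translate(1, 0x20)))  # -> 0x52020
```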
Pentium-II Memory-Management
With 32-bit addressing the Pentium processor has been equipped with sophisticated
memory management hardware and OS software similar to that provided in larger scale
computer systems.
The Pentium II includes hardware support for both paging and
segmentation. These approaches are selectable, allowing the
implementation of four different memory management schemes:
The virtual address is the same as the physical address. This is useful in
low-complexity, high-performance controller applications.
Memory is viewed as a paged linear address space. This is favored by some
operating systems, such as Berkeley UNIX.
Memory is viewed as a collection of logical address spaces, which
guarantees that the segment map table is in-cache with the segment.
Segmentation is used to define logical memory partitions, and paging is
used to manage memory allocation within segments.
Pentium II Segmentation
[Figure: Pentium II segmentation — the virtual address comprises a 2-bit protection field, a 14-bit segment #, and a 32-bit displacement; the segment map table at base address b yields the segment start s', and the real address is s' + d.]
Segmentation increases addressable
memory from 4 GBytes (2^32) up to
64 TBytes (2^46).
Virtual address space is divided into two
parts. Half of virtual memory is global
and half is local and distinct for each
process.
Each segment is protected by privilege
levels and an access attribute. Four levels
of privilege are possible, from 00 = most
protected to 11 = least protected.
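The 64-TByte figure follows directly from the field widths: a 14-bit segment #
selects one of $2^{14}$ segments, each spanning a 32-bit displacement:

$$2^{14}\ \text{segments} \times 2^{32}\ \text{bytes/segment} = 2^{46}\ \text{bytes} = 64\ \text{TBytes}.$$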
The access attribute regulates access to data segments by giving read-write or read-only access.
For program segments, access is limited to read-execute or read-only.
Pentium II Paging
In the Pentium II, the paging mechanism is a two-level table lookup. The first
level is a page directory containing up to 1K entries, which partitions the
4-GByte memory space into 1K page groups (4 MBytes each), each with its own page
table. Each PMT contains up to 1024 entries, corresponding to 4-KByte pages.
[Figure: Pentium II two-level paging — the 32-bit virtual address comprises a 10-bit pmt #, a 10-bit page #, and a 12-bit displacement; the pmt # indexes the page directory at base address b to select a page map table; the PMT entry holds a secondary storage address, reference bit r, state s, and page frame # p'; the real address is (p', d).]
A translation-lookaside buffer
holds up to 32 page table entries
giving faster access to the most
recently used addresses.
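A minimal sketch of the two-level split in Python; the directory and page-table
contents are made-up, and only the 10/10/12-bit address arithmetic follows the
layout above:

```python
def split_linear_address(vaddr):
    # 32-bit address -> 10-bit directory index, 10-bit page #, 12-bit offset.
    dir_index = (vaddr >> 22) & 0x3FF
    page = (vaddr >> 12) & 0x3FF
    offset = vaddr & 0xFFF
    return dir_index, page, offset

# Page directory: directory index -> page map table (made-up contents).
directory = {0: {0: 0x12, 1: 0x34}}

def translate(vaddr):
    dir_index, page, offset = split_linear_address(vaddr)
    frame = directory[dir_index][page]  # two lookups: directory, then PMT
    return (frame << 12) | offset       # the real address is (p', d)

print(hex(translate(0x0000_1ABC)))  # dir 0, page 1 -> frame 0x34 -> 0x34abc
```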