Cache Memory
Ben Aranguren
Contents
Memory Hierarchy
Why Cache?
Principle of Locality
Cache Mapping
Replacement Strategies
Memory Hierarchy
Memory is organized in levels. The fastest and most expensive levels sit near the CPU but are small; the slowest and least expensive levels are farthest from the CPU but are large. Speed, cost, and capacity are conflicting goals.
Memory Hierarchy
Why Cache?
Memory access is slow. Memory is often the bottleneck when executing code, because the CPU is faster than reading from memory. Cache is the solution: it is fast but small, yet large enough to hold the most commonly accessed locations. Cache is placed both physically and logically closer to the CPU.
Principle of Locality
Typically 90% of execution time is spent in just 10% of the code. Temporal locality: the same memory location is likely to be referenced again, usually due to iteration or reuse of subroutines. Spatial locality: memory near a recently referenced location is more likely to be referenced than memory farther away, because data and program code tend to be stored in contiguous blocks.
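As a toy illustration (not from the slides), a simple summation loop exhibits both kinds of locality:

```python
# Hypothetical toy example illustrating locality.
# Temporal locality: 'total' and 'i' are referenced on every iteration.
# Spatial locality: data[i] walks through contiguous memory, so each
# cache line fetched brings in the next several elements as well.
data = list(range(100))
total = 0
for i in range(len(data)):
    total += data[i]
```

Sequential traversal like this is exactly the access pattern caches are built to exploit.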
Taking Advantage of Locality
Remember that cache is faster than memory. If the data we need is in cache, there is no need to retrieve it from memory.
Cache Mapping
To access a 32-byte block from memory, use addresses A31-5 + 00000 through A31-5 + 11111. A31-5 is called the tag; A4-0 is the offset. The data in this block of addresses is called the cache line. A slot consists of: a valid bit, a dirty bit, the tag bits, and the cache line.
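The tag/offset split above can be sketched in Python (a sketch assuming 32-bit addresses and 32-byte lines; `split_address` is a hypothetical helper, not from the slides):

```python
LINE_SIZE = 32       # bytes per cache line -> 5-bit offset (A4-0)
OFFSET_BITS = 5      # log2(32)

def split_address(addr):
    """Split a 32-bit address into (tag, offset) for a 32-byte line."""
    offset = addr & (LINE_SIZE - 1)   # low 5 bits: A4-0
    tag = addr >> OFFSET_BITS         # high 27 bits: A31-5
    return tag, offset
```

All 32 bytes sharing the same tag belong to the same cache line.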
Cache Mapping
The valid bit, if set, means the slot is in use. The dirty bit, if set, means the data has changed since it was loaded from memory. There are 2 methods of writing data back to memory: Write-through: as soon as data changes in the cache, write it to memory. Write-back: if a slot needs to be freed and its dirty bit is set, write the data to memory before clearing the slot.
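A minimal sketch of the two write policies (the slot layout and helper names here are assumptions for illustration, not from the slides):

```python
class CacheSlot:
    """One cache slot: valid bit, dirty bit, tag, and data."""
    def __init__(self):
        self.valid = False
        self.dirty = False
        self.tag = None
        self.data = None

def write(slot, memory, tag, value, write_through=True):
    """Write a value into a slot under either policy."""
    slot.valid = True
    slot.tag = tag
    slot.data = value
    if write_through:
        memory[tag] = value   # write-through: update memory immediately
    else:
        slot.dirty = True     # write-back: defer until eviction

def evict(slot, memory):
    """Free the slot; flush dirty data to memory first (write-back)."""
    if slot.valid and slot.dirty:
        memory[slot.tag] = slot.data
    slot.valid = slot.dirty = False
```

Write-back keeps memory stale until eviction, trading consistency for fewer memory writes.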
Cache Mapping
Cache mapping is needed because there are more main memory blocks than there are cache slots. Three methods:
Direct Mapping
Associative Mapping
Set-Associative Mapping
Direct Mapped
Similar to assigned parking. Consider a parking lot with 1000 spaces (000-999); use the first 3 digits of your SSN to find your parking space.
Cache line mapping for A31-0: the address selects exactly one slot. Algorithm:

slot = find_slot()
if slot.tag == address_tag_bits:
    return slot.data[offset]
else:
    get_from_memory()
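The direct-mapped lookup can be fleshed out as a runnable sketch (the cache geometry of 1024 slots with 32-byte lines is an assumption, and `find_slot` here is a hypothetical stand-in):

```python
OFFSET_BITS = 5                 # 32-byte line
SLOT_BITS = 10                  # assumed: 1024 slots
NUM_SLOTS = 1 << SLOT_BITS

def find_slot(cache, addr):
    """Direct-mapped lookup: each address maps to exactly one slot."""
    offset = addr & ((1 << OFFSET_BITS) - 1)           # A4-0
    index = (addr >> OFFSET_BITS) & (NUM_SLOTS - 1)    # middle bits pick the slot
    tag = addr >> (OFFSET_BITS + SLOT_BITS)            # remaining high bits
    slot = cache[index]
    if slot is not None and slot['tag'] == tag:
        return slot['data'][offset]     # hit
    return None                         # miss: would fetch from memory
```

Only one slot is ever checked, which is why direct mapping needs just a single comparator.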
Direct Mapped
Advantages: easy to implement, low cost, a single direct comparison. Disadvantages: a memory block can only be placed in one cache slot, which can be a problem if frequently referenced locations (e.g., at opposite ends of memory) map to the same slot; only a fraction of cache memory may be used; collisions.
Associative Mapped
Parking lot analogy: SJSU parking. There are more permits than parking spaces, and anybody can park anywhere (spaces are not assigned). Cache line mapping for A31-0. Algorithm:

for tag in tags:
    if tag == address_tag_bits:
        return data[offset]
get_from_memory()
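A runnable sketch of the fully associative search (the slot representation is an assumption for illustration):

```python
OFFSET_BITS = 5   # 32-byte cache line

def assoc_lookup(slots, addr):
    """Fully associative lookup: compare the tag against every slot."""
    tag = addr >> OFFSET_BITS
    offset = addr & ((1 << OFFSET_BITS) - 1)
    for slot in slots:
        if slot['valid'] and slot['tag'] == tag:
            return slot['data'][offset]   # hit
    return None                           # miss: fetch from memory
```

In hardware all the tag comparisons happen in parallel; the loop here is only a software stand-in for that parallel comparison logic.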
Associative Mapped
Advantages: makes use of all cache slots. Disadvantages: expensive to implement; requires additional hardware to do the comparisons.
Set-Associative Mapped
The compromise. Parking lot analogy: instead of numbering the parking spaces 000-999, divide the lot into 10 groups and mark the spaces within each group 00-99; use the first 2 digits of your student ID to find your space within a group. Cache slots are grouped into sets. Cache line mapping:
Set-Associative Mapped
Algorithm:

set = find_set()
for slot in set:
    if slot.tag == address_tag_bits:
        return slot.data[offset]
get_from_memory()
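A runnable sketch of the set-associative lookup (the geometry of 128 sets with 32-byte lines is an assumed example):

```python
OFFSET_BITS = 5
SET_BITS = 7                    # assumed: 128 sets
NUM_SETS = 1 << SET_BITS

def set_assoc_lookup(cache, addr):
    """Search only the one set the address maps to."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    set_index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + SET_BITS)
    for slot in cache[set_index]:          # compare within the set only
        if slot['valid'] and slot['tag'] == tag:
            return slot['data'][offset]    # hit
    return None                            # miss: fetch from memory
```

Compared with the fully associative version, the tag comparison now covers only the few slots in one set.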
Set-Associative Mapped
Advantages: combines the benefits of direct mapping and associative mapping; comparison is only performed within one set. Disadvantages: additional hardware. This is the scheme most commonly used in today's microprocessors.
Replacement Strategies
LRU (Least Recently Used): evict the slot that has not been used for the longest time. LFU (Least Frequently Used): evict the slot that has been accessed least frequently. FIFO (First In, First Out): evict the oldest slot. Random: randomly decide which slot to evict. All of these strategies require additional hardware. Source: http://www.cs.umd.edu/class/spring2003/cmsc311/Notes/Memory
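As an example, LRU can be sketched in software with an ordered dictionary (real caches track recency with dedicated hardware bits; this class and its names are illustrative assumptions):

```python
from collections import OrderedDict

class LRUCache:
    """Toy LRU replacement: evict the least recently used line."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()       # insertion order = recency order

    def access(self, tag, line):
        """Touch a line, loading it and evicting the LRU entry if needed."""
        if tag in self.lines:
            self.lines.move_to_end(tag)          # now most recently used
        else:
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)   # evict LRU entry
            self.lines[tag] = line
        return self.lines[tag]
```

Accessing a resident line refreshes its recency, so a line in steady use is never the eviction victim.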