Phase Change Memory
What to wear out today?
Chris Craik, Aapo Kyrola, Yoshihisa Abe
Memory Technologies
• Concerns
– Density
– Latency
– Energy
• Off Chip Technologies
– DRAM
• Moderately dense, but not very fast
– Flash
• Fairly dense, but nearly as slow as disk
Evaluation of Technologies

               DRAM            NAND Flash    NOR Flash
Density        1               4             0.25
Read Latency   60ns            25,000ns      300ns
Write Speed    1000MB/s        2.4MB/s       0.5MB/s
Endurance      Eff. Infinite   10^4          10^4
Retention      Refresh         10 Years      10 Years
Phase Change Memory
• Bit recorded in ‘Phase Change Material’
– SET to 1 by heating to crystallization point
– RESET to 0 by heating to melting point
– Resistance indicates state
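A tiny C sketch of the SET/RESET/read behaviour described above: a cell reads as 1 when crystallized (low resistance) and as 0 when amorphous (high resistance). The resistance values and the read threshold are illustrative assumptions, not device parameters.

/* Toy model of a PCM cell: the state is stored as resistance and
   read back by comparing against a threshold.  All numbers here are
   assumed for illustration only. */
#include <stdio.h>

#define R_CRYSTALLINE   10000.0   /* ohms, low resistance  -> bit 1 */
#define R_AMORPHOUS   1000000.0   /* ohms, high resistance -> bit 0 */
#define R_THRESHOLD    100000.0   /* read decision point */

struct pcm_cell { double resistance; };

/* heat to the crystallization point: material crystallizes, bit = 1 */
static void set_bit(struct pcm_cell *c)   { c->resistance = R_CRYSTALLINE; }

/* heat past the melting point and quench: material is amorphous, bit = 0 */
static void reset_bit(struct pcm_cell *c) { c->resistance = R_AMORPHOUS; }

/* read by sensing resistance against the threshold */
static int read_bit(const struct pcm_cell *c) {
    return c->resistance < R_THRESHOLD;
}

int main(void) {
    struct pcm_cell c;
    set_bit(&c);
    printf("after SET:   %d\n", read_bit(&c));   /* prints 1 */
    reset_bit(&c);
    printf("after RESET: %d\n", read_bit(&c));   /* prints 0 */
    return 0;
}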
Phase Change Memory
• Density
– 4x increase over DRAM
• Latency
– Roughly 4x higher than DRAM
• Energy
– No leakage
– Reads are worse (2x), writes much worse (40x)
• Wear out
– Limited number of writes (but better than Flash)
• Non-volatile
– data persists in memory
Evaluation of Technologies

               DRAM            NAND Flash    NOR Flash    PCM
Density        1               4             0.25         2-4
Read Latency   60ns            25,000ns      300ns        200-300ns
Write Speed    1000MB/s        2.4MB/s       0.5MB/s      100MB/s
Endurance      Eff. Infinite   10^4          10^4         10^6 to 10^8
Retention      Refresh         10 Years      10 Years     10 Years
Solutions to wearing & energy
Writes cause thermal expansion and contraction that wears out the material and requires a strong current. But unlike DRAM, PCM does not leak energy.
• Partial writes = write only the bits that have changed (most written bits are redundant!)
a) Caches keep track of the written bytes/words per cacheline (Lee et al.)
• trade-off: storage overhead vs. tracking accuracy
b) When writing a row to memory, first read the old row and compare => write only the modified bits (Zhou et al.; see the sketch below)
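A minimal C sketch of the read-compare-write idea in (b): read the old row, XOR it with the new data, and program only the bits that differ. pcm_write_bit is a hypothetical device-level primitive, not a real API.

/* Differential (partial) write: only changed bits reach the cells. */
#include <stdint.h>
#include <stdio.h>

/* hypothetical: program a single bit cell with a SET/RESET pulse */
static void pcm_write_bit(uint64_t *cell, int bit, int value) {
    if (value)
        *cell |= (1ULL << bit);
    else
        *cell &= ~(1ULL << bit);
}

/* overwrite *cell with new_word, touching only the changed bits;
   returns how many bit writes were actually issued */
static int pcm_partial_write(uint64_t *cell, uint64_t new_word) {
    uint64_t diff = *cell ^ new_word;   /* 1 = bit must change */
    int writes = 0;
    for (int b = 0; b < 64; b++) {
        if (diff & (1ULL << b)) {
            pcm_write_bit(cell, b, (int)((new_word >> b) & 1));
            writes++;
        }
    }
    return writes;
}

int main(void) {
    uint64_t cell = 0xFFFF0000FFFF0000ULL;
    int n = pcm_partial_write(&cell, 0xFFFF0000FFFF00FFULL);
    printf("bit writes issued: %d\n", n);   /* 8 instead of 64 */
    return 0;
}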
Solutions to wearing & energy (cont.)
• Buffer organisation (Lee et al.; toy model sketched after this slide)
– DRAM uses one wide row buffer (2048B)
– propose using up to 32 narrow 64B buffers, each with its own associativity
• captures coalescing writes: temporal locality is more important than spatial locality
• 4 × 512B buffers found most effective
• area-neutral
• also helps decrease latency
• Small DRAM buffer for PCM (Qureshi et al.)
– combines the low latency of DRAM with the high capacity of PCM
– analogous to using a Flash cache for disk
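A toy C model of the narrow-row-buffer idea above: a handful of narrow buffers in which repeated writes to hot lines coalesce before anything is written back to PCM. The buffer count, buffer size, and LRU policy here are illustrative assumptions, not the exact organisation from the paper.

/* Narrow row buffers: writes coalesce here and only evicted dirty
   buffers cause actual PCM writebacks. */
#include <stdint.h>
#include <stdio.h>

#define NUM_BUFFERS  4
#define BUFFER_BYTES 512

struct row_buffer {
    uint64_t tag;      /* which BUFFER_BYTES-aligned region is cached */
    int      valid;
    int      dirty;
    unsigned last_use; /* for LRU replacement */
};

static struct row_buffer bufs[NUM_BUFFERS];
static unsigned tick, pcm_writebacks, buffer_hits;

/* one access: a hit coalesces into an existing buffer; a miss evicts
   (and writes back) the least recently used buffer */
static void access(uint64_t addr, int is_write) {
    uint64_t tag = addr / BUFFER_BYTES;
    int victim = 0;
    tick++;
    for (int i = 0; i < NUM_BUFFERS; i++) {
        if (bufs[i].valid && bufs[i].tag == tag) {
            bufs[i].last_use = tick;
            bufs[i].dirty |= is_write;
            buffer_hits++;
            return;
        }
        if (!bufs[i].valid || bufs[i].last_use < bufs[victim].last_use)
            victim = i;
    }
    if (bufs[victim].valid && bufs[victim].dirty)
        pcm_writebacks++;                 /* a real write reaches PCM */
    bufs[victim] = (struct row_buffer){ tag, 1, is_write, tick };
}

int main(void) {
    /* repeated writes to two hot lines coalesce in the buffers */
    for (int i = 0; i < 100; i++) {
        access(0x1000, 1);
        access(0x2000, 1);
    }
    printf("hits=%u pcm_writebacks=%u\n", buffer_hits, pcm_writebacks);
    return 0;
}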
Solutions to wearing & energy
• Wear leveling (Zhou et al.): spatial locality is now a problem! (a minimal sketch follows this slide)
– row shifting: even out writes among the cells in a row
• needs extra hardware
– segment swapping: even out writes between pages
• implemented in the memory controller
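A minimal C sketch of the row-shifting idea: after a fixed number of writes to a row, its contents are rotated by one byte, and a per-row shift count lets reads undo the rotation, so a hot logical byte does not always hit the same physical cells. The row size and shift interval are illustrative assumptions.

/* Row shifting for wear leveling: periodic one-byte rotation. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define ROW_BYTES      16
#define SHIFT_INTERVAL  4   /* rotate after this many writes */

struct pcm_row {
    uint8_t  cells[ROW_BYTES];  /* physical cells */
    unsigned shift;             /* current rotation, in bytes */
    unsigned writes;            /* writes since the last rotation */
};

/* map a logical byte offset to its current physical cell */
static unsigned physical(const struct pcm_row *r, unsigned logical) {
    return (logical + r->shift) % ROW_BYTES;
}

static void row_write(struct pcm_row *r, unsigned off, uint8_t v) {
    r->cells[physical(r, off)] = v;
    if (++r->writes == SHIFT_INTERVAL) {
        /* rotate the stored data by one byte and bump the shift count */
        uint8_t last = r->cells[ROW_BYTES - 1];
        memmove(r->cells + 1, r->cells, ROW_BYTES - 1);
        r->cells[0] = last;
        r->shift = (r->shift + 1) % ROW_BYTES;
        r->writes = 0;
    }
}

static uint8_t row_read(const struct pcm_row *r, unsigned off) {
    return r->cells[physical(r, off)];
}

int main(void) {
    struct pcm_row row = {0};
    for (int i = 0; i < 20; i++)
        row_write(&row, 0, (uint8_t)i);    /* hot logical byte 0 */
    printf("logical[0]=%u shift=%u\n", row_read(&row, 0), row.shift);
    return 0;
}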
PCM as On-chip Cache
• Hybrid on-chip cache architecture consisting of multiple memory
technologies
• PCM, SRAM, embedded DRAM (eDRAM), and Magnetic RAM (MRAM)
• PCM is slow compared to SRAM etc.
– But high density, non-volatility etc. help
• Use as complement to faster memory technologies
• As “slow” L2 cache, as L3 cache etc.
Cache Structure Example
• Use PCM as huge L3 cache
• SRAM and eDRAM both as L2
– Faster and smaller SRAM region
– Slower and larger eDRAM region
[Figure: a core with L1, 256KB L2 SRAM, and 1MB L3 SRAM, compared against a same-footprint hybrid with a fast 256KB L2 SRAM, a slower <4MB L2 eDRAM, and a 32MB L3 PCM]
• Compared to a 3-level SRAM cache model:
– 18% improvement in instructions per cycle
– Comparable power consumption
– Despite the additional layer of PCM and its large capacity
• Various design possibilities
– PCM as a “third” L2 cache, etc.
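A minimal C sketch of a lookup walking the hybrid hierarchy from the example above. The capacities come from the slide; the latencies and the hit/miss inputs are illustrative assumptions, not measured numbers.

/* Walk the hybrid cache hierarchy, accumulating latency per level probed. */
#include <stdio.h>

struct level {
    const char *name;
    int size_kb;
    int latency_cycles;   /* assumed access latency */
};

static const struct level hierarchy[] = {
    { "L2 SRAM (fast)",  256,       4  },
    { "L2 eDRAM (slow)", 4 * 1024,  12 },
    { "L3 PCM",          32 * 1024, 60 },
};

/* hit[i] says whether level i holds the line; the total latency
   accumulates over every level probed on the way down */
static int access_latency(const int hit[3]) {
    int total = 0;
    for (int i = 0; i < 3; i++) {
        total += hierarchy[i].latency_cycles;
        if (hit[i]) {
            printf("hit in %s after %d cycles\n", hierarchy[i].name, total);
            return total;
        }
    }
    printf("miss in all levels, go to main memory (%d cycles so far)\n", total);
    return total;
}

int main(void) {
    int hit_fast[3] = {1, 0, 0};   /* line found in the fast SRAM region */
    int hit_pcm[3]  = {0, 0, 1};   /* line found only in the PCM L3 */
    access_latency(hit_fast);
    access_latency(hit_pcm);
    return 0;
}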
Summary
• PCM can be a viable approach towards next-generation memory architectures
– High density, non-volatility
– Various techniques to overcome shortcomings
• Limited endurance, high-energy writes, higher latencies
– Could be used as main memory or in on-chip cache hierarchy
Questions
• How well do results obtained on benchmark
apps translate to real usage?
• Variance of endurance of memory cells?
– might some cells wear out very quickly?
• Possibilities enabled by PCM non-volatility
– instant wake-up from hibernation, etc.