Error Patterns in MLC NAND Flash Memory:
Measurement, Characterization, and Analysis
Yu Cai¹, Erich F. Haratsch², Onur Mutlu¹, and Ken Mai¹
1. DSSC, ECE Department, Carnegie Mellon University
2. LSI Corporation
03/14/2012
Evolution of NAND Flash Memory
CMOS scaling
More bits per Cell
Seaung Suk Lee, “Emerging Challenges in NAND Flash Technology”, Flash Summit 2011 (Hynix)
Flash memory widening its range of applications
Portable consumer devices, laptop PCs and enterprise servers
Reliability and Endurance Challenges for
NAND Flash Memories
Endurance continues to deteriorate
NAND flash memory offers only a few thousand reliable P/E cycles
Error correction capability requirements of ECC keep increasing
Big gap between MLC flash endurance and storage reliability requirements
Enterprise storage needs >50k P/E cycles
Future NAND Flash Storage Architecture
Noisy memory (raw bit error rate) -> Signal processing -> Error correction -> BER < 10^-15
Signal Processing
• Read voltage adjusting
• Data scrambler
• Data recovery
• Soft-information estimation
Error Correction
• Hamming codes
• BCH codes
• Reed-Solomon codes
• LDPC codes
• Other flash-friendly codes
Need to understand NAND flash error patterns
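To make the gap between the raw bit error rate and the BER < 10^-15 target concrete, the following is a minimal sketch of how often a t-error-correcting code would see more errors than it can fix. It assumes independent bit errors (a binomial model) and reports the probability of an uncorrectable codeword, which is a related quantity and not the post-correction BER itself. The function name and the example parameters are illustrative assumptions, not values from the slides.

```python
import math

def codeword_failure_prob(n_bits: int, t_correctable: int, raw_ber: float) -> float:
    """Probability that an n-bit codeword holds more than t bit errors.

    Assumes independent bit errors with probability raw_ber (0 < raw_ber < 1).
    Terms are computed in the log domain to avoid overflow for large n.
    """
    if not 0.0 < raw_ber < 1.0:
        raise ValueError("raw_ber must be strictly between 0 and 1")
    log_p = math.log(raw_ber)
    log_q = math.log1p(-raw_ber)
    total = 0.0
    for k in range(t_correctable + 1, n_bits + 1):
        # log of C(n, k) * p^k * (1 - p)^(n - k)
        log_term = (math.lgamma(n_bits + 1) - math.lgamma(k + 1)
                    - math.lgamma(n_bits - k + 1)
                    + k * log_p + (n_bits - k) * log_q)
        total += math.exp(log_term)
    return total

# Hypothetical example: 9000-bit codeword, raw BER of 1e-3.
for t in (20, 40, 60):
    print(t, codeword_failure_prob(9000, t, 1e-3))
```

Under this model, a higher raw bit error rate has to be matched by a larger correction capability t to keep the failure probability at the same level, which is the trend the endurance slide describes.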
Test System Infrastructure
Algorithms
Wear Leveling
Address Mapping
Garbage Collection
ECC
(BCH, RS, LDPC)
Signal Processing
Control Firmware
1. Reset
2. Erase block
3. Program page
4. Read page
Software Platform
USB Driver
USB PHY Chip
FPGA
USB controller
NAND
Controller
Flash
Memories
Host USB PHY
Host Computer
USB Daughter Board
Mother Board
Flash Board
NAND Flash Testing Platform
USB Daughter Board
USB Jack
HAPS-52 Mother Board
Virtex-V FPGA
(NAND Controller)
Virtex-II Pro
(USB controller)
3x-nm
NAND Flash
NAND Daughter Board
NAND Flash Usage and Error Model
Start -> P/E cycle 0 ... P/E cycle i ... P/E cycle n -> End of life
Within each P/E cycle:
• Erase block (erase errors)
• Program page, Page0 - Page128 (program errors)
• Read page (read errors)
• Retention 1, t1 days (retention errors), then read page (read errors)
• …
• Retention j, tj days (retention errors), then read page (read errors)
Testing Methodology
Erase errors
Count the number of cells that fail to be erased to the “11” state
Program interference errors
Compare the data read immediately after a page is programmed with the data
read after the whole block has been programmed
Read errors
Continuously read a given block and compare the data between
consecutive read sequences
Retention errors
Compare the data read before and after the retention period
Characterize short-term retention errors at room temperature
Characterize long-term retention errors by baking in an oven at 125 °C
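Each of the comparisons above reduces to counting the bits that differ between two reads of the same page. The following is a minimal sketch of that step, assuming page contents are available as `bytes` objects. The function names are illustrative and are not part of the test platform described in these slides.

```python
def count_bit_errors(reference: bytes, observed: bytes) -> int:
    """Count the bit positions where two equally sized page reads differ."""
    if len(reference) != len(observed):
        raise ValueError("page reads must have the same length")
    # XOR leaves a 1 in every differing bit position; count those bits.
    return sum(bin(a ^ b).count("1") for a, b in zip(reference, observed))

def bit_error_rate(reference: bytes, observed: bytes) -> float:
    """Fraction of differing bits between two reads of the same page."""
    if not reference:
        raise ValueError("page reads must not be empty")
    return count_bit_errors(reference, observed) / (8 * len(reference))

# Example: the two-byte reads below differ in two bit positions.
before = bytes([0xFF, 0x00])
after = bytes([0xFE, 0x01])
print(count_bit_errors(before, after), bit_error_rate(before, after))
```

For retention errors, `reference` would be the read taken before the retention period and `observed` the read taken after it. For program interference, the two reads are taken right after the page is programmed and after the whole block is programmed.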
Flash Error Rates Comparison
retention errors
Error rate increases with P/E cycles
Retention errors are the most dominant errors
Retention error rates increase as retention time increases
Retention Error Mechanism
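[Figure: threshold voltage (Vth) distributions of the four states (LSB/MSB), from erased to fully programmed: 11, 10, 01, 00, separated by read references REF1, REF2, REF3. Labels: Stress Induced Leakage Current (SILC), floating gate.]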
Electron loss from the floating gate causes retention errors
Cells with more programmed electrons suffer more from retention errors
Threshold voltage is more likely to shift one interval than multiple intervals
Retention Error Value Dependency (3 months)
00->01
01->10
Cells with more programmed electrons tend to suffer more from
retention noise (i.e. 00 and 01)
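The link between electron loss and the misread value can be sketched by modeling a read as a comparison of the cell's threshold voltage against the three read references. The reference voltages and the size of the shift below are made-up illustrative numbers, not measurements from these slides. Only the state order (11, 10, 01, 00 from erased to fully programmed) comes from the slides.

```python
import bisect

# States ordered from lowest to highest threshold voltage (erased -> fully programmed).
STATES = ["11", "10", "01", "00"]

# Hypothetical read reference voltages REF1 < REF2 < REF3, in arbitrary volts.
REFS = [1.0, 2.0, 3.0]

def read_state(vth: float, refs=REFS) -> str:
    """Return the 2-bit state whose voltage window contains vth."""
    # bisect_right counts how many references lie at or below vth.
    return STATES[bisect.bisect_right(refs, vth)]

# A fully programmed cell that loses charge drifts down by one interval.
vth_programmed = 3.4             # hypothetical Vth right after programming "00"
vth_after_retention = 3.4 - 0.6  # hypothetical downward shift from electron loss
print(read_state(vth_programmed), "->", read_state(vth_after_retention))
```

In this model a downward shift can only move a cell toward the erased side, and a small shift crosses at most one reference, which matches the single-interval errors listed above.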
2-bit MLC Background Overview
Internal Architecture of 2-bit NAND Flash Memory
LSB-Even Page Sets
LSB-Odd Page Sets
MSB-Even Page Sets
MSB-Odd Page Sets
Retention Error Location Dependency
LSB pages have lower BER
Even pages have lower BER
[Figure: Vth distributions of odd-page cells and even-page cells for states 11, 10, 01, 00 (LSB/MSB), with read references REF1, REF2, REF3]
Program interference
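[Figure: threshold voltage (VT) distributions of the four states (LSB/MSB), from erased to fully programmed: 11, 10, 01, 00, separated by read references REF1, REF2, REF3. Labels: additional electrons injected, floating gate.]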
Program interference errors are caused by extra electrons injected
when neighboring cells are programmed
Cells with fewer programmed electrons suffer more from interference errors
Threshold voltage is less likely to shift up by more than one level
Program Interference Error Value
Dependency
10->01
11->10
Cells with fewer programmed electrons tend to suffer more from
neighboring cell interference (i.e. 11 and 10)
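A value-dependency breakdown like this one can be produced by tallying, for every cell that was misread, which state was written and which state was read back. The following is a minimal sketch of that bookkeeping. The cell values in the example are invented for illustration, and the function name is an assumption.

```python
from collections import Counter

def tally_transitions(written, read_back):
    """Count (written_state, read_state) pairs for cells that were misread."""
    if len(written) != len(read_back):
        raise ValueError("both sequences must cover the same cells")
    return Counter((w, r) for w, r in zip(written, read_back) if w != r)

# Invented example: five cells, two of them misread one level higher.
written = ["11", "10", "00", "01", "11"]
read_back = ["10", "01", "00", "01", "11"]
for (w, r), count in tally_transitions(written, read_back).items():
    print(f"{w}->{r}: {count}")
```

Dividing each count by the number of cells written with that value would give a per-value error rate, which makes the asymmetry between lightly and heavily programmed states visible.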
Program Interference Error Location
Dependency
Program interference errors appear in even-MSB pages
BER of bottom pages is orders of magnitude higher
Write Interference on bottom wordline
[Figure: bias conditions along a NAND string (SGS, WL0, WL n, WL31, SGD, bitline) during programming. Labels: 0 V, GND, Vpgm (20 V), Vpass (10 V), Vdd. Channel voltage plotted between 0 V and 10 V.]
The potential at the drain edge of the SGS transistor is raised by channel
boosting
Electrons are accelerated between SGS and WL0 and are likely to be
injected into the floating gate of WL0
HCI noise is generated by source/drain hot electrons in WL0
The threshold voltage of cells on WL0 shifts right and can even shift
across more than one level (e.g. 11->01 or 11->00)
Read Error Analysis
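[Figure: threshold voltage (VT) distributions of the four states, from erased to fully programmed: 11, 10, 01, 00, separated by read references REF1, REF2, REF3. Label: floating gate.]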
Erase Errors Analysis
Continuous erases can significantly reduce errors
[Figure: cell under erase bias. Labels: 0 V, +18 V, n+, n+, remove residual electrons]
Conclusions & Future work
Flash errors can occur during any operation
Erase error, program error, retention error and read error
Retention errors are the most dominant errors
Flash errors show explainable error patterns
Cycle-dependency, value-dependency and location-dependency
Understanding of modern flash memory error patterns will
enable designing effective error tolerance mechanisms
Value-asymmetry aware coding techniques
Cell location-aware wear leveling mechanisms