Slide 1
Tri-Level-Cell Phase Change Memory: Toward an Efficient and Reliable Memory System
Nak Hee Seong, Sungkap Yeo, Hsien-Hsin Sean Lee

Slide 2: Phase Change Memory
[Figure: a PCM cell in its high-resistivity and low-resistivity states.]
Source: Hegedüs & Elliott, Nature Materials; Sangyeun Cho, MICRO-42

Slide 3: Single-Level Cell vs. Multi-Level Cell
• Two storage levels: 2LC or SLC = one bit per cell (0, 1)
• Four storage levels: 4LC = two bits per cell (00₂, 01₂, 10₂, 11₂)

Slide 4: Single-Level Cell PCM
[Figure: SET and RESET current pulses (current i vs. time t), and the number of cells vs. resistance: the SET state sits near 1k and the RESET state near 1M, a 10³ difference.]

Slide 5: Multi-Level Cell PCM
[Figure: a SET pulse, an intermediate pulse, and a RESET pulse, and the number of cells vs. resistance: storage levels 0 to 3 spread between 1k and 1M.]

Slides 6–9: Resistance Drift (t = 1, 2, 4, 8)
[Figure: the number of cells in storage levels 0 (SET) to 3 (RESET), with decision boundaries between the levels, shown at t = 1, 2, 4, and 8; as t grows, the distributions drift toward higher resistance.]
At t = 8: drift-induced soft errors!

Slide 10: Executive Summary
• Observation: MLC PCM offers capacity but is not reliable
• Goal: capacity and reliability
• Solution: tri-level-cell PCM
• Challenges: conversion and ECC
• Results (over 4LC PCM): 10⁵× lower soft error rates; 36.4% performance improvement
• Results (over SLC PCM): 1.33× higher information density

Slide 11: Outline
1. Multi-Level-Cell PCM
2. Error Models
3. Our Approach
   • Tri-Level-Cell PCM
   • Conversion & ECC
4. Evaluations

Slide 12: Drifted Resistance
• Modeled by a power-law equation [Ielmini et al., IEEE TED and IEDM '07]:

$$R_{drift}(t) = R_I \left(\frac{t}{t_I}\right)^{\alpha}$$

$$\log_{10} R_{drift}(t) = \log_{10} R_I + \alpha \log_{10}\frac{t}{t_I}$$

   I. $\log_{10} R_I$ is tuned between the program boundaries
   II. $\alpha \sim N(\mu_\alpha, \sigma_\alpha^2)$
   III. $\log_{10} R_I \sim N(\mu_R, \sigma_R^2)$
   IV. $\log_{10} R_{drift}(t)$ is read against the sensing boundaries

Slide 13: Probability of Drift-induced Errors (DE)

$$\mathrm{Prob}_{DE} = \int_{\mu_R - 2.75\sigma_R}^{\mu_R + 2.75\sigma_R} \left(1 - \Phi\!\left(\frac{\mu_R + 3\sigma_R - m}{n}\right)\right) f(m)\,dm$$

where $m = \log_{10} R_I$, $n = \log_{10}(t/t_I)$, $\Phi(x)$ is the CDF of the normal distribution of $\alpha$, $N(\mu_\alpha, \sigma_\alpha^2)$, and $f(m)$ is the PDF of $N(\mu_R, \sigma_R^2)$ truncated to the program boundaries. A cell is misread once its drifted value $m + \alpha n$ rises above the upper sensing boundary $\mu_R + 3\sigma_R$, i.e., once $\alpha > (\mu_R + 3\sigma_R - m)/n$; the integrand is the probability of that event for a given $m$.

Storage level | μ_R (log10 R) | σ_R | μ_α   | σ_α
0             | 3.0           | 1/6 | 0.001 | 0.4 × μ_α
1             | 4.0           | 1/6 | 0.02  | 0.4 × μ_α
2             | 5.0           | 1/6 | 0.06  | 0.4 × μ_α
3             | 6.0           | 1/6 | 0.10  | 0.4 × μ_α

• Program boundaries: μ_R − 2.75σ_R to μ_R + 2.75σ_R
• Sensing boundaries: μ_R − 3.0σ_R to μ_R + 3.0σ_R
[Xu et al., TVLSI vol. 19]

Slide 14: Probability of Drift-induced Errors (DE)
[Figure: the program boundaries, the distributions N(μ_α, σ_α²) and N(μ_R, σ_R²), and the sensing boundaries; the tail that drifts past a sensing boundary is the probability of soft error.]

Slide 15: Probability of Soft Error
[Figure: probability of drift-induced errors (y-axis, log scale from 1E-01 to 1E-15) vs. time in seconds (x-axis, 2^1 to 2^17) for storage levels 1 and 2, comparing simulated results with the equation. Annotations: > 17%; ~36 hours (2^17 s ≈ 36.4 hours). For comparison, DRAM: 2.5 ~ 7.5 × 10⁻⁹ % per bit-hour [Schroeder et al., SIGMETRICS '09].]

Slide 16: Naïve Solution (1)
• Fine-tune resistance levels and secure large margins
  – Requires more write-and-verify iterations
    • Compromises write latency
    • Shortens lifetime
[Figure: storage levels 0 to 3 with decision boundaries.]

Slide 17: Naïve Solution (2)
• Periodic reprogramming
  – Similar to DRAM scrubbing
  – Significantly compromises performance
    • Programming a 2-bit cell takes ~1 µs
  – Consumes more write energy
[Figure: storage levels 0 to 3.]
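The drift model of Slides 12–13 can be evaluated numerically. The Python sketch below is illustrative, not the authors' code, and every function name in it is hypothetical. It assumes t_I = 1 s, takes the per-level parameters from the Slide 13 table, evaluates the Slide 13 integral with a midpoint rule, and cross-checks it with a Monte Carlo simulation in which rejection sampling stands in for truncating log10 R_I to the program boundaries (mirroring the "simulated" vs. "equation" curves of Slide 15).

```python
import math
import random

# Per-level parameters from the Slide 13 table [Xu et al., TVLSI vol. 19]:
# level -> (mu_R, sigma_R, mu_alpha) on a log10-resistance scale;
# sigma_alpha = 0.4 * mu_alpha.
LEVELS = {
    0: (3.0, 1 / 6, 0.001),
    1: (4.0, 1 / 6, 0.02),
    2: (5.0, 1 / 6, 0.06),
    3: (6.0, 1 / 6, 0.10),
}


def r_drift(r_i, alpha, t, t_i=1.0):
    """Power law R_drift(t) = R_I * (t / t_I) ** alpha (t_I = 1 s is an assumption)."""
    return r_i * (t / t_i) ** alpha


def log_r_drift(log_r_i, alpha, t, t_i=1.0):
    """Log form: log10 R_drift(t) = log10 R_I + alpha * log10(t / t_I)."""
    return log_r_i + alpha * math.log10(t / t_i)


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def _params(level):
    mu_r, sigma_r, mu_a = LEVELS[level]
    program = (mu_r - 2.75 * sigma_r, mu_r + 2.75 * sigma_r)  # program boundaries
    sense_hi = mu_r + 3.0 * sigma_r  # upper sensing boundary
    return mu_r, sigma_r, mu_a, 0.4 * mu_a, program, sense_hi


def prob_de(level, t, t_i=1.0, steps=2000):
    """Midpoint-rule evaluation of the Slide 13 integral (requires t > t_I)."""
    if t <= t_i:
        raise ValueError("t must exceed t_I so that n = log10(t / t_I) > 0")
    mu_r, sigma_r, mu_a, sigma_a, (lo, hi), sense_hi = _params(level)
    n = math.log10(t / t_i)
    # Normalizer that turns the normal PDF into f(m), truncated to [lo, hi].
    mass = normal_cdf(hi, mu_r, sigma_r) - normal_cdf(lo, mu_r, sigma_r)
    dm = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        m = lo + (k + 0.5) * dm
        f_m = normal_pdf(m, mu_r, sigma_r) / mass
        # Error iff m + alpha * n > sense_hi, i.e. alpha > (sense_hi - m) / n.
        total += (1.0 - normal_cdf((sense_hi - m) / n, mu_a, sigma_a)) * f_m * dm
    return total


def prob_de_monte_carlo(level, t, t_i=1.0, trials=100_000, seed=0):
    """Simulate cells: sample log10 R_I and alpha, count drifted values above sense_hi."""
    rng = random.Random(seed)
    mu_r, sigma_r, mu_a, sigma_a, (lo, hi), sense_hi = _params(level)
    errors = accepted = 0
    while accepted < trials:
        m = rng.gauss(mu_r, sigma_r)
        if not lo <= m <= hi:
            continue  # rejection sampling implements the truncation to [lo, hi]
        accepted += 1
        alpha = rng.gauss(mu_a, sigma_a)
        if log_r_drift(m, alpha, t, t_i) > sense_hi:
            errors += 1
    return errors / trials
```

For example, `prob_de(2, 2**17)` estimates the level-2 error probability after roughly 36 hours, and `prob_de_monte_carlo(2, 2**17)` should agree with it up to sampling noise.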
Slide 18: Outline (same as Slide 11)

Slide 19: Terminology II
Binary system
• Two storage levels: 2LC or SLC = one bit per cell (0, 1)
• Four storage levels: 4LC = two bits per cell (00₂, 01₂, 10₂, 11₂)
Ternary system
• Three storage levels: 0₃, 1₃, 2₃
• 3LC ≈ 1.5 bits per cell (not three bits per cell)

Slide 20: Proposed Solution
• 4-level-cell PCM: unreliable
• Tri-level-cell PCM: remove the most error-prone state (one step at a time)
[Figure: SET and RESET pulses, and the number of cells vs. resistance with three levels L0, L1, L2 between 1k and 1M.]

Slide 21: Bandwidth-Enhanced (BE) 3LC PCM
• Exploit the safety margin: relaxing the programming range reduces programming latency, which increases write bandwidth
[Figure: SET and RESET pulses with two alternative programming pulses (moderate or quenched writing method*), and the three levels L0, L1, L2 between 1k and 1M.]
*[Kang et al., 2008 Symposium on VLSI Technology]

Slide 22: Bandwidth-Enhanced (BE) 3LC PCM
• More capacity than 2LC: log₂3 (≈ 1.58) times denser
• More reliable than 4LC: much lower probability of drift-induced errors
• Longer lifetime than 4LC: fewer write-and-verify iterations for programming
• Higher write bandwidth than 4LC
• Issues
  – Efficient conversion methods
  – Error-correcting schemes

Slide 23: Outline (same as Slide 11)

Slide 24: Efficient Conversion Method
• In theory: 11 binary bits = 2048 states; 7 ternary cells = 2187 states; ~94% utilization
• Our approach: 3 binary bits = 8 states; 2 ternary cells = 9 states; ~89% utilization
  – Simple hardware
  – Realistic chip configuration
• Notation: <3,2> conversion (3 binary bits stored in 2 ternary cells)
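The utilization figures on Slide 24, and the property a <3,2> state map needs so that one drifted ternary cell becomes a single-bit error (Slide 30), can be checked with a short Python sketch. The mapping below is a hypothetical example chosen for illustration; the exact mapping drawn on Slides 25–29 is not recoverable from this transcript. It leaves state 11 unused and assigns Gray-coded 3-bit values around the other eight states, so a one-level upward drift of one cell between used states flips exactly one bit. Drift into the unused state yields a pattern that `decode3` rejects; how a real design handles that case is not shown on the slides.

```python
import math


def utilization(bits, trits):
    """Fraction of the 3**trits ternary states used to hold 2**bits binary values."""
    return 2 ** bits / 3 ** trits


# HYPOTHETICAL <3,2> state map for illustration only (not recovered from the slides):
# the eight outer states of the 3x3 grid of (cell0, cell1) levels are walked as a ring
# and assigned 3-bit values in reflected Gray-code order; the center state (1, 1) is unused.
RING = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
GRAY = [i ^ (i >> 1) for i in range(8)]  # 0, 1, 3, 2, 6, 7, 5, 4: cyclic, one bit per step
ENCODE = dict(zip(GRAY, RING))           # 3-bit value -> (cell0, cell1)
DECODE = {cells: value for value, cells in ENCODE.items()}


def encode3(value):
    """Map a 3-bit value (0..7) to two ternary cell levels."""
    return ENCODE[value]


def decode3(cells):
    """Map two ternary cell levels back to a 3-bit value; None for the unused state."""
    return DECODE.get(cells)


def upward_drifts(cells):
    """States reachable when exactly one cell drifts up by one level."""
    result = []
    for i in range(2):
        if cells[i] < 2:
            moved = list(cells)
            moved[i] += 1
            result.append(tuple(moved))
    return result


def single_bit_on_drift():
    """True if every one-level upward drift between used states flips exactly one bit."""
    for value in range(8):
        for drifted in upward_drifts(encode3(value)):
            other = decode3(drifted)
            if other is not None and bin(value ^ other).count("1") != 1:
                return False
    return True


print(f"11 bits in 7 cells: {utilization(11, 7):.1%} utilization")
print(f"<3,2>: {utilization(3, 2):.1%} utilization, {3 / 2} bits per cell "
      f"(ternary limit {math.log2(3):.3f})")
print("single-bit errors on drift:", single_bit_on_drift())
```

The <3,2> grouping gives up a few percent of utilization relative to the 11-bit/7-cell packing in exchange for a tiny lookup table instead of base-3 arithmetic on 11-bit values.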
Slides 25–29: Efficient <3,2> Conversion
[Figure: the nine two-cell ternary states 00, 01, 02, 10, 11, 12, 20, 21, 22, built up over several slides; Slides 28–29 add the eight 3-bit binary codes 000, 001, 010, 100, 011, 101, 110, 111 beside them to show how binary values map onto ternary states.]

Slide 30: ECC for Tri-Level-Cell PCM
[Figure: a single error in the ternary domain corresponds to a single-bit error in the binary domain.]
• Legacy binary ECC can be used
  – Simple (72, 64) Hamming code
  – The memory controller requires minimal change
• Chip configuration
  – Binary systems: 72 bits = 9 chips × 8 bits per chip
  – Ternary systems: 72 bits = 8 chips × 9 bits per chip, i.e., 6 ternary cells per chip (no redundancy)

Slide 31: Outline (same as Slide 11, with item 2 titled "Error Models & Recent Work")

Slides 32–35: Drift-induced Error Rate
Elapsed time (s) | 3LC PCM     | BE-3LC PCM  | BE-3LC PCM + (72,64) ECC
2^15 (9 hours)   | (too small) | (too small) | (too small)
2^20 (12 days)   | (too small) | 3.60E-16%   | (too small)
2^25 (1 year)    | (too small) | 1.28E-10%   | 2.66E-15%

Slides 36–40: Performance (SPEC2006)
[Figure: IPC (y-axis, 0% to 100%) for 2LC, 4LC, 4LC+LARDD*, 3LC, and BE-3LC.]
*[Awasthi et al., HPCA 2012]
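Slide 30 relies on an off-the-shelf (72,64) binary code. As an illustration only (not the authors' implementation, and not necessarily the exact code a memory controller would use, since other SECDED constructions exist), the Python sketch below builds the textbook extended Hamming code: a Hamming(71,64) code with parity bits at power-of-two positions plus one overall parity bit, giving single-error correction and double-error detection. The helper `ternary_cells_per_chip` is hypothetical; it only checks the slide's arithmetic that 9 bits per chip fill whole <3,2> groups.

```python
import random

# Textbook extended Hamming (72,64) SECDED code, an illustrative stand-in for the
# "(72, 64) Hamming code" on Slide 30.
# Positions 1..71: Hamming(71,64) with parity bits at powers of two. Position 0: overall parity.
PARITY_POS = [1, 2, 4, 8, 16, 32, 64]
DATA_POS = [p for p in range(1, 72) if p & (p - 1)]  # the 64 non-power-of-two positions


def encode(data):
    """Encode a 64-bit integer into a list of 72 bits."""
    if not 0 <= data < 1 << 64:
        raise ValueError("data must fit in 64 bits")
    word = [0] * 72
    for i, pos in enumerate(DATA_POS):
        word[pos] = (data >> i) & 1
    for p in PARITY_POS:
        # Even parity over every position whose index has bit p set.
        word[p] = sum(word[q] for q in range(1, 72) if q & p) % 2
    word[0] = sum(word) % 2  # overall parity makes the whole 72-bit word even
    return word


def decode(word):
    """Return (data, status); status is 'ok', 'corrected', or 'uncorrectable'."""
    word = list(word)
    syndrome = 0
    for q in range(1, 72):
        if word[q]:
            syndrome ^= q
    odd = sum(word) % 2
    if syndrome == 0 and not odd:
        status = "ok"
    elif odd and syndrome <= 71:
        word[syndrome] ^= 1  # single error; syndrome 0 means bit 0 itself flipped
        status = "corrected"
    else:
        # Nonzero syndrome with even parity (double error) or an out-of-range syndrome.
        return None, "uncorrectable"
    data = 0
    for i, pos in enumerate(DATA_POS):
        data |= word[pos] << i
    return data, status


def ternary_cells_per_chip(bits_per_chip=9):
    """Hypothetical helper: ternary cells needed when a chip's bits form <3,2> groups."""
    if bits_per_chip % 3:
        raise ValueError("bits per chip must be a multiple of 3 for <3,2> groups")
    return bits_per_chip // 3 * 2
```

Under a state map with the single-bit property claimed on Slide 30, one drifted ternary cell shows up as one flipped bit, which `decode` reports as "corrected".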
Slides 41–44: Information Density
[Figure: bits per cell (y-axis, 0.80 to 2.00) vs. number of correctable bits (x-axis, 0 to 20). The builds first show 2LC and 4LC, then add 4LC+ECC, 3LC, and BE-3LC+ECC.]

Slide 45: Conclusions
• Goal: reliable multi-level-cell PCM
• Analytical error model for MLC PCM: 4LC PCM is immature
• Tri-level-cell PCM
  – State mapping with <3,2> conversion
  – Delivers 1.33 bits per cell
  – Performance and reliability close to SLC
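The 1.33 bits per cell on Slide 45 (and the 1.33× density over SLC on Slide 10) follows from simple arithmetic: the <3,2> conversion stores 3 bits in 2 cells, and the (72,64) ECC spends 8 of every 72 bits on check bits. The SLC baseline of 1 bit per cell without ECC is an assumption read off the 2LC curve on Slides 41–44. A minimal Python check:

```python
import math
from fractions import Fraction


def bits_per_cell(bits, cells, data_bits=64, code_bits=72):
    """Data bits per cell for a <bits,cells> conversion protected by a (code_bits, data_bits) code."""
    return Fraction(bits, cells) * Fraction(data_bits, code_bits)


be3lc_ecc = bits_per_cell(3, 2)  # <3,2> conversion + (72,64) ECC: 3/2 * 64/72
slc = Fraction(1)                # assumed SLC baseline: 1 bit per cell, no ECC

print("BE-3LC + ECC:", be3lc_ecc, "=", round(float(be3lc_ecc), 2), "bits per cell")
print("density vs. SLC:", round(float(be3lc_ecc / slc), 2), "x")
print("ternary upper bound:", round(math.log2(3), 3), "bits per cell")
```

Using `Fraction` keeps the result exact (4/3), which makes it clear that the 1.33 figure is a rounded ratio rather than a measured value.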