
CS252
Graduate Computer Architecture
Lecture 16
Memory Technology (Con’t)
Error Correction Codes
John Kubiatowicz
Electrical Engineering and Computer Sciences
University of California, Berkeley
http://www.eecs.berkeley.edu/~kubitron/cs252
Review: 12 Advanced Cache Optimizations
• Reducing hit time
  1. Small and simple caches
  2. Way prediction
  3. Trace caches
• Increasing cache bandwidth
  4. Pipelined caches
  5. Multibanked caches
  6. Nonblocking caches
• Reducing Miss Penalty
  7. Critical word first
  8. Merging write buffers
• Reducing Miss Rate
  9. Victim Cache
  10. Hardware prefetching
  11. Compiler prefetching
  12. Compiler Optimizations
Review: Main Memory Background
• Performance of Main Memory:
  – Latency: Cache Miss Penalty
    » Access Time: time between a request and when the word arrives
    » Cycle Time: minimum time between successive requests
  – Bandwidth: I/O & Large Block Miss Penalty (L2)
• Main Memory is DRAM: Dynamic Random Access Memory
  – Dynamic since it needs to be refreshed periodically (8 ms, ~1% of the time)
  – Addresses divided into 2 halves (memory as a 2D matrix):
    » RAS or Row Address Strobe
    » CAS or Column Address Strobe
• Cache uses SRAM: Static Random Access Memory
  – No refresh (6 transistors/bit vs. 1 transistor/bit)
  – Size ratio DRAM/SRAM: 4-8
  – Cost & cycle-time ratio SRAM/DRAM: 8-16
DRAM Architecture
[Figure: 2^N x 2^M DRAM array. A Row Address Decoder (N address bits) drives 2^N word lines; a Column Decoder & Sense Amplifiers block (M address bits) selects among 2^M bit lines; each memory cell stores one bit at a word-line/bit-line intersection, feeding the data pin D.]
• Bits stored in 2-dimensional arrays on chip
• Modern chips have around 4 logical banks on each chip
– each logical bank physically implemented as many smaller arrays
Review: 1-T Memory Cell (DRAM)
[Figure: one-transistor cell; the "row select" word line gates the access transistor between the "bit" line and the storage capacitor.]
• Write:
  – 1. Drive bit line
  – 2. Select row
• Read:
  – 1. Precharge bit line to Vdd/2
  – 2. Select row
  – 3. Cell and bit line share charges
    » Very small voltage changes on the bit line
  – 4. Sense (fancy sense amp)
    » Can detect changes of ~1 million electrons
  – 5. Write: restore the value
• Refresh
  – 1. Just do a dummy read to every cell.
DRAM Capacitors: more capacitance in a small area
• Trench capacitors:
  – Logic ABOVE capacitor
  – Gain in surface area of capacitor
  – Better scaling properties
  – Better planarization
• Stacked capacitors:
  – Logic BELOW capacitor
  – Gain in surface area of capacitor
  – 2-dim cross-section quite small
DRAM Operation: Three Steps
• Precharge
– charges bit lines to known value, required before next row access
• Row access (RAS)
– decode row address, enable addressed row (often multiple Kb in row)
– bitlines share charge with storage cell
– small change in voltage detected by sense amplifiers which latch
whole row of bits
– sense amplifiers drive bitlines full rail to recharge storage cells
• Column access (CAS)
– decode column address to select small number of sense amplifier
latches (4, 8, 16, or 32 bits depending on DRAM package)
– on read, send latched bits out to chip pins
– on write, change sense amplifier latches, which then charge the storage
cells to the required value
– can perform multiple column accesses on same row without another
row access (burst mode)
DRAM Read Timing (Example)
• Every DRAM access begins with the assertion of RAS_L
• 2 ways to read: early or late relative to CAS
[Figure: 256K x 8 DRAM with RAS_L, CAS_L, WE_L, and OE_L control pins, a 9-bit multiplexed address bus A, and an 8-bit data bus D.]
[Timing diagram: one DRAM read cycle puts a Row Address and then a Col Address on A; D stays high-Z until the read access time plus the output-enable delay, then drives Data Out. Early read cycle: OE_L asserted before CAS_L. Late read cycle: OE_L asserted after CAS_L.]
Main Memory Performance
[Figure: timeline showing the access time (request to data) versus the longer cycle time (request to next request).]
• DRAM (Read/Write) Cycle Time >> DRAM (Read/Write) Access Time
  – Roughly 2:1; why?
• DRAM (Read/Write) Cycle Time:
  – How frequently can you initiate an access?
  – Analogy: a little kid can only ask his father for money on Saturday
• DRAM (Read/Write) Access Time:
  – How quickly will you get what you want once you initiate an access?
  – Analogy: as soon as he asks, his father will give him the money
• DRAM Bandwidth Limitation analogy:
  – What happens if he runs out of money on Wednesday?
Increasing Bandwidth - Interleaving
[Figure: access pattern without interleaving; the CPU waits until D1 is available from a single memory bank before starting the access for D2.]
[Figure: access pattern with 4-way interleaving; the CPU starts accesses to Banks 0, 1, 2, and 3 on successive cycles, and by the time Bank 3 has been started, Bank 0 can be accessed again.]
Main Memory Performance
• Simple:
  – CPU, Cache, Bus, Memory same width (32 bits)
• Wide:
  – CPU/Mux 1 word; Mux/Cache, Bus, Memory N words (Alpha: 64 bits & 256 bits)
• Interleaved:
  – CPU, Cache, Bus 1 word; Memory N modules (4 modules); example is word interleaved
Main Memory Performance
• Timing model
– 1 to send address,
– 4 for access time, 10 cycle time, 1 to send data
– Cache Block is 4 words
• Simple M.P.      = 4 x (1 + 10 + 1) = 48
• Wide M.P.        = 1 + 10 + 1 = 12
• Interleaved M.P. = 1 + 10 + 1 + 3 = 15
[Figure: 4-way word-interleaved memory; Bank 0 holds word addresses 0, 4, 8, 12; Bank 1 holds 1, 5, 9, 13; Bank 2 holds 2, 6, 10, 14; Bank 3 holds 3, 7, 11, 15.]
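As a worked check of this timing model, here is a quick C sketch (assuming the cycle counts above: 1 to send the address, 10 for the memory, 1 to send data, 4-word blocks) that reproduces the 48 / 12 / 15 miss penalties:

#include <stdio.h>

int main(void) {
    const int addr = 1, mem = 10, xfer = 1;   /* send address, memory latency, send data */
    const int words = 4;                      /* cache block size in words */

    int simple      = words * (addr + mem + xfer);      /* each word pays the full trip: 4 x 12 = 48 */
    int wide        = addr + mem + xfer;                 /* one block-wide access: 12 */
    int interleaved = addr + mem + xfer + (words - 1);   /* accesses overlapped across 4 banks: 15 */

    printf("simple=%d wide=%d interleaved=%d\n", simple, wide, interleaved);
    return 0;
}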
Avoiding Bank Conflicts
• Lots of banks
int x[256][512];
for (j = 0; j < 512; j = j+1)
    for (i = 0; i < 256; i = i+1)
        x[i][j] = 2 * x[i][j];
• Even with 128 banks, since 512 is multiple of 128,
conflict on word accesses
• SW: loop interchange or declaring array not power of 2
(“array padding”)
• HW: Prime number of banks
  – bank number = address mod number of banks
  – address within bank = address / number of words in bank
  – modulo & divide per memory access with prime no. banks?
Finding Bank Number and Address within a Bank
• Problem: Determine the number of banks, Nb, and the number of words in each bank, Wb, such that:
  – given address x, it is easy to find the bank where x will be found, B(x), and the address of x within the bank, A(x)
  – for any address x, B(x) and A(x) are unique
  – the number of bank conflicts is minimized
• Solution: Use the following relation to determine B(x) and A(x):

      B(x) = x MOD Nb
      A(x) = x MOD Wb,   where Nb and Wb are co-prime (no common factors)

  – Chinese Remainder Theorem shows that B(x) and A(x) are unique.
• Condition is satisfied if Nb is a prime of the form 2^m - 1:
  – Since 2^k = 2^(k-m) · (2^m - 1) + 2^(k-m), we get 2^k MOD Nb = 2^(k-m) MOD Nb = … = 2^j with j < m
  – And, remember that: (A+B) MOD C = [(A MOD C) + (B MOD C)] MOD C
• Simple circuit for x MOD Nb
  – for every power of 2, compute the single-bit MOD (in advance)
  – B(x) = sum of these values MOD Nb (low-complexity circuit, adder with ~m bits)
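Below is a minimal C sketch (not from the lecture; the 7-bank, 32-words-per-bank configuration is an illustrative assumption) of this mapping. Since 2^m is congruent to 1 mod (2^m - 1), an address is reduced modulo the prime bank count by repeatedly summing its m-bit chunks, which is the low-complexity adder circuit described above:

#include <assert.h>
#include <stdio.h>

#define M  3
#define NB ((1u << M) - 1)   /* Nb = 7 banks: prime, of the form 2^m - 1 (also usable as the m-bit mask) */
#define WB 32u               /* Wb = words per bank; co-prime with Nb (illustrative value) */

/* B(x): bank number, computed by folding m-bit chunks instead of a full divide */
static unsigned bank_of(unsigned x) {
    while (x > NB) {
        unsigned sum = 0;
        while (x) {              /* sum the m-bit chunks; the sum stays congruent to x mod Nb */
            sum += x & NB;
            x >>= M;
        }
        x = sum;
    }
    return (x == NB) ? 0 : x;    /* 2^m - 1 itself is congruent to 0 */
}

/* A(x): address within the bank, using the co-prime modulus Wb */
static unsigned addr_in_bank(unsigned x) { return x % WB; }

int main(void) {
    for (unsigned x = 0; x < 64; x++) {
        assert(bank_of(x) == x % NB);    /* sanity-check the folding trick against a real modulo */
        printf("addr %2u -> bank %u, offset %2u\n", x, bank_of(x), addr_in_bank(x));
    }
    return 0;
}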
Quest for DRAM Performance
1. Fast Page mode
– Add timing signals that allow repeated accesses to row buffer
without another row access time
– Such a buffer comes naturally, as each array will buffer 1024 to
2048 bits for each access
2. Synchronous DRAM (SDRAM)
  – Add a clock signal to the DRAM interface, so that repeated transfers do not bear the overhead of synchronizing with the DRAM controller
3. Double Data Rate (DDR SDRAM)
  – Transfer data on both the rising edge and falling edge of the DRAM clock signal, doubling the peak data rate
  – DDR2 lowers power by dropping the voltage from 2.5 to 1.8 volts and offers higher clock rates: up to 400 MHz
  – DDR3 drops to 1.5 volts with higher clock rates: up to 800 MHz
• Improved Bandwidth, not Latency
Fast Memory Systems: DRAM specific
• Multiple CAS accesses: several names (page mode)
– Extended Data Out (EDO): 30% faster in page mode
• Newer DRAMs to address gap;
what will they cost, will they survive?
– RAMBUS: startup company; reinvented DRAM interface
» Each Chip a module vs. slice of memory
» Short bus between CPU and chips
» Does own refresh
» Variable amount of data returned
» 1 byte / 2 ns (500 MB/s per chip)
– Synchronous DRAM: 2 banks on chip, a clock signal to DRAM, transfer
synchronous to system clock (66 - 150 MHz)
» DDR DRAM: Two transfers per clock (on rising and falling edge)
– Intel claims FB-DIMM is the next big thing
» Stands for “Fully-Buffered Dual-Inline Memory Module”
» Same basic technology as DDR, but utilizes a serial “daisy-chain”
channel between different memory components.
Fast Page Mode Operation
• Regular DRAM Organization:
  – N rows x N columns x M bits
  – Read & write M bits at a time
  – Each M-bit access requires a RAS/CAS cycle
• Fast Page Mode DRAM
  – N x M “SRAM” to save a row
• After a row is read into the register
  – Only CAS is needed to access other M-bit blocks on that row
  – RAS_L remains asserted while CAS_L is toggled
[Figure: array of N rows x N columns feeding an N x M “SRAM” row register and M-bit output; the timing diagram shows one Row Address on A followed by four Col Addresses, so the 1st through 4th M-bit accesses share a single RAS_L assertion while CAS_L toggles.]
SDRAM timing (Single Data Rate)
[Timing diagram: RAS to a new bank, then CAS, the CAS latency, a burst READ, and precharge.]
• Micron 128 Mbit DRAM (using the 2 Meg x 16 bit x 4 bank version)
  – Row (12 bits), bank (2 bits), column (9 bits)
Double-Data Rate (DDR2) DRAM
[Timing diagram from a Micron 256 Mb DDR2 SDRAM datasheet: 200 MHz clock; Row activate, Column read, Precharge, next Row; data is transferred on both clock edges, giving a 400 Mb/s data rate per pin.]
DDR vs DDR2 vs DDR3
• All about increasing the
rate at the pins
• Not an improvement in
latency
– In fact, latency can
sometimes be worse
• Internal banks often
consumed for increased
bandwidth
DRAM name based on Peak Chip Transfers / Sec
DIMM name based on Peak DIMM MBytes / Sec

Standard | Clock Rate (MHz) | M transfers/second | DRAM Name | MBytes/s/DIMM | DIMM Name
DDR      | 133              | 266                | DDR266    | 2128          | PC2100
DDR      | 150              | 300                | DDR300    | 2400          | PC2400
DDR      | 200              | 400                | DDR400    | 3200          | PC3200
DDR2     | 266              | 533                | DDR2-533  | 4264          | PC4300
DDR2     | 333              | 667                | DDR2-667  | 5336          | PC5300
DDR2     | 400              | 800                | DDR2-800  | 6400          | PC6400
DDR3     | 533              | 1066               | DDR3-1066 | 8528          | PC8500
DDR3     | 666              | 1333               | DDR3-1333 | 10664         | PC10700
DDR3     | 800              | 1600               | DDR3-1600 | 12800         | PC12800

(Transfers/second = 2 x clock rate; MBytes/s per DIMM = 8 x M transfers/second)
DRAM Packaging
[Figure: a DRAM chip with ~7 clock and control signals, ~12 multiplexed row/column address lines, and a 4-, 8-, 16-, or 32-bit data bus.]
• DIMM (Dual Inline Memory Module) contains multiple
chips arranged in “ranks”
• Each rank has clock/control/address signals
connected in parallel (sometimes need buffers to
drive signals to all chips), and data pins work
together to return wide word
– e.g., a rank could implement a 64-bit data bus using 16x4-bit
chips, or a 64-bit data bus using 8x8-bit chips.
• A modern DIMM usually has one or two ranks
(occasionally 4 if high capacity)
– A rank will contain the same number of banks as each
constituent chip (e.g., 4-8)
DRAM Channel
[Figure: a memory controller drives a shared command/address bus and a 64-bit data bus to two ranks; each rank is built from four x16 chips, each chip containing multiple banks, so the four 16-bit data connections together form the 64-bit data bus.]
FB-DIMM Memories
• Uses commodity DRAMs with a special controller on the actual DIMM board
• Connection is in a serial form: the controller talks to the first FB-DIMM, which forwards the channel to the next FB-DIMM in the chain
[Figure: regular DIMMs on a parallel bus vs. FB-DIMMs chained serially from the controller.]
FLASH Memory
• Like a normal transistor but:
  – Has a floating gate that can hold charge
  – To write: raise or lower the wordline high enough to cause charges to tunnel
  – To read: turn on the wordline as if a normal transistor
    » presence of charge changes the threshold and thus the measured current
• Two varieties:
  – NAND: denser, must be read and written in blocks
  – NOR: much less dense, fast to read and write
[Photo: Samsung 2007, 16 GB NAND Flash.]
Phase Change memory (IBM, Samsung, Intel)
• Phase Change Memory (called PRAM or PCM)
– Chalcogenide material can change from amorphous to crystalline
state with application of heat
– Two states have very different resistive properties
– Similar to material used in CD-RW process
• Exciting alternative to FLASH
– Higher speed
– May be easy to integrate with CMOS processes
Tunneling Magnetic Junction
• Tunneling Magnetic Junction RAM (TMJ-RAM)
– Speed of SRAM, density of DRAM, non-volatile (no refresh)
– “Spintronics”: combination of quantum spin and electronics
– Same technology used in high-density disk-drives
Big storage (such as DRAM/DISK): Potential for Errors!
• Motivation:
  – DRAM is dense ⇒ signals are easily disturbed
  – High capacity ⇒ higher probability of failure
• Approach: Redundancy
  – Add extra information so that we can recover from errors
  – Can we do better than just create complete copies?
• Block Codes: Data coded in blocks
  – k data bits coded into n encoded bits
  – Measure of overhead: rate of code = k/n
  – Often called an (n,k) code
  – Consider data as vectors in GF(2) [i.e. vectors of bits]
• Code Space is the set of all 2^n vectors, Data space the set of 2^k vectors
  – Encoding function: C = f(d)
  – Decoding function: d = f(C')
  – Not all possible code vectors, C, are valid!
Error Correction Codes (ECC)
• Memory systems generate errors (accidentally flipped bits)
– DRAMs store very little charge per bit
– “Soft” errors occur occasionally when cells are struck by alpha particles
or other environmental upsets.
– Less frequently, “hard” errors can occur when chips permanently fail.
– Problem gets worse as memories get denser and larger
• Where is “perfect” memory required?
– servers, spacecraft/military computers, ebay, …
• Memories are protected against failures with ECCs
• Extra bits are added to each data-word
– used to detect and/or correct faults in the memory system
– in general, each possible data word value is mapped to a unique “code
word”. A fault changes a valid code word to an invalid one - which can
be detected.
General Idea: Code Vector Space
[Figure: the code space, with a valid code word C0 = f(v0) for data value v0; the code distance (Hamming distance) is the spacing between valid code words.]
• Not every vector in the code space is valid
• Hamming Distance (d):
  – Minimum number of bit flips to turn one code word into another
• Number of errors that we can detect: (d-1)
• Number of errors that we can fix: ½(d-1)
Some Code Types
• Linear Codes: C = G · d, S = H · C
  – Code is generated by G and lies in the null space of H
  – (n,k) code: data space 2^k, code space 2^n
  – (n,k,d) code: specify the distance d as well
• Random error code:
  – Need to both identify errors and correct them
  – Distance d ⇒ correct ½(d-1) errors
• Erasure code:
  – Can correct errors if we know which bits/symbols are bad
  – Example: RAID codes, where “symbols” are blocks of disk
  – Distance d ⇒ correct (d-1) erasures
• Error detection code:
  – Distance d ⇒ detect (d-1) errors
• Hamming Codes
  – d = 3 ⇒ columns nonzero, distinct
  – d = 4 ⇒ columns nonzero, distinct, odd-weight
• Binary Golay code: based on quadratic residues mod 23
  – Binary codes: [24, 12, 8] and [23, 12, 7]
  – Often used in space-based schemes; can correct 3 errors
Hamming Bound, symbols in GF(2)
• Consider an (n,k) code with distance d
  – How do n, k, and d relate to one another?
• First question: How big are spheres?
  – For distance d, spheres are of radius ½(d-1),
    » i.e. all errors of weight ½(d-1) or less must fit within the sphere
  – Thus, the size of a sphere is at least:
    1 + Num(1-bit err) + Num(2-bit err) + … + Num(½(d-1)-bit err), i.e.

      Size ≥ Σ_{e=0}^{½(d-1)} C(n, e)

• Hamming bound reflects bin-packing of spheres:
  – need 2^k of these spheres within the code space

      2^k · Σ_{e=0}^{½(d-1)} C(n, e) ≤ 2^n    ⇒    2^k · (1 + n) ≤ 2^n for d = 3
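As a sanity check, the bound is easy to evaluate mechanically; a small C sketch (helper names are my own) tests a few (n,k,d) combinations, including the (7,4,3) Hamming and (8,4,4) SEC-DED codes that appear later in the lecture:

#include <stdio.h>

/* binomial coefficient C(n,e), exact for the small values used here */
static unsigned long long choose(unsigned n, unsigned e) {
    unsigned long long r = 1;
    for (unsigned i = 1; i <= e; i++)
        r = r * (n - e + i) / i;      /* each step yields C(n-e+i, i), so the divide is exact */
    return r;
}

/* Hamming bound: 2^k * sum_{e=0}^{(d-1)/2} C(n,e) <= 2^n */
static int hamming_bound_ok(unsigned n, unsigned k, unsigned d) {
    unsigned long long sphere = 0;
    for (unsigned e = 0; e <= (d - 1) / 2; e++)
        sphere += choose(n, e);
    return (sphere << k) <= (1ULL << n);
}

int main(void) {
    printf("(7,4,3): %s\n", hamming_bound_ok(7, 4, 3) ? "satisfied" : "violated");  /* met with equality */
    printf("(8,4,4): %s\n", hamming_bound_ok(8, 4, 4) ? "satisfied" : "violated");
    printf("(8,5,3): %s\n", hamming_bound_ok(8, 5, 3) ? "satisfied" : "violated");  /* no such code exists */
    return 0;
}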
How to Generate code words?
• Consider a linear code. Need a Generator Matrix.
  – Let vi be the data value (k bits), Ci be the resulting code word (n bits):

      Ci = G · vi        (G must be an n×k matrix)

• Are there 2^k unique code values?
  – Only if the k columns of G are linearly independent!
• Of course, need some way of decoding as well:

      vi = fd(Ci')

  – Is this linear??? Why or why not?
• A code is systematic if the data is directly encoded within the code words.
  – Means the Generator has the form:

      G = [ I ]
          [ P ]

  – Can always turn a non-systematic code into a systematic one (row ops)
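Concretely, encoding with a generator matrix over GF(2) is just an AND per row of G followed by an XOR-reduction. A minimal C sketch (bit-packed vectors and helper names are my own assumptions) shows the idea, using the 3-bit repetition code from a later slide as the G matrix; any other G, such as a Hamming generator, drops in the same way:

#include <stdio.h>
#include <stdint.h>

/* parity of a word: 1 if an odd number of bits are set */
static unsigned parity(uint32_t x) {
    unsigned p = 0;
    while (x) { p ^= 1u; x &= x - 1; }
    return p;
}

/* C = G * v over GF(2).  G is given as n rows, each a k-bit mask. */
static uint32_t encode(const uint32_t *G_rows, int n, uint32_t v) {
    uint32_t c = 0;
    for (int i = 0; i < n; i++)
        c |= (uint32_t)parity(G_rows[i] & v) << i;   /* bit i of the code word */
    return c;
}

int main(void) {
    /* Repetition code (k = 1, n = 3): every row of G is just [1]. */
    const uint32_t G_rep[3] = { 1, 1, 1 };
    for (uint32_t v = 0; v <= 1; v++) {
        uint32_t c = encode(G_rep, 3, v);
        printf("data %u -> code bits C2 C1 C0 = %u %u %u\n",
               (unsigned)v, (unsigned)((c >> 2) & 1),
               (unsigned)((c >> 1) & 1), (unsigned)(c & 1));
    }
    return 0;
}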
Implicitly Defining Codes by Check Matrix
• But – what is the distance of the code? Not obvious
• Instead, consider a parity-check matrix H ((n-k)×n)
  – Compute the following syndrome Si given code element Ci:

      Si = H · Ci

  – Define valid code words Ci as those that give Si = 0 (null space of H)
  – Size of null space? 2^(n - rank H) = 2^k if H has (n-k) linearly independent columns
• Suppose you transmit code word C, and there is an error. Model this as an error vector E which flips selected bits of C to get R (received):

      R = C + E

• Consider what happens when we multiply by H:

      S = H · R = H · (C + E) = H · E

• What is the distance of the code?
  – Code has distance d if no sum of d-1 or fewer columns of H yields 0
  – I.e. no error vector E of weight < d has a zero syndrome
  – Code design: design an H matrix with these properties
How to relate G and H (Binary Codes)
• Defining H makes it easy to understand the distance of the code, but hard to generate the code (H defines the code implicitly!)
• However, let H be of the following form:

      H = [ P | I ]      P is (n-k)×k, I is (n-k)×(n-k)  ⇒  H is (n-k)×n

• Then, G can be of the following form (maximal code size):

      G = [ I ]          I is k×k, P is (n-k)×k  ⇒  G is n×k
          [ P ]

• Notice: G generates values in the null space of H:

      Si = H · G · vi = [ P | I ] · [ I ; P ] · vi = (P + P) · vi = 0   (over GF(2))
Simple example (Parity, D=2)
• Parity code (8 data bits, 9-bit code word):

      G = [ I8              ]   (8×8 identity carrying data bits v7..v0)
          [ 1 1 1 1 1 1 1 1 ]   (all-ones row producing parity bit c8)

      H = [ 1 1 1 1 1 1 1 1 1 ]

  [Figure: the encoder XORs data bits v7..v0 to produce parity bit c8; the checker XORs all nine code bits C8..C0 to produce syndrome s0.]
• Note: Complexity of logic depends on number of 1s in row!
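A tiny C sketch of this parity code (helper names are my own): the parity bit c8 is the XOR of the eight data bits, and the syndrome s0 is the XOR of all nine code bits. A nonzero syndrome detects any odd number of flipped bits but cannot locate them:

#include <stdio.h>

/* even parity of the low 8 bits */
static unsigned parity8(unsigned v) {
    unsigned p = 0;
    for (int i = 0; i < 8; i++) p ^= (v >> i) & 1u;
    return p;
}

int main(void) {
    unsigned data = 0x5A;
    unsigned code = (parity8(data) << 8) | data;     /* code word = c8 | v7..v0 */
    unsigned recv = code ^ (1u << 3);                /* flip one bit in transit  */

    unsigned s_ok  = parity8(code) ^ ((code >> 8) & 1u);
    unsigned s_err = parity8(recv) ^ ((recv >> 8) & 1u);

    printf("syndrome, no error:    %u\n", s_ok);     /* 0 */
    printf("syndrome, 1-bit error: %u (detected, but not locatable)\n", s_err);
    return 0;
}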
Simple example: Repetition (voting, D=3)
• Repetition code (1 data bit, 3-bit code word):

      G = [ 1 ]          H = [ 1 1 0 ]
          [ 1 ]              [ 1 0 1 ]
          [ 1 ]

  [Figure: the encoder copies v0 into C0, C1, C2; the decoder majority-votes C0, C1, C2 so a single error is outvoted.]
• Positives: simple
• Negatives:
– Expensive: only 33% of code word is data
– Not packed in Hamming-bound sense (only D=3). Could get much more
efficient coding by encoding multiple bits at a time
Simple Example: Hamming Code (d=3)
• Example: (7,4) code:
  – Protect 4 data bits d1..d4 with 3 parity bits p1..p3
  – Bit positions 1..7 hold p1 p2 d1 p3 d2 d3 d4 (note: bits are numbered from left to right)
  – p1 covers the positions whose number has bit 0 set: 1 (001), 3 (011), 5 (101), 7 (111)
  – p2 covers the positions whose number has bit 1 set: 2 (010), 3 (011), 6 (110), 7 (111)
  – p3 covers the positions whose number has bit 2 set: 4 (100), 5 (101), 6 (110), 7 (111)

      G = [ 1 1 0 1 ]  (p1)
          [ 1 0 1 1 ]  (p2)
          [ 1 0 0 0 ]  (d1)
          [ 0 1 1 1 ]  (p3)        H = [ 1 0 1 0 1 0 1 ]
          [ 0 1 0 0 ]  (d2)            [ 0 1 1 0 0 1 1 ]
          [ 0 0 1 0 ]  (d3)            [ 0 0 0 1 1 1 1 ]
          [ 0 0 0 1 ]  (d4)
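A compact C sketch (not the lecture's code) of this (7,4) code: with bit positions numbered as above, the syndrome computed from H is simply the binary position of a single flipped bit, so correcting it is one XOR:

#include <stdio.h>
#include <stdint.h>

/* code word bits are kept in bits 1..7 of a byte, matching positions 1..7 */
static unsigned bit(uint8_t c, int pos) { return (c >> pos) & 1u; }

static uint8_t encode(uint8_t d /* d1..d4 in bits 0..3 */) {
    unsigned d1 = d & 1, d2 = (d >> 1) & 1, d3 = (d >> 2) & 1, d4 = (d >> 3) & 1;
    unsigned p1 = d1 ^ d2 ^ d4;          /* covers positions 1, 3, 5, 7 */
    unsigned p2 = d1 ^ d3 ^ d4;          /* covers positions 2, 3, 6, 7 */
    unsigned p3 = d2 ^ d3 ^ d4;          /* covers positions 4, 5, 6, 7 */
    return (uint8_t)((p1 << 1) | (p2 << 2) | (d1 << 3) |
                     (p3 << 4) | (d2 << 5) | (d3 << 6) | (d4 << 7));
}

/* H's column j is the binary representation of j, so S = XOR of set positions */
static unsigned syndrome(uint8_t c) {
    unsigned s = 0;
    for (int pos = 1; pos <= 7; pos++)
        if (bit(c, pos)) s ^= (unsigned)pos;
    return s;                            /* 0 = clean, else = error position */
}

int main(void) {
    uint8_t c = encode(0xB);                         /* data d4 d3 d2 d1 = 1011 */
    printf("clean syndrome:   %u\n", syndrome(c));

    uint8_t r = (uint8_t)(c ^ (1u << 5));            /* flip the bit at position 5 (d2) */
    unsigned s = syndrome(r);
    printf("error syndrome:   %u\n", s);
    r ^= (uint8_t)(1u << s);                         /* correct the flipped bit */
    printf("after correction: %s\n", r == c ? "matches original" : "mismatch");
    return 0;
}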
How to correct errors?
• But – what is the distance of the code? Not obvious
• Instead, consider a parity-check matrix H ((n-k)×n)
  – Compute the following syndrome Si given the received element Ci:

      Si = H · Ci = H · E

• Suppose that two correctable error vectors E1 and E2 produce the same syndrome:

      H · E1 = H · E2  ⇒  H · (E1 + E2) = 0  ⇒  E1 + E2 has d or more bits set

• But, since both E1 and E2 have ≤ ½(d-1) bits set, E1 + E2 has ≤ d-1 bits set, so this cannot be true!
• So, the syndrome is a unique indicator of correctable error vectors
Example, d=4 code (SEC-DED)
• Design H with:
  – All columns non-zero, odd-weight, distinct
    » Note that odd-weight refers to the Hamming weight, i.e. the number of ones
• Why does this generate d=4?
  – Any single-bit error will generate a distinct, non-zero syndrome
  – Any double error will generate a distinct, non-zero syndrome
    » Why? Add together two distinct columns, get a distinct, non-zero result
  – Any triple error will generate a non-zero syndrome
    » Why? Add together three odd-weight values, get an odd-weight value
  – So: need four errors before indistinguishable from a code word
• Because d=4:
  – Can correct 1 error (Single Error Correction, i.e. SEC)
  – Can detect 2 errors (Double Error Detection, i.e. DED)
• Example: S = H · C, with S = (S0, S1, S2, S3)^T, C = (C0 … C7)^T, and

      H = [ 1 1 1 0 1 0 0 0 ]
          [ 1 1 0 1 0 1 0 0 ]
          [ 1 0 1 1 0 0 1 0 ]
          [ 0 1 1 1 0 0 0 1 ]

  – Note: log size of the nullspace will be (columns – rank) = 4, so:
    » Rank = 4, since the 4 rows are independent (the last 4 columns form an identity)
    » Clearly, 8 bits in the code word
    » Thus: an (8,4) code
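A small C sketch (my own helpers) of this (8,4) SEC-DED matrix: it first checks the column design rules, then shows that a single error yields an odd-weight syndrome matching a column (correctable), while a double error yields a nonzero even-weight syndrome (detectable only):

#include <stdio.h>
#include <stdint.h>

/* columns of H above, one per code bit C0..C7; bit i of each value is row Si */
static const uint8_t Hcol[8] = { 0x7, 0xB, 0xD, 0xE, 0x1, 0x2, 0x4, 0x8 };

static unsigned weight(unsigned x) { unsigned w = 0; while (x) { w++; x &= x - 1; } return w; }

static unsigned syndrome(uint8_t code) {
    unsigned s = 0;
    for (int i = 0; i < 8; i++)
        if ((code >> i) & 1u) s ^= Hcol[i];   /* XOR the columns of set code bits */
    return s;
}

int main(void) {
    /* 1. verify the design rules: columns non-zero, odd-weight, distinct */
    for (int i = 0; i < 8; i++) {
        if (Hcol[i] == 0 || (weight(Hcol[i]) & 1u) == 0) printf("bad column %d\n", i);
        for (int j = i + 1; j < 8; j++)
            if (Hcol[i] == Hcol[j]) printf("duplicate columns %d,%d\n", i, j);
    }

    uint8_t code = 0x00;              /* the all-zero word is a valid code word */
    uint8_t one  = code ^ 0x08;       /* single error in C3 */
    uint8_t two  = code ^ 0x28;       /* double error in C3 and C5 */

    printf("single error: syndrome %X (odd weight  -> correctable)\n", syndrome(one));
    printf("double error: syndrome %X (even weight -> detect only)\n", syndrome(two));
    return 0;
}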
Tweaks:
• No reason you cannot make the code shorter than required
• Suppose n-k = 8 bits of parity. What is the max code size (n) for d=4?
  – Maximum number of unique, odd-weight columns: 2^7 = 128
  – So, n = 128. But then k = n – (n – k) = 120. Weird!
  – Just throw out the columns of high weight and make a (72, 64) code!
• But – shortened codes like this might have d > 4 in some special directions
  – Example: Kaneda paper, catches failures of groups of 4 bits
  – Good for catching chip failures when DRAM has groups of 4 bits
• What about the EVENODD code?
  – Can be used to handle two erasures
  – What about two dead DRAMs? Yes, if you can really know they are dead
Aside: Galois Field Elements
• Definition: Field: a complete group of elements with:
  – Addition, subtraction, multiplication, division
  – Completely closed under these operations
  – Every element has an additive inverse
  – Every element except zero has a multiplicative inverse
• Examples:
  – Real numbers
  – Binary, called GF(2) ⇒ Galois Field with base 2
    » Values 0, 1. Addition/subtraction: use xor. Multiplicative inverse of 1 is 1
  – Prime field, GF(p) ⇒ Galois Field with base p
    » Values 0 … p-1
    » Addition/subtraction/multiplication: modulo p
    » Multiplicative inverse: every value except 0 has an inverse
    » Example: GF(5): 1·1 ≡ 1 mod 5, 2·3 ≡ 1 mod 5, 4·4 ≡ 1 mod 5
  – General Galois Field: GF(p^m) ⇒ base p (prime!), dimension m
    » Values are vectors of elements of GF(p) of dimension m
    » Add/subtract: vector addition/subtraction
    » Multiply/divide: more complex
    » Just like real numbers but finite!
    » Common for computer algorithms: GF(2^m)
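A minimal C sketch of GF(2^8) arithmetic as described above (helper names are my own): elements are 8-bit vectors over GF(2), add/subtract are XOR, and multiply is a carry-less multiply reduced modulo an irreducible polynomial. The polynomial 0x11D (x^8 + x^4 + x^3 + x^2 + 1) is a commonly used choice; any irreducible degree-8 polynomial works:

#include <stdio.h>
#include <stdint.h>

#define POLY 0x11Du   /* reduction polynomial for GF(2^8) */

static uint8_t gf_add(uint8_t a, uint8_t b) { return a ^ b; }   /* also subtraction */

static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint16_t acc = 0, aa = a;
    for (int i = 0; i < 8; i++)               /* carry-less shift-and-add multiply */
        if ((b >> i) & 1u) acc ^= (uint16_t)(aa << i);
    for (int i = 15; i >= 8; i--)             /* reduce the 15-bit product mod POLY */
        if ((acc >> i) & 1u) acc ^= (uint16_t)(POLY << (i - 8));
    return (uint8_t)acc;
}

/* a^(2^8 - 2) is the multiplicative inverse of a (for a != 0) */
static uint8_t gf_inv(uint8_t a) {
    uint8_t r = 1;
    for (int i = 0; i < 254; i++) r = gf_mul(r, a);
    return r;
}

int main(void) {
    uint8_t a = 0x57, b = 0x83;
    printf("a+b = %02X, a*b = %02X\n", (unsigned)gf_add(a, b), (unsigned)gf_mul(a, b));
    printf("a * inv(a) = %02X (should be 01)\n", (unsigned)gf_mul(a, gf_inv(a)));
    return 0;
}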
Reed-Solomon Codes
• Galois field codes: code words consist of symbols
  – Rather than bits
• Reed-Solomon codes:
  – Based on polynomials in GF(2^k) (i.e. k-bit symbols)
  – Data as coefficients, code space as values of the polynomial:
  – P(x) = a0 + a1·x + … + a_(k-1)·x^(k-1)
  – Coded: P(0), P(1), P(2), …, P(n-1)
  – Can recover the polynomial as long as you get any k of the n values
• Properties: can choose the number of check symbols
  – Reed-Solomon codes are “maximum distance separable” (MDS)
  – Can add d symbols for a distance d+1 code
  – Often used in “erasure code” mode: as long as no more than n-k coded symbols are erased, can recover the data
• Side note: multiplication by a constant in GF(2^k) can be represented by a k×k matrix: a·x
  – Decompose the unknown vector into k bits: x = x0 + 2·x1 + … + 2^(k-1)·x_(k-1)
  – Each column is the result of multiplying a by 2^i
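A toy C sketch of Reed-Solomon in erasure mode (for brevity it works over the prime field GF(13) instead of GF(2^k); the structure is otherwise as described above): k data symbols are the coefficients of P(x), the n code symbols are P(0), …, P(n-1), and any k surviving symbols let us re-interpolate P and fill in the erased ones via Lagrange interpolation:

#include <stdio.h>

#define P 13   /* field modulus (prime)       */
#define K 3    /* data symbols (coefficients)  */
#define N 5    /* code symbols (evaluations)   */

static int md(int x)  { return ((x % P) + P) % P; }
static int inv(int a) { int r = 1; for (int i = 0; i < P - 2; i++) r = md(r * a); return r; }  /* a^(P-2) */

/* Evaluate the degree-(K-1) interpolant through (xs[i], ys[i]) at point x. */
static int lagrange_eval(const int *xs, const int *ys, int x) {
    int acc = 0;
    for (int i = 0; i < K; i++) {
        int term = ys[i];
        for (int j = 0; j < K; j++)
            if (j != i)
                term = md(term * md(x - xs[j]) * inv(md(xs[i] - xs[j])));
        acc = md(acc + term);
    }
    return acc;
}

int main(void) {
    int data[K] = { 4, 7, 2 };                 /* P(x) = 4 + 7x + 2x^2 */
    int code[N];
    for (int x = 0; x < N; x++) {              /* encode: evaluate P at 0..N-1 (Horner) */
        int v = 0;
        for (int d = K - 1; d >= 0; d--) v = md(v * x + data[d]);
        code[x] = v;
    }

    /* Erase symbols 1 and 3; keep any K = 3 survivors. */
    int xs[K] = { 0, 2, 4 }, ys[K] = { code[0], code[2], code[4] };

    printf("recovered code[1] = %d (was %d)\n", lagrange_eval(xs, ys, 1), code[1]);
    printf("recovered code[3] = %d (was %d)\n", lagrange_eval(xs, ys, 3), code[3]);
    return 0;
}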
Reed-Solomon Codes (con’t)
• Reed-Solomon codes (non-systematic):
  – Data as coefficients, code space as values of the polynomial:
  – P(x) = a0 + a1·x + … + a6·x^6
  – Coded: P(0), P(1), P(2), …, P(6)
  [Matrix: the generator G is a Vandermonde matrix; the row for evaluation point x is (x^0, x^1, x^2, …), multiplied into the coefficient vector (a0, a1, …).]
• Called a Vandermonde Matrix: maximum rank
• Different representation (this H' and G are not related):
  [Matrix: H' built from rows of increasing powers, (1^0 2^0 … 7^0), (1^1 2^1 … 7^1), ….]
  – Clear that all combinations of two or fewer columns are independent ⇒ d=3
  – Very easy to pick whatever d you happen to want
• Fast, Systematic version of Reed-Solomon:
  – Cauchy Reed-Solomon
Conclusion
• Main memory is Dense, Slow
  – Cycle time > Access time!
• Techniques to optimize memory
  – Wider Memory
  – Interleaved Memory: for sequential or independent accesses
  – Avoiding bank conflicts: SW & HW
  – DRAM-specific optimizations: page mode & specialty DRAM
• ECC: add redundancy to correct for errors
  – (n,k,d) ⇒ n code bits, k data bits, distance d
  – Linear codes: code vectors computed by linear transformation
• Erasure code: after identifying “erasures”, can correct
• Reed-Solomon codes
  – Based on GF(p^n), often GF(2^n)
  – Easy to get a distance d+1 code with d extra symbols
  – Often used in erasure mode