Outline - University of Iowa

55:035
Computer Architecture and Organization
Lecture 6
Outline
- Memory Arrays and Hierarchy
- SRAM Architecture
  - SRAM Cell
  - Decoders
  - Column Circuitry
  - Multiple Ports
- Serial Access Memories
- Flash
- DRAM

Memory Arrays
- Random Access Memory
  - Read/Write Memory (RAM) (Volatile)
    - Static RAM (SRAM)
    - Dynamic RAM (DRAM)
  - Read Only Memory (ROM) (Nonvolatile)
    - Mask ROM
    - Programmable ROM (PROM)
    - Erasable Programmable ROM (EPROM)
    - Electrically Erasable Programmable ROM (EEPROM)
    - Flash ROM
- Serial Access Memory
  - Shift Registers
    - Serial In Parallel Out (SIPO)
    - Parallel In Serial Out (PISO)
  - Queues
    - First In First Out (FIFO)
    - Last In First Out (LIFO)
- Content Addressable Memory (CAM)

Levels of the Memory Hierarchy
- CPU registers: part of the on-chip CPU datapath; ISA: 16-128 registers
- Cache level(s): one or more levels (Static RAM)
  - Level 1: on-chip, 16-64K
  - Level 2: on-chip, 256K-2M
  - Level 3: on- or off-chip, 1M-16M
- Main memory: Dynamic RAM (DRAM), 256M-16G
- Magnetic disk: 80G-300G; interface: SCSI, RAID, IDE, 1394
- Optical disk or magnetic tape
Farther away from the CPU:
- Lower cost/bit
- Higher capacity
- Increased access time/latency
- Lower throughput/bandwidth

Memory Hierarchy Comparisons

Level          Capacity    Access time            Cost
CPU Registers  100s Bytes  <10s ns
Cache          K Bytes     10-100 ns              1-0.1 cents/bit
Main Memory    M Bytes     200 ns-500 ns          $.0001-.00001 cents/bit
Disk           G Bytes     10 ms (10,000,000 ns)  10^-5 - 10^-6 cents/bit
Tape           infinite    sec-min                10^-8

Staging transfer unit between adjacent levels (upper levels are faster, lower levels are larger):

Between levels        Unit             Managed by      Size
Registers and Cache   Instr. operands  prog./compiler  1-8 bytes
Cache and Memory      Blocks           cache cntl      8-128 bytes
Memory and Disk       Pages            OS              4K-16K bytes
Disk and Tape         Files            user/operator   Mbytes

Connecting Memory
- Processor holds the MAR and MDR and connects to memory through:
  - k-bit address bus (from MAR)
  - n-bit data bus (to/from MDR)
  - Control lines (R/W, MFC, etc.)
- Up to 2^k addressable locations
- Word length = n bits

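A quick check of the numbers on this slide: the sketch below computes how much memory a k-bit address bus and an n-bit word can reach. The function name and the example values (k = 16, n = 32) are illustrative assumptions, not taken from the lecture.

```python
def address_space(k: int, n: int) -> dict:
    """Capacity reachable through a k-bit address bus and an n-bit data bus."""
    locations = 2 ** k            # up to 2^k addressable locations
    total_bits = locations * n    # each location holds one n-bit word
    return {"locations": locations, "word_bits": n, "total_bits": total_bits}


# Hypothetical example: 16-bit address bus, 32-bit words.
info = address_space(k=16, n=32)
print(info["locations"])   # 65536
print(info["total_bits"])  # 2097152
```
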
Array Architecture
- 2^n words of 2^m bits each
- If n >> m, fold by 2^k into fewer rows of more columns
- Good regularity: easy to design
- Very high density if good cells are used
[Figure: array of memory cells, 2^(n-k) rows x 2^(m+k) columns. The row decoder takes n-k of the n address bits and drives the wordlines; bitline conditioning sits at the top of the bitlines; column circuitry and a column decoder driven by the remaining k bits deliver the 2^m output bits.]

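The folding arithmetic can be sketched as follows. This is a minimal illustration, not the lecture's method: the function names are hypothetical, and the choice of which address bits go to the row decoder is an assumption (the slide only gives the bit counts).

```python
def fold_array(n: int, m: int, k: int) -> tuple[int, int]:
    """Return (rows, columns) after folding 2^n words of 2^m bits by 2^k."""
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    rows = 2 ** (n - k)      # n-k address bits go to the row decoder
    columns = 2 ** (m + k)   # each row now holds 2^k words of 2^m bits
    return rows, columns


def split_address(addr: int, n: int, k: int) -> tuple[int, int]:
    """Split an n-bit word address into (row, column select).

    Assumption: the low k bits pick the word within a row and the high
    n-k bits pick the row. The slide does not say which bits go where.
    """
    if not 0 <= addr < 2 ** n:
        raise ValueError("address out of range")
    return addr >> k, addr & ((1 << k) - 1)


# 2 kwords x 16 bits (n = 11, m = 4) folded by 2^3 = 8
print(fold_array(n=11, m=4, k=3))            # (256, 128)
print(split_address(0b10000000101, 11, 3))   # (128, 5)
```
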
6T SRAM Cell
- Cell size accounts for most of array size
  - Reduce cell size at expense of complexity
- 6T SRAM Cell
  - Used in most commercial chips
  - Data stored in cross-coupled inverters
- Read:
  - Precharge bit, bit_b
  - Raise wordline
- Write:
  - Drive data onto bit, bit_b
  - Raise wordline
[Figure: 6T cell with bitlines bit and bit_b and wordline word.]

SRAM Read
- Precharge both bitlines high
- Then turn on wordline
- One of the two bitlines will be pulled down by the cell
- Ex: A = 0, A_b = 1
  - bit discharges, bit_b stays high
[Figure: 6T cell schematic with transistors P1, P2, N1, N2, N3, N4, storage nodes A and A_b, bitlines bit and bit_b, and wordline word.]

SRAM Write
- Drive one bitline high, the other low
- Then turn on wordline
- Bitlines overpower cell with new value
- Ex: A = 0, A_b = 1, bit = 1, bit_b = 0
  - Force A_b low
[Figure: 6T cell schematic with transistors P1, P2, N1, N2, N3, N4, storage nodes A and A_b, bitlines bit and bit_b, and wordline word.]

SRAM Column Example
[Figure: two column schematics, one for Read and one for Write. Each shows bitline conditioning, "more cells", and an SRAM cell on wordline word_q1 with bitlines bit_v1f and bit_b_v1f. The read column produces out_v1r and out_b_v1r; the write column is driven by write_q1 and data_s1. A timing diagram shows word_q1, bit_v1f, and out_v1r.]

Decoders
- n:2^n decoder consists of 2^n n-input AND gates
  - One needed for each row of memory
  - Build AND from NAND or NOR gates
[Figure: 2:4 decoder with inputs A1, A0 and outputs word0, word1, word2, word3.]

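The decoder's behavior can be sketched as below: one n-input AND per wordline, each fed the true or complemented address bits. This is a behavioral model only; the function name and bit ordering are assumptions made for the example.

```python
def decode(address_bits: list[int]) -> list[int]:
    """n:2^n decoder built from 2^n n-input AND gates.

    address_bits is [A(n-1), ..., A1, A0]; returns [word0, ..., word(2^n - 1)].
    """
    n = len(address_bits)
    wordlines = []
    for row in range(2 ** n):
        # AND gate for this row: each input is the address bit or its
        # complement, chosen so the gate fires only when address == row.
        gate_inputs = []
        for i, bit in enumerate(address_bits):
            want = (row >> (n - 1 - i)) & 1
            gate_inputs.append(bit if want else 1 - bit)
        wordlines.append(int(all(gate_inputs)))
    return wordlines


print(decode([1, 0]))  # A1=1, A0=0 -> [0, 0, 1, 0] (word2 high)
```
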
Large Decoders
- For n > 4, NAND gates become slow
  - Break large gates into multiple smaller gates
[Figure: 4:16 decoder with inputs A3, A2, A1, A0 and outputs word0 through word15.]

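One way to picture "break large gates into multiple smaller gates" is a tree of 2-input gates per wordline. The sketch below models only the logic, not the delay; the helper names `and2`, `and_tree`, and `decode4` are hypothetical, and a real design would use NAND/NOR stages rather than ideal ANDs.

```python
def and2(a: int, b: int) -> int:
    """2-input AND gate."""
    return a & b


def and_tree(inputs: list[int]) -> int:
    """Wide AND built as a tree of 2-input gates."""
    level = list(inputs)
    while len(level) > 1:
        nxt = [and2(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # odd input passes through to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]


def decode4(a3: int, a2: int, a1: int, a0: int) -> list[int]:
    """4:16 decoder where each wordline uses a tree of 2-input ANDs."""
    bits = [a3, a2, a1, a0]
    words = []
    for row in range(16):
        # True or complemented address bit, so only the matching row fires.
        literals = [b if (row >> (3 - i)) & 1 else 1 - b for i, b in enumerate(bits)]
        words.append(and_tree(literals))
    return words


print(decode4(1, 1, 1, 1).index(1))  # 15 (word15 high)
```
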
Column Circuitry
- Some circuitry is required for each column
  - Bitline conditioning
  - Sense amplifiers
  - Column multiplexing

Bitline Conditioning
- Precharge bitlines high before reads
- Equalize bitlines to minimize voltage difference when using sense amplifiers
[Figures: precharge circuit and equalization circuit, each on bitlines bit and bit_b.]

Differential Pair Amp
- Differential pair requires no clock
- But always dissipates static power
[Figure: differential pair sense amplifier with transistors P1, P2, N1, N2, N3, inputs bit and bit_b, and outputs sense and sense_b.]

Column Multiplexing
- Recall that array may be folded for good aspect ratio
- Ex: 2 kword x 16 folded into 256 rows x 128 columns
  - Must select 16 output bits from the 128 columns
  - Requires 16 8:1 column multiplexers

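The selection step in this example can be sketched as 16 multiplexers that each pick one of 8 columns. The column ordering (bit-interleaved) and the function name are assumptions for illustration; the slide does not specify the layout.

```python
def column_mux(row_bits: list[int], col_select: int,
               word_bits: int = 16, ways: int = 8) -> list[int]:
    """Pick one word_bits-wide word out of a row using word_bits ways:1 muxes.

    Assumption: columns are bit-interleaved, so output bit j is served by
    columns j*ways ... j*ways + ways - 1, and mux j picks column
    j*ways + col_select.
    """
    if len(row_bits) != word_bits * ways:
        raise ValueError("row width must equal word_bits * ways")
    if not 0 <= col_select < ways:
        raise ValueError("col_select out of range")
    return [row_bits[j * ways + col_select] for j in range(word_bits)]


# One 128-column row; mark the columns that belong to word 3 with a 1.
row = [1 if c % 8 == 3 else 0 for c in range(128)]
print(sum(column_mux(row, col_select=3)))  # 16
print(sum(column_mux(row, col_select=4)))  # 0
```
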
Multiple Ports
- We have considered single-ported SRAM
  - One read or one write on each cycle
- Multiported SRAMs are needed for register files
- Examples:
  - Multicycle MIPS must read two sources or write a result on some cycles
  - Pipelined MIPS must read two sources and write a third result each cycle
  - Superscalar MIPS must read and write many sources and results each cycle

Dual-Ported SRAM
- Simple dual-ported SRAM
  - Two independent single-ended reads
  - Or one differential write
- Do two reads and one write by time multiplexing
  - Read during ph1, write during ph2
[Figure: dual-ported cell with bitlines bit and bit_b and wordlines wordA and wordB.]

Multi-Ported SRAM
- Adding more access transistors hurts read stability
- Multiported SRAM isolates reads from state node
- Single-ended design minimizes number of bitlines
[Figure: multiported cell with bitlines bA through bG and wordlines wordA through wordG, divided into write circuits and read circuits.]

Serial Access Memories
- Serial access memories do not use an address
  - Shift Registers
  - Tapped Delay Lines
  - Serial In Parallel Out (SIPO)
  - Parallel In Serial Out (PISO)
  - Queues (FIFO, LIFO)

Shift Register
- Shift registers store and delay data
- Simple design: cascade of registers
  - Watch your hold times!
[Figure: cascade of registers clocked by clk, with input Din and output Dout; annotated "8".]

Denser Shift Registers
- Flip-flops aren't very area-efficient
- For large shift registers, keep data in SRAM instead
- Move the read/write pointers through the RAM rather than moving the data
  - Initialize read address to first entry, write to last
  - Increment address on each cycle
[Figure: dual-ported SRAM with Din and Dout; two counters driven by clk generate readaddr (reset to 00...00) and writeaddr (reset to 11...11).]

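The pointer scheme above can be sketched as a small behavioral model. The class name is hypothetical, and reading and writing in the same cycle on different addresses is assumed to be conflict-free, as it would be on a dual-ported SRAM.

```python
class SramShiftRegister:
    """Shift register that moves pointers through a RAM instead of moving data."""

    def __init__(self, entries: int):
        if entries < 2:
            raise ValueError("need at least 2 entries")
        self.mem = [0] * entries          # stands in for the dual-ported SRAM
        self.readaddr = 0                 # counter reset to the first entry
        self.writeaddr = entries - 1      # counter reset to the last entry

    def clock(self, din: int) -> int:
        """One cycle: read one port, write the other, then bump both counters."""
        dout = self.mem[self.readaddr]
        self.mem[self.writeaddr] = din
        n = len(self.mem)
        self.readaddr = (self.readaddr + 1) % n
        self.writeaddr = (self.writeaddr + 1) % n
        return dout


sr = SramShiftRegister(entries=4)
outputs = [sr.clock(d) for d in [1, 2, 3, 4, 5, 6]]
print(outputs)  # [0, 0, 0, 1, 2, 3]
```

In this model a 4-entry RAM delays the data by 3 cycles, because the write pointer starts one entry behind the read pointer.
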
Tapped Delay Line
- A tapped delay line is a shift register with a programmable number of stages
- Set number of stages with delay controls to mux
  - Ex: 0-63 stages of delay
[Figure: Din passes through shift registers SR32, SR16, SR8, SR4, SR2, SR1 to Dout; a mux after each one, controlled by delay5 through delay0, selects the delayed or bypassed signal; all registers share clk.]

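A behavioral sketch of the 0-63 stage example: six binary-weighted shift registers, each either in the path or bypassed according to one bit of the delay setting. The class name and the stage order (longest first) are assumptions; the total delay does not depend on the order as long as the setting is held constant.

```python
from collections import deque


class TappedDelayLine:
    """Binary-weighted shift register stages, each bypassable by a mux."""

    STAGE_LENGTHS = (32, 16, 8, 4, 2, 1)   # SR32 ... SR1

    def __init__(self):
        # Each stage is a fixed-length shift register, initially all zeros.
        self.stages = [deque([0] * n, maxlen=n) for n in self.STAGE_LENGTHS]

    def clock(self, din: int, delay: int) -> int:
        """Advance one cycle with a 6-bit delay setting (0-63); return Dout."""
        if not 0 <= delay <= 63:
            raise ValueError("delay must be in 0..63")
        signal = din
        for stage, length in zip(self.stages, self.STAGE_LENGTHS):
            oldest = stage[0]        # value shifted in `length` cycles ago
            stage.append(signal)     # every stage shifts on every clk
            if delay & length:       # delay bit set: take the delayed path
                signal = oldest
            # delay bit clear: the mux bypasses this stage
        return signal


line = TappedDelayLine()
outs = [line.clock(d, delay=5) for d in range(1, 11)]
print(outs)  # [0, 0, 0, 0, 0, 1, 2, 3, 4, 5]
```
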
Serial In Parallel Out
- 1-bit shift register reads in serial data
  - After N steps, presents N-bit parallel output
[Figure: shift register clocked by clk with serial input Sin and parallel outputs P0, P1, P2, P3.]

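A minimal behavioral model of the SIPO register follows. The class name is hypothetical, and it assumes P0 is the stage fed directly by Sin, so the most recent bit appears at P0.

```python
class SIPO:
    """N-bit serial-in parallel-out shift register."""

    def __init__(self, n: int = 4):
        self.bits = [0] * n   # bits[0] is P0, assumed to be fed directly by Sin

    def clock(self, sin: int) -> list[int]:
        """Shift one serial bit in; return the parallel outputs [P0, ..., P(N-1)]."""
        self.bits = [sin] + self.bits[:-1]
        return list(self.bits)


reg = SIPO(4)
for bit in [1, 0, 1, 1]:
    parallel = reg.clock(bit)
print(parallel)  # [1, 1, 0, 1]: the last bit in sits at P0, the first at P3
```
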
Parallel In Serial Out
- Load all N bits in parallel when shift = 0
  - Then shift one bit out per cycle
[Figure: shift register with parallel inputs P0, P1, P2, P3, a shift/load control, clk, and serial output Sout.]

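The load-then-shift behavior can be sketched as below. The class name is hypothetical, and two details are assumptions not stated on the slide: the last stage (P3 in a 4-bit register) drives Sout, and zeros shift in behind the data.

```python
from typing import Optional


class PISO:
    """N-bit parallel-in serial-out shift register."""

    def __init__(self, n: int = 4):
        self.bits = [0] * n   # bits[-1] is assumed to drive Sout

    def clock(self, shift: int, parallel_in: Optional[list[int]] = None) -> int:
        """shift = 0 loads parallel_in; shift = 1 shifts one position toward Sout.

        Returns Sout after the clock edge.
        """
        if shift == 0:
            if parallel_in is None or len(parallel_in) != len(self.bits):
                raise ValueError("load needs one bit per stage")
            self.bits = list(parallel_in)
        else:
            self.bits = [0] + self.bits[:-1]   # assumption: zeros shift in
        return self.bits[-1]


reg = PISO(4)
out = [reg.clock(shift=0, parallel_in=[1, 0, 1, 1])]   # load P0..P3
out += [reg.clock(shift=1) for _ in range(3)]          # shift the rest out
print(out)  # [1, 1, 0, 1]: P3 first, P0 last
```
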
Queues
- Queues allow data to be read and written at different rates.
- Read and write each use their own clock, data
- Queue indicates whether it is full or empty
- Build with SRAM and read/write counters (pointers)
[Figure: Queue block with WriteClk, WriteData, and FULL on the write side and ReadClk, ReadData, and EMPTY on the read side.]

FIFO, LIFO Queues
- First In First Out (FIFO)
  - Initialize read and write pointers to first element
  - Queue is EMPTY
  - On write, increment write pointer
  - If write almost catches read, Queue is FULL
  - On read, increment read pointer
- Last In First Out (LIFO)
  - Also called a stack
  - Use a single stack pointer for read and write

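The pointer rules above can be sketched in software as follows. This is a single-clock simplification (the hardware queue uses separate read and write clocks), the class names are hypothetical, and "almost catches" is interpreted as leaving one slot unused so FULL and EMPTY can be told apart.

```python
class Fifo:
    """Circular-buffer FIFO with separate read and write pointers."""

    def __init__(self, entries: int):
        self.mem = [None] * entries
        self.read_ptr = 0      # both pointers start at the first element,
        self.write_ptr = 0     # so the queue starts EMPTY

    @property
    def empty(self) -> bool:
        return self.read_ptr == self.write_ptr

    @property
    def full(self) -> bool:
        # "Write almost catches read": one slot is left unused so that
        # FULL can be told apart from EMPTY.
        return (self.write_ptr + 1) % len(self.mem) == self.read_ptr

    def write(self, value) -> None:
        if self.full:
            raise OverflowError("FIFO is FULL")
        self.mem[self.write_ptr] = value
        self.write_ptr = (self.write_ptr + 1) % len(self.mem)

    def read(self):
        if self.empty:
            raise IndexError("FIFO is EMPTY")
        value = self.mem[self.read_ptr]
        self.read_ptr = (self.read_ptr + 1) % len(self.mem)
        return value


class Lifo:
    """Stack with a single pointer shared by read (pop) and write (push)."""

    def __init__(self, entries: int):
        self.mem = [None] * entries
        self.sp = 0            # index of the next free slot

    def write(self, value) -> None:
        if self.sp == len(self.mem):
            raise OverflowError("LIFO is FULL")
        self.mem[self.sp] = value
        self.sp += 1

    def read(self):
        if self.sp == 0:
            raise IndexError("LIFO is EMPTY")
        self.sp -= 1
        return self.mem[self.sp]


fifo, lifo = Fifo(4), Lifo(4)
for v in "abc":
    fifo.write(v)
    lifo.write(v)
print(fifo.full)                         # True: 3 of 4 slots used
print([fifo.read() for _ in range(3)])   # ['a', 'b', 'c']
print([lifo.read() for _ in range(3)])   # ['c', 'b', 'a']
```
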
Memory Timing: Approaches
- DRAM Timing: Multiplexed Addressing
- SRAM Timing: Self-timed

Non-Volatile Memories
- Floating-gate transistor
[Figure: device cross-section showing source and drain (n+ regions) in a p substrate, with the floating gate and the gate above, separated by oxide layers of thickness tox; schematic symbol with terminals G, D, and S.]

NOR Flash Operations: Erase
[Figure only]

NOR Flash Operations: Program
[Figure only]

NOR Flash Operations: Read
[Figure only]

NAND Flash Memory
[Figure: NAND flash array showing the word line (poly), the unit cell, and the source line (diff. layer). Courtesy Toshiba.]

Read-Write Memories (RAM)
- Static (SRAM)
  - Data stored as long as supply is applied
  - Large (6 transistors/cell)
  - Fast
  - Differential
- Dynamic (DRAM)
  - Periodic refresh required
  - Small (1-3 transistors/cell)
  - Slower
  - Single Ended

1-Transistor DRAM Cell
- Write: Cs is charged or discharged by asserting WL and BL
- Read: Charge redistribution takes place between bit line and storage capacitance
- Voltage swing is small; typically around 250 mV

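The size of the read swing follows from charge conservation between the storage capacitor and the bit line: delta_V = (V_BIT - V_PRE) * C_S / (C_S + C_BL). The sketch below evaluates it; the capacitance and voltage values are illustrative assumptions, not figures from the lecture, so the result differs from the 250 mV typical value quoted above.

```python
def bitline_swing(v_bit: float, v_pre: float, c_s: float, c_bl: float) -> float:
    """Bit-line voltage change after charge sharing with the storage cap.

    delta_V = (V_BIT - V_PRE) * C_S / (C_S + C_BL)
    """
    return (v_bit - v_pre) * c_s / (c_s + c_bl)


# Assumed values: 2.5 V stored for a 1, bit line precharged to 1.25 V,
# 30 fF storage capacitor, 300 fF bit-line capacitance.
C_S, C_BL, V_PRE = 30e-15, 300e-15, 1.25
print(round(bitline_swing(2.5, V_PRE, C_S, C_BL) * 1000, 1))  # 113.6 (mV, stored 1)
print(round(bitline_swing(0.0, V_PRE, C_S, C_BL) * 1000, 1))  # -113.6 (mV, stored 0)
```

Because C_BL is much larger than C_S, only a small fraction of the stored voltage reaches the bit line, which is why each bit line needs a sense amplifier.
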
DRAM Cell Observations
- 1T DRAM requires a sense amplifier for each bit line, due to charge redistribution read-out.
- DRAM memory cells are single ended in contrast to SRAM cells.
- The read-out of the 1T DRAM cell is destructive; read and refresh operations are necessary for correct operation.
- 1T cell requires presence of an extra capacitance that must be explicitly included in the design.
- When writing a "1" into a DRAM cell, a threshold voltage is lost. This charge loss can be circumvented by bootstrapping the word lines to a higher value than VDD.

Sense Amp Operation
[Figure: bit-line voltage V_BL versus time t, with levels V(1), V_PRE, and V(0) marked. The plot shows the small swing ΔV(1) after the word line is activated and the point where the sense amp is activated.]

DRAM Timing