Takahiro Hanyu, Tohoku University

ESSDERC 2010 ITRS Workshop on Emerging Spin and Carbon-based Nanoelectronic Logic
Devices, Barcelo Renacimiento Hotel, Seville, Spain, Sep. 17, 2010
Magnetic FPGAs:
Challenge of Nonvolatile Logic-in-Memory
Architecture Using MOSFETs and Magnetic Tunnel
Junctions
Takahiro Hanyu
Laboratory for Brainware Systems
Research Institute of Electrical Communication (RIEC)
Tohoku University, Japan
Acknowledgements:
This work was supported by the Japan Society for the Promotion of Science
(JSPS) through its “Funding Program for World-Leading Innovative R&D on
Science and Technology (FIRST Program; Prof. Hideo Ohno).”
This work was also supported by the Laboratory for Nanoelectronics and
Spintronics, Tohoku University, Japan.
Outline
• Impact of Nonvolatile (NV) Logic-in-Memory (LIM) Architecture
• Design of an MTJ-Based NV LIM Circuit
• Application 1: NV-FPGA
• Application 2: NV-TCAM
• Conclusions & Future Prospects

Background: Increasing delay & power
[Figure: active power and leakage power (W) versus year, 1960 to 2010. Source: W. M. Elgharbawy et al., IEEE CAS Magazine, 2005.]
• Logic and memory modules are separated, with many interconnections between modules.
  Wire delay dominates chip performance. → Delay: long
• Global wires require large drivers. → Power: large
• On-chip memory modules are volatile.
  Power supply must be continuously applied to memory modules. → Static power: large

Nonvolatile logic-in-memory architecture
Logic-in-Memory Architecture (proposed in 1969):
Storage elements are distributed over a logic-circuit plane.
[Figure: an MTJ layer stacked on a CMOS layer]
● Storage is nonvolatile (leakage current is cut off).
● MTJ devices are put on the CMOS layer.
● Storage and logic are merged (global-wire count is reduced).
Magnetic Tunnel Junction (MTJ) device:
• No volatility
• Unlimited endurance
• Fast writability
• Scalability
• CMOS compatibility
• 3-D stack capability
Results:
• Static power is cut off.
• Chip area is reduced.
• Wire delay is reduced.
• Dynamic power is reduced.

Model of a MOS/MTJ-hybrid circuit
[Figure: a logic-circuit plane (CMOS) with embedded storage (MTJ devices); data inputs and configuration inputs enter the plane, and outputs leave it.]
Configuration inputs:
◆ Circuit configuration
◆ Pattern data
Typical applications:
◆ Circuit-configuration type: Field-Programmable Gate Array (FPGA)
◆ Pattern-data type: Content-Addressable Memory (CAM)

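Design example
[Figure: a multiplexer (MUX) selects between x1・x2 and x1+x2, both computed from the data inputs x1 and x2. The configuration input y, held in a 1-bit storage (configuration memory), controls the MUX, which drives the output.]
How should this logic circuit be designed?
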
CMOS implementation
[Figure: a NAND gate (x1・x2) and a NOR gate (x1+x2) feed a MUX between VDD and GND; an SRAM cell with a CTRL line stores y and y' and is annotated "Small?".]
Logic and storage parts are separated from each other.
Transistor count: 20+α (α: nonvolatile devices)

Principle of MOS/MTJ-hybrid circuitry
RMOS (input x): RH (high resistance) if x=0; RL (low resistance) if x=1
RMTJ (stored y): RAP (high resistance) if y=0; RP (low resistance) if y=1

y   x1  x2  (Ry, Rx1, Rx2)     Comparison   NAND (out1)   NOR (out2)
0   0   0   (RAP, RH, RH)      I < I'       1             -
0   0   1   (RAP, RH, RL)      I < I'       1             -
0   1   0   (RAP, RL, RH)      I < I'       1             -
0   1   1   (RAP, RL, RL)      I > I'       0             -
1   0   0   (RP, RH, RH)       I < I'       -             1
1   0   1   (RP, RH, RL)       I > I'       -             0
1   1   0   (RP, RL, RH)       I > I'       -             0
1   1   1   (RP, RL, RL)       I > I'       -             0

Logic function is configurable by stored data in MTJ.
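The truth table on this slide can be reproduced with a small numerical sketch. It assumes, which the slide does not state, that each branch behaves as a series connection, so that I flows through Ry + Rx1 + Rx2 and I' through the complementary resistances. The resistance values are made up and only chosen so that 0 < RAP - RP < 2(RH - RL); all function names are illustrative.

```python
from itertools import product

# Hypothetical resistance values in ohms (not taken from the slides).
R_H, R_L = 10e3, 1e3    # MOS transistor: input 0 -> high, input 1 -> low
R_AP, R_P = 4e3, 2e3    # MTJ: stored 0 -> antiparallel (high), stored 1 -> parallel (low)

def r_mos(x):
    """Resistance of a MOS transistor driven by input bit x."""
    return R_L if x else R_H

def r_mtj(y):
    """Resistance of an MTJ storing bit y."""
    return R_P if y else R_AP

def hybrid_out(y, x1, x2):
    """Compare the branch current I with the complementary branch current I'."""
    r = r_mtj(y) + r_mos(x1) + r_mos(x2)                  # assumed series branch
    r_c = r_mtj(1 - y) + r_mos(1 - x1) + r_mos(1 - x2)    # complementary branch
    i, i_c = 1.0 / r, 1.0 / r_c                           # currents at unit voltage
    return 0 if i > i_c else 1                            # out = 0 if I > I', else 1

for y, x1, x2 in product((0, 1), repeat=3):
    print(y, x1, x2, hybrid_out(y, x1, x2))
```

Under these assumptions the stored bit alone selects the function: y=0 gives NAND of (x1, x2) and y=1 gives NOR.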
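
MOS/MTJ-hybrid circuit implementation
[Figure: an output generator with two loads (Rload) under VDD produces out (Vout) and out' (Vout'). A current comparator compares the current I through the branch (Rx1, Rx2, Ry) with the current I' through the complementary branch (Rx1', Rx2', Ry'). Logic and storage share this network, which is controlled by CTRL and CLK.]
out = 0 (if I > I'), 1 (if I' > I)
Transistor count: 11
Merging logic & storage: compact
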
Typical Application 1: Nonvolatile FPGA
[Figure: an FPGA whose lookup tables are NV LUTs containing NV devices; a separate NVM is marked "Not required!"]
NV devices are distributed across the FPGA.
☺ Leakage current elimination and short latency are possible.
How to design? → MOS/MTJ-hybrid circuit

Conventional nonvolatile FPGA
[Figure: four MTJs, each read by its own sense amplifier (SA), feed combinational logic (CMOS) that drives the output. Signals are low voltage before the SAs and high voltage after them.]
• CMOS logic circuit requires a high-voltage input swing.
How can we perform logic operations directly on the low-swing signal from an MTJ device?

MOS/MTJ-hybrid circuitry (Proposed)
Current-mode logic (CML)
• Logic operation is performed even at a low voltage swing by using the small difference between current values.
[Figure: four MTJs feed combinational logic (current-mode) at low voltage; a single sense amplifier (SA) then produces the high-voltage output.]
Device count is reduced to 28% with little performance degradation.

MOS/MTJ-hybrid structure
[Figure: a selection transistor tree driven by A and B connects one of four MTJs (R00, R01, R10, R11) to the current IF; a reference resistor RREF carries the current IREF.]
Truth table
A   B   Z
0   0   Z00
0   1   Z01
1   0   Z10
1   1   Z11
ZAB=0 → RAB=RAP
ZAB=1 → RAB=RP
RAP > RREF > RP
2-input LUT function is realized by using 10 NMOS transistors and 4 MTJs (and 1 resistor).
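A minimal behavioral sketch of this LUT is given here. It is an idealized model, not the authors' circuit: the on-resistance of the selection transistors is ignored, the resistance and read-voltage values are hypothetical, and the rule "IF > IREF gives Z = 1" is taken from the precharge-evaluate slide of this deck. The names `program_lut` and `read_lut` are illustrative.

```python
# Hypothetical device values (not from the slides); they only need RAP > RREF > RP.
R_P, R_AP = 2e3, 4e3   # ohms: parallel (low) and antiparallel (high) MTJ states
R_REF = 3e3            # ohms: reference resistor
V_READ = 0.1           # volts: assumed read voltage, identical on both branches

def program_lut(truth_table):
    """Map each (A, B) entry to an MTJ resistance: Z=1 -> RP, Z=0 -> RAP."""
    return {ab: (R_P if z else R_AP) for ab, z in truth_table.items()}

def read_lut(mtjs, a, b):
    """Select the MTJ addressed by (A, B) and compare IF with IREF."""
    i_f = V_READ / mtjs[(a, b)]   # current through the selected MTJ
    i_ref = V_READ / R_REF        # current through the reference resistor
    return 1 if i_f > i_ref else 0

xor_table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
xor_lut = program_lut(xor_table)
print([read_lut(xor_lut, a, b) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]
```

Reprogramming the four MTJs with a different truth table changes the function without touching the transistor network, which is the point of the LUT structure.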

Operation example (XOR)
[Figure: the four MTJs are programmed as R00 = RAP, R01 = RP, R10 = RP, R11 = RAP. The sense amplifier compares IF with IREF (through RREF) to produce Z.]
Truth table
A   B   Z
0   0   0
0   1   1
1   0   1
1   1   0
Logic operation at a low voltage swing is performed by using a MOS/MTJ-hybrid network.

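Precharge-Evaluate Logic
[Figure: the MOS/MTJ-hybrid network (LUT operation) supplies IF and IREF to a sense amplifier (SA) with outputs Z and Z'. The circuit is clocked by CLK and uses capacitors C1, C2, and CL. Waveforms of VCLK, VC1, VC2, VZ, and VZ' are shown for the precharge (CLK=0) and evaluate (CLK=1) phases.]
IF < IREF → (Z, Z') = (0, 1)
IF > IREF → (Z, Z') = (1, 0)
Dynamic current-mode logic (DyCML)-based circuit.
→ Reduction of dynamic power dissipation.
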
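Spin-Injection Write Operation
[Figure: write path through the selection transistor tree and the reference resistor. Word lines WL0 to WL3 and the bit line BL select one MTJ, and the write data W sets the current direction. An RTMR versus ITMR curve shows switching between RAP and RP at the critical currents ICAP and ICP.]
Spin-injection-based write operation.
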
Test chip features
[Figure: chip photograph of the fabricated 2-input LUT, showing the selection transistor tree; 4 MTJ devices are stacked over the MOS layer.]
Process           0.14 µm MTJ/MOS, 1-Poly, 3-Metal
Area              287 µm²
MTJ size          50 nm × 150 nm
TMR ratio         100%
Write current     150 µA
Write time        10 ns
Standby current   0 A

Measured waveforms (Basic operations)
[Figure: measured waveforms of inputs A and B and output Z over alternating pre-charge (P) and evaluate (E) phases, with the LUT configured as NAND, NOR, XOR, and XNOR. Scales: 0.78 V/div, 100 ms/div.]

Immediate wakeup behavior
[Figure: measured waveforms of VDD, CLK, inputs A B (00, 01, 10, 11), and output Z through an active, standby (VDD = 0), active sequence. Scales: 0.78 V/div, 50 ms/div.]
Immediate wakeup behavior has also been measured successfully.

Comparison of performances
                             SRAM/MRAM              Nonvolatile SRAM [3]   Proposed
Device counts                46 MOSs + 1 MRAM *1)   102 MOSs + 8 MTJs      29 MOSs + 4 MTJs
Area *2)                     455 µm² *1)            702 µm²                287 µm²
Active: delay *3)            100 ps                 140 ps                 185 ps
Active: power *3)            22.5 µW                26.7 µW                17.5 µW
Standby: power               0 µW                   0 µW                   0 µW
Standby to active: delay     42 ns/bit              0 ns/bit               0 ns/bit
Standby to active: energy    19 pJ/bit              0 pJ/bit               0 pJ/bit
*1) It consists of four SRAM cells (24 MOSs), three 2-input multiplexers (18 MOSs),
and two output buffers (4 MOSs). MRAM and its peripheral circuits are not
considered in this evaluation.
*2) Estimation based on a 0.14 µm process
*3) HSPICE simulation based on a 0.14 µm MOS/MTJ-hybrid process

MRR vs. Operation Margin in NV-LUT
□ MRR in 6-input LUT: Shmoo plot (P: Pass, F: Fail)
Columns: RREF [kΩ] from 2.00 to 14.0 in steps of 0.50, left to right.
MR ratio [%]   Result per RREF value
100            F F F F P F F F F F F F F F F F F F F F F F F F F
200            F F F F P P P P P F F F F F F F F F F F F F F F F
300            F F F F P P P P P P P P P F F F F F F F F F F F F
400            F F F F P P P P P P P P P P P P P F F F F F F F F
500            F F F F P P P P P P P P P P P P P P P P P F F F F
600            F F F F P P P P P P P P P P P P P P P P P P P P P
Large MRR → Sufficient operation margin
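To make the trend in the shmoo plot explicit, the pass/fail rows can be reduced to a passing RREF window per MR ratio. This is only a sketch that re-encodes the plot data from this slide; the helper name `pass_window` is illustrative.

```python
# RREF sweep in kOhm: 2.00 to 14.0 in steps of 0.50 (25 columns).
rref = [2.0 + 0.5 * i for i in range(25)]

# Pass (P) / fail (F) rows of the shmoo plot, keyed by MR ratio in percent.
shmoo = {
    100: "F" * 4 + "P" * 1 + "F" * 20,
    200: "F" * 4 + "P" * 5 + "F" * 16,
    300: "F" * 4 + "P" * 9 + "F" * 12,
    400: "F" * 4 + "P" * 13 + "F" * 8,
    500: "F" * 4 + "P" * 17 + "F" * 4,
    600: "F" * 4 + "P" * 21,
}

def pass_window(row):
    """Return (lowest, highest) passing RREF in kOhm for one shmoo row."""
    passing = [r for r, mark in zip(rref, row) if mark == "P"]
    return min(passing), max(passing)

for mr, row in shmoo.items():
    low, high = pass_window(row)
    print(f"MR {mr}%: RREF {low:.2f} to {high:.2f} kOhm (width {high - low:.2f})")
```

The lower edge stays at 4.00 kΩ while the upper edge grows by 2 kΩ for every additional 100% of MR ratio, which is the widening margin the slide summarizes.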
Application 2: Ternary Content-Addressable Memory (TCAM)
[Figure: TCAM array. A search-line / word-line driver and a bit-line driver (BL1, BL1', ..., BLn, BLn') surround the stored words, whose digits are 0, 1, or X. The input key is compared with all stored words, and the output driver reports OUT1 ... OUTn as 1 (Match) or 0 (Mismatch).]
Fully parallel masked equality search
Fully parallel search and fully parallel comparison can be done.
TCAM is a “functional memory.”
TCAM is a powerful data-search engine useful for various applications such as database machines and virus checkers in network routers.
TCAM must be implemented more compactly with lower power dissipation.
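The masked equality search can be stated in a few lines of software. This is a functional sketch only: hardware compares all words in parallel, whereas the loop below is sequential, and the stored words and key are made-up examples. The name `tcam_search` is illustrative.

```python
def tcam_search(stored_words, key):
    """Return one flag per stored word: 1 (match) or 0 (mismatch).

    Each stored digit is '0', '1', or 'X'; an 'X' matches either key bit.
    """
    return [
        int(all(s in ("X", k) for s, k in zip(word, key)))
        for word in stored_words
    ]

# Hypothetical contents, for illustration only.
stored = ["10X1", "0X10", "1X01"]
print(tcam_search(stored, "1001"))  # [1, 0, 1]
```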
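
NV-TCAM Cell Circuit
[Figure: cell storing B as (b1, b2), with search lines shared with word lines (S' / WL1, S / WL2) and a wired-OR match line (ML). The current IZ through the S'・b1 and S・b2 paths is compared with IZ'.]
ML = (b1・S' + b2・S)'

Stored data B    (b1, b2)   Search input S   Current comparison   Match result ML
0                (0, 1)     0                IZ < IZ'             1 (Match)
0                (0, 1)     1                IZ > IZ'             0 (Mismatch)
1                (1, 0)     0                IZ > IZ'             0 (Mismatch)
1                (1, 0)     1                IZ < IZ'             1 (Match)
X (don't care)   (0, 0)     0                IZ < IZ'             1 (Match)
X (don't care)   (0, 0)     1                IZ < IZ'             1 (Match)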
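The cell behavior on this slide can be checked with a short sketch. It assumes the encoding B=0 as (0,1), B=1 as (1,0), and X as (0,0), and it treats the wired-OR term b1・S' + b2・S as the mismatch condition, so that ML is its complement; that reading is what the listed match results imply. The names are illustrative.

```python
# Assumed encoding of the stored ternary digit as (b1, b2).
ENCODING = {"0": (0, 1), "1": (1, 0), "X": (0, 0)}

def match_line(stored, s):
    """Return ML for one cell: 1 (match) or 0 (mismatch)."""
    b1, b2 = ENCODING[stored]
    mismatch = (b1 & (1 - s)) | (b2 & s)   # wired-OR term b1*S' + b2*S
    return 1 - mismatch

for stored in ("0", "1", "X"):
    for s in (0, 1):
        print(stored, s, match_line(stored, s))
```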

CMOS-based TCAM cell circuit
[Figure: two 1-bit storage cells and an equality-detection (ED) circuit, wired to ML, VDD, VSS, WL, BL1, BL2, SL, and SL'; leakage-current paths are marked.]
Transistor count: 12 (ED: 4T; 2-bit storage: 8T)
Input/output wires: 8 (BL: 2; WL: 1; VDD & VSS: 2; SL: 2; ML: 1)
Power must always be supplied: many leakage-current paths
How can the cell be made compact while cutting off the leakage current?

MOS/MTJ-hybrid TCAM cell circuit
S. Matsunaga, K. Hiyama, A. Matsumoto, S. Ikeda, H. Hasegawa, K. Miura, J. Hayakawa, T. Endoh,
Hideo Ohno, and Takahiro Hanyu, "Standby-Power-Free Compact Ternary Content-Addressable
Memory Cell Chip Using Magnetic Tunnel Junction Devices," Applied Physics Express (APEX),
vol. 2, no. 2, pp. 023004-1~023004-3, 2009.
[Figure: cell in which the 2-bit storage (MTJs) is merged into the logic (MTJs & MOSs), wired to ML/BL, SL'/WL1, and SL/WL1.]
•Merge storage into logic circuit : Compact (2T-2MTJ)
•Share wires : 4 (ML/BL, SL/WL, No-VDD)
•3-D stack structure : Great reduction of circuit area
Compact & nonvolatile TCAM cell with MTJ devices
Power-Gating Scheme of Bit-Serial NV-TCAM
[Figure: three panels (1st-bit, 2nd-bit, and 3rd-bit search) for the search word 1 1 1. Each stored ternary word has a sense amplifier (SA) and an accumulator (ACC) that records Match or Mismatch. Legend: TCAM cell in standby mode (static power is suppressed); TCAM cell in active mode; sense amplifier in active mode; accumulator in active mode; sense amplifier in standby mode (static power is suppressed).]
The longer the TCAM word, the more effective the standby-power reduction.
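A rough sketch of why bit-serial search suits power gating is given here. It assumes that only the cells of the bit position being searched are powered, and that words which have already mismatched are left in standby for the remaining steps; the second point is an interpretation of the figure, not something the slide states. The stored words and the function name are hypothetical.

```python
def bit_serial_search(stored_words, key):
    """Search one key bit per step; skip words that already mismatched.

    Returns (match flags per word, number of cell activations).
    """
    alive = [True] * len(stored_words)   # accumulator state: still matching
    activations = 0
    for pos, k in enumerate(key):
        for w, word in enumerate(stored_words):
            if not alive[w]:
                continue                 # assumed to be power-gated (standby)
            activations += 1             # this cell is active in this step
            if word[pos] not in ("X", k):
                alive[w] = False         # mismatch is latched
    return alive, activations

# Hypothetical stored words, for illustration only.
stored = ["00X", "010", "0X1", "10X", "110", "1X1", "X0X", "X10", "XX1"]
flags, used = bit_serial_search(stored, "111")
print(flags)                             # True only for "1X1" and "XX1"
print(used, "of", len(stored) * 3)       # 19 of 27
```

With longer words, more cells sit idle at any given step, which matches the slide's remark that the benefit grows with word length.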

TCAM cell circuit test chip
[Figure: chip photograph, 3.0 µm × 9.8 µm, showing the TCAM cell, the reference (Ref.) cell, and the output generator and dynamic current comparator in the MLSA.]
Chip features
Process                 0.14 µm CMOS/MTJ, 1-Poly, 3-Metal
Total area              29.4 µm²
TCAM cell size          3.15 µm² (2.1 µm × 1.5 µm) a)
Cell structure          2MOSs-2MTJs
MTJ size                50 nm × 200 nm
TMR ratio               167 %
Average write current   274 µA (tp = 10 ms) b)
Standby current         0 A (power off)
a) A CMOS-based TCAM cell with 12 transistors, whose cell size is 17.54 µm² under a 0.18 µm CMOS process,
has been reported.8) Scaled down to a 0.14 µm CMOS process, the size of the conventional TCAM cell can be
estimated as 10.61 µm². Thus, the fabricated TCAM cell is reduced to 30 % of the size of the conventional one.
Moreover, the minimum size of the proposed TCAM cell can be considered to be 1/6 of the conventional one.
b) Faster write operation is possible with a larger write current: for example, an average current of 327 µA
for a 10 ns write.
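The cell-size comparison in footnote a) can be re-derived with a few lines of arithmetic. The sketch assumes that "scaling down" means area scales with the square of the feature size, which reproduces the quoted 10.61 figure; variable names are illustrative and the numbers are in the footnote's own units.

```python
reported_area = 17.54        # conventional 12T cell, 0.18 process
old_node, new_node = 0.18, 0.14
proposed_area = 3.15         # fabricated 2MOS-2MTJ cell

# Assumed scaling rule: area is proportional to (feature size)^2.
scaled_area = reported_area * (new_node / old_node) ** 2
ratio = proposed_area / scaled_area

print(f"conventional cell scaled to 0.14: {scaled_area:.2f}")   # 10.61
print(f"proposed / conventional: {ratio:.0%}")                   # 30%
```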
Waveforms of equality-search operations
[Figure: measured waveforms of CLK, search data S, and the match result OUT over precharge (P) and evaluate (E) phases, for stored data B=1, B=0, and B=X with S=0 and S=1. Scales: 780 mV, 10 ms.]
Bit-level equality search is successfully demonstrated.

Waveforms of sleep/wake-up operations
[Figure: measured waveforms of VDD, CLK, search data S, and OUT through alternating active and standby (power-off) periods, with stored data B=0. For S=0, OUT is 1 (Match) both before and after power-off; for S=1, OUT is 0 (Mismatch) both before and after. Scales: 780 mV, 10 ms.]
Instant sleep/wake-up behavior is successfully demonstrated.

Comparison of 144-bit x 256-word Bit-Serial TCAM
HSPICE simulation under a 90nm CMOS/MTJ technology @125MHz, RP: 2 kΩ, TMR ratio: 100%
                       CMOS-only    MOS/MTJ-hybrid
Active power
  Cell array           109.6 µW     107.3 µW
  SA                   30.8 µW      9.6 µW
  ACC                  3.7 µW       3.7 µW
  CLK                  32.7 µW      62.0 µW
  Hybrid / CMOS-only                103%
Standby power
  Cell array           340.9 µW     1.8 µW
  SA                   1.2 µW       2.3 µW
  Hybrid / CMOS-only                1.2%
Delay                  1.39 ns      0.60 ns
  Hybrid / CMOS-only                43%
Ultra-low-power/high-performance bit-serial TCAM is achieved
by MOS/MTJ-hybrid circuit with fine-grain power gating.
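The three percentages on this slide can be checked against the component values. The sketch assumes a particular assignment of the listed numbers to the CMOS-only and MOS/MTJ-hybrid columns (the one under which the totals reproduce 103%, 1.2%, and 43%); only ratios are computed, so the power unit does not matter.

```python
# Component values in the slide's power unit; column assignment is an assumption.
active = {
    "cmos":   {"cell_array": 109.6, "sa": 30.8, "acc": 3.7, "clk": 32.7},
    "hybrid": {"cell_array": 107.3, "sa": 9.6,  "acc": 3.7, "clk": 62.0},
}
standby = {
    "cmos":   {"cell_array": 340.9, "sa": 1.2},
    "hybrid": {"cell_array": 1.8,   "sa": 2.3},
}
delay_ns = {"cmos": 1.39, "hybrid": 0.60}

def percent(hybrid, cmos):
    """Hybrid value as a percentage of the CMOS-only value."""
    return 100.0 * hybrid / cmos

active_pct = percent(sum(active["hybrid"].values()), sum(active["cmos"].values()))
standby_pct = percent(sum(standby["hybrid"].values()), sum(standby["cmos"].values()))
delay_pct = percent(delay_ns["hybrid"], delay_ns["cmos"])

print(f"active power:  {active_pct:.0f}%")    # 103%
print(f"standby power: {standby_pct:.1f}%")   # 1.2%
print(f"delay:         {delay_pct:.0f}%")     # 43%
```

Active power is roughly unchanged, because the larger clock power offsets the saving in the sense amplifiers, while standby power drops by about two orders of magnitude.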
Conclusions
• A MOS/MTJ-hybrid circuit style (a nonvolatile logic-in-memory circuit using MTJ devices) is proposed.
• Two typical applications of the logic-in-memory architecture are presented: an NV-LUT circuit and an NV-TCAM.
  Both are compact and have no static power dissipation.
• Basic behavior is confirmed with test chips fabricated under an MTJ/CMOS process.
• It could open an ultra-low-power logic-circuit paradigm.
Future Prospects and Issues:
1. Establish the fabrication line
2. Establish the CAD tools
3. Explore the appropriate application fields
   (Impact towards “Reliability Enhancement”)

Reliability Enhancement Using MTJs
Adjust VGS by MTJ devices connected to transistors
MTJ device:
• Programmable resistance value (RTMR: Rmax or Rmin)
• Nonvolatile storage element
[Figure: an MTJ (RMTJ) connected at the source node (VS) of a transistor]
VGS can be adjusted by controlling RMTJ.
→ Vth-variation compensation after fabrication is realized.
• Small overhead: MTJ devices can be set above the CMOS layer.
• Non-volatility: the compensation state is held without electric supply.
Vth-variation compensation is realized with small overhead
by using MTJ devices.

Evaluation in comparator
1.2
Proposed comparator
1.2
Vo
Output [V]
Output [V]
Conventional comparator
1.0
0.8
Vo’
-0.2
Vo
1.0
0.8
0
Shift range of
cross-point:60mV
0.2
VIN - VT [V]
Vo’
-0.2
38%
0
0.2
VIN - VT [V]
23mV
Robustness of the proposed comparator against the Vth variation
Fabricated chip
0.18μm CMOS/MTJ process
(Measurement now on going…)
37
38