PowerPoint Presentation - SiGe HBT BiCMOS Field


SiGe HBT BiCMOS Field Programmable Gate Arrays for Fast Reconfigurable Computing
Bryan S. Goda
Rensselaer Polytechnic Institute
Troy, New York
Agenda
• Introduction
• BiCMOS FPGA History
• SiGe HBT BiCMOS Process
• Current Mode Logic
• Xilinx 6200 FPGA Design
• Configuration Memory
• Performance Results
• Conclusions and Future Work
Current Role of SiGe
• “More Zip per Chip”
• Wireless Phones -> Watch-Sized Phone
• Direct Broadcast Satellite
• Fiber-Optic Lines, Switches, and Routers
Programmable Bipolar Logic
1983: Fairchild ECL Field Programmable Logic Array
• Fuse Based
• 4ns Cycle Rate
• High Power
• Scaling Problems
1990: Algotronix 1.2 µm 256-Cell Configurable Logic Array
• fT 6 GHz, 200 ps Gate Delay
• 4 Transistor Static RAM Memory Cells
• ASIC Emulation and Signal Processing
• Forerunner of XC6200
US Patent CMOS Switchable 2 Input Multiplexer
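[Schematic: 2-input multiplexer with data inputs Y1 and Y2, internal nodes labeled a, CMOS enable signals EN1 and EN2, reference Vref, and supplies V+ and V-]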
SiGe Heterojunction Bipolar Transistor
• Selectively Introduce Ge into the Base of a Si BJT
• Smaller Base Bandgap Increases Electron Injection, Giving Higher Beta (100)
• Higher Beta Allows a More Heavily Doped Base, Lowering Base Resistance RB (125 Ohm)
• Graded Bandgap Decreases Base Transit Time, Raising fT
SiGe HBT
• 50 GHz Process; 100 GHz Process Within a Year (30 µA at 50 GHz)
• 5 Layers of Metal
• Used in RPI VLSI Class
• Co-Integrated with CMOS Process
– can have HBT logic with CMOS memory
– low power and high speed
fT Curves for Various Emitter Lengths
SiGe HBT Layout
[Figure: emitter, base, collector, and sub-collector regions]
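Band Diagram
[Figure: conduction (EC) and valence (EV) band edges across the n+ Si emitter, p-SiGe base, and n- Si collector; electrons (e-) injected from the emitter, holes (h+) in the base, Ge profile and drift field in the base; a p-Si region is also labeled]
$E_{g,\mathrm{Ge}}(\mathrm{grade}) = E_{g,\mathrm{Ge}}(x = W_b) - E_{g,\mathrm{Ge}}(x = 0) = 0.031\ \mathrm{eV}$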
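The Ge grading lowers the conduction band edge toward the collector, so the 0.031 eV grade acts as a built-in drift field that speeds electrons across the base. A minimal sketch of that estimate, assuming the grading is linear so the field is roughly the grading energy divided by q·Wb; the 50 nm base width is a hypothetical value, not one given in the slides.

```python
DELTA_EG_GRADE_EV = 0.031  # Eg,Ge(x=Wb) - Eg,Ge(x=0), from the band diagram


def drift_field_v_per_cm(delta_eg_ev: float, base_width_nm: float) -> float:
    """Built-in field (V/cm) from a linear bandgap grading of delta_eg_ev across the base.

    An energy step in eV divided by the electron charge is a potential step in
    volts, so the field is simply delta_eg_ev / Wb.
    """
    if base_width_nm <= 0:
        raise ValueError("base width must be positive")
    base_width_cm = base_width_nm * 1e-7  # 1 nm = 1e-7 cm
    return delta_eg_ev / base_width_cm


if __name__ == "__main__":
    HYPOTHETICAL_WB_NM = 50.0  # illustrative base width, not taken from the slides
    field = drift_field_v_per_cm(DELTA_EG_GRADE_EV, HYPOTHETICAL_WB_NM)
    print(f"Wb = {HYPOTHETICAL_WB_NM:.0f} nm -> drift field ~ {field:.0f} V/cm")
```

A thinner base raises the field for the same 0.031 eV grade, which is one reason grading and base scaling work together.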
Dielectric Constant
Si = 11.7
Ge = 16.2
SiGe (7.5% Ge) = 12.03
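The SiGe value matches a linear interpolation between the Si and Ge dielectric constants weighted by Ge fraction: 11.7 + 0.075 × (16.2 - 11.7) = 12.0375, quoted as 12.03. A small sketch of that interpolation; treating the alloy as linear in Ge fraction is an assumption, not a statement about measured SiGe behavior.

```python
EPS_SI = 11.7  # relative dielectric constant of Si
EPS_GE = 16.2  # relative dielectric constant of Ge


def sige_dielectric_constant(ge_fraction: float) -> float:
    """Linearly interpolate the relative dielectric constant of Si(1-x)Ge(x)."""
    if not 0.0 <= ge_fraction <= 1.0:
        raise ValueError("Ge fraction must be between 0 and 1")
    return EPS_SI + ge_fraction * (EPS_GE - EPS_SI)


if __name__ == "__main__":
    print(f"SiGe (7.5% Ge): {sige_dielectric_constant(0.075):.4f}")
```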
CML Branch Current vs. Differential DC Voltage
IBM SiGe and CMOS Load Gate Delays on M1, M2, LM
Current Steering Logic
Vcc = 0 V
Level 1: 0 to -250 mV. Fastest Logic Level; Limited Drive Capability
Level 2: -950 mV to -1.2 V. Inter-Block Signal Level; Good Fan-Out (10)
Level 3: -1.90 V to -2.15 V. Clock Signal; Slowest Level
Vee = -4.5 V; Level 4 Possible
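Each level sits a fixed 0.95 V below the one above it and keeps the same 250 mV swing. The helper below regenerates the level table from those two numbers; both constants are read off the table, and the slides do not say how the shift is produced, so treat this as a bookkeeping sketch only.

```python
from typing import Tuple

LEVEL_SHIFT_V = 0.95  # spacing between the tops of successive levels (read off the table)
SWING_V = 0.25        # logic swing within each level


def level_range(level: int) -> Tuple[float, float]:
    """Return (high, low) voltages of CML `level`, counting Level 1 as nearest Vcc = 0 V."""
    if level < 1:
        raise ValueError("levels are numbered from 1")
    high = 0.0 - (level - 1) * LEVEL_SHIFT_V
    return high, high - SWING_V


if __name__ == "__main__":
    for lvl in (1, 2, 3, 4):
        hi, lo = level_range(lvl)
        print(f"Level {lvl}: {hi:+.2f} V to {lo:+.2f} V")
```

Level 4 would come out at -2.85 V to -3.10 V, still above Vee = -4.5 V; the sketch does not model the headroom the tail current source needs.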
Current Steering Logic In SiGe
• 13 ps Transistor Switching Time (75 GHz)
– 6 ps Process Next Year
• Small Voltage Swings (250 mV) vs. 3.3 V or 5 V
– Less Power
– Smaller Swing = Faster
• “Steer” Currents, Use Differential Logic
– Less Switching Noise
• Fewer Transistors Needed, Since the Complement Signal Is Present
• Flip-Flops and Multiplexers Easy to Implement
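The "smaller swing = faster" point follows from the time a fixed current needs to move a node: t ≈ C·ΔV/I. A minimal sketch comparing the 250 mV CML swing with full 3.3 V and 5 V swings; the 20 fF node capacitance and 1 mA drive current are hypothetical values, and only the ratio between swings matters.

```python
def slew_time_ps(cap_ff: float, swing_v: float, current_ma: float) -> float:
    """Time in ps to move a node of cap_ff femtofarads by swing_v volts with a constant current."""
    cap_f = cap_ff * 1e-15
    current_a = current_ma * 1e-3
    return cap_f * swing_v / current_a * 1e12


if __name__ == "__main__":
    NODE_CAP_FF = 20.0  # hypothetical node capacitance
    DRIVE_MA = 1.0      # hypothetical drive current
    for swing in (0.25, 3.3, 5.0):
        print(f"{swing:>4} V swing: {slew_time_ps(NODE_CAP_FF, swing, DRIVE_MA):6.1f} ps")
```

Real CML and CMOS gates also differ in drive strength and load, so this isolates only the swing term.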
CML XOR Logic Schematic
[Schematic: Vcc 0 V; input A at Level 1 (0 to -0.25 V); input B at Level 2 (-0.95 to -1.2 V); Vref; Vee -4.5 V; outputs A XOR B and its complement]
Truth Table:
A  B  A XOR B
0  0  0
0  1  1
1  0  1
1  1  0
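In a two-level CML XOR, the lower pair (driven by B) steers the tail current into one of two upper pairs (driven by A), and the upper pairs are cross-connected so the current lands on the OUT load only when A equals B. A behavioral sketch of that steering, assuming a 250 mV drop across whichever load carries the current; the branch wiring is an illustrative assumption, not a transistor-level copy of the slide.

```python
from typing import Tuple

SWING_V = 0.25  # drop across a load resistor when it carries the tail current


def cml_xor(a: int, b: int) -> Tuple[float, float]:
    """Return (v_out, v_out_bar) relative to Vcc = 0 V for a two-level CML XOR."""
    if b:
        # B high: current goes to the left upper pair, where A high selects the OUT load.
        out_load_conducts = bool(a)
    else:
        # B low: current goes to the cross-connected right pair, where A low selects the OUT load.
        out_load_conducts = not a
    v_out = -SWING_V if out_load_conducts else 0.0
    v_out_bar = 0.0 if out_load_conducts else -SWING_V
    return v_out, v_out_bar


def logic_value(v_out: float, v_out_bar: float) -> int:
    """A differential output reads as 1 when the true side is the higher voltage."""
    return int(v_out > v_out_bar)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, logic_value(*cml_xor(a, b)))
```

Because the complement output always exists, no separate inverter is needed to get XNOR.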
General FPGA Structure
[Figure: logic cells, I/O cells, routing network, and configuration memory]
High Speed FPGA Applications
• Real Time Image Processing
- Radar
- Pattern Recognition
• Digital Networks
- Mobile Subscriber Equipment
- Command Information Systems
- High Speed Switching Nodes
• Control Systems
- Guidance Systems
- Reprogrammable Survivability
• Satellite Systems
Image Correlation
[Figure: search image and desired image]
1. Desired image is programmed into the chip (1 pixel = 1 CLB)
2. Load a section of the search image
3. If enough pixels match, turn the found bit on
4. Load another section, or reprogram with a new desired image
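The four steps amount to template matching by counting matching pixels. A software sketch for binary images, where each pixel comparison stands in for one CLB and the found bit is set when the match count reaches a threshold; the threshold value and the binary-pixel representation are assumptions.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary pixels, row-major


def correlate(search: Image, desired: Image, threshold: int) -> List[Tuple[int, int]]:
    """Return (row, col) offsets where at least `threshold` pixels match the desired image."""
    h, w = len(desired), len(desired[0])
    found = []
    for r in range(len(search) - h + 1):          # step 2: load a section of the search image
        for c in range(len(search[0]) - w + 1):
            matches = sum(
                search[r + i][c + j] == desired[i][j]  # one comparison per pixel (CLB)
                for i in range(h)
                for j in range(w)
            )
            if matches >= threshold:              # step 3: turn the found bit on
                found.append((r, c))
    return found


if __name__ == "__main__":
    desired = [[1, 0],
               [0, 1]]
    search = [[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]]
    print(correlate(search, desired, threshold=4))
```

In hardware all comparisons of one section happen in parallel; the loops here only reproduce the result.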
Samples From XC6200 CAD Tools
[Figure: IO blocks, CLBs, and pins]
FPGA Drawbacks
• Slowdown
– 200 MHz Internal Speed Down to 30-60 MHz External
– Pass Transistors Act as RC Low-Pass Filters
• Limited Bandwidth
• Relatively Long Configuration Times (Seconds)
• Vendor-Guarded Information
• More Expensive than Comparable ASIC
Pass Transistor Interconnect Modeling
[Figure: interconnect nodes 1-4 joined by pass transistors (M) whose gates are set by configuration memory; equivalent circuit from node 3 to node 2 with the pass transistor on]
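Each conducting pass transistor behaves like a resistor and each routing node adds capacitance, so a path through several switches forms an RC ladder. A sketch using the Elmore delay estimate for n identical stages, which grows roughly with n squared; the 1 kΩ on-resistance and 50 fF per-node capacitance are hypothetical values.

```python
def elmore_delay_ps(n_switches: int, r_on_ohm: float, c_node_ff: float) -> float:
    """Elmore delay (ps) at the far end of n identical pass-transistor stages.

    The node after switch k is charged through k series on-resistances, so
    delay = sum over k = 1..n of k * R * C = R * C * n * (n + 1) / 2.
    """
    rc_ps = r_on_ohm * c_node_ff * 1e-3  # 1 ohm * 1 fF = 1e-15 s = 1e-3 ps
    return rc_ps * n_switches * (n_switches + 1) / 2


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(f"{n} switches: {elmore_delay_ps(n, r_on_ohm=1_000, c_node_ff=50):7.1f} ps")
```

Doubling the number of switches roughly quadruples the delay, which is why long pass-transistor routes slow the whole array.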
Field Programmable Gate Arrays (FPGA)
• Hierarchy Level Organization (Sea of Gates)
– Simple Cells (Configurable Logic Blocks)
– 4x4, 16x16, 64x64 groupings
– Hierarchy of routing resources at each level
– I/O Blocks (external interface)
Design Parameters
• Logic Swing Levels
- Based on Differential Pair Switching
- Current Levels
• Redesign of the Configurable Logic Block
- Take Advantage of Differential Wiring
- What Parts Can be Turned off if not Used?
• Supply Levels
- How Many Levels of Logic?
• Routing Resources
• CMOS Voltage Levels
- Integrate CMOS into Bipolar Current Tree
Current Tree with CMOS Routing
[Schematic: Vcc 0 V; differential output OUT at Level 1 (0 to -0.25 V); inputs a, b, c, d and selects S1, S2 spread over Level 2 (-0.95 to -1.2 V) and Level 3 (-1.9 to -2.15 V); Vref; Vee -3.4 V; devices annotated "Replace with" CMOS]
Bipolar vs Bipolar/CMOS Current Trees
[Figure: CMOS and bipolar current-tree responses at pulse widths of 50 ps, 60 ps, 70 ps, and 100 ps]
4:1 Multiplexer
[Schematic: Level 1 inputs and output; Level 2 and Level 3 inputs; CMOS version with W/L 5:1]
Sample Logic Using Multiplexers
A AND B: X1 := a (select), X2 := b (to Y2, input 1), X3 := a (to Y3, input 0)
If a = 1 then select Y2: output = b
If a = 0 then select Y3: output = 0
A OR B: X1 := a (select), X2 := a (to Y2, input 1), X3 := b (to Y3, input 0)
If a = 1 then select Y2: output = 1
If a = 0 then select Y3: output = b
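The same 2:1 multiplexer yields AND or OR purely through input assignment. A sketch of that mapping, with X1 driving the select and X2/X3 feeding the Y2 (select = 1) and Y3 (select = 0) inputs as in the assignments above; the function names are illustrative.

```python
def mux2(select: int, y2: int, y3: int) -> int:
    """2:1 multiplexer: pass Y2 when select is 1, otherwise Y3."""
    return y2 if select else y3


def and_via_mux(a: int, b: int) -> int:
    # X1 := a (select), X2 := b -> Y2, X3 := a -> Y3 (a is 0 whenever Y3 is chosen)
    return mux2(select=a, y2=b, y3=a)


def or_via_mux(a: int, b: int) -> int:
    # X1 := a (select), X2 := a -> Y2 (a is 1 whenever Y2 is chosen), X3 := b -> Y3
    return mux2(select=a, y2=a, y3=b)


if __name__ == "__main__":
    print("a b AND OR")
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, and_via_mux(a, b), or_via_mux(a, b))
```

Reusing `a` as a data input is what lets a single mux cover both gates without extra constants.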
Redesign of XC6200 Logic
Original XC6200 Design
• Have to Track Inversions
[Figure: multiplexer with X1 := a, X2 := b, X3 := a on inputs Y2 (1) and Y3 (0); inverted output]
Revised Design
• Use Differential Pair Logic
• Eliminate XC6200 Fast Logic
• No Inversion Tracking
[Figure: multiplexer with X1 := a, X2 := b, X3 := a on inputs Y2 (1) and Y3 (0); non-inverted output]
Original XC6200 Architecture
[Figure: inputs X1, X2, X3 and internal signals Y2, Y3; CS multiplexer, RP multiplexer, and D flip-flop with Clk, Clr, and Q; other labels C, F, S]
Redesigned Architecture: Bipolar with CMOS Routing
[Figure: same structure with the same labels; sections marked switchable]
10 GHz Three CLB Simulation
CLB Layout
[Layout: 4:1 mux (off switchable), master/slave latch (off switchable), 4:1 mux, 2:1 mux, high-speed logic, CMOS control blocks (one off switchable), and buffer]
Sample CLB Test Circuit
[Schematic: 8:1 mux feeding the CLB, followed by a buffer, a divide-by-8 stage, and pad drivers; Vref inputs]
Actual Fabricated Test Circuit
[Figure: pads (110 µm x 110 µm); incoming CLB routing N, S, E, W, N4, S4, E4, W4 onto CLB inputs X1, X2, X3; CLB output F onto outgoing routing N, S, E, W, N4, S4, E4, W4]
4x4 Block Boundary Routing
[Figure: N, S, E, and W switches at the block boundary; local routing and magic routing]
• Length 4 FastLane (4x4)
• Length 16 FastLane (16x16)
• Chip-Length FastLane (64x64)
Local CLB Routing
• Nearest Neighbor Routing
• Output (F) or Local Through
[Figure: CLB with inputs X1, X2, X3 and output F; N, S, E, W, N4, S4, E4, W4 routing]
Output Multiplexer Sources:
Nout: N, E, W, F
Sout: S, E, W, F
Eout: N, S, E, F
Wout: N, S, W, F
Example: Route East Signal Through to Next CLB
Note: Can't Route Signal Back to Origin at This Level
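The source lists suggest each output multiplexer chooses between the CLB output F and any neighbor signal except the one traveling the opposite way, which would send it back to its origin. A sketch under that reading, assuming signals are named by the direction they travel (so an "E" signal arrives from the west and continues east); the function names are hypothetical.

```python
from typing import List

OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}


def allowed_sources(out_dir: str) -> List[str]:
    """Signals the `out_dir` output multiplexer may select.

    Every neighbor signal is allowed except the one traveling opposite to
    out_dir, since routing it out would return it to the CLB it came from.
    """
    return [d for d in ("N", "S", "E", "W") if d != OPPOSITE[out_dir]] + ["F"]


def route(out_dir: str, source: str) -> str:
    """Describe a legal output assignment, or raise if it would route back to the origin."""
    if source not in allowed_sources(out_dir):
        raise ValueError(f"{out_dir}out cannot select {source}")
    return f"{out_dir}out <- {source}"


if __name__ == "__main__":
    print(route("E", "E"))  # route an east-bound signal through to the next CLB
    for d in ("N", "S", "E", "W"):
        print(f"{d}out:", ", ".join(allowed_sources(d)))
```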
Normal CMOS Memory-CML Interface
[Figure: new configuration data written into SRAM bits in memory planes (VSS); decode; CMOS-to-CML buffer (VSS, VREF, VEE) driving the CLB multiplexer inputs]
Memory Design
[Figure: master/slave D latch (40 transistors) with D, Q, CLK; master/slave D latch (18 transistors) with Data, Clock, Q; RAM cell (6 transistors) with Word, Data, and Out lines; parallel load]
3-D Chip Stacking
[Figure: memory planes stacked over CLBs, with CLB select]
• Shorter Wires
• More CLBs/Area
• Optimize Memory
CLB with Routing and RAM (2)
[Figure: CLB with two RAM blocks (RAM1, RAM2) and four multiplexers with their selects]
Layout of Configurable Logic Block with 2 Sets of RAM
[Layout: RAM, 2:1 mux, 8:1 mux (routing), CMOS selects, CLB (logic), master/slave latch (memory)]
Circuit Elements:
240 nFETs
122 pFETs
36 resistors
98 npn1 HBTs
16 npnhb1 HBTs
SiGe Performance
Circuit Type              Propagation Delay
Buffer                    17 ps
CML (XOR, AND, OR)        22-25 ps
MUX (XOR, AND, OR)        23-26 ps
CLB                       100 ps
Power Decreasing Ideas
Date      Idea                                                   Power Consumption/CLB
Dec 98    Original CLB                                           73 mW
June 99   CLB Redesign I                                         34 mW
Aug 99    CLB Redesign II                                        24 mW
Dec 99    Widlar Current Mirror with CMOS Control, CMOS Routing  10.8 mW
Mar 00    Supply Voltage 4.5 -> 3.3 V                            7 mW
Dec 00*   7HP Process                                            0.3 mW
* Projected power levels for 7HP process: at 50 GHz, 30 µA, 20x+ reduction in power
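The step-by-step reductions in the power table can be checked directly. A small sketch that divides each entry by the next, using only the numbers in the table (the last one projected); the 7HP step alone comes out near 23x, in line with the "20x+" note, and the overall change from 73 mW to 0.3 mW is about 243x.

```python
from typing import Iterator, List, Tuple

# Power per CLB at each step, in mW, from the power table (last entry projected)
POWER_STEPS_MW: List[Tuple[str, float]] = [
    ("Original CLB", 73.0),
    ("CLB Redesign I", 34.0),
    ("CLB Redesign II", 24.0),
    ("Widlar current mirror, CMOS control and routing", 10.8),
    ("Supply voltage 4.5 -> 3.3 V", 7.0),
    ("7HP process (projected)", 0.3),
]


def step_reductions(steps: List[Tuple[str, float]]) -> Iterator[Tuple[str, float]]:
    """Yield (step name, previous power / current power) for each successive step."""
    for (_, prev), (name, cur) in zip(steps, steps[1:]):
        yield name, prev / cur


if __name__ == "__main__":
    for name, factor in step_reductions(POWER_STEPS_MW):
        print(f"{name:50s} {factor:5.1f}x")
    overall = POWER_STEPS_MW[0][1] / POWER_STEPS_MW[-1][1]
    print(f"Overall: {overall:.0f}x")
```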
Multiplexer Performance vs Temperature
[Figure: multiplexer waveforms at -50 °C, 25 °C, and 125 °C; normal 250 mV swing, 200 mV minimum swing]
(b) Current Tree Voltage Turn-Off
[Plot: voltage (mV, 0 to -250) vs. time (ns, 0 to 5); traces Vcc and Input]
(c) Current Tree Voltage Turn-On
[Plot: voltage (mV, 0 to -250) vs. time (ns, 0 to 0.6); traces Vref and Vee; Widlar current mirror with CMOS control]
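A Widlar current mirror sets a small output current through an emitter resistor R_E, following the standard relation I_out·R_E = V_T·ln(I_ref/I_out) for matched transistors with base currents ignored. Cutting the reference current, for example through a CMOS switch, turns the branch off entirely. A numerical sketch that solves the relation by bisection; the 1 mA reference and 1 kΩ resistor are hypothetical, and V_T = 25.85 mV assumes roughly 300 K.

```python
import math

V_T = 0.02585  # thermal voltage at about 300 K, in volts


def widlar_output_current(i_ref: float, r_e: float, iterations: int = 100) -> float:
    """Solve I_out * R_E = V_T * ln(I_ref / I_out) for I_out (amperes) by bisection.

    Returns 0.0 when the reference current is off (i_ref <= 0), modelling a
    mirror whose reference branch has been disabled.
    """
    if i_ref <= 0.0:
        return 0.0
    lo, hi = 0.0, i_ref  # residual is negative just above 0 and non-negative at i_ref
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        residual = mid * r_e - V_T * math.log(i_ref / mid)
        if residual > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    i_out = widlar_output_current(i_ref=1e-3, r_e=1_000.0)
    print(f"I_out ~ {i_out * 1e6:.1f} uA")
    print(f"Reference off: {widlar_output_current(0.0, 1_000.0)} A")
```

With these values the output settles near 69 µA, more than an order of magnitude below the reference, without needing a large resistor.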
XC6200 Design Improvements
• Originally Developed by Algotronix (Edinburgh, Scotland)
• Inversion of Signal at Every CLB
- Handled by Differential Pair Wiring
• No Pass Transistors; Use Multiplexers for Routing
• Able to Turn Off Unused Parts with a CMOS-Controlled Current Mirror
• No CMOS-CML Conversion Circuits Needed; CMOS in Current Trees
• Handcrafted, Dense Layouts
• Context Switching
Power Delay Product
[Plot: PDP in µW/gate/MHz (log scale, 0.001 to 1) vs. year (1998 to 2002) for PDP CMOS High, PDP CMOS Low, and PDP BiCMOS; process labels 5HP, 7HP, and 8HP]
Data Dependent Switching
Differential Logic Has Complement Switching in the Opposite Direction
[Figure: complementary pairs A, B, C shown for a slow transition and for a fast transition]
Bit Line Twisting
Could Vary Signals Up to 30%
Setup Time Violations
Future Work
• Testing
• Overall FPGA Architecture
• Scaling
• Integrate with Other Systems
• Projected Graduation May 2001, work to continue at USMA
• Power Reduction
- 7HP Process
CLB Context Switch Example
Pattern1: 0001100100 (70 ps, ~7.1 GHz)
Pattern2: 1011011100 (70 ps)
Select alternates between AND and OR:
AND: 0001000100
OR:  1011111100
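The context-switch example has one CLB alternating between an AND and an OR configuration on the same two 10-bit patterns. A bit-level sketch that reproduces both output rows, plus the timing conversion, assuming 70 ps is the half-period of a square wave, so the 140 ps period gives about 7.1 GHz. How the select line is sequenced in hardware is not modeled.

```python
PATTERN_1 = "0001100100"
PATTERN_2 = "1011011100"


def bitwise(context: str, a: str, b: str) -> str:
    """Apply the AND or OR context bit by bit to two equal-length bit strings."""
    if len(a) != len(b):
        raise ValueError("patterns must have equal length")
    if context == "AND":
        return "".join("1" if x == "1" and y == "1" else "0" for x, y in zip(a, b))
    if context == "OR":
        return "".join("1" if x == "1" or y == "1" else "0" for x, y in zip(a, b))
    raise ValueError(f"unsupported context {context!r}")


def square_wave_ghz(half_period_ps: float) -> float:
    """Frequency (GHz) of a square wave whose high and low phases each last half_period_ps."""
    return 1e3 / (2 * half_period_ps)


if __name__ == "__main__":
    for context in ("AND", "OR"):
        print(context, bitwise(context, PATTERN_1, PATTERN_2))
    print(f"{square_wave_ghz(70):.1f} GHz")
```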
Redesigned CLB Cell with Routing and Memory (2x)
[Figure: 2x24-bit RAM; three 8-1 input muxes; CLB; four 4-1 output muxes; memory blocks M1-M4; CLB row 4x1; memory bus lines; switch; N/S input and output]
Circuit Elements:
1520 nFETs
792 pFETs
260 resistors
140 NPN1 HB
576 NPN1
XC6200 Device Family
Device            XC6209   XC6216   XC6236   XC6264
Gate Count        9-13K    16-24K   36-55K   64-100K
Number of Cells   2304     4096     9216     16384
I/O Blocks        192      256      384      512
Row x Col         48x48    64x64    96x96    128x128
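The family figures are internally consistent: each device is a square array, so the cell count is rows × columns, and the I/O block count equals four times the array side. A quick check of those two relationships; the "four per side" pattern is inferred from the numbers, not stated in the table.

```python
from typing import List, Tuple

# (device, array side, number of cells, I/O blocks) from the XC6200 family table
FAMILY: List[Tuple[str, int, int, int]] = [
    ("XC6209", 48, 2304, 192),
    ("XC6216", 64, 4096, 256),
    ("XC6236", 96, 9216, 384),
    ("XC6264", 128, 16384, 512),
]


def check_family(rows: List[Tuple[str, int, int, int]]) -> List[str]:
    """Return devices whose cell or I/O counts break the square-array pattern."""
    mismatches = []
    for device, side, cells, io_blocks in rows:
        if side * side != cells or 4 * side != io_blocks:
            mismatches.append(device)
    return mismatches


if __name__ == "__main__":
    print(check_family(FAMILY) or "all devices consistent")
```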
Typical Routing Delays
Symbol   Parameter                   XC6200   SiGe Redesign
TNN      Route Nearest Neighbor      1 ns     23 ps
Tmagic   Route X2/X3 to Magic Out    1.5 ns   47 ps
TL4      Length 4 FastLane           1.5 ns   47 ps
TL16     Length 16 FastLane          2 ns     70 ps
TCL64    Chip-Length (64) Delay      3 ns     94 ps
~31x improvement
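The "~31x" figure can be checked path by path. A sketch computing the XC6200-to-SiGe delay ratio for each routing resource; TL4 is assigned the 1.5 ns / 47 ps pair that appears without a label in the table, which is an inference. The ratios range from about 29x to 43x, with a median near 32x.

```python
import statistics
from typing import Dict, List, Tuple

# (symbol, XC6200 delay in ps, SiGe redesign delay in ps) from the routing delay table
ROUTING_DELAYS_PS: List[Tuple[str, float, float]] = [
    ("TNN", 1000, 23),
    ("Tmagic", 1500, 47),
    ("TL4", 1500, 47),
    ("TL16", 2000, 70),
    ("TCL64", 3000, 94),
]


def speedups(rows: List[Tuple[str, float, float]]) -> Dict[str, float]:
    """Ratio of XC6200 delay to SiGe redesign delay for each routing resource."""
    return {symbol: old / new for symbol, old, new in rows}


if __name__ == "__main__":
    ratios = speedups(ROUTING_DELAYS_PS)
    for symbol, ratio in ratios.items():
        print(f"{symbol:7s} {ratio:5.1f}x")
    print(f"median  {statistics.median(ratios.values()):5.1f}x")
```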
4x4 CLB Layout Cell
• Largest Basic Block
• Over 13,000 Transistors
• Commercial Product Size Is a 4x4 Array of This Cell
5 Stage Ring Oscillator
Condition     Speed      Relative to Schematic   Current
Schematic     6.36 GHz   --                      8.4 mA
Parasitics    5.71 GHz   89%                     8.6 mA
50 °C         5.26 GHz   82%                     8.85 mA
75 °C         4.87 GHz   76%                     9.1 mA
100 °C        4.16 GHz   65%                     9.34 mA
125 °C        3.12 GHz   49%                     9.5 mA
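For an N-stage ring oscillator, f = 1/(2·N·t_pd), so each measured frequency converts directly into a per-stage delay. The "Relative to Schematic" column is each frequency divided by the schematic 6.36 GHz, truncated to a whole percent. A sketch of both calculations using only numbers from the table.

```python
STAGES = 5
# (condition, oscillation frequency in GHz) from the ring oscillator table
RESULTS_GHZ = [
    ("Schematic", 6.36),
    ("Parasitics", 5.71),
    ("50 C", 5.26),
    ("75 C", 4.87),
    ("100 C", 4.16),
    ("125 C", 3.12),
]


def stage_delay_ps(freq_ghz: float, stages: int = STAGES) -> float:
    """Per-stage delay: one oscillation period spans 2 * stages gate delays."""
    period_ps = 1e3 / freq_ghz
    return period_ps / (2 * stages)


def relative_percent(freq_ghz: float, reference_ghz: float) -> int:
    """Speed relative to the reference, truncated to a whole percent."""
    return int(100 * (freq_ghz / reference_ghz))


if __name__ == "__main__":
    reference = RESULTS_GHZ[0][1]
    for condition, f in RESULTS_GHZ:
        print(f"{condition:10s} {stage_delay_ps(f):5.1f} ps/stage {relative_percent(f, reference):4d}%")
```

The per-stage delay rises from about 15.7 ps (schematic) to about 32 ps at 125 °C.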