Sunil’s presentation - Texas A&M University
Design and Implementation of a Subthreshold BFSK Transmitter
By:
Suganth Paul#
Rajesh Garg$
Sunil P. Khatri$
Sheila Vaidya%
#Intel Corporation, Austin, TX
$Department of ECE, Texas A&M University, College Station, TX
%Lawrence Livermore National Lab., Livermore, CA
1
Outline
Sub-threshold circuits – the opportunity
Challenges
Process/temperature/voltage variations
Solution – dynamic body bias
Validation via test chip
Design methodology
Silicon results
Conclusions
2
The Opportunity
Power consumption has become a major issue for recent ICs
There is a large and growing class of applications where
power reduction is paramount – not speed.
Such applications are ideal candidates for sub-threshold circuit design
          Traditional Ckt                   Sub-threshold Ckt (Vb = 0V)   Sub-threshold Ckt (Vb = VDD)
Process   Delay(ps)  Power(W)   P-D-P(J)    Delay    Power     P-D-P      Delay    Power     P-D-P
bsim70    14.157     4.08E-05   5.82E-07    17.01X   308.82X   18.50X     9.93X    141.10X   14.43X
bsim100   17.118     6.39E-05   1.08E-06    24.60X   497.54X   20.08X     12.00X   100.96X   8.20X
Compared traditional circuit with sub-threshold (obtained by
simply setting VDD < VT)
Performed simulations for 2 different processes on a 21 stage
ring oscillator.
Impressive power reduction (100X – 500X)
Power-Delay-Product (P-D-P) improves by as much as 20X
P-D-P is an important metric to compare circuit design styles
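As a quick consistency check on the table, the P-D-P improvement should come out close to the power reduction divided by the delay penalty. The sketch below uses only the ratios quoted in the table; the dictionary layout and function name are illustrative, not part of the original work.

```python
# Ratios relative to the traditional circuit, copied from the table:
# (delay penalty, power reduction, reported P-D-P improvement)
ratios = {
    ("bsim70", "Vb=0V"): (17.01, 308.82, 18.50),
    ("bsim70", "Vb=VDD"): (9.93, 141.10, 14.43),
    ("bsim100", "Vb=0V"): (24.60, 497.54, 20.08),
    ("bsim100", "Vb=VDD"): (12.00, 100.96, 8.20),
}


def pdp_improvement(delay_penalty, power_reduction):
    """P-D-P improves by the power reduction divided by the delay penalty."""
    return power_reduction / delay_penalty


estimates = {}
for key, (delay_x, power_x, reported_x) in ratios.items():
    estimates[key] = pdp_improvement(delay_x, power_x)
    print(key, round(estimates[key], 2), "reported:", reported_x)
```

The estimates agree with the reported P-D-P columns to within a few percent, which is what rounding in the published ratios would produce.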
3
Sub-threshold Logic
Ids has an exponential dependence on process,
voltage and temperature (PVT)
$$I_{ds}^{sub} = I_0 \frac{W}{L}\, e^{\frac{V_{gs} - V_T - V_{off}}{n v_t}} \left(1 - e^{-\frac{V_{ds}}{v_t}}\right)$$
Need to stabilize the circuit performance by
compensating for PVT variations
No approach to compensate sub-threshold delay
Existing approaches compensate sub-threshold currents
To compensate delay, need a representative
circuit
Not easy to come up with representative circuit for
standard cells
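To get a feel for how steep the exponential dependence is, the sketch below evaluates the sub-threshold current expression from this slide for a nominal and a shifted VT. All numeric values (I0, W/L, Voff, n, the thermal voltage of about 25.85 mV near room temperature, and the 0.5 V threshold) are assumptions chosen for illustration, not parameters from the test chip.

```python
import math


def ids_sub(vgs, vds, vt_th, i0=1e-7, w_over_l=1.0, voff=0.0, n=1.5,
            v_therm=0.02585):
    """Sub-threshold drain current; every default here is an assumed value."""
    gate_term = math.exp((vgs - vt_th - voff) / (n * v_therm))
    drain_term = 1.0 - math.exp(-vds / v_therm)
    return i0 * w_over_l * gate_term * drain_term


nominal = ids_sub(vgs=0.2, vds=0.2, vt_th=0.50)
shifted = ids_sub(vgs=0.2, vds=0.2, vt_th=0.55)  # VT raised by 10%
ratio = nominal / shifted
print(f"current drops by a factor of {ratio:.2f} for a 10% higher VT")
```

Under these assumed parameters a 10% threshold shift changes the drive current by several times, which is why delay, not just current, needs to be compensated.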
4
Our Solution
We propose a technique that uses self-adjusting body-bias
to phase-lock the circuit delay to a beat clock.
Use a network of PLAs to implement circuits.
Several PLAs in a cluster share a common nbulk node.
A representative PLA in each cluster is chosen to phase
lock the delay of the PLAs to the beat clock
If the delay is too high, a forward body bias is applied to
speed up the representative PLA.
If the delay is low, body bias is brought back down to zero
to slow down the representative PLA.
All other PLAs exhibit the same delay as the representative
PLA, since they all share a common nbulk terminal
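The control idea above can be sketched as a simple behavioral loop. This is not the authors' circuit: the exponential delay model, the step size, the bias ceiling, and the target delay are all hypothetical values picked so the loop is easy to follow.

```python
import math


def pla_delay(vbulk, d0=1.3e-6, k=3.0):
    """Hypothetical delay model: forward body bias (vbulk > 0) speeds up the PLA."""
    return d0 * math.exp(-k * vbulk)


def run_loop(target_delay, cycles=200, step=0.005, vmax=0.45):
    """Bang-bang loop: raise nbulk when the PLA is slow, lower it when fast."""
    vbulk = 0.0
    history = []
    for _ in range(cycles):
        delay = pla_delay(vbulk)
        if delay > target_delay:
            # Completion lags the beat clock: apply more forward body bias.
            vbulk = min(vmax, vbulk + step)
        else:
            # Completion leads the beat clock: move back toward zero bias.
            vbulk = max(0.0, vbulk - step)
        history.append((vbulk, delay))
    return history


history = run_loop(target_delay=1.0e-6)
final_vbulk, final_delay = history[-1]
print(f"nbulk settles near {final_vbulk:.3f} V, delay {final_delay * 1e9:.0f} ns")
```

In this model the bias climbs until the delay crosses the target and then dithers by one step around it, which mirrors the lock behaviour described on the slide.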
5
Objective
Validate and verify flow by designing a sub-threshold
circuit for the application
Choose a test application
Low power, low speed
Develop a sub-threshold circuit design flow
Implement our delay compensation scheme to negate
PVT variations
Implement the same application using a standard cell based
flow on the same die
Fabricate and test the chip (TSMC 0.25 um process)
Compare the sub-threshold circuit with the standard
cell circuit in terms of power consumption
6
Test Application - Binary Frequency
Shift Keying (BFSK) Transmitter
Block diagram: Binary Input Data -> Digital BFSK Modulator -> DAC -> Amplifier -> Antenna
Digital BFSK Modulator produces two tones:
f1 if Input is LOW
f2 if Input is HIGH
Digital block implemented using sub-threshold circuits
Specifications
Input bit Rate: RB = 32kbps, Broadcast distance: D = 1000m
FSK tones: f1=150kHz, f2=450kHz, Channel bandwidth: B = 300kHz
7
Sub-threshold Design
Approach
Digital part of the circuit implemented as NPLA (Network of
Programmable Logic Arrays)
NPLAs have low delay
Critical path delay easy to find
PLAs have common nbulk node
Circuit level PVT compensation
An external Beat Clock (BCLK) signal is phase locked
with the critical path delay
Delay controlled by a charge pump that modulates the
bulk voltage of transistors in the circuit
Compensates for both inter- and intra-die variations
8
Dynamic NOR-NOR PLA
[Figure: dynamic NOR-NOR PLA schematic showing inputs, outputs, clk and the completion signal, with precharge and evaluate phases]
We use precharged NOR-NOR PLAs as the structure of choice
Wordlines run horizontally
Inputs / their complements and outputs run vertically
Each PLA has a “completion” signal that switches low after all the outputs switch
Several PLAs in a cluster share a common nbulk node.
9
Network of PLAs (NPLA)
[Figure: Inputs -> combinational logic implemented as an NPLA (L1, L2, L3 and L4 PLAs) -> Outputs]
[Timing diagram: clk with the L1 PLA, L2 PLA, L3 PLA and L4 PLA waveforms]
Throughput = Tpchg + n.Teval
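A small worked example of the timing expression on this slide, reading it as the time needed per clock cycle (one precharge followed by n evaluations) and taking n to be the number of PLA levels. Both readings are assumptions, and the timing values below are placeholders rather than measured numbers.

```python
def npla_clock_period(t_precharge, t_eval, levels):
    """Cycle time = Tpchg + n * Teval, with n taken as the PLA level count."""
    return t_precharge + levels * t_eval


# Placeholder timings: 200 ns precharge, 200 ns evaluate per level, 4 levels.
period = npla_clock_period(t_precharge=200e-9, t_eval=200e-9, levels=4)
fclk = 1.0 / period
print(f"period = {period * 1e9:.0f} ns, fclk = {fclk / 1e6:.2f} MHz")
```

Because the evaluate time is multiplied by the number of levels, a deeper NPLA lowers the achievable clock frequency roughly in proportion.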
10
The Charge Pump
Pullup: PLA “completion” signal lags beat clock, so the nbulk node gets forward biased
Pulldown: PLA “completion” signal leads beat clock, so nbulk goes back to zero bias
11
Effectiveness of the
Approach
We simulated a single PLA from 0ºC to 100ºC. Also applied VT variations (10%) and VDD variations (10%).
The light region shows the variations in delay over all the corners without delay compensation.
The red region shows the delays with the self-adjusting body-bias circuit.
12
Design Flow
[Flow diagram. Boxes: BFSK Design; HDL Synthesis; Map to NPLA; Logic Verification; Design of Analog Components; Spice Verification (functional, timing, charge pump); Integrated Spice Netlist; Layout; LVS; RC Extraction; Full Chip Spice Verification]
13
BFSK Design
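[Block diagram: the Binary Input drives a Mux that selects the Phase Increment; Phase Accumulator (DFF, 9 bits) -> Sine Lookup Table (8-bit output) -> DFF; the DFFs are clocked by Clk]
Sine Lookup Table depth: 2^9 = 512
$$f_{out} = f_{clk} \cdot \frac{\text{phase increment}}{2^{9}}$$
fout < fclk/2 (Nyquist criterion) implies phase increment < 256.
Phase increments are chosen based on fclk, or left programmable in real time to get Software Defined Radio (SDR) operation.
We fix the phase increments to avoid the extra input pins required for SDR.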
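The phase-accumulator scheme on this slide can be sketched in a few lines. This is a behavioral illustration, not the synthesized design: the 9-bit accumulator and 8-bit sample width come from the slide, the fout/fclk ratios 0.115 and 0.345 are the ones quoted elsewhere in this deck, and the table contents, rounding, and `clocks_per_bit` value are assumptions.

```python
import math

ACC_BITS = 9                   # phase accumulator width (from the slide)
LUT_DEPTH = 1 << ACC_BITS      # 512-entry sine table

# Assumed table contents: one sine period as unsigned 8-bit samples.
SINE_LUT = [round(127.5 + 127.5 * math.sin(2 * math.pi * i / LUT_DEPTH))
            for i in range(LUT_DEPTH)]


def phase_increment(ratio):
    """Nearest integer increment for a desired fout/fclk ratio."""
    inc = round(ratio * LUT_DEPTH)
    if not 0 < inc < LUT_DEPTH // 2:
        raise ValueError("Nyquist: the increment must stay below 256")
    return inc


def bfsk_samples(bits, inc_low, inc_high, clocks_per_bit):
    """Emit one table sample per clock; the input bit selects the increment."""
    phase = 0
    out = []
    for bit in bits:
        inc = inc_high if bit else inc_low        # the mux
        for _ in range(clocks_per_bit):
            out.append(SINE_LUT[phase])
            phase = (phase + inc) % LUT_DEPTH     # 9-bit wraparound
    return out


inc_f1 = phase_increment(0.115)
inc_f2 = phase_increment(0.345)
samples = bfsk_samples([0, 1], inc_f1, inc_f2, clocks_per_bit=32)
print(inc_f1, inc_f2, samples[:8])
```

Keeping the phase running across bit boundaries, as done here, gives a continuous-phase output when the tone switches.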
14
Basic BFSK transmitter
Block Diagram
Block diagram: Binary Input Data -> Digital BFSK Modulator -> DAC -> Amplifier -> Antenna
Digital BFSK Modulator produces two tones:
f1 if Input is LOW
f2 if Input is HIGH
Digital block implemented using NPLA based sub-threshold circuits
16
System Architecture
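[Block diagram: Input -> Phase Accum -> (9 bits) -> NCO -> (8 bits) -> Binary to Thermometer Encoder -> (19 lines) -> DAC -> Amplifier -> Antenna, with DFF stages clocked by CLK. The Digital BFSK Modulator is implemented using NPLA. A Phase Detector compares the Ref. PLA completion signal with BEAT CLK and drives the Charge Pump, which sets the Common Bulkn node.]
4 LSBs - Binary
15 MSBs - Thermometer
Avoids glitches in DAC o/p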
17
Delay Compensated Subthreshold Design block diagram
[Block diagram: the NPLA (L1 to L4 PLAs) sits between two banks of DFFs clocked by Clk. A Phase Detector compares the completion signal of the reference PLA with Beat Clk and drives the Charge Pump.]
Common nbulk node of a cluster of PLAs, modulated by charge pump
18
HDL to Schematic of Digital BFSK
Digital BFSK transmitter described using VHDL
VHDL synthesized using an FPGA synthesis tool to get a gate-level netlist
This is imported into SIS in “blif” format
The “blif” file is logically optimized and mapped into NPLA
Technology-independent optimization done on the circuit
Circuit converted to a multi-level network of nodes with 5 or fewer inputs per node
Circuit traversed from inputs to outputs, and nodes are implemented using PLAs of size (8/6/12)
Using the NPLA throughput equation, fclk estimated as 1.2MHz
We choose f1 ≈ 0.115 * fclk and f2 = 0.345 * fclk
19
Thermometer Coded 8-BIT
DAC
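[Block diagram: of the 8-bit Digital BFSK Output, 4 bits pass through Binary to Thermometer Code Conversion and reach the DAC as 15 lines; the 4 LSBs go to the DAC in binary]
Binary   Therm
00       000
01       001
10       011
11       111
Adjacent values differ by 1 bit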
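The conversion in the table generalizes directly: a value v on k binary bits becomes v ones on 2^k - 1 lines. The sketch below reproduces the 2-bit table and then splits an 8-bit sample the way the slide describes, assuming the upper 4 bits are the ones that get thermometer coded (the slide only states that the 4 LSBs stay binary). Function names are illustrative.

```python
def to_thermometer(value, bits):
    """Thermometer code for `value` as a string, ones packed at the right."""
    width = (1 << bits) - 1
    if not 0 <= value <= width:
        raise ValueError("value does not fit in the given number of bits")
    return "0" * (width - value) + "1" * value


def split_dac_word(word):
    """Split an 8-bit sample into 15 thermometer lines and 4 binary LSBs."""
    upper, lower = word >> 4, word & 0xF
    return to_thermometer(upper, 4), format(lower, "04b")


two_bit_table = [to_thermometer(v, 2) for v in range(4)]
therm, binary = split_dac_word(0xA5)
print(two_bit_table)
print(therm, binary)
```

Because adjacent thermometer codes differ in exactly one line, a step in the upper bits switches a single current leg instead of many at once.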
22
8-BIT DAC Schematic
Currents flow through mirror legs based on input value
Output current / voltage is set by the sum of weighted currents through Rout
Thermometer codes prevent glitches at output
DAC supply is 0.7V to handle 0.6V digital signals
Rout, Rcm are off-chip resistances
Device        CM leg   T4 - T18   B3     B2     B1     B0
Device size   W1       16W1       8W1    4W1    2W1    W1
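The weighted-current summing can be checked numerically. This sketch assumes that each leg's current scales linearly with its device width, that legs T4 to T18 are the 15 thermometer legs at 16 unit widths each, and that B3 to B0 carry 8, 4, 2 and 1 unit widths; these readings of the device-size labels are assumptions, and currents are in arbitrary units of one W1 leg.

```python
def dac_lines(word):
    """Return (thermometer legs, binary legs) switched on for an 8-bit word."""
    upper, lower = word >> 4, word & 0xF
    therm = [1 if i < upper else 0 for i in range(15)]   # legs T4..T18
    binary = [(lower >> b) & 1 for b in (3, 2, 1, 0)]    # legs B3..B0
    return therm, binary


def dac_current(word):
    """Sum of leg currents in units of the W1 leg current."""
    therm, binary = dac_lines(word)
    weighted_binary = sum(w * b for w, b in zip((8, 4, 2, 1), binary))
    return 16 * sum(therm) + weighted_binary


currents = [dac_current(w) for w in range(256)]
print(currents[:4], currents[-1])
```

With these weights the summed current equals the input code, so the transfer characteristic is monotonic across all 256 codes.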
23
Amplifier Schematic
Common Source Amplifier
Supply of 0.7V
Rd, Rs are off-chip resistances
M1 biased by DAC Rout resistor
CL: on-chip antenna load, 80pF
24
Testability Features added before
Integration
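[Block diagram: the system architecture (Input -> Phase Accum -> NCO -> Binary to Thermometer Encoder -> DAC -> Amplifier -> Antenna, plus the Phase Detector and Charge Pump loop driven by the Ref. PLA completion signal and BEAT CLK) drawn inside the CHIP boundary, with these added access points: Charge Pump Supply, Bulkn, Common Bulkn, Amp Output, DAC Output, and 8-BIT BFSK Output or 8-BIT DAC Input]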
25
Layout
Manual PLA layout for every PLA in design
NPLA routed using SEDSM
I/O pad cells, ESD diodes layout done manually
DAC, amplifier layout done manually
Antenna coil layout done manually
26
PLA Layout
[Layout figure labels: input bit lines; word lines; transistors, modified based on the logic to be implemented; output lines]
27
I/O PAD CELL Layout
Fully Compliant with
TSMC Design rules
ESD Diodes have
guard rings to prevent
latchup
I/O PAD
I/O
Drivers
Primary ESD Diodes
Secondary ESD Diodes
28
Die Photo
Digital BFSK inputs domain, 0.7V
Digital BFSK domain, 0.6V
Digital BFSK output domain, 2V
Std Cell domain, 2.5V
29
Experimental Results from
Silicon
Output of the BFSK transmitter is shown
As the input changes from 0 to 1, the output frequency changes, showing the modulation
Fclk = 1MHz
F1 = 117kHz
F2 = 347kHz
The adjacent peaks are around 10dB below the fundamental peaks
Matlab simulations showed that signals from the extracted Spice netlist could be demodulated at the receiver side
30
Results from Silicon
Operating Range
Nbulk kept at 0V, 0.45V
Maximum frequency shows a quadratic dependence on supply voltage
31
Power Comparison
Design Style    Operating Voltage   Frequency of Operation   Avg Current       Power Dissipated
Sub-threshold   0.6V                1.05MHz                  [value lost] µA   26.8µW
Std Cell        2.5V                1.05MHz                  208µA             520µW
Sub-threshold power is calculated only for the Phase Accumulator and NCO blocks on the 0.6V power supply; the Std Cell design implements only this portion of the BFSK circuit
Sub-threshold gives 19.4X lower power
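The headline ratio can be reproduced from the table with P = V * I. The sketch below assumes the currents are in microamps and the powers in microwatts, which is the only reading consistent with 2.5 V and 208 giving 520. The sub-threshold current it prints is merely what P / V implies, not a value read from the slide.

```python
def power_uw(voltage_v, current_ua):
    """Power in microwatts from volts and microamps."""
    return voltage_v * current_ua


std_cell_power = power_uw(2.5, 208)        # Std Cell row of the table
sub_threshold_power = 26.8                 # µW, Sub-threshold row of the table
reduction = std_cell_power / sub_threshold_power

# Current implied by P / V for the 0.6 V supply (derived, not from the slide).
implied_sub_current = sub_threshold_power / 0.6

print(f"Std Cell power: {std_cell_power:.0f} µW")
print(f"power reduction: {reduction:.1f}X")
print(f"implied sub-threshold current: {implied_sub_current:.1f} µA")
```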
32
Bulkn Node Modulation
Bulk node modulates
when beat clock demands
speedup or slow-down
Bulk node modulates as
supply voltage is changed,
so that circuit delay is
maintained constant.
33
Conclusion
Validated a sub-threshold circuit design methodology
based on dynamic body bias (first-of-kind)
Validated design tools and techniques
First-of-kind design automation flow, will help bring sub-threshold design to mainstream.
We implemented an ultra low power, low data rate wireless BFSK transmitter
The fabricated chip works as expected, validating our design flow.
We compared the sub-threshold design with a Std Cell based design and showed 19.4X reduction in power.
34
Thank you!!
35
Backup Slides
36
Introduction
Power consumption has become a significant
hurdle for recent ICs
Higher power consumption leads to
Shorter battery life
Higher on-chip temperatures – reduced operating
life of the chip
There is a large and growing class of applications
where power reduction is paramount – not speed.
Such applications are ideal candidates for subthreshold circuit design
For sub-threshold circuits, VDD ≤ VT
37
TX/RX System Testing
TX PCB with subthreshold IC
TX antennas
RX board
38
RX setup
Solving the Problem of
Delay Sensitivity to
Process, Voltage and
Temperature Variations
"A Variation-tolerant Sub-threshold Design Approach",
Jayakumar, Khatri. Design Automation Conference (DAC)
2005 Anaheim, CA , June 13-17.
39
An Example Showing Phase
Locking
VDD change
0.2V to 0.22V
VDD change
0.22V to 0.18V
This figure shows how
the body bias (and
hence the delay of the
PLA) changes with
changes in VDD.
The adjustment is very
quick (within a few
clock cycles).
40
Energy and Speed
We may be interested in the minimum energy operating
point for the design
Minimizing VDD reduces power but minimum VDD does not
mean minimum energy
The optimum VDD value increases with increased logical depth,
and with temperature
"Minimum Energy Near-threshold Network of PLA based Design", Jayakumar, Khatri.
International Conference on Computer Design (ICCD) 2005, Oct 2-5, San Jose, CA.
Reclaiming the speed penalty
Can be done for datapath circuits, using asynchronous
micropipelining
Showed that a speedup of 7X is possible, with an area overhead of 44%
"A PLA based Asynchronous Micropipelining Approach for Subthreshold Circuit
Design", Jayakumar, Garg, Gamache, Khatri. IEEE/ACM Design Automation
Conference (DAC) 2006, July 24-28, San Francisco, CA.
41
On-chip Antenna
Antenna size needs to be at least a 10th of the transmit
wavelength to radiate effectively
Transmit wavelength around 600m
Due to on-chip space constraints, antenna coil length is
only 0.2m
We have the option of using an external antenna
And we had a 60dB safety margin in the link budget
analysis.
This could compensate for a lossy antenna
42
Spectrum of Amplifier Tones
Fclk = 1MHz
F1 = 117kHz
F2 = 347kHz
The adjacent peaks are around 10dB below the fundamental peaks
Matlab simulations showed that signals from the extracted Spice netlist could be demodulated at the receiver side
43