Wireless Sensor
Networks
Low Power Design
Outline
Introduction – Importance of Low
Power Design
Power and Energy
Low Power at various levels of circuit
design:
System and Architecture Level
Register Transfer and Logic Level
Physical Level
Conclusion
Importance of Low Power Design
Power is considered the most important
constraint in embedded systems
Low power design is essential in:
high-performance systems (reason: excessive
power dissipation reduces reliability and
increases the cost imposed by cooling systems
and packaging)
portable systems (reason: battery technology
cannot keep pace with the demand for
devices with light batteries and long intervals
between recharges)
Sources of Power Consumption
The three major sources of power consumption
in digital CMOS circuits are:
$$P_{avg} = p_t \cdot C_L \cdot V_{dd}^2 \cdot f_{clk} + I_{sc} \cdot V_{dd} + I_{leakage} \cdot V_{dd} = P_1 + P_2 + P_3$$
where:
P1 – capacitive switching power
P2 – short circuit power
P3 – leakage current power
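A minimal sketch of how the three terms add up. The operating point (activity factor, load capacitance, supply voltage, clock frequency and the two currents) is hypothetical and chosen only for illustration; none of these values come from the slides.

```python
def average_power(p_t, c_load, v_dd, f_clk, i_sc, i_leak):
    """Return (P1, P2, P3, P_avg) in watts for the three-term CMOS power model."""
    p_switch = p_t * c_load * v_dd ** 2 * f_clk  # P1: capacitive switching power
    p_short = i_sc * v_dd                        # P2: short circuit power
    p_leak = i_leak * v_dd                       # P3: leakage current power
    return p_switch, p_short, p_leak, p_switch + p_short + p_leak


# Hypothetical operating point (illustrative values only).
p1, p2, p3, p_avg = average_power(
    p_t=0.2, c_load=50e-12, v_dd=2.0, f_clk=100e6, i_sc=1e-4, i_leak=1e-6
)
print(f"P1={p1 * 1e3:.3f} mW  P2={p2 * 1e3:.3f} mW  "
      f"P3={p3 * 1e3:.3f} mW  total={p_avg * 1e3:.3f} mW")
```

With these assumed numbers the switching term dominates, and it is the only term that depends on the square of Vdd.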
Trends in Power Management
Reducing power is now a mainstream design issue
Power and Energy
Power and Energy are related (E=∫Pdt)
Minimizing the power consumption is important for
the design of the power supply
the design of voltage regulators
the dimensioning of interconnect
short term cooling
Minimizing the energy consumption is important due to
restricted availability of energy (mobile systems)
limited battery capacities (only slowly improving)
very high costs of energy (solar panels, in space)
cooling
high costs
limited space
long lifetimes, low temperatures
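The relation E=∫Pdt can be approximated numerically from a sampled power trace. The sketch below uses the trapezoidal rule; the trace itself is hypothetical and only serves to show that peak power (relevant for supply and cooling design) and total energy (relevant for battery life) are different quantities.

```python
def energy_joules(times_s, powers_w):
    """Trapezoidal approximation of E = integral of P dt."""
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        total += 0.5 * (powers_w[i] + powers_w[i - 1]) * dt
    return total


# Hypothetical power trace: sample times in seconds, power in watts.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
powers = [0.4, 0.4, 0.05, 0.05, 0.4]

energy = energy_joules(times, powers)
peak_power = max(powers)
average_power_w = energy / (times[-1] - times[0])
print(f"energy={energy:.3f} J  peak={peak_power:.2f} W  average={average_power_w:.3f} W")
```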
Low Power at various levels of circuit design
Higher levels offer higher impact and more options:
System Level: Design partitioning, Power Down
Algorithm Level: Complexity, Concurrency, Locality, Regularity, Data representation
Architecture Level: Voltage scaling, Parallelism, Instruction set, Signal correlations
Circuit Level: Transistor sizing, Logic optimization, Activity Driven Power Down, Low-swing logic, Adiabatic switching
Process/Device Level: Threshold Reduction, Multi-threshold
The design of low power circuits can be tackled at different levels, from
system to technology
Potential for Power Savings
Power and Synthesis Flow
Accuracy of Power Estimation
Behavioral: 400%
RTL: 50%
Gate: 20%
Switch: 10%
Expectations
Algorithmic (algorithm selection): orders of magnitude
Behavioral (concurrency, memory): several times
Power management (clock control): 10-90%
RT Level (structural transformation): 10-15%
Technology independent (extraction/decomposition): 15%
Technology dependent (technology mapping: 20%, gate sizing: 20%)
Layout (placement): 20%
System and Architecture Level
Given a certain application, there are
several possibilities for low power
optimizations of the system:
Selection of an optimum algorithm with
respect to the cost function of the design
Partitioning into building blocks
Voltage/Frequency scaling
Dynamic power management
Minimize waste and overhead (indirectly) –
increase regularity, locality
System and Architecture Level
[Figure: global synthesis flow from the input algorithm to the output architecture, drawing on a hardware library and memory management]
Instruction set selection: Mult-Add vs. Mult, Add
Module selection: Ripple Adder vs. Carry Select
Allocation (How many?): 2 MULTs (M1, M2), 2 ADDERs (A1, A2)
Assignment (Which HW?): operations bound to units, e.g. + to A1
Memory selection and memory assignment
Scheduling (When?): operations placed on execution units (Exu) over time
System and Architecture Level
Algorithm selection and optimization
The first choice in the design flow is usually the selection of an
optimum algorithm with respect to the cost function
The term cost depends on the application and typically includes
the number of operations, memory accesses and the memory size
that is required by this algorithm
Power reduction is achieved with:
Scheduling of operations
Adaptive implementations of certain algorithms
System and Architecture Level
Optimizations for Memory Accesses
A paradigm for energy efficient software:
Avoid memory operands as far as possible
Improve register utilization
Example of heapsort program [Jan M. Rabaey ‘97]:
Handtuning for performance:
15% reduction in time, 13.5% reduction in energy
Register allocation of temporaries:
5% reduction in current, 7% reduction in time, 11.4% reduction in
energy
Further optimization
Further 22.4% reduction
Total: 40.6% reduction in energy cost
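A quick check of how the per-step figures combine, assuming (this is not stated on the slide) that each percentage is measured relative to the energy left after the previous step. Under that assumption the reductions multiply rather than add, and the result lands close to the quoted total; the small gap is presumably rounding in the per-step numbers.

```python
# Per-step energy reductions quoted for the heapsort example.
steps = [
    ("hand tuning for performance", 0.135),
    ("register allocation of temporaries", 0.114),
    ("further optimization", 0.224),
]

remaining = 1.0  # fraction of the original energy still consumed
for name, reduction in steps:
    remaining *= 1.0 - reduction  # assumption: reductions compound

total_reduction = 1.0 - remaining
print(f"total energy reduction: {total_reduction:.1%}")
```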
System and Architecture Level
Design partitioning
Optimum partitioning of the design can result in
orders of magnitude power reduction
Examples of partitioning for low power:
Partitioning the design in such a way as to confine the
operations involving maximum switching activity to a single
block
Partitioning the memory and distributing it to different blocks
instead of centralized memory
Hardware/Software partitioning
Optimum partition of a design into analog and digital sections
System and Architecture Level
Design partitioning – Interconnections
Interconnect power is important
Interconnect may contribute a large percentage
of total power dissipation, and of the total achievable reduction
Interconnect power is greatly affected by
architecture level design decisions
System and Architecture Level
Design partitioning
Spatially Global
All communications
use long global buses
Spatially Local
Few global
bus accesses
Cheap localized
communication
Reduced # of global bus accesses
Reduced buffer power
Reduced # of multiplexers
System and Architecture Level
Design partitioning –
spectral partitioning
[Figure: computational nodes of an 8th-order IIR cascade filter placed on a 1-D axis ranging from -0.2 to 0.2]
Spectral Partitioning places computational nodes on 1-D axis
based on “closeness” — identifies candidates for clustering
Partitioning may lead to extra hardware units. This does not
necessarily mean an increase in area!
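One common way to obtain such a 1-D placement is to use the Laplacian eigenvector with the smallest non-zero eigenvalue (the Fiedler vector) of the node connectivity graph. The slide does not say which "closeness" measure is used, so the sketch below is an assumption: a plain unweighted graph, a hypothetical six-node example, and a simple shifted power iteration written without external libraries.

```python
def fiedler_vector(n, edges, iterations=500):
    """Approximate the Laplacian eigenvector with the smallest non-zero eigenvalue.

    Assumes a connected, undirected, unweighted graph with nodes 0..n-1.
    """
    degree = [0] * n
    neighbors = [[] for _ in range(n)]
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        neighbors[u].append(v)
        neighbors[v].append(u)
    shift = 2 * max(degree)            # upper bound on the largest eigenvalue of L
    x = [float(i) for i in range(n)]   # deterministic start vector
    for _ in range(iterations):
        mean = sum(x) / n
        x = [xi - mean for xi in x]    # remove the constant (eigenvalue 0) component
        # y = (shift*I - L) x, with (L x)[i] = degree[i]*x[i] - sum of neighbor values
        y = [shift * x[i] - (degree[i] * x[i] - sum(x[j] for j in neighbors[i]))
             for i in range(n)]
        norm = sum(yi * yi for yi in y) ** 0.5
        x = [yi / norm for yi in y]
    return x


# Hypothetical graph: two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
positions = fiedler_vector(6, edges)
cluster_a = [i for i, p in enumerate(positions) if p < 0]
cluster_b = [i for i, p in enumerate(positions) if p >= 0]
print(positions)
print(cluster_a, cluster_b)
```

Nodes that end up near each other on the axis are candidates for sharing a hardware cluster; splitting at zero is only the simplest possible cut.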
System and Architecture Level
Design partitioning result
[Figure: non-local and local implementations, each built from datapath (DP) and control (ctl) blocks]
Global bus: 2400
Global bus: 4700
                 Non-local        Local
Units            4 add, 3 shift   4 add, 4 shift
Global buses     106 accesses     6 accesses
Bus power        2 mW             0.3 mW
Total power      21.3 mW          16.3 mW
Area             8.78 mW          7.46 mW
Average: Power reduction: 18.5%, Area reduction: 1%
System and Architecture Level
Exploiting Regularity
Coarse-grained regularity
Usually evident to user
• Loops
• Subroutines
Fine-grained regularity
Not obvious to user
• Similar code fragments
[Figure: dataflow graph with +, -, *, >> and = operators]
Regular implementations typically reduce interconnect
and/or controller requirements [Mehru96]
System and Architecture Level
Common Design Approaches
Processor Usage Model
[Figure: desired throughput vs. time, bounded by the max. processor speed (TMAX)]
1) Compute-intensive and short-latency processes
2) Background and long-latency processes
3) System idle
To reduce power, the following design approaches can be used:
Compute ASAP
Clock Frequency Reduction
Voltage Scaling
System and Architecture Level
Compute ASAP
In this approach the processor always performs the desired
computation at maximum throughput
This is the simplest approach
[Figure: delivered throughput, energy/operation and desired throughput vs. time]
System and Architecture Level
Clock Frequency Reduction
A common low power design technique is to reduce the clock frequency, fclk
This in turn reduces throughput and power dissipation by a proportional amount
The energy consumed per operation remains unchanged
This makes the approach energy-inefficient: the processor delivers the same
amount of computation per battery life, but at a lower peak throughput
[Figure: delivered throughput, energy/operation and desired throughput vs. time]
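The effect of frequency reduction alone can be illustrated with the switching term only. This is a sketch under simplifying assumptions: one operation per clock cycle, short-circuit and leakage power ignored, and a hypothetical effective capacitance and battery capacity.

```python
def power_and_energy(c_eff, v_dd, f_clk):
    """Return (power in W, energy per operation in J), one operation per cycle."""
    energy_per_op = c_eff * v_dd ** 2  # independent of the clock frequency
    power = energy_per_op * f_clk
    return power, energy_per_op


C_EFF = 1e-9        # hypothetical effective switched capacitance (F)
V_DD = 3.3          # hypothetical supply voltage (V)
BATTERY_J = 1000.0  # hypothetical battery capacity (J)

full_power, full_energy = power_and_energy(C_EFF, V_DD, 100e6)
half_power, half_energy = power_and_energy(C_EFF, V_DD, 50e6)

# Halving fclk halves the power, but the number of operations the battery
# can pay for is unchanged because the energy per operation is the same.
ops_full = BATTERY_J / full_energy
ops_half = BATTERY_J / half_energy
print(full_power, half_power, ops_full, ops_half)
```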
System and Architecture Level
Voltage Scaling
When fclk is reduced, the processor’s circuits have a longer cycle time to
complete their computation
Scaling the supply voltage down, i.e. reducing Vdd, increases the delay of the
circuits
But the energy/operation, which is a quadratic function of Vdd, decreases
[Figure: delivered throughput, energy/operation and desired throughput vs. time]
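A small sketch of the trade-off. The quadratic energy term follows the slide; the delay expression Vdd / (Vdd - Vt)^2 is a common first-order approximation that is not taken from the slides, and the capacitance, threshold voltage and supply values are hypothetical.

```python
def energy_per_op(c_eff, v_dd):
    """Switching energy per operation (J): quadratic in the supply voltage."""
    return c_eff * v_dd ** 2


def relative_delay(v_dd, v_t):
    """First-order gate delay model (assumption): delay ~ Vdd / (Vdd - Vt)^2."""
    return v_dd / (v_dd - v_t) ** 2


C_EFF = 1e-9  # hypothetical effective switched capacitance (F)
V_T = 0.7     # hypothetical threshold voltage (V)

for v_dd in (3.3, 2.5, 1.8):
    e = energy_per_op(C_EFF, v_dd)
    d = relative_delay(v_dd, V_T) / relative_delay(3.3, V_T)
    print(f"Vdd={v_dd:.1f} V  energy/op={e * 1e9:.2f} nJ  delay x{d:.2f}")
```

When the desired throughput is below the maximum, the longer cycle time absorbs the extra delay, so the lower energy per operation comes at no cost in delivered throughput.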
System and Architecture Level
Voltage Scaling
Minimizing the delay penalty due to voltage scaling
Architecture-level
speedup (pipelining, concurrency), then downscale supply
voltage, or
match supply voltage with throughput requirement
multiple supply voltages in the same design
one supply voltage for each block
Circuit-level
lowering threshold voltage
heavily process-dependent
System and Architecture Level
Dynamic Power Management
Dynamic power management is a design methodology that
dynamically reconfigures an electronic system to provide the
requested services and performance levels with a minimum number
of active components or a minimum load on such components
Power Manager
[Figure: an OBSERVER turns workload information into observations for a CONTROLLER, which issues commands to the SYSTEM]
Power State Machine
RUN: P=400mW
IDLE: P=50mW (wait for interrupt)
SLEEP: P=0.16mW (wait for wake-up event)
Transition times: ~10µs between RUN and IDLE, ~90µs into SLEEP, 160ms from SLEEP back to RUN
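The power values of the state machine allow a rough break-even estimate for an idle period. The sketch below makes two assumptions that are not on the slide: the 160 ms figure is the wake-up latency out of SLEEP and is spent at RUN power, and the much shorter transitions are ignored.

```python
P_RUN_MW = 400.0
P_IDLE_MW = 50.0
P_SLEEP_MW = 0.16
WAKE_FROM_SLEEP_S = 0.160  # assumed to be spent at RUN power


def idle_period_energy_mj(duration_s, policy):
    """Energy in mJ spent over an idle period under a given policy."""
    if policy == "run":
        return P_RUN_MW * duration_s
    if policy == "idle":
        return P_IDLE_MW * duration_s
    if policy == "sleep":
        wake_penalty_mj = P_RUN_MW * WAKE_FROM_SLEEP_S
        return P_SLEEP_MW * duration_s + wake_penalty_mj
    raise ValueError(f"unknown policy: {policy}")


# Idle-period length beyond which SLEEP costs less energy than IDLE.
break_even_s = (P_RUN_MW * WAKE_FROM_SLEEP_S) / (P_IDLE_MW - P_SLEEP_MW)
print(f"break-even idle time: {break_even_s:.2f} s")

for duration in (0.5, 5.0):
    costs = {p: idle_period_energy_mj(duration, p) for p in ("run", "idle", "sleep")}
    print(duration, costs)
```

A power manager therefore needs some prediction of how long the system will stay idle: entering SLEEP for short idle periods costs more energy than staying in IDLE.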
Register Transfer and Logic Level
Low-power techniques at RTL and Logic Level
can be subdivided into:
techniques for lowering the capacitance and
the switched voltage
minimizing global communication
logic optimization by synthesis tools (area, speed)
techniques to reduce the toggle rate of nodes
with a high relative capacitance
guarding techniques
pipelining
reorganization of logic gates and operators
Register Transfer and Logic Level
Reducing switching activity
Guarding technique (clock gating)
Clock gating means shutting down the
clock for a certain group of registers
under a certain guard condition
advantages: it is implemented with
minor overhead in area and design effort
disadvantages: reduced testability
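A behavioural sketch of what the guard condition saves. It only counts how many flip-flops receive a clock edge; the register width and the guard pattern (active every fourth cycle) are hypothetical.

```python
def count_register_clock_events(enable_trace, width, gated):
    """Count flip-flop clock events over a trace of per-cycle guard conditions."""
    events = 0
    for enabled in enable_trace:
        # Without gating every flip-flop is clocked on every cycle;
        # with gating the clock only reaches the bank when the guard is true.
        if enabled or not gated:
            events += width
    return events


WIDTH = 16                                      # hypothetical register width
trace = [cycle % 4 == 0 for cycle in range(1000)]  # guard true every 4th cycle

ungated = count_register_clock_events(trace, WIDTH, gated=False)
gated = count_register_clock_events(trace, WIDTH, gated=True)
print(ungated, gated, gated / ungated)
```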
Register Transfer and Logic Level
Examples of guarding technique
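[Figure: examples of guarding. An N-bit binary comparator (Y = A > B) with inputs An, Bn and A1..An-1, B1..Bn-1 passing through latches L_A and L_B; an adder with registers R1 and R2 and a 0/1 multiplexer with Datain, Ctrl and Sel signals]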
Register Transfer and Logic Level
Reducing switching activity
Pipelining
reduces critical path (enables savings
due to voltage scaling, or slower but
energy-efficient algorithms)
reduces glitches
disadvantages: area overhead (with an
implicit increase of capacitances and
increase in clock power)
Register Transfer and Logic Level
Reducing switching activity
Reorganization of logic gates and operators
manual (reorganization of logic cells and
reordering inputs)
automatic (performed by synthesis tools):
combinatorial
don’t care optimization
path balancing
factorization
sequential
state encoding
retiming
Register Transfer and Logic Level
Reducing switching activity – examples of
reorganization
[Figure: adder trees over inputs A, B, C, D illustrating flattening and factoring]
Factoring
Idea: Remove common expressions to reduce capacitance
Pa = 0.1
Pb = 0.5
Pc = 0.5
Caveat: This may increase activity!
Don’t Care Optimization
Example: a  b  c
Activity is maximized for P(1) = 0.5!
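The caveat can be checked with the signal probabilities given above. The sketch assumes inputs that are independent of each other and over time, so a node with probability p of being 1 makes a 0 to 1 transition with probability p(1 - p). The expression a·b + a·c and its factored form a·(b + c) are an assumed example; the slide does not name the expression.

```python
def activity(p_one):
    """Probability of a 0->1 transition for a temporally independent signal."""
    return (1.0 - p_one) * p_one


pa, pb, pc = 0.1, 0.5, 0.5

# Unfactored form a*b + a*c: the internal nodes are a*b and a*c.
unfactored = activity(pa * pb) + activity(pa * pc)

# Factored form a*(b + c): the only internal node is b + c.
p_b_or_c = 1.0 - (1.0 - pb) * (1.0 - pc)
factored = activity(p_b_or_c)

print(f"unfactored internal activity: {unfactored:.4f}")
print(f"factored internal activity:   {factored:.4f}")

# Activity as a function of P(1) peaks at 0.5.
grid = [i / 100 for i in range(101)]
p_max = max(grid, key=activity)
print(f"activity is largest at P(1) = {p_max}")
```

Factoring removes one gate, but the new internal node b + c has a signal probability much closer to 0.5 than a·b or a·c, so its activity is higher.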
Sequential logic optimization
State encoding
seems to be of minimal impact in general
Data encoding in data paths
e.g. use of sign-magnitude, one-hot, or
redundant representations
mostly ad hoc
Retiming for low power
registers can be strategically placed to reduce
glitching, or to perform path balancing
Physical Level
On this level of abstraction the number of
manually guided optimizations is quite limited
The place and route tools automatically
minimize the wire length (and wire
capacitances) according to the timing
constraints
This does not necessarily represent the optimum
for power consumption
There are some design tasks which can
nevertheless be exploited to save power:
partitioning (taking into account the interconnections between
the layout blocks)
back-annotation of layout capacitances together with
switching activity information from gate level simulation to the
synthesis tool (enables reoptimization of logic for low-power)
Conclusion
Power is a distributed problem that spans all
design disciplines: standards (GSM, OS),
software, digital and analog hardware, process
Power related design decisions must be
weighed against all of the system constraints
(size, cost, performance, testability, time to market)
to develop a successful system
Low power design techniques have to be
implemented at different levels of system design
in order to achieve the best results