Monitoring and Control of Temperature in Networks-on-Chip. Tim Wegner, Claas Cornelius, Andreas Tockhorn, Dirk Timmermann; MEMICS 2010, Mikulov, Czech Republic, October 22-24. University of Rostock, Institute of Applied Microelectronics and Computer Engineering.
Monitoring and Control of Temperature in Networks-on-Chip
Tim Wegner, Claas Cornelius, Andreas Tockhorn, Dirk Timmermann
MEMICS 2010, Mikulov, Czech Republic, October 22-24
University of Rostock
Institute of Applied Microelectronics and Computer Engineering
Monitoring and Control of
Temperature in NoCs
Outline
1. Introduction
2. Networks-on-Chip (NoCs)
3. Impact of Temperature on Reliability
4. Monitoring & Control of Temperature in NoCs
5. Summary
Tim Wegner - 23 October 2010
MEMICS 2010, Mikulov, Czech Republic, October 22-24
1. Introduction
[Figure: transistor count over time, with 1954: IBM 704 mainframe, 1981: IBM PC 5150, 2007: Apple iPhone]
Impacts of technological development
Increasing integration density → rising complexity, shrinking device sizes
NoCs able to deal with arising requirements (e.g. for communication)
But: Reliability becomes a dominant factor for chip design
Goal: Increase reliability in NoC-based systems
2. Networks-on-Chip
Properties
Infrastructure for on-chip interconnection
Point-to-point links replace long global busses
Parallel packet-based communication
Separation of communication & computation
Globally asynchronous locally synchronous (GALS)
Modularity of IP cores (not part of actual NoC): reusability, high abstraction level
[Figure: NoC with four routers (R), each attached to an IP core in its own clock domain (CLK0 to CLK3)]
NoCs are able to satisfy requirements of modern VLSI systems
3. Impact of Temperature on Reliability
Impacts of technological progress
Increasing integration densities, progress of nanotechnology
Growing number of transistors per chip = higher probability of failure
Decreasing structure size of ICs = higher susceptibility to environmental influences & deterioration
Intel 8086 (1978): ≈879 transistors/mm²
Intel Bloomfield (2008): ≈2.78 million transistors/mm²
3. Impact of Temperature on Reliability
Why is thermal awareness important?
Particular physical effects (e.g. time-dependent dielectric breakdown (TDDB), electromigration (EM)) contribute to deterioration
These effects are accelerated by high temperatures
Correlation between temperature & failure mechanisms established by Arrhenius model
Exponential decrease of IC lifetime with temperature
$T_{\mathrm{fail}} \propto e^{E_a / (k_b T)}$
Growing influence of on-chip temperature distribution
on lifetime, operability, performance etc.
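To make the Arrhenius relation more tangible, the following minimal sketch compares the expected lifetime at one temperature with the lifetime at a reference temperature, using the ratio of two $e^{E_a/(k_b T)}$ terms. The function name `relative_lifetime` and the activation energy of 0.7 eV are illustrative assumptions and do not come from the slides; real values depend on the failure mechanism and the technology.

```python
import math

# Boltzmann constant k_b in eV/K
BOLTZMANN_EV_PER_K = 8.617333262e-5


def relative_lifetime(temp_c: float, ref_temp_c: float,
                      activation_energy_ev: float = 0.7) -> float:
    """Lifetime at temp_c divided by lifetime at ref_temp_c.

    Assumes T_fail is proportional to exp(E_a / (k_b * T)) with all other
    parameters constant. The default E_a = 0.7 eV is a hypothetical value.
    """
    t = temp_c + 273.15          # convert to kelvin
    t_ref = ref_temp_c + 273.15
    exponent = activation_energy_ev / BOLTZMANN_EV_PER_K * (1.0 / t - 1.0 / t_ref)
    return math.exp(exponent)


if __name__ == "__main__":
    for temp in (60, 70, 80, 90):
        ratio = relative_lifetime(temp, ref_temp_c=60)
        print(f"{temp} degC: lifetime relative to 60 degC = {ratio:.2f}")
```

Under these assumptions the ratio is 1 at the reference temperature and drops below 1 for any hotter operating point, which is the effect the slide refers to.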
4. Monitoring and Control of Temperature for NoCs
Objective:
Mitigate effects contributing to deterioration & delay occurrence of failures
Control of on-chip temperature distribution
Requirements:
Effective mechanisms to monitor & control on-chip temperature
Integration into existing NoC
Preservation of modularity & reusability
Minimum costs (area, frequency)
Maximum performance of monitoring and control
Minimum impact on system performance
4.1 Mechanisms for monitoring
Concept: attach physical monitoring probes to every IP core
Event-driven:
Threshold: temperature variation ∆T
Continuous checking of T_IPC
If |T_IPC,old - T_IPC,new| ≥ ∆T: report T_IPC,new
Area: 66 LUT/FF pairs
Frequency: 227 MHz
Time-driven:
Period of time ∆t
Report T_IPC,new every ∆t
Area: 80 LUT/FF pairs
Frequency: 338 MHz
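The two reporting policies can be sketched in software as follows. This is a behavioural illustration only, not the hardware design from the talk: the class names, the `sample` method, the choice of the last reported value as T_IPC,old, and measuring ∆t in samples are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class EventDrivenProbe:
    """Reports only when the temperature moved by at least delta_t."""
    delta_t: float                         # threshold ∆T
    last_reported: Optional[float] = None  # assumed meaning of T_IPC,old

    def sample(self, temperature: float) -> Optional[float]:
        # First sample is always reported so the receiver has a baseline.
        if (self.last_reported is None
                or abs(self.last_reported - temperature) >= self.delta_t):
            self.last_reported = temperature
            return temperature
        return None


@dataclass
class TimeDrivenProbe:
    """Reports the current temperature every `period` samples."""
    period: int                            # ∆t, counted in samples here
    _elapsed: int = 0

    def sample(self, temperature: float) -> Optional[float]:
        self._elapsed += 1
        if self._elapsed >= self.period:
            self._elapsed = 0
            return temperature
        return None


def count_reports(probe, trace: Iterable[float]) -> int:
    """Number of probe packets generated for a temperature trace."""
    return sum(1 for t in trace if probe.sample(t) is not None)


# Hypothetical temperature trace of one IP core
TRACE = [50.0, 50.2, 50.1, 53.0, 53.1, 53.0, 49.5, 49.6]
event_reports = count_reports(EventDrivenProbe(delta_t=2.0), TRACE)
time_reports = count_reports(TimeDrivenProbe(period=2), TRACE)
```

In this sketch the event-driven probe stays silent while the temperature is stable, whereas the time-driven probe keeps sending packets regardless of whether anything changed.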
4.2 Mechanisms for control
Central Control Unit (CCU):
Reception & interpretation of probe packets
Instructions for Dynamic Frequency Scaling
to probes (if necessary)
Area: 507 LUT/FF pairs
Frequency: 165 MHz
[Figure: NoC with four routers (R) connecting three IP cores and the CCU]
!!! Not the smartest approach, but suffices to test functionality !!!
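The role of the CCU can be illustrated with a small behavioural sketch: it receives probe packets and, if necessary, answers with a Dynamic Frequency Scaling instruction. Everything specific here is hypothetical and not taken from the slides: the packet fields, the two temperature thresholds, the two frequency levels and the class names are assumptions chosen to keep the example short.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ProbePacket:
    core_id: int
    temperature_c: float


@dataclass(frozen=True)
class DfsInstruction:
    core_id: int
    frequency_mhz: int


class CentralControlUnit:
    """Interprets probe packets and issues DFS instructions when needed."""

    def __init__(self, hot_c: float = 85.0, cool_c: float = 70.0,
                 full_mhz: int = 100, reduced_mhz: int = 50) -> None:
        # All four parameters are hypothetical defaults.
        self.hot_c = hot_c
        self.cool_c = cool_c
        self.full_mhz = full_mhz
        self.reduced_mhz = reduced_mhz
        self.frequency: Dict[int, int] = {}  # last frequency set per core

    def receive(self, packet: ProbePacket) -> Optional[DfsInstruction]:
        current = self.frequency.get(packet.core_id, self.full_mhz)
        if packet.temperature_c >= self.hot_c:
            target = self.reduced_mhz      # too hot: slow the core down
        elif packet.temperature_c <= self.cool_c:
            target = self.full_mhz         # cooled off: restore full speed
        else:
            target = current               # in between: keep the setting
        if target == current:
            return None                    # no instruction necessary
        self.frequency[packet.core_id] = target
        return DfsInstruction(packet.core_id, target)
```

The gap between the two thresholds is a design choice of this sketch; it avoids toggling the frequency when a core hovers around a single limit.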
4.3 Integration of monitoring
3 approaches
Different impact on performance & costs
Into IP core:
Area penalty: none
Freq. penalty: none
Router port of IP core:
Area penalty: 7.3%
Freq. penalty: none (but Mux/Demux needed)
Extra router port:
Area penalty: 30.5%
Freq. penalty: 8.2%
[Figure: the three integration options, each showing a router (R), an IP core and the probe (P)]
4.4 Impact on system performance
4.5 Performance of monitoring & control
5. Summary
Implementation of 2 approaches for monitoring on-chip
temperature + 3 methods for integration into NoC
Investigation of:
Costs (area, frequency)
Impact on system performance
Performance of monitoring & control
Conclusion
Event-driven approach preferable (situational monitoring, better
performance, no redundant traffic, lower area costs)
Integration into NoC using router port of IP core best trade-off between
costs & preservation of modularity/non-intrusiveness
Thanks for your attention!
Any questions?
Contact: [email protected]
Homepage: www.networks-on-chip.com
University of Rostock, Germany
Institute of Applied Microelectronics and Computer Engineering
Arrhenius Model
Establishes relationship between temperature and failure mechanisms
Describes dependence of chemical reactions on temperature changes
Assumption: all other parameters constant
$T_{\mathrm{fail}} \propto e^{E_a / (k_b T)}$
[Figure: $T_{\mathrm{fail}}$ plotted over temperature]
Lifetime of ICs decreases exponentially with temperature
Time Dependent Dielectric Breakdown (TDDB)
Formation of charge traps → conducting path through gate oxide → current flow → heat → more charge traps
Inoperability of transistor through gate oxide breakdown (long-term)
Electromigration (EM)
Transport of material in conductors (i.e. wires)
Cause: ion movement induced by current flow (ions’ mobility
increases with temperature)
Effects:
• Hillocks → short circuits
• Voids → interruption of current paths
Intel Processors
Intel Bloomfield:
• Year: 2008
• 731 million transistors
• 263 mm²
• 2,779,467 transistors/mm²
Intel 8086:
• Year: 1978
• 29k transistors
• 33 mm²
• 879 transistors/mm²
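The density figures follow directly from transistor count divided by die area. The short check below reproduces them from the numbers on this slide; the function name `transistor_density` is only chosen for this example.

```python
def transistor_density(transistors: int, area_mm2: float) -> float:
    """Transistors per square millimetre."""
    return transistors / area_mm2


# Values as given on the slide
i8086_density = transistor_density(29_000, 33)
bloomfield_density = transistor_density(731_000_000, 263)
density_ratio = bloomfield_density / i8086_density

if __name__ == "__main__":
    print(f"Intel 8086:       {i8086_density:,.0f} transistors/mm^2")
    print(f"Intel Bloomfield: {bloomfield_density:,.0f} transistors/mm^2")
    print(f"Density increase: factor of about {density_ratio:,.0f}")
```

Note that rounding the Bloomfield result gives a value one higher than the truncated figure on the slide.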
Impact on system performance
Performance of monitoring & control
Synthesis results for monitoring & control
Component / integration method | Frequency [MHz] | Area [LUT/FF pairs]
Event-driven probe | 227 | 66
Time-driven probe | 338 | 80
Central Control Unit | 165 | 507
Into IP core | 122 | 1901
Using IP core port | 119 | 1896
Extra port | 112 | 2312
Unmodified NoC router: 1771 LUT/FF pairs, 122 MHz
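Relative penalties can be derived from such synthesis results by comparing a modified router against the unmodified one. The sketch below assumes the penalty is defined as the relative change with respect to the unmodified router (1771 LUT/FF pairs, 122 MHz); the function names are made up for this example. It is applied to the figures 2312 LUT/FF pairs and 112 MHz from the synthesis results.

```python
BASE_AREA_LUT_FF = 1771   # unmodified NoC router
BASE_FREQ_MHZ = 122       # unmodified NoC router


def area_penalty_percent(area: float, base: float = BASE_AREA_LUT_FF) -> float:
    """Relative area increase compared to the unmodified router, in percent."""
    return (area - base) / base * 100.0


def freq_penalty_percent(freq: float, base: float = BASE_FREQ_MHZ) -> float:
    """Relative frequency loss compared to the unmodified router, in percent."""
    return (base - freq) / base * 100.0


extra_port_area = area_penalty_percent(2312)
extra_port_freq = freq_penalty_percent(112)

if __name__ == "__main__":
    print(f"Area penalty: {extra_port_area:.1f}%")
    print(f"Freq. penalty: {extra_port_freq:.1f}%")
```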