Monitoring and Control of Temperature in Networks-on-Chip. Tim Wegner, Claas Cornelius, Andreas Tockhorn, Dirk Timmermann; MEMICS 2010, Mikulov, Czech Republic, October 22-24. University of Rostock, Institute of Applied Microelectronics and Computer Engineering.


Monitoring and Control of Temperature in Networks-on-Chip
Tim Wegner, Claas Cornelius, Andreas Tockhorn, Dirk Timmermann
MEMICS 2010, Mikulov, Czech Republic, October 22-24
University of Rostock
Institute of Applied Microelectronics and Computer Engineering
Monitoring and Control of
Temperature in NoCs
Outline
1. Introduction
2. Networks-on-Chip (NoCs)
3. Impact of Temperature on Reliability
4. Monitoring & Control of Temperature in NoCs
5. Summary
Tim Wegner - 23 October 2010
MEMICS 2010, Mikulov, Czech Republic, October 22-24
1. Introduction
[Figure: transistor count over time, with the IBM 704 mainframe (1954), the IBM PC 5150 (1981), and the Apple iPhone (2007) as examples]
Impacts of technological development
• Increasing integration density → rising complexity, shrinking device sizes
• NoCs able to deal with arising requirements (e.g. for communication)
• But: Reliability becomes a dominant factor for chip design
• Goal: Increase reliability in NoC-based systems
2. Networks-on-Chip
Properties
• Infrastructure for on-chip interconnection
• Point-to-point links replace long global busses
• Parallel packet-based communication
• Separation of communication & computation
• Globally asynchronous locally synchronous (GALS)
• Modularity of IP cores (not part of actual NoC)
  → reusability, high abstraction level
[Figure: four routers (R), each connected to an IP core with its own clock (CLK0 to CLK3)]
NoCs are able to satisfy requirements of modern VLSI systems
3. Impact of Temperature on Reliability
Impacts of technological progress
• Increasing integration densities, progress of nanotechnology
• Growing number of transistors per chip → higher probability of failure
• Decreasing structural size of ICs → higher susceptibility to environmental influences & deterioration
Intel 8086 (1978): ≈879 transistors/mm²
Intel Bloomfield (2008): ≈2.78 million transistors/mm²
3. Impact of Temperature on Reliability
Why is thermal awareness important?
• Particular physical effects, e.g. time-dependent dielectric breakdown (TDDB) and electromigration (EM), contribute to deterioration
• Abetted by high temperatures
• Correlation between temperature & failure mechanisms established by Arrhenius model
• Exponential decrease of IC lifetime with temperature
$T_{\mathrm{fail}} \propto e^{E_a / (k_B T)}$
Growing influence of on-chip temperature distribution on lifetime, operability, performance etc.
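As a rough worked example of the Arrhenius relation, the sketch below compares the relative lifetime at two temperatures. The activation energy of 0.7 eV, the two operating temperatures, and the function names are illustrative assumptions, not values from the talk.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def relative_lifetime(temp_kelvin: float, e_a_ev: float) -> float:
    """Arrhenius term exp(Ea / (kB * T)); only ratios of this value are meaningful."""
    return math.exp(e_a_ev / (K_B_EV * temp_kelvin))


def lifetime_ratio(t_cool_k: float, t_hot_k: float, e_a_ev: float = 0.7) -> float:
    """Factor by which the lifetime at t_cool_k exceeds the lifetime at t_hot_k.

    e_a_ev = 0.7 eV is an assumed, illustrative activation energy.
    """
    return relative_lifetime(t_cool_k, e_a_ev) / relative_lifetime(t_hot_k, e_a_ev)


if __name__ == "__main__":
    # Assumed operating points: 60 °C versus 85 °C, converted to kelvin.
    ratio = lifetime_ratio(60 + 273.15, 85 + 273.15)
    print(f"Lifetime at 60 °C is about {ratio:.1f}x the lifetime at 85 °C")
```

Under these assumptions a 25 K reduction changes the expected lifetime by a factor of several, which is why even moderate control of the temperature distribution can matter.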
4. Monitoring and Control of Temperature in NoCs
Objective:
• Mitigate effects contributing to deterioration & delay the occurrence of failures
  → Control of on-chip temperature distribution
Requirements:
• Effective mechanisms to monitor & control on-chip temperature
• Integration into existing NoC
• Preservation of modularity & reusability
• Minimum costs (area, frequency)
• Maximum performance of monitoring and control
• Minimum impact on system performance
4.1 Mechanisms for monitoring
Concept: attach physical monitoring probes to every IP core
Event-driven:
• Temperature variation ∆T
• Continuous checking of T_IPC
• |T_IPC,old - T_IPC,new| ≥ ∆T ? → Report T_IPC,new
• Area: 66 LUT/FF pairs
• Frequency: 227 MHz
Time-driven:
• Period of time ∆t
• Report T_IPC,new every ∆t
• Area: 80 LUT/FF pairs
• Frequency: 338 MHz
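The two reporting policies can be sketched in software as follows. This is only a behavioural illustration, not the hardware probes from the talk: the class names, the sample trace, counting ∆t in samples, and treating T_IPC,old as the last reported value are all assumptions.

```python
from typing import Iterable, Optional


class EventDrivenProbe:
    """Reports T_IPC,new only if it differs from T_IPC,old by at least delta_temp."""

    def __init__(self, delta_temp: float, initial_temp: float) -> None:
        self.delta_temp = delta_temp
        self.last_reported = initial_temp  # assumed meaning of T_IPC,old

    def sample(self, temp_new: float) -> Optional[float]:
        if abs(self.last_reported - temp_new) >= self.delta_temp:
            self.last_reported = temp_new
            return temp_new
        return None  # change too small, nothing is sent


class TimeDrivenProbe:
    """Reports T_IPC,new once every `period` samples (standing in for delta t)."""

    def __init__(self, period: int) -> None:
        self.period = period
        self.ticks = 0

    def sample(self, temp_new: float) -> Optional[float]:
        self.ticks += 1
        if self.ticks >= self.period:
            self.ticks = 0
            return temp_new
        return None


def count_reports(probe, trace: Iterable[float]) -> int:
    """Number of reports a probe would emit for a temperature trace."""
    return sum(1 for temp in trace if probe.sample(temp) is not None)


if __name__ == "__main__":
    # Hypothetical temperature trace in °C with two distinct jumps.
    trace = [50.0, 50.2, 50.1, 53.5, 53.6, 53.4, 57.0, 57.1]
    print("event-driven reports:", count_reports(EventDrivenProbe(2.0, 50.0), trace))
    print("time-driven reports:", count_reports(TimeDrivenProbe(2), trace))
```

On a trace with long stable phases, the event-driven probe stays silent while the time-driven probe keeps sending the same value at every period.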
4.2 Mechanisms for control
Central Control Unit (CCU):
• Reception & interpretation of probe packets
• Sends instructions for Dynamic Frequency Scaling (DFS) to probes (if necessary)
• Area: 507 LUT/FF pairs
• Frequency: 165 MHz
[Figure: NoC with four routers (R); three are attached to IP cores and one to the CCU]
!!! Not the smartest approach, but suffices to test functionality !!!
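A minimal behavioural sketch of such a central controller is given below. The slide does not state the decision rule, so the two temperature thresholds, the two frequency levels, and all class and field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ProbePacket:
    core_id: int
    temperature: float  # reported T_IPC in °C


@dataclass(frozen=True)
class DfsInstruction:
    core_id: int
    frequency_mhz: int


class CentralControlUnit:
    """Receives probe packets and answers with a DFS instruction when needed."""

    def __init__(self, t_high: float = 80.0, t_low: float = 60.0,
                 f_nominal: int = 100, f_reduced: int = 50) -> None:
        # All four values are assumed; the talk gives no thresholds or clock levels.
        self.t_high = t_high
        self.t_low = t_low
        self.f_nominal = f_nominal
        self.f_reduced = f_reduced
        self.current: Dict[int, int] = {}  # last frequency sent per core

    def receive(self, packet: ProbePacket) -> Optional[DfsInstruction]:
        current = self.current.get(packet.core_id, self.f_nominal)
        if packet.temperature >= self.t_high:
            target = self.f_reduced      # too hot: slow the core down
        elif packet.temperature <= self.t_low:
            target = self.f_nominal      # cool again: restore full speed
        else:
            target = current             # in between: keep the current setting
        if target == current:
            return None                  # no instruction necessary
        self.current[packet.core_id] = target
        return DfsInstruction(packet.core_id, target)


if __name__ == "__main__":
    ccu = CentralControlUnit()
    for temp in (85.0, 70.0, 55.0):
        print(temp, ccu.receive(ProbePacket(core_id=0, temperature=temp)))
```

The gap between the two thresholds keeps the controller from toggling the frequency on every small temperature change.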
4.3 Integration of monitoring
3 approaches
→ Different impact on performance & costs
Into IP core:
• Area penalty: /
• Freq. penalty: /
Router port of IP core:
• Area penalty: 7.3%
• Freq. penalty: / (but Mux/Demux)
Extra router port:
• Area penalty: 30.5%
• Freq. penalty: 8.2%
[Figure: the three integration options, each drawn with a router (R), a probe (P), and an IP core]
4.4 Impact on system performance
4.5 Performance of monitoring & control
5. Summary
Implementation of 2 approaches for monitoring on-chip temperature + 3 methods for integration into NoC
→ Investigation of:
  • Costs (area, frequency)
  • Impact on system performance
  • Performance of monitoring & control
Conclusion
• Event-driven approach preferable (situational monitoring, better performance, no redundant traffic, lower area costs)
• Integration into NoC using router port of IP core offers the best trade-off between costs & preservation of modularity/non-intrusiveness
Thanks for your attention!
Any questions?
Contact: [email protected]
Homepage: www.networks-on-chip.com
University of Rostock, Germany
Institute of Applied Microelectronics and Computer Engineering
Arrhenius Model
• Establishes relationship between temperature and failure mechanisms
• Describes dependence of chemical reactions on temperature changes
• Assumption: all other parameters constant
$T_{\mathrm{fail}} \propto e^{E_a / (k_B T)}$
[Figure: T_fail plotted against temperature]
Lifetime of ICs decreases exponentially with temperature
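Taking the logarithm makes the temperature dependence easier to read. With an assumed constant of proportionality $A$ (not named on the slide), the logarithm of the lifetime is linear in $1/T$, and the ratio of lifetimes at two temperatures no longer depends on $A$:

```latex
\begin{align}
T_{\mathrm{fail}} &= A \, e^{E_a / (k_B T)} \\
\ln T_{\mathrm{fail}} &= \ln A + \frac{E_a}{k_B} \cdot \frac{1}{T} \\
\frac{T_{\mathrm{fail}}(T_1)}{T_{\mathrm{fail}}(T_2)}
  &= \exp\!\left[ \frac{E_a}{k_B} \left( \frac{1}{T_1} - \frac{1}{T_2} \right) \right]
\end{align}
```

Here $T$, $T_1$ and $T_2$ are absolute temperatures in kelvin, $E_a$ is the activation energy of the failure mechanism, and $k_B$ is the Boltzmann constant.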
Time Dependent Dielectric Breakdown (TDDB)
[Figure: cycle of formation of charge traps → conducting path through gate oxide → current flow → more charge traps, labelled "!!! HEAT !!!"]
→ Inoperability of transistor through gate oxide breakdown (long-term)
Electromigration (EM)
• Transport of material in conductors (i.e. wires)
• Cause: ion movement induced by current flow (ions’ mobility increases with temperature)
• Effects:
  • Hillocks → short circuits
  • Voids → interruption of current paths
Intel Processors
Intel Bloomfield:
• Year: 2008
• 731 million transistors
• 263 mm²
• 2,779,467 transistors/mm²
Intel 8086:
• Year: 1978
• 29k transistors
• 33 mm²
• 879 transistors/mm²
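The density figures follow directly from the transistor counts and die areas listed above; the short check below reproduces them. Only the function name is introduced here, the numbers are those from the slide.

```python
def transistor_density(transistors: float, area_mm2: float) -> float:
    """Transistors per square millimetre of die area."""
    return transistors / area_mm2


if __name__ == "__main__":
    bloomfield = transistor_density(731e6, 263)  # Intel Bloomfield, 2008
    i8086 = transistor_density(29e3, 33)         # Intel 8086, 1978
    print(f"Bloomfield: {bloomfield:,.0f} transistors/mm²")
    print(f"8086:       {i8086:,.0f} transistors/mm²")
    print(f"Increase:   about {bloomfield / i8086:,.0f}x in 30 years")
```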
Impact on system performance
Performance of monitoring & control
Synthesis results for monitoring & control

Group               Item                  Frequency [MHz]  Area [LUT/FF pairs]
Component           Event-driven probe    227              66
Component           Time-driven probe     338              80
Component           Central Control Unit  165              507
Integration method  Into IP core          122              1901
Integration method  Using IP core port    119              1896
Integration method  Extra port            112              2312

Unmodified NoC router: 1771 LUT/FF pairs, 122 MHz
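Relative penalties can be derived from these synthesis results by comparing each integration method against the unmodified router. The sketch below does this under two assumptions: the penalty is the relative change against the unmodified router, and the column order of the table is as transcribed. With this definition the extra port reproduces the 30.5% area and 8.2% frequency penalties quoted in the talk.

```python
BASE_AREA = 1771  # unmodified NoC router, LUT/FF pairs
BASE_FREQ = 122   # unmodified NoC router, MHz

# (frequency in MHz, area in LUT/FF pairs), taken from the table as transcribed.
INTEGRATION = {
    "Into IP core": (122, 1901),
    "Using IP core port": (119, 1896),
    "Extra port": (112, 2312),
}


def area_penalty(area: int) -> float:
    """Area increase relative to the unmodified router, in percent."""
    return (area - BASE_AREA) / BASE_AREA * 100


def freq_penalty(freq: int) -> float:
    """Frequency decrease relative to the unmodified router, in percent."""
    return (BASE_FREQ - freq) / BASE_FREQ * 100


if __name__ == "__main__":
    for name, (freq, area) in INTEGRATION.items():
        print(f"{name:20s} area +{area_penalty(area):.1f}%  freq -{freq_penalty(freq):.1f}%")
```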