Monitoring and Control of Temperature in Networks-on-Chip
Tim Wegner, Claas Cornelius, Andreas Tockhorn, Dirk Timmermann
MEMICS 2010, Mikulov, Czech Republic, October 22-24
University of Rostock, Institute of Applied Microelectronics and Computer Engineering
Monitoring and Control of Temperature in NoCs
Tim Wegner - 23 October 2010, MEMICS 2010, Mikulov, Czech Republic, October 22-24

Outline
1. Introduction
2. Networks-on-Chip (NoCs)
3. Impact of Temperature on Reliability
4. Monitoring & Control of Temperature in NoCs
5. Summary

1. Introduction
[Figure: transistor count over time. 1954: IBM 704 Mainframe; 1981: IBM PC 5150; 2007: Apple iPhone]
Impacts of technological development
• Increasing integration density → rising complexity, shrinking device sizes
• NoCs able to deal with arising requirements (e.g. for communication)
• But: Reliability becomes a dominant factor for chip design
Goal: Increase reliability in NoC-based systems

2. Networks-on-Chip
Properties
• Infrastructure for on-chip interconnection
• Point-to-point links replace long global buses
• Parallel packet-based communication
• Separation of communication & computation
• Globally asynchronous locally synchronous (GALS)
• Modularity of IP cores (not part of actual NoC) → reusability, high abstraction level
[Figure: NoC with routers (R) connecting IP cores that run in separate clock domains CLK0 to CLK3]
NoCs are able to satisfy requirements of modern VLSI systems

3. Impact of Temperature on Reliability
Impacts of technological progress
• Increasing integration densities, progress of nanotechnology
• Growing number of transistors per chip = raised probability of failure
• Decreasing structural size of ICs = higher susceptibility to environmental influences & deterioration
• Intel 8086 (1978): ≈879 transistors/mm²
• Intel Bloomfield (2008): ≈2.78 million transistors/mm²
Why is thermal awareness important?
• Particular physical effects (e.g. TDDB, EM) contribute to deterioration
• Abetted by high temperatures
• Correlation between temperature & failure mechanisms established by Arrhenius model
• Exponential decrease of IC lifetime with temperature: $T_{fail} \propto e^{E_a / (k_b \cdot T)}$
• Growing influence of on-chip temperature distribution on lifetime, operability, performance etc.

4. Monitoring and Control of Temperature in NoCs
Objective: Mitigate effects contributing to deterioration & delay occurrence of failures
• Control of on-chip temperature distribution
Requirements:
• Effective mechanisms to monitor & control on-chip temperature
• Integration into existing NoC
• Preservation of modularity & reusability
• Minimum costs (area, frequency)
• Maximum performance of monitoring and control
• Minimum impact on system performance

4.1 Mechanisms for monitoring
Concept: attach physical monitoring probes to every IP core
Event-driven: temperature variation ∆T
• Continuous checking of T_IPC
• |T_IPC,old - T_IPC,new| ≥ ∆T? → Report T_IPC,new
• Area: 66 LUT/FF pairs
• Frequency: 227 MHz
Time-driven: period of time ∆t
• Report T_IPC,new every ∆t
• Area: 80 LUT/FF pairs
• Frequency: 338 MHz

4.2 Mechanisms for control
Central Control Unit (CCU):
• Reception & interpretation of probe packets
• Instructions for Dynamic Frequency Scaling to probes (if necessary)
• Area: 507 LUT/FF pairs
• Frequency: 165 MHz
[Figure: NoC with routers (R), IP cores and the CCU attached to one router]
!!! Not the smartest approach, but suffices to test functionality !!!

4.3 Integration of monitoring
3 approaches with different impact on performance & costs
• Into IP core: Area penalty: / ; Freq. penalty: /
• Router port of IP core: Area penalty: 7.3% ; Freq. penalty: / (but Mux/Demux)
• Extra router port: Area penalty: 30.5% ; Freq. penalty: 8.2%
[Figure: position of the probe (P) relative to router (R) and IP core for the three approaches]

4.4 Impact on system performance
[Figure]

4.5 Performance of monitoring & control
[Figure]

5. Summary
Implementation of 2 approaches for monitoring on-chip temperature + 3 methods for integration into NoC
Investigation of:
• Costs (area, frequency)
• Impact on system performance
• Performance of monitoring & control
Conclusion
• Event-driven approach preferable (situational monitoring, better performance, no redundant traffic, lower area costs)
• Integration into NoC using router port of IP core: best trade-off between costs & preservation of modularity/non-intrusiveness

Thanks for your attention! Any questions?
Contact: [email protected]
Homepage: www.networks-on-chip.com
University of Rostock, Germany
Institute of Applied Microelectronics and Computer Engineering

Arrhenius Model
• Establishes relationship between temperature and failure mechanisms
• Describes dependence of chemical reactions on temperature changes
• Assumption: all other parameters constant
$T_{fail} \propto e^{E_a / (k_b \cdot T)}$
[Figure: T_fail plotted over temperature]
Lifetime of ICs decreases exponentially with temperature

Time Dependent Dielectric Breakdown (TDDB)
Formation of charge traps → Conducting path through gate oxide → Current flow → More charge traps (!!! HEAT !!!)
Inoperability of transistor through gate oxide breakdown (long-term)

Electromigration (EM)
Transport of material in conductors (i.e. wires)
Cause: ion movement induced by current flow (ions' mobility increases with temperature)
Effects:
• Hillocks → short circuits
• Voids → interruption of current paths

Intel Processors
Intel Bloomfield:
• Year: 2008
• 731 million transistors
• 263 mm²
• 2779467 Tr./mm²
Intel 8086:
• Year: 1978
• 29k transistors
• 33 mm²
• 879 Tr./mm²

Synthesis results for monitoring & control

Component               Frequency [MHz]   Area [LUT/FF pairs]
Event-driven probe      227               66
Time-driven probe       338               80
Central Control Unit    165               507

Integration method      Frequency [MHz]   Area [LUT/FF pairs]
Into IP core            122               1901
Using IP core port      119               1896
Extra port              112               2312

Unmodified NoC router: 1771 LUT/FF pairs, 122 MHz
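
To make the Arrhenius relation from Section 3 more tangible, the following sketch compares the expected lifetime at two temperatures. Since the model only states a proportionality, only the ratio of two lifetimes is computed. The function name, the activation energy of 0.7 eV and the two temperatures are illustrative assumptions and are not taken from the slides.

```python
import math

# Boltzmann constant in eV/K (exact value in the 2019 SI definition)
K_B_EV = 8.617333262e-5


def relative_lifetime(temp_k, ref_temp_k, activation_energy_ev):
    """Ratio T_fail(temp_k) / T_fail(ref_temp_k) under the Arrhenius model.

    With T_fail proportional to exp(E_a / (k_b * T)), the unknown
    proportionality constant cancels out in the ratio.
    """
    exponent = (activation_energy_ev / K_B_EV) * (1.0 / temp_k - 1.0 / ref_temp_k)
    return math.exp(exponent)


if __name__ == "__main__":
    # Assumed values for illustration only: E_a = 0.7 eV, 75 °C vs. 85 °C
    ratio = relative_lifetime(358.15, 348.15, 0.7)
    print(f"Lifetime at 85 °C relative to 75 °C: {ratio:.2f}")
```

Under these assumed values a temperature increase of 10 K shortens the expected lifetime to roughly half, which illustrates why the on-chip temperature distribution matters for reliability.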
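
The two probe concepts from Section 4.1 can be contrasted with a small software model. This is only a behavioural sketch, not the hardware implementation from the talk. It assumes that T_IPC,old is the last reported temperature and that the first sample is always reported as a baseline; the function names and the sample trace are hypothetical.

```python
def event_driven_reports(samples, delta_t):
    """Report a sample when it differs from the last reported one by >= delta_t.

    Assumption: T_IPC,old is the last *reported* value, and the very
    first sample is reported to establish a baseline.
    """
    reports = []
    last_reported = None
    for step, temp in enumerate(samples):
        if last_reported is None or abs(last_reported - temp) >= delta_t:
            reports.append((step, temp))
            last_reported = temp
    return reports


def time_driven_reports(samples, period):
    """Report every period-th sample, regardless of its value."""
    return [(step, temp) for step, temp in enumerate(samples) if step % period == 0]


if __name__ == "__main__":
    # Hypothetical temperature trace of one IP core in °C
    trace = [50.0, 50.5, 51.0, 53.0, 53.2, 53.1, 56.0, 56.0]
    print("event-driven:", event_driven_reports(trace, delta_t=2.0))
    print("time-driven: ", time_driven_reports(trace, period=2))
```

On this trace the event-driven probe sends three packets and the time-driven probe four, which matches the point from the summary that the event-driven approach avoids redundant traffic while the temperature is stable.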
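
The slides state only that the CCU interprets probe packets and issues Dynamic Frequency Scaling instructions if necessary; the decision rule itself is not given. The sketch below therefore shows one possible policy, a simple two-threshold rule. The packet fields, all thresholds, the step size and the frequency limits are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbePacket:
    """Hypothetical content of a probe packet."""
    core_id: int
    temperature_c: float


def ccu_decide(packet, current_mhz, hot_c=85.0, cool_c=70.0,
               step_mhz=25, min_mhz=50, max_mhz=200):
    """Return a new core frequency in MHz, or None if no instruction is needed.

    Assumed policy: slow down at or above hot_c, speed up at or below
    cool_c, and leave the frequency unchanged in between.
    """
    if packet.temperature_c >= hot_c and current_mhz > min_mhz:
        return max(min_mhz, current_mhz - step_mhz)
    if packet.temperature_c <= cool_c and current_mhz < max_mhz:
        return min(max_mhz, current_mhz + step_mhz)
    return None


if __name__ == "__main__":
    print(ccu_decide(ProbePacket(core_id=3, temperature_c=90.0), current_mhz=100))
```

The gap between the two thresholds acts as a hysteresis band, so a core whose temperature hovers around a single limit does not receive alternating instructions.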
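
The penalties quoted for the extra router port and the transistor densities of the two Intel processors follow directly from the numbers on the slides. The sketch below reproduces that arithmetic, using the unmodified router (1771 LUT/FF pairs, 122 MHz) as baseline; the function names are illustrative.

```python
def penalty_percent(baseline, value):
    """Relative deviation of value from baseline in percent."""
    return abs(value - baseline) / baseline * 100.0


def density_per_mm2(transistors, area_mm2):
    """Transistor density in transistors per mm²."""
    return transistors / area_mm2


if __name__ == "__main__":
    # Extra router port: 2312 LUT/FF pairs at 112 MHz
    print(f"Area penalty:      {penalty_percent(1771, 2312):.1f} %")
    print(f"Frequency penalty: {penalty_percent(122, 112):.1f} %")
    # Intel 8086 and Intel Bloomfield
    print(f"Intel 8086:        {density_per_mm2(29_000, 33):.0f} Tr./mm²")
    print(f"Intel Bloomfield:  {density_per_mm2(731_000_000, 263):.0f} Tr./mm²")
```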