Transcript SLATE A System-Level Analysis Tool for Early Exploration
© 2006 IBM Corporation
IBM Research
Multi-Core Design Automation Challenges
John Darringer
IBM T. J. Watson Research Center Yorktown Heights, NY, USA
DAC 2007
© 2007 IBM Corporation
System Performance Requires An Integrated Approach
[Figure: device performance versus production date, 1998 to 2008, comparing the historical trend with the recent trend. System-level labels: application languages, software tuning, efficient programming, middleware, dynamic optimization, assist threads, fast computation, power optimization, compiler support]
Scaling no longer provides traditional performance boost
Power limits everything
Advances will come from the entire performance stack
[Figure labels: Chip Level, Technology, Compiler Support, Multiple Cores, SMT, Accelerators, Power Management, Interconnect, Circuits, Packaging, Cooling, New Devices, Dense SRAM, eDRAM, Optics]
Innovation in System Design
[Figure: processor die photos with unit labels FPU, IDU, IFU, BXU, ISU, LSU, L2]
– Power 4: multi-core, 2001
– Power 5: multi-thread, 2004
– CELL: accelerators, 2006
– Power 6: 4.7 GHz, 2007
Trend to Modular Application Optimized Systems
[Figure labels: SMP, Core, Cache, Accelerator, Memory, ..., Blades]
Growing use of diverse modular components
Chip integration may evolve to component assembly
Challenge is in system-level design
– Optimizing architecture for specific applications
Multi-Core ASICs
Multi-core ASIC SoCs are common today
– Address a broad range of markets
– Enable high functional integration
– Provide rapid time to market
One example from 2004
– Cisco Silicon Packet Processor
– 188 32-bit RISC processors
– 47 BIPS
Multi-Core Processors
Power efficient, reusable cores
Application matched accelerators
Flexible, scalable interconnect
Optimized memory hierarchy
High speed I/O
Energy management
Deliver system performance
Rapid chip assembly to serve diverse markets
CHALLENGE
System Design
– Continued performance growth
– Increasing power efficiency
– Optimizing for new applications
Design Automation
– Custom design efficiency
– ASIC productivity
– Design and verification
Enablers
– Physical Architecture
– Integrated Early Analysis
– Multi-Core Verification
Physical Architecture
Complement logical architecture
Streamline chip integration
Plan for interconnect
Provide predictable results
Multiple strategies
– Fixed layout per block
– Parametric or generated
– Extended synthesis
[Figure: example logical architecture and example physical architecture]
Modular Components
Components need a self-contained vertical stack
– With clean interfaces to enable automated integration
Current “Component”: mixed fabric and component function; custom interface
Current chips: custom crafting of clock, data, and power meshes
Future component: component fabric, component function
Future chips: automated connection with parametric fabric
Custom Design
Careful interconnect design
– Communication
– Clock distribution
– Power and ground
Better power efficiency
– Clock gating, power gating
– Detailed transistor sizing
High bandwidth memory and I/O
Higher frequency operation
Challenges of Modular Design
Custom Layout
– Flexible shape and orientation
– Optimum mesh for power and clock
– Distributed communication and test
– Manually optimized
Modular Layout
– Constrained shape and orientation
– Separate power and clock per core
– Parametric interconnect fabric
– Automatic connection to fabric
[Figure: modular layout of eight cores]
Custom Clock Design
Distribution network
– Latches and clocked gates
– Control skew and jitter
– Minimize power
– Survive variation and noise
Interconnect models
– Inductance critical
– Transmission line
– Buffer placement
Hand optimized
– Still an art
Phillip Restle
Custom Power Distribution
Distribute to all devices
Multiple voltage domains
Simulate detailed power demand
Model chip and package
Consider ground coupling
Balance mesh and trees
Allocate decoupling capacitors
Focus on resonant frequency
Explore clock/power gating scenarios
Howard Chen
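The points on decoupling capacitors and resonant frequency can be illustrated with a lumped model. The sketch below assumes the chip and package behave as a single LC tank (package inductance in series with total on-chip capacitance), which is a simplification introduced here and not stated on the slide. The function names and example values are hypothetical.

```python
import math

def resonant_frequency_hz(inductance_h: float, capacitance_f: float) -> float:
    """Resonant frequency of a lumped LC tank: f = 1 / (2*pi*sqrt(L*C))."""
    if inductance_h <= 0 or capacitance_f <= 0:
        raise ValueError("inductance and capacitance must be positive")
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))

def capacitance_for_resonance_f(target_hz: float, inductance_h: float) -> float:
    """Total capacitance that places the LC resonance at target_hz."""
    if target_hz <= 0 or inductance_h <= 0:
        raise ValueError("frequency and inductance must be positive")
    return 1.0 / ((2.0 * math.pi * target_hz) ** 2 * inductance_h)

# Illustrative numbers only: 1 nH of package inductance, 1 nF of on-chip capacitance.
f0 = resonant_frequency_hz(1e-9, 1e-9)
print(f"resonance near {f0 / 1e6:.1f} MHz")
```

Adding decoupling capacitance lowers the resonance in this model, so a planner could use the second function to see how much capacitance would move the resonance away from a frequency where gating events excite the supply.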
Challenges of Modular Design
Custom Wiring
– Optimized over chip
– Resources shared
– Variation minimized
– Complex analysis and integration
Modular Wiring
– Optimized at block level
– Fixed resource allocation
– Some variation in results
– Requires automated integration
Spectrum of Strategies
Fixed Layout … Parametric … Generated
Modular Reuse
– Fixed physical architecture
– Careful block design
– Custom within block
– Automated block connect
– Predictable results
– Good for planned cases
– Stresses design
Extended Synthesis
– Generated physical architecture
– More abstract layout
– Heavy physical synthesis
– Unique block configuration
– Results will vary
– Flexible restructuring
– Stresses tools
Systems Demand Early Analysis
To explore many more options
– Cores, Accelerators, Interconnect, Memory Hierarchy, …
To consider many design criteria simultaneously
– Power, Performance, Latency, Hotspots, Reliability, …
To optimize system for specific market
Environment exists for early functional modeling
But today’s tools are not linked to physical design
Early System Analysis
[Figure labels: Design Team, Design, Assumptions, Technology, Floorplan, Package, Performance Models, Power Analysis, Interconnect Analysis, Thermal Analysis, Implementation]
Loosely coupled disciplines with multiple experts and distinct models
Performance Modeling Is Changing
New parallel workloads emerging
– Execution vs. trace driven
Shifting to multi-core designs
– Stresses balance of model performance and accuracy
Complex interconnect fabric and memory hierarchy
– Bus, switch, network, asynchronous,…
Increasing use of SystemC
– For early software development and component sharing
Early Physical Planning is Essential
Interconnect requires full chip layout
– Estimate component area before implementation
– Need more accurate methods
– Have to plan for all facilities to predict chip size
Placement coupled to many factors
– Interconnect performance
– Power
– Thermal and reliability concerns
– Yield
Modeling Interconnects in Multi-Core Designs
Interconnect delays
– Affect performance
– Depend on placement
– Require accurate modeling
[Figure: four core and cache pairs connected through an interconnect fabric to a memory controller, with interconnect delays marked; async/sync interface with parametric delay]
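To make the dependence of interconnect delay on placement concrete, here is a minimal sketch. It assumes a buffered wire whose delay grows linearly with Manhattan distance and a fixed cycle penalty for an async/sync interface; both assumptions, along with every name and number, are illustrative and not taken from the slides.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Block:
    name: str
    x_mm: float  # x coordinate of the block centre
    y_mm: float  # y coordinate of the block centre

def manhattan_mm(a: Block, b: Block) -> float:
    """Wire length estimate between two block centres."""
    return abs(a.x_mm - b.x_mm) + abs(a.y_mm - b.y_mm)

def link_latency_cycles(a: Block, b: Block, ps_per_mm: float,
                        clock_ps: float, interface_cycles: int = 0) -> int:
    """Latency in clock cycles: wire delay rounded up, plus a parametric
    interface penalty (for example an async/sync crossing)."""
    wire_ps = manhattan_mm(a, b) * ps_per_mm
    return math.ceil(wire_ps / clock_ps) + interface_cycles

# Hypothetical placement: moving the core closer shortens the link.
mem_ctrl = Block("memory_controller", 3.0, 4.0)
far_core = Block("core0", 0.0, 0.0)
near_core = Block("core0", 2.0, 3.0)
print(link_latency_cycles(far_core, mem_ctrl, 100.0, 250.0, interface_cycles=2))
print(link_latency_cycles(near_core, mem_ctrl, 100.0, 250.0, interface_cycles=2))
```

A performance model could take these cycle counts as its link latencies, so that a floorplan change shows up directly in the predicted performance.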
Power is a Key Criterion, but Hard to Predict
Need estimate before implementation
– Voltage/Frequency scaling, Voltage islands, clock gating, leakage
Not just core, but many diverse chip components
– Core, cache, interconnect, controllers, I/O, pervasive
Model “interesting” states and transitions
Scale known implementations
– Complex measurement process for calibration
– Requires data from chip layout
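One way to read "scale known implementations" is to start from measured power for an existing block and rescale it to a new operating point. The sketch below uses the standard relation that dynamic power is proportional to activity times V squared times f. The linear scaling of leakage with voltage is a crude placeholder assumption, and all names and numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class ReferencePower:
    """Measured power of a known implementation at its operating point."""
    dynamic_w: float
    leakage_w: float
    vdd: float
    freq_ghz: float

def scaled_power_w(ref: ReferencePower, vdd: float, freq_ghz: float,
                   activity_ratio: float = 1.0) -> float:
    """Rescale a reference block to a new voltage and frequency.

    activity_ratio models clock gating: 0.0 means fully gated, 1.0 means
    the same switching activity as the reference measurement.
    """
    v_ratio = vdd / ref.vdd
    f_ratio = freq_ghz / ref.freq_ghz
    dynamic = ref.dynamic_w * activity_ratio * v_ratio ** 2 * f_ratio
    leakage = ref.leakage_w * v_ratio  # assumption: leakage linear in voltage
    return dynamic + leakage

def chip_power_w(components: Dict[str, ReferencePower],
                 vdd: float, freq_ghz: float) -> float:
    """Sum over all components, not just the core."""
    return sum(scaled_power_w(ref, vdd, freq_ghz) for ref in components.values())

core = ReferencePower(dynamic_w=8.0, leakage_w=2.0, vdd=1.0, freq_ghz=4.0)
cache = ReferencePower(dynamic_w=2.0, leakage_w=1.0, vdd=1.0, freq_ghz=4.0)
print(chip_power_w({"core": core, "cache": cache}, vdd=0.5, freq_ghz=2.0))
```

A calibrated flow would replace the leakage placeholder with data measured from chip layout, as the slide notes.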
Integrated Early System Analysis
Couple all forms of early analysis
Share data in central repository
Industry standard data model
– OpenAccess
Hand-off to chip integration
– Assumptions, blocks, layout, …
Graphic interface for editing
Stage is set for optimization
[Figure labels: Design Team, Design, Floorplan, Package, Technology, Assumptions, Results, Handoff, Implementation, Performance, Power, Interconnect, Thermal, Optimize]
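The idea of coupling analyses through shared data and then optimizing can be sketched with toy models. In the sketch below a plain dictionary stands in for the central repository, Amdahl's law stands in for the performance model, and a linear per-core cost stands in for the power model. None of these stand-ins come from the slides, and every name is hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, float]  # shared design record that every analysis reads and writes

def performance_analysis(rec: Record) -> None:
    """Toy performance model: Amdahl's law speedup for n cores."""
    p, n = rec["parallel_fraction"], rec["cores"]
    rec["speedup"] = 1.0 / ((1.0 - p) + p / n)

def power_analysis(rec: Record) -> None:
    """Toy power model: fixed uncore power plus a per-core cost."""
    rec["power_w"] = rec["cores"] * rec["watts_per_core"] + rec["uncore_w"]

ANALYSES: List[Callable[[Record], None]] = [performance_analysis, power_analysis]

def evaluate(assumptions: Record, cores: int) -> Record:
    """Run every analysis against one shared copy of the design data."""
    rec = dict(assumptions, cores=cores)
    for analysis in ANALYSES:
        analysis(rec)
    return rec

def best_under_budget(assumptions: Record, core_options: Iterable[int],
                      budget_w: float) -> Optional[Record]:
    """Pick the fastest configuration whose power fits the budget."""
    feasible = [r for r in (evaluate(assumptions, c) for c in core_options)
                if r["power_w"] <= budget_w]
    return max(feasible, key=lambda r: r["speedup"], default=None)

assumptions = {"parallel_fraction": 0.9, "watts_per_core": 10.0, "uncore_w": 20.0}
best = best_under_budget(assumptions, [1, 2, 4, 8, 16], budget_w=100.0)
print(best)
```

Because each analysis only touches the shared record, an interconnect or thermal model could be appended to the list without changing the optimization loop.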
Multi-Core Verification
Verification has always been the greatest challenge
Complexity grows with each generation
Challenge is to exploit reuse with multi-core designs
– Requires clear interface definition
[Figure: traditional approach versus multi-core approach, showing core verification and system verification]
Core Verification
Complexity growing
– Clock/Power gating, Voltage and frequency scaling
Formal methods are used
– Checking RTL = netlist
– Checking assertions
– Proving implementation equivalent to reference model
Simulation still dominates
Need higher level of specification
– Improve quality
– Stretch synthesis and verification tools
Reuse verification environment
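As a small illustration of what "checking RTL = netlist" means, the sketch below compares a behavioural description with a gate-level one by brute force over all inputs. Production equivalence checkers rely on far more scalable techniques. The exhaustive loop, the majority-function example, and all names here are assumptions chosen for brevity.

```python
from itertools import product
from typing import Callable, Optional, Tuple

def find_mismatch(spec: Callable[..., bool], impl: Callable[..., bool],
                  num_inputs: int) -> Optional[Tuple[bool, ...]]:
    """Return the first input vector where spec and impl differ, or None
    if the two functions agree on every input combination."""
    for bits in product((False, True), repeat=num_inputs):
        if spec(*bits) != impl(*bits):
            return bits
    return None

def rtl_majority(a: bool, b: bool, c: bool) -> bool:
    """Behavioural description: true when at least two inputs are true."""
    return (a and b) or (a and c) or (b and c)

def nand(*xs: bool) -> bool:
    return not all(xs)

def netlist_majority(a: bool, b: bool, c: bool) -> bool:
    """Gate-level description built only from NAND gates."""
    return nand(nand(a, b), nand(a, c), nand(b, c))

def buggy_netlist(a: bool, b: bool, c: bool) -> bool:
    """Deliberately wrong implementation, used to show a counterexample."""
    return (a and b) or c

print(find_mismatch(rtl_majority, netlist_majority, 3))
print(find_mismatch(rtl_majority, buggy_netlist, 3))
```

The counterexample returned for the buggy netlist is the kind of result a designer would debug. Since the input space doubles with each added input, this exhaustive approach is only practical for very small blocks.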
System Verification
More complex systems
– Many cores, accelerators, networks, asynchronous links
Memory and network contention is critical area
Formal methods have made impact
– Verifying abstract memory protocols
Simulation is still the final check
Need system-level test case generation
– Use system knowledge to expose resource contention issues
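A sketch of system-level test case generation aimed at contention: the generator below biases every core toward a small set of shared "hot" cache lines so that accesses collide. The address layout, the 128-byte line size, the 8-byte access granularity, and all names are hypothetical choices made for illustration.

```python
import random
from typing import Dict, List, Set, Tuple

Access = Tuple[int, str, int]  # (core id, "load" or "store", byte address)

PRIVATE_STRIDE = 0x10000  # each core gets its own private region
PRIVATE_SIZE = 0x1000

def generate_contention_test(num_cores: int, accesses_per_core: int,
                             hot_lines: int, line_bytes: int = 128,
                             hot_fraction: float = 0.8,
                             seed: int = 0) -> List[Access]:
    """Interleave accesses from all cores, steering most of them to a few
    shared cache lines so that memory contention is likely."""
    if hot_lines < 1 or hot_lines * line_bytes > PRIVATE_STRIDE:
        raise ValueError("hot region must be non-empty and below the private regions")
    rng = random.Random(seed)  # seeded so a failing test can be replayed
    hot_bases = [i * line_bytes for i in range(hot_lines)]
    test: List[Access] = []
    for _step in range(accesses_per_core):
        for core in range(num_cores):
            if rng.random() < hot_fraction:
                addr = rng.choice(hot_bases) + rng.randrange(0, line_bytes, 8)
            else:
                addr = (core + 1) * PRIVATE_STRIDE + rng.randrange(0, PRIVATE_SIZE, 8)
            test.append((core, rng.choice(("load", "store")), addr))
    return test

def contended_lines(test: List[Access], line_bytes: int = 128) -> Set[int]:
    """Cache line indices touched by more than one core."""
    cores_per_line: Dict[int, Set[int]] = {}
    for core, _op, addr in test:
        cores_per_line.setdefault(addr // line_bytes, set()).add(core)
    return {line for line, cores in cores_per_line.items() if len(cores) > 1}

test = generate_contention_test(num_cores=4, accesses_per_core=50, hot_lines=2)
print(len(test), sorted(contended_lines(test)))
```

Raising `hot_fraction` or lowering `hot_lines` concentrates traffic further. A real generator would also use knowledge of the coherence protocol and the interconnect topology.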
Summary
Exciting and challenging times
– Designing application optimized multi-core systems
– Delivering custom efficiency with ASIC productivity
Focus areas
– Physical Architecture to streamline chip integration
– Integrated Early Analysis to explore design space
– Multi-core verification that exploits reuse
Long history of invention in today’s RTL flow
Innovation is needed now at the system level
Acknowledgements
Thanks to the following people
– Emrah Acar, Reinaldo Bergamaschi, Pradip Bose, Howard Chen, Nagu Dhanwada, Steven German, Steve Kosonocky, Indira Nair, Ruchir Puri, Phillip Restle, Albert Ruehli, Michael Vinov.