SLATE A System-Level Analysis Tool for Early Exploration



© 2006 IBM Corporation

IBM Research

Multi-Core Design Automation Challenges

John Darringer

IBM T. J. Watson Research Center Yorktown Heights, NY, USA

DAC 2007

© 2007 IBM Corporation


System Performance Requires An Integrated Approach

[Figure: performance (axis ticks 20, 100, 200) versus production date, 1998 to 2008. Labels: Device Performance; Recent Historical Trend; Application; Languages, Software Tuning; Efficient Programming; Middleware; System Level; Dynamic Optimization; Assist Threads; Fast Computation; Power Optimization; Compiler Support]

Scaling no longer provides traditional performance boost

Power limits everything

Advances will come from entire performance stack

[Figure labels: Chip Level; Technology; Compiler Support; Multiple Cores; SMT; Accelerators; Power Management; Interconnect; Circuits; Packaging, Cooling; New Devices; Dense SRAM, eDRAM; Optics]

Innovation in System Design

[Figure: die photos with floorplan labels FPU, IDU, IFU, BXU, ISU, LSU, L2]

Power 5: Multi-Thread, 2004

Power 4: Multi-Core, 2001

Power 6: 4.7 GHz, 2007

CELL: Accelerators, 2006

Trend to Modular Application Optimized Systems

[Figure labels: SMP; Core; Cache; Accelerator; Memory; ...; Blades]

Growing use of diverse modular components

Chip integration may evolve to component assembly

Challenge is in system-level design

– Optimizing architecture for specific applications

Multi-Core ASICs

Multi-core ASIC SoCs are common today

– Addresses broad range of markets
– Enables high functional integration
– Provides rapid time to market

One example from 2004

– Cisco Silicon Packet Processor
– 188 32-bit RISC processors
– 47 BIPS

Multi-Core Processors

Power efficient, reusable cores

Application matched accelerators

Flexible, scalable interconnect

Optimized memory hierarchy

High speed I/O

Energy management

Deliver system performance

Rapid chip assembly to serve diverse markets


CHALLENGE

System Design

– Continued performance growth
– Increasing power efficiency
– Optimizing for new applications

Design Automation

– Custom design efficiency
– ASIC productivity
– Design and verification

Enablers

– Physical Architecture
– Integrated Early Analysis
– Multi-Core Verification

Physical Architecture

Complement logical architecture

Streamline chip integration

Plan for interconnect

Provide predictable results

Multiple strategies

– Fixed layout per block
– Parametric or generated
– Extended synthesis

[Figures: Example Logical Architecture; Example Physical Architecture]

Modular Components

Components need self-contained vertical stack with clean interfaces to enable automated integration

Current “Component”: Mixed Fabric and Component Function; Custom Interface

Current Chips: Custom crafting of clock, data, and power meshes

Future Component: Component Fabric; Component Function

Future Chips: Automated connection with parametric fabric

Custom Design

Careful interconnect design

– Communication
– Clock distribution
– Power and ground

Better power efficiency

– Clock gating, Power gating
– Detailed transistor sizing

High bandwidth memory and I/O

Higher frequency operation

Challenges of Modular Design

Custom Layout

– Flexible shape and orientation
– Optimum mesh for power and clock
– Distributed communication and test
– Manually optimized

Modular Layout

– Constrained shape and orientation
– Separate power and clock per core
– Parametric interconnect fabric
– Automatic connection to fabric

[Figure: array of eight cores]

Custom Clock Design

Distribution network

– Latches and clocked gates
– Control skew and jitter
– Minimize power
– Survive variation and noise

Interconnect models

– Inductance critical
– Transmission line
– Buffer placement

Hand optimized

– Still an art

Phillip Restle

Custom Power Distribution

Distribute to all devices

Multiple voltage domains

Simulate detailed power demand

Model chip and package

Consider ground coupling

Balance mesh and trees

Allocate decoupling capacitors

Focus on resonant frequency

Explore clock/power gating scenarios

Howard Chen
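The "focus on resonant frequency" point can be illustrated with the standard LC resonance formula, f = 1 / (2π√(LC)), treating the supply path as a lumped inductance in series with the on-chip decoupling capacitance. This is a simplified sketch, not the analysis used in the talk; the function name and the component values in the usage note are hypothetical.

```python
import math


def resonant_frequency_hz(inductance_h: float, capacitance_f: float) -> float:
    """Resonant frequency of an ideal lumped LC network.

    Assumption: the power delivery path is reduced to a single
    inductance (package and supply loop) and a single capacitance
    (on-chip decoupling). Real networks have several resonances.
    """
    if inductance_h <= 0 or capacitance_f <= 0:
        raise ValueError("inductance and capacitance must be positive")
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


# Hypothetical values: 10 pH of loop inductance, 100 nF of decoupling.
base = resonant_frequency_hz(10e-12, 100e-9)

# Quadrupling the decoupling capacitance halves the resonant frequency.
more_decap = resonant_frequency_hz(10e-12, 400e-9)
```

With these illustrative values the resonance falls near 159 MHz, and adding decoupling capacitance moves it lower, which is one reason capacitor allocation and resonance are considered together.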

Challenges of Modular Design

Custom Wiring

– Optimized over chip
– Resources shared
– Variation minimized
– Complex analysis and integration

Modular Wiring

– Optimized at block level
– Fixed resource allocation
– Some variation in results
– Requires automated integration

Spectrum of Strategies

Fixed Layout … Parametric … Generated

Modular Reuse

– Fixed physical architecture
– Careful block design
– Custom within block
– Automated block connect
– Predictable results
– Good for planned cases
– Stresses design

Extended Synthesis

– Generated physical architecture
– More abstract layout
– Heavy physical synthesis
– Unique block configuration
– Results will vary
– Flexible restructuring
– Stresses tools

Systems Demand Early Analysis

To explore many more options

– Cores, Accelerators, Interconnect, Memory Hierarchy, … 

To consider many design criteria simultaneously

– Power, Performance, Latency, Hotspots, Reliability, … 

To optimize system for specific market

Environment exists for early functional modeling

But today’s tools are not linked to physical design


Early System Analysis

[Figure labels: Design Team; Implementation; Design; Assumptions; Technology; Floorplan; Package; Performance Models; Power Analysis; Interconnect Analysis; Thermal Analysis]

Loosely coupled disciplines with multiple experts and distinct models

Performance Modeling Is Changing

New parallel workloads emerging

– Execution vs. trace driven 

Shifting to multi-core designs

– Stresses balance of model performance and accuracy 

Complex interconnect fabric and memory hierarchy

– Bus, switch, network, asynchronous,… 

Increasing use of SystemC

– For early software development and component sharing

Early Physical Planning is Essential


Interconnect requires full chip layout

– Estimate component area before implementation
– Need more accurate methods
– Have to plan for all facilities to predict chip size

Placement coupled to many factors

– Interconnect performance
– Power
– Thermal and reliability concerns
– Yield
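As a rough illustration of predicting chip size from early component area estimates, the sketch below sums per-component areas and divides by an assumed placement utilization. It is an assumption-laden example, not a method described in the talk: the function name, the component list, and the utilization figure are hypothetical.

```python
import math


def estimate_die(component_areas_mm2: dict, utilization: float) -> tuple:
    """Return (die_area_mm2, edge_mm) for a square die.

    Assumptions: every facility that occupies silicon (cores, caches,
    interconnect, I/O, and so on) has an area estimate in the input,
    and `utilization` is the fraction of die area those blocks can fill
    once whitespace and routing channels are accounted for.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    block_area = sum(component_areas_mm2.values())
    die_area = block_area / utilization
    return die_area, math.sqrt(die_area)


# Hypothetical early estimates, in square millimeters.
areas = {"core0": 20.0, "core1": 20.0, "cache": 30.0, "io": 20.0}
die_area, edge = estimate_die(areas, utilization=0.9)
```

Leaving a facility out of the input makes the die look smaller than it will be, which is the point of planning for all facilities before predicting chip size.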


Modeling Interconnects in Multi-Core Designs

Interconnect delays

– Affect performance
– Depend on placement
– Require accurate modeling

[Figure: four Core and Cache pairs and a Memory Controller attached to an Interconnect Fabric; labels: Interconnect Delays; Async/Sync Interface with Parametric delay]
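The dependence of interconnect delay on placement can be sketched in a few lines of Python. This is a minimal illustration, not the model from the talk: the `Block` type, the Manhattan-distance wire estimate, the fixed delay per millimeter, and the `interface_cycles` term standing in for a parametric async/sync interface delay are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A placed component, identified by the center of its outline."""
    name: str
    x_mm: float
    y_mm: float


def wire_length_mm(a: Block, b: Block) -> float:
    """Manhattan distance between block centers (assumed routing model)."""
    return abs(a.x_mm - b.x_mm) + abs(a.y_mm - b.y_mm)


def link_latency_cycles(a: Block, b: Block, delay_ps_per_mm: float,
                        clock_ghz: float, interface_cycles: int = 0) -> int:
    """Whole clock cycles needed to cross the link between two blocks.

    delay_ps_per_mm is a hypothetical, technology-dependent figure for a
    buffered wire; interface_cycles models a parametric interface delay.
    """
    wire_delay_ps = wire_length_mm(a, b) * delay_ps_per_mm
    cycle_ps = 1000.0 / clock_ghz
    return math.ceil(wire_delay_ps / cycle_ps) + interface_cycles


core = Block("core", 1.0, 1.0)
far_cache = Block("cache", 4.0, 5.0)   # 7 mm away
near_cache = Block("cache", 2.0, 1.0)  # 1 mm away

far = link_latency_cycles(core, far_cache, delay_ps_per_mm=100.0, clock_ghz=4.0)
near = link_latency_cycles(core, near_cache, delay_ps_per_mm=100.0, clock_ghz=4.0)
```

With these illustrative numbers the far placement costs 3 cycles and the near placement 1 cycle, so a performance model that ignores the floorplan would misjudge cache latency.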

Power Is a Key Criterion, but Hard to Predict

Need estimate before implementation

– Voltage/Frequency scaling, Voltage islands, clock gating, leakage 

Not just core, but many diverse chip components

– Core, cache, interconnect, controllers, I/O, pervasive 

Model “interesting” states and transitions

Scale known implementations

– Complex measurement process for calibration
– Requires data from chip layout
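Scaling a known implementation can be sketched with the standard relation that dynamic power is proportional to switching activity, supply voltage squared, and frequency. The example below is a simplified illustration rather than the calibration process the slide refers to: the function names and reference figures are hypothetical, and leakage is deliberately held constant even though in practice it varies with voltage and temperature.

```python
def scale_dynamic_power(p_ref_w: float, v_ref: float, f_ref_ghz: float,
                        v_new: float, f_new_ghz: float,
                        activity_ratio: float = 1.0) -> float:
    """Scale measured dynamic power to a new voltage/frequency point.

    Uses P_dyn proportional to activity * V^2 * f. An activity_ratio of 0.0
    can stand in for a fully clock-gated state.
    """
    return (p_ref_w * activity_ratio
            * (v_new / v_ref) ** 2
            * (f_new_ghz / f_ref_ghz))


def estimate_chip_power(components: dict, v_ref: float, f_ref_ghz: float,
                        v_new: float, f_new_ghz: float) -> float:
    """Sum scaled dynamic power plus leakage over all components.

    components maps a name to (dynamic_w, leakage_w) measured at the
    reference point. ASSUMPTION: leakage is carried over unchanged.
    """
    total = 0.0
    for dynamic_w, leakage_w in components.values():
        total += scale_dynamic_power(dynamic_w, v_ref, f_ref_ghz,
                                     v_new, f_new_ghz)
        total += leakage_w
    return total


# Hypothetical reference measurements at 1.0 V and 4.0 GHz.
reference = {"core": (10.0, 2.0), "cache": (4.0, 1.0)}
scaled = estimate_chip_power(reference, 1.0, 4.0, v_new=0.5, f_new_ghz=2.0)
```

Because the estimate covers every entry in `components`, it reflects the slide's point that caches, interconnect, controllers, and I/O need reference data too, not only the core.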

Integrated Early System Analysis

Couple all forms of early analysis

Share data in central repository

Industry standard data model

– Open Access 

Hand-off to chip integration

– Assumptions, blocks, layout, … 

Graphic interface for editing

Stage is set for optimization

[Figure labels: Design Team; Design; Floorplan; Package; Technology; Assumptions; Results; Handoff; Implementation; Performance; Power; Interconnect; Thermal; Optimize]

Multi-Core Verification

Verification has always been the greatest challenge

Complexity grows with each generation

Challenge is to exploit reuse with multi-core designs

– Requires clear interface definition

[Figure: Traditional Approach (Core, Core Verification) versus Multi-Core Approach (four Cores, System Verification)]

Core Verification

Complexity growing

– Clock/Power gating, Voltage and frequency scaling 

Formal methods are used

– Checking RTL = netlist
– Checking assertions
– Proving implementation equivalent to reference model

Simulation still dominates

Need higher level of specification

– Improve quality
– Stretch synthesis and verification tools

Reuse verification environment


System Verification

More complex systems

– Many cores, accelerators, networks, asynchronous links 

Memory and network contention is critical area

Formal methods have made impact

– Verifying abstract memory protocols 

Simulation is still the final check

Need system-level test case generation

– Use system knowledge to expose resource contention issues

Summary

Exciting and challenging times

– Designing application optimized multi-core systems
– Delivering custom efficiency with ASIC productivity

Focus areas

– Physical Architecture to streamline chip integration
– Integrated Early Analysis to explore design space
– Multi-core verification that exploits reuse

Long history of invention in today’s RTL flow

Innovation is needed now at the system level


Acknowledgements

Thanks to the following people

– Emrah Acar, Reinaldo Bergamaschi, Pradip Bose, Howard Chen, Nagu Dhanwada, Steven German, Steve Kosonocky, Indira Nair, Ruchir Puri, Phillip Restle, Albert Ruehli, Michael Vinov.