An Integrated Debugging Environment for Reprogrammable Hardware Systems
Kevin Camera
Hayden So
Bob Brodersen
Berkeley Wireless Research Center
University of California, Berkeley
AADEBUG 2005
Outline

Motivation

Existing platform

Existing design/verification flow

Proposed solution

Environment features

Walkthrough

Implementation strategy
Application Domain

• Direct-mapped, reprogrammable hardware systems
• FPGA-based signal processing and supercomputing arrays
FPGA Computing Benefits

Better power, computation, and cost efficiency than any processor-based solution, due to direct mapping of algorithms.

                               XC2VP70-7    C6415T-1G
Computation Rate (Gop/s)       72           4
Power Efficiency (Gop/s/W)     2.72         1.84
Price/Performance (Mop/s/$)    31.0         14.81

Source: Chang, Wawrzynek, Brodersen; ISCA ’05
BEE2: 2nd Berkeley Emulation Engine

• (5) Xilinx V2P100 per board
    • ~100K logic cells
    • 2 PowerPC405 cores
    • 444 dedicated multipliers
    • 1MB on-chip SRAM
    • 3.125Gb/s duplex links
• (4) DDR2 banks per FPGA
    • 72 bits per bank with ECC
    • Up to 12.8 GB/s (DDR400) or 17 GB/s (DDR533) bandwidth
    • Up to 4GB capacity
BEE Design Flow
[Flow diagram boxes: Design; Netlist; Place and Route; Verify; Hardware]

• Design entry is in the Matlab/Simulink environment
    • Graphical, library based; also allows custom HDL
• Typical FPGA path to physical implementation
    • HDL synthesis and place and route
    • Hierarchy is flattened in each pass (non-modular flow)
Design Verification Methods

• High-level functional simulation
• HDL/RTL simulation
• Native FPGA execution

[Diagram arrow label: Complexity, Accuracy]
High-level Functional Simulation




• Design execution in Matlab/Simulink
• Intended to be correct by construction
• Fastest software-based simulation
• Powerful and convenient algorithm exploration
Drawbacks of High-level Simulation

• Even with a high level of abstraction, vastly slower than hardware
• Doesn’t cover any side effects or requirements of the back-end tool chain
• Trend is worsening with increased FPGA capacity

[Bar chart: execution time in seconds. HW: 2E-06; SW: 38.8]
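As a rough check on the gap between the two execution times on this slide, the ratio can be computed directly. This is only arithmetic on the two values shown, read here as 38.8 s for software simulation and 2E-06 s for hardware; the variable names are illustrative.

```python
# Execution times as read from the slide's chart (seconds).
sw_seconds = 38.8   # high-level software simulation
hw_seconds = 2e-6   # native hardware execution

# How many times faster the hardware run is than the simulation.
speedup = sw_seconds / hw_seconds
print(f"Hardware is roughly {speedup:.2e} times faster")
```

The ratio comes out on the order of 10^7, which is the scale of slowdown the slide is pointing at.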
HDL/RTL Simulation




• Varying levels of accuracy
• Access to arbitrary internal signals
• But simulation speed is even slower
• Parameterization/iteration is much harder
Native FPGA Execution

• Runs at full speed of hardware
• Three tools for on-FPGA testing:
    • Xilinx ChipScope Pro
    • System Generator HW-in-the-loop
    • Good old-fashioned signal probing
Xilinx ChipScope Pro
• Inserts BRAM cores into design and binds to JTAG
• Captures selected signals and provides trigger conditions
• Signals of interest must be chosen in advance
• Captured state is limited by available BRAM
• Any changes require tool flow re-iteration
System Generator HW-in-the-loop



• Allows hardware itself to accept and process data from Simulink via JTAG
• Arbitrary number of data elements can be accessed as “ports”
• Very powerful tool, but features limited process control
Hands-on Hardware Debugging


• Most accurate method for finding timing-related bugs in a “production” system
• Tradeoffs are all too well-known:
    • Complex equipment
    • Limited probing pins
    • A priori signal output
    • Limited input options
Drawback of On-FPGA Execution
• Place and route time is a major bottleneck
• Complete run is needed for every design change
• Increasingly problematic due to larger FPGA capacity

[Bar chart: tool run time in minutes, by design]

Design             Synthesis (min)    Place and Route (min)
PFB (3805)         0.28333            3.85
PFB x4 (15,301)    1.26667            35.4
PFB x8 (30,601)    3.4                90.46667
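To make the bottleneck concrete, the share of each run spent in place and route can be worked out from the chart values. The pairing of each number with a design and a tool stage is an assumption based on the order in which the values appear on the slide; the helper name `par_share` is illustrative.

```python
# (synthesis_min, place_and_route_min) per design, as read from the slide.
# Assumption: values are paired in the order they appear in the chart data.
runs = {
    "PFB": (0.28333, 3.85),
    "PFB x4": (1.26667, 35.4),
    "PFB x8": (3.4, 90.46667),
}

def par_share(synth_min, par_min):
    """Fraction of the total tool run spent in place and route."""
    return par_min / (synth_min + par_min)

for name, (synth, par) in runs.items():
    total = synth + par
    print(f"{name}: total {total:.1f} min, place and route {par_share(synth, par):.0%}")
```

Under that reading, place and route accounts for more than 90% of every run, and the largest design takes over an hour and a half per iteration.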
Proposed Solution

• Enable extensive debugging and design exploration functionality directly on the hardware platform
    • Vastly superior execution time for today’s large-scale computing challenges
• Exploit the spatial resources of the hardware to assist in debugging
    • Essentially a -g switch to the hardware design flow
• Minimize or eliminate iterations through implementation flow
Caveats

• Final timing of design will not be preserved
    • Critical path will definitely be increased, but 10^6 is a lot of headroom
    • Timing-driven implementation still needed once verification is complete
• Significantly more FPGA capacity and memory will be needed
    • Acceptable for scalable BEE-like platforms and for modular, tiled algorithms
Essential Features of Environment
1. Robustly parameterized library components with soft configuration
    • Design exploration without tool iterations
2. Readily accessible variable contents
    • Reading and writing of any values by user
3. Complete user-driven control over process execution
    • Single-step, bursts, breakpoints, assertions
1: Parameterized Library
[Figure labels: Number of bits; Saturate / Wrap; Binary point position; Microarchitecture]

• Library components provide configuration parameters as inputs, which can be set by variables
• Allows runtime modification of function properties, including precision, range, and latency
    • Enables design-space exploration at hardware speed, plus correction of configuration errors without reimplementation
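A minimal software sketch of what "configuration parameters as inputs" could mean for a fixed-point block: the word length, binary point position, and overflow mode are ordinary arguments that can change from call to call instead of being fixed at build time. The function name and signature are hypothetical, not the library's interface, and the microarchitecture and latency parameters are not modelled.

```python
def quantize(value, n_bits, binary_point, saturate=True):
    """Quantize `value` to a signed two's-complement fixed-point word.

    n_bits       -- total word length in bits
    binary_point -- number of fractional bits
    saturate     -- clamp on overflow if True, wrap around if False
    """
    scale = 1 << binary_point
    raw = int(round(value * scale))            # integer code before overflow handling
    lo = -(1 << (n_bits - 1))                  # most negative representable code
    hi = (1 << (n_bits - 1)) - 1               # most positive representable code
    if saturate:
        raw = max(lo, min(hi, raw))
    else:
        raw = (raw - lo) % (1 << n_bits) + lo  # two's-complement wrap
    return raw / scale

# Same input, two overflow settings, no rebuild in between.
print(quantize(10.0, n_bits=8, binary_point=4, saturate=True))   # 7.9375
print(quantize(10.0, n_bits=8, binary_point=4, saturate=False))  # -6.0
```

Changing `n_bits` or `binary_point` between calls plays the role of a runtime precision change; in the proposed environment the equivalent change would happen on the hardware without another pass through the tool flow.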
2: Data Management

• Ability to dynamically observe any variable’s value at the user’s request
• Ability to overwrite a variable’s value at runtime and continue operation
• Ability to rewind system state within the bounds of buffer capacity
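The three abilities above can be illustrated with a small software model of a per-variable history buffer. This is a sketch under stated assumptions, not the hardware design: the class name `VariableHistory` and its methods are hypothetical, and a fixed-size deque stands in for the real buffer.

```python
from collections import deque

class VariableHistory:
    """Bounded history of one variable; the oldest samples drop out when full."""

    def __init__(self, capacity):
        self.samples = deque(maxlen=capacity)

    def record(self, value):
        """Store the value produced in the current cycle."""
        self.samples.append(value)

    def view(self):
        """Observe the most recent value."""
        return self.samples[-1]

    def overwrite(self, value):
        """Replace the most recent value; execution would continue from it."""
        self.samples[-1] = value

    def rewind(self, n_cycles):
        """Step back n cycles, but only within what is still buffered."""
        if n_cycles >= len(self.samples):
            raise ValueError("rewind exceeds buffered history")
        for _ in range(n_cycles):
            self.samples.pop()
        return self.samples[-1]

history = VariableHistory(capacity=4)
for value in range(6):        # six cycles, only the last four are kept
    history.record(value)
print(history.view())         # 5
print(history.rewind(2))      # 3
```

The `ValueError` on an oversized rewind is the software counterpart of "within the bounds of buffer capacity".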
2: Data Management Requirements

• Too expensive to re-implement the hardware to expose new data
    • All variables are streamed into local and off-chip storage, such as DRAM and disks
• Unlike software, hardware is highly parallel, and often deeply pipelined
    • Memory requirements could be extreme
    • Can be offset by hierarchical memory architecture and/or periodic sampling
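A back-of-the-envelope estimate shows why streaming every variable is demanding and how periodic sampling helps. The 12.8 GB/s DRAM bandwidth and the roughly 10 MHz user clock are figures quoted elsewhere in this talk; the 32-bit variable width is an assumption made only for illustration, and the estimate ignores ECC, protocol overhead, and the gap between peak and sustained bandwidth.

```python
def max_streamed_variables(bandwidth_bytes_per_s, clock_hz, bits_per_variable, sample_every=1):
    """Upper bound on variables that can be streamed within a storage bandwidth."""
    bytes_per_s_per_variable = clock_hz * bits_per_variable / 8 / sample_every
    return int(bandwidth_bytes_per_s // bytes_per_s_per_variable)

DDR400_BANDWIDTH = 12.8e9   # bytes/s, peak figure from the BEE2 slide
USER_CLOCK_HZ = 10e6        # upper end of the ~1-10MHz user clock domain
WIDTH_BITS = 32             # assumed variable width (not from the talk)

every_cycle = max_streamed_variables(DDR400_BANDWIDTH, USER_CLOCK_HZ, WIDTH_BITS)
every_8th = max_streamed_variables(DDR400_BANDWIDTH, USER_CLOCK_HZ, WIDTH_BITS, sample_every=8)
print(every_cycle)  # 320
print(every_8th)    # 2560
```

Under these assumptions only a few hundred variables can be captured on every cycle, while sampling one cycle in eight raises the bound proportionally, at the cost of gaps in the recorded history.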
3: Process Control

• Inherit the most useful features of software debuggers like GDB
    • Cycle-by-cycle (single-step) execution
    • Breakpoints (either state-dependent or at a fixed cycle count)
• Implemented using multiple clock domains and clock buffer control
    • Already available for use on BEE2
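The process-control features above can be modelled in software as a controller that decides, cycle by cycle, whether the design's clock is allowed to advance. This is a behavioural sketch only: the class `ClockController`, its methods, and the `step_fn` callback are hypothetical names, and the real mechanism is clock buffer control across clock domains, not a Python loop.

```python
class ClockController:
    """Software model of a gated design clock with two kinds of breakpoints."""

    def __init__(self, step_fn):
        self.step_fn = step_fn           # advances the design one cycle, returns its state
        self.cycle = 0
        self.state = None
        self.cycle_breakpoints = set()   # stop when the cycle count reaches one of these
        self.state_breakpoints = []      # stop when any predicate on the state is true

    def single_step(self):
        """Enable the clock for exactly one cycle."""
        self.state = self.step_fn(self.cycle)
        self.cycle += 1
        return self.state

    def _breakpoint_hit(self):
        return (self.cycle in self.cycle_breakpoints
                or any(pred(self.state) for pred in self.state_breakpoints))

    def run_for(self, n_cycles):
        """Run a burst of up to n cycles, stopping early at a breakpoint."""
        for _ in range(n_cycles):
            self.single_step()
            if self._breakpoint_hit():
                break
        return self.cycle

    def cont(self, max_cycles=1_000_000):
        """Run until the next breakpoint (bounded so the model always returns)."""
        return self.run_for(max_cycles)

ctl = ClockController(step_fn=lambda cycle: cycle * 2)   # toy design: state = 2 * cycle
ctl.cycle_breakpoints.add(5)
print(ctl.run_for(100))                                   # 5
ctl.state_breakpoints.append(lambda state: state >= 20)
print(ctl.cont())                                         # 11
```

The first run stops at the fixed cycle count; the second stops when the state-dependent condition becomes true.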
Walkthrough: Design
• Use specialized libraries to provide soft configuration
• Integrates directly into the existing BEE2 tool flow
Walkthrough: Tagging

• User tags signals of interest with debugging testpoints
    • Defines a variable name
    • Defines other parameters of interest for data observation
• Also includes breakpoints and assertions
Walkthrough: Stitching

• “Stitcher” updates the design before entering back-end tool flow
• Inserts logic as needed for debug functions
    • Instantiates PowerPC core and master controller
    • Adds underlying connections to route data
Walkthrough: Runtime

• User can monitor variables and control process execution from remote client
• Embedded PowerPC software provides a thin service layer
    • Client is fully integrated with Matlab and Simulink input description
Control Architecture on BEE2
[Block diagram labels: Control FPGA; Network; PPC; Single-step Clock Buffer Logic; clock domains: 100MHz and User Defined (~1-10MHz); Breakpoint interrupt; Control; DRAM; User FPGA; Inserted Logic; User Design]
Stitching

• Stitcher traverses the design hierarchy and:
    • Replaces debugging component placeholders with necessary logic
    • Creates a simple route from all variables to off-chip storage devices
• During execution, the stitcher records:
    • A mapping between variable names and their physical variable unit in hardware
    • The latency within the variable routing network
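The traversal described above can be sketched as a recursive walk over a nested design description. Everything here is a hypothetical stand-in: the dictionary layout, the `testpoint` and `vcu` type names, and especially the latency model, which assumes one cycle of routing delay per hierarchy level purely so that the sketch has something to record.

```python
def stitch(block, path="", depth=0, mapping=None):
    """Replace testpoint placeholders in place and record where each variable lives.

    Returns {variable name: {"unit": id, "path": hierarchical name, "latency": cycles}}.
    """
    if mapping is None:
        mapping = {}
    name = f"{path}/{block['name']}"
    if block.get("type") == "testpoint":
        unit_id = len(mapping)            # next free variable unit
        block["type"] = "vcu"             # placeholder becomes real debug logic
        block["unit_id"] = unit_id
        mapping[block["variable"]] = {
            "unit": unit_id,
            "path": name,
            "latency": depth,             # assumed: one cycle per hierarchy level
        }
    for child in block.get("children", []):
        stitch(child, name, depth + 1, mapping)
    return mapping

design = {"name": "top", "children": [
    {"name": "fir", "children": [
        {"name": "tp0", "type": "testpoint", "variable": "acc"}]},
    {"name": "tp1", "type": "testpoint", "variable": "out"},
]}
for variable, info in stitch(design).items():
    print(variable, info)
```

The returned mapping is the kind of record a runtime client would need in order to turn a variable name into a physical unit and to align samples that arrive with different routing delays.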
Variable Control Unit (VCU)



• Inserted in place of each variable block in design
• Automatically implied for every state variable in a state machine
• Combination of local buffers and off-chip DRAM
    • Exact memory allocation is subject to experimentation
Debug Controller (DC)

• Interface between all variable and assertion instances, the runtime user shell, and process control “services”
• Regulates the system clock both for exceptions and to prevent variable storage overflows
Runtime Shell Examples
load      Initialize or reset a design
halt      Stop the design as soon as possible
runfor    Run the design for a number of cycles
cont      Run the design until the next exception
break     View, enable, or change a breakpoint
view      View a variable’s value or history
set       Override a variable’s value or source
rewind    Rewind the system state by n cycles
Future Work

• Complete infrastructure for BEE2
• Extensive experiments with variable memory
    • Efficient methods for variable routing
    • Storage requirements and hierarchy
    • Time/space tradeoffs for periodic sampling
• Generalize framework to define concepts such as variable priorities, multiple debug levels, and extensions to text-based languages
Questions?