An Integrated Debugging Environment for Reprogrammable Hardware Systems
Kevin Camera
Hayden So
Bob Brodersen
Berkeley Wireless Research Center
University of California, Berkeley
AADEBUG 2005
Outline
Motivation
Existing platform
Existing design/verification flow
Proposed solution
Environment features
Walkthrough
Implementation strategy
Application Domain
Direct-mapped, reprogrammable hardware systems
FPGA-based signal processing and supercomputing arrays
FPGA Computing Benefits
Better power, computation, and cost efficiency than any processor-based solution, due to direct mapping of algorithms
                              XC2VP70-7   C6415T-1G
Computation Rate (Gop/s)      72          4
Power Efficiency (Gop/s/W)    2.72        1.84
Price/Performance (Mop/s/$)   31.0        14.81
Chang, Wawrzynek, Brodersen; ISCA ’05
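As a quick check on the comparison table, the ratios between the two columns can be worked out directly. This is only arithmetic on the figures quoted on the slide; the names `fpga` and `dsp` are labels chosen here, assuming the first column is the FPGA and the second the processor-based part.

```python
# Figures copied from the slide: (XC2VP70-7, C6415T-1G).
metrics = {
    "Computation Rate (Gop/s)": (72, 4),
    "Power Efficiency (Gop/s/W)": (2.72, 1.84),
    "Price/Performance (Mop/s/$)": (31.0, 14.81),
}

# Ratio of the first column to the second, rounded to two decimals.
ratios = {name: round(fpga / dsp, 2) for name, (fpga, dsp) in metrics.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}x")
```

On these numbers the advantage is largest in raw computation rate (18x) and smaller in power efficiency (about 1.5x) and price/performance (about 2.1x).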
BEE2: 2nd Berkeley Emulation Engine
(5) Xilinx V2P100 per board
~100K logic cells
2 PowerPC405 cores
444 dedicated multipliers
1MB on-chip SRAM
3.125Gb/s duplex links
(4) DDR2 banks per FPGA
72 bits per bank with ECC
Up to 12.8 GB/s (DDR400) or 17 GB/s (DDR533) bandwidth
Up to 4GB capacity
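The bandwidth figures can be reproduced with a short calculation. This sketch assumes that 64 of the 72 bits per bank carry data (the other 8 being ECC), that all four banks transfer in parallel, and that "DDR400" and "DDR533" mean 400 and 533 million transfers per second; none of that is spelled out on the slide, and the function name is made up for illustration.

```python
def peak_bandwidth_gb_s(banks, data_bits_per_bank, megatransfers_per_s):
    """Peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = data_bits_per_bank / 8
    return banks * bytes_per_transfer * megatransfers_per_s * 1e6 / 1e9

ddr400 = peak_bandwidth_gb_s(banks=4, data_bits_per_bank=64, megatransfers_per_s=400)
ddr533 = peak_bandwidth_gb_s(banks=4, data_bits_per_bank=64, megatransfers_per_s=533)
print(ddr400, ddr533)
```

Under these assumptions the results come out at 12.8 GB/s and roughly 17.1 GB/s, which is consistent with the numbers quoted per FPGA.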
BEE Design Flow
[Flow diagram; stage labels: Design, Netlist, Place and Route, Verify, Hardware]
Design entry is in the Matlab/Simulink environment
Graphical, library based; also allows custom HDL
Typical FPGA path to physical implementation
HDL synthesis and place and route
Hierarchy is flattened in each pass (non-modular flow)
Design Verification Methods
High-level functional simulation
HDL/RTL simulation
Native FPGA execution
[Diagram label beside the list: Complexity, Accuracy]
High-level Functional Simulation
Design execution in Matlab/Simulink
Intended to be correct by construction
Fastest software-based simulation
Powerful and convenient algorithm exploration
Drawbacks of High-level Simulation
Even with high level of abstraction, vastly slower than hardware
Doesn’t cover any side effects or requirements of the backend tool chain
Trend is worsening with increased FPGA capacity
[Bar chart: execution time in seconds. HW: 2E-06, SW: 38.8]
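The gap between the two bars on this slide is easier to appreciate as a ratio. The snippet below simply divides the two chart values (38.8 s and 2E-06 s), assuming the larger one is the software simulation and the smaller one the hardware run.

```python
sw_seconds = 38.8   # software (Simulink) run time read from the chart
hw_seconds = 2e-6   # hardware run time read from the chart

speedup = sw_seconds / hw_seconds
print(f"hardware is about {speedup:.2e}x faster")
```

That works out to roughly 1.9 x 10^7, or about seven orders of magnitude.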
HDL/RTL Simulation
Varying levels of accuracy
Access to arbitrary internal signals
But, simulation speed is even slower
Parameterization/Iteration is much harder
Native FPGA Execution
Runs at full speed of hardware
Three tools for on-FPGA testing:
Xilinx ChipScope Pro
System Generator HW-in-the-loop
Good old-fashioned signal probing
Xilinx ChipScope Pro
Inserts BRAM cores into design and binds to JTAG
Captures selected signals and provides trigger conditions
Signals of interest must be chosen in advance
Captured state is limited by available BRAM
Any changes require tool flow re-iteration
System Generator HW-in-the-loop
Allows hardware itself to accept and process data from Simulink via JTAG
Arbitrary number of data elements can be accessed as “ports”
Very powerful tool, but features limited process control
Hands-on Hardware Debugging
Most accurate method for finding timing-related bugs in a “production” system
Tradeoffs are all too well-known:
Complex equipment
Limited probing pins
A priori signal output
Limited input options
Drawback of On-FPGA Execution
Place and route time is a major bottleneck
Complete run is needed for every design change
Increasingly problematic due to larger FPGA capacity
[Bar chart: tool run time in minutes]
Design            Synthesis   Place and Route
PFB (3805)        0.28333     3.85
PFB x4 (15,301)   1.26667     35.4
PFB x8 (30,601)   3.4         90.46667
Proposed Solution
Enable extensive debugging and design exploration functionality directly on the hardware platform
Vastly superior execution time for today’s large-scale computing challenges
Exploit the spatial resources of the hardware to assist in debugging
Essentially a -g switch to the hardware design flow
Minimize or eliminate iterations through implementation flow
Caveats
Final timing of design will not be preserved
Critical path will definitely be increased, but 10^6 is a lot of headroom
Timing-driven implementation still needed once verification is complete
Significantly more FPGA capacity and memory will be needed
Acceptable for scalable BEE-like platforms and for modular, tiled algorithms
Essential Features of Environment
1. Robustly parameterized library components with soft configuration
Design exploration without tool iterations
2. Readily accessible variable contents
Reading and writing of any values by user
3. Complete user-driven control over process execution
Single-step, bursts, breakpoints, assertions
1: Parameterized Library
• Number of bits
• Saturate / Wrap
• Binary point position
• Microarchitecture
Library components provide configuration parameters as inputs, which can be set by variables
Allows runtime modification of function properties, including precision, range, and latency
Enables design-space exploration at hardware speed, plus correction of configuration errors without reimplementation
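To make "soft configuration" concrete, here is a minimal software sketch of a quantizer whose number of bits, binary point position, and saturate/wrap behavior are ordinary runtime inputs rather than fixed at build time. It is an illustration only, not the BEE2 library: the function name, the signed two's-complement format, and the rounding mode (Python's built-in `round`) are all assumptions, and the microarchitecture/latency parameter is not modeled.

```python
def quantize(value, n_bits, binary_point, saturate=True):
    """Quantize a real value to signed fixed point with runtime-set parameters.

    n_bits       -- total word length, including the sign bit
    binary_point -- number of fractional bits
    saturate     -- clamp on overflow if True, wrap (two's complement) if False
    """
    scale = 1 << binary_point
    raw = int(round(value * scale))
    lo = -(1 << (n_bits - 1))
    hi = (1 << (n_bits - 1)) - 1
    if saturate:
        raw = max(lo, min(hi, raw))
    else:
        raw = (raw - lo) % (1 << n_bits) + lo
    return raw / scale

# The same "component" reconfigured at run time, with no rebuild.
config = {"n_bits": 8, "binary_point": 4, "saturate": True}
print(quantize(10.0, **config))   # 7.9375 (clamped to the largest 8-bit value)
config["saturate"] = False
print(quantize(10.0, **config))   # -6.0 (wrapped around)
```

In hardware the analogous idea is that these settings arrive on input ports, so changing them means writing a register rather than re-running synthesis and place and route.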
2: Data Management
Ability to dynamically observe any variable’s value at the user’s request
Ability to overwrite a variable’s value at runtime and continue operation
Ability to rewind system state within the bounds of buffer capacity
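The three abilities listed here (observe, overwrite, rewind within buffer capacity) can be modeled in software with a bounded history buffer. The sketch below is a hypothetical stand-in, not the hardware design: the class name and methods are invented for illustration, and a real implementation would spread the history across local buffers and off-chip memory.

```python
from collections import deque

class VariableHistory:
    """Bounded per-variable history; the oldest samples are discarded first."""

    def __init__(self, capacity):
        self._samples = deque(maxlen=capacity)

    def record(self, cycle, value):
        self._samples.append((cycle, value))

    def current(self):
        """Observe the most recent value."""
        return self._samples[-1][1]

    def overwrite(self, value):
        """Replace the most recent value and keep its cycle number."""
        cycle, _ = self._samples[-1]
        self._samples[-1] = (cycle, value)

    def rewind(self, n_cycles):
        """Drop the newest n_cycles samples; fail if the buffer is too short."""
        if n_cycles >= len(self._samples):
            raise ValueError("rewind exceeds buffered history")
        for _ in range(n_cycles):
            self._samples.pop()
        return self._samples[-1]

history = VariableHistory(capacity=4)
for cycle in range(6):
    history.record(cycle, cycle * 10)
print(history.current())   # 50
print(history.rewind(2))   # (3, 30)
```

Because only four samples are kept, a rewind past cycle 2 is refused, which mirrors the "within the bounds of buffer capacity" limit.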
2: Data Management Requirements
Too expensive to re-implement the hardware to expose new data
All variables are streamed into local and off-chip storage, such as DRAM and disks
Unlike software, hardware is highly parallel, and often deeply pipelined
Memory requirements could be extreme
Can be offset by hierarchical memory architecture and/or periodic sampling
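A back-of-the-envelope estimate shows why memory requirements "could be extreme" and how periodic sampling helps. The design size used here (1000 variables of 16 bits at a 10 MHz user clock) is purely hypothetical; the 4 GB capacity is the per-FPGA figure quoted earlier in the deck, and the function names are made up for this sketch.

```python
def trace_rate_bytes_per_s(n_variables, bits_per_variable, clock_hz, sample_period=1):
    """Bytes per second produced when every variable is logged each sample_period cycles."""
    bytes_per_sample = n_variables * bits_per_variable / 8
    return bytes_per_sample * clock_hz / sample_period

def history_seconds(capacity_bytes, rate_bytes_per_s):
    """How long a store of the given capacity lasts at the given logging rate."""
    return capacity_bytes / rate_bytes_per_s

full = trace_rate_bytes_per_s(1000, 16, 10e6)                       # log every cycle
sampled = trace_rate_bytes_per_s(1000, 16, 10e6, sample_period=16)  # log every 16th cycle

print(f"every cycle:      {full / 1e9:.2f} GB/s")
print(f"every 16th cycle: {sampled / 1e9:.2f} GB/s")
print(f"4 GB lasts {history_seconds(4e9, sampled):.1f} s when sampled")
```

With these made-up numbers, logging every cycle needs 20 GB/s, more than the 12.8 to 17 GB/s DRAM bandwidth listed for a BEE2 FPGA, while sampling every 16th cycle brings it down to 1.25 GB/s.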
3: Process Control
Inherit the most useful features of software debuggers like GDB
Cycle-by-cycle (single-step) execution
Breakpoints (either state dependent, or fixed cycle count)
Implemented using multiple clock domains and clock buffer control
Already available for use on BEE2
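The process-control features can be pictured with a small software model in which each call to a step function stands for one enabled clock edge. This is a sketch under stated assumptions, not the BEE2 mechanism: the class and attribute names are invented, and the real system gates a hardware clock buffer rather than calling a function.

```python
class GatedDesign:
    """Software stand-in for a user design whose clock can be gated."""

    def __init__(self, step_fn, state):
        self.step_fn = step_fn          # computes the next state from the current one
        self.state = state
        self.cycle = 0
        self.cycle_breakpoints = set()  # fixed cycle counts
        self.state_breakpoints = []     # predicates on the state

    def single_step(self):
        """Enable the clock for exactly one cycle."""
        self.state = self.step_fn(self.state)
        self.cycle += 1

    def _breakpoint_hit(self):
        return (self.cycle in self.cycle_breakpoints
                or any(pred(self.state) for pred in self.state_breakpoints))

    def run(self, max_cycles):
        """Run until a breakpoint fires or max_cycles have elapsed."""
        for _ in range(max_cycles):
            self.single_step()
            if self._breakpoint_hit():
                return "breakpoint"
        return "done"

design = GatedDesign(step_fn=lambda s: s + 3, state=0)
design.cycle_breakpoints.add(5)
print(design.run(100), design.cycle, design.state)   # breakpoint 5 15
design.state_breakpoints.append(lambda s: s >= 30)
print(design.run(100), design.cycle, design.state)   # breakpoint 10 30
```

The first run stops on the fixed cycle count, the second on the state-dependent condition, matching the two breakpoint kinds named on the slide.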
Walkthrough: Design
Use specialized libraries to provide soft configuration
Integrates directly into the existing BEE2 tool flow
Walkthrough: Tagging
User tags signals of interest with debugging testpoints
Defines a variable name
Defines other parameters of interest for data observation
Also includes breakpoints and assertions
Walkthrough: Stitching
“Stitcher” updates the design before entering back-end tool flow
Inserts logic as needed for debug functions
Instantiates PowerPC core and master controller
Adds underlying connections to route data
Walkthrough: Runtime
User can monitor variables and control process execution from remote client
Embedded PowerPC software provides a thin service layer
Client is fully integrated with Matlab and Simulink input description
Control Architecture on BEE2
[Block diagram of the Control FPGA and a User FPGA. Labels: Network, PPC, Single-step, Clock Buffer, Logic, clock domains (100MHz; user defined, ~1-10MHz), Breakpoint interrupt, Control, DRAM, Inserted Logic, User Design]
Stitching
Stitcher traverses the design hierarchy and:
Replaces debugging component placeholders with necessary logic
Creates a simple route from all variables to off-chip storage devices
During execution, the stitcher records:
A mapping between variable names and their physical variable unit in hardware
The latency within the variable routing network
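The stitching pass described above can be sketched as a recursive walk over a nested design description. Everything concrete here is an assumption made for illustration: the dictionary layout, the `testpoint` and `vcu` type strings, and especially the latency model, which simply charges one cycle per hierarchy level between a variable and the top of the design.

```python
def stitch(block, path="", depth=0, table=None):
    """Replace testpoint placeholders in place and return a lookup table.

    The table maps each variable name to its unit id, hierarchical path,
    and an assumed routing latency in cycles.
    """
    if table is None:
        table = {}
    for name, child in block.get("children", {}).items():
        child_path = f"{path}/{name}"
        if child.get("type") == "testpoint":
            unit_id = len(table)
            child["type"] = "vcu"          # placeholder replaced by debug logic
            child["unit_id"] = unit_id
            table[child["variable"]] = {
                "unit_id": unit_id,
                "path": child_path,
                "latency": depth + 1,      # assumed: one cycle per level
            }
        else:
            stitch(child, child_path, depth + 1, table)
    return table

design = {
    "children": {
        "fir": {"children": {"acc_tp": {"type": "testpoint", "variable": "acc"}}},
        "out_tp": {"type": "testpoint", "variable": "y"},
    }
}
table = stitch(design)
print(table["acc"])   # {'unit_id': 0, 'path': '/fir/acc_tp', 'latency': 2}
```

A runtime client could then use such a table to resolve a variable name to a hardware unit and to align samples that arrive with different routing delays.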
Variable Control Unit (VCU)
Inserted in place of each variable block in design
Automatically implied for every state variable in a state machine
Combination of local buffers and off-chip DRAM
Exact memory allocation is subject to experimentation
Debug Controller (DC)
Interface between all variable and assertion instances, the runtime user shell, and process control “services”
Regulates the system clock both for exceptions and to prevent variable storage overflows
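The idea of regulating the clock to prevent storage overflow can be illustrated with a toy model: the design clock is enabled only while the variable buffer has room, and the buffer drains at a fixed slower rate. The function, its parameters, and the one-sample-per-cycle behavior are assumptions for this sketch and say nothing about how the actual Debug Controller is built.

```python
def run_with_backpressure(cycles_wanted, buffer_depth, drain_every):
    """Return (ticks, fill): reference-clock ticks needed and final buffer fill.

    Each enabled design cycle pushes one sample into the buffer; one sample
    is written off-chip every `drain_every` ticks. The design clock is gated
    whenever the buffer is full, so the buffer can never overflow.
    """
    fill = 0
    design_cycles = 0
    ticks = 0
    while design_cycles < cycles_wanted:
        ticks += 1
        if ticks % drain_every == 0 and fill > 0:
            fill -= 1                 # one off-chip write completes
        if fill < buffer_depth:       # room available: enable the design clock
            fill += 1
            design_cycles += 1
    return ticks, fill

print(run_with_backpressure(cycles_wanted=6, buffer_depth=2, drain_every=3))   # (12, 2)
print(run_with_backpressure(cycles_wanted=6, buffer_depth=2, drain_every=1))   # (6, 1)
```

When storage keeps up (second call) the design runs every tick; when it does not (first call) the design is stalled half the time instead of losing data.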
Runtime Shell Examples
load     Initialize or reset a design
halt     Stop the design as soon as possible
runfor   Run the design for a number of cycles
cont     Run the design until the next exception
break    View, enable, or change a breakpoint
view     View a variable’s value or history
set      Override a variable’s value or source
rewind   Rewind the system state by n cycles
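A command shell like this one is commonly built as a table that maps command names to handler functions. The sketch below implements four of the listed commands against a toy counter design; the argument syntax (`runfor 5`, `set acc 100`), the class name, and the single variable `acc` are all hypothetical, since the slides list only command names and one-line descriptions.

```python
class ToyShell:
    """Minimal command dispatcher over a toy design with one variable."""

    def __init__(self):
        self.cycle = 0
        self.variables = {}
        self.commands = {
            "load": self.load,
            "runfor": self.runfor,
            "view": self.view,
            "set": self.set_var,
        }

    def load(self):
        self.cycle = 0
        self.variables = {"acc": 0}
        return "design loaded"

    def runfor(self, n):
        for _ in range(int(n)):
            self.cycle += 1
            self.variables["acc"] += 1   # toy behavior: acc counts cycles
        return f"stopped at cycle {self.cycle}"

    def view(self, name):
        return f"{name} = {self.variables[name]}"

    def set_var(self, name, value):
        self.variables[name] = int(value)
        return f"{name} overridden"

    def execute(self, line):
        parts = line.split()
        if not parts:
            return ""
        name, *args = parts
        handler = self.commands.get(name)
        if handler is None:
            return f"unknown command: {name}"
        return handler(*args)

shell = ToyShell()
for line in ["load", "runfor 5", "view acc", "set acc 100", "runfor 2", "view acc"]:
    print(line, "->", shell.execute(line))
```

The remaining commands (halt, cont, break, rewind) would slot into the same table as further handlers.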
Future Work
Complete infrastructure for BEE2
Extensive experiments with variable memory
Efficient methods for variable routing
Storage requirements and hierarchy
Time/Space tradeoffs for periodic sampling
Generalize framework to define concepts such as variable priorities, multiple debug levels, and extensions to text-based languages
Questions?