
Programming with actors
Jörn W. Janneck
Xilinx Research Labs
credits
Dave B. Parlour Xilinx Research Labs
Thomas A. Lenart Lund University
Robert Esser University of Adelaide
Ptolemy Miniconference VI, 2005-05-12 - 2
Xilinx Research Labs
The FPGA Platform:
Huge amounts of fine-grained concurrency.
... along with specialized blocks (multipliers, RAM, ALUs, processors...)
The Problem:
Using FPGAs to implement DSP
applications requires circuit design
expertise.
The Research Goal:
Design and build models and tools that
make it possible for application-domain
experts to program FPGAs with high-quality implementations.
models and tools
What does it take to program with actors?
actors and dataflow as a concurrent model
Cal as the language for writing actors
driver application
tools
code generation
circuits and software and combinations thereof
animation & visualization
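The actor/dataflow model listed above can be illustrated with a small Python sketch (not Cal): actors communicate only through FIFO queues and may fire whenever their input queues hold enough tokens. `Channel`, `Scale`, `PairSum`, and `run` are hypothetical names chosen for illustration; they are not part of Cal or of the authors' tools.

```python
from collections import deque


class Channel:
    """Unbounded FIFO connecting an output port to an input port."""

    def __init__(self):
        self.queue = deque()


class Scale:
    """Hypothetical actor: one token in, one token out (static rate 1:1)."""

    def __init__(self, inp, out, k):
        self.inp, self.out, self.k = inp, out, k

    def can_fire(self):
        return len(self.inp.queue) >= 1

    def fire(self):
        self.out.queue.append(self.k * self.inp.queue.popleft())


class PairSum:
    """Hypothetical actor: two tokens in, one token out (static rate 2:1)."""

    def __init__(self, inp, out):
        self.inp, self.out = inp, out

    def can_fire(self):
        return len(self.inp.queue) >= 2

    def fire(self):
        a = self.inp.queue.popleft()
        b = self.inp.queue.popleft()
        self.out.queue.append(a + b)


def run(actors):
    """Keep firing enabled actors until none can fire (quiescence)."""
    progress = True
    while progress:
        progress = False
        for actor in actors:
            if actor.can_fire():
                actor.fire()
                progress = True


src, mid, sink = Channel(), Channel(), Channel()
src.queue.extend([1, 2, 3, 4, 5])
run([Scale(src, mid, 10), PairSum(mid, sink)])
print(list(sink.queue), list(mid.queue))  # [30, 70] [50]
```

Because each of these actors reads only its own input queues and never tests for missing tokens, the final queue contents do not depend on the order in which `run` visits the actors; that independence from scheduling is one reason the model suits concurrent implementation.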
models and tools
We are focusing on...
... hardware that can be programmed.
... programming concepts that can be
implemented.
driver application
MPEG-4 decoder
driver application
MPEG-4 decoder
metrics
• 60 atomic actors
• 22 atomic actor classes
• 3307 LOC (Cal)
• LOC per actor class: between 7 and 2054
actor constructs
• variable token rates
• static/cyclostatic rates
• data-dependent choice
• test for absence of tokens
• non-prefix-monotonic actors
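To make some of the actor constructs listed above concrete, here is a hedged Python sketch (not Cal) of two hypothetical actors: one whose token rate depends on a length header (variable rate, data-dependent choice), and one that tests for the absence of tokens. The function names are illustrative assumptions; the decoder's actual actors are not shown in the slides.

```python
from collections import deque


def fire_packet_sum(inp, out):
    """Hypothetical actor with a variable, data-dependent token rate.

    The head token is a length n; one firing consumes 1 + n tokens and emits
    the sum of the n payload tokens. Returns True if the actor fired.
    """
    if not inp:
        return False
    n = inp[0]
    if len(inp) < 1 + n:
        return False  # the firing rule depends on a token's *value*
    inp.popleft()  # drop the length header
    out.append(sum(inp.popleft() for _ in range(n)))
    return True


def fire_priority_merge(a, b, out):
    """Hypothetical actor that tests for the absence of tokens.

    Takes from `a` when possible and falls back to `b` only if `a` is empty,
    so its output depends on when tokens arrive, not just on their values.
    """
    if a:
        out.append(a.popleft())
        return True
    if b:
        out.append(b.popleft())
        return True
    return False


# Variable rate: packets [2 | 5 7] and [3 | 1 1 1].
inp, out = deque([2, 5, 7, 3, 1, 1, 1]), deque()
while fire_packet_sum(inp, out):
    pass
print(list(out))  # [12, 3]

# Same input streams (a = [1], b = [9]), different arrival timing.
a1, b1, o1 = deque([1]), deque([9]), deque()
while fire_priority_merge(a1, b1, o1):
    pass  # the token on `a` is present from the start
a2, b2, o2 = deque(), deque([9]), deque()
fire_priority_merge(a2, b2, o2)  # `a` is still empty at this firing
a2.append(1)  # ... and its token arrives later
while fire_priority_merge(a2, b2, o2):
    pass
print(list(o1), list(o2))  # [1, 9] [9, 1]
```

The merge shows why such actors are non-prefix-monotonic: with `a` empty the output is `[9]`, but extending `a` to `[1]` can produce `[1, 9]`, which does not extend `[9]`.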
driver application
MPEG-4 decoder
development time approx. 2 months.
approximate sizes of various models:
Cal: 3,300 LOC
“architectural” C code: 4,200 LOC
synthesizable VHDL: 15,000 LOC
[figure: Cal vs. VHDL]
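Taking the approximate model sizes above at face value, a few lines of Python give the relative sizes of each model compared with the Cal model (the dictionary keys are just labels for the three models):

```python
# Approximate model sizes from the slide, in lines of code.
loc = {"Cal": 3_300, "architectural C": 4_200, "synthesizable VHDL": 15_000}

for name, n in loc.items():
    # Size relative to the Cal model.
    print(f"{name:>20}: {n:>6} LOC, {n / loc['Cal']:.2f}x Cal")
```

By this count the synthesizable VHDL is roughly 4.5 times the size of the Cal model, and the architectural C model roughly 1.3 times.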
code generation
2D-IDCT implementation
first step toward a complete decoder implementation
naive code generator has sufficient language coverage for the
IDCT computation.
10 classes
200 LOC
covers most language features, except...
- addressable memory
- multicycle actions
code generation
2D-IDCT, version 1
The starting architecture is very inefficient: 22 multipliers
with 12.5% utilization.
[figure: 1-D stage, AT block, 1-D stage]
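The version 1 architecture (two 1-D stages with a block between them) suggests the textbook separable decomposition of the 2-D IDCT: a 1-D pass over rows, a transpose, and a 1-D pass over columns. A floating-point Python reference of that decomposition is sketched below under stated assumptions (orthonormal scaling, 8x8 blocks held as lists); it illustrates the math only and is not the Cal model, whose arithmetic and token interface are not shown in the slides.

```python
import math

N = 8  # 8x8 blocks, as in MPEG-4 / JPEG-style IDCTs


def idct_1d(coeffs):
    """Floating-point N-point 1-D IDCT (DCT-III) with orthonormal scaling."""
    out = []
    for x in range(N):
        s = 0.0
        for u, F in enumerate(coeffs):
            c = math.sqrt(0.5) if u == 0 else 1.0
            s += c * F * math.cos((2 * x + 1) * u * math.pi / (2 * N))
        out.append(s * math.sqrt(2.0 / N))
    return out


def transpose(block):
    """Swap rows and columns of a square block."""
    return [list(col) for col in zip(*block)]


def idct_2d(block):
    """Separable 2-D IDCT: 1-D pass over rows, transpose, 1-D pass over columns."""
    rows = [idct_1d(r) for r in block]            # first 1-D stage
    cols = [idct_1d(r) for r in transpose(rows)]  # transpose, second 1-D stage
    return transpose(cols)                        # back to row-major order


# A DC-only coefficient block decodes to a flat block.
coeffs = [[0.0] * N for _ in range(N)]
coeffs[0][0] = 8.0
pixels = idct_2d(coeffs)
print(round(pixels[3][5], 6))  # 1.0
```

Computing each output sample directly from the 2-D formula takes N² multiply-accumulates, while the separable form takes 2N, which is why hardware 2-D IDCTs are usually built from 1-D units.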
code generation
2D-IDCT, version 2
interleave row and column streams
pipelined 1D-IDCT
result:
6 multipliers with 46% utilization
further operator reuse is costly in terms of operand routing
>100 MHz clock
[figure: pipelined 1-D IDCT]
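Reading "utilization" as the fraction of cycles in which a multiplier does useful work (an assumption; the slides do not define the term), the multiplier figures quoted for versions 1 and 2 can be compared by the average number of busy multipliers per cycle:

```python
# Multiplier counts and utilizations quoted for the two 2D-IDCT versions.
versions = {"version 1": (22, 0.125), "version 2": (6, 0.46)}

for name, (multipliers, utilization) in versions.items():
    # Average number of multipliers doing useful work in a given cycle.
    busy = multipliers * utilization
    print(f"{name}: {multipliers} x {utilization:.1%} = {busy:.2f} busy multipliers")
```

Both versions come out at roughly 2.75 busy multipliers per cycle, so under this reading version 2 performs about the same multiplication work per cycle with 16 fewer multipliers; the saving comes from operator reuse, not from doing less arithmetic.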
code generation
summary
good QoR (quality of results) for a “naive” code generator
redesigned model compares favorably with
existing VHDL implementations
smaller, faster, simpler to use
HDTV rate
demonstrates the strength of the programming
model rather than the quality of the code generator
lots of room for improvement
pipelining, folding
animation & visualization
actor animation (1/4)
input queues
animation & visualization
actor animation (2/4)
input queue history
animation & visualization
actor animation (3/4)
actor state variables
animation & visualization
actor animation (4/4)
action selection status
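The four animation views (input queues, input queue history, actor state variables, action selection status) all amount to recording a snapshot of the actor after each scheduling step. A minimal Python sketch of such a tracer follows, assuming a hypothetical `Accumulator` actor with two guarded actions; `Snapshot` and `step` are likewise illustrative names, and the sketch does not reflect how the authors' animation tool is implemented.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Snapshot:
    """What an animation view might display after one scheduling step."""
    queue: list     # current input queue contents
    history: list   # tokens consumed so far (input queue history)
    state: dict     # actor state variables
    selected: str   # action that fired, or the reason no action fired


class Accumulator:
    """Hypothetical actor with one state variable and two guarded actions."""

    def __init__(self):
        self.total = 0

    def add(self, token):
        self.total += token

    def reset(self, _token):
        self.total = 0

    def actions(self):
        # (name, guard on the head token, body); guards are tried in order.
        return [("add", lambda t: t >= 0, self.add),
                ("reset", lambda t: t < 0, self.reset)]


def step(actor, queue, history):
    """Select and fire at most one action, then record a snapshot."""
    selected = "none: input queue empty"
    if queue:
        head = queue[0]
        for name, guard, body in actor.actions():
            if guard(head):
                body(queue.popleft())
                history.append(head)
                selected = name
                break
        else:
            selected = "none: no guard satisfied"
    return Snapshot(list(queue), list(history), {"total": actor.total}, selected)


actor, queue, history = Accumulator(), deque([3, 4, -1, 5]), []
trace = [step(actor, queue, history) for _ in range(5)]
for snap in trace:
    print(snap.selected, snap.state, snap.queue)
```

Recording a full snapshot per step keeps the tracer independent of the viewer: each of the four views is just a different projection of the same list of snapshots.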
outlook
improved hardware code generation
language coverage
optimizations
software code generation
analysis and optimization tools
debugging/visualization tools
alternative entry mechanisms
“VisualCal”