Transcript PPT
Programming with actors
Jörn W. Janneck, Xilinx Research Labs

credits
Dave B. Parlour, Xilinx Research Labs
Thomas A. Lenart, Lund University
Robert Esser, University of Adelaide

Ptolemy Miniconference VI, 2005-05-12, Xilinx Research Labs

The FPGA Platform: Huge amounts of fine-grained concurrency...
... along with specialized blocks (multipliers, RAM, ALUs, processors...)

The Problem: Using FPGAs to implement DSP applications requires circuit design expertise.

The Research Goal: Design and build models and tools that make it possible for application domain experts to program FPGAs with high-quality implementations.

models and tools: What does it take to program with actors?
• models and tools
  - actors and dataflow as a concurrent model
  - Cal as the language for writing actors
• driver application
• tools
  - code generation: circuits and software and combinations thereof
  - animation & visualization

models and tools: We are focusing on...
... hardware that can be programmed.
... programming concepts that can be implemented.

driver application: MPEG-4 decoder

driver application: MPEG-4 decoder
metrics
• 60 atomic actors
• 22 atomic actor classes
• 3307 LOC (Cal)
• LOC per actor class between 7 and 2054
actor constructs
• variable token rates
• static/cyclostatic rates
• data-dependent choice
• test for absence of tokens
• non-prefix-monotonic actors

driver application: MPEG-4 decoder
development time approx. 2 months.
approximate sizes of various models:
  Cal:                      3,300 LOC
  “architectural” C code:   4,200 LOC
  synthesizable VHDL:      15,000 LOC

code generation: 2D-IDCT implementation
• first step to complete decoder implementation
• naive code generator has sufficient language coverage for IDCT compute.
• 10 classes, 200 LOC
• covers most language features, except...
  - addressable memory
  - multicycle actions

code generation: 2D-IDCT, version 1
Starting architecture is very inefficient: 22 multipliers with 12.5% utilization.
[figure: block diagram with labels 1-D, AT, 1-D]

code generation: 2D-IDCT, version 2
• interleave row and column streams
• pipelined 1D-IDCT
• result: 6 multipliers with 46% utilization
• more operator re-use costly in terms of operand routing
• >100 MHz clock
[figure: Pipelined 1-D IDCT]

code generation: summary
• good QoR for “naive” code generator
• redesigned model compares favorably with existing VHDL implementations
  - smaller, faster, simpler to use
  - HDTV rate
• demonstrates strength of programming model, rather than quality of code generator
• lots of room for improvement
  - pipelining, folding

animation & visualization: actor animation (1/4)
input queues

animation & visualization: actor animation (2/4)
input queue history

animation & visualization: actor animation (3/4)
actor state variables

animation & visualization: actor animation (4/4)
action selection status

outlook
• improved hardware code generation
  - language coverage
  - optimizations
• software code generation
• analysis and optimization tools
• debugging/visualization tools
• alternative entry mechanisms
  - “VisualCal”
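
The slides name several actor concepts without showing one in code: input queues, actor state variables, action selection, data-dependent choice, and variable token rates. The following is a minimal Python sketch of those ideas, not Cal and not code from the talk. The `Accumulator` actor, its `add` and `flush` actions, and the `run` helper are all hypothetical names chosen for illustration. The actor sums incoming tokens and emits the running sum whenever a 0 token arrives, so which action fires depends on the data, and the number of output tokens per firing varies between 0 and 1.

```python
from collections import deque


class Accumulator:
    """Hypothetical actor: sums input tokens, emits the sum when a 0 arrives."""

    def __init__(self):
        self.inbox = deque()      # input queue of pending tokens
        self.total = 0            # actor state variable
        self.last_action = None   # name of the action selected by the last firing

    def fire(self):
        """Fire one action if possible.

        Returns the list of output tokens produced by the firing,
        or None when no action can fire (the input queue is empty).
        """
        if not self.inbox:
            self.last_action = None
            return None
        token = self.inbox.popleft()
        if token == 0:
            # Data-dependent choice: a 0 token selects the "flush" action,
            # which produces one output token and resets the state.
            produced = [self.total]
            self.total = 0
            self.last_action = "flush"
            return produced
        # Any other token selects the "add" action, which produces no output.
        self.total += token
        self.last_action = "add"
        return []


def run(actor, tokens):
    """Feed tokens to the actor and fire it until no action can fire.

    Returns (outputs, trace): all output tokens in order, and the
    sequence of selected action names.
    """
    actor.inbox.extend(tokens)
    outputs, trace = [], []
    while (produced := actor.fire()) is not None:
        outputs.extend(produced)
        trace.append(actor.last_action)
    return outputs, trace


if __name__ == "__main__":
    outputs, trace = run(Accumulator(), [3, 4, 0, 5, 0])
    print(outputs)
    print(trace)
```

Recording `last_action` and keeping the queue and state as plain attributes is a deliberate choice in this sketch: these are the same items the animation slides list (input queues, state variables, action selection status), so a visualizer would only need to read them after each firing.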