FIR Filter Design Using FPGAs (Diseño de Filtros FIR usando FPGA)
IEEE Argentina, Part 2, 07/07/2015, Ing. Daniel A. Jacoby

Outline
• Power of parallelism
• Platform FPGA: Virtex-II / Virtex-II Pro series
• Why should I use FPGAs for DSP?

© 2003 Xilinx, Inc. All Rights Reserved. For Academic Use Only.

Essence of a DSP Processor
• Program counter and control: cycles are expended making decisions and controlling flow
• I/O: cycles are expended communicating with the outside world or with other processors
• Program memory: the program must be stored in ROM, and many instructions do not directly contribute to processing
• Instruction decode and ALU: the ALU supports many operations, but only one or a few can be used at one time; it contains a fixed set of operations, so multiple operations (cycles) are required to achieve the desired effect
• Registers: fixed bit width, although the algorithm may not require all the bits
• Memory: all values not currently in use must be retained

Multiply-Accumulate: Single Engine
• Sequential processing limits data throughput
– Time-shared MAC unit
– A high clock frequency creates a difficult system challenge
• 256-tap FIR filter
– 256 multiply-accumulate (MAC) operations per data sample
– One output every 256 clock cycles
(Diagram: Data In → Reg → MAC unit, looping over the algorithm 256 times → Data Out)

Sequential Processing Limits System Performance
(Chart: maximum sample rate in MSamples/s, from 0 to 40, versus number of coefficients, from 2 to 104, i.e. algorithmic complexity, for a single 300 MHz processor and for two 300 MHz processors.)
• For a fixed processor clock rate: max sample rate = clock rate / number of operations per sample
• Channel density or sample rate therefore falls as the number of coefficients grows
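To make the single-engine limitation concrete, here is a minimal Python sketch (not Xilinx code; the function names are hypothetical) that evaluates a direct-form FIR the way a time-shared MAC would, one multiply-accumulate per clock cycle, and counts the cycles. It also encodes the chart's relation max sample rate = clock rate / operations per sample, assuming one MAC per clock and ignoring I/O and control overhead.

```python
def fir_sequential(samples, coeffs):
    """Direct-form FIR evaluated the way a single-MAC processor would:
    one multiply-accumulate per clock cycle, looping over every tap.
    Returns (outputs, cycles_used)."""
    taps = len(coeffs)
    delay_line = [0] * taps                   # newest sample at index 0
    outputs, cycles = [], 0
    for x in samples:
        delay_line = [x] + delay_line[:-1]    # shift the new sample in
        acc = 0                               # accumulator cleared for each output
        for k in range(taps):                 # the one MAC is time-shared over all taps
            acc += coeffs[k] * delay_line[k]
            cycles += 1
        outputs.append(acc)
    return outputs, cycles


def max_sample_rate(clock_hz, ops_per_sample):
    """Max sample rate = fixed processor clock rate / operations per sample."""
    return clock_hz / ops_per_sample


print(max_sample_rate(300e6, 256))   # 1171875.0 samples/s for a 256-tap filter
```

At 300 MHz a 256-tap filter tops out near 1.17 MSamples/s on one MAC, while an 8-tap filter reaches 37.5 MSamples/s, which is the shape of the curve in the chart.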
Multiply-Accumulate: Multiple Engines
• Parallel processing maximizes data throughput
– Supports any level of parallelism
– Optimal performance/cost tradeoff
• 256-tap FIR filter
– 256 multiply-accumulate (MAC) operations per data sample
– One output every clock cycle: all 256 MAC operations happen in one clock cycle
(Diagram: Data In feeds a register chain Reg0, Reg1, Reg2, …, Reg255; each register output is multiplied by its coefficient C0, C1, C2, …, C255, and the products are summed to give Data Out.)
• Flexible architecture
– Distributed DSP resources (LUTs, registers, multipliers, and memory)

Virtex-II Platform FPGA (1)
• Active Interconnect™
– Fully buffered
– Fast, predictable
(Diagram: the CLB, IOB, and DCM each connect to the routing through a switch matrix.)
• Powerful CLB (four slices, S0 to S3)
– 8 LUTs
– 128 bits of distributed RAM
– Wide-input functions (32:1)
– Support for slice-based multipliers
• Block RAM
– 18 Kbit true dual port
– Up to 3 Mbits per device
• Multipliers
– 18-bit × 18-bit multiplier
– 200+ MHz pipelined

Virtex-II Platform FPGA (2)
• 16 global clocks
– Eight clocks to any quadrant
– Glitch-free switching between clocks
• DCM
– Zero-delay clock
– Precision phase shift
– Frequency synthesis
– Duty-cycle correction
– Clock multiply and divide
• DCI
– On-chip termination
– Guaranteed signal integrity
– Eliminates hundreds of resistors

Virtex-II Memory Hierarchy
• Distributed RAM: 16×1 LUT RAMs in the fabric
• True Dual-Port™ synchronous block RAM, configurable as 16K × 1, 8K × 2, 4K × 4, 2K × 9, 1K × 18, or 512 × 36
• High-performance external memory interfaces: DDR SDRAM, ZBT® SRAM, QDR SRAM
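The block RAM aspect ratios listed above all describe one 18 Kbit block (the ×9, ×18, and ×36 shapes use the parity bits). The Python sketch below, a rough estimate with hypothetical names, picks the aspect ratio that needs the fewest blocks for a given depth × width memory by tiling blocks side by side for width and stacking them for depth. It ignores the distributed-RAM alternative and any multiplexing logic needed for cascading.

```python
# (depth, width) configurations of one 18 Kbit Virtex-II block RAM, as listed above
BRAM_CONFIGS = [(16384, 1), (8192, 2), (4096, 4), (2048, 9), (1024, 18), (512, 36)]


def brams_needed(depth, width):
    """Return (block_count, cfg_depth, cfg_width) for the aspect ratio that
    needs the fewest block RAMs to build a depth x width memory."""
    best = None
    for cfg_depth, cfg_width in BRAM_CONFIGS:
        columns = -(-width // cfg_width)    # ceiling: blocks side by side for width
        rows = -(-depth // cfg_depth)       # ceiling: blocks stacked for depth
        count = columns * rows
        if best is None or count < best[0]:
            best = (count, cfg_depth, cfg_width)
    return best


print(brams_needed(1024, 18))   # (1, 1024, 18)
print(brams_needed(4096, 32))   # (8, 4096, 4)
```

A 1K × 18 coefficient table fits exactly one block, while a 4K × 32 buffer needs eight whichever shape is chosen, since 131,072 bits exceed seven 18 Kbit blocks.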
Virtex-II CLB
• Flexible resources
– Wide-input functions: a 16:1 multiplexer in one CLB
– Fast arithmetic functions: two dedicated carry chains
– Cascadable shift registers in LUTs: a 128-bit shift register in one CLB
• Ease of performance
– Direct routing (fast connects) enabling high speed
(Diagram: four slices, S0 to S3, around a switch matrix, with TBUFs, the COUT/CIN carry chain, and the SHIFT cascade.)

Virtex-II Slice
• Each slice contains two:
– Four-input lookup tables (LUT F and LUT G)
– 16-bit distributed SelectRAM (RAM16)
– 16-bit shift registers (SRL16)
• Each register can be configured as:
– D flip-flop
– Latch
• Dedicated logic:
– Muxes (MUXF5, MUXFx)
– Arithmetic logic (CY)
– MULT_AND
– Carry chain

Unique Distributed RAM
• LUTs used as memory inside the fabric
• Flexible: can be used as RAM, ROM, or shift register
• Distributed memory with fast access time
• Cascadable with built-in CLB routing
• Applications
– Linear feedback shift registers
– Distributed arithmetic
– Time-shared registers
– Small FIFOs
– Digital delay lines (z^-1)
• Capacity of one CLB (16 bits per LUT): 64-bit dual-port RAM, 128-bit single-port RAM, or 128-bit shift register
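Distributed arithmetic, one of the applications above, is the technique behind the DA FIR core: a LUT holds every possible sum of a coefficient subset, and the inner product is built one input bit position at a time. The Python sketch below is a behavioral model under stated assumptions (hypothetical names, integer coefficients, two's complement inputs of a known width, bits processed LSB first), not the core's implementation. With four taps the table has 16 entries, so each bit of the table fits one 16×1 LUT used as ROM.

```python
def build_da_lut(coeffs):
    """Entry `addr` = sum of the coefficients whose input bit is 1 in addr."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << n)]


def da_inner_product(xs, coeffs, bits):
    """Bit-serial distributed-arithmetic sum(c * x) for signed `bits`-wide xs."""
    lut = build_da_lut(coeffs)
    acc = 0
    for n in range(bits):                       # one bit position per clock
        addr = 0
        for k, x in enumerate(xs):
            addr |= ((x >> n) & 1) << k         # bit n of every input forms the address
        weight = -(1 << n) if n == bits - 1 else (1 << n)   # MSB has negative weight
        acc += weight * lut[addr]
    return acc


print(da_inner_product([3, -3, 7, -8], [1, 2, 3, 4], 4))   # -14
```

The loop count depends on the input word length rather than on the number of taps, which is why DA trades multipliers for LUT memory.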
The SRL16E
• The 16 SRAM cells of a LUT are organized into a shift register
– The CE input is used, together with the clock, to write data into the first flip-flop while all other data move right by one position
– Because this is a predictable operation, no address is required for writing
– The 4-bit address A[3:0] selects which of the 16 stages drives output Q (0000 selects the first stage, 1111 the last), so the register length is A + 1; Q15 always carries the last stage and allows cascading (SRLC16E)
• The SRL16E is excellent for implementing efficient DSP functions
– A very efficient way to delay data samples
– Shifting samples in and scanning them out at a faster rate

Multiplier Unit
• Embedded 18-bit × 18-bit multiplier with a 36-bit result
• Quantity:
– Virtex-II: up to 168 (2V40: 4; 2V8000: 168)
– Virtex-II Pro: up to 556 (2VP2: 12; 2VP125: 556)
• Two's complement signed operation
• 4- to 18-bit operands
• Combinational and pipelined options
• Operates with block RAM and fabric to implement the MAC function
• Signed multiply performance (pipelined multiplier with registered inputs and outputs, preliminary v1.60 speeds file): Virtex-II 18×18 at 245 MHz; Virtex-II Pro 18×18 at 300 MHz

Virtex-II Family
Part number | LUTs + FFs | BRAM (kb) | Multipliers | DCM units
XC2V40 | 512 | 72 | 4 | 4
XC2V80 | 1,024 | 144 | 8 | 4
XC2V250 | 3,072 | 432 | 24 | 8
XC2V500 | 6,144 | 576 | 32 | 8
XC2V1000 | 10,240 | 720 | 40 | 8
XC2V1500 | 15,360 | 864 | 48 | 8
XC2V2000 | 21,504 | 1,008 | 56 | 8
XC2V3000 | 28,672 | 1,728 | 96 | 12
XC2V4000 | 46,080 | 2,160 | 120 | 12
XC2V6000 | 67,584 | 2,592 | 144 | 12
XC2V8000 | 93,184 | 3,024 | 168 | 12
• Packages: CS144, FG256, FG456, FG676, FF896, FF1152, FF1517, BG575, BG728, and BG957; available SelectIO ranges from 88 user I/Os (XC2V40 in CS144) up to 1,108
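The SRL16E behavior described above (shift on CE, no write address, addressable read, Q15 for cascading) can be captured in a short behavioral model. This Python class is a sketch for illustration only; its name mirrors the primitive, but the methods are hypothetical and it models values at clock edges, not HDL timing.

```python
class SRL16E:
    """Behavioral sketch of a 16-stage addressable shift register.

    When ce is high on a clock edge, d enters stage 0 and every other
    stage moves one position along; no write address is needed.
    q(addr) reads the stage selected by the 4-bit address A[3:0], and
    q15 is the last stage, used to cascade several SRLs."""

    def __init__(self):
        self.stages = [0] * 16

    def clock(self, d, ce=1):
        if ce:
            self.stages = [d] + self.stages[:-1]

    def q(self, addr):
        if not 0 <= addr <= 15:
            raise ValueError("A[3:0] must be between 0 and 15")
        return self.stages[addr]

    @property
    def q15(self):
        return self.stages[15]


# Usage: read stage 4, i.e. a shift register of length A + 1 = 5
srl = SRL16E()
delayed = []
for sample in range(1, 11):
    srl.clock(sample)
    delayed.append(srl.q(4))
print(delayed)   # [0, 0, 0, 0, 1, 2, 3, 4, 5, 6]
```

Changing only the address changes the delay, which is what makes one LUT a cheap, tunable z^-k delay line.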
Virtex-II Pro versus Virtex-II
• Lower cost
• Up to 24 RocketIO™ embedded multi-gigabit transceivers
• Up to four PowerPC processors
• More memory
– 10 Mbits in block RAM
– 1,738 Kbits in distributed RAM
• More block RAMs and multiplier blocks
– 556 embedded multipliers
– 556 block RAMs
• Smaller technology
– 0.13 µm nine-layer copper process with 90 nm high-speed transistors
• More I/O pins per package (up to 1,704)

Virtex-II Pro Family
Device | Logic cells | PPC405 | MGT (3.125 Gb/s) | BRAM (kb) | Multipliers | DCM units
2VP2 | 3,168 | 0 | 4 | 216 | 12 | 4
2VP4 | 6,768 | 1 | 4 | 504 | 28 | 4
2VP7 | 11,088 | 1 | 8 | 792 | 44 | 4
2VP20 | 20,880 | 2 | 8 | 1,584 | 88 | 8
2VP30 | 30,816 | 2 | 8 | 2,448 | 136 | 8
2VP40 | 43,632 | 2 | 12* | 3,456 | 192 | 8
2VP50 | 53,136 | 2 | 16* | 4,176 | 232 | 8
2VP70 | 74,448 | 2 | 20 | 5,904 | 328 | 8
2VP100 | 99,216 | 2 | 20* | 7,992 | 444 | 12
2VP125 | 125,136 | 4 | 24* | 10,008 | 556 | 12
• Packages: FG256, FG456, FF672, FF896, FF1152, FF1148*, FF1517, FF1704, and FF1696*, with up to 1,200 user I/Os depending on device and package
* FF1148 and FF1696 are a special bond option: no MGTs, with maximum SelectIO

FPGAs Mean Parallelism
Reason 1: FPGAs handle high computational workloads
256-tap FIR filter example:
• Conventional DSP device (von Neumann architecture): Data In → Reg → MAC unit → Data Out; 256 loops are needed to process each sample
• FPGA: Data In → Reg0, Reg1, Reg2, …, Reg255, each multiplied by C0, C1, C2, …, C255 and summed → Data Out; all 256 MAC operations in one clock cycle

FPGAs Are Ideal for Multi-Channel DSP Designs
(Diagram: four low-pass filter channels, LPF ch1 to ch4, each with 20 MHz samples, replaced by a single multi-channel LPF processing 80 MHz samples.)
• FPGAs are also ideally suited for multi-channel DSP designs
– Many low-sample-rate channels can be multiplexed (e.g., TDM) and processed in the FPGA at a high rate
– Interpolation (using zeros) can also drive sample rates higher

Why FPGAs for DSP? (2)
Reason 2: Tremendous flexibility
• Q = (A × B) + (C × D) + (E × F) + (G × H) can be implemented in parallel: four multipliers feed an adder tree of three adders
• But is this the only way in the FPGA?

Customize Architectures to Suit Your Ideal Algorithms
• Parallel: one multiplier per product and a full adder tree; optimized for speed
• Semi-parallel: fewer multipliers, reused over several cycles, with accumulating registers (D Q)
• Serial: a single multiplier and an accumulator register; optimized for cost
• FPGAs allow area (cost) / performance tradeoffs

DSP Design Flows in FPGA
This material exempt per Department of Commerce license exception TSU

Objectives
After completing this module, you will be able to:
• Describe the advantages and disadvantages of three different design flows
• Use HDL, CORE Generator, or System Generator for DSP depending on design requirements and familiarity with the tools
• Explain why there is a need for an integrated flow from system design to implementation
• Describe System Generator and the tools it interfaces with
• Build a model, simulate it, generate VHDL, and go through the design flow
• Describe how hardware-in-the-loop verification is beneficial in complex system design

Outline
• Using HDL
• Using the Xilinx CORE Generator
• Using the Xilinx System Generator for DSP
• Lab 1: Creating a 12x8 MAC
• HDL Co-Simulation
• Hardware Verification
• In-System Debug
• Resource Estimator
• Summary
• Simulink Tips and Tricks
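The parallel, semi-parallel, and serial architectures for Q = (A × B) + (C × D) + (E × F) + (G × H) differ only in how many multipliers are instantiated and how many clocks the sum takes. The Python sketch below (hypothetical names, arithmetic and cycle count only; adder-tree latency and register placement are not modeled) makes that tradeoff explicit by folding the products onto a chosen number of multipliers.

```python
def folded_dot_product(a, b, num_multipliers):
    """Compute sum(a[i] * b[i]) with only `num_multipliers` multipliers.

    Each loop iteration is one clock: up to num_multipliers products are
    formed, summed, and added into an accumulator register. One multiplier
    per product is the fully parallel form; a single multiplier is the
    serial form; anything in between is semi-parallel.
    Returns (result, clock_cycles)."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    if num_multipliers < 1:
        raise ValueError("need at least one multiplier")
    acc, cycles = 0, 0
    for start in range(0, len(a), num_multipliers):
        pairs = zip(a[start:start + num_multipliers], b[start:start + num_multipliers])
        acc += sum(x * y for x, y in pairs)
        cycles += 1
    return acc, cycles


A, C, E, G = 1, 3, 5, 7
B, D, F, H = 2, 4, 6, 8
for m in (4, 2, 1):   # parallel, semi-parallel, serial
    print(m, folded_dot_product([A, C, E, G], [B, D, F, H], m))
# prints 4 (100, 1), then 2 (100, 2), then 1 (100, 4)
```

The result is identical in every case; halving the multipliers doubles the cycles per output, which is the area/performance dial the slide describes.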
HDL
(Design flow: HDL → Synthesis → Implementation → Download, verified at each step by behavioral simulation, functional simulation, timing simulation, and in-circuit verification respectively.)
• Implement your design using VHDL or Verilog

Synthesis
• Synthesize the design to create an FPGA netlist

Implementation
• Translate, place and route, and generate a bitstream to download into the FPGA

Outline: Using the Xilinx CORE Generator

CORE Generator
(Design flow: HDL plus COREGen → Synthesis → Implementation → Download, with the same verification steps.)
• Instantiate optimized IP within the HDL code

Synthesize, Implement, Download
• Synthesize, implement, and download the bitstream, similar to the original design flow

Xilinx IP Solutions
Key: $ = license fee, P = parameterized, S = project license available
• DSP functions: $P Additive White Gaussian Noise (AWGN); $P Reed-Solomon; $ 3GPP Turbo Code; $P Viterbi Decoder; P Convolution Encoder; $P Interleaver/De-interleaver; P LFSR; P 1D DCT; P 2D DCT; P DA FIR; P MAC; P MAC-based FIR filter; fixed FFTs (16, 64, 256, 1024 points); P FFT, 16 to 16384 points; P FFT, 32 point; P Sine/Cosine Look-Up Tables; $P Turbo Product Code (TPC); P Direct Digital Synthesizer; P Cascaded Integrator Comb; P Bit Correlator; P Digital Down Converter
• Math functions: P Multiplier Generator (parallel multiplier, dynamic constant-coefficient multiplier, serial sequential multiplier, multiplier enhancements); P Pipelined Divider; P CORDIC
• Base functions: P Binary Decoder; P Twos Complement; P Shift Register RAM/FF; P Gate modules; P Multiplexer functions; P Registers (FF and latch based); P Adder/Subtractor; P Accumulator; P Comparator; P Binary Counter
• Memory functions: P Asynchronous FIFO; P Block Memory modules; P Distributed Memory; P Distributed Memory Enhancements; P Sync FIFO (SRL16); P Sync FIFO (Block RAM); P CAM (SRL16); P CAM (Block RAM)
• IP Center: http://www.xilinx.com/ipcenter

Xilinx CORE Generator
• List of available IP; cores are fully parameterizable (screenshots)
• Filter coefficients are loaded from a coefficient file, here Mi_filtro.coe
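A coefficient file such as Mi_filtro.coe can be produced by a script. The Python sketch below is an illustration under assumptions: it designs a Hamming-windowed-sinc low-pass (a stand-in for whatever design tool the author used), scales the taps to signed integers, and emits text with the `radix=` / `coefdata=` keywords used by Xilinx FIR cores. Check the syntax against the data sheet of the core you target; the function names are hypothetical.

```python
import math


def lowpass_taps(num_taps, cutoff):
    """Windowed-sinc low-pass prototype with a Hamming window.
    cutoff is normalized to the sample rate (0 < cutoff < 0.5); num_taps >= 2."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def quantize(taps, bits):
    """Scale so the largest tap uses the full signed range, then round
    (Python's round() sends exact halves to the even integer)."""
    full_scale = (1 << (bits - 1)) - 1
    peak = max(abs(t) for t in taps)
    return [round(t / peak * full_scale) for t in taps]


def coe_text(int_taps, radix=10):
    """Coefficient-file body using the radix/coefdata keywords."""
    return "radix=%d;\ncoefdata=%s;\n" % (radix, ",".join(str(c) for c in int_taps))


print(coe_text(quantize(lowpass_taps(11, 0.2), 16)))
```

Writing the returned string to Mi_filtro.coe gives a file the core's coefficient-file option can point at; normalizing to the peak tap is one simple choice, and scaling for unity DC gain is a common alternative.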
Xilinx CORE Generator
• Enables directives for the placement of the MAC

Xilinx Smart-IP Technology
• Predefined placement and routing enhance performance and predictability
– Fixed placement and predefined routing guarantee performance
– Relative placement: other logic has no effect on the core
– Guarantees I/O and logic predictability
• Performance (200 MHz in the example) is independent of:
– Core placement
– Number of cores
– Device size

Outputs
• .EDN (EDIF implementation netlist)
• .XCO (core implementation data file / log file)
• Optional:
– .ASY: Foundation or Innoveda symbols
– .VEO: Verilog instantiation template
– .V: Verilog behavioral simulation model
– .VHO: VHDL instantiation template
– .VHD: VHDL behavioral simulation model

The Challenges for a DSP Software Platform
• Industry trends
– A trend towards platform chips (FPGAs, DSPs), resulting in greater complexity
– Highly flexible systems are required to meet changing standards
– Multiple design methodologies: control plane and datapath
– Challenges in modeling and implementing an entire platform
– Hardware-in-the-loop verification is useful in complex system design, and System Generator supports it
• System design challenges
– Leveraging legacy HDL code
– Modeling and implementing control logic and datapath
– No expert exists for all facets of system design

Traditional Simulink FPGA Flow
• The system architect works in Simulink (system verification); the FPGA designer works in HDL → synthesis → implementation → download (functional simulation, timing simulation, in-circuit verification)
• There is a gap between the two: the HDL must be verified for equivalence with the Simulink model
System Generator for DSP v8.1: An Overview
• The industry's system-level design environment (IDE) for FPGAs
– Integrated design flow from Simulink to bit file
– Leverages existing technologies:
• MATLAB/Simulink from The MathWorks
• HDL synthesis
• IP core libraries
• FPGA implementation tools
• Simulink library of arithmetic and logic operators and DSP functions (Xilinx Blockset)
– Bit and cycle true to the FPGA implementation
• Arithmetic abstraction
– Arbitrary-precision fixed point, including quantization and overflow
– Simulation of double precision as well as fixed point

System Generator for DSP v8.1: An Overview (continued)
• VHDL code generation for Virtex-4™, Virtex-II Pro™, Virtex™-II, Virtex™-E, Virtex™, Spartan™-3E, Spartan™-3, Spartan™-IIE, and Spartan™-II devices
– Hardware expansion and mapping
– Synthesizable VHDL with model hierarchy preserved
– Mixed-language support for Verilog
– Automatic invocation of CORE Generator to utilize IP cores
– ISE project generation to simplify the design flow
– HDL testbench and test vector generation
– Constraint file (.xcf) and simulation .do file generation
• HDL co-simulation via HDL C-simulation
• Verification acceleration using hardware in the loop

System Generator for DSP Platform Designs
(Diagram, including the ISIM simulator.)

The SysGen Design Flow
DSP development flow:
1. Develop the algorithm and system model (Simulink .mdl)
2. Automatic code generation (RTL VHDL and cores), with HDL testbench simulation in ISIM
3. Xilinx implementation flow → bitstream → download to FPGA
System Generator Based Design Flow
• MATLAB/Simulink and System Generator sit above the HDL flow (synthesis → implementation → download), with system verification at the top and functional simulation, timing simulation, and in-circuit verification alongside
• Files used: configuration file, VHDL, IP, constraints file
• HDL co-simulation links the Simulink model to the HDL simulator
• Note: the ModelSim helper block is not needed when the ISIM simulator is used

Creating a System Generator Design
• Invoke the Simulink library browser: click the Simulink library browser button or type simulink in the MATLAB console
• The library browser contains all the blocks available to designers
• Start a new design by clicking the new sheet button
• Build the design by dragging and dropping blocks from the Xilinx blockset onto your new sheet
• Design entry is similar to a schematic editor
• Connect blocks by dragging from the arrows on the sides of each block
Finding Blocks
• Use the Find feature to search ALL Simulink libraries
• The Xilinx blockset has nine major sections:
– Basic Elements: counters, delays
– Communication: error correction blocks
– Control Logic: MCode, Black Box
– Data Types: Convert, Slice
– DSP: FDATool, FFT, FIR
– Index: all Xilinx blocks, a quick way to view all blocks
– Math: multiply, accumulate, inverter
– Memory: Dual Port RAM, Single Port RAM
– Tools: ModelSim, Resource Estimator

Configure Your Blocks
• Double-click a block, or go to Block Parameters, to view its configurable parameters:
– Arithmetic type: unsigned or two's complement
– Implement with Xilinx Smart-IP core (if possible) / generate core
– Latency: specify the delay through the block
– Overflow and quantization: users can saturate or wrap on overflow, and truncate or round on quantization
– Override with doubles: simulation only
– Precision: full, or a user-defined number of bits and position of the binary point
– Sample period: can be inherited with "-1" or must be an integer value
• Note: while all parameters can be simulated, not all are realizable

Values Can Be Equations
• You can also enter equations in the block parameters, which can aid calculation and your own understanding of the model parameters
• The equations are evaluated at the beginning of a simulation
• Useful MATLAB operators:
– + add
– - subtract
– * multiply
– / divide
– ^ power
– pi (3.1415926535897…)
– exp(x): exponential (e^x)

Important Concept 1: The Numbers Game
• Simulink uses a "double" to represent numbers in a simulation.
A double is a 64-bit IEEE 754 floating-point number (1 sign bit, 11 exponent bits, 52 fraction bits).
– Because the binary point can move, a double covers normalized magnitudes from about 2.2 × 10^-308 up to about 1.8 × 10^308 with roughly 15 to 16 significant decimal digits: a wide, desirable range, but not efficient or realistic for FPGAs
• The Xilinx Blockset uses n-bit fixed-point numbers (two's complement optional)
– Example, Fix_16_13: bit weights -2^2 (sign bit), 2^1, 2^0 for the integer part and 2^-1 down to 2^-13 for the fraction
– Bits 101.1011110100101 → value = -2.261108…
• Format = Sign_Width_BinaryPoint, with the binary point counted from the LSB; Sign: Fix = signed value, UFix = unsigned value
• Design hint: always try to maximize the dynamic range of the design by using only the required number of bits
• A conversion is therefore required when Xilinx blocks communicate with Simulink blocks (Xilinx blockset MATLAB I/O: Gateway In / Gateway Out)

What About All Those Other Bits?
• The Gateway In and Gateway Out blocks support parameters to control the conversion from double precision to N-bit fixed-point precision
– Example: a double whose bits from 2^2 down to 2^-13 are 1011011110100101 (with further bits above and below) becomes FIX_12_9 by keeping the bits from 2^2 to 2^-9: 101.101111010
– Overflow (bits above the MSB): saturate
– Quantization (bits below the LSB): truncate

Other Type: Boolean
• The Xilinx Blockset also uses the type Boolean for control ports such as CE and RESET
• The Boolean type is a variant of the 1-bit unsigned number in that it is always defined (high or low); a 1-bit unsigned number can become invalid, a Boolean cannot
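The double-to-fixed conversion that Gateway In performs can be modeled on integer codes. The Python sketch below is an assumption-laden illustration, not the System Generator API: the function names are hypothetical, "truncate" is modeled as dropping the bits below the LSB (a floor in two's complement), and "round" is modeled as round half away from zero, which is one of several rounding choices.

```python
import math


def to_fixed(x, width, bin_pt, signed=True, overflow="saturate", quantization="truncate"):
    """Convert a double to the integer code of Fix/UFix_<width>_<bin_pt>.

    quantization: "truncate" drops the bits below the LSB (floor in two's
    complement); "round" rounds half away from zero.
    overflow: "saturate" clamps to the representable range; "wrap" keeps
    only the low `width` bits."""
    scaled = x * 2 ** bin_pt
    if quantization == "truncate":
        code = math.floor(scaled)
    elif quantization == "round":
        code = int(math.copysign(math.floor(abs(scaled) + 0.5), scaled))
    else:
        raise ValueError("unknown quantization mode: " + quantization)
    lo = -(1 << (width - 1)) if signed else 0
    hi = (1 << (width - 1)) - 1 if signed else (1 << width) - 1
    if overflow == "saturate":
        code = min(max(code, lo), hi)
    elif overflow == "wrap":
        code = (code - lo) % (1 << width) + lo
    else:
        raise ValueError("unknown overflow mode: " + overflow)
    return code


def from_fixed(code, bin_pt):
    """Numeric value represented by an integer code."""
    return code / 2 ** bin_pt


def to_bits(code, width):
    """Bit pattern of a code as a string, MSB first."""
    return format(code & ((1 << width) - 1), "0%db" % width)


x = -2.2611083984375                        # the Fix_16_13 example value
c16 = to_fixed(x, 16, 13)
c12 = to_fixed(x, 12, 9)                    # saturate + truncate, as on the slide
print(to_bits(c16, 16), from_fixed(c16, 13))   # 1011011110100101 -2.2611083984375
print(to_bits(c12, 12), from_fixed(c12, 9))    # 101101111010 -2.26171875
```

Truncating to FIX_12_9 moves the value toward minus infinity (from -2.26111 to -2.26172), which is the expected behavior when bits of a negative two's complement number are dropped.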
Knowledge Check
Using the technique shown, convert the following fractional values:
• Define the format of the following two's complement binary fraction and calculate the value it represents:
1100011.01011 → Format = < _ _ _ >, Value = ?
• What format should be used to represent a signal that has:
a) Max value: +0.999…, min value: -0.999…, quantized to 12-bit data → Format = < _ _ _ >
b) Max value: 0.8, min value: 0.2, quantized to 10-bit data → Format = < _ _ _ >
c) Max value: 278, min value: -138, quantized to 11-bit data → Format = < _ _ _ >
• Fill in the table:
Operation | Full-precision output type
<Fix_12_9> + <Fix_8_3> | ?
<Fix_8_7> × <UFix_8_6> | ?

Answers
• Format = <Fix_12_5>; 1100011.01011 → value = -917/32 = -28.65625
• What format should be used to represent a signal that has:
a) Max value: +1, min value: -1, 12-bit data → <Fix_12_10> (if the range is only ±0.999…, as in the question, <Fix_12_11> suffices; representing +1 exactly needs a second integer bit)
b) Max value: 0.8, min value: 0.2, 10-bit data → <UFix_10_10>
c) Max value: 278, min value: -138, 11-bit data → <Fix_11_1>
• Table:
<Fix_12_9> + <Fix_8_3> → <Fix_15_9>
<Fix_8_7> × <UFix_8_6> → <Fix_16_13>

Creating a System Generator Design
• Simulink sources → Gateway In → SysGen data path and helper blocks → Gateway Out → Simulink sinks
• Gateway blocks are used to interface between Simulink and SysGen blocks

System Generator Design
• Start the simulation by pressing the play button
• Every SysGen design must contain a System Generator block, which is used to set global netlisting attributes
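The knowledge-check answers follow mechanical rules, sketched below in Python with hypothetical helper names. Assumptions: `best_bin_pt` picks the largest number of fraction bits that still fits the range in the given width (signed if the minimum is negative); `add_type` handles only two signed operands (keep the larger fraction and integer parts, plus one carry bit); `mult_type` adds widths and fraction bits, which reproduces the slide's <Fix_16_13> for a signed-by-unsigned product.

```python
def fix_value(bits, bin_pt, signed=True):
    """Value of a bit string such as '110001101011' in Fix/UFix_<len>_<bin_pt>."""
    code = int(bits, 2)
    if signed and bits[0] == "1":
        code -= 1 << len(bits)            # two's complement: MSB has negative weight
    return code / 2 ** bin_pt


def best_bin_pt(lo, hi, width):
    """Format with the most fraction bits such that [lo, hi] fits in `width`
    bits; signed (Fix) if lo < 0, unsigned (UFix) otherwise."""
    signed = lo < 0
    for bin_pt in range(64, -65, -1):
        lsb = 2.0 ** -bin_pt
        min_v = -(1 << (width - 1)) * lsb if signed else 0.0
        max_v = ((1 << (width - 1)) - 1 if signed else (1 << width) - 1) * lsb
        if min_v <= lo and hi <= max_v:
            return ("Fix" if signed else "UFix"), width, bin_pt
    raise ValueError("range does not fit in %d bits" % width)


def add_type(a, b):
    """Full-precision (width, bin_pt) of a + b for two signed formats."""
    int_bits = max(a[0] - a[1], b[0] - b[1]) + 1   # one extra bit for the carry
    frac_bits = max(a[1], b[1])
    return int_bits + frac_bits, frac_bits


def mult_type(a, b):
    """Full-precision (width, bin_pt) of a * b: widths and fraction bits add."""
    return a[0] + b[0], a[1] + b[1]


print(fix_value("110001101011", 5))                          # -28.65625
print(best_bin_pt(-138, 278, 11))                            # ('Fix', 11, 1)
print(add_type((12, 9), (8, 3)), mult_type((8, 7), (8, 6)))  # (15, 9) (16, 13)
```

Running `best_bin_pt` on ±1 and on ±0.999 with 12 bits shows why the two versions of question (a) give <Fix_12_10> and <Fix_12_11> respectively.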
• Designs may have levels of hierarchy
• Double-click to "push" into a subsystem

Picking Bits: Why We Do It
• To combine two data buses to form a new bus
• To force a conversion of data type, including the number of bits and the binary point
• To reinterpret unsigned data as signed, or the converse
• To extract certain bits of data, especially when there is bit growth

The Xilinx Blocks
• Concat, Convert, Reinterpret, and Slice
– Each is available in the Basic Elements and Index libraries and, depending on the block, also in the Data Types, Math, or Control Logic libraries

The Concat Block
• Performs a concatenation of two bit vectors
• Both inputs must be unsigned integers, i.e., two unsigned numbers with binary points at position zero
• The Reinterpret block provides signed-to-unsigned conversion capabilities that can extend the functionality of the Concat block
• Does not use Xilinx LogiCORE or hardware resources

The Convert Block
• The Xilinx Convert block converts each input sample to a number of a desired arithmetic type
– A number can be converted to a signed (two's complement) or unsigned value
– The total number of bits and the binary point are specified by the user
– Overflow and quantization (truncate or round) options apply to the output value
– Does not use Xilinx LogiCORE, but may use additional hardware depending on the overflow and quantization options

The Convert Block
• What is it doing?
– The user specifies the total number of bits, the position of the binary point, and the arithmetic type (signed or unsigned)
– First, it lines up the binary points of the input and output types
– Next, it applies the total number of bits and the binary point specified by the user; depending on the overflow and quantization options, the output value may change, rather than bits simply being dropped

The Convert Block
• Passing the following value through the Convert block gives the same value with a different number of bits and binary point:
FIX_10_8: 01.10000000 (+1.5) → FIX_7_4: 001.1000 (+1.5)

The Convert Block
• Saturating on overflow may change the fractional part to produce the saturated value
• Rounding on quantization may also affect the bits to the left of the binary point (the whole number)

The Convert Block
• When we convert to Fix_6_0, how do we get two different values?
– Overflow options: wrap, saturate, flag error
– Quantization options: truncate, round
– FIX_10_8: 01.10000000 (+1.5)
– Round to integer: add '1' to round → FIX_6_0: 000010 (+2)
– Truncate to integer: drop the bits → FIX_6_0: 000001 (+1)

The Reinterpret Block
• Forces its output to a new type without any regard for retaining the numerical value represented by the input
• Total number of bits in = total number of bits out
• Allows unsigned data to be reinterpreted as signed data, and the converse
• Also allows scaling of the data through repositioning of the binary point
• Does not use Xilinx LogiCORE or hardware resources
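The difference between the Convert block (keep the value, change the bits) and the Reinterpret block (keep the bits, change the value) is easiest to see on integer codes. This Python sketch uses hypothetical names; it assumes "wrap" as the default overflow mode and models "round" as round half away from zero, and it is a behavioral illustration rather than what the blocks generate in hardware.

```python
def convert(code, in_bin_pt, out_width, out_bin_pt, signed=True,
            overflow="wrap", quantization="truncate"):
    """Value-preserving conversion of an integer code (Convert block sketch).
    Binary points are lined up first; quantization and overflow are then
    applied to fit the output type."""
    shift = out_bin_pt - in_bin_pt
    if shift >= 0:
        out = code << shift                          # extra fraction bits are zeros
    elif quantization == "truncate":
        out = code >> -shift                         # arithmetic shift drops the bits
    elif quantization == "round":
        half = 1 << (-shift - 1)
        magnitude = (abs(code) + half) >> -shift     # add half an LSB, then drop bits
        out = magnitude if code >= 0 else -magnitude # round half away from zero
    else:
        raise ValueError("unknown quantization mode: " + quantization)
    lo = -(1 << (out_width - 1)) if signed else 0
    hi = (1 << (out_width - 1)) - 1 if signed else (1 << out_width) - 1
    if overflow == "saturate":
        return min(max(out, lo), hi)
    if overflow == "wrap":
        return (out - lo) % (1 << out_width) + lo
    raise ValueError("unknown overflow mode: " + overflow)


def reinterpret(code, width, new_bin_pt, new_signed):
    """Bit-preserving reinterpretation: same bits, new sign and binary point.
    Returns the numeric value of the pattern under the new type."""
    pattern = code & ((1 << width) - 1)
    if new_signed and pattern >> (width - 1):
        pattern -= 1 << width
    return pattern / 2 ** new_bin_pt


x = 0b0110000000                                   # FIX_10_8 pattern for +1.5
print(convert(x, 8, 7, 4) / 2 ** 4)                # 1.5  (FIX_7_4, same value)
print(convert(x, 8, 6, 0))                         # 1    (FIX_6_0, truncate)
print(convert(x, 8, 6, 0, quantization="round"))   # 2    (FIX_6_0, round)
print(reinterpret(x, 10, 5, True))                 # 12.0 (FIX_10_5, same bits)
```

The four printed values match the worked examples above: Convert keeps +1.5 or quantizes it to 1 or 2, while Reinterpret turns the same ten bits into 12.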
The Reinterpret Block
• Reinterpret the FIX_10_8 number and force the binary point to position 5:
FIX_10_8: 01.10000000 (+1.5) → FIX_10_5: 01100.00000 (+12)

The Slice Block
• The Xilinx Slice block allows you to slice off a sequence of bits from your input data and create a new data value
• The output data type is unsigned with its binary point at zero

The Slice Block
• Take a 4-bit slice of the FIX_10_8 number 01.10000000 (+1.5):
– Upper bit location + width: offset of top bit from MSB = 0 and width = 4 → 0110 = 6
– Two bit locations: offset of top bit from MSB of input = -1 and offset of bottom bit from LSB of input = 5 → 1100 = 12

Coffee Break

Important Concept 2: Sample Period
• Every SysGen signal must be "sampled"; transitions occur at equidistant discrete points in time called sample times
• Each block in a Simulink design has a "sample period", which corresponds to how often that block's function is calculated and its results output
• This sample period must be set explicitly for:
– Gateway In
– Blocks without inputs (note: constants are idiosyncratic)
• For other blocks, the sample period can be "derived" from the input sample times
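The two Slice modes and the Concat block are plain bit manipulation, so they can be checked against the example above with a few lines of Python. The helper names are hypothetical and the offset conventions follow the slide (top-bit offsets measured from the MSB, so bits below the MSB have negative offsets); this is a model of the arithmetic, not of the blocks' parameter dialogs.

```python
def slice_bits(code, in_width, top_bit, bottom_bit):
    """Bits top_bit..bottom_bit (inclusive, LSB = bit 0) of an in_width-bit
    pattern, returned as an unsigned integer (binary point at zero)."""
    if not 0 <= bottom_bit <= top_bit < in_width:
        raise ValueError("slice outside the input word")
    pattern = code & ((1 << in_width) - 1)        # raw bits, sign bit included
    return (pattern >> bottom_bit) & ((1 << (top_bit - bottom_bit + 1)) - 1)


def slice_upper_and_width(code, in_width, top_offset_from_msb, width):
    """'Upper bit location + width' mode."""
    top = in_width - 1 + top_offset_from_msb
    return slice_bits(code, in_width, top, top - width + 1)


def slice_two_locations(code, in_width, top_offset_from_msb, bottom_offset_from_lsb):
    """'Two bit locations' mode."""
    return slice_bits(code, in_width, in_width - 1 + top_offset_from_msb,
                      bottom_offset_from_lsb)


def concat(high, low, low_width):
    """Concatenate two unsigned integers; `high` lands in the upper bits."""
    if low >> low_width:
        raise ValueError("low does not fit in low_width bits")
    return (high << low_width) | low


x = 0b0110000000                                   # FIX_10_8 pattern for +1.5
print(slice_upper_and_width(x, 10, 0, 4))          # 6   (0110)
print(slice_two_locations(x, 10, -1, 5))           # 12  (1100)
print(concat(6, 12, 4))                            # 108 (0110 followed by 1100)
```

Both slice results are unsigned integers with the binary point at zero, regardless of the binary point of the FIX_10_8 input, which is why a Reinterpret or Convert usually follows a Slice.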
Important Concept 2: Sample Period (continued)
• The units of the sample period can be thought of as arbitrary, BUT many Simulink source blocks do have an essence of time
– For example, a sample period of 1/44100 means the block's function will be executed every 1/44100 of a second
• Remember the Nyquist theorem (Fs > 2·fmax) when setting sample periods
• The sample period of a block DIRECTLY relates to how that block will be clocked in the actual hardware. More on this later

Setting the Global Sample Period
• The Simulink system period MUST be set in the System Generator token. For single-rate systems it will be the same as the sample periods set in the design (sample period = 1 in the example). More on multirate designs later

SysGen Token
• Master controls and slave controls
• The "Simulink System Period" MUST be set correctly for the simulation to work

Using the Scope
• Click Properties to change the number of axes displayed and the time range (X-axis)
• Use the Data History tab to control how many values are stored and displayed on the scope
– Output can also be directed to the workspace
• Click Autoscale to let the tools quickly configure the display to the correct axis values
• Right-click the Y-axis to set its range

Design and Simulate in Simulink
• Push "play" to simulate the design. Go to "Simulation Parameters" under the "Simulation" menu to control the length of simulations
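The Nyquist reminder can be checked numerically: a real tone above Fs/2 does not disappear, it folds back into the band from 0 to Fs/2. The Python sketch below (a hypothetical helper, ideal sampling assumed, no anti-alias filter) computes where a tone lands for a sample period of 1/44100 s.

```python
def alias_frequency(f, fs):
    """Apparent frequency of a real tone f after sampling at fs: fold f into
    the band [0, fs/2]. Components above fs/2 violate Nyquist and alias."""
    f = f % fs
    return fs - f if f > fs / 2 else f


fs = 44100                            # sample period 1/44100 s, as on the slide
for tone in (10000, 22050, 30000):
    print(tone, alias_frequency(tone, fs))
# prints 10000 10000, then 22050 22050, then 30000 14100
```

A 30 kHz input sampled every 1/44100 s is indistinguishable from a 14.1 kHz tone, so the sample period has to be chosen from the highest frequency the model must carry.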
Generate the HDL Code
• Once the design is complete, double-click the System Generator token and:
  – Select the target device
  – Select the synthesis tool
  – Set the desired FPGA clock period
  – Select whether to generate the testbench
  – Generate the VHDL

System Generator Output Files
• Design files
  – <design>.VHD/.V: HDL design files
  – <design>_cw.VHD/.V: top-level HDL wrapper that contains the clock circuit and the SysGen design
  – .EDN and .NGC: core implementation files
  – <design>_cw.XCF: Xilinx constraints file for timing and location constraints
• Project files
  – <design>_cw.ISE: Project Navigator project file
  – .PRJ: synthesis project files for XST and Synplify
  – .TCL: scripts for Synplify and Leonardo project creation
• Simulation files
  – .DO: simulation scripts for MTI (ModelSim)
  – .DAT: data files containing the test vectors from System Generator
  – <design>_tb.VHD/.V: simulation testbench

Black Box
• Provides a way to import HDL models into System Generator
• Allows co-simulation of black-box HDL with Simulink using either ModelSim or ISE Simulator
• Integrates the imported HDL and implementation files (EDN, NGC) with the netlist generated by System Generator

Choice of Target Hardware
• JTAG hardware-in-the-loop development platforms:
  – Any board with a JTAG header
    • Configure your JTAG-based board in 20 minutes
    • Xilinx
      – XtremeDSP Development Kit (Virtex-4, Virtex-II Pro)
        » Also features a high-bandwidth interface via PCI
      – ML401 and ML402 boards (Virtex-4)
      – Multimedia Board (Virtex-II)
    • Distributors
      – Avnet, Insight, Nu Horizons
    • Key board vendors
      – Alphadata, Annapolis, Nallatech, Lyrtech
        » High-bandwidth interfaces
• Ethernet hardware co-simulation
Add Your Own Board in 20 Minutes
• Integrate your board into System Generator for hardware-in-the-loop co-simulation
• A simple wizard collects information about your target platform

Hardware Co-simulation
(1) Begin with a model that is ready to be compiled for hardware co-simulation.
(2) Open the SysGen GUI and select a compilation target.
(3) Press the Generate button in the SysGen GUI.
(4) A post-generation function is invoked to produce an FPGA configuration file.

Hardware Co-simulation
(5) The post-generation script creates a new library containing a parameterized run-time co-simulation block.
(6) The co-simulation run-time block is copied into the original model.
(7) The hardware output is bit- and cycle-accurate when compared to the original model.

Ethernet Hardware Co-simulation
• Networked Ethernet hardware co-simulation
  – Provides remote access over a 10/100 Mbps network
  – Communication handled by the EMAC OPB peripheral
  – Reconfiguration via the Ethernet connection using MicroBlaze + SystemACE
(Figure: FPGA Configuration)

WaveScope
• Debugging tool to visualize signals
  – Displays data in analog (à la Scope) and logic modes
  – Supports hex, decimal and binary radices
  – Allows cross-referencing signals in the model

Debugging the FPGA at System Speed (1)
• Insert ChipScope Pro blocks into your Simulink design
• Configure the FPGA using the JTAG interface
• ChipScope probes will be inserted into the FPGA
• Perform in-system debug at system speed
Debugging the FPGA at System Speed (2)
• New Shared Memory interfaces
  – Allow multiple, independent resources (inside and outside Simulink) to share a common address space
• What types of resources?
  – FPGA hardware (co-simulation)
  – Simulink blocks
  – MATLAB console
  – Command-line applications
• This makes for easy, in-system debugging
(Figure: Common Address Space)

Resource Estimator
• The block provides fast estimates of the FPGA resources required to implement the subsystem
• Most of the blocks in the System Generator Blockset carry resource information:
  – LUTs
  – FFs
  – BRAM
  – Embedded multipliers
  – 3-state buffers
  – I/Os

Resource Estimator
• Three types of estimation
  – Estimate Area
    • Computes resources for the current level and all sub-levels
  – Quick Sum
    • Uses the resources stored in the blocks directly and sums them up (no sub-level functions are invoked)
  – Post-Map Area
    • Opens a file browser and lets the user select a map report file. The design should already have been generated and gone through the synthesis, translate, and map phases.

Summary
• Full VHDL/Verilog (RTL code)
  – Advantages:
    • Portability
    • Complete control of the design implementation and tradeoffs
    • Easier to debug and understand code that you own
  – Disadvantages:
    • Can be time-consuming
    • You don't always have control over the synthesis tool
    • Need to be familiar with the algorithm and how to write it
    • Must be conversant with the synthesis tools to obtain an optimized design
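The difference between Estimate Area and Quick Sum can be illustrated with a toy model of a design hierarchy. This is not the Resource Estimator's implementation: the `Block` class, the block names and every resource count are hypothetical, chosen only to contrast a recursive total over all sub-levels with a flat sum of the values already stored at the current level.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    name: str
    stored: Counter = field(default_factory=Counter)       # resource counts recorded on this block
    children: List["Block"] = field(default_factory=list)  # sub-level blocks, if a subsystem


def estimate_area(block: Block) -> Counter:
    """Estimate Area style: current level and all sub-levels, by recursion."""
    if not block.children:
        return Counter(block.stored)
    total = Counter()
    for child in block.children:
        total.update(estimate_area(child))
    return total


def quick_sum(block: Block) -> Counter:
    """Quick Sum style: add what is stored on this level's blocks; no recursion."""
    total = Counter()
    for child in block.children:
        total.update(child.stored)
    return total


# Hypothetical two-tap datapath; the numbers are made up for illustration.
mult0 = Block("mult0", Counter(mult18x18=1, FF=36))
mult1 = Block("mult1", Counter(mult18x18=1, FF=36))
adder = Block("adder", Counter(LUT=37, FF=37))
taps = Block("taps", children=[mult0, mult1, adder])
gw_in = Block("gateway_in", Counter(IOB=16))
top = Block("top", children=[taps, gw_in])

print(estimate_area(top))  # includes the multipliers and adder inside "taps"
print(quick_sum(top))      # only gateway_in: nothing is stored on "taps" yet
```

Once the subsystem's own estimate has been stored on it (for example `taps.stored = estimate_area(taps)`), the flat sum and the recursive estimate agree, which is why a quick sum is only as good as the values already recorded on the blocks.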
Summary
• Full VHDL/Verilog (instantiating primitives)
  – Advantages:
    • Full access to all architecture features
    • Allows further optimization
    • Best optimization
  – Disadvantages:
    • Not as portable as RTL VHDL/Verilog
    • Must be an FPGA expert and know the architecture
    • Time-consuming

Summary
• CORE Generator
  – Advantages:
    • Can quickly access and generate existing functions
    • No need to reinvent the wheel and redesign a block if it meets specifications
    • IP is optimized for the specified architecture
  – Disadvantages:
    • IP doesn't always do exactly what you are looking for
    • Need to understand signals and parameters and match them to your specification
    • Dealing with a black box, with little information on how the function is implemented

Summary
• System Generator for DSP
  – Advantages:
    • Huge productivity gains through high-level modeling
    • Ability to simulate complete designs at the system level
    • Very attractive for FPGA novices
    • Excellent capabilities for designing complex testbenches
    • HDL testbench, test vectors and golden data written automatically
    • Hardware-in-the-loop simulation improves productivity and provides quick verification of whether the system functions correctly
  – Disadvantages:
    • Minor cost of abstraction: doesn't always give the best result in terms of area usage
    • Customer may not be familiar with Simulink
    • Not well suited to multiple-clock designs
    • No bidirectional bus support

End Day 2
07/07/2015 Ing Daniel A Jacoby