ADC Board VHDL Firmware development for Mona Lisa
Roy Wastie
Overview
• Introduction
• ADC Board
• Hardware Blocks
• Basic FPGA Architectures
• Xilinx ISE 10.1 Tool Flow
• USB
• Algorithm
• VHDL
Introduction
• Applications of FPGAs include digital signal processing, software-defined radio, aerospace and defense systems, ASIC prototyping, medical imaging, computer vision, speech recognition, cryptography, bioinformatics, computer hardware emulation, and glue logic for PCBs.
ADC Board
External Clock & Trigger
Hardware Blocks
[Block diagram labels: FIFO; FPGA memory controller; SDRAM memory; 16-channel ADC; FPGA DAQ; USB interface]
Basic FPGA Architectures
Overview
• All Xilinx FPGAs contain the same basic resources
  – Logic resources
    • Slices (grouped into CLBs)
      – Contain combinatorial logic and register resources
    • Memory
    • Multipliers
  – Interconnect resources
    • Programmable interconnect
    • IOBs
      – Interface between the FPGA and the outside world
  – Other resources
    • Global clock buffers
    • Boundary scan logic
Basic Building Block
Configurable Logic Block (CLB)
• Slices contain logic resources and are arranged in two columns
• A switch matrix provides access to general routing resources
• Local routing provides connections between slices in the same CLB and to neighboring CLBs
• A Virtex-II CLB contains four slices
[Figure: CLB with switch matrix, slices S0 to S3, two BUFTs, CIN/COUT carry connections, SHIFT connection, and local routing]
Basic Building Blocks
Simplified Slice Structure
• Each slice has four outputs
  – Two registered outputs, two non-registered outputs
  – Two BUFTs associated with each CLB, accessible by all 16 CLB outputs
• Carry logic runs vertically, up only
  – Two independent carry chains per CLB
[Figure: simplified slice with two LUT/carry stages, each feeding a register with D, CE, PRE, CLR, and Q pins]
The Slice
Detailed Structure
• The next few slides discuss the slice features
  – LUTs
  – MUXF5, MUXF6, MUXF7, MUXF8 (only the F5 and F6 MUX are shown in this diagram)
  – Carry logic
  – MULT_ANDs
  – Sequential elements
Combinatorial logic
• Boolean logic is stored in Look-Up Tables (LUTs)
• Also called Function Generators (FGs)
• Capacity is limited by the number of inputs, not by the complexity
• Delay through the LUT is constant

Example LUT contents for inputs A, B, C, D and output Z:

A B C D | Z
0 0 0 0 | 0
0 0 0 1 | 0
0 0 1 0 | 0
0 0 1 1 | 1
0 1 0 0 | 1
0 1 0 1 | 1
. . .
1 1 0 0 | 0
1 1 0 1 | 0
1 1 1 0 | 0
1 1 1 1 | 1
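As a rough illustration of why LUT capacity depends only on the number of inputs, the sketch below models a 4-input LUT as a 16-entry table indexed by the inputs. The entity name `lut4_sketch` and the `INIT` value are hypothetical and chosen only for this example; they are not taken from the slides or from a Xilinx primitive.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity lut4_sketch is
  generic (
    -- Hypothetical table contents: bit i is the output for input pattern i.
    -- x"8000" sets only bit 15, so Z = A and B and C and D.
    INIT : std_logic_vector(15 downto 0) := x"8000"
  );
  port (
    A, B, C, D : in  std_logic;
    Z          : out std_logic
  );
end lut4_sketch;

architecture rtl of lut4_sketch is
  signal addr : std_logic_vector(3 downto 0);
begin
  -- A is the most significant address bit, matching the column order A B C D.
  addr <= A & B & C & D;
  -- Any 4-input Boolean function is just a different INIT value.
  Z <= INIT(to_integer(unsigned(addr)));
end rtl;
```

Changing `INIT` changes the function without changing the structure or the delay, which is the point the bullets make.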
Storage Elements
Can be implemented as either flip-flops or latches
• Two in each slice; eight in each CLB
• Inputs come from LUTs or from an independent CLB input
• Separate set and reset controls
  – Can be synchronous or asynchronous
• All controls are shared within a slice
  – Control signals can be inverted locally within a slice
[Figure: flip-flop and latch symbols, including FDCPE and LDCPE, with D, CE, set/preset, reset/clear, G (latch gate), and Q pins]
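A minimal sketch of how one such storage element is usually described in VHDL: a D flip-flop with clock enable, asynchronous clear, and asynchronous preset, in the style of the FDCPE symbol. The entity name `ff_sketch` is hypothetical, and giving clear priority over preset is an assumption of this example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity ff_sketch is
  port (
    CLK, CE, CLR, PRE, D : in  std_logic;
    Q                    : out std_logic
  );
end ff_sketch;

architecture rtl of ff_sketch is
begin
  process (CLK, CLR, PRE)
  begin
    if CLR = '1' then            -- asynchronous clear (highest priority here)
      Q <= '0';
    elsif PRE = '1' then         -- asynchronous preset
      Q <= '1';
    elsif rising_edge(CLK) then
      if CE = '1' then           -- clock enable gates the update
        Q <= D;
      end if;
    end if;
  end process;
end rtl;
```

Moving the clear and preset tests inside the `rising_edge` branch would give the synchronous variant instead.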
Dedicated Logic
FPGAs contain built-in logic for speeding up logic operations and saving resources
• Multiplexer logic
  – Connects slices and LUTs
• Carry chains
  – Speed up arithmetic operations
• Multiplier AND gate
  – Speeds up LUT-based multiplication
• Shift register LUT
  – LUT-based shift register
• Embedded multiplier
  – 18x18 multiplier
Multiplexer Logic
Dedicated MUXes are provided to connect slices and LUTs
• MUXF5 combines the LUTs in each slice
• MUXF6 combines slices S0 and S1; a second MUXF6 combines slices S2 and S3
• MUXF7 combines the two MUXF6 outputs
• MUXF8 combines the two MUXF7 outputs (from the CLB above or below)
[Figure: CLB showing slices S0 to S3 and the MUXF5 to MUXF8 connections]
Carry Chains
Dedicated carry chains speed up arithmetic operations
• Simple, fast, and complete arithmetic logic
  – Dedicated XOR gate for single-level sum completion
  – Uses dedicated routing resources
  – All synthesis tools can infer carry logic
[Figure: two carry chains in a CLB. The first runs from CIN through slices S0 and S1, with COUT going to S0 of the next CLB; the second runs through slices S2 and S3, with COUT going to the CIN of S2 of the next CLB.]
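Because synthesis tools infer carry logic, an adder is normally written with the `+` operator and left to the tool to map. The following is a minimal sketch of a 16-bit adder with carry out; the entity name `add16_sketch` and the 16-bit width are assumptions made for the example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity add16_sketch is
  port (
    A, B : in  unsigned(15 downto 0);
    SUM  : out unsigned(15 downto 0);
    COUT : out std_logic
  );
end add16_sketch;

architecture rtl of add16_sketch is
  signal tmp : unsigned(16 downto 0);  -- one extra bit to hold the carry out
begin
  -- Extend both operands by one bit so the carry is not lost.
  tmp  <= resize(A, 17) + resize(B, 17);
  SUM  <= tmp(15 downto 0);
  COUT <= tmp(16);
end rtl;
```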
Multiplier AND Gate
Speed up LUT-based multiplication
• Highly efficient multiply and add implementation
  – Earlier FPGA architectures require two LUTs per bit to perform the multiplication and addition
  – The MULT_AND gate enables an area reduction by performing the multiply and the add in one LUT per bit
[Figure: LUT with a MULT_AND gate forming A x B, feeding the carry multiplexer CY_MUX (S, DI, CI, CO) and CY_XOR]
Shift Register LUT (SRL16CE)
The shift register LUT avoids the need for dedicated registers
• Dynamically addressable serial shift registers
  – Maximum delay of 16 clock cycles per LUT (128 per CLB)
  – Cascadable to other LUTs or CLBs for longer shift registers
    • Dedicated connection from Q15 to the D input of the next SRL16CE
  – Shift register length can be changed asynchronously by toggling address A
[Figure: SRL16CE LUT with inputs D, CE, CLK, and A[3:0], a chain of D/CE/Q stages, and outputs Q and Q15 (cascade out)]
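A sketch of the behaviour described above, written so that a synthesis tool can map it onto a shift register LUT: a 16-stage shift register with clock enable, no reset, and an addressable tap. The entity name `srl16_sketch` is hypothetical; whether a given tool actually infers an SRL16CE from it depends on the tool and its settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity srl16_sketch is
  port (
    CLK, CE, D : in  std_logic;
    A          : in  std_logic_vector(3 downto 0);  -- selects the tap (delay)
    Q          : out std_logic;                      -- addressable output
    Q15        : out std_logic                       -- cascade output
  );
end srl16_sketch;

architecture rtl of srl16_sketch is
  signal sr : std_logic_vector(15 downto 0) := (others => '0');
begin
  process (CLK)
  begin
    if rising_edge(CLK) then
      if CE = '1' then
        sr <= sr(14 downto 0) & D;  -- shift in at bit 0; no reset on purpose
      end if;
    end if;
  end process;

  Q   <= sr(to_integer(unsigned(A)));  -- asynchronous tap selection
  Q15 <= sr(15);
end rtl;
```

With `A = "0000"` the output follows the input after one enabled clock; with `A = "1111"` it follows after sixteen.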
Embedded Multiplier Blocks
Avoids using LUTs to implement multiplications and increases performance
• 18-bit two's complement signed operation
• Optimized to implement multiply and accumulate functions
• Multipliers are physically located next to block SelectRAM™ memory
[Figure: 18 x 18 multiplier with inputs Data_A (18 bits) and Data_B (18 bits) and Output (36 bits); supports 4 x 4, 8 x 8, 12 x 12, and 18 x 18 signed operation]
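A minimal sketch of a signed 18 x 18 multiply with a registered 36-bit product, which a synthesis tool may map onto an embedded multiplier block. The entity name `mult18_sketch` and the single output register are assumptions of this example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mult18_sketch is
  port (
    CLK  : in  std_logic;
    A, B : in  signed(17 downto 0);   -- Data_A and Data_B, 18 bits each
    P    : out signed(35 downto 0)    -- 36-bit product
  );
end mult18_sketch;

architecture rtl of mult18_sketch is
begin
  process (CLK)
  begin
    if rising_edge(CLK) then
      P <= A * B;  -- signed * signed gives 18 + 18 = 36 bits
    end if;
  end process;
end rtl;
```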
IOB Element
Connects the FPGA design to external components
• Input path
  – Two DDR registers
• Output path
  – Two DDR registers
  – Two 3-state enable DDR registers
• Separate clocks and clock enables for I and O
• Set and reset signals are shared
[Figure: IOB with 3-state path (two registers clocked by OCK1 and OCK2 into a DDR MUX), output path (two registers clocked by OCK1 and OCK2 into a DDR MUX), input path (two registers clocked by ICK1 and ICK2), and the PAD]
Distributed RAM
Uses a LUT in a slice as memory
• Synchronous write
• Asynchronous read
  – Accompanying flip-flops can be used to create synchronous read
• RAM and ROM are initialized during configuration
  – Data can be written to RAM after configuration
• Emulated dual-port RAM
  – One read/write port
  – One read-only port
• 1 LUT = 16 RAM bits
[Figure: distributed RAM primitives. RAM16X1S: D, WE, WCLK, A0 to A3, O. RAM32X1S: D, WE, WCLK, A0 to A4, O. RAM16X1D: D, WE, WCLK, A0 to A3, DPRA0 to DPRA3, SPO, DPO.]
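The synchronous-write, asynchronous-read behaviour can be sketched as below, using port names borrowed from the RAM16X1D symbol. The entity name `dist_ram16_sketch` is hypothetical, and this is a behavioural description that a tool may map to distributed RAM, not an instantiation of the primitive.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity dist_ram16_sketch is
  port (
    WCLK, WE, D : in  std_logic;
    A           : in  std_logic_vector(3 downto 0);  -- read/write port address
    DPRA        : in  std_logic_vector(3 downto 0);  -- read-only port address
    SPO         : out std_logic;                     -- read/write port output
    DPO         : out std_logic                      -- read-only port output
  );
end dist_ram16_sketch;

architecture rtl of dist_ram16_sketch is
  signal ram : std_logic_vector(15 downto 0) := (others => '0');  -- 16 x 1 bits
begin
  process (WCLK)
  begin
    if rising_edge(WCLK) then
      if WE = '1' then
        ram(to_integer(unsigned(A))) <= D;  -- synchronous write
      end if;
    end if;
  end process;

  SPO <= ram(to_integer(unsigned(A)));     -- asynchronous read
  DPO <= ram(to_integer(unsigned(DPRA)));  -- asynchronous read, second port
end rtl;
```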
Block RAM
Embedded blocks of RAM arranged in columns
• Up to 3.5 Mb of RAM in 18-kb blocks
  – Synchronous read and write
• True dual-port memory
  – Each port has synchronous read and write capability
  – Different clocks for each port
• Supports initial values
• Synchronous reset on output latches
• Supports parity bits
  – One parity bit per eight data bits
• Situated next to embedded multiplier for fast multiply-accumulate
[Figure: 18-kb block SelectRAM memory. Port A: DIA, DIPA, ADDRA, WEA, ENA, SSRA, CLKA, DOA, DOPA. Port B: DIB, DIPB, ADDRB, WEB, ENB, SSRB, CLKB, DOB, DOPB.]
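In contrast to distributed RAM, block RAM reads are synchronous. The sketch below describes a 1024 x 18 memory (18 kb, counting 16 data bits plus 2 parity bits per word) with a write port on one clock and a read port on another. The entity name `bram_sketch`, the chosen organisation, and the use of only one write port are assumptions of this example; it does not exercise the full true dual-port capability.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity bram_sketch is
  port (
    CLKA  : in  std_logic;
    WEA   : in  std_logic;
    ADDRA : in  std_logic_vector(9 downto 0);
    DIA   : in  std_logic_vector(17 downto 0);
    CLKB  : in  std_logic;
    ADDRB : in  std_logic_vector(9 downto 0);
    DOB   : out std_logic_vector(17 downto 0)
  );
end bram_sketch;

architecture rtl of bram_sketch is
  type ram_t is array (0 to 1023) of std_logic_vector(17 downto 0);
  signal ram : ram_t := (others => (others => '0'));  -- initial values
begin
  -- Port A: synchronous write
  process (CLKA)
  begin
    if rising_edge(CLKA) then
      if WEA = '1' then
        ram(to_integer(unsigned(ADDRA))) <= DIA;
      end if;
    end if;
  end process;

  -- Port B: synchronous read on an independent clock
  process (CLKB)
  begin
    if rising_edge(CLKB) then
      DOB <= ram(to_integer(unsigned(ADDRB)));
    end if;
  end process;
end rtl;
```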
Global Routing
• Sixteen dedicated global clock multiplexers
  – Eight on the top-center of the die, eight on the bottom-center
  – Driven by a clock input pad, a DCM, or local routing
• Global clock multiplexers provide the following:
  – Traditional clock buffer (BUFG) function
  – Global clock enable capability (BUFGCE)
  – Glitch-free switching between clock signals (BUFGMUX)
• Up to eight clock nets can be used in each clock region of the device
  – Each device contains four or more clock regions
Digital Clock Manager (DCM)
• Up to twelve DCMs per device
  – Located on the top and bottom edges of the die
  – Driven by clock input pads
• DCMs provide the following:
  – Delay-Locked Loop (DLL)
  – Digital Frequency Synthesizer (DFS)
  – Digital Phase Shifter (DPS)
• Up to four outputs of each DCM can drive onto global clock buffers
  – All DCM outputs can drive general routing
The Spartan-3 Family
Built for high-volume, low-cost applications
• 18x18-bit embedded pipelined multipliers for efficient DSP
• Up to eight on-chip Digital Clock Managers to support multiple system clocks
• Configurable 18K block RAMs plus distributed RAM
• 4 I/O banks, with support for all I/O standards including PCI, DDR333, RSDS, and mini-LVDS
Spartan-3 Family
Based upon the Virtex-II architecture, optimized for lower cost
• Smaller process = lower core voltage
  – 0.09 micron versus 0.15 micron
  – Vccint = 1.2 V versus 1.5 V
• Logic resources
  – Only one-half of the slices support RAM or SRL16s (SLICEM)
  – Fewer block RAMs and multiplier blocks
• Clock resources
  – Fewer global clock multiplexers and DCM blocks
• I/O resources
  – Fewer pins per package
  – No internal 3-state buffers
  – Support for different standards
    • New standards: 1.2 V LVCMOS, 1.8 V HSTL, and SSTL
    • Default is LVCMOS, versus LVTTL
SLICEM and SLICEL
• Each Spartan™-3 CLB contains four slices
  – Similar to the Virtex™-II
• Slices are grouped in pairs
  – Left-hand SLICEM (Memory)
    • LUTs can be configured as memory or SRL16
  – Right-hand SLICEL (Logic)
    • LUTs can be used as logic only
[Figure: Spartan-3 CLB with switch matrix, left-hand SLICEM pair (Slice X0Y0, Slice X0Y1) and right-hand SLICEL pair (Slice X1Y0, Slice X1Y1), with CIN/COUT, SHIFTIN/SHIFTOUT, and fast connects]
Xilinx Tool Flow
Xilinx Design Flow
[Flow diagram:]
• Plan & Budget
• Create Code/Schematic
• HDL RTL Simulation
• Synthesize to create netlist
• Functional Simulation
• Implement
  – Translate
  – Map
  – Place & Route
• Attain Timing Closure
• Timing Simulation
• Generate BIT File
• Configure FPGA
Synthesis
Generate a netlist file
• After writing your HDL code, you will need a tool to generate a netlist (NGC or EDIF)
  – Xilinx Synthesis Tool (XST) included
  – Support for popular third-party synthesis tools: Synplify, Leonardo Spectrum
Implementation
Process a netlist file
• Consists of three phases
  – Translate: Merge multiple design files into a single netlist
  – Map: Group logical symbols from the netlist (gates) into physical components (slices and IOBs)
  – Place & Route: Place components onto the chip, connect the components, and extract timing data into reports
• Access Xilinx reports and tools at each phase
  – Timing Analyzer, Floorplanner, FPGA Editor, XPower
[Figure: the netlist generated from synthesis passes through Implement: Translate, Map, Place & Route]
Configuration
• Once a design is implemented, you must create a file that the FPGA can understand
  – This file is called a bitstream: a BIT file (.bit extension)
• The BIT file can be downloaded
  – Directly into the FPGA
    • Use a download cable such as Platform USB
  – To an external memory device such as a Xilinx Platform Flash PROM
    • Must first be converted into a PROM file
ISE Project Navigator
Xilinx ISE Foundation is built around the Xilinx Design Flow
• Enter designs
• Access to synthesis tools
  – Including third-party synthesis tools
• Implement your design with a simple double-click
  – Fine-tune with easy-to-access software options
• Download
  – Generate a bitstream
  – Configure FPGA using iMPACT
Synthesizing Designs
Generate a netlist file using XST (Xilinx Synthesis Technology)
1. Highlight the HDL sources
2. Double-click to synthesize
Synthesis processes and analysis
• Access report
• View schematics (RTL or Technology)
• Check syntax
• Generate post-synthesis simulation model
The Design Summary Displays Design Data
• Quick view of reports and constraints
• Project status
• Device utilization
• Design summary options
• Performance and constraints
• Reports
Outline
• Overview
• ISE
• Summary
Lab 1: Xilinx Tool Flow
USB
USB2
• Point to point: each peripheral connects to the host, directly or through hubs.
• Host computer is master.
• 480 Mbit/s signalling rate; 53.24 MB/s theoretical maximum throughput.
• 30 MB/s readily achievable in bulk transfer mode.
• Speeds: Low and Full (USB 1.x), High (USB 2.0).
• Hot pluggable.
• Peripheral electronics can be relatively simple and inexpensive.
• Power: 500 mA from the bus.
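The theoretical figure of about 53.24 can be reproduced as a worked calculation, assuming it refers to bulk transfers and is in megabytes per second. The assumption is that high-speed USB allows at most 13 bulk transactions with 512-byte payloads in each 125 µs microframe (8000 microframes per second), with the rest of the 480 Mbit/s taken up by protocol overhead:

```latex
\[
\frac{480\ \text{Mbit/s}}{8\ \text{bit/byte}} = 60\ \text{MB/s (raw signalling rate)}
\]
\[
13 \times 512\ \text{bytes} \times 8000\ \text{microframes/s}
  = 53\,248\,000\ \text{bytes/s} \approx 53.2\ \text{MB/s}
\]
```

The 30 MB/s quoted as readily achievable is therefore a little over half of this upper bound.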
USB Data Travels in Packets
• Identified by "Packet ID" (PID)
• Token packet tells what's coming
• Data packets deliver bytes
• Handshake packets report success or otherwise
USB Packets
Stage                 Token packet                 Data packet              Handshake packet
Setup Stage           SYNC SETUP ADDR ENDP CRC5    SYNC DATA0 Data CRC16    SYNC ACK
Data Stage            SYNC OUT ADDR ENDP CRC5      SYNC DATA1 Data CRC16    SYNC ACK
Data Stage (cont'd)   SYNC OUT ADDR ENDP CRC5      SYNC DATA0 Data CRC16    SYNC ACK
Data Stage (cont'd)   SYNC OUT ADDR ENDP CRC5      SYNC DATA1 Data CRC16    SYNC ACK
Status Stage          SYNC IN ADDR ENDP CRC5       SYNC DATA1 CRC16         SYNC ACK
A Control Write Transfer
USB2 Controller
• EZ-USB FX2LP(TM) USB Microcontroller High-Speed USB Peripheral Controller • Integrated 8051 Microprocessor.
• Code/Data Downloaded via USB, or EEPROM.
• Many Integrated Peripherals.
Simple Algorithm
• Sample data at the full rate of 2.77 Ms/s (16 channels)
• Down-convert the data by a factor of 4
• Write data to the USB interface at 21.19 MB/s
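The slides do not say how the reduction by 4 is done. One simple reading, sketched below purely as an assumption, is to keep one sample in every four from the incoming stream. The entity name `decim4_sketch`, the 16-bit sample width, and the valid-strobe interface are all hypothetical and not taken from the Mona Lisa firmware.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity decim4_sketch is
  port (
    CLK, RST   : in  std_logic;
    DIN_VALID  : in  std_logic;                      -- high when DIN holds a new sample
    DIN        : in  std_logic_vector(15 downto 0);  -- assumed 16-bit samples
    DOUT_VALID : out std_logic;                      -- high for one clock per kept sample
    DOUT       : out std_logic_vector(15 downto 0)
  );
end decim4_sketch;

architecture rtl of decim4_sketch is
  signal cnt : unsigned(1 downto 0) := "00";  -- counts 0, 1, 2, 3, 0, ...
begin
  process (CLK)
  begin
    if rising_edge(CLK) then
      DOUT_VALID <= '0';               -- default: nothing to output
      if RST = '1' then
        cnt <= "00";
      elsif DIN_VALID = '1' then
        if cnt = 0 then                -- keep the first of every four samples
          DOUT       <= DIN;
          DOUT_VALID <= '1';
        end if;
        cnt <= cnt + 1;                -- 2-bit counter wraps from 3 back to 0
      end if;
    end if;
  end process;
end rtl;
```

Under the same assumption of 2 bytes per sample, 16 channels at 2.77 Ms/s reduced by 4 give roughly 22 million bytes per second, which is of the same order as the 21.19 MB/s figure. A real design would normally filter before discarding samples to limit aliasing.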
VHDL
VHDL Example
An example of a two-input XNOR gate is shown below.
library ieee;
use ieee.std_logic_1164.all;

entity XNOR2 is
  port (A, B : in  std_logic;
        Z    : out std_logic);
end XNOR2;

architecture behavioral_xnor of XNOR2 is
  -- signal declaration (of internal signals X, Y)
  signal X, Y : std_logic;
begin
  X <= A and B;              -- both inputs high
  Y <= (not A) and (not B);  -- both inputs low
  Z <= X or Y;               -- high when the inputs are equal
end behavioral_xnor;
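To check the gate in simulation, a small self-checking testbench can walk through all four input combinations. This is a sketch that assumes the XNOR2 entity above has been compiled into the `work` library with the IEEE `std_logic_1164` package visible; the testbench name `xnor2_tb` and the 10 ns step are arbitrary choices.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity xnor2_tb is
end xnor2_tb;

architecture sim of xnor2_tb is
  signal A, B, Z : std_logic;
begin
  -- Device under test: the XNOR2 entity from the example above
  dut : entity work.XNOR2
    port map (A => A, B => B, Z => Z);

  stimulus : process
  begin
    A <= '0'; B <= '0'; wait for 10 ns;
    assert Z = '1' report "A=0 B=0 should give Z=1" severity error;

    A <= '0'; B <= '1'; wait for 10 ns;
    assert Z = '0' report "A=0 B=1 should give Z=0" severity error;

    A <= '1'; B <= '0'; wait for 10 ns;
    assert Z = '0' report "A=1 B=0 should give Z=0" severity error;

    A <= '1'; B <= '1'; wait for 10 ns;
    assert Z = '1' report "A=1 B=1 should give Z=1" severity error;

    report "xnor2_tb finished" severity note;
    wait;  -- stop the process so the simulation can end
  end process;
end sim;
```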