ADC Board VHDL Firmware development for Mona Lisa


Roy Wastie

Overview

• Introduction
• ADC Board
• Hardware Blocks
• Basic FPGA Architectures
• Xilinx ISE 10.1 Tool Flow
• USB
• Algorithm
• VHDL

Introduction

• Applications of FPGAs include digital signal processing, software-defined radio, aerospace and defense systems, ASIC prototyping, medical imaging, computer vision, speech recognition, cryptography, bioinformatics, computer hardware emulation, and glue logic for PCBs.

ADC Board

External Clock & Trigger

Hardware Blocks

[Block diagram: 16-channel ADC, FPGA DAQ, FIFO, FPGA memory controller, SDRAM memory, USB interface]

Basic FPGA Architectures

Overview

• All Xilinx FPGAs contain the same basic resources
  – Logic Resources
    • Slices (grouped into CLBs): contain combinatorial logic and register resources
    • Memory
    • Multipliers
  – Interconnect Resources
    • Programmable interconnect
    • IOBs: interface between the FPGA and the outside world
  – Other resources
    • Global clock buffers
    • Boundary scan logic

Basic Building Block

Configurable Logic Block
• Slices contain logic resources and are arranged in two columns
• A switch matrix provides access to general routing resources
• Local routing provides connection between slices in the same CLB, and it provides routing to neighboring CLBs
• A Virtex-II CLB contains four slices

[Figure: CLB with switch matrix, two BUFTs, slices S0 to S3, carry inputs and outputs (CIN, COUT), SHIFT, and local routing]

Basic Building Blocks

Simplified Slice Structure
• Each slice has four outputs
  – Two registered outputs, two non-registered outputs
  – Two BUFTs associated with each CLB, accessible by all 16 CLB outputs
• Carry logic runs vertically, up only
  – Two independent carry chains per CLB

[Figure: simplified slice with two LUT and carry stages, each feeding a register with D, CE, PRE, and CLR inputs and a Q output]

The Slice

Detailed Structure
• The next few slides discuss the slice features
  – LUTs
  – MUXF5, MUXF6, MUXF7, MUXF8 (only the F5 and F6 MUX are shown in this diagram)
  – Carry Logic
  – MULT_ANDs
  – Sequential Elements

Combinatorial logic

Boolean logic is stored in Look-Up Tables (LUTs)
• Also called Function Generators (FGs)
• Capacity is limited by the number of inputs, not by the complexity
• Delay through the LUT is constant

Truth table held in a four-input LUT (inputs A, B, C, D; output Z):

A B C D | Z
0 0 0 0 | 0
0 0 0 1 | 0
0 0 1 0 | 0
0 0 1 1 | 1
0 1 0 0 | 1
0 1 0 1 | 1
. . .
1 1 0 0 | 0
1 1 0 1 | 0
1 1 1 0 | 0
1 1 1 1 | 1
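As an illustration of the idea that a LUT is just a small stored truth table, the following is a minimal VHDL sketch, not the vendor primitive. The entity name lut4_sketch and the generic INIT are hypothetical, and the default contents (a four-input AND) are chosen only for the example; they are not the truth table from the slide, whose middle rows are elided.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Illustrative model of a 4-input LUT: the truth table is a 16-bit constant
-- and the four inputs simply select one bit of it.
entity lut4_sketch is
  generic (
    -- Hypothetical contents: bit i holds Z for the input value (A,B,C,D) = i.
    -- x"8000" sets only bit 15, i.e. a 4-input AND.
    INIT : std_logic_vector(15 downto 0) := x"8000"
  );
  port (
    A, B, C, D : in  std_logic;
    Z          : out std_logic
  );
end entity lut4_sketch;

architecture rtl of lut4_sketch is
  signal addr : unsigned(3 downto 0);
begin
  addr <= A & B & C & D;           -- A is the most significant address bit
  Z    <= INIT(to_integer(addr));  -- same lookup whatever function is stored
end architecture rtl;
```

Changing INIT changes the Boolean function without changing the structure, which is why capacity depends on the number of inputs and the delay does not depend on the function.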

Storage Elements

Can be implemented as either flip-flops or latches
• Two in each slice; eight in each CLB
• Inputs come from LUTs or from an independent CLB input
• Separate set and reset controls
  – Can be synchronous or asynchronous
• All controls are shared within a slice
  – Control signals can be inverted locally within a slice

[Figure: storage element symbols: a flip-flop with D, CE, S, R, Q; FDCPE with D, PRE, CE, CLR, Q; LDCPE with D, PRE, CE, G, CLR, Q]
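A storage element of the FDCPE kind (D flip-flop with clock enable, asynchronous preset, and asynchronous clear) can be described behaviourally as below. This is a sketch under assumptions: the entity name fdcpe_sketch and the clock port name C are chosen for illustration, and giving clear priority over preset is a choice made in this example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity fdcpe_sketch is
  port (
    C   : in  std_logic;  -- clock (port name assumed for this sketch)
    CE  : in  std_logic;  -- clock enable
    CLR : in  std_logic;  -- asynchronous clear
    PRE : in  std_logic;  -- asynchronous preset
    D   : in  std_logic;
    Q   : out std_logic
  );
end entity fdcpe_sketch;

architecture rtl of fdcpe_sketch is
begin
  process (C, CLR, PRE)
  begin
    if CLR = '1' then          -- clear takes priority in this sketch
      Q <= '0';
    elsif PRE = '1' then
      Q <= '1';
    elsif rising_edge(C) then
      if CE = '1' then         -- D is captured only when enabled
        Q <= D;
      end if;
    end if;
  end process;
end architecture rtl;
```

Moving the CLR and PRE tests inside the rising_edge branch would give the synchronous variant mentioned in the list above.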

Dedicated Logic

FPGAs contain built-in logic for speeding up logic operations and saving resources
• Multiplexer Logic: connects slices and LUTs
• Carry Chains: speed up arithmetic operations
• Multiplier AND gate: speeds up LUT-based multiplication
• Shift Register LUT: LUT-based shift register
• Embedded Multiplier: 18x18 multiplier

Multiplexer Logic

Dedicated MUXes provided to connect slices and LUTs
• MUXF5 combines LUTs in each slice
• MUXF6 combines slices S0 and S1
• MUXF6 combines slices S2 and S3
• MUXF7 combines the two MUXF6 outputs
• MUXF8 combines the two MUXF7 outputs (from the CLB above or below)

[Figure: CLB showing slices S0 to S3 and the MUXF5 to MUXF8 connections]

Carry Chains

Dedicated carry chains speed up arithmetic operations
• Simple, fast, and complete arithmetic logic
  – Dedicated XOR gate for single-level sum completion
  – Uses dedicated routing resources
  – All synthesis tools can infer carry logic

[Figure: two carry chains in a CLB, each running from CIN to COUT; one passes through slices S0 and S1 and continues to S0 of the next CLB, the other passes through slices S2 and S3 and continues to CIN of S2 of the next CLB]
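Because synthesis tools infer carry logic from ordinary arithmetic, no special coding is needed to use the carry chain. The sketch below is a plain adder written with numeric_std; the entity name add_sketch and the generic WIDTH are hypothetical, and whether a given tool maps it onto the dedicated chain is an expectation rather than a guarantee.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity add_sketch is
  generic (WIDTH : positive := 16);  -- hypothetical operand width
  port (
    a, b : in  unsigned(WIDTH-1 downto 0);
    cin  : in  std_logic;
    sum  : out unsigned(WIDTH downto 0)  -- one extra bit for the carry out
  );
end entity add_sketch;

architecture rtl of add_sketch is
  signal cin_u : unsigned(0 downto 0);
begin
  cin_u(0) <= cin;  -- carry in as a 1-bit unsigned value

  -- Operands are widened by one bit so the carry out is kept in sum(WIDTH).
  sum <= resize(a, WIDTH+1) + resize(b, WIDTH+1) + cin_u;
end architecture rtl;
```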

Multiplier AND Gate

Speed up LUT-based multiplication
• Highly efficient multiply and add implementation
  – Earlier FPGA architectures require two LUTs per bit to perform the multiplication and addition
  – The MULT_AND gate enables an area reduction by performing the multiply and the add in one LUT per bit

[Figure: LUT feeding CY_MUX (S, DI, CO) and CY_XOR, with MULT_AND forming A x B]

Shift Register LUT (SRL16CE)

The shift register LUT avoids the need for dedicated registers
• Dynamically addressable serial shift registers
  – Maximum delay of 16 clock cycles per LUT (128 per CLB)
  – Cascadable to other LUTs or CLBs for longer shift registers
    • Dedicated connection from Q15 to D input of the next SRL16CE
  – Shift register length can be changed asynchronously by toggling address A

[Figure: LUT configured as a chain of D/CE/Q stages with inputs D, CE, CLK, and A[3:0], and outputs Q and Q15 (cascade out)]
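The behaviour described above can be sketched in VHDL as a 16-stage shift register with an addressable tap. The entity name srl16_sketch is hypothetical; the port names follow the labels in the figure. The description deliberately has no reset, since a reset on every stage would generally prevent mapping into a LUT.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity srl16_sketch is
  port (
    CLK : in  std_logic;
    CE  : in  std_logic;                     -- shift enable
    D   : in  std_logic;                     -- serial input
    A   : in  std_logic_vector(3 downto 0);  -- selects the tap (delay - 1)
    Q   : out std_logic;                     -- addressable output
    Q15 : out std_logic                      -- cascade output
  );
end entity srl16_sketch;

architecture rtl of srl16_sketch is
  signal sr : std_logic_vector(15 downto 0) := (others => '0');
begin
  process (CLK)
  begin
    if rising_edge(CLK) then
      if CE = '1' then
        sr <= sr(14 downto 0) & D;  -- new bit enters at index 0
      end if;
    end if;
  end process;

  Q   <= sr(to_integer(unsigned(A)));  -- read is combinatorial in A
  Q15 <= sr(15);                       -- feeds D of the next stage when cascaded
end architecture rtl;
```

With A = "0000" the output is the input delayed by one enabled clock; with A = "1111" it is delayed by sixteen.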

Embedded Multiplier Blocks

Avoids using LUTs to implement multiplications and increases performance
• 18-bit two's complement signed operation
• Optimized to implement Multiply and Accumulate functions
• Multipliers are physically located next to block SelectRAM™ memory

[Figure: 18 x 18 multiplier with inputs Data_A (18 bits) and Data_B (18 bits) and Output (36 bits); usable as 4 x 4, 8 x 8, 12 x 12, or 18 x 18 signed]
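A signed 18 x 18 multiplication with a 36-bit result can be written directly with numeric_std, as in the sketch below. The entity name mult18_sketch and the registered output are choices made for this example; mapping onto an embedded multiplier block is what a synthesis tool would be expected to do, not something the code forces.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mult18_sketch is
  port (
    clk    : in  std_logic;
    data_a : in  signed(17 downto 0);
    data_b : in  signed(17 downto 0);
    p      : out signed(35 downto 0)
  );
end entity mult18_sketch;

architecture rtl of mult18_sketch is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- signed * signed in numeric_std yields 18 + 18 = 36 bits
      p <= data_a * data_b;
    end if;
  end process;
end architecture rtl;
```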

IOB Element

Connects the FPGA design to external components
• Input path
  – Two DDR registers
• Output path
  – Two DDR registers
  – Two 3-state enable DDR registers
• Separate clocks and clock enables for I and O
• Set and reset signals are shared

[Figure: IOB with two input registers (ICK1, ICK2), two output registers and two 3-state registers (OCK1, OCK2) each pair feeding a DDR MUX, and the PAD]

Distributed RAM

Uses a LUT in a slice as memory
• Synchronous write
• Asynchronous read
  – Accompanying flip-flops can be used to create synchronous read
• RAM and ROM are initialized during configuration
  – Data can be written to RAM after configuration
• Emulated dual-port RAM
  – One read/write port
  – One read-only port
• 1 LUT = 16 RAM bits

[Figure: primitives RAM16X1S (D, WE, WCLK, A0 to A3, O), RAM32X1S (D, WE, WCLK, A0 to A4, O), and RAM16X1D (D, WE, WCLK, A0 to A3, SPO, DPRA0 to DPRA3, DPO)]
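The synchronous-write, asynchronous-read behaviour of a 16 x 1 single-port distributed RAM can be sketched as follows. The entity name ram16x1s_sketch is hypothetical, the address is grouped into one vector A rather than separate A0 to A3 pins, and the all-zero initial contents are an assumption for the example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity ram16x1s_sketch is
  port (
    WCLK : in  std_logic;
    WE   : in  std_logic;
    A    : in  std_logic_vector(3 downto 0);
    D    : in  std_logic;
    O    : out std_logic
  );
end entity ram16x1s_sketch;

architecture rtl of ram16x1s_sketch is
  -- 16 bits of storage: the capacity of one LUT
  signal mem : std_logic_vector(15 downto 0) := (others => '0');
begin
  process (WCLK)
  begin
    if rising_edge(WCLK) then
      if WE = '1' then
        mem(to_integer(unsigned(A))) <= D;  -- synchronous write
      end if;
    end if;
  end process;

  O <= mem(to_integer(unsigned(A)));        -- asynchronous read
end architecture rtl;
```

Registering O in a clocked process would give the synchronous read that the accompanying flip-flops provide.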

Block RAM

Embedded blocks of RAM arranged in columns
• Up to 3.5 Mb of RAM in 18-kb blocks
  – Synchronous read and write
• True dual-port memory
  – Each port has synchronous read and write capability
  – Different clocks for each port
• Supports initial values
• Synchronous reset on output latches
• Supports parity bits
  – One parity bit per eight data bits
• Situated next to embedded multiplier for fast multiply-accumulate

[Figure: 18-kb block SelectRAM memory with port A signals DIA, DIPA, ADDRA, WEA, ENA, SSRA, CLKA, DOA, DOPA and port B signals DIB, DIPB, ADDRB, WEB, ENB, SSRB, CLKB, DOB, DOPB]
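In contrast to distributed RAM, block RAM reads are synchronous. The sketch below shows a reduced form of the memory: port A only writes and port B only reads, each on its own clock, so it is not the full true dual-port primitive. The entity name bram_sketch, the generics, and the 1024 x 16 default size (chosen to fit inside one 18-kb block) are assumptions for the example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity bram_sketch is
  generic (
    ADDR_BITS : positive := 10;  -- hypothetical: 1024 words
    DATA_BITS : positive := 16   -- hypothetical: 16-bit words
  );
  port (
    clka  : in  std_logic;
    wea   : in  std_logic;
    addra : in  unsigned(ADDR_BITS-1 downto 0);
    dia   : in  std_logic_vector(DATA_BITS-1 downto 0);
    clkb  : in  std_logic;
    enb   : in  std_logic;
    addrb : in  unsigned(ADDR_BITS-1 downto 0);
    dob   : out std_logic_vector(DATA_BITS-1 downto 0)
  );
end entity bram_sketch;

architecture rtl of bram_sketch is
  type ram_t is array (0 to 2**ADDR_BITS - 1)
    of std_logic_vector(DATA_BITS-1 downto 0);
  signal ram : ram_t := (others => (others => '0'));  -- initial values
begin
  -- Port A: synchronous write
  process (clka)
  begin
    if rising_edge(clka) then
      if wea = '1' then
        ram(to_integer(addra)) <= dia;
      end if;
    end if;
  end process;

  -- Port B: synchronous read on an independent clock
  process (clkb)
  begin
    if rising_edge(clkb) then
      if enb = '1' then
        dob <= ram(to_integer(addrb));
      end if;
    end if;
  end process;
end architecture rtl;
```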

Global Routing

• Sixteen dedicated global clock multiplexers
  – Eight on the top-center of the die, eight on the bottom-center
  – Driven by a clock input pad, a DCM, or local routing
• Global clock multiplexers provide the following:
  – Traditional clock buffer (BUFG) function
  – Global clock enable capability (BUFGCE)
  – Glitch-free switching between clock signals (BUFGMUX)
• Up to eight clock nets can be used in each clock region of the device
  – Each device contains four or more clock regions

Digital Clock Manager (DCM)

• Up to twelve DCMs per device
  – Located on the top and bottom edges of the die
  – Driven by clock input pads
• DCMs provide the following:
  – Delay-Locked Loop (DLL)
  – Digital Frequency Synthesizer (DFS)
  – Digital Phase Shifter (DPS)
• Up to four outputs of each DCM can drive onto global clock buffers
  – All DCM outputs can drive general routing

The Spartan-3 Family

Built for high-volume, low-cost applications
• 18x18-bit embedded pipelined multipliers for efficient DSP
• Up to eight on-chip Digital Clock Managers to support multiple system clocks
• Configurable 18K block RAMs plus distributed RAM
• 4 I/O banks, with support for all I/O standards including PCI, DDR333, RSDS, and mini-LVDS

Spartan-3 Family

Based upon Virtex-II Architecture, Optimized for Lower Cost
• Smaller process = lower core voltage
  – 0.09 micron versus 0.15 micron
  – Vccint = 1.2 V versus 1.5 V
• Logic resources
  – Only one-half of the slices support RAM or SRL16s (SLICEM)
  – Fewer block RAMs and multiplier blocks
• Clock resources
  – Fewer global clock multiplexers and DCM blocks
• I/O resources
  – Fewer pins per package
  – No internal 3-state buffers
  – Support for different standards
    • New standards: 1.2 V LVCMOS, 1.8 V HSTL, and SSTL
    • Default is LVCMOS, versus LVTTL

SLICEM and SLICEL

• Each Spartan™-3 CLB contains four slices
  – Similar to the Virtex™-II
• Slices are grouped in pairs
  – Left-hand SLICEM (Memory)
    • LUTs can be configured as memory or SRL16
  – Right-hand SLICEL (Logic)
    • LUTs can be used as logic only

[Figure: CLB with switch matrix; left-hand SLICEM pair (Slice X0Y0, Slice X0Y1) with SHIFTIN and SHIFTOUT; right-hand SLICEL pair (Slice X1Y0, Slice X1Y1); CIN and COUT carry connections; fast connects]

Xilinx Tool Flow

Xilinx Design Flow

[Flow diagram: Plan & Budget -> Create Code/Schematic -> HDL RTL Simulation -> Synthesize to create netlist -> Functional Simulation -> Implement (Translate, Map, Place & Route) -> Attain Timing Closure -> Timing Simulation -> Generate BIT File -> Configure FPGA]

Synthesis

Generate a netlist file
• After writing your HDL code, you will need a tool to generate a netlist (NGC or EDIF)
  – Xilinx Synthesis Tool (XST) included
  – Support for popular third-party synthesis tools: Synplify, Leonardo Spectrum

Implementation

Process a netlist file
• Consists of three phases
  – Translate: Merge multiple design files into a single netlist
  – Map: Group logical symbols from the netlist (gates) into physical components (slices and IOBs)
  – Place & Route: Place components onto the chip, connect the components, and extract timing data into reports
• Access Xilinx reports and tools at each phase
  – Timing Analyzer, Floorplanner, FPGA Editor, XPower

[Figure: netlist generated from synthesis -> Implement: Translate -> Map -> Place & Route]

Configuration

• Once a design is implemented, you must create a file that the FPGA can understand
  – This file is called a bitstream: a BIT file (.bit extension)
• The BIT file can be downloaded
  – Directly into the FPGA
    • Use a download cable such as Platform USB
  – To an external memory device such as a Xilinx Platform Flash PROM
    • Must first be converted into a PROM file

ISE Project Navigator

Xilinx ISE Foundation is built around the Xilinx Design Flow
• Enter designs
• Access to synthesis tools
  – Including third-party synthesis tools
• Implement your design with a simple double-click
  – Fine-tune with easy-to-access software options
• Download
  – Generate a bitstream
  – Configure FPGA using iMPACT

Synthesizing Designs

Generate a netlist file using XST (Xilinx Synthesis Technology)

Synthesis processes and analysis
• Access report
• View schematics (RTL or Technology)
• Check syntax
• Generate post-synthesis simulation model

Steps
1. Highlight HDL sources
2. Double-click to synthesize

The Design Summary Displays Design Data

• Quick view of reports, constraints
• Project status
• Device utilization
• Design summary options
• Performance and constraints
• Reports

Outline

• Overview
• ISE
• Summary

Lab 1: Xilinx Tool Flow

USB

USB2

• Point-to-point links.
• Host computer is master.
• 480 Mbit/s signalling rate; 53.24 MB/s theoretical maximum in bulk transfer mode (13 packets of 512 bytes per 125 µs microframe).
• 30 MB/s readily achievable in bulk transfer mode.
• Speeds: Low and Full (USB 1.x), High (USB 2.0).
• Hot plug.
• Peripheral electronics can be relatively simple and inexpensive.
• Up to 500 mA of power from the bus.

USB Data Travels in Packets

• Identified by "Packet ID" (PID)
• Token packet tells what's coming
• Data packets deliver bytes
• Handshake packets report success or otherwise

USB Packets

A Control Write Transfer

Stage                Token packet                Data packet              Handshake packet
Setup Stage          SYNC SETUP ADDR ENDP CRC5   SYNC DATA0 Data CRC16    SYNC ACK
Data Stage           SYNC OUT ADDR ENDP CRC5     SYNC DATA1 Data CRC16    SYNC ACK
Data Stage (cont'd)  SYNC OUT ADDR ENDP CRC5     SYNC DATA0 Data CRC16    SYNC ACK
Data Stage (cont'd)  SYNC OUT ADDR ENDP CRC5     SYNC DATA1 Data CRC16    SYNC ACK
Status Stage         SYNC IN ADDR ENDP CRC5      SYNC DATA1 CRC16         SYNC ACK

USB2 Controller

• EZ-USB FX2LP(TM) USB Microcontroller High-Speed USB Peripheral Controller • Integrated 8051 Microprocessor.

• Code/Data Downloaded via USB, or EEPROM.

• Many Integrated Peripherals.

Simple Algorithm

• Sample data at the full rate of 2.77 Ms/s (16 channels)
• Down-convert data by 4
• Write data to the USB interface at 21.19 MB/s
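The quoted output rate can be reproduced with a short calculation, under assumptions that the slide does not state: each sample occupies 2 bytes, the sample rate is 2.778 Ms/s per channel (the slide's 2.77 taken as a truncated value), the factor of 4 is a straight reduction in data rate, and MB means 2^20 bytes. Here f_s is the per-channel sample rate, N_ch the channel count, b the bytes per sample, and D the reduction factor; all four symbols are introduced only for this illustration.

```latex
\[
R_{\text{out}}
  = \frac{f_s \, N_{\text{ch}} \, b}{D}
  = \frac{2.778 \times 10^{6}\ \text{s}^{-1} \times 16 \times 2\ \text{B}}{4}
  \approx 22.2 \times 10^{6}\ \text{B/s}
  \approx 21.19 \times 2^{20}\ \text{B/s}
\]
```

Under these assumptions the result sits below the 30 MB/s quoted earlier as readily achievable in bulk transfer mode.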

VHDL

VHDL Example

An example of a two-input XNOR gate is shown below.

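    library ieee;
    use ieee.std_logic_1164.all;

    entity XNOR2 is
      port (A, B: in std_logic;
            Z: out std_logic);
    end XNOR2;

    architecture behavioral_xnor of XNOR2 is
      -- signal declaration (of internal signals X, Y)
      signal X, Y: std_logic;
    begin
      X <= A and B;              -- high when both inputs are high
      Y <= (not A) and (not B);  -- high when both inputs are low
      Z <= X or Y;               -- high when the inputs are equal
    end behavioral_xnor;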
