Xilinx XC4000 FPGA devices - Mahanakorn University of


Introduction to CPLD/FPGA

Technology, Devices and Tools

Theerayod Wiangtong

Electronic Department Mahanakorn University of Technology

1

Outline

• Programmable Logic
  – CPLD
  – FPGA
• Architecture: Basic & Advanced
• Examples
• Features
• Vendors and Devices
• Design Tools

2

World of Integrated Circuits

• Full-Custom ASICs
• Semi-Custom ASICs
• User Programmable: PLD, FPGA

3

ASIC

ASIC: Application Specific Integrated Circuit

• Designs must be sent for expensive and time-consuming fabrication in a semiconductor foundry
• Designed all the way from behavioral description to physical layout

4

CPLD/FPGA

• CPLD: Complex Programmable Logic Device
• FPGA: Field Programmable Gate Array

• No minimum quantity order
• Reprogrammable
• Small development overhead
• No NRE (non-recurring engineering) costs
• Quick time to market

5

Which Way to Go?

ASICs:
• High performance
• Low power
• Low cost in high volumes

CPLD/FPGAs:
• Off-the-shelf
• Low development cost
• Short time to market
• Reconfigurability

6

Other Advantages

• Manufacturing cycle for ASIC is very costly, lengthy and engages lots of manpower
  – Mistakes not detected at design time have large impact on development time and cost
  – FPGAs are perfect for rapid prototyping of digital circuits
• Easy upgrades like in case of software
• Unique applications
  – Reconfigurable computing

7

Programmable Logic CPLD/FPGA

8

Programmable Logic

• Programmable digital integrated circuit
• Standard off-the-shelf parts
• Desired functionality is implemented by configuring on-chip logic blocks and interconnections
• Types of programmable logic:
  – Complex PLDs (CPLD)
  – Field Programmable Gate Arrays (FPGA)

9

PLD - Sum of Products

Programmable AND array followed by fixed fan-in OR gates

[Figure: inputs A, B, C and their complements cross the AND plane through programmable switches or fuses; fixed OR gates combine the product terms into outputs f1 and f2]

10

PLD - Macrocell

Can implement combinational or sequential logic

[Figure: macrocell with inputs A, B, C into the AND plane, a D flip-flop (D, Q, Clock), a MUX controlled by Select, and an Enable on output f1]

11
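A macrocell of this kind can be sketched in VHDL. This is only an illustrative model: the entity name `macrocell_sketch`, the port name `sel` (`select` is a reserved word in VHDL), and the example product terms are assumptions, not taken from any particular device.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical macrocell model: an example sum of products, an optional
-- D flip-flop, a bypass MUX and a three-state output.
entity macrocell_sketch is
  port (
    a, b, c : in  std_logic;
    clock   : in  std_logic;
    sel     : in  std_logic;  -- '0': combinational output, '1': registered output
    enable  : in  std_logic;  -- '1': drive f1, '0': high impedance
    f1      : out std_logic
  );
end entity macrocell_sketch;

architecture rtl of macrocell_sketch is
  signal sop : std_logic;         -- output of the AND plane and OR gate
  signal q   : std_logic := '0';  -- flip-flop output
begin
  -- Example product terms (arbitrary choice): f = A.B + (not A).C
  sop <= (a and b) or ((not a) and c);

  -- D flip-flop clocked on the rising edge
  process (clock)
  begin
    if rising_edge(clock) then
      q <= sop;
    end if;
  end process;

  -- MUX picks combinational or sequential behaviour; Enable gates the output
  f1 <= 'Z' when enable = '0' else
        q   when sel = '1'    else
        sop;
end architecture rtl;
```

With `sel = '0'` the cell behaves as pure combinational logic; with `sel = '1'` the same sum of products is registered, which is how one macrocell covers both cases.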

CPLD Structure

Integration of several PLD blocks with a programmable interconnect on a single chip

[Figure: PLD blocks connected through a programmable interconnect]

12

CPLD Example - Altera MAX7000

EPM7000 Series Block Diagram

13

CPLD Example - Altera MAX7000

EPM7000 Series Device Macrocell

14

FPGA Architecture

15

FPGA - Generic Structure

FPGA building blocks:

• Programmable logic blocks
  – Implement combinatorial and sequential logic
• Programmable interconnect
  – Wires to connect inputs and outputs to logic blocks
• Programmable I/O blocks
  – Special logic blocks at the periphery of device for external connections

[Figure labels: Logic block, Interconnection switches, I/O]

16

FPGA – Basic Logic Element

• LUT to implement combinatorial logic
• Register for sequential circuits
• Additional logic (not shown):
  – Carry logic for arithmetic functions
  – Expansion logic for functions requiring more than 4 inputs

[Figure: inputs A, B, C, D feed a LUT; the LUT output feeds a D flip-flop (D, Q, Clock); Select chooses which of the two drives Out]

17
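The logic element described above (LUT, register, output select) can be modelled roughly as follows. The entity name, the generic `LUT_INIT` and its default contents (4-input parity) are assumptions made for illustration; carry and expansion logic are left out.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity logic_element_sketch is
  generic (
    -- Entry i is the LUT output for inputs (A,B,C,D) = i in binary.
    -- Default: an arbitrary example, the XOR (parity) of the four inputs.
    LUT_INIT : std_logic_vector(0 to 15) := "0110100110010110"
  );
  port (
    a, b, c, d : in  std_logic;
    clock      : in  std_logic;
    sel        : in  std_logic;  -- '0': LUT output, '1': registered output
    y          : out std_logic
  );
end entity logic_element_sketch;

architecture rtl of logic_element_sketch is
  signal addr    : unsigned(3 downto 0);
  signal lut_out : std_logic;
  signal q       : std_logic := '0';
begin
  -- The four inputs form the address into the LUT contents
  addr    <= a & b & c & d;
  lut_out <= LUT_INIT(to_integer(addr));

  -- Register for sequential circuits
  process (clock)
  begin
    if rising_edge(clock) then
      q <= lut_out;
    end if;
  end process;

  -- Output MUX: bypass or use the register
  y <= q when sel = '1' else lut_out;
end architecture rtl;
```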

Look-Up Tables (LUT)

• A look-up table with N inputs can be used to implement any combinatorial function of N inputs
• The LUT is programmed with the truth table

Truth table:

A B C D | Z
0 0 0 0 | 0
0 0 0 1 | 1
0 0 1 0 | 1
0 0 1 1 | 1
0 1 0 0 | 0
0 1 0 1 | 1
0 1 1 0 | 1
0 1 1 1 | 1
1 0 0 0 | 0
1 0 0 1 | 1
1 0 1 0 | 1
1 0 1 1 | 1
1 1 0 0 | 0
1 1 0 1 | 0
1 1 1 0 | 0

[Figure: the same function as a LUT implementation (A, B, C, D into a LUT, output Z) and as a gate implementation (A, B, C, D into gates, output Z)]

18
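As a sketch of the two implementations on this slide, the LUT version stores the Z column as a constant and indexes it with the inputs. The transcript lists only 15 rows, so the entry for A=B=C=D=1 is assumed to be 0 here. The gate-level expression in the second architecture is derived from the listed rows and is not necessarily the circuit drawn on the slide.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity lut4_sketch is
  port (
    a, b, c, d : in  std_logic;
    z          : out std_logic
  );
end entity lut4_sketch;

-- LUT implementation: the truth table is the program.
architecture lut of lut4_sketch is
  -- Z column, row 0 (A,B,C,D = 0000) first; the last entry is an assumption.
  constant TRUTH_TABLE : std_logic_vector(0 to 15) := "0111011101110000";
  signal addr : unsigned(3 downto 0);
begin
  addr <= a & b & c & d;
  z    <= TRUTH_TABLE(to_integer(addr));
end architecture lut;

-- Gate implementation of the same function (derived from the table).
architecture gates of lut4_sketch is
begin
  z <= (c or d) and not (a and b);
end architecture gates;
```

Changing the function only changes the constant in the `lut` architecture, whereas the `gates` architecture would need different logic; that is the sense in which a LUT is "programmed".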

LUT Implementation

• Example: 3-input LUT
• Based on multiplexers (pass transistors)
• LUT entries stored in configuration memory cells

[Figure: eight configuration memory cells (each 0/1) feed a multiplexer tree selected by X1, X2, X3, producing output F]

19
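The multiplexer structure can be sketched as a three-level tree of 2:1 multiplexers. The generic `CONFIG` stands in for the eight configuration memory cells; its default value (majority of the three inputs) and the choice of X1 as the most significant select input are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity lut3_mux_sketch is
  generic (
    -- Cell i holds F for (X1,X2,X3) = i in binary.
    -- Default: an arbitrary example, the majority function.
    CONFIG : std_logic_vector(0 to 7) := "00010111"
  );
  port (
    x1, x2, x3 : in  std_logic;
    f          : out std_logic
  );
end entity lut3_mux_sketch;

architecture mux_tree of lut3_mux_sketch is
  signal level1 : std_logic_vector(0 to 3);
  signal level2 : std_logic_vector(0 to 1);
begin
  -- Level 1: X3 chooses within each pair of configuration cells
  gen_level1 : for i in 0 to 3 generate
    level1(i) <= CONFIG(2*i + 1) when x3 = '1' else CONFIG(2*i);
  end generate;

  -- Level 2: X2 chooses within each pair of level-1 outputs
  gen_level2 : for i in 0 to 1 generate
    level2(i) <= level1(2*i + 1) when x2 = '1' else level1(2*i);
  end generate;

  -- Level 3: X1 makes the final choice
  f <= level2(1) when x1 = '1' else level2(0);
end architecture mux_tree;
```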

Programmable Interconnect

• Interconnect hierarchy (not shown)
  – Fast local interconnect
  – Horizontal and vertical lines of various lengths

[Figure: logic elements (LE) connected through switch matrices]

20

Switch Matrix Operation

[Figure: switch matrix before programming and after programming]

• 6 pass transistors per switch matrix interconnect point
• Pass transistors act as programmable switches
• Pass transistor gates are driven by configuration memory cells

21

Configuration Storage Elements

• Static Random Access Memory (SRAM)
  – Each switch is a pass transistor controlled by the state of an SRAM bit
  – FPGA needs to be configured at power-on
• Flash Erasable Programmable ROM (Flash)
  – Each switch is a floating-gate transistor that can be turned off by injecting charge onto its gate; the FPGA itself holds the program
  – Reprogrammable, even in-circuit
• Fusible Links (“Antifuse”)
  – Forms a low-resistance path when electrically programmed
  – One-time programmable in a special programming machine
  – Radiation tolerant

22

FPGA Technology Roadmap

Year   Technology   Transistor count
1995   0.6 µ        3.5M
1996   0.35 µ       12M
1997   0.25 µ       23M
2000   0.18 µ       75M
2003   0.13 µ       430M
2004   0.09 µ       1B

23

Special Features

• Clock management
  – PLL, DLL
  – Eliminate clock skew between external clock input and on-chip clock
  – Low-skew global clock distribution network
• Embedded memory blocks
• Support for various interface standards
• High-speed serial I/Os
• Embedded processor cores
• DSP blocks

24

FPGA Vendors & Device Families

• Xilinx
  – Virtex-II/Virtex-4: Feature-packed high-performance SRAM-based FPGA
  – Spartan 3: low-cost feature reduced version
  – CoolRunner: CPLDs
• Altera
  – Stratix/Stratix-II
    • High-performance SRAM-based FPGAs
  – Cyclone/Cyclone-II
    • Low-cost feature reduced version for cost-critical applications
  – MAX3000/7000 CPLDs
  – MAX-II: Flash-based FPGA
• Actel
  – Anti-fuse based FPGAs
    • Radiation tolerant
  – Flash-based FPGAs
• Lattice
  – Flash-based FPGAs
  – CPLDs (EEPROM)
• QuickLogic
  – ViaLink-based FPGAs

25

State of the Art in FPGAs

• 90 nm process on 300 mm wafers
  – Lower cost per function (LUT + register)
  – Smaller and faster transistors: higher speed
• System speed up to 500 MHz
  – Mainly through smart interconnects, clock management, dedicated circuits, flexible I/O
  – Integrated transceivers running at 10 Gigabits/sec
• More logic and better features:
  – >100,000 LUTs & flip-flops
  – >200 embedded RAMs, and the same number of 18 x 18 multipliers
• 1156 pins (balls) with >800 GP I/O
  – 50 I/O standards, incl. LVDS with internal termination
• 16 low-skew global clock lines
  – Multiple clock management circuits
• On-chip microprocessor(s) and multi-Gbps transceivers

26

Latest Devices: Capacity & Features

Xilinx Virtex-4
• 90nm process
• Up to 960 I/Os
• >200000 logic cells
• Up to 552 18kb block RAMs (~10Mb RAM)
• 192 DSP slices (18x18 multiplier accumulator)
• 20 digital clock managers (DCM)
• 24 high-speed serial transceivers (622Mb/s to 11.1Gb/s)
• Up to four PowerPC 405 cores

Altera Stratix-II
• 90nm process
• Up to 1170 I/Os
• 179000 logic elements
• 9.6Mb embedded RAM
• 96 DSP blocks: 380 18x18 multipliers
• 12 PLLs
• Serial I/O up to 1Gb/s
• No hard processor cores

27

ALTERA

28

Device Families & Tools

29

Device Roadmap

30

Technology

31

Logic Density

32

Pricing Roadmap

33

FLEX10K Basic Architecture

34

Logic Array Block: FLEX10K

35

Logic Element of FLEX10K

36

Advanced Altera Architecture

37

Stratix Device

38

Stratix Device Family

39

Altera: Embedded DSP Blocks

• Two DSP Block columns per device
• Number varies by height of column
• Can implement:
  – Eight 9x9 multipliers
  – Four 18x18 multipliers
  – One 36x36 multiplier
• Contains adder/subtractor/accumulator
• Registered inputs can become shift register

40
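The multiply-accumulate function that such a block provides can be described behaviourally as below. This is a generic sketch, not Altera's DSP block interface: the entity name, the synchronous `clear` input and the 48-bit accumulator width are assumptions chosen for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mac18_sketch is
  port (
    clock : in  std_logic;
    clear : in  std_logic;              -- synchronous clear of the accumulator
    a, b  : in  signed(17 downto 0);    -- 18x18 multiplier operands
    acc   : out signed(47 downto 0)     -- accumulator width is an assumption
  );
end entity mac18_sketch;

architecture rtl of mac18_sketch is
  signal a_reg, b_reg : signed(17 downto 0) := (others => '0');
  signal total        : signed(47 downto 0) := (others => '0');
begin
  process (clock)
  begin
    if rising_edge(clock) then
      -- Registered inputs
      a_reg <= a;
      b_reg <= b;
      -- Multiply, sign-extend the 36-bit product, and accumulate
      if clear = '1' then
        total <= (others => '0');
      else
        total <= total + resize(a_reg * b_reg, total'length);
      end if;
    end if;
  end process;

  acc <= total;
end architecture rtl;
```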

Altera: Embedded DSP Block

41

Embedded RAM

Dual-Port RAM
– M512 – 512 x 1
– M4K – 4096 x 1
– M-RAM – 64K x 8

42

Embedded RAM Block

43

ALTERA High Speed I/O

44

Embedded Processor

• Soft Processor: NIOS 32bit @150MHz
• Hard Processor: ARM922T 32bit RISC @200 MHz (Excalibur device)
• Additional features
  – Communication Controller
  – Integrated MMU (Memory Management Unit)
  – High-Speed Memory Interface
  – C-Level Simulation
  – Multi-Processor Support

45

NIOS II Family

46

Max II Device

47

Xilinx

48

Product Overview

[Figure labels: High Volume, Low Cost; High Performance, High Density; CPLD: ROM-based, Low Power, Low Cost]

49

Xilinx FPGA Families

• Old families
  – XC3000, XC4000, XC5200
  – Old 0.5µm, 0.35µm and 0.25µm technology. Not recommended for modern designs.
• High-performance families
  – Virtex (0.22µm)
  – Virtex-E, Virtex-EM (0.18µm)
  – Virtex-II, Virtex-II PRO (0.13µm)
• Low Cost Family
  – Spartan/XL – derived from XC4000
  – Spartan-II – derived from Virtex
  – Spartan-IIE – derived from Virtex-E
  – Spartan-3

50

Basic FPGA Architecture Spartan-II

51

CLB Structure

• Contains 2 slices
• Each slice has 2 LUT-FF pairs with associated carry logic
• Two 3-state buffers (BUFT) associated with each CLB, accessible by all CLB outputs

[Figure: CLB with two slices. Each slice has two look-up tables (inputs G4 to G1 and F4 to F1, output O), carry & control logic (COUT, F5IN, BY, SR), outputs Y/YB and X/XB, and two storage elements (D, CK, EC, S, R, Q)]

52

CLB Slice Structure

• Each slice contains two sets of the following:
  – Four-input LUT
    • Any 4-input logic function,
    • or 16-bit x 1 sync RAM
    • or 16-bit shift register
  – Carry & Control
    • Fast arithmetic logic
    • Multiplier logic
    • Multiplexer logic
  – Storage element
    • Latch or flip-flop
    • Set and reset
    • True or inverted inputs
    • Sync. or async. control

53

Example: 5-Input Functions implemented using two LUTs

X5 X4 X3 X2 X1 | Y
0 0 0 0 0 | 0
0 0 0 0 1 | 1
0 0 0 1 0 | 0
0 0 0 1 1 | 0
0 0 1 0 0 | 1
0 0 1 0 1 | 1
0 0 1 1 0 | 0
0 0 1 1 1 | 0
0 1 0 0 0 | 1
0 1 0 0 1 | 0
0 1 0 1 0 | 0
0 1 0 1 1 | 1
0 1 1 0 0 | 1
0 1 1 0 1 | 1
0 1 1 1 0 | 1
0 1 1 1 1 | 1
1 0 0 0 0 | 0
1 0 0 0 1 | 0
1 0 0 1 0 | 0
1 0 0 1 1 | 0
1 0 1 0 0 | 0
1 0 1 0 1 | 0
1 0 1 1 0 | 0
1 0 1 1 1 | 1
1 1 0 0 0 | 0
1 1 0 0 1 | 1
1 1 0 1 0 | 0
1 1 0 1 1 | 1
1 1 1 0 0 | 0
1 1 1 0 1 | 1
1 1 1 1 0 | 0
1 1 1 1 1 | 0

[Figure: LUT (ROM/RAM) with address inputs A4 to A1, write signals WS and DI, data D, and output OUT]

54

Dedicated Expansion Multiplexers

• MUXF5 combines 2 LUTs to create
  – Any 5-input function (LUT5)
  – Or selected functions up to 9 inputs
  – Or 4x1 multiplexer
• MUXF6 combines 2 slices to form
  – Any 6-input function (LUT6)
  – Or selected functions up to 19 inputs
  – Or 8x1 multiplexer

[Figure: CLB with two slices; in each slice two LUTs feed a MUXF5, and the two MUXF5 outputs feed MUXF6]

55
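The 5-input example from the previous slide can be split across two 4-input LUTs and a 2:1 multiplexer playing the role of MUXF5. In this sketch the two constants are the Y column of that truth table for X5 = 0 and X5 = 1, read with X4 as the most significant address bit; the entity and signal names are hypothetical, and the vendor primitives themselves are not instantiated.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity lut5_muxf5_sketch is
  port (
    x5, x4, x3, x2, x1 : in  std_logic;
    y                  : out std_logic
  );
end entity lut5_muxf5_sketch;

architecture rtl of lut5_muxf5_sketch is
  -- Entry i is Y for (X4,X3,X2,X1) = i in binary.
  constant LUT_LOW  : std_logic_vector(0 to 15) := "0100110010011111";  -- X5 = 0
  constant LUT_HIGH : std_logic_vector(0 to 15) := "0000000101010100";  -- X5 = 1
  signal addr      : unsigned(3 downto 0);
  signal low, high : std_logic;
begin
  -- Both LUTs see the same four inputs
  addr <= x4 & x3 & x2 & x1;
  low  <= LUT_LOW(to_integer(addr));
  high <= LUT_HIGH(to_integer(addr));

  -- The fifth input selects between the two LUT outputs (the MUXF5 role)
  y <= high when x5 = '1' else low;
end architecture rtl;
```

The same pattern repeated one level up (two such outputs and a sixth input) gives the 6-input case attributed to MUXF6.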

Distributed RAM

• CLB LUT configurable as Distributed RAM
  – A LUT equals 16x1 RAM
  – Implements Single and Dual-Ports
  – Cascade LUTs to increase RAM size
• Synchronous write
• Synchronous/Asynchronous read
  – Accompanying flip-flops used for synchronous read

[Figure: LUTs used as RAM primitives]
– One LUT = RAM16X1S (D, WE, WCLK, A0 to A3, O)
– Two LUTs = RAM32X1S (D, WE, WCLK, A0 to A4, O)
– or RAM16X2S (D0, D1, WE, WCLK, A0 to A3, O0, O1)
– or RAM16X1D (D, WE, WCLK, A0 to A3, DPRA0 to DPRA3, SPO, DPO)

56
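A behavioural sketch of a 16x1 single-port distributed RAM with synchronous write and asynchronous read follows. The port names mirror those listed for RAM16X1S, with the four address pins grouped into one vector; the entity name is hypothetical and this is a model of the behaviour, not the vendor primitive.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity dist_ram16x1_sketch is
  port (
    wclk : in  std_logic;
    we   : in  std_logic;
    d    : in  std_logic;
    a    : in  std_logic_vector(3 downto 0);
    o    : out std_logic
  );
end entity dist_ram16x1_sketch;

architecture rtl of dist_ram16x1_sketch is
  -- The 16 LUT configuration cells double as RAM storage
  signal mem : std_logic_vector(0 to 15) := (others => '0');
begin
  -- Synchronous write
  process (wclk)
  begin
    if rising_edge(wclk) then
      if we = '1' then
        mem(to_integer(unsigned(a))) <= d;
      end if;
    end if;
  end process;

  -- Asynchronous read: the output follows the address directly
  o <= mem(to_integer(unsigned(a)));
end architecture rtl;
```

Registering `o` in the accompanying flip-flop would turn this into the synchronous-read variant mentioned on the slide.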

Fast Carry Logic

• Each CLB contains separate logic and routing for the fast generation of sum & carry signals
  – Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters
• Carry logic is independent of normal logic and routing resources

[Figure: carry chain running from LSB to MSB]

57
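In practice the carry logic is usually reached by writing ordinary arithmetic and letting the synthesis tool map it. A minimal adder sketch is shown below; the entity name and the `WIDTH` generic are assumptions, and whether a given tool actually uses the dedicated carry chain depends on the tool and its settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity adder_sketch is
  generic (
    WIDTH : positive := 8
  );
  port (
    a, b : in  unsigned(WIDTH - 1 downto 0);
    cin  : in  std_logic;
    sum  : out unsigned(WIDTH - 1 downto 0);
    cout : out std_logic
  );
end entity adder_sketch;

architecture rtl of adder_sketch is
  signal carry_in : unsigned(0 downto 0);
  signal full     : unsigned(WIDTH downto 0);  -- one extra bit for the carry out
begin
  carry_in(0) <= cin;

  -- Extend both operands by one bit so the carry out is not lost
  full <= resize(a, WIDTH + 1) + resize(b, WIDTH + 1) + carry_in;

  sum  <= full(WIDTH - 1 downto 0);
  cout <= full(WIDTH);
end architecture rtl;
```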

Basic I/O Block Structure

• Each IOB can work as uni- or bi-directional I/O
• Outputs can be forced into High Impedance
• Inputs and outputs can be registered
• Inputs can be delayed

[Figure: IOB with a three-state FF (three-state control), an output FF (output path) and an input FF (input path, direct or registered input); each FF has D, Q, EC (enable) and SR (set/reset) and is driven by the clock]

58
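The behaviour listed above can be sketched as a bidirectional pad with registered output, registered three-state control, and both direct and registered inputs. The names are hypothetical, and the clock enables, set/reset and input delay shown in the figure are omitted for brevity.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity iob_sketch is
  port (
    clock       : in    std_logic;
    out_data    : in    std_logic;  -- value to drive onto the pad
    out_enable  : in    std_logic;  -- '1': drive the pad, '0': high impedance
    in_data     : out   std_logic;  -- direct input
    in_data_reg : out   std_logic;  -- registered input
    pad         : inout std_logic
  );
end entity iob_sketch;

architecture rtl of iob_sketch is
  signal out_ff : std_logic := '0';  -- output flip-flop
  signal tri_ff : std_logic := '0';  -- three-state control flip-flop
begin
  process (clock)
  begin
    if rising_edge(clock) then
      out_ff      <= out_data;
      tri_ff      <= out_enable;
      in_data_reg <= pad;            -- input flip-flop
    end if;
  end process;

  -- Output path: release the pad when the three-state control is inactive
  pad <= out_ff when tri_ff = '1' else 'Z';

  -- Input path: the pad value is always visible to the fabric
  in_data <= pad;
end architecture rtl;
```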

Advanced Xilinx Architecture

59

Virtex-II Pro

• 130nm CMOS Copper Low-K
• 1200 I/Os, 1696 Pin Package
• 125,000 Logic Cells
• 10 Megabits of RAM
• 556 XTREME DSP Multipliers
• 16 3.125 Gbps transceivers
• 4 PowerPC CPUs

60

Virtex-II Pro

• PowerPC 405
• Dedicated multipliers and memory
• Digital Clock Management (DCM) provides
  – 16 independent clock domains
  – Clock divide, multiply, phase shift
  – Enhanced Phase Locked Loops (PLLs)
• Routing Resources (90%)

61

Block RAM

• Most efficient memory implementation
  – Dedicated blocks of memory
• Ideal for most memory requirements
  – 4 to 14 memory blocks
    • 4096 bits per block
  – Use multiple blocks for larger memories
• Builds both single and true dual-port RAMs

[Figure: Spartan-II true dual-port block RAM]

62

Dual-Port Bus Flexibility

RAMB4_S4_S16
– Port A: 1K deep, 4 bits wide (WEA, ENA, RSTA, CLKA, ADDRA[9:0], DIA[3:0], DOA[3:0])
– Port B: 256 deep, 16 bits wide (WEB, ENB, RSTB, CLKB, ADDRB[7:0], DIB[15:0], DOB[15:0])

• Each port can be configured with a different data bus width
• Provides easy data width conversion without any additional logic

63
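The width conversion can be illustrated with a simplified behavioural model in which port A writes and reads 4-bit words and port B reads 16-bit words from the same 4096 bits. This is a sketch only: the entity name is hypothetical, port B is read-only here, enable and reset pins are omitted, the mapping of four consecutive port A words into one port B word (lowest address in the least significant bits) is an assumption, and port A returns the old contents during a write, which may differ from the real primitive.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity ramb4_s4_s16_sketch is
  port (
    clka  : in  std_logic;
    wea   : in  std_logic;
    addra : in  std_logic_vector(9 downto 0);   -- 1K x 4
    dia   : in  std_logic_vector(3 downto 0);
    doa   : out std_logic_vector(3 downto 0);
    clkb  : in  std_logic;
    addrb : in  std_logic_vector(7 downto 0);   -- 256 x 16
    dob   : out std_logic_vector(15 downto 0)
  );
end entity ramb4_s4_s16_sketch;

architecture behavioural of ramb4_s4_s16_sketch is
  type nibble_array is array (0 to 1023) of std_logic_vector(3 downto 0);
  signal mem : nibble_array := (others => (others => '0'));
begin
  -- Port A: 4-bit synchronous write and read
  port_a : process (clka)
  begin
    if rising_edge(clka) then
      if wea = '1' then
        mem(to_integer(unsigned(addra))) <= dia;
      end if;
      doa <= mem(to_integer(unsigned(addra)));
    end if;
  end process port_a;

  -- Port B: one 16-bit word is four consecutive 4-bit words
  port_b : process (clkb)
    variable base : natural;
  begin
    if rising_edge(clkb) then
      base := 4 * to_integer(unsigned(addrb));
      for i in 0 to 3 loop
        dob(4*i + 3 downto 4*i) <= mem(base + i);
      end loop;
    end if;
  end process port_b;
end architecture behavioural;
```

Both views address the same storage, since 1024 x 4 = 256 x 16 = 4096 bits, which is why no extra logic is needed for the conversion.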

Two Independent Single-Port RAMs

RAMB4_S1_S1
– Port A: 2K deep (VCC, ADDR[10:0]), 1 bit wide (WEA, ENA, RSTA, CLKA, ADDRA[10:0], DIA[0], DOA[0])
– Port B: 2K deep (GND, ADDR[10:0]), 1 bit wide (WEB, ENB, RSTB, CLKB, ADDRB[10:0], DIB[0], DOB[0])

• Can split a Dual-Port 4K RAM into two Single-Port 2K RAMs
  – Simultaneous independent access to each RAM
• To access the lower RAM
  – Tie the MSB address bit to Logic Low
• To access the upper RAM
  – Tie the MSB address bit to Logic High

64

Rocket I/O

• From 4 to 24 RocketIO MGTs per Virtex-II Pro™ device
• Continuous operating range 622 Mbps to 3.125 Gbps

Virtex 4: 11.1 Gbps !!!

65

Embedded Processor

• Soft Processor: MicroBlaze 32bit @150MHz
• Hard Processor: IBM PowerPC405 32bit RISC @300MHz (in Virtex-II Pro)
  – Low Power Consumption: 0.9 mW/MHz
  – Five-Stage Data Path Pipeline
  – Hardware Multiply/Divide Unit
  – Thirty-Two 32-bit General Purpose Registers
  – Memory Management Unit (MMU)
  – Dedicated On-Chip Memory (OCM) Interface
  – Supports IBM CoreConnect™ Bus Architecture
  – Debug and Trace Support

66

FPGA Design Tools

67

Design process (1)

Specification (Lab Experiments):

Design and implement a simple unit that speeds up encryption with an RC5-like cipher with a fixed key set on an 8031 microcontroller. Unlike in experiment 5, this time your unit has to be able to perform an encryption algorithm by itself, executing 32 rounds…..

VHDL description (Your Source Files):

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.std_logic_unsigned.all;

    entity RC5_core is
      port (
        clock, reset, encr_decr : in  std_logic;
        data_input  : in  std_logic_vector(31 downto 0);
        data_output : out std_logic_vector(31 downto 0);
        out_full    : in  std_logic;
        key_input   : in  std_logic_vector(31 downto 0);
        key_read    : out std_logic
      );
    end RC5_core;

• Functional simulation
• Synthesis
• Post-synthesis simulation

68

Design process (2)

• Implementation
• Timing simulation
• Configuration
• On chip testing

69

Active-HDL

70

Simulation and Synthesis Tools

71

Logic Synthesis

VHDL description:

    architecture MLU_DATAFLOW of MLU is
      signal A1 : STD_LOGIC;
      signal B1 : STD_LOGIC;
      signal Y1 : STD_LOGIC;
      signal MUX_0, MUX_1, MUX_2, MUX_3 : STD_LOGIC;
    begin
      -- Optional inversion of the inputs and of the output
      A1 <= A when (NEG_A = '0') else not A;
      B1 <= B when (NEG_B = '0') else not B;
      Y  <= Y1 when (NEG_Y = '0') else not Y1;

      -- The four candidate logic functions
      MUX_0 <= A1 and B1;
      MUX_1 <= A1 or B1;
      MUX_2 <= A1 xor B1;
      MUX_3 <= A1 xnor B1;

      -- 4-to-1 multiplexer controlled by L1 and L0
      with (L1 & L0) select
        Y1 <= MUX_0 when "00",
              MUX_1 when "01",
              MUX_2 when "10",
              MUX_3 when others;
    end MLU_DATAFLOW;

[Figure: circuit netlist produced by synthesis]

72
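The architecture on this slide refers to an entity `MLU` that is not shown. A declaration consistent with the signal names used in the architecture might look like the following; the port order and the use of single-bit `std_logic` ports are assumptions inferred from how the signals are used.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Assumed interface for the MLU architecture (not shown in the slides).
entity MLU is
  port (
    NEG_A : in  std_logic;  -- '1': invert input A
    NEG_B : in  std_logic;  -- '1': invert input B
    NEG_Y : in  std_logic;  -- '1': invert the output
    A     : in  std_logic;
    B     : in  std_logic;
    L1    : in  std_logic;  -- function select, high bit
    L0    : in  std_logic;  -- function select, low bit
    Y     : out std_logic
  );
end entity MLU;
```

Some tools are strict about the type of the `(L1 & L0)` selector and may ask for a qualified expression or an intermediate two-bit signal.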

Features of synthesis tools

• Interpret RTL code
• Produce synthesized circuit netlist in a standard EDIF format
• Give preliminary performance estimates
• Some can display circuit schematics corresponding to EDIF netlist

73

Implementation

• After synthesis the entire implementation process is performed by FPGA vendor tools
• Xilinx ISE Foundation 11.1i
• Altera Quartus II 9.2
• 3rd party tools for alliance version

74

Circuit Compilation

1. Technology Mapping
2. Placement: assign a logical LUT to a physical location.
3. Routing: select wire segments and switches for interconnection.

75

Routing Example

Programmable Connections FPGA

76

Configuration

• Once a design is implemented, you must create a file that the FPGA can understand
  – This file is called a bit stream or configuration file
• The BIT file can be downloaded directly to the FPGA, or can be converted into a PROM file which stores the programming information

77

QUESTIONS?

THANK YOU

78