Transcript Xilinx XC4000 FPGA devices - Mahanakorn University of
Introduction to CPLD/FPGA
Technology, Devices and Tools
Theerayod Wiangtong
Electronic Department Mahanakorn University of Technology
1
Outline
• Programmable Logic
  – CPLD
  – FPGA
• Architecture: Basic & Advanced
• Examples
• Features
• Vendors and Devices
• Design Tools
2
World of Integrated Circuits
[Figure: integrated circuits divide into Full-Custom ASICs, Semi-Custom ASICs, and User Programmable devices (PLD, FPGA)]
3
ASIC
• ASIC: Application Specific Integrated Circuit
• Designs must be sent for expensive and time-consuming fabrication in a semiconductor foundry
• Designed all the way from behavioral description to physical layout
4
CPLD/FPGA
• CPLD: Complex Programmable Logic Device
• FPGA: Field Programmable Gate Array
• No minimum quantity order
• Reprogrammable
• Small development overhead
• No NRE (non-recurring engineering) costs
• Quick time to market
5
Which Way to Go?
ASICs
• High performance
• Low power
• Low cost in high volumes
CPLD/FPGAs
• Off-the-shelf
• Low development cost
• Short time to market
• Reconfigurability
6
Other Advantages
• Manufacturing cycle for an ASIC is very costly and lengthy, and engages lots of manpower
  – Mistakes not detected at design time have a large impact on development time and cost
  – FPGAs are perfect for rapid prototyping of digital circuits
• Easy upgrades, as in the case of software
• Unique applications
  – Reconfigurable computing
7
Programmable Logic CPLD/FPGA
8
Programmable Logic
• Programmable digital integrated circuit
• Standard off-the-shelf parts
• Desired functionality is implemented by configuring on-chip logic blocks and interconnections
• Types of programmable logic:
  – Complex PLDs (CPLD)
  – Field Programmable Gate Arrays (FPGA)
9
PLD - Sum of Products
Programmable AND array followed by fixed fan-in OR gates
[Figure: inputs A, B, C cross the AND plane through programmable switches or fuses; fixed OR gates combine the product terms into outputs f1 and f2]
10
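The sum-of-products structure can be sketched as a small behavioral model. This is only an illustration: the encoding of a product term as a dictionary, the names `and_plane` and `pld_output`, and the example function f1 = A·B' + B·C are assumptions, not taken from the slides.

```python
# Behavioral model of a PLD: programmable AND plane, fixed OR gate.
# Each product term maps an input name to the value it must have
# (True = input used directly, False = complemented input used).
# Inputs missing from a term model switches left unconnected.

def and_plane(terms, inputs):
    """Evaluate every product term against the current input values."""
    return [all(inputs[name] == want for name, want in term.items())
            for term in terms]

def pld_output(terms, inputs):
    """Fixed OR gate over the product terms of one output."""
    return any(and_plane(terms, inputs))

# Hypothetical programming: f1 = A·B' + B·C
F1_TERMS = [{"A": True, "B": False}, {"B": True, "C": True}]

if __name__ == "__main__":
    for a in (False, True):
        for b in (False, True):
            for c in (False, True):
                f1 = pld_output(F1_TERMS, {"A": a, "B": b, "C": c})
                print(int(a), int(b), int(c), "->", int(f1))
```

Changing the function means changing only `F1_TERMS`, which plays the role of the switch or fuse pattern; the evaluation code (the "silicon") stays fixed.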
PLD - Macrocell
Can implement combinational or sequential logic
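[Figure: macrocell. The AND plane (inputs A, B, C) drives a D flip-flop clocked by Clock; a MUX controlled by Select passes either the combinational or the registered signal to output f1, which is gated by Enable]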
11
CPLD Structure
Integration of several PLD blocks with a programmable interconnect on a single chip

[Figure: four PLD blocks connected through the programmable interconnect]
12
CPLD Example - Altera MAX7000
EPM7000 Series Block Diagram
13
CPLD Example - Altera MAX7000
EPM7000 Series Device Macrocell
14
FPGA Architecture
15
FPGA - Generic Structure
FPGA building blocks:
• Programmable logic blocks
  – Implement combinatorial and sequential logic
• Programmable interconnect
  – Wires to connect inputs and outputs to logic blocks
• Programmable I/O blocks
  – Special logic blocks at the periphery of the device for external connections

[Figure: array of logic blocks and interconnection switches, surrounded by I/O blocks]
16
FPGA – Basic Logic Element
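• LUT to implement combinatorial logic
• Register for sequential circuits
• Additional logic (not shown):
  – Carry logic for arithmetic functions
  – Expansion logic for functions requiring more than 4 inputs

[Figure: inputs A, B, C, D feed a LUT; the LUT output drives a D flip-flop clocked by Clock; Select chooses the LUT output or the flip-flop output Q as Out]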
17
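A minimal Python sketch of such a logic element, assuming a 16-entry truth table with A as the most significant address bit and a single configuration bit for the output multiplexer. The class and method names are illustrative only.

```python
class LogicElement:
    """4-input LUT, D flip-flop, and an output select multiplexer."""

    def __init__(self, lut_bits, registered):
        assert len(lut_bits) == 16
        self.lut_bits = list(lut_bits)   # configuration: truth table
        self.registered = registered     # configuration: mux select
        self.q = 0                       # flip-flop state

    def lut(self, a, b, c, d):
        """Combinatorial path: the inputs address the stored table."""
        return self.lut_bits[(a << 3) | (b << 2) | (c << 1) | d]

    def clock(self, a, b, c, d):
        """Rising clock edge: the flip-flop captures the LUT output."""
        self.q = self.lut(a, b, c, d)

    def out(self, a, b, c, d):
        """Select picks the registered or the combinatorial path."""
        return self.q if self.registered else self.lut(a, b, c, d)


# Example configuration: a 4-input AND gate.
AND4 = [0] * 15 + [1]
```

With `registered=False` the element behaves as pure combinatorial logic; with `registered=True` the output changes only after `clock()` is called.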
Look-Up Tables (LUT)
• A look-up table with N inputs can implement any combinatorial function of N inputs
• The LUT is programmed with the truth table

Truth table:
A B C D | Z
0 0 0 0 | 0
0 0 0 1 | 1
0 0 1 0 | 1
0 0 1 1 | 1
0 1 0 0 | 0
0 1 0 1 | 1
0 1 1 0 | 1
0 1 1 1 | 1
1 0 0 0 | 0
1 0 0 1 | 1
1 0 1 0 | 1
1 0 1 1 | 1
1 1 0 0 | 0
1 1 0 1 | 0
1 1 1 0 | 0

[Figure: LUT implementation (inputs A, B, C, D into a LUT, output Z) beside the equivalent gate implementation]
18
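Programming a LUT from a truth table can be sketched in a few lines of Python. The gate-level function used here, Z = (C + D)·(A·B)', is an assumption: it agrees with the rows listed in the truth table above, but the slides do not state the gate circuit in text.

```python
from itertools import product

def gate_implementation(a, b, c, d):
    # Assumed gate-level function; it agrees with the listed rows.
    return int((c or d) and not (a and b))

# "Program" the LUT: one stored bit per input combination,
# in A B C D counting order (A is the most significant bit).
LUT = [gate_implementation(a, b, c, d)
       for a, b, c, d in product((0, 1), repeat=4)]

def lut_implementation(a, b, c, d):
    # The inputs only form an address; no gates are evaluated.
    return LUT[(a << 3) | (b << 2) | (c << 1) | d]
```

Any other 4-input function is obtained by storing a different 16-bit pattern in `LUT`; the lookup itself never changes.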
LUT Implementation
• Example: 3-input LUT
• Based on multiplexers (pass transistors)
• LUT entries stored in configuration memory cells

[Figure: eight configuration memory cells (each 0/1) feed a multiplexer tree controlled by X1, X2, X3, producing output F]
19
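The multiplexer-based structure can be modelled as a tree of 2:1 multiplexers. This is a sketch under assumptions: the ordering of the configuration cells (X1 as the most significant address bit) and the function names are illustrative, not read from the figure.

```python
def mux2(sel, in0, in1):
    """2:1 multiplexer, standing in for a pair of pass transistors."""
    return in1 if sel else in0

def lut3(cells, x1, x2, x3):
    """3-input LUT as a tree of seven 2:1 multiplexers.

    `cells` holds the eight configuration memory bits. The bit order
    (x1 as the most significant address bit) is an assumption.
    """
    assert len(cells) == 8
    level1 = [mux2(x3, cells[i], cells[i + 1]) for i in range(0, 8, 2)]
    level2 = [mux2(x2, level1[i], level1[i + 1]) for i in range(0, 4, 2)]
    return mux2(x1, level2[0], level2[1])

# Example configuration: 3-input XOR (odd parity).
XOR3_CELLS = [0, 1, 1, 0, 1, 0, 0, 1]
```

Each level of the tree halves the number of candidate cells, so an N-input LUT needs 2^N cells and 2^N - 1 two-input multiplexers in this arrangement.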
Programmable Interconnect
• Interconnect hierarchy (not shown)
  – Fast local interconnect
  – Horizontal and vertical lines of various lengths

[Figure: rows of logic elements (LE) connected through switch matrices]
20
Switch Matrix Operation
[Figure: switch matrix before programming and after programming]

• 6 pass transistors per switch matrix interconnect point
• Pass transistors act as programmable switches
• Pass transistor gates are driven by configuration memory cells
21
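One way to read the figure of six pass transistors: assuming four wire segments meet at each interconnect point (labelled N, E, S, W here purely for illustration), one transistor per pair of segments gives C(4, 2) = 6. The sketch below models that assumption; the function name and data layout are hypothetical.

```python
from itertools import combinations

SIDES = ("N", "E", "S", "W")
# One pass transistor per unordered pair of wire ends: C(4, 2) = 6.
SWITCHES = list(combinations(SIDES, 2))

def connected_groups(config):
    """Return the sets of wire ends that are shorted together.

    `config` maps a pair such as ("N", "S") to its configuration
    memory bit; missing pairs are treated as open switches.
    """
    groups = [{s} for s in SIDES]
    for a, b in SWITCHES:
        if config.get((a, b), 0):
            ga = next(g for g in groups if a in g)
            gb = next(g for g in groups if b in g)
            if ga is not gb:
                groups.remove(gb)
                ga |= gb
    return groups
```

For example, turning on the ("N", "S") and ("E", "W") switches yields two independent straight-through connections that cross without touching.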
Configuration Storage Elements
• Static Random Access Memory (SRAM)
  – Each switch is a pass transistor controlled by the state of an SRAM bit
  – FPGA needs to be configured at power-on
• Flash Erasable Programmable ROM (Flash)
  – Each switch is a floating-gate transistor that can be turned off by injecting charge onto its gate; the FPGA itself holds the program
  – Reprogrammable, even in-circuit
• Fusible Links (“Antifuse”)
  – Forms a low-resistance path when electrically programmed
  – One-time programmable in a special programming machine
  – Radiation tolerant
22
FPGA Technology Roadmap
Year              1995   1996    1997    2000    2003    2004
Technology        0.6µ   0.35µ   0.25µ   0.18µ   0.13µ   0.09µ
Transistor count  3.5M   12M     23M     75M     430M    1B
23
Special Features
• Clock management
  – PLL, DLL
  – Eliminate clock skew between external clock input and on-chip clock
  – Low-skew global clock distribution network
• Embedded memory blocks
• Support for various interface standards
• High-speed serial I/Os
• Embedded processor cores
• DSP blocks
24
FPGA Vendors & Device Families
• Xilinx
  – Virtex-II/Virtex-4: feature-packed, high-performance SRAM-based FPGAs
  – Spartan 3: low-cost, feature-reduced version
  – CoolRunner: CPLDs
• Altera
  – Stratix/Stratix-II
    • High-performance SRAM-based FPGAs
  – Cyclone/Cyclone-II
    • Low-cost, feature-reduced version for cost-critical applications
  – MAX3000/7000 CPLDs
  – MAX-II: Flash-based FPGA
• Actel
  – Anti-fuse based FPGAs
    • Radiation tolerant
  – Flash-based FPGAs
• Lattice
  – Flash-based FPGAs
  – CPLDs (EEPROM)
• QuickLogic
  – ViaLink-based FPGAs
25
State of the Art in FPGAs
• 90 nm process on 300 mm wafers
  – Lower cost per function (LUT + register)
  – Smaller and faster transistors: higher speed
• System speed up to 500 MHz
  – Mainly through smart interconnects, clock management, dedicated circuits, flexible I/O
  – Integrated transceivers running at 10 Gigabits/sec
• More logic and better features:
  – >100,000 LUTs & flip-flops
  – >200 embedded RAMs, and the same number of 18 x 18 multipliers
• 1156 pins (balls) with >800 GP I/O
  – 50 I/O standards, incl. LVDS with internal termination
• 16 low-skew global clock lines
  – Multiple clock management circuits
• On-chip microprocessor(s) and multi-Gbps transceivers
26
Latest Devices: Capacity & Features
Xilinx Virtex-4
• 90nm process
• Up to 960 I/Os
• >200000 logic cells
• Up to 552 18kb block RAMs (~10Mb RAM)
• 192 DSP slices (18x18 multiplier accumulator)
• 20 digital clock managers (DCM)
• 24 high-speed serial transceivers (622Mb/s to 11.1Gb/s)
• Up to four PowerPC 405 cores

Altera Stratix-II
• 90nm process
• Up to 1170 I/Os
• 179000 logic elements
• 9.6Mb embedded RAM
• 96 DSP blocks: 380 18x18 multipliers
• 12 PLLs
• Serial I/O up to 1Gb/s
• No hard processor cores
27
ALTERA
28
Device Families & Tools
29
Device Roadmap
30
Technology
31
Logic Density
32
Pricing Roadmap
33
FLEX10K Basic Architecture
34
Logic Array Block: FLEX10K
35
Logic Element of FLEX10K
36
Advanced Altera Architecture
37
Stratix Device
38
Stratix Device Family
39
Altera: Embedded DSP Blocks
• Two DSP block columns per device
• Number varies by height of column
• Can implement:
  – Eight 9x9 multipliers
  – Four 18x18 multipliers
  – One 36x36 multiplier
• Contains adder/subtractor/accumulator
• Registered inputs can become a shift register
40
Altera: Embedded DSP Block
41
Embedded RAM
Dual-Port RAM
– M512: 512 x 1
– M4K: 4096 x 1
– M-RAM: 64K x 8
42
Embedded RAM Block
43
ALTERA High Speed I/O
44
Embedded Processor
• Soft Processor: NIOS 32-bit @150MHz
• Hard Processor: ARM922T 32-bit RISC @200 MHz (Excalibur device)
• Additional features
  – Communication Controller
  – Integrated MMU (Memory Management Unit)
  – High-Speed Memory Interface
  – C-Level Simulation
  – Multi-Processor Support
45
NIOS II Family
46
Max II Device
47
Xilinx
48
Product Overview
[Figure: Xilinx product overview, with labels: High Volume, Low Cost, High Performance, High Density, CPLD, ROM-based, Low Power, Low Cost]
49
Xilinx FPGA Families
• Old families
  – XC3000, XC4000, XC5200
  – Old 0.5µm, 0.35µm and 0.25µm technology. Not recommended for modern designs.
• High-performance families
  – Virtex (0.22µm)
  – Virtex-E, Virtex-EM (0.18µm)
  – Virtex-II, Virtex-II PRO (0.13µm)
• Low-cost family
  – Spartan/XL: derived from XC4000
  – Spartan-II: derived from Virtex
  – Spartan-IIE: derived from Virtex-E
  – Spartan-3
50
Basic FPGA Architecture Spartan-II
51
CLB Structure
• Contains 2 slices
• Each slice has 2 LUT-FF pairs with associated carry logic
• Two 3-state buffers (BUFT) associated with each CLB, accessible by all CLB outputs

[Figure: CLB with two slices. In each slice, two look-up tables (inputs G4 to G1 and F4 to F1, output O) feed carry & control logic (inputs F5IN, BY, SR; outputs COUT, YB, Y and XB, X) and two storage elements (D, CK, EC, S, R, Q)]
52
CLB Slice Structure
• Each slice contains two sets of the following:
  – Four-input LUT
    • Any 4-input logic function,
    • or 16-bit x 1 sync RAM,
    • or 16-bit shift register
  – Carry & Control
    • Fast arithmetic logic
    • Multiplier logic
    • Multiplexer logic
  – Storage element
    • Latch or flip-flop
    • Set and reset
    • True or inverted inputs
    • Sync. or async. control
53
Example: 5-Input Functions implemented using two LUTs
X5 X4 X3 X2 X1 | Y
0  0  0  0  0  | 0
0  0  0  0  1  | 1
0  0  0  1  0  | 0
0  0  0  1  1  | 0
0  0  1  0  0  | 1
0  0  1  0  1  | 1
0  0  1  1  0  | 0
0  0  1  1  1  | 0
0  1  0  0  0  | 1
0  1  0  0  1  | 0
0  1  0  1  0  | 0
0  1  0  1  1  | 1
0  1  1  0  0  | 1
0  1  1  0  1  | 1
0  1  1  1  0  | 1
0  1  1  1  1  | 1
1  0  0  0  0  | 0
1  0  0  0  1  | 0
1  0  0  1  0  | 0
1  0  0  1  1  | 0
1  0  1  0  0  | 0
1  0  1  0  1  | 0
1  0  1  1  0  | 0
1  0  1  1  1  | 1
1  1  0  0  0  | 0
1  1  0  0  1  | 1
1  1  0  1  0  | 0
1  1  0  1  1  | 1
1  1  1  0  0  | 0
1  1  1  0  1  | 1
1  1  1  1  0  | 0
1  1  1  1  1  | 0

[Figure: LUT block (ROM/RAM) with pins A4, A3, A2, A1, WS, DI, D and output OUT]
54
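The two-LUT decomposition can be checked with a short Python sketch. The Y values are those of the truth table above; the split on X5 and the helper names are assumptions made for illustration.

```python
# Y column of the truth table above, in X5 X4 X3 X2 X1 counting order.
Y = [0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1,
     0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0]

LUT_LOW = Y[:16]    # contents of the LUT used when X5 = 0
LUT_HIGH = Y[16:]   # contents of the LUT used when X5 = 1

def lut4(bits, x4, x3, x2, x1):
    """4-input LUT: the inputs form the address of a stored bit."""
    return bits[(x4 << 3) | (x3 << 2) | (x2 << 1) | x1]

def five_input(x5, x4, x3, x2, x1):
    # Both LUTs see X4..X1; a 2:1 multiplexer driven by X5 picks one.
    low = lut4(LUT_LOW, x4, x3, x2, x1)
    high = lut4(LUT_HIGH, x4, x3, x2, x1)
    return high if x5 else low
```

The same idea extends one level further: two such 5-input blocks and another 2:1 multiplexer give any 6-input function.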
Dedicated Expansion Multiplexers
• MUXF5 combines 2 LUTs to create
  – Any 5-input function (LUT5)
  – Or selected functions up to 9 inputs
  – Or a 4x1 multiplexer
• MUXF6 combines 2 slices to form
  – Any 6-input function (LUT6)
  – Or selected functions up to 19 inputs
  – Or an 8x1 multiplexer

[Figure: CLB with two slices; in each slice two LUTs feed a MUXF5, and both MUXF5 outputs feed the MUXF6]
55
Distributed RAM
• CLB LUT configurable as distributed RAM
  – A LUT equals a 16x1 RAM
  – Implements single and dual ports
  – Cascade LUTs to increase RAM size
• Synchronous write
• Synchronous/asynchronous read
  – Accompanying flip-flops used for synchronous read

[Figure: LUT equivalents]
• One LUT = RAM16X1S (D, WE, WCLK, A0 to A3; output O)
• Two LUTs = RAM32X1S (D, WE, WCLK, A0 to A4; output O)
  or RAM16X2S (D0, D1, WE, WCLK, A0 to A3; outputs O0, O1)
  or RAM16X1D (D, WE, WCLK, A0 to A3, DPRA0 to DPRA3; outputs SPO, DPO)
56
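A behavioral sketch of distributed RAM in Python, assuming a rising-edge write and a purely combinatorial read. The class names echo the RAM16X1S and RAM32X1S primitives, but the methods are illustrative and are not a vendor API.

```python
class Ram16x1S:
    """16 x 1 single-port RAM: synchronous write, asynchronous read."""

    def __init__(self, init=0):
        # `init` is a 16-bit value; bit i is the content of address i.
        self.cells = [(init >> i) & 1 for i in range(16)]

    def read(self, addr):
        """Output O follows the address with no clock involved."""
        return self.cells[addr & 0xF]

    def clock(self, addr, d, we):
        """Rising edge of WCLK: store D at the address if WE is high."""
        if we:
            self.cells[addr & 0xF] = d & 1


class Ram32x1S:
    """Two 16 x 1 RAMs cascaded; address bit A4 selects the bank."""

    def __init__(self):
        self.banks = [Ram16x1S(), Ram16x1S()]

    def read(self, addr):
        return self.banks[(addr >> 4) & 1].read(addr)

    def clock(self, addr, d, we):
        self.banks[(addr >> 4) & 1].clock(addr, d, we)
```

A synchronous read would add a register on the `read` result that is updated in `clock`, which corresponds to using the accompanying flip-flop.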
Fast Carry Logic
• Each CLB contains separate logic and routing for the fast generation of sum & carry signals
  – Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters
• Carry logic is independent of normal logic and routing resources

[Figure: carry chain between LSB and MSB]
57
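The role of dedicated carry logic can be illustrated with a bit-level adder model. This sketch assumes a common scheme in which the LUT computes a propagate signal while a dedicated multiplexer and XOR gate handle carry and sum; the slides do not detail the circuit, so treat the structure as an assumption.

```python
def add_with_carry_chain(a, b, width=8):
    """Add two unsigned integers bit by bit, from LSB to MSB.

    Returns (sum modulo 2**width, carry out).
    """
    carry, total = 0, 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        propagate = ai ^ bi                  # computed in the LUT
        total |= (propagate ^ carry) << i    # dedicated XOR: sum bit
        carry = carry if propagate else ai   # dedicated carry multiplexer
    return total, carry
```

Because the carry only passes through one multiplexer per bit and never enters the general routing, the delay per bit stays small, which is the point made on the slide.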
Basic I/O Block Structure
• Each IOB can work as uni- or bi-directional I/O
• Outputs can be forced into high impedance
• Inputs and outputs can be registered
• Inputs can be delayed

[Figure: IOB with three flip-flops (D, EC, SR, Q). Output path: three-state FF (three-state control) and output FF, with enable, clock, and set/reset. Input path: input FF for the registered input, alongside the direct input]
58
Advanced Xilinx Architecture
59
Virtex-II Pro
• 130nm CMOS, copper, low-K
• 1200 I/Os, 1696-pin package
• 125,000 logic cells
• 10 megabits of RAM
• 556 XTREME DSP multipliers
• 16 3.125 Gbps transceivers
• 4 PowerPC CPUs
60
Virtex-II Pro
• PowerPC 405
• Dedicated multipliers and memory
• Digital Clock Management (DCM) provides
  – 16 independent clock domains
  – Clock divide, multiply, phase shift
  – Enhanced Phase Locked Loops (PLLs)
• Routing Resources (90%)
61
Block RAM
• Most efficient memory implementation
  – Dedicated blocks of memory
• Ideal for most memory requirements
  – 4 to 14 memory blocks
    • 4096 bits per block
  – Use multiple blocks for larger memories
• Builds both single and true dual-port RAMs

[Figure: Spartan-II true dual-port block RAM]
62
Dual-Port Bus Flexibility
RAMB4_S4_S16
          Port A                   Port B
Depth     1K                       256
Width     4 bits                   16 bits
Control   WEA, ENA, RSTA, CLKA     WEB, ENB, RSTB, CLKB
Address   ADDRA[9:0]               ADDRB[7:0]
Data in   DIA[3:0]                 DIB[15:0]
Data out  DOA[3:0]                 DOB[15:0]

• Each port can be configured with a different data bus width
• Provides easy data width conversion without any additional logic
63
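Width conversion through the two ports can be sketched as follows. The mapping between the two views (the lowest port A address lands in the least significant nibble of the port B word) is an assumption made for this illustration; consult the device documentation for the actual bit ordering.

```python
class RamB4_S4_S16:
    """4096-bit block RAM seen as 1024 x 4 (port A) and 256 x 16 (port B)."""

    def __init__(self):
        self.bits = [0] * 4096

    def _write(self, start, width, value):
        for i in range(width):
            self.bits[start + i] = (value >> i) & 1

    def _read(self, start, width):
        return sum(self.bits[start + i] << i for i in range(width))

    def write_a(self, addr, nibble):      # ADDRA[9:0], DIA[3:0]
        self._write((addr & 0x3FF) * 4, 4, nibble)

    def read_a(self, addr):               # ADDRA[9:0], DOA[3:0]
        return self._read((addr & 0x3FF) * 4, 4)

    def write_b(self, addr, word):        # ADDRB[7:0], DIB[15:0]
        self._write((addr & 0xFF) * 16, 16, word)

    def read_b(self, addr):               # ADDRB[7:0], DOB[15:0]
        return self._read((addr & 0xFF) * 16, 16)
```

Writing four nibbles through port A and reading one word through port B performs a 4-bit to 16-bit conversion with no logic outside the memory.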
Two Independent Single-Port RAMs
RAMB4_S1_S1
             Port A                    Port B
Depth        2K                        2K
Width        1 bit                     1 bit
Address in   VCC, ADDR[10:0]           GND, ADDR[10:0]
Control      WEA, ENA, RSTA, CLKA      WEB, ENB, RSTB, CLKB
Address      ADDRA[10:0]               ADDRB[10:0]
Data in      DIA[0]                    DIB[0]
Data out     DOA[0]                    DOB[0]

• Can split a dual-port 4K RAM into two single-port 2K RAMs
  – Simultaneous independent access to each RAM
• To access the lower RAM
  – Tie the MSB address bit to logic low
• To access the upper RAM
  – Tie the MSB address bit to logic high
64
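The effect of tying the MSB address bit can be shown with a small model. This is a sketch only: the class name and its methods are hypothetical, and the single shared array stands in for the block RAM.

```python
class SplitBlockRam:
    """One 4096 x 1 array used as two independent 2048 x 1 RAMs."""

    def __init__(self):
        self.bits = [0] * 4096

    @staticmethod
    def _full_address(msb, addr):
        # The full 12-bit address is {MSB, ADDR[10:0]}.
        return (msb << 11) | (addr & 0x7FF)

    def write(self, upper, addr, bit):
        """upper=True models the MSB tied to VCC, False tied to GND."""
        self.bits[self._full_address(int(upper), addr)] = bit & 1

    def read(self, upper, addr):
        return self.bits[self._full_address(int(upper), addr)]
```

Because the two ports can never produce the same full address, their accesses cannot collide, which is what makes the two halves behave as independent memories.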
Rocket I/O
• From 4 to 24 RocketIO MGTs per Virtex-II Pro™ device
• Continuous operating range: 622 Mbps to 3.125 Gbps
• Virtex-4: 11.1 Gbps!!!
65
Embedded Processor
• Soft Processor: MicroBlaze 32-bit @150MHz
• Hard Processor: IBM PowerPC405 32-bit RISC @300MHz (in Virtex-II Pro)
  – Low power consumption: 0.9 mW/MHz
  – Five-stage data path pipeline
  – Hardware multiply/divide unit
  – Thirty-two 32-bit general purpose registers
  – Memory Management Unit (MMU)
  – Dedicated On-Chip Memory (OCM) interface
  – Supports IBM CoreConnect™ bus architecture
  – Debug and trace support
66
FPGA Design Tools
67
Design process (1)
Design and implement a simple unit that speeds up encryption with an RC5-like cipher with a fixed key set on an 8031 microcontroller. Unlike in experiment 5, this time your unit has to be able to perform the encryption algorithm by itself, executing 32 rounds…..
Specification (Lab Experiments)
VHDL description (Your Source Files)

    library IEEE;
    use ieee.std_logic_1164.all;
    use ieee.std_logic_unsigned.all;

    entity RC5_core is
      port (
        clock, reset, encr_decr : in  std_logic;
        data_input              : in  std_logic_vector(31 downto 0);
        data_output             : out std_logic_vector(31 downto 0);
        out_full                : in  std_logic;
        key_input               : in  std_logic_vector(31 downto 0);
        key_read                : out std_logic
      );
    end RC5_core;

Synthesis
Functional simulation
Post-synthesis simulation
68
Design process (2)
Implementation
Timing simulation
Configuration
On-chip testing
69
Active-HDL
70
Simulation and Synthesis Tools
71
Logic Synthesis
VHDL description

    architecture MLU_DATAFLOW of MLU is
      signal A1: STD_LOGIC;
      signal B1: STD_LOGIC;
      signal Y1: STD_LOGIC;
      signal MUX_0, MUX_1, MUX_2, MUX_3: STD_LOGIC;
    begin
      A1 <= A when (NEG_A='0') else not A;
      B1 <= B when (NEG_B='0') else not B;
      Y  <= Y1 when (NEG_Y='0') else not Y1;
      MUX_0 <= A1 and B1;
      MUX_1 <= A1 or B1;
      MUX_2 <= A1 xor B1;
      MUX_3 <= A1 xnor B1;
      with (L1 & L0) select
        Y1 <= MUX_0 when "00",
              MUX_1 when "01",
              MUX_2 when "10",
              MUX_3 when others;
    end MLU_DATAFLOW;

Circuit netlist
72
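A software reference model is one way to sanity-check what the synthesized netlist should compute. The Python function below mirrors the MLU dataflow description signal by signal; it is an illustrative sketch with inputs restricted to 0 and 1, not part of the original design flow.

```python
def mlu(a, b, neg_a, neg_b, neg_y, l1, l0):
    """Reference model of the MLU; all arguments are 0 or 1."""
    a1 = a ^ neg_a            # A1 <= A when NEG_A='0' else not A
    b1 = b ^ neg_b            # B1 <= B when NEG_B='0' else not B
    mux = (
        a1 & b1,              # MUX_0, selected when L1 L0 = "00"
        a1 | b1,              # MUX_1, selected when L1 L0 = "01"
        a1 ^ b1,              # MUX_2, selected when L1 L0 = "10"
        1 ^ (a1 ^ b1),        # MUX_3 (xnor), selected otherwise
    )
    y1 = mux[(l1 << 1) | l0]
    return y1 ^ neg_y         # Y <= Y1 when NEG_Y='0' else not Y1
```

Enumerating all 128 input combinations of this model gives a table that functional and post-synthesis simulation results can be compared against.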
Features of synthesis tools
• Interpret RTL code
• Produce synthesized circuit netlist in the standard EDIF format
• Give preliminary performance estimates
• Some can display circuit schematics corresponding to the EDIF netlist
73
Implementation
• After synthesis, the entire implementation process is performed by FPGA vendor tools
• Xilinx ISE Foundation 11.1i
• Altera Quartus II 9.2
• 3rd-party tools for the Alliance version
74
Circuit Compilation
1. Technology Mapping
2. Placement: assign a logical LUT to a physical location.
3. Routing: select wire segments and switches for interconnection.
75
Routing Example
Programmable Connections FPGA
76
Configuration
• Once a design is implemented, you must create a file that the FPGA can understand
  – This file is called a bitstream or configuration file
• The BIT file can be downloaded directly to the FPGA, or can be converted into a PROM file which stores the programming information
77
QUESTIONS?
THANK YOU
78