Introduction To VIRTEX II Architecture
Presented By:
Ankur Agarwal
Xilinx Design Flow
Plan & Budget
Create Code/Schematic
HDL RTL Simulation
Synthesize to create netlist
Functional Simulation
Implement: Translate, Map, Place & Route
Attain Timing Closure
Timing Simulation
Create Bit File
Xilinx Architecture features
High performance at 2.5, 3.3V and 5V
Technology Independence
EDIF, VHDL, Verilog, SDF Interface
Footprint compatibility
Devices within each family are compatible with each other
Pin locking
VIRTEX
Up to 2 Million System Gates at 100+ MHz
Features:
Distributed and Block RAM available
Low Power
Delay-Locked Loops (DLLs)
2.5V Internal Operation with support of
common power
Naming Conventions
Example: XC4028XL-3-BG256
XC40 = Family (4000, 9500)
28 = No. of Gates
XL = Sub-Family (3V = XL, 5V = no XL)
-3 = Speed Grade
BG256 = Package
Spartan starts with XCS
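The naming convention above can be sketched as a small parser. This is only an illustration: `PART_RE`, `parse_part`, and the field names are hypothetical and simply mirror the slide's labels; they are not part of any Xilinx tool.

```python
import re

# Hypothetical pattern built from the slide's example, XC4028XL-3-BG256.
PART_RE = re.compile(
    r"^(?P<device>XCS?\d+)"        # family and gate-count digits, e.g. XC4028
    r"(?P<subfamily>[A-Z]*)"       # sub-family suffix, e.g. XL (empty if none)
    r"-(?P<speed>\d+)"             # speed grade
    r"-(?P<package>[A-Z]+\d+)$"    # package type and pin count
)

def parse_part(name):
    """Split a part name into the fields labelled on the slide."""
    match = PART_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised part name: {name!r}")
    return match.groupdict()

print(parse_part("XC4028XL-3-BG256"))
```

Part names that follow a different pattern are rejected rather than guessed at.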
CPLD and FPGA
CPLD = Complex Programmable Logic Device
FPGA = Field-Programmable Gate Array
Architecture
CPLD: PAL/22V10-like, more combinational
FPGA: Gate array-like, more registers + RAM
Density
CPLD: Low-to-medium, 0.5-10K logic gates
FPGA: Medium-to-high, 1K to 3.2M system gates
Performance
CPLD: Predictable timing, up to 250 MHz today
FPGA: Application dependent, up to 200 MHz
Interconnect
CPLD: “Crossbar Switch”
FPGA: Incremental
Overview of Xilinx FPGA Architecture
I/O Blocks (IOBs)
Programmable Interconnect
Configurable Logic Blocks (CLBs)
Tristate Buffers
Global Resources
Block Diagram of VIRTEX-II Architecture
[Figure: block diagram. Labels: SONET/SDH, DCM, distributed RAM, PCI-X, PCI, 18Kb BRAM, LVDS, CAM, FIFO, shift registers, DDR, SDRAM, QDR, SRAM, multiplier, BLVDS, backplane]
CLB Resources
Basic resource unit is the Logic Cell
1 CLB contains 2 - 4 Logic Cells, depending on device family
Logic Cell = 4-input Look-Up Table (LUT) + D Flip-flop
LUT capacity limited by number of inputs, not complexity of function
LUTs can be used as ROM or synchronous RAM
Flip-flop can be configured as a transparent latch in Virtex and Spartan-II
[Figure: LUT feeding a flip-flop (FF)]
Closer Look at a CLB Structure
[Figure: CLB detail showing two slices. Each slice contains two look-up tables (inputs G4-G1 and F4-F1, output O), carry & control logic, and two flip-flops (D, CK, EC, S, R, Q). Slice-level signals: CIN, COUT, CLK, CE, SR, BY, F5IN; outputs Y, YB, X, XB]
Each slice has 2 LUT-FF pairs with associated carry logic
Two 3-state buffers (BUFT) associated with each CLB, accessible by all CLB outputs
Interconnect Technology Offered by VIRTEX-II
Interconnect is an array of switch matrices
All Virtex-II features can access routing resources through the switch matrix
Simplifies design and place & route
[Figure: array of switch matrices, each attached to a CLB, IOB, DCM, 18Kb BRAM, or 18x18 MULT]
Simplified SLICE Structure
Each Slice has four outputs:
Two registered outputs
Two non-registered outputs
Two BUFTs associated, accessible by all 16 CLB outputs
Carry Logic for fast addition
Two independent carry chains per CLB
Fast Carry Logic
Each CLB contains separate logic and routing for the fast generation of carry signals
Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters
Carry logic is independent of normal logic and routing resources
[Figure: carry logic routing running from LSB to MSB]
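A behavioural sketch of a carry chain may help show why dedicated carry logic speeds up adders. The split assumed here (the LUT computes a per-bit propagate signal, and a dedicated mux and XOR handle the carry and sum) is an illustrative assumption about how the work is partitioned, not something stated on the slide. `add_with_carry_chain` is a hypothetical name.

```python
def add_with_carry_chain(a, b, width, cin=0):
    """Add two unsigned values bit by bit, LSB to MSB, returning (sum, carry_out)."""
    carry = cin
    total = 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        propagate = ai ^ bi                   # assumed to be computed in the LUT
        sum_bit = propagate ^ carry           # assumed dedicated XOR
        carry = carry if propagate else ai    # assumed dedicated carry mux
        total |= sum_bit << i
    return total, carry

print(add_with_carry_chain(5, 3, 4))
```

Only the carry signal passes from one bit to the next, which is why giving it its own fast path shortens the critical path of wide adders and counters.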
CLB (Configurable Logic Blocks)
Each CLB is connected to one switch matrix
Providing access to general routing resources
[Figure: CLB with four slices (S0 at X0Y0, S1 at X0Y1, S2 at X1Y0, S3 at X1Y1) connected to a switch matrix, with two TBUFs, CIN/COUT carry chains, a SHIFT chain, and fast connects]
High level of logic integration
Wide-input functions:
- 16:1 multiplexer in 1 CLB or any function
- 32:1 multiplexer in 2 CLBs (1 level of LUT)
Fast arithmetic functions
- 2 look-ahead carry chains per CLB column
Addressable shift registers in LUT
- 16-b shift register in 1 LUT
- 128-b shift register in 1 CLB (dedicated shift chain)
Four-Input LUT
Implements combinatorial logic
Any 4-input logic function
Cascaded for wide-input functions
4-input logic function
Truth Table
Inputs (ABCD)   Output (Z)
0000            0
0001            0
0010            1
0011            0
...             ...
1110            1
1111            1
[Figure: inputs A, B, C, D feed the LUT, which drives output Z]
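A look-up table is simply a 16-entry memory addressed by the four inputs, which is why any 4-input function fits regardless of its complexity. The sketch below builds such a table in software. The function Z = (A AND B) OR (C AND NOT D) is a hypothetical choice that happens to agree with the six rows visible in the truth table; the elided rows of the slide are not known.

```python
def make_lut4(func):
    """Return the 16-entry table for a 4-input Boolean function."""
    table = []
    for index in range(16):
        a = (index >> 3) & 1
        b = (index >> 2) & 1
        c = (index >> 1) & 1
        d = index & 1
        table.append(func(a, b, c, d) & 1)
    return table

def lut4(table, a, b, c, d):
    """Evaluate the LUT: the inputs form the address, the table holds the answer."""
    return table[(a << 3) | (b << 2) | (c << 1) | d]

# Hypothetical function, chosen only to match the rows shown on the slide.
def example_z(a, b, c, d):
    return (a & b) | (c & (1 - d))

EXAMPLE_TABLE = make_lut4(example_z)
print(lut4(EXAMPLE_TABLE, 0, 0, 1, 0))
```

Changing the function changes only the table contents; the lookup cost stays the same.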
Multiplexers
MUXF5 combines 2 LUTs to create a 4x1 multiplexer
Or any 5-input function (LUT5)
Or selected functions up to 9 inputs
MUXF6 combines 2 slices to form an 8x1 multiplexer
Or any 6-input function (LUT6)
Or selected functions up to 19 inputs
Dedicated muxes are faster and more space efficient
[Figure: CLB with two slices; each slice has two LUTs feeding a MUXF5, and a MUXF6 combines the two slices]
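The two uses of MUXF5 can be modelled in a few lines. This is a behavioural sketch, not a description of the silicon: the names `muxf5`, `mux4to1`, and `function5` are hypothetical, and which LUT is selected by a 0 or 1 on the select input is an assumption made for illustration.

```python
from itertools import product

def muxf5(f_out, g_out, sel):
    """Dedicated 2:1 mux: pick LUT F when sel is 0, LUT G when sel is 1 (assumed polarity)."""
    return g_out if sel else f_out

def mux4to1(d0, d1, d2, d3, s0, s1):
    """4:1 multiplexer built from two LUTs, each acting as a 2:1 mux, plus MUXF5."""
    f_out = d1 if s0 else d0   # LUT F uses 3 of its 4 inputs
    g_out = d3 if s0 else d2   # LUT G uses 3 of its 4 inputs
    return muxf5(f_out, g_out, s1)

def function5(func, a, b, c, d, e):
    """Any 5-input function: each LUT holds one half, MUXF5 selects on input e."""
    f_out = func(a, b, c, d, 0)   # half of the function where e = 0
    g_out = func(a, b, c, d, 1)   # half of the function where e = 1
    return muxf5(f_out, g_out, e)

# Hypothetical 5-input function used only to exercise the model.
def parity5(a, b, c, d, e):
    return a ^ b ^ c ^ d ^ e

print(all(function5(parity5, *bits) == parity5(*bits) for bits in product((0, 1), repeat=5)))
```

The same idea repeats one level up: MUXF6 selects between two MUXF5 outputs, giving an 8:1 multiplexer or any 6-input function.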
CLB Multiplexers
CLB Multiplexer Location
MUXF6 combines Slices X0Y0 & X0Y1
MUXF6 combines Slices X1Y0 & X1Y1
MUXF7 combines the 2 MUXF6 outputs
MUXF8 combines the 2 MUXF7 outputs (two CLBs)
[Figure: CLB with slices S0-S3; each slice is labelled with an F5 multiplexer plus one of F6, F6, F7, or F8]
Horizontal Cascade Chain
Wide AND-OR functions (Sum Of Products)
[Figure: SOP chain running horizontally across three CLBs, each containing slices S0-S3]
Shift Register LUT
Each LUT can be configured as shift register
Serial in, serial out
Dynamically addressable delay up to 16 cycles
For programmable pipeline
Cascade for greater cycle delays
Use CLB flip-flops to add depth
[Figure: LUT drawn as a chain of flip-flops with inputs IN, CE, CLK; DEPTH[3:0] selects which stage drives OUT]
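A minimal behavioural model of a LUT used as an addressable shift register is sketched below. The class name `SRL16` and its methods are hypothetical, and the mapping assumed here (a DEPTH value of n selects a delay of n + 1 clock cycles) is an illustrative assumption consistent with a 16-stage chain addressed by DEPTH[3:0].

```python
class SRL16:
    """16-stage serial-in shift register with a dynamically addressable tap."""

    def __init__(self):
        self.stages = [0] * 16

    def clock(self, data_in, ce=1):
        """One rising clock edge: shift by one stage when clock enable is high."""
        if ce:
            self.stages = [data_in & 1] + self.stages[:-1]

    def out(self, depth):
        """Read the stage selected by the 4-bit DEPTH address."""
        return self.stages[depth & 0xF]

srl = SRL16()
for bit in (1, 0, 0, 0):
    srl.clock(bit)
print(srl.out(3))
```

Because the tap address can change while the register runs, the delay can be adjusted without reloading the contents.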
Shift Register
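[Figure: a 64-bit path through Operation A (4 cycles) and Operation B (8 cycles) takes 12 cycles, while the parallel 64-bit path through Operation C takes 3 cycles, leaving a 9-cycle imbalance to be filled with registers]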
Allows for addition of pipeline stages to increase throughput
Data paths must be balanced to keep desired functionality
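The balancing arithmetic for the figure's numbers can be worked through as follows. The helper `balance` is hypothetical, and the LUT count assumes one shift-register LUT delays one bit by up to 16 cycles, as stated on the shift register slides.

```python
import math

def balance(long_cycles, short_cycles, width, srl_depth=16):
    """Return (imbalance, flip-flops needed, shift-register LUTs needed)."""
    imbalance = long_cycles - short_cycles
    flip_flops = imbalance * width                          # one FF per bit per cycle
    srl_luts = math.ceil(imbalance / srl_depth) * width     # one LUT per bit per 16 cycles
    return imbalance, flip_flops, srl_luts

# Operation A (4) + Operation B (8) versus Operation C (3), 64-bit paths
print(balance(4 + 8, 3, 64))  # (9, 576, 64)
```

Under these assumptions the 9-cycle delay costs 576 flip-flops if built from registers alone, or 64 LUTs if each bit uses one shift-register LUT.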
Shift Register Look-Up Table
High density integration of shift registers
DSP applications use SRL16 for delay matching
CDMA wireless and video applications require shift registers
Up to 128-b per CLB
Cascadable output
Dynamic addressable output
16-b per LUT
Multiple SRLC16 cascadable to any length
Digital Clock Manager
High-Speed 420 MHz clock generation:
Clock de-skew on-chip and off-chip
Up to 12 DCM per device
Fully digital circuitry
Flexible Frequency Synthesis
Synthesis outputs: clock 0° & 180° (def.: 4X)
High-Resolution Phase Shifting
DPS fixed and variable modes
Delay-Locked Loop (DLL)
Precise Clock De-Skew
DLL outputs: clock 0°, 90°, 180°, 270°
DLL outputs: clock 2X and clock division
50/50 duty cycle correction
Digital Clock Manager: DCM
Delay-Locked Loop
Clock phase de-skew
Duty cycle correction
Temperature compensation
RST input
LOCKED output
[Figure: DCM block symbol with clock and control signals]
Inputs: CLKIN, CLKFB, RST, DSSEN, PSINCDEC, PSEN, PSCLK
Outputs: CLK0, CLK90, CLK180, CLK270, CLK2X, CLK2X180, CLKDV, CLKFX, CLKFX180, LOCKED, STATUS[7:0], PSDONE
Attributes:
DUTY_CYCLE_CORRECTION
DLL_FREQUENCY_MODE
CLKDV_DIVIDE = 1.5 to 16.0
STARTUP_WAIT
CLK_FEEDBACK = CLK0 or CLK2X
Up to 4 clock outputs per DCM
Advanced Frequency Synthesis
[Figure: DCM block symbol with clock and control signals]
Frequency Synthesis
CLKFX is any M / D product of CLKIN frequency
M = 2 to 32, D = 1 to 32
Default: M=4, D=1 (4X CLKIN)
Always nominal 50/50 duty-cycle
Attributes:
CLKFX_MULTIPLY (integer)
CLKFX_DIVIDE (integer)
DFS_FREQUENCY_MODE
After LOCKED:
Freq(CLKFX) = (M/D) x Freq(CLKIN)
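The frequency-synthesis relation can be checked with a short calculation. The function `clkfx_mhz` is a hypothetical helper; it enforces only the M and D ranges given on the slide and does not model any device-specific frequency limits.

```python
def clkfx_mhz(clkin_mhz, m=4, d=1):
    """CLKFX frequency for multiplier M and divider D (defaults: M=4, D=1)."""
    if not (isinstance(m, int) and 2 <= m <= 32):
        raise ValueError("M must be an integer from 2 to 32")
    if not (isinstance(d, int) and 1 <= d <= 32):
        raise ValueError("D must be an integer from 1 to 32")
    return clkin_mhz * m / d

print(clkfx_mhz(50))          # default 4X: 200.0
print(clkfx_mhz(100, 3, 2))   # 150.0
```

M and D correspond to the CLKFX_MULTIPLY and CLKFX_DIVIDE attributes.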
High Resolution Phase Shifting
[Figure: DCM block symbol with clock and control signals]
Fine Phase Shifting
Applies to all CLK outputs
Phase shift = fraction of CLKIN period
Fixed or variable modes
Inputs in variable mode:
PSINCDEC input = Increase/Decrease
PSEN = Enable Phase Shift
PSCLK synchronizes Phase Shift
PSDONE output
Attributes:
CLOCKOUT_PHASE_SHIFT = NONE, FIXED, VARIABLE
PHASE_SHIFT (signed integer) = -255 to +255
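As a worked example of "fraction of CLKIN period", the sketch below converts a PHASE_SHIFT value to a time offset. The denominator of 256 is an assumption made here to match the -255 to +255 range; the slide itself does not state it. `phase_shift_ns` is a hypothetical helper.

```python
def phase_shift_ns(clkin_period_ns, phase_shift):
    """Time offset for a PHASE_SHIFT setting, assuming steps of 1/256 of the period."""
    if not -255 <= phase_shift <= 255:
        raise ValueError("PHASE_SHIFT must be between -255 and +255")
    return clkin_period_ns * phase_shift / 256

print(phase_shift_ns(10.0, 64))    # quarter period of a 100 MHz clock: 2.5
print(phase_shift_ns(10.0, -128))  # half a period earlier: -5.0
```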
Global Clocks
Up to 16 Dedicated Low Skew Clocks
16 global clock multiplexers & buffers
8 clock nets in each quadrant
Global clock ENABLE
Switch glitch-free from one clock to another
16 clock pads (can be used as user I/O)
Clock Distribution
16 Global Clock Multiplexers
Eight on the top
Eight on the bottom
Switch “glitch free” from 1 clock to the other
8 clocks selectable per quadrant (8 max)
Unused branches are disabled (power saving)
[Figure: 8 BUFGMUX at the top and 8 BUFGMUX at the bottom drive 16 clocks; each quadrant (NW, NE, SW, SE) selects 8 of them]
Use Global Buffers to Reduce Clock Skew
• Global buffers are connected to dedicated routing.
• This routing network is balanced to minimize skew.
• All Xilinx FPGAs have global buffers.
[Figure: two versions of a design that contains 2 clock signals (CLK1, CLK2) clocking flip-flops. One introduces clock skew between CLK1 and CLK2; the other uses an extra BUFG to reduce skew on CLK2]
Global Clocks: BUFGMUX
Three modes:
Clock buffer: low skew clock distribution (BUFG primitive)
Clock enable: stop the clock High or Low (BUFGCE stops Low)
Clock multiplexer: switch “glitch-free” from one clock to another, including unrelated clocks (BUFGMUX)
No pulse width shorter than 1/2 of the period
[Figure: BUFG (I, O), BUFGCE (I, CE, O), and BUFGMUX (I0, I1, S, O) symbols]
Memory
On-Chip SelectRAM™ Memory
Terabit Memory Continuum:
Distributed RAM (bytes): shallow/wide, 128x1; DSP coefficients, small FIFOs, CAM
Block RAM (kilobytes): deep/wide, 18 kb blocks; large FIFOs, packet buffers, video line buffers, cache tag memory, CAM
External RAM/CAM (megabytes): DDR & QDR, up to 400 Mbps/pin
Embedded 18 kb Block RAM
Up to 3 Mb on-chip block RAM
High internal buffering bandwidth
Reduced I/O count and more embedded memory
18Kbit block RAM
Parity bit locations (parity in/out busses)
Data width up to 36 bits
3 WRITE modes
Output latches Set/Reset
True Dual-Port RAM
Independent clock (async.) & control
Distributed RAM
CLB LUT configurable as Distributed RAM
A LUT equals 16x1 RAM
Implements Single and Dual-Ports
Cascade LUTs to increase RAM size
Synchronous write
Synchronous/Asynchronous read
Accompanying flip-flops used for synchronous read
One LUT = RAM16X1S: inputs D, WE, WCLK, A0-A3; output O
Two LUTs = RAM32X1S: inputs D, WE, WCLK, A0-A4; output O
or RAM16X2S: inputs D0, D1, WE, WCLK, A0-A3; outputs O0, O1
or RAM16X1D: inputs D, WE, WCLK, A0-A3, DPRA0-DPRA3; outputs SPO, DPO
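The dual-port primitive can be modelled behaviourally to show what "synchronous write, asynchronous read" means in practice. This is a sketch only: the class and method names are hypothetical, with port names borrowed from the RAM16X1D symbol.

```python
class RAM16X1D:
    """16x1 dual-port distributed RAM: one write/read port, one read-only port."""

    def __init__(self):
        self.mem = [0] * 16

    def wclk_edge(self, d, we, a):
        """Synchronous write: data is stored only on a WCLK edge with WE high."""
        if we:
            self.mem[a & 0xF] = d & 1

    def spo(self, a):
        """Asynchronous read at the write-port address A[3:0]."""
        return self.mem[a & 0xF]

    def dpo(self, dpra):
        """Asynchronous read at the independent address DPRA[3:0]."""
        return self.mem[dpra & 0xF]

ram = RAM16X1D()
ram.wclk_edge(d=1, we=1, a=5)
print(ram.spo(5), ram.dpo(5), ram.dpo(6))
```

For a synchronous read, the read output would be captured in the accompanying flip-flop on the next clock edge.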
18 x 18 Embedded Multiplier
Fast arithmetic functions
Optimized to implement multiply / accumulate modules
18 x 18 signed multiplier
Fully combinatorial
Optional registers with CE & RST (pipeline)
Independent from adjacent block RAM
18 x 18 Multiplier
Embedded 18-bit x 18-bit multiplier
2’s complement signed operation
Multipliers are organized in columns
[Figure: Data_A (18 bits) and Data_B (18 bits) feed the 18 x 18 multiplier, which produces Output (36 bits)]
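The bit widths in the figure follow from 2's complement arithmetic: the product of two 18-bit signed values always fits in 36 bits. The sketch below models this in software; `to_signed` and `mult18x18` are hypothetical helper names.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a 2's complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value

def mult18x18(data_a, data_b):
    """Signed 18 x 18 multiply, returned as a 36-bit 2's complement pattern."""
    product = to_signed(data_a, 18) * to_signed(data_b, 18)
    return product & ((1 << 36) - 1)

print(mult18x18(3, 5))
print(to_signed(mult18x18(0x3FFFF, 2), 36))  # 0x3FFFF is -1 in 18 bits
```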
Basic I/O Block Structure
[Figure: IOB with three registers sharing Clock and Set/Reset. Three-state path: three-state control and three-state FF enable. Output path: output and output FF enable. Input path: direct input, registered input, and input FF enable]
I/O Signal Types
Single-Ended: LVTTL, LVCMOS, HSTL, SSTL
Differential: LVDS, Bus LVDS, LVPECL
NOTE: Only the popular I/O types are shown here
IOB: Double Data Rate Registers
DDR registers can be clocked by:
Clock and not(clock) if the duty cycle is 50/50
CLK0 and CLK180 DLL outputs
[Figure: timing diagram of CLK with DATA_1 (D1A, D1B, D1C) and DATA_2 (D2A, D2B, D2C) interleaved on the dual data rate output]
Built-In HSTL II Support
What is the advantage of using HSTL Class II?
High-speed IO interface
Bi-directional
Double parallel termination
[Figure: Zo = 50 line terminated at both ends with R = 50 to Vtt = 0.75V; Vref = 0.75V]
Digitally Controlled Impedance
Dynamically adjusted termination resistors
Provides drivers that are matched to the impedance of the traces
Provides on-chip termination
Transmitter or receiver
On-Chip termination advantages:
No termination resistors on board
Improve signal integrity by eliminating stub reflection
Eliminates the need for source termination (single-ended I/O)
Reduces board routing headaches and component count
Virtex-II Family: Four and Six Columns Block RAM & Multiplier
[Figure: device XC2V250]
Virtex-II Family Members
Device     CLB Array   18Kb BRAM   Multiplier   DCM   Max IOB
XC2V40     8 x 8       4           4            4     88
XC2V80     16 x 8      8           8            4     120
XC2V250    24 x 16     24          24           8     200
XC2V500    32 x 24     32          32           8     264
XC2V1000   40 x 32     40          40           8     432
XC2V1500   48 x 40     48          48           8     528
XC2V2000   56 x 48     56          56           8     624
XC2V3000   64 x 56     96          96           12    720
XC2V4000   80 x 72     120         120          12    912
XC2V6000   96 x 88     144         144          12    1,104
XC2V8000   112 x 104   168         168          12    1,296
BRAM & multipliers are arranged in 2, 4, or 6 columns, depending on device
VIRTEX-II Packaging
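Device     Max user I/Os   User I/Os by package
XC2V40     88              CS144: 88, FG256: 88
XC2V80     120             CS144: 92, FG256: 120
XC2V250    200             CS144: 92, FG256: 172, FG456: 200
XC2V500    264             FG256: 172, FG456: 264
XC2V1000   432             FG256: 172, FG456: 324, FF896: 432, BG575: 328
XC2V1500   528             FG676: 392, FF896: 528, BG575: 392
XC2V2000   624             FG676: 456, FF896: 624, BG575: 408, BG728: 456, BF957: 624
XC2V3000   720             FG676: 484, FF1152: 720, BG728: 516, BF957: 684
XC2V4000   912             FF1152: 824, FF1517: 912, BF957: 684
XC2V6000   1,104           FF1152: 824, FF1517: 1,104, BF957: 684
XC2V8000   1,108 (1,296)   FF1152: 824, FF1517: 1,108, BF957: 684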
FF and BF are flip-chip ball grid array packages
Pinout compatibility inside same color rectangle