Design and Performance of a
PCI Interface with four 2 Gbit/s
Serial Optical Links
Stefan Haas, Markus Joos
CERN
Wieslaw Iwanski
Henryk Niewodniczański Institute of Nuclear Physics
LECC, 13.-17. Sept. 2004, Boston
Outline
● Introduction
● Interface Card Hardware
● Firmware Description
● Software
● Performance Measurements
● Summary
LECC 2004
-2-
S. Haas, 14. Sept. '04
Introduction
● DAQ systems for current and future experiments depend on reliable high-speed data transmission
● S-LINK specification addresses this type of application:
► Point-to-point data link, bandwidth 160 MB/s (32-bit @ 40 MHz)
► Flow control (XON/XOFF)
► Error detection (e.g. CRC)
► Self-test mode & return line signals
► CMC mezzanine card format
● ATLAS Read-Out Link (ROL)
► ROL implementation is based on S-LINK
► Connects front-end electronics interface modules (Read-Out Drivers) to the Read-Out System (ROS)
► ROS is based on commodity PCs and custom PCI interface cards (ROBin)
► ~1650 ROLs will be used in ATLAS
ROL Source Card
● High-speed Optical Link for ATLAS (HOLA)
● Standard S-LINK mezzanine card
● Industry standard pluggable (SFP) 850nm F/O transceiver
● Serial link speed 2 Gb/s with 8B10B line encoding
● Low-power: ~2W typical
[Block diagram: CMC mezzanine connector, S-LINK (32-bit @ 40MHz, 160MB/s), protocol FPGA, SER/DES, cage for SFP F/O transceiver]
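The link figures quoted on the slides can be cross-checked with a little arithmetic. The sketch below assumes that 8B10B carries 8 payload bits in every 10 line bits (a standard property of that code) and that 1 MB means 10^6 bytes; the function names are illustrative only.

```python
def slink_bandwidth_mb_s(width_bits: int, clock_mhz: float) -> float:
    """Parallel S-LINK bandwidth in MB/s (1 MB = 1e6 bytes)."""
    return width_bits / 8 * clock_mhz


def serial_payload_mb_s(line_rate_gbit_s: float) -> float:
    """Payload bandwidth of an 8B10B-encoded serial link in MB/s."""
    payload_bits_per_s = line_rate_gbit_s * 1e9 * 8 / 10
    return payload_bits_per_s / 8 / 1e6


slink = slink_bandwidth_mb_s(32, 40)   # 32-bit @ 40 MHz
serial = serial_payload_mb_s(2.0)      # 2 Gb/s line rate
print(f"S-LINK: {slink:.0f} MB/s, serial payload: {serial:.0f} MB/s")
```

Under these assumptions the 2 Gb/s serial link offers about 200 MB/s of payload, which leaves headroom above the 160 MB/s of the S-LINK interface.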
Quad S-LINK PCI Interface (FILAR)
● FILAR Features:
► Four 2 Gb/s HOLA link channels integrated on-board
► 64-bit/66MHz PCI interface (3.3V slots only)
► Moves data between 4 link interfaces and the host PC memory
► Based on S32PCI64 interface design: one slot for S-LINK mezzanine card
● Applications: small readout systems for lab & test beam
● FPGA-based (in-system reconfigurable)
► PCI I/F implemented using a commercial PCI IP core
● Firmware versions:
► Quad S-LINK receiver (S-LINK to PCI)
► Quad S-LINK transmitter (PCI to S-LINK)
► Quad S-LINK data source (for performance measurements)
FILAR Hardware
[Board photo with labels: SFP fiber optic transceiver, SERDES, HOLA interface FPGA, PCI interface FPGA, 64-bit/66MHz PCI interface (3.3V only!)]
Receiver Firmware Operation
● Host processor:
► 1) Fills a request FIFO on the interface card with addresses of free memory buffer pages
► 5) Reads the results from the acknowledge FIFO and processes the data
● Interface card:
► 2) Transfers data fragments from S-LINK to host memory as bus master using PCI bursts of up to 1kB for maximum performance
► 3) Stores length, status and control words for received fragments in an acknowledge FIFO
► 4) Asserts an interrupt (optional)
● Protocol overhead of ~2 PCI single-cycles (SC) per data fragment and channel:
► Write address of buffer memory page
► Read length and status of received fragment
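The five steps of the receiver protocol can be sketched as a toy software model. This is not the firmware or the driver: the class, the page size, the status value and the behaviour when no free page is available are all assumptions made for illustration.

```python
from collections import deque

PAGE_SIZE = 4096  # assumed buffer page size in bytes (not stated on the slide)


class ReceiverChannel:
    """Toy model of one receiver channel with request and acknowledge FIFOs."""

    def __init__(self):
        self.request_fifo = deque()   # addresses of free host buffer pages
        self.ack_fifo = deque()       # (address, length, status) per fragment
        self.host_memory = {}         # address -> fragment bytes

    # Step 1 (host): one PCI single-cycle write per free page address.
    def post_free_page(self, address: int) -> None:
        self.request_fifo.append(address)

    # Steps 2 and 3 (card): move a fragment to host memory, record the result.
    def receive_fragment(self, data: bytes) -> bool:
        if not self.request_fifo:
            return False              # assumed: without a free page nothing is stored
        address = self.request_fifo.popleft()
        self.host_memory[address] = data
        self.ack_fifo.append((address, len(data), 0))  # status 0 = OK (assumed)
        return True

    # Step 5 (host): one PCI single-cycle read per received fragment.
    def read_ack(self):
        return self.ack_fifo.popleft() if self.ack_fifo else None


chan = ReceiverChannel()
for page in range(2):
    chan.post_free_page(0x10000 + page * PAGE_SIZE)

accepted = [chan.receive_fragment(bytes(n)) for n in (100, 200, 300)]
first = chan.read_ack()
print(accepted, first)
```

With only two pages posted, the third fragment is not accepted in this model, which illustrates why the host has to keep the request FIFO filled.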
Receiver Firmware Block Diagram
[Block diagram: four S-LINK inputs, each with an input buffer FIFO, a PCI burst FIFO, a request FIFO and an acknowledge FIFO; DMA engine, control logic and control & status registers; 64-bit backend to the PCI IP core on the 64-bit/66 MHz PCI bus (528MB/s)]
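The 528MB/s figure in the block diagram is the theoretical peak of a 64-bit bus at a nominal 66 MHz. The short sketch below reproduces it and compares it with the ~450MB/s maximum reported in the performance measurements; the variable names are illustrative, and 1 MB is taken as 10^6 bytes.

```python
def pci_peak_mb_s(width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak PCI bandwidth in MB/s: one transfer per clock."""
    return width_bits / 8 * clock_mhz


peak = pci_peak_mb_s(64, 66)        # nominal 66 MHz clock
measured = 450.0                    # approximate maximum quoted on the slides
utilisation = measured / peak
print(f"peak {peak:.0f} MB/s, utilisation {utilisation:.0%}")
```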
Firmware Optimization
● Single-cycles do not use the PCI bus efficiently
● A performance-optimized version of the receiver firmware was developed (DMA protocol firmware):
► Interface card transfers request and acknowledge data using DMA
► CPU prepares a descriptor block with buffer addresses for one or more channels in system memory
► Firmware fetches the block using DMA and fills the on-board request FIFOs
► Firmware transfers a block with the length and status information from the acknowledge FIFOs to the system memory using DMA when a threshold is reached
► Requires additional memory resources in the FPGA; only 3 receive channels can be implemented on the current hardware
● Reduces PCI bus overhead and CPU load
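The descriptor-block idea can be illustrated with a small sketch. The real block layout is not given on the slides, so the layout used here (a 32-bit entry count followed by a 32-bit channel number and a 64-bit address per entry, little-endian) is purely hypothetical, as are the function names.

```python
import struct


def build_descriptor_block(pages_per_channel: dict) -> bytes:
    """Host side: pack (channel, address) pairs into one block in memory."""
    entries = [(ch, addr)
               for ch, addrs in sorted(pages_per_channel.items())
               for addr in addrs]
    block = struct.pack("<I", len(entries))        # entry count
    for ch, addr in entries:
        block += struct.pack("<IQ", ch, addr)      # 4 + 8 bytes per entry
    return block


def parse_descriptor_block(block: bytes) -> dict:
    """Card side: unpack the block and refill the per-channel request FIFOs."""
    (count,) = struct.unpack_from("<I", block, 0)
    fifos = {}
    for i in range(count):
        ch, addr = struct.unpack_from("<IQ", block, 4 + 12 * i)
        fifos.setdefault(ch, []).append(addr)
    return fifos


pages = {0: [0x1000, 0x2000], 2: [0x3000]}
block = build_descriptor_block(pages)
print(len(block), parse_descriptor_block(block))
```

In this example a single block fetched in one DMA transfer carries three buffer addresses, where the single-cycle protocol would need three separate PCI writes.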
Software
● FILAR software package:
► Linux device driver (loadable module)
► Library provides easy-to-use programming API for applications
► Test and benchmarking programs
● Software written in C
● Separate drivers for the different receiver firmware versions
● Supports multiple channels & PCI cards
● Interrupt driven: device driver is called when a predefined number of fragments are available in any channel
● Code optimised for maximising throughput
► Manage the card with minimal attention from the application layer
► Reduce the number of context switches
● Fully integrated into the ATLAS DataFlow software
● Requires cmem driver/library for allocation of contiguous memory
● Similar package available for the transmitter firmware
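The effect of the interrupt threshold can be illustrated with a simple counting model. It is a sketch only: it assumes the interrupt fires as soon as any one channel holds the threshold number of fragments, and that the driver then drains all channels in one call. Neither detail is stated on the slides, and the function name is hypothetical.

```python
def count_interrupts(arrivals, threshold: int) -> int:
    """Count driver invocations for a sequence of per-channel fragment arrivals."""
    pending = {}                      # channel -> fragments not yet processed
    interrupts = 0
    for channel in arrivals:
        pending[channel] = pending.get(channel, 0) + 1
        if pending[channel] >= threshold:
            interrupts += 1
            pending.clear()           # assumed: the driver drains every channel
    return interrupts


arrivals = [0, 1, 2] * 8              # 24 fragments, round-robin over 3 channels
for threshold in (1, 4):
    print(threshold, count_interrupts(arrivals, threshold))
```

A larger threshold means fewer driver invocations for the same number of fragments, which is the mechanism behind the reduced number of context switches.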
Measurement Setup
● PC with Supermicro server motherboard (ServerWorks GC-LE chipset)
● 4 independent 64-bit PCI bus segments
● Intel Xeon CPU (3 GHz)
● S-LINK input channels driven by HOLA data sources
● Chipset architecture is important to obtain the maximum performance
[Block diagram: HOLA data sources feed FILAR cards over S-LINK; the FILAR cards sit on 64-bit/66MHz PCI segments behind I/O bridges, which connect to the north bridge together with the CPU (Xeon 3GHz) and the memory (DDR266)]
Performance: Single-Cycle Firmware
● FILAR receiver with SC firmware
● Sawtooth structure due to overhead for setting up a PCI burst (1kB)
● Performance for one channel is limited by link bandwidth
● Throughput with 3 channels is limited by PCI interface
● Maximum throughput is ~450MB/s
[Plot: aggregate throughput [Mbyte/s] (0 to 500) vs. length [byte] (0 to 4500) for 1, 2 and 3 channels with SC firmware; annotations: 145MB/s per channel, 187MB/s per channel, 360MB/s @ 1kB]
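The sawtooth shape follows from paying a fixed setup cost for every PCI burst of up to 1kB. The model below is a sketch with made-up timing constants, not a fit to the measured curves: only the 1kB burst size and the 528MB/s bus peak come from the slides, while the two overhead values are assumptions.

```python
BURST_BYTES = 1024           # maximum PCI burst length from the slides
PEAK_MB_S = 528.0            # 64-bit/66 MHz PCI peak; 1 MB/s equals 1 byte/us
BURST_SETUP_US = 0.5         # assumed cost of starting one burst
FRAGMENT_OVERHEAD_US = 1.0   # assumed fixed cost per fragment


def model_throughput_mb_s(length_bytes: int) -> float:
    """Throughput of a single stream of fragments of the given length."""
    bursts = -(-length_bytes // BURST_BYTES)   # ceiling division
    time_us = (FRAGMENT_OVERHEAD_US
               + bursts * BURST_SETUP_US
               + length_bytes / PEAK_MB_S)
    return length_bytes / time_us


for length in (1024, 1025, 2048, 2049):
    print(length, round(model_throughput_mb_s(length), 1))
```

In this model the throughput drops each time the fragment length crosses a multiple of 1kB, because one extra burst has to be set up for only a few additional bytes, and then climbs again until the next boundary.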
Performance: DMA Protocol Firmware
● FILAR receiver with DMA firmware
● Better performance than SC firmware, in particular for short fragments
● 25% improvement for 3 channels at 1kB fragment length
● Performance for long fragments is similar for both firmware versions
[Plot: aggregate throughput [Mbyte/s] (0 to 500) vs. length [byte] (0 to 4500) for 1, 2 and 3 channels with DMA firmware; annotation: 440MB/s @ 1kB]
Throughput: Multiple FILAR cards
● DMA protocol F/W
● Maximum throughput of 1.1GB/s with 3 receiver cards
● Throughput scales with the number of channels for fragments of 2kB and more
● For fragments of 500B and less the system is rate limited
[Plot: bandwidth [Mbyte/s] (0 to 1200) vs. length [byte] (0 to 4500) for 3, 6 and 8 channels; annotations: 140MB/s per channel, 145MB/s per channel, 147MB/s per channel]
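The per-channel annotations on the plot can be turned into aggregate figures. The sketch below assumes that 147, 145 and 140 MB/s per channel belong to the 3-, 6- and 8-channel curves respectively; this pairing is inferred from the plot layout and is not stated explicitly.

```python
# Assumed pairing of channel count with per-channel throughput in MB/s.
per_channel_mb_s = {3: 147, 6: 145, 8: 140}

aggregate_mb_s = {n: n * rate for n, rate in per_channel_mb_s.items()}

for channels, total in aggregate_mb_s.items():
    print(f"{channels} channels: {total} MB/s aggregate")
```

With this pairing the 8-channel case gives 1120 MB/s, consistent with the quoted maximum of about 1.1GB/s, and the small drop in the per-channel figure shows how close the scaling is to linear.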
Fragment Rate: Multiple FILAR cards
● Received data fragment frequency per channel vs. fragment length
● Fragment rates of 100kHz can be sustained with 3 cards for fragments of less than 1kB
[Plot: fragment frequency [kHz] (0 to 250) vs. fragment length [byte] (0 to 4500) for 3, 6 and 8 channels; annotation: 100kHz @ 1kB]
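Fragment rate and throughput are linked by the fragment length, so either one can be derived from the other. The sketch below assumes 1kB = 1024 bytes and 1 MB = 10^6 bytes; the helper names are illustrative.

```python
def throughput_mb_s(rate_khz: float, length_bytes: int) -> float:
    """Per-channel throughput for a given fragment rate and length."""
    return rate_khz * 1e3 * length_bytes / 1e6


def fragment_rate_khz(throughput: float, length_bytes: int) -> float:
    """Per-channel fragment rate for a given throughput (MB/s) and length."""
    return throughput * 1e6 / length_bytes / 1e3


per_channel = throughput_mb_s(100, 1024)   # 100 kHz of 1 kB fragments
aggregate_8 = 8 * per_channel              # eight channels on three cards
print(f"{per_channel:.1f} MB/s per channel, {aggregate_8:.1f} MB/s for 8 channels")
```

Under these assumptions 100kHz at 1kB corresponds to roughly 102 MB/s per channel, which is below the per-channel throughput reached with long fragments and fits the statement that the system is rate limited for short fragments.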
S-LINK Transmitter Performance
● Transmitter connected to a FILAR receiver in another PC
● PCI interface is saturated with 2 active channels
● Maximum throughput obtained is 360MB/s
● PCI memory read performance is not as good as write
[Plot: aggregate throughput [Mbyte/s] (0 to 400) vs. fragment size [byte] (0 to 4500) for 1, 2 and 3 channels]
Summary
● FILAR high-performance PCI interface card with 4 on-board 2 Gb/s S-LINK channels (HOLA) has been designed
● Quad S-LINK receiver, transmitter and data source firmware versions have been developed and optimized
● Software package with Linux device driver and API library is integrated in the ATLAS DataFlow software
● Maximum throughput for one receiver card ~450MB/s
● Aggregate data rate of > 1GB/s to system memory has been measured with 3 receiver cards
● Event rates of over 100kHz can be achieved for 1kB fragments
● FILAR applications and users:
► Test readout of front-end electronics interface modules
► ATLAS subdetector groups (LAr, SCT, TileCal, TRT, Pixel, LVL1 Calo), DAQ & ROBin, MDT chamber tests
► Readout system for the ATLAS combined test beam
► Stable design, ~50 cards produced so far