SCARIe FABRIC A pilot study of distributed correlation Huib Jan van Langevelde Ruud Oerlemans Nico Kruithof Sergei Pogrebenko and many others…


SCARIe FABRIC
A pilot study of distributed correlation
Huib Jan van Langevelde
Ruud Oerlemans
Nico Kruithof
Sergei Pogrebenko
and many others…
What correlators do…
• Synthesis imaging simulates a very large telescope
  • by measuring Fourier components of the sky brightness
  • on each baseline (pair of telescopes)
• Sensitivity is proportional to √bandwidth
  • optimal use of the available recording bandwidth
  • by sampling with 2 bits (4 levels) at the Nyquist rate
• The correlator calculates ½N(N-1) baseline outputs
  • after compensating for the geometry of the array
• Integrates the output signal down to a relatively slow rate
  • and samples it with delay/frequency resolution
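The chain on this slide (2-bit sampling, ½N(N-1) baselines, compensation for the array geometry, integration) can be illustrated with a minimal pure-Python sketch of a lag (XF-style) correlation on one simulated baseline. The ±1/±3 level weights, the switching threshold, the integer geometric delay and the simulated signals are all assumptions for illustration; a real correlator also applies fractional-delay and fringe-rate corrections that are omitted here.

```python
import itertools
import random

# Assumed 2-bit level weights and threshold (the slide only says "2 bits (4 level)").
LEVELS = (-3, -1, 1, 3)
THRESHOLD = 1.0  # assumed switching level, in units of the input r.m.s.


def quantize_2bit(x, threshold=THRESHOLD):
    """Map a real voltage sample onto one of four levels (2-bit sampling)."""
    if x < -threshold:
        return LEVELS[0]
    if x < 0.0:
        return LEVELS[1]
    if x < threshold:
        return LEVELS[2]
    return LEVELS[3]


def baselines(n_stations):
    """All station pairs; there are N(N-1)/2 of them."""
    return list(itertools.combinations(range(n_stations), 2))


def correlate_at_lag(a, b, lag):
    """Integrate (average) a[t] * b[t + lag] over the overlapping samples."""
    n = min(len(a), len(b) - lag)
    return sum(a[t] * b[t + lag] for t in range(n)) / n


# Simulated baseline: both stations see the same sky signal, but station B
# receives it `true_delay` samples later than station A (geometric delay).
rng = random.Random(42)
n_samples, true_delay = 20000, 5
sky = [rng.gauss(0.0, 1.0) for _ in range(n_samples + true_delay)]
station_a = [quantize_2bit(sky[t + true_delay] + rng.gauss(0.0, 1.0)) for t in range(n_samples)]
station_b = [quantize_2bit(sky[t] + rng.gauss(0.0, 1.0)) for t in range(n_samples + true_delay)]

if __name__ == "__main__":
    print(len(baselines(16)), "baselines for 16 stations")
    for lag in (0, true_delay):
        # Only the lag that compensates the geometric delay shows correlation.
        print(lag, round(correlate_at_lag(station_a, station_b, lag), 3))
```

Evaluating many lags per baseline and integrating each one is what makes this an XF scheme; the same products could instead be formed after an FFT (FX), as in the software correlator discussed later.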
huib 02/11/06
GiGaPort meeting SURF Utrecht 2 Nov 2006
EVN MkIV data processor at JIVE
• Implements this in custom silicon
  • 16 stations, input from tapes
  • now from hard disks and fibres
• Input data rate is 1 Gb/s max
  • 1- or 2-bit sampled
  • up to 16 sub-bands
  • format includes time codes
• “Super computer” with 1024 chips
  • 256 complex correlations each
  • at 32 MHz clock
  • around 100 T-operations/sec (2 bit only! and it depends a bit how you do it)
Should the next correlator also use special hardware?
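A back-of-envelope check of the quoted throughput, assuming each chip completes its 256 complex correlations once per 32 MHz clock cycle:

```latex
1024 \times 256 \times 32\times10^{6}\,\mathrm{s^{-1}} \approx 8.4\times10^{12}\ \text{complex correlation steps per second}
```

Quoting around 100 T-operations/sec therefore corresponds to counting roughly 12 elementary 2-bit operations per complex correlation step (100/8.4 ≈ 12), which is presumably why the slide adds that it depends on how you count.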
Next generation…
• Time critical: must keep up with the input
  • example: LOFAR on BlueGene
• Can it be implemented on standard computing?
  • Higher precision and new applications
  • Better sensitivity, interference mitigation, spacecraft navigation
• Can the CPU cycles be found on the Grid?
  • From 16 antennas @ 1 Gb/s (eVLBI)
  • And growing… to 1000s at 100 Gb/s (SKA)
  • Pilot projects FABRIC & SCARIe
    • Connectivity, workflow
    • Real-time resource allocation
[Figure: LOFAR central processor (BG/L racks, GbE and 10 GbE switches, clusters of servers with Infiniband interconnect, 10 TB RAID per node, 4 GB RAM/node, general purpose nodes); FABRIC eVLBI; SKA inner core (5 km)]
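The growth from eVLBI to SKA can be put in numbers, taking 1000 as a round stand-in for "1000s" of antennas (an assumption) and ½N(N-1) baselines:

```latex
\begin{aligned}
\text{eVLBI:}\quad & 16 \times 1\ \mathrm{Gb/s} = 16\ \mathrm{Gb/s}, &\quad \tfrac{1}{2}\cdot 16\cdot 15 &= 120\ \text{baselines}\\
\text{SKA:}\quad & 1000 \times 100\ \mathrm{Gb/s} = 100\ \mathrm{Tb/s}, &\quad \tfrac{1}{2}\cdot 1000\cdot 999 &= 499\,500\ \text{baselines}
\end{aligned}
```

The aggregate input grows by a factor of about 6000 and the number of baselines by about 4000.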
Tflops, Pflops…
• 2-bit operations ⇒ floating point
• Results in enormous computing tasks
  • very few operations per bit
  • some could be associated with the telescope

Typical VLBI problems (rough estimate based on XF correlation; N spect/prod = spectral points per correlation product; SKA not even in here…)

description              N telescopes  N subbands  data rate [Mb/s]  N spect/prod    Tflops
1 Gb/s full array                  16          16              1024            16     83.89
typical eVLBI continuum             8           8               128            16      2.62
typical spectral line              10           2                16           512     16.38
FABRIC demo                         4           2                16            32      0.16
future VLBI                        32          32              4096           256  21474.84
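Every Tflops entry in the table is reproduced, to the quoted two decimals, by Tflops = 2×10⁻⁵ × N_telescopes² × data rate [Mb/s] × N spect/prod, with the number of subbands not entering at all. The constant is inferred here from the table itself (the slide only says "rough estimate based on XF correlation"), so treat the helper below as a reading of the table, not the author's formula. The subband count dropping out is what one would expect for XF correlation, where the work per baseline is set by the total sample rate times the number of lags, however the band is split.

```python
# Rows of the "typical VLBI problems" table:
# (description, N telescopes, N subbands, data rate [Mb/s], N spectral points per product)
PROBLEMS = [
    ("1 Gb/s full array", 16, 16, 1024, 16),
    ("typical eVLBI continuum", 8, 8, 128, 16),
    ("typical spectral line", 10, 2, 16, 512),
    ("FABRIC demo", 4, 2, 16, 32),
    ("future VLBI", 32, 32, 4096, 256),
]

# Empirical constant fitted to the table (assumption: not stated on the slide).
TFLOPS_PER_UNIT = 2e-5


def xf_tflops(n_tel, rate_mbps, n_spec):
    """Rough XF-correlation load in Tflops; the subband count does not enter."""
    return TFLOPS_PER_UNIT * n_tel**2 * rate_mbps * n_spec


if __name__ == "__main__":
    for name, n_tel, n_sub, rate, n_spec in PROBLEMS:
        print(f"{name:25s} {xf_tflops(n_tel, rate, n_spec):10.2f} Tflops")
```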
SCARIe FABRIC
• EC-funded project EXPReS (03/2006)
  • To turn eVLBI into an operational system
  • Plus: Joint Research Activity FABRIC
    • Future Arrays of Broadband Radio-telescopes on Internet Computing
    • One work package on 4 Gb/s data acquisition and transport (Jodrell Bank, Metsahovi, Onsala, Bonn, ASTRON)
    • One work package on distributed correlation (JIVE, PSNC Poznan)
• Dutch NWO-funded project SCARIe (10/2006)
  • Software Correlator Architecture Research and Implementation for eVLBI
  • Collaboration with SARA and UvA
  • Use the Dutch Grid with configurable high connectivity: StarPlane
  • Software correlation with data originating from JIVE
• Complementary projects with matching funding
  • International and national expertise from other partners
  • Total of 9 man-years at JIVE, plus some matching from staff
  • plus a similar amount at the partners
Aim of the project
• Research the possibility of distributed correlation
  • Using the Grid for getting the CPU cycles
  • Can it be employed for next-generation VLBI correlation?
• Exercise the advantages of software correlation
  • Using floating-point accuracy and special filtering
• Explore (push) the boundaries of the Grid paradigm
  • “Real-time” applications, data transfer limitations
• To lead to a modest-size demo
  • With some possible real applications:
    • Monitoring EVN network performance
    • A continuously available eVLBI network with a few telescopes
    • Monitoring transient sources
    • Astrometry, possibly of spectral line sources
    • Special correlator modes: spacecraft navigation, pulsar gating
  • Test bed for broadband eVLBI research
Something to try on the roadmap for the next-generation correlator,
even if you do not believe it is the solution…
Previous experience with software correlation
• Builds on previous experience at JIVE
  • Regular and automated network performance tests
    • using the Japanese software correlator from NICT
  • Huygens extreme narrow-band correlation
    • home-grown superFX with sub-Hz resolution
Work packages
• Grid resource allocation
• Grid workflow management
  • Tool to allocate correlator resources and schedule correlation
  • Data flow from telescopes to the appropriate correlator resources
  • Expertise from the Poznan group in Virtual Laboratories
  • Will this application fit on the Grid?
    • As it is very data intensive
    • And time-critical, if not real-time
• Software correlation
  • Correlator algorithm design
  • High-precision correlation on standard computing
  • Scalable to cluster computers
  • Portable to grid computers and interfaced to standard middleware
  • Interactive visualization and output definition
• Collect & merge data in the EVN archive
  • Standard format and proprietary rights
Basic idea
• Use the Grid for correlation
• CPU cycles on compute nodes
• Could the network act as the crossbar switch?
• Correlation will be asynchronous
• Based on floating-point arithmetic
• Portable code, standard environment
Workflow Management
• Must interact with normal VLBI schedules
• Divide the data, route it to compute nodes, set up correlation
• Dynamic resource allocation, keep up with incoming data!
Effort from Poznan, based on their Virtual Lab.
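"Dynamic resource allocation, keep up with incoming data!" can be made concrete with a small simulation. The sketch below is a hypothetical greedy allocator, not the Poznan Virtual Lab tool: it hands each completed time slice to whichever node becomes idle first (so heterogeneous nodes are allowed) and reports the worst-case latency, which stays bounded only if the nodes together keep up with the incoming data. The slice length and per-node processing times are assumed parameters.

```python
import heapq


def schedule_slices(n_slices, slice_seconds, node_seconds_per_slice):
    """Greedy, hypothetical allocator: each time slice goes to the node that frees up first.

    node_seconds_per_slice[i] is how long node i needs to correlate one slice.
    Returns a list of (slice, node, finish_time) tuples.
    """
    free_at = [(0.0, node) for node in range(len(node_seconds_per_slice))]
    heapq.heapify(free_at)  # min-heap keyed on the time each node becomes idle
    plan = []
    for k in range(n_slices):
        arrival = (k + 1) * slice_seconds  # slice k is complete once its interval has arrived
        idle_time, node = heapq.heappop(free_at)
        finish = max(arrival, idle_time) + node_seconds_per_slice[node]
        plan.append((k, node, finish))
        heapq.heappush(free_at, (finish, node))
    return plan


def max_latency(plan, slice_seconds):
    """Worst delay between a slice arriving and its correlation finishing."""
    return max(finish - (k + 1) * slice_seconds for k, _, finish in plan)


if __name__ == "__main__":
    # Each node needs 2 s of CPU per 1 s of data: two nodes keep up, one falls behind.
    for nodes in ([2.0], [2.0, 2.0]):
        plan = schedule_slices(10, 1.0, nodes)
        print(len(nodes), "node(s): max latency", max_latency(plan, 1.0), "s")
```

With a single node the latency grows with every slice, which is the signature of not keeping up in real time; adding a second node of the same speed bounds it.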
Topology
• Slice in time
  • Every node gets an interval
  • A “new correlator” for every time slice
  • Employ cluster computers at the nodes
  • Minimizes total data transport
    • Bottleneck at the compute node
    • Probably good connectivity at Grid nodes anyway
  • Scales perfectly
    • Easy to estimate how many nodes are needed
    • Works with heterogeneous nodes
  • But leaves sorting to the compute nodes
    • Memory access may limit effectiveness
• Slice in baseline
  • Assign a product (or a range of products) to a certain node
    • E.g. two data streams meet in some place
  • Transport bottleneck at the sources (telescopes)
    • Maybe curable with a multicast transport mechanism that forks at network nodes
    • Some advantage when there are local nodes at the telescopes
  • Does not scale very simply
    • Simple schemes need ½N² nodes
    • Need to re-sort the output
  • But reduces the compute problem
    • Using the network as the crossbar switch
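The transport trade-off between the two topologies can be quantified with simple counting. In this sketch (assumptions: unicast transport, one node per baseline when slicing in baseline, the same data rate at every telescope, hypothetical node names) slicing in time sends each telescope stream once, while slicing in baseline must deliver each stream to the N-1 nodes that hold its baselines, which is the transport bottleneck at the sources noted on the slide.

```python
import itertools
import math


def time_slice_plan(n_slices, nodes):
    """Slice in time: slice k goes (round-robin) to one node, which correlates all baselines."""
    return {k: nodes[k % len(nodes)] for k in range(n_slices)}


def baseline_slice_plan(n_stations):
    """Slice in baseline: one hypothetical node per station pair, N(N-1)/2 nodes in total."""
    pairs = itertools.combinations(range(n_stations), 2)
    return {pair: f"node{i}" for i, pair in enumerate(pairs)}


def traffic_gbps(n_stations, rate_gbps, scheme):
    """Return (total network traffic, uplink needed at each telescope), unicast transport."""
    if scheme == "time":
        # Each telescope stream is sent once, to the node that owns the current slice.
        return n_stations * rate_gbps, rate_gbps
    if scheme == "baseline":
        # Each telescope stream must reach the N-1 nodes that hold its baselines.
        return n_stations * (n_stations - 1) * rate_gbps, (n_stations - 1) * rate_gbps
    raise ValueError(f"unknown scheme: {scheme}")


def nodes_needed(required_tflops, tflops_per_node):
    """Slice in time: the node count follows directly from the total compute load."""
    return math.ceil(required_tflops / tflops_per_node)


if __name__ == "__main__":
    for scheme in ("time", "baseline"):
        total, uplink = traffic_gbps(16, 1.0, scheme)
        print(f"{scheme:8s} total {total:6.1f} Gb/s, per telescope {uplink:5.1f} Gb/s")
    print(len(baseline_slice_plan(16)), "nodes for 16 stations when slicing in baseline")
```

With multicast that forks at network nodes, as the slide suggests, the telescope uplink in the baseline scheme would drop back to a single stream; the sketch does not model that.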
Broadband software correlation
Processing chain, with EVN Mk4 hardware equivalents in brackets:
Station 1, Station 2, …, Station N
→ Raw data, BW = 16 MHz, Mk4 format on Mk5 disk
→ From Mk5 to Linux disk
→ Raw data, 16 MHz, Mk4 format on Linux disk
→ Channel extraction [DIM, TRM, CRM]
→ Extracted data [SU]
→ Delay corrections from pre-calculated delay tables [DCM, DMM, FR]
→ Delay-corrected data
→ Correlation: SFXC [Correlator chip]
→ Data product
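The processing chain on this slide (delay correction from pre-calculated tables, then FFT and cross-multiplication) can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not SFXC itself: it applies only an integer-sample delay, skips channel extraction from the Mk4 format as well as fractional-delay and fringe-rotation corrections, and works on a simulated baseline.

```python
import cmath
import random


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def delay_correct(samples, delay_samples):
    """Integer-sample delay compensation from a (pre-calculated) delay table entry.

    Fractional-delay and fringe-rotation corrections are deliberately omitted.
    """
    return samples[delay_samples:]


def fx_correlate(a, b, n_fft):
    """FX correlation: FFT each segment, cross-multiply A * conj(B), average over segments."""
    n_seg = min(len(a), len(b)) // n_fft
    acc = [0j] * n_fft
    for s in range(n_seg):
        fa = fft(a[s * n_fft:(s + 1) * n_fft])
        fb = fft(b[s * n_fft:(s + 1) * n_fft])
        for k in range(n_fft):
            acc[k] += fa[k] * fb[k].conjugate()
    return [v / n_seg for v in acc]


# Simulated single baseline: station B receives the signal `delay` samples later.
rng = random.Random(1)
n_fft, n_seg, delay = 64, 16, 3
signal = [rng.gauss(0.0, 1.0) for _ in range(n_fft * n_seg)]
station_a = signal
station_b = [rng.gauss(0.0, 1.0) for _ in range(delay)] + signal

corrected = fx_correlate(station_a, delay_correct(station_b, delay), n_fft)
uncorrected = fx_correlate(station_a, station_b, n_fft)

if __name__ == "__main__":
    # The real part of the cross spectrum summed over channels is proportional to the
    # zero-lag correlation: large after delay correction, near zero without it.
    print(round(sum(v.real for v in corrected), 1), round(sum(v.real for v in uncorrected), 1))
```

Doing the FFT once per station and then only a multiply per channel per baseline is what makes FX attractive for software: the expensive transform scales with the number of stations, not baselines.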
Better SNR than Mk4 hardware
Software correlation
• Working on benchmarking
  • Single-core processors so far
  • Different CPUs available
  • Already quite efficient
  • More work on memory performance
• Must deploy on cluster computers
• And then on the Grid
• Organize the output to be used for astronomy
[Figure: SFX correlator, measuring CPU on a single core (auto and cross correlations): CPU time (s) vs number of stations, for machines jop32, pcint and cedar]
[Figure: SFX correlator, CPU contributions on cedar (FFT only, I/O only, FFT Auto): CPU time (s) vs number of stations]
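The benchmark plots separate per-station work (FFT, I/O) from the full correlation cost, which matches how an FX correlator is expected to scale: FFTs and I/O grow linearly with the number of stations N, cross-multiplications with the ½N(N-1) baselines. The sketch below is a hypothetical two-term cost model with made-up coefficients, useful only for reasoning about where the quadratic term takes over; it is not fitted to the measured curves.

```python
def fx_cpu_seconds(n_stations, per_station_cost, per_baseline_cost):
    """Hypothetical FX cost model: I/O and FFTs scale with N, cross products with N(N-1)/2."""
    return per_station_cost * n_stations + per_baseline_cost * n_stations * (n_stations - 1) / 2


def crossover_stations(per_station_cost, per_baseline_cost):
    """Smallest N at which the cross-product term exceeds the per-station term."""
    if per_baseline_cost <= 0:
        raise ValueError("per_baseline_cost must be positive")
    n = 2
    # Cross term > per-station term  <=>  per_baseline_cost * (N - 1) / 2 > per_station_cost
    while per_baseline_cost * (n - 1) / 2 <= per_station_cost:
        n += 1
    return n


if __name__ == "__main__":
    # Illustrative coefficients only; they are not fitted to the jop32/pcint/cedar curves.
    for n in (4, 16, 44):
        print(n, fx_cpu_seconds(n, 10.0, 1.0))
    print("cross products dominate from N =", crossover_stations(10.0, 1.0))
```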
Side step: Data-intensive processing
• Radio astronomy can be extreme
• User data sets can be large
  • A few to 100 GB now
  • Larger: LOFAR, eVLBI, APERTIF, SKA
  • All data enter imaging
  • Iterative calibration schemes
  • Few operations per byte
• Parallel computing: not obviously suited for messaging systems
  • Task (data-oriented) parallelization
  • Processing traditionally done interactively on the user platform
  • More and more pipeline approaches
• Addressed in RadioNet
  • Project ALBUS
    • resulted in Python for AIPS
  • Looking for extension in FP7
    • Interoperability with ALMA, LOFAR
    • But for the user domain
NRI eSciences, 2 Nov 2006
Goal of the project
• Develop methods for high-data-rate e-VLBI using distributed correlation
• High-data-rate eVLBI data acquisition and transport
  • Develop a scalable prototype for broadband data acquisition
    • Prototype acquisition system
  • Establish a transport protocol for broadband e-VLBI
    • Build it into the prototype; establish an interface to the normal system
  • Interface e-VLBI public networks with the LOFAR and e-MERLIN dedicated networks
    • Correlate wide-band Onsala data on eMERLIN
    • Demonstrate LOFAR connectivity
• Distributed correlation
  • Set up data distribution over the Grid
    • Workflow management tool
  • Develop a software correlator
    • Run a modest distributed eVLBI experiment
Current eVLBI practice
[Diagram: observing schedule in VEX format → field system (controls antenna and acquisition) → BBC & samplers → Mk4 formatter → Mk5 recorder → Mk4 data in Mk5prop form over TCP/IP → Mk5 playback; correlator control including model calculation, using user correlator parameters and earth orientation parameters → output data]
FABRIC components
[Diagram: observing schedule in VEX format; field system (controls antenna and acquisition); DBBC, VSI, PC-EVN #2; VSIe?? on??; GRID resources; data; user correlator parameters; earth orientation parameters; resource allocation and routing; correlator control including model calculation; output data. FABRIC = The GRID]