Transcript Slide 1

Software correlators as testbeds
Rapid evaluation and prototyping of RFI algorithms
Adam Deller
NRAO Socorro
March 31, 2010
Atacama Large Millimeter/submillimeter Array
Expanded Very Large Array
Robert C. Byrd Green Bank Telescope
Very Long Baseline Array
Outline
• Why use in-correlator techniques for RFI
detection and mitigation?
• Software correlators:
– Key attributes and advantages
– Applications in RFI algorithm development
• Test case: Kurtosis-based detection of RFI
• Conclusions
2
Why in-correlator techniques?
• Higher time resolution data
• Easier to identify impulsive/short time duration
RFI
• Removal of affected areas can potentially lead
to less data loss
• Other techniques such as Field of View (FOV)
shaping require modifying data on timescales
shorter than 1 integration
3
EVLA example
[Figure: time vs. frequency plot; annotation: aircraft radar (12 s period)]
4
Software correlators
• Correlation algorithm is coded in a high-level
language such as C++ and runs on commodity
machines (nowadays, multicore rackmount
servers)
• Rapid and inexpensive to develop
• Widely used in VLBI (DiFX correlator used by
LBA, VLBA, MPIfR Bonn, …)
• Key point here: quick/easy to modify
5
Software correlators
[Photo: the hardware used for the VLBA DiFX software correlator in Socorro; 5 x dual motherboard, dual CPU quad cores (process 10 stations x 128 MHz b/w in real time)]
6
The DiFX architecture
[Diagram: DiFX data flow]
• DataStream 1 … N: source data is read into a large, segmented ring buffer (up to 100s MB / a few or more seconds); baseband data (~100ms) is sent to the Cores
• Core 1 … M: processing buffers; all processing done in floats; visibilities are sent to the Master Node
• Master Node: sends timerange and destination to the DataStreams; collects visibilities in visibility buffers
Interconnect is commodity ethernet (Message Passing Interface).
Optimised C vector libraries used for speed-up.
7
The DiFX architecture
• FX style correlator
• Requires only a couple of libraries and a C++
compiler - easy to get going
• DiFX reads baseband data from a file or
network stream (Mk4, VLBA, LBA, VDIF*)
• Output: Produces FITS-IDI files (easy path to
AIPS/CASA)
* Aside: VDIF is a simple but general packet-based format,
so hopefully there will be convergence on it; for specs see
www.vlbi.org/vsi/docs/VDIF specification Release 1.0 ratified.pdf
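To make "FX style" concrete, here is a minimal sketch in Python/NumPy, not DiFX code. It assumes real-sampled voltages and omits delay compensation, fringe rotation and everything else a real correlator needs. The function name `fx_correlate` and the simulated signals are hypothetical. The F step channelizes each stream with an FFT; the X step cross-multiplies and accumulates.

```python
import numpy as np

def fx_correlate(a, b, n_chan):
    """Toy FX correlation of two real-sampled voltage streams.

    F step: cut each stream into segments of 2*n_chan samples and FFT them.
    X step: multiply one spectrum by the conjugate of the other and average.
    """
    fft_len = 2 * n_chan                      # real input: FFT length = 2 x channels
    n_seg = min(len(a), len(b)) // fft_len
    seg_a = np.reshape(a[:n_seg * fft_len], (n_seg, fft_len))
    seg_b = np.reshape(b[:n_seg * fft_len], (n_seg, fft_len))
    # rfft returns n_chan + 1 bins; drop the Nyquist bin to keep n_chan channels
    spec_a = np.fft.rfft(seg_a, axis=1)[:, :n_chan]
    spec_b = np.fft.rfft(seg_b, axis=1)[:, :n_chan]
    return np.mean(spec_a * np.conj(spec_b), axis=0)

# Simulated data (an assumption for illustration): a common sky signal
# plus independent receiver noise at each antenna.
rng = np.random.default_rng(0)
common = rng.normal(size=64 * 1024)
ant1 = common + rng.normal(size=common.size)
ant2 = common + rng.normal(size=common.size)
vis = fx_correlate(ant1, ant2, n_chan=32)     # one complex visibility per channel
```

Because the per-segment spectra exist in memory before accumulation, this is the natural place to hook in per-antenna statistics at high time resolution.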
8
DiFX info
• Google group: http://groups.google.com/group/difxusers?hl=en
• Wiki: http://cira.ivec.org/dokuwiki/doku.php/difx/start
• SVN codebase: https://svn.atnf.csiro.au/trac/difx/
9
Kurtosis analysis
• Basically measures the peakedness of the pdf of a
time-varying quantity; equal to 3 for normally
distributed quantities
• Impulsive RFI leads to a pdf with many outliers
and a kurtosis value >> 3
• Subtle differences between real time domain
data and complex frequency domain data (Nita,
earlier)
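For reference, the standard definition behind the "equal to 3" statement, written for a real-valued quantity x with mean x̄ (the symbols are introduced here and are not from the slides):

```latex
\mu_k = \mathrm{E}\!\left[(x - \bar{x})^k\right], \qquad
\kappa = \frac{\mu_4}{\mu_2^{2}}, \qquad
x \sim \mathcal{N}(\bar{x}, \sigma^2) \;\Rightarrow\; \mu_2 = \sigma^2,\ \mu_4 = 3\sigma^4,\ \kappa = 3
```

This is the real-valued case only; as noted above, complex frequency-domain data needs separate treatment.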
10
Kurtosis analysis
• I applied kurtosis analysis to channelized (post-FFT) data from each antenna (not cross-correlations)
• Easy to calculate - just need 2nd
(autocorrelation) and 4th (autocorrelation^2)
central moments of the quantity of interest
• Easiest to accumulate the 1st to 4th moments about
the origin and convert to central moments at the
desired duty cycle
De Roo (2009), IEEE Trans. Geosc. Rem. Sens.
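The conversion from moments about the origin to kurtosis can be sketched as follows. This is an illustrative Python/NumPy example for a real-valued quantity, not the DiFX implementation; the function names and the simulated burst pattern are assumptions.

```python
import numpy as np

def kurtosis_from_raw_moments(m1, m2, m3, m4):
    """Kurtosis from the 1st to 4th moments about the origin.

    Central moments follow from the binomial expansion of E[(x - m1)^k]:
      mu2 = m2 - m1^2
      mu4 = m4 - 4*m1*m3 + 6*m1^2*m2 - 3*m1^4
    """
    mu2 = m2 - m1**2
    mu4 = m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4
    return mu4 / mu2**2

def sample_kurtosis(x):
    """Estimate kurtosis of a real-valued sample via its raw moments."""
    x = np.asarray(x, dtype=float)
    m1, m2, m3, m4 = (np.mean(x**k) for k in range(1, 5))
    return kurtosis_from_raw_moments(m1, m2, m3, m4)

rng = np.random.default_rng(1)
clean = rng.normal(size=200_000)      # Gaussian noise: kurtosis should be near 3
bursty = clean.copy()
bursty[::1000] += 20.0                # sparse impulsive outliers (simulated RFI)
k_clean = sample_kurtosis(clean)
k_bursty = sample_kurtosis(bursty)    # far above 3 because of the outliers
```

Keeping raw moments means each new sample only adds to four running sums; the subtraction of the mean is deferred until the kurtosis is actually needed.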
11
Implementing kurtosis in DiFX
• Allocate a few extra arrays and make a few
extra function calls to calculate moments about
the origin
• Convert to central moments and calculate
kurtosis at the end of every subintegration
• Getting results out is no hassle (perhaps unlike a
clocked h/w system with less I/O)
• Total development time: 1.5 hours
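The steps above can be sketched as a small per-channel accumulator. This is a hypothetical Python/NumPy illustration, not the DiFX C++ code: the class name `KurtosisAccumulator` is invented, and it assumes a real-valued quantity per channel (for example one component of the channelized voltage), which sidesteps the complex-data subtleties mentioned earlier.

```python
import numpy as np

class KurtosisAccumulator:
    """Per-channel running sums of x, x^2, x^3, x^4 (moments about the origin)."""

    def __init__(self, n_chan):
        self.sums = np.zeros((4, n_chan))   # the "few extra arrays"
        self.count = 0

    def add(self, spectrum):
        """Accumulate one real-valued spectrum of length n_chan."""
        x = np.asarray(spectrum, dtype=float)
        xk = x
        for k in range(4):
            self.sums[k] += xk              # adds x^(k+1) to row k
            xk = xk * x
        self.count += 1

    def finish(self):
        """Return per-channel kurtosis and reset for the next subintegration."""
        m1, m2, m3, m4 = self.sums / self.count
        mu2 = m2 - m1**2
        mu4 = m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4
        self.sums[:] = 0.0
        self.count = 0
        return mu4 / mu2**2

# Simulated subintegration: 2000 spectra of 8 channels, bursts in channel 3 only.
rng = np.random.default_rng(2)
spectra = rng.normal(size=(2000, 8))
spectra[::100, 3] += 15.0
acc = KurtosisAccumulator(n_chan=8)
for s in spectra:
    acc.add(s)
kurt = acc.finish()                         # one kurtosis value per channel
```

Calling `add` once per FFT and `finish` once per subintegration mirrors the split between cheap per-sample accumulation and the occasional conversion to central moments.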
12
Kurtosis results
• LL polarisation shown at 1/3 real time (2
second integrations, 100ms kurtosis calc)
[Figure: a “normal” RFI-free band from one station]
16
Kurtosis results
• LL polarisation shown at 1/3 real time (2
second integrations, 100ms kurtosis calc)
[Figure: same band at Hancock, where the RFI is clearly much worse]
17
Implementing kurtosis flagging
• Basing flagging on a kurtosis threshold is trivial!
One line of code to zero any affected channels
on all baselines to the given antenna
• Currently dumping at the “subintegration”
timescale - usually of order 20ms
• Easy to integrate further downstream; DiFX
also has a feature to manipulate data on
timescales shorter than 1 subintegration
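A minimal sketch of threshold-based flagging, in Python/NumPy rather than DiFX's C++. The array layout, the function name `flag_visibilities` and the threshold value of 4.0 are all assumptions made for illustration; the slides do not give a threshold.

```python
import numpy as np

def flag_visibilities(vis, baselines, kurtosis, threshold):
    """Zero channels on every baseline that includes a flagged antenna.

    vis:       complex array, shape (n_baselines, n_chan); modified in place
    baselines: list of (ant_i, ant_j) index pairs, one per row of vis
    kurtosis:  float array, shape (n_antennas, n_chan)
    threshold: channels with kurtosis above this value are treated as RFI
    """
    bad = kurtosis > threshold                   # (n_antennas, n_chan) boolean mask
    for row, (i, j) in enumerate(baselines):
        vis[row, bad[i] | bad[j]] = 0.0          # zero channels bad at either antenna
    return vis

# Three antennas, four channels; antenna 1 sees RFI in channel 2.
baselines = [(0, 1), (0, 2), (1, 2)]
kurtosis = np.full((3, 4), 3.0)
kurtosis[1, 2] = 25.0
vis = np.ones((3, 4), dtype=complex)
flag_visibilities(vis, baselines, kurtosis, threshold=4.0)
```

Only the two baselines that include antenna 1 lose channel 2; baseline (0, 2) is untouched. Zeroing data without also tracking weights would bias later averaging, so a real implementation would presumably carry a weight or flag alongside the visibilities.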
18
Work to do
• Correctly calculate expected value of kurtosis
for 2 bit quantized input data after
channelization
• Test the effect of kurtosis-based clipping on the
interferometer output (imaging statistics)
• Test the implementation on a connected-element system like the EVLA (which can
produce VDIF output suitable for DiFX)
19
Other RFI algorithm possibilities
• FOV shaping; weighting subintegration (or
smaller) chunks of visibility data to improve
correlator FOV “filter”
• Other thresholding or kurtosis on
autocorrelations or cross-correlations
• With VLBI (or potentially EVLA) data we can
record the baseband and test RFI algorithms
many times in a controlled way
20
Conclusions
• High time resolution RFI detection and/or
rejection in interferometers is an interesting
and worthwhile pursuit
• Software correlators make testing “in-correlator” algorithms much easier!
• A simple kurtosis-based RFI auto-flagger will be
made available in DiFX
• Plenty of scope for further development
21
Questions?
22
The use of multiple FOVs
• By repeating this operation multiple times one
can generate an arbitrary number of “pencil
beams” (as CPU memory permits)
• The overhead is small compared to the cost
of correlating the data: generating 100s of pencil
beams only requires ~3x the compute power
[Figure: primary beam containing uv-shifted “pencil” fields. Not to scale!!]
23
Directing the survey
• Low-resolution radio data can provide a flux-complete sample to be surveyed:
• The known empty space is ignored!
[Figure: random cutout from the NRAO FIRST survey, labelled “primary beam”]
24
Datastream correlation flow
[Diagram: Read thread and Send thread sharing the Datastream data buffer]
• Per-“Segment” bookkeeping: start time, valid samples, num sent, MPI_Send * handle, lock
• Other labels: “Requested time sent to Core”; “Send”; FFT = 2x num channels
Core in pictures
[Diagram labels: Baseband data from each telescope; Read/send thread; Subint slot; Subint visibilities; Core object; Mode objects for each datastream (baseband data pointer, unpacked data, intermediate data, final data for XMAC); Proc. thread; XMAC; Thread visibilities; repeated for each subband]