Transcript Slide 1
Software correlators as testbeds
Rapid evaluation and prototyping of RFI algorithms
Adam Deller, NRAO Socorro
March 31, 2010
Atacama Large Millimeter/submillimeter Array, Expanded Very Large Array, Robert C. Byrd Green Bank Telescope, Very Long Baseline Array

Outline
• Why use in-correlator techniques for RFI detection and mitigation?
• Software correlators:
– Key attributes and advantages
– Applications in RFI algorithm development
• Test case: Kurtosis-based detection of RFI
• Conclusions

Why in-correlator techniques?
• Higher time resolution data
• Easier to identify impulsive/short time duration RFI
• Removal of affected areas can potentially lead to less data loss
• Other techniques such as Field of View (FOV) shaping require modifying data on timescales shorter than 1 integration

EVLA example
[Figure: time vs. frequency plot; annotation: aircraft radar (12 s period)]

Software correlators
• Correlation algorithm is coded in a high-level language such as C++, runs on commodity machines (nowadays, multicore rackmount servers)
• Rapid and inexpensive to develop
• Widely used in VLBI (DiFX correlator used by LBA, VLBA, MPIfR Bonn, …)
• Key point here: quick/easy to modify

Software correlators
[Photo: the hardware used for the VLBA DiFX software correlator in Socorro; 5 x dual motherboard, dual CPU quad cores (process 10 stations x 128 MHz b/w in real time)]

The DiFX architecture
[Diagram: data flow from source data through DataStream 1 … DataStream N and Core 1 … Core M to the Master Node. Labels: baseband data (~100 ms); processing buffers; visibility buffers; timerange, destination; visibilities]
• All processing done in floats
• Large, segmented ring buffer: up to 100s of MB / a few or more seconds
• Interconnect is commodity ethernet (Message Passing Interface)
• Optimised C vector libraries used for speed-up
The DiFX architecture
• FX style correlator
• Requires only a couple of libraries and a C++ compiler - easy to get going
• DiFX reads baseband data from a file or network stream (Mk4, VLBA, LBA, VDIF*)
• Output: produces FITS-IDI files (easy path to AIPS/CASA)
* Aside: VDIF is a simple but general packet-based format, hopefully convergence here - for specs see www.vlbi.org/vsi/docs/VDIF specification Release 1.0 ratified.pdf

DiFX info
• Google group: http://groups.google.com/group/difxusers?hl=en
• Wiki: http://cira.ivec.org/dokuwiki/doku.php/difx/start
• SVN codebase: https://svn.atnf.csiro.au/trac/difx/

Kurtosis analysis
• Basically measures the peakedness of the pdf of a time-varying quantity - equal to 3 for normally distributed quantities
• Impulsive RFI leads to a pdf with many outliers and a kurtosis value >> 3
• Subtle differences between real time-domain data and complex frequency-domain data (Nita, earlier)

Kurtosis analysis
• I applied kurtosis analysis to channelized (post-FFT) data from each antenna (not cross-correlations)
• Easy to calculate - just need 2nd (autocorrelation) and 4th (autocorrelation^2) central moments of the quantity of interest
• Easiest to maintain moments 1 - 4 about the origin and convert at the desired duty cycle
De Roo (2009), IEEE Trans. Geosc. Rem. Sens.
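The moment bookkeeping described on the kurtosis slides could look something like the following minimal C++ sketch. It is not the DiFX code: the `MomentAccumulator` name and its members are hypothetical, and it handles a generic real-valued quantity only, ignoring the subtleties of complex frequency-domain data mentioned above. It keeps running sums for the first four moments about the origin, then converts to central moments and forms kurtosis = mu4 / mu2^2 on request.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of DiFX): accumulates the sums needed for
// the 1st to 4th moments about the origin of a real-valued quantity.
struct MomentAccumulator {
    double s1 = 0.0, s2 = 0.0, s3 = 0.0, s4 = 0.0;  // sums of x, x^2, x^3, x^4
    std::size_t n = 0;                              // number of samples added

    // Cheap per-sample update: a few multiplies and adds.
    void add(double x) {
        const double x2 = x * x;
        s1 += x;
        s2 += x2;
        s3 += x2 * x;
        s4 += x2 * x2;
        ++n;
    }

    // Convert moments about the origin to central moments and return
    // kurtosis = mu4 / mu2^2 (3 for a normal distribution).
    double kurtosis() const {
        assert(n > 0);
        const double m1 = s1 / n, m2 = s2 / n, m3 = s3 / n, m4 = s4 / n;
        const double mu2 = m2 - m1 * m1;
        const double mu4 = m4 - 4.0 * m1 * m3 + 6.0 * m1 * m1 * m2
                         - 3.0 * m1 * m1 * m1 * m1;
        assert(mu2 > 0.0);  // a constant input has no defined kurtosis
        return mu4 / (mu2 * mu2);
    }

    // Start a new accumulation period (e.g. the next subintegration).
    void reset() { *this = MomentAccumulator{}; }
};
```

Calling `add()` for every sample, then `kurtosis()` followed by `reset()` at the end of each accumulation period, mirrors the "convert at the desired duty cycle" step. A few strong impulses among otherwise quiet samples push the result well above 3.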
Implementing kurtosis in DiFX
• Allocate a few extra arrays and make a few extra function calls to calculate moments about the origin
• Convert to central moments and calculate kurtosis at the end of every subintegration
• Getting results out is no hassle (maybe unlike a clocked h/w system with less I/O)
• Total development time: 1.5 hours

Kurtosis results
• LL polarisation shown at 1/3 real time (2 second integrations, 100ms kurtosis calc)
[Figure: a “normal” RFI-free band from one station]

Kurtosis results
• LL polarisation shown at 1/3 real time (2 second integrations, 100ms kurtosis calc)
[Figure: same band at Hancock, where the RFI is clearly much worse]

Implementing kurtosis flagging
• Basing flagging on a kurtosis threshold is trivial! One line of code to zero any affected channels in all baselines to a given antenna
• Currently dumping at the “subintegration” timescale - usually of order 20ms
• Easy to integrate further downstream; DiFX also has a feature to manipulate data on timescales shorter than 1 subintegration

Work to do
• Correctly calculate the expected value of kurtosis for 2 bit quantized input data after channelization
• Test the effect of kurtosis-based clipping on the interferometer output (imaging statistics)
• Test the implementation on a connected-element system like the EVLA (which can produce VDIF output suitable for DiFX)

Other RFI algorithm possibilities
• FOV shaping; weighting subintegration (or smaller) chunks of visibility data to improve correlator FOV “filter”
• Other thresholding or kurtosis on autocorrelations or cross-correlations
• With VLBI (or potentially EVLA) data we can record the baseband and test RFI algorithms many times in a controlled way

Conclusions
• High time resolution RFI detection and/or rejection in interferometers is an interesting and worthwhile pursuit
• Software correlators make testing “in-correlator” algorithms much easier!
• A simple kurtosis-based RFI auto-flagger will be made available in DiFX
• Plenty of scope for further development

Questions?

The use of multiple FOVs
• By repeating this operation multiple times one can generate an arbitrary number of “pencil beams” (as CPU memory permits)
• The overhead is small compared to the cost of correlating the data: generating 100s of pencil beams only requires ~3x the compute power
[Figure: primary beam containing uv-shifted “pencil” fields. Not to scale!!]

Directing the survey
• Low-resolution radio data can provide a flux-complete sample to be surveyed:
• The known empty space is ignored!
[Figure: random cutout from the NRAO FIRST survey; label: primary beam]

Datastream correlation flow
[Diagram labels: Start time, Valid samples, Num sent, MPI_Send * handle, Lock; Read thread; Data buffer; Requested time sent to Core; “Segment”; “Send”; Send thread; FFT = 2x num channels]

Core in pictures
[Diagram labels: Baseband data from each telescope; Read/send thread; Subint visibilities; Subint slot; Mode objects for each datastream; Core object; Baseband data pointer; unpacked data; Intermediate data; Proc. thread; XMAC thread; visibilities; Final data for XMAC; repeated for each subband]
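As an illustration of the threshold flagging described on the "Implementing kurtosis flagging" slide, here is a minimal C++ sketch under stated assumptions. The `Baseline` struct, the `flagByKurtosis` function, the layout of the kurtosis table and the one-sided threshold test are all hypothetical choices made for this example; they are not DiFX data structures or the flagger that was actually implemented. The idea is simply to zero a channel on every baseline that involves an antenna whose kurtosis in that channel exceeds the threshold.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical container for one baseline's channelised visibilities.
struct Baseline {
    std::size_t ant1;                       // index of first antenna
    std::size_t ant2;                       // index of second antenna
    std::vector<std::complex<float>> vis;   // one visibility per channel
};

// kurtosis[a][c] holds the kurtosis estimate for antenna a, channel c.
// Zeroes channel c on every baseline that includes an antenna whose
// estimate exceeds `threshold`. Returns the number of values zeroed.
std::size_t flagByKurtosis(std::vector<Baseline>& baselines,
                           const std::vector<std::vector<double>>& kurtosis,
                           double threshold) {
    std::size_t flagged = 0;
    for (Baseline& b : baselines) {
        assert(b.ant1 < kurtosis.size() && b.ant2 < kurtosis.size());
        assert(kurtosis[b.ant1].size() >= b.vis.size());
        assert(kurtosis[b.ant2].size() >= b.vis.size());
        for (std::size_t c = 0; c < b.vis.size(); ++c) {
            // One-sided test (an assumption): impulsive RFI gives kurtosis >> 3.
            if (kurtosis[b.ant1][c] > threshold ||
                kurtosis[b.ant2][c] > threshold) {
                b.vis[c] = std::complex<float>(0.0f, 0.0f);
                ++flagged;
            }
        }
    }
    return flagged;
}
```

The threshold value is left to the caller because the expected kurtosis for 2 bit quantized, channelized data is listed above as work still to do. A real flagger would probably also record weights or flags rather than only zeroing the data.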