Transcript Document
Algorithms for Active Region Identification and Tracking
Michael Turmon, JPL/Caltech
Work partly supported by NASA Applied Information Systems Research (AISR), SOHO GI, Heliophysics SR&T

Core Functions
Two functions are relevant for HMI pipeline and HP KB:
• Identification
  – Find objects in (magnetogram,intensitygram) pairs
  – Bayesian approach maximizing posterior probability of labeling
  – Family of methods by many researchers
  – Largely a “solved problem” for active regions in photosphere
• Tracking
  – Link identified ARs in a series of images
  – Grouping of regions in labeling into ARs
  – Single-link most likely tracker (optimization-based association)
  – Data flow and object representation issues TBD for HMI
Original motivation: Link visible activity to irradiance
Other uses: Image masks, computational focus-of-attention, activity catalogs, subsetting to cope with large data volumes

Identification: Finding the Best Labeling
• Bayesian approach: maximize a posterior probability having two terms
  – Trade off fidelity to data vs. spatial coherence
    log Pr(class mask | obs. images) = constant + log Pr(obs. images | class mask) + log Pr(class mask)
• Likelihood: Probability of a certain observed (field,intensity) given activity type: quiet Sun, facula, sunspot
  – Use a Gaussian mixture model to parameterize likelihood
• Prior: Enforces spatial smoothness of labeling to disambiguate cases near the class boundary
  – Spatial coherence by Markov random field (“MRF”) model
• Find mask via discrete optimization of class mask given observation

Identification: Integrating Multimode Imagery
Flexible, general methods using statistical models to identify objects in images
1: Experts identify classes in sample images
2: Learned mixture model performs classification automatically
[Figure: axes Magnetic Field and Light Intensity; panels Magnetogram, Photogram, Labeling (by inferred statistical model). Key: S(pot), F(acula), Q(uiet sun)]
• Can not distinguish classes from just one observable
• Select mixture model using sample images labeled by scientists
  – One mixture model per class
  – To classify, compute each class’s probability under its mixture
  – Move beyond ad hoc threshold rules to allow arbitrary class separators

Identification: Results
Turmon, Pap, Mukhtar, “Statistical Pattern Recognition for Labeling Solar Active Regions: Application to SoHO/MDI Imagery,” ApJ, 2002, 396-407.

Identification: Software Capability
• Core numerical code is in C with thin data access API
  – Command line interface uses cfitsio for data access
  – Matlab interface uses the same code with different Makefile
  – Bourne shell drivers for sequencing, error-handling, metadata
  – Metadata creation/propagation: time, geometry, quality, preproc. info
• MDI: automatic classification of all available full-disk synoptic (magnetogram,intensitygram) pairs from 1996-2007
  – Models already exist for MDI data
  – Ported into a runnable module on the MDI data pipeline (“PUI”)
• Kitt Peak: calibration and test of classifier on SPM/VSM data
  – NASA SR&T, with Harry Jones, Kitt Peak
  – SPM data taken 1992-2003
  – Three useful observables (field, intensity + equivalent width)

Identification: Issues Being Worked
• Outputs
  – Square, full-disk mask images in coordinates of observations
  – Bounding boxes for ARs
    • Carrington longitude + colatitude
    • Obs. time disambiguates short-lived regions sharing CMP time
• Cadence and volume
  – Approximate cadence of 1 mask/12 mins.
  – Try for 4Kx4K masks, option to reduce to 1Kx1K
  – Will not re-map observed images to find mask
• Tuning
  – HMI LOS field and intensity are simultaneous
  – The intensity proxy is being defined. Need to remove LD.
  – Region models will need to be re-fit: HMI noise ≠ MDI noise
    • Region models do capture “error bars” on classes

Components of “Tracking”
• Identification (just discussed)
• Grouping
  – Group separated features into AR
  – Formal literature on this is not well-developed
  – Use a simple template-based method
• Association
  – Construct 1:1 map from previous AR set to next AR set.
    • Chained together, you have a track.
  – Criterion: maximize cumulative area of overlap
  – Heuristics to “look harder” for new or dying ARs
• Naming
  – Link a track to a name like NOAA AR#9077

Active Region Tracking: Grouping into ARs
• Activity mask = a set of pixels
  – Grouping into NOAA-like ARs is not trivial
  – Connected components insufficient
• Take a matched-filter type approach
  – Convolve AR mask with a Gaussian kernel
  – Threshold
  – AR groups are within basins 50 Mm
• Devilish Details
  – Gaussian in 3D pixel-pixel distance; stretched longitudinally; FWHM ~50x25 Mm (~40x20 MDI pixels) at disk center
  – Convolution on sphere to treat the limb fairly
  – AR masks sparse: fast convolution (MDI: 1.7 s)
  – More cleverness is possible, e.g. polarity

Active Region Tracking: Grouping Example
[Figure, 2002 Sep. 02, 11:11 UTC; panels: MDI Labeling, Convolved with Template, Identified Groups]

Active Region Tracking: Association
• Associate ARs in before and after images
  [Figure: Before (B), After (A)]
• Correlation-based tracker
  – Standard latitude-dependent motion model
  – Use area of overlap of AR bitmaps on the sphere
  – Overlap between a in A and b in B is D(a,b)
• Solve assignment problem to match A up to B, with P a permutation matrix giving the B-to-A mapping
  – Fast, exact solution by linear programming
  – Slack variables account for new or dead ARs

Active Region Tracking: Results
[Figure panels: Magnetogram, Labeling]

Active Region Tracking: Software Capability
• Adapted tracker parameters to reproduce NOAA sunspot groups
  – Core of algorithm is in C + cfitsio, but sequencing is in Matlab
  – Sequencing for HMI in C, sh, Python?
• Automatically tracked all ARs over 1996-2002
  – 2200 days, 10000 frames, 2100 ARs
  – Better than 1% error rate
    • Regions erroneously combined or regions missed
  – Code allows merging of small groups and follows remnants
• Strategies for reducing error rate
  – Increased cadence
  – Better region merge and split
  – Polarity in AR grouping to deal better with tiny bipolar ARs
• There are four three-month periods of high-cadence data from MDI
  – Full-disk images every minute
  – Near-HMI temporal resolution: 1 image/min

Active Region Tracking: Issues Being Worked
• Outputs
  – Data series: AR cutouts in original coordinates
  – Data cubes: Remapped ARs in space and time
  – Region summaries for pipeline modules or HP KB
    • Region ID (HMI and NOAA)
    • Bounding box (CaRot + CoLat + TObs, as for instantaneous AR)
    • Bitmap (not simply connected, no “cookie cutter” shape)
• Tuning and Details
  – Cadence 1 frame/12 min. (MDI: 1 frame/360 min.)
    • MDI is a very approximate analog
  – Grouping algorithm needs tuning for HMI (smaller ARs)
  – Improvement via obvious accuracy metrics
    • Unexplained AR overlaps: unmatched D(a,b) in earlier slide
    • Excessive AR area changes over time

References
M. Turmon, H. Jones, J. Pap, O. Malanushenko, “Statistical feature recognition for multidimensional solar imagery,” in review at Solar Physics, 2009.
H. Jones, G. Chapman, K. Harvey, J. Pap, D. Preminger, M. Turmon, S. Walton, “A comparison of feature classification methods for modeling solar irradiance variation,” Solar Physics, 2007.
The mixture modeling work appeared in:
Mixtures-2001, “Recent Developments in Mixture Modelling,” Hamburg
Compstat-2004, Prague, as “Symmetric Normal Mixtures”
Earlier work:
J. Pap, H. Jones, M. Turmon & L. Floyd, “Study of the SOHO/VIRGO Irradiance Variations using MDI and Kitt Peak images,” Proc. SOHO-11 Workshop, Davos, 2002.
H.P. Jones, M. Turmon, et al., “A comparison of feature classification methods for modeling solar irradiance variation,” 34th COSPAR Scientific Assembly, 2002.
L. Gyori, T. Baranyi, M. Turmon & J.M. Pap, “Comparison of image-processing methods to extract sunspots,” Proc. SOHO-11 Workshop, Davos, 2002.
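
The labeling step described under “Identification: Finding the Best Labeling” can be illustrated with a small Python sketch. It scores each pixel by a per-class Gaussian mixture log-likelihood plus a smoothness term, mirroring the two-term posterior in the slides. Several things here are assumptions and not taken from the talk: the toy mixture parameters, the use of unsigned field strength, a Potts-style MRF prior with weight `beta`, and iterated conditional modes (ICM) as the discrete optimizer. The names `MODELS`, `log_likelihood`, and `label_image` are hypothetical.

```python
import math

# Toy class models over (unsigned field, intensity). Each mixture component is
# (weight, (mean_field, mean_intensity), (std_field, std_intensity)).
# All numbers are invented for illustration; they are not fitted MDI/HMI models.
MODELS = {
    "quiet":   [(1.0, (0.0, 1.00), (20.0, 0.02))],
    "facula":  [(0.6, (150.0, 1.00), (60.0, 0.03)),
                (0.4, (300.0, 1.02), (100.0, 0.03))],
    "sunspot": [(1.0, (1000.0, 0.60), (400.0, 0.15))],
}


def log_likelihood(x, mixture):
    """log Pr(observation x | class) under a diagonal-covariance Gaussian mixture."""
    density = 0.0
    for weight, mean, std in mixture:
        term = weight
        for xi, m, s in zip(x, mean, std):
            term *= math.exp(-0.5 * ((xi - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        density += term
    return math.log(density + 1e-300)  # guard against log(0) after underflow


def label_image(obs, beta=1.0, sweeps=5):
    """ICM: locally maximize log-likelihood + beta * (number of agreeing 4-neighbors)."""
    rows, cols = len(obs), len(obs[0])
    classes = list(MODELS)
    loglik = [[{k: log_likelihood(obs[r][c], MODELS[k]) for k in classes}
               for c in range(cols)] for r in range(rows)]
    # Start from the per-pixel maximum-likelihood labeling (no spatial prior).
    labels = [[max(classes, key=loglik[r][c].get) for c in range(cols)]
              for r in range(rows)]
    for _ in range(sweeps):
        changed = False
        for r in range(rows):
            for c in range(cols):
                neighbors = [labels[rr][cc]
                             for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                             if 0 <= rr < rows and 0 <= cc < cols]
                # Local posterior score: data term plus smoothness reward.
                best = max(classes, key=lambda k: loglik[r][c][k]
                           + beta * sum(n == k for n in neighbors))
                if best != labels[r][c]:
                    labels[r][c] = best
                    changed = True
        if not changed:  # converged to a local optimum
            break
    return labels


# A quiet 3x3 patch whose center pixel lies near the quiet/facula boundary.
patch = [[(0.0, 1.0)] * 3 for _ in range(3)]
patch[1][1] = (60.0, 1.0)
print(label_image(patch, beta=0.0)[1][1])  # likelihood only
print(label_image(patch, beta=1.0)[1][1])  # with spatial prior
```

With these toy numbers the center pixel is labeled facula by likelihood alone and quiet once the prior is switched on, which is the "disambiguate cases near the class boundary" behavior the slide attributes to the MRF term. ICM only finds a local optimum; it stands in here for whatever discrete optimizer the actual code uses.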
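
The association step described under “Active Region Tracking: Association” can be sketched the same way. The slide's formula did not survive in this transcript; the sketch assumes the objective is to choose the permutation P that maximizes the sum over pairs of P(a,b)·D(a,b), which is consistent with the stated criterion of maximizing cumulative overlap area. Where the talk solves this by linear programming with slack variables, the sketch substitutes exhaustive search over permutations with zero-overlap dummy entries, which is exact but only practical for a handful of ARs. The function name `associate` and the example areas are hypothetical.

```python
from itertools import permutations


def associate(overlap):
    """Match ARs in the after image A (rows) to ARs in the before image B (columns).

    overlap[a][b] is the overlap area D(a,b). Returns (matches, new, dead).
    """
    n_after, n_before = len(overlap), len(overlap[0])
    n = max(n_after, n_before)
    # Pad to a square matrix with zero-overlap dummy rows/columns. These play
    # the role of slack: a real AR paired with a dummy is new or dead.
    padded = [[overlap[a][b] if a < n_after and b < n_before else 0.0
               for b in range(n)] for a in range(n)]
    # Exhaustive search over permutations: exact, but only viable for a few ARs.
    best = max(permutations(range(n)),
               key=lambda p: sum(padded[a][p[a]] for a in range(n)))
    # Keep only pairings between two real ARs that actually overlap.
    matches = [(a, best[a]) for a in range(n_after)
               if best[a] < n_before and overlap[a][best[a]] > 0.0]
    matched_after = {a for a, _ in matches}
    matched_before = {b for _, b in matches}
    new = [a for a in range(n_after) if a not in matched_after]
    dead = [b for b in range(n_before) if b not in matched_before]
    return matches, new, dead


# Two ARs in the after image, three in the before image (areas are made up).
D = [[120.0, 0.0, 0.0],
     [0.0, 0.0, 80.0]]
matches, new, dead = associate(D)
print(matches, new, dead)
```

Chaining the `matches` from successive frame pairs yields tracks; entries in `new` start a track and entries in `dead` end one. For realistic AR counts a polynomial-time assignment solver would replace the permutation search.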