
Algorithms for
Active Region Identification and Tracking
Michael Turmon
JPL/Caltech
Work partly supported by
NASA Applied Information Systems Research (AISR),
SOHO GI, Heliophysics SR&T
Core Functions
Two functions are relevant to the HMI pipeline and the HP KB:
Identification
– Find objects in (magnetogram, intensitygram) pairs
– Bayesian approach maximizing posterior probability of labeling
– Family of methods by many researchers
– Largely a “solved problem” for active regions in photosphere
Tracking
– Link identified ARs in a series of images
– Grouping of regions in labeling into ARs
– Single-link most likely tracker (optimization-based association)
– Data flow and object representation issues TBD for HMI
Original motivation: Link visible activity to irradiance
Other uses: Image masks, computational focus-of-attention, activity
catalogs, subsetting to cope with large data volumes
Identification: Finding the Best Labeling
• Bayesian approach: maximize a posterior probability having two terms
– Trade off fidelity to data vs. spatial coherence
log Pr(class mask | obs. images) = constant +
log Pr(obs. images | class mask) + log Pr(class mask)
• Likelihood: Probability of a certain observed (field,intensity) given
activity type: quiet Sun, facula, sunspot
– Use a Gaussian mixture model to parameterize likelihood
• Prior: Enforces spatial smoothness of labeling to disambiguate cases
near the class boundary
– Spatial coherence by Markov random field (“MRF”) model
• Find mask via discrete optimization of class mask given observation
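The two-term posterior above can be sketched in a few lines. This is a minimal illustration, not the code described in these slides: it assumes a single observable, one Gaussian per class instead of a mixture, a Potts-type MRF prior with a hypothetical weight `beta`, and iterated conditional modes (ICM) as the discrete optimizer. The slides do not say which optimizer is used.

```python
import numpy as np

def icm_labeling(obs, means, sigmas, beta=1.0, n_sweeps=5):
    """Approximate MAP labeling by iterated conditional modes (ICM).

    obs    : (H, W) observed image (one observable, for brevity)
    means  : (K,) per-class Gaussian means (hypothetical likelihood model)
    sigmas : (K,) per-class Gaussian standard deviations
    beta   : Potts smoothness weight (hypothetical parameter)
    """
    means = np.asarray(means, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    # Likelihood term: log Pr(obs | class) for every pixel and class.
    z = (obs[..., None] - means) / sigmas
    loglik = -0.5 * z**2 - np.log(sigmas)      # shape (H, W, K)
    labels = loglik.argmax(axis=-1)            # start from the max-likelihood mask
    H, W = obs.shape
    K = len(means)
    for _ in range(n_sweeps):
        changed = False
        for i in range(H):
            for j in range(W):
                # Prior term: beta * (number of 4-neighbors carrying each label).
                agree = np.zeros(K)
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < H and 0 <= nj < W:
                        agree[labels[ni, nj]] += 1
                best = int(np.argmax(loglik[i, j] + beta * agree))
                if best != labels[i, j]:
                    labels[i, j] = best
                    changed = True
        if not changed:
            break
    return labels

# Toy image: class 0 on the left, class 1 on the right, one ambiguous pixel.
obs = np.zeros((5, 6))
obs[:, 3:] = 1.0
obs[2, 1] = 0.6          # slightly closer to class 1 than to class 0
mask = icm_labeling(obs, means=[0.0, 1.0], sigmas=[1.0, 1.0], beta=1.0)
```

With `beta=0` the ambiguous pixel keeps its data-only label (class 1); with `beta=1` its four class-0 neighbors outweigh the small likelihood difference and it is relabeled to class 0, which is the fidelity versus coherence trade-off in miniature.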
Identification: Integrating Multimode Imagery
Flexible, general methods using statistical models to identify objects in images
1: Experts identify classes in sample images
2: Learned mixture model performs classification automatically
[Figure: axes Light Intensity vs. Magnetic Field with class regions labeled Q, F, S; panels Magnetogram, Photogram, and Labeling by inferred statistical model. Key: S(pot), F(acula), Q(uiet Sun)]
• Cannot distinguish classes from just one observable
• Select mixture model using sample images labeled by scientists
– One mixture model per class
– To classify, compute each class’s probability under its mixture
– Move beyond ad hoc threshold rules to allow arbitrary class separators
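The classification rule above (one mixture per class, pick the class whose mixture gives the observation the highest probability) might look like the following sketch. All numerical model parameters here are made up for illustration and are not the fitted MDI models; equal class priors are assumed.

```python
import numpy as np

def log_mixture_density(x, weights, means, covs):
    """Log density of a Gaussian mixture at points x of shape (N, D)."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    d = x.shape[1]
    comps = []
    for w, mu, cov in zip(weights, means, covs):
        cov = np.asarray(cov, dtype=float)
        diff = x - np.asarray(mu, dtype=float)
        # Squared Mahalanobis distance of every point to this component.
        maha = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov), diff)
        _, logdet = np.linalg.slogdet(cov)
        comps.append(np.log(w) - 0.5 * (maha + logdet + d * np.log(2 * np.pi)))
    # Log-sum-exp over components, done stably.
    return np.logaddexp.reduce(np.stack(comps), axis=0)

def classify(x, class_models):
    """Assign each (field, intensity) point to its most probable class."""
    names = list(class_models)
    scores = np.stack([log_mixture_density(x, *class_models[n]) for n in names])
    return [names[k] for k in scores.argmax(axis=0)]

# Hypothetical models: (weights, means, covariances) in (field [G], intensity).
# Facula and sunspot use two components so both magnetic polarities are covered.
class_models = {
    "quiet":   ([1.0], [[0.0, 1.00]], [np.diag([20.0**2, 0.01**2])]),
    "facula":  ([0.5, 0.5], [[150.0, 1.02], [-150.0, 1.02]],
                [np.diag([60.0**2, 0.02**2])] * 2),
    "sunspot": ([0.5, 0.5], [[1000.0, 0.60], [-1000.0, 0.60]],
                [np.diag([400.0**2, 0.15**2])] * 2),
}

pixels = [[0.0, 1.00], [150.0, 1.02], [-1200.0, 0.50]]
labels = classify(pixels, class_models)
```

Because each class has its own mixture, the decision boundary between classes can take an arbitrary shape in the (field, intensity) plane instead of being a fixed threshold on one observable.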
Identification: Results
Turmon, Pap, Mukhtar, “Statistical Pattern Recognition for Labeling Solar Active Regions:
Application to SoHO/MDI Imagery,” ApJ, 2002, 396-407.
Identification: Software Capability
• Core numerical code is in C with thin data access API
– Command line interface uses cfitsio for data access
– Matlab interface uses the same code with different Makefile
– Bourne shell drivers for sequencing, error-handling, metadata
– Metadata creation/propagation: time, geometry, quality, preproc. info
• MDI: automatic classification of all available full-disk synoptic
(magnetogram,intensitygram) pairs from 1996-2007
– Models already exist for MDI data
– Ported into a runnable module on the MDI data pipeline (“PUI”)
• Kitt Peak: calibration and test of classifier on SPM/VSM data
– NASA SR&T, with Harry Jones, Kitt Peak
– SPM data taken 1992-2003
– Three useful observables (field, intensity + equivalent width)
Identification: Issues Being Worked
• Outputs
– Square, full-disk mask images in coordinates of observations
– Bounding boxes for ARs
• Carrington longitude + colatitude
• Obs. time disambiguates short-lived regions sharing CMP time
• Cadence and volume
– Approximate cadence of 1 mask/12 mins.
– Try for 4Kx4K masks, option to reduce to 1Kx1K
– Will not re-map observed images to find mask
• Tuning
– HMI LOS field and intensity are simultaneous
– The intensity proxy is being defined. Need to remove limb darkening (LD).
– Region models will need to be re-fit: HMI noise ≠ MDI noise
• Region models do capture “error bars” on classes
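For a sense of scale, the cadence and mask sizes listed under "Cadence and volume" translate into a daily data volume as follows. This is back-of-envelope arithmetic under assumptions not stated in the slides: 4K means 4096 pixels on a side, and masks are stored uncompressed at one byte per pixel.

```python
MINUTES_PER_DAY = 24 * 60

def masks_per_day(cadence_min=12):
    """Number of masks produced per day at the given cadence in minutes."""
    return MINUTES_PER_DAY // cadence_min

def daily_volume_bytes(side_pixels, cadence_min=12, bytes_per_pixel=1):
    """Uncompressed mask volume per day (bytes_per_pixel is an assumption)."""
    return side_pixels * side_pixels * bytes_per_pixel * masks_per_day(cadence_min)

full = daily_volume_bytes(4096)      # 4K x 4K masks
reduced = daily_volume_bytes(1024)   # 1K x 1K masks
print(masks_per_day(), full / 2**30, reduced / 2**20)
```

Under these assumptions that is 120 masks per day, about 1.9 GiB per day at full resolution and 120 MiB per day at 1K x 1K, a factor of 16 apart.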
Components of “Tracking”
• Identification
(just discussed)
• Grouping
– Group separated features into AR
– Formal literature on this is not well-developed
– Use a simple template-based method
• Association
– Construct 1:1 map from previous AR set to next AR set.
• Chained together, you have a track.
– Criterion: maximize cumulative area of overlap
– Heuristics to “look harder” for new or dying ARs
• Naming
– Link a track to a name like NOAA AR#9077
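The "chained together, you have a track" step can be illustrated with a small sketch. The data layout is hypothetical: `counts[t]` is the number of ARs found in frame t, and `links[t]` is a dict holding the 1:1 map from AR indices in frame t to AR indices in frame t+1. ARs absent from a map's keys end their track; ARs absent from its values start a new one.

```python
def chain_tracks(counts, links):
    """Chain per-frame 1:1 association maps into tracks.

    Returns a list of tracks, each a list of (frame, ar_index) pairs.
    """
    tracks = []
    open_tracks = {}                     # AR index in previous frame -> its track
    for frame, n_regions in enumerate(counts):
        prev_map = links[frame - 1] if frame > 0 else {}
        inverse = {nxt: prev for prev, nxt in prev_map.items()}
        current = {}
        for ar in range(n_regions):
            prev = inverse.get(ar)
            if prev is not None and prev in open_tracks:
                track = open_tracks[prev]    # continue an existing track
            else:
                track = []                   # newly appeared AR
                tracks.append(track)
            track.append((frame, ar))
            current[ar] = track
        open_tracks = current                # tracks not continued are closed
    return tracks

# Three frames: the two ARs swap indices, then one dies and a new one appears.
counts = [2, 2, 2]
links = [{0: 1, 1: 0}, {0: 0}]
tracks = chain_tracks(counts, links)
```

Here `tracks` comes out as `[[(0, 0), (1, 1)], [(0, 1), (1, 0), (2, 0)], [(2, 1)]]`. Naming would then attach an external identifier, such as a NOAA AR number, to each track.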
Active Region Tracking: Grouping into ARs
• Activity mask = a set of pixels
– Grouping into NOAA-like ARs is not trivial
– Connected components insufficient
• Take a matched-filter type approach
– Convolve AR mask with a Gaussian kernel
– Threshold
– AR groups are within basins
[Figure: scale bar, 50 Mm]
• Devilish Details
– Gaussian in 3D pixel-pixel distance; stretched longitudinally;
FWHM ~50 × 25 Mm (~40 × 20 MDI pixels) at disk center
– Convolution on sphere to treat the limb fairly
– AR masks sparse: fast convolution (MDI: 1.7s)
– More cleverness is possible, e.g. polarity
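The convolve-then-threshold idea can be tried on a flat image with SciPy. This sketch is a simplification of what the slides describe: it works in plain pixel coordinates rather than on the sphere, uses an ordinary anisotropic Gaussian, and treats each connected region of the thresholded, smoothed image as one group. The `sigma` and `threshold` values are illustrative, not tuned values from this work.

```python
import numpy as np
from scipy import ndimage

def group_active_regions(mask, sigma=(2.0, 4.0), threshold=0.1):
    """Group activity pixels into ARs by smoothing and thresholding.

    mask      : (H, W) boolean activity mask, rows ~ latitude, cols ~ longitude
    sigma     : Gaussian widths in pixels (rows, cols); wider along longitude
    threshold : level applied to the smoothed mask (hypothetical value)
    Returns (group image with 0 = no AR, number of groups).
    """
    smooth = ndimage.gaussian_filter(mask.astype(float), sigma=sigma,
                                     mode="constant")
    basins, n_groups = ndimage.label(smooth > threshold)
    # Only activity pixels inherit the group number of the basin they sit in.
    return np.where(mask, basins, 0), n_groups

# Two nearby fragments plus one distant fragment.
mask = np.zeros((40, 100), dtype=bool)
mask[18:22, 10:14] = True
mask[18:22, 20:24] = True
mask[18:22, 80:84] = True

groups, n_groups = group_active_regions(mask)
n_components = ndimage.label(mask)[1]   # plain connected components, for contrast
```

Plain connected components report three separate objects here, while the smoothed version merges the two nearby fragments and reports two groups, which is why connected components alone are insufficient.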
Active Region Tracking: Grouping Example
[Figure: 2002 Sep. 02, 11:11 UTC; panels MDI Labeling, Convolved with Template, Identified Groups]
Active Region Tracking: Association
• Associate ARs in before (B) and after (A) images
[Figure: “Before” image B and “After” image A]
• Correlation-based tracker
– Standard latitude-dependent
motion model
– Use area of overlap of AR bitmaps on the sphere
– Overlap between a in A and b in B is D(a,b)
• Solve assignment problem to match A up to B by maximizing total overlap:
    \max_P \sum_{a \in A} \sum_{b \in B} P_{ab} \, D(a,b)
with P a permutation matrix giving the B-to-A mapping
– Fast, exact solution by linear programming
– Slack variables account for new or dead ARs
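The association step can be prototyped as below. This is a sketch under simplifying assumptions: overlap D(a,b) is a plain pixel count on a flat image (no motion model, no area on the sphere), and the assignment is solved with SciPy's `linear_sum_assignment` rather than the linear program with slack variables mentioned above. Zero-overlap pairings are treated as unmatched, which stands in for the new/dead AR handling.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def overlap_matrix(labels_a, labels_b, n_a, n_b):
    """D[a-1, b-1] = number of pixels labeled a in image A and b in image B."""
    D = np.zeros((n_a, n_b))
    both = (labels_a > 0) & (labels_b > 0)
    np.add.at(D, (labels_a[both] - 1, labels_b[both] - 1), 1)
    return D

def associate(D):
    """Match ARs so that total overlap is maximal.

    Returns (matches, new_in_a, dead_in_b) using 0-based AR indices.
    """
    rows, cols = linear_sum_assignment(D, maximize=True)
    matches = sorted((int(a), int(b)) for a, b in zip(rows, cols) if D[a, b] > 0)
    matched_a = {a for a, _ in matches}
    matched_b = {b for _, b in matches}
    new_in_a = [a for a in range(D.shape[0]) if a not in matched_a]
    dead_in_b = [b for b in range(D.shape[1]) if b not in matched_b]
    return matches, new_in_a, dead_in_b

# "Before" image B has two ARs; "after" image A has three (one is new).
labels_b = np.zeros((6, 8), dtype=int)
labels_b[0:2, 0:3] = 1
labels_b[4:6, 4:7] = 2

labels_a = np.zeros((6, 8), dtype=int)
labels_a[4:6, 5:8] = 1      # overlaps before-AR 2
labels_a[0:2, 1:4] = 2      # overlaps before-AR 1
labels_a[2:4, 6:8] = 3      # overlaps nothing: a new AR

D = overlap_matrix(labels_a, labels_b, n_a=3, n_b=2)
matches, new_in_a, dead_in_b = associate(D)
```

In this toy case `matches` pairs after-AR 0 with before-AR 1 and after-AR 1 with before-AR 0 (0-based), and after-AR 2 is reported as new.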
Active Region Tracking: Results
[Figure: panels Magnetogram and Labeling]
Active Region Tracking: Software Capability
• Adapted tracker parameters to reproduce NOAA sunspot groups
– Core of algorithm is in C + cfitsio, but sequencing is in Matlab
– Sequencing for HMI in C, sh, Python?
• Automatically tracked all ARs over 1996-2002
– 2200 days, 10000 frames, 2100 ARs
– Better than 1% error rate
• Regions erroneously combined or regions missed
– Code allows merging of small groups and follows remnants
• Strategies for reducing error rate
– Increased cadence
– Better region merge and split
– Polarity in AR grouping to deal better with tiny bipolar ARs
• There are four three-month periods of high-cadence data from MDI
– Full-disk images every minute
– Near-HMI temporal resolution: 1 image/min
Active Region Tracking: Issues Being Worked
• Outputs
– Data series: AR cutouts in original coordinates
– Data cubes: Remapped ARs in space and time
– Region summaries for pipeline modules or HP KB
• Region ID (HMI and NOAA)
• Bounding box (CaRot + CoLat + TObs, as for instantaneous AR)
• Bitmap (not simply connected, no “cookie cutter” shape)
• Tuning and Details
– Cadence 1 frame/12 min. (MDI: 1 frame/360 min.)
• MDI is a very approximate analog
– Grouping algorithm needs tuning for HMI (smaller ARs)
– Improvement via obvious accuracy metrics
• Unexplained AR overlaps: unmatched D(a,b) in earlier slide
• Excessive AR area changes over time
References
M. Turmon, H. Jones, J. Pap, O. Malanushenko, “Statistical feature recognition for
multidimensional solar imagery,” in review at Solar Physics, 2009.
H. Jones, G. Chapman, K. Harvey, J. Pap, D. Preminger, M. Turmon, S. Walton, “A
comparison of feature classification methods for modeling solar irradiance
variation,” Solar Physics, 2007.
The mixture modeling work appeared in:
Mixtures-2001, “Recent Developments in Mixture Modelling,” Hamburg
Compstat-2004, Prague, as “Symmetric Normal Mixtures”
Earlier work:
J. Pap, H. Jones, M. Turmon & L. Floyd, “Study of the SOHO/VIRGO
Irradiance Variations using MDI and Kitt Peak images,” Proc. SOHO-11
Workshop, Davos, 2002.
H.P. Jones, M. Turmon, et al. “A comparison of feature classification methods
for modeling solar irradiance variation,” 34th COSPAR Scientific Assembly,
2002.
L. Gyori, T. Baranyi, M. Turmon & J.M. Pap, “Comparison of image-processing
methods to extract sunspots,” Proc. SOHO-11 Workshop, Davos, 2002.