Object Orie’d Data Analysis, Last Time
• SiZer Analysis
– Zooming version
– Dependent version
– Mass flux data
– Cell cycle data
• Image Analysis
– 1st Generation
– 2nd Generation
• Object Representation
– Landmarks
– Boundaries
– Medial
OODA in Image Analysis
First Generation Problems:
• Denoising
• Segmentation (find object boundaries)
• Registration (align objects)
(all about single images)
OODA in Image Analysis
Second Generation Problems:
• Populations of Images
– Understanding Population Variation
– Discrimination (a.k.a. Classification)
• Complex Data Structures (& Spaces)
• HDLSS Statistics
Image Object Representation
Major Approaches for Images:
• Landmark Representations
• Boundary Representations
• Medial Representations
Landmark Representations
Landmarks for fly wing data:
Landmark Representations
Major Drawback of Landmarks:
• Need to always find each landmark
• Need same relationship
• I.e. Landmarks need to correspond
• Often fails for medical images
• E.g. How many corresponding landmarks on a set of kidneys, livers or brains???
Boundary Representations
Major sets of ideas:
• Triangular Meshes
– Survey: Owen (1998)
• Active Shape Models
– Cootes, et al (1993)
• Fourier Boundary Representations
– Keleman, et al (1997 & 1999)
Boundary Representations
Example of triangular mesh rep’n:
From: www.geometry.caltech.edu/pubs.html
Boundary Representations
Main Drawback: Correspondence
• For OODA (on vectors of parameters): Need to “match up points”
• Easy to find triangular mesh
– Lots of research on this driven by gamers
• Challenge to match mesh across objects
– There are some interesting ideas…
Medial Representations
Main Idea:
Represent Objects as:
• Discretized skeletons (medial atoms)
• Plus spokes from center to edge
• Which imply a boundary
Very accessible early reference:
• Yushkevich, et al (2001)
Medial Representations
2-d M-Rep Example: Corpus Callosum (Yushkevich)
(Figure labels: Atoms, Spokes, Implied Boundary)
Medial Representations
3-d M-Rep Example: From Ja-Yeon Jeong
Bladder – Prostate - Rectum
Atoms - Spokes - Implied Boundary
Medial Representations
3-d M-reps: there are several variations
Two choices: (figure from Fletcher (2004))
Medial Representations
Statistical Challenge
• M-rep parameters are:
– Locations $\in \mathbb{R}^2, \mathbb{R}^3$
– Radii $> 0$
– Angles (not comparable)
• Stuffed into a long vector
• I.e. many direct products of these
Medial Representations
Statistical Challenge:
• How to analyze angles as data?
• E.g. what is the average of: $3^\circ, 4^\circ, 358^\circ, 359^\circ$
– $181^\circ$ ??? (average of the numbers)
– $1^\circ$ (of course!)
• Correct View of angular data: Consider as points on the unit circle
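A minimal Python sketch of the "points on the unit circle" view, using the four angles from this slide. It averages the corresponding unit vectors and reads the angle back off (the usual "mean direction"). The function name `mean_direction_deg` is a hypothetical helper chosen for illustration, not something from the slides.

```python
import math

def mean_direction_deg(angles_deg):
    """Average the unit vectors (cos a, sin a), then return the angle of the sum."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0

angles = [3, 4, 358, 359]

# Treating the angles as plain numbers ignores that 359 and 3 are neighbors.
naive_average = sum(angles) / len(angles)

# Treating them as points on the circle respects the wrap-around at 360.
circle_average = mean_direction_deg(angles)

print("average of the numbers:", naive_average)
print("mean direction on the circle:", round(circle_average, 6))
```

The naive average lands on the opposite side of the circle from all four data points, while the mean direction sits in the middle of them.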
Medial Representations
What is the average ($181^\circ$?) or ($1^\circ$?) of: $3^\circ, 4^\circ, 358^\circ, 359^\circ$
Medial Representations
Statistical Challenge
• Many direct products of:
– Locations $\in \mathbb{R}^2, \mathbb{R}^3$
– Radii $> 0$
– Angles (not comparable)
• Appropriate View: Data Lie on Curved Manifold, Embedded in higher dim’al Eucl’n Space
Medial Representations
Data on Curved Manifold Toy Example:
Medial Representations
Data on Curved Manifold Viewpoint:
• Very Simple Toy Example (last movie)
• Data on a Cylinder = $\mathbb{R}^1 \times S^1$
• Notes:
– Simplest non-Euclidean Example
– 2-d data, embedded on manifold in $\mathbb{R}^3$
– Can flatten the cylinder, to a plane
– Have periodic representation
– Movie by: Suman Sen
• Same idea for more complex direct prod’s
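The cylinder notes above can be sketched in a few lines of Python. This is only an illustration under assumed conventions: the coordinate names `t` (position along the axis) and `theta` (angle), and the helper names `embed` and `flatten`, are not from the slides.

```python
import math

def embed(t, theta):
    """Map cylinder coordinates (t, theta) in R^1 x S^1 to a point in R^3."""
    return (math.cos(theta), math.sin(theta), t)

def flatten(point):
    """Unroll a point on the cylinder to the plane, with theta in [0, 2*pi)."""
    x, y, z = point
    return (z, math.atan2(y, x) % (2.0 * math.pi))

# 2-d data (t, theta), embedded on the cylinder in R^3.
p = embed(0.7, 1.2)

# Periodic representation: theta and theta + 2*pi give the same point.
p_shifted = embed(0.7, 1.2 + 2.0 * math.pi)

# Flattening recovers the 2-d coordinates.
t_back, theta_back = flatten(p)
print(p, p_shifted, (t_back, theta_back))
```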
A Challenging Example
• Male Pelvis
– Bladder – Prostate – Rectum
– How do they move over time (days)?
– Critical to Radiation Treatment (cancer)
• Work with 3-d CT
– Very Challenging to Segment
– Find boundary of each object?
– Represent each Object?
Male Pelvis – Raw Data
One CT Slice (in 3d image); labeled structures: Tail Bone, Rectum, Prostate
Male Pelvis – Raw Data
Prostate: manual segmentation, slice by slice, reassembled
Male Pelvis – Raw Data
Prostate: slices reassembled in 3d
How to represent?
Thanks: Ja-Yeon Jeong
Object Representation
• Landmarks (hard to find)
• Boundary Rep’ns (no correspondence)
• Medial representations
– Find “skeleton”
– Discretize as “atoms” called M-reps
3-d m-reps
Bladder – Prostate – Rectum (multiple objects, J. Y. Jeong)
• Medial Atoms provide “skeleton”
• Implied Boundary from “spokes” → “surface”
3-d m-reps
M-rep model fitting
• Easy, when starting from binary (blue)
• But very expensive (30 – 40 minutes technician’s time)
• Want automatic approach
• Challenging, because of poor contrast, noise, …
• Need to borrow information across training sample
• Use Bayes approach: prior & likelihood → posterior
• ~Conjugate Gaussians, but there are issues:
– Major HDLSS challenges
– Manifold aspect of data
Mildly Non-Euclidean Spaces
Statistical Analysis of M-rep Data
Recall: Many direct products of:
• Locations
• Radii
• Angles
I.e. points on smooth manifold
Data in non-Euclidean Space
But only mildly non-Euclidean
Mildly Non-Euclidean Spaces
Good source for statistical analysis of
Mildly non-Euclidean Data
Fletcher (2004), Fletcher, et al (2004)
Main ideas:
• Work with geodesic distances
• I.e. distances along surface of manifold
Mildly Non-Euclidean Spaces
What is the mean of data on a manifold?
• Bad choice:
– Mean in embedded space
– Since will probably leave manifold
– Think about unit circle
• How to improve?
• Approach: study characterizations of mean
– There are many
– Most fruitful: Fréchet mean
Mildly Non-Euclidean Spaces
Fréchet mean of numbers:
$$\bar{X} = \arg\min_{x} \sum_{i=1}^{n} (X_i - x)^2$$
Fréchet mean in Euclidean Space:
$$\bar{X} = \arg\min_{x} \sum_{i=1}^{n} \|X_i - x\|^2 = \arg\min_{x} \sum_{i=1}^{n} d(X_i, x)^2$$
Fréchet mean on a manifold:
Replace Euclidean d by Geodesic d
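As a rough illustration of "replace Euclidean d by geodesic d", the Python sketch below computes a Fréchet mean for angles on a circle, with arc length (in degrees) as the geodesic distance. The brute-force grid search and the names `arc_dist`, `frechet_objective` and `frechet_mean_circle` are assumptions made for this example; they are not the method used in the slides.

```python
def arc_dist(a, b):
    """Geodesic distance on the circle: length of the shorter arc, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def frechet_objective(x, data):
    """Sum of squared geodesic distances from candidate x to the data."""
    return sum(arc_dist(a, x) ** 2 for a in data)

def frechet_mean_circle(data, step=0.01):
    """Approximate arg min of the objective over a grid of candidate angles."""
    n_grid = int(round(360.0 / step))
    candidates = [i * step for i in range(n_grid)]
    return min(candidates, key=lambda x: frechet_objective(x, data))

data = [3.0, 4.0, 358.0, 359.0]
m = frechet_mean_circle(data)
print("Frechet mean (degrees):", m)
print("objective at the mean:", frechet_objective(m, data))
```

A grid search is slow but makes the definition explicit: every candidate point on the manifold is scored by its summed squared geodesic distance to the data.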
Mildly Non-Euclidean Spaces
Fréchet Mean:
• Only requires a metric (distance) space
• Geodesic distance gives geodesic mean
Well known in robust statistics:
• Replace Euclidean distance
• With Robust distance, e.g. $L^2$ with $L^1$
• Reduces influence of outliers
• Gives another notion of robust median
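A small Python sketch of the robustness point, for numbers on the real line: minimizing the sum of squared distances gives the mean, while minimizing the sum of absolute distances gives the median. The data set, the candidate grid and the helper name `frechet_minimizer` are made up for illustration.

```python
def frechet_minimizer(data, power, candidates):
    """Candidate x minimizing sum_i |X_i - x|**power.

    power=2 corresponds to the usual (L2) mean, power=1 to the (L1) median.
    """
    return min(candidates, key=lambda x: sum(abs(xi - x) ** power for xi in data))

data = [1.0, 2.0, 3.0, 4.0, 100.0]      # one gross outlier
grid = [i / 10 for i in range(1001)]    # candidate centers 0.0, 0.1, ..., 100.0

center_l2 = frechet_minimizer(data, 2, grid)  # pulled toward the outlier
center_l1 = frechet_minimizer(data, 1, grid)  # stays with the bulk of the data

print("L2 center:", center_l2)
print("L1 center:", center_l1)
```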
Mildly Non-Euclidean Spaces
E.g. Fréchet Mean for data on a circle
Mildly Non-Euclidean Spaces
E.g. Fréchet Mean for data on a circle:
• Not always easily interpretable
– Think about “distances along arc”
– Not about “points in $\mathbb{R}^2$”
– Sum of squared distances “strongly feels the largest”
• Not always unique
– But unique “with probability one”
– Non-unique requires strong symmetry
– But possible to have many means
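The non-uniqueness can be seen in a tiny Python sketch. With two antipodal data points (a strongly symmetric configuration) the Fréchet objective has two minimizers; nudging one point breaks the symmetry and the minimizer becomes unique. The integer-degree grid and the helper names are assumptions for this example only.

```python
def arc_dist(a, b):
    """Shorter-arc distance between two angles, in whole degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def frechet_means_on_grid(data, grid):
    """All grid points attaining the minimum sum of squared arc distances."""
    cost = {x: sum(arc_dist(a, x) ** 2 for a in data) for x in grid}
    best = min(cost.values())
    return [x for x in grid if cost[x] == best]

grid = range(360)  # candidate means at 0, 1, ..., 359 degrees

symmetric_means = frechet_means_on_grid([0, 180], grid)  # antipodal data
generic_means = frechet_means_on_grid([0, 170], grid)    # symmetry broken

print("means for [0, 180]:", symmetric_means)
print("means for [0, 170]:", generic_means)
```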
Mildly Non-Euclidean Spaces
E.g. Fréchet Mean for data on a circle:
• Not always sensible notion of center
– Sometimes prefer “top & bottom”?
– At end: farthest points from data
• Not continuous Function of Data
– Jump from 1 – 2
– Jump from 2 – 8
• All false for Euclidean Mean
• But all happen generally for manifold data
Mildly Non-Euclidean Spaces
E.g. Fréchet Mean for data on a circle:
• Also of interest is Fréchet Variance:
$$\hat{\sigma}^2 = \min_{x} \frac{1}{n} \sum_{i=1}^{n} d(X_i, x)^2$$
• Works like sample variance
• Note values in movie, reflecting spread in data
• Note theoretical version:
$$\sigma^2 = \min_{x} E_X\, d(X, x)^2$$
• Useful for Laws of Large Numbers, etc.
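To see the Fréchet variance "work like a sample variance", the Python sketch below evaluates it for a tight and a more spread-out set of angles on the circle. The two data sets, the grid search and the function names are illustrative assumptions, not values from the movie.

```python
def arc_dist(a, b):
    """Geodesic (shorter-arc) distance between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def frechet_variance(data, step=0.1):
    """min over grid candidates x of (1/n) * sum_i d(X_i, x)^2."""
    n_grid = int(round(360.0 / step))
    best = min(
        sum(arc_dist(a, i * step) ** 2 for a in data) for i in range(n_grid)
    )
    return best / len(data)

tight = [3.0, 4.0, 358.0, 359.0]
spread = [30.0, 40.0, 320.0, 330.0]

var_tight = frechet_variance(tight)
var_spread = frechet_variance(spread)
print("tight sample :", var_tight, "squared degrees")
print("spread sample:", var_spread, "squared degrees")
```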
Mildly Non-Euclidean Spaces
Useful Viewpoint for data on manifolds:
• Tangent Space
• Plane touching at one point
• At which point? Geodesic (Fréchet) Mean
Hence terminology “mildly non-Euclidean”
(pic next page)
Mildly Non-Euclidean Spaces
Pics from:
Fletcher (2004)
Mildly Non-Euclidean Spaces
“Exponential Map” Terminology:
From Complex Exponential Function:
$$e^{i\theta} = \cos\theta + i \sin\theta$$
Exponential Map:
$$\theta \ (\text{in tangent space}) \;\mapsto\; e^{i\theta} \ (\text{on manifold})$$
Mildly Non-Euclidean Spaces
Exponential Map Terminology
Memory Trick:
• Exponential Map: Tangent Plane → Curved Manifold
• Log Map (Inverse): Curved Manifold → Tangent Plane
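For a concrete picture of the two maps, here is a short Python sketch on the unit sphere in $\mathbb{R}^3$, using the standard closed forms for the sphere. The base point `p`, the tangent vector `v` and the helper names are chosen purely for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def exp_map(p, v):
    """Tangent plane at p -> sphere: walk a distance |v| along the geodesic leaving p in direction v."""
    t = norm(v)
    if t < 1e-12:
        return tuple(p)
    return tuple(math.cos(t) * pc + math.sin(t) * vc / t for pc, vc in zip(p, v))

def log_map(p, q):
    """Sphere -> tangent plane at p: inverse of exp_map (q must not be antipodal to p)."""
    c = max(-1.0, min(1.0, dot(p, q)))
    w = tuple(qc - c * pc for pc, qc in zip(p, q))  # part of q orthogonal to p
    nw = norm(w)
    if nw < 1e-12:
        return (0.0, 0.0, 0.0)
    theta = math.acos(c)  # geodesic distance from p to q
    return tuple(theta * wc / nw for wc in w)

p = (0.0, 0.0, 1.0)    # base point (north pole)
v = (0.3, -0.4, 0.0)   # tangent vector at p, length 0.5

q = exp_map(p, v)      # point on the sphere
v_back = log_map(p, q) # back in the tangent plane
print(q, v_back)
```

The length of `v` in the tangent plane equals the geodesic distance from `p` to `q` on the sphere, which is what makes the tangent plane a useful local flattening.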
Mildly Non-Euclidean Spaces
Analog of PCA?
Principal geodesics (PGA):
• Replace line that best fits data
• By geodesic that best fits the data
(geodesic through Fréchet mean)
• Implemented as PCA in tangent space
• But mapped back to surface
• Fletcher (2004)
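The recipe above (PCA in the tangent space at the Fréchet mean, mapped back to the surface) can be sketched for toy data on the unit sphere. This is a minimal pure-Python illustration under stated assumptions: the toy data, the fixed-point iteration for the mean, and the power iteration for the leading direction are all choices made here, not the implementation in Fletcher (2004).

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def exp_map(p, v):
    """Tangent plane at p -> unit sphere."""
    t = norm(v)
    if t < 1e-12:
        return tuple(p)
    return tuple(math.cos(t) * pc + math.sin(t) * vc / t for pc, vc in zip(p, v))

def log_map(p, q):
    """Unit sphere -> tangent plane at p."""
    c = max(-1.0, min(1.0, dot(p, q)))
    w = tuple(qc - c * pc for pc, qc in zip(p, q))
    nw = norm(w)
    if nw < 1e-12:
        return (0.0, 0.0, 0.0)
    return tuple(math.acos(c) * wc / nw for wc in w)

def frechet_mean(points, iters=50):
    """Repeatedly move mu along the average of the log-mapped data."""
    mu = points[0]
    for _ in range(iters):
        logs = [log_map(mu, q) for q in points]
        step = tuple(sum(v[k] for v in logs) / len(logs) for k in range(3))
        mu = exp_map(mu, step)
        n = norm(mu)
        mu = tuple(c / n for c in mu)  # guard against numerical drift
    return mu

def first_principal_direction(vectors, iters=200):
    """Leading eigenvector of the tangent-space covariance, by power iteration."""
    n = len(vectors)
    cov = [[sum(v[i] * v[j] for v in vectors) / n for j in range(3)] for i in range(3)]
    u = (1.0, 1.0, 1.0)
    for _ in range(iters):
        u = tuple(sum(cov[i][j] * u[j] for j in range(3)) for i in range(3))
        nu = norm(u)
        u = tuple(c / nu for c in u)
    return u

def sph(lon, lat):
    return (math.cos(lon) * math.cos(lat), math.sin(lon) * math.cos(lat), math.sin(lat))

# Toy data: spread mostly in longitude, with small latitude wiggles.
lons = [-0.4, -0.2, 0.0, 0.2, 0.4]
lats = [0.05, -0.05, 0.0, -0.05, 0.05]
data = [sph(a, b) for a, b in zip(lons, lats)]

mu = frechet_mean(data)
tangent = [log_map(mu, q) for q in data]  # average is ~0 at the Frechet mean
u1 = first_principal_direction(tangent)

# PG 1: the geodesic through mu in direction u1, mapped back to the sphere.
pg1 = [exp_map(mu, tuple(t * c for c in u1)) for t in (-0.4, 0.0, 0.4)]
print("mean:", mu)
print("first principal direction:", u1)
```

No explicit centering is needed in the tangent plane, because the log-mapped data average to zero at the Fréchet mean.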
PGA for m-reps, Bladder-Prostate-Rectum
Bladder – Prostate – Rectum, 1 person, 17 days
PG 1, PG 2, PG 3 (analysis by Ja Yeon Jeong)
Mildly Non-Euclidean Spaces
Other Analogs of PCA???
• Why pass through geodesic mean?
• Sensible for Euclidean space
• But obvious for non-Euclidean?
Perhaps “geodesic that explains data as well
as possible” (no mean constraint)?
• Does this add anything?
• All same for Euclidean case
(since least squares fit contains mean)
Mildly Non-Euclidean Spaces
E.g. PGA on the unit sphere:
(Figure, built up over several slides: Unit Sphere, Data, Geodesic Mean, PG 1, Best Fit Geodesic)
Mildly Non-Euclidean Spaces
E.g. PGA on the unit sphere:
Which is “best”?
• Perhaps best fit?
• What about PG2?
– Should go through geo mean?
• What about PG3?
– Should cross PG1 & PG2 at same point?
– Need constrained optimization
• Gaussian Distribution on Manifold???