Unsupervised Learning with Permuted Data

Sergey Kirshner, Sridevi Parise, Padhraic Smyth
School of Information and Computer Science
University of California, Irvine
www.datalab.uci.edu

ICML 2003 © Sergey Kirshner, UC Irvine

Permutation Problem

            x1      x2      x3
vector 1   -0.64    2.76    3.01
vector 2   -0.52    1.05    2.60
vector 3   -0.17   -0.46    0.79
vector 4   -2.86    1.05    0.88
vector 5   -0.43    1.60    2.23
vector 6   -0.14   -0.02    1.28
vector 7   -0.54   -1.93   -0.15

Permutation Problem

Original vectors:

            x1      x2      x3
vector 1   -0.64    2.76    3.01
vector 2   -0.52    1.05    2.60
vector 3   -0.17   -0.46    0.79
vector 4   -2.86    1.05    0.88
vector 5   -0.43    1.60    2.23
vector 6   -0.14   -0.02    1.28
vector 7   -0.54   -1.93   -0.15

Permuted vectors:

            x1      x2      x3
vector 1    3.01    2.76   -0.64
vector 2   -0.52    2.60    1.05
vector 3    0.79   -0.17   -0.46
vector 4    1.05    0.88   -2.86
vector 5   -0.43    1.60    2.23
vector 6   -0.14    1.28   -0.02
vector 7   -1.93   -0.15   -0.54

Permutation Problem

?

            x1      x2      x3
vector 1    3.01    2.76   -0.64
vector 2   -0.52    2.60    1.05
vector 3    0.79   -0.17   -0.46
vector 4    1.05    0.88   -2.86
vector 5   -0.43    1.60    2.23
vector 6   -0.14    1.28   -0.02
vector 7   -1.93   -0.15   -0.54

Motivational Example

VLA FIRST Survey http://sundog.stsci.edu

Which Mapping Is the Right One?

[Figure: six panels, each assigning the labels core, lobe 1, and lobe 2 to three image components]

Permutation Problem

• Can we learn what permutations were applied?
• Can we learn the probability distribution which generated the data?
• If the distribution for the permuted data is known, how difficult is it to find the correct permutation?

            x1      x2      x3
vector 1    3.01    2.76   -0.64
vector 2   -0.52    2.60    1.05
vector 3    0.79   -0.17   -0.46
vector 4    1.05    0.88   -2.86
vector 5   -0.43    1.60    2.23
vector 6   -0.14    1.28   -0.02
vector 7   -1.93   -0.15   -0.54
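The last question on this slide can be made concrete with a small sketch. It is an illustration under stated assumptions, not the authors' method: the components are taken to be independent Gaussians with hypothetical parameters `MU` and `SIGMA` (not given on the slides), every one of the 3! permutations is assumed equally likely, and the function name `map_permutation` is made up for this example. Under those assumptions the most probable permutation of a permuted vector can be found by exhaustive search.

```python
import itertools
import numpy as np

# Hypothetical model parameters, NOT from the slides: independent Gaussian
# components with these means and standard deviations.
MU = np.array([-0.5, 1.0, 2.0])
SIGMA = np.array([1.0, 1.0, 1.0])


def log_gauss(y, mu, sigma):
    """Elementwise log-density of a univariate Gaussian."""
    return -0.5 * ((y - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)


def map_permutation(y, mu=MU, sigma=SIGMA):
    """Return (perm, x_hat) with y[i] = x_hat[perm[i]] for the most likely perm.

    All permutations are assumed equally likely a priori, so the most
    probable permutation is the one that maximizes the likelihood.
    """
    y = np.asarray(y, dtype=float)
    best_perm, best_ll = None, -np.inf
    for perm in itertools.permutations(range(len(y))):
        idx = list(perm)
        # Log-likelihood if observed position i holds original component perm[i].
        ll = log_gauss(y, mu[idx], sigma[idx]).sum()
        if ll > best_ll:
            best_perm, best_ll = perm, ll
    x_hat = np.empty_like(y)
    x_hat[list(best_perm)] = y  # undo the permutation
    return best_perm, x_hat
```

With these assumed parameters, the permuted vector 1 (3.01, 2.76, -0.64) is mapped back to (-0.64, 2.76, 3.01), which matches the first slide. The permuted vector 4 (1.05, 0.88, -2.86) is mapped to (-2.86, 0.88, 1.05) rather than the original (-2.86, 1.05, 0.88), a reminder that even the best decision rule for a given model can make mistakes.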

Related Work

• Image analysis
  – point correspondence problem (Gold et al., 1995)
  – image transformation learning (Frey & Jojic, 2003)
• Information extraction
  – text field positions (McCallum et al., 2000)

What’s New?

• Previous work
  – problem-specific algorithms
• Our contributions
  – analysis of the difficulty of the general problem
    · Bayes error rate for permutations bounded above by the classification Bayes error rate (BER)
    · specific results for Gaussian data
  – comments on learning

Generative Model

Mixture Model

How Hard is the Problem?

• Need a measure of difficulty for the problem
  – How often does the optimal decision rule make a mistake?

Marginal Probability Distributions

Corresponding Marginal Problem

• Marginal (projection) probability distribution
  – component overlap problem
  – Bayes-optimal error rate
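One way to read the component overlap problem is as a two-class classification task: given a single value, decide which of two marginal distributions it came from. The sketch below rests on assumptions the slides do not state: both marginals are univariate Gaussians, the two classes are equally likely, and the function names are hypothetical. It computes the Bayes-optimal error rate as the integral of the smaller of the two weighted densities.

```python
import math
import numpy as np


def gauss_pdf(x, mu, sigma):
    """Density of a univariate Gaussian evaluated on an array of points."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def overlap_bayes_error(mu1, sigma1, mu2, sigma2, n_grid=200_001):
    """Bayes error rate for two equally likely univariate Gaussian classes.

    The optimal rule picks the class with the larger density, so the error
    is the integral of 0.5 * min(p1(x), p2(x)). The integral is approximated
    with the trapezoidal rule on a grid spanning 10 standard deviations.
    """
    lo = min(mu1 - 10 * sigma1, mu2 - 10 * sigma2)
    hi = max(mu1 + 10 * sigma1, mu2 + 10 * sigma2)
    x = np.linspace(lo, hi, n_grid)
    integrand = 0.5 * np.minimum(gauss_pdf(x, mu1, sigma1), gauss_pdf(x, mu2, sigma2))
    dx = x[1] - x[0]
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1])) * dx)
```

For equal variances the result can be checked against the standard normal CDF evaluated at -|mu1 - mu2| / (2 sigma): widely separated means give an error near zero, while heavily overlapping marginals push it toward 0.5.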

Key

Main Result

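• Theorem: If the set of allowed permutations contains a key, and all allowed permutations are equally likely to be selected, then the Bayes error rate of the permutation problem is bounded above by the Bayes error rate of the corresponding marginal (classification) problem.
• Why is this important?
  – little overlap implies an easy permutation problem
  – high overlap?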

Special Cases

• Consider low-dimensional special cases
  – 2-dimensional Gaussians
• Find out what factors into the difficulty
  – μ₁, μ₂, σ₁², σ₂² determine the overlap Bayes error rate
  – permutation Bayes error rate also depends on the correlation ν/(σ₁σ₂)
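The role of correlation in the 2-dimensional Gaussian case can be explored numerically. The following is a rough Monte Carlo sketch, not the authors' closed-form analysis: it assumes the vector is either left alone or has its two components swapped, each with probability 1/2, and the function names, sample size, and seed are hypothetical. The optimal rule picks whichever ordering has the higher density, so it errs exactly when the swapped version of a sample is more likely than the sample itself. The parameter `rho` plays the role of the correlation ν/(σ₁σ₂).

```python
import math
import numpy as np


def permutation_bayes_error_mc(mu, sigma, rho, n_samples=200_000, seed=0):
    """Monte Carlo estimate of the permutation Bayes error for a 2-D Gaussian."""
    mu = np.asarray(mu, dtype=float)
    s1, s2 = sigma
    cov = np.array([[s1 * s1, rho * s1 * s2],
                    [rho * s1 * s2, s2 * s2]])
    prec = np.linalg.inv(cov)
    rng = np.random.default_rng(seed)
    x = rng.multivariate_normal(mu, cov, size=n_samples)

    def log_density_up_to_constant(points):
        d = points - mu
        return -0.5 * np.einsum("ni,ij,nj->n", d, prec, d)

    # An error occurs when the swapped vector looks more likely than the true one.
    swapped = x[:, ::-1]
    wrong = log_density_up_to_constant(swapped) > log_density_up_to_constant(x)
    return float(wrong.mean())


def equal_variance_closed_form(mu1, mu2, sigma, rho):
    """Error for sigma1 = sigma2 = sigma, derived for this sketch only:
    Phi(-|mu1 - mu2| / (sigma * sqrt(2 * (1 - rho))))."""
    z = -abs(mu1 - mu2) / (sigma * math.sqrt(2.0 * (1.0 - rho)))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

In the equal-variance case the estimate should track the closed form above: strong positive correlation makes swapped vectors easy to spot and drives the error toward zero, while strong negative correlation pushes the error up toward the overlap Bayes error rate of the two marginals.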

High Overlap Bayes Error Rate

High Permutation Bayes Error Rate

Low Permutation Bayes Error Rate

Learning

• Solve as other mixture model problems: Expectation Maximization (EM)
  – treat the permutation as a hidden variable
• Identifiability
  – the distribution of the original data resulting in a given distribution of the permuted data may not be unique!
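A minimal sketch of EM with the permutation treated as a hidden variable is given below. It is not the authors' algorithm: it assumes a single Gaussian with diagonal covariance, a uniform prior over all d! permutations (practical only for small d), and a sorting-based initialization chosen purely for convenience. All names are hypothetical.

```python
import itertools
import numpy as np


def em_permuted_gaussian(y, n_iter=50):
    """EM for permuted data under an assumed diagonal-covariance Gaussian.

    Returns (mu, var, resp, perms), where resp[k, j] is the posterior
    probability that row k is un-permuted by indexing it with perms[j].
    """
    y = np.asarray(y, dtype=float)
    n, d = y.shape
    perms = np.array(list(itertools.permutations(range(d))))  # shape (P, d)
    # cand[k, j, :] is row k re-ordered by permutation j: shape (n, P, d).
    cand = y[:, perms]
    # Heuristic initialization (an assumption): statistics of the sorted rows.
    y_sorted = np.sort(y, axis=1)
    mu = y_sorted.mean(axis=0)
    var = y_sorted.var(axis=0) + 1e-6
    for _ in range(n_iter):
        # E-step: posterior over permutations for every observed vector.
        log_lik = -0.5 * ((cand - mu) ** 2 / var + np.log(2 * np.pi * var)).sum(axis=2)
        log_lik -= log_lik.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_lik)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted mean and variance per component.
        w = resp[:, :, None]
        mu = (w * cand).sum(axis=(0, 1)) / n
        var = np.maximum((w * (cand - mu) ** 2).sum(axis=(0, 1)) / n, 1e-6)
    return mu, var, resp, perms
```

Because every permutation is allowed in this sketch, relabeling the components of `mu` and `var` leaves the likelihood unchanged, so the estimates are only meaningful up to ordering. This is a simple instance of the identifiability issue noted on the slide.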

Learning to Rotate Galaxies

Kirshner et al.

Summary

• Framework for the permuted data problem
• Analysis of the optimal error rate
  – upper bound by optimal error rate of related problem
• Special case analysis
  – closed-form expressions
  – importance of correlation
• Parameter estimation when parameters are unknown

Future Work

• Identifiability
  – What distributions are identifiable?
  – Do permutations make the identifiability problem different from that of ordinary mixtures?
• What to do with a large number of permutations?

Acknowledgements

• Funding
  – NSF
  – DOE
• Datalab @ UCI

http://www.datalab.uci.edu