Unsupervised Learning with Permuted Data


Sergey Kirshner, Sridevi Parise, Padhraic Smyth
School of Information and Computer Science
University of California, Irvine
www.datalab.uci.edu

ICML 2003 © Sergey Kirshner, UC Irvine

Permutation Problem

            x1      x2      x3
vector 1   -0.64    2.76    3.01
vector 2   -0.52    1.05    2.60
vector 3   -0.17   -0.46    0.79
vector 4   -2.86    1.05    0.88
vector 5   -0.43    1.60    2.23
vector 6   -0.14   -0.02    1.28
vector 7   -0.54   -1.93   -0.15


Permutation Problem

Original data:

            x1      x2      x3
vector 1   -0.64    2.76    3.01
vector 2   -0.52    1.05    2.60
vector 3   -0.17   -0.46    0.79
vector 4   -2.86    1.05    0.88
vector 5   -0.43    1.60    2.23
vector 6   -0.14   -0.02    1.28
vector 7   -0.54   -1.93   -0.15

Permuted data:

            x1      x2      x3
vector 1    3.01    2.76   -0.64
vector 2   -0.52    2.60    1.05
vector 3    0.79   -0.17   -0.46
vector 4    1.05    0.88   -2.86
vector 5   -0.43    1.60    2.23
vector 6   -0.14    1.28   -0.02
vector 7   -1.93   -0.15   -0.54
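The relationship between the two tables can be sketched in a few lines of Python. The index tuples below are not given on the slide; they were inferred by matching each original row to its permuted counterpart, and the convention `observed[i] = original[perm[i]]` is an assumption made for this sketch.

```python
# Original vectors from the slide (rows are vector 1..7, columns are x1, x2, x3).
original = [
    (-0.64,  2.76,  3.01),
    (-0.52,  1.05,  2.60),
    (-0.17, -0.46,  0.79),
    (-2.86,  1.05,  0.88),
    (-0.43,  1.60,  2.23),
    (-0.14, -0.02,  1.28),
    (-0.54, -1.93, -0.15),
]

# Assumed convention: observed[i] = original[perm[i]].
# These tuples were inferred by comparing the two tables; they are not on the slide.
perms = [
    (2, 1, 0),
    (0, 2, 1),
    (2, 0, 1),
    (1, 2, 0),
    (0, 1, 2),
    (0, 2, 1),
    (1, 2, 0),
]


def apply_permutation(vector, perm):
    """Return the vector with its components reordered according to perm."""
    return tuple(vector[j] for j in perm)


permuted = [apply_permutation(v, p) for v, p in zip(original, perms)]
for i, row in enumerate(permuted, start=1):
    print(f"vector {i}: {row}")
```

In the learning setting only the permuted rows are observed; the `perms` list is exactly what is hidden.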


Permutation Problem

            x1      x2      x3     permutation
vector 1    3.01    2.76   -0.64   ?
vector 2   -0.52    2.60    1.05   ?
vector 3    0.79   -0.17   -0.46   ?
vector 4    1.05    0.88   -2.86   ?
vector 5   -0.43    1.60    2.23   ?
vector 6   -0.14    1.28   -0.02   ?
vector 7   -1.93   -0.15   -0.54   ?


Motivational Example

VLA FIRST Survey http://sundog.stsci.edu


Which Mapping Is the Right One?

[Figure: panels showing the alternative assignments of the labels "core", "lobe 1", and "lobe 2" to the three components of an object.]

Permutation Problem

• Can we learn what permutations were applied?
• Can we learn the probability distribution which generated the data?
• If the distribution for the permuted data is known, how difficult is it to find the correct permutation?

            x1      x2      x3     permutation
vector 1    3.01    2.76   -0.64   ?
vector 2   -0.52    2.60    1.05   ?
vector 3    0.79   -0.17   -0.46   ?
vector 4    1.05    0.88   -2.86   ?
vector 5   -0.43    1.60    2.23   ?
vector 6   -0.14    1.28   -0.02   ?
vector 7   -1.93   -0.15   -0.54   ?


Related Work

• Image analysis
  – point correspondence problem (Gold et al., 1995)
  – image transformation learning (Frey & Jojic, 2003)
• Information extraction
  – text field positions (McCallum et al., 2000)

What’s New?

• Previous work
  – problem-specific algorithms
• Our contributions
  – analysis of the difficulty of the general problem
      ▪ Bayes error rate for permutations bounded above by the classification Bayes error rate (BER)
      ▪ specific results for Gaussian data
  – comments on learning

Generative Model

[Figure, built up over seven slides: the generative process that turns the distribution p(x) into the distribution q(x) of the permuted data. Labels shown: p(x), p(s), q(x).]

Mixture Model
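One way to read the mixture view is that the density of an observed vector is a weighted sum, over the allowed permutations, of the original density evaluated at the un-permuted vector. The sketch below is a minimal illustration under stated assumptions: p(x) is taken to be a Gaussian with independent components, all 3! permutations are allowed and equally likely, and the parameter values and function names are hypothetical rather than taken from the slides.

```python
import itertools
import math


def independent_gaussian_pdf(x, means, stds):
    """Density of a Gaussian with independent components (an assumed p(x))."""
    density = 1.0
    for xi, m, s in zip(x, means, stds):
        density *= math.exp(-0.5 * ((xi - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
    return density


def undo_permutation(y, perm):
    """Invert the convention y[i] = x[perm[i]] to recover x."""
    x = [0.0] * len(y)
    for i, j in enumerate(perm):
        x[j] = y[i]
    return tuple(x)


def mixture_density(y, p_density, perms, weights):
    """q(y) = sum_k weights[k] * p(x_k), where x_k undoes permutation k."""
    return sum(w * p_density(undo_permutation(y, perm))
               for perm, w in zip(perms, weights))


means, stds = (-1.0, 0.0, 2.0), (0.5, 1.0, 0.7)   # hypothetical parameters
perms = list(itertools.permutations(range(3)))     # all 3! = 6 permutations
weights = [1.0 / len(perms)] * len(perms)          # equally likely permutations


def p(x):
    return independent_gaussian_pdf(x, means, stds)


y = (2.1, -0.9, 0.3)
print(mixture_density(y, p, perms, weights))
```

With all permutations allowed and equal weights, q gives the same value to every reordering of y, which is one way to see why the applied permutation cannot be read off from q alone.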


How Hard is the Problem?

• Need a measure of difficulty for a problem
  – How often does the optimal decision rule make a mistake?
• Bayes-optimal error rate
• Bayes-optimal permutation error rate
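The permutation error rate can be estimated by simulation when p(x) is known. The sketch below assumes a 2-dimensional Gaussian p(x), two allowed permutations (identity and swap) chosen with equal probability, and hypothetical parameter values. The decision rule picks whichever permutation makes the observed vector more likely, and the function counts how often that choice is wrong.

```python
import math
import random


def gaussian2_logpdf(y, m1, m2, s1, s2, r):
    """Log-density of a bivariate Gaussian with correlation r."""
    a = (y[0] - m1) / s1
    b = (y[1] - m2) / s2
    z = a * a - 2.0 * r * a * b + b * b
    log_norm = math.log(2.0 * math.pi * s1 * s2 * math.sqrt(1.0 - r * r))
    return -z / (2.0 * (1.0 - r * r)) - log_norm


def estimate_permutation_error(m1, m2, s1, s2, r, n=20000, seed=0):
    """Monte Carlo estimate of how often the likelihood rule picks the wrong permutation."""
    rng = random.Random(seed)
    mistakes = 0
    for _ in range(n):
        # Draw a correlated Gaussian pair from p(x).
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = (m1 + s1 * z1, m2 + s2 * (r * z1 + math.sqrt(1.0 - r * r) * z2))
        # Apply identity or swap with equal probability.
        swapped = rng.random() < 0.5
        y = (x[1], x[0]) if swapped else x
        # With equal priors, choose the permutation with the larger likelihood.
        as_is = gaussian2_logpdf(y, m1, m2, s1, s2, r)
        unswapped = gaussian2_logpdf((y[1], y[0]), m1, m2, s1, s2, r)
        guess_swapped = unswapped > as_is
        mistakes += (guess_swapped != swapped)
    return mistakes / n


print(estimate_permutation_error(0.0, 1.0, 1.0, 1.0, 0.0))
```

The estimate carries sampling noise, so it should be read as an approximation to the optimal rate rather than an exact value.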

Marginal Probability Distributions

[Figure: marginal probability distributions p(x1) and p(x2).]

Corresponding Marginal Problem

• Marginal (projection) probability distribution
  – component overlap problem
  – Bayes-optimal error rate
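For intuition about the overlap problem, the sketch below computes the Bayes error of deciding which of two 1-D marginals produced a single value. It assumes Gaussian marginals with equal prior probability and uses simple midpoint integration of the smaller of the two weighted densities; the function name and the parameter values are hypothetical.

```python
from statistics import NormalDist


def marginal_bayes_error(m1, s1, m2, s2, steps=20000):
    """Bayes error for deciding which of two equally likely 1-D Gaussians produced a value."""
    first, second = NormalDist(m1, s1), NormalDist(m2, s2)
    # Integrate over a range that covers both densities almost entirely.
    lo = min(m1 - 10 * s1, m2 - 10 * s2)
    hi = max(m1 + 10 * s1, m2 + 10 * s2)
    width = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        v = lo + (k + 0.5) * width
        # The optimal rule errs with the smaller of the two weighted densities.
        total += min(0.5 * first.pdf(v), 0.5 * second.pdf(v)) * width
    return total


print(marginal_bayes_error(0.0, 1.0, 1.0, 1.0))
```

Heavily overlapping marginals push this value toward 0.5, and well-separated ones push it toward 0.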

Key


Main Result

Theorem: If the set of allowed permutations contains a key, and all allowed permutations are equally likely to be selected, then the Bayes-optimal permutation error rate is bounded above by the Bayes-optimal error rate of the corresponding marginal problem.

• Why is this important?
  – little overlap implies an easy permutation problem
  – high overlap? → could still have an easy permutation problem

Special Cases

• Consider low-dimensional special cases
  – 2-dimensional Gaussians
• Find out what factors into the difficulty
  – μ1, μ2, σ1², σ2² determine the overlap Bayes error rate
  – permutation Bayes error rate also depends on the correlation ν/(σ1σ2)
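A worked special case may help show why correlation matters. The derivation below is an illustration under assumptions made here, not the general closed-form expression from the talk: both variances are taken to be equal (σ1 = σ2 = σ), the two permutations (identity and swap) are equally likely, r = ν/(σ1σ2) denotes the correlation, and the names E_perm and E_marg are introduced only for this sketch. Under these assumptions the optimal rule reduces to comparing the sign of y1 − y2 with the sign of μ1 − μ2.

```latex
\begin{align*}
x_1 - x_2 &\sim \mathcal{N}\bigl(\mu_1 - \mu_2,\; 2\sigma^2(1 - r)\bigr), \\
E_{\text{perm}} &= \Phi\!\left(-\frac{|\mu_1 - \mu_2|}{\sigma\sqrt{2(1 - r)}}\right), \\
E_{\text{marg}} &= \Phi\!\left(-\frac{|\mu_1 - \mu_2|}{2\sigma}\right), \\
E_{\text{perm}} &\le E_{\text{marg}}
  \quad \text{since } \sqrt{2(1 - r)} \le 2 \text{ for } r \ge -1 .
\end{align*}
```

Here Φ is the standard normal CDF. The two rates coincide in the limit r → −1, and E_perm shrinks toward 0 as r → 1, even when the marginals overlap heavily.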

High Overlap Bayes Error Rate


High Permutation Bayes Error Rate

Correlation approaches -1

Low Permutation Bayes Error Rate

Correlation approaches 1
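The effect of correlation can be tabulated numerically. The sketch below assumes a 2-dimensional Gaussian with equal variances and equally likely identity and swap permutations; under those assumptions the permutation error is Φ(−|μ1 − μ2| / (σ·sqrt(2(1 − r)))) and the marginal overlap error is Φ(−|μ1 − μ2| / (2σ)). The function names and parameter values are hypothetical, and the formulas apply only to this special case.

```python
from statistics import NormalDist


def permutation_error_equal_variance(m1, m2, s, r):
    """Permutation Bayes error for equal variances s**2 and correlation r (-1 < r < 1)."""
    gap = abs(m1 - m2)
    return NormalDist().cdf(-gap / (s * (2.0 * (1.0 - r)) ** 0.5))


def marginal_error_equal_variance(m1, m2, s):
    """Bayes error for telling the two equal-variance 1-D marginals apart."""
    return NormalDist().cdf(-abs(m1 - m2) / (2.0 * s))


m1, m2, s = 0.0, 1.0, 1.0   # hypothetical parameters with heavily overlapping marginals
print(f"marginal error: {marginal_error_equal_variance(m1, m2, s):.3f}")
for r in (-0.99, -0.5, 0.0, 0.5, 0.99):
    err = permutation_error_equal_variance(m1, m2, s, r)
    print(f"r = {r:+.2f}  permutation error: {err:.3f}")
```

The marginal error does not change with r, while the permutation error falls steadily as r moves from −1 toward 1.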

Learning

• Solve as other mixture model problems: Expectation-Maximization (EM)
  – treat the permutation as a hidden variable
• Identifiability
  – the distribution p(x) resulting in a given distribution q(x) may not be unique!
      → each dimension is {0,1}
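A minimal EM sketch with the permutation as the hidden variable is given below. It is not the authors' implementation: it assumes p(x) is a Gaussian with independent components, allows all permutations, initialises the means from the first observed vector, and uses synthetic data with hypothetical parameters. Because of the identifiability issue, the means can only be recovered up to a reordering of the dimensions.

```python
import itertools
import math
import random


def log_gauss(v, mean, var):
    """Log-density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (v - mean) ** 2 / var)


def undo(y, perm):
    """Invert the convention y[i] = x[perm[i]] to recover x."""
    x = [0.0] * len(y)
    for i, j in enumerate(perm):
        x[j] = y[i]
    return x


def em_permuted(data, perms, iterations=50):
    """EM for an independent-Gaussian p(x) observed through unknown permutations."""
    d, n = len(data[0]), len(data)
    means = list(data[0])                      # initialise from one observed vector
    flat = [v for row in data for v in row]
    centre = sum(flat) / len(flat)
    variances = [sum((v - centre) ** 2 for v in flat) / len(flat)] * d
    probs = [1.0 / len(perms)] * len(perms)
    candidates = [[undo(y, perm) for perm in perms] for y in data]
    for _ in range(iterations):
        # E-step: posterior probability of each permutation for each vector.
        resp = []
        for cands in candidates:
            logs = [math.log(pk) + sum(log_gauss(x[i], means[i], variances[i])
                                       for i in range(d))
                    for pk, x in zip(probs, cands)]
            top = max(logs)
            w = [math.exp(value - top) for value in logs]
            total = sum(w)
            resp.append([wk / total for wk in w])
        # M-step: permutation probabilities, then weighted means and variances.
        probs = [max(sum(r[k] for r in resp) / n, 1e-12) for k in range(len(perms))]
        sums = [0.0] * d
        for r, cands in zip(resp, candidates):
            for rk, x in zip(r, cands):
                for i in range(d):
                    sums[i] += rk * x[i]
        means = [total / n for total in sums]
        squares = [0.0] * d
        for r, cands in zip(resp, candidates):
            for rk, x in zip(r, cands):
                for i in range(d):
                    squares[i] += rk * (x[i] - means[i]) ** 2
        variances = [max(total / n, 1e-6) for total in squares]
    return means, variances, probs


# Synthetic data with hypothetical parameters.
rng = random.Random(0)
true_means, true_std = (-3.0, 0.0, 3.0), 0.5
perms = list(itertools.permutations(range(3)))
data = []
for _ in range(300):
    x = [rng.gauss(m, true_std) for m in true_means]
    perm = rng.choice(perms)
    data.append([x[j] for j in perm])

means, variances, probs = em_permuted(data, perms)
print(sorted(round(m, 2) for m in means))
```

The number of candidate permutations grows factorially with the dimension, so enumerating them as done here is practical only for small vectors.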

Learning to Rotate Galaxies

Kirshner et al., NIPS 2002

Summary

• Framework for the permuted data problem
• Analysis of the optimal error rate
  – upper bound by the optimal error rate of a related problem
• Special case analysis
  – closed-form expressions
  – importance of correlation
• Parameter estimation when parameters are unknown

Future Work

• Identifiability
  – What distributions are identifiable?
  – Do permutations make the identifiability problem different from that of ordinary mixtures?
• What to do with a large number of permutations?
• Other applications

Acknowledgements

• Funding
  – NSF
  – DOE
• Datalab @ UCI: http://www.datalab.uci.edu
• Sapphire Group at LLNL
  – Chandrika Kamath
  – Erick Cantú-Paz