Learning with Tree-averaged Densities and Distributions
Sergey Kirshner
Alberta Ingenuity Centre for Machine Learning,
Department of Computing Science,
University of Alberta, Canada
NIPS 2007
Poster W12
December 5, 2007
Overview
• Want to fit a density to complete multivariate data
• New density estimation model based on
averaging over tree-dependence structures
– Distribution = Univariate Marginals + Copula
– Bayesian averaging over tree-structured copulas
– Efficient parameter estimation for tree-averaged
copulas
• Can solve problems with 10-30 dimensions
Most Popular Distribution…
• Interpretable
• Closed under taking
marginals
• Generalizes to
multiple dimensions
• Models pairwise
dependence
• Tractable
• 245 pages out of 691 from Continuous Multivariate Distributions by Kotz, Balakrishnan, and Johnson

[Figure: surface plot of a bivariate Gaussian density]
What If the Data Is NOT Gaussian?
Curse of Dimensionality
[Bellman 57]
[Figure: the Gaussian density again, with each axis partitioned into bins of width 1/n; a histogram estimator needs n^d cells]

P(X ∈ [-2, 2]^d) ≈ 0.9545^d for X ~ N(0, I)
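As a quick sanity check (my own sketch, using SciPy, not from the slides), the mass of a standard Gaussian inside a fixed cube decays exponentially with the dimension:

```python
from scipy.stats import norm

# Mass of a standard Gaussian inside the cube [-2, 2]^d:
# per-dimension mass is ~0.9545, so the total decays as 0.9545^d.
per_dim = norm.cdf(2) - norm.cdf(-2)
for d in (1, 10, 30):
    print(f"d={d:2d}  mass={per_dim ** d:.4f}")
# prints roughly 0.95, 0.63, 0.25
```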
Avoiding the Curse: Step 1
Separating Univariate Marginals
p(x_1, …, x_d) = [ p_1(x_1) ⋯ p_d(x_d) ] · c(F_1(x_1), …, F_d(x_d))

The first factor, the product of the univariate marginals, is the density the variables would have if they were independent; the second factor is the multivariate dependence term, the copula density.
Monotonic Transformation of the Variables
Applying a strictly increasing transformation to each variable changes the univariate marginals but leaves the dependence structure, and hence the copula, unchanged.
Copula
A copula C is a multivariate distribution (cdf) defined on the unit hypercube [0, 1]^d with uniform univariate marginals:

C(1, …, 1, u_k, 1, …, 1) = u_k for every k = 1, …, d
Sklar’s Theorem
[Sklar 59]
F(x_1, …, x_d) = C(F_1(x_1), …, F_d(x_d))

Every multivariate cdf F decomposes into its univariate marginals F_1, …, F_d plus a copula C carrying the dependence; when the marginals are continuous, C is unique. Conversely, any copula combined with any set of univariate marginals yields a valid multivariate distribution.
Example: Bivariate Gaussian Copula

C_ρ(u, v) = Φ_ρ(Φ^(-1)(u), Φ^(-1)(v))

where Φ is the standard normal cdf and Φ_ρ is the cdf of a bivariate standard normal with correlation ρ.
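As a concrete illustration (my own sketch, assuming SciPy), the Gaussian copula density is the joint normal density divided by the product of its standard normal marginals on the transformed scale:

```python
from scipy.stats import norm, multivariate_normal

def gaussian_copula_density(u, v, rho):
    """Bivariate Gaussian copula density c(u, v; rho) on (0, 1)^2."""
    x, y = norm.ppf(u), norm.ppf(v)                 # back to the Gaussian scale
    joint = multivariate_normal(mean=[0.0, 0.0],
                                cov=[[1.0, rho], [rho, 1.0]]).pdf([x, y])
    return joint / (norm.pdf(x) * norm.pdf(y))      # divide out the marginals
```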
Useful Properties of Copulas
• Preserves concordance between the variables
– Rank-based measure of dependence
• Preserves mutual information
• Can be viewed as a canonical form of a multivariate distribution for estimating multivariate dependence
Copula Density

c(u_1, …, u_d) = ∂^d C(u_1, …, u_d) / (∂u_1 ⋯ ∂u_d)
Separating Univariate Marginals
1. Fit the univariate marginals (parametric or nonparametric)
2. Replace each data point with the values of the marginal cdfs, u_i = F_i(x_i)
3. Estimate the copula density

Inference for the margins [Joe and Xu 96]; canonical maximum likelihood [Genest et al 95]
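A minimal sketch of step 2 in the canonical-maximum-likelihood style, replacing fitted parametric marginals with empirical cdfs (ranks); the function name is illustrative:

```python
import numpy as np

def to_pseudo_observations(X):
    """Map each column of an (N x d) data matrix into (0, 1) via its
    empirical cdf, i.e. replace each value by its rescaled rank."""
    N = X.shape[0]
    ranks = np.argsort(np.argsort(X, axis=0), axis=0) + 1.0
    return ranks / (N + 1.0)   # divide by N+1 so no value hits 0 or 1
```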
What Next?
• Aren’t we back to square one?
– Still estimating multivariate density from data
• Not quite
– All marginals are fixed
– Lots of approaches for copulas
• The vast majority focus on the bivariate case
– Design models that use only pairs of variables
Tree-Structured Densities
[Figure: spanning tree over variables x_1, …, x_6]

p(x_1, …, x_d) = ∏_i p_i(x_i) · ∏_{(i,j)∈E} p_{ij}(x_i, x_j) / (p_i(x_i) p_j(x_j))

A tree-structured density involves only univariate marginals and bivariate marginals over the edge set E of a spanning tree.
Tree-Structured Copulas
For a tree-structured copula the univariate factors are uniform, so the density reduces to a product of bivariate copula densities over the tree's edges:

c_T(u_1, …, u_d) = ∏_{(i,j)∈E} c_{ij}(u_i, u_j)
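A sketch of evaluating a fixed tree-structured copula density; the edge list and `pair_density` stand in for whatever bivariate copula family sits on each edge:

```python
def tree_copula_density(u, edges, pair_density):
    """Density of a tree-structured copula: the product of bivariate
    copula densities over the tree's edges (the univariate factors are 1
    because copula marginals are uniform).

    u            : sequence of length d with entries in (0, 1)
    edges        : list of (i, j) pairs forming a spanning tree
    pair_density : pair_density(i, j, ui, uj) -> bivariate copula density
    """
    out = 1.0
    for i, j in edges:
        out *= pair_density(i, j, u[i], u[j])
    return out
```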
Chow-Liu Algorithm (for Copulas)
[Figure: complete graph on variables a_1, …, a_4; each edge carries a fitted bivariate copula density c(a_i, a_j) and is weighted by the corresponding pairwise mutual information; the maximum-weight spanning tree is kept]

Edge    Mutual information
A1A2    0.3126
A1A3    0.0229
A1A4    0.0172
A2A3    0.0230
A2A4    0.0183
A3A4    0.2603
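The tree selection itself is just a maximum-weight spanning tree over the pairwise mutual-information weights; a minimal Prim's-algorithm sketch (weights assumed precomputed):

```python
import numpy as np

def chow_liu_tree(W):
    """Maximum-weight spanning tree over a symmetric (d x d) matrix W of
    pairwise mutual informations; returns the tree as (i, j) edges."""
    d = W.shape[0]
    in_tree = {0}
    edges = []
    while len(in_tree) < d:
        i, j = max(((a, b) for a in in_tree for b in range(d)
                    if b not in in_tree), key=lambda e: W[e[0], e[1]])
        edges.append((i, j))
        in_tree.add(j)
    return edges
```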
Distribution over Spanning Trees
[Meilă and Jaakkola 00, 06]
[Figure: complete graph on a_1, …, a_4 with a nonnegative weight β_ij on each edge]

P(T) ∝ ∏_{(i,j)∈T} β_ij

The normalizing constant is a sum over all spanning trees, yet by the matrix tree theorem it equals a determinant and costs only O(d^3)!
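That determinant computation fits in a few lines; a sketch of the matrix-tree-theorem sum:

```python
import numpy as np

def spanning_tree_sum(beta):
    """Sum over all spanning trees T of prod_{(i,j) in T} beta[i, j],
    via the matrix tree theorem: any cofactor of the weighted Laplacian."""
    b = beta.copy()
    np.fill_diagonal(b, 0.0)
    L = np.diag(b.sum(axis=1)) - b      # weighted graph Laplacian
    return np.linalg.det(L[1:, 1:])     # cofactor: drop row/col 0
```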
Tree-Averaged Copula
• Can compute the sum over all d^(d-2) spanning trees (Cayley's formula) in closed form:

  c̄(u) = Σ_T P(T) ∏_{(i,j)∈T} c_{ij}(u_i, u_j)

• Can be viewed as a mixture over many, many spanning trees
• Can use EM to estimate the parameters
– Even though there are d^(d-2) mixture components!
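Putting the pieces together (a sketch reusing spanning_tree_sum from the previous block; `pair_density` is again a placeholder): the tree-averaged density is a ratio of two spanning-tree sums, one with edge weights β_ij · c_ij(u_i, u_j) and one with weights β_ij.

```python
import numpy as np

def tree_averaged_density(u, beta, pair_density):
    """Evaluate sum_T P(T) * prod_{(i,j) in T} c_ij(u_i, u_j)
    as Z(beta * C) / Z(beta), where C[i, j] = c_ij(u_i, u_j) and Z is
    the matrix-tree-theorem sum (spanning_tree_sum defined earlier)."""
    d = len(u)
    C = np.ones((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            C[i, j] = C[j, i] = pair_density(i, j, u[i], u[j])
    return spanning_tree_sum(beta * C) / spanning_tree_sum(beta)
```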
EM for Tree-Averaged Copulas
• E-step: compute the posterior over spanning trees for each data point
– Looks intractable, but can be done in O(d^3) per data point
• M-step: update β and θ
– Update of θ is often linear in the number of points
• Gaussian copula: solving a cubic equation
– Update of β is essentially iterative scaling
• Can be done in O(d^3) per iteration
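The E-step needs, for each data point, the probability that each edge appears in the tree; under the decomposable tree distribution this also reduces to Laplacian algebra (differentiating the log determinant, as in Meilă and Jaakkola). A sketch of the edge marginals for weights β, where in EM β_ij would be replaced by β_ij · c_ij(u_i, u_j) per point; the indexing conventions here are my own assumption:

```python
import numpy as np

def edge_marginals(beta):
    """P((i, j) in T) for a random spanning tree with P(T) proportional
    to the product of beta over T's edges, via the gradient of
    log det of the reduced Laplacian (matrix tree theorem)."""
    b = beta.copy()
    np.fill_diagonal(b, 0.0)
    d = b.shape[0]
    L = np.diag(b.sum(axis=1)) - b
    M = np.linalg.inv(L[1:, 1:])               # reduced-Laplacian inverse
    P = np.zeros((d, d))
    for v in range(1, d):
        P[0, v] = P[v, 0] = b[0, v] * M[v-1, v-1]
        for u in range(v + 1, d):
            P[v, u] = P[u, v] = b[v, u] * (M[v-1, v-1] + M[u-1, u-1]
                                           - 2.0 * M[v-1, u-1])
    return P
```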
Experiments: Log-Likelihood on Test Data
• UCI ML Repository, MAGIC data set
• 12,000 10-dimensional vectors
• 2,000 examples in each test set
• Average over 10 partitions
Binary-Continuous Data
Summary
• Multivariate distribution = univariate marginals +
copula
• Copula density estimation via tree-averaging
– Closed form
• Tractable maximum-likelihood parameter estimation (EM)
– O(Nd^3) per iteration
• Only bivariate distributions are estimated at any step
– Potentially avoiding the curse of dimensionality
• New model for multi-site rainfall amounts (Poster W12)