Nonparametric Latent Feature Models
for Link Prediction
Kurt T. Miller, Thomas L. Griffiths, Michael I. Jordan
NIPS 2009
Presented by Minhua Chen, 06.04.2010.
Problem Formulation
• Link prediction in social networks (binary matrix completion)
Y =
  [ 1  0  ?  ? ]
  [ 0  1  ?  0 ]
  [ 1  ?  1  ? ]
  [ 0  1  0  1 ]
Yij = 1: person i is linked to person j.
Yij = 0: person i is not linked to person j.
Yij = ?: unobserved entry to be filled in.
• Linkage can stand for different relations, e.g., friends or not, colleagues or not.
• If the network is a directed graph, then Y can be asymmetric.
• Observed entries + auxiliary information (optional) → unobserved entries
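A minimal sketch of how such a partially observed link matrix might be held in code, assuming NumPy and using NaN for the "?" entries. The 4x4 values are illustrative only, and the names `Y`, `observed`, and `links` are hypothetical.

```python
import numpy as np

# Illustrative 4x4 link matrix; NaN marks an unobserved entry ("?").
nan = np.nan
Y = np.array([
    [1, 0, nan, nan],
    [0, 1, nan, 0],
    [1, nan, 1, nan],
    [0, 1, 0, 1],
])

observed = ~np.isnan(Y)            # True where Yij is known
n_observed = int(observed.sum())   # entries available for training
n_missing = int((~observed).sum()) # entries the model must fill in
links = np.argwhere(Y == 1)        # (i, j) pairs with an observed link
```

Keeping a separate boolean mask means the likelihood can be evaluated over observed entries only, while predictions are read off at the masked-out positions.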
Methods
• Class-based model
Entities are clustered into classes.
Linkage is determined by which classes they belong to.
Models: Infinite Relational Model (IRM)
Mixed Membership Stochastic Blockmodel (MMSB)
Disadvantage: a clustering description is too coarse to be expressive.
• Latent-feature model
Interactions between latent-features determine the linkage.
This paper extends it to a nonparametric model using the Indian Buffet Process (IBP).
The number of latent features can be inferred, as well as their interactions.
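To make the role of the IBP concrete, here is a hedged sketch of drawing a binary feature matrix from the standard sequential ("customers and dishes") construction of the IBP. It only illustrates the prior, not the paper's inference procedure; the function name `sample_ibp` and its parameters are hypothetical.

```python
import numpy as np

def sample_ibp(n_people, alpha, rng):
    """Draw a binary feature matrix Z from the IBP with parameter alpha."""
    columns = []   # one list of 0/1 entries per feature
    counts = []    # m_k: how many people so far have feature k
    for i in range(n_people):
        # Existing features: person i takes feature k with probability m_k / (i + 1).
        for k in range(len(columns)):
            take = rng.random() < counts[k] / (i + 1)
            columns[k].append(int(take))
            counts[k] += int(take)
        # New features: Poisson(alpha / (i + 1)) of them, so far owned only by person i.
        for _ in range(rng.poisson(alpha / (i + 1))):
            columns.append([0] * i + [1])
            counts.append(1)
    if not columns:
        return np.zeros((n_people, 0), dtype=int)
    return np.array(columns, dtype=int).T   # shape (n_people, K)

rng = np.random.default_rng(0)
Z = sample_ibp(n_people=10, alpha=2.0, rng=rng)
```

The number of columns K is not fixed in advance: it is an outcome of the draw, which is what lets the number of latent features be inferred from data.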
Model
• Define Z to be a binary N × K matrix for N people and K latent features.
• Define W to be a K × K weight matrix for interactions between the K latent features.
• The model is
• Or, expressed in more detail:
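A sketch of the model in LaTeX, assuming the basic form used in the cited paper without its optional covariate terms. Here σ is the logistic sigmoid, Z_i is row i of Z, and the symbols α (IBP parameter) and σ_w (prior scale of the weights) are assumed names.

```latex
\Pr(Y_{ij} = 1 \mid Z, W) = \sigma\!\left(Z_i W Z_j^{\top}\right)
  = \sigma\!\Big(\sum_{k,k'} Z_{ik}\, W_{kk'}\, Z_{jk'}\Big),
\qquad \sigma(x) = \frac{1}{1 + e^{-x}}

Z \sim \mathrm{IBP}(\alpha), \qquad
W_{kk'} \sim \mathcal{N}(0, \sigma_w^2), \qquad
Y_{ij} \sim \mathrm{Bernoulli}\!\big(\sigma(Z_i W Z_j^{\top})\big)
```

The double sum shows that every pair of features (k, k') held by persons i and j contributes its weight W_{kk'} to the log-odds of a link.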
Results on Synthetic Data
[Figure panels: (c) ground truth of Z; (d) generated Y; (e) inferred Z]
Although the missing values are imputed correctly, the inferred Z differs
from the ground truth. This indicates that the model is not identifiable:
different Z can explain the same Y equally well.
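A small sketch of why this can happen, assuming link probabilities of the form sigmoid(Z W Z^T). Two hypothetical configurations with different numbers of features give identical link probabilities; the values and the helper `link_prob` are illustrative, not taken from the paper's experiment.

```python
import numpy as np

def link_prob(Z, W):
    """Link probabilities sigmoid(Z W Z^T) for every pair (i, j)."""
    return 1.0 / (1.0 + np.exp(-(Z @ W @ Z.T)))

# Configuration A: one latent feature shared by people 0 and 1.
Z_a = np.array([[1], [1], [0]])
W_a = np.array([[2.0]])

# Configuration B: the same feature duplicated into two columns,
# with the weight spread evenly over the 2x2 interaction matrix.
Z_b = np.array([[1, 1], [1, 1], [0, 0]])
W_b = np.full((2, 2), 0.5)

P_a = link_prob(Z_a, W_a)
P_b = link_prob(Z_b, W_b)
same_predictions = np.allclose(P_a, P_b)
```

Because the data constrain only the product Z W Z^T, predictive accuracy on Y is a more meaningful check than agreement between the inferred and true Z.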
Results on Multi-Task Data
• The Countries data contains 54 relation matrices among 14 countries,
  along with 90 given covariates.
• The Alyawarra data contains 26 kinship relation matrices for 104 people
  in the Alyawarra tribe of Central Australia.
• For each dataset, 80% of the data is used for training and the remaining
  20% for testing.
• With proper initialization, LFRM (the latent feature relational model) outperforms IRM and MMSB.
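A hedged sketch of an 80/20 split, assuming individual matrix entries are held out uniformly at random. The slides do not state how the split was drawn, so this is an assumption; `split_entries` is a hypothetical helper.

```python
import numpy as np

def split_entries(shape, test_fraction, rng):
    """Return boolean (train, test) masks over the entries of a relation matrix."""
    n_entries = int(np.prod(shape))
    n_test = int(round(test_fraction * n_entries))
    order = rng.permutation(n_entries)      # random order of flat entry indices
    test = np.zeros(n_entries, dtype=bool)
    test[order[:n_test]] = True             # first n_test indices are held out
    test = test.reshape(shape)
    return ~test, test

rng = np.random.default_rng(0)
# One 104 x 104 relation matrix, matching the Alyawarra dimensions.
train_mask, test_mask = split_entries((104, 104), test_fraction=0.2, rng=rng)
```

The model is then fit on entries where `train_mask` is True and scored on entries where `test_mask` is True.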
Results on Single-Task Data
• The 234 authors with the most coauthors in NIPS 1-17 are used, and
  their coauthorship matrix is constructed.
AUC performance
Model        AUC
LFRM w/IRM   0.9509
LFRM rand    0.9466
IRM          0.8906
MMSB         0.8705
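For reference, AUC can be computed as the fraction of (positive, negative) pairs that the model ranks correctly. The sketch below is a plain NumPy version for small inputs; the labels and scores are hypothetical and unrelated to the numbers reported above.

```python
import numpy as np

def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as one half.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]
    neg = scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    diff = pos[:, None] - neg[None, :]   # all positive-negative score differences
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

# Hypothetical held-out labels and predicted link probabilities.
y_true = [1, 0, 1, 0, 1]
y_score = [0.9, 0.4, 0.35, 0.2, 0.8]
value = auc(y_true, y_score)
```

This pairwise form uses memory proportional to the number of positive-negative pairs, so for large test sets a rank-based implementation would be the more practical choice.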