Web Image Prediction Using
Multivariate Point Processes
Gunhee Kim1 Li Fei-Fei2 Eric P. Xing1
1: School of Computer Science, Carnegie Mellon University
2: Computer Science Department, Stanford University
August 14, 2012
1
Outline
• Problem Statement
• Method
  - Multivariate Point Process + Poisson Regression
  - Full model of Intensity Function
  - Learning and Prediction
  - Personalization
• Experiments
• Conclusion
2
Problem Statement - Web Image Prediction
A photo stream of world+cup from Flickr up to 12/31/2008.
Each image is associated with meta-data (timestamp, owner ID).
Can we guess what photos will appear on Flickr at tq = 6/6/2009?
Collective image prediction: actual images at tq
Personalized image prediction: actual images by uq at tq
4
Why is Image Prediction Interesting?
Predicting User Behaviors on the Web
User behavior on the Web changes over time.
(1) Keyword search
• What query terms are popular?
• What documents are most relevant?
• What documents are likely to be clicked?
(2) News recommendation
(3) Product search
Little previous work addresses what images people are interested in.
• [D08] Dakka et al., CIKM 2008
• [V11] Amodeo et al., CIKM 2011
• [M09] Metzler et al., SIGIR 2009
• [R12] Radinsky et al., WWW 2012
• [K10] Kulkarni et al., WSDM 2011
5
Why is Image Prediction Interesting?
Time-sensitive Image Reranking
Submit the term world+cup into Google/Bing/Flickr engines
Google / Bing
• Severely redundant. Almost identical all year long.
  - Increase diversity by temporal trends
Flickr
• Any meaningful order?
  - Ranking by temporal suitability
Why is Image Prediction Interesting?
Time-sensitive Image Reranking
Time-sensitive image reranking
For tq = Jun. 23 (summer)
For tq = Feb. 5 (winter)
Personalized Time-sensitive image reranking
For tq = Aug. 23 and uq = 15655191
Relation to Previous Work
Web Content Dynamics
Similar Image Retrieval
• Text based method [A11,W06]
• Semantic meaning of keyword + feature-wise similarity
• Image-based method [K10]
  - No image prediction
  - No personalization
• [D11, P08, T08]
  - Temporal trends + user histories
Image-based Collaborative Filtering
Leveraging Web Photos to Infer Missing Information
• Social trends in politics and market [J10]
• Spatio-temporal events [S10]
• Scene completion [H07]
• 3D models of landmarks [SN10]
• Semantic image hierarchy [L10]
  - Images: source of prediction, not subject of prediction
  - Future images: not studied as missing info to be inferred.
• [A11] Ahmed et al., AISTATS 2011
• [D11] Deng et al., CVPR 2011
• [H07] Hays et al., SIGGRAPH 2007
• [J10] Jin et al., MM 2010
• [K10] Kim et al., ECCV 2010
• [L10] Li et al., CVPR 2010
• [P08] Philbin et al., CVPR 2008
• [S10] Singh et al., MM 2010
• [SN10] Snavely et al., IEEE 2010
• [T08] Torralba et al., PAMI 2008
• [W06] Wang et al., KDD 2006
8
Summary of Contribution
Collective and Personalized Web Image Prediction
Little previous work exists for large-scale Web images.
(1) Predicting user behaviors on the Web
(2) Time-sensitive image reranking
Algorithm based on multivariate point process
Novel in image retrieval literature
Flexibility, optimality, scalability, and prediction accuracy
More than 10 million images of 40 topics
Outperforms baselines (PageRank based IR, Topic modeling)
9
Multivariate Point Process (MPP)
A stochastic process that consists of a series of random events in time and space.
Statistical model for spatiotemporal events
• Neural spiking modeling [Brown et al. Nat.Neuro.04]
• Ecology: locations of Lauraceae trees [Moller et al. 2008]
• Geology: micro-earthquake data [Schoenberg]
• Computer vision: events in video, crowd counting [Ge et al. CVPR08] [Prabhakar et al. CVPR10]
11
MPP for Image Streams
An occurrence of a particular image at a particular time = a point in time and image space
A short stream of penguin images
v1 : ice hockey
v2 : animal penguin
v3 : snowy mountain
Each image is associated with (visual cluster, timestamp)
Discrete-time trivariate PP
12
Mathematical Formulation for MPP
A short stream of penguin images
Intensity function for VC i at t
$$\lambda^i(t) = \lim_{\Delta \to 0} \frac{P[N^i(t + \Delta) - N^i(t) = 1]}{\Delta}$$
Infinitesimal expected occurrence rate of visual cluster i at time t
Covariates: any likely factors to be associated with image occurrences
(ex. Time, season, and other external events)
The intensity function is represented by the exponential of a linear combination of covariate functions.
$$\log \lambda^i(t_k \mid \theta^i) = \sum_{j=1}^{J} \theta^i_j f^i_j(x_1, \ldots, x_L)$$
$\theta^i = \{\theta^i_1, \ldots, \theta^i_J\}$ : Parameter set
$f^i_j(x_1, \ldots, x_L)$ : covariate function
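A minimal sketch of the log-linear intensity above, for one visual cluster at one time step. The covariate values and weights (a bias term plus hypothetical "summer" and "winter" indicators) are illustrative assumptions, not values from the talk.

```python
import math

def intensity(theta, covariates):
    """Return lambda = exp(sum_j theta_j * f_j) for one visual cluster."""
    if len(theta) != len(covariates):
        raise ValueError("theta and covariates must have the same length")
    return math.exp(sum(t * f for t, f in zip(theta, covariates)))

# Hypothetical parameters: [bias, summer indicator, winter indicator].
theta = [-1.0, 1.5, -0.5]
summer = intensity(theta, [1.0, 1.0, 0.0])  # exp(-1.0 + 1.5)
winter = intensity(theta, [1.0, 0.0, 1.0])  # exp(-1.0 - 0.5)
```

Because the model is linear in the log domain, the intensity is always positive, and each weight acts as a multiplicative factor on the occurrence rate.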
13
MLE solution for MPP
A short stream of penguin images
Log-likelihood of an observed stream
$$l(N^i_{1:K} \mid \theta^i) = \sum_{k=1}^{K} \log\big(\lambda^i(t_k \mid \theta^i)\Delta\big)\,\Delta N^i_k - \sum_{k=1}^{K} \lambda^i(t_k \mid \theta^i)\Delta$$
Parametric form of intensity functions with covariates
$$\log \lambda^i(t_k \mid \theta^i) = \sum_{j=1}^{J} \theta^i_j f^i_j(x_1, \ldots, x_L)$$
Poisson regression: globally-optimal solution
The MLE solution $\theta^{i*}$ can tell which covariates contribute to the occurrence of visual cluster i
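The log-likelihood on this slide can be sketched as below for one visual cluster. The count and rate sequences are made-up numbers, chosen only to show that rates tracking the observed counts score higher.

```python
import math

def log_likelihood(counts, rates, delta=1.0):
    """Discrete-time point-process log-likelihood:
    sum_k [ log(lambda_k * delta) * dN_k - lambda_k * delta ]."""
    return sum(n * math.log(lam * delta) - lam * delta
               for n, lam in zip(counts, rates))

# Hypothetical observed counts dN_k over four time bins.
counts = [0, 2, 1, 0]
good = log_likelihood(counts, [0.1, 2.0, 1.0, 0.1])  # rates follow the counts
bad = log_likelihood(counts, [2.0, 0.1, 0.1, 2.0])   # rates oppose the counts
```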
14
Sparse MLE solution for MPP
A short stream of penguin images
Log-likelihood of an observed stream with L1 (Lasso) penalty
$$l(N^i_{1:K} \mid \theta^i) = \sum_{k=1}^{K} \log\big(\lambda^i(t_k \mid \theta^i)\Delta\big)\,\Delta N^i_k - \sum_{k=1}^{K} \lambda^i(t_k \mid \theta^i)\Delta - \mu \sum_{j=1}^{J} |\theta^i_j|$$
A sparse solution is encouraged
For each visual cluster, only a small number of strong factors affect image occurrence.
MLE solution: Cyclic coordinate descent [Friedman et al. 2010].
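To illustrate how the L1 penalty zeroes out weak covariates, here is a small sketch of penalized Poisson regression. Note the swap: it uses proximal gradient ascent with soft-thresholding rather than the cyclic coordinate descent cited on the slide, because it is shorter to write; both target the same concave objective. The data, step size, and penalty weight are hypothetical.

```python
import math

def soft_threshold(x, t):
    """Shrink x toward zero by t; values within [-t, t] become exactly 0."""
    return math.copysign(max(abs(x) - t, 0.0), x)

def fit_sparse_poisson(X, y, mu, step=0.01, iters=5000):
    """Maximize sum_k [y_k * eta_k - exp(eta_k)] - mu * sum_j |theta_j|,
    where eta_k = theta . x_k and the bin width delta is taken as 1."""
    num_cov = len(X[0])
    theta = [0.0] * num_cov
    for _ in range(iters):
        grad = [0.0] * num_cov
        for x, n in zip(X, y):
            lam = math.exp(sum(t * f for t, f in zip(theta, x)))
            for j in range(num_cov):
                grad[j] += (n - lam) * x[j]
        # Gradient step on the smooth part, then the L1 proximal step.
        theta = [soft_threshold(t + step * g, step * mu)
                 for t, g in zip(theta, grad)]
    return theta

# Hypothetical data: columns are [bias, informative flag, irrelevant flag].
X = [[1, 1, 0], [1, 1, 1], [1, 0, 0], [1, 0, 1]] * 2
y = [4, 4, 1, 1] * 2
theta = fit_sparse_poisson(X, y, mu=1.0)
```

In this toy setup the counts depend only on the second column, so the weight of the third column is driven to exactly zero while the informative weight stays clearly positive.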
15
A Toy Example of Image Prediction
Covariates: only year and month
$$\log \lambda^i(t_k \mid \theta^i) = \theta^i_0 + \sum_{y=2003}^{2009} \theta^i_y I_y(t_k) + \sum_{m=1}^{12} \theta^i_m I_m(t_k)$$
(1 + 7 + 12 = 20 parameters)
Shark example
• $N^1$ (Sea tour): peaked in summer
• $N^2$ (Ice hockey): peaked in January
[Figure: observed occurrence data; panels for every year and every month]
Outline
• Problem Statement
• Method
 Multivariate Point Process + Poisson Regression
 Full model of Intensity Function
 Learning and Prediction
 Personalization
• Experiments
• Conclusion
17
Full Model of Intensity Functions
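$$\begin{aligned}
\log \lambda^i(t_k \mid \theta^i) = \alpha_0 &+ \sum_{p=1}^{P} \alpha_p \,\Delta N^i(t_k - p\delta,\ t_k - (p-1)\delta) && \text{(history component)} \\
&+ \sum_{c=1, c \neq i}^{M} \sum_{q=1}^{R} \beta^c_q \,\Delta N^c(t_k - q\delta,\ t_k - (q-1)\delta) && \text{(correlation component)} \\
&+ \sum_{m=1}^{12} \gamma_m \, g(t_k - m) + \sum_{z=1}^{Z} \gamma_z \, u_{t_k - \delta : t_k}(z) && \text{(external component)}
\end{aligned}$$
Any probable factors can be included without performance loss because we encourage a sparse solution.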
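A rough sketch of assembling the covariates of the full model for one visual cluster. Several choices here are assumptions: time is discretized so that the window from t_k - p*delta to t_k - (p-1)*delta corresponds to step k - p, the month covariate is a plain 12-way indicator, and the user covariate is left out for brevity.

```python
def full_covariates(counts, i, k, P, R, month):
    """counts[c][t]: occurrences of visual cluster c at time step t.
    Returns [bias] + history (P lags of VC i)
            + correlation (R lags of every other VC) + 12 month indicators."""
    if k < max(P, R):
        raise ValueError("not enough history before step k")
    history = [float(counts[i][k - p]) for p in range(1, P + 1)]
    correlation = [float(counts[c][k - q])
                   for c in range(len(counts)) if c != i
                   for q in range(1, R + 1)]
    months = [1.0 if month == m else 0.0 for m in range(1, 13)]
    return [1.0] + history + correlation + months

# Hypothetical counts for three visual clusters over four time steps.
counts = [[1, 0, 2, 3],
          [0, 1, 0, 1],
          [5, 5, 5, 5]]
x = full_covariates(counts, i=0, k=3, P=2, R=1, month=6)
```

Here the vector has 1 + P + (M - 1) * R + 12 = 17 entries; covariates that turn out to be irrelevant are expected to receive zero weight under the L1 penalty.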
18
Full Model of Intensity Functions
History component: linear autoregressive (AR) process of order P
• Typical pattern of annual periodicity
• Biphasic = bursty occurrence
19
Full Model of Intensity Functions
Correlation component: existence or absence of a VC can be a strong clue.
• Synchronized
• 4 months lag
20
Full Model of Intensity Functions
External component
• Month covariate
• User covariate
Note
1. Flexibly add or remove covariate functions according to the characteristics of image topics.
2. AR can be replaced by a more general temporal model such as ARMA.
21
Learning and Prediction
Learning
For each visual cluster (VC) i,
1. Figure out covariates for intensity function
$$\log \lambda^i(t_k \mid \theta^i) = \sum_{j=1}^{J} \theta^i_j f^i_j(x_1, \ldots, x_L)$$
2. Observe the actual occurrence of VC i
3. Compute MLE solution $\theta^{i*}$ by using cyclic coordinate descent.
O(MJT), only once offline
30 min (with soccer topic of 810K images)
Prediction
Given a topic keyword and tq,
1. Gather covariates info for tq.
2. Compute intensity function for each VC i,
$$\log \lambda^i(t_q \mid \theta^{i*}) = \sum_{j=1}^{J} \theta^{i*}_j f^i_j(x_1, \ldots, x_L)$$
3. Sample L images according to $\lambda^i(t_q \mid \theta^{i*})$, $i = \{1, \ldots, M\}$
O(MJ), for each tq
<< 1 sec
M: No. of VCs
J: No. of covariates
T: No. of time steps
M = 200, J = 118, T = 1,500
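The prediction steps can be sketched as below: compute one intensity per visual cluster at tq, then draw L cluster indices with probability proportional to those intensities. Picking a cluster first and then an image from it is an assumed reading of "sample L images"; the learned weights and covariates are hypothetical.

```python
import math
import random

def predict_clusters(thetas, covariates, L, seed=0):
    """thetas[i], covariates[i]: learned weights and covariates of VC i at tq.
    Returns the intensities and L sampled cluster indices."""
    rates = [math.exp(sum(t * f for t, f in zip(theta, x)))
             for theta, x in zip(thetas, covariates)]
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    picks = rng.choices(range(len(rates)), weights=rates, k=L)
    return rates, picks

# Two hypothetical clusters with a single bias covariate each.
rates, picks = predict_clusters([[5.0], [-5.0]], [[1.0], [1.0]], L=10)
```

The cost per query is one dot product per cluster, which matches the O(MJ) figure on the slide.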
23
Personalization
Idea: locally weighted learning [Atkeson et al. 97]
Collective image prediction
• Each image is equally weighted.
Personalized image prediction
• For a user u6, each image is weighted according to the similarity of its owner to u6.
• Learning is more biased toward users similar to u6.
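One way to picture the weighting, under the assumption that user similarity enters by replacing each unit occurrence with a weight (the slides do not spell out how the weights enter the likelihood). The owner IDs and similarity values are hypothetical.

```python
def weighted_counts(events, similarity, num_steps):
    """events: (time_step, owner_id) pairs of one visual cluster.
    similarity: owner_id -> weight relative to the query user.
    Returns per-step weighted occurrence counts."""
    counts = [0.0] * num_steps
    for step, owner in events:
        counts[step] += similarity.get(owner, 0.0)
    return counts

events = [(0, "a"), (0, "b"), (2, "a")]
# Collective prediction: every image counts equally.
collective = weighted_counts(events, {"a": 1.0, "b": 1.0}, 3)
# Personalized prediction: owner "b" is less similar to the query user.
personalized = weighted_counts(events, {"a": 1.0, "b": 0.25}, 3)
```

The weighted counts would then stand in for the raw occurrence counts when fitting the intensity function.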
25
Flickr Dataset
10,284,945 images of 40 topic keywords
7 groups: Nations, Places, Animals, Objects, Activities, Abstract, Hot topics
Ex. Soccer dataset
• Seasonal variation
• Zipf’s law
27
Experimental Tasks
Split the dataset into training/test sets
• Timeline: training data + image DB up to 12/31/2008; tq is randomly picked after that (timeline runs to 2010)
• Positive test images: actual images within ±1 days of tq
• Collective image prediction: randomly chosen 20 tq per topic
• Personalized image prediction: randomly chosen 20 (tq, uq) pairs
• Output: L predicted images
28
Evaluation Measures
Actual images and predicted images each number in the hundreds or more.
How can we compare them?
(1) Two distance metrics: lower is better.
• L2 distance on Tiny images [Torralba et al. 2008] (images resized to 32x32)
• SIFT/HOG features
(2) Average precision: higher is better.
• Rank positive/negative test images using predicted images
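For reference, average precision over a ranked list of test images can be computed as in this sketch. How the ranking itself is produced from the predicted images is not shown here; the label list is a made-up example.

```python
def average_precision(ranked_labels):
    """ranked_labels: booleans in ranked order, True for a positive image.
    Returns the mean of precision@rank taken at each positive's rank."""
    hits = 0
    total = 0.0
    for rank, is_positive in enumerate(ranked_labels, start=1):
        if is_positive:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

# Positives at ranks 1 and 3: (1/1 + 2/3) / 2
ap = average_precision([True, False, True, False])
perfect = average_precision([True, True, False, False])
```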
29
Quantitative Results
Baselines
• Sampling from ImageNet: semantic meaning only
• PageRank based IR: state-of-the-art retrieval
• Author-Time topic model: generative topic model
[Figure panels: Collective Image prediction; Personalized Image prediction]
7~8% higher than the best baseline.
30
Examples of Collective Image Prediction
World+cup
(a) Jan.: Ski+skating
(b) May: Bicycle+kayak+soccer
(c) Sep.: Soccer world cup
Cardinals
(a) Jan.: Football / Snow
(b) May: Baseball / Leafy, Eggs
(c) Sep.: Baseball / Leafy
31
Examples of Personalized Image Prediction
Fine+art
(a) User1: Painting
(b) User2: Class
(c) User3: Photography
Brazilian
(a) User1: Flower
(b) User2: Dance
(c) User3: Auto-racing
32
Conclusion
What’s done
• Web image prediction
  (1) User behavior prediction
  (2) Time-sensitive image reranking
• Poisson regression on multivariate point process
• Example code will be available!
Observations
• Many topics are associated with predictable periodic events.
• Image-based personalization is important.
  - Images carry more delicate information about user preference than text.
  - Ex. What styles of painting does user A like?
34