Transcript slides
David Andrzejewski, Univ. of Wisconsin-Madison (USA)
David G. Stork, Ricoh Innovations, Inc. and Stanford Univ. (USA)
Xiaojin Zhu, Univ. of Wisconsin-Madison (USA)
Ron Spronk, Queen's Univ. (Canada)
1
Visual arts
Digital authentication of Bruegel, Perugino
(Lyu et al., 2004)
Jackson Pollock
(Taylor, 1999)
(Irfan and Stork, 2009)
Writings
Authorship of the Federalist Papers
(Mosteller and Wallace, 1964)
Ronald Reagan’s radio addresses
(Airoldi et al., 2007)
2
http://www.artchive.com
Haags Gemeentemuseum, The Hague
3
4
Better understand compositional style
1. Develop a formal representation of the paintings
2. Extract these representations from paintings
3. Train a generative model
4. Learn relative visual weights of colors
5. Classify true Mondrians versus
1. “fakes” created by the generative model in step 3
2. “earlier states” of the Transatlantic paintings
5
• Vertical/horizontal lines
  • locations
  • extents
• Rectangles
  • locations
  • sizes
  • colors
  • can span multiple lines
6
7
8
Hypothesize an underlying probabilistic model
that generates observed data
Many uses in machine learning
Make predictions (Naïve Bayes)
Generate new examples (Markov model)
Interpret parameter values (Linear regression)
Given data, learn/train model parameters
Our approach: Maximum likelihood estimation (MLE)
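As a minimal sketch of what maximum likelihood estimation means for two of the simpler distributions named on these slides: the MLE of a Poisson rate is the sample mean, and the MLE of multinomial probabilities is the relative frequency of each outcome. The function names and the observations below are hypothetical and are not data from the paintings.

```python
from collections import Counter


def poisson_mle(counts):
    """MLE of a Poisson rate: the sample mean of the observed counts."""
    return sum(counts) / len(counts)


def multinomial_mle(labels):
    """MLE of multinomial probabilities: the relative frequency of each label."""
    total = len(labels)
    return {label: n / total for label, n in Counter(labels).items()}


# Hypothetical observations (assumed for illustration only).
line_counts = [3, 5, 4, 6, 2]
colors = ["white", "white", "red", "white", "blue", "white", "yellow", "white"]

rate = poisson_mle(line_counts)
probs = multinomial_mle(colors)
print(rate)
print(probs)
```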
9
Canvas aspect ratios (kernel density estimator)
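A kernel density estimator can be sketched as an average of Gaussian bumps, one centered on each observed aspect ratio. The sample ratios, the bandwidth, and the choice of a Gaussian kernel are assumptions made for illustration; the slides do not say which kernel or bandwidth was used.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a density function built from Gaussian bumps centered on the samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical width/height ratios and an assumed bandwidth.
aspect_ratios = [1.0, 1.0, 0.95, 1.05, 0.7, 1.4]
density = gaussian_kde(aspect_ratios, bandwidth=0.1)
print(density(1.0), density(2.0))
```

A new aspect ratio could then be drawn by picking one observed ratio at random and adding Gaussian noise with the same bandwidth.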
10
Number of horiz/vert lines (Poisson)
Horiz/vert line spacing (Dirichlet)
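The two distributions above could be combined as follows: draw the number of lines from a Poisson, then draw the gaps between lines from a Dirichlet so that they sum to the canvas extent. This is a sketch under stated assumptions, not the authors' implementation: the Dirichlet is taken to be symmetric, the canvas is normalized to the unit interval, and the rate and concentration values are placeholders.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw a Poisson count with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_line_positions(n_lines, concentration, rng):
    """Place n_lines on the unit interval with gaps drawn from a symmetric Dirichlet."""
    # A Dirichlet draw is a vector of independent Gamma draws normalized to sum to 1.
    # n_lines lines split the interval into n_lines + 1 gaps.
    gammas = [rng.gammavariate(concentration, 1.0) for _ in range(n_lines + 1)]
    total = sum(gammas)
    positions, cursor = [], 0.0
    for gap in gammas[:-1]:
        cursor += gap / total
        positions.append(cursor)
    return positions


rng = random.Random(0)
n_vertical = sample_poisson(3.0, rng)  # placeholder rate
print(n_vertical, sample_line_positions(n_vertical, 2.0, rng))  # placeholder concentration
```

A concentration above 1 favors evenly spread lines, while a value below 1 favors a few large gaps and several tightly clustered lines.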
11
Segments are deleted / invisible / left alone (Polya)
12
Rectangle colors (Multinomial)
13
Don’t allow unrealistic “hanging” lines
Require ≥ 1 vertical line
14
Rectangle color   Multinomial probability
White             0.754
Red               0.085
Yellow            0.062
Blue              0.065
Black             0.034
Line type    Spacing Dirichlet
Vertical     1.80
Horizontal   1.61
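With the learned multinomial probabilities for rectangle color, coloring a generated canvas reduces to weighted random choice. The sketch below uses the probabilities reported on this slide; drawing each rectangle's color independently of the others is an assumption, and the function name is hypothetical.

```python
import random

# Multinomial probabilities as reported on the slide.
COLOR_PROBS = {
    "white": 0.754,
    "red": 0.085,
    "yellow": 0.062,
    "blue": 0.065,
    "black": 0.034,
}


def sample_colors(n_rectangles, rng):
    """Draw one color per rectangle, independently (an assumption)."""
    names = list(COLOR_PROBS)
    weights = list(COLOR_PROBS.values())
    return rng.choices(names, weights=weights, k=n_rectangles)


rng = random.Random(0)
print(sample_colors(10, rng))
```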
15
Calculate visual “center of mass”
Assume true Mondrians centered at [0.5,0.5]
Learn color weights via linear programming
Color    Weight
Red      0.237
Yellow   0.143
Blue     0.227
Black    0.392
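Given color weights, a visual center of mass can be sketched as a weighted centroid of the colored rectangles. The code below uses the four weights listed on this slide and makes several assumptions that the slide does not state: each rectangle's mass is its color weight times its area, white carries zero weight, the black lines themselves are ignored, and coordinates are normalized to the unit square. The linear programming step that learns the weights is not shown.

```python
# Color weights as listed on the slide; the zero weight for white is an assumption.
COLOR_WEIGHTS = {
    "red": 0.237,
    "yellow": 0.143,
    "blue": 0.227,
    "black": 0.392,
    "white": 0.0,
}


def center_of_mass(rectangles):
    """Weighted centroid of (x, y, width, height, color) rectangles.

    (x, y) is one corner of the rectangle in unit-canvas coordinates.
    """
    total = cx = cy = 0.0
    for x, y, w, h, color in rectangles:
        mass = COLOR_WEIGHTS[color] * w * h  # assumed: mass = weight * area
        total += mass
        cx += mass * (x + w / 2.0)
        cy += mass * (y + h / 2.0)
    if total == 0.0:
        return (0.5, 0.5)  # assumed convention for a canvas with no colored mass
    return (cx / total, cy / total)


# Hypothetical layout: two equal red squares in opposite corners.
balanced = [(0.0, 0.0, 0.2, 0.2, "red"), (0.8, 0.8, 0.2, 0.2, "red")]
print(center_of_mass(balanced))
```

Under these assumptions, a composition is balanced when the returned point lies close to [0.5, 0.5].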
16
Completed in Europe, but then altered after
Mondrian’s arrival in the United States
A variety of techniques (x-ray, UV, etc.) were
used to recover the earlier states
(Cooper & Spronk, 2001)
17
Composition with Red, Blue, and Yellow (1937-1942)
18
Composition with Red, Yellow, and Blue (1935-1942)
19
No. 9 (1939-1942)
20
Very popular technique in machine learning
At each iteration, choose a rule to “split” on
Resulting partitions should be more “pure” with
respect to target classification
(true Mondrian or computer-generated fake?)
Key feature: resulting trees easy to interpret
Estimate accuracy with leave-one-out cross-validation
Control over-fitting with pruning
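The splitting and evaluation steps above can be sketched with a one-level tree (a "stump"). The code picks the feature and threshold whose two partitions have the lowest weighted Gini impurity, then estimates accuracy by leave-one-out cross-validation. Gini impurity is one common purity measure and is an assumption here, as are the feature values; a full decision tree would recurse on each partition and add pruning, neither of which is shown.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels; 0 means the partition is pure."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows, labels):
    """Return (feature, threshold) with the lowest weighted impurity, or None."""
    best, best_score = None, float("inf")
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def majority(labels):
    """Most common 0/1 label; ties go to 1."""
    return int(2 * sum(labels) >= len(labels))


def fit_stump(rows, labels):
    """Train a one-split classifier and return it as a predict(row) function."""
    split = best_split(rows, labels)
    if split is None:  # no feature varies, so predict the majority class
        overall = majority(labels)
        return lambda row: overall
    f, t = split
    left = majority([y for row, y in zip(rows, labels) if row[f] <= t])
    right = majority([y for row, y in zip(rows, labels) if row[f] > t])
    return lambda row: left if row[f] <= t else right


def leave_one_out_accuracy(rows, labels):
    """Train on all examples but one, test on the held-out one, and average."""
    correct = 0
    for i in range(len(rows)):
        predict = fit_stump(rows[:i] + rows[i + 1:], labels[:i] + labels[i + 1:])
        correct += predict(rows[i]) == labels[i]
    return correct / len(rows)


# Hypothetical features: [fraction of blue pixels, # horiz / # vert]; label 1 = true Mondrian.
rows = [[0.10, 1.0], [0.12, 1.2], [0.08, 0.9], [0.11, 1.1],
        [0.01, 1.0], [0.02, 1.3], [0.00, 0.8], [0.03, 1.1]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(best_split(rows, labels), leave_one_out_accuracy(rows, labels))
```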
21
45 true Mondrians versus 45 generated “fakes”
Classifier                     Accuracy
Majority baseline              50%
Decision tree (no pruning)     70%
Decision tree (with pruning)   68%
45 true Mondrians versus 11 “earlier states”
Classifier                     Accuracy
Majority baseline              81%
Decision tree (no pruning)     72%
Decision tree (with pruning)   75%
22
Analysis of results
Transatlantic dataset
IF < 1% pixels blue
AND # horiz / # vert < 0.9
AND low visual “density”
THEN Transatlantic
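One path through a decision tree reads as a conjunction of tests, so the rule on this slide can be written as a single predicate. This is a sketch under stated assumptions: the slide gives no numeric cutoff for "low" visual density, so `density_cutoff` is a hypothetical parameter, and the function and argument names are invented for illustration.

```python
def looks_like_earlier_state(blue_fraction, n_horizontal, n_vertical,
                             density, density_cutoff):
    """Apply the three tests from the slide; True means 'Transatlantic earlier state'."""
    return (
        blue_fraction < 0.01                    # fewer than 1% of pixels are blue
        # Same test as n_horizontal / n_vertical < 0.9, written without division
        # so that a canvas with no vertical lines does not raise an error.
        and n_horizontal < 0.9 * n_vertical
        and density < density_cutoff            # "low" density; cutoff is assumed
    )


print(looks_like_earlier_state(0.0, 2, 3, density=0.1, density_cutoff=0.2))
```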
23
Formal representation and feature extraction
Generative model
Fitting simple statistics of Mondrians cannot create
realistic synthetic paintings
Color weights align well with our intuitions
Classification
Can reliably discriminate true Mondrians vs. computer-generated “fakes”
Cannot do so for true Mondrians vs. Transatlantic “earlier states”
▪ Underlying images were “nearly complete” (!)
24