Grammar of Image
Zhaoyin Jia, 03-30-2009
Problems
Enormous amount of vision knowledge:
……
Computational complexity
Classification, Recognition
Semantic gap
[Figure: example images]
Task of image parsing
Objectives in this paper
Framework for vision
Algorithm for this framework
Top-down/bottom-up computation
Generalization from small samples
And-Or Graph
Use Monte Carlo simulation to
synthesize more configurations
Fill the semantic gap
Grammar
Language: co-occurrence of s is more than chance
$\frac{p(s \mid A, B)}{p(s \mid A)\, p(s \mid B)} > 1$
Image: Parallel; T-junction
CONSTANTINOPLE
Formulation of grammar
Start symbol: S
Non-terminal nodes: $V_N$
Production rules: R
Terminal nodes: $V_T$
[Figure: example parse tree with nodes S, NP, VP, V, PP, ……]
Image grammar
Start symbol: S
Production rules: R
Non-terminal nodes: $V_N$
Terminal nodes: $V_T$
Overlapping parts/Ambiguity
Similar color, occlusion, etc.
Stochastic Context Free Grammar
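For each $A \in V_N$, we have production rules: $A \rightarrow \beta_1 \mid \beta_2 \mid \cdots \mid \beta_{n(A)}$
with a probability associated with each one: $p(A \rightarrow \beta_j)$, where $\sum_{j=1}^{n(A)} p(A \rightarrow \beta_j) = 1$
Probability of a parse tree: $p(pt) = \prod_{(A \rightarrow \beta) \in pt} p(A \rightarrow \beta)$
Probability of a sentence: $p(\omega) = \sum_{pt \Rightarrow \omega} p(pt)$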
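The two SCFG probabilities above can be made concrete with a toy grammar. The sketch below is an illustration only. It assumes a made-up grammar in Chomsky normal form (the rules, words, and probabilities are hypothetical). It computes $p(pt)$ as a product of rule probabilities. It computes $p(\omega)$ with the standard inside (CKY-style) algorithm, which sums over all parse trees without listing them.

```python
from collections import defaultdict

# Hypothetical toy SCFG in Chomsky normal form. For every non-terminal,
# the probabilities of its rules sum to 1.
BINARY = {  # (A, (B, C)) -> p(A -> B C)
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("VP", "PP")): 0.3,
    ("NP", ("NP", "PP")): 0.2,
    ("PP", ("P", "NP")): 1.0,
}
LEXICAL = {  # (A, word) -> p(A -> word)
    ("NP", "she"): 0.4,
    ("NP", "stars"): 0.2,
    ("NP", "telescopes"): 0.2,
    ("V", "sees"): 1.0,
    ("P", "with"): 1.0,
}


def tree_prob(tree):
    """p(pt): product of the probabilities of every rule used in the tree.

    A tree is (label, word) for a lexical rule or (label, left, right)."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return LEXICAL[(label, children[0])]
    left, right = children
    return BINARY[(label, (left[0], right[0]))] * tree_prob(left) * tree_prob(right)


def sentence_prob(words, start="S"):
    """p(sentence): sum of p(pt) over all parse trees, via the inside algorithm."""
    n = len(words)
    inside = defaultdict(float)  # (i, j, A) -> p(A derives words[i:j])
    for i, w in enumerate(words):
        for (a, word), p in LEXICAL.items():
            if word == w:
                inside[(i, i + 1, a)] += p
    for span in range(2, n + 1):          # shorter spans are filled first
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):     # split point
                for (a, (b, c)), p in BINARY.items():
                    inside[(i, j, a)] += p * inside[(i, k, b)] * inside[(k, j, c)]
    return inside[(0, n, start)]
```

For "she sees stars with telescopes" the PP can attach to the VP or to the NP. The sentence probability is therefore the sum of two tree probabilities (0.00336 + 0.00224). This is the ambiguity that the image grammar also has to resolve.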
Stochastic Grammar with Context
From left to right: bigram model (Markov chain)
For a sentence with n words: $p(w_1, \ldots, w_n) = p(w_1) \prod_{i=2}^{n} p(w_i \mid w_{i-1})$
Non-local relations: tree model
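As a small illustration of the left-to-right (Markov chain) model, the sketch below estimates bigram probabilities by counting. It then scores a sentence with the product formula above. The corpus and the helper names are hypothetical. No smoothing is applied, so an unseen word pair raises a `KeyError`.

```python
import math
from collections import Counter, defaultdict


def estimate_bigram(corpus):
    """Maximum-likelihood p(w_1) and p(w_i | w_{i-1}) from tokenized sentences."""
    starts = Counter(sentence[0] for sentence in corpus)
    pairs = defaultdict(Counter)
    for sentence in corpus:
        for prev, cur in zip(sentence, sentence[1:]):
            pairs[prev][cur] += 1
    start_prob = {w: c / len(corpus) for w, c in starts.items()}
    trans_prob = {
        prev: {cur: c / sum(nexts.values()) for cur, c in nexts.items()}
        for prev, nexts in pairs.items()
    }
    return start_prob, trans_prob


def bigram_log_prob(words, start_prob, trans_prob):
    """log p(w_1, ..., w_n) = log p(w_1) + sum_{i>=2} log p(w_i | w_{i-1})."""
    logp = math.log(start_prob[words[0]])
    for prev, cur in zip(words, words[1:]):
        logp += math.log(trans_prob[prev][cur])
    return logp
```

The chain only relates neighbouring words. That limitation is why the slide turns to a tree model for non-local relations.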
New issues in Image Grammar
Loss of “left to right” order: region adjacency graph
New issues in Image Grammar
Scaling changes which terminal nodes appear in the parse tree
New issues in Image Grammar
Switch between texture and structure
Building the image grammar
Visual Vocabulary:
primitives, sketch graph, textons…
Relations and configurations:
co-occurrence, attached, hinged, supported,
occluded…
And-Or graph representation:
embedding the image grammar
Learning/testing the parse graph:
infer the possible parse graphs
Database
Lotus Hill Institute Dataset
Benjamin Yao, Xiong Yang, and Song-Chun Zhu, “Introduction to a large scale general purpose ground truth
dataset: methodology, annotation tool, and benchmarks.” EMMCVPR, 2007
http://www.imageparsing.com/
636,748 images, 3,927,130 Physical Objects
A few hundred are free
Free Data
http://yoshi.cs.ucla.edu/yao/data/
6 categories, 145 subsets:
Manmade Object: 75
Transportation: 9
Nature Object: 40
Objects in Scene: 6
UCLA Aerial Image: 5
UIUC Sport Activity: 10
Outline & segmentation of the object
Segmentation of a scene (street)
Physical parts of the object
OBJECT1:truck
PART1:truck:body
PART2:truck:windshield
PART3:truck:headlight
PART4:truck:headlight
PART5:truck:headlight
PART6:truck:headlight
PART7:truck:rearview mirror
PART8:truck:rearview mirror
PART9:truck:rear light
PART10:truck:window
PART11:truck:frontal left wheel
PART12:truck:frontal right wheel
PART13:truck:back wheel
PART14:truck:back wheel
PART15:truck:carriage
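An annotation like the truck listing above already encodes a two-level parse tree (object → parts). The sketch below groups such lines into that hierarchy. It assumes that the line format is exactly `OBJECT<k>:<object>` and `PART<k>:<object>:<part>`, as shown on the slide. The dataset's real file format is not given here, so the parser is hypothetical.

```python
import re
from collections import Counter

# Assumed line format, inferred from the listing above:
#   OBJECT<k>:<object>   and   PART<k>:<object>:<part>
LINE = re.compile(r"^(OBJECT|PART)(\d+):(.+)$")


def parse_annotation(lines):
    """Group part labels under their object: {object: Counter(part -> count)}."""
    objects = {}
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue  # skip blank or unrecognized lines
        kind, _index, rest = match.groups()
        if kind == "OBJECT":
            objects.setdefault(rest, Counter())
        else:
            obj, part = rest.split(":", 1)
            objects.setdefault(obj, Counter())[part] += 1
    return objects
```

On the truck listing, this yields one `truck` node with 15 part instances of 10 part types, for example four headlights and two back wheels.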
Visual Vocabulary
The “Lego Land”
Language
Visual Vocabulary
$\Delta = \{(\phi_i(x, y; \alpha_i), \beta_i)\}$
$\phi_i$: function of image primitives
$\alpha_i$: attributes: a) geometric transformation
b) appearance
$\beta_i$: bonds between primitives
Visual Vocabulary
Sketch and Texture
$\Lambda = \Lambda_{sk} \cup \Lambda_{nsk}$
$I = (I_{\Lambda_{sk}}, I_{\Lambda_{nsk}})$
S. C. Zhu, Y. N. Wu, and D. B. Mumford, “Minimax entropy principle and its applications to texture modeling,”
Neural Computation, vol. 9, no. 8, pp. 1627–1660, November 1997
Primal sketch model
[Figure: input image, sketch graph, texture pixels]
C. E. Guo, S. C. Zhu, and Y. N. Wu, “Primal sketch: Integrating texture and structure,” in Proceedings of
International Conference on Computer Vision, 2003.
High-level visual vocabulary
Cloth: collar, left/right sleeves, hands
H. Chen, Z. J. Xu, Z. Q. Liu, and S. C. Zhu, “Composite templates for
cloth modeling and sketching,” in Proceedings of IEEE Conference on
Computer Vision and Pattern Recognition, New York, June 2006
Relations and configurations
Definition of relation:
bonds: $\{(s, t)\} \subset S \times S$
relations: $E = \{(s, t; \gamma_{st}, \rho_{st}) : s, t \in S\}$,
$\gamma$: structure, $\rho$: compatibility
Three types of relations
Bonds and connections
Joints and junctions
Object interactions/semantics
Definition of configurations:
$C = \langle V, E \rangle$
$V = \{A_i : A_i = (\phi_i(x, y; \alpha_i), \beta_i)\}$
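The definitions above can be held in a small data structure: primitives with attributes and bonds, relations between bonds, and a configuration $C = \langle V, E \rangle$. The sketch below is an illustrative sketch only. The class and field names are hypothetical and are not taken from the cited work. Bonds are plain string identifiers. The "open bonds" are the bonds that no relation uses yet, which is where a configuration can attach to larger structures.

```python
from dataclasses import dataclass, field


@dataclass
class Primitive:
    """A_i = (phi_i(x, y; alpha_i), beta_i): a named primitive, its attributes, its bonds."""
    name: str
    alpha: dict   # attributes, e.g. position, orientation, scale, appearance
    bonds: tuple  # beta_i: identifiers of the bonds (open ports) of this primitive


@dataclass(frozen=True)
class Relation:
    """(s, t; gamma, rho): a relation between bonds s and t."""
    s: str
    t: str
    gamma: str    # structure / type of the relation, e.g. "attached", "hinged"
    rho: float    # compatibility of the two bonds


@dataclass
class Configuration:
    """C = <V, E>: a set of primitives V and a set of relations E."""
    V: dict = field(default_factory=dict)
    E: list = field(default_factory=list)

    def add_primitive(self, prim):
        self.V[prim.name] = prim

    def all_bonds(self):
        return {b for prim in self.V.values() for b in prim.bonds}

    def add_relation(self, s, t, gamma, rho):
        # A relation may only link bonds that belong to primitives in V.
        known = self.all_bonds()
        if s not in known or t not in known:
            raise KeyError(f"unknown bond in relation ({s}, {t})")
        self.E.append(Relation(s, t, gamma, rho))

    def open_bonds(self):
        """Bonds not yet used by any relation: where larger structures can attach."""
        used = {r.s for r in self.E} | {r.t for r in self.E}
        return self.all_bonds() - used
```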
Relations
Bonds and connections
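connect primitives into bigger graphs
$E_{bond}(S) = \{(\beta_{ij}, \beta_{kl}; \gamma, \rho)\}$
$S = \{\beta_{ij} : i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, n(i)\}$ (bond $j$ of primitive $i$)
each bond: $(x, y, \theta)$
$\rho$: intensity/color compatibility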
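To show how a compatibility $\rho$ between two bonds might be scored, here is a minimal sketch. It makes three assumptions: each bond is $(x, y, \theta)$, two connecting bonds should meet at the same point while facing each other, and the primitives on either side should have similar mean intensity. The Gaussian-style penalty and the `sigma_*` parameters are hypothetical choices, not the formulation of the cited papers.

```python
import math


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in [0, pi]."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def bond_compatibility(b1, b2, color1, color2,
                       sigma_pos=1.0, sigma_ang=0.3, sigma_col=10.0):
    """rho in (0, 1] for two bonds b = (x, y, theta) with mean intensities color1, color2."""
    (x1, y1, t1), (x2, y2, t2) = b1, b2
    dist2 = (x1 - x2) ** 2 + (y1 - y2) ** 2
    # Two bonds that connect point at each other, so theta_1 should equal theta_2 + pi.
    dang = angle_diff(t1, t2 + math.pi)
    dcol = color1 - color2
    energy = (dist2 / (2 * sigma_pos ** 2)
              + dang ** 2 / (2 * sigma_ang ** 2)
              + dcol ** 2 / (2 * sigma_col ** 2))
    return math.exp(-energy)
```

A perfectly matched pair (same point, opposite directions, equal intensity) scores 1. Moving the bonds apart, turning them, or changing the intensity difference all lower $\rho$.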
Relations
Joints and junctions
Relations
Object interactions
Configuration
Spatial layout of entities at a certain level
$C = \langle V, E \rangle$
$V = \{A_i : A_i = (\phi_i(x, y; \alpha_i), \beta_i)\}$
Primal sketch – parts – object – scene
Reconfigurable graphs
Treat bonds as random variables (address nodes)
Inference of the configuration:
Compute the primal sketch of the image
Detect the T-junctions
Use simulated annealing to infer connections following Gestalt laws
Red dot: connected region
Black line: known edge
Green line: inferred connection
R. X. Gao and S. C. Zhu, “From primal sketch to 2.1D sketch,” Technical Report, Lotus Hill Institute, 2006
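As a toy version of "simulated annealing to infer connections", the sketch below pairs up broken contour endpoints. It uses a Gestalt-style cost: proximity (gap length) plus good continuation (the two endpoints should face each other). Each endpoint is assumed to be $(x, y, \theta)$, with $\theta$ pointing into the gap. The cost, the move set, and the cooling schedule are all hypothetical. This is plain Metropolis annealing on pairings, not the layered Mixed MRF model of the cited report.

```python
import math
import random


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in [0, pi]."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def pair_cost(p, q, w_turn=2.0):
    """Proximity (gap length) plus good continuation (turning angle)."""
    (x1, y1, t1), (x2, y2, t2) = p, q
    return math.hypot(x2 - x1, y2 - y1) + w_turn * angle_diff(t1, t2 + math.pi)


def energy(pairs, pts):
    return sum(pair_cost(pts[i], pts[j]) for i, j in pairs)


def anneal_pairing(pts, iters=2000, t0=5.0, cooling=0.995, seed=0):
    """Pair an even number (>= 4) of endpoints; returns (best pairing, its energy)."""
    rng = random.Random(seed)
    idx = list(range(len(pts)))
    rng.shuffle(idx)
    pairs = [(idx[k], idx[k + 1]) for k in range(0, len(idx), 2)]
    e = energy(pairs, pts)
    best, best_e, temp = list(pairs), e, t0
    for _ in range(iters):
        a, b = rng.sample(range(len(pairs)), 2)
        (i, j), (k, l) = pairs[a], pairs[b]
        cand = list(pairs)
        # Proposal: swap one partner between the two chosen pairs.
        if rng.random() < 0.5:
            cand[a], cand[b] = (i, k), (j, l)
        else:
            cand[a], cand[b] = (i, l), (j, k)
        ce = energy(cand, pts)
        # Metropolis rule: always accept downhill, sometimes accept uphill.
        if ce <= e or rng.random() < math.exp((e - ce) / temp):
            pairs, e = cand, ce
            if e < best_e:
                best, best_e = list(pairs), e
        temp *= cooling
    return sorted(tuple(sorted(p)) for p in best), best_e
```

With two horizontal contours, each broken by a small gap, the lowest-energy pairing reconnects each gap. It does not jump between the contours.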
Reconfigurable graphs
[Figure: source image, T-junctions, inferred connections, layer extraction]
Ru-Xin Gao, Tian-Fu Wu, Song-Chun Zhu, and Nong Sang, “Bayesian Inference for Layer Representation with
Mixed Markov Random Field”
And-Or Graph
Parse graph of the image: $pg = (pt, E)$
pt: parse tree of vocabulary
E: relations
Infer the parse graph: $pg^* = \arg\max_{pg} p(pg \mid I)$
Z. J. Xu, L. Lin, T. F. Wu, and S. C. Zhu, “Recursive top-down/bottom up algorithm for object recognition,” Technical
Report, Lotus Hill Research Institute, 2007.
And-Or Graph
Contains all the valid parse
graphs
And-node, Or-node, leaf node
Relations between the children
of an And-node
Parse tree: assign a label (choose one child)
at each Or-node
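To illustrate "parse tree: assign a label at each Or-node", the sketch below enumerates every parse tree of a tiny And-Or graph. At an And-node every child is kept. At an Or-node exactly one child is chosen. The graph (a cloth node with collar and sleeve alternatives) is a hypothetical example loosely based on the cloth vocabulary mentioned earlier. It shows how a few alternatives at the Or-nodes multiply into many configurations.

```python
from itertools import product

# Hypothetical toy And-Or graph: ("and", children), ("or", children) or ("leaf",).
AOG = {
    "clothes": ("and", ["collar", "sleeves"]),
    "collar": ("or", ["collar_v", "collar_round"]),
    "sleeves": ("or", ["short_sleeves", "long_sleeves"]),
    "collar_v": ("leaf",),
    "collar_round": ("leaf",),
    "short_sleeves": ("leaf",),
    "long_sleeves": ("leaf",),
}


def parse_trees(node, g=AOG):
    """Yield every parse tree below `node`.

    And-nodes keep all children; Or-nodes pick exactly one child (the switch)."""
    kind = g[node][0]
    if kind == "leaf":
        yield node
    elif kind == "or":
        for child in g[node][1]:
            for sub in parse_trees(child, g):
                yield (node, sub)
    else:  # "and": combine one parse tree from each child
        child_trees = [list(parse_trees(c, g)) for c in g[node][1]]
        for combo in product(*child_trees):
            yield (node, *combo)
```

Two Or-nodes with two branches each give 2 × 2 = 4 parse trees. In general, the count grows as the product of the branch counts along the And-nodes.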
And-Or Graph
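Definition:
$G_{and\text{-}or} = \langle S, V_N, V_T, R, \Sigma, P \rangle$
$V_N = V^{and} \cup V^{or}$
$V^{or} \rightarrow V_1^{and} \mid V_2^{and} \mid \cdots$
$V^{and} \rightarrow V_T \mid (V_1^{or}, V_2^{or}, \ldots)$
$V_T = \{(\phi(x, y; \alpha), \beta)\}$: image primitives
$R = \bigcup_m E_m = \{(v_s, v_t; \gamma_{st}, \rho_{st})\}$: relations at all levels
$P$: probability model defined on the And-Or graph
$\Sigma$: valid configurations of terminal nodes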
Stochastic Model on And-Or graph
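Terminal (leaf) nodes: $T(pg)$
And-nodes and Or-nodes: $V^{and}(pg)$, $V^{or}(pg)$
Set of links: $E(pg)$
Switch variable at Or-node $v$: $\omega(v)$
Attributes of primitive $t$: $\alpha(t)$
$p(pg; \Theta, R, \Delta) = \frac{1}{Z(\Theta)} \exp(-\mathcal{E}(pg))$
$\mathcal{E}(pg) = \sum_{v \in V^{or}(pg)} \lambda_v(\omega(v)) + \sum_{t \in V^{and}(pg) \cup T(pg)} \lambda_t(\alpha(t)) + \sum_{(i,j) \in E(pg)} \lambda_{ij}(v_i, v_j, \gamma_{ij}, \rho_{ij})$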
First term (as in an SCFG): weighs the frequency of the children at Or-nodes
Second term: weighs the local compatibility of primitives (geometry and appearance)
Third term: spatial and appearance relations between primitives (parts or objects)
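The three energy terms can be evaluated on a toy example. The sketch below scores every switch assignment of a two-Or-node cloth graph, normalizes $\exp(-\mathcal{E})$, and returns the most probable parse ($pg^*$). All potentials are made up for illustration: the branch frequencies, the preferred "width" attribute, and the single relation penalty. Here $Z$ sums only over the four switch assignments, with the observed widths held fixed. The full model also integrates over attributes, so this is a conditional probability.

```python
import math
from itertools import product

# Hypothetical potentials for a toy cloth And-Or graph with two Or-nodes.
# lambda_v(w(v)): -log frequency of each branch at an Or-node (the SCFG term).
LAMBDA_OR = {
    "collar": {"v_neck": -math.log(0.7), "round": -math.log(0.3)},
    "sleeves": {"short": -math.log(0.4), "long": -math.log(0.6)},
}
# Made-up preferred value of one attribute (width) for each terminal.
PREFERRED_WIDTH = {"v_neck": 4.0, "round": 6.0, "short": 3.0, "long": 9.0}


def lambda_terminal(part, width, sigma=1.0):
    """lambda_t(alpha(t)): quadratic penalty on an attribute of a terminal."""
    return (width - PREFERRED_WIDTH[part]) ** 2 / (2 * sigma ** 2)


def lambda_pair(collar, sleeves):
    """lambda_ij: made-up relation penalty between the two chosen parts."""
    return 1.5 if (collar, sleeves) == ("round", "short") else 0.0


def energy(choices, widths):
    """E(pg): Or-node term + terminal term + relation term."""
    e = sum(LAMBDA_OR[node][branch] for node, branch in choices.items())
    e += sum(lambda_terminal(branch, widths[node]) for node, branch in choices.items())
    e += lambda_pair(choices["collar"], choices["sleeves"])
    return e


def map_parse(widths):
    """Enumerate every switch assignment, normalize exp(-E), and return the argmax."""
    nodes = list(LAMBDA_OR)
    candidates = [dict(zip(nodes, combo))
                  for combo in product(*(LAMBDA_OR[n] for n in nodes))]
    weights = [math.exp(-energy(c, widths)) for c in candidates]
    z = sum(weights)
    probs = [w / z for w in weights]
    best = max(range(len(candidates)), key=probs.__getitem__)
    return candidates[best], probs
```

With observed widths of 4 (collar) and 9 (sleeves), the v-neck, long-sleeve parse wins. Its branches are the most frequent, its attributes fit exactly, and it pays no relation penalty.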
Learning And-Or Graph
$p(pg; \Theta, R, \Delta) = \frac{1}{Z(\Theta)} \exp(-\mathcal{E}(pg))$
Learning the vocabulary $\Delta$ and the hierarchical And-Or graph
Learning the relation set $R$, given $\Delta$
Discussed in the paper
Learning the parameters $\Theta$, given $R$ and $\Delta$
Learning And-Or Graph
Observation: $f(I, pg)$
Learning model: $p(pg; \Theta, R, \Delta)$
Learning and pursuing the relation set R:
Start from a stochastic context-free grammar (a)
Learn the relations that maximally reduce the KL divergence to the observation (b-e)
J. Porway, Z.Y. Yao, and S. C. Zhu, “Learning an And–Or graph for modeling and recognizing object categories,” Technical
Report, Department of Statistics, 2007
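A rough sketch of one pursuit step follows. For each candidate relation, it compares the histogram of that relation's statistic on the observed parse graphs with its histogram on samples from the current model. It then picks the relation with the largest KL divergence. This discrepancy criterion is an assumed proxy, in the spirit of minimax-entropy feature pursuit, for "maximally reduce the KL divergence". The function names, binning, and inputs are hypothetical, and the statistics are assumed to be precomputed values in $[0, 1]$.

```python
import math


def histogram(values, bins, lo, hi):
    """Normalized histogram of values over [lo, hi] with `bins` equal bins."""
    counts = [0] * bins
    for v in values:
        k = min(bins - 1, max(0, int((v - lo) / (hi - lo) * bins)))
        counts[k] += 1
    return [c / len(values) for c in counts]


def kl(p, q, eps=1e-9):
    """KL(p || q) for two histograms; eps guards against empty model bins."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def pursue_relation(observed_stats, model_stats, bins=8, lo=0.0, hi=1.0):
    """Pick the relation whose statistic differs most between data and model."""
    scores = {}
    for name, values in observed_stats.items():
        f = histogram(values, bins, lo, hi)              # observation
        p = histogram(model_stats[name], bins, lo, hi)   # current model samples
        scores[name] = kl(f, p)
    best = max(scores, key=scores.get)
    return best, scores
```

A relation that the current model already reproduces scores near zero. The selected relation would then be added to R and its parameters refitted before the next step.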
Learning And-Or Graph
Learning the graph parameters $\Theta$:
approximate the observation $f(I, pg)$ with $p(pg; \Theta, R, \Delta)$
Similar to texture synthesis
S. C. Zhu, Y. N. Wu, and D. B. Mumford, “Minimax entropy
principle and its applications to texture modeling,”
Neural Computation, vol. 9, no. 8, pp. 1627–1660,
November 1997
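The parameter-learning step works like texture models built on minimax entropy. $\Theta$ is adjusted until the model's expected statistics match the observed ones. The sketch below is a toy instance. It assumes two discrete attributes $x_1, x_2 \in \{0, \ldots, k-1\}$, one relation statistic (a histogram of $|x_1 - x_2|$), and the model $p(x) \propto \exp(-\lambda_{|x_1 - x_2|})$. Each step of gradient ascent on the log-likelihood adds $E_p[h] - h_{obs}$ to $\lambda$. Expectations are computed here by exact enumeration. Realistic models need MCMC sampling instead.

```python
import math


def fit_relation_potential(h_obs, k, steps=2000, lr=1.0):
    """Fit lambda so that E_p[h] matches h_obs for p(x) ∝ exp(-lambda[|x1 - x2|]).

    h_obs[d] is the observed frequency of |x1 - x2| == d, for d in 0..k-1.
    Returns the fitted lambda and the model histogram from the last iteration."""
    states = [(a, b) for a in range(k) for b in range(k)]
    lam = [0.0] * k
    h_model = [0.0] * k
    for _ in range(steps):
        weights = [math.exp(-lam[abs(a - b)]) for a, b in states]
        z = sum(weights)
        h_model = [0.0] * k
        for (a, b), w in zip(states, weights):
            h_model[abs(a - b)] += w / z
        # Gradient of the log-likelihood: dL/dlambda_d = E_p[h_d] - h_obs[d].
        lam = [l + lr * (hm - ho) for l, hm, ho in zip(lam, h_model, h_obs)]
    return lam, h_model
```

After fitting, the model reproduces the observed relation histogram. Among the distributions that do so, it is the maximum-entropy one.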
Case I: Rectangle
Nodes: Rectangle
Two vanishing points, four edge directions
Rules:
F. Han and S. C. Zhu, “Bottom-up/top-down image parsing by attribute graph grammar”. Proceedings of
International Conference on Computer Vision, Beijing, China, 2005.
Case I: Rectangle
Get the primal sketch of the scene
Find the "strong" rectangles (bottom-up, red)
Weigh (score) the different hypotheses (top-down, blue)
The weight is the compatibility of the image with the proposed rectangles (primal sketch):
weight $\sim \exp\left(-\sum_{(x,y)} \frac{(I(x,y) - B_k(x,y))^2}{2\sigma^2}\right)$
Accept the best one
Repeat the previous three steps until all the weights are small (negative)
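The loop above (score the hypotheses, accept the best, repeat until no gain remains) can be sketched as a greedy search. The sketch makes several assumptions. The image is a small grayscale grid. Each hypothesis is an axis-aligned rectangle `(x0, y0, x1, y1)` that paints intensity 1 into a rendered image $B$. A hypothesis's weight is its gain in the log of the Gaussian likelihood above, minus a fixed complexity `penalty`. All names and values are hypothetical. This covers only the accept loop, not the grammar-driven bottom-up/top-down proposals of the cited paper.

```python
def log_lik(image, render, sigma=0.5):
    """log exp(-sum (I - B)^2 / (2 sigma^2)), i.e. the likelihood up to a constant."""
    sse = sum((i - b) ** 2
              for row_i, row_b in zip(image, render)
              for i, b in zip(row_i, row_b))
    return -sse / (2 * sigma ** 2)


def paint(render, rect, value=1.0):
    """Return a copy of `render` with rows y0..y1-1, columns x0..x1-1 set to value."""
    x0, y0, x1, y1 = rect
    out = [row[:] for row in render]
    for y in range(y0, y1):
        for x in range(x0, x1):
            out[y][x] = value
    return out


def greedy_parse(image, proposals, penalty=2.0, sigma=0.5):
    """Accept the best-scoring rectangle until every remaining weight is <= 0."""
    h, w = len(image), len(image[0])
    render = [[0.0] * w for _ in range(h)]
    accepted, candidates = [], list(proposals)
    while candidates:
        current = log_lik(image, render, sigma)
        scored = [(log_lik(image, paint(render, r), sigma) - current - penalty, r)
                  for r in candidates]
        best_gain, best = max(scored)
        if best_gain <= 0:
            break  # all weights negative: stop
        accepted.append(best)
        render = paint(render, best)
        candidates.remove(best)
    return accepted
```

Given one true rectangle and one spurious proposal, the loop accepts the true rectangle. It then stops, because adding the spurious one would lower the likelihood.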
Case I: Rectangle
Inference process
Case II: Human Cloth
Use the And-Or graph to generate a matching model
Matching using the
And-Or graph
Vocabulary (training dataset)
Case II: Human Cloth
The And-Or Graph
Novel Configuration
H. Chen, Z. J. Xu, Z. Q. Liu, and S. C. Zhu, “Composite templates for cloth modeling and sketching,” in Proceedings
of IEEE Conference on Computer Vision and Pattern Recognition, New York, June 2006.
Case II: Human Cloth
Inference process
Top-down: refine the matching using the relations
Localize the face, then estimate the parts of the body
Bottom-up: a coarse matching of the parts
Case II: Human Cloth
Inference result
Hands are not exactly the same: find the best matching in the dataset
Case III: Recognition
Z. J. Xu, L. Lin, T. F. Wu, and S. C. Zhu, “Recursive top-down/bottom-up algorithm for object recognition,” Technical Report, Lotus Hill Research Institute, 2007.
Conclusion
Enormous amount of vision knowledge: (And-Or graph)
……
Computational complexity:
Scheduling the bottom-up/top-down procedures remains open
Semantic Gap
Learning the And-Or Graph
Learning the vocabulary $\Delta$ and its attributes
After all, we should not have to define so many things by hand:
Ideal vision words vs. what we have now
[Figure: example images]
Thank you
Zhaoyin Jia