Grammar of Image
Zhaoyin Jia, 03-30-2009
Problems

Enormous amount of vision knowledge:
……

Computational complexity
Classification, Recognition

Semantic gap
Task of image parsing
Objectives in this paper

Framework for vision


Algorithm for this framework


Top-down/bottom-up computation
Generalization of small sample


And-Or Graph
Use Monte Carlo simulation to synthesize more configurations
Fill the semantic gap
Grammar

Language: co-occurrence of s is more than chance, e.g. CONSTANTINOPLE
$\frac{p(s \mid A \cup B)}{p(s \mid A)\, p(s \mid B)} > 1$

Image: parallel lines; T-junctions
Formulation of grammar

Start symbol: S
Non-terminal nodes: $V_N$
Production rules: R
Terminal nodes: $V_T$
Example (figure residue): S → NP VP; further rules over VP, V, PP, NP ……
Image grammar

Start symbol: S
Production rules: R
Non-terminal nodes: $V_N$
Terminal nodes: $V_T$
Overlapping parts/Ambiguity

Similar color, occlusion, etc.
Stochastic Context Free Grammar

For each non-terminal in $V_N$, we have production rules:
with a probability associated with each one:

Probability of a parse tree:

Probability of a sentence:
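
The equations for this slide did not survive extraction. In the usual SCFG formulation (assumed here, not recovered from the slide), each non-terminal $A$ has alternatives $A \to \beta_1 \mid \ldots \mid \beta_{n(A)}$ whose probabilities sum to one, a parse tree has probability $p(pt) = \prod_j p(A_j \to \beta_j)$ over the rules it uses, and a sentence has probability $p(s) = \sum_{pt} p(pt)$ over its parse trees. A minimal sketch with a hypothetical toy grammar (`RULES`, `TREE` and all function names are illustrative):

```python
from math import isclose

# Hypothetical toy grammar: head -> list of (right-hand side, probability).
RULES = {
    "S": [(("NP", "VP"), 1.0)],
    "NP": [(("she",), 0.6), (("fish",), 0.4)],
    "VP": [(("V", "NP"), 0.7), (("V",), 0.3)],
    "V": [(("eats",), 1.0)],
}


def is_normalized(rules):
    """Check that the alternatives of every non-terminal sum to one."""
    return all(isclose(sum(p for _, p in alts), 1.0) for alts in rules.values())


def rule_prob(rules, head, rhs):
    """Look up p(head -> rhs)."""
    for alt, p in rules[head]:
        if alt == rhs:
            return p
    raise KeyError(f"no rule {head} -> {rhs}")


def tree_prob(rules, tree):
    """Product of the probabilities of all rules used in the parse tree.

    A tree is (head, children); a child is a sub-tree or a terminal string.
    """
    head, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rule_prob(rules, head, rhs)
    for child in children:
        if not isinstance(child, str):
            p *= tree_prob(rules, child)
    return p


def sentence_prob(rules, parse_trees):
    """Sum over the supplied parse trees of one sentence (trees given, not searched)."""
    return sum(tree_prob(rules, t) for t in parse_trees)


# Parse tree of "she eats fish" under the toy grammar.
TREE = ("S", [("NP", ["she"]), ("VP", [("V", ["eats"]), ("NP", ["fish"])])])
```

For the toy tree, `tree_prob(RULES, TREE)` multiplies 1.0, 0.6, 0.7, 1.0 and 0.4. Since this grammar is unambiguous, `sentence_prob(RULES, [TREE])` gives the same value.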
Stochastic Grammar with Context

From left to right: bi-gram model (Markov chain)
a sentence with n words:

Non-local relations: tree model
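
The bi-gram (Markov chain) model factorizes a sentence of n words as $p(w_1, \ldots, w_n) = p(w_1) \prod_{i=1}^{n-1} p(w_{i+1} \mid w_i)$. The sketch below estimates these terms by counting on a hypothetical toy corpus. It uses no smoothing and no end-of-sentence symbol, so it is only an illustration of the factorization:

```python
from collections import Counter


def train_bigram(sentences):
    """Maximum-likelihood estimates of p(w_1) and p(w_next | w_prev) by counting."""
    first = Counter(s[0] for s in sentences)
    pair = Counter()
    prev = Counter()
    for s in sentences:
        for a, b in zip(s, s[1:]):
            pair[(a, b)] += 1
            prev[a] += 1
    p_first = {w: c / len(sentences) for w, c in first.items()}
    p_next = {(a, b): c / prev[a] for (a, b), c in pair.items()}
    return p_first, p_next


def sentence_prob(sentence, p_first, p_next):
    """p(w_1) times the product of p(w_{i+1} | w_i); unseen events get probability 0."""
    p = p_first.get(sentence[0], 0.0)
    for a, b in zip(sentence, sentence[1:]):
        p *= p_next.get((a, b), 0.0)
    return p


# Hypothetical training corpus.
CORPUS = [
    ["the", "dog", "runs"],
    ["the", "cat", "runs"],
    ["the", "dog", "sleeps"],
    ["a", "dog", "runs"],
]
P_FIRST, P_NEXT = train_bigram(CORPUS)
```

Each word depends only on its left neighbour, which is exactly the ordering that an image lacks.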
New issues in Image Grammar

Loss of “left to right” order: region adjacency graph
Scaling produces different terminals in the parse tree
Switching between texture and structure
Building the image grammar

Visual vocabulary: primitives, sketch graph, textons…
Relations and configurations: co-occurrence, attached, hinged, supported, occluded…
And-Or graph representation: embedding the image grammar
Learning/testing the parse graph: finding the possible inference
Database

Lotus Hill Institute Dataset
Benjamin Yao, Xiong Yang, and Song-Chun Zhu, “Introduction to a large scale general purpose ground truth
dataset: methodology, annotation tool, and benchmarks.” EMMCVPR, 2007
http://www.imageparsing.com/

636,748 images, 3,927,130 Physical Objects

A few hundred are free
Free Data
http://yoshi.cs.ucla.edu/yao/data/

6 categories, 145 subsets:
Manmade Object: 75
Transportation: 9
Nature Object: 40
Objects in Scene: 6
UCLA Aerial Image: 5
UIUC Sport Activity: 10

Annotation examples:
Outline & segmentation of the object
Segmentation of a scene (street)
Physical parts of the object
OBJECT1:truck
PART1:truck:body
PART2:truck:windshield
PART3:truck:headlight
PART4:truck:headlight
PART5:truck:headlight
PART6:truck:headlight
PART7:truck:rearview mirror
PART8:truck:rearview mirror
PART9:truck:rear light
PART10:truck:window
PART11:truck:frontal left wheel
PART12:truck:frontal right wheel
PART13:truck:back wheel
PART14:truck:back wheel
PART15:truck:carriage
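
The listing above is already a small object/part hierarchy. A minimal sketch of reading it into a dictionary follows. The colon-separated layout is copied from the slide; whether the dataset's own files use this exact format is an assumption, and `parse_annotation` is a hypothetical helper:

```python
from collections import Counter

# Annotation lines as they appear on the slide.
LINES = [
    "OBJECT1:truck",
    "PART1:truck:body",
    "PART2:truck:windshield",
    "PART3:truck:headlight",
    "PART4:truck:headlight",
    "PART5:truck:headlight",
    "PART6:truck:headlight",
    "PART7:truck:rearview mirror",
    "PART8:truck:rearview mirror",
    "PART9:truck:rear light",
    "PART10:truck:window",
    "PART11:truck:frontal left wheel",
    "PART12:truck:frontal right wheel",
    "PART13:truck:back wheel",
    "PART14:truck:back wheel",
    "PART15:truck:carriage",
]


def parse_annotation(lines):
    """Group 'PARTn:object:part' lines under the object they belong to."""
    objects = {}
    for line in lines:
        fields = [f.strip() for f in line.split(":")]
        tag = fields[0]
        if tag.startswith("OBJECT") and len(fields) == 2:
            objects.setdefault(fields[1], [])
        elif tag.startswith("PART") and len(fields) == 3:
            objects.setdefault(fields[1], []).append(fields[2])
    return objects


OBJECTS = parse_annotation(LINES)
PART_COUNTS = Counter(OBJECTS["truck"])  # how often each part label occurs
```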
Visual Vocabulary

The “Lego Land”
Language

$\varphi_i$: function of image primitives
$\alpha_i$: a) geometric transformation, b) appearance
$\beta_i$: bonds between primitives
Visual Vocabulary

Sketch and Texture
$\Lambda = \Lambda_{SK} \cup \Lambda_{NSK}$
$I_\Lambda = I_{\Lambda_{SK}} \cup I_{\Lambda_{NSK}}$

S. C. Zhu, Y. N. Wu, and D. B. Mumford, “Minimax entropy principle and its applications to texture modeling,”
Neural Computation, vol. 9, no. 8, pp. 1627–1660, November 1997
Primal sketch model
[Figure panels: input image, sketch graph, texture pixels]
C. E. Guo, S. C. Zhu, and Y. N. Wu, “Primal sketch: Integrating texture and structure,” in Proceedings of
International Conference on Computer Vision, 2003.
High level visual vocabulary

Cloth: collar, left/right sleeves, hands
H. Chen, Z. J. Xu, Z. Q. Liu, and S. C. Zhu, “Composite templates for
cloth modeling and sketching,” in Proceedings of IEEE Conference on
Computer Vision and Pattern Recognition, New York, June 2006
Relations and configurations

Definition of relation:
bonds: $\{(s, t)\} \subset S \times S$
relations: $E = \{(s, t; \gamma, \rho) : s, t \in S\}$,
$\gamma$: structure, $\rho$: compatibility

Three types of relations:
Bonds and connections
Joints and junctions
Object interactions/semantics

Definition of configurations:
$C = \langle V, E \rangle$
$V = \{A_i : A_i = (\varphi_i(x, y; \alpha_i), \beta_i) \in \Delta\}$
Relations

Bonds and connections: connect primitives into bigger graphs
$E_{bond}(S) = \{(\beta_{ij}, \beta_{i'j'}; \gamma, \rho)\}$
$S = \{\beta_{ij},\ i = 1, 2, \ldots, n,\ j = 1, 2, \ldots, n(i)\}$
$\gamma = (x, y, \theta)$
$\rho$: intensity/color compatibility

Joints and junctions

Object interactions
Configuration

Spatial layout of entities at a certain level
$C = \langle V, E \rangle$
$V = \{A_i : A_i = (\varphi_i(x, y; \alpha_i), \beta_i) \in \Delta\}$
Levels: primal sketch, parts, object, scene
Reconfigurable graphs

Treat bonds as random variables: address nodes

Inference of the configuration

Compute the primal sketch of the image
Detect the T-junctions
Use simulated annealing to infer the connections (Gestalt laws)
Red dot: connected region
Black line: known edge
Green line: inferred connection
R. X. Gao and S. C. Zhu, “From primal sketch to 2.1D sketch,” Technical Report, Lotus Hill Institute, 2006
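
The annealing step can be sketched as follows. This is a generic simulated-annealing loop over on/off connection variables, not the energy used in Gao and Zhu's report: the per-connection costs, the conflict list, the penalty and the cooling schedule are all hypothetical.

```python
import math
import random


def energy(state, cost, conflicts, penalty=5.0):
    """Sum of the costs of active connections plus a penalty per active conflicting pair."""
    e = sum(c for on, c in zip(state, cost) if on)
    e += penalty * sum(1 for a, b in conflicts if state[a] and state[b])
    return e


def anneal(cost, conflicts, steps=2000, t_start=2.0, t_end=0.01, seed=0):
    """Single-flip Metropolis moves under a geometric cooling schedule.

    Returns the lowest-energy state visited and its energy.
    """
    rng = random.Random(seed)
    state = [False] * len(cost)
    e = energy(state, cost, conflicts)
    best, best_e = state[:], e
    for i in range(steps):
        t = t_start * (t_end / t_start) ** (i / (steps - 1))
        k = rng.randrange(len(cost))
        state[k] = not state[k]  # propose switching one connection on or off
        e_new = energy(state, cost, conflicts)
        if e_new <= e or rng.random() < math.exp((e - e_new) / t):
            e = e_new
            if e < best_e:
                best, best_e = state[:], e
        else:
            state[k] = not state[k]  # reject: undo the flip
    return best, best_e


# Hypothetical candidate connections: negative cost means the image supports the link.
COST = [-2.0, -1.5, 0.5, -1.0]
CONFLICTS = [(0, 1), (2, 3)]  # pairs that should not both be active
BEST, BEST_E = anneal(COST, CONFLICTS)
```

Early on, the high temperature lets the chain accept moves that raise the energy, so it can leave a poor pairing of junctions. As the temperature falls, it settles into a low-energy configuration.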
Reconfigurable graphs
[Figure panels: source image, T-junctions, inferred connections, layer extraction]
Ru-Xin Gao, Tian-Fu Wu, Song-Chun Zhu, and Nong Sang, “Bayesian Inference for Layer Representation with
Mixed Markov Random Field”
And-Or Graph

Parse graph of the image: $pg = (pt, E)$
pt: parse tree of vocabulary
E: relations
Inferring the parse graph: $pg^* = \arg\max_{pg} p(pg \mid I)$

The And-Or graph contains all the valid parse graphs
Nodes: And-nodes, Or-nodes, leaf nodes
Relations between the children of an And-node
Parse tree: assigning a label to each Or-node
Z. J. Xu, L. Lin, T. F. Wu, and S. C. Zhu, “Recursive top-down/bottom-up algorithm for object recognition,” Technical
Report, Lotus Hill Research Institute, 2007.
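
One way to see why an And-Or graph "contains" many parse graphs is to enumerate the leaf configurations it can generate. An And-node combines one configuration from every child, and an Or-node contributes the alternatives of its children. The graph below is a hypothetical toy example and must be acyclic for the recursion to terminate. Relations between children are left out of this sketch.

```python
from itertools import product

# Hypothetical toy And-Or graph. ("and", [...]) composes all of its children,
# ("or", [...]) selects exactly one child; names absent from the dict are leaves.
GRAPH = {
    "clock": ("and", ["frame", "hands"]),
    "frame": ("or", ["round_frame", "square_frame"]),
    "hands": ("or", ["two_hands", "three_hands"]),
}


def configurations(graph, node):
    """Enumerate the leaf configurations (tuples of terminals) derivable from node."""
    if node not in graph:
        return [(node,)]
    kind, children = graph[node]
    child_configs = [configurations(graph, c) for c in children]
    if kind == "or":
        # Or-node: union of the alternatives.
        return [cfg for configs in child_configs for cfg in configs]
    # And-node: one configuration from every child, concatenated.
    return [sum(combo, ()) for combo in product(*child_configs)]


CONFIGS = configurations(GRAPH, "clock")
```

Two Or-nodes with two alternatives each already give four configurations. This multiplicative growth is what lets a small graph generalize from a few samples.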
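And-Or Graph

Definition:
$G_{and\text{-}or} = \langle S, V_N, V_T, R, \Sigma, P \rangle$
$V_N = V^{and} \cup V^{or}$
$V_T = \{(\varphi(x, y; \alpha), \beta)\}$: image primitives
$V^{or}$: children are And-nodes $V^{and}_1, V^{and}_2, \ldots$
$V^{and}$: children are terminals $V_T$ or Or-nodes $V^{or}_1, V^{or}_2, \ldots$
$R = \bigcup_m E_m = \{(v_s, v_t; \gamma_{st}, \rho_{st})\}$: relations at all levels
$P$: probability model defined on the And-Or graph
$\Sigma$: valid configurations of terminal nodes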
Stochastic Model on And-Or graph

Terminal (leaf) nodes: $T(pg)$
And-nodes and Or-nodes: $V^{and}(pg)$, $V^{or}(pg)$
Set of links: $E(pg)$
Switch variable at an Or-node: $\omega(v)$
Attributes of primitives: $\alpha(t)$
$p(pg; \Theta, R, \Delta) = \frac{1}{Z(\Theta)} \exp(-\mathcal{E}(pg))$
$\mathcal{E}(pg) = \sum_{v \in V^{or}(pg)} \lambda_v(\omega(v)) + \sum_{t \in V^{and}(pg) \cup T(pg)} \lambda_t(\alpha(t)) + \sum_{(i,j) \in E(pg)} \lambda_{ij}(v_i, v_j, \gamma_{ij}, \rho_{ij})$
First term (SCFG): weighs the frequency of the children at Or-nodes
Second term: weighs the local compatibility of primitives (geometry and appearance)
Third term: weighs the spatial and appearance relations between primitives (parts or objects)
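
A short worked step shows how the Or-node term alone recovers an SCFG. Write $\lambda_v$, $\lambda_t$, $\lambda_{ij}$ for the Or-node, attribute and relation potentials in the exponent, and $\omega(v)$ for the switch variable at Or-node $v$. Assume (this is not stated on the slide) that $\lambda_v$ is the negative log branching probability, that the other two potentials are switched off, and that the And-Or graph is finite and acyclic:

```latex
\begin{aligned}
&\lambda_v(\omega(v)) = -\log p(\omega(v)), \qquad \lambda_t \equiv 0, \qquad \lambda_{ij} \equiv 0 \\
&\Longrightarrow\quad
p(pg) = \frac{1}{Z}\exp\Big(-\sum_{v \in V^{or}(pg)} \lambda_v(\omega(v))\Big)
      = \frac{1}{Z}\prod_{v \in V^{or}(pg)} p(\omega(v)), \qquad Z = 1 .
\end{aligned}
```

Here $Z = 1$ because the branching probabilities at each Or-node already sum to one. The product is the parse-tree probability of a stochastic context-free grammar, and the other two terms add context on top of it.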
Learning And-Or Graph

$p(pg; \Theta, R, \Delta) = \frac{1}{Z(\Theta)} \exp(-\mathcal{E}(pg))$
$\mathcal{E}(pg) = \sum_{v \in V^{or}(pg)} \lambda_v(\omega(v)) + \sum_{t \in V^{and}(pg) \cup T(pg)} \lambda_t(\alpha(t)) + \sum_{(i,j) \in E(pg)} \lambda_{ij}(v_i, v_j, \gamma_{ij}, \rho_{ij})$
Learning the vocabulary $\Delta$, and the hierarchic And-Or graph
Learning the relation set $R$, given $\Delta$
(Discussed in the paper)
Learning the parameters $\Theta$, given $R$ and $\Delta$
Learning And-Or Graph
Observation: $f(I, pg)$
Learning model: $p(pg; \Theta, R, \Delta)$

Learning and pursuing the relation set $R$:
Start from a stochastic context-free grammar (a)
Learn the relations that maximally reduce the KL divergence to the observation (b-e)
J. Porway, Z.Y. Yao, and S. C. Zhu, “Learning an And–Or graph for modeling and recognizing object categories,” Technical
Report, Department of Statistics, 2007
Learning And-Or Graph

Learning the graph parameters $\Theta$
Making $p(pg; \Theta, R, \Delta)$ approximate $f(I, pg)$
Similar to texture synthesis
S. C. Zhu, Y. N. Wu, and D. B. Mumford, “Minimax entropy principle and its applications to texture modeling,”
Neural Computation, vol. 9, no. 8, pp. 1627–1660, November 1997
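
The fitting criterion can be written out as a short derivation. This is the standard maximum-likelihood reading of fitting the model to the observed distribution, and it is an assumption about what the slide intends. For brevity $R$ and $\Delta$ are dropped from the notation, $pg_1, \ldots, pg_N$ are hypothetical observed parse graphs defining the empirical $f$, and $\phi_k$ stands for whichever statistic multiplies the parameter $\lambda_k$ in the exponent:

```latex
\begin{aligned}
KL(f \,\|\, p_\Theta) &= \sum_{pg} f(pg) \log \frac{f(pg)}{p(pg;\Theta)}
   = -H(f) - E_f\big[\log p(pg;\Theta)\big], \\
\Theta^* &= \arg\min_\Theta KL(f \,\|\, p_\Theta)
   = \arg\max_\Theta \frac{1}{N} \sum_{n=1}^{N} \log p(pg_n;\Theta), \\
p(pg;\Theta) &= \frac{1}{Z(\Theta)} \exp\Big(-\sum_k \lambda_k \phi_k(pg)\Big)
   \;\Longrightarrow\;
   \frac{\partial}{\partial \lambda_k} \frac{1}{N} \sum_{n=1}^{N} \log p(pg_n;\Theta)
   = E_{p_\Theta}[\phi_k] - \frac{1}{N} \sum_{n=1}^{N} \phi_k(pg_n).
\end{aligned}
```

At the optimum the gradient vanishes, so the model's expected statistics match the observed averages. This is the same matching condition used when learning texture models, which is presumably why the slide calls the procedure similar to texture synthesis.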
Case I: Rectangle

Nodes: rectangles
Two vanishing points, four edge directions
Rules: [figure not recovered]
F. Han and S. C. Zhu, “Bottom-up/top-down image parsing by attribute graph grammar”. Proceedings of
International Conference on Computer Vision, Beijing, China, 2005.
Case I: Rectangle

Get the primal sketch of the scene
Find the “strong” rectangles (bottom-up, red)
Weigh (score) the different hypotheses (top-down, blue)
The weight is the compatibility of the image with the proposed rectangle (primal sketch):
$\omega_k \sim \exp\left(-\sum_{(x,y) \in \Lambda_k} \frac{(I(x,y) - B(x,y))^2}{2\sigma^2}\right)$
Accept the best one
Repeat the previous three steps until all the weights are small (negative)
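
The propose, score, accept loop can be sketched on a toy binary image. This is not Han and Zhu's algorithm: the candidate rectangles are supplied by hand rather than detected bottom-up, the score is the drop in squared reconstruction error divided by $2\sigma^2$ (the log of a Gaussian likelihood ratio), and `rect_cost` is a hypothetical per-rectangle penalty that makes weak hypotheses score negative.

```python
def render(rect, shape):
    """Binary mask of a rectangle (x0, y0, x1, y1), end-exclusive, on an (h, w) grid."""
    h, w = shape
    x0, y0, x1, y1 = rect
    return [[1.0 if (x0 <= x < x1 and y0 <= y < y1) else 0.0 for x in range(w)]
            for y in range(h)]


def add(recon, mask):
    """Overlay a mask on the current reconstruction (pixelwise maximum)."""
    return [[max(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(recon, mask)]


def sq_error(image, recon):
    """Sum of squared pixel differences between image and reconstruction."""
    return sum((a - b) ** 2 for ra, rb in zip(image, recon) for a, b in zip(ra, rb))


def greedy_parse(image, candidates, sigma=1.0, rect_cost=1.0):
    """Repeatedly accept the best-scoring candidate until no weight is positive."""
    shape = (len(image), len(image[0]))
    recon = [[0.0] * shape[1] for _ in range(shape[0])]
    accepted, remaining = [], list(candidates)
    while remaining:
        base = sq_error(image, recon)
        scored = []
        for rect in remaining:
            trial = add(recon, render(rect, shape))
            weight = (base - sq_error(image, trial)) / (2 * sigma ** 2) - rect_cost
            scored.append((weight, rect))
        weight, best = max(scored)
        if weight <= 0:
            break  # every remaining hypothesis has a non-positive weight
        accepted.append(best)
        recon = add(recon, render(best, shape))
        remaining.remove(best)
    return accepted


# Toy 4x6 image containing two rectangles, plus two poor hypotheses.
SHAPE = (4, 6)
IMAGE = add(render((0, 0, 2, 2), SHAPE), render((3, 1, 6, 4), SHAPE))
CANDIDATES = [(0, 0, 2, 2), (3, 1, 6, 4), (0, 0, 1, 1), (2, 2, 3, 3)]
ACCEPTED = greedy_parse(IMAGE, CANDIDATES)
```

Scores are recomputed after every acceptance. A hypothesis that an accepted rectangle already explains, such as the small corner rectangle here, therefore loses its support and is never accepted.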
Case I: Rectangle

Inference process
Case II: Human Cloth

Use the And-Or graph to generate a matching model
Matching using the And-Or graph

Vocabulary (training dataset)
Case II: Human Cloth

The And-Or Graph

Novel Configuration
H. Chen, Z. J. Xu, Z. Q. Liu, and S. C. Zhu, “Composite templates for cloth modeling and sketching,” in Proceedings
of IEEE Conference on Computer Vision and Pattern Recognition, New York, June 2006.
Case II: Human Cloth

Inference process
Localize the face, then estimate the parts of the body
Bottom-up: a coarse matching of the parts
Top-down: refine the matching using the relations
Case II: Human Cloth

Inference result
Hands are not exactly the same: find the best match in the dataset
Case III: Recognition
Z. J. Xu, L. Lin, T. F. Wu, and S. C. Zhu, “Recursive top-down/bottom-up algorithm for object recognition,”
Technical Report, Lotus Hill Research Institute, 2007.
Conclusion

Enormous amount of vision knowledge: And-Or graph
……
Computational complexity: scheduling of the bottom-up/top-down procedure remains open
Semantic gap:
Learning the And-Or graph
Learning the vocabulary $\Delta$ and its attributes $\alpha$
After all, we are not supposed to define so many things:
ideal vision words: [figure]
what we have now: [figure]

Thank you
Zhaoyin Jia