Learning to Predict Where Humans Look


ICCV 2009
Tilke Judd, Krista Ehinger, Frédo Durand, Antonio Torralba





Outline
◦ Introduction
◦ Database of eye tracking data
◦ Learning a model of saliency
◦ Applications
◦ Conclusion
Introduction
• Bottom-up control of selective attention
− Stimulus salience (defined by color, contrast, and orientation)
− Saliency map (a toy computation is sketched after this list)

Current saliency models do not accurately predict human fixations.

• Top-down control of selective attention
− Scene schema guides fixations (more likely to land on meaningful areas)
− Task goals guide fixations to land on objects relevant to the task
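Below is a toy illustration of a bottom-up saliency map, assuming only center-surround contrast per color channel (real models such as Itti's also include orientation channels); the function name and sigma values are illustrative, not the authors' code:

```python
# Minimal bottom-up saliency sketch: saliency as local center-surround
# contrast, computed with Gaussian blurs at two scales per channel.
import numpy as np
from scipy.ndimage import gaussian_filter

def bottom_up_saliency(image):
    """image: float array of shape (H, W, 3) with values in [0, 1]."""
    saliency = np.zeros(image.shape[:2])
    for c in range(3):  # crude color/intensity contrast per channel
        center = gaussian_filter(image[..., c], sigma=2)
        surround = gaussian_filter(image[..., c], sigma=16)
        saliency += np.abs(center - surround)  # center-surround difference
    # normalize to [0, 1] so maps are comparable across images
    saliency -= saliency.min()
    return saliency / (saliency.max() + 1e-8)
```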
• The first contribution is a large database of eye tracking experiments, with labels and analysis
• The second is a supervised learning model of saliency that combines bottom-up, image-based saliency cues with top-down, semantics-dependent cues
• Goal: predict where users look without eye tracking hardware

Data gathering protocol
◦ Collected 1003 random images from Flickr and LabelMe and recorded eye tracking data from 15 users who free-viewed these images.
◦ 779 landscape images and 228 portrait images.

Data gathering protocol
◦ Gaze tracking paths and fixation locations are recorded for each viewer.

Data gathering protocol
◦ A Gaussian filter is applied across the fixation locations to produce a continuous saliency map (sketched below).
[Figure: left, the resulting saliency map; right, the most salient 20 percent of the image.]
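A sketch of turning discrete fixations into a continuous "human" saliency map as described above: accumulate fixation counts, blur with a Gaussian, then keep the most salient 20 percent of pixels. The sigma value is an assumption, not the paper's parameter:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fixation_saliency_map(fixations, shape, sigma=25):
    """fixations: list of (row, col) pixel coordinates; shape: (H, W)."""
    counts = np.zeros(shape)
    for r, c in fixations:
        counts[r, c] += 1
    smap = gaussian_filter(counts, sigma=sigma)
    return smap / (smap.max() + 1e-8)

def top_salient_mask(smap, fraction=0.2):
    """Binary mask of the most salient `fraction` of the image."""
    threshold = np.quantile(smap, 1 - fraction)
    return smap >= threshold
```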

Analysis of dataset
◦ A strong bias for human fixations to be near the center of the image [19][23]

Analysis of dataset
◦ Measured how well human saliency maps predict eye fixations: ground-truth fixations are scored against the saliency map used as a classifier (see the sketch below).
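A sketch of the "saliency map as classifier" evaluation: thresholding the map at varying levels traces out an ROC curve over ground-truth fixation pixels. Here scikit-learn's roc_auc_score stands in for the ROC computation; the function name is illustrative:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def saliency_roc_auc(smap, fixation_mask):
    """smap: (H, W) saliency values; fixation_mask: (H, W) bool array
    marking pixels that received ground-truth fixations."""
    labels = fixation_mask.ravel().astype(int)  # 1 = fixated pixel
    scores = smap.ravel()                       # saliency as classifier score
    return roc_auc_score(labels, scores)
```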

Analysis of dataset
◦ Objects of interest and the size of regions of interest

Features used for machine learning
◦ Low-level features, e.g. color, orientation, intensity
◦ Mid-level features, e.g. the horizon line
◦ High-level features, e.g. face detector
◦ Center prior: distance of each pixel to the image center (all four families appear in the sketch below)
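An illustrative per-pixel feature stack combining the families listed above, under the assumption that simple filters approximate the low-level channels; the real mid/high-level features (horizon line, face/person detectors) need external models, so a comment marks where their maps would be stacked in:

```python
import numpy as np
from scipy.ndimage import sobel

def pixel_features(image):
    """image: (H, W, 3) float array in [0, 1] -> (H, W, F) feature stack."""
    h, w = image.shape[:2]
    intensity = image.mean(axis=2)
    # low-level: color channels, intensity, orientation (gradient energy)
    orientation = np.hypot(sobel(intensity, axis=0), sobel(intensity, axis=1))
    # center prior: distance of each pixel to the image center
    rows, cols = np.mgrid[0:h, 0:w]
    center_dist = np.hypot(rows - h / 2, cols - w / 2)
    center_dist /= center_dist.max()
    channels = [image[..., 0], image[..., 1], image[..., 2],
                intensity, orientation, center_dist]
    # mid/high-level features (horizon, face/person/car detector maps)
    # would be appended here as additional (H, W) channels
    return np.stack(channels, axis=-1)
```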

Training sample selection
◦ 903 training images and 100 testing images
◦ 10 positively labeled pixels chosen randomly from the top 20% salient locations and 10 negatively labeled pixels from the bottom 70% salient locations (see the sampling sketch below)
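A minimal sketch of that per-image sampling rule, assuming a human saliency map `smap` as built earlier; the function name and seed are illustrative:

```python
import numpy as np

def select_samples(smap, n_pos=10, n_neg=10, rng=np.random.default_rng(0)):
    """Draw positives from the top 20% and negatives from the bottom 70%."""
    flat = smap.ravel()
    order = np.argsort(flat)          # pixel indices, ascending saliency
    n = flat.size
    top20 = order[int(0.8 * n):]      # top 20% most salient pixels
    bottom70 = order[:int(0.7 * n)]   # bottom 70% least salient pixels
    pos = rng.choice(top20, size=n_pos, replace=False)
    neg = rng.choice(bottom70, size=n_neg, replace=False)
    return pos, neg                   # flat pixel indices into smap
```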

Training
◦ Used the liblinear support vector machine to train the model (a minimal sketch follows)
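A minimal training sketch. The paper used liblinear directly; here scikit-learn's LinearSVC, which wraps liblinear, stands in for it, and the C value is an assumption, not the paper's setting:

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_saliency_svm(X, y):
    """X: (N, F) matrix of per-pixel feature vectors for all selected
    samples; y: (N,) labels, 1 = salient, 0 = not salient."""
    model = LinearSVC(C=1.0)  # C=1.0 is an assumed hyperparameter
    model.fit(X, y)
    return model

# At test time, the decision function over every pixel's feature vector
# yields a real-valued saliency map for a new image, e.g.:
# smap = model.decision_function(feats.reshape(-1, F)).reshape(H, W)
```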

Comparison of saliency maps

Performance on testing images
1. Outperforms the other models
2. Reaches 88% of the way to human performance
3. Does not merely benefit from the huge bias of fixations toward the center
4. The overall performance of the object detector model alone is low

Performance on testing samples (the average of
the true positive and true negative rates)
1. The center model performs only as well as chance on the other subsets of samples
2. The learned model performs more robustly over all subsets of samples
3. The people and car detectors perform better on the subsets with faces
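A sketch of the metric named in the slide title, the average of the true positive and true negative rates over a tested subset of samples (i.e. balanced accuracy); the function name is illustrative:

```python
import numpy as np

def avg_tpr_tnr(predictions, labels):
    """predictions, labels: boolean arrays over the tested samples."""
    predictions = np.asarray(predictions, bool)
    labels = np.asarray(labels, bool)
    tpr = predictions[labels].mean()        # true positive rate
    tnr = (~predictions)[~labels].mean()    # true negative rate
    return 0.5 * (tpr + tnr)
```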

Applications
◦ Using eye tracking data to decide how to render a photograph with differing levels of detail [4].

[4] D. DeCarlo and A. Santella. Stylization and abstraction of photographs. ACM Transactions on Graphics, 2002.

Contributions
◦ Developed the largest eye tracking database of natural images, which permits large-scale quantitative analysis of fixation points and gaze paths.
◦ Used machine learning to train a combined bottom-up, top-down model of saliency that outperforms several existing models.

Future work
◦ Understanding the impact of framing, cropping, and scaling images on fixations.