Institut Jean Nicod, Oct 28, 2005
What is focal attention for?
The What and Why of perceptual selection
The central function of focal attention is to select.
We must select because our capacity to process information is limited.
We must select because we need to be able to mark certain aspects of a scene and to refer to the marked tokens individually.
That’s what this talk is principally about; but first, some background.
The functions of focal attention
A central notion in vision science is that of “picking out” or selecting (also referring, tracking). The usual mechanism for perceptual selection is called selective attention or focal attention.
Why must we select at all? Overview
We must select because we can’t process all the information available. This is the resource-limitation reason.
○ But in what way (along what dimensions) is capacity limited? What happens to what is not selected? The “filter theory” has many problems.
We need to select because certain patterns cannot be computed without first marking certain special elements (e.g., in counting).
We need to select in order to track the identity of individual things, e.g., to solve the correspondence problem by identifying tokens in order to establish the equivalence of this (t=i) and this (t=i+ε).
We need to select because of the way relevant information in the world is packaged. This leads to the Binding Problem. That’s an important part of what I will discuss in this talk.
Broadbent’s Filter Theory
(illustrating the resource-limited account of selection)
[Diagram: input passes through a limited-capacity channel, with a rehearsal loop, a store of conditional probabilities of past events (in LTM), a motor planner, and effectors.]
Broadbent, D. E. (1958). Perception and Communication. London: Pergamon Press.
Attention and Selection
The question of what is the basis for selection has been at the bottom of a lot of controversy in vision science. Some options that have been proposed include:
We select what can be described physically (i.e., by “channels”) – we select transducer outputs, e.g., by frequency, color, shape, or location.
We select according to what is important to us (e.g., affordances – Gibson), or according to phenomenal salience (William James).
We select what we need to treat as special or what we need to refer to – selecting as “marking”.
Consider the options for what is the basis of visual selection
The most obvious answer to what we select is places or locations.
We can select most other properties by their location – e.g., we can move our eyes so our gaze lands on different places.
Must we always move our eyes to change what we attend to?
○ Studies of Covert Attention-Movement: Posner (1980)
Other empirical questions about place selection…
• When places are selected, are they selected automatically or can they be selected voluntarily?
• How does the visual system specify where to move attention to?
• Are there restrictions on what places we can select?
• Are selected places punctate or can they be regions?
• Must selected places be filled or can they be empty places?
• Can places be specified in relation to landmark objects (e.g., select the place halfway between X and Y)?
Covert movement of attention
[Diagram: a fixation frame, a brief cue, a target-cue interval, and a detection target, with cued vs. uncued locations compared.]
Example of an experiment using a cue-validity paradigm, showing that the locus of attention moves without eye movements and allowing its speed to be estimated.
Posner, M. I. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology, 32, 3-25.
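To make the paradigm concrete, here is a minimal sketch (in Python) of the trial structure of a cue-validity experiment. All numerical parameters – the cue validity, base RT, cueing benefit, and noise – are illustrative assumptions, not Posner’s values.

```python
# Minimal sketch of a cue-validity trial sequence; parameters are assumed.
import random

CUE_VALIDITY = 0.8      # assumed proportion of trials where the cue predicts the target
BASE_RT_MS = 300.0      # assumed mean detection RT at an uncued location
CUE_BENEFIT_MS = 30.0   # assumed speedup when attention is already at the target
NOISE_SD_MS = 20.0      # assumed trial-to-trial variability

def run_trial(rng: random.Random):
    """Simulate one trial: cue a side, place the target, return condition and RT."""
    cued_side = rng.choice(["left", "right"])
    valid = rng.random() < CUE_VALIDITY
    target_side = cued_side if valid else ("right" if cued_side == "left" else "left")
    # Detection is faster when the covertly attended (cued) location holds the target.
    mean_rt = BASE_RT_MS - CUE_BENEFIT_MS if target_side == cued_side else BASE_RT_MS
    return ("cued" if valid else "uncued"), rng.gauss(mean_rt, NOISE_SD_MS)

rng = random.Random(0)
trials = [run_trial(rng) for _ in range(2000)]
for cond in ("cued", "uncued"):
    rts = [rt for c, rt in trials if c == cond]
    print(f"{cond:>6}: mean RT = {sum(rts) / len(rts):.1f} ms over {len(rts)} trials")
```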
Extension of Posner’s demonstration of attention switching
Does the improved detection at intermediate locations entail that the “spotlight of attention” moves continuously through empty space?
Sperling & Weichselgartner argued that this apparently analog movement is best explained by a quantal mechanism. The theory assumes a quantal jump of attention in which the spotlight pointed at location -2 is extinguished and, simultaneously, the spotlight at location +2 is turned on. Because extinction and onset take a measurable amount of time, there is a brief period when the spotlights partially illuminate both locations simultaneously.
Could objects, rather than places, be the basis for selection?
An independently motivated alternative is that selection occurs when token perceptual objects are individuated.
Individuation involves distinguishing something from all things it is not. In general, individuation involves appealing to properties of the thing in question (cf. Strawson).
○ But a more primitive type of individuation, or perceptual parsing, may be computed in early vision.
Primitive Individuation (PI) may be automatic.
○ PI is associated with transients or the appearance of a new object.
○ PI is sometimes accompanied by assignment of a deictic reference or FINST that keeps individuals distinct without encoding their properties (nonconceptual individuation). This indexing process is, however, numerically limited (to about 4 objects). [* More later]
○ Individuation is often accompanied by the creation of an Object File (OF) for that individual, though the OF may remain empty.
Some empirical evidence for object-based selection
General empirical considerations
• Individuals and patterns – the need for argument-binding
• Examples: subitizing, collinearity, and other relational judgments
Experimental demonstrations
• Single-object advantage in joint judgments
• Evidence that whole enduring objects are selected
• Multiple-Object Tracking
Clinical/neuroscience findings
Individuals and patterns
Vision does not recognize patterns by applying templates but by parsing the pattern into parts – recognition-by-parts (Biederman).
A pattern is encoded over time (and over eye movements), so the visual system must keep track of the individual parts and recognize them as the same objects at different times and stages of encoding.
Individuating is a prerequisite for recognizing configurational properties (patterns) defined over several individual parts.
Subitizing is an example of how easily we can detect patterns when they are defined over a small enough number of parts.
In order to recognize a pattern, the visual system must pick out individual parts and bind them to the representation being constructed. Examples include what Ullman called “visual routines”.
Another area where the concept of an individual has become important is cognitive development, where it is clear that babies are sensitive to the numerosity of individual things in a way that is independent of their perceptual properties.
Are there collinear items (n>3)?
Several objects must be picked out at once in making relational judgments
The same is true for other relational judgments like inside or on-the-same-contour, etc. We must pick out the relevant individual objects first.
Respond: Inside same contour? On same contour?
Another example: Subitizing vs. Counting. How many squares are there?
Subitizing is fast, accurate, and only slightly dependent on how many items there are. Only the squares on the right can be subitized.
Concentric squares cannot be subitized because individuating them requires a curve-tracing operation that is not automatic.
Signature subitizing phenomena appear only when objects are automatically individuated and indexed.
Trick, L. M., & Pylyshyn, Z. W. (1994). Why are small and large numbers enumerated differently? A limited-capacity preattentive stage in vision. Psychological Review, 101(1), 80-102.
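A toy model of the subitizing/counting contrast: response time is nearly flat up to the indexing limit and rises steeply beyond it. The limit reflects the roughly four available indexes; the slope values are illustrative assumptions, not the data reported by Trick & Pylyshyn (1994).

```python
# Piecewise-linear sketch of enumeration RT; all constants are assumed.
SUBITIZING_LIMIT = 4        # roughly the number of available indexes (FINSTs)
BASE_MS = 400.0             # assumed baseline response time
SUBITIZING_SLOPE = 50.0     # assumed small per-item cost within the subitizing range
COUNTING_SLOPE = 300.0      # assumed large per-item cost once serial counting begins

def predicted_rt_ms(n_items: int) -> float:
    """RT grows slowly up to the indexing limit, then steeply (serial counting)."""
    within = min(n_items, SUBITIZING_LIMIT)
    beyond = max(0, n_items - SUBITIZING_LIMIT)
    return BASE_MS + SUBITIZING_SLOPE * within + COUNTING_SLOPE * beyond

for n in range(1, 9):
    print(f"{n} items -> {predicted_rt_ms(n):.0f} ms")
```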
Some empirical evidence for object-based selection
General empirical considerations
• Individuals and patterns – the need for argument-binding
• Examples: subitizing, collinearity, and other relational judgments
Some experimental demonstrations
• Single-object advantage in joint judgments
• Evidence that whole enduring objects are selected
• Multiple-Object Tracking
Clinical/neuroscience findings
Single-object superiority occurs even when the shapes are controlled
Instruction: Attend to the red objects.
Which vertex is higher, the left or the right?
(Note: there are now many control studies that eliminate the most obvious confounds.)
Attention spreads over perceived objects
[Four display conditions: the prime spreads to B and not C, or to C and not B, depending on how the parts are grouped into perceived objects.]
Using a priming method, Egly, Driver & Rafal (1994) showed that the effect of a prime spreads to other parts of the same visual object, compared with equally distant parts of different objects.
We can select a shape even when it is intertwined with other similar shapes. Are the green items the same?
On a surprise test at the end, subjects were not able to recall shapes that had been present but had not been attended in the task (Rock & Gutman, 1981; DeSchepper & Treisman, 1996).
Further evidence that attention is object-based comes from the finding that various attention phenomena move with moving objects. Once an object is selected, the selection appears to remain with the object as it moves.
Inhibition of return appears to be object-based. Inhibition of return (IOR) is the phenomenon whereby attention is slow to return to an object that was attended about 0.7-1.0 s earlier. It is thought to help in visual search, since it prevents previously visited objects from being revisited. Tipper, Driver & Weaver (1991) showed that IOR moves with the inhibited object.
IOR appears to be object-based (it travels with the object that was attended)
Objects endure despite changes in location; and they carry their history with them!
Object File Theory of Kahneman & Treisman
[Diagram: letters A and B appear in boxes; the boxes move; a letter is then presented in one box.]
Letters are faster to read if they appear in the same box where they appeared initially: priming travels with the object.
According to the theory, when an object first appears, a file is created for it, and the properties of the object are encoded and subsequently accessed through this object file.
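A minimal sketch of the object-file idea: properties are stored in, and accessed through, a file that travels with its object, so priming follows the object rather than the location. The class and field names here are hypothetical, purely for illustration.

```python
# Object files: properties are accessed through a file that stays with its
# object as it moves, so a re-presented letter is matched by object, not place.
from dataclasses import dataclass, field

@dataclass
class ObjectFile:
    contents: dict = field(default_factory=dict)   # e.g., {"letter": "A"}
    location: tuple = (0, 0)                       # updated as the object moves

files = {1: ObjectFile({"letter": "A"}, (0, 0)),
         2: ObjectFile({"letter": "B"}, (1, 0))}

# The boxes move; the files travel with them, keeping their contents.
files[1].location = (1, 1)
files[2].location = (0, 1)

def primed(file_id: int, letter: str) -> bool:
    """Reading is primed if the letter matches the file's stored letter,
    regardless of where the object is now."""
    return files[file_id].contents.get("letter") == letter

print(primed(1, "A"))   # True: same object, even at a new location -> faster reading
print(primed(2, "A"))   # False: different object file -> no priming
```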
Some empirical evidence for object-based selection
General empirical considerations
• Individuals and patterns – the need for argument-binding
• Examples: subitizing, collinearity, and other relational judgments
Experimental demonstrations
• Single-object advantage in joint judgments
• Evidence that whole enduring objects are selected
• Multiple-Object Tracking studies (later)
Clinical/neuroscience findings
• Visual neglect
• Balint syndrome & simultanagnosia
Visual neglect syndrome is object-based
When a right-neglect patient is shown a dumbbell that rotates, the patient continues to neglect the object that had been on the right, even though it is now on the left (Behrmann & Tipper, 1999).
Simultanagnosic (Balint syndrome) patients attend to only one object at a time. Simultanagnosic patients cannot judge the relative length of two lines, but they can tell that a figure made by connecting the ends of the lines is not a rectangle but a trapezoid (Holmes & Horax, 1919).
Balint patients attend to only one object at a time – even if the objects overlap! (Luria, 1959)
Some empirical evidence for object-based selection
Some general empirical considerations
• Individuals and patterns – the need for argument-binding
• Examples: subitizing, collinearity, and other relational judgments
Some direct experimental demonstrations
• Single-object advantage in joint judgments
• Evidence that whole enduring objects are selected
• Multiple-Object Tracking studies
Clinical/neuroscience findings
Multiple Object Selection
One of the clearest cases illustrating object-based selection is Multiple Object Tracking.
Keeping track of individual objects in a scene requires a mechanism for individuating, selecting, accessing, and tracking the identity of individuals over time.
These are the functions we have proposed are carried out by the mechanism of visual indexes (FINSTs).
We have been using a variety of methods for studying visual indexing, including subitizing, subset selection for search, and Multiple Object Tracking (MOT).
Multiple Object Tracking
In a typical experiment, 8 simple identical objects are presented on a screen and 4 of them are briefly distinguished in some visual manner – usually by flashing them on and off.
After these 4 “targets” have been briefly identified, all objects resume their identical appearance and move randomly. The subjects’ task is to keep track of which ones had earlier been designated as targets.
After a period of 5-10 seconds the motion stops and subjects must indicate, using a mouse, which objects were the targets.
People are very good at this task (80%-98% correct). The question is:
How do they do it?
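For concreteness, here is a sketch of how a MOT trial of this kind could be generated and scored. The display size, speeds, trial length, and the sample response are assumed values for illustration, not those of any actual experiment.

```python
# Minimal sketch of a MOT trial's structure (display generation and scoring only).
import math, random

N_OBJECTS, N_TARGETS = 8, 4
SPEED = 5.0            # assumed pixels per frame
FIELD = 400            # assumed square display size in pixels
N_FRAMES = 300         # roughly 5-10 s of motion at typical frame rates

rng = random.Random(1)
positions = [[rng.uniform(0, FIELD), rng.uniform(0, FIELD)] for _ in range(N_OBJECTS)]
headings = [rng.uniform(0, 2 * math.pi) for _ in range(N_OBJECTS)]
targets = set(range(N_TARGETS))    # these items are flashed at the start

for _ in range(N_FRAMES):          # identical-looking items then move randomly
    for i in range(N_OBJECTS):
        headings[i] += rng.gauss(0, 0.2)           # random walk in direction
        positions[i][0] = (positions[i][0] + SPEED * math.cos(headings[i])) % FIELD
        positions[i][1] = (positions[i][1] + SPEED * math.sin(headings[i])) % FIELD

# At the end, the subject clicks the items believed to be targets;
# the response is scored against the original target set.
response = {0, 1, 2, 5}            # a hypothetical response
print(f"correct: {len(response & targets)} of {len(targets)}")
```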
Keep track of the objects that flash
How do we do it? Do we encode and update locations serially?
Keep track of the objects that flash
How do we do it? What properties of individual objects do we use?
Explaining Multiple Object Tracking
Basic finding: people (even 5-year-old children) can track 4 to 5 individual objects that have no unique visual properties. How is it done?
Can it be done by keeping track of the only distinctive property of the objects – their location?
○ Based on the assumption of a finite speed of attention movement, our modeling suggests that tracking cannot be done by encoding and updating locations (because of the speed at which the objects move and the distances between them).
○ If tracking is not done by using the only uniquely distinguishing property of the objects, then it must be done by tracking their historical continuity as the same individual objects.
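A back-of-envelope version of that modeling argument, with assumed (illustrative) numbers: if attention must visit each target in turn to re-encode its location, the targets drift farther than the typical inter-object spacing within one update cycle, so a stored location no longer picks out the right object.

```python
# Feasibility check for serial location updating; every constant is assumed.
N_TARGETS = 4
DWELL_MS = 60.0          # assumed time to settle on an item and re-encode its location
TRAVEL_MS = 40.0         # assumed time to move attention between items
OBJECT_SPEED = 300.0     # assumed object speed, pixels per second
SPACING = 100.0          # assumed typical distance between items, pixels

cycle_ms = N_TARGETS * (DWELL_MS + TRAVEL_MS)   # one full serial update of all targets
drift_px = OBJECT_SPEED * cycle_ms / 1000.0     # how far each object moves meanwhile

print(f"update cycle: {cycle_ms:.0f} ms; drift per cycle: {drift_px:.0f} px")
if drift_px > SPACING:
    print("objects move farther than their spacing between updates,")
    print("so a stored location no longer picks out the right object")
```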
If we are not using objects’ locations, then how are we tracking them?
Our independently motivated hypothesis is that a small number of objects (e.g., 4-5) are individuated and reference tokens, or indexes, are assigned to them.
An index keeps referring to the object as the object changes its properties and its location (that is what makes it the same object!).
An object is not selected or tracked by using an encoding of any of its properties. It is picked out nonconceptually, just the way a demonstrative does in language (i.e., this, that).
Although some physical properties must be responsible for the individuation and indexing of an object, we have data showing that these properties are not encoded, and the properties that are encoded need not be used in tracking.
What has this to do with the Binding Problem?
First I will introduce the binding problem as it appears in psychology.
The role of selection in encoding conjunctions of properties (the binding problem)
The binding problem was initially described by Anne Treisman, who showed conditions under which vision may fail to correctly bind conjunctions of properties (resulting in conjunction illusions).
Feature binding requires focal attention (i.e., selection).
The problem has been of interest to philosophers because it places constraints on how information may be encoded in early vision (or, as Clark would put it, ‘at the sensory level’ or nonconceptually).
I introduce the binding problem to show how the object-based view is essential for its solution.
Introduction to the Binding Problem: Encoding conjunctions of properties
Experiments show the special difficulty that vision has in detecting conjunctions of several properties.
It seems that items have to be attended (i.e., individuated and selected) in order for their property-conjunctions to be encoded. When a display is not attended, conjunction errors are frequent.
Read the vertical line of digits in this display. What were the letters and their colors?
This is what you saw briefly… Under these conditions, conjunction errors are very frequent.
Encoding conjunctions requires selection
One source of evidence comes from search experiments: single-feature search is fast and appears to be independent of the number of items searched through (suggesting it is automatic and ‘pre-attentive’). Conjunction search is slower, and the time increases with the number of items searched through (suggesting it requires serial scanning of attention).
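A toy serial self-terminating search model that reproduces the qualitative contrast: a flat function for feature (pop-out) search versus a set-size-dependent function for conjunction search. The RT constants are illustrative assumptions, not Treisman’s measured slopes.

```python
# Parallel pop-out vs. serial self-terminating conjunction search; values assumed.
import random

POPOUT_RT_MS = 450.0       # assumed flat RT for single-feature (parallel) search
BASE_MS = 400.0            # assumed base time for conjunction search
PER_ITEM_MS = 25.0         # assumed cost of attending one item serially

def conjunction_rt_ms(set_size: int, rng: random.Random) -> float:
    """Attention scans items one at a time, stopping when the target is found,
    so on average about half the items are inspected."""
    target_position = rng.randrange(set_size)    # items inspected in random order
    return BASE_MS + PER_ITEM_MS * (target_position + 1)

rng = random.Random(2)
for n in (4, 8, 16, 32):
    mean = sum(conjunction_rt_ms(n, rng) for _ in range(5000)) / 5000
    print(f"set size {n:2d}: feature ~{POPOUT_RT_MS:.0f} ms, conjunction ~{mean:.0f} ms")
```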
Rapid visual search (Treisman)
Find the following simple figure in the next slide:
This case is easy – and the time is independent of how many nontargets there are – because there is only one red item. This is called a ‘popout’ search
This case is also easy – and the time is independent of how many nontargets there are – because there is only one right-leaning item. This is also a ‘popout’ search.
Rapid visual search (conjunction)
Find the following simple figure in the next slide:
Feature Integration Theory and feature binding
Treisman’s attention as glue hypothesis: focal attention (selection) is needed in order to bind properties together.
We can recognize not only the presence of “squareness” and “redness”, but we can also distinguish between different ways they may be conjoined:
• red square and green circle vs. green square and red circle
The evidence suggests that conjoined properties are encoded only if they are attended, or selected.
Notice that properties count as conjoined if and only if they are properties of the same object, so it is objects that must be selected!
Constraints on nonconceptual representation of visual information (and the binding problem)
Because early (nonconceptual) vision must not lose the conjunctive grouping of properties, visual properties can’t just be represented as being present in the scene – because then the binding problem could not be solved!
What else is required?
The most common answer is that each property must be represented as being at a particular location. According to Peter Strawson and Austin Clark, the basic unit of sensory representation is Feature F at location L. This is the global map or feature placing proposal.
This proposal fails for interesting empirical reasons. But if feature placing is not the answer, what is?
The role of attention to location in Treisman’s Feature Integration Theory
[Diagram: the original input feeds separate color, shape, and orientation feature maps; an attention “beam” directed at the master location map conjoins the features registered at the attended location, and the conjunction is detected.]
But in encoding properties, early vision can’t just bind them together according to their spatial co-occurrence – not even their co-occurrence within the same region. That’s because the relevant region depends on the object. So the selection and binding must be according to the objects that have those properties.
If the location of properties will not give us a way of solving the binding problem, what will? This is why we need object-based selection, and why the object-based attention literature is relevant…
An alternative view of how we solve the binding problem
If we assume that only the properties of indexed objects (of which there are about 4-5) are encoded, and that these are stored in object files associated with each object, then properties that belong to the same object are stored in the same object file, which is why they get bound together. This automatically solves the binding problem!
This is the view exemplified by both FINST Theory (1989) and Object File Theory (1992).
The assumption that only properties of indexed objects are encoded raises the question of what happens to the properties of the other (unindexed) objects in a display. The logical answer is that they are not encoded, and therefore not available to conceptualization and cognition. But this is counter-intuitive!
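A minimal sketch of how indexing makes binding automatic: properties detected on an indexed object are stored together in that object’s file, so “red” and “square” are conjoined simply by co-membership, while the properties of unindexed items are never encoded at all. The names and the feature list are hypothetical.

```python
# Binding via object files: conjunction = co-membership in one file.
features = [("red", "square"), ("green", "circle"),
            ("blue", "triangle"), ("red", "circle"),
            ("green", "square")]                    # five items in the display

N_INDEXES = 4                                       # only ~4-5 FINSTs are available
object_files = {}
for index, (color, shape) in enumerate(features[:N_INDEXES]):
    # Each indexed object gets a file; its properties are stored together there.
    object_files[index] = {"color": color, "shape": shape}

def conjunction_of(index):
    """Report the bound conjunction for an item, if it was indexed and encoded."""
    f = object_files.get(index)
    return f"{f['color']} {f['shape']}" if f else None

print(conjunction_of(0))   # 'red square' -- correctly bound via the shared file
print(conjunction_of(4))   # None: the unindexed item's properties were never encoded
```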
An intriguing possibility….
Maybe we see far less than we think we do!
This possibility has received a great deal of recent attention with the discovery of various ‘blindnesses’, such as change blindness and inattentional blindness.
The assumption that no properties other than those of indexed objects can be encoded conflicts with strong intuitions – namely, that we see much more than we conceptualize and are aware of. So what do we do about the things we “see” but do not conceptualize?
Some philosophers say they are represented nonconceptually. But what makes this a nonconceptual representation, as opposed to just a causal reaction?
○ At the very minimum, postulating that something is a representation must allow generalizations to be captured over its content, which would otherwise not be available.
○ Traditionally, representations are explanatory because they account for the possibility of misrepresentation, and they also enter into conceptualizations and inferences. But unselected objects and unencoded properties don’t seem to fit this requirement (or do they?).
Maybe information about non-indexed objects is not represented at all!
A possible view (which I am not prepared to fully endorse yet) is that certain topographical or biological reactions (e.g., retinal activity) are not representations – because they have no truth values and so cannot misrepresent.
One must distinguish between causal and represented properties. Properties that cause objects to be indexed and tracked, and that result in object files being created, need not be encoded and made available to cognition.
Is this just terminological imperialism?
If we call all forms of patterned reactions representations, then we will need a further distinction among types within this broader class of representation.
We may need to distinguish between personal and subpersonal types of ‘representation’, with only the former being representations for our purposes.
We may also need to distinguish between patterned states within an encapsulated module that are not available to the rest of the mind/brain and those that are available.
○ Certain patterned causal properties may be available to motor control – but does that make them representations?
An essential diagnostic is whether reference to content – to what is represented – allows generalizations that would otherwise be missed; this, in turn, suggests that there is no representation without misrepresentation.
○ We don’t want to count retinal images as representations because they can’t misrepresent, though they can be misinterpreted later.
What next?
This picture leaves many unanswered questions, but it does provide a mechanism for solving the binding problem, and it also explains how mental representations could have a nonconceptual connection with objects in the world (something required if mental representations are to connect with actions).
The End
… except for a few loose ends …
Can objects be individuated but not indexed? A new twist to this story
We have recently obtained evidence that objects that are not tracked in MOT are nonetheless inhibited, and that the inhibition moves with them.
It is harder to detect a probe dot on an untracked object than on either a tracked object or empty space!
But how can inhibition move with a nontarget when the space through which it moves is not inhibited? Doesn’t this require the nontargets to be tracked?
The beginnings of the puzzle of clustering prior to indexing, and what that might mean!
If moving objects are inhibited then inhibition moves along with the objects. How can this be unless they are being tracked? And if they are being tracked there must be at least 8 FINSTs!
This puzzle may signal the need for a kind of individuation that is weaker than the individuation we have discussed so far – a mere clustering, circumscribing, figure-ground distinction without a pointer or access mechanism – i.e. without reference!
It turns out that such a circumscribing-clustering process is needed to fulfill many different functions in early vision. It is needed whenever the correspondence problem arises – whenever visual elements need to be placed in correspondence or paired with other elements. This occurs in computing stereo, apparent motion, and other grouping situations in which the number of elements does not affect ease of pairing (or even results in faster pairing when there are more elements). Correspondence is not computed over continuous visual manifolds but only over some pre-clustered elements.
Example of the correspondence problem for apparent motion
The grey disks correspond to the first flash and the black ones to the second flash. Which of the 24 possible matches will the visual system select as the solution to this correspondence problem? What principle does it use?
[Two candidate solutions: curved matches vs. linear matches.]
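One way to make the matching problem concrete: enumerate all 4! = 24 one-to-one pairings and score each by total displacement. Minimizing total motion is only one candidate principle the visual system might use, and the disk coordinates here are made up for illustration.

```python
# Brute-force solution of a 4-vs-4 correspondence problem by total displacement.
from itertools import permutations
import math

first_flash  = [(0, 0), (2, 0), (4, 0), (6, 0)]    # grey disks (hypothetical)
second_flash = [(1, 1), (3, 1), (5, 1), (7, 1)]    # black disks (hypothetical)

def cost(matching):
    """Total distance moved if each grey disk i goes to black disk matching[i]."""
    return sum(math.dist(first_flash[i], second_flash[j])
               for i, j in enumerate(matching))

# Enumerate all 4! = 24 possible one-to-one matches and pick the cheapest.
best = min(permutations(range(4)), key=cost)
print("chosen match:", best, f"(total motion {cost(best):.2f})")
```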
Here is how it actually looks
Views of a dome
Structure-from-Motion demo: cylinder; kinetic depth effect
The correspondence problem for biological motion
FINST Theory postulates a limited number of pointers in early vision that are elicited by causal events in the visual field and that enable vision to refer to things without doing so under a concept or a description.