
Institute Jean Nicod, Oct 28, 2005

What is focal attention for?

The What and Why of perceptual selection

 The central function of focal attention is to select.
 We must select because our capacity to process information is limited.
 We must select because we need to be able to mark certain aspects of a scene and to refer to the marked tokens individually.

That’s what this talk is principally about: but first some background.

The functions of focal attention

A central notion in vision science is that of “picking out” or selecting (also referring, tracking). The usual mechanism for perceptual selection is called selective attention or focal attention.

Why must we select at all? Overview

 We must select because we can’t process all the information available. This is the resource-limitation reason. ○ But in what way (along what dimensions) is it limited? What happens to what is not selected? The “filter theory” has many problems.

 We need to select because certain patterns cannot be computed without first marking certain special elements (e.g., in counting).
 We need to select in order to track the identity of individual things, e.g., to solve the correspondence problem by identifying tokens in order to establish the equivalence of this (t=i) and this (t=i+ε).
 We need to select because of the way relevant information in the world is packaged. This leads to the Binding Problem. That’s an important part of what I will discuss in this talk.

Broadbent’s Filter Theory
(illustrating the resource-limited account of selection)

[Diagram: sensory input passes through a limited-capacity channel, with a rehearsal loop, a store of conditional probabilities of past events (in LTM), a motor planner, and effectors]

Broadbent, D. E. (1958). Perception and Communication. London: Pergamon Press.

Attention and Selection

The question of what the basis for selection is has been at the bottom of a lot of controversy in vision science. Some options that have been proposed include:
 We select what can be described physically (i.e., by “channels”) – we select transducer outputs, e.g., by frequency, color, shape, or location.
 We select according to what is important to us (e.g., affordances – Gibson), or according to phenomenal salience (William James).
 We select what we need to treat as special or what we need to refer to – selecting as “marking”.

Consider the options for what is the basis of visual selection

 The most obvious answer to what we select is places or locations. We can select most other properties by their location – e.g., we can move our eyes so our gaze lands on different places.
 Must we always move our eyes to change what we attend to? ○ Studies of Covert Attention-Movement: Posner (1980)

Other empirical questions about place selection:
• When places are selected, are they selected automatically or can they be selected voluntarily?
• How does the visual system specify where to move attention to?
• Are there restrictions on what places we can select?
• Are selected places punctate or can they be regions?
• Must selected places be filled or can they be empty places?
• Can places be specified in relation to landmark objects (e.g., select the place halfway between X and Y)?

Covert movement of attention

[Figure: a fixation frame, then a cue, a target–cue interval, and a detection target, at cued vs. uncued locations]

Example of an experiment using a cue-validity paradigm to show that the locus of attention moves without eye movements, and to estimate its speed.

Posner, M. I. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology, 32, 3-25.

Extension of Posner’s demonstration of attention switching: Does the improved detection at intermediate locations entail that the “spotlight of attention” moves continuously through empty space?

Sperling & Weichselgartner argued that this apparently analog movement is best explained by a quantal mechanism. The theory assumes a quantal jump in attention in which the spotlight pointed at location -2 is extinguished and, simultaneously, the spotlight at location +2 is turned on. Because extinction and onset take a measurable amount of time, there is a brief period when the spotlights partially illuminate both locations simultaneously.
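The quantal account can be put as a toy model: the old spotlight fades out while the new one fades in, with no locus ever centred in between. The Gaussian profiles, ramp timing, and locations below are my illustrative assumptions, not parameters from Sperling & Weichselgartner’s paper.

```python
import math

def spotlight(x, center, width=1.0):
    """Gaussian illumination profile of a spotlight centred at `center`."""
    return math.exp(-((x - center) ** 2) / (2 * width ** 2))

def illumination(x, t, tau=1.0, src=-2.0, dst=2.0):
    """Quantal switch: the spotlight at `src` is extinguished while the one
    at `dst` turns on. Extinction and onset each take time `tau`, so for a
    while both endpoints are partially lit, yet no spotlight is ever
    centred at any intermediate location."""
    off = max(0.0, 1.0 - t / tau)  # fading weight of the old spotlight
    on = min(1.0, t / tau)         # growing weight of the new spotlight
    return off * spotlight(x, src) + on * spotlight(x, dst)
```

Midway through the switch (t = tau/2) both -2 and +2 receive partial illumination, which mimics facilitation at intermediate moments without any continuous movement of an attentional locus.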

Could objects, rather than places, be the basis for selection?

 An independently motivated alternative is that selection occurs when token perceptual objects are individuated.
 Individuation involves distinguishing something from all the things it is not. In general, individuation involves appealing to properties of the thing in question (cf. Strawson). ○ But a more primitive type of individuation, or perceptual parsing, may be computed in early vision.
 Primitive Individuation (PI) may be automatic. ○ PI is associated with transients or the appearance of a new object. ○ PI is sometimes accompanied by assignment of a deictic reference or FINST that keeps individuals distinct without encoding their properties (nonconceptual individuation). This indexing process is, however, numerically limited (to about 4 objects). [* More later] ○ Individuation is often accompanied by the creation of an Object File (OF) for that individual, though the OF may remain empty.

Some empirical evidence for object-based selection and indexing

General empirical considerations
 Individuals and patterns – the need for argument-binding
 Examples: subitizing, collinearity and other relational judgments

Experimental demonstrations
 Single-object advantage in joint judgments
 Evidence that whole enduring objects are selected
 Multiple-Object tracking

Clinical/neuroscience findings


Individuals and patterns

 Vision does not recognize patterns by applying templates but by parsing the pattern into parts – recognition-by-parts (Biederman).
 A pattern is encoded over time (and over eye movements), so the visual system must keep track of the individual parts and recognize them as the same objects at different times and stages of encoding.
 Individuating is a prerequisite for recognition of configurational properties (patterns) defined among several individual parts.
 An example of how we can easily detect patterns if they are defined over a small enough number of parts is subitizing.
 In order to recognize a pattern, the visual system must pick out individual parts and bind them to the representation being constructed.
 Examples include what Ullman called “visual routines”.
 Another area where the concept of an individual has become important is cognitive development, where it is clear that babies are sensitive to the numerosity of individual things in a way that is independent of their perceptual properties.

Are there collinear items (n>3)?

Several objects must be picked out at once in making relational judgments
 The same is true for other relational judgments like inside or on-the-same-contour, etc. We must pick out the relevant individual objects first. Respond: Inside same contour? On same contour?

Another example: Subitizing vs. Counting. How many squares are there?

Subitizing is fast, accurate, and only slightly dependent on how many items there are. Only the squares on the right can be subitized. Concentric squares cannot be subitized because individuating them requires a curve-tracing operation that is not automatic. Signature subitizing phenomena appear only when objects are automatically individuated and indexed.

Trick, L. M., & Pylyshyn, Z. W. (1994). Why are small and large numbers enumerated differently? A limited-capacity preattentive stage in vision. Psychological Review, 101(1), 80-102.
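The subitizing/counting split can be illustrated with a toy latency model under the indexing account: items covered by one of the ~4 preattentive indexes are enumerated at a shallow per-item cost, while any remainder must be counted serially at a much steeper cost. All timing parameters below are illustrative assumptions, not estimates from Trick & Pylyshyn’s data.

```python
def enumeration_rt(n, n_indexes=4, subitize_ms=50, count_ms=300, base_ms=400):
    """Toy enumeration-latency model: the first `n_indexes` items are
    subitized (shallow slope); items beyond the index limit are counted
    serially (steep slope). Parameters are illustrative, not fitted."""
    indexed = min(n, n_indexes)        # items picked out by indexes
    counted = max(0, n - n_indexes)    # items requiring serial counting
    return base_ms + subitize_ms * indexed + count_ms * counted
```

The model reproduces the signature discontinuity: a near-flat latency function up to about four items, then a sharp increase in slope.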

Some empirical evidence for object-based selection

General empirical considerations
 Individuals and patterns – the need for argument-binding
 Examples: subitizing, collinearity and other relational judgments

Some experimental demonstrations
 Single-object advantage in joint judgments
 Evidence that whole enduring objects are selected
 Multiple-Object tracking

Clinical/neuroscience findings

Single-object superiority occurs even when the shapes are controlled

Instruction: Attend to the red objects.
 Which vertex is higher, left or right?
(Note: there are now many control studies that eliminate the most obvious confounds.)

Attention spreads over perceived objects

[Figure: depending on how the display is parsed into objects, the priming benefit spreads to B and not C, or to C and not B]

Using a priming method, Egly, Driver & Rafal (1994) showed that the effect of a prime spreads to other parts of the same visual object more than to equally distant parts of different objects.

We can select a shape even when it is intertwined among other similar shapes. Are the green items the same? On a surprise test at the end, subjects were not able to recall shapes that had been present but had not been attended in the task (Rock & Gutman, 1981; DeSchepper & Treisman, 1996).

Further evidence that attention is object-based comes from the finding that various attention phenomena move with moving objects. Once an object is selected, the selection appears to remain with the object as it moves.

Inhibition of return appears to be object-based
 Inhibition-of-return (IOR) is the phenomenon whereby attention is slow to return to an object that had been attended about 0.7–1.0 s before.
 It is thought to help in visual search, since it prevents previously visited objects from being revisited.
 Tipper, Driver & Weaver (1991) showed that IOR moves with the inhibited object.

IOR appears to be object-based (it travels with the object that was attended).

Objects endure despite changes in location; and they carry their history with them!

Object File Theory of Kahneman & Treisman

[Figure: letters appear briefly in boxes; the boxes move; a letter then reappears in the same or a different box]

Letters are faster to read if they reappear in the same box where they appeared initially: priming travels with the object. According to the theory, when an object first appears a file is created for it, and the properties of the object are encoded and subsequently accessed through this object file.

Some empirical evidence for object-based selection

General empirical considerations
 Individuals and patterns – the need for argument-binding
 Examples: subitizing, collinearity and other relational judgments

Experimental demonstrations
 Single-object advantage in joint judgments
 Evidence that whole enduring objects are selected
 Multiple-Object tracking studies (later)

Clinical/neuroscience findings
 Visual neglect
 Balint syndrome & simultanagnosia

Visual neglect syndrome is object-based

When a right-neglect patient is shown a dumbbell that rotates, the patient continues to neglect the object that had been on the right, even though it is now on the left (Behrmann & Tipper, 1999).

Simultanagnosic (Balint syndrome) patients attend to only one object at a time

Simultanagnosic patients cannot judge the relative length of two lines, but they can tell that a figure made by connecting the ends of the lines is not a rectangle but a trapezoid (Holmes & Horax, 1919).

Balint patients attend to only one object at a time – even if the objects are overlapping! (Luria, 1959)

Some empirical evidence for object-based selection

Some general empirical considerations
 Individuals and patterns – the need for argument-binding
 Examples: subitizing, collinearity and other relational judgments

Some direct experimental demonstrations
 Single-object advantage in joint judgments
 Evidence that whole enduring objects are selected
 Multiple-Object tracking studies

Clinical/neuroscience findings

Multiple Object Selection

 One of the clearest cases illustrating object-based selection is Multiple Object Tracking.
 Keeping track of individual objects in a scene requires a mechanism for individuating, selecting, accessing, and tracking the identity of individuals over time.
 These are the functions we have proposed are carried out by the mechanism of visual indexes (FINSTs).
 We have been using a variety of methods for studying visual indexing, including subitizing, subset selection for search, and Multiple Object Tracking (MOT).

Multiple Object Tracking

 In a typical experiment, 8 simple identical objects are presented on a screen and 4 of them are briefly distinguished in some visual manner – usually by flashing them on and off.
 After these 4 “targets” have been briefly identified, all objects resume their identical appearance and move randomly. The subjects’ task is to keep track of which ones had earlier been designated as targets.
 After a period of 5-10 seconds the motion stops and subjects must indicate, using a mouse, which objects were the targets.
 People are very good at this task (80%-98% correct). The question is: How do they do it?

Keep track of the objects that flash

How do we do it? Do we encode and update locations serially?

Keep track of the objects that flash

How do we do it? What properties of individual objects do we use?

Explaining Multiple Object Tracking

 Basic finding: People (even 5-year-old children) can track 4 to 5 individual objects that have no unique visual properties. How is it done?
 Can it be done by keeping track of the only distinctive property of the objects – their location? ○ Based on the assumption of a finite attention-movement speed, our modeling suggests that this cannot be done by encoding and updating locations (because of the speed at which the objects move and the distances between them). ○ If tracking is not done by using the only uniquely distinguishing property of the objects, then it must be done by tracking their historical continuity as the same individual objects.
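The modeling claim can be illustrated with a toy simulation of a serial location-updating strategy: one stored location is refreshed per time step and re-acquired by proximity to the nearest object. The geometry, speeds, and visit rate below are arbitrary assumptions for illustration, not the parameters of the actual model.

```python
import random

def serial_location_tracking(n_targets=4, n_distractors=4,
                             speed=0.05, steps=200, seed=1):
    """Toy model of MOT done by encoding and serially updating locations.
    On each time step every object takes a random step in the unit square;
    attention then visits ONE stored target location and snaps it to the
    nearest currently visible object (targets are objs[0..n_targets-1]).
    Returns the fraction of stored locations still latched onto true
    targets at the end. All parameters are illustrative."""
    rng = random.Random(seed)
    objs = [[rng.random(), rng.random()]
            for _ in range(n_targets + n_distractors)]
    stored = [list(objs[i]) for i in range(n_targets)]
    latched = list(range(n_targets))
    for t in range(steps):
        for o in objs:  # all objects drift, clamped to the unit square
            for d in range(2):
                o[d] = min(1.0, max(0.0, o[d] + rng.uniform(-speed, speed)))
        i = t % n_targets  # finite visit rate: one stored location per step
        j = min(range(len(objs)),
                key=lambda k: (objs[k][0] - stored[i][0]) ** 2
                            + (objs[k][1] - stored[i][1]) ** 2)
        stored[i], latched[i] = list(objs[j]), j  # re-acquire nearest object
    return sum(1 for j in latched if j < n_targets) / n_targets
```

With stationary objects the strategy is perfect; as object speed grows relative to the visit rate, stored locations go stale between visits and get captured by distractors — which is the sense in which location updating alone cannot carry the task.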

If we are not using objects’ locations, then how are we tracking them?

 Our independently motivated hypothesis is that a small number of objects (e.g., 4-5) are individuated and reference tokens or indexes are assigned to them.
 An index keeps referring to the object as the object changes its properties and its location (that’s what makes it the same object!).
 An object is not selected or tracked by using an encoding of any of its properties. It is picked out nonconceptually, just the way a demonstrative does in language (i.e., this, that).
 Although some physical properties must be responsible for the individuation and indexing of an object, we have data showing that these properties are not encoded, and the properties that are encoded need not be used in tracking.

What has this to do with the Binding Problem?

 First I will introduce the binding problem as it appears in psychology.

The role of selection in encoding conjunctions of properties (the binding problem)
 The binding problem was initially described by Anne Treisman, who showed conditions under which vision may fail to correctly bind conjunctions of properties (resulting in conjunction illusions).
 Feature binding requires focal attention (i.e., selection).
 The problem has been of interest to philosophers because it places constraints on how information may be encoded in early vision (or, as Clark would put it, ‘at the sensory level’ or nonconceptually).
 I introduce the binding problem to show how the object-based view is essential for its solution.

Introduction to the Binding Problem: Encoding conjunctions of properties
 Experiments show the special difficulty that vision has in detecting conjunctions of several properties.
 It seems that items have to be attended (i.e., individuated and selected) in order for their property-conjunctions to be encoded.
 When a display is not attended, conjunction errors are frequent.

Read the vertical line of digits in this display. What were the letters and their colors?

This is what you saw briefly… Under these conditions, conjunction errors are very frequent.

Encoding conjunctions requires selection

 One source of evidence comes from search experiments:
 Single-feature search is fast and appears to be independent of the number of items searched through (suggesting it is automatic and ‘pre-attentive’).
 Conjunction search is slower, and the time increases with the number of items searched through (suggesting it requires serial scanning of attention).
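The contrast can be sketched as follows. This is a caricature, not a model of the actual mechanisms, and the item tuples and feature names are invented for illustration: feature presence can be read off a single feature map in one query, while checking a conjunction forces a serial, self-terminating scan.

```python
def popout(items, feature_index, target_value):
    """Feature search: one query against a 'feature map' — presence/absence
    can be decided without visiting items one by one, so the decision is
    set-size independent."""
    return target_value in {item[feature_index] for item in items}

def conjunction_search(items, target):
    """Conjunction search: attention visits items serially, binding each
    item's features before checking the conjunction. Returns (found,
    items_examined); the examined count grows with display size."""
    for examined, item in enumerate(items, start=1):
        if item == target:
            return True, examined
    return False, len(items)
```

For example, with display = [("green", "vertical"), ("green", "tilted"), ("red", "tilted")], popout(display, 0, "red") succeeds in one map query, while conjunction_search(display, ("red", "vertical")) must examine all three items before answering.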

Rapid visual search (Treisman). Find the following simple figure in the next slide:

This case is easy – and the time is independent of how many nontargets there are – because there is only one red item. This is called a ‘popout’ search

This case is also easy – and the time is independent of how many nontargets there are – because there is only one right-leaning item. This is also a ‘popout’ search.

Rapid visual search (conjunction). Find the following simple figure in the next slide:

Feature Integration Theory and feature binding

Treisman’s attention as glue hypothesis: focal attention (selection) is needed in order to bind properties together.
 We can recognize not only the presence of “squareness” and “redness”; we can also distinguish between different ways they may be conjoined: • red square and green circle vs. green square and red circle.
 The evidence suggests that conjoined properties are encoded only if they are attended, or selected.
 Notice that properties are considered to be conjoined if and only if they are properties of the same object – so it is objects that must be selected!

Constraints on nonconceptual representation of visual information (and the binding problem)

 Because early (nonconceptual) vision must not lose the conjunctive grouping of properties, visual properties can’t just be represented as being present in the scene – because then the binding problem could not be solved!
 What else is required?
 The most common answer is that each property must be represented as being at a particular location. According to Peter Strawson and Austin Clark, the basic unit of sensory representation is Feature F at location L. This is the global map or feature placing proposal.
 This proposal fails for interesting empirical reasons. But if feature placing is not the answer, what is?

The role of attention to location in Treisman’s Feature Integration Theory

[Diagram: the original input feeds separate color, shape, and orientation feature maps (R, Y, G); an attention “beam” directed at the master location map conjoins the features found at the attended location, and the conjunction is detected]

But in encoding properties, early vision can’t just bind them together according to their spatial co-occurrence – not even their co-occurrence within the same region. That’s because the relevant region depends on the object. So the selection and binding must be according to the objects that have those properties.

If the location of properties will not give us a way of solving the binding problem, what will?
 This is why we need object-based selection, and why the object-based attention literature is relevant…

An alternative view of how we solve the binding problem

 If we assume that only the properties of indexed objects (of which there are about 4-5) are encoded, and that these are stored in object files associated with each object, then properties that belong to the same object are stored in the same object file – which is why they get bound together.
 This automatically solves the binding problem!
 This is the view exemplified by both FINST Theory (1989) and Object File Theory (1992).
 The assumption that only the properties of indexed objects are encoded raises the question of what happens to the properties of the other (unindexed) objects in a display. The logical answer is that they are not encoded and therefore not available to conceptualization and cognition. But this is counterintuitive!
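The object-file solution can be sketched as a data structure. This is my illustration (with invented feature names), not the authors’ formalism: features are entered into the file of the object they came from, so co-membership in one file just is binding, and only the ~4 indexed objects get files at all.

```python
MAX_INDEXES = 4  # FINST limit: roughly four or five objects can be indexed

def encode_scene(objects):
    """Open an object file for each indexed object and store its features
    there. `objects` is a list of feature dicts; objects beyond the index
    limit get no file, so their properties are never conceptually encoded."""
    return {idx: dict(props) for idx, props in enumerate(objects[:MAX_INDEXES])}

def conjoined(files, feature_a, feature_b):
    """The conjunction 'a and b' is perceived iff some single object file
    contains both features — binding falls out of file membership, with no
    location bookkeeping required."""
    return any(feature_a in f.values() and feature_b in f.values()
               for f in files.values())
```

For a scene containing a red square and a green circle, the conjunction “red square” is recoverable while the illusory conjunction “red circle” is not, because the two features never share a file.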

An intriguing possibility…

 Maybe we see far less than we think we do!
 This possibility has received a great deal of recent attention with the discovery of various ‘blindnesses’, such as change blindness and inattentional blindness.

The assumption that no properties other than properties of indexed objects can be encoded is in conflict with strong intuitions – namely, that we see much more than we conceptualize and are aware of. So what do we do about the things we “see” but do not conceptualize?

 Some philosophers say they are represented nonconceptually.
 But what makes this a nonconceptual representation, as opposed to just a causal reaction? ○ At the very minimum, postulating that something is a representation must allow generalizations to be captured over its content that would otherwise not be available. ○ Traditionally, representations are explanatory because they account for the possibility of misrepresentation, and they also enter into conceptualizations and inferences. But unselected objects and unencoded properties don’t seem to fit this requirement (or do they?)

Maybe information about non-indexed objects is not represented at all!!

 A possible view (which I am not prepared to fully endorse yet) is that certain topographical or biological reactions (e.g., retinal activity) are not representations – because they have no truth values and so cannot misrepresent.
 One must distinguish between causal and represented properties. Properties that cause objects to be indexed and tracked, and that result in object files being created, need not be encoded and made available to cognition.

Is this just terminological imperialism?

 If we call all forms of patterned reactions representations, then we will need a further distinction among types within this broader class of representation.
 We may need to distinguish between personal and subpersonal types of ‘representation’, with only the former being representations for our purposes.
 We may also need to distinguish between patterned states within an encapsulated module that are not available to the rest of the mind/brain and those that are available. ○ Certain patterned causal properties may be available to motor control – but does that make them representations?
 An essential diagnostic is whether reference to content – to what is represented – allows generalizations that would otherwise be missed; and that, in turn, suggests that there is no representation without misrepresentation. ○ We don’t want to count retinal images as representations because they can’t misrepresent, though they can be misinterpreted later.

What next?

 This picture leaves many unanswered questions, but it does provide a mechanism for solving the binding problem, and also for explaining how mental representations could have a nonconceptual connection with objects in the world (something required if mental representations are to connect with actions).

The End

 … except for a few loose ends …

Can objects be individuated but not indexed? A new twist to this story

 We have recently obtained evidence that objects that are not tracked in MOT are nonetheless inhibited, and the inhibition moves with them.
 It is harder to detect a probe dot on an untracked object than on either a tracked object or empty space!
 But how can inhibition move with a nontarget when the space through which the nontargets move is not inhibited? Doesn’t this require the nontargets to be tracked?

The beginnings of the puzzle of clustering prior to indexing, and what that might mean!

 If moving objects are inhibited, then inhibition moves along with the objects. How can this be unless they are being tracked? And if they are being tracked, there must be at least 8 FINSTs!
 This puzzle may signal the need for a kind of individuation that is weaker than the individuation we have discussed so far – a mere clustering, circumscribing, figure-ground distinction without a pointer or access mechanism, i.e., without reference!
 It turns out that such a circumscribing-clustering process is needed to fulfill many different functions in early vision. It is needed whenever the correspondence problem arises – whenever visual elements need to be placed in correspondence or paired with other elements. This occurs in computing stereo, apparent motion, and other grouping situations in which the number of elements does not affect the ease of pairing (or even results in faster pairing when there are more elements). Correspondence is not computed over continuous visual manifolds but only over pre-clustered elements.

Example of the correspondence problem for apparent motion: The grey disks correspond to the first flash and the black ones to the second flash. Which of the 24 possible matches will the visual system select as the solution to this correspondence problem? What principle does it use?

Curved matches Linear matches
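A greedy nearest-neighbour matcher gives a first-pass sketch of one candidate principle (proximity). This is my illustration, not a claim about the principle the visual system actually uses; note that the matcher only makes sense over discrete, pre-clustered elements, never over a continuous manifold.

```python
def correspondence(frame1, frame2):
    """Greedy nearest-neighbour solution to the motion correspondence
    problem: pair each element of the first flash with the closest
    still-unmatched element of the second flash. Returns a list of
    (index_in_frame1, index_in_frame2) pairs."""
    unmatched = list(range(len(frame2)))
    pairs = []
    for i, (x1, y1) in enumerate(frame1):
        j = min(unmatched,
                key=lambda k: (frame2[k][0] - x1) ** 2
                            + (frame2[k][1] - y1) ** 2)
        unmatched.remove(j)  # each element can be paired only once
        pairs.append((i, j))
    return pairs
```

For two disks that each shift slightly rightward between flashes, the matcher pairs each grey disk with its nearby black successor rather than the distant one.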

Here is how it actually looks

Views of a dome

Structure from Motion Demo – Cylinder (Kinetic Depth Effect)

The correspondence problem for biological motion

FINST Theory postulates a limited number of pointers in early vision that are elicited by causal events in the visual field and that enable vision to refer to things without doing so under a concept or a description.