Attention, Selection and
Nonconceptual Reference
An empirically-motivated proposal concerning
the nonconceptual link between the perceived
world and its conceptual representation
Zenon Pylyshyn, Rutgers Center for Cognitive Science
Focal attention: What is it for?
The principal function of focal attention is to select.
But why do we need to select?
1. We must select because our capacity to process
information is limited. This is a widely accepted reason.
2. We also must select because we need to be able to mark
certain tokens in the perceived world and to refer to the
marked tokens qua individuals (e.g., as in counting things).
• We need to select individuals in order to refer to them when
we detect relational properties among them (Collinear,
Inside, Part-of, Connected-to, ...).
• Binding predicate arguments to things in the world
…What is focal attention for?
3. Another important reason for selection is that it provides
a way to collect properties at the earliest nonconceptual
stage of perception – and thus a way to solve
the binding problem

That’s what this talk is mostly about

But first some background about the need for
nonconceptual selection
We need to be able to pick out sensory individuals
directly – without mediation of concepts

We need to make nonconceptual contact with the world
through perception in order to stop the regress of
concepts being defined in terms of other concepts which
are defined in terms of still other concepts
• Sometimes called the symbol grounding problem

My proposal is that nonconceptual selection of
individual objects is the primitive basis for all
conceptualization and predication
• The argument for nonconceptual selection of token
objects as the primitive operation is primarily empirical.
• I begin with a personal experience in developing a
model for reasoning about geometry by drawing a
diagram and noticing properties of the diagram
Begin by drawing a line….
Now draw a second line….
And draw a third line….
Notice what you have so far….
(noticings are local – you encode what you attend to)
There is an intersection of two lines…
But which of the two lines you drew are they?
There is no way to indicate which individual
things are seen again unless there is a way to
refer to individual things
Look around some more to see what is there ….
Here is another intersection of two lines…
Is it the same intersection as the one seen earlier?
To be able to tell without a reference to individuals you
would have to encode unique properties of the
individual lines. Which properties should you encode?
Footnote
Notice that in the previous example it would not help
if you labeled the diagram as you drew it. Why not?
• Because to refer to the line with label L1 you would
have to be able to think “This is line L1” and you could
not think that unless you had a way to think “this”, and
the label would not help you do that!
• Being able to think “this” is another way to view the
very problem I will be concerned with in this talk. You
need an independent way to pick out and refer to an
individual element – even if it is labeled! (I will also
provide evidence that you need to do this for several
individuals simultaneously).

This is exactly the point of Kaplan’s and Perry’s claim
about the “essential indexical”
The requirements for picking out individual things
and keeping track of them reminded me of an
early comic book character called “Plastic Man”
Imagine being able to place several of your fingers on things
in the world without being able to detect their properties in
this way, but being able to refer to those things so you could
move your gaze or attention to them. If you could do that
you would possess FINgers of INSTantiation = FINSTs!
Outline of remainder of this talk
Selection: What is selected?
• Places vs ‘Objects’ (Posner & analogue attention movement)
• Evidence in favor of object-based selection
Selection and demonstrative reference
• Multiple selection
• FINST Theory and Object Files
Multiple Object Tracking (MOT) and FINST Indexes
as direct (non-conceptually-mediated) reference
Selection and the Binding Problem
Implications for philosophical ideas about individuals,
tracking and nonconceptual representation
Covert movement of attention
[Figure: trial sequence (fixation frame, cue, target-cue interval, detection target *) for cued and uncued trials]
Example of an experiment using a cue-validity paradigm for showing that the
locus of attention moves without eye movements and for estimating its speed.
Posner, M. I. (1980). Orienting of Attention. Quarterly Journal of Experimental Psychology, 32, 3-25.
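The cue-validity logic reduces to a difference score: detection is faster at the cued location, and the size of that benefit can be compared across cue-target intervals. A minimal Python sketch, assuming hypothetical trial records of the form (interval_ms, cued, rt_ms); the function name and data layout are illustrative, not Posner's actual analysis:

```python
from statistics import mean


def cueing_effect(trials):
    """Mean uncued RT minus mean cued RT, for each cue-target interval.

    trials: iterable of (interval_ms, cued, rt_ms) records (hypothetical
    layout); cued is True when the detection target appeared at the cued
    location. Intervals lacking either cued or uncued trials are skipped.
    """
    cells = {}
    for interval, cued, rt in trials:
        cells.setdefault(interval, {True: [], False: []})[cued].append(rt)
    return {
        interval: mean(cell[False]) - mean(cell[True])
        for interval, cell in sorted(cells.items())
        if cell[True] and cell[False]
    }


# Made-up numbers, purely to show the computation.
trials = [(50, True, 300), (50, False, 310), (150, True, 280), (150, False, 320)]
print(cueing_effect(trials))  # {50: 10, 150: 40}
```

Comparing this benefit across intervals is what lets such an experiment estimate how soon after the cue attention is available at the cued location.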
Extension of Posner’s demonstration of attention switch
Does the improved detection in intermediate locations entail that the “spotlight of
attention” moves continuously through empty space?
Enhancement of intermediate locations does not
require a continuous analogue movement of
attention through empty space
• When attention is attracted by an onset event, the
appearance of analog movement of focal attention
can be explained by a punctate (quantal) theory of
attention-switching

Sperling & Weichselgartner (1995) – an episodic theory of attention shift
This raises the possibility that in shifting between two
objects, attention does not actually move through
empty space
• Maybe attention is allocated to objects rather than
locations?
Evidence for Objects as the basis for selection

Single Object Advantage: pairs of judgments are faster when
both judgments concern the same perceived object

Entire objects acquire enhanced sensitivity from the allocation of
focal attention to part of the object

Single-Object advantage occurs even with generalized “objects”
that move through feature space (Blaser & Pylyshyn, 2000)

Clinical (brain damage) syndromes such as Simultanagnosia
and Hemispatial Neglect show an object-based character

Attention moves with Moving Objects
• Inhibition of Return (IOR)
• Object Files
• Multiple Object Tracking (MOT) (and generalization to movement in
feature space)
Single-object superiority even
when the shapes are controlled
There are a large number of published experiments showing that when
several perceptual judgments are made they are faster when they
pertain to the same object, even when all other factors are controlled
*Baylis, G. C., & Driver, J. (1993). Visual attention and objects: Evidence for hierarchical coding of
location. Journal of Experimental Psychology: Human Perception and Performance, 19, 451-470
Attention spreads over perceived objects
[Figure: displays with regions labeled A, B, C and D; depending on how the regions are grouped into objects, attention spreads to B and not C, or to C and not B]
Using a priming method, Egly, Driver & Rafal (1994) showed that the effect of a prime spreads to
other parts of the same visual object more than to equally distant parts of different objects.
Objecthood endures over space-time

Several studies have shown that what counts as the
same object endures over time and location:
• Object-specific priming (Kahneman; Scholl)
• Inhibition of return is object-based
• Certain forms of disappearance-reappearance preserve
objecthood
• Multiple Object Tracking (MOT) (Scholl, Keane)
• Apparent motion (Kolers, Yantis)
• Tunnel Effect (Michotte, 1953; Flombaum & Scholl, 2006)
This identity constancy gives “visual objects” a real
physical-object character and is one of the reasons why
psychologists refer to them as “objects”.
Objects endure despite changes in location;
and they carry their history with them!
Object File Theory of Kahneman & Treisman
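[Figure: frames 1, 2 and 3 of the display; letters A and B first appear in boxes, and the letter A appears again later]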
Letters are faster to read if they appear in the same box in which they had
appeared initially. Priming travels with the object. According to the theory,
when an object first appears, a file is created for it and the properties of the
object are encoded and subsequently accessed through this object-file.
Inhibition of return appears to be object-based

Inhibition-of-return is thought to help in visual
search since it prevents previously visited objects
from being revisited
• The original study used static objects. Then
Tipper, Driver & Weaver (1991) showed that IOR
moves with the inhibited object.
IOR appears to be object-based (it travels
with the object that was attended)
There is also evidence from clinical studies
supporting object-based selection

• Hemispatial Neglect (moves with neglected objects)
• Balint and simultanagnosia syndromes
An empirical hypothesis: To select is to refer

When we select an object with focal attention we can
then refer to it. Consequently we can, e.g.,
• Entertain thoughts about it (“this is red”)
• Carry out certain actions towards it (e.g., move our gaze to it)

Since we can select several (n ≤ 4) objects at once:
• We can have demonstrative thoughts about several objects
(e.g., “this₁ is above this₂”)
• Having selected several objects, we can evaluate predicates
over them or move focal attention to them

• We can also subitize them or search through them (see the experiments that follow)
• We can keep track of selected objects if we or they move
unpredictably or change their properties (MOT)
Pick out 3 dots I will cue and keep track of them
• In a field of identical elements you can select several of them and move
your attention among them (e.g., “move one up” or “move 2 right”, etc.)
so long as at no time do you have to hold on to more than 3 or 4 dots
Subset selection for search
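[Figure: example displays in which a subset of items is cued (+) and the target is searched for within that subset; single-feature search vs conjunction-feature search]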
Burkell, J., & Pylyshyn, Z. W. (1997). Searching through subsets: A test of the visual indexing hypothesis. Spatial Vision,
11(2), 225-258.
Subset search results:
• Only properties of the subset matter
• If the subset forms a single-feature search set, search is fast and parallel
• If the subset forms a conjunction search set, finding the target
takes longer and is a serial search (RT increases with set size)
• The distance between targets does not matter, so
observers don’t seem to be scanning the display looking
for the target but can switch their attention directly to the
subset items.
• This finding supports the claim that we have a small
number of FINST indexes that can be captured by sudden
onsets and can serve to direct focal attention
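The subset-search pattern can be made concrete with a toy model in which only the cued (indexed) items are ever examined. A minimal Python sketch, assuming a hypothetical (color, shape) description of each item and made-up timing parameters (base_ms, slope_ms) that are not fitted to Burkell & Pylyshyn's data:

```python
def predicted_search_rt(subset, target, base_ms=450.0, slope_ms=40.0):
    """Toy RT prediction when only the cued subset is searched.

    subset: list of (color, shape) tuples for the cued items, target included.
    target: the (color, shape) tuple of the target.
    base_ms and slope_ms are hypothetical parameters, not fitted values.
    """
    distractors = [item for item in subset if item != target]
    # Single-feature search: the target differs from every cued distractor
    # on one dimension, so RT does not depend on how many items are cued.
    differs_in_color = all(d[0] != target[0] for d in distractors)
    differs_in_shape = all(d[1] != target[1] for d in distractors)
    if differs_in_color or differs_in_shape:
        return base_ms
    # Conjunction search: items are examined serially, so RT grows
    # linearly with the size of the cued subset.
    return base_ms + slope_ms * len(subset)


feature_subset = [("red", "X"), ("green", "X"), ("green", "X")]
conjunction_subset = [("red", "X"), ("green", "X"), ("red", "O")]
print(predicted_search_rt(feature_subset, ("red", "X")))      # 450.0
print(predicted_search_rt(conjunction_subset, ("red", "X")))  # 570.0
```

Uncued items are never passed in, and item positions are never consulted, which mirrors the findings that only properties of the subset matter and that distance between the subset items does not.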
Individuals and patterns

Vision does not recognize patterns by applying templates
but rather by decomposing them into parts
• Recognition-By-Parts (Biederman, 2000)

A pattern is encoded over time (and often over different
views separated by saccades), so the visual system must
keep track of the individual parts and merge descriptions
of the same part at different times and stages of encoding
• In recognizing a pattern, the visual system must pick out
individual parts and bind them to the representation being
constructed
Are there collinear items (n>3)?
In order for us to judge that there are collinear items, there
must be a way to specify which ones are being judged
Several objects must be picked out at once
in making relational judgments

The same is true for other relational judgments like inside or on-the-same-contour, etc. We must pick out the relevant individual objects first.
Respond: Inside the same contour? On the same contour?
When items cannot be individuated, predicates
over them cannot be evaluated
● Do these figures contain one or two distinct curves?
● Individuating these curves requires a “curve tracing”
operation, so Number_of_curves (C1, C2, …) takes
time proportional to the length of the shortest curve.
The figure on the left is one continuous curve, the one
on the right is two distinct curves – as shown in color.
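Why counting curves requires tracing can be illustrated with ordinary connected-component labeling (flood fill), swapped in here as a stand-in for the visual curve-tracing operation rather than a model of it. A minimal Python sketch, assuming a hypothetical image format (a list of strings with '#' for curve pixels); its step count is the total number of curve pixels traced:

```python
from collections import deque


def count_curves(grid):
    """Count distinct curves in a binary image by tracing each one.

    grid: list of equal-length strings; '#' marks a curve pixel.
    Returns (number_of_curves, steps), where steps counts the pixels
    visited while tracing, so the work grows with curve length.
    """
    rows, cols = len(grid), len(grid[0])
    seen = set()
    curves = steps = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != '#' or (r, c) in seen:
                continue
            curves += 1  # an untraced curve pixel starts a new curve
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                steps += 1
                # 8-connected neighbours, so diagonal strokes stay one curve
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == '#' and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
    return curves, steps


grid = [
    "##..##",
    "......",
    "#####.",
]
print(count_curves(grid))  # (3, 9)
```

In this sketch, whether two pixels belong to the same curve is settled only by following the curve between them, which is why the cost scales with how much curve has to be traced.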
Signature ‘subitizing’ phenomena only appear when
objects are automatically individuated and indexed
Trick, L. M., & Pylyshyn, Z. W. (1994). Why are small and large numbers enumerated differently? A
limited capacity preattentive stage in vision. Psychological Review, 101(1), 80-102.
Demonstrations of MOT

● Basic MOT with repulsion: early MOT with repulsion between items
● MOT with no restrictions: basic MOT without repulsion
● MOT with occluding surfaces: objects can be tracked even if they briefly disappear
● Tracking without keeping track of identities: track these and recall what label they had initially
Explaining Multiple Object Tracking
• Do we track by storing and updating objects’ locations?

Not likely: the possibility that locations of targets are
encoded and updated through serial visitation by focal
attention was excluded in an early study
• This supports the idea that the FINST mechanism
automatically keeps track of objects as long as there are 4
or fewer of them (in other words, indexes are “sticky”).
Other findings using MOT
There have been dozens of studies using MOT with
many surprising findings. Here are a few:
• Tracking performance is not affected if objects continually
change their color or shape during a tracking trial (whether the
change is synchronous or asynchronous)
• Tracking is not disrupted if objects disappear briefly but
totally behind opaque strips, or if they all disappear together
• If objects change their color or shape while occluded, the
change is not noticed
• Targets can be selected automatically (by flashing) and also
voluntarily. If selected voluntarily, they have to be visited
serially (while indexes are “dropped off”)
Review: A FINST is a mechanism that:
1. Picks out and keeps track of about 4 individual distal objects
2. Does so directly – without the mediation of concepts and
without using any encoded property of the indexed objects
• In other words, FINSTs pick out and track objects as individuals
rather than as bearers of certain properties
3. Because FINSTs do not pick out and track individuals as
members of any category (including the category object), their
connection to the world is nonconceptual. Selecting is not an
opaque context (like believing), so it is not “selecting as”
• Consequently an observer may literally not know what he has
selected (although indexes do make it possible for properties of
the objects to be subsequently encoded into Object Files)

Pace John Campbell (2002, p. 134), who holds that “conscious experience of an object
explains how you know the reference of a demonstrative”, I claim
that we may not know the reference of a (perceptual) demonstrative
More on FINSTs
● A FINST is a numerically limited mechanism for selecting
individual visual objects currently in view. It functions just
the way that a pointer functions in a computer data structure:
It provides epistemic access to selected sensory individuals
without representing their locations or any other property;
● Although a FINST does not pick out an object in terms of its
represented properties, there are properties that cause an
index to be assigned (cf. Kripke’s distinction between
properties that fix a referent vs properties of the referent).
There must also be properties (likely different properties) of
objects and trajectories that allow objects to be tracked;
● A FINST is usually captured or grabbed by an object that
suddenly appears. But its attachment to particular items can
be voluntarily enabled by moving unitary focal attention to
the desired objects, thus precipitating the capture of an index
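The pointer analogy can be sketched directly. A minimal Python sketch, assuming a hypothetical FinstPool class with a capacity of 4 (the approximate limit given in the review of FINSTs); Python object references stand in for pointers, and nothing here is a claim about how indexes are realized in the visual system:

```python
class VisualObject:
    """A distal object: its properties belong to the object, not to any index."""

    def __init__(self, x, y, color):
        self.x, self.y, self.color = x, y, color


class FinstPool:
    """Hypothetical pool of FINST-like indexes.

    Each index is a bare reference to an object; the pool stores no
    location, color or other encoded property of what it points to.
    """

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.indexes = []

    def capture(self, obj):
        """Assign a free index to obj (e.g., after a sudden onset).

        Returns the index number, or None if every index is already in use.
        """
        if len(self.indexes) >= self.capacity:
            return None
        self.indexes.append(obj)
        return len(self.indexes) - 1

    def deref(self, i):
        """Follow index i back to the object it picks out ("this_i")."""
        return self.indexes[i]


pool = FinstPool()
dots = [VisualObject(x, 0, "red") for x in range(5)]
print([pool.capture(d) for d in dots])  # [0, 1, 2, 3, None]

dots[0].x, dots[0].color = 40, "green"  # the object moves and changes color
print(pool.deref(0) is dots[0], pool.deref(0).x)  # True 40
```

Because an index holds only a reference, changing the object's position or color leaves the reference intact, and its properties can still be looked up later by dereferencing it. The sketch sidesteps the hard part of tracking: here object identity is supplied by the program, whereas the visual system has to maintain that correspondence itself.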
A fundamental problem of perception:
Encoding conjunctions of properties
Finally this brings me to an important function of
FINST indexes: they provide a solution to the
ubiquitous binding problem in perception
• Since we can distinguish between one combination of
properties and another, early vision (sensation?) cannot
simply announce the presence of properties for which there
are sensors. It must provide additional information that
allows the reconstruction of which properties go with which.
• The almost universal assumption about how this is done is
that in early vision, properties are encoded as being at
particular locations
• Treisman’s Feature Integration Theory
• Strawson’s (and Clark’s) use of Feature Placing Theory
The role of location in Treisman’s Feature Integration Theory
But in encoding properties, early vision can’t just bind them
together according to their spatial co-occurrence – even their co-occurrence within some region. That’s because the relevant
region depends on the object. So the selection and binding must
be according to the objects that have those properties
Binding conjunctions by the location
of conjuncts does not work when feature location is
not punctate, and it becomes even more problematic if
the objects are co-located – e.g., if their relation is “inside”
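The failure of location-based binding for co-located objects can be shown with a toy feature-placing scheme. A minimal Python sketch, assuming each feature is registered at one coarse location (here the shared center of a small square inside a large circle); the function and data layout are hypothetical, not Treisman's model:

```python
def bind_by_location(feature_maps):
    """Conjoin features that are registered at the same location.

    feature_maps maps a feature dimension (e.g. "color") to a list of
    (location, value) pairs. Returns {location: {dimension: set of values}}.
    """
    bound = {}
    for dimension, entries in feature_maps.items():
        for location, value in entries:
            bound.setdefault(location, {}).setdefault(dimension, set()).add(value)
    return bound


# Hypothetical scenes: both figures are centred at (5, 5), so their
# features are registered at the same coarse location.
red_square_in_green_circle = {
    "color": [((5, 5), "red"), ((5, 5), "green")],
    "shape": [((5, 5), "square"), ((5, 5), "circle")],
}
green_square_in_red_circle = {
    "color": [((5, 5), "green"), ((5, 5), "red")],
    "shape": [((5, 5), "square"), ((5, 5), "circle")],
}

# Both scenes yield identical bindings: location alone cannot say
# which color goes with which shape.
print(bind_by_location(red_square_in_green_circle)
      == bind_by_location(green_square_in_red_circle))  # True
```

Once the two figures' features are registered at the same place, the representation no longer distinguishes a red square inside a green circle from a green square inside a red circle.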
An alternative:
In computing conjunctions of properties
attention selects objects since it is objects
that have conjoined properties

Instead of being like a spotlight beam that can be
scanned around a scene, and can be zoomed to cover a
larger or smaller area, maybe attention can only be
directed to occupied places – i.e., to visual objects
• A large experimental literature shows that attention is object-based

This suggests an alternative view of how the binding
problem is solved in early vision – through the prior
selection of perceptual objects
• But selection does not have to depend only on unitary focal
attention. FINSTs allow multiple objects to be selected.
Object Files and the binding problem

Suppose that only properties of indexed objects are
conceptually encoded and that these are stored in
object files associated with each object.
• Then properties that belong to the same object are
stored in the same object file (which may be empty,
as they are in MOT).
• This automatically solves the binding problem since
it connects encoded properties to their visual object

This view comes out of both FINST Theory (Pylyshyn,
1989) and Object File Theory (Kahneman et al., 1992)
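Filing properties under objects rather than locations can be sketched in a few lines. A minimal Python sketch, assuming integer indexes stand in for FINSTs and a plain dictionary stands in for an object file; the ObjectFile class and encode function are illustrative assumptions, not part of either theory's specification:

```python
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    """Encoded properties of one indexed object; starts out empty."""
    properties: dict = field(default_factory=dict)


def encode(object_files, index, dimension, value):
    """File a property under the object picked out by index (a FINST stand-in).

    Properties are keyed by object, not by location, so features of the
    same object end up together even when two objects occupy one place.
    """
    object_files.setdefault(index, ObjectFile()).properties[dimension] = value


# Illustrative use: a red square inside a green circle (hypothetical scene).
files = {}
encode(files, 0, "color", "red")
encode(files, 0, "shape", "square")
encode(files, 1, "color", "green")
encode(files, 1, "shape", "circle")
print(files[0].properties)  # {'color': 'red', 'shape': 'square'}
```

The co-located square and circle still get separate files because the key is the index, not the location; a file that is created but never written to stays empty, as object files are in MOT.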
FINSTs and Object Files form the link
between the world and its conceptualization
Object File contents are conceptual!
[Figure: an information (causal) link and a FINST demonstrative reference link]
Some open questions

We have arrived at the view that only properties of selected
(indexed) objects are conceptually represented and thus enter
into subsequent perception-based thought (i.e., only
information in object files is made available to cognition)
So what happens to the rest of the visual information?

Visual information seems rich and fine-grained while this
theory only allows for properties of 4 or 5 objects to be
represented!
• The present view leaves no room for nonconceptual
representations whose content corresponds to the content of
conscious experience
• According to the present view, the only content that
nonconceptual representations have is the demonstrative
content provided by indexes which refer to perceptual objects
• Question: Why do we need any more than that?
An intriguing speculative possibility….
Maybe the relevant information we represent is less than
(or at least different from) what we experience
• This possibility has received attention recently with the discovery
of various “blindnesses” (e.g., change-blindness, inattentional
blindness, blindsight…) as well as the discovery of independent vision systems (e.g., recognition and motor control)
• The qualitative content of conscious experience may not play a role
in explanations of cognitive processes
• Even if unconceptualized information enters into causal processes
(e.g., motor control), it may not be represented – not even as a
nonconceptual representation. Is retinal activity a representation?
• The bar should be high for something to be a representation because
appeal to representation is not constrained. We should require that its
content figure in explanations and that it capture generalizations. It
must have truth conditions and therefore allow for misrepresentation.
It is an empirical question whether current proposals (e.g., primal
sketch, scenarios) qualify. Michael Devitt calls this “Pylyshyn’s Razor”.
The phenomenology of seeing is misleading
and has led to theoretical dead ends
Many theories of vision and mental imagery have attempted to encompass
the experience of seeing a rich and detailed panorama. But there is no
evidence to support such a representation, whether conceptual or not,
except the conviction that the content of this experience must have a
corresponding representation.
Maybe the content of conscious
experience is not represented at all!
Isn’t how things appear one of the things that our theories
must explain? Answer: There is no a priori ‘must explain’!
● The content of subjective experience is a major source of evidence.
But it may turn out not to be the most reliable source for inferring the
relevant functional states. It competes with other types of evidence.
● How things appear cannot be taken at face value: it carries substantive
theoretical assumptions. It also draws on many levels of processing.
• It was a serious obstacle to early theories of vision (Kepler)
• It has been a poor guide in the case of theories of mental imagery (e.g., color
mixing, image size, image distances). ‘Reading X off an image’ is an illusion.
● It seems likely that vision science will use evidence of conscious
experience the way linguistics uses evidence of grammatical intuitions
– only as it is filtered through developing theories.
• The questions that a science is expected to answer cannot be set in advance – they
change as the science develops.
What next?
This picture leaves many unanswered questions,
but it does provide a mechanism for solving the
binding problem and also explaining how mental
representations could have a nonconceptual
connection with objects in the world (something
required if mental representations are to connect
with actions)

For a copy of these slides see:
http://ruccs.rutgers.edu/faculty/pylyshyn/SelectionReference.ppt

Or MIT Press paperback
The End