Attention, Selection and Nonconceptual Reference An empirically-motivated proposal concerning the nonconceptual link between the perceived world and its conceptual representation Zenon Pylyshyn, Rutgers Center for.


Attention, Selection and
Nonconceptual Reference
An empirically-motivated proposal concerning
the nonconceptual link between the perceived
world and its conceptual representation
Zenon Pylyshyn, Rutgers Center for Cognitive Science
Focal attention: What is it for?
Perceptual selection and perceptual demonstratives
The principal function of focal attention is to select. But why do
we need to select?
1. We must select because our capacity to process information is limited.
2. We also must select because we need to be able to mark certain
tokens in the perceived world and to refer to the marked tokens
qua individuals (e.g., as in counting things).

Another way to put this is that we need to select in order to refer to
things and we need to refer to things whenever we detect relational
properties among them (Collinear, Inside, Part-of, Connected-to, ...)
3. An important reason for early selection is that it provides a way to
group properties appropriately at the earliest (nonconceptual)
stages of perception – and thus to help solve the binding problem

That’s what this talk is about: but first some background
Some background ….
The early origins and motivation for the
view that there is nonconceptual
selection … a personal introduction
Why do we need to be able to pick out
individuals without concepts?

We need to make nonconceptual contact with the world through
perception in order to stop the regress of concepts being defined in
terms of other concepts which are defined in terms of still other
concepts – sometimes called the symbol grounding problem

Sensory transduction appears to be the universal, though typically
tacit, assumption about how grounding occurs, at least in
psychology and artificial intelligence. Yet most concepts cannot be
reduced to sensory transduction.

My proposal is that nonconceptual selection of individual objects
is the primitive basis for all conceptualization and predication.
• The argument for nonconceptual selection of token objects as the
primitive operation is primarily empirical.
• I begin with a personal experience in developing a model for
reasoning about geometry by drawing a diagram.
Begin by drawing a line….
Now draw a second line….
And draw a third line….
Notice what you have so far….
(noticings are local – you encode what you attend to)
There is an intersection of two lines…
But which of the two lines you drew are they?
There is no way to indicate which individual
things are seen again unless there is a way to
refer to individual things
Look around some more to see what is there ….
Here is another intersection of two lines…
Is it the same intersection as the one seen earlier?
To be able to tell without a reference to individuals you
would have to encode unique properties of the
individual lines. Which properties should you encode?
Keeping track by encoding unique properties of
individual items will not work in general

No description can keep picking out the same individual when
it is changing its location or appearance unpredictably
• But a perceptual representation is always changing, since it is always
built up over time as properties are noticed – so you need a way to find
the representation of a particular token element when new properties of
that particular token element are noticed

Many writers have postulated a “marking” process for
computing relational predicates. But where is the “mark”
placed? It can’t be placed in the representation, because its
purpose is to keep track of which things in the world
correspond to which things in the representation (e.g., counting).
• People can pick out several individual items even if they are in a field
of identical individuals – e.g., pick out a dot in a uniform field of dots –
so the picking out cannot be done solely by direction of gaze.
Footnote
Notice that in the previous example it would not help
if you labeled the diagram as you drew it. Why not?
• Because to refer to the line with label L1 you would
have to be able to think “This is line L1”, and you could
not think that unless you had a way to think “this”; the
label would not help you to do that!
• Being able to think “this” is another way to view the
very problem I will be concerned with in this talk. You
need an independent way to pick out and refer to an
individual element – even if it is labeled! (I will also
provide evidence that you need to do this for several
individuals simultaneously.)

This is exactly the point of Kaplan’s and Perry’s claim
about the “essential indexical”
The requirements for picking out individual things
and keeping track of them reminded me of an
early comic book character called “Plastic Man”
Imagine being able to place several of your fingers on things
in the world without being able to detect their properties in
this way, but being able to refer to those things so you could
move your gaze or attention to them. If you could, you would
possess FINgers of INSTantiation = FINSTs!
Outline of remainder of this talk





Selection: What is selected?
• Places vs ‘Objects’ (Posner & analogue attention movement)
• Evidence in favor of object-based selection
Selection and demonstrative reference
• Multiple selection
• FINST Theory and Object Files
Multiple Object Tracking (MOT) and FINST Indexes
as direct (non-conceptually-mediated) reference
Selection and the Binding Problem
Implications for philosophical ideas about individuals,
tracking and nonconceptual representation
Covert movement of attention
[Figure: trial sequence of fixation frame, cue, target-cue interval, and detection target (*) at either the cued or an uncued location]
Example of an experiment using a cue-validity paradigm for showing that the
locus of attention moves without eye movements and for estimating its speed.
Posner, M. I. (1980). Orienting of Attention. Quarterly Journal of Experimental Psychology, 32, 3-25.
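The cue-validity logic reduces to a difference between mean detection times at cued and uncued locations. The sketch below is illustrative only: the function name `cueing_effect` is hypothetical, and the reaction times are invented numbers, not Posner’s data.

```python
from statistics import mean


def cueing_effect(cued_rts, uncued_rts):
    """Validity effect in ms: mean uncued RT minus mean cued RT.

    A positive value means detection was faster at the cued location,
    which is read as evidence that attention had already moved there.
    """
    return mean(uncued_rts) - mean(cued_rts)


# Invented reaction times (ms), for illustration only.
cued = [250, 260, 255, 245]
uncued = [290, 300, 285, 295]
print(cueing_effect(cued, uncued))  # 40.0
```

Repeating the computation at several target-cue intervals, and noting the shortest interval at which a benefit appears at a given location, is one way such designs can be used to estimate how quickly the locus of attention shifts.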
Extension of Posner’s demonstration of attention switch
Does the improved detection in intermediate locations entail that the “spotlight of
attention” moves continuously through empty space?
But the enhancement of intermediate locations
does not require a continuous analogue
movement of attention through empty space
• When attention is attracted by an onset event, the
appearance of analog movement of focal attention
can be explained by a punctate (quantal) theory of
attention-switching

Sperling & Weichselgartner (1995) – an episodic theory of attention shift
This raises the possibility that in shifting between two
objects, attention does not actually move through
empty space
• Maybe attention is allocated to objects rather than
locations?
Evidence for Objects as the basis for selection

Single Object Advantage: pairs of judgments are faster when
both judgments concern the same perceived object

Entire objects acquire enhanced sensitivity from the allocation of
focal attention to part of the object

Single-Object advantage occurs even with generalized “objects”
defined in feature space (Blaser & Pylyshyn, 2000) and even when the
object is distributed over time-slices (Flombaum & Scholl, 2006)

Clinical (brain damage) syndromes such as Simultanagnosia
and Hemispatial Neglect show object-based properties

Attention moves with Moving Objects
• Inhibition of Return (IOR)
• Object Files
• Multiple Object Tracking (MOT), and its generalization to movement in
feature space
Single-object superiority even
when the shapes are controlled
A large number of published experiments show that when
several perceptual judgments are made, they are faster when they
pertain to the same object, even when all other factors are controlled
Attention spreads over perceived objects
[Figure: two-object displays with locations A, B, C and D; after a cue at A, the effect spreads to B and not C in one configuration, and to C and not B in the other]
Using a priming method, Egly, Driver & Rafal (1994) showed that the effect of a prime spreads to
other parts of the same visual object more than to equally distant parts of different objects.
Objecthood endures over space-time

Several studies have shown that what counts as the
same object endures over time and location:
• Object-specific priming (Kahneman; Scholl),
inhibition of return (Tipper)
• Inhibition of return is object-based
• Certain forms of disappearance-reappearance preserve
objecthood
• Multiple Object Tracking (MOT) (Scholl, Keane)
• Apparent motion (Kolers, Yantis)
• Tunnel Effect (Michotte, 1953; Flombaum & Scholl, 2006)

This identity constancy gives “visual objects” a real
physical-object character and is one of the reasons why
psychologists refer to them as “objects”.
Objects endure despite changes in location;
and they carry their history with them!
Object File Theory of Kahneman & Treisman
[Figure: frames 1–3 of an object-reviewing display, with letters (A, B) presented in boxes]
Letters are faster to read if they appear in the same box in which they had
appeared initially. Priming travels with the object. According to the theory,
when an object first appears, a file is created for it and the properties of the
object are encoded and subsequently accessed through this object-file.
Inhibition of return appears to be object-based

Inhibition-of-return is thought to help in visual
search since it prevents previously visited objects
from being revisited
• The original study used static objects. Then
Tipper, Driver & Weaver (1991) showed that IOR
moves with the inhibited object.
IOR appears to be object-based (it travels
with the object that was attended)
There is also evidence from clinical studies
supporting object-based selection

• Hemispatial Neglect
• Balint and simultanagnosia syndromes
An empirical hypothesis: To select is to refer

When we select an object with focal attention we thereby
refer to it. Consequently we can, e.g.,
• Entertain thoughts about it (“this is red”)
• Carry out certain actions towards it (e.g., move our gaze to it)

But we can select several (n ≤ 4) objects at once, so:
• We can have demonstrative thoughts about several objects
(“this₁ is above this₂”)
• Having selected several objects, we can evaluate predicates
over them or move focal attention to them

We can also subitize them or search through them <experiments>
• We can keep track of selected objects if we or they move
unpredictably or change their properties <MOT>
Pick out 3 dots I will cue and keep track of them
• In a field of identical elements you can select several of them and move
your attention among them (e.g., “move one up” or “move 2 right”, etc.)
so long as at no time do you have to hold on to more than 3 or 4 dots
Subset selection for search
[Figure: subset search displays (+ marks); the target is defined either for a single-feature search or for a conjunction-feature search]
Burkell, J., & Pylyshyn, Z. W. (1997). Searching through subsets: A test of the visual indexing hypothesis. Spatial Vision, 11(2), 225-258.
Subset search results:
• Only properties of the subset matter
• If the subset is a single-feature search, it is fast and parallel
• If the subset is a conjunction search set, finding the target
takes longer and is a serial search (RT increases with set size)
• The distance between targets does not matter, so
observers don’t seem to be scanning the display looking
for the target but can switch their attention directly to the
subset items.
• This finding supports the claim that we have a small
number of FINST indexes that can be captured by sudden
onsets and can serve to direct focal attention
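The serial/parallel contrast in these results is usually read off the slope of reaction time against the number of items searched (here, subset size). A minimal sketch, assuming invented RT values and a hypothetical helper `search_slope` that fits an ordinary least-squares line:

```python
def search_slope(set_sizes, rts):
    """Ordinary least-squares slope of RT (ms) against set size (ms/item)."""
    n = len(set_sizes)
    mean_x = sum(set_sizes) / n
    mean_y = sum(rts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(set_sizes, rts))
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    return sxy / sxx


# Hypothetical subset sizes and mean RTs (ms), invented for illustration.
sizes = [3, 5, 7]
single_feature = [520, 522, 521]  # nearly flat: parallel search
conjunction = [600, 660, 720]     # rising: serial search

print(search_slope(sizes, single_feature))  # 0.25
print(search_slope(sizes, conjunction))     # 30.0
```

A near-zero slope is the signature of the fast, parallel case and a clearly positive slope that of serial search. The separate finding that inter-target distance does not matter is not captured by this fit.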
Individuals and patterns

Vision does not recognize patterns by applying templates
but rather by decomposing them into parts
Recognition-By-Parts (Biederman, 2000)

A pattern is encoded over time (and often over different
views separated by saccades), so the visual system must
keep track of the individual parts and merge descriptions
of the same part at different times and stages of encoding
• In recognizing a pattern, the visual system must pick out
individual parts and bind them to the representation being
constructed
Are there collinear items (n>3)?
Several objects must be picked out at once
in making relational judgments

The same is true for other relational judgments like inside or on-the-same-contour, etc. We must pick out the relevant individual objects first.
Respond: Inside-same contour? On-same contour?
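A relational predicate such as Collinear can only be evaluated once each of its argument places is bound to a particular individual. The sketch below assumes, purely for illustration, that selected items are available as a dictionary from hypothetical index names to positions; `collinear` and `collinear_triples` are invented helpers, and the three-argument version is just the simplest case.

```python
from itertools import combinations


def collinear(p, q, r, tol=1e-9):
    """True if points p, q and r lie on one straight line.

    Uses the 2-D cross product of (q - p) and (r - p), which is zero
    exactly when the three points are collinear.
    """
    (x1, y1), (x2, y2), (x3, y3) = p, q, r
    return abs((x2 - x1) * (y3 - y1) - (y2 - y1) * (x3 - x1)) <= tol


def collinear_triples(indexed):
    """Apply Collinear to every triple of indexed (selected) items.

    `indexed` maps hypothetical index names to positions; items that
    have not been indexed cannot serve as arguments.
    """
    return [names for names in combinations(sorted(indexed), 3)
            if collinear(*(indexed[k] for k in names))]


selected = {"i1": (0, 0), "i2": (1, 1), "i3": (2, 2), "i4": (3, 0)}
print(collinear_triples(selected))  # [('i1', 'i2', 'i3')]
```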
When items cannot be individuated, predicates
over them cannot be evaluated
● Do these figures contain one or two distinct curves?
● Individuating these curves requires a “curve tracing”
operation, so Number_of_curves (C1, C2, …) takes
time proportional to the length of the shortest curve.
The figure on the left is one continuous curve, the one
on the right is two distinct curves – as shown in color.
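The dependence of Number_of_curves on tracing can be illustrated with a flood-fill tracer over a binary grid. This is a hypothetical sketch, not a model from the talk: `trace_curves` counts connected components and the number of cells it visits. Because it traces every curve exhaustively, its cost grows with the total traced length; it does not implement the refinement in which only the shortest curve needs to be traced.

```python
from collections import deque


def trace_curves(grid):
    """Count distinct curves in a binary grid by tracing each one.

    Returns (number_of_curves, steps); `steps` counts every cell the
    tracer visits, so the cost grows with the length of what is traced.
    """
    rows, cols = len(grid), len(grid[0])
    seen = set()
    curves = steps = 0
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c] or (r, c) in seen:
                continue
            curves += 1                      # start tracing a new curve
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                steps += 1
                for dy in (-1, 0, 1):        # follow 8-connected neighbors
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
    return curves, steps


def to_grid(rows):
    return [[ch == "#" for ch in row] for row in rows]


one_curve = to_grid(["#####", "....#", "#####"])
two_curves = to_grid(["#####", ".....", "#####"])
print(trace_curves(one_curve))   # (1, 11)
print(trace_curves(two_curves))  # (2, 10)
```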
Signature ‘subitizing’ phenomena only appear when
objects are automatically individuated and indexed
Trick, L. M., & Pylyshyn, Z. W. (1994). Why are small and large numbers enumerated differently? A
limited capacity preattentive stage in vision. Psychological Review, 101(1), 80-102.
Demonstrations of MOT

● Basic MOT with repulsion: basic early MOT with repulsion between items
● MOT with no restrictions: basic MOT without repulsion
● MOT with occluding surfaces: objects can be tracked even if they briefly disappear
● Tracking without keeping track of identities: track these and recall what label they had initially
Explaining Multiple Object Tracking
• Do we track by storing and updating objects’ locations?

Not likely: the possibility that locations of targets are
encoded and updated through serial visitation by focal
attention was excluded in an early study
• This supports the idea that the FINST mechanism
automatically keeps track of objects as long as there are 4
or fewer of them (in other words, indexes are “sticky”).
Other findings using MOT
There have been dozens of studies using MOT with
many surprising findings. Here are a few:




Tracking performance is not affected if objects continually
change their color or shape during a tracking trial (whether the
change is synchronous or asynchronous)
If objects do change their color or shape the change is not
noticed
Tracking is not disrupted if objects disappear briefly but
totally behind opaque strips or if they all disappear together
Targets can be selected automatically (by flashing) and also
voluntarily. If selected voluntarily they have to be visited
serially (while indexes are “dropped off”)
Review: A FINST is a mechanism that:
1. Picks out, and keeps track of, individual distal objects
2. Does so directly – without the mediation of concepts and
without using any encoded property of the indexed objects
• In other words, FINSTs pick out and track objects as individuals
rather than as bearers of certain properties
3. Because FINSTs do not pick out and track individuals as
members of any category (including the category object), their
connection to the world is transparent and nonconceptual. It is
not an opaque “selecting as” relation;
• Consequently a person may literally not know what he has
selected (although indexes do make it possible for properties of
the objects to be subsequently encoded into Object Files)

Pace John Campbell (2002, p. 134), who holds that “conscious experience of an object
explains how you know the reference of a demonstrative”, we may
not know the reference of a (perceptual) demonstrative
More on FINSTs
● A FINST is a numerically limited mechanism for selecting
individual visual objects currently in view. It works just the
way that a pointer in a computer data structure works: It
provides epistemic access to a particular item without
representing the item’s location or other properties;
● Although a FINST does not pick out an object in terms of its
represented properties, there are properties that cause an
index to be assigned (cf. Kripke’s distinction between
properties that fix a referent vs properties of the referent).
There are also properties (maybe different properties) that
allow objects to be tracked;
● A FINST is usually captured or grabbed by an object that
suddenly appears. But its attachment to particular items can
be voluntarily enabled by moving unitary focal attention to
the desired objects, thus precipitating the capture of an index
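The pointer analogy on this slide can be made concrete with object references in Python. This is a hypothetical illustration: the class names `VisualObject` and `FinstPool`, the helper `find_by_description`, and the capacity of 4 are assumptions loosely drawn from the slides, not an implementation from the talk. An index holds a reference rather than a description, so it survives arbitrary changes in the object’s properties, whereas re-identification by a stored description fails.

```python
class VisualObject:
    """A distal object whose properties may change at any moment."""

    def __init__(self, color, position):
        self.color = color
        self.position = position


class FinstPool:
    """Hypothetical pool of at most `capacity` indexes.

    Each index stores only a reference to an object, never a
    description of it, so it keeps picking out the same individual
    however the object's color or position changes.
    """

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.indexes = []

    def capture(self, obj):
        """Assign a free index to obj (e.g. on a sudden onset)."""
        if len(self.indexes) >= self.capacity:
            return None  # no index left: the object goes unselected
        self.indexes.append(obj)
        return len(self.indexes) - 1

    def deref(self, i):
        """Follow index i to its object, as a pointer would."""
        return self.indexes[i]


def find_by_description(objects, color, position):
    """Re-identify objects by a stored description (for contrast)."""
    return [o for o in objects if o.color == color and o.position == position]


dots = [VisualObject("gray", (x, 0)) for x in range(6)]
pool = FinstPool()
print([pool.capture(d) for d in dots])  # [0, 1, 2, 3, None, None]

dots[0].color, dots[0].position = "red", (9, 9)  # the object changes
print(pool.deref(0) is dots[0])                   # True: same individual
print(find_by_description(dots, "gray", (0, 0)))  # []: description fails
```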
A fundamental problem of perception:
Encoding conjunctions of properties
• Finally, this brings me to an important function that
FINST indexes provide – a way to solve the
ubiquitous binding problem in perception
• Since we can distinguish between one combination of
properties and another, early vision (sensation?) cannot
simply announce the presence of properties for which
there are sensors. It must provide additional
information that allows the reconstruction of which
properties ‘go with’ which.
• The almost universal assumption about how this is done is
that in early vision properties are encoded as being at
particular locations
• Treisman’s Feature Integration Theory
• Strawson’s (and Clark’s) use of Feature Placing Theory
The role of location in Treisman’s Feature Integration Theory
But in encoding properties, early vision can’t just bind them
together according to their spatial co-occurrence – even their co-occurrence
within some region. That’s because the relevant
region depends on the object. So the selection and binding must
be according to the objects that have those properties
Binding conjunctions by the location of their
conjuncts does not work when feature locations are
not punctate, and it becomes even more problematic if
the features are co-located – e.g., if their relation is “inside”
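The failure of location-based binding for co-located features can be shown in a few lines. Assume, hypothetically, feature maps that register each color and shape only by location, and a red square inside a green circle, so that both objects’ features are registered at the same place. The invented helper `bind_by_location` conjoins whatever co-occurs at a location.

```python
from itertools import product

# Hypothetical location-keyed feature maps. A red square lies inside a
# green circle, so both objects' features are registered at (5, 5).
color_map = {(5, 5): {"red", "green"}}
shape_map = {(5, 5): {"square", "circle"}}


def bind_by_location(color_map, shape_map):
    """Conjoin every color and shape registered at the same location."""
    conjunctions = set()
    for loc in color_map.keys() & shape_map.keys():
        conjunctions |= set(product(color_map[loc], shape_map[loc]))
    return conjunctions


print(sorted(bind_by_location(color_map, shape_map)))
# [('green', 'circle'), ('green', 'square'), ('red', 'circle'), ('red', 'square')]
```

Only two of the four pairings describe real objects, and nothing in the location-keyed maps says which two.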
An alternative:
In computing conjunctions of properties
attention is directed at objects, since it is
objects that have conjoined properties

Instead of being like a spotlight beam that can be
scanned around a scene, and can be zoomed to cover a
larger or smaller area, maybe attention can only be
directed to occupied places – i.e., to visual objects
• A large experimental literature shows that attention is Object-Based

This suggests an alternative view of how the binding
problem is solved in early vision – through the prior
selection of perceptual objects
• But selection does not have to depend only on unitary focal
attention. FINSTs allow multiple objects to be selected.
Object Files and the binding problem

Suppose that only properties of indexed objects are
conceptually encoded and that these are stored in
object files associated with each object.
• Then properties that belong to the same object are
stored in the same object file (which may be empty,
as they are in MOT).
• This automatically solves the binding problem since
it connects encoded properties to their visual object

This view comes out of both FINST Theory
(Pylyshyn, 1989) and Object File Theory (Kahneman
et al., 1992)
FINSTs and Object Files form the link
between the world and its conceptualization
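If each encoded property is stored in the file of the indexed object it came from, conjunctions come out right without consulting location at all. A minimal sketch, assuming a hypothetical `ObjectFile` record keyed by index number and a red square inside a green circle (both would sit at the same location); `encode` is likewise an invented helper.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    """Hypothetical object file: properties encoded for one indexed object."""
    index: int
    properties: dict = field(default_factory=dict)


def encode(files, index, feature, value):
    """Store a feature in the file of the indexed object that has it."""
    if index not in files:
        files[index] = ObjectFile(index)
    files[index].properties[feature] = value


files = {}
encode(files, 0, "color", "red")      # index 0: the inner square
encode(files, 0, "shape", "square")
encode(files, 1, "color", "green")    # index 1: the surrounding circle
encode(files, 1, "shape", "circle")
files[2] = ObjectFile(2)              # indexed, nothing encoded (as in MOT)

for f in files.values():
    print(f.index, f.properties)
# 0 {'color': 'red', 'shape': 'square'}
# 1 {'color': 'green', 'shape': 'circle'}
# 2 {}
```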
Some open questions

We have arrived at the view that only properties of selected
(indexed) objects enter into subsequent conceptualization and
perception-based thought (i.e., only information in object files
is made available to cognition)
So what happens to the rest of the visual information?

Visual information seems rich and fine-grained while this
theory only allows for the properties of 4 or 5 objects to be
encoded!
• The present view leaves no room for nonconceptual
representations whose content corresponds to the content of
conscious experience
• According to the present view, the only content that
nonconceptual representations have is the demonstrative content
of indexes that refer to perceptual objects
• Question: Why do we need any more than that?
An intriguing possibility….
Maybe the theoretically relevant information we take in is
less than (or at least different from) what we experience
• This possibility has received attention recently with the discovery
of various “blindnesses” (e.g., change-blindness, inattentional
blindness, blindsight…) as well as the discovery of independent
vision systems (e.g., for recognition and for motor control)
• The qualitative content of conscious experience may not play a role
in explanations of cognitive processes
• Even if unconceptualized information enters into causal processes
(e.g., motor control), it may not be represented or made available to
the cognitive mind – not even as a nonconceptual representation
• For something to be a representation its content must figure in
explanations – it must capture generalizations. It must have truth
conditions and therefore allow for misrepresentation. It is an
empirical question whether current proposals do (e.g., primal sketch,
scenarios). cf. Devitt: Pylyshyn’s Razor
Vision science has always been deeply
ambivalent about the role of conscious experience
Isn’t how things appear one of the things that our theories
must explain? Answer: There is no a priori ‘must explain’!
● The content of subjective experience is a major type of evidence. But
it may turn out not to be the most reliable source for inferring the
relevant functional states. It competes with other types of evidence.
● How things appear cannot be taken at face value: it carries substantive
theoretical assumptions. It also draws on many levels of processing.
• It was a serious obstacle to early theories of vision (Kepler)
• It has been a poor guide in the case of theories of mental imagery (e.g., color
mixing, image size, image distances). ‘Reading X off an image’ is an illusion.
● It seems likely that vision science will use evidence of conscious
experience the way linguistics uses evidence of grammatical intuitions
– only as it is filtered through developing theories.
• The questions a science is expected to answer cannot be set in advance – they
change as the science develops.
What next?
This picture leaves many unanswered questions,
but it does provide a mechanism for solving the
binding problem and also explaining how mental
representations could have a nonconceptual
connection with objects in the world (something
required if mental representations are to connect
with actions)

For a copy of these slides see:
http://ruccs.rutgers.edu/faculty/pylyshyn/SelectionReference.ppt

Or MIT Press
Paperback
A new puzzle: individuation without reference?

The correspondence problem is often solved
without a numerical limit, and therefore without the
objects being indexed.
• Examples include apparent motion and stereovision
• Such computations do not seem to be over
continuous visual manifolds but over discrete
elements
• Such discrete elements must therefore be created by
a process that clusters features over space and time
• Psychologists call the creation of individual elements
“individuation”
Structure from Motion Demo
Cylinder Kinetic Depth Effect
The correspondence problem for biological motion
Apparent motion of random dots
Another example: Punctate inhibition of moving objects?
• We have recently obtained evidence that nontargets are
inhibited (as measured by the rate of detection of small
faint probe dots).
• There appears to be no inhibition of the empty region
through which the nontargets move
• The inhibition is spatially local
• How can punctate moving objects be inhibited unless they
are somehow being tracked? And how can they be tracked
if there are many (n > 5) of them?
• This provides more evidence for individuation without
reference: Maybe Indexing is a two-stage process?
1. Individuate (numerically unlimited)
2. Assign a demonstrative reference (limited to ~4 indexes)
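The proposed two-stage process can be sketched as a clustering step with no limit on how many elements it produces, followed by a binding step that attaches at most four indexes. Everything here is hypothetical: single-linkage clustering with a fixed `radius` stands in for whatever grouping vision actually performs, and `assign_indexes` simply gives indexes to the first four elements as a stand-in for capture by onset.

```python
def individuate(points, radius=1.5):
    """Stage 1 (numerically unlimited): cluster feature points into elements.

    Single-linkage clustering: a point joins (and merges) every existing
    cluster that has a member within `radius` on both axes.
    """
    clusters = []
    for p in points:
        near = [c for c in clusters
                if any(abs(p[0] - q[0]) <= radius and abs(p[1] - q[1]) <= radius
                       for q in c)]
        clusters = [c for c in clusters if all(c is not n for n in near)]
        clusters.append([p] + [q for c in near for q in c])
    return clusters


def assign_indexes(elements, capacity=4):
    """Stage 2 (limited): bind at most `capacity` indexes to elements."""
    return dict(enumerate(elements[:capacity]))


# Six well-separated pairs of feature points.
points = [(x + dx, 0) for x in range(0, 60, 10) for dx in (0, 1)]
elements = individuate(points)
indexed = assign_indexes(elements)
print(len(elements), len(indexed))  # 6 4
```

All six elements exist for processes such as inhibition, but only four can be referred to by an index.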
Recent experimental results on Inhibition of nontargets
[Figure: Probe Detection while Tracking and Nontracking; y-axis: detection % (40% to 100%); x-axis: probe location (open space, target, nontarget); conditions: while tracking vs. non-tracking control]