Transcript Slide 1

Spatial representation in the mind/brain:
Do we need a global topographical map?
Zenon Pylyshyn
Rutgers Center for Cognitive Science and Institute Jean Nicod

What is special about representation of space in
perception and thought?


Do we need a single global spatial representation?
Do we need a topographical display in the brain?
Workshop on Frames of Reference
Paris, November 17-19, 2005
Outline of talk







Representing space in LTM vs in Working Memory (WM)
Some conditions on representing space in WM
Why a unitary global spatial ‘display’ is often assumed as the
form of representation and a few reasons why that’s wrong
An alternative way of satisfying the conditions on spatial
‘representation’: The Projection Hypothesis
 Aside on Spatial Index (FINST) Theory
How the projection hypothesis explains the spatial properties of
certain representations: Examples from the visual modality
How to generalize this story to proprioception: The spatial sense
Where is the global allocentric display we thought we needed?
What is special about spatial representation?
 I have suggested (Pylyshyn, 1973) that there is no reason why a
form of representation adequate for general knowledge (i.e.,
a Language of Thought or LOT) cannot also serve for
encoding the content of spatial representations in memory
 The difference between representing spatial relations and
representing other contents may lie in their being different
topics requiring a different conceptual vocabulary, rather than
in their having a different form or medium
 This general-LOT format fails to account for certain
phenomena that are observed when vision and spatial
reasoning are actively engaged in solving problems or in
determining actions – i.e., when spatial representations are
functioning in working memory.
Spatial representation during
perception and reasoning
 I have outlined a number of ways that the representation
of space in WM is different in form from that of other
contents of WM. In this talk I will focus on one of these
ways, namely the way that they deal with space
 Because such representations are not tied to vision or
conscious visual experience, they are best referred to as
spatial representations rather than mental images
 By the end I will conclude that even calling them spatial
representations is somewhat misleading – but that comes
later!
What are some constraints on a
theory of spatial representation?
 I begin by trying to set out some functional
requirements (or boundary conditions) that may
apply to a system for representing space and spatial
relations in working memory in perception and
especially in spatial reasoning

I will later argue that the wrong conclusions have
been drawn from these requirements about the form
of such spatial representations
Some conditions on a system of codes
for representing spatial relations (1)
1. The system must be able to represent magnitudes
 Psychophysical evidence shows that we encode magnitudes (at
least relative magnitudes) and that these magnitudes (i.e., the
semantics of the codes) have systematic effects in behavior
(e.g., the phenomena of scalar variance ratio, Fechner’s law,
the symbolic distance effect, etc.).
 Thus something about the form of the representation itself
must explain these systematic magnitude effects (e.g.,
phenomena such as those listed above would not arise if
the magnitudes were encoded symbolically as numerals)
Some conditions on a system of codes for
representing spatial relations (2)
2. The system must represent stable spatial configurations
 Spatial configurations involve relations over multiple objects –
in that sense they are holistic and require simultaneous access
to multiple objects (i.e., multiple arguments in relational
predicates must be simultaneously bound)
 What is special about such configurations is that they may
allow some spatial ‘inferences’ by pattern lookup without
reference to independent geometrical axioms (such as the
axiom of transitivity)
○ Example of 3-term series problems and ‘spatial paralogic’
Some conditions on a system of codes for
representing spatial relations (3)
3. The system must somehow ‘capture’ the continuity and
connectedness of space. This requirement leaves many
unanswered questions:
 Does continuity entail that empty places are represented as such?
 Does continuity entail that the representational system itself
determines that distances meet metrical axioms (e.g., the triangle
inequality AB + BC ≥ AC) or that they are Euclidean?
 Does continuity entail that the representation of movements of
objects is constrained so that in getting from A to B objects must
pass through ‘intermediate’ locations?
The proposal I will present later gives a partial answer to these
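To make the metrical-axiom question concrete, here is a minimal Python sketch that checks whether a table of pairwise distances among a few represented places obeys symmetry and the triangle inequality. The place labels, the distances and the helper names (`metric_violations`, `symmetric`) are illustrative assumptions, not part of the talk. The point it is meant to show is only that a purely symbolic list of distances does not by itself guarantee these axioms, so something else would have to enforce them.

```python
from itertools import permutations


def metric_violations(dist, tol=1e-9):
    """Return a list of axiom violations found in a pairwise-distance table.

    dist maps an ordered pair of place labels to a non-negative number.
    """
    places = sorted({p for pair in dist for p in pair})
    problems = []
    for a, b in permutations(places, 2):
        # Symmetry: the distance from a to b equals the distance from b to a.
        if abs(dist[(a, b)] - dist[(b, a)]) > tol:
            problems.append(("symmetry", a, b))
    for a, b, c in permutations(places, 3):
        # Triangle inequality: AB + BC >= AC.
        if dist[(a, b)] + dist[(b, c)] + tol < dist[(a, c)]:
            problems.append(("triangle", a, b, c))
    return problems


def symmetric(table):
    """Fill in the reverse direction of every listed pair."""
    out = dict(table)
    out.update({(b, a): d for (a, b), d in table.items()})
    return out


# Hypothetical distance tables: the first could describe points in real
# space, the second could not.
consistent = symmetric({("A", "B"): 3.0, ("B", "C"): 4.0, ("A", "C"): 5.0})
inconsistent = symmetric({("A", "B"): 1.0, ("B", "C"): 1.0, ("A", "C"): 5.0})

print(metric_violations(consistent))
print(metric_violations(inconsistent))
```

Distances read off real perceived space cannot produce the second table, which is one way of stating why the constraint may come from the world rather than from the code.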
Some conditions on a system of codes for
representing spatial relations (4)
4. The system must represent spatial properties across
modalities, including proprioceptive and efferent
‘modalities’

Spatial representations must be able to engage the motor
system in a fairly direct manner
○ One of the characteristics of what we call a ‘spatial
representation’ is that we can ‘point to’ represented things
(e.g., in our mental image). That’s why a proposition such as
LEFT-OF(A,B) seems an inadequate representation for <A,B>
 But note that motor actions towards perceptual and imagined
representations are not identical because they engage
different perceptual-motor pathways (Goodale et al. 1994)
Some conditions on a system of codes for
representing spatial relations (5)
5. The system must be able to represent spatial relations in 3D
 When relations in depth are encoded, they must be in a
format similar to the encoding of relations in the plane since the
two have to operate together (e.g., in determining the Euclidean
distance between points in 3D space)
○ Experimental evidence from such phenomena as ‘mental rotation’
or ‘mental scanning’ shows identical functions in depth as in the
plane
Summary of constraints to be met:
A system of spatial representations must
somehow do the following:
1. It must represent magnitudes
2. It must represent holistic configurations which enable at
least some direct one-step inferences (by pattern-matching)
3. It must capture connectedness and continuity
4. It must represent spatial relations seamlessly across
modalities and engage the motor system
5. It must represent distances in depth as well as in the plane
in a uniform manner (i.e., it must represent 3D)
 I will return to these constraints when I discuss a different
proposal for how we ‘represent’ space
An additional major assumption
about spatial representation
The foregoing list of constraints has frequently led people to
make one additional assumption about spatial representation
that I will argue is not justified:

The single frame of reference assumption is the
assumption that we represent spatial layouts in perception
or in thought in a single global frame of reference, as
opposed to a patchwork of distinct but coordinated frames
 Every theory I know that attempts to explain mental
imagery or cross-modal coordination makes this assumption,
explicitly or implicitly
Why a single ‘display’ for vision?
In vision the global spatial-display theory explains why our
visual experience is panoramic and stable even though the
visual inputs are highly local, partial and constantly changing
But many studies have shown that there is no such rich stable panoramic
display (e.g., change blindness, superposition, etc., see O’Regan, 1992)
Why a single ‘display’ for spatial reasoning?
The global spatial-display theory also explains how a mental
representation can meet the spatial conditions listed earlier – it
does so by creating a 2D image in a real spatial medium
Such a display was assumed to use the same global spatial medium that
is used in vision. But both display assumptions have serious problems.
The global spatial display assumption
 There are many deep problems with the assumption that spatial
properties are represented in vision and reasoning by an inner
spatial display which corresponds to our experience of a stable
world (perceived or imagined), many of which I have discussed in
connection with the ‘picture theory’ of mental imagery (BBS, 2002)
 V1 can’t serve as the medium for an image representation for many
reasons given in my BBS paper and book – e.g., not stable, not broad
enough, not 3D, images not presented in the right form (no Emmert’s
law, no amodal completion, image size not in the right form, no image rotation…)
 One of the main problems relevant to the present discussion is the
assumption that visual spatial perception, cross-modal spatial
integration, visuomotor control, and spatial reasoning derive from
a single representation in an allocentric frame of reference
 There are many reasons to doubt that there is a single global
allocentric representation (‘master map’) for spatial information…
Many reasons to reject the Master Map assumption
 There are many known frames of reference between perception
and motor control, relying on both external and internal sensors
 While gaze-centered coordinates are common in motor control they are
gain-modulated by inputs from eye, head and body positions as well as by
motor intentions (Andersen & Buneo, 2002; Duhamel et al., 1992)
○ Visual information is also represented in hand- and body-centered (also
personal & peripersonal) frames of reference (Làdavas, 2002)
○ Spatial neglect appears in many different frames of reference
 Motor control necessarily involves many different frames of reference,
including proprioceptive, kinesthetic, joint-angle, and even dynamic
frames of reference based on muscle spindle and joint tendon receptors
 Earlier (downstream) frames of reference are often not overwritten but
may continue to have observable consequences on errors in
kinesthetically-guided movements (Baud-Bovy & Viviani, 1998), so multiple
frames can coexist in the nervous system
A different way of approaching the
question of spatial representation


Because of the many problems with the global spatial display
assumption, I have proposed a provisional hypothesis that
preserves some of the advantages of the global spatial display,
but assumes that the relevant spatial properties are in the
perceived world and can be accessed if we have the right
access mechanisms for selecting and indexing objects in the
perceived world
For ease of reference let’s call this the Projection Hypothesis
because it is somewhat analogous to ‘projecting’ the spatial
display onto the real space that we perceive – even though
only objects’ identities (labels) and locations, and none of
their other visual properties, are ‘projected’
The projection hypothesis
The projection hypothesis claims that the perceptual systems rely on
the spatial properties of the concurrently perceived world to meet
the 5 conditions outlined earlier. The hypothesis rests on three
theoretical postulates:
1. We have a system of “pointers” (such as the FINST mechanism) by
which a small number of perceived objects in the world can be selected
and indexed. FINSTs are reference pointers to these target objects and
remain attached to them despite changes in their locations
2. When we perceive a scene that contains indexed objects, our perceptual
system is able to treat those indexed objects as though they were
assigned unique visual labels. (Thus it can detect previously-unnoticed
patterns among indexed objects)
3. Our LTM representation of locations need not meet the 5 conditions
because it is not directly used in spatial reasoning or motor control
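The first postulate can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the FINST mechanism itself: the class name `IndexPool`, the capacity value and the dictionary standing in for the perceived scene are all hypothetical. What it is meant to show is that an index refers to an object's identity, so the binding survives a change of location, and that only a small number of indexes is available.

```python
class IndexPool:
    """Toy pool of FINST-like pointers (the capacity is an assumed parameter)."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.bound = {}  # label -> object identity (never a stored location)

    def bind(self, label, obj_id):
        """Attach a free index, tagged with a label, to an object."""
        if label not in self.bound and len(self.bound) >= self.capacity:
            raise RuntimeError("no free index")
        self.bound[label] = obj_id

    def release(self, label):
        """Free the index carrying this label, if any."""
        self.bound.pop(label, None)

    def locate(self, label, scene):
        """Read the indexed object's current location off the scene."""
        return scene[self.bound[label]]


# The 'scene' stands in for the perceived world: object identity -> location.
scene = {"obj1": (0, 0), "obj2": (5, 1), "obj3": (2, 7)}

pool = IndexPool(capacity=2)
pool.bind("tower", "obj2")

scene["obj2"] = (6, 3)  # the object moves; the index itself is untouched
print(pool.locate("tower", scene))
```

Note that the pool never stores coordinates; location is obtained only by consulting the scene when it is needed.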
SHORT DETOUR (while gray background)!
Visual Index (FINST) Theory

Because FINST Indexes play a central role in this story I will
make a short detour to illustrate this mechanism and to give
some examples of indexes at work
Pick out the 3 dots I will cue and keep track of them
 After you pick out the 3 cued dots, I’ll ask you to move your attention from the
center one to the dot below it. Describe the new relation among the three dots.
 In a field of identical elements you can select several of them and move your
attention among them so long as they are not too close together (Intriligator &
Cavanagh, 2001)
Several objects must be picked out at once
in making relational judgments
You must have the ability to pick out several individual items and keep track of them since in
order to make relational judgments, such as inside or on-the-same-contour you must pick out
the relevant individual objects first. Are dots Inside-same-contour? On-same-contour?
Other experimental demonstrations of
FINST indexes


Recognizing the cardinality of small sets of things:
Subitizing vs counting (Trick, 1994)
Searching through subsets – selecting items to search
through (Burkell, 1997)
 Selecting subsets and maintaining the selection during a saccade
(Currie, 2002)
Multiple Object Tracking (MOT)
Subset selection for search
[Figure: example displays for single-feature search and conjunction-feature search, each with its target shown]
Burkell, J., & Pylyshyn, Z. W. (1997). Searching through subsets: A test of the visual indexing hypothesis. Spatial Vision,
11(2), 225-258.
Subset search results:

Only properties of the subset matter
If the subset is a single-feature search it is fast and the slope (RT vs
number of items) is shallow
If the subset is a conjunction search set, it takes longer and is more
sensitive to the set size

The distance between targets does not matter, so observers
don’t seem to be scanning the display looking for the target
but can switch their attention directly to the subset items
Selective search is also found when a saccade occurs
between the late onset cues and start of search
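[Figure: sequence of displays marked with fixation crosses (+) and the point at which a saccade occurs; panels for single-feature search and conjunction-feature search, each with its target shown]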
Even with a saccade between selection and access, items can be accessed efficiently
Demonstrating the function of FINSTs with
Multiple Object Tracking (MOT)
 In a typical MOT experiment, 8 simple identical objects are
presented on a screen and 4 of them are briefly distinguished in
some visual manner – usually by flashing them on and off.
 After these 4 targets are briefly identified, all objects resume
their identical appearance and move randomly. The observers’
task is to keep track of the ones that had been designated as
targets at the start
 After a period of 5-10 seconds the motion stops and observers
must indicate, using a mouse, which objects are the targets
Keep track of the objects that flash
How do we do it? What properties of
individual objects do we use?
Keep track of the objects that flash
Our explanation is that FINST indexes are bound to
targets when they flash and remain bound for the
duration of the trial. At the end of the trial they allow
attention to be moved to each target so that it can be selected
FINST indexes allow selected objects to be accessed
directly and without searching for specific properties:
Indexes stay bound to objects as the objects move
If you were like the cartoon character Plastic Man and could
place your fingers on things in the world so as to refer to them
uniquely, and if you could then move your gaze or attention to
them, you would possess FINgers of INSTantiation (or FINSTs)
End of aside on FINSTs!
Summary




The FINST mechanism provides a limited set of indexical
pointers bound to perceived objects
FINSTs can associate perceived objects with objects of thought
The binding is stable over some period of time (e.g., a few
seconds) and continues despite motion of the objects or eye
movements.
Perception is able to treat the indexed objects as though they
were perceptually marked
Examples of the projection hypothesis


To illustrate how the projection hypothesis works, first
consider index-based projection in the visual modality,
where indexes can convert some apparently mental-space
phenomena into perceived-space phenomena (although I
will return to the non-visual case shortly, the visual case is
more salient and tends to dominate other modalities)
Examples from some ‘mental imagery’ experiments
 Mental scanning (Kosslyn, 1973)
 Mental image superposition (Podgorny & Shepard, 1978)
 Visual-motor adaptation (Finke, 1979)
 S-R compatibility to imagined locations (Tlauka, 1998)
Studies of mental scanning
Often cited to suggest that representations have metrical properties
[Figure: memorized map with marked places (beach, windmill, steeple, tree, tower); graph of time to “see” a feature on the image as a function of distance on the image]
Brain image or index-based projection?
 A way to do this task:
 Associate places on the imagined map with places in
the world that you perceive
 Move your attention or gaze from one place to
another as they are named
Using a perceived room to anchor
FINSTs tagged with map labels
Using vision with selected ‘labeled’ objects
 If you ‘project’ the pattern of map places by picking out objects in
the room in front of you that correspond roughly to these memorized
locations, then you can scan attention from one such marked object
to another. The space here is real and the equation time = distance ÷
speed is a physical principle, not tacit knowledge about the world.
You can also use the tagged objects to infer configurational
properties you may not have noticed, despite somehow memorizing
the location of all objects
 Which 3 or more places on the map are collinear?
 Which place on the map is furthest North, South, East, West?
 Which 3 places form an isosceles triangle?
 Such configurational consequences can be detected as opposed to
logically inferred, so long as they involve only a few places, because
the visual system can examine a scene with labeled indexed objects
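The scanning and configuration questions on this slide can be illustrated with a small Python sketch. Everything specific in it is an assumption made for illustration: the coordinates of the tagged room objects, the constant scanning speed and the function names. The coordinates stand in for the real room; on the projection hypothesis these relations are read off the perceived scene by vision rather than computed from stored numbers.

```python
from itertools import combinations
from math import dist

# Hypothetical locations (in metres) of room objects tagged with map labels.
places = {
    "beach": (0.0, 0.0),
    "windmill": (1.0, 1.0),
    "steeple": (2.0, 2.0),
    "tree": (3.0, 0.5),
    "tower": (0.5, 3.0),
}


def scan_time(a, b, speed=2.0):
    """time = distance / speed for moving attention from place a to place b."""
    return dist(places[a], places[b]) / speed


def collinear_triples(tol=1e-9):
    """Return the triples of places whose triangle has (near) zero area."""
    found = []
    for (n1, p1), (n2, p2), (n3, p3) in combinations(places.items(), 3):
        # Twice the signed area of the triangle p1, p2, p3.
        area2 = (p2[0] - p1[0]) * (p3[1] - p1[1]) - (p3[0] - p1[0]) * (p2[1] - p1[1])
        if abs(area2) <= tol:
            found.append((n1, n2, n3))
    return found


def northernmost():
    """Return the place with the largest y coordinate (y is taken as North)."""
    return max(places, key=lambda name: places[name][1])


print(scan_time("beach", "windmill"), scan_time("beach", "steeple"))
print(collinear_triples())
print(northernmost())
```

With these made-up locations, scanning from the beach to the steeple takes twice as long as scanning to the windmill simply because the distance is twice as great, which is the linear time-distance relation reported in the scanning studies.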
Another example of a result attributable to FINST-based projection: Podgorny-Shepard experiment
Remember the following pattern and imagine it after it is gone
Are the following dots on or off the imagined pattern?
The pattern of reaction times is the same for
perceived shapes as for recalled shapes
 Both when the F display is seen and when the F is imagined,
the time to judge that the dot was on the F was fastest when
the dot was at the vertex of the F and slower when it was on
an arm of the F (slowest when it was one square away).
 Does this show that the F and dots are superimposed on a
display in the brain and perceived with the visual system?
 A more plausible explanation is that the cells corresponding to
rows and columns of the F in the matrix are indexed and thus
made distinct, allowing vision to be used to judge whether the
dots fall on those rows/columns.
Skip?


Perceptual-motor adaptation to
imagined hand position (Finke, 1979)
If you wear prism displacing lenses and repeatedly reach for
objects in front of you for just a few minutes, you adapt to the
erroneous feedback. When the lenses are removed you
overshoot in the opposite direction.
If, instead of wearing lenses, you move your hand invisibly
while you imagine that your hidden hand is at the displaced
location, you get the same adaptation phenomena
 Does this show that both your imagined hand and other properties of
the scene are displayed somewhere in your visual system?
All you need are indexes to several objects in the visual
scene, together with a distinct label for each (e.g., hand,
block). This allows attention or even gaze to move to them.
 No visual details (e.g. hand properties) need to be imagined
 Some real visual objects (e.g., texture) need to be visible to bind
indexes – just a blank background will not work (cf. Rossetti)
S-R Compatibility effect with a visual display
The Simon effect: It is faster to make a response in the
direction of an attended object than in another direction
Response for A is faster when YES is on the left in these displays
S-R Compatibility effect with a recalled (mental) display
The same RT pattern occurs for a recalled display as for a perceived one
RT is faster when the A is
recalled (imagined)
as being on the left
In all these examples you only need to index a
few visual objects located in appropriate places
 In all examples that we have seen, the results can be
explained without appealing to a global spatial display,
by assuming that:
1. Vision can index a few visible objects (including
texture elements on an apparently plain surface) and
2. Vision can treat indexed objects as distinct or visually
labeled
Reminder of the constraints to be met by
a system of spatial representations
1. Represent magnitudes
2. Represent configurations
3. Capture connectedness and continuity
4. Represent spatial relations across modalities and be
able to engage the motor system
5. Represent 3D distances and relations
By anchoring mental particulars to a few perceived
objects in a scene, the visual system is able to
exploit the above properties of the perceived world
Visual indexes can anchor spatial representations
to a scene containing visual objects: But how
does this work without vision (e.g., in the dark)?

We must rely on our remarkable capacity to orient to (point to,
navigate towards, …) perceived and recalled objects (including
proprioceptive ‘objects’) in space without vision
 Call this general capacity our location- or spatial-sense

How can the projection hypothesis account for this apparently
world-centered spatial sense without assuming a global
allocentric frame of reference?
 Answer: Just as it does with vision, by binding represented objects
to (non-visually) perceived objects in the world
Indexing non-visual ‘objects’ must exploit auditory and general
proprioceptive signals, and perhaps even preparatory motor
programs (Andersen & Buneo, 2002; Duhamel, Colby & Goldberg, 1992)
The real problem of our sense of space


In order to solve the problem of how we index generalized
‘objects’ in the world using proprioceptive inputs we need to
solve the problem of how we recognize two such inputs as
corresponding to (reaching) the same object in space
This is the problem of computing the equivalence of
movements, or of proprioceptive inputs, that correspond to
reaching the same object. Solving this problem requires
solving the problem of coordinating signals between different
afferent and efferent frames of reference

That’s why mechanisms of coordinate transformation are
of central importance – they make it possible to compute
the relevant equivalence classes

Such mechanisms are ubiquitous in PPC, SC and elsewhere
Proprioception, coordinate transformations
and the allocentric frame of reference
 Coordinate transformations provide the basis for computing the
equivalence classes of proprioceptive signals {S} associated with
reaching or sensing individual objects in space (S ≡ S′ iff there is
an appropriate coordinate transformation from S to S′)
 Because of the ability to compute the set {S} corresponding to
reaching/sensing places in the world, proprioception is able to
provide allocentric information (cf. Rossetti’s point that we
should not equate proprioception with egocentric and vision with
allocentric frames of reference)
 Computing {S} is the problem that Henri Poincaré recognized as
central to understanding our sense of space (see Poincaré’s “Why
space has three dimensions” in Dernières Pensées, 1913). Without
this we could not reach objects in the dark or from memory!
Coordinate transformations are the basis for
the illusory “global frame of reference”


A coordinate transformation operation takes a
representation of an object relative to one coordinate system
– say retinal coordinates – and produces a representation of
that object relative to another frame of reference – say
relative to the location of a hand in proprioceptive or
kinematical coordinates.
An important consequence of these mechanisms is that, as
Colby & Goldberg (1999, p. 319) put it, “Direct sensory-to-motor
coordinate transformation obviates the need for a single
representation of space in environmental coordinates”
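A deliberately simplified Python sketch of such an operation follows. It assumes a 2D, translation-only toy model in which the fixation point and the hand position are both known relative to the body; the function names and the numbers are hypothetical. It shows a gaze-centred location being re-expressed directly in hand-centred terms, and a test of whether two signals in different frames correspond to the same object (S ≡ S′), without any map in environmental coordinates.

```python
def gaze_to_hand(target_in_gaze, gaze_origin, hand_origin):
    """Re-express a gaze-centred location relative to the hand.

    gaze_origin and hand_origin are the fixation point and the hand position
    in a shared body-centred frame (an assumption of this toy model).
    """
    return tuple(
        t + g - h for t, g, h in zip(target_in_gaze, gaze_origin, hand_origin)
    )


def same_object(sig_gaze, gaze_origin, sig_hand, hand_origin, tol=1e-6):
    """S and S' are equivalent if transforming S into the frame of S' yields S'."""
    converted = gaze_to_hand(sig_gaze, gaze_origin, hand_origin)
    return all(abs(c - s) <= tol for c, s in zip(converted, sig_hand))


# Hypothetical values: a target 10 right and 5 up of fixation, fixation at
# (20, 0) and the hand at (25, -10), both relative to the body.
seen = (10, 5)
fixation = (20, 0)
hand = (25, -10)

print(gaze_to_hand(seen, fixation, hand))
print(same_object(seen, fixation, (5, 15), hand))
```

Real transformations involve rotations, gain modulation and noisy signals; the sketch keeps only the structural point that the conversion runs from one egocentric frame to another.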
The spatial sense and FINST Indexes

Not all points in a representation need to be converted. As in
the visual case, only a few equivalence classes,
corresponding to a few objects in the world, need to be
computed at any one time.
 This idea is closely related to the conversion-on-demand
hypothesis proposed by Henriques et al. (1998) to explain
how open-loop reaching can be carried out during eye
movements using gaze-centered coordinates
○ In the Henriques et al. proposal visual information is held in
a gaze-centered frame of reference and objects are
converted to motor coordinates only when needed, but the
details are not essential here
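The general idea of converting only what is needed, when it is needed, can be sketched as follows. This is a loose illustration and not the Henriques et al. model: the class name `GazeCentredStore`, the translation-only geometry, the body-relative fixation and hand positions, and all the numbers are assumptions. Locations are kept relative to fixation, they are shifted when gaze moves, and only a selected object is converted to hand-centred coordinates.

```python
class GazeCentredStore:
    """Toy store of object locations held relative to the current fixation."""

    def __init__(self, gaze_origin):
        self.gaze_origin = gaze_origin  # fixation point relative to the body
        self.items = {}                 # label -> location relative to fixation
        self.conversions = 0            # how many conversions were carried out

    def see(self, label, location_in_gaze):
        self.items[label] = location_in_gaze

    def shift_gaze(self, new_origin):
        """Update stored locations when the eyes move to a new fixation."""
        dx = new_origin[0] - self.gaze_origin[0]
        dy = new_origin[1] - self.gaze_origin[1]
        self.items = {k: (x - dx, y - dy) for k, (x, y) in self.items.items()}
        self.gaze_origin = new_origin

    def to_motor(self, label, hand_origin):
        """Convert one selected object to hand-centred coordinates on demand."""
        self.conversions += 1
        x, y = self.items[label]
        gx, gy = self.gaze_origin
        hx, hy = hand_origin
        return (x + gx - hx, y + gy - hy)


store = GazeCentredStore(gaze_origin=(20, 0))
store.see("cup", (10, 5))
store.see("lamp", (-30, 12))
store.see("book", (4, -8))

hand = (25, -10)
before = store.to_motor("cup", hand)   # only the cup is converted
store.shift_gaze((30, 5))              # an eye movement intervenes
after = store.to_motor("cup", hand)
print(before, after, store.conversions)
```

Three objects are stored but only the one that is reached for is ever converted, and the reach target comes out the same before and after the eye movement.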
Generalized FINST Indexes


According to the projection hypothesis, the objects that
are transformed are ones that have been selected and
assigned a reference index, as postulated in Perceptual
Index Theory (call them ‘generalized FINSTs’).
With these indexes we can anchor a few objects in
perceptual or imagined representations to objects in real
space using proprioceptive signals, just as in the visual
examples discussed earlier
 This is what we need in order to explain the spatial
character of spatial thoughts and the stable character of
perceived space as argued in the visual case
CONCLUSION:
How many spatial frames of reference are there?
There are many coordinated frames of reference
and many topographical spatial layouts in the
brain, but the only frame of reference that is
global and allocentric is the one outside our head
– the real space to which we have only selective
indexical access
PS: Must there always be some perceived
objects for there to be a spatial sense?


A prediction of the projection hypothesis is that in the absence of any
perceived ‘objects’ there would be no spatial sense and therefore that
none of the findings demonstrating the spatial character of representations
(e.g., the ‘mental imagery’ experiment results) would be observed
I know of no data involving a total lack of sensory objects, but the
following results are suggestive:
 In the absence of visual objects, as in the Ganzfeld (Avant, 1965),
orientation and eye movements become uncoordinated, so one might
reasonably expect poor spatial coordination with no perceived objects
 Auditory localization is better when there is structured visual input
(Warren, 1970) or auditory landmarks (Dufour & Despres, 2002), suggesting
that concurrent perception of things in space is necessary for
orientation
 Sensory deprivation (while extreme) also leads to disorientation
The End
…and an appeal for help…
 Does anyone know of evidence relevant to the question whether
typical spatial sense skills are manifested in the absence of
structured perceptual input of any kind?
 Typical spatial skills might include being able to solve geometry
problems by constructing figures in your head
 A more direct test might be to see if deafferented patients tested in
the dark have impaired spatial skills, but I have seen no data on this