Spatial representation in the mind/brain: Do we need a global topographical map?

Zenon Pylyshyn, Rutgers Center for Cognitive Science and Institut Jean Nicod

 What is special about representation of space in perception and thought?

  Do we need a single global spatial representation?

Do we need a topographical display in the brain?

Workshop on Frames of Reference Paris, November 17-19, 2005

What is special about spatial representation?

I have suggested (Pylyshyn, 1973) that no convincing reason has been given why a form of representation adequate for general knowledge (i.e., a Language of Thought) cannot also serve for encoding the content of spatial representations. The difference between representing spatial relations and representing other contents may lie in their being different topics requiring a different conceptual vocabulary, but they may not require a different format or medium of representation. Why can’t spatial content be encoded in a first-order calculus using Cartesian coordinates (a minimal sketch follows below)?

 Is it just that it conflicts with our conscious experience?
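To make the proposal concrete, here is a minimal sketch (my illustration, not anything from the talk; the scene, predicates and coordinates are invented) of how spatial content could in principle be encoded propositionally, as first-order predicates evaluated over Cartesian coordinates:

```python
# Hypothetical illustration: spatial content held as facts and predicates
# over Cartesian coordinates, with no spatial medium or display involved.
from math import dist  # Euclidean distance (Python 3.8+)

# A "scene" is just a set of facts: object name -> (x, y) coordinates.
scene = {"lamp": (0.0, 0.0), "desk": (3.0, 4.0), "door": (6.0, 8.0)}

def left_of(a, b):
    # LEFT-OF(a, b) holds iff a's x-coordinate is smaller than b's.
    return scene[a][0] < scene[b][0]

def between(a, b, c, tol=1e-9):
    # BETWEEN(a, b, c): a lies on the segment b-c iff the distances add up.
    return abs(dist(scene[b], scene[a]) + dist(scene[a], scene[c])
               - dist(scene[b], scene[c])) < tol

print(left_of("lamp", "desk"))          # True
print(between("desk", "lamp", "door"))  # True
```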

 The problem with the general-LOT proposal is that it fails to account for certain psychophysical phenomena that are observed when vision and spatial reasoning are actively engaged in solving problems or in planning actions – i.e., when spatial representations are constructed in working memory.

Spatial representation during perception and reasoning

The impression that spatial representations are different from other kinds of representations is usually associated with examples from perception and spatial reasoning. In these contexts, as opposed to long-term-memory storage, there is reason to think that such representations are different in several ways. I have suggested several such differences (Pylyshyn, 1978) – e.g., working-memory contents typically involve relationships among tokens and contain no quantifiers or negation. For example, the universally quantified claim (x)F(x) is represented by a finite set of x’s, each of which has the property F(x) (i.e., all circles are red is represented by a set of circles each of which is red). In the present talk I will focus on another way that such representations are special – in the way they encode space. Because these representations are not tied to vision, and do not even require a visual cortex or accompanying conscious experience, they are best referred to as spatial representations rather than mental images.
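To illustrate the ‘no quantifiers’ point above, here is a minimal sketch (my example, not Pylyshyn’s formalism) contrasting a quantified claim with its working-memory-style token encoding:

```python
# Hypothetical illustration: the universally quantified claim (x)F(x)
# ("all circles are red") is not stored as a quantified formula; it is
# represented by a finite set of tokens, each of which bears the property.
tokens = [
    {"kind": "circle", "color": "red"},
    {"kind": "circle", "color": "red"},
    {"kind": "circle", "color": "red"},
]

# No quantifier or negation is stored anywhere; the "universal" holds only
# in the sense that every token in the set happens to have the property.
print(all(t["color"] == "red" for t in tokens))  # True
```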

What are some constraints on a theory of spatial representation?

 First I will attempt to tease out some functional requirements that may apply to a system for representing space and spatial relations in perception and especially in spatial reasoning
 These requirements may explain why people often assume that there is a unified global frame of reference for vision and spatial reasoning that is implemented as a spatial display in the brain
 These requirements also serve to introduce an alternative proposal that meets the conditions without assuming a global spatial display

Some conditions on a system of codes for representing spatial relations (1)

1. The system must be able to represent magnitudes
Psychophysical evidence shows that we have encodings of magnitudes (at least relative magnitudes) and that the magnitudes that are encoded (i.e., the semantics of the codes) have a particular systematic effect in reasoning (e.g., scalar variance, Fechner’s law, the symbolic distance effect, etc.). This suggests that the codes themselves must have properties that explain these systematic magnitude effects (which would not be the case if the magnitudes were encoded as numerals)
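As a hedged illustration of what it would mean for the codes themselves to carry magnitude properties, here is a sketch (entirely my own; the Weber fraction and trial counts are invented) of an analog code with scalar variability, which produces a symbolic-distance-like effect that an encoding in exact numerals would not:

```python
# Hypothetical sketch: analog magnitude codes with scalar variability.
# Each magnitude is encoded as a noisy sample whose spread grows with the
# magnitude; comparison accuracy then falls as two magnitudes get closer,
# a symbolic-distance-style effect. Exact numeral codes would show no
# such gradient.
import random

WEBER = 0.15  # invented Weber fraction, for illustration only

def encode(magnitude):
    # Noisy analog code: standard deviation proportional to magnitude.
    return random.gauss(magnitude, WEBER * magnitude)

def p_correct_greater(a, b, trials=10_000):
    # How often the noisy codes preserve the true order a > b.
    return sum(encode(a) > encode(b) for _ in range(trials)) / trials

print(p_correct_greater(10, 5))  # far apart: nearly always correct
print(p_correct_greater(10, 9))  # close together: many more errors
```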

Some conditions on a system of codes for representing spatial relations (2)

2. The system must represent stable spatial configurations
 Spatial configurations involve relations over multiple objects – in that sense they are holistic and require simultaneous access to multiple objects (multiple arguments in relational predicates must be simultaneously bound)
 What is special about such configurations is that they may allow some spatial ‘inferences’ by pattern lookup, without reference to independent geometrical axioms (see “Using space to represent spatial properties” later).

Some conditions on a system of codes for representing spatial relations (3)

3. The system must somehow ‘capture’ the continuity and connectedness of space. This leaves many unanswered questions:

 Does continuity entail that empty places are represented as such?

 Does continuity entail that the representational system itself determines that distances meet metrical axioms (e.g., the triangle inequality AB + BC ≥ AC) or that they are Euclidean?

 Does continuity entail that the representation of movements of objects is constrained so that in getting from A to B objects must pass through intermediate locations?

The proposal I will present later gives a partial answer to these questions.

Some conditions on a system of codes for representing spatial relations (4)

4. The system must represent spatial properties across modalities, including proprioception and the motor system
 It must be possible for a pattern such as SQUARE(w,x,y,z) to involve objects in different modalities
 Spatial representations must be able to engage the motor system in a fairly direct manner
○ One of the characteristics of what we call a ‘spatial representation’ is that we can ‘point to’ represented things (e.g., in our mental image).

 But note that motor actions towards perceptual and imagined representations are not identical because they engage different perceptual-motor systems (Goodale et al. 1994)

Some conditions on a system of codes for representing spatial relations (5)

5. The system must be able to represent spatial relations in 3D
 When relations in depth are encoded, they must be in a format similar to the encoding of relations in the plane, since the two have to operate together
 Experimental evidence from such mental imagery phenomena as ‘mental rotation’ and ‘mental scanning’ shows identical functions in depth as in the plane

Summary of constraints to be met: A system of spatial representations must somehow do the following:

1. It must represent magnitudes
2. It must represent holistic configurations which enable at least some direct one-step inferences (by pattern-matching)
3. It must capture connectedness and continuity
4. It must represent spatial relations seamlessly across modalities and engage the motor system
5. It must represent distances in depth as well as in the plane in a uniform manner (i.e., it must represent 3D)
 I will return to these constraints when I discuss a different proposal for how we ‘represent’ space

Two additional common assumptions about spatial representation

The foregoing list of constraints has frequently led people to make two assumptions about spatial representation that I will argue are not justified:

1. The single frame of reference assumption is the assumption that when we represent spatial layouts in perception or in thought we do so in a single global frame of reference, as opposed to a patchwork of distinct but coordinated frames
 Our conscious awareness of spatial layouts suggests a single frame of reference, but like many properties of conscious awareness this may be illusory

2. The holism/stability assumption is the assumption that when we represent spatial layouts in perception or thought, the representation simultaneously contains a large number of objects and properties in a stable spatial configuration

Why an inner display for vision?

In vision the spatial-display theory was meant to explain why our visual experience is panoramic and stable even though the visual inputs are highly local, partial and constantly changing. But many studies have shown that there is no such rich, stable, panoramic display (e.g., change blindness, superposition, etc.; see O’Regan, 1992)

Why an inner display for spatial reasoning?

The spatial-display theory was also meant to explain how a mental representation can meet the spatial conditions listed earlier – by creating a 2D image in a real spatial medium. Such a display was assumed to use the same global 2D spatial medium that is used in vision. But both display assumptions have serious problems.

The global spatial display assumption

 There are many deep problems with the assumption that spatial properties are represented in vision and reasoning by an inner spatial display which corresponds to our experience of a stable world (perceived or imagined), many of which I have discussed in connection with the ‘picture theory’ of mental imagery (Behavioral and Brain Sciences, 2002)
 One of the main problems relevant to the present discussion is the assumption that visual perception, cross-modal spatial integration, visuomotor control, and spatial reasoning derive from a single representation in an allocentric reference frame
 There are many reasons to doubt that there is a unified global frame of reference for representing spatial information

Reasons to reject the Master Map assumption

 There are many known frames of reference between perception and motor control, relying on both external and internal sensors
 While gaze-centered coordinates are common in motor control, they are gain-modulated by inputs from eye, head and body positions as well as by motor intentions (Andersen & Buneo, 2002; Duhamel et al., 1992)
○ Visual information is also represented in hand- and body-centered frames of reference (Làdavas, 2002)
○ The neglect syndrome appears in many different frames of reference
 Motor control necessarily involves many different frames of reference, including joint-angle, proprioceptive, kinesthetic, and even frames that depend on groups of spindle bundles
 Earlier (upstream) frames of reference are often not overwritten but may continue to have observable consequences in perceptual-motor coordination and in errors in kinesthetically-guided motion (Baud-Bovy & Viviani, 1998), so multiple frames continue to exist in the nervous system

A different way of approaching the question of spatial representation

 Based on such problems with the global spatial display assumption, I have proposed a provisional hypothesis that preserves some of the advantages of the global spatial display, but assumes that the relevant spatial properties are in the perceived world and can be accessed if we have the right access mechanisms for selecting and indexing objects in the perceived world
 For ease of reference let us call this the Projection Hypothesis, because it is as though the spatial display were projected onto the real space we perceive (though with only objects’ identities and locations, and none of their other visual properties)

The projection hypothesis

 The projection hypothesis relies on the spatial properties of the concurrently perceived world to meet the 5 conditions outlined earlier. It rests on two theoretical postulates:
1. We have a system of “pointers” (such as the FINST perceptual index mechanism to be described later) by which a small number of perceived objects in the world can be selected and indexed. Indexes provide a fixed reference to their targets despite changes in the targets’ locations
2. When we perceive a scene that contains indexed objects, our perceptual system is able to treat those selected objects as though they were assigned unique labels. Thus our perceptual system is able to detect novel configurational properties among these indexed objects.
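As a schematic rendering of these two postulates (my sketch; nothing here is claimed about neural implementation, and the pool size is just the usual FINST estimate of four or so), an index behaves like a pointer to an object token rather than a stored description:

```python
# Hypothetical sketch of the two postulates: a small pool of FINST-like
# indexes bound to perceived objects. An index is a pointer to an object
# token, not an encoding of its properties, so it keeps referring to its
# target even as the target's location changes.
MAX_INDEXES = 4  # roughly four or five, per the FINST literature

class ObjectToken:
    def __init__(self, x, y):
        self.x, self.y = x, y  # current location, free to change

class Index:
    def __init__(self, target):
        self.target = target  # direct reference to the object token

pool = []

def bind(obj):
    if len(pool) >= MAX_INDEXES:
        raise RuntimeError("no free indexes")  # selection is sharply limited
    pool.append(Index(obj))

a, b = ObjectToken(0, 0), ObjectToken(5, 5)
bind(a); bind(b)
a.x, a.y = 9, 2          # the object moves...
print(pool[0].target.x)  # ...but the index still reaches it: prints 9
```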

Aside on FINST indexes

 Because FINST Indexes play a central role in this story I will make a short detour to illustrate this mechanism and to give some examples of indexes at work

Pick out 3 dots I will cue and keep track of them  

After you pick out the 3 cued dots, I’ll ask you to move your attention from the center one. Describe the new relation among the three dots.

In a field of identical elements you can select several of them and move your attention among them (e.g., “move one up” or “move two right”, etc.) so long as at no time do you have to hold on to more than 4 dots (Intriligator & Cavanagh, 2001)

In making relational judgments you must select and keep track of several objects at once

When we judge that certain objects are collinear, we must first pick out the relevant objects while ignoring all their properties except their location. Such picking out and referring are the basic functions of FINST Indexes

Several objects must be picked out at once in making relational judgments

 In making relational judgments such as inside or on-the-same-contour you must pick out the relevant individual objects first. Are the dots inside the same contour? On the same contour?

Other experimental demonstrations of FINST indexes

 Recognizing the cardinality of small sets of things: subitizing vs. counting (Trick, 1994)
 Searching through subsets – selecting items to search through (Burkell, 1997)
 Selecting subsets and maintaining the selection during a saccade (Currie, 2002)
 Multiple Object Tracking (MOT)

Subset selection for search

[Figure: single-feature and conjunction-feature search displays; target = +]

Burkell, J., & Pylyshyn, Z. W. (1997). Searching through subsets: A test of the visual indexing hypothesis. Spatial Vision, 11(2), 225-258.

Subset search results:

 Only properties of the subset matter
 If the subset is a single-feature search, it is fast and the slope (RT vs. number of items) is shallow
 If the subset is a conjunction search set, it takes longer and is more sensitive to the set size
 The distance between targets does not matter, so observers don’t seem to be scanning the display looking for the target but can switch their attention directly to the subset items

Selective search is also found when a saccade occurs between the late-onset cues and the start of search

[Figure: search displays with a saccade intervening between cue onset and search; single-feature and conjunction-feature conditions, target = +]

Even with a saccade between selection and access, items can be accessed efficiently

Demonstrating the function of FINSTs with Multiple Object Tracking (MOT)

 In a typical MOT experiment, 8 simple identical objects are presented on a screen and 4 of them are briefly distinguished in some visual manner – usually by flashing them on and off
 After these 4 targets are briefly identified, all objects resume their identical appearance and move randomly. The observers’ task is to keep track of the ones that had been designated as targets at the start
 After a period of 5-10 seconds the motion stops and observers must indicate, using a mouse, which objects are the targets

Keep track of the objects that flash

How do we do it? What properties of individual objects do we use?

Keep track of the objects that flash

Our explanation is that FINST indexes are bound to targets when they flash and remain bound for the duration of the trial. At the end of the trial they allow attention to be moved to each target in turn so that the targets can be selected

FINST indexes allow selected objects to be accessed directly and without searching for specific properties: Indexes stay bound to objects as the objects move
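A minimal simulation of this account (my sketch; the motion parameters, durations and display size are invented for illustration) shows why no properties need to be stored or searched for at the end of the trial: the indexes simply are references to the target tokens:

```python
# Hypothetical MOT sketch: 8 identical objects, 4 flashed as targets.
# FINST-style indexes are bound to the target tokens at the start; since
# an index points at its object rather than storing its appearance or
# location, tracking survives random identical motion with no final search.
import random

class Obj:
    def __init__(self):
        self.x, self.y = random.uniform(0, 100), random.uniform(0, 100)
    def drift(self):
        self.x += random.uniform(-2, 2)
        self.y += random.uniform(-2, 2)

objects = [Obj() for _ in range(8)]
targets = objects[:4]     # the 4 that were flashed at the start
indexes = list(targets)   # binding = keeping direct references

for _ in range(300):      # several seconds of random, identical motion
    for o in objects:
        o.drift()

# At the end of the trial the indexed objects just *are* the targets.
print(all(ix is t for ix, t in zip(indexes, targets)))  # True
```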

If you were like the cartoon character Plastic Man and could place your fingers on things in the world so as to refer to them uniquely, and if you could then move your gaze or attention to them, you would possess FINgers of INSTantiation (FINSTs)!

End of aside on FINSTs!

Summary

 The FINST mechanism provides a limited set of indexical pointers bound to perceived objects
 FINSTs can associate perceived objects with objects of thought
 The binding is stable over some period of time (e.g., a few seconds) and continues despite motion of the objects or eye movements.

 Perception is able to treat the indexed objects as though they were perceptually marked

Examples of the projection hypothesis

 To illustrate how the projection hypothesis works, first consider index-based projection in the visual modality, where indexes can convert some apparently mental-space phenomena into perceived-space phenomena (although I will return to the non-visual case shortly, the visual case is more salient and tends to dominate other modalities)
 Examples from some ‘mental imagery’ experiments:
 Mental scanning (Kosslyn, 1973)
 Mental image superposition (Podgorny & Shepard, 1978)
 Visual-motor adaptation (Finke, 1979)
 S-R compatibility to imagined locations (Tlauka, 1998)

Studies of mental scanning

Often cited to suggest that representations have metrical properties

[Figure: Kosslyn’s fictional map with landmarks (tree, tower, beach, windmill, steeple) and a plot of scanning time against distance on the image]

Brain image or index-based projection?

 A way to do this task:
 Associate places on the imagined map with places in the world that you perceive
 Move your attention or gaze from one place to another as they are named

Using a perceived room to anchor FINSTs tagged with map labels

Using vision with selected ‘labeled’ objects

 If you ‘project’ the pattern of map places by picking out objects in the room in front of you that correspond roughly to these memorized locations, then you can scan attention from one such marked object to another. The space here is real, and the equation time = distance / speed is a physical principle, not tacit knowledge about the world.
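A worked version of this point (my numbers; the anchor positions and scanning speed are invented): once the map places are anchored to real objects, the linear time/distance function follows from physics alone:

```python
# Hypothetical illustration: scanning attention between real anchored
# objects. Here time = distance / speed is a fact about the room, not a
# property of any inner metric medium.
from math import dist

anchors = {"beach": (0.0, 0.0), "tree": (1.5, 2.0), "tower": (4.5, 6.0)}
ATTENTION_SPEED = 2.5  # invented constant, arbitrary units per second

def scan_time(src, dst):
    return dist(anchors[src], anchors[dst]) / ATTENTION_SPEED

print(scan_time("beach", "tree"))   # 1.0  (distance 2.5)
print(scan_time("beach", "tower"))  # 3.0  (3x the distance, 3x the time)
```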

 You can also use the tagged objects to infer configurational properties you may not have noticed, even though you had somehow memorized the locations of all the objects:
 Which 3 or more places on the map are collinear?
 Which place on the map is furthest North, South, East, or West?
 Which 3 places form an isosceles triangle?

 Such configurational consequences can be detected, as opposed to logically inferred, so long as they involve only a few places, because the visual system can examine a scene with labeled indexed objects

Another example of a result attributable to FINST-based projection: the Podgorny-Shepard experiment

The pattern of reaction times is the same for perceived shapes as for recalled shapes

 Both when the F display is seen and when the F is imagined, the time to judge that the dot was on the F was fastest when the dot was at the vertex of the F and slower when it was on an arm of the F (slowest when it was one square away).

 Does this show that the F and dots are superimposed on a display in the brain and perceived with the visual system?

 A more plausible explanation is that the cells corresponding to rows and columns of the F in the matrix are indexed and thus made distinct, allowing vision to be used to judge whether the dots fall on those rows/columns


Perceptual-motor adaptation to imagined hand position

(Finke, 1979)

If you wear prism displacing lenses and repeatedly reach for objects in front of you for just a few minutes, you adapt to the erroneous feedback. When the lenses are removed you overshoot in the opposite direction.

If, instead of wearing lenses, you move your hand invisibly while you imagine that your hidden hand is at the displaced location, you get the same adaptation phenomena
 Does this show that both your imagined hand and other properties of the scene are displayed somewhere in your visual system?

All you need are indexes to several objects in the visual scene, together with a distinct label for each (e.g., hand, block). This allows attention or even gaze to move to them.
 No other visual properties need to be represented in order to create the discrepancy between felt and ‘seen’ (i.e., indexed) position that is required for adaptation to occur

S-R Compatibility effect with a visual display

The Simon effect: It is faster to make a response in the direction of an attended object than in another direction
Response for A is faster when YES is on the left in these displays

S-R Compatibility effect with a recalled (mental) display

The same RT pattern occurs for a recalled display as for a perceived one
RT is faster when the A is recalled (imagined) as being on the left

In all these cases you only need indexes to a few visual objects located in appropriate places

 In all the examples we have seen, the results can be predicted without appealing to a mental display, if you assume that:
1. You can index a few visible objects (including texture elements on an apparently plain surface), and
2. The visual system can treat indexed objects as distinct or visually labeled

Visual indexes can anchor spatial representations to a scene containing visual objects: But how does this work without vision (e.g., in the dark)?

 We must rely on our remarkable capacity to orient to (point to, navigate towards, …) perceived and recalled objects (including proprioceptive ‘objects’) in space without vision
 Call this general capacity our spatial sense
 How can the projection hypothesis account for this apparently world-centered spatial sense without assuming a global allocentric frame of reference?

Answer:

Just as it does with vision, by anchoring represented objects to (non-visually) perceived objects in the world

The spatial sense and the projection hypothesis

 Indexing non-visual ‘objects’ must exploit auditory and somatosensory signals, and perhaps even preparatory motor programs (the ‘intentional’ frame of reference proposed by Andersen & Buneo, 2002; Duhamel, Colby & Goldberg, 1992)
 Is there some special problem about somatosensory inputs that makes them different from visual inputs?

Is there a problem about somatosensory inputs providing ‘objects’ for anchoring the spatial sense?

 Unlike visual objects, the “objects” in the somatosensory modalities are not fixed in an allocentric frame of reference
 Notice that even in vision and audition, objects are always moving relative to the sensors, so representations must be updated to take account of movements (Andersen, 1999; Stricanne, Andersen & Mazzoni, 1996)
 Does the spatial sense entail a representation in a global allocentric frame of reference?
 Does coordinating between somatosensory and visual inputs require a single global representational frame of reference?

Some concrete examples of spatial skills that suggest a global frame of reference

 The assumption of a global spatial representation underlying our sense of space is suggested by such observations as your ability (not always very accurate) to do the following in the dark:
 Point to (or touch) a finger of your other hand
 Move your eye towards, or reach towards, a source of sound
 Reach towards where your hand was a second or so earlier
 Imagine a rectangle and point to where its vertices are in space
 Pick a random point on each side of the imagined rectangle and join pairs of points on opposite sides of the rectangle. Describe and point to where the newly drawn lines intersect.

 Look at the things in front of you and then turn around and point to the location of one of the objects you saw that is now behind you (as in the experiments by Attneave & Farrar, 1977)

How are indexes going to help with such examples?

 In order for the somatosensory case to work the way the purely visual case worked:
1. We need to specify how it is possible to index an object in space using somatosensory signals, and
2. We need to show that a limited number of selected (indexed) individuals are involved, as in the visual case.

What is the real problem of our sense of space?

 In order to solve the problem of how we index objects in the world using somatosensory inputs, we need to solve the problem of how we recognize two such inputs as corresponding to (reaching) the same thing in the world
 This is the problem of the equivalence of movements, or of proprioceptive inputs, corresponding to reaching the same object – the problem that Henri Poincaré recognized as the central problem of understanding our sense of space (Poincaré’s “Why space has three dimensions” in Dernières Pensées, 1913)
 Solving this problem requires solving the problem of coordinating signals across frames of reference
 That’s why mechanisms of coordinate transformation are of central importance – they generate the relevant equivalences!

Coordinate transformations are the basis for the illusory “global frame of reference”

A coordinate transformation operation takes a representation of an object relative to one coordinate system – say retinal coordinates – and produces a representation of that object relative to another frame of reference – say relative to the location of a hand in proprioceptive or kinematic coordinates
 Coordinate transformations can thus define equivalence classes of gestures and somatosensory inputs that correspond to reaching the same object in space
 They are also ubiquitous in the brain (especially in posterior parietal cortex and superior colliculus)
 Another important consequence of these mechanisms is that, as Colby & Goldberg (1999) put it, “Direct sensory-to-motor coordinate transformation obviates the need for a single representation of space in environmental coordinates” (p. 319)
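A concrete sketch of such an operation (my simplified 2D example; real sensorimotor transforms involve rotations and gain fields that are omitted here): re-expressing an object’s retina-centered coordinates in a hand-centered frame, given the current eye and hand positions:

```python
# Hypothetical 2D sketch of a coordinate transformation: retina-centered
# coordinates re-expressed in a hand-centered frame, given current eye and
# hand positions in body coordinates. (Rotations omitted for simplicity.)

def retinal_to_hand(obj_ret, eye_pos, hand_pos):
    # Object in body coordinates = eye position + retinal offset.
    obj_body = (eye_pos[0] + obj_ret[0], eye_pos[1] + obj_ret[1])
    # Hand-centered coordinates = body coordinates relative to the hand.
    return (obj_body[0] - hand_pos[0], obj_body[1] - hand_pos[1])

# Two quite different sensory situations (different eye positions and
# retinal locations)...
v1 = retinal_to_hand(obj_ret=(2, 1), eye_pos=(0, 0), hand_pos=(5, 0))
v2 = retinal_to_hand(obj_ret=(-3, 1), eye_pos=(5, 0), hand_pos=(5, 0))
# ...specify the same reach, i.e., they fall in the same equivalence class.
print(v1, v2, v1 == v2)  # (-3, 1) (-3, 1) True
```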

Coordinate transformations need not transform all points in a given frame of reference

 Coordinate transformations need not transform all points in a frame of reference, or even all sensory objects: only a few selected objects need to be transformed at any one time
 The computational complexity of coordinate transformations can be greatly reduced by transforming only selected objects
 This idea is closely related to the conversion-on-demand hypothesis proposed by Henriques et al. (1998) to explain how open-loop reaching can be carried out during eye movements using gaze-centered coordinates
○ In the Henriques et al. proposal, visual information is held in a gaze-centered frame of reference and objects are converted to motor coordinates only when needed, but the details are not essential here
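A sketch of the computational point (my illustration, not the Henriques et al. model itself; the transform and numbers are invented): a whole scene can be held in gaze-centered coordinates while only the few indexed objects are ever converted:

```python
# Hypothetical sketch of conversion-on-demand: many objects are held in
# gaze-centered coordinates, but only the currently selected (indexed)
# ones are transformed into motor coordinates when a reach is needed.

def gaze_to_motor(obj_gaze, gaze_dir, hand_pos):
    # Simplified 2D transform (no rotation): gaze frame -> hand frame.
    obj_body = (gaze_dir[0] + obj_gaze[0], gaze_dir[1] + obj_gaze[1])
    return (obj_body[0] - hand_pos[0], obj_body[1] - hand_pos[1])

# A whole scene held in gaze-centered coordinates...
scene_gaze = {f"obj{i}": (i, 2 * i) for i in range(100)}
selected = ["obj3", "obj7"]  # the few indexed objects

gaze_dir, hand_pos = (10, 0), (8, 4)
# ...but only the selected objects are converted: 2 transforms, not 100.
motor = {name: gaze_to_motor(scene_gaze[name], gaze_dir, hand_pos)
         for name in selected}
print(motor)  # {'obj3': (5, 2), 'obj7': (9, 10)}
```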

This completes the parallel with the visual case

 Coordinate transformations provide the basis for computing equivalence classes of somatosensory signals {S} relating to things in real space (S ≡ S′ iff there is a coordinate transformation from S to S′)
 As in the visual case, the evidence suggests that only a few such equivalence classes are computed, corresponding to a few distal objects in the world
 These objects are ones that have been selected and assigned a reference index, as postulated in Perceptual Index Theory (call them ‘generalized FINSTs’).

 With these few indexes we can anchor a few objects in perceptual representations or imagined representations to objects (filled places) in real space, which is what we require in order to explain the spatial character of spatial thoughts and the stable character of perceived space (as in the visual examples discussed earlier)

Summary: One or many spatial frames of reference?

There are many coordinated frames of reference and many ‘topographical spatial layouts’ in the brain, but the only frame of reference that is global and allocentric is the one outside our head – the real space to which we have only limited indexical access

Finally: Must there always be perceived objects for there to be a spatial sense?

A prediction of the projection hypothesis is that in the absence of any perceived ‘objects’ there would be no spatial sense, and therefore that none of the findings demonstrating the spatial character of representations (e.g., the ‘mental imagery’ experiments cited earlier) would be observed
 I know of no data involving a total lack of sensory objects, but the following results are suggestive:
 In the absence of visual objects, as in the Ganzfeld (Avant, 1965), orientation and eye movements become uncoordinated, so it is reasonable to expect poor spatial coordination with no perceived objects in any modality
 Auditory localization is better when there is structured visual input (Warren, 1970) or auditory landmarks (Dufour & Despres, 2002), suggesting that concurrent perception of things in space is necessary for orientation
 Sensory deprivation (while an extreme case) also leads to disorientation

The End …and an appeal for help…

 Does anyone know of evidence relevant to the question of whether typical spatial-sense skills are manifested in the absence of structured perceptual input of any kind?
 Typical spatial skills might include being able to solve geometry problems by constructing figures in your head
 A more direct test might be to see whether deafferented patients tested in the dark have impaired spatial skills, but I have seen no data on this

The End