Review: So far I have argued


What is thinking?
Thinking/reasoning is the process by which we go beyond the
information given (beyond what we see or are told)
● Distinguish between representations involved in the course of actively
reasoning and those that constitute “standing knowledge”
 Only active representations are referred to as thought. These are
sometimes viewed as being in “working memory” (STM).
 Such active representations take part in reasoning, solving problems, and
drawing inferences from standing knowledge.
 “Standing knowledge” is said to be in “long-term memory” (LTM).
● The BIG questions:
 Do we think in language (e.g., English), or
 Do we think in pictures, or
 Both, or neither?
Desiderata for a form of representation
● A format for representing thoughts must meet certain
conditions (Fodor & Pylyshyn, 1988):
1) The capacity to think is productive (there is no limit to how
many distinct thoughts the competence encompasses),
 Therefore thoughts are built from a finite set of concepts
2) The capacity to represent and to draw inferences is systematic: If
we have the capacity to think certain thoughts then we also have
the capacity to think other related thoughts.
3) Thoughts may be false, but they are not ambiguous to the thinker

When sentences are ambiguous it is because they express several possible
unambiguous thoughts.
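The productivity and systematicity desiderata can be made concrete with a toy sketch: a finite conceptual base plus one recursive combining operation yields an unbounded set of distinct "thoughts". This is only an illustration in Python (the concepts, predicates, and the embedding operator `John believes that …` are my choices, not from the source):

```python
from itertools import product

# Toy illustration (not the authors' formalism): a finite conceptual base
# plus one recursive combining operation yields unboundedly many thoughts.
CONCEPTS = ["snow", "the crow"]
PREDICATES = ["is white", "is black"]

def thoughts(depth):
    """All 'thoughts' expressible up to a given embedding depth.

    Atomic thoughts pair every concept with every predicate, so having
    'snow is white' and 'the crow is black' automatically brings
    'snow is black' with it (systematicity).  Each extra level embeds the
    previous thoughts under an attitude operator, so the set of distinct
    thoughts grows without bound (productivity).
    """
    atomic = [f"{s} {p}" for s, p in product(CONCEPTS, PREDICATES)]
    result = list(atomic)
    current = atomic
    for _ in range(depth):
        current = [f"John believes that {t}" for t in current]
        result.extend(current)
    return result

print(len(thoughts(0)))  # 4 atomic thoughts
print(len(thoughts(2)))  # 12: each embedding level adds 4 more
```

Note that the unlimited capacity comes from the combining operation, not from an unlimited stock of primitives, which is exactly the point of desideratum 1.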
Thinking in words
● Our experience of “thinking in words” is that of carrying on
an inner dialogue with ourselves. But consider a typical
fragment of such a dialogue. It conforms to Gricean
principles of discourse: Be maximally informative – don’t
say things that your audience already knows.
● Since our mental dialogue follows these maxims it is clear
that much is left unsaid. But according to the view that one
thinks in words, if it is unsaid then it is also unthought! Or,
conversely, if it was thought, it was not thought in words. It
was left to the imagination of the listener – but according to
this view there is no room for imagining something other
than what was said.
● Thoughts experienced as inner dialogue grossly
underdetermine what is thought, so words cannot be the
vehicle of thought.
As I sit here thinking about what I will say in this
lecture, I observe myself thinking,
“I’d better find a concrete example of this
or nobody will understand what I mean,
and then they certainly will not believe it!”
If this was my thought, then what did I mean by “example” or “this”?
And who was I referring to when I said “nobody”? Was there a
presupposition that I wanted to persuade someone?
Obviously I knew what I meant, but how was this knowledge
represented? Not in words since I cannot find it anywhere in my
consciousness. And if it was there in unconscious words, it would
still have the same properties of anaphora, ambiguity, presupposition,
and entailment since those are inherent in natural language.
Lingua Mentis
● The representation of thoughts needs to meet the four
conditions just listed (finite conceptual base,
productivity, systematicity, freedom from ambiguity)
● For that reason, thought requires a format similar to a
logical calculus (or LF). Call it the Language of
Thought (LOT), after Fodor’s famous 1975 book.
 This is not to say that reasoning cannot use other forms of
representation in addition to LOT.
 Because LOT appears ill suited to represent magnitudes, the
proposal that there is an additional (perhaps analog) form of
representation is attractive

But none proposed so far is satisfactory – perhaps because the
notion of an analog representation is ill-defined.
Representational and inferential systematicity
● Representational systematicity (Fodor & Pylyshyn, 1988) refers to the fact
that if you can think certain thoughts then you have the capacity to think
an indefinite number of other related thoughts: e.g., if you can think both
that snow is white and that crows are black then you have the concepts
snow, crow, white, and black which gives you the capacity also to think
snow is black and crows are white.
● Inferential or rule systematicity (Pylyshyn, 1984, Chapter 3) refers to the
fact that for representations to enter into rules, the representations must
have the relevant distinct constituents. For a rule of inference such as
“From P → Q and Not-Q infer Not-P” the parts P and Q have to be explicitly
recognizable. The same is true of if-then rules. Suppose a system’s
behavior is expressed by a pair of rules such as (1) if Q1 and Q2 hold, then
execute action A1, and (2) if Q1 and Q3 hold then execute action A2. The
three distinct conditions Q1, Q2 and Q3 must be constituents of a
representation of the state of the system to which a rule applies. The rule
could not be expressed by a representation that fuses the conditions, as
connectionist models do (with Qn≡F[Q1,Q2]).
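To see why rule application demands explicit constituents, here is a minimal rule interpreter (my sketch; the tuple encoding and the `modus_tollens` name are illustrative):

```python
# Toy rule interpreter (my sketch): a rule such as
#   "From P -> Q and Not-Q infer Not-P"   (modus tollens)
# can apply only if P and Q are explicit, recognizable constituents.
def modus_tollens(facts):
    """Return the new facts licensed by modus tollens."""
    inferred = set()
    for f in facts:
        if isinstance(f, tuple) and f[0] == "implies":
            _, p, q = f
            if ("not", q) in facts:
                inferred.add(("not", p))
    return inferred

facts = {("implies", "raining", "wet"), ("not", "wet")}
print(modus_tollens(facts))  # {('not', 'raining')}

# If the conditions are fused into single opaque symbols, as in the fused
# encoding Qn = F[Q1, Q2], no rule can decompose them:
fused = {"raining_implies_wet", "not_wet"}
print(modus_tollens(fused))  # set(): the constituents are not recoverable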
We often have strong experiences about the
steps we go through in solving a problem:
 But does that tell us how we solve it?
● Arnheim’s Visual Thinking (1969)
 Rudolf Arnheim claims that when people solve visual analogy problems,
they go through a sequence of mental states that involve a “rich and dazzling
experience” of “instability,” of “fleeting, elusive resemblances,” and of being
“suddenly struck by” some perceptual relationships. If this is true, does it
explain how we solve the problems?
● What steps do you go through in understanding language?
● How do you experience thinking about numbers?
 What is 9 + 7? What is 6 × 8? Which is larger, 379 or 397?
● Daniel Tammet, autistic savant
 Ramachandran’s Science video
 What does the description of Daniel’s experience tell us?
Thinking in pictures (or pictures + words)
● There is a large literature on scientific discovery that
credits images as the cause of the discovery (the benzene ring)
● Are pictures better than words for expressing thoughts
or for creating new thoughts? Why are images often
cited as the format for non-verbal or intuitive thoughts?
● To understand how pictures or words could serve as the
basis for encoding thoughts we need to understand the
assumptions underlying the claim that thoughts are
encoded in pictures or words.
 What’s missing is an understanding of the distinction between
form and content which itself rests on another distinction
central to cognitive science – the distinction between
architecture and representation (more on this later)
How well do you know your own mind?
The problem of experiential access to
mental contents and processes
● What does how things look tell you about the contents of your
mental representation?
 Must there be a representation corresponding to an appearance?
 What do the changing contents of your conscious experience tell you about
the changing representations that your mind goes through? Does it provide a
trace of the process?
● What do the contents of experience tell you about how you make
decisions or solve problems? (Example later)
● Does a description of your experience provide the basis for an
account that is explanatory?
Why suppose that thoughts might be
represented in the form of pictures?
● Over 65% of the cortex is devoted to vision
● Most of our knowledge of the world comes
through vision
● If we have a visual module, why not use it to
encode/decode thoughts?
There are many questions about what
goes on when we have the experience
of “seeing with the mind’s eye”
● Is mental imagery a special form of thought? If so, in what
way is it special?
 Are mental images sensory-based and modality-specific?
 Are mental images like pictures? In what respect?
 Are images different from other forms of thought? Do they, for
example, resemble what they represent?
● Does mental imagery use the visual system? If so, what does
that tell us about the format of images?
● Is there neurophysiological evidence for a pictorial “display”
in visual cortex?
 What if a display were found in human visual cortex?
● These questions will be addressed this week (and
maybe even next week)
● But if mental imagery is to be thought of as being
closely related to vision, we first need to ask some
questions about what vision is like.
● First we need to recognize that what drives the
imagery-vision parallel is their similar phenomenology,
yet the phenomenology of vision is very misleading.
The phenomenology of seeing (including its
completeness, its filled-out fine details, and its
panoramic scope) turns out to be an illusion!
● We see far less, and with far less detail and scope, than
our phenomenology suggests
● Objectively, outside a small region called the fovea, we
are colorblind and our acuity is so poor that we would count
as legally blind. The rest of the visual field is seriously distorted,
and even in the fovea not all colors are in focus at once.
● More importantly, we register surprisingly little of what
is in our field of view. Despite the subjective impression
that we can see a great deal that we cannot report, recent
evidence suggests that we cannot even tell when things
change as we watch.
What do we take in when we see?
● What we actually take in functionally depends on:
 Whether you are asking about the preconceptual
information or the conceptual (seeing-as) information
 Even pre-conceptual information is impoverished and built
up over time. We will see later that this consists primarily
in individuating and keeping track of individual objects
 Whether the information was attended or not.
Although unattended information is not entirely
screened out, it is certainly curtailed and sometimes
even inhibited.
Examples of attentional inhibition
● Negative Priming (DeSchepper & Treisman, 1996).
 Is there a figure on the right that is the same as the figure on the left?
 When the figure on the left is one that had appeared as an ignored
figure on the right, RT is long and accuracy poor.
 This “negative priming” effect persisted over 200 intervening trials
and lasted for a month!
The effect of attention
on whether objects are
perceived /encoded:
Inattentional Blindness
(Mack, A., & Rock, I. (1998).
Inattentional blindness.
Cambridge, MA: MIT Press)
Inattentional Blindness
● The background task is to report which of two arms of the + is
longer. One critical trial per subject, after about 3–4 background
trials. Another “critical” trial is presented as a divided-attention
control.
● 25% of subjects failed to see the square when it was presented in
the parafovea (2° from fixation).
● But 65% failed to see it when it was at fixation!
● When the background task cross was made 10% as large,
Inattentional Blindness increased from 25% to 66%.
● It is not known whether this IB is due to concentration of
attention at the primary task, or whether there is inhibition of
outside regions.
Where does this leave us?
● Given the examples of memory errors, should we conclude that
seeing is a process of constructing conceptual descriptions?
● Most cognitive scientists and AI people would say yes, although
there would be several types of exception.
 There remains the possibility that for very short durations (e.g.
0.25 sec) there is a form of representation very like visual
persistence – sometimes called an ‘iconic storage’ (Sperling, 1960).
 From a neuroscience perspective there is evidence of a neural
representation in early vision – in primary visual cortex – that is
retinotopic and therefore “pictorial.”
 Doesn’t this suggest that a ‘picture’ is available in the brain in vision?
 We shall see later that this evidence is misleading and does not support
a picture theory of vision or of visual memory
 A major theme of later lectures will be to show that an important
mechanism of vision is not conceptual but causal: Visual Indexes
 Many people continue to hold a version of the “picture theory” of
mental representation in mental imagery. More on this later.
Architecture and Process
● We now come to the most important distinction of all –
that between behavior attributable to the architecture
of a system and that attributable to properties of the
things that are represented. Without this distinction we
cannot distinguish between phenomena that reveal the
nature of the system and phenomena that reflect the
effects of external variables
● So here is an example to make the point
An illustrative example: Behavior of a mystery box
[Figure: plot of the mystery box’s output over time]
What does this behavior pattern tell us about the nature of the box?
The moral of this example:
Regularities in behavior may be due
to one of two very different causes:
1. The inherent nature of the system (its
relatively fixed structure), or
2. The nature of what the system
represents (what it “knows”).
The imagery debate
The main difference between picture theorists and the rest
of us (me) is in how we answer the following question:
● Do experimental findings on mental imagery (such as
those I will review) tell us anything about the properties
of a special imagery architecture? Or do they tell us
about the knowledge that people have about how things
would look if they were actually to see them (together
with some common psychophysical skills)?


 While these are the main alternatives, there are also other
reasons why experiments come out as they do
 Notice that the architecture alternative includes properties of
the format adopted in a particular domain of representation –
e.g., the Morse code used by the code-box in our example
Examples to probe your intuition
Imagine various events unfolding
before your “mind’s eye” –
● Imagine a bicyclist racing up a hill. Down a hill?
● Imagine turning a large heavy wheel. A light wheel.
● Imagine a baseball being hit. What shape trajectory does it trace
out? Where would you run to catch it?
● Imagine a coin dropping and whirling on its edge as it eventually
settles. Describe how it behaves.
● Imagine a heavy ball (a shot-put) being dropped at the same time
as a light ball (a tennis ball). Indicate when they hit the floor.
Repeat for different heights.
● Form a vivid auditory image of Alfred Brendel playing the Minute
Waltz so you hear every note clearly. How long will it take? Why?
What color do you see when two color
filters overlap?
Conservation of volume example
A basic mistake: Failure to distinguish between properties
of the world being represented and properties of the
representation or of the representational medium
“Representation of object O with property P”
Is ambiguous between these two parsings:
Representation of (Object O with property P)
vs
(Representation of object O) with property P
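The two parsings can be made explicit in code (a toy sketch; the classes and field names are mine): a property like "small" can attach either to the represented object or to the representation itself, and the two readings come apart.

```python
from dataclasses import dataclass

# Toy sketch (class and field names are mine): a property can qualify the
# represented object or the representation itself.
@dataclass
class Obj:
    name: str
    size_cm: float            # property of the represented object

@dataclass
class Representation:
    of: Obj
    pixels: int               # property of the representation itself

# Representation of (object with property P): a small mouse, big image.
big_image_of_small_mouse = Representation(Obj("mouse", size_cm=5.0), pixels=1_000_000)
# (Representation of object) with property P: a big mouse, small image.
small_image_of_big_mouse = Representation(Obj("mouse", size_cm=50.0), pixels=100)

print(big_image_of_small_mouse.of.size_cm < small_image_of_big_mouse.of.size_cm)  # True
print(big_image_of_small_mouse.pixels > small_image_of_big_mouse.pixels)          # True
```

The same distinction recurs below as "Image of (small X) vs Small(image of X)": conflating the two attributions is the basic mistake the slide names.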
Major Question 1: Constraints imposed by imagery
Why do things happen the way
they do in your imagination?
● Is it because of the format of your image or your
cognitive architecture? Or because of what you know?
 Did it reveal a capacity of mind?
 Or was it because you made it do what it did?
● Can you make your image have any properties you
choose? Or behave in any way you want? Why not?
 How about imagining an object from all directions at once, or
from no particular direction?
 How about imagining a 4-dimensional object?
 Can you imagine a printed letter which is neither upper nor
lower case? A triangle that is not a particular type?
More demonstrations of the relation
between vision and imagery
● Images constructed from descriptions
 The D-J example(s)
 The two-parallelogram example
● Amodal completion
● Reconstruals: Slezak
Can images be visually reinterpreted?
● There have been many claims that people can visually
reinterpret images
 These have all been cases where one could easily figure out
what the combined image would look like without actually
seeing it (e.g., the J – D superposition).
● Peterson’s careful examination of visual “reconstruals”
showed (contrary to her own conclusion) that images are
never ambiguous (no Necker cube or figure-ground
reversals) and when new construals were achieved from
images they were quite different from the ones achieved
in vision (more variable, more guessing from cues, etc).
● The best evidence comes from a philosopher (Slezak, 1992, 1995)
Slezak figures
Pick one (or two) of these animals and
memorize what it looks like. Now
rotate it in your mind by 90 degrees
clockwise and see what it looks like.
Slezak figures rotated 90°
Do this imagery exercise:
Imagine a parallelogram like this one.
Now imagine an identical parallelogram directly below this one.
Connect each corner of the top parallelogram with the
corresponding corner of the bottom parallelogram.
What do you see when you imagine the connections?
Did the imagined shape look (and change) like the one you see now?
Amodal completion by imagery?
Is this what you saw?
Continue….
Images and space
1. Are images spatial – i.e., do they have spatial properties
such as size, distance, and relations such as above, next-to,
and in-between? Do the axioms of Euclidean geometry
and measure theory apply to them?
a) ab + bc ≥ ac, and ab = ba
b) If ∠abc = 90°, then ab² + bc² = ac²
2. If yes, what would that entail about how they must be
instantiated in the brain?
a) Could they be analogue? What constraints does that impose?
b) Is the space 2-D or 3D?
3. Might they be in some “functional space” – i.e., behave
as though they were spatial without having to be in real
physical brain-space? What does that entail?
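For comparison, these axioms hold automatically for anything actually laid out in coordinates; a quick check in Python (my sketch) makes the contrast with mere stipulation vivid:

```python
import math

# For points in real coordinates, the metric axioms hold as a matter of
# geometry, not stipulation.
def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

a, b, c = (0.0, 0.0), (3.0, 0.0), (3.0, 4.0)  # right angle at b

# Triangle inequality and symmetry: ab + bc >= ac, and ab = ba
assert dist(a, b) + dist(b, c) >= dist(a, c)
assert dist(a, b) == dist(b, a)

# Pythagoras: ab^2 + bc^2 = ac^2 when the angle at b is 90 degrees
assert math.isclose(dist(a, b) ** 2 + dist(b, c) ** 2, dist(a, c) ** 2)
print(dist(a, c))  # 5.0
```

The question the slide poses is whether anything comparable *enforces* these relations for images, or whether they merely hold because the imager makes them hold.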
Do mental images have size?
1. Imagine a mouse across the room so its image size (the %
of your image display it occupies) is small.
2. Now imagine it close to you so it fills your image
display.
 Of these two conditions, in which does it take longer to
answer “can you see the mouse’s whiskers?”
3. Imagine a horse. How close can you come to the image
before it starts to overflow your image display? Repeat
with a toaster, a table, a person’s face, etc.
Do mental images have size?
Imagine a very small mouse. Can you see its whiskers?
Now imagine a huge mouse. Can you see its whiskers?
Which is faster?
Image of (small X) vs Small(image of X)
Mental rotation
Time to judge whether (a)-(b) or (b)-(c) are the
same except for orientation increases linearly
with the angle between them (Shepard & Metzler, 1971)
Imagine this shape rotating in 3D
When you make it rotate in your mind, does it seem
to retain its rigid 3D shape without re-computing it?
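The Shepard & Metzler finding is the linear law itself; a minimal model of it (my sketch; the intercept and rotation rate are illustrative placeholders, not the published estimates):

```python
# Minimal model of the Shepard & Metzler (1971) linear law.  The intercept
# and rotation rate below are illustrative, not the published estimates.
def predicted_rt(angle_deg, intercept_s=1.0, rate_deg_per_s=50.0):
    """Judgment time if the figure is rotated at a constant rate through
    every intermediate orientation before the comparison is made."""
    return intercept_s + angle_deg / rate_deg_per_s

# RT grows linearly with angular disparity:
for angle in (0, 60, 120, 180):
    print(angle, round(predicted_rt(angle), 2))
```

The model only *describes* the data; whether the constant rate reflects an architectural constraint or the subject's simulation of a real rotation is precisely the question raised next.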
The missing bit of logic:
● What is assumed in the case of mental rotation?
● According to Prinz (2002) p 118,
“If visual-image rotation uses a spatial medium of the
kind Kosslyn envisions, then images must traverse
intermediate positions when they rotate from one
position to another. The propositional system can be
designed to represent intermediate positions during
rotation, but that is not obligatory.”
● But what makes this obligatory in “functional Space”?
How are these ‘assumptions’ realized?
● Assumptions such as rigidity must therefore be a
property inherent in the architecture (the ‘display’)
● That raises the question of what kind of architecture
could possibly enforce rigidity of form. This brings us
back to the proposed architecture – a physical display
 Notice, however, that such a display, by itself, does not rigidly
maintain the shape as orientation is changed.
 There is evidence that rotation is incremental not holistic and
is dependent on the complexity of the form and the task
 Also such rigidity could not be part of the architecture of an
imagery module because we can easily imagine situations in
which rigidity does not hold (e.g. imagine a rotating snake!).
Mental Scanning
● Some hundreds of experiments have now been done
demonstrating that it takes longer to scan attention
between places that are further apart in the imagined
scene. In fact the relation is linear between time and
distance.
● These have been reviewed and described in:
 Denis, M., & Kosslyn, S. M. (1999). Scanning visual mental
images: A window on the mind. Cahiers de Psychologie
Cognitive / Current Psychology of Cognition, 18(4), 409-465.
 Rarely cited are experiments by Pylyshyn & Bannon which I
will summarize for you.
Studies of mental scanning
Does it show that images have metrical space?
[Graph: Latency (secs), from 0 to 2, plotted against relative distance
on the image, for three conditions: scan image, imagine lights, show direction]
(Pylyshyn & Bannon. See Pylyshyn, 1981)
Conclusion: The image scanning effect is Cognitively Penetrable
 i.e., it depends on goals and beliefs, or on Tacit Knowledge.
 The central problem with imagistic explanations…
What is assumed in imagist
explanations of mental scanning?
● In actual vision, it takes longer to scan a longer distance because
real distance, real motion, and real time is involved, therefore this
equation holds due to natural law:
Time = distance / speed
But what ensures that a corresponding relation holds in an image?
The obvious answer is: Because the image is laid out in real space!
 But what if that option is closed for empirical reasons?
● Imagists appeal to a “Functional Space” which they liken to a
matrix data structure in which some pairs of cells are closer and
others further away, and to move from one to another it is natural
that you pass through intermediate cells
● Question: What makes these sorts of properties “natural” in a
matrix data structure?
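The worry can be demonstrated directly: in an actual matrix data structure, nothing forces a process to traverse intermediate cells, so the linear time-distance relation must be stipulated rather than derived. A sketch (mine, with illustrative function names):

```python
# In an actual matrix data structure, "distance" between cells is just an
# index difference; nothing in the structure forces a process to visit
# intermediate cells.  (Function names are illustrative.)
def teleport_scan(src, dst):
    """Move from src to dst in one step: a matrix happily permits this."""
    return [src, dst]                      # constant cost, no intermediates

def incremental_scan(src, dst):
    """Visit every intermediate cell along a row.  The data structure does
    not enforce this; it must be stipulated by the process."""
    (r, c1), (_, c2) = src, dst
    step = 1 if c2 >= c1 else -1
    return [(r, c) for c in range(c1, c2 + step, step)]

print(len(teleport_scan((0, 0), (0, 9))))     # 2 cells visited
print(len(incremental_scan((0, 0), (0, 9))))  # 10: time ~ distance only by fiat
```

Both scans are equally "natural" to the matrix itself; only the second reproduces the scanning data, and nothing in the data structure privileges it.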
Kosslyn view: Images as depictive representations
● “A depictive representation is a type of picture, which specifies the
locations and values of configurations of points in a space. For example,
a drawing of a ball on a box would be a depictive representation.
● The space in which the points appear need not be physical…, but can be
like an array in a computer, which specifies spatial relations purely
functionally. That is, the physical locations in the computer of each
point in an array are not themselves arranged in an array; it is only by
virtue of how this information is ‘read’ and processed that it comes to
function as if it were arranged into an array (with some points being
close, some far, some falling along a diagonal, and so on).
● Depictive representations convey meaning via their resemblance to an
object, with parts of the representation corresponding to parts of the
object… When a depictive representation is used, not only is the shape
of the represented parts immediately available to appropriate processes,
but so is the shape of the empty space … Moreover, one cannot
represent a shape in a depictive representation without also specifying a
size and orientation….”
(Kosslyn, 1994, p. 5)
Thou shalt not cheat
● There is no natural law or principle that requires the
representations of time, distance and speed to be related
according to the motion equation. You could equally easily
imagine an object moving instantly or according to any
motion relation you like, since it is your image!
● There are two possible answers to why the relation
Time = Representation of distance / Representation of speed
typically holds in an image-scanning task:
1. Because subjects have tacit knowledge that this is what
would happen if they viewed a real display, or
2. Because the matrix is taken to be a simulation of a real-world
display, as it often is in computer science.
Thou shalt not cheat
● What happens in ALL imagist accounts of phenomena,
including mental scanning and mental rotation is that imagists
assume that images have the properties of real space in order
to provide a principled explanation, and then retreat to some
“functional” or not-quite-real space when it is pointed out that
they are assuming that images are laid out in real brain space.
● This happens with mental rotation as well, even though it is an
involuntary and universal way of solving the rotated-figure
task so long as the task involves tokens of enantiomorphs.
● Experiments have shown that:
 No rotation occurs if the figures have landmarks or asymmetries
that can be used to identify them, and
 Records of eye movements show that mental rotation is done
incrementally: It is not a holistic rotation as experienced.
 The “rate of rotation” depends on the conceptual complexity of
the recognition task so is not a result of the architecture
A final point…
● In Kosslyn, Thompson & Ganis (2007) the authors cite
Ned Block to the effect that one does not need an actual
2D surface, so long as the connections upstream from the
cortical surface can decode certain pairs of neurons in
terms of their imagined distance. Think of long stretchy
axons going from a 2D surface to subsequent processes.
Imagine that the neurons are randomly moved around so
they are no longer on a 2D layout. As long as the
connections remain fixed it will still behave as though
there was a 2D surface.
● Call this the “encrypted 2D layout” version of literal
space.
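Block's "stretchy axons" idea can be modeled in a few lines (my toy construction, assuming a 4×4 sheet and Manhattan distance): scatter the cells but keep a fixed wiring table back to the original coordinates, and the system behaves as if the 2D sheet were still there.

```python
import random

# Toy model of the 'encrypted 2D layout' (my construction, assuming a 4x4
# sheet and Manhattan distance): scatter the cells at random but keep a
# fixed wiring table back to each cell's original coordinate.
random.seed(0)
original = [(x, y) for x in range(4) for y in range(4)]
scattered = original[:]
random.shuffle(scattered)                 # neurons no longer form a 2D layout
wiring = dict(zip(scattered, original))   # fixed connections back to the sheet

def imagined_distance(a, b):
    """Distance is recovered only by decoding back to the original
    coordinates; the scattered positions themselves carry no metric."""
    (x1, y1), (x2, y2) = wiring[a], wiring[b]
    return abs(x1 - x2) + abs(y1 - y2)

# Behaves exactly as if the 2D sheet were still there:
print(imagined_distance(scattered[0], scattered[1]))  # 1: they decode to adjacent cells
```

Notice that all the metric behavior flows through the `wiring` decode step back to the original sheet, which is exactly what the objection to this proposal targets.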
The encrypted-spatial layout alternative
● By itself the encrypted-layout alternative will not do because
without referring to the original locations, the relation between
pairs of neurons and scan time is not principled. In the end the
only principle we have is Time=distance/speed so unless the
upstream system decrypts the neuron locations into their
original 2D surface locations the explanation for the increase
in time with increased imagined distance remains a mere
stipulation. It stipulates, but does not explain why, when two
points are further away in the imagined layout it takes longer
to scan between them or why scanning between them requires
that one visit ‘intermediate’ locations along the way.
● But this is what we need to explain! So long as what we have
is a stipulation, we can apply it to any form of representation!
What was a principled explanation with the literal 2D display
has now been given up for a mere statement of how it shall be.
The ‘Imagery Debate’ Redux
● According to Kosslyn there have been 3 stages
in the debate over the nature of mental images:
1. The first was concerned with the role of images in
learning and memory (Paivio’s Dual Code theory).
While influential at the time it has now been largely
abandoned except for a few recidivists like Barsalou;
2. The second stage involved the study of metrical-spatial
properties of images and the parallels between vision
and imagery, as assessed by reaction time measures;
3. Finally we now have the discovery of brain
mechanisms underlying visual imagining and so finally
the ‘resolution of the imagery debate’.
Mental imagery and neuroscience
● Neuroanatomical evidence for a retinotopic display in
the earliest visual area of the brain (V1)
● Neural imaging data showing V1 is more active during
mental imagery than during other forms of thought
 Also the form of activity differs for small vs large images in the
way that it differs when viewing small and large displays
● Transcranial magnetic stimulation (TMS) of visual areas
interferes more with imagery than other forms of thought
● Clinical cases show that visual impairments and imagery
impairments tend to be similar (Bisiach, Farah)
● More recent psychophysical measures of imagery show
parallels with comparable measures of vision, and these
can be related to the receptive fields of cells in V1
Neuroscience has shown that the retinal pattern of
activation is displayed on the surface of the cortex
There is a topographical projection
of retinal activity on the visual
cortex of the cat and monkey.
Tootell, R. B., Silverman, M. S., Switkes, E., & De Valois, R. L.
(1982). Deoxyglucose analysis of retinotopic organization in
primate striate cortex. Science, 218, 902-904.
Examples of problems of drawing conclusions about
mental imagery from neuroscience data
1. The capacity for imagery and the capacity for vision are known to be
independent. Also, all imagery results are observed in the blind.
2. Cortical topography is 2-D, but mental images are 3-D – all
phenomena (e.g. rotation) occur in depth as well as in the plane.
3. Patterns in the visual cortex are in retinal coordinates whereas
images are in world coordinates.
 Your image stays fixed in the room when you move your eyes or turn
your head or even walk around the room.
4. Accessing information from an image is very different from
accessing it from the perceived world. Order of access from
images is highly constrained.
 Conceptual rather than graphical properties are relevant to image
complexity (e.g., mental rotation).
Problems with drawing conclusions about mental
images from the neuroscience evidence
5. Retinal and cortical images are subject to Emmert’s Law,
whereas mental images are not;
6. The signature properties of vision (e.g. spontaneous 3D
interpretation, automatic reversals, apparent motion, motion
aftereffects, and many other phenomena) are absent in images;
7. A cortical display account of most imagery findings is
incompatible with the cognitive penetrability of mental imagery
phenomena, such as scanning and image size effects;
8. The fact that the Mind’s Eye is so much like a real eye (e.g.,
oblique effect, resolution fall-off) should serve to warn us that
we may be studying what observers know about how the world
looks to them, rather than what form their images take.
Problems with drawing conclusions about mental
images from the neuroscience evidence
9. Many clinical cases can be explained by appeal to tacit
knowledge and attention
 The ‘tunnel effect’ found in vision and imagery (Farah) is
likely due to the patient knowing what things now looked
like to her post-surgery
 Hemispatial neglect seems to be a deficit in attention,
which also explains the “representational neglect” in
imagery reported by Bisiach
 A recent study shows that imaginal neglect does not
appear if patients have their eyes closed. This fits well with
the account I will offer, in which the spatial character of
mental images derives from concurrently perceived space.
An over-arching consideration:
What if colored three-dimensional images were
found in visual cortex? What would that tell you
about the role of mental images in reasoning?
Would this require a homunculus?
Is this a straw man?
Should we welcome back the homunculus?
● In the limit, if the visual cortex contained the contents of
one’s conscious experience in imagery, we would need an
interpreter to “see” this display in visual cortex
● But we will never have to face this prospect because many
experiments show that the contents of mental images are
conceptual (or, as Kosslyn puts it, are ‘predigested’).
● And finally, it is clear that you can make your image do
whatever you want and have whatever properties you
wish.
 There are no known constraints on mental images that cannot be
attributed to lack of knowledge of the imagined situation (e.g.,
imagining a 4-dimensional block).
 All currently claimed properties of mental images are cognitively
penetrable.
In what sense are mental images spatial?
This is the most challenging and the most thoroughly
researched question in the last 20 years of research on
mental imagery. It is also the most seductive, because
of the phenomenology associated with imagining.
Accordingly, many experiments have been devoted to
asking whether imagery involves the visual system.
Does mental imagery use the visual system?
 this question is asked because of the expectation that
it will cast some light on the format of mental images,
and in particular that it will tell us something about
why images appear to have spatial properties.
 But I will suggest that a positive answer to this
question actually speaks against the hypothesis that
images are laid out in some space inside the head.
Vision is involved when images are
superimposed onto visual displays
 Many experiments show that when you project an
image onto a display the image acts very much like a
superimposed display
 Shepard & Podgorny; Hayes; Bernbaum & Chung …
 Interference effects (Brooks)
 Interaction with the motor system (Finke, Tluka)
 Superposition yields some visual illusions
 Maybe all imagery phenomena are like this!
 Mental scanning and superposition
 Only need pairing of a few perceived objects with
imagined ones
 The mechanism for such pairings may be the FINST index
Visual illusions with projected images
Bernbaum & Chung (1981)
Shepard & Podgorny experiment
Both when the displays are seen and when the F is
imagined, RT to detect whether the dot was on the F is
fastest when the dot is at a vertex of the F, next fastest
when it is on an arm of the F, then when it is far from the
F – and slowest when it is one square off the F.
Differences between vision and visual
imagery in the control of motor actions
● Imagery clearly has some connection to motor control –
you can point to things in your image.
 This may be why images feel spatial
● Finke showed that you could get adaptation with imagined
hand position that was similar to adaptation to displacing
prism goggles
● You can also get Stimulus-Response compatibility effects
between the location of a stimulus in an image and the
location of the response button in space
● Both these findings provide support for the view that the
spatial character of images comes from their projection
onto a visual scene.
S-R Compatibility effect with a visual display
The Simon effect: it is faster to make a response in the
direction of an attended object than in another direction
Response for A is
faster when YES is
on the left in these
displays
S-R Compatibility effect with
a recalled (mental) display
The same RT pattern occurs for a recalled display as for a perceived one
RT is faster when the A
is recalled (imagined)
as being on the left
The spatial-metrical character of images
● A number of experiments have been cited as showing
that images must actually have metrical properties,
particularly spatial ones (not just represent metrical
properties, but have them).
● The most commonly cited ones are experiments
involving
 Image size
 Mental scanning across a mental image
 Mental rotation of images
Where do we stand?
● It seems that a literal picture-in-the-brain theory is
untenable for many reasons – including the major
empirical differences between mental images and cortical
images. A serious problem with any format-based
explanation of mental imagery is the cognitive
penetrability of many of the imagery experiments.
● Is there a middle ground between a view of mental
images as pictorial/spatial and a view that says the
pictoriality of images is an illusion that arises from the
similarity of the experience of imaging and of seeing?
 How do we explain the spatial character of images – the fact
that they seem to be laid out in space?
 How do we explain the fact that images look like what they
represent?
What is the alternative?
● Neither seeing nor imaging entails the existence
of something pictorial.
 The notion of a “picture” only arises because viewing a
(literal) picture produces an experience similar to viewing
scenes (that is why pictures were thought to be “magic”).
 Yet there is something spatial about perception in
general (visual, auditory, proprioceptive,..). Where does
that come from? And does that hold the secret to
understanding the spatiality of mental images?
The spatial character of images and
the spatial nature of the world
● For an answer to what is spatial in imagery we
need to look into what is spatial about perception.
This is a nontrivial question about which we have
some interesting ideas – some of which come from
(of all places) Gibsonian influences on what has
been called the Situated Vision movement in
Cognitive Science and Robotics.
What does it mean to say that we
perceive the world as spatial?
● When we “notice” new properties in a scene, they are
consistent with what we noticed earlier, in terms of the
axioms of geometry and the laws of physics.
 Our noticings are generally “monotonic”
● We can examine a scene in any order and we can reexamine parts of it because its relevant properties
generally do not change. Unlike imagining what would
happen, seeing does not run into a frame problem
● We can navigate through the world we perceive
 But what exactly does this mean?
 We can engage in “reactive planning” and other forms of “situated”
behaviors that require contact with the world
● Rather than representing metrical space itself, we
deploy mechanisms for re-examining objects in the
world (“world as external memory”).
Aside on the ‘situated cognition’ movement
● It has a cult following, but like many cults, there is
some truth to it.
● The idea is that we do not represent the spatial layout
of a visual scene (except in the most sparse and
coarse manner). To do so would not only greatly tax
memory, but would be redundant. We do not need to
represent the scene in detail if we can return to it for
further information when we need it – and there is
evidence that we do just that. But how do we do it?
 Example of saccadic integration and deictic reference
 Now even more of an aside on Visual Indexing
(FINST) Theory: What it’s like and why we need it
Visual Indexes as Demonstrative Reference
● Visual indexes are a mechanism for referring to (or pointing
to) visual objects without first having represented their
properties: They are a direct referential link between a
mental token and a preconceptual individual object.
● They are needed to specify where to assign focal attention
● They are needed to evaluate multiple-argument predicates
 All arguments in a visual predicate must be bound to objects before the
predicates can be evaluated
● They allow the external environment to be used instead of
memory in carrying out tasks (e.g., Ballard’s copying task).
● They allow us to get around having to assume a metrical
image to explain trans-saccadic integration, visual stability
and many spatial imagery phenomena.
The spatiality of images is inherited from
the spatiality of the seen world
● If we can find the properties that give perceived space its
spatial character, then maybe our mental representations
can exploit these spatial properties when we superimpose
an image on a perceived scene.
 Examples: scanning, visual-motor adaptation
● This is the proposal: The representations underlying
mental images achieve their spatial properties by being
associated with real perceived objects or locations. This is
how they inherit the essential Euclidean character of space.
● Our “sense of space” is extremely accurate even without
vision, and can plausibly be used during mental imaging.
The spatiality of images and the spatiality of the
world in which they are situated
 If we think of projecting images onto a perceived scene as
involving binding objects of thought to objects of
perception, we can explain:
 The sense in which objects in a mental image can be in spatial
relations to one another
 Why images are in allocentric coordinates (where is your image?)
 They make use of coordinate updating with voluntary sightless egomotion
 Why Finke was able to show visual-motor adaptation results with
imagined hand positions
 Why we get some induced visual illusions
 Why we get S-R compatibility findings
 Why we sometimes observe hemispatial neglect with mental
images (Bisiach)
A final point: Why do mental images
look like what they represent?
● More important: What kind of a fact is this?
 Is it a conceptual or an empirical fact that imagined things
look like what they are images of?
 Could an image of X look like something quite different
from X? Is there a possible world in which your image of
your dog looked like a soup bowl? Or even in which your
image of Tom looked like an image of his twin brother?
● “Looks like” is a problematic notion because it
bridges from an experience to a description.
 Wittgenstein’s example: Why does it “look like” the sun is
going around the earth rather than that the earth is rotating?
But that doesn’t explain why we can
solve mental geometry problems more
easily by imagining the figures!
● There are many problems that you can solve much
more easily when you imagine a layout than when
you do not.
● In fact, many instances of solving problems by
imagining a layout seem very similar to how one
would solve them with pencil and paper.
● We need to understand what happens in the visual
case in order to see how images can help this
process in the absence of a real diagram.
How do real visual displays help thinking?
 How do diagrams, graphs, charts, maps and other visual
objects help us to reason and to solve problems?
 The question of why visual aids help is nontrivial; my Seeing
& Visualizing (chapter 8) contains some speculative
discussion, e.g., visual aids allow the visual system to:
• make certain kinds of inferences just by looking
• make use of visual demonstratives to offload some of the memory load
• exploit the fact that physical displays embody the axioms of measure
theory and of geometry, so these need not be explicitly expressed in reasoning
 The big question is whether (and how) any of these
advantages carry over to imaginal thinking! Do mental
images have some (or any) of the critical properties that
make diagrams helpful in reasoning?
Visual inferences?
● If we recall a visual display it is because we have encoded
enough information about its visual-geometrical properties
that we can meet some criteria, e.g., we can draw it. But
there are innumerably many ways to encode this information
that are sufficient for the task (e.g. by encoding pairwise
spatial relations, global spatial relations, and so on). For
many properties the task of translating from one form to
another is much more difficult than the task of visually
encoding it – the translation constitutes visual inference.
● The visual system generalizes from particular instances as
part of its object-recognition skill (all recognition is
recognition-as and therefore assumes generalization from
tokens to types). It is also very good at noticing certain
properties (e.g., relative sizes, deviations from square or
circle, collinearity, inside, and so on). These capabilities can
be exploited in graphical layouts.
An example in which an image helps
thinking that is primarily logical
● Three-term series problems:
 John is taller than Mary and John is shorter than Fred.
 Who is tallest, who is shortest?
● A common way to solve this problem is by
using “spatial paralogic”. Construct a list:
 Fred
 John
 Mary
 Read off Tallest at the top and
Shortest at the bottom
● What does this assume about
the image format/architecture?
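The “spatial paralogic” strategy can be sketched as code (a toy illustration of the strategy described above, not a general reasoner: the simple insertion rule happens to work for premises like these):

```python
# Encode each comparative statement by placing names in a vertical list
# (taller = nearer the top), then read the answer off the ends of the list.

def build_list(statements):
    order = []                            # index 0 = top = tallest
    for a, rel, b in statements:
        taller, shorter = (a, b) if rel == 'taller' else (b, a)
        for name in (taller, shorter):
            if name not in order:
                order.append(name)
        # keep the list consistent with this statement
        if order.index(taller) > order.index(shorter):
            order.remove(taller)
            order.insert(order.index(shorter), taller)
    return order

facts = [('John', 'taller', 'Mary'), ('John', 'shorter', 'Fred')]
order = build_list(facts)
print(order)                 # ['Fred', 'John', 'Mary']
print(order[0], order[-1])   # Fred Mary  (tallest, shortest)
```

The conclusion “Fred is taller than Mary” is simply read off the list, even though no premise stated it directly.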
Some assumptions about the image medium
● When two items are entered in the image their relative
locations remain fixed despite certain operations
on the image (moves)
● When a third item is entered with a certain
spatial relation to an earlier one, the relation
between the first two may remain unchanged
● The spatial relations in the image continue
to be a correct model of the intended relation
(e.g., “taller”) even for relations between pairs
not originally intended – e.g.,
 Fred
 John
 Mary
 Reading off ‘Fred is taller than Mary’ is a valid inference
 But note that many pair-wise relations do not validly map onto the
vertical dimension: e.g., married to, loves, stands next to,…
● These image assumptions are true of a picture drawn on a rigid
surface, but why must they be true in the mind?
 The Frame Problem in robotics
 Also the relevance problem in inference
A few difficulties with this view
● Indeterminacies are a problem
 John is taller than Mary
 Mary is shorter than Fred
 Are Fred and John the same height?
 What do we do when we discover that
John is shorter than Fred?
 Spatial location alone does not help – we
still need symbols such as the arrow
 Fred
 John
 Mary
● Johnson-Laird: Reasoning by model
 J-L found that the difficulty of the problem increases when
more than one possible spatial model applies
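Johnson-Laird’s point about multiple models can be made concrete with a small sketch (illustrative code, not Johnson-Laird’s own program): enumerate every total ordering consistent with the premises; a question has a determinate answer only if all models agree.

```python
# Enumerate the "mental models" (total orders) consistent with the premises.
from itertools import permutations

def models(people, premises):
    """premises: list of (taller, shorter) pairs."""
    return [order for order in permutations(people)
            if all(order.index(t) < order.index(s) for t, s in premises)]

# John is taller than Mary; Mary is shorter than Fred -- the indeterminate case
premises = [('John', 'Mary'), ('Fred', 'Mary')]
ms = models(['Fred', 'John', 'Mary'], premises)
print(len(ms))   # 2 models: Fred>John>Mary and John>Fred>Mary
print(all(m.index('Fred') < m.index('John') for m in ms))   # False
```

Because the two models disagree about Fred versus John, no single spatial list can represent the premises without committing to an unwarranted relation, which is why such problems are harder.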
What is assumed?
The fact that interspersing the plate between the spoon
and the knife does not alter the previously encoded
relation between spoon and knife, and between the
knife and the cup, is attributable to the user’s
knowledge of both the formal relation “to the right of”
(e.g., that it survives interpolation) and the cohesion of
the knife-cup pair which were moved together because
their relation was previously specified. Properties such
as cohesion can be easily seen in a diagram where all
that is required in order to place the plate to the right of
the spoon is to move the knife-cup column over by one.
But rigidity of form is not a property of images; one
would have to compute what happens to parts of a
pattern when other parts are moved (cf. mental rotation).
What is the least you need to assume?
● Relative spatial relations need to be represented in some
manner that is invariant over some transformations
● If spatial relations between individual items are
represented in some way, recognizing them may require
perceptual (not necessarily visual) pattern recognition
 How this can happen without an internal display is the
subject of Chapter 5 of the Things and Places book.
Example: Memorize this map so you can draw it accurately
From your memory:
● Which groups of 3 or more locations are collinear?
● Which locations are midway between two others?
● Which locations are closest to the center of the island?
● Which pairs of locations are at the same latitude?
● Which is the top-most (bottom-most) location?
 If you could draw the map from memory using whatever
properties you noticed and encoded, you could easily
answer the questions by looking at your drawing – even
if you had not encoded the relations in the queries.
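The point above can be sketched in code (the landmark names and coordinates are made up for illustration): if memory amounts to coordinates sufficient to redraw the map, a relation like collinearity can be computed from the drawing even though it was never explicitly encoded.

```python
# Answer a collinearity query from stored coordinates (the "drawing"),
# using the zero-cross-product test for three points on a line.
from itertools import combinations

def collinear(p, q, r, tol=1e-9):
    return abs((q[0]-p[0])*(r[1]-p[1]) - (q[1]-p[1])*(r[0]-p[0])) < tol

landmarks = {'hut': (0, 0), 'well': (2, 2), 'tree': (4, 4), 'rock': (5, 1)}

for a, b, c in combinations(landmarks, 3):
    if collinear(landmarks[a], landmarks[b], landmarks[c]):
        print(a, b, c)   # hut well tree
```

The question for imagery is whether a mental image gives you this kind of free ride: with a real drawing the unencoded relation is simply there to be seen.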
Draw a rectangle. Draw a line from the bottom corners to a point
on the opposite vertical side. Do these two lines intersect? Is the
point of intersection of the two lines below or above the
midpoint? Does it depend on the particular rectangle you drew?
[Figure: rectangle with top corners A, B and bottom corners D, C; lines from the
bottom corners meet the opposite vertical sides at x and y, intersecting at m,
with m′ marking the midpoint height]
Which properties of a real diagram
also hold for a mental diagram?
● A mental “diagram” does not have any of the properties that a
real diagram gets from being on a rigid 2D surface.
● When you imagine 3 points on a line, labeled A, B, and C,
must B be between A and C? What makes that so? Is the
distance AC greater than the distance AB or BC?
● When you imagine drawing point C after having drawn points
A and B, must the relation between A and B remain unchanged
(e.g., the distance between them, their qualitative relation such
as above or below)? Why?
● These questions raise what is known as the frame problem in
Artificial Intelligence. If you plan a sequence of actions, how
do you know which properties of the world a particular action
will change and which it will not, given that there are an
unlimited number of properties and connections in the world?
What happens when we fail to make the represented-representation distinction