Computational Intelligence Methods

Attractor neural networks and concept
formation in psychological spaces:
mind from brain?
Włodzisław Duch
Department of Informatics,
Nicholas Copernicus University, Toruń, Poland.
www.phys.uni.torun.pl/~duch
Bioinspired Computational Models of Learning and
Memory, Lejondal Castle, Sept. 2002
Plan:
• Intro: gap between neuroscience and psychology.
• From molecules to brain, forgetting the mind
• Psychological spaces
• Platonic mind model - static version
• Some applications
• Dynamic extensions
• Related ideas
• Conclusions
Cognitive Science
The Central Paradox of Cognition: how can the structure
and meaning, expressed in symbols and ideas at the mental
level, result from numerical processing at the brain level?
Very few general laws in psychology (mostly psychophysical).
Psycho-logy lost the soul (psyche)?
Cognitive science: mixture (syntopy) of cognitive psychology,
neurosciences, AI, linguistics, philosophy of mind,
psychophysics, anthropology ...
No central model of mind in cognitive science.
Philosophical problems in foundations of cognitive sciences.
Mind the Gap
Gap between neuroscience and psychology: cognitive
science is at best an incoherent mixture of various branches.
Is a satisfactory understanding of the mind possible?
Roger Shepard, Toward a universal law of generalization for
psychological science (Science, Sept. 1987)
“What is required is not more data or more refined data but a
different conception of the problem”.
• Mind is what the brain does, a potentially conscious
subset of brain processes.
How to approximate the dynamics of the brain
to get a satisfactory (geometric) picture of the mind?
From molecules ...
10^-10 m, molecular level: ion channels, synapses, membrane
properties, neurochemistry, biophysics, psychopharmacology,
mind from molecular perspective (Ira Black)?
10^-6 m, single neurons: biophysics, computational neuroscience (CS),
compartmental models, spikes, LTP, LTD, neurochemistry &
neurophysiology.
10^-4 m, small networks: neurodynamics, recurrence, spiking neurons,
synchronization, neural code (liquid?), memory effects,
multielectrode recordings, neurophysiology, CS.
10^-3 m, neural assemblies: cortical columns, multielectrode & large
electrode recordings, microcircuits, neurodynamics,
neuroscience, CS.
… to behavior.
10^-2 m, mesoscopic networks: self-organization, sensory and motor
maps, population coding, continuous activity models,
mean field theories, brain imaging, EEG, MEG, fMRI.
10^-1 m, transcortical networks, large brain structures: simplified
models of cortex, limbic structures, subcortical nuclei,
integration of functions, concept formation, sensorimotor
integration, neuropsychology, computational psychiatry ...
And then a miracle happens …
1 m, CNS, brain level: intentional behavior, psychology, thinking,
reasoning, language, problem solving, symbolic processing,
goal oriented knowledge-based systems, AI.
Where is the inner perspective?
Usually: transcortical networks => finite state automata => behavior
Alternative: Platonic model => mental events.
Static Platonic model: motivation
Plato believed in the reality of mind, ideal forms
recognized by intellect.
A useful metaphor: perceived
mind content is like a shadow of
the ideal, real world of objects
projected on the wall of a cave.
(drawing: Marc Cohen)
Real mind objects: shadows of neurodynamics?
Physics and psychology
R. Shepard (BBS, 2001):
psychological laws should be formulated
in appropriate psychological abstract spaces.
Physics - macroscopic properties result from microscopic interactions.
Description of movement - invariant in appropriate spaces:
• Euclidean 3D => Galileo transformations;
• (3+1) pseudo-Euclidean space => Lorentz x-t transformations;
• Riemannian curved space => laws invariant in accelerating frames.
Psychology - behavior and categorization result from neurodynamics.
Neural networks: microscopic description, too difficult to use.
Find psychological spaces that result from neural dynamics and allow
general behavioral laws to be formulated.
P-spaces
Psychological spaces:
K. Lewin, The conceptual representation and the
measurement of psychological forces (1938), cognitive
dynamic movement in phenomenological space.
George Kelly (1955), personal
construct psychology (PCP),
geometry of psychological
spaces as alternative to logic.
A complete theory of cognition,
action, learning and intention.
PCP network, society, journal,
software …
P-space definition
P-space: region in which we may place and classify
elements of our experience, constructed and evolving, „a
space without distance”, divided by dichotomies.
P-spaces should have (Shepard 1957-2001):
• minimal dimensionality
• distances that monotonically decrease with
increasing similarity
This is done with multi-dimensional non-metric scaling,
reproducing similarity relations in low-dimensional
spaces.
Laws of generalization
Shepard (1987), Universal law of generalization.
Tenenbaum, Griffiths (2001), Bayesian framework unifying the set-theoretic approach (introduced by Tversky 1977) with Shepard's ideas.
Generalization gradients tend to fall off approximately exponentially
with distance in an appropriately scaled psychological space.
Distance - from MDS maps of perceived similarity of stimuli.
G(D) = probability of response learned to stimulus for D=0, for
many visual/auditory tasks, falls exponentially with the distance.
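A minimal sketch of the exponential generalization gradient described above. The function name `generalization` and the `scale` parameter are illustrative assumptions; the slide only states that G(D) falls off approximately exponentially with distance in a suitably scaled space.

```python
import numpy as np

def generalization(distance, scale=1.0):
    """Exponential gradient G(D) = exp(-D / scale), so that G(0) = 1.

    `scale` is a hypothetical parameter standing in for the scaling
    of the psychological space; it is not given in the slides.
    """
    return np.exp(-np.asarray(distance, dtype=float) / scale)

distances = np.array([0.0, 0.5, 1.0, 2.0, 4.0])
for d, g in zip(distances, generalization(distances)):
    print(f"D = {d:.1f}  G(D) = {g:.3f}")
```

Equal steps in distance reduce G by the same factor, which is the signature of an exponential law.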
Minds work in low D!
Mind uses only those features that are useful to act/decide.
The structure of the world is internalized in the brain.
3 examples of elegant low-D mental principles in vision:
• In a 3-D vector space, in which each variation in natural
illumination is cancelled by application of its inverse from the
three-dimensional linear group of terrestrial transformations of
the invariant solar source, color constancy is achieved.
• Positions and motions of objects represented as points and
connecting geodesic paths in the 6-D manifold (3-D Euclidean
group and 3-D symmetry group of each object) conserve their
shapes in the geometrically fullest and simplest way.
• Kinds of objects support optimal generalization/categorization
when represented as connected regions with shapes
determined by Bayesian revision of maximum-entropy priors.
Object recognition
Object recognition theory, S. Edelman (1997)
Second-order similarity in low-dimensional (<300) space is sufficient.
Population of columns as weak classifiers working in chorus - stacking.
Static Platonic model
Newton introduced space-time, arena for physical events.
Mind events need psychological spaces.
Goal: integrate neural and behavioral information in one model,
create model of mental processes at intermediate level between
psychology and neuroscience.
Static version: short-term response properties of the brain,
behavioral (sensorimotor) or memory-based (cognitive).
Applications: object recognition, psychophysics, category
formation in low-D psychological spaces, case-based reasoning.
Approach:
• simplify neural dynamics, find invariants (attractors), characterize
them in psychological spaces;
• use behavioral data, represent them in psychological space.
How to make a static model?
From neural responses to stimulus spaces.
Bayesian analysis of multielectrode responses (Földiak).
P(ri|s), i=1..N computed from multi-electrode measurements
The posterior probability P(s|r) = P(stimulus | response)
Bayes law:
$$P(s \mid r) = P(s \mid r_1, r_2, \ldots, r_N) = \frac{P(s) \prod_{i=1}^{N} P(r_i \mid s)}{\sum_{s'} P(s') \prod_{i=1}^{N} P(r_i \mid s')}$$
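The Bayes law above can be sketched for discrete responses as follows. This is an illustration only: the function name `posterior`, the array layout, and the toy numbers are assumptions, and the product over units presumes that responses are conditionally independent given the stimulus, as the formula does.

```python
import numpy as np

def posterior(prior, likelihoods, responses):
    """P(s | r_1..r_N) for discrete responses of N units.

    prior:       shape (S,), P(s)
    likelihoods: shape (N, S, R), likelihoods[i, s, r] = P(r_i = r | s)
    responses:   length N, observed response index of each unit
    """
    prior = np.asarray(prior, dtype=float)
    likelihoods = np.asarray(likelihoods, dtype=float)
    joint = prior.copy()
    for i, r in enumerate(responses):
        joint *= likelihoods[i, :, r]   # P(s) * prod_i P(r_i | s)
    return joint / joint.sum()          # normalize over all stimuli s'

# Toy numbers invented for illustration: 2 stimuli, 2 units, binary responses.
prior = [0.5, 0.5]
likelihoods = [
    [[0.8, 0.2], [0.3, 0.7]],   # unit 1: P(r | s=0), P(r | s=1)
    [[0.6, 0.4], [0.1, 0.9]],   # unit 2
]
post = posterior(prior, likelihoods, responses=[1, 1])
print(post)
```

With both units responding, the posterior favors the second stimulus, because both units are more likely to respond to it.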
Population analysis: visual object represented
as population of column activities.
Same for words and abstract objects
(evidence from brain imaging).
Semantic memory
Autoassociative network, developing internal
representations (McClelland-McNaughton-O’Reilly, 1995).
After training distance relations between different
categories are displayed in a dendrogram, showing
natural similarities/ clusters.
MDS mappings: min Σ (R_ij - r_ij)^2
• from internal neural activations;
• from original data in the P-space - hypercube, dimensions for predicates, e.g. robin(x) ∈ {0, 1};
• from psychological experiments, similarity matrices;
show similar configurations.
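A minimal sketch of the stress minimization behind such MDS maps, assuming plain gradient descent on the sum of squared differences between target dissimilarities R_ij and map distances r_ij. The function names, learning rate, and the three-point example are illustrative assumptions; non-metric MDS as used by Shepard fits only the rank order of dissimilarities and is not shown here.

```python
import numpy as np

def stress(R, X):
    """Sum over pairs i < j of (R_ij - r_ij)^2 for map coordinates X."""
    r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return np.sum(np.triu(R - r, k=1) ** 2)

def mds(R, dim=2, steps=2000, lr=0.01, seed=0):
    """Metric MDS by gradient descent from a small random configuration."""
    R = np.asarray(R, dtype=float)
    n = R.shape[0]
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=0.1, size=(n, dim))
    for _ in range(steps):
        diff = X[:, None, :] - X[None, :, :]      # pairwise coordinate differences
        r = np.linalg.norm(diff, axis=-1)         # current map distances r_ij
        np.fill_diagonal(r, 1.0)                  # avoid division by zero on the diagonal
        coef = (r - R) / r
        np.fill_diagonal(coef, 0.0)
        grad = 2.0 * (coef[:, :, None] * diff).sum(axis=1)
        X -= lr * grad
    return X

# Target dissimilarities of a 3-4-5 triangle (an invented example).
R = np.array([[0.0, 3.0, 5.0],
              [3.0, 0.0, 4.0],
              [5.0, 4.0, 0.0]])
X = mds(R)
print(stress(R, X))
```

The same routine could be fed dissimilarities from network activations, from predicate vectors, or from similarity judgments, which is how the three configurations on this slide would be compared.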
From neurodynamics to P-spaces
Modeling input/output relations with some internal parameters.
Walter Freeman: model of olfaction in rabbits, 5 types of odors, 5
types of behavior, very complex model in between.
Simplified models: H. Liljeström.
Attractors of dynamics in high-dimensional space => via fuzzy symbolic
dynamics, probability densities (PDF) can be defined in feature spaces.
Mind objects - created from fuzzy prototypes/exemplars.
More neurodynamics
Amit group, 1997-2001,
simplified spiking neuron
models of column activity
during learning.
Stage 1: single columns
respond to some feature.
Stage 2: several columns
respond to different features.
Stage 3: correlated activity
of many columns appears.
Formation of new attractors
=>formation of mind objects.
PDF: p(activity of columns | presented features)
Category learning.
Large field, many models.
Classical experiments: Shepard, Hovland and Jenkins (1961),
replicated by Nosofsky et al. (1994)
Problems of increasing complexity; results determined by logical rules.
3 binary-valued dimensions:
shape (square/triangle), color (black/white), size (large/small).
4 objects in each of the two categories presented during learning.
Type I - categorization using one dimension only.
Type II - two dim. are relevant (XOR problem).
Types III, IV, and V - intermediate complexity, between Types II and VI.
All 3 dimensions relevant, "single dimension plus exception" type.
Type VI - most complex, 3 dimensions relevant,
logic = enumerate stimuli in each of the categories.
Difficulty (number of errors made): Type I < II < III ~ IV ~ V < VI
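The structure of three of these problems can be written down directly. The sketch below encodes the 8 stimuli as binary triples and defines Types I, II, and VI as logical rules; the 0/1 coding and the helper `relevant_dimensions` are illustrative assumptions. Types III to V ("single dimension plus exception") are omitted because the slide does not list their exact category assignments.

```python
from itertools import product

# Eight stimuli coded as (shape, color, size), each 0 or 1.
stimuli = list(product([0, 1], repeat=3))

def type_I(s):
    """One relevant dimension."""
    return s[0]

def type_II(s):
    """XOR of two dimensions; the third is irrelevant."""
    return s[0] ^ s[1]

def type_VI(s):
    """Parity of all three dimensions."""
    return s[0] ^ s[1] ^ s[2]

def relevant_dimensions(rule):
    """Dimensions whose flip changes the category of at least one stimulus."""
    dims = []
    for d in range(3):
        for s in stimuli:
            flipped = tuple(v ^ (i == d) for i, v in enumerate(s))
            if rule(s) != rule(flipped):
                dims.append(d)
                break
    return dims

for name, rule in [("I", type_I), ("II", type_II), ("VI", type_VI)]:
    members = [s for s in stimuli if rule(s) == 1]
    print(name, len(members), relevant_dimensions(rule))
```

Each rule places 4 stimuli in each category, and the number of relevant dimensions grows from one to three, in line with the observed ordering of difficulty.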
Canonical dynamics.
What happens in the brain during category learning?
Complex neurodynamics <=> simplest, canonical dynamics.
For all logical functions one may write corresponding equations.
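For XOR (type II problems) the equations are:
$$V(x, y, z) = 3xyz + \frac{1}{4}\left(x^2 + y^2 + z^2\right)^2$$
$$\dot{x} = -\frac{\partial V}{\partial x} = -3yz - \left(x^2 + y^2 + z^2\right) x$$
$$\dot{y} = -\frac{\partial V}{\partial y} = -3xz - \left(x^2 + y^2 + z^2\right) y$$
$$\dot{z} = -\frac{\partial V}{\partial z} = -3xy - \left(x^2 + y^2 + z^2\right) z$$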
Corresponding feature space for relevant
dimensions A, B
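A minimal numerical sketch of this gradient system, assuming simple Euler integration (the step size, number of steps, and starting points are illustrative choices). The potential V(x, y, z) = 3xyz + (x^2 + y^2 + z^2)^2 / 4 has minima at the four corners of the cube with xyz = -1, so with a ±1 coding each attractor satisfies z = -xy, which is the XOR relation.

```python
import numpy as np

def grad_V(s):
    """Gradient of V(x, y, z) = 3xyz + (x^2 + y^2 + z^2)^2 / 4."""
    x, y, z = s
    r2 = x * x + y * y + z * z
    return np.array([3 * y * z + r2 * x,
                     3 * x * z + r2 * y,
                     3 * x * y + r2 * z])

def settle(s0, dt=0.01, steps=5000):
    """Euler integration of ds/dt = -grad V(s), starting from s0."""
    s = np.array(s0, dtype=float)
    for _ in range(steps):
        s -= dt * grad_V(s)
    return s

# Hypothetical noisy inputs, one in each octant with xyz < 0.
starts = [(0.9, 0.8, -0.5), (0.7, -0.6, 0.9), (-0.4, 0.8, 0.7), (-0.6, -0.9, -0.5)]
for s0 in starts:
    print(s0, "->", np.round(settle(s0), 3))
```

Each noisy input relaxes to the nearest consistent corner, which is how the canonical dynamics turns a logical rule into basins of attraction.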
Inverse base rates
Relative frequencies (base rates) of categories are used for classification:
if, on a list of diseases and symptoms, disease C associated with symptoms (PC, I)
is 3 times more common than disease R associated with symptoms (PR, I),
then symptoms PC => C, I => C (base rate effect).
Predictions contrary to the base rates:
inverse base rate effects (Medin, Edelson 1988).
Although PC + I + PR => C (60% answers),
PC + PR => R (60% answers).
Why?
Psychological explanations are not convincing.
Effects due to the neurodynamics of learning?
I am not aware of any dynamical models of such effects.
IBR explanation
Psychological explanation:
J. Kruschke, Base Rates in Category Learning (1996).
PR is attended to because it is a distinct symptom, although PC is more
common.
Basins of attractors - neurodynamics;
PDFs in P-space {C, R, I, PC, PR}.
PR + PC activation leads more frequently
to R because the basin of attractor for R is
deeper.
Construct neurodynamics, get PDFs.
Unfortunately these processes are in 5D.
Prediction: weak effects due to order and timing of presentation
(PC, PR) and (PR, PC), due to trapping of the mind state by different
attractors.
Learning
Two points of view, neurodynamics vs. psychology:
• Neurodynamics: I+PC more frequent => stronger synaptic connections, larger and deeper basins of attractors.
  Psychology: symptoms I, PC are typical for C because they appear more often.
• Neurodynamics: to avoid attractor around I+PC leading to C, deeper, more localized attractor around I+PR is created.
  Psychology: rare disease R - symptom I is misleading, attention shifted to PR associated with R.
Probing
Two points of view, neurodynamics vs. psychology:
• Neurodynamics: activation by I leads to C because longer training on I+PC creates a larger common basin than I+PR.
  Psychology: I => C, in agreement with base rates; more frequent stimuli I+PC are recalled more often.
• Neurodynamics: activation by I+PC+PR leads frequently to C, because I+PC puts the system in the middle of the large C basin and even for PR gradients still lead to C.
  Psychology: I+PC+PR => C because all symptoms are present and C is more frequent (base rates again).
• Neurodynamics: activation by PR+PC leads more frequently to R because the basin of attractor for R is deeper, and the gradient at (PR,PC) leads to R.
  Psychology: PC+PR => R because PR is a distinct symptom, although PC is more common.
Automatisation of actions
How does sensorimotor and cognitive learning take place?
Initially conscious decisions are needed, at the end it is automatic,
subconscious, intuitive, and well-localized in the brain.
Formation of attractors during learning => model of Amit & co.
Reinforcement learning requires observing and evaluating the
actions that the brain has planned and is executing.
Relating current performance to memorized episodes of
performance requires evaluation + comparison (Gray – subiculum)
followed by emotional reactions that provide reinforcement and
increase neuromodulation, facilitating rapid learning.
Working memory is essential to perform such a complex task.
Errors are painfully conscious and are remembered.
Conscious experiences: transferring data from WM to motor cortex
and memory in the ERTAS loop?
Is this the main role of consciousness?
Psychophysics
Static Platonic model is useful for immediate, memory-based behavior.
Local maxima of PDF are due to the potential activations of the long-term memory, “mind landscape” slowly changing with time.
Working memory, content of mind - currently active objects.
Psychophysical phenomena, like
masking: the circle exposed for 30
ms is seen, but not if the ring follows.
Dennett (1991): Stalinist and
Orwellian scenarios: preventing
conscious experience or erasing the
history?
How to describe such phenomena?
Masking in P-space
P-space: basic feature of objects.
Mind state, object seen: initially blank screen, object O1  attractor.
O1 is an active object, mind state has momentum and inertia.
External stimulus (circle) pushes
the mind state towards O2.
A masking stimulus O3 close to
O2 blocks activation of O2;
no conscious recall of the small
disk is noted;
Priming lowers inertia.
Solid state physics: use effective
mass, forget microscopic
interactions.
A series of masking stimuli following each other – no time to settle in an
attractor, no conscious experience at all?
Geometric properties
Geometric representation of mental events should be
understandable.
Problem of all Euclidean models: similarities are non-metric.
Re-entry connections between columns are not symmetric.
Asymmetric MDS requires change of perspective for each object.
Solution: Finsler geometry (ex: time as distance)
A curve X(t) parameterized by t, distance between t1=A, t2=B
depends on the positions X(t+dt) and derivative dX(t)/dt.
$$s(A, B) = \min \int_A^B L\left(X(t), \frac{dX(t)}{dt}\right) dt$$
where L(.) is the metric function (Lagrangian in physics).
Distance = „action” , fundamental laws of physics have such form.
To get a nonsymmetric distance s(A,B), a potential may be introduced,
for example proportional to probability density.
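A minimal sketch of how a potential makes such a distance nonsymmetric. The specific Lagrangian L = |dX/dt| - k ∇p(X)·dX/dt (a Randers-type metric), the Gaussian density `density`, and the constant `k` are all assumptions chosen for illustration; the slides do not specify the metric function. Because the potential term depends only on the endpoints, the straight line is the minimizing path in this particular case.

```python
import numpy as np

def density(x):
    """Hypothetical probability density: an unnormalized Gaussian bump at the origin."""
    return np.exp(-0.5 * np.dot(x, x))

def action(path, k=0.5):
    """Discretized integral of L = |dX| - k * d(density) along a polyline.

    k = 0.5 keeps k * |grad density| below 1, so every step has positive length.
    """
    total = 0.0
    for a, b in zip(path[:-1], path[1:]):
        total += np.linalg.norm(b - a) - k * (density(b) - density(a))
    return total

A = np.array([2.0, 0.0])
B = np.array([0.0, 0.0])             # location of the density maximum
path_AB = np.linspace(A, B, 101)     # straight path from A to B
s_AB = action(path_AB)
s_BA = action(path_AB[::-1])
print(s_AB, s_BA)
```

Under these assumptions, moving toward the density maximum is "shorter" than moving away from it, so s(A,B) differs from s(B,A).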
Feature Space Mapping
Platonic Model: inspiration for FSM (Duch 1994) - neurofuzzy system for
modeling PDFs using separable transfer (fuzzy membership) functions.
Classification, extraction of logical rules, decision support.
Set up (fuzzy) facts explicitly as dense regions in the feature space;
Initialize by clustering - creates rough PDF landscape.
Train by tuning adaptive parameters P;
novelty criteria allow for creation of new nodes as required.
Self-organization of G(X;P) = prototypes of objects in the feature space.
$$g_p(X; P^p) = \prod_{i=1}^{N} g_{p,i}(x_i; P_i^p)$$
$$F(X; P) = \sum_{p=1}^{n} W_p \, g_p(X; P^p)$$
Recognition: find local maximum
of the F(X;P) function.
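A minimal sketch of these two formulas and of recognition as a search for a local maximum. It assumes Gaussian factors for the separable functions g_{p,i} and a finite-difference gradient ascent; the names `node`, `F`, `recognize` and all numerical values are illustrative and are not taken from the FSM implementation.

```python
import numpy as np

def node(X, center, width):
    """Separable node g_p: product over dimensions of 1-D Gaussian factors."""
    return np.prod(np.exp(-((X - center) / width) ** 2))

def F(X, centers, widths, weights):
    """Weighted sum of node activations, F(X; P) = sum_p W_p g_p(X; P^p)."""
    return sum(w * node(X, c, s) for c, s, w in zip(centers, widths, weights))

def recognize(X0, centers, widths, weights, lr=0.05, steps=500, eps=1e-5):
    """Climb F from X0 by gradient ascent with central finite differences."""
    X = np.array(X0, dtype=float)
    for _ in range(steps):
        grad = np.zeros_like(X)
        for i in range(X.size):
            step = np.zeros_like(X)
            step[i] = eps
            grad[i] = (F(X + step, centers, widths, weights)
                       - F(X - step, centers, widths, weights)) / (2 * eps)
        X += lr * grad
    return X

# Two hypothetical prototypes in a 2-D feature space.
centers = np.array([[0.0, 0.0], [3.0, 3.0]])
widths = np.array([[1.0, 1.0], [1.0, 1.0]])
weights = np.array([1.0, 1.0])
found = recognize([0.6, -0.4], centers, widths, weights)
print(np.round(found, 3))
```

An input near the first prototype climbs to the maximum at that prototype, which plays the role of the recognized object.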
Intuitive thinking
Question in qualitative physics:
if R2 increases, R1 and Vt are constant,
what will happen with current and V1, V2 ?
Geometric representation of facts:
+ increasing, 0 constant, - decreasing.
Ohm’s law V=I×R; Kirchhoff’s V=V1+V2.
True (I-,V-,R0), (I+,V+,R0), false (I+,V-,R0).
5 laws: 3 Ohm’s & 2 Kirchhoff’s.
All laws A=B+C, A=B×C, A^-1=B^-1+C^-1,
have identical geometric interpretation!
13 true, 14 false facts; simple P-space,
complex neurodynamics.
Intuitive reasoning
5 laws are simultaneously fulfilled, all have the same representation:
$$F(V_t, R, I, V_1, V_2, R_1, R_2) = \prod_{i=1}^{5} F_i(A_i, B_i, C_i)$$
Question: If R2=+, R1=0 and Vt=0, what can be said about I, V1, V2?
Find missing values giving F(Vt=0, R, I, V1, V2, R1=0, R2=+) > 0.
Suppose that variable X = +, is it possible?
No, if F(Vt=0, R, I, V1, V2, R1=0, R2=+) = 0, i.e. one law is not fulfilled.
If nothing is known, 111 consistent combinations out of 2187 (5%) exist.
Intuitive reasoning, no manipulation
of symbols; heuristics: select
variable giving unique answer.
Soft constraints or semi-quantitative
=> small |FSM(X)| values.
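The counts quoted on these slides can be checked by brute force. The sketch below assumes that the five laws are Vt = I×R, V1 = I×R1, V2 = I×R2, Vt = V1+V2 and R = R1+R2 (the slides say only "3 Ohm's & 2 Kirchhoff's"), and it replaces the product of soft functions F_i by a hard logical test named `law`, which is an illustrative simplification.

```python
from itertools import product

SIGNS = ("-", "0", "+")

def law(a, b, c):
    """True if the trends (a, b, c) are consistent with A = B + C or A = B * C."""
    if b == c:
        return a == b
    if b == "0":
        return a == c
    if c == "0":
        return a == b
    return True   # opposite trends: the result may go either way

true_facts = sum(law(a, b, c) for a, b, c in product(SIGNS, repeat=3))

NAMES = ("Vt", "R", "I", "V1", "V2", "R1", "R2")

def consistent(v):
    """All five assumed laws hold for the assignment v."""
    return (law(v["Vt"], v["I"], v["R"])
            and law(v["V1"], v["I"], v["R1"])
            and law(v["V2"], v["I"], v["R2"])
            and law(v["Vt"], v["V1"], v["V2"])
            and law(v["R"], v["R1"], v["R2"]))

states = [dict(zip(NAMES, combo)) for combo in product(SIGNS, repeat=7)]
valid = [v for v in states if consistent(v)]
answers = [v for v in valid
           if v["R2"] == "+" and v["R1"] == "0" and v["Vt"] == "0"]

print(true_facts, len(valid), len(states))   # 13 111 2187
print(answers)
```

Under these assumptions the query R2=+, R1=0, Vt=0 has a single consistent completion: I decreases, V1 decreases, V2 increases, and R increases.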
Platonic mind model.
Feature detectors/effectors: topographic maps.
Objects in long-term memory (parietal, temporal, frontal): local P-spaces.
Mind space (working memory, prefrontal, parietal): construction of mind
space features/objects using attention mechanisms.
Feelings: gradients in the global space.
Language of thought.
Precise language, replacing folk psychology,
reducible to neurodynamics.
Mind state dynamics modeled by gradient
dynamics in mind space, „sticking” to PDF
maxima, for example:
$$S(0) = X_{inp}; \qquad \dot{S}(t) = \nabla_S M(S; t) \left[ 1 - g\left(M(S; t)\right) \right] + \eta(t)$$
where g(x) controls the „sticking” and
η(t) is a noise + external forces term.
Mind state has inertia and momentum;
transition probabilities between mind objects
should be fitted to transition prob. between
corresponding attractors of neurodynamics
(QM-like formalism).
Primary mind objects - from sensory data.
Secondary mind objects - abstract categories.
Some connections
Geometric/dynamical ideas related to mind may be found in many fields:
Neuroscience:
D. Marr (1970) “probabilistic landscape”.
C.H. Anderson, D.C. van Essen (1994): Superior Colliculus PDF maps
S. Edelman: “neural spaces”, object recognition, global representation space
approximates the Cartesian product of spaces that code object fragments,
representation of similarities is sufficient.
Psychology:
K. Lewin, psychological forces.
G. Kelly, Personal Construct Psychology.
R. Shepard, universal invariant laws.
Folk psychology:
to put in mind, to have in mind, to keep in mind (mindmap), to make up one's
mind, be of one mind ... (space).
More connections
AI: problem spaces - reasoning, problem solving, SOAR, ACT-R, little
work on continuous mappings (MacLennan) instead of symbols.
Engineering: system identification, internal models inferred from
input/output observations – this may be done without any parametric
assumptions if a number of identical neural modules are used!
Philosophy:
P. Gärdenfors, conceptual spaces
R.F. Port, T. van Gelder, eds., Mind as Motion (MIT Press 1995)
Linguistics:
G. Fauconnier, Mental Spaces (Cambridge U.P. 1994).
Mental spaces and non-classical feature spaces.
J. Elman, Language as a dynamical system (San Diego, 1997).
Stream of thoughts, sentence as a trajectory in P-space.
Psycholinguistics: T. Landauer, S. Dumais, Latent Semantic Analysis,
Psych. Rev. (1997). Semantics for a 60k-word corpus requires about 300 dimensions.
Conclusions
Platonic model - a unified paradigm for cognitive science?
Complex neurodynamics replaced by simpler dynamics in P-spaces.
Searching for low-dimensional representation of mental events.
Relations between different levels of modeling are important, e.g.
recurrent neural network => psychological spaces.
Useful technical/psychological applications of the static Platonic model.
Open questions:
High-dimensional P-spaces with Finsler geometry needed for visualization
of the mind events - will the model be understandable?
Mathematical characterization of mind space? Many choices.
Challenge: neurodynamical model => P-spaces for monkey categorization.
Large-scale simulations of models of mind are missing but ... hierarchical
approach: networks of networks in simulated environment, is coming.
At the end of the road: physics-like theory of events in mental spaces?
And in the end?
A lot of work to do ...