
A Brain-Like Computer for Cognitive Applications: The Ersatz Brain Project

James A. Anderson

[email protected]

Department of Cognitive and Linguistic Sciences Brown University, Providence, RI 02912

Paul Allopenna

[email protected]

Aptima, Inc.

12 Gill Street, Suite 1400, Woburn, MA

Our Goal: We want to build a first-rate, second-rate brain.

Participants

Faculty:
• Jim Anderson, Cognitive Science.
• Gerry Guralnik, Physics.
• Tom Dean, Computer Science.
• David Sheinberg, Neuroscience.

Students:
• Socrates Dimitriadis, Cognitive Science.
• Brian Merritt, Cognitive Science.
• Benjamin Machta, Physics.

Private Industry:
• Paul Allopenna, Aptima, Inc.
• John Santini, Anteon, Inc.

Comparison of Silicon Computers and Carbon Computers

Digital computers are
• Made from silicon
• Accurate (essentially no errors)
• Fast (nanoseconds)
• Execute long chains of logical operations (billions)
• Often irritating (because they don’t think like us).

Comparison of Silicon Computers and Carbon Computers

Brains are
• Made from carbon
• Inaccurate (low precision, noisy)
• Slow (milliseconds, 10^6 times slower)
• Execute short chains of parallel alogical associative operations (perhaps 10 operations/second)
• Yet largely understandable (because they think like us).

Comparison of Silicon Computers and Carbon Computers

• Huge disadvantage for carbon: more than 10^12 in the product of speed and power.
• But we still do better than them in many perceptual skills: speech recognition, object recognition, face recognition, motor control.
• Implication: Cognitive “software” uses only a few but very powerful elementary operations.

Major Point

Brains and computers are very different in their underlying hardware, leading to major differences in software.

Computers, as the result of 60 years of evolution, are great at modeling physics. They are not great (after 50 years of trying and largely failing) at modeling human cognition.

One possible reason: inappropriate hardware leads to inappropriate software.

Maybe we need something completely different: new software, new hardware, new basic operations, even new ideas about computation.

So Why Build a Brain-Like Computer?

1. Engineering.

Computers are all special purpose devices. Many of the most important practical computer applications of the next few decades will be cognitive in nature:
• Natural language processing.
• Internet search.
• Cognitive data mining.
• Decent human-computer interfaces.
• Text understanding.

We claim it will be necessary to have a cortex-like architecture (either software or hardware) to run these applications efficiently.

2. Science:

Such a system, even in simulation, becomes a powerful research tool. It leads to designing software with a particular structure to match the brain-like computer. If we capture any of the essence of the cortex, writing good programs will give insight into biology and cognitive science.

If we can write good software for a vaguely brain-like computer we may show we really understand something important about the brain.

3. Personal:

It would be the ultimate cool gadget.

A technological vision:

In 2055 the personal computer you buy in Wal-Mart will have two CPUs with very different architectures:

First, a traditional von Neumann machine that runs spreadsheets, does word processing, keeps your calendar straight, etc. What they do now.

Second, a brain-like chip
• To handle the interface with the von Neumann machine,
• Give you the data that you need from the Web or your files (but didn’t think to ask for).
• Be your silicon friend, guide, and confidant.

History: Technical Issues

Many have proposed the construction of brain-like computers.

These attempts usually start with
• massively parallel arrays of neural computing elements,
• elements based on biological neurons, and
• the layered 2-D anatomy of mammalian cerebral cortex.

Such attempts have failed commercially. The early Connection Machines from Thinking Machines, Inc. (W.D. Hillis, The Connection Machine, 1987) were most nearly successful commercially and are most like the architecture we are proposing here.

Consider the extremes of computational brain models.

First Extreme: Biological Realism

The human brain is composed of the order of 10^10 neurons, connected together with at least 10^14 neural connections. (Probably underestimates.)

Biological neurons and their connections are extremely complex electrochemical structures. The more realistic the neuron approximation the smaller the network that can be modeled.

There is good evidence that for cerebral cortex a bigger brain is a better brain.

Projects that model neurons in detail are of scientific importance.

But they are not large enough to simulate interesting cognition.

Neural Networks.

The most successful brain-inspired models are neural networks. They are built from simple approximations of biological neurons: nonlinear integration of many weighted inputs.

Throw out all the other biological detail.

Neural Network Systems

Units with these approximations can build systems that
• can be made large,
• can be analyzed,
• can be simulated,
• can display complex cognitive behavior.

Neural networks have been used to model (rather well) important aspects of human cognition.

Second Extreme: Associatively Linked Networks.

The second class of brain-like computing models is a basic part of computer science: associatively linked structures. One example of such a structure is a semantic network. Such structures underlie most of the practically successful applications of artificial intelligence.

Associatively Linked Networks (2)

The connection between the biological nervous system and such a structure is unclear. Few believe that nodes in a semantic network correspond in any sense to single neurons. Physiology (fMRI) suggests that a complex cognitive structure – a word, for instance – gives rise to widely distributed cortical activation.

Major virtue of Linked Networks: They have sparsely connected “interesting” nodes. (words, concepts)

In practical systems, the number of links converging on a node ranges from one or two up to a dozen or so.

The Ersatz Brain Approximation: The Network of Networks.

Conventional wisdom says neurons are the basic computational units of the brain.

The Ersatz Brain Project is based on a different assumption.

The Network of Networks model was developed in collaboration with Jeff Sutton (Harvard Medical School, now at NSBRI).

Cerebral cortex contains intermediate level structure, between neurons and an entire cortical region. Intermediate level brain structures are hard to study experimentally because they require recording from many cells simultaneously.

Cortical Columns: Minicolumns

“The basic unit of cortical operation is the minicolumn … It contains of the order of 80-100 neurons except in the primate striate cortex, where the number is more than doubled. The minicolumn measures of the order of 40-50 µm in transverse diameter, separated from adjacent minicolumns by vertical, cell-sparse zones … The minicolumn is produced by the iterative division of a small number of progenitor cells in the neuroepithelium.” (Mountcastle, p. 2)

V.B. Mountcastle (2003). Introduction [to a special issue of Cerebral Cortex on columns]. Cerebral Cortex, 13, 2-4.

Figure: Nissl stain of cortex in planum temporale.

Columns: Functional

Groupings of minicolumns seem to form the physiologically observed functional columns. Best known example is orientation columns in V1. They are significantly bigger than minicolumns, typically around 0.3-0.5 mm.

Mountcastle’s summation: “Cortical columns are formed by the binding together of many minicolumns by common input and short range horizontal connections. … The number of minicolumns per column varies … between 50 and 80. Long range intracortical projections link columns with similar functional properties.” (p. 3)

Cells in a column ~ (80)(100) = 8000

Sparse Connectivity

The brain is sparsely connected. (Unlike most neural nets.)

A neuron in cortex may have on the order of 100,000 synapses. There are more than 10^10 neurons in the brain. Fractional connectivity is very low: 0.001%.

Implications:
• Connections are expensive biologically since they take up space, use energy, and are hard to wire up correctly.
• Therefore, connections are valuable.
• The pattern of connection is under tight control.
• Short local connections are cheaper than long ones.

Our approximation makes extensive use of local connections for computation.
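The 0.001% figure can be checked with a line of arithmetic. The sketch below uses the two order-of-magnitude estimates quoted on this slide; the function name is illustrative, and it assumes (as a simplification not stated in the slides) at most one synapse per pair of neurons.

```python
# Order-of-magnitude figures quoted on the slide.
SYNAPSES_PER_NEURON = 100_000        # ~10^5
NEURONS_IN_BRAIN = 10_000_000_000    # ~10^10


def fractional_connectivity(synapses_per_neuron: int, neurons: int) -> float:
    """Fraction of all neurons that a single neuron could contact.

    Assumption (not from the slides): at most one synapse per pair of neurons.
    """
    return synapses_per_neuron / neurons


fraction = fractional_connectivity(SYNAPSES_PER_NEURON, NEURONS_IN_BRAIN)
percent = 100.0 * fraction
print(f"fractional connectivity: {percent:.3f}%")
```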

Network of Networks Approximation

We use the Network of Networks [NofN] approximation to structure the hardware and to reduce the number of connections.

We assume the basic computing units are not neurons, but small (10^4 neurons) attractor networks.

Basic Network of Networks Architecture:
• 2-dimensional array of modules
• Locally connected to neighbors

The activity of the nonlinear attractor networks (modules) is dominated by their attractor states.

Attractor states may be built in or acquired through learning.

We approximate the activity of a module as a weighted sum of attractor states. That is: an adequate set of basis functions.

Activity of Module: $\mathbf{x} = \sum_i c_i \mathbf{a}_i$, where the $\mathbf{a}_i$ are the attractor states.
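To make the "weighted sum of attractor states" concrete, here is a minimal sketch that recovers the coefficients c_i for a given module activity x. It assumes, purely for illustration, that the attractor states are mutually orthogonal, so each coefficient is a simple projection; the slides do not state this, and the vectors below are made up.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def coefficients(x, attractors):
    """c_i such that x = sum_i c_i a_i.

    Assumption: the attractors are mutually orthogonal, so each c_i is the
    projection of x onto a_i.
    """
    return [dot(x, a) / dot(a, a) for a in attractors]


def reconstruct(coeffs, attractors):
    """Rebuild the module activity from coefficients and attractor states."""
    n = len(attractors[0])
    return [sum(c * a[k] for c, a in zip(coeffs, attractors)) for k in range(n)]


# Three hypothetical, mutually orthogonal attractor states of a 4-unit module.
attractors = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
]
x = [2.0, 1.0, 0.0, -1.0]            # module activity to be described

c = coefficients(x, attractors)
x_again = reconstruct(c, attractors)
```

In this description a module's state is summarised by a handful of coefficients rather than by the activity of every unit.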

Elementary Modules

The Single Module: BSB

The attractor network we use for the individual modules is the BSB network (Anderson, 1993).

It can be analyzed using the eigenvectors and eigenvalues of its local connections.
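As an illustration of a single module, the sketch below uses the commonly published form of the BSB (Brain-State-in-a-Box) update: add feedback through the connection matrix, then clip every unit to the range [-1, 1]. The gain `alpha`, the Hebbian construction of the matrix, and the example vectors are assumptions chosen for the demonstration, not parameters taken from the slides.

```python
def clip(v, lo=-1.0, hi=1.0):
    """The 'box': limit a unit's activity to [lo, hi]."""
    return max(lo, min(hi, v))


def hebbian_matrix(patterns):
    """Outer-product connection matrix that stores the given patterns."""
    n = len(patterns[0])
    return [[sum(p[i] * p[j] for p in patterns) / n for j in range(n)]
            for i in range(n)]


def bsb_step(x, A, alpha=0.5):
    """One update: x <- clip(x + alpha * A x). alpha is an assumed gain."""
    n = len(x)
    feedback = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return [clip(x[i] + alpha * feedback[i]) for i in range(n)]


def bsb_run(x, A, alpha=0.5, steps=20):
    """Iterate the update a fixed number of times."""
    for _ in range(steps):
        x = bsb_step(x, A, alpha)
    return x


stored = [[1, -1, 1, -1]]            # one stored pattern (a corner of the box)
A = hebbian_matrix(stored)
noisy = [0.3, -0.1, 0.2, -0.4]       # weak, noisy version of the stored pattern
final = bsb_run(noisy, A)
```

Here the stored pattern is an eigenvector of `A` with eigenvalue 1, so the component of the state along it grows on every step until the limiter pins the state at the corresponding corner, which then acts as the attractor.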

Interactions between Modules

Interactions between modules are described by state interaction matrices, M. The state interaction matrix elements give the contribution of an attractor state in one module to the amplitude of an attractor state in a connected module.

In the BSB linear region:

$\mathbf{x}(t+1) = \sum_i \mathbf{M}\,\mathbf{s}_i + \mathbf{f} + \mathbf{x}(t)$

where the first term is the weighted sum from other modules, $\mathbf{f}$ is the input, and $\mathbf{x}(t)$ is the ongoing activity.
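The linear-region update, x(t+1) = (weighted sum from other modules) + f + x(t), can be sketched directly. In this illustration each neighbour contributes its own state interaction matrix applied to its state; allowing a separate matrix per neighbour, and all the numbers below, are assumptions made for the example.

```python
def mat_vec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def linear_update(x, f, neighbours):
    """x(t+1) = sum of M s over neighbours + f + x(t), before any limiting.

    neighbours is a list of (M, s) pairs: a state interaction matrix and the
    current state of the neighbouring module it connects from.
    """
    new_x = [xi + fi for xi, fi in zip(x, f)]
    for M, s in neighbours:
        contribution = mat_vec(M, s)
        new_x = [a + b for a, b in zip(new_x, contribution)]
    return new_x


x = [0.0, 0.0]                       # ongoing activity
f = [1.0, 0.0]                       # direct input to this module
neighbours = [
    ([[0.5, 0.0], [0.0, 0.5]], [1.0, 1.0]),   # hypothetical neighbour 1
    ([[0.0, 1.0], [1.0, 0.0]], [0.2, 0.0]),   # hypothetical neighbour 2
]
x_next = linear_update(x, f, neighbours)
```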

The Linear-Nonlinear Transition

The first BSB processing stage is linear and sums influences from other modules. The second processing stage is nonlinear.

This linear to nonlinear transition is a powerful computational tool for cognitive applications.

It describes the processing path taken by many cognitive processes.

A generalization from cognitive science:

Sensory inputs → (categories, concepts, words)

Cognitive processing moves from continuous values to discrete entities.

Binding Module Patterns Together.

An associative Hebbian learning event will tend to link f with g through the local connections.

Two adjacent modules interacting. Hebbian learning will tend to bind responses of modules together if f and g frequently co-occur.

There is a speculative connection to the important binding problem of cognitive science and neuroscience.

The larger groupings will act like a unit.

Responses will be stronger to the pair f,g than to either f or g by itself.
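A toy version of this binding, staying in the linear region and ignoring the limiter, is sketched below. One Hebbian (outer-product) learning event links pattern f in one module to pattern g in its neighbour; the response of the second module is then compared for the pair and for each pattern alone. The learning rate, the patterns, and the "strength" measure are all illustrative assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def hebbian_link(f, g, rate):
    """Outer-product weights from a module showing f to a module showing g."""
    return [[rate * gi * fj for fj in f] for gi in g]


def response(W, neighbour_state, own_input):
    """Linear-region response: own input plus the neighbour's contribution."""
    via_link = [dot(row, neighbour_state) for row in W]
    return [o + v for o, v in zip(own_input, via_link)]


def strength(r, g):
    """How strongly the response r expresses pattern g (normalised projection)."""
    return dot(r, g) / dot(g, g)


f = [1, -1, 1, -1]                   # pattern in the first module
g = [1, 1, -1, -1]                   # pattern in the adjacent module
silent = [0, 0, 0, 0]
W = hebbian_link(f, g, rate=1 / len(f))   # one co-occurrence of f and g

pair = strength(response(W, f, g), g)         # f next door, g presented
g_only = strength(response(W, silent, g), g)  # g presented, neighbour silent
f_only = strength(response(W, f, silent), g)  # only the neighbour is active
```

With these made-up values the pair drives the second module twice as strongly as either pattern alone.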

Scaling

We can extend this associative model to larger scale groupings. It may become possible to suggest a natural way to bridge the gap in scale between single neurons and entire brain regions.

Networks > Networks of Networks > Networks of (Networks of Networks) > Networks of (Networks of (Networks of Networks)) and so on …

Interference Patterns

We are using local transmission of (vector) patterns, not scalar activity level. We have the potential for traveling pattern waves using the local connections.

Lateral information flow allows the potential for the formation of feature combinations in the interference patterns where two different patterns collide.

Learning the Interference Pattern

The individual modules are nonlinear learning networks.

We can form new attractor states when two patterns meet at a module and an interference pattern forms there.

Module Evolution

Module evolution with learning:
• From an initial repertoire of basic attractor states
• to the development of specialized pattern combination states unique to the history of each module.

Biological Evidence: Columnar Organization in Inferotemporal Cortex

Tanaka (2003) suggests a columnar organization of different response classes in primate inferotemporal cortex.

There seems to be some internal structure in these regions: for example, spatial representation of orientation of the image in the column.

IT Response Clusters: Imaging

Tanaka (2003) used intrinsic visual imaging of cortex. Train a video camera on exposed cortex and cell activity can be picked up.

At least a factor of ten higher resolution than fMRI.

Size of response is around the size of functional columns seen elsewhere: 300-400 microns.

Columns: Inferotemporal Cortex

Responses of a region of IT to complex images involve discrete columns.

The response to a picture of a fire extinguisher shows how regions of activity are determined.

Boundaries are where the activity falls by a half.

Note: some spots are roughly equally spaced.

Active IT Regions for a Complex Stimulus

Note the large number of roughly equally distant spots (2 mm) for a familiar complex image.

Network of Networks Functional Summary.

• The NofN approximation assumes a two-dimensional array of attractor networks.
• The attractor states dominate the output of the system at all levels.
• Interactions between different modules are approximated by interactions between their attractor states.
• Lateral information propagation plus nonlinear learning allows formation of new attractors at the location of interference patterns.
• There is a linear and a nonlinear region of operation in both single and multiple modules.
• The qualitative behavior of the attractor networks can be controlled by analog gain control parameters.

Engineering Hardware Considerations

We feel that there is a size, connectivity, and computational power “sweet spot” at the level of the parameters of the network of networks model.

If an elementary attractor network has 10^4 actual neurons, that network might display 50 attractor states. Each elementary network might connect to 50 others through state connection matrices.

A brain-sized system might consist of 10^6 elementary units with about 10^11 (0.1 terabyte) numbers specifying the connections.

If 100 to 1000 elementary units can be placed on a chip there would be a total of 1,000 to 10,000 chips in a cortex-sized system.

These numbers are large but within the upper bounds of current technology.
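The sizing estimate can be reproduced with a few multiplications. The sketch below assumes one 50 x 50 state connection matrix per link between elementary units, which is how the figures on this slide appear to combine; the variable names are illustrative.

```python
# Figures from the slide.
ELEMENTARY_UNITS = 10**6             # modules in a brain-sized system
LINKS_PER_UNIT = 50                  # each unit connects to about 50 others
STATES_PER_UNIT = 50                 # attractor states per unit

# Assumption: each link carries one full state-to-state matrix (50 x 50).
numbers_per_link = STATES_PER_UNIT * STATES_PER_UNIT
total_numbers = ELEMENTARY_UNITS * LINKS_PER_UNIT * numbers_per_link

# Chip counts for 1000 and 100 elementary units per chip.
chips_min = ELEMENTARY_UNITS // 1000
chips_max = ELEMENTARY_UNITS // 100

print(f"connection numbers: {total_numbers:.2e}")
print(f"chips needed: {chips_min:,} to {chips_max:,}")
```

The total comes to roughly 10^11 numbers, which matches the 0.1 terabyte figure if each number is stored in about one byte.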

A Software Example: Sensor Fusion

A potential application is to sensor fusion. Sensor fusion means merging information from different sensors into a unified interpretation.

Involved in such a project in collaboration with Texas Instruments and Distributed Data Systems, Inc. The project was a way to do the de-interleaving problem in radar signal processing using a neural net.

In a radar environment the problem is to determine how many radar emitters are present and whom they belong to.

Biologically, this corresponds to the behaviorally important question, “Who is looking at me?” (To be followed, of course, by “And what am I going to do about it?”)

Radar

A receiver for radar pulses provides several kinds of quantitative data:
• frequency,
• intensity,
• pulse width,
• angle of arrival, and
• time of arrival.

The user of the radar system wants to know qualitative information:
• How many emitters?
• What type are they?
• Who owns them?
• Has a new emitter appeared?

Concepts

The way we solved the problem was by using a concept forming model from cognitive science.

Concepts are labels for a large class of members that may differ substantially from each other. (For example, birds, tables, furniture.)

We built a system where a nonlinear network developed an attractor structure where each attractor corresponded to an emitter.

That is, emitters became discrete, valid concepts.

Human Concepts

One of the most useful computational properties of human concepts is that they often show a hierarchical structure. Examples might be:

animal > bird > canary > Tweetie

or

artifact > motor vehicle > car > Porsche > 911.

A weakness of the radar concept model is that it did not allow development of these important hierarchical structures.

Sensor Fusion with the Ersatz Brain.

We can do simple sensor fusion in the Ersatz Brain.

The data representation we develop is directly based on the topographic data representations used in the brain: topographic computation.

Spatializing the data, that is letting it find a natural topographic organization that reflects the relationships between data values, is a technique of great potential power.

We are working with relationships between values, not with the values themselves.

Spatializing the problem provides a way of “programming” a parallel computer.

Topographic Data Representation

Low Values:    ••++++••••••••••••••••••••••••••••••••••••••••••••••
Medium Values: •••••••••••••••••••••••++++•••••••••••••••••••••••
High Values:   ••••••••••••••••••••••••••••••••••••••••••••++++••

We initially will use a simple bar code to code the value of a single parameter. The precision of this coding is low.

But we don’t care about quantitative precision: We want qualitative analysis.

Brains are good at qualitative analysis, poor at quantitative analysis. (Traditional computers are the opposite.)
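A bar code of this kind is easy to generate. The sketch below maps a parameter value onto the position of a short bar of "+" symbols in a row of modules; the array width, bar width, and value range are arbitrary illustrative choices, not values given in the slides.

```python
def bar_code(value, lo, hi, width=50, bar=4):
    """Topographic 'bar code' for one parameter value.

    The value is mapped linearly from [lo, hi] onto the position of a bar of
    `bar` active modules ('+') in a row of `width` modules ('•' = inactive).
    Precision is deliberately low: nearby values give overlapping bars.
    """
    frac = (value - lo) / (hi - lo)
    frac = min(1.0, max(0.0, frac))          # clamp out-of-range values
    start = round(frac * (width - bar))
    return "•" * start + "+" * bar + "•" * (width - bar - start)


low = bar_code(0, 0, 100)
medium = bar_code(50, 0, 100)
high = bar_code(100, 0, 100)
for label, code in (("low", low), ("medium", medium), ("high", high)):
    print(f"{label:>6}: {code}")
```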

For our demo Ersatz Brain program, we will assume we have four parameters derived from a source.

An “object” is characterized by values of these four parameters, coded as bar codes on the edges of the array of CPUs.

We assume local linear transmission of patterns from module to module.

Demo

Each pair of input patterns gives rise to an interference pattern, a line perpendicular to the midpoint of the line between the pair of input locations.

There are places where three or four features meet at a module. The higher-level combinations represent relations between the individual data values in the input pattern.

Combinations have literally fused spatial relations of the input data.
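The geometry of these interference patterns can be sketched on a small grid. The code below assumes patterns spread from each input location at the same constant speed in all directions, so two patterns meet at modules equidistant from both inputs (the perpendicular bisector), and three patterns meet where all three distances agree. The grid size and input locations are made up for the illustration.

```python
def coincidences(sources, size):
    """Modules of a size x size array that are equidistant from every source.

    Assumption: patterns travel outward from each source at the same speed,
    so they arrive together exactly at equidistant modules. Squared integer
    distances are compared, so only exact grid coincidences are reported.
    """
    hits = []
    for row in range(size):
        for col in range(size):
            d2 = {(row - r) ** 2 + (col - c) ** 2 for r, c in sources}
            if len(d2) == 1:
                hits.append((row, col))
    return hits


# Hypothetical input locations on the left, top and bottom edges of a 9 x 9 array.
left, top, bottom = (4, 0), (0, 4), (8, 4)

pairwise = coincidences([left, top], size=9)         # a line of modules
triple = coincidences([left, top, bottom], size=9)   # a single module
```

For these inputs the pairwise meeting points form the diagonal through the midpoint of the left and top inputs, and the only three-way coincidence is the module at the center of the array.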

Formation of Hierarchical Concepts.

This approach allows the formation of what look like hierarchical concept representations.

Suppose we have three parameter values that are fixed for each object and one value that varies widely from example to example.

The system develops two different types of spatial data.

In the first, some high order feature combinations are fixed since the three fixed input (core) patterns never change.

In the second there is a varying set of feature combinations corresponding to the details of each specific example of the object.

The specific examples all contain the common core pattern.

The group of coincidences in the center of the array is due to the three input values arranged around the left, top and bottom edges.

Core Representation

Left are two common core examples where there is a different value on the right side of the array. Note the pattern (above).

Development of A “Hierarchy” Through Spatial Localization.

The coincidences due to the core (three values) and to the examples (all four values) are spatially separated.

We can use the core as a representation of the examples since it is present in all of them.

It acts as the higher level in a simple hierarchy: all examples contain the core.

This approach is based on relationships between parameter values and not on the values themselves.

Relationships are Valuable

Consider :

Which pair is most similar?

Experimental Results

One pair has high physical similarity to the initial stimulus, that is, one half of the figure is identical.

The other pair has high relational similarity, that is, they form a pair of identical figures.

Adults tend to choose relational similarity.

Children tend to choose physical similarity.

However, it is easy to bias adults and children toward either relational or physical similarity. Potentially a very flexible and programmable system.

Cognitive Computation: Second Example - Arithmetic

• Brains and computers are very different in the way they do things, largely because the underlying hardware is so different.

• Consider a computational task that humans and computers do frequently, but by different means:
– Learning simple arithmetic facts

The Problem with Arithmetic

• We often congratulate ourselves on the powers of the human mind.

• But why does this amazing structure have such trouble learning elementary arithmetic?

• Even adults doing arithmetic are slow and make many errors.

• Learning the times tables takes children several years and they find it hard.

The Problem with Arithmetic

At the same time children are having trouble learning arithmetic they are knowledge sponges learning
– Several new words a day.
– Social customs.
– Many facts in other areas.

Association

In structure, arithmetic facts are simple associations.

Consider multiplication:

(Multiplicand)(Multiplicand) → Product

Multiplication

These are not arbitrary associations.

• They have an ambiguous structure that gives rise to associative interference.

4 x 3 = 12
4 x 4 = 16
4 x 5 = 20

• Initial ‘4’ has associations with many possible products.

Ambiguity causes difficulties for simple associative systems.

Number Magnitude

• One way to cope with ambiguity is to embed the fact in a larger context.

• Numbers are much more than arbitrary abstract patterns.

• Experiment:
– Which is greater? 17 or 85
– Which is greater? 73 or 74

Response Time Data

Number Magnitude

It takes much longer to compare 74 and 73.

When a “distance” intrudes into what should be an abstract relationship it is called a symbolic distance effect.

A computer would be unlikely to show such an effect. (Subtract numbers, look at sign.)

Magnitude Coding

Key observation: We see a similar pattern when sensory magnitudes are being compared.

Deciding which of
– two weights is heavier,
– two lights is brighter,
– two sounds is louder,
– two numbers is bigger
displays the same reaction time pattern.

Magnitude Coding

This effect and many others suggest that we have an internal representation of number that acts like a sensory magnitude.

Conclusion: Instead of number being an abstract symbol, humans use a much richer coding of number containing powerful sensory and perceptual components.

Magnitude Coding

This elaboration of number is a good thing. It
– Connects number to the physical world.
– Provides the basis for mathematical intuition.
– Is responsible for the creative aspects of mathematics.

Model Makes Small Mistakes, Not Big Ones

Model used a neural network based associative system.

Buzz words: non-linear, associative, dynamical system, attractor network.

The magnitude representation is built into the system by assuming there is a topographic map of magnitude somewhere in the brain.
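One way to see why a topographic magnitude map could produce a symbolic distance effect is to code each number as a bar of active positions on a map and measure how much two codes overlap. This is only an illustrative sketch: the map size, bar width, and the idea of using overlap as a stand-in for comparison difficulty are assumptions, not details of the model described in these slides.

```python
def magnitude_code(n, lo=0, hi=99, width=100, bar=10):
    """Set of active map positions coding the magnitude of n.

    Hypothetical coding: a bar of `bar` adjacent positions whose location on
    a `width`-position map varies linearly with n.
    """
    start = round((n - lo) / (hi - lo) * (width - bar))
    return set(range(start, start + bar))


def overlap(a, b):
    """Number of map positions shared by the codes for a and b."""
    return len(magnitude_code(a) & magnitude_code(b))


close_pair = overlap(73, 74)   # codes almost coincide: hard to tell apart
far_pair = overlap(17, 85)     # codes do not touch: easy to tell apart
```

Under this coding, numbers that are close in size have nearly identical representations, which is the qualitative pattern seen in the response time data.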

First Observation about Arithmetic Errors

Arithmetic fact errors are not random.

• Errors tend to be close in size to the correct answer.
• In the simulations, this effect is due to the presence of the magnitude code.

Second Observation: Error Values

Values of incorrect answers are not random.

• They are product numbers, that is, the answer to some multiplication problem.

• Only 8% of errors are not the answer to a multiplication problem.

Human Algorithm for Multiplication

The answer to a multiplication problem is:

1. Familiar (a product)
2. About the right size.

Human Algorithm for Multiplication

• Arithmetic fact learning is a memory and estimation process.

It is not really a computation!
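The "familiar and about the right size" idea can be caricatured in a few lines: take a rough size estimate and answer with the nearest number that appears somewhere in the times tables. This is a toy illustration only, not the attractor network model described in the slides; where the estimate comes from is left open, and ties are broken toward the smaller product as an arbitrary choice.

```python
# Every answer that appears somewhere in the single-digit times tables.
PRODUCTS = sorted({a * b for a in range(10) for b in range(10)})


def recall(estimate):
    """Answer with the familiar product closest in size to a rough estimate.

    Toy assumption: retrieval snaps to the nearest stored product, so wrong
    answers are still product numbers and still about the right size.
    """
    return min(PRODUCTS, key=lambda p: (abs(p - estimate), p))


answer = recall(57)            # 57 itself is not in the times tables
```

For an estimate of 57 the retrieved answer is 56 (7 x 8), a familiar product of about the right size.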

Flexible and programmable

Learning facts alone doesn’t get you far. The world never looks exactly like what you learned.

Heraclitus (500 BC):
• It is not possible to step twice into the same river.

A major goal of learning is to apply past learning to new situations.

Getting Correct What you Never Learned: Comparisons

Consider number comparisons:

Is 7 bigger than 9?

We can be sure that children do not learn number comparisons individually.

There are too many of them.
– About 100 single digit comparisons
– About 10,000 two-digit comparisons
– And so on.

Magnitude

• We now see the usefulness of the “sensory” magnitude representation.

• We can use magnitude to do computations like number comparisons without having to learn special cases.

• A generalization of the multiplication simulation did comparisons of number pairs it had never seen before. (Without further learning.)
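The contrast between memorising every comparison and reading the answer off a magnitude representation can be shown in a short sketch. The counts assume ordered pairs of the digits 0-9 and of the numbers 0-99; the position function is a hypothetical linear map standing in for the topographic magnitude code.

```python
def magnitude_position(n, lo=0, hi=99, width=100):
    """Hypothetical position of n on a topographic magnitude map."""
    return (n - lo) / (hi - lo) * (width - 1)


def is_bigger(a, b):
    """Compare two numbers by map position; no stored table of pairs needed."""
    return magnitude_position(a) > magnitude_position(b)


# Size of a lookup table that stores every comparison separately.
single_digit_pairs = 10 * 10         # ordered pairs of 0-9
two_digit_pairs = 100 * 100          # ordered pairs of 0-99

seven_vs_nine = is_bigger(7, 9)      # "Is 7 bigger than 9?"
```

The same two functions answer any pair in range, including pairs never encountered before.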

Implications

We have constructed a system that acts like logic or symbol processing but in a limited domain.

It does so by using its connection to perception to do much of the computation.

These “abstract” or “symbolic” operations display their underlying perceptual nature in effects like symbolic distance and error patterns in arithmetic.

Connect perception to abstraction and gain the power of each approach

• Humans are a hybrid computer.

• We have a recently evolved, rather buggy ability to handle abstract quantities and symbols.

• (Only 100,000 years old. We have the alpha release of the intelligence software.)

Connect perception to abstraction and gain the power of each approach

• We combine symbol processing with highly evolved, extremely effective sensory and perceptual systems.

• Realized in a mammalian neocortex.

• (Over 500 million years old. We have a late release, high version number of the perceptual software.)

• The two systems cooperate and work together effectively.

Conclusions

A hybrid strategy is biological:
– Let a new system complement an old one. Never throw anything away.
– Even a little abstract processing goes a long way. Perhaps that is one reason why our species has been so successful so fast.

Conclusions

Speculation:

Perhaps digital computers and humans (and brain-like computers??) are evolving toward a complementary relationship.

• Each computational style has its virtues:
– Humans (and brain-like computers): show flexibility, estimation, connection to the physical world
– Digital Computers: show speed, logic, accuracy.
• Both styles are valuable. There is a valuable place for both.