Time-constrained reasoning
Learn to be a split-second expert
Niels Taatgen and John Anderson
Carnegie Mellon University
Funded by grants from ONR (ACT-R), NWO (Set) and NASA (CMU-ASP)
Overview
• Time-constrained reasoning
• ACT-R essentials
• Set (a game)
• CMU-ASP (complex dynamic task)
Goals of the ACT-R project
• Explain as much as possible of cognition using a single theory
• The theory has to be explicated in a cognitive architecture, a simulation environment that mimics actual human behavior
• Single mechanisms should explain many phenomena
Skill Acquisition through Production Compilation
• People become faster at performing a certain task through learning
• The general theory is that they transform declarative into procedural knowledge
• Speed-up is explained by the fact that declarative knowledge no longer has to be retrieved
• A second aspect of becoming faster is that some things that had to be done serially can now be done in parallel, leading to qualitative changes in behavior
Time-constrained reasoning
• Initial knowledge
• Speed-up by production compilation
• Qualitative improvements in performance
• Embedded learning
• “Being fast” means: explaining why people can learn to be very fast at certain complex tasks (so the point is human speed, not what fast computers can solve)
Goals of the ACT-R project
• Model human behavior in order to:
  • Better understand human reasoning
  • Use models of human reasoning in:
    • HCI
    • Intelligent agents
    • Tutoring
• The simulation’s predictions are as fine-grained as possible:
  • reaction times
  • errors
  • choices
  • learning curves
  • eye movements
  • fMRI data
ACT-R’s foundations
• Rational Analysis: knowledge is treated as having a potential benefit for the agent (so ‘truth’ is not fundamental)
• Procedural/Declarative distinction
• Hybrid: both symbolic and subsymbolic
• Strong focus on learning
• Strong focus on interaction with the environment
Summary of ACT-R’s memory
• Declarative memory
  • contains facts and past goals (called chunks)
  • activation determines which chunk is selected
  • chunks that are retrieved or recreated often receive high activations
• Procedural memory
  • contains rules
  • utility determines which rule is selected
  • high utility = high probability of success and low costs (see the sketch below)
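
To make the utility notion concrete: a toy Python sketch of conflict resolution (not ACT-R’s actual implementation), using the classic expected-gain formulation U = P * G - C with a little noise; the rule names and numbers are invented.

import random

# Toy utility: probability of success P times goal value G, minus cost C,
# plus transient noise (all values invented for illustration).
def utility(p_success, goal_value, cost, noise_sd=0.5):
    return p_success * goal_value - cost + random.gauss(0.0, noise_sd)

rules = {
    "careful-check": (0.95, 20.0, 4.0),   # reliable but costly
    "quick-guess":   (0.60, 20.0, 1.0),   # cheap but error-prone
}

# The rule with the highest (noisy) utility wins the competition.
chosen = max(rules, key=lambda name: utility(*rules[name]))
print("selected rule:", chosen)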
Overview ACT-R in a Diagram
[Diagram: the ACT-R architecture. A central production system, Productions (Basal Ganglia), communicates through buffers with the modules: the Declarative Module (Temporal/Hippocampus) via the Retrieval Buffer (VLPFC), the Intentional Module (not identified) via the Goal Buffer (DLPFC), the Visual Module (Occipital/Parietal) via the Visual Buffer (Parietal), and the Manual Module (Motor/Cerebellum) via the Manual Buffer (Motor). The visual and manual modules interface with the External World. Within the production system: Matching (Striatum), Selection (Pallidum), Execution (Thalamus).]
ACT-R Cycle:
• Matching: production rules that match the current contents of the buffers are determined
• Selection: select the production rule with the highest utility
• Execution: the selected production rule modifies the contents of the buffers (see the sketch below)
Modules operate asynchronously from central cognition.
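
A minimal Python sketch of this match-select-execute cycle, with hypothetical names and representations (nothing like the real ACT-R code):

from dataclasses import dataclass
from typing import Callable, Dict, List

Buffers = Dict[str, dict]          # buffer name -> buffer contents

@dataclass
class Production:
    name: str
    utility: float
    condition: Callable[[Buffers], bool]   # does the rule match the buffers?
    action: Callable[[Buffers], None]      # how the rule modifies the buffers

def cycle(buffers: Buffers, productions: List[Production]) -> bool:
    matching = [p for p in productions if p.condition(buffers)]  # Matching
    if not matching:
        return False
    best = max(matching, key=lambda p: p.utility)                # Selection
    best.action(buffers)                                         # Execution
    return True

# Example: one production that reacts to a track in the visual buffer.
buffers = {"goal": {"state": "wait"}, "visual": {"object": "track"}}
attend = Production(
    name="attend-track",
    utility=10.0,
    condition=lambda b: b["visual"].get("object") == "track",
    action=lambda b: b["goal"].update(state="attend"),
)
cycle(buffers, [attend])
print(buffers["goal"])   # {'state': 'attend'}

In the real architecture the modules run asynchronously alongside this central loop; the sketch only captures the serial match-select-execute cycle.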
Production Compilation:
Learning new production rules
• Combine two existing rules that are used in sequence into a new rule, while substituting a fact that is retrieved from declarative memory (see the sketch below)
• Solidify recurring reasoning patterns
[Diagram: Rule 1, Rule 2 and a fact from declarative memory are combined into a single new rule (Rule1 & Rule2); the fact thereby moves from declarative to procedural knowledge.]
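
A toy Python illustration of the idea, with invented rule and fact representations (the real mechanism operates on ACT-R productions): the compiled rule has the retrieved fact baked in, so the retrieval step disappears.

facts = {("third-value", "red", "blue"): "green"}   # toy declarative memory

def rule1(goal):
    # Rule 1: IF the goal holds two values THEN retrieve the value that
    # completes them (modeled here as a direct lookup).
    goal["retrieval"] = facts[("third-value", goal["v1"], goal["v2"])]

def rule2(goal):
    # Rule 2: IF a value has been retrieved THEN put it in the goal.
    goal["answer"] = goal.pop("retrieval")

def compile_rules(v1, v2):
    # Production compilation: combine rule1 and rule2 into one rule,
    # substituting the fact, so no retrieval is needed next time.
    fact = facts[("third-value", v1, v2)]
    def compiled(goal):
        if goal["v1"] == v1 and goal["v2"] == v2:
            goal["answer"] = fact
    return compiled

novice_goal = {"v1": "red", "v2": "blue"}
rule1(novice_goal); rule2(novice_goal)    # two firings plus a retrieval
expert_rule = compile_rules("red", "blue")
expert_goal = {"v1": "red", "v2": "blue"}
expert_rule(expert_goal)                  # one specialized firing
print(novice_goal["answer"], expert_goal["answer"])   # green green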
Set
• Game
• Predictions
• Experiment
• Model
• Application
• Evaluation
With Marcia van Oploo, Jos Braaksma and Jelle Niemantsverdriet
Computer Games
• Computer chess: design a program that plays chess as well as possible
• But chess players complain that computers play boring chess
• Different goal: opponents that play as humanly as possible
The Game of Set!
• The game consists of 81 cards
• Each card has four attributes: color, shape, filling and number
Goal of the Game
• Twelve cards are put on the table
• Find a Set: three cards in which, for each attribute, the attribute values of the cards are all different, or all the same (see the sketch below)
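
Because the rule is fully formal, checking a Set is easy to write down; a short Python sketch, with an invented card encoding:

ATTRIBUTES = ("color", "shape", "filling", "number")

def is_set(cards):
    # Three cards form a Set when every attribute is either all the same
    # (one distinct value) or all different (three distinct values).
    return all(len({card[attr] for card in cards}) in (1, 3)
               for attr in ATTRIBUTES)

cards = [
    {"color": "red",   "shape": "oval",    "filling": "solid",  "number": 1},
    {"color": "green", "shape": "diamond", "filling": "open",   "number": 2},
    {"color": "blue",  "shape": "wave",    "filling": "shaded", "number": 3},
]
print(is_set(cards))   # True: every attribute is all-different

# With 12 cards on the table there are only 220 triples to try
# (itertools.combinations(cards, 3)), which is why the game is
# trivial for a computer.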
Goal of the Game
[Example: three cards that are Not a Set!]
Some Sets are more difficult than others
[Example: twelve cards (numbered 1-12) containing Sets of increasing difficulty:]
• One attribute different
• Two attributes different
• Three attributes different
• Four attributes different
Set! as a game to play against the computer
• For a computer, the game is trivial (as opposed to chess, etc.)
• The challenge is to program an opponent that acts like a human player
• So the computer opponent has to be fast at sets that people are fast at, and slow at sets that people are slow at
The Predictions
1. The “easy” sets will be found faster than the “hard” sets.
2. Experts at the game will mainly excel at finding the hard sets, and will be approximately as good as beginners on the easy sets.
The Experiment
• 8 subjects: 4 beginners, 4 experts
• 20 Set problems (12 cards, find the Set as fast as possible)
• 5 problems at each of the four levels of difficulty
Experimental results
[Figure: solution time (sec) as a function of difficulty (number of attributes different, 1-4), for beginner and expert data.]
The experiment confirms both hypotheses:
1. Difficult problems take longer.
2. Beginners and experts are equally good at easy problems, but experts excel on hard problems.
The model
• Why do hard sets take longer?
  • Checking for unequal attributes takes longer
• Why are experts better at hard problems?
  • They are better at multitasking, which pays off in hard problems
How the model works
[Buffers: Goal holds card 3; Visual holds card 3.]
• First, pick a random card and stick it in the goal buffer
How the model works
[Buffers: Goal holds card 3; Visual holds card 2.]
• Second, search for a card of the same color. If this fails, search for an arbitrary different card.
• We don’t put this card in the goal, but leave it in the visual buffer.
How the model works
[Buffers: Goal holds card 3; Visual holds card 2. Retrieval buffer: have we tried these two cards before? Declarative memory is busy retrieving the fact.]
Now the model is going to do two things in parallel:
• Check in declarative memory whether or not we tried this combination of two cards before
• Make a prediction of what the third card has to look like
How the model works
[Buffers: Goal holds card 3; Visual holds card 2. Declarative memory is still busy with the “tried before?” retrieval.]
Predicting the third card:
• For each attribute, we determine what it has to be like in the third card, and put this back into the goal
• When the attribute values for goal and visual are equal, this value is also the desired value for the new card
How the model works
[Buffers: Goal holds card 3 with the partial prediction (card 1); Visual holds card 2. Declarative memory is still busy retrieving.]
Predicting the third card:
• When the attribute values are different, we have to determine the third value
How the model works
[Buffers: Goal holds the predicted third card.]
After predicting the third card (the prediction step is sketched below):
• Now that the third card has been predicted, the model tries to find it on the screen
• If this fails, it starts all over again
• If this succeeds, it announces it has found a Set!
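
The prediction step just described can be written out in Python (an invented helper, not the model’s actual productions): if the two cards agree on an attribute, the third card must share that value; if they differ, it must take the one remaining value.

VALUES = {
    "color":   {"red", "green", "blue"},
    "shape":   {"oval", "diamond", "wave"},
    "filling": {"solid", "open", "shaded"},
    "number":  {1, 2, 3},
}

def predict_third(goal_card, visual_card):
    third = {}
    for attr, values in VALUES.items():
        if goal_card[attr] == visual_card[attr]:
            # all the same: keep the shared value
            third[attr] = goal_card[attr]
        else:
            # all different: exactly one value remains
            (third[attr],) = values - {goal_card[attr], visual_card[attr]}
    return third

c1 = {"color": "red", "shape": "oval",    "filling": "solid", "number": 1}
c2 = {"color": "red", "shape": "diamond", "filling": "open",  "number": 1}
print(predict_third(c1, c2))
# {'color': 'red', 'shape': 'wave', 'filling': 'shaded', 'number': 1}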
Beginners vs. Experts
Beginner:
IF the value in the goal is val1 and the value in the visual buffer is val2
THEN send a retrieval request for a value that is different from val1 and val2

IF the retrieval buffer contains val3, different from val1 and val2
THEN put val3 in the goal

Expert:
IF the value in the goal is red and the value in the visual buffer is blue
THEN put green in the goal
How the model works
[Buffers: Goal holds card 3; Visual holds card 2. Retrieval buffer: have we tried these two cards before? A second request, “what is the third value?”, is pending while declarative memory is busy retrieving the fact.]
Beginners:
• At the moment that (in this example) the filling has to be determined, a declarative retrieval is needed
• This is however impossible, because declarative memory is still engaged in another retrieval!
• So the beginner has to wait until the first declarative retrieval is done
Experts:
• Have proceduralized the retrieval of the third attribute value (see the timing sketch below)
[Timeline: the novice does nothing but wait for the retrieval before the third card is predicted and sought; the expert predicts and seeks the third card right away.]
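
A toy timing sketch of this explanation, with invented durations: the “tried before?” check occupies declarative memory, so every further retrieval the beginner needs has to queue behind it, while the expert’s compiled rules run in parallel with the check.

RETRIEVAL = 1.0   # invented time per declarative retrieval (sec)
RULE = 0.05       # invented time per compiled rule firing (sec)

def prediction_time(diff_attrs, expert):
    check = RETRIEVAL   # "have we tried these two cards before?"
    if expert:
        # Differing attributes are handled by compiled rules, which can
        # run while the memory check proceeds.
        return max(check, diff_attrs * RULE)
    # The beginner needs one retrieval per differing attribute, each
    # queued behind the check on the single declarative resource.
    return check + diff_attrs * RETRIEVAL

for d in range(1, 5):
    print(d, prediction_time(d, expert=False), prediction_time(d, expert=True))
# Beginner times grow with difficulty; expert times barely move.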
Results of the Model
[Figure: time (sec) as a function of difficulty (number of attributes different, 1-4); the beginner and expert models are plotted against the beginner and expert data.]
Conclusion (intermediate)
• Experts are not just faster than beginners
• They can do certain reasoning steps effortlessly that are effortful for novices
• In this task the beginners can still do it, but one can imagine tasks where you have to be a (partial) expert to be able to do it at all
The Application
CMU-ASP task
• Subjects have to classify planes (tracks) on a radar screen
• They have to do three things to classify a track:
  • Select one by clicking on it
  • Use one of two classification methods, each of which is sometimes successful and sometimes not
  • Enter the classification into the system
[Screenshot annotations: Clicking on a track selects it. One classification method is to look at altitude and speed. Another is to ask for a radar signature by pressing some keys and waiting for the information to come up. Finally, a series of keypresses is used to enter the information.]
General modeling approach
• Represent instructions as declarative knowledge
• Have task-general production rules that interpret these instructions
• Production compilation produces task-specific rules (see the sketch below)
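
A minimal Python sketch of this approach, with an invented instruction encoding: the instructions are data, and one task-general interpreter carries them out step by step. Production compilation would then fold each retrieved step into a task-specific rule, as in the learned rules shown later.

instructions = [              # declarative representation of the task
    ("look-for", "track"),
    ("hook", "track"),
    ("identify", "track"),
]

actions = {                   # task-general primitive operations
    "look-for": lambda arg: print("searching the screen for a", arg),
    "hook":     lambda arg: print("clicking on the", arg),
    "identify": lambda arg: print("classifying the", arg),
}

def interpret(instrs):
    # Each cycle: retrieve the next instruction clause, then execute it.
    for relation, arg in instrs:
        actions[relation](arg)

interpret(instructions)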
Representation of instructions (Anderson)
[Diagram: a strict hierarchy of instructions with “Identify Tracks” at the top, decomposing into steps such as: Look for a track, Hook track (Hand to F-keys), Id it (Look at alt & speed, or Do Radar: Select “EWS”, Select “Query”), Encode, Classify, Repeat.]
This model misses many interesting phenomena
• Moving the hands from the mouse to the keyboard and vice versa ahead of time
• Deciding which classification strategy to use first
• Comparing tracks before selecting one, without using too much time
• To summarize: optimizing the parallelization of behavior
From a strong to a weak hierarchy
[Diagram: the same instructions as a weak hierarchy. Steps such as Select Track, Look for Track, Hook Track, Check alt/speed, Use Radar Id, Enter Classification, Classify and Repeat now decompose into smaller operations (look for alt, look for speed, check range, read Air ID, find the appropriate key, recall the classification, hand to F-keys, Select “EWS”, Select “Query”) that are no longer strictly ordered under the top-level “Identify Tracks” goal.]
Instructions involve multiple steps
Look at a track:
1. Find a visual-location of type track
2. Move attention to it
3. Store the location in the goal
Hook a track:
1. Move the hand to the mouse
2. Move the mouse to the location of the track
3. Click the mouse
Carrying out these steps in order is inefficient!
Example productions

(p retrieve-next-clause
   =goal>
      isa task
      rule =id
      step done
   -retrieval>
==>
   +retrieval>
      isa clause
      rule =id)

IF the goal is a task
THEN retrieve the next instruction
Note: spreading activation will ensure the right instruction is retrieved
Example productions

(p look-for-visual-track
   =goal>
      isa task
      step done
   =retrieval>
      isa clause
      relation look-for
      arg2 =type
   =visual-location>
      isa visual-location
      kind =type
   -visual>
==>
   +visual>
      isa visual-object
      screen-pos =visual-location
   -retrieval>)

IF the goal is a task
AND an instruction is retrieved to look for something of a certain type
AND we have found a location on the screen with something of that type
THEN move the eyes to the location and attend it
Example productions

(p hook-new-mouse-not-at-destination
   =goal>
      isa task
      var1 =object
      step done
   =retrieval>
      isa clause
      relation hook-new
      arg1 var1
   =manual-state>
      isa module-state
      modality free
   !eval! (not (cursor-at =object))
==>
   -retrieval>
   +manual>
      isa move-cursor
      object =object)

IF the goal is a task
AND an instruction has been retrieved that specifies that an object has to be clicked
AND the motor module is available
AND the cursor is not on the object
THEN move the mouse to the object
Additional rule
• If the visual system finds another track while one is already in the goal, then it compares the new track to the old track and retains the best
Novice
[Timeline: attend a track, move the hand to the mouse, move the mouse to the track, click the mouse, carried out one after another.]
Learned rules

(p Production1013
   =goal>
      isa TASK
      rule Find-Track
      step Done
   =manual-state>
      isa MODULE-STATE
      modality Free
   !eval! (hand-not-on-mouse)
==>
   +manual>
      isa HAND-TO-MOUSE)

IF the goal is to find a track
AND the motor module is free
AND the hand is not on the mouse
THEN move the hand to the mouse
Learned rules

(p Production544
   =goal>
      isa TASK
      rule Find-Track
      step Done
   =visual-location>
      isa VISUAL-LOCATION
      kind Square-Track
   =visual-state>
      isa MODULE-STATE
      modality Free
   -visual>
==>
   +visual>
      isa VISUAL-OBJECT
      screen-pos =visual-location)

IF the goal is to find a track
AND a visual location of type track has been found
AND it is not attended
THEN attend the track
Expert
[Timeline: the same actions, now overlapping in parallel.]
Other changes in the model’s behavior
• The model sometimes opts for doing a radar sweep first and then looking at the altitude and speed (subjects do too)
• The model sometimes prefers using the left hand for keying and the right hand for mousing
From strong to weak task hierarchy
• Production compilation not only speeds up performance, but also leads to qualitative changes in behavior
• Bottom-up behavior alongside top-down behavior
• Is this real reasoning? Maybe reasoning with a small “r”, but nevertheless crucial to understanding the flexibility of human learning
Related work
• Mike Freed: APEX (but no learning)
• Ron Chong: EPIC-Soar
• Kieras and Meyer are working on it (EPIC), but they will have to incorporate learning into EPIC
• Rick Lewis uses an APEX-like approach in ACT-R
Three models
1. Wait 35 sec in the expert setting and 45 sec in the beginner setting
2. Based on the model just discussed
3. The model plus two additional strategies
Usability test
[Figure: mean ratings on a 1-5 scale for Model 1, Model 2 and Model 3 on three questions: Acts like a human? Challenging opponent? Difference between Beginner and Expert?]