Transcript Slide 1

Curious Characters: From Virtual Worlds to Sentient Homes
Dr. Kathryn Merrick
University of New South Wales
Australian Defence Force Academy
School of Engineering and Information Technology
[email protected]
November, 2009
Machine Learning and Developmental Robotics Research Group
Overview
■ Motivated reinforcement learning
■ Curious characters in multiuser games
■ Curious robots
■ Motivated supervised learning
■ Curious places
■ Motivated reflex agents
■ Curious network security agents
■ Future directions
Curious Characters in Multiuser Games
with Prof. Mary Lou Maher, National Science Foundation
World of Warcraft (Blizzard)
Motivation Theories
Alderfer, C., 1972. Existence, relatedness and growth. Free Press, New York.
Atkinson, J.W. and Feather, N.T., 1966. A theory of achievement motivation. Wiley, New York.
Berlyne, D.E., 1966. Exploration and curiosity. Science, 153: 25-33.
Berlyne, D.E., 1970. Novelty, complexity and hedonic value. Perception and Psychophysics, 8: 279-286.
Bindra, D., 1974. A unified account of classical conditioning and operant training. In: A. Black and W. Prokasy (Editors), Classical conditioning - current theory and research. Appleton-Century-Crofts, New York, USA, pp. 453-481.
Csikszentmihalyi, M., 1996. Creativity: Flow and the Psychology of Discovery and Invention. HarperCollins Publisher, New York, NY.
Deci, E. and Ryan, R., 1985. Intrinsic motivation and self-determination in human behaviour. Plenum Press, New York.
Easterbrook, J.A., 1959. The effect of emotion on cue utilisation and the organisation of behaviour. Psychological Review, 66: 183-201.
Geen, R.G., Beatty, W.W. and Arkin, R.M., 1984. Human motivation: physiological, behavioural and social approaches. Allyn and Bacon, Inc, Massachusetts.
Heider, F., 1958. The psychology of interpersonal relations. Wiley, New York. [Attribution theory]
Hull, C.L., 1952. A behaviour system: an introduction to behaviour theory concerning the individual organism. Yale University Press, New Haven. [Drive theory]
Hunt, J.M., 1975. Implications of sequential order and hierarchy in early psychological development. Exceptional Infant, 3.
Kandel, E.R., Schwartz, J.H. and Jessell, T.M., 1995. Essentials of neural science and behaviour. Appleton and Lange, Norwalk.
Maslow, A., 1954. Motivation and personality. Harper, New York. [Maslow’s hierarchy of needs]
McFarland, D., 1995. Animal behavior. Longman, England. [Motivational state theory]
Mook, D.G., 1987. Motivation: the organisation of action. W. W. Norton and Company, Inc, New York.
Raynor, J.O., 1969. Future orientation and motivation of immediate activity: an elaboration of the theory of achievement motivation. Psychological Review, 76: 606-610.
Sperber, D. and Wilson, D., 1995. Relevance: communication and cognition. Blackwell Publishing.
Tolman, E.C., 1932. Purposive behaviour in animals and men. Century, New York.
White, R.W., 1959. Motivation reconsidered: The concept of competence. Psychological Review, 66: 297-333.
Wundt, W., 1910. Principles of physiological psychology. Macmillan, New York.
“Problem Finding” plus Problem Solving (Saunders, 2001)
Motivated Reinforcement Learning
■ Learning from trial-and-error and intrinsic reward
■ Structures:
■ S(t) – Raw sensor data
■ O(t) – Observation of state
■ E(t) – Event (change)
■ M(t) – Motivated reward value
■ B(t) – Learned policy
■ A(t) – Action to execute
[Figure: agent architecture. ENVIRONMENT → Sensors → S(t) → Sensation → O(t), E(t) → Motivation → M(t) → Learning → B(t) → Activation → A(t) → Effectors → ENVIRONMENT]
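The sensation, motivation, learning and activation stages listed on this slide can be sketched as a small agent loop. This is a minimal illustration only, not the implementation behind the talk: the class name `MotivatedRLAgent`, the tabular Q-learning update, the epsilon-greedy action choice and the count-based novelty reward are all assumptions chosen to keep the example short.

```python
import random
from collections import defaultdict


class MotivatedRLAgent:
    """Toy agent following the S(t) -> O(t), E(t) -> M(t) -> B(t) -> A(t) flow."""

    def __init__(self, actions, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)           # B(t): policy stored as a Q-table (assumption)
        self.event_counts = defaultdict(int)  # memory used by the motivation step
        self.prev_obs = None
        self.prev_action = None
        self.rng = random.Random(seed)

    def sense(self, raw):
        """Sensation: turn raw sensor data S(t) into a hashable observation O(t)."""
        return tuple(sorted(raw.items()))

    def event(self, obs):
        """E(t): the attribute values that changed since the previous observation."""
        if self.prev_obs is None:
            return ()
        return tuple(sorted(set(obs) - set(self.prev_obs)))

    def motivate(self, event):
        """M(t): assumed novelty reward that decays each time an event repeats."""
        if not event:
            return 0.0
        self.event_counts[event] += 1
        return 1.0 / self.event_counts[event]

    def act(self, obs):
        """Activation: epsilon-greedy choice of A(t) from the Q-table."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(obs, a)])

    def step(self, raw):
        """One sense-motivate-learn-act cycle; the reward is purely intrinsic."""
        obs = self.sense(raw)
        reward = self.motivate(self.event(obs))
        if self.prev_obs is not None:
            best_next = max(self.q[(obs, a)] for a in self.actions)
            key = (self.prev_obs, self.prev_action)
            self.q[key] += self.alpha * (reward + self.gamma * best_next - self.q[key])
        action = self.act(obs)
        self.prev_obs, self.prev_action = obs, action
        return action
```

Calling `step` repeatedly with dictionaries of sensor readings drives the loop. No external reward is passed in, which is the point of the sketch: the only reward signal is the one the motivation step computes from events.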
Motivation as Curiosity
SOM, K-means, SART network, etc
C(t) =
(Wundt, 1910; Berlyne, 1960; 1966; Stanley, 1976; Schmidhuber, 1991; Marsland et al., 2000; Saunders, 2001)
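The formula for C(t) did not survive in this transcript. The citations to Wundt and Berlyne suggest an inverted-U relationship between novelty and interest, often called the Wundt curve. One common way to model such a curve, given here purely as an assumption and not as the definition used in the talk, is the difference of two sigmoids: one rewarding novelty and one punishing too much of it. All parameter names and default values below are hypothetical.

```python
import math


def sigmoid(x, maximum, slope, midpoint):
    """Logistic curve rising from 0 to `maximum`, centred on `midpoint`."""
    return maximum / (1.0 + math.exp(-slope * (x - midpoint)))


def curiosity(novelty,
              max_reward=1.0, max_punish=1.0,
              slope_reward=10.0, slope_punish=10.0,
              mid_reward=0.25, mid_punish=0.75):
    """Assumed Wundt-style curiosity: reward for novelty minus punishment for excess novelty."""
    reward = sigmoid(novelty, max_reward, slope_reward, mid_reward)
    punish = sigmoid(novelty, max_punish, slope_punish, mid_punish)
    return reward - punish
```

With these defaults, moderately novel stimuli (novelty near 0.5) score highest, while very familiar and very unfamiliar stimuli both score low. The novelty input would come from a clustering method such as those named on the slide (SOM, K-means, SART network).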
A Game Using MRL
Statistical Evaluation
■ Identifying learned tasks K as repeated events or observations
$$cv(K) = \frac{\sqrt{\frac{1}{h-1}\sum_{i=1}^{h}(a_i - \bar{a}_K)^2}}{\bar{a}_K}$$
■ Behavioural variety:
■ Number of learned tasks
■ Behavioural complexity:
■ Number of actions to complete task
[Figure: Maximum Behavioural Complexity; categories ADAPT_COMPETENCE and RECALL_COMPETENCE]
[Figure: Behavioural Variety versus Time (0 to 100000); labels: Interest, +Competence]
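A short sketch of how cv(K) could be used to decide whether a task K counts as learned. It assumes cv(K) is the coefficient of variation (sample standard deviation divided by the mean) of a_i, taken here to be the number of actions in the i-th of the last h repetitions of K. The function names, the window size `h=5` and the threshold `0.2` are hypothetical values for illustration.

```python
import statistics


def coefficient_of_variation(action_counts):
    """Sample standard deviation of the action counts divided by their mean."""
    if len(action_counts) < 2:
        raise ValueError("need at least two repetitions")
    return statistics.stdev(action_counts) / statistics.mean(action_counts)


def is_learned(action_counts, h=5, threshold=0.2):
    """Treat task K as learned once its last h repetitions are consistently sized."""
    recent = action_counts[-h:]
    if len(recent) < h:
        return False  # not enough repetitions observed yet
    return coefficient_of_variation(recent) < threshold
```

Under these assumptions, behavioural variety would be the count of tasks for which `is_learned` returns true, and behavioural complexity the mean of the recent action counts for such a task.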
Curious Reconfigurable Robots
with A/Prof. Elanor Huntington, Tom Scully, UNSW@ADFA
■ Function approximation and MRL
■ Neural networks
■ Adaptive resonance theory networks
■ Modelling behaviour cycles for:
■ Motivation
■ Evaluation
■ A toy that ‘comes alive’ as it is being constructed
Motivation and Behaviour Cycles
(Ahlgren and Halberg, 1990)
■ Biological
■ Cognitive
■ Social
Kolb learning cycle
(Marsland et al., 2000)
Socio-demographic cycles
Evaluation and Visualisation
S(226) = (tacho:139.0, mov:100.0, red:0.0, green:80.0, blue:0.0)
Merrick, K.: (2009) Evaluating Intrinsically Motivated Robots using Affordances and PointCloud Matrices, The Ninth International Conference on Epigenetic Robotics (EpiRob09), Venice, Italy, pp 105-112
Numerical Evaluation
■ Behavioural
■ Variety
■ Complexity
■ Stability
[Figure: Median cycle length for Q-MRL, NN-MRL and SART-MRL across Snail, Bee, Cricket and Ant]
[Figure: Stability for Q-MRL, NN-MRL and SART-MRL across Snail, Bee, Cricket and Ant]
Merrick, K.: (2010) Modeling Behavior Cycles as a Value System for Developmental Robots, Adaptive Behavior (to appear)
Curious Places
with Prof. Mary Lou Maher and Dr. Rob Saunders, University of Sydney
■ Intelligent environments that adapt to support and enhance human activities by being curious about, and learning about, those activities.
■ Consider the space as an immobile robot
Motivated Supervised Learning
■ Learning intrinsically motivated tasks by mimicking
■ Structures:
■ S(t) – Raw sensor data
■ O(t) – Observation of state
■ E(t) – Event (change)
■ X(t) – Example (state+action)
■ B(t) – Learned policy
■ A(t) – Action to execute
[Figure: agent architecture. ENVIRONMENT → Sensors → S(t) → Sensation → O(t), X(t), E(t) → Motivation → O(t), X(t) → Learning → B(t) → Activation → A(t) → Effectors → ENVIRONMENT; a separate O(t) link also appears in the diagram]
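A rough sketch of learning by mimicking, under stated assumptions: the agent watches examples X(t), each a state paired with the action a human took, and keeps only those whose triggering event E(t) is still interesting. The class name `MimicAgent`, the count-based interest measure and the majority-vote policy are hypothetical simplifications, not the model from the talk.

```python
from collections import Counter, defaultdict


class MimicAgent:
    """Learns a state -> action policy B(t) from motivated examples X(t)."""

    def __init__(self, interest_threshold=0.3):
        self.interest_threshold = interest_threshold
        self.event_counts = Counter()
        self.examples = defaultdict(Counter)  # state -> counts of observed human actions

    def interest(self, event):
        """Assumed motivation: novelty that decays as the same event repeats."""
        self.event_counts[event] += 1
        return 1.0 / self.event_counts[event]

    def observe(self, state, human_action, event):
        """Store the example X(t) only while its triggering event is interesting."""
        if self.interest(event) >= self.interest_threshold:
            self.examples[state][human_action] += 1

    def act(self, state):
        """Mimic the most frequently observed human action, or return None."""
        if state not in self.examples:
            return None
        return self.examples[state].most_common(1)[0][0]
```

For instance, after `observe("dark_room", "light_on", "enter")` has been called a couple of times, `act("dark_room")` returns `"light_on"`, so the environment can take over an action the human previously performed.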
Evaluation and Case Studies
[Figure: Number of Actions Performed By Human versus Time (days), Mon to Fri; series: Traditional environment, Intelligent environment]
Merrick, K., Shafi, K.: (2009) Agent Models for Self-Motivated Home Assistant Bots, International Symposium on Computational Models for Life Sciences, Sofia, Bulgaria [invited paper] (to appear).
Curious Network Security Agents
with Dr. Kamran Shafi, UNSW@ADFA
■ Curious agents combine three measures to analyse stimuli (network data):
■ Similarity: clustering layer
■ Recency: habituating layer
■ Frequency: interest layer
■ Online, single-pass learners:
■ Potential for real-time operation
■ Unsupervised learners:
■ Potential to adapt to changes in network usage
■ Don’t require labelled data
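The three layers named above can be illustrated with a small online, single-pass, unsupervised scorer. This is a sketch under assumptions, not the agents evaluated in the talk: the nearest-centroid clustering, the multiplicative habituation and recovery rule, the inverted-U interest function, and every parameter name and default are hypothetical.

```python
import math


class CuriousDetector:
    """Single-pass, unsupervised scorer for numeric connection records."""

    def __init__(self, radius=1.0, learning_rate=0.1, decay=0.5, recovery=0.05):
        self.radius = radius
        self.learning_rate = learning_rate
        self.decay = decay
        self.recovery = recovery
        self.centroids = []  # clustering layer (similarity)
        self.novelty = []    # habituating layer (recency), one value per cluster

    def _nearest(self, record):
        """Return the index of and distance to the closest centroid."""
        best, best_dist = None, math.inf
        for i, centroid in enumerate(self.centroids):
            d = math.dist(centroid, record)
            if d < best_dist:
                best, best_dist = i, d
        return best, best_dist

    @staticmethod
    def interest(novelty):
        """Interest layer: assumed inverted-U response peaking at novelty 0.5."""
        return 4.0 * novelty * (1.0 - novelty)

    def sense(self, record):
        """Process one record and return (cluster index, novelty, interest)."""
        winner, dist = self._nearest(record)
        if winner is None or dist > self.radius:
            # No similar cluster yet: create one with maximal novelty.
            self.centroids.append(list(record))
            self.novelty.append(1.0)
            winner = len(self.centroids) - 1
        else:
            # Move the winning centroid a little towards the record.
            centroid = self.centroids[winner]
            for j, x in enumerate(record):
                centroid[j] += self.learning_rate * (x - centroid[j])
        # Clusters that did not win slowly recover their novelty.
        for i in range(len(self.novelty)):
            if i != winner:
                self.novelty[i] = min(1.0, self.novelty[i] + self.recovery)
        score = self.novelty[winner]
        self.novelty[winner] = score * self.decay  # the winner habituates
        return winner, score, self.interest(score)
```

Each record is seen once and no labels are needed, which matches the online and unsupervised properties listed on the slide. A deployment would additionally need a rule for turning novelty or interest values into alerts.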
Motivated Reflex Agents
■ Triggering intrinsically motivated reflexes
■ Structures:
■ S(t) – Raw sensor data
■ O(t) – Observation of state
■ E(t) – Event (change)
■ M(t) – Motivation value
■ A(t) – Action to execute
[Figure: agent architecture. ENVIRONMENT → Sensors → S(t) → Sensation → O(t), E(t) → Motivation → M(t) → Activation → A(t) → Effectors → ENVIRONMENT]
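Unlike the learning agents, a motivated reflex agent has no learned policy B(t): the motivation value M(t) directly triggers a fixed response. The sketch below is one possible reading of that structure. The class name, the threshold rule and the mapping from attributes to actions are hypothetical and serve only to illustrate the idea.

```python
class MotivatedReflexAgent:
    """Triggers a fixed reflex when an event's motivation value is high enough."""

    def __init__(self, motivation, reflexes, threshold=0.5):
        self.motivation = motivation  # callable: (attribute, value) -> M(t)
        self.reflexes = reflexes      # dict: attribute name -> action
        self.threshold = threshold
        self.prev_obs = {}

    def step(self, raw):
        """Return the action A(t) for sensor data S(t), or None if nothing triggers."""
        obs = dict(raw)  # Sensation: O(t) is simply a copy of S(t) in this sketch
        # E(t): attributes whose value changed since the previous observation.
        event = {k: v for k, v in obs.items() if self.prev_obs.get(k) != v}
        self.prev_obs = obs
        best_action, best_m = None, self.threshold
        for attr, value in event.items():
            m = self.motivation(attr, value)
            if m >= best_m and attr in self.reflexes:
                best_action, best_m = self.reflexes[attr], m
        return best_action
```

A usage example with an invented motivation function: `MotivatedReflexAgent(lambda attr, value: 1.0 if attr == "artwork" else 0.1, {"artwork": "approach", "wall": "turn"})` returns `"approach"` the first time an artwork appears in the sensor data and `None` while nothing changes.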
Domain Specific Evaluation
[Figure: five panels titled Probe, User to Root (U2R), Remote to Local (R2L), Denial of Service (DOS) and Normal Data. Axes: True positive rate (False positive rate for Normal Data) versus Time (Connection records sensed), 0 to 800000; series: Observations, Events]
A Curious Tour Guide Robot
with Dayne Schmidt, UNSW@ADFA
■ Identifies interesting ‘artworks’
■ Reflexively moves towards them while avoiding obstacles
Machine Learning and
Developmental Robotics
Lab, UNSW@ADFA
(Saunders, 2001)
Future Directions
■ Modelling motivation
■ Other individual models: biological, cognitive, social…
■ Unified models
■ Learning models for use with motivation
■ Recall and reuse: hierarchical models
■ Other forms of learning, planning…
■ Combined models
■ Evaluating intrinsically motivated behaviour
Questions and Discussion
Dr. Kathryn Merrick
[email protected]
http://www.itee.adfa.edu.au/~s3229187
Lecturer in Information Systems
University of New South Wales
Australian Defence Force Academy
School of Engineering and Information Technology