Transcript Slide 1
Curious Characters: From Virtual Worlds to Sentient Homes
Dr. Kathryn Merrick
University of New South Wales
Australian Defence Force Academy
School of Engineering and Information Technology
[email protected]
November, 2009
Machine Learning and Developmental Robotics Research Group

Overview
■ Motivated reinforcement learning
  ■ Curious characters in multiuser games
  ■ Curious robots
■ Motivated supervised learning
  ■ Curious places
■ Motivated reflex agents
  ■ Curious network security agents
■ Future directions

Curious Characters in Multiuser Games
with Prof. Mary Lou Maher, National Science Foundation
World of Warcraft (Blizzard)

Motivation Theories
Alderfer, C., 1972. Existence, relatedness and growth. Free Press, New York.
Atkinson, J.W. and Feather, N.T., 1966. A theory of achievement motivation. Wiley, New York.
Berlyne, D.E., 1966. Exploration and curiosity. Science, 153: 25-33.
Berlyne, D.E., 1970. Novelty, complexity and hedonic value. Perception and Psychophysics, 8: 279-286.
Bindra, D., 1974. A unified account of classical conditioning and operant training. In: A. Black and W. Prokasy (Editors), Classical conditioning - current theory and research. Appleton-Century-Crofts, New York, USA, pp. 453-481.
Csikszentmihalyi, M., 1996. Creativity: Flow and the Psychology of Discovery and Invention. HarperCollins Publisher, New York, NY.
Deci, E. and Ryan, R., 1985. Intrinsic motivation and self-determination in human behaviour. Plenum Press, New York.
Easterbrook, J.A., 1959. The effect of emotion on cue utilisation and the organisation of behaviour. Psychological Review, 66: 183-201.
Geen, R.G., Beatty, W.W. and Arkin, R.M., 1984. Human motivation: physiological, behavioural and social approaches. Allyn and Bacon, Inc, Massachusetts.
Heider, F., 1958. The psychology of interpersonal relations. Wiley, New York. (Attribution theory)
Hull, C.L., 1952. A behaviour system: an introduction to behaviour theory concerning the individual organism.
Yale University Press, New Haven. (Drive theory)
Hunt, J.M., 1975. Implications of sequential order and hierarchy in early psychological development. Exceptional Infant, 3.
Kandel, E.R., Schwarz, J.H. and Jessell, T.M., 1995. Essentials of neural science and behaviour. Appleton and Lang, Norwalk.
Maslow, A., 1954. Motivation and personality. Harper, New York. (Maslow’s hierarchy of needs)
McFarland, D., 1995. Animal behavior. Longman, England. (Motivational state theory)
Mook, D.G., 1987. Motivation: the organisation of action. W. W. Norton and Company, Inc, New York.
Raynor, J.O., 1969. Future orientation and motivation of immediate activity: an elaboration of the theory of achievement motivation. Psychological Review, 76: 606-610.
Sperber, D. and Wilson, D., 1995. Relevance: communication and cognition. Blackwell Publishing.
Tolman, E.C., 1932. Purposive behaviour in animals and men. Century, New York.
White, R.W., 1959. Motivation reconsidered: The concept of competence. Psychological Review, 66: 297-333.
Wundt, W., 1910. Principles of physiological psychology. Macmillan, New York.

“Problem Finding” plus Problem Solving (Saunders, 2001)

Motivated Reinforcement Learning
■ Learning from trial-and-error and intrinsic reward
■ Structures:
  ■ S(t) – Raw sensor data
  ■ O(t) – Observation of state
  ■ E(t) – Event (change)
  ■ M(t) – Motivated reward value
  ■ B(t) – Learned policy
  ■ A(t) – Action to execute
[Diagram: AGENT within ENVIRONMENT. Sensors → S(t) → Sensation → O(t), E(t) → Motivation → M(t) → Learning → B(t) → Activation → A(t) → Effectors]

Motivation as Curiosity
SOM, K-means, SART network, etc
C(t) = [formula not preserved in the transcript]
(Wundt, 1910; Berlyne, 1960; 1966; Stanley, 1976; Schmidhuber, 1991; Marsland et al., 2000; Saunders, 2001)

A Game Using MRL

Statistical Evaluation
■ Identifying learned tasks K as repeated events or observations:
  $cv(K) = \frac{1}{\bar{a}_K} \sqrt{\frac{1}{h-1} \sum_{i=1}^{h} (a_i - \bar{a}_K)^2}$
■ Behavioural variety:
  ■ Number of learned tasks
■ Behavioural complexity:
  ■ Number of actions to complete task
[Charts: Behavioural Variety and Maximum Behavioural Complexity; x-axis: Time (0 to 100000); labels: ADAPT_COMPETENCE, RECALL_COMPETENCE, Interest, +Competence]

Curious Reconfigurable Robots
with A/Prof.
Elanor Huntington, Tom Scully, UNSW@ADFA
■ Function approximation and MRL
  ■ Neural networks
  ■ Adaptive resonance theory networks
■ Modelling behaviour cycles for:
  ■ Motivation
  ■ Evaluation
■ A toy that ‘comes alive’ as it is being constructed

Motivation and Behaviour Cycles (Ahlgren and Halberg, 1990)
■ Biological
■ Cognitive
■ Social
Kolb learning cycle (Marsland et al., 2000)
Socio-demographic cycles

Evaluation and Visualisation
S(226) = (tacho:139.0, mov:100.0, red:0.0, green:80.0, blue:0.0)
Merrick, K.: (2009) Evaluating Intrinsically Motivated Robots using Affordances and PointCloud Matrices, The Ninth International Conference on Epigenetic Robotics (EpiRob09), Venice, Italy, pp 105-112

Numerical Evaluation
[Charts: Median cycle length and Stability for Q-MRL, NN-MRL and SART-MRL across Snail, Bee, Cricket and Ant; measures: Behavioural Variety, Complexity, Stability]
Merrick, K.: (2010) Modeling Behavior Cycles as a Value System for Developmental Robots, Adaptive Behavior (to appear)

Curious Places
with Prof. Mary Lou Maher and Dr. Rob Saunders, University of Sydney
■ Intelligent environments that adapt to support and enhance human activities by being curious about, and learning about, those activities.
■ Consider the space as an immobile robot

Motivated Supervised Learning
■ Learning intrinsically motivated tasks by mimicking
■ Structures:
  ■ S(t) – Raw sensor data
  ■ O(t) – Observation of state
  ■ E(t) – Event (change)
  ■ X(t) – Example (state+action)
  ■ B(t) – Learned policy
  ■ A(t) – Action to execute
[Diagram: AGENT within ENVIRONMENT. Sensors → S(t) → Sensation → O(t), X(t), E(t) → Motivation → O(t), X(t) → Learning → B(t) → Activation → A(t) → Effectors, plus a separate arrow labelled O(t)]

Evaluation and Case Studies
[Chart: Number of Actions Performed By Human (0 to 60) against Time (days); series: Traditional environment, Intelligent environment]
Merrick, K.,
Shafi, K.: (2009) Agent Models for Self-Motivated Home Assistant Bots, International Symposium on Computational Models for Life Sciences, Sofia, Bulgaria [invited paper] (to appear).

Curious Network Security Agents
with Dr. Kamran Shafi, UNSW@ADFA
■ Curious agents combine three measures to analyse stimuli (network data):
  ■ Similarity: clustering layer
  ■ Recency: habituating layer
  ■ Frequency: interest layer
■ Online, single-pass learners:
  ■ Potential for real-time operation
■ Unsupervised learners:
  ■ Potential to adapt to changes in network usage
  ■ Don’t require labelled data

Motivated Reflex Agents
■ Triggering intrinsically motivated reflexes
■ Structures:
  ■ S(t) – Raw sensor data
  ■ O(t) – Observation of state
  ■ E(t) – Event (change)
  ■ M(t) – Motivation value
  ■ A(t) – Action to execute
[Diagram: AGENT within ENVIRONMENT. Sensors → S(t) → Sensation → O(t), E(t) → Motivation → M(t) → Activation → A(t) → Effectors]

Domain Specific Evaluation
[Charts: panels for Probe, User to Root (U2R), Remote to Local (R2L), Denial of Service (DOS) and Normal Data; y-axes: True positive rate, False positive rate (0 to 100); x-axis: Time (Connection records sensed), 0 to 800000; series: Observations, Events]

A Curious Tour Guide Robot
with Dayne Schmidt, UNSW@ADFA
■ Identifies interesting ‘artworks’
■ Reflexively moves towards them while avoiding obstacles
Machine Learning and Developmental Robotics Lab, UNSW@ADFA
(Saunders, 2001)

Future Directions
■ Modelling motivation
■ Other individual models:
biological, cognitive, social…
■ Unified models
■ Learning models for use with motivation
■ Recall and reuse: hierarchical models
■ Other forms of learning, planning…
■ Combined models
■ Evaluating intrinsically motivated behaviour

Questions and Discussion
Dr. Kathryn Merrick
[email protected]
http://www.itee.adfa.edu.au/~s3229187
Lecturer in Information Systems
University of New South Wales
Australian Defence Force Academy
School of Engineering and Information Technology
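
Worked sketch for the Statistical Evaluation slide. That slide identifies learned tasks K as repeated events or observations and reports behavioural variety (number of learned tasks) and behavioural complexity (number of actions to complete a task). The Python below is a minimal sketch, assuming cv(K) is the usual coefficient of variation (sample standard deviation of the action counts per repetition, divided by their mean). The function names, the `max_cv` threshold and the choice to report the largest mean action count as "maximum behavioural complexity" are assumptions for illustration, not taken from the slides.

```python
import math


def coefficient_of_variation(action_counts):
    """Sample standard deviation of the action counts divided by their mean.

    action_counts: number of actions taken in each repetition of one task K.
    """
    h = len(action_counts)
    if h < 2:
        raise ValueError("need at least two repetitions of the task")
    mean = sum(action_counts) / h
    if mean == 0:
        raise ValueError("mean action count must be non-zero")
    variance = sum((a - mean) ** 2 for a in action_counts) / (h - 1)
    return math.sqrt(variance) / mean


def behavioural_summary(tasks, max_cv=0.2):
    """Return (variety, max_complexity) over tasks treated as learned.

    tasks: dict mapping a task label to its list of action counts.
    max_cv: hypothetical threshold; a task counts as learned when its
            coefficient of variation is at or below this value.
    """
    learned = {
        label: counts
        for label, counts in tasks.items()
        if len(counts) >= 2 and coefficient_of_variation(counts) <= max_cv
    }
    variety = len(learned)  # number of learned tasks
    max_complexity = max(
        (sum(counts) / len(counts) for counts in learned.values()),
        default=0.0,
    )  # largest mean number of actions among learned tasks
    return variety, max_complexity
```

For example, `behavioural_summary({"a": [4, 4, 4], "b": [2, 4, 6]})` would treat only task "a" as learned, because task "b" varies too much between repetitions under the assumed threshold.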
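
Worked sketch for the Motivated Reflex Agents slide. That slide lists the structures S(t), O(t), E(t), M(t) and A(t) without giving their formulas, so the Python below is only an illustrative sketch under stated assumptions: the observation is taken to be the raw sensor tuple, an event is the element-wise change between successive observations, motivation is a hypothetical novelty value that falls as the same event recurs (a crude stand-in for habituation), and the reflex fires when motivation exceeds a hypothetical threshold. The class name, action labels and threshold are all invented for the example.

```python
from collections import Counter


class MotivatedReflexAgent:
    """Illustrative S(t) -> O(t), E(t) -> M(t) -> A(t) pipeline."""

    def __init__(self, threshold=0.4):
        self.threshold = threshold  # hypothetical trigger level for the reflex
        self.previous = None        # O(t-1), the last observation
        self.counts = Counter()     # how often each event has been sensed

    def step(self, sensors):
        """Process one sensor reading; return (event, motivation, action)."""
        observation = tuple(sensors)  # assumption: O(t) is S(t) unchanged
        if self.previous is None:
            event = tuple(0.0 for _ in observation)  # no change on first step
        else:
            event = tuple(o - p for o, p in zip(observation, self.previous))
        self.previous = observation

        # Assumed motivation: novelty that decays with repeated events.
        self.counts[event] += 1
        motivation = 1.0 / self.counts[event]

        # Reflex: act only while the event is still novel enough.
        action = "investigate" if motivation > self.threshold else "ignore"
        return event, motivation, action
```

With the default threshold, an unchanged reading is investigated twice and ignored from the third repeat, while a new change in the sensors triggers the reflex again.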