Learning Prospective Robot Behavior Shichao Ou and Roderic Grupen Laboratory for Perceptual Robotics University of Massachusetts Amherst LABORATORY FOR PERCEPTUAL ROBOTICS • UNIVERSITY OF MASSACHUSETTS.


Learning Prospective Robot Behavior
Shichao Ou and Roderic Grupen
Laboratory for Perceptual Robotics
University of Massachusetts Amherst
LABORATORY FOR PERCEPTUAL ROBOTICS • UNIVERSITY OF MASSACHUSETTS AMHERST • DEPARTMENT OF COMPUTER SCIENCE
A Developmental Approach
• Infant Learning
– In stages
• Maturation processes
– Parents provide constrained learning contexts
• Protect
• Easy → Complex
– Motion mobile for newborns
– Use brightly colored, easy to pick up objects
– Use building blocks
– Association of words and objects
Application in Robotics
• Framework for Robot Developmental Learning
– Role of teacher: set up learning contexts that make the target concept conspicuous
– Role of robot: acquire concepts, generalize to new contexts by autonomous exploration, provide feedback
• Control Basis
– Robot actions are created using combinations of <σ, φ, τ>
– Establish stages of learning by time-varying constraints on resources
• Easy → Complex
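The idea that actions are assembled from <σ, φ, τ> combinations, and that a stage restricts which resources are available, can be sketched as follows. This is only an illustration under stated assumptions: the sensor, potential-function, and effector names are hypothetical placeholders, and an actual control basis controller is more than a triple of labels.

```python
from itertools import product

# Hypothetical resource names, chosen only for illustration.
SENSORS = ["bright_object", "any_object"]        # sigma
POTENTIALS = ["track", "reach"]                  # phi
EFFECTORS = ["head", "left_arm", "right_arm"]    # tau


def available_actions(sensors, potentials, effectors):
    """Enumerate every <sigma, phi, tau> combination of the allowed resources."""
    return list(product(sensors, potentials, effectors))


# An early stage exposes fewer resources than a later one (easy -> complex).
stage_easy = available_actions(["bright_object"], ["track"], ["head"])
stage_full = available_actions(SENSORS, POTENTIALS, EFFECTORS)

print(len(stage_easy), len(stage_full))
```

Constraining the resource lists shrinks the set of actions the learner has to explore, which is one way to read "easy → complex" staging.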
Example
• Learning to Reach for Objects
– Stage 1: SearchTrack
• Focus attention using a single brightly colored object (σ)
• Limit DOF (τ) to use the head ONLY
– Stage 2: ReachGrab
• Limit DOF (τ) to use one arm ONLY
– Stage 3: Handedness, ScaleSensitive
Hart et al., 2008
Prospective Learning
• Infants adapt to new situations by looking ahead prospectively, predicting failure, and then learning a repair strategy
Robot Prospective Learning with
Human Guidance
[Figure: a state-action chain S0, a0, S1, a1, ..., Si, ai, ..., Sj, aj, ..., Sn. A challenge introduces a feature f, with branches labeled g(f)=1 and g(f)=0 (g: 0 → 1). A block labeled "sub-task" contains states Si1, ..., Sij, ..., Sin.]
A 2D Navigation Domain Problem
• 30x30 map
• 6 doors, randomly closed
• 6 buttons
• 1 start and 1 goal
• 3-bit door sensor on robot
Flat Learning Results
• Flat Q-Learning
– 5-component state
• (x, y, door-bit1, door-bit2, door-bit3)
– 4 actions
• up, down, left, right
– Reward
• 1 for reaching the goal
• -0.01 for every step taken
– Learning parameters
• α=0.1, γ=1.0, ε=0.1
• Learned solutions after 30,000 episodes
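The flat learner described above can be sketched as a standard tabular Q-learning backup with ε-greedy action selection. The constants are taken from the slide; the function names, the dictionary-based table, and the two example states are assumptions made for illustration, not the authors' code.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 1.0, 0.1    # values from the slide
ACTIONS = ["up", "down", "left", "right"]
GOAL_REWARD, STEP_REWARD = 1.0, -0.01    # values from the slide


def choose_action(q, state, rng):
    """Epsilon-greedy selection over the four moves."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, done):
    """One tabular Q-learning backup."""
    best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
    target = reward + GAMMA * best_next
    q[(state, action)] += ALPHA * (target - q[(state, action)])


q = defaultdict(float)
# State layout from the slide: (x, y, door-bit1, door-bit2, door-bit3).
s, s_next = (0, 0, 0, 0, 0), (1, 0, 0, 0, 0)
q_update(q, s, "right", STEP_REWARD, s_next, done=False)
```

Because the door bits are part of the state, this learner has to rediscover the route for every door configuration it encounters.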
Prospective Learning
• Stage 1
– All doors open
– Constrain resources to use only (x,y) sensors
– Allow the agent to learn a policy from start to goal
[Figure: the learned policy shown as a chain of states S0, S1, ..., Si, ..., Sj, ..., Sn linked by Right, Down, and Up moves.]
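A rough count shows why restricting Stage 1 to the (x,y) sensors simplifies the problem. The figures below assume that every cell of the 30x30 map is a valid state and that each door bit is binary; neither assumption is stated on the slides.

```latex
\[
|S_{\text{flat}}| = 30 \cdot 30 \cdot 2^{3} = 7200,
\qquad
|S_{\text{stage 1}}| = 30 \cdot 30 = 900
\]
\[
|S_{\text{flat}}| \cdot |A| = 7200 \cdot 4 = 28800,
\qquad
|S_{\text{stage 1}}| \cdot |A| = 900 \cdot 4 = 3600
\]
```

Under these assumptions, the Stage 1 table has one eighth as many entries as the flat table.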
Prospective Learning
• Stage 2
– Close 1 door
– Robot learns the cause of the failure
– Robot backtracks and finds an earlier indicator of this cause
– Create a sub-task
– Learn a new policy for the sub-task
– Resume the original policy
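The Stage 2 steps above can be sketched as code. This is a minimal sketch under strong assumptions, not the authors' algorithm: the indicator is taken to be the first sensor reading that differs between a successful and a failed run, the sub-task policy is given rather than learned, and all names and example values are hypothetical.

```python
def earliest_indicator(success_obs, failure_obs, fail_step):
    """Return (step, feature_index) of the first observation, at or before
    fail_step, that differs between a successful and a failed run."""
    for step in range(fail_step + 1):
        pairs = zip(success_obs[step], failure_obs[step])
        for k, (ok_value, bad_value) in enumerate(pairs):
            if ok_value != bad_value:
                return step, k
    return None


def run_with_repair(policy, indicator, subtask, observe):
    """Follow the original policy; at the indicator step, run the sub-task
    first if the indicating feature reads 1, then resume the policy."""
    step, k = indicator
    executed = []
    for i, action in enumerate(policy):
        if i == step and observe(i)[k] == 1:
            executed.extend(subtask)
        executed.append(action)
    return executed


policy = ["right", "down", "right", "right", "up", "right"]
# Hypothetical 3-bit door readings per step for an open-door and a closed-door run.
success_obs = [(0, 0, 0)] * 6
failure_obs = [(0, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 0), (0, 1, 0), (0, 1, 0)]

indicator = earliest_indicator(success_obs, failure_obs, fail_step=4)
subtask = ["down", "up"]   # hypothetical detour, e.g. to a button and back
repaired = run_with_repair(policy, indicator, subtask, lambda i: failure_obs[i])
unchanged = run_with_repair(policy, indicator, subtask, lambda i: success_obs[i])
```

When the indicating bit stays 0, the original policy runs unmodified, which reflects the point that most of the existing policy is retained.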
Prospective Learning Results
Learned solutions in fewer than 2,000 episodes
Humanoid Robot
Manipulation Domain
• Benefits of Prospective Learning
– Adapts to new contexts while maintaining the majority of the existing policy
– Automatically generates sub-goals
– Sub-tasks can be learned in a completely different state space
– Supports interactive learning
Conclusion
• A developmental view of robot learning
• A framework that enables interactive, incremental learning in stages
• An extension to the control basis learning framework using the idea of prospective learning