Learning Prospective Robot Behavior
Shichao Ou and Roderic Grupen
Laboratory for Perceptual Robotics, University of Massachusetts Amherst
LABORATORY FOR PERCEPTUAL ROBOTICS • UNIVERSITY OF MASSACHUSETTS AMHERST • DEPARTMENT OF COMPUTER SCIENCE

A Developmental Approach
• Infant Learning
  – In stages
    • Maturation processes
  – Parents provide constrained learning contexts
    • Protect
    • Easy → Complex
  – Motion mobile for newborns
  – Use brightly colored, easy-to-pick-up objects
  – Use building blocks
  – Association of words and objects

Application in Robotics
• Framework for Robot Developmental Learning
  – Role of teacher: set up learning contexts that make the target concept conspicuous
  – Role of robot: acquire concepts, generalize to new contexts by autonomous exploration, provide feedback
• Control Basis
  – Robot actions are created using combinations of <σ, φ, τ>
  – Establish stages of learning by time-varying constraints on resources
    • Easy → Complex

Example
• Learning to Reach for Objects
  – Stage 1: SearchTrack
    • Focus attention using a single brightly colored object (σ)
    • Limit DOF (τ) to use head ONLY
  – Stage 2: ReachGrab
    • Limit DOF (τ) to use one arm ONLY
  – Stage 3: Handedness, ScaleSensitive
Hart et al., 2008

Prospective Learning
• An infant adapts to new situations by prospectively looking ahead, predicting failure, and then learning a repair strategy

Robot Prospective Learning with Human Guidance
[Figure: state-action chains S0, a0, S1, a1, …, Si, ai, …, Sj, aj, …, an-1, Sn. Labels: Challenge, f, g(f)=1, g(f)=0, g: 0 → 1, sub-task with states Si1 … Sij … Sin]

A 2D Navigation Domain Problem
• 30x30 map
• 6 doors, randomly closed
• 6 buttons
• 1 start and 1 goal
• 3-bit door sensor on robot

Flat Learning Results
• Flat Q-Learning
  – 5-bit state
    • (x, y, door-bit1, door-bit2, door-bit3)
  – 4 actions
    • up, down, left, right
  – Reward
    • 1 for reaching the goal
    • -0.01 for every step taken
  – Learning parameters
    • α=0.1, γ=1.0, ε=0.1
• Learned solutions after 30,000 episodes

Prospective Learning
• Stage 1
  – All doors open
  – Constrain resources to use only (x, y) sensors
  – Allow the agent to learn a policy from start to goal
[Figure: learned policy from S0 to Sn through S1, Si, and Sj. Action labels: Right, Down, Right, Right, Up, Right, Right]

Prospective Learning
• Stage 2
  – Close 1 door
  – Robot learns the cause of the failure
  – Robot backtracks and finds an earlier indicator of this cause
  – Create a sub-task
  – Learn a new policy for the sub-task
  – Resume original policy

Prospective Learning Results
• Learned solutions in < 2000 episodes

Humanoid Robot Manipulation Domain
• Benefits of Prospective Learning
  – Adapts to new contexts while maintaining the majority of the existing policy
  – Automatically generates sub-goals
  – Sub-tasks can be learned in a completely different state space
  – Supports interactive learning

Conclusion
• A developmental view of robot learning
• A framework that enables interactive, incremental learning in stages
• An extension of the control basis learning framework using the idea of prospective learning
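
The flat Q-learning baseline from the "Flat Learning Results" slide can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses a hypothetical 4x4 grid with no doors or buttons, so the state is only (x, y) rather than the slide's (x, y, door-bit1, door-bit2, door-bit3). The grid size, the start and goal placement, the per-episode step cap, and the function names (`step`, `greedy`, `q_update`, `train`, `rollout`) are all assumptions made for the example. Only α, γ, ε, the four actions, and the two reward values come from the slide.

```python
import random
from collections import defaultdict

# Learning parameters and rewards as listed on the slide.
ALPHA, GAMMA, EPSILON = 0.1, 1.0, 0.1
GOAL_REWARD, STEP_REWARD = 1.0, -0.01
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

# Assumptions for this sketch: a toy 4x4 grid (not the 30x30 map),
# no doors or buttons, start and goal in opposite corners.
SIZE = 4
START, GOAL = (0, 0), (SIZE - 1, SIZE - 1)


def step(state, action):
    """Move one cell; a move off the grid leaves the state unchanged."""
    dx, dy = ACTIONS[action]
    x, y = state[0] + dx, state[1] + dy
    nxt = (x, y) if 0 <= x < SIZE and 0 <= y < SIZE else state
    done = nxt == GOAL
    return nxt, (GOAL_REWARD if done else STEP_REWARD), done


def greedy(q, state):
    """Action with the highest Q-value (the first one wins ties)."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, nxt, done):
    """One tabular Q-learning backup for a single transition."""
    best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
    q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])


def train(episodes=3000, max_steps=500, seed=0):
    """Epsilon-greedy Q-learning; the step cap per episode is an assumption."""
    rng = random.Random(seed)
    q = defaultdict(float)  # all Q-values start at zero
    names = list(ACTIONS)
    for _ in range(episodes):
        state = START
        for _ in range(max_steps):
            explore = rng.random() < EPSILON
            action = rng.choice(names) if explore else greedy(q, state)
            nxt, reward, done = step(state, action)
            q_update(q, state, action, reward, nxt, done)
            state = nxt
            if done:
                break
    return q


def rollout(q, max_steps=100):
    """Follow the greedy policy from START and return the visited states."""
    state, path = START, [START]
    for _ in range(max_steps):
        state, _, done = step(state, greedy(q, state))
        path.append(state)
        if done:
            break
    return path
```

Calling `rollout(train())` returns the sequence of states visited by the greedy policy after training. The Q-table is keyed by the state tuple, so door bits could be appended to the state without changing `q_update`; only `step` would need to model doors and buttons.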