Learning Walk Parameters for Humanoid Robot NAO

Using OpenRDK to learn walk parameters for the Humanoid Robot NAO
F. Giannone (presenter)
A. Cherubini
L. Iocchi
M. Lombardo
G. Oriolo
Overview: environment
• Humanoid Robot NAO, produced by Aldebaran (with an SDK and a simulator)
• The robotic agent acts in the environment
• Application: robotic soccer
Overview: (sub)tasks
• Vision Module: process raw data from the environment
• Modelling Module: elaborate raw data to obtain more reliable information
• Behaviour Control Module: decide the best behaviour to accomplish the agent goal
• Motion Control Module: actuate the robot motors accordingly
At First !!!
Make NAO walk… how?
NAO is equipped with a set of motion utilities, including a walk implementation that can be called through an interface (the NaoQi Motion Proxy) and partially customized by tuning some parameters.
• Main advantage: ready to use (…to be tuned)
• …and a drawback: based on an unknown walk model, so no flexibility at all!
For these reasons we decided to develop our own walk model and to tune it using machine learning techniques.
SPQR Walking Library development workflow
1. Develop the walk model using Matlab (SPQR Walk Model)
2. Test the walk model on the Webots simulator
3. Design and implement a C++ library for our RDK Soccer Agent (SPQR Walking Library)
4. Test our walking RDK Agent on the Webots simulator
5. Finally, tune the walk parameters (on the Webots simulator and on the real NAO robot)
A simple walking RAgent for NAO
• Simple Behaviour Module: switches between two states, walk and stand
• Motion Control Module: uses the SPQR Walking Library
• The SPQR Walking Library reaches NAO (NaoQi) through the NaoQi Adaptor and the Webots simulator through the Webots Client, via Smemy and a TCP channel
SPQR Walking Engine Model
NAO model characteristics:
• 21 degrees of freedom
• No actuated trunk
• No dynamic model available
We follow the "Static Walking Pattern", i.e. an a-priori definition of the desired trajectories:
• The gait is driven by velocity commands (v, ω), where v is the linear velocity and ω is the angular velocity
• Choose a set of output variables: the 3D coordinates of selected points of the robot
• Choose and parametrize the desired trajectories for these variables at each phase of the gait
SPQR velocity commands
The Behavior Control Module sends velocity commands (v, ω) to the Motion Control Module, which outputs the joints matrix. The gait is organized as a state machine:
• Stand Position: on a non-zero velocity command the gait opens with an Initial Half Step
• Rectilinear Walk Swing: command (v, 0)
• Curvilinear Walk Swing: command (v, ω)
• Turn Step: command (0, ω)
• Final Half Step: on a (0, 0) command the gait is closed and the robot returns to the Stand Position
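As a rough illustration of this state machine, the sketch below (Python, with assumed names) maps a velocity command (v, ω) to the walk phase listed above; the real SPQR Walking Library is a C++ component and its exact transition logic is not fully specified on the slide.

```python
from enum import Enum, auto

class WalkPhase(Enum):
    STAND_POSITION = auto()
    INITIAL_HALF_STEP = auto()
    RECTILINEAR_WALK_SWING = auto()
    CURVILINEAR_WALK_SWING = auto()
    TURN_STEP = auto()
    FINAL_HALF_STEP = auto()

def next_phase(current: WalkPhase, v: float, omega: float) -> WalkPhase:
    """Select the next walk phase from the commanded (v, omega).

    Illustrative only: when exactly the initial/final half steps are
    inserted is an assumption, not taken from the slide.
    """
    walking = current is not WalkPhase.STAND_POSITION
    if v == 0.0 and omega == 0.0:
        # Stop request: close the gait with a final half step, then stand.
        return WalkPhase.FINAL_HALF_STEP if walking else WalkPhase.STAND_POSITION
    if not walking:
        # Any non-zero command opens the gait with an initial half step.
        return WalkPhase.INITIAL_HALF_STEP
    if v != 0.0 and omega == 0.0:
        return WalkPhase.RECTILINEAR_WALK_SWING
    if v != 0.0 and omega != 0.0:
        return WalkPhase.CURVILINEAR_WALK_SWING
    return WalkPhase.TURN_STEP  # v == 0, omega != 0
```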
SPQR walking subtasks and parameters
Biped walking alternates a swing phase and a double support phase (parameter: SS%).
SPQR walk subtasks and their parameters:
• Foot trajectories in the xz plane: Xtot, Xsw0, Xds, Zst, Zsw
• Center of mass trajectory in the lateral direction: Yft, Yss, Yds, Kr
• Hip yaw/pitch control (turn): Hyp
• Arm control: Ks
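For reference, a minimal container for the parameters named above (Python sketch; the grouping follows the slide, while field names and comments on meanings are assumptions, and units/ranges are not given here).

```python
from dataclasses import dataclass

@dataclass
class SpqrWalkParams:
    """Parameter vector of the SPQR walk, one field per slide parameter."""
    ss_percent: float  # SS% (assumed: single-support share of the gait cycle)
    # Foot trajectories in the xz plane.
    x_tot: float       # Xtot
    x_sw0: float       # Xsw0
    x_ds: float        # Xds
    z_st: float        # Zst
    z_sw: float        # Zsw
    # Center-of-mass trajectory in the lateral direction.
    y_ft: float        # Yft
    y_ss: float        # Yss
    y_ds: float        # Yds
    k_r: float         # Kr
    # Hip yaw/pitch control (turning).
    hyp: float         # Hyp
    # Arm control.
    k_s: float         # Ks
```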
Walk tuning: main issues
Possible choices:
• By hand
• By using machine learning techniques
Machine learning seems the best solution:
• Less human interaction
• Explores the search space in a more systematic way
…but take care of some aspects:
• You need to define an effective fitness function
• You need to choose the right algorithm to explore the parameter space
• Only a limited amount of experiments can be done on a real robot
SPQR Learning System Architecture
• The Learner uses the Learning library; the RAgent uses the Walking library
• At each iteration, the Learner sends the experiments to the RAgent, which runs them on Webots or on the real NAO
• The RAgent returns the fitness to the Learner (in simulation, GPS data is used to evaluate the fitness)
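A sketch of the interaction loop implied by this architecture (Python; the function names and the way fitness values are reported back are assumptions, since the real system exchanges iteration data between the Learner and the RAgent through OpenRDK).

```python
def learning_session(learner, ragent, n_iterations):
    """Outer loop of the learning architecture sketched above.

    learner : proposes parameter sets (policies) for each iteration
    ragent  : runs each policy on Webots or on the real NAO and returns
              a fitness value (e.g. computed from GPS data in simulation)
    """
    iteration = learner.initial_iteration()
    for _ in range(n_iterations):
        fitnesses = []
        for policy in iteration.policies:
            # The RAgent executes the walk with the proposed parameters
            # and reports the measured fitness back to the learner.
            fitnesses.append(ragent.run_experiment(policy))
        iteration = learner.next_iteration(iteration, fitnesses)
    return learner.best_policy()
```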
SPQR Learner
Available learning strategies:
• Policy Gradient (e.g., PGPR)
• Genetic Algorithm
• Nelder-Mead Simplex Method
Learner flow: if this is the first iteration, return the initial iteration and its iteration information; otherwise, apply the chosen algorithm (strategy) and return the next iteration and its iteration information (a compact rendering follows below).
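The flowchart boils down to a first-iteration check followed by a strategy call; a compact Python rendering with hypothetical interfaces could look like this.

```python
def learner_step(strategy, iteration=None, fitnesses=None):
    """One step of the SPQR Learner flow sketched above.

    strategy : the chosen algorithm (policy gradient / PGPR, genetic
               algorithm, Nelder-Mead simplex, ...), assumed to expose
               initial_iteration() and next_iteration().
    """
    if iteration is None:
        # First iteration: return the initial iteration and its bookkeeping.
        return strategy.initial_iteration()
    # Otherwise apply the chosen strategy to produce the next iteration.
    return strategy.next_iteration(iteration, fitnesses)
```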
Policy Gradient (PG) iteration
Given a point p in the parameter space ℝ^K:
1. Generate n (n = mK) policies from p: each component is set to p_i, p_i + ε_i, or p_i − ε_i
2. Evaluate the policies
3. For each k ∈ {1, …, K}, compute F_k+, F_k0, F_k−
4. For each k ∈ {1, …, K}: if F_k0 > F_k+ and F_k0 > F_k−, then Δ_k = 0, else Δ_k = F_k+ − F_k−
5. Set Δ* = η · normalized(Δ) and move to the new point p' = p + Δ*
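A minimal sketch of one such PG iteration (Python); the perturbation sizes ε, step size η, and the averaging of fitness values into F+, F0, F− follow the standard policy-gradient scheme the slide describes, not the exact SPQR code.

```python
import random

def pg_iteration(p, fitness, epsilon, eta, m=3):
    """One policy-gradient iteration around the point p in R^K.

    p       : list of K parameter values
    fitness : callable evaluating a parameter vector (higher is better)
    epsilon : per-parameter perturbation sizes (list of K values)
    eta     : step size for the parameter update
    m       : policies generated per parameter (n = m*K in total)
    """
    K = len(p)
    # 1. Generate n = m*K random policies: each component is p_i,
    #    p_i + eps_i, or p_i - eps_i.
    policies = [[pi + random.choice((-ei, 0.0, ei))
                 for pi, ei in zip(p, epsilon)]
                for _ in range(m * K)]
    # 2. Evaluate every policy.
    scores = [fitness(q) for q in policies]
    delta = []
    for k in range(K):
        # 3. Average the fitness of policies where component k was
        #    decreased, unchanged, or increased: F_k-, F_k0, F_k+.
        groups = {-1: [], 0: [], 1: []}
        for q, f in zip(policies, scores):
            sign = 0 if q[k] == p[k] else (1 if q[k] > p[k] else -1)
            groups[sign].append(f)
        f_minus, f_zero, f_plus = (sum(g) / len(g) if g else 0.0
                                   for g in (groups[-1], groups[0], groups[1]))
        # 4. Adjustment for component k.
        if f_zero > f_plus and f_zero > f_minus:
            delta.append(0.0)
        else:
            delta.append(f_plus - f_minus)
    # 5. Normalize the adjustment and take a step of size eta.
    norm = sum(d * d for d in delta) ** 0.5
    if norm == 0.0:
        return list(p)
    return [pi + eta * d / norm for pi, d in zip(p, delta)]
```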
Enhancing PG: PGPR
At each iteration i, the gradient estimate (i) can be
used to obtain a metric for measuring the
relevance of the parameters.
forgetting factor
Given the relevance and a threshold T, PGPR prunes less relevant parameters
in next iterations.
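The slide does not preserve the exact relevance formula; the sketch below is only a plausible reading of its ingredients (the gradient estimate Δ(i), a forgetting factor, and a threshold T), so treat the recursion as an assumption rather than the published PGPR definition.

```python
def update_relevance(relevance, delta, forgetting=0.9):
    """Accumulate a per-parameter relevance score from the gradient estimate.

    Assumed form: an exponentially smoothed magnitude of Delta_k(i), so that
    parameters which stop contributing to the gradient fade out over time.
    """
    return [forgetting * r + abs(d) for r, d in zip(relevance, delta)]

def prune_parameters(relevance, threshold):
    """Return the indices of the parameters PGPR keeps for the next iterations."""
    return [k for k, r in enumerate(relevance) if r >= threshold]
```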
Curvilinear biped walking experiment
• The robot moves along a curve of radius R for a time t
• Fitness function: defined in terms of the path length and the radial error with respect to the desired curve
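The fitness formula itself did not survive the slide export; purely as an illustration of how its two named ingredients might be combined (path length rewarded, radial error penalized), here is a sketch in which the weight w and the sampled GPS positions are assumptions.

```python
import math

def curvilinear_walk_fitness(positions, radius, center, w=1.0):
    """Illustrative fitness for the curvilinear walking experiment.

    positions : sequence of (x, y) robot positions sampled during the run
                (e.g. GPS data from the simulator)
    radius    : commanded curve radius R
    center    : (x, y) center of the commanded circular path
    w         : weight of the radial-error penalty (assumed)
    """
    # Path length: total distance covered along the sampled trajectory.
    path_length = sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))
    # Radial error: mean deviation of the robot from the commanded radius.
    radial_error = sum(abs(math.dist(p, center) - radius)
                       for p in positions) / len(positions)
    return path_length - w * radial_error
```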
Simulators in learning tasks
Advantages:
• You can test the gait model and the learning algorithm without being biased by noise
Limits:
• The results of the experiments on the simulator can be ported to the real robot, but solutions specialized for the simulated model may not be as effective on the real robot (e.g., the simulation does not take asymmetries into account, and the models are not very accurate)
Results (1)
Setup:
• Five sessions of PG, 20 iterations each, all starting from the same initial configuration
• SS%, Ks, Yft have been set to hand-tuned values
• 16 policies for each iteration
Observations:
• Fitness increases in a regular way
• Low variance among the five simulations
Results (2)
• Final parameter sets for the five PG runs (table omitted; parameters shown include Zsw, Xsw0, Kr)
• Five runs of PGPR (plot omitted)
Bibliography
• A. Cherubini, F. Giannone, L. Iocchi, M. Lombardo, G. Oriolo, "Policy Gradient Learning for a Humanoid Soccer Robot". Accepted for the Journal of Robotics and Autonomous Systems.
• A. Cherubini, F. Giannone, L. Iocchi, and P. F. Palamara, "An extended policy gradient algorithm for robot task learning", Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2007.
• A. Cherubini, F. Giannone, and L. Iocchi, "Layered learning for a soccer legged robot helped with a 3D simulator", Proc. of the 11th International RoboCup Symposium, 2007.
• http://openrdk.sourceforge.net
• http://www.aldebaran-robotics.com/
• http://spqr.dis.uniroma1.it
??? Any Questions ???