Transcript (ppt)

Transfer Learning with Inter-Task Mappings
Matthew E. Taylor
Joint work with Peter Stone
Department of Computer Sciences
The University of Texas at Austin
Transfer Motivation
• Learning tabula rasa can be unnecessarily slow
• Humans can use information from previous tasks
– Soccer with different numbers of players
• Agents: leverage learned knowledge in novel/modified tasks
– Learn faster
– Larger and more complex problems become tractable
– Different numbers of state variables and actions in tasks
Common TL Metrics
Also: total reward accumulated
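Total reward accumulated is commonly read as the area under the learning curve. A minimal sketch, assuming each learning curve is given as a list of per-episode rewards in the target task (the function names are hypothetical):

```python
from typing import Sequence


def total_reward(episode_rewards: Sequence[float]) -> float:
    """Area under a learning curve: the sum of the reward earned in every episode."""
    return float(sum(episode_rewards))


def total_reward_gain(with_transfer: Sequence[float],
                      without_transfer: Sequence[float]) -> float:
    """Extra reward the transfer learner accumulates over the same number of episodes."""
    if len(with_transfer) != len(without_transfer):
        raise ValueError("compare learning curves over the same number of episodes")
    return total_reward(with_transfer) - total_reward(without_transfer)
```

A positive gain means the transfer learner accumulated more target task reward over the same number of episodes; this sketch does not charge for any time spent in the source task.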
Transfer Goals
• AI goal: autonomous transfer
– Explore the world and learn
– Transfer autonomously
– Utilize past knowledge
• Engineering goal: learn difficult tasks faster
– Learn a set of simple tasks
– Eventually learn target task
– Total time reduction
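Under the total time goal, time spent on the simple source tasks is charged to the transfer learner. A minimal sketch of that comparison, assuming all times are in the same units and that "target time" means the time needed to reach the same chosen performance level (the threshold and all names are assumptions):

```python
from typing import Sequence


def total_time_with_transfer(source_times: Sequence[float], target_time: float) -> float:
    """Time spent on every source task in the sequence plus the target task."""
    return sum(source_times) + target_time


def reduces_total_time(source_times: Sequence[float],
                       target_time_with_transfer: float,
                       target_time_from_scratch: float) -> bool:
    """True if learning the simple tasks first beats learning the target tabula rasa."""
    return (total_time_with_transfer(source_times, target_time_with_transfer)
            < target_time_from_scratch)
```

A transfer scheme can speed up the target task alone and still fail this check if the source tasks are expensive to learn.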
Transfer via Inter-Task Mappings
• Source task: π(S) → A
• Target task: π′(S′) → A′
• π is not defined for S′ and A′
• ρ is a transfer functional: ρ(π) = π′
– Task-dependent: relies on inter-task mappings
Inter-Task Mappings
• χ_A: a_target → a_source
– Given a target task action, return a similar source task action
• χ_X: s_target → s_source
– Similar, but for state variables: applied to every variable x_i in each target task state s = ⟨x_1, x_2, …, x_n⟩
• ρ is formed automatically from χ_A and χ_X to enable transfer of:
– π(s)
– Q(s, a)
– Rules
– Models
– etc.
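To make the direction of the two mappings concrete, the sketch below stores χ_A and χ_X as dictionaries from target names to source names and uses them for one simple kind of transfer: rewriting a recorded source task transition in target task terms. Each target state variable takes the value of the source variable it maps to, and the source action expands to every target action that χ_A maps onto it. The 2D to 3D Mountain Car variable names, action names, and the particular mapping are illustrative assumptions.

```python
from typing import Dict, List, Tuple

State = Dict[str, float]                # state variable name -> value
Transition = Tuple[State, str, State]   # (s, a, s')


def map_state(s_source: State, chi_x: Dict[str, str]) -> State:
    """Build a target state: each target variable copies the source variable it maps to."""
    return {x_target: s_source[x_source] for x_target, x_source in chi_x.items()}


def transform_tuple(t: Transition, chi_a: Dict[str, str],
                    chi_x: Dict[str, str]) -> List[Transition]:
    """Rewrite one source (s, a, s') tuple as target tuples, one per target action
    that chi_A maps onto the recorded source action."""
    s, a, s_next = t
    return [(map_state(s, chi_x), a_target, map_state(s_next, chi_x))
            for a_target, a_source in chi_a.items() if a_source == a]


# Illustrative (hypothetical) mappings from 2D Mountain Car to 3D Mountain Car.
CHI_X = {"x": "x", "y": "x", "x_dot": "x_dot", "y_dot": "x_dot"}
CHI_A = {"Neutral": "Neutral", "West": "Left", "East": "Right",
         "South": "Left", "North": "Right"}

example = ({"x": -0.5, "x_dot": 0.0}, "Right", {"x": -0.49, "x_dot": 0.01})
for s, a, s_next in transform_tuple(example, CHI_A, CHI_X):
    print(a, s, s_next)  # one line each for "East" and "North"
```

Because several target items may map to the same source item, the mappings are functions rather than one-to-one correspondences, which is why a single source tuple can yield more than one target tuple.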
Transfer Functional: ρ_CMAC
• New states and actions in target task → new tiles
[Figure: source task and target task CMAC tiles]
• Counterintuitive that this works: Q-values are very low-level and very task-specific
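One reading of "new states and actions in target task → new tiles" is that each target task tile starts with the weight of the source task tile it corresponds to under χ_X and χ_A. The sketch assumes a CMAC built from one-dimensional tilings, each over a single state variable, with weights addressed by (state variable, tile index, action); this layout and every name here are assumptions for illustration, not the exact construction from the talk.

```python
from typing import Dict, Iterable, Tuple

# A weight is addressed by (state variable, tile index in that variable's tilings, action).
TileKey = Tuple[str, int, str]


def rho_cmac(source_weights: Dict[TileKey, float],
             target_keys: Iterable[TileKey],
             chi_a: Dict[str, str],
             chi_x: Dict[str, str],
             default: float = 0.0) -> Dict[TileKey, float]:
    """Initialize every target tile from the source tile it maps to.

    Target tiles whose mapped source tile has no learned weight get `default`.
    """
    target_weights: Dict[TileKey, float] = {}
    for x_target, tile, a_target in target_keys:
        source_key = (chi_x[x_target], tile, chi_a[a_target])
        target_weights[(x_target, tile, a_target)] = source_weights.get(source_key, default)
    return target_weights


def q_value(weights: Dict[TileKey, float],
            active_tiles: Iterable[Tuple[str, int]], action: str) -> float:
    """CMAC estimate of Q(s, a): the sum of the weights of the tiles s activates."""
    return sum(weights.get((x, tile, action), 0.0) for x, tile in active_tiles)
```

After initialization, the target task learner updates these weights as usual; the transferred values only set its starting point.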
Sample Results
• Can significantly reduce target task time and total time
[Figure: Keepaway transfer, 3 vs. 2 to 4 vs. 3: source task time and target task time vs. source task episodes]
• Able to learn inter-task mappings with little data
Empirical Domains
• Robot Soccer Keepaway
• Server Job Scheduling
• Mountain Car
• Killer Application?
– Epilepsy?
– Robotics?
Open Questions: 1/3
• Optimize for Total Time?
[Figure: source task time and target task time vs. source task episodes]
Open Questions: 2/3
• Guarantee transfer efficacy?
• Avoid Negative Transfer (“Giveaway”)?
• Similarity measure?
– Jumpstart in Target
– MDP similarity [Ferns, others]
– Analysis of learned source task knowledge
Open Questions: 3/3
• Learn an inter-task mapping efficiently?
– Sample complexity
– Computational complexity
• Select Source Task?
– In library (sunk cost)
– To learn first (total time metric)
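The computational cost of searching over mappings grows quickly. If any target action may map to any source action, and any target state variable to any source state variable, there are |A_source|^|A_target| · |X_source|^|X_target| candidate (χ_A, χ_X) pairs. A small sketch of that count; the example sizes assume the standard 2D Mountain Car (3 actions, 2 state variables) and 3D Mountain Car (5 actions, 4 state variables).

```python
def num_candidate_mappings(n_source_actions: int, n_target_actions: int,
                           n_source_vars: int, n_target_vars: int) -> int:
    """Number of (chi_A, chi_X) pairs when each mapping is an arbitrary function
    from target items to source items (no one-to-one constraint)."""
    return (n_source_actions ** n_target_actions) * (n_source_vars ** n_target_vars)


# 2D Mountain Car -> 3D Mountain Car: 3**5 * 2**4 = 243 * 16
print(num_candidate_mappings(3, 5, 2, 4))  # 3888
```

The count is exponential in the number of target actions and state variables, so exhaustive search is only practical for small tasks.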
MASTER Overview
Modeling Approximate State Transitions by Exploiting Regression
Record observed (s_source, a_source, s′_source) tuples in the source task
Record a small number of (s_target, a_target, s′_target) tuples in the target task
Learn a one-step transition model, T(S, A), for the target task:
    M(s_target, a_target) → s′_target
for every possible action mapping χ_A
    for every possible state variable mapping χ_X
        Transform the recorded source task tuples
        Calculate the error of the transformed source task tuples on the target task model: Σ (M(s_transformed, a_transformed) − s′_transformed)²
return the χ_A, χ_X with the lowest error
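The MASTER loop can be sketched end to end under several stated assumptions. The one-step target task model M is a 1-nearest-neighbour regressor over the recorded target tuples (a stand-in chosen for brevity; the slide only says "regression"), states are dictionaries of named variables, the error is averaged per transformed tuple rather than summed so that mappings yielding fewer tuples are not favoured (an adjustment made in this sketch), and all function names are hypothetical.

```python
import itertools
from typing import Callable, Dict, List, Optional, Sequence, Tuple

State = Dict[str, float]                          # state variable name -> value
Transition = Tuple[State, str, State]             # (s, a, s')
Mapping = Tuple[Dict[str, str], Dict[str, str]]   # (chi_A, chi_X), target -> source


def nn_model(target_data: Sequence[Transition]) -> Callable[[State, str], State]:
    """Hypothetical one-step model M(s, a) -> s': 1-nearest-neighbour regression."""
    def predict(s: State, a: str) -> State:
        same_action = [t for t in target_data if t[1] == a]
        if not same_action:
            return s  # assumption: with no data for this action, predict no change
        nearest = min(same_action,
                      key=lambda t: sum((t[0][k] - s[k]) ** 2 for k in s))
        return nearest[2]
    return predict


def transform(t: Transition, chi_a: Dict[str, str],
              chi_x: Dict[str, str]) -> List[Transition]:
    """Rewrite a source tuple in target terms (one tuple per matching target action)."""
    s, a, s_next = t

    def lift(state: State) -> State:
        return {x_t: state[x_s] for x_t, x_s in chi_x.items()}

    return [(lift(s), a_t, lift(s_next)) for a_t, a_s in chi_a.items() if a_s == a]


def master(source_data: Sequence[Transition], target_data: Sequence[Transition],
           source_actions: Sequence[str], target_actions: Sequence[str],
           source_vars: Sequence[str], target_vars: Sequence[str]
           ) -> Tuple[Optional[Mapping], float]:
    """Return the (chi_A, chi_X) pair whose transformed source data best fits M."""
    model = nn_model(target_data)
    best: Optional[Mapping] = None
    best_error = float("inf")
    # Enumerate every action mapping, then every state variable mapping.
    for a_choice in itertools.product(source_actions, repeat=len(target_actions)):
        chi_a = dict(zip(target_actions, a_choice))
        for x_choice in itertools.product(source_vars, repeat=len(target_vars)):
            chi_x = dict(zip(target_vars, x_choice))
            total, count = 0.0, 0
            for t in source_data:
                for s, a, s_next in transform(t, chi_a, chi_x):
                    predicted = model(s, a)
                    total += sum((predicted[k] - s_next[k]) ** 2 for k in s_next)
                    count += 1
            # Mean squared error per transformed tuple; skip mappings yielding none.
            if count and total / count < best_error:
                best, best_error = (chi_a, chi_x), total / count
    return best, best_error
```

Swapping in a different regressor only changes `nn_model`; the exhaustive search over mappings is what makes the method's cost grow with the number of candidate mappings.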
Utilizing Mappings in 3D Mountain Car