Transfer Learning with Inter-Task Mappings
Matthew E. Taylor
Joint work with Peter Stone
Department of Computer Sciences
The University of Texas at Austin

Transfer Motivation
• Learning tabula rasa can be unnecessarily slow
• Humans can use information from previous tasks
  – Soccer with different numbers of players
• Agents: leverage learned knowledge in novel or modified tasks
  – Learn faster
  – Larger and more complex problems become tractable
  – Different numbers of state variables and actions in tasks

Common TL Metrics
• Also: total reward accumulated

Transfer Goals
• Autonomous transfer (AI goal)
  – Explore the world, learning
  – Transfer autonomously
  – Utilize past knowledge
• Learn difficult tasks faster (engineering goal)
  – Learn a set of simple tasks
  – Eventually learn the target task
  – Reduce total time

Transfer via Inter-Task Mappings
• Source task: π(S) → A
• Target task: π′(S′) → A′
  – π is not defined for S′ and A′
• ρ is a transfer functional that turns source task knowledge π into target task knowledge π′
  – ρ is task-dependent: it relies on inter-task mappings

Inter-Task Mappings
• χA: a_target → a_source
  – Given a target task action, return a similar source task action
• χX: s_target → s_source
  – Similar, but for state variables: applied to every variable x in each target task state s = ⟨x1, x2, …, xn⟩
• ρ is formed automatically from χA and χX to enable transfer of:
  – π(s)
  – Q(s, a)
  – Rules
  – Models
  – etc.

Transfer Functional: ρCMAC
• New states and actions in the target task → new tiles
[Figure: source task and target task tiles]
• Counterintuitive: Q-values are very low-level
  – Very task-specific

Sample Results
• Can significantly reduce target task time and total time
[Figure: Keepaway transfer, 3 vs. 2 to 4 vs. 3; source task time and target task time vs. source task episodes]
• Able to learn inter-task mappings with little data

Empirical Domains
• Robot Soccer Keepaway
• Server Job Scheduling
• Mountain Car
• Killer application?
  – Epilepsy?
  – Robotics?

Open Questions: 1/3
• Optimize for total time?
[Figure: source task time and target task time vs. source task episodes]

Open Questions: 2/3
• Guarantee transfer efficacy?
• Avoid negative transfer ("Giveaway")?
• Similarity measure?
  – Jumpstart in the target task
  – MDP similarity [Ferns, others]
  – Analysis of learned source task knowledge

Open Questions: 3/3
• Learn an inter-task mapping efficiently?
  – Sample complexity
  – Computational complexity
• Select a source task?
  – From a library (sunk cost)
  – To learn first (total time metric)

MASTER Overview
Modeling Approximate State Transitions by Exploiting Regression
• Record observed (s_source, a_source, s′_source) tuples in the source task
• Record a small number of (s_target, a_target, s′_target) tuples in the target task
• Learn a one-step transition model T(S, A) for the target task: M(s_target, a_target) → s′_target
• For every possible action mapping χA
  – For every possible state variable mapping χX
      – Transform the recorded source task tuples
      – Calculate the error of the transformed source task tuples on the target task model: Σ (M(s_transformed, a_transformed) − s′_transformed)²
• Return the χA, χX with the lowest error

Utilizing Mappings in 3D Mountain Car
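The MASTER Overview slide can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The slide does not name the regressor, so the sketch assumes one linear least-squares model per target action (via NumPy). It also averages the squared error over the transformed tuples instead of summing it, so that mappings which transform fewer tuples are not automatically favored; that normalization is our assumption. χX is encoded as a tuple whose entry i is the source variable similar to target variable i, and χA as a dict from target action to source action. All function names and the toy task pair are hypothetical.

```python
import itertools
import numpy as np

def fit_target_model(target_tuples):
    """Fit M(s, a) -> s' as one linear least-squares model per target action (assumed regressor)."""
    by_action = {}
    for s, a, s_next in target_tuples:
        by_action.setdefault(a, []).append((s, s_next))
    models = {}
    for a, pairs in by_action.items():
        X = np.array([list(s) + [1.0] for s, _ in pairs], dtype=float)  # bias column
        Y = np.array([list(s_next) for _, s_next in pairs], dtype=float)
        models[a] = np.linalg.lstsq(X, Y, rcond=None)[0]
    return models

def predict(models, s, a):
    return np.append(np.asarray(s, dtype=float), 1.0) @ models[a]

def transform(source_tuples, chi_x, chi_a):
    """Rewrite source tuples in target terms: target variable i takes the value of
    source variable chi_x[i]; every target action mapped to a_s inherits the tuple."""
    out = []
    for s, a_s, s_next in source_tuples:
        s_t = [s[j] for j in chi_x]
        s_next_t = [s_next[j] for j in chi_x]
        for a_t, mapped in chi_a.items():
            if mapped == a_s:
                out.append((s_t, a_t, s_next_t))
    return out

def master(source_tuples, target_tuples, source_actions, target_actions,
           n_source_vars, n_target_vars):
    """Return the (chi_a, chi_x) whose transformed source tuples best fit the target model.
    Assumes every target action has at least one recorded target tuple."""
    models = fit_target_model(target_tuples)
    best, best_err = None, float("inf")
    for choice in itertools.product(source_actions, repeat=len(target_actions)):
        chi_a = dict(zip(target_actions, choice))
        for chi_x in itertools.product(range(n_source_vars), repeat=n_target_vars):
            transformed = transform(source_tuples, chi_x, chi_a)
            if not transformed:
                continue  # this mapping uses none of the recorded source data
            err = float(np.mean([np.sum((predict(models, s, a) - np.asarray(s_next)) ** 2)
                                 for s, a, s_next in transformed]))
            if err < best_err:
                best, best_err = (chi_a, chi_x), err
    return best, best_err

# Hypothetical toy pair: identical dynamics, but state variables stored in swapped order
def source_step(s, a):  # source state (p, v), actions "L"/"R"
    p, v = s
    return (p + v, v + (1 if a == "R" else -1))

def target_step(s, a):  # target state (v, p), actions "left"/"right"
    v, p = s
    return (v + (1 if a == "right" else -1), p + v)

source = [((p, v), a, source_step((p, v), a))
          for p in range(-2, 3) for v in range(-2, 3) for a in ("L", "R")]
target = [(s, a, target_step(s, a))  # only a few target task samples
          for s in [(0, 0), (1, 0), (0, 1), (1, 1)] for a in ("left", "right")]

(chi_a, chi_x), err = master(source, target, ["L", "R"], ["left", "right"], 2, 2)
print(chi_a, chi_x)  # {'left': 'L', 'right': 'R'} (1, 0)
```

Enumerating every χA and χX is exponential in the number of target actions and state variables, which is why computational complexity appears among the open questions about learning inter-task mappings.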
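The talk states that ρ is formed automatically from χA and χX, and that in ρCMAC new target states and actions get new tiles, but it does not spell out the details. As a hedged illustration, the sketch below assumes a linear Q-function over one 1-D tiling per state variable and copies each source tile weight to every target variable and action that maps onto it. The names TILE_WIDTH, tile, q_value, and rho_tiles, and the Keepaway-style weights, are hypothetical; this is a simplified stand-in, not the talk's exact ρCMAC.

```python
import math
from typing import Dict, Sequence, Tuple

Key = Tuple[int, int, str]  # (state variable index, tile index, action)
TILE_WIDTH = 1.0  # hypothetical: one 1-D tiling per state variable, unit-width tiles

def tile(x: float) -> int:
    return math.floor(x / TILE_WIDTH)

def q_value(weights: Dict[Key, float], s: Sequence[float], a: str) -> float:
    """Linear Q: sum the active tile's weight for each state variable (missing tiles count as 0)."""
    return sum(weights.get((i, tile(x), a), 0.0) for i, x in enumerate(s))

def rho_tiles(w_source: Dict[Key, float], chi_x: Sequence[int],
              chi_a: Dict[str, str]) -> Dict[Key, float]:
    """Copy each source weight to every target (variable, action) pair that maps onto it.
    chi_x[i] is the source variable similar to target variable i;
    chi_a[a_t] is the source action similar to target action a_t."""
    w_target: Dict[Key, float] = {}
    for (j, k, a_s), w in w_source.items():
        for i, src_var in enumerate(chi_x):
            if src_var != j:
                continue
            for a_t, src_action in chi_a.items():
                if src_action == a_s:
                    w_target[(i, k, a_t)] = w  # new target tile initialized from source tile
    return w_target

# Hypothetical example: the target task adds one state variable and one pass action
w_source = {(0, 0, "hold"): 1.0, (0, 0, "pass"): 0.25, (1, 2, "pass"): 0.5}
chi_x = [0, 1, 1]  # new target variable 2 is treated like source variable 1
chi_a = {"hold": "hold", "pass1": "pass", "pass2": "pass"}
w_target = rho_tiles(w_source, chi_x, chi_a)

s_target = (0.5, 2.3, 2.7)
print(q_value(w_target, s_target, "pass2"))  # 1.25
print(q_value(w_target, s_target, "hold"))   # 1.0
```

Because pass1 and pass2 both map to pass, they start with identical Q-values; learning in the target task is then left to tell them apart.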