Lisa Torrey, University of Wisconsin – Madison, CS 540

Transfer in human learning. Education relies on a hierarchical curriculum: learning tasks share common stimulus-response elements. Abstract problem-solving: learning tasks share general underlying principles. Multilingualism: knowing one language affects learning in another. Transfer can be both positive and negative.

The transfer learning setting: given a source task S, learn a target task T. Compared to learning the target task from scratch, transfer can give a higher start, a higher slope (faster learning), and a higher asymptote in performance over training.

Transfer can also be seen as restricting the hypothesis space: instead of searching all hypotheses, the learner searches only the hypotheses allowed by what was learned in the source task. Thrun and Mitchell 1995: transfer slopes for gradient descent.

Bayesian methods. Bayesian learning combines a prior distribution with data to produce a posterior distribution; in Bayesian transfer, the source task supplies the prior. Raina et al. 2006: transfer a Gaussian prior.

Hierarchical methods. Concepts build on one another, for example line and curve support circle and surface, which in turn support pipe. Stracuzzi 2006: learn Boolean concepts that can depend on each other.

Dealing with missing data or labels: a source task S can compensate when the target task T has little data or few labels. Shi et al. 2008: transfer via active learning.

Reinforcement learning background. The agent interacts with an environment: in state s1 it chooses action a1 = π(s1); the environment returns reward r(s1, a1) = r2 and next state δ(s1, a1) = s2; the agent updates Q(s1, a1) ← Q(s1, a1) + Δ, chooses a2 = π(s2), receives r(s2, a2) = r3 and next state δ(s2, a2) = s3, and so on.

Transfer in reinforcement learning falls into five categories: starting-point methods, hierarchical methods, alteration methods, new RL algorithms, and imitation methods.

Starting-point methods: initial Q-table transfer. Instead of beginning target-task training from an all-zero Q-table (no transfer), the Q-table is initialized with values carried over from the source task. Taylor et al. 2005: value-function transfer.

Hierarchical methods: a complex task such as Soccer is decomposed into sub-skills like Pass, Run, Shoot, and Kick. Mehta et al. 2008: transfer a learned hierarchy.

Alteration methods: the original states, actions, or rewards of task S are replaced with new states, actions, or rewards. Walsh et al. 2006: transfer aggregate states.

New RL algorithms: the standard agent-environment loop is kept, but the learning algorithm itself is changed to accept transferred knowledge. Torrey et al. 2006: transfer advice about skills.

Imitation methods: the source-task policy is followed during part of target-task training. Torrey et al. 2007: demonstrate a strategy.

Our work spans two of these categories: skill transfer (a new RL algorithm) and macro transfer (an imitation method). The RoboCup tasks involved are 3-on-2 KeepAway, 3-on-2 BreakAway, 2-on-1 BreakAway, and 3-on-2 MoveDownfield.

ILP searches candidate skill rules from general to specific, for example:
IF [ ] THEN pass(Teammate)
IF distance(Teammate) ≤ 5 THEN pass(Teammate)
IF distance(Teammate) ≤ 10 THEN pass(Teammate)
IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 15 THEN pass(Teammate)
…
IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 30 THEN pass(Teammate)

Batch reinforcement learning via support vector regression (RL-SVR): the agent interacts with the environment in batches; after each batch it computes Q-functions that minimize ModelSize + C × DataMisfit.

Batch reinforcement learning with advice (KBKR): the same batch loop, but advice is incorporated, and the Q-functions minimize ModelSize + C × DataMisfit + µ × AdviceMisfit.

Skill transfer pipeline: in the source task, ILP learns skill rules such as IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 30 THEN pass(Teammate); a mapping translates them into target-task terms; the target learner then takes them as advice, alongside any human advice.

Skill transfer to 3-on-2 BreakAway from several tasks. Source skills such as pass(Teammate), move(Direction), shoot(goalRight), and shoot(goalLeft) map to target rules such as IF [ ... ] THEN pass(Teammate), IF [ ... ] THEN move(left), IF [ ... ] THEN move(ahead), IF [ ... ] THEN shoot(goalRight), and IF [ ... ] THEN shoot(goalLeft). Note that shoot(goalLeft) in a source task can map to shoot(goalRight) in the target.
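To make the mapping step concrete, here is a minimal Python sketch of translating one learned skill rule into target-task advice using a hand-written dictionary of term correspondences. The rule representation, the helper functions, and the specific correspondences are illustrative assumptions, not the actual system's representation.

```python
# Hedged sketch: translate a learned source-task skill rule into target-task
# advice by renaming actions/objects with a hand-built mapping.
# A rule is (list of conditions, action); each condition is (feature, comparator, threshold).
source_rule = (
    [("distance(Teammate)", "<=", 5.0),
     ("angle(Teammate, Opponent)", ">=", 30.0)],
    "pass(Teammate)",
)

# Hypothetical mapping from source-task terms to 3-on-2 BreakAway terms.
term_mapping = {
    "Teammate": "Teammate",                 # role names carry over directly here
    "Opponent": "Opponent",
    "shoot(goalLeft)": "shoot(goalRight)",  # example of a remapped action
}

def map_term(term, mapping):
    """Rename every mapped substring in a predicate or action string."""
    for src, tgt in mapping.items():
        term = term.replace(src, tgt)
    return term

def map_rule(rule, mapping):
    """Apply the term mapping to a rule's conditions and its action."""
    conditions, action = rule
    mapped = [(map_term(f, mapping), op, v) for f, op, v in conditions]
    return mapped, map_term(action, mapping)

target_advice = map_rule(source_rule, term_mapping)
print(target_advice)
# The mapped rule would then be handed to the advice-taking learner (KBKR)
# as a soft constraint, alongside any human-provided advice.
```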
Macro transfer is an imitation method: the source-task policy is followed during part of target-task training. In the source task, ILP learns a macro; a demonstration period then carries it into the target task.

Learning the macro's structure. Positive examples: BreakAway games that score. Negative examples: BreakAway games that do not score. ILP learns rules such as:
IF actionTaken(Game, StateA, pass(Teammate), StateB)
AND actionTaken(Game, StateB, move(Direction), StateC)
AND actionTaken(Game, StateC, shoot(goalRight), StateD)
AND actionTaken(Game, StateD, shoot(goalLeft), StateE)
THEN isaGoodGame(Game)

Learning rules for the macro's arcs. Positive examples: states in good games that took the arc. Negative examples: states in good games that could have taken the arc but did not. For example, for shoot(goalRight): IF […] THEN enter(State); for pass(Teammate): IF […] THEN loop(State, Teammate).

Macro transfer to 3-on-2 BreakAway from 2-on-1 BreakAway.

Summary. Machine learning is often designed for standalone tasks. Transfer is a natural learning ability that we would like to incorporate into machine learners. There are some successes, but challenges remain, such as avoiding negative transfer and automating the mapping between tasks.
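As a rough illustration of the imitation idea behind macro transfer, the sketch below follows the transferred strategy for an initial demonstration period and then continues with the agent's own learned policy. The policy objects, the epsilon-greedy exploration, and the episode cutoff are placeholder assumptions, not the actual macro-transfer system.

```python
import random

# Hedged sketch of a demonstration period in imitation-based transfer.
# `source_policy` and `learned_policy` are placeholder objects (assumptions),
# as is the episode cutoff.

DEMONSTRATION_EPISODES = 100  # assumed length of the demonstration period

def choose_action(episode, state, source_policy, learned_policy, epsilon=0.1):
    """Follow the transferred source policy early on, then the agent's own policy."""
    if episode < DEMONSTRATION_EPISODES:
        # Demonstration period: imitate the strategy carried over from the source task.
        return source_policy(state)
    if random.random() < epsilon:
        # Afterwards: ordinary epsilon-greedy exploration (hypothetical helper methods).
        return learned_policy.random_action()
    return learned_policy.best_action(state)
```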