Lisa Torrey
University of Wisconsin – Madison
CS 540
Education
Hierarchical curriculum: learning tasks share common stimulus-response elements
Abstract problem-solving: learning tasks share general underlying principles
Multilingualism: knowing one language affects learning in another
Transfer can be both positive and negative
[Figure: Transfer learning setup. Given a source task S, learn a target task T.]
[Figure: Learning curves of performance versus training, showing the three potential benefits of transfer: a higher start, a higher slope, and a higher asymptote.]
[Figure: Transfer restricts the search to a subset of allowed hypotheses within the space of all hypotheses.]
Thrun and Mitchell 1995: Transfer slopes for gradient descent
Bayesian methods
Bayesian learning: prior distribution + data = posterior distribution
Bayesian transfer: the source task supplies an informative prior distribution
Raina et al. 2006: Transfer a Gaussian prior
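As a rough, invented illustration of transferring a Gaussian prior (not Raina et al.'s actual method), the sketch below performs a conjugate normal update in which the prior mean and variance come from a source task rather than being uninformative; all names and numbers are assumptions.

```python
import numpy as np

# Hypothetical source-task estimate used as an informative prior
# (a toy stand-in for transferring a Gaussian prior, not Raina et al.'s method).
prior_mean, prior_var = 2.0, 0.5      # taken from the source task
noise_var = 1.0                        # assumed known observation noise

# A handful of target-task observations (invented for illustration).
data = np.array([2.4, 1.9, 2.8, 2.2])
n = len(data)

# Conjugate normal-normal update: posterior precision is the sum of precisions.
post_var = 1.0 / (1.0 / prior_var + n / noise_var)
post_mean = post_var * (prior_mean / prior_var + data.sum() / noise_var)

print(f"posterior mean={post_mean:.3f}, variance={post_var:.3f}")
```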
Hierarchical methods
[Figure: A concept hierarchy over Line, Circle, Curve, Surface, and Pipe, in which higher-level concepts build on lower-level ones.]
Stracuzzi 2006: Learn Boolean concepts that can depend on each other
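A toy sketch (not Stracuzzi's algorithm or learned concepts) of what Boolean concepts that depend on each other might look like; the definitions and feature names are invented purely to illustrate the hierarchical dependency.

```python
# Toy illustration of Boolean concepts that depend on each other
# (invented definitions; not Stracuzzi's learned hierarchy).

def line(features):
    return features.get("straight", False)

def curve(features):
    return not line(features) and features.get("smooth", False)

def circle(features):
    return curve(features) and features.get("closed", False)

def surface(features):
    # A higher-level concept defined in terms of lower-level ones.
    return line(features) or curve(features)

def pipe(features):
    return surface(features) and circle(features)

print(pipe({"smooth": True, "closed": True}))  # True in this toy example
```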
Dealing with Missing Data or Labels
[Figure: Transfer from task S to task T.]
Shi et al. 2008: Transfer via active learning
[Figure: The reinforcement learning loop between agent and environment. The agent starts with Q(s1, a) = 0 and chooses π(s1) = a1; the environment returns δ(s1, a1) = s2 with reward r(s1, a1) = r2; the agent updates Q(s1, a1) ← Q(s1, a1) + Δ and chooses π(s2) = a2; the environment returns δ(s2, a2) = s3 with reward r(s2, a2) = r3.]
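To make the loop concrete, here is a minimal tabular Q-learning sketch on an invented toy chain environment; the environment, hyperparameters, and state names are assumptions for illustration rather than anything from the slides.

```python
import random
from collections import defaultdict

# Minimal tabular Q-learning on a toy 5-state chain (invented for illustration).
ACTIONS = ["left", "right"]
N_STATES, GOAL = 5, 4

def step(state, action):
    """Toy deterministic dynamics: reward 1 only when the goal state is reached."""
    nxt = min(state + 1, N_STATES - 1) if action == "right" else max(state - 1, 0)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

Q = defaultdict(float)                 # Q(s, a), initialized to 0
alpha, gamma, epsilon = 0.5, 0.9, 0.1

def choose(s):
    """Epsilon-greedy action selection with random tie-breaking."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    best = max(Q[(s, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(s, a)] == best])

for _ in range(200):                   # episodes
    s, done, steps = 0, False, 0
    while not done and steps < 100:
        a = choose(s)
        s2, r, done = step(s, a)
        # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
        target = r + (0.0 if done else gamma * max(Q[(s2, b)] for b in ACTIONS))
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s, steps = s2, steps + 1

print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)})
```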
Approaches to transfer in reinforcement learning: starting-point methods, hierarchical methods, new RL algorithms, alteration methods, and imitation methods.
Starting-point methods
[Figure: Initial Q-table transfer. With transfer, the target task's Q-table starts from values learned in the source task; with no transfer, it starts from all zeros. Target-task training then proceeds from that starting point.]
Taylor et al. 2005: Value-function transfer
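A minimal sketch of the starting-point idea (not Taylor et al.'s exact value-function transfer): the target task's Q-table is seeded from a source Q-table through an assumed inter-task mapping instead of starting at zero. The mapping and Q-values are invented for illustration.

```python
import numpy as np

# Invented source-task Q-table: 3 states x 2 actions.
source_Q = np.array([[2.0, 5.0],
                     [4.0, 8.0],
                     [9.0, 1.0]])

# Hypothetical inter-task mapping: target state/action -> source state/action.
state_map = {0: 0, 1: 1, 2: 1, 3: 2}   # the target task has 4 states
action_map = {0: 0, 1: 1}

# Starting-point transfer: initialize the target Q-table from mapped source values
# (the "no transfer" baseline would simply be np.zeros((4, 2))).
target_Q = np.zeros((4, 2))
for s_t, s_s in state_map.items():
    for a_t, a_s in action_map.items():
        target_Q[s_t, a_t] = source_Q[s_s, a_s]

print(target_Q)   # target-task training (e.g., Q-learning) would continue from here
```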
Hierarchical methods
[Figure: A task hierarchy for Soccer, decomposed into sub-tasks such as Pass and Shoot, which build on lower-level skills such as Run and Kick.]
Mehta et al. 2008: Transfer a learned hierarchy
Alteration methods
[Figure: Task S's original states, actions, and rewards are altered into new states, new actions, and new rewards.]
Walsh et al. 2006: Transfer aggregate states
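As a toy illustration of altering the state space (in the spirit of, but not reproducing, Walsh et al.'s aggregate-state transfer), the sketch below collapses original states into aggregate states before learning; the grouping and names are invented.

```python
# Toy state aggregation: map original states to aggregate states before learning
# (an invented grouping, only to illustrate altering the state space).

# Hypothetical original states and an aggregation discovered from source tasks.
aggregate_of = {
    "room1_left": "room1", "room1_right": "room1",
    "room2_left": "room2", "room2_right": "room2",
    "hallway": "hallway",
}

def aggregate_state(original_state):
    return aggregate_of[original_state]

# A learner would then keep Q-values over aggregate states only:
Q = {(s, a): 0.0 for s in set(aggregate_of.values()) for a in ["move", "stay"]}
print(sorted(Q))
```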
New RL Algorithms
[Figure: The same agent-environment Q-learning loop as above, with transfer incorporated into the learning algorithm itself.]
Torrey et al. 2006: Transfer advice about skills
Imitation methods
[Figure: The source-task policy is used during part of target-task training.]
Torrey et al. 2007: Demonstrate a strategy
[Figure: Positioning of this work within the taxonomy (starting-point methods, hierarchical methods, new RL algorithms, imitation methods): skill transfer and macro transfer, evaluated on the RoboCup tasks 3-on-2 KeepAway, 3-on-2 BreakAway, 2-on-1 BreakAway, and 3-on-2 MoveDownfield.]
[Figure: ILP searches a tree of increasingly specific candidate rules for the pass skill:
IF [ ] THEN pass(Teammate)
IF distance(Teammate) ≤ 5 THEN pass(Teammate)
IF distance(Teammate) ≤ 10 THEN pass(Teammate)
IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 15 THEN pass(Teammate)
IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 30 THEN pass(Teammate)
…]
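The sketch below is a heavily simplified, invented illustration of this general-to-specific rule search: starting from an empty rule body and greedily adding whichever candidate literal best separates positive from negative examples. The literals and training states are assumptions, and real ILP systems search over first-order clauses rather than this propositional toy.

```python
# Toy general-to-specific rule search (invented; real ILP systems search
# first-order clauses, this is only a propositional illustration).

# Candidate literals: (description, test on a state dict).
LITERALS = [
    ("distance(Teammate) <= 5",  lambda s: s["dist"] <= 5),
    ("distance(Teammate) <= 10", lambda s: s["dist"] <= 10),
    ("angle(Teammate, Opponent) >= 15", lambda s: s["angle"] >= 15),
    ("angle(Teammate, Opponent) >= 30", lambda s: s["angle"] >= 30),
]

# Invented training states labeled: should the agent pass(Teammate)?
POS = [{"dist": 3, "angle": 35}, {"dist": 4, "angle": 40}]
NEG = [{"dist": 3, "angle": 10}, {"dist": 12, "angle": 50}]

def score(body):
    """Covered positives minus covered negatives for a conjunctive rule body."""
    covers = lambda s: all(test(s) for _, test in body)
    return sum(covers(s) for s in POS) - sum(covers(s) for s in NEG)

body = []                               # most general rule: IF [ ] THEN pass(Teammate)
while True:
    best = max(LITERALS, key=lambda lit: score(body + [lit]))
    if score(body + [best]) <= score(body):
        break                           # no literal improves the rule; stop specializing
    body.append(best)

print("IF " + " AND ".join(d for d, _ in body) + " THEN pass(Teammate)")
```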
Batch Reinforcement Learning via Support Vector Regression (RL-SVR)
[Figure: The agent interacts with the environment in batches (Batch 1, Batch 2, …) and recomputes its Q-functions after each batch.]
Find Q-functions that minimize: ModelSize + C × DataMisfit
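A rough sketch of the batch idea using scikit-learn's SVR as the Q-function approximator; this is an assumption for illustration and differs from the actual RL-SVR formulation. One regressor per action is refit after each batch of (invented) transitions, with SVR's C parameter standing in for the ModelSize/DataMisfit trade-off.

```python
import numpy as np
from sklearn.svm import SVR

# Sketch of batch RL with one SVR Q-function per action (illustrative only).
ACTIONS, GAMMA = [0, 1], 0.9
rng = np.random.default_rng(0)
models = {}                                    # action -> fitted SVR

def q_value(state, action):
    if action not in models:
        return 0.0
    return float(models[action].predict(state.reshape(1, -1))[0])

for batch in range(3):                         # Batch 1, Batch 2, ...
    # Invented batch of transitions (state, action, reward, next_state).
    states = rng.normal(size=(50, 4))
    actions = rng.integers(0, 2, size=50)
    rewards = rng.normal(size=50)
    next_states = rng.normal(size=(50, 4))

    # One-step Q-targets computed from the previous batch's Q-functions.
    targets = rewards + GAMMA * np.array(
        [max(q_value(ns, a) for a in ACTIONS) for ns in next_states])

    # Refit one SVR per action on that action's transitions.
    for a in ACTIONS:
        mask = actions == a
        models[a] = SVR(C=1.0).fit(states[mask], targets[mask])

print({a: q_value(np.zeros(4), a) for a in ACTIONS})
```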
Batch Reinforcement Learning with Advice (KBKR)
[Figure: The same batch loop, with advice included as an additional input when computing the Q-functions.]
Find Q-functions that minimize: ModelSize + C × DataMisfit + µ × AdviceMisfit
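A hedged sketch of the shape of that objective with a linear Q-model and a squared advice penalty; the data, the advice region, and the trade-off weights are invented, and the real KBKR method uses kernel regression with advice expressed as linear constraints.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch of "ModelSize + C x DataMisfit + mu x AdviceMisfit" with a linear
# Q-model for a single action (invented data; not the actual KBKR optimization).
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))                                     # states
y = X @ np.array([1.0, -0.5, 0.2]) + 0.1 * rng.normal(size=40)   # Q-targets

# Hypothetical advice: "in states where feature 0 > 1, Q should be at least 1.0".
advice_states = X[X[:, 0] > 1.0]
C, mu, advice_floor = 1.0, 0.5, 1.0

def objective(w):
    model_size = w @ w
    data_misfit = np.sum((X @ w - y) ** 2)
    advice_misfit = np.sum(np.maximum(0.0, advice_floor - advice_states @ w) ** 2)
    return model_size + C * data_misfit + mu * advice_misfit

w_star = minimize(objective, np.zeros(3)).x
print("learned weights:", np.round(w_star, 3))
```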
[Figure: The skill-transfer pipeline. ILP learns skill rules in the source task, such as IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 30 THEN pass(Teammate); a mapping translates them to the target task, where they are combined with human advice and given to the learner through advice taking.]
[Figure: Results of skill transfer to 3-on-2 BreakAway from several source tasks.]
Macro transfer
[Figure: A relational macro is a finite-state machine over actions. Its nodes are pass(Teammate), move(Direction), shoot(goalRight), and shoot(goalLeft), and each node and transition carries a rule of the form IF [ ... ] THEN action, e.g. IF [ ... ] THEN move(left), IF [ ... ] THEN shoot(goalRight), IF [ ... ] THEN shoot(goalLeft).]
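To suggest what such a macro might look like as a data structure, here is an invented sketch of a finite-state machine whose nodes are actions and whose transitions fire when stand-in conditions hold; the conditions and state features are assumptions, not the rules learned on the slides.

```python
# Toy relational macro as a finite-state machine (invented conditions; the real
# macro's IF [...] THEN rules are learned by ILP and are not reproduced here).

MACRO = [
    # (action at this node, condition to advance to the next node)
    ("pass(Teammate)",   lambda s: s["teammate_open"]),
    ("move(Direction)",  lambda s: s["dist_to_goal"] <= 15),
    ("shoot(goalRight)", lambda s: s["goal_right_open"]),
    ("shoot(goalLeft)",  lambda s: True),          # terminal node
]

def run_macro(states):
    """Follow the macro through a sequence of observed states, yielding actions."""
    node = 0
    for s in states:
        yield MACRO[node][0]
        if node < len(MACRO) - 1 and MACRO[node][1](s):
            node += 1                               # transition rule fired

demo = [
    {"teammate_open": True,  "dist_to_goal": 20, "goal_right_open": False},
    {"teammate_open": False, "dist_to_goal": 12, "goal_right_open": False},
    {"teammate_open": False, "dist_to_goal": 8,  "goal_right_open": True},
    {"teammate_open": False, "dist_to_goal": 5,  "goal_right_open": True},
]
print(list(run_macro(demo)))
```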
An imitation method
[Figure: The source policy is used during part of target-task training.]
[Figure: The macro-transfer pipeline: ILP learns a macro from the source task, which serves as a demonstration in the target task.]
Learning structures
Positive examples: BreakAway games that score
Negative examples: BreakAway games that didn't score
ILP learns the macro's structure as a rule such as:
IF
actionTaken(Game, StateA, pass(Teammate), StateB)
actionTaken(Game, StateB, move(Direction), StateC)
actionTaken(Game, StateC, shoot(goalRight), StateD)
actionTaken(Game, StateD, shoot(goalLeft), StateE)
THEN isaGoodGame(Game)
Learning rules for arcs
Positive examples: states in good games that took the arc
Negative examples: states in good games that could have taken the arc but didn't
ILP learns node and arc rules such as:
shoot(goalRight): IF […] THEN enter(State)
pass(Teammate): IF […] THEN loop(State, Teammate)
[Figure: Results of macro transfer to 3-on-2 BreakAway from 2-on-1 BreakAway.]
Machine learning is usually designed for standalone tasks.
Transfer is a natural learning ability that we would like to incorporate into machine learners.
There are some successes, but challenges remain, such as avoiding negative transfer and automating the inter-task mapping.