A Statistical Mechanical Analysis of Online Learning: Can Student be more Clever than Teacher ? Seiji MIYOSHI Kobe City College of Technology [email protected].


A Statistical Mechanical Analysis
of Online Learning:
Can Student be more Clever
than Teacher ?
Seiji MIYOSHI
Kobe City College of Technology
[email protected]
Background (1)
• Batch Learning
– Examples are used repeatedly
– Correct answers for all examples
– Requires a long time
– Requires a large memory
• Online Learning
– Examples used once are discarded
– Cannot give correct answers for all examples
– Large memory isn't necessary
– Can follow a time-variant teacher
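The contrast between the two regimes can be sketched in a few lines of Python. This is only an illustration, not the model analyzed in this talk: the one-dimensional noiseless target `w_true`, the learning rate `eta`, and all function names are assumptions made for the example.

```python
import random

def make_example(rng, w_true=2.0):
    """Draw one (input, target) pair from a hypothetical noiseless 1-D teacher."""
    x = rng.gauss(0.0, 1.0)
    return x, w_true * x

def batch_learning(examples, eta=0.1, sweeps=50):
    """Batch style: the stored examples are reused on every sweep."""
    w = 0.0
    for _ in range(sweeps):
        for x, y in examples:
            w += eta * (y - w * x) * x
    return w

def online_learning(stream, eta=0.1):
    """Online style: each example is used once and then discarded."""
    w = 0.0
    for x, y in stream:
        w += eta * (y - w * x) * x
    return w

rng = random.Random(0)
stored = [make_example(rng) for _ in range(20)]    # must be kept in memory
w_batch = batch_learning(stored)
stream = (make_example(rng) for _ in range(1000))  # generator: nothing is stored
w_online = online_learning(stream)
```

The batch learner needs the whole list `stored`, while the online learner only ever holds the current example, which is the memory argument made on the slide.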
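Background (2)
[Figure: a teacher perceptron with connection weights B1, ..., BN and a student perceptron with connection weights J1, ..., JN, each receiving inputs x1, ..., xN.]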
Simple Perceptron
$\mathrm{Output} = \mathrm{sgn}\left(\sum_{i=1}^{N} J_i x_i\right)$
[Figure: a simple perceptron with inputs x1, ..., xN, connection weights J1, ..., JN, and an output of +1 or -1.]
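Background (2)
Learnable Case
[Figure: the teacher (weights B1, ..., BN) and the student (weights J1, ..., JN) with inputs x1, ..., xN, together with a diagram of the weight vectors B and J.]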
Background (3)
Unlearnable Case
(Inoue & Nishimori, Phys. Rev. E, 1997)
(Inoue, Nishimori & Kabashima, TANC-97, cond-mat/9708096, 1997)
[Figure: the teacher (weights B1, ..., BN) and the student (weights J1, ..., JN) with inputs x1, ..., xN.]
Background (4)
[Figure: diagrams of the weight vectors B and J, labeled "Perceptron Learning" and "Hebbian Learning".]
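For readers unfamiliar with the two rules named on this slide, here is a minimal sketch of their standard textbook forms for a sign-output perceptron. The slide does not spell the rules out, so the function names, the learning rate `eta`, and the convention sgn(0) = +1 are assumptions.

```python
def sgn(z):
    # Assumed convention: sgn(0) = +1.
    return 1 if z >= 0 else -1

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def hebbian_update(J, x, teacher_out, eta=1.0):
    """Hebbian learning: always move J toward the teacher's label."""
    return [Ji + eta * teacher_out * xi for Ji, xi in zip(J, x)]

def perceptron_update(J, x, teacher_out, eta=1.0):
    """Perceptron learning: update only when the student disagrees with the teacher."""
    if sgn(dot(J, x)) != teacher_out:
        return [Ji + eta * teacher_out * xi for Ji, xi in zip(J, x)]
    return list(J)

J = [1.0, 0.0]
x = [1.0, 1.0]
agree_hebb = hebbian_update(J, x, +1)        # student already outputs +1
agree_perc = perceptron_update(J, x, +1)
disagree_perc = perceptron_update(J, x, -1)  # student outputs +1, teacher says -1
```

When the student already agrees with the teacher, the Hebbian rule still changes J while the perceptron rule leaves it untouched; when they disagree, both rules make the same move.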
Model (1)
[Figure: the true teacher A, the moving teacher B, and the student J.]
Model (2)
[Figure: the vectors A, B, and J, with the length of the moving teacher and the length of the student marked.]
Model (3)
[Figure: the vectors A, B, and J.]
Simple Perceptron
$\mathrm{Output} = \mathrm{sgn}\left(\sum_{i=1}^{N} J_i x_i\right)$
Linear Perceptron
$\mathrm{Output} = \sum_{i=1}^{N} J_i x_i$
[Figure: a perceptron with inputs x1, ..., xN and connection weights J1, ..., JN.]
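The two output rules differ only in the final sign function, as a short sketch shows. The function names and the convention sgn(0) = +1 are assumptions made for the example.

```python
def linear_output(J, x):
    """Linear perceptron: the weighted sum of the inputs."""
    return sum(Ji * xi for Ji, xi in zip(J, x))

def simple_output(J, x):
    """Simple perceptron: the sign of the same weighted sum (sgn(0) taken as +1)."""
    return 1 if linear_output(J, x) >= 0 else -1

J = [0.5, -1.0, 2.0]
x = [1.0, 2.0, 0.5]
lin = linear_output(J, x)   # 0.5 - 2.0 + 1.0
sim = simple_output(J, x)
```

The linear perceptron keeps the magnitude of the weighted sum, which is what makes a squared-error analysis natural for it.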
Model (3)
Linear Perceptrons with Noises
[Figure: expressions for A, B, and J.]
Model (4)
Squared Errors
[Figure: expressions labeled A, B, and J.]
Gradient Method
[Figure: update rules involving g and f.]
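One way to read this slide is as the following short derivation. The transcript does not preserve the slide's equations, so this assumes that the moving teacher is trained on the true teacher's noisy output and the student on the moving teacher's noisy output. The symbols $o_A$, $o_B$, $o_J$ for those outputs, the noise terms $n_B$, $n_J$, and the learning rate $\eta_B$ (chosen by analogy with the $\eta_J$ that appears in later slides) are introduced here for illustration.

```latex
\begin{align*}
\epsilon_B &= \tfrac{1}{2}\,(o_A - o_B)^2, \qquad o_B = \boldsymbol{B}\cdot\boldsymbol{x} + n_B, \\
\epsilon_J &= \tfrac{1}{2}\,(o_B - o_J)^2, \qquad o_J = \boldsymbol{J}\cdot\boldsymbol{x} + n_J, \\
\boldsymbol{B}^{m+1} &= \boldsymbol{B}^{m} - \eta_B\,\frac{\partial \epsilon_B^{m}}{\partial \boldsymbol{B}^{m}}
  = \boldsymbol{B}^{m} + g^{m}\boldsymbol{x}^{m}, \qquad g^{m} = \eta_B\,(o_A^{m} - o_B^{m}), \\
\boldsymbol{J}^{m+1} &= \boldsymbol{J}^{m} - \eta_J\,\frac{\partial \epsilon_J^{m}}{\partial \boldsymbol{J}^{m}}
  = \boldsymbol{J}^{m} + f^{m}\boldsymbol{x}^{m}, \qquad f^{m} = \eta_J\,(o_B^{m} - o_J^{m}).
\end{align*}
```

Under these assumptions the moving teacher's rule has the form $\boldsymbol{B}^{m+1} = \boldsymbol{B}^{m} + g^{m}\boldsymbol{x}^{m}$ that is used when the differential equations for the order parameters are derived.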
Generalization Error
[Figure: expressions labeled "Gaussian" and "Error", with A, B, and J.]
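A generalization error of this kind can be checked numerically. The sketch below is not the slide's formula, which the transcript does not preserve. It assumes that the error is the average of half the squared difference between the noisy outputs of the true teacher A and a learner J, that the input components are independent Gaussians with variance 1/N, and that the output noises are Gaussian with variances `var_A` and `var_J`. Under those assumptions the average has the closed form coded in `closed_form_error`.

```python
import math
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def closed_form_error(A, J, var_A, var_J):
    """0.5 * (|A - J|^2 / N + var_A + var_J) under the stated assumptions."""
    N = len(A)
    dist2 = sum((a - j) ** 2 for a, j in zip(A, J)) / N
    return 0.5 * (dist2 + var_A + var_J)

def monte_carlo_error(A, J, var_A, var_J, samples, rng):
    """Average 0.5 * (out_A - out_J)^2 over random inputs and output noises."""
    N = len(A)
    sd_x = 1.0 / math.sqrt(N)
    total = 0.0
    for _ in range(samples):
        x = [rng.gauss(0.0, sd_x) for _ in range(N)]
        out_A = dot(A, x) + rng.gauss(0.0, math.sqrt(var_A))
        out_J = dot(J, x) + rng.gauss(0.0, math.sqrt(var_J))
        total += 0.5 * (out_A - out_J) ** 2
    return total / samples

rng = random.Random(0)
A = [rng.gauss(0.0, 1.0) for _ in range(20)]
J = [rng.gauss(0.0, 1.0) for _ in range(20)]
exact = closed_form_error(A, J, 0.1, 0.2)
estimate = monte_carlo_error(A, J, 0.1, 0.2, 10000, rng)
```

The Gaussian average is what turns a sample average over inputs into an expression in a few macroscopic quantities, here the distance between A and J and the noise variances.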
Differential equations for order parameters
Model (4)
Squared Errors
[Figure: expressions labeled A, B, and J.]
Gradient Method
[Figure: update rules involving g and f.]
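$$\boldsymbol{B}^{m+1} = \boldsymbol{B}^{m} + g^{m}\boldsymbol{x}^{m}$$
$$\begin{aligned}
N r_B^{m+1} &= N r_B^{m} + g^{m} y^{m} \\
N r_B^{m+2} &= N r_B^{m+1} + g^{m+1} y^{m+1} \\
&\;\;\vdots \\
N r_B^{m+N\,dt} &= N r_B^{m+N\,dt-1} + g^{m+N\,dt-1} y^{m+N\,dt-1}
\end{aligned}$$
Adding these $N\,dt$ equations:
$$N r_B^{m+N\,dt} = N r_B^{m} + N\,dt\,\langle g y \rangle$$
$$N (r_B + dr_B) = N r_B + N\,dt\,\langle g y \rangle$$
$$\frac{dr_B}{dt} = \langle g y \rangle$$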
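The step from the discrete update to $dr_B/dt = \langle g y \rangle$ can be checked with a small simulation. This sketch reads the second line of the derivation as the first line multiplied by A, that is $N r_B = \boldsymbol{A}\cdot\boldsymbol{B}$ and $y = \boldsymbol{A}\cdot\boldsymbol{x}$. Everything else is an assumption made for the example: a noiseless gradient rule $g = \eta\,(y - \boldsymbol{B}\cdot\boldsymbol{x})$, input components with variance 1/N, and $|\boldsymbol{A}|^2 = N$. Under those assumptions $\langle g y \rangle = \eta\,(1 - r_B)$, so the differential equation has the solution $r_B(t) = 1 - (1 - r_B(0))\,e^{-\eta t}$.

```python
import math
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def simulate_overlap(N=300, eta=0.5, t_end=4.0, seed=0):
    """Run the discrete update and compare r_B with the ODE solution at t_end."""
    rng = random.Random(seed)
    A = [rng.gauss(0.0, 1.0) for _ in range(N)]
    scale = math.sqrt(N / dot(A, A))          # enforce |A|^2 = N (assumption)
    A = [a * scale for a in A]
    B = [rng.gauss(0.0, 1.0) for _ in range(N)]
    r0 = dot(A, B) / N
    sd_x = 1.0 / math.sqrt(N)
    for _ in range(int(t_end * N)):           # t = m / N
        x = [rng.gauss(0.0, sd_x) for _ in range(N)]
        y = dot(A, x)
        g = eta * (y - dot(B, x))             # assumed noiseless gradient rule
        B = [b + g * xi for b, xi in zip(B, x)]
    r_sim = dot(A, B) / N
    r_ode = 1.0 - (1.0 - r0) * math.exp(-eta * t_end)
    return r_sim, r_ode

r_sim, r_ode = simulate_overlap()
```

For finite N the simulated value fluctuates around the deterministic curve; the fluctuations are expected to shrink as N grows, which is the sense in which the sample average replaces the individual updates.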
Differential equations for order parameters
Sample Averages
Differential equations for order parameters
Analytical Solutions of Order Parameters
Differential equations for order parameters
Generalization Error
[Figure: expressions labeled "Gaussian" and "Error", with A, B, and J.]
Dynamical Behaviors of Generalization Errors
[Figure: two panels, ηJ=1.2 and ηJ=0.3, plotting the generalization errors of B and J against t=m/N from 0 to 20.]
Dynamical Behaviors of R and l
[Figure: two panels, ηJ=1.2 and ηJ=0.3, plotting RB, RJ, lB, and lJ against t=m/N from 0 to 20.]
Analytical Solutions of Order Parameters
Steady State
[Figure: three panels plotting steady-state values against ηJ from 0.0 to 2.0: the generalization errors of B and J, R (RB and RJ), and l (lB and lJ).]
[Figure: diagrams of the vectors A, B, and J as ηJ varies from 0 to 2.]
Conclusions
• Generalization errors of a model composed of a true teacher, a moving teacher, and a student that are all linear perceptrons with noises have been obtained analytically using statistical mechanics.
• The generalization error of a student can be smaller than that of a moving teacher, even if the student only uses examples from the moving teacher.
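The second conclusion can be explored with a small simulation of the three-perceptron model. This is a sketch under stated assumptions, not the analysis from the talk. It uses gradient updates on squared errors, with the moving teacher B learning from the true teacher A and the student J learning only from B. Input components have variance 1/N, and each generalization error is measured against the true teacher as 0.5 * (|A - W|^2 / N + var_A + var_W). The value eta_J = 0.3 appears in the slides; eta_B, the noise variances, N, and the run length are hypothetical choices.

```python
import math
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def simulate(N=100, t_end=20.0, eta_B=1.0, eta_J=0.3,
             var_A=0.0, var_B=0.5, var_J=0.1, seed=1):
    """Return (eps_B, eps_J) at time t_end under the assumptions listed above."""
    rng = random.Random(seed)
    A = [rng.gauss(0.0, 1.0) for _ in range(N)]   # true teacher (fixed)
    B = [rng.gauss(0.0, 1.0) for _ in range(N)]   # moving teacher
    J = [rng.gauss(0.0, 1.0) for _ in range(N)]   # student
    sd_x = 1.0 / math.sqrt(N)
    for _ in range(int(t_end * N)):
        x = [rng.gauss(0.0, sd_x) for _ in range(N)]
        out_A = dot(A, x) + rng.gauss(0.0, math.sqrt(var_A))
        out_B = dot(B, x) + rng.gauss(0.0, math.sqrt(var_B))
        out_J = dot(J, x) + rng.gauss(0.0, math.sqrt(var_J))
        g = eta_B * (out_A - out_B)   # moving teacher follows the true teacher
        f = eta_J * (out_B - out_J)   # student sees only the moving teacher
        B = [b + g * xi for b, xi in zip(B, x)]
        J = [j + f * xi for j, xi in zip(J, x)]

    def gen_error(W, var_W):
        dist2 = sum((a - w) ** 2 for a, w in zip(A, W)) / N
        return 0.5 * (dist2 + var_A + var_W)

    return gen_error(B, var_B), gen_error(J, var_J)

eps_B, eps_J = simulate()
```

With a small eta_J the student effectively averages over the moving teacher's noisy wandering around A, so under these assumed parameters its error would be expected to end up below the moving teacher's. Other parameter choices, such as a large eta_J, need not show this.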