A Statistical Mechanical Analysis of Online Learning: Can Student be more Clever than Teacher?
Seiji MIYOSHI, Kobe City College of Technology
Slide 2: Background (1)
• Batch Learning
  – Examples are used repeatedly
  – Can give correct answers for all examples
  – Takes a long time
  – Requires a large memory
• Online Learning
  – Examples used once are discarded
  – Cannot give correct answers for all examples
  – A large memory isn't necessary
  – Can follow a time-variant teacher

Slide 3: Background (2)
[Diagram: Teacher with weights $B_1, \dots, B_N$ and Student with weights $J_1, \dots, J_N$, both receiving inputs $x_1, \dots, x_N$]

Slide 4: Simple Perceptron
$\mathrm{Output} = \mathrm{sgn}\left( \sum_{i=1}^{N} J_i x_i \right)$
[Diagram: inputs $x_1, \dots, x_N$, connection weights $J_1, \dots, J_N$, output +1 or -1]

Slide 5: Background (2)
[Diagram: Teacher B and Student J with the same structure]
Learnable Case

Slide 6: Background (3)
[Diagram: Teacher with weights $B_1, \dots, B_N$ and Student with weights $J_1, \dots, J_N$]
Unlearnable Case
(Inoue & Nishimori, Phys. Rev. E, 1997)
(Inoue, Nishimori & Kabashima, TANC-97, cond-mat/9708096, 1997)

Slide 7: Background (4)
[Diagrams of B and J; panels labeled Perceptron Learning and Hebbian Learning]

Slide 8: Model (1)
• True Teacher A
• Moving Teacher B
• Student J

Slide 9: Model (2)
[Diagram of A, B, and J, with the length of the moving teacher and the length of the student marked]

Slide 10: Model (3)
[Diagram of A, B, and J]

Slide 11: Simple Perceptron and Linear Perceptron
Simple perceptron: $\mathrm{Output} = \mathrm{sgn}\left( \sum_{i=1}^{N} J_i x_i \right)$
Linear perceptron: $\mathrm{Output} = \sum_{i=1}^{N} J_i x_i$
[Diagram: inputs $x_1, \dots, x_N$, connection weights $J_1, \dots, J_N$, output]

Slide 12: Model (3)
Linear Perceptrons with Noises
[Diagram of A, B, and J]

Slide 13: Model (4)
• Squared Errors
• Gradient Method, giving the update amounts $g$ (for B) and $f$ (for J)
[Diagram of A, B, and J]

Slide 14: Generalization Error
Gaussian Error
[Diagram of A, B, and J]

Slide 15: Differential equations for order parameters

Slide 16: Model (4)
• Squared Errors
• Gradient Method, giving the update amounts $g$ (for B) and $f$ (for J)
[Diagram of A, B, and J]

Slide 17: Derivation of the differential equation for $r_B$
The update of the moving teacher is
$\mathbf{B}^{m+1} = \mathbf{B}^{m} + g^{m} \mathbf{x}^{m}$
Taking the inner product with A (with $y = \mathbf{A} \cdot \mathbf{x}$ and $N r_B = \mathbf{A} \cdot \mathbf{B}$) gives, for $N\,dt$ consecutive updates,
$N r_B^{m+1} = N r_B^{m} + g^{m} y^{m}$
$N r_B^{m+2} = N r_B^{m+1} + g^{m+1} y^{m+1}$
$\vdots$
$N r_B^{m+N dt} = N r_B^{m+N dt-1} + g^{m+N dt-1} y^{m+N dt-1}$
Summing these $N\,dt$ equations:
$N r_B^{m+N dt} = N r_B^{m} + N\,dt\,\langle g y \rangle$
$N (r_B + dr_B) = N r_B + N\,dt\,\langle g y \rangle$
$\frac{dr_B}{dt} = \langle g y \rangle$

Slide 18: Differential equations for order parameters

Slide 19: Sample Averages

Slide 20: Differential equations for order parameters

Slide 21: Analytical Solutions of Order Parameters

Slide 22: Differential equations for order parameters

Slide 23: Generalization Error
Gaussian Error
[Diagram of A, B, and J]

Slide 24: Dynamical Behaviors of Generalization Errors
[Figure: generalization errors of B and J versus $t = m/N$; two panels, ηJ = 1.2 and ηJ = 0.3]

Slide 25: Dynamical Behaviors of R and l
[Figure: $R_B$, $R_J$, $l_B$, and $l_J$ versus $t = m/N$; two panels, ηJ = 1.2 and ηJ = 0.3]

Slide 26: Analytical Solutions of Order Parameters

Slide 27: Steady State
[Figure: three panels plotted against ηJ from 0 to 2: generalization errors of B and J (logarithmic vertical axis), $R_B$ and $R_J$, and $l_B$ and $l_J$]

Slide 28
[Diagram of A, B, and J for ηJ between 0 and 2; details not recoverable from the transcript]

Slide 29: Conclusions
• The generalization errors of a model composed of a true teacher, a moving teacher, and a student, all of which are linear perceptrons with noises, have been obtained analytically using statistical mechanics.
• The generalization error of the student can be smaller than that of the moving teacher, even though the student only uses examples from the moving teacher.
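
The setup described in the talk (a true teacher A, a moving teacher B that learns from A, and a student J that learns only from B, all linear perceptrons with output noise, trained by a gradient method on squared errors) can be explored numerically. The following is a minimal sketch rather than the author's implementation. The transcript only confirms an update of the form B^{m+1} = B^m + g^m x^m; the specific forms of `g` and `f`, the Gaussian inputs with variance 1/N, the random initial weights, the noise variances, and all function and parameter names are assumptions made for illustration.

```python
import numpy as np


def gen_error(A, W, var_A, var_W):
    """Mean squared error (halved) of weights W measured against A.

    Assumes inputs with i.i.d. N(0, 1/N) components and independent
    output noises, so that E[(1/2)(A.x + n_A - W.x - n_W)^2] reduces
    to the closed form below.
    """
    N = A.size
    return 0.5 * (np.sum((A - W) ** 2) / N + var_A + var_W)


def simulate(N=500, t_max=10.0, eta_B=1.0, eta_J=0.3,
             var_A=0.0, var_B=0.1, var_J=0.0, seed=0):
    """Online learning of B from A and of J from B (hypothetical setup).

    Returns arrays (t, eps_B, eps_J), sampled once per N examples,
    where t = m / N and both errors are measured against A.
    """
    rng = np.random.default_rng(seed)
    A = rng.standard_normal(N)  # true teacher, fixed
    B = rng.standard_normal(N)  # moving teacher, assumed random start
    J = rng.standard_normal(N)  # student, assumed random start

    steps = int(round(t_max * N))
    t, eps_B, eps_J = [], [], []

    def record(m):
        t.append(m / N)
        eps_B.append(gen_error(A, B, var_A, var_B))
        eps_J.append(gen_error(A, J, var_A, var_J))

    for m in range(steps):
        if m % N == 0:
            record(m)
        x = rng.standard_normal(N) / np.sqrt(N)  # each example is used once
        out_A = A @ x + np.sqrt(var_A) * rng.standard_normal()
        out_B = B @ x + np.sqrt(var_B) * rng.standard_normal()
        out_J = J @ x + np.sqrt(var_J) * rng.standard_normal()
        # Assumed gradient steps on the squared errors:
        g = eta_B * (out_A - out_B)  # B follows the true teacher A
        f = eta_J * (out_B - out_J)  # J only sees the moving teacher B
        B = B + g * x
        J = J + f * x
    record(steps)
    return np.array(t), np.array(eps_B), np.array(eps_J)


if __name__ == "__main__":
    t, eps_B, eps_J = simulate()
    print("final error of moving teacher B:", eps_B[-1])
    print("final error of student J:", eps_J[-1])
```

Running `simulate` for several values of `eta_J` and comparing the final `eps_J` with `eps_B` is one way to probe the question in the title numerically. The outcome depends on the assumed noise variances and learning rates, so results from this sketch should not be read as reproducing the figures of the talk.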