Slide 1
Analysis of Ensemble Learning
using Simple Perceptrons based
on Online Learning Theory
Seiji MIYOSHI 1 Kazuyuki HARA 2 Masato OKADA 3,4,5
1 Kobe City College of Tech.,
2 Tokyo Metropolitan College of Tech.,
3 University of Tokyo
4 RIKEN BSI,
5 Intelligent Cooperation and Control, PRESTO, JST
Slide 2
ABSTRACT
Ensemble learning of K simple perceptrons, which determine their outputs by sign functions, is discussed within the framework of online learning and statistical mechanics. One purpose of statistical learning theory is to obtain the generalization error theoretically. We show that the ensemble generalization error can be calculated using two order parameters: the similarity between the teacher and a student, and the similarity among students. The differential equations that describe the dynamical behavior of these order parameters are derived for general learning rules. The concrete forms of these differential equations are then derived analytically for three well-known rules: Hebbian learning, perceptron learning, and AdaTron learning. The ensemble generalization errors of the three rules are calculated from the solutions of those differential equations. The three rules show different characteristics in their affinity for ensemble learning, that is, in "maintaining variety among students". The results show that AdaTron learning is superior to the other two rules with respect to this affinity.
Slide 3
BACKGROUND
• Ensemble learning has recently attracted the attention of many researchers. It combines many rules or learning machines (students in the following) that individually perform poorly. Theoretical studies analyzing its generalization performance by means of statistical mechanics have been pursued vigorously.
• Hara and Okada theoretically analyzed the case in which the students are linear perceptrons.
• Hebbian learning, perceptron learning, and AdaTron learning are well-known learning rules for a nonlinear perceptron that decides its output by a sign function. Determining the differences among ensemble learning with Hebbian learning, perceptron learning, and AdaTron learning is a very attractive problem, but one that has never been analyzed.
OBJECTIVE
• We discuss ensemble learning of K simple perceptrons within the framework of online learning, with K kept finite.
Slide 4
MODEL
[Figure: a teacher network and K student networks (1, 2, …, K) all receive the same input.]
A common input x is presented to the teacher and to all students in the same order.
Once an input x has been used for an update, it is discarded (online learning).
The updates of the students are independent of each other.
Two methods of deciding the ensemble output are treated: one is the majority vote (MV) of the students, and the other is the weight mean (WM).
[Equations defining the input x, the teacher, the students, and the length of a student appeared here.]
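As a concrete illustration of the model, the following sketch trains K students on a common stream of inputs and forms both ensemble outputs. This is not the authors' code: the learning rule (perceptron learning), the sizes N, K, the time horizon, and the test-set size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 500, 3                                  # input dimension, number of students
STEPS = 10 * N                                 # online-learning time t = STEPS / N = 10

B = rng.standard_normal(N)
B /= np.linalg.norm(B)                         # fixed teacher direction
J = rng.standard_normal((K, N)) / np.sqrt(N)   # K students with independent initial vectors

for _ in range(STEPS):
    x = rng.standard_normal(N) / np.sqrt(N)    # fresh input, used once then discarded
    t_out = np.sign(B @ x)                     # teacher's label for this input
    s = np.sign(J @ x)                         # each student's own output
    J += ((s != t_out) * t_out)[:, None] * x   # perceptron rule: each student learns
                                               # only from its own errors (independent updates)

# Ensemble outputs on fresh inputs: majority vote (MV) and weight mean (WM).
X = rng.standard_normal((2000, N)) / np.sqrt(N)
teacher = np.sign(X @ B)
mv = np.sign(np.sign(X @ J.T).sum(axis=1))     # MV of the K students (K odd: no ties)
wm = np.sign(X @ J.sum(axis=0))                # WM: sign of the summed weight vectors
mv_err = float(np.mean(mv != teacher))
wm_err = float(np.mean(wm != teacher))
```

Because each student updates only when its own output is wrong, the students remain distinct even though they see the same inputs in the same order.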
Slide 5
THEORY
Generalization Error εg:
The probability that the ensemble output disagrees with the teacher's output for a new input x.
Similarity between teacher and student
Similarity among students
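Both similarities are direction cosines of the weight vectors. The slide's defining equations did not survive transcription; the standard definitions, and the classical single-student generalization error they yield, are:

```latex
R_k = \frac{\boldsymbol{B}\cdot\boldsymbol{J}_k}{\lVert \boldsymbol{B}\rVert\,\lVert \boldsymbol{J}_k\rVert},
\qquad
q_{kk'} = \frac{\boldsymbol{J}_k\cdot\boldsymbol{J}_{k'}}{\lVert \boldsymbol{J}_k\rVert\,\lVert \boldsymbol{J}_{k'}\rVert},
\qquad
\varepsilon_g\bigl|_{K=1} = \frac{1}{\pi}\arccos R_k .
```

The K = 1 formula holds because a sign perceptron errs exactly when x falls in the wedge between B and J_k, whose opening angle is arccos R_k.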
Slide 6
Differential equations describing the length l and the similarity R
(known result)
Differential equation describing the similarity q
(new result)
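The slide's equations were not captured by the transcript, but the shape of the known-result equations can be sketched. This is my reconstruction under one standard normalization, not necessarily the authors' exact conventions: inputs with i.i.d. N(0,1) components, a normalized teacher B, students J_k with length l_k = ‖J_k‖, fields v = B·x and u_k = J_k·x / l_k, and updates J_k ← J_k + f_k x / N, where f_k is the learning rule. Averaging over inputs, with t = (number of examples)/N, gives

```latex
\frac{dl_k}{dt} = \langle f_k u_k \rangle + \frac{\langle f_k^2 \rangle}{2 l_k},
\qquad
\frac{dR_k}{dt} = \frac{1}{l_k}\left( \langle f_k v \rangle - R_k \frac{dl_k}{dt} \right).
```

The equation for q_{kk'} is obtained the same way by differentiating J_k·J_{k'} = q_{kk'} l_k l_{k'}, which brings in the cross averages ⟨f_k u_{k'}⟩, ⟨f_{k'} u_k⟩, and ⟨f_k f_{k'}⟩.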
Slide 7
RESULTS
Hebbian
(known result)
(new result)
Slide 8
Perceptron
(known result)
(new result)
Slide 9
AdaTron
(known result)
(new result)
Slide 10
Generalization Error
[Figure: εg versus time for Hebbian, perceptron, and AdaTron learning; each panel compares Theory (K=1), Theory (K=3, MV), and Simulation (K=3, MV). Companion panels plot the normalized εg at t=50 against K⁻¹ (from K=∞ to K=1), comparing theory and simulation for both MV and WM.]
Slide 11
DISCUSSION
Similarity between teacher and student
Similarity among students
[Figure: geometry of the order parameters: the overlap R_k between the teacher B and student J_k, and the overlap q_kk' between students J_k and J_k'.]
Slide 12
[Figure: two ensembles with the same teacher B and students J_k, J_k'. When the students point in different directions, q_kk' is small; when they are nearly parallel, q_kk' is large.]
q is small → the effect of the ensemble is strong.
q is large → the effect of the ensemble is small.
Maintaining the variety of the students is therefore important in ensemble learning.
→ The relationship between R and q is essential.
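The claim that small q strengthens the ensemble can be checked numerically. In the order-parameter picture, the teacher field v and the student fields u_k are jointly Gaussian with ⟨v u_k⟩ = R and ⟨u_k u_k'⟩ = q; the sketch below estimates the majority-vote error by Monte Carlo. The specific values of R, q, K, and the sample size are illustrative choices, not the paper's.

```python
import numpy as np

def mv_error(R, q, K=3, n=200_000, seed=0):
    """Monte-Carlo estimate of the majority-vote generalization error when the
    teacher field v and student fields u_1..u_K are jointly Gaussian with
    <v u_k> = R and <u_k u_k'> = q (all fields unit variance)."""
    C = np.full((K + 1, K + 1), q)      # student-student correlations
    C[0, :] = C[:, 0] = R               # teacher-student correlations
    np.fill_diagonal(C, 1.0)
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(np.zeros(K + 1), C, size=n)
    teacher = np.sign(z[:, 0])
    majority = np.sign(np.sign(z[:, 1:]).sum(axis=1))   # K odd: no ties
    return float(np.mean(majority != teacher))

low = mv_error(R=0.7, q=0.50)    # varied students
high = mv_error(R=0.7, q=0.95)   # nearly identical students
# low < high: at fixed R, keeping q small strengthens the ensemble
```

At fixed teacher-student similarity R, the run with the smaller q gives the smaller ensemble error, which is exactly the "maintaining variety" effect discussed above.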
Slide 13
Dynamical behaviors of R and q
[Figure: time evolution of the overlaps R and q for Hebbian, perceptron, and AdaTron learning.]
Relationship between R and q
[Figure: the similarity q plotted against the similarity R for Hebbian, perceptron, and AdaTron learning.]
Slide 14
Analysis of Ensemble Learning
using Simple Perceptrons based
on Online Learning Theory
Seiji MIYOSHI 1 Kazuyuki HARA 2 Masato OKADA 3,4,5
1 Kobe City College of Tech.,
2 Tokyo Metropolitan College of Tech.,
3 University of Tokyo
4 RIKEN BSI,
5 Intelligent Cooperation and Control, PRESTO, JST