
Slide 1

Analysis of Ensemble Learning
using Simple Perceptrons based
on Online Learning Theory
Seiji MIYOSHI 1 Kazuyuki HARA 2 Masato OKADA 3,4,5
1 Kobe City College of Tech.,
2 Tokyo Metropolitan College of Tech.,
3 University of Tokyo
4 RIKEN BSI,
5 Intelligent Cooperation and Control, PRESTO, JST



Slide 2

ABSTRACT
Ensemble learning of K simple perceptrons, which
determine their outputs by sign functions, is
discussed within the framework of online learning
and statistical mechanics. One purpose of
statistical learning theory is to obtain the
generalization error theoretically. We show that
the ensemble generalization error can be
calculated by using two order parameters: the
similarity between the teacher and a student, and
the similarity among students. The differential
equations that describe the dynamical behaviors
of these order parameters are derived for general
learning rules. The concrete forms of these
differential equations are derived analytically
for three well-known rules: Hebbian learning,
perceptron learning, and AdaTron learning. The
ensemble generalization errors of the three rules
are calculated by using the solutions of these
differential equations. As a result, the three
rules show different characteristics in their
affinity for ensemble learning, that is, in
"maintaining variety among students". The results
show that AdaTron learning is superior to the
other two rules with respect to this affinity.



Slide 3

BACKGROUND
• Ensemble learning has recently attracted the
attention of many researchers. Ensemble learning
means combining many rules or learning machines
(hereafter called students) that individually
perform poorly. Theoretical studies analyzing the
generalization performance by means of statistical
mechanics have been pursued vigorously.
• Hara and Okada theoretically analyzed the case in
which the students are linear perceptrons.
• Hebbian learning, perceptron learning, and
AdaTron learning are well known as learning rules
for a nonlinear perceptron, which decides its
output by a sign function. Determining the
differences among ensemble learning with Hebbian
learning, perceptron learning, and AdaTron
learning is a very attractive problem, but one
that has never been analyzed.

OBJECTIVE
• We discuss ensemble learning of K simple
perceptrons within the framework of online
learning, keeping the number of students K finite.


Slide 4

MODEL
[Diagram: the teacher and the K students (1, 2, ..., K), all receiving the same input x.]

A common input x is presented to the teacher and to all
students in the same order.
An input x, once used for an update, is abandoned
(online learning).
The updates of the students are independent of each other.

Two methods are treated for deciding the ensemble output:
one is the majority vote (MV) of the students, and the
other is the weight mean (WM).

[Equations: definitions of the input x, the teacher B, the students J_k, and the length l_k of a student.]

Slide 5

THEORY
Generalization error εg:
the probability that the ensemble output disagrees with
that of the teacher for a new input x.

Similarity between teacher and student

Similarity among students
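The two similarities were shown as formulas in the original slide; in the standard notation they are the direction cosines

\begin{align}
  R_k = \frac{\boldsymbol{B}\cdot\boldsymbol{J}_k}{\lVert\boldsymbol{B}\rVert\,\lVert\boldsymbol{J}_k\rVert},
  \qquad
  q_{kk'} = \frac{\boldsymbol{J}_k\cdot\boldsymbol{J}_{k'}}{\lVert\boldsymbol{J}_k\rVert\,\lVert\boldsymbol{J}_{k'}\rVert},
\end{align}

and for a single student (K = 1) the well-known generalization error of a simple perceptron is $\epsilon_g = \frac{1}{\pi}\arccos R$.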


Slide 6

Differential equations describing l and R
(known result)

Differential equation describing q
(new result)

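The equations themselves are images in the source slides. As a sketch under the usual assumptions of online learning theory (thermodynamic limit $N\to\infty$, continuous time $t = m/N$, Gaussian internal potentials, $\langle\cdot\rangle$ the average over inputs), the known equations for $l$ and $R$ and the corresponding equation for $q$ take the form

\begin{align}
  \frac{dl_k}{dt} &= \langle f_k u_k\rangle + \frac{\langle f_k^2\rangle}{2l_k},\\
  \frac{dR_k}{dt} &= \frac{\langle f_k v\rangle - \langle f_k u_k\rangle R_k}{l_k}
                     - \frac{\langle f_k^2\rangle R_k}{2l_k^2},\\
  \frac{dq_{kk'}}{dt} &= \frac{\langle f_k u_{k'}\rangle - \langle f_k u_k\rangle q_{kk'}}{l_k}
    + \frac{\langle f_{k'} u_k\rangle - \langle f_{k'} u_{k'}\rangle q_{kk'}}{l_{k'}}
    + \frac{\langle f_k f_{k'}\rangle}{l_k l_{k'}}
    - \left(\frac{\langle f_k^2\rangle}{2l_k^2} + \frac{\langle f_{k'}^2\rangle}{2l_{k'}^2}\right) q_{kk'},
\end{align}

which follow from expanding $\lVert\boldsymbol{J}_k\rVert^2$, $\boldsymbol{B}\cdot\boldsymbol{J}_k$, and $\boldsymbol{J}_k\cdot\boldsymbol{J}_{k'}$ under the update rule; the exact forms should be checked against the paper.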


Slide 7

RESULTS
Hebbian

(known result)

(new result)

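The Hebbian formulas are likewise missing from the transcript. For Hebbian learning, $f_k = \operatorname{sgn}(v)$, and since $u_k$ and $v$ are zero-mean, unit-variance Gaussians with correlation $R_k$, the required averages are elementary:

\begin{align}
  \langle f_k v\rangle = \sqrt{\tfrac{2}{\pi}}, \qquad
  \langle f_k u_k\rangle = \sqrt{\tfrac{2}{\pi}}\,R_k, \qquad
  \langle f_k^2\rangle = 1, \qquad
  \langle f_k u_{k'}\rangle = \sqrt{\tfrac{2}{\pi}}\,R_{k'}, \qquad
  \langle f_k f_{k'}\rangle = 1.
\end{align}

Substituting these into the general differential equations gives the concrete Hebbian forms.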


Slide 8

Perceptron

(known result)


(new result)
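For perceptron learning, $f_k = \Theta(-u_k v)\operatorname{sgn}(v)$; that is, a student is updated only when its output is wrong. The standard single-student averages are

\begin{align}
  \langle f_k v\rangle = \frac{1 - R_k}{\sqrt{2\pi}}, \qquad
  \langle f_k u_k\rangle = -\frac{1 - R_k}{\sqrt{2\pi}}, \qquad
  \langle f_k^2\rangle = \frac{\arccos R_k}{\pi};
\end{align}

the cross terms $\langle f_k f_{k'}\rangle$ and $\langle f_k u_{k'}\rangle$ entering the new equation for $q$ require three-dimensional Gaussian averages over $(u_k, u_{k'}, v)$ and are not reproduced here.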


Slide 9

AdaTron

(known result)

(new result)
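For AdaTron learning, $f_k = -l_k u_k\,\Theta(-u_k v)$: on an error, the student's internal potential is driven toward zero. The single-student averages follow from the Gaussian integrals $\mathrm{E}[u^2\Theta(-uv)]$ and $\mathrm{E}[uv\,\Theta(-uv)]$:

\begin{align}
  \langle f_k u_k\rangle &= -\frac{l_k}{\pi}\left(\arccos R_k - R_k\sqrt{1-R_k^2}\right),\\
  \langle f_k v\rangle &= \frac{l_k}{\pi}\left(\sqrt{1-R_k^2} - R_k\arccos R_k\right),\\
  \langle f_k^2\rangle &= \frac{l_k^2}{\pi}\left(\arccos R_k - R_k\sqrt{1-R_k^2}\right).
\end{align}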


Slide 10

[Figures: generalization error εg versus time t (0 to 10) for Hebbian, perceptron, and AdaTron learning. Each panel shows Theory (K=1), Theory (K=3, MV), and Simulation (K=3, MV), with εg falling from about 0.5. Adjacent panels show the normalized εg at t=50 versus K^-1 (from K=∞ at 0 to K=1 at 1), with curves for MV theory, MV simulation, WM theory, and WM simulation; the Hebbian vertical axis spans roughly 0.96 to 1.02, while the perceptron and AdaTron axes span roughly 0.6 to 1.]
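As a cross-check of the theory-versus-simulation panels above, here is a minimal simulation sketch; it is not the authors' code. It assumes Gaussian inputs with component variance 1/N, a fixed random teacher, K independently initialized students, the three learning functions named on the slides, and the majority vote (MV) for the ensemble output.

import numpy as np

def simulate(rule="adatron", K=3, N=1000, t_max=10.0, seed=0):
    """Online ensemble learning of K simple perceptrons with one teacher.

    Records rows (t, mean R, mean q, MV generalization error), where
    t = m/N is the rescaled time, R is the teacher-student overlap,
    and q is the student-student overlap.
    """
    rng = np.random.default_rng(seed)
    B = rng.standard_normal(N)                           # teacher weights
    J = rng.standard_normal((K, N))                      # students, l_k ~ 1
    x_test = rng.standard_normal((2000, N)) / np.sqrt(N) # held-out inputs
    rows = []
    steps = int(t_max * N)
    for m in range(steps + 1):
        if m % (N // 2) == 0:                            # record twice per unit time
            norms = np.linalg.norm(J, axis=1)
            R = (J @ B) / (norms * np.linalg.norm(B))
            Q = (J @ J.T) / np.outer(norms, norms)
            q = Q[np.triu_indices(K, 1)].mean() if K > 1 else 1.0
            mv = np.sign(np.sign(x_test @ J.T).sum(axis=1))  # majority vote
            eg = np.mean(mv != np.sign(x_test @ B))          # MV error rate
            rows.append((m / N, R.mean(), q, eg))
        x = rng.standard_normal(N) / np.sqrt(N)          # fresh input (online)
        v = B @ x                                        # teacher potential ~ N(0,1)
        s = J @ x                                        # student potentials u_k l_k
        err = np.sign(s) != np.sign(v)                   # which students are wrong
        if rule == "hebbian":
            f = np.full(K, np.sign(v))                   # f_k = sgn(v)
        elif rule == "perceptron":
            f = np.where(err, np.sign(v), 0.0)           # update only on error
        else:                                            # AdaTron
            f = np.where(err, -s, 0.0)                   # f_k = -u_k l_k on error
        J += f[:, None] * x                              # independent updates
    return rows

For example, comparing simulate("hebbian"), simulate("perceptron"), and simulate("adatron") should reproduce the qualitative behavior of the panels above: all three errors decay with time, and the ensemble (K=3, MV) improves least where q grows fastest.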


Slide 11

DISCUSSION

Similarity between teacher and student

[Diagram: the teacher B and a student J_k, with overlap R_k.]

Similarity among students

[Diagram: the teacher B and two students J_k and J_k', with overlap q_kk' between the students.]


Slide 12

[Diagrams: two students J_k and J_k' around the teacher B, with small overlap q_kk' (left) and large overlap q_kk' (right).]

When q is small, the effect of the ensemble is strong.
When q is large, the effect of the ensemble is small.

Maintaining the variety of students is important in
ensemble learning.
→ The relationship between R and q is therefore essential.


Slide 13

Dynamical behaviors of R and q

[Figures: overlaps R and q versus time t (0 to 10) for Hebbian, perceptron, and AdaTron learning; in each panel both overlaps grow from 0.]

Relationship between R and q

[Figure: similarity q versus similarity R (both 0 to 1), with one curve each for Hebbian, perceptron, and AdaTron learning.]


Slide 14