A Simulated-annealing-based Approach for
Simultaneous Parameter Optimization and
Feature Selection of Back-Propagation
Networks (BPN)
Shih-Wei Lin, Tsung-Yuan Tseng,
Shuo-Yan Chou, Shih-Chieh Chen
National Taiwan University of
Science and Technology
Expert Systems with Applications 2008
Introduction

- The back-propagation network (BPN) can be used in various fields, e.g.:
  - evaluating consumer loans
  - diagnosing heart disease
- Different problems may require different parameter settings for network architectures.
- Rules of thumb or "trial and error" methods are usually used to determine them.
Introduction

- Not all features are beneficial for classification in BPN.
- Selecting the beneficial subset of features results in better classification.
- This work proposes a simulated-annealing (SA)-based approach to obtain the optimal parameter settings for the network architecture of BPN.
BPN

Before applying BPN to solve a problem, the parameter settings for the network architecture must be determined:
(1) number of hidden layers
(2) learning rate
(3) momentum term
(4) number of hidden neurons
(5) learning cycles
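These settings can be grouped into a single configuration record, sketched below in C (the paper reports a C implementation, but this struct and its default values are hypothetical, not the authors' code):

#include <stdio.h>

/* Hypothetical container for the BPN settings listed above. */
struct bpn_config {
    int    hidden_layers;   /* number of hidden layers       */
    double learning_rate;   /* step size of weight updates   */
    double momentum;        /* weight of the previous update */
    int    hidden_neurons;  /* neurons per hidden layer      */
    int    learning_cycles; /* training epochs               */
};

int main(void)
{
    struct bpn_config cfg = { 1, 0.25, 0.65, 10, 500 };
    printf("lr=%.2f momentum=%.2f neurons=%d cycles=%d\n",
           cfg.learning_rate, cfg.momentum,
           cfg.hidden_neurons, cfg.learning_cycles);
    return 0;
}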
Feature Selection

The main benefits of feature selection are
as follows:
(1) Reducing computational cost and
storage requirements
(2) Dealing with the degradation of
classification efficiency due to the
finiteness of training sample sets
(3) Reducing training and prediction time
(4) Facilitating data understanding and
visualization
Problems

While using BPN, we confront two problems:
- How to set the best parameters for BPN?
- How to choose the input attributes (features) for BPN?
The proposed SA-based approach not only provides the best parameter settings for the network architecture of BPN, but also finds the beneficial subset of features for different problems.
BPN

BPN is a common neural network model whose architecture is the multilayer perceptron (MLP).
Learning rate of BPN

Learning rate:
1. Too high a learning rate will cause the network to oscillate and make it hard to converge.
2. Too low a learning rate will cause slow convergence and may fall into a local optimum.
Momentum term of BPN

Momentum term:
1. Too small a momentum term has no obvious effect and cannot increase the classification accuracy rate.
2. Too large a momentum term can dominate the learning and cause excessive weight modification.
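A minimal C sketch of where the learning rate (eta) and the momentum term (alpha) enter the weight update; the toy gradient of f(w) = ||w||^2 stands in for a real back-propagated gradient:

#include <stdio.h>

#define N 2

/* One update step: the new change is -eta * gradient plus
 * alpha times the previous change (the momentum term).    */
static void update_weights(double w[N], const double grad[N],
                           double prev_delta[N], double eta, double alpha)
{
    for (int i = 0; i < N; i++) {
        double delta = -eta * grad[i] + alpha * prev_delta[i];
        w[i] += delta;
        prev_delta[i] = delta;
    }
}

int main(void)
{
    double w[N] = { 1.0, -2.0 };
    double delta[N] = { 0.0, 0.0 };
    for (int step = 0; step < 50; step++) {
        double grad[N] = { 2.0 * w[0], 2.0 * w[1] }; /* gradient of ||w||^2 */
        update_weights(w, grad, delta, 0.25, 0.65);
    }
    printf("w = (%g, %g)\n", w[0], w[1]); /* approaches (0, 0) */
    return 0;
}

With moderate values such as these, the iterates converge smoothly; pushing eta or alpha too high makes them oscillate, matching the behavior described on the two slides above.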
Number of hidden neurons of BPN

Number of hidden neurons:
1. Too few hidden neurons tend to cause a larger error.
2. Adding more hidden neurons slows convergence and computation while providing almost no further help in reducing errors.
Learning cycle of BPN

Learning cycle:
1. Too many learning cycles will result in over-fitting.
2. Too few learning cycles lead to insufficient training and a worse classification accuracy rate on the testing data.
Some Solutions

- Searching for the optimal weights after training
- Searching for the optimal parameter settings of BPN
- Neural network pruning
Simulated Annealing

Proposed by Kirkpatrick et al. (1983); first used by Kakuno et al.
1. Pick a random assignment.
2. Make a small change.
3. Accept the change if the cost is decreased, or according to other criteria.
Simulated-annealing flow

Start from an initial random assignment and repeat: make a small change; if it is accepted, update the current solution; after enough moves at the current temperature, drop the temperature; when the termination condition is met, return the optimized solution.
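A minimal C sketch of this loop, using the standard Metropolis acceptance rule (a worse solution is accepted with probability exp(-delta/T)); the cost function, move size, cooling rate, and temperature bounds are placeholder assumptions, not the paper's settings:

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Placeholder cost: minimize (x - 3)^2. */
static double cost(double x) { return (x - 3.0) * (x - 3.0); }

static double simulated_annealing(double init, double t0,
                                  double cooling, double t_min)
{
    double current = init, current_cost = cost(init);
    double best = current, best_cost = current_cost;

    for (double t = t0; t > t_min; t *= cooling) {     /* temperature dropping */
        for (int i = 0; i < 10; i++) {                 /* moves per temperature */
            double cand = current +                    /* make a small change */
                          ((double)rand() / RAND_MAX - 0.5);
            double delta = cost(cand) - current_cost;
            if (delta < 0.0 ||
                (double)rand() / RAND_MAX < exp(-delta / t)) {
                current = cand;                        /* update current solution */
                current_cost += delta;
                if (current_cost < best_cost) {
                    best = current;
                    best_cost = current_cost;
                }
            }
        }
    }
    return best;                                       /* optimized solution */
}

int main(void)
{
    printf("best x = %g (expect about 3)\n",
           simulated_annealing(0.0, 100.0, 0.95, 1e-3));
    return 0;
}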
Solution representation

- The first variable is the learning rate.
- The second variable is the momentum term.
- The third variable is the number of hidden neurons.
- The remaining variables represent the feature selection.
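As an illustration of this layout (all names are assumptions, and the 0/1 encoding of the feature part is inferred, not stated on the slide), a flat candidate vector can be decoded like this:

#include <stdio.h>

/* Decode a flat candidate vector laid out as described above:
 * sol[0] = learning rate, sol[1] = momentum term,
 * sol[2] = number of hidden neurons, sol[3..] = feature flags. */
static void decode_solution(const double *sol, int n_features)
{
    printf("lr=%.2f momentum=%.2f neurons=%d selected features:",
           sol[0], sol[1], (int)sol[2]);
    for (int i = 0; i < n_features; i++)
        if (sol[3 + i] >= 0.5)   /* treat >= 0.5 as "feature kept" */
            printf(" %d", i);
    printf("\n");
}

int main(void)
{
    /* 0.30 learning rate, 0.70 momentum, 12 hidden neurons,
     * features 0 and 2 selected out of four.                 */
    double sol[] = { 0.30, 0.70, 12, 1, 0, 1, 0 };
    decode_solution(sol, 4);
    return 0;
}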
Parameter ranges

- SA was run for 300 iterations to find the optimal BPN parameter settings.
- The learning rate ranged from 0 to 0.45.
- The momentum term ranged from 0.4 to 0.9.
- The learning cycle of BPN was set to 500.
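A sketch of drawing a random SA starting point inside the ranges reported on this slide; the feature count and the upper bound on the number of hidden neurons are assumed values, not from the paper:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N_FEATURES 9   /* assumed feature count           */
#define MAX_HIDDEN 50  /* assumed bound on hidden neurons */

static double uniform(double lo, double hi)
{
    return lo + (hi - lo) * ((double)rand() / RAND_MAX);
}

/* Random starting solution within the reported ranges:
 * learning rate in [0, 0.45], momentum term in [0.4, 0.9]. */
static void random_solution(double *sol)
{
    sol[0] = uniform(0.0, 0.45);        /* learning rate  */
    sol[1] = uniform(0.4, 0.9);         /* momentum term  */
    sol[2] = 1 + rand() % MAX_HIDDEN;   /* hidden neurons */
    for (int i = 0; i < N_FEATURES; i++)
        sol[3 + i] = rand() % 2;        /* feature bits   */
}

int main(void)
{
    double sol[3 + N_FEATURES];
    srand((unsigned)time(NULL));
    random_solution(sol);
    printf("lr=%.3f momentum=%.3f neurons=%d\n",
           sol[0], sol[1], (int)sol[2]);
    return 0;
}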
Platform

- Using the C language
- Windows XP operating system
- Pentium IV 3.0 GHz CPU
- 512 MB of RAM
Cross-Validation

- To guarantee that the results are valid and can be generalized for predictions on new data, k-fold cross-validation is used.
- This study used k = 10: all of the data are divided into ten parts, each of which takes a turn as the testing data set.
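A minimal sketch of the 10-fold partitioning described above (the 100-sample size is hypothetical); each fold takes one contiguous block as the testing set and trains on the rest:

#include <stdio.h>

/* Print the test-index range of each fold in k-fold
 * cross-validation over n_samples data points.      */
static void k_fold_splits(int n_samples, int k)
{
    int base = n_samples / k, extra = n_samples % k, start = 0;
    for (int fold = 0; fold < k; fold++) {
        int len = base + (fold < extra ? 1 : 0);
        printf("fold %2d: test = [%d, %d), train = the other %d samples\n",
               fold, start, start + len, n_samples - len);
        start += len;
    }
}

int main(void)
{
    k_fold_splits(100, 10);  /* hypothetical 100-sample dataset */
    return 0;
}

In practice the data would be shuffled (or stratified by class) before splitting, so that each fold is representative.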
Datasets
System architecture
10-fold classification results for the Breast Cancer dataset
Comparison results of approaches without feature selection
The SA + BPN approach with feature selection versus other approaches
Summary of experimental results with/without feature selection on the datasets
Conclusion

- We proposed an SA-based approach to simultaneously select the feature subset and set the parameters for BPN classification.
- Compared with previous studies, the classification accuracy rates of the proposed SA + BPN approach are better than those of other approaches.
Thank You
Q&A