Multilayer feed-forward artificial neural networks for class-modeling
F. Marini, A. Magrì, R. Bucci
Dept. of Chemistry - University of Rome “La Sapienza”
The starting question….
[Bar chart: ANN papers published per period, 1982-2002, rising from 1 in 1982 to 4916 in 2001-2002]
Although the literature on NNs has increased significantly, no paper considers the possibility of performing class-modeling
class modeling: what….
[Diagram: classification vs. class modeling]
• Class modeling considers one class at a time
• Any object can then belong or not belong to that specific class model
• As a consequence, any object can be assigned to only one class, to more than one class, or to no class at all
…..and why
• Flexibility
• Additional information:
– sensitivity: fraction of samples from category X accepted by the model of category X
– specificity: fraction of samples from category Y (or Z, W, …) refused by the model of category X
• No need to rebuild the existing models each time a new category is added
• A less equivocal answer to the question: "are the analytical data compatible with the product being X as declared?"
A first step forward
• A particular kind of NN, after suitable modifications, could be used for performing class-modeling (Anal. Chim. Acta 544 (2005) 306):
– Kohonen SOM
– Addition of dummy random vectors to the training set
– Computation of a suitable (non-parametric) probability distribution after mapping onto the 2D Kohonen layer
– Definition of the category space based on this distribution
In this communication…
…The possibility of using a different type of neural network (multilayer feed-forward) to perform class-modeling is studied:
– How to?
– Examples
Just a few words about NN
"Many are the wonders, but nothing is more wondrous than man."
Sophocles, Antigone
NN: a mathematical approach
• From a computational point of view, ANNs represent a way to operate a non-linear functional mapping between an input and an output space:

$\mathbf{y} = f(\mathbf{x})$

• This functional relation is expressed in an implicit way (via a combination of suitably weighted non-linear functions, in the case of MLF-NNs)
• ANNs are usually represented as groups of elementary computational units (neurons) that perform the same operations simultaneously
• Types of NN differ in how the neurons are grouped and how they operate
Multilayer feed-forward NN
• Individual processing units are organized in three types of layer: input, hidden, and output
• All neurons within the same layer operate simultaneously
[Diagram: multilayer feed-forward network with input layer (x1, …, x5), one hidden layer, and output layer (y1, …, y4)]
The artificial neuron
[Diagram: artificial neuron k in the hidden layer; inputs x1, x2, x3 enter through weights w1k, w2k, w3k, are summed (Σ), and pass through the transfer function f() to give zk]

$z_k = f\left(\sum_i w_{ik}\, x_i + w_{0k}\right)$
The artificial neuron
[Diagram: artificial neuron j in the output layer; hidden-node outputs z1, z2, z3 enter through weights w1j, w2j, w3j, are summed (Σ), and pass through f() to give yj]

$y_j = f\left(\sum_k w_{kj}\, z_k + w_{0j}\right) = f\left(\sum_k w_{kj}\, f\left(\sum_i w_{ik}\, x_i + w_{0k}\right) + w_{0j}\right)$
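As a rough illustration of these two equations, here is a minimal NumPy sketch of the forward pass; the sigmoid transfer function and the 5-3-4 layer sizes are assumptions chosen to match the diagrams, not values given in the slides:

```python
import numpy as np

def sigmoid(a):
    """A common choice for the non-linear transfer function f()."""
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, W_hid, w0_hid, W_out, w0_out):
    """Forward pass of a multilayer feed-forward network:
    z_k = f(sum_i w_ik * x_i + w_0k),  y_j = f(sum_k w_kj * z_k + w_0j)."""
    z = sigmoid(W_hid @ x + w0_hid)  # hidden-layer outputs z_k
    y = sigmoid(W_out @ z + w0_out)  # output-layer outputs y_j
    return z, y

# Illustrative sizes: 5 inputs, 3 hidden neurons, 4 outputs
rng = np.random.default_rng(0)
x = rng.normal(size=5)
W_hid, w0_hid = rng.normal(size=(3, 5)), rng.normal(size=3)
W_out, w0_out = rng.normal(size=(4, 3)), rng.normal(size=4)
z, y = forward(x, W_hid, w0_hid, W_out, w0_out)
```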
Training
• Iterative variation of the connection weights, to minimize an error criterion
• Usually, the backpropagation algorithm is used:
$\Delta_P w_{ij}(t) = -\eta\,\frac{\partial E_P}{\partial w_{ij}} + \mu\,\Delta_P w_{ij}(t-1)$
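A minimal sketch of this update rule; the learning-rate and momentum values are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def weight_update(w, dE_dw, delta_prev, eta=0.1, mu=0.9):
    """Backpropagation weight update with momentum:
    Delta_w(t) = -eta * dE/dw + mu * Delta_w(t-1)."""
    delta = -eta * dE_dw + mu * delta_prev
    return w + delta, delta  # updated weights and Delta_w(t) for the next step
```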
MLF class-modeling: what to do?
• The model for each category has to be built using only training samples from that category
• A suitable definition of the category space is needed
Somewhere to start from
[Diagram: autoassociative network with layers labeled Input, Hidden, Input; inputs x1, x2, x3, …, xj, …, xm; annotation: output value of hidden node 1]
When the targets are set equal to the input values, the hidden nodes can be thought of as a sort of non-linear principal components
… and a first ending point
• For each category, a neural network model is computed providing the input vector also as the desired target vector (Ninp-Nhid-Ninp architecture)
• The number of hidden neurons is estimated by LOO-CV (minimum reconstruction error in prediction)
• The optimized model is then used to predict unknown samples:
– The sample is presented to the network
– The vector of predicted responses (which is an estimate of the original input vector) is computed
– The prediction error is calculated and compared to the average prediction error for samples belonging to the category (as in SIMCA)
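A sketch of this per-category model selection, using scikit-learn's MLPRegressor as a stand-in for the authors' network; the candidate hidden-layer sizes, the logistic activation, and the iteration cap are assumptions:

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.neural_network import MLPRegressor

def fit_category_model(X_cat, candidates=(1, 2, 3, 4, 5)):
    """Train Ninp-Nhid-Ninp autoassociative networks on one category's
    samples (targets = inputs) and pick Nhid by leave-one-out CV,
    i.e. minimum reconstruction error in prediction."""
    def make_net(nhid):
        return MLPRegressor(hidden_layer_sizes=(nhid,), activation="logistic",
                            max_iter=5000, random_state=0)

    cv_err = {}
    for nhid in candidates:
        errs = []
        for tr, te in LeaveOneOut().split(X_cat):
            net = make_net(nhid).fit(X_cat[tr], X_cat[tr])  # targets = inputs
            errs.append(np.mean((X_cat[te] - net.predict(X_cat[te])) ** 2))
        cv_err[nhid] = np.mean(errs)

    best = min(cv_err, key=cv_err.get)          # minimum LOO reconstruction error
    return make_net(best).fit(X_cat, X_cat), best
```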
NN-CM in practice
• Separate category autoscaling
• Training on class C: $\mathbf{X}^{C}_{train} \rightarrow N^{C}_{hid};\; \mathbf{W}^{C};\; s^{2}_{0,C}$
• Prediction of test sample $i$: $\hat{\mathbf{x}}^{C}_{test,i} = f(\mathbf{x}_{test,i};\, \mathbf{W}^{C}) \rightarrow s^{2}_{i,C}$, with

$s^{2}_{i,C} = (\mathbf{x}_{test,i} - \hat{\mathbf{x}}^{C}_{test,i})^{T}\,(\mathbf{x}_{test,i} - \hat{\mathbf{x}}^{C}_{test,i}) / N_{V}$

• $F_{i,C} = \dfrac{s^{2}_{i,C}}{s^{2}_{0,C}}$
• If $p(F \geq F_{i,C})$ is lower than a predefined threshold, the sample is refused by the category model.
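A sketch of this SIMCA-like decision rule, assuming a network fitted as in the previous sketch and a precomputed average training reconstruction error s2_0C; the F-distribution degrees of freedom are left as parameters because the slides do not specify them:

```python
import numpy as np
from scipy.stats import f as f_dist

def is_accepted(x_test, net, s0_sq, dfn, dfd, alpha=0.05):
    """Accept/refuse a test sample for category C: reconstruct the sample,
    form F_iC = s2_iC / s2_0C, and refuse when p(F >= F_iC) < alpha."""
    x_hat = net.predict(x_test.reshape(1, -1)).ravel()
    s_i_sq = np.sum((x_test - x_hat) ** 2) / x_test.size  # s2_iC
    F_ic = s_i_sq / s0_sq                                 # F_iC
    p = f_dist.sf(F_ic, dfn, dfd)                         # p(F >= F_iC)
    return p >= alpha                                     # True = accepted
```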
A couple of examples
The classical X-OR
• 200 training samples:
– 100 class 1
– 100 class 2
• 200 test samples:
– 100 class 1
– 100 class 2
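The slides give only the class sizes; as an illustrative assumption, the classic two-class XOR layout can be simulated with Gaussian clusters in opposite quadrants (centres and spread are not from the original):

```python
import numpy as np

def xor_classes(n_per_class=100, spread=0.3, seed=1):
    """Two-class XOR data: class 1 in the (+,+) and (-,-) quadrants,
    class 2 in the (+,-) and (-,+) quadrants."""
    rng = np.random.default_rng(seed)
    half = n_per_class // 2
    c1 = np.vstack([rng.normal([+1, +1], spread, (half, 2)),
                    rng.normal([-1, -1], spread, (half, 2))])
    c2 = np.vstack([rng.normal([+1, -1], spread, (half, 2)),
                    rng.normal([-1, +1], spread, (half, 2))])
    return c1, c2

train_1, train_2 = xor_classes()  # 100 samples per class, as in the slides
```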
Results
• 3 hidden neurons for each category
• Sensitivity:
– 100% class 1, 100% class 2
• Specificity:
– 75% class 1 vs class 2
– 67% class 2 vs class 1
• Prediction ability:
– 87% class 1
– 83% class 2
– 85% overall
• These results are significantly better than those obtained with SIMCA and UNEQ (specificities lower than 30% and classification rates only slightly higher than 60%)
A very small data set: honey
CM of honey samples
• 76 samples of honey from 6 different botanical origins (honeydew, wildflower, sulla, heather, eucalyptus, and chestnut)
• 11-13 samples per class
• 2 input variables: specific rotation; total acidity
• Despite the small number of samples, a good NN model was obtained (2 hidden neurons for each class)
• Possibility of drawing a Coomans' plot (sketched below)
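A hypothetical sketch of such a Coomans' plot, given per-sample distances to two category models; the distance values and critical limits are assumed to come from a procedure like the F-test above:

```python
import matplotlib.pyplot as plt

def coomans_plot(d_a, d_b, crit_a, crit_b, labels=("model A", "model B")):
    """Coomans' plot: distance of each sample to two category models;
    the critical limits split the plane into four regions
    (accepted by A only, by B only, by both, or by neither)."""
    plt.scatter(d_a, d_b, s=20)
    plt.axvline(crit_a, linestyle="--")  # acceptance limit for model A
    plt.axhline(crit_b, linestyle="--")  # acceptance limit for model B
    plt.xlabel(f"distance to {labels[0]}")
    plt.ylabel(f"distance to {labels[1]}")
    plt.show()
```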
Further work and Conclusions
• A novel approach to class-modeling based on multilayer feed-forward NNs was presented
• Preliminary results seem to indicate its usefulness in cases where traditional class-modeling fails
• The effect of training-set size should be further investigated (our "small" data set was too good to be used for obtaining a definitive answer)
• We are analyzing other "exotic" classification data sets where traditional methods fail
Acknowledgements
• Prof. Jure Zupan, Slovenia