#### Transcript Data Mining Techniques 1

```RBF Neural Networks
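[Figure: + and - examples in the (x1, x2) plane; circles 1 and 2 enclose the + examples, and the - examples lie outside both circles]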
Examples inside circles 1 and 2 are of class +,
examples outside both circles are of class –
What NN does the job of separating the two
classes?
NN 3
1
Example 1
Let t1, t2 and r1, r2 be the centers and radii of circles 1
and 2, respectively, and let x = (x1, x2) be an example.
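[Figure: network with inputs x1 and x2, hidden units φ_t1 and φ_t2, and output y; both hidden-to-output weights are 1]
φ_t1(x) = 1 if the distance of x from t1 is less than r1, and 0 otherwise
φ_t2(x) = 1 if the distance of x from t2 is less than r2, and 0 otherwise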
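Example 1
[Figure: feature space with axes φ_t1 and φ_t2; the + examples of circle 1 lie at (1,0), the + examples of circle 2 at (0,1), and the - examples at the origin]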
Geometrically: examples are mapped to the feature space
(φ_t1, φ_t2): examples in circle 2 are mapped to (0,1),
examples in circle 1 are mapped to (1,0), and examples
outside both circles are mapped to (0,0).
The two classes become linearly separable in the
(φ_t1, φ_t2) feature space!
RBF ARCHITECTURE
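[Figure: inputs x1, x2, ..., xm feed hidden units φ_1, ..., φ_m1; their outputs are weighted by w_1, ..., w_m1 and summed to give the output y]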
• One hidden layer with RBF activation functions φ_1, ..., φ_m1
• Output layer with linear activation function.
y = w_1 φ_1(||x - t_1||) + ... + w_m1 φ_m1(||x - t_m1||)
||x - t|| is the distance of x = (x1, ..., xm) from the vector t
Other Types of φ
• Inverse multiquadrics
φ(r) = 1 / (r^2 + c^2)^(1/2),  c > 0
• Gaussian functions (most used)
φ(r) = exp(-r^2 / (2σ^2)),  σ > 0
Gaussian RBF φ
σ is a measure of how spread out the curve is around its center:
[Figure: Gaussian curves φ around the center; a small σ gives a narrow curve, a large σ a wide one]
HIDDEN NEURON MODEL
• Hidden units: use radial basis functions
φ(||x - t||)
the output depends on the distance of
the input x from the center t
[Figure: inputs x1, x2, ..., xm feed a hidden unit with output φ(||x - t||)]
t is called the center
Hidden Neurons
• A hidden neuron is more sensitive to data
points near its center.
• For Gaussian RBF this sensitivity may be
Example: the XOR problem
• Input space: the points (0,0), (0,1), (1,0) and (1,1) in the (x1, x2) plane
• Output space: y takes the values 0 and 1
• Construct an RBF pattern classifier such that:
(0,0) and (1,1) are mapped to 0, class C1
(1,0) and (0,1) are mapped to 1, class C2
Example: the XOR problem
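• In the feature (hidden layer) space:
φ_1(||x - t_1||) = exp(-||x - t_1||^2)
φ_2(||x - t_2||) = exp(-||x - t_2||^2)
with t_1 = (1,1) and t_2 = (0,0)
[Figure: feature space with axes φ_1 and φ_2, ticks at 0.5 and 1.0; it shows the images of (1,1), of (0,0), and of (0,1) and (1,0), which coincide, together with a linear decision boundary]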
• When mapped into the feature space <φ_1, φ_2> (hidden layer), C1
and C2 become linearly separable. So a linear classifier with φ_1(x) and
φ_2(x) as inputs can be used to solve the XOR problem.
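RBF NN for the XOR problem
φ_1(||x - t_1||) = exp(-||x - t_1||^2)
φ_2(||x - t_2||) = exp(-||x - t_2||^2)
with t_1 = (1,1) and t_2 = (0,0)
[Figure: inputs x1 and x2 feed the hidden units centered at t1 and t2; each hidden unit is connected to the output y with weight -1, and the output has bias +1]
y = -exp(-||x - t_1||^2) - exp(-||x - t_2||^2) + 1
If y > 0 then class 1, otherwise class 0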
Application: FACE RECOGNITION
• The problem:
– Face recognition of persons of a known group in
an indoor environment.
• The approach:
– Learn face classes over a wide range of poses
using an RBF network.
• See the PhD thesis by Jonathan Howell
http://www.cogs.susx.ac.uk/users/jonh/index.html
Dataset
• Sussex database (University of Sussex)
– 100 images of 10 people (8-bit grayscale, resolution
384 x 287)
– for each individual, 10 images of the head in different
poses, from face-on to profile
– Designed to assess the performance of face recognition
techniques when pose variations occur
Datasets (Sussex)
[Figure: all ten images for classes 0-3 from the Sussex database, nose-centred and subsampled to 25x25 before preprocessing]
RBF: parameters to learn
• What do we have to learn for an RBF NN
with a given architecture?
– The centers of the RBF activation functions
– The spreads of the Gaussian RBF activation
functions
– The weights from the hidden to the output layer
• Different learning algorithms may be used for
learning the RBF network parameters. We
describe three possible learning methods.
Learning Algorithm 1
• Centers: are selected at random
– centers are chosen randomly from the training set
• Spreads: are chosen by normalization:
σ = d_max / sqrt(2 m1)
where d_max is the maximum distance between any 2 centers and m1 is the number of centers
• Then the activation function of hidden neuron i
becomes:
φ_i(||x - t_i||^2) = exp(-(m1 / d_max^2) ||x - t_i||^2)
Learning Algorithm 1
• Weights: are computed by means of the
pseudo-inverse method.
– For an example (x_i, d_i) consider the output of
the network
y(x_i) = w_1 φ_1(||x_i - t_1||) + ... + w_m1 φ_m1(||x_i - t_m1||)
– We would like y(x_i) = d_i for each example, that
is
w_1 φ_1(||x_i - t_1||) + ... + w_m1 φ_m1(||x_i - t_m1||) = d_i
Learning Algorithm 1
• This can be rewritten in matrix form for one example:
[φ_1(||x_i - t_1||) ... φ_m1(||x_i - t_m1||)] [w_1 ... w_m1]^T = d_i
and for all the examples at the same time:
[ φ_1(||x_1 - t_1||) ... φ_m1(||x_1 - t_m1||) ]
[ ...                                          ] [w_1 ... w_m1]^T = [d_1 ... d_N]^T
[ φ_1(||x_N - t_1||) ... φ_m1(||x_N - t_m1||) ]
Learning Algorithm 1
Let
Φ = [ φ_1(||x_1 - t_1||) ... φ_m1(||x_1 - t_m1||) ]
    [ ...                                          ]
    [ φ_1(||x_N - t_1||) ... φ_m1(||x_N - t_m1||) ]
then we can write
Φ [w_1 ... w_m1]^T = [d_1 ... d_N]^T
If Φ^+ is the pseudo-inverse of the matrix Φ,
we obtain the weights using the following
formula
[w_1 ... w_m1]^T = Φ^+ [d_1 ... d_N]^T
Learning Algorithm 1: summary
1. Choose the centers randomly from the
training set.
2. Compute the spread for the RBF function
using the normalization method.
3. Find the weights using the pseudo-inverse
method.
Learning Algorithm 2: Centers
• Clustering algorithm for finding the centers
1 Initialization: t_k(0) random, k = 1, ..., m1
2 Sampling: draw x from the input space
3 Similarity matching: find the index of the center closest to x
k(x) = arg min_k ||x(n) - t_k(n)||
4 Updating: adjust the centers, with learning rate η
t_k(n+1) = t_k(n) + η [x(n) - t_k(n)]   if k = k(x)
t_k(n+1) = t_k(n)                       otherwise
5 Continuation: increment n by 1, go to 2, and continue
until no noticeable changes of centers occur
Learning Algorithm 2: summary
• Hybrid Learning Process:
• Clustering for finding the centers.
• LMS algorithm for finding the weights.
Learning Algorithm 3
• Apply the gradient descent method for finding
centers, spreads and weights, by minimizing the
(instantaneous) squared error
E = (1/2) (y(x) - d)^2
• Update for:
centers:  Δt_j = -η_tj ∂E/∂t_j
spreads:  Δσ_j = -η_σj ∂E/∂σ_j
weights:  Δw_ij = -η_ij ∂E/∂w_ij
Comparison with FF NN
RBF-Networks are used for regression and for
performing complex (non-linear) pattern classification
Comparison between RBF networks and FFNN:
• Both are examples of non-linear layered feed-forward
networks.
• Both are universal approximators.
Comparison with multilayer NN
• Architecture:
– RBF networks have one single hidden layer.
– FFNN networks may have more hidden layers.
• Neuron Model:
– In RBF the neuron model of the hidden neurons is different from the
one of the output nodes.
– Typically in FFNN hidden and output neurons share a common
neuron model.
– The hidden layer of RBF is non-linear, the output layer of RBF is
linear.
– Hidden and output layers of FFNN are usually non-linear.
Comparison with multilayer NN
• Activation functions:
– The argument of the activation function of each hidden neuron in
an RBF NN is the Euclidean distance between the input
vector and the center of that unit.
– The argument of the activation function of each hidden
neuron in an FFNN is the inner product of the input vector
and the synaptic weight vector of that neuron.
• Approximation:
– RBF NN using Gaussian functions construct local
approximations to non-linear I/O mapping.
– FF NN construct global approximations to non-linear I/O
mapping.
```
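
The hand-built XOR network from the transcript can be checked numerically. The sketch below assumes the unit-width Gaussian exp(-||x - t||^2), the centers t1 = (1,1) and t2 = (0,0), output weights -1 and -1, and bias +1, as stated on the slides. The function names are illustrative and do not come from the lecture.

```python
import math

# Centers as given in the transcript: t1 = (1, 1), t2 = (0, 0).
T1 = (1.0, 1.0)
T2 = (0.0, 0.0)


def gaussian_rbf(x, t):
    """Return exp(-||x - t||^2), the Gaussian hidden-unit response."""
    sq_dist = sum((xi - ti) ** 2 for xi, ti in zip(x, t))
    return math.exp(-sq_dist)


def xor_rbf_output(x):
    """Linear output layer: weights -1 and -1 on the hidden units, bias +1."""
    return -gaussian_rbf(x, T1) - gaussian_rbf(x, T2) + 1.0


def xor_rbf_class(x):
    """Decision rule from the slides: class 1 if y > 0, otherwise class 0."""
    return 1 if xor_rbf_output(x) > 0 else 0


if __name__ == "__main__":
    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x, xor_rbf_output(x), xor_rbf_class(x))
```

The points (0,1) and (1,0) are at squared distance 1 from both centers, so they share the same hidden-layer image, which is why a single linear threshold suffices.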
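
Learning Algorithm 1 (random centers, normalized spread, pseudo-inverse weights) is short enough to sketch end to end. The code below is a minimal illustration using NumPy, not the lecture's implementation. The helper names `design_matrix`, `fit_rbf` and `predict` are hypothetical, the network has no bias term, and at least two distinct centers are assumed so that d_max is nonzero.

```python
import numpy as np


def design_matrix(X, centers, d_max):
    """Phi[i, j] = exp(-(m1 / d_max^2) * ||x_i - t_j||^2)."""
    m1 = len(centers)
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-(m1 / d_max**2) * sq_dist)


def fit_rbf(X, d, m1, rng):
    """Sketch of Learning Algorithm 1; assumes m1 >= 2 distinct centers."""
    # 1. Centers: m1 training examples chosen at random, without repetition.
    centers = X[rng.choice(len(X), size=m1, replace=False)]
    # 2. Spread: d_max is the largest distance between any two centers.
    diffs = centers[:, None, :] - centers[None, :, :]
    d_max = np.sqrt((diffs**2).sum(axis=2)).max()
    # 3. Weights: w = Phi^+ d, with Phi^+ the pseudo-inverse of Phi.
    Phi = design_matrix(X, centers, d_max)
    w = np.linalg.pinv(Phi) @ d
    return centers, d_max, w


def predict(X, centers, d_max, w):
    """Network output y(x) = sum_j w_j * phi_j(||x - t_j||) for each row of X."""
    return design_matrix(X, centers, d_max) @ w


if __name__ == "__main__":
    # XOR data, with every training point used as a center (m1 = 4).
    X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
    d = np.array([0.0, 1.0, 1.0, 0.0])
    centers, d_max, w = fit_rbf(X, d, m1=4, rng=np.random.default_rng(0))
    print(predict(X, centers, d_max, w))
```

With as many centers as training points the matrix Φ is square and invertible for distinct points, so the pseudo-inverse reproduces the targets exactly; with fewer centers it gives the least-squares fit instead.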
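
The center-finding steps of Learning Algorithm 2 can be sketched as an online, winner-takes-all update. This is an illustration under simplifying assumptions: the initial centers are drawn from the data, the loop runs for a fixed number of steps instead of testing for "no noticeable changes", and the names `find_centers`, `eta` and `n_steps` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def find_centers(data, m1, eta=0.1, n_steps=2000, seed=0):
    """Online clustering of `data` into m1 centers (sketch of steps 1 to 5)."""
    rng = random.Random(seed)
    # 1. Initialization: start from m1 distinct data points (an assumption;
    #    the slides only say the initial centers are random).
    centers = [list(p) for p in rng.sample(data, m1)]
    for _ in range(n_steps):
        # 2. Sampling: draw one example x.
        x = rng.choice(data)
        # 3. Similarity matching: index of the center closest to x.
        k = min(range(m1), key=lambda j: sq_dist(x, centers[j]))
        # 4. Updating: only the winning center moves, by a fraction eta
        #    of its offset from x; all other centers stay unchanged.
        centers[k] = [t + eta * (xi - t) for t, xi in zip(centers[k], x)]
        # 5. Continuation: a fixed step budget stands in for the
        #    "no noticeable change" stopping test.
    return centers


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(find_centers(data, m1=2))
```

Because each update is a convex combination of the old center and a data point (for 0 < eta < 1), the centers never leave the bounding box of the data.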