#### Transcript: Data Mining Techniques 1

##### RBF Neural Networks: Example 1

*(Figure: two circles, 1 and 2, in the $(x_1, x_2)$ plane, with $+$ examples inside each circle and $-$ examples outside both.)*

Examples inside circles 1 and 2 are of class $+$; examples outside both circles are of class $-$. What NN does the job of separating the two classes?

Let $t_1, t_2$ and $r_1, r_2$ be the centers and radii of circles 1 and 2, respectively, and let $x = (x_1, x_2)$ be an example. Define

$$\varphi_{t_1}(x) = \begin{cases} 1 & \text{if the distance of } x \text{ from } t_1 \text{ is less than } r_1, \\ 0 & \text{otherwise,} \end{cases} \qquad \varphi_{t_2}(x) = \begin{cases} 1 & \text{if the distance of } x \text{ from } t_2 \text{ is less than } r_2, \\ 0 & \text{otherwise.} \end{cases}$$

A network with inputs $x_1, x_2$, hidden units $\varphi_{t_1}$ and $\varphi_{t_2}$, and both hidden-to-output weights equal to 1 feeding the output $y$ is a hyperspheric radial basis function NN.

Geometrically, examples are mapped to the feature space $(\varphi_{t_1}, \varphi_{t_2})$: examples in circle 1 are mapped to $(1,0)$, examples in circle 2 are mapped to $(0,1)$, and examples outside both circles are mapped to $(0,0)$. The two classes become linearly separable in the $(\varphi_{t_1}, \varphi_{t_2})$ feature space!

##### RBF Architecture

*(Figure: network with inputs $x_1, \dots, x_m$, one layer of hidden RBF units $1, \dots, m_1$, and a single linear output $y$ with weights $w_1, \dots, w_{m_1}$.)*

- One hidden layer with RBF activation functions $\varphi_1, \dots, \varphi_{m_1}$.
- Output layer with a linear activation function:

$$y = w_1 \varphi(\lVert x - t_1 \rVert) + \dots + w_{m_1} \varphi(\lVert x - t_{m_1} \rVert),$$

where $\lVert x - t \rVert$ is the distance of $x = (x_1, \dots, x_m)$ from the vector $t$.

##### Other Types of φ

- Inverse multiquadrics: $\varphi(r) = \dfrac{1}{(r^2 + c^2)^{1/2}}$, with $c > 0$.
- Gaussian functions (the most used): $\varphi(r) = \exp\!\left(-\dfrac{r^2}{2\sigma^2}\right)$, with $\sigma > 0$.

##### Gaussian RBF φ

For a Gaussian RBF centered at $t$, the spread $\sigma$ is a measure of how spread out the curve is: a small $\sigma$ gives a narrow, peaked curve; a large $\sigma$ gives a wide, flat one.

##### Hidden Neuron Model

Hidden units use radial basis functions $\varphi(\lVert x - t \rVert)$, so the output of a unit depends on the distance of the input $x$ from its center $t$. Here $t$ is called the center and $\sigma$ is called the spread; center and spread are the unit's parameters.

##### Hidden Neurons

- A hidden neuron is more sensitive to data points near its center.
- For a Gaussian RBF this sensitivity may be tuned by adjusting the spread $\sigma$: a larger spread implies less sensitivity.

##### Example: the XOR Problem

- Input space: the four points $(0,0), (0,1), (1,0), (1,1)$ in the $(x_1, x_2)$ plane.
- Output space: $\{0, 1\}$ on the line $y$.
- Construct an RBF pattern classifier such that $(0,0)$ and $(1,1)$ are mapped to 0 (class $C_1$), while $(1,0)$ and $(0,1)$ are mapped to 1 (class $C_2$).

In the feature (hidden layer) space, take

$$\varphi_1(\lVert x - t_1 \rVert) = e^{-\lVert x - t_1 \rVert^2}, \qquad \varphi_2(\lVert x - t_2 \rVert) = e^{-\lVert x - t_2 \rVert^2},$$

with $t_1 = (1,1)$ and $t_2 = (0,0)$.

*(Figure: the four inputs plotted in the $(\varphi_1, \varphi_2)$ plane; $(0,1)$ and $(1,0)$ map to the same point, and a straight decision boundary separates that point from the images of $(0,0)$ and $(1,1)$.)*

When mapped into the feature space $\langle \varphi_1, \varphi_2 \rangle$ (the hidden layer), $C_1$ and $C_2$ become linearly separable. So a linear classifier with $\varphi_1(x)$ and $\varphi_2(x)$ as inputs can be used to solve the XOR problem.

##### RBF NN for the XOR Problem

*(Figure: network with inputs $x_1, x_2$, hidden units centered at $t_1$ and $t_2$, hidden-to-output weights $-1$ and $-1$, and a bias $+1$ feeding the output $y$.)*

$$y = -e^{-\lVert x - t_1 \rVert^2} - e^{-\lVert x - t_2 \rVert^2} + 1$$

If $y > 0$ then class 1, otherwise class 0.
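To check the construction numerically, here is a minimal Python/NumPy sketch of this XOR network. The centers, the $-1, -1$ hidden-to-output weights, and the $+1$ bias come from the slide; the function and variable names are ours.

```python
import numpy as np

# Centers from the slide: t1 = (1,1), t2 = (0,0).
T1 = np.array([1.0, 1.0])
T2 = np.array([0.0, 0.0])

def phi(x, t):
    """Gaussian RBF with unit spread, as on the slide: exp(-||x - t||^2)."""
    return np.exp(-np.sum((x - t) ** 2))

def xor_rbf(x):
    """The slide's network: y = -phi1(x) - phi2(x) + 1."""
    y = -phi(x, T1) - phi(x, T2) + 1.0
    return 1 if y > 0 else 0

for p in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(p, "->", xor_rbf(np.array(p, dtype=float)))
# Prints 0, 1, 1, 0: the network computes XOR.
```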
##### Application: Face Recognition

- The problem: face recognition of persons of a known group in an indoor environment.
- The approach: learn face classes over a wide range of poses using an RBF network.
- See the PhD thesis by Jonathan Howell, http://www.cogs.susx.ac.uk/users/jonh/index.html

##### Dataset

Sussex database (University of Sussex):

- 100 images of 10 people (8-bit grayscale, resolution 384 × 287);
- for each individual, 10 images of the head in different poses, from face-on to profile;
- designed to test the performance of face recognition techniques when pose variations occur.

*(Figure: all ten images for classes 0-3 from the Sussex database, nose-centred and subsampled to 25 × 25 before preprocessing.)*

##### RBF: Parameters to Learn

What do we have to learn for an RBF NN with a given architecture?

- the centers of the RBF activation functions;
- the spreads of the Gaussian RBF activation functions;
- the weights from the hidden to the output layer.

Different learning algorithms may be used for learning the RBF network parameters. We describe three possible methods for learning centers, spreads and weights.

##### Learning Algorithm 1

- Centers are selected at random: they are chosen randomly from the training set.
- Spreads are chosen by normalization:

$$\sigma = \frac{d_{\max}}{\sqrt{2\, m_1}} = \frac{\text{maximum distance between any 2 centers}}{\sqrt{2 \times \text{number of centers}}}.$$

- The activation function of hidden neuron $i$ then becomes

$$\varphi_i\!\left(\lVert x - t_i \rVert^2\right) = \exp\!\left(-\frac{m_1}{d_{\max}^2}\, \lVert x - t_i \rVert^2\right).$$

- Weights are computed by means of the pseudo-inverse method. For an example $(x_i, d_i)$, consider the output of the network,

$$y(x_i) = w_1 \varphi_1(\lVert x_i - t_1 \rVert) + \dots + w_{m_1} \varphi_{m_1}(\lVert x_i - t_{m_1} \rVert).$$

We would like $y(x_i) = d_i$ for each example, that is

$$w_1 \varphi_1(\lVert x_i - t_1 \rVert) + \dots + w_{m_1} \varphi_{m_1}(\lVert x_i - t_{m_1} \rVert) = d_i.$$

This can be re-written in matrix form for one example,

$$\left[\varphi_1(\lVert x_i - t_1 \rVert) \;\dots\; \varphi_{m_1}(\lVert x_i - t_{m_1} \rVert)\right] [w_1 \dots w_{m_1}]^T = d_i,$$

and for all the examples at the same time,

$$\begin{bmatrix} \varphi_1(\lVert x_1 - t_1 \rVert) & \dots & \varphi_{m_1}(\lVert x_1 - t_{m_1} \rVert) \\ \vdots & & \vdots \\ \varphi_1(\lVert x_N - t_1 \rVert) & \dots & \varphi_{m_1}(\lVert x_N - t_{m_1} \rVert) \end{bmatrix} [w_1 \dots w_{m_1}]^T = [d_1 \dots d_N]^T.$$

Let $\Phi$ be the $N \times m_1$ matrix above, so that $\Phi\, [w_1 \dots w_{m_1}]^T = [d_1 \dots d_N]^T$. If $\Phi^+$ is the pseudo-inverse of the matrix $\Phi$, we obtain the weights using the following formula:

$$[w_1 \dots w_{m_1}]^T = \Phi^+ [d_1 \dots d_N]^T.$$

##### Learning Algorithm 1: Summary

1. Choose the centers randomly from the training set.
2. Compute the spread for the RBF functions using the normalization method.
3. Find the weights using the pseudo-inverse method.
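Putting the three steps together, here is a minimal NumPy sketch of Learning Algorithm 1 under the slide's assumptions (Gaussian hidden units, no bias term, distinct centers so $d_{\max} > 0$); the function name and the use of `np.linalg.pinv` for the pseudo-inverse are our choices.

```python
import numpy as np

def train_rbf_alg1(X, d, m1, rng=np.random.default_rng(0)):
    """Learning Algorithm 1. X: (N, m) inputs, d: (N,) targets."""
    # 1. Choose the m1 centers randomly from the training set.
    centers = X[rng.choice(len(X), size=m1, replace=False)]
    # 2. Spread by normalization: sigma = d_max / sqrt(2 * m1), so the
    #    Gaussian activation is exp(-(m1 / d_max^2) * ||x - t_i||^2).
    d_max = max(np.linalg.norm(a - b) for a in centers for b in centers)
    # 3. Build the N x m1 matrix Phi of hidden-unit activations ...
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    Phi = np.exp(-(m1 / d_max ** 2) * sq_dist)
    # ... and solve Phi w = d with the pseudo-inverse.
    w = np.linalg.pinv(Phi) @ d
    return centers, d_max, w
```

Note that this algorithm involves no iterative training at all: once the centers and spread are fixed, the weights come from a single linear solve.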
##### Learning Algorithm 2: Centers

A clustering algorithm is used for finding the centers (a code sketch of this loop appears at the end of this section):

1. Initialization: $t_k(0)$ random, $k = 1, \dots, m_1$.
2. Sampling: draw an example $x$ from the input space.
3. Similarity matching: find the index of the center closest to $x$:
   $$k(x) = \arg\min_k \lVert x(n) - t_k(n) \rVert.$$
4. Updating: adjust the centers,
   $$t_k(n+1) = \begin{cases} t_k(n) + \eta\,[x(n) - t_k(n)] & \text{if } k = k(x), \\ t_k(n) & \text{otherwise.} \end{cases}$$
5. Continuation: increment $n$ by 1, go to step 2, and continue until no noticeable changes of the centers occur.

##### Learning Algorithm 2: Summary

A hybrid learning process:

- clustering for finding the centers;
- spreads chosen by normalization;
- the LMS algorithm for finding the weights.

##### Learning Algorithm 3

Apply the gradient descent method for finding centers, spreads and weights, by minimizing the (instantaneous) squared error

$$E = \frac{1}{2}\,\big(y(x) - d\big)^2.$$

Updates for centers, spreads and weights:

$$\Delta t_j = -\eta_{t_j} \frac{\partial E}{\partial t_j}, \qquad \Delta \sigma_j = -\eta_{\sigma_j} \frac{\partial E}{\partial \sigma_j}, \qquad \Delta w_{ij} = -\eta_{w_{ij}} \frac{\partial E}{\partial w_{ij}}.$$

##### Comparison with FF NN

RBF networks are used for regression and for performing complex (non-linear) pattern classification tasks. Comparison between RBF networks and FFNNs:

- Both are examples of non-linear layered feed-forward networks.
- Both are universal approximators.

##### Comparison with Multilayer NN

- Architecture:
  - RBF networks have one single hidden layer.
  - FFNNs may have more hidden layers.
- Neuron model:
  - In an RBF network the neuron model of the hidden neurons is different from that of the output nodes.
  - Typically in an FFNN, hidden and output neurons share a common neuron model.
  - The hidden layer of an RBF network is non-linear; its output layer is linear.
  - Hidden and output layers of an FFNN are usually both non-linear.
- Activation functions:
  - The argument of the activation function of each hidden neuron in an RBF NN is the Euclidean distance between the input vector and the center of that unit.
  - The argument of the activation function of each hidden neuron in an FFNN is the inner product of the input vector and the synaptic weight vector of that neuron.
- Approximation:
  - RBF NNs using Gaussian functions construct local approximations to a non-linear I/O mapping.
  - FF NNs construct global approximations to a non-linear I/O mapping.
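To make the activation-function contrast concrete, here is a minimal sketch of the two hidden-neuron models side by side; the sigmoid for the FFNN unit is one common choice, not something the slides prescribe.

```python
import numpy as np

def rbf_hidden_unit(x, t, sigma):
    """RBF hidden neuron: the activation's argument is the Euclidean
    distance ||x - t|| from the input to the unit's center t."""
    return np.exp(-np.dot(x - t, x - t) / (2.0 * sigma ** 2))

def ffnn_hidden_unit(x, w, b):
    """FFNN hidden neuron: the activation's argument is the inner
    product w . x + b with the unit's synaptic weight vector w."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))  # sigmoid
```

The first unit responds only near its center (a local unit); the second responds across an entire half-space (a global unit), which is the root of the local-versus-global approximation difference noted above.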
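Finally, the sketch promised in Learning Algorithm 2: the competitive center-update loop in NumPy. The learning rate and the fixed iteration budget are our simplifications; the slides instead stop when the centers no longer change noticeably.

```python
import numpy as np

def find_centers(X, m1, eta=0.1, n_iter=1000,
                 rng=np.random.default_rng(0)):
    """Center clustering from Learning Algorithm 2."""
    # 1. Initialization: random starting centers (here: random samples).
    t = X[rng.choice(len(X), size=m1, replace=False)].astype(float)
    for n in range(n_iter):
        # 2. Sampling: draw x from the input space.
        x = X[rng.integers(len(X))]
        # 3. Similarity matching: index of the center closest to x.
        k = int(np.argmin(np.linalg.norm(t - x, axis=1)))
        # 4. Updating: move only the winning center toward x.
        t[k] += eta * (x - t[k])
        # 5. Continuation: a fixed budget stands in for "until no
        #    noticeable changes of the centers occur".
    return t
```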