Transcript Document
Different classes of abstract models:
- Supervised learning (example: the Perceptron)
- Reinforcement learning
- Unsupervised learning (example: the Hebb rule)
- Associative memory (example: matrix memory)

Abstraction – so what is a neuron?
• Threshold unit (McCulloch-Pitts): O = θ(Σ_i w_i x_i + w_0), where θ(x) = 1 for x > 0 and 0 otherwise
• Linear: O = Σ_i w_i x_i + w_0
• Sigmoid: O = sig(Σ_i w_i x_i + w_0)

THE PERCEPTRON (classification)
Threshold unit: O = θ(Σ_i w_i x_i + w_0), where θ(x) = 1 for x > 0 and 0 otherwise. Here o is the output for input pattern x, the w_i are the synaptic weights, and y is the desired output.

AND:
x1:  1  1  0  0
x2:  1  0  1  0
y :  1  0  0  0
A solution: o = θ(x1 + x2 − 1.5), i.e. w1 = w2 = 1, w0 = −1.5. AND is linearly separable: in the (x1, x2) plane a single line separates the y = 1 point from the y = 0 points.

OR:
x1:  1  1  0  0
x2:  1  0  1  0
y :  1  1  1  0
A solution: o = θ(x1 + x2 − 0.5), i.e. w1 = w2 = 1, w0 = −0.5. OR is also linearly separable.

Perceptron learning rule: ΔW_i = (y − o) x_i. A convergence proof exists.
1. Show examples of Perceptron learning with the demo program
2. Show the program itself
3. Talk about linear separability, define the dot product, show on the computer

Unsupervised learning – the “Hebb” rule:
dW_i/dt = x_i y, where the x_i are the inputs and the output y is assumed linear: y = Σ_j W_j x_j.

Example of the Hebb rule in 2D: [Figure: zero-mean input cloud in the (x1, x2) plane with its principal axis at angle π/3; the weight vector grows along this axis.] (Note: here the inputs have a mean of zero.)
• Show the program, tilt the axis, look at the divergence

Why do we get these results? On the board:
• Solve a simple linear first-order ODE
• Fixed points and their stability for nonlinear ODEs
• Eigenvalues, eigenvectors

In the simplest case, the change in synaptic weight w is Δw_i = x_i y, where x is the input vector and y is the neural response. Assume for simplicity a linear neuron, y = Σ_j w_j x_j, so we get:
Δw_i = x_i Σ_j x_j w_j
Now take an average with respect to the distribution of inputs:
E[Δw_i] = Σ_j E[x_i x_j] w_j = Σ_j Q_ij w_j
If a small change Δw occurs over a short time Δt, then in matrix notation
Δw/Δt → dw/dt = Q w.
If <x> = 0, Q is the covariance matrix. What is then the solution of this simple first-order linear ODE? (Show on the board.)
• Show the program of the Hebb rule again
• Show the effect of saturation limits
• Possible solution – normalization

Oja (PCA) rule:
dW_i/dt = x_i y − W_i y²
Show the PCA program. [Figure: trajectory of the weight components (W1, W2) under the Oja rule.]

OK – some more programming: convert the Hebb program to an Oja-rule program.
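The notes refer to the Hebb and Oja demo programs without listing them, so here is a minimal MATLAB sketch of both rules on zero-mean 2D inputs. It is not the course's actual demo; the learning rate eta, the axis angle phi, and all sizes are illustrative assumptions. It shows the behavior discussed above: plain Hebb grows without bound along the principal eigenvector of Q, while the Oja rule converges to a unit-norm weight vector pointing in the same direction.

% Minimal sketch (not the course demo): Hebb vs. Oja on zero-mean 2D inputs.
% The learning rate, angle and sizes below are illustrative assumptions.
rng(1);
N   = 5000;                        % number of input presentations
eta = 0.005;                       % learning rate (added for stability)
phi = pi/3;                        % principal axis of the input cloud
R   = [cos(phi) -sin(phi); sin(phi) cos(phi)];
X   = R * diag([1.5 0.3]) * randn(2, N);     % zero-mean, correlated inputs

w_hebb = 0.1*randn(2,1);
w_oja  = 0.1*randn(2,1);
for k = 1:N
    x  = X(:,k);
    yh = w_hebb' * x;                          % linear neuron
    w_hebb = w_hebb + eta * x * yh;            % plain Hebb: dW = x*y (diverges)
    yo = w_oja' * x;
    w_oja  = w_oja + eta * (x*yo - w_oja*yo^2);% Oja: dW = x*y - W*y^2
end
disp(norm(w_hebb))                 % keeps growing with more presentations
disp(norm(w_oja))                  % ~1: Oja normalizes the weight vector
disp(w_oja / norm(w_oja))          % ~ the principal eigenvector of Q

As the notes suggest, a natural experiment is to tilt the input axis (change phi) and watch which direction the weights pick out, and how the plain Hebb weights diverge.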
So OK – simulations, MATLAB, math, etc. What does this have to do with biology, with the brain?

Another unsupervised learning model: the BCM theory of synaptic (cortical) plasticity. BCM stands for Bienenstock, Cooper and Munro, and it dates back to 1982. It was designed to account for experiments demonstrating that the development of orientation-selective cells depends on rearing in a patterned environment.

BCM theory (Bienenstock, Cooper, Munro 1982; Intrator, Cooper 1992) requires:
• Bidirectional synaptic modification (LTP/LTD)
• A sliding modification threshold
• Fixed points that depend on the environment; in a patterned environment only selective fixed points are stable.
[Figure: the BCM modification function – LTD for output below the modification threshold, LTP above it.]

The integral form of the average, θ_m(t) = (1/τ) ∫ y²(t′) e^(−(t−t′)/τ) dt′, is equivalent to the differential form τ dθ_m/dt = y² − θ_m. Note: it is essential that θ_m is a superlinear function of the history of c, i.e. it grows as a power 1 + p of the averaged activity, with p > 0. Note also that the original BCM formulation (1982) used such a power of the time-averaged output, rather than the average of c² used here.

What is the outcome of the BCM theory? Assume a neuron with N inputs (N synapses), and an environment composed of N different input vectors. An N = 2 example: two input patterns x¹ and x² (notation: x^k has components (x_1^k, x_2^k)). What are the stable fixed points of the weight vector m in this case? Note: every time a new input is presented, m changes, and so does θ_m. What are the fixed points? What are the stable fixed points?

The integral form of the average is again equivalent to the differential form above (an alternative form of the same sliding threshold). Show a MATLAB example: two examples with N = 5. Note: the stable fixed point is such that for one pattern j, y^j = Σ_i w_i x_i^j = θ_m, while for the other patterns (i ≠ j), y^i = 0. (Note: here c = y.)

BCM theory – stability (one dimension):
y = w xᵀ, and dw/dt = y (y − θ_M) x with θ_M = E[y²]; the modification function is a quadratic form in y.
In the instantaneous limit θ_M = y², so dw/dt = y (y − y²) x = y² (1 − y) x, and the fixed points (dw/dt = 0) and their stability can be read off directly.

BCM theory – selectivity (two dimensions, two patterns):
y = w_1 x_1 + w_2 x_2 = w xᵀ; y¹ = w·x¹, y² = w·x².
Learning equation: dw/dt = y^k (y^k − θ_M) x^k (quadratic form).
Averaged threshold: θ_M = E[y²] over the patterns = Σ_{k=1,2} p_k (y^k)².
Fixed points: dw/dt = 0.

There are four possible fixed points, in terms of (y¹, y²):
• y¹ = 0, y² = 0 (unselective)
• y¹ = θ_M, y² = 0 (selective)
• y¹ = 0, y² = θ_M (selective)
• y¹ = θ_M, y² = θ_M (unselective)
At the selective fixed point with y² = 0, the threshold is θ_M = p_1 (y¹)² + p_2 (y²)² = p_1 (y¹)², and since y¹ = θ_M there, y¹ = 1/p_1.

Summary
• The BCM rule is based on two differential equations – what are they?
• When there are two linearly independent inputs, what will the BCM stable fixed points be? What will θ be?
• When there are K independent inputs, what are the stable fixed points? What will θ be?

Bonus project – 10 extra points for this section: write MATLAB code for a BCM neuron trained with 2 inputs in 2D. Include a 1-page write-up, and you will also meet with me for about 15 minutes to explain the code and results.

Associative memory:
[Figure: famous images (Albert, Marilyn, ..., Harel) as input patterns x^k, with the corresponding names as the desired output patterns y^k.]
1. Feed-forward matrix networks
2. Attractor networks (autoassociative)

Linear matrix memory: N input neurons, M output neurons, P input-output pairs.
1. Set the synaptic weights by the Hebb rule: W_ij = Σ_{k=1}^{P} x_i^k y_j^k
2. Present an input; the output is a linear operation: O_j^r = Σ_{i=1}^{N} x_i^r W_ij

Here you are on your own – write a MATLAB program to do this. Tip: use large N, small P, and start with orthogonal patterns. (A minimal sketch is given at the end of these notes.)

A low-dimensional example of a linear matrix memory, done on the board, using the simple Hebb rule between input and desired output:
- Orthogonal inputs
- Non-orthogonal inputs
Give examples. Non-orthogonal inputs might require other rules (the covariance rule, the Perceptron rule).

Formal neural networks can accomplish many tasks, for example:
• Perform complex classification
• Learn arbitrary functions
• Account for associative memory
Some applications: robotics, character recognition, speech recognition, medical diagnostics.
This is not neuroscience, but it is motivated loosely by neuroscience and carries important information for neuroscience as well. For example: memory, learning and some aspects of development are assumed to be based on synaptic plasticity.
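Following up on the linear matrix memory exercise above, here is a minimal MATLAB sketch of the feed-forward case. It is only a starting point, not a full solution: the sizes N, M, P and the way the patterns are generated (orth, sign) are illustrative assumptions, chosen to follow the tip of using large N, small P and orthogonal patterns.

% Minimal sketch (not a full solution): linear matrix memory with the Hebb rule.
% Sizes and pattern choices are illustrative assumptions.
N = 100;  M = 20;  P = 5;            % input neurons, output neurons, stored pairs
X = orth(randn(N, P));               % P orthonormal input patterns (columns of X)
Y = sign(randn(M, P));               % desired output patterns (columns of Y)

W = zeros(N, M);
for k = 1:P
    W = W + X(:,k) * Y(:,k)';        % Hebb rule: W_ij = sum_k x_i^k y_j^k
end

O = X' * W;                          % recall: O_rj = sum_i x_i^r W_ij (one row per pattern)
disp(max(max(abs(O - Y'))))          % ~0 for orthogonal patterns (perfect recall)

With non-orthogonal inputs the recall picks up cross-talk between the stored patterns, which is where the other rules mentioned above (covariance, Perceptron) come in.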