PowerPoint Transcript
CS 551/651: Search and “Through the Lens” – Lecture 13

Assign 1 Grading
Sign up for a slot to demo to the TA
• Sunday upon return from break
• Monday upon return from break

Papers to read during break
• Spacetime Constraints
• Evolved Virtual Creatures
• NeuroAnimator

Single-layer networks

Training
• Training samples are used to tune the network weights
  – Input / output pairs
• The network generates an output based on the input (and the weights)
• The network's output is compared to the correct output
• The error in the output is used to adapt the weights
• Repeat the process to minimize the error

Consider the error in single-layer neural networks
Sum of squared errors (across the training data); for one sample:
  E = ½ Err² = ½ (y − h_W(x))²
How can we minimize the error?
• Set the derivative equal to zero (as in Calc 101)
  – ∂E/∂w_j = 0
• Solve for the weights that make the derivative zero
Is that error affected by each of the weights in the weight vector?

Minimizing the error
What is the derivative?
• The gradient, ∇_W E
  – Composed of the partials ∂E/∂w_j

Computing the partial
• For a network h_W with input x and correct output y:
  ∂E/∂w_j = ∂/∂w_j [ ½ (y − h_W(x))² ] = −(y − h_W(x)) · ∂h_W(x)/∂w_j
• Remember the Chain Rule: ∂f(g(x))/∂x = f′(g(x)) · g′(x)

Computing the partial
g( ) = the activation function, so h_W(x) = g(in), where in = Σ_j w_j x_j

Computing the partial
Chain rule again, with g′( ) = the derivative of the activation function:
  ∂E/∂w_j = −(y − h_W(x)) · g′(in) · x_j

Minimizing the error
Gradient descent:
  w_j ← w_j + α · (y − h_W(x)) · g′(in) · x_j
where α is the learning rate

Why are the modification rules more complicated in multilayer networks?
We can calculate the error of an output neuron by comparing to the training data
• We could use the previous update rule to adjust W_{3,5} and W_{4,5} to correct that error
• But how do W_{1,3}, W_{1,4}, W_{2,3}, W_{2,4} adjust?

Backprop at the output layer
The output-layer error is computed as in the single-layer case, and the weights are updated in the same fashion
• Let Err_i be the i-th component of the error vector y − h_W
• Let Δ_i = Err_i · g′(in_i)
• Update: W_{j,i} ← W_{j,i} + α · a_j · Δ_i

Backprop in the hidden layer
Each hidden node is responsible for some fraction of the error Δ_i in each of the output nodes to which it is connected
• Δ_i is divided among all hidden nodes that connect to output i according to their strengths
Error at hidden node j:
  Δ_j = g′(in_j) · Σ_i W_{j,i} Δ_i

Backprop in the hidden layer
Error is: Δ_j = g′(in_j) · Σ_i W_{j,i} Δ_i
Correction is: W_{k,j} ← W_{k,j} + α · a_k · Δ_j

Summary of backprop
1. Compute the Δ values for the output units using the observed error
2. Starting with the output layer, repeat the following for each layer until done:
  • Propagate the Δ values back to the previous layer
  • Update the weights between the two layers

Some general artificial neural network (ANN) info
• The entire network is a function g(inputs) = outputs
  – These functions frequently have sigmoids in them
  – These functions are frequently differentiable
  – These functions have coefficients (weights)
• Backpropagation networks are simply a way to tune the coefficients of a function so that it produces the desired output

Function approximation
Consider fitting a line to data
• Coefficients: slope and y-intercept
• Training data: some samples
• Use a least-squares fit
(figure: a least-squares line fit to sample points in the x–y plane)
This is what an ANN does

Function approximation
A function of two inputs…
• Fit a smooth curve to the available data
  – Quadratic
  – Cubic
  – nth-order
  – ANN!
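A minimal NumPy sketch of the update rules above for a network with one hidden layer, assuming the sigmoid activation g and squared error used on the slides (output Δ_i = Err_i · g′(in_i), hidden Δ_j = g′(in_j) · Σ_i W_{j,i} Δ_i). The names backprop_step, W_in_hid, W_hid_out, and alpha are illustrative, not from the lecture:

```python
import numpy as np

def g(z):
    """Sigmoid activation function."""
    return 1.0 / (1.0 + np.exp(-z))

def g_prime(z):
    """Derivative of the sigmoid: g'(z) = g(z) * (1 - g(z))."""
    s = g(z)
    return s * (1.0 - s)

def backprop_step(x, y, W_in_hid, W_hid_out, alpha=0.1):
    """One gradient-descent update for a network with one hidden layer.

    x         : input vector, shape (n_in,)
    y         : target output vector, shape (n_out,)
    W_in_hid  : weights from input to hidden nodes, shape (n_in, n_hid)
    W_hid_out : weights from hidden to output nodes, shape (n_hid, n_out)
    Returns updated copies of both weight matrices.
    """
    # Forward pass: in_j = sum_k W_{k,j} a_k and a_j = g(in_j)
    in_hid = x @ W_in_hid
    a_hid = g(in_hid)
    in_out = a_hid @ W_hid_out
    a_out = g(in_out)                      # network output h_W(x)

    # Output layer: Err_i = y_i - h_W(x)_i, Delta_i = Err_i * g'(in_i)
    delta_out = (y - a_out) * g_prime(in_out)

    # Hidden layer: Delta_j = g'(in_j) * sum_i W_{j,i} * Delta_i
    delta_hid = g_prime(in_hid) * (W_hid_out @ delta_out)

    # Updates: W_{j,i} <- W_{j,i} + alpha * a_j * Delta_i (same pattern input -> hidden)
    new_W_hid_out = W_hid_out + alpha * np.outer(a_hid, delta_out)
    new_W_in_hid = W_in_hid + alpha * np.outer(x, delta_hid)
    return new_W_in_hid, new_W_hid_out

# Example: drive the network toward a single training pair (shapes and values are illustrative)
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(2, 3))
W2 = rng.normal(scale=0.5, size=(3, 1))
for _ in range(1000):
    W1, W2 = backprop_step(np.array([0.5, -0.2]), np.array([0.8]), W1, W2)
```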
Curve fitting
• A neural network should be able to generate the input/output pairs from the training data
• You'd like it to be smooth (and well-behaved) in the voids between the training data
• There are risks of overfitting the data

When using ANNs
• Sometimes the output layer feeds back into the input layer
  – recurrent neural networks
• Backpropagation will tune the weights
• You determine the topology
  – Different topologies have different training outcomes (consider overfitting)
  – Sometimes a genetic algorithm is used to explore the space of neural network topologies

Through The Lens Camera Control
• Controlling a virtual camera
• Lagrange multipliers

“Lagrange Multipliers without Permanent Scarring”
• Dan Klein – www.cs.berkeley.edu/~klein (now at Stanford)

More complicated example
Maximize a paraboloid subject to a unit circle
• Any solution to the maximization problem must sit on x² + y² = 1

The central theme of Lagrange multipliers
At the solution points, the isocurve (a.k.a. level curve or contour) of the function to be maximized is tangent to the constraint curve

Tangent curves
Tangent curves == parallel normals
• Create the Lagrangian: L(x, λ) = f(x) − λ · g(x)
• Solve for where the gradient ∇L = 0: this captures the parallel normals (∇f = λ∇g) and requires g(x) = 0

Go to the board for more development
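A small SymPy sketch of the paraboloid-on-a-unit-circle example: the slides do not specify the paraboloid, so f(x, y) = x² + 2y² is an illustrative stand-in. Setting the gradient of the Lagrangian to zero gives exactly the parallel-normals condition ∇f = λ∇g together with the constraint g = 0:

```python
import sympy as sp

x, y, lam = sp.symbols("x y lam", real=True)

f = x**2 + 2*y**2      # paraboloid to maximize (illustrative choice; the slides do not give one)
g = x**2 + y**2 - 1    # unit-circle constraint, g(x, y) = 0
L = f - lam * g        # the Lagrangian

# grad L = 0 enforces grad f = lam * grad g (parallel normals) and g = 0 simultaneously
stationary = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)

for s in stationary:
    print(s, " f =", f.subs(s))
# For this f, the constrained maxima are (0, ±1) with f = 2;
# the points (±1, 0) with f = 1 are constrained minima.
```

By hand, the same system is ∂L/∂x = 2x − 2λx = 0, ∂L/∂y = 4y − 2λy = 0, ∂L/∂λ = −(x² + y² − 1) = 0.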