Transcript PowerPoint

CS 551/651
Search and “Through the Lens”
Lecture 13
Assign 1 Grading
Sign up for a slot to demo to TA
• Sunday upon return from break
• Monday upon return from break
Papers to read during break
• Spacetime Constraints
• Evolved Virtual Creatures
• Neuroanimator
Single-layer networks
Training
• Training samples are used to tune the network weights
– Input / output pairs
• Network generates an output based on input (and weights)
• Network’s output is compared to correct output
• Error in output is used to adapt the weights
• Repeat process to minimize errors
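A minimal sketch of this loop in Python; the network object and its methods are hypothetical placeholders, and the actual weight-update rule is derived on the slides that follow:

def train(network, samples, epochs=100):
    """samples: iterable of (x, y) input/output pairs from the training data."""
    for _ in range(epochs):                    # repeat the process to minimize error
        for x, y in samples:
            output = network.forward(x)        # output based on input (and weights)
            error = y - output                 # compare to the correct output
            network.adjust_weights(x, error)   # use the error to adapt the weights
    return network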
Consider error in single-layer neural networks
Sum of squared errors (across training data): E = ½ Σ (y − hW(x))²
For one sample: E = ½ Err² = ½ (y − hW(x))²
How can we minimize the error?
• Set the derivative equal to zero (like in Calc 101)
  – i.e., find where ∂E/∂wj = 0 for each weight wj
• Solve for the weights that make the derivative == 0
Is that error affected by each of the weights in the weight vector?
Minimizing the error
What is the derivative?
• The gradient, ∇E
  – Composed of the partial derivatives ∂E/∂wj, one for each weight wj
Computing the partial
• For a network, hW, with inputs x and correct output y: E = ½ (y − hW(x))²
• Remember the Chain Rule: ∂f(g(x))/∂x = f′(g(x)) · g′(x)
Computing the partial
∂E/∂wj = ∂/∂wj [ ½ (y − g(Σk wk xk))² ]
g( ) = the activation function
Computing the partial
Chain rule again: ∂E/∂wj = −(y − hW(x)) · g′(in) · xj, where in = Σk wk xk
g′( ) = derivative of the activation function
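The derivation these slides step through, written out (a standard single-layer gradient-descent derivation; the shorthand in = Σk wk xk is assumed from context):

\[
E = \tfrac{1}{2}\,\mathrm{Err}^2 = \tfrac{1}{2}\Bigl(y - g\Bigl(\sum_k w_k x_k\Bigr)\Bigr)^2
\]
\[
\frac{\partial E}{\partial w_j}
= -\bigl(y - h_W(\mathbf{x})\bigr)\,g'(in)\,\frac{\partial\, in}{\partial w_j}
= -\mathrm{Err}\cdot g'(in)\cdot x_j
\]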
Minimizing the error
Gradient descent: wj ← wj + α · (y − hW(x)) · g′(in) · xj
α = the learning rate
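A minimal sketch of this update rule in Python, assuming a sigmoid activation (the slides do not fix a particular g); the function and variable names are illustrative:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def train_single_layer(samples, alpha=0.1, epochs=1000):
    """samples: list of (x, y) pairs, x a 1-D numpy array, y a scalar target."""
    w = np.zeros(len(samples[0][0]))        # weight vector
    for _ in range(epochs):
        for x, y in samples:
            in_ = np.dot(w, x)              # in = sum_j w_j x_j
            err = y - sigmoid(in_)          # y - h_W(x)
            # gradient-descent step: w_j <- w_j + alpha * Err * g'(in) * x_j
            w += alpha * err * sigmoid_prime(in_) * x
    return w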
Why are modification rules more complicated in multilayer networks?
We can calculate the error of the output neuron by comparing to the training data
• We could use the previous update rule to adjust W3,5 and W4,5 to correct that error
• But how do W1,3, W1,4, W2,3, and W2,4 adjust?
Backprop at the output layer
Output layer error is computed as in the single-layer case, and the weights are updated in the same fashion
• Let Erri be the ith component of the error vector y − hW
  – Let Δi = Erri × g′(ini)
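In symbols, assuming the standard form of the rule the slide refers to (aj is the activation feeding output node i, α the learning rate):

\[
\Delta_i = \mathrm{Err}_i \, g'(in_i), \qquad
W_{j,i} \leftarrow W_{j,i} + \alpha \, a_j \, \Delta_i
\]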
Backprop in the hidden layer
Each hidden node is responsible for some fraction of the error Δi in each of the output nodes to which it is connected
• Δi is divided among all hidden nodes that connect to output i according to their strengths
Error at hidden node j: Δj = g′(inj) Σi Wj,i Δi
Backprop in the hidden layer
Error is: Δj = g′(inj) Σi Wj,i Δi
Correction is: Wk,j ← Wk,j + α × ak × Δj   (ak = activation of node k in the previous layer)
Summary of backprop
1. Compute the Δ values for the output units using the observed error
2. Starting with the output layer, repeat the following for each layer until done:
   • Propagate the Δ values back to the previous layer
   • Update the weights between the two layers
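A minimal sketch of this procedure in Python for the 2–2–1 network used earlier (inputs 1–2, hidden nodes 3–4, output 5), assuming sigmoid activations; the function and variable names are illustrative:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, y, W_in_hid, W_hid_out, alpha=0.1):
    """One backprop update for a 2-2-1 network.
    x: inputs (2,), y: scalar target,
    W_in_hid: W_{1,3}, W_{1,4}, W_{2,3}, W_{2,4} as a (2, 2) array,
    W_hid_out: W_{3,5}, W_{4,5} as a (2,) array."""
    # Forward pass
    in_hid = x @ W_in_hid                        # in_j for hidden nodes j = 3, 4
    a_hid = sigmoid(in_hid)                      # hidden activations a_j
    a_out = sigmoid(a_hid @ W_hid_out)           # network output h_W(x)

    # 1. Delta for the output unit from the observed error
    delta_out = (y - a_out) * a_out * (1.0 - a_out)       # g'(in) = g(in)(1 - g(in))

    # 2. Propagate Delta back to the hidden layer ...
    delta_hid = a_hid * (1.0 - a_hid) * (W_hid_out * delta_out)

    # ... and update the weights between the two layers
    new_W_hid_out = W_hid_out + alpha * a_hid * delta_out
    new_W_in_hid = W_in_hid + alpha * np.outer(x, delta_hid)
    return new_W_in_hid, new_W_hid_out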
Some general artificial neural network (ANN) info
• The entire network is a function g( inputs ) = outputs
– These functions frequently have sigmoids in them
– These functions are frequently differentiable
– These functions have coefficients (weights)
• Backpropagation networks are simply ways to tune the coefficients of a function so it produces the desired output
Function approximation
Consider fitting a line to data
• Coefficients: slope and y-intercept
• Training data: some samples
• Use least-squares fit
[Figure: a least-squares line fit to scattered (x, y) samples]
This is what an ANN does
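A minimal sketch of the line fit in Python using a least-squares solve; the sample data is made up for illustration:

import numpy as np

# Made-up training samples (x, y)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Least-squares fit of y ≈ slope * x + intercept.
# The two coefficients play the same role as an ANN's weights.
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)   # roughly 2 and 1 for this data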
Function approximation
A function of two inputs…
• Fit a smooth curve to the available data
– Quadratic
– Cubic
– nth-order
– ANN!
Curve fitting
• A neural network should be able to generate the input/output pairs from the training data
• You'd like it to be smooth (and well-behaved) in the voids between the training data
• There are risks of overfitting the data
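A small sketch of the overfitting risk using ordinary polynomial fits (the data and degrees are made up for illustration):

import numpy as np

# Made-up noisy samples of a smooth underlying function
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 8)
y = np.sin(2.0 * np.pi * x) + rng.normal(scale=0.1, size=x.shape)

for degree in (1, 3, 7):
    coeffs = np.polyfit(x, y, degree)                     # least-squares polynomial fit
    train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(degree, train_err)
# Training error shrinks as the degree grows, but the degree-7 polynomial
# interpolates the noise and is no longer well-behaved in the voids between samples.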
When using ANNs
• Sometimes the output layer feeds back into the input layer – recurrent neural networks
• The backpropagation will tune the weights
• You determine the topology
  – Different topologies have different training outcomes (consider overfitting)
  – Sometimes a genetic algorithm is used to explore the space of neural network topologies
Through The Lens Camera Control
Controlling virtual camera
Lagrange multipliers
Lagrange Multipliers without Permanent Scarring
• Dan Klein – www.cs.berkeley.edu/~klein (now at Stanford)
More complicated example
Maximize a paraboloid subject to a unit circle constraint
• Any solution to the maximization problem must sit on x² + y² = 1
The central theme of Lagrange Multipliers
At the solution points, the isocurve (a.k.a. level curve or contour) of the function to be maximized is tangent to the constraint curve
Tangent Curves
Tangent curves == parallel normals
• Create the Lagrangian
• Solve for where the gradient = 0, which captures the parallel normals, and require g(x) = 0
Go to board for more development
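A worked sketch of the recipe above, using a hypothetical paraboloid f(x, y) = x² + 2y² (the transcript does not preserve the lecture's exact function) and the unit-circle constraint g(x, y) = x² + y² − 1 = 0:

\[
\Lambda(x, y, \lambda) = f(x, y) - \lambda\, g(x, y) = x^2 + 2y^2 - \lambda\,(x^2 + y^2 - 1)
\]
\[
\frac{\partial \Lambda}{\partial x} = 2x(1 - \lambda) = 0, \qquad
\frac{\partial \Lambda}{\partial y} = 2y(2 - \lambda) = 0, \qquad
\frac{\partial \Lambda}{\partial \lambda} = -(x^2 + y^2 - 1) = 0
\]

The stationary points are (±1, 0) with λ = 1 and f = 1, and (0, ±1) with λ = 2 and f = 2; the constrained maximum is at (0, ±1), exactly where a level curve of f is tangent to the circle.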