Transcript ANN_2

Computing Gradient Vector and Jacobian Matrix in
Arbitrarily Connected Neural Networks
Authors : Bogdan M. Wilamowski, Fellow, IEEE, Nicholas J.
Cotton, Okyay Kaynak, Fellow, IEEE, and Günhan Dündar
Source : IEEE INDUSTRIAL ELECTRONICS MAGAZINE
Date : 2012/3/28
Presenter : 林哲緯
Outline
• Numerical Analysis Method
• Neural Network Architectures
• NBN Algorithm
Minimization problem
Newton's method
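For reference, the standard Newton update for minimizing an error function E(w) (textbook form, using the gradient and Hessian at the current weights) is:

```latex
% Newton's method: g_k is the gradient and H_k the Hessian of E at w_k
g_k = \nabla E(w_k), \qquad H_k = \nabla^2 E(w_k)
w_{k+1} = w_k - H_k^{-1} g_k
```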
Minimization problem
Steepest descent method
http://www.nd.com/NSBook/NEURAL%20AND%20ADAPTIVE%20SYSTEMS14_Adaptive_Linear_Systems.html
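The steepest descent update, in textbook form and using the α and g defined later on the weight-updating slide:

```latex
% Steepest descent: step along the negative gradient, scaled by the learning constant
w_{k+1} = w_k - \alpha \, g_k, \qquad g_k = \nabla E(w_k)
```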
Least squares problem
Gauss–Newton algorithm
http://en.wikipedia.org/wiki/Gauss%E2%80%93Newton_algorithm
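For a least-squares error built from an error vector e(w) with Jacobian J, the Gauss–Newton update (textbook form) is:

```latex
% Gauss-Newton: the Hessian is approximated by J^T J, so only first derivatives are needed
E(w) = \tfrac{1}{2}\, e(w)^{T} e(w)
w_{k+1} = w_k - \left(J_k^{T} J_k\right)^{-1} J_k^{T} e_k
```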
Levenberg–Marquardt algorithm
• Levenberg–Marquardt algorithm
– Combines the advantages of the Gauss–Newton
algorithm and the steepest descent method
– Far from the minimum it behaves like the steepest descent method
– Close to the minimum it behaves like the Gauss–Newton algorithm
– It finds a local minimum, not the global minimum
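The standard Levenberg–Marquardt update makes these two limiting behaviours explicit (μ, I, J, and e as defined on the weight-updating slide):

```latex
w_{k+1} = w_k - \left(J_k^{T} J_k + \mu I\right)^{-1} J_k^{T} e_k
% large \mu: the step is approximately -(1/\mu) J_k^T e_k, a small steepest-descent-like step
% \mu \to 0: the Gauss-Newton update is recovered
```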
Levenberg–Marquardt algorithm
• Advantages
– The weight update is obtained from a linear system of equations
– Only first-order derivatives (the Jacobian) are needed;
the exact second-order Hessian is not used at all
• Disadvantage
– Inverting (J^T J + μI) is required at every iteration,
which becomes expensive for networks with many weights
(see the sketch after this list)
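A minimal NumPy sketch of one possible Levenberg–Marquardt loop, only to illustrate the points above: the costly part is the linear solve with (J^T J + μI), and only the Jacobian (first derivatives) is needed. The residuals/jacobian callables and the factor-of-10 μ schedule are assumptions for illustration, not the paper's NBN implementation.

```python
import numpy as np

def lm_step(w, residuals, jacobian, mu):
    """One trial LM step: solve (J^T J + mu*I) dw = J^T e and return w - dw."""
    e = residuals(w)                   # error vector, shape (P * no,)
    J = jacobian(w)                    # Jacobian matrix, shape (P * no, n_weights)
    A = J.T @ J + mu * np.eye(w.size)  # damped quasi-Hessian
    g = J.T @ e                        # gradient vector
    dw = np.linalg.solve(A, g)         # the expensive "inversion" step
    return w - dw

def train_lm(w, residuals, jacobian, mu=0.01, iters=100):
    """Illustrative mu schedule: shrink mu after a successful step, grow it otherwise."""
    sse = float(np.sum(residuals(w) ** 2))
    for _ in range(iters):
        w_new = lm_step(w, residuals, jacobian, mu)
        sse_new = float(np.sum(residuals(w_new) ** 2))
        if sse_new < sse:      # improvement: accept, move toward Gauss-Newton behaviour
            w, sse, mu = w_new, sse_new, mu / 10.0
        else:                  # no improvement: reject, move toward steepest descent
            mu *= 10.0
    return w
```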
Outline
• Numerical Analysis Method
• Neural Network Architectures
• NBN Algorithm
Weight updating rule
First-order algorithm (steepest descent) : w_{k+1} = w_k - α g_k
Second-order algorithm (Levenberg–Marquardt) : w_{k+1} = w_k - (J_k^T J_k + μ I)^{-1} J_k^T e_k
MLP : multilayer perceptron
FCN : fully connected network
ACN : arbitrarily connected network
α : learning constant
g : gradient vector
J : Jacobian matrix
μ : learning parameter
I : identity matrix
e : error vector
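The two rules, written with the legend's symbols as a short NumPy illustration (α, μ, J, and e are assumed to be given; this is not the paper's code):

```python
import numpy as np

def first_order_update(w, g, alpha):
    """Steepest descent: w <- w - alpha * g."""
    return w - alpha * g

def second_order_update(w, J, e, mu):
    """Levenberg-Marquardt: w <- w - (J^T J + mu*I)^{-1} J^T e."""
    A = J.T @ J + mu * np.eye(w.size)
    return w - np.linalg.solve(A, J.T @ e)
```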
Forward & Backward Computation
Forward computation order : 12345, 21345, 12435, or 21435
Backward computation order : 54321, 54312, 53421, or 53412
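The listed orders are consistent with a five-neuron network in which neurons 1 and 2 feed neurons 3 and 4, which in turn feed neuron 5. Under that assumed topology, any topological order of the connection graph is a valid forward sequence, and its reverse is a valid backward sequence:

```python
from graphlib import TopologicalSorter

# Assumed connection graph (predecessors of each neuron), chosen to match the
# listed orders: 1 and 2 feed 3 and 4, and 3 and 4 feed 5.
predecessors = {3: {1, 2}, 4: {1, 2}, 5: {3, 4}}

forward = list(TopologicalSorter(predecessors).static_order())
backward = list(reversed(forward))
print("forward :", forward)    # one valid order, e.g. [1, 2, 3, 4, 5]
print("backward:", backward)   # e.g. [5, 4, 3, 2, 1]
```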
Jacobian matrix
Rows : patterns × outputs
Columns : weights
P = number of training patterns (input vectors)
no = number of outputs
Rows = 2 × 1 = 2
Columns = 8
Jacobian size = 2 × 8
Jacobian matrix
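With the row and column counts from the previous slide (2 patterns × 1 output = 2 rows, 8 weights = 8 columns), the Jacobian has one row of error derivatives per (pattern, output) pair:

```latex
J =
\begin{bmatrix}
\frac{\partial e_{1,1}}{\partial w_1} & \frac{\partial e_{1,1}}{\partial w_2} & \cdots & \frac{\partial e_{1,1}}{\partial w_8} \\
\frac{\partial e_{2,1}}{\partial w_1} & \frac{\partial e_{2,1}}{\partial w_2} & \cdots & \frac{\partial e_{2,1}}{\partial w_8}
\end{bmatrix}
\in \mathbb{R}^{(P \cdot n_o) \times n_w}
% e_{p,o}: error of output o for training pattern p; w_i: the i-th weight
```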
Outline
• Numerical Analysis Method
• Neural Network Architectures
• NBN Algorithm
Direct Computation of Quasi-Hessian
Matrix and Gradient Vector
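The quasi-Hessian matrix and gradient vector can be accumulated one Jacobian row at a time, so the full (P × M)-row Jacobian never has to be stored, which is where the memory saving in the conclusion comes from. A minimal NumPy sketch of that accumulation, with jacobian_row and error assumed to be user-supplied callables (not the paper's code):

```python
import numpy as np

def quasi_hessian_and_gradient(w, patterns, outputs, jacobian_row, error):
    """Accumulate Q ~= J^T J and g = J^T e row by row instead of storing J."""
    n = w.size
    Q = np.zeros((n, n))    # quasi-Hessian matrix
    g = np.zeros(n)         # gradient vector
    for p in patterns:                  # loop over training patterns
        for m in outputs:               # loop over network outputs
            j = jacobian_row(w, p, m)   # one Jacobian row, shape (n,)
            e = error(w, p, m)          # scalar error for (pattern p, output m)
            Q += np.outer(j, j)         # rank-one update of the quasi-Hessian
            g += j * e                  # corresponding gradient contribution
    return Q, g
```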
Conclusion
• The memory requirement for quasi-Hessian matrix
and gradient vector computation is decreased
by a factor of P × M (P patterns × M outputs)
• The method can be applied to arbitrarily connected
neural networks
• Two computation procedures
– with a backpropagation process (single output)
– without a backpropagation process (multiple
outputs)