Transcript C05-BPN.ppt

Artificial Neural Network: Back Propagation Network
朝陽科技大學
資訊管理系
李麗華 教授
Introduction (1)
• BPN = Back Propagation Network
• BPN is a layered feedforward supervised network.
• BPN provides an effective means of allowing a computer
to examine data patterns that may be incomplete or noisy.
• BPN can take various types of input, e.g., binary data or real-valued data.
• The output of BPN depends on the transfer function used:
(1) If the sigmoid function is used, the output satisfies 0 ≤ y ≤ 1.
(2) If the hyperbolic tangent function is used, the output satisfies -1 ≤ y ≤ 1.
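As a quick illustration of the two output ranges, here is a minimal Python sketch; the function names and test values are ours, not part of the slides:

```python
import numpy as np

def sigmoid(net):
    # Logistic sigmoid: every output lies in (0, 1)
    return 1.0 / (1.0 + np.exp(-net))

net = np.linspace(-5, 5, 11)        # arbitrary test inputs
print(sigmoid(net))                 # values between 0 and 1
print(np.tanh(net))                 # hyperbolic tangent: values between -1 and 1
```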
Introduction (2)
Architecture:
[Figure: three-layer feedforward architecture. Input nodes X1, X2, …, Xn feed hidden nodes H1, H2, …, Hh through weights Wih; the hidden nodes feed output nodes Y1, Y2, …, Yj through weights Whj; each hidden and output node has a bias term θ.]
Introduction (3)
• Input layer: [X1, X2, …, Xi, …, Xn].
• Hidden layer: there can be more than one hidden layer. Each hidden node derives a weighted sum net1, net2, …, neth and the transferred outputs H1, H2, …, Hh; these Hh values are then used as the inputs for deriving the results of the next layer.
• Output layer: [Y1,…,Yj].
• Weights: Wij.
• Transfer function: the nonlinear sigmoid function
$f(net_j) = \frac{1}{1 + e^{-net_j}}$
(*) The nodes in the hidden layers organize themselves in a way
that different nodes learn to recognize different features of the
total input space.
Introduction (4)
• The applications of BPN are quite broad:
– Pattern recognition (e.g., character recognition)
– Prediction (e.g., stock market prediction)
– Classification (e.g., customer segmentation)
– Learning (learning from data)
– Control (feedback and control)
– CRM (customer service segmentation)
Processing Steps (1)
The processing steps can be briefly described as follows.
1. Based on the problem domain, set up the network.
2. Randomly generate weights Wij.
3. Feed a training set, [X1,X2,…,Xn], into BPN.
4. Compute the weighted sum and apply the transfer function on each node in each layer, feeding the transferred data to the next layer until the output layer is reached.
5. The output pattern is compared to the desired output (T) and an error is computed for each unit.
Processing Steps (2)
6. Feed the error back to each node in the hidden layer.
7. Each unit in the hidden layer receives only a portion of the total error, and these errors are then fed back to the input layer.
8. Go to step 4 until the error is very small.
9. Repeat from step 3 for another training pattern (set).
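The nine steps above can be collected into a single training loop. The following NumPy sketch is one possible reading of them, assuming one hidden layer, the sigmoid transfer function, and biases subtracted as in the later slides; the stopping test is applied per pass over the whole training set, and all names (train_bpn, eta, tol, …) are illustrative.

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def train_bpn(X, T, n_hidden=2, eta=0.5, tol=1e-3, max_epochs=10000):
    """Steps 1-9: a single-hidden-layer BPN trained by error back-propagation."""
    rng = np.random.default_rng(0)
    n_in, n_out = X.shape[1], T.shape[1]
    # Steps 1-2: set up the network and randomly generate the weights/biases.
    W_ih = rng.uniform(-1, 1, (n_in, n_hidden))
    W_hj = rng.uniform(-1, 1, (n_hidden, n_out))
    th_h = rng.uniform(-1, 1, n_hidden)
    th_j = rng.uniform(-1, 1, n_out)

    for epoch in range(max_epochs):
        total_error = 0.0
        for x, t in zip(X, T):                      # step 3: feed a training pattern
            # Step 4: weighted sums + transfer function, layer by layer
            H = sigmoid(x @ W_ih - th_h)
            Y = sigmoid(H @ W_hj - th_j)
            # Step 5: compare with the desired output T
            total_error += 0.5 * np.sum((t - Y) ** 2)
            # Steps 6-7: propagate the error back through the layers
            delta_j = Y * (1 - Y) * (t - Y)
            delta_h = H * (1 - H) * (W_hj @ delta_j)
            # Weight/bias corrections (see "Computation Processes", steps 7-8)
            W_hj += eta * np.outer(H, delta_j)
            th_j += -eta * delta_j
            W_ih += eta * np.outer(x, delta_h)
            th_h += -eta * delta_h
        if total_error < tol:                       # step 8: stop when the error is small
            break
    return W_ih, W_hj, th_h, th_j
```

For the XOR exercise later in the slides, a call would look roughly like train_bpn(np.array([[-1,-1],[-1,1],[1,-1],[1,1]]), np.array([[0],[1],[1],[0]])).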
Computation Processes (1/10)
• The detailed computation process of BPN is as follows.
1. Set up the network according to the required input nodes and output nodes. Also, properly choose the number of hidden layers and hidden nodes.
2. Randomly assign the weights.
3. Feed the training pattern (set) into the network and do the following computation.
[Figure: network diagram for the computation. Inputs x1, …, Xi, …, Xn connect through weights Wih (…, Wnh) to hidden nodes computing net1/H1, …, neth/Hh with biases θ1, …, θh; the hidden nodes connect through weights Whj to output nodes Yj with biases θj.]
Computation Processes (2/10)
4. Compute from the input layer to the hidden layer, for each hidden node:
$net_h = \sum_i W_{ih}\, X_i - \theta_h$
$H_h = f(net_h) = \frac{1}{1 + e^{-net_h}}$
5. Compute from the hidden layer to the output layer, for each output node:
$net_j = \sum_h W_{hj}\, H_h - \theta_j$
$Y_j = f(net_j) = \frac{1}{1 + e^{-net_j}}$
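A small numeric sketch of the two computations in steps 4 and 5; the layer sizes, weights, biases, and input used here are arbitrary illustrations, not values from the slides.

```python
import numpy as np

def f(net):
    # Sigmoid transfer function used throughout these slides
    return 1.0 / (1.0 + np.exp(-net))

# Arbitrary example values (2 inputs, 2 hidden nodes, 1 output)
X = np.array([1.0, 0.0])
W_ih = np.array([[0.5, -0.3],
                 [0.2,  0.8]])          # W_ih[i, h]
theta_h = np.array([0.1, -0.1])
W_hj = np.array([[0.7],
                 [-0.4]])               # W_hj[h, j]
theta_j = np.array([0.2])

# Step 4: net_h = sum_i W_ih * X_i - theta_h ;  H_h = f(net_h)
net_h = X @ W_ih - theta_h
H = f(net_h)

# Step 5: net_j = sum_h W_hj * H_h - theta_j ;  Y_j = f(net_j)
net_j = H @ W_hj - theta_j
Y = f(net_j)
print(H, Y)
```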
Computation Processes (3/10)
6. Calculate the total error and compute the correction terms (deltas):
$\delta_j = Y_j (1 - Y_j)(T_j - Y_j)$
$\delta_h = H_h (1 - H_h) \sum_j W_{hj}\, \delta_j$
7. $\Delta W_{hj} = \eta\, \delta_j H_h$,  $\Delta\Theta_j = -\eta\, \delta_j$,  $\Delta W_{ih} = \eta\, \delta_h X_i$,  $\Delta\Theta_h = -\eta\, \delta_h$
8. Update the weights and biases:
$W_{hj} = W_{hj} + \Delta W_{hj}$,  $W_{ih} = W_{ih} + \Delta W_{ih}$,
$\Theta_j = \Theta_j + \Delta\Theta_j$,  $\Theta_h = \Theta_h + \Delta\Theta_h$
9. Repeat steps 4–8 until the error is very small.
10. Repeat steps 3–9 until all the training patterns are learned.
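Continuing in the same spirit, a sketch of steps 6–8 for one training pattern; η, T, and the forward-pass values H and Y below are made-up placeholders rather than numbers from the slides.

```python
import numpy as np

eta = 0.5                                   # learning rate (arbitrary here)
X = np.array([1.0, 0.0])                    # input pattern
T = np.array([1.0])                         # desired output
H = np.array([0.62, 0.45])                  # hidden outputs from a forward pass
Y = np.array([0.55])                        # network output from a forward pass
W_hj = np.array([[0.7], [-0.4]])            # hidden-to-output weights
W_ih = np.array([[0.5, -0.3], [0.2, 0.8]])  # input-to-hidden weights
theta_j = np.array([0.2])
theta_h = np.array([0.1, -0.1])

# Step 6: correction terms (deltas)
delta_j = Y * (1 - Y) * (T - Y)             # δj = Yj(1-Yj)(Tj-Yj)
delta_h = H * (1 - H) * (W_hj @ delta_j)    # δh = Hh(1-Hh) Σj Whj δj

# Step 7: weight and bias corrections
dW_hj = eta * np.outer(H, delta_j)          # ΔWhj = η δj Hh
dth_j = -eta * delta_j                      # ΔΘj  = -η δj
dW_ih = eta * np.outer(X, delta_h)          # ΔWih = η δh Xi
dth_h = -eta * delta_h                      # ΔΘh  = -η δh

# Step 8: update
W_hj += dW_hj;  theta_j += dth_j
W_ih += dW_ih;  theta_h += dth_h
```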
Exercise: Use BPN to solve XOR (1)
• Use BPN to solve the XOR problem. The training patterns are:

  X1  X2 | T
  -1  -1 | 0
  -1   1 | 1
   1  -1 | 1
   1   1 | 0

• Let W11 = 1, W21 = -1, W12 = -1, W22 = 1, W13 = 1, W23 = 1, Θ1 = 1, Θ2 = 1, Θ3 = 1, η = 10.
[Figure: 2-2-1 network for the XOR exercise. Inputs X1, X2 connect to hidden nodes H1 (bias Θ1) and H2 (bias Θ2) through weights W11, W21, W12, W22; the hidden nodes connect to the output node Y1 (bias Θ3) through weights W13, W23.]
Exercise: Use BPN to solve XOR (2)
• ΔW12 = η δ1 X1 = (10)(-0.018)(-1) = 0.18
• ΔW21 = η δ1 X2 = (10)(-0.018)(-1) = 0.18
• ΔΘ1 = -η δ1 = -(10)(-0.018) = 0.18
• The weight values after the first correction are shown below.
[Figure: the network redrawn with the weights after the first correction, showing 1.18 and -0.82 on the input-to-hidden connections, 0.754 on the hidden-to-output connections, and 1.915 on one of the bias terms.]
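A quick check of the arithmetic in the correction above, using only the values stated on the slide (η = 10, δ1 = -0.018, and the input pattern X1 = X2 = -1):

```python
eta = 10
delta_1 = -0.018           # δ1 as given on the slide
x = -1                     # X1 = X2 = -1 for this training pattern

dW = eta * delta_1 * x     # ΔW = η δ1 X ≈ 0.18, so e.g. W21: -1 -> -0.82
dTheta_1 = -eta * delta_1  # ΔΘ1 = -η δ1 ≈ 0.18
print(dW, dTheta_1)
```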
Discussion About BPN
1. As the number of hidden nodes increases, convergence becomes slower, but the error can be reduced further.
2. Common rules of thumb for choosing the number of hidden nodes (a small worked example follows this list):
# of hidden nodes = (input nodes + output nodes) / 2, or
# of hidden nodes = (input nodes × output nodes)^(1/2)
3. Usually one or two hidden layers are enough for learning a complex problem. Too many layers make learning very slow. When the problem is high-dimensional and very complex, an extra layer can be used.
4. The learning rate (η) is usually set within [0.1, 1.0], but it depends on how fast and how precisely the network should learn.
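A tiny worked example of the two rules of thumb in point 2, for a hypothetical network with 8 input nodes and 2 output nodes:

```python
import math

n_in, n_out = 8, 2                      # hypothetical layer sizes

print((n_in + n_out) / 2)               # (input + output) / 2  -> 5.0 hidden nodes
print(math.sqrt(n_in * n_out))          # sqrt(input * output)  -> 4.0 hidden nodes
```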
The Gradient Steepest Descent Method (SDM)
• The gradient steepest descent method.
• Recall:
$net_j^n = \sum_i W_{ij}\, A_i^{n-1} - \theta_j$
• We want the difference between the computed output and the expected output to get close to 0 (k represents the output nodes in the various layers):
$E = \frac{1}{2} \sum_k (T_k - A_k)^2, \qquad \Delta W_{ij} = -\eta\, \frac{\partial E}{\partial W_{ij}}$
• Therefore, we want to obtain $\frac{\partial E}{\partial W_{ij}}$ so that we can update the weights to improve the network results.
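Before the full derivation, the steepest-descent rule ΔW = -η ∂E/∂W can already be seen on a single weight. A minimal sketch, assuming a one-weight "network" y = w·x and the squared error E = ½(T − y)², with arbitrary numbers:

```python
# Steepest descent on E(w) = 1/2 * (T - w*x)^2 for a single weight w.
eta, x, T = 0.1, 2.0, 1.0
w = 0.0
for step in range(50):
    y = w * x
    dE_dw = -(T - y) * x      # ∂E/∂w
    w += -eta * dE_dw         # ΔW = -η ∂E/∂W
print(w, w * x)               # w -> 0.5, so the output w*x -> T = 1.0
```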
Proof of the Gradient Steepest Descent Method (SDM) (1)

$\frac{\partial E}{\partial W_{ij}} = \left(\frac{\partial E}{\partial net_j^n}\right)\left(\frac{\partial net_j^n}{\partial W_{ij}}\right) = \underbrace{\left(\frac{\partial E}{\partial A_j^n}\right)}_{(3)} \underbrace{\left(\frac{\partial A_j^n}{\partial net_j^n}\right)}_{(2)} \underbrace{\left(\frac{\partial net_j^n}{\partial W_{ij}}\right)}_{(1)}$

For (1):
$\frac{\partial net_j^n}{\partial W_{ij}} = \frac{\partial}{\partial W_{ij}}\left(\sum_k W_{kj} A_k^{n-1} - \theta_j\right) = A_i^{n-1}$

For (2):
$\frac{\partial A_j^n}{\partial net_j^n} = \frac{\partial f(net_j^n)}{\partial net_j^n} = f'(net_j^n)$

For (3-1): when n is the output layer,
$\frac{\partial E}{\partial A_j^n} = \frac{\partial}{\partial A_j^n}\left[\frac{1}{2}\sum_k (T_k - A_k^n)^2\right] = -(T_j - A_j^n)$

For (3-2): when n is a hidden layer,
$\frac{\partial E}{\partial A_j^n} = \sum_k \left(\frac{\partial E}{\partial net_k^{n+1}}\right)\left(\frac{\partial net_k^{n+1}}{\partial A_j^n}\right) = \sum_k \left(-\delta_k^{n+1}\right) W_{jk}$,
where we let $\delta_k^{n+1} \equiv -\frac{\partial E}{\partial net_k^{n+1}}$.
The Gradient Steepest Descent Method (SDM) (2)
• From (1), (2), and (3) we have two types of values:
When n is the output layer, we have
$-\frac{\partial E}{\partial W_{ij}} = (T_j - A_j^n)\, f'(net_j^n)\, A_i^{n-1} = \delta_j^n\, A_i^{n-1}$,
then we get $\delta_j^n = (T_j - A_j^n)\, f'(net_j^n)$.
When n is a hidden layer, we have
$-\frac{\partial E}{\partial W_{ij}} = \left[\sum_k \delta_k^{n+1} W_{jk}\right] f'(net_j^n)\, A_i^{n-1} = \delta_j^n\, A_i^{n-1}$,
then we get $\delta_j^n = \left[\sum_k \delta_k^{n+1} W_{jk}\right] f'(net_j^n)$.
The Gradient Steepest Descent Method (SDM) (3)
$-\eta\, \frac{\partial E}{\partial W_{ij}} = \eta\, \delta_j^n\, A_i^{n-1}$, so the update rules are:
$W_{ij} = W_{ij} + \Delta W_{ij}$, with $\Delta W_{ij} = \eta\, \delta_j^n\, A_i^{n-1}$
$\theta_j = \theta_j + \Delta\theta_j$, with $\Delta\theta_j = -\eta\, \delta_j^n$
The Gradient Steepest Descent Method (SDM) (4)
$f(net_j) = \frac{1}{1 + e^{-net_j}} = (1 + e^{-net_j})^{-1}$
$f'(net_j^n) = \left[-(1 + e^{-net_j})^{-2}\right]\left[-e^{-net_j}\right] = \frac{e^{-net_j}}{(1 + e^{-net_j})^2} = \frac{e^{-net_j}}{1 + e^{-net_j}} \cdot \frac{1}{1 + e^{-net_j}} = f(net_j)\,(1 - f(net_j))$
Therefore:
$\delta_j^n = \begin{cases} (T_j - Y_j)\, Y_j (1 - Y_j) & \text{if } n \text{ is the output layer} \\ \left[\sum_k \delta_k^{n+1} W_{jk}\right] H_j (1 - H_j) & \text{if } n \text{ is a hidden layer} \end{cases}$
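The identity f′(net) = f(net)(1 − f(net)) can also be checked numerically; a small sketch with arbitrary test points, comparing against a centered finite-difference estimate:

```python
import numpy as np

def f(net):
    return 1.0 / (1.0 + np.exp(-net))

net = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])          # arbitrary test points
eps = 1e-6

analytic = f(net) * (1 - f(net))                     # f'(net) = f(net)(1 - f(net))
numeric = (f(net + eps) - f(net - eps)) / (2 * eps)  # finite-difference estimate
print(np.max(np.abs(analytic - numeric)))            # tiny, so the two agree
```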
The Gradient Steepest Descent Method (SDM) (5)
• Learning computation
1. Compute the values of the hidden layer:
$net_h = \sum_i W_{ih}\, X_i - \theta_h$,  $H_h = f(net_h) = \frac{1}{1 + e^{-net_h}}$
2. Compute the values of the output layer:
$net_j = \sum_h W_{hj}\, H_h - \theta_j$,  $Y_j = f(net_j) = \frac{1}{1 + e^{-net_j}}$
3. Compute the correction terms (deltas):
$\delta_j = Y_j (1 - Y_j)(T_j - Y_j)$
$\delta_h = H_h (1 - H_h) \sum_j W_{hj}\, \delta_j$
The Gradient Steepest Descent Method (SDM) (6)
4. Compute the values to be updated:
$\Delta W_{hj} = \eta\, \delta_j H_h$,  $\Delta\theta_j = -\eta\, \delta_j$
$\Delta W_{ih} = \eta\, \delta_h X_i$,  $\Delta\theta_h = -\eta\, \delta_h$
5. Update the weights and biases:
$W_{hj} = W_{hj} + \Delta W_{hj}$,  $\theta_j = \theta_j + \Delta\theta_j$
$W_{ih} = W_{ih} + \Delta W_{ih}$,  $\theta_h = \theta_h + \Delta\theta_h$