Neural Networks. R & G Chapter 8

8.1 Feed-Forward Neural Networks
otherwise known as
• The Multi-layer Perceptron, or
• The Back-Propagation Neural Network
A diagrammatic representation of a Feed-Forward NN

[Figure 8.1: A fully connected feed-forward neural network. Input-layer nodes 1, 2, and 3 (x1 = 1.0, x2 = 0.4, x3 = 0.7) feed hidden-layer nodes j and i through weights W1j, W1i, W2j, W2i, W3j, W3i; the hidden nodes feed output-layer node k through weights Wjk and Wik, producing the output y. Inputs and outputs are numeric.]
Inputs and outputs
• Must be numeric, but can have any range in general.
• However, R & G prefer to constrain inputs and outputs to the (0-1) range.
Neural Network Input Format
Real input data values
are standardized (scaled) so that they all have ranges from 0 – 1.
newValue = (originalValue - minimumValue) / (maximumValue - minimumValue)    (Equation 8.1)

where newValue is the computed value falling in the [0,1] interval range, originalValue is the value to be converted, minimumValue is the smallest possible value for the attribute, and maximumValue is the largest possible attribute value.
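A minimal Python sketch of Equation 8.1 (the function name and the example attribute range are illustrative, not from the text):

```python
# Min-max scaling per Equation 8.1: map a raw value into [0, 1].

def scale_to_unit(original_value, minimum_value, maximum_value):
    """Return the scaled value in the [0, 1] interval."""
    return (original_value - minimum_value) / (maximum_value - minimum_value)

# Example: an "age" attribute known to range from 18 to 90.
print(scale_to_unit(36, 18, 90))  # 0.25
```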
Categorical input format
• We need a way to convert categories to numerical values.
• For "hair-colour" we might have values: red, blond, brown, black, grey.
• 3 APPROACHES:
1. Use of (5) dummy variables (BEST): Let XR = 1 if hair-colour = red, 0 otherwise, etc.
2. Use a binary array: 3 binary inputs can represent 8 numbers. Hence let red = (0,0,0), blond = (0,0,1), etc. However, this sets up false associations.
3. VERY BAD: red = 0.0, blond = 0.25, ..., grey = 1.0. Converts a nominal scale into a false interval scale.
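A sketch of approaches 1 and 2 in Python (function names are illustrative); note how the binary codes force unrelated categories to share bits, which is the source of the false associations:

```python
# Approach 1 (BEST): one dummy variable per category.
colours = ["red", "blond", "brown", "black", "grey"]

def dummy_encode(value):
    return [1 if value == c else 0 for c in colours]

print(dummy_encode("blond"))   # [0, 1, 0, 0, 0]

# Approach 2: a 3-bit binary array (8 codes cover 5 values),
# but bit-sharing creates false associations between categories.
def binary_encode(value):
    index = colours.index(value)
    return [(index >> b) & 1 for b in (2, 1, 0)]

print(binary_encode("blond"))  # [0, 0, 1]
```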
Calculating Neuron Output: The neuron threshold function. The following sigmoid function, called the standard logistic function, is often used to model the effect of a neuron.
Consider node i in the hidden layer. It has inputs x1, x2, and x3, each with a weight parameter.
x = w0,i + w1,i*x1 + w2,i*x2 + w3,i*x3 = w0,i + SUM(n=1..3) wn,i*xn
Then calculate the output from the following function:
f(x) = 1 / (1 + e^(-x));  e = 2.718...    (Equation 8.2)
[Figure 8.2: The sigmoid function. A plot of f(x) for x from -6 to 6; the curve rises from near 0 to near 1, passing through 0.5 at x = 0.]

Note: the output values are in the range (0,1). This is fine if we want to use our output to predict the probability of an event happening.
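A quick numerical check of Equation 8.2, as a minimal Python sketch:

```python
import math

# The standard logistic (sigmoid) function of Equation 8.2.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

for x in (-6, 0, 6):
    print(x, round(sigmoid(x), 4))   # -6 -> 0.0025, 0 -> 0.5, 6 -> 0.9975
```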
Other output types
• If we have a categorical output with several values, then we can use dummy output nodes for each value of the attribute. E.g. if we were predicting one of 5 hair-colour classes, we would have 5 output nodes, with 1 being certain yes and 0 being certain no.
• If we have a real output variable, with values outside the range (0-1), then another transformation would be needed to get realistic real outputs. Usually the inverse of the scaling transformation, i.e.

output = minimumValue + (output(0-1) * range)
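As a minimal sketch (names are illustrative):

```python
# Undo the scaling: map a (0,1) network output back to the real range.

def unscale(scaled_output, minimum_value, maximum_value):
    return minimum_value + scaled_output * (maximum_value - minimum_value)

# Example: a 0.25 network output for an attribute ranging 18-90.
print(unscale(0.25, 18, 90))  # 36.0
```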
Training the Feed-forward net
• The performance parameters of the feed-forward neural network are the weights.
• The weights have to be varied so that the predicted output is close to the true output value corresponding to the input values.
• Training of the ANN (Artificial Neural Net) is effected by:
  • Starting with arbitrary weights
  • Presenting the data, instance by instance
  • Adapting the weights according to the error for each instance
  • Repeating until convergence.
Table 8.1 Initial Weight Values for the Neural Network Shown in Figure 8.1

W1j = 0.20    W1i = 0.10
W2j = 0.30    W2i = -0.10
W3j = -0.10   W3i = 0.20
Wjk = 0.10    Wik = 0.50
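As a worked check, a minimal Python sketch of one forward pass through the Figure 8.1 network with these weights (Table 8.1 lists no bias weights, so the w0 term of the input sum is omitted here):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Inputs from Figure 8.1 and weights from Table 8.1.
x = [1.0, 0.4, 0.7]
w_j = [0.20, 0.30, -0.10]   # input -> hidden node j
w_i = [0.10, -0.10, 0.20]   # input -> hidden node i
w_jk, w_ik = 0.10, 0.50     # hidden -> output node k

out_j = sigmoid(sum(w * xv for w, xv in zip(w_j, x)))  # ~0.562
out_i = sigmoid(sum(w * xv for w, xv in zip(w_i, x)))  # ~0.550
y = sigmoid(w_jk * out_j + w_ik * out_i)               # ~0.582
print(round(out_j, 3), round(out_i, 3), round(y, 3))
```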
8.2 Neural Network Training: A Conceptual View
Supervised Learning/Training with Feed-Forward Networks
• Backpropagation Learning: the calculated error of each instance is used to amend the weights.
• Least squares fitting: all the errors for all instances are squared and summed (= ESS). All weights are then changed to lower the ESS.
BOTH METHODS GIVE THE SAME RESULTS.
IGNORE THE R & G GENETIC ALGORITHM STUFF.
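A minimal sketch of the instance-by-instance backpropagation update, shown for the hidden-to-output weights only (the learning rate is an assumed value, and a full implementation would also propagate the error back to the input-to-hidden weights):

```python
LEARNING_RATE = 0.5   # assumed value, not from the text

def update_output_weights(hidden_outputs, weights, y, target):
    # Sigmoid derivative expressed through the output itself: y * (1 - y).
    delta = (target - y) * y * (1.0 - y)
    # Move each weight in the direction that reduces the squared error.
    return [w + LEARNING_RATE * delta * h
            for w, h in zip(weights, hidden_outputs)]

# One update using the forward-pass values computed above.
print(update_output_weights([0.562, 0.550], [0.10, 0.50], 0.582, 1.0))
```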
Unsupervised Clustering with Self-Organizing Maps
[Figure 8.3: A 3x3 Kohonen network with two input-layer nodes. Each input-layer node connects to every output-layer node. For a data instance x, the winning output node's weight vector n is moved toward x by n' = n + r*(x - n).]
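A minimal Python sketch of that update (the learning rate, node count, and values here are illustrative):

```python
import math

r = 0.4                                        # learning rate (assumed)
nodes = [[0.2, 0.8], [0.6, 0.1], [0.9, 0.5]]   # a few output-node weight vectors
x = [0.5, 0.2]                                 # one data instance

# The winner is the node whose weight vector lies closest to x.
winner = min(nodes, key=lambda n: math.dist(n, x))

# Move the winner's weights a fraction r of the way toward x.
winner[:] = [n + r * (xv - n) for n, xv in zip(winner, x)]
print(winner)
```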
8.3 Neural Network Explanation • Sensitivity Analysis • Average Member Technique
8.4 General Considerations • What input attributes will be used to build the network? • How will the network output be represented?
• How many hidden layers should the network contain?
• How many nodes should there be in each hidden layer?
• What condition will terminate network training?
Neural Network Strengths • Work well with noisy data.
• Can process numeric and categorical data.
• Appropriate for applications requiring a time element.
• Have performed well in several domains.
• Appropriate for supervised learning and unsupervised clustering.
Weaknesses • Lack explanation capabilities.
• May not provide optimal solutions to problems.
• Overtraining can be a problem.
Building Neural Networks with iDA Chapter 9
9.1 A Four-Step Approach for Backpropagation Learning 1. Prepare the data to be mined.
2. Define the network architecture.
3. Watch the network train.
4. Read and interpret summary results.
Example 1: Modeling the Exclusive-OR Function
Table 9.1 The Exclusive-OR Function

Input 1   Input 2   XOR
1         1         0
0         1         1
1         0         1
0         0         0
[Figure 9.1: A graph of the XOR function, Input 1 vs. Input 2. The four instances sit at the corners of the unit square, with the two class-A points and the two class-B points diagonally opposite each other, so no single straight line separates the classes.]
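The text carries out these steps in iDA; as an assumed stand-in, here is a sketch of the same experiment with scikit-learn's MLPClassifier (a net this small may need a different random_state to converge on XOR):

```python
from sklearn.neural_network import MLPClassifier

X = [[1, 1], [0, 1], [1, 0], [0, 0]]   # Input 1, Input 2 (Table 9.1)
y = [0, 1, 1, 0]                        # XOR

# One hidden layer of 2 sigmoid nodes, mirroring a minimal XOR architecture.
net = MLPClassifier(hidden_layer_sizes=(2,), activation="logistic",
                    solver="lbfgs", max_iter=5000, random_state=1)
net.fit(X, y)
print(net.predict(X))   # ideally [0, 1, 1, 0]
```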
Step 1: Prepare The Data To Be Mined
Figure 9.2 XOR training data
Step 2: Define The Network Architecture
Figure 9.3 Dialog box for supervised learning
Figure 9.4 Training options for backpropagation learning
Step 3: Watch The Network Train
Figure 9.5 Neural network execution window
Step 4: Read and Interpret Summary Results
Figure 9.6 XOR output file for Experiment 1
Figure 9.7 XOR output file for Experiment 2
Example 2: The Satellite Image Dataset
Step 1: Prepare The Data To Be Mined
Figure 9.8 Satellite image data
Step 2: Define The Network Architecture
Figure 9.9 Backpropagation learning parameters for the satellite image data
Step 3: Watch The Network Train
Step 4: Read And Interpret Summary Results
Figure 9.10 Statistics for the satellite image data
Figure 9.11 Satellite image data: Actual and computed output
9.2 A Four-Step Approach for Neural Network Clustering
Step 1: Prepare The Data To Be Mined
The Deer Hunter Dataset
Step 2: Define The Network Architecture
Figure 9.12 Learning parameters for unsupervised clustering
Step 3: Watch The Network Train
Figure 9.13 Network execution window
Step 4: Read And Interpret Summary Results
Figure 9.14 Deer hunter data: Unsupervised summary statistics
Figure 9.15 Output clusters for the deer hunter dataset
9.3 ESX for Neural Network Cluster Analysis