Transcript: Model Essentials – Neural Networks

Chapter 5: Introduction to Predictive Modeling:
Neural Networks and Other Modeling Tools
5.1 Introduction
5.2 Input Selection
5.3 Stopped Training
5.4 Other Modeling Tools (Self-Study)
Model Essentials – Neural Networks
Predict new cases: prediction formula.
Select useful inputs: none.
Optimize complexity: stopped training.
Neural Network Prediction Formula

A neural network prediction is built from three kinds of estimates: a bias estimate, weight estimates, and hidden units. Each hidden unit applies the hyperbolic tangent activation function to a weighted sum of the inputs, and the prediction is the bias estimate plus a weighted sum of the hidden units:

prediction estimate = bias estimate + Σ (weight estimate × hidden unit)
hidden unit = tanh(bias estimate + Σ (weight estimate × input))

[Figure: the tanh activation function, an S-shaped curve rising from -1 to 1 as its argument runs from -5 to 5.]
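The slide's formula can be sketched in Python (a minimal illustration; the function names and the sample weight values are mine, not SAS Enterprise Miner's):

```python
import math

def hidden_unit(bias, weights, inputs):
    """One hidden unit: the tanh activation applied to a weighted sum of inputs."""
    return math.tanh(bias + sum(w * x for w, x in zip(weights, inputs)))

def prediction(bias, weights, hidden_units):
    """Prediction estimate: bias estimate plus weighted hidden-unit outputs."""
    return bias + sum(w * h for w, h in zip(weights, hidden_units))

# Hypothetical values for two inputs and a single hidden unit.
h1 = hidden_unit(-1.5, [-0.03, -0.07], [0.3, 0.8])
y_hat = prediction(0.25, [0.9], [h1])
```

Because tanh saturates at -1 and 1, each hidden unit contributes a bounded, S-shaped piece to the overall prediction.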
Neural Network Binary Prediction Formula

For a binary target, the prediction formula is wrapped in a logit link function: the weighted sum of tanh hidden units estimates logit(p̂) rather than the target itself, which keeps the predicted probability between 0 and 1.

logit(p̂) = bias estimate + Σ (weight estimate × hidden unit)

[Figure: the logistic curve that inverts the logit link, rising from 0 to 1.]
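Inverting the logit link turns the weighted sum of hidden units into a probability; a minimal sketch, with function names of my own choosing:

```python
import math

def logistic(eta):
    """Inverts the logit link: maps logit(p_hat) back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def predict_probability(bias, weights, hidden_units):
    """The weighted sum of hidden units estimates logit(p_hat);
    inverting the link yields the predicted probability."""
    eta = bias + sum(w * h for w, h in zip(weights, hidden_units))
    return logistic(eta)
```

A logit of 0 corresponds to a predicted probability of exactly 0.5; larger logits push the probability toward 1 and smaller logits toward 0.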
Neural Network Diagram

[Figure: network diagram. Inputs x1 and x2 form the input layer; each input feeds hidden units H1, H2, and H3 in the hidden layer; the hidden units feed the single target-layer output y.]
Prediction Illustration – Neural Networks

Before the logit equation can score cases, its weight estimates are needed.

[Figure: the (x1, x2) unit square, 0.0 to 1.0 on each axis, over which the logit equation will be evaluated.]
Weight estimates are found by maximizing the log-likelihood function.

[Figure: the same (x1, x2) unit square, annotated with the logit equation.]
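The quantity being maximized can be sketched as a Bernoulli log-likelihood (a generic formulation, not code from the course):

```python
import math

def log_likelihood(probabilities, targets):
    """Bernoulli log-likelihood maximized during training: each case with
    y = 1 contributes log(p_hat) and each case with y = 0 contributes
    log(1 - p_hat), so better-calibrated weights score higher."""
    return sum(y * math.log(p) + (1 - y) * math.log(1.0 - p)
               for p, y in zip(probabilities, targets))
```

Weight estimates that assign high probability to the observed outcomes give a larger (less negative) log-likelihood, which is what the optimizer searches for.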
Probability estimates are obtained by solving the logit equation for p̂ for each point (x1, x2).

[Figure: contours of the fitted probability over the unit square, with estimates ranging from about 0.30 to 0.70.]
Neural Nets: Beyond the Prediction Formula
• Manage missing values.
• Handle extreme or unusual values.
• Use non-numeric inputs.
• Account for nonlinearities.
• Interpret the model.
Training a Neural Network
This demonstration illustrates using the Neural Network tool.
5.2 Input Selection
Model Essentials – Neural Networks
Predict new cases: prediction formula.
Select useful inputs: none internally; a regression's sequential selection is used instead.
Optimize complexity: best model from the sequence.
5.01 Multiple Answer Poll
Which of the following are true about neural networks in SAS Enterprise Miner?
a. Neural networks are universal approximators.
b. Neural networks have no internal, automated process for selecting useful inputs.
c. Neural networks are easy to interpret and thus are very useful in highly regulated industries.
d. Neural networks cannot model nonlinear relationships.
5.01 Multiple Answer Poll – Correct Answers
Which of the following are true about neural networks in SAS Enterprise Miner?
a. Neural networks are universal approximators. (correct)
b. Neural networks have no internal, automated process for selecting useful inputs. (correct)
c. Neural networks are easy to interpret and thus are very useful in highly regulated industries.
d. Neural networks cannot model nonlinear relationships.
Selecting Neural Network Inputs
This demonstration illustrates how to use a logistic regression to select inputs for a neural network.
5.3 Stopped Training
Model Essentials – Neural Networks
Predict new cases: prediction formula.
Select useful inputs: sequential selection.
Optimize complexity: stopped training.
Fit Statistic versus Optimization Iteration

At iteration 0, the hidden-unit (output) weights are zero, so every case receives the same initial prediction, the logit of the overall primary-outcome proportion ρ̂:

logit(p̂) = logit(ρ̂) + 0·H1 + 0·H2 + 0·H3
H1 = tanh(-1.5 - .03·x1 - .07·x2)
H2 = tanh( .79 - .17·x1 - .16·x2)
H3 = tanh( .57 + .05·x1 + .35·x2)
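The initialization this slide describes can be written out as a sketch (function and variable names are mine; the exact random scheme SAS Enterprise Miner uses is not shown here):

```python
import math
import random

def initialize_network(n_hidden, n_inputs, rho_hat, seed=12345):
    """Initialization sketch: hidden-to-target weights start at zero, so
    every case initially gets the same prediction, logit(rho_hat), where
    rho_hat is the overall primary-outcome proportion; input weights and
    hidden-unit biases start at random values."""
    rng = random.Random(seed)
    hidden_layer = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs + 1)]
                    for _ in range(n_hidden)]          # bias plus input weights
    output_bias = math.log(rho_hat / (1.0 - rho_hat))  # logit of rho_hat
    output_weights = [0.0] * n_hidden
    return hidden_layer, output_bias, output_weights
```

Starting the output weights at zero makes the fit-statistic curve begin at the error of the constant (mean-only) model, which the optimizer then improves iteration by iteration.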
The input weights and hidden-unit biases shown in H1, H2, and H3 are random initial values.
Fit Statistic versus Optimization Iteration

[Figure, shown as an animation across iterations 0 through about 23: average squared error (ASE) for the training and validation data plotted against optimization iteration. Training ASE decreases steadily throughout, while validation ASE decreases at first and then turns upward, signaling overfitting.]
[Figure: the final ASE-versus-iteration plot beside the fitted probability contours (about 0.30 to 0.70) over the unit square. The weights from the iteration with the smallest validation ASE, around iteration 12 here, define the selected model.]
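The selection rule behind stopped training can be sketched as follows (the ASE trace is hypothetical, not values read from the figure):

```python
def select_stopped_training_model(validation_ase):
    """Stopped-training sketch: each iteration's weights define a candidate
    model, and the final model is the iteration with the smallest validation
    ASE rather than the last iteration reached."""
    return min(range(len(validation_ase)), key=validation_ase.__getitem__)

# Hypothetical ASE trace: validation error falls, then rises as the
# network begins to overfit the training data.
trace = [0.25, 0.20, 0.17, 0.16, 0.165, 0.18, 0.20]
best_iteration = select_stopped_training_model(trace)  # → 3
```

Note that training continues past the chosen iteration; stopping is applied retrospectively by picking the best validation result from the whole sequence.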
Increasing Network Flexibility
This demonstration illustrates how to further improve neural network performance.
Using the AutoNeural Tool (Self-Study)
This demonstration illustrates how to use the AutoNeural tool.
5.4 Other Modeling Tools (Self-Study)
Model Essentials – Rule Induction
Predict new cases: prediction rules / prediction formula.
Select useful inputs: split search / none.
Optimize complexity: ripping / stopped training.
Rule Induction Predictions
• Rips create prediction rules.
• A binary model sequentially classifies and removes correctly classified cases.
• A neural network predicts the remaining cases.

[Figure: fitted predictions over the (x1, x2) unit square; the rule regions score near 0.74 and 0.39.]
Model Essentials – Dmine Regression
Predict new cases: prediction formula.
Select useful inputs: forward selection.
Optimize complexity: stop R-square.
Dmine Regression Predictions
• Interval inputs are binned, and categorical inputs are grouped.
• Forward selection picks from the binned and the original inputs.
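The binning step can be sketched with a simple quantile scheme (my own construction for illustration; the Dmine Regression node's actual binning rules may differ):

```python
def bin_interval_input(values, n_bins=4):
    """Binning sketch: cut an interval input into roughly equal-count bins
    so that the binned version can compete with the original input during
    forward selection. Cut points are simple sample quantiles."""
    ranked = sorted(values)
    cut_points = [ranked[len(ranked) * i // n_bins] for i in range(1, n_bins)]
    return [sum(v >= cut for cut in cut_points) for v in values]
```

Binning lets forward selection capture a nonlinear effect of an interval input (via the bin indicators) even when the raw input has a weak linear association with the target.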
Model Essentials – DMNeural
Predict new cases: stagewise prediction formula.
Select useful inputs: principal components.
Optimize complexity: max stage.
DMNeural Predictions
• Up to three principal components with the highest target R square are selected.
• One of eight continuous transformations is selected and applied to the selected PCs.
• The process is repeated three times with the residuals from each stage.

[Figure: fitted predictions over the (x1, x2) unit square.]
Model Essentials – Least Angle Regression
Predict new cases: prediction formula.
Select useful inputs: generalized sequential selection.
Optimize complexity: penalized best fit.
Least Angle Regression Predictions
• Inputs are selected using a generalization of forward selection.
• By default, the input combination in the sequence with the optimal penalized validation assessment is selected.

[Figure: fitted predictions over the (x1, x2) unit square.]
Model Essentials – MBR
Predict new cases: training data nearest neighbors.
Select useful inputs: none.
Optimize complexity: number of neighbors.
MBR Prediction Estimates
• The sixteen nearest training data cases predict the target for each point in the input space.
• Scoring requires the training data and the PMBR procedure.

[Figure: fitted probability estimates over the (x1, x2) unit square.]
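The nearest-neighbor prediction can be sketched as follows (a generic k-NN sketch, not the PMBR procedure itself; the slide's default neighbor count is sixteen):

```python
import math

def mbr_predict(point, training_cases, k=16):
    """MBR sketch: the predicted probability at a point is the proportion of
    primary-outcome cases among its k nearest training cases, using
    Euclidean distance in the input space."""
    neighbors = sorted(training_cases,
                       key=lambda case: math.dist(point, case[0]))[:k]
    return sum(y for _, y in neighbors) / len(neighbors)

# Toy training data: ((x1, x2), target) pairs.
cases = [((0.0, 0.0), 1), ((0.0, 1.0), 1), ((1.0, 0.0), 0), ((1.0, 1.0), 0)]
p = mbr_predict((0.1, 0.5), cases, k=2)  # → 1.0
```

Because predictions are computed from stored cases rather than a formula, the whole training data set must travel with the model at scoring time, which is why the slide notes the PMBR procedure requirement.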
Model Essentials – Partial Least Squares
Predict new cases: prediction formula.
Select useful inputs: VIP (variable importance in projection).
Optimize complexity: sequential factor extraction.
Partial Least Squares Predictions
• Input combinations (factors) that optimally account for both predictor and response variation are successively selected.
• The factor count with the minimum validation PRESS statistic is selected.
• Inputs with small VIP are rejected for subsequent diagram nodes.

[Figure: fitted predictions over the (x1, x2) unit square.]
Exercises
This exercise reinforces the concepts discussed previously.
Neural Network Tool Review
Create a multilayer perceptron on selected inputs. Control complexity with stopped training and the hidden unit count.