Neural Networks in Medicine


Neural Networks and Logistic Regression
Lucila Ohno-Machado
Decision Systems Group
Brigham and Women’s Hospital
Department of Radiology
[Title graphic: “Coronary Disease”, a STOP sign, and “Neural Net”.]
Outline
• Examples, neuroscience analogy
• Perceptrons, MLPs: How they work
• How the networks learn from examples
• Backpropagation algorithm
• Learning parameters
• Overfitting
Examples in Medical Pattern Recognition
Diagnosis
• Protein Structure Prediction
• Diagnosis of Giant Cell Arteritis
• Diagnosis of Myocardial Infarction
• Interpretation of ECGs
• Interpretation of PET scans, Chest X-rays
Prognosis
• Prognosis of Breast Cancer
• Outcomes After Spinal Cord Injury
Myocardial Infarction Network
[Figure: a small network for myocardial infarction. Inputs: Duration of Pain = 2, Intensity of Pain = 4, ECG ST Elevation = 1, Smoker = 1, Age = 50, Male = 1. Output: 0.8, the “probability” of MI.]
Abdominal Pain Perceptron
[Figure: a perceptron for abdominal pain. Inputs: Male = 1, Age = 20, Temp = 37, WBC = 10, Pain Intensity = 1, Pain Duration = 1, connected through adjustable weights to one output unit per diagnosis: Appendicitis, Diverticulitis, Perforated Duodenal Ulcer, Non-specific Pain, Cholecystitis, Small Bowel Obstruction, Pancreatitis. Each output is 0 or 1.]
Biological Analogy
Perceptrons
[Figure: a perceptron. Input units (Cough, Headache) connect through adjustable weights to output units (No disease, Pneumonia, Flu, Meningitis). The Δ rule changes the weights to decrease the error, where error = what we wanted − what we got.]
Perceptrons
Output of unit j:  o_j = 1 / (1 + e^−(a_j + θ_j))
Input to unit j:   a_j = Σ_i w_ij a_i
Input to unit i:   a_i = measured value of variable i

(Input units i feed output units j directly; there is no hidden layer.)
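
These three equations are the whole forward pass of a perceptron unit. A minimal sketch in Python (the names mirror the slide’s notation; the input values, weights, and θ are made up for illustration):

```python
import math

def unit_output(inputs, weights, theta):
    """o_j = 1 / (1 + e^-(a_j + theta_j)), with a_j = sum_i w_ij * a_i."""
    a_j = sum(w_ij * a_i for w_ij, a_i in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-(a_j + theta)))

# a_i: measured values of the input variables (e.g., cough = 1, headache = 0)
inputs = [1.0, 0.0]
weights = [0.7, -0.3]   # w_ij: adjustable weights into unit j
print(unit_output(inputs, weights, theta=0.1))   # o_j is always between 0 and 1
```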
AND (θ = 0.5)

input (x1 x2) | output (y)
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 1

f(x1·w1 + x2·w2) = y
f(0·w1 + 0·w2) = 0
f(0·w1 + 1·w2) = 0
f(1·w1 + 0·w2) = 0
f(1·w1 + 1·w2) = 1

f(a) = 1, for a > θ; 0, for a ≤ θ

Some possible values for w1 and w2:
w1    w2
0.20  0.35
0.20  0.40
0.25  0.30
0.40  0.20
XOR (θ = 0.5)

input (x1 x2) | output (y)
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 0

f(x1·w1 + x2·w2) = y
f(0·w1 + 0·w2) = 0
f(0·w1 + 1·w2) = 1
f(1·w1 + 0·w2) = 1
f(1·w1 + 1·w2) = 0

f(a) = 1, for a > θ; 0, for a ≤ θ

Some possible values for w1 and w2:
w1    w2
(none: the constraints require w1 > 0.5, w2 > 0.5, yet w1 + w2 ≤ 0.5 — XOR is not linearly separable)
XOR

input (x1 x2) | output
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 0

[Figure: x1 and x2 feed a hidden unit (weights w1, w2) and also connect directly to the output z (weights w3, w4); the hidden unit feeds z with weight w5. θ = 0.5 for both units.]

z = f(w1, w2, w3, w4, w5)

f(a) = 1, for a > θ; 0, for a ≤ θ

A possible set of values for the ws:
(w1, w2, w3, w4, w5) = (0.3, 0.3, 1, 1, −2)
XOR
input (x1 x2) | output
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 0

[Figure: x1 and x2 feed two hidden units (weights w1–w4), which feed the output (weights w5, w6). θ = 0.5 for all units.]

output = f(w1, w2, w3, w4, w5, w6)

f(a) = 1, for a > θ; 0, for a ≤ θ

A possible set of values for the ws:
(w1, w2, w3, w4, w5, w6) = (0.6, −0.6, −0.7, 0.8, 1, 1)
Linear Separation
[Figure: in the (cough, headache) plane the four corners are 00 = No disease (no cough, no headache), 01 = Meningitis (no cough, headache), 10 = Pneumonia (cough, no headache), 11 = Flu (cough, headache); a straight line separates “Treatment” from “No treatment”. With three binary inputs the same idea becomes a separating plane through the cube of corners 000–111.]
Linear Discriminant

Y = aX + b

Logistic Regression

Y = 1 / (1 + e^−(aX + b))
Abdominal Pain
[Figure: the abdominal pain network again, with adjustable weights. Inputs: Male = 1, Age = 20, Temp = 37, WBC = 10, Pain Intensity = 1, Pain Duration = 1. One output unit per diagnosis (Appendicitis, Diverticulitis, Perforated Duodenal Ulcer, Non-specific Pain, Cholecystitis, Small Bowel Obstruction, Pancreatitis); one output is 1 and the rest are 0.]
Multilayered Perceptrons
Output of unit k:  o_k = 1 / (1 + e^−(a_k + θ_k))
Input to unit k:   a_k = Σ_j w_jk o_j
Output of unit j:  o_j = 1 / (1 + e^−(a_j + θ_j))
Input to unit j:   a_j = Σ_i w_ij a_i
Input to unit i:   a_i = measured value of variable i

(Input units i feed hidden units j, which feed output units k; the hidden layer is what distinguishes the multilayered perceptron from the perceptron.)
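
Stacking the perceptron equation twice gives the multilayered perceptron’s forward pass. A sketch in Python (the 3-2-1 architecture and all weight values are invented for illustration; θ is folded in as 0 for brevity):

```python
import math

def sigmoid(a, theta=0.0):
    return 1.0 / (1.0 + math.exp(-(a + theta)))

def layer(inputs, weight_matrix):
    """Each unit computes o = sigmoid(sum_i w_i * a_i) over the layer below."""
    return [sigmoid(sum(w * a for w, a in zip(row, inputs)))
            for row in weight_matrix]

x = [34.0, 1.0, 4.0]                                # input units a_i
w_ij = [[0.10, 0.20, -0.30], [-0.20, 0.40, 0.10]]   # inputs -> 2 hidden units
w_jk = [[0.80, -0.50]]                              # hidden -> 1 output unit
o_j = layer(x, w_ij)     # hidden outputs o_j
o_k = layer(o_j, w_jk)   # network output o_k, between 0 and 1
print(o_k)
```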
Regression vs. Neural Networks
[Figure: left, a regression model Y = a(X1) + b(X2) + c(X3) + d(X1X2) + …, where the modeler chooses among the (2³ − 1) possible combinations of main effects and interaction terms (X1, X2, X3, X1X2, X1X3, X2X3, X1X2X3). Right, a neural network on X1, X2, X3 whose hidden units play the role of terms such as “X1”, “X2”, “X1X3”, “X1X2X3”.]
Logistic Regression
• One independent variable:
  f(x) = 1 / (1 + e^−(ax + cte))
• Two independent variables:
  f(x) = 1 / (1 + e^−(ax1 + bx2 + cte))

[Plot: f(x) rises from 0 to 1 as x increases.]
Logistic function

p = 1 / (1 + e^−(ax + cte))

log(p / (1 − p)) = ax + cte   (linear)

[Plot: log(p / (1 − p)) is a straight line in x with slope a.]
Logistic function

p = 1 / (1 + e^−(ax + cte))

log(p / (1 − p)) = ax + cte   (linear)

a is the increase in log odds for 1 unit of increase in x (e^a is the odds ratio)
Jargon Pseudo-Correspondence
• Independent variable = input variable
• Dependent variable = output variable
• Coefficients = “weights”
• Estimates = “targets”
• Cycles = epochs
Logistic Regression Model
[Figure: a logistic regression model drawn as a network. Inputs (independent variables x1, x2, x3): Age = 34, Gender = 1, Stage = 4. Coefficients (a, b, c): 0.5, 0.4, 0.8. The weighted inputs feed a summation node Σ; the output (dependent variable, the prediction) is 0.6, the “probability of being alive”.]
Sis the sum of inputs * weights
Inputs
Output
Age
34
5
Gender
1
4
Stage
4
8
Independent
variables
S=34*.5+1*.4+4*.8=20.6
Coefficients
Prediction
Logistic function
[Figure: the same model; the logistic function is applied to the sum, and the output is 0.6, the “probability of being alive”.]

p = 1 / (1 + e^−(Σ + cte))
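
Tracing the slide’s numbers through in Python: the sum is exactly 20.6, and an intercept cte of about −20.19 (not given on the slide; assumed here) makes the logistic output come out near the 0.6 shown:

```python
import math

age, gender, stage = 34, 1, 4
a, b, c = 0.5, 0.4, 0.8                # coefficients from the slide
cte = -20.19                           # assumed intercept, chosen so p is ~0.6

s = age * a + gender * b + stage * c   # 34*.5 + 1*.4 + 4*.8 = 20.6
p = 1.0 / (1.0 + math.exp(-(s + cte)))
print(s, round(p, 2))                  # 20.6, ~0.6: "probability of being alive"
```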
Activation Functions...
• Linear
• Threshold or step function
• Logistic, sigmoid, “squash”
• Hyperbolic tangent
Neural Network Model
[Figure: a neural network. Inputs (independent variables): Age = 34, Gender = 2, Stage = 4, connected by a first set of weights (.6, .2, .4, .5, .1, .2, .3, .7) to hidden-layer Σ units, which connect by a second set of weights (.8, .2, …) to the output: 0.6, the “probability of being alive” (dependent variable, the prediction).]
“Combined logistic models”
[Figure: the same network; the path through one hidden unit is highlighted, showing that each hidden unit is itself a small logistic model of the inputs.]
[Figure: the same network with the path through a second hidden unit highlighted.]
[Figure: the same network with the path through a third hidden unit highlighted.]
Not really: there is no target for the hidden units...

[Figure: the full network again. The hidden units’ outputs are never observed, so each hidden “logistic model” cannot be fitted separately against a target.]
Perceptrons
[Figure, repeated from earlier: the perceptron with input units (Cough, Headache), adjustable weights, and output units (No disease, Pneumonia, Flu, Meningitis); the Δ rule changes the weights to decrease the error, where error = what we wanted − what we got. For a hidden unit there is no “what we wanted”, which is the problem the next slides address.]
Hidden Units and Backpropagation
Error Functions
• Mean Squared Error (for most problems):
  E = Σ (t − o)² / n
• Cross Entropy Error (for dichotomous or binary outcomes):
  E = −Σ [ t ln(o) + (1 − t) ln(1 − o) ]
Minimizing the Error
[Figure: an error surface over the weights. Starting at w_initial with the initial error, a positive change in w follows the negative derivative downhill to w_trained, where the final error sits in a local minimum.]
Numerical Methods
a·x³ + b·x² + c·x + d = 0

[Figure: iterative root finding; a first pair of guessed roots brackets the zero crossing (one giving y > 0, one y < 0), and a second, tighter pair of guesses narrows it down.]
Gradient descent
[Figure: error as a function of the weights; gradient descent can settle in a local minimum instead of the global minimum.]
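
A minimal gradient-descent sketch, assuming a single logistic output unit trained on the AND example from earlier; for hidden units, backpropagation applies the same idea by propagating these derivatives backwards through the layers:

```python
import math, random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

# Toy training set for AND (a constant 1 input stands in for the bias/theta).
data = [([0, 0, 1], 0), ([0, 1, 1], 0), ([1, 0, 1], 0), ([1, 1, 1], 1)]
random.seed(0)
w = [random.uniform(-0.5, 0.5) for _ in range(3)]
lr = 0.5                                   # learning rate

for epoch in range(2000):                  # one epoch = one pass over the examples
    for x, t in data:
        o = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        # For cross-entropy error the gradient wrt each weight is (o - t) * x_i,
        # so we step downhill along the negative derivative of the error.
        w = [wi - lr * (o - t) * xi for wi, xi in zip(w, x)]

for x, t in data:
    print(x[:2], t, round(sigmoid(sum(wi * xi for wi, xi in zip(w, x))), 2))
```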
Overfitting
[Figure: a “real” distribution and an overfitted model that chases individual points.]
Overfitting
[Figure: total sum of squares (tss) vs. epochs. The training-set curve (b) keeps decreasing while the test-set curve (a) reaches a minimum and then rises as the model overfits; the stopping criterion is min(Δtss) on the test set.]
Overfitting in Neural Nets
[Figure: left, CHD as a function of age, with a smooth “real” model and a wiggly overfitted model; right, error vs. training cycles, falling on the training set but turning upward on the holdout set.]
Parameter Estimation
Logistic regression
• It models “just” one function
– Maximum likelihood
– Fast
– Optimizations
  • Fisher
  • Newton-Raphson

Neural network
• It models several functions
– Backpropagation
– Iterative
– Slow
– Optimizations
  • Quickprop
  • Scaled conjugate gradient descent
  • Adaptive learning rate
What do you want?
Insight versus prediction
Insight into the model
• Explain importance of each variable
• Assess model fit to existing data

Accurate predictions
• Make a good estimate of the “real” probability
• Assess model prediction in new data
Model Selection
Finding influential variables
Logistic
• Forward
• Backward
• Stepwise
• Arbitrary
• All combinations
• Relative risk
Neural Network
• Weight elimination
• Automatic Relevance Determination
• “Relevance”
Regression Diagnostics
Finding influential observations
Logistic
• Analysis of residuals
• Cook’s distance
• Deviance
• Difference in coefficients when a case is left out

Neural Network
• Ad hoc
How accurate are predictions?
• Construct training and test sets or bootstrap to assess “unbiased” error
• Assess:
– Discrimination: how the model “separates” the alive and the dead
– Calibration: how close the estimates are to the “real” probability
“Unbiased” Evaluation
Training and Test Sets

• The training set is used to build the model (it may include a holdout set to control for overfitting)
• The test set is left aside for evaluation purposes
• Ideal: yet another validation data set, from a different source, to test whether the model generalizes to other settings
Small sets: Cross-validation
• Several training and test set pairs are created so that the union of all test sets corresponds exactly to the original set
• Results from the different models are pooled and overall performance is estimated
• “Leave-n-out”
• Jackknife
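
A sketch of the fold construction, assuming k-fold cross-validation (the index-based splitting is hypothetical; “leave-n-out” corresponds to test folds of size n):

```python
import random

def k_fold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k disjoint test folds whose union is the whole set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

n, k = 20, 5
for fold, test_idx in enumerate(k_fold_indices(n, k)):
    train_idx = [i for i in range(n) if i not in set(test_idx)]
    # Fit the model on train_idx, evaluate on test_idx, then pool the k results.
    print(fold, sorted(test_idx))
```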
ECG Interpretation
[Figure: a network for ECG interpretation. Inputs: QRS amplitude, R-R interval, QRS duration, AVF lead, S-T elevation, P-R interval. Outputs: SV tachycardia, ventricular tachycardia, LV hypertrophy, RV hypertrophy, myocardial infarction.]
Thyroid Diseases
[Figure: a two-stage network for thyroid disease. Clinical findings and patient data (TSH, T4U, T3, TT4, TBG) feed a hidden layer (5 or 10 units) that produces partial diagnoses: Normal, Hypothyroidism, Hyperthyroidism, Other conditions. Patients who will be evaluated further go, with additional input, to a second network with its own hidden layer (5 or 10 units), yielding final diagnoses: Normal, Primary hypothyroidism, Compensated hypothyroidism, Secondary hypothyroidism, Other conditions.]
Time Series
[Figure: a network for time series. Input units (independent variables): X_n, X_{n+1}. Hidden units; weights (estimated parameters). Output unit (dependent variable): Y = X_{n+2}.]
Time Series
DXn+1n+2
Y = Xn+2
Output units
(dependent v ariables)
Weights
(estimated parameters)
Hidden units
Input units
(independent v ariables)
Xn
DXn
n+1
X n+1
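
Constructing the training pairs for the first of these networks is just a sliding window over the series. A sketch (the series values are invented; adding the ΔX differences of the second slide would be one more column per pair):

```python
def lagged_examples(series):
    """Build (inputs, target) pairs: predict X_{n+2} from X_n and X_{n+1}."""
    return [((series[n], series[n + 1]), series[n + 2])
            for n in range(len(series) - 2)]

series = [1.0, 1.2, 1.1, 1.4, 1.3, 1.6]
for (xn, xn1), y in lagged_examples(series):
    print(xn, xn1, "->", y)
```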
Evaluation

Evaluation: Area Under ROCs

[Figure: ROC curves for the models being compared.]

ROC Analysis: Variations
• Area under ROC (the Wilcoxon statistic)
• ROC slope and intercept
• Confidence interval
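
The equivalence between the area under the ROC curve and the Wilcoxon (Mann-Whitney) statistic makes the AUC easy to compute directly. A sketch with invented scores and outcomes:

```python
def auc(scores, labels):
    """Area under the ROC curve via the Wilcoxon/Mann-Whitney statistic:
    the probability that a randomly chosen positive case is ranked above
    a randomly chosen negative case (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative predictions and outcomes (1 = event, 0 = no event).
print(auc([0.9, 0.8, 0.6, 0.4, 0.3], [1, 1, 0, 1, 0]))   # 5/6 ~ 0.83
```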
Expert Systems and Neural Nets
Model Comparison
(personal biases)
                       Modeling   Examples   Explanation
                       Effort     Needed     Provided
Rule-based Exp. Syst.  high       low        high
Bayesian Nets          high       low        moderate
Classification Trees   low        high       “high”
Neural Nets            low        high       low
Regression Models      high       moderate   moderate
Conclusion
Neural Networks are
• mathematical models that resemble nonlinear regression models, but are also useful to model nonlinearly separable spaces
• “knowledge acquisition tools” that learn from examples

Neural Networks in Medicine are used for:
– pattern recognition (images, diseases, etc.)
– exploratory analysis, control
– predictive models
Conclusion
• No definitive indication for using either logistic regression or neural networks
• Try both; select the best
• Make an unbiased evaluation
• Compare statistically
Some References
Introductory Textbooks
• Rumelhart DE, McClelland JL (eds). Parallel Distributed Processing. MIT Press, Cambridge, 1986.
• Hertz JA, Palmer RG, Krogh AS. Introduction to the Theory of Neural Computation. Addison-Wesley, Redwood City, 1991.
• Pao YH. Adaptive Pattern Recognition and Neural Networks. Addison-Wesley, Reading, 1989.
• Reggia JA. Neural computation in medicine. Artificial Intelligence in Medicine, 1993 Apr, 5(2):143–57.
• Miller AS, Blott BH, Hames TK. Review of neural network applications in medical imaging and signal processing. Medical and Biological Engineering and Computing, 1992 Sep, 30(5):449–64.
• Bishop CM. Neural Networks for Pattern Recognition. Clarendon Press, Oxford, 1995.