EEG-based Machine Learning Methods
for Applications in Psychiatry
Jim Reilly
Gary Hasey
Hubert de Bruin
Ahmad Khodayari-R
Duncan MacCrimmon
ON Semiconductor,
April 11, 2011
This is a team effort!
Our research team:
Gary M. Hasey
Ahmad Khodayari-R.
James P. (Jim) Reilly
Hubert de Bruin
Duncan MacCrimmon
Cathy Ivanski
Rose Marie Mueller
Jackie Heaslip
Sandra Chalmers
Joy Fournier
Margarita Criollo
Eleanor Bard
…
Thanks to all the nurses and staff who helped with the clinical experiments!
Outline
• Subject: Machine learning (ML) for prediction of response to psychiatric therapy
• Motivation
• Overview of ML techniques
    - Feature extraction
    - Feature selection/reduction
    - Classification
    - Validation
• Results
• Commercial Potential
MAJOR DEPRESSIVE DISORDER
2nd LARGEST CAUSE OF WORKPLACE DISABILITY, ages 15-44
• 37,076,000 on antidepressant drugs in US, Canada, EU, Australia
• 3rd largest class of pharmaceuticals world-wide
• Most commonly prescribed class of drugs in USA
• >1/3 of female office visits in USA involved an antidepressant drug (ADD)
• Use increased by 75% from 1996 to 2005 (Centers for Disease Control)
• 5.8% of Canadians and 10.1% of Americans are on ADD
• 68% of ADD prescribed by family MD
http://seekingalpha.com/article/22433-antidepressant-drug-market-new-fda-warning-to-have-limited-impact
Washington Post December 3, 2004; Page A15
http://www.cnn.com/2007/HEALTH/07/09/antidepressants/index.html
http://psychcentral.com/news/2009/08/03/antidepressant-use-up-75-percent/7514.html
The current “State of the Art” for antidepressant drug selection
Random selection? Keep trying until one fits
How Effective Is the “State of the Art”?
STAR*D Study (Sequenced Treatment Alternatives to Relieve Depression)
✓ ✗ ✗ 1st choice is wrong in 2 of 3 patients
Warden, D., et al., The STAR*D Project results: a comprehensive review of findings. Curr Psychiatry Rep, 2007. 9(6): p. 449-59.
COST OF ACHIEVING REMISSION
If initial treatment works [1]: $3,600
If initial treatment fails [2]: $16,000
[1] Baker, C. B. and S. W. Woods (2001). "Cost of treatment failure for major depression: direct costs of continued treatment." Administration and Policy in Mental Health 28(4): 263-277 (1995 costs quoted, adjusted for inflation).
[2] Malone, D. C. (2007). "A budget-impact and cost-effectiveness model for second-line treatment of major depression." J Manag Care Pharm 13(6 Suppl A): S8-18.
How We Propose to Fix This Problem
1. Establish diagnosis
2. Collect pre-treatment QEEG
3. Treat: SSRI, rTMS or Clozaril
4. Measure treatment response
5. Use response data, diagnosis & QEEG to train computer
6. Test predictive accuracy using “leave N out” or an independent sample
Marketed service: confirms diagnosis, recommends specific treatment, self-improving feedback loop
Overview of the Prediction Procedure
• 22 subjects were prescribed SSRI medication after pre-treatment EEG
• Response (R or NR) is recorded 6 weeks after onset of treatment
• Responder is defined as 25% improvement in Hamilton Depression Rating Score
• Training data: consists of subject EEG data and corresponding response value
Machine Learning Method
Steps of the prediction procedure:
1. Extraction of features from the EEG
2. Feature selection / dimensionality reduction
3. Design of the predictor using a classifier
4. Performance evaluation by cross-validation
1. Extraction of features
• Compute statistical parameters from EEG (from 4 to 32 Hz in 1 Hz increments):
    - Spectral coherence between all electrode pairs
    - Mutual information between all electrode pairs
    - Absolute and relative power spectral density (PSD) levels
    - Left-to-right hemisphere power ratios
    - Anterior/posterior power ratios
• Results in 4336 features!
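As a rough illustration of the PSD-based features only (coherence and mutual information are not covered), the sketch below computes absolute power, relative power, and a left-to-right power ratio at the 4 to 32 Hz analysis frequencies using a direct DFT sum. The function names, the sampling rate, and the one-second epoch length are assumptions made for the example, not details from the slides.

```python
import cmath
import math

def power_at(signal, fs, freq_hz):
    """Periodogram power of `signal` at one frequency (Hz), via a direct DFT sum."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * k / fs)
              for k, x in enumerate(signal))
    return abs(acc) ** 2 / n

def power_spectrum(signal, fs, freqs=range(4, 33)):
    """Absolute power at each analysis frequency (4-32 Hz in 1 Hz steps)."""
    return {f: power_at(signal, fs, f) for f in freqs}

def relative_power(spectrum):
    """Each frequency's share of the total power in the analysed band."""
    total = sum(spectrum.values())
    return {f: p / total for f, p in spectrum.items()}

def power_ratio(sig_a, sig_b, fs, freq_hz):
    """Power ratio between two channels (e.g. left vs. right) at one frequency."""
    return power_at(sig_a, fs, freq_hz) / power_at(sig_b, fs, freq_hz)

# Hypothetical one-second epochs sampled at 128 Hz (assumed values):
fs = 128
left = [2 * math.sin(2 * math.pi * 10 * k / fs) for k in range(fs)]
right = [math.sin(2 * math.pi * 10 * k / fs) for k in range(fs)]
ratio_10hz = power_ratio(left, right, fs, 10)
```

With a one-second epoch each DFT bin falls on an integer frequency, which is why the 1 Hz grid is convenient in this toy setup; a real pipeline would average over many epochs.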
2. Feature Selection
• The 4336 candidate features are highly correlated
• Most have no statistical dependence on the target variable (response)
• We select only those with the most statistical relevance, using a modified form of the method due to Peng [2]
[2] H. Peng et al., IEEE Trans. PAMI, Aug. 2005
2. Feature Selection (Cont’d)
• Regularized iterative feature selection based on Kullback-Leibler (KL) distance
• j-th iteration: the selection criterion has two terms
    - First term describes relevance (relationship with target variable)
    - Second term describes redundancy with previous features
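The relevance/redundancy trade-off can be sketched with the standard greedy criterion from Peng et al. (relevance to the target minus mean redundancy with the features already chosen). This is not the regularized KL-based variant used in the talk, whose exact form is not given here. It assumes features have already been discretized, and all names are illustrative.

```python
import math
from collections import Counter

def mutual_information(a, b):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (pa[u] * pb[v]))
               for (u, v), c in pab.items())

def mrmr_select(features, target, n_select):
    """Greedy selection; `features` is a list of columns (one list per feature)."""
    selected = []
    remaining = list(range(len(features)))
    while remaining and len(selected) < n_select:
        def score(j):
            # First term: relevance to the target variable.
            relevance = mutual_information(features[j], target)
            if not selected:
                return relevance
            # Second term: mean redundancy with previously selected features.
            redundancy = sum(mutual_information(features[j], features[i])
                             for i in selected) / len(selected)
            return relevance - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: feature 1 duplicates feature 0; feature 2 carries new information.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
response = [0, 1, 1, 1]
chosen = mrmr_select([a, list(a), b], response, n_select=2)
```

In the toy example the duplicated feature is never chosen together with its copy, because its redundancy term cancels its relevance.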
3. Classification Procedure
• Input: selected feature vector for a specific subject
• Output: responder (R) or non-responder (NR) category for each subject
• Classifier structure (many available):
    - Support vector machine
    - Kernelized partial least squares regression (KPLS) procedure
    - Etc.
4. Performance Evaluation
• Nested (11-fold) cross-validation procedure
• Performance is biased upwards unless training is independent of the test set [3]
• Therefore we perform
    - parameter optimization
    - feature selection
    - testing
  independently in each fold
[3] e.g., Hastie, Tibshirani and Friedman, “The Elements of Statistical Learning”
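A minimal sketch of the nested idea follows: within each outer fold, the number of features is tuned by an inner cross-validation and the feature ranking is recomputed, both on training subjects only. A nearest-centroid rule and a class-mean-gap ranking stand in for the actual classifier and feature selector; those substitutions, and every function name, are assumptions for illustration.

```python
from statistics import mean

def make_folds(n, k):
    """Partition indices 0..n-1 into k disjoint test folds."""
    return [list(range(i, n, k)) for i in range(k)]

def rank_features(X, y):
    """Toy relevance ranking: absolute gap between the two class means."""
    classes = sorted(set(y))  # assumes both classes occur in the training data
    def gap(j):
        m = [mean(x[j] for x, lab in zip(X, y) if lab == c) for c in classes]
        return abs(m[0] - m[1])
    return sorted(range(len(X[0])), key=gap, reverse=True)

def fit(X, y, feats):
    """Nearest-centroid stand-in classifier: one mean vector per class."""
    return {c: [mean(x[j] for x, lab in zip(X, y) if lab == c) for j in feats]
            for c in set(y)}

def predict(model, x, feats):
    def dist(c):
        return sum((x[j] - m) ** 2 for j, m in zip(feats, model[c]))
    return min(model, key=dist)

def split(X, y, test):
    held = set(test)
    Xtr = [x for i, x in enumerate(X) if i not in held]
    ytr = [lab for i, lab in enumerate(y) if i not in held]
    return Xtr, ytr

def cv_accuracy(X, y, k, n_feats):
    """Plain k-fold accuracy; ranking is redone on every training split."""
    hits = 0
    for test in make_folds(len(y), k):
        Xtr, ytr = split(X, y, test)
        feats = rank_features(Xtr, ytr)[:n_feats]
        model = fit(Xtr, ytr, feats)
        hits += sum(predict(model, X[i], feats) == y[i] for i in test)
    return hits / len(y)

def nested_cv(X, y, outer_k, inner_k, candidates):
    """Outer folds test; inner folds (training data only) pick n_feats."""
    hits = 0
    for test in make_folds(len(y), outer_k):
        Xtr, ytr = split(X, y, test)
        best = max(candidates, key=lambda m: cv_accuracy(Xtr, ytr, inner_k, m))
        feats = rank_features(Xtr, ytr)[:best]
        model = fit(Xtr, ytr, feats)
        hits += sum(predict(model, X[i], feats) == y[i] for i in test)
    return hits / len(y)
```

For example, `nested_cv(X, y, outer_k=11, inner_k=5, candidates=[1, 2])` on 22 subjects holds out two subjects per outer fold; the inner fold count and candidate list are arbitrary choices here.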
Results
Contingency table for SSRI medication:
             Predicted NR   Predicted R   % correct
Actual NR    12             2             Specificity = 85.7%
Actual R     1              7             Sensitivity = 87.5%
Average performance = 86.6%
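The percentages follow directly from the four cell counts, treating "responder" as the positive class. A small check, with an illustrative function name:

```python
def binary_rates(tn, fp, fn, tp):
    """Specificity, sensitivity and their unweighted average from a 2x2 table."""
    specificity = tn / (tn + fp)   # actual NR correctly predicted NR
    sensitivity = tp / (tp + fn)   # actual R correctly predicted R
    return specificity, sensitivity, (specificity + sensitivity) / 2

# SSRI table: actual NR -> 12 predicted NR, 2 predicted R;
#             actual R  -> 1 predicted NR, 7 predicted R.
spec, sens, avg = binary_rates(tn=12, fp=2, fn=1, tp=7)
print(f"specificity={spec:.1%} sensitivity={sens:.1%} average={avg:.1%}")
```

The "average performance" reported is the mean of the two rates, which weights both classes equally regardless of group size.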
2-D representation of feature space obtained using kernel PCA
Multiple points (epochs) per subject
Clustering behaviour verifies that classes can be well separated with a straight line
2-D representation of scatter plot after averaging over available EEG epochs
Overfitting?
• It is difficult to prove that the model has not over-fit the data
• Rules of thumb
    - Complexity of model (number of parameters) should be small in comparison to number of training points
    - Test set must be independent of the training set
A list of most-discriminating features showing the mean and standard deviation of each feature in non-responder (N) and responder (R) groups
Most discriminating features
• 9-16 Hz bandwidth
• Mostly left hemisphere
• Dominant electrodes are T3, T5 and C3
Prediction of Response to Transcranial
Magnetic Stimulation (rTMS)
             Predicted NR   Predicted R   % correct
Actual NR    10             3             Specificity = 76.9%
Actual R     2              12            Sensitivity = 85.7%
Average performance = 81.3%
Using eyes-open pre-treatment EEG, with Nr=5 features
27 MDD subjects
Left true rTMS therapy
F/B PSD ratio at 21Hz to 24Hz, C3/O1
Coherence at 6Hz, between T3 & T5
Coherence at 9Hz, between C3 & O2
Coherence at 5Hz and 9 Hz, between P4 & O2
FL/BR PSD ratio at 30Hz and 34Hz, F1F7F3/T4C4T6
F/B PSD ratio at 6Hz, F7F3/P3O1
Results of a diagnosis study
              Estimated as MDD   Estimated as SCZ   Estimated as N   Total No.
Actual MDD    55 (85.9%)         6                  3                64
Actual SCZ    3                  35 (87.5%)         2                40
Actual N      4                  7                  80 (87.9%)       91
Avg. performance = 87.1%                                             195
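For the three-class case, the bracketed percentages are the per-class rates of correct diagnosis, and the average is their unweighted mean. A short check, using an illustrative dictionary layout:

```python
def per_class_rate(confusion):
    """confusion[actual][estimated] -> count; returns fraction correct per actual class."""
    return {c: row[c] / sum(row.values()) for c, row in confusion.items()}

diagnosis = {
    "MDD": {"MDD": 55, "SCZ": 6, "N": 3},
    "SCZ": {"MDD": 3, "SCZ": 35, "N": 2},
    "N":   {"MDD": 4, "SCZ": 7, "N": 80},
}
rates = per_class_rate(diagnosis)
average = sum(rates.values()) / len(rates)
total_subjects = sum(sum(row.values()) for row in diagnosis.values())
```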
Diagnosis: Major Depression (MDD) vs. Bipolar Depression (BD)
[Figure: 2-D scatter plot of individual subjects (numbered points), axis 1 vs. axis 2, with MDD and BD groups marked]
              Estimated as MDD   Estimated as BD   Total No.
Actual MDD    60 (93.8%)         4                 64
Actual BD     4                  44 (91.7%)        12 (X 4)
Total                                              76
Average performance = 92.7%
Predictive Accuracy for Clozapine
Clozaril (clozapine)
Using leave-one-out cross-validation
                        Predicted Responder   Predicted Non-responder   % Correct
Actual Responder        10                    2                         83.33% = Sensitivity
Actual Non-responder    1                     10                        90.91% = Specificity
Using an independent test sample
                        Predicted Responder   Predicted Non-responder   % Correct
Actual Responder        6                     1                         85.7% = Sensitivity
Actual Non-responder    1                     6                         85.7% = Specificity
Plans for Commercialization
• The method is protected by patent applications
• We are currently gathering more training data, both to expand the number of medications covered and to increase the quantity of training data
• A commercial partner is currently funding this effort
• Plans for starting our own company are currently underway
• The major market is health care insurers in Canada, the US and worldwide
SOME Arithmetic (USA)
• For a US corporation with 1000 employees:
    - 10.1% of employees (101) are on antidepressant meds
• Assumptions using “state of the art” treatment:
    - 66% do not remit with 1st medication
    - In non-remitters costs rise from $3,600 to $16,000
• If our method decreases the non-remission rate to 30%:
    - Savings = 101 X (0.66 - 0.30) X ($16,000 - $3,600) = $450,864
• Projected cost of testing = 101 X $400 = $40,400
SUMMARY: Application of our method could result in savings of $4,064/depressed employee, i.e. 11.1 X ROI
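The arithmetic can be reproduced in a few lines. In this sketch the $4,064 figure matches savings net of the testing cost, divided by the 101 treated employees, and the ratio of gross savings to testing cost comes out at about 11.16; the function and key names are illustrative.

```python
def savings_model(employees=1000, on_meds_rate=0.101,
                  baseline_nonremit=0.66, improved_nonremit=0.30,
                  cost_if_works=3600, cost_if_fails=16000, test_cost=400):
    """Reproduce the slide's savings estimate from its stated assumptions."""
    treated = round(employees * on_meds_rate)                   # 101 employees
    avoided_failures = treated * (baseline_nonremit - improved_nonremit)
    gross = avoided_failures * (cost_if_fails - cost_if_works)  # gross savings
    testing = treated * test_cost                               # cost of testing
    return {
        "treated": treated,
        "gross_savings": gross,
        "testing_cost": testing,
        "net_per_treated_employee": (gross - testing) / treated,
        "savings_to_cost_ratio": gross / testing,
    }

result = savings_model()
```

Changing `improved_nonremit` shows how sensitive the estimate is to the assumed 30% non-remission rate.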
Discussion and Conclusions
• Our results show it is possible to predict response
• A surprising result is that a set of discriminating, predictive EEG features does exist
• The proposed methodology can result in significantly reduced times to remission
• Neurological significance? Selected features are mostly left temporal and in the alpha/high-beta band
• Previous work has identified a subset of the features identified in this study