Transcript Slide 1
EEG-based Machine Learning Methods for Applications in Psychiatry
Jim Reilly, Gary Hasey, Hubert de Bruin, Ahmad Khodayari-R., Duncan MacCrimmon
ON Semiconductor, April 11, 2011

This Is a Team Effort!
Our research team: Gary M. Hasey, Ahmad Khodayari-R., James P. (Jim) Reilly, Hubert de Bruin, Duncan MacCrimmon, Cathy Ivanski, Rose Marie Mueller, Jackie Heaslip, Sandra Chalmers, Joy Fournier, Margarita Criollo, Eleanor Bard …
Thanks to all the nurses and staff who helped with the clinical experiments!

Outline
• Subject: machine learning (ML) for predicting response to psychiatric therapy
• Motivation
• Overview of ML techniques:
  - Feature extraction
  - Feature selection/reduction
  - Classification
  - Validation
• Results
• Commercial potential

Major Depressive Disorder
• 2nd largest cause of workplace disability, ages 15-44
• 37,076,000 people on antidepressant drugs in the US, Canada, EU and Australia
• 3rd largest class of pharmaceuticals worldwide
• Most commonly prescribed class of drugs in the USA
• >1/3 of female office visits in the USA involved an antidepressant drug (ADD)
• Use increased by 75% from 1996 to 2005 (Centers for Disease Control)
• 5.8% of Canadians and 10.1% of Americans are on ADDs
• 68% of ADDs are prescribed by family physicians
Sources:
http://seekingalpha.com/article/22433-antidepressant-drug-market-new-fda-warning-to-have-limited-impact
Washington Post, December 3, 2004, p. A15
http://www.cnn.com/2007/HEALTH/07/09/antidepressants/index.html
http://psychcentral.com/news/2009/08/03/antidepressant-use-up-75-percent/7514.html

The Current "State of the Art" for Antidepressant Drug Selection
• Random selection?
• Keep trying until one fits

How Effective Is the "State of the Art"?
STAR*D study (Sequenced Treatment Alternatives to Relieve Depression): ✓ ✗ ✗
• The 1st choice is wrong in 2 of 3 patients
Warden, D., et al., "The STAR*D Project results: a comprehensive review of findings." Curr Psychiatry Rep, 2007. 9(6): p. 449-59.
Cost of Achieving Remission
• If initial treatment works (1): $3,600
• If initial treatment fails (2): $16,000
1) Baker, C. B. and S. W. Woods (2001). "Cost of treatment failure for major depression: direct costs of continued treatment." Administration and Policy in Mental Health 28(4): 263-277 (1995 costs quoted, adjusted for inflation).
2) Malone, D. C. (2007). "A budget-impact and cost-effectiveness model for second-line treatment of major depression." J Manag Care Pharm 13(6 Suppl A): S8-18.

How We Propose to Fix This Problem
1. Establish diagnosis
2. Collect pre-treatment QEEG
3. Treat: SSRI, rTMS or Clozaril
4. Measure treatment response
5. Use response data, diagnosis and QEEG to train the computer
6. Test predictive accuracy using "leave N out" cross-validation or an independent sample
Marketed service: confirms the diagnosis and recommends a specific treatment, with a self-improving feedback loop.

Overview of the Prediction Procedure
• 22 subjects were prescribed SSRI medication after a pre-treatment EEG
• Response (R or NR) is recorded 6 weeks after the onset of treatment. A responder is defined as a subject showing a 25% improvement in the Hamilton Depression Rating Scale score.
• Training data: each subject's EEG data and the corresponding response value

Machine Learning Method
Steps of the prediction procedure:
1. Extraction of features from the EEG
2. Feature selection/dimensionality reduction
3. Design of the predictor using a classifier
4. Performance evaluation by cross-validation

1. Extraction of Features
• Compute statistical parameters from the EEG (from 4 to 32 Hz in 1 Hz increments):
  - Spectral coherence between all electrode pairs
  - Mutual information between all electrode pairs
  - Absolute and relative power spectral density (PSD) levels
  - Left-to-right hemisphere power ratios
  - Anterior/posterior power ratios
• This results in 4336 features!
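A minimal sketch of the per-bin spectral features listed above, using SciPy's Welch PSD and magnitude-squared coherence. It assumes a 256 Hz sampling rate and 1-second Welch segments (so bins fall on whole hertz), and a hypothetical left/right electrode pairing; the function name `extract_features` and all constants are illustrative, not the authors' code. Mutual information and anterior/posterior ratios are omitted; they follow the same per-bin pattern.

```python
import itertools

import numpy as np
from scipy.signal import coherence, welch

# Assumed acquisition settings (not given on the slides).
FS = 256                  # sampling rate in Hz
NPERSEG = 256             # 1 s Welch segments -> 1 Hz frequency resolution
BANDS = np.arange(4, 33)  # 4-32 Hz in 1 Hz increments, as on the slide
# Hypothetical left/right homologous electrode pairs for hemisphere ratios.
LR_PAIRS = [("F3", "F4"), ("C3", "C4"), ("T3", "T4"), ("P3", "P4"), ("O1", "O2")]


def extract_features(eeg, channel_names, fs=FS, nperseg=NPERSEG):
    """Per-bin spectral features for one EEG record of shape (n_channels, n_samples)."""
    feats = {}
    pos = {name: i for i, name in enumerate(channel_names)}

    # Absolute and relative PSD per channel (relative = share of 4-32 Hz power).
    freqs, psd = welch(eeg, fs=fs, nperseg=nperseg, axis=-1)
    idx = np.searchsorted(freqs, BANDS)  # exact bin indices at 1 Hz resolution
    band_total = psd[:, idx].sum(axis=1)
    for name, c in pos.items():
        for k, hz in zip(idx, BANDS):
            feats[f"abs_psd_{name}_{hz}Hz"] = psd[c, k]
            feats[f"rel_psd_{name}_{hz}Hz"] = psd[c, k] / band_total[c]

    # Magnitude-squared coherence between every electrode pair.
    for (na, a), (nb, b) in itertools.combinations(pos.items(), 2):
        _, cxy = coherence(eeg[a], eeg[b], fs=fs, nperseg=nperseg)
        for k, hz in zip(idx, BANDS):
            feats[f"coh_{na}_{nb}_{hz}Hz"] = cxy[k]

    # Left-to-right hemisphere power ratios for pairs present in the montage.
    for left, right in LR_PAIRS:
        if left in pos and right in pos:
            for k, hz in zip(idx, BANDS):
                feats[f"lr_ratio_{left}_{right}_{hz}Hz"] = psd[pos[left], k] / psd[pos[right], k]
    return feats


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eeg = rng.standard_normal((4, 10 * FS))  # synthetic 10 s, 4-channel record
    feats = extract_features(eeg, ["F3", "F4", "C3", "C4"])
    # 29 bins x (4 channels x 2 PSD types + 6 coherence pairs + 2 L/R pairs) = 464
    print(len(feats))
```

With a full clinical montage and the extra feature types, the same loops grow quickly, which is how a few dozen channels reach thousands of candidate features.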
2. Feature Selection
• The 4336 candidate features are highly correlated.
• Most have no statistical dependence on the target variable (response).
• We select only those with the most statistical relevance, using a modified form of the method due to Peng [2].
[2] H. Peng et al., IEEE Trans. PAMI, Aug. 2005

2. Feature Selection (Cont'd)
• Regularized iterative feature selection based on the Kullback-Leibler (KL) distance. At the j-th iteration, the selection criterion has two terms:
  - The first term describes relevance (relationship with the target variable)
  - The second describes redundancy with the previously selected features

3. Classification Procedure
• Input: the selected feature vector for a specific subject
• Output: responder (R) or non-responder (NR) category for each subject
• Classifier structure (many are available):
  - Support vector machine (SVM)
  - Kernelized partial least squares regression (KPLS)
  - Etc.

4. Performance Evaluation
• Nested (11-fold) cross-validation procedure
• Performance is biased upwards unless training is independent of the test set [3]
• Therefore we perform
  - parameter optimization
  - feature selection
  - testing
  independently in each fold
[3] e.g., Hastie, Tibshirani and Friedman, "The Elements of Statistical Learning"

Results
Contingency table for SSRI medication:

                Predicted NR   Predicted R   % correct
  Actual NR     12             2             Specificity = 85.7%
  Actual R      1              7             Sensitivity = 87.5%
  Average performance = 86.6%

[Figure: 2-D representation of the feature space obtained using kernel PCA, with multiple points (epochs) per subject.] The clustering behaviour suggests that the classes can be well separated with a straight line.
[Figure: 2-D scatter plot after averaging over the available EEG epochs.]
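The relevance-minus-redundancy criterion described in the feature-selection slides can be sketched with Peng's original "difference" form of mRMR on discretized features: at each step, pick the candidate x_j maximizing I(x_j; c) - (1/|S|) * sum over x_i in S of I(x_j; x_i), where S is the set already chosen. Mutual information is itself the KL divergence between the joint distribution and the product of the marginals. This is not the presenters' modified, regularized variant (its exact form is not given on the slides), and the function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (nats) between two discrete sequences.

    Equivalent to the KL divergence between the joint distribution p(x, y)
    and the product of the marginals p(x)p(y).
    """
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def mrmr_select(features, target, k):
    """Greedy mRMR (difference form, after Peng et al. 2005).

    features: dict mapping feature name -> discretized values, one per subject
    target:   class label per subject (e.g. 1 = R, 0 = NR)
    Returns up to k feature names in the order they were selected.
    """
    relevance = {name: mutual_information(v, target) for name, v in features.items()}
    selected = []
    while len(selected) < min(k, len(features)):
        def score(name):
            # Relevance to the target minus mean redundancy with chosen features.
            if not selected:
                return relevance[name]
            redundancy = sum(mutual_information(features[name], features[s]) for s in selected)
            return relevance[name] - redundancy / len(selected)

        candidates = sorted(name for name in features if name not in selected)
        selected.append(max(candidates, key=score))
    return selected


if __name__ == "__main__":
    t = [1, 1, 1, 1, 0, 0, 0, 0]  # synthetic responder labels
    feats = {
        "a": [1, 1, 1, 0, 0, 0, 0, 0],  # informative
        "b": [1, 1, 1, 0, 0, 0, 0, 0],  # exact copy of "a": fully redundant
        "c": [0, 0, 0, 1, 0, 0, 0, 0],  # weakly informative, little overlap with "a"
    }
    print(mrmr_select(feats, t, 2))  # ['a', 'c']
```

The duplicate "b" is as relevant as "a" but is skipped in favour of the weaker, non-redundant "c", which is the behaviour needed when thousands of candidate features are highly correlated.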
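The nested cross-validation described under Performance Evaluation can be sketched with scikit-learn: the inner loop (GridSearchCV) tunes parameters and fits feature selection on the training subjects only, and the outer 11-fold loop produces one out-of-fold prediction per subject. This assumes an RBF-kernel SVM and, as a plainly swapped-in stand-in for the modified mRMR step, univariate ANOVA F-score ranking (`SelectKBest` with `f_classif`); the grid values, inner fold count, function name and data are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_predict
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


def nested_cv_predictions(X, y, n_outer=11, n_inner=5, seed=0):
    """Out-of-fold predictions in which feature selection and hyper-parameter
    tuning are refit from scratch inside every outer training fold."""
    pipe = Pipeline([
        ("select", SelectKBest(f_classif)),  # simple stand-in for the mRMR step
        ("clf", SVC(kernel="rbf")),
    ])
    grid = {"select__k": [5, 10], "clf__C": [1.0, 10.0]}
    inner = StratifiedKFold(n_splits=n_inner, shuffle=True, random_state=seed)
    outer = StratifiedKFold(n_splits=n_outer, shuffle=True, random_state=seed)
    # GridSearchCV runs the inner loop; cross_val_predict runs the outer loop,
    # so the held-out subjects never influence selection or tuning.
    model = GridSearchCV(pipe, grid, cv=inner)
    return cross_val_predict(model, X, y, cv=outer)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.array([0] * 14 + [1] * 8)   # synthetic labels: 14 NR (0), 8 R (1)
    X = rng.standard_normal((22, 50))  # synthetic stand-in for EEG features
    X[:, 0] += 2.0 * y                 # make one feature informative
    pred = nested_cv_predictions(X, y)
    tn, fp, fn, tp = confusion_matrix(y, pred, labels=[0, 1]).ravel()
    print(f"sensitivity={tp / (tp + fn):.3f}  specificity={tn / (tn + fp):.3f}")
```

Wrapping selection and the classifier in one pipeline is what keeps the estimate honest: selecting features on all 22 subjects before splitting would leak test information into training. scikit-learn may warn that the minority class has fewer members than the 11 outer folds; that is expected at this sample size.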
Overfitting?
• It is difficult to prove that the model has not overfit the data.
• Rules of thumb:
  - The complexity of the model (number of parameters) should be small in comparison to the number of training points
  - The test set must be independent of the training set

[Table: the most discriminating features, showing the mean and standard deviation of each feature in the non-responder (N) and responder (R) groups.]

Most Discriminating Features
• 9-16 Hz band
• Mostly left hemisphere
• Dominant electrodes are T3, T5 and C3

Prediction of Response to Repetitive Transcranial Magnetic Stimulation (rTMS)
Using eyes-open pre-treatment EEG, with Nr = 5 features; 27 MDD subjects; left true rTMS therapy

                Predicted NR   Predicted R   % correct
  Actual NR     10             3             Specificity = 76.9%
  Actual R      2              12            Sensitivity = 85.7%
  Average performance = 81.3%

Selected features:
• F/B PSD ratio at 21 Hz to 24 Hz, C3/O1
• Coherence at 6 Hz between T3 & T5
• Coherence at 9 Hz between C3 & O2
• Coherence at 5 Hz and 9 Hz between P4 & O2
• FL/BR PSD ratio at 30 Hz and 34 Hz, F1F7F3/T4C4T6
• F/B PSD ratio at 6 Hz, F7F3/P3O1

Results of a Diagnosis Study

                Estimated as MDD   Estimated as SCZ   Estimated as N   Total no.
  Actual MDD    55 (85.9%)         6                  3                64
  Actual SCZ    3                  35 (87.5%)         2                40
  Actual N      4                  7                  80 (87.9%)       91
  Total                                                                195
  Average performance = 87.1%

Diagnosis: Major Depression (MDD) vs. Bipolar Depression (BD)
[Figure: 2-D scatter plot (axis 1 vs. axis 2) of MDD and BD subjects, with points labeled by subject number.]

                Estimated as MDD   Estimated as BD   Total no.
  Actual MDD    60 (93.8%)         4                 64
  Actual BD     4                  44 (91.7%)        12 (×4)
  Total                                              76
  Average performance = 92.7%
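The "Average performance" figures on the results slides coincide with the mean of sensitivity and specificity (balanced accuracy) for the 2×2 tables, and with the mean per-class recall for the multi-class tables. A small sketch that recomputes them from the counts shown on the slides; the function names are hypothetical.

```python
def binary_metrics(tn, fp, fn, tp):
    """Sensitivity, specificity and their mean for a 2x2 table (responder = positive)."""
    sensitivity = tp / (tp + fn)  # fraction of actual responders predicted R
    specificity = tn / (tn + fp)  # fraction of actual non-responders predicted NR
    return sensitivity, specificity, (sensitivity + specificity) / 2


def per_class_recall(confusion):
    """Per-class recall (rows = actual class) of a square table, and its mean."""
    recalls = [row[i] / sum(row) for i, row in enumerate(confusion)]
    return recalls, sum(recalls) / len(recalls)


if __name__ == "__main__":
    print(binary_metrics(tn=12, fp=2, fn=1, tp=7))                 # SSRI table
    print(binary_metrics(tn=10, fp=3, fn=2, tp=12))                # rTMS table
    print(per_class_recall([[55, 6, 3], [3, 35, 2], [4, 7, 80]]))  # MDD / SCZ / N
    print(per_class_recall([[60, 4], [4, 44]]))                    # MDD / BD
```

For example, the SSRI table gives 87.5% sensitivity, 85.7% specificity and an 86.6% average, matching the slide. Averaging per-class rates rather than pooling all counts keeps the larger class (e.g. 14 non-responders vs. 8 responders) from dominating the summary.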
Predictive Accuracy for Clozapine
Clozaril (clozapine)

Using leave-one-out cross-validation:
                         Predicted Responder   Predicted Non-responder   % correct
  Actual Responder       10                    2                         83.33% = Sensitivity
  Actual Non-responder   1                     10                        90.91% = Specificity

Using an independent test sample:
                         Predicted Responder   Predicted Non-responder   % correct
  Actual Responder       6                     1                         85.7% = Sensitivity
  Actual Non-responder   1                     6                         85.7% = Specificity

Plans for Commercialization
• The method is protected by patent applications.
• We are currently gathering more training data, to expand the number of medications covered and increase the quantity of training data.
• A commercial partner is currently funding this effort.
• Plans for starting our own company are underway.
• The major market is health care insurers in Canada, the US and worldwide.

Some Arithmetic (USA)
• For a US corporation with 1000 employees:
  - 10.1% of employees (101) are on antidepressant medication
• Assumptions using "state of the art" treatment:
  - 66% do not remit with the 1st medication
  - In non-remitters, costs rise from $3,600 to $16,000
• If our method decreases the non-remission rate to 30%:
  - Savings = 101 × (0.66 − 0.30) × ($16,000 − $3,600) = $450,864
• Projected cost of testing = 101 × $400 = $40,400

SUMMARY: Application of our method could result in savings of $4,064 per depressed employee, i.e. an 11.1× ROI.

Discussion and Conclusions
• Our results show that it is possible to predict response.
• A surprising result is that a set of discriminating EEG features for predicting response does exist.
• The proposed methodology can result in significantly reduced times to remission.
• Neurological significance?
  - The selected features are mostly left temporal and in the alpha/high-beta band.
  - Previous work has identified a subset of the features identified in this study.
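The employer arithmetic above can be reproduced in a few lines, which makes it easy to vary the assumptions. All defaults are the slide's own figures; the function name and parameter names are hypothetical. Gross savings divided by testing cost comes to about 11.16, the ratio the slide reports as 11.1×.

```python
def projected_savings(employees, prevalence=0.101, nonremit_now=0.66, nonremit_new=0.30,
                      cost_remit=3_600, cost_nonremit=16_000, test_cost=400):
    """Reproduce the slide's back-of-envelope savings estimate for one employer."""
    on_add = round(employees * prevalence)  # employees on antidepressants
    # Fewer non-remitters, each avoiding the extra cost of failed treatment.
    savings = on_add * (nonremit_now - nonremit_new) * (cost_nonremit - cost_remit)
    testing = on_add * test_cost
    net_per_employee = (savings - testing) / on_add
    return on_add, savings, testing, net_per_employee, savings / testing


if __name__ == "__main__":
    n, savings, testing, per_emp, ratio = projected_savings(1000)
    print(n, round(savings), testing, round(per_emp), round(ratio, 2))
    # 101 450864 40400 4064 11.16
```

The $4,064 figure is net of testing: ($450,864 − $40,400) / 101 employees.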