Transcript slides

Multiple-instance learning
improves CAD detection of
masses in digital mammography
Balaji Krishnapuram, Jonathan Stoeckel, Vikas Raykar, Bharat Rao,
Philippe Bamberger, Eli Ratner, Nicolas Merlet, Inna Stainvas,
Menahem Abramov, and Alexandra Manevitch
CAD and Knowledge Solutions (IKM CKS),
Siemens Medical Solutions Inc., Malvern PA 19355, USA
Siemens Computer Aided Diagnosis Ltd., Jerusalem, Israel
For internal use only / Copyright © Siemens AG 2006. All rights reserved.
Outline of the talk
1. CAD as a classification problem
2. Problems with off-the-shelf algorithms
3. Multiple instance learning
4. Proposed algorithm
5. Results
6. Conclusions
Page 2
July-22, 2008
Vikas Raykar
IWDM 2008
Typical CAD architecture
Mammogram
Candidate Generation
Feature Computation
Classification
Location of lesions
Focus of the current talk
Traditional classification algorithms
region on a mammogram
lesion
not a lesion
Various classification algorithms
Neural networks
Support Vector Machines
Logistic Regression ….
Make two key assumptions, both often violated in CAD:
(1) Training samples are independent.
(2) The goal is to maximize classification accuracy over all candidates.
Violation 1: Training examples are correlated
Candidate generation produces many spatially adjacent candidates, so the training examples are highly correlated.
Correlations also exist across different images, detector types, and hospitals.
The proposed algorithm can handle these correlations.
Violation 2: Candidate-level accuracy is not important
Most algorithms maximize classification accuracy, trying to classify every candidate correctly.
Several candidates from the CG point to the same lesion in the breast.
The lesion is detected if at least one of them is detected, so it is fine to miss adjacent overlapping candidates.
Hence CAD system accuracy is measured in terms of per-lesion/image/patient sensitivity.
So why not optimize the performance metric we use to evaluate our system?
The proposed algorithm can optimize per-lesion/image/patient sensitivity.
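To make the metric concrete, here is a minimal sketch (not from the talk; function name and data layout are illustrative assumptions) of per-lesion sensitivity: a lesion counts as detected when at least one of its candidates scores above threshold.

```python
def lesion_sensitivity(lesion_candidate_scores, threshold):
    """Fraction of lesions with at least one candidate scoring
    at or above the threshold.

    lesion_candidate_scores: list of lists, one inner list of
    classifier scores per true lesion (illustrative layout).
    """
    detected = sum(
        1
        for scores in lesion_candidate_scores
        if any(s >= threshold for s in scores)
    )
    return detected / len(lesion_candidate_scores)
```

A candidate-level metric would penalize every missed candidate; here, missing an overlapping candidate costs nothing as long as one candidate per lesion fires.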
Proposed algorithm
Specifically designed with CAD in mind:
Can handle correlations among training examples.
Optimizes per lesion/image/patient sensitivity.
Joint classifier design and feature selection.
Selects accurate sparse models.
Very fast to train and no tunable parameters.
Developed in the framework of multiple-instance learning.
Outline of the talk
1. CAD as a classification problem
2. Problems with off-the-shelf algorithms
Assume training examples are independent.
Maximize classification accuracy over all candidates.
3. Multiple instance learning
4. Algorithm summary
5. Results
6. Conclusions
Multiple Instance Learning
How do we acquire labels?
Candidates that overlap with the radiologist's mark are positive.
The rest are negative.
[Figure: candidate labels. Single-instance learning labels each candidate individually (1 or 0); multiple-instance learning groups the candidates labeled 1 into a single positive bag.]
Single-instance learning: classify every candidate correctly.
Multiple-instance learning: classify at least one candidate in each positive bag correctly.
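The bag-level rule can be sketched with a logistic instance model and the standard noisy-OR combination used in multiple-instance learning (a sketch under those assumptions, not the talk's exact formulation):

```python
import math

def sigmoid(z):
    # logistic function
    return 1.0 / (1.0 + math.exp(-z))

def instance_prob(w, x):
    # instance-level model: p(y = 1 | x) = sigmoid(w . x)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def bag_prob(w, bag):
    # a bag is positive if AT LEAST ONE instance is positive,
    # so p(bag positive) = 1 - prod_i (1 - p_i)   (noisy-OR)
    p_all_negative = 1.0
    for x in bag:
        p_all_negative *= 1.0 - instance_prob(w, x)
    return 1.0 - p_all_negative
```

One confident instance is enough to make the bag probability high, which matches the "detect at least one candidate per lesion" behaviour described above.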
Simple Illustration
Single-instance learning: reject as many negative candidates as possible, and detect as many positives as possible.
Multiple-instance learning: reject as many negative candidates as possible, and detect at least one candidate in each positive bag.
Outline of the talk
1. CAD as a classification problem
2. Problems with off-the-shelf algorithms
Assume training examples are independent.
Maximize classification accuracy over all candidates.
3. Multiple instance learning
Notion of positive bags
A bag is positive if at least one instance is positive.
4. Algorithm summary
5. Results
6. Conclusions
Algorithm Details
Logistic Regression model, with weight vector w and feature vector x.
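The slide's equation was an image and did not survive transcription; the standard logistic regression model, which the "weight vector" and "feature vector" labels suggest, reads:

```latex
\[
p(y = 1 \mid \mathbf{x}, \mathbf{w})
  = \sigma(\mathbf{w}^{\top}\mathbf{x})
  = \frac{1}{1 + \exp(-\mathbf{w}^{\top}\mathbf{x})}
\]
```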
Maximum Likelihood Estimator
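This slide's formula was also image-only. A hedged reconstruction, assuming a logistic instance model and the at-least-one (noisy-OR) rule for positive bags:

```latex
\[
\hat{\mathbf{w}} = \arg\max_{\mathbf{w}}
\sum_{i \in \mathcal{B}^{+}}
  \log\!\Bigl[ 1 - \prod_{j \in i}
    \bigl( 1 - \sigma(\mathbf{w}^{\top}\mathbf{x}_{ij}) \bigr) \Bigr]
+ \sum_{k \in \mathcal{N}}
  \log\bigl( 1 - \sigma(\mathbf{w}^{\top}\mathbf{x}_{k}) \bigr)
\]
```

Here \(\mathcal{B}^{+}\) indexes the positive bags and \(\mathcal{N}\) the negative candidates.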
Prior to favour sparsity
If we know the hyperparameters, we can find the desired solution.
How do we choose them?
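The prior itself was on a slide image. A common sparsity-favouring choice in this line of work, assumed here, is a zero-mean Gaussian with one precision hyperparameter per feature (automatic relevance determination):

```latex
\[
p(\mathbf{w} \mid \boldsymbol{\alpha})
  = \prod_{d=1}^{D} \mathcal{N}\bigl( w_d \mid 0,\, \alpha_d^{-1} \bigr)
\]
```

A large \(\alpha_d\) forces \(w_d\) toward zero, effectively pruning feature \(d\).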
Page 14
July-22, 2008
Vikas Raykar
IWDM 2008
Feature Selection
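The feature-selection slides were image-only. A common route (assumed here, not stated on the recovered slides) is automatic relevance determination: choose the per-feature precision hyperparameters by maximizing the marginal likelihood of the training data; features whose precision diverges receive weight exactly zero and are dropped:

```latex
\[
\hat{\boldsymbol{\alpha}}
  = \arg\max_{\boldsymbol{\alpha}}
    \int p(\mathcal{D} \mid \mathbf{w})\,
         p(\mathbf{w} \mid \boldsymbol{\alpha})\, d\mathbf{w},
\qquad
\alpha_d \to \infty \;\Rightarrow\; w_d = 0 .
\]
```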
Outline of the talk
1. CAD as a classification problem
2. Problems with off-the-shelf algorithms
Assume training examples are independent.
Maximize classification accuracy over all candidates.
3. Multiple instance learning
Notion of positive bags
A bag is positive if at least one instance is positive.
4. Algorithm summary
Joint classifier design and feature selection.
Maximizes the performance metric we care about.
5. Results
Datasets used
Training set
144 biopsy proven malignant-mass cases.
2005 normal cases from BI-RADS categories 1 and 2.
Validation set
108 biopsy proven malignant-mass cases.
1513 normal cases from BI-RADS categories 1 and 2.
Patient-level FROC curve for the validation set
[Figure: FROC curves on the validation set; the proposed method is more accurate.]
MIL selects far fewer features

  Total number of features:        81
  Proposed algorithm without MIL:  56
  Proposed MIL algorithm:          40
Patient vs Candidate level FROC curve
The proposed method improves the per-patient FROC at the cost of a worse per-candidate FROC.
Message: Design algorithms to optimize the metric you care about.
Conclusions
A classifier that maximizes the performance metric we care about.
Selects sparse models.
Very fast: trains in under a minute on over 10,000 patients.
No tuning parameters.
Improves the patient level FROC curves substantially.
Questions / Comments?