Transcript ROC

ROC & AUC, LIFT
Dr. Avi Rosenfeld
Introduction to ROC curves
• ROC = Receiver Operating Characteristic
• Started in electronic signal detection theory (1940s - 1950s)
• Has become very popular in biomedical applications, particularly radiology and imaging
• Also used in data mining
False Positives / Negatives

Confusion matrix 1
              Predicted P    Predicted N
Actual P          20           10 (FN)
Actual N          30 (FP)      90

Confusion matrix 2
              Predicted P    Predicted N
Actual P          10           20 (FN)
Actual N          15 (FP)     105

For confusion matrix 1:
Precision (P) = 20 / 50 = 0.4
Recall (P) = 20 / 30 = 0.667
F-measure = 2 * 0.4 * 0.667 / 1.067 = 0.5
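The precision, recall, and F-measure arithmetic can be checked with a few lines of Python. This is a minimal sketch, assuming that rows of each confusion matrix are actual classes and columns are predicted classes (which matches the stated Precision (P) = 20 / 50). The function name `precision_recall_f` is illustrative and not part of the slides.

```python
def precision_recall_f(tp, fn, fp):
    """Return (precision, recall, F-measure) for the positive class."""
    precision = tp / (tp + fp)          # of everything predicted P, how much is really P
    recall = tp / (tp + fn)             # of everything really P, how much was found
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Confusion matrix 1: actual P row = (20, 10), actual N row = (30, 90)
p1, r1, f1 = precision_recall_f(tp=20, fn=10, fp=30)

# Confusion matrix 2: actual P row = (10, 20), actual N row = (15, 105)
p2, r2, f2 = precision_recall_f(tp=10, fn=20, fp=15)

print(f"Matrix 1: precision={p1:.3f} recall={r1:.3f} F={f1:.3f}")
print(f"Matrix 2: precision={p2:.3f} recall={r2:.3f} F={f2:.3f}")
```

Both matrices have the same precision for class P, but the second one finds fewer of the actual positives, so its recall and F-measure are lower.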
Different Cost Measures
• The confusion matrix (easily generalized to multi-class)

                      Predicted class
                      Yes                   No
Actual class   Yes    TP: True positive     FN: False negative
               No     FP: False positive    TN: True negative
• Machine Learning methods usually minimize FP+FN
• TPR (True Positive Rate): TP / (TP + FN) = Recall
• FPR (False Positive Rate): FP / (TN + FP) = 1 - Specificity
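As a small illustration of the two rates, the sketch below computes TPR and FPR from the four cells of a confusion matrix. The counts are those of confusion matrix 1 (TP = 20, FN = 10, FP = 30, TN = 90). The helper name `rates` is an assumption made for this example.

```python
def rates(tp, fn, fp, tn):
    """Return (TPR, FPR, specificity) from the four confusion-matrix cells."""
    tpr = tp / (tp + fn)            # True Positive Rate, the same quantity as recall
    fpr = fp / (fp + tn)            # False Positive Rate
    specificity = tn / (fp + tn)    # True Negative Rate
    return tpr, fpr, specificity

tpr, fpr, specificity = rates(tp=20, fn=10, fp=30, tn=90)
errors = 30 + 10                    # FP + FN, the total number of misclassified cases

print(f"TPR={tpr:.3f} FPR={fpr:.3f} specificity={specificity:.3f} errors={errors}")
```

Note that FPR and specificity always sum to 1, which is why the ROC axis is often labeled 1 - specificity.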
Specific Example
[Figure: distributions of test results for people without the disease and people with the disease. A threshold on the test-result axis separates the patients called “negative” from the patients called “positive”.]
Some definitions ...
• True positives: patients with the disease who are called “positive”
• False positives: patients without the disease who are called “positive”
• True negatives: patients without the disease who are called “negative”
• False negatives: patients with the disease who are called “negative”
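Moving the Threshold: left
[Figure: the two distributions (without the disease, with the disease) along the test-result axis, with the threshold between ‘‘-’’ and ‘‘+’’ moved to the left.]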
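The effect of moving the threshold can be sketched in code. The test results below are made-up numbers, not data from the slides, and the sketch assumes that a patient is called "positive" when the test result is at or above the threshold.

```python
# Hypothetical test results (illustrative values only).
without_disease = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0]
with_disease = [3.2, 4.5, 5.5, 6.0, 6.5, 7.0, 8.0]

def counts_at(threshold):
    """Count TP, FN, FP, TN when results >= threshold are called positive."""
    tp = sum(x >= threshold for x in with_disease)
    fn = len(with_disease) - tp
    fp = sum(x >= threshold for x in without_disease)
    tn = len(without_disease) - fp
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}

high = counts_at(4.2)   # original threshold
low = counts_at(3.1)    # threshold moved to the left

print("threshold 4.2:", high)
print("threshold 3.1:", low)
```

Under these assumptions, moving the threshold to the left removes the false negative but adds false positives: more sick patients are caught at the price of more healthy patients being flagged.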
ROC curve
[Figure: ROC curve. Y axis: True Positive Rate (Recall), 0% to 100%. X axis: False Positive Rate (1 - specificity), 0% to 100%.]
The effect of changing the threshold on the graph
Figure 5.2 A sample ROC curve.
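A ROC curve is traced by sweeping the threshold and recording one (FPR, TPR) point per threshold. The following is a minimal sketch in plain Python. The scores and labels are hypothetical, and `roc_points` is an illustrative helper rather than a library function.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by sweeping the threshold from high to low."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nobody is called positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
        fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
        points.append((fp / negatives, tp / positives))
    return points

# Hypothetical classifier scores with the true class (1 = positive, 0 = negative).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

points = roc_points(scores, labels)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

The curve always starts at (0, 0), where nothing is called positive, and ends at (1, 1), where everything is.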
Different types of ROC graphs
Area under ROC curve (AUC)
• A general measure
• The area under the ROC graph
• 0.50 is random guessing, 1.0 is perfect.
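The area under a ROC curve can be approximated with the trapezoidal rule over its (FPR, TPR) points. The sketch below uses hand-written point lists (assumed for illustration) to show the two reference values: 1.0 for a perfect classifier and 0.5 for the random diagonal.

```python
def auc(points):
    """Trapezoidal area under a ROC curve given (FPR, TPR) points sorted by FPR."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2   # area of one trapezoid slice
    return area

perfect = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]        # reaches TPR = 1 with FPR = 0
random_guess = [(0.0, 0.0), (1.0, 1.0)]               # the diagonal
example = [(0.0, 0.0), (0.0, 0.5), (0.25, 0.5), (0.25, 1.0), (1.0, 1.0)]

print("perfect:", auc(perfect))
print("random :", auc(random_guess))
print("example:", auc(example))
```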
AUC for ROC curves
[Figure: four ROC plots, each with True Positive Rate (0% to 100%) against False Positive Rate (0% to 100%), labeled AUC = 100%, AUC = 90%, AUC = 50%, and AUC = 65%.]
Lift Charts
• X axis is sample size: (TP+FP) / N
• Y axis is TP
[Figure: lift chart comparing the model curve with the random baseline.]
• 40% of responses for 10% of cost: lift factor = 4
• 80% of responses for 40% of cost: lift factor = 2
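The lift factor is the share of responses captured divided by the share of the sample contacted. As a rough sketch, the code below ranks 20 hypothetical cases by model score (the labels are invented so that the numbers match the slide) and computes the lift for the top 10% and the top 40%. The function `lift_at` is an illustrative name, not a library call.

```python
def lift_at(labels_ranked, fraction):
    """Lift when contacting the top `fraction` of cases, ranked by model score."""
    n = round(len(labels_ranked) * fraction)               # sample size
    captured = sum(labels_ranked[:n]) / sum(labels_ranked)  # share of all responders found
    return captured / fraction

# Hypothetical cases sorted from highest to lowest score (1 = responded).
labels_ranked = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0,
                 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]

lift_10 = lift_at(labels_ranked, 0.10)   # 40% of responses for 10% of cost
lift_40 = lift_at(labels_ranked, 0.40)   # 80% of responses for 40% of cost

print(f"lift at 10%: {lift_10:.1f}")
print(f"lift at 40%: {lift_40:.1f}")
```

A random ordering would capture responses in proportion to the sample size, which corresponds to a lift factor of 1.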
Lift factor
[Figure: lift value (vertical axis, 0.5 to 4.5) against sample size (horizontal axis, 0 to 95).]
The relationship between the measures
In preparation for the exercise...
Right-click on the model and then
Cost / Benefit Analysis for Wood
You can change the threshold and also see the confusion matrix
You can also see both the lift and the effect of cost