ROC & AUC, LIFT
Dr. Avi Rosenfeld

Introduction to ROC curves
• ROC = Receiver Operating Characteristic
• Started in electronic signal detection theory (1940s-1950s)
• Has become very popular in biomedical applications, particularly radiology and imaging
• Also used in data mining

False Positives / Negatives
Confusion matrix 1 (rows = actual class, columns = predicted class):
              Predicted P   Predicted N
  Actual P         20            10
  Actual N         30            90

Confusion matrix 2 (same layout):
              Predicted P   Predicted N
  Actual P         10            20
  Actual N         15           105

Matrix 1 has FP = 30 and FN = 10; matrix 2 has FP = 15 and FN = 20.
For matrix 1:
Precision (P) = 20 / 50 = 0.4
Recall (P) = 20 / 30 = 0.667
F-measure = 2 * 0.4 * 0.667 / (0.4 + 0.667) = 0.5

Different Cost Measures
• The confusion matrix (easily generalizes to multi-class):
                Predicted Yes         Predicted No
  Actual Yes    TP: True positive     FN: False negative
  Actual No     FP: False positive    TN: True negative
• Machine learning methods usually minimize FP + FN, i.e., the total number of errors, which implicitly treats both kinds of error as equally costly
• TPR (True Positive Rate): TP / (TP + FN) = Recall
• FPR (False Positive Rate): FP / (TN + FP) = 1 - specificity (not precision, which is TP / (TP + FP))

Specific Example
[Figure: distributions of the test result for people without the disease and people with the disease. A threshold on the test result divides the patients: those on the "-" side are called "negative" and those on the "+" side are called "positive".]

Some definitions...
• True positives: people with the disease who are called "positive"
• False positives: people without the disease who are called "positive"
• True negatives: people without the disease who are called "negative"
• False negatives: people with the disease who are called "negative"

Moving the Threshold: left
[Figure: the same two distributions with the threshold shifted to the left, toward the "-" side of the test result.]
Shifting the threshold to the left calls more patients "positive", so both the true positives and the false positives increase.

ROC curve
[Figure: True Positive Rate (Recall), 0% to 100%, plotted against False Positive Rate (1 - specificity), 0% to 100%.]

The effect of changing the threshold on the graph: each threshold yields one (FPR, TPR) point, and sweeping the threshold traces out the curve.
Figure 5.2 A sample ROC curve.
Different types of ROC graphs

Area under ROC curve (AUC)
• An overall measure
• The area under the ROC curve
• 0.50 is random guessing; 1.0 is perfect.
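To make the formulas above concrete, here is a minimal Python sketch (not the lecturer's code). It computes precision, recall, F-measure and FPR from a 2x2 confusion matrix, then builds ROC points by sweeping the threshold over classifier scores and integrates them with the trapezoidal rule to get the AUC. The function names `confusion_metrics`, `roc_points` and `auc` and the eight scores/labels are hypothetical illustration data, and it assumes a case is called "positive" when its score is at or above the threshold.

```python
# Minimal sketch: confusion-matrix metrics, ROC points and AUC.
# Function names and the scores/labels below are hypothetical illustration data.

def confusion_metrics(tp, fn, fp, tn):
    """Return precision, recall (= TPR), F-measure and FPR for the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                  # TPR
    f_measure = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)                     # 1 - specificity
    return precision, recall, f_measure, fpr


def roc_points(scores, labels):
    """(FPR, TPR) points, calling every case with score >= threshold 'positive'.

    Assumes labels are 1 (has the disease) / 0 and both classes are present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nobody is called positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule (points sorted by FPR)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


if __name__ == "__main__":
    # Confusion matrix 1 from the slides: TP=20, FN=10, FP=30, TN=90
    p, r, f, fpr = confusion_metrics(tp=20, fn=10, fp=30, tn=90)
    print(f"precision={p:.3f} recall={r:.3f} F={f:.3f} FPR={fpr:.3f}")

    # Hypothetical classifier scores (higher = more likely to have the disease)
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
    labels = [1, 1, 0, 1, 0, 1, 0, 0]
    pts = roc_points(scores, labels)
    print(pts)
    print(f"AUC={auc(pts):.4f}")
```

With confusion matrix 1 (TP = 20, FN = 10, FP = 30, TN = 90) this reproduces precision 0.4, recall 0.667 and F-measure 0.5 (FPR = 0.25). The made-up scores give AUC = 0.8125, i.e., 13 of the 16 (positive, negative) pairs are ranked correctly. Libraries such as scikit-learn offer the same computations (`sklearn.metrics.roc_curve`, `sklearn.metrics.roc_auc_score`).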
AUC for ROC curves
[Figure: four ROC curves, each plotting True Positive Rate against False Positive Rate from 0% to 100%, with AUC = 100%, 90%, 65% and 50%.]

Lift Charts
[Figure: cumulative responses for the model versus random selection. The model reaches 40% of responses for 10% of cost (lift factor = 4) and 80% of responses for 40% of cost (lift factor = 2).]
• X axis is sample size: (TP + FP) / N
• Y axis is TP
• Lift factor = (fraction of responses captured) / (fraction of the sample used), so random selection has a lift of 1
[Figure: Lift Value (0.5 to 4.5) plotted against Sample Size (0 to 95).]

The relationship between the measures

For the exercise...
Right-click on a model and then select Cost / Benefit Analysis for Wood.
• You can change the threshold and also see the confusion matrix.
• You can also see the lift and the effect of cost.
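The lift-chart quantities can be sketched the same way. This is a hedged illustration, assuming that "responses" are the actual positives and that the model targets cases in descending order of score; `lift_curve` and the 20 labels are hypothetical, chosen so that the curve passes through the two points annotated on the lift-chart slide.

```python
# Minimal lift-chart sketch. Assumptions: responses = actual positives (label 1),
# cases are targeted in descending score order, and at least one positive exists.
# The function name and the data are hypothetical.

def lift_curve(scores, labels):
    """Return (sample_fraction, response_fraction, lift) after targeting each case.

    sample_fraction   = (TP + FP) / N      -> x axis of the lift chart
    response_fraction = TP / total positives
    lift              = response_fraction / sample_fraction (1.0 = random)
    """
    n = len(labels)
    total_pos = sum(labels)
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    curve = []
    tp = 0
    for k, (_, y) in enumerate(ranked, start=1):
        tp += y                      # every targeted case is a TP or an FP
        sample_frac = k / n
        response_frac = tp / total_pos
        curve.append((sample_frac, response_frac, response_frac / sample_frac))
    return curve


if __name__ == "__main__":
    # Hypothetical data: 20 cases, 5 responders, scores already in descending order
    labels = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    scores = [i / 20 for i in range(20, 0, -1)]
    for sample, resp, lift in lift_curve(scores, labels):
        print(f"sample={sample:.0%} responses={resp:.0%} lift={lift:.2f}")
```

The second printed row shows 10% of the sample capturing 40% of responses (lift 4.00), the eighth row shows 40% capturing 80% (lift 2.00), and the last row always has lift 1.00, because targeting everyone captures every response.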