Addressing Attrition:
Ultra-Dynamic Multi-Dimensional
Attrition Analytics with Tree Ensemble Models
Dr. Gerald Fahner
Senior Director Analytic Science
FICO
© 2014 Fair Isaac Corporation. Confidential.
This presentation is provided for the recipient only and cannot be reproduced or shared without Fair Isaac Corporation’s express consent.
Retaining Your Best Customers vs. Acquiring Your Competitors’ Best Customers
Retention
► Lower Cost
► Targeted Actions
► Easier Identification
Acquisition
► Higher Cost
► Generic Offers
► Unknown Risk/Gain
Machine Learning and Daily Scoring Improve Key Requirements
► Precision
► Speed
► Insight
Retention
► Lower Cost
► Targeted Actions
► Easier Identification
Putting Attrition Scoring On Steroids
Dimension | Traditional Attrition Scores | Ultra-dynamic Attrition Scores
Velocity | Score monthly behavior rollups | Score daily transaction behavior
Timing | Predict and preempt attrition before it happens | Detect, intervene, re-engage as attrition happens
Channel | Direct mail campaigns | Email, SMS, mobile
Entity | Customer attrition | Customer + category attrition
Algorithm | Manually developed models | Machine learning
Daily Scoring Approach
► Daily score predicts attrition risk
► Based on daily activity, or lack thereof
► Prolonged inactivity signals higher attrition risk
► Engage customer when attrition score exceeds some threshold
[Figure: day-by-day timeline (… Tu We Th Fr Sa Su Mo …) marking the days the customer uses the card, together with the customer’s daily attrition risk score.]
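A minimal sketch of the daily scoring loop described on this slide. The score form `1 - exp(-idle / tau_days)`, the value of `tau_days`, and the 0.5 threshold are illustrative assumptions; the deck's actual score comes from a trained model.

```python
import math
from datetime import date, timedelta

def daily_risk_scores(use_dates, start, end, tau_days=14.0):
    """Return {day: score}; the score grows with consecutive idle days.

    The functional form and tau_days are assumptions for illustration only.
    """
    used = set(use_dates)
    scores, idle, day = {}, 0, start
    while day <= end:
        idle = 0 if day in used else idle + 1  # card use resets inactivity
        scores[day] = 1.0 - math.exp(-idle / tau_days)
        day += timedelta(days=1)
    return scores

def first_alert_day(scores, threshold):
    """First day on which the score exceeds the threshold, or None."""
    for day in sorted(scores):
        if scores[day] > threshold:
            return day
    return None

uses = [date(2014, 1, 1), date(2014, 1, 2), date(2014, 1, 5)]
scores = daily_risk_scores(uses, date(2014, 1, 1), date(2014, 1, 31))
alert = first_alert_day(scores, threshold=0.5)  # day to engage the customer
```

With these made-up inputs the score stays low while the card is in use and climbs once use stops, so the alert fires some days into the inactive stretch.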
Transaction Dynamics Hold Key Information
► Given information at time of scoring, who is more likely to attrite?
► Which measures are most informative?
► How to combine Recency and Frequency into predicting attrition risk?
► Recency: Days Since Last Card Use
► Frequency: Fraction of Days Card Used
[Figure: card-use timelines for two customers, Spence and Attila, up to the Time of Scoring, each annotated with Recency, Frequency, and the question “Attrite?”]
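The two measures defined above can be computed from a list of card-use dates. This is a sketch under assumptions: the function name, the inclusive window boundaries, and the sample dates are hypothetical.

```python
from datetime import date

def recency_frequency(use_dates, obs_start, scoring_date):
    """Recency = days since last card use; Frequency = fraction of days used.

    The window is taken as obs_start..scoring_date inclusive (an assumption).
    """
    days_used = {d for d in use_dates if obs_start <= d <= scoring_date}
    if not days_used:
        return None, 0.0
    window_days = (scoring_date - obs_start).days + 1
    recency = (scoring_date - max(days_used)).days
    frequency = len(days_used) / window_days
    return recency, frequency

uses = [date(2014, 1, 1), date(2014, 1, 3), date(2014, 1, 5),
        date(2014, 1, 10), date(2014, 1, 10)]  # two uses on the same day
r, f = recency_frequency(uses, date(2014, 1, 1), date(2014, 1, 20))
```

Several transactions on one day count as a single day of use, which matches the "fraction of days" wording.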
How Machine Learning Complements Domain Expertise
Domain Experts
► Great at intuiting key predictors (optimal path, step 1)
► Intuition doesn’t scale to many variables
► Poor at combining multiple predictors
► Difficulties in quantifying uncertainty
► Tell the “story” behind the numbers (optimal path, step 4)
Machine Learning
► Lacks intuition
► Excels at combining many derived features into accurate probability predictions (optimal path, step 2)
► Diagnose/visualize models to make sense (optimal path, step 3)
Product Showcase: 4:45–5:30: “A Power Tool for Prediction Exploration”; also at Solution Centers, 12:30 and 4:30
Key Analytic Elements of Our Approach
Powerful machine learning tools
► Stochastic Gradient Boosting
► Black-box model visualization
Rich set of relevant variables/features
► High-dimensional feature space of complex events
► Based on Recency and Frequency
Problem-oriented performance evaluation
► Lift, portfolio profit gain
► Out-of-sample/Out-of-time
Stochastic Gradient Boosting
Jerome Friedman[1]
► Score aggregates predictions from many shallow trees
[Diagram: Training Data (Predictors, Outcomes) are used to fit a Prediction Function consisting of Tree 1, Tree 2, …, Tree M; a new case’s Predictors are scored by every tree, and the Score is the weighted average of the tree predictions.]
Demonstration Problem
Simulated Data
[Figure: two 3-D plots of Outcomes (axis from -10 to 10) over two Predictors (each axis from -2 to 2): the predictive relationship from which data were generated (“ground truth”) and the noisy training samples.]
Stochastic Gradient Boosting: 1 Shallow Tree
[Figure: Tree 1 and the resulting Weighted Average surface, plotted over the two predictors (each from -2 to 2) with outcomes from -10 to 10.]
Stochastic Gradient Boosting: 5 Shallow Trees
[Figure: Tree 1 through Tree 5 and the resulting Weighted Average surface, plotted over the two predictors (each from -2 to 2) with outcomes from -10 to 10.]
Stochastic Gradient Boosting: 200 Shallow Trees
[Figure: Tree 1 through Tree 200 and the resulting Weighted Average surface, plotted over the two predictors (each from -2 to 2) with outcomes from -10 to 10.]
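The 1, 5, and 200 tree demonstration can be imitated with a small pure-Python sketch of stochastic gradient boosting for squared error. Everything specific here is an assumption: the ground-truth function `truth`, the noise level, the tree depth, the learning rate, and the subsample fraction are invented for illustration, and the score is formed as a shrinkage-weighted sum of tree predictions rather than the deck's exact weighting.

```python
import math
import random

def truth(x1, x2):
    # Hypothetical ground truth; the deck does not state its function.
    return 2.0 * x1 + 1.5 * x1 * x2

def fit_tree(X, r, depth):
    """Greedy regression tree fitted to residuals r; a leaf is a float."""
    mean = sum(r) / len(r)
    if depth == 0 or len(r) < 4:
        return mean
    best, total, n = None, sum(r), len(r)
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left = 0.0
        for k in range(1, n):
            left += r[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue
            # Maximizing this quantity minimizes the squared error of the split.
            gain = left ** 2 / k + (total - left) ** 2 / (n - k)
            if best is None or gain > best[0]:
                best = (gain, j, lo)
    if best is None:
        return mean
    _, j, t = best
    L = [i for i in range(n) if X[i][j] <= t]
    R = [i for i in range(n) if X[i][j] > t]
    return (j, t,
            fit_tree([X[i] for i in L], [r[i] for i in L], depth - 1),
            fit_tree([X[i] for i in R], [r[i] for i in R], depth - 1))

def predict_tree(tree, x):
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree

def boost(X, y, n_trees, rate=0.1, subsample=0.5, depth=2, seed=0):
    """Each tree is fitted to current residuals on a random subsample."""
    rng = random.Random(seed)
    base = sum(y) / len(y)
    pred, trees = [base] * len(y), []
    for _ in range(n_trees):
        idx = rng.sample(range(len(y)), int(subsample * len(y)))
        tree = fit_tree([X[i] for i in idx], [y[i] - pred[i] for i in idx], depth)
        trees.append(tree)
        pred = [p + rate * predict_tree(tree, x) for p, x in zip(pred, X)]
    return base, rate, trees

def score(model, x):
    base, rate, trees = model
    return base + rate * sum(predict_tree(t, x) for t in trees)

def rmse_vs_truth(model):
    grid = [-2.0 + 0.5 * k for k in range(9)]
    errs = [(score(model, (a, b)) - truth(a, b)) ** 2 for a in grid for b in grid]
    return math.sqrt(sum(errs) / len(errs))

rng = random.Random(1)
X = [(rng.uniform(-2, 2), rng.uniform(-2, 2)) for _ in range(400)]
y = [truth(a, b) + rng.gauss(0, 1.0) for a, b in X]  # noisy training samples
errors = {m: rmse_vs_truth(boost(X, y, m)) for m in (1, 5, 200)}
```

Comparing `errors[1]`, `errors[5]`, and `errors[200]` shows how the aggregated surface moves toward the ground truth as shallow trees are added.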
Addressing Customer Attrition
► Machine Learning for Maximal Profit
Credit Card Case Study
Project Design
► ~5 million accounts generated: ~1 billion transactions over 3 years
► Transaction information: Date, Merchant Code, Amount, Authorized Flag
► Development sample: Observation Period (2 years) up to the Time of Scoring, followed by the Performance Period (6 months)
► Out of Time Validation sample: Observation Period followed by Performance Period
Attrition Definition
► 0/1 indicator of card activity during performance period
Exclusions
► Fewer than 3 transactions during observation period, or
► Card not used within 3 months prior to Time of Scoring
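The attrition definition and exclusions above can be sketched as a labeling function. The function name, the day counts standing in for 2 years, 6 months, and 3 months, and the choice to code "no card activity in the performance period" as 1 are assumptions.

```python
from datetime import date, timedelta

def label_account(txn_dates, scoring_date, obs_days=730, perf_days=182,
                  min_txns=3, recent_days=91):
    """Return 1 (attrited), 0 (active), or None (excluded from the sample)."""
    obs_start = scoring_date - timedelta(days=obs_days)
    perf_end = scoring_date + timedelta(days=perf_days)
    observed = [d for d in txn_dates if obs_start <= d < scoring_date]
    performance = [d for d in txn_dates if scoring_date <= d < perf_end]
    if len(observed) < min_txns:
        return None  # too few transactions in the observation period
    if (scoring_date - max(observed)).days > recent_days:
        return None  # card not used shortly before the time of scoring
    return 0 if performance else 1

t0 = date(2013, 1, 1)
active = [date(2012, 10, 1), date(2012, 11, 15), date(2012, 12, 20)]
label_attriter = label_account(active, t0)
label_retained = label_account(active + [date(2013, 3, 1)], t0)
```

Excluding accounts that were already dormant at the time of scoring keeps the model focused on customers who can still be retained.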
Statistical Measures of Model Performance
Lift and Precision
► Target the top α% High Scoring Accounts with a retention offer; Low Scoring Accounts are not targeted
► Both groups contain Attriters and Non-attriters

\[ \text{Precision} = \frac{\#\,\text{Attriters Among Targeted}}{\#\,\text{Targeted}}, \qquad \text{Base Attrition Rate} = \frac{\#\,\text{Attriters Total}}{\#\,\text{Total}} \]

\[ \text{Lift}(\alpha\%) = \frac{\text{Fraction of Attriters Among Targeted}}{\text{Base Attrition Rate}} = \frac{\text{Precision}}{\text{Base Attrition Rate}} \]
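Precision and lift at a given targeting fraction can be computed directly from scores and outcomes. A minimal sketch; the function name and the toy data are hypothetical.

```python
def precision_and_lift(scores, attrited, alpha):
    """Target the top alpha fraction by score; return (precision, lift)."""
    n = len(scores)
    n_target = max(1, round(alpha * n))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    targeted = ranked[:n_target]
    precision = sum(attrited[i] for i in targeted) / n_target
    base_rate = sum(attrited) / n  # base attrition rate of the whole sample
    return precision, precision / base_rate

toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_attrited = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
precision, lift = precision_and_lift(toy_scores, toy_attrited, alpha=0.2)
```

In this toy sample one of the two targeted accounts is an attriter while the base rate is 2 in 10, so targeting by score finds attriters 2.5 times as often as targeting at random.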

Relating Attrition Model Performance To Profit
Actual Behaviors of Targeted Customers | Profit Contribution per Customer | Fraction of Targeted Customers with this Behavior
Would-be attriters we persuade to stay | (CLV Gain – Contact Cost – Incentive Cost) | Precision * Persuasion Rate
Unpersuadable attriters | (No CLV Gain – Contact Cost) | Precision * (1 – Persuasion Rate)
Non-attriters we target erroneously | (No CLV Gain – Contact Cost – Incentive Cost) | 1 – Precision
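Summing the three rows, weighted by their fractions, gives the expected profit per targeted customer. A sketch only: the function name is hypothetical, and the contact cost of $1 is an invented placeholder because the deck does not give one.

```python
def profit_per_targeted(precision, persuasion_rate, clv,
                        contact_cost, incentive_cost):
    """Expected profit contribution of one targeted customer."""
    persuaded = precision * persuasion_rate * (clv - contact_cost - incentive_cost)
    unpersuadable = precision * (1 - persuasion_rate) * (-contact_cost)
    non_attriters = (1 - precision) * (-contact_cost - incentive_cost)
    return persuaded + unpersuadable + non_attriters

# contact_cost=1.0 is a made-up value for illustration.
profit = profit_per_targeted(precision=0.5, persuasion_rate=0.2, clv=1000.0,
                             contact_cost=1.0, incentive_cost=100.0)
slope = (profit_per_targeted(0.6, 0.2, 1000.0, 1.0, 100.0) - profit) / 0.1
```

The expected profit is linear in Precision, with slope Persuasion Rate * CLV + Incentive Cost * (1 – Persuasion Rate), which is why a gain in lift translates directly into a gain in profit.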
Relating Model Improvement to Portfolio Profit Gain
Scott Neslin et al.[3]

\[ \text{Gain} = (\lambda_B - \lambda_A)\, N\, \alpha\, \beta_0\, \bigl(\gamma\,\text{CLV} + \delta\,(1 - \gamma)\bigr) \]

is the Portfolio Profit Gain from improving model B over model A, where:
Symbol | Meaning | Value
λA | Lift from model A | We will benchmark alternative models
λB | Lift from model B | We will benchmark alternative models
α | Targeting Fraction | 5%
β0 | Base Attrition Rate | 8%
N | Portfolio Size | 5 million
CLV | Customer Lifetime Value | $1,000
δ | Incentive Cost | $100
γ | Persuasion Rate | 20%
The values for α, β0, N, CLV, δ, and γ are portfolio specific parameters and assumptions.
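A short worked example of the gain calculation with the portfolio assumptions listed above. It reads the formula as Gain = (λB − λA) · N · α · β0 · (γ · CLV + δ · (1 − γ)), the form that reproduces the gains quoted in Experiments A and B of this deck; the function name is hypothetical.

```python
def portfolio_profit_gain(lift_a, lift_b, n=5_000_000, alpha=0.05, beta0=0.08,
                          clv=1000.0, delta=100.0, gamma=0.20):
    """Profit gain from replacing model A with model B (defaults from the deck)."""
    per_unit_precision = gamma * clv + delta * (1 - gamma)
    return (lift_b - lift_a) * n * alpha * beta0 * per_unit_precision

gain_2_vs_1 = portfolio_profit_gain(6.03, 6.54)  # lifts from Experiment A
gain_3_vs_1 = portfolio_profit_gain(6.03, 7.52)  # lifts from Experiment B
```

With these inputs each additional point of lift is worth about $5.6 MM, so the two comparisons come to roughly $2.86 MM and $8.34 MM.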
Benchmarking Models of Increasing Complexity
► How much can we improve Lift and Profit by making models more complex?
► Are more complex models robust over time?
Models, in order of increasing Dimensionality of Feature Space (R: Recency, F: Frequency):
► Model 1: Additive model in R and F of card use
► Model 2: Interaction model in R and F of card use
► Model 3: Interaction model in R and F of complex events
Examples of complex events:
► Recent restaurant visit and frequent hotels
► More than $1,000 spent on airline last week
► Recent car deal and frequently at the pump
Experiment A
Do Recency and Frequency Interact in their Effect on Attrition?
► Predictors: Recency and Frequency of card use
► Model 1: Additive, nonlinear in R and F
► Model 2: Capture interaction between R and F
Out-of-sample / Out-of-time validation:
λ1 = 6.03
λ2 = 6.54
⇒ Gain = $2.86 MM s.t. portfolio assumptions
Capturing (R x F) interaction is profitable!
► Interaction effect is in agreement with research by Fader and Hardie[4] in the context of stochastic models for CLV
Interaction Visualization Tells Story
Two-dimensional Partial Dependency Function[2]
► Spence: R = 20, F = 0.05
► Attila: R = 20, F = 0.55
► Who is more likely to attrite?
[Figure: surface of the probability to use the card during the next 6 months, = 1 – Pr(Attrition), plotted over Recency and Frequency at the Time of Scoring, with Spence and Attila marked.]
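A two-dimensional partial dependence surface can be computed for any black-box scoring function by fixing two features on a grid and averaging predictions over the data. This is a generic sketch: `placeholder_model` and its coefficients are invented stand-ins, whereas in the deck's setting the model would be the boosted tree ensemble and the two features Recency and Frequency.

```python
def partial_dependence_2d(predict, rows, i, j, grid_i, grid_j):
    """Average predict() over rows with features i and j forced to grid values."""
    surface = {}
    for vi in grid_i:
        for vj in grid_j:
            total = 0.0
            for row in rows:
                x = list(row)
                x[i], x[j] = vi, vj  # overwrite the two features of interest
                total += predict(x)
            surface[(vi, vj)] = total / len(rows)
    return surface

def placeholder_model(x):
    # Stand-in for a fitted model; not fitted to any data.
    return x[0] + 2.0 * x[1] + x[2]

rows = [[0.0, 0.0, 1.0], [5.0, 0.5, 3.0]]
surface = partial_dependence_2d(placeholder_model, rows, 0, 1,
                                [1.0, 2.0], [0.0, 1.0])
```

Averaging over the remaining features is what lets the surface be read as the joint effect of the two chosen predictors.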
Experiment B
Do Complex Event Features Boost Model Performance?
► Define R and F features for complex events
► Model 3: Candidate predictors include:
Card use events
+ Hundreds of merchant category events
+ Monetary events (defined by hitting spending bands)
+ No-authorization events
Out-of-sample / Out-of-time validation:
λ3 = 7.52
Recall: λ1 = 6.03, λ2 = 6.54
⇒ Gain over Model 1 (simple, additive) = $8.34 MM s.t. portfolio assumptions
Predictors based on complex events are very profitable!
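Recency and Frequency features per event type can be sketched as follows. The event names, the category labels, the $1,000 spending band, and the transaction tuple layout (date, merchant category, amount, authorized flag) are illustrative assumptions, not the deck's feature definitions.

```python
from datetime import date

def event_rf_features(txns, events, obs_start, scoring_date):
    """For each named event, return (recency in days, fraction of days with event)."""
    window_days = (scoring_date - obs_start).days + 1
    features = {}
    for name, is_event in events.items():
        days = {t[0] for t in txns
                if obs_start <= t[0] <= scoring_date and is_event(t)}
        recency = (scoring_date - max(days)).days if days else None
        features[name] = (recency, len(days) / window_days)
    return features

# Hypothetical event definitions over (date, category, amount, authorized).
events = {
    "card_use": lambda t: True,
    "gas": lambda t: t[1] == "GAS",
    "hotel": lambda t: t[1] == "HOTEL",
    "spend_1000_plus": lambda t: t[2] >= 1000,
    "no_authorization": lambda t: not t[3],
}
txns = [
    (date(2014, 1, 2), "GAS", 40.0, True),
    (date(2014, 1, 10), "RESTAURANT", 60.0, True),
    (date(2014, 1, 15), "AIRLINE", 1200.0, True),
    (date(2014, 1, 18), "GAS", 35.0, False),
]
features = event_rf_features(txns, events, date(2014, 1, 1), date(2014, 1, 20))
```

Each event type contributes its own R and F pair, which is how the feature space grows to hundreds of dimensions once merchant categories are included.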
Experiment C
Effects of Training Sample Size
► Complex model keeps improving at least until 500,000 training samples
► Simpler model doesn’t benefit from more than 60,000 training samples
To Train Complex Machine Learning Models, Use as Much Data as You Can!
Addressing Category Attrition
Marketing Benefit
► Category attrition may signal early belt-tightening or competitive influences, before total attrition occurs
► Early detection informs relevant actions/interventions:
  ► Customer dialogue
  ► Category incentives
  ► Product switch
  ► Terms adjustment
[Figure: Overall Customer Status shown alongside category statuses for Food, Travel, and Gas Station, with the callout: “Trigger customer dialogue. Perhaps offer incentives at service stations.”]
Defining Category Attrition
► Cards: stop buying from a merchant category while continuing card use
► Retailers: stop buying from a department while continuing other store purchases
[Figure: purchase timelines for three customers up to the Time of Scoring, labeled “Category A attriter?”: customer 1: No; customer 2: No; customer 3: Yes; with the callout “Attriting completely as customers”.]
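The card definition above can be sketched as a labeling rule: a customer is a category attriter only if category purchases stop while other card use continues. The function name, the 182-day performance window, and the category labels are assumptions.

```python
from datetime import date, timedelta

def is_category_attriter(txns, category, scoring_date, perf_days=182):
    """txns: iterable of (date, category). perf_days is an assumed window."""
    perf_end = scoring_date + timedelta(days=perf_days)
    bought_before = any(c == category for d, c in txns if d < scoring_date)
    later = [c for d, c in txns if scoring_date <= d < perf_end]
    still_uses_card = len(later) > 0
    still_buys_category = category in later
    return bought_before and still_uses_card and not still_buys_category

t0 = date(2013, 1, 1)
continuing = [(date(2012, 12, 1), "GAS"), (date(2013, 2, 1), "GAS")]
complete_attriter = [(date(2012, 12, 1), "GAS")]
category_attriter = [(date(2012, 12, 1), "GAS"), (date(2013, 2, 1), "FOOD")]
labels = [is_category_attriter(c, "GAS", t0)
          for c in (continuing, complete_attriter, category_attriter)]
```

A customer who stops using the card entirely is treated as a complete attriter rather than a category attriter, in line with the "while continuing card use" condition.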
Gas Station Attrition Example
What Have We Learned?
1. Daily scoring quickly detects emergent attrition
► Retain most valued customers with rapid contact/offer
► Category attrition models inform offers
2. Machine learning enhances scale and insight
► Automate development of multiple models
► Visualize models to gain understanding
3. With “Big Data”, complex models beat simpler ones
► Portfolio profit gains are substantial
► Attrition model performance remains robust over time
Let us know if you’re interested in a Proof of Concept!
References
[1] Greedy Function Approximation: A Gradient Boosting Machine, by Jerome Friedman,
The Annals of Statistics, 29(5), 2001, 1189-1232.
[2] Predictive learning via rule ensembles, by Jerome Friedman et al., The Annals of Applied
Statistics, 2(3), 2008, 916-954.
[3] Defection Detection: Measuring and Understanding the Predictive Accuracy of Customer
Churn Models, by Scott Neslin et al., Journal of Marketing Research, 43(2), 2006, 204-211.
[4] RFM and CLV: Using Iso-Value Curves for Customer Base Analysis, by Peter Fader,
Bruce Hardie, and Ka Lok Lee, Journal of Marketing Research, 42(4), 2005, 415-430.
Thank You!
Dr. Gerald Fahner
[email protected]
+1 512 698 0609
Learn More at FICO World
Related Sessions
► Research Showcase: A Power Tool for Prediction Exploration
► Applying Sequential Decisions for Customer Management
Products in Solution Center
► FICO® Model Builder
► FICO® Analytic Modeler
Experts at FICO World
► Michelle Davis
► Shafi Rahman
White Papers Online
► Big Data: Overhyped or Underexploited?
► Does AI + Big Data = Business Gain?
Blogs
► http://www.fico.com/en/blogs/category/analytics-optimization/
Please rate this session online!