Evaluating Predictive Models CAS Predictive Modeling Seminar Glenn Meyers ISO Innovative Analytics


CAS Predictive Modeling Seminar
Evaluating Predictive Models
Glenn Meyers
ISO Innovative Analytics
October 5, 2006
Choosing Models
• Predicting losses for individual insurance
policies involves:
– Millions of policy records
– Hundreds (or thousands) of variables
• There are a number of models that provide
good predictions
– GLM, GAM, CART, MARS, Neural Nets, etc.
• Business objectives influence choice of
model
The Modeling Process
• Modeling process involves dimension
reduction techniques
– Clustering, Principal Components, Factor
Analysis
– Building submodels and using predicted
values as input into a higher level model
• The modeling cycle
– 1. Build model with training data
– 2. Evaluate model with test data
– 3. Identify improvements in models and data
– 4. Go back to Step 1
Hidden Parameters
• Classic model building methods correct for
the number of parameters using “degrees
of freedom.”
• The model exploration process “eats up
degrees of freedom” in ways that cannot
be captured by formal model adjustments.
• In essence the “test” data gets merged
into the “training” data.
What Is Significant?
• Statistical packages will often identify
improvements that are “statistically
significant” but not “practically significant.”
• This talk is about determining when a
model identifies “practically significant”
improvements.
• Illustrate how to do this with a real example.
The Example
A Personal Auto Model Under Development
Preliminary Results
• Input – Address of insured vehicle
• Output – Address Specific Loss Cost
– 30 year old, single car with no SDIP points
– 500 deductible or 25/50/25 policy limits
– Symbol 8, model year 2006
– etc.
• Model derived from over 1,200 variables
reflecting weather, traffic, demographic,
topographical and economic conditions.
Difference Between
Address Specific and ISO Territory Loss Cost
Differences Abound
Some Questions to Ask
• Can the model output be used to improve
insurer underwriting results?
• Are the results statistically significant?
Define ELI
• Expected Loss Index (ELI) = Address Specific Loss Cost / ISO Territory Loss Cost
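As a minimal illustration, ELI can be computed record by record as the ratio above. The record layout and column names below are hypothetical; the presentation does not specify a schema.

```python
import pandas as pd

# Hypothetical record layout; the presentation does not specify one.
records = pd.DataFrame({
    "address_loss_cost": [310.0, 245.0, 520.0],         # address-specific loss cost
    "iso_territory_loss_cost": [400.0, 280.0, 410.0],   # ISO territory loss cost
})

# ELI = address-specific loss cost / ISO territory loss cost
records["eli"] = records["address_loss_cost"] / records["iso_territory_loss_cost"]
print(records["eli"].round(3))
```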
Use Expected Loss Index for Risk Selection
Expected Loss Index      Loss Ratio %
Less than 75%            69.7
Between 75% and 100%     85.8
Between 100% and 125%    109.7
Greater than 125%        159.5
Denominator = Full ISO Loss Cost
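A sketch of how this banding could be reproduced, assuming each record carries its ELI, its incurred loss, and the full ISO loss cost used as the loss-ratio denominator. The simulated numbers are placeholders, not the presentation's data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10_000

# Simulated stand-in records; the real analysis would use actual policy data.
records = pd.DataFrame({
    "eli": rng.lognormal(mean=0.0, sigma=0.3, size=n),
    "iso_loss_cost": rng.gamma(shape=2.0, scale=200.0, size=n),
    "incurred_loss": rng.poisson(0.05, size=n) * rng.gamma(2.0, 3000.0, size=n),
})

# Band records by ELI as in the table and compute each band's loss ratio,
# with the full ISO loss cost as the denominator.
bands = pd.cut(records["eli"], [0.0, 0.75, 1.00, 1.25, np.inf],
               labels=["< 75%", "75-100%", "100-125%", "> 125%"])
band_sums = records.groupby(bands, observed=True)[["incurred_loss", "iso_loss_cost"]].sum()
band_sums["loss_ratio_pct"] = 100 * band_sums["incurred_loss"] / band_sums["iso_loss_cost"]
print(band_sums["loss_ratio_pct"].round(1))
```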
Propose a Standard Way of
Evaluating Lift – The Gini Index
• Originally proposed by Corrado Gini in 1912
• Most often used to measure income and/or
wealth inequality
– Search for “Gini” on wikipedia.org
• In insurance underwriting, we want to
evaluate systematic methods of finding
“loss” inequality.
Gini Index
• Look at set of policy
records below cutoff
point, ELI < 1.
• This set of records
accounts for 59% of
total ISO (full) loss
cost.
• This set of records
accounts for 48% of
total loss.
• 1 − 48/59 → 19%
reduction in loss ratio.
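The cutoff arithmetic on this slide can be written out directly. A small sketch, assuming per-record arrays of ELI, full ISO loss cost, and incurred loss; the function and argument names are illustrative.

```python
import numpy as np

def loss_ratio_reduction(eli, iso_loss_cost, incurred_loss, cutoff=1.0):
    """Relative loss-ratio reduction from writing only records with ELI below cutoff,
    using the full ISO loss cost as the loss-ratio denominator."""
    eli, iso, loss = map(np.asarray, (eli, iso_loss_cost, incurred_loss))
    keep = eli < cutoff
    iso_share = iso[keep].sum() / iso.sum()      # 59% in the slide's example
    loss_share = loss[keep].sum() / loss.sum()   # 48% in the slide's example
    return 1.0 - loss_share / iso_share          # 1 - 48/59, about 19%

# The slide's shares reproduce the quoted figure directly:
print(f"{1 - 0.48 / 0.59:.0%}")  # 19%
```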
Gini Index
• Do this calculation
for other cutoff
points.
• The results make up what we call the Lorenz curve.
Gini Index
• If ELI is random, the
Lorenz curve will be on
the diagonal line.
• The Gini index is the
percentage of the area
under the “random”
line that is above the
Lorenz curve.
• A higher Gini index means a better predictive model.
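A sketch of the Lorenz curve and Gini index as defined on these slides: sort records from lowest to highest ELI, accumulate shares of full ISO loss cost (x) and of incurred loss (y), and take the area between the diagonal and the curve as a fraction of the area under the diagonal. The helper name and arguments are illustrative, not from the presentation.

```python
import numpy as np

def lorenz_and_gini(eli, iso_loss_cost, incurred_loss):
    """Lorenz curve points and Gini index for an ELI-based ranking of records."""
    eli, iso, loss = map(np.asarray, (eli, iso_loss_cost, incurred_loss))
    order = np.argsort(eli)                                           # best (lowest ELI) first
    x = np.concatenate([[0.0], np.cumsum(iso[order]) / iso.sum()])    # cumulative ISO loss cost share
    y = np.concatenate([[0.0], np.cumsum(loss[order]) / loss.sum()])  # cumulative loss share
    # Trapezoidal area under the Lorenz curve; the diagonal's area is 0.5.
    area_under_curve = np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0)
    gini = (0.5 - area_under_curve) / 0.5   # share of the diagonal's area lying above the curve
    return x, y, gini
```

Plotting x against y gives the Lorenz curve; a purely random ELI keeps the curve near the diagonal and the Gini index near zero.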
A Gini Index Thought Experiment
• If we had the ability
to predict who will
have losses, what
would the Gini
index be?
• It would be 100% if only one risk had all the losses (see the sketch below).
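The thought experiment can be checked numerically with the `lorenz_and_gini` helper sketched above (a hypothetical function, not part of the presentation): give all the losses to the single record the model ranks as worst.

```python
import numpy as np

# Reuses lorenz_and_gini from the sketch above.
n = 1_000
eli = np.linspace(0.5, 1.5, n)   # model ranking from best to worst
iso = np.ones(n)                 # equal full ISO loss cost for every record
loss = np.zeros(n)
loss[-1] = 1.0                   # the highest-ELI record carries all the losses
_, _, gini = lorenz_and_gini(eli, iso, loss)
print(round(gini, 3))            # approaches 1.0, i.e. a 100% Gini index
```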
[Lorenz curve charts shown for Bodily Injury, Property Damage, and Collision]
Statistical Significance
• How much random fluctuation is in the
Gini index calculation?
• Use bootstrapping to evaluate
– Take a random sample of records, with
replacement.
– Calculate Gini index for the sample.
– Repeat 250 times.
• Plot a histogram of the results.
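A sketch of the bootstrap procedure just described, reusing the `lorenz_and_gini` helper from the earlier sketch. The 250 resamples match the slide; the other names are illustrative.

```python
import numpy as np

def bootstrap_gini(eli, iso_loss_cost, incurred_loss, n_boot=250, seed=0):
    """Bootstrap distribution of the Gini index: resample records with
    replacement and recompute the Gini index for each resample."""
    eli, iso, loss = map(np.asarray, (eli, iso_loss_cost, incurred_loss))
    rng = np.random.default_rng(seed)
    n = len(eli)
    ginis = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)      # sample record indices with replacement
        _, _, ginis[b] = lorenz_and_gini(eli[idx], iso[idx], loss[idx])
    return ginis                              # plot a histogram of these values
```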
Bootstrap Results
Summary
• Standard tests of statistical significance are suspect.
– Informal model selection process
– Statistical/Practical significance
• Propose the Gini index as a test of practical significance.
• Divide the data into three samples (see the sketch below):
1. Training – Used to fit models
2. Test – Used to evaluate fits
3. Holdout – “Final” evaluation
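A minimal sketch of such a three-way split; the 60/20/20 fractions are an assumption, not from the presentation.

```python
import numpy as np

def three_way_split(n_records, fractions=(0.6, 0.2, 0.2), seed=0):
    """Randomly assign record indices to training, test, and holdout samples."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_records)
    n_train = int(fractions[0] * n_records)
    n_test = int(fractions[1] * n_records)
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]

train_idx, test_idx, holdout_idx = three_way_split(1_000_000)
```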