Presentation title option 1

UEA
Insurance Stats Overview
Steve Cant
Senior Statistics Manager, Aviva
steve.cant @ aviva.co.uk
01603 686857
The Elusive Advert - extracts
Modelling Opportunities at Aviva
Does modelling risk have any appeal ?
Are you interested in an actuarial career but don’t fancy years more study in your free time ?
Do you have graduate level maths skills ?
Do you have any idea what financial statisticians do ?
Have you ever wondered how insurance premiums are calculated ?
Key Aspects of Role
Building risk cost models – to predict who will claim on their motor or household insurance
Behavioural modelling – to predict how customers will react to pricing decisions
Spatial analysis of postcode area in order to produce world leading maps of insurance risk
Extraction of deeper knowledge from large, already well understood data sets
R & D into new modelling and analytical techniques
Educated guesswork
Working with colleagues across the business including those in pricing, marketing, finance,
actuarial, claims and underwriting
You’ll apply your analytical enthusiasm to a range of business problems and produce
mathematical and statistical models that drive real results.
Products (Personal)
Motor
Bike
Van
Household
Pet
Travel
Breakdown
Creditor
PRICING PROCESS
[Flow diagram; box labels as extracted: DATA, Cleanse Data, Model Data, STATS, RISK MODEL, DATA / STATS, BURNING COST (+ expenses, + commission, + profit), MASS CUSTOMISED PREMIUM MODEL, ACTUARIAL, Recalibration, CHANNELS / FINANCE / UNDERWRITING, CORE RATES, Behavioural Models, Competitive positioning, Profitability Reviews, Price Optimisation, STATS, PRICING TEAMS, LIVE PREMIUMS, Maintenance, STREET RATES]
EDD
Death
AM80 and AF80 2 years select q[x-t]+t
[Chart: vertical axis 0.000000 to 0.004500, against Age (0 to 50)]
Large Bodily Injury Claims – Major crashes
THIRD PARTY ONLY
[Chart: Relative Frequency (0.0 to 8.0) against Main Driver Age (17 to 89); series FE and MA; annotations "About 1 in 5000 vehicle years" and "About 1 in 40000 vehicle years"]
10 years of data > £250K frequency
Still overwhelmingly random and rare – but can produce an index
Anything that influences risk is a rating factor
Motor Insurance Rating Factors
[Legend: Postcode, Vehicle]
MOTOR RISK COST MODEL – Ranked by Information Gain
1   Bodily Injury Freq District
2   Young Additional Driver Age
3   NCD
4   Main Driver Age
5   Young Additional Driver Sex
6   Car Group
7   Ritz
8   Driving Restriction
9   Payment Frequency
10  Make Model
11  Ownership Length
12  Mileage
13  At Fault Claims
14  Property Damage Freq District
15  Own Damage District
16  Vehicle Age
17  Transmission
18  Theft Freq District
19  Fuel Type
20  Duration
21  Convictions
22  Licence Length
23  YAD Owns Car
24  PNCD
25  Voluntary Excess
etc 30 other factors
Information gain is a weighted combination of factor range and exposure. E.g. age has
high loadings for low exposure, payment method has lower loadings on high exposure.
Insurance Premiums
Start with a base (average) premium
E.g. £400 (40 year old, 3 year old Ford Focus in Norwich, with full No Claims)
Then add various loadings and discounts
18 year old driver => 200% loading: £400 x 3 = £1200
Lives in Liverpool => 100% loading: x 2 = £2400
Drives a small car => 40% discount: x 0.6 = £1440
Drives an old car => 30% discount: x 0.7 = £1008
No Claims Discount is zero => 233% loading ! : x 3.33 = £3360 !
(5 years No Claims is a 70% discount)
Harsh ?
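The arithmetic above can be sketched in a few lines of Python. This is only an illustration of how multiplicative loadings and discounts compound, using the figures from the slide; the function and variable names are made up for the example and are not Aviva's pricing code.

```python
# Worked version of the slide's example: start from the base premium and
# apply each loading or discount as a multiplier, one after another.
BASE_PREMIUM = 400.0  # 40 year old, 3 year old Ford Focus in Norwich, full No Claims

# (description, percentage change): positive = loading, negative = discount
ADJUSTMENTS = [
    ("18 year old driver", 200),
    ("Lives in Liverpool", 100),
    ("Drives a small car", -40),
    ("Drives an old car", -30),
    ("No Claims Discount is zero", 233),
]


def to_multiplier(percent_change):
    """A 200% loading multiplies by 3.0; a 40% discount multiplies by 0.6."""
    return 1 + percent_change / 100


def price(base, adjustments):
    """Return (label, running premium) after each adjustment, in order."""
    steps = []
    premium = base
    for label, pct in adjustments:
        premium *= to_multiplier(pct)
        steps.append((label, premium))
    return steps


steps = price(BASE_PREMIUM, ADJUSTMENTS)
for label, premium in steps:
    print(f"{label:<28} £{premium:,.2f}")
```

With a multiplier of exactly 3.33 the final figure comes out just under £3360; the slide's £3360 corresponds to the unrounded multiplier 1 / 0.3, i.e. removing a 70% discount.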
How do we calculate these loadings ?
The Claims Universe
[Diagram labels: CHAOS, Undiscovered Order, ORDER, New factors, Improved modelling]
Risk Modelling
Modelling Process (Motor)
5 Perils: Accidental Damage, Bodily Injury, Theft, Glass, Property Damage
2 Models per peril:
Frequency = Σ No. of Claims / Σ Exposure
Severity = Σ Cost of Claims / Σ No. of Claims
Exposure is the time on risk
E.g. for 1000 cars, one year each this is 1000 ‘vehicle years’
120 claims from these 1000 vehicle years => 120/1000 = 12% frequency
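As a minimal sketch, the frequency calculation above looks like this in Python. The 1000 vehicle years and 120 claims are the slide's own example; the total claim cost is a hypothetical figure added only to show how severity, and the product of frequency and severity, would be computed.

```python
def claim_frequency(n_claims, exposure_years):
    """Claims per vehicle year: total claims divided by total exposure."""
    if exposure_years <= 0:
        raise ValueError("exposure must be positive")
    return n_claims / exposure_years


def claim_severity(total_cost, n_claims):
    """Average cost per claim: total claim cost divided by number of claims."""
    if n_claims <= 0:
        raise ValueError("need at least one claim")
    return total_cost / n_claims


# Slide example: 1000 cars on risk for one year each, 120 claims.
exposure = 1000 * 1.0  # vehicle years
frequency = claim_frequency(120, exposure)
print(f"frequency = {frequency:.0%}")

# Hypothetical total claim cost, NOT from the slides.
total_cost = 180_000.0
severity = claim_severity(total_cost, 120)

# Expected claim cost per vehicle year = frequency x severity.
risk_cost = frequency * severity
print(f"severity = £{severity:,.0f}, expected cost per vehicle year = £{risk_cost:,.0f}")
```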
But why bother risk modelling at all ?
Multivariate Modelling
Why bother ?
Attempt to remove random effects (noise)
Avoid the illusions of variable association (Simpson’s paradox)
Consider all rating factors ‘together’ in order to discover ‘true effect’
Examine consistency over time
Ensure best possible prediction of future risk
Simpson’s Paradox
Berkeley Sex bias case (Source : Wikipedia)
Breakdown by department
1973 Admission figures
Bias against women ?
Tables are OK for two factors, no use for 50
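The effect can be reproduced with a tiny made-up example. The figures below are hypothetical and are NOT the 1973 admission figures; they are chosen only so that one group has the higher admission rate in every department yet the lower rate overall, because it applies mostly to the harder department.

```python
# Hypothetical figures, NOT the 1973 data: (applicants, admitted)
data = {
    "Dept X": {"men": (100, 80), "women": (20, 18)},
    "Dept Y": {"men": (20, 4), "women": (100, 30)},
}


def rate(applicants, admitted):
    """Admission rate as a fraction."""
    return admitted / applicants


def by_department(data):
    """Admission rate per department and group."""
    return {
        dept: {group: rate(*counts) for group, counts in groups.items()}
        for dept, groups in data.items()
    }


def overall(data):
    """Admission rate per group after pooling all departments."""
    totals = {}
    for groups in data.values():
        for group, (applicants, admitted) in groups.items():
            a, b = totals.get(group, (0, 0))
            totals[group] = (a + applicants, b + admitted)
    return {group: rate(*t) for group, t in totals.items()}


dept_rates = by_department(data)
pooled = overall(data)

for dept, rates in dept_rates.items():
    print(dept, {g: f"{r:.0%}" for g, r in rates.items()})
print("Overall", {g: f"{r:.0%}" for g, r in pooled.items()})
```

Women do better in both departments, but the pooled table suggests the opposite. Looking at one factor at a time gives the same kind of illusion with rating factors, which is the case for modelling them together.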
Linear Modelling
Simple Linear Modelling
LM expresses the relationship between an observed response (Y) and a set of predictors (X)
[Chart: Cost (0 to 80) against District (0 to 12)]
In its simplest form (first order) it can be conceptualised as
E(Y) = β0 + β1X
Y = β0 + β1X + ε
Where ε is an error term with expected value of 0
Linear Modelling
Method of Least Squares
In order to calculate estimates of the parameters β0 and β1 we use the method of
least squares.
This can be thought of as minimising how far each observed response yi lies from its predicted value ŷi, i.e. the residual yi – ŷi.
[Chart: fitted line against x with the residual yi – ŷi marked; annotation: Remove Outliers]
We then extend this idea to n dimensions using matrices and Emblem software
Linear Modelling
Method of Least Squares
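Minimise the Sum of Squared Errors:
SSE = Σ (yi – ŷi)² = Σ (yi – β0 – β1xi)²
By differentiating with respect to β0 and β1 it can be shown that to minimise the SSE we must solve the following:
Σ yi = nβ0 + β1 Σ xi
Σ xi yi = β0 Σ xi + β1 Σ xi²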
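For the single-predictor model Y = β0 + β1X + ε, solving the least squares conditions gives the standard closed-form estimates: slope = Sxy / Sxx and intercept = ȳ – slope × x̄. The sketch below applies them to hypothetical district and cost values; the data and function names are illustrative only.

```python
def fit_line(xs, ys):
    """Least squares estimates (b0, b1) for y = b0 + b1 * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sse(xs, ys, b0, b1):
    """Sum of squared errors between observed and predicted responses."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: cost against district
districts = [1, 2, 3, 4, 5, 6]
costs = [12.0, 19.0, 31.0, 38.0, 52.0, 58.0]

b0, b1 = fit_line(districts, costs)
print(f"fitted line: cost = {b0:.2f} + {b1:.2f} * district")
print(f"SSE at the fitted line: {sse(districts, costs, b0, b1):.2f}")
```

Any other choice of intercept or slope gives a larger SSE on the same data, which is what "least squares" means.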
Linear Modelling
Multiple Linear Modelling
What happens when we believe a number of variables affect the distribution of our random
variable Y ?
We still have the response variable Y but now instead of having a single predictor we have k
predictors which we denote as X1, X2,.., Xk
Now we want to fit the model Y = β0 + β1X1 + β2X2 + … + βkXk + ε
So the same basic idea (least squares) but now we’re using matrix notation rather than simple algebra
Matrix notation
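In matrix form the model is Y = Xβ + ε, and the least squares estimate solves the normal equations (XᵀX)β = XᵀY. The sketch below does this in plain Python with a small Gaussian elimination so that it has no dependencies. It is a teaching illustration on hypothetical data, not how Emblem or any production tool works internally.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a @ x = b (b a vector) by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (predictors are collinear)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def fit_multiple(rows, ys):
    """Return [b0, b1, ..., bk] solving the normal equations (X'X) b = X'y."""
    X = [[1.0] + list(r) for r in rows]  # leading column of 1s for the intercept
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(x * y for x, y in zip(col, ys)) for col in Xt]
    return solve(XtX, Xty)


# Hypothetical data generated from y = 2 + 3*x1 - 1*x2 (no noise)
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
ys = [2.0 + 3.0 * x1 - 1.0 * x2 for x1, x2 in rows]

betas = fit_multiple(rows, ys)
print([round(b, 6) for b in betas])
```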
Generalized Linear Model (GLM)
Basically an extension of linear modelling that allows:
Multiplicative models (using a ‘link function’) - more appropriate for insurance
A wider selection of errors (‘loss distributions’) from the exponential family
Normal Distribution
• assumes each observation has the same fixed variance (no tail)
Poisson Distribution
• assumes the variance increases with the expected value of each observation (longer tail)
Gamma Distribution
• assumes variance increases with the square of the expected value (even longer tail !)
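Two of the ideas above can be sketched briefly. With a log link the linear predictor is additive on the log scale, so taking the exponential turns the coefficients into multiplicative factors, which is the loadings-and-discounts structure of a premium. The variance assumptions correspond to variance proportional to μ⁰, μ¹ and μ² respectively. The coefficients below are hypothetical, not fitted to any data, and the names are illustrative.

```python
import math

# Hypothetical coefficients on the log scale (NOT fitted to real data)
INTERCEPT = math.log(400.0)       # base level
COEFFS = {
    "age_18": math.log(3.0),      # acts as a x3 factor
    "small_car": math.log(0.6),   # acts as a x0.6 factor
}


def predict_log_link(intercept, coeffs, active):
    """Additive linear predictor; exp() (the inverse log link) makes it multiplicative."""
    eta = intercept + sum(coeffs[name] for name in active)
    return math.exp(eta)


def variance(mu, family, dispersion=1.0):
    """Variance as a function of the mean for the three error distributions."""
    power = {"normal": 0, "poisson": 1, "gamma": 2}[family]
    return dispersion * mu ** power


prediction = predict_log_link(INTERCEPT, COEFFS, ["age_18", "small_car"])
print(f"prediction = {prediction:.2f}")  # same as 400 x 3 x 0.6

for family in ("normal", "poisson", "gamma"):
    print(family, [variance(mu, family) for mu in (1.0, 10.0, 100.0)])
```

Actually fitting a GLM means estimating these coefficients from data, usually by iteratively reweighted least squares; that step is left to the modelling software.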
Emblem Software
• Raw data alone can lead to the wrong conclusion
Data Mining
Decision Trees – Visual carve up of account
Ritz 7-10 Segmentation – NUD (NB & Renewals)
Base: 42% in Ritz 7-10
Age 50 +, Annual payers: 67% in Ritz 7-10, Exposure 27%
Better Wealth Postcode: 73% in Ritz 7-10, Exposure 17%
Policy Duration 3+: 76% in Ritz 7-10, Exposure 9%
[Chart legend: Proportion Ritz 1-6, Proportion Ritz 7-10]
District – a quantum change in quality
2005
2010
X 10 Perils
THE END
Any questions ?
© Aviva plc