E/M Algorithm and Bayesian Model Averaging for Ensemble


It Never Rains But It Pours:
Modeling Mixed Discrete-Continuous Weather
Phenomena
J. McLean Sloughter
Based on research being conducted under Adrian E. Raftery and Tilmann Gneiting
This work was supported by the DoD Multidisciplinary University Research Initiative (MURI)
program administered by the Office of Naval Research under Grant N00014-01-10745.
Ensemble Forecasting
• Single forecast model is run multiple times
with different initial conditions
• Ensemble mean tends to outperform
individual members
• Spread-skill relationship: spread of
forecasts tends to be correlated with
magnitude of error
• Model is underdispersive (not calibrated)
Spread/Skill Plot
Spread = max forecast – min forecast
Skill = abs(forecast mean – observed)
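As a quick illustration of these two definitions, a minimal Python sketch (the forecast and observed values here are made up, not the ensemble data from the talk):

```python
import numpy as np

# Hypothetical 8-member ensemble forecast and the verifying observation
member_forecasts = np.array([3.2, 5.1, 4.4, 6.0, 2.8, 4.9, 5.5, 3.7])
observed = 4.1

spread = member_forecasts.max() - member_forecasts.min()  # max forecast - min forecast
skill = abs(member_forecasts.mean() - observed)           # |forecast mean - observed|

print(f"spread = {spread:.2f}, skill = {skill:.2f}")
```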
Ensemble Member Forecasts
48-hour forecasts for precipitation at 5pm Oct 20, 2003
From http://www.atmos.washington.edu/~emm5rt/ensemble.cgi
Bayesian Model Averaging
• Weighted average of multiple models
• Weights determined by posterior
probabilities of models
• Posterior probabilities given by how well
each member fits the training data
• Weights, then, give an indication of the
relative usefulness of ensemble members
BMA for ensembles
p(Y | f_1, ..., f_k) = \sum_{i=1}^{k} w_i p(Y | f_i)

where
f_i is the forecast from member i,
w_i is the weight associated with member i, and
p(Y | f_i) is the estimated distribution function for Y given member i
Picture taken from Raftery, Balabdaoui, Gneiting, and Polakowski (2003),
“Calibrated Mesoscale Short-Range Ensemble Forecasting Using Bayesian Model Averaging.”
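A minimal sketch of the mixture density above, assuming normal component distributions centered on the member forecasts with one common standard deviation; the forecasts, weights, and sigma below are placeholders, not fitted values:

```python
import numpy as np
from scipy.stats import norm

def bma_density(y, forecasts, weights, sigma):
    """BMA predictive density: a weighted mixture of per-member normals."""
    return sum(w * norm.pdf(y, loc=f, scale=sigma)
               for w, f in zip(weights, forecasts))

# Hypothetical 8-member forecast with equal weights and a common sigma
forecasts = np.array([3.2, 5.1, 4.4, 6.0, 2.8, 4.9, 5.5, 3.7])
weights = np.full(8, 1 / 8)
print(bma_density(4.0, forecasts, weights, sigma=2.0))
```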
The Trouble With Our Models
• Forecasts never predict zero – artifact of
differential equations used to create
forecasts
• Observed wind speed is often zero
• Wind speed, even ignoring zeroes, is not
normally distributed
What Wind Speed Looks Like
Wind Speed Histogram – several exceptionally high values make it harder to see clearly
Wind Speed Histogram truncated to only go up to fifty – there is a spike at zero
Wind Speed Histogram without zeroes
What Forecasts Look Like
Forecast histogram on left (all eight forecasts have similar histograms)
and observed histogram on right – even after removing zeroes from the actual
histogram, the shape is still not quite right – actual is more sharply skewed.
The Problem With Reality
• As we saw, the histograms for forecasts do not
match the histogram for observations very well
• Maximal observed value is 124.000
• Maximal values for each model:
AVN     CMCG    Eta     GASP    JMA     NGPS    TCWB    UKMO
44.591  54.566  44.722  51.732  51.829  45.383  45.125  49.243
More Trouble With Reality
        AVN    CMCG   Eta    GASP   JMA    NGPS   TCWB   UKMO   Y
AVN     1.000  0.826  0.845  0.797  0.795  0.783  0.731  0.840  0.417
CMCG    0.826  1.000  0.822  0.807  0.819  0.797  0.770  0.805  0.402
Eta     0.845  0.822  1.000  0.789  0.793  0.781  0.747  0.800  0.406
GASP    0.797  0.807  0.789  1.000  0.788  0.779  0.747  0.786  0.394
JMA     0.795  0.819  0.793  0.788  1.000  0.788  0.756  0.783  0.400
NGPS    0.783  0.797  0.781  0.779  0.788  1.000  0.757  0.779  0.388
TCWB    0.731  0.770  0.747  0.747  0.756  0.757  1.000  0.721  0.384
UKMO    0.840  0.805  0.800  0.786  0.783  0.779  0.721  1.000  0.415
Y       0.417  0.402  0.406  0.394  0.400  0.388  0.384  0.415  1.000

Pairwise correlations – Y is observed value, others are the various forecasts
Time Trends
Left - average observed wind speed per day
Right – same, but smoothed by averaging over a 3-week interval
Time Trend Troubles
• Higher winds in summer, lower in winter
(note that this appears to be an odd trait of
the northwest)
• Need model to reflect seasonal patterns
• Would still like to just have a simple model
based on forecasts
A Recap Of All The Things That
Make Life Interesting and Miserable
• Distribution not normal
• Time
• Forecasts not very highly correlated with
observations
• Zeroes
What to do about distributions?
• Model using another distribution – Gammas and
Weibulls are popular models for wind speed
• Can apply a transformation to the data
Left: Root of forecast wind speed from model 1
Right: Root of observed wind speed (excluding zeroes for easier visualization)
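A hedged sketch of both options on simulated positive wind speeds; the data, the square-root choice of transform, and the scipy estimators are illustrative assumptions, not necessarily what was used in the talk:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
wind = rng.gamma(shape=2.0, scale=5.0, size=5000)  # stand-in for non-zero wind speeds

# Option 1: fit a skewed distribution directly (location fixed at zero)
gamma_shape, _, gamma_scale = stats.gamma.fit(wind, floc=0)
weib_shape, _, weib_scale = stats.weibull_min.fit(wind, floc=0)

# Option 2: transform toward symmetry and work on the transformed scale
root_wind = np.sqrt(wind)

print(f"gamma fit: shape={gamma_shape:.2f}, scale={gamma_scale:.2f}")
print(f"weibull fit: shape={weib_shape:.2f}, scale={weib_scale:.2f}")
print(f"skewness after square root: {stats.skew(root_wind):.2f}")
```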
What to do about time?
• Rather than using all available data as
training set, only train on recent data
• Trade-off between lower variance of
estimates with more data, and better
picture of current trends with less data
• Previous research (on temperature and
pressure) indicates 40-day window of
training data seems to give a good
balance
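A minimal sketch of the sliding-window idea, assuming daily records with dates available; the function name and data layout are hypothetical:

```python
import numpy as np

def training_window(dates, forecast_date, window_days=40):
    """Boolean mask selecting the window_days of data immediately
    preceding forecast_date (the training set for that forecast)."""
    dates = np.asarray(dates, dtype="datetime64[D]")
    forecast_date = np.datetime64(forecast_date, "D")
    start = forecast_date - np.timedelta64(window_days, "D")
    return (dates >= start) & (dates < forecast_date)

# Example: pick the 40 days before 2003-10-20 out of a year-long record
all_dates = np.arange("2003-01-01", "2003-12-31", dtype="datetime64[D]")
mask = training_window(all_dates, "2003-10-20")
print(mask.sum())  # 40
```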
What to do about the forecasts?
• We’re not making the forecasts, just using them
• We can apply a bias-correction by performing a
linear regression of observed on forecasts (this
is commonly done in forecasting already)
• We can see from our weight terms which models
perform better and which perform worse, and
report that to the folks making the forecasts
• We can hope that the science of meteorology
continues to move forward as it has thus far
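A sketch of the bias-correction step from the second bullet, assuming one linear regression of observations on forecasts per ensemble member over the training window; numpy's polyfit stands in for whatever regression routine was actually used:

```python
import numpy as np

def bias_correct(train_forecasts, train_obs, new_forecasts):
    """Fit obs ~ a + b * forecast separately for each member and
    return bias-corrected forecasts for a new case.

    train_forecasts: (n_days, n_members), train_obs: (n_days,),
    new_forecasts: (n_members,)"""
    corrected = np.empty(len(new_forecasts))
    for i in range(train_forecasts.shape[1]):
        b, a = np.polyfit(train_forecasts[:, i], train_obs, deg=1)
        corrected[i] = a + b * new_forecasts[i]
    return corrected
```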
What to do about zeroes?
• We need a model that includes a point mass at
zero
• Two main possibilities:
• We could model a weighted average of eight
distributions, each of which is a normal plus a
point mass at zero
• Or, we could first model probability of zero or
non-zero, then, conditioned on non-zero, the
weighted average of eight normals
• We will pursue the second option for now
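Written out, the second option factors the predictive distribution into a zero / non-zero stage and a continuous stage; a sketch of the density for the eight-member case (the symbol P_0 for the fitted probability of a zero observation is ours):

```latex
% Two-stage model: point mass at zero, then a BMA mixture given Y > 0.
p(y \mid f_1, \dots, f_8) =
    P_0 \, \mathbf{1}\{y = 0\}
    + (1 - P_0) \sum_{i=1}^{8} w_i \, p(y \mid f_i, \, y > 0),
\qquad
\operatorname{logit}(P_0) = a + \sum_{i=1}^{8} b_i f_i
```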
So, let’s get to it then
• Probability of zero can be modeled by a logistic
regression on the eight forecasts
• Then, the weighted average of normals can be
determined by the EM algorithm
• Assume each normal has the bias-corrected forecast as
its mean, and has a constant variance
• Alternate between predicting membership based on
weights and variances, and weights and variances
based on membership
• Make sure to also include probability of being non-zero
when evaluating our functions
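A minimal sketch of that EM step, assuming bias-corrected forecasts as the component means and a single common variance; it leaves out the logistic zero/non-zero stage and any refinements actually used, so it illustrates the update rules rather than reproducing the implementation:

```python
import numpy as np
from scipy.stats import norm

def em_bma_normals(forecasts, obs, n_iter=100, tol=1e-6):
    """EM for BMA weights and a common variance.

    forecasts: (n_cases, n_members) bias-corrected forecasts,
    obs: (n_cases,) non-zero observations."""
    n, k = forecasts.shape
    w = np.full(k, 1.0 / k)                        # start with equal weights
    sigma2 = np.var(obs - forecasts.mean(axis=1))  # crude starting variance
    old_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibility of each member for each observation
        dens = norm.pdf(obs[:, None], loc=forecasts, scale=np.sqrt(sigma2))
        num = w * dens
        z = num / num.sum(axis=1, keepdims=True)
        # M-step: update weights and the common variance
        w = z.mean(axis=0)
        sigma2 = np.sum(z * (obs[:, None] - forecasts) ** 2) / n
        # stop once the mixture log-likelihood stops improving
        ll = np.sum(np.log(num.sum(axis=1)))
        if ll - old_ll < tol:
            break
        old_ll = ll
    return w, np.sqrt(sigma2)
```

In the full model, the mixture fitted here would also be combined with the estimated probability of a non-zero observation when evaluating the predictive distribution, as the last bullet above notes.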
Let’s try out a simple test case first
• Generate a sample of 100,000 ordered triples
(x1, x2, y), x1 uniform over 30 to 50, x2 uniform
over 10 to 20
• logistic regression coefficients of a=10, b1=-.2,
b2=-.6
• with probability determined by logistic
regression, y=0
• otherwise, with probability .6, y is normal around
x1 with sd of 1, and with probability .4 is normal
around x2 with sd of 3.14
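A sketch of that simulation under one plausible reading of the setup, with the slide's coefficients plugged in (the random seed and library choices are ours):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

x1 = rng.uniform(30, 50, n)
x2 = rng.uniform(10, 20, n)

# P(y = 0) from the logistic model with a = 10, b1 = -0.2, b2 = -0.6
p_zero = 1 / (1 + np.exp(-(10 - 0.2 * x1 - 0.6 * x2)))
is_zero = rng.uniform(size=n) < p_zero

# Otherwise a two-component normal mixture: N(x1, 1) w.p. 0.6, N(x2, 3.14) w.p. 0.4
use_first = rng.uniform(size=n) < 0.6
y = np.where(use_first, rng.normal(x1, 1.0), rng.normal(x2, 3.14))
y[is_zero] = 0.0
```

Fitting the logistic regression and the EM mixture to (x1, x2, y) generated this way should recover values close to those reported on the next slide.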
How did we do?
• Predicted logistic coefficients of a=10.257,
b1=-0.206, b2=-0.600
• weights of 0.598 and 0.402
• sds of 1.000 and 3.414
• Seems to be able to model pretty well
under ideal artificial conditions
• So now let’s try the real thing
How do we do?
RMSE from creating forecasts for 33 days, using 40-day training periods –
black is without modeling zeroes, red is with modeling zeroes
Iterations to convergence
again, black is without modeling zeroes, red is with
How do we feel about this?
• Including modeling of zeroes doesn’t
appear to help our error much (p=0.4734),
which is somewhat disappointing
• However, we get our model much faster
(p=2.104e-08), which matters when we have to fit
a lot of these
What We Would Have Done Next
Had We Not Been Distracted By
More Pressing Matters
• Consider fitting different distributions rather than using a
transformation
• Fit a model with point masses at zero for each individual
component
• Try additional bias corrections
• Compare results for different training windows
• Investigate importance of starting values in EM algorithm
• Evaluate performances of prediction intervals rather than
just prediction means
Three Months Later…
• In the distraction interim, precipitation data
became available
• Precipitation forecasts tend to be better
than wind forecasts
• Weather people tend to be more
interested in precipitation forecasts
• And so, we have abandoned (for now)
wind speed, and are looking at
precipitation instead
How is rain different?
• Distribution doesn’t look normal, even
under transformations
• Models do predict zero, but we still see
point masses at zero
Conditional Histograms
Observed given forecast from 1.5 to 4 (in .01”)
Observed given forecast from 55 to 80 (in .01”)
Fitting Gammas
Plots of the fitted gamma parameters: shape parameter, rate parameter, and coefficient of variation
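For reference, one common way to obtain per-case gamma parameter estimates like these is the method of moments; a hedged sketch that may or may not match the estimator used here:

```python
import numpy as np

def gamma_moments(x):
    """Method-of-moments gamma fit: shape, rate, and coefficient of variation."""
    x = np.asarray(x, dtype=float)
    m, v = x.mean(), x.var()
    shape = m ** 2 / v       # alpha
    rate = m / v             # beta, so that mean = alpha / beta
    cv = np.sqrt(v) / m      # coefficient of variation = 1 / sqrt(alpha)
    return shape, rate, cv
```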
Something’s Not Quite Right
Plots comparing the sample mean with the estimated mean from the fitted gammas
At least something worked out
nicely
Plot of the proportion of zeroes
And Now?
• First, find out what’s going funny with our
gamma fitting
• Then, try to come up with a way to do
some sort of gamma fitting in the EM
algorithm
• Then, look at all those things we wanted to
look at before