Deterministic and Probabilistic prediction
approaches in Seasonal to Inter-annual
climate forecasting
Christopher Oludhe
Department of Meteorology
University of Nairobi
P. O. Box 30197, Nairobi
KENYA
Email: [email protected]
RA 1 EXPERT MEETING ON THE APPLICATION OF CLIMATE FORECASTS FOR
AGRICULTURE
Banjul, Gambia, 9-13 December 2002
Introduction
• Extreme weather and climate events are known to have major negative impacts on various sectors of the economy in many African countries.
• Advances in the science of weather and climate prediction, particularly seasonal to interannual prediction, have made it possible to predict climate with improved accuracy on time scales ranging from a season to more than a year in advance.
Introduction Cont..
• Such knowledge can be used to minimise destruction of property and loss of life, to enhance food and agricultural production, and to provide critical information required for decision-making.
• The objective of this presentation is to highlight some of the approaches to deterministic and probabilistic seasonal to interannual climate prediction applicable to Africa.
Some Basic Ideas on Probability and Statistics
• In dealing with climatic data, one usually handles only a small proportion of all the possible values of interest, referred to as a sample, whereas the entire set of values would constitute a population.
• Statistics provides the tools for analysing sample data and making inferences (decisions) about the entire population.
Cont..
• Useful statistical parameters that can be obtained from sample data include the mean and the standard deviation, among others; these can be used to summarise the range of possible values.
• Partitioning a ranked dataset into categories, e.g. terciles, quartiles and percentiles, can provide valuable information on the range of values in a distribution.
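As a minimal sketch of these ideas, the Python snippet below computes the mean, the sample standard deviation and the tercile boundaries of a short series of seasonal rainfall totals. The rainfall values are hypothetical, and the tercile boundaries use the default (`exclusive`) interpolation of `statistics.quantiles`; other percentile conventions give slightly different boundaries.

```python
import statistics

def summarise(sample):
    """Return mean, sample standard deviation and (lower, upper) tercile boundaries."""
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)                     # n - 1 in the denominator
    lower, upper = statistics.quantiles(sample, n=3)  # cut points at 1/3 and 2/3
    return mean, sd, (lower, upper)

def tercile_category(value, bounds):
    """Classify a value as below-, near- or above-normal relative to the tercile bounds."""
    lower, upper = bounds
    if value < lower:
        return "below"
    if value > upper:
        return "above"
    return "near"

# Hypothetical seasonal rainfall totals (mm) for 12 years
rainfall = [310, 455, 512, 388, 601, 274, 430, 495, 540, 362, 470, 415]
mean, sd, bounds = summarise(rainfall)
print(f"mean = {mean:.1f} mm, sd = {sd:.1f} mm, terciles = {bounds}")
print([tercile_category(x, bounds) for x in rainfall])
```

With 12 values, each tercile category ends up holding four of the years, which is what "equal climatological chances" means in practice.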
Understanding probabilities
• The probability of an event occurring may be defined as the ratio of the number of outcomes that make up the event to the total number of (equally likely) outcomes in the sample space:
$$\text{Probability} = \frac{\text{Number of possible outcomes in the event}}{\text{Total number of cases}}$$
• An event in this case could be ‘Rain tomorrow’, ‘3 or more cyclones next year’, etc.
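With climate records, the "number of cases" is usually the number of years (or seasons) on record, so the ratio becomes a relative frequency. A minimal Python sketch, using a hypothetical record of annual cyclone counts and an illustrative helper name:

```python
from fractions import Fraction

def empirical_probability(record, event):
    """Relative frequency of `event` (a predicate) in a historical record."""
    favourable = sum(1 for value in record if event(value))
    return Fraction(favourable, len(record))

# Hypothetical annual cyclone counts for 10 years
cyclones = [1, 3, 0, 4, 2, 3, 1, 5, 2, 0]
p = empirical_probability(cyclones, lambda n: n >= 3)  # event: '3 or more cyclones'
print(p, float(p))  # 2/5 0.4
```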
Some Basic Rules of Probability
• For any event A, the probability lies between 0 and 1, i.e. $0 \le P(A) \le 1$.
• A probability of zero implies that the event is impossible, while a probability of 1 implies that the event is certain.
• The complement of an event A (i.e. NOT A) is the event that happens exactly when A does not occur:
$$P(\bar{A}) = 1 - P(A)$$
Cont..
• Intermediate probabilities between 0 and 1 can also be stated. For instance, if a forecaster states that the probability of rain tomorrow is 0.25 (also expressed as 1/4, or as 25%), the probability of no rain is 0.75, so it is 3 times as likely not to rain as it is to rain.
• Conditional probability is the probability of an event occurring given that another event has occurred. For events A and B with $P(B) > 0$, $P(A \mid B) = P(A \text{ and } B)/P(B)$.
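The rules above translate directly into code. This sketch uses exact fractions so the results match the hand calculations; the 40-year record used for the conditional probability is hypothetical, and the function names are illustrative.

```python
from fractions import Fraction

def complement(p):
    """P(NOT A) = 1 - P(A)."""
    return 1 - p

def times_as_likely_not(p):
    """How many times more likely the event is NOT to occur than to occur."""
    return complement(p) / p

def conditional(p_a_and_b, p_b):
    """P(A | B) = P(A and B) / P(B), defined only when P(B) > 0."""
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return p_a_and_b / p_b

rain = Fraction(1, 4)  # the 25% chance of rain example
print(complement(rain), times_as_likely_not(rain))  # 3/4 3

# Hypothetical 40-year record: B = 'warm SST anomaly' in 10 years,
# A and B together ('above-average rain' in a warm year) in 7 of them.
print(conditional(Fraction(7, 40), Fraction(10, 40)))  # 7/10
```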
Differences between Deterministic and Probabilistic Forecasts
• Forecasts can be presented either as deterministic or as probabilistic.
• Short-term forecasts (nowcasts) are almost entirely deterministic, in the sense that they state exactly what will happen, when and where.
• Examples of deterministic forecasts are statements such as:
(i) Rainfall will be above average this season.
(ii) Rainfall will be 50% above average this season.
Cont..
• Probabilistic forecasts, on the other hand, give the probability of an event within a certain range of magnitudes occurring in a specific region during a particular time period. An example would be: "There is a 70% chance that rainfall will be above average in the coming season."
• This implies that calculations based on historical information from many past occasions have led to an estimate of a 70% probability that the observed rainfall will actually be above average.
Cont..
• In general, longer time-scale forecasts, such as seasonal to interannual forecasts, are mostly probabilistic and are usually given as probabilities of ranges of values for the season.
• It is, however, possible to generate probabilistic forecasts from deterministic ones by assigning probabilities to such
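One common way of doing this (not necessarily the procedure the presentation goes on to describe) is to treat a regression forecast as the centre of a distribution of possible outcomes whose spread reflects the model's historical errors. The sketch below assumes Gaussian errors, with a standard deviation taken, for example, from the root-mean-square error of the regression; the function name and inputs are illustrative.

```python
from statistics import NormalDist

def tercile_probabilities(point_forecast, error_sd, lower, upper):
    """Turn a deterministic forecast into (above, near, below) tercile probabilities.

    Assumes the observed value is normally distributed around the point forecast
    with standard deviation `error_sd`; `lower`/`upper` are the climatological
    tercile boundaries in the same units as the forecast.
    """
    dist = NormalDist(mu=point_forecast, sigma=error_sd)
    p_below = dist.cdf(lower)
    p_above = 1.0 - dist.cdf(upper)
    p_near = 1.0 - p_below - p_above
    return p_above, p_near, p_below  # upper, middle, lower tercile

# Standardized anomalies: the terciles of N(0, 1) are about -0.43 and +0.43
std_normal = NormalDist()
lower, upper = std_normal.inv_cdf(1 / 3), std_normal.inv_cdf(2 / 3)

# Hypothetical regression forecast of +0.5 with an error standard deviation of 0.6
probs = tercile_probabilities(0.5, 0.6, lower, upper)
print([round(100 * p) for p in probs])
```

A forecast of exactly the climatological mean with an error spread equal to the climatological spread returns one-third in each category, i.e. no information beyond climatology.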
Approaches to Developing Probabilistic Seasonal Forecasts
• Seasonal climate forecasting procedures usually start by examining historical climate records (rainfall or temperature) or a climatological database.
• The database should be long enough, complete and of good quality.
• A standard 30-year period, such as
Cont..
• Various statistical analyses can then be carried out on the historical data, and relationships determined between potential predictors (e.g., ENSO and associated teleconnections) and the predictand (rainfall or temperature).
• The common predictor choices are usually lagged SST anomalies over the global oceans that are considered pertinent to
Cont..
• The statistical associations between lagged SST indices and the climate variables can be found by performing simple correlation analysis between these variables.
• Once these associations have been established, the next step is to develop either simple or multiple linear regression (deterministic) models relating the predictors to the predictand, and to use these models for prediction.
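A minimal Python sketch of these two steps, using only the standard library: Pearson's correlation coefficient between a lagged SST index and standardized seasonal rainfall, followed by an ordinary least-squares fit of the simple linear regression used for prediction. The data values and function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fit_simple_regression(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical lagged SST index (predictor) and standardized rainfall (predictand)
sst_index = [-1.2, -0.6, -0.1, 0.3, 0.8, 1.4]
rain_anom = [-0.9, -0.7, 0.2, 0.1, 0.6, 1.3]

r = pearson_r(sst_index, rain_anom)
a, b = fit_simple_regression(sst_index, rain_anom)
forecast = a + b * 0.5  # prediction for a hypothetical index value of 0.5
print(f"r = {r:.2f}, model: y = {a:.2f} + {b:.2f}x, forecast = {forecast:.2f}")
```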
Cont..
• Several other potential predictors, such as the Quasi-Biennial Oscillation (QBO), the Southern Oscillation Index (SOI) and SST gradients, can also be included in the development of the regression model.
• Stepwise regression (forward selection or backward elimination) can be applied to select the best predictors to be included in the multiple linear regression equation.
Stepwise Regression analysis
• Forward selection: The model starts with no predictors. At each step, the candidate predictor that improves the model the most is added, starting with the one that explains the most variance, and so on.
• Backward elimination: The model starts with all potential predictors and, at each step of model construction, the least important predictor is removed until only the best predictors remain.
• A stopping criterion (e.g. a minimum gain in explained variance, or a significance threshold for adding or removing a predictor) must be chosen in both cases.
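A minimal sketch of forward selection in Python with NumPy, assuming ordinary least squares via `numpy.linalg.lstsq` and using, as the stopping criterion, a minimum gain in R² (the 0.02 threshold is an arbitrary illustrative choice; F-tests or cross-validated skill are common alternatives). The candidate predictor names and the synthetic data are hypothetical; backward elimination is not shown.

```python
import numpy as np

def fit_r2(X, y, cols):
    """R^2 of an OLS fit of y on an intercept plus the columns `cols` of X."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def forward_selection(X, y, names, min_gain=0.02):
    """Add predictors one at a time, always taking the one that raises R^2 the most."""
    selected, remaining, best_r2 = [], list(range(X.shape[1])), 0.0
    while remaining:
        r2, best = max((fit_r2(X, y, selected + [j]), j) for j in remaining)
        if r2 - best_r2 < min_gain:  # stopping criterion: gain too small
            break
        selected.append(best)
        remaining.remove(best)
        best_r2 = r2
    return [names[j] for j in selected], best_r2

# Synthetic example: only the 1st and 3rd candidates actually drive the predictand
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 4))
y = 1.0 * X[:, 0] - 0.4 * X[:, 2] + 0.05 * rng.standard_normal(40)
print(forward_selection(X, y, ["IDX_A", "IDX_B", "IDX_C", "IDX_D"]))
```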
Use of the Linear Regression Fit
• A linear regression model may show a positive or negative association between the predictors and the predictand.
• Using this type of relationship, it is possible to make qualitative statements about the expected value of the predictand for the coming season(s), provided the lagged values of the predictor indices are available well in advance.
Goodness of fit measure
• The goodness of “fit” of a linear regression model can be determined by examining the mean-squared error (MSE) in the ANOVA table output. This measure indicates the variability of the observed values around the forecast regression line.
• A perfect linear relationship between the predictor and predictand gives an MSE of zero, while poor fits result in large values of MSE.
• Another measure of the fit of a regression is the coefficient of determination (R²), which, for a simple linear regression, is the square of the Pearson correlation coefficient between predictor and predictand (for multiple regression, it is the squared correlation between the observed and fitted values).
Cont..
• Qualitatively, R² can be interpreted as the proportion of the variance of the predictand that is described or accounted for by the regression.
• For a perfect regression R² = 1, while an R² close to 0 indicates that very little of the variance is explained by the regression line.
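Both measures can be computed directly from the observed and fitted values. In this sketch the MSE follows the ANOVA convention of dividing the residual sum of squares by its degrees of freedom (n − p − 1 for p predictors), and R² is computed as 1 − SSE/SST; the function name and example data are illustrative.

```python
def goodness_of_fit(observed, fitted, n_predictors):
    """Return (MSE, R^2) for a least-squares regression with an intercept.

    MSE = SSE / (n - p - 1), the residual mean square of the ANOVA table;
    R^2 = 1 - SSE / SST, the fraction of predictand variance explained.
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    sse = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    mse = sse / (n - n_predictors - 1)
    return mse, 1.0 - sse / sst

# Observations and the fitted values of the least-squares line y = 2.2 + 0.6x for x = 1..5
observed = [2, 4, 5, 4, 5]
fitted = [2.2 + 0.6 * x for x in range(1, 6)]
mse, r2 = goodness_of_fit(observed, fitted, n_predictors=1)
print(f"MSE = {mse:.2f}, R^2 = {r2:.2f}")  # MSE = 0.80, R^2 = 0.60
```

For this simple regression the R² of 0.60 equals the square of the Pearson correlation between x and y (6/√60 ≈ 0.775).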

Examples of model fits
[Figure: Observed (VOI) standardized rainfall anomalies, 1961–2001, with FCST 2002; x-axis: Year, y-axis: Standardized Rainfall Anomalies]
Z7 = -0.04 + 0.30*NPA2M1 - 0.68*SPA1M1 - 0.49*SPA4M1 - 0.53*ZAFM11
R² = 64%
Examples cont..
[Figure: Observed (KERUGOYA) standardized rainfall anomalies, 1961–2001, with Fcst2002 V2; x-axis: Year, y-axis: Standardized Rainfall Anomalies]
Z8 = 0.04 - 0.46*SIN3M1 + 0.83*NPA1M12 - 0.59*NPA1M1 + 0.37*MIB3M1
R² = 61%
Examples cont..
[Figure: Observed (GARBATULLA) standardized rainfall anomalies, 1961–2001, with Fcst 2002; x-axis: Year, y-axis: Standardized Rainfall Anomalies]
Z9 = -0.07 - 0.40*NATM12 + 0.36*NPA3M12 + 0.47*SPA2M1 - 0.41*SAT2M1
R² = 49%
Verification of Forecast Skills
A number of quantities can be computed to verify forecast skill. These include:
• Accuracy: a general term indicating the level of agreement between the predicted value and the observed value. The error is the difference between the two values; the smaller the error, the greater the accuracy.
Cont..
• Skill: the accuracy of a given forecast relative to the accuracy of forecasts produced by some standard (reference) procedure, such as climatology or persistence. Skill scores provide a means of accounting for variations in accuracy that have nothing to do with the forecaster's ability to forecast.
Cont..
• Reliability or bias: the average difference between the forecast value of an element and the observed value. A positive bias indicates that the forecast value exceeds the observed value on average, while a negative bias corresponds to under-forecasting the observed value on average.
• Sharpness: the tendency to forecast extreme values.
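A minimal sketch of these quantities for continuous forecasts such as standardized rainfall anomalies. Bias is taken as the mean forecast-minus-observed error, accuracy is summarised by the mean absolute and mean squared errors, and skill is expressed as an MSE skill score relative to a reference forecast; using climatology (an anomaly of 0) as the reference, and the data values, are assumptions made here for illustration.

```python
def mean_error(forecast, observed):
    """Bias: average of (forecast - observed); > 0 means over-forecasting."""
    return sum(f - o for f, o in zip(forecast, observed)) / len(observed)

def mean_absolute_error(forecast, observed):
    """Accuracy: average size of the error, ignoring its sign."""
    return sum(abs(f - o) for f, o in zip(forecast, observed)) / len(observed)

def mean_squared_error(forecast, observed):
    return sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(observed)

def mse_skill_score(forecast, observed, reference):
    """1 for a perfect forecast, 0 for no improvement on the reference, < 0 if worse."""
    return 1.0 - mean_squared_error(forecast, observed) / mean_squared_error(reference, observed)

# Hypothetical standardized rainfall anomalies for four seasons
observed = [1.0, -0.5, 0.3, -1.2]
forecast = [0.8, -0.2, 0.5, -0.9]
climatology = [0.0] * len(observed)  # reference forecast: the long-term mean

print(mean_error(forecast, observed),
      mean_absolute_error(forecast, observed),
      mse_skill_score(forecast, observed, climatology))
```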
Computing Skill Scores

Observed \ Forecast | Above-Normal | Near-Normal | Below-Normal | Total
Above-Normal        | A11          | A12         | A13          | J
Near-Normal         | A21          | A22         | A23          | K
Below-Normal        | A31          | A32         | A33          | L
Total               | M            | N           | O            | T
Hit Score (HS)
• Hit Score (HS): the number of times the correct category is forecast, i.e. A11 + A22 + A33. As a percentage of all forecasts:
$$\text{Percent Correct} = \frac{A_{11} + A_{22} + A_{33}}{T} \times 100$$
Post Agreement
Post agreement is the number of correct forecasts made divided by the number of forecasts made for each category. These are: A11/M, A22/N, A33/O.
False Alarm Ratio (FAR)
• False Alarm Ratio (FAR): the fraction of forecast events that failed to materialize.
• Best FAR = 0; worst FAR = 1.
• For Above-Normal = (A21 + A31)/M
• For Near-Normal = (A12 + A32)/N
• For Below-Normal = (A13 + A23)/O
• In each category, FAR = 1 - Post Agreement.
Probability of Detection (POD)
• This is the number of correct forecasts divided by the number observed in each category. It is a measure of the ability to correctly forecast a certain category, and is sometimes referred to as the "hit rate", especially when applied to severe weather verification. POD for the three categories is: A11/J, A22/K, A33/L.
Bias
• Bias: comparison of the average forecast with the average observation; for each category, the number of times it was forecast divided by the number of times it was observed.
• Bias > 1: over-forecasting
• Bias < 1: under-forecasting
• For Above-Normal = M/J
• For Near-Normal = N/K
• For Below-Normal = O/L
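Heidke Skill Score
This is given by
$$\text{SS} = \frac{R - E}{T - E}$$
where
R = total number of correct forecasts,
T = total number of forecasts,
E = number of forecasts expected to be correct based on chance, persistence or climatology.
Taking E as the number expected by chance from the row and column totals, E = (JM + KN + LO)/T, gives
$$\text{HSS} = \frac{(A_{11} + A_{22} + A_{33}) - \dfrac{JM + KN + LO}{T}}{T - \dfrac{JM + KN + LO}{T}}$$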
Computational Example
Zone 8 (Kerugoya)

OBS. \ Forecasts | Dry | Normal | Wet | Total
Dry              | 3   | 0      | 0   | 3
Normal           | 1   | 2      | 1   | 4
Wet              | 1   | 1      | 2   | 4
Total            | 5   | 3      | 3   | 11

Percent Correct = 64%

Score (%)                | Dry | Normal | Wet
Probability of Detection | 100 | 50     | 50
Post Agreement           | 60  | 67     | 67
False Alarm (1st Order)  | 20  | -      | 0

Heidke Skill Score (HSS) = 0.46
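The scores in the Zone 8 (Kerugoya) example can be reproduced with a short Python function operating on the contingency table, with rows as observed categories and columns as forecast categories (the layout of the generic table). The "False Alarm (1st Order)" score is not defined explicitly in the text; the sketch assumes it means forecasting one extreme category when the opposite extreme was observed, a reading that reproduces the 20% and 0% given for Kerugoya. Function and key names are illustrative, and every category must be both forecast and observed at least once to avoid division by zero.

```python
def contingency_scores(table):
    """Categorical verification scores for a square table.

    table[i][j] = number of seasons observed in category i and forecast in category j,
    with categories ordered from one extreme to the other (e.g. Dry, Normal, Wet).
    """
    k = len(table)
    obs_tot = [sum(table[i]) for i in range(k)]                        # J, K, L
    fcst_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]  # M, N, O
    total = sum(obs_tot)                                               # T
    hits = sum(table[i][i] for i in range(k))                          # A11 + A22 + A33
    expected = sum(o * f for o, f in zip(obs_tot, fcst_tot)) / total   # E by chance
    post_agreement = [table[j][j] / fcst_tot[j] for j in range(k)]
    return {
        "percent_correct": 100 * hits / total,
        "pod": [table[i][i] / obs_tot[i] for i in range(k)],
        "post_agreement": post_agreement,
        "far": [1 - pa for pa in post_agreement],
        "bias": [fcst_tot[i] / obs_tot[i] for i in range(k)],
        "hss": (hits - expected) / (total - expected),
        # Assumed definition: one extreme forecast while the opposite extreme was observed
        "first_order_false_alarm": [table[k - 1][0] / fcst_tot[0],
                                    table[0][k - 1] / fcst_tot[k - 1]],
    }

# Zone 8 (Kerugoya): rows = observed, columns = forecast, order Dry, Normal, Wet
kerugoya = [[3, 0, 0],
            [1, 2, 1],
            [1, 1, 2]]
scores = contingency_scores(kerugoya)
print(round(scores["percent_correct"]), round(scores["hss"], 2))  # 64 0.46
```

The bias values from the same table (5/3 ≈ 1.67 for Dry, 0.75 for Normal and Wet) show that the dry category was over-forecast in this example.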
Presenting Forecasts in terms of Probabilities
• A standard format for presenting seasonal forecasts is to assign percentage probabilities to what are known as terciles.
• Terciles consist of three ranges of values that represent three broad sectors of the climatological probability distribution, each with an equal chance of occurrence, namely the lower, middle and upper thirds of the expected distribution of values.
Cont..
• For example, a typical seasonal forecast may be presented as (45, 30, 25). This can be interpreted as a 45% chance of the seasonal total precipitation being in the upper (Wet) tercile, a 30% chance of it being in the middle (Near normal) category, and a 25% chance that it falls in the lowest third (Dry) category.
• Seasonal forecasts presented in this way indicate not only the most likely outcome for the upcoming months or seasons, but also the distribution of possible outcomes. Such forecasts can take all the
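A small Python sketch of reading such a forecast: it checks that the three probabilities sum to 100%, picks the most likely tercile, and shows how far each probability departs from the climatological one-in-three (33.3%) chance. It expects integer percentages; the function name and output format are illustrative.

```python
from fractions import Fraction

CATEGORIES = ("upper (wet)", "middle (near normal)", "lower (dry)")

def interpret_tercile_forecast(probs):
    """Interpret an (upper, middle, lower) tercile forecast given in integer percent."""
    if len(probs) != 3 or sum(probs) != 100:
        raise ValueError("expected three percentages summing to 100")
    climatology = Fraction(100, 3)  # each tercile: 1-in-3 chance
    most_likely = CATEGORIES[probs.index(max(probs))]
    shifts = {c: float(p - climatology) for c, p in zip(CATEGORIES, probs)}
    return most_likely, shifts

most_likely, shifts = interpret_tercile_forecast((45, 30, 25))
print(most_likely)  # upper (wet)
for category, shift in shifts.items():
    print(f"{category}: {shift:+.1f} percentage points vs climatology")
```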
Example of tercile groupings
[Table: standardized rainfall anomalies for Zones 1–12, 1961–2002, with the years in each zone ranked from driest to wettest and grouped into terciles]
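The grouping itself is mechanical: rank the years by their anomaly and split the ranked list into three (near-)equal groups. A minimal Python sketch, using a hypothetical nine-year series rather than the zone data:

```python
def tercile_groups(series):
    """Split {year: anomaly} into (dry, near-normal, wet) lists of (year, anomaly)."""
    ranked = sorted(series.items(), key=lambda item: item[1])  # driest first
    n = len(ranked)
    cut1, cut2 = round(n / 3), round(2 * n / 3)  # group sizes differ by at most one
    return ranked[:cut1], ranked[cut1:cut2], ranked[cut2:]

# Hypothetical standardized anomalies for nine years
anomalies = {1991: -0.8, 1992: 0.4, 1993: -1.3, 1994: 1.1, 1995: 0.0,
             1996: -0.2, 1997: 2.1, 1998: 0.7, 1999: -0.5}
dry, near, wet = tercile_groups(anomalies)
print([year for year, _ in dry])  # [1993, 1991, 1999]
print([year for year, _ in wet])  # [1998, 1994, 1997]
```

For a 42-year record such as 1961–2002 this gives three groups of 14 years each.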
Example of a Seasonal Forecast
Given in terms of Probabilities
Making Decisions using
Probabilities (Example)
• Suppose that a farmer can plant one of two main crops, sorghum (S) or maize (M). The yield response of crop S to increasing rainfall is rather small, but it does best if rainfall in the growing season is greater than 500 mm. During such seasons, the farmer can expect to earn about $120 per hectare. If rainfall is between 400 and 500 mm, the farmer can expect to earn up to about $100 per hectare. For seasons with rainfall between 250 and 400 mm, the crop yield will be smaller and the farmer can only earn about $60 per hectare from
Cont..
• On the other hand, maize (crop M) responds extremely well to increasing rainfall. For seasons with rainfall above 500 mm, the farmer can earn about $200 per hectare. However, if rainfall is below 400 mm, there is a big chance of complete crop failure, and the investment in seed and fertilizer results in a $15 loss per hectare. The table below summarises the various rainfall events and their corresponding probabilities, estimated from a 10-year historical record. Also shown in the table are the likely earnings per
Cont..
Total Rainfall in Growing Season | Historic Probability | Expected Years in 10 | Expected Crop S Earnings per Hectare ($) | Expected Crop M Earnings per Hectare ($)
250 – 400 mm                     | 30%                  | 3                    | 60                                       | -15
400 – 500 mm                     | 40%                  | 4                    | 100                                      | 120
More than 500 mm                 | 30%                  | 3                    | 120                                      | 200
(Source: IRI Exercise Website)
Cont..
• If the farmer plants crop S, then over a 10-year period he can expect to earn $60 in each of the 3 below-normal seasons, $100 in each of the 4 near-normal seasons and $120 in each of the 3 above-normal seasons. The total earnings over the ten-year period will then be $940, which averages to $94 per season per hectare planted. Similarly, if the farmer plants crop M, then over a 10-year period he can expect to lose the money invested in seed and fertilizer in 3 of the 10 years, but will earn $1080 in the other years. In the end, this strategy
Cont..
• It can be seen from this illustration that the average earnings over ten years will be somewhat higher with maize, but there are advantages and disadvantages to each crop. There is much less risk in planting sorghum, since one is assured of having something to harvest every year. On the other hand, many farmers would prefer planting maize because of its ease of harvesting and post-harvest handling and its good taste. Various strategies can be employed in balancing the risks and returns
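The expected-value arithmetic in this example can be written as a short Python sketch using the probabilities and earnings from the table. The worst-case earnings are also reported as a simple, assumed measure of the risk discussed above; the names are illustrative.

```python
# Rainfall categories with historical probabilities and earnings per hectare ($)
# taken from the table (IRI exercise): (probability, crop S, crop M)
SCENARIOS = {
    "250-400 mm": (0.3, 60, -15),
    "400-500 mm": (0.4, 100, 120),
    "> 500 mm":   (0.3, 120, 200),
}

def expected_earnings(crop_index):
    """Probability-weighted average earnings per season per hectare."""
    return sum(p * payoff[crop_index] for p, *payoff in SCENARIOS.values())

def worst_case(crop_index):
    """Lowest earnings over all rainfall categories (a simple risk measure)."""
    return min(payoff[crop_index] for _, *payoff in SCENARIOS.values())

for name, idx in (("Sorghum (S)", 0), ("Maize (M)", 1)):
    print(f"{name}: expected {expected_earnings(idx):.2f} $/ha, worst case {worst_case(idx)} $/ha")
# Sorghum (S): expected 94.00 $/ha, worst case 60 $/ha
# Maize (M): expected 103.50 $/ha, worst case -15 $/ha
```

Multiplying the expected values by 10 recovers the ten-year totals: $940 for sorghum and $1035 ($1080 minus 3 × $15) for maize.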