Deterministic and Probabilistic prediction approaches in Seasonal to Inter-annual climate forecasting

Christopher Oludhe
Department of Meteorology, University of Nairobi
P. O. Box 30197, Nairobi, KENYA
Email: [email protected]

RA 1 EXPERT MEETING ON THE APPLICATION OF CLIMATE FORECASTS FOR AGRICULTURE
Banjul, Gambia, 9-13 December 2002

Introduction

Extreme weather and climate events are known to have major negative impacts on various sectors of the economy in many countries of Africa. Advances in the science of weather and climate prediction, and more so in seasonal to interannual prediction, have made it possible to predict climate with improved accuracy over time spans ranging from a season to more than a year in advance.

Such knowledge can be used to minimise destruction of property and loss of life, to enhance food and agricultural production, and to provide critical information required for decision-making. The objective of this presentation is to highlight some of the approaches to deterministic and probabilistic seasonal to interannual climate prediction applicable to Africa.

Some Basic Ideas on Probability and Statistics

In dealing with climatic data, one usually handles only a small proportion of all the possible values of interest, referred to as a sample, whereas the entire dataset would constitute a population. Statistics is a tool that allows the sample data to be analysed and inferences (decisions) to be made concerning the entire population.

Useful statistical parameters that can be obtained from sample data include, among others, the mean and the standard deviation, and these can be used to describe the entire range of possible values. Partitioning a ranked dataset into categories such as terciles, quartiles and percentiles can provide valuable information on the range of values in a distribution.
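As a minimal sketch of these summary statistics, the snippet below uses Python's standard `statistics` module to compute the mean, the sample standard deviation and the two tercile boundaries of a sample, and then places a new value in a tercile category. The function names and the rainfall totals are hypothetical, chosen only for illustration. `statistics.quantiles` is used with its default "exclusive" interpolation method, which is one of several conventions for placing the boundaries between the thirds.

```python
import statistics


def summarize(sample):
    """Return the mean, sample standard deviation and tercile boundaries of a sample."""
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)                     # sample (n - 1) standard deviation
    lower, upper = statistics.quantiles(sample, n=3)  # cut points between the three thirds
    return mean, sd, lower, upper


def tercile_category(value, lower, upper):
    """Classify a value against tercile boundaries (values on a boundary count as near-normal)."""
    if value < lower:
        return "Below-Normal"
    if value > upper:
        return "Above-Normal"
    return "Near-Normal"


# Hypothetical seasonal rainfall totals (mm), for illustration only.
rainfall = [310, 450, 520, 280, 390, 610, 470, 350, 430, 500, 560, 400]
mean, sd, lower, upper = summarize(rainfall)
print(f"mean={mean:.1f} sd={sd:.1f} terciles=({lower:.1f}, {upper:.1f})")
print(tercile_category(590, lower, upper))
```

Each third of the historical record then corresponds to one of the Below-Normal, Near-Normal and Above-Normal categories used later for forecasts and verification.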
Understanding probabilities

The probability of an event occurring may be defined as the ratio of the number of possible outcomes in the event to the total number of outcomes in the sample space:

$$P(A) = \frac{\text{Number of possible outcomes in } A}{\text{Total number of cases}}$$

An event in this case could be ‘Rain tomorrow’, ‘3 or more cyclones next year’, etc.

Some Basic Rules of Probability

For any event A, the probability lies between 0 and 1, i.e. $0 \le P(A) \le 1$. A probability of 0 implies that the event is impossible, while a probability of 1 implies that the event is certain.

The complement of an event A (i.e. NOT A, written $\bar{A}$) is the event that happens exactly when A does not occur:

$$P(\bar{A}) = 1 - P(A)$$

Intermediate probabilities between 0 and 1 can also be stated. For instance, if a forecaster states that the probability of rain tomorrow is 0.25 (also expressed as 1/4, or as 25%), it implies that it is 3 times as likely not to rain as it is to rain, since $P(\text{no rain}) = 1 - 0.25 = 0.75 = 3 \times 0.25$.

Conditional probability is the probability of an event occurring given that another event has occurred. For events A and B with $P(B) > 0$, it is written $P(A \mid B) = P(A \cap B)/P(B)$.

Differences between Deterministic and Probabilistic Forecasts

Forecasts can be presented either as deterministic or as probabilistic. Short-term forecasts (nowcasts) are almost entirely deterministic in the sense that they state exactly what is to happen, when and where. Examples of deterministic forecasts are statements such as (i) Rainfall will be above average this season, or (ii) Rainfall will be 50% above average this season.

Probabilistic forecasts, on the other hand, give the probability of an event within a certain range of magnitudes occurring in a specific region during a particular time period. An example would be: There is a 70% chance that rainfall will be above average in the coming season. This implies that calculations based on many past occasions (historical information) have led to an estimate of a 70% probability that the observed rainfall will actually be above average.
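The probability rules above, and the relative-frequency reading of a statement such as "a 70% chance of above-average rainfall", can be sketched with exact fractions. This is a minimal illustration that assumes every recorded season counts as one equally weighted case; the eight-season record, the `warm` and `above` tests and all function names are hypothetical.

```python
from fractions import Fraction


def probability(outcomes, event):
    """Relative frequency: number of cases in which `event` holds / total number of cases."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))


def complement(p):
    """P(NOT A) = 1 - P(A)."""
    return 1 - p


def odds_against(p):
    """How many times more likely the event is NOT to happen than to happen."""
    return complement(p) / p


def conditional(outcomes, event, given):
    """P(event | given): relative frequency of `event` among the cases where `given` holds."""
    subset = [o for o in outcomes if given(o)]
    return probability(subset, event)


# Hypothetical 8-season record: (warm ENSO phase?, above-average rainfall?)
seasons = [(True, True), (True, True), (True, False), (False, True),
           (False, False), (False, False), (False, False), (False, True)]


def warm(season):
    return season[0]


def above(season):
    return season[1]


print(probability(seasons, above))          # 1/2
print(conditional(seasons, above, warm))    # 2/3
print(odds_against(Fraction(1, 4)))         # 3  (rain probability 0.25)
```

Counting within the subset of warm-phase seasons gives the same result as the ratio $P(A \cap B)/P(B)$, here $(2/8)/(3/8) = 2/3$.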
In general, longer time-scale forecasts, such as seasonal to interannual forecasts, are mostly probabilistic and are usually given as probability ranges for the season. It is, however, possible to generate probabilistic forecasts from deterministic ones by assigning probabilities to such …

Approaches to Developing Probabilistic Seasonal Forecasts

Seasonal climate forecasting procedures usually start by examining historical climate records (rainfall or temperature), or a climatological database. The database should be long enough, complete and of good quality. A standard 30-year period, such as …

Various statistical analyses can then be carried out on the historical data, and relationships determined between potential predictors (e.g., ENSO and associated teleconnections) and the predictand (rainfall or temperature). The common predictor choices are usually lagged SST anomalies over the global oceans that are considered pertinent to …

The statistical associations between lagged SST indices and the climate variables can be found by performing simple correlation analysis between these variables. Once these have been established, the next step is to develop either a simple or a multiple linear regression (deterministic) model relating the predictors to the predictand, and to use the model for prediction.

Several other potential predictors, such as the Quasi-Biennial Oscillation (QBO), the Southern Oscillation Index (SOI) and SST gradients, among others, can be included in the development of the regression model. The stepwise regression technique (forward or backward) can be applied to select the best predictors to be included in the multiple linear regression equation.

Stepwise Regression analysis

Forward Selection: in this procedure, the potential predictors are examined individually and, at each step, the one that improves the model the most is added to the model equation, starting with the one that explains the highest variance.
Backward Elimination: the regression model starts with all potential predictors and, at each step of model construction, the least important predictor is removed until only the best predictors remain. A stopping criterion should be selected in both cases.

Use of the Linear Regression Fit

A linear regression model may depict a positive or negative association between the predictors and the predictand. Using this type of relationship, it is possible to make qualitative statements about the expected value of the predictand for the coming season(s), provided the lagged predictor indices are known well in advance.

Goodness of fit measure

The goodness of fit of a linear regression model can be determined by examining the mean-squared error (MSE) in the ANOVA table output. This measure indicates the variability of the observed values around the fitted regression line. A perfect linear relationship between the predictor and predictand gives an MSE of zero, while poor fits result in large values of MSE. Another measure of fit is the coefficient of determination (R²). For a single predictor, R² is the square of the Pearson correlation coefficient between predictor and predictand; for multiple regression, it is the square of the correlation between the observed and fitted values of the predictand.

Qualitatively, R² can be interpreted as the proportion of the variance of the predictand that is described, or accounted for, by the regression. For a perfect regression R² = 1, while an R² close to 0 indicates that very little of the variance is explained by the regression line.

Examples of model fits

[Figure: Zone 7 (Voi): observed standardized rainfall anomalies, 1961 to 2001, with the 2002 forecast. Fitted model: Z7 = -0.04 + 0.30*NPA2M1 - 0.68*SPA1M1 - 0.49*SPA4M1 - 0.53*ZAFM11; R² = 64%.]
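The regression workflow described above (correlating candidate predictors with the predictand, building a multiple linear regression by forward selection, and judging the fit by MSE and R²) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the procedure used to produce the zone models shown here: the predictor names `x1` and `x2` and all data values are hypothetical, ordinary least squares is solved through the normal equations, MSE is taken as the residual sum of squares divided by n − p − 1 (the residual mean square of an ANOVA table), and the stopping rule is a simple minimum gain in R² (`min_gain`, an assumed threshold) rather than the F-tests that statistical packages commonly use.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(columns, y):
    """Ordinary least squares y ~ b0 + b1*x1 + ...; returns (coefficients, MSE, R²)."""
    n, p = len(y), len(columns)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]  # design matrix with intercept
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p + 1)] for i in range(p + 1)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p + 1)]
    coef = solve(xtx, xty)  # normal equations (X'X) b = X'y
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)
    mse = sse / (n - p - 1)  # residual mean square, as reported in an ANOVA table
    return coef, mse, 1 - sse / sst


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def forward_selection(candidates, y, min_gain=0.02):
    """Add, one at a time, the predictor that raises R² the most; stop when the gain < min_gain."""
    chosen, best_r2 = [], 0.0
    while len(chosen) < len(candidates):
        trials = [(fit([candidates[c] for c in chosen + [name]], y)[2], name)
                  for name in candidates if name not in chosen]
        r2, name = max(trials)
        if r2 - best_r2 < min_gain:  # stopping criterion
            break
        chosen.append(name)
        best_r2 = r2
    return chosen, best_r2


# Hypothetical predictor indices and predictand values, for illustration only.
candidates = {
    "x1": [0, 1, 2, 3, 4, 5, 6, 7],
    "x2": [1, 0, 1, 0, 1, 0, 1, 0],
}
y = [2.1, 4.8, 8.3, 10.9, 14.2, 16.8, 20.1, 23.0]

chosen, r2 = forward_selection(candidates, y)
coef, mse, _ = fit([candidates[c] for c in chosen], y)
print(chosen, [round(c, 2) for c in coef], round(mse, 4), round(r2, 3))
print(round(pearson(candidates["x1"], y) ** 2, 3))  # equals R² for a single predictor
```

With this data, `x1` enters first and adding `x2` raises R² by far less than `min_gain`, so selection stops with one predictor; the last line shows that, for a single predictor, R² equals the squared Pearson correlation.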
[Figure: Zone 8 (Kerugoya): observed standardized rainfall anomalies, 1961 to 2001, with the 2002 forecast. Fitted model: Z8 = 0.04 - 0.46*SIN3M1 + 0.83*NPA1M12 - 0.59*NPA1M1 + 0.37*MIB3M1; R² = 61%.]

[Figure: Zone 9 (Garbatulla): observed standardized rainfall anomalies, 1961 to 2001, with the 2002 forecast. Fitted model: Z9 = -0.07 - 0.40*NATM12 + 0.36*NPA3M12 + 0.47*SPA2M1 - 0.41*SAT2M1; R² = 49%.]

Verification of Forecast Skills

A number of quantities can be computed as a means of verifying forecast skill. These include:

• Accuracy: a general term indicating the level of agreement between the predicted value and the observed value. The error is the difference between the two values; the smaller the error, the greater the accuracy.
• Skill: this measures the accuracy of a given forecast relative to the accuracy of forecasts produced by some standard (reference) procedure. Skill scores provide a means of accounting for variations in accuracy that have nothing to do with the forecaster's ability to forecast.
• Reliability or bias: this may be defined as the average agreement between the forecast value of an element and the observed value. A positive bias indicates that the forecast value exceeds the observed value on average, while a negative bias corresponds to under-forecasting the observed value on average.
• Sharpness: the tendency to forecast extreme values.

Computing Skill Scores

Categorical (tercile) forecasts are verified using a contingency table of observed versus forecast categories:

| Observed \ Forecast | Above-Normal | Near-Normal | Below-Normal | Total |
|---|---|---|---|---|
| Above-Normal | A11 | A12 | A13 | J |
| Near-Normal | A21 | A22 | A23 | K |
| Below-Normal | A31 | A32 | A33 | L |
| Total | M | N | O | T |

Hit Score (HS)

The number of times the correct category is forecast, usually expressed as a percentage of all forecasts:

$$\text{Percent Correct} = \frac{A_{11} + A_{22} + A_{33}}{T} \times 100$$

Post Agreement

Post agreement is the number of correct forecasts made divided by the number of forecasts for each category. These are: A11/M, A22/N and A33/O.

False Alarm Ratio (FAR)

The fraction of forecast events that failed to materialize. Best FAR = 0; worst FAR = 1.
• For Above-Normal = (A21 + A31)/M
• For Near-Normal = (A12 + A32)/N
• For Below-Normal = (A13 + A23)/O
For each category, FAR = 1 − Post Agreement.

Probability of Detection (POD)

This is the number of correct forecasts divided by the number observed in each category. It is a measure of the ability to correctly forecast a certain category, and is sometimes referred to as the "hit rate", especially when applied to severe weather verification. POD for the three categories is: A11/J, A22/K and A33/L.

Bias

A comparison of the average forecast with the average observation.
• Bias > 1: over-forecasting
• Bias < 1: under-forecasting
• For Above-Normal = M/J; for Near-Normal = N/K; for Below-Normal = O/L

Heidke Skill Score

This is given by

$$SS = \frac{R - E}{T - E}$$

where R = total number of correct forecasts, T = total number of forecasts, and E = number of forecasts expected to be correct by chance (e.g. based on persistence or climatology). For the contingency table above, $E = (JM + KN + LO)/T$, so

$$HSS = \frac{(A_{11} + A_{22} + A_{33}) - \frac{JM + KN + LO}{T}}{T - \frac{JM + KN + LO}{T}}$$

Computational Example: Zone 8 (Kerugoya)

| OBS. \ Forecasts | Dry | Normal | Wet | Total |
|---|---|---|---|---|
| Dry | 3 | 0 | 0 | 3 |
| Normal | 1 | 2 | 1 | 4 |
| Wet | 1 | 1 | 2 | 4 |
| Total | 5 | 3 | 3 | 11 |

Percent Correct = 64% (7 of 11 forecasts correct)

| Score (%) | Dry | Normal | Wet |
|---|---|---|---|
| Probability of Detection | 100 | 50 | 50 |
| Post Agreement | 60 | 67 | 67 |
| False Alarm (1st Order) | 20 | | 0 |

Heidke Skill Score (HSS) = 0.46

Presenting Forecasts in terms of Probabilities

• A standard format for presenting seasonal forecasts is to assign percentage probabilities to what are known as terciles.
• Terciles consist of three ranges of values that represent three broad sectors of the climatological distribution, each with an equal chance of occurrence: the lower, middle and upper thirds of the expected distribution of values.
• For example, a typical seasonal forecast may be presented as (45, 30, 25). This can be interpreted as a 45% chance of the seasonal total precipitation falling in the upper (Wet) tercile, a 30% chance of it falling in the middle (Near-Normal) tercile, and a 25% chance of it falling in the lowest third (Dry).
• Seasonal forecasts presented in this way indicate not only the most likely outcome for the upcoming months or seasons, but also the distribution of possible outcomes.
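Returning to the verification scores defined earlier, the sketch below computes percent correct, the Heidke Skill Score, POD, post agreement, FAR and bias from a contingency table, and checks them against the Zone 8 (Kerugoya) example. It is a minimal illustration: the function names are hypothetical, and rows are taken as observed categories and columns as forecast categories, which is the layout implied by the POD (divide by the number observed) and post agreement (divide by the number forecast) definitions. FAR here follows the (forecasts − hits)/forecasts definition above, which gives 40% for the Dry category of this table; the "False Alarm (1st Order)" row of the example is a different quantity and is not computed.

```python
def ratio(num, den):
    """Safe division: None when a category was never forecast or never observed."""
    return num / den if den else None


def verification_scores(table, labels):
    """Categorical verification scores for a square contingency table.

    table[i][j] counts cases observed in category i and forecast in category j
    (rows = observations, columns = forecasts).
    """
    k = len(table)
    observed = [sum(row) for row in table]                              # J, K, L
    forecast = [sum(table[i][j] for i in range(k)) for j in range(k)]   # M, N, O
    total = sum(observed)                                               # T
    hits = [table[i][i] for i in range(k)]
    correct = sum(hits)                                                 # R
    expected = sum(o * f for o, f in zip(observed, forecast)) / total  # E = (JM + KN + LO) / T
    scores = {
        "percent_correct": 100 * correct / total,
        "hss": (correct - expected) / (total - expected),
    }
    for c, label in enumerate(labels):
        scores[label] = {
            "pod": ratio(hits[c], observed[c]),
            "post_agreement": ratio(hits[c], forecast[c]),
            "far": ratio(forecast[c] - hits[c], forecast[c]),
            "bias": ratio(forecast[c], observed[c]),
        }
    return scores


# Zone 8 (Kerugoya) example: rows = observed, columns = forecast (Dry, Normal, Wet).
kerugoya = [[3, 0, 0],
            [1, 2, 1],
            [1, 1, 2]]
s = verification_scores(kerugoya, ["Dry", "Normal", "Wet"])
print(round(s["percent_correct"]), round(s["hss"], 2))  # 64 0.46
for label in ("Dry", "Normal", "Wet"):
    print(label, {name: None if v is None else round(v, 2) for name, v in s[label].items()})
```

The printed POD (1.0, 0.5, 0.5) and post agreement (0.6, 0.67, 0.67) values match the percentages in the Kerugoya results table, and the HSS reproduces 0.46.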
Such forecasts can take all the …

Example of tercile groupings

[Table: yearly values for Zones 1 to 12 (1961 to 2002), ranked from lowest to highest and grouped into lower, middle and upper terciles.]

[Figure: Example of a Seasonal Forecast Given in terms of Probabilities]

Making Decisions using Probabilities (Example)

• Suppose that a farmer can plant one of two main crops, Sorghum (S) or Maize (M). The yield response of Crop S to increasing rainfall is rather small, but it does best if rainfall in the growing season is greater than 500 mm. During such seasons, the farmer can expect to earn about $120 per hectare.
However, if rainfall is between 400 and 500 mm, the farmer can expect to earn up to about $100 per hectare. For seasons with rainfall between 250 and 400 mm, the crop yield will be smaller and the farmer can only earn about $60 per hectare from …

• On the other hand, Maize (Crop M) responds extremely well to increasing rainfall. For seasons with rainfall above 500 mm, the farmer can earn about $200 per hectare. However, if rainfall is below 400 mm, there is a big chance of complete crop failure, and his investment in seed and fertilizer results in a $15 loss per hectare. The table below summarizes the various rainfall events and their corresponding probabilities, estimated from a 10-year historical record. Also shown in the table are the likely earnings per hectare for each crop.

| Total Rainfall in Growing Season | Historic Probability | Expected years in 10 | Expected Crop S Earnings per Hectare ($) | Expected Crop M Earnings per Hectare ($) |
|---|---|---|---|---|
| 250-400 mm | 30% | 3 | 60 | -15 |
| 400-500 mm | 40% | 4 | 100 | 120 |
| More than 500 mm | 30% | 3 | 120 | 200 |

(Source: IRI Exercise Website)

• If the farmer plants crop S, then over a 10-year period he can expect to earn $60 in each of the 3 below-normal seasons, $100 in each of the 4 near-normal seasons and $120 in each of the 3 above-normal seasons. The total earnings over the ten-year period will then be $940, which averages to $94 per season per hectare planted. Similarly, if the farmer plants crop M, then over a 10-year period he can expect to lose $15 per hectare (the money invested in seed and fertilizer) in 3 of the 10 years, but will earn $1080 in the other years. In the end, this strategy yields a net $1035 over ten years ($1080 minus 3 × $15), or about $103.50 per season per hectare.

• It can be seen from this illustration that the average earnings over ten years will be somewhat higher with maize, but there are advantages and disadvantages to each crop. There is much less risk in planting sorghum, since one is assured of having something to harvest every year.
On the other hand, many farmers may prefer planting maize because of its ease of harvesting and post-harvest handling, and its good taste. Various strategies can be employed to balance the risks and returns.
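The arithmetic of the farmer example can be sketched as a probability-weighted (expected) earnings calculation, with the lowest earnings in any band as a simple risk measure. This is a minimal illustration using the probabilities and earnings per hectare from the table above; the names `BANDS`, `expected_earnings` and `worst_case` are hypothetical, and the worst-case figure is only one of many possible ways to express risk.

```python
from fractions import Fraction

# Rainfall bands from the table above: (historic probability, earnings per hectare in $).
BANDS = {
    "250-400 mm": (Fraction(3, 10), {"Sorghum": 60, "Maize": -15}),
    "400-500 mm": (Fraction(4, 10), {"Sorghum": 100, "Maize": 120}),
    "More than 500 mm": (Fraction(3, 10), {"Sorghum": 120, "Maize": 200}),
}


def expected_earnings(crop, bands=BANDS):
    """Probability-weighted average earnings per season per hectare."""
    return sum(p * earnings[crop] for p, earnings in bands.values())


def worst_case(crop, bands=BANDS):
    """Lowest earnings per hectare among the rainfall bands (a simple risk measure)."""
    return min(earnings[crop] for _, earnings in bands.values())


for crop in ("Sorghum", "Maize"):
    mean = expected_earnings(crop)
    print(f"{crop}: expected ${float(mean):.2f}/season, "
          f"${float(10 * mean):.0f} over 10 years, worst case ${worst_case(crop)}")
```

This reproduces the $94 and $103.50 per-season averages worked out above. Passing a different `bands` mapping, for example one whose probabilities come from a seasonal forecast rather than the historic record (a hypothetical use, since the table uses historic probabilities), would show how the comparison between the two crops could shift.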