STATISTICAL METHODS TO DERIVE
EXTREMES: BASICS, APPLICATIONS
E.I.KHLEBNIKOVA
VOEIKOV MAIN GEOPHYSICAL OBSERVATORY
SAINT-PETERSBURG, RUSSIA
Contact details:
Dr. Elena I. Khlebnikova
Tel: +7 812 2472102
Fax: +7 812 2478661
e-mail: [email protected]
[Figure: probability density P with the left tail, the mean, and the right tail marked.]
Extremes: the events connected with the behavior
of a meteorological variable at the tails
of distribution
Generalizations: a) meteorological field,
b) complex of variables
OUTLINE
Introduction
1. The Generalized Extreme Value (GEV) distribution.
1.1. Basics.
1.2. Methods of parameter estimation.
1.3. Drawbacks and extensions.
1.4. Choices at the application stage. Examples.
2. Threshold exceedances approach to derive extremes.
2.1. Basics.
2.3. The Generalized Pareto Distribution (GPD).
2.4. Example.
3. Climate change and extremes.
4. Conclusions.
CLASSIC EXTREME VALUE ANALYSIS
Basic set-up:
$X^{(m)} = \max\{V_1, V_2, \dots, V_m\}$
$V_1, V_2, \dots$ i.i.d., $F(x) = P\{V_i \le x\}$
$P\{X^{(m)} \le x\} = [F(x)]^m$ for any $m$
When $m \to \infty$, one may use the Three Types Theorem:
$P\{(X^{(m)} - u_m)/b_m \le x\} = F^m(b_m x + u_m) \to G(x)$
$G(x)$ belongs to one of three types:
Type 1 (Fisher-Tippett 1, Gumbel):
$G_1(x) = \exp(-e^{-x})$, $-\infty < x < \infty$   (1)
Type 2 (Fréchet):
$G_2(x) = 0$ for $x \le 0$; $G_2(x) = \exp(-x^{-\alpha})$ for $x > 0$, $\alpha > 0$   (2)
Type 3 (Weibull):
$G_3(x) = \exp(-|x|^{\alpha})$ for $x \le 0$, $\alpha > 0$; $G_3(x) = 1$ for $x > 0$   (3)
GENERALIZED EXTREME VALUE (GEV) DISTRIBUTION
$G(x) = \exp\{-[1 - k(x - \mu)/\sigma]^{1/k}\}$, $k \ne 0$
$G(x) = \exp\{-\exp[-(x - \mu)/\sigma]\}$, $k = 0$   (1)
$\mu$ - location parameter
$\sigma$ - scale parameter
$k$ - shape parameter:
$k < 0$ - "long-tailed" case,
$k = 0$ - "exponential" tail,
$k > 0$ - "short-tailed" case, finite endpoint
DOMAINS OF ATTRACTION OF EV DISTRIBUTIONS
Type I (Gumbel)
a) normal distribution: $F(x) = \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\,dy$
b) monotonic transformation of normal distribution (log-normal and others)
c) Gumbel distribution: $F(x) = \exp(-e^{-x})$
d) exponential distribution: $F(x) = 1 - e^{-x}$
Type II (Fréchet)
a) Pareto distribution: $F(x) = 1 - a x^{-\alpha}$, $\alpha > 0$, $a > 0$, $x \ge a^{1/\alpha}$
b) Fréchet distribution
Type III (Weibull)
a) uniform distribution: $F(x) = x$, $0 \le x \le 1$
b) truncated exponential distribution
c) Weibull distribution
CHOICE OF DISTRIBUTION TYPE
1. Theoretical consideration of distribution functions ($x_F$ - upper endpoint of $F$)
- Type I (Gumbel): there exists $g(t) > 0$ such that
$(1 - F(t + x g(t)))/(1 - F(t)) \to \exp(-x)$ as $t \to x_F$, $x_F \le \infty$
- Type II (Fréchet):
$(1 - F(tx))/(1 - F(t)) \to x^{-\alpha}$, $\alpha > 0$, as $t \to \infty$, $x_F = \infty$
- Type III (Weibull):
$(1 - F(x_F - xh))/(1 - F(x_F - h)) \to x^{\alpha}$, $\alpha > 0$, as $h \to 0$, $x_F < \infty$
2. Graphical method
3. Hypothesis testing
TYPES OF ASYMPTOTIC EXTREMAL
DISTRIBUTION FUNCTION
[Figure: Gumbel probability plot, -ln(-ln(P)) versus X, of the three types: Gumbel (k=0), Weibull (k=+0.3), Fréchet (k=-0.3).]
ESTIMATORS OF PLOTTING POSITIONS
FOR EMPIRICAL PROBABILITIES
$p_j = (j - 0.5)/n$   (1)
$p_j = j/(n + 1)$   (2)
$p_j = (j - 0.3)/(n + 0.4)$   (3)
$p_j = (j - c)/(n + 1 - 2c)$   (4)
Formula (1) is the best in many respects.
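The plotting-position formulas above can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function names are hypothetical, and the general formula (4) is used with c = 0.5, 0 and 0.3 to reproduce formulas (1), (2) and (3).

```python
import numpy as np

def plotting_positions(n, c=0.5):
    """Empirical probabilities p_j = (j - c) / (n + 1 - 2c), j = 1..n.

    c = 0.5 gives formula (1), c = 0 gives (2), c = 0.3 gives (3).
    """
    j = np.arange(1, n + 1)
    return (j - c) / (n + 1 - 2 * c)

def gumbel_plot_coordinates(sample, c=0.5):
    """Return the ordered sample X_j and reduced variate Y_j = -ln(-ln p_j)."""
    x = np.sort(np.asarray(sample, dtype=float))
    p = plotting_positions(x.size, c)
    y = -np.log(-np.log(p))
    return x, y
```

If the points (X_j, Y_j) lie close to a straight line, the sample is consistent with a Gumbel (k = 0) distribution.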
HYPOTHESIS TESTING (ACCORDING TO GUMBEL)
$X_1 \le X_2 \le \dots \le X_n$ - the ordered sample, $X^*$ - median
$H_0: X_i \sim G_1$
$D = \ln[(X_n - X^*)/(X^* - X_1)]$   (1)
$D \sim N(\mu_D, \sigma_D)$ as $n \to \infty$
$\mu_D = \ln\{\ln(n) / [\ln(-\ln(1 - 0.5^{1/n})) - \ln\ln 2]\}$   (2)
$\sigma_D = [0.861 \ln(n) - 0.490]^{-1}$   (3)
If $|D - \mu_D| / \sigma_D > 1.96$, $H_0$ is rejected at the 5% significance level.
METHODS OF PARAMETER ESTIMATION
(parameters: $k$, $\mu$, $\sigma$)
GUMBEL DISTRIBUTION (k = 0)
1. Linear approximation of empirical probabilities
on a double-logarithmic scale (graphical method)
$X_1 \le X_2 \le \dots \le X_n$ - the ordered sample of maxima
$p_j$ - empirical probabilities
$Y_j = -\ln(-\ln(p_j))$
$(X_j, Y_j)$ - Gumbel plot
2. Method of moments
Method of empirical reduced moments:
$\tilde{m}_x$, $\tilde{\sigma}_x$ - mean value and standard deviation of $X$,
$\tilde{m}_y$, $\tilde{\sigma}_y$ - mean value and standard deviation of $Y$
$\sigma = \tilde{\sigma}_x / \tilde{\sigma}_y$ (scale), $\mu = \tilde{m}_x - \sigma \tilde{m}_y$ (location)
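A minimal Python sketch of the method of empirical reduced moments for the Gumbel case is given below. It assumes plotting positions of the form (j - c)/(n + 1 - 2c) with c = 0.5; the function name and the default value of c are illustrative choices, not taken from the slides.

```python
import numpy as np

def gumbel_reduced_moments(maxima, c=0.5):
    """Estimate Gumbel location and scale from sample and reduced moments.

    scale = std(X) / std(Y), location = mean(X) - scale * mean(Y),
    where Y_j = -ln(-ln p_j) is the reduced variate at plotting position p_j.
    """
    x = np.sort(np.asarray(maxima, dtype=float))
    n = x.size
    p = (np.arange(1, n + 1) - c) / (n + 1 - 2 * c)
    y = -np.log(-np.log(p))
    scale = x.std() / y.std()          # same normalisation for both, so it cancels
    loc = x.mean() - scale * y.mean()
    return loc, scale
```

Such estimates are cheap to compute and may serve as starting values for the maximum likelihood method.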
METHODS OF PARAMETER ESTIMATION (CONTINUED)
3. MAXIMUM LIKELIHOOD METHOD
The ML estimate of θ maximizes the log-likelihood function
$L(\theta) = \ln P(X_1, \dots, X_n; \theta)$,
considered as a function of θ for fixed $X_1, \dots, X_n$.
ML equations for the Gumbel distribution ($\mu$ - location, $\sigma$ - scale):
$\mu = -\sigma \ln\left[\frac{1}{n} \sum_{j=1}^{n} \exp(-X_j/\sigma)\right]$   (1)
$\sigma = \frac{1}{n} \sum_{j=1}^{n} X_j - \frac{\sum_{j=1}^{n} X_j \exp(-X_j/\sigma)}{\sum_{j=1}^{n} \exp(-X_j/\sigma)}$   (2)
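The two ML equations for the Gumbel distribution can be solved numerically, for example as in the sketch below. It is one possible approach under stated assumptions, not the author's algorithm: equation (2) is solved for the scale with SciPy's `brentq` root finder, the bracket is a heuristic built from the moment estimate, and the function name is hypothetical.

```python
import numpy as np
from scipy.optimize import brentq

def gumbel_ml(maxima):
    """Solve the Gumbel ML equations (1)-(2) for location and scale."""
    x = np.asarray(maxima, dtype=float)
    xbar = x.mean()
    x0 = x.min()  # shift to avoid overflow in exp; it cancels in equation (2)

    def residual(scale):
        w = np.exp(-(x - x0) / scale)
        # equation (2) rewritten as residual(scale) = 0
        return xbar - np.sum(x * w) / np.sum(w) - scale

    s0 = x.std() * np.sqrt(6.0) / np.pi          # moment estimate of the scale
    scale = brentq(residual, 0.05 * s0, 20.0 * s0)  # heuristic bracket (assumption)
    w = np.exp(-(x - x0) / scale)
    loc = x0 - scale * np.log(np.mean(w))        # equation (1), with the shift undone
    return loc, scale
```

Equation (2) involves only the scale, so it is solved first; the location then follows directly from equation (1).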
METHODS OF PARAMETER ESTIMATION (CONTINUED)
GEV DISTRIBUTION
1. Method of moments
Probability-weighted moments:
$b_r = E[X (F(X))^r]$, $r = 0, 1, 2, \dots$   (1)
Unbiased estimates of $b_r$ ($r$ = 0, 1, 2) from the ordered sample $X_1 \le X_2 \le \dots \le X_n$:
$b_0 = \frac{1}{n} \sum_{j=1}^{n} X_j$,
$b_r = \frac{1}{n} \sum_{j=r+1}^{n} \frac{(j-1)(j-2)\cdots(j-r)}{(n-1)(n-2)\cdots(n-r)} X_j$   (2)
Estimates of parameters for the GEV distribution ($\mu$ - location, $\sigma$ - scale, $k$ - shape):
$k = 7.8590 c + 2.9554 c^2$, where $c = (2b_1 - b_0)/(3b_2 - b_0) - \ln 2/\ln 3$,
$\sigma = (2b_1 - b_0) k / [\Gamma(1 + k)(1 - 2^{-k})]$,
$\mu = b_0 + \sigma (\Gamma(1 + k) - 1)/k$   (3)
2. Maximum likelihood method
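The probability-weighted moment estimates above can be sketched in Python as follows. The function names are hypothetical. The branch for k close to zero uses the standard Gumbel limit (scale = (2b1 - b0)/ln 2, location = b0 - 0.5772 scale), which is not on the slide and is added here only to avoid division by zero.

```python
import numpy as np
from math import gamma, log

def sample_pwm(sample, r_max=2):
    """Unbiased estimates b_0..b_{r_max} of the probability-weighted moments."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    j = np.arange(1, n + 1)
    b = []
    for r in range(r_max + 1):
        w = np.ones(n)
        for i in range(1, r + 1):
            w *= (j - i) / (n - i)   # weight is zero for j <= r
        b.append(float(np.mean(w * x)))
    return b

def gev_pwm(sample):
    """GEV location, scale and shape k from probability-weighted moments."""
    b0, b1, b2 = sample_pwm(sample)
    c = (2 * b1 - b0) / (3 * b2 - b0) - log(2) / log(3)
    k = 7.8590 * c + 2.9554 * c ** 2
    if abs(k) < 1e-6:
        # Gumbel limit (assumption added for numerical safety)
        scale = (2 * b1 - b0) / log(2)
        loc = b0 - 0.5772156649 * scale
    else:
        scale = (2 * b1 - b0) * k / (gamma(1 + k) * (1 - 2.0 ** (-k)))
        loc = b0 + scale * (gamma(1 + k) - 1) / k
    return loc, scale, k
```

With the sign convention of the slides, a positive k from this estimator indicates the short-tailed (Weibull) case and a negative k the long-tailed (Fréchet) case.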
DESIGN VALUES AND THEIR ACCURACY
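T - return period
$P\{X^{(m)} \le X_T\} = 1 - \frac{1}{T}$   (1)
$X_T = \mu + \frac{\sigma}{k}\left[1 - \left(-\ln\left(1 - \frac{1}{T}\right)\right)^k\right]$, $k \ne 0$   (2)
$X_T = \mu - \sigma \ln\left[-\ln\left(1 - \frac{1}{T}\right)\right]$, $k = 0$   (3)
Standard errors (Abild et al., 1992) for k = 0:
$\sigma_{\tilde{X}_T} = \left(\sigma^2 (0.608 z^2 + 0.514 z + 1.109)/n\right)^{1/2}$,
where $z = -\ln\left(-\ln\left(1 - \frac{1}{T}\right)\right)$   (4)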
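As an illustration of formulas (2)-(4), design values and the Gumbel standard error can be computed as in the sketch below. The function names are hypothetical; the parameters `loc` and `scale` stand for the GEV location and scale, and `k` for the shape in the sign convention of the slides.

```python
import numpy as np

def gev_return_level(T, loc, scale, k=0.0):
    """Design value X_T for return period T (formula (2), or (3) when k = 0)."""
    z = -np.log(-np.log(1.0 - 1.0 / T))      # Gumbel reduced variate
    if k == 0.0:
        return loc + scale * z
    # (-ln(1 - 1/T))**k equals exp(-k * z)
    return loc + scale / k * (1.0 - np.exp(-k * z))

def gumbel_return_level_se(T, scale, n):
    """Standard error of X_T for k = 0 and sample size n (formula (4))."""
    z = -np.log(-np.log(1.0 - 1.0 / T))
    return np.sqrt(scale ** 2 * (0.608 * z ** 2 + 0.514 * z + 1.109) / n)
```

For example, with 30 annual maxima and unit scale, `gumbel_return_level_se(100, 1.0, 30)` is about 0.74, which shows how quickly the uncertainty grows when T exceeds the record length.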
GEV-MODEL. COMMON PRACTICE
1. Obtaining the ordered sample of extremes.
2. Calculation of plotting positions.
3. Gumbel probability plotting.
4. Hypothesis testing.
5. Estimation of initial values for parameters (by the method of
moments).
6. Estimation of parameters by the maximum-likelihood method
(for 2- and 3-parameter distribution).
7. Derivation of design-values.
8. Risk analysis and decision-making.
MAXIMUM DISTRIBUTIONS
FOR WIND SPEED SERIES (ST. PETERSBURG)
[Figure: Gumbel probability plot, -ln(-ln(p)) versus wind speed (m/s), of annual and monthly maxima.]
Distinctions: a) time scale in use,
b) location parameter
(theory for normal process: $\mu_{year} = \mu_{month} + \sigma \ln(12)$)
ANNUAL MINIMUM TEMPERATURE OF
5-DAY PERIOD (ST. PETERSBURG)
[Figure: Gumbel probability plot, -ln(-ln(p)) versus temperature (from -34 to -8), for the periods 1936-65 and 1966-95.]
ANNUAL MAXIMUM TEMPERATURE
(ST. PETERSBURG)
[Figure: Gumbel probability plot, -ln(-ln(P)) versus temperature (from 26 to 34), for the periods 1936-65 and 1966-95.]
EXTENDING THE CLASSICAL METHOD
$X_{n,1} \ge X_{n,2} \ge \dots \ge X_{n,r}$ - the $r$ largest order statistics of an i.i.d. sample of size $n$
$Y_j = (X_{n,j} - \mu)/\sigma$
$f(Y_1, \dots, Y_r) = \sigma^{-r} \exp\left[-(1 - kY_r)^{1/k} - \left(1 - \frac{1}{k}\right) \sum_{j=1}^{r} \ln(1 - kY_j)\right]$, $k \ne 0$   (1)
$f(Y_1, \dots, Y_r) = \sigma^{-r} \exp\left[-\exp(-Y_r) - \sum_{j=1}^{r} Y_j\right]$, $k = 0$   (2)
$\mu$, $\sigma$, $k$ can be computed by the ML method
DRAWBACKS (DEPENDENCE, SEASONALITY)
Dependence:
the theory may be applied
provided the parameters are adjusted adequately.
Theoretical parameters for a stationary normal process:
$b_T = (2 \ln(\nu T))^{-1/2}$ - scale parameter
$u_T = (2 \ln(\nu T))^{1/2}$, with a correction term of order $(2 \ln(\nu T))^{-1/2}$ - location parameter
ν - mean number of upcrossings of the zero level
per unit time
1/ν - characteristic time scale (νT is dimensionless)
Seasonality:
- partitioning the data by season,
- seasonal adjustment
THE INFLUENCE OF NON-STATIONARITY
ON EXTREME DISTRIBUTIONS
[Figure: Gumbel probability plot, -ln(-ln(p)) versus temperature (from 20.0 to 37.5), of the annual maximum and monthly maxima.]
EXTREMES AND THE SCALE OF REGION
[Figure: standardized threshold C versus -log(α), where P[max X ≤ C σ_X] = 1 - α, for three spatial scales: Northern Hemisphere, zone 35-80° N, and a fixed point.]
Standardized threshold of non-exceeding for monthly mean air
temperature. January, Northern Hemisphere.
According to Khlebnikova, 1987.
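THRESHOLD EXCEEDANCES
APPROACH
[Figure: time series over a period T with threshold u; the excess is the part of a value above u.]
Let $X \sim F$, $x_F$ - upper endpoint of $F$.
$P\{X - u \le y \mid X > u\} = \frac{F(u + y) - F(u)}{1 - F(u)} = F_u(y)$
Theorem: $F_u(y) \approx H(y, \sigma_u, k)$ as $u \to x_F$,
where $H(y, \sigma, k) = 1 - (1 - k y/\sigma)^{1/k}$ - GPD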
GENERALIZED PARETO DISTRIBUTION
$H(x, \sigma, k) = 1 - (1 - kx/\sigma)^{1/k}$, $k \ne 0$
$H(x, \sigma, k) = 1 - \exp(-x/\sigma)$, $k = 0$   (1)
$\sigma > 0$ - scale parameter,
$k$ - shape parameter
$k \le 0$: $0 \le x < \infty$
$k > 0$: $0 \le x \le \sigma/k$
$EX = \sigma/(1 + k)$, $k > -1$   (2)
$Var X = \sigma^2/((1 + k)^2 (1 + 2k))$, $k > -\frac{1}{2}$   (3)
ESTIMATION OF GPD PARAMETERS
a) Linear approximation based on the mean excess function:
$E[X - u \mid X > u] = \frac{\sigma}{1 + k} - \frac{k}{1 + k} u$   (1)
b) Method of moments:
$Y = X - u$; $\bar{Y}$, $s^2$ - sample mean and variance of $Y$
$k = \frac{1}{2}\left(\bar{Y}^2/s^2 - 1\right)$; $\sigma = \frac{1}{2} \bar{Y} \left(\bar{Y}^2/s^2 + 1\right)$   (2)
c) Method of probability-weighted moments:
$k = b_0/(2b_1 - b_0) - 2$; $\sigma = (1 + k) b_0$,   (3)
where $b_0$, $b_1$ - probability-weighted moments of $Y$
d) Maximum likelihood method
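Methods a) to c) can be sketched in Python as follows. This is an illustrative sketch, not the author's code: the function names are hypothetical, the input to the two estimators is the array of excesses Y = X - u, and b1 is computed with the unbiased weights (j - 1)/(n - 1) on the ordered excesses.

```python
import numpy as np

def mean_excess(sample, u):
    """Empirical mean excess E[X - u | X > u] over threshold u."""
    x = np.asarray(sample, dtype=float)
    return float(np.mean(x[x > u] - u))

def gpd_moments(excesses):
    """GPD scale and shape k by the method of moments."""
    y = np.asarray(excesses, dtype=float)
    ratio = y.mean() ** 2 / y.var(ddof=1)
    k = 0.5 * (ratio - 1.0)
    scale = 0.5 * y.mean() * (ratio + 1.0)
    return scale, k

def gpd_pwm(excesses):
    """GPD scale and shape k by probability-weighted moments."""
    y = np.sort(np.asarray(excesses, dtype=float))
    n = y.size
    j = np.arange(1, n + 1)
    b0 = y.mean()
    b1 = np.mean((j - 1) / (n - 1) * y)
    k = b0 / (2 * b1 - b0) - 2.0
    scale = (1.0 + k) * b0
    return scale, k
```

Plotting `mean_excess` against a range of thresholds and looking for the region where it is approximately linear in u is a common way to choose the threshold before fitting.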
POISSON - GPD MODEL
(combination of properties of exceedances)
1. The number of exceedances N of the level u has a Poisson distribution
with mean λ per unit time.
2. The excess values $y_1, \dots, y_n$ over the threshold u follow the GPD.
Estimation of design values $X_T$:
$X_T = u + \frac{\sigma}{k}\left[1 - (\lambda T)^{-k}\right]$, $k \ne 0$
$X_T = u + \sigma \ln(\lambda T)$, $k = 0$
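A short Python sketch of the design-value formulas of the Poisson-GPD model follows. The function name and argument names are hypothetical; `lam` is the mean number of exceedances per unit time, expressed in the same time unit as the return period T.

```python
import numpy as np

def poisson_gpd_return_level(T, u, scale, k, lam):
    """Design value X_T for the Poisson-GPD model with threshold u."""
    if k == 0.0:
        return u + scale * np.log(lam * T)
    return u + scale / k * (1.0 - (lam * T) ** (-k))
```

In practice `lam` could be estimated as the number of exceedances divided by the number of years of record. For minima, as in a study of minimum temperature, the same function can be applied to the negated series.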
EXAMPLE
Data:
daily mean temperature in
Arkhangelsk (65° N, 40° E), January, 1936-2001.
Purpose:
to derive design values for the minimum temperature
Methods in use:
1) based on the GEV distribution for the monthly minimum,
2) based on the GPD for excesses
APPROXIMATION BY GEV METHOD
[Figure: Gumbel probability plot of the empirical distribution and the fitted Weibull-type GEV, k=0.34; temperature axis from -40 to -10.]
Monthly minimum daily mean air temperature.
January. Arkhangelsk
GPD MODEL FOR EXCEEDANCES
[Figure: mean excess and number of exceedances as functions of the threshold level (from about -18 to -38).]
ESTIMATES OF QUANTILES
BASED ON DIFFERENT THRESHOLDS
[Figure: quantile X_T (from -35 to -41) versus threshold (from -20 to -36) for T=10, T=20, T=40 and T=100.]
COMPARATIVE ESTIMATES OF QUANTILES
USING ANNUAL MINIMA AND
GPD DISTRIBUTIONS
[Figure: quantile X_T (from -36 to -42) versus return period T (from 0 to 100), estimated from the minimum series and from GPD fits with thresholds of -22, -23, -25, -27 and -35 °C.]
UNCERTAINTY OF ESTIMATION ABOUT EXTREME QUANTILES
[Figure: profile log-likelihood (from -94 to -89) versus X_T (from 200 to 1000) for T=25, T=50 and T=100.]
Profile likelihood plots for the T-year return value $X_T$ (flow rates from the
River Nidd). Adapted from Davison and Smith (1990).
Poisson-GPD model:
$X_T = u + \frac{\sigma}{k}\left[1 - (\lambda T)^{-k}\right]$, $u = 100$
CLIMATE CHANGES IN ORDER STATISTICS
[Figure: time series, 1930-2010, of temperature (°C, from -60 to -40) with a trend of 1.5 °C/10 years.]
Annual minimum air temperature.
Yakutsk (Russia), 1936-2001
CHANGES IN EXTREME QUANTILES.
According to I. Matyasovszky (2000)
[Figure: time series, 1900-2000, of the quantile in °C (from -19 to -9) for Pecs, Szeged, Miscolc and Budapest.]
Daily minimum temperatures in
Hungary (winter, q=0.05)
EXTREMES AND CLIMATE CHANGE
1. The problem of interpreting the concept of "return period"
under climate change conditions.
2. Combination of EV models with models of inter- and
intra-annual variability.
3. Application of the Monte Carlo technique based on well-developed
stochastic models of meteorological processes, taking climate forecasts
into account.
CONCLUSIONS
1. The most commonly used approach to deriving extremes is based on the
GEV distribution for maxima of meteorological series.
2. There are many fitting methods for calculating the parameters of the GEV
distribution. Modifications of the method of moments can be
recommended for estimating initial values of the parameters.
3. The most efficient estimates of the parameters are provided by the
maximum likelihood method. Computational algorithms are
available to implement this method.
4. An alternative approach to deriving extremes is based on considering
threshold exceedances and using the GPD for excesses.
5. Uncertainty in the estimated extreme value parameters has to be taken into
account when interpreting these characteristics for applied purposes.
6. It is recommended to use the maximum likelihood method
accompanied by the Monte Carlo technique for evaluating confidence
intervals.
7. It is necessary to develop new approaches to the interpretation of climate
variability for applications under climate change conditions. Such approaches
should be based on combining extreme value models with advanced
statistical models of meteorological processes on different time and space
scales.
REFERENCES
1. Gumbel, E.J. (1958). Statistics of extremes. Columbia Univ. Press,
New York.
2. Cramér, H., and M.R. Leadbetter (1967). Stationary and related stochastic
processes. Sample function properties and their applications.
John Wiley, New York.
3. Leadbetter, M.R., G. Lindgren, and H. Rootzén (1983). Extremes and
related properties of random sequences and processes.
4. Davison, A.C., and R.L. Smith (1990). Models for exceedances over
high thresholds (with discussion). J. R. Statist. Soc. B, 52, 393-442.
5. Reiss, R.-D., and M. Thomas (2001). Statistical modeling of extreme
values from insurance, finance, hydrology, and other fields.
Second edition. Birkhäuser.
6. Buishand, T.A. (1989). Statistics of extremes in climatology.
Statistica Neerlandica, 43, 1-30.
7. Farago, T., and R.W. Katz (1990). Extremes and design values in
climatology. Report No. WCAP-14, WMO/TD-No. 386, World
Meteorological Organization, Geneva.
8. Palutikof, J.P., B.B. Brabson, D.H. Lister, and S.T. Adcock (1999). A
review of methods to calculate extreme wind speeds. Meteorological
Applications, 6, 119-132.
9. Khlebnikova, E.I., I.A. Sall, and E.E. Sibir (1988). On the use of space
characteristics of excursions of meteorological fields for climate
change analysis (in Russian). Trudy GGO, 516, 110-120.
10. Matyasovszky, I. (2000). A method to estimate temporal behavior
of extreme quantiles. Időjárás, 104(1), 43-51.
WEB PAGES
1. http:// www.esig.ucar.edu/extremevalues/extreme.html,
including “Lecture notes on environmental statistics” by Richard
Smith (Chapter 8 on extremes)
2. http://www.cru.uea.uk.projects/mice/extremes_description.pdf
SOFTWARE
1. Xtremes: http://www.xtremes.de
Windows software for statistical analysis of extremes.
2. Alec McNeil’s S-Plus routines:
http://www.math/ethz.ch/~mcneil/software.html