Presentation Template for Telcordia Technologies


Joerg Rothenbuehler
jrothenb – 1
The distribution of the Maximum:
Let $(X_1, X_2, \ldots)$ be iid random variables with cdf $F(x)$.
Let $M_n = \max(X_1, X_2, \ldots, X_n)$.
Then $P[M_n \le x] = F^n(x)$.
As a consequence, $M_n \to x_F$ a.s., where
$x_F = \sup\{x : F(x) < 1\}$.
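As a quick sanity check of the identity $P[M_n \le x] = F^n(x)$, the following sketch compares a Monte Carlo estimate with the exact value. The choice of a standard exponential $F$, the sample sizes, and the function names are assumptions made only for illustration.

```python
import math
import random

def simulate_max_cdf(n, x, trials=20000, seed=0):
    """Estimate P[M_n <= x] for n iid Exp(1) variables by Monte Carlo."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        m = max(rng.expovariate(1.0) for _ in range(n))
        if m <= x:
            hits += 1
    return hits / trials

def exact_max_cdf(n, x):
    """F^n(x) for F(x) = 1 - exp(-x), the Exp(1) cdf."""
    return (1.0 - math.exp(-x)) ** n

n, x = 10, 3.0
estimate = simulate_max_cdf(n, x)
exact = exact_max_cdf(n, x)
print(f"simulated: {estimate:.4f}, F^n(x): {exact:.4f}")
```

The two numbers should agree up to Monte Carlo error, which shrinks as `trials` grows.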
jrothenb – 2
Fisher-Tippett Theorem
Suppose there exist $c_n > 0$, $d_n$ and a non-degenerate df $H$ such that
$$\frac{M_n - d_n}{c_n} \xrightarrow{D} H(x) \qquad (1)$$
Then $H$ is either Frechet, Weibull, or Gumbel.
These distributions are called Extreme Value Distributions (EVD).
Notation: $F \in \mathrm{MDA}(H)$ if (1) holds.
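A standard worked example (not taken from the slides) may make condition (1) concrete. For the standard exponential distribution, $F(x) = 1 - e^{-x}$, the normalizing constants $c_n = 1$ and $d_n = \ln n$ work, for every fixed $x$ and $n$ large enough that $x + \ln n > 0$:

```latex
\begin{align*}
P\left[\frac{M_n - d_n}{c_n} \le x\right]
  &= P[M_n \le x + \ln n] = F^n(x + \ln n) \\
  &= \left(1 - \frac{e^{-x}}{n}\right)^n
  \;\longrightarrow\; \exp\left(-e^{-x}\right) \qquad (n \to \infty).
\end{align*}
```

The limit is the Gumbel distribution, so the exponential distribution lies in the MDA of the Gumbel.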
jrothenb – 3
The Extreme Value Distributions
Frechet: $\Phi_\alpha(x) = \exp(-x^{-\alpha})$, $x > 0$, $\alpha > 0$,
is the limit of heavy-tailed distributions
with tail index $\alpha$: $1 - F(x) \sim c\,x^{-\alpha}$ as $x \to \infty$
Weibull: $\Psi_\alpha(x) = \exp(-(-x)^{\alpha})$, $x \le 0$, $\alpha > 0$,
is the limit of distributions with $x_F < \infty$
Gumbel: $\Lambda(x) = \exp(-e^{-x})$, $x \in \mathbb{R}$,
is the limit of light-tailed distributions with $x_F \le \infty$
jrothenb – 4
Generalized Extreme Value Distribution
(GEV)
The three EVD can be represented by a single three-parameter
distribution, called the GENERALIZED EVD (GEV):
$$H_{\xi,\mu,\sigma}(x) =
\begin{cases}
\exp\left(-\left(1 + \xi\,\dfrac{x-\mu}{\sigma}\right)^{-1/\xi}\right) & \text{if } \xi \neq 0 \\[2ex]
\exp\left(-\exp\left(-\dfrac{x-\mu}{\sigma}\right)\right) & \text{if } \xi = 0
\end{cases}$$
where $1 + \xi\,\dfrac{x-\mu}{\sigma} > 0$.
jrothenb – 5
The function of the parameters
$\xi \in \mathbb{R}$ is called the shape parameter:
$\xi > 0$: $H$ corresponds to the Frechet dist. with $\xi = 1/\alpha$
$\xi = 0$: $H$ corresponds to the Gumbel dist.
$\xi < 0$: $H$ corresponds to the Weibull dist. with $\xi = -1/\alpha$
$\mu \in \mathbb{R}$ is the location parameter
$\sigma > 0$ is the scaling parameter
$\mu$ and $\sigma$ are introduced for the purpose of fitting
the GEV to a dataset.
The crucial parameter is the shape parameter.
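The correspondence between the GEV and the three EVD can be checked numerically. The sketch below is a minimal implementation of the GEV cdf; the function name `gev_cdf` and the test points are assumptions for illustration. It uses the identities $\Phi_\alpha(x) = H_{1/\alpha,0,1}(\alpha(x-1))$ and $\Psi_\alpha(x) = H_{-1/\alpha,0,1}(\alpha(x+1))$, which follow by substituting into the GEV formula.

```python
import math

def gev_cdf(x, xi, mu=0.0, sigma=1.0):
    """GEV cdf H_{xi,mu,sigma}(x), including points outside the support."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-z))      # Gumbel case
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below the left endpoint (xi > 0)
        # or above the right endpoint (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

alpha = 2.0

# Frechet: Phi_alpha(x) = exp(-x^(-alpha)) for x > 0
x = 1.5
frechet = math.exp(-x ** (-alpha))
frechet_via_gev = gev_cdf(alpha * (x - 1.0), xi=1.0 / alpha)

# Weibull: Psi_alpha(x) = exp(-(-x)^alpha) for x <= 0
y = -0.5
weibull = math.exp(-(-y) ** alpha)
weibull_via_gev = gev_cdf(alpha * (y + 1.0), xi=-1.0 / alpha)

print(frechet, frechet_via_gev)
print(weibull, weibull_via_gev)
```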
jrothenb – 6
Excesses over high thresholds
$P[X - a \le x \mid X > a] \approx G_{\xi,\beta(a)}(x)$
jrothenb – 7
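Generalized Pareto Distribution (GPD)
$$G_{\xi,\beta}(x) =
\begin{cases}
1 - \left(1 + \xi\,\dfrac{x}{\beta}\right)^{-1/\xi} & \text{if } \xi \neq 0 \\[2ex]
1 - \exp(-x/\beta) & \text{if } \xi = 0
\end{cases}$$
with parameters $\beta > 0$ and $\xi \in \mathbb{R}$,
where
$x \ge 0$ if $\xi \ge 0$
$0 \le x \le -\beta/\xi$ if $\xi < 0$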
jrothenb – 8
Properties of GPD
$\xi > 0$: $G$ is heavy tailed with tail index $1/\xi$
$\xi = 0$: $G$ is light tailed
$\xi < 0$: $G$ has finite right endpoint $x_1 = -\beta/\xi$
If $X \sim$ GPD with $\xi < 1$ and $\beta$, then for $a < x_F$:
$$e(a) = E[X - a \mid X > a] = \frac{\beta + \xi a}{1 - \xi}, \qquad \beta + \xi a > 0$$
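The GPD cdf and its mean excess function translate directly into code. This is a minimal sketch; the function names and the example parameters are assumptions for illustration. Note that $e(a)$ is linear in $a$ with slope $\xi/(1-\xi)$, which is what makes the mean excess plot a useful diagnostic.

```python
import math

def gpd_cdf(x, xi, beta):
    """GPD cdf G_{xi,beta}(x); returns 0 below 0 and 1 beyond a finite right endpoint."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    if x < 0:
        return 0.0
    if xi == 0.0:
        return 1.0 - math.exp(-x / beta)
    t = 1.0 + xi * x / beta
    if t <= 0.0:
        # Only possible for xi < 0: x is at or beyond the right endpoint -beta/xi.
        return 1.0
    return 1.0 - t ** (-1.0 / xi)

def gpd_mean_excess(a, xi, beta):
    """e(a) = (beta + xi * a) / (1 - xi), valid for xi < 1 and beta + xi * a > 0."""
    if xi >= 1:
        raise ValueError("the mean excess is infinite for xi >= 1")
    if beta + xi * a <= 0:
        raise ValueError("a lies at or beyond the right endpoint")
    return (beta + xi * a) / (1.0 - xi)

e0 = gpd_mean_excess(0.0, 0.2, 1.0)   # mean of the GPD itself
e2 = gpd_mean_excess(2.0, 0.2, 1.0)   # mean excess over the level a = 2
print(e0, e2)
```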
jrothenb – 9
The Empirical Mean Excess Function
[Figure: the empirical mean excess function of a GPD with $\xi = 0.2$, $\beta = 1$]
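A plot of this kind can be reproduced along the following lines. The sketch draws a GPD sample by inverting the cdf and compares the empirical mean excess with the theoretical line $(\beta + \xi a)/(1 - \xi)$. The sample size, seed, thresholds, and function names are assumptions, and the parameter values $\xi = 0.2$, $\beta = 1$ are taken as read from the slide.

```python
import random

def empirical_mean_excess(data, threshold):
    """Average of (x - threshold) over all observations x > threshold."""
    excesses = [x - threshold for x in data if x > threshold]
    if not excesses:
        raise ValueError("no observations above the threshold")
    return sum(excesses) / len(excesses)

def sample_gpd(rng, xi, beta):
    """Draw one GPD(xi, beta) variate by inverting the cdf (requires xi != 0)."""
    u = rng.random()
    return beta / xi * ((1.0 - u) ** (-xi) - 1.0)

rng = random.Random(1)
xi, beta = 0.2, 1.0
data = [sample_gpd(rng, xi, beta) for _ in range(200000)]

results = []
for a in (0.0, 1.0, 2.0):
    empirical = empirical_mean_excess(data, a)
    theory = (beta + xi * a) / (1.0 - xi)
    results.append((a, empirical, theory))
    print(a, round(empirical, 3), round(theory, 3))
```

Evaluating `empirical_mean_excess` on a grid of thresholds and plotting the result against the threshold gives the mean excess plot.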
jrothenb – 10
Modeling Extreme Events:
• Exceedances of a high threshold occur according to a
Poisson process (iid exponentially distributed interarrival times).
• Excesses over a high threshold can be modeled by a GPD.
• An appropriate value of the high threshold can be found
by plotting the empirical mean excess function.
• The distribution of the maximum of a Poisson number of
iid excesses over a high threshold is a GEV with the same
shape parameter as the corresponding GPD.
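The last point can be verified directly. A sketch, assuming the number of excesses $N$ is Poisson with mean $\lambda$ and independent of the iid excesses $Y_i \sim G_{\xi,\beta}$ with $\xi \neq 0$ (the symbols $N$, $\lambda$, $Y_i$ are introduced here for illustration). For $x \ge 0$ in the support of $G_{\xi,\beta}$:

```latex
\begin{align*}
P\left[\max_{i \le N} Y_i \le x\right]
  &= \sum_{k=0}^{\infty} e^{-\lambda}\,\frac{\lambda^k}{k!}\; G_{\xi,\beta}(x)^k
   = \exp\left(-\lambda\,\bigl(1 - G_{\xi,\beta}(x)\bigr)\right) \\
  &= \exp\left(-\lambda\left(1 + \xi\,\frac{x}{\beta}\right)^{-1/\xi}\right)
   = \exp\left(-\left(1 + \xi\,\frac{x-\mu}{\sigma}\right)^{-1/\xi}\right),
\end{align*}
\qquad \text{with } \sigma = \beta\lambda^{\xi}, \quad \mu = \frac{\beta\,(\lambda^{\xi} - 1)}{\xi}.
```

The right-hand side is a GEV with the same shape parameter $\xi$; only the location and scale change with $\lambda$.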
jrothenb – 11
Extremal Index of a Stationary Time Series
• The extremal index $0 < \theta \le 1$ measures the dependence
of the data in the tails.
• $1/\theta$ can be interpreted as the average cluster size in the
tails: high values appear in clusters of size $1/\theta$.
• $\theta = 1$ means there is no clustering in the tails.
• If the data does not show strong long range dependence,
but has extremal index $\theta < 1$, its maximum has distribution
$H^{\theta}(x)$, where $H$ is the GEV of iid data with the same
marginal distribution.
• GPD analysis may not be appropriate for data with $\theta < 1$.
jrothenb – 12
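The slides do not say which estimator is used for the extremal index $\theta$ described above. One common choice is the runs estimator, sketched here under that assumption: exceedances of a threshold are grouped into clusters, a cluster ends once `run_length` consecutive observations stay at or below the threshold, and $\theta$ is estimated as the number of clusters divided by the number of exceedances. The function name, the toy data, and the parameter values are hypothetical.

```python
def runs_extremal_index(series, threshold, run_length):
    """Runs estimator of the extremal index: clusters / exceedances."""
    n_exceed = 0
    n_clusters = 0
    gap = run_length              # treat the start of the series as a long gap
    for value in series:
        if value > threshold:
            n_exceed += 1
            if gap >= run_length:
                n_clusters += 1   # this exceedance starts a new cluster
            gap = 0
        else:
            gap += 1
    if n_exceed == 0:
        raise ValueError("no exceedances above the threshold")
    return n_clusters / n_exceed

delays = [0, 5, 5, 0, 0, 0, 5, 0, 0, 0]
theta_hat = runs_extremal_index(delays, threshold=1, run_length=2)
print(theta_hat, 1 / theta_hat)   # estimate and implied average cluster size
```

The estimate depends on both the threshold and the run length, so in practice it is usually examined over a range of both.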
The Data: Surveyor Project
• One-way delays of probe packets during one week
• Packets sent according to a Poisson process with a rate of 2/sec
• Each packet is time-stamped to measure delay
• If delay > 10 sec, the packet is assumed lost and discarded
• Saturday and Sunday excluded from the analysis
• More details:
http://telesto.advanced.org/~kalidindi/papers/INET/inet99.html
jrothenb – 13
Time-Series Plot Colorado-Harvard
Monday 12:00am - Friday 8:00 pm
jrothenb – 14
ACF and Ex. Index Estimation
jrothenb – 15
Empirical Mean Excess Function
jrothenb – 16
Estimation of the shape parameter as a
function of the threshold used, using GPD
jrothenb – 17
Result of the GPD Fit
jrothenb – 18
Fit of a GPD-Distr. for Colorado-Harvard
• threshold = 107.774
• Quantile of threshold = 0.9993536
• Number of exceedances = 500
• Parameter estimates and Standard Errors
             xi           beta
Estimate     -0.3319409   86.50868
Std. Error   0.03683419   4.844786
jrothenb – 19
Estimations based on GPD Fit
p         quantile    sfall       empirical quantile
0.99940   114.13890   177.50201   115.638
0.99950   129.06973   188.71184   130.8681
0.99960   146.15561   201.53964   147.7083
0.99970   166.39564   216.73554   165.8802
0.99980   191.83186   235.83264   190.7157
0.99990   228.12013   263.07730   229.9743
0.99995   256.94996   284.72227   252.4705
0.99999   303.07272   319.35051   311.1122
1.00000   368.3887    368.3887    329.237
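The quantile and sfall columns can be reproduced from the fitted parameters with the standard GPD tail formulas, reading "sfall" as the expected shortfall $E[X \mid X > x_p]$ (an assumption about the column label). With threshold $u$, $F(u)$ the quantile of the threshold, and fitted $\xi$, $\beta$: $x_p = u + \frac{\beta}{\xi}\left(\left(\frac{1-p}{1-F(u)}\right)^{-\xi} - 1\right)$ and $\mathrm{sfall}_p = \frac{x_p}{1-\xi} + \frac{\beta - \xi u}{1-\xi}$. The function names below are hypothetical.

```python
def gpd_tail_quantile(p, u, f_u, xi, beta):
    """Quantile x_p for p > f_u, from a GPD fitted to excesses over u (xi != 0)."""
    ratio = (1.0 - p) / (1.0 - f_u)
    return u + beta / xi * (ratio ** (-xi) - 1.0)

def gpd_expected_shortfall(p, u, f_u, xi, beta):
    """E[X | X > x_p] under the fitted GPD tail (requires xi < 1)."""
    x_p = gpd_tail_quantile(p, u, f_u, xi, beta)
    return x_p / (1.0 - xi) + (beta - xi * u) / (1.0 - xi)

# Values reported for the Colorado-Harvard fit
u, f_u = 107.774, 0.9993536
xi, beta = -0.3319409, 86.50868

q_9995 = gpd_tail_quantile(0.9995, u, f_u, xi, beta)
q_9999 = gpd_tail_quantile(0.9999, u, f_u, xi, beta)
s_9999 = gpd_expected_shortfall(0.9999, u, f_u, xi, beta)
right_endpoint = u - beta / xi    # finite because xi < 0; the p = 1 row

print(round(q_9995, 3), round(q_9999, 3), round(s_9999, 3), round(right_endpoint, 3))
```

Because the fitted $\xi$ is negative, the model has a finite right endpoint, which is why the quantile and sfall coincide in the $p = 1$ row.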
jrothenb – 20
Quantile estimation as a function of the threshold
[Figure: 99.995% quantile estimate plotted against the threshold, with the empirical quantile for comparison]
jrothenb – 21
Fitting a GEV to block-wise maxima
[Figure: time series divided into consecutive blocks, labeled Block 1 through Block 5]
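Extracting the block-wise maxima is a short operation. The sketch below splits a series into consecutive, non-overlapping blocks and keeps the maximum of each. Dropping a trailing partial block is an assumption, as the slides do not say how it was handled, and the toy data is hypothetical.

```python
def block_maxima(series, block_size):
    """Maximum of each consecutive, non-overlapping block of `block_size` values.

    A trailing partial block is dropped (an assumption).
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    n_blocks = len(series) // block_size
    return [
        max(series[i * block_size:(i + 1) * block_size])
        for i in range(n_blocks)
    ]

delays = [3, 9, 4, 1, 7, 2, 8, 8, 5, 6, 0]
maxima = block_maxima(delays, 3)
print(maxima)
```

The resulting list of maxima is the sample to which the GEV is fitted.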
jrothenb – 22
GEV-Fit Results for different Block sizes
• Block size = 7200: 108 Blocks
             xi           sigma      mu
Estimation   -0.3375603   59.75591   197.1503
Std. Error   0.0734163    4.69351    6.4305
• Block size = 14400: 54 Blocks
             xi           sigma      mu
Estimation   -0.4346847   51.7389    235.2513
Std. Error   0.1256784    6.2025     8.0083
jrothenb – 23
High Level Estimation
Level exceeded during 1 of 50 hours
Block size   Lower      Estimate   Upper
1 h          314.8669   326.7487   355.9581
1.5 h        318.4539   327.3824   357.6406
2 h          315.7877   324.6415   352.1868
Level exceeded during 1 of 100 hours
Block size   Lower      Estimate   Upper
1 h          322.4220   336.7065   371.3975
1.5 h        325.7779   335.4343   371.0107
2 h          324.3893   332.4487   365.1899
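The point estimates can be recomputed from the GEV fits reported for block sizes 7200 and 14400. The level exceeded in one of $k$ blocks is the GEV quantile at $p = 1 - 1/k$: $x_p = \mu + \frac{\sigma}{\xi}\left((-\ln p)^{-\xi} - 1\right)$. The sketch assumes that 7200 packets correspond to one hour (2 packets/sec) and 14400 to two hours, so "1 of 50 hours" means 1 of 50 blocks for the 1 h fit and 1 of 25 blocks for the 2 h fit. The function names are hypothetical, and the confidence bounds are not reproduced here.

```python
import math

def gev_quantile(p, xi, mu, sigma):
    """Quantile of the GEV H_{xi,mu,sigma} at probability p in (0, 1)."""
    if xi == 0.0:
        return mu - sigma * math.log(-math.log(p))
    return mu + sigma / xi * ((-math.log(p)) ** (-xi) - 1.0)

def level_exceeded_once_in(k_blocks, xi, mu, sigma):
    """Level exceeded by the block maximum in one of k_blocks blocks."""
    return gev_quantile(1.0 - 1.0 / k_blocks, xi, mu, sigma)

# Fitted parameters (xi, mu, sigma) from the two GEV fits
fit_1h = (-0.3375603, 197.1503, 59.75591)   # block size 7200
fit_2h = (-0.4346847, 235.2513, 51.7389)    # block size 14400

level_50h_1h = level_exceeded_once_in(50, *fit_1h)
level_50h_2h = level_exceeded_once_in(25, *fit_2h)
level_100h_1h = level_exceeded_once_in(100, *fit_1h)

print(round(level_50h_1h, 4), round(level_50h_2h, 4), round(level_100h_1h, 4))
```

These values agree with the Estimate entries for the 1 h and 2 h block sizes to within rounding of the reported parameters.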
jrothenb – 24
Does GPD always work?
The Army-Lab. – Univ. of Virginia dataset
Time Series Plot
PACF Plot, Lags: 1-1000
ACF Plot, Lags: 1-1000
ACF Plot, Lags: 5-1000
jrothenb – 25
What goes wrong beyond the LRD:
Empirical Mean Excess Function
Shape Parameter
jrothenb – 26
Non-Stationarity: Harvard to Army-Lab.
Time Series Plot: Monday 12 am – Friday 8 pm
jrothenb – 27
Pick a few hours per day!
11am – 4pm Mon - Fri
Mean Excess Plot
Empirical Tail Distr.
Shape Parameter Estimation
jrothenb – 28
Single Outlier: Virginia - Harvard
Monday 12am – Friday 8pm
Empirical Tail Distr.
ACF, Lag 3-1000
Estimation of Extremal Index
jrothenb – 29
The effect of the outlier on GEV
• Fit without outlier:
Block size = 14400, 53 Blocks
             xi            sigma      mu
Estimation   -0.4988539    45.08995   117.2362
Std. Error   0.09091089    5.217626   6.735118
x1 = 280.101
• Fit with outlier:
Block size = 14400, 53 Blocks
             xi            sigma      mu
Estimation   -0.09130441   44.97725   109.3819
Std. Error   0.06274905    4.576706   6.716929
x1 = 1242.969
jrothenb – 30
The effect of the single outlier on GPD:
Analysis with outlier
Analysis without outlier
jrothenb – 31
Conclusions:
• The GPD is a model that can be fitted to the tails of a
distribution. The quality of the fit can be checked with
various methods. From the model, we can obtain quantile
estimates at the edge of or outside the data range.
However, a good fit is often not possible.
• The GEV provides a model for the distribution of block-wise
maxima. Its use is supported by EVT for stationary
time series without strong LRD, while the GPD is only
supported in the iid case. The quality of fit can be checked
with tools similar to those used for the GPD model. Certain
problems remain, and reliable quantile estimates are not available.
jrothenb – 32
Acknowledgements:
•Applied Research Group at Telcordia:
–E. van den Berg
–K. Krishnan
–J. Jerkins
–A. Neidhardt
–Y. Chandramouli
•Cornell University:
–Prof. G. Samorodnitsky
jrothenb – 33