Time Series Analysis by descriptive statistic

R. Werner
Solar Terrestrial Influences Institute - BAS
Def.: A time series is a sequence of data points
measured at successive times, (often)
spaced at uniform time intervals.
Time series analysis comprises methods that
attempt to understand time series, often either to
understand the underlying context of the data
points (where did they come from, what generated
them?) or to make forecasts (predictions).
From Wikipedia.org
Using methods of descriptive statistics from
quantitative cross-section analysis,
important measures are:
- arithmetic mean: $\bar{z} = \frac{1}{n} \sum_{i=1}^{n} z_i$
- variance: $s^2 = \frac{1}{n} \sum_{i=1}^{n} (z_i - \bar{z})^2$
- correlation coefficient: $r_{xy} = \frac{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2}} = \frac{Cov(x, y)}{s_x s_y}$
Do not forget visualization: scatter plots, histograms (example).
For time series these measures are meaningful only under stationarity!
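The three measures above can be sketched in a few lines of plain Python. The function names and the sample data are illustrative assumptions, not part of the slides; the 1/n normalisation follows the formulas above.

```python
import math

def mean(values):
    """Arithmetic mean: (1/n) * sum of the values."""
    return sum(values) / len(values)

def variance(values):
    """Variance with the 1/n normalisation used on the slide."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def covariance(x, y):
    """Covariance of two equally long samples (1/n normalisation)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def correlation(x, y):
    """Correlation coefficient r_xy = Cov(x, y) / (s_x * s_y)."""
    return covariance(x, y) / math.sqrt(variance(x) * variance(y))

# Hypothetical sample data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
print(mean(x), variance(x), correlation(x, y))
```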
For the time series
Auto-correlation at lag k:
$r_k = \frac{\frac{1}{n} \sum_{t=k+1}^{n} (z_t - \bar{z})(z_{t-k} - \bar{z})}{\frac{1}{n} \sum_{t=1}^{n} (z_t - \bar{z})^2} = \frac{Cov(z_t, z_{t-k})}{s(z_t)^2}$
$Cov(z_t, z_{t-k})$: auto-covariance
$-1 \le r_k \le 1$
symmetric for k
$0 \le k \le n$
used in practice: $1 \le k \le n/4$
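As a minimal sketch of the auto-correlation formula, the function below evaluates r_k for lags up to n/4, the range the slide recommends in practice. The function name and the test series (a pure period-4 oscillation) are assumptions for illustration.

```python
def autocorrelation(z, k):
    """Auto-correlation r_k; the 1/n factors of numerator and denominator cancel."""
    n = len(z)
    zbar = sum(z) / n
    # 0-based t = k..n-1 corresponds to t = k+1..n in the formula.
    numerator = sum((z[t] - zbar) * (z[t - k] - zbar) for t in range(k, n))
    denominator = sum((v - zbar) ** 2 for v in z)
    return numerator / denominator

# Hypothetical series with period 4 and mean 0.
z = [1.0, 0.0, -1.0, 0.0] * 6
max_lag = len(z) // 4          # lags used in practice: k <= n/4
acf = [autocorrelation(z, k) for k in range(max_lag + 1)]
for k, r in enumerate(acf):
    print(k, round(r, 3))
```

For this series the auto-correlation is strongly negative at lag 2 (half a period) and strongly positive at lag 4 (a full period). Its magnitude shrinks with the lag because fewer terms enter the numerator.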
Correspondence of the cross-correlation to
the quantitative cross-section analysis
Relation of two time series, co-variance:
$Cov(x, y) = \frac{1}{n} \sum_{t=1}^{n} (x_t - \bar{x})(y_t - \bar{y})$
with lag k:
$Cov(x, y) = \frac{1}{n-k} \sum_{t=1}^{n-k} (x_t - \bar{x})(y_{t+k} - \bar{y})$
or
$Cov(x, y) = \frac{1}{n-k} \sum_{t=k+1}^{n} (x_{t-k} - \bar{x})(y_t - \bar{y})$
It is not known which series is the leading series
cross-correlation:
$r_{xy}(k) = \frac{Cov(x, y)}{\sigma_x \sigma_y}$
non-symmetric for k
$-1 \le r \le 1$
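Because the leading series is not known in advance, one way to proceed is to scan the lags in both directions. The sketch below pairs x_t with y_{t+k} and normalises the lagged covariance by n - k, as in the formula above. Swapping the arguments tests the opposite direction. The function name and the two impulse series are illustrative assumptions. Note that with the 1/(n - k) normalisation a finite-sample estimate can fall slightly outside [-1, 1].

```python
import math

def cross_correlation(x, y, k):
    """r_xy(k) for k >= 0, pairing x_t with y_{t+k} (x leading y by k steps)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    cov = sum((x[t] - xbar) * (y[t + k] - ybar) for t in range(n - k)) / (n - k)
    sx = math.sqrt(sum((v - xbar) ** 2 for v in x) / n)
    sy = math.sqrt(sum((v - ybar) ** 2 for v in y) / n)
    return cov / (sx * sy)

# Hypothetical data: the impulse in y appears two steps after the one in x.
x = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
y = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]

lags = range(4)
x_leads = [cross_correlation(x, y, k) for k in lags]   # x_t with y_{t+k}
y_leads = [cross_correlation(y, x, k) for k in lags]   # y_t with x_{t+k}
best_lag = max(lags, key=lambda k: x_leads[k])
print("x leads y by", best_lag, "steps")
```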
Time series decomposition into components
Time series are often non-stationary (they have trends) and
show periodical variations
Models:
additive: $Z_t = M_t + S_t + R_t$
multiplicative: $Z_t = M_t S_t R_t$
by logarithmizing → transition to additive model
mixed: $Z_t = M_t S_t + R_t$
M: trend (also written T)
S: seasonal
R: rest, noise
Step by step:
1. Trend determination
2. Trend subtraction from the series and
determination of the seasonal
component
3. After removing the seasonal
component, the rest remains
After this: analysis of the rest for
correlation, seasonality or other
periodicities, or a remaining trend
Determination of the trend
Global trend (over the entire observation
interval):
$M_t = a + bt$
$Z_t = a + bt + u_t$, with $u_t = S_t + R_t$
or polynomial regression model of order p, splines
Square sum of errors: $Q_p = \sum_{t=1}^{n} (Z_t - T_t)^2$, where $T_t$ is the fitted trend
F-test for choosing the order p
Not to be used for prognoses (increasingly so with p):
for $t \to \infty$, $Z_t \to \pm\infty$
Other trend models: exponential model,
logistic trend functions
$Z_t = \frac{A}{1 + e^{B - Ct}}$, A>0, C>0
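A minimal sketch of the global linear trend, fitted by ordinary least squares with time index t = 1, ..., n. The function name and the sample values are assumptions for illustration; the residuals correspond to u_t = S_t + R_t and their squared sum to Q_1.

```python
def fit_linear_trend(z):
    """Least-squares estimates (a, b) of the trend M_t = a + b*t, t = 1..n."""
    n = len(z)
    t = range(1, n + 1)
    tbar = sum(t) / n
    zbar = sum(z) / n
    b = (sum((ti - tbar) * (zi - zbar) for ti, zi in zip(t, z))
         / sum((ti - tbar) ** 2 for ti in t))
    a = zbar - b * tbar
    return a, b

# Hypothetical observations scattered around a rising line.
z = [2.1, 2.9, 3.6, 4.4, 4.9, 5.6, 6.1, 7.0]
a, b = fit_linear_trend(z)

# Residuals u_t and the square sum of errors Q for the linear model.
residuals = [zi - (a + b * t) for t, zi in enumerate(z, start=1)]
q = sum(u * u for u in residuals)
print(a, b, q)
```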
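Local trend: moving average (running mean), to
remove oscillations (seasonality)
odd number of points:
$T_3 = \frac{1}{5}(z_1 + z_2 + z_3 + z_4 + z_5)$
even number of points:
$T_{2.5} = \frac{1}{4}(z_1 + z_2 + z_3 + z_4)$
$T_{3.5} = \frac{1}{4}(z_2 + z_3 + z_4 + z_5)$
$T_3 = \frac{1}{2}(T_{2.5} + T_{3.5}) = \frac{1}{8} z_1 + \frac{1}{4} z_2 + \frac{1}{4} z_3 + \frac{1}{4} z_4 + \frac{1}{8} z_5$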
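The two cases can be handled by one weighted, centred moving average: equal weights for an odd window, and half weights at both ends for an even window. The sketch below uses an assumed function name and made-up data, and it returns values only where the full window fits.

```python
def moving_average(z, weights):
    """Centred weighted moving average; len(weights) must be odd (2q + 1)."""
    q = len(weights) // 2
    return [
        sum(w * z[t - q + i] for i, w in enumerate(weights))
        for t in range(q, len(z) - q)
    ]

# Hypothetical five-point series.
z = [4.0, 8.0, 6.0, 2.0, 10.0]

# Odd number of points: plain 5-point mean.
simple = moving_average(z, [1 / 5] * 5)

# Even number of points (4), centred on z_3 with half weights at both ends.
centered = moving_average(z, [1 / 8, 1 / 4, 1 / 4, 1 / 4, 1 / 8])

# The centred value equals the mean of the two 4-point averages T_2.5 and T_3.5.
t25 = sum(z[0:4]) / 4
t35 = sum(z[1:5]) / 4
print(simple, centered, (t25 + t35) / 2)
```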
How does the variance change?
$Var(M_t) = \sum_{i=1}^{2q+1} b_i^2 \, Var(Z_t)$
where
2q+1 is the number of sampling points
b_i are the weights
(valid for uncorrelated $Z_t$ with equal variance)
Besides, for removing the seasonal means of monthly data, we have to calculate
the running mean over 13 months, with b_i = 1/24 for the first and
the last month, otherwise b_i = 1/12!
For the given examples:
$T_3 = \frac{1}{5}(z_1 + z_2 + z_3 + z_4 + z_5)$:
$\frac{Var(M_t)}{Var(Z_t)} = \sum_{i=1}^{2q+1} b_i^2 = 5 \left(\frac{1}{5}\right)^2 = \frac{1}{5}$
$T_3 = \frac{1}{8} z_1 + \frac{1}{4} z_2 + \frac{1}{4} z_3 + \frac{1}{4} z_4 + \frac{1}{8} z_5$:
$\sum_{i=1}^{2q+1} b_i^2 = 2 \left(\frac{1}{8}\right)^2 + 3 \left(\frac{1}{4}\right)^2 = \frac{1}{32} + \frac{6}{32} = \frac{7}{32}$
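The variance reduction factor is simply the sum of the squared weights, so it can be checked with exact fractions. This is a sketch under the assumption of uncorrelated values with equal variance; the function and variable names are illustrative.

```python
from fractions import Fraction

def variance_factor(weights):
    """Var(M_t) / Var(Z_t) = sum of b_i^2 for uncorrelated Z_t."""
    return sum(w * w for w in weights)

# 5-point running mean with equal weights.
simple5 = [Fraction(1, 5)] * 5

# Centred 4-point running mean (half weights at both ends).
centered4 = [Fraction(1, 8), Fraction(1, 4), Fraction(1, 4),
             Fraction(1, 4), Fraction(1, 8)]

# 13-month running mean for monthly data: 1/24 at both ends, 1/12 otherwise.
monthly13 = [Fraction(1, 24)] + [Fraction(1, 12)] * 11 + [Fraction(1, 24)]

for name, w in [("simple5", simple5), ("centered4", centered4),
                ("monthly13", monthly13)]:
    print(name, sum(w), variance_factor(w))
```

Each weight set sums to 1, so a constant level passes through unchanged while the variance of the noise is reduced.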
Trend removing by calculation of differences
Linear trend: $Z_t = a + bt$
$\Delta Z_t = Z_t - Z_{t-1}$
Polynomial trend: $Z_t = a_0 + \sum_{j=1}^{p} a_j t^j$
$\Delta^p Z_t = \Delta^{p-1} Z_t - \Delta^{p-1} Z_{t-1}$
recursive formulae
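The recursive definition translates directly into a loop: each pass takes first differences of the previous result. In this sketch (function name and coefficients are assumptions), one difference reduces a linear trend to the constant b, and two differences reduce a quadratic trend to the constant 2*a_2.

```python
def difference(z, order=1):
    """Apply the difference operator `order` times: Delta^p Z_t."""
    for _ in range(order):
        z = [z[t] - z[t - 1] for t in range(1, len(z))]
    return z

# Hypothetical linear trend Z_t = 3 + 2t.
linear = [3 + 2 * t for t in range(1, 8)]
print(difference(linear))            # constant slope b

# Hypothetical quadratic trend Z_t = 1 + t + 3t^2.
quadratic = [1 + t + 3 * t * t for t in range(1, 8)]
print(difference(quadratic, order=2))  # constant 2 * a_2
```

Each differencing step shortens the series by one value.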
Problems related to the trend
determination
1. For short time series, the determined trend will not
be equal to the long-term trend, and will not be
distinguishable from longer periodicities
2. Smoothing shifts the reversal points of the time
series
3. Smoothing with running averages produces
autocorrelations (quasi-periodicities, Slutzky
effect)
[Figures: FFT of the basic period without trend; FFT of the basic period with trend]
Determination of the seasonal component
A very simple method for
constant seasonal variations
Assumption: no trend! Use the detrended series $Y_t = Z_t - M_t$
also: $S_t = S_{t+p}$
Phase average:
$S_i = \frac{1}{K+1} \sum_{k=0}^{K} Y_{i+12k}$
the perfect case: $\sum_i S_i = 0$
in practice: $\bar{S} = \frac{1}{12} \sum_{i=1}^{12} S_i \ne 0$
Standardized phase average:
$S_i^{(s)} = S_i - \bar{S}$
i = 1,2,...,12; k = 0,1,...,K
i is the month
k is the year index (K+1 years in total)
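A minimal sketch of the phase-average method for monthly data: average all detrended values that belong to the same month, then subtract the mean of the twelve averages so that the standardized seasonal component sums to zero. The function names, the seasonal pattern, and the constant offset (standing in for an imperfectly removed trend) are assumptions for illustration.

```python
def phase_averages(y, period=12):
    """S_i: mean of the detrended values Y_t over all years, for each phase i."""
    return [sum(y[i::period]) / len(y[i::period]) for i in range(period)]

def standardized_phase_averages(y, period=12):
    """S_i^(s) = S_i - S_bar, so that the seasonal component sums to zero."""
    s = phase_averages(y, period)
    s_bar = sum(s) / period
    return [si - s_bar for si in s]

# Hypothetical detrended series: 3 years of one monthly pattern plus an
# offset of 0.5 that the trend removal is assumed to have left behind.
pattern = [3, 1, -2, -4, -1, 0, 2, 5, 1, -3, -1, 2]
y = [p + 0.5 for p in pattern] * 3

seasonal = standardized_phase_averages(y)
print([round(s, 2) for s in seasonal])
print(round(sum(seasonal), 12))
```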
Or dummy regression with:
$D_{it} = 1$ if t falls in month number i, 0 else
$Y_t = \sum_{i=1}^{12} \beta_i D_{it} + e_t$
12 equations!
or together with a polynomial trend
$Y_t = \alpha_0 + \sum_{j=1}^{p} \alpha_j t^j + \sum_{i=1}^{12} \beta_i D_{it} + e_t$
For a multiplicative model:
$Y_t = \frac{Z_t}{M_t}$; $\prod_i S_i = 1$; $S_i^{(s)} = \frac{S_i}{\bar{S}}$
Periodogram analysis
$Z_t = c + a \cos(\omega t) + b \sin(\omega t) + e_t$
$\omega = \frac{2\pi}{T_p} = 2\pi f$
Strategies: - Step by step determination of the period $T_p$
- Test of a theoretical hypothesis
Fourier frequencies $f_j = \frac{j}{n}$, $j = 1, 2, ..., \frac{n-1}{2}$
The entire time interval is used as the basic period $T_1$
Harmonic analysis - non-equidistant time intervals
- choice of the basic period
(n odd)
Harmonic series
$z_t = c + \sum_{j=1}^{m} \left[ a_j \cos\left(2\pi \frac{j}{n} t\right) + b_j \sin\left(2\pi \frac{j}{n} t\right) \right] + e_t$
o If j/n are Fourier frequencies, the regressor
functions are orthogonal. All coefficients can be
calculated together and they are not changed by the
choice of a new m
o If j/n are not Fourier frequencies, then we have to
calculate all coefficients again when changing m
o If the number of data points is equal to the number of
calculated coefficients, then no degrees of freedom remain and the
calculated series is not an estimate. The error
term is zero! → filter
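For Fourier frequencies the orthogonality means each pair (a_j, b_j) can be computed on its own, without refitting when m changes: the least-squares estimates reduce to a_j = (2/n) Σ z_t cos(2πjt/n) and b_j = (2/n) Σ z_t sin(2πjt/n) for 1 ≤ j < n/2. The sketch below applies this to a synthetic series. The function name and the chosen amplitudes are assumptions for illustration.

```python
import math

def harmonic_coefficients(z, j):
    """Least-squares (a_j, b_j) at the Fourier frequency j/n, 1 <= j < n/2."""
    n = len(z)
    a = 2 / n * sum(z[t - 1] * math.cos(2 * math.pi * j * t / n)
                    for t in range(1, n + 1))
    b = 2 / n * sum(z[t - 1] * math.sin(2 * math.pi * j * t / n)
                    for t in range(1, n + 1))
    return a, b

# Hypothetical series with n = 25 (odd): harmonics j = 2 and j = 5, no noise.
n = 25
z = [10
     + 3 * math.cos(2 * math.pi * 2 * t / n)
     + 4 * math.sin(2 * math.pi * 2 * t / n)
     + 1.5 * math.cos(2 * math.pi * 5 * t / n)
     for t in range(1, n + 1)]

for j in (2, 3, 5):
    a_j, b_j = harmonic_coefficients(z, j)
    print(j, round(a_j, 6), round(b_j, 6))
```

The coefficients for j = 2 are recovered regardless of whether the j = 5 term is part of the model, which is the practical meaning of the orthogonality noted above.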
It can be proven that
$r_j^2 = \frac{n}{2} A_j^2 \Big/ \sum_{t=1}^{n} (z_t - \bar{z})^2$, with $A_j = \sqrt{a_j^2 + b_j^2}$
$r_j^2$ is the determination coefficient, the explained part of the
sum of the squared deviations;
besides, $\frac{n}{2} A_j^2$ is the explained sum of the squared deviations
Periodogram
Plot of the intensities $I_j = \frac{n}{2}(a_j^2 + b_j^2)$ against the periods $T_j$
Spectrogram
Plot of the intensities $I_j = \frac{n}{2}(a_j^2 + b_j^2)$ against the frequencies $f_j$
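A sketch of how the intensities could be tabulated for all Fourier frequencies of a series with odd n. Each row holds the harmonic number j, the period T_j = n/j, the frequency f_j = j/n, and the intensity I_j. Dividing I_j by the total sum of squared deviations gives the determination coefficient of harmonic j. The function name and the synthetic series are assumptions for illustration.

```python
import math

def periodogram(z):
    """Rows (j, T_j, f_j, I_j) for the Fourier frequencies j = 1..(n-1)/2."""
    n = len(z)
    rows = []
    for j in range(1, (n - 1) // 2 + 1):
        a = 2 / n * sum(z[t - 1] * math.cos(2 * math.pi * j * t / n)
                        for t in range(1, n + 1))
        b = 2 / n * sum(z[t - 1] * math.sin(2 * math.pi * j * t / n)
                        for t in range(1, n + 1))
        rows.append((j, n / j, j / n, n / 2 * (a * a + b * b)))
    return rows

# Hypothetical series, n = 25: a strong harmonic j = 2 and a weak one at j = 5.
n = 25
z = [10
     + 3 * math.cos(2 * math.pi * 2 * t / n)
     + 4 * math.sin(2 * math.pi * 2 * t / n)
     + 1.5 * math.cos(2 * math.pi * 5 * t / n)
     for t in range(1, n + 1)]

rows = periodogram(z)
zbar = sum(z) / n
total_ss = sum((v - zbar) ** 2 for v in z)
peak = max(rows, key=lambda row: row[3])
r2_peak = peak[3] / total_ss   # share of the squared deviations explained
print(peak[0], peak[1], round(r2_peak, 4))
```

For odd n the intensities over all Fourier frequencies add up to the total sum of squared deviations, so the shares I_j / total_ss sum to one.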
Other methods are:
- Lomb-Scargle periodogram
- Wavelet
How to determine which is the better
model approximation, additive or
multiplicative?
Analysis of the variance:
spread versus level plot (SLP diagram)
- splitting the time series into intervals,
- determination of the standard deviations
in the intervals
- plotting the standard deviations against the
means
If the SLP shows:
a line parallel to the x-axis → additive model
a linear (rising) line → multiplicative model
no decision → mixture model
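The points of a spread versus level plot can be computed as sketched below: one (mean, standard deviation) pair per interval. The function name, the interval length, and the two synthetic series are assumptions for illustration. In the additive series the spread stays constant, while in the multiplicative series it grows in proportion to the level.

```python
import math

def spread_level(z, interval_length):
    """(mean, standard deviation) for consecutive non-overlapping intervals."""
    points = []
    for start in range(0, len(z) - interval_length + 1, interval_length):
        chunk = z[start:start + interval_length]
        m = sum(chunk) / len(chunk)
        s = math.sqrt(sum((v - m) ** 2 for v in chunk) / len(chunk))
        points.append((m, s))
    return points

# Hypothetical seasonal pattern of period 4 on three different levels.
pattern = [-1.0, 0.0, 1.0, 0.0]
levels = [10.0, 20.0, 30.0]

additive = [level + p for level in levels for p in pattern]
multiplicative = [level * (1 + 0.1 * p) for level in levels for p in pattern]

slp_additive = spread_level(additive, 4)
slp_multiplicative = spread_level(multiplicative, 4)
print(slp_additive)
print(slp_multiplicative)
```

Plotting the second element of each pair against the first gives the SLP diagram.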
Box/Cox Transformation
$z_t^* = \frac{z_t^{\lambda} - 1}{\lambda}$ for λ ≠ 0
$z_t^* = \ln z_t$ for λ = 0
or in a simpler form
$z_t^* = z_t^{\lambda}$ for λ ≠ 0
$z_t^* = \ln z_t$ for λ = 0
λ = 0: multiplicative model
λ = 1: additive model
Determination of λ:
- plot of the standard deviations against the
logarithms of the interval means
Use simple coefficients λ: 1/4, 1/3, 1/2, ...
- combination with SLP
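A minimal sketch of the first (full) form of the transformation, assuming strictly positive data. The function name and the sample values are illustrative. The λ = 0 branch is the limit of the general formula as λ approaches 0, which the last line checks numerically.

```python
import math

def box_cox(z, lam):
    """Box/Cox transform of strictly positive values for a given lambda."""
    if lam == 0:
        return [math.log(v) for v in z]
    return [(v ** lam - 1) / lam for v in z]

# Hypothetical positive observations.
z = [1.0, 2.0, 4.0]

print(box_cox(z, 1))      # lambda = 1: a shift only, the additive model is kept
print(box_cox(z, 0.5))    # a simple intermediate coefficient
print(box_cox(z, 0))      # lambda = 0: logarithm, the multiplicative case
print(box_cox(z, 1e-6))   # close to the logarithm
```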
Acknowledgement
I thank the Ministry of Education and Science
for supporting this work under contract
DVU01/0120