Introduction to Time Series Analysis


Gloria González-Rivera

University of California, Riverside

and Jesús Gonzalo

U. Carlos III de Madrid

Spring 2002

Copyright © MTS-2002GG: You are free to use and modify these slides for educational purposes, but if you improve this material, please send us your new version.

Brief Review of Probability

• Sample space: Ω = {ω}, the set of all possible outcomes.

• Outcome: ω ∈ Ω.

• Event: E ⊂ Ω.

• Field: F = {E : E ⊂ Ω}, the collection of events.

• Random variable: Z : Ω → S, a map from the sample space Ω to a state space S.

• State space: S, a space containing the possible values of a random variable. Common choices are the integers N, the reals R, k-vectors R^k, the complex numbers C, the positive reals R+, etc.

• Probability: P : F → [0, 1].

• Distribution: a map B → [0, 1], where B = {A : A ⊂ R} (intervals, etc.).

Brief Review (cont)

Random vectors: Z = (Z_1, Z_2, ..., Z_n) is an n-dimensional random vector if its components Z_1, ..., Z_n are one-dimensional real-valued random variables. If we interpret t = 1, ..., n as equidistant instants of time, Z_t can stand for the outcome of an experiment at time t. Such a time series may, for example, consist of Toyota share prices Z_t at n succeeding days.

The new aspect, compared to a one-dimensional random variable, is that now we can talk about the dependence structure of the random vector.

Distribution function F_Z of Z: it is the collection of the probabilities

F_Z(z) = P(Z_1 ≤ z_1, ..., Z_n ≤ z_n) = P({ω : Z_1(ω) ≤ z_1, ..., Z_n(ω) ≤ z_n})

Stochastic Processes

We suppose that the exchange rate €/$ at every fixed instant t between 5 p.m. and 6 p.m. this afternoon is random. Therefore we can interpret it as a realization Z_t(ω) of the random variable Z_t, and so we observe Z_t(ω), 5 < t < 6. In order to make a guess at 6 p.m. about the exchange rate Z_19(ω) at 7 p.m., it is reasonable to look at the whole evolution of Z_t(ω) between 5 and 6 p.m. A mathematical model describing this evolution is called a stochastic process.

Stochastic Processes (cont)

A stochastic process is a collection of time-indexed random variables defined on some probability space Ω.

(Z_t, t ∈ T) = (Z_t(ω), t ∈ T, ω ∈ Ω)

Suppose that:

(1) For a fixed t: Z_t(·) : Ω → R. This is just a random variable.

(2) For a fixed ω: Z_·(ω) : T → R. This is a realization or sample function.

Changing the time index, we can generate several random variables:

Z_{t_1}(ω), Z_{t_2}(ω), ..., Z_{t_n}(ω)

from which a realization is:

z_{t_1}, z_{t_2}, ..., z_{t_n}

This collection of random variables is called a STOCHASTIC PROCESS. A realization of the stochastic process is called a TIME SERIES.

Examples of stochastic processes E1:

Let the index set be T = {1, 2, 3} and let the space of outcomes Ω be the possible outcomes associated with tossing one die: Ω = {1, 2, 3, 4, 5, 6}. Define Z(t, ω) = t + [value on die]² · t. Therefore for a particular ω, say ω_3 = {3}, the realization or path would be (10, 20, 30).

Q1:

Draw all the different realizations (six) of this stochastic process.

Q2:

Think of an economically relevant variable as a stochastic process and write down an example similar to E1 with it. Specify very clearly the sample space and the “rule” that generates the stochastic process.
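A minimal Python sketch (not part of the original slides) that enumerates the six realizations of the process in E1; the only assumption is the rule Z(t, ω) = t + ω²·t stated above.

```python
import numpy as np

# E1: Z(t, w) = t + (value on die)^2 * t, for t in T = {1, 2, 3}.
# Each die outcome w in {1, ..., 6} fixes one realization (path) of the process.
T = np.array([1, 2, 3])

for w in range(1, 7):
    path = T + (w ** 2) * T          # Z(t, w) evaluated at t = 1, 2, 3
    print(f"omega = {w}: path = {path.tolist()}")

# omega = 3 gives [10, 20, 30], matching the path quoted in E1.
```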

E2:

A Brownian motion B = (B_t, t ∈ [0, ∞)):

• It starts at zero: B_0 = 0.

• It has stationary, independent increments.

• For every t > 0, B_t has a normal N(0, t) distribution.

• It has continuous sample paths: “no jumps”.
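A short simulation sketch (an illustration, not from the slides) showing how the defining properties of E2 translate into code when [0, 1] is discretised into n steps: independent N(0, Δt) increments accumulated from B_0 = 0.

```python
import numpy as np

rng = np.random.default_rng(0)

n_steps, T_end = 1000, 1.0
dt = T_end / n_steps
t = np.linspace(0.0, T_end, n_steps + 1)

# Independent, stationary increments: each increment is N(0, dt).
dB = rng.normal(0.0, np.sqrt(dt), size=n_steps)

# B_0 = 0; the path is the cumulative sum of the increments.
B = np.concatenate(([0.0], np.cumsum(dB)))

print(t[-1], B[-1])   # B_1 is (approximately) one draw from N(0, 1)
```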

Distribution of a Stochastic Process

In analogy to random variables and random vectors, we want to introduce non-random characteristics of a stochastic process, such as its distribution, expectation, etc., and describe its dependence structure. This is a task much more complicated than the description of a random vector. Indeed, a non-trivial stochastic process Z = (Z_t, t ∈ T) with infinite index set T is an infinite-dimensional object; it can be understood as the infinite collection of the random variables Z_t, t ∈ T. Since the values of Z are functions on T, the distribution of Z should be defined on subsets of a certain “function space”, i.e. P(Z ∈ A), A ∈ F, where F is a collection of suitable subsets of this space of functions. This approach is possible, but requires advanced mathematics, and so we will try something simpler.

The finite-dimensional distributions (fidis) of the stochastic process Z are the distributions of the finite-dimensional vectors (Z_{t_1}, ..., Z_{t_n}), for all possible choices of times t_1, ..., t_n ∈ T and every n ≥ 1.

Stationarity

Consider the joint probability distribution of the collection of random variables

F(z_{t_1}, z_{t_2}, ..., z_{t_n}) = P(Z_{t_1} ≤ z_{t_1}, Z_{t_2} ≤ z_{t_2}, ..., Z_{t_n} ≤ z_{t_n})

1st-order stationary process if

F(z_{t_1}) = F(z_{t_1 + k})   for any t_1, k

2nd-order stationary process if

F(z_{t_1}, z_{t_2}) = F(z_{t_1 + k}, z_{t_2 + k})   for any t_1, t_2, k

n-order stationary process if

F(z_{t_1}, ..., z_{t_n}) = F(z_{t_1 + k}, ..., z_{t_n + k})   for any t_1, ..., t_n, k

Definition.

A process is strongly (strictly) stationary if it is an n-order stationary process for any n.

Moments

E(Z_t) = μ_t = ∫ z_t f(z_t) dz_t

Var(Z_t) = σ_t² = E[(Z_t - μ_t)²] = ∫ (z_t - μ_t)² f(z_t) dz_t

Cov(Z_{t_1}, Z_{t_2}) = γ(t_1, t_2) = E[(Z_{t_1} - μ_{t_1})(Z_{t_2} - μ_{t_2})]

Moments (cont)

For a strictly stationary process:

• E(Z_t) = μ_t = μ, because F(z_{t_1}) = F(z_{t_1 + k}) implies μ_{t_1} = μ_{t_1 + k}, provided that E|Z_t| < ∞.

• Var(Z_t) = σ_t² = σ², provided that E(Z_t²) < ∞.

• Since F(z_{t_1}, z_{t_2}) = F(z_{t_1 + k}, z_{t_2 + k}), we have cov(Z_{t_1}, Z_{t_2}) = cov(Z_{t_1 + k}, Z_{t_2 + k}), i.e. γ(t_1, t_2) = γ(t_1 + k, t_2 + k) for any k. Letting t_1 = t - k and t_2 = t, then γ(t - k, t) = γ(t, t + k) = γ_k.

The correlation between any two random variables depends only on the time difference.

Weak Stationarity

A process is said to be n-order weakly stationary if all its joint moments up to order n exist and are time invariant.

Covariance stationary process (2nd-order weakly stationary):

• constant mean

• constant variance

• covariance function depends only on the time difference between the random variables

Autocovariance and Autocorrelation Functions

For a covariance stationary process:

E(Z_t) = μ,   Var(Z_t) = σ²,   Cov(Z_t, Z_s) = γ(|t - s|)

ρ_k = cov(Z_t, Z_{t+k}) / sqrt(var(Z_t) var(Z_{t+k})) = γ_k / γ_0

γ : k → γ_k   autocovariance function

ρ : k → ρ_k ∈ [-1, 1]   autocorrelation function (ACF)

Properties of the autocorrelation function

1. γ_0 = var(Z_t); therefore ρ_0 = 1.

2. |γ_k| ≤ γ_0 and |ρ_k| ≤ 1, since ρ_k is a correlation coefficient.

3. γ_k = γ_{-k} and ρ_k = ρ_{-k}, since

γ_k = E[(Z_t - μ)(Z_{t+k} - μ)] = E[(Z_{t+k} - μ)(Z_{(t+k)-k} - μ)] = γ_{-k}

Partial Autocorrelation Function (conditional correlation)

This function gives the correlation between two random variables that are k periods apart once the in-between linear dependence (between t and t+k) has been removed.

Let Z_t and Z_{t+k} be two random variables; the PACF is given by

corr(Z_t, Z_{t+k} | Z_{t+1}, ..., Z_{t+k-1})

Motivation

Think about a regression model (without loss of generality, assume that E(Z) = 0):

Z_{t+k} = φ_{k1} Z_{t+k-1} + φ_{k2} Z_{t+k-2} + ... + φ_{kk} Z_t + e_{t+k}    (1)

where e_{t+k} is uncorrelated with Z_{t+k-j}, j ≥ 1.

Multiply (1) by Z_{t+k-j}:

Z_{t+k} Z_{t+k-j} = φ_{k1} Z_{t+k-1} Z_{t+k-j} + φ_{k2} Z_{t+k-2} Z_{t+k-j} + ... + φ_{kk} Z_t Z_{t+k-j} + e_{t+k} Z_{t+k-j}    (2)

and take expectations:

γ_j = φ_{k1} γ_{j-1} + φ_{k2} γ_{j-2} + ... + φ_{kk} γ_{j-k}

Dividing by the variance of the process:

ρ_j = φ_{k1} ρ_{j-1} + φ_{k2} ρ_{j-2} + ... + φ_{kk} ρ_{j-k},   j = 1, 2, ..., k

Writing these equations for j = 1, ..., k gives the Yule-Walker equations:

ρ_1 = φ_{k1} ρ_0 + φ_{k2} ρ_1 + ... + φ_{kk} ρ_{k-1}
ρ_2 = φ_{k1} ρ_1 + φ_{k2} ρ_0 + ... + φ_{kk} ρ_{k-2}
.......
ρ_k = φ_{k1} ρ_{k-1} + φ_{k2} ρ_{k-2} + ... + φ_{kk} ρ_0

k = 1:   ρ_1 = φ_{11} ρ_0   ⟹   φ_{11} = ρ_1

k = 2:   ρ_1 = φ_{21} ρ_0 + φ_{22} ρ_1
         ρ_2 = φ_{21} ρ_1 + φ_{22} ρ_0   ⟹   φ_{22} = (ρ_2 - ρ_1²) / (1 - ρ_1²)

k = 3:   ρ_1 = φ_{31} ρ_0 + φ_{32} ρ_1 + φ_{33} ρ_2
         ρ_2 = φ_{31} ρ_1 + φ_{32} ρ_0 + φ_{33} ρ_1
         ρ_3 = φ_{31} ρ_2 + φ_{32} ρ_1 + φ_{33} ρ_0   ⟹   φ_{33} obtained by Cramer's rule as a ratio of determinants of the corresponding correlation matrices.

The partial autocorrelation at lag k is the last coefficient φ_{kk} in the regression of Z_{t+k} on its k most recent lags.

Examples of stochastic processes

E4:

Z_t = Y_t if t is even, and Z_t = Y_t + 1 if t is odd, where Y_t is a stationary time series. Is Z_t weakly stationary?

E5:

Define the process S_t = X_1 + ... + X_t, where the X_i are iid (0, σ²). Show that for h > 0, Cov(S_{t+h}, S_t) = t σ², and therefore S_t is not weakly stationary.

Examples of stochastic processes (cont)

E6: White Noise Process

A sequence of uncorrelated random variables is called a white noise process.

{a_t}:

E(a_t) = μ_a (normally μ_a = 0)

Var(a_t) = σ_a²

Cov(a_t, a_{t+k}) = 0 for k ≠ 0

Autocovariance and autocorrelation functions:

γ_k = σ_a² if k = 0,   γ_k = 0 if k ≠ 0

ρ_k = 1 if k = 0,   ρ_k = 0 if k ≠ 0

[Figure: ACF of a white noise process: a single spike ρ_0 = 1 at lag 0 and ρ_k = 0 at lags k = 1, 2, 3, 4, ...]
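A brief sketch (assuming Gaussian white noise; illustration only) that simulates a white noise sequence and checks that its sample ACF is negligible at all non-zero lags, as the figure above indicates.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, size=5000)          # Gaussian white noise, sigma_a = 1

dev = a - a.mean()
gamma0 = np.mean(dev * dev)
for k in range(1, 5):
    rho_k = np.mean(dev[:-k] * dev[k:]) / gamma0
    print(k, round(rho_k, 3))                # all close to 0 for k != 0
```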

Dependence: Ergodicity

• See Reading 1, from Leo Breiman (1969), “Probability and Stochastic Processes: With a View Toward Applications”.

• We want to allow as much dependence as the Law of Large Numbers (LLN) lets us. • Stationarity is not enough, as the following example shows:

E7:

Let {U_t} be a sequence of iid random variables uniformly distributed on [0, 1] and let Z be N(0, 1), independent of {U_t}.

Define Y_t = Z + U_t. Then Y_t is stationary (why?), but

Ȳ_n = (1/n) Σ_{t=1}^{n} Y_t → Z + 1/2 ≠ E(Y_t) = 1/2

The problem is that there is too much dependence in the sequence {Y_t}. In fact, the correlation between Y_1 and Y_t is always positive for any value of t.
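A short simulation sketch (illustrating E7 only) showing that the time average of Y_t = Z + U_t settles near Z + 1/2, which differs across realizations, rather than near E(Y_t) = 1/2.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

for rep in range(3):                       # three independent realizations of the process
    Z = rng.normal()                       # one draw of Z, shared by the whole path
    U = rng.uniform(0.0, 1.0, size=n)
    Y = Z + U
    print(f"Z = {Z:+.3f}   time average = {Y.mean():+.3f}   Z + 0.5 = {Z + 0.5:+.3f}")
# The time average tracks Z + 0.5 (random), not E(Y_t) = 0.5.
```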

Ergodicity for the mean

Objective: estimate the mean μ = E(Z_t) of the process {Z_t}.

We need to distinguish between:

1. Ensemble average: z̄ = (1/m) Σ_{i=1}^{m} Z_t^{(i)} (average over m realizations at a fixed t)

2. Time average: z̄ = (1/n) Σ_{t=1}^{n} Z_t (average over time within one realization)

Which estimator is the most appropriate? The ensemble average. Problem: it is impossible to calculate, because in practice we only observe one realization. Under which circumstances can we use the time average? Is the time average an unbiased and consistent estimator of the mean?

Ergodicity for the mean (cont)

Reminder. Sufficient conditions for consistency of an estimator.

lim_{T→∞} E(θ̂_T) = θ   and   lim_{T→∞} var(θ̂_T) = 0

1. The time average is asymptotically unbiased:

E(z̄) = (1/n) Σ_t E(Z_t) = (1/n) Σ_t μ = μ

2. Is the time average consistent for the mean?

var(z̄) = (1/n²) Σ_{t=1}^{n} Σ_{s=1}^{n} cov(Z_t, Z_s) = (γ_0/n²) Σ_{t=1}^{n} Σ_{s=1}^{n} ρ_{t-s}

       = (γ_0/n²) Σ_{t=1}^{n} (ρ_{t-1} + ρ_{t-2} + ... + ρ_{t-n})

       = (γ_0/n²) [(ρ_0 + ρ_1 + ... + ρ_{n-1}) + (ρ_1 + ρ_0 + ... + ρ_{n-2}) + ... + (ρ_{n-1} + ρ_{n-2} + ... + ρ_0)]

Ergodicity for the mean (cont)

var(z̄) = (γ_0/n²) Σ_{k=-(n-1)}^{n-1} (n - |k|) ρ_k = (γ_0/n) Σ_{k=-(n-1)}^{n-1} (1 - |k|/n) ρ_k

lim_{n→∞} var(z̄) = lim_{n→∞} (γ_0/n) Σ_{k=-(n-1)}^{n-1} (1 - |k|/n) ρ_k

A covariance-stationary process is ergodic for the mean if

plim z̄ = E(Z_t) = μ

A sufficient condition for ergodicity for the mean is that the autocovariances (equivalently, the autocorrelations) are absolutely summable:

Σ_{k=0}^{∞} |γ_k| < ∞   or   Σ_{k=0}^{∞} |ρ_k| < ∞

which in particular requires γ_k → 0 as k → ∞.

Ergodicity under Gaussianity

If {Z_t} is a stationary Gaussian process, the condition Σ_{k=0}^{∞} |γ_k| < ∞ is sufficient to ensure ergodicity for all moments.

Where are We?

The Prediction Problem as a Motivating Problem:

Predict Z_{t+1} given some information set I_t at time t:

min_{Ẑ_{t+1}} E[Z_{t+1} - Ẑ_{t+1}]²

Solution: Ẑ_{t+1} = E[Z_{t+1} | I_t]

The conditional expectation can be modeled in a parametric way or in a non-parametric way. We will choose the former in this course. Parametric models can be linear or non-linear. We will choose the former here too. Summarizing, the models we are going to study and use in this course will be

Parametric and linear models

Some Problems P1:

Let {Z_t} be a sequence of uncorrelated real-valued variables with zero means and unit variances, and define the “moving average”

Y_t = Σ_{i=0}^{r} a_i Z_{t-i}

for constants a_0, a_1, ..., a_r. Show that Y is weakly stationary and find its autocovariance function.

P2:

Show that a Gaussian process is strongly stationary if and only if it is weakly stationary

P3

: Let X be a stationary Gaussian process with zero mean, unit variance, and autocovariance function c. Find the autocovariance functions of the processes X² = {X(t)² : -∞ < t < ∞} and X³ = {X(t)³ : -∞ < t < ∞}.

Appendix: Transformations

Goal:

To lead to a more manageable process.

• The log transformation reduces certain types of heteroskedasticity. If we assume μ_t = E(Z_t) and Var(Z_t) = k μ_t², the delta method shows that the variance of the log is roughly constant:

Var(f(Z)) ≈ f'(μ)² Var(Z)   ⟹   Var(log(Z_t)) ≈ (1/μ_t)² Var(Z_t) = k

• Differencing eliminates the trend (but is not very informative about the nature of the trend).

• Differencing + Log = relative change:

log(Z_t) - log(Z_{t-1}) = log(Z_t / Z_{t-1}) = log(1 + (Z_t - Z_{t-1}) / Z_{t-1}) ≈ (Z_t - Z_{t-1}) / Z_{t-1}
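A short sketch (an illustration with simulated data, not from the slides) of the log-plus-difference transformation: for a positive series, the differenced log is approximately the period-to-period relative change.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated positive series with a trend and variance proportional to its level.
n = 200
trend = np.exp(0.01 * np.arange(n))
Z = trend * (1.0 + 0.05 * rng.normal(size=n))

log_diff = np.diff(np.log(Z))                 # log(Z_t) - log(Z_{t-1})
rel_change = np.diff(Z) / Z[:-1]              # (Z_t - Z_{t-1}) / Z_{t-1}

print(np.max(np.abs(log_diff - rel_change)))  # small: the two are approximately equal
```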