Time Series
Math 419/592
Winter 2009
Prof. Andrew Ross
Eastern Michigan University
Overview of Stochastic Models

Time \ Space   | No Space | Discrete      | Continuous
No Time        |          | Ch 1, 2, 3    | Ch 1, 2, 3
Discrete       |          | Ch 4 DTMC     | YOU ARE HERE
Continuous     | Ch 5, 7  | Ch 5, 6, 7, 8 | Ch 10

Or, if you prefer, transpose it:

Space \ Time   | No Time    | Discrete     | Continuous
No Space       |            |              | Ch 5, 7
Discrete       | Ch 1, 2, 3 | Ch 4 DTMC    | Ch 5, 6, 7, 8
Continuous     | Ch 1, 2, 3 | YOU ARE HERE | Ch 10
But first, a word from our sponsor
Take Math 560
(Optimization)
this fall!
Sign up soon or it will disappear
Outline

Look at the data!

Common Models

Multivariate Data

Cycles/Seasonality

Filters
Look at the data!
(or else!)

[Figure: Atmospheric CO2; years 1958 to now, vertical scale 300 to 400ish ppm]

[Figure: Ancient sunspot data]
Our Basic Procedure
1. Look at the data
2. Quantify any pattern you see
3. Remove the pattern
4. Look at the residuals
5. Repeat at step 2 until no patterns left
Our basic procedure, version 2.0

Look at the data

Suck the life out of it

Spend hours poring over the noise

What should noise look like?
One of these things is not like the others
[Figure: four time series panels, each plotted over t = 0 to 70, with differing vertical scales]
Stationarity

The upper-right-corner plot is stationary.

Mean doesn't change in time
  no Trend
  no Seasons (known frequency)
  no Cycles (unknown frequency)

Variance doesn't change in time

Correlations don't change in time

Up to here: weakly stationary

Joint distributions don't change in time

That makes it strongly stationary
Our Basic Notation

Time is “t”, not “n”


State (value) is Y, not X


to avoid confusion with x-axis, which is time.
Value at time t is Yt, not Y(t)


even though time is discrete
Of course, other books do other things.
Detrending: deterministic trend

Fit a plain linear regression, then subtract it out:

Fit Yt = m*t + b,

New data is Zt = Yt – m*t – b

Or use quadratic fit, exponential fit, etc.
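A minimal sketch of this in Python/numpy (the data here is made up for illustration):

```python
import numpy as np

# Made-up series with an upward drift
y = np.array([1.2, 1.9, 3.1, 3.9, 5.2, 6.1, 6.8, 8.1])
t = np.arange(len(y))

# Fit Yt = m*t + b by least squares
m, b = np.polyfit(t, y, deg=1)

# Subtract the fit: Zt = Yt - m*t - b
z = y - (m * t + b)
```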
Detrending: stochastic trend

Differencing
  For linear trend, new data is Zt = Yt - Yt-1
  To remove quadratic trend, do it again:
    Wt = Zt - Zt-1 = Yt - 2Yt-1 + Yt-2
  Like taking derivatives

What's the equivalent if you think the trend is exponential, not linear?

Hard to decide: regression or differencing?
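In numpy, differencing is one line; a sketch with made-up data. (One common answer to the exponential-trend question: take logs first, then difference.)

```python
import numpy as np

y = np.array([1.0, 2.1, 4.2, 7.1, 10.9, 15.8])  # made-up data

z = np.diff(y)        # Zt = Yt - Yt-1: removes a linear trend
w = np.diff(y, n=2)   # Wt = Zt - Zt-1: removes a quadratic trend

# If the trend looks exponential rather than linear,
# difference the logs (turns a constant growth rate into a constant level):
g = np.diff(np.log(y))
```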
Removing Cycles/Seasons

Will get to it later.

For the next few slides, assume no cycles/seasons.
A brief big-picture moment

How do you compare two quantities?

Multiply them!

If they're both positive, you'll get a big, positive answer

If they're both big and negative…

If one is positive and one is negative…

If one is big & positive and the other is small & positive…
Where have we seen this?

Dot product of two vectors
  Proportional to the cosine of the angle between them (do they point in the same direction?)

Inner product of two functions
  Integral from a to b of f(x)*g(x) dx

Covariance of two data sets x_i, y_i
  Sum_i (x_i * y_i) (assuming zero means)
Autocorrelation Function

How correlated is the series with itself at various lag values?

E.g. if you plot Yt+1 versus Yt and find the correlation, that's the correlation at lag 1.

The ACF lets you calculate all these correlations without plotting at each lag value.

The ACF is a basic building block of time series analysis.
Fake data on bus IATs (inter-arrival times)

[Figure: "Lag-1 of bus IATs" scatter plot of Yt+1 vs. Yt, with fitted line y = -0.4234x + 1.4167, R^2 = 0.1644]

[Figure: "ACF and PACF of bus IATs", lags 0 to 35, with 95% LCL/UCL bands around zero]
Properties of ACF

At lag 0, ACF = 1

Symmetric around lag 0

Approximate confidence-interval bars around ACF = 0
  To help you decide when the ACF drops to near 0
  Less reliable at higher lags

Often assume the ACF dies off fast enough that its absolute sum is finite.
  If not, it's called "long-term memory"; e.g.
    River flow data over many decades
    Traffic on computer networks
How to calculate ACF

R, Splus, SAS, SPSS, Matlab, Scilab will do it for you

Excel: download PopTools (free!)
  http://www.cse.csiro.au/poptools/

Excel, etc.: do it yourself (see the sketch after the next slide)
  First find the avg. and std. dev. of the data
  Next, find the AutoCoVariance Function (ACVF)
  Then divide by the variance of the data to get the ACF
ACVF at lag h

  Old way: c(h) = (1/(N-h)) * Sum_{t=1}^{N-h} (Yt - Ybar)(Yt+h - Ybar)
  New way: c(h) = (1/N) * Sum_{t=1}^{N-h} (Yt - Ybar)(Yt+h - Ybar)

Y-bar is the mean of the whole data set
  Not just the mean of the N-h data points

The old way can produce correl > 1; the new way can't.

The difference is "End Effects"
  Pg 30 of Peña, Tiao, Tsay
  (if it makes a difference, you're up to no good?)
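A do-it-yourself sketch in Python/numpy following the recipe above, using the "new way" (divide by N) estimator:

```python
import numpy as np

def acf(y, max_lag):
    """Sample ACF: ACVF at each lag, divided by the lag-0 ACVF (the variance)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    dev = y - y.mean()                   # subtract the mean of the WHOLE series
    # "New way": divide by n at every lag, even though only n-h products are summed
    acvf = np.array([np.sum(dev[:n - h] * dev[h:]) / n
                     for h in range(max_lag + 1)])
    return acvf / acvf[0]

y = np.random.default_rng(0).normal(size=200)   # white noise as a sanity check
print(acf(y, 10))   # lag 0 is exactly 1; other lags should stay within ~1.96/sqrt(200)
```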
Common Models

White Noise

AR

MA

ARMA

ARIMA

SARIMA

ARMAX

Kalman Filter

Exponential Smoothing, trend, seasons
White Noise

Sequence of i.i.d. variables et

Mean = zero; finite std. dev., often unknown

Often, but not always, Gaussian

[Figure: white noise sample path, t = 0 to 2000, values between about -12.5 and 12.5]
AR: AutoRegressive

Order 1: Yt = a*Yt-1 + et
  E.g. New = (90% of old) + random fluctuation

Order 2: Yt = a1*Yt-1 + a2*Yt-2 + et

Order p denoted AR(p)

p = 1, 2 common; > 2 rare

AR(p) is like a p'th-order ODE

AR(1) not stationary if |a| >= 1

E[Yt] = 0, can generalize
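A quick AR(1) simulation sketch (the coefficient 0.9 echoes the "90% of old" example):

```python
import numpy as np

rng = np.random.default_rng(1)
a, T = 0.9, 500
e = rng.normal(size=T)                # white noise et
y = np.zeros(T)
for t in range(1, T):
    y[t] = a * y[t - 1] + e[t]        # Yt = a*Yt-1 + et
# Try a = 1.0 or a = 1.1 to watch stationarity fail: the series wanders off.
```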
Things to do with AR

Find appropriate order

Estimate coefficients

via Yule-Walker eqn.

Estimate std.dev. of white noise

If estimated |a|>0.98, try differencing.
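A sketch of the Yule-Walker estimate, reusing the acf() helper defined earlier (that reuse is an assumption about how your code is organized):

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def yule_walker(y, p):
    rho = acf(y, p)                          # acf() from the earlier sketch
    # Solve R*a = r, where R is the Toeplitz matrix with entries rho(|j-k|)
    a_hat = solve_toeplitz(rho[:p], rho[1:p + 1])
    # Noise variance estimate: var(y) * (1 - sum_k a_k * rho(k))
    sigma2 = np.var(y) * (1 - a_hat @ rho[1:p + 1])
    return a_hat, sigma2

# For the AR(1) data simulated above, the estimated coefficient should be near 0.9.
```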
MA: Moving Average

Order 1: Yt = b0*et + b1*et-1

Order q: MA(q)

In real data, much less common than AR
  But still important in the theory of filters

Stationary regardless of the b values

E[Yt] = 0, can generalize
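A matching MA(1) sketch (the b values are arbitrary); it also previews the ACF cutoff on the next slide:

```python
import numpy as np

rng = np.random.default_rng(2)
b0, b1, T = 1.0, 0.7, 2000
e = rng.normal(size=T + 1)            # one extra noise term for the lag
y = b0 * e[1:] + b1 * e[:-1]          # Yt = b0*et + b1*et-1
# acf(y, 5) (helper from earlier) is clearly nonzero at lag 1
# and hovers near zero from lag 2 on: the ACF of an MA(q) cuts off after lag q.
```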
ACF of an MA process

Drops to zero after lag = q

That's a good way to determine what q should be!

[Figure: ACF of an MA process, lags 0 to 30, vertical scale -1 to 1]
ACF of an AR process?

Never completely dies off; not useful for finding the order p.

AR(1) has exponential decay in its ACF.

Instead, use the Partial ACF = PACF, which dies after lag = p.

The PACF of an MA never dies.

[Figure: ACF of an AR process, lags 0 to 30, vertical scale -1 to 1]
ARMA

ARMA(p,q) combines AR and MA

Often p,q <= 1 or 2
ARIMA

AR-Integrated-MA

ARIMA(p,d,q)
  d = order of differencing before applying ARMA(p,q)

For nonstationary data with a stochastic trend
SARIMA, ARMAX

Seasonal ARIMA(p,d,q)-and-(P,D,Q)S

Often S =
  12 (monthly) or
  4 (quarterly) or
  52 (weekly)

Or, S = 7 for daily data inside a week

ARMAX = ARMA with outside explanatory variables (halfway to multivariate time series)
State Space Model, Kalman Filter

Underlying process that we don't see

We get noisy observations of it

Like a Hidden Markov Model (HMM), but the state is continuous rather than discrete.

AR/MA, etc. can be written in this form too.

State evolution (vector): St = F * St-1 + ht

Observations (scalar): Yt = H * St + et
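A minimal scalar sketch, assuming F = H = 1 (a random walk observed in noise); the variances q and r are made-up tuning values:

```python
import numpy as np

def kalman_1d(ys, q=0.01, r=1.0):
    """Filter St = St-1 + ht (var q) observed as Yt = St + et (var r)."""
    s, p = ys[0], 1.0                 # state estimate and its variance
    out = []
    for y in ys:
        p = p + q                     # predict: uncertainty grows by q
        k = p / (p + r)               # Kalman gain: how much to trust the data
        s = s + k * (y - s)           # update toward the observation
        p = (1 - k) * p
        out.append(s)
    return np.array(out)
```

Note that the update s + k*(y - s) has the same shape as the exponential-smoothing update a few slides below, with the gain k playing the role of the parameter a.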
ARCH, GARCH(p,q)

(Generalized) AutoRegressive Conditional Heteroskedastic (heteroscedastic?)

Like ARMA, but the variance also changes randomly in time.

Used for many financial models
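A GARCH(1,1) simulation sketch; the parameters are arbitrary, chosen only so that alpha + beta < 1 (which keeps the variance from exploding):

```python
import numpy as np

rng = np.random.default_rng(3)
w, alpha, beta, T = 0.1, 0.1, 0.85, 2000
y = np.zeros(T)
sigma2 = w / (1 - alpha - beta)       # start at the long-run variance
for t in range(1, T):
    sigma2 = w + alpha * y[t - 1]**2 + beta * sigma2   # today's conditional variance
    y[t] = np.sqrt(sigma2) * rng.normal()              # shock scaled by it
# The result shows volatility clustering: calm stretches and wild stretches.
```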
Exponential Smoothing

More a method than a model. Exponential Smoothing = EWMA.

Very common in practice

Forecasting without much modeling of the process.

At = forecast of series at time t

Pick some parameter a between 0 and 1

At = a*Yt + (1-a)*At-1
  or At = At-1 + a*(error in period t)

Why call it "Exponential"?

Weight on the observation at lag k is a*(1-a)^k
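A sketch of the recursion (a = 0.3 is one of the common values from the next slide; starting at the first data value is a common but arbitrary choice):

```python
import numpy as np

def ewma(ys, a=0.3):
    """Exponential smoothing: At = a*Yt + (1-a)*At-1."""
    out = [ys[0]]                     # common choice: start at the first value
    for y in ys[1:]:
        out.append(a * y + (1 - a) * out[-1])
    return np.array(out)

# Equivalent error-correction form: At = At-1 + a*(Yt - At-1).
```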
How to determine the parameter

Train the model: try various values of a

Pick the one that gives the lowest sum of absolute forecast errors

The larger a is, the more weight given to recent observations

Common values are 0.10, 0.30, 0.50

If the best a is over 0.50, there's probably some trend or seasonality present
Holt-Winters

Exponential smoothing: no trend or seasonality
  Excel/Analysis ToolPak can do it if you tell it a

Holt's method: accounts for trend
  Also known as double-exponential smoothing

Holt-Winters: accounts for trend & seasons
  Also known as triple-exponential smoothing
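A sketch of Holt's (double-exponential) method; the two smoothing parameters and the initialization are made-up but typical choices:

```python
import numpy as np

def holt(ys, a=0.3, b=0.1):
    """Holt's method: smooth a level and a trend; forecast = level + trend."""
    level, trend = ys[0], ys[1] - ys[0]
    forecasts = []
    for y in ys[1:]:
        forecasts.append(level + trend)              # one-step-ahead forecast
        new_level = a * y + (1 - a) * (level + trend)
        trend = b * (new_level - level) + (1 - b) * trend
        level = new_level
    return np.array(forecasts)
```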
Multivariate

Along with the ACF, use Cross-Correlation

Cross-Correl is not 1 at lag = 0

Cross-Correl is not symmetric around lag = 0

Leading Indicator: one series' behavior helps predict another after a little lag
  Leading means "coming before", not "better than others"

Can also do the cross-spectrum, aka coherence
Cycles/Seasonality

Suppose a yearly cycle

Sample quarterly: 3-med, 6-hi, 9-med, 12-lo

Sample every 6 months: 3-med, 9-med
  Or 6-hi, 12-lo

To see a cycle, you must sample at twice its frequency
  Demo spreadsheet (and the sketch below)
  This is the Nyquist limit

Compact Disc: samples at 44.1 kHz; top of human hearing is 20 kHz
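A small aliasing sketch in numpy, mirroring the quarterly/semiannual/annual sampling story above:

```python
import numpy as np

t = np.arange(24)                       # two years of monthly points
yearly = np.sin(2 * np.pi * t / 12)     # one cycle per 12 samples

quarterly = yearly[::3]    # 4 samples/cycle: the med, hi, med, lo pattern survives
semiannual = yearly[::6]   # 2 samples/cycle: the Nyquist limit, barely resolvable
annual = yearly[::12]      # 1 sample/cycle: aliased, the cycle looks constant
```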
The basic problem

We have data, want to find
  Cycle length (e.g. business cycles), or
  Strength of seasonal components

Idea: use sine waves as explanatory variables

If a sine wave at a certain frequency explains things well, then there's a lot of strength there.
  Could be our cycle's frequency
  Or the strength of a known seasonal component

Explains = correlates
Correlate with Sine Waves

Ordinary covariance:
  Sum_{t=0}^{T-1} (Xt - Xbar)(Yt - Ybar)

At frequency omega, correlate with a sine (means are zero):
  Sum_{t=0}^{T-1} sin(omega*t) * Yt

Problem: what if that sine is out of phase with our cycle?
Solution

Also correlate with a cosine
  90 degrees out of phase with the sine

Why not also with a 180-degrees-out-of-phase sine?
  Because if that had a strong correl, our original sine would have a strong correl of opposite sign.

Sines & Cosines, Oh My: combine using complex variables!
The Discrete Fourier Transform

  d(omega) = Sum_{t=0}^{T-1} e^(-i*omega*t) * Yt

Often a scaling factor like 1/T, 1/sqrt(T), 1/2pi, etc. out front.

Some people use +i instead of -i.

Often look only at the frequencies omega_k = 2*pi*k/T, k = 0, ..., T-1:

  d(omega_k) = Sum_{t=0}^{T-1} e^(-2*pi*i*k*t/T) * Yt
Hmm, a sum of products

That reminds me of matrix multiplication.

Define a matrix F whose j,k entry is exp(-i*j*k*2*pi/T)

Then d = F*Y

Matrix multiplication takes T^2 operations

This matrix has a special structure, so it can be done in about T*log(T) operations

That's the FFT = Fast Fourier Transform

Easiest if T is a power of 2
So now we have complex values...

Take the magnitude & argument of each DFT result

Plot squared magnitude vs. frequency

This is the "Periodogram"

Large value = that frequency is very strong

Often plotted on a semilog-y scale ("decibels")

Example spreadsheet (or the sketch below)
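A periodogram sketch using numpy's FFT (which uses the e^(-i...) sign convention with no scaling factor out front); the embedded cycle length of 8 is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)
T = 256
t = np.arange(T)
y = np.sin(2 * np.pi * t / 8) + 0.5 * rng.normal(size=T)   # cycle of length 8 + noise

d = np.fft.fft(y)                       # d(omega_k), k = 0, ..., T-1
periodogram = np.abs(d)**2              # squared magnitude at each frequency
omega = 2 * np.pi * np.arange(T) / T    # omega_k = 2*pi*k/T
# The big spike sits at omega = 2*pi/8 (plus its mirror image at 2*pi - 2*pi/8).
```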
Spreadsheet Experiments

First, play with amplitudes:
  (1,0) then (0,1) then (1,.5) then (1,.7)

Next, play with frequency1:
  2*pi/8 then 2*pi/4
  2*pi/6, 2*pi/7, 2*pi/9, 2*pi/10
  2*pi/100, 2*pi/1000

Summarize your results for yourself. Write it down!

Reset to 2*pi/8, then play with phase2:
  0, 1, 2, 3, 4, 5, 6...

Now add some noise to Yt
Interpretations

Value at k = 0 is the mean of the data series
  Called the "DC" component

Area under the periodogram is proportional to Var(data series)

Height at each point = how much of the variance is explained by that frequency

Plotting argument vs. frequency shows the phase

Often need to smooth with a moving average
What is FT of White Noise?

Try it!

Why is it called white noise?

Pink noise, etc. (look up in Wikipedia)
Filtering: part 1

Zero out the frequencies you don't want

Invert the FT
  The FT is (essentially) its own inverse! Not like the Laplace Transform.

This is "frequency-domain" filtering

MP3 files: filter out the frequencies you wouldn't hear
  because they're overwhelmed by stronger frequencies
Filtering: part 2

Time-domain filtering: example spreadsheet

Smoothing: moving average
  Filters out high frequencies (noise is high-freq)
  Low-pass filter

Detrending: differencing
  Filters out trends and slow cycles (which look like trends, locally)
  High-pass filter

Band-pass filter

Band-reject filter (esp. for 12-month cycles)
Filtering

A time-domain filter's frequency response comes from the FT of its averaging coefficients

Example spreadsheet (or the sketch below)

This curve is called the "Transfer Function"

Good audio speakers publish their frequency response curves
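A sketch: the transfer function of a 5-point moving average, found by taking the FT of its coefficients (the window length and grid size are arbitrary):

```python
import numpy as np

coeffs = np.ones(5) / 5                 # 5-point moving-average weights
H = np.fft.fft(coeffs, n=512)           # zero-padded for a fine frequency grid
omega = 2 * np.pi * np.arange(512) / 512
gain = np.abs(H)                        # |H(omega)|: the transfer function
# gain is ~1 near omega = 0 and small at high frequencies: a low-pass filter.
# For differencing, try coeffs = np.array([1.0, -1.0]): gain ~0 near omega = 0 (high-pass).
```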
Long-history time series

Ordinary theory assumes that the ACF dies off faster than 1/h

But some time series don't satisfy that:
  River flows
  Packet amounts on data networks

Connected to chaos & fractals
Bibliography

Enders: Applied Econometric Time Series

Kedem & Fokianos: Regression Models for Time Series Analysis

Peña, Tiao, Tsay: A Course in Time Series Analysis

Brillinger: lecture notes for Stat 248 at UC Berkeley

Brillinger: Time Series: Data Analysis and Theory

Brockwell & Davis: Introduction to Time Series and Forecasting
1 real way, 2 fake ways:
[Figure: a single time series plot, t = 0 to 70, values between about -0.25 and 0.3]