Transcript Document

Forecasting using
simple models
Outline
 Basic forecasting models
– The basic ideas behind each model
– When each model may be appropriate
– Illustrate with examples
 Forecast error measures
 Automatic model selection
 Adaptive smoothing methods
– (automatic alpha adaptation)
 Ideas in model based forecasting techniques
– Regression
– Autocorrelation
– Prediction intervals
Basic Forecasting Models
 Moving average and weighted moving
average
 First order exponential smoothing
 Second order exponential smoothing
 First order exponential smoothing with
trends and/or seasonal patterns
 Croston’s method
M-Period Moving Average
m 1
Pt 1(t) 
 Vt  j
j 0
M
 i.e. the average of the last M data points
 Basically assumes a stable (trend free) series
 How should we choose M?
– Advantages of large M?
– Advantages of small M?
 Average age of data = M/2
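The M-period moving average above can be sketched in a few lines of Python (the function name and example data are hypothetical):

```python
# Minimal sketch of an M-period moving-average forecast.
def moving_average_forecast(series, M):
    """Forecast for the next period = average of the last M observations."""
    if len(series) < M:
        raise ValueError("need at least M observations")
    return sum(series[-M:]) / M

# Forecast from the last 3 of 5 observations: (11 + 13 + 12) / 3
print(moving_average_forecast([10, 12, 11, 13, 12], 3))  # 12.0
```

A larger M averages away more noise but reacts more slowly to change, since the average age of the data used is M/2.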
Weighted Moving Averages
Pt+1(t) = W0Vt + W1Vt−1 + ... + WnVt−n
 The Wj are weights attached to each
historical data point
 Essentially all known (univariate) forecasting
schemes are weighted moving averages
 Thus, don’t screw around with the general
versions unless you are an expert
Simple Exponential
Smoothing
 Pt+1(t) = Forecast for time t+1 made at time t
 Vt = Actual outcome at time t
 0 < α < 1 is the “smoothing parameter”
Two Views of Same Equation
 Pt+1(t) = Pt(t-1) + α[Vt – Pt(t-1)]
– Adjust forecast based on last forecast
error
OR
 Pt+1(t) = (1-α)Pt(t-1) + αVt
– Weighted average of last forecast and last
actual
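The two equivalent update forms can be sketched as follows (a minimal illustration; the function name and initialization choice are assumptions):

```python
# Sketch of simple exponential smoothing, written in the
# "error correction" form P(t+1) = P(t) + alpha*(V(t) - P(t)),
# which is algebraically the same as (1-alpha)*P(t) + alpha*V(t).
def ses_forecasts(series, alpha):
    """Return one-step-ahead forecasts; forecasts[i] predicts series[i]."""
    p = series[0]                 # common initialization: first observation
    forecasts = []
    for v in series:
        forecasts.append(p)
        p = p + alpha * (v - p)   # adjust by a fraction of the forecast error
    return forecasts
```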
Simple Exponential
Smoothing
 Is appropriate when the underlying time
series behaves like a constant + Noise
– Xt = μ + Nt
– Or when the mean μ is wandering around
– That is, for a quite stable process
 Not appropriate when trends or seasonality
present
ES would work well here
[Figure: “Typical Behavior for Exponential Smoothing”, a stable demand series over roughly 115 periods.]
Simple Exponential
Smoothing
 We can show by recursive substitution
that ES can also be written as:
 Pt+1(t) = Vt + (1-)Vt-1 + (1-)2Vt-2 + (1-)3Vt-3 +…..
 Is a weighted average of past
observations
 Weights decay geometrically as we go
backwards in time
Weights on past data
[Figure: weights placed on the last 10 observations by exponential smoothing with α=0.6 versus a 5-period moving average. The ES weights decay geometrically from 0.6; the moving-average weights are flat at 1/5.]
Simple Exponential
Smoothing
 Pt+1(t) = αVt + α(1-α)Vt-1 + α(1-α)²Vt-2 + α(1-α)³Vt-3 + ...
 Large α adjusts more quickly to changes
 Smaller α provides more “averaging” and
thus lower variance when things are stable
 Exponential smoothing is intuitively more
appealing than moving averages
Exponential Smoothing
Examples
Zero Mean White Noise Series
[Figures: a zero-mean white noise series over roughly 100 periods, then the same series with exponential smoothing forecasts overlaid, first for α=0.1 and then for α=0.1 and α=0.3 together.]
Shifting Mean + Zero Mean White Noise
[Figures: a series whose mean shifts over time plus white noise, shown with the true mean, then with exponential smoothing forecasts for α=0.1 and for α=0.3 tracking the shifting mean.]
Automatic selection of α
 Using historical data
 Apply a range of  values
 For each, calculate the error in one-step-ahead forecasts
– e.g. the root mean squared error (RMSE)
 Select the α that minimizes RMSE
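This grid-search procedure can be sketched with the standard library alone (the helper names and the grid of candidate values are assumptions for illustration):

```python
import math

def rmse_of_alpha(series, alpha):
    """RMSE of one-step-ahead exponential-smoothing forecasts."""
    p = series[0]                      # initialize with the first observation
    sq, n = 0.0, 0
    for v in series[1:]:
        sq += (v - p) ** 2
        n += 1
        p = p + alpha * (v - p)
    return math.sqrt(sq / n)

def best_alpha(series, grid=None):
    """Try a range of alpha values and keep the RMSE minimizer."""
    grid = grid or [i / 20 for i in range(1, 20)]   # 0.05, 0.10, ..., 0.95
    return min(grid, key=lambda a: rmse_of_alpha(series, a))
```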
RMSE vs Alpha
[Figure: one-step-ahead forecast RMSE plotted against α from 0 to 1; the curve has a minimum identifying the best smoothing parameter.]
Recommended Alpha
 Typically alpha should be in the range 0.05 to
0.3
 If RMSE analysis indicates larger alpha,
exponential smoothing may not be
appropriate
[Figure: “Original Data”, the example time series plotted over roughly 100 periods.]
Actual vs Forecast for Various Alpha
[Figure: demand and one-step-ahead forecasts for α=0.1, α=0.3, and α=0.9.]
Series and Forecast using Alpha=0.9
Might look good, but is it?
[Figure: the α=0.9 forecast appears to track the series closely over roughly 100 periods.]
Series and Forecast using Alpha=0.9
[Figures: the same plot, then zoomed in on the first 16 periods. Up close, the α=0.9 forecast is essentially the previous observation shifted one period to the right; it chases the series rather than predicting it.]
Forecast RMSE vs Alpha
[Figure: forecast RMSE against α from 0 to 1 for this series, ranging from about 0.57 to 0.67.]
Exponential Smoothing on Lake Huron Level Data
[Figure: Lake Huron level with one-step-ahead forecasts for α=0.1, α=0.3, and α=0.9 over roughly 100 periods.]
Forecast Errors for Lake Huron Data
[Figure: one-step-ahead forecast errors for α=0.1 and α=0.9.]
Forecast RMSE vs Alpha for Lake Huron Data
[Figure: RMSE against α from 0 to 1.]
Monthly Furniture Demand vs Forecast
[Figure: monthly furniture orders with forecasts for α=0.1, α=0.3, and α=0.9 over 35 months.]
Monthly Furniture Demand Forecast Errors
[Figure: forecast errors for α=0.1, α=0.3, and α=0.9.]
Forecast RMSE vs Alpha for Monthly Furniture Demand Data
[Figure: RMSE against α from 0 to 1.]
Exponential smoothing will
lag behind a trend
 Suppose Xt = b0 + b1t
 And St = (1-α)St-1 + αXt
 Can show that

E[St] = E[Xt] − ((1-α)/α)·b1

i.e. the smoothed value lags the trending series by a constant amount proportional to the slope.
Exponential Smoothing on a Trend
[Figure: smoothed values for α=0.2 and α=0.5 both lag behind linearly trending data; the smaller α lags more.]
Double Exponential
Smoothing
 Modifies exponential smoothing for following
a linear trend
Let St = (1-α)St-1 + αXt
Let St[2] = (1-α)St-1[2] + αSt
 i.e. smooth the smoothed value
Let X̂t = 2St − St[2]
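Putting the definitions together, double smoothing can be sketched as below, including the slope estimate derived on the following slides (initializing both smoothers with the first observation is an assumption):

```python
def double_smoothing_forecast(series, alpha, horizon=1):
    """Double exponential smoothing sketch.
    Level estimate: 2*S - S2.  Slope estimate: alpha/(1-alpha)*(S - S2)."""
    s = s2 = series[0]                      # initialize both smoothers
    for v in series[1:]:
        s = (1 - alpha) * s + alpha * v     # smooth the series
        s2 = (1 - alpha) * s2 + alpha * s   # smooth the smoothed value
    level = 2 * s - s2
    slope = alpha / (1 - alpha) * (s - s2)
    return level + horizon * slope
```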
Single and Double smoothed values
[Figure: on trending data with α=0.5, St lags the series and St[2] lags even more.]
Double Smoothing
[Figure: the combination 2St − St[2] doesn’t lag the trending data.]
E[St] = E[Xt] − ((1-α)/α)·b1

E[St[2]] = E[St] − ((1-α)/α)·b1

b1 = (α/(1-α))·(E[St] − E[St[2]])

Thus estimate the slope at time t as

b̂1(t) = (α/(1-α))·(St − St[2])
E[Xt] = E[St] + ((1-α)/α)·b1

Substituting b1 = (α/(1-α))·(E[St] − E[St[2]]) gives

E[Xt] = E[St] + (E[St] − E[St[2]]) = 2E[St] − E[St[2]]

so the current level is estimated by

X̂t = 2St − St[2]

To forecast, extend the estimated trend:

X̂t+τ = X̂t + τ·b̂1(t)
     = 2St − St[2] + τ·(α/(1-α))·(St − St[2])
     = (2 + τα/(1-α))·St − (1 + τα/(1-α))·St[2]
Example
[Figures: a series that follows a linear trend with a change in slope, smoothed with α=0.2. Single smoothing lags the trend; double smoothing tracks it but over-shoots the slope change (it must “re-learn” the slope).]
Holt-Winters Trend and
Seasonal Methods
 “Exponential smoothing for data with trend
and/or seasonality”
– Two models, Multiplicative and Additive
 Models contain estimates of trend and
seasonal components
 Models “smooth”, i.e. place greater weight on
more recent data
Winters Multiplicative Model
 Xt = (b1 + b2t)·ct + εt
 Where the ct are seasonal terms satisfying
Σ(t=1..L) ct = L, where L is the season length
 Note that the amplitude depends on the level
of the series
 Once we start smoothing, the seasonal
components may not add to L
Holt-Winters Trend Model
 Xt = (b1 + b2t) + εt
 Same except no seasonal effect
 Works the same as the trend + season model
except simpler
 Example:
Xt = (1 + 0.04t)·ct, where the seasonal factors ct cycle through 1.5, 0.5, 1 (season length L = 3)
Xt = (1 + 0.04t)·(1.5, 0.5, 1)
[Figures: the trend line (1 + 0.04t) with the seasonal pattern around it: peaks at 150% and troughs at 50% of the current level, so the amplitude grows with the level.]
 The seasonal terms average 100% (i.e. 1)
 Thus summed over a season, the ct must add
to L
 Each period we go up or down some
percentage of the current level value
 The amplitude increasing with level seems to
occur frequently in practice
Recall Australian Red Wine Sales
[Figure: monthly sales over roughly 140 periods; the seasonal amplitude grows with the level of the series.]
Smoothing
 In Winters model, we smooth the “permanent
component”, the “trend component” and the
“seasonal component”
 We may have a different smoothing
parameter for each (α, β, γ)
 Think of the permanent component as the
current level of the series (without trend)
Step 1. Update the Permanent Component
Let a1(T) = b1 + b2T be the permanent component.
The update step is:

â1(T) = α · VT / ĉT(T−L) + (1−α) · [â1(T−1) + b̂2(T−1)]

– VT / ĉT(T−L) is the current observation, “deseasonalized” by the seasonal factor estimated one season ago
– â1(T−1) + b̂2(T−1) is the estimate of the permanent component from last time = last level + slope·1
– In words: â1(T) = α · (current observed level) + (1−α) · (forecast of current level)
Step 2. Update the Trend Component

b̂2(T) = β · [â1(T) − â1(T−1)] + (1−β) · b̂2(T−1)

– â1(T) − â1(T−1) is the “observed” slope
– b̂2(T−1) is the “previous” slope
Step 3. Update the Seasonal Component for this period

ĉT(T) = γ · VT / â1(T) + (1−γ) · ĉT(T−L)

since VT ≈ a1(T)·cT(T)
To forecast τ periods ahead at time T, use the current values of a, b, and c:

V̂T+τ(T) = [â1(T) + τ·b̂2(T)] · ĉT+τ(T+τ−L)

– Extend the trend out τ periods ahead
– Use the proper seasonal adjustment: the factor for period T+τ, last updated one season earlier
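The three update steps and the forecast can be sketched together (a simplified illustration: the seasonal factors sit in a circular list `c`, and the initial estimates, which the slides do not cover, are assumed given):

```python
# Sketch of one period of the Winters multiplicative updates.
# a = permanent component (level), b = trend, c = list of L seasonal factors;
# c[t % L] holds the factor last updated one season ago.
def winters_update(v, a_prev, b_prev, c, t, L, alpha, beta, gamma):
    s = c[t % L]
    a = alpha * (v / s) + (1 - alpha) * (a_prev + b_prev)  # Step 1: level
    b = beta * (a - a_prev) + (1 - beta) * b_prev          # Step 2: trend
    c[t % L] = gamma * (v / a) + (1 - gamma) * s           # Step 3: seasonal
    return a, b

def winters_forecast(a, b, c, t, L, tau):
    """Forecast tau periods ahead: extend the trend, reseasonalize."""
    return (a + tau * b) * c[(t + tau) % L]
```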
Winters Additive Method
 Xt = b1 + b2t + ct + εt
 Where the ct are seasonal terms satisfying
Σ(t=1..L) ct = 0, where L is the season length
 Similar to previous model except we
“smooth” estimates of b1, b2, and the ct
Croston’s Method
 Can be useful for intermittent, erratic, or
slow-moving demand
– e.g. when demand is zero most of the
time (say 2/3 of the time)
 Might be caused by
– Short forecasting intervals (e.g. daily)
– A handful of customers that order
periodically
– Aggregation of demand elsewhere (e.g.
reorder points)
Demand Distribution
[Figure: probability mass function over demand sizes 0 to 9, with most of the probability at zero.]
Typical situation
 Central spare parts inventory (e.g. military)
 Orders from manufacturer
– in batches (e.g. EOQ)
– periodically when inventory nearly
depleted
– long lead times may also affect batch size
Example
Demand each period follows a distribution that is usually zero:

Demand   Prob
0        0.85
1        0.1275
2        0.0191
3        0.0029
4        0.0004
5        0.00006
6        0.00001
7        0.000002
Example
[Figure: “An Intermittent Demand Series”, demand over roughly 400 periods, mostly zero with occasional spikes of size 1 to 3.]
Example
 Exponential smoothing applied (α=0.2)
[Figure: “Exponential Smoothing Applied”, the forecast over roughly 400 periods; it jumps up right after each non-zero demand and decays between demands.]
Using Exponential Smoothing:
 Forecast is highest right after a non-zero
demand occurs
 Forecast is lowest right before a non-zero
demand occurs
Croston’s Method
 Separately Tracks
– Time between (non-zero) demands
– Demand size when not zero
 Smooths both the time between demands and
the demand size
 Combines both for forecasting:

Forecast = Demand Size / Time Between Demands
Define terms
 V(t) = actual demand outcome at time t
 P(t) = Predicted demand at time t
 Z(t) = Estimate of demand size (when it is not
zero)
 X(t) = Estimate of time between (non-zero)
demands
 q = a counter for the number of periods
since the last non-zero demand
Forecast Update
 For a period with zero demand
– Z(t)=Z(t-1)
– X(t)=X(t-1)
 No new information about
– order size Z(t)
– time between orders X(t)
 q=q+1
– Keep counting time since last order
Forecast Update
 For a period with non-zero demand
– Z(t) = Z(t-1) + α(V(t) - Z(t-1))
– X(t) = X(t-1) + α(q - X(t-1))
– q = 1
 Update the size of order via smoothing; V(t) is the latest order size
 Update the time between orders via smoothing; q is the latest time between orders
 Reset the counter of time between orders
Forecast
 Finally, our forecast is:

P(t) = Z(t) / X(t) = (Non-zero Demand Size) / (Time Between Demands)
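The update rules and the forecast ratio can be sketched as one pass over the demand history (the initialization at the first non-zero demand is an assumption; before that there is no forecast):

```python
def croston(series, alpha):
    """Croston's method sketch: smooth non-zero demand size Z and the
    inter-demand interval X separately; forecast per period = Z / X."""
    z = x = None        # no estimates until the first non-zero demand
    q = 1               # periods since the last non-zero demand
    forecasts = []
    for v in series:
        forecasts.append(None if z is None else z / x)
        if v > 0:
            if z is None:
                z, x = float(v), float(q)        # initialize estimates
            else:
                z = z + alpha * (v - z)          # smooth demand size
                x = x + alpha * (q - x)          # smooth time between demands
            q = 1                                # reset the counter
        else:
            q += 1                               # keep counting
    return forecasts
```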
Recall example
 Exponential smoothing applied (α=0.2)
[Figure: “Exponential Smoothing Applied to Example Data”, the sawtooth forecast over roughly 400 periods.]
Recall example
 Croston’s method applied (α=0.2)
[Figure: “Croston’s Method Applied to Example Data”, the forecast is a step function that only changes when a demand occurs.]
What is it forecasting?
 Average demand per period
[Figure: Croston’s forecast hovers near the true average demand per period = 0.176.]
Behavior
 Forecast only changes after a demand
 Forecast constant between demands
 Forecast increases when we observe
– A large demand
– A short time between demands
 Forecast decreases when we observe
– A small demand
– A long time between demands
Croston’s Method
 Croston’s method assumes demand is
independent between periods
– That is, one period looks like the rest
(or changes slowly)
Counter Example
 One large customer
 Orders using a reorder point
– The longer we go without an order
– The greater the chances of receiving an
order
 In this case we would want the forecast to
increase between orders
 Croston’s method may not work too well
Better Examples
 Demand is a function of intermittent random
events
– Military spare parts depleted as a result of
military actions
– Umbrella stocks depleted as a function of
rain
– Demand tied to the start of construction
of a large structure
Is Demand Independent?
 If enough data exists we can check the
distribution of time between demands
 It should “tail off” geometrically
Theoretical behavior
[Figure: “Theoretical Time Between Demands Distribution”, frequency tailing off geometrically over times 1 to 17.]
In our example:
[Figure: “Time Between Demands in Example”, the observed frequency distribution over times 1 to 17.]
Comparison
[Figure: “Time Between Demands”, the observed distribution overlaid on the theoretical one.]
Counterexample
 Croston’s method might not be appropriate if
the time between demands distribution looks
like this:
[Figure: “Distribution of Time Between Demand”, frequency concentrated around 20 periods rather than tailing off geometrically.]
Counterexample
 In this case, as time approaches 20 periods
without demand, we know demand is coming
soon.
 Our forecast should increase in this case
Error Measures
 Errors: The difference between actual and
predicted (one period earlier)
 et = Vt – Pt(t-1)
– et can be positive or negative
 Absolute error |et|
– Always positive
 Squared error et²
– Always positive
 The percentage error PEt = 100et / Vt
– Can be positive or negative
Bias and error magnitude
 Forecasts can be:
– Consistently too high or too low (bias)
– Right on average, but with large
deviations both positive and negative
(error magnitude)
 Should monitor both for changes
Error Measures
 Look at errors over time
 Cumulative measures summed or averaged
over all data
– Error Total (ET): measures bias
– Mean Percentage Error (MPE): measures bias
– Mean Absolute Percentage Error (MAPE): measures error magnitude
– Mean Squared Error (MSE): measures error magnitude
– Root Mean Squared Error (RMSE): measures error magnitude
 Smoothed measures reflect errors in the
recent past
– Mean Absolute Deviation (MAD): measures error magnitude
Error Total
 Sum of all errors:

ET = Σ(t=1..n) et

 Uses raw (positive or negative) errors
 ET can be positive or negative
 Measures bias in the forecast
 Should stay close to zero, as we saw in the last
presentation
MPE
 Average of percent errors:

MPE = (1/n) Σ(t=1..n) PEt

 Can be positive or negative
 Measures bias, should stay close to zero
MSE
 Average of squared errors:

MSE = (1/n) Σ(t=1..n) et²

 Always positive
 Measures “magnitude” of errors
 Units are “demand units squared”
RMSE
 Square root of MSE:

RMSE = √[ (1/n) Σ(t=1..n) et² ]

 Always positive
 Measures “magnitude” of errors
 Units are “demand units”
 The standard deviation of the forecast errors
MAPE
 Average of absolute percentage errors:

MAPE = (1/n) Σ(t=1..n) |PEt|

 Always positive
 Measures magnitude of errors
 Units are “percentage”
Mean Absolute Deviation
 Smoothed absolute errors:

MADt = (1 − 0.3)·MADt−1 + 0.3·|et|
 Always positive
 Measures magnitude of errors
 Looks at the recent past
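The cumulative measures above can be computed in a few lines (a sketch; it assumes no zero actuals, since the percentage errors divide by Vt):

```python
import math

def error_measures(actual, forecast):
    """ET and MPE measure bias; MSE, RMSE, and MAPE measure error magnitude."""
    e = [a - f for a, f in zip(actual, forecast)]       # raw errors
    pe = [100 * err / a for err, a in zip(e, actual)]   # percentage errors
    n = len(e)
    return {
        "ET":   sum(e),
        "MPE":  sum(pe) / n,
        "MSE":  sum(x * x for x in e) / n,
        "RMSE": math.sqrt(sum(x * x for x in e) / n),
        "MAPE": sum(abs(x) for x in pe) / n,
    }
```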
Percentage or Actual units
 Often errors naturally increase as the level of
the series increases
 Natural, thus no reason for alarm
 If true, percentage-based measures are preferred
 Actual units are more intuitive
Squared or Absolute Errors
 Absolute errors are more intuitive
 Standard deviation units less so
– About 68% within ±1 S.D.
– About 95% within ±2 S.D.
 When using measures for automatic model
selection, there are statistical reasons for
preferring measures based on squared errors
Ex-Post Forecast Errors
 Given
– A forecasting method
– Historical data
 Calculate (some) error measure using the
historical data
 Some data required to initialize forecasting
method.
 Rest of data (if enough) used to calculate ex-post forecast errors and the error measure
Automatic Model Selection
 For all possible forecasting methods
– (and possibly for all parameter values e.g.
smoothing constants – but not in SAP?)
 Compute ex-post forecast error measure
 Select method with smallest error
Automatic α Adaptation
 Suppose an error measure indicates behavior
has changed
– e.g. level has jumped up
– Slope of trend has changed
 We would want to base forecasts on more
recent data
 Thus we would want a larger α
Tracking Signal (TS)

TSt = ETt / MADt   (TSt = 0 if MADt is zero)

 Bias/Magnitude = “standardized bias”
α Adaptation

αt = 0.8·αt−1 + 0.2·|TSt|
subject to 0.05 ≤ αt ≤ 0.9
 If TS increases, bias is increasing, thus
increase α
 I don’t like these methods due to instability
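One adaptation step can be sketched as follows (a minimal illustration; taking the absolute value of TS so the result stays positive is an assumption):

```python
def adapt_alpha(alpha_prev, et_total, mad):
    """Tracking-signal-based alpha adaptation:
    alpha_t = 0.8*alpha_{t-1} + 0.2*|TS_t|, clamped to [0.05, 0.9]."""
    ts = 0.0 if mad == 0 else et_total / mad   # tracking signal = ET / MAD
    alpha = 0.8 * alpha_prev + 0.2 * abs(ts)
    return min(0.9, max(0.05, alpha))
```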
Model Based Methods
 Find and exploit “patterns” in the data
 Trend and Seasonal Decomposition
– Time based regression
 Time Series Methods (e.g. ARIMA Models)
 Multiple Regression using leading indicators
 Assumes series behavior stays the same
 Requires analysis (no “automatic model
generation”)
Univariate Time Series
Models Based on
Decomposition
 Vt = the time series to forecast
 Vt = T t + St + N t
 Where
– Tt is a deterministic trend component
– St is a deterministic seasonal/periodic
component
– Nt is a random noise component
Raw Material Price
[Figure: price ($/unit) over 24 periods, drifting upward between roughly 2.4 and 3.6; σ(Vt) = 0.257.]
Simple Linear Regression
Model: Vt = 2.877174 + 0.020726·t

SUMMARY OUTPUT

Regression Statistics
  Multiple R           0.569724
  R Square             0.324585
  Adjusted R Square    0.293884
  Standard Error       0.21616
  Observations         24

ANOVA
              df   SS        MS        F         Significance F
  Regression   1   0.494006  0.494006  10.57257  0.003659
  Residual    22   1.027956  0.046725
  Total       23   1.521963

              Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%
  Intercept     2.877174    0.091079        31.58978  7.99E-20  2.688287   3.066061
  X Variable 1  0.020726    0.006374        3.251549  0.003659  0.007507   0.033945
Use Model to Forecast into the Future
[Figure: “Actuals and Forecasts”, the fitted trend line extended beyond the 24 observed periods to forecast future prices.]
Residuals = Actual − Predicted
et = Vt − (2.877174 + 0.020726·t)
[Figure: “Residuals After Regression”, the residuals over the 24 periods; σ(et) = 0.211.]
Simple Seasonal Model
 Estimate a seasonal adjustment factor for
each period within the season
 e.g. SSeptember
Residuals and their seasons (season length 3, sorted by season):

Season 1: 0.1521, 0.27992173, 0.22774346, 0.19556519, 0.18338692, 0.28120865, 0.33903038, 0.34685211
Season 2: -0.24862609, -0.21080436, -0.28298263, -0.1951609, -0.28733917, -0.20951744, -0.24169571, -0.26387398
Season 3: 0.03064782, -0.07153045, 0.03629128, 0.00411301, -0.00806526, -0.00024353, -0.0424218, -0.01460007

Season averages:
  Season 1:  0.250726055
  Season 2: -0.242500035
  Season 3: -0.008226125
Trend + Seasonal Model
 Vt=2.877174+0.020726t + Smod(t,3)
 Where
– S1 = 0.250726055
– S2 = -0.242500035
– S3 = -0.008226125
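The two-stage fit used here (least-squares trend, then per-season averages of the residuals as additive seasonal terms) can be sketched with the standard library alone; the function names are made up for illustration:

```python
def fit_trend_plus_seasonal(v, L):
    """Fit Vt = b0 + b1*t + S[(t-1) % L]: OLS trend, then residual averages."""
    n = len(v)
    t = list(range(1, n + 1))
    tbar, vbar = sum(t) / n, sum(v) / n
    b1 = (sum((ti - tbar) * (vi - vbar) for ti, vi in zip(t, v))
          / sum((ti - tbar) ** 2 for ti in t))          # OLS slope
    b0 = vbar - b1 * tbar                               # OLS intercept
    resid = [vi - (b0 + b1 * ti) for ti, vi in zip(t, v)]
    seasonal = [sum(resid[s::L]) / len(resid[s::L]) for s in range(L)]
    return b0, b1, seasonal

def predict(b0, b1, seasonal, t):
    """Forecast for (1-based) period t."""
    return b0 + b1 * t + seasonal[(t - 1) % len(seasonal)]
```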
Actual vs Forecast (Trend + Seasonal Model)
[Figure: the combined model tracks the price series and extends forecasts beyond period 24.]
et = Vt - (2.877174 + 0.020726t + Smod(t,3))
(et)=0.145
Residuals from Trend+Season
0.15
0.1
Residuals
0.05
0
Residuals2
-0.05
-0.1
Period
128
23
21
19
17
15
13
11
9
7
5
3
1
-0.15
Can use other trend models
 Vt= 0+ 1Sin(2t/k) (where k is period)
 Vt= 0+ 1t + 2t2 (multiple regression)
 Vt= 0+ 1ekt
 etc.
 Examine the plot, pick a reasonable model
 Test model fit, revise if necessary
[Figures: a noisy series and the underlying periodic signal S(t) = cos(2πt/12) extracted from it.]
Model: Vt = Tt + St + Nt
 After extracting trend and seasonal
components we are left with “the Noise”
Nt = Vt – (Tt + St)
 Can we extract any more predictable
behavior from the “noise”?
 Use Time Series analysis
– Akin to signal processing in EE
Zero Mean and Aperiodic:
Is our best forecast N̂t+1 = 0?
[Figure: “Demand”, a zero-mean, aperiodic-looking series over roughly 300 periods.]
AR(1) Model
 This data was generated using the model
 Nt = 0.9·Nt−1 + Zt
 Where Zt ~ N(0, 2)
 Thus to forecast, we could use:

N̂t+1 = 0.9·Nt
N̂t+2 = 0.9·N̂t+1 = 0.9²·Nt
AR(1): Actual vs 1-Step Ahead Forecast
[Figure: the one-step-ahead forecast tracks the actual series over roughly 300 periods.]
Forecasting N Steps Ahead
[Figure: multi-step forecasts decay geometrically toward the series mean of zero.]
Time Series Models
 Examine the correlation of the time series to
past values.
 This is called “autocorrelation”
 If Nt is correlated to Nt-1, Nt-2,…..
 Then we can forecast better than

N̂t+1 = 0
Sample Autocorrelation Function
[Figure: the sample ACF and sample PACF plotted for lags 0 to 40.]
Back to our Demand Data
[Figure: “Residuals from Trend+Season”, the residual series from the earlier fit.]
No Apparent Significant Autocorrelation
[Figure: the sample ACF and PACF of the residuals show no significant spikes at lags 1 to 40.]
Multiple Linear Regression
 V= 0+ 1 X1 + 2 X2 +….+ p Xp + 
 Where
– V is the “independent variable” you want
to predict
– The Xi‘s are the dependent variables you
want to use for prediction (known)
 Model is linear in the i‘s
Examples of MLR in Forecasting
 Vt = β0 + β1t + β2t² + β3·sin(2πt/k) + β4·e^(kt)
– i.e. a trend model, a function of t
 Vt = β0 + β1X1t + β2X2t
– Where X1t and X2t are leading indicators
 Vt = β0 + β1Vt−1 + β2Vt−2 + β12Vt−12 + β13Vt−13
– An autoregressive model
Example: Sales and Leading Indicator
[Figures: Series 1 (the leading indicator, roughly 9.5 to 14) and Series 2 (sales, roughly 200 to 260), each over 140 periods.]
Example: Sales and Leading Indicator
[Figure: the fitted regression model
Sales(t) = -3.93 + 0.83·Sales(t-3) − 0.78·Sales(t-2) + 1.22·Sales(t-1) − 5.0·Lead(t)
overlaid on the series.]