Regression Lecture Notes


BABS 502
Regression Based Forecasting
February 28, 2011
(c) Martin L. Puterman
Simple and Multiple Regression
• A widely used set of statistical tools that are useful for:
– forecasting
– data summary
– adjustment for uncontrolled factors
• Basic idea is to fit an equation of the following form relating a dependent
variable to one or more independent variables
y = 0 + 1x1 + 2x2 + 3x3 + …
• Its power is that, by choosing y and the xi's in different ways, a wide
range of different effects can be taken into account.
• The theoretical model assumes that each observation is subject to an
additive error which is normally distributed with mean zero and the
same variance for every observation so that one observes the signal
and noise components in aggregate.
• In forecasting the signal part provides the point forecast and the random
part provides an accuracy measure.
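As a concrete illustration, here is a minimal sketch of fitting such an equation by least squares in Python with statsmodels (not the NCSS software these notes use); the variable names and data are simulated, purely for illustration:

```python
# A minimal sketch of fitting y = b0 + b1*x1 + b2*x2 by least squares.
# The data here are simulated purely for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.7 * x2 + rng.normal(scale=0.5, size=n)  # signal + noise

X = sm.add_constant(np.column_stack([x1, x2]))  # prepend the intercept column
fit = sm.OLS(y, X).fit()
print(fit.params)          # estimated b0, b1, b2 (the signal part)
print(fit.mse_resid**0.5)  # residual standard error (the noise scale)
```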
Regression in forecasting - trend extrapolation
• Fit a trend to historical data
– linear: Yt = a + bt
– quadratic: Yt = a + bt + ct²
– exponential: Yt = ae^(bt), or log(Yt) = a + bt
• Assumption is that the same trend occurred
throughout the past and that it will persist into the future
• Fit using multiple regression module in NCSS or
spreadsheet regression function
• Extensive regression theory available to guide use
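The notes fit these in NCSS or a spreadsheet; a hedged sketch of the same three trend fits in Python (statsmodels), on made-up data:

```python
# Sketch: fitting the three trend forms to a series y observed at t = 1..n.
# Simulated data; in the notes this would be done in NCSS or a spreadsheet.
import numpy as np
import statsmodels.api as sm

t = np.arange(1, 73)
y = 5.0 + 0.013 * t + np.random.default_rng(1).normal(scale=0.05, size=t.size)

linear = sm.OLS(y, sm.add_constant(t)).fit()                          # Yt = a + bt
quad = sm.OLS(y, sm.add_constant(np.column_stack([t, t**2]))).fit()   # Yt = a + bt + ct^2
expon = sm.OLS(np.log(y), sm.add_constant(t)).fit()                   # log(Yt) = a + bt

# One-step-ahead forecasts at t = 73:
print(linear.params @ [1, 73])
print(quad.params @ [1, 73, 73**2])
print(np.exp(expon.params @ [1, 73]))  # back-transform the exponential trend
```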
Trend Regression
wages   trend   trend²
4.92    1       1
4.96    2       4
5.04    3       9
…       …       …
5.91    72      5184

Linear Trend
Yt = 5.02 + .013t
F72(1) = 5.02 + .013*73 = 5.97

Quadratic Trend
Yt = 4.956 + .0191t - .00007t²
F72(1) = 4.956 + .0191*73 - .00007*73² = 5.98
Trend Regression
[Two figures: "Trend Analysis for Wages", plotting wage data and forecasts against time (0 to 100). Left panel, linear fit: Wages = 5.023 + 0.0136 * Trend. Right panel, quadratic fit: Wages = 4.96 + 0.019 * Trend - 7.43E-05 * Trend².]
Dummy Variables
• Dummy Variables are independent variables in regression that
assume the values of either 0 or 1.
– A value 1 means a condition is present; a value 0 means it is not.
– When an observation in regression corresponds to a condition
being present, then the value of that observation is decreased or
increased by a constant amount equal to the value of the coefficient
of the dummy variable in the regression.
• If a condition has three possible values, say “high”, “medium”, or
“low”, we encode its value with two dummy variables. The first
variable, High, equals 1 if the condition is “high” and zero
otherwise, and the second variable, Medium, equals 1 if the
condition is “medium” and zero otherwise. When the condition
is “low” both values are zero. The baseline condition “low” is
reflected in the constant in the regression equation.
• In time series regression, we use dummy variables for seasons,
and use S-1 dummy variables if there are S seasons. We are
free to choose the baseline season from which all others are
measured.
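A small sketch of this encoding in Python (pandas); the condition values are invented for illustration:

```python
# Sketch: encoding a three-level condition ("high"/"medium"/"low") with two
# 0/1 dummy variables, using "low" as the baseline.
import pandas as pd

condition = pd.Series(["high", "low", "medium", "medium", "high", "low"])
dummies = pd.get_dummies(condition).drop(columns="low").astype(int)
print(dummies)  # columns "high" and "medium"; a row of all zeros means "low"
```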
Trend Regression with Seasonality
• My experience suggests that a quadratic trend
regression plus (additive) seasonality is useful for
forecasting
• Uses “dummy variables” for seasons
• Must be fit with regression software
• Equation with quadratic trend and additive monthly
seasonality
Yt = a + bt + dt² + c2Febt + c3Mart + … + c12Dect
• Also enables multiple levels of seasonality such as
weekly and monthly.
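A sketch of fitting this equation in Python (statsmodels), on simulated monthly data; the C(month) term generates the seasonal dummies automatically, much as NCSS does for categorical variables (next slides):

```python
# Sketch: quadratic trend plus additive monthly seasonality, following the
# equation above. Monthly data are simulated; January is the baseline season.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

n = 120  # ten years of monthly data, purely illustrative
df = pd.DataFrame({"t": np.arange(1, n + 1)})
df["month"] = ((df["t"] - 1) % 12) + 1
rng = np.random.default_rng(2)
df["y"] = (100 + 0.4 * df["t"] + 5 * np.sin(2 * np.pi * df["month"] / 12)
           + rng.normal(scale=2, size=n))

# C(month) makes statsmodels generate the 11 seasonal dummies, with
# month 1 (January) as the baseline absorbed into the intercept.
fit = smf.ols("y ~ t + I(t**2) + C(month)", data=df).fit()
print(fit.params)
```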
Trend Regression with Seasonality
• In the previous slide, Febt, Mart, … are dummy variables
– they equal 1 if observation Yt is from the indicated month and 0
otherwise
– Note that there is no dummy variable for January
• January is the baseline for comparison
Examples:
Yt = a + bt (observation t in January)
Yt = a + bt + c2 (observation t in February)
Yt = a + bt + c3 (observation t in March)
• In NCSS, declaring a variable as categorical means that
NCSS will generate dummy variables automatically
– But it will choose the baseline somewhat arbitrarily.
Trend Regression With Seasonality Example
Variable     Coefficient
Intercept    189.88
Trend        0.36
I(Feb)       -23.85
I(Mar)       -19.96
I(Apr)       -34.87
I(May)       -25.17
I(Jun)       -8.98
I(Jul)       11.88
I(Aug)       11.72
I(Sep)       -17.19
I(Oct)       -26.08
I(Nov)       -28.46
I(Dec)       -8.46

Some forecasts:
Jan: F156(1) = 189.88 + 0.36*157 = 246.40
Feb: F156(2) = 189.88 + 0.36*158 - 23.85 = 222.91
Mar: F156(3) = 189.88 + 0.36*159 - 19.96 = 227.16
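The arithmetic above can be reproduced mechanically; a small Python sketch using the fitted coefficients from the table:

```python
# Sketch: reproducing the forecasts above from the fitted coefficients.
intercept, trend = 189.88, 0.36
seasonal = {"Jan": 0.0, "Feb": -23.85, "Mar": -19.96}  # January is the baseline

for h, month in enumerate(["Jan", "Feb", "Mar"], start=1):
    t = 156 + h  # forecast origin is t = 156
    print(month, round(intercept + trend * t + seasonal[month], 2))
# prints: Jan 246.4, Feb 222.91, Mar 227.16
```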
Regression Example: Forecast Updating During Season
• Goal: Improve total sales forecasts using interim sales data
• Data: early forecast, interim sales, and total sales for a
wide range of products
• Fitted Model:
Total Sales = 120 + .6 Interim Sales + .3 Early Forecast
• Example: Early Forecast of Total Sales = 3000; Interim
Sales = 1400
• Revised Total Sales Forecast:
Total Sales = 120 + .6*1400 + .3*3000 = 1860
• Forecast Standard Deviation is the Regression RMSE
Regression Example: Impact of Advertising
• Goal: Take into account the effect of advertising
expenditures on sales
• Data: sales, and advertising expenditures in the previous
quarter
• Fitted Model:
Salest = 15 + 10 Quartert + .8 Salest-1 + .4 (Advertisingt-1)^(1/2)
• Example: Sales in last quarter = 2000 and Advertising in
previous quarter = 10,000
• Total Sales Forecast
Sales = 15 + .8*2000 + .4*100 = 1655 (since √10,000 = 100)
• Forecast Standard Deviation is the Regression RMSE
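Building the lagged and square-root regressors is the only extra step relative to ordinary regression; a sketch in Python (pandas/statsmodels), with sales and advertising figures invented for illustration:

```python
# Sketch: building regressors for a model like
# Sales_t = b0 + b1*Quarter_t + b2*Sales_{t-1} + b3*sqrt(Advertising_{t-1}).
# The sales/advertising figures below are invented.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "quarter": [1, 2, 3, 4, 1, 2, 3, 4],
    "sales": [1800, 1950, 2100, 2000, 1900, 2050, 2200, 2000],
    "advertising": [9000, 10000, 12000, 8000, 9500, 11000, 12500, 8500],
})
df["sales_lag1"] = df["sales"].shift(1)                     # Sales_{t-1}
df["adv_sqrt_lag1"] = np.sqrt(df["advertising"].shift(1))   # sqrt(Advertising_{t-1})

fit = smf.ols("sales ~ quarter + sales_lag1 + adv_sqrt_lag1",
              data=df.dropna()).fit()   # dropna removes the first row (no lag)
print(fit.params)
```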
Some special concerns when using regression with time series data
• Often the usual regression assumption of uncorrelated
errors is violated
– This means that the residuals contain information.
Case A: This is usually due to model mis-specification; i.e.
omission of important variables
Case B: But sometimes we have what we think is a good model
and there is nothing obvious to add.
• Difficulty – Standard errors are underestimated so model
seems better than it really is.
– Concept: Since observations are not independent, there is less
information in the data than you would think
– Reject H0: βj = 0 when we shouldn’t.
• Detection
– Some systematic pattern in residual plot vs. time
– Durbin-Watson Test (see next slide).
– (Best approach) ACF of residuals
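A sketch of these diagnostics in Python (statsmodels), on a simulated series with autocorrelated errors; both the residual ACF and the Durbin-Watson statistic (next slide) flag the problem:

```python
# Sketch: checking regression residuals for autocorrelation, via the ACF
# (the approach the notes recommend) and the Durbin-Watson statistic.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(3)
t = np.arange(1, 101)
e = np.zeros(t.size)
for i in range(1, t.size):          # simulate AR(1) errors with rho = 0.7
    e[i] = 0.7 * e[i - 1] + rng.normal()
y = 10 + 0.5 * t + e

resid = sm.OLS(y, sm.add_constant(t)).fit().resid
print(acf(resid, nlags=12))         # a large lag-1 value flags the problem
print(durbin_watson(resid))         # well below 2 here
```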
Durbin-Watson test; comments
• The Durbin-Watson test is a weaker alternative to using the
ACF of the residuals, but it is widely used, probably for
historical reasons.
• It is based on the Durbin-Watson test statistic D.
• It tests only for first order autocorrelation in the errors.
– Formally it tests H0: ρ = 0 vs. Ha: ρ ≠ 0
– The test rejects H0, concluding that there is autocorrelation
in the residuals, if D is well below 2 or well above 2; I suggest
being imprecise here. I would worry about values less than 1.4
or greater than 2.6.
• In economic data, when ρ is not zero, it is usually positive.
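For reference (the notes use D without defining it; this is the standard formula), the statistic is computed from the regression residuals e_t:

```latex
D = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2} \approx 2\,(1 - \hat{\rho})
```

so D near 2 corresponds to ρ̂ near 0, D well below 2 signals positive autocorrelation, and D well above 2 signals negative autocorrelation.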
Regression in the face of correlated residuals
• Approaches for obtaining more reliable estimates:
– Add variables, such as trend squared, or use the lagged
dependent variable as an explanatory variable. (See sales
and advertising example on previous slide; Salest-1 is a
lagged variable.)
– Use time series regression models – which except for a
special case (AR1 errors) requires advanced software such
as SAS or R.
– Use “Multiple regression with serial correlation” under “other
regression routines” in NCSS.
– Difference data if lag one autocorrelation is large and
software such as that above is not available.
Regression with auto-correlated errors
Model
yt = β0 + β1x1t + … + βmxmt + εt
where
εt = ρ εt-1 + νt
and
νt ~ N(0, σ²) and independent
The quantity ρ is called the first order autocorrelation or serial
correlation parameter and lies between -1 and +1.
The Cochrane-Orcutt procedure, which is coded in NCSS,
estimates the regression coefficients and ρ for this model. Note
that usually the regression coefficients will not change much
from ordinary regression, but their standard errors will be larger.
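In Python, statsmodels' GLSAR model offers an iterative feasible-GLS fit in the same spirit as Cochrane-Orcutt (it is not the NCSS routine itself); a sketch on simulated data:

```python
# Sketch: regression with AR(1) errors via statsmodels' GLSAR, whose
# iterative fit is in the same spirit as Cochrane-Orcutt. Simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
t = np.arange(1, 101)
e = np.zeros(t.size)
for i in range(1, t.size):           # AR(1) noise with rho = 0.8
    e[i] = 0.8 * e[i - 1] + rng.normal()
y = 3 + 0.2 * t + e

model = sm.GLSAR(y, sm.add_constant(t), rho=1)  # rho=1 means AR errors of order 1
fit = model.iterative_fit(maxiter=10)
print(model.rho)    # estimated serial correlation parameter
print(fit.params)   # coefficients: usually close to OLS...
print(fit.bse)      # ...but with larger standard errors
```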
What if seasonality is multiplicative and we want to use regression?
• Problem: The model on the nominal scale assumes an additive effect of the seasonal
dummy variables.
• Solution: Do regression on the logarithmic scale. This means that we transform
the dependent variable by taking logarithms (base 10 or base e) and then do
regression.
• Why does this work? Multiplicative seasonality is additive on the log scale!
• Thus we can do forecasts using the model on the log scale and then transform
back to the original scale by exponentiating the forecast on the log scale.
– Example: If the forecast on the Log10 scale is 3.4, then the forecast on the nominal
(original) scale is 10^3.4 = 2511.9 units.
• Also trends and dummies on the log scale have nice interpretations. Consider the
model for 4 seasons, fit with natural logs (with base-10 logs, each multiplier
would instead be 10 raised to the coefficient)
ln(yt) = 2 + .014t + .19 Season2t - .13 Season3t + .04 Season4t
• Then the value of the series is increasing by about 1.4% per period (since e^.014 ≈ 1.014).
• The value in Season 2 is about 19% above what the trend alone predicts for that
season.
• But predictions based on these transformations are often biased: exponentiating a
forecast of the mean on the log scale understates the mean on the original scale.
• Alternative ad hoc approach: Deseasonalize the data, fit a model to the deseasonalized
data, and then multiply back by the seasonal factors to get forecasts. This is how time
series decomposition works.
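A sketch of the log-scale approach in Python (statsmodels), on simulated data with multiplicative seasonality; the back-transform is the exponentiation step described above:

```python
# Sketch: regression on the log10 scale with seasonal dummies, then
# exponentiating the forecast back to the original scale (e.g. a log10
# forecast of 3.4 -> 10**3.4 units). All data here are simulated, with
# multiplicative seasonality by construction.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

n = 48
rng = np.random.default_rng(5)
df = pd.DataFrame({"t": np.arange(1, n + 1)})
df["season"] = ((df["t"] - 1) % 4) + 1
factors = np.array([1.0, 1.2, 0.9, 1.05])   # multiplicative seasonal factors
df["y"] = (100 * 1.01 ** df["t"] * factors[df["season"] - 1]
           * np.exp(rng.normal(scale=0.02, size=n)))

fit = smf.ols("np.log10(y) ~ t + C(season)", data=df).fit()
new = pd.DataFrame({"t": [n + 1], "season": [(n % 4) + 1]})
log_fc = fit.predict(new).iloc[0]   # point forecast on the log10 scale
print(10 ** log_fc)                 # back-transform; subject to the bias noted above
```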
Example – BC Incorporations Trend Regression

Regression Equation Section
Independent  Regression   Standard    T-Value to test  Prob    Reject H0  Power of
Variable     Coefficient  Error       H0: B(i)=0       Level   at 5%?     Test at 5%
Intercept    18672.3162   1865.4549   10.010           0.0000  Yes        1.0000
trend        584.1152     182.0498    3.209            0.0059  Yes        0.8503

Serial Correlation of Residuals Section
Lag  Serial Correlation   Lag  Serial Correlation   Lag  Serial Correlation
1    0.7473               9    -0.2205              17   0.0000
2    0.3632               10   -0.0183              18   0.0000
3    -0.0190              11   0.1487               19   0.0000
4    -0.2897              12   0.2349               20   0.0000
5    -0.4198              13   0.0000               21   0.0000
6    -0.4632              14   0.0000               22   0.0000
7    -0.4478              15   0.0000               23   0.0000
8    -0.3732              16   0.0000               24   0.0000
Above serial correlations are significant if their absolute values are greater than 0.485071.

[Figure: BC incorporations vs. year, 1990-2010, with the fitted trend line; vertical axis roughly 15000 to 35000.]

Durbin-Watson Test For Serial Correlation
Parameter                                  Value    Did the Test Reject H0: Rho(1) = 0?
Durbin-Watson Value                        0.3573
Prob. Level: Positive Serial Correlation   0.0000   Yes
Prob. Level: Negative Serial Correlation   1.0000   No
Same data using serial correlation routine

Run Summary Section
Parameter                  Value       Parameter                  Value
Dependent Variable         BC          Rows Processed             17
Number Ind. Variables      1           Rows Filtered Out          0
Weight Variable            None        Rows with X's Missing      0
R2                         0.1506      Rows with Weight Missing   0
Adj R2                     0.0899      Rows with Y Missing        0
Coefficient of Variation   0.4879      Rows Used in Estimation    17
Mean Square Error          4628200     Sum of Weights             16.000
Square Root of MSE         2151.325    Completion Status          Normal Completion
Ave Abs Pct Error          17.034      Autocorrelation (Rho)      0.8523

Regression Equation Section
Independent  Regression   Standard     T-Value to test  Prob    Reject H0
Variable     Coefficient  Error        H0: B(i)=0       Level   at 5%?
Intercept    10850.3237   12603.3258   0.861            0.4038  No
trend        1244.8410    790.0657     1.576            0.1374  No
Same data but adding extra variables

Regression Equation Section
Independent  Regression    Standard    T-Value to test  Prob    Reject H0  Power of
Variable     Coefficient   Error       H0: B(i)=0       Level   at 5%?     Test at 5%
Intercept    22290.5091    1303.6382   17.099           0.0000  Yes        1.0000
dummy        -37509.5091   7155.5803   -5.242           0.0002  Yes        0.9979
dummyXyear   3036.6909     518.8170    5.853            0.0001  Yes        0.9997
year         -73.6909      192.2110    -0.383           0.7076  No         0.0646
• In the above, dummy = 1 for year > 2001, and
dummyXyear allows for a shift in trend.
• Note there is still some autocorrelation present: the lag 1
serial autocorrelation equals .32 (which is insignificant)
and the Durbin-Watson test is significant, but much less so
than without the extra variables.
• The purpose of this example was to show that
autocorrelation can result from the omission of
independent variables.
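A sketch of this shift-in-trend model in Python (statsmodels); the yearly series here is simulated to mimic the example, so the values are assumptions:

```python
# Sketch: a shift-in-trend model with dummy = 1 for year > 2001 and a
# dummy-by-year interaction. The yearly series is simulated for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

years = np.arange(1990, 2007)                  # 17 yearly observations
df = pd.DataFrame({"year": years - 1989})      # recode year as 1, 2, ...
df["dummy"] = (years > 2001).astype(int)
df["dummyXyear"] = df["dummy"] * df["year"]    # lets the slope change after 2001
rng = np.random.default_rng(6)
df["bc"] = (20000 - 70 * df["year"]
            + df["dummy"] * (-35000 + 2900 * df["year"])
            + rng.normal(scale=800, size=len(df)))

fit = smf.ols("bc ~ dummy + dummyXyear + year", data=df).fit()
print(fit.params)   # compare with the coefficient table above
```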