Regression and Correlation BUSA 2100, Sect. 12.0 - 12.2, 3.5
Introduction to Regression
Forecasts (predictions) are often based on
the relationship between 2 or more variables.
Example 1: Advertising expenditures and sales.
Example 2: Daily high temperature and
demand for electricity.
X = independent variable, the variable being
used to make a forecast; Y = dependent
variable, the variable being forecasted.
Identify X and Y in Examples 1 and 2.
Y depends on X.
Straight Lines
A regression line can be used to show
mathematically how variables are related.
Regression Example
To determine the equation of a line, all
we need are the slope and Y-intercept.
Example: Pizza House builds restaurants near college campuses.
Before building another one, it plans to
use X = student enrollment (1000s) to
estimate Y = quarterly sales ($1000s).
A sample of 6 existing restaurants is
chosen.
Pizza Restaurant Problem
Resulting data pairs are shown below.
 X     Y
 4    95
 6   155
 9   140
11   210
12   250
15   260
Scatter Diagram & Line of
Best Fit
Draw a scatter diagram on the board.
Use a hiatus (a break in the axis) so that
the X, Y axes don’t have to begin at zero.
Within each axis, all units must be the same size.
By trial and error, draw some lines
through the data. The regression line is
the one line that fits the data best. (Also
called the line of best fit.)
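A common way to make "fits the data best" precise is the least-squares criterion: the best line has the smallest sum of squared vertical deviations between actual and forecasted Y values. The slides do not state the criterion explicitly, so treat it as an assumption. The three trial lines in this sketch are hypothetical, picked by eye.

```python
# Pizza House data: enrollment (1000s) and quarterly sales ($1000s).
X = [4, 6, 9, 11, 12, 15]
Y = [95, 155, 140, 210, 250, 260]


def sum_squared_errors(b0, b1):
    """Sum of squared vertical deviations of the data from the line b0 + b1*X."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y))


# Hypothetical trial lines as (intercept, slope) pairs, chosen by eye.
trial_lines = [(50.0, 12.0), (30.0, 17.0), (41.0, 15.0)]
sse_by_line = {line: sum_squared_errors(*line) for line in trial_lines}

# The trial line with the smallest sum of squared errors fits best.
best_trial = min(sse_by_line, key=sse_by_line.get)

for (b0, b1), sse in sse_by_line.items():
    print(f"YF = {b0} + {b1}X  ->  SSE = {sse:.1f}")
print("Best of the trial lines:", best_trial)
```

Trial and error only ranks the lines that were tried; the regression formulas find the single line with the smallest possible sum.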
Line of Best Fit (Continued)
As indicated earlier, YF is a forecasted
value (on the regression line). Y is an
actual value (one of the dots).
Regression Formulas
Based on calculus, the equation of a
regression line (line of best fit) can be
found using these formulas.
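The formulas themselves did not survive in this transcript. As a reconstruction, the standard least-squares formulas are given here in terms of the column sums used in the restaurant table; the original slide may have shown an equivalent form. Here n is the number of data pairs, b1 is the slope, and b0 is the Y-intercept.

```latex
b_1 = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2},
\qquad
b_0 = \bar{Y} - b_1\bar{X} = \frac{\sum Y}{n} - b_1\,\frac{\sum X}{n}
```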
Regression Formulas, Page 2
Carry the numerical coefficients
(b1 and b0) to 3 or 4 decimal places; then
round to 2 or fewer places at the end.
Substitute the numbers into the
regression equation: YF = b0 + b1X.
We will complete the restaurant problem, using a table to organize the data.
Restaurant Problem, Page 2
        X      Y     XY     X²      Y²
        4     95    380     16    9025
        6    155    930     36   24025
        9    140   1260     81   19600
       11    210   2310    121   44100
       12    250   3000    144   62500
       15    260   3900    225   67600
SUM    57   1110  11780    623  226850
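The table columns and their sums can be checked with a short script. This is a minimal sketch; the variable names are illustrative and not from the slides.

```python
# Pizza House data: enrollment (1000s) and quarterly sales ($1000s).
X = [4, 6, 9, 11, 12, 15]
Y = [95, 155, 140, 210, 250, 260]

# One row per restaurant: X, Y, XY, X^2, Y^2.
rows = [(x, y, x * y, x * x, y * y) for x, y in zip(X, Y)]

# Column totals, in the same order as the row entries.
sums = tuple(sum(column) for column in zip(*rows))

print(f"{'':4}{'X':>5}{'Y':>7}{'XY':>8}{'X^2':>7}{'Y^2':>9}")
for x, y, xy, x2, y2 in rows:
    print(f"{'':4}{x:>5}{y:>7}{xy:>8}{x2:>7}{y2:>9}")
print(f"{'SUM':4}{sums[0]:>5}{sums[1]:>7}{sums[2]:>8}{sums[3]:>7}{sums[4]:>9}")
```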
Restaurant Problem, Page 3
.
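The worked calculation for this slide is missing from the transcript. Assuming the standard least-squares formulas and the column sums from the table (n = 6, ΣX = 57, ΣY = 1110, ΣXY = 11780, ΣX² = 623), the coefficients work out roughly as follows.

```latex
b_1 = \frac{6(11780) - (57)(1110)}{6(623) - (57)^2}
    = \frac{70680 - 63270}{3738 - 3249}
    = \frac{7410}{489} \approx 15.1534

b_0 = \frac{1110}{6} - 15.1534\left(\frac{57}{6}\right)
    = 185 - 15.1534(9.5) \approx 41.04

Y_F \approx 41.04 + 15.15X
```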
Meaning and Uses of the
Regression Equation
Example: Vidalia State University has an
enrollment of 9,800. Forecast pizza
sales for a restaurant near the campus.
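A minimal sketch of this forecast, assuming the coefficients come from the standard least-squares formulas applied to the six data pairs. Since X is measured in 1000s of students, an enrollment of 9,800 enters the equation as X = 9.8.

```python
# Pizza House data: enrollment (1000s) and quarterly sales ($1000s).
X = [4, 6, 9, 11, 12, 15]
Y = [95, 155, 140, 210, 250, 260]

n = len(X)
sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)

# Least-squares slope and intercept (assumed formulas, see lead-in).
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n

# Convert enrollment to the units of X (1000s of students).
enrollment_thousands = 9800 / 1000
forecast = b0 + b1 * enrollment_thousands

print(f"YF = {b0:.2f} + {b1:.2f}X")
print(f"Forecast quarterly sales: ${forecast:.1f} thousand")
```

Under these assumptions the slope says each additional 1,000 students is associated with roughly $15,000 more in quarterly sales, and the forecast comes out near $189.5 thousand.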
Accuracy of Forecasts
Using Regression
The accuracy of forecasts depends on
how closely the points in a scatter
diagram fit the regression line.
If the linear relationship is too weak (the
deviations are too large), forecast errors
are large and a regression line may not
be worth using.
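To see the deviations for the restaurant data, the sketch below computes each forecast error Y - YF. It assumes the line is fitted with the standard least-squares formulas; the variable names are illustrative.

```python
# Pizza House data: enrollment (1000s) and quarterly sales ($1000s).
X = [4, 6, 9, 11, 12, 15]
Y = [95, 155, 140, 210, 250, 260]

n = len(X)
sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)

# Least-squares slope and intercept (assumed formulas, see lead-in).
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n

# Forecast error for each restaurant: actual value minus forecasted value.
errors = [y - (b0 + b1 * x) for x, y in zip(X, Y)]
largest = max(errors, key=abs)

for x, y, e in zip(X, Y, errors):
    print(f"X = {x:>2}  Y = {y:>3}  YF = {y - e:6.1f}  error = {e:+6.1f}")
print(f"Largest deviation: {largest:+.1f}")
```

The errors of a least-squares line sum to zero, so the size of the individual deviations, not their total, indicates how well the line fits.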
Evaluating Accuracy of
Regression Forecasts
It is best to have an estimate of forecast
accuracy before using a regression line.
3 ways to estimate forecast accuracy:
Introduction to Correlation
Def.: The coefficient of correlation (r)
is a numerical measure of the strength
of the linear relationship between 2
variables.
Values of r are always between -1 and 1;
i.e., between 0 and 1 in absolute value.
r = 0 means no linear correlation; r = ±1
means perfect correlation; both are rare.
Positive and Negative
Correlation
Definition: Two variables X, Y have a
positive correlation if large values of X
tend to be associated with large values
of Y, and small values of X with small values of Y.
X, Y must be measurable quantitatively.
Example of positive correlation:
Positive and Negative
Correlation, Page 2
Definition: Two variables X, Y have a
negative correlation if large values of
X tend to be associated with small
values of Y, and vice-versa.
Example of negative correlation:
Graph positive and negative correlation.
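The examples for these slides are not in the transcript. As an illustration only, the sketch below computes r for two small hypothetical data sets (the numbers and variable names are invented, not from the course) to show how the sign of r matches the two definitions.

```python
from math import sqrt


def correlation(xs, ys):
    """Coefficient of correlation r, computed from the column sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical data: large X tends to go with large Y.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [55, 60, 70, 72, 85]

# Hypothetical data: large X tends to go with small Y.
price = [1, 2, 3, 4, 5]
units_sold = [90, 80, 75, 60, 50]

r_positive = correlation(hours_studied, exam_score)
r_negative = correlation(price, units_sold)

print(f"Positive example: r = {r_positive:.3f}")
print(f"Negative example: r = {r_negative:.3f}")
```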
High and Low Correlation
General guidelines:
Degree of Correlation    Forecast Accuracy
very high                very good
high                     good
moderate                 medium
low                      fair
very low                 poor
Formula for Correlation
Use regression for forecasts only if r is
.70 or larger, in absolute value.
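The correlation formula did not survive in this transcript. As a reconstruction, the standard computational formula for r is given here using the same column sums as the regression table; the original slide may have shown an equivalent form.

```latex
r = \frac{n\sum XY - \sum X \sum Y}
         {\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]
                \left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}
```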
Correlation (Restaurant Ex.)
.
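The worked calculation for this slide is missing from the transcript. A minimal sketch, assuming the standard computational formula for r applied to the six restaurant data pairs:

```python
from math import sqrt

# Pizza House data: enrollment (1000s) and quarterly sales ($1000s).
X = [4, 6, 9, 11, 12, 15]
Y = [95, 155, 140, 210, 250, 260]

n = len(X)
sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)
sum_y2 = sum(y * y for y in Y)

# Numerator and denominator of the correlation formula.
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(f"r = {numerator} / {denominator:.2f} = {r:.3f}")
```

Under this formula r comes out near .93, which is above the .70 guideline, so the regression equation would be used for forecasting.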
Regression Analysis
Summary
Steps in regression analysis:
(1) Collect data pairs, using 2 related
variables.
(2) Calculate the correlation, r.
(3) (a) If r >= .70 in absolute value, find
the regression equation and use it for
forecasting.
(b) If r < .70 in absolute value, don’t use regression.
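The steps above can be sketched as a single function. This is one possible arrangement, assuming the standard least-squares and correlation formulas; the function name, the threshold parameter, and the weak data set are hypothetical.

```python
from math import sqrt


def regression_analysis(xs, ys, threshold=0.70):
    """Compute r, then fit a line only if |r| >= threshold.

    Returns (r, (b0, b1)) when the correlation is strong enough,
    otherwise (r, None).
    """
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    sxy = n * sum_xy - sum_x * sum_y
    sxx = n * sum_x2 - sum_x ** 2
    syy = n * sum_y2 - sum_y ** 2

    # Step 2: calculate the correlation.
    r = sxy / sqrt(sxx * syy)

    # Step 3(b): correlation too weak, so do not use regression.
    if abs(r) < threshold:
        return r, None

    # Step 3(a): find the regression equation YF = b0 + b1*X.
    b1 = sxy / sxx
    b0 = sum_y / n - b1 * sum_x / n
    return r, (b0, b1)


# Step 1: data pairs. The first set is the restaurant data from the slides.
r_pizza, pizza_line = regression_analysis(
    [4, 6, 9, 11, 12, 15], [95, 155, 140, 210, 250, 260]
)
# Hypothetical weakly related data, for contrast.
r_weak, weak_line = regression_analysis([1, 2, 3, 4, 5], [3, 1, 4, 1, 5])

print(f"Restaurant data: r = {r_pizza:.3f}, line = {pizza_line}")
print(f"Weak data:       r = {r_weak:.3f}, line = {weak_line}")
```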
Multiple Regression
Regression analysis with one independent variable (X) is called simple regression.
Regression analysis with 2 or more
independent variables (X1, X2, etc.) is
called multiple regression.
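The multiple regression equation is not written out in this transcript. In the notation of the simple regression equation YF = b0 + b1X, the usual form with k independent variables is sketched below; the subscripted coefficient names are an assumption carried over from the simple case.

```latex
Y_F = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k
```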
Multiple Regression and Line
of Average Relationship
State the multiple regression equation.
A regression equation is also called the
line of average relationship. Explain in
terms of GPA example.
Correlation does not necessarily imply
cause and effect. Illustrate with
example.