Introduction to simple linear regression

ASW, 12.1-12.2
Economics 224 – Notes for November 5, 2008
Regression model
• Relation between variables where changes in some
variables may “explain” or possibly “cause” changes
in other variables.
• Explanatory variables are termed the independent
variables and the variables to be explained are
termed the dependent variables.
• Regression model estimates the nature of the
relationship between the independent and
dependent variables.
– Change in dependent variables that results from changes
in independent variables, i.e., the size of the relationship.
– Strength of the relationship.
– Statistical significance of the relationship.
Examples
• Dependent variable is retail price of gasoline in Regina –
independent variable is the price of crude oil.
• Dependent variable is employment income – independent
variables might be hours of work, education, occupation, sex,
age, region, years of experience, unionization status, etc.
• Price of a product and quantity produced or sold:
– Quantity sold affected by price. Dependent variable is
quantity of product sold – independent variable is price.
– Price affected by quantity offered for sale. Dependent
variable is price – independent variable is quantity sold.
[Figure: Crude oil price index, 1997=100 (left axis, 0 to 600) and regular gasoline prices, Regina, cents per litre (right axis, 0 to 160), monthly from 1981M01 to 2008M01.]
Source: CANSIM II Database (Vector v1576530 and v735048
respectively)
Bivariate and multivariate models
• Bivariate or simple regression model: x (Education) → y (Income)
• Multivariate or multiple regression model: x1 (Sex), x2 (Experience),
x3 (Age), x4 (Education) → y (Income)
• Model with simultaneous relationship: Price of wheat ↔ Quantity of
wheat produced
Bivariate or simple linear regression (ASW, 466)
• x is the independent variable
• y is the dependent variable
• The regression model is
y = β0 + β1 x + ε
• The model has two variables, the independent or explanatory
variable, x, and the dependent variable y, the variable whose
variation is to be explained.
• The relationship between x and y is a linear or straight line
relationship.
• Two parameters to estimate – the slope of the line β1 and the
y-intercept β0 (where the line crosses the vertical axis).
• ε is the unexplained, random, or error component. Much
more on this later.
Regression line
• The regression model is y = β0 + β1 x + ε
• Data about x and y are obtained from a sample.
• From the sample of values of x and y, estimates b0 of
β0 and b1 of β1 are obtained using the least squares
method or another method.
• The resulting estimate of the model is
ŷ = b0 + b1 x
• The symbol ŷ is termed “y hat” and refers to the
predicted values of the dependent variable y that are
associated with values of x, given the linear model.
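For reference, the usual least squares estimates can be written in closed form. These are the standard textbook formulas, not taken from these slides; the notation is assumed, with x̄ and ȳ the sample means of x and y, and n the sample size.

```latex
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x}
```

The second formula implies that the estimated line passes through the point of means (x̄, ȳ).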
Relationships
• Economic theory specifies the type and structure of
relationships that are to be expected.
• Historical studies.
• Studies conducted by other researchers – different
samples and related issues.
• Speculation about possible relationships.
• Correlation and causation.
• Theoretical reasons for estimation of regression
relationships; empirical relationships need to have
theoretical explanation.
Uses of regression
• Amount of change in a dependent variable that
results from changes in the independent variable(s) –
can be used to estimate elasticities, returns on
investment in human capital, etc.
• Attempt to determine causes of phenomena.
• Prediction and forecasting of sales, economic
growth, etc.
• Support or refute a theoretical model.
• Modify and improve theoretical models and
explanations of phenomena.
Income   hrs/week   Income   hrs/week
8000     38         8000     35
6400     50         18000    37.5
2500     15         5400     37
3000     30         15000    35
6000     50         3500     30
5000     38         24000    45
8000     50         1000     4
4000     20         8000     37.5
11000    45         2100     25
25000    50         8000     46
4000     20         4000     30
8800     35         1000     200
5000     30         2000     200
7000     43         4800     30
[Figure: Summer Income as a Function of Hours Worked. Scatter plot of Income (vertical axis, 0 to 30000) against Hours per Week (horizontal axis, 0 to 60), with the fitted regression line.]
ŷ = -2461 + 297x
R² = 0.311
Significance = 0.0031
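The fitted line can be checked against the table of income and hours. The sketch below is not the instructor's calculation: it applies the standard least squares formulas in plain Python, and it assumes that the two rows recording 200 hours per week are left out (the chart's horizontal axis stops at 60). The names `observations`, `kept`, and `least_squares` are illustrative.

```python
# (income, hours per week) pairs, read row by row from the table.
observations = [
    (8000, 38), (8000, 35), (6400, 50), (18000, 37.5),
    (2500, 15), (5400, 37), (3000, 30), (15000, 35),
    (6000, 50), (3500, 30), (5000, 38), (24000, 45),
    (8000, 50), (1000, 4), (4000, 20), (8000, 37.5),
    (11000, 45), (2100, 25), (25000, 50), (8000, 46),
    (4000, 20), (4000, 30), (8800, 35), (1000, 200),
    (5000, 30), (2000, 200), (7000, 43), (4800, 30),
]


def least_squares(xs, ys):
    """Return (b0, b1, r_squared) for the line y_hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx              # slope
    b0 = y_bar - b1 * x_bar     # intercept
    r_squared = sxy ** 2 / (sxx * syy)
    return b0, b1, r_squared


# Assumption: rows with more than 60 hours per week are excluded.
kept = [(income, hours) for income, hours in observations if hours <= 60]
hours = [h for _, h in kept]
income = [inc for inc, _ in kept]

b0, b1, r_squared = least_squares(hours, income)
print(f"intercept = {b0:.0f}, slope = {b1:.0f}, R^2 = {r_squared:.3f}")
```

Under that assumption the estimates round to an intercept of about -2461, a slope of about 297, and an R² of about 0.311, in line with the values reported with the chart. The slope reads as roughly 297 dollars of additional summer income per extra hour worked per week.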
Outliers
• Rare, extreme values may distort the
outcome.
– Could be an error.
– Could be a very important observation.
• Outlier (rule of thumb): a value more than 3 standard
deviations from the mean.
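As a rough illustration of the 3 standard deviation rule, the sketch below applies it to the hours-per-week column of the summer income table. It reads the rule as applying to a single variable and uses the sample standard deviation; both choices are assumptions, and `flag_outliers` is a hypothetical helper name.

```python
from statistics import mean, stdev

# Hours per week, read row by row from the summer income table.
hours = [
    38, 35, 50, 37.5, 15, 37, 30, 35, 50, 30, 38, 45, 50, 4,
    20, 37.5, 45, 25, 50, 46, 20, 30, 35, 200, 30, 200, 43, 30,
]


def flag_outliers(values, threshold=3.0):
    """Return the values lying more than `threshold` sample SDs from the mean."""
    centre = mean(values)
    spread = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [v for v in values if abs(v - centre) > threshold * spread]


outliers = flag_outliers(hours)
print(outliers)
```

On these values only the two entries of 200 are flagged. Whether such values are recording errors or important observations still has to be judged case by case.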
[Figure: GPA vs. Time Online. Scatter plot of Time Online (vertical axis, 0 to 12) against GPA (horizontal axis, 50 to 100).]
[Figure: GPA vs. Time Online. Scatter plot of Time Online (vertical axis, 0 to 9) against GPA (horizontal axis, 50 to 100).]
[Figure: Scatter plot of regular gasoline prices, Regina, cents per litre (vertical axis, 0 to 160) against Crude Oil Price Index, 1997=100 (horizontal axis, 0 to 600).]
Correlation = 0.8703
Source: CANSIM II Database (Vector v1576530 and v735048
respectively)
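As a side calculation, suppose a simple linear regression with an intercept were fitted to these two series (the slide itself reports only the correlation). Its R² would then equal the squared sample correlation r:

```latex
R^2 = r^2 = (0.8703)^2 \approx 0.757
```

On that assumption, about three quarters of the variation in Regina gasoline prices would be associated with variation in the crude oil price index.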
[Figure: U-Shaped Relationship. Scatter plot of Y (vertical axis, 0 to 12) against X (horizontal axis, 0 to 12).]
Correlation = +0.12
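A small sketch of why a U-shaped relationship can produce a correlation near zero. The data here are synthetic, not the points plotted on the slide: y is an exact quadratic function of x, chosen to be symmetric about x = 6. The function name `correlation` and the values are illustrative assumptions.

```python
from math import sqrt


def correlation(xs, ys):
    """Sample (Pearson) correlation coefficient between xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


# Hypothetical U-shaped data: y depends on x exactly, but not linearly.
xs = list(range(0, 13))
ys = [(x - 6) ** 2 / 3 for x in xs]

r = correlation(xs, ys)
print(f"correlation = {r:.4f}")
```

Because the falling left half and the rising right half cancel out, the correlation in this symmetric example is zero (up to rounding) even though y is completely determined by x. Correlation, like the linear regression model, measures only the straight-line component of a relationship.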
Next Wednesday, November 12
• Least squares method (ASW, 12.2)
• Goodness of fit (ASW, 12.3)
• Assumptions of model (ASW, 12.4)
• Assignment 5 will be available on UR Courses
by late afternoon Friday, November 7.
• No class on Monday, November 10 but I will
be in the office, CL247, from 2:30 – 4:00 p.m.