Transformations to Achieve Linearity

Created by Mr. Hanson
Objectives

Course Level Expectations
◦ CLE 3136.2.3 Explore bivariate data

Check for Understanding
(Formative/Summative Assessment)
◦ 3136.2.7 Identify trends in bivariate data; find
functions that model the data and that
transform the data so that they can be
modeled.
Common Models for Curved Data
1. Exponential Model
y = a b^x
The variable is in the exponent.
2. Power Model
y = a x^b
The variable is the base and b is its power.
Linearizing Exponential Data
• Accomplished by taking ln(y).
• To illustrate, we can take the logarithm of both sides of the model.
y = a b^x
ln(y) = ln(a b^x)
ln(y) = ln(a) + ln(b^x)
ln(y) = ln(a) + ln(b) x
This is a linear model of the form A + BX, with A = ln(a) and B = ln(b), because ln(a) and ln(b) are constants.
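The derivation above can be checked numerically. The following is a minimal sketch, assuming made-up values a = 3 and b = 1.5 (they are not from the slides): for equally spaced x, the values of ln(y) should rise by the same amount each step, and that amount is ln(b).

```python
import math

# Hypothetical exponential data: y = a * b^x (a and b chosen only for illustration).
a, b = 3.0, 1.5
xs = [0, 1, 2, 3, 4, 5]
ys = [a * b ** x for x in xs]

# Transform only y: the points (x, ln y) should fall on a straight line.
ln_ys = [math.log(y) for y in ys]

# For equally spaced x, a straight line means constant step-to-step differences.
diffs = [ln_ys[i + 1] - ln_ys[i] for i in range(len(ln_ys) - 1)]

intercept = ln_ys[0]  # A = ln(a), because the first x is 0
slope = diffs[0]      # B = ln(b), because x increases by 1 each step

print("A =", intercept, "ln(a) =", math.log(a))
print("B =", slope, "ln(b) =", math.log(b))
```

The intercept and slope printed here should agree with ln(a) and ln(b) up to rounding.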
Linearizing the Power Model
• Accomplished by taking the logarithm of both x and y.
• Again, we can take the logarithm of both sides of the model.
y = a x^b
ln(y) = ln(a x^b)
ln(y) = ln(a) + ln(x^b)
ln(y) = ln(a) + b ln(x)
This is again of the form A + BX, with A = ln(a), B = b, and X = ln(x).
Note that this time the logarithm remains attached to both y AND x.
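A quick numerical check of the power-model result, again as a sketch with made-up values (a = 2 and b = 3 are assumptions for illustration): the slope between any two points of (ln x, ln y) should equal b, and the intercept should equal ln(a).

```python
import math

# Hypothetical power data: y = a * x^b (a and b chosen only for illustration).
a, b = 2.0, 3.0
xs = [1, 2, 4, 8, 16]
ys = [a * x ** b for x in xs]

# Transform BOTH variables.
ln_xs = [math.log(x) for x in xs]
ln_ys = [math.log(y) for y in ys]

# Slope between consecutive transformed points; a straight line gives the same value every time.
slopes = [
    (ln_ys[i + 1] - ln_ys[i]) / (ln_xs[i + 1] - ln_xs[i])
    for i in range(len(xs) - 1)
]

intercept = ln_ys[0]  # A = ln(a), because ln(x) = 0 at the first point (x = 1)

print("slopes:", slopes)
print("A =", intercept, "ln(a) =", math.log(a))
```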
Why Should We Linearize Data?
Much of bivariate data analysis is built on linear models. By linearizing non-linear data, we can assess the fit of non-linear models using linear tools such as the LSRL, residual plots, and r-squared.
In other words, we don't have to invent new procedures for non-linear data.

HOORAY!!
Procedures for testing models
Step 1
• Inspect Data.
• If it is non-linear, you should test both exponential and power
models.
Step 2
• Transform Data.
• Try exponential model first, since it requires taking only one
logarithm.
Step 3
• Inspect Transformed Data.
• Pay close attention to the residual plot and the linear correlation coefficient.
Step 4
• Repeat with the other model.
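The four steps can be sketched in code. This is an illustration rather than the calculator procedure from the slides: the helper names `lsrl` and `compare_models` are hypothetical, and the data are generated from a made-up exponential rule so the outcome is predictable.

```python
import math

def lsrl(xs, ys):
    """Return (intercept, slope, r_squared) of the least-squares line ys ~ A + B*xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope, sxy ** 2 / (sxx * syy)

def compare_models(xs, ys):
    """Fit both linearized models and return r^2 for each (needs x > 0 and y > 0)."""
    ln_xs = [math.log(x) for x in xs]
    ln_ys = [math.log(y) for y in ys]
    _, _, r2_exp = lsrl(xs, ln_ys)      # exponential model: (x, ln y)
    _, _, r2_pow = lsrl(ln_xs, ln_ys)   # power model: (ln x, ln y)
    return {"exponential": r2_exp, "power": r2_pow}

# Hypothetical data generated from y = 2 * 1.3^x.
xs = list(range(1, 11))
ys = [2 * 1.3 ** x for x in xs]

result = compare_models(xs, ys)
print(result)
```

Because the data were built from an exponential rule, the exponential transformation should give r-squared essentially equal to 1, with the power transformation somewhat lower. Residual plots still need to be inspected separately.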
Example: Starbucks Growth
Starbucks New Stores
This table represents the number of new Starbucks stores from 1984 to 2004.
Put the data in your calculator:
Year (coded as years since 1900) in L1
Stores in L2
year   stores   ln_year   ln_stores
84     1        4.43082   0
87     15       4.46591   2.70805
88     18       4.47734   2.89037
89     22       4.48864   3.09104
90     29       4.49981   3.3673
91     32       4.51086   3.46574
92     49       4.52179   3.89182
93     107      4.5326    4.67283
94     153      4.54329   5.03044
95     251      4.55388   5.52545
96     339      4.56435   5.826
97     397      4.57471   5.98394
98     474      4.58497   6.16121
99     249      4.59512   5.51745
100    1366     4.60517   7.21964
101    1208     4.61512   7.09672
102    1177     4.62497   7.07072
103    1339     4.63473   7.19968
104    1112     4.64439   7.01392
Construct scatter plot.
[Scatter plot: stores vs. year]
Note that the data appear to be
non-linear.
Transformation time
Transform the data
Let L3 = ln (L1)
Let L4 = ln (L2)
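For readers working outside the calculator, the same transformation can be sketched in Python. The list names L1 to L4 mirror the calculator lists; the (year, stores) pairs are transcribed from the table in this example, with year coded as it appears there.

```python
import math

# (year, stores) pairs transcribed from the Starbucks table.
data = [
    (84, 1), (87, 15), (88, 18), (89, 22), (90, 29), (91, 32), (92, 49),
    (93, 107), (94, 153), (95, 251), (96, 339), (97, 397), (98, 474),
    (99, 249), (100, 1366), (101, 1208), (102, 1177), (103, 1339), (104, 1112),
]

L1 = [year for year, _ in data]      # year
L2 = [stores for _, stores in data]  # stores
L3 = [math.log(v) for v in L1]       # ln_year
L4 = [math.log(v) for v in L2]       # ln_stores

for year, stores, ln_year, ln_stores in zip(L1, L2, L3, L4):
    print(year, stores, round(ln_year, 5), round(ln_stores, 5))
```

The printed ln_year and ln_stores columns should agree with the table to the displayed precision.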
Redraw scatterplot
Determine new LSRL
[Scatter plots with LSRL: Original vs. Exponential (x, ln y); Original vs. Power (ln x, ln y)]
Remember: Inspect Residual Plots!!
[Residual plots: Exponential and Power]
NOTE: Since both residual plots show curved patterns, neither model is
completely appropriate, but both are improvements over the basic linear
model.
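A curved residual pattern can also be spotted without a plot by looking at the signs of the residuals. The sketch below uses made-up data (y = 2 * 1.5^x, an assumption for illustration) and deliberately fits a straight line to it. The helper `lsrl` is a hypothetical name, not a calculator command.

```python
def lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares line ys ~ A + B*xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical curved data, fitted with a straight line on purpose.
xs = list(range(10))
ys = [2 * 1.5 ** x for x in xs]
intercept, slope = lsrl(xs, ys)

# Residual = observed y minus predicted y.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# One character per point: "+" above the line, "-" below it.
signs = "".join("+" if r > 0 else "-" for r in residuals)
print(signs)
```

Long runs of the same sign (positive at both ends, negative in the middle) are the numerical counterpart of a curved residual plot. A good linear fit would show signs scattered with no obvious pattern.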
R-squared (A.K.A. Tiebreaker)
If plots are similar, the decision should be
based on the value of r-squared.
Power has the higher value (r^2 = 0.94), so it is the more appropriate model for these data (given your choices of models in this course).
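The r-squared comparison can be sketched in Python using the Starbucks values as transcribed from the table. The helper `lsrl` is a hypothetical name. The fitted numbers depend on exactly which rows are included, so they need not reproduce the values quoted on the slides.

```python
import math

def lsrl(xs, ys):
    """Return (intercept, slope, r_squared) of the least-squares line ys ~ A + B*xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sxy ** 2 / (sxx * syy)

# (year, stores) pairs transcribed from the Starbucks table.
data = [
    (84, 1), (87, 15), (88, 18), (89, 22), (90, 29), (91, 32), (92, 49),
    (93, 107), (94, 153), (95, 251), (96, 339), (97, 397), (98, 474),
    (99, 249), (100, 1366), (101, 1208), (102, 1177), (103, 1339), (104, 1112),
]
years = [year for year, _ in data]
ln_years = [math.log(year) for year in years]
ln_stores = [math.log(stores) for _, stores in data]

a_exp, b_exp, r2_exp = lsrl(years, ln_stores)     # exponential: (x, ln y)
a_pow, b_pow, r2_pow = lsrl(ln_years, ln_stores)  # power: (ln x, ln y)

print("exponential: ln y =", a_exp, "+", b_exp, "x,    r^2 =", r2_exp)
print("power:       ln y =", a_pow, "+", b_pow, "ln x, r^2 =", r2_pow)
```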
Writing equation for model
Once a model has been chosen, the LSRL
must be converted to the non-linear
model.
This is done using inverses.
In practice, you would only need to
convert the best fit model.
Conversion to Exponential
LSRL for transformed data (x, ln y)
ln y = -20.7 + 0.2707x
e^(ln y) = e^(-20.7 + 0.2707x)
e^(ln y) = e^(-20.7) (e^(0.2707x))
y = e^(-20.7) (e^(0.2707))^x
Transformed Linear Model: A + BX, with A = -20.7 and B = 0.2707
Exponential Model: a b^x, with a = e^(-20.7) and b = e^(0.2707)
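The conversion back to the exponential model can be sketched numerically. The coefficients -20.7 and 0.2707 are the LSRL values quoted in this section; the function name `predict_exponential` and the input x = 100 (one of the coded year values in the table) are chosen here only for illustration.

```python
import math

# Coefficients of the transformed LSRL quoted in this section: ln y = A + B x.
A, B = -20.7, 0.2707

# Undo the logarithm: y = e^(A + Bx) = e^A * (e^B)^x, so a = e^A and b = e^B.
a = math.exp(A)
b = math.exp(B)

def predict_exponential(x):
    """Predicted y from the exponential model y = a * b^x."""
    return a * b ** x

x = 100  # hypothetical input: a coded year value from the table
print("a =", a)
print("b =", b)
print("predicted stores at x = 100:", predict_exponential(x))
```

Evaluating a * b^x and e^(A + Bx) at the same x should give the same prediction, which is a useful check that the conversion was done correctly.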
Conversion to Power
LSRL for transformed data (ln x, ln y)
ln y = -102.4 + 23.6 ln x
e^(ln y) = e^(-102.4 + 23.6 ln x)
e^(ln y) = e^(-102.4) (e^(23.6 ln x))
y = e^(-102.4) x^23.6
Transformed Linear Model: A + BX, with A = -102.4, B = 23.6, and X = ln x
Power Model: a x^b, with a = e^(-102.4) and b = 23.6
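The same check works for the power model. This sketch uses the LSRL coefficients quoted in this section (-102.4 and 23.6); the function name `predict_power` and the input x = 100 are assumptions made for illustration.

```python
import math

# Coefficients of the transformed LSRL quoted in this section: ln y = A + B ln x.
A, B = -102.4, 23.6

# Undo the logarithm: y = e^(A + B ln x) = e^A * x^B, so a = e^A and b = B.
a = math.exp(A)
b = B

def predict_power(x):
    """Predicted y from the power model y = a * x^b."""
    return a * x ** b

x = 100  # hypothetical input: a coded year value from the table
print("a =", a)
print("b =", b)
print("predicted stores at x = 100:", predict_power(x))
```

Note that a is extremely small while x^b is extremely large, so rounding the coefficients can noticeably change the prediction.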
View non-linear model with non-linear data
Assignment

U.S. Population Handout
◦ Rubric for assignment
Need Help? Email me at [email protected]