Transforming Relationships
AP Statistics
Practice of Statistics
Section 4.1
What You’ll Learn
• Recognize when the relationship between two
variables is either an exponential relationship
or a power relationship

• Perform the appropriate transformation to
“linearize” the data, find the LSRL on the
transformed points, and “untransform” to find a
model for the original data
Not everything is Linear!
We’ve looked at several sets of data in which the
relationships are linear in nature
What about those relationships that exhibit a
different “nonlinear” pattern?
Consider for a moment gypsy moths.
An outbreak of gypsy moths in Massachusetts from
1978 to 1981 resulted in many acres of defoliated land.
The acreages are listed in the following table.
Gypsy Moths
The data and graph
depict the number of
acres defoliated by gypsy
moths in Massachusetts
between 1978 and 1981.
Calculator:
Create a scatter plot
L1: Years
L2: Acres
Stat Plot, On,
scatterplot, zoom9
Year    Acres of defoliated land
1978    63042
1979    226260
1980    907075
1981    2826095
So, this doesn’t look too bad! Let’s try a linear
regression on the data, remembering to check both the
correlation coefficient and the residual plot.
Calculator: Stat Calc 4
Store RegEQ, VARS, Y-VARS, Y1
Calculate, Graph (LSRL appears)
OR
Stat Calc 4
Y=, VARS, 5, EQ, 1RegEQ
Dependent Variable: Acres
Independent Variable: Year
Acres = -1.7746007E9 + 896997.4 (Year)
Sample size: 4
R (correlation coefficient) = 0.9136
R-sq = 0.8347045
Estimate of error standard deviation: 631139.44
Well, a visual of the line doesn’t look
too bad, and that’s a high correlation
coefficient.
(Remember, though, that “r” is sometimes
deceptive, so be sure to check the
residuals!)
The Residuals
• A check of the residuals indicates that a linear model is NOT
appropriate!
• (Notice the curved, parabolic pattern in the plot, visible even with
only 4 data points!)
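For anyone working away from the calculator, here is a minimal Python sketch of the same check. The helper name `lsrl` and the variable names are our own choices, not part of the original notes. It fits the least-squares line to the original data and lists the residuals so the curved pattern shows up in their signs.

```python
# Gypsy moth data from the table above.
years = [1978, 1979, 1980, 1981]
acres = [63042, 226260, 907075, 2826095]

def lsrl(xs, ys):
    """Return (intercept, slope, r) for the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return intercept, slope, r

intercept, slope, r = lsrl(years, acres)

# residual = observed - predicted
residuals = [y - (intercept + slope * x) for x, y in zip(years, acres)]

print(f"slope = {slope:.1f}, r = {r:.4f}")
for x, e in zip(years, residuals):
    print(x, round(e))
```

Residuals that start positive, dip negative in the middle, and end positive again are the curved pattern that tells us the straight line is missing something.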
So, what type of relationship is this?
• Remember from linear regression that when the
relationship is linear, the response variable
increases (or decreases) by a constant amount.
(Add or subtract the same number each time)
Years Since 1977    Acres of defoliated land    Difference in Acres
1                   63042
2                   226260                      163218
3                   907075                      680815
4                   2826095                     1919020
•Notice that the difference between number of acres is not constant
•With this in mind and the problem with the residual plot, let’s consider
another type of relationship.
Exponential Relationships
 In an exponential relationship, the response variable increases
by a fixed percentage of the previous total.
 In other words, we should be able to multiply the previous value
by some constant to get the next one.
So, let’s check out this possibility by looking at the ratio of each
year’s acreage to the previous year’s (all of our intervals are
1-year intervals).
Years Since 1977    Acres of defoliated land    Ratio (Next/Prev)
1                   63042
2                   226260                      3.5890
3                   907075                      4.0090
4                   2826095                     3.1156
Ratio: 226260/63042 = 3.5890
•Notice that although the ratio is not exactly the same each time (we
wouldn’t expect it to be exact with “real” data), there
does appear to be a pretty consistent ratio value.
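The two checks (constant difference versus constant ratio) can be sketched in a few lines of Python. The variable names are our own; the numbers are the acreages from the tables above.

```python
acres = [63042, 226260, 907075, 2826095]

# Linear growth would show roughly constant differences between neighbors.
differences = [nxt - prev for prev, nxt in zip(acres, acres[1:])]

# Exponential growth would show roughly constant ratios between neighbors.
ratios = [nxt / prev for prev, nxt in zip(acres, acres[1:])]

print("differences:", differences)
print("ratios:", [round(q, 4) for q in ratios])
```

The differences grow by a large factor from one year to the next, while the ratios stay in the same neighborhood, which is what points us toward an exponential model.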
So How Do We Create the Model?
• If the relationship is an exponential one, we can use
a mathematical transformation to “linearize” the
data, find the LSRL of the transformed data,
then “untransform” to find the model that will
fit the original data.
Ok, so let’s take it step by step
Finding the Model
• Step 1: Use a mathematical transformation to “linearize” the data (create a
new data set whose relationship is linear)
If the original data is exponential, find the logarithm
(either common log or natural log) of each of the response
values.
When working with years it is also helpful to “code” the year data so
our calculators can handle the values (most computer programs are
capable of creating models using the full year). To do this we will take
each year and subtract 1977 (this way all of our values are > 0).
Calculator:
Stat Edit
L1: 1, 2, 3, 4
L3, up to select,
Log(L2), Enter
Year    Years Since 1977    Acres of defoliated land    Log10(acres)
1978    1                   63042                       4.7996
1979    2                   226260                      5.3546
1980    3                   907075                      5.9576
1981    4                   2826095                     6.4512
Finding the Model
Now, let’s check a scatterplot of the transformed data
Calculator: Stat Plot, On, Scatter L1,L3 Graph, Zoom9
Notice the change in the pattern from our original data to the
transformed data. The logarithm transformation really
“straightened our data”. (Using the natural logarithm would have
had the same effect; our values would just have been different.)
Finding the Model
• Step 2: Find the LSRL for the transformed data
(remember to check the “r” and the residuals!)
Calculator: Stat Calc 4, L1, L3 Enter
2nd Zero, DiagnosticOn
Dependent Variable: log10(Acres)
Independent Variable: Year - 1977
log10(Acres) = 4.2513404 + 0.5557706 (Year - 1977)
Sample size: 4
R (correlation coefficient) = 0.9993
R-sq = 0.9985874
Estimate of error standard deviation: 0.033050213
This model looks promising, but remember to CHECK THE RESIDUALS!!!
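Steps 1 and 2 can also be sketched in Python for comparison with the calculator output. The helper `lsrl` and the variable names are our own, not part of the original notes. The sketch codes the years, takes the common log of the acreages, and fits the LSRL to the transformed points.

```python
import math

years_since_1977 = [1, 2, 3, 4]
acres = [63042, 226260, 907075, 2826095]

# Step 1: transform the response with the common (base 10) logarithm.
log_acres = [math.log10(a) for a in acres]

def lsrl(xs, ys):
    """Return (intercept, slope, r) for the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return intercept, slope, r

# Step 2: LSRL on (years since 1977, log10(acres)).
intercept, slope, r = lsrl(years_since_1977, log_acres)

print([round(v, 4) for v in log_acres])
print(f"log10(Acres) = {intercept:.7f} + {slope:.7f} (Year - 1977), r = {r:.4f}")
```

The coefficients should agree with the regression output above to several decimal places; small differences in the last digits would only reflect rounding.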
Check the Residual Plot
residual = $y - \hat{y}$
Calculator:
Stat Edit L4, up, select
Enter LSRL equation,
4.2513404 + 0.5557706 (L1)
Enter, this populates the
y-hat data in L4.
Stat Edit L5, up select
Enter Residual equation,
L3 – L4, Enter.
Remember, L3 is the new (log) transformed y, and L4 is y-hat
Stat Plot, On, Scatter, L1 L5 , Graph, Zoom9
A check of the residuals confirms that an exponential
model is appropriate. (No pattern is present now).
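The list steps above translate almost line for line into Python. In this sketch the names `l1` through `l5` are chosen to mirror the calculator lists; they are our own labels, and the coefficients are the ones reported in the regression output.

```python
import math

l1 = [1, 2, 3, 4]                          # years since 1977
l2 = [63042, 226260, 907075, 2826095]      # acres
l3 = [math.log10(a) for a in l2]           # transformed y: log10(acres)

# y-hat from the LSRL on the transformed data
l4 = [4.2513404 + 0.5557706 * x for x in l1]

# residual = y - y-hat, using the transformed y
l5 = [y - y_hat for y, y_hat in zip(l3, l4)]

for x, e in zip(l1, l5):
    print(x, round(e, 4))
```

Plotting `l5` against `l1` gives the residual plot. The residuals are small compared with the log values themselves and should sum to roughly zero.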
“Untransforming” to find the model for
our original data
★ Remember that our goal was to find a model that we could
use for prediction of the number of defoliated acres of land for a
given year.
★ The linear model we have would predict the common
logarithm of acres. In order for our model to be useful, we
need to reverse the transformation to create the model that fits
the original data.
★ Although many transformations are easier to
“untransform” after evaluating, we can use the properties
of logarithms with both exponential and power models (we’ll look
at power models next) to find the model for our original data.
Properties of Logarithms
• Before we try to “untransform”, let’s review the
properties of logarithms you learned in Algebra
(yes, you really did learn these!)
$\log_b(xy) = \log_b x + \log_b y$
(Addition rule)
$\log_b(x^m) = m \log_b x$
(Power rule)
$\log_b(b^n) = n$
$10^{\log_{10} n} = n$
(Same base)
$\log_b(x/y) = \log_b x - \log_b y$
(Subtraction rule)
Since any subtraction can be changed to an addition equation,
we will not use this last rule much!
Rewriting Log/Exponential Forms
Also recall rewriting from
Exponential to Logarithmic form:
$b^x = a$
$\log_b a = x$
“log base answer = exponent”
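If you would like to convince yourself that these rules hold, a quick numeric spot check in Python is one way to do it. The values of `x`, `y`, `m`, and `n` below are arbitrary positive numbers picked only for illustration; a spot check like this is not a proof.

```python
import math

# Arbitrary positive values chosen only for illustration.
x, y, m, n = 8.0, 2.5, 3.0, 4.2

addition_rule = math.isclose(math.log10(x * y), math.log10(x) + math.log10(y))
power_rule = math.isclose(math.log10(x ** m), m * math.log10(x))
same_base_log = math.isclose(math.log10(10 ** n), n)
same_base_exp = math.isclose(10 ** math.log10(n), n)
subtraction_rule = math.isclose(math.log10(x / y), math.log10(x) - math.log10(y))

# Rewriting between forms: 2^3 = 8 says the same thing as log base 2 of 8 = 3.
rewrite_ok = math.isclose(math.log(8, 2), 3)

print(addition_rule, power_rule, same_base_log, same_base_exp,
      subtraction_rule, rewrite_ok)
```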
Review Exponent Rules
Homework:
Notebook, page 69 and 70
Day 2:
UNTRANSFORMING Linearized Data
Notes: Page 73, 74
“Untransforming” exponential expressions
• An exponential function takes the form:
$y = ab^x$, where $a$ and $b$ are constants
• (This is the form we want to end up with)
So, let’s get started
$\log_{10}(\text{Acres}) = 4.2513404 + 0.5557706\,(\text{Year} - 1977)$
Linear regression of the transformed data
$10^{\log_{10}(\text{Acres})} = 10^{4.2513404 + 0.5557706\,(\text{Year} - 1977)}$
Raise both sides using power of 10 (same base)
$\text{Acres} = (10^{4.2513404})\,(10^{0.5557706\,(\text{Year} - 1977)})$
Same base law and multiplication law for exponents
$\text{Acres} = 17837.7634\,(3.5956^{\text{Year} - 1977})$
Simplify the constants
This is now in the form of $y = ab^x$, where $a = 17837.7634$ and $b = 3.5956$
Notice that “b” is approximately the average of the ratios (next/prev)
we calculated when we began looking for a model.
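The “untransforming” arithmetic can be sketched in Python as well. The function name `predict_acres` is our own; the intercept and slope are the ones from the regression on the transformed data.

```python
intercept = 4.2513404   # from the LSRL on log10(acres)
slope = 0.5557706

# Acres = 10^intercept * (10^slope)^(Year - 1977), i.e. y = a * b^x
a = 10 ** intercept
b = 10 ** slope

def predict_acres(year):
    """Predicted acres of defoliated land from the exponential model."""
    return a * b ** (year - 1977)

actual = {1978: 63042, 1979: 226260, 1980: 907075, 1981: 2826095}

print(f"a = {a:.4f}, b = {b:.4f}")
for year, observed in actual.items():
    print(year, observed, round(predict_acres(year)))
```

Comparing the predicted column with the observed acreages is a numeric version of plotting the model on the original data. Predictions for years far outside 1978 to 1981 should be treated with caution.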
So, does it fit our original data?
• Since our original goal was to find a model that would allow
us to predict the number of acres of defoliated land if we
knew the year, we need to check to see if our model actually
fits the data.
Scatter Plot: Gypsy Moth Outbreak
[Figure: Acres plotted against Years Since 1977, with the fitted model curve]
The model looks pretty
good, but as with any model
we need to use caution when
predicting outside our
original data range.
Power Models
• Another important transformation used in modeling
is the power model.
Power models have the form
$y = ax^b$, where $a$ and $b$ are constants
We can find an appropriate power model by taking
the logarithms for both the response and explanatory
variables, finding the linear regression for the
transformed data, then using the laws of logarithms
and exponents to “untransform”
Let’s look at an example
Fishing Tournament
• In a fishing tournament that you are in charge of, you need to
find a way to record the weight of each fish caught without
destroying or killing the fish.
• Since it is easier to measure the length of a fish than
its weight, we must find a way to convert the length to
weight.
• The local marine research lab has been gracious enough to
provide you with the data for the average length and weight
at different ages for Atlantic Ocean rockfish, which model most
fish species growing under normal feeding conditions.
The Data
Age (yr)    Length (cm)    Weight (g)
1           5.2            2
2           8.5            8
3           11.5           21
4           14.3           38
5           16.8           69
6           19.2           117
7           21.3           148
8           23.3           190
9           25.0           264
10          26.7           293
11          28.2           318
12          29.6           371
13          30.8           455
14          32.0           504
15          33.0           518
16          34.0           537
17          34.9           651
18          36.4           719
19          37.1           726
20          37.7           810
•Since length is one-dimensional and weight is
three-dimensional, we should be able to find a
reasonable model using a power model (the residuals
for a regression on the original data confirm
that the variables are NOT linearly related, but
we already knew that!)
•As before, we first need to transform our data,
but this time we have to transform both
length and weight
Transforming the Data
Age (yr)    Length (cm)    Log10(length)    Weight (g)    Log10(weight)
1           5.2            .7160            2             .3010
2           8.5            .9294            8             .9031
3           11.5           1.0607           21            1.3222
4           14.3           1.1553           38            1.5798
5           16.8           1.2253           69            1.8388
6           19.2           1.2833           117           2.0682
7           21.3           1.3284           148           2.1703
8           23.3           1.3674           190           2.2788
9           25.0           1.3979           264           2.4216
10          26.7           1.4265           293           2.4669
11          28.2           1.4502           318           2.5024
12          29.6           1.4713           371           2.5694
13          30.8           1.4886           455           2.6580
14          32.0           1.5051           504           2.7024
15          33.0           1.5185           518           2.7143
16          34.0           1.5315           537           2.7300
17          34.9           1.5428           651           2.8136
18          36.4           1.5611           719           2.8567
19          37.1           1.5694           726           2.8609
20          37.7           1.5763           810           2.9085
This scatterplot indicates that a
linear regression on the
logarithms of both variables is
certainly one to consider.
Linear Regression on the transformed data
Simple linear regression results:
Dependent Variable: log10(Weight(g))
Independent Variable: log10(Length(cm))
log10 (Weight(g)) = -1.8993973 + 3.049418 log10 (Length(cm))
Sample size: 20
R (correlation coefficient) = 0.9993
R-sq = 0.9985228
A check of the correlation coefficient is
certainly promising (r = .9993), the
scatterplot of the transformed data
indicates the line fits very well, and most
importantly, look at those residuals!
Yes, statisticians get very excited when
they see residuals that look that good!
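Here is a minimal Python sketch of the same regression, for anyone who wants to reproduce it without the calculator. The helper `lsrl` and the variable names are our own. Both variables are transformed with the common log before fitting.

```python
import math

length_cm = [5.2, 8.5, 11.5, 14.3, 16.8, 19.2, 21.3, 23.3, 25.0, 26.7,
             28.2, 29.6, 30.8, 32.0, 33.0, 34.0, 34.9, 36.4, 37.1, 37.7]
weight_g = [2, 8, 21, 38, 69, 117, 148, 190, 264, 293,
            318, 371, 455, 504, 518, 537, 651, 719, 726, 810]

# For a power model, take logs of BOTH the explanatory and response variables.
log_length = [math.log10(v) for v in length_cm]
log_weight = [math.log10(v) for v in weight_g]

def lsrl(xs, ys):
    """Return (intercept, slope, r) for the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return intercept, slope, r

intercept, slope, r = lsrl(log_length, log_weight)
residuals = [y - (intercept + slope * x) for x, y in zip(log_length, log_weight)]

print(f"log10(Weight) = {intercept:.4f} + {slope:.4f} log10(Length), r = {r:.4f}")
```

A slope close to 3 fits the idea that weight grows roughly like the cube of length. The `residuals` list can be plotted against `log_length` to look for any leftover pattern.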
“Untransforming” a power model
$\log_{10}(\text{Weight}) = -1.8993973 + 3.049418\,\log_{10}(\text{Length})$
Linear equation of the transformed data
$10^{\log_{10}(\text{Weight})} = 10^{-1.8993973 + 3.049418\,\log_{10}(\text{Length})}$
Raise both sides using a base of 10
$\text{Weight} = 10^{-1.8993973}\,\left(10^{3.049418\,\log_{10}(\text{Length})}\right)$
Same base and multiplication law for exponents
$\text{Weight} = 10^{-1.8993973}\,\left(10^{\log_{10}(\text{Length}^{3.049418})}\right)$
Power rule for logarithms
$\text{Weight} = 10^{-1.8993973}\,(\text{Length})^{3.049418}$
Same base
$\text{Weight} = 0.01261\,(\text{Length})^{3.049418}$
Simplify constants
(Weight in g, Length in cm)
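As a sketch of how the finished power model could be used at the tournament, the short Python function below turns a measured length into an estimated weight. The name `predict_weight` is our own; the constants come from the regression on the transformed data.

```python
intercept = -1.8993973   # from the LSRL on the log-log data
slope = 3.049418

# Weight = 10^intercept * Length^slope, i.e. y = a * x^b
a = 10 ** intercept
b = slope

def predict_weight(length_cm):
    """Estimated weight in grams for a fish of the given length in cm."""
    return a * length_cm ** b

print(f"a = {a:.5f}, b = {b}")
for length in (10, 20, 30):
    print(length, "cm ->", round(predict_weight(length), 1), "g")
```

Lengths well outside the 5.2 cm to 37.7 cm range of the data would be extrapolation, so those estimates deserve less trust.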
Scatter Plot
Last check: plot the new model on
the original data.
[Figure: Atlantic Ocean Rockfish, Weight plotted against Length, with the fitted model curve]
Looks like we’ve got a model that will
be very useful for estimating the
weight of a fish if we know its length!
Are there Other Possibilities?
• There are many other possibilities to
transform data in order to find a model.
• If neither an exponential nor a power model is
appropriate, you may try:
– Square the response or explanatory variable
– Take the square root of either variable
– Take the reciprocal of either variable
• The possibilities are endless, but for now we will
concentrate mostly on either an exponential or
power model.
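To see how one of these other transformations can straighten a pattern, here is a small Python sketch. The data are made up purely for illustration (y is exactly x squared), and the helper `correlation` is our own.

```python
# Made-up data for illustration only: y is exactly x squared.
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

def correlation(us, vs):
    """Return the correlation coefficient r between two lists."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    suu = sum((u - mean_u) ** 2 for u in us)
    svv = sum((v - mean_v) ** 2 for v in vs)
    suv = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    return suv / (suu * svv) ** 0.5

r_raw = correlation(xs, ys)                          # no transformation
r_sqrt = correlation(xs, [y ** 0.5 for y in ys])     # square root of the response
r_recip = correlation(xs, [1 / y for y in ys])       # reciprocal of the response

print(round(r_raw, 4), round(r_sqrt, 4), round(r_recip, 4))
```

For this made-up data the square root is the transformation that makes the relationship a straight line. With real data you would still want to look at the residual plot and not rely on r alone.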
Transforming on the TI
• There are a couple of different ways to find
both an exponential and power regression
model on your TI-calculator
• Using lists to transform
• Using the built in regression models
Using lists to transform
• We’ll use the Gypsy Moth data first.
Enter in lists 1 & 2
L1: years since 1977
L2: acres of defoliated land
Take the common log of the
values in list 2 and put the new
values in list 3
L3: log (L2)
Now do a linear regression on
lists 1 & 3
You can check residuals just like
we did before to verify this
regression.
Now “untransform” as we did
before to get the exponential model.
Note: for a power model create
another list for the logarithm of the
explanatory variable and do the linear
regression on these two lists.
Using the Regression Models
• The TI family of calculators has both an exponential and power model built
into the stat calc menus.
• Create a list for the explanatory variable and one for the response variable
• From the home screen
– STAT
– CALC
– 0:ExpReg (or A:PwrReg) L1, L2
– The model does not need
untransforming
– The residuals created are the
residuals from the linear
regression on the
transformed data (yes, your
calculator actually transforms
the data, does a linear
regression, then untransforms)
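The transform, fit, untransform sequence described here can be sketched as two small Python functions. The names `exp_reg` and `pwr_reg` are our own and are only meant to echo the calculator menu items; this is an illustration of the idea, not the calculator's actual code.

```python
import math

def lsrl(xs, ys):
    """Return (intercept, slope) for the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def exp_reg(xs, ys):
    """Fit y = a * b^x by regressing log10(y) on x, then untransforming."""
    intercept, slope = lsrl(xs, [math.log10(y) for y in ys])
    return 10 ** intercept, 10 ** slope

def pwr_reg(xs, ys):
    """Fit y = a * x^b by regressing log10(y) on log10(x), then untransforming."""
    intercept, slope = lsrl([math.log10(x) for x in xs],
                            [math.log10(y) for y in ys])
    return 10 ** intercept, slope

# Gypsy moth data: years since 1977 and acres of defoliated land.
a, b = exp_reg([1, 2, 3, 4], [63042, 226260, 907075, 2826095])
print(f"Acres = {a:.4f} * {b:.4f}^(Year - 1977)")
```

For the gypsy moth data, `a` and `b` should come out close to the values we found by untransforming by hand.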
How to decide which model
• Creating mathematical models for real data involves
a lot of trial and error.
• One strategy:
– Try a linear model first (check the residuals)
– Then try an exponential model (check the residuals)
– Then try a power model (check the
residuals)
• If all residuals show a pattern, you can continue to
try different transformations or choose the one with
the best correlation
• Remember: no model is perfect, but some models are
useful. We wish to find a useful model.
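The trial-and-error strategy can be sketched in Python by fitting all three candidate models to the same data and comparing r. Here we use the rockfish data; the helper `correlation` and the variable names are our own. The correlation is only a numeric summary, so each candidate's residual plot still needs to be checked.

```python
import math

length_cm = [5.2, 8.5, 11.5, 14.3, 16.8, 19.2, 21.3, 23.3, 25.0, 26.7,
             28.2, 29.6, 30.8, 32.0, 33.0, 34.0, 34.9, 36.4, 37.1, 37.7]
weight_g = [2, 8, 21, 38, 69, 117, 148, 190, 264, 293,
            318, 371, 455, 504, 518, 537, 651, 719, 726, 810]

def correlation(us, vs):
    """Return the correlation coefficient r between two lists."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    suu = sum((u - mean_u) ** 2 for u in us)
    svv = sum((v - mean_v) ** 2 for v in vs)
    suv = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    return suv / (suu * svv) ** 0.5

log_length = [math.log10(v) for v in length_cm]
log_weight = [math.log10(v) for v in weight_g]

r_linear = correlation(length_cm, weight_g)      # y vs x
r_exp = correlation(length_cm, log_weight)       # log y vs x
r_power = correlation(log_length, log_weight)    # log y vs log x

print(f"linear: {r_linear:.4f}  exponential: {r_exp:.4f}  power: {r_power:.4f}")
```

For the rockfish data the power model gives the strongest correlation, which agrees with what we found above.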
Homework:
• Notebook, page 71, problem #1 only
• Handout “Practice Before Quiz 3.3”