Chapter 10: Re-expressing Data (Get it Straight)


Jami Copeland and Shriya Varma, Semester Project

Straightening Relationships

• To model the relationship between two variables with linear regression, the relationship between them must be linear

• When it is not, you can re-express the data using square roots, reciprocals, or logarithms

Goals of Re-expression

• Make the distribution of a variable more symmetric

• Make the spread of several groups (as seen in side-by-side boxplots) more alike

• Make the form of a scatterplot more nearly linear

• Make the scatter in a scatterplot spread out evenly rather than following a fan shape

The Ladder Of Powers

• The Ladder of Powers helps you decide how to re-express data

• It offers a range of options and indicates when each option should be used

• It also orders the effects of the re-expressions, from weakest to strongest

The Ladder of Powers (power, name, comment)

• 2, square of the data values: Try with unimodal distributions that are skewed to the left.

• 1, raw data: Data with positive and negative values and no bounds are less likely to benefit from re-expression.

• 1/2, square root of the data values: Counts often benefit from a square root re-expression.

• “0”, logarithm of the data values (we’ll use logarithms here): Measurements that cannot be negative often benefit from a log re-expression.

• -1/2, reciprocal square root: An uncommon re-expression, but sometimes useful.

• -1, reciprocal of the data: Ratios of two quantities (e.g., mph) often benefit from a reciprocal.
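The rungs of the ladder can be tried one after another in a few lines of Python. This is a minimal sketch, not part of the original slides: the helper name `re_express` and the sample values are made up for illustration, and the data are assumed to be strictly positive so that every rung is defined.

```python
import math

def re_express(values, power):
    """Re-express positive data values using one rung of the ladder of powers."""
    if power == 0:
        # The "0" rung stands for the logarithm, not for raising to the 0th power.
        return [math.log10(v) for v in values]
    return [v ** power for v in values]

# Hypothetical right-skewed sample, chosen only for illustration.
sample = [1, 4, 9, 16, 100]

# Walk down the ladder from the square to the reciprocal.
for power in (2, 1, 0.5, 0, -0.5, -1):
    print(power, [round(v, 3) for v in re_express(sample, power)])
```

Comparing the printed rows shows how each step down the ladder pulls in the large value (100) more strongly relative to the small ones.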

Plan B: Attack of the Logarithms

• If a stronger re-expression is needed, you can use logarithms

• When none of the data is zero or negative, logarithms can be used in different combinations

Plan B: Attack of the Logarithms (model name, axes, comment)

• Exponential, x-axis: x, y-axis: log(y). This model is the “0” power in the ladder approach, useful for values that grow by percentage increases.

• Logarithmic, x-axis: log(x), y-axis: y. A wide range of x-values, or a scatterplot descending rapidly at the left but leveling off toward the right, may benefit from trying this model.

• Power, x-axis: log(x), y-axis: log(y). The Goldilocks model: when one of the ladder’s powers is too big and the next is too small, this one may be just right.
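The three models differ only in which axis is re-expressed before a straight line is fitted. The sketch below illustrates this under stated assumptions: `fit_line`, `MODELS`, and `fit_model` are hypothetical helper names, base-10 logarithms are used, and the data are invented (y doubles at each step) so that the exponential model fits exactly.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Which axes to re-express for each model: (x transform, y transform).
MODELS = {
    "exponential": (lambda x: x, math.log10),
    "logarithmic": (math.log10, lambda y: y),
    "power": (math.log10, math.log10),
}

def fit_model(name, xs, ys):
    """Re-express both axes as the named model requires, then fit a line."""
    fx, fy = MODELS[name]
    return fit_line([fx(x) for x in xs], [fy(y) for y in ys])

# Made-up data that grows by 100% per step: y = 3 * 2**x.
xs = [1, 2, 3, 4, 5]
ys = [3 * 2 ** x for x in xs]
intercept, slope = fit_model("exponential", xs, ys)
print(intercept, slope)  # intercept is log10(3), slope is log10(2), up to rounding
```

All three models require positive values on whichever axis is logged, which matches the condition that none of the data is zero or negative.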

Problem 13: Planet distances and order

• Let’s look again at the pattern in the locations of the planets in our solar system seen in the given table.

– Use re-expressed data to create a model for the distance from the sun based on the planet’s position

– There is some debate among astronomers as to whether Pluto is truly a planet or actually a large member of the Kuiper Belt of comets and other icy bodies. Does your model suggest that Pluto may not belong in the planet group? Explain.

Planet     Position Number   Distance from Sun (million miles)
Mercury    1                 36
Venus      2                 67
Earth      3                 93
Mars       4                 142
Jupiter    5                 484
Saturn     6                 887
Uranus     7                 1784
Neptune    8                 2796
Pluto      9                 3666

Problem 13a

• Re-express the “Distance from the sun” data using logarithms and your calculator to find the new distances, log(y).

Planet     Position Number   Distance from Sun (million miles)   Log(y)
Mercury    1                 36                                  1.5563
Venus      2                 67                                  1.8261
Earth      3                 93                                  1.9685
Mars       4                 142                                 2.1523
Jupiter    5                 484                                 2.6848
Saturn     6                 887                                 2.9479
Uranus     7                 1784                                3.2514
Neptune    8                 2796                                3.4465
Pluto      9                 3666                                3.5642
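The log(y) column can be reproduced without a calculator. This short sketch assumes base-10 logarithms, which is what the tabulated values correspond to; the variable names are illustrative.

```python
import math

planets = ["Mercury", "Venus", "Earth", "Mars", "Jupiter",
           "Saturn", "Uranus", "Neptune", "Pluto"]
distances = [36, 67, 93, 142, 484, 887, 1784, 2796, 3666]  # million miles

# Re-express each distance as its base-10 logarithm, rounded to 4 decimals.
log_distances = [round(math.log10(d), 4) for d in distances]

for position, (planet, log_d) in enumerate(zip(planets, log_distances), start=1):
    print(position, planet, log_d)
```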

Problem 13b

• This problem asks whether the model supports counting Pluto as a planet.

• The linear model for the re-expressed data predicts that the ninth planet will be 5741 million miles from the sun, while the data show that Pluto is only 3666 million miles away.

• Pluto therefore does not fit the pattern very well, which supports the claim that Pluto doesn’t behave like a planet.
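The prediction can be checked with a least-squares fit of log(distance) on position. The sketch below is an illustration, not the presenters' own calculation: `fit_line` is a hypothetical helper, and the assumption that the 5741 figure comes from a fit that leaves Pluto out, with coefficients rounded to three decimals, is an inference from the numbers rather than something the slides state.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

positions = list(range(1, 10))
distances = [36, 67, 93, 142, 484, 887, 1784, 2796, 3666]  # million miles
log_d = [math.log10(d) for d in distances]

# Fit once with all nine bodies and once with Pluto (position 9) left out.
b0_all, b1_all = fit_line(positions, log_d)
b0_8, b1_8 = fit_line(positions[:8], log_d[:8])

# Predict log(distance) at position 9, then undo the logarithm.
pred_all = 10 ** (b0_all + b1_all * 9)
pred_8 = 10 ** (b0_8 + b1_8 * 9)
# Same prediction using coefficients rounded to 3 decimals (an assumption).
pred_8_rounded = 10 ** (round(b0_8, 3) + round(b1_8, 3) * 9)

print(round(pred_all), round(pred_8), round(pred_8_rounded))
```

Because the model is fitted on a log scale, small changes in the coefficients become large changes in miles after back-transforming, so the three printed predictions differ noticeably.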

Problem 15: Quaoar Planet

• Caltech astronomers discovered a new large body, named Quaoar, orbiting the Sun a billion miles beyond Jupiter. Quaoar orbits at a distance of about 4 billion miles and is classified as a member of the Kuiper Belt rather than as a planet. There are many reasons for suspecting that Pluto is unlike the other planets.

• Omit Pluto from your count of planets, and consider Quaoar as a candidate for the new planet.

– Based on its position, how does Quaoar’s distance from the sun compare with the prediction made by your model?

– Refit the model using Quaoar’s distance and position instead of Pluto’s. Now how well does your model fit the re-expressed data?

Problem 15a

• To solve this, find the predicted value from the re-expressed linear regression model. All three figures are on the log scale, log(distance): the prediction is 3.635, Pluto’s value is 3.564, and Quaoar’s is 3.602. Quaoar is closer to the prediction, so it fits the model better than Pluto does.
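The comparison is a matter of simple arithmetic on the three log values quoted above. The sketch below uses only those figures; the variable names are illustrative, and the back-transformation to million miles assumes base-10 logarithms.

```python
predicted = 3.635  # predicted log(distance) quoted above
pluto = 3.564      # Pluto's log(distance)
quaoar = 3.602     # Quaoar's log(distance)

# Distance of each body from the prediction, on the log scale.
gap_pluto = abs(predicted - pluto)
gap_quaoar = abs(predicted - quaoar)
print(round(gap_pluto, 3), round(gap_quaoar, 3))

# Undo the logarithm to see the same three values in million miles.
print(round(10 ** predicted), round(10 ** pluto), round(10 ** quaoar))
```

The smaller gap belongs to Quaoar, which is the basis for saying it fits the model better.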

Problem 15b

• To refit the model, replace Pluto’s data in the original table with Quaoar’s data. The R^2 value goes up to 99.5%, so the refitted model accounts for more of the variation in the re-expressed distances and is a better fit.
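The effect of the swap on R^2 can be explored with a short script. This is a sketch under stated assumptions: `r_squared` is a hypothetical helper, Quaoar is placed at position 9 with a distance of 4000 million miles (taken from "about 4 billion miles"), and base-10 logarithms are used. The exact percentages depend on these choices and on rounding, so they need not match the figure quoted above; what the sketch shows is the direction of the change.

```python
import math

def r_squared(xs, ys):
    """R^2 of the least-squares line of ys on xs (squared correlation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)

positions = list(range(1, 10))
first_eight = [36, 67, 93, 142, 484, 887, 1784, 2796]  # Mercury to Neptune
pluto_distance = 3666   # million miles, from the table
quaoar_distance = 4000  # million miles; assumed from "about 4 billion miles"

log_with_pluto = [math.log10(d) for d in first_eight + [pluto_distance]]
log_with_quaoar = [math.log10(d) for d in first_eight + [quaoar_distance]]

r2_pluto = r_squared(positions, log_with_pluto)
r2_quaoar = r_squared(positions, log_with_quaoar)
print(f"R^2 with Pluto:  {r2_pluto:.1%}")
print(f"R^2 with Quaoar: {r2_quaoar:.1%}")
```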