Stats Review - Orem High School

Stats Review

Chapter 2

2.1 Scatter Plots

Response Variable – Measures an outcome of a study.

Explanatory Variable – Explains or causes changes in the response variable.

Scatter Plot – Shows the relationship between two quantitative variables measured on the same individuals.

The explanatory variable (if there is one) always goes on the horizontal axis, the response on the vertical axis.

Examining a Scatterplot
• Look for an overall pattern and striking deviations from the pattern.
• Describe the pattern using the form, direction and strength of the relationship.
• Look for outliers.
• Look for positive or negative association (slope).

2.2 Correlation

Correlation measures the direction and strength of the linear relationship between two quantitative variables. It is usually written as r.

FORMULA
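The formula itself did not survive in this transcript. One common textbook form, assumed here rather than recovered from the slide, standardizes each variable and averages the products:

```latex
r = \frac{1}{n-1} \sum_{i=1}^{n}
    \left( \frac{x_i - \bar{x}}{s_x} \right)
    \left( \frac{y_i - \bar{y}}{s_y} \right)
```

Here n is the number of individuals, \bar{x} and \bar{y} are the means, and s_x and s_y are the sample standard deviations of the two variables.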

Properties of Correlation
• Makes no distinction between explanatory and response variables.
• Both variables need to be quantitative.
• Does not change when the data undergo a linear transformation, such as a change of units.
• Positive r indicates positive association; negative r indicates negative association.
• r always lies between -1 and 1. Values near 0 indicate a very weak linear relationship.
• Values of r that lie close to -1 or 1 indicate the points lie close to a straight line.
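A few of these properties can be checked numerically. The sketch below is only an illustration: the data are made up, and the helper name correlation is an assumption, not something from the slides. It computes r as the average product of standardized values, then swaps the two variables and changes the units of one of them.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r: average product of standardized values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: hours studied (x) and test score (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

r = correlation(hours, scores)
r_swapped = correlation(scores, hours)                     # swap the roles of x and y
r_minutes = correlation([60 * h for h in hours], scores)   # hours -> minutes

print(r, r_swapped, r_minutes)
```

All three values agree (up to rounding), and r is positive and close to 1 because the made-up points rise together and lie near a straight line.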

2.3 Least Squares Regression

Regression Line – A straight line that describes how a response variable y changes as an explanatory variable x changes. Often used to predict the value of y for a given value of x. Regression, unlike correlation, requires an explanatory and a response variable.

Least Squares Regression Line – The line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.

Equation of the Least-Squares Regression Line
Equation: ŷ = a + bx
Slope b: The rate of change, that is, the amount by which the predicted y changes when x increases by one unit.
Intercept a: The predicted value of y when x = 0.
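The slope and intercept can be computed directly from the data. This is a minimal sketch on made-up numbers. The function name least_squares_line is hypothetical, and the formulas used (slope = sum of cross-deviations divided by the sum of squared x-deviations, intercept = mean of y minus slope times mean of x) are the standard least-squares results rather than something stated on the slide.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope: change in predicted y per unit of x
    a = y_bar - b * x_bar    # intercept: predicted y when x = 0
    return a, b

# Hypothetical data: hours studied (x) and test score (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

a, b = least_squares_line(hours, scores)
predicted = a + b * 3.5      # prediction inside the range of the data

print(a, b, predicted)
```

For this made-up data the line is y-hat = 46 + 6x, so each extra hour adds 6 points to the predicted score.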

Points to Remember:
• The slope b of the LSR line is the rate of change; the correlation r represents how close to a straight line the points lie.
• Double-check the regression output. Some programs label the slope a, others label it b.

r-squared – The fraction of the variation in the values of y that is explained by the LSR of y on x.

Residuals – A residual is the difference between an observed value of the response variable and the value predicted by the regression line:
residual = observed y – predicted y
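Residuals and r-squared can be worked out side by side. The sketch below uses made-up data and computes r-squared as 1 minus (sum of squared residuals / total variation in y). That is one standard way to get the "fraction of variation explained" and is an assumption here, since the slides give no formula.

```python
from statistics import mean

# Hypothetical data: hours studied (x) and test score (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

# Fit the least-squares line y-hat = a + b*x.
x_bar, y_bar = mean(hours), mean(scores)
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))
     / sum((x - x_bar) ** 2 for x in hours))
a = y_bar - b * x_bar

# residual = observed y - predicted y
predicted = [a + b * x for x in hours]
residuals = [y - y_hat for y, y_hat in zip(scores, predicted)]

# Fraction of the variation in y explained by the line.
total_variation = sum((y - y_bar) ** 2 for y in scores)
unexplained = sum(e ** 2 for e in residuals)
r_squared = 1 - unexplained / total_variation

print(residuals, r_squared)
```

The residuals of a least-squares line sum to zero, which is a quick check on the arithmetic. Plotting them against hours would give the residual plot.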

Residual Plots – A scatterplot of the regression residuals against the explanatory variable.

Lurking Variable – A variable that has an important effect on the relationship among the variables in a study but is not included among the variables studied.

Warnings
• Correlation measures linear association. Plot your data to make sure the relationship is linear.
• Extrapolation (predicting far outside the range of the data) can produce unreliable predictions.
• Correlation and least-squares regression are not resistant to outliers.
• Lurking variables can make a correlation or LSR unreliable.
• Association does not imply causation. A strong association between variables does not imply that changes in one cause the other to change.

2.5 Exponential Growth

Linear versus Exponential Growth
Linear: Increases by a fixed amount in each equal time period. Graphs are straight lines.
Exponential: Increases by a fixed percentage of the previous total in each equal time period. Graphs are curves.
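The difference between the two kinds of growth shows up in a few lines of arithmetic. In this sketch the starting value of 100, the step of 10 and the rate of 10% are arbitrary choices for illustration only.

```python
start = 100.0

# Linear growth: add a fixed amount (10) each period.
linear = [start + 10 * t for t in range(5)]

# Exponential growth: add a fixed percentage (10%) of the previous total.
exponential = [start * 1.10 ** t for t in range(5)]

# Linear growth has constant differences; exponential has constant ratios.
linear_steps = [b - a for a, b in zip(linear, linear[1:])]
exponential_ratios = [b / a for a, b in zip(exponential, exponential[1:])]

print(linear_steps)
print(exponential_ratios)
```

The differences between consecutive linear values are all 10, while the exponential values keep a constant ratio of about 1.10, so their differences grow each period.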

Logarithm Transformation – If a variable grows exponentially, its logarithm grows linearly.

x:      2      4      8      16
log x:  0.301  0.602  0.903  1.204
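The values in this example can be reproduced with base-10 logarithms. The short check below uses the same four numbers; the variable names are chosen here for illustration.

```python
import math

values = [2, 4, 8, 16]   # doubles each step: exponential growth

# Base-10 logarithms, rounded to three decimals as on the slide.
logs = [round(math.log10(v), 3) for v in values]

# The logs go up by the same amount each step: linear growth.
steps = [round(b - a, 3) for a, b in zip(logs, logs[1:])]

print(logs)
print(steps)
```

Each doubling of x adds the same amount, log 2 (about 0.301), to log x, which is why the transformed values fall on a straight line.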

2.6 Relations in Categorical Data

Tables – A two-way table describes two categorical variables, such as age group and education level. There is a row variable and a column variable.

Marginal Distributions – The totals of the rows alone and the totals of the columns alone are called marginal distributions.

Describing Relationships in Categorical Data – Relationships in categorical data are described by calculating percents from the counts given.
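A small two-way table makes the calculations concrete. The counts below are invented for illustration (rows are education level, columns are age group). The sketch computes the row and column totals, the marginal distribution of the row variable in percents, and the percent of each age group that attended college.

```python
# Hypothetical counts: rows = education level, columns = age group.
table = {
    "No college": {"25-34": 30, "35-54": 50},
    "College":    {"25-34": 70, "35-54": 50},
}

# Marginal totals: rows alone and columns alone.
row_totals = {row: sum(cols.values()) for row, cols in table.items()}
col_totals = {}
for cols in table.values():
    for col, count in cols.items():
        col_totals[col] = col_totals.get(col, 0) + count
grand_total = sum(row_totals.values())

# Marginal distribution of education level, as percents of everyone.
row_marginal = {row: 100 * t / grand_total for row, t in row_totals.items()}

# Percent with college within each age group (compare across columns).
pct_college_by_age = {col: 100 * table["College"][col] / col_totals[col]
                      for col in col_totals}

print(row_marginal)
print(pct_college_by_age)
```

Comparing the percents across age groups, rather than the raw counts, is what reveals a relationship between the two variables.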

2.7 Causation

Many kinds of links between variables can explain an association. The three basic links are:
a) Causation
b) Common Response
c) Confounding

Causation – There is a direct link from x to y. This is best shown from experiments that keep all other factors fixed.

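Common Response – The association is explained by a lurking variable z: both x and y respond to changes in z.

Confounding – The effects of two variables on the response variable are mixed together and cannot be separated. The confounded variables can be explanatory or lurking.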

Criteria for establishing causation without an experiment:
• The association is strong.
• The association is consistent (many studies show the same association).
• Higher doses are associated with stronger responses.
• The alleged cause is plausible.

Review Exercises 2.108, 2.111, 2.121