Transcript Slide 1
Section 2.4 How Can We Describe the Spread of Quantitative Data?
Agresti/Franklin Statistics, 1 of 63

Measuring Spread: Range
Range: the difference between the largest and smallest observations

Measuring Spread: Standard Deviation
The standard deviation measures variation by summarizing the deviations of each observation from the mean: it is the square root of an adjusted average of the squared deviations
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$

Empirical Rule
For bell-shaped data sets:
• Approximately 68% of the observations fall within 1 standard deviation of the mean
• Approximately 95% of the observations fall within 2 standard deviations of the mean
• Approximately 100% of the observations fall within 3 standard deviations of the mean

Parameter and Statistic
• A parameter is a numerical summary of the population
• A statistic is a numerical summary of a sample taken from a population

Section 2.5 How Can Measures of Position Describe Spread?

Quartiles
The quartiles split the data into four parts:
• The median is the second quartile, Q2
• The first quartile, Q1, is the median of the lower half of the observations
• The third quartile, Q3, is the median of the upper half of the observations

Example: Find the first and third quartiles
Prices per share of the 10 most actively traded stocks on the NYSE (rounded to the nearest $):
2 4 11 12 13 15 31 31 37 47
a. Q1 = 2, Q3 = 47
b. Q1 = 12, Q3 = 31
c. Q1 = 11, Q3 = 31
d. Q1 = 11.5, Q3 = 32

Measuring Spread: Interquartile Range
The interquartile range is the distance between the third quartile and the first quartile:
IQR = Q3 - Q1

Detecting Potential Outliers
An observation is a potential outlier if it falls more than 1.5 × IQR below the first quartile or more than 1.5 × IQR above the third quartile

The Five-Number Summary
The five-number summary of a dataset:
• Minimum value
• First Quartile
• Median
• Third Quartile
• Maximum value

Boxplot
• A box is constructed from Q1 to Q3
• A line is drawn inside the box at the median
• A line extends outward from the lower end of the box to the smallest observation that is not a potential outlier
• A line extends outward from the upper end of the box to the largest observation that is not a potential outlier

Boxplot for Sodium Data
Sodium Data (sorted):
0 70 125 125 140 150 170 170 180 200
200 210 210 220 220 230 250 260 290 290
Five-Number Summary:
Min: 0   Q1: 145   Med: 200   Q3: 225   Max: 290

Boxplot for Sodium in Cereals
Sodium Data:
0 210 260 125 220 290 210 140 220 200 125 170 250 150 170 70 230 200 290 180

Z-Score
The z-score for an observation measures how far the observation is from the mean in standard deviation units:
$z = \dfrac{\text{observation} - \text{mean}}{\text{standard deviation}}$
An observation in a bell-shaped distribution is a potential outlier if its z-score is less than -3 or greater than +3

Chapter 3 Association: Contingency, Correlation, and Regression
Learn ….
How to examine links between two variables

Variables
• Response variable: the outcome variable
• Explanatory variable: the variable that explains the outcome variable

Association
An association exists between two variables if a particular value for one variable is more likely to occur with certain values of the other variable

Section 3.1 How Can We Explore the Association Between Two Categorical Variables?

Example: Food Type and Pesticide Status

Food Type       Pesticides: Yes   Pesticides: No
Organic                      29               98
Conventional              19485             7086

• What is the response variable? What is the explanatory variable?
• What proportion of organic foods contain pesticides?
• What proportion of conventionally grown foods contain pesticides?
• What proportion of all sampled items contain pesticide residues?

Contingency Table
The Food Type and Pesticide Status table is called a contingency table. A contingency table:
• Displays 2 categorical variables
• The rows list the categories of 1 variable
• The columns list the categories of the other variable
• Entries in the table are frequencies

Example: Food Type and Pesticide Status
Contingency Table Showing Conditional Proportions

Food Type       Pesticides: Yes   Pesticides: No
Organic                    0.23             0.77
Conventional               0.73             0.27

• What is the sum over each row?
• What proportion of organic foods contained pesticide residues?
• What proportion of conventional foods contained pesticide residues?

Example: For the following pair of variables, which is the response variable and which is the explanatory variable?
College grade point average (GPA) and high school GPA
a. College GPA: response variable; high school GPA: explanatory variable
b. College GPA: explanatory variable; high school GPA: response variable

Section 3.2 How Can We Explore the Association Between Two Quantitative Variables?

Scatterplot
Graphical display of two quantitative variables:
• Horizontal Axis: Explanatory variable, x
• Vertical Axis: Response variable, y

Example: Internet Usage and Gross Domestic Product (GDP)

Positive Association
Two quantitative variables, x and y, are said to have a positive association when high values of x tend to occur with high values of y, and when low values of x tend to occur with low values of y

Negative Association
Two quantitative variables, x and y, are said to have a negative association when high values of x tend to occur with low values of y, and when low values of x tend to occur with high values of y

Example: Did the Butterfly Ballot Cost Al Gore the 2000 Presidential Election?
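
Returning to the Food Type and Pesticide Status example: the conditional proportions shown earlier can be recomputed from the raw counts by dividing each cell by its row total. The sketch below is one possible way to do that in Python. The counts are the ones given in the slides; the function names `conditional_proportions` and `overall_proportion` are illustrative choices, not anything from the source.

```python
# Counts from the Food Type and Pesticide Status table in the slides.
counts = {
    "Organic": {"Yes": 29, "No": 98},
    "Conventional": {"Yes": 19485, "No": 7086},
}


def conditional_proportions(table):
    """Divide each cell by its row total (hypothetical helper)."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        result[row] = {col: n / row_total for col, n in cells.items()}
    return result


def overall_proportion(table, column):
    """Proportion of all sampled items that fall in `column` (hypothetical helper)."""
    grand_total = sum(sum(cells.values()) for cells in table.values())
    column_total = sum(cells[column] for cells in table.values())
    return column_total / grand_total


props = conditional_proportions(counts)
for food_type, row in props.items():
    # Rounded to two decimals, as in the conditional proportions table.
    print(food_type, {col: round(p, 2) for col, p in row.items()})

print("All items:", round(overall_proportion(counts, "Yes"), 2))
```

Each row of conditional proportions sums to 1, which answers the "sum over each row" question. The large gap between the two rows' "Yes" proportions is what suggests an association between food type and pesticide status.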

Linear Correlation: r
Measures the strength of the linear association between x and y
• A positive r-value indicates a positive association
• A negative r-value indicates a negative association
• An r-value close to +1 or -1 indicates a strong linear association
• An r-value close to 0 indicates a weak linear association

Calculating the correlation, r
$r = \dfrac{1}{n - 1} \sum \left( \dfrac{x - \bar{x}}{s_x} \right) \left( \dfrac{y - \bar{y}}{s_y} \right)$

Example: 100 cars on the lot of a used-car dealership
Would you expect a positive association, a negative association or no association between the age of the car and the mileage on the odometer?
• Positive association
• Negative association
• No association
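
The correlation formula above can be sketched directly in Python: standardize each x and y using the sample standard deviation (the n - 1 form from Section 2.4), multiply the paired z-scores, and divide the sum by n - 1. This is a minimal illustration, not the textbook's own code. The helper names and the toy numbers are assumptions made for the example and are not data from the slides.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation with the n - 1 divisor."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """r = (1 / (n - 1)) * sum of products of paired z-scores."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_sd(xs), sample_sd(ys)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)


# Toy, made-up values chosen so the expected answers are obvious.
xs = [1, 2, 3, 4]
ys_up = [2, 4, 6, 8]      # rises exactly linearly with x
ys_down = [8, 6, 4, 2]    # falls exactly linearly with x

print(correlation(xs, ys_up))    # perfectly positive linear association, r near +1
print(correlation(xs, ys_down))  # perfectly negative linear association, r near -1
```

Real data such as car age and mileage would rarely fall exactly on a line, so r would lie strictly between -1 and +1, with its sign giving the direction of the association.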