Transcript Chapter 1

Chapter 1

Introduction to Statistics

http://www.learner.org/courses/againstallodds/unitpages/unit01.html


Chapter Outline

• 1.1 An Overview of Statistics
• 1.2 Data Classification
• 1.3 Experimental Design

Section 1.1 Objectives

• Define statistics
• Distinguish between a population and a sample
• Distinguish between a parameter and a statistic
• Distinguish between descriptive statistics and inferential statistics

What is Data?

Data

Consist of information coming from observations, counts, measurements, or responses.

• “People who eat three daily servings of whole grains have been shown to reduce their risk of…stroke by 37%.” (Source: Whole Grains Council)
• “Seventy percent of the 1500 U.S. spinal cord injuries to minors result from vehicle accidents, and 68 percent were not wearing a seatbelt.” (Source: UPI)

What is Statistics?

Statistics

The science of collecting, organizing, analyzing, and interpreting data in order to make decisions.


Data Sets

Population

The collection of all outcomes, responses, measurements, or counts that are of interest.

Sample

A subset of the population.


Example: Identifying Data Sets

In a recent survey, 1708 adults in the United States were asked if they think global warming is a problem that requires immediate government action. Nine hundred thirty-nine of the adults said yes. Identify the population and the sample. Describe the data set.

(Adapted from: Pew Research Center)


Solution: Identifying Data Sets

• The population consists of the responses of all adults in the U.S.
• The sample consists of the responses of the 1708 adults in the U.S. in the survey.
• The sample is a subset of the responses of all adults in the U.S.
• The data set consists of 939 yes’s and 769 no’s.

[Figure: responses of adults in survey (sample) shown as a subset of responses of adults in the U.S. (population)]
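The counts in this solution can be checked with a few lines of arithmetic. The sketch below is only an illustration; the variable names are made up here, and the proportion it prints is a descriptive summary of the sample, not a claim about the population.

```python
# Counts taken from the survey example above.
n_total = 1708   # adults in the sample
n_yes = 939      # adults who answered "yes"

# Everyone who did not answer "yes" is counted as "no", as in the solution.
n_no = n_total - n_yes

# Sample proportion of "yes" responses.
p_yes = n_yes / n_total

print(f"no responses: {n_no}")
print(f"proportion answering yes: {p_yes:.3f}")
```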

Classifying a Data Set: Determine whether the data set is a population or a sample. Explain your reasoning.

• The soil contamination levels at 10 locations near a landfill
• The political party of every U.S. president
• At the end of the day, a quality control inspector selects 20 light bulbs from the day's production and tests them.

Parameter and Statistic

Parameter

A number that describes a population characteristic.

Average age of all people in the United States

Statistic

A number that describes a sample characteristic.

Average age of people from a sample of three states


Example: Distinguish Parameter and Statistic

Decide whether the numerical value describes a population parameter or a sample statistic.

1. A recent survey of a sample of MBAs reported that the average salary for an MBA is more than $82,000. (Source: The Wall Street Journal)

Solution: Sample statistic (the average of $82,000 is based on a subset of the population)

Example: Distinguish Parameter and Statistic

Decide whether the numerical value describes a population parameter or a sample statistic.

2. Starting salaries for the 667 MBA graduates from the University of Chicago Graduate School of Business increased 8.5% from the previous year.

Solution: Population parameter (the percent increase of 8.5% is based on all 667 graduates’ starting salaries)
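The distinction can also be seen numerically. The following is a minimal sketch using made-up ages (the population here is hypothetical, generated only for illustration): the mean of every value is a parameter, while the mean of a randomly chosen subset is a statistic that will usually be close to, but not equal to, the parameter.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: ages of 10,000 people (made-up data).
population_ages = [random.randint(0, 90) for _ in range(10_000)]

# Parameter: computed from every member of the population.
population_mean = statistics.mean(population_ages)

# Statistic: computed from a sample (a subset) of the population.
sample_ages = random.sample(population_ages, 100)
sample_mean = statistics.mean(sample_ages)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```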

Branches of Statistics

Descriptive Statistics

Involves organizing, summarizing, and displaying data (e.g., tables, charts, averages).

Inferential Statistics

Involves using sample data to draw conclusions about a population.

Example: Descriptive and Inferential Statistics

Decide which part of the study represents the descriptive branch of statistics. What conclusions might be drawn from the study using inferential statistics?

A large sample of men, aged 48, was studied for 18 years. For unmarried men, approximately 70% were alive at age 65. For married men, 90% were alive at age 65.

(Source: The Journal of Family Issues)


Solution: Descriptive and Inferential Statistics

Descriptive statistics involves statements such as “For unmarried men, approximately 70% were alive at age 65” and “For married men, 90% were alive at 65.” A possible inference drawn from the study is that being married is associated with a longer life for men.


Distinguishing Between a Parameter and a Statistic: Determine whether the numerical value is a parameter or a statistic. Explain your reasoning.

• Sixty-two of the 97 passengers aboard the Hindenburg airship survived its explosion.
• In a recent survey of 2000 people, 44% said China is the world's leading economic power. (Source: Pew Research Center)

Sleep Deprivation

In a recent study, volunteers who had 8 hours of sleep were three times more likely to answer questions correctly on a math test than were sleep deprived participants.

(Source: CBS News)

(a) Identify the sample used in the study.

(b) What is the sample's population?

(c) Which part of the study represents the descriptive branch of statistics?

Section 1.1 Summary

• Defined statistics
• Distinguished between a population and a sample
• Distinguished between a parameter and a statistic
• Distinguished between descriptive statistics and inferential statistics

Section 1.2

Data Classification


Section 1.2 Objectives

• Distinguish between qualitative data and quantitative data
• Classify data with respect to the four levels of measurement

Types of Data

Qualitative Data

Consists of attributes, labels, or nonnumerical entries.

• Major
• Place of birth
• Eye color

Types of Data

Quantitative data

Numerical measurements or counts.

• Age
• Weight of a letter
• Temperature

Example: Classifying Data by Type

The base prices of several vehicles are shown in the table. Which data are qualitative data and which are quantitative data?

(Source: Ford Motor Company)

Solution: Classifying Data by Type

• Qualitative data: the names of the vehicle models are nonnumerical entries.
• Quantitative data: the base prices of the vehicle models are numerical entries.

Levels of Measurement

Nominal level of measurement

• Qualitative data only
• Categorized using names, labels, or qualities
• No mathematical computations would make contextual sense

Ordinal level of measurement

• Qualitative or quantitative data
• Data can be arranged in order
• Differences between data entries are not meaningful

Example: Classifying Data by Level

Two data sets are shown. Which data set consists of data at the nominal level? Which data set consists of data at the ordinal level?

(Source: Nielsen Media Research)


Solution: Classifying Data by Level

• Ordinal level: lists the ranks of five TV programs. The data can be ordered, but the difference between ranks is not meaningful.
• Nominal level: lists the call letters of each network affiliate. Call letters are names of network affiliates.

Levels of Measurement

Interval level of measurement

• Quantitative data
• Data can be ordered
• Differences between data entries are meaningful
• Zero represents a position on a scale (not an inherent zero; zero does not imply “none”)

Example: temperature in degrees Fahrenheit; 0°F does not mean a zero amount of something.

Levels of Measurement

Ratio level of measurement

• Similar to interval level
• Zero entry is an inherent zero (implies “none”)
• A ratio of two data values can be formed
• One data value can be expressed as a multiple of another
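A short numerical sketch may help show why ratios are not meaningful at the interval level. The Kelvin scale is not mentioned in these slides; it is brought in here only as a familiar temperature scale with an inherent zero, and the two temperatures are made-up values.

```python
def fahrenheit_to_kelvin(temp_f: float) -> float:
    """Convert degrees Fahrenheit to kelvins (a scale with an inherent zero)."""
    return (temp_f - 32.0) * 5.0 / 9.0 + 273.15

warm_f, cool_f = 80.0, 40.0  # made-up temperatures in degrees Fahrenheit

# Differences are meaningful on an interval scale.
difference_f = warm_f - cool_f

# The naive ratio on the Fahrenheit scale suggests "twice as hot" ...
naive_ratio = warm_f / cool_f

# ... but on a scale whose zero means "none", the ratio is much smaller.
kelvin_ratio = fahrenheit_to_kelvin(warm_f) / fahrenheit_to_kelvin(cool_f)

print(f"difference: {difference_f} degrees F")
print(f"ratio on Fahrenheit scale: {naive_ratio:.2f}")
print(f"ratio on Kelvin scale:     {kelvin_ratio:.2f}")
```

The two ratios disagree because the Fahrenheit zero is only a position on the scale, which is the point of the interval-level definition.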

Example: Classifying Data by Level

Two data sets are shown. Which data set consists of data at the interval level? Which data set consists of data at the ratio level?

(Source: Major League Baseball)


Solution: Classifying Data by Level

• Interval level: quantitative data. You can find a difference between two dates, but a ratio does not make sense.
• Ratio level: you can find differences and write ratios.

Summary of Four Levels of Measurement

Level of Measurement                                  Nominal  Ordinal  Interval  Ratio
Put data in categories                                Yes      Yes      Yes       Yes
Arrange data in order                                 No       Yes      Yes       Yes
Subtract data values                                  No       No       Yes       Yes
Determine if one data value is a multiple of another  No       No       No        Yes
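The summary table has a cumulative pattern: each level allows everything the level before it allows, plus one more operation. A minimal sketch of that pattern follows; the function and the short operation names are made up for illustration.

```python
# Levels ordered from weakest to strongest, as in the summary table.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# One operation is gained at each level (short names are illustrative).
OPERATIONS = ["categorize", "order", "subtract", "form_multiple"]

def allowed_operations(level: str) -> list[str]:
    """Return the operations that are meaningful at the given level."""
    rank = LEVELS.index(level)
    # Each level permits its own operation plus everything below it.
    return OPERATIONS[: rank + 1]

for level in LEVELS:
    print(level, "->", allowed_operations(level))
```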

Section 1.2 Summary

• Distinguished between qualitative data and quantitative data
• Classified data with respect to the four levels of measurement

Classifying Data by Level: Determine whether the data are qualitative or quantitative, and identify the data set's level of measurement. Explain your reasoning.

Football: The top five teams in the final college football poll released in January 2010 are listed. (Source: Associated Press)

1. Alabama
2. Texas
3. Florida
4. Boise State
5. Ohio State

Fish Lengths: The lengths (in inches) of a sample of striped bass caught in Maryland waters are listed. (Adapted from National Marine Fisheries Service, Fisheries Statistics and Economics Division)

16  17.25  19  18.75  21  20.3  19.8  24  21.82

Classifying Data by Level: Determine whether the data are qualitative or quantitative, and identify the data set's level of measurement. Explain your reasoning.

Graphical Analysis: Identify the level of measurement of the data listed on the horizontal axis in the graph.

Classifying Data by Level: Determine whether the data are qualitative or quantitative, and identify the data set's level of measurement. Explain your reasoning.

Interval. Data can be ordered and meaningful differences can be calculated, but it does not make sense to say one year is a multiple of another.

Section 1.3

How to Design a Statistical Study

Experimental Design


Section 1.3 Objectives

• Discuss how to design a statistical study
• Discuss data collection techniques
• Discuss how to design an experiment
• Discuss sampling techniques

Designing a Statistical Study

1) Identify the variable(s) of interest (the focus) and the population of the study.

Researchers observed and recorded the mouthing behavior on nonfood objects of children up to three years old.

2) Develop a detailed plan for collecting data. If you use a sample, make sure the sample is representative of the population.

An experiment was performed in which diabetics took cinnamon extract daily while a control group took none. After 40 days, the diabetics who had the cinnamon reduced their risk of heart disease while the control group experienced no change.

3) Collect the data.


Designing a Statistical Study

4) Describe the data using descriptive statistics techniques.

5) Interpret the data and make decisions about the population using inferential statistics.

A possible inference drawn from the study is that being married is associated with a longer life for men.

6) Identify any possible errors.


Data Collection

Observational study

A researcher observes and measures characteristics of interest of part of a population.

• Researchers observed and recorded the mouthing behavior on nonfood objects of children up to three years old.

(Source: Pediatric Magazine)


Data Collection

Experiment

A treatment is applied to part of a population and responses are observed.

• An experiment was performed in which diabetics took cinnamon extract daily while a control group took none. After 40 days, the diabetics who had the cinnamon reduced their risk of heart disease while the control group experienced no change.

(Source: Diabetes Care)


Data Collection

Simulation

Uses a mathematical or physical model to reproduce the conditions of a situation or process. Often involves the use of computers.

• Automobile manufacturers use simulations with dummies to study the effects of crashes on humans.

Data Collection

Survey

An investigation of one or more characteristics of a population. Commonly done by interview, mail, or telephone.

• A survey is conducted on a sample of female physicians to determine whether the primary reason for their career choice is financial stability.

Example: Methods of Data Collection

Consider the following statistical studies. Which method of data collection would you use to collect data for each study?

1. A study of the effect of changing flight patterns on the number of airplane accidents.

Solution: Simulation (it is impractical to create this situation)

Example: Methods of Data Collection

2. A study of the effect of eating oatmeal on lowering blood pressure.

Solution: Experiment (measure the effect of a treatment: eating oatmeal)

Example: Methods of Data Collection

3. A study of how fourth grade students solve a puzzle.

Solution: Observational study (observe and measure certain characteristics of part of a population)

Example: Methods of Data Collection

4. A study of U.S. residents’ approval rating of the U.S. president.

Solution: Survey (ask “Do you approve of the way the president is handling his job?”)

Methods of Data Collection

http://www.learner.org/courses/againstallodds/unitpages/unit16.html


Key Elements of Experimental Design

• Control
• Randomization
• Replication

Key Elements of Experimental Design: Control

Control for effects other than the one being measured.

Confounding variables

• Confounding occurs when an experimenter cannot tell the difference between the effects of different factors on a variable.
• A coffee shop owner remodels her shop at the same time a nearby mall has its grand opening. If business at the coffee shop increases, it cannot be determined whether it is because of the remodeling or the new mall.

Key Elements of Experimental Design: Control

Placebo effect

• A subject reacts favorably to a placebo when in fact he or she has been given no medical treatment at all.

Blinding is a technique where the subject does not know whether he or she is receiving a treatment or a placebo.

In a double-blind experiment, neither the subject nor the experimenter knows if the subject is receiving a treatment or a placebo.

Key Elements of Experimental Design: Randomization

Randomization is a process of randomly assigning subjects to different treatment groups.

Completely randomized design

• Subjects are assigned to different treatment groups through random selection.

Randomized block design

• Divide subjects with similar characteristics into blocks, and then within each block, randomly assign subjects to treatment groups.

Key Elements of Experimental Design: Randomization

Randomized block design

An experimenter testing the effects of a new weight loss drink may first divide the subjects into age categories. Then within each age group, randomly assign subjects to either the treatment group or control group.
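The age-group example can be sketched in code. This is one possible way to carry out the idea, not a prescribed procedure: the function name, the even split into two groups, and the subject data are all assumptions made for illustration.

```python
import random

def randomized_block_assignment(subjects, block_of, seed=None):
    """Randomly split each block's subjects into treatment and control."""
    rng = random.Random(seed)

    # Step 1: group subjects into blocks by a shared characteristic.
    blocks = {}
    for subject in subjects:
        blocks.setdefault(block_of(subject), []).append(subject)

    # Step 2: randomize separately inside every block.
    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        for subject in shuffled[:half]:
            assignment[subject] = "treatment"
        for subject in shuffled[half:]:
            assignment[subject] = "control"
    return assignment

# Hypothetical subjects: (id, age group) pairs, made up for illustration.
subjects = [(i, "under 40" if i % 2 == 0 else "40 and over") for i in range(20)]
assignment = randomized_block_assignment(subjects, block_of=lambda s: s[1], seed=0)

for subject, group in sorted(assignment.items()):
    print(subject, group)
```

Because the shuffle happens inside each block, both age groups end up represented in the treatment group and in the control group.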


Key Elements of Experimental Design: Randomization

Matched Pairs Design

• Subjects are paired up according to a similarity. One subject in the pair is randomly selected to receive one treatment while the other subject receives a different treatment.
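A matched pairs assignment can be sketched as a coin flip per pair. The pairing itself (deciding which subjects are similar) is assumed to have been done already; the subject labels and treatment names "A" and "B" below are hypothetical.

```python
import random

def matched_pairs_assignment(pairs, seed=None):
    """Within each pair, randomly give one subject treatment A and the other B."""
    rng = random.Random(seed)
    assignment = {}
    for first, second in pairs:
        # A coin flip decides which member of the pair receives treatment A.
        if rng.random() < 0.5:
            first, second = second, first
        assignment[first] = "A"
        assignment[second] = "B"
    return assignment

# Hypothetical pairs of similar subjects (labels made up for illustration).
pairs = [("s1", "s2"), ("s3", "s4"), ("s5", "s6")]
assignment = matched_pairs_assignment(pairs, seed=42)
print(assignment)
```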

Key Elements of Experimental Design: Replication

Replication is the repetition of an experiment using a large group of subjects.

• To test a vaccine against a strain of influenza, 10,000 people are given the vaccine and another 10,000 people are given a placebo. Because of the sample size, the effectiveness of the vaccine would most likely be observed.

Example: Experimental Design

A company wants to test the effectiveness of a new gum developed to help people quit smoking. Identify a potential problem with the given experimental design and suggest a way to improve it.

The company identifies one thousand adults who are heavy smokers. The subjects are divided into blocks according to gender. After two months, the female group has a significant number of subjects who have quit smoking.


Solution: Experimental Design

Problem:

The groups are not similar. The new gum may have a greater effect on women than men, or vice versa.

Correction:

The subjects can be divided into blocks according to gender, but then within each block, they must be randomly assigned to be in the treatment group or the control group.


Section 1.3

Experimental Design

http://www.learner.org/courses/againstallodds/unitpages/unit15.html


Sampling Techniques

Simple Random Sample

Every possible sample of the same size has the same chance of being selected.


Simple Random Sample

• Random numbers can be generated by a random number table, a software program, or a calculator.
• Assign a number to each member of the population.
• Members of the population that correspond to these numbers become members of the sample.

Example: Simple Random Sample

There are 731 students currently enrolled in statistics at your school. You wish to form a sample of eight students to answer some survey questions. Select the students who will belong to the simple random sample.

• Assign a number from 1 to 731 to each student taking statistics.
• On the table of random numbers, choose a starting place at random (suppose you start in the third row, second column).

Solution: Simple Random Sample

• Read the digits in groups of three.
• Ignore numbers greater than 731.

The students assigned numbers 719, 662, 650, 4, 53, 589, 403, and 129 would make up the sample.

We can use Excel to pick 8 numbers between 1 and 731.

The function that does this is =RANDBETWEEN(1,731); enter it in a cell, and copy it to as many cells as desired, in this case 8. The function is under Formulas, then Insert Function.

We can use the applet provided by the publisher of your textbook to generate random numbers: http://media.pearsoncmg.com/aw/aw_mml_shared_1/statistics/West_Applets/randomnumbers.html
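The same selection can also be done in a general-purpose language. The sketch below uses Python's standard `random` module, which is an alternative chosen here and not a tool the slides mention. One practical difference: `random.sample` draws without replacement, so no student number can appear twice, whereas eight independent copies of =RANDBETWEEN(1,731) can occasionally repeat a number.

```python
import random

population_size = 731   # students numbered 1..731, as in the example
sample_size = 8

# Pass a seed, e.g. random.Random(2024), if a repeatable sample is wanted.
rng = random.Random()

# Draw 8 distinct student numbers from 1..731.
sample_ids = sorted(rng.sample(range(1, population_size + 1), sample_size))
print(sample_ids)
```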

Other Sampling Techniques

Stratified Sample

Divide a population into groups (strata) and select a random sample from each group.

• To collect a stratified sample of the number of people who live in West Ridge County households, you could divide the households into socioeconomic levels and then randomly select households from each level.
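A stratified sample along these lines might be drawn as follows. This is a sketch under stated assumptions: the household list, the three socioeconomic labels, and the choice of ten households per stratum are all made up for illustration.

```python
import random

def stratified_sample(units, stratum_of, per_stratum, seed=None):
    """Draw a random sample of up to per_stratum units from every stratum."""
    rng = random.Random(seed)

    # Group the units into strata.
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)

    # Sample at random inside each stratum, so every stratum is represented.
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample

# Hypothetical households: (id, socioeconomic level), made-up data.
levels = ["low", "middle", "high"]
households = [(i, levels[i % 3]) for i in range(300)]

sample = stratified_sample(households, stratum_of=lambda h: h[1], per_stratum=10, seed=1)
print(len(sample), "households sampled")
```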


Other Sampling Techniques

Cluster Sample

Divide the population into groups (clusters) and select all of the members in one or more, but not all, of the clusters.

• In the West Ridge County example you could divide the households into clusters according to zip codes, then select all the households in one or more, but not all, zip codes.
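For contrast with stratified sampling, a cluster sample picks whole groups at random and then takes every member of the chosen groups. The sketch below assumes hypothetical households and placeholder zip code labels; none of these values come from the slides.

```python
import random

def cluster_sample(units, cluster_of, n_clusters, seed=None):
    """Randomly choose n_clusters clusters and return all of their members."""
    rng = random.Random(seed)

    # Group the units into clusters.
    clusters = {}
    for unit in units:
        clusters.setdefault(cluster_of(unit), []).append(unit)

    # Randomness is applied to the choice of clusters, not to individual units.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for key in chosen for unit in clusters[key]]

# Hypothetical households: (id, zip code label), made-up data.
households = [(i, f"zip-{i % 5}") for i in range(100)]

sample = cluster_sample(households, cluster_of=lambda h: h[1], n_clusters=2, seed=3)
print(sorted({h[1] for h in sample}), len(sample))
```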


Other Sampling Techniques

Systematic Sample

Choose a starting value at random. Then choose every kth member of the population.

• In the West Ridge County example you could assign a different number to each household, randomly choose a starting number, then select every 100th household.
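A systematic sample can be sketched with list slicing. Choosing the random start among the first k members is one common variant and is an assumption here; the slides only say to start at random. The count of 5,000 households is made up.

```python
import random

def systematic_sample(population, k, seed=None):
    """Pick a random start among the first k members, then take every kth member."""
    rng = random.Random(seed)
    start = rng.randrange(k)    # index 0..k-1
    return population[start::k]

# Hypothetical numbered households (the count is made up for illustration).
household_ids = list(range(1, 5001))

sample = systematic_sample(household_ids, k=100, seed=7)
print(sample[:5], "...", len(sample), "households")
```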


Example: Identifying Sampling Techniques

You are doing a study to determine the opinion of students at your school regarding stem cell research. Identify the sampling technique used.

1. You divide the student population with respect to majors and randomly select and question some students in each major.

Solution: Stratified sampling (the students are divided into strata (majors) and a sample is selected from each major)

Example: Identifying Sampling Techniques

2. You assign each student a number and generate random numbers. You then question each student whose number is randomly selected.

Solution: Simple random sample (each sample of the same size has an equal chance of being selected and each student has an equal chance of being selected)

Section 1.3 Summary

• Discussed how to design a statistical study
• Discussed data collection techniques
• Discussed how to design an experiment
• Discussed sampling techniques

• What is the difference between an observational study and an experiment?

• What is the difference between a census and a sampling?

• What is the difference between a random sample and a simple random sample?

• What is replication in an experiment, and why is it important?

Deciding on the Method of Data Collection: Explain which method of data collection you would use to collect data for the study.

• A study of how often people wash their hands in public restrooms
• A study of the effect of a product's warning label to determine whether consumers will still buy the product

Using and Interpreting Concepts:

A pharmaceutical company wants to test the effectiveness of a new allergy drug. The company identifies 250 females 30–35 years old who suffer from severe allergies. The subjects are randomly assigned into two groups. One group is given the new allergy drug and the other is given a placebo that looks exactly like the new allergy drug. After six months, the subjects’ symptoms are studied and compared.

(a) Identify the experimental units and treatments used in this experiment.

(b) Identify a potential problem with the experimental design being used and suggest a way to improve it.

(c) How could this experiment be designed to be double-blind?

Chapter 1: Introduction to Statistics

Elementary Statistics: Picturing the World, Fifth Edition, by Larson and Farber

© 2012 Pearson Education, Inc.

Identify the population: A survey of 500 adults in the U.S. found that 54% drink coffee daily.

A. Collection of the 500 adults surveyed B. Collection of all adults in the U.S.

C. 54% D. 500


Identify the sample: A survey of 500 adults in the U.S. found that 54% drink coffee daily.

A. Collection of the 500 adults surveyed B. Collection of all adults in the U.S.

C. 54% D. 500


True or false: In the statement “A survey of 500 adults in the U.S. found that 54% drink coffee daily” 54% is a parameter.

A. True B. False


True or false: The costs of items in a shopper’s grocery cart represent quantitative data.

A. True B. False


True or false: The social security numbers of students in a class represent quantitative data.

A. True B. False


Identify the data set’s level of measurement: The IQ scores of students in a class.

A. Nominal B. Ordinal C. Interval D. Ratio


Identify the data set’s level of measurement: The nationality of each person on an airplane.

A. Nominal B. Ordinal C. Interval D. Ratio


Identify the data set’s level of measurement: The salaries of nurses at a hospital.

A. Nominal B. Ordinal C. Interval D. Ratio


Decide which method of data collection would be most appropriate: A study of the effect of using MyStatLab on grades in a statistics course.

A. Observational study B. Experiment C. Simulation D. Survey


Identify the sampling technique used: Students are classified according to major. Twenty students are selected from each major and asked how often they use the library.

A. Random sample B. Stratified sample C. Cluster sample D. Systematic sample
