Transcript Slide 1
+ Chapter 4 Designing Studies
4.1 Samples and Surveys

+ You can hardly go a day without hearing the results of a statistical study. Here are some examples:

Social Media and Teens
According to a Common Sense Media study, nine out of ten 13- to 17-year-olds have used some form of social media. Three out of four teenagers currently have a profile on a social networking site, and one in five has a current Twitter account. 68% of all teens say Facebook is their main social networking site, compared to 6% for Twitter, 1% for Google Plus, and 1% for MySpace.

Underage Drinking
During the past month (30 days), 26.4% of underage persons (ages 12-20) used alcohol, and binge drinking among the same age group was 17.4%. (SAMHSA)

Can we trust the results? That depends on how the data were produced.

+ Population and Sample
Sampling and Surveys
The distinction between population and sample is basic to statistics. To make sense of any sample result, you must know what population the sample represents.

Definition: The population in a statistical study is the entire group of individuals about which we want information. A sample is the part of the population from which we actually collect information. We use information from a sample to draw conclusions about the entire population.

Population -> collect data from a representative sample -> Sample -> make an inference about the population -> Population

Sampling Students and Soda
Problem: Identify the population and sample in each of the following settings.
(a) The student government at a high school surveys 100 of the students at the school to get their opinions about a change to the bell schedule.
Answer: The population is all students at the school; the sample is the 100 students surveyed.
(b) The quality control manager at a bottling company selects a sample of 10 cans from the production line every hour to see if the volume of the soda is within acceptable limits.
Answer: The population is all cans produced that hour; the sample is the 10 cans inspected.
+ The Idea of a Sample Survey
We often draw conclusions (inferences) about a whole population on the basis of a sample. Choosing a sample from a large, varied population is not that easy.
Step 1: Define the population we want to describe.
Step 2: Say exactly what we want to measure.
Step 3: Decide how to choose a sample from the population.
A “sample survey” is a study that uses an organized plan to choose a sample that represents some specific population.

+ How to Sample Badly
How can we choose a sample that we can trust to represent the population? There are a number of different methods to select samples.
Definition: Choosing individuals who are easiest to reach results in a convenience sample.
Convenience samples often produce unrepresentative data… why?
Definition: The design of a statistical study shows bias if it systematically favors certain outcomes.

+ How to Sample Badly
Convenience samples are almost guaranteed to show bias. So are voluntary response samples, in which people decide whether to join the sample in response to an open invitation.
Definition: A voluntary response sample consists of people who choose themselves by responding to a general appeal. Voluntary response samples show bias because people with strong opinions (often in the same direction) are most likely to respond.

For each of the following situations, identify the sampling method used. Then explain how the sampling method could lead to bias.
a) A farmer brings a juice company several crates of oranges each week. A company inspector looks at 10 oranges from the top of each crate before deciding whether to buy all the oranges.
b) The ABC program Nightline once asked whether the United Nations should continue to have its headquarters in the US. Viewers were invited to call one telephone number to respond “Yes” and another for “No”. There was a charge for calling either number.
More than 186,000 callers responded and 67% said “No”.

+ How to Sample Well: Random Sampling
The statistician’s remedy is to allow impersonal chance to choose the sample. A sample chosen by chance rules out both favoritism by the sampler and self-selection by respondents. Random sampling, the use of chance to select a sample, is the central principle of statistical sampling.

Definition: A simple random sample (SRS) of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected.

In practice, people use random numbers generated by a computer or calculator to choose samples. If you don’t have technology handy, you can use a table of random digits.

Class activity: Select a sample of size 4 from our class to illustrate the definition of an SRS. (Need two recorders.) Now let’s force the sample to have 2 girls and 2 boys. Why is this not an SRS? Because samples of 4 that do not contain 2 girls and 2 boys have no chance of occurring. In an SRS, each possible sample of 4 has the same chance of occurring.

+ How to Choose an SRS
How to Choose an SRS Using Table D
Step 1: Label. Give each member of the population a numerical label of the same length.
Step 2: Table. Read consecutive groups of digits of the appropriate length from Table D. Your sample contains the individuals whose labels you find.

Definition: A table of random digits is a long string of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 with these properties:
• Each entry in the table is equally likely to be any of the 10 digits 0 - 9.
• The entries are independent of each other. That is, knowledge of one part of the table gives no information about any other part.

+ How to Choose an SRS
Problem: Use Table D at line 130 to choose an SRS of 4 hotels.

01 Aloha Kai       08 Captiva        15 Palm Tree      22 Sea Shell
02 Anchor Down     09 Casa del Mar   16 Radisson       23 Silver Beach
03 Banana Bay      10 Coconuts       17 Ramada         24 Sunset Beach
04 Banyan Tree     11 Diplomat       18 Sandpiper      25 Tradewinds
05 Beach Castle    12 Holiday Inn    19 Sea Castle     26 Tropical Breeze
06 Best Western    13 Lime Tree      20 Sea Club       27 Tropical Shores
07 Cabana          14 Outrigger      21 Sea Grape      28 Veranda

Line 130 of Table D: 69051 64817 87174 09517 84534 06489 87201 97245

Example: Reading two digits at a time gives 69 05 16 48 17 87 17 40 95 17 84 53 40 64 89 87 20. Skipping groups outside 01 to 28 and any repeated label, our SRS of 4 hotels for the editors to contact is: 05 Beach Castle, 16 Radisson, 17 Ramada, and 20 Sea Club.

ACU#2 Alternate Example: Mall Hours
The management company of a local mall plans to survey a random sample of 3 stores to determine the hours they would like to stay open during the holiday season.
Problem: Use Table D at line 101 to select an SRS of 3 stores.

01 Aeropostale           08 Forever 21            15 Old Navy
02 All American Burger   09 GameStop              16 Pac Sun
03 Arby’s                10 Gymboree              17 Panda Express
04 Barnes & Noble        11 Haggar                18 Payless Shoes
05 Carter’s for Kids     12 Just Sports           19 Star Jewelers
06 Destination Tan       13 Mrs. Fields           20 Vitamin World
07 Famous Footwear       14 Nike Factory Store    21 Zales Diamond Store

Using line 101, here are the selected stores: 19 (Star Jewelers), 22 (skip), 39 (skip), 50 (skip), 34 (skip), 05 (Carter’s for Kids), 75 (skip), 62 (skip), 87 (skip), 13 (Mrs. Fields).

SRS Using Our Graphing Calculators (Tech Corner, pg 214)
To seed the random number generator on a TI-83/84, enter a unique number on your screen (cell phone number, student number). Then, press the STO button (just above the ON button).
Finally, press the MATH button, scroll to the PRB menu, choose the first option, rand, and press ENTER.

01 Aloha Kai       08 Captiva        15 Palm Tree      22 Sea Shell
02 Anchor Down     09 Casa del Mar   16 Radisson       23 Silver Beach
03 Banana Bay      10 Coconuts       17 Ramada         24 Sunset Beach
04 Banyan Tree     11 Diplomat       18 Sandpiper      25 Tradewinds
05 Beach Castle    12 Holiday Inn    19 Sea Castle     26 Tropical Breeze
06 Best Western    13 Lime Tree      20 Sea Club       27 Tropical Shores
07 Cabana          14 Outrigger      21 Sea Grape      28 Veranda

ACU#3 How Large Is a Typical U.S. State?
Pick 5 states in 15 seconds. Refer to the table of land areas on page N/DS-5. Find the mean land area for your sample. Dot plot of the results.
Use the line of Table D that I assign you to choose an SRS of 5 states. Find the mean land area for this sample. Dot plot of the results.
How do the class’s estimates using the two methods compare? What advantage(s) does random sampling provide?

A university’s financial aid office wants to know how much it can expect students to earn from summer employment. This information will be used to set the level of financial aid. The population contains 3478 students who have completed at least one year of study but have not yet graduated. A questionnaire will be sent to an SRS of 100 of these students drawn from an alphabetized list.
(a) Describe how you will select the sample.
(b) Starting at line 135, use the portion of the random digits table below to select the first three students in the sample.

135  66925 55658 39100 78458 11206 19876 87151 31260
136  08421 44753 77377 28744 75592 08563 79140 92454
137  53645 66812 61421 47836 12609 15373 98481 14592

The basic idea of good sampling is straightforward: take an SRS from the population and use your sample results to gain information about the population.
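The Table D procedure (label, then read fixed-width groups of digits, skipping invalid and repeated labels) can be sketched in a few lines of Python. This is only an illustration: the function name `srs_from_digit_table` and its parameters are made up for this sketch, and the digit strings are lines 130 and 135 as printed in the slides above.

```python
def srs_from_digit_table(digits, population_size, sample_size):
    """Pick labels by reading a string of random digits in fixed-width groups."""
    # Drop the spaces that separate the five-digit blocks in the printed table.
    digits = "".join(ch for ch in digits if ch.isdigit())
    # Step 1 (Label): every label has the same length as the largest label.
    width = len(str(population_size))
    chosen = []
    # Step 2 (Table): read consecutive groups of `width` digits.
    for start in range(0, len(digits) - width + 1, width):
        label = int(digits[start:start + width])
        # Skip groups that are not valid labels and labels already in the sample.
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == sample_size:
                break
    return chosen


# Hotel example: 28 hotels, line 130, sample of 4.
line_130 = "69051 64817 87174 09517 84534 06489 87201 97245"
hotels = srs_from_digit_table(line_130, population_size=28, sample_size=4)
print(hotels)  # [5, 16, 17, 20]

# Financial aid example: 3478 students labeled 0001 to 3478, line 135, first 3 students.
line_135 = "66925 55658 39100 78458 11206 19876 87151 31260"
students = srs_from_digit_table(line_135, population_size=3478, sample_size=3)
print(students)  # [1007, 1120, 1513]
```

With technology instead of a table, Python's standard `random.sample(range(1, 29), 4)` would draw 4 distinct hotel labels directly, which plays the same role as the calculator's random number generator.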
SRS:
• every individual has an equal chance of getting picked,
• every sample of the size you are drawing has an equal chance of getting picked.

Unfortunately, it’s usually very difficult to actually get an SRS from the population of interest. Sometimes, there are also statistical advantages to using more complex sampling methods.

+ Definition: To select a stratified random sample, first classify the population into groups of similar individuals, called strata. Then choose a separate SRS in each stratum and combine these SRSs to form the full sample.

Strata: Same Within and Different Between

If the individuals in each stratum are less varied than the population as a whole, a stratified random sample can often produce better information about the population than an SRS of the same size.

EX. The population of students in a large high school might be divided into freshman, sophomore, junior, and senior strata.

Sampling Sunflowers pg. 216
Use Table D or technology to take an SRS of 10 grid squares. Then take a stratified random sample using the rows as strata. Then, repeat using the columns as strata.

+ Activity: Cell Phones Only
A growing percentage of Americans are dropping their traditional land line phones and using their cell phones exclusively. This has caused a problem for polling organizations, which in the past used only land lines when randomly selecting people for their polls. Since people who use cell phones exclusively differ in many ways from people who do not, polling organizations now use a stratified sampling design that selects a random sample of cell phone users and a random sample of land line users.

Although a stratified random sample can sometimes give more precise information about a population than an SRS, both sampling methods are hard to use when populations are large and spread out over a wide area. In that situation, we’d prefer a method that selects groups of individuals that are “near” one another.
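Before moving on to that method, the stratified design defined above (a separate SRS within each stratum, then combine) can be sketched as follows. The roster, the ID format, the stratum sizes, and the choice of 10 students per class year are all hypothetical values invented for this sketch; only the procedure comes from the definition.

```python
import random


def stratified_sample(strata, sizes, seed=None):
    """Take a separate SRS from each stratum and combine them into one sample."""
    rng = random.Random(seed)
    sample = []
    for name, members in strata.items():
        # random.Random.sample draws without replacement, i.e. an SRS of the stratum.
        sample.extend(rng.sample(members, sizes[name]))
    return sample


# Hypothetical high school roster split by class year (made-up IDs and counts).
strata = {
    "freshman": [f"FR{i:03d}" for i in range(1, 301)],
    "sophomore": [f"SO{i:03d}" for i in range(1, 281)],
    "junior": [f"JR{i:03d}" for i in range(1, 261)],
    "senior": [f"SR{i:03d}" for i in range(1, 241)],
}
sizes = {name: 10 for name in strata}  # assumed: 10 students from each stratum

sample = stratified_sample(strata, sizes, seed=1)
print(len(sample))  # 40
```

Note that this is not an SRS of 40 students: a sample with, say, 40 freshmen can never occur, just as in the 2 girls and 2 boys class activity.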
+ Other Sampling Methods
Definition: To take a cluster sample, first divide the population into smaller groups. Ideally, these clusters should mirror the characteristics of the population. Then choose an SRS of the clusters. All individuals in the chosen clusters are included in the sample.

Cluster: Different Within and Same Between

Sampling Sunflowers
Which, rows or columns, could be used for clusters? Why?

+ Strata (Same Within group/Different Between groups): Rows or Section?
Cluster (Different Within group/Same Between groups): Rows or Section?

A Hotel on the Beach
The manager of a beach-front hotel wants to survey guests in the hotel to estimate overall customer satisfaction. The hotel has two towers, an older one to the south and a newer one to the north. Each tower has 10 floors of standard rooms (40 rooms per floor) and 2 floors of suites (20 suites per floor). Half of the rooms in each tower face the beach, while the other half of the rooms face the street. This means there are (2 towers)(10 floors)(40 rooms) + (2 towers)(2 floors)(20 suites) = 880 total rooms.
Problem:
(a) Explain how to select a simple random sample of 88 rooms.
(b) Explain how to select a stratified random sample of 88 rooms.
(c) Explain why selecting 2 of the 24 different floors would not be a good way to obtain a cluster sample.

Solution:
(a) Number each of the rooms from 001 to 880. Using a random number generator, select 88 unique random integers from 001 to 880. Use the rooms with the selected numbers for the sample.
(b) Since customer satisfaction will vary based on room location (e.g. a suite in the north tower facing the ocean is much nicer than a room in the south tower facing the street), we should take a sample from each type of room. We will take an
SRS of 20 from the 200 south tower rooms facing the beach
SRS of 20 from the 200 south tower rooms facing the street
SRS of 20 from the 200 north tower rooms facing the beach
SRS of 20 from the 200 north tower rooms facing the street
SRS of 2 from the 20 south tower suites facing the beach
SRS of 2 from the 20 south tower suites facing the street
SRS of 2 from the 20 north tower suites facing the beach
SRS of 2 from the 20 north tower suites facing the street
(c) Although it would be easy to collect the data once the floors were selected (since you only need to visit two floors), each floor is more homogeneous than heterogeneous, which is a bad thing for clusters. Ideally, each cluster should include all the different types of rooms. If you only selected 2 floors, it is fairly likely that you would get no suites in the sample or only get one of the towers in the sample.

1. Your school will send a delegation of 35 seniors to a student life convention. 200 girls and 150 boys are eligible to be chosen. If 20 girls and, separately, 15 boys are selected at random, each senior has the same chance to be chosen to attend the convention. Is it an SRS? Explain.

2. 18. Dead trees. On the west side of Rocky Mountain National Park, many mature pine trees are dying due to infestation by pine beetles. Scientists would like to use sampling to estimate the proportion of all pine trees in the area that have been infected.
(a) Explain why it wouldn’t be practical for scientists to obtain an SRS in this setting.
To obtain an SRS, every tree would need to have an equal chance of being included in the sample. It is not practical to even identify every tree in the park.
(b) A possible alternative would be to use every pine tree along the park’s main road as a sample. Why is this sampling method biased?
This sampling method is biased because these trees are unlikely to be representative of the population.
Trees along the main road are more likely to be damaged by cars and people and may be more susceptible to infestation.
(c) Suppose that a more complicated random sampling plan is carried out, and that 35% of the pine trees in the sample are infested by the pine beetle. Can scientists conclude that 35% of all the pine trees on the west side of the park are infested? Why or why not?
The scientists can be confident that the actual percent of pine trees in the area that are infected by the pine beetle is near 35%, although there is always some error associated with using sampling to estimate population parameters.
A parameter is a number that describes some characteristic of the population.

+ Inference for Sampling
The purpose of a sample is to give us information about a larger population. The process of drawing conclusions about a population on the basis of sample data is called inference.

“Margin of error” does not mean that a mistake has been made but rather compensates for the variability that results from taking a random sample from a population.

Why should we rely on random sampling?
1) To eliminate bias in selecting samples from the list of available individuals.
2) The laws of probability allow trustworthy inference about the population.
• Results from random samples come with a margin of error that sets bounds on the size of the likely error.
• Larger random samples give better information about the population than smaller samples.

• Sampling variability is described by the margin of error that comes with most poll results.
• Most sample surveys are affected by errors in addition to sampling variability. These errors can introduce bias that makes a survey result meaningless.
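The claim that larger random samples give better information can be illustrated with a small simulation. This is a sketch under stated assumptions, not a result from the slides: the true proportion of 0.35, the sample sizes 50 and 800, the 500 repetitions, and the function name `sample_proportions` are all hypothetical choices made only for illustration.

```python
import random
import statistics


def sample_proportions(true_p, n, repetitions, seed=0):
    """Simulate many random samples of size n and return each sample proportion."""
    rng = random.Random(seed)
    results = []
    for _ in range(repetitions):
        # Each sampled individual has the characteristic with probability true_p.
        hits = sum(rng.random() < true_p for _ in range(n))
        results.append(hits / n)
    return results


# Assumed population proportion of 0.35, chosen only for illustration.
small = sample_proportions(0.35, n=50, repetitions=500)
large = sample_proportions(0.35, n=800, repetitions=500)

# Sampling variability: how much the sample results bounce around from sample to sample.
spread_small = statistics.pstdev(small)
spread_large = statistics.pstdev(large)
print(round(spread_small, 3), round(spread_large, 3))
```

Both sets of sample proportions center near the assumed 0.35 because random sampling avoids bias, but the results from samples of 800 vary much less than those from samples of 50, which is why the margin of error shrinks as the sample grows. Note that the simulation covers only sampling variability, not the other survey errors mentioned above.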
Two main sources of errors in sample surveys:
• Sampling errors
• Non-sampling errors

Sampling errors
Sampling errors are mistakes made in the process of taking a sample that could lead to inaccurate information about the population.
• Random sampling error, which is described by the margin of error.
• Bad sampling methods, such as voluntary response and convenience samples. We can control these.
• Undercoverage, which occurs when some members of the population are left out of the sampling frame.

Non-sampling errors
• Nonresponse occurs when an individual chosen for the sample can’t be contacted or refuses to participate.
• Response bias occurs when people give inaccurate answers, sometimes intentionally.
• Poor wording of questions.

Don’t misuse the term “voluntary response” to explain why certain individuals don’t respond in a sample survey. Nonresponse can occur only after a sample has been selected. In a voluntary response sample, every individual has opted to take part, so there won’t be any nonresponse.

+ Section 4.2 Experiments
Learning Objectives
After this section, you should be able to…
• DISTINGUISH observational studies from experiments
• DESCRIBE the language of experiments
• APPLY the three principles of experimental design

+ http://apcentral.collegeboard.com/apc/public/repository/ap10_statistics_form_b_q2.pdf

+ Observational Study versus Experiment
In contrast to observational studies, experiments don’t just observe individuals or ask them questions. They actively impose some treatment in order to measure the response.
Definition: An observational study observes individuals and measures variables of interest but does not attempt to influence the responses. An experiment deliberately imposes some treatment on individuals to measure their responses.

When our goal is to understand cause and effect, experiments are the only source of fully convincing data. Let me repeat… When our goal is to understand cause and effect, experiments are the only source of fully convincing data.
The distinction between observational study and experiment is one of the most important in statistics.

We think that car weight helps explain accident deaths and that smoking influences life expectancy.
Car weight -> accident deaths
Smoking -> life expectancy
In these relationships, the two variables play different roles. Car weight and number of cigarettes smoked are the explanatory variables. Accident death rate and life expectancy are the response variables.
An explanatory variable may help explain or influence changes in a response variable.
A response variable measures an outcome of a study.

Soy Good For You?
The November 2009 issue of Nutrition Action discusses what the current research tells us about the supposed benefits of soy. For a long time, scientists have believed that the soy foods in Asian diets explain the lower rates of breast cancer, prostate cancer, osteoporosis, and heart disease in places like China and Japan. However, when experiments were conducted, soy either had no effect or a very small effect on the health of the participants. For example, several different studies randomly assigned elderly women either to soy or placebo, and none of the studies showed that soy was more beneficial for preventing osteoporosis. So what explains the lower rates of osteoporosis in Asian cultures? We still don’t know. It could be due to genetics, other dietary factors, or any other difference between Asian cultures and non-Asian cultures.
An explanatory variable may help explain or influence changes in a response variable. Here: whether the woman consumed soy.
A response variable measures an outcome of a study. Here: the overall health of the woman.

Weight and Height
Julie wants to know if she can predict a student’s weight from his or her height. Information about height is easier to obtain than information about weight! Jim wants to know if there is a relationship between height and weight.
Problem: For each student, identify the explanatory variable and response variable, if possible.
Solution: Julie is treating a student’s height as the explanatory variable and the student’s weight as the response variable. Jim is just interested in exploring the relationship between the two variables, so there is no clear explanatory or response variable.

+ Observational Study versus Experiment
Observational studies of the effect of one variable on another often fail because of confounding between the explanatory variable and one or more lurking variables.
Definition: A lurking variable is a variable that is not among the explanatory or response variables in a study but that may influence the response variable. Confounding occurs when two variables are associated in such a way that their effects on a response variable cannot be distinguished from each other.
Well-designed experiments take steps to avoid confounding.

For example, ice cream consumption and murder rates are highly correlated. Now, does ice cream incite murder or does murder increase the demand for ice cream? Neither: they are joint effects of a common cause or lurking variable, namely, hot weather. Another look at the sample shows that it failed to account for the time of year, including the fact that both rates rise in the summertime.

What could be a lurking variable in these examples?
• There is a strong positive correlation between the foot length of K-12 students and reading scores.
• Students who use tutors have lower test scores than students who don’t.
• A survey shows a strong positive correlation between the percentage of a country's inhabitants that use cell phones and the life expectancy in that country.

+ The Language of Experiments
Definition: A specific condition applied to the individuals in an experiment is called a treatment. If an experiment has several explanatory variables, a treatment is a combination of specific values of these variables.
An experiment is a statistical study in which we actually do something (a treatment) to people, animals, or objects (the experimental units) to observe the response. Here is the basic vocabulary of experiments.
The experimental units are the smallest collection of individuals to which treatments are applied. When the units are human beings, they often are called subjects.
Sometimes, the explanatory variables in an experiment are called factors. Many experiments study the joint effects of several factors. In such an experiment, each treatment is formed by combining a specific value (often called a level) of each of the factors.

+ How to Experiment Badly
Experiments are the preferred method for examining the effect of one variable on another. By imposing the specific treatment of interest and controlling other influences, we can pin down cause and effect. Good designs are essential for effective experiments, just as they are for sampling.

Example, page 236
A high school regularly offers a review course to prepare students for the SAT. This year, budget cuts will allow the school to offer only an online version of the course. Over the past 10 years, the average SAT score of students in the classroom course was 1620. The online group gets an average score of 1780. That’s roughly 10% higher than the long-time average for those who took the classroom review course. Is the online course more effective?
Students -> Online Course -> SAT Scores

+ How to Experiment Badly
Many laboratory experiments use a design like the one in the online SAT course example:
Experimental Units -> Treatment -> Measure Response
Students -> Online Course -> SAT Scores
In the lab environment, simple designs often work well. Field experiments and experiments with animals or people deal with more variable conditions. Outside the lab, badly designed experiments often yield worthless results because of confounding.
+ How to Experiment Well: The Randomized Comparative Experiment
The remedy for confounding is to perform a comparative experiment in which some units receive one treatment and similar units receive another. Most well-designed experiments compare two or more treatments. Comparison alone isn’t enough; if the treatments are given to groups that differ greatly, bias will result. The solution to the problem of bias is random assignment.
Definition: In an experiment, random assignment means that experimental units are assigned to treatments at random, that is, using some sort of chance process.

ACU # Does Caffeine Affect Pulse Rate?
Many students regularly consume caffeine to help them stay alert. Thus, it seems plausible that taking caffeine might increase an individual’s pulse rate. Is this true? One way to investigate this is to have volunteers measure their pulse rates, drink some cola with caffeine, measure their pulses again after 10 minutes, and calculate the increase in pulse rate. Unfortunately, even if every student’s pulse rate went up, we couldn’t attribute the increase to caffeine.
Suppose you have a class of 30 students who volunteer to be subjects in the caffeine experiment described earlier.
Problem: Explain how you would randomly assign 15 students to each of the two treatments.
Solution: Using 30 identical slips of paper, write A on 15 and B on the other 15. Mix them thoroughly in a hat and have each student select one slip. Have each student who received an A drink the cola with caffeine and each student who received a B drink the cola without caffeine.

+ The Randomized Comparative Experiment
Definition: In a completely randomized design, the treatments are assigned to all the experimental units completely by chance.
Some experiments may include a control group that receives an inactive treatment or an existing baseline treatment.
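The slips-of-paper procedure for the caffeine experiment amounts to shuffling the subjects and dealing them into equal groups. A minimal Python sketch, assuming made-up subject names and a hypothetical helper called `randomly_assign`:

```python
import random


def randomly_assign(units, treatments, seed=None):
    """Shuffle the units, then deal them into equal-sized treatment groups."""
    if len(units) % len(treatments) != 0:
        raise ValueError("units must divide evenly among the treatments")
    rng = random.Random(seed)
    shuffled = list(units)      # copy so the original roster is left untouched
    rng.shuffle(shuffled)       # the chance process: every ordering is equally likely
    size = len(shuffled) // len(treatments)
    return {
        treatment: shuffled[i * size:(i + 1) * size]
        for i, treatment in enumerate(treatments)
    }


# 30 volunteer students (placeholder names) and the two colas.
students = [f"student_{i:02d}" for i in range(1, 31)]
groups = randomly_assign(students, ["caffeine", "no caffeine"], seed=42)

for treatment, members in groups.items():
    print(treatment, len(members))
```

The seed is there only so the sketch is reproducible; in a real experiment you would leave it unset. Passing a longer list of treatments splits the units into more groups in the same way.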
[Diagram: Experimental Units -> Random Assignment -> Group 1 -> Treatment 1 and Group 2 -> Treatment 2 -> Compare Results]

ACU # Dueling Diets
A health organization wants to know if a low-carb or low-fat diet is more effective for long-term weight loss. The organization decides to conduct an experiment to compare these two diet plans with a control group that is only provided with a brochure about healthy eating. Ninety volunteers agree to participate in the study for one year.
Problem: Outline a completely randomized design for this experiment. Write a few sentences describing how you would implement your design.
Solution: Here is a basic outline:
Random Assignment -> Group 1 (30 subjects) -> Trt 1 (low-carb)
Random Assignment -> Group 2 (30 subjects) -> Trt 2 (low-fat)
Random Assignment -> Group 3 (30 subjects) -> Trt 3 (control)
Then compare weight loss.
To implement the design, use 90 equally sized slips of paper. Label 30 of the slips “1”, 30 of the slips “2” and 30 of the slips “3”. Then, mix them up in a hat and have each subject draw a number without looking. The number that each subject chooses will be the group he or she is assigned to. At the end of the year, the amount of weight loss will be recorded for each subject and the mean weight loss will be compared for the three treatments.

+ Three Principles of Experimental Design
Randomized comparative experiments are designed to give good evidence that differences in the treatments actually cause the differences we see in the response.
Principles of Experimental Design
1. Control for lurking variables that might affect the response: Use a comparative design and ensure that the only systematic difference between the groups is the treatment administered.
2. Random assignment: Use impersonal chance to assign experimental units to treatments. This helps create roughly equivalent groups of experimental units by balancing the effects of lurking variables that aren’t controlled on the treatment groups.
3.
Replication: Use enough experimental units in each group so that any differences in the effects of the treatments can be distinguished from chance differences between the groups.

More Caffeine
Problem: Explain how to use all three principles of experimental design in the caffeine experiment.
Solution:
Control: There should be a control group that receives non-caffeinated cola. Also, the subjects in each group should receive exactly the same amount of cola served at the same temperature. Also, each type of cola should look and taste exactly the same and have the same amount of sugar. Subjects should drink the cola at the same rate and wait the same amount of time before measuring their pulse rates. If all of these lurking variables are controlled, they will not be confounded with caffeine or be an additional source of variability in pulse rates.
Randomization: Subjects should be randomly assigned to one of the two treatments. This should roughly balance out the effects of the lurking variables we cannot control, such as body size, caffeine tolerance, and the amount of food recently eaten.
Replication: We want to use as many subjects as possible to help make the treatment groups as equivalent as possible. This will give us a better chance to see the effects of caffeine, if there are any.

+ Example: The Physicians’ Health Study
Read the description of the Physicians’ Health Study on page 241. Explain how each of the three principles of experimental design was used in the study.
A placebo is a “dummy pill” or inactive treatment that is indistinguishable from the real treatment.

+ Experiments: What Can Go Wrong?
The logic of a randomized comparative experiment depends on our ability to treat all the subjects the same in every way except for the actual treatments being compared. Good experiments, therefore, require careful attention to details to ensure that all subjects really are treated identically.
A response to a dummy treatment is called a placebo effect. The strength of the placebo effect is a strong argument for randomized comparative experiments. Whenever possible, experiments with human subjects should be double-blind.
Definition: In a double-blind experiment, neither the subjects nor those who interact with them and measure the response variable know which treatment a subject received.

The caffeine experiment can be conducted in a double-blind manner. It is very important that the subjects be unaware of which treatment they are receiving. To make sure they are blind, the two colas must look and taste exactly the same. To be double-blind, the people measuring the pulse rates and interacting with the subjects should also be unaware of which subjects are getting which treatments. To make this happen, have the subjects measure their own pulse since they are already blind. Also have another teacher prepare the colas with labels A and B and then leave the room before anyone else gets there. Only after the experiment is complete will this teacher come back to reveal which treatment is which.

+ Inference for Experiments
In an experiment, researchers usually hope to see a difference in the responses so large that it is unlikely to happen just because of chance variation. We can use the laws of probability, which describe chance behavior, to learn whether the treatment effects are larger than we would expect to see if only chance were operating. If they are, we call them statistically significant.
Definition: An observed effect so large that it would rarely occur by chance is called statistically significant.
A statistically significant association in data from a well-designed experiment does imply causation.

Growing Tomatoes
Does adding fertilizer affect the productivity of tomato plants? How about the amount of water given to the plants?
To answer these questions, a gardener plants 24 similar tomato plants in identical pots in his greenhouse. He will add fertilizer to the soil in half of the pots. Also, he will water 8 of the plants with 0.5 gallons of water per day, 8 of the plants with 1 gallon of water per day, and the remaining 8 plants with 1.5 gallons of water per day. At the end of three months he will record the total weight of tomatoes produced on each plant.

Problem: Identify the explanatory and response variables and the experimental units, and list all the treatments.

Solution: The two explanatory variables are fertilizer and water. The response variable is the weight of tomatoes produced. The experimental units are the tomato plants. There are 6 treatments:
(1) fertilizer, 0.5 gallons
(2) fertilizer, 1 gallon
(3) fertilizer, 1.5 gallons
(4) no fertilizer, 0.5 gallons
(5) no fertilizer, 1 gallon
(6) no fertilizer, 1.5 gallons

Completely randomized designs are the simplest statistical designs for experiments. But just as with sampling, there are times when the simplest method doesn’t yield the most precise results.

Definition: A block is a group of experimental units that are known before the experiment to be similar in some way that is expected to affect the response to the treatments.
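Going back to the tomato example, the six treatments come from crossing the two fertilizer levels with the three watering levels. The Python sketch below lists them and then assigns plants in a completely randomized way. It rests on assumptions that are not in the slides: the plants are numbered 1 to 24, each treatment gets exactly four plants (the slides give only the totals, 12 fertilized pots and 8 plants per watering level), and the names `treatments` and `assign_plants` are made up for illustration.

```python
import itertools
import random

fertilizer_levels = ["fertilizer", "no fertilizer"]
water_levels = [0.5, 1.0, 1.5]  # gallons of water per day

# Every combination of one fertilizer level and one watering level.
treatments = list(itertools.product(fertilizer_levels, water_levels))


def assign_plants(n_plants=24, seed=1):
    """Randomly split plants 1..n_plants into equal groups, one per treatment.

    Assumption: equal group sizes (24 / 6 = 4 plants per treatment).
    """
    rng = random.Random(seed)  # seeded only so the example is repeatable
    plants = list(range(1, n_plants + 1))
    rng.shuffle(plants)
    per_group = n_plants // len(treatments)
    return {
        treatment: sorted(plants[i * per_group:(i + 1) * per_group])
        for i, treatment in enumerate(treatments)
    }


assignment = assign_plants()
for (fertilizer, gallons), plants in assignment.items():
    print(f"{fertilizer}, {gallons} gal/day: plants {plants}")
```

Shuffling the plant numbers and cutting the shuffled list into equal slices plays the same role as drawing labeled slips from a hat: every plant is equally likely to land in any treatment group.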
Blocking
In a randomized block design, the random assignment of experimental units to treatments is carried out separately within each block. Form blocks based on the most important unavoidable sources of variability (lurking variables) among the experimental units. Randomization will average out the effects of the remaining lurking variables and allow an unbiased comparison of the treatments. Control what you can, block on what you can’t control, and randomize to create comparable groups.

Matched-Pairs Design
A common type of randomized block design for comparing two treatments is a matched-pairs design. The idea is to create blocks by matching pairs of similar experimental units.

Definition: A matched-pairs design is a randomized blocked experiment in which each block consists of a matching pair of similar experimental units. Chance is used to determine which unit in each pair gets each treatment. Sometimes, a “pair” in a matched-pairs design consists of a single unit that receives both treatments. Since the order of the treatments can influence the response, chance is used to determine which treatment is applied first for each unit.

Example: Standing and Sitting Pulse Rate
Consider the Fathom dotplots from a completely randomized design and a matched-pairs design. What do the dotplots suggest about standing vs. sitting pulse rates?

Distracted Driving: Are these results statistically significant?
How many subjects in the experiment? How many drivers stopped at the rest area? How many drivers did not stop?

We need 48 cards from the deck to represent the drivers. Since we’re assuming that the treatment received won’t change whether each driver stops at the rest area, we use 33 cards to represent drivers who stop and 15 cards to represent those who don’t. Remove the ace of spades and any three of the 2s from the deck.
Stop: all cards with denominations 2 through 10 (36 - 3 missing 2s = 33)
Don’t stop: all jacks, queens, kings, and aces (16 - 1 missing ace = 15)

Activity: Distracted Driving
Are these results statistically significant? To find out, let’s see what would happen just by chance if we randomly reassign the 48 people in this experiment to the two groups many times, assuming the treatment received doesn’t affect whether a driver stops at the rest area.

Pod roles for the activity (with each trial, switch roles by going clockwise): Shuffler/Dealer, Flipper, Recorder/Dot plotter. With only two in a pod: Shuffler/Dealer/Flipper and Recorder/Dot plotter.

• Shuffle and deal two piles of 24 cards each. The first pile represents the cell phone group and the second pile represents the passenger group. The shuffling reflects our assumption that the outcome for each subject is not affected by the treatment. Record the number of drivers who fail to stop at the rest area in each group.
• Repeat this process 9 more times so that you have a total of 10 trials.
• Make a class dotplot of the number of drivers in the cell phone group who failed to stop at the rest area in each trial.

In what percent of the class’s trials did 12 or more people in the cell phone group fail to stop at the rest area? In the original experiment, 12 of the 24 drivers using cell phones didn’t stop at the rest area. Based on the class’s simulation results, how surprising would it be to get a result this large or larger simply due to the chance involved in the random assignment? Is the result statistically significant?
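The card activity can also be run on a computer. The Python sketch below mirrors the deck described above: 15 cards for drivers who fail to stop, 33 for drivers who stop, and a first pile of 24 cards for the cell phone group. The number of trials (10,000 instead of the class’s handful), the fixed seed, and the function names are assumptions chosen for illustration. The second function works out the exact chance of the same event by counting, as a cross-check on the simulation.

```python
import random
from math import comb

FAILED_TO_STOP = 15  # jacks, queens, kings, and aces, minus one ace
STOPPED = 33         # cards 2 through 10, minus three 2s
GROUP_SIZE = 24      # cards dealt to the cell phone pile


def simulate_trials(n_trials=10_000, seed=0):
    """Return, for each trial, how many cell phone drivers failed to stop."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    deck = [1] * FAILED_TO_STOP + [0] * STOPPED  # 1 = failed to stop
    results = []
    for _ in range(n_trials):
        rng.shuffle(deck)  # random reassignment of drivers to groups
        results.append(sum(deck[:GROUP_SIZE]))  # first pile = cell phone group
    return results


def exact_tail_probability(at_least=12):
    """Exact chance that the first pile holds `at_least` or more failures."""
    total = comb(FAILED_TO_STOP + STOPPED, GROUP_SIZE)
    ways = sum(
        comb(FAILED_TO_STOP, k) * comb(STOPPED, GROUP_SIZE - k)
        for k in range(at_least, FAILED_TO_STOP + 1)
    )
    return ways / total


results = simulate_trials()
share = sum(r >= 12 for r in results) / len(results)
print(f"Simulated share of trials with 12 or more: {share:.2%}")
print(f"Exact probability under chance alone: {exact_tail_probability():.2%}")
```

A small share here means that a result as extreme as 12 of 24 would rarely happen through the random assignment alone, which is what the definition of statistical significance asks about.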
What conclusion would you draw about whether talking on a cell phone is more distracting than talking to a passenger?

Chapter 4 Designing Studies
4.1 Samples and Surveys
4.2 Experiments
4.3 Using Studies Wisely

Scope of Inference
What type of inference can be made from a particular study? The answer depends on the design of the study.
Well-designed experiments randomly assign individuals to treatment groups. However, most experiments don’t select experimental units at random from the larger population. That limits such experiments to inference about cause and effect. Observational studies don’t randomly assign individuals to groups, which rules out inference about cause and effect. Observational studies that use random sampling can make inferences about the population.

Example: Silence is Golden?
Many students insist that they study better when listening to music. A teacher doubts this claim and suspects that listening to music actually hurts academic performance. Here are four possible study designs to address this question at your school. In each case, the response variable will be the students’ GPA at the end of the semester.

1. Get all of the students in your AP Statistics class to participate in the study. Ask them whether or not they study with music on and divide them into two groups based on their answer to this question.

For each design, suppose that the mean GPA for students who listen to music was significantly lower than the mean GPA of students who didn’t listen to music.

Problem: What can we conclude for each design?

1.
With no random selection, the results of the study should only be applied to the AP Statistics students in the study. With no random assignment, we should not conclude anything about cause and effect. All we can conclude is that AP Stats students who listen to music while studying have lower GPAs than those who do not listen to music. We don’t know why, and we can’t apply these results to any larger group of students.

2. Select a random sample of students from your school to participate in a study. Then divide them into two groups as in Design 1.

Solution: With random selection, the results of the study can be applied to the entire population, in this case all the students at this school. With no random assignment, however, we should not conclude anything about cause and effect. All we can conclude is that students at this school who listen to music while studying have lower GPAs than those who do not listen to music. We don’t know why their GPAs are lower, however.

3. Get all of the students in your AP Statistics class to participate in a study.
Randomly assign half of the students to listen to music while studying for the entire semester and have the remaining half abstain from listening to music while studying.

Solution: With no random selection, the results of the study should only be applied to the AP Statistics students in the study. With random assignment, however, we can conclude that there is a cause-and-effect relationship between listening to music while studying and GPA, but only for the AP Statistics students who took part in the study.

4. Select a random sample of students from your school to participate in a study. Randomly assign half of the students to listen to music while studying for the entire semester and have the remaining half abstain from listening to music while studying.

Solution: With random selection, the results of the study can be applied to the entire population, in this case all the students at this school. With random assignment, we can conclude that there is a cause-and-effect relationship between listening to music while studying and GPA for all the students at the school.
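The four designs differ on only two yes/no questions: were the students randomly selected, and were they randomly assigned to treatments? The Python sketch below encodes that pattern as a small lookup. The function name `scope_of_inference` and the wording of the returned phrases are illustrative assumptions, and the rule is a simplification of the reasoning in the solutions above, not a substitute for it.

```python
def scope_of_inference(random_selection, random_assignment):
    """Summarize what a study design supports, following the four designs above."""
    # Random selection decides who the results can be applied to.
    who = (
        "all students at the school"
        if random_selection
        else "only the students in the study"
    )
    # Random assignment decides whether cause and effect can be concluded.
    what = "cause and effect" if random_assignment else "association only"
    return f"{what}, applying to {who}"


# (random selection?, random assignment?) for Designs 1 through 4
designs = {
    1: (False, False),
    2: (True, False),
    3: (False, True),
    4: (True, True),
}

for number, (selected, assigned) in designs.items():
    print(f"Design {number}: {scope_of_inference(selected, assigned)}")
```

Keeping the two questions separate makes the main point visible: random selection and random assignment answer different questions, and a study needs both to support a cause-and-effect conclusion about the whole population.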
The Challenges of Establishing Causation
A well-designed experiment tells us that changes in the explanatory variable cause changes in the response variable. Lack of realism can limit our ability to apply the conclusions of an experiment to the settings of greatest interest.

Animal Testing and Lack of Realism
When new products are being developed for use by humans, the products are often tested on animals first. While animals share some physiological features with humans, they are not the same, and we should always be cautious about applying the results of tests on animals to humans.

In some cases it isn’t practical or ethical to do an experiment. Consider these questions:
• Does texting while driving increase the risk of having an accident?
• Does going to church regularly help people live longer?
• Does smoking cause lung cancer?

It is sometimes possible to build a strong case for causation in the absence of experiments by considering data from observational studies. When we can’t do an experiment, we can use the following criteria for establishing causation:
• The association is strong.
• The association is consistent.
• Larger values of the explanatory variable are associated with stronger responses.
• The alleged cause precedes the effect in time.
• The alleged cause is plausible.

Discuss how each of these criteria applies to the observational studies of the relationship between smoking and lung cancer.

Basic Data Ethics
• All planned studies must be reviewed in advance by an institutional review board charged with protecting the safety and well-being of the subjects.
• All individuals who are subjects in a study must give their informed consent before data are collected.
• All individual data must be kept confidential. Only statistical summaries for groups of subjects may be made public.

Complex issues of data ethics arise when we collect data from people.
The standards listed above are basic standards of data ethics that must be obeyed by all studies that gather data from human subjects, both observational studies and experiments.