Transcript Chapter 5
Chapter 2 Sampling Design

How do we gather data?
• Surveys
• Opinion polls
• Interviews
• Studies
  – Observational
  – Retrospective (past)
  – Prospective (future)
• Experiments

Population
• the entire group of individuals that we want information about

Census
• a complete count of the population

How good is a census?
Do frog fairy tale . . .
The answer is 83!

Why would we not use a census all the time?
1) Not accurate
   Look at the U.S. census – it has a huge amount of error in it; plus it takes a long time to compile the data, making the data obsolete by the time we get it!
2) Very expensive
   Since taking a census of any population takes time, censuses are VERY costly to do!
3) Perhaps impossible
   Suppose you wanted to know the average weight of the white-tail deer population in Texas – would it be feasible to do a census?
4) If using destructive sampling, you would destroy the population
   • Breaking strength of soda bottles
   • Lifetime of flashlight batteries
   • Safety ratings for cars

Sample
• A part of the population that we actually examine in order to gather information
• Use sample to generalize to population

Sampling design
• refers to the method used to choose the sample from the population

Sampling frame
• a list of every individual in the population

Jelly Blubber Activity
• Select 10 Jelly blubbers that you think are representative of the population of blubbers in regards to length.
• Find the mean length of your sample

Simple Random Sample (SRS)
Suppose we were to take an SRS of 100 SHS students – put each student's name in a hat. Then randomly select 100 names from the hat. Each student has the same chance to be selected!
Not only does each student have the same chance to be selected – but every possible group of 100 students has the same chance to be selected! Therefore, it has to be possible for all 100 students to be seniors in order for it to be an SRS!
• consists of n individuals from the population chosen in such a way that
  – every individual has an equal chance of being selected
  – every set of n individuals has an equal chance of being selected

Stratified random sample
• population is divided into homogeneous groups called strata
• SRS's are pulled from each stratum
Homogeneous groups are groups that are alike based upon some characteristic of the group members.
Suppose we were to take a stratified random sample of 100 SHS students. Since students are already divided by grade level, grade level can be our strata. Then randomly select 50 seniors and randomly select 50 juniors.

Systematic random sample
• select sample by following a systematic approach
• randomly select where to begin
Suppose we want to do a systematic random sample of SHS students - number a list of students (There are approximately 2000 students – if we want a sample of 100, 2000/100 = 20)
Select a number between 1 and 20 at random. That student will be the first student chosen, then choose every 20th student from there.

Cluster Sample
• based upon location
• randomly pick a location & sample all there
Suppose we want to do a cluster sample of SHS students. One way to do this would be to randomly select 10 classrooms during 2nd period. Sample all students in those rooms!

For the Jelly Blubber colony: μ = 19.41

Multistage sample
To use a multistage approach to sampling SHS students, we could first divide 2nd period classes by level (AP, Honors, Regular, etc.) and randomly select 4 second period classes from each group. Then we could randomly select 5 students from each of those classes. The selection process is done in stages!
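The SHS examples for each design (SRS, stratified, systematic, cluster, multistage) can be sketched in Python. This is only a minimal sketch under stated assumptions: the student ID numbers, the junior/senior split, the 80 classrooms of 25 students, and the assignment of rooms to levels are all made up for illustration and do not come from the slides.

```python
import random

rng = random.Random(5)  # fixed seed only so the sketch is repeatable

# Hypothetical sampling frame: student ID numbers 1..2000.
students = list(range(1, 2001))

def srs(frame, n):
    """SRS: every possible group of n individuals is equally likely."""
    return rng.sample(frame, n)

def stratified(strata, n_each):
    """Pull a separate SRS of n_each individuals from every stratum."""
    return {name: srs(members, n_each) for name, members in strata.items()}

def systematic(frame, n):
    """Random start among the first k individuals, then every k-th one."""
    k = len(frame) // n           # 2000 // 100 = 20
    start = rng.randrange(k)      # index 0..k-1, i.e. student 1..20
    return frame[start::k][:n]

def cluster(groups, n_groups):
    """Randomly choose whole groups and sample everyone in them."""
    chosen = srs(sorted(groups), n_groups)
    return [person for g in chosen for person in groups[g]]

def multistage(levels, n_classes, n_students):
    """Stage 1: SRS of classes in each level. Stage 2: SRS of students in each class."""
    sample = []
    for classes in levels.values():
        for room in srs(sorted(classes), n_classes):
            sample.extend(srs(classes[room], n_students))
    return sample

# Assumed split: IDs 1-1000 are juniors, 1001-2000 are seniors.
strata = {"junior": students[:1000], "senior": students[1000:]}
# Assumed classrooms: 80 second-period rooms of 25 students each.
rooms = {r: students[r * 25:(r + 1) * 25] for r in range(80)}
# Assumed levels: rooms 0-19 AP, 20-39 Honors, 40-79 Regular.
levels = {
    "AP": {r: rooms[r] for r in range(0, 20)},
    "Honors": {r: rooms[r] for r in range(20, 40)},
    "Regular": {r: rooms[r] for r in range(40, 80)},
}

srs_sample = srs(students, 100)
strat_sample = stratified(strata, 50)      # 50 juniors and 50 seniors
sys_sample = systematic(students, 100)     # every 20th student
cluster_sample = cluster(rooms, 10)        # everyone in 10 random rooms
multi_sample = multistage(levels, 4, 5)    # 4 classes per level, 5 students each

print(len(srs_sample), len(sys_sample), len(cluster_sample), len(multi_sample))
```

The systematic sample always contains IDs exactly 20 apart. The cluster sample has 250 students here only because every assumed room holds 25; with real classrooms the sample size would vary with the rooms chosen.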
• select successively smaller groups within the population in stages
• SRS used at each stage

SRS
• Advantages
  – Unbiased
  – Easy
• Disadvantages
  – Large variance
  – May not be representative
  – Must have sampling frame (list of population)

Stratified
• Advantages
  – More precise unbiased estimator than SRS
  – Less variability
  – Cost reduced if strata already exist
• Disadvantages
  – Difficult to do if you must divide stratum
  – Formulas for SD & confidence intervals are more complicated
  – Need sampling frame

Systematic Random Sample
• Advantages
  – Unbiased
  – Ensure that the sample is spread across population
  – More efficient, cheaper, etc.
• Disadvantages
  – Large variance
  – Can be confounded by trend or cycle
  – Formulas are complicated

Cluster Samples
• Advantages
  – Unbiased
  – Cost is reduced
  – Sampling frame may not be available (not needed)
• Disadvantages
  – Clusters may not be representative of population
  – Formulas are complicated

Identify the sampling design
1) The Educational Testing Service (ETS) needed a sample of colleges. ETS first divided all colleges into groups of similar types (small public, small private, etc.) Then they randomly selected 3 colleges from each group.
Stratified random sample

Identify the sampling design
2) A county commissioner wants to survey people in her district to determine their opinions on a particular law up for adoption. She decides to randomly select blocks in her district and then survey all who live on those blocks.
Cluster sampling

Identify the sampling design
3) A local restaurant manager wants to survey customers about the service they receive. Each night the manager randomly chooses a number between 1 & 10. He then gives a survey to that customer, and to every 10th customer after them, to fill it out before they leave.
Systematic random sampling

Random digit table
Numbers can be read across.
Numbers can be read vertically.
Numbers can be read diagonally.
The following is part of the random digit table found on page 847 of your
textbook:
Row  8 5 1 4 5 1 0 3 3 7 2 4 2 5 5 8 0 4 5 7 3 8 9 9 3 4 3 5 0 6 1 0 3
• each entry is equally likely to be any of the 10 digits
• digits are independent of each other

Suppose your population consisted of these 20 people:
1) Aidan
2) Bob
3) Chico
4) Doug
5) Edward
6) Fred
7) Gloria
8) Hannah
9) Israel
10) Jung
11) Kathy
12) Lori
13) Matthew
14) Nan
15) Opus
16) Paul
17) Shawnie
18) Tracy
19) Uncle Sam
20) Vernon
Use the following random digits to select a sample of five from these people.
We will need to use double digit random numbers, ignoring any number greater than 20. Start with Row 1 and read across.
Row  1 4 5 1 8 0 5 1 3 7 1 2 0 1 5 5 8 0 1 5 7 0 3 8 9 9 3 4 3 5 0 6 3
Ignore. Ignore. Ignore. Ignore.
Stop when five people are selected. So my sample would consist of: Aidan, Edward, Matthew, Opus, and Tracy

Bias
• A systematic error in measuring the estimate
• favors certain outcomes
Anything that causes the data to be wrong! It might be attributed to the researchers, the respondent, or to the sampling method!

Sources of Bias
• things that can cause bias in your sample
• cannot do anything with bad data

Voluntary response
• People choose to respond
• Usually only people with very strong opinions respond
An example would be the surveys in magazines that ask readers to mail in the survey. Other examples are call-in shows, American Idol, etc.
Remember – the way to determine voluntary response is: Self-selection!!
Remember, the respondent selects themselves to participate in the survey!

Convenience sampling
The data obtained by a convenience sample will be biased – however this method is often used for surveys & results reported in newspapers and magazines!
An example would be stopping friendly-looking people in the mall to survey. Another example is the surveys left on tables at restaurants - a convenient method!
• Ask people who are easy to ask
• Produces biased results

Undercoverage
• some groups of the population are left out of the selection process
Suppose you take a sample by randomly selecting names from the phone book – some groups will not have the opportunity of being selected!
People with unlisted phone numbers – usually high-income families
People without phone numbers – usually low-income families
People with ONLY cell phones – usually young adults

Nonresponse
• occurs when an individual chosen for the sample can't be contacted or refuses to cooperate
• telephone surveys 70% nonresponse
Because of huge telemarketing efforts in the past few years, telephone surveys have a MAJOR problem with nonresponse!
People are chosen by the researchers, BUT refuse to participate. NOT self-selected!
One way to help with the problem of nonresponse is to make follow-up contact with the people who are not home when you first contact them.
This is often confused with voluntary response!

Response bias
• occurs when the behavior of respondent or interviewer causes bias in the sample
• wrong answers
Suppose we wanted to survey high school students on drug abuse and we used a uniformed police officer to interview each student in our sample – would we get honest answers?
Response bias occurs when for some reason (interviewer's or respondent's fault) you get incorrect answers.

Wording of the Questions
• wording can influence the answers that are given
• connotation of words
• use of "big" words or technical words
Questions must be worded as neutrally as possible to avoid influencing the response.
The level of vocabulary should be appropriate for the population you are surveying
  – if surveying Podunk, TX, then you should avoid complex vocabulary.
  – if surveying doctors, then use more complex, technical wording.

Source of Bias?
1) Before the presidential election of 1936 (FDR against Republican Alf Landon), the magazine Literary Digest predicted that Landon would win the election in a 3-to-2 victory.
The Digest's prediction came from a survey of 2.8 million people. George Gallup surveyed only 50,000 people and predicted that Roosevelt would win. The Digest's survey came from magazine subscribers, car owners, telephone directories, etc.
Undercoverage – since the Digest's survey comes from car owners, etc., the people selected were mostly from high-income families thus mostly Republican! (other answers are possible)

2) Suppose that you want to estimate the total amount of money spent by students on textbooks each semester at SMU. You collect register receipts for students as they leave the bookstore during lunch one day.
Convenience sampling – easy way to collect data
or
Undercoverage – students who buy books from on-line bookstores are not included.

3) To find the average value of a home in Fort Smith, one averages the price of homes that are listed for sale with a realtor.
Undercoverage – leaves out homes that are not for sale or homes that are listed with different realtors. (other answers are possible)

4) A new and somewhat controversial polling procedure that replaces the phone with the Internet is being used to conduct surveys. One criticism is that Internet users as a whole are still too highly educated and urban to produce results that accurately reflect all Americans.

5) "More than half of California's doctors say they are so frustrated with managed care they will quit, retire early, or leave the state." This statement comes from a survey conducted by the California Medical Association in which surveys were sent to 19,000 doctors and 2000 completed surveys were returned.
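The two counts quoted in item 5 lend themselves to a quick calculation. The following is a minimal sketch that uses only those figures; the variable names are illustrative, and the slides do not state an answer for this item.

```python
# Counts quoted in item 5 of the slides.
surveys_sent = 19_000
surveys_returned = 2_000

response_rate = surveys_returned / surveys_sent
nonresponse_rate = 1 - response_rate

print(f"response rate:    {response_rate:.1%}")     # 10.5%
print(f"nonresponse rate: {nonresponse_rate:.1%}")  # 89.5%
```

Roughly nine out of ten doctors who were sent the survey did not return it, which is worth keeping in mind when reading the "more than half" claim.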