Transcript Document
Designs for Research: The Xs and Os Framework
Research Methods for Public Administrators
Dr. Gail Johnson
Dr. Johnson, www.ResearchDemysified.org

Steps in the Research Process: Planning
1. Determining Your Questions
2. Identifying Your Measures and Measurement Strategy
3. Selecting a Research Design
4. Developing Your Data Collection Strategy
5. Identifying Your Analysis Strategy
6. Reviewing and Testing Your Plan

Narrow Definition of Design
While the overall research plan is sometimes called a "design," this discussion focuses on the narrow definition.
The narrow definition focuses on three design elements.

Three Design Elements
1. When measures are taken: after; before and after; or multiple times before and/or after
2. Whether there are comparison groups
3. Whether there is random assignment to comparison groups

Three Broad Categories for Research Design
Experimental
Quasi-Experimental
Non-Experimental

Experimental Design
The best design to use for cause-effect questions because it rules out most other possible explanations for the results obtained.
Random assignment assures that the two groups are comparable.

The Xs and Os Framework
R indicates Random assignment to the treatment group or the comparison group.
O is the Observation, that is, the measure for the dependent variable (examples: earnings, weight, test scores, stock market trading, reported crime rate, kilowatt hours, reported discrimination, poverty rate, number of people unemployed, GDP, etc.).
The researchers are looking to see if these measures change because of the treatment.

The Xs and Os Framework
X is the treatment, which may be:
A particular medication
A particular exercise regimen
A program (e.g.
Head Start Program or Troubled Asset Relief Program)
An independent variable (e.g. economic news stories, sunspot activity, a change in daylight saving time, etc.)

Example: Which approach works better for learning statistics: using computer software or calculating formulas by hand?
Experimental Design:
Create comparison groups. Group 1: computers to do formulas. Group 2: no computers.
Randomly assign students into the two groups.
Observe: test scores before and after.

Using the Xs and Os Framework
R  O1  X  O2
R  O1     O2
R indicates Random assignment.
O is the Observation (test scores): testing statistical knowledge before and after.
X is the treatment (in this case, the use of computers).

Experimental Design: Xs and Os Variation: No Pre-Measure
Sometimes it is not possible to have a pre-measure.
For example: I am testing whether a welfare-to-work training program results in people getting jobs with above-poverty wages. I can randomly assign people to the program or the control group, but I will not have a good measure of wages before they entered the program, since they are all on welfare.

Experimental Design: Xs and Os Notation Variation: No Pre-Measure
(Note: there are no observations before the treatment.)
R  X  O2
R     O2

Quasi-Experimental Designs
Non-Equivalent Comparison Design
Like experimental except no random assignment.
Use when you cannot control the process for deciding who gets the treatment.
Weak because there may be selection bias.
But this is often more practical in public sector research.
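The statistics-class experiment above (R O1 X O2 over R O1 O2) can be sketched as a small simulation. This is a minimal illustration, not part of the original slides: all numbers, including the assumed +5-point treatment effect, are invented.

```python
import random
import statistics

random.seed(42)

# Sketch of the R O1 X O2 design: randomly assign students to a
# treatment group (computer software) or a control group, then
# compare average post-test scores.
students = list(range(40))
random.shuffle(students)          # R: random assignment
treatment = students[:20]         # Group 1: uses computers
control = students[20:]           # Group 2: no computers

def post_test(treated):
    # Simulated O2 score: a common baseline plus an assumed +5 effect.
    return random.gauss(70, 8) + (5 if treated else 0)

o2_treatment = [post_test(True) for _ in treatment]
o2_control = [post_test(False) for _ in control]

effect = statistics.mean(o2_treatment) - statistics.mean(o2_control)
print(round(effect, 1))  # estimated treatment effect, in test-score points
```

Because assignment is random, any pre-existing differences between the two groups are due to chance alone, which is what lets the difference in O2 scores be read as the effect of X.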
Quasi-Experimental Design: Xs and Os
O1  X  O2   (treatment group)
O1     O2   (control group)
Key elements: pre- and post-measurement; treatment given to the test group; a control group (or comparison group) without the treatment; but there is no random assignment.

Quasi-Experimental Designs
Does spanking make a difference?
Can we randomly assign children to spanking and non-spanking parents? No: we have to deal with the world as it exists.
At best we can compare the behavior of children whose parents spank with that of children whose parents don't spank.

Types of Quasi-Experimental Designs
Statistical Controls (sometimes called Correlation with Statistical Controls); variations: Causal Comparative or Ex Post Facto design.
Basically: statistical procedures are used to create comparison groups.

Ex Post Facto Design: Study of Child Abuse and Neglect
A study funded by the Army Medical Research and Materiel Command reported, "During the 40 months covered by the study, 1,858 parents in 1,771 families of enlisted soldiers neglected or abused their children, in a total of 3,334 incidents involving 2,968 children. Of those, 942 incidents occurred during deployments."[1]
[1] Aaron Levin, "Children of U.S. Army soldiers face increased risk of maltreatment while a parent is deployed away from home," Psychiatric News, September 7, 2007, Volume 42, Number 17, page 8; "Child Abuse, Neglect Rise Dramatically When Army Parents Deploy To Combat," ScienceDaily, August 1, 2007, http://www.sciencedaily.com/releases/2007/07/070731175911.htm
Ex Post Facto Design: Study of Child Abuse and Neglect
In this study, the researchers gathered data about children at a child care center serving military families and compared the characteristics of those reported to have been abused or neglected with the characteristics of those who were not.
They looked backwards to see if there were differences that might explain why some children were abused and neglected.
They found that deployments were a factor.
From a policy perspective, this suggests that families need more support to handle the stresses associated with deployments.

Correlational Design with Statistical Controls
We cannot randomly assign people, but we can create comparison groups using statistical software and then compare outcomes.
E.g., we can compare people from different income groups to see if income is related to the birth weights of their babies.
E.g., we can compare citizen policy preferences to see if there are differences based on age, race, or gender.

Does Head Start Make a Difference?
Select all 8th graders from two inner-city schools.
Obtain school records, which have information about whether they attended Head Start, as well as other information.
Statistical software can divide all the 8th graders into two groups: those who attended Head Start and those who didn't.
The 8th-grade reading scores can then be compared.

Does Head Start Make a Difference?
If Head Start made a difference, then:
Their scores will be higher than the scores of those who did not attend.
Their scores will be similar to the scores of other 8th graders in the school district.
It might be possible to look at other factors, assuming the data is in their permanent records: education of parents, family income, other pre-school experiences.
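The Head Start comparison above can be sketched in a few lines: the groups are formed from a recorded attribute rather than random assignment, which is exactly what makes this a statistical-controls design. The records below are invented for illustration; a real study would draw them from actual school records.

```python
import statistics

# Invented illustration of forming comparison groups from records.
records = [
    {"head_start": True, "reading_score": 82},
    {"head_start": True, "reading_score": 78},
    {"head_start": True, "reading_score": 85},
    {"head_start": False, "reading_score": 74},
    {"head_start": False, "reading_score": 80},
    {"head_start": False, "reading_score": 71},
]

# The groups come from a recorded attribute, not random assignment.
attended = [r["reading_score"] for r in records if r["head_start"]]
did_not = [r["reading_score"] for r in records if not r["head_start"]]

print(statistics.mean(attended))  # average for Head Start attendees
print(statistics.mean(did_not))   # average for non-attendees
```

Because assignment was not random, any difference in averages could still reflect selection: the same comparison would normally be repeated within strata of parental education, family income, and other recorded factors.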
More Quasi-Experimental Designs: Longitudinal and Time Series
Measures taken over time.
Time series: many measures. Longitudinal: a few measures. There is no clear dividing point at which a longitudinal study becomes a time series.
Example: federal budget deficit over time.
Noted: O O O O O O O O O O O O O

More Quasi-Experimental Designs: Interrupted Time Series
Measures taken before and after an event.
Time series: at least 15 measures before and after.
Example: number of smog warnings before and after air pollution legislation was passed in the city.
Noted: O O O O X O O O O O

More Quasi-Experimental Designs: Multiple Time Series: Comparison
Example: the number of smog days after a city passes air pollution legislation, compared to a city of equal size and density that did not pass an air pollution law.
Noted:
O O O O O X O O O O
O O O O O   O O O O

More Quasi-Experimental Designs
Two ways to select:
Cross-sectional: a slice of the population: a different group of people, roads, or cities at each point in time (e.g., a drug survey of high school seniors).
Panel: track the same people, roads, or cities over time (e.g., the National Longitudinal Survey of Youth, in which the same group of people has been surveyed since 1979).

Non-Experimental Designs
Sometimes researchers are just trying to take a picture at one point in time.
They are not trying to answer a cause-effect/impact question.
These designs are appropriate for answering the descriptive and normative questions discussed earlier.

Non-Experimental Design
One shot: X O
Key elements: no random assignment, no pre-measures, no comparison.
Weakest design for cause-effect questions!

Non-Experimental Design Variations:
Before and After Design: O X O
Static Group Comparison:
X  O
   O
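The interrupted time series noted above (O O O O X O O O O O) can be sketched as a before/after comparison of levels. The monthly smog-warning counts below are invented for illustration.

```python
import statistics

# Invented monthly smog-warning counts around an intervention.
smog_warnings = [14, 15, 13, 16, 12, 8, 7, 9, 6, 8]
intervention = 5  # X: air pollution legislation takes effect here

before = smog_warnings[:intervention]
after = smog_warnings[intervention:]

print(statistics.mean(before))  # pre-intervention average
print(statistics.mean(after))   # post-intervention average
```

A fuller analysis would also compare trends, not just levels, and a multiple time series design would repeat the same computation for a comparison city that passed no law.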
True Confessions
Immigration Reform and Control Act: employers would be fined if they knowingly hired illegal workers.
GAO was asked to determine whether this law caused a widespread pattern of discrimination against those who look or sound foreign.
Type of question: cause-effect.

True Confessions
What design elements can be used?
Random assignment? No. Congress does not randomly require some states to implement a law and some states not.
Comparison groups? No. All states had to implement at the same time.
Before measure? No. The law was implemented before any measure could be taken.

True Confessions
What design is left? Implement the law (X) and measure discrimination (O): a one-shot design.
The weakest design to answer an impact question. You play the hand you are dealt.

Sometimes Experimental Designs Are Not Possible
Designs reflect the situation, and an experimental design is not always possible or practical.
You can't assign children to parents who spank and those who do not.
It might be more practical to conduct a reading program in a specific school rather than randomly assign children across the school district into a reading program or not.

Sometimes Experimental Designs Are Not Possible
In public administration, the uses of experimental designs are limited by other ethical and legal considerations:
You cannot require anyone to participate.
You cannot deny services or benefits to which people are entitled.
You cannot deny life-saving treatments to people in need.

Sometimes Experimental Designs Are Not Possible
Politics may play a role: mayors may object to their city being in the "control group" while other cities get money to implement a program.
Design and Internal Validity
You may see changes after a program has been implemented, but those changes might be caused by something other than the program.
The intention of design is to ensure that you are not tricked into believing an explanation that is not true.
Design helps ensure internal validity.
Design eliminates other possible (or rival) explanations.

Threats to Internal Validity
History: changes due to a particular event that took place while data was being collected.
A drug-related death just before the post-test may explain a "no drug" attitude, not the program.
Using a comparison group in the same environment will reduce this threat.
If a comparison group is not possible, ask what has happened, to determine whether some event might have affected the results.

Threats to Internal Validity
Maturation: changes based on aging, growth, or natural increases in skills.
Improved study skills because of maturity, not the program.
This matters in studies where the behavior or attitude is likely to be affected by getting older or becoming more experienced.
Using a comparison group will reduce this threat.

Threats to Internal Validity
Testing: changes due to learning how to take the test.
A risk in pre/post designs, where subjects may have "learned" how to do the test.
Using a comparison group would reduce this threat because both groups would have taken the pre- and post-tests. Any learning from the testing alone would be controlled.

Threats to Internal Validity
Instrumentation: changes in data collection.
Pre/post and comparative designs are vulnerable.
Example: interviewer changes (race/gender) may get different results, especially on race/gender questions.
Example: changing the wording of questions or changing measures is a problem because different things have been measured.
The results are not truly comparable. Ask: are the measures reliable?

Threats to Internal Validity
Regression to the Mean: things tend to average out over time.
A problem when a group is selected for treatment, or a program is enacted, because of an unusually high or low score.
The next set of scores is likely to change (to "regress to the mean") regardless of treatment.
Using measures over time, or a comparison time series, helps identify trends and makes it easier to distinguish real change from the mere appearance of change caused by the regression-to-the-mean effect.

Threats to Internal Validity
Selection: the group under study may be different in ways that affect the results.
The school selected for a program is different from the schools that were not selected.
A low-income school may score differently than a high-income school.
Volunteers may be different from those who chose not to participate.
Ask: "Did the program officials select the people most likely to succeed to make the program look successful?"

Threats to Internal Validity
Selection: the group under study may be different in ways that affect the results.
Random selection and assignment avoid this problem.
But if random assignment is not possible, collect data that might help examine differences (demographic data usually work).

Threats to Internal Validity
Attrition: different rates of dropping out may affect results.
"Problem" people may drop out, so results may look better based on those left behind.
E.g., test scores may be higher because the failing students had dropped out.
Do what is possible to avoid attrition. If there is attrition, researchers should note it as a limitation on the conclusions that can be drawn.

Did the Poverty Program Fail?
Year    Poverty Rate
1960    22.2%
1970    12.6%
1980    12.3%
1990    13.5%
2000    11.3%
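Regression to the mean, discussed above, is easy to demonstrate with a small simulation: select units because of unusually low scores, re-measure them with no treatment at all, and the group average drifts back toward the true level. All numbers below are simulated; no real program data is involved.

```python
import random
import statistics

random.seed(0)

# Simulate 200 units whose true level is 50, measured with noise.
true_level = 50
round1 = [random.gauss(true_level, 10) for _ in range(200)]

# Select the 20 lowest round-1 scorers, as a program targeting
# low performers might.
worst = sorted(range(200), key=lambda i: round1[i])[:20]

# Re-measure the same units with fresh noise and NO treatment.
round2 = [random.gauss(true_level, 10) for _ in worst]

mean1 = statistics.mean(round1[i] for i in worst)
mean2 = statistics.mean(round2)
print(round(mean1, 1))  # selected group's round-1 average (well below 50)
print(round(mean2, 1))  # round-2 average moves back toward 50
```

The apparent "improvement" between the two rounds is produced entirely by the selection rule, which is why a program enacted in response to an extreme score can look successful even when it did nothing.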
How to Decide?
Measurement: How do you define the "poverty program"? What components of the poverty program were specifically designed to reduce poverty? How was poverty operationalized? Does food still account for 1/3 of our living expenses?
Design: no control group, no random assignment. At best, an interrupted time series design.
We do not know what percentage of people would have been below the poverty line if the "poverty program" had not been in place during any of the recessions between 1960 and 2000.

External Validity
Is what happens in the lab under controlled settings likely to be the same as what happens outside of the lab?
Does what happens in this study reflect what occurs in other places where the program is also being conducted?
Programs may share the same name but be implemented differently.

External Validity
Experimental designs are strong on internal validity but are often weak on external validity.
They are relatively small and therefore rarely representative of the larger population.
Much of what we know about social psychology comes from experiments involving college students, but those students may or may not accurately reflect how other people behave.

External Validity
It is easy for policymakers, program managers, and advocates to get excited about an innovative program or policy and decide to implement it in their community.
The tough question is one of external validity: will this program or policy work in their particular situation?
Public administrators have long known about the limits of "cookie cutter" or "one-size-fits-all" approaches.

No Perfect Design
One-shot designs:
Useful for descriptive and normative questions.
Very weak for cause-effect questions: many threats.
However, the one-shot design is often used in public administration.
We implement a program and then see if it worked.
Multiple one-shot designs begin to build a case.

No Perfect Design
Pre/post designs:
Useful in giving context for measuring change.
Threats: testing, instrumentation, regression to the mean, attrition, history, and maturation may all be threats.
Threats tend to be context-related.
For example, regression to the mean is only a threat if an unusually high or low score was used as the selection criterion.
For example, testing is only a threat if the researchers used a before-and-after test as part of their research design.

No Perfect Design
Comparison designs:
Useful in looking at differences.
Control for history and maturation if the comparison group is a close match.
Selection and attrition are threats.

No Perfect Design
Experimental design:
Controls for most threats by design.
Hard to do in the public sector: it is hard to randomly assign people or localities to receive a program or not.
It is sometimes unethical to deny people access to treatment just to form a control group.

Linkage: Question and Design
Descriptive questions (What is?):
One-shot designs, pre/post designs, cross-sectional surveys, time series.
Describes inputs and outputs.

Linkage: Question and Design
Normative questions: Does the observed condition meet a given criterion?
One-shot design, pre/post design, time series.
Benchmarking is a normative question.

Linkage: Question and Design
Impact questions: to determine the relationship between two variables, or a program's impact.
Experimental designs: considered the gold standard.
Quasi-experimental designs using a comparison group and a pre/post design.
Interrupted time series.
Correlational designs with statistical controls.
Elements of Good Research
No single design can be applied to every research question because every situation is unique.
Good researchers identify the reasons for the tradeoffs and the potential weaknesses of the study's design.
Sometimes there really is not much choice because the situation itself is very limited. In these cases, researchers state the limitations of the design and present their conclusions within the context of those limitations.

Takeaway Lessons
If the research is claiming a cause-effect relationship, caution is warranted if:
You cannot determine the design in terms of the Xs and Os framework.
The researchers did not use an experimental design or a strong quasi-experimental design.

Tough Questions to Ask
Is there something other than the program (or hypothesized causal variable) that could explain these results?
Has something been left out that might alter the results?
Is the research design really strong enough to support the conclusions about program or policy success or failure?

Creative Commons
This PowerPoint is meant to be used and shared with attribution.
Please provide feedback.
If you make changes, please share freely and send me a copy of the changes: [email protected]
Visit www.creativecommons.org for more information.