Transcript Slide 1
Issues in case-control studies
Internal Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine
Kwang Hyuck Lee

Issues in case-control studies
Eliseo Guallar, MD, DrPH
Juhee Cho, M.A., Ph.D.

Case-control study: historical synonyms
• Retrospective study
• Trohoc study
• Case comparison study
• Case compeer study
• Case history study
• Case referent study

Case-control study
Exposed (yes): A1 cases, B1 controls
Exposed (no): A0 cases, B0 controls
OR (cross-product ratio): $\mathrm{OR} = \dfrac{A_1 B_0}{A_0 B_1}$

Factors associated with early detection of biliary stricture in patients with elevated liver enzymes after living-donor liver transplantation
오초롱, 이광혁, 이종균, 이규택, 권준혁*, 조재원*, 조주희**
Sungkyunkwan University School of Medicine, Samsung Medical Center: Division of Gastroenterology; *Department of Transplantation Surgery; **Cancer Education Center

Purpose
• Biliary complications occur after living-donor liver transplantation (LDLT).
• Endoscopic treatment, the best available therapy, has a success rate of only about 50%.
• Detecting biliary complications early and performing endoscopic drainage raises the success rate.
• We aimed to identify factors that predict biliary complications among patients with abnormal liver function after LDLT.

Subjects and methods
Period and patients
• Patients who underwent LDLT between January 2006 and December 2008
• Patients whose liver function recovered after surgery and then deteriorated again
• Only patients with duct-to-duct anastomosis were included (hepaticojejunostomy patients were excluded)
Variables examined
• Underlying disease, symptoms
• Liver function tests
• Operative records
• Imaging studies
Analysis groups
Patients whose liver enzymes rose again after LDLT were divided into groups (elevation criteria: AST > 80, ALT > 80, ALP > 250, or bilirubin > 2.2):
• Group A: patients who needed ERCP vs. patients who did not
• Group B: patients with anastomotic biliary stricture vs. patients with rejection
• Group C: among patients without stricture on CT, those who needed ERCP vs. those who did not

LDLT patients over 3 years: n = 213
Patients with LFT elevation: n = 120
Analysis group A: ERCP needed, n = 74; ERCP not needed, n = 46
Analysis group B: stricture 58, rejection 23, leakage 13, infection 7, stone 3, HCC 5, viral reactivation 3, vessel stenosis 3, other 5
Analysis group C: CT(−) and ERCP needed, 32; CT(−) and ERCP not needed, 40

Case-control study or not?

Brock MV, et al. N Engl J Med 2008;358:900-9

Conducting case-control studies
• Case and control selection
• Exposure measurement
• Odds ratio

Research
New question??
Method
• Clinical study
• Translational study
• Laboratory study

Clinical study
• Observational studies: case-control study vs. cohort study
• Randomized controlled trial

Why case-control studies?
• A new question of interest
• A cohort study with the appropriate outcome or exposure ascertainment does NOT exist
• Need to initiate a new study
• Do you have the time and/or resources to establish and follow a new cohort?

Case-control study??
High cholesterol → myocardial infarction
MI (+): cases; MI (−): controls
Measure cholesterol level
Result: negative or positive

Impetus for case-control studies: EFFICIENCY
• May not have sufficient time to see the development of diseases with long latency periods
• May not have a sufficiently large cohort to observe outcomes of low incidence
NOTE: Rare outcomes are not necessary for a case-control study, but they are often the driving reason.

Efficiency of case-control study
Do maternal exposures to estrogens around the time of conception cause an increase in congenital heart defects?
Assume RR = 2, 2-sided α = 0.05, 90% power.
• Cohort study: if I0 = 8/1000 (unexposed) and I1 = 16/1000 (exposed), would need 3,889 exposed and 3,889 unexposed mothers
• Case-control study: if ~30% of women are exposed to estrogens around the time of conception, would need 188 cases and 188 controls
Schlesselman, p. 17
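The two sample sizes on the efficiency slide can be checked with a standard normal-approximation formula for comparing two proportions. This is a minimal sketch, not the slide's own calculation: it assumes equal group sizes and no continuity correction, and the helper names `n_per_group` and `exposure_in_cases` are illustrative. For the case-control design, the assumed OR = 2 and 30% control exposure are first converted into the expected exposure prevalence among cases.

```python
import math
from statistics import NormalDist


def n_per_group(p0: float, p1: float, alpha: float = 0.05, power: float = 0.90) -> int:
    """Subjects per group to detect p1 vs. p0 (two-sided test, equal group sizes).

    Normal approximation without continuity correction (an assumption).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p0 + p1) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    return math.ceil(numerator / (p1 - p0) ** 2)


def exposure_in_cases(p0: float, odds_ratio: float) -> float:
    """Exposure prevalence among cases implied by control prevalence p0 and an OR."""
    return odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))


# Cohort: compare disease incidence in exposed (16/1000) vs. unexposed (8/1000) mothers.
cohort_n = n_per_group(p0=0.008, p1=0.016)

# Case-control: compare exposure prevalence in cases vs. controls (30% exposed), OR = 2.
p1_cases = exposure_in_cases(p0=0.30, odds_ratio=2.0)
case_control_n = n_per_group(p0=0.30, p1=p1_cases)

print(f"cohort: {cohort_n} exposed and {cohort_n} unexposed")
print(f"case-control: {case_control_n} cases and {case_control_n} controls")
```

Run as written, this gives 188 per group for the case-control design and 3,891 per group for the cohort; the small gap from the slide's 3,889 comes from how the z-values are rounded.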
Strengths of case-control study
• Efficient: typically requires a shorter period of time and fewer individuals
• Cases are selected, so the design is particularly good for rare diseases
• Informative: may assess multiple exposures and thus hypothesized causal mechanisms

Learning objectives
• Exposure
• Selection of cases and controls
• Bias: selection, recall, interviewer, information
• Odds ratios
• Matching
• Nested studies
• Conducting a case-control study
DCR Chapter 8

Exposure ascertainment: examples
• Active methods: questionnaire (self- or interviewer-administered), biomarkers
• Passive methods: medical records, insurance records, employment records, school records

Exposure ascertainment issues
• Establishing the biologically relevant period
• Measurement occurs once, at the current time, which is a problem for repeated or previous exposure
• Exposure is measured after the outcome has developed:
  – possibility of information bias
  – possibility of reverse causation (the outcome influences the measure of exposure)

Is it possible in a case-control study? The relevant period
Smoking and radiation yesterday → cancer risk?

Information bias: recall bias
Mothers of babies born with congenital malformations are more likely to recall (accurately or by "over-recalling") events during pregnancy such as illnesses, diet, etc.

Possibility of reverse causation
High cholesterol → myocardial infarction
MI (+): cases; MI (−): controls
Cholesterol level → result?
MI → cholesterol level decreases, so cholesterol measured after the MI is misleading

Case selection: basic tenets
• Eligibility criteria: characteristics of the target and source population
• Diagnostic criteria: definition of a case; misclassification
• Feasibility

Source populations: examples
• Health providers: clinics, hospitals, insurers
• Occupations: workplaces, unions
• Surveillance/screening programs
• Laboratories, pathology records
• Birth records
• Existing cohorts
• Special interest groups: disease foundations or organizations

Incident versus prevalent cases
• Incident cases: all new cases of disease (that become diagnosed) in a certain period
• Prevalent cases: all current cases, regardless of when the case was diagnosed

Incident vs. prevalent
Do the cases represent all incident cases in the target population?
Exposure–disease association vs. exposure–survival association

Prevalent cases
• Disease only: A (causal factor)
• A + B: B is a protective factor
• A + C: C is a protective factor
Patient A: A1; patient B: A1 + B; patient C: A1 + C
Survival at 1 month, 1 year, and 10 years
Among prevalent cases, A1, B, and C all appear to be causes, although B and C actually act on survival (without them, survival ↓↓).

Disease severity
Which stage is chosen for cases?
• Early stage only (progression does not always occur)
• Late stage only
• Influence of severity: increase the sample size to allow stratification

Early stage only
• Case selection was done among prevalent cases of thyroid cancer
• Cases: small thyroid cancers; controls: normal population
• Differences between the groups were determined
What is the clinical meaning of this study if there is no difference in survival between them?

Late stage only: difficult diagnosis
Pancreatic cancer vs.
Weight
• Cases: late-stage pancreatic cancer, with low weight due to cancer progression
• Conclusion: low weight → pancreatic cancer
• Increase the sample size to allow stratification

Selection bias
Selection of cases should be independent of exposure status. Selection bias can arise when selection is related to:
• severity
• hospitalization or clinic visits

Example of selection bias (1)
Hypothesis: common cold → asthma
Setting: patients in hospital
Truth: the common cold is an aggravating factor, not a causal factor; the incidence of asthma does not differ according to common cold.
• Common cold (+) → aggravation → hospital visit
• Common cold (−) → no symptoms → no visit

Example of selection bias (2)
Group | Total | Common cold in society | Patients in hospital | Common cold in hospital
Asthma | 1,000 | 10 | 50 | 10
General | 200,000 | 2,000 | 1,000 | 20 (10 + α)

Group | Cause positive | Cause negative
Case (asthma) | 10 | 40
Control | 1 | 49
$\mathrm{OR} = \dfrac{10 \times 49}{40 \times 1} = 12.25$

Case and control selection
Same distribution of risk factors??
Guallar E, et al. N Engl J Med 2002;347:1747-54

Selection of controls: basic tenets
• Same target population as the cases
• Confirmation of lack of the outcome/disease
• Selection needs to be independent of exposure

Controls in case-control studies
• Should have the same proportion of exposed to non-exposed persons as the underlying cohort (source population)
• Should answer yes to: "If they had developed the disease of interest during the study period, would they have been included as a case?"

Selecting controls: same source as cases
Characteristics
• Convenient
• Most likely the same target population
• The outcome can be ruled out, which avoids misclassification
• Similar factors leading to inclusion in the source population
• Sometimes impractical
Examples
• Breast cancer screening program: confirmed breast cancer as cases; no breast cancer as controls
• Same hospital as the case series: similar referral pattern; examine by illness type
• Pediatric clinics
• Geographic population
• Other special populations (e.g., occupational settings)

Sources for controls
• Geographic population: roster needed; probability sampling
• Neighborhood controls: random sample of the neighborhood
• Friends and family members
• Hospital-based controls

Selection of controls: friends or family members
• Ask each case for a list of possible friends who meet the eligibility criteria
• Randomly select from the list
• A type of matching (addressed later)
Concerns: may inadvertently select on exposure status, i.e., friends chosen because they engage in similar activities or share characteristics, culture, or tastes ("over-matching")

Am J Epidemiol 2004;159:915-21

Selection of controls: hospital- or clinic-based
Strengths
• Ease and accessibility
• Avoids recall bias
Concerns
• Selection bias: exposure related to the hospitalization
• A mixture of controls is the most defensible choice
• Referral pattern: the same or not?

Diet pattern and colon cancer
Study conducted at a GI cancer referral center
• Cases: patients with colon cancer (+) in the GI clinic
• Controls: patients without colon cancer (−) in the pulmonary clinic
• GI clinic: the waiting room displays diet information related to GI cancer
• Pulmonary clinic
A difference may reflect the difference between the two clinics rather than the disease itself.
Alternative control: patients with gastric cancer (+) in the GI clinic

Guallar E, et al.
N Engl J Med 2002;347:1747-54

Weaknesses of case-control studies
• Time period from which the cases arose: survival factors, reverse causation
• Biologically relevant period
• Only one outcome measured
• Susceptibility to bias: separate sampling of cases and controls; retrospective measurement of predictor variables

Issues in case-control studies
Eliseo Guallar, MD, DrPH
Juhee Cho, M.A., Ph.D.

Case and control selection
Same distribution of risk factors??

Selection of cases
Case selection in hospitals
• Alcohol and hip fractures: all cases visit hospitals
• IUD and 1st abortion: some women visit but others do not; women with an IUD in the general population visit clinics more frequently
Target population: disease / no disease among exposed (A, B) and non-exposed (C, D)
Study sample: disease / no disease among exposed (a, b) and non-exposed (c, d)

Example: 1st abortion has a 3% rate and no relation to IUD use; IUD users visit clinics frequently.
General population
Group | Women | No abortion / abortion
IUD(+) | 1,000 | 970 / 30
IUD(−) | 9,000 | 8,730 / 270
A sample of 100 cases and 100 controls: IUD yes, 10 cases and 10 controls; IUD no, 90 cases and 90 controls.
Hospital population
IUD(+): 90% visit (873 / 27)
IUD(−): 45% visit (4,050 women, about 120 with abortion)
Hospital sample of 100: IUD yes 18, no 82
• Controls from the general population: the difference is due to frequent visits
• Controls from the hospital population: theoretically the same, unless this control group has higher abortion rates due to other problems
• Control mixture: both

Actual situation
• Limited cases
• Selection bias from control selection

Nomura A, et al. N Engl J Med 1991;325:1132-6
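The IUD and 1st-abortion numbers earlier in this part can be reproduced in a few lines. A minimal sketch, assuming (as the slide does) a 3% abortion risk regardless of IUD use, and clinic-visit probabilities of 90% for IUD users and 45% for non-users that apply equally to women with and without an abortion. Exact fractions are used, so 45% of 270 abortions stays at 121.5 rather than the slide's rounded 120.

```python
from fractions import Fraction


def odds(exposed, unexposed):
    """Odds of exposure (IUD use) in a group."""
    return exposed / unexposed


# General population: 1,000 IUD users and 9,000 non-users, 3% abortion risk in both.
risk = Fraction(3, 100)
iud_users, non_users = 1000, 9000
cases_iud, cases_no_iud = risk * iud_users, risk * non_users                     # 30, 270
controls_iud, controls_no_iud = iud_users - cases_iud, non_users - cases_no_iud  # 970, 8730

# Clinic-visit probabilities depend on IUD use only, not on abortion status (assumption).
visit_iud, visit_no_iud = Fraction(90, 100), Fraction(45, 100)
hosp_cases = (visit_iud * cases_iud, visit_no_iud * cases_no_iud)          # 27, 121.5
hosp_controls = (visit_iud * controls_iud, visit_no_iud * controls_no_iud)  # 873, 3928.5

true_or = odds(cases_iud, cases_no_iud) / odds(controls_iud, controls_no_iud)
or_general_controls = odds(*hosp_cases) / odds(controls_iud, controls_no_iud)
or_hospital_controls = odds(*hosp_cases) / odds(*hosp_controls)
share_iud_hospital = hosp_controls[0] / sum(hosp_controls)

print("true OR:", true_or)                                            # 1
print("hospital cases vs general controls:", or_general_controls)     # 2
print("hospital cases vs hospital controls:", or_hospital_controls)   # 1
print(f"IUD share among hospital controls: {float(share_iud_hospital):.1%}")  # 18.2%
```

Hospital cases compared with general-population controls show a spurious doubling of the odds, while hospital controls, subject to the same visit pattern, recover the null; the 18.2% IUD share matches the slide's "18 of 100".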
Selection bias in a nested case-control study
• Controls were excluded if they had had a gastrectomy or a history of peptic ulcer disease
• Controls with cardiovascular disease or cancer at baseline or during follow-up were excluded
Target population: disease / no disease among exposed (A, B) and non-exposed (C, D)
Study sample: disease / no disease among exposed (a, b) and non-exposed (c, d)

MacMahon B, et al. N Engl J Med 1981;304:630-3

Selection bias in a case-control study
• Controls were largely patients with diseases of the gastrointestinal tract
• Control patients may have reduced their coffee intake as a consequence of GI symptoms
Target population: disease / no disease among exposed (A, B) and non-exposed (C, D)
Study sample: disease / no disease among exposed (a, b) and non-exposed (c, d)

Antunes CMF, et al. N Engl J Med 1979;300:9-13
Non-GY controls: 6.0; GY controls: 2.1

Criticisms of prior case-control studies
Diagnostic surveillance bias
• Women on estrogens are evaluated more intensively, so they are more likely to be diagnosed, and to be diagnosed at earlier stages
• Women with asymptomatic cancer who receive estrogens are more likely to bleed and to be diagnosed
Antunes CMF, et al. N Engl J Med 1979;300:9-13

To avoid selection bias in case-control studies
Selection of cases
• Types of cases selected (non-fatal, symptomatic, advanced)
• Response rates among cases
• Relation of selection to exposure: are exposed cases more (or less) likely to be included in the study?
Selection of controls
• Type of controls (general population, hospital, friends and relatives)
• For hospital controls, the diseases selected as control conditions
• Response rate among controls
• Relation of selection to exposure: are exposed controls more (or less) likely to be included in the study?
Similar response rates in cases and controls do NOT rule out selection bias.

Recall issues
All information in case-control studies is historical, so when it relies on reporting by participants, accuracy depends on recall.
Concerns:
• Do cases recall prior events differently from controls?
• Mindset of someone with the disease: "Is there something I did that may have caused the disease?"
Recall bias (information bias)

Folic acid and neural tube defects
Figure 1: Features of neural tube development and neural tube defects (28 days after fertilization). Botto et al. Neural tube defects. NEJM 1999.

Background and aim
• A reduced recurrence risk of neural tube defects among women receiving multivitamin supplements containing folic acid
• Most NTDs are de novo; fewer than 10% of NTDs are recurrent
Aim: first occurrence of NTDs only, and periconceptional folate supplements

Study population: pregnant women (target → source → study)
• Cases: NTDs
• Controls: other major malformations (chosen because of recall bias)
• Subjects with oral clefts were excluded because vitamin supplementation has been hypothesized to reduce their risk (selection bias)
Overall data: folate (+) OR = 0.6 (0.4–0.8)

Recall bias: previous knowledge

Recall bias quantification (in this study: 1,000 cases and 1,000 controls; control recall rate 75% throughout)
• Real: cases 500, controls 800; OR 0.625
• All cases (case recall 80%): cases 400, controls 600; OR 0.667 (0.6)
• Previous knowledge (case recall 90%): cases 450, controls 600; OR 0.750 (0.8)
• No previous knowledge (case recall 75%): cases 375, controls 600; OR 0.625 (0.4)

Recall bias: assessment/avoidance
• Check against recorded information, if possible
• Use objective markers or surrogates for exposure (beware of markers that are affected by the disease)
• Ask participants to identify which factor(s) they consider important for the disease
• Build in a false risk factor to test for over-reporting
• Use controls with another disease

Selection bias
If subjects with oral clefts were included in the control group, the number of controls with the exposure (lack of vitamin supplement or folate intake) would increase. As B increases, the probability of rejecting the null hypothesis decreases.
Cleft = ↓ vitamin intake
Group | Case | Control
Exposure (+) | A | B
Exposure (−) | C | D
Exposure: lack of folate intake

Methods
Periconceptional folic acid exposure was determined by interview with study nurses, covering:
• Demographics
• Health behavior factors
• Reproductive history
• Family history of birth defects
• Occupation
• Illnesses (chronic and during pregnancy)
• Use of alcohol, cigarettes, and medications
• Vitamin use from 6 months before the LMP through the end of pregnancy
• Semi-quantitative food frequency questionnaire
• Knowledge of vitamins and birth defects

Confounding
Exposure (↓ folate intake) → outcome (↑ NTDs)
Confounder: alcohol

Interviewer bias
Differential interviewing of cases and controls, i.e., interviewers may probe or interpret responses differently.
Interviewer bias (information bias)

Interviewer bias: avoidance/assessment
• Self-administered instruments (prone to more non-response)
• Standardized instruments
• Computerized instruments (CADI, ACASI)
• Avoid open-ended questions; instead use questions that elicit each possible response
• Training
• Masking interviewers to the research question
• Masking interviewers to case/control status
• Same interviewers for cases and controls

Odds ratio
Exposed (yes): A1 cases, B1 controls
Exposed (no): A0 cases, B0 controls
OR (cross-product ratio): $\mathrm{OR} = \dfrac{A_1 B_0}{A_0 B_1}$

Example: CHD and diabetes
Diabetes | CHD yes | CHD no
Yes | 183 | 65
No | 575 | 735
$\mathrm{OR}_{\mathrm{CHD}} = \dfrac{183/65}{575/735} = \dfrac{183 \times 735}{575 \times 65} \approx 3.60$
No units!
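A minimal sketch of the cross-product ratio applied to the CHD and diabetes table. It also shows why a case-control design can estimate this quantity: the odds ratio computed from exposure odds (cases vs. controls) equals the one computed from disease odds (exposed vs. unexposed). The 95% confidence interval uses Woolf's log-based standard error, which is an addition not shown on the slide; the function names are illustrative.

```python
from __future__ import annotations

import math


def odds_ratio(a1: int, b1: int, a0: int, b0: int) -> float:
    """Cross-product ratio for exposed (a1 cases, b1 controls) and unexposed (a0, b0)."""
    return (a1 * b0) / (a0 * b1)


def woolf_ci(a1: int, b1: int, a0: int, b0: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% CI from the standard error of log(OR) (Woolf method)."""
    se = math.sqrt(1 / a1 + 1 / b1 + 1 / a0 + 1 / b0)
    log_or = math.log(odds_ratio(a1, b1, a0, b0))
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Diabetes (exposure) by CHD (outcome), as in the example table.
a1, b1, a0, b0 = 183, 65, 575, 735

or_cross = odds_ratio(a1, b1, a0, b0)
or_disease_odds = (a1 / b1) / (a0 / b0)   # odds of CHD: diabetic vs. non-diabetic
or_exposure_odds = (a1 / a0) / (b1 / b0)  # odds of diabetes: CHD vs. no CHD

low, high = woolf_ci(a1, b1, a0, b0)
print(f"OR = {or_cross:.2f} (95% CI {low:.2f} to {high:.2f})")
```

This prints an OR of 3.60 with a 95% CI of roughly 2.66 to 4.87. The CI is computed on the log scale and then exponentiated, consistent with the odds ratio's multiplicative scale.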
Some properties of odds ratios
• Null value: OR = 1
• OR ≥ 0 (cannot be negative)
• Multiplicative scale (be careful with plots)
• Use logistic regression to estimate multivariable-adjusted odds ratios in case-control studies

Odds ratios and the "rare disease assumption"
• With incidence density sampling (which represents the underlying cohort at the time of each case) and sampling of cases and controls independent of exposure: OR ≈ IR (incidence rate ratio)
• With outcomes of very low incidence in the underlying cohort and sampling of cases and controls independent of exposure: OR ≈ RR
• Higher incidence increases the bias (of the OR as an estimate of the RR) away from the null

Matching
• Individual matching
• Frequency matching
• Stratified matching
Nested study
• Case-control study
• Case-cohort study

Matching in a cohort study: example
Siegel DS, et al. Blood 1999;93:51-4

Matching in case-control studies: individual matching
• Pairing or grouping controls with each case by known risk factors in the design phase, i.e., when selecting controls
• In the protocol, define the matching characteristics and their "boundaries":
  – Dichotomous or categorical: self-explanatory (e.g., sex, race, blood type, disease stage)
  – Continuous: can be exact, or typically a window (e.g., age ± 5 years, CD4 cell count ± 50 cells)
• For each recruited case, search the control source population for the person(s) who meet the matching criteria
• Select 1 or more of them at random

Odds ratio: matched pairs
Case | Control | Number of pairs
A1 | B1 | n11
A1 | B0 | n10
A0 | B1 | n01
A0 | B0 | n00
N = total number of pairs
N pairs = N cases and N controls = 2N people

Antunes CMF, et al. N Engl J Med 1979;300:9-13
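In a matched-pair analysis only the discordant pairs carry information about the exposure–disease association: the conditional maximum-likelihood odds ratio is n10/n01, using the pair counts defined in the matched-pairs table. The sketch below counts pair types from a list of (case exposed, control exposed) flags. The pair data are made up for illustration, `matched_pair_or` is a hypothetical helper, and the McNemar chi-square is included as the usual companion test.

```python
from __future__ import annotations

from collections import Counter


def matched_pair_or(pairs: list[tuple[bool, bool]]) -> tuple[float, float]:
    """Return (OR, McNemar chi-square) for 1:1 matched pairs.

    Each pair is (case_exposed, control_exposed). Concordant pairs (n11, n00)
    drop out; only the discordant counts n10 and n01 are used.
    Raises ZeroDivisionError if there are no case-unexposed/control-exposed pairs.
    """
    counts = Counter(pairs)
    n10 = counts[(True, False)]   # case exposed, control not exposed
    n01 = counts[(False, True)]   # case not exposed, control exposed
    odds_ratio = n10 / n01
    mcnemar = (n10 - n01) ** 2 / (n10 + n01)
    return odds_ratio, mcnemar


# Hypothetical data: 40 pairs with n11 = 10, n10 = 15, n01 = 5, n00 = 10.
pairs = ([(True, True)] * 10 + [(True, False)] * 15
         + [(False, True)] * 5 + [(False, False)] * 10)
or_mp, chi2 = matched_pair_or(pairs)
print(f"matched-pair OR = {or_mp:.1f}, McNemar chi-square = {chi2:.1f}")
# -> matched-pair OR = 3.0, McNemar chi-square = 5.0
```

Adding more concordant pairs leaves the OR unchanged. This is the efficiency cost noted under the limitations of matching: matched cases and controls with the same exposure contribute nothing to the matched analysis.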
Frequency matching
• Select cases
• Examine the distribution of the potential confounder (matching variable)
• Select controls so that they have the same distribution of the potential confounder
• Conduct stratified analyses or regression to control for the induced selection bias

Stratified sampling: an alternative to matching
• Decide up front what distribution of cases and controls according to the confounder is desired
• Select cases and controls so that expectations are met
• Selection of controls does not depend on cases being selected first
• Note that the distribution of the confounder is not the distribution one may see among all cases in the population

Stratified sampling: example
• Want 50% females among 100 cases and 100 controls: 50 female and 50 male cases; 50 female and 50 male controls
• In the study period, 175 incident male cases and 75 incident female cases occur
• As they occur, enroll cases until 50 are recruited in each stratum
• Throughout the study period, enroll 50 male and 50 female controls

Matching: limitations
• Cannot examine the independent effect of the matched variable on the outcome, because cases and controls are balanced for the matched factor
• May be costly to perform
• May inadvertently match:
  – on the exposure itself or its surrogate
  – on a factor in the causal pathway
  – on a factor that is affected by the outcome
• Matching on an exposure-related factor that is not a disease determinant may reduce statistical efficiency (matched cases and controls with the same exposure are not used in the matched analysis)
• Logistical complexity of matching

Matching: strengths
• The cost of finding a matched control may be lower than the cost of performing tests to assess confounding, or of recruiting additional controls to yield enough persons across the entire range of the confounding variable
• Particularly useful when the distribution of confounders is very different in cases and controls
• Increases the amount of information per subject
• Matching yields the same ratio of cases to controls across the distribution of the matched variable

Nested studies
• In an existing cohort study, new questions arise
• Need an efficient method to use existing information
• Do not want to conduct new measurements on the entire cohort, due to limited resources
• Nest a study without sacrificing validity and too much precision
Some nesting options:
• Case-cohort (sub-cohort)
• Case-control

Nested case-control and case-cohort studies
Case-comparison studies: use all cases, or a representative subset, as of the date of analysis
Comparison group: cohort members, for all nested designs
Study design | Comparison group
Case-control | Event-free member at the time of the case's event (incidence density sampling)
Case-cohort | Members of a subcohort, selected at random from the cohort at the time of enrollment, who are at risk at the time of the case's event (i.e., in the subcohort risk set)

Full cohort (subjects S1 to S8 followed from time 0 to 35)
Time | Events (A) | At risk (N) | Risk set
10 | 1 (S1) | 8 | S1, S2, S3, S4, S5, S6, S7, S8
20 | 1 (S6) | 6 | S3, S4, S5, S6, S7, S8
30 | 2 (S3, S8) | 4 | S3, S4, S7, S8

Case-cohort study

Nested case-control study
Time | Events (A) | At risk (N) | Potential controls
10 | 1 (S1) | 8 | S2, S3, S4, S5, S6, S7, S8
20 | 1 (S6) | 6 | S3, S4, S5, S7, S8
30 | 2 (S3, S8) | 4 | S4, S7

A cohort study
3 events (cases) occur among 8 people, of whom 5 are ever exposed. Exposed persons are solid lines and unexposed are dashed; dots are events (persons plotted against time).

A nested case-control study: incidence density sampling
Compare 3 cases to 3 non-cases (at event time) among cohort members.

A case-control study: incidence density sampling
Compare 3 cases to 3 non-cases (at event time) among cohort members, but "what is the cohort?" They arise from some underlying cohort!!
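The risk sets in the full-cohort and nested case-control tables can be generated mechanically. The slides do not give exact exit times for censored subjects, so this sketch assumes S2 leaves follow-up at time 15, S5 at time 25, and S4 and S7 are followed to time 35; these are hypothetical values chosen to reproduce the at-risk counts shown. Incidence density sampling then draws controls from everyone still at risk at each event time, including subjects who become cases later. The helpers `risk_sets` and `sample_controls` are illustrative names.

```python
import random

# (subject, exit time, had event at exit). Exit times of censored subjects are assumed.
cohort = [
    ("S1", 10, True), ("S2", 15, False), ("S3", 30, True), ("S4", 35, False),
    ("S5", 25, False), ("S6", 20, True), ("S7", 35, False), ("S8", 30, True),
]


def risk_sets(data):
    """Map each event time to (cases at that time, subjects still at risk)."""
    event_times = sorted({t for _, t, event in data if event})
    result = {}
    for t in event_times:
        at_risk = [s for s, exit_t, _ in data if exit_t >= t]
        cases = [s for s, exit_t, event in data if event and exit_t == t]
        result[t] = (cases, at_risk)
    return result


def sample_controls(data, controls_per_case=1, seed=0):
    """Incidence density sampling: draw controls from the risk set minus the cases.

    Sampling is independent for each case, so one subject can serve as a
    control for several cases and can also become a case later.
    """
    rng = random.Random(seed)
    matched = {}
    for t, (cases, at_risk) in risk_sets(data).items():
        eligible = [s for s in at_risk if s not in cases]
        for case in cases:
            matched[case] = rng.sample(eligible, controls_per_case)
    return matched


for t, (cases, at_risk) in risk_sets(cohort).items():
    eligible = [s for s in at_risk if s not in cases]
    print(t, "cases:", cases, "at risk:", len(at_risk), "potential controls:", eligible)

print(sample_controls(cohort))
```

The printed at-risk counts (8, 6, 4) and potential-control lists match the nested case-control table. A case-cohort version would instead restrict each risk set to a random subcohort fixed at enrollment.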
Designing a case-control study: overview I
• What is the research question? In what target population?
• What source(s) will be used? How long will recruitment take?
• What is the definition of the cases? What confirmation is needed? Is screening/additional testing necessary?
• Will prevalent cases be used? Does exposure influence the disease prognosis?
• What is the underlying cohort? How many cases are seen per year in the source?

Designing a case-control study: overview II
• What are the eligibility criteria for controls?
• What source(s) will be used to identify controls? Do they represent the same underlying cohort as the cases?
• What confirmation is needed? Is screening/additional testing necessary?
• Sampling methods? Will the controls be selected throughout the study period? Can they be selected as cases if they later develop the disease?
• Do additional sources need to be used?
• For both cases and controls, does exposure status affect inclusion in the source populations or participation?

Designing a case-control study: overview III
• Are there known confounders? Should matching be used?
• What methods will be used to recruit cases and controls?
• What methods will be used to obtain information about exposures and potential confounders? Active or passive?
• Are the methods of data collection objective and independent of case/control status?
• What methods are in place to avert and monitor differential recall by case/control status if interviewing is involved?
• If the study involves personnel-administered data collection, are the personnel masked to case/control status?