Bias and Confounding - UCLA School of Public Health


Precision and Validity: Selection Bias

Dr. Jørn Olsen Epi 200B January 26 and 28, 2010


Bias and confounding (Last, Dictionary)

Bias: Deviation of results or inferences from the truth, or processes leading to such deviations. Any trend in the collection, analysis, interpretation, publication, or review of data that can lead to conclusions that are systematically different from the truth.


Confounding: A situation in which the effects of two processes are not separated. Confounder, confounding factor, confounding variable: poor terms, because confounding is study specific. No variable is always a confounder.


Selection bias: bias caused by the way subjects are selected into the study, or by selective losses of subjects prior to data analysis.

In a cohort study, the first type of selection bias can often be described as selection leading to more (or less) confounding.


Selection Bias

- Selection as a design problem: healthy worker selection, Berkson's bias
- Most problematic: non-responders in case-control studies, and loss to follow-up

Survey

                                Non-responders  Smokers  Non-smokers   All
N                                     400          200        400       1000
% of all                               40           20         40       100%
% among responders                      -         33.3       66.7       100%
% if all non-responders smoke           -           60         40       100%
% if no non-responders smoke            -           20         80       100%
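The last two rows bound the true smoking prevalence by assigning every non-responder to one category. A minimal Python sketch of that arithmetic; the function name `prevalence_bounds` is ours, not from the lecture:

```python
def prevalence_bounds(n_invited, n_nonresponders, n_smokers):
    """Observed smoking prevalence among responders, plus the extreme values
    obtained by assuming that no / all non-responders smoke."""
    n_responders = n_invited - n_nonresponders
    observed = n_smokers / n_responders
    lowest = n_smokers / n_invited                       # no non-responder smokes
    highest = (n_smokers + n_nonresponders) / n_invited  # every non-responder smokes
    return observed, lowest, highest


observed, lowest, highest = prevalence_bounds(1000, 400, 200)
print(f"observed {observed:.1%}, bounds {lowest:.0%} to {highest:.0%}")
# observed 33.3%, bounds 20% to 60%
```

With 40% non-response, the observed 33.3% is compatible with any true prevalence between 20% and 60%.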

Follow-up study: complete cohort

          E+      E-
N        1000    1000
D         200     100
RR = 2.0, RD = 0.10

50% refuse to take part in the study

          E+     E-
N        500    500
D        100     50
RR = 2.0, RD = 0.10

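          E+      E-
N        1000     500
D         200      50
RR = 2.0, RD = 0.10

Selection affected by both E and D (E → S ← D) is unlikely at baseline, since participants do not yet know their disease status. Selection affected by E and by a cause C of D (E → S ← C → D) is more likely; C could be socioeconomic status (SES).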

Most likely: E → S ← C → D. In cohort studies, selection may cause confounding, but perhaps more often it reduces confounding. Poor health and poor social conditions may correlate with selection.

Conditioning on S (analysing participants only) would open an E-C path and induce confounding that was not present before.
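The two selection patterns above can be contrasted with expected counts. This is a minimal sketch with invented numbers: E and C are independent in the source population, disease risk depends on C only (so the true RR for E is 1), and the participation probabilities are hypothetical. Selection on E alone leaves the RR at 1; selection on both E and C induces an E-C association among participants, and a null effect looks protective:

```python
# Hypothetical source population: 1,000 people in each (E, C) cell, so E and C
# are independent. Disease risk depends on C only, i.e. the true RR for E is 1.
RISK_BY_C = {0: 0.05, 1: 0.15}
CELL_SIZE = 1000


def crude_rr(participation):
    """Crude risk ratio for E among participants.

    `participation` maps (e, c) to the probability of taking part."""
    people = {0: 0.0, 1: 0.0}
    cases = {0: 0.0, 1: 0.0}
    for e in (0, 1):
        for c in (0, 1):
            n = CELL_SIZE * participation[(e, c)]
            people[e] += n
            cases[e] += n * RISK_BY_C[c]
    return (cases[1] / people[1]) / (cases[0] / people[0])


# Selection depends on E only (half of the unexposed refuse): no bias.
e_only = {(1, 0): 1.0, (1, 1): 1.0, (0, 0): 0.5, (0, 1): 0.5}
# Selection depends on E and C (E -> S <- C -> D): the null effect looks protective.
e_and_c = {(1, 0): 0.6, (1, 1): 0.6, (0, 0): 0.2, (0, 1): 0.6}

print(f"{crude_rr(e_only):.2f}")   # 1.00
print(f"{crude_rr(e_and_c):.2f}")  # 0.80
```

Within each level of C the risk is the same for exposed and unexposed participants in this model, so adjusting for C would remove the bias, provided C is measured.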

- Large cohorts seldom recruit more than 50% of those invited.
- DNBC (Danish National Birth Cohort): about 30%; half of the GPs participated, and 60% of those invited accepted the invitation.
- Selection bias? Yes, if the cohort is used as a survey.
- But when making internal comparisons?

Table 2. RORs (ratios of odds ratios) based on adjusted* ORs in the source population and among participants. Ref: Nohr et al. Epidemiology 2006;17:413-8.

- Internal comparison, counterfactual guidelines: RR = 2 for this cohort
- External validity, generalization:
  - For the source population?
  - For all in the future?
  - For other ethnic groups, etc.

Selection bias in a cohort study is mainly related to loss to follow-up.

RCT: randomization to a painkiller
Drug, N = 100: 5 lost to follow-up
Placebo, N = 100: 40 lost to follow-up

Reason to expect selection bias? Will "intention to treat" solve the problem? Not when estimating the effect size, but it may be OK when testing $H_0$.

Follow-up study

           E+        E-
D          150        50
not D     9,850     9,950
All      10,000    10,000
RR = 3.0

Now 20% loss to follow-up among the exposed and 10% among the unexposed:

           E+       E-
D          120       45
not D     7,880    8,955
All       8,000    9,000
RR = 3.0

Suppose we got:

           E+       E-
D          140       40
not D     7,860    8,960
All       8,000    9,000
RR = 3.9

How could this happen?

When is it likely?

Source population:

          E+     E-
D         A      C
not D     B      D
Total     N1     N0

Study population:

          E+     E-
D         a      c
not D     b      d
All       n1     n0

Selection bias if $\frac{A/N_1}{C/N_0} \neq \frac{a/n_1}{c/n_0}$
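The criterion above can be checked against the three follow-up tables. A minimal Python sketch; `risk_ratio` is a helper name of our choosing:

```python
def risk_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    """(a/n1) / (c/n0) for a 2 x 2 follow-up table."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)


complete = risk_ratio(150, 10_000, 50, 10_000)  # source population
same_loss = risk_ratio(120, 8_000, 45, 9_000)   # 20% / 10% lost, regardless of D
diff_loss = risk_ratio(140, 8_000, 40, 9_000)   # losses depend on both E and D

# Share of cases retained: 140/150 (93%) among the exposed versus 80% of all
# exposed; 40/50 (80%) among the unexposed versus 90% of all unexposed.
for label, rr in [("complete", complete), ("same loss", same_loss), ("differential", diff_loss)]:
    print(f"{label}: RR = {rr:.1f}")
```

Loss that depends on exposure alone leaves the RR at 3.0; only when exposed cases are retained more often, and unexposed cases less often, than their exposure group as a whole does the study RR (3.9) differ from the source RR.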

Does condom use protect against STDs?

What is the source population for such a study?

A case-control study samples cases from an STD clinic and controls from the catchment area of the clinic. Any problems with that?

Results could be like this:

Males with infected partners
Condom use   Cases   Controls
Yes           100      200
No            600      600
OR = 0.5

No requirement for infected partners
Condom use   Cases   Controls
Yes           100      100
No            600      600
OR = 1.0
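The two odds ratios can be recomputed directly from the counts. A minimal Python sketch; the helper name `odds_ratio` is ours:

```python
def odds_ratio(cases_yes, cases_no, controls_yes, controls_no):
    """Exposure odds among cases divided by exposure odds among controls."""
    return (cases_yes / cases_no) / (controls_yes / controls_no)


# Exposure = condom use
infected_partners = odds_ratio(100, 600, 200, 600)
no_partner_requirement = odds_ratio(100, 600, 100, 600)
print(infected_partners, no_partner_requirement)  # 0.5 1.0
```

The case counts are identical in both tables; only the choice of controls changes the estimate.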


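E → S ← D (selection depends on both exposure and disease): selection bias is often a problem in a case-control study.

All eligible cases and controls:
          E+    E-    All
D         20    80    100
not D     10    90    100
$OR = \frac{20/80}{10/90} = 2.25$

Responders:
          E+    E-    All
D         20    40     60
not D      5    45     50
$OR = \frac{20/40}{5/45} = 4.50$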

Response rates

          E+      E-
D         100%     50%
not D      50%     50%


$OR_{\text{responders}} = OR_{\text{true}} \times OR_{\text{response rates}}$

$4.50 = 2.25 \times \frac{100/50}{50/50}$

When would we expect this pattern?

When would we expect the opposite?
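Because the odds ratio is a cross-product ratio, multiplying each cell by its response rate multiplies the OR by the OR of the response rates; this holds for any counts, not just these. A minimal Python check with the numbers above (helper name ours):

```python
def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Cross-product ratio: (exposed cases x unexposed controls) / (unexposed cases x exposed controls)."""
    return (exp_cases * unexp_controls) / (unexp_cases * exp_controls)


# Order: exposed cases, unexposed cases, exposed controls, unexposed controls
eligible = (20, 80, 10, 90)
response_rates = (1.0, 0.5, 0.5, 0.5)
responders = tuple(n * r for n, r in zip(eligible, response_rates))  # (20, 40, 5, 45)

or_true = odds_ratio(*eligible)             # 2.25
or_selection = odds_ratio(*response_rates)  # (1.0 x 0.5) / (0.5 x 0.5) = 2.0
or_responders = odds_ratio(*responders)     # 4.5
print(or_true, or_selection, or_responders)
```

An equal overall response rate among cases and controls is therefore not enough: what matters is whether the response-rate OR differs from 1.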


Selections of relevance for designs

Berkson's bias: diseases may be correlated in hospital patients but not in the population.

Population: 100,000
30% have asthma: 30,000
10% have bronchitis: 10,000
Both diseases: 0.3 × 0.1 = 0.03, i.e., 3,000 people


In the hospital: let's assume that 40% of asthma patients and 60% of bronchitis patients get hospitalized, and that the two diseases lead to admission independently.

27,000 with asthma only: 10,800 in hospital
7,000 with bronchitis only: 4,200 in hospital
3,000 with both diseases: 2,280 in hospital, since 0.4 + 0.6 - 0.4 × 0.6 = 0.76

Patients with both diseases are thus overrepresented in hospital data, and the two diseases will look as if they are associated although they are not; those with both diseases simply have a higher probability of being hospitalized. A "Berkson-like" bias could be seen for other factors that influence hospitalization rates or diagnostic probabilities.

[Diagram: asthma 30,000 (13,080 in hospital); bronchitis 10,000 (6,480 in hospital); both diseases 3,000 (2,280 in hospital)]
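The hospital counts follow from the stated admission probabilities. A minimal Python sketch using exact fractions, assuming (as above) that the diseases are independent in the population and act independently on admission; it compares the co-occurrence of the two diseases in the population and in hospital:

```python
from fractions import Fraction

population = 100_000
p_asthma, p_bronchitis = Fraction(30, 100), Fraction(10, 100)
h_asthma, h_bronchitis = Fraction(40, 100), Fraction(60, 100)  # admission probabilities

both = population * p_asthma * p_bronchitis          # 3,000 (independent diseases)
asthma_only = population * p_asthma - both           # 27,000
bronchitis_only = population * p_bronchitis - both   # 7,000

h_both = h_asthma + h_bronchitis - h_asthma * h_bronchitis     # 19/25 = 0.76
in_hospital_both = both * h_both                                # 2,280
in_hospital_asthma = asthma_only * h_asthma + in_hospital_both  # 13,080
in_hospital_bronchitis = bronchitis_only * h_bronchitis + in_hospital_both  # 6,480

print(f"bronchitis among asthma patients: population "
      f"{float(both / (population * p_asthma)):.1%}, "
      f"hospital {float(in_hospital_both / in_hospital_asthma):.1%}")
print(f"asthma among bronchitis patients: population "
      f"{float(both / (population * p_bronchitis)):.1%}, "
      f"hospital {float(in_hospital_both / in_hospital_bronchitis):.1%}")
```

Bronchitis is present in 10% of asthma patients in the population but in about 17% of hospitalized asthma patients, although the two diseases are unrelated.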


                 Smoking + (N = 100)       Smoking - (N = 100)
                 HBP + (20)  HBP - (80)    HBP + (20)  HBP - (80)
CVD +                6           8             2           4
CVD -               14          72            18          76
CVD risk           30%         10%           10%          5%

Causal structure: Smoking → HBP? (the association under study); HBP → CVD; Smoking → CVD. CVD risk is highest for those with high blood pressure and for smokers.

We estimate the association between smoking and HBP before or after exclusion of patients with CVD. OR: smoking exposure odds ratio for HBP.

No exclusion of CVD: $OR = \frac{20/20}{80/80} = 1$

Be careful when excluding diseases from the study if they are in the causal pathway, or if they are causally linked to the end point of your study.

Use CVD patients as controls and exclude them from the case group: $OR = \frac{14/18}{8/4} = 0.39$

Use CVD patients as controls and include them in the case group: $OR = \frac{20/20}{8/4} = 0.50$

Exclude CVD patients from the control group but not from the case group: $OR = \frac{20/20}{72/76} = 1.06$

Exclude them from both groups: $OR = \frac{14/18}{72/76} = 0.82$
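All five scenarios can be recomputed from the single tree of counts. A minimal Python sketch; the names are ours, and we read "CVD as controls" as HBP-negative patients with CVD, which reproduces the 8/4 in the slides:

```python
# (smoking, HBP) -> (with CVD, without CVD), from the smoking/HBP/CVD tree
COUNTS = {
    ("smoker", "HBP+"): (6, 14),
    ("smoker", "HBP-"): (8, 72),
    ("non-smoker", "HBP+"): (2, 18),
    ("non-smoker", "HBP-"): (4, 76),
}


def n(smoking, hbp, cvd):
    """Count people; cvd=True/False selects on CVD, cvd=None ignores it."""
    with_cvd, without_cvd = COUNTS[(smoking, hbp)]
    if cvd is None:
        return with_cvd + without_cvd
    return with_cvd if cvd else without_cvd


def smoking_or(case_cvd, control_cvd):
    """Smoking exposure OR with HBP+ as cases and HBP- as controls."""
    cases = n("smoker", "HBP+", case_cvd) / n("non-smoker", "HBP+", case_cvd)
    controls = n("smoker", "HBP-", control_cvd) / n("non-smoker", "HBP-", control_cvd)
    return cases / controls


SCENARIOS = {
    "no exclusion of CVD": (None, None),
    "CVD as controls, excluded from cases": (False, True),
    "CVD as controls, included in cases": (None, True),
    "CVD excluded from controls only": (None, False),
    "CVD excluded from both groups": (False, False),
}
results = {name: smoking_or(*selection) for name, selection in SCENARIOS.items()}
for name, value in results.items():
    print(f"{name}: OR = {value:.2f}")
```

Exact arithmetic gives 1.00, 0.39, 0.50, 1.06 and 0.82. Smoking has no association with HBP in these data, so every value other than 1 is produced by the selection rule.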


Using hospital controls to replace population controls is bias prone (this example is extreme, though). Controls should provide the exposure distribution in the population that gave rise to the cases.

Do not use diseases that follow this pattern as exclusion criteria: Smoking → CVD ← HBP. Exclusion is acceptable only for the pattern HBP → CVD, and only if smoking does not cause CVD. Excluding persons with an exposure-related condition from one group but not from the other threatens validity (although one of these estimates was close to 1).

Excluding such persons from both groups can also cause bias (unless the selection criteria are confounders).

Healthy worker selection

Healthy worker selection is a conceptual problem when designing the study: a violation of the counterfactual ideal. It explains why SMR values for workers who perform physically demanding jobs tend to be less than 100. The reason is that the comparison we make is biased: the population at large includes people with chronic diseases (and high mortality) who cannot perform a physically demanding job. Also called "the sick population effect" or "the stupid investigator effect".

[Figure: mortality rate (MR) by age; SMR = 80]
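The SMR compares the deaths observed among workers with those expected if general-population rates applied to the workers' person-years (indirect standardization). A minimal Python sketch with invented person-years and rates, chosen only to reproduce an SMR of 80; a value below 100 here says nothing about a protective effect of the work if the reference population includes people too sick to work:

```python
def smr(observed_deaths, person_years, reference_rates_per_1000):
    """Standardized mortality ratio (x 100) by indirect standardization."""
    expected = sum(py * rate / 1000 for py, rate in zip(person_years, reference_rates_per_1000))
    return 100 * observed_deaths / expected


# Hypothetical worker cohort in three age bands
person_years = [10_000, 10_000, 4_000]
general_population_rates = [1, 2, 5]  # deaths per 1,000 person-years
print(smr(40, person_years, general_population_rates))  # 80.0
```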

Selection operates into the workforce at recruitment and out of the workforce over time. Unemployment is associated with suicide risk: causal, or bias?

How can this be studied?

Selection Bias-Publication Bias

Decision making depends on the combined evidence (e.g., Cochrane reviews), not on just one study.

But is the source population for meta-analyses biased?


- Researchers may decide not to submit, based on the results.
- Editors may decide whether to send a paper for review or reject it, based on the results.
- Reviewers may decide whether to recommend publication, based on the results.
- Editors may make final decisions based on the results.
- All of this leads to a biased source population for reviews and meta-analyses.
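How much result-dependent publication can distort the source population for a meta-analysis can be sketched with a simple model. The model is ours, not the lecture's: every study estimates a true effect with the same standard error, studies that are significant in the positive direction (z > 1.96) are always published, and all others are published with probability `p_publish_nonsig`. The mean published estimate then follows from truncated-normal means:

```python
from statistics import NormalDist


def expected_published_estimate(true_effect, se, p_publish_nonsig):
    """Mean estimate among published studies under the model described above."""
    dist = NormalDist(true_effect, se)
    cut = 1.96 * se                       # significance cut-off when testing effect = 0
    p_sig = 1 - dist.cdf(cut)             # P(estimate > cut)
    alpha = (cut - true_effect) / se
    phi = NormalDist().pdf(alpha)
    mean_sig = true_effect + se * phi / p_sig             # E[X | X > cut]
    mean_nonsig = true_effect - se * phi / (1 - p_sig)    # E[X | X <= cut]
    w_sig = p_sig
    w_nonsig = (1 - p_sig) * p_publish_nonsig
    return (w_sig * mean_sig + w_nonsig * mean_nonsig) / (w_sig + w_nonsig)


print(expected_published_estimate(0.0, 0.1, 1.0))   # no selection: close to 0
print(expected_published_estimate(0.0, 0.1, 0.05))  # strong selection: about 0.075
```

With a true effect of 0, publishing everything recovers 0 on average, whereas publishing only 5% of non-significant studies pushes the average published estimate to roughly three-quarters of a standard error away from the truth.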

Example: Panayiotis et al., J Natl Cancer Inst 2005;97:1043-1055.

Association between TP53 (a tumor suppressor gene) and risk of death in patients with head and neck cancers.

[Figs. 1, 2 and 3: not reproduced in this transcript]

External validity?

In an etiologic study the aim is to formulate abstract hypotheses in relation to the factors under study. The hypotheses are abstract in the sense that they are not tied to a specific population but aim to formulate a general scientific theory.

- Internal validity
- External validity

Estrogen exposure (more than 0.3 mg estrogen/day for at least 6 months) and cancer of the endometrium (N Engl J Med 1978;299:1089-94).

Cases: all post-menopausal gynaecological cancer patients at Yale New Haven Medical Center, 1974-1976. Controls: mainly patients with cancer of the cervix (60) or the ovary (43), matched for age and race.

           E+    E-    All
Cases      35    84    119
Controls    4   115    119

Including all postmenopausal women with bleeding:

Cases: the same cancer patients. Controls: women with bleeding but no cancer of the endometrium, matched for age and race.

           E+    E-    All
Cases      44   105    149
Controls   23   126    149
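The two tables give counts but no odds ratios. A minimal Python sketch of the crude (unmatched) odds ratios; it ignores the matching on age and race, since a matched analysis needs pair-level data that the slides do not show:

```python
def crude_odds_ratio(cases_exp, cases_unexp, controls_exp, controls_unexp):
    """Unmatched (crude) odds ratio; ignores the age/race matching."""
    return (cases_exp * controls_unexp) / (cases_unexp * controls_exp)


other_gyn_cancer_controls = crude_odds_ratio(35, 84, 4, 115)
bleeding_controls = crude_odds_ratio(44, 105, 23, 126)
print(f"{other_gyn_cancer_controls:.1f} {bleeding_controls:.1f}")  # 12.0 2.3
```

Changing only the control group moves the crude estimate from about 12 to about 2.3.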

Horwitz et al. continued the discussion and presented new data in Lancet 1981;2:66-8.

In the abstract they state (shortened and modified) “In this study, to determine the frequency with which endometrial cancer escapes detection, all necropsies on 8998 eligible women showed previously unsuspected endometrial cancer in 24 of them. The estimated rate of undetected cancer 27/10,000 is two to five times higher than the detection rate of 5/10,000 noted by the Connecticut State Tumor Registry.” Comments?


Two types of endometrial cancer: A, diagnosed; B, undetected.

A woman 45 years of age would have a lifetime risk (until age 80) of type A cancer of $5/10{,}000 \times 35 = 175/10{,}000$, or better, $1 - e^{-(5/10{,}000) \times 35} \approx 173.5/10{,}000$ (about 174/10,000).

The proportion of type B cases would be $27/(27 + 174) = 13.4\%$.
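The same calculation in a minimal Python sketch; the rates are those quoted above, and the variable names are ours. Without rounding the cumulative risk to 174/10,000, the share of undetected (type B) cases comes out at about 13.5% instead of 13.4%:

```python
import math

annual_rate_detected = 5 / 10_000   # type A: registry detection rate per year
years = 35                          # from age 45 to age 80
undetected = 27 / 10_000            # type B: necropsy estimate

simple = annual_rate_detected * years                      # 175/10,000
cumulative = 1 - math.exp(-annual_rate_detected * years)   # about 173.5/10,000
share_type_b = undetected / (undetected + cumulative)

print(f"{simple * 10_000:.0f} {cumulative * 10_000:.1f} {share_type_b:.3f}")
```

The exponential form matters little at such low risks; it starts to matter when rate × years is no longer small.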

The most frequent and serious selection-bias problem in case-control studies is non-response.

And an equal proportion of non-responding cases and controls is NOT a guarantee against selection bias.

The question is whether there is an equal selection of exposed cases and exposed controls.

The most serious selection problem in a follow-up study is loss to follow-up.

“If in doubt, stay out.”

Sensitivity Analysis


Cohort: 10 years of follow-up

                                   Smoking +   Smoking -
N                                    1000        1000
Lost to follow-up                     200         100
Lung cancer at end of follow-up        80          10

$RR = \frac{80/800}{10/900} = 9.0$

Sensitivity approach:

Lung cancer risk assumed among those lost to follow-up:

Smokers          Non-smokers   Comments                                            RR
1/10             1/90          As for those followed up                            9.0
1/10             2/90          Underestimate risk for non-smokers                  8.2
0                1/90          Overestimate risk among smokers                     7.2
0                2/90          Underestimate risk for non-smokers                  6.5
0 (worst case)   1.0           All non-smokers lost to follow-up get lung cancer   0.7
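Each row of the sensitivity analysis replaces the unknown outcomes of those lost to follow-up with an assumed risk and recomputes the RR over the full cohort of 1,000 per group. A minimal Python sketch using exact fractions (names ours):

```python
from fractions import Fraction as F

N = 1000                                            # people per exposure group
LOST = {"smokers": 200, "non_smokers": 100}
CASES_FOLLOWED = {"smokers": 80, "non_smokers": 10}


def rr_if_lost_had(risk_smokers, risk_non_smokers):
    """Full-cohort RR, assuming the given lung cancer risks among those lost."""
    risk_s = (CASES_FOLLOWED["smokers"] + LOST["smokers"] * risk_smokers) / N
    risk_ns = (CASES_FOLLOWED["non_smokers"] + LOST["non_smokers"] * risk_non_smokers) / N
    return risk_s / risk_ns


# (risk among lost smokers, risk among lost non-smokers), one pair per table row
scenarios = [(F(1, 10), F(1, 90)), (F(1, 10), F(2, 90)), (F(0), F(1, 90)),
             (F(0), F(2, 90)), (F(0), F(1))]
results = [rr_if_lost_had(s, ns) for s, ns in scenarios]
print(", ".join(f"{float(r):.1f}" for r in results))
```

Exact arithmetic gives 9.0, 8.2, 7.2, 6.5 and 0.7 for the five rows; only the extreme worst case reverses the direction of the association.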


Selection Bias Main Points

Selection of people into the study produces bias under the following conditions, among others.

A. Selection bias in the design

1. Cross-sectional study: the sampling strategy does not produce a representative sample of the target population.

Selection Bias Main Points, cont.

2. Cohort study/case-control study: the unexposed are too far from the counterfactual ideal. The unexposed do not provide the disease occurrence expected in the exposed had they not been exposed, and stratification or statistical control will not be sufficient to produce unbiased estimates of effects.

Examples: healthy worker selection, and many other poorly designed studies.

B. Selection bias in the conduct of the study: non-responders, loss to follow-up.

1. Cross-sectional study: the response rate may correlate with what you want to estimate, which would lead to a biased estimate of its prevalence.

Risk of selection bias is high.

2. Cohort study: non-response at baseline will usually not correlate directly with both the exposure and the (as yet unknown) endpoint, but selection at baseline will often change the confounder structure (it will correlate with the exposure). Loss to follow-up may correlate with both the exposure and the endpoint and lead to bias.

Give higher priority to compliance with follow-up than to recruitment at baseline. Loss to follow-up will often cause bias in a randomized trial, even with an intention-to-treat analysis.

3. Case-control study: non-response may well correlate with both the exposure and the endpoint, since both are known at recruitment. Keeping response rates high should be given high priority, and the specific aim of the study should not be disclosed (an IRB may not accept this procedure).

Selection bias is a serious problem and should be avoided if possible. Often it cannot be avoided, and then its magnitude and possible impact should be investigated.

Steps to avoid bias related to non-responders

- Keep non-response as low as possible, especially in surveys and case-control studies.
- Try to get some information on non-responders: at best on E and D, but also on confounders.
- Analyse data according to the time of responding.
- Do sensitivity analyses.
- Do follow-up studies (incl. RCTs).

So, the first concern in an etiologic study is VALIDITY (freedom from bias, at least from known bias).

Internal validity: validity of the inferences drawn in relation to the members of the study population.

External validity: validity of the inferences as they extend beyond the study population.