Study Designs in GWAS
Jess Paulus, ScD
January 30, 2013
Today’s topics
Case-control studies
Population based
Hospital based
Nested studies
Selection bias
Introduction to population stratification
Genetic Association Study Design
Case-Control: Dichotomous endpoints (e.g., diabetes: yes versus no)
Continuous or Quantitative traits (e.g., HbA1c)
[Figure: family study versus association study, positioned by sample size, heritability, and genetic complexity, each running from low to high]
Hierarchy of Study Designs
Systematic Reviews & Meta Analysis ("Mr. Happy": top of the hierarchy)
Randomized Controlled Trials
Cohort studies
Case-control studies
Cross-sectional studies
Ecologic studies
Case reports ("Mr. Worry": bottom of the hierarchy)
Cohort Study: Selection into study on basis of exposure status
[Diagram: groups are selected at the beginning of the study by exposure (present or absent); the outcome in each group is not yet known (?)]
Cohort studies in genetic
epidemiology
Allows study of multiple disease endpoints –
extends efficiency of effort to genotype
Selection bias is generally limited
Cohort study limitations for
genetic epidemiology
Loss-to-follow-up bias
Need for repeated questionnaire
assessments for most up to date covariate
information
Very costly and logistically challenging to
genotype entire cohort and survey for
disease endpoints
For this reason, genetic epidemiologic studies
of full cohorts are rare
Case-Control: Selection based on disease status
[Diagram: groups are selected at the beginning of the study by disease status (case or control); exposure in each group is then ascertained (Exposure?)]
Case-control designs for genetic
exposures
Appropriate for rare diseases, like cancer
Can be retrospective or prospective (nested
case-control design)
Efficient sampling of an underlying cohort
Control selection
The biggest threat to most case-control studies
Controls must be drawn from the source
population that gave rise to the cases
The ideal controls should:
Represent the exposure distribution in the source
population that gave rise to the cases
Be those who, had they developed the case disease, would
have been included in your study as a case
Failure to select appropriate controls generates
selection bias
Selection of participants based on joint probability of
exposure and outcome
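The effect of selecting on the joint probability of exposure and outcome can be illustrated with a small calculation. The sketch below is only an illustration: the source-population counts and the selection probabilities are hypothetical numbers chosen for the example, not data from any study. It shows that when selection depends only on disease status the odds ratio is preserved, whereas when control selection also depends on exposure (here, carriers of a variant are half as likely to be recruited as controls) the observed odds ratio is distorted.

```python
def odds_ratio(exp_case, unexp_case, exp_ctrl, unexp_ctrl):
    """Odds ratio for a 2x2 exposure-by-disease table."""
    return (exp_case * unexp_ctrl) / (unexp_case * exp_ctrl)


def apply_selection(counts, probs):
    """Expected number of people recruited from each exposure/outcome cell."""
    return {cell: counts[cell] * probs[cell] for cell in counts}


# Hypothetical source population (assumed numbers, for illustration only).
source = {"exp_case": 200, "unexp_case": 100, "exp_ctrl": 4000, "unexp_ctrl": 4000}
true_or = odds_ratio(**source)

# Selection depends on disease status only: no selection bias.
outcome_only = {"exp_case": 0.9, "unexp_case": 0.9, "exp_ctrl": 0.05, "unexp_ctrl": 0.05}
unbiased_or = odds_ratio(**apply_selection(source, outcome_only))

# Control selection also depends on exposure: selection bias.
joint = {"exp_case": 0.9, "unexp_case": 0.9, "exp_ctrl": 0.05, "unexp_ctrl": 0.10}
biased_or = odds_ratio(**apply_selection(source, joint))

print(f"true OR = {true_or:.2f}")
print(f"OR with selection on outcome only = {unbiased_or:.2f}")
print(f"OR with selection on exposure and outcome = {biased_or:.2f}")
```

In this hypothetical setup the true odds ratio of 2 is recovered under outcome-only selection and doubles to 4 when exposed controls are under-sampled.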
Population case-control study
Cases arise from a given population, and controls
are randomly sampled from that population
(assuming population is enumerated)
Example: cases from CT state tumor registry,
controls drawn from state census tract listings
Reduces potential for selection bias since source of
controls is well-defined
Limitations of the population-based
case-control study for genetic
epidemiology
Lower participation rates than hospital-based
studies, especially given need for biological samples
Implementation of specimen collection and
processing protocols can be challenging outside a
clinical setting
If there is interest in following participants for survival
outcomes, tracing can be difficult
Hospital-based case-control study
Appropriate for genetic epidemiology studies:
Hospital setting facilitates subject enrollment and biological
specimen collection and analysis
Recruitment by medical staff can aid enrollment
Smaller geographic area to cover than a population-based
study – reduce processing/shipping time
Aids in collection of specimens in a timely fashion after
disease diagnosis, limiting possibility for reverse causation
When cases are hospital-recruited, source population is
the catchment population of the clinic
The collection of all the people who would have been
notified as a case, had they developed disease
Hospital-based case-control study
limitations
Retrospective nature opens door to:
Recall bias
Reverse causation
Selection bias
Selection bias in particular is a risk because it is difficult to
identify the source population that gave rise to the cases
Ideal control: Who would have presented as a case to
Hospital X had they in fact become ill?
Attempt to identify catchment population can be
challenging
Sometimes, a control disease (sick controls) is chosen to limit
potential for selection bias and differential recall of past
exposure
Control illness must not be associated with the gene of
interest
Nested case-control study
A type of population-based control sampling
Any case-control study can be conceived as nested within a
cohort of exposed and unexposed people
When the cohort is very well defined this is called a
nested case-control study
Sampling from within the cohort (rather than doing full
cohort analysis) is usually motivated by efficiency
concerns
Important applications for genetic epidemiology where it
would be too costly to genotype the full cohort
Nested case-control study design
advantages
Limited potential for selection bias
because the full cohort is enumerated and controls
can be randomly sampled from the roster
Often prospective – limits potential for
gene/biomarker to be affected by disease
process
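Random sampling of controls from an enumerated cohort roster might look like the following minimal sketch. The roster, the participant IDs, the case list, and the two-controls-per-case ratio are all hypothetical. For simplicity it draws controls from cohort members who never became cases; real nested studies often use risk-set (incidence density) sampling matched on follow-up time, which is not shown here.

```python
import random


def sample_nested_case_control(roster, case_ids, controls_per_case=2, seed=0):
    """Return (cases, controls) drawn from an enumerated cohort roster.

    roster: list of participant IDs for the full cohort.
    case_ids: set of IDs that developed the disease during follow-up.
    Controls are a simple random sample of non-cases (a simplifying assumption).
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    cases = [pid for pid in roster if pid in case_ids]
    non_cases = [pid for pid in roster if pid not in case_ids]
    n_controls = min(len(non_cases), controls_per_case * len(cases))
    controls = rng.sample(non_cases, n_controls)  # sampling without replacement
    return cases, controls


# Hypothetical cohort of 1000 participants with 20 incident cases.
roster = [f"P{i:04d}" for i in range(1, 1001)]
case_ids = set(roster[::50])

cases, controls = sample_nested_case_control(roster, case_ids)
print(len(cases), "cases and", len(controls), "controls selected for genotyping")
```

Only the selected cases and controls would then need to be genotyped, rather than all 1000 cohort members.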
Cohort sources of nested case-control studies
EPIC cohort: http://epic.iarc.fr/
Nurses' Health Study:
http://www.channing.harvard.edu/nhs/
NCI Breast and Prostate Cancer Cohort Consortium
(BPC3): http://epi.grants.cancer.gov/BPC3/
Multiethnic Cohort (MEC) study:
http://www.uscnorris.com/mecgenetics/
Alpha-Tocopherol, Beta-Carotene Cancer Prevention
cohort: http://atbcstudy.cancer.gov/study_details.html
Framingham Heart Study: www.framinghamheartstudy.org
Analysis of case-control GWA
studies
Univariate analysis: Pearson χ² or Fisher
exact test, Armitage trend test
Multivariate analysis: Logistic regression
(if unmatched) or conditional logistic
regression (if matched)
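As a rough illustration of the univariate tests named above, the sketch below computes the Pearson χ² test (2 degrees of freedom) and the Armitage trend test (1 degree of freedom) for a single SNP from a 2 x 3 table of genotype counts. The genotype counts are made up for the example, the additive weights (0, 1, 2) are an assumed coding, and the code assumes every genotype column has a non-zero total. It uses only the standard library; in practice a statistics package would normally be used.

```python
import math


def pearson_chi2(cases, controls):
    """Pearson chi-square on a 2 x 3 genotype table; returns (statistic, p-value)."""
    col_totals = [r + s for r, s in zip(cases, controls)]
    n_cases, n_controls = sum(cases), sum(controls)
    n = n_cases + n_controls
    stat = 0.0
    for row, row_total in ((cases, n_cases), (controls, n_controls)):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            stat += (observed - expected) ** 2 / expected
    # Chi-square survival function with 2 df is exp(-x / 2).
    return stat, math.exp(-stat / 2)


def armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Armitage trend test with additive weights; returns (statistic, p-value)."""
    col_totals = [r + s for r, s in zip(cases, controls)]
    r_total, s_total = sum(cases), sum(controls)
    n = r_total + s_total
    t = sum(w * (r * s_total - s * r_total)
            for w, r, s in zip(weights, cases, controls))
    k = len(weights)
    term1 = sum(w * w * c * (n - c) for w, c in zip(weights, col_totals))
    term2 = 2 * sum(weights[i] * weights[j] * col_totals[i] * col_totals[j]
                    for i in range(k) for j in range(i + 1, k))
    variance = (r_total * s_total / n) * (term1 - term2)
    stat = t * t / variance
    # Chi-square survival function with 1 df is erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical genotype counts (AA, Aa, aa) for one SNP.
case_counts = [100, 80, 20]
control_counts = [120, 70, 10]

chi2_stat, chi2_p = pearson_chi2(case_counts, control_counts)
trend_stat, trend_p = armitage_trend(case_counts, control_counts)
print(f"Pearson chi-square = {chi2_stat:.3f}, p = {chi2_p:.4f}")
print(f"Armitage trend     = {trend_stat:.3f}, p = {trend_p:.4f}")
```

The trend test spends a single degree of freedom on a dose-response pattern across genotypes, which is one reason it is commonly preferred when an additive effect is plausible.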