NUS Presentation Title 2006

Thinking about genetic variation and biomedical research
Peter Little
Overview
• The nature of human differences
– Embracing variation
• How does genetic variation influence
gene expression in individuals
– Models
– How to study, how to define
Regulatory genetic variation (RGV)
• Mouse models of RGV effects within and
between individuals
• Human functional variation
• Positioning RGV in human studies
Cis acting variations
Basal
Conditional
Detecting cis effects by genetics
• Look for statistically significant associations of
amount of mRNA with variations IN GENE
• 25,000 genes, ~25,000 comparisons
• Achievable: P values of 10^-3 to 10^-6 are significant
Trans acting RGV
Basal
Conditional
Gene 1
Gene 2
Gene 3
Detecting trans variation
• Look for statistically significant associations of
amount of mRNA with variations IN WHOLE
GENOME
• 25,000 genes times 1,000,000 variations
• >10^10 comparisons
• NOT achievable: sample size has to be huge to
achieve P values <10^-9 or 10^-10
• So how do you do it?
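The scale of the two testing problems can be sketched in a few lines. The gene and variation counts come from the slides; the Bonferroni rule and the 0.05 family-wise error rate are assumptions made only for illustration, so the thresholds printed here need not match the ones quoted above.

```python
# Rough multiple-testing arithmetic for cis versus trans scans.
# Counts are from the slides; ALPHA and the Bonferroni rule are assumptions.

N_GENES = 25_000
N_VARIANTS = 1_000_000
ALPHA = 0.05  # assumed family-wise error rate


def bonferroni_threshold(n_tests: int, alpha: float = ALPHA) -> float:
    """Per-test P value needed to hold the family-wise error rate at alpha."""
    return alpha / n_tests


cis_tests = N_GENES                  # each gene tested against its own variations
trans_tests = N_GENES * N_VARIANTS   # every gene tested against every variation

cis_threshold = bonferroni_threshold(cis_tests)
trans_threshold = bonferroni_threshold(trans_tests)

print(f"cis:   {cis_tests:.1e} tests, per-test P < {cis_threshold:.1e}")
print(f"trans: {trans_tests:.1e} tests, per-test P < {trans_threshold:.1e}")
```

Under these assumptions the trans scan demands a per-test P value a million times smaller than the cis scan, which is why sample size becomes the limiting factor.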
Shared control of gene expression
• All genes share some control components
(Basal/conditional)
• (Regulation is not necessarily transcriptional)
• Genetic variation in regulators will result in
correlated changes in the expression levels of
multiple genes
Experimental Design
DBA
C57
Age/sex/feed/light matched to limit environmental
variability
Define differences as genetic
Expression differences?
• Compare DBA & C57 mRNA levels by replicated
microarrays
– brain, kidney, liver
• ~6000 genes are expressed in all three tissues
• mRNA levels of 755 genes are different
• Influenced by regulatory genetic variations, by
definition
• These are the focus of our study
Microarrays on each of 31 BXD
mice
• BXD mice have mix of DBA & C57 derived
alleles and are homozygous
• 31 BXD lines (genetically typed)
• Brain, Kidney and Liver mRNA levels measured
by microarrays
[Figure: mRNA level data (genes G1 to Gn, measured in RI1-32 for Brain, Kidney and Liver) converted by Spearman's ρ into a gene-by-gene correlation matrix (G1 to Gn against G1 to Gn)]
Compute correlations between all pairwise combinations of genes
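A minimal sketch of the all-pairs step, using only the Python standard library. Gene names and expression values are invented for illustration, and the helper names (`ranks`, `spearman`, `correlation_matrix`) are hypothetical rather than taken from the study.

```python
# Spearman rank correlation between every pair of genes.
# Expression values below are made up; rows are genes, columns are mouse lines.
from itertools import combinations


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def correlation_matrix(expression):
    """Map every ordered gene pair to its Spearman correlation."""
    genes = list(expression)
    rho = {(g, g): 1.0 for g in genes}
    for a, b in combinations(genes, 2):
        rho[(a, b)] = rho[(b, a)] = spearman(expression[a], expression[b])
    return rho


expression = {
    "G1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "G2": [2.1, 3.9, 6.2, 8.1, 9.7],   # rises with G1
    "G3": [9.0, 7.5, 6.0, 4.0, 1.0],   # falls as G1 rises
}
rho = correlation_matrix(expression)
```

Rank-based correlation only asks whether two genes move monotonically together across the lines, so it is insensitive to differences in the absolute scale of each gene's signal.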
Correlations make networks
[Figure: genes G1 to G9 drawn as networks, with edges joining pairs of genes whose mRNA levels correlate]
• Construct networks conditional on |ρ| > threshold
A Correlating group of genes
“CGG”
• A group not connected to other genes at
threshold correlation
– Not trying to imply biological meaning to
correlation threshold
• Biologically expect continuum of effects 0 to 1
• A convenient analytic tool (define groups for
analysis)
• Threshold based upon optimization of CGG
number, size, degree: tested by simulation
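One way to read the CGG definition is as the connected components of a graph whose edges are gene pairs with |ρ| above the threshold. The sketch below follows that reading; the correlation values are invented, the function name is hypothetical, and 0.775 is used only because it is the threshold quoted in the slides.

```python
# Group genes into connected components of the thresholded correlation graph.
# This treats a CGG as a connected component, which is an assumption.


def correlating_groups(genes, rho, threshold):
    """Return sets of genes linked by chains of |rho| > threshold."""
    neighbours = {g: set() for g in genes}
    for (a, b), value in rho.items():
        if a != b and abs(value) > threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    groups, seen = [], set()
    for start in genes:
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from `start`.
        group, stack = set(), [start]
        while stack:
            gene = stack.pop()
            if gene in group:
                continue
            group.add(gene)
            stack.extend(neighbours[gene] - group)
        seen |= group
        groups.append(group)
    return groups


genes = ["G1", "G2", "G3", "G4", "G5"]
rho = {
    ("G1", "G2"): 0.91,
    ("G2", "G3"): -0.80,   # strong negative correlation still links genes
    ("G3", "G4"): 0.40,    # below threshold: no edge
    ("G4", "G5"): 0.85,
}
groups = correlating_groups(genes, rho, threshold=0.775)
print(groups)
```

Genes with no edge at the threshold come back as groups of size one, so a real analysis would probably filter on a minimum group size.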
[Figure: network at correlation threshold 0.775, containing 212 genes]
Unpredictable shared behaviour
• The same genes exhibit shared behavior
• The behavior they share is NOT the same
• In different tissues
• In different individuals
Shared and unique influences
upon mRNA
• What proportion of an individual gene’s variation can
be explained by shared influences?
• Cis acting variations 15-40%
• For each CGG compare individual mRNA levels to
average behavior of CGG
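A hedged sketch of the comparison in the last bullet: the squared Pearson correlation between one gene and the average of the other members of its group is used here as a stand-in for "proportion explained". That measure, the helper names and the data are all assumptions, not the method used in the study.

```python
# Fraction of a gene's variation that tracks the average behaviour of its group.
# r^2 against the group average is an assumed measure; values are invented.


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def group_average(profiles):
    """Mean expression across genes, one value per individual."""
    return [sum(column) / len(column) for column in zip(*profiles)]


def shared_fraction(gene_profile, other_profiles):
    """r^2 between one gene and the mean of the other group members."""
    r = pearson(gene_profile, group_average(other_profiles))
    return r * r


cgg = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 4.0, 6.0, 8.0],
    "G3": [1.0, 3.0, 2.0, 4.0],
}
fractions = {}
for gene, profile in cgg.items():
    others = [p for name, p in cgg.items() if name != gene]
    fractions[gene] = shared_fraction(profile, others)
    print(gene, round(fractions[gene], 3))
```

Leaving the gene itself out of the average avoids inflating the estimate with the gene's correlation with itself.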
6 are ribosomal proteins, 2 are ribosomal protein/ubiquitin fusions
13 are involved in carbohydrate metabolism, 5 in signalling and 4 in transport
Function is not the main organiser
RGV compared to structural
variation?
• RGV is more complex
– RGV causes changes to GROUPS of molecules
– Same gene(s) can behave differently in different tissues of
same individual
– Structural variation present in all tissues in which gene(s)
is expressed
• Study of (disease) expression
– Using surrogate tissues for analysis is not feasible
– Difficult implications for human research
Placing genetic variation into
human context
• How common is human variation?
• More common than most realise
Human variation: Craig Venter and
James Watson
• Venter 2.8 million & Watson 2.72 million existing SNPs
• Venter 0.74 million & Watson 0.61 million novel SNPs.
• Venter 3,882 & Watson 3,766 SNPs that code for a changed
amino acid
• 44% of Venter’s genes were heterozygous for one or
more variants
• SNPs (single nucleotide polymorphisms) are single
base differences
Frequency of amino acid changes
• The HapMap data
• Caucasians, Yorubans, Han population
samples
• 46% of genes contain at least one amino acid
change in >5% of individuals
• 30% of genes have a variant found in >25%
• 18% of genes have a variant in >40%
Functional? (Boyko et al 2008)
• PolyPhen: likely consequence to protein function of a non-synonymous SNP (Ramensky et al 2001)
• 27–29% of changes functionally neutral or nearly neutral
• 30–42% moderately deleterious
• 29–43% highly deleterious or lethal
• Venter ~22% chance of isolating a variant, 16% mild or
highly deleterious version
Boyko, A.R. et al. (2008) Assessing the evolutionary impact of amino
acid mutations in the human genome. PLoS Genet. 4: e1000083
Functional? (Chun & Fay 2009)
• PolyPhen, SIFT, LRT: likely consequence to protein
function of a non-synonymous SNP
• Compared Watson, Venter & a Chinese individual
• 796-837 deleterious changes per individual
– (5% cross validated)
• ~58% are at >5% population frequency (in HapMap data)
• ~4% of genes in each individual
Functional variation in humans
• HapMap data are mainly used as anonymous
markers in GWAS
• Common functional variants in functional classes of
proteins (Rohan Williams, ANU, Australia)
– Transcription factors
– Lipid synthesis
– Glucose homeostasis, etc.
• Use to re-analyse GWAS studies based upon
“hypotheses”
RGV candidates?
http://www.hapmap.org/
Xie et al 2009
• 440 TF motifs in genome
• Search 2 kb either side of TSS
• 1-134 different motifs per gene
• Median 18, mean 23
• ~5% contain a SNP
http://motifmap.ics.uci.edu
Does RGV matter?
• Cancer biomarkers
• 9 FDA approved
• 34 more in use
• If RGV is irrelevant expect biomarkers to be
randomly distributed with respect to RGV in
genes that encode them
• CEACAM5, TG, AFP, KIT, CGB, PGR, EGFR, ERBB2, KLK3, CFH, MUC1, CSH1,
ALPP, CHGA, KLK6, MSLN, ESR1, PRL, KLK11, POMC, KLK7, KLK8, VIP, INS,
GAS, CSF1, KLK10, PTHLH, VEGFA, CALCA, IL2RA, GH2, SST and KLK5
Cancer biomarkers
Little, Williams, Wilkins.
Trends Biotechnol. 2009
RGV data: 8000 human genes in 233 individuals (Morley et al 2004)
What do human phenotypes look
like from the standpoint of genetic
variation?
• Genome wide association studies
– 4 September 2009, 386 publications
– (http://genome.gov/gwastudies/)
• Multiple variants of low effect
• Very limited or no prognostic/diagnostic value (Odds
ratios 1.01-1.2)
The differences between
individuals
Based upon disease analysis
• Are the product of multiple small effects
• 10s to 100s of influences on each phenotype
• No two people with same phenotype will have the
same set of causative variations
• SNPs >0.05 in populations:
• 0.05^10 ≈ 9.8 × 10^-14
• 0.05^100 ≈ 7.9 × 10^-131
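The arithmetic behind these two figures can be checked directly. The sketch assumes the variants are independent and that each is present in exactly 5% of individuals, which is a simplification of "SNPs >0.05".

```python
# Chance that one person carries every variant in a specific set,
# assuming independent variants that each occur in 5% of individuals.

FREQUENCY = 0.05  # frequency quoted in the slides


def probability_of_full_set(n_variants: int, frequency: float = FREQUENCY) -> float:
    """Probability of carrying all n_variants under independence."""
    return frequency ** n_variants


p10 = probability_of_full_set(10)
p100 = probability_of_full_set(100)

for n, p in ((10, p10), (100, p100)):
    print(f"{n:>3} variants: {p:.1e}")
```

Even at the lower end of "10s to 100s of influences", the chance of two people sharing the same complete set of causative variations is negligible.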
How do we deal with this?
• Probability of association of SNPs with
schizophrenia
Common polygenic variation contributes to risk of schizophrenia and bipolar disorder
The International Schizophrenia Consortium Nature 2009
doi:10.1038/nature08185
How extensive is individuality?
• The paradigm that there are (genetic)
deviations from “normality” is dead
• Transcription networks
• Protein abundance
• Protein activity?
• Protein/protein interaction networks?
• Metabolism
• How do we develop a framework for probabilistic
mechanisms in biology?
Some challenges
• How to treat probabilistic groups as classifiers or in
association analysis?
• Combinatorial difficulty in defining “best” groups
• Defining probabilistic networks
– Display?
– Analysis?
Some benefits
The behaviour of groups of molecules
• Statistically very robust
• Redefine statistical power in association analysis
• Open the potential for significant studies at the scale
of 10-20 samples
• Partially alleviates the multiple testing problem
Human variation in biomedical
research
• Routinely check for common functional variation in
individual genes/proteins
• Recognize powerful genetic background effects
– Control by common variations? (common transcription
factor variations)
– Probabilistic controls?
Thinking about Singaporeans
• Molecular group behavior and Ethnic differences
Disease death rate globally
Lopez et al (2006) Lancet 367 : 1747
Some cancer predispositions in
Asians
• High nasopharyngeal cancer
• Low chronic lymphocytic leukemia
• High esophageal cancer
• High liver cancer
• Low prostate cancer
• …
• Environmental?
• Genetic?
Singapore variation project
• Chia Kee Seng and colleagues (Teo et al, 2009 Genome
Research)
• 268 individuals from the Chinese, Malay and Indian
population groups
– 1.6 million variations
• Base line for comparison with Caucasian, African and
other samples
DNA variations in populations
• Principal component analysis of Singapore Chinese
Teo et al, Genome Research 2009
“Asian-ness” in biomedical
research
• For any human sample
– Genotype
– Gene expression
– Proteomic
– Metabolomic
– Lipidomic
– Glycomic
• Identify molecular group behaviours
• Does shared behavior classify normal phenotypes?
• Does shared behavior predict or classify disorder?
Acknowledgements
• Mark Cowley
• Rohan Williams
• Chris Cotsapas
To my colleagues at NUS Life Sciences Institute