
Motivations to study human genetic variation
• Trace the evolution and history of our species.
• Understand the genetics of diseases, especially the more common complex ones such as diabetes, cancer, cardiovascular disease, and neurodegenerative disease.
• Allow pharmaceutical treatments to be tailored to individuals (e.g., adverse reactions based on genetics).
Haplotype Map of the Human Genome
Goals:
• Define patterns of genetic variation across human genome
• Guide efficient selection of SNPs to “tag” common variants
• Public release of all data (assays, genotypes)
Phase I: 1.3 M markers in 269 people
Phase II: +2.8 M markers in 270 people
HapMap Project
• The HapMap Project tests linkage between SNPs in various sub-populations.
• For a group of linked SNPs, recombination may be rare over tens of thousands of bases.
• A few "tag SNPs" can be used to identify genotypes for groups of linked SNPs.
• Makes it possible to survey the whole genome with fewer markers (one-third to one-tenth as many).
Haplotype
• Linkage is common in the human population, particularly in genetically isolated subpopulations.
• A group of alleles for neighboring genes on a segment of a chromosome is very often inherited together.
• Such a combination of linked alleles is known as a haplotype.
• When alleles at linked sites occur together in a population more often than expected by chance, the non-random association is called linkage disequilibrium.
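Linkage disequilibrium between two SNPs is commonly summarized by D = p(AB) - p(A)p(B) and by r² = D² / [p(A)(1 - p(A)) p(B)(1 - p(B))]. The sketch below computes both from a list of phased haplotypes. The function name and the eight-chromosome sample are hypothetical, chosen only for illustration; they are not HapMap data or HapMap software.

```python
def linkage_disequilibrium(haplotypes, i, j):
    """Return (D, r_squared) for the SNPs at positions i and j.

    `haplotypes` is a list of equal-length allele strings, one per chromosome.
    The allele carried by the first chromosome at each position is used as
    the allele whose frequency is tracked (an arbitrary choice).
    """
    n = len(haplotypes)
    a, b = haplotypes[0][i], haplotypes[0][j]
    p_a = sum(h[i] == a for h in haplotypes) / n
    p_b = sum(h[j] == b for h in haplotypes) / n
    p_ab = sum(h[i] == a and h[j] == b for h in haplotypes) / n
    d = p_ab - p_a * p_b  # zero when the two SNPs are independent
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


# Hypothetical sample of eight chromosomes typed at two SNPs (illustrative only)
sample = ["AC", "AC", "AC", "AC", "GT", "GT", "GT", "AT"]
d, r2 = linkage_disequilibrium(sample, 0, 1)
print(f"D = {d:.4f}, r^2 = {r2:.2f}")
```

An r² of 1 means that one SNP predicts the other perfectly in the sample, which is the situation in which one of them can stand in for (tag) the other.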
Haplotypes (example)
A .. C .. A .. T .. G .. T ..
A .. C .. C .. G .. C .. T ..
G .. T .. C .. G .. G .. A ..
A chromosome region with only the SNPs shown. Three
haplotypes are shown. The two SNPs in color are sufficient to
identify (tag) each of the three haplotypes. For example, if a
chromosome has alleles A and T at these two tag SNPs, then it
has the first haplotype.
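The tagging idea can be sketched as a small brute-force search: try every set of SNP positions, smallest first, and keep those whose alleles give each haplotype a distinct pattern. This is only an illustration on the three haplotypes above, not the selection method used by the HapMap Project; the function name is hypothetical and positions are counted from 0.

```python
from itertools import combinations


def minimal_tag_sets(haplotypes):
    """Return every smallest set of SNP positions that tells all haplotypes apart."""
    n_sites = len(haplotypes[0])
    for size in range(1, n_sites + 1):
        hits = [
            positions
            for positions in combinations(range(n_sites), size)
            # A set tags the haplotypes if each one shows a distinct allele pattern.
            if len({tuple(h[p] for p in positions) for h in haplotypes})
            == len(haplotypes)
        ]
        if hits:
            return hits
    return []  # only reached if two haplotypes are identical


haplotypes = ["ACATGT", "ACCGCT", "GTCGGA"]  # the three haplotypes in the example
tag_sets = minimal_tag_sets(haplotypes)
print(tag_sets)
```

For these three haplotypes no single SNP is enough, because every SNP has only two alleles; two SNPs suffice, and more than one pair works.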
HapMap Samples
• 90 Yoruba individuals (30 parent-parent-offspring trios) from Ibadan, Nigeria (YRI)
• 90 individuals (30 trios) of European descent
from Utah (CEU)
• 45 Han Chinese individuals from Beijing (CHB)
• 45 Japanese individuals from Tokyo (JPT)
Make Genetic Profiles
• Scan these populations with a large
number of SNP markers.
• Find markers linked to drug response
phenotypes.
• It is interesting, but not necessary, to
identify the exact genes involved.
• Can work with “associated populations”; does not require detailed information on disease in the family history (pedigree).
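One simple way to ask whether a marker is associated with a drug-response phenotype is to compare allele counts between responders and non-responders with a chi-square test on a 2x2 table. The sketch below shows that calculation for a single marker. The function name and the counts are hypothetical and purely illustrative; a real genome-wide scan would also need correction for the large number of markers tested.

```python
def allele_chi_square(group1_counts, group2_counts):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 allele table.

    Each argument is (count of allele 1, count of allele 2) in that group.
    """
    a, b = group1_counts
    c, d = group2_counts
    n = a + b + c + d
    margins = (a + b, c + d, a + c, b + d)
    if 0 in margins:
        raise ValueError("every row and column of the table must be non-empty")
    return n * (a * d - b * c) ** 2 / (margins[0] * margins[1] * margins[2] * margins[3])


# Hypothetical allele counts at one SNP (illustrative numbers only)
responders = (60, 40)      # allele 1, allele 2
non_responders = (40, 60)
chi2 = allele_chi_square(responders, non_responders)
print(f"chi-square = {chi2:.2f}")  # compare with 3.84, the 5% critical value for 1 d.f.
```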
The SNP database today
March 2010: 105,098,087
The 1000 Genomes Project submitted 17.3M SNPs
The 2008 SNP submissions for the James Watson genome totaled 3,542,364
The 2008 SNP submissions for the J. Craig Venter genome totaled 4,018,050
The 2008 SNP submissions for the individual Chinese genome totaled 5,077,954
The 2008 SNP submissions for the individual Korean genome totaled 1,750,224
Derived from dbSNP release 130
http://www.ncbi.nlm.nih.gov/SNP/
SNPs aren’t everything: Introducing Copy Number Variations
Redon et al. Nature 2006
Copy Number Variation Dataset
Genome Structural Variation Consortium
Array-CGH using a whole-genome tile-path array
Median clone size ~170 kb
All 270 HapMap individuals
Measures amount of DNA, not RNA
Comparison between two samples
‘Test’ sample vs ‘Reference’ sample
Array-CGH technology
Typical Analysis Procedure
Values are typically normalized so that the mean log2 value for the entire array (or an individual chromosome) is 0.
Analysis consists of identifying segments where the test and reference samples have unequal copy number.
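The two steps above can be sketched as follows: compute log2(test/reference) per clone, subtract the array mean so that the values center on 0, then report runs of consecutive clones that sit clearly above or below 0. The threshold of 0.3, the minimum run length, the function names, and the intensities are all assumptions made for illustration. Real analyses typically use statistical segmentation methods rather than a fixed cutoff.

```python
from math import log2


def centered_log2_ratios(test, reference):
    """log2(test/reference) per clone, shifted so the array mean is 0."""
    ratios = [log2(t / r) for t, r in zip(test, reference)]
    mean = sum(ratios) / len(ratios)
    return [x - mean for x in ratios]


def call_segments(values, threshold=0.3, min_clones=2):
    """Return (start, end, 'gain' or 'loss') for runs of clones beyond +/-threshold."""
    segments = []
    start, state = 0, None
    for i, v in enumerate(list(values) + [0.0]):  # trailing 0.0 closes the last run
        s = "gain" if v > threshold else "loss" if v < -threshold else None
        if s != state:
            if state is not None and i - start >= min_clones:
                segments.append((start, i - 1, state))
            start, state = i, s
    return segments


# Hypothetical clone intensities along one chromosome (illustrative only)
test_sample = [100, 100, 100, 200, 200, 200, 100, 100, 50, 50, 100, 100]
reference_sample = [100] * 12
values = centered_log2_ratios(test_sample, reference_sample)
print(call_segments(values))
```

Because the measurement is a ratio between two samples, a reported gain in the test sample cannot be told apart from a loss in the reference sample without further information.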
1,447 CNVRs from 270 HapMap samples
Structural Variation Project
More than 10% of the genome sequence
Nature 447: 161-165, 2007
Copy Number Variations are ubiquitous in the human genome
The number of genome structural variants (>1 kb) that distinguish
genomes of different individuals is at least on the order of 600–900 per
individual.
J.O. Korbel et al., Science 318(2007), pp. 420–426
HapMap 3
• Merged the results from Affymetrix and
Illumina chips
• Genotyped 1.6 million common single
nucleotide polymorphisms (SNPs) in 1,184
reference individuals from 11 global
populations
• Sequenced ten 100-kilobase regions in
692 of these individuals
• http://www.broadinstitute.org/~debakker/p3.html
ASW African ancestry in Southwest USA
CEU Utah residents with Northern and Western European ancestry from the CEPH collection
CHB Han Chinese in Beijing, China
CHD Chinese in Metropolitan Denver, Colorado
GIH Gujarati Indians in Houston, Texas
JPT Japanese in Tokyo, Japan
LWK Luhya in Webuye, Kenya
MXL Mexican ancestry in Los Angeles, California
MKK Maasai in Kinyawa, Kenya
TSI Toscani in Italia
YRI Yoruba in Ibadan, Nigeria
SNP allele frequency estimation
Population differentiation
Linkage disequilibrium analysis
SNP Tagging
Imputation efficiency
Genomic locations of human CNVs
Genotypes for CNVs
Population genetic properties of CNVs (allele frequencies, population differentiation, etc.)
Mutation rate (frequency of de novo CNV) and potential mutational mechanisms
Linkage disequilibrium properties of CNVs
Tagging and imputation of CNVs
Signals of selection around CNVs
Association of SNPs and CNVs with expression phenotypes
Computational detection of structural genomic variation
Direct comparison of genomes through sequence alignments
Advantages:
All types of genomic variation can be identified, including balanced variants (inversions or translocations)
No limit on resolution; breakpoints can be defined at the nucleotide level
Problems:
Generates many false positives due to sequence misassembly and gaps
Modern humans arose in Africa and replaced other human species across the globe.
(Scientific American, August 1999)
Out of Africa
Out of Africa again and again
Itai Yanai, 2003
Templeton, A. Nature 416 (2002): 45 - 5
• The Human Genome Project cost ~USD 3,000,000,000
• Illumina now offers a complete genome sequence from USD 50,000
• Complete Genomics will offer a complete genome sequence from USD 5,000 soon
• There are now an estimated ? complete human genome sequences
• James Watson, 454, $1 million
• Craig Venter, Sanger, $70 million
• African (HapMap): Illumina & SOLiD, $100,000
• Five Africans: Penn State University
• Chinese: Illumina
• Two Koreans
• Prof. Quake (Stanford): Nature genetics paper, $50,000, 1 week, Helicos
• Stanford team: clinical annotation of genome from “patient Zero”
The 10-gen data set