Resources at HapMap.Org

HapMap3 Tutorial
Marcela K. Tello-Ruiz
Cold Spring Harbor Laboratory
Basic Concepts
[Figure: cross (X) between Parent 1 and Parent 2, whose chromosomes carry the SNP allele pairs A B and a b]
High LD -> No Recombination (r² = 1): offspring chromosomes are A B or a b only, so SNP1 “tags” SNP2
Low LD -> Recombination: offspring chromosomes may be A b, a B, A B, a b, etc… (many possibilities)
Basic Concepts
                    SNP1       SNP2
alleles:            A/a        B/b
POP allele freqs:   A (80%)    B (60%)
                    a (20%)    b (40%)

            genotypes    phased haplotypes (C1/C2)
Person 1    AA  BB       A B / A B
Person 2    AA  Bb       A B / A b
Person 3    Aa  Bb       A B / a b  OR  A b / a B
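The Person 3 case above (two possible phasings for a double heterozygote) can be sketched in a few lines of Python. This is only an illustration of why phase is ambiguous; the function name `possible_phasings` is made up for this example and is not part of any HapMap tool.

```python
from itertools import product


def possible_phasings(genotypes):
    """List the distinct haplotype pairs consistent with unphased genotypes.

    genotypes: one two-character string per SNP, e.g. ["Aa", "Bb"].
    Returns a sorted list of (hap1, hap2) tuples.
    """
    pairs = set()
    # For each SNP, choose which of its two alleles sits on chromosome C1;
    # the other allele then sits on chromosome C2.
    for choice in product((0, 1), repeat=len(genotypes)):
        hap1 = "".join(g[c] for g, c in zip(genotypes, choice))
        hap2 = "".join(g[1 - c] for g, c in zip(genotypes, choice))
        # The order of the two chromosomes does not matter, so sort the pair.
        pairs.add(tuple(sorted((hap1, hap2))))
    return sorted(pairs)


for person, genos in [("Person 1", ["AA", "BB"]),
                      ("Person 2", ["AA", "Bb"]),
                      ("Person 3", ["Aa", "Bb"])]:
    print(person, possible_phasings(genos))
```

Only Person 3, who is heterozygous at both SNPs, yields more than one haplotype pair, which is why phase has to be estimated for such individuals.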
HapMap Glossary
• LD (linkage disequilibrium): For a pair of SNP alleles,
it’s a measure of deviation from random association;
strong LD means little or no recombination between the
two SNPs. Measured by D’, r², LOD
• Phased haplotypes: Estimated arrangement of SNP
alleles along each chromosome. Alleles transmitted from
Mom form the maternal haplotype, while Dad’s form the
paternal haplotype.
• Tag SNPs: Minimum SNP set needed to identify a
haplotype. r² = 1 indicates two SNPs are redundant,
so each one perfectly “tags” the other.
• Questions?
[email protected]
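As a rough illustration of the LD measures named in the glossary, the sketch below computes D, D’ and r² from haplotype and allele frequencies using the standard textbook definitions. The function name `ld_stats` and its arguments are chosen for this example only; LOD is not computed here.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP1 and B at SNP2
    p_a:  frequency of allele A at SNP1
    p_b:  frequency of allele B at SNP2
    """
    # D is the deviation of the observed haplotype frequency from the
    # frequency expected under random association of the two alleles.
    d = p_ab - p_a * p_b
    # D' rescales D by its largest possible magnitude given the allele freqs.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between the two alleles.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Allele frequencies from the Basic Concepts slide: A = 80%, B = 60%.
# Assumption for illustration: every B allele sits on an A chromosome.
print(ld_stats(p_ab=0.6, p_a=0.8, p_b=0.6))
# Two SNPs with identical frequencies and only A-B / a-b haplotypes.
print(ld_stats(p_ab=0.8, p_a=0.8, p_b=0.8))
```

With the slide's allele frequencies and the assumed haplotype frequency of 0.6, D’ is 1 while r² is only 0.375, so a pair of SNPs can show complete D’ without one SNP perfectly tagging the other; r² reaches 1 only when the two SNPs are fully redundant.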
HapMap Project
                        Phase 1                           Phase 2                   Phase 3
Samples & POP panels    269 samples (4 panels)            270 samples (4 panels)    1,115 samples (11 panels)
Genotyping centers      HapMap International Consortium   Perlegen                  Broad & Sanger
Unique QC+ SNPs         1.1 M                             3.8 M (phase I+II)        1.6 M (Affy 6.0 & Illumina 1M)
Reference               Nature (2005) 437:p1299           Nature (2007) 449:p851    Draft Rel. 1 (May 2008)
Release Notes
• Phase 1+2: Latest Release #24, October 2008 (NCBI
build 36):
3.9 M unique QC+ SNPs -> about 1 SNP/700 bp
http://ftp.hapmap.org/00README.releasenotes_rel24
– Added back chrX SNPs dropped in previous releases
– Corrected allele flips from rel#23a
• Phase 3: Draft release #1 (NCBI build 36)
http://ftp.hapmap.org/genotypes/2008-07_phaseIII/00README.txt
– HapMap3 sites @ Broad Institute, Sanger Center and Baylor College
Phase 3 Samples
label   population sample                                                                  # samples   QC+ Draft 1
ASW*    African ancestry in Southwest USA                                                         90            71
CEU*    Utah residents with Northern and Western European ancestry from the CEPH collection      180           162
CHB     Han Chinese in Beijing, China                                                             90            82
CHD     Chinese in Metropolitan Denver, Colorado                                                 100            70
GIH     Gujarati Indians in Houston, Texas                                                       100            83
JPT     Japanese in Tokyo, Japan                                                                  91            82
LWK     Luhya in Webuye, Kenya                                                                   100            83
MEX*    Mexican ancestry in Los Angeles, California                                               90            71
MKK*    Maasai in Kinyawa, Kenya                                                                 180           171
TSI     Toscans in Italy                                                                         100            77
YRI*    Yoruba in Ibadan, Nigeria                                                                180           163
Total                                                                                          1,301         1,115
* Population is made of family trios
Phase 3
• 11 panels & 1,115 samples
– 558/557 males/females
– 924/191 founders/non-founders
• Platforms:
– Illumina Human 1M (Sanger)
– Affymetrix SNP 6.0 (Broad)
• EXCLUDED from QC+ data set:
– Samples with low completeness
– SNPs with low call rate in each pop (< 80%), and SNPs not in HWE
(p < 0.001)
– Overall false positive rate: ~3.2%
• Data merged with PLINK (concordance over
249,889 overlapping SNPs = 0.9931)
• Alleles on the (+/fwd) strand of NCBI b36
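The SNP-level exclusion rules above (call rate below 80%, HWE p < 0.001) can be sketched as follows. This is a minimal illustration, not the consortium's pipeline: it assumes a 1-d.f. chi-square approximation for the HWE test (the actual test used is not stated on the slide), and the function names are invented for this example.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 d.f.) p-value for departure from Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        return 1.0                    # monomorphic SNP: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(n_aa, n_ab, n_bb, n_missing, min_call_rate=0.80, min_hwe_p=0.001):
    """Apply the two SNP-level thresholds quoted on the slide to one panel."""
    total = n_aa + n_ab + n_bb + n_missing
    if total == 0:
        return False
    call_rate = (total - n_missing) / total
    return call_rate >= min_call_rate and hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= min_hwe_p


print(passes_qc(25, 50, 25, 0))    # genotype counts in exact HWE proportions
print(passes_qc(50, 0, 50, 0))     # no heterozygotes at all
print(passes_qc(25, 50, 25, 40))   # many missing calls
```

For small panels an exact HWE test is usually preferred over the chi-square approximation; the approximation is used here only to keep the example short.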
Phase 3: Draft Release 1
samples    QC+ SNPs     poly QC+ SNPs
71 ASW     1,632,186    1,536,247
162 CEU    1,634,020    1,403,896
82 CHB     1,637,672    1,311,113
70 CHD     1,619,203    1,270,600
83 GIH     1,631,060    1,391,578
82 JPT     1,637,610    1,272,736
83 LWK     1,631,688    1,507,520
71 MEX     1,614,892    1,430,334
171 MKK    1,621,427    1,525,239
77 TSI     1,629,957    1,393,925
163 YRI    1,634,666    1,484,416
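To compare panels, it can help to express the polymorphic QC+ SNPs as a fraction of all QC+ SNPs. The short sketch below does that arithmetic on the counts from the table above; the variable and function names are only illustrative.

```python
# (QC+ SNPs, polymorphic QC+ SNPs) per panel, copied from the table above.
panels = {
    "ASW": (1632186, 1536247),
    "CEU": (1634020, 1403896),
    "CHB": (1637672, 1311113),
    "CHD": (1619203, 1270600),
    "GIH": (1631060, 1391578),
    "JPT": (1637610, 1272736),
    "LWK": (1631688, 1507520),
    "MEX": (1614892, 1430334),
    "MKK": (1621427, 1525239),
    "TSI": (1629957, 1393925),
    "YRI": (1634666, 1484416),
}


def polymorphic_fraction(qc_snps, poly_snps):
    """Fraction of QC+ SNPs that are polymorphic in a panel."""
    return poly_snps / qc_snps


fractions = {label: polymorphic_fraction(*counts) for label, counts in panels.items()}

# Print panels from the highest to the lowest polymorphic fraction.
for label, frac in sorted(fractions.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label}: {frac:.1%}")
```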
Phase 3 Data
• HapMap format:
http://ftp.hapmap.org/genotypes/2008-07_phaseIII/hapmap_format
* Excluded 1,527 SNPs with strandedness issues & 411 indels
• PLINK format:
http://ftp.hapmap.org/genotypes/2008-07_phaseIII/plink_format
• HapMap3 sites:
Broad - http://www.broad.mit.edu/~debakker/p3.html
Sanger - http://www.sanger.ac.uk/humgen/hapmap3/
Baylor - http://www.hgsc.bcm.tmc.edu/projects/human/
Goals of This Tutorial
This tutorial will show you how to:
• Find HapMap3 SNPs near a gene or region of interest (ROI)
– Visualize allele frequencies in HapMap3 populations
– Download SNP genotypes in ROI for use in Haploview 4.1
– Identify GWA hits in the vicinity of ROI & visualize in the context of
all chromosomes (karyogram)
– Add custom data onto the GWAs karyogram
– Add custom tracks of association data onto ROI
– Create publication-quality images
• Download the entire HapMap3 data set in bulk
– Distinguish genotype data in PLINK and HapMap formats
• Visualize LD patterns, find tag SNPs, impute genotypes using
release #24 (phase 1+2)
• Generate customized extracts of the entire dataset using
HapMart
1: Surf to the HapMap Browser
1a. Go to
www.hapmap.org
1b. Select
“HapMap phase 3”
2: Search for TCF7L2
2. Type search term
– “TCF7L2”
Search for a gene
name, a chromosome
band, or a phrase like
“insulin receptor”
3: Examine Region
Chromosome-wide
summary data is
shown in overview
Default tracks show
HapMap genotyped SNPs,
refGenes with exon/intron
splicing patterns, etc.
Region view puts
your ROI in
genomic context
3: This exonic region has
many typed SNPs. Click
on ruler to re-center
image.
3: Examine Region (cont)
Use the Scroll/Zoom buttons and menu to change position & magnification
3: Mouse over a SNP to see allele frequency table
Click to go to the SNP details page
As you zoom in further, the display changes to include more detail
4: Generate Text Reports
4: Select the desired
“Download” option and press
“Go” or “Configure”
Available phase 3 downloads:
- Individual genotypes
- Population allele &
genotype frequencies
4: Generate Reports (cont)
The Genotype download
format can be saved to
disk or loaded directly
into Haploview v4.1
5: Find GWA hits
5a: Scroll down to turn on
GWA studies tracks in
overview & region panels
5b: Find GWA hits in
nearby region. Click on a
GWA hit to re-center
5: Find GWA hits (cont)
5c: Mouse over & click on
GWA hit for more info
6: Examine GWA hits in
entire genome
6: From www.hapmap.org,
select “Karyogram”
6: Custom GWA hits in karyogram
6: Follow these instructions to
upload your own GWA data
Detailed help on
the format is under
the “Help” link
7: Create your own tracks
Example:
• Interested in T2DM genetics
• Create file with custom annotations from
http://www.broad.mit.edu/diabetes and
superimpose on the HapMap
7: Upload example file:
TCF7L2_annotations.txt
Detailed help on
the format is under
the “Help” link
7: Create your own tracks (cont)
Some SNPs were typed
(known platform) and
others were imputed.
Format data for both
typed & imputed SNPs.
Scores allow you
to display data in
quantitative
form, such as XY
plots
Save as a text
file!
7: Create your own tracks (cont)
Remember to point your
browser to the location
of your annotations
(TCF7L2 gene in this
case).
7: Create your own tracks (cont)
Make edits on
your own browser
window by
clicking on “Edit
File…”
7: Create your own tracks (cont)
8: Create Image for Publication
Click on the
+/- sign to
hide/show a
section
8a. Click on “High-res
Image”
Mouse over a track
until a cross appears.
Click on track name to
drag track up or
down.
8: Image for Publication (cont)
8b. Click on “View SVG Image in new
browser window”
8c. Save generated file
with “.svg” extensions
Can view file in Firefox,
but use other programs
(Adobe Illustrator or
Inkscape) to convert to
other formats and/or edit
8: Image for Publication (cont)
Inkscape is free and
lets you edit and
convert to other
formats (many
journals prefer EPS)
9. Bulk downloads
9. From www.hapmap.org,
click on “Bulk Data
Download”
Or directly
click on “Data”
9. Bulk downloads
Download the entire HapMap3 data set to your own computer
HapMap3
genotypes &
frequencies
9a. Select
“Genotypes”
Analytic results (LD &
phased haplotype data
available for HapMap3)
Your own copy
of the HapMap
Browser
Protocols &
assay design
HapMap
Samples
Also available at http://ftp.hapmap.org
9. Bulk downloads (cont)
9b. Click on hapmap_format/forward to
download genotypes
Also at http://ftp.hapmap.org/genotypes/latest_phaseIII_ncbi_b36/
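Once a HapMap-format genotype file is downloaded, it can be read with a few lines of Python. The sketch below assumes the commonly described layout: one whitespace-delimited row per SNP, 11 annotation columns (rs#, alleles, chrom, pos, strand, ...) followed by one two-letter genotype per sample, with "NN" for a missing call. Check that assumption against the README in the download directory. The rs numbers and sample IDs in the demo are made up. PLINK format, by contrast, stores one row per individual in a .ped file with marker positions in a separate .map file.

```python
import io

N_META = 11  # assumed number of annotation columns before the sample columns


def read_hapmap_genotypes(handle):
    """Yield one dict per SNP from a HapMap-format genotype file handle."""
    header = handle.readline().split()
    samples = header[N_META:]
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        yield {
            "rs": fields[0],
            "alleles": fields[1].split("/"),
            "chrom": fields[2],
            "pos": int(fields[3]),
            # Map each sample ID to its genotype call, e.g. "CT" or "NN".
            "genotypes": dict(zip(samples, fields[N_META:])),
        }


# Made-up two-SNP, three-sample file used only to exercise the parser.
demo = io.StringIO(
    "rs# alleles chrom pos strand assembly# center protLSID assayLSID "
    "panelLSID QCcode S001 S002 S003\n"
    "rs0000001 C/T chr10 1000 + ncbi_b36 demo p a s QC+ CC CT NN\n"
    "rs0000002 A/G chr10 2000 + ncbi_b36 demo p a s QC+ AG GG AA\n"
)

for snp in read_hapmap_genotypes(demo):
    print(snp["rs"], snp["chrom"], snp["pos"], snp["genotypes"])
```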
10: Surf to the HapMap phase 1+2
genome browser
10. Go to www.hapmap.org &
select “HapMap Genome
Browser B36”
11: Search for TCF7L2
11. Type search term
– “TCF7L2”
12: Examine Region
12. Re-center & zoom in
12: Turn on LD & Haplotype Tracks
12a: Scroll down to the
“Tracks” section. Turn on
the LD Plot and Haplotype
Display tracks.
12b: Press
“Update Image”
These sections allow
you to adjust the
display and to
superimpose your own
data on the HapMap
13: View variation patterns
Triangle plot shows LD
values using r² or
D’/LOD scores in one or
more HapMap
populations
Phased haplotype
track shows all 120
chromosomes with
alleles colored yellow
and blue
14: Adjust Track Settings (on the spot)
14a. Click on question
mark preceding
track name
14b. Adjust population
and display settings &
press “Configure”
14: Adjust Track Settings (cont)
Select the analysis
track to adjust and
press “Configure”
15: Turn on Tag SNP Track
15: Activate the “tag
SNP Picker” and press
“Update Image”
16: Adjust tag SNP picker
Tag SNPs are selected
on the fly as you
navigate around the
genome
16a: Click on question mark
behind “tag SNP Picker”
Alternatively, you
may select “Annotate
tag SNP Picker” and
press “Configure…”
16: Adjust tag SNP picker (cont)
Select population
Select tagging
algorithm and
parameters
16b: Press “Configure” to
save changes
[optional] upload list
of SNPs to be
included, excluded, or
design scores
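To give a feel for what a pairwise tagging algorithm does, here is a minimal greedy sketch. It is not the tag SNP Picker's actual implementation: the function name, the input layout (a dictionary of pairwise r² values) and the 0.8 threshold are assumptions made for this example.

```python
def greedy_tag_snps(snps, r2, threshold=0.8):
    """Pick tag SNPs greedily so every SNP is itself a tag or is in LD
    (r^2 >= threshold) with at least one tag.

    snps: list of SNP names
    r2:   dict mapping frozenset({snp_a, snp_b}) -> pairwise r^2
          (pairs that are absent are treated as r^2 = 0)
    """
    def covers(a, b):
        return a == b or r2.get(frozenset((a, b)), 0.0) >= threshold

    untagged = set(snps)
    tags = []
    while untagged:
        # Choose the SNP that captures the most still-untagged SNPs.
        best = max(snps, key=lambda s: sum(covers(s, u) for u in untagged))
        tags.append(best)
        untagged -= {u for u in untagged if covers(best, u)}
    return tags


# Made-up r^2 values for four SNPs.
pairwise_r2 = {
    frozenset(("snp1", "snp2")): 1.0,
    frozenset(("snp2", "snp3")): 0.9,
    frozenset(("snp3", "snp4")): 0.2,
}
print(greedy_tag_snps(["snp1", "snp2", "snp3", "snp4"], pairwise_r2))
```

Raising the threshold toward 1 yields more tags, because fewer SNP pairs count as redundant.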
17: Impute genotypes using
HapMap Data
• Interested in the
VAV1 gene
• Commercially
available platforms
with few overlapping
SNPs in this region
• HapMap
genotyped lots of
SNPs in region
-> Use genotypes for HapMap SNPs to impute genotypes &
compare non-overlapping SNP sets!
17: Impute genotypes using MACH1
17a. Go to chr19:6,765,000..6,900,000
17b. Select “Download Impute Data”, click
“Configure”
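If you script around the browser, a region string such as the one in step 17a can be split into its parts as sketched below. The helper name `parse_region` is invented for this example, and the pattern only covers the `chrN:start..end` form shown on the slide.

```python
import re

# Matches strings like "chr19:6,765,000..6,900,000" (commas optional).
REGION_RE = re.compile(r"^(chr\w+):([\d,]+)\.\.([\d,]+)$")


def parse_region(text):
    """Return (chromosome, start, end) from a 'chrN:start..end' string."""
    match = REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized region: {text!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start > end:
        raise ValueError("region start is greater than region end")
    return chrom, start, end


chrom, start, end = parse_region("chr19:6,765,000..6,900,000")
print(chrom, start, end, "length:", end - start + 1)
```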
17: Configure MACH1
17c. Upload input files: example.dat &
example.ped. Enter e-mail address.
Click “Go”
17: Impute genotypes: Input files
• example.dat (20 user-provided SNPs; all should be part of
the HapMap):
M rs4807101
M rs164022
M rs625828
M rs461970
M rs331684
…
• example.ped (genotypes for 336 unrelated inds):
PED00001 IND00001 0 0 2 C/C C/C T/T C/T C/C G/G G/G …
PED00002 IND00002 0 0 1 C/T C/C T/T T/T C/C A/A A/G …
PED00003 IND00003 0 0 2 T/T G/G A/A C/T C/C A/G A/G …
…
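Before uploading, it can be worth checking that the two input files agree with each other. The sketch below reads marker names from lines like those in example.dat and checks that a pedigree line like those in example.ped carries one genotype per marker. It assumes the five leading pedigree columns are family, individual, father, mother and sex, and that genotypes follow in marker order; the function names are illustrative only.

```python
def read_dat(lines):
    """Return marker names from lines of the form 'M rs4807101'."""
    markers = []
    for line in lines:
        parts = line.split()
        if len(parts) == 2 and parts[0] == "M":
            markers.append(parts[1])
    return markers


def read_ped_line(line, n_markers):
    """Parse one pedigree line and check it has one genotype per marker."""
    parts = line.split()
    family, individual, father, mother, sex = parts[:5]
    genotypes = [tuple(g.split("/")) for g in parts[5:]]
    if len(genotypes) != n_markers:
        raise ValueError(
            f"{individual}: expected {n_markers} genotypes, found {len(genotypes)}"
        )
    return {
        "family": family,
        "individual": individual,
        "father": father,
        "mother": mother,
        "sex": sex,
        "genotypes": genotypes,
    }


# First five markers and the first five genotypes of PED00001 from the slide.
markers = read_dat(["M rs4807101", "M rs164022", "M rs625828",
                    "M rs461970", "M rs331684"])
record = read_ped_line("PED00001 IND00001 0 0 2 C/C C/C T/T C/T C/C", len(markers))
print(dict(zip(markers, record["genotypes"])))
```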
17d. Return to browser
17. Visualize imputed SNPs
Your imputation results
appear as an external track
that can be edited.
Hint: Click on “Help” link
below for display options
17e. Click “Edit File”
17. Edit external annotations file
17f. Edit annotations file &
“Submit Changes”
17: Impute genotypes: Results
17g. Check your e-mail for text results
• Info (143 provided & imputed HapMap SNPs)
SNP          Al1   Al2   Freq1    MAF      Quality   Rsq
rs10419572   T     A     0.9041   0.0959   0.8179    0.1069
rs415218     T     A     0.9709   0.0291   0.9427    0.0313
rs4807100    A     G     0.4713   0.4713   0.9790    0.9625
rs4807101    T     C     0.4714   0.4714   0.9803    0.9649
rs1651876    T     C     0.9631   0.0369   0.9277    0.0216
…
Probability of match imputed:experimental genotype (1.0 for provided markers)
• Geno (143 SNPs x 336 inds)
PED00001->IND00001 ML_GENO T/T T/T G/G C/C T/T T/T A/T G/G A/A T/T T/C …
PED00002->IND00002 ML_GENO T/T T/T A/G T/C T/T T/T A/T G/G A/A T/T T/C …
PED00003->IND00003 ML_GENO T/T T/T A/A T/T T/T T/T A/T G/G A/A T/T T/T …
…
• Dose (allele dosage)
PED00001->IND00001 ML_DOSE 1.719 1.911 0.004 0.003 1.913 1.980 1.246 1.884 1.949 1.948 1.302 …
PED00002->IND00002 ML_DOSE 1.861 1.957 1.000 1.000 1.952 1.892 1.086 1.909 1.949 1.948 1.096 …
PED00003->IND00003 ML_DOSE 1.994 1.999 1.993 1.995 1.955 1.656 1.297 1.863 1.987 1.988 1.374 …
…
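The dosage lines can be parsed and turned back into best-guess genotypes as sketched below. In the example output shown on the slide, the dosages are consistent with counting copies of the first allele (Al1), so the sketch assumes that convention; confirm it against the MACH documentation before relying on it. The function names are illustrative.

```python
def parse_dose_line(line):
    """Split an 'FAM->IND ML_DOSE d1 d2 ...' line into its parts."""
    ident, tag, *values = line.split()
    if tag != "ML_DOSE":
        raise ValueError(f"not a dosage line: {tag}")
    family, individual = ident.split("->")
    return family, individual, [float(v) for v in values]


def hard_call(dose, al1, al2):
    """Round a dosage (assumed: expected copies of al1) to a genotype string."""
    copies = max(0, min(2, round(dose)))
    return "/".join([al1] * copies + [al2] * (2 - copies))


# First five dosages of PED00001 and the Al1/Al2 columns from the Info table.
family, individual, doses = parse_dose_line(
    "PED00001->IND00001 ML_DOSE 1.719 1.911 0.004 0.003 1.913"
)
alleles = [("T", "A"), ("T", "A"), ("A", "G"), ("T", "C"), ("T", "C")]
calls = [hard_call(d, a1, a2) for d, (a1, a2) in zip(doses, alleles)]
print(family, individual, calls)
```

Rounding discards the uncertainty that the dosage carries, so association tests are often run on the dosages themselves rather than on hard calls.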
18. Use HapMart to Generate Extracts
of the HapMap Dataset
Find all HapMap characterized SNPs
that:
1. Have a MAF > 0.20 in the Yoruban
population panel (YRI)
2. Cause a nonsynonymous amino acid
change
3. Were typed by Perlegen
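The same three filters can be expressed over records you have already exported, as in the sketch below. This is not HapMart's interface: the record fields (`maf`, `consequence`, `center`) and the placeholder SNP names are hypothetical and only show how the criteria combine.

```python
def select_snps(records, panel="YRI", min_maf=0.20,
                consequence="nonsynonymous", center="perlegen"):
    """Return IDs of SNPs meeting all three criteria from the slide."""
    return [
        r["rs"]
        for r in records
        if r["maf"].get(panel, 0.0) > min_maf        # 1. MAF > 0.20 in YRI
        and r["consequence"] == consequence          # 2. nonsynonymous change
        and r["center"].lower() == center            # 3. typed by Perlegen
    ]


# Made-up records with a hypothetical field layout.
records = [
    {"rs": "snp_a", "maf": {"YRI": 0.35}, "consequence": "nonsynonymous", "center": "Perlegen"},
    {"rs": "snp_b", "maf": {"YRI": 0.10}, "consequence": "nonsynonymous", "center": "Perlegen"},
    {"rs": "snp_c", "maf": {"YRI": 0.40}, "consequence": "synonymous", "center": "Perlegen"},
    {"rs": "snp_d", "maf": {"YRI": 0.45}, "consequence": "nonsynonymous", "center": "Broad"},
]
print(select_snps(records))
```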
Further Information
• HapMap Publications & Guidelines
http://hapmap.cshl.org/publications.html.en
• Past tutorials & user’s guide to HapMap.org
http://www.hapmap.org/tutorials.html.en
• Questions?
[email protected]
HapMap DCC Present Members (CSHL)
Lincoln Stein
Marcela K. Tello-Ruiz
Zhenyuan Lu
Wei Zhao
HapMap DCC Former Members
Lalitha Krishnan
Albert Vernon Smith
Gudmundur Thorisson
Fiona Cunningham