Fatchiyah, PhD
Dept Biology UB
Fatchiyah.lecture.ub.ac.id
Why do we care about genetic variations?
1. Genetic variations underlie
phenotypic differences among
different individuals
2. Genetic variations determine our predisposition
to complex diseases and responses to drugs and
environmental factors
3. Genetic variations reveal
clues to ancestral human
migration history
Main Types of Genetic Variations
A. Single nucleotide mutation
 Resulting in single nucleotide polymorphisms (SNPs)
 Accounts for up to 90% of human genetic variations
 Majority of SNPs do NOT directly or significantly contribute to any phenotypes
B. Insertion or deletion of one or more nucleotide(s)
1. Tandem repeat polymorphisms
 Tandem repeats are genomic regions in which a sequence motif of variable length
is repeated in tandem with a variable copy number.
 Used as genetic markers for DNA fingerprinting (forensics, parentage testing)
 Many cause genetic diseases
 Microsatellites (Short Tandem Repeats): repeat unit 1-6 bases long
 Minisatellites: repeat unit 11-100 bases long
2. Insertion/Deletion (INDEL or DIP) polymorphisms
Often result from localized rearrangements between homologous tandem
repeats.
C. Gross chromosomal aberration
 Deletions, inversions, or translocations of large DNA fragments
 Rare, but often cause serious genetic diseases
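The repeat-unit length ranges given above for microsatellites and minisatellites can be sketched as a small lookup. This is only an illustration: the function name is hypothetical, and unit lengths that fall outside the two ranges on the slide (for example 7-10 bases) are reported as "unclassified" rather than guessed.

```python
def classify_tandem_repeat(unit_length):
    """Classify a tandem repeat by the length of its repeat unit (in bases).

    Ranges follow the slide: 1-6 bases = microsatellite (STR),
    11-100 bases = minisatellite. Anything else is left unclassified.
    """
    if unit_length < 1:
        raise ValueError("repeat unit length must be at least 1 base")
    if unit_length <= 6:
        return "microsatellite"
    if 11 <= unit_length <= 100:
        return "minisatellite"
    return "unclassified"


if __name__ == "__main__":
    for length in (2, 8, 30):
        print(length, classify_tandem_repeat(length))
```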
How many variations are present
in the human genome?
 SNPs appear once per 0.1-1 kb interval, or on average 1
per 300 bp. Considering the size of the entire human
genome (3.2 × 10^9 bp), the total number of SNPs is well
above 11 million. Their high density and relative ease of
assay make SNPs ideal genomic markers.
 In silico estimates put the number of potentially polymorphic
variable number tandem repeats (VNTRs) at over 100,000
across the human genome.
 Short insertions/deletions are very difficult to
quantify; their number is likely to fall between those of
SNPs and VNTRs.
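The SNP estimate above is a simple division of genome size by average SNP spacing. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the inputs are the figures quoted on the slide. Average spacing alone gives roughly 10.7 million SNPs, the same order of magnitude as the total quoted above, and the 0.1-1 kb range gives bounds of about 3.2 to 32 million.

```python
GENOME_SIZE_BP = 3.2e9  # human genome size used on the slide


def expected_snp_count(genome_size_bp, bp_per_snp):
    """Rough SNP count assuming one SNP every `bp_per_snp` base pairs."""
    if bp_per_snp <= 0:
        raise ValueError("bp_per_snp must be positive")
    return genome_size_bp / bp_per_snp


if __name__ == "__main__":
    # Average spacing (300 bp) and the two ends of the 0.1-1 kb range.
    for spacing in (300, 100, 1000):
        count = expected_snp_count(GENOME_SIZE_BP, spacing)
        print(f"1 SNP per {spacing} bp -> about {count / 1e6:.1f} million SNPs")
```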
Types of Single Base Substitutions
 Transitions
Change of one purine (A,G) for another purine, or a
pyrimidine (C,T) for another pyrimidine
 Transversions
Change of a purine (A,G) for a pyrimidine (C,T), or vice
versa.
 The cytosine to thymine (C>T) transition accounts for
approximately 2 out of every 3 SNPs in the human
genome.
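The transition/transversion rule above can be sketched in a few lines. The function name is hypothetical; the purine and pyrimidine sets are taken directly from the definitions on the slide.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical: no substitution")
    # Same chemical class (purine<->purine or pyrimidine<->pyrimidine)
    # is a transition; a change of class is a transversion.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


if __name__ == "__main__":
    print(classify_substitution("C", "T"))
    print(classify_substitution("A", "C"))
```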
SNP or Mutation?
 Call it a SNP IF
the single base change occurs in a population at a
frequency of 1% or higher.
 Call it a mutation IF
the single base change occurs in less than 1% of a
population.
 A SNP is a polymorphic position where the point
mutation has been fixed in the population.
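The 1% rule above amounts to a threshold on allele frequency. A minimal sketch, assuming the frequency is estimated from allele counts in a sample; the function and parameter names are hypothetical.

```python
def classify_by_frequency(variant_allele_count, total_alleles, threshold=0.01):
    """Label a single-base change 'SNP' (frequency >= 1%) or 'mutation' (< 1%)."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    if not 0 <= variant_allele_count <= total_alleles:
        raise ValueError("variant_allele_count must be between 0 and total_alleles")
    frequency = variant_allele_count / total_alleles
    return "SNP" if frequency >= threshold else "mutation"


if __name__ == "__main__":
    # 2 of 100 sampled alleles carry the change: frequency 2%
    print(classify_by_frequency(2, 100))
    # 5 of 1000 sampled alleles carry the change: frequency 0.5%
    print(classify_by_frequency(5, 1000))
```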
From a Mutation to a SNP
SNPs Classification
SNPs can occur anywhere in a genome; they are classified based on their location.
 Intergenic region
 Gene region
can be further classified by sub-region
(promoter, intronic, exonic, UTR, etc.)
Coding Region SNPs
 Missense – amino acid change
 Nonsense – changes an amino acid codon to a stop codon.
Geospiza Green Arrow™ tutorial by Sandra Porter, Ph.D.
 Synonymous
 Non-Synonymous
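The coding-region categories above can be illustrated by translating the reference and variant codons and comparing the results. This is a sketch under stated assumptions: it uses the standard genetic code (DNA codons), the function name is hypothetical, and a change from a stop codon to an amino acid is not treated as a separate category because the slide does not list one.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(ref_codon, alt_codon):
    """Return 'synonymous', 'missense', or 'nonsense' for a codon change."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"      # same amino acid (or stop) encoded
    if alt_aa == "*":
        return "nonsense"        # amino acid codon becomes a stop codon
    return "missense"            # a different amino acid is encoded


if __name__ == "__main__":
    print(classify_coding_snp("GAA", "GAG"))  # Glu -> Glu
    print(classify_coding_snp("GAG", "GTG"))  # Glu -> Val
    print(classify_coding_snp("TGG", "TGA"))  # Trp -> stop
```

Missense and nonsense changes are both non-synonymous; only the first branch corresponds to a synonymous SNP.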
The Consequences of SNPs
The phenotypic consequence of a SNP is
significantly affected by the location where it
occurs, as well as the nature of the mutation.
 No consequence
 Affect gene transcription quantitatively or
qualitatively.
 Affect gene translation quantitatively or
qualitatively.
 Change protein structure and functions.
 Change gene regulation at different steps.
Simple/Complex Genetic Diseases and SNPs
 Simple genetic diseases (Mendelian diseases) are
often caused by mutations in a single gene.
-- e.g. Huntington’s, Cystic fibrosis, PKU, etc.
 Many complex diseases result from mutations in
multiple genes, interactions among those genes, and
interactions with environmental factors.
-- e.g. cancers, heart disease, Alzheimer's, diabetes,
asthma, etc.
 The majority of SNPs may not directly cause any disease.
 SNPs are ideal genomic markers (dense and easy to
assay) for locating disease loci in association studies.
Main Genetic Variation Resources
 NCBI dbSNP
http://www.ncbi.nlm.nih.gov/SNP/index.html
 NCBI Online Mendelian Inheritance in Man (OMIM)
http://www.ncbi.nlm.nih.gov/sites/entrez?db=OMIM
 International HapMap Project
http://www.hapmap.org/
 Perlegen
http://genome.perlegen.com
 Genome Variation Server (Seattle SNPs)
http://gvs.gs.washington.edu/GVS/
Where to Find Bioinformatics Resources for
Genetic Variation Studies?
 OBRC: Online Bioinformatics Resources
Collection (Univ. of Pittsburgh)
http://www.hsls.pitt.edu/guides/genetics/obrc
The most comprehensive annotated bioinformatics databases
and software tools collection on the Web, with over 200
resources relevant to genetic variation studies.
 HUGO Mutation Database Initiative
http://www.hgvs.org/dblist/dblist.html
NCBI dbSNP Database: Overview
 URL: http://www.ncbi.nlm.nih.gov/SNP/index.html
 The NCBI’s Single Nucleotide Polymorphism
database (dbSNP) is the largest and primary public-domain archive for simple genetic variation data.
 The polymorphism data in dbSNP include:
 Single-base nucleotide substitutions (SNPs)
 Small-scale multi-base deletion or insertion variations
(also called deletion insertion polymorphisms, DIPs, or
INDELs)
 Microsatellite tandem repeat variations (also called short
tandem repeats or STRs).
dbSNP Data Stats (build 128, Oct, 2007)
http://www.ncbi.nlm.nih.gov/SNP/snp_summary.cgi
dbSNP Data Types
dbSNP contains two classes of records:
 Submitted record
The original observations of sequence
variation; submitted SNP (ss) record IDs
start with ss (e.g. ss5586300)
 Computationally annotated record
Generated during the dbSNP "build" cycle by
computation based on the original submitted
data; Reference SNP cluster (RefSNP) IDs start
with rs (e.g. rs4986582)
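The ss/rs prefix convention above makes it easy to tell the two record classes apart from the identifier alone. A minimal sketch, assuming identifiers consist of the prefix followed by digits as in the two examples on the slide; the function name is hypothetical.

```python
import re

# 'ss' or 'rs' followed by one or more digits, e.g. ss5586300 or rs4986582.
_DBSNP_ID = re.compile(r"^(ss|rs)(\d+)$", re.IGNORECASE)


def parse_dbsnp_id(identifier):
    """Split a dbSNP identifier into (record class, numeric part)."""
    match = _DBSNP_ID.match(identifier.strip())
    if match is None:
        raise ValueError(f"not an ss/rs identifier: {identifier!r}")
    prefix = match.group(1).lower()
    number = int(match.group(2))
    record_class = "submitted" if prefix == "ss" else "reference cluster"
    return record_class, number


if __name__ == "__main__":
    print(parse_dbsnp_id("ss5586300"))
    print(parse_dbsnp_id("rs4986582"))
```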
dbSNP Submitted Record
 Provides information on the SNP and conditions under which
it was collected.
 Provides links to collection methods (assay technique),
submitter information (contact data, individual submitter),
and variation data (frequencies, genotypes).
ss5586300
From Submitted Record to Reference SNP Cluster
SNP records are submitted
by researchers
SNP position is mapped
to the reference genomic
contigs
If the SNP position is not unique, the record is
assigned to the existing RefSNP cluster
If the SNP position is
unique, a new RS# is
assigned
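The flow above (map each submission to a position, reuse the existing cluster if the position is already known, otherwise open a new one) can be sketched as follows. This is a toy illustration, not dbSNP's actual build pipeline: the function name, the contig names, and every ss/rs number in the example are hypothetical.

```python
def assign_refsnp_clusters(submissions, next_rs_number=1):
    """Assign each submitted record to a RefSNP cluster by mapped position.

    `submissions` is an iterable of (ss_id, (contig, position)) pairs.
    Returns a dict mapping each ss_id to its rs cluster ID.
    """
    position_to_rs = {}   # mapped position -> rs ID of the existing cluster
    assignments = {}      # ss ID -> rs ID
    for ss_id, mapped_position in submissions:
        if mapped_position not in position_to_rs:
            # Position not seen before: a new RS# is assigned.
            position_to_rs[mapped_position] = f"rs{next_rs_number}"
            next_rs_number += 1
        # Otherwise the record joins the existing RefSNP cluster.
        assignments[ss_id] = position_to_rs[mapped_position]
    return assignments


if __name__ == "__main__":
    example = [
        ("ss1", ("contigA", 1050)),
        ("ss2", ("contigA", 1050)),  # same position as ss1
        ("ss3", ("contigB", 77)),
    ]
    print(assign_refsnp_clusters(example))
```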
Different Ways to Search SNPs in dbSNP
 dbSNP Web site
http://www.ncbi.nlm.nih.gov/SNP/index.html
Direct search of ss records; batch search; allows SNP record
submission; NO search limits
 Entrez SNP
http://www.ncbi.nlm.nih.gov/sites/entrez?db=Snp
Search limit options allow precise retrieval
 Entrez Gene Record’s SNP Links Out Feature
Direct links to corresponding SNP records; access to genotype
and linkage disequilibrium data
 NCBI’s MapViewer
Visualize SNPs in the genomic context along with other types of
genetic data.
Search SNPs from dbSNP Web Page
 dbSNP Web site
http://www.ncbi.nlm.nih.gov/SNP/index.html
Search SNPs from Entrez SNP Web Page
 Entrez SNP
http://www.ncbi.nlm.nih.gov/sites/entrez?db=Snp
dbSNP is part of the Entrez integrated information
retrieval system and may be searched using either qualifiers
(aliases) or a combination of search limits from 14 different
categories.
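Entrez databases can also be queried programmatically. The sketch below is not from the slides: it assumes NCBI's E-utilities `esearch` endpoint with its `db`, `term`, and `retmax` parameters, and it only builds the query URL without sending a request. The function name is hypothetical, and the search term is whatever the caller supplies (here, the rs identifier used as an example earlier in the deck).

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities ESearch.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_snp_search_url(term, retmax=20):
    """Build an ESearch URL that queries the Entrez SNP database."""
    if not term:
        raise ValueError("search term must not be empty")
    query = urlencode({"db": "snp", "term": term, "retmax": retmax})
    return f"{EUTILS_ESEARCH}?{query}"


if __name__ == "__main__":
    print(build_snp_search_url("rs4986582"))
```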
Entrez SNP Search Limits
 Organisms
 Chromosome (including W and Z for non-mammals)
 Chromosome Ranges
 Map Weight (how many times in genome)
 Function Class (coding non-synonymous; intron; etc.)
 SNP Class (types of variations)
 Method Class (methods for determining the variations)
 Validation Status (if and how the data is validated)
 Variation Alleles (using IUPAC codes)
 Annotation (Records with links to other NCBI databases)
 Heterozygosity (% of heterozygous genotypes)
 Success Rate (likelihood that the SNP is real)
 Created Build ID
 Updated Build ID
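The Variation Alleles limit above relies on IUPAC nucleotide ambiguity codes, in which a single letter stands for a set of alternative bases (for example R for A or G). A minimal sketch of that mapping; the function name and the "A/G" input format are assumptions made for illustration.

```python
# IUPAC nucleotide codes, keyed by the set of bases each letter represents.
IUPAC_CODES = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def alleles_to_iupac(alleles):
    """Convert a slash-separated allele string such as 'A/G' to its IUPAC code."""
    bases = frozenset(a.strip().upper() for a in alleles.split("/"))
    try:
        return IUPAC_CODES[bases]
    except KeyError:
        raise ValueError(f"alleles must be single bases A, C, G, T: {alleles!r}")


if __name__ == "__main__":
    for pair in ("A/G", "C/T", "A/C/G/T"):
        print(pair, alleles_to_iupac(pair))
```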
http://www.ncbi.nlm.nih.gov/portal/query.fcgi?db=Snp
http://www.ensembl.org/common/helpview?kw=snpview;ref=
Adapted from Nature 426, 6968: 789-796 (2003)
Assessing Polymorphisms:
Linkage Disequilibrium, Haplotype Block, and Tag SNPs
 Linkage Disequilibrium (LD): If two alleles tend to be inherited together more often
than would be predicted by chance, then the alleles are in linkage disequilibrium.
 If most SNPs are highly correlated with one or more of their neighbors, these
correlations can be used to generate haplotypes, which are excellent proxies for
individual SNPs.
 Because haplotypes may be identified by a much smaller number of SNPs (tag SNPs),
assessing polymorphisms via haplotypes dramatically reduces genotyping work.
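"More often than predicted by chance" can be made concrete with two commonly used LD measures, which the slides do not spell out: D = p(AB) - p(A)p(B), and r^2 = D^2 / (p(A)(1 - p(A)) p(B)(1 - p(B))). The sketch below assumes the haplotype frequency p(AB) and the allele frequencies p(A) and p(B) are already known; the function and parameter names are hypothetical.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r_squared) for alleles A and B at two loci.

    p_ab: frequency of haplotypes carrying both A and B
    p_a, p_b: frequencies of allele A and allele B
    """
    # D is the excess of the observed AB haplotype frequency over the
    # frequency expected if A and B were inherited independently.
    d = p_ab - p_a * p_b
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both loci must be polymorphic (0 < frequency < 1)")
    r_squared = d * d / denominator
    return d, r_squared


if __name__ == "__main__":
    # A and B always travel together: complete LD.
    print(linkage_disequilibrium(0.5, 0.5, 0.5))
    # AB occurs exactly as often as chance predicts: no LD.
    print(linkage_disequilibrium(0.25, 0.5, 0.5))
```

An r^2 close to 1 means one SNP predicts the other almost perfectly, which is what allows a tag SNP to stand in for its neighbors.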