BioSci D145 Lecture #3
• Bruce Blumberg ([email protected])
– 4103 Nat Sci 2 - office hours Tu, Th 3:30-5:00 (or by appointment)
– phone 824-8573
• TA – Bassem Shoucri ([email protected])
– 4351 Nat Sci 2, 824-6873 – Monday 2-4 (Wednesday this week)
• check e-mail and noteboard daily for announcements, etc.
– Please use the course noteboard for discussions of the material
• lectures will be posted on web pages after lecture
– http://blumberg.bio.uci.edu/biod145-w2015
– http://blumberg-lab.bio.uci.edu/biod145-w2015
BioSci D145 lecture 1
page 1
©copyright
Bruce Blumberg 2014. All rights reserved
Genome mapping
• The problem – genomes are large, workable fragments are small
– How to figure out where everything is?
– How to track mutations in families or lineages?
• analogy to roadmaps
– The most useful maps do not have too much detail but have major features and landmarks that everything can be related to
• Allows genetic markers to be related to physical markers
• What sorts of maps are useful for genomes?
– Restriction maps of various sorts (most often of large insert libraries)
• RFLPs, fingerprints
– Recombination maps – how often do traits segregate together?
– Physical maps – which genes occur on same chunks of DNA
Genome mapping (contd)
• How are maps made?
– Restriction digestion and ordering of fragments to build contigs
• Fingerprinting
– Location of marker sequences onto larger chunks
– Hybridization of markers to larger chunks
– Calculation of recombination frequencies between loci
• What do we map these days?
– BACs are most common target for mapping of new genomes
– Radiation hybrid panels still in wide use
– Goal is always to map markers onto ordered large fragments and infer location of genes relative to each other.
– HAPPY mapping becoming widely used again
Genome mapping (contd) (stopped here)
• Useful markers
– STS – sequence tagged sites
• Short randomly acquired sequences
• PCR-amplify sequences, then prove by hybridization that only a single sequence per genome is amplified
– VERY tedious and slow
• Validated STSs are mapped back to RH panels
• Orders sequences on large chunks of DNA
– STC – sequence tagged connectors
• Array BAC libraries to 15x coverage of genome
• Sequence BAC ends
• Combine with genomic maps and fingerprints to link clones
– Average about 1 tag/5 kb
• Most useful as preparation for sequencing
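The "~1 tag per 5 kb" figure follows from simple arithmetic. A minimal sketch, assuming a 3 × 10⁹ bp human genome and an average BAC insert of ~150 kb (the insert size is an assumption, not stated on the slide):

```python
GENOME_BP = 3e9          # human haploid genome
BAC_INSERT_BP = 150_000  # assumed average BAC insert size
COVERAGE = 15            # BAC library arrayed to 15x coverage

def stc_spacing(genome_bp, insert_bp, coverage):
    """Average genomic spacing (bp) between BAC-end sequence tags."""
    n_clones = coverage * genome_bp / insert_bp  # clones needed for this coverage
    n_tags = 2 * n_clones                        # one tag from each end of each BAC
    return genome_bp / n_tags

spacing = stc_spacing(GENOME_BP, BAC_INSERT_BP, COVERAGE)
print(f"~{spacing / 1000:.0f} kb between tags")
```

Genome size cancels out, so the spacing is simply insert size ÷ (2 × coverage).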
Genome mapping (contd)
• Useful markers (contd)
– ESTs – expressed sequence tags
• randomly acquired cDNA sequences
• Lots of value in ESTs
– Info about diversity of genes expressed
– Quick way to get expressed genes
• More informative than random STSs because ESTs come from expressed genes
• Can be mapped to
– chromosomes by FISH
– RH panels
– BAC contigs
– Polymorphic STS – STS with variable lengths
• Often due to microsatellite differences
• Useful for determining relationships
• Also widely used for forensic analysis
– OJ, Kobe, etc
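A polymorphic STS is scored by the size of its PCR product, so allele differences come straight from repeat number. A minimal sketch, assuming a hypothetical (CA)n microsatellite with made-up flank lengths of 80 and 95 bp between each primer and the repeat:

```python
def sts_product_length(left_flank_bp, repeat_unit, n_repeats, right_flank_bp):
    """PCR product size for one allele: flank + repeat block + flank."""
    return left_flank_bp + len(repeat_unit) * n_repeats + right_flank_bp

# Hypothetical individual heterozygous for 18 and 21 CA repeats
alleles = [18, 21]
sizes = sorted(sts_product_length(80, "CA", n, 95) for n in alleles)
print(sizes)  # [211, 217]
```

Two samples that share band sizes at many such loci are likely related (or, in forensics, from the same person).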
Genome mapping (contd)
• Useful markers (contd)
– SNPs – single nucleotide polymorphisms
• Extraordinarily useful - ~1/1000 bp in humans
• Much of the variation among individuals is in SNPs
• SNPs that change restriction sites cause RFLPs (restriction fragment length polymorphisms)
• Detected in various ways
– Hybridization to high density arrays (Affymetrix)
– Sequencing
– Denaturing electrophoresis or HPLC
– Invasive cleavage
• Tony Long in E&E Biology has a method for ligation-mediated SNP detection that his lab uses for evolutionary analyses
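When a SNP falls inside a restriction site, the two alleles give different fragment sizes, which is how a SNP becomes an RFLP. A minimal sketch using the EcoRI site GAATTC (an illustrative choice) on a made-up 34 bp region; only the top strand is modeled:

```python
import re

def digest(seq, site="GAATTC", cut_offset=1):
    """Fragment lengths after cutting at every copy of site (top strand only).

    cut_offset = position of the cut inside the site; EcoRI cuts G^AATTC, so 1.
    """
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

# Hypothetical region; allele_2 carries an A->G SNP in the second EcoRI site
allele_1 = "TTGACC" + "GAATTC" + "AGGCTTACGT" + "GAATTC" + "ATGCAT"
allele_2 = "TTGACC" + "GAATTC" + "AGGCTTACGT" + "GAGTTC" + "ATGCAT"
print(digest(allele_1))  # [7, 16, 11]
print(digest(allele_2))  # [7, 27]
```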
Genome mapping (contd)
• Useful markers (contd)
– RAPDs – randomly amplified polymorphic DNA
• Amplify genomic DNA with short, arbitrary primers
• Some fraction will amplify fragments that differ among individuals
• These can be mapped like STS
• Drawback – reproducibility issues with PCR amplification
• Benefit – no sequence information required for target
– AFLPs – amplified fragment length polymorphisms
• Cut with two enzymes (a 6-cutter and a 4-cutter) that yield a variety of small fragments (< 1 kb)
• Ligate adapter sequences to ends and amplify by PCR
• Generates a fingerprint
– Controlled by how frequently enzymes cut
• Often correspond to unique regions of genome
– Can be mapped
• Benefit – no sequence required.
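How frequently the enzymes cut can be made quantitative. Assuming random sequence with equal base composition (real genomes deviate), an n-base site occurs on average once every 4ⁿ bp. The sketch below also digests a simulated random sequence with GAATTC (6-cutter) and TTAA (4-cutter) sites, the EcoRI/MseI pair commonly used for AFLP; it places cuts at site starts and stops at the digest, ignoring sticky ends and the selective amplification step:

```python
import random
import re

def expected_spacing(site_len):
    """Mean distance between sites in random DNA with equal base frequencies."""
    return 4 ** site_len

def double_digest_lengths(seq, sites):
    """Fragment lengths when cutting at the start of every occurrence of any site."""
    cuts = sorted({m.start() for s in sites for m in re.finditer(f"(?={s})", seq)})
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]

print(expected_spacing(6), expected_spacing(4))  # 4096 256

rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(200_000))  # simulated DNA
frags = double_digest_lengths(genome, ["GAATTC", "TTAA"])
print(len(frags), "fragments;", sum(1 for f in frags if f < 1000), "under 1 kb")
```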
Genome mapping (contd)
• Fingerprinting
– Array and spot libraries
– Probe with short oligos (10-mers)
• Repeat with many different oligos
– Build up a “fingerprint” for each clone
– Can tell which ones share sequences
• tedious
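A minimal sketch of oligo-hybridization fingerprinting, assuming each clone's fingerprint is just the set of probes that hit it (exact substring match stands in for hybridization) and that overlap is scored by the fraction of shared probes (a Jaccard index, chosen here for illustration). The genome, clones and probes are simulated:

```python
import random

def fingerprint(clone_seq, probes):
    """Probes that 'hybridize' to a clone, modeled as an exact substring match."""
    return {p for p in probes if p in clone_seq}

def shared_fraction(fp_a, fp_b):
    """Jaccard index: probes hitting both clones / probes hitting either clone."""
    union = fp_a | fp_b
    return len(fp_a & fp_b) / len(union) if union else 0.0

# Hypothetical data: a random 3 kb "genome" and three 1 kb clones from it
rng = random.Random(42)
genome = "".join(rng.choice("ACGT") for _ in range(3000))
clones = {"A": genome[0:1000], "B": genome[500:1500], "C": genome[2000:3000]}
probes = {genome[s:s + 10] for s in rng.sample(range(len(genome) - 10), 60)}

fps = {name: fingerprint(seq, probes) for name, seq in clones.items()}
print("A-B", round(shared_fraction(fps["A"], fps["B"]), 2))  # overlapping clones
print("A-C", round(shared_fraction(fps["A"], fps["C"]), 2))  # non-overlapping clones
```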
Genome mapping (contd)
• Mapping by walking/hybridization
– Start with a seed clone then walk along the chromosome
– Takes a LOOONNNNGGG time
– Benefit – can easily jump repetitive sequences
Genome mapping (contd)
• Mapping by hybridization
– Array library – pick a “seed clone”
– See where it hybridizes, pick new seed and repeat
– Product
Genome mapping (contd) Restriction mapping of large insert clones
• Mapping by restriction digest fingerprinting
– Order clones by comparing patterns from restriction enzyme digestion
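Comparing restriction patterns amounts to counting bands of matching size. A minimal sketch, assuming band sizes are measured with a fixed relative tolerance (2% here, a hypothetical value); simple band counting stands in for the probability-based overlap scores used by real contig-building software, and the band sizes are invented:

```python
def shared_bands(bands_a, bands_b, tol=0.02):
    """Count bands in a that match an unused band in b within a relative tolerance."""
    unused = sorted(bands_b)
    shared = 0
    for size in sorted(bands_a):
        for i, other in enumerate(unused):
            if abs(size - other) <= tol * size:
                shared += 1
                del unused[i]  # each band in b can match only once
                break
    return shared

clone_1 = [12000, 8300, 5100, 3200, 2200, 900]
clone_2 = [8250, 5150, 3190, 7400, 1500]   # likely overlaps clone_1
clone_3 = [10400, 6600, 4000, 2600]
print(shared_bands(clone_1, clone_2), shared_bands(clone_1, clone_3))  # 3 0
```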
Genome mapping (contd)
• FISH - Fluorescent in situ hybridization – can detect chromosomes or genes
– Can localize probes to chromosomes and determine the relationship of markers to each other
– Requires much knowledge of genome being mapped
– Chromosome painting
Genome mapping (contd)
• Radiation hybrid mapping
– Old but very useful technique (Geisler paper)
• Lethally irradiate cells with X-rays
• Fuse with cells of another species, e.g., blast human cells, then fuse with hamster cells
– Chunks of human DNA will remain in hamster cells
• Expand colonies of cells to get a collection of cell lines, each containing a different random set of human DNA fragments
• Collection = RH panel
– Now map markers onto these RH panels
• Can identify which of any type of markers map together
– STS, EST (very commonly used), etc
• Can then map others by linkage to the ones you have mapped
– Compare RH panel with other maps
• Utility – great for closing gaps in other maps
• HAPPY Mapping –
– PCR-based method – see Bassem’s presentation
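Mapping markers onto an RH panel reduces to comparing retention patterns. Under the standard two-point model for RH data (assumed here, not described on the slide), each fragment is retained with probability R, and a break separates two markers with probability θ. Discordant hybrids (one marker present, the other absent) then occur with frequency 2θR(1 − R), which gives θ and a distance in centiRays, D = −100 ln(1 − θ). A minimal sketch with a hypothetical 10-hybrid panel:

```python
import math

def rh_two_point(a, b):
    """Estimate breakage probability theta and distance (cR) for two markers.

    a, b: 1 (marker retained in hybrid) / 0 (absent), in the same hybrid order.
    """
    n = len(a)
    discordant = sum(1 for x, y in zip(a, b) if x != y) / n
    retention = (sum(a) + sum(b)) / (2 * n)  # pooled retention frequency R
    if not 0 < retention < 1:
        raise ValueError("markers must be retained in some but not all hybrids")
    theta = discordant / (2 * retention * (1 - retention))
    if theta >= 1:
        return 1.0, math.inf                 # effectively unlinked
    return theta, -100 * math.log(1 - theta)  # distance in centiRays

marker_a = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
marker_b = [1, 1, 0, 0, 1, 0, 0, 0, 0, 1]
theta, cr = rh_two_point(marker_a, marker_b)
print(f"theta = {theta:.3f}, distance = {cr:.1f} cR")
```

Markers that are always co-retained (θ near 0) are tightly linked; θ near 1 means they behave independently across the panel.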
Genome mapping (contd)
• How should maps be made with current knowledge?
– All methods have strengths and weaknesses – must integrate data for useful map
• e.g., RH panel, BAC maps, STS, ESTs
– Size and complexity of genome is important
• More complex genomes require more markers and time mapping
– Breakpoints and markers are mapped relative to each other
– Maps need to be defined by markers (cities, lakes, roads in analogy)
– Key part of making a finely detailed map is construction of genomic libraries and cell lines for common use
• Efforts by many groups increase resolution and utility of maps
• Current strategies
– BAC end sequencing
– Whole genome shotgun sequencing
– EST sequencing
– HAPPY mapping
– Mapping of above to RH panels
DNA sequence analysis
• DNA sequencing = determining the nucleotide sequence of DNA
– Two main methods
– shared Nobel prize in 1980
• Chemical cleavage – Maxam and Gilbert
• Enzymatic sequencing – Sanger (based on polymerization reaction)
Nobel Prize in Chemistry 1980
Walter Gilbert (Harvard) & Frederick Sanger (MRC Labs)
(Sanger also won Nobel in 1958 for protein sequencing)
How many others have won 2 Nobel prizes? In the same field?
Only people to have won 2 Nobel Prizes
• Marie Sklodowska Curie
– 1903 in Physics (radioactive effect)
– 1911 in Chemistry (radium and polonium)
– Husband Pierre, daughter Irène, and son-in-law Frédéric Joliot also won, as did son-in-law Henry R. Labouisse (UNICEF)
• Linus Pauling
– 1954 in Chemistry (nature of chemical bond)
– 1962 in Peace (crusade to ban atmospheric nuclear testing)
• John Bardeen
– 1956 in Physics (transistor)
– 1972 in Physics (superconductivity)
• Frederick Sanger
– 1958 in Chemistry (protein sequencing)
– 1980 in Chemistry (DNA sequencing)
DNA sequence analysis
• Maxam and Gilbert
– One of the first reasonable sequencing methods
– Very popular in late 70s and early 80s
– VERY TEDIOUS!!
• Totally superseded by dideoxy sequencing now
DNA sequence analysis (contd)
• Dideoxy sequencing – Sanger 1977
– Virtually all sequencing is done this way now
– Requires modified nucleotide
• 2’,3’-dideoxy NTP (ddNTP)
– DNA polymerase incorporates the ddNTP and chain elongation terminates (the ddNTP has no 3’-OH for the next nucleotide to attach to)
– Original method used 4 separate elongation reactions
– Products separated by denaturing PAGE and visualized by autoradiography
DNA sequence analysis (contd)
• Dideoxy sequencing (contd) – Sanger 1977
– Dideoxy NTPs present at ~1% of [dNTP]
– Each reaction has a known terminal base (e.g., every product of the ddA reaction ends in A)
– In principle, all possible chain lengths are represented
• distribution of lengths varies with [dNTPs], [ddNTPs], [primer] and [template] and their ratios
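The logic of the four reactions can be simulated directly. A minimal sketch, assuming the polymerase incorporates ddNTPs and dNTPs equally well (real enzymes discriminate), so wherever the next base matches a reaction's ddNTP the chain stops with probability r/(1 + r) for r = [ddNTP]/[dNTP] ≈ 1%. Reading bands from shortest to longest across the A, C, G and T lanes recovers the newly synthesized strand (complementary to the template); the strand below is made up:

```python
import random

def sanger_lanes(new_strand, dd_ratio=0.01, molecules=5000, seed=1):
    """Simulate the four separate chain-termination reactions.

    Returns {base: set of terminated fragment lengths}, i.e. the bands per lane.
    """
    rng = random.Random(seed)
    p_stop = dd_ratio / (1 + dd_ratio)
    lanes = {}
    for dd_base in "ACGT":
        bands = set()
        for _ in range(molecules):
            for length, base in enumerate(new_strand, start=1):
                if base == dd_base and rng.random() < p_stop:
                    bands.add(length)  # chain terminated by a ddNTP here
                    break
            # chains that never stop run off the end and give no band
        lanes[dd_base] = bands
    return lanes

def read_gel(lanes):
    """Read bands from shortest (bottom of gel) to longest (top)."""
    calls = {length: base for base, bands in lanes.items() for length in bands}
    return "".join(calls[length] for length in sorted(calls))

strand = "ATGGCTAGCTTACGGATCCA"  # hypothetical newly synthesized strand
print(read_gel(sanger_lanes(strand)))
```

Raising the ddNTP ratio shifts products toward short fragments; lowering it favors long ones, which is why the ratio controls read length.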
DNA sequence analysis (contd)
[Figure: dideoxy sequencing gel, lanes A, C, G, T]
Automated DNA sequence analysis
• How to improve throughput of sequencing?
– Incorporate fluorescent ddNTPs, separate products by PAGE
• Base-calling and lane-tracking issues
– Key advance was capillary sequencers
• Separate DNA in a thin capillary instead of gel
• Very accurate, no tracking errors, much more automation friendly
1. Trace files (dye signals) are analyzed and bases called to create chromatograms.
2. Chromatograms from opposite strands are reconciled with software to create double-stranded sequence data.
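Both steps can be sketched in a few lines. A minimal sketch, assuming peak positions have already been found and the two reads are already aligned position for position; the function names, the 2:1 signal-ratio threshold for a confident call, and the trace values are all hypothetical:

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def call_bases(trace, peaks, min_ratio=2.0):
    """Call one base per peak from four dye channels; ambiguous peaks become 'N'."""
    calls = []
    for i in peaks:
        ranked = sorted(((trace[b][i], b) for b in "ACGT"), reverse=True)
        (top, base), (second, _) = ranked[0], ranked[1]
        calls.append(base if top >= min_ratio * max(second, 1e-9) else "N")
    return "".join(calls)

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def reconcile(forward, reverse_read):
    """Merge a forward read with the reverse-complemented opposite-strand read.

    An 'N' in one read takes the other read's call; true conflicts stay 'N'.
    """
    merged = []
    for f, r in zip(forward, reverse_complement(reverse_read)):
        if f == r or r == "N":
            merged.append(f)
        elif f == "N":
            merged.append(r)
        else:
            merged.append("N")
    return "".join(merged)

trace = {"A": [90, 5, 3, 40], "C": [4, 80, 2, 35], "G": [3, 6, 70, 5], "T": [2, 4, 5, 6]}
fwd = call_bases(trace, [0, 1, 2, 3])
print(fwd, reconcile(fwd, "TCGT"))  # ACGN ACGA
```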
Automated DNA sequence analysis
• Capillaries vs gels
– Capillaries much faster – higher field strength possible
– Fully automated = higher throughput
Applied Biosystems PRISM 377 (Gel, 34-96 lanes)
Applied Biosystems PRISM 3700 (Capillary, 96 capillaries)
PCR – polymerase chain reaction amplification of DNA
• PCR is the most routinely used method to amplify DNA
– Exponential amplification of DNA by polymerases – Saiki et al., 1985
• 2ⁿ-fold amplification, n = # cycles
– 35 cycles = 2³⁵ ≈ 3.4 × 10¹⁰-fold
• Originally used DNA polymerase I
– Needed to add fresh enzyme at every cycle because heat denaturation of template killed the enzyme
– Not widely used – too painful to do manually
– Nobel Prize to Kary Mullis in 1993 for deciding to use Taq DNA polymerase for PCR
• He was middle author on paper!
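The 2ⁿ figure assumes every molecule is copied in every cycle. A minimal sketch; the efficiency parameter is an assumption added for illustration (1.0 means perfect doubling):

```python
def pcr_fold(cycles, efficiency=1.0):
    """Fold amplification after n cycles; efficiency=1.0 gives 2**n."""
    return (1 + efficiency) ** cycles

print(f"{pcr_fold(35):.2e}")       # 3.44e+10
print(f"{pcr_fold(35, 0.9):.2e}")  # assumed 90% efficiency per cycle
```

Even a modest drop in per-cycle efficiency compounds over 35 cycles, which is one reason real yields fall short of 2ⁿ.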
PCR – polymerase chain reaction amplification of DNA (contd)
Hot water bacteria:
Thermus aquaticus
Taq DNA polymerase
Life at High Temperatures by Thomas D. Brock
Biotechnology in Yellowstone
© 1994 Yellowstone Association for Natural Science
http://www.bact.wisc.edu/Bact303/b27
Cycle sequencing – fusion of PCR and fluorescent ddNTP sequencing
• http://www.dnalc.org/ddnalc/resources/animations.html
• Combine PCR amplification with dideoxy sequencing – cycle sequencing
– Linear amplification of template in the presence of fluorescent ddNTPs
– When nucleotides are used up, reaction is over
– Separate on capillary electrophoresis instrument
– Advantages
• Fast, single tube reaction
• Works with small amounts of starting material
– Disadvantages
• Still need to prepare high quality template to sequence
• Cost and time
– Many sequencing centers spend time, $$ on template prep
– Automation requirements
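Cycle sequencing is linear rather than exponential because it uses a single primer, so new products are never themselves copied. A minimal idealized sketch, assuming every template is copied once per cycle:

```python
def linear_products(template_copies, cycles):
    """Cycle sequencing: one primer, so each cycle copies only the original template."""
    return template_copies * cycles

def pcr_products(template_copies, cycles):
    """PCR: two primers, so products serve as templates and copies double each cycle."""
    return template_copies * 2 ** cycles

for n in (10, 25):
    print(n, linear_products(1, n), pcr_products(1, n))
```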
Isothermal amplification – the solution to template preparation
• How to make template preparation faster, easier and more reliable?
– Eliminate automation requirement, amplify starting material in some other way
– Φ29 DNA polymerase (the enzyme in the TempliPhi kit)
– http://www.gelifesciences.com/aptrix/upp01077.nsf/content/sample_preparation~product_selection_category~rolling_circle_amplification
– Enzyme has high processivity and strand displacement activity
• Isothermal reaction produces huge quantities of DNA from tiny amount of input
• More efficient than PCR (no temp change, no machine, no cleanup)
Modern DNA sequence analysis
• Cycle sequencing
– Virtually all DNA sequencing today is done by cycle sequencing with fluorescent ddNTPs
• ABI BigDye chemistry
– Template preparation still tedious for small scale
• TempliPhi used in genome centers (obviated need for most automation)
– Capillary sequencers predominant form of technology in use
• But next generation sequencing is already coming online and will rapidly displace old technology in genome centers.
– 454 sequencing (Roche)
– Solexa (Illumina)
– SOLiD (Applied Biosystems)
• 3rd generation sequencing (individual DNA molecule) now available
– e.g., Pacific Biosciences (sequence reads of 1,000-10,000 bases)
DNA sequence analysis
• Landmarks in DNA sequencing
– Sanger, F., Nicklen, S. & Coulson, A. R. DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74, 5463-5467 (1977).
– Sanger, F. et al. The nucleotide sequence of bacteriophage ΦX174. J. Mol. Biol. 125, 225-246 (1978).
– Sutcliffe, J. G. Complete nucleotide sequence of the Escherichia coli plasmid pBR322. Cold Spring Harb. Symp. Quant. Biol. 43, 77-90 (1979).
– Sanger, F. et al. Nucleotide sequence of bacteriophage lambda DNA. J. Mol. Biol. 162, 729-773 (1982).
– Messing, J., Crea, R. & Seeburg, P. H. A system for shotgun DNA sequencing. Nucleic Acids Res. 9, 309-321 (1981).
– Anderson, S. et al. Sequence and organization of the human mitochondrial genome. Nature 290, 457-465 (1981).
– Deininger, P. L. Random subcloning of sonicated DNA: application to shotgun DNA sequence analysis. Anal. Biochem. 129, 216-223 (1983).
– Baer, R. et al. DNA sequence and expression of the B95-8 Epstein-Barr virus genome. Nature 310, 207-211 (1984). (189 kb)
– Innis, M. A. et al. DNA sequencing with Taq DNA polymerase and direct sequencing of PCR-amplified DNA. Proc. Natl. Acad. Sci. USA 85, 9436-9440 (1988).
DNA sequence analysis (contd)
• Landmarks in DNA sequencing (contd).
– 1995 - Haemophilus influenzae (1.83 Mb)
• first bacterium sequenced, human pathogen
– 1995 - Mycoplasma genitalium (0.58 Mb)
• smallest free living organism
– 1996 - Saccharomyces cerevisiae genome (13 Mb)
– 1996 - Methanococcus jannaschii (1.66 Mb)
• first Archaebacterium
– 1997 - Escherichia coli (4.6 Mb)
– 1997 - Bacillus subtilis (4.2 Mb)
– 1997 - Borrelia burgdorferi (1.44 Mb)
• Lyme disease
– 1997 - Archaeoglobus fulgidus (2.18 Mb)
• first sulfur metabolizing bacterium
– 1997 - Helicobacter pylori (1.66 Mb)
• first bacterium proven to cause cancer
DNA sequence analysis (contd)
• Landmarks in DNA sequencing (contd)
– 1998 - Treponema pallidum (1.14 Mb)
– 1998 - Caenorhabditis elegans genome (97 Mb)
– 1999 - Deinococcus radiodurans (3.28 Mb)
• resistant to radiation, starvation, oxidative stress
– 2000 - Drosophila melanogaster (120 Mb)
– 2000 - Arabidopsis thaliana (115 Mb)
– 2001 - Escherichia coli O157:H7 (4.1 Mb)
• Pathogenic variant of E. coli
– 2001 – draft Human “genome”
– 2002 – mouse genome
– 2002 – Ciona intestinalis
• Primitive chordate
– 2003 – “complete” human genome
– 2004 – rat genome
– 2006 – Human “genome”, complete sequence of all chromosomes
– Many more genomes underway; check JGI, Sanger and other web sites
DNA Sequence analysis
• Complete DNA sequence (all nts both strands, no gaps)
– complete sequence is desirable but takes time
• how long depends on size and strategy employed
– which strategy to use depends on various factors
• how large is the clone?
– cDNA
– genomic
• How fast is sequence required?
• sequencing strategies
– primer walking
– cloning and sequencing of restriction fragments
– progressive deletions
• Bidirectional, unidirectional
– Shotgun sequencing
• whole genome
• with mapping
– map first (C. elegans)
– map as you go (many)
DNA Sequence analysis (contd)
• Primer walking - walk from the ends with oligonucleotides
– sequence, back up ~50 nt from end, make a primer and continue
• Why back up?
– Need to see overlap to be sure about the sequence you are reading
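The "back up ~50 nt" rule sets how far each round advances. A minimal sketch, assuming ~600 nt of usable sequence per read and that the first read comes from a vector primer; walking one strand of a hypothetical 2 kb cDNA then takes four reads and three custom oligos:

```python
def walking_primers(clone_len, read_len=600, back_up=50):
    """Start positions (bp) of successive reads when walking along one strand."""
    starts = [0]  # first read primed from the vector
    while starts[-1] + read_len < clone_len:
        # next primer sits back_up nt inside the end of the previous read
        starts.append(starts[-1] + read_len - back_up)
    return starts

starts = walking_primers(2000)
print(starts, "custom oligos:", len(starts) - 1)  # [0, 550, 1100, 1650] custom oligos: 3
```

Each new primer has to be designed from the previous read, so the rounds are strictly sequential; that is what makes walking slow for long clones.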
DNA Sequence analysis (contd)
• Primer walking (contd)
– advantages
• very simple
• no possibility of losing bits of DNA, unlike
– restriction fragment subcloning
– deletion methods
• no restriction map needed
• best choice for short DNA
– disadvantages
• slowest method
– about a week between sequencing runs
• oligos are not free (and not reusable)
• not feasible for large sequences
– applications
• cDNA sequencing when time is not critical
• targeted sequencing
– verification
– closing gaps in sequences
DNA Sequence analysis (contd)
• Cloning and sequencing of restriction fragments
– once the most popular method
• make a restriction map, subclone fragments
• sequence
– advantages
• straightforward
• directed approach
• can go quickly
• cloned fragments often useful otherwise
– RNase protection, nuclease mapping, in situ hybridization
– disadvantages
• possible to lose small fragments
– must run high quality analytical gels
• depends on quality of restriction map
– mistaken mapping -> wrong sequence
• limited by restriction site availability
– applications
• sequencing small cDNAs
• isolating regions to close gaps
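Planning subclones from a restriction map is mostly bookkeeping over fragment sizes. A minimal sketch, assuming ~600 nt reads from each end of every subclone; the 100 bp "easily lost" threshold, the clone length and the site positions are hypothetical:

```python
def plan_subclones(clone_len, site_positions, read_len=600, small_bp=100):
    """Split a clone at mapped sites and flag fragments that need attention."""
    bounds = [0] + sorted(site_positions) + [clone_len]
    plan = []
    for start, end in zip(bounds, bounds[1:]):
        size = end - start
        if size < small_bp:
            note = "small: easily lost, check on analytical gel"
        elif size > 2 * read_len:
            note = "too long for two end reads: needs internal sequence"
        else:
            note = "spanned by reads from both ends"
        plan.append((start, end, size, note))
    return plan

for row in plan_subclones(3000, [700, 760, 2100]):
    print(row)
```

The output makes both listed disadvantages concrete: the 60 bp fragment is the kind that disappears from a gel, and a wrong site position would silently shift every subclone boundary.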
DNA Sequence analysis (contd)
• nested deletion strategies - sequential deletions from one end of the clone
– cut, close and sequence
• Approach
– make restriction map
– use enzymes that cut in polylinker and insert
– Religate, then sequence from the vector primer next to the polylinker site
– repeat until finished, filling in gaps with oligos
• advantages
– Fast, simple, efficient
• disadvantages
– limited by restriction site availability in vector and insert
– need to make a restriction map
DNA Sequence analysis (contd)
• nested deletion strategies (contd)
– Exonuclease III-mediated deletion
• cut with a polylinker enzyme
– protect this end from ExoIII with » a 3’ overhang (ExoIII does not attack 3’ overhangs of 4 or more nt)
» phosphorothioate nucleotides
• cut with a second enzyme between the first cut and the insert
– can’t leave a 3’ overhang (needs a 5’ overhang or blunt end, which ExoIII can digest)
• timed digestions with Exonuclease III
• stop reactions, blunt ends
• ligate and size select recombinants
• sequence
• advantages
– unidirectional
– processivity of enzyme gives nested deletions
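Because deletion length grows with digestion time, stopping aliquots at regular intervals yields the nested set. A minimal sketch, assuming a constant digestion rate; the 400 nt/min figure is a hypothetical placeholder, since the real rate depends on temperature and salt and should be calibrated:

```python
def exo_time_points(insert_bp, step_bp, rate_nt_per_min):
    """Minutes at which to stop ExoIII aliquots to get deletions every step_bp.

    rate_nt_per_min is an assumed, condition-dependent digestion rate.
    """
    n_steps = insert_bp // step_bp
    return [round(k * step_bp / rate_nt_per_min, 2) for k in range(1, n_steps + 1)]

# Hypothetical: 3 kb insert, deletions every ~400 bp so 600 nt reads overlap
print(exo_time_points(3000, 400, 400))
```

Choosing a step smaller than the read length is what guarantees overlap between reads from successive deletion clones.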
DNA Sequence analysis (contd)
• Nested deletion strategies
– Exonuclease III-mediated deletion (contd)
• disadvantages
– need two unique restriction sites in the polylinker between the sequencing primer site and the insert
– best used successively to get > 10 kb total deletions
– may not get complete overlaps of sequences
» fill in with restriction fragments or oligos
• applications
– method of choice for moderate size sequencing projects
» cDNAs
» genomic clones
– good for closing larger gaps
• Small-scale sequence analysis – how is it practiced today?
– Primer walking
– ExoIII-mediated deletion with primer walking
Genome sequencing
• The problem
– Genome sizes for most eukaryotes are large (10⁸–10⁹ bp)
– High quality sequences only about 600-800 bp per run
• The solution
– Break genome into lots of bits and sequence them all
– Reassemble with computer
• The benefit
– Rapid increase in information about genome size, gene comparisons, etc
• The cost
– 3 × 10⁹ bp (human haploid genome) ÷ 600 bp/reaction = 5 × 10⁶ reactions for 1x coverage!
– Need both strands (x2), need overlaps and need to be sure of sequences
– ~10⁷–10⁸ reactions/runs required for a human-sized genome
– About $1-2 per reaction these days, ~$8 commercially.
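The reaction count scales linearly with coverage, while the fraction of the genome never read falls off exponentially. A minimal sketch using the Lander-Waterman (Poisson) estimate, which assumes reads start uniformly at random across the genome, an assumption real libraries violate:

```python
import math

def reactions_needed(genome_bp, read_bp, coverage):
    """Number of sequencing reactions for a given fold coverage."""
    return coverage * genome_bp / read_bp

def fraction_unsequenced(coverage):
    """Poisson estimate of the fraction of bases never read."""
    return math.exp(-coverage)

for c in (1, 4, 8, 10):
    print(f"{c:>2}x: {reactions_needed(3e9, 600, c):.1e} reactions, "
          f"{fraction_unsequenced(c):.4%} of genome unread")
```

At 1x about 37% of the genome is never read; around 8x that falls below 0.1%, which is why projects need ~10⁷ or more reactions.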
Genome sequencing (contd)
• Shotgun sequencing NOT invented by Craig Venter
– Messing 1981 first description of shotgun sequencing
– Sanger lab developed current methods in 1983
– approach
• blast genome into small chunks
• clone these chunks
– 3-5 kb and 8 kb plasmid libraries
– 40 kb fosmid libraries to jump repetitive sequences
• sequence + assemble by computer
– A priori difficulties
• how to get nice uniform distribution
• how to assemble fragments
• what to do about repeats?
• How to minimize sequence redundancy?
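The "sequence + assemble by computer" step can be illustrated with a toy greedy assembler that repeatedly merges the two fragments with the longest suffix-prefix overlap. This is a teaching sketch, not how production assemblers work, and it assumes error-free reads; repeats longer than a read will mislead it, which is exactly the repeat problem listed above. The genome and reads are made up:

```python
from itertools import permutations

def overlap(a, b, min_len):
    """Length of the longest suffix of a matching a prefix of b (0 if none >= min_len)."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the overlap in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1

def greedy_assemble(reads, min_len=5):
    """Merge the best-overlapping pair until no overlaps remain; returns contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]  # drop contained reads
    while len(reads) > 1:
        best_len, best_a, best_b = 0, None, None
        for a, b in permutations(reads, 2):
            olen = overlap(a, b, min_len)
            if olen > best_len:
                best_len, best_a, best_b = olen, a, b
        if best_len == 0:
            break  # remaining reads stay as separate contigs
        reads.remove(best_a)
        reads.remove(best_b)
        reads.append(best_a + best_b[best_len:])
    return reads

genome = "AAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCC"  # hypothetical "genome"
reads = [genome[i:i + 12] for i in range(0, 25, 6)]  # 12 nt reads, 6 nt overlaps
print(greedy_assemble(reads))
```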
Genome sequencing (contd)