Scotty Merrell
B4137
295-1584
[email protected]
Structural Genomics
(What is Genomics?)
Genomics defined:
"the study of functions and interactions
of all the genes in the genome, including
their interactions with environmental factors"
Has led to a new scientific vocabulary:
transcriptome, proteome, secretome, virulome, metabolome
Why the huge interest in Genomics?
Provides comprehensive list of genes (and the proteins they encode) for the entire organism
Provides a starting point from which a genome-wide understanding of systems
and networks can be initiated
Provides a global picture of genome organization
Allows for the identification of gene families, their distribution between phylogenetic
lineages, and permits insight into gene and genomic evolution on an unprecedented scale
Permits comparison of the global genetic composition of different organisms that
occupy the same niche/different niches
Provides an inventory of genes required for housekeeping functions
- Understanding differences in the genetic basis of these functions in different
phylogenetic lineages is central to understanding life itself
Practical applications of data generated by genomics:
Comprehensive study of microbial pathogenesis and the interaction
between pathogens and their hosts
Identification of sensitive and specific molecular targets suitable for microbial
identification, typing, and for use as markers of anti-microbial resistance
Discovery of microbial molecular markers associated with substantial
variance in the risk and severity of disease
Selection of potential candidates for the rational development of new
therapeutic agents and vaccines
Identification of genes encoding systems that are unique to bacteria or a
particular pathogen
The creation of the field of Genomics
was made possible by the development of new
technologies for sequencing entire genomes.
The old way, aka
“back when I was a kid”
Primer,
Nucleotides
Polymerase
Radionucleotides
Dideoxy termination system:
While the DNA polymerase will add a
dideoxynucleotide complementary to the template strand,
it cannot further extend that product after the addition of
a dideoxynucleotide. This biochemistry is used to produce
populations of products specifically terminated at either
A, G, C or T residues. These are labeled in some way and
visualized after separation by electrophoresis.
This figure shows the structure of a dideoxynucleotide (notice the H atom attached
to the 3' carbon in place of the usual hydroxyl group). Also depicted in this figure
are the ingredients for a Sanger reaction. Notice the different lengths of labeled
strands produced in this reaction.
dATP
One method for labeling is to use radioactive
nucleotides (32P, 33P or 35S) to label the
oligonucleotide primer. Four reactions are
performed (one each for A, G, C and T), and
electrophoresed side by side in a denaturing
polyacrylamide gel. The products are separated
by size at single-base resolution and the sequence
is read from the pattern of bands on the gel.
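A toy sketch of how a four-lane gel is read, assuming an idealized reaction in which every possible terminated product is present and resolved. The function names `sanger_lanes` and `read_gel` are ours, made up for illustration; the template is taken to be written 3' to 5' so the new strand reads 5' to 3'.

```python
def sanger_lanes(template):
    """Return {base: fragment lengths} for the four ddNTP reactions.

    Toy model: the new strand is the complement of `template`. A product of
    length n appears in lane X whenever position n of the new strand is X,
    because a ddX incorporated there stops further extension.
    """
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    new_strand = "".join(complement[b] for b in template)
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read bands from shortest to longest across the four lanes."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = sanger_lanes("TACGGTCA")
print(lanes)            # fragment lengths per lane
print(read_gel(lanes))  # sequence of the newly synthesized strand
```

Each lane only tells you where one base occurs; the sequence comes from ordering all bands by size across the four lanes.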
GATC GATC
Today: The availability of multiple dyes with different
emission spectra led to the development of the
four-dye, one-lane system. Four aliquots of primer
end-labeled with the four different dyes are used to
perform the A,G,C and T reactions. These are pooled
and run in a single lane of a gel. The sequencer reads
the gel by using a spectrophotometer to distinguish
between the different dye spectra, and thus the different bases.
This system has been further improved by the development
of dye-labeled terminators (dideoxynucleotides) that will
simultaneously terminate and fluorescently tag a product.
These reactions can be performed in a single tube, and run
in a single lane. Currently, the four-dye systems can routinely
read >600 bases/lane, and the four-lane one-dye systems can
read over 1 kb per reaction.
The two newest sequencing techniques include:
Pyrosequencing is a method of DNA sequencing based on the "sequencing by synthesis"
principle developed initially by Mostafa Ronaghi and co-workers in the late 1990s, then further
by Biotage. The method is based on a chemiluminescent enzymatic reaction, which is triggered
when a molecular recognition event occurs. Essentially, the method allows sequencing of a
single strand of DNA by synthesizing the complementary strand along it. Each time a nucleotide,
A, C, G or T is incorporated into the growing chain a cascade of enzymatic reactions is triggered
which results in a light signal.
454 Sequencing is a massively-parallel sequencing-by-synthesis (SBS) system capable of
sequencing roughly 20 megabases of raw DNA sequence per 4.5-hour run of their current
sequencing machine, the GS20. The system relies on fixing nebulized and adapter-ligated DNA
fragments to small DNA-capture beads in a water-in-oil emulsion. The DNA fixed to these beads
is then amplified by PCR. Finally, each DNA-bound bead is placed into a ~44 μm well on a
PicoTiterPlate, a fiber optic chip. A mix of enzymes such as polymerase, sulfurylase, and
luciferase is also packed into the well. The PicoTiterPlate is then placed into the GS20 for
sequencing. At this stage, the four nucleotides (TAGC) are washed in series over the
PicoTiterPlate. During the nucleotide flow, each of the hundreds of thousands of beads with
millions of copies of DNA is sequenced in parallel.
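A minimal sketch of the sequencing-by-synthesis read-out for a single bead, assuming an idealized light signal that is exactly proportional to the number of bases incorporated in each flow. The names `flowgram` and `call_bases` are hypothetical, and `read` stands for the strand being synthesized; real instruments have to deal with noisy signals, especially in long homopolymer runs.

```python
def flowgram(read, flow_order="TAGC", max_flows=400):
    """Simulate nucleotide flows washed in series over one bead.

    Returns a list of (nucleotide, signal) pairs, where signal is the number
    of consecutive bases incorporated during that flow (0 means no light).
    """
    signals = []
    pos = 0
    flow = 0
    while pos < len(read) and flow < max_flows:
        base = flow_order[flow % len(flow_order)]
        run = 0
        # The polymerase keeps extending while the next base matches the flow.
        while pos < len(read) and read[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
        flow += 1
    return signals


def call_bases(signals):
    """Turn flow signals back into a sequence."""
    return "".join(base * run for base, run in signals)


signals = flowgram("TTAGGGC")
print(signals)
print(call_bases(signals))
```

The point of the sketch: the order of bases comes from the order of flows, and homopolymer length comes from signal intensity.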
So you want to sequence your favorite bug:
How would you do this?
(what do you need?)
Shotgun Sequencing
The shotgun part comes from the way the
clone is prepared for sequencing: it is
randomly sheared into small pieces (usually
about 1 kb) and subcloned into a "universal"
cloning vector. The library of subfragments is
sampled at random, and a number of sequence
reads generated (using a universal primer
directing sequencing from within the cloning
vector). These sequence reads are then
assembled into contigs, and the complete
sequence of the clone generated.
Genomic DNA is sheared or restricted to
yield random fragments of the required size.
The fragments are cloned in a universal vector.
Sequencing reactions are performed with a
universal primer on a random selection of
the clones in the shotgun library.
These sequencing reads are assembled into
contigs, identifying gaps (where there is no
sequence available) and single-stranded regions
(where there is sequence for only one strand).
The gaps and single-stranded regions are then
targeted for sequencing to produce the full
sequenced molecule.
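The assembly step can be illustrated with a toy greedy overlap merger. This is a sketch under strong assumptions (error-free reads, one strand only, no repeats), not how a production assembler works; `overlap` and `greedy_assemble` are names invented here.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that is a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    # Drop reads that are fully contained in another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # remaining contigs are separated by gaps
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


print(greedy_assemble(["ATGGCGTA", "GCGTACGT", "ACGTTAGC"]))
print(greedy_assemble(["ATGGCG", "TTAGCC"]))  # no overlap: two contigs, one gap
```

When more than one contig comes back, the reads did not cover the whole molecule, which is exactly the gap-closing problem described above.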
Where we are today:
The Comprehensive Microbial Resource (CMR)
contains 401 organisms:
384 completed genomes, 17 incomplete;
28 Archaea, 3 Viruses and 353 Bacteria.
+
Human, Mice, some plants etc.
http://www.tigr.org/tigr-scripts/CMR2/CMRHomePage.spl
Requirements for this?
Culturability vs Nonculturability
What about microbes that we
don’t know how to grow?
Microbial Diversity:
Venter, J.C. et al. Environmental Genome
Shotgun Sequencing of the Sargasso Sea.
Published online in Science March 4, 2004.
(this was done without culturing the bacteria, etc.)
In the Sargasso Sea, they found 1800 species
of microbes, including 150 new species of
bacteria, and over 1.2 million new genes.
Although they don’t know what most of
these genes do, the research is a first step
to understanding more about life in the
Sargasso Sea and the larger ocean.
It also highlights the fact that we know
relatively little about microbial diversity.
It’s estimated that we’ve been able to culture
less than 1% of microbes.
More requirements for
shotgun sequencing?
Gene must be clonable!
What about genes that are toxic?
Once sequence completed, what now?
ATGAAAAGATTAGAAACTTTGGAATCCATTTTAGAGCGC
TTGAGAATGTCTATCAAAAAAAACGGACTCAAAAATTCA
AAACAGAGAGAAGAAGTGGTGAGCGTTTTGTATCGCAGC
GGCACACACCTAAGCCCTGAAGAAATCACGCATTCTATC
CGCCAAAAGGACAAAAACACTAGCATTTCTTCAGTCTAT
CGCATTTTGAATTTCTTAGAAAAAGAAAATTTTATCTGT
GTTTTAGAAACTTCAAAAAGCGGTCGGCGCTATGAAATT
GCGGCTAAAGAACACCATGATCACATCATTTGTTTGCAT
TGCGGTAAGATCATTGAATTTGCAGACCCTGAAATTGAA
AACCGCCAGAATGAAGTCGTTAAAAAATATCAAGCCAAG
CTGATTAGCCATGACATGAAAATGTTTGTGTGGTGTAAA
GAATGCCAAGAGAGTGAATGTTAA
Annotation and assigning gene function based on homology
1. Finding ORFs
(frame, start, stop)
http://www.tigr.org/tigr-scripts/CMR2/GenePage.spl?locus=HP1027
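A minimal sketch of ORF finding by frame, start and stop. It assumes ATG is the only start codon (bacteria also use alternatives such as GTG and TTG) and scans all six frames; `find_orfs` and `reverse_complement` are illustrative names, not part of any annotation pipeline. The sequence on the previous slide begins with ATG and ends with TAA, which is the pattern the scan looks for.

```python
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=3):
    """Return (strand, frame, start, end, orf) tuples.

    Coordinates are 0-based on the strand being scanned; `end` is exclusive
    and includes the stop codon. `min_codons` counts the stop codon too.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon opens the ORF
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs


for orf in find_orfs("CCATGAAATTTTAACC"):
    print(orf)
```

In a real genome the minimum length would be set much higher to cut down on spurious short ORFs.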
You’ve found an ORF---Now what?
Look for homologs (hopefully with a known function)
Tool: BLAST (http://www.ncbi.nlm.nih.gov/BLAST/)
(blastn---nucleotide vs nucleotide
blastp---protein vs protein
blastx---translated query vs protein
cdart---shows conserved domains
etc…)
Steps in the Blast algorithm (Blastp)
1. The query sequence is filtered to remove low-complexity regions.
2. A list of words of length 3 in the query protein sequence is made (length 11-12 for DNA sequences).
3. Each word is evaluated for matches with every other combination of 3 amino acids, using the BLOSUM62 scoring matrix by
default. Matches of PQG to PEG would score 15, to PRG 14, to PSG 13 and to PQA 12.
4. For DNA words, a match score of +5 and a mismatch score of -4 are used, corresponding to the changes expected in
sequences separated by a PAM distance of 40.
5. A cutoff score T, called the neighborhood word score threshold, is selected to reduce the number of matches.
6. The above procedure is repeated for each 3-letter word in the query sequence. For a sequence of length 250 amino acids,
the total number of words to search for is approximately 50 x 250 = 12,500.
7. The words are organized into an efficient search tree for comparing them rapidly to the database sequences.
8. Each database sequence is scanned for an exact match to one of the 50 high-scoring amino acid words corresponding to
the first query sequence position.
9. In BLAST 2 or gapped BLAST, short matched regions called HSPs (high-scoring segment pairs) lying on the same diagonal
and within a certain distance of each other are extended in each direction as long as the score keeps rising.
10. HSPs with a score greater than a cutoff score S are kept.
11. In earlier versions of BLAST and some of the later ones, the statistical significance of each HSP score is determined, and if
two or more HSP regions are found, thereby providing additional evidence that the query and database sequences are
related, these scores are combined to form a combined score.
12. In BLAST 2, a local gapped alignment of the sequences is made and the significance of the score is determined.
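The word-scoring step (step 3) and the threshold T (step 5) can be checked with a few lines of code. This is a sketch only: the dictionary holds just the handful of BLOSUM62 entries needed for the PQG example, typed in by hand from the standard matrix, and the threshold of 13 is an arbitrary illustrative value rather than the program default. All function names are ours.

```python
# Hand-copied subset of BLOSUM62 (symmetric), enough for the PQG example only.
BLOSUM62_SUBSET = {
    ("P", "P"): 7, ("Q", "Q"): 5, ("G", "G"): 6,
    ("Q", "E"): 2, ("Q", "R"): 1, ("Q", "S"): 0, ("G", "A"): 0,
}


def pair_score(a, b):
    """Look up a substitution score; raises KeyError outside the subset."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]


def word_score(query_word, candidate):
    """Sum of position-by-position substitution scores for two words."""
    return sum(pair_score(a, b) for a, b in zip(query_word, candidate))


def query_words(seq, k=3):
    """All overlapping words of length k in the query (step 2)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def neighborhood(query_word, candidates, threshold):
    """Keep candidate words scoring at least the threshold T (step 5)."""
    return [w for w in candidates if word_score(query_word, w) >= threshold]


for w in ["PQG", "PEG", "PRG", "PSG", "PQA"]:
    print(w, word_score("PQG", w))
print(neighborhood("PQG", ["PQG", "PEG", "PRG", "PSG", "PQA"], threshold=13))
```

Raising T shrinks the neighborhood and speeds up the search at the cost of sensitivity; lowering it does the opposite.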
blast output
Putative conserved domains have been detected, click on the image below for detailed results
                                                                 Score     E
Sequences producing significant alignments:                      (bits)  Value
gi|16766184|ref|NP_461799.1| (NC_003197) protein tyrosine p...   1068   0.0
gi|16761655|ref|NP_457272.1| (NC_003198) tyrosine phosphata...    992   0.0
gi|13096377|pdb|1G4U|S Chain S, Crystal Structure Of The Sa...    761   0.0
gi|16974849|pdb|1JYO|E Chain E, Structure Of The Salmonella...    207   2e-52
gi|809148|pdb|1YPT|B Chain B, Protein-Tyrosine Phosphatase ...     95   1e-18
gi|1943402|pdb|1YTW|  Yersinia Ptpase Complexed With Tungst...     95   1e-18
gi|1353120|sp|P08538|YOPH_YERPS PROTEIN-TYROSINE PHOSPHATAS...     95   2e-18
gi|10955583|ref|NP_052424.1| (NC_002120) Yop effector YopH ...     95   2e-18
gi|14579369|gb|AAK69246.1|AF336309_41 (AF336309) Yop effect...     95   2e-18
gi|16082755|ref|NP_395201.1| (NC_003131) putative protein-t...     94   2e-18
gi|79206|pir||S01054 virulence protein Yop2b - Yersinia pse...     94   2e-18
gi|1065228|pdb|1YTS|  Molecule: Yersinia Protein Tyrosine P...     91   3e-17
gi|464498|sp|P34137|PTP1_DICDI PROTEIN-TYROSINE PHOSPHATASE...     62   2e-08
gi|348540|pir||A44267 protein-tyrosine-phosphatase (EC 3.1....     62   2e-08
gi|15077066|gb|AAK83052.1|AF288366_2 (AF288366) ADP-ribosyl...     58   2e-07
gi|16082697|ref|NP_395143.1| (NC_003131) putative outer mem...     57   3e-07
gi|10955586|ref|NP_052427.1| (NC_002120) Yop effector YopE ...     57   3e-07
gi|141105|sp|P08008|YOPE_YERPS OUTER MEMBRANE VIRULENCE PRO...     57   5e-07
gi|155548|gb|AAA27674.1| (M34280) virulence determinant (yo...     56   9e-07
gi|5572701|dbj|BAA82559.1| (AB019126) sPTPR2B [Ephydatia fl...     49   1e-04
gi|2120612|pir||JC6026 ADP-ribosyltransferase (EC 2.4.2.-) ...     49   1e-04

gi|809147|pdb|1YPT|A Chain A, Protein-Tyrosine Phosphatase (Yersinia) (E.C.3.1.3.48)
(Yop51,Pasteurella X,Ptpase,Yop51delta162) (Catalytic
Domain, Residues 163 - 468) Mutant With Cys 235 Replaced
By Arg (C235r)
Length = 305
Score = 95.1 bits (235), Expect = 1e-18
Identities = 66/212 (31%), Positives = 103/212 (48%), Gaps = 17/212 (8%)

Query: 340 GKPVALAGSYPKNTPDALEAHMKMLLEKECSCLVVLTSEDQMQAKQ--LPPYFRGSYTFG 397
           G    +A  YP  +   LE+H +ML E     L VL S  ++  ++  +P YFR S T+G
Sbjct: 89  GNTRTIACQYPLQS--QLESHFRMLAENRTPVLAVLASSSEIANQRFGMPDYFRQSGTYG 146

Query: 398 EVHTNSQKVSSASQGEAI--DQYNMQL-SCGEKRYTIPVLHVKNWPDHQPLPS--TDQLE 452
            +   S+       G+ I  D Y + +   G+K  ++PV+HV NWPD   + S  T  L
Sbjct: 147 SITVESKMTQQVGLGDGIMADMYTLTIREAGQKTISVPVVHVGNWPDQTAVSSEVTKALA 206

Query: 453 YLADRVKNSNQN-----GAPGRSSSDKHLPMIHCLGGVGRTGTMAAALVLKDNPHSNL-- 505
            L D+   + +N     G+   +   K  P+IHC  GVGRT  +  A+ + D+ +S L
Sbjct: 207 SLVDQTAETKRNMYESKGSSAVADDSKLRPVIHCRAGVGRTAQLIGAMCMNDSRNSQLSV 266

Query: 506 EQVRADFRDSRNNRMLEDASQF-VQLKAMQAQ 536
           E + +  R  RN  M++   Q  V +K  + Q
Sbjct: 267 EDMVSQMRVQRNGIMVQKDEQLDVLIKLAEGQ 298
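The "Identities" and "Gaps" figures in the alignment header can be recomputed directly from the Query and Sbjct lines. A small sketch, with the four alignment blocks from the output above concatenated by hand; `alignment_stats` is a name made up for this example. "Positives" is left out because it needs the full scoring matrix.

```python
# Query and Sbjct rows of the 1YPT alignment, blocks joined in order.
QUERY = (
    "GKPVALAGSYPKNTPDALEAHMKMLLEKECSCLVVLTSEDQMQAKQ--LPPYFRGSYTFG"
    "EVHTNSQKVSSASQGEAI--DQYNMQL-SCGEKRYTIPVLHVKNWPDHQPLPS--TDQLE"
    "YLADRVKNSNQN-----GAPGRSSSDKHLPMIHCLGGVGRTGTMAAALVLKDNPHSNL--"
    "EQVRADFRDSRNNRMLEDASQF-VQLKAMQAQ"
)
SBJCT = (
    "GNTRTIACQYPLQS--QLESHFRMLAENRTPVLAVLASSSEIANQRFGMPDYFRQSGTYG"
    "SITVESKMTQQVGLGDGIMADMYTLTIREAGQKTISVPVVHVGNWPDQTAVSSEVTKALA"
    "SLVDQTAETKRNMYESKGSSAVADDSKLRPVIHCRAGVGRTAQLIGAMCMNDSRNSQLSV"
    "EDMVSQMRVQRNGIMVQKDEQLDVLIKLAEGQ"
)


def alignment_stats(query, sbjct):
    """Return (identities, gaps, alignment_length) for two aligned rows."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    identities = sum(q == s and q != "-" for q, s in zip(query, sbjct))
    gaps = sum(q == "-" or s == "-" for q, s in zip(query, sbjct))
    return identities, gaps, len(query)


identities, gaps, length = alignment_stats(QUERY, SBJCT)
print(f"Identities = {identities}/{length} ({identities / length:.0%})")
print(f"Gaps = {gaps}/{length} ({gaps / length:.0%})")
```

Note that the denominator is the alignment length including gap columns, not the length of either protein.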
This procedure is conducted for every
ORF in a newly sequenced genome and
all the putative genes get sorted into
different functional groups.
#    Gene Role                                                     # of Genes   % of 1586 Genes
1    Amino acid biosynthesis                                        42           2.64%
2    Biosynthesis of cofactors, prosthetic groups, and carriers     57           3.59%
3    Cell envelope                                                  102          6.43%
4    Cellular processes                                             125          7.88%
5    Central intermediary metabolism                                24           1.51%
6    DNA metabolism                                                 90           5.67%
7    Energy metabolism                                              99           6.24%
8    Fatty acid and phospholipid metabolism                         25           1.57%
9    Hypothetical proteins - Conserved                              185          11.6%
10   Hypothetical proteins                                          495          31.2%
11   Mobile and extrachromosomal element functions                  17           1.07%
12   Protein fate                                                   42           2.64%
13   Protein synthesis                                              98           6.17%
14   Purines, pyrimidines, nucleosides, and nucleotides             38           2.39%
15   Regulatory functions                                           25           1.57%
16   Transcription                                                  10           0.63%
17   Transport and binding proteins                                 88           5.54%
18   Unknown function                                               24           1.51%
This can be represented diagrammatically
The two V. cholerae
chromosomes
Circular representation of the V. cholerae genome.
The two chromosomes, large and small, are depicted.
From the outside inward: the first and second circles
show predicted protein-coding regions on the plus
and minus strand, by role, according to the color
code in Fig. 1 (unknown and hypothetical proteins
are in black). The third circle shows recently
duplicated genes on the same chromosome (black)
and on different chromosomes (green). The fourth
circle shows transposon-related (black), phage-related (blue), VCRs (pink) and pathogenesis genes
(red). The fifth circle shows regions with significant
χ2 values for trinucleotide composition in a 2,000-bp
window. The sixth circle shows percentage G+C in
relation to mean G+C for the chromosome. The
seventh and eighth circles are tRNAs and rRNAs,
respectively.
The ability to sequence the entire genome of an organism
has fueled a revolution in science
• Genomics provides a huge amount of data
• Vast sequence data has fueled new, large-scale, high-throughput technologies
• New technologies are revolutionizing (for better or worse) experimental strategies
• Experiments are commonly designed to examine an organism's phenotype on a
genome-wide or system-wide scale (holistic operation of biological systems)
• This approach will influence the way biological questions are phrased:
• From “What is the function of this protein?” to “What role does the sequence play in
one or more biological processes operational under X conditions?”
•Old method: phenotype to genotype
•New method: genotype to phenotype
Properties of an ideal gene classification system:
•Group genes together that share a common ancestor
•Provide scaffolding for study of distribution of genes between organisms, distant phylogenetic lineages
* has practical application (identify genes encoding biochemical pathways unique to bacteria)
•Provide rapid functional annotation framework for new genome sequences
Ancestral gene X1
Bug A
X1
Genes X1 in A and B are orthologs
Genes encode same function
Gene duplication
X1 and X2
Bug B
X1
X1 and X2 are paralogs
Paralogs free to evolve new functions
Note: X1 in A and B are also homologs: A gene similar in structure and
evolutionary origin to a gene in another species
Types of structural information you can get from
Genomics and annotation
Helicobacter pylori 26695: Pseudo-2D Gel
%GC
(why might this be important/interesting)
Hydrophobicity
The GES scale is used to identify nonpolar transbilayer helices.
The curve is the average of a residue-specific hydrophobicity
scale over a window of 20 residues. When the line is in the upper
half of the frame (positive), it indicates a hydrophobic region and
when it is in the lower half (negative), a hydrophilic region.
In the graph below the X-axis represents the length of the protein
in amino acids (aa), while the Y-axis represents the GES score.
The blue line shows the GES pattern of the entire protein, while
the two dashed red lines represent the putative (lower line) and
certain (upper line) cutoffs for potential membrane spanning domains.
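The sliding-window averaging behind such a plot is easy to sketch. Note the substitution: the code below uses the Kyte-Doolittle hydropathy scale rather than the GES scale, because those are the values we can state with confidence; the windowing logic is the same for any residue-specific scale. The cutoff passed in the example is an illustrative value, not the "putative" or "certain" cutoff from the slide, and the function names are ours.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
# Swapped in for GES purely for illustration.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=20, scale=KYTE_DOOLITTLE):
    """Average the per-residue scale over every window of `window` residues."""
    values = [scale[aa] for aa in protein]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def hydrophobic_windows(protein, cutoff, window=20, scale=KYTE_DOOLITTLE):
    """Start positions (0-based) of windows whose average reaches the cutoff."""
    profile = hydropathy_profile(protein, window, scale)
    return [i for i, v in enumerate(profile) if v >= cutoff]


# Toy protein: a 20-residue leucine stretch flanked by charged residues.
toy = "K" * 10 + "L" * 20 + "D" * 10
profile = hydropathy_profile(toy)
print(profile.index(max(profile)))           # window centered on the Leu run
print(hydrophobic_windows(toy, cutoff=1.6))  # candidate membrane-spanning windows
```

A window of about 20 residues is used because that is roughly the length of a helix spanning the bilayer.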
Predicted Secondary Structure
Genome Region comparison
Sequenced organisms have large
differences in the size of their genome,
the number of genes encoded therein,
and the constitution of the ORFs that are coded for.
Genome size is affected by environment
• The wide range in genome sizes within a single phylogenetic lineage suggests that these genomes
are dynamic and in constant flux
•Because the vast majority of a bacterial chromosome consists of coding sequences, changes in
genome size reflect differences in gene content. The variation among bacterial genome sizes,
ranging from 0.6 to 9 Mb, reflects differences in biochemical capabilities and, hence, in the range
of environments available to particular microbial lineages.
•What is the source of genome variability?
•Insight into mechanisms of genome evolution provided by genomics
Principles and features of Horizontal Gene Transfer (HGT)
•First recognized in multi-drug resistant pathogenic bacteria
•HGT is the non-vertical transmission of genetic material
•Mechanisms of HGT
* Transduction
* Transformation
* Conjugation
•Maintenance of HT loci
* Episomal replication
* Homologous recombination
* Illegitimate recombination
* Integration (catalyzed by phage and IS element integrases/resolvases)
•Features of HGT:
* HT loci have limited distribution within a single phylogenetic lineage
* HT loci encode phenotypes associated with unrelated species
* Sequence composition of HT loci is most similar to the composition
of the donor genome
- AT richness
- codon usage
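The composition argument can be turned into a simple scan: compare the G+C fraction of each window with the genome-wide mean and flag the outliers. This is a sketch, assuming a fixed non-overlapping 2,000-bp window (the window size used in the V. cholerae figure) and an arbitrary deviation threshold of 0.10; `gc_fraction` and `atypical_gc_windows` are names invented here, and atypical composition is only a hint of horizontal transfer, not proof.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_gc_windows(genome, window=2000, step=2000, max_deviation=0.10):
    """Return (mean_gc, [(start, window_gc), ...]) for windows far from the mean."""
    mean_gc = gc_fraction(genome)
    flagged = []
    for start in range(0, len(genome) - window + 1, step):
        gc = gc_fraction(genome[start:start + window])
        if abs(gc - mean_gc) > max_deviation:
            flagged.append((start, round(gc, 3)))
    return mean_gc, flagged


# Toy genome: 50% G+C "native" DNA with a 2-kb AT-rich insert at position 4000.
native = "ACGT" * 2500
island = "AAAT" * 500
genome = native[:4000] + island + native[4000:]

mean_gc, flagged = atypical_gc_windows(genome)
print(round(mean_gc, 3))
print(flagged)
```

The same loop works for codon usage or trinucleotide composition if `gc_fraction` is replaced by the corresponding statistic.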
V. cholerae virulence gene expression
[Figure: regulatory diagram. Labels: OmpU; outer membrane, periplasm, inner membrane, cytoplasm;
pH, salt, temperature, amino acids, bile, CO2; ToxR/ToxS, TcpP/TcpH, ToxT, CRP;
ompU, ompT, toxT, TCP genes, ACF genes, ctxAB; VPI, pathogenicity island; CTX phage]
HGT is very common
Fig. 1. Distribution of horizontally transferred DNA
in the E. coli MG1655 chromosome. Within each
centisome, each bar denotes a continuous segment of
transferred DNA containing one or more ORFs; and
the length of each bar represents its size rounded to
the nearest 500 bp. Features of transferred regions,
such as duration in the chromosome and the
identification of repeated and mobile elements,
follow the notation presented in the key. The age of
each continuous segment
of DNA was inferred from the ages of genes
successfully analyzed by back-amelioration;
segments lacking genes of known age are shown in
black, and no segment comprised genes with
significantly different ages. Positions of the
replication origin (oriC) and terminus (terC), as well
as the identity of the specific tRNA loci found
to be adjacent to a horizontally transferred region, are
noted on the left of the open bar representing the
MG1655 chromosome. The nomenclature for phage
and IS elements, and for genes of known function
contained within a particular transferred segment, are
shown within the corresponding bar. The identities of
insertion sequences are noted except as follows:
adjacent IS911/(fragment)/IS3 are located within
minute 5; adjacent IS3/IS600 are located within
minute 8; and adjacent IS2/IS30 are located within
minute 31.
Figure 2 Distribution of horizontally acquired (foreign) DNA in sequenced bacterial genomes. Lengths of
bars denote the amount of protein-coding DNA. For each bar, the native DNA is blue; foreign DNA
identifiable as mobile elements, including transposons and bacteriophages, is yellow, and other foreign
DNA is red. The percentage of foreign DNA is noted to the right of each bar. 'A' denotes an Archaeal
genome.
•Bacterial genomes are mosaics of ancestral and HT genes
•Bacterial genomes are continually sampling new genes (the nature of bacterial genetic innovation)
•HT loci frequently associated with mobile genetic units
•HT loci rarely confer a beneficial trait to the host
•However, long-term survival of HT loci in the host genome is dependent on the ability to confer
a beneficial trait to the host
* i.e. a gain-of-function mutation (rare)
* Gain-of-function mutations usually require multiple genes encoding whole systems
* HT loci comprising operons have the best chance for success (V. cholerae)
•Gain-of-function mutations may permit occupation of a new niche
* HT loci contribute to speciation
Features of genome deletions
•Non-reversible
•Deletions cannot involve essential loci; target non-essential loci
* essential nature of a locus depends on selective pressures
encountered in the environment
•May maintain or disrupt genome synteny (gene order)
* disruption of synteny involves rearrangement of sequence(s)
•What is the force driving deletions? Are small genomes more fit
than larger genomes?
The process of genome shrinkage in the obligate symbiont Buchnera aphidicola.
Graphic depiction of syntenic fragments and lost regions in the genome of the reconstructed ancestor
and in Buchnera. Syntenic fragments are color-coded based on position in the ancestor. Lost regions
occurring between syntenic fragments are gray. The reconstruction was based on the
phylogenetic distribution of gene orthologs among fully sequenced relatives of Escherichia coli and
Buchnera. RESULTS: The reconstructed ancestral genome contained 2,425 open reading frames
(ORFs). The Buchnera genome, containing 564 ORFs, consists of 153 fragments of 1-34 genes that
are syntenic with reconstructed ancestral regions. On the basis of this reconstruction, 503 genes were
eliminated within syntenic fragments, and 1,403 genes were lost from the gaps between syntenic
fragments, probably in connection with genome rearrangements.
Part of a syntenic fragment from Buchnera and the ancestor (same as E. coli for
this region). Deleted loci are white in the ancestor; orthologous genes are color-coded. Genes shifted up in the figure are oriented forward in the genome; genes
shifted down are oriented backwards.
Doubling times of bacteria under laboratory conditions do not correlate with
genome size. Data are for 22 species for which doubling times were available in
the literature, and include bacteria from ten major taxonomic divisions.
•There is no selective advantage for small genome size
•Organisms containing small genomes are not the ancestors of large
genome organisms
•Rather, organisms harboring small genomes are derived from large genome organisms
•Isolation within a narrow niche (decreased incidence of HGT)
•Narrow niches provide static selective pressures
* fewer loci required for success than in an ever-changing environment
•Comparative genomics is an invaluable, irreplaceable resource for the
study of genome evolution
Changes in DNA content
Fig. 4. Processes involved in the evolution of genome size in bacteria. New sequences are acquired by DNA transfer and gene duplication, the former being the
predominant mode of DNA increase within most species. DNA loss can be produced by large deletions eliminating one or more genes in a single event, or by loss of
function followed by subsequent deletions of the resulting pseudogenes.
http://www.tigr.org/tigr-scripts/CMR2/GenomePage3.spl?database=ghp
Questions?
