ASMCUE 2008- “The year genomics bombarded ASMCUE”

The Genomics Era: A Vast
Resource for Educators
ASMCUE 2008- “The year genomics bombarded
ASMCUE”
David J. Baumler
Genome Center of Wisconsin
[email protected]
(Perna et al. Nature 2001)
#1) If you haven't already, download all materials in the ASMCUE2008 folder at:
http://asap.ahabs.wisc.edu/~baumler/
#2) Download Progressive Mauve at http://asap.ahabs.wisc.edu/mauve/download.php
Dispel a few myths
-I need a supercomputer to run genome alignments
-There are so many sequenced genomes, why do we need more?
-How do I get students excited and to relate to genomics?
-I have been teaching too long to get into genomics
-The more I use computers in teaching, the more things go wrong
ASMCUE 2007 data
My teaching philosophy: the 3rd dimension of teaching
-Go beyond 2 dimensions with paper and presentations.
-For topics on genomics, you must get computers into the students' hands.
•Look towards the future
-iPhones, small laptops that fit in a Ziploc bag, personal communication devices with fold-out keyboards and magnifying screens
Small laptop examples: Eee PC, Classmate, HP Mini-Note, IdeaPad
-One laptop per child... What about one laptop per college student?
-Look into laptop checkout at your campus
2005: 56% of UW-Madison students own a laptop
Introductory Biology: “Bring your wireless-ready laptop to class” day
Photo by Dave Baumler of UW-Madison Introductory Biology class
Projection for wireless internet in college classrooms
In 2004, only about a third of classrooms provided wireless Internet ... Wireless networks now cover more than half (51.2%) of college classrooms. ... (As of 1/26/2007 by the Campus Computing Survey)
[Figure: % of colleges with wireless classrooms (0-100) by year, 2003-2015]
Today’s session overview:
Introduction
Module #1) Annotate a gene from a phage genome
-key concepts: using ERIC database, BLAST, Interproscan, biological
annotations
Module #2) Conduct genome alignments of phage genomes
-using Mauve to conduct whole genome alignments, familiarize yourself
with Mauve
Module #3) Compare genomes from 3 outbreaks of E. coli O157:H7
difficulty
-identify genomic islands using Mauve & conservation of virulence factors
Module #4) Compare genomes from 5 strains of Yersinia pestis
-identify genomic islands, conservation of virulence factors, analyze mutations with phenotypic consequences due to insertion and/or deletion events and single nucleotide polymorphisms (SNPs), and paleomicrobiology
Conclusion
The ERIC database houses all of the available genomes of the
members of family Enterobacteriaceae, all of which are thought to
have descended from a common ancestor
[Figure: tree of the Enterobacteriaceae descending from a common ancestor. Boxes represent organisms with at least one genome sequenced. Legend categories: Human Pathogens; Insect Pathogens/Endosymbionts; Environmental/Animals/Industrial; Phytopathogens/Plant-associated.]
Genera shown: Calymmatobacterium, Moellerella, Cedecea, Morganella, Citrobacter, Plesiomonas, Edwardsiella, Proteus, Enterobacter, Providencia, Brenneria, Arsenophonus, Escherichia, Rahnella, Dickeya, Buchnera, Ewingella, Salmonella, Alterococcus, Erwinia, Sodalis, Hafnia, Serratia, Budvicia, Pantoea, Klebsiella, Shigella, Buttiauxella, Pectobacterium, Kluyvera, Tatumella, Obesumbacterium, Phlomobacter, Leclercia, Yersinia, Pragia, Sacchararobacter, Leminorella, Yokenella, Trabulsiella, Samsonia, Wigglesworthia, Xenorhabdus
Orthologs
If at least two of these criteria are met for the pair of genes in question, they are typically assigned as orthologs.
•Percentage identity and alignment percentage are in the typical range (see attached spreadsheet).
•Local genome context: the conserved gene is part of an operon with other genes that are already considered orthologs.
•Larger-scale conservation of genomic context: the conserved gene is in the same general genomic context as other orthologs.
•Functional conservation: the conserved gene is predicted or known to perform the same function as the potential ortholog in another genome.
Reciprocal Best Blast hits
BlastP X against Y: >60%
BlastP Y against X: >60%
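The reciprocal best hit test can be sketched in a few lines of Python. This is a minimal illustration, not the ERIC pipeline: the hit tuples, the gene names, and the function names best_hits and reciprocal_best_hits are hypothetical, and reading the >60% cutoff as percent identity is an assumption.

```python
from typing import Dict, List, Tuple

# A hit is (query, subject, percent_identity). All values below are made up.
Hit = Tuple[str, str, float]


def best_hits(hits: List[Hit], min_identity: float) -> Dict[str, str]:
    """Keep, for each query, the subject with the highest identity above the cutoff."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, identity in hits:
        if identity <= min_identity:
            continue  # below the cutoff, never a candidate
        if query not in best or identity > best[query][1]:
            best[query] = (subject, identity)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(x_vs_y: List[Hit], y_vs_x: List[Hit],
                         min_identity: float = 60.0) -> List[Tuple[str, str]]:
    """Return (x, y) pairs where x's best hit is y and y's best hit is x."""
    forward = best_hits(x_vs_y, min_identity)
    reverse = best_hits(y_vs_x, min_identity)
    return sorted((x, y) for x, y in forward.items() if reverse.get(y) == x)


x_vs_y = [("x1", "y1", 95.0), ("x1", "y2", 70.0), ("x2", "y2", 55.0), ("x3", "y3", 80.0)]
y_vs_x = [("y1", "x1", 95.0), ("y2", "x1", 70.0), ("y3", "x4", 90.0), ("y3", "x3", 80.0)]
pairs = reciprocal_best_hits(x_vs_y, y_vs_x)
```

Here x3 and y3 are not paired, because y3 has a better hit elsewhere (x4). A pair that passes this test would still need one more of the criteria listed above before being called orthologs.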
Enterobacteria cont.
Generated from 180 orthologs (Nicole T. Perna unpublished data)
ERIC-Enteropathogen Resource Integration Center
(http://www.ericbrc.org)
Genomes
Tools & Annotations
Genome Views and
Comparisons
Why Phage? Genomics timeline
[Timeline figure: 1977, 1982, 1995, 1996, 1997, 1998, 2000, 2001, 2008]
Teach annotation with a phage genome
Annotation step #1: Structural Annotation
The genetic code – (Courtesy of
http://history.nih.gov)
Example of a gene - the start codon is
green and the stop codon is red
Structural annotation consists of the identification of genomic elements (e.g. genes).
•Open Reading Frames (ORFs), also called coding sequences (CDSs), must have a start codon and a stop codon
•Location of regulatory motifs (such as promoters and ribosome binding sites)
•This step is typically automated using gene prediction software (automation finds only ~50-90% of the genes)
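To make the start codon/stop codon rule concrete, here is a toy ORF scan in Python. It is a sketch only: it assumes ATG as the sole start codon, looks at the forward strand only, and ignores regulatory motifs, so it is far simpler than real gene prediction software. The function name find_orfs and the example sequence are made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) for each ATG...stop stretch on the forward strand.

    start is 0-based, end is exclusive and includes the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence one codon at a time in this reading frame.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


sequence = "CCATGAAATTTTAGGG"
orfs = find_orfs(sequence)
first_orf = sequence[orfs[0][0]:orfs[0][1]]
```

The example finds one ORF, ATGAAATTTTAG, in the third reading frame. The TGA at position 3 of the first frame is ignored because no start codon precedes it in that frame.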
Annotation step #2
Functional annotation: consists of attaching biological information to genomic elements.
•biochemical function
•regulation and interactions involved
•expression
•cellular location
Three examples of annotations for one gene:
•Name/synonym: a short “word” used to refer to the gene
(Ex. ureC)
•Product: a descriptive protein name (Ex. Urease alpha subunit)
•Function : Describes what the protein does (Ex. Catalyzes
the hydrolysis of urea to form ammonia and carbon dioxide)
Tools you will use to annotate today
• #1 ERIC database: this is where you will get the
sequences and record your functional annotations.
• #2 BLASTP: this is a tool you will use to find similar
sequences in the NCBI database of all publicly available
known and predicted proteins
• #3 InterproScan: this is a tool you will use to find similar
sequences in a database of protein families (groups of
related proteins) and domains (functionally significant
subregions of proteins)
Note: For background information about Interproscan and Blast, I
recommend the book “Bioinformatics for Dummies”.
We are going to annotate a phage genome today
What type of genes should we
anticipate finding in the phage
genome?
•Structural components of a phage
•Phage replication proteins
•Machinery for integration into the host
genome
•Hypothetical proteins
You are going to annotate the bacteriophage 933W genome. This phage was found in the genome of E. coli O157:H7 strain EDL933. The phage genome contains the genes stx2A and stx2B, which encode the Shiga toxin 2 protein that contributes to disease in humans.
Animation Courtesy of Microbelibrary.org
Welcome to the Enteropathogen Resource Integration Center.
Using your web browser,
#1) Go to http://www.ericbrc.org/
#2) In the upper right portion of the screen, click on login
Click on log on under ERIC user accounts.
Then type in the username and password (case sensitive)
Session #1 username: ASMCUE / password: genome
Session #2 username: ASMCUE2 / password: genomes
Click the log on button. Note that your class has been given access to a unique version of the genome, in which you and your fellow classmates will be the only people annotating the phage genome.
#1) Click on Annotations
#2) Then use the pull-down bar to select bacteriophage 933W (the last one on the list), then click the OK button
Every gene in a genome in the ERIC database has what we call a feature ID, which consists of three capital letters, a dash, and seven digits, for example ABC-1234567.
Your genome will have a unique 3-letter code, and each gene or coding sequence (CDS) will have a unique seven-digit number. Choose your gene from the list that corresponds to your birthday, type in the feature ID, and click Submit.
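If students paste feature IDs into a worksheet or script, the format described above is easy to check. This is a small sketch, assuming only the stated pattern (three capital letters, a dash, seven digits). The helper name is_feature_id is hypothetical and is not part of ERIC.

```python
import re

# Three capital letters, a dash, then exactly seven digits, e.g. ABC-1234567.
FEATURE_ID = re.compile(r"[A-Z]{3}-[0-9]{7}")


def is_feature_id(text):
    """Return True if text (ignoring surrounding spaces) has the feature ID format."""
    return FEATURE_ID.fullmatch(text.strip()) is not None


checks = {fid: is_feature_id(fid)
          for fid in ["ABC-1234567", "abc-1234567", "ABC-123456", "ABCD-1234567"]}
```

Only the first entry passes. The others fail on letter case, digit count, or prefix length.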
On the next page, click on the link for the feature ID
Your webpage should look like this
On the left there is information about your coding sequence and also some links for tools you will be using
On the right are the annotations; this is where you will be adding annotations
Let’s split up the class
Left half of the classroom: use InterProScan to add annotations. Refer to slide #8 and proceed through #14 (in the students instructions for adding annotations.ppt file located in the ASMCUE2008 folder at: http://asap.ahabs.wisc.edu/~baumler/)
-If there is no good match, it is called a hypothetical protein
-Add an annotation for product as hypothetical protein
-Use Unpublished Sequence analysis as Evidence
-Type in author name, email
-Submit to Database
Right half of the classroom: use BlastP at NCBI to add annotations. Refer to slide #15 and proceed through #21 (in the students instructions for adding annotations file.ppt file located in the ASMCUE2008 folder at: http://asap.ahabs.wisc.edu/~baumler/)
Once you have completed the annotations for your gene(s), you can view the genome of the phage and see how your fellow classmates are doing by clicking on Show Feature Context (GaPP). A new window will appear in a few seconds.
The gene you are working on is highlighted in blue, and you are visualizing the entire bacteriophage 933W genome. Scroll over each gene (in pink) and you should see the name and the product information in the boxes below the genome. You can also double-click any of the genes; your web browser will open the annotation page in ERIC, where you can view the function annotation, evidence, etc.
Learning assessment Pre and Post test
#1. Within a sequenced microbial genome, identification of a gene predicted to encode a protein should contain which of the following characteristics?
#2. What percentage of the protein coding genes do you think automated computer approaches applied to a newly sequenced microbial genome will find:
#3. What type of biological annotation cannot be assigned to a newly sequenced gene based solely on comparisons to known protein/gene(s)?
#4. In a newly sequenced microbial genome, every identified gene produces a protein that is similar to a known protein?
#5. Which of these web-based resources are useful to find biological information about a gene sequence?
[Figure: bar chart of percentage with correct response (0-100), pre vs. post, for questions Q1-Q5; significance labels shown: P<0.2, P<0.02, P<0.01]
Student Testimonials
“I really enjoyed learning more about bacterial genetics and the tools that are available online for genomic research and gene
identification. This is an area of bacteriology that I have little experience in and I think that having experience using these websites will
prove valuable as my research continues.” –UW-Madison student in Bacteriology 650
“The concepts of using BLAST and Interproscan are pretty neat, and it is great that anyone can access this information, not just the insider
scientists that put it together. Thank you for teaching our class how to use these tools! I doubt I would have ever learned this stuff on my
own had you not taught us.” – UW-Madison student in Bacteriology 650
Module #2
Conduct genome alignments of phage genomes
-This module was developed to teach how to use Mauve with enterobacteria phage
-Phage genomes can be aligned using Mauve in a matter of minutes.
-Applicable as a teaching tool to decipher the mosaicism of phage genomes.
-Comparative studies of 30 mycobacteriophage genomes reveal new insights into their diverse architecture and into gene exchange (Hatfull et al. PLoS Genetics 2006)
You could align EVERY mycobacteriophage genome using Mauve!!!
-How diverse are enterobacteriophage?
(The following series of slides are Mauve alignments of phage isolated from E. coli, Salmonella spp., Yersinia spp., and Shigella spp.; all alignments are also provided for further inquiry.)
-Since we just annotated a stx2-containing phage from E. coli O157:H7, we will run alignments with 3 phage genomes
Mauve: Multiple Genome Aligner
• Able to identify and align collinear
regions of multiple genomes even in the
presence of rearrangements
• Find and extend seed matches
• Group into locally collinear blocks
• Align intervening regions
(Darling et al. Genome Res. 2004
Jul;14(7):1394-403.)
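The idea of a seed match can be illustrated with a toy example in Python. This sketch is not Mauve's algorithm: it only finds short exact words (k-mers) that occur once in each of two sequences, which is the simplest form of anchor that an aligner could then extend and group into blocks. The function name shared_seeds and both sequences are made up.

```python
from collections import defaultdict


def kmer_positions(seq, k):
    """Map every k-letter word in seq to the list of positions where it starts."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return positions


def shared_seeds(seq_a, seq_b, k=5):
    """Return (pos_a, pos_b) for k-mers that occur exactly once in each sequence."""
    index_a = kmer_positions(seq_a, k)
    index_b = kmer_positions(seq_b, k)
    seeds = []
    for kmer, hits_a in index_a.items():
        hits_b = index_b.get(kmer, [])
        if len(hits_a) == 1 and len(hits_b) == 1:
            seeds.append((hits_a[0], hits_b[0]))
    return sorted(seeds)


genome_a = "AAACGTACGGTTT"
genome_b = "GCGTACGGCC"
seeds = shared_seeds(genome_a, genome_b)
```

The three seeds found all have the same offset between the two sequences (position in A minus position in B equals 2), so they lie on one diagonal and would fall in the same collinear block.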
Module #2 Understanding phage, the viruses that infect microorganisms, via genome alignments
I recently aligned 56 enterobacterial phage genomes. Phage genomes are ideal training tools for teaching how to set up Mauve alignments. In the ASMCUE2008 folder, module #2 provides ~50 enterobacteriophage genome files for conducting alignments.
Step #1 Copy the folder called 3 phage genomes for ASMCUE workshop, and paste it on the hard drive of your computer (C: drive)
Step #2 From the start menu, in programs, select Mauve 2.1.1
Step #3 Under the File pull-down, select Align with progressive Mauve
Step #4 Click here to choose where to send the output file, find the folder (from Step #1), and double-click on the folder. This new window will appear.
Step #5 Type in a file name, and click on Save
Next, add the sequences to align: click on Add sequence, select the first phage genome and click on Open, then continue with the 2nd and 3rd phage genomes. Then click on Align to start the genome alignment.
When viewing the LCBs, Mauve displays regions that are highly conserved/identical in full color. Areas that are unique/variable to one genome appear in white, and represent unique islands.
Your toolbar is at the top left; the tools you will use are in the View pull-down, and also the buttons:
-Home: returns the viewer back to home
-Move left or right: you will find this useful to center a region of interest in the middle of the screen prior to zooming in
-Zoom in/out: you can also hold down the Ctrl key and use the arrows on the keyboard
-Search for features
Other useful commands in Mauve
Function                              Key
Zoom in                               Ctrl+Up
Zoom out                              Ctrl+Down
Scroll Left                           Ctrl+Left
Scroll Right                          Ctrl+Right
Export the current view as an image   Ctrl+E
Module #3) Dissecting virulence of E. coli
O157:H7 using genome alignments
The first E. coli genome sequenced was that of the nonpathogenic E. coli K-12 strain MG1655
-Determination of the complete E. coli sequence required almost 6 years
-E. coli is the preferred model in biochemical genetics, molecular biology, and biotechnology, and its genomic characterization will undoubtedly further research toward a more complete understanding of this important experimental, medical, and industrial organism
(Blattner et al. Science 1997)
The first pathogenic E. coli genome sequenced was that of enterohaemorrhagic (EHEC) Escherichia coli O157:H7 strain EDL933
-In 1982, Escherichia coli O157:H7 was recognized as a pathogen causing human disease
-Strain EDL933 is from the 1982 Michigan outbreak associated with ground beef
-Shiga toxin-producing E. coli (STEC)
(Perna et al. Nature 2001)
The completion of the 2nd E. coli O157:H7 (EHEC) sequence: strain Sakai
-In July 1996, an outbreak of Escherichia coli O157:H7 infection occurred among schoolchildren in Sakai City, Osaka, Japan.
-8,938 schoolchildren sickened, 3 deaths
-We are starting to ask: what genomic differences determine differences in virulence, epidemiology, and fatality?
(Hayashi et al. DNA Res. 2001)
In 2006, E. coli O157:H7 outbreak from bagged spinach (from CDC)
-Multistate outbreak: 205 people sickened, 3 deaths
-Produce-associated outbreak strains caused higher incidence of hemolytic-uremic syndrome (HUS) (Manning et al. PNAS 2008)
-Genome alignments can be used to find variations
Currently there are 13 sequenced E. coli O157:H7 genomes. You will focus on three that are all in the Enteropathogen Resource Integration Center (ERIC) database (www.ericbrc.org).
The three strains you will focus on are:
Escherichia coli EDL933 (EHEC) - 1982 ground beef outbreak
Escherichia coli Sakai (EHEC) (also called RIMD) - 1996 radish sprout outbreak
Escherichia coli EC4042 (EHEC) - 2006 fresh bagged spinach outbreak
In your start menu under programs, go to Mauve 2.1.1 and start up Mauve. Notice there is a user's guide in PDF form in this folder; it contains useful information and commands for navigation.
Note: your computer may need to update Java, since Mauve uses a Java platform for the alignment.
You should see a window for Mauve appear.
Next, double-click on the 3 O157H7 folder in the ASMCUE2008 folder. It should contain the following 19 files. Take the first one (3 O157 alignment), and drag and drop it into the Mauve window.
It should start to say reading sequences here, and in a few seconds the alignment will appear. Note that computers with less than 512MB RAM may not be able to open the file.
Your alignment should look like this
Organism name: notice the first is EDL933, the second is RIMD (Sakai), and the third is EC4042 (spinach)
Using the up or down arrows, you can switch the position of the genomes
Top strand
Bottom strand
The colored blocks are called locally collinear blocks (LCBs), and represent regions of the genome that Mauve has identified as conserved. The lines connect the LCBs. Notice that some are in different positions in the other genomes, and some are inverted and appear on the bottom strand of the double-stranded genome.
When you move your mouse over a region of one genome, it will show a black box and also show the corresponding region (boxes) in the other two genomes. Try scrolling left to right on one genome.
Notice that when you scroll (slowly) over a white region (island), the black boxes pause in the other genomes, then resume moving once you have passed over the island and back into conserved regions.
If you would like to look at all three LCBs, even though one is in a different position, scroll over one LCB and click the mouse button.
Let’s use the zoom function. Press the home button to restore the alignment to the original view.
Now click on the white island in the top genome and, using the right button, bring it to the center of the screen. Then zoom in multiple times.
You will start to see the genes. Scroll over one and pause, and a window will pop up with the product annotation. Here you can view which genes are present in this EDL933 island and not in the other two.
Now place your mouse over one of the genes; in my example I have iha (IrgA homolog adhesin).
Click your mouse once on the gene, and a window will pop up. Scroll down and select View CDS iha in ERICdb.
This will open the page in the ERIC database for that gene, containing all of the annotations. You can look to see if it is involved in virulence.
Let’s use the search feature
#1) Click on the search feature
#2) Choose a genome (EDL933)
#3) Type in a gene name (stx2A)
#4) Click on search
Notice that it has found the stx2A gene (highlighted in blue), and also in the RIMD strain. Just because it isn't aligned in the EC4042 strain does not mean it isn't there; if you look to the right in the EC4042 genome, you will find it.
One last feature you can use in Mauve
To find an island that is in 2 out of 3 strains, you will use the backbone view.
Press the home button first. Then go to the View pull-down, select color scheme, then backbone color.
Your alignment should look like this in backbone color. Regions in all three appear in a light purple color; regions in other colors correspond to 2 out of 3 genomes (you may have to zoom in a bit to see these regions).
Regions in only EDL933 and RIMD appear olive green
Regions in only EDL933 and EC4042 appear maroon
Regions in only RIMD and EC4042 appear tan/brown
This is how you identify islands unique to 2/3 strains
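The logic behind the backbone colors can be written down as a lookup from "which genomes contain this segment" to the color named above. The sketch below is illustrative only: the segment records and their coordinates are made up, and it does not read Mauve's own output files.

```python
GENOMES = ("EDL933", "RIMD", "EC4042")

# Colors as described above for this three-genome alignment.
BACKBONE_COLORS = {
    frozenset(GENOMES): "light purple",
    frozenset({"EDL933", "RIMD"}): "olive green",
    frozenset({"EDL933", "EC4042"}): "maroon",
    frozenset({"RIMD", "EC4042"}): "tan/brown",
}


def genomes_present(segment):
    """Return the set of genomes in which the segment has coordinates."""
    return frozenset(g for g in GENOMES if segment.get(g) is not None)


def describe(segment):
    """Name the backbone color, or flag a segment found in a single genome."""
    present = genomes_present(segment)
    if not present:
        raise ValueError("segment is absent from every genome")
    if len(present) == 1:
        return "island unique to " + next(iter(present))
    return BACKBONE_COLORS[present]


# Hypothetical segments: genome -> (left, right) coordinates, None if absent.
segments = [
    {"EDL933": (1, 5000), "RIMD": (1, 5000), "EC4042": (1, 4990)},
    {"EDL933": (5001, 9000), "RIMD": (5001, 9010), "EC4042": None},
    {"EDL933": (9001, 12000), "RIMD": None, "EC4042": None},
]
labels = [describe(s) for s in segments]
```

The second segment is the kind of region the assignment asks for: present in EDL933 and RIMD but missing from the spinach outbreak strain.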
Learning assessment results Module #3
Individual projects: How did they do?
-scores range from 16-20/20
avg. 18.5
#1) (5 pts) Run a blast analysis with your virulence gene against the other two strains and provide the % identity results in a list or table. Is the gene (and the corresponding protein) conserved in all 3 genomes? Are they all the same length? Is there more than one copy? Are they present in the Mauve alignment of the 3 genomes? Provide the coordinate positions or create an image to include.
#2) (5 pts) How is this gene involved in virulence? Briefly summarize the supporting evidence by clicking the link from the ERIC database subsystem virulence or putative virulence factor and reading the evidence.
#3) (5 pts) Is this gene or a homolog found in other Enterobacteria? (Hint: run a blast in ERIC against all other organisms.) Is this gene or a homolog found in other microorganisms? (Hint: run a blast search at NCBI against all bacteria and archaea. Briefly provide the five best blast “hits” with % identity.)
#4) (5 pts) Using Mauve, identify a unique island in one strain and briefly summarize the predicted products that it contains (provide coordinates or an image). Identify a region that is unique to two strains and briefly summarize the predicted products (provide coordinates or an image). Overall, based on your analysis of your two identified regions, do you think that they play a role in the virulence and evolution of E. coli O157:H7 genomes? How important do you think phage are in the variation of the genomes?
Student testimonials:
“I think it was an approach that is valuable because we learn about some of these virulence factors they find in a
strain or how two strains are similar to each other, but we don’t see how that information is found. It gives a better
understanding of what you can learn by looking at genomes and the comparison of different genomes. I think it
would be easier to follow and do the assignment if you provide some details in your slides or a handout on exactly
what you click on to do what is needed.”
“I know you wanted constructive criticism, but I don't actually have any. Once things were explained, i had a really
easy time doing things. I actually don't like genetics that much, I usually find it kind of boring, but it was kind of fun
working with Mauve. It's cool being able to do all of that stuff!”
Using genomics to track the dissemination of
Yersinia pestis strains
Courtesy of www.cdc.gov
Deng et al. 2002 J. Bacteriol. 184:16 4601-4611
Transmission cycle of Plague
Historic 3 pandemics of plague
-Pandemic: an epidemic that spreads throughout the human population across a large region such as a continent or worldwide
-1st pandemic ~550 A.D., confined mainly to Africa and some parts of the Middle East
-2nd pandemic originated in Central Asia and spread via trading routes into Europe (killed ~30% of Europe's population)
-3rd pandemic started in the 1850s in China’s Yunnan province, confined mainly to Asia
Courtesy of edsitement.neh.gov
Older methods for comparison of two genomes of Yersinia pestis
CO92 & KIM were not interactive
Parkhill et al. 2001 Nature 413, 523-527
Deng et al. 2002 J. Bacteriol. 184:16 4601-4611
FIG. 2. Comparison of KIM and CO92 at the DNA level. The outer circles show the CO92 C-G skew.
The second circle shows CO92 IS elements: IS100 (red), IS1541A (blue), IS285 (green), and IS1661
(yellow); short ticks represent partial IS elements. The third circle shows CO92 rRNA operons. The
fourth circle shows the CO92 genome in 27 blocks (numbered according to KIM genome order),
regions that are conserved by both locations and orientations (red), a single intrareplichore inversion
region (yellow), multiple-inversion regions (various blues), and genome-specific sequences (green).
The inner four circles show KIM rRNA operons, the KIM genome in blocks, KIM IS elements, and
KIM C-G skew. Colors are coded as for CO92. (Deng et al. 2002)
As of 05/2008 there are 7 complete and 14 draft Y. pestis genomes
Traditionally the strains are classified as serovars (Antiqua, Mediaevalis, Orientalis, and other) based on the following phenotypic characteristics:
-Antiqua = East Africa: (glycerol positive, arabinose positive, and nitrate positive)
-Mediaevalis = Central Asia: (glycerol positive, arabinose positive, and nitrate negative)
-Orientalis = Central Asia: (glycerol negative, arabinose positive, and nitrate positive)
-other (e.g. Microtus, Pestoides): not consistent for these phenotypes
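The phenotype table above amounts to a three-way lookup, which can be sketched in Python as follows. The function name classify_serovar is hypothetical, and the table encodes only the three profiles listed on this slide; any other combination falls into "other".

```python
# (glycerol, arabinose, nitrate) -> group, exactly as listed on the slide.
PHENOTYPE_TABLE = {
    (True, True, True): "Antiqua",
    (True, True, False): "Mediaevalis",
    (False, True, True): "Orientalis",
}


def classify_serovar(glycerol, arabinose, nitrate):
    """Return the group for a glycerol/arabinose/nitrate result, or 'other'."""
    return PHENOTYPE_TABLE.get((glycerol, arabinose, nitrate), "other")


results = {
    "glycerol+ arabinose+ nitrate+": classify_serovar(True, True, True),
    "glycerol+ arabinose+ nitrate-": classify_serovar(True, True, False),
    "glycerol- arabinose+ nitrate+": classify_serovar(False, True, True),
    "glycerol+ arabinose- nitrate+": classify_serovar(True, False, True),
}
```

This is why the module focuses on glpD, napA, and araC: a mutation that knocks out one of these genes changes a phenotype and therefore changes the row a strain falls into.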
Paleomicrobiology
Partial view of the grave in Dreux investigated in this work, which
illustrates anthropologic features of a mass grave suitable for
paleomicrobiology research. (courtesy of www.cdc.gov)
-The prefix paleo comes from the Greek word palaios, meaning “ancient”
-Bacterial colonization of dental pulp can occur during bacteremia
-Bacteremia (also known as plague septicaemia with Y. pestis) is the presence of bacteria in the blood
Courtesy of www.nidcr.nih.gov
Extraction of bacterial DNA from dental pulp
-Some historians believed that a flu-like virus, and not Y. pestis, was responsible for the 1st and 2nd pandemics
-DNA detected in dental pulp confirms that Y. pestis was the cause
-Which serovar(s) are most similar to the Y. pestis strain(s) from the dental pulp from the corpses?
Figure 1 The original protocol developed in our study allows recovering the dental pulp and minimizes the risk of
laboratory-acquired contamination of the specimen. The tooth was encasted into sterile resin (1a) ; the apex was sterily
sectioned (1b) to give access to the canal system (1c) ; solutions were injected (1d) ; after incubation, the tooth was put upside
down into sterile tube (1e) and centrifuged (1f).
Tran-Hung et al. PLoS ONE v.2(10); 2007
Use of genomic tools to study Y. pestis
Concepts in this module that you will address:
#1) Mutations that prevent production of a full-length functional gene product and so have phenotypic consequences (insertions, deletions, single nucleotide polymorphisms [SNPs]), studied in the genes glpD, napA, and araC
#2) Paleomicrobiology investigation: determine which serovar(s) have the most similar matching genes compared to the amplified sequence from the dental pulp of 3 corpses.
#3) Use of genome alignments: determine an island that is unique to the 4 genomes that infect humans and is absent in Y. pestis strain 91001
#4) Determine the conservation of a virulence factor in the 5 strains in the genome alignment. Determine if it is a full functional product in strain 91001.
Next, double-click on the uncompressed Yersinia pestis alignment 5 genome folder. It should contain the following 29 files. Take the one named yersinia_pestis_alignment_5genomes, and drag and drop it into the Mauve window.
It should start to say reading sequences here, and in a few seconds the alignment will appear. Note that computers with less than 512MB RAM may not be able to open the file.
Your alignment should look like this
Organism name: notice the first is CO92, the second is KIM, the third is 91001, the fourth is Antiqua, and the fifth is Nepal516
Using the up or down arrows, you can switch the position of the genomes
You may find it easier to view the 5 genome alignment without the connecting lines: on your keyboard press Shift+L (pressing this again makes them reappear)
Now place your mouse over one of the genes. Click your mouse once on a gene, and a window will pop up. Scroll down and select View CDS in ERICdb.
This will open the page in the ERIC database for that gene, containing all of the annotations. You can look to see what is known about it and/or if it is involved in virulence (note that you may be prompted with a log-in screen; click on the button that says “Enter ASAP”).
Let’s use the search feature to find the genes glpD, napA, and araC
#1) Click on the search feature
#2) Choose a genome or search all of the genomes
#3) Type in a gene name (glpD)
#4) Click on search
Notice that it has found the glpD gene (highlighted in blue), and also a corresponding gene in each genome. You need to determine which of the five CDSs produce the full-length functional protein.
Method #1: click on each gene and go to View CDS in ERICdb. Look at the length and whether any are labeled as pseudogenes. If so, look for a note that describes why it is thought to be a pseudogene.
Identifying mutations in glpD, napA, and araC cont.
Method #2: from the feature page in ERIC, scroll down to the feature context part of the page.
This is a list of all features that are neighboring your gene in the genome; notice some are upstream, downstream, or contained within.
Notice that contained within your glpD gene there are polymorphic sites (otherwise known as SNPs).
For SNP analysis, you will use a new tool called “Snippy”
In a new tab or web browser window go to http://asap.ahabs.wisc.edu/~cabot/aep/snippy.php
It should look like this:
Highlight and copy all feature IDs for polymorphic sites from glpD, paste them into here, and click submit.
In the middle of each region you will see the polymorphic site (in this case capital Gs) and the corresponding base in each genome. Note that you are interested in variations in YPKIM, YPCO92, YP91001, YPNepal, and YpAntiqua.
-In this case there is no difference among these 5 genomes. Scroll down and check the remaining polymorphic sites to see if any differ among the 5 genomes; if not, the mutation is probably a larger deletion or insertion event.
In your SNP analysis, you want to look for SNPs that cause a change in the encoded amino acid. In some cases the change results in a premature stop codon, which may generate a truncated non-functional protein.
#1) Note that Snippy shows you if the SNP variation results in an amino acid change, in this case A (Alanine) to T (Threonine)
#2) In this second SNP, the change resulted in a stop codon
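The two outcomes above (an amino acid change versus a premature stop codon) can be worked through with a small Python sketch. This is not how Snippy is implemented; it simply applies the standard genetic code to a codon before and after a single base change. The function name classify_snp and the example codons are made up for illustration.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_snp(codon, position, new_base):
    """Return (effect, old_amino_acid, new_amino_acid) for one base change.

    position is 0, 1, or 2 within the codon.
    """
    codon = codon.upper()
    mutated = codon[:position] + new_base.upper() + codon[position + 1:]
    old_aa, new_aa = CODON_TABLE[codon], CODON_TABLE[mutated]
    if new_aa == old_aa:
        effect = "synonymous"   # same amino acid, protein unchanged
    elif new_aa == "*":
        effect = "nonsense"     # premature stop codon, truncated protein
    elif old_aa == "*":
        effect = "stop lost"
    else:
        effect = "missense"     # different amino acid
    return effect, old_aa, new_aa


ala_to_thr = classify_snp("GCA", 0, "A")   # GCA -> ACA
to_stop = classify_snp("TGG", 2, "A")      # TGG -> TGA
silent = classify_snp("GCA", 2, "G")       # GCA -> GCG
```

The first call reproduces an Alanine to Threonine change like the one in example #1, the second gives a stop codon as in example #2, and the third shows a SNP with no effect on the protein.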
Using the DNA sequence obtained from the dental pulp from three corpses (found in the file called Ypestis corpse and CA88-4125YPE genes.doc), conduct a BlastN search within the ERIC database with each sequence against the 91001, Nepal, KIM, Antiqua, and CO92 genomes. For each of the three corpses, which serovar is most similar to the strains that caused the 1st and 2nd pandemics?
-From the ERIC home page (http://www.ericbrc.org/) you can select to run a Blast search here
-Paste the first nucleotide sequence from corpse #1
-Select entire genomes
-Select the genomes to query: hold down the Ctrl key and select Y. pestis genomes 91001, Antiqua, CO92, KIM, and Nepal
-Finally click on the Submit Query button; repeat with the other two corpses' sequences
Next repeat the BlastN process using the gene sequences from a known North American ancestor (Y. pestis CA88-4125/YPE) for glpD, napA, and araC. Of the 5 genomes (91001, Antiqua, CO92, KIM, and Nepal) representing the three serovars, which is most similar to the known North American ancestor?
Based on your analysis, did Y. pestis arrive in North America via shipping routes over the Atlantic or Pacific?
Atlantic? (Serovar Antiqua of African origin)
Pacific? (Serovar Orientalis or Mediaevalis of Asian origin)
Courtesy of education.usgs.gov
Your alignment should look like this in backbone color. Regions in all five appear in a light purple color; regions in other colors correspond to 2, 3, or 4 out of 5 genomes (you may have to zoom in a bit to see these regions).
Look for a region in the lightest blue color that is present in CO92, KIM, Antiqua, and
Nepal, but absent in the 91001 strain. Analyze the contents and determine if any of the
genes may contribute to human infection of Y. pestis.
Thanks for your time
Collaborators:
Dr. Kai F. (Billy) Hung (UW-Madison/assistant Prof. At Eastern Illinois University Fall 2008)
Dr. Amy C. Wong (UW-Madison)
Dr. Lois Banta (Williams College)
Mentors:
Dr. Nicole Perna (UW-Madison)
Dr. Charles Kaspar (UW-Madison)
Dr. Jeffrey Byrd (St. Mary’s College)
Dr. Bob Kadner and the ASM Summer Institute
Thank you: everyone on the ERIC database team (especially Guy Plunkett III for setting up
module #1 & Eric Cabot for making Snippy) and all of the members of the Perna Genome
Evolution Laboratory
Funding: This project has been funded with Federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under contract No. HHSN266200400040C