Transcript source file

Overview
What is Annotation?
Annotation is the process of determining the location
and function of all identifiable genes in a genome.
Annotation is an important part of bioinformatics
• whole-genome shotgun sequencing provides the raw material
• annotation provides an interpretation of the sequencing results
Figure 1 from Stothard & Wishart (2006) Automated bacterial genome analysis and annotation. Current Opinion in Microbiology 9: 505-510.
1. Find start and stop codons
– separated by 800-900 bp?
2. Find Shine-Dalgarno sequence (RBS)
– upstream of start codon?
3. Find core promoter
– consensus sequences for -10 & -35?
4. Find rho-independent terminator
5. Predict whether the gene could be organized into an operon
– compare chromosomal neighborhood
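The first two steps in the list above can be sketched in a few lines of Python. This is only an illustration, not the method used by IMG or any other annotation pipeline. The function names, the 300 bp minimum length, the AGGAGG motif, and the 20 bp upstream window are all assumptions chosen for the example, and only the forward strand is scanned.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_len=300):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    start is the 0-based index of the A in ATG; end is the index just
    past the stop codon. min_len is an assumed cutoff in base pairs.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence one codon at a time in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i  # first start codon seen since the last stop
            elif start is not None and codon in STOPS:
                end = i + 3
                if end - start >= min_len:
                    orfs.append((start, end, frame))
                start = None  # resume looking for the next start codon
    return orfs


def has_shine_dalgarno(seq, start, motif="AGGAGG", window=20):
    """Check for an exact motif match in the window upstream of start.

    The motif and window size are assumptions; real ribosome binding
    sites vary and are usually scored rather than matched exactly.
    """
    upstream = seq.upper()[max(0, start - window):start]
    return motif in upstream
```

For example, find_orfs("CCATGAAAAAAAAATAAGG", min_len=15) reports a single ORF that starts at index 2 in frame 2. A real gene caller would also scan the reverse complement and consider alternative start codons.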
1. Verify predicted function based on amino acid sequence homology
2. Predict protein structure and localization
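Homology-based verification usually starts from an alignment of the predicted protein against a reference protein. As a minimal sketch, the function below computes percent identity for two sequences that have already been aligned. The function name and the choice to skip gapped columns are assumptions for illustration; alignment tools may define the denominator differently.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0
```

For the aligned pair "MKT-LL" and "MKTALV", five columns are compared and four match, so the function returns 80.0.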
What will we be doing?
Verifying ORF calls
Verifying function based on
sequence conservation
Insert Figure 8-40 from
Microbiology – An Evolving Science
© 2009 W.W. Norton & Company, Inc.
Verifying function based on
localization data
Verifying function based on structural conservation
(insert image of E. coli lac permease)
Why manually annotate?
• Automated annotations tend to over-predict, producing many false positives
• Automated annotations also miss things (false negatives)
• Accuracy of any annotation is only as good as the quality of
annotated genes in reference databases
• High sequencing error rates. . .
A curated, finished genome has gene calls
verified & proteins organized into pathways
Possible solutions?
Reference paper:
Genome re-annotation: a wiki solution?
by Steven Salzberg
Genome Biology (2007), 8:102
Undergraduates provide “human expertise”
GOAL: Demonstrate that student annotations can be accurate, up-to-date, reliable, and useful to the scientific community!
What is imgACT?
http://img-act.jgi-psf.org/user/login
- Web portal to access genome database, img/edu
- Contains wiki-based Lab Notebook & Report Page
for organizing annotation data
What is img/edu?
http://imgweb.jgi-psf.org/cgi-bin/img_edu_v260/main.cgi
- Simplified database for undergraduate genome annotation
- Features and functions similar to those found in IMG
- Directly linked to imgACT
Click!
IMG companion
system
What is IMG?
http://img.jgi.doe.gov/cgi-bin/pub/main.cgi
INTEGRATED MICROBIAL GENOMES (IMG)
- Database managed by the U.S. Department of Energy (DOE) Joint Genome Institute (JGI)
- JGI currently produces ~22% of the bacterial genome projects reported worldwide
- The key mission of IMG is to provide a data management platform that supports comprehensive analysis and annotation of all publicly available genomes in a comparative genomics context
What are we annotating?
(insert information about organism including location/map of collection site,
image and description of organism, etc.)
Why annotate a GEBA organism?
Phylogenetic tree of Bacteria showing established & candidate phyla
• Note that genome sequences from members of those phyla in yellow and orange are under-represented relative to those in red
• The goal of GEBA (Genomic Encyclopedia of Bacteria and Archaea) is to sequence genomes from under-represented phyla
Insert Figure 1 from Handelsman (2004)
Microbiol. Mol. Biol. Rev. 68: 669-685.
What is our goal?
Annotate genes in pathways & complexes
Insert Figure 2 from Scott KM et al. (2006) The Genome
of Deep-Sea Vent Chemolithoautotroph Thiomicrospira
crunogena XCL-2. PLoS Biology, 4: 2196
Student Goals: Conceptual
• Apply basic concepts in biochemistry, microbial
physiology & ecology, and evolutionary biology
• Question basic assumptions about biochemistry,
physiology and evolution
• Understand the power and limitations of
bioinformatics
Student Goals: Technical
• Proficiently use multiple database analysis software packages
• Strengthen web-based library search skills (PubMed)
• Develop skills creating hypotheses and designing experiments
to test them
• Sharpen skills in analysis, synthesis and presentation of results
and data interpretation
• Experience the collaborative nature of science
Annotation Project
• Each team will annotate genes encoding enzymes in a metabolic pathway or components of a cellular complex in [insert organism name]
• Your T.A. or instructor will give you your specific assignments
• Consult the KEGG map and use an orthologous gene from a related organism to query the genome of [insert organism name] in the IMG/EDU database
• For the best “hit”, complete the corresponding modules of the imgACT lab notebook and lab report for that gene
• Complete the module(s) presented each week. The imgACT online notebook & report for Modules #1 – 8 must be finished for all genes assigned (3 per student).
Annotation Project
Assignments
• Online notebook checks end of weeks:
• Final Report due dates:
How do we get started?
http://img-act.jgi-psf.org/user/login
Click “Create an account”
Register for an img-act account
Email address
First Name
Last Name
xxxxxxxxxxxxxxxxxxxxxxxxxx
No
abbreviations
or nicknames
Pick something you
can remember
Specific for our class
Click “Register” once information entered
Once registration is complete, log in to imgACT
What you should see. . .
Winter 2010
If you can’t get this far, tell your instructor immediately!
Next, take the pre-annotation survey
Cookies must be enabled for the survey to work properly.
What next? Practice!
Explore the imgACT web portal
• All students will be assigned at least one gene, which should be used to
navigate through the imgACT online lab notebook (Modules #1 – 8) and
the lab report
• Note that students are not responsible for annotating this gene. It
may be used to help students get used to navigating the web portal.
“Practice gene”
click
imgACT Lab Notebook
click
The first time you log in to the Lab Notebook, you will also need to log in to the wiki.
Use the same username & password that you created for your imgACT account.
imgACT Lab Notebook
You are only responsible for Modules #1 – 8 in this class
imgACT Lab Report
To be completed at the end of the quarter
Correspond to modules in Lab Notebook