
IGR-ANNOT: A Multiagent System for InterGenic Regions Annotation
Sandro Camargo, João Valiati, Luis Otávio Álvares, Paulo Engel, Sergio Ceroni
Introduction
• The exponential growth of genomic data
has led to an absolute requirement for
computerized tools to analyze this data.
• Sequencing a new genome does not answer all questions about the organism. Progress is more likely to come from comparing the genomes of different organisms.
Introduction
• There are many tools and techniques for comparing complete genomes and coding regions, but techniques for comparing the non-coding regions of DNA, which contain regulatory elements, are lacking.
• Many of the differences between species
may be attributed to changes in the
regulation of transcription and translation.
• Transcription and translation are often
regulated via elements that lie in intergenic
regions.
InterGenic Regions
• Intergenic regions are defined as the sequence between the translational stop of one gene and the translational start of the next gene.
• To obtain the intergenic regions of an organism, the following are needed:
– the complete genome of the organism (the nucleotide sequence);
– information about its coding regions (start and stop positions, orientation, and name).
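Given these two inputs, the regions can be computed directly. The sketch below is a minimal illustration in Python (IGR-ANNOT itself is written in Perl) and assumes the genome sequence and coding-region coordinates have already been extracted. `CodingRegion`, `IntergenicRegion`, and `find_intergenic_regions` are hypothetical names, not part of IGR-ANNOT. Coordinates follow the GenBank convention (1-based, inclusive). Each region is taken as the span between consecutive coding regions along the chromosome, whatever their strands, assuming a linear chromosome and no nested genes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CodingRegion:
    name: str
    start: int   # lower coordinate, 1-based inclusive (GenBank convention)
    end: int     # higher coordinate, 1-based inclusive
    strand: int  # +1 forward, -1 reverse


@dataclass
class IntergenicRegion:
    start: int
    end: int
    previous_gene: CodingRegion
    next_gene: CodingRegion
    sequence: str


def find_intergenic_regions(genome: str, cds_list: list[CodingRegion]) -> list[IntergenicRegion]:
    """Return the stretches of sequence between consecutive coding regions.

    Assumes a linear chromosome and no genes nested inside other genes.
    """
    ordered = sorted(cds_list, key=lambda c: c.start)
    regions = []
    for prev, nxt in zip(ordered, ordered[1:]):
        igr_start = prev.end + 1
        igr_end = nxt.start - 1
        if igr_start > igr_end:  # adjacent or overlapping genes: no intergenic region
            continue
        # Convert 1-based inclusive coordinates to a 0-based Python slice.
        seq = genome[igr_start - 1:igr_end]
        regions.append(IntergenicRegion(igr_start, igr_end, prev, nxt, seq))
    return regions


genome = "ATGAAATAG" + "CCCC" + "ATGTTTTAA"
genes = [CodingRegion("geneB", 14, 22, +1), CodingRegion("geneA", 1, 9, +1)]
for igr in find_intergenic_regions(genome, genes):
    print(igr.start, igr.end, igr.sequence)  # prints: 10 13 CCCC
```

Sorting first means the input can list genes in any order, as feature tables sometimes do.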
InterGenic Regions
• We chose to work with GenBank files because they contain all the information needed to identify coding regions; this information is then used to infer the intergenic regions.
InterGenic Regions
• The GenBank feature table format follows a tabular approach and consists of the following items:
– Feature Key: a single word or abbreviation indicating the functional group;
– Location: instructions for finding the feature;
– Qualifiers: auxiliary information about the feature.
InterGenic Regions
Key             Location/Qualifiers
CDS             23..400
                /product="alcohol dehydrogenase"
                /gene="adhI"
An example of a feature in the feature table.
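To show how such an entry breaks down into key, location, and qualifiers, here is a minimal Python sketch. It assumes the flat-file column layout (feature key starting at column 6, location and qualifiers at column 22) and handles only single-line qualifiers and simple `a..b` or `complement(a..b)` locations. `parse_feature` and `parse_simple_location` are hypothetical helpers, not IGR-ANNOT code.

```python
import re

# Qualifier lines look like /name="value", /name=value, or just /name.
QUALIFIER = re.compile(r'/(\w+)(?:=(.*))?')
SIMPLE_RANGE = re.compile(r'(\d+)\.\.(\d+)')
COMPLEMENT_RANGE = re.compile(r'complement\((\d+)\.\.(\d+)\)')


def parse_feature(lines):
    """Split one feature block into (key, location, qualifiers).

    Assumes the key starts at column 6 and the location and qualifiers at
    column 22, with one qualifier per line (no continuation lines).
    """
    key = lines[0][5:21].strip()
    location = lines[0][21:].strip()
    qualifiers = {}
    for line in lines[1:]:
        match = QUALIFIER.fullmatch(line[21:].strip())
        if match:
            name, value = match.groups()
            qualifiers[name] = value.strip('"') if value is not None else None
    return key, location, qualifiers


def parse_simple_location(location):
    """Return (start, end, strand) for 'a..b' or 'complement(a..b)' only."""
    match = COMPLEMENT_RANGE.fullmatch(location)
    if match:
        return int(match.group(1)), int(match.group(2)), -1
    match = SIMPLE_RANGE.fullmatch(location)
    if match:
        return int(match.group(1)), int(match.group(2)), +1
    raise ValueError(f"unsupported location: {location!r}")


# The example feature, laid out in flat-file columns.
example = [
    " " * 5 + "CDS".ljust(16) + "23..400",
    " " * 21 + '/product="alcohol dehydrogenase"',
    " " * 21 + '/gene="adhI"',
]
key, location, qualifiers = parse_feature(example)
print(key, parse_simple_location(location), qualifiers)
# prints: CDS (23, 400, 1) {'product': 'alcohol dehydrogenase', 'gene': 'adhI'}
```

For real GenBank files, a library such as Biopython (`Bio.SeqIO.parse(handle, "genbank")`) exposes each feature's type, location, and qualifiers and covers the full location syntax.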
InterGenic Regions
• InterGenic Regions naming convention:
IGR-O-G1-G2
where O = {F|R|B|X} depends on the orientations of the previous and next genes, and G1 and G2 are the names of the coding regions for which the intergenic region contains regulatory information.
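The convention can be sketched as a small formatting function. The slides only state that O depends on the two gene orientations. The specific letter assigned to each of the four strand combinations below is a hypothetical assignment made for illustration, and `igr_name` is likewise a hypothetical helper.

```python
# HYPOTHETICAL letter assignment: the slides say only that O depends on the
# orientations of the previous and next genes. Which letter goes with which
# strand combination is an assumption for illustration.
ASSUMED_ORIENTATION_CODE = {
    (+1, +1): "F",  # both genes on the forward strand
    (-1, -1): "R",  # both genes on the reverse strand
    (-1, +1): "B",  # divergent genes (region upstream of both)
    (+1, -1): "X",  # convergent genes (region downstream of both)
}


def igr_name(prev_gene, prev_strand, next_gene, next_strand,
             codes=ASSUMED_ORIENTATION_CODE):
    """Build an IGR-O-G1-G2 name; strands are +1 (forward) or -1 (reverse)."""
    orientation = codes[(prev_strand, next_strand)]
    return f"IGR-{orientation}-{prev_gene}-{next_gene}"


print(igr_name("geneA", +1, "geneB", +1))  # IGR-F-geneA-geneB
print(igr_name("geneA", -1, "geneB", +1))  # IGR-B-geneA-geneB
```

Passing the mapping as a parameter keeps the letter assignment in one place, so it can be replaced by the actual IGR-ANNOT convention.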
InterGenic Regions
• Intergenic regions are written in GenBank file format using the misc_feature feature key.
• According to the GenBank file format description, this feature key is used to annotate regions of biological interest that cannot be described by any other feature key.
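A minimal sketch of writing one region as a misc_feature entry, using the same flat-file columns as the feature example (key from column 6, location and qualifiers from column 22). Storing the region name in a `/note` qualifier is an assumption; the slides do not say which qualifier IGR-ANNOT uses. `misc_feature_lines` is a hypothetical helper.

```python
def misc_feature_lines(start, end, name):
    """Format one intergenic region as a GenBank misc_feature entry.

    Layout: feature key from column 6, location and qualifiers from column 22.
    Putting the IGR name in a /note qualifier is an assumption.
    """
    return [
        " " * 5 + "misc_feature".ljust(16) + f"{start}..{end}",
        " " * 21 + f'/note="{name}"',
    ]


print("\n".join(misc_feature_lines(10, 13, "IGR-F-geneA-geneB")))
```

Because intergenic regions are given here as plain ranges, the location is written without `complement(...)`.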
IGR-ANNOT Engineering Process
• The multiagent approach is particularly attractive for this problem because:
– the information content is heterogeneous;
– the information can be distributed;
– much of the annotation work for each gene can be done by different laboratories, using different methodologies to annotate information about genes.
• We used MaSE (Multiagent Systems Engineering) and agentTool to model the agents.
IGR-ANNOT Engineering Process
• User Interface Agent (UIA)
• File Reader Agents (FRA)
• Gene Agents (GA)
• InterGenic Regions Agents (IGRA)
• File Writer Agents (FWA)
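The slides name the five agent types but do not detail how they interact. As a rough illustration only, the Python sketch below chains hypothetical roles, inferred from the agent names, through an in-process message bus. The message kinds, payloads, and the `MessageBus` class are assumptions. The authors used Perl multiagent packages, not this code.

```python
from collections import deque


class MessageBus:
    """In-process stand-in for agent messaging: FIFO delivery by agent name."""

    def __init__(self):
        self.agents = {}
        self.queue = deque()

    def register(self, agent):
        self.agents[agent.name] = agent

    def send(self, recipient, kind, payload):
        self.queue.append((recipient, kind, payload))

    def run(self):
        while self.queue:
            recipient, kind, payload = self.queue.popleft()
            self.agents[recipient].handle(kind, payload)


class Agent:
    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        bus.register(self)


class FileReaderAgent(Agent):
    # Hypothetical role: turn an input file into a sequence plus a gene list.
    # Here the "file" is already a dict standing in for a parsed GenBank record.
    def handle(self, kind, record):
        self.bus.send("GA", "genes", (record["sequence"], record["genes"]))


class GeneAgent(Agent):
    # Hypothetical role: hold gene data; here it only orders genes by start.
    def handle(self, kind, payload):
        sequence, genes = payload
        self.bus.send("IGRA", "ordered", (sequence, sorted(genes, key=lambda g: g[1])))


class InterGenicRegionAgent(Agent):
    # Hypothetical role: derive the regions between consecutive genes.
    def handle(self, kind, payload):
        sequence, genes = payload
        regions = []
        for (name1, _, end1, _), (name2, start2, _, _) in zip(genes, genes[1:]):
            if end1 + 1 <= start2 - 1:  # skip adjacent or overlapping genes
                regions.append((end1 + 1, start2 - 1, name1, name2, sequence[end1:start2 - 1]))
        self.bus.send("FWA", "regions", regions)


class FileWriterAgent(Agent):
    # Hypothetical role: serialise the regions; returns plain text lines here.
    def handle(self, kind, regions):
        lines = [f"{s}..{e} {g1}/{g2} {seq}" for s, e, g1, g2, seq in regions]
        self.bus.send("UIA", "done", lines)


class UserInterfaceAgent(Agent):
    # Hypothetical role: accept the user's request and collect the result.
    def __init__(self, name, bus):
        super().__init__(name, bus)
        self.result = None

    def request(self, record):
        self.bus.send("FRA", "read", record)
        self.bus.run()
        return self.result

    def handle(self, kind, payload):
        self.result = payload


bus = MessageBus()
for cls, name in [(FileReaderAgent, "FRA"), (GeneAgent, "GA"),
                  (InterGenicRegionAgent, "IGRA"), (FileWriterAgent, "FWA")]:
    cls(name, bus)
uia = UserInterfaceAgent("UIA", bus)
record = {"sequence": "ATGAAATAGCCCCATGTTTTAA",
          "genes": [("geneB", 14, 22, +1), ("geneA", 1, 9, +1)]}
print(uia.request(record))  # prints ['10..13 geneA/geneB CCCC']
```

A FIFO bus keeps the example deterministic. A distributed deployment, as motivated above, would replace it with real inter-process messaging.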
IGR-ANNOT Engineering Process
• To implement this architecture, we used the Perl language, so the system can run on any platform that supports Perl.
• Perl has many features, such as its string-manipulation facilities, that make it well suited to working with DNA sequences.
• In addition, complete packages for implementing multiagent systems are available.
Results Discussion
• We have used IGR-ANNOT extensively to create intergenic region annotations for several genomes of the Mycoplasmataceae family.
• To obtain a graphical view of the annotations created by our tool, we used Artemis.
• The following figures show the Mycoplasma hyopneumoniae 232 genome.
[Figures: Artemis view of the intergenic region annotation of the Mycoplasma hyopneumoniae 232 genome]
Results Discussion
Corresponding intergenic regions in Mhy and Mhy232 (Len1, Len2: region lengths in bp; %Idy: percent identity)

Len1  Len2  %Idy    Mhy                               Mhy232
458   458   99.34   IGR-FMP04451_oppB-1               IGR-R-oppB
345   346   99.42   IGR-FMP0611_MHP0054               IGR-F-mhp057
574   572   98.26   IGR-XMP07135_rpsOMP01224_MHP0106  IGR-X-mhp275rps15
307   316   93.99   IGR-XMP09826_MHP0309MP03567_baiH  IGR-X-mhp321baiH
1156  1157  98.02   IGR-RMP03198_MHP0344              IGR-R-mhp354
1037  1033  94.49   IGR-BMP18658_MHP0508MP05045_pdhC  IGR-B-mhp502aceF
395   395   99.49   IGR-BMP07145_deoCMP12669_gyrA     IGR-B-deoC-gyrA
528   543   96.69   IGR-F-MP02519_lgt                 IGR-R-lgt
Conclusions
• The system is now in successful use by biologists at UFRGS.
• The output of IGR-ANNOT provides an easy way to compare intergenic regions among different organisms.
• Despite the positive results achieved so far with genomes of the Mycoplasmataceae family, further tests will be performed, mainly on more complex genomes.
Future Works
• Create an environment for comparing intergenic regions.
• IGR-ANNOT will be made publicly available to other biologists on the web at www.inf.ufrgs.br/~scamargo, in the software section.