Web Services for N-Glycosylation Process

Satya S. Sahoo, Amit P. Sheth, William S. York, John A. Miller
Presentation at
International Symposium on Web Services For Computational Biology and
Bioinformatics, VBI, Blacksburg, VA, May 26-27, 2005
Integrated Technology Resource for Biomedical Glycomics
NCRR/NIH
Glycomics
 Study of structure, function and quantity of ‘complex
carbohydrate’ synthesized by an organism
 Carbohydrates added to the basic protein structure: glycosylation
Folded protein structure (schematic)
Glycosylation – why is it important?
 Genome (comprised of DNA) or Proteome (proteins) are not the
only factors in life functions of an organism
 Carbohydrates attached to different protein structures (by
glycosylation) are important for:
 Identification of foreign entities by immune system cells
 Markers to accurately diagnose diseases
 Regulate signaling activities
 Categorization of glycosylation - the way carbohydrates are
attached to proteins. Example: N-glycosylation
N-Glycosylation Process (NGP)
By N-glycosylation Process, we mean the identification and
quantification of glycopeptides
[Figure: NGP workflow (schematic). Stages, in the order labelled:
Cell Culture, extract, Glycoprotein Fraction, proteolysis,
Glycopeptides Fraction (1), Separation technique I,
Glycopeptides Fraction (n), PNGase, Peptide Fraction (n),
Separation technique II, Peptide Fraction (n*m), Mass spectrometry,
ms data and ms/ms data, Data reduction, ms peaklist and ms/ms peaklist,
binning, Peptide identification, N-dimensional array, Peptide list,
Signal integration, Data correlation,
Glycopeptide identification and quantification]
NGP – part of the Bioinformatics core
Integrated Technology Resource for Biomedical Glycomics
 This Resource was established by the National Center for
Research Resources
 The aim is to develop the tools and technology to analyze
glycoprotein and glycolipid expression of embryonic stem cells
 Our research provides bioinformatics support for four research
groups:
 Embryonic Stem Cell Culture Program
 Glycomic Analysis of Glycoproteins
 Glycomic Analyses of Glycosphingolipids and Sphingolipids
 Transcript analysis by kinetic RT-PCR
NGP – need in Glycomics
 Unlike proteomics or genomics, high-throughput experimental
protocols are still being established in Glycomics
 NGP involves a multitude of heterogeneous tasks, including
human-mediated tasks
 NGP attempts to encapsulate particular computational steps as
platform-independent, scalable and Web-accessible tools – Web
Services
 Enables glycobiologists to integrate automated data generation
tasks with data processing tools (Web Services) across the
end-to-end experimental lifecycle
N-Glycosylation identification - Problems
 Extremely difficult to identify glycosylated peptide
sequences using standard analytical methods
 N-glycosylation occurs at particular sites on the protein
structure – consensus sequences
[Figure: An example glycopeptide (schematic). Labels: Peptide, Glycan,
Consensus Sequence (N, X, S/T), Asparagine (N), Aspartate (D), J, PNGaseF]
NGP - implementation
 NGP currently implements a Web Process composed of two
Web Services:
 DB Modifier Web Service – modifies the search database by
replacing N (in consensus sequences) by J
 Collator Web Service – identifies a probable N-glycosylated
peptide, using three parameters:
 Calculated molecular mass
 Presence of ‘J’ in a peptide sequence
 MASCOT* Score assigned to a hit
 NGP also involves a proprietary Mass Spectrometer search engine
service (MASCOT*) as an intermediate task
 Hence, the NGP Web Process identifies probable glycosylated
peptides – enabling rapid processing of data from
high-throughput experiments
*http://www.matrixscience.com/
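
The DB Modifier step described above (replacing N by J in consensus sequences) can be sketched in a few lines. This is a minimal illustration, not the NGP implementation: the function name `mark_sequons` is hypothetical, and the pattern assumes the standard N-X-S/T sequon definition in which X is any residue except proline.

```python
import re

# Assumption: the consensus sequence is the standard N-X-S/T sequon,
# where X is any residue except proline (P). The lookahead leaves X and
# S/T untouched, so overlapping sequons are each checked independently.
SEQUON_N = re.compile(r"N(?=[^P][ST])")


def mark_sequons(sequence: str) -> str:
    """Return the sequence with N replaced by J wherever N starts a sequon."""
    return SEQUON_N.sub("J", sequence)


if __name__ == "__main__":
    # Made-up sequence for illustration only.
    print(mark_sequons("MKNGSLNPTQNVT"))  # MKJGSLNPTQJVT
```

Because only the N of a sequon is rewritten, a search engine run against the modified database can report J as a marker of a potential N-glycosylation site, which is what the Collator step then looks for.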
NGP – Architecture (current)
[Figure: NGP architecture. The PEAK LIST FILE (from ms/ms raw data) and the
Primary Sequence Database, processed by the ModifyDB Web Service, feed the
MASCOT* Mass Spectrometer Search Engine. The MASCOT* output file (contains
both glycosylated and non-glycosylated peptide sequences) goes to the
Collator Web Service, which produces the Deglycosylated peptide list]
NGP Results
q1_p1=-1
q2_p1=0,626.349945,-0.023321,2,APGVAGR,18,000000000,1.49,00020000000000000,0,0;"gi|51465537":0:190:196:1
q2_p2=1,626.361191,-0.034567,2,APARGR,18,00000000,1.33,00020000000000000,0,0;"gi|10140845":0:2:7:2
q2_p3=0,626.349945,-0.023321,2,APAVGGR,18,000000000,1.33,00020000000000000,0,0;"gi|51470766":0:212:218:1,"gi|51470768":0:212:218:1
q3_p3=0,634.368973,0.006151,4,DIIFK,12,0000000,25.26,00010020000000000,0,0;"gi|47078238":0:364:368:2,"gi|47078240":0:328:332:2
q3_p4=0,634.351227,0.023897,4,MPLFK,12,0000000,25.24,00010020000000000,0,0;"gi|41197108":0:95:99:1,"gi|4557311":0:1:5:2
q3_p5=0,634.343811,0.031313,3,NNLFK,12,0000000,15.34,00010020000000000,0,0;"gi|31377725":0:539:543:1
q3_p6=0,634.368973,0.006151,3,LDIFK,12,0000000,15.34,00010020000000000,0,0;"gi|39725634":0:891:895:1
q3_p7=0,634.343811,0.031313,3,NNIFK,12,0000000,15.34,00010020000000000,0,0;"gi|7661646":0:212:216:1
q3_p8=0,634.368973,0.006151,3,LDLFK,12,0000000,15.34,00010020000000000,0,0;"gi|51474898":0:237:241:1
q3_p9=0,634.368958,0.006166,3,EVIFK,12,0000000,13.61,00010020000000000,0,0;"gi|28376662":0:67:71:1
q3_p10=0,634.368958,0.006166,3,VELFK,12,0000000,13.61,00010020000000000,0,0;"gi|51467300":0:493:497:1,"gi|51467535":0:99:103:1
q4_p1=-1
q5_p1=0,662.375122,0.004702,5,DLLFR,14,0000000,18.41,00020020000000000,0,0;"gi|21536369":0:84:88:1,"gi|21536367":0:17:21:1,"gi|4557871":0:647:651:1
q5_p2=0,662.375122,0.004702,3,DLFLR,14,0000000,12.81,00010020000000000,0,0;"gi|33695153":0:407:411:1,"gi|4504043":0:330:334:1,"gi|11968045":0:6:10:1
q5_p3=0,662.375122,0.004702,3,DIFIR,14,0000000,12.81,00010020000000000,0,0;"gi|4505725":0:924:928:1,"gi|29788751":0:1170:1174:1
q5_p4=0,662.349960,0.029864,3,NNFIR,14,0000000,11.84,00010020000000000,0,0;"gi|24416002":0:667:671:1
q5_p5=0,662.375122,0.004702,4,IDLFR,14,0000000,9.98,00020020000000000,0,0;"gi|12957488":0:602:606:1,"gi|41148707":0:536:540:1,"gi|51464463":0:646:650:1
q5_p6=0,662.375122,0.004702,4,LDLFR,14,0000000,9.98,00020020000000000,0,0;"gi|42657517":0:335:339:1
q5_p7=0,662.375107,0.004717,4,VELFR,14,0000000,9.98,00020020000000000,0,0;"gi|6912230":0:436:440:1
q5_p8=0,662.375122,0.004702,4,LDIFR,14,0000000,9.98,00020020000000000,0,0;"gi|8922081":0:2699:2703:1
q5_p9=0,662.349960,0.029864,4,NLNFR,64,0000000,5.89,00010020000000000,0,0;"gi|19923416":0:816:820:1
q5_p10=1,662.361191,0.018633,2,NRFAR,14,0000000,3.37,00010020000000000,0,0;"gi|4758704":0:97:101:1
q6_p1=0,674.359863,-0.006639,4,VSDNIK,35,00000000,11.27,00010020000000000,0,0;"gi|32130516":0:935:940:1
q6_p2=0,674.323456,0.029768,5,EGDLGGK,21,000000000,7.97,00020020000000000,0,0;"gi|13569928":0:1058:1064:1
q6_p3=0,674.359848,-0.006624,5,EATVAGK,21,000000000,7.88,00020020000000000,0,0;"gi|51475822":0:527:533:1
q6_p4=1,674.389740,-0.036516,3,QRMLK,14,0000000,7.46,00020010000000000,0,0;"gi|24307905":0:467:471:2,"gi|24307905":0:638:642:2
q6_p5=0,674.359863,-0.006639,5,LSSSPGK,56,000000000,7.38,00000020000000000,0,0;"gi|8922075":0:806:812:1
q6_p6=0,674.338730,0.014494,4,WDLGGK,42,00000000,6.40,00010020000000000,0,0;"gi|13375817":0:123:128:1
q6_p7=0,674.359879,-0.006655,4,QATDLK,56,00000000,6.21,00020010000000000,0,0;"gi|21361684":0:451:456:1
q6_p8=1,674.371094,-0.017870,3,QTNKGK,14,00000000,6.03,00020010000000000,0,0;"gi|41117716":0:85:90:1
q6_p9=1,674.389740,-0.036516,6,QMRIK,28,0000000,5.77,00020020000000000,0,0;"gi|28329439":0:269:273:1,"gi|28558993":0:278:282:1
q6_p10=1,674.389740,-0.036516,6,QMRLK,28,0000000,5.77,00020020000000000,0,0;"gi|40255096":0:300:304:1
q7_p1=0,695.348969,0.007855,4,YDASLK,14,00000000,8.86,00020020000000000,0,0;"gi|4758454":0:2761:2766:1
 A typical MASCOT output file is about 3 MB!
 High-throughput experimental protocols generate
thousands of such files, so manual identification is
not feasible
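
To make the automation argument concrete, here is a rough sketch of how peptide-hit lines like those above could be parsed and filtered on the three Collator parameters (calculated mass, presence of 'J', MASCOT score). The field positions follow the commonly documented layout of MASCOT result files and are an assumption here, as are the names `Hit`, `parse_hit`, `probable_glycopeptides` and the threshold values; none of them come from the NGP code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    query: int
    rank: int
    mass: float        # calculated peptide mass (assumed field 2)
    delta: float       # observed minus calculated mass (assumed field 3)
    sequence: str      # peptide string (assumed field 5)
    score: float       # ions score (assumed field 8)
    accessions: List[str]


def parse_hit(line: str) -> Optional[Hit]:
    """Parse one 'qN_pM=...' line; return None for empty hits ('-1')."""
    key, _, value = line.strip().partition("=")
    query, rank = key.split("_")
    if value == "-1":
        return None
    fields, _, proteins = value.partition(";")
    parts = fields.split(",")
    accessions = [ref.split(":")[0].strip('"')
                  for ref in proteins.split(",") if ref]
    return Hit(int(query[1:]), int(rank[1:]), float(parts[1]),
               float(parts[2]), parts[4], float(parts[7]), accessions)


def probable_glycopeptides(hits, min_score=10.0, max_delta=0.05):
    """Keep hits that contain 'J', score well, and match the mass closely.

    Both thresholds are illustrative placeholders, not NGP settings.
    """
    return [h for h in hits
            if "J" in h.sequence
            and h.score >= min_score
            and abs(h.delta) <= max_delta]
```

Reading a result file line by line through `parse_hit`, dropping the `None` entries, and passing the rest to `probable_glycopeptides` would give a first-pass candidate list for each of the thousands of files.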
NGP Web Services – Adding Semantics
 Two Ontologies developed as part of the NCRR-Glycomics
project:
 GlycO: a domain Ontology embodying knowledge of the
structure and metabolism of glycans
 Contains 770 classes – describe structural features of
glycans
 URL: http://lsdis.cs.uga.edu/projects/glycomics/glyco
 ProPreO: a comprehensive process Ontology modeling
experimental proteomics
 Contains 296 classes
 Models three phases of experimental proteomics* –
Separation techniques, Analytical techniques and Data
analysis
 URL: http://lsdis.cs.uga.edu/projects/glycomics/propreo
*http://pedro.man.ac.uk/uml.html (PEDRO UML schema)
ProPreO - Experimental Proteomics Process Ontology
 ProPreO models the phases of a proteomics experiment using five
fundamental concepts:
 Data: (Example: a peaklist file from ms/ms raw data)
 Data_processing_applications: (Example: MASCOT* search
engine)
 Hardware: embodies instrument types used in proteomics
(Example: ABI_Voyager_DE_Pro_MALDI_TOF)
 Parameter_list: describes the different types of parameter
lists associated with experimental phases
 Task: (Example: component separation, used in
chromatography)
Service description using WSDL-S
 Formalize description and classification of Web Services
using ProPreO concepts
WSDL ModifyDB (description of a Web Service using Web Service
Description Language):
    <?xml version="1.0" encoding="UTF-8"?>
    <wsdl:definitions targetNamespace="urn:ngp"
    …..
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <wsdl:types>
    <schema targetNamespace="urn:ngp"
    xmlns="http://www.w3.org/2001/XMLSchema">
    …..
    </complexType>
    </schema>
    </wsdl:types>
    <wsdl:message name="replaceCharacterRequest">
    <wsdl:part name="in0" type="soapenc:string"/>
    <wsdl:part name="in1" type="soapenc:string"/>
    <wsdl:part name="in2" type="soapenc:string"/>
    </wsdl:message>
    <wsdl:message name="replaceCharacterResponse">
    <wsdl:part name="replaceCharacterReturn" type="soapenc:string"/>
    </wsdl:message>
WSDL-S ModifyDB (the same description annotated with concepts defined in
the ProPreO process Ontology):
    <?xml version="1.0" encoding="UTF-8"?>
    <wsdl:definitions targetNamespace="urn:ngp"
    ……
    xmlns:wssem="http://www.ibm.com/xmlns/WebServices/WSSemantics"
    xmlns:ProPreO="http://lsdis.cs.uga.edu/ontologies/ProPreO.owl" >
    <wsdl:types>
    <schema targetNamespace="urn:ngp"
    xmlns="http://www.w3.org/2001/XMLSchema">
    ……
    </complexType>
    </schema>
    </wsdl:types>
    <wsdl:message name="replaceCharacterRequest"
    wssem:modelReference="ProPreO#peptide_sequence">
    <wsdl:part name="in0" type="soapenc:string"/>
    <wsdl:part name="in1" type="soapenc:string"/>
    <wsdl:part name="in2" type="soapenc:string"/>
    </wsdl:message>
[Figure: ProPreO process Ontology concepts referenced by the annotation:
data, sequence, peptide_sequence]
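
The difference between the WSDL and WSDL-S descriptions is a `wssem:modelReference` attribute on the request message. As a rough illustration of that annotation step, the sketch below adds the attribute with Python's standard `xml.etree.ElementTree`. The helper name `annotate_message` and the reduced `SAMPLE` document are hypothetical; the WSDL 1.1 namespace URI is the standard one, and the `wssem` namespace and concept name are taken from the slide.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"  # standard WSDL 1.1 namespace
WSSEM_NS = "http://www.ibm.com/xmlns/WebServices/WSSemantics"  # from the slide

# Keep readable prefixes when the tree is serialized again.
ET.register_namespace("wsdl", WSDL_NS)
ET.register_namespace("wssem", WSSEM_NS)


def annotate_message(wsdl_text: str, message_name: str, concept: str) -> str:
    """Attach wssem:modelReference=concept to the named wsdl:message."""
    root = ET.fromstring(wsdl_text)
    for message in root.iter(f"{{{WSDL_NS}}}message"):
        if message.get("name") == message_name:
            message.set(f"{{{WSSEM_NS}}}modelReference", concept)
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily reduced WSDL used only to exercise the helper.
SAMPLE = (
    '<wsdl:definitions xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/" '
    'targetNamespace="urn:ngp">'
    '<wsdl:message name="replaceCharacterRequest"/>'
    "</wsdl:definitions>"
)

if __name__ == "__main__":
    print(annotate_message(SAMPLE, "replaceCharacterRequest",
                           "ProPreO#peptide_sequence"))
```

The annotation is purely additive: a WSDL-unaware client can ignore the extra attribute, while a semantic registry can use it to classify the service by ontology concept.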
Biological UDDI (BUDDI)
WS Registry for Proteomics and Glycomics
 There are no current registries that use semantic
classification of Web Services in glycoproteomics
 BUDDI classification based on proteomics and
glycomics classification – part of integrated
glycoproteomics Web Portal called Stargate
 NGP to be published in BUDDI
 Can enable other systems such as myGrid to use NGP
Web Services to build a glycomics workbench
Conclusions
 As part of NCRR Integrated Technology Resource for Biomedical
Glycomics, we implemented a Semantic Web Process for
high-throughput glycomics in an open, web-centric environment
 Large domain-specific ontologies with process (ProPreO) and
domain (GlycO) knowledge concepts were used to describe and
classify Web Services at the Semantic level
 Used proposed Semantic Web Service specification (WSDL-S) to
add semantics to Web Service description
 Biological UDDI (BUDDI), part of Stargate, is being developed
as a single-window resource to discover and publish Web
Services in the glycoproteomics domain
Resources
 NCRR (Integrated Technology Resource for Biomedical Glycomics):
http://cell.ccrc.uga.edu/world/glycomics/glycomics.php
 Bioinformatics core of Glycomics project:
http://lsdis.cs.uga.edu/projects/glycomics/
 ProPreO process Ontology:
http://lsdis.cs.uga.edu/projects/glycomics/propreo/
 GlycO domain Ontology:
http://lsdis.cs.uga.edu/projects/glycomics/glyco/
 Stargate – GlycoProteomics Web Portal:
http://128.192.9.86/stargate
 WSDL-S: joint UGA-IBM technical note
http://lsdis.cs.uga.edu/library/download/WSDL-S-V1.pdf
Acknowledgement
Special Thanks:
James Atwood (CCRC, UGA)
Meenakshi Nagarajan (LSDIS Lab, UGA)
Blake Hunter (LSDIS Lab, UGA)
Extra Slides: Stargate subsystems – a bit
of detail
 BUDDI – BioUDDI is envisioned as the ‘yellow pages’ for all
WS in life sciences
 The classification of WS uses biological taxonomy
 Open resource for the worldwide community of life
sciences research
 Format Converter – Enables conversion of two available
representation formats into an XML-based representation
 IUPAC to LINUCS to GLYDE (an XML-based representation)
 Web Service Generator – Enables existing Java applications
to be exposed as Web Services
 Generates required files from a Java application to allow
deployment as a Web Service
 Enable the newly generated Web Service to be published on
BioUDDI
Extra Slides: Stargate subsystems – a bit
of detail
 Group Forum – Members of the research group use it to
foster a sense of community
 Schedule meetings, discuss issues, collaborate on papers…
 Post papers for peer review, publications on relevant topics
 Stargate Search – an integrated unit of Stargate
 Enables search for research publications within the research
group
 Enables search on the internet
 Login – Allows restrictions on accessibility of selected parts
of Stargate
Extra Slides: The take home
message…
[Figure: Stargate subsystems: Forum, Internet Search,
Web Service Generator, BUDDI]