Transcript Slide 1
Grid-Enablement of Protein Information Resource (gridPIR)
Georgetown University Medical Center (Developer) 1
University of Pennsylvania (Adopter) 2
1 Baris E. Suzek, Hongzhan Huang, Scott Chung, Hsing-Kuo Hua, Peter McGarvey, Cathy H. Wu
Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC, 20057
2 Craig Street, Casey Overby, David Fenstermacher
Abramson Cancer Center, University of Pennsylvania, Philadelphia, PA 19104
Abstract
The Protein Information Resource (PIR) is an integrated bioinformatics resource that provides
protein databases and analysis tools to support genomic and proteomic research.
A participant in the Integrative Cancer Research (ICR) workspace of caBIG™, PIR developed
gridPIR in collaboration with the adopter from the Biomedical Informatics Facility (BMIF) for the
Abramson Cancer Center at the University of Pennsylvania.
gridPIR is one of four reference projects for demonstrating how a caBIG Silver compliant data
source can be discovered and consumed on caGrid. gridPIR was developed using a Model Driven
Architecture (MDA) approach and employs an n-tier architecture. A data layer, supported by Oracle
9i, stores the UniProt Knowledgebase (UniProtKB). The object layer contains 48 domain objects
with 51 attributes registered to caDSR, as required to support semantic interoperability. A data
access layer utilizing Hibernate provides the mapping between objects and the relational
database. The gridPIR API exposed to caGrid was generated using caCORE SDK 1.0.3.1.
Upon successful completion on August 1st, 2005, gridPIR became one of the first caBIG Silver
compliant data services on caGrid 0.5. In compliance with caBIG compatibility guidelines, gridPIR
will continue to improve its services to better serve as the central proteomic information resource
for cancer research within caBIG.
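The layering described in the abstract can be illustrated with a minimal sketch. It is not gridPIR code: every class name below (ProteinRecord, ProteinDao, InMemoryProteinDao, ProteinService, LayerSketch) is hypothetical, the sample rows are made up, and an in-memory list stands in for the Hibernate mapping and the Oracle database. The point is only that the API layer talks to an interface and never to the storage directly.

```java
import java.util.ArrayList;
import java.util.List;

// Entry point first so the file can also be run with the single-file launcher.
class LayerSketch {
    // Builds a service over made-up sample rows (not real UniProtKB data).
    static ProteinService demoService() {
        InMemoryProteinDao dao = new InMemoryProteinDao();
        dao.add(new ProteinRecord("SAMPLE-1", "Transferrin receptor protein 1", "Homo sapiens"));
        dao.add(new ProteinRecord("SAMPLE-2", "Transferrin receptor protein 1", "Mus musculus"));
        dao.add(new ProteinRecord("SAMPLE-3", "Serum albumin", "Homo sapiens"));
        return new ProteinService(dao);
    }

    public static void main(String[] args) {
        System.out.println(demoService().organismsFor("Transferrin receptor protein 1"));
    }
}

// Object layer: a plain domain object (far smaller than the real 48-object model).
class ProteinRecord {
    final String accession;
    final String name;
    final String organism;

    ProteinRecord(String accession, String name, String organism) {
        this.accession = accession;
        this.name = name;
        this.organism = organism;
    }
}

// Data access layer: the only layer that knows how records are stored.
interface ProteinDao {
    List<ProteinRecord> findByName(String name);
}

// Assumption: an in-memory list replaces the Hibernate/Oracle-backed implementation.
class InMemoryProteinDao implements ProteinDao {
    private final List<ProteinRecord> rows = new ArrayList<>();

    void add(ProteinRecord record) {
        rows.add(record);
    }

    @Override
    public List<ProteinRecord> findByName(String name) {
        List<ProteinRecord> hits = new ArrayList<>();
        for (ProteinRecord r : rows) {
            if (r.name.equals(name)) {
                hits.add(r);
            }
        }
        return hits;
    }
}

// API layer: what a client calls; it depends only on the ProteinDao interface.
class ProteinService {
    private final ProteinDao dao;

    ProteinService(ProteinDao dao) {
        this.dao = dao;
    }

    // Returns the distinct organisms of all proteins with the given name, in insertion order.
    List<String> organismsFor(String proteinName) {
        List<String> organisms = new ArrayList<>();
        for (ProteinRecord r : dao.findByName(proteinName)) {
            if (!organisms.contains(r.organism)) {
                organisms.add(r.organism);
            }
        }
        return organisms;
    }
}
```

Because ProteinService sees only the ProteinDao interface, the storage behind it could be swapped (for example, for an ORM-backed class) without changing client code.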
Developer – PIR @ GUMC
Protein Science Team (13)
Executive Team Members
Dr. Winona Barker, Director Emeritus, Adjunct Professor
Dr. Darren Natale, Protein Science Team Lead, Research Assistant Professor
Dr. Zhangzhi Hu, Associate Protein Science Team Lead, Research Associate Professor
Dr. Lai-Su L. Yeh, Senior Protein Scientist, Research Assistant Professor
PIR Director: Dr. Cathy Wu, Professor
Bioinformatics Team (11)
Executive Team Members
Dr. Peter McGarvey, Project Manager, Research Associate Professor
Dr. Hongzhan Huang, Bioinformatics Team Lead, Research Assistant Professor
Baris Suzek, Associate Bioinformatics Team Lead, Senior Research Associate
Staff Members
Dr. Leslie Arminski, Systems Manager, Research Assistant Professor
Dr. Hsing-Kuo Hua, Bioinformatics Software Engineer, Research Assistant Professor
Dr. Robel Kashay, Bioinformatics Scientist, Research Instructor
Dr. Xin Yuan, Bioinformatics Scientist, Research Instructor
Yongxing Chen, Bioinformatics Research Associate
Jing Zhang, Bioinformatics Programmer
Jess Catana, System Administrator
Alireza Amoozmand, MS Student
UniProt Team
Adopter – BMIF @ UPenn
Biomedical Informatics Facility
Abramson Cancer Center, Department of Pathology and Laboratory Medicine
University of Pennsylvania
Vishal Nayak
The Biomedical Informatics Facility is a multi-faceted shared resource of the
Abramson Cancer Center and the Department of Pathology and Laboratory Medicine
combining the fields of bioinformatics, computational biology, clinical informatics and
biostatistics. Biomedical informatics tools enable biomedical investigators to utilize
vast amounts of research and clinical data. This is achieved by creating unified data
models, standardizing data interfaces, developing structured vocabularies, generating
new data visualization methods and capturing detailed metadata for the investigator's
research project. Data exchange, integration and analysis are the underlying themes
for creating effective computational resources that support and extend investigative
biomedical research. These same principles are applied in supporting high-throughput cores so that they use the same standardized infrastructure for managing,
analyzing and integrating large volumes of data generated by genomics and
proteomics-related Shared Resources on behalf of multiple investigators. Thus,
biomedical informatics creates the means for using and analyzing large quantities of
biologic, clinical and environmental information by an individual investigator using
appropriately designed and standardized tools that can later be linked with the data
resulting from other projects.
UML Model
Protein Science Team Staff Members
Dr. Anastasia Nikolskaya, Senior Protein Scientist, Research Assistant Professor
Dr. Raja Mazumder, Scientific Coordinator, Research Assistant Professor
Dr. C.R. Vinayaka, Senior Protein Scientist, Research Assistant Professor
Dr. Sona Vasudevan, Senior Protein Scientist, Research Assistant Professor
Dr. Cecilia Arighi, Senior Protein Scientist, Research Assistant Professor
Christina Fang, Research Assistant
Natalia Petrova, PhD Student
Paul Ramos, MS Student
Ti-Cheng Chang, MS Student
Craig Street, Project Manager
Kevin Lux
The PIR Team
Casey Overby
EVS Annotations
CDE Browser
• The current model contains 48 objects and 51 attributes, including Protein/Gene, Taxonomy and Protein Feature related objects
• In the second year, the model will be extended with Protein Annotation and Family-related objects from PIRSF
UniProt
Querying using caGrid Browser
http://cagrid-browser.nci.nih.gov/
Search proteins for a gene
Search organisms containing a protein
Project Page
http://gforge.nci.nih.gov/projects/gridpir
caCORE SDK Process Flow
iProClass Integrated Protein Knowledgebase (integrated databases by category, including):
Protein Sequence: UniProt, UniRef, UniParc, RefSeq, GenPept
Family: PIRSF, InterPro, Pfam, Prosite, COG
Structure: PDB, SCOP, CATH, PDBSum, MMDB
Function/Pathway: EC-IUBMB, KEGG, BioCarta, EcoCyc, WIT
Protein Expression: Swiss-2DPAGE, PMG
Interaction: DIP, BIND
Modification: RESID, PhosphoBase
Ontology: GO
Taxonomy: NCBI Taxon, NEWT
Gene/Genome: GenBank/EMBL/DDBJ, LocusLink, UniGene, MGI, TIGR
Gene Expression: GEO, GXD, ArrayExpress, CleanEx, SOURCE
Disease/Variation: OMIM, HapMap
Literature: PubMed
Querying using Java API
public void Demo2_TestProtein2Organism() {
    // Query by example: a ProteinName object carrying the name to match
    ProteinName source = new ProteinNameImpl();
    final String id = "Transferrin receptor protein 1";
    source.setValue(id);
    try {
        // Search path: Organism objects are requested, reached through Protein
        String path = "edu.georgetown.pir.domain.Organism," +
                      "edu.georgetown.pir.domain.Protein,";
        List resultList = appService.search(path, source);
        // Log the common and scientific name of each organism returned
        for (Iterator it = resultList.iterator(); it.hasNext();) {
            Organism organism = (OrganismImpl) it.next();
            log.info(organism.getCommonName() +
                     "\t\t" + organism.getScientificName());
        }…
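The Demo2_TestProtein2Organism snippet in this transcript iterates an untyped result list and casts every element. A client can handle such a list more defensively, as in the minimal sketch below. It makes several assumptions: SearchService, Organism, cannedService and the "example.*" path are hypothetical stand-ins written for this illustration, not the caCORE SDK interface or the gridPIR domain classes, and the two organisms returned are canned sample values rather than query results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ClientLoopSketch {
    // Hypothetical stand-in for a service object with a path-plus-example search method.
    interface SearchService {
        List<Object> search(String path, Object example);
    }

    // Hypothetical miniature of an organism domain object.
    static class Organism {
        final String commonName;
        final String scientificName;

        Organism(String commonName, String scientificName) {
            this.commonName = commonName;
            this.scientificName = scientificName;
        }
    }

    // Formats each Organism as "common<TAB><TAB>scientific"; other element types are skipped
    // instead of triggering a ClassCastException.
    static List<String> describeOrganisms(List<?> results) {
        List<String> lines = new ArrayList<>();
        for (Object element : results) {
            if (element instanceof Organism) {
                Organism organism = (Organism) element;
                lines.add(organism.commonName + "\t\t" + organism.scientificName);
            }
        }
        return lines;
    }

    // Canned service: ignores its arguments and returns fixed sample data.
    static SearchService cannedService() {
        return (path, example) -> Arrays.<Object>asList(
                new Organism("Human", "Homo sapiens"),
                new Organism("Mouse", "Mus musculus"));
    }

    public static void main(String[] args) {
        String path = "example.Organism,example.Protein"; // hypothetical path
        List<Object> results = cannedService().search(path, "Transferrin receptor protein 1");
        for (String line : describeOrganisms(results)) {
            System.out.println(line);
        }
    }
}
```

Checking the element type before casting keeps one unexpected object in the result list from aborting the whole loop.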