Some Thoughts on Scholarly Communication and the Role of Bio-ontologies Philip E. Bourne University of California San Diego [email protected].

Disclaimer – I am not an expert in ontologies
Some would argue quite the opposite!
I would claim to have an interest in scholarly communication and am beginning to see the role that bio-ontologies have to play in what I believe will be a very different type of scientific discourse.
Let me cast that role as a vision that we can state and then dissect to see what role bio-ontologies have to play.
The Vision…
Prior to leaving home, a UCSD graduate student syncs her IPOL with the latest papers delivered overnight by the journal via RSS feed. On the bus she reviews the stream and selects a paper close to her interest in HIV-1 proteases. The data show apparent anomalies relative to her own work. Being on-line, she notices that a colleague has also discovered the same paper, and they annotate the results together over IM. By the time the bus stops she has recomputed the results, proven the anomaly, recorded a rebuttal as a pubcast addressed to the Editor, and sent it to the journal.
Science Fiction – Yes or No?
I would argue that the only part of this vision that is science fiction is finding a bus in San Diego.
Science Fiction?
• Five years ago Yes… Today No…
• Five years ago the idea of downloading data on a bus would have been absurd – not today
• Five years ago an IPOL would have been absurd – not today (consider the smart phone)
• Journals are providing RSS feeds today
• IM is prevalent, but not for scientific discourse
• Video and podcasting are prevalent, but not for scientific discourse
• Full text and data are on-line, but not integrated
Role for Bio-ontologies
What is Missing to Make the Vision a Reality?
1. Seamless integration between the data and the publication upon which those data are based
2. Seamless integration of the authoring and publishing process
3. Notion of traditional publications being associated with podcasts and video
4. Professional networking akin to social networking
What are the Catalysts for Change?
• New publishing paradigms, most importantly open access publishing
• The emerging generation of digital scientists
• The increased ease of working with digital media, notably sound and video
The Growth of Open Access Literature
[Figure: PubMed Central article holdings (research articles only), number of articles (0 to 50,000) by publication year. Annotations: back-issue deposition and digitization; PLoS and PubMed Central founded; BioMed Central begins deposition; PLoS publishes first journal issue]
Open Access (Creative Commons License)
1. All published materials available on-line free to all (author-pays model)
2. Unrestricted access to all published material in various formats, e.g., XML, provided attribution is given to the original author(s)
3. Copyright remains with the author
The catalyst
PLoS Comp Biol 2008 4(3) e1000037
Community Reaction?
Most scientists have no idea that this implies that anyone can take their material, enhance it (e.g., via a mashup), and effectively republish it.
Okay, so much for the 1% inspiration; where is the 99% perspiration?
What is Missing to Make the Vision a Reality?
1. Seamless integration between the data and the publication upon which those data are based
PLoS Comp. Biol. 2005 1(3), e34
Database and Journal Integration: The Test Bed
Journals
Database: The Protein Data Bank
http://www.pdb.org
http://www.wwpdb.org/
• Paper not published unless data are deposited – strong data-to-literature correspondence
• Highly structured data conforming to an extensive ontology
• DOIs assigned to every structure – http://www.doi.org
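A minimal sketch of how a reader's tool might turn a PDB identifier into a resolvable DOI link. It assumes the DOI pattern 10.2210/pdb<id>/pdb used for classic four-character PDB entries and the https://doi.org/ resolver; the function names are hypothetical and not part of any PDB or BioLit API.

```python
import re

# Assumption: classic PDB IDs are a digit 1-9 followed by three letters or digits.
PDB_ID_PATTERN = re.compile(r"[1-9][A-Za-z0-9]{3}")
# Assumption: the standard DOI proxy; the slide refers to http://www.doi.org.
DOI_RESOLVER = "https://doi.org/"


def pdb_doi(pdb_id: str) -> str:
    """Return the DOI for a PDB entry, assuming the 10.2210/pdb<id>/pdb pattern."""
    if not PDB_ID_PATTERN.fullmatch(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"10.2210/pdb{pdb_id.lower()}/pdb"


def doi_url(doi: str) -> str:
    """Turn a DOI into a URL that the DOI proxy resolves to the landing page."""
    return DOI_RESOLVER + doi
```

For example, doi_url(pdb_doi("4HHB")) gives https://doi.org/10.2210/pdb4hhb/pdb. Validating the ID first keeps malformed strings out of the links a journal page would render.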
Seamless Integration between Data and the Literature – What Does That Imply?
• Improving semantic consistency in the literature – best done at the point of authoring
• Post-processing to establish semantic content
• New forms of visualization and interaction at the presentation layer
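Post-processing can start with something as simple as spotting database identifiers in existing text. The sketch below is a hypothetical heuristic, not BioLit's method: it treats any standalone four-character token beginning with a digit from 1 to 9 as a candidate PDB ID and discards all-digit tokens such as years. Ordinary prose still produces false positives, which is one argument for tagging at the point of authoring instead.

```python
import re

# Hypothetical heuristic: a classic PDB ID is a digit 1-9 followed by three
# letters or digits, standing alone as a word.
CANDIDATE_PDB_ID = re.compile(r"\b[1-9][A-Za-z0-9]{3}\b")


def find_pdb_ids(text: str) -> list[str]:
    """Return upper-cased candidate PDB IDs in order of first appearance."""
    seen: dict[str, None] = {}  # dicts keep insertion order, so this de-duplicates in order
    for match in CANDIDATE_PDB_ID.finditer(text):
        token = match.group(0)
        if any(ch.isalpha() for ch in token):  # skip all-digit tokens such as years
            seen.setdefault(token.upper(), None)
    return list(seen)
```

Applied to "The HIV-1 protease structure 1HVR was compared with 3phv and 1hvr in 2008.", it returns ["1HVR", "3PHV"].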
BioLit: Tools for New Modes of Scientific Dissemination
The Knowledge and Data Cycle
0. Full text of PLoS papers is stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB, which are then analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB
• BioLit integrates biological literature and biological databases and includes:
– A database of journal text
– Authoring tools to facilitate database storage of journal text
– Tools to make static tables and figures interactive
http://biolit.ucsd.edu
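The cycle above can be modelled with a few plain data structures. This is a toy sketch under stated assumptions: the class and function names are hypothetical, the retrieval and analysis of PDB data in step 2 are omitted, and the RCSB structure-page URL stands in for whatever link the real composite view used.

```python
from dataclasses import dataclass, field

# Assumption: link each structure to its RCSB PDB page; the 2008 system used pdb.org.
RCSB_URL = "https://www.rcsb.org/structure/"


@dataclass
class Figure:
    label: str
    pdb_ids: list[str]  # structures shown in the figure (step 1)


@dataclass
class Paper:
    title: str
    paragraphs: list[str]  # full text held in the database (step 0)
    figures: list[Figure] = field(default_factory=list)


@dataclass
class CompositeView:
    pdb_id: str
    pdb_url: str         # link back to the PDB (step 4)
    passages: list[str]  # pertinent blocks of literature text (step 4)


def composite_views(paper: Paper, figure_label: str) -> list[CompositeView]:
    """Steps 2-4 (hypothetical): one composite view per structure in the chosen figure."""
    figure = next(f for f in paper.figures if f.label == figure_label)
    views = []
    for pdb_id in figure.pdb_ids:
        # Case-insensitive substring match; a real system would use tagged IDs.
        passages = [p for p in paper.paragraphs if pdb_id.lower() in p.lower()]
        views.append(CompositeView(pdb_id, RCSB_URL + pdb_id, passages))
    return views
```

Keeping figures, text blocks, and structure links as separate records is what lets the composite view point both ways, from figure to PDB entry and from PDB entry back to the relevant text.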
PSP Washington DC Feb. 2008
ICTP Trieste, December 10, 2007
What is Missing to Make the Vision a Reality?
2. Seamless integration of the authoring and publishing process
BioLit Plugin Project
[Diagram: Author, Publisher, Paper; Word file in .docx format]
Sidebar: Imagine a Future Where…
• The relationship between author and publisher is quite different
• The publisher is a warehouse for the workflow of scientific endeavor, not just a repository for the end product
• Evidence:
– www.researchgate.net
– MML (Borya Shakhnovich)
BioLit Plugin Project
Automated Ontology & ID Tagging within Microsoft Word Documents
• Leverages the Office Open XML format used in Microsoft Office 2007
• A custom schema is attached to the document and used to automatically XML-tag ontology terms and database identifiers within a research paper
• Ontology tagging assists publication of scientific research by enabling efficient, accurate automated categorization and by promoting information dissemination
• Conversion of the manuscript to the NLM DTD for direct submission to the publisher
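Because a .docx file is a ZIP archive whose main body lives in word/document.xml, its text can be inspected with standard tools. The sketch below only detects vocabulary terms in each paragraph; it does not write the custom XML tags the plugin attaches, and the vocabulary dictionary and its identifiers are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by Office Open XML documents.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(docx_file) -> list[str]:
    """Return the plain text of each <w:p> paragraph in word/document.xml."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; join every <w:t> inside it.
        paragraphs.append("".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t")))
    return paragraphs


def tag_terms(paragraphs, vocabulary):
    """Yield (paragraph index, term, identifier) for each term found.

    vocabulary: hypothetical dict mapping lower-case term -> ontology identifier.
    """
    for i, text in enumerate(paragraphs):
        lowered = text.lower()
        for term, term_id in vocabulary.items():
            if term in lowered:
                yield i, term, term_id
```

Joining runs before matching matters because Word often splits a single phrase across several w:t elements.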
BioLit Plugin Project
Rather than Post-processing the Document, the Author Controls the Semantic Tagging
Plugin Architecture
Context-Sensitive Data Access
• Display of information about database entries when the user clicks on an ID in the document
• Display of ontology terms related to terms in the document text, using a local database search
Ontologies are Stored in a Local Database
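Keeping ontologies in a local database turns the related-term lookup into a simple query. A minimal sketch, assuming a single SQLite table of (ontology, term_id, name) rows and substring matching with LIKE; the table layout and function names are illustrative, not the plugin's actual schema.

```python
import sqlite3


def build_term_index(terms):
    """Load (ontology, term_id, name) rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE term (ontology TEXT, term_id TEXT PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO term VALUES (?, ?, ?)", terms)
    return conn


def related_terms(conn, word, limit=10):
    """Return terms whose names contain `word`.

    SQLite's LIKE is case-insensitive for ASCII letters by default; % and _ in
    `word` are not escaped in this sketch.
    """
    rows = conn.execute(
        "SELECT ontology, term_id, name FROM term "
        "WHERE name LIKE ? ORDER BY name COLLATE NOCASE LIMIT ?",
        (f"%{word}%", limit),
    )
    return rows.fetchall()
```

Since every query runs against a local file or in-memory table, no network round trip is needed while the author is writing.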
User-Configurable Selection
• Fully user-configurable selection of ontologies and database identifiers
• All searches occur within the user’s desktop computer
• Desired ontologies are downloaded and installed automatically, and updated periodically
• A BioLit installer XML file provides the application with the information needed to download and install ontologies
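The slides do not show the installer file's format, so the schema below (an <ontology> element with name, url and update-days attributes) is entirely hypothetical. The sketch shows how such a file could drive automatic installation and periodic updates: parse the sources, then list those never installed or older than their update interval.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class OntologySource:
    name: str
    url: str
    update_days: int  # hypothetical refresh interval in days


def parse_installer(xml_text: str) -> list[OntologySource]:
    """Read <ontology> entries from a hypothetical installer XML document."""
    root = ET.fromstring(xml_text)
    return [
        OntologySource(el.get("name"), el.get("url"), int(el.get("update-days", "30")))
        for el in root.findall("ontology")
    ]


def due_for_update(sources, last_updated: dict[str, date], today: date) -> list[str]:
    """Names of ontologies never installed or older than their update interval."""
    due = []
    for src in sources:
        last = last_updated.get(src.name)
        if last is None or today - last >= timedelta(days=src.update_days):
            due.append(src.name)
    return due
```

Putting the download locations and refresh intervals in a data file rather than in code lets new ontologies be added without reinstalling the application.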
What is Missing to Make the Vision a Reality?
3. Notion of traditional publications being associated with podcasts and video
4. Professional networking akin to social networking
YouTube for Scientists
www.scivee.tv
Motivation
Pubcast – Video Integrated with the Full Text of the Paper
Making a Pubcast
Channels – Just Like TV
Professional Profile
Create & Join Communities and Discussion Groups
The Role of Ontologies
• Tag clouds generated automatically from MeSH headings
• Semantic enrichment can be included with a pubcast
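One simple way to generate a tag cloud from MeSH headings is to count how many pubcasts carry each heading and scale font sizes linearly between a minimum and a maximum. This is an illustrative sketch, not SciVee's algorithm; the function name and the 10 to 32 point size range are arbitrary assumptions.

```python
from collections import Counter


def tag_cloud(heading_lists, min_size=10, max_size=32):
    """Map each MeSH heading to a font size scaled by how many items carry it.

    heading_lists: one list of MeSH headings per pubcast (hypothetical input).
    """
    # set() so a heading repeated within one pubcast is counted once.
    counts = Counter(h for headings in heading_lists for h in set(headings))
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo or 1  # avoid division by zero when all counts are equal
    return {
        heading: min_size + (count - lo) * (max_size - min_size) // span
        for heading, count in counts.items()
    }
```

For three pubcasts tagged [["HIV Protease", "Proteins"], ["HIV Protease"], ["HIV Protease", "Crystallography"]], "HIV Protease" gets size 32 and the other two headings get size 10.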
SciVee – Viral Projects
• Sweetwater School District
• “Postercasts”
• Science video competitions
• “Pubumentaries”
Acknowledgements
• SciVee Team
– Apryl Bailey
– Tim Beck
– Leo Chalupa
– Marc Friedman
– Alex Ramos
– Willy Suwanto
CT Watch 2007, 3(3) 26-31
• BioLit Team
– J. Lynn Fink
– Sergey Kushch
– Parker Williams
– Greg Quinn
[email protected]
Questions?