EMBL-EBI Powerpoint Presentation - National e


Bibliography 2.0: A case study from the
Wellcome Trust Genome Campus
Dr. Duncan Hull
http://twitter.com/dullhunk
European Bioinformatics Institute, EBI.ac.uk
e-Science workshop: The influence and impact
of Web 2.0 on various applications
11th-12th May 2010, Edinburgh
EBI is an Outstation of the European Molecular Biology Laboratory.
Overview
• Introduction: Wellcome Trust Genome Campus
• The European Bioinformatics Institute (ebi.ac.uk)
• The Wellcome Trust Sanger Institute (sanger.ac.uk)
• The Library
• Problem: economics and “freakonomics” of publishing
• The unintended consequences of “publish or perish”
• Burying data in publication silos
• Obscuring identities and obstructing social applications
• Solution? Bibliography 2.0 with citeulike
• Incentives
• Disincentives
• Case study: What we’ve learnt
• Conclusions and future work
13.04.2015
Wellcome to the Genome Campus
Home of
• The European Bioinformatics Institute
• The Sanger Institute
Just outside Cambridge, UK
EBI: a data hub for bioinformatics in Europe
• Literature: ebi.ac.uk/citexplore
• Genomes: ensembl.org
• Protein sequence: uniprot.org
• DNA + RNA sequences: ebi.ac.uk/ena
• Protein structure: ebi.ac.uk/pdbe
• Transcriptomes: e.g. ArrayExpress
• Small molecules: ebi.ac.uk/chebi and ebi.ac.uk/chembl
• Protein domains, families: ebi.ac.uk/interpro
• Protein-protein interactions: ebi.ac.uk/intact
• Pathways: reactome.org
• Systems: biomodels.net
~400 staff (research/services), publishing data on the web
e.g. Chemical Entities of Biological Interest (ChEBI)
Free database/ontology of 500,000 small molecules (many drugs)
The Wellcome Trust Sanger Institute
• The Sanger Institute is a world-leading genome research
institute, funded by a charity (The Wellcome Trust), that uses
DNA sequencing to further understanding of gene function in
health and disease
• From THE human genome ten years ago to
1000 genomes today (2010)
• More Bio than Informatics (cf. EBI), with a
progressive approach to Web 2.0, e.g.
• Daub, J., et al. (2008). The RNA WikiProject:
community annotation of RNA families. RNA
14 (12), 2462-2464. DOI:10.1261/rna.1200508
• http://en.wikipedia.org/wiki/Wikipedia:WikiProject_RNA
Alex Bateman
~900 Sanger staff (total)
Shared Library
More later
Annual journal subscription budget: £500,000
(modest compared to the multi-million-pound journal budgets of
university libraries)
“People respond to incentives,
although not necessarily in ways
that are predictable and manifest.
Therefore, one of the most powerful
laws in the universe is the law of
unintended consequences. This
applies to schoolteachers and
Realtors and crack dealers as well
as expectant mothers, sumo
wrestlers, bible salesman, and the
Ku Klux Klan…”
…and scientists too…
Unintended consequences, an example
• Incentive: “publish or perish”
• Publications are rewarded with recognition, hiring, promotion,
tenure, fame, funding, fortune, prizes, job satisfaction etc
• Unintended consequences:
• Valuable data gets damaged, destroyed or “buried” (see later)
• Inaccessible to data and text mining on the Web
• Copyright and toll-access journals
• Luddite scientists
• Minimal exploitation of social software for sharing data
• Minimal exploitation of Web 2.0 for sharing data
Why bury it [data] first
and then mine it again?
Which gene did you mean?
BMC Bioinformatics. 2005 Jun 7;6:142
DOI:10.1186/1471-2105-6-142
Barend Mons, Wikiproteins http://proteins.wikiprofessional.org
• Gene names: e.g. Hexokinase, HK1, HK2, HK3
• Protein names: e.g. Hexokinase, HK1, HK2, HK3
• Chemical names: e.g. Glucose-6-phosphate, G6P, Glu, Gluc
• Author names: e.g. Mark Baker (see next slide)
• Poor precision and recall
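To make "poor precision and recall" concrete, here is a minimal sketch of how the two measures behave when one name string is shared by several things and one thing has several names. The paper IDs, the name strings attached to them, and the helper `precision_recall` are hypothetical illustrations, not data from the talk.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for retrieved vs. relevant item sets."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy corpus: paper ID -> the name string used in its text.
papers = {
    "p1": "HK1",           # about the gene we care about
    "p2": "hexokinase 1",  # same gene, written with a synonym
    "p3": "HK1",           # assumed unrelated use of the same abbreviation
    "p4": "HK2",           # a different gene
}
relevant = {"p1", "p2"}  # papers that really are about our gene

# A naive exact-string search for "HK1".
retrieved = {pid for pid, name in papers.items() if name == "HK1"}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the search returns one wrong paper (lowering precision) and misses the synonym (lowering recall), which is the pattern the slide describes for gene, protein, chemical and author names alike.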
Identity crisis: Mark Baker
http://pubmed.gov?term=Baker+M[author]
http://pubmed.gov?term=Mark+Baker[author]
etc
Until we have unique author identifiers, it is difficult or
impossible to reliably find the papers published by a
particular person
Open Researcher and Contributor ID http://orcid.org
“Tell me whenever Mark Baker publishes a paper”
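The ambiguity can be sketched in a few lines: the same person may be searched for under several name strings, and each string may also match other people. The snippet below only builds search URLs in the form shown on the slide; it does not contact PubMed. The helper names `name_variants` and `author_query_url`, and the choice of variants, are assumptions made for illustration.

```python
from urllib.parse import quote_plus

PUBMED_BASE = "http://pubmed.gov"  # base URL as written on the slide


def name_variants(first, last):
    """Hypothetical helper: a few ways one author's name might be queried."""
    return [f"{last} {first[0]}", f"{first} {last}", f"{last} {first}"]


def author_query_url(name):
    """Build a slide-style author search URL, leaving [ ] unescaped."""
    term = quote_plus(f"{name}[author]", safe="[]")
    return f"{PUBMED_BASE}?term={term}"


urls = [author_query_url(v) for v in name_variants("Mark", "Baker")]
for url in urls:
    print(url)
```

None of these strings identifies a person; each is just text that any number of authors may share. A unique author identifier of the kind ORCID proposes would replace the guesswork in `name_variants` with a single key.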
Social information (need identity for this)
• Socialisation: (e-science > “we-science”)
• How many other people have read this paper?
• What are my friends / enemies reading?
• What other papers did they also read?
• Personalisation (e-science > “me-science”)
• These are my publications
• This is my bibliography (stuff I’m reading / have read)
• Digital libraries “document-centred” rather than “people-centred”
Author name disambiguation in MEDLINE by: Vetle I. Torvik, Neil R.
Smalheiser ACM Trans. Knowl. Discov. Data, Vol. 3, No. 3. (2009),
pp. 1-29. DOI:10.1145/1552303.1552304
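As a rough sketch of why these social questions need identity, the snippet below answers "how many people have read this paper?" and "what other papers did they also read?" from per-user libraries. The user names, paper IDs and the function `also_read` are hypothetical, and this is not a description of how citeulike or any other service computes such lists.

```python
from collections import Counter


def also_read(libraries, paper):
    """Rank other papers by how many users who saved `paper` also saved them."""
    counts = Counter()
    for user_papers in libraries.values():
        if paper in user_papers:
            counts.update(p for p in user_papers if p != paper)
    return counts.most_common()


# Hypothetical libraries keyed by a unique user identity.
libraries = {
    "alice": {"paperA", "paperB", "paperC"},
    "bob": {"paperA", "paperB"},
    "carol": {"paperB", "paperD"},
}

# How many other people have read this paper?
readers = sum(1 for papers in libraries.values() if "paperA" in papers)

# What other papers did they also read?
ranking = also_read(libraries, "paperA")
print(readers, ranking)
```

Everything here depends on the dictionary keys being reliable identities: if "alice" and "bob" could not be told apart, neither count would mean anything.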
A solution, citeulike.org?
• http://www.citeulike.org
• Addresses the lack of personalisation of library data
• Addresses the lack of socialisation of library data
• Works a lot like http://www.delicious.com
Click Post to Citeulike
Tag it (optional) e.g. author tags
Journal picks is a group of 40+
invited users on campus, who
select interesting papers
2,016 unique articles in journal picks
(less than one year)
3,880,055 unique articles total
Citeulike + ZeitGeist = CiteGeist
http://www.citeulike.org/citegeist
Citeulike incentives
• Selfish scientist (just organise my reference mess)
• What’s popular (interesting stuff: CiteGeist)
• Serendipity (find papers you wouldn’t find normally)
• Increase visibility and PageRank of papers?
• Person-centred access points into first / second page of
Google results
e.g. http://www.google.com/search?q=carole+goble
has the result below fairly high up the list:
http://www.citeulike.org/group/10570/tag/carole-goble
Citeulike disincentives
• Privacy, don’t want to share with rivals
• (but can make collections private)
• Citeulike might go bust?
• But Springer sponsored
• Parsers are fragile
• easily (and deliberately) broken by publishers
• Valuable data in the hands of a commercial company?
• But Facebook? LinkedIn? Twitter etc?
• No academic reward for using it
• publication = “finished”
• Social software works best with network effects
• There are LOTS of other tools that do this…
And the rest…
www.refworks.com
www.zotero.org
www.mendeley.com (“Last.fm of research”)
www.hubmed.org
www.connotea.org
www.mekentosj.com (“iTunes for PDF files”)
Giant corporate commercial competitors
• With significant vested financial interests
• Scopus http://www.scopus.com/
• ISI WOK http://isiknowledge.com
Wrote a review of these systems: Hull, D., S. R. Pettifer,
and D. B. Kell (2008). Defrosting the digital library:
Bibliographic tools for the next generation web. PLoS
Comput Biol 4 (10), e1000204.
DOI:10.1371/journal.pcbi.1000204
Conclusions
• “Publish or perish” has some unfortunate and unintended
consequences in science
• Citeulike is an interesting Web 2.0 tool
• We’ve had some success using it (typical “long tail”)
• Weak incentives for use, many cultural barriers to adoption
• Technical barriers to adoption, many tools, messy data
• Future work
• Social network analysis, clickthroughs, tag analysis
• Any other ideas…
• But the times they are a changin’
• Citeulike or something like it will work much better if/when
“publishing” incentives change over time…
Acknowledgements
• Mark Baker for organising this workshop
• EBI, Christoph Steinbeck (laboratory head)
• Carole Goble, University of Manchester
• The Sanger, Alex Bateman, Frances Martin, Tim Hubbard
and all the contributors to the Journal Picks group
• Richard Cameron, Kevin Emamy and the rest of the
citeulike team
• BBSRC for funding
• Any questions?