Access Changes Everything
The Benefits of Open Access and
Open Semantics for Researchers
Leslie Carr
Intelligence, Agents and Multimedia Group
University of Southampton
Salutary Warning
• A scholar is just a library’s way of making
another library
– Daniel Dennett, Consciousness Explained
Thanks to Tim Brody and Stevan Harnad (Southampton University)
Outline
• Open Access
– Visionary Foundations
– Rationale: Research Impact
– Effect of Open Access on Research Impact
– Tools and Services
– Initiatives
• Semantic Web
– Introduction
– Resource Description
– Examples
• Concluding Thoughts
Open Access
Visionary Foundations
H. G. Wells, World Brain: The Idea of a
Permanent World Encyclopaedia
Encyclopédie Française, August, 1937
• encyclopaedias of the past sufficed for the
needs of a cultivated minority
– universal education was unthought of
– gigantic increase in recorded knowledge
– more gigantic growth in the numbers of
human beings requiring accurate and easily
accessible information
Permanent World Encyclopaedia
• Discontented with the role of universities
and libraries in the intellectual life of
mankind
• Universities multiply but do not enlarge
their scope
– thought & knowledge organization of the
world
• No obstacle to the creation of an efficient
index to all human knowledge, ideas and
achievements
Vannevar Bush, As We May Think
Atlantic Monthly, July 1945
• Director of the Office of Scientific Research
and Development in USA, coordinating 6,000
American scientists during WWII
• Turns to making our ‘bewildering store’ of
knowledge more accessible
• “For many years inventions have extended
man’s physical powers rather than the powers
of his mind.”
Memex
• The Memex (never built) was to be a
mechanised device to allow a library user to
– consult all kinds of written material
– organize it in any way the user wanted
– add private comments and link documents together at
will.
• A personal library station which held all written
articles and journals on microfilm.
– system of levers allowed users to add links
– create trails
Doug Engelbart
• Inventor of the mouse, was inspired by
Bush’s article.
• Computers were too expensive to be used
interactively and for non-numeric tasks
• Augment project (1962) to “develop
computer tools to augment human
capabilities and productivity”
Ted Nelson
• Hypertext is more than text (1965)
• Literature is a system of interconnected
documents
• Project Xanadu was a global literature: a
repository of documents, their multiple
versions and their interconnections.
Stevan Harnad, Scholarly
Skywriting, Psychological
Science (1990).
• Internet provides improvements in storing and
communicating ideas.
• The reward is improvement in generating ideas:
research.
• Greatest reward is the possibility of much
greater intellectual productivity in one lifetime.
Tim Berners-Lee
• Inventor of the WWW (1990)
• Intended as a tool for physicists at CERN
• Aim was to help quickly share research
results in collaborative projects
• Achieved through simple document,
communications and linking standards.
– simple standards caused rapid adoption
Paul Ginsparg
• Creator of the Los Alamos preprint archive
(1991)
• Now contains 280,000 articles
– High Energy Physics
– Computing
– Maths
– Quantitative Biology
• Founder of the Open Archives Initiative
Various Visions
• Wells : a centralised, managed global knowledge
repository to combat fragmenting academic authority.
• Bush : a cross-disciplinary scholarly paradigm to
combat fragmenting scientific knowledge.
• Engelbart : computers augment productivity
• Nelson : computers create a global literature
• Harnad : Internet to boost personal research impact
• Berners-Lee : low-impact, standards-based
document dissemination for scientific research
• Ginsparg : Web to speed up personal scientific
communication against publication delays
Fast Forward to Open Access
• The Optimal and Inevitable for Researchers.
– The entire full-text refereed corpus online
– On every researcher’s desktop, everywhere
– 24 hours a day
– All papers citation-interlinked
– Fully searchable, navigable, retrievable
– For free, for all, forever
Stevan Harnad, Les Carr
OpCit International DLI Project Proposal (1999)
Open Access
Rationale
Open Archives Initiative
• Initially UPS: Universal Preprint Service
– discussions initiated by Los Alamos HEP
archive (Paul Ginsparg)
– Inaugural meeting October 1999, Santa Fe
• Protocols to facilitate exchange of
metadata
– HTTP / XML Schema / Dublin Core
• Data provider / service provider distinction
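The data provider / service provider split can be illustrated with a minimal sketch of what a service provider does: send an HTTP request for metadata, then read Dublin Core fields out of the XML that comes back. This assumes the OAI-PMH 2.0 ListRecords verb and the oai_dc metadata format. The base URL, the sample response and the function names are hypothetical, and no network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 responses that carry Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical response from a data provider (invented record, for illustration).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:archive.example.org:42</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Eprint</dc:title>
          <dc:creator>Example, A.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the GET request a service provider would send to a data provider."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def parse_records(xml_text):
    """Extract identifier, titles and creators from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "titles": [e.text for e in rec.iterfind(".//dc:title", NS)],
            "creators": [e.text for e in rec.iterfind(".//dc:creator", NS)],
        })
    return records


if __name__ == "__main__":
    print(list_records_url("http://archive.example.org/oai"))
    print(parse_records(SAMPLE_RESPONSE))
```

A real harvester would also need to handle resumption tokens and error responses, which this sketch leaves out.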
EPrint Archiving Software
• A simple, turnkey environment for setting up an OAI-compliant archive
– Self archiving
– Institutional archives
• (other software available: DSpace, Fedora etc)
The Literature: As We Imagine
• Integrated
• Available
The Literature: As It Is
• Disjoint
• Inaccessible
Twin Peaks Problem
Harvards
financial firewalls
Impact
Access
Have-Nots
The Research-Impact Cycle
Open access to research output
maximizes
research access
maximizing (and accelerating)
research impact
(hence also research productivity
and research progress
and their rewards)
Limited Access: Limited Research Impact
12-18 Months
• Impact cycle begins: Research is done
• Researchers write pre-refereeing “Pre-Print”
• Submitted to Journal
• Pre-Print reviewed by Peer Experts – “Peer-Review”
• Pre-Print revised by article’s Authors
• Refereed “Post-Print” Accepted, Certified, Published by Journal
• Researchers can access the Post-Print if their university has a subscription to the Journal
• New impact cycles: New research builds on existing research
Maximized Research Access and Impact Through Self-Archiving
12-18 Months
• Impact cycle begins: Research is done
• Researchers write pre-refereeing “Pre-Print”
• Pre-Print is self-archived in University’s Eprint Archive
• Submitted to Journal
• Pre-Print reviewed by Peer Experts – “Peer-Review”
• Pre-Print revised by article’s Authors
• Refereed “Post-Print” Accepted, Certified, Published by Journal
• Researchers can access the Post-Print if their university has a subscription to the Journal
• Post-Print is self-archived in University’s Eprint Archive
• New impact cycles: Self-archived research impact is greater (and faster) because access is maximized (and accelerated)
• New impact cycles: New research builds on existing research
Research Impact
I. measures the size of a research contribution to further research (“publish or perish”)
II. generates further research funding
III. contributes to the research productivity and financial support of the researcher’s institution
IV. advances the researcher’s career
V. promotes research progress
Open Access
Effect on Research Impact
“Online or Invisible?” (Lawrence 2001)
“average of 336% more citations to online articles compared to offline
articles published in the same venue”
Lawrence, S. (2001) Free online availability substantially increases a paper's
impact Nature 411 (6837): 521.
http://www.neci.nec.com/~lawrence/papers/online-nature01/
Open vs non-Open Impact
(All Physics)
[Chart: Open Access vs. Non-Open Access Citation Impact Ratios, All Physics Fields, 1992-2001. Series: Open Access/Non-Open Access Impact Ratio; Open Access Articles as a Percentage of All Articles; Total Open Access and Non-Open Access Articles]
Open vs non-Open Impact
(Nuclear Physics)
[Chart: Open Access vs. Non-Open Access Citation Impact Ratios, Nuclear and Particle Physics, 1992-2001. Series: Open Access/Non-Open Access Impact Ratio; Open Access Articles as a Percentage of All Articles; Total Open Access and Non-Open Access Articles]
Open vs non-Open Impact
(Chemical Physics)
[Chart: Open Access vs. Non-Open Access Citation Impact Ratios, Chemical Physics, 1992-2001. Series: Open Access/Non-Open Access Impact Ratio; Open Access Articles as a Percentage of All Articles; Total Open and Non-Open Access Articles]
Open vs non-Open Impact
(General Physics)
[Chart: Open Access vs. Non-Open Access Citation Impact Ratios, General Physics, 1991-2002. Series: Open Access/Non-Open Access Impact Ratio; Open Access Articles as a Percentage of All Articles; Total Open and Non-Open Access Articles]
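The charts report an Open Access/Non-Open Access Impact Ratio as a percentage, but the slides do not give the formula. A minimal sketch follows, assuming the ratio is the mean citation count of open-access articles divided by the mean citation count of non-open-access articles from the same field and year. The citation counts are invented for illustration and the function names are hypothetical.

```python
from statistics import mean


def impact_ratio(oa_citations, non_oa_citations):
    """Assumed definition: mean OA citations / mean non-OA citations, in percent."""
    return 100.0 * mean(oa_citations) / mean(non_oa_citations)


def percent_more(ratio_percent):
    """Convert a ratio in percent into 'X% more citations'."""
    return ratio_percent - 100.0


if __name__ == "__main__":
    # Invented citation counts for one field and year.
    oa = [12, 6, 9, 3]
    non_oa = [3, 2, 5, 2, 4, 2]
    ratio = impact_ratio(oa, non_oa)
    print(ratio, percent_more(ratio))
```

The second function matters when comparing sources: a ratio of 250% means 150% more citations, which is the form in which the Lawrence (2001) figure is quoted.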
Research Assessment, Research
Funding, and Citation Impact
“Correlation between RAE ratings and mean
departmental citations +0.91 (1996) +0.86
(2001) (Psychology)”
“RAE and citation counting measure
broadly the same thing”
“Citation counting is both more cost-effective
and more transparent”
(Eysenck & Smith 2002)
http://psyserver.pc.rhbnc.ac.uk/citations.pdf
Time-Course of Citations (red)
and Usage (hits, green)
Witten, Edward (1998) String Theory and Noncommutative Geometry Adv. Theor. Math. Phys. 2 : 253
1. Preprint or Postprint appears.
2. It is downloaded (and sometimes read).
3. Eventually citations may follow (for more important papers).
4. This generates more downloads, etc.
Usage Impact is correlated with Citation Impact
(Physics ArXiv: hep, astro, cond, quantum; math, comp)
http://citebase.eprints.org/analysis/correlation.php
Most papers are not cited at all
(Quartiles Q1 (lo) - Q4 (hi))
          All                  hep
Overall   r=.27, n=219328      r=.33, n=74020
Q1 (lo)   r=.26, n=54832       r=.23, n=18505
Q2        r=.18, n=54832       r=.23, n=18505
Q3        r=.28, n=54832       r=.30, n=18505
Q4 (hi)   r=.34, n=54832       r=.50, n=18505
(correlation is highest for high-citation papers/authors)
Average UK downloads per paper: 10
(UK site only: 18 mirror sites in all)
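The r values above are correlation coefficients between downloads and citations, reported overall and per quartile. A minimal sketch of that kind of analysis follows, assuming a Pearson correlation and quartiles formed by ranking papers on citation count. The slides do not say how the quartiles were drawn, and the function names and data layout are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient; None if either variable has no variance."""
    n = len(xs)
    if n != len(ys) or n < 2:
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def r_by_quartile(papers):
    """papers: list of (downloads, citations) pairs.

    Rank by citations and cut into four equal-sized groups, Q1 (lo) to Q4 (hi).
    Any remainder after dividing by four is dropped from the top of the ranking.
    """
    ranked = sorted(papers, key=lambda p: p[1])
    size = len(ranked) // 4
    result = {}
    for i, label in enumerate(["Q1 (lo)", "Q2", "Q3", "Q4 (hi)"]):
        group = ranked[i * size:(i + 1) * size]
        result[label] = pearson_r([d for d, _ in group], [c for _, c in group])
    return result
```

Returning None for a group with no variance matters here: since most papers are not cited at all, a low quartile can consist entirely of zero-citation papers, for which the coefficient is undefined.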
Some old and new scientometric
(“publish or perish”) indices of
research impact
• Peer-review quality-level and citation-counts of the journal in which the article appears
• citation-counts for the article
• citation-counts for the researcher
• co-citations, co-text, “semantic web” (cited with whom/what else?)
• citation-counts for the preprint
• usage-measures (“hits,” webmetrics)
• time-course analyses, early predictors, etc. etc.
Open Access
Tools and Services
Tools for
(a) creating OAI-compliant university eprint archives
(b) parsing and finding cited references on the web,
(c) reference-linking eprint archives,
(d) doing scientometric analyses of research impact,
(e) creating OAI-compliant open-access journals
http://software.eprints.org
http://paracite.eprints.org/
http://opcit.eprints.org/evaluation/Citebaseevaluation/evaluation-report.html
http://citebase.eprints.org/help/
http://psycprints.ecs.soton.ac.uk/
Citation Linking Service
Reference links on PDF
copies of papers
PDF technology from Open
Journals Project, David
Brailsford, Steve Probets,
David Evans
Citation-Ranked Search Service
Citation Visualisation Service
Open Access
Initiatives
The Budapest Open Access Initiative
The two open-access strategies: Gold and Green
Gold: Open-Access Publishing (OApub) (BOAI-2)
1. Create or Convert 23,000 open-access journals (1000 exist currently)
2. Find funding support for open-access publication costs ($500-$1500+)
3. Persuade the authors of the annual 2,500,000 articles to publish in new open-access journals instead of the existing toll-access journals
Green: Open-Access Self-Archiving (OAarch) (BOAI-1)
1. Persuade the authors of the annual 2,500,000 articles they publish in the existing toll-access journals to also self-archive them in their institutional open-access archives.
Berlin Declaration
on
Open Access to Knowledge in the Sciences and Humanities
http://www.zim.mpg.de/openaccess-berlin/berlindeclaration.html
The pertinent passages:
“Open access [means]:
“1. free... [online, full-text] access
“2. A complete version of the [open-access] work... is deposited...
in at least one online repository... to enable open access,
unrestricted distribution, [OAI] interoperability, and long-term
archiving.
“[W]e intend to... encourag[e].. our researchers/grant recipients to
publish their work according to the principles of... open access.”
What is needed for open access now:
1. Universities: Adopt a university-wide policy of making all university research output open access (via either the gold or green strategy)
2. Departments: Create and fill departmental OAI-compliant open-access archives
3. University Libraries: Provide digital library support for research self-archiving and open-access archive-maintenance.
4. Promotion Committees: Require a standardized online CV from all candidates, with refereed publications all linked to their full-texts in the open-access journal archives and/or departmental open-access archives
5. Research Funders: Mandate open access for all funded research (via either the gold or green strategy). Fund (fixed, fair) open-access journal peer-review service charges. Assess research and researcher impact online (from the online CVs).
6. Publishers: Become either gold or green.
RoMEO Directory of Publishers who have given their
Green Light to Self-Archiving
http://www.sherpa.ac.uk/romeo.php
http://romeo.eprints.org
Proportion of journals already formally giving their green light to author/institution self-archiving (already 83%) continues to grow:
Green light to self-archive    Journals    %               Publishers    %
Total                          10,673      (100%)          88            (100%)
Postprint and Preprint         3,855       36%             30            34%
Postprint                      1,772       +17% (=53%)     14            +16% (=50%)
Preprint                       3,253       +30% (=83%)     7             +8% (=58%)
Neither yet                    1,793       17%             37            42%
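The cumulative percentages in the table (53% and 83% for journals, 50% and 58% for publishers) can be checked with a few lines of arithmetic. This is a sketch, not part of the original slides. The assignment of counts to policy categories is the one under which the counts sum to the stated totals (10,673 journals and 88 publishers), and the function name is hypothetical.

```python
# Counts as given in the table; the category assignment is an assumption
# chosen because it reproduces the stated totals and percentages.
JOURNALS = {
    "Postprint and Preprint": 3855,
    "Postprint": 1772,
    "Preprint": 3253,
    "Neither yet": 1793,
}
PUBLISHERS = {
    "Postprint and Preprint": 30,
    "Postprint": 14,
    "Preprint": 7,
    "Neither yet": 37,
}

GREEN_POLICIES = ("Postprint and Preprint", "Postprint", "Preprint")


def cumulative_green(counts):
    """Running percentage of the total that is green, adding one policy at a time."""
    total = sum(counts.values())
    running = 0
    result = {}
    for policy in GREEN_POLICIES:
        running += counts[policy]
        result[policy] = round(100 * running / total)
    return result


if __name__ == "__main__":
    print(cumulative_green(JOURNALS))
    print(cumulative_green(PUBLISHERS))
```

Under these assumptions the running totals come out at 36, 53 and 83 percent for journals and 34, 50 and 58 percent for publishers, matching the figures on the slide.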
Percentage Green and Gray PUBLISHERS for years: 2003 (n=80), 2004 (n=88)
Percentage of green PUBLISHERS grew from 42% to 58% from 2003-2004
[Chart: YEARS 2003 VS. 2004 PUBLISHER SELF-ARCHIVING POLICIES. Categories: (no green light yet); Preprint; Postprint; Postprint + preprint]
Percentage Green and Gray JOURNALS for years: 2003 (n=7,135), 2004 (n=10,673)
Percentage of green JOURNALS grew from 55% to 83% from 2003-2004
[Chart: JOURNAL SELF-ARCHIVING POLICIES: YEARS 2003 VS. 2004. Categories: (no green light yet); Preprint; Postprint; Postprint + preprint]
OAIster, a cross-archive search engine, now covers over 250 OAI Archives
(about half of them Eprints.org Archives) indexing over 3 million items (but not
all research papers, and not all full-texts).
http://oaister.umdl.umich.edu/o/oaister/
[Chart: Number of Papers in OAIster (80 Archives), by Year, 1990-2003]
…but there are 2.5 million journal articles published per year!
Declaration of Institutional Commitment
to implementing
the Berlin Declaration on open-access provision
Our institution hereby commits itself to adopting and implementing an official institutional policy of
providing open access to our own peer-reviewed research output -- i.e., toll-free, full-text online
access, for all would-be users webwide -- in accordance with the Budapest Open Access Initiative
and the Berlin Declaration
UNIFIED OPEN-ACCESS PROVISION POLICY:
(OAJ) Researchers publish their research in an open-access journal if a suitable one exists
otherwise
(OAA) Researchers publish their research in a suitable toll-access journal and also self-archive it
in their own research institution's open-access research archive.
To sign: http://www.eprints.org/signup/sign.php
A JISC survey (Swan & Brown 2004) "asked authors to say how they would feel if
their employer or funding body required them to deposit copies of their published
articles in one or more… repositories. The vast majority... said they would do so
willingly.”
http://www.jisc.ac.uk/uploaded_documents/JISCOAreport1.pdf
Semantic Web
Introduction
Archiving: More than Articles
• Metadata collection and distribution
• Basis of OAI
• But extra effort for researcher
Semantic Web
• W3C activity to improve Web resources
– By providing metadata
– Formal descriptions of resources
– Based on strict standards
• RDF - Resource Description Framework
• RDF(S) - Schema Language for defining types of resources and types of properties
• OWL - Ontology language for more complex
relationships
Old Web Service
• Web server sends a document to a user
Modern Web Services
[Diagram: an invoice as a tree of data. Node labels: invoice; type = info; ref; id = xyz; item (number=1) with name and price; item (number=2) with name and price]
• Web server sends data to a program
Semantic Web
[Diagram: the same invoice data. Node labels: invoice; type = info; ref; id = xyz; item (number=1) with name and price; item (number=2) with name and price]
• Semantic web provides resources to users and their semantics to computers
Semantic Web
Resource Description
RDF: Metadata
• Data about data
– information about documents
• title, author, journal, date, keywords
– information about people
• role, history, salary, expertise
– information about exhibits
• catalogue number, price, date, artist
– information about metadata
• validity, purpose, compiler, authority
[Diagram: a picture annotated with kinds of metadata. Labels: Title; Artist; Ambient No. 3; Width; Height; Colour distribution; Shapes; Content: some hills, a lake and the sun; Represents: peace, tranquility]
Catalogue information. artist, title of the image or
picture, date acquired, dimensions.
Syntactic content. primitive features, e.g. colour,
texture and shapes.
Semantic content. what it’s supposed to represent,
e.g. painting of a landscape or a representation of
happiness.
RDF Model
• http://www.w3c.org/Intro.html – Author – Tim Berners-Lee
RDF Model
• subject: http://www.w3c.org/Intro.html
• predicate: Author
• object: Tim Berners-Lee
RDF Model
• subject: http://www.w3c.org/Intro.html
• predicate: author
• object: a node with its own properties
– email: [email protected]
– name: Tim Berners-Lee
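The subject / predicate / object model above can be sketched as plain data. The following is a minimal illustration in standard-library Python, not an RDF toolkit. The `_:author` node label and the helper names are hypothetical. The email property is left out because the address is elided in this transcript.

```python
from collections import namedtuple

# One RDF statement: a (subject, predicate, object) triple.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

PAGE = "http://www.w3c.org/Intro.html"
AUTHOR_NODE = "_:author"  # hypothetical label for the intermediate author node

# The third slide's graph: the page has an author, and the author has a name.
GRAPH = [
    Triple(PAGE, "author", AUTHOR_NODE),
    Triple(AUTHOR_NODE, "name", "Tim Berners-Lee"),
]


def objects(graph, subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return [t.object for t in graph
            if t.subject == subject and t.predicate == predicate]


def author_names(graph, page):
    """Follow page -> author -> name, a two-step walk through the graph."""
    names = []
    for node in objects(graph, page, "author"):
        names.extend(objects(graph, node, "name"))
    return names


if __name__ == "__main__":
    print(author_names(GRAPH, PAGE))
```

Because every statement has the same three-part shape, new properties can be added to the author node without changing the code that walks the graph.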
Semantic Web Examples
• Example Projects
– CS AKTive Space
– Web Photos
• Ontologies
– Role of ontologies
– How they dovetail with OAI
– DSpace / SIMILE
– Bridging the semantic gap
CS AKTive Space
• Integrating info from
– Eprint archives
– Home pages
– Funding agencies
Web Conference Photo
• Attendees upload photos for public display
• Can then be publicly annotated
• List of known people collected
– community
Web Photo RDF Model
• Ontologies used
– Dublin Core
– Friend-of-a-Friend
– Creative Commons Rights Management
– Geographical Locations
– Calendar Events
Simile
• DSpace / MIT / HP / W3C Semantic Web
and Digital Library project
• Many resources in many sites catalogued
with different schemes for different
purposes
• Use ontologies to switch between domains
and perform cross-domain searches
Simile Scenario
(Taken from DSpace User Group slides)
• Started on ARTstor island
– SUBJECT: Abstract
• Roamed around island
– SUBJECT: Abstract, CREATOR: Gorky
• Travelled over Gorky bridge to OCW island
– CREATOR: Gorky, IS PART OF: ...
• Found resource not on ARTstor island
• Travelled over Graham bridge to another part of ARTstor island
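The "bridge" between islands in this scenario amounts to mapping each catalogue's own field names onto a shared vocabulary, so that one query can run across both. A minimal sketch follows. The local field names, the records and the mapping tables are all invented for illustration and are not taken from SIMILE itself.

```python
# Hypothetical mappings from each site's local field names to shared terms.
MAPPINGS = {
    "artstor": {"artist": "creator", "theme": "subject"},
    "ocw": {"author": "creator", "topic": "subject", "course": "isPartOf"},
}

# Invented records, each catalogued under its own site's scheme.
CATALOGUES = {
    "artstor": [
        {"artist": "Gorky", "theme": "Abstract"},
        {"artist": "Graham", "theme": "Abstract"},
    ],
    "ocw": [
        {"author": "Gorky", "course": "Example Course"},
    ],
}


def to_shared(site, record):
    """Re-express a site-specific record in the shared vocabulary."""
    mapping = MAPPINGS[site]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def cross_domain_search(catalogues, **criteria):
    """Return (site, record) pairs whose shared-vocabulary view matches all criteria."""
    hits = []
    for site, records in catalogues.items():
        for record in records:
            shared = to_shared(site, record)
            if all(shared.get(key) == value for key, value in criteria.items()):
                hits.append((site, record))
    return hits


if __name__ == "__main__":
    # Crossing the "Gorky bridge": one query reaches both catalogues.
    print(cross_domain_search(CATALOGUES, creator="Gorky"))
```

In a real deployment the mapping tables would come from ontologies, not hand-written dictionaries. The point of the sketch is only that shared semantics let a search cross between catalogues built for different purposes.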
Semantic Web raison d’être
• Bridging between resources
• Through shared semantics of metadata
• Made possible by ontologies
Lessons for Open Access
• Collect and organise metadata
– and explain to authors the benefits of their
investments
• Researchers become responsible
maintainers of their output
– For sharing with their community
– For sharing with posterity
• Build value-added services that build on
shared agreements about meaning
Final Thoughts
• Open access improves science
• Network effect
– more participants -> better services
• Just do it!
– But start with small steps