Access Changes Everything
The Benefits of Open Access and Open Semantics for Researchers
Leslie Carr
Intelligence, Agents and Multimedia Group
University of Southampton

Salutary Warning
• A scholar is just a library’s way of making another library
  – Daniel Dennett, Consciousness Explained

Thanks to Tim Brody and Stevan Harnad (Southampton University)

Outline
• Open Access
  – Visionary Foundations
  – Rationale: Research Impact
  – Effect of Open Access on Research Impact
  – Tools and Services
  – Initiatives
• Semantic Web
  – Introduction
  – Resource Description
  – Examples
• Concluding Thoughts

Open Access: Visionary Foundations

H. G. Wells, World Brain: The Idea of a Permanent World Encyclopaedia
Encyclopédie Française, August, 1937
• encyclopaedias of the past sufficed for the needs of a cultivated minority
  – universal education was unthought of
  – gigantic increase in recorded knowledge
  – more gigantic growth in the numbers of human beings requiring accurate and easily accessible information

Permanent World Encyclopaedia
• Discontented with the role of universities and libraries in the intellectual life of mankind
• Universities multiply but do not enlarge their scope
  – thought & knowledge organization of the world
• No obstacle to the creation of an efficient index to all human knowledge, ideas and achievements

Vannevar Bush, As We May Think
Atlantic Monthly, July 1945
• Director of the Office of Scientific Research and Development in USA, coordinating 6,000 American scientists during WWII
• Turns to making our ‘bewildering store’ of knowledge more accessible
• “For many years inventions have extended man’s physical powers rather than the powers of his mind.”

Memex
• The Memex (never built) was to be a mechanised device to allow a library user to
  – consult all kinds of written material
  – organize it in any way the user wanted
  – add private comments and link documents together at will.
• A personal library station which held all written articles and journals on microfilm.
  – system of levers allowed users to add links
  – create trails

Doug Engelbart
• Inventor of the mouse, was inspired by Bush’s article.
• Computers were too expensive to be used interactively and for non-numeric tasks
• Augment project (1962) to “develop computer tools to augment human capabilities and productivity”

Ted Nelson
• Hypertext is more than text (1965)
• Literature is a system of interconnected documents
• Project Xanadu was a global literature: a repository of documents, their multiple versions and their interconnections.

Stevan Harnad, Scholarly Skywriting, Psychological Science (1990).
• Internet provides improvements in storing and communicating ideas.
• The reward is improvement in generating ideas: research.
• Greatest reward is the possibility of much greater intellectual productivity in one lifetime.

Tim Berners-Lee
• Inventor of the WWW (1990)
• Intended as a tool for physicists at CERN
• Aim was to help quickly share research results in collaborative projects
• Achieved through simple document, communications and linking standards.
  – simple standards caused rapid adoption

Paul Ginsparg
• Creator of the Los Alamos preprint archive (1991)
• Now contains 280,000 articles
  – High Energy Physics
  – Computing
  – Maths
  – Quantitative Biology
• Founder of the Open Archives Initiative

Various Visions
• Wells: a centralised, managed global knowledge repository to combat fragmenting academic authority.
• Bush: a cross-disciplinary scholarly paradigm to combat fragmenting scientific knowledge.
• Engelbart: computers augment productivity
• Nelson: computers create a global literature
• Harnad: Internet to boost personal research impact
• Berners-Lee: low-impact, standards-based document dissemination for scientific research
• Ginsparg: Web to speed up personal scientific communication against publication delays

Fast Forward to Open Access
• The Optimal and Inevitable for Researchers.
  – The entire full-text refereed corpus online
  – On every researcher’s desktop, everywhere
  – 24 hours a day
  – All papers citation-interlinked
  – Fully searchable, navigable, retrievable
  – For free, for all, forever
Stevan Harnad, Les Carr, OpCit International DLI Project Proposal (1999)

Open Access: Rationale

Open Archives Initiative
• Initially UPS: Universal Preprint Service
  – discussions initiated by Los Alamos HEP archive (Paul Ginsparg)
  – Inaugural meeting October 1999, Santa Fe
• Protocols to facilitate exchange of metadata
  – HTTP / XML Schema / Dublin Core
• Data provider / service provider distinction

EPrint Archiving Software
• A simple, turnkey environment for setting up an OAI-compliant archive
  – Self archiving
  – Institutional archives
• (other software available: DSpace, Fedora etc)

The Literature: As We Imagine
• Integrated
• Available

The Literature: As It Is
• Disjoint
• Inaccessible

Twin Peaks Problem
[Figure labels: Harvards; financial firewalls; Impact; Access; Have-Nots]

The Research-Impact Cycle
Open access to research output maximizes research access, maximizing (and accelerating) research impact (hence also research productivity and research progress and their rewards)

Limited Access: Limited Research Impact
[Flow diagram; time label: 12-18 Months]
• Impact cycle begins: Research is done
• Researchers write pre-refereeing “Pre-Print”
• Submitted to Journal
• Pre-Print reviewed by Peer Experts (“Peer-Review”)
• Pre-Print revised by article’s Authors
• Refereed “Post-Print” Accepted, Certified, Published by Journal
• Researchers can access the Post-Print if their university has a subscription to the Journal
• New impact cycles: New research builds on existing research

Maximized Research Access and Impact Through Self-Archiving
[Flow diagram; time label: 12-18 Months]
• Impact cycle begins: Research is done
• Researchers write pre-refereeing “Pre-Print”
• Pre-Print is self-archived in University’s Eprint Archive
• Submitted to Journal
• Pre-Print reviewed by Peer Experts (“Peer-Review”)
• Pre-Print revised by article’s Authors
• Refereed “Post-Print” Accepted, Certified, Published by Journal
• Researchers can access the Post-Print if their university has a subscription to the Journal
• Post-Print is self-archived in University’s Eprint Archive
• New impact cycles: Self-archived research impact is greater (and faster) because access is maximized (and accelerated)
• New impact cycles: New research builds on existing research

Research Impact
I. measures the size of a research contribution to further research (“publish or perish”)
II. generates further research funding
III. contributes to the research productivity and financial support of the researcher’s institution
IV. advances the researcher’s career
V. promotes research progress

Open Access: Effect on Research Impact

“Online or Invisible?” (Lawrence 2001)
“average of 336% more citations to online articles compared to offline articles published in the same venue”
Lawrence, S. (2001) Free online availability substantially increases a paper's impact. Nature 411 (6837): 521.
http://www.neci.nec.com/~lawrence/papers/online-nature01/

Open vs non-Open Impact (All Physics)
[Figure: Open Access vs. Non-Open Access Citation Impact Ratios, All Physics Fields. Categories: All, 1992 to 2001. Series: Open Access/Non-Open Access Impact Ratio; Open Access Articles as a Percentage of All Articles; Total Open Access and Non-Open Access Articles.]

Open vs non-Open Impact (Nuclear Physics)
[Figure: Open Access vs. Non-Open Access Citation Impact Ratios, Nuclear and Particle Physics. Categories: All, 1992 to 2001. Series: Open Access/Non-Open Access Impact Ratio; Open Access Articles as a Percentage of All Articles; Total Open Access and Non-Open Access Articles.]

Open vs non-Open Impact (Chemical Physics)
[Figure: Open Access vs. Non-Open Access Citation Impact Ratios, Chemical Physics. Categories: 1992 to 2001. Series: Open Access/Non-Open Access Impact Ratio; Open Access Articles as a Percentage of All Articles; Total Open and Non-Open Access Articles.]

Open vs non-Open Impact (General Physics)
[Figure: Open Access vs. Non-Open Access Citation Impact Ratios, General Physics. Categories: 1991 to 2002. Series: Open Access/Non-Open Access Impact Ratio; Open Access Articles as a Percentage of All Articles; Total Open and Non-Open Access Articles.]

Research Assessment, Research Funding, and Citation Impact
“Correlation between RAE ratings and mean departmental citations +0.91 (1996) +0.86 (2001) (Psychology)”
“RAE and citation counting measure broadly the same thing”
“Citation counting is both more cost-effective and more transparent”
(Eysenck & Smith 2002) http://psyserver.pc.rhbnc.ac.uk/citations.pdf

Time-Course of Citations (red) and Usage (hits, green)
Witten, Edward (1998) String Theory and Noncommutative Geometry. Adv. Theor. Math. Phys. 2: 253
1. Preprint or Postprint appears.
2. It is downloaded (and sometimes read).
3. Eventually citations may follow (for more important papers).
4. This generates more downloads, etc.

Usage Impact is correlated with Citation Impact
(Physics ArXiv: hep, astro, cond, quantum; math, comp)
http://citebase.eprints.org/analysis/correlation.php
(Quartiles Q1 (lo) - Q4 (hi))

Subset        r      n
All           .27    219328
  Q1 (lo)     .26    54832
  Q2          .18    54832
  Q3          .28    54832
  Q4 (hi)     .34    54832
hep           .33    74020
  Q1 (lo)     .23    18505
  Q2          .23    18505
  Q3          .30    18505
  Q4 (hi)     .50    18505

Most papers are not cited at all
(correlation is highest for high-citation papers/authors)
Average UK downloads per paper: 10 (UK site only: 18 mirror sites in all)

Some old and new scientometric (“publish or perish”) indices of research impact
• Peer-review quality-level and citation-counts of the journal in which the article appears
• citation-counts for the article
• citation-counts for the researcher
• co-citations, co-text, “semantic web” (cited with whom/what else?)
• citation-counts for the preprint
• usage-measures (“hits,” webmetrics)
• time-course analyses, early predictors, etc. etc.
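
The r values quoted above relate usage (downloads) to citations, both for whole sets of papers and within citation quartiles. The following is a minimal sketch of that kind of calculation, assuming the r values are Pearson correlations and that each paper is reduced to a (downloads, citations) pair. The toy numbers, the function names and the equal-sized quartile split are illustrative assumptions, not Citebase's data or method.

```python
import math

# Toy (downloads, citations) pairs, invented purely for illustration.
TOY_PAPERS = [
    (3, 0), (5, 0), (8, 1), (12, 1),
    (20, 2), (15, 4), (40, 9), (90, 30),
]


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # e.g. a group in which no paper is cited at all
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def by_citation_quartile(papers):
    """Sort papers by citation count and split them into four equal-sized groups.

    Assumption: any remainder when len(papers) is not divisible by 4 is dropped.
    """
    ranked = sorted(papers, key=lambda paper: paper[1])
    size = len(ranked) // 4
    return [ranked[i * size:(i + 1) * size] for i in range(4)]


if __name__ == "__main__":
    downloads = [d for d, _ in TOY_PAPERS]
    citations = [c for _, c in TOY_PAPERS]
    print("all papers: r = %.2f" % pearson_r(downloads, citations))
    labels = ("Q1 (lo)", "Q2", "Q3", "Q4 (hi)")
    for label, group in zip(labels, by_citation_quartile(TOY_PAPERS)):
        print(label, "n =", len(group))
```

With real data each quartile would hold thousands of papers, so a per-quartile r could be computed the same way; the toy groups here are too small (and partly constant) for that to be meaningful.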
Open Access: Tools and Services

Tools for
(a) creating OAI-compliant university eprint archives,
(b) parsing and finding cited references on the web,
(c) reference-linking eprint archives,
(d) doing scientometric analyses of research impact,
(e) creating OAI-compliant open-access journals
http://software.eprints.org
http://paracite.eprints.org/
http://opcit.eprints.org/evaluation/Citebaseevaluation/evaluation-report.html
http://citebase.eprints.org/help/
http://psycprints.ecs.soton.ac.uk/

Citation Linking Service
Reference links on PDF copies of papers
PDF technology from Open Journals Project, David Brailsford, Steve Probets, David Evans

Citation-Ranked Search Service

Citation Visualisation Service

Open Access: Initiatives

The Budapest Open Access Initiative
The two open-access strategies: Gold and Green

Gold: Open-Access Publishing (OApub) (BOAI-2)
1. Create or convert 23,000 open-access journals (1000 exist currently)
2. Find funding support for open-access publication costs ($500-$1500+)
3. Persuade the authors of the annual 2,500,000 articles to publish in new open-access journals instead of the existing toll-access journals

Green: Open-Access Self-Archiving (OAarch) (BOAI-1)
1. Persuade the authors of the annual 2,500,000 articles they publish in the existing toll-access journals to also self-archive them in their institutional open-access archives.

Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities
http://www.zim.mpg.de/openaccess-berlin/berlindeclaration.html
The pertinent passages:
“Open access [means]:
“1. free... [online, full-text] access
“2. A complete version of the [open-access] work... is deposited... in at least one online repository... to enable open access, unrestricted distribution, [OAI] interoperability, and long-term archiving.
“[W]e intend to... encourag[e].. our researchers/grant recipients to publish their work according to the principles of... open access.”

What is needed for open access now:
1. Universities: Adopt a university-wide policy of making all university research output open access (via either the gold or green strategy)
2. Departments: Create and fill departmental OAI-compliant open-access archives
3. University Libraries: Provide digital library support for research self-archiving and open-access archive-maintenance.
4. Promotion Committees: Require a standardized online CV from all candidates, with refereed publications all linked to their full-texts in the open-access journal archives and/or departmental open-access archives
5. Research Funders: Mandate open access for all funded research (via either the gold or green strategy). Fund (fixed, fair) open-access journal peer-review service charges. Assess research and researcher impact online (from the online CVs).
6. Publishers: Become either gold or green.

RoMEO Directory of Publishers who have given their Green Light to Self-Archiving
http://www.sherpa.ac.uk/romeo.php
http://romeo.eprints.org
Proportion of journals already formally giving their green light to author/institution self-archiving (already 83%) continues to grow:

Green light to self-archive   Journals   %              Publishers   %
Total                         10,673     (100%)         88           (100%)
Neither yet                   1,793      17%            37           42%
Preprint                      3,253      +30% (=83%)    7            +8% (=58%)
Postprint                     1,772      +17% (=53%)    14           +16% (=50%)
Postprint and Preprint        3,855      36%            30           34%

[Figure: Publisher self-archiving policies, 2003 (n=80) vs. 2004 (n=88): percentage of green and gray publishers, split into no green light yet, preprint, postprint, and postprint + preprint.]
Percentage of green PUBLISHERS grew from 42% to 58% from 2003 to 2004.

[Figure: Journal self-archiving policies, 2003 (n=7,135) vs. 2004 (n=10,673): percentage of green and gray journals, split into no green light yet, preprint, postprint, and postprint + preprint.]
Percentage of green JOURNALS grew from 55% to 83% from 2003 to 2004.

OAIster, a cross-archive search engine, now covers over 250 OAI Archives (about half of them Eprints.org Archives) indexing over 3 million items (but not all research papers, and not all full-texts).
http://oaister.umdl.umich.edu/o/oaister/
[Figure: Number of Papers in OAIster (80 Archives), by year, 1990 to 2003; labelled yearly values range from 5,701 to 243,558.]
…but there are 2.5 million journal articles published per year!

Declaration of Institutional Commitment to implementing the Berlin Declaration on open-access provision
Our institution hereby commits itself to adopting and implementing an official institutional policy of providing open access to our own peer-reviewed research output -- i.e., toll-free, full-text online access, for all would-be users webwide -- in accordance with the Budapest Open Access Initiative and the Berlin Declaration

UNIFIED OPEN-ACCESS PROVISION POLICY:
(OAJ) Researchers publish their research in an open-access journal if a suitable one exists
otherwise
(OAA) Researchers publish their research in a suitable toll-access journal and also self-archive it in their own research institution's open-access research archive.
To sign: http://www.eprints.org/signup/sign.php

A JISC survey (Swan & Brown 2004) “asked authors to say how they would feel if their employer or funding body required them to deposit copies of their published articles in one or more… repositories. The vast majority...
said they would do so willingly.”
http://www.jisc.ac.uk/uploaded_documents/JISCOAreport1.pdf

Semantic Web: Introduction

Archiving: More than Articles
• Metadata collection and distribution
• Basis of OAI
• But extra effort for researcher

Semantic Web
• W3C activity to improve Web resources
  – By providing metadata
  – Formal descriptions of resources
  – Based on strict standards
• RDF - Resource Description Framework
• RDF(S) - Schema language for defining types of resources and types of properties
• OWL - Ontology language for more complex relationships

Old Web Service
• Web server sends a document to a user

Modern Web Services
[Diagram labels: invoice; type = info; ref; id = xyz; item number=1 (name, price); item number=2 (name, price)]
• Web server sends data to a program

Semantic Web
[Diagram labels: invoice; type = info; ref; id = xyz; item number=1 (name, price); item number=2 (name, price)]
• Semantic web provides resources to users and their semantics to computers

Semantic Web: Resource Description

RDF: Metadata
• Data about data
  – information about documents
    • title, author, journal, date, keywords
  – information about people
    • role, history, salary, expertise
  – information about exhibits
    • catalogue number, price, date, artist
  – information about metadata
    • validity, purpose, compiler, authority

[Figure: a picture annotated with metadata labels: Title; Artist; Ambient No. 3; Width; Height; Colour distribution; Shapes; Content: Some hills, a lake and the sun; Represents: peace, tranquility]
• Catalogue information: artist, title of the image or picture, date acquired, dimensions.
• Syntactic content: primitive features, e.g. colour, texture and shapes.
• Semantic content: what it’s supposed to represent, e.g. painting of a landscape or a representation of happiness.
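
The kinds of metadata listed above can all be written down as simple (subject, predicate, object) statements, which is the shape RDF gives them. The following is a minimal sketch in plain Python, not an RDF library. The picture identifier and the predicate names are hypothetical, and treating "Ambient No. 3" as the picture's title is an assumption read off the figure labels.

```python
from collections import defaultdict

# Hypothetical identifier for the picture; the slides do not give one.
PICTURE = "http://example.org/pictures/1"

# Each statement is a (subject, predicate, object) triple. The predicate
# names are illustrative labels, not terms from a published vocabulary.
TRIPLES = [
    (PICTURE, "catalogue:title", "Ambient No. 3"),  # assumed to be the title
    (PICTURE, "semantic:content", "Some hills, a lake and the sun"),
    (PICTURE, "semantic:represents", "peace"),
    (PICTURE, "semantic:represents", "tranquility"),
]


def describe(subject, triples=TRIPLES):
    """Collect every statement about one subject as {predicate: [objects]}."""
    description = defaultdict(list)
    for s, p, o in triples:
        if s == subject:
            description[p].append(o)
    return dict(description)


def subjects_with(predicate, obj, triples=TRIPLES):
    """Find every subject that has the given predicate/object pair."""
    return [s for s, p, o in triples if p == predicate and o == obj]


if __name__ == "__main__":
    for predicate, objects in describe(PICTURE).items():
        print(predicate, "->", objects)
    print(subjects_with("semantic:represents", "peace"))
```

Because every statement has the same three-part shape, a program can answer "what is known about this picture?" and "which pictures represent peace?" with the same data, without knowing the catalogue's layout in advance.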
RDF Model
[Diagram: http://www.w3c.org/Intro.html; Author; Tim Berners-Lee]

RDF Model
[Diagram: subject = http://www.w3c.org/Intro.html; predicate = Author; object = Tim Berners-Lee]

RDF Model
[Diagram: subject = http://www.w3c.org/Intro.html; predicate = author; object = a resource with email = [email protected] and name = Tim Berners-Lee]

Semantic Web: Examples
• Example Projects
  – CS AKTive Space
  – Web Photos
• Ontologies
  – Role of ontologies
  – How they dovetail in with OAI
  – DSpace / SIMILE
  – Bridging the semantic gap

CS AKTive Space
• Integrating info from
  – Eprint archives
  – Home pages
  – Funding agencies

Web Conference Photo
• Attendees upload photos for public display
• Can then be publicly annotated
• List of known people collected
  – community

Web Photo RDF Model
• Ontologies used
  – Dublin Core
  – Friend-of-a-Friend
  – Creative Commons Rights Management
  – Geographical Locations
  – Calendar Events

Simile
• DSpace / MIT / HP / W3C Semantic Web and Digital Library project
• Many resources in many sites catalogued with different schemes for different purposes
• Use ontologies to switch between domains and perform cross-domain searches

Simile Scenario (Taken from DSpace User Group slides)
• Started on ARTstor island
  – SUBJECT: Abstract
• Roamed around island
  – SUBJECT: Abstract, CREATOR: Gorky
• Travelled over Gorky bridge to OCW island
  – CREATOR: Gorky, IS PART OF: ...
• Found resource not on ARTstor island
• Travelled over Graham bridge to another part of ARTstor island

Semantic Web raison d’être
• Bridging between resources
• Through shared semantics of metadata
• Made possible by ontologies

Lessons for Open Access
• Collect and organise metadata
  – and explain to authors the benefits of their investments
• Researchers become responsible maintainers of their output
  – For sharing with their community
  – For sharing with posterity
• Build value-added services that build on shared agreements about meaning

Final Thoughts
• Open access improves science
• Network effect
  – more participants -> better services
• Just do it!
  – But start with small steps
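
As one small step, the bridging idea from the Simile scenario can be sketched in a few lines: two collections catalogue their resources with different field names, and a mapping onto shared terms lets one query cross from one "island" to the other. Everything in this sketch is hypothetical (the field names, the records, the course name and the function names); only the shared terms SUBJECT, CREATOR and IS PART OF and the names Gorky and Graham come from the scenario above.

```python
# Hypothetical per-collection field names mapped onto the shared terms.
FIELD_MAP = {
    "artstor": {"subject": "SUBJECT", "artist": "CREATOR"},
    "ocw": {"topic": "SUBJECT", "author": "CREATOR", "course": "IS PART OF"},
}

# Invented records, each catalogued in its own collection's scheme.
RECORDS = {
    "artstor": [
        {"id": "art-1", "subject": "Abstract", "artist": "Gorky"},
        {"id": "art-2", "subject": "Abstract", "artist": "Graham"},
    ],
    "ocw": [
        {"id": "ocw-1", "topic": "Modern Art", "author": "Gorky", "course": "Course A"},
    ],
}


def normalise(collection, record):
    """Re-express one record using the shared terms instead of local field names."""
    mapping = FIELD_MAP[collection]
    shared = {"id": record["id"], "collection": collection}
    for field, value in record.items():
        if field in mapping:
            shared[mapping[field]] = value
    return shared


def search(criteria):
    """Return ids of records, in any collection, matching all shared-term criteria."""
    hits = []
    for collection, records in RECORDS.items():
        for record in records:
            shared = normalise(collection, record)
            if all(shared.get(term) == value for term, value in criteria.items()):
                hits.append(shared["id"])
    return hits


if __name__ == "__main__":
    print(search({"SUBJECT": "Abstract"}))                      # stays on one island
    print(search({"SUBJECT": "Abstract", "CREATOR": "Gorky"}))  # narrows the result
    print(search({"CREATOR": "Gorky"}))                         # crosses the bridge
```

A real system would replace the hand-written mapping with an ontology, but the design choice is the same: searches are expressed in the shared terms, never in any one collection's local scheme.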