www.sherpa.ac.uk


Moving Forward with the
OpenDOAR Directory
Peter Millington
SHERPA Technical Development Officer
University of Nottingham, England
Outline
• Brief introduction to OpenDOAR
– What it is. Project time line
• OAI-PMH harvesting exercise
– Modus operandi
– Results for re-use policies
– Technical issues & performance
• Conclusions & Recommendations
• Prototype ‘policy generator’ tool
• Questions & Feedback
What is OpenDOAR?
• Directory of Open Access Repositories
• Coverage
– Institutional & Subject-based repositories; Funders’ OA archives
– Not covering: OA journals – see DOAJ – http://www.doaj.org/
• Authoritative evaluated data
– More than auto-harvested OAI data
– Proactive - more than data supplied by repository administrators
– Periodic review for currency and functionality
• Target users
– Search service providers, OA stakeholders, end-users
– Active dialogue with providers, administrators, funders, etc
• http://www.opendoar.org/
OpenDOAR Project Time Line
• Started early 2005
– University of Nottingham & University of Lund
– Funded by: OSI, JISC, CURL & SPARCEurope
• First public version
– January 2006
– Data built on work by Tim Brody, Southampton, & others
– 380 repositories (04-May-2006)
• Developing Version 2
– Additional fields & views
– Due summer 2006
Harvesting Modus Operandi
• Aims
– Familiarisation with OAI-PMH
– Investigation of repositories’ policies
• OAI-PMH protocol
– 315 Repositories in OpenDOAR with an OAI Base URL
– verb=Identify – policies from eprints.xsd schema
– Timings recorded & technical glitches noted
• Microsoft Excel Macros
– Prompted for operator interventions
– Such events would hamper auto-harvesting
• PHP
– Firewall problems – needed to use HTTP proxy server
– PHP functions would not handle HTTPS
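The harvest itself was run with Excel macros and PHP. As a rough illustration only, the same Identify request can be sketched with Python's standard library, which handles HTTPS when Python is built with SSL support. The function names are hypothetical. The optional proxy stands in for the firewall workaround, and each call records its elapsed time and any error instead of stopping for operator intervention.

```python
import time
import urllib.parse
import urllib.request
from typing import Optional, Tuple


def identify_url(base_url: str) -> str:
    """Add verb=Identify to an OAI base URL, keeping any query string it already has."""
    parts = urllib.parse.urlsplit(base_url)
    query = urllib.parse.parse_qsl(parts.query)
    query.append(("verb", "Identify"))
    return urllib.parse.urlunsplit(parts._replace(query=urllib.parse.urlencode(query)))


def build_opener(proxy: Optional[str] = None) -> urllib.request.OpenerDirector:
    """Route HTTP and HTTPS through an explicit proxy if one is given.

    Without one, urllib's default ProxyHandler still honours the
    http_proxy / https_proxy environment variables.
    """
    handlers = []
    if proxy:
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    return urllib.request.build_opener(*handlers)


def harvest_identify(
    base_url: str, opener: urllib.request.OpenerDirector, timeout: float = 60.0
) -> Tuple[Optional[bytes], float, Optional[str]]:
    """Fetch one Identify response; return (body or None, seconds taken, error or None)."""
    start = time.perf_counter()
    try:
        with opener.open(identify_url(base_url), timeout=timeout) as response:
            body = response.read()
        return body, time.perf_counter() - start, None
    except OSError as exc:  # URLError, HTTPError and socket timeouts all derive from OSError
        return None, time.perf_counter() - start, repr(exc)
```

A harvesting loop would call harvest_identify once per base URL and log the timing and error string for each repository, which corresponds to the "timings recorded & technical glitches noted" step.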
eprints.xsd Policy Criteria
• content
– Text and/or a URL linking to text describing the content of the
repository
– It would be appropriate to indicate the language(s) of the
metadata/data in the repository
• metadataPolicy
– Text and/or a URL linking to text describing policies relating to the
use of metadata harvested through the OAI interface
• dataPolicy
– Text and/or a URL linking to text describing policies relating to the
data held in the repository
– This may also describe policies regarding downloading data (full content)
• submissionPolicy
– Text and/or a URL linking to text describing policies relating to the
submission of content to the repository (or other accession
mechanisms)
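A minimal sketch of pulling the four fields above out of an Identify response. It assumes the eprints description uses the namespace http://www.openarchives.org/OAI/1.1/eprints, with optional URL and text children under each field. The function name and the returned dictionary layout are illustrative, not part of any OpenDOAR tool.

```python
import xml.etree.ElementTree as ET
from typing import Dict

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
EPRINTS_NS = "http://www.openarchives.org/OAI/1.1/eprints"
POLICY_FIELDS = ("content", "metadataPolicy", "dataPolicy", "submissionPolicy")


def extract_eprints_policies(identify_xml) -> Dict[str, Dict[str, str]]:
    """Map each eprints.xsd field present to its non-empty text and/or URL.

    Returns {} when the Identify response carries no eprints description at all.
    """
    root = ET.fromstring(identify_xml)
    eprints = root.find(f".//{{{OAI_NS}}}description/{{{EPRINTS_NS}}}eprints")
    if eprints is None:
        return {}
    policies: Dict[str, Dict[str, str]] = {}
    for name in POLICY_FIELDS:
        element = eprints.find(f"{{{EPRINTS_NS}}}{name}")
        if element is None:
            continue  # field omitted entirely
        entry = {}
        for child in ("text", "URL"):
            value = element.findtext(f"{{{EPRINTS_NS}}}{child}")
            if value and value.strip():
                entry[child] = value.strip()
        policies[name] = entry  # may be {} if the element was present but empty
    return policies
```

Keeping "field absent" (no key) distinct from "field present but empty" ({}) makes it easier to tell a missing schema from unfilled defaults when the results are tallied.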
Metadata Policy Results
(Pie chart: share of repositories by metadata policy category)
• Harvesting Problems: 9%
• Commercial allowed: 11%
• Non-commercial: 5%
• Cogprints-derived: 8%
• Other Allowed: 6%
• Nothing given: 40%
• URL: 3%
• No Rights: 1%
• Undefined: 17%
Metadata Policy Results
• No policy info for two thirds of repositories
– Technical problems with 9%
– No data provided for 40%
– ‘Undefined’ for 17% - EPrints default settings
• Policies given
– Nearly all permit re-use for non-commercial purposes
– A third seem to allow commercial re-use
• Many policies copied from other repositories
– e.g. CogPrints
• Issues for service providers
– Lack of easily accessible policy statements
– Prohibited re-sale of metadata – Why prohibited?
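Only some of the categories above can be assigned mechanically. The sketch below uses hypothetical names and assumes each repository's harvested fields arrive as a dictionary mapping field names such as metadataPolicy to optional text and URL entries, with None for a failed harvest. Treating the literal text "undefined" as the software default is an assumption; real default wording may differ. Everything else is flagged for manual reading, which is where judgments such as commercial or non-commercial re-use would be made.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

Harvested = Optional[Dict[str, Dict[str, str]]]  # None means the Identify request failed


def triage(harvested: Harvested, field: str = "metadataPolicy") -> str:
    """Assign one repository to a coarse bucket, or flag it for manual reading."""
    if harvested is None:
        return "Harvesting problems"
    entry = harvested.get(field) or {}
    text = entry.get("text", "").strip()
    url = entry.get("URL", "").strip()
    if not text and not url:
        return "Nothing given"
    if text.lower() == "undefined":  # hypothetical stand-in for software default wording
        return "Undefined"
    if url and not text:
        return "URL"  # the policy lives on a web page that must be read by hand
    return "Needs manual reading"


def percentages(results: Iterable[Harvested], field: str = "metadataPolicy") -> Dict[str, int]:
    """Rounded percentage of repositories per bucket (may not sum to exactly 100)."""
    counts = Counter(triage(r, field) for r in results)
    total = sum(counts.values())
    return {bucket: round(100 * n / total) for bucket, n in counts.items()} if total else {}
```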
[Full] Data Policy Results
(Pie chart: share of repositories by full data policy category)
• Harvesting Problems: 9%
• Fully Open: 1%
• Non-profit: 10%
• Cogprints-derived: 6%
• No Robots: 7%
• Unclear: 3%
• URL: 4%
• Nothing Given: 42%
• No Rights: 1%
• Undefined: 17%
Full Data Policy Results
• Also no policy info for two thirds of repositories
– Technical problems with 9%
– No data provided for 42%
– ‘Undefined’ for 17%
• Policies given
– Re-sale of full items nearly universally prohibited
– Unclear policy in ~7% of cases
– 7% prohibit harvesting by robots
• Prohibited harvesting by robots
– Total prohibition prevents full text indexing and analysis
– Transient harvesting should be permitted – e.g. CalTech
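Policy text alone does not stop a robot. A site wanting the compromise above (no bulk harvesting, but indexing allowed) might also express it in robots.txt, which well-behaved crawlers check. This is a sketch using Python's urllib.robotparser, assuming a hypothetical indexer user agent called ExampleIndexer and a hypothetical /fulltext/ path. Note that robots.txt can only allow or disallow fetching. It cannot say "transiently", so that condition still has to live in the written data policy.

```python
from urllib.robotparser import RobotFileParser


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt rules supplied as a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: keep all robots out of full texts, except one named indexer.
ROBOTS_TXT = """\
User-agent: ExampleIndexer
Allow: /fulltext/

User-agent: *
Disallow: /fulltext/
"""
```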
Content Policies
• Repository Type
– Institutional or departmental repository
– Multi-institution subject-based repository
• Subject Specialities
– Up to three, or ‘many’
• Type of Material
– e.g. Research papers, Theses, etc
• Publication Status
– Pre-prints (not peer-reviewed)
– Final peer-reviewed drafts (post-prints)
– Published versions
• Individual tagging with peer-review and publication status
• Principal Languages
– Up to three
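The content policy options above map naturally onto a small record type. This is one hypothetical way a policy tool might hold them, enforcing the "up to three" limits on subject specialities and languages. Storing "many" subjects as the single entry "many" is a convention assumed for this example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set


class RepositoryType(Enum):
    INSTITUTIONAL = "Institutional or departmental repository"
    SUBJECT = "Multi-institution subject-based repository"


class PublicationStatus(Enum):
    PREPRINT = "Pre-prints (not peer-reviewed)"
    POSTPRINT = "Final peer-reviewed drafts (post-prints)"
    PUBLISHED = "Published versions"


@dataclass
class ContentPolicy:
    repository_type: RepositoryType
    subjects: List[str]  # up to three, or the single entry "many" (assumed convention)
    material_types: List[str]  # e.g. "Research papers", "Theses"
    statuses: Set[PublicationStatus] = field(default_factory=set)
    languages: List[str] = field(default_factory=list)  # principal languages, up to three
    items_individually_tagged: bool = False  # peer-review/publication status per item

    def __post_init__(self) -> None:
        if not 1 <= len(self.subjects) <= 3:
            raise ValueError('give one to three subject specialities, or ["many"]')
        if len(self.languages) > 3:
            raise ValueError("give at most three principal languages")

    def summary(self) -> List[str]:
        """Plain-text lines a policy page might show, statuses in a fixed order."""
        lines = [self.repository_type.value,
                 "Subjects: " + ", ".join(self.subjects),
                 "Material: " + ", ".join(self.material_types)]
        lines += [s.value for s in PublicationStatus if s in self.statuses]
        if self.languages:
            lines.append("Languages: " + ", ".join(self.languages))
        return lines
```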
Submission Policies
• Eligible Depositors
– Role and/or Organisation unit
– Or their delegated agents
• Deposition Rules
– Who can deposit what – usually own work only
– Mandatory deposition of metadata
• Moderation (vetting)
– What, if anything, is vetted by the administrator
– e.g. eligibility, relevance, valid layout. Exclusion of spam
• Content Quality Control (Peer review)
– Responsibility for the validity and authenticity of the content
– Not checked, or checking by internal subject specialists.
• Copyright Policy
– Responsibility for copyright clearance
– Dealing with proven copyright violations
Interim Conclusions
• The eprints.xsd is not working
– Not used at all – or left ‘undefined’
– Muddled entries – e.g. items under wrong heading
• Why?
– Lack of awareness of its existence
– Unsupported by repository software package
– Insufficient guidance – possible language issues
– Some policies not covered – e.g. preservation
• But…
– Copying indicates a desire for model policies
– Plenty of good examples on which to base models
– Would be very useful to service providers, advocates, etc.
Recommendations
• For Repository Administrators
– Ensure the eprints.xsd schema is in your OAI configuration
– Put real policy info in the schema – not just ‘undefined’
– Fix any technical issues
– Avoid using HTTPS
• For OpenDOAR
– Encourage repository administrators to improve matters
– Provide model policies
– Provide a ‘policy generator’ tool for administrators
• Future Work
– Update eprints.xsd or replace with something new
– Re-analyse annually to monitor progress
OpenDOAR Policy Generator
• Aims
– Capturing policies using standard formulae
– Tool to help administrators formulate their policies
• Analysis of policies
– Identification of recurring phrases and concepts
– Natural language cluster analysis
• Selection of statements & options
– Appropriate to the policy type
– And meaningful
• OpenDOAR policy recommendations
– Minimum options – achieving OA goals but restricted
– Optimum options – refinements for more use or better quality
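The slide mentions natural language cluster analysis. A much simpler first step, sketched here and not the method actually used, is to count how many policy texts contain each three-word phrase. Phrases shared by several repositories (often through copying, as with CogPrints) are candidates for standard formulae. Function names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def word_ngrams(text: str, n: int) -> Iterator[Tuple[str, ...]]:
    """Consecutive n-word sequences, lower-cased; hyphenated words stay whole."""
    words = re.findall(r"[a-z0-9'-]+", text.lower())
    return zip(*(words[i:] for i in range(n)))


def recurring_phrases(texts: Iterable[str], n: int = 3, min_docs: int = 2) -> List[Tuple[str, int]]:
    """Phrases found in at least min_docs texts, with the number of texts containing each."""
    doc_freq: Counter = Counter()
    for text in texts:
        doc_freq.update(set(word_ngrams(text, n)))  # set(): count each phrase once per text
    return [(" ".join(gram), k) for gram, k in doc_freq.most_common() if k >= min_docs]
```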
Proposed Minimum Metadata Policy
• Anyone may access the metadata free of charge.
• The metadata may be re-used in any medium
– without prior permission for not-for-profit purposes
– provided the OAI Identifier and/or a link to the original
metadata record are given.
• The metadata must not be re-used in any medium
– for commercial purposes without formal permission.
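One way to picture the generator is that each option switches a standard sentence on or off. This hypothetical sketch rebuilds the proposed minimum metadata policy above from two switches. Setting commercial_reuse corresponds to the "allow re-sale of metadata" optimum idea. The sentence wording follows the slide, and the option names are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MetadataOptions:
    # Hypothetical switches; the defaults reproduce the proposed minimum policy.
    commercial_reuse: bool = False  # True = the "allow re-sale of metadata" optimum idea
    require_attribution: bool = True  # require the OAI Identifier and/or a link back


def metadata_policy(opts: MetadataOptions) -> List[str]:
    """Assemble a metadata policy from standard sentences chosen by the options."""
    purposes = "for any purpose" if opts.commercial_reuse else "for not-for-profit purposes"
    condition = (", provided the OAI Identifier and/or a link to the original metadata record are given"
                 if opts.require_attribution else "")
    lines = ["Anyone may access the metadata free of charge.",
             f"The metadata may be re-used in any medium without prior permission {purposes}{condition}."]
    if not opts.commercial_reuse:
        lines.append("The metadata must not be re-used in any medium for commercial purposes "
                     "without formal permission.")
    return lines
```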
Proposed Minimum Full Data Policy
• Anyone may access full items free of charge.
• Single copies of full items can be:
– Reproduced & displayed or performed in any format or
medium
– for personal research or study, educational, or not-for-profit
purposes
– without prior permission or charge.
• Full items must not be harvested by robots
– except transiently for full-text indexing or citation analysis
• Full items must not be sold commercially
– in any format or medium
– without formal permission of the copyright holders.
Proposed Minimum Submission Policy
• Items may only be deposited by accredited members
of the organisation, or their delegated agents.
• Authors/Depositors may archive only their own work.
• The administrator only vets items for the exclusion of
spam.
• The validity and authenticity of the content of
submissions is the sole responsibility of the depositor.
• Any copyright violations are entirely the responsibility
of the authors/depositors.
• If the repository receives proof of copyright violation,
the relevant item will be removed immediately.
Optimum Policy Ideas
• Metadata Policy
– Allow re-sale of metadata
– Increased visibility outweighs ‘exploitation’
• Full Data Policy
– Allow multiple copying – for educational purposes
– Allow full harvesting – LOCKSS-like preservation
• Submission Policy
– Mandatory deposition of metadata
– Mandatory deposition of thesis full texts
What Next?
• Consultation
– SHERPA partners
– Other interested parties
• Policy generator
– End-user testing – volunteers needed
– Ideas for output – e.g. text for EPrints configuration
• Refining recommended policies
– Ideas for minimum and optimum options
– Feedback on our proposals
• Aiming for release summer 2006
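For the "text for EPrints configuration" output idea, a generator could emit an eprints.xsd description fragment for a repository's Identify response. This sketch assumes the element order content, metadataPolicy, dataPolicy, submissionPolicy, with URL before text inside each. Check that order against the published eprints.xsd before use. Where the fragment goes depends on the repository software. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict

EPRINTS_NS = "http://www.openarchives.org/OAI/1.1/eprints"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
FIELDS = ("content", "metadataPolicy", "dataPolicy", "submissionPolicy")


def eprints_description(policies: Dict[str, Dict[str, str]]) -> str:
    """Serialise {field: {"URL": ..., "text": ...}} as an eprints description element."""
    ET.register_namespace("", EPRINTS_NS)  # note: registration is process-wide
    ET.register_namespace("xsi", XSI_NS)
    root = ET.Element(f"{{{EPRINTS_NS}}}eprints")
    root.set(f"{{{XSI_NS}}}schemaLocation",
             f"{EPRINTS_NS} http://www.openarchives.org/OAI/1.1/eprints.xsd")
    for name in FIELDS:
        entry = policies.get(name, {})
        element = ET.SubElement(root, f"{{{EPRINTS_NS}}}{name}")
        for child in ("URL", "text"):  # assumed schema order: URL before text
            if entry.get(child):
                ET.SubElement(element, f"{{{EPRINTS_NS}}}{child}").text = entry[child]
    return ET.tostring(root, encoding="unicode")
```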
Any Questions or Feedback?
http://www.opendoar.org/
Contact
Peter Millington
[email protected]
OpenDOAR Organisation
• The OpenDOAR Team
– University of Nottingham, England
• Bill Hubbard, Gareth Johnson, Peter Millington
– University of Lund, Sweden
• Lars Bjørnshauge, Kristoffer Lundqvist, Salam Baker Shanawa
• Our Funders
– Open Society Institute (OSI)
– Joint Information Systems Committee (JISC)
– Consortium of University Research Libraries (CURL)
– SPARCEurope