Building an Institutional Research Repository from the Ground Up: The ARROW Experience
Status Snapshot as of September 2004 (pre-Bandicoot)
Dr Andrew Treloar
Project Manager, Strategic
Information Initiatives &
ARROW Technical Architect
Vacant Lot
Context – Global
Increasing focus on content as institutional asset
Increasing proportion of this content is now born-digital or re-born digital
Wide uptake of software such as DSpace and
eprints.org
Open Access scholarship movement gathering
strength worldwide
Recent UK House of Commons STC report calling
for establishment of institutional research
repositories and mandated deposit
Context – Australian
Higher Education Information Infrastructure Advisory
Committee (HEIIAC) report in Nov 2002 identified need for
Research Information Infrastructure
DEST arranged Digital Object Repository Management meeting
in Sydney in May 2003
DEST called for RII bids in June 2003
Four successful:
Australian Digital Theses (ADT)
Australian Partnership for Sustainable Repositories
(APSR)
Meta Access Management System (MAMS)
Australian Research Repositories Online to the World
(ARROW)
Design Brief
Requirements – Content Streams
E-Prints
Pre-prints, postprints, working papers, etc
Digital theses
Masters and PhD
Electronic Publishing
Open-access ejournals
DEST Returns
Actually, database behind the returns
Non-University Research
‘Scholar in the Garden Shed’
Requirements – Content Types
Based on DSpace philosophy:
Lots of digital material is already lost
Most digital material is at risk
Preserving bits is better than nothing
It is important to capture as much information as possible
It will be necessary to evaluate cost/benefit trade-offs over
time
Decided to divide content into three types:
Supported
Known
Unsupported
Long list of actual types in referenced paper (URL at end)
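A minimal sketch of how the three content types might be applied at ingest. The registry entries and the `classify` helper are hypothetical illustrations (the actual list of types is in the referenced paper), and the comments on each level follow the usual DSpace reading of these terms, which is an assumption about how ARROW uses them.

```python
from enum import Enum


class SupportLevel(Enum):
    SUPPORTED = "supported"      # assumed: format recognised and actively preserved
    KNOWN = "known"              # assumed: format recognised, bits kept, no further promise
    UNSUPPORTED = "unsupported"  # assumed: format not recognised, bits kept only


# Hypothetical registry for illustration only; not the ARROW list of types.
FORMAT_REGISTRY = {
    "application/pdf": SupportLevel.SUPPORTED,
    "image/tiff": SupportLevel.SUPPORTED,
    "application/msword": SupportLevel.KNOWN,
}


def classify(mime_type: str) -> SupportLevel:
    """Return the support level for a MIME type, defaulting to UNSUPPORTED."""
    return FORMAT_REGISTRY.get(mime_type.strip().lower(), SupportLevel.UNSUPPORTED)


if __name__ == "__main__":
    for mime in ("application/pdf", "application/msword", "application/x-unknown"):
        print(mime, "->", classify(mime).value)
```

Defaulting to Unsupported rather than rejecting the file reflects the "preserving bits is better than nothing" point above.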
Architectural Drawings
Architecture Considerations
Common Repository
because boundaries between Research and
Teaching/Learning are very fluid
Series of Content Workflow and Management layers
to handle ingest/management of content
Exposure of content in variety of ways
to maximise access
ARROW OLAD
Building Materials - Foundation
Repository
Repository decision determines a number of other
aspects of project
Functionality
Type of application development
Lots of options available (refer
http://www.soros.org/openaccess/software/)
Version 3 of this report due out soon
Careful examination of alternatives narrowed quickly
to focus on DSpace & FEDORA
Repository – DSpace
Joint activity between MIT Libraries and Hewlett-Packard to
develop a software system that enables institutions to:
Capture and describe digital works using customized
workflow processes
Provide access to an institution's digital works so users
can search and retrieve items in the collection
Preserve digital works over the long term
Being made available under the BSD open source license to
other groups to run as-is, or to modify and extend as needed.
Can best be thought of as a general-purpose repository
application, with a series of both hard-wired and preferred
behaviours
Designed to provide stable long-term storage needed to house
the digital products of MIT faculty and researchers
Repository – FEDORA
Not the RedHat FEDORA...
Flexible Extensible Digital Object and Repository
Architecture
Joint venture between UVA Library and Cornell CS
Both a software platform and an architecture
Open source, digital object repository system using
public APIs exposed as web services
Best thought of as services-mediation infrastructure,
rather than an off-the-shelf application
Underlying object-based model
Repository – Decision
After lots of due diligence, decided to go with
FEDORA:
better/cleaner underlying architecture (flexible
not hierarchical)
easier to build on top of (APIs exposed as web
services)
designed from ground up as services provider
and mediator (not packaged application)
powerful idea of objects and disseminators
(content behaviours)
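The idea of objects and disseminators can be illustrated with a small toy model. This is a sketch of the concept only, not FEDORA's actual object model or API: the class, the datastream names (`TITLE`, `TEXT`), the method names (`getText`, `getCitation`) and the identifier `demo:1` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class DigitalObject:
    """Toy digital object: stored content plus named behaviours over that content."""
    pid: str
    datastreams: Dict[str, bytes] = field(default_factory=dict)
    disseminators: Dict[str, Callable[["DigitalObject"], str]] = field(default_factory=dict)

    def disseminate(self, method: str) -> str:
        """Run the named behaviour against this object's datastreams."""
        try:
            behaviour = self.disseminators[method]
        except KeyError:
            raise ValueError(f"{self.pid} has no disseminator named {method!r}") from None
        return behaviour(self)


def plain_text_view(obj: DigitalObject) -> str:
    # Behaviour 1: return the text datastream as a string.
    return obj.datastreams["TEXT"].decode("utf-8")


def citation_view(obj: DigitalObject) -> str:
    # Behaviour 2: derive a short citation from the title datastream.
    return f"{obj.datastreams['TITLE'].decode('utf-8')} [{obj.pid}]"


thesis = DigitalObject(
    pid="demo:1",
    datastreams={"TITLE": b"A Sample Thesis", "TEXT": b"Chapter 1 ..."},
    disseminators={"getText": plain_text_view, "getCitation": citation_view},
)

if __name__ == "__main__":
    print(thesis.disseminate("getCitation"))
```

The point of the sketch is that callers ask an object for a behaviour by name rather than reading its stored files directly, so new views can be added without changing the stored content.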
Construction Strategy:
Sub-Contract or DIY?
Original bid assumed that project would hire and manage
development team
ARROW Project Manager (Geoff Payne) realised we could do
much better by sub-contracting development work to a
company already familiar with FEDORA:
outsource risk
save time by avoiding initial learning curve
partner in way that met ARROW and company needs
increase attractiveness of FEDORA
build a sustainable support and enhancement model
VTLS the Builder
ARROW entered into contract with VTLS (Blacksburg, VA) to
acquire VITAL 1.0 (and successor versions)
extend the functionality of FEDORA either by contributing
back to the core FEDORA code or by writing a series of
ARROW-commissioned modules
ARROW-commissioned modules to be open-sourced using the
same license as the FEDORA code
VTLS will be able to build products on top of these new
ARROW-commissioned modules, but so will anyone else
Open-Access Publishing
VTLS won’t be writing all the modules
Need module to provide simple OA ejournal
publishing
Have decided to use the Open Journal System
(http://www.pkp.ubc.ca/ojs/) from the Public
Knowledge Project at UBC
Provides a high level of devolved functionality
Still deciding how best to integrate this with rest of
ARROW
Building Materials - Frame
Application Framework
ARROW-commissioned modules will
call FEDORA API-A (Access) and API-M (Management)
web services
expose themselves as Web Services
Possible that combination of ARROW-modules and FEDORA
will lead to refactoring of existing APIs into:
API-A (Access)
API-S (Search)
API-M (Management)
API-W (Workflow)
FEDORA Development Consortium
Announced at same time as ARROW-VTLS deal
Joint activity of FEDORA, VTLS, ARROW, and
others
partners selected on ability to contribute and
resources to make it happen
Rest of 2004 will be spent working out how this
might function
Work towards API-W will be used as process testbed
Building Materials - Doors and Windows
Search and Exposure
Exposure of metadata for OAI-PMH harvesting
Open Archives Initiative - Protocol for Metadata Harvesting
Each repository will be an OAI Data Provider
Support for direct searching via SRU/SRW
Simpler version of Z39.50
Exposure of full text (including derived full text) for spidering by
Google and other search engines
Local search gateways at each ARROW site
http://search.arrow.monash.edu.au/
National Resource Discovery Service offered by NLA
http://search.arrow.edu.au/
NLA acting as OAI Service Provider (as well as Data
Provider with their non-uni research repository)
Possible RSS feeds later
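As a rough illustration of the two machine interfaces listed above, the sketch below builds an OAI-PMH harvesting request and an SRU search request. The request parameters (`verb`, `metadataPrefix`, `set`; `operation`, `version`, `query`, `maximumRecords`) come from the OAI-PMH 2.0 and SRU 1.1 specifications. The base URLs and the sample query are hypothetical placeholders, not ARROW endpoints.

```python
from typing import Optional
from urllib.parse import urlencode


def oai_list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                         set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request for a data provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optional selective harvesting by set
    return f"{base_url}?{urlencode(params)}"


def sru_search_url(base_url: str, cql_query: str, maximum_records: int = 10) -> str:
    """Build an SRU 1.1 searchRetrieve request carrying a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": maximum_records,
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical endpoints, for illustration only.
    print(oai_list_records_url("http://repository.example.edu/oai"))
    print(sru_search_url("http://repository.example.edu/sru", 'dc.title = "repository"', 5))
```

A harvester such as a national discovery service would issue the first kind of request periodically and index the returned metadata, while the second kind is answered live by each repository.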
ARROW Branded Services Profile
[Diagram: ARROW branded services, reached via the Internet]
National Library of Australia: ARROW Resource Discovery Service, using TeraText to index metadata harvested by OAI-PMH
Swinburne, UNSW, Monash, National Library of Australia: ARROW Repository, digital object storage using Fedora & VITAL
ARROW Open Access Journal Publishing System: using OJS from Public Knowledge Project
ARROW Web Site: project information; members-only area (meeting minutes etc)
Internet Search Engines: capture text exposed by ARROW Repositories
Building Site
State of Development
Funding commenced in February
A$3.66 million over 3 years
Project Manager appointed in February
Contract with VTLS signed in June
FEDORA Phase 2 funding secured in June
US$1.4 million over 3 years
Anticipated delivery of ARROW Phase 1 (Bandicoot)
functionality in September
Anticipated delivery of ARROW Phase 2 (Bilby) functionality in
February 2005
Phased Deliverables
DEST Metadata
Collections
Copyright support
Object validation
Search engine support
Still Images
PDF
RTF
XHTML
SRU/SRW
Web-based XML Editor
SMIL
Audio
Video
DEST Reporting
Multiple Object Viewing and
Editing
Open House?
What we’ve learned already
All IT projects involve People, Processes and Technology. In
addition, this one has a heavy focus on Content.
These proportions are going to change over time
Component     2004   2005   2006
People          5%    20%    35%
Processes      10%    20%    10%
Technology     75%    20%     5%
Content        10%    40%    50%
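The proportions above can be checked quickly: each year's column should add up to 100%, and the dominant component shifts over time. The sketch below uses only the figures from the slide; the dictionary layout and function names are illustrative.

```python
# Estimated share of project effort per component, by year (figures from the slide).
effort = {
    "People":     {2004: 5,  2005: 20, 2006: 35},
    "Processes":  {2004: 10, 2005: 20, 2006: 10},
    "Technology": {2004: 75, 2005: 20, 2006: 5},
    "Content":    {2004: 10, 2005: 40, 2006: 50},
}


def yearly_totals(table):
    """Sum the percentage shares of all components for each year."""
    totals = {}
    for shares in table.values():
        for year, pct in shares.items():
            totals[year] = totals.get(year, 0) + pct
    return totals


def dominant_component(table, year):
    """Return the component with the largest share in the given year."""
    return max(table, key=lambda component: table[component][year])


if __name__ == "__main__":
    print(yearly_totals(effort))
    for year in (2004, 2005, 2006):
        print(year, dominant_component(effort, year))
```

Every year totals 100%, with Technology dominating in 2004 and Content taking over from 2005.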
ARROW Availability
ARROW partners (NLA, Monash, UNSW,
Swinburne) will be testing and refining beta software
this year and early next year
Hope to be able to offer ARROW more broadly
around mid-2005
http://arrow.edu.au/ will be regularly updated with
news and more information
Questions?
Project Manager: [email protected]
Technical Architect: [email protected]
Project web site: http://arrow.edu.au/
Updated version of AusWeb04 paper about development of ARROW architecture: http://andrew.treloar.net/research/publications/ausweb04/