Transcript DRIVER-ODaF

The DRIVER initiative
for networking repositories
Wolfram Horstmann
Universität Bielefeld
DRIVER motivation (and reactions)
Scholarly communication changes towards
distributed provision of text, data and services
Repositories are thought of as a saviour in this
development, building such a distributed system
An infrastructure supporting distributed
repositories and services is needed
Question today
Is an overarching infrastructure bridging between
distributed text-data and primary/secondary data
possible?
DRIVER has addressed many problems and found
many answers in the domain of distributed text
repositories
But we don't know yet whether these are
transferable to the data domain
Some observations on data
Data landscape very diverse
Formats differ widely – unlike text publications
Descriptions are often highly subject-specific
Some have special provenance (e.g. vendor software)
Some require special rendering, education, caution …
Data require disciplinary support
Better managed by researchers than service providers
Still, data interoperability acknowledged
Double effort: many data are lost to re-use/remix
„Good practice“ in research, also WRT publications
 Transparency, „Falsifiability“, testability …
Some observations on repositories
They represent a shift towards …
open internet-exposure
 as opposed to closed database (‚graveyards‘)
content orientation
 as opposed to mere technical orientation (‚web-servers‘)
distributed systems
 centralized structures not immediately required nowadays
Repositories can solve access problem
„Everybody can be a publisher“
Common description standards
e.g. Dublin Core Metadata Initiative
Many subject-specific standards
Common transfer protocols
e.g. OAI-PMH, but also FTP, XML-RPC, WS, etc.
Searchability is possible!
Still: many data are lost to re-use/remix
Closed: too sensitive, weakly described, unimportant (???)
Missing service frameworks / infrastructures
Problems: Data and service interoperability
Solution: „Infrastructure“
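To make the pairing of a common description standard with a common transfer protocol concrete, here is a minimal sketch of building an OAI-PMH ListRecords request and reading Dublin Core fields from a response. The repository URL, the sample response and the helper names are assumptions made for illustration; they are not taken from DRIVER.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hand-written sample response (hypothetical repository and record).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example article</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Extract identifier, titles and creators from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        titles = [e.text for e in rec.iterfind(".//dc:title", NS)]
        creators = [e.text for e in rec.iterfind(".//dc:creator", NS)]
        records.append({"identifier": ident, "titles": titles, "creators": creators})
    return records


if __name__ == "__main__":
    print(list_records_url("http://repository.example.org/oai"))
    print(parse_records(SAMPLE_RESPONSE))
```

The sketch leaves out resumption tokens, error responses and deleted records, which a real harvester would have to handle.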
What infrastructures are: DRIVER terms
Not an infrastructure
Single repository
Single application for search and retrieval (e.g. BASE)
Only local operation
Backwards causation on repositories is missing
Maybe an infrastructure
Distributed repository landscape as a whole
As a capacity for emergent properties, e.g. quality and quantity
incentive for data population
Nurturing development of service providers
Definitely an infrastructure
Many service providers in one organisational and technical
context (e.g. run-time environment)
Enabling re-use and remix of data and services
DRIVER Objectives
Organisational structure for repositories
e.g. the „Confederation“
Improving quality and standards in local repositories
e.g. validation procedures
Building a distributed runtime system
e.g. service and data sharing
Target Groups
Repository Managers
Service Providers
Information System Executives
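The "validation procedures" named among the objectives can be pictured as rule checks run against harvested metadata records. The sketch below is an illustration only: the required fields and the date pattern are hypothetical rules chosen for the example and are not the actual DRIVER Guidelines.

```python
import re

# Hypothetical rule set: fields every record must carry at least once.
REQUIRED_FIELDS = ("title", "creator", "date", "identifier")

# Hypothetical date rule: YYYY, YYYY-MM or YYYY-MM-DD.
DATE_PATTERN = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")


def validate_record(record):
    """Return a list of problems for one record.

    `record` maps a field name to a list of string values,
    e.g. {"title": ["..."], "date": ["2008-05-01"]}.
    An empty list means the record passed all checks.
    """
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing {field}")
    for value in record.get("date", []):
        if not DATE_PATTERN.match(value):
            problems.append(f"date not in YYYY[-MM[-DD]] form: {value}")
    return problems


if __name__ == "__main__":
    good = {"title": ["A"], "creator": ["B"], "date": ["2008-05-01"], "identifier": ["x"]}
    bad = {"title": ["A"], "date": ["May 2008"]}
    print(validate_record(good))
    print(validate_record(bad))
```

Returning a list of problems rather than a single pass/fail flag lets a repository manager see everything that needs fixing in one run.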
The DRIVER approach is incremental
Start with publication metadata
Existing distributed system, somehow connected
Considerable homogeneity of formats: OAI-PMH
Extend geographical coverage
From 5 countries, to 10, to 27, to ???
Extend towards other contents
From publication metadata to enhanced publications,
i.e. representations of „texts + data“
Learn about subject specificity
Data bring in disciplinary requirements
The DRIVER Initiative
DRIVER-I 6/2006 – 11/2007
Organisational Models and Technical Test-Bed
DRIVER-II 12/2007 – 11/2009
Running Organisation and Production Infrastructure
DRIVER-Confederation
2010ff
Operations Office and Technical Deployment
NB: DRIVER is not an authoritative body, it is a
liberal bottom-up initiative of stakeholders
DRIVER partners and related projects
Networking, Support, Policy, Studies
Göttingen, Nottingham, SURF, Ghent, Ljubljana,
Minho, Copenhagen
Technical development and deployment
Athens, Bielefeld, Pisa, Warsaw
Partners make links to many other things
OA-services: Sherpa-ROMEO, OpenDOAR, BASE…
Projects: Europeana, PEER, DELOS, DL.org,
D4Science, PARSE-Insight, NESTOR…
Orgs: DINI, JISC, LIBER, SPARC, KE …
Platforms: DSpace/Fedora/OPUS/EPrints
Some Results: Studies
Some Results: A Portal
Some Results: A Search
Some Results: Repository Registration
Some Results: Guidelines
Build on knowledge from past &
current IR projects (EU)
26 actively involved contributors
(experts and repository managers)
from 8 countries.
Practical answers on how to:
Improve full-text access
Standardize metadata quality
Create a reliable infrastructure for
permanent identification,
resolution, traceability and storage
Resolve semantic and
classification issues
Some Results: Support structures
Some Results: Repositories
185+ harvested repositories
21 countries
856,264+ documents
Some Results: Service-Oriented-Arch.
9 hosting nodes
25+ functionality typologies (services)
36 service instances
3 applications: DRIVER Main, Belgium, Spain-Recolecta
Some Results: Runtime-System & Hosting
[Figure: layered architecture of the runtime system. Labels: Advanced User Interfaces (National portals, Administrators, End users, Project Applications); Functionality Layer; Data Layer; Enabling Layer; EU Open Access Repositories]
Some Results: Software
Meant for large service providers only!
Current Work: DRIVER-II
Networking
Confederation with who-is-who advisory board
Outreach: LIBER, SPARC, US, JAPAN etc…
Consolidation
DRIVER-I
 Services packaged and performing in production quality
Enhancement
DRIVER-I Services
 Improved indexing and data aggregation functionalities
DRIVER-II Services
 Enhanced publication management and functionality
Outlook: Enhanced Publications
Based on OAI-ORE
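An enhanced publication, a representation of "texts + data", can be pictured as an OAI-ORE aggregation: a resource map describes an aggregation, which in turn aggregates the article and its datasets. The sketch below models this as a plain dictionary keyed by ORE term URIs. It is a simplified illustration, not a complete ORE serialization, and all example URIs are hypothetical.

```python
# Namespace of the OAI-ORE vocabulary terms.
ORE = "http://www.openarchives.org/ore/terms/"


def resource_map(rem_uri, aggregation_uri, parts):
    """Build a simplified, JSON-LD-like resource map.

    rem_uri          URI of the resource map itself
    aggregation_uri  URI of the aggregation (the enhanced publication)
    parts            URIs of the aggregated resources (text, datasets, ...)
    """
    return {
        "@id": rem_uri,
        "@type": ORE + "ResourceMap",
        ORE + "describes": {
            "@id": aggregation_uri,
            "@type": ORE + "Aggregation",
            ORE + "aggregates": [{"@id": uri} for uri in parts],
        },
    }


if __name__ == "__main__":
    # Hypothetical enhanced publication: one article plus one dataset.
    rem = resource_map(
        "http://repository.example.org/rem/1",
        "http://repository.example.org/aggregation/1",
        [
            "http://repository.example.org/article/1.pdf",
            "http://data.example.org/dataset/1.csv",
        ],
    )
    print(rem)
```

The aggregated parts are referenced by URI only, so the text and the data can live in different repositories.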
Lessons learnt
Distributed data infrastructure requires links
between organisational and technical concepts
Data specialists, computer scientists, service providers
Guidelines / content policies as a „glue“
In distributed data provision, quality and access
measures are the most ‚expensive‘ tasks
Distributed service operation (not data provision)
can be solved but raises novel questions (SLAs)
„Infrastructure“ is a very tough concept to get
across and eventually forms a complex system
Simplification makes it weaker, e.g. re-use is restricted
Summary
DRIVER tackles the data infrastructure challenge
from the text-repository side (mostly OAI-PMH)
DRIVER handshakes with primary & secondary
data through „enhanced publications“
DRIVER isn't only a project but a forum for
information specialists
‚Products‘ include: studies, infrastructure runtime system in production, software, support …
DRIVER has addressed many problems for data
and service interoperability and found solutions
What are the required steps to support data?
Thanks