FLOW: Federating Libraries on the Web

ACM/IEEE Joint Conference on Digital Libraries: Portland, July 17, 2002
Anna Keller Gold (UC San Diego Libraries); Karen Baker (Scripps Institution of Oceanography, LTER); Kim Baldridge (San Diego Supercomputer Center); Jean-Yves Le Meur (European Organization for Nuclear Research, CERN)
Outline:
1. In theory: defining repository success and developing system requirements to match
2. In practice: field report and local observations
3. Next steps: developing for the future
JCDL, July 17 2002
1. In theory:
* The individual, team and network have document management needs in common
* Building successful research repositories entails active participation by relevant research communities in the full range of repository activities (“GSD”):
  * Gather
  * Share
  * Discover
* Repository success depends on a good match between technical and social design, e.g. institutional vs. disciplinary repositories.
* Good social design remains an unsolved research problem. See the call for participants in the October 2002 conference addressing the cultural and management aspects of repository building, with an emphasis on institutionally based repositories: http://www.arl.org/ir2002.html
FLOW hypothesis:
Repository success depends on addressing the divergent roles of repository participants and multiple levels of organization, including:
1) Divergence among ingrained, more-or-less well-functioning workflows and practices
2) Multiple (and differing) motivations for participation by individuals, groups, networks, institutions and disciplines
Practices and motivations, e.g. of:
* Individuals
* Research groups
* Institutions
* Disciplines
Practices as individuals:
* Notebooks, articles, office files
* Mail, email, in person: circulate preprints by mail and email
* Personal web pages (multi-format links)
* Personal databases (e.g. flat files, or citation managers that support extracting, downloading and importing records)
* Deposit to / extract from disciplinary repositories (e.g. arXiv)
Motivations as individuals:
* Tenure (maintain lists of peer-reviewed publications; track citation counts)
* Manage knowledge for easy retrieval and discovery
* Exchange with key colleagues
* Participate in building shared knowledge
Practices of research groups:
* Internal databases (shared)
* Web sites with lists
Motivations of research groups:
* Manage knowledge
* Track output (for funding agencies)
* Track impact (greater exposure leads to greater impact)
* Discovery
Practices of institutions and organizations:
* Publish (e.g. tech reports, conference proceedings, journals)
* Create internal databases
* Establish repositories
* Establish libraries
* Hybrid library/repositories
Motivations of institutions:
* Sharing, discovery, and reputation
* Management and reporting (including accountability to funding agencies)
* Archiving
Practices of disciplines:
* Professional society databases, portals
* Establish disciplinary repositories (may be distributed & federated or centralized, e.g. NCSTRL, arXiv)
Motivations of disciplines:
* Sharing
* Discovery
FLOW:
* The distinctive document management tools and practices used within each layer (individual, group, center, network, discipline) represent boundaries across which information could flow openly if technology and metadata could provide an enabling digital framework (a “metadata grid”).
2. In practice:
* Field report of progress in creating a prototype repository at the San Diego Supercomputer Center (SDSC) using CERN’s CDSware
* Goal: to prototype a system that reconciles the divergent practices and motivations of target repository participants
CDSware: reasons for selection
* Proven institutional implementation at CERN
* Extended features fully implemented (personalization, review)
* OAI compliant
* Supports hybrid repository / bibliography
* Technical support and active development
* Open source
CDSware:
* The CERN implementation of CDSware manages over 350 collections of data, consisting of over 550,000 bibliographic records, including 220,000 full-text documents: preprints, articles, books, journals, photographs…
* http://cdsware.cern.ch/
CDSware:
* Configurable portal-like interface for hosting various kinds of collections:
  * Powerful search engine with Google-like syntax
  * User personalization, including document baskets and email notification alerts
  * Electronic submission and upload of various types of documents
  * Runs an OAI data and service provider, enabling metadata exchange between heterogeneous repositories
  * Automated citation recognition and linking
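The OAI data provider listed above answers plain HTTP GET requests defined by the OAI Protocol for Metadata Harvesting (OAI-PMH). As a minimal sketch, assuming OAI-PMH 2.0 and the mandatory oai_dc metadata format, a harvester builds ListRecords requests and pages through results with resumptionToken. The base URL, identifier and record below are hypothetical placeholders, not CDSware's actual endpoint or data.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0, the oai_dc schema and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    The first request names the metadata format; follow-up requests send
    only the resumptionToken, which OAI-PMH treats as an exclusive argument.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) for one ListRecords response page.

    An absent or empty resumptionToken means there are no further pages.
    """
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "id": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": [t.text for t in rec.iterfind(".//dc:title", NS)],
            "creator": [c.text for c in rec.iterfind(".//dc:creator", NS)],
        })
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)


# Hypothetical one-page response, shaped like an OAI-PMH 2.0 reply.
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample preprint</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
```

A complete harvester would loop, fetching `list_records_url(base, token)` until the token comes back as `None`; this sketch leaves out the HTTP fetch, OAI error responses and date-based selective harvesting.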
CDSware:
* MySQL database server (adaptable to Oracle)
* Apache/{PHP,Python} web application server
* Compile-time configuration via GNU Autoconf and WML
* Runtime configuration via MySQL configuration tables
* Integrates with other platform-independent services
  * E.g. CDS Conversion Server: converts file formats
* Extensible: enables the integration of any other installation-specific application
CDSware status:
* CDSware is a major revision and repackaging of CDS (CERN Document Server)
* First public release planned for July 2002
* Announcement and user mailing lists opened in June 2002
* News:
  * http://cdsware.cern.ch/news/
Why another repository?
* Repositories and their design diverge in important ways:
  * How things get in
  * How things get out
  * Who can put things in (and take out)
  * What things can be put in
  * What linkages they have to other systems
  * What protocols/standards they follow
Comparing repository tools (openEprints, CDSware, Reference Web Poster, library catalogs)
1. How things get in
   * openEprints: deposit by registered / authorized people
   * CDSware: deposit by registered / authorized people; upload from structured file
   * Reference Web Poster: upload by administrator from one or more private citation libraries; additions to a citation library can be batch-extracted from commercial sources, or entered individually by the private library manager
   * Library catalogs: FTP of batch files consisting of individual entries, or single-record copies from bibliographic utilities
2. How things get out
   * openEprints: OAI metadata harvesting protocol
   * CDSware: OAI metadata harvesting protocol; marked records can be downloaded singly or collectively; personal “baskets” can be made and shared; record output in XML, HTML, MARC and DC record formats; CERN applications support file format conversions
   * Reference Web Poster: marked records may be downloaded to citation management software (Z39.80)
   * Library catalogs: marked records may be extracted in printable or downloadable formats, e.g. to citation management software
Comparing repository tools (continued)
3. Who can put things in
   * openEprints: configurable (registered or authorized people); may include direct deposits by researchers, or be configured to “flow” deposits through administrators
   * CDSware: configurable (registered or authorized people: researchers in or outside the institution); may be linked to institutional ID
   * Reference Web Poster: administrator with access to server and commercial software
   * Library catalogs: specially trained and / or authorized staff, usually in libraries, using locally configured catalog software
4. What things can be put in
   * openEprints: focus on preprints and working papers (full text); other uses: conference proceedings (Caltech), other monographs
   * CDSware: configurable; current support for documents with metadata or metadata alone; CERN configured for preprints, commercial articles, books, photos, presentations, etc.; developing “people” records
   * Reference Web Poster: articles and conference proceedings are the focus
   * Library catalogs: monographic works and entire journals are the primary focus
Comparing repository tools (continued)
5. What linkages to other systems
   * openEprints: OAI supports cross-repository searching
   * CDSware: OAI supports cross-repository searching; linkages created to local applications and databases, e.g. a personnel database; upload from citation management software OK
   * Reference Web Poster: primarily to commercial article databases for which citation management download filters have been written
   * Library catalogs: federated search of other library catalogs via Z39.50; extraction and deposit to parallel collective catalogs (OCLC, union catalogs)
6. What protocols and standards are followed
   * openEprints: DC; OAI-PMH; crosswalks from other metadata formats
   * CDSware: OAI-PMH; DC; MARC 21; Z39.80 (in development)
   * Reference Web Poster: Z39.80; MARC
   * Library catalogs: MARC; Z39.50; Z39.80
CDSware at SDSC:
* How things get in:
  * One-by-one item deposits
  * Batch uploading from local collections
  * Goal: to also populate the collection via intelligent spidering of designated open collections/documents (ResearchIndex does this now)
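Batch uploading from local collections implies parsing files exported by citation managers. The slides do not name an exchange format; assuming the tagged RIS format (which EndNote and similar tools can export), a minimal parser might map RIS tags onto Dublin Core element names before deposit. The tag-to-DC mapping below is a hypothetical simplification, not CDSware's own import filter.

```python
import re

# One RIS line: a two-character tag, two spaces, a hyphen, then the value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")

# Hypothetical, simplified mapping from common RIS tags to Dublin Core elements.
RIS_TO_DC = {
    "TY": "type",
    "TI": "title", "T1": "title",
    "AU": "creator", "A1": "creator",
    "PY": "date", "Y1": "date",
    "KW": "subject",
    "AB": "description", "N2": "description",
    "PB": "publisher",
    "UR": "identifier",
}


def parse_ris(text):
    """Split a RIS export into records (dicts of DC element -> list of values)."""
    records, current = [], None
    for raw in text.splitlines():
        match = RIS_LINE.match(raw.rstrip())
        if not match:
            continue  # this sketch ignores blank and continuation lines
        tag, value = match.groups()
        if tag == "TY":  # TY always opens a new record
            current = {}
        if current is None:
            continue  # stray tag outside any record
        if tag == "ER":  # ER closes the current record
            records.append(current)
            current = None
            continue
        field = RIS_TO_DC.get(tag)
        if field and value:
            current.setdefault(field, []).append(value.strip())
    return records


# Hypothetical export containing a single journal article.
SAMPLE_EXPORT = """TY  - JOUR
AU  - Doe, Jane
AU  - Roe, Richard
TI  - A sample article
PY  - 2002
ER  - 
"""
```

Each parsed record could then be deposited singly or as a batch; real exports vary by tool (date formats, continuation lines) and would need further normalization.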
CDSware at SDSC:
* How things get out:
  * Extract to bibliographic software
  * Extract as XML
  * Extract as MARC 21 records
  * Extract as DC
  * Batch or single-item extraction
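Extracting records as DC can be as simple as serializing a record's fields into the oai_dc XML container that OAI-PMH uses. A minimal sketch with Python's standard library, assuming records are held as dicts mapping Dublin Core element names to lists of values (a hypothetical internal shape); schema-location attributes are omitted for brevity.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
# Make the serializer emit the conventional oai_dc: and dc: prefixes.
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)

# The fifteen elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def to_oai_dc(record):
    """Serialize a {dc_element: [values]} dict as an oai_dc XML string.

    Keys that are not DC 1.1 elements are ignored; repeated values become
    repeated elements, which Dublin Core allows.
    """
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for element in DC_ELEMENTS:
        for value in record.get(element, []):
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")
```

Emitting elements in a fixed DC order keeps batch exports stable and easy to compare, whether one record or many is extracted.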
CDSware at SDSC:
* Who can put things in (or take out):
  * Organization affiliates (tracked by personnel database)
  * Registered affiliates (voluntary deposits), associated by research collaboration or just research interest
  * Any interested parties (extract only)
CDSware at SDSC:
* What things can be put in:
  * Digital objects plus metadata
  * Metadata only
  * Document-like objects
  * Event records
  * People records (and associations with organizations and research groups)
CDSware at SDSC:
* What (data) linkages with other systems?
  * Now: personnel database at SDSC
  * Future:
    * NSF grants database
    * OpenURL
    * Storage Resource Broker (SRB)
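OpenURL is listed as a future linkage. In its 0.1 form, an OpenURL is an HTTP query string of standardized keys (such as genre, atitle, title, aulast, date, volume, spage) sent to an institution's link resolver. A minimal sketch, assuming OpenURL 0.1 key names; the resolver address and the `sid` value are hypothetical.

```python
from urllib.parse import urlencode

# A subset of the metadata keys defined by OpenURL 0.1, in output order.
OPENURL_01_KEYS = (
    "genre", "aulast", "aufirst", "atitle", "title",
    "issn", "isbn", "date", "volume", "issue", "spage", "epage",
)


def openurl_01(resolver_base, citation, sid="SDSC:CDSware"):
    """Build an OpenURL 0.1 link for a citation dict.

    `sid` identifies the referring source (a hypothetical value here);
    unknown or empty fields are dropped so the resolver sees only valid keys.
    """
    params = {"sid": sid}
    for key in OPENURL_01_KEYS:
        value = citation.get(key)
        if value:
            params[key] = value
    return resolver_base + "?" + urlencode(params)
```

Attaching such a link to each record in a result list would let a user reach their own library's copy of a cited article; dropping empty fields keeps the links short.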
CDSware at SDSC:
* What protocols / standards followed:
  * OAI Protocol for Metadata Harvesting (OAI-PMH)
  * MARC 21
  * Z39.80 (article databases, bibliographic software)
  * DC (Dublin Core)
Design decisions:
* People and digital objects:
  * Q: Are “creators” authors or people? A: Both.
  * Integration with personnel database (also enables organization views, e.g. “all the people associated with XYZ research group”)
  * Incorporate records for non-document objects (groups, people, grants)
  * Allow a hybrid system of metadata with or without associated digital objects
  * End-user uploading from EndNote or similar commercial citation management software is a goal
  * Genre-based views for the public; organization views for the center
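These design decisions can be sketched as a small relational schema: people, organizations, a membership table, and a creator table linking records to people, so that the “all the people associated with XYZ research group” view becomes a join. This sketch uses Python's built-in sqlite3 as a stand-in for the MySQL server CDSware runs on; all table and column names are hypothetical.

```python
import sqlite3

# Hypothetical schema: records may exist with or without a digital object
# (has_object = 0 marks a metadata-only record).
SCHEMA = """
CREATE TABLE person       (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE organization (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE membership   (person_id INTEGER NOT NULL REFERENCES person(id),
                           org_id    INTEGER NOT NULL REFERENCES organization(id),
                           PRIMARY KEY (person_id, org_id));
CREATE TABLE record       (id INTEGER PRIMARY KEY, title TEXT NOT NULL,
                           has_object INTEGER NOT NULL DEFAULT 0);
CREATE TABLE creator      (record_id INTEGER NOT NULL REFERENCES record(id),
                           person_id INTEGER NOT NULL REFERENCES person(id),
                           PRIMARY KEY (record_id, person_id));
"""


def people_in(db, org_name):
    """Organization view: everyone associated with the named group."""
    rows = db.execute(
        """SELECT p.name FROM person p
           JOIN membership m ON m.person_id = p.id
           JOIN organization o ON o.id = m.org_id
           WHERE o.name = ? ORDER BY p.name""",
        (org_name,),
    )
    return [name for (name,) in rows]


def records_of(db, org_name):
    """Every record with at least one creator in the named group."""
    rows = db.execute(
        """SELECT DISTINCT r.title FROM record r
           JOIN creator c ON c.record_id = r.id
           JOIN membership m ON m.person_id = c.person_id
           JOIN organization o ON o.id = m.org_id
           WHERE o.name = ? ORDER BY r.title""",
        (org_name,),
    )
    return [title for (title,) in rows]


# Illustrative data only.
db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.executemany("INSERT INTO person VALUES (?, ?)",
               [(1, "Person A"), (2, "Person B"), (3, "Person C")])
db.executemany("INSERT INTO organization VALUES (?, ?)",
               [(1, "XYZ research group"), (2, "Another group")])
db.executemany("INSERT INTO membership VALUES (?, ?)", [(1, 1), (2, 1), (3, 2)])
db.executemany("INSERT INTO record VALUES (?, ?, ?)",
               [(1, "Metadata-only report", 0), (2, "Full-text preprint", 1),
                (3, "Unrelated article", 1)])
db.executemany("INSERT INTO creator VALUES (?, ?)", [(1, 1), (2, 2), (3, 3)])
```

Because creators are person rows rather than free-text author strings, the same tables serve both the genre-based public views and the organization views for the center.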
Accomplishments:
* Formed an interdisciplinary team
* Assessed available repository software and design choices
* Demonstrated upload from a test citation management file
* Integrated the repository database with an internal “people” table linking people with organizations
* Grounded in both local practices and management demands
Next steps:
* Complete demonstration of submit and upload functions from citation management software and the grants database
* Populate the database using both individual and batch submissions
* Demonstrate internal views of data for program administrators
Conclusion:
* Further work is needed to address integration of repository building with researcher workflow.
* Further assess the centrality of people and organizations in digital libraries / repositories.
* Further assess the prospect of creating a metadata grid in which participation and flow are multilateral and multidirectional.
* In short, continued work toward…
D-Repository Grail:
* Accommodate current practices at all levels, and
* Enhance participation at all stages of the research / learning process.
Acknowledgements:
* Programming support: Frank Sudholt and Josh Polterock (SDSC)
* Integrative Biosciences at SDSC
* NSF (DBI and OPP)
References and more information
* CDSware: http://cdsware.cern.ch/
* CDS at SDSC:
[email protected]