Emerging Information Technologies: The Impact on Academic

Emerging Information Technologies:
The Role of XML, DOIs, OpenURL,
and Federated Search
William H. Mischo
Grainger Engineering Library Information Center
University of Illinois at Urbana-Champaign
2002 International Conference on Digital Archive
Technologies (ICDAT2002)
December 19, 2002
Outline
• Digital Libraries and the Distributed Information
Environment.
• Document Representation and Full-Text
• Digital Library Tools
• Illinois Projects.
• XML Technologies.
• Metadata Technologies.
• DOIs, Linking, Local Resolver
• Portals, Simultaneous Search, Linking
• Grainger Search Aid
• Issues & Trends.
The Digital Library
• ‘Digital’, ‘Virtual’, ‘Electronic’ Library as
network-based library without regard to
place and time.
• Tendency to apply term to collections and
resources.
• Digital Collections vs. Digital Library.
• Emphasis on the integration of collections
and services (e.g. NSDL grant).
• Application of standards and protocols is
important.
Scholarly Communication Overview
• E-Resources are Web-based and publisher-centric.
• Growth of Heterogeneous Distributed Repositories.
• Value-added services and ‘branding’ of journals.
• Prestige of Journals and Publishers.
• Reciprocal linking relationships between publishers.
• Cooperation on linking standards (DOI, CrossRef).
• Alternative publishing models - Academia, Preprint
Servers, disintermediation.
Distributed Information Environment
• We live in a world of multiple, heterogeneous
information repositories, resources, portals, and IR
systems.
– OPACs – local, regional, national shared bibliographic
databases.
– Local and remote A & I Services.
– Discrete publisher and vendor repositories (full-text).
– Web search engines, vertical portals, custom portals
(NSDL, ARL Portal).
– Local metadata, digital objects, GIS, finding aids.
– Preprint servers and institutional repositories (D-Space).
– Instructional (course) management systems (WebCT,
Blackboard).
– Harvestable (OAI) sites and services.
Distributed Repository - Issues
• Integration of discrete, heterogeneous information
resources.
• Role of federated and broadcast searching of distributed
resources.
• Integration of collections with reference, instructional
and navigation services -TOC, remote reference
assistance.
• Integration of Library, institutional, vendor, publisher,
and government portals and information services.
• Linking technologies.
• Metadata harvesting, archiving.
Distributed Environment Action Plan
• Pressing need for document representation,
retrieval, transmission, and linking
middleware tools and standards.
• Metadata standards, DOIs, OpenURL.
• Factor: changing landscape of Scholarly
Communication and disintermediation of
publishers and libraries.
• Federated search and simultaneous search
with reference linking as mechanism to
integrate DL landscape.
Portal Functions:
• Linking:
– Between full-text using DOI, CrossRef, Appropriate Copy.
– Between A&I and full-text.
– Between OPAC and full-text.
• Authorization.
• Linking mechanisms between resources and among resources.
• Simultaneous search.
• Navigation.
[Diagram components: Web Client; Portal Presentation Level; Local Link Server, Local Value-Added; OPAC; A&I Services (Local and Remote); E-Resource Registry; Aggregator (Ebsco, OCLC); Full-Text Resources; Publisher Portal (Elsevier); CrossRef Metadata; DOI Server; Web Resources; Local Databases and OAI; Knowledge Resources via DBMS Environments.]
Document Representation
• Continuum of Web-Enabled technologies - all presently being utilized.
• Evolving technologies and standards.
• Role and history of markup.
• XML: its role and importance.
• The Smart Document.
Digital Library Tools
• We have at our disposal the tools to create integrated
digital libraries from the distributed digital resources
environment in which we operate:
– Standard retrieval environment (Web) and interface/client
(Web Browser);
– Standard transport mechanisms to connect heterogeneous
content (HTTP, OAI, SOAP);
– Standard metalanguages and tools for describing and
transforming content and metadata (XML, DTDs & Schemas,
XSLT, DC/DCQ, RDF, METS);
– Standardized search/retrieval mechanisms (HTTP Post/Get,
SQL, Z39.50, Object Oriented Databases);
– Standard linking tools and infrastructure (DOI, OpenURL,
CrossRef).
• Candidate set of ‘best practices’ for IR.
Work by Illinois DLI Group
• We are attempting to address many of these issues within
the Digital Library Initiatives group.
• Headquartered at Grainger Engineering Library
Information Center at UIUC.
• Grant Work:
– Digital Library Initiative I (NSF, others), 1994-1998.
– Corporation for National Research Initiatives (CNRI) D-Lib
Test Suite, 1998-2001.
– Collaborating Partners Program, 1998--.
– Andrew Mellon Foundation OAI Harvesting grant, 2001-2002.
– NSF NSDL (National Science, Engineering, Technology, and
Mathematics Digital Library) Program, 2002-2004.
– Institute of Museum and Library Services (IMLS) Registry and
Integration grant, 2002-2005.
Illinois Testbed Project
• Funded under DLI-I by NSF, DARPA, and
NASA, 1994--1998. Awards made to 6
universities.
• Large-scale Testbed, Distributed Repository
models, evaluation, Web software.
• Funded under CNRI D-Lib Test Suite
Program, 1998-2001.
• Collaborating Partners Program. AIP, APS,
ASCE, IEE, NRL, ASM, ACM, NTT
Learning Systems, Elsevier.
• All XML Journal -- AIP, APS, ACM.
Illinois Full-Text Testbed
• American Institute of Physics--APL, JAP, RSI
– 19,000+ articles, 1995--.
• American Physical Society--PRL
– 15,000+ articles, 1995--, weekly updates.
• ASCE Journals (25 titles)
– 11,000+ articles, 1995--.
• IEE Proceedings and Electronics Letters
– 9,500+ articles, 1993--.
• IEEE Computer Society.
• ASM (American Society for Metals) Handbook.
• ACM (Association for Computing Machinery)
Transactions.
• Elsevier Science.
Accomplishments
• Process & retrieve from multiple publishers &
heterogeneous DTDs.
• SGML to XML Conversion.
• Development of a metadata specification that
uses RDF, Dublin Core (DCQ and XML) XML
Schemas, local Namespace.
• Cross-repository searching (Testbed & D-LIB
Test Suite). Full-Text and Metadata.
• XSLT, CSS, for transformation & rendering,
including Mathematics.
Accomplishments (2)
• Introduction of numerous technologies now deployed
within publisher repositories:
– Forward and Backward links in bibliographies -- within
Testbed/Repository, from/to A & I Services.
– Use of XSLT for transforming XML to HTML.
– Rich extended abstracts.
• Conversion of ISO 12083 math markup to MathML.
CSS/DHTML mathematics rendering. Use of plug-ins.
• Enhanced Web retrieval mechanisms: Author Word
Wheels, Co-Occurrence Matrices.
• Local Link Server for DOIs, Context-Sensitive linking.
XML (eXtensible Markup
Language)
• Like SGML, a Data Description Metalanguage. XML is a subset/version of SGML.
• Document representation and interchange Standard.
• Allows fine-granularity markup of content and structure.
• Authors can create their own elements (extensible).
• Tags define the structure of document not the presentation
format.
• Validated vs. “well-formed” - separation of authoring
process from representation & presentation.
• Either validated in DTD/Schema or well-formed.
• Integrated with relational DBs.
XML Features
• The milestones in document description and
transmission: ASCII, TCP/IP, HTTP and
HTML, XML. Web Programmability.
• DTD not required with XML. Needed if the document uses
entities beyond the five predefined ones.
• Use of Document Object Model (DOM).
• Technology approach from Web developer’s
standpoint: XML data, CSS presentation
layer, XSLT to transform the structure
(‘view’) of the data/document.
XML in Information Technologies
• Used in Open Archives Initiative (OAI),
NSDL.
• Compatible with MS SQL Server, Tamino
(Software AG), Oracle, DLXS/XPAT
(University of Michigan/OpenText), others.
• Integral to Web Services (WSDL) and
SOAP – Google Web Service.
• Used in Library of Congress MODS and
METS metadata technologies.
• Baked into XyVision and publishing
packages.
XML, XSLT, and CSS
• Use XML full-text articles as ordered hierarchy
of content objects.
• Generate item-level metadata in XML, using
RDF and Dublin Core syntax and semantics.
• XSLT and CSS used to present metadata and
articles in either XML or HTML format
depending on Browser.
• Mathematics rendering using MathML tools
(conversion from ISO 12083 to MathML).
• Real-time transformation between XML and
HTML using XSLT.
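The item-level metadata step can be pictured with a small sketch. It uses Python's standard library to emit one RDF description carrying Dublin Core elements. The namespace URIs are the published RDF and Dublin Core ones; the function name `build_metadata`, the field values, and the use of the example DOI as the `rdf:about` identifier are assumptions made for illustration, not the Illinois metadata specification.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Ask ElementTree to serialize with the conventional rdf: and dc: prefixes.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def build_metadata(about, fields):
    """Return an RDF/XML string describing one item with Dublin Core elements.

    `fields` is a list of (dublin_core_element_name, value) pairs.
    """
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": about}
    )
    for name, value in fields:
        ET.SubElement(description, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Illustrative values only; not taken from a real Testbed record.
record = build_metadata(
    "http://dx.doi.org/10.1063/333",
    [("title", "An example article"), ("creator", "Doe, J."), ("date", "1999-04")],
)
print(record)
```

A record of this shape is plain XML, so it could be rendered through XSLT or CSS like the articles themselves.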
Schemas vs. DTDs
• Both are systems of representing a data model
that defines the data’s elements and attributes,
and the relationship among elements.
• Schema addresses limitations of DTDs and the
increasingly data-oriented role of XML.
• W3C XML Schema Working Group: two
documents: XML structures and datatypes.
Schema Justification
• Description of document type’s structure should
be in an XML document instead of written in
special syntax (DTD).
• Schemas are in XML: easier to edit and process
using standard XML DOM manipulation tools.
• DTD notation doesn’t allow schema designers the
power to impose strong data typing -- for
example, the ability to say that a certain element
type must always have a positive integer value,
that it may not be empty, or that it must be one of
a list of possible choices.
Metadata and Linking Standards
• Digital Object Identifier (DOI) and Persistent
Object Identifiers.
• OpenURL and Value-Added Service
Components (SFX).
• Open Archives Initiative (OAI), Dublin Core
and Qualifiers, RDF.
• Local Resolver Servers.
Open Archives Initiative (OAI)
• Released version 1.0 of metadata harvesting
protocols. Frozen through second quarter 2001.
• Mechanism for data providers to expose their
metadata through an HTTP protocol and a
mechanism for harvesting records containing
metadata from repositories.
• Roots in e-print archives.
• Lightweight, low-barrier. Easy to implement Web
server to handle OAI protocol requests; need to
develop procedures to access and extract your
metadata.
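To make "expose metadata through an HTTP protocol" concrete, here is a minimal sketch of how a harvester might form a request. `ListRecords`, `metadataPrefix`, and `oai_dc` are names from the OAI protocol; the base URL and the helper name `oai_request_url` are hypothetical.

```python
from urllib.parse import urlencode


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI harvesting request as a plain HTTP GET URL."""
    params = {"verb": verb}
    # Skip arguments the caller left unset.
    params.update({k: v for k, v in arguments.items() if v is not None})
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository address, used only for illustration.
url = oai_request_url(
    "http://repository.example.org/oai", "ListRecords", metadataPrefix="oai_dc"
)
print(url)
```

The data provider answers such a request with an XML document of metadata records, which is why a basic Web server plus a metadata extraction routine is enough to participate.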
Ongoing Investigations
• Relationship between interoperability models for
search and discovery: federated searching (OAI
harvested) and broadcast, simultaneous searching of
distributed repositories. Not mutually exclusive.
• OAI Provider and Harvesting software. Encoded
Archival Description (EAD). OAI
Engineering/CS/Physics site.
• Role of HTTP harvesting, Spider technology.
• Reference Linking integration built on OpenURL and
DOI.
• Reference Assistant software with simultaneous search,
point-of-contact assistance, and remote reference
capability.
Portals and Gateways
• Role is to bring together and integrate
disparate e-resources.
• Provide a systematic ‘view’ of the
information landscape, particularly full-text.
• Two primary foci: robust search/navigation
and the ability to link everywhere from
anywhere in the environment of OPACs, A &
I Services, full-text.
• Central to this implementation is federated
and simultaneous search and reference linking
technologies.
Digital Object Identifier (DOI)
• DOI is both a unique identifier of a piece of
digital content AND a system to access that
content digitally. Persistent object identifier.
• ‘The ISBN for the 21st Century’ -- Norman
Paskin.
• DOI system has two main parts (the identifier
and a directory system) and a third logical
component, a database.
• Developed by AAP (Association of American
Publishers), now managed by International DOI
Foundation.
DOI Construction
• First real open standard for content identification.
• DOI is a number that identifies a digital object:
– 10.1063/S000369519903216
• 10 -- Registration Agency Prefix
• 1063 -- Publisher Prefix
• S000369519903216 -- Suffix (Publisher-assigned ID)
• Suffix can be SICI or PII.
• The DOI, together with the URL pointing to the digital
object, is registered with the International DOI
Foundation, e.g.:
– 10.1063/333 | http://www.pubsite.org/apr99/artl1.pdf
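The construction above can be sketched as a small parser that splits a DOI into the three parts named on this slide. The function name `split_doi` and the field names are assumptions for illustration; the split rule (first "/" separates prefix from suffix, first "." separates the two prefix parts) is inferred from the example.

```python
from typing import NamedTuple


class DoiParts(NamedTuple):
    agency: str     # "10" in the example
    publisher: str  # "1063" in the example
    suffix: str     # publisher-assigned ID


def split_doi(doi):
    """Split a DOI string into agency prefix, publisher prefix, and suffix."""
    prefix, slash, suffix = doi.partition("/")  # suffix may itself contain "/"
    agency, dot, publisher = prefix.partition(".")
    if not (slash and suffix and dot and agency and publisher):
        raise ValueError(f"not a well-formed DOI: {doi!r}")
    return DoiParts(agency, publisher, suffix)


parts = split_doi("10.1063/S000369519903216")
print(parts)
```

Only the first "/" is treated as the separator, so a SICI-style suffix containing further slashes stays intact.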
Using a DOI
• DOIs are resolved using the Handle System
technology from CNRI (Corporation for
National Research Initiatives).
• Retrieval of object is a two-step process: link is
sent to central directory where current Web
address is stored, location is sent back to
browser with special message to redirect to
address, e.g:
– dx.doi.org/10.1063/333 redirects to
www.pubsite.org/apr99/artl1.pdf
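The two-step lookup can be illustrated with an in-memory stand-in for the central directory. This is a sketch of the idea only, not the Handle System: the dictionary, the `resolve` function, the 302 status code, and the "moved" URL are all assumptions.

```python
def proxy_link(doi):
    """Step 1: the link the user clicks points at the DOI proxy, not the publisher."""
    return f"http://dx.doi.org/{doi}"


def resolve(doi, directory):
    """Step 2: the directory returns a redirect to the current location.

    Returns (status, location); 302 is assumed here as the redirect status.
    """
    location = directory.get(doi)
    if location is None:
        return 404, None
    return 302, location


# Toy directory holding the example registration from the slide.
directory = {"10.1063/333": "http://www.pubsite.org/apr99/artl1.pdf"}

link = proxy_link("10.1063/333")
status, location = resolve("10.1063/333", directory)

# If the publisher moves the file, only the directory entry changes
# (hypothetical new address); the DOI link in citing documents stays the same.
directory["10.1063/333"] = "http://www.pubsite.org/archive/artl1.pdf"
moved_status, moved_location = resolve("10.1063/333", directory)
```

The point of the indirection is persistence: links embed the DOI, and the directory absorbs address changes.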
Reference Linking
• CrossRef Publisher system: major Sci-Tech
professional societies and commercial
publishers.
• System design calls for one URL for each
DOI; underlying technology can handle
multiple URLs however.
• Issue: Directing users to locally held or
licensed version of Digital Object (locally
loaded or from Aggregator). Appropriate
Copy problem.
[Diagram: DOI resolution with local linking. Labels shown: Client (Web Browser); Cookie on client; OpenURL Aware; DOI Proxy (dx.doi.org/10.1063/1234, Nosfx=y); Handle Server; AIP, IEE, Elsevier; Local AIP, IEE; OpenURL; CrossRef Metadata Database; DOI; Illinois Local Link Server; Metadata; UIUC Metadata Registry; Local Value Added.]
Simultaneous Search Implementations
• DialIndex from Dialog.
• Ex Libris MetaLib service.
• Endeavor EnCompass.
• Innovative Interfaces MetaFind.
• Ovid Multiple Search and reference De-Duping.
• ISI Web of Knowledge.
• Gale Corporation InfoTrac Total Access.
• WebFeat.
• California Digital Library SearchLight system.
• Los Alamos FlashPoint system.
• Fretwell-Downing partnering with ARL Portal and
Monash University.
Grainger Search Aid
• Assist users in the selection of appropriate
databases.
• Normalize user search arguments and display
search results from candidate databases.
• Cross-database asynchronous concurrent
searching.
• Article level and e-journal Web site access to
publisher full-text repositories.
• Utilize OpenURL, CrossRef metadata database
and DOI for reference linking at the article level.
• Proxying of vendor systems and capability of
‘taking over’ the search in vendor native mode.
Grainger Search Aid
Reference Assistant Project
• Utilize Search Aid simultaneous search and
link capabilities.
• Opportunity to explore interface and
navigation issues.
• Mimics the behavior of reference librarian.
• Allows the application of ‘best match’ and
‘quorum searching’ algorithms.
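One common reading of quorum searching is: require all query terms first, then relax the required number of matching terms one step at a time until enough items are found, ranking by how many terms matched. The sketch below follows that reading; it is an assumption about what is meant here, and the function name, the `wanted` parameter, and the sample documents are invented for illustration.

```python
def quorum_search(query_terms, documents, wanted=1):
    """Return (quorum, hits): the highest quorum level yielding >= `wanted` hits.

    `documents` maps an id to its text. Hits are ordered best match first
    (most query terms matched), ties broken by id.
    """
    terms = {term.lower() for term in query_terms}
    scores = {
        doc_id: len(terms & set(text.lower().split()))
        for doc_id, text in documents.items()
    }
    for quorum in range(len(terms), 0, -1):  # all terms, then all but one, ...
        hits = sorted(
            (doc_id for doc_id, score in scores.items() if score >= quorum),
            key=lambda doc_id: (-scores[doc_id], doc_id),
        )
        if len(hits) >= wanted:
            return quorum, hits
    return 0, []


# Invented sample records.
documents = {
    "a": "quantum well lasers temperature analysis",
    "b": "temperature sensors for bridges",
    "c": "digital library metadata",
}
quorum, hits = quorum_search(["temperature", "quantum", "lasers"], documents, wanted=2)
print(quorum, hits)
```

This mirrors what a reference librarian does informally: start with the strictest form of the request and broaden it only when too little comes back.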
Reference Assistant Top Menu
Simultaneous Search
Implementations
• Shared Blackboard approach employing
Independent Searchbots dedicated to
searching information resources and passing
results to Web clients.
• Event-Driven, Asynchronous HTTP Queries
from within a Single Script returning results
to Web browser.
Event-Driven, Asynchronous
Queries
• Single, event-driven web server process,
asynchronously querying multiple resources.
• Uses WinHTTP from ASP and VBScript.
• Simpler, not as flexible. Search algorithms and
processing coded in scripts.
• This is the approach we currently use for our
service.
• Implementation of multi-step login and session
variable pass-through being investigated.
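The single-process, event-driven pattern can be sketched as follows. The service described here uses WinHTTP from ASP and VBScript; Python's `asyncio` is swapped in purely as a stand-in, and the resources, delays, and `fake_search` function are simulated rather than real network calls.

```python
import asyncio


async def fake_search(resource, delay, query):
    """Simulate one remote search; the sleep stands in for network latency."""
    await asyncio.sleep(delay)
    return resource, f"results for {query!r} from {resource}"


async def simultaneous_search(query, resources):
    """Start every query at once and collect results as each one finishes."""
    tasks = [
        asyncio.create_task(fake_search(name, delay, query))
        for name, delay in resources
    ]
    results = []
    for finished in asyncio.as_completed(tasks):
        results.append(await finished)  # arrives in completion order
    return results


# Simulated resources with made-up response times (seconds).
resources = [("OPAC", 0.05), ("A&I service", 0.01), ("Publisher full-text", 0.03)]
results = asyncio.run(simultaneous_search("quantum-well lasers", resources))
for resource, summary in results:
    print(resource, "->", summary)
```

Because nothing blocks while waiting, total elapsed time is close to the slowest resource rather than the sum of all of them, and fast resources can be displayed first.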
OpenURL-Based Services
• Standard for expressing and transmitting
metadata.
• Promise of standardized, normalized search
results.
• Provides value-added links to the Ovid
search results.
• Using CrossRef metadata database to look
up DOIs.
CiteParse.dll
• An ActiveX DLL which can parse various Ovid
citations and turn them into OpenURLs:
• Tansu N. Chang YL. Takeuchi T. Bour DP. Corzine SW. Tan
MRT. Mawst LJ. Temperature analysis … quantum-well
lasers. [Article] IEEE Journal of Quantum Electronics.
38(6):640-651, 2002 Jun.
• http://…/resolver.asp?genre=article&aulast=Tansu&auinit1=N
&atitle=Temperature+analysis+…+quantum-well+lasers&title=IEEE+Journal+of+Quantum+Electronics&
volume=38&issue=6&spage=640&epage=651&pages=640-651&date=2002-06
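The citation-to-OpenURL step can be sketched in Python. This is not the CiteParse.dll code: the regular expression handles only the one citation layout shown above, and the resolver address and function name are hypothetical. The parameter names are the ones appearing in the example URL.

```python
import re
from urllib.parse import urlencode

MONTHS = {
    name: number
    for number, name in enumerate(
        "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1
    )
}

# Layout assumed: "Surname INIT. ... Title. [Article] Journal. vol(issue):sp-ep, year Mon."
CITATION = re.compile(
    r"^(?P<authors>(?:[A-Z][A-Za-z'-]+ [A-Z]{1,4}\. )+)"
    r"(?P<atitle>.+?)\. \[Article\] "
    r"(?P<title>.+?)\. "
    r"(?P<volume>\d+)\((?P<issue>[^)]+)\):(?P<spage>\d+)-(?P<epage>\d+), "
    r"(?P<year>\d{4}) (?P<month>[A-Z][a-z]{2})\.?$"
)


def citation_to_openurl(citation, resolver):
    """Turn one article citation into an OpenURL, or None if it does not parse."""
    match = CITATION.match(citation.strip())
    if match is None or match["month"] not in MONTHS:
        return None
    first_author = match["authors"].split(". ")[0]  # e.g. "Tansu N"
    aulast, initials = first_author.rsplit(" ", 1)
    params = [
        ("genre", "article"),
        ("aulast", aulast),
        ("auinit1", initials[0]),
        ("atitle", match["atitle"]),
        ("title", match["title"]),
        ("volume", match["volume"]),
        ("issue", match["issue"]),
        ("spage", match["spage"]),
        ("epage", match["epage"]),
        ("pages", f"{match['spage']}-{match['epage']}"),
        ("date", f"{match['year']}-{MONTHS[match['month']]:02d}"),
    ]
    return f"{resolver}?{urlencode(params)}"


citation = (
    "Tansu N. Chang YL. Takeuchi T. Bour DP. Corzine SW. Tan MRT. Mawst LJ. "
    "Temperature analysis … quantum-well lasers. [Article] "
    "IEEE Journal of Quantum Electronics. 38(6):640-651, 2002 Jun."
)
# Hypothetical resolver address; the real one is elided in the example above.
openurl = citation_to_openurl(citation, "http://resolver.example.edu/resolver.asp")
print(openurl)
```

Note that `urlencode` percent-encodes the elided "…" in the title, so that part of the output differs cosmetically from the URL shown on the slide.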
Conclusions
• User reactions very positive.
• The one-stop-shopping approach has been successful.
• Users consider ability to link to full-text from citations
in A & I Services and from references on publisher
portals very helpful.
• Technically, best approach appears to be a hybrid of
asynchronous client interface with Web Services
querying databases. Moves database middleware to
Web Services and eliminates extensive custom script
code for search and database query.
Publishing Trends
• Publishers will continue to add value to
online journal articles.
• Digital version will become version of
record.
• Virtual journals (both publisher-based and
cross-publisher) will become common.
• Next-generation knowledge environments
will evolve. Multimedia, data exposed, live
equations with in-place calculations.
Publishing Trends (Continued)
• Personalized services will be available - agent technology, alerting services.
• Different economic and subscription models
will be introduced.
• Deconstruction of Journal (Bob Kelly,
APS); article at a time publishing.
• Journal branding or perhaps publisher
branding.
• Academia issues: publishing, tenure.
Continuing Issues
• Role of Authors, Academic Institutions, Libraries,
Publishers, Abstracting & Indexing Services.
• Disintermediation may affect both Libraries and
Publishers.
• Information as Function not Place.
• Provide a ‘Digital Library’ out of digital
collections.
• Role of XML technology.
• Service mechanisms: processing & archiving,
search and discovery, presentation, linking.