2003ETDtutorial.ppt

ETDs for Beginners: History and
Approach
Edward A. Fox
Executive Director, NDLTD
(plus slides from Vinod Chachra, Thom Hickey, Joan
Lippincott, and Gail McMillan)
Professor, Dept. of Computer Science
Virginia Tech (VPI&SU), Blacksburg, VA, USA
http://fox.cs.vt.edu [email protected]
ETD 2003
Humboldt University, Berlin 21-24 May 2003
ACKNOWLEDGEMENTS
• ETD 2003 organizers and attendees
• Wonderful service of NDLTD Board of Directors, and previous
Steering Committee, other committees
• Bold efforts by those running ETD initiatives in universities,
regions, and countries
• Helpful sponsorship by many organizations, especially Adobe,
Brocade Communications, c.a.r.u.s. Information Technology,
Cisco Systems, CONACyT, Controlware, DFG, Enterasys
Networks, Ex Libris, FIPSE, IBM, ImageWare Components,
LIB-IT, Microsoft, Nionex, NSF, OCLC, VTLS, SOLINET,
Springer-Verlag, SUN, SURA, T-Systems, UNESCO, many
governments (Australia, Germany, India, …), …
PERSPECTIVE
Digital Libraries --- Virginia Tech
• MARIAN (NLM, NSF)
• CS DL Prototype - ENVISION (NSF, ACM)
• TULIP (Elsevier, OCLC)
• BEV History Base (NSF, Blacksburg)
• DL for CS Education - EI (NSF, ACM)
• WATERS, NCSTRL (NSF)
• NDLTD (SURA, US Dept. of Education, NSF)
• CSTC (NSF, ACM), CRIM (NSF, SIGMM)
• WCA (Log) Repository (W3C)
• VT-PetaPlex-1 (Knowledge Systems)
• NSDL (NSF): CITIDEL, DL-in-a-Box, GetSmart
• AmericanSouth.Org (Mellon)
DL Examples
• IBM Digital Library
• Virtua (www.vtlc.com)
• Greenstone (www.greenstone.org)
• Eprints (www.eprints.org)
• Many systems in NSF DLI projects
• VT systems: MARIAN, CSTC, NDLTD
• Work on ODL, DL-in-a-box, CITIDEL, NCSTRL
Libraries of the Future
J.C.R. Licklider, 1965, MIT Press
World
Nation
State
City
Community
Info.
Literacy
(1995)
NSF DLI (1994)
Improving
Education
Digital
Libraries
SGML (1985)
Multimedia
(1986)
WWW
(1994)
PDF
(1992)
Internet
(1984)
Library
Cancellations
(1988)
University
Scholarly
Electronic
Pub. (1988)
Synchronous
Scholarly Communication
Same time, Same or different place
Asynchronous, Digital Library
Mediated Scholarly Communication
Different time and/or place
Information
Life
Cycle
Borgman et al.:
Workshop Report on
Social Aspects of
Digital Libraries:
http://www-lis.gseis.
ucla.edu/DL/
Information Life Cycle
Authoring
Modifying
Using
Creating
Retention
/ Mining
Organizing
Indexing
Accessing
Filtering
Storing
Retrieving
Distributing
Networking
Communications
(bandwidth, connectivity)
Locating Digital Libraries in Computing and
Communications Technology Space
Digital Libraries
technology
trajectory: intellectual
access to globally
distributed information
Computing (flops)
Digital content
less
more
Digital Library Content
Content
Types
Text
Documents
Video
Audio
Geographic
Information
Software,
Programs
Bio
Information
Images and
Graphics
Articles,
Reports,
Books
Speech,
Music
(Aerial)
Photos
Models
Simulations
Genome
Human,
animal,
plant
2D, 3D,
VR,
CAT
Digital Libraries
Shorten the Chain from
Editor
Reviewer
Publisher
A&I
Consolidator
Library
DLs Shorten the Chain to
Author
Teacher
Digital
Reader
Editor
Reviewer
Learner
Librarian
Library
Digital Libraries --- Objectives
• World Lit.: 24hr / 7day / from desktop
• Integrated “super” information systems: 5S:
streams, structures, spaces, scenarios, societies
• Ubiquitous, Higher Quality, Lower Cost
• Education, Knowledge Sharing, Discovery
• Disintermediation -> Collaboration
• Universities Reclaim Property
• Interactive Courseware, Student Works
• Scalable, Sustainable, Usable, Useful
Benefits
• Ease of use
• Effectiveness
• “The benefits of digital libraries will not be
appreciated unless they are easy to use
effectively.” - IITA Workshop report
DLs: Why of Global Interest?
• National projects can preserve antiquities and
heritage: cultural, historical, linguistic, scholarly
• Knowledge and information are essential to
economic and technological growth, education
• DL - a domain for international collaboration
• wherein all can contribute and benefit
• which leverages investment in networking
• which provides useful content on Internet & WWW
• which will tie nations and peoples together more
strongly and through deeper understanding
Reagan Moore
Ed Fox
Application Domain | Related Institutions | Examples | Technical Challenges | Benefit / Impact
Publishing | Publishers, Eprint archives | OAI | Quality control, openness | Aggregation, organization
Education | Schools, colleges, universities | NSDL, NCSTRL | Knowledge management, reusability | Access to data
Art, Culture | Museum | AMICO, PRDLA | Digitization, describing, cataloging | Global understanding
Science | Government, Academia, Commerce | NVO, PDG, SwissProt, UK eScience, European Union Commission | Data models | Reproducibility, faster reuse, faster advance
(e) Government | Government Agencies (all levels) | Census | Intellectual property rights, privacy, multi-national | Accountability, homeland security
(e) Commerce, (e) Industry | Legal institutions | Court cases, patents | Developing standards | Standardization, economic development
History, Heritage | Foundations | American Memory | Content, context, interpretation | Long term view, perspective, documentation, recording, facilitating, interpretation, understanding
Crosscutting | Library, Archive | Web, personal collections | Multi-language, preservation, scalability, interoperability, dynamic behavior, workflow, sustainability, ontologies, distributed data, infrastructure | Reduced cost, increased access, preservation, democratization, leveling, peace, competitiveness
June 2002 for NSF
Digital Libraries
• Online course materials at
http://ei.cs.vt.edu/~dlib/rcontents.htm
• Topical outlines:
Topical Outline - Foundations
• Early visions
• Definitions
• Resources
• References
• Projects
Topical Outline – IR Areas
• Search, Retrieval, Resource Discovery
• Information storage and retrieval
• Boolean vs. natural language
• Search engines
• Indexing, phrases, thesauri, concepts
• Federated search and harvesting, OAI
• Integrating links and ratings
• Crawlers, spiders, metasearch, fusion
• Details following – Li Wang indep. study
Topical Outline - Multimedia
• Multiple media types, representations
• Text, audio, image, video, graphics, animation
• Capture, digitization, standards, interchange
• Compression, content-based retrieval
• Playback (Real), SMIL, QoS
• JPEG, MPEG (and versions)
Topical Outline - Architectures
• Distributed, centralized
• Modular, componentized
• Bus (InfoBus), hierarchical, star
• Mediators, wrappers (TSIMMIS)
• Light weight protocols
• Architecture of OAI and XOAI
Topical Outline – Interfaces
• Taxonomy of interface components
• Workflow
• Visualization
• Environments
• Design
• Usability testing
Topical Outline – Metadata
• MARC
• Dublin Core
• RDF
• IMS
• OAI (Open Archives Initiative)
• Crosswalks, mappings
• Ontologies
• Topic maps, concept maps
Topical Outline – Epub, SGML, XML
• Authoring
• Rendering, presenting
• Structure
• Tagging, Markup, DOM
• Semi-structured information
• Dual-publishing, eBooks
• Styles (XSL, XSLT)
• Structure queries
Topical Outline – Databases
• Extending database technology
• Structured and unstructured info
• Multimedia databases
• Link databases
• Performance
• Replicated storage, I2-DSI (details following)
Topical Outline – Agents
• Protocols
• Knowledge interchange
• Negotiation, registries
• Distributed issues
• Ontologies (standard upper)
• Webbots (automatic indexing)
Topical Outline – Economics
• E-commerce
• Sustainability
• Preservation and archiving
• DLF, Besser, Lorie, Gladney
• Self-archiving
• Open collections
• Economic models, business plans
Topical Outline – IPR
• Intellectual property rights (IPR)
• Legal issues
• Terms and conditions
• Copyright
• Patents, trademarks
• Distributed rights management
• Security
Topical Outline – Social Issues
• Cooperation, collaboration
• Annotation, ratings
• Digital divide
• Educational applications
• Cultural heritage
• Museums (AMICO)
• Organizational acceptance
• Personalization
• Internationalization
DL Challenges
• Preservation - so people will trust DLs
• Supporting infrastructure - networks, ...
• Scalability, sustainability, interoperability
• DL industry - critical mass by covering libraries,
archives, museums, corporate info, govt info,
personal info - “quality WWW” integrating IR,
HT, MM, ...
• Need tools & methods to make them easier to build
Definitions
• Library ++ (library+archive+museum+…)
• Distributed information system + organization
+ effective interface
• User community + collection + services
• Digital objects, repositories, IPR management,
handles, indexes, federated search, hyperbase,
annotation
Definition: Digital Libraries
are complex systems that
• help satisfy info needs of users (societies)
• provide info services (scenarios)
• organize info in usable ways (structures)
• present info in usable ways (spaces)
• communicate info with users (streams)
5S Layers
Societies
Scenarios
Spaces
Structures
Streams
5S Model
Models | Examples | Objectives
Stream | Text; video; audio; image | Describes properties of the DL content such as encoding and language for textual material or particular forms of multimedia data
Structures | Collection; catalog; hypertext; document; metadata; organization tools | Specifies organizational aspects of the DL content
Spatial | Measure; measurable, topological, vector, probabilistic | Defines logical and presentational views of several DL components
Scenarios | Searching, browsing, recommending | Details the behavior of DL services
Societies | Service managers, learners, teachers, etc. | Defines managers, responsible for running DL services; actors, that use those services; and relationships among them
5S Model for DLs
5S | Definition
Streams | Sequences of elements of an arbitrary type
Structures | Labeled directed graphs
Spatial | Sets and operations on those sets
Scenarios | Sequences of events that modify states of a computation in order to accomplish some functional requirement
Societies | Sets of communities and relationships among them
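The five definitions above map naturally onto simple data types. The following is a minimal sketch in Python, not the 5S formalism itself; every name in it (`Structure`, `overlap`, `run_scenario`, the sample thesis graph) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# Stream: a sequence of elements of an arbitrary type (here, characters of text).
stream = list("ETD")

# Structure: a labeled directed graph, stored as (source, label, target) edges.
@dataclass
class Structure:
    edges: set = field(default_factory=set)

    def add_edge(self, source, label, target):
        self.edges.add((source, label, target))

    def targets(self, source, label):
        """Return the nodes reached from `source` over edges carrying `label`."""
        return sorted(t for (s, lab, t) in self.edges if s == source and lab == label)

# Space: a set together with operations on it (here, set intersection).
def overlap(a, b):
    return a & b

# Scenario: a sequence of events, each of which modifies the state of a computation.
def run_scenario(state, events):
    for event in events:
        state = event(state)
    return state

# Society: a set of communities and the relationships among them.
communities = {"students", "librarians"}
relationships = {("librarians", "catalog for", "students")}

# Hypothetical example: a thesis described as a small labeled graph.
thesis = Structure()
thesis.add_edge("thesis", "hasChapter", "introduction")
thesis.add_edge("thesis", "hasChapter", "methods")
chapters = thesis.targets("thesis", "hasChapter")

# A toy "search" scenario: two events that each update the state dictionary.
final_state = run_scenario(
    {"query": None, "hits": 0},
    [lambda s: {**s, "query": "digital library"}, lambda s: {**s, "hits": 3}],
)
```

The sketch keeps each S separate on purpose; in a real DL the same object usually takes part in several of them at once, for example a document that is both a stream of text and a node in a structure.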
5SLGen: Automatic DL Generation
Requirements (1)
5S
Meta
Model
DL
Expert
Analysis (2)
5SLGraph
DL
Designer
Practitioner
5SL
DL
Model
component
pool
ODLSearch,
ODLBrowse,
ODLRate,
ODLReview,
…….
Teacher
Design (3)
Researcher
5SLGen
Tailored
DL
Services
Implementation (4)
OCKHAM
• Simplicity (a la OCCAM’s razor)
• Support by Mellon and DLF
• Next meeting in Atlanta Jan. 8, 2003
• Four main ideas:
1. Components
2. Lightweight protocols
3. Open reference models (e.g., 5S, OAIS)
4. Community perspective and involvement
Problem
Why do DL developers continue to “reinvent
the wheel”? The top 10 reasons are:
1. The library budget won’t allow purchase of a
commercial DL system.
2. Unless the development effort is local, there
won’t be any control.
3. DLs are extensions of DBMSs, so they are
simple applications to develop.
4. Since DLs operate on the Web, one must adopt
the newest W3C proposal.
Problem – cont’d
5. Since technology moves so quickly, it is
essential to follow the latest fad.
6. CS students always develop from scratch.
7. This team knows it can do it better.
8. This system must have more capabilities than
any other system.
9. This DL has to be more flexible and extensible.
10. This is the right system architecture – at last!
Problem Approach
We
• address the problem of how to develop DLs;
• build on experience in building many DLs;
• strive for simplicity as per OCKHAM initiative;
• build upon the Open Archives Initiative;
• demonstrate our approach in diverse situations;
• and invite all to
• use DL-in-a-box and
• help build Open Digital Libraries.
NUDL (www.nudl.org)
Int’l Research Support (1997)
• Networked University Digital Library
• Partners: Germany, Mexico (Puebla and
Monterrey), Brazil
• Problems: Multilingual search, high
performance DLs, requirements/usability, …
• Start with ETDs, then expand to other
student works, portfolios, data sets, (CS)
courseware, ... -> institutional repositories
ALPHABET SOUP,
NOT
ROCKET SCIENCE
Alphabet Soup
• E and T or D = ETD
• (electronic)
• (thesis)
• (dissertation)
Alphabet Soup
• ET and ED = ETDs
Alphabet Soup
• DL and ET or ED =
DLTD
• (digital library)
Alphabet Soup
• SURA and DLs and
ETDs = Regional DLTD
• (Southeastern University
Research Association)
Alphabet Soup
• FIPSE and DLs and ETDs =
National DLTD
• (Fund for the Improvement
of Post Secondary
Education – US Dept. of Ed)
Alphabet Soup
• International and DLs and
ETDs = Networked
DLTD = NDLTD
• (Recall “n” in CNI –>
Coalition for Networked
Information)
Alphabet Soup - Factoring
• NDLTD = ND LTD
• (Paul Mather – from UK)
• NDLTD = NDL TD
• (Edie Rasmussen)
• (Later, Networked University Digital
Library = NUDL)
A Digital Library Case Study
• Electronic theses and dissertations (ETDs)
• Submission: http://etd.vt.edu
• Collection: http://www.theses.org
• Networked Digital Library of Theses and Dissertations (NDLTD)
http://www.ndltd.org
(formerly “National” because of Fed. funds, before international
members started joining)
SLIDES FROM
1998
What led to today’s situation?
• 1987 mtg in Ann Arbor: UMI, VT, …
• 1992 mtg in Washington: CNI, CGS, UMI, VT
and 10 universities with 3 reps each
• 1993 mtg in Atlanta to start Monticello
Electronic Library (MEL): SURA, SOLINET
• 1994 mtg in Blacksburg re ETD project: std of
PDF + SGML + multimedia objects
• 1996 funding by SURA and US Dept. of
Education (FIPSE) for regional, national
projects (NDLTD)
VISION,
BENEFITS,
APPROACH,
POSSIBILITIES
What are we doing?
• Aiding universities to enhance grad educ.,
publishing and IPR efforts: to help
improve the availability and content of
theses and dissertations
• Educating ALL future scholars so they can
publish electronically and effectively use
digital libraries (i.e., are Information
Literate and can be more expressive)
• Demonstrating how for other organizations
What are the key ideas?
• Scalability
• Empower authors to submit to DL, as a natural part
of the educational process
• Study workflow & apply automation, so institutions
streamline processing and build their part of the DL
• Federate along most suitable cultural/political lines
• People can switch to electronic documents
• Becoming more expressive with hypermedia
• Mandating ETDs will change all
future scholarship
What are the benefits?
• Save students money
• Save handling, shelf space in libraries
• Build the Networked Digital Library of
Theses and Dissertations: with faster,
broader, and less expensive access
• Demonstrate how universities can
work together directly (vs. indirectly
through publishers or associations)
What are the long term goals?
• 400K US students / year getting grad
degrees are exposed / involved
• 200K/yr rich hypermedia ETDs that
may turn into electronic portfolios
• Dramatic increase in knowledge
sharing: lit. reviews, bibliographies, …
• Services providing lifelong access for
students/researchers: browse, search,
prior searches, citation links
Grad Student Workstation?
• Record all work with NDLTD, return to
prior situation, prepare bibliography
• Powerful (multilingual, text, image)
searching, browsing (with categories),
following citation links
• Support collaboration with others in same
field: help with literature review, sharing
tools and data sets, applying their methods
Social Capital?
• Increase local interchange among students,
faculty, library, graduate school
• Increase international understanding,
building many more invisible colleges,
with students more empowered
• Connect graduate researchers with
undergrads, who can access ETDs / them
• Facilitate direct university collaboration,
explicitly, in reshaping publishing world
How are ETDs being done at
Virginia Tech?
• Produced using standard word
processing packages as PDF files
• LaTeX class, outline fonts
• Word template, PDFwriter
• Reviewed by the Graduate School
• Cataloged and archived by the library
• Downloaded by UMI from server (if
payment has been made)
Convene Local Planning Group
ETD
Build An ETD Site
ETD
Workshop/Training
Digital Library
Policies
Inspection/Approval
Student Prepares Thesis or Dissertation
NDLTD
Literature
Computer Resources
Research
Student Defends and Finalizes ETD
My Thesis
ETD
Student Gets Committee Signatures
and Submits ETD
Signed
Grad School
Graduate School Approves ETD
Student is Graduated
Ph.D.
Library Catalogs ETD and New Students
Have Access to the New Research
WWW
NDLTD
Status of the Local Project
• Approved by university governance
Spring 1996; required starting 1/1/97
• Submission & access software in place
• Submission workshops for students
(and faculty) occur often: beginner/adv.
• Faculty training as part of Faculty
Development Initiative
• Over 700 ETDs in collection by 1/98
How can a university get
involved?
• Select planning/implementation team
• Graduate School
• Library
• Computing / Information Technology
• Institutional Research / Educ. Tech.
• Send us letter, give us contact names
• Adapt Virginia Tech solution
• Build interest and consensus
• Start trial / allow optional submission
CONCERNS,
PROBLEMS,
OPPOSITION
Some Barriers at Universities
• Lethargy; Not invented here (esp. large univ’s)
• Anger with unfunded, added, required work
• Last straw: using more frustrating technology
• Lack of experience in working together:
graduate school, library, computing staff
• Lack of interest in (quality of) student work
• More loyalty to discipline than to campus
• Unwillingness to accept responsibility for $
problems with libraries, publishers
MECCA Conf. 6/11/98
• Armbruster, U. Tennessee, Memphis
• Bennett, Robert C., U. Texas Med Sch
• Brown, Melinda, Vanderbilt
• Eaton, John, Graduate School, Va Tech
• Fox, Ed, Computer Science, Va Tech
• Gherman, Paul, Library, Vanderbilt
• Goodstein, Lynn, Penn St. U.
• Hagen, John H., Library, WVU
• Hardemon, James, U. Florida
• Helmstetter, Wendy, Library, FIT
• Liston, Rick, NCSU
• Lutz, Richard, Graduate School, Florida
• McFarland, Mark, U. Texas, Austin
• McMillan, Gail, Library, Va Tech
• Minsker, Tom, Penn St U.
• Mortara, Antionet, FIT
• Painter, Linda, U. Tennessee
• Sowell, Robert, Graduate School, NCSU
• Tague, Larry, U. Tennessee, Memphis
• Vaughan, Mary Ann, Vanderbilt
ETD
Overview
Spirit of NDLTD
• Help make a better (smaller) world
• Win-win-win (everyone can benefit)
• Have fun helping others
• Helpers/teachers learn more than those they work with
• Cooperation, friendly competition
• When you “1-up” VT, share your software, documents!
• “Doing better” requires both “doing”, “better”
• Balance (and build on standards)
• New, popular, powerful, expressive, exciting, “better”
• Doable, feasible, learnable, affordable, sharable, preservable
• We can always do more, enhancing quality and
knowledge!
The Networked Digital Library of Theses and Dissertations
www.NDLTD.org
Training Authors
Expanding Access
Preserving Knowledge
Improving Graduate Education
Enhancing Scholarly Communication
Empowering Students & Universities
Leader of the Worldwide ETD
(Electronic Thesis and Dissertation) Initiative
NDLTD
Grad
Program
IT
Library
Ed.
(Tech)
Key Ideas:
Scalability
Networked infrastructure
University collaboration
Workflow, automation
Education is the rationale
Maximal
Access
8th graders vs. grads
Authors must submit
Standards
PDF, SGML, MM,
MARC, DC, URNs,
Federated search
What led to today’s meeting?
• 1987 mtg in Ann Arbor: UMI, VT, …
• 1992 mtg in Washington: CNI, CGS, UMI, VT and 10 universities
with 3 reps each
• 1993 mtg in Atlanta to start Monticello Electronic Library (regional,
US Southeast): SURA, SOLINET
• 1994 mtg at VT: std: PDF + SGML + multimedia objects
• 1996 funding by SURA, US Dept. of Education (FIPSE)
• 1997 meetings in UK, Germany, ...
• 1998 – 1st symposium – Memphis (20)
• 1999 – 2nd symposium – Blacksburg (70)
• 2000 – 3rd symposium – St. Petersburg (225)
• 2001 – 4th symposium – Caltech (200)
• 2002 – 5th symposium – BYU, Provo, Utah
• 2003 – 6th symposium – Berlin (215)
• 2004 – 7th symposium – U. Kentucky
• 2005 – 8th symposium – Sydney, Australia
NDLTD Membership
• As of 5/17/2003 there were at least:
• 176 members, including:
• 155 individual universities
• 6 consortia
• 21 institutional members
National / Regional Projects
• Australia
• U. New South Wales (lead)
• U. of Melbourne
• U. of Queensland
• U. of Sydney
• Australian National U.
• Curtin U. of Technology
• Griffith U.
• Belgium
• Brazil
• Germany
• Humboldt University (lead)
• 3 other universities
• 5 learned societies: Math, Physics,
Chemistry, Sociology, Education
• 1 computing center
• 2 major libraries
• India
• Lithuania
• Spain: Consorci de Biblioteques
Universitàries de Catalunya, as
group, www.cbuc.es: 9 sites
• Sudan
• UK (British Library, JISC,
Edinburgh)
• UNESCO (especially Latin
America, Eastern Europe, Africa)
• USA:
• CIC (“Big 10”)
• Ohio: OhioLINK: 79 colleges/univs
• SOLINET
• …
OhioLINK
• Statewide Consortium
• Represents 79 colleges, universities, libraries
• Public Universities
• Private Universities and Colleges
• 2-Year Colleges
• Only a few (e.g., Miami U. of Ohio) are also
NDLTD members on their own
US University Members
• Air University (Alabama)
• Baylor University
• Boston University
• Brigham Young University
• Caltech
• Clemson University
• College of William & Mary
• Concordia University (Illinois)
• Drexel University – required 4/2002
• East Carolina University
• East Tenn. State U. – required 1/2001
• Florida Institute of Technology
• Florida International University
• Florida State University
• Florida Tech
• George Washington University
• Georgetown University
• Johns Hopkins University
• Louisiana State University – required 1/2002
• Marshall University (W. Va.)
• Miami University of Ohio
• Michigan Tech
• Mississippi State University
• MIT
• Montana State University
• Naval Postgraduate School (CA)
• New Jersey Inst. of Technology
• New Mexico Tech
• North Carolina State University – required 9/2002
• Northwestern University
• Penn. State University
• Regis University
• Rochester Institute of Tech.
• Texas A&M
• U. of Central Florida
• U. of Colorado Health Science Center
• U. of Florida – required 8/2001
• U. of Georgia – required 9/2001
• U. of Hawaii, Manoa
• U. of Illinois, Urbana-Champaign
• U. of Iowa
• U. of Kentucky – required in CS only
• U. of Maine – required in CS, Spatial Info Sci/Eng
• U. of Missouri-Columbia
• U. of North Texas – required since 8/99
• U. of Oklahoma
• U. of Nevada, Las Vegas
• U. of New Orleans
• U. of North Texas – required 8/1999
• U. of Oklahoma
• U. of Pittsburgh
• U. of Rochester
• U. of South Florida – required 8/2002
• U. of Tennessee, Knoxville
• U. of Tennessee, Memphis
• U. of Texas at Austin – required 6/2001
• U. of Virginia – required 1/2003
• U. of West Florida
• U. of Wisconsin - Madison – part reqt 12/1999
• Vanderbilt U.
• Virginia Commonwealth U.
• Virginia Tech - required 1/97
• Wake Forest U.
• West Virginia U. - required 8/1998
• Western Kentucky U. – required 9/2004
• Western Michigan U.
• Worcester Polytechnic Inst. – required 7/2002
• Yale U.
Other Countries (selected)
• Australia
• Belgium
• Brazil
• Canada
• Chile
• China, Hong Kong
• Colombia
• Finland
• France
• Germany
• Greece
• India
• Italy
• Jamaica
• Korea
• Lithuania
• Mexico
• Netherlands
• Norway
• Poland
• Russia
• Singapore
• S. Africa
• S. Korea
• Spain
• Sudan
• Sweden
• Taiwan
• Thailand
• UK
• Venezuela
Institutional Members
• Australian Digital Theses Program
• British Library
• Cinemedia
• Coalition for Networked Information (CNI)
• Committee on Institutional Cooperation (CIC)
• Consorci de Biblioteques Universitàries de Catalunya
• Diplomica.com
• Dissertation.com
• Dissertationen Online (Germany)
• ETDweb, a Division of Answer4.com
• Ibero-American Science & Technology Education Consortium (ISTEC)
• MathDISS International
• National Documentation Centre (NDC), Greece
• National Library of Canada
• National Library of Portugal
• OCLC Online Computer Library Center
• Office of Scientific and Technical Info (US Dept of Energy)
• OhioLINK
• Organization of American States (SEDI/OAS)
• Southeastern Library Network (SOLINET)
• Sudanese National Electronic Library
• UNESCO (www.unesco.org/webworld/etd)
UNESCO and ETDs
• Promoting the use of the Internet as a tool for disseminating
scientific knowledge
• Facilitating the transfer of ETD expertise from developed to
developing countries
• 1998: Member of the NDLTD Steering Committee
• 1999: First UNESCO ETD meeting on ETD
internationalisation
• 2002: “UNESCO Guide to Electronic Theses and Dissertations”
• 2003: Model training programmes and training courses
• 2003: Sponsor pilot projects
• 2003: Pilot projects (Africa, Europe, Latin-America)
Access Possibilities
Web search engines
www.theses.org
www.openarchives.org
Library catalog clients
Virginia Tech
MIT
National Library of Portugal
CBUC (Spain)
OhioLINK
3rd Party Services (e.g., Bell & Howell)
National Projects: AU, GE, …
ETD Initiative (and ProQuest)
Students
Learn about
DL, EPub
TDs
become more
expressive
Global TDs
become more
accessible,
archived
Universities
UMI
N. Amer. (T)Ds are
accessible, archived
Why ETD?
Short Answer
• For Students:
• Gain knowledge and skills for the Information Age
• Richer communication (digital information, multimedia, …)
• For Universities:
• Easy way to enter the digital library field and benefit thereby
• For the World:
• Global digital library – large, useful, many services
• General:
• Save time and money
• Increased visibility for all associated with research results
The Process?
Short Answer
• For Students:
• Plan on ETD from day 1
• Secure knowledge from: workshops, online info, colleagues
• Work with faculty to plan approach
• PDF? XML? TEI? Multi/hypermedia? Data sets? Viz?
• Get signed approval form: access, ©, proxy assignment
• After defense and approval, submit ETD to university
• For Universities:
• Form team
• Adapt solution from work at other universities, attend ETD
conference
• Pilot -> Option -> Requirement
Assistance
• Software, documentation, tech support
• Email, listservs ([email protected])
• UNESCO sponsored etdguide.org
• English in 2001, Spanish&French in 2002
• Training sessions in Latin America …
• Marcel Dekker book soon in press
• www.ndltd.org
Open Archives Initiative
OAI
www.openarchives.org
[email protected]
Technical Umbrella for Practical
Interoperability…
Reference
Libraries
Museums
Publishers
E-Print
Archives
…that can be exploited by different communities
The World According to OAI
Service Providers
Discovery
Current
Awareness
Data Providers
Preservation
Tiered Model of Interoperability
Mediator services
Metadata harvesting
Document models
Repository of Digital Objects
Repository
Access
Protocol
handle
terms and conditions
Digital object
OAI – Black Box Perspective
OA 7
OA 4
OA 2
OA 1
OA 3
OA 6
OA 5
OAI – Black Box Perspective
Services:
Search
Browse
Metadata:
Summarize
Visualize
OA 7
OA 4
OA 2
OA 3
OA 1
OA 6
OA 5
Docs:
DO
DO
DO
DO
DO
DO
DO
Protocol for Metadata Harvesting
• Service Requests
• Identify
• ListMetadataFormats
• ListSets
• GetRecord
• ListIdentifiers
• ListRecords
• Metadata Multiplicity
• Date/Time Ranges
• Sets (with semantics depending on local data providers)
• Resumption Tokens
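The service requests listed above are ordinary HTTP requests whose arguments are carried in the query string. The sketch below shows one way a harvester might assemble such URLs; it assumes the argument names of version 2.0 of the protocol (`metadataPrefix`, `from`, `until`, `set`, `resumptionToken`). The function name `build_request`, the base URL, the set name, and the token value are hypothetical.

```python
from urllib.parse import urlencode

VERBS = {"Identify", "ListMetadataFormats", "ListSets",
         "GetRecord", "ListIdentifiers", "ListRecords"}

def build_request(base_url, verb, **args):
    """Build the GET URL for one OAI-PMH request."""
    if verb not in VERBS:
        raise ValueError(f"unknown verb: {verb}")
    # 'from' is a Python keyword, so callers pass 'from_' instead.
    if "from_" in args:
        args["from"] = args.pop("from_")
    # A resumption token continues an earlier list request and is sent alone.
    if "resumptionToken" in args and len(args) > 1:
        raise ValueError("resumptionToken must be the only argument")
    query = [("verb", verb)] + sorted(args.items())
    return base_url + "?" + urlencode(query)

BASE = "http://example.org/oai"   # hypothetical repository base URL

# First request: Dublin Core records from one set, within a date range.
first_page = build_request(BASE, "ListRecords", metadataPrefix="oai_dc",
                           from_="2003-01-01", until="2003-05-24", set="etd")

# Follow-up request: flow control via the token returned with the first page.
next_page = build_request(BASE, "ListRecords", resumptionToken="token-123")
```

Sorting the arguments is only a convenience that makes the generated URLs predictable; the order of query parameters carries no meaning.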
Key Features of the OAI
Metadata Harvesting Protocol
• definitions &
concepts
• repository
• record
• identifier
• datestamp
• set
• protocol features
• HTTP encoding
• metadata prefix &
schema
• flow control
• protocol requests
• supporting requests
• harvesting requests
[Diagram: a harvester sends OAI protocol requests to a repository; labels: repository support data, harvesting data, items]
selective harvesting - datestamps
[Diagram: harvest within date range; records selected from the repository]
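Selective harvesting by datestamp, as named above, amounts to a range test on each record. The following is a minimal in-memory sketch, assuming inclusive bounds and day-level datestamps; the sample records and the function name `harvest` are hypothetical and do not describe any particular repository.

```python
from datetime import date

# A toy repository: each record carries an identifier and a datestamp.
RECORDS = [
    {"identifier": "etd-1", "datestamp": date(2003, 1, 15)},
    {"identifier": "etd-2", "datestamp": date(2003, 3, 2)},
    {"identifier": "etd-3", "datestamp": date(2003, 5, 20)},
]

def harvest(records, from_date=None, until_date=None):
    """Return identifiers of records whose datestamp lies in [from_date, until_date]."""
    selected = []
    for record in records:
        stamp = record["datestamp"]
        if from_date is not None and stamp < from_date:
            continue
        if until_date is not None and stamp > until_date:
            continue
        selected.append(record["identifier"])
    return selected

# An incremental harvest: only records added or changed since the last visit.
since_february = harvest(RECORDS, from_date=date(2003, 2, 1))

# A bounded harvest: records stamped within the first quarter of 2003.
first_quarter = harvest(RECORDS, from_date=date(2003, 1, 1),
                        until_date=date(2003, 3, 31))
```

A harvester that remembers the date of its last visit can pass it as `from_date` next time and so avoid fetching the whole collection again.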
DL Components
Gateways
MM/ HT Renderer
User Interfaces
Workflow Mgr
Search Engines, Classifiers, …
DBMS
Rights Mgr
Data, MM Info
Repository
Open Digital Library (ODL)
Hypothesis (Hussein Suleman)
• Can we leverage the successful model of the OAI
Protocol for Metadata Harvesting to alleviate our
architectural problems ?
Maybe … if
Digital Libraries can be modeled as
• networks of extended Open Archives, where
• each extended Open Archive is a
• source of data and/or a provider of services.
[Figure sequence: users and digital objects (Document, Video, Program, Image), with "?" marking the unspecified links between them; a componentized digital library; an open digital library in which Open Archives (OA) are connected by PMH and XPMH]
Component System Approach
• (Open) DL = Network of Extended OAs
Data Input
Local Archive
Resource Discovery
Search
Browse
Recommend
Metadata Repository
legend
Remote Archive
User Interface
OAI/ODL archive
OAI/ODL protocol
Example Architecture (NDLTD)
Virginia Tech
User Interface
PhysNet
Humboldt
Search
Browse
Recent
Duisburg
CalTech
Union Catalog
MIT Filter
MIT
legend
Dresden
User Interface
OAI/ODL archive
OAI/ODL protocol
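The example architecture above places a filter in front of one archive and merges several archives into a union catalog. A rough sketch of those two roles follows. What the MIT filter actually selects is not stated in the slides, so the predicate used here (keep only theses and dissertations) is an assumption, as are the function names and the sample records.

```python
def union(*archives):
    """Merge records from several archives, keeping one copy per identifier."""
    merged = {}
    for archive in archives:
        for record in archive:
            merged.setdefault(record["id"], record)
    return list(merged.values())

def filter_archive(archive, keep):
    """Pass on only the records for which keep(record) is true."""
    return [record for record in archive if keep(record)]

# Hypothetical sample data standing in for harvested metadata.
vt = [{"id": "vt-1", "type": "thesis"}, {"id": "vt-2", "type": "dissertation"}]
mit = [{"id": "mit-1", "type": "thesis"}, {"id": "mit-2", "type": "report"}]

# Assumed filter rule: drop anything that is not a thesis or dissertation.
mit_etds = filter_archive(mit, lambda r: r["type"] in {"thesis", "dissertation"})

# The union catalog then serves as the single source for search and browse.
catalog = union(vt, mit_etds)
catalog_ids = [record["id"] for record in catalog]
```

Because both functions take and return plain lists of records, they can be chained in any order, which is the property the component approach relies on.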
ODL Demonstration - FrontPage
ODL Component Requirements
• Search
• Retrieve a list of items
• Index new items
• Annotate
• Add annotation to item
• Retrieve a list of annotations for an item
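The component requirements above can be read as two small interfaces. The sketch below implements them in memory to make the operations concrete; it is not the ODL software, and the class names, method names, and sample items are hypothetical.

```python
from collections import defaultdict

class SearchComponent:
    """Toy search service: index new items, retrieve a list of items."""

    def __init__(self):
        self.index = defaultdict(set)   # word -> identifiers of items containing it

    def index_item(self, item_id, text):
        for word in text.lower().split():
            self.index[word].add(item_id)

    def search(self, query):
        """Return identifiers of items containing every word of the query."""
        words = query.lower().split()
        if not words:
            return []
        hits = set.intersection(*(self.index.get(w, set()) for w in words))
        return sorted(hits)

class AnnotateComponent:
    """Toy annotation service: add an annotation, list annotations for an item."""

    def __init__(self):
        self.annotations = defaultdict(list)

    def annotate(self, item_id, note):
        self.annotations[item_id].append(note)

    def list_annotations(self, item_id):
        return list(self.annotations.get(item_id, []))

searcher = SearchComponent()
searcher.index_item("etd-1", "Digital libraries for theses")
searcher.index_item("etd-2", "Digital video retrieval")
results = searcher.search("digital theses")

notes = AnnotateComponent()
notes.annotate("etd-1", "Useful literature review")
etd1_notes = notes.list_annotations("etd-1")
```

Each class exposes exactly the operations named in the requirements, so either one could be replaced by a networked service without changing the code that calls it.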
Open Digital Library Components
• Running now
• XML-File (data provider from file system)
• Union, search, browse, recent, filter
• E-journal/review, Submit, Edit, Annotation
• Class projects
• High performance multilingual search
• Recommender, Rating; Mirroring (see JCDL’02)
• Working with NCSA: from DB, unstructured text
• Others discussed
• Classification/categorization
• DL-Viz interconnection (VIDI – Jun Wang ETD)
Open Digital Library: Extended
As What’s
New Service
Provider
What’s
New
Engine
XML File
Coll. & Data
Provider 1
XML File
Coll. & Data
Provider 2
XML File
Coll. & Data
Provider 3
As Metadata
Search Service
Provider
IRDB-1
Search
Engine
As Metadata
Browse Service
Provider
DBBrowse
Browse
Engine
As Recommend
& Rate Service
Provider
Recommend
Rate
Engine
DBUnion Archive Merger Component
Harvest from
As Annotation
Search Service
Provider
Annotation
Engine
data providers
Filter
OAI-PMH
Data Provider
Submit
Archive
IRDB-2
Search
Engine
OAIB (NCSA:
from RDBMS)
Example Open Digital Library
ODLRecent
Document
Document
ETD-1
Recent
USER INTERFACE
Students and
researchers
ODLUnion
PMH
Filter
PMH
ODLUnion
Browse
Union
PMH
ODLBrowse
ODLUnion
Search
ODLSearch
Program
Program
ETD-2
PMH
Filter
PMH
Image
Image
ETD-3
Video
Video
ETD-4
Digital Library for the Networked Digital Library
of Theses and Dissertations (www.ndltd.org)
ETD collections
Example Open Digital Library
USER INTERFACE
Box:
Users
Box:
Reviews
DBReview
Box:
Accepted
Resources
Box:
Resources
under Review
Thread
DBRate
Suggest
DBUnion:
Metadata
Union
User Interface
OAI/ODL component
OAI/ODL protocol
IRDB
DBBrowse
DBUnion:
Legacy
Metadata
Digital Library for the
Computer Science Teaching Center (www.cstc.org)
Digital Library in a Box
• Domain: helping DL projects
• Genre: any domain, but especially those
involved in NSDL (since it is funded in part
through NSDL, with U. FL, NCSA)
• Software and Documentation:
http://dlbox.nudl.org
DL Standardized Log Format- Design
5S | Definition | Use in Log Design
Streams | Represent static and dynamic multimedia content | Temporal events, types of digital objects
Structures | Labeled directed graphs; provide organization within the DL | Structured documents and metadata; structured searches, collection, metadata catalog; hypertext, classification scheme
Spaces | Sets, properties and operations on those sets | Retrieval mode, presentation information
Scenarios | Sequences of events that modify states of a computation in order to accomplish some functional requirement | Organization of the user and system actions into transactions, statements, events and actions; DL services as sets of scenarios
Societies | Sets of communities and relationships among them | User information
ETDs and Libraries
Gail McMillan
Digital Library and Archives, University Libraries
Virginia Polytechnic Institute and State University
Ohio State University/Virginia Tech Video Conference
October 24, 2002
Goals for Libraries and Archives
• Improve services
• Better turn-around time
• Always available
• Reduce work (save $)
• Catalog from etext
• Eliminate handling
• Save space
ETDs at Virginia Tech
• Partnership: Library, Graduate School, and
Faculty
• Approved by university governance: Mar. 1996
• Full implementation: Jan. 1997
• Web submission
• Students: http://etd.vt.edu
• Programmers: http://scholar.lib.vt.edu/ETD-db/
• Workshops for students (and faculty)
• Over 5000 ETDs approved
How are ETDs managed?
• Graduate student creates ETD
• Word processor, multimedia
• Saves as PDF, usually
• Graduate student submits ETD
• Directly to library server/permanent archive
• Archiving fee replaces binding fee
• Graduate School approves
• E-mails author, advisor, UMI (VT scripts)
• Authors/advisors prescribe Internet access
• Library catalogs and archives
• UMI downloads
http://scholar.lib.vt.edu/theses/available/etd-2227102539751141/
Library Resources
• Hardware: server
• Maintenance and security
• Started small: NeXT 3.3 (HP; 1989-97)
• Grew: Sun dual-processor Enterprise 250--Solaris 2.7 (Apache
web server)
• Software
• Submission scripts written by DLA
• Includes e-mail notifications to authors, advisors, UMI
• Use it too: http://scholar.lib.vt.edu/ETD-db/
• Log files analyzed with Analog
• Survey scripts written by DLA
• Data from authors and readers
• Use it too: http://lumiere.lib.vt.edu/surveys/
• Search Engine
• Started small: freeWAIS >> Grew: InfoSeek’s ULTRASEEK
Financial Concerns
• At VT: start-up costs = $0
• On-hand staff, equipment, software, freeware
• From zero base: estimate $65,000
• $24,000 Staff (part time)
• $36,000 Equipment
• $15,000 Software
http://scholar.lib.vt.edu/theses/data/setup.html
Costs/Savings at VT
• Graduate School stopped shipping to the library 3000
copies of paper TDs/year
• Library stopped handling (e.g., shipping, binding,
shelving, and circulating) 3000 copies of TDs/year
• 166 ft of shelf space saved yearly by the Library
• VT used existing equipment in Library (vs. start-up
costs for staff, hardware and software)
Digital Library Benefits:
Low margin, high use
• Incorporate ETDs with other digital library activities
• Ejournals, online class materials, digital images, etc.
• Additional equipment, staff may not be necessary
• http://scholar.lib.vt.edu/theses/data/setup.html
• Use VT programs, scripts, etc.
• http://scholar.lib.vt.edu/ETD-db/
• Online accesses vs. circulation of copies
• VT theses 1990-1994, combined average circulation per
copy: 2.24/yr
• VT dissertations 1990-1994, combined average circulation
per copy: 3.2/yr
Access to VT’s ETDs
http://scholar.lib.vt.edu/theses/
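[Chart: ETD files requested and abstracts requested, by academic year]
• 1997/98: 231,709 ETD files requested; 165,710 abstracts requested
• 1998/99: 483,030 ETD files requested; 215,493 abstracts requested
• 1999/00: 578,152 ETD files requested; 260,699 abstracts requested
• 2000/01: 2,173,420 ETD files requested; 573,149 abstracts requested
• 2001/02: 4,497,199 ETD files requested; 471,917 abstracts requested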
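The growth in the request figures can be worked out directly. This is a small sketch under two assumptions: the first number under each year is "ETD files requested" and the second is "Abstracts requested", and the repeated "1997/98" label stands for 1998/99.

```python
# Requests per academic year, as read from the chart data.
# Assumption: (year, ETD files requested, abstracts requested);
# the second "1997/98" label in the source is taken to mean 1998/99.
requests = [
    ("1997/98", 231_709, 165_710),
    ("1998/99", 483_030, 215_493),
    ("1999/00", 578_152, 260_699),
    ("2000/01", 2_173_420, 573_149),
    ("2001/02", 4_497_199, 471_917),
]


def growth_factors(rows):
    """Year-over-year ratio of ETD file requests, keyed by the later year."""
    return {
        later[0]: later[1] / earlier[1]
        for earlier, later in zip(rows, rows[1:])
    }


file_growth = growth_factors(requests)
overall = requests[-1][1] / requests[0][1]

for year, factor in file_growth.items():
    print(f"{year}: x{factor:.2f} over the previous year")
print(f"1997/98 to 2001/02: x{overall:.1f}")
```

Under those assumptions, file requests roughly doubled in the final year and grew close to twentyfold over the whole period.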
Why are ETDs so popular?
• User surveys
• 67% found VT ETDs easily
• 61% found them by searching
• 22% browsed by department
• 16% browsed by author
• 53% downloaded 1 or more ETDs
• Author surveys
• Conversion and submission processes less difficult than
anticipated
• Over half plan to publish articles from their ETDs
• Why did they restrict access?
http://lumiere.lib.vt.edu/surveys/
Availability of 4224 VT ETDs
• Available/Unrestricted: 52.6%
• Withheld: 17.2%
• Mixed: 2.9%
• Restricted VT-only: 27.3%
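The percentages can be turned back into approximate counts. The sketch below assumes the shares apply to the 4224 ETDs in the slide title; the rounded counts are estimates, not figures reported by the source.

```python
TOTAL_ETDS = 4224

# Availability shares from the slide, in percent.
shares = {
    "Available/Unrestricted": 52.6,
    "Withheld": 17.2,
    "Mixed": 2.9,
    "Restricted VT-only": 27.3,
}

# Approximate number of ETDs in each category (rounded to whole ETDs).
counts = {label: round(TOTAL_ETDS * pct / 100) for label, pct in shares.items()}

for label, count in counts.items():
    print(f"{label}: about {count} ETDs")
print(f"shares sum to {sum(shares.values()):.1f}%")
```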
Reasons for Restricted Access
• Advice of faculty: 48.58%
• Advice of publisher: 4.17%
• Patent pending: 3.23%
• Personal choice: 25.81%
• Advice of others: 9.49%
• Other reasons: 8.73%
ETDs and Accessibility
• Inaccessible ETDs
• Patents pending
• Future publication fears
• Broken links
• Quality of work remains
• Similar to out-of-print articles
• Media standards
• Open source software (e.g., PDF reader)
• Typical commercial software
• Few esoteric programs, include original scripts
ETDs and Publishing
• Early controversies waning
• Faculty: prior publication?
• Protective of future academics
• Surveys of publishers
• Largely no specific policies
• Consider submissions individually
• VT ETD Alumni
• None had problems getting published
• Authors
• Retain some rights, e.g., link to curriculum vitae,
online course materials
ETDs and Copyright
• Author’s rights
• Reproduction, modification, distribution, public
performance, public display
• Retain rights
• Share non-exclusive rights
• Permit library to store and to provide access
• Publishers
• Author’s obligations: fair use
• Balance factors or get permission
• Notification: optional
Copyright 2002 by Gail McMillan ALL RIGHTS RESERVED
• Registration: optional
• Possibly receive greater compensation, with less
documentation, if filing an infringement lawsuit
ETDs and Long-term Preservation
• Concerns: Access without paper
• Long term preservation
• Standard multimedia formats
• PDF Reader: an open source
• http://scholar.lib.vt.edu/theses/archive.html
• Addressed Concerns
• Cooperatives
• OhioLink
• Why not: OCLC, NDLTD?
• Commercial options
• UMI: traditional microfilming
• Frequent, regular back-ups available on-site and off-site
Ensuring Access to VT ETDs
• Every 15 minutes back-ups made of newest, not-yet-approved submissions
• Hourly back-ups of newly approved ETDs
• Weekly back-ups of entire ETD collection
• Multiple copies stored on-site and off-site
• NDLTD: let’s reciprocate, cooperative mirroring
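The tiered back-up schedule above can be expressed as a small table of intervals. This is a minimal sketch, assuming hypothetical tier names and a simple "is this tier due?" check; it does not describe the scripts actually used at VT.

```python
from datetime import datetime, timedelta

# Back-up tiers and intervals from the slide; the tier names are hypothetical.
BACKUP_INTERVALS = {
    "pending_submissions": timedelta(minutes=15),
    "newly_approved": timedelta(hours=1),
    "entire_collection": timedelta(weeks=1),
}


def tiers_due(last_run, now):
    """Return the tiers whose interval has elapsed since their last back-up."""
    return [
        tier
        for tier, interval in BACKUP_INTERVALS.items()
        if now - last_run[tier] >= interval
    ]


# Hypothetical state: when each tier was last backed up.
now = datetime(2003, 5, 21, 12, 0)
last_run = {
    "pending_submissions": now - timedelta(minutes=20),
    "newly_approved": now - timedelta(minutes=30),
    "entire_collection": now - timedelta(days=8),
}
due = tiers_due(last_run, now)
print(due)
```

Storing multiple copies on-site and off-site would be a separate step applied to each back-up that this check triggers.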
Lessons from ETDs
• Implementation of new formats slower than expected
• Text oriented
• Not planning for online readers
• If you build it, it will get used.
• Access exceeded expectations
• Disappointing number are inaccessible
• Remarkable increase in exposure to graduate student
research
• Growth in institutions requiring ETDs slower than expected
• No longer experimental
• Increase in number and diversity of NDLTD institutions
Available at VT
• Information
http://scholar.lib.vt.edu/theses
• Automated submission system ready for
customization
http://scholar.lib.vt.edu/ETD-db/
• Student guidelines, training materials, FAQ's,
multimedia educational materials
http://etd.vt.edu
• NDLTD: Network educational institutions
• Annual conferences: Berlin 2003, U of Kentucky 2004
http://www.ndltd.org
Union Catalog
(with
Vinod Chachra,
Thom Hickey)
NDLTD Union Catalog Architecture
VT ODL Demo
Search/Browse
SRU/SRW
(search) OAI-PMH
TD OAI
ETD OAI
OCLC
Repository
Repository
OAI-PMH
Virtua
VTLS
Union
Catalog
OAI-PMH
OAI-PMH
WorldCat
Try:
Z39.50
harvest
20+ sites
email FTP
Union Catalog Creation
Name Authority
Service
(e.g. OCLC)
NDLTD Central
VTLS Union
Catalog
NDLTD Site / Member
Librarian
Verification /
Validation /
Enrichment /
Maintenance
Student
Entry
OAI
Server
Local DB
MARIAN
Union
Catalog
Virtua
MARC DB
OAI
Harvester
Conversion
Local
Search /
Browse
Alternate MARC
Transport (ftp? tapes?)
OCLC Capabilities
• Harvesting
• OAI-PMH versions 1.1 and 2.0
• Harvestable sets
• Sets by institution
• Searching
• SRU (Z39.50 on the Web)
• VTLS
• Virginia Tech Open Digital Library demo
• Unicode support
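Harvesting by set and searching over SRU both come down to plain HTTP requests with well-known parameters. The sketch below builds such request URLs; the base URLs, the set name "institution:vt", and the SRU version value are assumptions for illustration, not addresses or settings taken from the slides.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request, optionally limited to one set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def sru_search_url(base_url, cql_query, maximum_records=10):
    """Build an SRU searchRetrieve request carrying a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",  # assumed protocol version
        "query": cql_query,
        "maximumRecords": maximum_records,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoints and set name.
harvest_url = list_records_url("http://example.org/oai", set_spec="institution:vt")
search_url = sru_search_url("http://example.org/sru", 'dc.title = "digital library"')
print(harvest_url)
print(search_url)
```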
OCLC Statistics
• 19 Sources
• 61,998 records
• Probably some overlap
• Adding 1-2 new sites/month
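Because records from different sources probably overlap, a union catalog needs some rule for merging duplicates. The following sketch assumes records are matched on a normalized (title, author) pair; that rule, the function name, and the sample data are all hypothetical, and a production catalog would need a stronger match.

```python
def merge_sources(sources):
    """Merge per-source record lists, keeping the first copy of each record.

    Assumption: two records are the same ETD when their title and author
    agree after lowercasing and collapsing whitespace.
    """
    merged = {}
    for source_name, records in sources.items():
        for record in records:
            key = (
                " ".join(record["title"].lower().split()),
                " ".join(record["author"].lower().split()),
            )
            # setdefault keeps the record from the first source that supplied it.
            merged.setdefault(key, {**record, "source": source_name})
    return list(merged.values())


# Hypothetical sample data with one overlapping record.
sources = {
    "site-a": [{"title": "Sample Thesis One", "author": "A. Student"}],
    "site-b": [
        {"title": "sample  thesis one", "author": "a. student"},
        {"title": "Sample Thesis Two", "author": "B. Student"},
    ],
}
union = merge_sources(sources)
print(len(union), "unique records from",
      sum(len(r) for r in sources.values()), "harvested")
```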
OCLC Metadata Formats
• Dublin Core – All
• ETDMS – 9
• MARC – 5
Complex to Simple
MARC ($50)
Dublin Core (DC)
+
thesis
ETD-MS
• ETD Metadata Standard
• XML-encoded metadata standard
(content and encoding) for Electronic
Theses and Dissertations (ETDs)
• in part conforming to Dublin Core (DC)
• using UNICODE
• (optionally / later using RDF)
• Well specified relationship with MARC
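To make the idea of "Dublin Core plus thesis" concrete, here is a sketch that builds a small XML record with a thesis-specific degree block and Unicode text. The element names reflect one reading of ETD-MS and are assumptions here; namespaces are omitted and nothing is validated against the actual schema.

```python
import xml.etree.ElementTree as ET


def build_record(title, creator, degree_name, grantor):
    """Build a simplified ETD-MS-style record (illustrative element names)."""
    thesis = ET.Element("thesis")
    # Dublin Core style elements.
    ET.SubElement(thesis, "title").text = title
    ET.SubElement(thesis, "creator").text = creator
    ET.SubElement(thesis, "type").text = "Electronic Thesis or Dissertation"
    # Thesis-specific addition: information about the degree.
    degree = ET.SubElement(thesis, "degree")
    ET.SubElement(degree, "name").text = degree_name
    ET.SubElement(degree, "grantor").text = grantor
    return thesis


# Hypothetical sample values; the title shows non-ASCII text surviving as UTF-8.
record = build_record(
    "Ein Beispiel für Unicode-Titel", "Student, Sample", "PhD", "Example University"
)
xml_bytes = ET.tostring(record, encoding="utf-8")
print(xml_bytes.decode("utf-8"))
```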
NDLTD Members and ETD-MS
• NDLTD members
• Share metadata for their ETDs
• Providing that in either ETD-MS
• Or if they use a version of MARC locally, work
to have that eventually shared in either
MARC21 or UNIMARC
• Run OAI, either locally or in consortia, so
their metadata can be harvested, according
to necessary terms and conditions
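Once a member runs OAI, a harvester reads its ListRecords responses and follows the resumption token to the next page. The sketch below parses one such response; the sample XML, identifiers, and token value are made up for illustration, and a real response would also carry the metadata for each record.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical, heavily trimmed ListRecords response.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header>
      <identifier>oai:example.org:etd-1</identifier>
      <datestamp>2003-05-01</datestamp>
    </header></record>
    <record><header>
      <identifier>oai:example.org:etd-2</identifier>
      <datestamp>2003-05-02</datestamp>
    </header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An absent or empty token means there are no further pages.
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


identifiers, token = parse_list_records(SAMPLE_RESPONSE)
print(identifiers, token)
```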
The OAI Static Repository Model
• Components of the model
• The static repository
• A well-defined, structured XML file with
information similar to that in OAI-PMH responses
• Accessible at a persistent network-location
• The static repository gateway
• makes one or more Static Repositories harvestable.
• assigns a unique base URL to each such Static
Repository
• Responding to OAI-PMH requests
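The gateway's job of giving each static repository its own base URL can be sketched as a simple mapping. The URL layout used here (gateway URL followed by the static repository's host and path) and all names and addresses are assumptions for illustration.

```python
from urllib.parse import urlsplit


class StaticRepositoryGateway:
    """Toy gateway: maps one unique base URL to each static repository file."""

    def __init__(self, gateway_url):
        self.gateway_url = gateway_url.rstrip("/")
        self._repositories = {}  # base URL -> static repository URL

    def register(self, static_repo_url):
        # Assumed layout: gateway URL + host + path of the static XML file.
        parts = urlsplit(static_repo_url)
        base_url = f"{self.gateway_url}/{parts.netloc}{parts.path}"
        self._repositories[base_url] = static_repo_url
        return base_url

    def resolve(self, base_url):
        """Find the static file that backs requests sent to this base URL."""
        return self._repositories[base_url]


# Hypothetical gateway and static repository locations.
gateway = StaticRepositoryGateway("http://gateway.example.org/oai/")
base = gateway.register("http://library.example.edu/etd/static.xml")
print(base)
print(gateway.resolve(base))
```

A harvester would then send ordinary OAI-PMH requests to the returned base URL, and the gateway would answer them from the contents of the static file.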
The OAI Static Repository Model
NDLTD Union Catalog Statistics
1. Participating Countries
• So far ETDs from 7 countries are included in the database.
• Canada
• Germany
• Greece
• Korea
• Portugal
• Spain
• U.S.
• UK to be added by June 30, 2002.
• Brazil to be added soon.
NDLTD Union Catalog Statistics
2. Interface Languages in Union Catalog
• The language here is the language of the interface
• The VTLS NDLTD Union Catalog has 14 languages:
• English, Arabic, Catalan, Chinese
• French, German, Hebrew, Korean
• Polish, Portuguese, Russian, Slovak
• Spanish and Swedish
• Example follows
German
NDLTD Union Catalog Statistics
3. Languages in the Union Catalog
• The language here is the language of the content of the ETD
• The VTLS NDLTD Union Catalog has data in 6 different
languages. These are:
• English
• German
• Greek
• Korean
• Portuguese
• Spanish
• Examples follow
Language = German; hits = 137
Full record display
Language = Greek
In Greek
In English
Other Topics
• Extended services: linking
• Retrospective conversion
• Z39.50
• Requiring ETDs
• …
Collaborative
Development
(Joan Lippincott)
Why Collaboration?
• Expertise in aspects of the digital
environment
• Pooling of resources
Collaboration and digital projects
• Distributed systems
• Digital course content
• Digital library resources
• Delivery of services
• Development of policies
Collaborations involve:
• Shared goals
• Common vision
• Shared vocabulary
Two views of an ETD program
• Have staff scan
• Implement now
• Increase university visibility
• Teach students to write and submit ETDs
• Implement soon
• Develop electronic authors
In a collaboration...
• Each contributes resources
• Partners acknowledge and value
contributions
• Partners develop a clear process
• Group and individual accountability
ETD project participants
• Academic administrators
• Faculty
• Students
• Staff
• Graduate school / provost / registrar
• Information technologists
• Librarians
Collaboration and NDLTD
• Common goals of members
• Diverse sets of skills and expertise
• Need for strategies and tactics to
surmount any problems -> advocacy
Collaborative project strategy
• Champion initiates project
• Leadership establishes initial goal and
parameters
• Issue a call for participants
• Conduct procedure to select
participants
Collaborative project strategy
• Initial meeting
• Develop shared goals
• Develop clear process
• Continue work at institutions
• Establish communication channels
• Establish project milestones
• Evaluate progress, refine approach
Collaborative project strategy
• Disseminate results
• Online documentation
• In-person event
• Disseminate a product
• Regional workshops
• Session at ETD 20XX
NDLTD project areas
• Training materials
• Promotional materials
• Identify and recommend standards
• Local, national, regional policies
Your
Plans
(Ana Pavani)