Streams, Structures, Spaces, Scenarios, and Societies (5S


Virtual Day on Digital Theses
5 October 2007 – Mexico
Networked Digital Library of Theses and Dissertations (NDLTD) – www.ndltd.org
Edward A. Fox, Executive Director
Gail McMillan, Secretary
Ryan Richardson, PostDoc
Venkat Srinivasan, Graduate Research Asst.
http://fox.cs.vt.edu/talks/2007/20071005MexicoNDLTD.ppt
Acknowledgements (selected)
• Colleagues: Tony Atkins, Lillian Cassel,
Debra Dudley, John Eaton, Lourdes
Fernandez, Marcos Gonçalves, Ming Luo,
Silvia González Marín, Uma Murthy, Doug
Oard, Alfredo Sanchez, Craig Scott, Hussein
Suleman, Alberto Castro Thompson, …
• Sponsors: Dept. of Education (FIPSE), DFG,
Elsevier, Google, IBM, IMLS, Microsoft, NSF
(DUE-0121679, IIS-9986089, 0080748,
0086227, 0535057), OCLC, RDEC/ACE,
SOLINET, SUN, SURA, VTLS, …
Digital Libraries & ETDs
• Domain: graduate
education, research
• Genre: ETDs = electronic
theses & dissertations
• Benefits: ETD creators
develop lifelong skills
with DLs. Students,
faculty, departments, &
universities save money
and gain visibility.
Project: Networked Digital Library of Theses & Dissertations (NDLTD), http://www.ndltd.org
Importance of ETDs
• Open access is natural and highly
effective. It levels the playing field, making
research from every nation and university
equally visible.
• Promotes scholarship and understanding
since research details are widely shared.
• Quantity of content is comparable to that
of the journal publishing enterprise.
• Can leverage “electronic” for flexibility,
expressivity, savings, and preservation.
Main Points
1. NDLTD was launched in 1996 to help with
ETD activities worldwide.
2. It is a member organization, so we urge all
who are interested in digital theses to join.
3. Visible results, e.g., ETDs from Mexico
accessible from the NDLTD Union Catalog
(and then through Scirus, …), show that
working together helps everyone.
4. NDLTD helps with training/education,
conferences, standards, technologies,
research, and leadership.
What are we doing?
• Aiding universities to enhance
graduate education, publishing, and
IPR efforts
• Helping improve the availability and
content of theses and dissertations
• Educating ALL future scholars so they
can publish electronically and
effectively use digital libraries (i.e., are
Information Literate and can be more
expressive)
Digital Library Content
Content types (with examples):
• Text documents: articles, reports, books
• Video
• Audio: speech, music
• Geographic information: (aerial) photos
• Software, programs: models, simulations
• Bio information: genome; human, animal, plant
• Images and graphics: 2D, 3D, VR, CAT
http://scholar.lib.vt.edu/theses/available/etd-2227102539751141/
Digital Objects (DOs)
• Born digital
– Word processors (e.g., Word)
– XML, LaTeX, BibTeX, and other processors
– Multimedia authoring and capture tools
• Digitized version of “real” object
– Scanners, cameras, MRI, …
– 3D models, datasets, …
• Renderings for presentation, preservation
– PDF/A
– ORE (Object Reuse and Exchange)
Metadata Objects (MDOs)
• Dublin Core, and extension to ETD-MS
• RDF
• OAI (Open Archives Initiative) sharing
• MARC
• Crosswalks, mappings
• Ontologies (to aid classification)
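To make "crosswalks, mappings" concrete, here is a minimal sketch of an ETD-MS to simple Dublin Core crosswalk. It assumes records are dictionaries keyed by ETD-MS-style names ("dc.title", "thesis.degree.grantor", ...); the table itself, and the choice to fold the degree grantor into "publisher", are illustrative assumptions, not an official NDLTD mapping.

```python
from typing import Dict, List

# Illustrative crosswalk from ETD-MS element names to unqualified Dublin Core.
# Folding thesis.degree.grantor into "publisher" is an assumption of this sketch.
ETDMS_TO_DC: Dict[str, str] = {
    "dc.title": "title",
    "dc.creator": "creator",
    "dc.subject": "subject",
    "dc.description": "description",
    "dc.publisher": "publisher",
    "dc.contributor": "contributor",
    "dc.date": "date",
    "dc.type": "type",
    "dc.format": "format",
    "dc.identifier": "identifier",
    "dc.language": "language",
    "dc.rights": "rights",
    "thesis.degree.grantor": "publisher",
}


def crosswalk(record: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map an ETD-MS-style record to simple DC; unmapped fields are dropped."""
    dc: Dict[str, List[str]] = {}
    for field, values in record.items():
        target = ETDMS_TO_DC.get(field)
        if target is None:
            continue  # e.g. thesis.degree.level has no DC slot in this sketch
        bucket = dc.setdefault(target, [])
        for value in values:
            if value not in bucket:  # avoid duplicates after folding fields
                bucket.append(value)
    return dc


if __name__ == "__main__":
    etd = {  # hypothetical record
        "dc.title": ["An Example Dissertation"],
        "dc.creator": ["Doe, Jane"],
        "dc.language": ["es"],
        "thesis.degree.name": ["Ph.D."],
        "thesis.degree.grantor": ["Example University"],
    }
    print(crosswalk(etd))
```

Degree-specific details are lost in simple DC, which is one reason a union catalog would also want the richer ETD-MS record.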
LOCKSS
• Lots of copies keep stuff safe
• Initially at Stanford (Vicky Reich)
• Initial focus on lower levels
• Initial content: journals
• Extending to ETDs (Gail McMillan)
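The "lots of copies" idea can be illustrated with a deliberately simplified sketch: hash every copy of a file, treat a strict-majority hash as the good version, and flag the rest as damaged. LOCKSS itself runs polls among peer libraries rather than hashing centrally like this; the site names and contents below are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def digest(content: bytes) -> str:
    """SHA-256 fingerprint of one stored copy."""
    return hashlib.sha256(content).hexdigest()


def find_damaged(replicas: Dict[str, bytes]) -> List[str]:
    """Names of replicas whose hash disagrees with a strict-majority hash.

    Simplification: all copies are hashed and compared in one place, whereas
    LOCKSS peers vote in polls among themselves.
    """
    hashes = {site: digest(data) for site, data in replicas.items()}
    winner, votes = Counter(hashes.values()).most_common(1)[0]
    if votes * 2 <= len(replicas):
        raise ValueError("no strict majority; cannot tell which copy is good")
    return sorted(site for site, h in hashes.items() if h != winner)


def repair(replicas: Dict[str, bytes]) -> Dict[str, bytes]:
    """Return a new mapping where damaged copies are replaced by a good one."""
    damaged = set(find_damaged(replicas))
    good = next(data for site, data in replicas.items() if site not in damaged)
    return {site: (good if site in damaged else data) for site, data in replicas.items()}


if __name__ == "__main__":
    copies = {  # hypothetical sites holding the same ETD file
        "site_a": b"%PDF-1.4 thesis",
        "site_b": b"%PDF-1.4 thesis",
        "site_c": b"%PDF-1.4 thes!s",  # simulated bit rot
    }
    print(find_damaged(copies))  # ['site_c']
```

With only two copies that disagree there is no majority, which is why more copies make stuff safer.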
OAI - Open Archives Initiative
• www.openarchives.org
• Advocacy for interoperability
• Standard for transferring metadata among
digital libraries
– Protocol for Metadata Harvesting (PMH)
• Standard for handling compound/complex
objects like ETDs
– ORE
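As a sketch of how a harvester uses OAI-PMH, the code below issues ListRecords requests and follows resumption tokens until the repository signals the end of the list. The base URL is a placeholder and only record identifiers are collected; error handling and flow control are reduced to a comment.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Tuple

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, token: Optional[str] = None,
                     metadata_prefix: str = "oai_dc") -> str:
    """Build a ListRecords request; resumptionToken is an exclusive argument."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_page(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return the record identifiers on one page and the next token (or None)."""
    root = ET.fromstring(xml_text)
    ids = [el.text or "" for el in
           root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    tok = root.find(".//oai:resumptionToken", OAI_NS)
    # An empty <resumptionToken/> marks the last page of the list.
    return ids, (tok.text if tok is not None and tok.text else None)


def fetch_url(url: str) -> str:
    """Plain HTTP GET; a real harvester would add retries and honor 503 Retry-After."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def harvest(base_url: str, fetch: Callable[[str], str] = fetch_url) -> List[str]:
    """Follow resumption tokens until the repository reports no more pages."""
    ids: List[str] = []
    token: Optional[str] = None
    while True:
        page_ids, token = parse_page(fetch(list_records_url(base_url, token)))
        ids.extend(page_ids)
        if token is None:
            return ids
```

Passing `fetch` as a parameter keeps the protocol logic separate from HTTP, so the loop can be tried on canned responses before pointing it at a live endpoint such as `harvest("http://example.org/oai")` (hypothetical URL).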
OAI – Repository Perspective
[Figure: a repository holding digital objects (DOs) and metadata objects (MDOs); required: the OAI protocol]
OAI – Black Box Perspective
[Figure: open archives OA 1 to OA 7 viewed as black boxes]
The World According to OAI
• Service Providers: Discovery, Current Awareness, Preservation
• Data Providers
Software Options
• ETD-db (Virginia Tech; also in Spanish)
– Customized into ADT solution (Australia)
– http://scholar.lib.vt.edu/theses/presentations/ETDdb4Uppsala2007.ppt -> future
• Many local / commercial solutions
• Digital libraries or institutional repositories
– Eprints, Greenstone, Fedora/Fez, …
– DSpace (MIT, HP Labs)
Institutional Repositories - 1
• “Institutional repositories are digital
collections that capture and preserve the
intellectual output of a single university or
a multiple institution community of colleges
and universities.”
• Crow, R. “Institutional repository checklist
and resource guide”, SPARC, Washington,
D.C., USA
• www.arl.org/sparc/IR/IR_Guide_v1.pdf
Institutional Repositories - 2
• “A university-based institutional repository is a set
of services that a university offers to the members
of its community for the management and
dissemination of digital materials created by the
institution and its community members. It is most
essentially an organizational commitment to the
stewardship of these digital materials, including
long-term preservation where appropriate, as well
as organization and access or distribution.”
• Lynch, C.A. In ARL Bimonthly Report 226, pp. 1-7,
Feb. 2003, www.arl.org/newsltr/226/ir.html
Software Issues
• Be sure:
– Can export metadata using OAI-PMH
– Is a sustainable solution
– Allows open access and preservation
• Request support for
– Flexible workflow management
• Scope: just ETDs <-> institutional memory
• Scope: time coverage -- authoring,
reviewing, submission, defense
presentation
Student Gets Committee Signatures and Submits ETD
[Figure: workflow. Student submits the signed ETD -> Grad School/Library/IT -> Library catalogs the ETD and access is opened to the new research -> WWW]
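The submission workflow above can be modeled as a small state machine, which is one way software could offer the "flexible workflow management" requested earlier. The state names and the allowed transitions (including returning a submission for revision) are hypothetical; real workflows add steps such as embargoes for restricted works.

```python
from enum import Enum, auto
from typing import Dict, List, Set


class ETDState(Enum):
    DRAFT = auto()
    SUBMITTED = auto()  # student has committee signatures and submits the ETD
    APPROVED = auto()   # Grad School / Library / IT have checked the signed ETD
    CATALOGED = auto()  # library has cataloged the ETD
    OPEN = auto()       # access is opened on the WWW


# Hypothetical transition table; a SUBMITTED ETD may be sent back for revision.
TRANSITIONS: Dict[ETDState, Set[ETDState]] = {
    ETDState.DRAFT: {ETDState.SUBMITTED},
    ETDState.SUBMITTED: {ETDState.APPROVED, ETDState.DRAFT},
    ETDState.APPROVED: {ETDState.CATALOGED},
    ETDState.CATALOGED: {ETDState.OPEN},
    ETDState.OPEN: set(),
}


class ETD:
    def __init__(self, title: str) -> None:
        self.title = title
        self.state = ETDState.DRAFT
        self.history: List[ETDState] = [ETDState.DRAFT]

    def advance(self, new_state: ETDState) -> None:
        """Move to new_state, rejecting transitions the workflow does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state.name} to {new_state.name}")
        self.state = new_state
        self.history.append(new_state)
```

Keeping the transitions in a table rather than in code makes it easy for each institution to adjust the steps without rewriting the logic.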
NDLTD Union Catalog: OCLC
• http://alcme.oclc.org/ndltd/servlet/OAIHandler?verb=ListSets (sets of ETDs)
• Gets data from WorldCat (so, from many sites!).
• Will harvest from all others who contact OCLC.
• Contributors need to supply DC and either ETD-MS or MARC.
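A ListSets request like the one above returns XML listing the sets of ETDs. The sketch below extracts (setSpec, setName) pairs and surfaces OAI-PMH error responses; it does not follow resumption tokens, which a long set list may also use. The 2007 OCLC endpoint may no longer be live, so the function is written to work on any ListSets response text or bytes.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple, Union

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def parse_list_sets(xml_text: Union[str, bytes]) -> List[Tuple[str, str]]:
    """Return (setSpec, setName) pairs from an OAI-PMH ListSets response."""
    root = ET.fromstring(xml_text)
    error = root.find("oai:error", OAI_NS)
    if error is not None:  # e.g. code="noSetHierarchy"
        raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
    return [
        (s.findtext("oai:setSpec", default="", namespaces=OAI_NS),
         s.findtext("oai:setName", default="", namespaces=OAI_NS))
        for s in root.findall(".//oai:ListSets/oai:set", OAI_NS)
    ]
```

To try it against a live repository, read the response with `urllib.request.urlopen(url).read()` and pass the bytes straight to `parse_list_sets`.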
OCLC SRU Interface
ETD Union Search Mirror Site in China (CALIS)
(http://ndltd.calis.edu.cn – popular site!)
VTLS and Content Languages

The VTLS browse/search service has data in many
different languages. These include:
• English
• German
• Greek
• Korean
• Portuguese
Language = German; hits = 137
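A "Language = German" browse count is essentially a facet count over the language field. The sketch below counts records per language after normalizing codes, since harvested records may mix ISO 639 codes and language names; the normalization table and record layout are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical normalization table: records may carry ISO 639 codes or names.
LANG_NAMES = {
    "de": "German", "deu": "German", "ger": "German",
    "en": "English", "eng": "English",
    "es": "Spanish", "spa": "Spanish",
}


def normalize(value: str) -> str:
    """Map a language code to a display name; unknown values pass through."""
    cleaned = value.strip()
    return LANG_NAMES.get(cleaned.lower(), cleaned)


def language_facets(records: Iterable[Dict[str, List[str]]]) -> Counter:
    """Count records per language; a record counts once per distinct language."""
    counts: Counter = Counter()
    for rec in records:
        langs = {normalize(v) for v in rec.get("language", [])} or {"unknown"}
        counts.update(langs)
    return counts
```

Without normalization, "de", "ger" and "German" would show up as three separate facets with split hit counts.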
ETDs: Library Goals
• Improve library services
– Better turnaround time
– Always available
• Reduce work
– Catalog from e-text
– Eliminate handling: mailing to ProQuest, bindery prep, checkout, check-in, reshelving, etc.
• Save space
The Concept Map:
From learning tool to cross-language knowledge discovery tool
Problem:
• Finding interesting ETDs written in Language1 may
be difficult for Language2 speakers, and vice versa.
• NDLTD has > 360,000 ETDs in > 12 languages.
• Many theses and dissertations from the Spanish-speaking world are not
yet in NDLTD; e.g., UNAM in Mexico City has
50,000+ ETDs.
• ETDs exist in many languages, but discovering and
summarizing them across languages is even more
difficult.
Cross-language Experiment - 1
English version of
ETD by Saraiya
Cross-language Experiment - 2
Spanish (automatic)
translation of ETD by
Saraiya
Cmap Study Summary
Using
• NLP tools and a domain-specific ontology
we have been able to automatically produce
concept maps for large documents (ETDs).
For the cross-language case, using
• phrase translations mined from the ETD collection
• off-the-shelf MT tools
we have been able to automatically produce and
translate concept maps that allowed users to
determine the relevance of ETDs better than
machine-translated abstracts alone.
Google will support further R&D.
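The study's actual NLP pipeline is not described on these slides. The toy sketch below only illustrates the general shape of the idea: treat ontology terms as concepts, link two concepts when they co-occur in a sentence, and relabel nodes with a phrase table of the kind mined from the ETD collection. The term list, phrase table, and co-occurrence rule are all assumptions.

```python
import re
from collections import Counter
from itertools import combinations
from typing import Dict, List


def concept_map(text: str, concepts: List[str], min_count: int = 1) -> Counter:
    """Link two concepts each time they appear in the same sentence.

    Matching is naive lower-cased substring search, so "map" would also
    match "mapping"; a real pipeline would use NLP tokenization and tagging.
    """
    edges: Counter = Counter()
    for sentence in re.split(r"[.!?]+", text.lower()):
        present = sorted({c for c in concepts if c.lower() in sentence})
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return Counter({e: n for e, n in edges.items() if n >= min_count})


def translate_map(edges: Counter, phrase_table: Dict[str, str]) -> Counter:
    """Relabel nodes with a phrase table; phrases without a translation stay as-is."""
    out: Counter = Counter()
    for (a, b), n in edges.items():
        out[(phrase_table.get(a, a), phrase_table.get(b, b))] += n
    return out
```

Translating node labels with a domain phrase table, rather than running MT over whole sentences, keeps multi-word technical terms intact, which is plausibly why mined phrase translations help in this setting.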
Problems Solved/Solvable
• Plagiarism
• Concern over quality
• Concern about publishers
• Intellectual property rights management
• Handling restricted works
• Pilot -> Recommendation -> Requirement
• Inertia, lack of vision/leadership
Appeal
• Join NDLTD
• Move forward (in stages) so all theses and
dissertations in Mexico lead to open ETDs.
• Make all metadata accessible through the
NDLTD Union Catalog.
• Let NDLTD know how we can help!
• QUESTIONS?