Universal Digital Library

Computing - The Next 10 Years
Universal Access to Information
Raj Reddy
Carnegie Mellon University
Pittsburgh, USA
April 6, 2001
Talk presented at Georgia Tech 10th Anniversary Convocation
Future Technology

• Computational power doubles every 18 months (Moore’s Law)
  • 100-fold improvement every 10 years
• Disk densities double every 12 months
  • 1000-fold improvement every 10 years
• Optical bandwidth doubles every 9 months
  • 10000-fold improvement every 10 years
• Infinite bandwidth and memory before computation
  • Cost decreasing, density increasing
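
The ten-year figures on this slide follow from the doubling periods by compounding. A minimal sketch of that arithmetic is given here; the function name `ten_year_improvement` is illustrative and not from the talk.

```python
def ten_year_improvement(doubling_months: float, years: float = 10.0) -> float:
    """Growth factor over `years` for a quantity that doubles every `doubling_months`."""
    return 2.0 ** (years * 12.0 / doubling_months)


# Doubling periods quoted on the slide, in months.
doubling_periods = {
    "computation": 18,
    "disk density": 12,
    "optical bandwidth": 9,
}

for name, months in doubling_periods.items():
    print(f"{name}: about {ten_year_improvement(months):,.0f}x per decade")
```

The results come out close to the rounded 100-fold, 1000-fold and 10000-fold figures, which is why bandwidth and storage are expected to become plentiful before computation does.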
What does the future hold?
We can see some glimpses of the future:
• Universities without walls
• Computers that never fail, and self-healing software
• Every home with giga PCs connected by gigabit networks
• Access to all the published creative works of the world: anytime, anywhere, anyone
• Emergence of the World Bank of, not money, but Knowledge
• Systems, so-called geriatric robotics, that help the disabled lead normal lives
• Systems that give the rest of us superhuman capabilities, like getting a month’s work done in a day
Universal Access to Information
Information at your fingertips
• Access to all human knowledge:
  • Anyone
  • Anywhere
  • Anytime
All Human Knowledge
Recorded information
• Books
• Periodicals (journals, newspapers)
• Music, opera, dance
• Paintings, sculptures and monuments
• Movies, video
• Databases, software
Suppose all of this were on the Web
Examples from www.ulib.org
• Lecture: Michael Shamos on UL
• Books: A Child’s History of England
• Art: Greek Art

What is a book?
• Collection of static content
• Linearly organised
• Selected by an Author as related
• Occupying a single physical location
• Physically bound between covers
What is a digital book?
• Collection of dynamic multimedia content
• Browsable, navigable
• Selected by User as related
• No physical existence
• Instantly transmittable
What is a Library?
• Collection of items
• Linearly organized (shelves)
• Chosen by budget constraints
• Occupying physical space
• Cataloged for access
What is a Digital Library?
• Collection of digital items (potentially huge)
• Encompassing everything (someday)
• Organized arbitrarily
• Occupying no physical space
• Fully content-searchable

Universal Library Implications
• Elimination of time, space, cost constraints
• Democratization of information
  • “Knowledge is power”
• Hyperlinks to related information
• Preservation and dissemination of knowledge
  • Faster and wider
  • Backup preservation
  • Preservation of culture
Universal Library Implications

• Research
  • Web of scholarly information, reviews
• Teaching
  • Support for distance education
• Academic publishing
• Virtual museums
• Interactivity
Universal Library Applications

• Access to “Born Digital” information
  • World produces a billion billion (10^18) bytes of information every year (Lyman and Varian)
  • 90% is stored digitally
• Digital museum
• Digital tour guide
  • What’s in the Taj Mahal?
Universal Library Applications

• Research assistant
  • What did Newton write about color?
  • What are Moslem views on race?
• Teaching resource
  • “Act out” books in virtual reality
  • Real-time explanations
• Business information
  • Data mining

We Can Store Everything

• 1 book = 500 pp.
  • 1 MB uncompressed, 300 KB compressed
  • 10^8 to 3 x 10^8 books = ~10^14 bytes = 100 terabytes
• Over 100 million computers on the Internet
  • At 1 GB each, >100 petabytes now
• 1 GB of disk costs ~$3
  • 100 terabytes < $300 thousand to $1 million
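
The storage estimate on this slide can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 MB = 10^6 bytes) and uses only the figures quoted on the slide; the variable names are illustrative.

```python
KB, MB, GB, TB, PB = 10**3, 10**6, 10**9, 10**12, 10**15

# All books: 10^8 books at 1 MB each (uncompressed),
# or 3 x 10^8 books at 300 KB each (compressed).
low_estimate = 10**8 * 1 * MB
high_estimate = 3 * 10**8 * 300 * KB

# Disk already attached to the Internet: 100 million machines at 1 GB each.
internet_disk = 100 * 10**6 * 1 * GB

# Cost of 100 TB of disk at about $3 per GB (the 2001 price on the slide).
cost_100_tb = (100 * TB // GB) * 3

print(f"books: {low_estimate / TB:.0f} to {high_estimate / TB:.0f} TB")
print(f"disk on the Internet: {internet_disk / PB:.0f} PB")
print(f"100 TB of disk: ${cost_100_tb:,}")
```

Both book estimates land at roughly 100 terabytes, about a thousandth of the disk capacity the slide attributes to the Internet.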
Non-textual Material

• 1 movie = 10 GB
  • 1 petabyte = 100,000 movies
  • All the movies ever made!
• Audio
  • 1 petabyte = 3000 years of music
  • All music ever performed or recorded
• Paintings and photos @ 1 MB
  • 1 petabyte = 1 billion paintings or photos
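
The petabyte figures on this slide are straightforward divisions. The sketch below reproduces them in decimal units and also derives the audio bit rate implied by "3000 years of music"; that bit rate is an inference from the slide's numbers, not something the talk states.

```python
MB, GB, PB = 10**6, 10**9, 10**15
SECONDS_PER_YEAR = 365.25 * 24 * 3600

movies_per_pb = PB // (10 * GB)   # movies at 10 GB each
photos_per_pb = PB // (1 * MB)    # paintings or photos at 1 MB each

# Bit rate at which one petabyte holds 3000 years of continuous audio.
audio_bytes_per_second = PB / (3000 * SECONDS_PER_YEAR)
audio_kbit_per_second = audio_bytes_per_second * 8 / 1000

print(movies_per_pb, photos_per_pb)
print(f"implied audio rate: about {audio_kbit_per_second:.0f} kbit/s")
```

The implied rate is below 100 kbit/s, so the music estimate presumes compressed audio rather than uncompressed CD-quality sound.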
Non-textual Material

Gore’s Digital Earth
“A multi-resolution, three-dimensional representation of the planet, into which we can embed vast quantities of geo-referenced data.”
• Area of Earth ≈ 1/2 peta m^2
• 1000 bytes/m^2 feasible
• 2 MB/m^2 not practical yet: 10^21 bytes = 1 zettabyte
{peta-, exa-, zetta-, yotta-}
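
The Digital Earth numbers can be reproduced from the planet's surface area. The sketch below assumes a spherical Earth with a mean radius of 6371 km, a value that is not on the slide, and then applies the two storage densities quoted there.

```python
import math

EARTH_MEAN_RADIUS_M = 6.371e6  # assumed mean radius in metres

# Total surface area of a sphere, oceans included.
surface_m2 = 4 * math.pi * EARTH_MEAN_RADIUS_M ** 2

coarse_bytes = surface_m2 * 1000        # 1000 bytes per square metre
fine_bytes = surface_m2 * 2 * 10**6     # 2 MB per square metre

print(f"surface: {surface_m2 / 1e15:.2f} peta m^2")
print(f"coarse: {coarse_bytes / 1e18:.2f} exabytes")
print(f"fine: {fine_bytes / 1e21:.2f} zettabytes")
```

The surface comes out at about half a peta square metre, so the coarse representation needs about half an exabyte and the fine one about a zettabyte, in line with the slide.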
Technological Challenges
• Input (scanning, digitizing, OCR)
• Data representation
  • Text, notations, images, web pages
• Navigation and search
• Multilingual issues
• Output (voice, pictures, virtual reality)
• Synthetic documents

Universal Library Design

• Modular
  • Technology plug-ins (e.g. machine translation)
• Distributed
  • Mirror sites
• Multiple interfaces
  • Human (languages, cultures, literacy)
  • Machine
Universal Library Design
• Speech input/output
• Pictorial output
• Language support
  • Translation assistants
  • Summarization tools
• Synthetic documents
  • Encyclopedia-on-demand
Input Issues

• Non-digital media
  • Conversion, scanning, correction
  • Triple keyboard, uncorrected OCR
• Digital media
  • Formats, conversions, color representation
  • ASCII, HTML, SGML, XML, PDF, PS, TeX
  • JPEG, TIFF, GIF?
Input Issues
• Structured matter
  • Musical notation, Laban
  • Chemistry
• 3D items
• Resource allocation (what’s first?)
• Duplication of effort (no registry)
Metadata

• Data about an item, not part of the item
  • Bibliographic
  • Format, medium, encoding, resolution
  • Provenance
  • Reliability, integrity
  • Permissions
• Who generates metadata?
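
One way to picture the metadata categories on this slide is as a record stored alongside each item. The sketch below is only an illustration: the class name, field names and sample values are all hypothetical, and a SHA-256 digest is one assumed way to cover the "integrity" entry.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ItemMetadata:
    # Bibliographic
    title: str
    creators: List[str]
    # Format, medium, encoding, resolution
    file_format: str
    medium: str
    encoding: Optional[str] = None
    resolution_dpi: Optional[int] = None
    # Provenance
    provenance: str = ""
    # Reliability, integrity (hex SHA-256 digest of the item's bytes)
    sha256: str = ""
    # Permissions
    permissions: str = "unknown"

    def matches(self, content: bytes) -> bool:
        """Integrity check: does the stored digest match the item's bytes?"""
        return hashlib.sha256(content).hexdigest() == self.sha256


page = b"placeholder bytes standing in for a scanned page"
record = ItemMetadata(
    title="A Child's History of England",
    creators=["Charles Dickens"],
    file_format="TIFF",
    medium="scanned book",
    resolution_dpi=600,                       # hypothetical value
    provenance="hypothetical scanning site",  # hypothetical value
    sha256=hashlib.sha256(page).hexdigest(),
    permissions="public domain",
)

print(record.matches(page))         # content unchanged
print(record.matches(page + b"!"))  # content altered
```

Keeping the record separate from the item reflects the slide's definition: the metadata describes the item without being part of it. Who fills in each field is the open question the slide raises.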
Navigation
Making Sense Of The World’s Knowledge
• Browsing, finding, searching, flying
• Fractal view
  • Keys are granularity and connectivity
  • View whole collections or one glyph
• Understanding structure of information
Searching Mathematics
$$\int_0^{\infty} e^{-x^2} \sin(x^2)\,dx = \frac{\sqrt{\pi}\,\sqrt{2-\sqrt{2}}}{2^{9/4}}$$
Searching Mathematics
$$\int_0^{\infty} e^{-x^2} \sin(x^2)\,dx$$
MATHEMATICA Canonical Form:
Integrate[
Times[Power[E,Times[-1,Power[V1,2]]],
Sin[Power[V1,2]]],
{V1,0,Infinity}]
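
The point of a canonical form is that two queries differing only in the name of the integration variable should map to the same search key. The sketch below illustrates the idea on a small expression tree built from nested tuples. It is not Mathematica's own algorithm: the tuple representation, the function names and the `KNOWN_SYMBOLS` set are assumptions made for illustration, and reordering of commutative arguments is not handled.

```python
KNOWN_SYMBOLS = {"E", "Pi", "Infinity"}  # constants that must not be renamed


def canonicalize(expr, mapping=None):
    """Rename variables to V1, V2, ... in order of first appearance."""
    if mapping is None:
        mapping = {}
    if isinstance(expr, tuple):  # (head, arg1, arg2, ...)
        head, *args = expr
        return (head, *[canonicalize(arg, mapping) for arg in args])
    if isinstance(expr, str) and expr not in KNOWN_SYMBOLS:
        if expr not in mapping:
            mapping[expr] = f"V{len(mapping) + 1}"
        return mapping[expr]
    return expr  # numbers and known constants pass through unchanged


def full_form(expr):
    """Render a tree as Head[arg,...], with List written as {...}."""
    if isinstance(expr, tuple):
        head, *args = expr
        inner = ",".join(full_form(arg) for arg in args)
        return "{" + inner + "}" if head == "List" else f"{head}[{inner}]"
    return str(expr)


def integral(var):
    """The slide's integral, written with an arbitrary variable name."""
    return ("Integrate",
            ("Times",
             ("Power", "E", ("Times", -1, ("Power", var, 2))),
             ("Sin", ("Power", var, 2))),
            ("List", var, 0, "Infinity"))


key_x = full_form(canonicalize(integral("x")))
key_t = full_form(canonicalize(integral("t")))
print(key_x)
print(key_x == key_t)
```

Both variants produce the string shown on the slide (without the line breaks), so an index keyed on the canonical form would retrieve the same formula whichever variable name the author or the searcher used.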
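Multilingual Issues
• Character sets
• Representations
  Íîäà ôèçè÷åñêè íàõîäèòñÿ â çäàíèè Èçâåñòèé
  Нода физически находится в здании Известий
  (“The node is physically located in the Izvestia building”: the same sentence under two character encodings)
• Multilingual navigation
• Translation assistance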
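
The two lines of text on this slide appear to be the same bytes read under two different character sets. The sketch below reproduces the effect, assuming the sentence was stored in Windows-1251 (Cyrillic) and displayed as Latin-1; the talk does not name the encodings, so this pairing is an inference.

```python
text = "Нода физически находится в здании Известий"

raw = text.encode("cp1251")       # bytes as stored in a Cyrillic code page
misread = raw.decode("latin-1")   # the same bytes shown as Western European

# The damage is reversible as long as the original encoding is known.
recovered = misread.encode("latin-1").decode("cp1251")

print(misread)
print(recovered == text)
```

Without metadata recording the encoding, a library has to guess which reading is intended, which is one reason character sets come first on this slide.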
Synthetic Documents
• Documents derived automatically from retrieved information
• Multilingual translation
• Abstracts, summaries, glossaries
• Encyclopedia-on-demand
Information Reliability
• Existence ≠ validity
• Universal Library philosophy
  • Avoid value judgments
  • Provide information from which users (and programs) can assess validity
  • Source, reputation, recency, reviews, consistency
Scaling Problems
• Search services (e.g. AltaVista) index >10^8 documents
• Suppose there were 10^12?
• How can a billion users access the same item at once?
Policy Challenges
• Use of copyrighted material
• Economics (Who pays? Who gets?)
• Privacy
• Reliability of information
• Change in the nature of teaching
Use Of © Content
• Philosophy: must pay for use
  • Authors, publishers will not suffer
• Implied license
• Automated permissions
• Bulk licensing
• Compulsory licensing
  • Owner CAN’T refuse; user MUST pay
Economics
• Flat-fee subscriptions (e.g. HBO)
• Metered use (electric company)
• Microcharge (Tobias “clickl”)
• Free (paid by government)
• Automated permissions
• Use measured by technology
Operating Model
• Single portal for access to all information
• Universal Library provides input, access, multilingual, output and synthesis tools
• Universal Library will be a model scanning operation
• Registry of digitized works
Operating Model
• Specialized collections curated by specialists, provided to Universal Library
• Foreign collection performed in foreign countries
• Universal Library will be mirrored in ~12 sites around the world
Universal Library Status
• >13,000 digital volumes
• Art
• Newspapers
• Music, video
• Portal to hundreds of other collections
Visit http://www.ulib.org
Projects
• Navigator
• Academic electronic publishing
• Electronic Union Catalog
• Books out of copyright, books out of print
• Software distribution
Conclusions and Recommendations

Conclusions
• Barely 10% of all public information is available on the Internet
• Government needs to play a leadership role in developing digital libraries
• Significant technical and operational challenges remain in migrating and maintaining holdings in digital form
• Intellectual property rights need to be addressed to facilitate the creation of, and access to, digital libraries
Recommendations
• Support research: metadata, scalability, multiple languages, security, and usability
• Create testbeds: Million Book Project
• Place all public governmental information online
• Preserve IP rights of creators by creating tax incentives for public use of online copyrighted information