Universal Digital Library
Computing - The Next 10 Years
Universal Access to Information
Raj Reddy
Carnegie Mellon University
Pittsburgh, USA
April 6, 2001
Talk presented at Georgia Tech 10th Anniversary Convocation
Future Technology
Computational power doubles every 18 months
(Moore’s Law)
100-fold improvement every 10 years
Disk densities double every 12 months
1000-fold improvement every 10 years
Optical bandwidth doubles every 9 months
10000-fold improvement every 10 years
Infinite Bandwidth and Memory before Computation
Cost decreasing, density increasing
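The ten-year figures on this slide follow from the doubling periods by simple compounding. A minimal sketch of that arithmetic, assuming steady exponential growth over 120 months (the function name is illustrative, not from the talk):

```python
def ten_year_factor(doubling_months: float) -> float:
    """Improvement factor over 120 months for a given doubling period."""
    return 2 ** (120 / doubling_months)

# Doubling periods quoted on the slide, in months.
rates = {"computation": 18, "disk density": 12, "optical bandwidth": 9}

for name, months in rates.items():
    print(f"{name}: about {ten_year_factor(months):,.0f}x per decade")
```

The results land close to the rounded 100-fold, 1000-fold and 10000-fold figures, which is why bandwidth and storage outpace computation.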
What does the future hold?
We can see some glimpses of the future
Universities without walls,
Computers that never fail and self healing software
Every home with giga PCs connected by gigabit networks
Access to all the published creative works of the world,
for anyone, anytime, anywhere
Emergence of the World Bank of, not money, but Knowledge
Systems, so-called geriatric robotics, that help the disabled
lead normal lives, and
Systems that give the rest of us superhuman capabilities,
like getting a month’s work done in a day
Universal Access to Information
Information at your fingertips
Access to all human knowledge:
Anyone
Anywhere
Anytime
All Human Knowledge
Recorded Information
Books
Periodicals (journals, newspapers)
Music, opera, dance
Paintings, Sculptures and Monuments
Movies, video
Databases, software
Suppose all of this were on the Web
Examples from www.ulib.org
Lecture: Michael Shamos on UL
Books: A Child’s History of England
Art: Greek Art
What is a book?
Collection of static content
Linearly organised
Selected by an Author as related
Occupying a single physical location
Physically bound between covers
What is a digital book?
Collection of dynamic multimedia content
Browsable, navigable
Selected by User as related
No physical existence
Instantly Transmittable
What is a Library?
Collection of items
Linearly organized (shelves)
Chosen by budget constraints
Occupying physical space
Cataloged for access
What is a Digital Library?
Collection of digital items
(potentially huge)
Encompassing everything (someday)
Organized arbitrarily
Occupying no physical space
Fully content-searchable
Universal Library Implications
Elimination of time, space, cost constraints
Democratization of information
“Knowledge is power”
Hyperlinks to related information
Preservation and Dissemination of Knowledge
faster and wider
Backup preservation
Preservation of culture
Universal Library Implications
Research
Web of scholarly information, reviews
Teaching
Support for distance education
Academic publishing
Virtual museums
Interactivity
Universal Library Applications
Access to “Born Digital” Information
World produces a billion billion (10^18) bytes of
information every year (Lyman and Varian)
90% is stored digitally
Digital museum
Digital tour guide
What’s in the Taj Mahal?
Universal Library Applications
Research assistant
What did Newton write about color?
What are Moslem views on race?
Teaching resource
“Act out” books in virtual reality
Real-time explanations
Business information
Data mining
We Can Store Everything
1 book = 500 pp.
1MB uncompressed – 300KB compressed
10^8 to 3x10^8 books = ~10^14 bytes = 100 terabytes
Over 100 million computers on the Internet
At 1 GB each, >100 petabytes now
1 GB of disk costs ~$3
100 terabytes < $300 thousand to $1 million
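The storage estimate on this slide can be checked with a few lines of integer arithmetic. The sketch below assumes decimal units (1 GB = 10^9 bytes) and pairs the low book count with the uncompressed size and the high count with the compressed size; that pairing is an assumption about how the slide's range was derived.

```python
KB, MB, GB, TB, PB = 10**3, 10**6, 10**9, 10**12, 10**15

# 10^8 books at 1 MB each (uncompressed).
uncompressed = 10**8 * 1 * MB
# 3 x 10^8 books at 300 KB each (compressed).
compressed = 3 * 10**8 * 300 * KB
# 100 million Internet hosts with 1 GB of disk each.
internet_disk = 10**8 * 1 * GB
# Cost of 100 TB of disk at about $3 per GB.
cost_100tb = (100 * TB // GB) * 3

print(uncompressed // TB, "TB uncompressed")
print(compressed // TB, "TB compressed")
print(internet_disk // PB, "PB across the Internet")
print(f"${cost_100tb:,} for 100 TB")
```

Both book estimates come out near 10^14 bytes, and the raw disk cost matches the lower bound quoted on the slide.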
Non-textual Material
1 Movie = 10 GB
1 petabyte = 100,000 movies
All the movies ever made!
Audio
1 petabyte = 3000 years of music
All music ever performed or recorded
Paintings and Photos @ 1 MB
1 petabyte = 1 billion paintings or photos
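The per-petabyte counts above follow directly from the item sizes. The slide gives no audio bitrate, so the sketch below works backwards from "3000 years of music" to the bitrate that figure implies; that derived number is an inference, not something stated in the talk.

```python
MB, GB, PB = 10**6, 10**9, 10**15

movies_per_pb = PB // (10 * GB)   # 10 GB per movie
photos_per_pb = PB // MB          # 1 MB per painting or photo

# Seconds in 3000 years, using 365.25-day years.
seconds = 3000 * 365.25 * 24 * 3600
# Bitrate at which one petabyte holds exactly that much audio.
audio_kbit_per_s = PB * 8 / seconds / 1000

print(movies_per_pb, "movies per petabyte")
print(photos_per_pb, "photos per petabyte")
print(f"implied audio rate: about {audio_kbit_per_s:.0f} kbit/s")
```

The implied rate is in the range of compressed audio rather than uncompressed CD audio, so the music estimate presumably assumes compression.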
Non-textual Material
Gore’s Digital Earth
“A multi-resolution, three-dimensional representation
of the planet, into which we can embed vast
quantities of geo-referenced data.”
Area of Earth: 1/2 peta m^2
1000 bytes/m^2 feasible
2 MB/m^2 not practical yet: 10^21 bytes
= 1 zettabyte
{peta-, exa-, zetta-, yotta-}
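The Digital Earth figures are the Earth's surface area multiplied by a data density per square metre. A small sketch of that calculation, taking the slide's rounded area of half a peta square metre as given (names are illustrative):

```python
EARTH_AREA_M2 = 5 * 10**14  # 1/2 peta m^2, as on the slide

# Decimal prefixes listed on the slide.
PREFIXES = {"peta": 10**15, "exa": 10**18, "zetta": 10**21, "yotta": 10**24}

def earth_bytes(bytes_per_m2: int) -> int:
    """Total bytes needed to cover the Earth at a given density."""
    return EARTH_AREA_M2 * bytes_per_m2

feasible = earth_bytes(1000)         # 1000 bytes per m^2
ambitious = earth_bytes(2 * 10**6)   # 2 MB per m^2

print(feasible / PREFIXES["exa"], "exabytes at 1000 bytes/m^2")
print(ambitious / PREFIXES["zetta"], "zettabytes at 2 MB/m^2")
```

The feasible density needs half an exabyte, while the 2 MB/m^2 density needs a full zettabyte, two thousand times more.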
Technological Challenges
Input (scanning, digitizing, OCR)
Data representation
text, notations, images, web pages
Navigation and Search
Multilingual Issues
Output (voice, pictures, virtual reality)
Synthetic Documents
Universal Library Design
Modular
Technology plug-ins (e.g. machine translation)
Distributed
Mirror sites
Multiple interfaces
Human (languages, cultures, literacy)
Machine
Universal Library Design
Speech input/output
Pictorial output
Language support
Translation assistants
Summarization tools
Synthetic documents
Encyclopedia-on-demand
Input Issues
Non-digital media
Conversion, scanning, correction
Triple keyboard, uncorrected OCR
Digital media
Formats, conversions, color representation
ASCII, HTML, SGML, XML, PDF, PS, TEX
JPEG, TIFF, GIF?
Input Issues
Structured matter
Musical notation, Laban
Chemistry
3D items
Resource allocation (what’s first?)
Duplication of effort (no registry)
Metadata
Data about an item not part of the item
Bibliographic
Format, medium, encoding, resolution
Provenance
Reliability, integrity
Permissions
Who generates metadata?
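One way to picture "data about an item not part of the item" is a record stored alongside the content rather than inside it. The sketch below is only an illustration under assumed names: the `ItemMetadata` class, its fields and the sample values are hypothetical and not taken from the Universal Library. It maps the slide's categories onto fields and uses a SHA-256 digest as one possible integrity check.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class ItemMetadata:
    """Hypothetical record kept separately from the item's bytes."""
    title: str              # bibliographic
    creator: str            # bibliographic
    media_format: str       # format, e.g. "TIFF"
    medium: str             # e.g. "scanned page"
    encoding: str
    resolution_dpi: int
    provenance: str         # where the digital copy came from
    sha256: str             # integrity: digest of the content
    permissions: tuple = ()

def describe(content: bytes, **fields) -> ItemMetadata:
    """Build a record, computing the digest from the content."""
    return ItemMetadata(sha256=hashlib.sha256(content).hexdigest(), **fields)

def is_intact(content: bytes, meta: ItemMetadata) -> bool:
    """True if the content still matches the recorded digest."""
    return hashlib.sha256(content).hexdigest() == meta.sha256

scan = b"placeholder bytes for one scanned page"
meta = describe(
    scan,
    title="A Child's History of England",
    creator="Charles Dickens",
    media_format="TIFF",
    medium="scanned page",
    encoding="uncompressed",
    resolution_dpi=600,                 # illustrative value
    provenance="hypothetical scanning site",
    permissions=("read",),
)
print(is_intact(scan, meta), is_intact(scan + b"x", meta))
```

Because the digest can be recomputed by anyone, at least the integrity field can be generated by a program, which bears on the slide's closing question of who generates metadata.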
Navigation
Making Sense Of The World’s Knowledge
Browsing, finding, searching, flying
Fractal view
View whole collections or one glyph
Keys are granularity and connectivity
Understanding structure of information
Searching Mathematics
$$\int_0^{\infty} e^{-x^2} \sin(x^2) \, dx = \frac{\sqrt{\pi}\,\sqrt{2-\sqrt{2}}}{2^{9/4}}$$
Searching Mathematics
$$\int_0^{\infty} e^{-x^2} \sin(x^2) \, dx$$
MATHEMATICA Canonical Form:
Integrate[
Times[Power[E,Times[-1,Power[V1,2]]],
Sin[Power[V1,2]]],
{V1,0,Infinity}]
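The point of a canonical form is that two formulas differing only in the name of the variable become the same search key. A minimal sketch of the idea, using Python's `ast` module on Python-style expressions instead of Mathematica; the `canonical` function is hypothetical and covers only the operators needed for this integrand:

```python
import ast

# Python operator nodes mapped to Mathematica-style head names.
OPS = {ast.Add: "Plus", ast.Mult: "Times", ast.Pow: "Power"}

def canonical(expr: str) -> str:
    """Prefix form with variables renamed V1, V2, ... in order of first use."""
    names: dict = {}

    def walk(node) -> str:
        if isinstance(node, ast.Constant):
            return str(node.value)
        if isinstance(node, ast.Name):
            return names.setdefault(node.id, f"V{len(names) + 1}")
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return f"Times[-1,{walk(node.operand)}]"
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return f"{OPS[type(node.op)]}[{walk(node.left)},{walk(node.right)}]"
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            args = ",".join(walk(arg) for arg in node.args)
            if node.func.id == "exp":       # write exp(a) as Power[E,a]
                return f"Power[E,{args}]"
            return f"{node.func.id.capitalize()}[{args}]"
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return walk(ast.parse(expr, mode="eval").body)

print(canonical("exp(-x**2) * sin(x**2)"))
print(canonical("exp(-t**2) * sin(t**2)"))
```

Both calls print the same string, matching the integrand in the form above. This sketch does not reorder commutative arguments or simplify, which a real canonicalizer would also need to do.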
Multilingual Issues
Character sets
Representations
Íîäà ôèçè÷åñêè íàõîäèòñÿ â çäàíèè Èçâåñòèé
Нода физически находится в здании Известий ("The node is physically located in the Izvestia building")
Multilingual navigation
Translation assistance
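The two renderings of the Russian sentence on this slide illustrate what happens when bytes are decoded with the wrong character set. The garbled line appears consistent with Windows-1251 bytes read as Latin-1; the sketch below reproduces that under this assumption and shows that the damage is reversible when the original encoding is known.

```python
text = "Нода физически находится в здании Известий"

# Store the sentence as single-byte Windows-1251 (Cyrillic).
raw = text.encode("cp1251")

# A reader that assumes Latin-1 sees accented Western letters instead.
garbled = raw.decode("latin-1")

# Re-encoding as Latin-1 recovers the bytes, so the text can be repaired.
recovered = garbled.encode("latin-1").decode("cp1251")

print(garbled)
print(recovered)
```

No information is lost in this case because both encodings use one byte per character; the failure is purely one of labelling, which is why a library needs to record the encoding as metadata.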
Synthetic Documents
Documents derived automatically
from retrieved information
Multilingual translation
Abstracts, summaries, glossaries
Encyclopedia-on-demand
Information Reliability
Existence ≠ validity
Universal Library Philosophy
Avoid value judgments
Provide information from which users
(and programs) can assess validity
Source, reputation, recency, reviews,
consistency
Scaling Problems
Search services (e.g. Altavista) index
>10^8 documents
Suppose there were 10^12?
How can a billion users access the
same item at once?
Policy Challenges
Use of copyrighted material
Economics (Who pays? Who gets?)
Privacy
Reliability of information
Change in the nature of teaching
Use Of © Content
Philosophy: must pay for use
Authors, publishers will not suffer
Implied license
Automated permissions
Bulk licensing
Compulsory licensing
Owner CAN’T refuse; user MUST pay
Economics
Flat-fee subscriptions (e.g. HBO)
Metered use (electric company)
Microcharge (Tobias “clickl”)
Free (paid by government)
Automated permissions
Use measured by technology
Operating Model
Single portal for access to all information
Universal Library provides input, access,
multilingual, output and synthesis tools
Universal Library will be a model scanning
operation
Registry of digitized works
Operating Model
Specialized collections curated by
specialists, provided to Universal
Library
Foreign collection performed in
foreign countries
Universal Library will be mirrored in
~12 sites around the world
Universal Library Status
>13,000 digital volumes
Art
Newspapers
Music, video
Portal to hundreds of other collections
Visit http://www.ulib.org
Projects
Navigator
Academic electronic publishing
Electronic Union Catalog
Books out of copyright
Books out of print
Software distribution
Conclusions and Recommendations
Conclusions
Barely 10% of all public information is available on the Internet
Government needs to play a leadership role in developing digital
libraries
Significant technical and operational challenges in migrating and
maintaining holdings in digital form
Intellectual Property rights need to be addressed to facilitate
creation of and access to digital libraries
Recommendations
Support research: metadata, scalability, multiple languages,
security, and usability
Create testbeds: million book project
Place all public governmental information online
Preserve IP rights of creators by creating tax incentives for
public use of online copyrighted information