Transcript Document
Content - part 2
Week 4
Tonight
• More detailed look at metadata
description of content
• No access to a network today, so not all
the updating I would like to do… Sorry.
Google Books Project
• Michael A. Keller, Closing Keynote
– Ida M. Green University Librarian at Stanford,
– Director of Academic Information Resources,
– Publisher of HighWire Press, and
– Publisher of the Stanford University Press:
• "One good turn deserves another; how the
Google Book Search project is benefiting
everyone".
Google Books demo
• Full text - Life of Miguel de Cervantes
• Limited Preview - The Life of Miguel de
Cervantes Saavedra
• Snippet View - "Discreción" in the
Works of Cervantes: A Semantic Study
What has been accomplished
• As of September 2006
• Nearly 30,000 Stanford books digitized
– ~1M books from all partner libraries
• Over 4,000 books identified as needing
preservation treatment (& so not digitized)
• A great debate about copyright has started
– Orphan works
– What can an archive do to provide access
– Defense of fair use underway
• Today’s news:
www.pcworld.com/article/172315/google_books_wont_hit_digital_shelves_anytime_soon.html
This slide is taken from the presentation by Michael A. Keller at ECDL 2006
Original Principles
• If legally possible, digitize every book (9M
volumes) in the Stanford libraries
– Now digitizing with imprint dates up to 1963
• Partner libraries (*added recently)
– University of Michigan (similar to Stanford)
– Harvard (public domain (?), maybe > 1M)
– NYPL (public domain, unusual collections)
– Oxford - Bodleian (earlier than 1885, ~ 1M titles)
– University of California (similar to Stanford >6M)
– (more to follow)
Purposes
• Digital preservation
– Virtual Bookshelves in Stanford Digital Repository under
construction as part of the Stanford Digital Repository
– For Stanford use only
• Other searching and research functions
– Subtle searching (as in Socrates & HighWire)
– Taxonomic (LCSH & HighWire) & Associative Searching
(Takano)
– Citation linking (HighWire & “InforTools” (Ebrary))
– Better navigation (through visualization ?) (Grokker)
• Digitized books from all sources as test bed for new
research; combine with articles, datasets, etc. for
data mining & other transformative uses.
Some Conclusions
• Google Book Search
– Is an indexing, not a publishing project
– Offers substantial increases in access to contents of books in library
collections by keyword searching
– Offers publishers global marketing of their publications
– Offers several useful services to readers
• Offers participating libraries
– Digital copies of books on their shelves for preservation
– New possibilities for services to local readers
– New possibilities for research for local faculty & students
• Note – recent settlement between Google and
publishers. -- anyone hear about that?
Google Books of 2007
• In May, the Cantonal and University Library of Lausanne, and Ghent University
Library join the Book Search program, adding a substantial amount of books in
French, German, Flemish, Latin and other languages, and bringing the total
number of European libraries partners to six.
• In July, we add a "View plain text" link to all out-of-copyright books. T.V. Raman
explains how this opens the book to adaptive technologies such as screen
readers and Braille display, allowing visually impaired users to read these books
just as easily as users with sight.
• By December, the Book Search interface is available in over 35 languages, from
Japanese to Czech to Finnish. Over 10,000 publishers and authors from 100+
countries are participating in the Book Search Partner Program. The Library
Project expands to 28 partners, including seven international library partners:
Oxford University (UK), University of Complutense of Madrid (Spain), the
National Library of Catalonia (Spain), University Library of Lausanne
(Switzerland), Ghent University (Belgium) and Keio University (Japan).
Open Content Alliance
• The Open Content Alliance (OCA) is a collaborative effort of a
group of cultural, technology, nonprofit, and governmental
organizations from around the world that helps build a permanent
archive of multilingual digitized text and multimedia material. An
archive of contributed material is available on the Internet Archive
website and through Yahoo! and other search engines and sites.
• The OCA encourages access to and reuse of collections in the
archive, while respecting the content owners and contributors.
Contributors to the OCA have agreed to the principles set forth in
the Call for Participation.
• The Open Content Alliance is administered by the Internet Archive,
a 501c3 non-profit library.
http://www.opencontentalliance.org/about/
European Digital Library Project
EDLproject was a Targeted Project funded by the European
Commission under the eContentplus Programme and coordinated
by the German National Library.
The project, started in September 2006 and completed in February
2008, worked towards the integration of the bibliographic
catalogues and digital collections of the National Libraries of
Belgium, Greece, Iceland, Ireland, Liechtenstein, Luxembourg,
Norway, Spain and Sweden, into The European Library.
EDLproject also addressed the enhancement of multilingual
capabilities of The European Library portal, took first steps towards
collaboration between The European Library and other non-library
cultural initiatives, and expanded the marketing and communication
activities of The European Library service.
Comments? Discussion?
A DL example
• Library of Congress American Memory
project
– http://memory.loc.gov/ammem/index.html
– “American Memory provides free and open access through
the Internet to written and spoken words, sound recordings,
still and moving images, prints, maps, and sheet music that
document the American experience. It is a digital record of
American history and creativity. These materials, from the
collections of the Library of Congress and other institutions,
chronicle historical events, people, places, and ideas that
continue to shape America, serving the public as a resource
for education and lifelong learning.”
Dublin Core for a map
• Map found in the LOC American Memory
collection
– Map at
http://memory.loc.gov/ammem/gmdhtml/gmdhome.html
• Dublin Core metadata illustration found at
http://webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm
– Part of a DL course at U. of Alabama
Go to the web site to explore what is there, including copyright information, title,
history, etc.
Dublin Core: Title
• Name given, usually by the creator or publisher
<META name = "DC.Title"
content = "Novi Belgii Novæque Angliæ: nec non
partis Virginiæ tabula multis in locis emendata"
lang = "la"
>
Source:
webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm
Dublin Core: Subject
• What the work is about, possibly
keywords, terms from classification
scheme if available.
<META name = "DC.Subject"
content = "Middle Atlantic States - Maps
- Early works to 1800 - Facsimiles"
scheme = "LCSH"
>
LCSH = Library of Congress Subject Headings
Source:
webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm
Dublin Core: Description
• Free text description, abstract, etc.
<META
name = "DC.Description"
content = "An (sic) historical map
showing the coast of New Jersey as
perceived in the seventeenth century"
>
Source:
webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm
Dublin Core: Source
• Is this object derived from another? Is
this map a part of a larger map? Is this
text a variation or revision of another
piece of text?
<META
name = "DC.Source"
content = "G3715 1685 .V5 1969"
scheme = "LCCN"
>
LCCN = Library of Congress Call Number
Source:
webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm
Dublin Core: Language
• Language of the content of the resource
• For the map, there is no language
content
<META
name = "DC.Language"
content = "nl"
>
Source:
webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm
Dublin Core: Relation
• To what other object(s) or collection is this object
related? Does it also exist in another collection? Is it
derived from another document or image? How is it
related?
<META
name = "DC.Relation"
content = "isPartOf
http://lcweb2.loc.gov/cgi-bin/query/r?ammem/gmd:@filreq(@field(NUMBER+@band(g3715+ct000001))+@field(COLLID+dsxpmap))"
>
Source:
webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm
Dublin Core: Creator
• Person or organization responsible for
the Intellectual Content of this object
<META
name = "DC.Creator"
content = "Nicolaum Visscher"
>
Source:
webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm
Dublin Core: Publisher
• Entity responsible for making the
resource available in its present form
• Not shown in the example, but should
be something like this:
<META name = "DC.Publisher"
content = "Library of Congress
American Memory Project"
>
Source:
webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm
Dublin Core: Contributor
• Any entity making a contribution to this
object.
• Example: someone who added some
information to the original document or
image
• No entry for this map.
Dublin Core: Rights
• A pointer to a copyright notice, a rights
management statement, or a rights server.
<META
name = "DC.Rights"
content = "http://lcweb2.loc.gov/cgi-bin/ammemrr.pl?title=%3ca%20href%3d%22%2fammem%2fgmdhtml%2fdsxphome.html%22%3eDiscovery%20and%20Exploration%3c%2fa%3e&coll=gmd&div=&agg=g3715&default=ammem&dir=ammem"
>
Dublin Core: Date
• Date on which this object was made available
in its present form, possibly the date it was
entered into this digital collection.
<META
name = "DC.DATE"
content = "1996-04-17"
scheme = "ISO 8601"
>
Specify the date format so that others can interpret it correctly
Source:
webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm
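A small illustration of why naming the date scheme matters: once the value is known to be an ISO 8601 calendar date, any consumer can parse it without guessing the field order. This is only a sketch; the function name `parse_dc_date` is made up for the example.

```python
from datetime import date


def parse_dc_date(value: str) -> date:
    """Parse a DC.Date value written as YYYY-MM-DD (an ISO 8601 calendar date)."""
    return date.fromisoformat(value)


issued = parse_dc_date("1996-04-17")
print(issued.year, issued.month, issued.day)
```

A value such as "04/17/1996" is rejected with a ValueError instead of being silently misread as day/month or month/day.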
Dublin Core: Type or Category
• What sort of thing is this? Some
examples: home page, novel, poem,
working paper, technical report, essay,
dictionary, …
• Type should be selected from a
controlled list. For example, see the
DCMI Type Vocabulary:
• http://dublincore.org/documents/2006/08/28/dcmi-type-vocabulary/
Why is this recommended as a controlled vocabulary field?
DCMI Type Vocabulary
• Collection
• Dataset
• Event
• Image
• InteractiveResource
• MovingImage
• PhysicalObject
• Service
• Software
• Sound
• StillImage
• Text
See the official page for explanations of the categories.
Note that Image is a broad category and MovingImage and
StillImage are more restricted subcategories.
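One practical answer to "why a controlled vocabulary?" is that software can check a value against the list. The sketch below is an illustration only; the names `DCMI_TYPES` and `is_dcmi_type` are invented for the example, and the set simply copies the twelve terms listed above.

```python
# The twelve DCMI Type Vocabulary terms as listed on the slide.
DCMI_TYPES = frozenset({
    "Collection", "Dataset", "Event", "Image",
    "InteractiveResource", "MovingImage", "PhysicalObject",
    "Service", "Software", "Sound", "StillImage", "Text",
})


def is_dcmi_type(value: str) -> bool:
    """Return True only if value is exactly one of the controlled terms."""
    return value in DCMI_TYPES


for candidate in ("StillImage", "image.photograph"):
    print(candidate, is_dcmi_type(candidate))
```

A free-text value such as "image.photograph" fails the check, so a harvester would know it cannot rely on that record's Type field for filtering.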
Dublin Core: Type
• Category of this resource
<META
name = "DC.Type"
content = "image.photograph"
>
Source:
webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm
Dublin Core: Format
• The way the content is encoded. This
tells what resource is needed to access
this content.
<META
name = "DC.Format"
content = "image/gif"
scheme = "IMT"
>
Internet MIME Types:
http://www.ltsw.se/knbase/internet/mime.htp
See also Internet Media Type:
http://www.graphcomp.com/info/specs/mime.html
Dublin Core: Unique ID
• The key for this object in the collection.
• I cannot find one for the map we are looking
at, but the ID for the map of which it is a part
is g3715 ct000001
• The Metadata specification for that would be
<META name = "DC.Id"
content = "g3715 ct000001"
>
Source: http://memory.loc.gov/cgi-bin/query/r?ammem/gmd:@filreq(@field(NUMBER+@band(g3715+ct000001))+@field(COLLID+dsxpmap))
Dublin Core: Coverage
• The time, space or other measurement of the
scope or completeness of the object.
• No coverage entry specified, but might be
this:
<META
name = "DC.Coverage"
content = "North America, Eastern lands and
coast, as viewed in late seventeenth century"
>
Example not a controlled vocabulary. Why
would a controlled vocabulary be better?
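The element-by-element examples above all follow one pattern: a name, a content value, and optionally a scheme or language. A minimal sketch of generating such tags from a record is shown here. The helper `dc_meta` and the tuple layout of `record` are assumptions made for the illustration, not part of Dublin Core or of the source page; the values are taken from the slides.

```python
from html import escape


def dc_meta(name, content, scheme=None, lang=None):
    """Build one Dublin Core META tag; attribute values are HTML-escaped."""
    attrs = [f'name="DC.{name}"', f'content="{escape(content)}"']
    if scheme is not None:
        attrs.append(f'scheme="{escape(scheme)}"')
    if lang is not None:
        attrs.append(f'lang="{escape(lang)}"')
    return "<META " + " ".join(attrs) + ">"


# (element, content, scheme, lang) values copied from the map example.
record = [
    ("Creator", "Nicolaum Visscher", None, None),
    ("Source", "G3715 1685 .V5 1969", "LCCN", None),
    ("Date", "1996-04-17", "ISO 8601", None),
    ("Format", "image/gif", "IMT", None),
]

for name, content, scheme, lang in record:
    print(dc_meta(name, content, scheme=scheme, lang=lang))
```

Escaping the values keeps a stray quotation mark in a title or description from breaking the tag.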
International Consensus
• Recognition of International Scope of
Resource Discovery on Web
• 17 Countries Currently Involved in DC
Working Groups
• 50+ Implementation Projects in 10
Countries
Source:
webapp.slis.ua.edu/smmweb/DLib/Metadata/OrganizingInternetResources_files/v3_document.htm
Guide to Good Practice
• The NINCH Guide to Good Practice in the Digital
Representation and Management of Cultural
Heritage Materials
• http://www.nyu.edu/its/humanities/ninchguide/index.html
Access Control and Rights
Management
Legal and Technical Issues
• Legal: When is a resource available to
digitize and make available? What
requirements exist for controlling access?
• Technical: How do we control access to a
resource that is stored online?
– Policies
– Encoding
– Distribution limitations
• Date of work: Created 1-1-78 or after
– Protected from: When work is fixed in tangible medium of expression
– Term: Life + 70 years (or if work of corporate authorship, the shorter of 95 years from publication, or 120 years from creation)
• Date of work: Published before 1923
– Protected from: In public domain
– Term: None
• Date of work: Published 1923 - 63
– Protected from: When published with notice
– Term: 28 years + could be renewed for 47 years, now extended by 20 years for a total renewal of 67 years. If not so renewed, now in public domain
• Date of work: Published from 1964 - 77
– Protected from: When published with notice
– Term: 28 years for first term; now automatic extension of 67 years for second term
• Date of work: Created before 1-1-78 but not published
– Protected from: 1-1-78, the effective date of the 1976 Act which eliminated common law copyright
– Term: Life + 70 years or 12-31-2002, whichever is greater
• Date of work: Created before 1-1-78 but published between then and 12-31-2002
– Protected from: 1-1-78, the effective date of the 1976 Act which eliminated common law copyright
– Term: Life + 70 years or 12-31-2047, whichever is greater
Chart created by Lolly Gasaway. Updates at
http://www.unc.edu/~unclng/public-d.htm
Works for hire
• Usual case -- works created by faculty
are not the property of the university.
– Faculty surrender copyright to publishers of
journals and books
– Some publishers allow faculty to retain
copyright, giving the publisher specific
limited rights to reproduce and distribute
the work.
Fair use
• No clear, easy answers.
• Checksheet provided in the article is a
good guide to the issues.
• Link to the checksheet:
http://www.copyright.iupui.edu/checklist.htm
Moral rights
• Fair to the creator
– Keep the identity of the creator of the work
– Do not cut the work
– Generally, be considerate of the person (or
institution) that created the work.
Getting Permission
• With the best will in the world, getting the appropriate
permissions is not always easy.
– Identify who holds the rights
– Get in touch with the rights holder
– Get a suitable agreement to cover the needs of your use.
• Useful links:
http://www.loc.gov/copyright/
http://www.utsystem.edu/OGC/IntellectualProperty/PERMISSN.HTM
– Connections to various ways to discover and contact the
rights holder of a work.
Checking copyright status
Source: NINCH Guide to Good Practice. Chapter 4: Rights Management
Considering people depicted in the work
Source: NINCH Guide to Good Practice. Chapter 4: Rights Management
Copyright: Lauryn G. Grant
Technical issues
• Link the resource to the copyright statements
• Maintain that link when the resource is copied
or used
• Approaches:
– Steganography
– Encryption
– Digital Wrappers
– Digital Watermarks
Issues in Encryption
• General cases for protection of controlled content:
Concern for passive listening, active interference.
– Listening: intruder gains information, may not be detected.
Effects indirect.
– Active interference
• Intruder may prevent delivery of the message to the intended
recipient.
• Intruder may substitute a fake message for the intended one
• Effects are direct and immediate
• Less likely in the case of digital library content
Message interception
[Diagram: Original message (plain text) → Encoding method → Ciphertext → Decoding method → Received message (plain text). An intruder on the ciphertext channel may be eavesdropping or masquerading.]
Types of Encryption Methods
• Substitution
– Simple adjustment, Caesar’s cipher
• Each letter is replaced by one that is a fixed distance from it in
the alphabet. A becomes D, B becomes E, etc. At the end,
wrap around, so X becomes A, Y becomes B, Z becomes C.
• May have been confusing the first time it was done, but it would
not have taken long to figure it out.
• Note the simple example at geocaching.com
– No intention to hide or confuse. Just keep a person from seeing too much
information about the hide, unless the person wants to see the help.
– Simple substitution of other characters for letters -- numbers,
dancing men, etc.
– More complex substitution. No pattern to the replacement
scheme.
• See common cryptogram puzzles. These are usually made
easier by showing the spaces between the words. (For very
modern version, see http://www.cryptograms.org/)
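The Caesar substitution described above can be sketched in a few lines. This is an illustration only; the function name `caesar` is made up here, and non-letters are passed through unchanged as an assumption of the sketch.

```python
import string


def caesar(text: str, shift: int) -> str:
    """Replace each letter by the one `shift` places later, wrapping at Z."""
    result = []
    for ch in text:
        if ch in string.ascii_uppercase:
            result.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        elif ch in string.ascii_lowercase:
            result.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            result.append(ch)  # spaces, digits, punctuation stay as they are
    return "".join(result)


secret = caesar("ATTACK AT DAWN", 3)
print(secret)
print(caesar(secret, -3))  # shifting back by the same distance decodes
```

With only 25 useful shifts, an intruder can simply try them all, which is why this offers no real protection.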
Dancing Men????
• Arthur Conan Doyle: The Adventure of
the Dancing Men. A Sherlock Holmes
Adventure.
“Speaking roughly, T, A, O, I, N, S, H, R, D, and L are the numerical order in
which letters occur; but T, A, O, and I are very nearly abreast of each other,
and it would be an endless task to try each combination until a meaning was
arrived at.”
Read the story online and see the images and analysis of the
decoding at http://camdenhouse.ignisart.com/canon/danc.htm
Types of encryption - 2
Hiding the text.
• The wax tablet example
– message written on the base of the tablet and wax put over
top of it with another message on the wax
• Steganography: (ste-gə-no´grə-fē) (n.) The art and
science of hiding information by embedding messages
within other, seemingly harmless messages.
Steganography works by replacing bits of useless or
unused data in regular computer files (such as
graphics, sound, text, HTML, or even floppy disks)
with bits of different, invisible information. This hidden
information can be plain text, cipher text, or even
images.
• Special software is needed for steganography, and there are
freeware versions available at any good download site.
• Can be used to insert identification into a file to track its source.
Definition from www.webopedia.com
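A rough sketch of the "replace unused bits" idea: hide each bit of a secret in the least significant bit of one byte of a cover file. This is one common approach, chosen here for illustration, not necessarily what any particular steganography tool does; `embed` and `extract` are hypothetical names, and the cover is just a byte string standing in for image or sound samples.

```python
def embed(cover: bytes, secret: bytes) -> bytes:
    """Hide secret in the lowest bit of successive cover bytes."""
    bits = [(byte >> i) & 1 for byte in secret for i in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for the secret")
    out = bytearray(cover)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # overwrite only the lowest bit
    return bytes(out)


def extract(stego: bytes, n_bytes: int) -> bytes:
    """Read n_bytes back out of the lowest bits."""
    out = bytearray()
    for b in range(n_bytes):
        value = 0
        for i in range(8):
            value = (value << 1) | (stego[b * 8 + i] & 1)
        out.append(value)
    return bytes(out)


cover = bytes(range(256))
stego = embed(cover, b"ID:42")
print(extract(stego, 5))
```

Each cover byte changes by at most 1, which is why the alteration is hard to notice in an image or sound file, and why the same trick can carry an identifier for tracking a file's source.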
Types of encryption - 3
• Key-based shuffling
– Using a mnemonic to make the key easy to
remember.
• A machine to do the shuffling
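[Diagram: letters A, B, C, D on the input side wired to letters A, B, C, D on the output side; the wiring is not reproduced here]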
What shuffling is used?
How would “CAB” look?
Monoalphabetic codes
• Any kind of substitution in which just one letter (or
other symbol) represents one letter from the original
alphabet is called monoalphabetic encoding.
– Such codes are easy to break. That is what you do when
you solve cryptograms.
– Frequency distributions of letters in normal text for a given
language are well known.
• “The twelve most frequently-used letters in the English
language are ETAOIN SHRDL, in that order.” - http://www.cryptograms.org/
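Breaking a monoalphabetic code starts with counting. The sketch below tallies how often each letter appears in a piece of text so the ranking can be compared with the known ordering for English. The function name `letter_frequencies` is invented for this example.

```python
from collections import Counter


def letter_frequencies(text: str):
    """Return (letter, percent) pairs, most frequent first."""
    counts = Counter(ch for ch in text.upper() if "A" <= ch <= "Z")
    total = sum(counts.values())
    return [(letter, round(100 * n / total, 2))
            for letter, n in counts.most_common()]


ciphertext = "WKH TXLFN EURZQ IRA MXPSV RYHU WKH ODCB GRJ"
for letter, percent in letter_frequencies(ciphertext)[:5]:
    print(letter, percent)
```

On a long enough ciphertext, the most common symbol is a strong candidate for E, the next few for T, A, O and so on; short messages like the one above give only weak hints.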
Letter distributions in English
Letter  %       Letter  %       Digram  %       Digram  %       Word    %
A       7.81    N       7.28    TH      3.18    OU      0.72    THE     6.42
B       1.28    O       8.21    IN      1.54    IT      0.71    OF      4.02
C       2.93    P       2.15    ER      1.30    ES      0.69    AND     3.15
D       4.11    Q       0.14    RE      1.30    ST      0.68    TO      2.36
E       13.05   R       6.64    AN      1.08    OR      0.68    A       2.09
F       2.88    S       6.46    HE      1.08    NT      0.67    IN      1.77
G       1.39    T       9.02    AR      1.02    HI      0.68    THAT    1.25
H       5.85    U       2.77    EN      1.02    EA      0.64    IS      1.03
I       6.77    V       1.00    TI      1.02    VE      0.64    I       0.94
J       0.23    W       1.49    TE      0.98    CO      0.59    IT      0.93
K       0.42    X       0.30    AT      0.88    DE      0.55    FOR     0.77
L       3.60    Y       1.51    ON      0.84    RA      0.55    AS      0.76
M       2.62    Z       0.09    HA      0.84    RO      0.55    WITH    0.76
SOURCE: Tanenbaum, Computer Networks, 1981, Prentice Hall
Disguising frequencies
• First trick: use more than 26 symbols
and use several different symbols to
represent the same letter. The goal is
to even out the distribution.
• Ex. Use the letters plus the digits.
– 36 symbols
– Assign five symbols to the letter E, two to
the letter I, three to the letter N, two each to
R and S.
More complex
• Vigenere’s table
• Arrange all the letters of the alphabet 26 times, in
parallel columns, such that each column begins with
a different letter, first A, then B, etc.
• Encode each letter by using a different column for
each successive letter of the message.
• How to know which column to use? Use a keyword.
Examples and breaking:
http://www.trincoll.edu/depts/cpsc/cryptography/vigenere.html
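A short sketch of the keyword scheme described above: each keyword letter selects the column (that is, the shift) used for the corresponding message letter, and the keyword repeats as needed. The function name `vigenere` is made up, and the keyword is assumed to contain letters only.

```python
def vigenere(text: str, keyword: str, decrypt: bool = False) -> str:
    """Shift each letter by the amount given by the repeating keyword."""
    shifts = [ord(k) - ord("A") for k in keyword.upper()]
    out = []
    i = 0  # counts letters only, so spaces do not consume keyword letters
    for ch in text.upper():
        if "A" <= ch <= "Z":
            shift = shifts[i % len(shifts)]
            if decrypt:
                shift = -shift
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
            i += 1
        else:
            out.append(ch)
    return "".join(out)


cipher = vigenere("ATTACKATDAWN", "LEMON")
print(cipher)
print(vigenere(cipher, "LEMON", decrypt=True))
```

Notice that the two A's at positions 1 and 4 are encoded differently, which is what hides the simple letter frequencies.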
Decoding
• The Vigenere cipher looks really hard, but is not
secure. Since the keyword repeats, it is really just a
bunch of monoalphabetic codes. If you can figure out
the length of the keyword, you can do standard
analysis.
• Making it harder - instead of a regular arrangement of
the letter columns, scramble them in some arbitrary
way.
– Makes decoding much more difficult, but also makes it
difficult to have the arrangement known to the people who
are supposed to be able to read the message.
Enigma
• Suppose we take a conversion for the first letter of
the message and a different mapping for the next
letter and a different mapping for the next letter …
• That is what we did with Vigenere
• Add additional encodings. Rotate from a fixed starting
point through 26 positions of the first set of columns,
then iterate a second set of columns. Now have 676
different mappings.
• To decode, must figure out the wiring inside each
phase, and the order in which they are arranged in
the machine.
Enigma
• German engineer, Arthur Scherbius (1878-1929) invented a machine of this type around
1918 and bought the patent rights to one
invented in Holland also. He added a
reflecting cylinder, which allowed the same
machine to encode and decode. He called
the machine enigma, from the Greek for
riddle.
• The enigma used by the Germans in WWII
had three rotors, and later four.
Enigma - 2
Encryption/Decryption Keys
• Problem is that you have to get the key to the
receiver, secretly and accurately.
• If you can get the key there, why not use the same method
to send the whole message? (Efficiency of scale)
• If the key is compromised without the communicators
knowing it, the transmissions are open.
• Exact working of the enigma machine:
– http://www.codesandciphers.org.uk/enigma/rotorspec.htm
• How Polish mathematicians broke the enigma
– http://www.codesandciphers.org.uk/virtualbp/poles/poles.htm
Summary of encryption goals
• High level of data protection
• Simple to understand
• Complex enough to deter intruders
• Protection based on the key, not the
algorithm
• Economical to implement
• Adaptable for various applications
• Available at reasonable cost
Data Encryption Standard
• Complex sequence of transformations
– hardware implementations speed performance
– modifications have made it very secure
• Known algorithm
– security based on difficulty in discovering the key
• http://www.itl.nist.gov/fipspubs/fip46-2.htm
The Data Encryption Standard Illustrated
64 bit blocks, 64 bit key (56 bits effective; the other 8 are parity bits)
Federal Information Processing Standards 46-2 http://www.itl.nist.gov/fipspubs/fip46-2.htm
INTERNET-LINKED COMPUTERS CHALLENGE DATA ENCRYPTION
STANDARD
LOVELAND, COLORADO (June 18, 1997). Tens of thousands of computers, all
across the U.S. and Canada, linked together via the Internet in an unprecedented
cooperative supercomputing effort to decrypt a message encoded with the government-endorsed Data Encryption Standard (DES).
Responding to a challenge, including a prize of $10,000, offered by RSA Data
Security, Inc, the DESCHALL effort successfully decoded RSADSI's secret message.
According to Rocke Verser, a contract programmer and consultant who developed
the specialized software in his spare time, "Tens of thousands of computers worked
cooperatively on the challenge in what is believed to be one of the largest
supercomputing efforts ever undertaken outside of government."
Using a technique called "brute-force", computers participating in the challenge
simply began trying every possible decryption key. There are over 72 quadrillion keys
(72,057,594,037,927,936). At the time the winning key was reported to RSADSI, the
DESCHALL effort had searched almost 25% of the total. At its peak over the recent
weekend, the DESCHALL effort was testing 7 billion keys per second.
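The numbers in the press release can be checked with a little arithmetic. The key count quoted is 2^56, since 56 of the 64 DES key bits are effective. The sketch below estimates how long an exhaustive search would take at a constant rate; the function name is invented, and a constant rate is an assumption (the release gives 7 billion keys per second only as the peak).

```python
def exhaustive_search_days(key_bits: int, keys_per_second: float) -> float:
    """Days needed to try every key at a constant testing rate."""
    keyspace = 2 ** key_bits
    return keyspace / keys_per_second / 86_400  # 86,400 seconds per day


print(2 ** 56)                                  # size of the DES keyspace
print(exhaustive_search_days(56, 7_000_000_000))  # full search at peak rate
```

At the peak rate the whole keyspace would take roughly 119 days. Each extra key bit doubles that figure, which is the sense in which protection rests on the key and not on keeping the algorithm secret.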
Public Key encryption
• Eliminates the need to deliver a key
• Two keys: one for encoding, one for
decoding
• Known algorithm
– security based on security of the decoding
key
• Essential element:
– knowing the encoding key will not reveal
the decoding key
Effective Public Key
Encryption
• Encoding method E and decoding method D are inverse
functions on message M:
– D(E(M)) = M
• Computational cost of E, D reasonable
• D cannot be determined from E, the algorithm, or any
amount of plaintext attack with any computationally
feasible technique
• E cannot be broken without D (only D will accomplish the
decoding)
• Any method that meets these criteria is a valid Public
Key Encryption technique
It all comes down to this:
• key used for decoding is dependent
upon the key used for encoding,
but the relationship cannot be
determined in any feasible
computation or observation of
transmitted data
Rivest, Shamir, Adleman
(RSA)
• Choose 2 large prime numbers, p and
q, each more than 100 digits
• Compute n=p*q and z=(p-1)*(q-1)
• Choose d, relatively prime to z
• Find e, such that e*d=1 mod (z)
– or e*d mod z = 1, if you prefer.
• This produces e and d, the two keys that define the E
and D methods.
Public Key encoding
• Convert M into a bit string
• Break the bit string into blocks, P, of size k
– k is the largest integer such that 2^k < n
– P corresponds to a binary value: 0 ≤ P < n
• Encoding method
– E = Compute C = P^e (mod n)
• Decoding method
– D = Compute P = C^d (mod n)
• e and n are published (public key)
• d is closely guarded and never needs to be
disclosed
An example:
• p=7; q=11; n=77; z=60
• d=13; e=37; k=6
• Test message = CAT
• Using A=1, etc and 5-bit representation:
– 00011 00001 10100
• Since k=6, regroup the bits (arrange right to left so that any
padding needed will put 0's on the left and not change the value):
– 000000 110000 110100
(three leading zeros added to fill the block)
• decimal equivalent: 0 48 52
• Each of those raised to the power 37 (e) mod n: 0 27 24
• Each of those values raised to the power 13 (d) mod n (convert
back to the original): 0 48 52
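The worked example can be reproduced in code. This is a toy sketch with the slide's tiny primes, useful only for following the arithmetic; the helper names (`text_to_blocks`, `encode`, `decode`) are invented here, and `pow(d, -1, z)` for the modular inverse needs Python 3.8 or later.

```python
from math import gcd

p, q = 7, 11
n = p * q                # 77
z = (p - 1) * (q - 1)    # 60
d = 13
assert gcd(d, z) == 1    # d must be relatively prime to z
e = pow(d, -1, z)        # e*d mod z == 1, giving e = 37
k = 6                    # largest k with 2**k < n


def text_to_blocks(text: str, k: int):
    """A=1 ... Z=26 as 5-bit codes, left-padded with zeros, cut into k-bit blocks."""
    bits = "".join(format(ord(ch) - ord("A") + 1, "05b") for ch in text)
    width = -(-len(bits) // k) * k          # round up to a multiple of k
    bits = bits.zfill(width)
    return [int(bits[i:i + k], 2) for i in range(0, width, k)]


def encode(block: int) -> int:
    return pow(block, e, n)   # C = P^e mod n


def decode(cipher: int) -> int:
    return pow(cipher, d, n)  # P = C^d mod n


blocks = text_to_blocks("CAT", k)
cipher = [encode(b) for b in blocks]
print(blocks, cipher, [decode(c) for c in cipher])
```

Real keys use primes of hundreds of digits; with n = 77 anyone can factor n and recover d immediately.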
A practical note
• There is a lot more to security than
encryption.
• Encryption coding is done by a few experts
• Understanding how the common encryption
algorithms work is useful in choosing the right
approach for your situation.
• Our interest here is in providing assurance
that access to protected resources will be
limited to those with legitimate rights.
On a practical note: PGP
• You can create your own real public and
private keys using PGP (Pretty Good Privacy)
• See the following Web site for full information.
• (MIT site - obsolete)
• http://www.pgpi.org/products/pgp/versions/freeware/
• http://www.freedownloadscenter.com/Utilities/Required_Files/PGP.html
Issues
• Intruder vulnerability
– If an intruder intercepts a request from A for B’s public key,
the intruder can masquerade as B and receive messages
from A intended for B. The intruder can send those same or
different messages to B, pretending to be A.
– Prevention requires authentication of the public key to be
used.
• Computational expense
– One approach is to use Public Key Encryption to send the
Key for use in DES, then use the faster DES to transmit
messages
Digital Signatures
• Some messages do not need to be
encrypted, but they do need to be
authenticated: reliably associated with
the real sender
– Protect an individual against unauthorized
access to resources or misrepresentation
of the individual’s intentions
– Protect the receiver against repudiation of
a commitment by the originator
Digital Signature basic
technique
[Diagram: exchange between Sender A and Receiver B]
• A to B: intention to send
• B to A: E(Random Number), where E is A’s public key
• A to B: Message and D(E(Random Number)) = Random Number, decoded as only A could do
Public key encryption with implied
signature
• Add the requirement that E(D(M)) = M
• Sender A has encoding key E_A, decoding key D_A
• Intended receiver has encoding (public) key E_B.
• A produces E_B(D_A(M))
• Receiver calculates E_A(D_B(E_B(D_A(M))))
– Result is M, but also establishes that only A could
have encoded M
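The nested encoding above can be traced with two toy RSA key pairs. Everything here is a sketch under stated assumptions: A's keys are the small ones from the earlier example (n=77, e=37, d=13), B's keys (n=143, e=7, d=103) are hypothetical values chosen for this illustration, and the function names are invented. B's modulus is deliberately larger than A's so that D_A(M) fits inside one of B's blocks.

```python
# Toy key pairs, far too small for real use.
N_A, E_A, D_A = 77, 37, 13     # A: p=7, q=11
N_B, E_B, D_B = 143, 7, 103    # B: p=11, q=13; 7*103 = 721 = 6*120 + 1


def send(m: int) -> int:
    """A signs with its private key, then encrypts with B's public key."""
    signed = pow(m, D_A, N_A)        # D_A(M): only A can compute this
    return pow(signed, E_B, N_B)     # E_B(D_A(M)): only B can open this


def receive(c: int) -> int:
    """B decrypts with its private key, then checks with A's public key."""
    signed = pow(c, D_B, N_B)        # D_B(E_B(D_A(M))) = D_A(M)
    return pow(signed, E_A, N_A)     # E_A(D_A(M)) = M


message = 52
print(receive(send(message)))
```

If the result is a sensible message, B knows it was produced with D_A, which only A holds.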
Digital Signature Standard
(DSS)
• Verifies that the message came from
the specified source and also that the
message has not been modified
• More complexity than simple encoding
of a random number, but less than
encrypting the entire message
• Message is not encoded. An
authentication code is appended to it.
Digital Signature - SHA
FIPS Pub 186 - Digital Signature Standard http://www.itl.nist.gov/fipspubs/fip186.htm
Encryption summary
• Problems
– intruders can obtain sensitive information
– intruder can interfere with correct
information exchange
• Solution
– disguise messages so an intruder will not
be able to obtain the contents or replace
legitimate messages with others
Important methods
• DES
– fast, reasonably good encryption
– key distribution problem
• Public Key Encryption
– more secure
• based on the difficulty of factoring very large numbers
– no key distribution problem
– computationally intense
Digital signatures
• Authenticate messages so the sender
cannot repudiate the message later
• Protect messages from changes during
transmission or at the receiver’s site
• Useful when the contents do not need
encryption, but the contents must be
accurate and correctly associated with
the sender
Legal and ethical issues
• People who work in these fields face
problems with allowable exports, and are not
always allowed to talk about their work.
• Is it desirable to have government able to
crack all codes?
• What is the tradeoff between privacy of law
abiding citizens vs. the ability of terrorists and
drug traffickers to communicate in secret?
Tonight
• Further detail of Dublin Core
• Look at another DL
• Google Books example
• Access management
– Encryption
– Digital Signatures