Chapter 6 Text and Multimedia Languages and Properties
Introduction
Document has given syntax and
structure
also has semantics
may have presentation style associated
with it
Figure 6.1 depicts all these relationships
document can also have information
about itself, called metadata
Syntax of document can express
different elements such as structure,
presentation style, semantics
one or more of these elements may be
given together
structural element (e.g. section) can have
fixed formatting style
Syntax of document can be
implicit in its content
expressed in a declarative language or a programming language (PL)
current trend is to use languages
that provide information on document
structure
format
semantics
readable by humans and computers
SGML is one such language
Metadata
Metadata is data about data
metadata associated with text includes
author
date of publication
source of publication
document length (in pages, words, bytes)
document genre (book, article, memo)
Machine Readable Cataloging Record
(MARC) is the most widely used format for library
records
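The descriptive fields listed above can be pictured as a simple record. The following is a minimal sketch in Python, assuming a hypothetical `DocumentMetadata` class and made-up values; it only illustrates "data about data" and is not the MARC format itself.

```python
from dataclasses import dataclass, asdict


@dataclass
class DocumentMetadata:
    """Hypothetical record holding the descriptive fields listed above."""
    author: str
    publication_date: str  # ISO 8601 date string
    source: str
    length_pages: int
    genre: str             # e.g. "book", "article", "memo"


# Illustrative values only; not taken from any real catalog.
record = DocumentMetadata(
    author="A. Author",
    publication_date="1999-05-15",
    source="Example Press",
    length_pages=12,
    genre="article",
)

# The metadata travels alongside the document as plain field/value pairs.
fields = asdict(record)
print(sorted(fields))
```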
On the Web, metadata is used for many
purposes
cataloging
content rating (e.g. to protect children
from reading some type of document)
intellectual property rights
digital signatures (for authentication)
privacy levels (who should/should not
have access to document)
applications to electronic commerce (EC), etc.
New standard for Web metadata is
Resource Description Framework (RDF)
RDF allows description of Web
resources
consists of description of nodes and
attached attribute/value pairs
nodes can be any Web resource (any URI,
which includes URLs)
attributes are properties of nodes, and their
values are text strings or other nodes
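The node and attribute/value model described above can be sketched in plain Python, without any RDF library. The URIs and attribute names below are hypothetical examples, and the tuple layout is an assumption made for illustration, not RDF's own syntax.

```python
from collections import defaultdict

# Hypothetical nodes; any URI could stand here.
DOC = "http://example.org/docs/chapter6"
AUTHOR = "http://example.org/people/author1"

# Each statement is (node, attribute, value).
# A value is either a text string or another node.
statements = [
    (DOC, "title", "Text and Multimedia Languages and Properties"),
    (DOC, "creator", AUTHOR),        # value is another node
    (AUTHOR, "name", "A. Author"),   # value is a text string
]


def describe(statements):
    """Group the attribute/value pairs under the node they are attached to."""
    description = defaultdict(dict)
    for node, attribute, value in statements:
        description[node][attribute] = value
    return dict(description)


description = describe(statements)

# Follow a node-valued attribute to reach the attributes of that node.
creator = description[DOC]["creator"]
creator_name = description[creator]["name"]
print(creator_name)
```

Because a value may itself be a node, the descriptions form a graph rather than a flat record.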
Text
With the advent of computers, it became
necessary to code text in binary digits
first coding schemes were EBCDIC
and ASCII
for internationalization, covering oriental
languages like Chinese or Japanese
Kanji, 16-bit Unicode (ISO 10646)
exists
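The difference between these coding schemes can be seen by encoding the same text under each. This is a small sketch using Python's built-in codecs; `cp037` is used here as one representative EBCDIC code page (an assumption, since EBCDIC has many variants), and `utf-16-be` stands in for a 16-bit Unicode encoding.

```python
text = "IR"

# The same two letters map to different byte values in each scheme.
ascii_bytes = text.encode("ascii")    # 7-bit ASCII, one byte per character
ebcdic_bytes = text.encode("cp037")   # cp037: one EBCDIC code page
print(ascii_bytes.hex(), ebcdic_bytes.hex())

# Kanji characters cannot be represented in ASCII at all.
kanji = "漢字"
try:
    kanji.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# In a 16-bit Unicode encoding each of these characters takes two bytes.
utf16_bytes = kanji.encode("utf-16-be")
print(ascii_ok, len(utf16_bytes))
```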
Text Formats
No single format for text document
in the past, IR systems would convert
document to internal format
drawback: content of document can then no longer be changed
current IR systems have filters to handle
most popular document formats, in particular
Word, WordPerfect or FrameMaker
Other text formats for document
interchange
Rich Text Format (RTF)
used by word processors and has ASCII
syntax
Portable Document Format (PDF)
developed for displaying and printing
documents
Multipurpose Internet Mail Extensions
(MIME)
used to encode electronic mail
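As an illustration of MIME encoding, the sketch below builds a mail message with a text body and a binary attachment using Python's standard `email` package. The addresses, subject, file name and attachment bytes are placeholders invented for the example.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "sender@example.org"     # placeholder address
msg["To"] = "reader@example.org"       # placeholder address
msg["Subject"] = "Chapter 6 notes"
msg.set_content("Plain-text body of the message.")

# Adding an attachment turns the message into a multipart MIME message.
msg.add_attachment(
    b"placeholder bytes standing in for a PDF file",
    maintype="application",
    subtype="pdf",
    filename="notes.pdf",
)

# Each part carries its own Content-Type header describing its format.
content_types = [part.get_content_type() for part in msg.walk()]
print(content_types)
```

Printing `msg.as_string()` shows the full encoded form, including the part boundaries and the base64-encoded attachment.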