Chapter 6 Text and Multimedia Languages and Properties

Introduction

A document has a given syntax and structure
  it also has semantics
  it may have a presentation style associated with it
  Figure 6.1 depicts all these relationships
A document can also have information about itself, called metadata


The syntax of a document can express different elements such as structure, presentation style, and semantics
  one or more of these elements may be given together
  a structural element (e.g. a section) can have a fixed formatting style

The syntax of a document can be
  implicit in its content
  expressed in a declarative language or a programming language (PL)

The current trend is to use languages that
  provide information on document structure, format, and semantics
  are readable by humans and computers
SGML is one such language
Metadata


Metadata is data about data
Metadata associated with a text includes
  author
  date of publication
  source of publication
  document length (in pages, words, bytes)
  document genre (book, article, memo)

The Machine Readable Cataloging Record (MARC) is the most used format for library records
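As a rough illustration of the descriptive fields listed above, the following Python sketch stores them in a small record type. The class name `TextMetadata`, its field names, and the sample values are assumptions made for this example; they do not follow MARC or any other cataloging format.

```python
from dataclasses import dataclass, asdict


@dataclass
class TextMetadata:
    """Descriptive metadata for one text document (hypothetical layout)."""
    author: str
    publication_date: str   # kept as an ISO date string for simplicity
    source: str
    length_pages: int
    genre: str              # e.g. "book", "article", "memo"


# Sample values are invented for illustration only.
record = TextMetadata(
    author="Jane Doe",
    publication_date="1999-05-01",
    source="Example Press",
    length_pages=24,
    genre="article",
)

# The metadata can be inspected without touching the document itself.
fields = asdict(record)
print(sorted(fields))
```

Keeping the metadata apart from the text is what lets a catalog be searched by author or genre without opening the documents.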

On the Web, metadata is used for many purposes
  cataloging
  content rating (e.g. to protect children from reading some types of documents)
  intellectual property rights
  digital signatures (for authentication)
  privacy levels (who should or should not have access to a document)
  applications to electronic commerce (EC), etc.




A new standard for Web metadata is the Resource Description Framework (RDF)
RDF allows the description of Web resources
It consists of a description of nodes and attached attribute/value pairs
  nodes can be any Web resource, i.e. any URI, which includes URLs
  attributes are properties of nodes, and their values are text strings or other nodes
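The node and attribute/value model described above can be sketched in a few lines of Python. This is only a model of the idea, not RDF syntax and not an RDF library; the names `Node`, `Statement`, and `describe`, and the example.org URIs, are assumptions made for this example.

```python
from typing import NamedTuple, Union


class Node(NamedTuple):
    """A Web resource, identified by its URI."""
    uri: str


# A value is either a text string or another node.
Value = Union[str, Node]


class Statement(NamedTuple):
    """One attribute/value pair attached to a node."""
    node: Node
    attribute: str
    value: Value


def describe(statements, node):
    """Collect the attribute/value pairs attached to one node."""
    return {s.attribute: s.value for s in statements if s.node == node}


# Hypothetical resources, used for illustration only.
page = Node("http://example.org/chapter6.html")
author = Node("http://example.org/people/jane")

statements = [
    Statement(page, "title", "Text and Multimedia Languages and Properties"),
    Statement(page, "creator", author),      # value is another node
    Statement(author, "name", "Jane Doe"),   # value is a text string
]

page_description = describe(statements, page)
print(page_description["title"])
print(page_description["creator"].uri)
```

Because a value may itself be a node, descriptions link together into a graph: following `creator` from the page leads to a node that has attributes of its own.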

Text

With the advent of computers, it became necessary to code text in binary digits
The first coding schemes were EBCDIC and ASCII
For internationalization, covering oriental languages such as Chinese or Japanese Kanji, 16-bit Unicode (ISO 10646) exists
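To make the coding schemes above concrete, this short Python sketch encodes the same characters under several of them. It assumes that Python's `cp037` codec (one EBCDIC code page among several) stands in for EBCDIC and that `utf-16-be` stands in for 16-bit Unicode; the helper name `code_bytes` is made up for this example.

```python
def code_bytes(text, encoding):
    """Return the bytes of `text` under `encoding`, as a hex string."""
    return text.encode(encoding).hex()


# The same letter has different codes in ASCII and in EBCDIC.
ascii_a = code_bytes("A", "ascii")
ebcdic_a = code_bytes("A", "cp037")    # cp037 is one EBCDIC code page

# A Kanji character occupies one 16-bit code unit in UTF-16.
kanji_16 = code_bytes("漢", "utf-16-be")

# ASCII has no code for Kanji, so encoding it fails.
try:
    "漢".encode("ascii")
    kanji_fits_ascii = True
except UnicodeEncodeError:
    kanji_fits_ascii = False

print(ascii_a, ebcdic_a, kanji_16, kanji_fits_ascii)
```

The failing ASCII call is the practical reason a wider code is needed: a 7-bit scheme has no room for thousands of ideographs.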

Text Formats

There is no single format for text documents
In the past, IR systems would convert a document to an internal format
  disadvantage: the content of the document cannot be changed
Current IR systems have filters to handle the most popular document formats, in particular Word, WordPerfect, or FrameMaker

Other text formats for document interchange
  Rich Text Format (RTF)
    used by word processors and has ASCII syntax
  Portable Document Format (PDF)
    developed for displaying and printing documents
  Multipurpose Internet Mail Extensions (MIME)
    used to encode electronic mail
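As a small illustration of MIME encoding electronic mail, the sketch below uses Python's standard `email` package to wrap a non-ASCII body in a MIME message and read it back. The addresses, subject, and the helper name `build_message` are assumptions made for this example, and choosing base64 as the transfer encoding is just one option.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_message(body):
    """Wrap a text body in a MIME message (hypothetical addresses)."""
    msg = EmailMessage()
    msg["From"] = "sender@example.org"
    msg["To"] = "reader@example.org"
    msg["Subject"] = "Chapter 6 notes"
    # Ask for base64 so the UTF-8 body travels as plain ASCII characters.
    msg.set_content(body, charset="utf-8", cte="base64")
    return msg


body = "Kanji sample: 漢字"
message = build_message(body)

# Serialize the message as it would be sent, then parse it again.
wire_bytes = message.as_bytes()
decoded = message_from_bytes(wire_bytes, policy=policy.default)

print(message["Content-Type"])
print(message["Content-Transfer-Encoding"])
print(decoded.get_content().strip())
```

The MIME headers record both the character set and the transfer encoding, so the receiving side can recover the original text even though only ASCII bytes were transmitted.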