MILOS: An Architecture for Multimedia
Digital Libraries and Content
Management Applications
Pasquale Savino
I.S.T.I.
Scope of Digital Library technology
[Figure: technologies positioned along two axes, Structure of Data (low to high) and Knowledge of Users/Tasks (low to high). Labelled regions: Web, Semantic Web, Information Retrieval, Semistructured data, Databases, Digital Library Technologies]
Workshop on Novel Technologies for Digital Preservation, Information
Processing and Access to Cultural Heritage Collections
May 21-22 2004
Ormylia Art Diagnosis Centre
Digital Libraries today
Focus on Cultural Heritage preservation and access
Access to OPACs (Online Public Access Catalogs) of public
libraries, museums, etc. from the Web
New libraries (documents, images, audio/video) with digital
multimedia content
Access based on standardized metadata, either generic (e.g.
Dublin Core) or area-specific
Distributed Web-based architectures
New services available:
– Multilingual access
– Personalization
– Recommendation
– Annotation
– Collection support
Digital Library Vision
Digital libraries should enable any citizen to
access all human knowledge any time and
anywhere, in a friendly, multi-modal, efficient, and
effective way, by overcoming barriers of distance,
language, and culture and by using multiple
Internet-connected devices
Digital Library Vision
DL Functionalities
Rich information needs
Multiple sources of related information
Heterogeneous information
Rich data sources
Multimedia information
Defined user populations
Motivated users
Task-orientation
Domain-orientation
Cross-lingual access
Collaboration
Application areas
Multimedia digital archives
Publishing support
Broadcasting support
Production support
E-Learning
Corporate content management
Health and medicine
Biology
Government and Public
Administration
Why a Content Management
System
Digital libraries are used to manage documents
containing many different types of data
Many different metadata models are in use
DL software components are currently built only
for a specific use
– Lack of general purpose building components
The main characteristics of the MCMS
Flexibility
– Management of different types of data stored in different
repositories with different storage strategies
– Capability of describing documents with arbitrary, and possibly
heterogeneous, metadata
– Support of custom/personalized views on the metadata
schema used
Scalability
– Management of DLs of different sizes
– Dealing with DL evolution
Efficiency
The MILOS MCMS
MILOS is a general purpose Multimedia Content
Management System
– Manages and serves any multimedia documents
– Manages any metadata of documents
MILOS is based on a standard platform
– Developed by using the Web Service technology, which provides,
in many cases, support for authentication, authorization
management, distribution, etc.
– Mainly developed in Java
– Very easy installation (Drag and Drop)
– Exploitation of advanced XML native database technology
The MILOS MCMS
Search capabilities:
– Traditional fielded search capabilities
– Full text search (e.g. on video transcripts)
– Search on automatically associated classification categories
– Visual content similarity search
The system is not tied to a specific metadata schema
– Any XML encoded metadata can be managed by the system (e.g.
DC, MPEG-7, ECHO, proprietary model)
– Metadata mapping techniques are used to provide users with a
homogeneous view
– Several different and heterogeneous applications can be
supported
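One way to picture how these search capabilities could work together is to treat each one as an independent lookup that returns document identifiers, and to answer a combined query by intersecting the results. The sketch below is only an illustration under that assumption: the function name, the identifiers, and the toy result sets are invented and do not come from MILOS.

```python
def combined_search(*result_sets):
    """Return the ids present in every result set (logical AND of the conditions)."""
    if not result_sets:
        return set()
    hits = set(result_sets[0])
    for other in result_sets[1:]:
        hits &= set(other)
    return hits


# Toy stand-ins for the answers of four separate searches (all ids are invented).
fielded_hits = {"v1", "v2", "v3"}    # fielded search on a metadata field
fulltext_hits = {"v2", "v3", "v7"}   # full text search on transcripts
category_hits = {"v2", "v3", "v9"}   # automatically assigned category
visual_hits = {"v3", "v5"}           # visual similarity to an example image

print(sorted(combined_search(fielded_hits, fulltext_hits, category_hits, visual_hits)))
# prints ['v3']
```

A real system would more likely rank results than intersect plain sets; the intersection only shows how conditions of different kinds can be applied to the same documents.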
[Figure: MILOS architecture diagram, organised in Interface Logic, Business Logic and Data Logic layers. Recoverable labels: MDEdit (XML metadata editor); Retrieval Interface (JSP); Browser; Multimedia Content Management Server; Metadata Storage and Retrieval (SOAP Web Service); Multimedia Server (SOAP Web Service); Repository Metadata Integrator (SOAP Web Service); XML Search Engine with structure search, fielded search, full text search and XQuery support; Full Text Index, Topic Cat. Index, Visual Cnt. Index; stored media MPEG-1, MPEG-2, JPEG, …; stored metadata ECHO, MPEG-7, Dublin Core, …]
Combined search capabilities: e.g. retrieve all videos with mountains in the background, discussing the Afghanistan earthquake, and classified as foreign affairs.
Metadata independence: the schema seen in the interface logic can be different from the one(s) used in the repository.
Metadata Storage and Retrieval
Based on a native XML database/repository
– Solutions based on the use of DB technology may be too inefficient
for complex metadata models
Metadata represented in XML
– Arbitrary metadata structure allowed
– Export/import of metadata easily managed
No XML schema definition is needed
– Arbitrary and heterogeneous metadata representations
Search based on XQuery extended with similarity search support
Optional index definition for performance improvements
– The system administrator can associate an index to specific XML
elements
– Support for free text search
– Image similarity search
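The idea of schema-free XML records with optional, administrator-defined indexes on specific elements can be sketched as follows. This is a toy in-memory model built on Python's standard `xml.etree.ElementTree`, not the native XML database used by MILOS; the class name `MetadataStore`, its methods, and the sample records are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


class MetadataStore:
    """Toy store: any well-formed XML record is accepted; indexes are optional."""

    def __init__(self):
        self.records = {}   # document id -> parsed XML root element
        self.indexes = {}   # element path -> {text value -> set of document ids}

    def add_index(self, path):
        # The administrator chooses which elements to index; existing records are indexed too.
        self.indexes[path] = defaultdict(set)
        for doc_id, root in self.records.items():
            self._index_record(path, doc_id, root)

    def _index_record(self, path, doc_id, root):
        for element in root.findall(path):
            if element.text:
                self.indexes[path][element.text.strip()].add(doc_id)

    def insert(self, doc_id, xml_text):
        root = ET.fromstring(xml_text)   # no schema is checked here
        self.records[doc_id] = root
        for path in self.indexes:
            self._index_record(path, doc_id, root)

    def find(self, path, value):
        if path in self.indexes:         # indexed element: direct lookup
            return sorted(self.indexes[path].get(value, set()))
        # Element not indexed: fall back to scanning every record.
        return sorted(
            doc_id for doc_id, root in self.records.items()
            if any((e.text or "").strip() == value for e in root.findall(path))
        )


store = MetadataStore()
store.insert("v1", "<video><title>Alps</title><creator>Istituto Luce</creator></video>")
store.insert("a1", "<article><author><name>Istituto Luce</name></author></article>")
print(store.find(".//creator", "Istituto Luce"))   # scan, prints ['v1']
store.add_index(".//creator")
print(store.find(".//creator", "Istituto Luce"))   # index lookup, prints ['v1']
```

The two records have unrelated structures, which is the point: nothing has to be declared before insertion, and the index only changes how fast an answer is found, not what the answer is.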
Multimedia Server
Storage of data of any media
Support of different storage strategies, which may
depend on the application (data size, access and
transfer time). The required strategy may change over
time.
DL application developers need not specify how and
where data are stored, but only the
performance they require
Use of a mapping between URNs and actual location
Use of rules (based on MIME types) to enforce specific
storage strategies
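The combination of a URN-to-location mapping with MIME-type rules can be sketched as below. The strategy names, the `type/*` wildcard convention, the URN format, and the path layout are all hypothetical; the slide only states that such a mapping and such rules exist.

```python
from dataclasses import dataclass, field


@dataclass
class MultimediaServer:
    rules: dict                                     # MIME type or "type/*" -> strategy name
    default_strategy: str = "local_disk"
    locations: dict = field(default_factory=dict)   # URN -> (strategy, physical path)

    def strategy_for(self, mime_type):
        # An exact MIME type wins over a "type/*" wildcard, which wins over the default.
        if mime_type in self.rules:
            return self.rules[mime_type]
        wildcard = mime_type.split("/")[0] + "/*"
        return self.rules.get(wildcard, self.default_strategy)

    def store(self, urn, mime_type):
        strategy = self.strategy_for(mime_type)
        path = f"/{strategy}/{urn.replace(':', '_')}"   # invented path layout
        self.locations[urn] = (strategy, path)
        return urn

    def resolve(self, urn):
        # Applications only ever hold the URN; the actual location is looked up here.
        return self.locations[urn]


server = MultimediaServer(rules={"video/mpeg": "streaming_server", "image/*": "fast_disk"})
server.store("urn:milos:video:42", "video/mpeg")
print(server.resolve("urn:milos:video:42"))
# prints ('streaming_server', '/streaming_server/urn_milos_video_42')
```

Because applications refer to data only by URN, a rule can be changed and the data moved without touching application code, which is how a storage strategy could change over time.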
Repository Metadata Integrator
Metadata independence (via metadata
mapping)
Use of schema mapping rules to map
application metadata into Metadata Storage
– Each rule specifies how to translate a metadata field
known to the application into an XPath expression
used to access that field in the Metadata Storage
Mapping rules are used to specify the XQuery
statements executed in the Metadata Storage
and to transform the results back into application
metadata
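A minimal sketch of such mapping rules follows, assuming each rule is a pair of an application field name and a path into the stored XML. The element names (`AVDocument`, `Title`, `Producer`), the collection name `milos`, and the shape of the generated XQuery are illustrative assumptions, not the actual ECHO schema or the statements MILOS generates.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: application field -> path relative to the stored record's root element.
RULES = {
    "title": "AVDocument/Title",
    "creator": "AVDocument/Producer",
}


def build_query(rules, field, value, collection="milos"):
    """Translate an application-level condition into an XQuery string."""
    path = rules[field]
    literal = value.replace('"', '""')   # XQuery escapes a double quote by doubling it
    return f'for $r in collection("{collection}")/* where $r/{path} = "{literal}" return $r'


def to_application_record(rules, xml_text):
    """Map a stored XML record back into the fields the application knows."""
    root = ET.fromstring(xml_text)
    record = {}
    for field, path in rules.items():
        element = root.find(path)
        if element is not None and element.text:
            record[field] = element.text.strip()
    return record


print(build_query(RULES, "title", "Alps"))
stored = ("<Record><AVDocument><Title>Alps</Title>"
          "<Producer>Istituto Luce</Producer></AVDocument></Record>")
print(to_application_record(RULES, stored))
# prints {'title': 'Alps', 'creator': 'Istituto Luce'}
```

The same rule table is used in both directions, so pointing the application at a repository with a different schema only requires a different table.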
Access to heterogeneous metadata repositories
[Figure: an application providing a Dublin Core view on metadata, together with a MILOS repository based on ECHO metadata and a MILOS repository based on MPEG-7 metadata]
Ingestion of existing data and metadata in MILOS
[Figure: data and metadata from a repository using a proprietary metadata model are ingested in MILOS, giving a MILOS repository based on the proprietary metadata]
New metadata immediately accessible.
Possibility to define indexes to speed up retrieval
Distribution and multiple disk storage
[Figure: a MILOS repository using multiple disk storage, with two Multimedia Server instances]
Examples of DL archives
Four DLs have been ingested
Reuters Data Set
– 810,000 news agency items (2.6 GB), text and metadata encoded
in XML
ACM Sigmod Record and DBLP data sets
– Sigmod Record composed of 46 XML files
– DBLP: one single XML file (187 MB)
– Different structure, one single interface through mapping
mechanisms
The ECHO data set
– About 50 hours of historical documentaries (8000 videos),
coming from 4 different countries
– 43,000 XML files (36 MB), 21 GB of MPEG-1 video and JPEG images
– Image similarity search based on MPEG-7 image descriptors
Indexing videos
Main components:
– Metadata editing station
– Automatic processing services: speech recognition,
segmentation, summarisation, …
Indexing workflows:
[Figure: workflow diagram with a New Film entry point, the Film repository, Automatic Processing, the Metadata repository, a Manual metadata editing entry point, and the Video and Metadata repository]
Searching videos
Main components:
Query formulation
• Metadata fields
• Audio transcripts
• Video key frames
• Cross-language queries
Video Search
• Access to metadata DB
• Full text search on transcripts
• Image similarity search
• Cross-language retrieval on
selected metadata fields and
transcript
Repositories: Transcript repository, Video and Metadata repository, Video key frames repository
Examples of queries
– Metadata associated to the entire video
E.g. find b&w videos produced before World War II by Istituto Luce
– Metadata associated to video shots
E.g. find a shot where the audio transcript contains the words
“Attentato Banca Nazionale dell’Agricoltura” (attack on the Banca Nazionale dell’Agricoltura)
– Metadata associated to single frames
E.g. find a video that contains a frame similar to this image [the
image is provided as an example]
– Any combination of the previous cases
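Query by example image, as in "find a video that contains a frame similar to this image", can be sketched as a nearest-neighbour search over descriptor vectors extracted from key frames. The vectors, the frame identifiers, and the use of plain Euclidean distance below are assumptions for illustration; they are not MPEG-7 descriptors or the distance functions used with them.

```python
import math


def euclidean(a, b):
    """Distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(query, descriptors, k=2):
    """Return the ids of the k key frames whose descriptors are closest to the query."""
    ranked = sorted(descriptors, key=lambda frame_id: euclidean(query, descriptors[frame_id]))
    return ranked[:k]


# Invented three-dimensional descriptors for three key frames.
key_frames = {
    "video1/frame10": [0.9, 0.1, 0.0],
    "video2/frame03": [0.2, 0.7, 0.1],
    "video3/frame21": [0.8, 0.2, 0.1],
}

print(most_similar([0.9, 0.1, 0.05], key_frames))
# prints ['video1/frame10', 'video3/frame21']
```

A linear scan like this only suits small collections; larger key-frame repositories would need an index built for similarity search.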
ECHO metadata model
Supports a multi-layer and hierarchical
description of audio-video documents
– Description of different aspects of the same
document
The model can be adapted to specific
application needs
Describes metadata that are automatically
extracted as well as metadata that are entered
manually
ECHO metadata model
Extends the IFLA-FRBR model
Four entities used to describe different aspects of a resource:
WORK
Describes a distinct intellectual or artistic creation
It is the abstract idea of a creation
We do not specify if we realize a book, a film, or a cartoon
This is described by the Expression Entity
Examples of WORK are:
The terrorist attack at Banca Nazionale dell’Agricoltura
2001: A space Odyssey
EXPRESSION
Intellectual or artistic realisation of a work in the form of
alphanumeric, musical, choreographic notation, sound, image, etc.
No information on the physical embodiment is given
Examples of Expression are:
TV news on the terrorist attack
A documentary on the terrorist attack
Interviews on the terrorist attack
………
MANIFESTATION
Physical embodiment of an expression
E.g. manuscripts, books, maps, sound, CD-ROM, …
ITEM
A single exemplar of a manifestation
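The four-level hierarchy (Work, Expression, Manifestation, Item) can be sketched as nested data classes. The class and attribute names are a simplification chosen for illustration and do not reproduce the actual ECHO schema; the sample values reuse the examples from the slide, while the storage location is invented.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    location: str                        # a single exemplar, e.g. one stored copy


@dataclass
class Manifestation:
    medium: str                          # physical embodiment of an expression
    items: list = field(default_factory=list)


@dataclass
class Expression:
    description: str                     # realisation of a work, no physical details
    manifestations: list = field(default_factory=list)


@dataclass
class Work:
    title: str                           # the abstract creation
    expressions: list = field(default_factory=list)


def items_of(work):
    """Collect every exemplar reachable from a work through the hierarchy."""
    return [
        item
        for expression in work.expressions
        for manifestation in expression.manifestations
        for item in manifestation.items
    ]


work = Work("The terrorist attack at Banca Nazionale dell'Agricoltura")
documentary = Expression("A documentary on the terrorist attack")
copy = Manifestation("MPEG-1 video", items=[Item("archive/tape-17")])   # invented location
documentary.manifestations.append(copy)
work.expressions.append(documentary)
work.expressions.append(Expression("TV news on the terrorist attack"))

print([item.location for item in items_of(work)])
# prints ['archive/tape-17']
```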
The ECHO metadata model
[Figure: entity-relationship diagram in four levels. Work level: WORK, AVDOCUMENT. Expression level: EXPRESSION, VERSION, TRANSCRIPT, VIDEO, AUDIO. Manifestation level: MANIFESTATION, MEDIA. Item level: ITEM, STORAGE. Relationships shown: ExpressedBy, ManifestedBy, AvailableAs, PartOf, HasChannels, HasAudio, HasTranscript, FollowedBy, SynchronisedWith. Legend: entity, Is-A relationship, one-to-one relationship, one-to-many relationship]
MILOS Demo
Start