A Motivating Scenario for Designing an Extensible AudioVisual Description Language
Raphaël Troncy, Jean Carrive, Steffen Lalande and Jean-Philippe Poli
Monday, 25 October 2004
Description of the AV content
• Various uses / different granularities:
– identification of the content creator and the content provider: Dublin Core metadata, VRA Core categories, TV-Anytime metadata …
– feature extraction from the video signal: storing and exchanging the results of automatic tools (MPEG-7)
– structural decomposition into video segments corresponding to the logical structure of the program: time-codes, spatial coordinates
– semantic description of these segments: controlled vocabulary, thesaurus, free-text annotation
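Structural decomposition anchors each segment to time-codes. As a minimal sketch of that level only, the code below converts `HH:MM:SS:FF` time-codes into frame counts and segment durations. The 25 frames/s rate (PAL) and the `Segment` class are assumptions for illustration; the slides give neither a frame rate nor a data model.

```python
from dataclasses import dataclass

FPS = 25  # assumption: PAL frame rate; the slides do not specify one


def timecode_to_frames(tc: str, fps: int = FPS) -> int:
    """Convert a non-drop-frame 'HH:MM:SS:FF' time-code into an absolute frame count."""
    hours, minutes, seconds, frames = (int(part) for part in tc.split(":"))
    if not 0 <= frames < fps:
        raise ValueError(f"frame field out of range in {tc!r}")
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


@dataclass
class Segment:
    """A hypothetical structural segment: a label plus a time-code interval."""
    label: str
    start: str  # 'HH:MM:SS:FF', inclusive
    end: str    # 'HH:MM:SS:FF', exclusive

    def duration_seconds(self, fps: int = FPS) -> float:
        frames = timecode_to_frames(self.end, fps) - timecode_to_frames(self.start, fps)
        return frames / fps


program = [
    Segment("opening", "00:00:00:00", "00:00:30:00"),
    Segment("report", "00:00:30:00", "00:03:12:12"),
]
for seg in program:
    print(seg.label, seg.duration_seconds())
# opening 30.0
# report 162.48
```

The segment labels and time-codes are invented; identification metadata and semantic annotation would be attached to the same segments.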
Raphaël Troncy
CoRIMedia - 10/25/2004
2
Description of the AV content
(cultural heritage point of view)
• Segmentation
– locate and date some events along the time axis (time t)
• Description
– type each segment with an AV genre (e.g. report)
– type each segment with a general theme (e.g. athletics)
– give hints on the production (e.g. fade in/out)
– describe the scene (who, when, where, what, …), e.g. "Michael Johnson smashed the 200 m world record, running 19.32 s at the Olympic Games in Atlanta"
⇒ this requires a powerful description language
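The four description axes above can be sketched as one typed record checked against controlled vocabularies. This is a hypothetical illustration, not AVDL: the class name, the time unit (seconds), and every vocabulary entry except "report", "athletics" and "fade in/out" are assumptions.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical controlled vocabularies (only "report", "athletics" and
# "fade in/out" come from the slide; the other values are illustrative).
GENRES = {"report", "interview", "studio"}
THEMES = {"athletics", "football", "cycling"}
PRODUCTION_HINTS = {"fade in/out", "cut", "dissolve"}


@dataclass
class SegmentDescription:
    """One temporal segment typed along the axes listed on the slide."""
    start_s: float                      # segment start, in seconds (assumed unit)
    end_s: float                        # segment end, in seconds
    genre: str                          # AV genre, e.g. "report"
    theme: str                          # general theme, e.g. "athletics"
    production: List[str] = field(default_factory=list)  # production hints
    scene: str = ""                     # free text: who, when, where, what

    def check(self) -> List[str]:
        """Return the problems found; an empty list means the description is accepted."""
        problems = []
        if not self.start_s < self.end_s:
            problems.append("segment must have a positive duration")
        if self.genre not in GENRES:
            problems.append(f"unknown genre: {self.genre}")
        if self.theme not in THEMES:
            problems.append(f"unknown theme: {self.theme}")
        problems += [f"unknown production hint: {p}"
                     for p in self.production if p not in PRODUCTION_HINTS]
        return problems


seg = SegmentDescription(
    start_s=120.0, end_s=185.0, genre="report", theme="athletics",
    production=["fade in/out"],
    scene="Michael Johnson smashed the 200 m world record (19.32 s) in Atlanta",
)
print(seg.check())  # []
```

Only the genre, theme and production axes are checked here; the free-text scene description stays unconstrained, which is exactly the part that calls for a richer language.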
Motivating scenario
• A generic application for manually describing TV programs w.r.t.:
– structural constraints: patterns represent the logical structure of a document
– semantic constraints: the description of the content is machine-understandable
• Let us define the temporal structure of a sports magazine
MPEG-7, the natural candidate description language?
• ISO standard since December 2001
• Main components:
– Descriptors (Ds) and Description Schemes (DSs)
– DDL (Description Definition Language: XML Schema + extensions)
• Concerns all types of media
[Figure: overview of the MPEG-7 Multimedia Description Schemes (Part 5 - MDS). Basic elements: schema tools, basic datatypes, links & media localization, basic tools. Content management: media, creation & production, usage. Content description: structural aspects, semantic aspects. Navigation & access: summaries, views, variations. Content organization: collections, models. User interaction: user preferences, user history.]
MPEG-7: an unsuitable description language for this scenario
1. A non-extensible language
• closed set of descriptors
2. An exchange syntax rather than a truly machine-processable multimedia description language
• non-object-based data model
• non-modular language (universal approach)
3. No formal semantics provided
• applications cannot access the meaning of the documents
⇒ Is the DDL (XML Schema) to blame?
MPEG-7: an unsuitable description language for this scenario
• How to define new descriptors?
• How to define new description schemes?
• How to make the description machine-understandable?
⇒ How to reconcile the critical tension between object-oriented semantic expression and structural validation?
Our proposition: AVDL
• AVDL: a reduced yet extensible audio-visual description language
– an object meta-model (an instance model specifies the vocabulary for, and the rules followed by, the descriptions)
– an XML syntax
– a semantics (close to Description Logics (DL) for the descriptors)
• Description Schemes
– Descriptors
– Properties
– Structures
• Descriptions
– valid instances w.r.t. description schemes
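To make the split between description schemes and descriptions concrete, here is a minimal sketch in which a DS declares descriptors and typed properties (domain and range), and a description is accepted only if it is a valid instance of that DS. The class names and the set of primitive types are assumptions; the `Tracking` descriptor with its `int` property `nbDetection` is taken from the DS example in these slides, and structures are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Assumption: property ranges are limited to a few primitive types.
PRIMITIVES = {"int": int, "string": str, "float": float}


@dataclass
class Property:
    name: str
    domain: str       # descriptor the property applies to
    range_type: str   # key into PRIMITIVES


@dataclass
class DescriptionScheme:
    descriptors: Set[str] = field(default_factory=set)
    properties: Dict[str, Property] = field(default_factory=dict)

    def add_property(self, prop: Property) -> None:
        if prop.domain not in self.descriptors:
            raise ValueError(f"unknown domain descriptor: {prop.domain}")
        if prop.range_type not in PRIMITIVES:
            raise ValueError(f"unknown range type: {prop.range_type}")
        self.properties[prop.name] = prop

    def validate(self, descriptor: str, values: Dict[str, object]) -> List[str]:
        """Check that a description is a valid instance of `descriptor` in this DS."""
        if descriptor not in self.descriptors:
            return [f"unknown descriptor: {descriptor}"]
        errors = []
        for name, value in values.items():
            prop = self.properties.get(name)
            if prop is None or prop.domain != descriptor:
                errors.append(f"{name} is not a property of {descriptor}")
            elif not isinstance(value, PRIMITIVES[prop.range_type]):
                errors.append(f"{name} must be of type {prop.range_type}")
        return errors


ds = DescriptionScheme(descriptors={"Tracking", "Detection"})
ds.add_property(Property("nbDetection", domain="Tracking", range_type="int"))
print(ds.validate("Tracking", {"nbDetection": 1}))      # []
print(ds.validate("Tracking", {"nbDetection": "one"}))  # ['nbDetection must be of type int']
```

Keeping the vocabulary in data rather than in fixed classes is what makes the scheme extensible: a new descriptor is one more entry, not a change to the language.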
The meta class level
The class level
Location
Document, Content and Media
• Distinction:
– Document vs. Content vs. Media
– Virtual content vs. physical content
• Media: a content abstraction for decomposition
– audio tracks, subtitles
Defining Structures
• A structure defines how descriptors may, and must, be combined
– enables control over the descriptions
– enables automatic completion of the descriptions
• AVDL provides some predefined structure models
– containment: gives the list of possible sub-segments of an AV segment (in space and in time)
– regular expression: by analogy with a grammar, for temporal succession
• Other models are currently being studied: temporal constraints, etc.
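The two predefined structure models can be sketched as follows: containment as a table of allowed sub-segment types, and temporal succession as a regular expression over the ordered child types. The sports-magazine model (`Headlines`, `Report`, `Interview`, `Credits`, `Shot`) is entirely hypothetical, since the slides do not spell out the structure, and mapping each type to one letter so Python's `re` can match it is an implementation choice of this sketch, not AVDL's.

```python
import re
from typing import List

# Hypothetical containment model: which segment types may appear inside which.
CONTAINMENT = {
    "SportsMagazine": {"Headlines", "Report", "Interview", "Credits"},
    "Report": {"Shot"},
    "Interview": {"Shot"},
}

# Hypothetical regular-expression model of temporal succession inside a
# SportsMagazine: headlines, any mix of reports and interviews, then credits.
# Each child type is mapped to one letter so Python's `re` can match the sequence.
SYMBOLS = {"Headlines": "H", "Report": "R", "Interview": "I", "Credits": "C"}
SUCCESSION = re.compile(r"H[RI]*C")


def check_containment(parent: str, children: List[str]) -> List[str]:
    """Return the children that the containment model does not allow in `parent`."""
    allowed = CONTAINMENT.get(parent, set())
    return [child for child in children if child not in allowed]


def check_succession(children: List[str]) -> bool:
    """True if the ordered child types form a word of the succession grammar."""
    if any(child not in SYMBOLS for child in children):
        return False
    word = "".join(SYMBOLS[child] for child in children)
    return SUCCESSION.fullmatch(word) is not None


program = ["Headlines", "Report", "Interview", "Report", "Credits"]
print(check_containment("SportsMagazine", program))          # []
print(check_succession(program))                             # True
print(check_succession(["Report", "Headlines", "Credits"]))  # False
```

Containment alone cannot reject a program whose credits come first; the succession grammar catches ordering errors, which is why the two models complement each other.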
AVDL Implementation
• XML Serialization
– Independent of any schema language
– Uses XML Schema validation (mainly for datatypes)
• C#
– Object inheritance
– Use of .NET reflection
XML Serialization
[Figure: avdl.xsd (the Audio-Visual Description Language schema) provides partial control over ds-17.xml (Description Schemes); ds-17.xml is transformed into ds-17.xsd, which provides partial control over d-162.xml (Descriptions).]
XML Syntax (DS)
<Descriptor xsi:type="LocatedDescriptorType" id="id-d2" name="Tracking">
  <Property ref="id-p2"/>
  <Structure ref="id-s2"/>
  <DescriptionRelationship characterization="string">
    <Location type="TemporalInterval"/>
    <Media type="Media"/>
  </DescriptionRelationship>
</Descriptor>

<Property id="id-p2" name="nbDetection">
  <Domain descriptor="id-d2"/>
  <Range>
    <Primitive nameType="int"/>
  </Range>
</Property>

<Structure id="id-s2" name="TrackingStructure">
  <FormalModel>
    <Constraint type="temporal" validation="full" method="system" parser="XMLSchema">
      <xsd:sequence minOccurs="0" maxOccurs="unbounded">
        <xsd:element name="Detection" type="DetectionType"/>
      </xsd:sequence>
    </Constraint>
  </FormalModel>
</Structure>
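In the DS syntax above, a descriptor points to its property and structure by `ref`, and the property points back to its domain. As a minimal sketch of resolving those references, the code below reads an abridged copy of the three definitions with Python's standard `xml.etree.ElementTree`. The `<DescriptionScheme>` root element is a hypothetical wrapper added only so the fragment is a well-formed document, and the `DescriptionRelationship` and `FormalModel` parts are omitted.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Abridged version of the slide's definitions, wrapped in a hypothetical
# <DescriptionScheme> root so that the fragment is a well-formed document.
DS_XML = f"""
<DescriptionScheme xmlns:xsi="{XSI}">
  <Descriptor xsi:type="LocatedDescriptorType" id="id-d2" name="Tracking">
    <Property ref="id-p2"/>
    <Structure ref="id-s2"/>
  </Descriptor>
  <Property id="id-p2" name="nbDetection">
    <Domain descriptor="id-d2"/>
    <Range><Primitive nameType="int"/></Range>
  </Property>
  <Structure id="id-s2" name="TrackingStructure"/>
</DescriptionScheme>
"""


def summarize(ds_xml: str) -> dict:
    """Resolve each descriptor's id references to property and structure definitions."""
    root = ET.fromstring(ds_xml.strip())
    by_id = {el.get("id"): el for el in root if el.get("id")}
    summary = {}
    for desc in root.findall("Descriptor"):
        props = []
        for ref in desc.findall("Property"):
            prop = by_id[ref.get("ref")]
            props.append((prop.get("name"), prop.find("Range/Primitive").get("nameType")))
        structures = [by_id[s.get("ref")].get("name") for s in desc.findall("Structure")]
        summary[desc.get("name")] = {
            "type": desc.get(f"{{{XSI}}}type"),  # namespaced attribute xsi:type
            "properties": props,
            "structures": structures,
        }
    return summary


print(summarize(DS_XML))
# {'Tracking': {'type': 'LocatedDescriptorType', 'properties': [('nbDetection', 'int')], 'structures': ['TrackingStructure']}}
```

ElementTree exposes `xsi:type` under its expanded `{namespace}type` name, which is why the lookup spells out the full namespace URI.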
XML Syntax (Descriptions)
<Tracking type="LocatedDescriptorType" nbDetection="1">
  <DescriptionRelationship>
    <Location>
      <avdl:Begin timeRef="147329280"/><avdl:End timeRef="147329280"/>
    </Location>
    <Media id="CPB86006610.mpg" name="CPB86006610.mpg" contentID="CPB86006610.mpg"/>
  </DescriptionRelationship>
  <Structure constraintType="temporal">
    <Detection type="LocatedDescriptorType" nbFeature="1">
      <DescriptionRelationship>
        <Location>
          <avdl:Instant timeRef="147329280"/>
        </Location>
        <Media id="CPB86006610.mpg" name="CPB86006610.mpg"
               contentID="CPB86006610.mpg" frameHeight="288" frameWidth="352"/>
      </DescriptionRelationship>
      <Structure constraintType="spatial">
        <Feature xsi:type="FaceType">
          <DescriptionRelationship>
            <Location>
              <avdl:BoundingBox>
                <avdl:NE numX="92" denX="352" numY="217" denY="288"/>
                <avdl:NW numX="92" denX="352" numY="267" denY="288"/>
                <avdl:SE numX="136" denX="352" numY="217" denY="288"/>
                <avdl:SW numX="136" denX="352" numY="267" denY="288"/>
              </avdl:BoundingBox>
            </Location>
            ...
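The bounding-box corners above are stored as fractions (`numX/denX`, `numY/denY`) of the frame size, and the denominators match the `frameWidth="352"` and `frameHeight="288"` of the `Media` element. A minimal sketch of turning them back into pixel coordinates follows; the namespace URI is a hypothetical placeholder because the slides do not give one, and the corner labels are reported as they appear rather than interpreted.

```python
from fractions import Fraction
import xml.etree.ElementTree as ET

AVDL = "urn:example:avdl"  # hypothetical namespace URI; the slides do not give one

BOX_XML = f"""<avdl:BoundingBox xmlns:avdl="{AVDL}">
  <avdl:NE numX="92" denX="352" numY="217" denY="288"/>
  <avdl:NW numX="92" denX="352" numY="267" denY="288"/>
  <avdl:SE numX="136" denX="352" numY="217" denY="288"/>
  <avdl:SW numX="136" denX="352" numY="267" denY="288"/>
</avdl:BoundingBox>"""


def corners(box_xml: str) -> dict:
    """Return each corner as an (x, y) pair of exact fractions of the frame size."""
    box = ET.fromstring(box_xml)
    result = {}
    for corner in box:
        name = corner.tag.split("}")[1]  # strip the namespace: '{uri}NE' -> 'NE'
        x = Fraction(int(corner.get("numX")), int(corner.get("denX")))
        y = Fraction(int(corner.get("numY")), int(corner.get("denY")))
        result[name] = (x, y)
    return result


def to_pixels(rel: dict, width: int, height: int) -> dict:
    """Scale the relative corners to a frame of the given size in pixels."""
    return {name: (x * width, y * height) for name, (x, y) in rel.items()}


pixels = to_pixels(corners(BOX_XML), width=352, height=288)
xs = [x for x, _ in pixels.values()]
ys = [y for _, y in pixels.values()]
print(min(xs), max(xs), min(ys), max(ys))  # 92 136 217 267
```

Using exact fractions avoids rounding, and the same description presumably scales to another rendition of the content: at 704×576 the `SW` corner maps to (272, 534).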
.NET implementation
[Figure: parsing the Description Schemes (ds-17.xml) yields a .NET library (ds-17.dll); parsing the Descriptions (d-162.xml) leads to .NET instantiation of in-memory objects, with read/write access.]
Two kinds of applications
• Static Description Schemes
– DSs are known in advance
– The developer uses generated libraries
• Dynamic Description Schemes
– DSs are created by the application
– Use of the dynamic instantiation mechanism (reflection) of .NET
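The dynamic case relies on creating descriptor classes at run time. The slides do this with .NET reflection in C#; as a rough analogue only, the sketch below uses Python's built-in three-argument `type(name, bases, namespace)` to build a class from a property table. `Descriptor` and `define_descriptor` are hypothetical names, not the authors' API, and the `Tracking`/`nbDetection` pair comes from the DS example.

```python
from typing import Dict


class Descriptor:
    """Hypothetical common base class for generated or dynamically created descriptors."""
    properties: Dict[str, type] = {}

    def __init__(self, **values):
        for name, value in values.items():
            expected = self.properties.get(name)
            if expected is None:
                raise AttributeError(f"{type(self).__name__} has no property {name!r}")
            if not isinstance(value, expected):
                raise TypeError(f"{name} must be of type {expected.__name__}")
            setattr(self, name, value)


def define_descriptor(name: str, properties: Dict[str, type],
                      base: type = Descriptor) -> type:
    """Create a descriptor class at run time; subclasses inherit the base's properties."""
    merged = {**base.properties, **properties}
    return type(name, (base,), {"properties": merged})


# Dynamic case: the class is built from a DS discovered while the program runs.
Tracking = define_descriptor("Tracking", {"nbDetection": int})
t = Tracking(nbDetection=1)
print(type(t).__name__, t.nbDetection, isinstance(t, Descriptor))  # Tracking 1 True
```

In the static case the same classes would be written or generated ahead of time (the role of a library such as ds-17.dll); the dynamic case trades compile-time checking for the ability to handle DSs the developer never saw.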
Carrying out the scenario
• Definition of new descriptors and properties
– associating behavior with the corresponding classes
– reasoning over the descriptions using their formal definitions in OWL
• Definition of logical and temporal structures
– the description is controlled and validated by a grammar
Conclusion and Future Work
• AVDL: a reduced yet extensible Audio-Visual
Description Language
– descriptors, properties, structures
– XML syntax and DL semantics
– .NET implementation and APIs
• About structure validation:
– which constructors should be used? with which semantics?
• Trade-off between expressivity and computability
– OWL Full is undecidable
– constraint satisfaction problems can be computationally hard