A Reduced Yet Extensible AudioVisual Description Language: How to Escape From The MPEG-7 Bottleneck Raphaël Troncy, Jean Carrive Thursday 28th of October, 2004

A Reduced Yet Extensible AudioVisual Description Language:
How to Escape From The MPEG-7 Bottleneck
Raphaël Troncy, Jean Carrive
Thursday 28th of October, 2004
Description of the AV content
• Various uses / different granularities:
– identification of the content creator and the content provider: Dublin Core metadata, VRA Core Categories, TV-Anytime metadata …
– feature extraction from the audio/video signal: storing and exchanging the results of automatic tools (MPEG-7)
– structural decomposition into video segments corresponding to a logical structure of the program: time-code, spatial coordinates
– semantic description of these segments: controlled vocabulary, thesaurus, free text annotation
10/28/2004
ACM DocEng'04 - Raphaël Troncy
1
Description of the AV content
(cultural heritage point of view)
• Segmentation
– locate and date some events (along the time axis t)
• Description
– type each segment with an AV genre (e.g. report)
– type each segment with a general theme (e.g. athletics)
– give hints on the production (e.g. fade in/out)
– describe the scene (who, when, where, what, …), e.g. "Michael Johnson smashed the 200 m world record, running it in 19.32 s in Atlanta at the Olympic Games"
⇒ needs a powerful description language
MPEG-7, the natural candidate description language?
• ISO standard since December 2001
• Main components:
– Descriptors (Ds) and Description Schemes (DSs)
– DDL (XML Schema + extensions)
• Concerns all types of media
• XML syntax
[Figure: overview of MPEG-7 Part 5 - MDS, grouping its tools into Basic elements (Schema tools, Basic datatypes, Links & media localization, Basic tools), Content management (Media, Creation & Production, Usage), Content description (Structural aspects, Semantic aspects), Navigation & Access (Summaries, Views, Variations), Content organization (Collections, Models) and User interaction (User Preferences, User History)]
MPEG-7: an ineffective description language for intelligent access to AV content
1. A non-extensible language
• closed set of descriptors
2. An exchange syntax rather than a real machine-processable multimedia description language
• non object-based data model
• non-modular language (universal approach)
3. No formal semantics provided
• applications cannot access the meaning of the documents
⇒ is the DDL (XML Schema) at fault?
Motivating scenario
• Generic application for manually describing TV programs w.r.t.:
– structural constraints: patterns represent the logical structure of a document
– semantic constraints: the description of the content is machine-understandable
• Let us define the temporal structure of a Sports Magazine
MPEG-7 cannot carry out this scenario
• How to define new descriptors?
• How to define new description schemes?
• How to make the description machine-understandable?
⇒ the critical issue: how to reconcile object-oriented semantic expression with structural validation?
Our proposition: AVDL
• AVDL: a reduced yet extensible audio-visual description language
– an object meta-model (an instance model specifies the vocabulary for, and the rules followed by, the descriptions)
– an XML syntax
– a semantics (close to description logics (DL) for the descriptors)
• Description Schemes
– Descriptors
– Properties
– Structures
• Descriptions
– valid instances w.r.t. description schemes
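The split between description schemes (the vocabulary) and descriptions (valid instances) can be sketched roughly as follows. This is only an illustrative Python model, not the AVDL implementation: the class names, the `is_valid` method and the use of Python types as property ranges are all assumptions made for the example, and structures are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Property:
    """A typed attribute attached to a descriptor (hypothetical model)."""
    name: str
    range_type: type  # primitive range of the property, e.g. int


@dataclass
class Descriptor:
    """A descriptor declared in a description scheme."""
    name: str
    properties: dict = field(default_factory=dict)

    def add_property(self, prop: Property) -> None:
        self.properties[prop.name] = prop


@dataclass
class DescriptionScheme:
    """The vocabulary (descriptors) that descriptions are allowed to use."""
    descriptors: dict = field(default_factory=dict)

    def add_descriptor(self, descriptor: Descriptor) -> None:
        self.descriptors[descriptor.name] = descriptor

    def is_valid(self, descriptor_name: str, values: dict) -> bool:
        """Check one description instance against the scheme."""
        descriptor = self.descriptors.get(descriptor_name)
        if descriptor is None:
            return False  # descriptor is not part of the vocabulary
        for prop_name, value in values.items():
            prop = descriptor.properties.get(prop_name)
            if prop is None or not isinstance(value, prop.range_type):
                return False  # unknown property or value outside its range
        return True


# Scheme with one descriptor, "Tracking", carrying an integer property
scheme = DescriptionScheme()
tracking = Descriptor("Tracking")
tracking.add_property(Property("nbDetection", int))
scheme.add_descriptor(tracking)
```

With this sketch, `scheme.is_valid("Tracking", {"nbDetection": 1})` accepts the instance, while an unknown descriptor or a wrongly typed value is rejected.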
The meta class level
The class level
Location
Document, Content and Media
• Distinction:
– Document vs Content vs Media
– Virtual content vs physical content
• Media: a content abstraction for decomposition
– audio tracks, subtitles
Defining Structures
• A structure defines how the descriptors may, and must, be combined
– allows descriptions to be controlled
– allows descriptions to be completed automatically
• AVDL provides some predefined structure models
– containment: gives the list of the possible sub-segments of an AV segment (in space and in time)
– regular expression: by analogy with a grammar, for temporal succession
• Other models are currently being studied: temporal constraints, etc.
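The regular-expression structure model can be illustrated with a small sketch. The segment types and the grammar below are hypothetical (the slides do not give the actual structure of a sports magazine), and Python's `re` module stands in for whatever parser AVDL really uses.

```python
import re

# Hypothetical segment types for a sports magazine, each mapped to one letter
SYMBOLS = {"Opening": "O", "Report": "R", "Interview": "I", "Closing": "C"}

# Assumed grammar: an opening, one or more reports or interviews, a closing
SPORTS_MAGAZINE = re.compile(r"O[RI]+C")


def is_valid_sequence(segments):
    """Return True if the temporal succession of segments matches the grammar."""
    try:
        word = "".join(SYMBOLS[s] for s in segments)
    except KeyError:
        return False  # a segment type outside the vocabulary
    return SPORTS_MAGAZINE.fullmatch(word) is not None
```

Encoding each segment type as one symbol keeps the check a plain regular-expression match. A description whose segments come in the wrong order, or use an undeclared type, is rejected.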
AVDL Implementation
• XML Serialization
– Independent from a schema language
– Uses XML Schema validation (mainly for datatypes)
• C#
– Object inheritance
– Use of .NET reflection
XML Serialization
[Figure: XML serialization. Labels: avdl.xsd (Audio-Visual Description Language); ds-17.xml (Description Schemes); ds-17.xsd; d-162.xml (Descriptions); arrows labelled "partial control" (twice) and "transformation"]
XML Syntax (DS)
<Descriptor xsi:type="LocatedDescriptorType" id="id-d2" name="Tracking">
  <Property ref="id-p2"/>
  <Structure ref="id-s2"/>
  <DescriptionRelationship characterization="string">
    <Location type="TemporalInterval"/>
    <Media type="Media"/>
  </DescriptionRelationship>
</Descriptor>

<Property id="id-p2" name="nbDetection">
  <Domain descriptor="id-d2"/>
  <Range>
    <Primitive nameType="int"/>
  </Range>
</Property>

<Structure id="id-s2" name="TrackingStructure">
  <FormalModel>
    <Constraint type="temporal" validation="full" method="system" parser="XMLSchema">
      <xsd:sequence minOccurs="0" maxOccurs="unbounded">
        <xsd:element name="Detection" type="DetectionType"/>
      </xsd:sequence>
    </Constraint>
  </FormalModel>
</Structure>
XML Syntax (Descriptions)
<Tracking type="LocatedDescriptorType" nbDetection="1">
  <DescriptionRelationship>
    <Location>
      <avdl:Begin timeRef="147329280"/><avdl:End timeRef="147329280"/>
    </Location>
    <Media id="CPB86006610.mpg" name="CPB86006610.mpg" contentID="CPB86006610.mpg"/>
  </DescriptionRelationship>
  <Structure constraintType="temporal">
    <Detection type="LocatedDescriptorType" nbFeature="1">
      <DescriptionRelationship>
        <Location>
          <avdl:Instant timeRef="147329280"/>
        </Location>
        <Media id="CPB86006610.mpg" name="CPB86006610.mpg"
               contentID="CPB86006610.mpg" frameHeight="288" frameWidth="352"/>
      </DescriptionRelationship>
      <Structure constraintType="spatial">
        <Feature xsi:type="FaceType">
          <DescriptionRelationship>
            <Location>
              <avdl:BoundingBox>
                <avdl:NE numX="92" denX="352" numY="217" denY="288"/>
                <avdl:NW numX="92" denX="352" numY="267" denY="288"/>
                <avdl:SE numX="136" denX="352" numY="217" denY="288"/>
                <avdl:SW numX="136" denX="352" numY="267" denY="288"/>
              </avdl:BoundingBox>
            </Location>
...
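The bounding-box corners in the listing above appear to be stored as rational coordinates: the denominators match the frame size given on the Media element (352 by 288). Under that assumption, a consumer could read them back as sketched below. The namespace URI is hypothetical, since the slides do not show the real one, and the fragment is wrapped on its own so that it parses standalone.

```python
import xml.etree.ElementTree as ET
from fractions import Fraction

# Hypothetical namespace URI; the slides do not show the real one
AVDL_NS = "urn:example:avdl"

FRAGMENT = f"""
<avdl:BoundingBox xmlns:avdl="{AVDL_NS}">
  <avdl:NE numX="92" denX="352" numY="217" denY="288"/>
  <avdl:NW numX="92" denX="352" numY="267" denY="288"/>
  <avdl:SE numX="136" denX="352" numY="217" denY="288"/>
  <avdl:SW numX="136" denX="352" numY="267" denY="288"/>
</avdl:BoundingBox>
"""


def box_extent(xml_text):
    """Return (min_x, min_y, max_x, max_y) as exact fractions of the frame."""
    root = ET.fromstring(xml_text)
    xs, ys = [], []
    for corner in root:
        xs.append(Fraction(int(corner.get("numX")), int(corner.get("denX"))))
        ys.append(Fraction(int(corner.get("numY")), int(corner.get("denY"))))
    return min(xs), min(ys), max(xs), max(ys)


min_x, min_y, max_x, max_y = box_extent(FRAGMENT)

# Scale the relative extent back to pixels for a 352x288 frame
width_px = (max_x - min_x) * 352
height_px = (max_y - min_y) * 288
```

Taking the minimum and maximum over all four corners avoids depending on how the corner names map to image axes. Keeping the values as fractions means the box stays valid if the same content is rendered at another resolution.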
Carrying out the scenario
• Definition of new descriptors and properties
– associating behavior with the corresponding classes
– performing reasoning on the descriptions with the formal definitions in OWL
• Definition of logical and temporal structures
– the description is controlled and validated by a grammar
Conclusion and Future Work
• AVDL: a reduced yet extensible Audio-Visual Description Language
– descriptors, properties, structures
– XML syntax and DL semantics
– .NET implementation and APIs
• About structure validation:
– which constructors should be used? with which semantics?
• Trade-off: expressivity vs computability
– OWL Full is undecidable
– constraint satisfaction problems can be complex
Complements
.NET implementation
[Figure: .NET implementation. Labels: ds-17.xml (Description Schemes); ds-17.dll; d-162.xml (Descriptions); Memory (.NET); arrows labelled "parsing" (twice), "instantiation" and "read/write"]
Two kinds of applications
• Static Description Schemes
– DSs are known in advance
– The developer uses generated libraries
• Dynamic Description Schemes
– DSs are created by the application
– Use of the dynamic instantiation mechanism (reflection) of .NET
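The dynamic case can be sketched by analogy in Python, whose built-in `type()` creates classes at run time. This is not the .NET mechanism used by the authors. The helper `make_descriptor_class` and its property table are hypothetical names introduced only for this illustration.

```python
def make_descriptor_class(name, property_types):
    """Create a descriptor class at run time from its name and properties."""

    def __init__(self, **values):
        for prop, value in values.items():
            expected = property_types.get(prop)
            if expected is None:
                raise AttributeError(f"{name} has no property {prop!r}")
            if not isinstance(value, expected):
                raise TypeError(f"{prop} must be of type {expected.__name__}")
            setattr(self, prop, value)

    # type(name, bases, namespace) builds a new class object dynamically
    return type(name, (object,), {"__init__": __init__, "properties": property_types})


# A scheme built by the application rather than compiled in advance
Tracking = make_descriptor_class("Tracking", {"nbDetection": int})
instance = Tracking(nbDetection=1)
```

In the static case the equivalent class would simply be written, or generated, ahead of time and imported from a library.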