A Reduced Yet Extensible AudioVisual Description Language: How to Escape From The MPEG-7 Bottleneck
Raphaël Troncy, Jean Carrive
Thursday 28th of October, 2004

Description of the AV content
• Various uses, different granularities:
  – identification of the content creator and the content provider: Dublin Core metadata, VRA Core categories, TV-Anytime metadata, …
  – feature extraction from the audio/video signal: storing and exchanging the results of automatic tools (MPEG-7)
  – structural decomposition into video segments corresponding to the logical structure of the program: time codes, spatial coordinates
  – semantic description of these segments: controlled vocabularies, thesauri, free-text annotation
10/28/2004 ACM DocEng'04 - Raphaël Troncy

Description of the AV content (cultural heritage point of view)
• Segmentation
  – locate and date events along the timeline (time t)
• Description
  – type each segment with an AV genre
  – type each segment with a general theme
  – give hints on the production
  – describe the scene (who, when, where, what, …)
• Example: a report (genre) on athletics (theme) with a fade in/out (production), whose scene is described as "Michael Johnson smashed the 200 m world record, running the 200 m in 19.32 s in Atlanta at the Olympic Games"
⇒ This requires a powerful description language

MPEG-7, the natural candidate description language?
• ISO standard since December 2001
• Main components:
  – Descriptors (Ds) and Description Schemes (DSs)
  – DDL (XML Schema + extensions)
• Covers all types of media
• XML syntax
[Figure: overview of MPEG-7 Part 5 (MDS): basic elements (schema tools, basic datatypes, links & media localization, basic tools); content management (media, creation & production, usage); content description (structural aspects, semantic aspects); navigation & access (summaries, views, variations); content organization (collections, models); user interaction (user preferences, user history)]

MPEG-7: an ineffective description language for intelligent access to AV content
1. A non-extensible language
   • closed set of descriptors
2. An exchange syntax rather than a truly machine-processable multimedia description language
   • non-object-based data model
   • non-modular language (universal approach)
3. No formal semantics
   • applications cannot access the meaning of the documents
⇒ Is the DDL (XML Schema) to blame?

Motivating scenario
• A generic application for manually describing TV programs with respect to:
  – structural constraints: patterns represent the logical structure of a document
  – semantic constraints: the description of the content is machine-understandable
• Let us define the temporal structure of a Sports Magazine

MPEG-7 cannot carry out this scenario
• How can new descriptors be defined?
• How can new description schemes be defined?
• How can the description be made machine-understandable?
⇒ The critical issue: how to reconcile object-oriented semantic expression with structural validation

Our proposal: AVDL
• AVDL: a reduced yet extensible audio-visual description language
  – an object meta-model (an instance model specifies the vocabulary and the rules that the descriptions must follow)
  – an XML syntax
  – a semantics (close to Description Logics for the descriptors)
• Description Schemes
  – Descriptors
  – Properties
  – Structures
• Descriptions
  – instances that are valid with respect to the description schemes

The meta-class level

The class level

Location

Document, Content and Media
• Distinctions:
  – document vs. content vs. media
  – virtual content vs. physical content
• Media: a content abstraction used for decomposition
  – e.g., audio tracks, subtitles

Defining Structures
• A structure defines how the descriptors may and must be combined
  – allows the descriptions to be controlled
  – allows automatic
    completion of the descriptions
• AVDL provides some predefined structure models
  – containment: gives the list of possible sub-segments of an AV segment (in space and in time)
  – regular expression: analogous to a grammar, for temporal succession
• Other models are currently being studied: temporal constraints, etc.

AVDL Implementation
• XML serialization
  – independent of any schema language
  – uses XML Schema validation (mainly for datatypes)
• C#
  – object inheritance
  – uses .NET reflection

XML Serialization
[Diagram: avdl.xsd (Audio-Visual Description Language) → partial control → ds-17.xml (Description Schemes) → transformation → ds-17.xsd → partial control → d-162.xml (Descriptions)]

XML Syntax (DS)
  <Descriptor xsi:type="LocatedDescriptorType" id="id-d2" name="Tracking">
    <Property ref="id-p2"/>
    <Structure ref="id-s2"/>
    <DescriptionRelationship characterization="string">
      <Location type="TemporalInterval"/>
      <Media type="Media"/>
    </DescriptionRelationship>
  </Descriptor>

  <Property id="id-p2" name="nbDetection">
    <Domain descriptor="id-d2"/>
    <Range>
      <Primitive nameType="int"/>
    </Range>
  </Property>

  <Structure id="id-s2" name="TrackingStructure">
    <FormalModel>
      <Constraint type="temporal" validation="full" method="system" parser="XMLSchema">
        <xsd:sequence minOccurs="0" maxOccurs="unbounded">
          <xsd:element name="Detection" type="DetectionType"/>
        </xsd:sequence>
      </Constraint>
    </FormalModel>
  </Structure>

XML Syntax (Descriptions)
  <Tracking type="LocatedDescriptorType" nbDetection="1">
    <DescriptionRelationship>
      <Location>
        <avdl:Begin timeRef="147329280"/><avdl:End timeRef="147329280"/>
      </Location>
      <Media id="CPB86006610.mpg" name="CPB86006610.mpg"
             contentID="CPB86006610.mpg"/>
    </DescriptionRelationship>
    <Structure constraintType="temporal">
      <Detection type="LocatedDescriptorType" nbFeature="1">
        <DescriptionRelationship>
          <Location>
            <avdl:Instant timeRef="147329280"/>
          </Location>
          <Media id="CPB86006610.mpg" name="CPB86006610.mpg"
                 contentID="CPB86006610.mpg" frameHeight="288" frameWidth="352"/>
        </DescriptionRelationship>
        <Structure constraintType="spatial">
          <Feature xsi:type="FaceType">
            <DescriptionRelationship>
              <Location>
                <avdl:BoundingBox>
                  <avdl:NE numX="92" denX="352" numY="217" denY="288"/>
                  <avdl:NW numX="92" denX="352" numY="267" denY="288"/>
                  <avdl:SE numX="136" denX="352" numY="217" denY="288"/>
                  <avdl:SW numX="136" denX="352" numY="267" denY="288"/>
                </avdl:BoundingBox>
              </Location>
              ...

Carrying out the scenario
• Definition of new descriptors and properties
  – associating behavior with the corresponding classes
  – reasoning over the descriptions using the formal definitions in OWL
• Definition of logical and temporal structures
  – the description is controlled and validated by a grammar

Conclusion and Future Work
• AVDL: a reduced yet extensible Audio-Visual Description Language
  – descriptors, properties, structures
  – XML syntax and DL semantics
  – .NET implementation and APIs
• About structure validation:
  – which constructors should be used? with which semantics?
• Trade-off between expressivity and computability
  – OWL Full is undecidable
  – constraint satisfaction problems can be complex

Complements

.NET implementation
[Diagram: parsing of ds-17.xml (Description Schemes) and d-162.xml (Descriptions); ds-17.dll; .NET instantiation in memory; read/write]

Two kinds of applications
• Static Description Schemes
  – the DSs are known in advance
  – the developer uses generated libraries
• Dynamic Description Schemes
  – the DSs are created by the application
  – uses the .NET dynamic instantiation mechanism (reflection)
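
The "Defining Structures" slide mentions a regular-expression structure model that constrains the temporal succession of sub-segments, by analogy with a grammar. The following is a minimal Python sketch of that idea, not AVDL's actual syntax: it assumes each sub-segment is reduced to a type label, and the labels and the Sports Magazine pattern (Opening, Report, Interview, Closing) are hypothetical.

```python
import re

# Hypothetical temporal structure for a Sports Magazine: an opening segment,
# any number of reports or interviews, then a closing segment.
# Labels are separated by single spaces in the pattern.
SPORTS_MAGAZINE = r"Opening (?:(?:Report|Interview) )*Closing"


def conforms(segment_types, pattern):
    """Return True if the ordered sub-segment types match the structure pattern.

    The labels are joined with single spaces, so the regular expression is
    matched against whole labels in temporal order.
    """
    sequence = " ".join(segment_types)
    return re.fullmatch(pattern, sequence) is not None


print(conforms(["Opening", "Report", "Interview", "Closing"], SPORTS_MAGAZINE))  # True
print(conforms(["Report", "Opening", "Closing"], SPORTS_MAGAZINE))              # False
```

Matching on whole space-separated labels keeps the pattern readable as a grammar over segment types; a real validator would more likely work on the parsed segment tree directly.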
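
The containment model from the "Defining Structures" slide lists, for each kind of AV segment, which sub-segments it may contain in time and in space. A possible Python sketch, assuming hypothetical rules inspired by the Tracking example (a Tracking contains Detections in time, a Detection contains Faces in space) and a hypothetical `Segment` class:

```python
from dataclasses import dataclass, field

# Hypothetical containment rules, loosely based on the Tracking example:
# (parent kind, constraint type) -> set of allowed child kinds.
CONTAINMENT = {
    ("Tracking", "temporal"): {"Detection"},
    ("Detection", "spatial"): {"Face"},
}


@dataclass
class Segment:
    kind: str
    # Child segments grouped by constraint type ("temporal" or "spatial").
    children: dict = field(default_factory=dict)


def violations(segment):
    """Yield (parent, constraint, child) triples forbidden by the containment rules."""
    for constraint, kids in segment.children.items():
        allowed = CONTAINMENT.get((segment.kind, constraint), set())
        for kid in kids:
            if kid.kind not in allowed:
                yield (segment.kind, constraint, kid.kind)
            # Recurse so that nested segments are checked as well.
            yield from violations(kid)


tracking = Segment("Tracking", {"temporal": [Segment("Detection", {"spatial": [Segment("Face")]})]})
print(list(violations(tracking)))  # []
```

Returning the offending triples instead of a single boolean is one way to give the "description control" the slide mentions: an editor could point at the exact sub-segment that breaks the structure.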
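
In the "XML Syntax (Descriptions)" excerpt, each bounding-box corner is written as a pair of fractions (numX/denX, numY/denY) whose denominators equal the frame width and height (352 and 288). The sketch below assumes that each coordinate is the fraction num/den of the frame size and converts the box to pixels with Python's standard `xml.etree.ElementTree`. The namespace URI is a placeholder, because the slide does not show the `xmlns:avdl` declaration.

```python
import xml.etree.ElementTree as ET
from fractions import Fraction

# Placeholder namespace URI: the slide does not show the xmlns:avdl declaration.
AVDL_NS = "urn:example:avdl"

FRAGMENT = f"""
<avdl:BoundingBox xmlns:avdl="{AVDL_NS}">
  <avdl:NE numX="92" denX="352" numY="217" denY="288"/>
  <avdl:NW numX="92" denX="352" numY="267" denY="288"/>
  <avdl:SE numX="136" denX="352" numY="217" denY="288"/>
  <avdl:SW numX="136" denX="352" numY="267" denY="288"/>
</avdl:BoundingBox>
"""


def pixel_box(bbox, frame_width, frame_height):
    """Return (x_min, y_min, x_max, y_max) in pixels.

    Assumes each corner coordinate is the fraction num/den of the frame size.
    Taking min/max over all corners avoids relying on the corner labels.
    """
    xs, ys = [], []
    for corner in bbox:  # iterating an Element yields its child elements
        fx = Fraction(int(corner.get("numX")), int(corner.get("denX")))
        fy = Fraction(int(corner.get("numY")), int(corner.get("denY")))
        xs.append(fx * frame_width)
        ys.append(fy * frame_height)
    return tuple(round(v) for v in (min(xs), min(ys), max(xs), max(ys)))


bbox = ET.fromstring(FRAGMENT.strip())
print(pixel_box(bbox, frame_width=352, frame_height=288))  # (92, 217, 136, 267)
```

Storing fractions rather than absolute pixels would keep a description valid if the same content is re-encoded at another resolution; calling `pixel_box(bbox, 704, 576)` simply doubles every coordinate.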
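
The "Two kinds of applications" slide contrasts static DSs, used through generated libraries, with dynamic DSs, which the application creates at run time using .NET reflection. As a rough analogue only (Python's `type()` rather than the .NET API), this sketch builds descriptor classes at run time from a hypothetical in-memory DS that loosely mirrors the "XML Syntax (DS)" excerpt: a `Tracking` descriptor whose `nbDetection` property has an `int` range. The DS dictionary format and `make_descriptor_class` are illustrative assumptions.

```python
# Hypothetical in-memory form of a description scheme, loosely mirroring the
# DS excerpt: descriptor name -> {property name: primitive range type}.
DS = {
    "Tracking": {"nbDetection": int},
    "Detection": {"nbFeature": int},
}


def make_descriptor_class(name, properties):
    """Create a descriptor class at run time whose __init__ checks property ranges."""

    def __init__(self, **values):
        for prop, value in values.items():
            if prop not in properties:
                raise AttributeError(f"{name} has no property {prop!r}")
            # Coerce the (string) XML attribute value to the declared primitive type;
            # int("abc") raises ValueError for out-of-range values.
            setattr(self, prop, properties[prop](value))

    return type(name, (object,), {"__init__": __init__, "properties": dict(properties)})


# Dynamic instantiation: no class is written by hand for any descriptor.
classes = {name: make_descriptor_class(name, props) for name, props in DS.items()}

tracking = classes["Tracking"](nbDetection="1")
print(type(tracking).__name__, tracking.nbDetection)  # Tracking 1
```

For a static DS, the same classes would instead be generated once and shipped as a library, which is what the slide's "generated libraries" describes.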