A Motivating Scenario for Designing an Extensible AudioVisual Description Language
Raphaël Troncy, Jean Carrive, Steffen Lalande and Jean-Philippe Poli
Monday 25th of October, 2004

Description of the AV content
• Various uses / different granularity:
  – identification of the content creator and the content provider: Dublin Core metadata, VRA Core categories, TV-Anytime metadata, …
  – feature extraction from the video signal: storing and exchanging the results of automatic tools (MPEG-7)
  – structural decomposition into video segments corresponding to the logical structure of the program: time-code, spatial coordinates
  – semantic description of these segments: controlled vocabulary, thesaurus, free-text annotation

Raphaël Troncy CoRIMedia - 10/25/2004

Description of the AV content (cultural heritage point of view)
• Segmentation
  – locate and date some events
• Description
  – type each segment with an AV genre
  – type each segment with a general theme
  – give hints on the production
  – describe the scene (who, when, where, what, …)
[Figure: timeline (time t) with example annotations: "report", "fade in/out", "athletics", "Michael Johnson smashed the 200m world record to complete a 200m in 19''32 in Atlanta for the Olympic Games"]
⇒ needs a powerful description language

Motivating scenario
• Generic application for manually describing TV programs w.r.t.:
  – structural constraints: patterns represent the logical structure of a document
  – semantic constraints: the description of the content is machine understandable
• Let us define the temporal structure of a Sports Magazine

MPEG-7, the natural candidate description language?
• ISO standard since December of 2001
• Main components:
  – Descriptors (Ds) and Description Schemes (DSs)
  – DDL (XML Schema + extensions)
• Concerns all types of media
[Figure: overview of MPEG-7 Part 5 (MDS): Basic elements (Schema Tools, Basic datatypes, Links & media localization, Basic Tools); Content management (Creation & Production, Media, Usage); Content description (Structural aspects, Semantic aspects); Navigation & Access (Summaries, Views, Variations); Content organization (Collections, Models); User interaction (User Preferences, User History)]

MPEG-7: an unsuitable description language for this scenario
1. A non-extensible language
  • closed set of descriptors
2. An exchange syntax rather than a real machine-processable multimedia description language
  • non-object-based data model
  • non-modular language (universal approach)
3. No formal semantics provided
  • applications cannot access the meaning of the documents
⇒ is the DDL (XML Schema) at fault?

MPEG-7: an unsuitable description language for this scenario
• How to define new descriptors?
• How to define new description schemes?
• How to make the description machine understandable?
⇒ the critical issue: how to reconcile object-oriented semantic expression with structural validation?

Our proposition: AVDL
• AVDL: a reduced yet extensible audio-visual description language
  – an object meta-model (an instance model specifies the vocabulary for, and the rules followed by, the descriptions)
  – an XML syntax
  – a semantics (close to description logics (DL) for the descriptors)
• Description Schemes
  – Descriptors
  – Properties
  – Structures
• Descriptions
  – valid instances w.r.t. description schemes

The meta class level

The class level

Location

Document, Content and Media
• Distinction:
  – Document vs Content vs Media
  – Virtual content vs physical content
• Media: a content abstraction for decomposition
  – audio tracks, subtitles

Defining Structures
• A structure defines how the descriptors may, and have to, be combined
  – allows the descriptions to be controlled
  – allows automatic completion of the descriptions
• AVDL provides some predefined structure models
  – containment: gives the list of the possible sub-segments of an AV segment (in space and in time)
  – regular expression: by analogy with a grammar, for temporal succession
• Other models are currently being studied: temporal constraints, etc.
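
The "regular expression" structure model can be illustrated with a minimal sketch. It is written in Python for brevity (the slides describe a C# implementation) and is not AVDL's actual mechanism: the segment names (Opening, Studio, Report, Interview, Closing), the pattern syntax and the helper functions compile_structure and is_valid are all assumptions made for this example. The idea is to treat the temporal succession of segments as a word and check it against a grammar-like pattern.

```python
import re

def compile_structure(pattern: str) -> re.Pattern:
    """Compile a pattern over segment names into a regex over a token string.

    Every segment name NAME is rewritten to the group (?:NAME;) so that
    quantifiers such as + and * apply to whole segments, not to characters.
    """
    rewritten = re.sub(r"[A-Za-z_]\w*", lambda m: f"(?:{m.group(0)};)", pattern)
    return re.compile(rewritten.replace(" ", ""))

def is_valid(structure: re.Pattern, segments: list[str]) -> bool:
    """Check that a temporal succession of segments matches the structure."""
    encoded = "".join(f"{name};" for name in segments)
    return structure.fullmatch(encoded) is not None

# Hypothetical temporal structure of a sports magazine (not taken from the
# slides): an opening, one or more studio sequences each followed by at
# least one report or interview, then a closing.
SPORTS_MAGAZINE = compile_structure(
    "Opening (Studio (Report | Interview)+)+ Closing"
)

print(is_valid(SPORTS_MAGAZINE,
               ["Opening", "Studio", "Report", "Interview", "Closing"]))  # True
print(is_valid(SPORTS_MAGAZINE, ["Opening", "Report", "Closing"]))        # False
```

Terminating each segment name with ";" keeps names that share a prefix from being confused with one another. A description that fails the check would be rejected at validation time.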

AVDL Implementation
• XML Serialization
  – Independent from a schema language
  – Use XML Schema validation (mainly for datatypes)
• C#
  – Object inheritance
  – Use of .NET reflection

XML Serialization
[Diagram: avdl.xsd (Audio-Visual Description Language); ds-17.xml and ds-17.xsd (Description Schemes); d-162.xml (Descriptions); arrows labelled "partial control" and "transformation"]

XML Syntax (DS)
<Descriptor xsi:type="LocatedDescriptorType" id="id-d2" name="Tracking">
  <Property ref="id-p2"/>
  <Structure ref="id-s2"/>
  <DescriptionRelationship characterization="string">
    <Location type="TemporalInterval"/>
    <Media type="Media"/>
  </DescriptionRelationship>
</Descriptor>

<Property id="id-p2" name="nbDetection">
  <Domain descriptor="id-d2"/>
  <Range>
    <Primitive nameType="int"/>
  </Range>
</Property>

<Structure id="id-s2" name="TrackingStructure">
  <FormalModel>
    <Constraint type="temporal" validation="full" method="system" parser="XMLSchema">
      <xsd:sequence minOccurs="0" maxOccurs="unbounded">
        <xsd:element name="Detection" type="DetectionType"/>
      </xsd:sequence>
    </Constraint>
  </FormalModel>
</Structure>

XML Syntax (Descriptions)
<Tracking type="LocatedDescriptorType" nbDetection="1">
  <DescriptionRelationship>
    <Location>
      <avdl:Begin timeRef="147329280"/><avdl:End timeRef="147329280"/>
    </Location>
    <Media id="CPB86006610.mpg" name="CPB86006610.mpg" contentID="CPB86006610.mpg"/>
  </DescriptionRelationship>
  <Structure constraintType="temporal">
    <Detection type="LocatedDescriptorType" nbFeature="1">
      <DescriptionRelationship>
        <Location>
          <avdl:Instant timeRef="147329280"/>
        </Location>
        <Media id="CPB86006610.mpg" name="CPB86006610.mpg" contentID="CPB86006610.mpg" frameHeight="288" frameWidth="352"/>
      </DescriptionRelationship>
      <Structure constraintType="spatial">
        <Feature xsi:type="FaceType">
          <DescriptionRelationship>
            <Location>
              <avdl:BoundingBox>
                <avdl:NE numX="92" denX="352" numY="217" denY="288"/>
                <avdl:NW numX="92" denX="352" numY="267" denY="288"/>
                <avdl:SE numX="136" denX="352" numY="217" denY="288"/>
                <avdl:SW numX="136" denX="352" numY="267" denY="288"/>
              </avdl:BoundingBox>
            </Location>
            ...

.NET implementation
[Diagram: the description scheme ds-17.xml and the description d-162.xml are parsed; ds-17.dll; .NET instantiation; in-memory objects (read/write)]

Two kinds of applications
• Static Description Schemes
  – DS are well-known
  – The developer uses generated libraries
• Dynamic Description Schemes
  – DS are created by the application
  – Use of the dynamic instantiation mechanism (reflection) of .NET

Carrying out the scenario
• Definition of new descriptors and properties
  – associating behavior with the corresponding classes
  – performing reasoning on the descriptions with the formal definitions in OWL
• Definition of logical and temporal structures
  – the description is controlled and validated by a grammar

Conclusion and Future Work
• AVDL: a reduced yet extensible Audio-Visual Description Language
  – descriptors, properties, structures
  – XML syntax and DL semantics
  – .NET implementation and APIs
• About structure validation:
  – which constructors are used? which semantics?
• Trade-off: expressivity vs. computability
  – OWL Full is undecidable
  – constraint satisfaction problems can be complex
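
As a closing illustration of the "dynamic description schemes" case, the following minimal sketch creates descriptor classes at run time from a scheme and instantiates them with a basic type check. It is written in Python rather than C#/.NET, and it is only an analogy for the reflection-based mechanism the slides mention: the SCHEME dictionary layout and the functions build_descriptor_classes and instantiate are hypothetical. Only the names Tracking, Detection, nbDetection and nbFeature echo the XML examples.

```python
from dataclasses import fields, make_dataclass

# Hypothetical, simplified description scheme: descriptor name -> properties.
# The dictionary layout is an assumption of this sketch, not the AVDL format.
SCHEME = {
    "Tracking": {"nbDetection": int},
    "Detection": {"nbFeature": int},
}

def build_descriptor_classes(scheme):
    """Create one class per descriptor at run time (the 'dynamic' case)."""
    return {
        name: make_dataclass(name, list(props.items()))
        for name, props in scheme.items()
    }

def instantiate(classes, descriptor, **values):
    """Instantiate a descriptor, checking each value against the scheme."""
    cls = classes[descriptor]
    expected = {f.name: f.type for f in fields(cls)}
    for prop, value in values.items():
        if prop not in expected:
            raise ValueError(f"{descriptor} has no property {prop!r}")
        if not isinstance(value, expected[prop]):
            raise TypeError(f"{descriptor}.{prop} expects {expected[prop].__name__}")
    return cls(**values)

classes = build_descriptor_classes(SCHEME)
tracking = instantiate(classes, "Tracking", nbDetection=1)
print(tracking)  # Tracking(nbDetection=1)
```

In the static case, the equivalent classes would instead be generated ahead of time and shipped as a library.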