Chinese Academy of Sciences, Beijing, China
Report Document

Overview of MPEG-7
Dr Zhang Sen
Speech Group, INRIA-LORIA, Villers-lès-Nancy, France
Chinese Academy of Sciences, Beijing, China
7/17/2015
Speech and Language Processing Techniques

Outline of contents
• Introduction
• Basic Components
• Content Description
• Audiovisual (AV) Descriptions
• Multimedia Description Schemes
• XM and Applications
• More Information

Ozone WP2 architecture
[Figure: Ozone WP2 architecture. Labels: Ozone application; user context; Ozone context; user-interaction module (multi-modal widgets, speech recognition, gesture recognition, dialog management, smart agent, video browser, animated agent); perception; QoS; situation sensitivity; user interface management; Ozone services (authentication, security, ...); software environment layer]

From MPEG-1 to MPEG-7
[Timeline years: 90, 92, 94, 98, 99, 01, ?]
[Timeline labels: MPEG-1, MPEG-2, MPEG-4 (v1, v2), MPEG-7, MPEG-21]
• MPEG-3: once defined, but abandoned
• MPEG-5 and -6: not defined

MPEG Family
• MPEG-1 – Coding of moving pictures and audio for digital storage media (CD-ROM, MP3), 11/92
• MPEG-2 – Generic coding of moving pictures and audio information (DVD, digital TV), 11/94
• MPEG-4 – Coding of audiovisual objects for multimedia applications; Version 1 09/98, Version 2 11/99
• MPEG-7 – Multimedia content description for AV material, 08/01
• MPEG-21 – Digital AV framework: integration of multimedia technologies, 11/01

Why is MPEG-7 needed
• Digital audiovisual information is increasing
– more and more content is available
– it comes from all kinds of sources
• Use of the digital audiovisual information
– description of the contents
– fast search of the contents

Objective of MPEG-7
• Standardize content-based description for various types of audiovisual information
– Enable fast and efficient content searching, filtering and identification
– Describe several aspects of the content (low-level features, structure, semantics, models, collections, creation, etc.)
– Address a large range of applications
• Types of audiovisual information:
– Audio, speech
– Moving video, still pictures, graphics, 3D models
– Information on how objects are combined in scenes

Scope of MPEG-7
[Figure: Description generation → Description → Description consumption. The Description is the scope of MPEG-7; generation and consumption are labelled "Research and future competition".]
• The description generation (feature extraction, indexing process, annotation & authoring tools, ...)
and consumption (search engine, filtering tool, retrieval process, browsing device, ...) are non-normative parts of MPEG-7.
• The goal is to define the minimum that enables interoperability.

Scope of MPEG-7 standardization
[Figure: Feature Extraction → MPEG-7 Description → Search Engine. Ref: MPEG-7 Concepts]
• Feature extraction: content analysis (D, DS), feature extraction (D, DS), annotation tools (DS), authoring (DS)
• MPEG-7 scope: Description Schemes (DSs), Descriptors (Ds), language (DDL)
• Search engine: searching & filtering, classification, manipulation, summarization, indexing

Audio in MPEG-7
• Audio content description (yes)
• Sound retrieval and classification (yes)
• Speech synthesis (no)
• Speech recognition (no)
• Probability models (yes)

Parts of the MPEG-7 Standard
• ISO/IEC 15938-1: Systems
• ISO/IEC 15938-2: Description Definition Language
• ISO/IEC 15938-3: Visual
• ISO/IEC 15938-4: Audio
• ISO/IEC 15938-5: Multimedia Description Schemes
• ISO/IEC 15938-6: Reference Software

Main elements of MPEG-7
• Descriptors (Ds): representations of features that define the syntax and the semantics of each feature representation (low-level).
• Description Schemes (DSs): specify the structure and semantics of the relationships between their components, which may be both Ds and DSs (high-level).
• A Description Definition Language (DDL): based on XML Schema; allows the creation of new DSs and Ds and the extension and modification of existing DSs.
• System tools: support multiplexing of descriptions, synchronization, transmission mechanisms, coded representations, and management and protection of intellectual property.

Relations of main elements
[Figure: a tree rooted at the DDL, with DSs containing Ds and other DSs]

Description Definition Language
• The Description Definition Language (DDL) is a language that defines which descriptions are valid and allows the creation of new Description Schemes and Descriptors. It also allows the extension and modification of existing Description Schemes.
• The DDL is used to define a set of formal rules
– ordering of the elements
– occurrences of elements
– ……...
• XML + MPEG-7 extensions

XML: Base for DDL
• Why choose XML as the base for the DDL?
– The popularity of XML
– The interoperability with other standards in the future
• Why should XML be extended for MPEG-7?
– SGML > XML
– Structural extensions
– Datatype extensions

DDL parser
A DDL parser is software that checks whether a description is valid.
[Figure: a Description and a Schema are input to the Parser, which answers Yes or No]

Type of descriptions
• Low-level description (features, etc.)
– Generic and flexible
– Intelligent / efficient search engine
• High-level description (structures, concepts, etc.)
– Efficient and powerful
– Lack of flexibility

Low-level Description
• Information on the creation and production processes
– director, title, short feature movie
• Information related to the usage of the content
– copyright pointers, usage history, broadcast schedule
• Information on the storage features of the content
– storage format, encoding
• Information about low-level features in the content
– colors, textures, sound timbres, melody

High-level Description
• Structural description
– video segments, frames, still and moving regions, audio segments
– Segment DS (representing the spatial, temporal or spatio-temporal structure)
• Conceptual (semantic) description
– objects, events, and notions
– links between the two descriptions

Illustration of descriptions
[Figure: illustration of descriptions]

Basic description
• Elements
– Information containers
– containing data and other elements
– <city> …… </city>
• Attributes
– Attribute-value pairs used to characterize elements
– <city population="10000"> …… </city>

Structured descriptions
• Structured descriptions are trees
• Trees are suitable for retrieval and search
[Figure: a tree of DSs with Ds as leaves]

Description trees
<letter>
  <header>
    <name> Mr Sen </name>
    <address>
      <street> 16 rue Laplace </street>
      <city> Nancy </city>
    </address>
  </header>
  <text> Dear Mr White, …</text>
</letter>
[Figure: the corresponding tree. letter → header, text; header → name, address; address → street, city]

Example: Audio description
<Mpeg7Main>
  <DescriptionMetadata>
    <Version>1.0</Version>
  </DescriptionMetadata>
  <ContentDescription>
    <AudioContent xsi:type="AudioType">
      <Audio>
        <CreationInformation>
          <Creation>
            <Title> The daily news </Title>
          </Creation>
        </CreationInformation>
      </Audio>
    </AudioContent>
  </ContentDescription>
</Mpeg7Main>

Audio description
• Low-level Description
– spectrum, parametric, and temporal features
• High-level Description
– Audio signature Description Scheme
– Instrument timbre Description Schemes
– The melody Description Tools
– Sound recognition and indexing Description Tools
– Spoken Content Description Tools
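The element, attribute, and description-tree slides above can be illustrated with a short sketch. It is only an illustration, not an MPEG-7 parser: it uses Python's standard xml.etree.ElementTree, performs no schema validation, and the helper name walk is hypothetical. The letter document is the one from the "Description trees" slide, with the population attribute from the "Basic description" slide added to <city> as an assumption.

```python
import xml.etree.ElementTree as ET

# The letter description from the "Description trees" slide, plus one
# attribute (population) borrowed from the "Basic description" slide.
LETTER = """
<letter>
  <header>
    <name>Mr Sen</name>
    <address>
      <street>16 rue Laplace</street>
      <city population="10000">Nancy</city>
    </address>
  </header>
  <text>Dear Mr White, ...</text>
</letter>
"""


def walk(element, path=""):
    """Yield (path, text, attributes) for every node, depth first."""
    here = f"{path}/{element.tag}"
    text = (element.text or "").strip()
    yield here, text, dict(element.attrib)
    for child in element:
        yield from walk(child, here)


root = ET.fromstring(LETTER)
nodes = list(walk(root))
for node_path, text, attrs in nodes:
    print(node_path, repr(text), attrs)

# Because the description is a tree, a path is enough to locate a node.
city = root.find("./header/address/city")
```

Each printed path corresponds to one node of the tree drawn on the slide, which is why tree-structured descriptions lend themselves to search and retrieval.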
Audio low-level descriptors
• Waveform
• Loudness
• Spectral basis
• Spectral envelope
• Spectral centroid
• Spectral spread
• Fundamental frequency
• Harmonicity
• Attack time

Audio descriptor: Basic
• Two basic audio Descriptors
– AudioWaveform Descriptor
  • describes the audio waveform envelope (minimum and maximum)
– AudioPower Descriptor
  • describes the temporally-smoothed instantaneous power

Audio descriptor: Basic Spectral
• AudioSpectrumEnvelope Descriptor
– describes the short-term power spectrum
• AudioSpectrumCentroid Descriptor
– describes the center of gravity of the log-frequency power spectrum
• AudioSpectrumSpread Descriptor
– describes the second moment of the log-frequency power spectrum
• AudioSpectrumFlatness Descriptor
– describes the flatness properties of the spectrum

Audio Signature Description
• The AudioSignature Description Scheme provides a unique content identifier for the purpose of robust automatic identification of audio signals
• Applications include
– audio fingerprinting
– identification of audio
– locating metadata for legacy audio content

Instrument Timbre Description
• Timbre is defined as the perceptual features that make two sounds with the same pitch and loudness sound different.
• The Timbre Description describes these perceptual features with a reduced set of Descriptors
– HarmonicInstrumentTimbre Descriptor
– LogAttackTime Descriptor
– PercussiveInstrumentTimbre Descriptor
– Combination with Basic Spectral Descriptors

Melody Description Tools
The Melody Description Tools facilitate efficient, robust, and expressive melodic similarity matching
• MelodyContour Description Scheme
– 5-step contour representation
– basic rhythmic information representation
• MelodySequence Description Scheme
– supports an expanded descriptor set and high precision of interval encoding

General Sound Recognition and Indexing Description Tools
• SoundModel (SM) DS
– statistical model, such as HMM or GMM
– SoundModelStatePath Descriptor
  • consists of a state sequence generated by an SM
– SoundModelStateHistogram Descriptor
  • consists of a normalized histogram of the state sequence generated by an SM given an audio segment
• SoundClassificationModel DS
– a trainable multi-way classifier based on SMs
  • speech vs music, male vs female, trumpet vs violin
  • genre classification, voice recognition

Spoken content retrieval
• Output of ASR
– phone lattice or word lattice
– the spoken content DS stores these lattices instead of plain text
– lattices are good for retrieval

Spoken Content Description Tools
• SpokenContentLattice
– represents the actual decoding produced by an ASR engine
• SpokenContentHeader
– contains information about the speakers being recognized and the recognizer itself
– WordLexicon Descriptor
– PhoneLexicon Descriptor
– SpeakerInfo Descriptor
– ConfusionInfo Descriptor

Gaussian DS
<Gaussian>
  <Mean> 4087.18 7173.73 1.36364 94.2727 1834.36 2359.55 2645.27 2577.09 ……………………………… </Mean>
  <Variance> 1.6982e+007 5.21621e+007 14.3636 9749.09 3.65743e+006 ……………………………… </Variance>
</Gaussian>

State-transition model DS
<StateTransitionModel>
  <Transitions size1="20" size2="20"> 0 0 0.210526 0.0526316 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 …………………………………… </Transitions>
  <Initial size="20"> 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 </Initial>
  <State label="0 players" confidence="1">
  ……………………………………
  <State label="19 players" confidence="0.223607">
</StateTransitionModel>

ProbabilityModelClassifier DS
<ProbabilityModelClassifier confidence="0.9" length="2">
  <ProbabilityModelClass SemanticLabel="fish" Confidence="0.5" DescriptorName="ColorHistogram">
    <Gaussian>
      <Mean> 4087.18 7173.73 1.36364 94.2727 1834.36 2359.55 …………………………. </Mean>
      <Variance> 1.6982e+007 5.21621e+007 14.3636 9749.09 …………………………. </Variance>
    </Gaussian>
  </ProbabilityModelClass>

SpokenContentLattice DS
[Figure: a lattice structure for a hypothetical (combined phone and word) decoding of the expression "Taj Mahal drawing …"]

SoundRecognitionClassifier
Extraction of sound indexes using a sound-recognition classifier (HMMs and bases). The model reference and state path are stored.
[Figure labels: audio query; spectrum projection (AudioSpectrumBasis); HMM 1, HMM 2, ..., HMM N-1, HMM N (SoundRecognitionModel); select model ref + state path (SoundModelStatePath); segmented audio description; MPEG-7 sound database]

SoundRecognitionClassifier
Query-by-example application with a query in media source form. Features must be extracted and projected into the classification space for each model in order to match against the database.
[Figure labels: audio query; spectrum projection (AudioSpectrumBasis); HMM 1, HMM 2, ..., HMM N-1, HMM N (SoundRecognitionModel); select model ref + state path (SoundModelStatePath); matching; MPEG-7 sound database (indexed audio, HMM and basis, ContinuousMarkovModel); result list]

An example search application utilizing a query in DDL format
[Figure labels: DDL query; model ref + state path; matching; MPEG-7 sound database; result list]

Extraction of hidden Markov model and basis functions and storage in a DDL representation
[Figure labels: audio WAV files; feature extract (SoundRecognitionFeatures); basis extract (AudioSpectrumBasis); HMM (SoundRecognitionModel); HMM and basis (ContinuousMarkovModel)]

Scenarios for the Spoken Content Description Tools
• Recall of AV data by memorable spoken events
– A film or video recording where a character or person spoke a particular word or sequence of words. The source media would be known, and the query would return a position in the media.
• Spoken Document Retrieval
– There is a database consisting of separate spoken documents.
The result of the query is the set of relevant documents and, optionally, the position of the matched speech in those documents.
• Annotated Media Retrieval
– Similar to spoken document retrieval. The result of the query is the media that is annotated with speech, not the speech itself. An example is a photograph retrieved using a spoken annotation.

Multimedia DSs
Multimedia Description Schemes are metadata structures for describing and annotating audio-visual (AV) content
• Basic Elements
• Content Management
• Content Description
• Content Organization
• Navigation and Access
• User Interaction

Organization of Multimedia DSs
[Figure: organization of Multimedia DSs]

Content Management
• Creation and production information
– Creation information
  • title, textual annotation, creators, and dates
– Classification information
  • genre, subject, purpose, language
• Media coding, storage and file formats
– format, compression, and coding
• Content usage
– usage rights, usage record

Navigation and Access
• Summaries
– hierarchical summaries
– sequential summaries
• Partitions and Decompositions
– decompositions in space, time and frequency
– used in multi-resolution access and progressive retrieval
• Variations
– selection of the most suitable variation of an AV program
– adapt to the different capabilities of terminal devices,
network conditions or user preferences

Hierarchical summary
[Figure: hierarchical summary]

Illustration of variations
[Figure: illustration of variations]

Content Organization
• Collections
– group the contents into clusters
– describe statistics and models of the attribute values
– describe relationships among collection clusters
• Models
– model the attributes and features of AV content
– Probability Model
  • specify statistical functions and structures
– Analytic Model
  • specify semantic labels
  • specify the confidence
  • build classifiers

Collection Structure
[Figure: collection structure]

User Interaction
• User Preference
– context dependency in terms of time and place
– relative importance of different preferences
– privacy characteristics of the preferences
– preferences updated by agent or user
• Usage History
– history of actions
– used to determine the user's preferences

eXperimentation Model (XM)
• Simulation platform for:
– Ds, DSs, CSs, DDL (CS: Coding Schemes)
• XM applications:
– the server (extraction) applications
– the client (search, filtering and/or transcoding) applications

The XM applications
• Extraction from Media
– all low-level Ds or DSs should have an application class of this type
• Search & Retrieval Application
– a client application
• Media Transcoding Application
– a client application
• Description Filtering Application
– a client application

Extraction from Media
[Figure: extraction from media]

Search and retrieval application
[Figure: search and retrieval application]

Media transcoding application
[Figure: media transcoding application]

Description Filtering Application
[Figure: description filtering application]

Interface model for XM app
[Figure: interface model for XM applications]

Real world application
MDB = media database, DDB = description database. First, two features are extracted from a media database. Then, based on the first feature, relevant media files are selected from the media database. The relevant media files are transcoded based on the second extracted feature.
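The extract, select, transcode flow of the real-world application slide can be sketched as follows. This is a toy illustration under stated assumptions, not XM code: the Media class, the two features (peak and length), and every function name are hypothetical, and "transcoding" is reduced to truncating a list of samples.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Media:
    name: str
    samples: List[float]  # stand-in for the raw media data


def extract_features(media: Media) -> Dict[str, float]:
    """Server side: extract two toy features (both are placeholders)."""
    return {
        "peak": max(abs(s) for s in media.samples),  # feature 1: selection
        "length": len(media.samples),                # feature 2: transcoding
    }


def build_ddb(mdb: List[Media]) -> Dict[str, Dict[str, float]]:
    """Description database (DDB): one description per media file."""
    return {m.name: extract_features(m) for m in mdb}


def select(mdb, ddb, min_peak):
    """Client side: search the descriptions instead of the media itself."""
    return [m for m in mdb if ddb[m.name]["peak"] >= min_peak]


def transcode(media, ddb, max_length):
    """Client side: toy transcoding driven by the second feature."""
    if ddb[media.name]["length"] <= max_length:
        return media
    return Media(media.name, media.samples[:max_length])


# Media database (MDB) with three toy files.
mdb = [
    Media("a.wav", [0.1, -0.9, 0.3, 0.2]),
    Media("b.wav", [0.05, 0.1]),
    Media("c.wav", [0.8, -0.2, 0.4, 0.1, 0.0, 0.6]),
]
ddb = build_ddb(mdb)
selected = select(mdb, ddb, min_peak=0.5)
result = [transcode(m, ddb, max_length=4) for m in selected]
```

The point of the sketch is the separation of roles: extraction fills the DDB once, and both client steps consult only the stored descriptions.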
MPEG-7 application areas
• Storage and retrieval of audiovisual databases (image, film, radio archives)
• Broadcast media selection (radio, TV programs)
• Surveillance (traffic control, surface transportation, production chains)
• E-commerce and tele-shopping (searching for clothes / patterns)
• Remote sensing (cartography, ecology, natural resources management)
• Entertainment (searching for a game, for a karaoke)
• Cultural services (museums, art galleries)
• Journalism (searching for events, persons)
• Personalized news service on the Internet (push media filtering)
• Intelligent multimedia presentations
• Educational applications
• Bio-medical applications

Illustration of applications
[Figure: illustration of applications, with users]

Information Flow
[Figure labels: feature extraction (manual/automatic); AV description; encoding; storage; transmission; decoding; search/query; browse; filter; pull; push; users]

Push and Pull applications
• Pull applications
– Example: search engines for the Internet and databases
– Advantage: many search engines work on standardized descriptions
• Push applications
– Example: broadcast of video, interactive TV
– Advantage: intelligent agents filter standardized descriptions

Example: Pull application
[Figure: pull application with an MPEG-7 database]

Example: Push application
[Figure: push application]

Example: queries
• Text (keywords):
– Find AV material
with a subject corresponding to some keywords
• Semantic description:
– Find AV material matching a specified semantic description
• Image as an example:
– Find an image with similar characteristics (global or local)
• A few notes of music:
– Find corresponding musical pieces or movies
• Low-level features (example: motion):
– Find video with specific object motion trajectories

Integration of MPEG-7 into XML
<seq begin="20s" dur="10s">
  <img id="Image1" dur="5s">
    <MP7:annotation>
      <Who>Fernando Morientes</Who>
      <WhatAction>Spain vs. Sweden soccer match</WhatAction>
    </MP7:annotation>
  </img>
  <img id="Image2" dur="2s" />
</seq>

MPEG-7 and other Standards
• MPEG-1, -2, and -4 are designed to represent the information itself, while MPEG-7 is meant to represent information about the information.
• MPEG-1, -2, and -4 make content available, while MPEG-7 allows you to find the content you need.
Ultimate ambition of MPEG-7
• To make the web as searchable for multimedia content as it is searchable for text today
• To make the use of computer systems as easy as possible

MPEG-7 beyond
• To mould computers around human requirements and not humans around computer requirements
• To enable content disclosure based on facts, rather than on human annotations
• To find information through rich spoken queries and hand-drawn images, addressing what most people expect computers to be able to do

More Information on WWW
• Major MPEG-7 documents
– http://www.cselt.it/mpeg/, semi-official website
– http://www.mpeg-7.com, official website
• Others
– http://www.elsevier.com/locate/image

Conclusion
[Figure labels: AV contents; features (Ds); structures (DSs); DDL (Ds, DSs); user]

Thanks

Low level AV descriptors
• Video segments: color, camera motion, motion activity, mosaic
• Still regions: color, shape, position, texture
• Moving regions: color, motion trajectory, parametric motion, spatio-temporal shape
• Audio segments: spoken content, spectral feature, timbre

Face Recognition Descriptor
• Projection of a face vector onto a set of basis vectors
• Feature set is extracted from a normalized face image
• Normalized face image
– 56 lines with 46 intensity values in each line
– The centers of the two eyes are located on the 24th row

Segment Decomposition
[Figure: segment decomposition]

MPEG-7 Normative Interfaces
[Figure: MPEG-7 normative interfaces]

Example: Content description
[Figure labels: feature extraction (low-level process); indexing (high-level process); MPEG-7 database; search and retrieval]

Segment DS
The Segment DS describes the result of a spatial, temporal, or spatio-temporal partitioning of the AV content. It has nine major subclasses:
• Multimedia Segment DS
• AudioVisual Region DS
• AudioVisual Segment DS
• Audio Segment DS
• Still Region DS
• Still Region 3D DS
• Moving Region DS
• Video Segment DS
• Ink Segment DS

Examples: T/S segments
[Figure: temporal and spatial segments]

Example: Segment trees
[Figure: segment trees]

Illustration of conceptual description
[Figure labels: AV content; Semantic base DS; Object DS; Event DS; Concept DS; Semantic state DS; Semantic place DS; Semantic time DS; Semantic container DS; Semantic DS]

Visual description
• Basic structures
– Grid layout, Time series, Multiple view, Spatial 2D coordinates, Temporal interpolation
• Descriptors
– Color, Texture, Shape, Motion, Localization

Example: Color Descriptors
• Color space
• Color Quantization
• Dominant Colors
• Scalable Color
• Color Layout
• Color-Structure
• GoF/GoP Color

Example: Color space
• R,G,B
• Y,Cr,Cb
• H,S,V
• HMMD
• Linear transformation matrix with reference to R, G, B
• Monochrome

Audio Framework
[Figure: audio framework]

Descriptor
• Definition
A Descriptor (D) is a representation of a Feature. A Descriptor defines the syntax and the semantics of the Feature representation.
• Notes
A Descriptor allows an evaluation of the corresponding feature via the descriptor value. It is possible to have several Descriptors representing a single feature.
• Examples
Possible Descriptors include the color histogram (for the color feature), the average of the frequency components, the motion field, the text of the title, etc.

Descriptor Value
• Definition
A Descriptor Value is an instantiation of a Descriptor for a given data set (or subset thereof).
• Notes
Descriptor Values are combined via the mechanism of a Description Scheme to form a Description.

Description Scheme
• Definition
A Description Scheme (DS) specifies the structure and semantics of the relationships between its components, which may be both Descriptors and Description Schemes.
• Examples
A movie, structured as scenes and shots, including some textual descriptors at the scene level, and color, motion and some audio descriptors at the shot level.
• Note
A D contains only basic data types and does not refer to other Ds or DSs.
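The relationship between Descriptor Values and Description Schemes defined above can be sketched with two small Python classes. This is a minimal illustration under stated assumptions, not the normative MPEG-7 data model: the class names, the outline helper, and the descriptor names used in the movie example (TextAnnotation, ColorHistogram, MotionActivity) are chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class DescriptorValue:
    """An instantiation of a Descriptor: basic data only, no references."""
    descriptor: str
    value: object


@dataclass
class DescriptionScheme:
    """A DS relates components, which may be Descriptor Values or other DSs."""
    name: str
    components: List[Union["DescriptionScheme", DescriptorValue]] = field(
        default_factory=list
    )


def outline(node, depth=0):
    """Return a depth-first list of (depth, kind, name) for a description."""
    if isinstance(node, DescriptorValue):
        return [(depth, "D", node.descriptor)]
    rows = [(depth, "DS", node.name)]
    for component in node.components:
        rows.extend(outline(component, depth + 1))
    return rows


# The movie example from the slide: textual descriptors at the scene level,
# color and motion descriptors at the shot level.
movie = DescriptionScheme("Movie", [
    DescriptionScheme("Scene 1", [
        DescriptorValue("TextAnnotation", "opening scene"),
        DescriptionScheme("Shot 1", [
            DescriptorValue("ColorHistogram", [0.2, 0.5, 0.3]),
            DescriptorValue("MotionActivity", 0.7),
        ]),
    ]),
])

for depth, kind, name in outline(movie):
    print("  " * depth + f"{kind}: {name}")
```

Only DescriptionScheme can hold components, which mirrors the note that a D contains basic data and never refers to other Ds or DSs.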
DDL: XML Schema & Extensions
• XML Schema
– Data types
– Simple and complex types
– Elements
– Inheritance, abstract types
• MPEG-7 extensions
– Array and matrix datatypes
– Enumerated datatypes for MimeType, CountryCode, RegionCode, CurrencyCode and CharacterSetCode
– Typed references

Basic elements of DS
• Constructs for linking media files
• Localizing pieces of content
• Describing
– time, places, persons, individuals, groups, organizations, textual annotation, etc.
– Who? What object? What action? Where? When? Why? and How?

Content recognition tools
• No speech, face, or gesture recognition engines are included in MPEG-7
• Building content recognition tools is a task for industry, not for a standard
– coding tools in MPEG-1, -2, -4 were for research purposes, not part of the standard
– no tools were part of the MPEG standard