MPEG-7 Audio Overview
Beinan Li
MUMT 611 Week 2
2005. 1. 20
Content
MPEG-7 overview
What is…
Why?
Objectives and scope
Main elements and organization
MPEG-7 Audio
Low-level features
High-level tools
What is MPEG-7
“Multimedia Content Description Interface”
ISO/IEC standard by MPEG (Moving Picture Experts Group)
Provides metadata for multimedia content
MPEG-1, -2, -4: make content available;
MPEG-7: makes content accessible, retrievable, filterable and
manageable (by devices / computers).
Multiple degrees of interpretation of the information’s meaning
Supports as broad a range of applications as possible.
A standard that is compatible (with existing technology) and extensible.
Why MPEG-7
“The value of information often depends on how easy it can be found, retrieved, accessed, filtered and managed.”
Past: scarcity of digital multimedia sources
-> simple access mechanisms were sufficient
Now: growing amount of audiovisual information
-> identifying and managing it efficiently is becoming more difficult
e.g. “record only news about sport.”
Why MPEG-7
For future multimedia services, content representation and
description may have to be addressed jointly.
Many services dealing with content representation will
have to deal first with content description
“a non-described content may be useless”
Need for access only to the content description:
New original services (e.g. optimizing personal time)
Adaptation to networks and terminal capabilities
Application domains (incomplete list)
Broadcast media selection (e.g., radio channel, TV channel).
Digital libraries (e.g., film, video, audio and radio archives).
E-Commerce (e.g., personalized advertising).
Education (e.g., repositories of multimedia courses, multimedia
search for support material).
Home Entertainment (e.g., management of personal multimedia
collections, including manipulation of content, e.g. karaoke).
Journalism (e.g. searching speeches of a certain politician using his
name, his voice or his face).
Multimedia directory services (e.g. yellow pages, GIS).
Surveillance and remote sensing.
MPEG-7 Objectives
Standardize content-based description for various
types of audiovisual information
Independent of the media support (encoding and storage)
Different levels of granularity
Low-level features: shape, size, key, tempo changes
High-level semantic info: “scene with a barking brown dog on
the left and with the sound of passing cars in the background.”
Meaningful in the context of the application
Same material -> different types of features and combinations
e.g. timbre vs. loudness
MPEG-7 Objectives
Information about the content
The form: e.g. the coding format used
Conditions for accessing the material:
e.g. Intellectual property rights / price
Classification: e.g. parental rating
Links to other relevant materials
The context: e.g. “Olympic Games 1996, final of 200 meter hurdles, men”
Information present in the content:
Combination of low-level and high-level descriptors
Scope of the Standard
Processing chain:
Graph by P. Salembier and O. Avaro
An example of architecture
Pull: (Client Queries -> Descriptions repository -> Matched Ds)
Push: (Filter descriptions -> Programmed actions)
Graph by P. Salembier and O. Avaro
Workplan
Graph by P. Salembier and O. Avaro
Where are the descriptions from?
Preservation of existing descriptive data (e.g. scripts) through the production/delivery chain
Generated automatically by capture devices (e.g. time or GPS location in a camera)
Extracted automatically or semi-automatically (i.e. with some human assistance)
Manually produced (e.g. for legacy material such as existing film archives)
Main Elements of MPEG-7
Description Tools (textual / binary):
Descriptors (D): define the syntax and the semantics of each feature (metadata element)
Description Schemes (DS): specify the structure and semantics of the relationships between their components (Ds and DSs)
Description Definition Language (DDL):
Defines the syntax of the MPEG-7 Description Tools
Allows creation, extension and modification of DSs
System tools:
Storage and transmission, synchronization of descriptions with
content, multiplexing of descriptions, etc.
Main Elements of MPEG-7
Relationships among the elements introduced above.
Graph by P. Salembier and O. Avaro
Description Tools
Creation and production processes: (director, title)
Usage: (broadcast schedule)
Storage features.
Structural information: (spatio-temporal components)
Segmentations
Low level features: (sound timbres, melody description)
Conceptual information: (objects and events, interactions)
Navigation and access: (summaries, variations)
Collections of objects.
User-content interactions: (user preferences, usage history)
Organization of Description Tools
Graph by P. Salembier and O. Avaro
Descriptions (further)
MPEG-7 approaches the description of content from
several viewpoints.
A set of methods and tools for the different viewpoints of
the description (not a monolithic system)
The tools are interrelated and can be combined in many ways.
Associated with the content itself: (searching, filtering)
Location: (document vs. stream)
physically located with the material,
or somewhere else on the globe
Interoperability with other metadata standards: (XML)
Use of Description Tools
The description tools are presented on the basis of the
functionality they provide.
In practice, they are combined into meaningful sets of
description units.
Furthermore, each application will have to select a subset
of descriptors and DSs.
Library of tools!
DDL can be used to handle specific needs of the
application (like scripting in many current applications).
Major Functionalities
MPEG-7 Systems
MPEG-7 Description Definition Language
MPEG-7 Visual
MPEG-7 Audio
MPEG-7 Multimedia Description Schemes (MDS)
Reference Software: the eXperimentation Model (XM)
MPEG-7 Conformance (syntax checking)
MPEG-7 Extraction and use of descriptions (technical report)
MPEG-7 Audio
Audio provides structures, building upon some basic structures from the MDS, for describing audio content.
Low-level Descriptors: audio features that cut across many applications
High-level Description Tools: more specific to a set of applications
Low-level Features
“MPEG-7 Audio Framework”:
Two low-level descriptor types (for samples and segments):
Scalar: e.g. power or fundamental frequency
Vector: e.g. spectra
Hierarchical, consistent interface
Any descriptor inheriting from these types can be instantiated,
describing a segment with a single summary value or a series of
sampled values, as the application requires.
Scalable Series (hierarchical re-sampling):
Progressively down-sample the data contained in a series
(application-oriented)
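To make hierarchical re-sampling concrete, here is a minimal Python sketch of a scalable series. It assumes a single fixed scaling ratio and uses mean, min and max as the summary values; the standard defines its own set of summary fields and allows more flexible ratios. The names `scale_series` and `scalable_series` are hypothetical, not part of any MPEG-7 API.

```python
from statistics import mean


def scale_series(values, ratio):
    """Summarize each group of `ratio` consecutive values by its mean, min and max.

    A trailing partial group (when len(values) is not a multiple of ratio)
    is summarized as well.
    """
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    summary = {"mean": [], "min": [], "max": []}
    for start in range(0, len(values), ratio):
        group = values[start:start + ratio]
        summary["mean"].append(mean(group))
        summary["min"].append(min(group))
        summary["max"].append(max(group))
    return summary


def scalable_series(values, ratio, levels):
    """Progressively coarser views: level k summarizes groups of ratio**k values."""
    return [scale_series(values, ratio ** k) for k in range(1, levels + 1)]


# Example: eight AudioPower-style frame values, viewed at two resolutions.
power = [1, 3, 2, 4, 5, 7, 6, 8]
for level in scalable_series(power, 2, 2):
    print(level)
```

Computing each level directly from the original values (rather than from the previous level's means) keeps min and max exact at every resolution.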
Low-level Features (types)
Basic
Basic Spectral
Signal Parameters
Timbral Temporal
Timbral Spectral
Spectral Basis
MPEG-7 Silence Descriptor
Low-level Features (graph)
Graph by P. Salembier and O. Avaro
Low-level Features (details)
Basic: (temporally sampled scalar values for general use)
AudioWaveform Descriptor: waveform envelope (for display purposes)
AudioPower Descriptor: temporally smoothed instantaneous power
(quick summary of a signal), applicable to all kinds of signals
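A minimal sketch of how the two Basic descriptors could be computed from raw samples, assuming a mono signal given as a list of floats and a fixed hop size. The 10 ms hop in the example is an assumption, and `audio_waveform` / `audio_power` are illustrative names, not the normative extraction.

```python
import math


def audio_waveform(samples, hop):
    """(min, max) of the samples in each hop: a compact envelope for display."""
    return [(min(samples[s:s + hop]), max(samples[s:s + hop]))
            for s in range(0, len(samples), hop)]


def audio_power(samples, hop):
    """Mean of the squared samples in each hop (temporally smoothed power)."""
    frames = (samples[s:s + hop] for s in range(0, len(samples), hop))
    return [sum(x * x for x in f) / len(f) for f in frames]


# Example with a hypothetical 10 ms hop at 16 kHz (160 samples per value).
signal = [0.5 * math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
print(len(audio_waveform(signal, 160)), len(audio_power(signal, 160)))
```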
Low-level Features (details)
Basic Spectral: (single time-frequency analysis of a signal)
AudioSpectrumEnvelope (base class): the short-term power spectrum
(display, synthesis, general-purpose search)
AudioSpectrumCentroid: is the spectrum dominated by high or low frequencies?
AudioSpectrumSpread: is the power spectrum concentrated near the spectral centroid, or spread out over the spectrum?
(distinguishes pure-tone from noise-like sounds)
AudioSpectrumFlatness: (the presence of tonal components)
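The Basic Spectral questions above can be answered with a few weighted averages. This sketch assumes the spectrum is already available as (frequency, power) bins, with every frequency above 0 Hz and every power value positive. It measures centroid and spread in octaves relative to 1 kHz, following the log-frequency scale we understand the standard to use. The normative details (logarithmic band layout, handling of low-frequency bins) are not reproduced, and the function names are hypothetical.

```python
import math


def log_octave(f_hz):
    """Frequency in octaves relative to 1 kHz (0 = 1 kHz, +1 = 2 kHz, -1 = 500 Hz)."""
    return math.log2(f_hz / 1000.0)


def spectrum_centroid_spread(freqs, power):
    """Power-weighted centroid and spread on the log-octave scale.

    Assumes every frequency is > 0 and the power values do not sum to zero.
    """
    octs = [log_octave(f) for f in freqs]
    total = sum(power)
    centroid = sum(o * p for o, p in zip(octs, power)) / total
    spread = math.sqrt(sum((o - centroid) ** 2 * p for o, p in zip(octs, power)) / total)
    return centroid, spread


def spectrum_flatness(band_power):
    """Geometric mean / arithmetic mean of one band: near 1 for noise, near 0 for a tone."""
    n = len(band_power)
    geometric = math.exp(sum(math.log(p) for p in band_power) / n)
    arithmetic = sum(band_power) / n
    return geometric / arithmetic
```

A positive centroid means the energy sits mainly above 1 kHz, a small spread means a pure-tone-like spectrum, and a flatness well below 1 in a band signals a tonal component there.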
Low-level Features (details)
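Signal Parameters: (periodic or quasi-periodic signals)
AudioFundamentalFrequency: fundamental frequency with a “confidence measure”, since “pitch-tracking” extraction methods are not perfectly accurate
AudioHarmonicity: distinguishes sounds with a harmonic spectrum (e.g. musical tones), an inharmonic spectrum (e.g. bell-like sounds) and a non-harmonic spectrum (e.g. noise)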
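Fundamental-frequency estimation with a confidence value can be sketched with normalized autocorrelation. This is one common technique chosen for illustration, not a method MPEG-7 mandates. The search range `f_min`/`f_max` and the 0.9 relative threshold are hypothetical parameters. The returned confidence is the normalized autocorrelation at the chosen lag, which also works as a rough harmonicity indicator: close to 1 for harmonic sounds, low for noise.

```python
import math


def normalized_autocorr(frame, lag):
    """Correlation between the frame and itself shifted by `lag`, scaled to [-1, 1]."""
    a, b = frame[:len(frame) - lag], frame[lag:]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0


def f0_with_confidence(frame, sample_rate, f_min=50.0, f_max=1000.0,
                       rel_threshold=0.9):
    """Return (f0 in Hz, confidence), or (None, 0.0) when no periodicity is found."""
    min_lag = max(1, int(sample_rate / f_max))
    max_lag = min(len(frame) - 1, int(sample_rate / f_min))
    lags = list(range(min_lag, max_lag + 1))
    r = [normalized_autocorr(frame, lag) for lag in lags]
    if len(r) < 3 or max(r) <= 0:
        return None, 0.0
    peak = max(r)
    # Prefer the first local maximum close to the global peak, so that a
    # multiple of the true period (an octave error) is not chosen.
    for i in range(1, len(r) - 1):
        if r[i] >= r[i - 1] and r[i] >= r[i + 1] and r[i] >= rel_threshold * peak:
            return sample_rate / lags[i], r[i]
    i = r.index(peak)
    return sample_rate / lags[i], peak
```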
Low-level Features (details)
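Timbral Temporal: (temporal characteristics of segments of sounds, musical timbre)
LogAttackTime: the (logarithmic) time the signal takes to rise from silence to its maximum amplitude
TemporalCentroid: where in time the energy of a signal is focused
Useful when attack times are identical (e.g. a decaying piano note vs. a sustained organ note)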
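Both Timbral Temporal descriptors can be computed from an amplitude envelope sampled at a fixed frame period. In this sketch the attack is measured between the first frames that reach 2% and 80% of the peak. Those thresholds are assumptions, not values quoted from the standard, and the function names are hypothetical.

```python
import math


def log_attack_time(envelope, frame_period, start_frac=0.02, stop_frac=0.8):
    """log10 of the time (seconds) between the envelope first reaching
    start_frac * peak and first reaching stop_frac * peak.

    Assumes a non-negative envelope with a positive peak.
    """
    peak = max(envelope)
    t_start = next(i for i, v in enumerate(envelope) if v >= start_frac * peak)
    t_stop = next(i for i, v in enumerate(envelope) if v >= stop_frac * peak)
    # Clamp to one frame so an instantaneous attack does not give log10(0).
    return math.log10(max(t_stop - t_start, 1) * frame_period)


def temporal_centroid(envelope, frame_period):
    """Energy-weighted mean time (in seconds) of the envelope."""
    total = sum(envelope)
    return sum(i * frame_period * v for i, v in enumerate(envelope)) / total
```

With the same attack, a decaying envelope such as `[0, 10, 5, 2, 1, 0]` gets an earlier temporal centroid than a sustained one such as `[0, 10, 10, 10, 10, 0]`. This is the kind of distinction TemporalCentroid is meant to capture when attack times are identical.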
Low-level Features (details)
Timbral Spectral: (spectral features in a linear-frequency space)
SpectralCentroid: power-weighted average of the frequency of the bins in the linear power spectrum; useful for distinguishing musical instrument timbres
4 Ds for the harmonic, regularly spaced components of signals:
HarmonicSpectralCentroid
HarmonicSpectralDeviation
HarmonicSpectralSpread
HarmonicSpectralVariation
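SpectralCentroid and HarmonicSpectralCentroid share the same weighted-mean core. This sketch assumes that the linear power spectrum and, for the harmonic case, the per-frame harmonic peaks (frequencies and amplitudes) have already been extracted. Peak picking is outside its scope. Weighting harmonics by amplitude and averaging the per-frame values over the segment is our assumption about the details, and all names are hypothetical.

```python
def weighted_mean_frequency(freqs, weights):
    """Mean of `freqs` weighted by `weights` (weights must not sum to zero)."""
    return sum(f * w for f, w in zip(freqs, weights)) / sum(weights)


def spectral_centroid(freqs, power):
    """SpectralCentroid: weights are the power values of the linear-spectrum bins."""
    return weighted_mean_frequency(freqs, power)


def harmonic_spectral_centroid(frames):
    """frames: one (harmonic_freqs, harmonic_amplitudes) pair per analysis frame.

    Computes the amplitude-weighted centroid of the harmonics in each frame,
    then averages over the segment.
    """
    per_frame = [weighted_mean_frequency(f, a) for f, a in frames]
    return sum(per_frame) / len(per_frame)
```

Shifting power toward higher bins raises the centroid, which is why it tracks perceived "brightness" and helps separate instrument timbres.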
Low-level Features (details)
Spectral Basis: (low-dimensional projections of a high-dimensional spectral space to aid compactness and recognition)
AudioSpectrumBasis: a series of (time-varying / statistically independent) basis functions derived from the singular value decomposition of a normalized power spectrum
AudioSpectrumProjection: low-dimensional features of a spectrum after projection onto a reduced-rank basis
Independent subspaces of the spectra correlate strongly with different sound sources
Provide more salience using less space
Used with the Sound Classification and Indexing Description Tools
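The basis/projection pair can be sketched without external libraries. This sketch normalizes each spectrogram frame to unit length, finds the leading right singular vectors by power iteration on the Gram matrix (swapped in for a full SVD routine to stay dependency-free), and projects each frame onto them. The following are assumptions: the input is a list of frames, already on whatever scale (e.g. dB) the application uses. The optional independent-component step is omitted. Each frame's norm is kept as the first coefficient so its level is not lost; we believe the standard's extraction does something similar, but treat that as unverified. All names are hypothetical.

```python
import math


def normalize_rows(matrix):
    """Scale each frame (row) to unit L2 norm; also return the original norms."""
    rows, norms = [], []
    for row in matrix:
        n = math.sqrt(sum(v * v for v in row))
        norms.append(n)
        rows.append([v / n for v in row] if n > 0 else list(row))
    return rows, norms


def spectrum_basis(spectrogram, k, iterations=200):
    """Leading k right singular vectors of the row-normalized spectrogram.

    Computed as eigenvectors of the Gram matrix X^T X by power iteration with
    deflation; returns fewer than k vectors if the data has lower rank.
    """
    x, _ = normalize_rows(spectrogram)
    dim = len(x[0])
    gram = [[sum(r[i] * r[j] for r in x) for j in range(dim)] for i in range(dim)]
    basis = []
    for _ in range(k):
        v = [1.0 / math.sqrt(dim)] * dim
        for _ in range(iterations):
            w = [sum(gram[i][j] * v[j] for j in range(dim)) for i in range(dim)]
            for b in basis:  # deflation: remove directions already found
                dot = sum(wi * bi for wi, bi in zip(w, b))
                w = [wi - dot * bi for wi, bi in zip(w, b)]
            norm = math.sqrt(sum(wi * wi for wi in w))
            if norm < 1e-12:
                return basis
            v = [wi / norm for wi in w]
        basis.append(v)
    return basis


def spectrum_projection(spectrogram, basis):
    """Per frame: [frame norm, coefficient on basis vector 1, 2, ...]."""
    x, norms = normalize_rows(spectrogram)
    return [[n] + [sum(a * b for a, b in zip(row, vec)) for vec in basis]
            for row, n in zip(x, norms)]
```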
Low-level Features (details)
Silence: segment with no significant sound
Aids further segmentation of the audio stream, or serves as a hint not to process a segment
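MPEG-7 describes silent segments but does not prescribe how to find them. A simple energy-threshold detector over AudioPower-style frame values is one plausible approach. The `threshold` and `min_frames` parameters are hypothetical and would need tuning per application.

```python
def silence_segments(power, threshold, min_frames):
    """Return (start, end) frame indices (end exclusive) of runs where
    power < threshold that last at least min_frames frames."""
    segments, start = [], None
    for i, p in enumerate(power):
        if p < threshold:
            if start is None:
                start = i  # a quiet run begins
        else:
            if start is not None and i - start >= min_frames:
                segments.append((start, i))
            start = None
    # Close a quiet run that extends to the end of the signal.
    if start is not None and len(power) - start >= min_frames:
        segments.append((start, len(power)))
    return segments
```

Requiring a minimum run length keeps brief dips between notes or words from being reported as silence.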
High-level audio Description Tools
(Ds and DSs)
Exchange some generality for descriptive richness: there is only a smaller set of audio features (as compared to visual features) that can canonically represent a sound without domain-specific knowledge.
Audio Signature (DS)
Musical Instrument Timbre
Melody
General Sound Recognition and Indexing
Spoken Content
High-level audio Description Tools (details)
Audio Signature Description Scheme
Statistical summary of SpectralFlatness Ds
A unique content identifier for the purpose of robust automatic identification
e.g. audio fingerprinting
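The following sketch illustrates the fingerprinting idea: summarize per-band spectral flatness by its mean and variance over blocks of frames, then identify a query by nearest neighbour. The band layout, block length and Euclidean matching are assumptions for illustration. They are not the normative AudioSignature parameters or matching procedure, and the names are hypothetical.

```python
import math
from statistics import mean, pvariance


def band_flatness(band_power):
    """Geometric / arithmetic mean of the (strictly positive) power values in one band."""
    n = len(band_power)
    geometric = math.exp(sum(math.log(p) for p in band_power) / n)
    return geometric / (sum(band_power) / n)


def signature(frames, bands, block):
    """Mean and variance of each band's flatness over each block of frames.

    frames: list of power spectra; bands: list of (lo, hi) bin ranges.
    """
    sig = []
    for start in range(0, len(frames), block):
        chunk = frames[start:start + block]
        for lo, hi in bands:
            values = [band_flatness(frame[lo:hi]) for frame in chunk]
            sig.extend([mean(values), pvariance(values)])
    return sig


def identify(query_sig, database):
    """Key of the stored signature closest to query_sig (Euclidean distance).

    Assumes all signatures have the same length, i.e. were computed from
    excerpts with the same number of blocks.
    """
    return min(database, key=lambda key: math.dist(query_sig, database[key]))
```

Summarizing flatness statistically, rather than storing raw spectra, is what keeps the identifier compact and tolerant of small distortions such as the slight perturbation in a re-recorded excerpt.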
High-level audio Description Tools (details)
Musical Instrument Timbre Description Tools
HarmonicInstrumentTimbre: Ds such as the LogAttackTime Descriptor
PercussiveInstrumentTimbre: Ds such as the SpectralCentroid Descriptor
High-level audio Description Tools (details)
Melody Description Tools: efficient, robust and expressive melodic similarity matching
MelodyContour Description Scheme: terse, efficient melody contour / rhythm
MelodySequence Description Scheme: verbose, complete, expressive melody / rhythm
Interval encoding
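Interval encoding and a five-level contour can be sketched as follows. Pitches are given as MIDI note numbers (floats are allowed for sung input). The 50/250-cent quantization boundaries are assumptions for illustration, and edit distance stands in for whatever similarity measure an application actually uses. All names are hypothetical.

```python
def intervals(pitches):
    """Interval encoding: semitone steps between successive notes (MIDI numbers)."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def contour_level(semitones):
    """Quantize one interval to -2..2 (assumed thresholds: 50 and 250 cents)."""
    cents = abs(semitones) * 100
    level = 0 if cents < 50 else (1 if cents < 250 else 2)
    return level if semitones >= 0 else -level


def contour(pitches):
    """Terse contour: one level per interval."""
    return [contour_level(i) for i in intervals(pitches)]


def edit_distance(a, b):
    """Levenshtein distance, so an inserted or dropped note costs one edit."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


# Opening of "Twinkle, Twinkle, Little Star": C C G G A A G.
tune = [60, 60, 67, 67, 69, 69, 67]
print(intervals(tune), contour(tune))
```

Because both representations use intervals rather than absolute pitches, a transposed melody yields the same contour and therefore a distance of zero.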
High-level audio Description Tools (details)
General Sound Recognition and Indexing Description Tools:
SoundModel Description Scheme
SoundClassificationModel Description Scheme: a set of SoundModel DSs -> multi-way classifier
SoundModelStatePath Descriptor: indices to the states generated by a SoundModel for a segment
Can be applied immediately to sound effects: automatically index and segment sound tracks
Low -> mid -> high level analyses
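The SoundModel DS is built on a hidden Markov model, and a SoundModelStatePath is the sequence of states such a model assigns to a segment. The sketch below decodes that path with the Viterbi algorithm and builds a multi-way classifier by picking the model whose best path scores highest. Assumptions: one-dimensional Gaussian emissions instead of AudioSpectrumProjection feature vectors, and the Viterbi score as a stand-in for the full forward likelihood. All names are hypothetical.

```python
import math


def gaussian_log_emissions(features, means, variances):
    """log N(x; mean, var) for every frame x and every state."""
    return [[-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
             for m, v in zip(means, variances)] for x in features]


def viterbi(log_start, log_trans, log_emit):
    """Most likely state path and its log score.

    log_start[s], log_trans[s][t] (from s to t), log_emit[frame][s].
    """
    n_states = len(log_start)
    score = [log_start[s] + log_emit[0][s] for s in range(n_states)]
    back = []
    for frame in log_emit[1:]:
        ptr, new = [], []
        for t in range(n_states):
            best = max(range(n_states), key=lambda s: score[s] + log_trans[s][t])
            ptr.append(best)
            new.append(score[best] + log_trans[best][t] + frame[t])
        back.append(ptr)
        score = new
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(back):  # follow back-pointers to the first frame
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]


def classify(features, models):
    """models: name -> (log_start, log_trans, means, variances)."""
    def best_score(model):
        log_start, log_trans, means, variances = model
        emit = gaussian_log_emissions(features, means, variances)
        return viterbi(log_start, log_trans, emit)[1]
    return max(models, key=lambda name: best_score(models[name]))
```

The decoded path also segments the sound: a change of state marks a boundary, which is how a state path can help index a sound track.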
High-level audio Description Tools (details)
Spoken Content Description Tools: detailed description of words spoken within an audio stream
Indexing into and retrieval of an audio stream
Indexing of multimedia objects annotated with speech
Recall of audio/video data by memorable spoken events
Spoken Document Retrieval: e.g. finding where a character or person spoke a particular word; separating spoken documents
Annotated Media Retrieval: e.g. a photograph retrieved using a spoken annotation
Development
Currently under development:
New Audio Description Tools specified (MPEG-7 version 2):
MPEG-7 Audio COR.1 (currently at DCOR1)
MPEG-7 Amendment 1 (currently at FPDAM1)
Spoken Content
Audio Signal Quality
Audio Tempo
Currently proposed tools:
Low Level Descriptor for Audio Intensity
Low Level Descriptor for Audio Spectrum Envelope Evolution
Generic mechanism for data representation based on ‘modulation
decomposition’
MPEG-7 Audio-specific binary representation of descriptors
MPEG-7 version 1 Schedule
Call for Proposals: October 1998
Evaluation: February 1999
First version of Working Draft (WD): December 1999
Committee Draft (CD): October 2000
Final Committee Draft (FCD): February 2001
Final Draft International Standard (FDIS): July 2001
International Standard (IS): September 2001
MPEG-7 work plan: see Annex A of the MPEG-7 Overview (version 9),
http://www.chiariglione.org/mpeg/standards/mpeg7/mpeg-7.htm
Annotated Link Page / References
http://www.music.mcgill.ca/~damonli/611/611_w2.htm
All pictures taken from:
P. Salembier and O. Avaro, “MPEG-7: Multimedia Content Description Interface”, http://gps-tsc.upc.es/imatge/_Philippe/demo/MPEG21_MPEG7.pdf