Perceived Speech Quality
Prediction for VoIP Networks
Lingfen Sun
Emmanuel Ifeachor
University of Plymouth
United Kingdom
{L.Sun; E.Ifeachor}@plymouth.ac.uk
ICC 2002, New York, USA
Outline
• Introduction
• Simulation system
• Perceived speech quality analysis
– Impact of loss on speech quality
– Impact of talkers on speech quality
• Perceived speech quality prediction using
Neural Network (NN) method
• Conclusions and future work
Introduction
• Speech quality measurement
– Subjective method (Mean Opinion Score -- MOS)
– Objective methods
• Intrusive methods (e.g. ITU P.862 PESQ)
• Nonintrusive methods (e.g. E-model, NN model)
• Why do we need to predict speech quality?
– For online monitoring of VoIP calls
– For Quality of Service (QoS) control for VoIP
applications
How to predict speech quality?
• E-model
– All impairments are mapped to the R-scale (R → MOS)
– Principle: "Psychological factors on the
psychological scale are additive"
– Static and computational model.
• NN-model
– To learn the non-linear relationships between
network impairments and perceived speech
quality
– To adapt to dynamic IP network conditions.
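
The R → MOS mapping mentioned for the E-model can be sketched as follows. This is a minimal sketch using the conversion formula published in ITU-T G.107; the function name `r_to_mos` is an illustrative choice and does not come from the slides.

```python
def r_to_mos(r: float) -> float:
    """Map an E-model rating factor R to an estimated MOS (ITU-T G.107 conversion)."""
    if r <= 0:
        return 1.0   # lower bound of the MOS estimate
    if r >= 100:
        return 4.5   # upper bound of the MOS estimate
    # Cubic mapping used between the two bounds.
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


for r in (50, 70, 80, 93.2):
    print(f"R = {r:5.1f} -> MOS = {r_to_mos(r):.2f}")
```

The mapping is a fixed formula, which is one way to read the "static and computational" remark above: it does not adapt to measured network conditions.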
Previous work
• NN databases are based on subjective tests only
• As subjective testing is time-consuming, costly and
stringent, available databases are limited and
cannot cover all possible scenarios
• Only a limited number of subjects attended the MOS
tests
• Limited number of codecs
• Talker dependency has not been considered.
Main objectives of work
• To undertake a fundamental investigation of
the impact of packet loss on perceived
speech quality using an objective
measurement algorithm (e.g. PESQ)
• To investigate the impact of different talkers
on perceived speech quality
• To develop a robust NN model for speech
quality prediction based on PESQ.
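Simulation system structure
[Block diagram: reference speech → simulated VoIP system
(encoder → loss simulator → decoder) → degraded speech.
The quality measure (PESQ) compares reference and degraded
speech to give the measured MOS. Parameter extraction feeds
the NN model, which gives the predicted MOS.]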
• Reference speech is from a speech database
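Loss Simulator
• 2-state Gilbert model to simulate packet loss
characteristics
[State diagram: state 0 (no-loss) and state 1 (loss).
Transitions: 0 → 1 with probability p, 0 → 0 with 1 – p,
1 → 1 with probability q, 1 → 0 with 1 – q.]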
• Network packet loss + late arrival loss due to jitter
• Unconditional loss probability (ulp, or average loss
rate), ulp = p / (p + 1 – q)
• Conditional loss probability (clp), clp = q to reflect
burst loss features
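
The 2-state Gilbert model above can be sketched in a few lines. This is a minimal illustration, not the authors' simulator: the function names, the choice to start in the no-loss state, and the example values of p and q are all assumptions.

```python
import random


def gilbert_loss_trace(n_packets, p, q, seed=0):
    """Return a list of booleans (True = packet lost) from a 2-state Gilbert model.

    p: probability of moving from no-loss (state 0) to loss (state 1)
    q: probability of staying in the loss state, i.e. clp
    """
    rng = random.Random(seed)
    lost = False  # assumption: the trace starts in the no-loss state
    trace = []
    for _ in range(n_packets):
        # The chance that this packet is lost depends only on the previous packet.
        lost = rng.random() < (q if lost else p)
        trace.append(lost)
    return trace


def expected_ulp(p, q):
    """Stationary loss probability of the model: ulp = p / (p + 1 - q)."""
    return p / (p + 1.0 - q)


trace = gilbert_loss_trace(200_000, p=0.05, q=0.4, seed=1)
measured_ulp = sum(trace) / len(trace)
# clp estimate: among packets that follow a lost packet, the fraction also lost.
after_loss = [cur for prev, cur in zip(trace, trace[1:]) if prev]
measured_clp = sum(after_loss) / len(after_loss)
print(f"expected ulp = {expected_ulp(0.05, 0.4):.4f}, measured ulp = {measured_ulp:.4f}")
print(f"expected clp = 0.4000, measured clp = {measured_clp:.4f}")
```

Changing `seed` moves the losses to different positions in the trace while keeping the same long-run ulp and clp.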
Impact of loss on speech quality
• How do packet loss and loss burstiness affect
speech quality?
• How does packet size affect speech quality?
• How does codec affect speech quality?
→ Using PESQ to calculate perceived MOS score
→ Average over 300 different random "seeds" to
reduce the impact from different loss locations
Bursty loss analysis (G.729)
[Plot: MOS (PESQ) vs. packet loss (ulp, %) for G.729,
with curves for clp = 10%, 40% and 70%]
Bursty loss analysis (G.723.1)
[Plot: MOS (PESQ) vs. packet loss (ulp, %) for G.723.1,
with curves for clp = 10%, 40% and 70%]
Bursty loss effect
• clp has an obvious impact on the perceived
speech quality even for the same average
loss rate (ulp)
• When burst loss increases (clp increasing),
the MOS score decreases and the variation
of the MOS score also increases.
→ Identify ulp and clp as input parameters
related to loss for NN analysis
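
To make the link between clp and burstiness concrete, the sketch below converts a target (ulp, clp) pair into the Gilbert transition probabilities and reports the mean loss burst length. It assumes the 2-state model defined earlier (ulp = p / (p + 1 – q), clp = q); the helper names are illustrative and not from the slides.

```python
def gilbert_params(ulp, clp):
    """Solve ulp = p / (p + 1 - q), with q = clp, for the transition probability p."""
    if not (0.0 <= ulp < 1.0 and 0.0 <= clp < 1.0):
        raise ValueError("ulp and clp must lie in [0, 1)")
    q = clp
    p = ulp * (1.0 - q) / (1.0 - ulp)
    if p > 1.0:
        raise ValueError("this (ulp, clp) pair cannot be produced by the 2-state model")
    return p, q


def mean_burst_length(clp):
    """Mean run of consecutive losses: a geometric run that continues with probability clp."""
    return 1.0 / (1.0 - clp)


for clp in (0.1, 0.4, 0.7):
    p, q = gilbert_params(0.10, clp)
    print(f"ulp=10%, clp={clp:.0%}: p={p:.4f}, mean burst={mean_burst_length(clp):.2f} packets")
```

For a fixed ulp, a higher clp means fewer but longer loss bursts, which is the condition under which the slides report lower and more variable MOS.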
Impact of packet size (G.729)
Impact of packet size (G.723.1)
Impact of packet size on quality
• Packet size has, in general, no obvious influence
on speech quality for a given loss rate.
• Variation in speech quality for the same network
loss rate depends on packet size and codec.
• Variation in quality due to loss location is the main
obstacle in the prediction of speech quality
→ Consider loss only during active talkspurt
frames (not silence frames or SID frames).
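
One way to restrict the loss statistics to talkspurt frames is sketched below. The slides do not say how the authors computed these values, so this is only an assumed approach: `lost` and `active` are hypothetical per-frame inputs, and active frames on either side of a silence gap are treated as adjacent for simplicity.

```python
def vad_loss_stats(lost, active):
    """Return (ulp, clp) computed over active (talkspurt) frames only.

    lost[i]   -- True if frame i was lost
    active[i] -- True if a voice activity detector marked frame i as speech
    """
    if len(lost) != len(active):
        raise ValueError("lost and active must have the same length")
    # Drop silence/SID frames, keeping the order of the remaining frames.
    active_lost = [is_lost for is_lost, is_active in zip(lost, active) if is_active]
    if not active_lost:
        return 0.0, 0.0
    ulp = sum(active_lost) / len(active_lost)
    # clp: among active frames that follow a lost active frame, the fraction also lost.
    followers = [cur for prev, cur in zip(active_lost, active_lost[1:]) if prev]
    clp = sum(followers) / len(followers) if followers else 0.0
    return ulp, clp


lost = [False, True, True, False, True, False, False, True]
active = [True, True, True, True, False, False, True, True]
ulp_vad, clp_vad = vad_loss_stats(lost, active)
print(f"ulp(VAD) = {ulp_vad:.2f}, clp(VAD) = {clp_vad:.2f}")
```

In this toy trace the loss at index 4 falls in a silence frame, so it does not count towards either statistic.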
Impact of talker on speech quality
• To investigate whether difference in talker
(male or female) has an effect on perceived
speech quality
• TIMIT data set and ITU data set are used for
investigation
Talker Dependency
[Plot: MOS vs. packet loss (ulp, %) at clp = 10%, one
curve per sample: fdaw0, flmc0, fetb0, mrjt0, mjhi0, mjma0]
• For 3 male and 3 female samples
Talker Dependency (cont.)
[Plot: MOS vs. packet loss (ulp, %) at clp = 10%, one
curve per sample: fm_1, fm_2, fm_3, mf_1, mf_2, mf_3]
• For 6 mixed male and female samples
Impact of talker on MOS
• Impact of different talkers on perceived
speech quality appears to depend mainly on
the gender of the talker (male or female).
• The quality for the female talker tends to be
worse than that of the male talker for the
same network impairments.
→ Identify gender (male and female) as one of
the input parameters for NN analysis.
Quality prediction based on NN
• Developed a neural network model (using
Stuttgart Neural Network Simulator).
• Identified four variables as inputs to NN
– Codec type (G.729, G.723.1 and AMR)
– Gender (male and female)
– Unconditional loss probability → ulp(VAD)
– Conditional loss probability → clp(VAD)
• One output (MOS)
NN structure (for a 4-5-1 net)
[Diagram: 4 input nodes (gender, codec type, ulp(VAD),
clp(VAD)), 5 hidden nodes, 1 output node (MOS)]
• a three-layer, feed-forward, neural network architecture
• standard Backpropagation learning algorithm
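
A 4-5-1 feed-forward network with standard backpropagation can be sketched in plain Python as below. This is a stand-in for illustration, not the authors' SNNS configuration: the sigmoid activations, weight initialisation, learning rate, input encoding and MOS / 5 target scaling are all assumptions, and the toy patterns are made up purely to exercise the code.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class Net451:
    """4 inputs -> 5 sigmoid hidden units -> 1 sigmoid output, trained by plain backprop."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        # Each hidden row holds 4 input weights followed by a bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(5)] for _ in range(5)]
        # 5 hidden-to-output weights followed by a bias.
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(6)]

    def forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + row[4])
                  for row in self.w_hidden]
        out = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden)) + self.w_out[5])
        return hidden, out

    def train_step(self, x, target, lr=0.5):
        """One online gradient step on the squared error; returns the error before the update."""
        hidden, out = self.forward(x)
        delta_out = (out - target) * out * (1.0 - out)
        # Hidden deltas are computed with the output weights before they change.
        delta_hidden = [delta_out * self.w_out[j] * hidden[j] * (1.0 - hidden[j])
                        for j in range(5)]
        for j in range(5):
            self.w_out[j] -= lr * delta_out * hidden[j]
        self.w_out[5] -= lr * delta_out
        for j, row in enumerate(self.w_hidden):
            for i in range(4):
                row[i] -= lr * delta_hidden[j] * x[i]
            row[4] -= lr * delta_hidden[j]
        return 0.5 * (out - target) ** 2


# Made-up patterns (not PESQ data): [gender, codec, ulp, clp] in [0, 1], target = MOS / 5.
toy = [
    ([0.0, 0.0, 0.0, 0.1], 0.8),
    ([1.0, 0.0, 0.4, 0.9], 0.3),
    ([0.0, 1.0, 0.2, 0.5], 0.5),
    ([1.0, 1.0, 0.1, 0.1], 0.7),
]
net = Net451(seed=1)
errors = [sum(net.train_step(x, t) for x, t in toy) for _ in range(500)]
predicted_mos = 5.0 * net.forward(toy[0][0])[1]
print(f"first-epoch error {errors[0]:.4f}, last-epoch error {errors[-1]:.4f}")
```

With only four inputs and five hidden units the model has 31 weights, which keeps it small enough to train on a few hundred patterns.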
NN database generation
• Codec: G.729, G.723.1 (6.3 kb/s), AMR
(12.2 kb/s)
• Gender: Male and female
• ulp : 0, 10, 20, 30 and 40%
• clp : 10, 50 and 90%
• Packet size: 1 to 5
→ A total of 362 samples (patterns) were
generated based on PESQ. 70% were chosen as
the training set and 30% as the test dataset.
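
The 70/30 partition can be sketched as follows. The slides do not state how patterns were assigned to each set, so the random shuffle, the seed and the helper name `split_patterns` are assumptions; integer indices stand in for the 362 patterns.

```python
import random


def split_patterns(patterns, train_fraction=0.7, seed=0):
    """Shuffle the patterns and split them into training and test sets."""
    shuffled = list(patterns)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


train_set, test_set = split_patterns(range(362))
print(len(train_set), "training patterns,", len(test_set), "test patterns")
```

Fixing the seed keeps the split reproducible, so training and test results can be compared across runs.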
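NN training process
[Block diagram: reference speech passes through the
simulated VoIP system to give degraded speech; the quality
measure (PESQ) gives the measured MOS. Network, codec and
speech parameters feed the backprop network, which gives
the predicted MOS. The difference between measured and
predicted MOS is fed back to train the network.]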
Predicted MOS vs Measured MOS
[Two scatter plots of predicted MOS against measured MOS
(both axes 0 to 5): one for the training dataset and one
for the test dataset]
Train: ρ = 0.967, r = 0.12
Test: ρ = 0.952, r = 0.15
Validation of the NN model
• Generated a validation dataset from other
talkers and different network loss conditions
(total 210 samples)
• Obtained ρ = 0.946, r = 0.19 for the validation
dataset using a trained 4-5-1 neural network.
→ This suggested that the neural network
model works well for speech quality
prediction in general.
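
Agreement between predicted and measured MOS is typically summarised with a correlation coefficient and an error figure. The sketch below assumes the first reported value is a Pearson correlation coefficient; the slides do not define their second measure, so the RMSE here is only one common choice and may not be what they report. The MOS values are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rmse(xs, ys):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Made-up values, not results from the slides.
measured = [3.8, 3.1, 2.6, 2.2, 1.9]
predicted = [3.7, 3.2, 2.5, 2.4, 1.8]
print(f"correlation = {pearson(measured, predicted):.3f}, RMSE = {rmse(measured, predicted):.3f}")
```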
Conclusions
• Investigated the impact of packet loss, codec and
talker on perceived speech quality based on PESQ
• The loss pattern, loss burstiness and the gender of
the talker have an impact on speech quality.
• Packet size has, in general, no obvious influence on
speech quality, but the deviation in speech quality
depends on packet size and codec.
• Based on codec, bursty loss rate and gender of the
talker, an NN model was developed successfully for
speech quality prediction.
Future work
• Extend to conversational speech quality
prediction to account for the impact of delay.
• Use real VoIP trace data instead of generated
data from Gilbert loss model.
• Use more robust neural networks.
• Application to QoS Control in VoIP systems.