ch7 (Quality Assesment).ppt


6-Speech Quality Assessment
Quality Levels
Subjective Tests
Objective Tests
Intelligibility
Naturalness
Quality Levels
Synthetic Quality (below 4.8 kbps)
Communication Quality (4.8 to 13 kbps)
Toll Quality (13 to 64 kbps)
Broadcast Quality (above 64 kbps)
Test Types
Type | Intelligibility | Naturalness
Subjective | DRT, MRT | MOS, DAM
Objective | None. Future ASR systems | AI, Global SNR, Seg. SNR, FW-Seg. SNR, Itakura Measure, WSSM
First Class
Subjective Intelligibility Tests
Diagnostic Rhyme Test (DRT)
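– The listener chooses between two CVC (consonant-vowel-consonant) words that differ only in the first consonant
– The first consonants are chosen to have specific properties
– Ex. hop - fop and than - dan
Modified Rhyme Test (MRT)
– The listener chooses among several CVC words that differ only in the first consonant
– Ex. cat, bat, rat, mat, fat, sat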
First Class (Cont’d)
Subjective Intelligibility tests
DRT is widely applicable and credible
In this test the listener hears the speech only once
$$ \mathrm{DRT}\,\% = \frac{N_{\text{correct}} - N_{\text{incorrect}}}{N_{\text{tests}}} \times 100 $$
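The DRT score formula can be sketched in a few lines of Python. The function name `drt_score` and its argument names are illustrative and not part of the slides. Subtracting the incorrect answers means that a listener who guesses at random in a two-choice test scores about 0 rather than 50.

```python
def drt_score(n_correct: int, n_incorrect: int, n_tests: int) -> float:
    """DRT score in percent: 100 * (N_correct - N_incorrect) / N_tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return 100.0 * (n_correct - n_incorrect) / n_tests


# Example: 90 correct and 10 incorrect responses out of 100 trials.
print(drt_score(90, 10, 100))
```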
Second Class
Subjective Naturalness tests
Mean Opinion Score (MOS)
– MOS is widely applicable and credible
– In this test the listener may hear the speech many times
Diagnostic Acceptability Measure (DAM)
– This test is very complex
Mean Opinion Score (MOS)
MOS scores are defined as follows:
Score | Speech Quality
1 | Not Acceptable
2 | Weak
3 | Medium
4 | Good
5 | Excellent
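As the name suggests, the MOS of a system is the mean of the individual listener ratings on the 1 to 5 scale. A minimal sketch, assuming the ratings are collected as a list of integers (the function name `mean_opinion_score` is hypothetical):

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average a collection of listener ratings on the 1..5 MOS scale."""
    ratings = list(ratings)
    if not ratings:
        raise ValueError("at least one rating is required")
    for r in ratings:
        if r not in (1, 2, 3, 4, 5):
            raise ValueError(f"rating {r!r} is outside the 1..5 scale")
    return float(mean(ratings))


# Example: four listeners rate the same utterance.
print(mean_opinion_score([4, 5, 3, 4]))
```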
Diagnostic Acceptability Measure (DAM)
This test is very complex
The test scores 19 different parameters, which fall into 3 main groups:
– Signal Quality
– Background Quality
– Total Quality
Objective Tests
These tests cannot be used for intelligibility,
because an automatic system cannot yet judge speech intelligibility
Objective tests can only be used for
speech naturalness
Objective Tests (Cont’d)
Articulation Index (AI)
Signal to Noise Ratio (SNR)
– Global (Classic) SNR
– Segmental SNR
– Frequency Weighted Segmental SNR
Articulation Index (AI)
AI assumes that the distortions in different frequency bands
are independent, and measures
signal quality in each band separately.
In each band it determines the percentage of
the signal that is perceptible to the listener
[Figure: frequency axis from 200 Hz to 6100 Hz divided into 20 bands]
Articulation index (Cont’d)
A signal is perceptible to the listener when it is:
– 1- Above the human hearing threshold
– 2- Below the human pain threshold
– 3- Above the masking noise level
– In each case, either condition 1 or condition 3 prevails
Articulation index (Cont’d)
In AI, the SNR is measured separately in each
band
$$ \mathrm{AI} = \frac{1}{20} \sum_{j=1}^{20} \frac{\min(\mathrm{SNR}_j, 30)}{30} $$
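The AI formula can be illustrated with NumPy. This is a sketch under the assumption that the per-band SNRs are already available in dB; the name `articulation_index` and the `cap_db` parameter are illustrative. Only the 30 dB cap from the slide formula is applied, so negative band SNRs are not clipped here.

```python
import numpy as np


def articulation_index(band_snr_db, cap_db=30.0):
    """Mean over bands of min(SNR_j, cap) / cap, with SNR_j in dB."""
    snr = np.asarray(band_snr_db, dtype=float)
    # Cap each band at cap_db, normalise to at most 1, then average.
    return float(np.mean(np.minimum(snr, cap_db) / cap_db))


# Example: 20 bands, each with an SNR of 15 dB.
print(articulation_index([15.0] * 20))
```

With all 20 bands at or above 30 dB the index reaches its maximum of 1.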
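Signal to Noise Ratio (SNR)
$$ \varepsilon(n) = s(n) - \hat{s}(n) $$
$$ E_s = \sum_{n=-\infty}^{\infty} s^2(n), \qquad E_\varepsilon = \sum_{n=-\infty}^{\infty} [s(n) - \hat{s}(n)]^2 $$
$$ \mathrm{SNR}_{\text{global}} = 10 \log_{10} \frac{E_s}{E_\varepsilon} = 10 \log_{10} \frac{\sum_{n=-\infty}^{\infty} s^2(n)}{\sum_{n=-\infty}^{\infty} [s(n) - \hat{s}(n)]^2} $$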
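A minimal sketch of the global SNR for finite-length signals, assuming the original signal s(n) and the processed signal ŝ(n) are time-aligned arrays of equal length (the function name `global_snr_db` is illustrative):

```python
import numpy as np


def global_snr_db(s, s_hat):
    """Global SNR in dB: 10 * log10(E_s / E_eps), with eps(n) = s(n) - s_hat(n)."""
    s = np.asarray(s, dtype=float)
    s_hat = np.asarray(s_hat, dtype=float)
    if s.shape != s_hat.shape:
        raise ValueError("signals must have the same length")
    e_signal = np.sum(s ** 2)            # E_s
    e_error = np.sum((s - s_hat) ** 2)   # E_eps
    return float(10.0 * np.log10(e_signal / e_error))


# Example: a constant signal reproduced with a 10 % amplitude error.
print(global_snr_db([1.0, 1.0, 1.0, 1.0], [0.9, 0.9, 0.9, 0.9]))
```

Because the energies are summed over the whole utterance, loud segments dominate the result.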
Segmental SNR
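$$ \mathrm{SNR}_{\text{seg}} = \frac{1}{M} \sum_{j=0}^{M-1} 10 \log_{10} \left[ \frac{\sum_{n=m_j-N+1}^{m_j} s^2(n)}{\sum_{n=m_j-N+1}^{m_j} [s(n) - \hat{s}(n)]^2} \right] $$
The term inside the sum over j is the SNR of the j'th frame
M : Number of frames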
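The segmental SNR averages per-frame SNRs in dB. The sketch below assumes non-overlapping frames of `frame_len` samples (the N implied by the summation limits, with m_j the last sample of frame j) and drops any incomplete trailing frame; both choices, and the function name, are assumptions made for illustration.

```python
import numpy as np


def segmental_snr_db(s, s_hat, frame_len):
    """Average of the per-frame SNRs (in dB) over non-overlapping frames."""
    s = np.asarray(s, dtype=float)
    s_hat = np.asarray(s_hat, dtype=float)
    n_frames = len(s) // frame_len       # M; an incomplete last frame is dropped
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    frame_snrs = []
    for j in range(n_frames):
        seg = slice(j * frame_len, (j + 1) * frame_len)
        e_signal = np.sum(s[seg] ** 2)
        e_error = np.sum((s[seg] - s_hat[seg]) ** 2)
        # A frame with zero error would give an infinite SNR; real systems
        # usually clamp frame SNRs, which is omitted in this sketch.
        frame_snrs.append(10.0 * np.log10(e_signal / e_error))
    return float(np.mean(frame_snrs))


# Example: two frames, one with a 10 % error and one with a 1 % error.
s = np.ones(8)
s_hat = np.array([0.9] * 4 + [0.99] * 4)
print(segmental_snr_db(s, s_hat, frame_len=4))
```

Averaging in the dB domain gives quiet frames the same weight as loud ones.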
Frequency Weighted Segmental SNR
$$ \mathrm{SNR}_{\text{fw-seg}} = \frac{1}{M} \sum_{j=0}^{M-1} 10 \log_{10} \left[ \frac{\sum_{k=1}^{K} W_{j,k} \, [E_{s,k}(m_j) / E_{\varepsilon,k}(m_j)]}{\sum_{k=1}^{K} W_{j,k}} \right] $$
K : Number of frequency bands
M : Number of frames
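A sketch of the frequency-weighted segmental SNR, assuming the band energies of the signal and of the error have already been computed for every frame and are passed in as M-by-K arrays. The function and argument names are illustrative. The logarithm is applied to the weighted average of the energy ratios, following the formula as written on the slide.

```python
import numpy as np


def fw_segmental_snr_db(e_signal, e_error, weights):
    """Frequency-weighted segmental SNR from per-frame, per-band energies.

    All three arguments are arrays of shape (M, K):
    M frames by K frequency bands.
    """
    e_signal = np.asarray(e_signal, dtype=float)
    e_error = np.asarray(e_error, dtype=float)
    weights = np.asarray(weights, dtype=float)
    ratio = e_signal / e_error                       # E_{s,k} / E_{eps,k}
    # Weighted average over the K bands of each frame.
    weighted = np.sum(weights * ratio, axis=1) / np.sum(weights, axis=1)
    # Convert each frame to dB, then average over the M frames.
    return float(np.mean(10.0 * np.log10(weighted)))


# Example: 2 frames, 2 bands; the second band of frame 1 has zero weight.
print(fw_segmental_snr_db([[100.0, 100.0], [10.0, 1000.0]],
                          [[1.0, 1.0], [1.0, 1.0]],
                          [[1.0, 1.0], [1.0, 0.0]]))
```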
Itakura Measure
$H(\omega)$ is the envelope spectrum
$$ S(\omega) = \mathcal{F}\{R(\tau)\}, \qquad S(\omega) = |X(\omega)|^2 $$
Uses an all-pole (AR) model
Itakura Measure (Cont’d)
$$ H(\omega) = \frac{1}{1 - \sum_{i=1}^{p} a_i e^{-j\omega i}} $$
The measure is based on the spectral difference
between the original signal and the signal under assessment
$a_i$ : Autoregressive coefficients
$K_i$ : Reflection coefficients
$R_i$ : Autocorrelation coefficients
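The three coefficient sets are linked: given the autocorrelation coefficients R_i, the Levinson-Durbin recursion yields both the autoregressive coefficients a_i and the reflection coefficients K_i. The sketch below is a standard textbook recursion, not something taken from the slides. It assumes the sign convention in which the all-pole denominator is 1 minus the sum of a_i e^{-jωi}; sign conventions for a_i and K_i vary between texts. The function names are illustrative.

```python
import numpy as np


def levinson_durbin(r, p):
    """Order-p AR model from autocorrelation values R_0 .. R_p.

    Returns (a, k, err): AR coefficients a_1..a_p, reflection
    coefficients K_1..K_p, and the final prediction error energy.
    """
    r = np.asarray(r, dtype=float)
    a = np.zeros(p + 1)   # a[0] is unused padding so that a[i] holds a_i
    k = np.zeros(p + 1)
    err = r[0]            # prediction error energy of the order-0 model
    for i in range(1, p + 1):
        acc = r[i] - np.dot(a[1:i], r[i - 1:0:-1])
        k[i] = acc / err
        prev = a.copy()
        a[i] = k[i]
        a[1:i] = prev[1:i] - k[i] * prev[i - 1:0:-1]
        err *= 1.0 - k[i] ** 2
    return a[1:], k[1:], err


def lpc_envelope(a, omega):
    """Evaluate H(omega) = 1 / (1 - sum_i a_i exp(-j omega i))."""
    a = np.asarray(a, dtype=float)
    omega = np.atleast_1d(np.asarray(omega, dtype=float))
    i = np.arange(1, len(a) + 1)
    denom = 1.0 - np.exp(-1j * np.outer(omega, i)) @ a
    return 1.0 / denom


# Example: autocorrelation of a first-order process with pole at 0.5.
a, k, err = levinson_durbin([1.0, 0.5, 0.25], p=2)
print(a, k, err)
print(np.abs(lpc_envelope(a, [0.0, np.pi])))
```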
Itakura Measure (Cont’d)
$$ d(g_s(m), g_{\hat{s}}(m)) = \frac{1}{M} \sum_{l=1}^{M} [g_s(l,m) - g_{\hat{s}}(l,m)]^2 $$
m : Index of frame
l : Index of coefficients
Itakura Measure (Cont’d)
$$ \tilde{d}_{lp}(\xi_s(m), \xi_{\hat{s}}(m')) = \left[ \frac{\sum_{l=1}^{M} W_{l,m,m'} \, [\xi_s(l,m) - \xi_{\hat{s}}(l,m')]^{\gamma}}{\sum_{l=1}^{M} W_{l,m,m'}} \right]^{1/\gamma} $$
$\xi_s(l,m)$ is the l'th parameter of the frame that
ends at the m'th sample
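A weighted parameter distance of this kind can be sketched as follows. The parameter vectors of the two frames and the weights are plain arrays here. The exponent is passed as `gamma`, and the absolute value of each difference is taken before raising it to that power; both are assumptions made so the sketch is well defined for any exponent. All names are illustrative.

```python
import numpy as np


def weighted_lp_distance(xi_s, xi_s_hat, weights, gamma=2.0):
    """Weighted L_gamma distance between two frame parameter vectors."""
    xi_s = np.asarray(xi_s, dtype=float)
    xi_s_hat = np.asarray(xi_s_hat, dtype=float)
    weights = np.asarray(weights, dtype=float)
    diff = np.abs(xi_s - xi_s_hat)       # assumption: magnitude of the difference
    # Weighted mean of diff**gamma, followed by the gamma-th root.
    mean_power = np.sum(weights * diff ** gamma) / np.sum(weights)
    return float(mean_power ** (1.0 / gamma))


# Example: only the third parameter differs, and it has double weight.
print(weighted_lp_distance([1.0, 2.0, 3.0], [1.0, 2.0, 5.0], [1.0, 1.0, 2.0]))
```

With unit weights and `gamma=2` the result is the square root of the unweighted mean squared coefficient difference.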
Weighted Spectral Slope Measure (WSSM)
$$ \Delta|s(k,m)| = |s(k+1,m)| - |s(k,m)| $$
$$ \Delta|\hat{s}(k,m)| = |\hat{s}(k+1,m)| - |\hat{s}(k,m)| $$
$|s(k+1,m)|$ and $|s(k,m)|$ are in dB.
$s(k,m)$ is the STFT of the k'th band of the frame
that ends at the m'th sample
$$ d_{\text{WSSM}}(|s(\omega,m)|, |\hat{s}(\omega,m)|) = K + \sum_{k=1}^{36} W_{k,m} \, [\Delta|s(k,m)| - \Delta|\hat{s}(k,m)|]^2 $$
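A sketch of the WSSM distance for a single frame, assuming the band magnitudes in dB are already available. The slide does not define the constant K, so it is exposed as the parameter `k_offset` with a default of 0. The number of bands is taken from the inputs rather than fixed at 36, and one more magnitude than weights is needed because each slope uses bands k and k+1. All names are illustrative.

```python
import numpy as np


def wssm_distance(mag_db_s, mag_db_s_hat, weights, k_offset=0.0):
    """Weighted spectral slope distance for one frame.

    mag_db_s, mag_db_s_hat : band magnitudes in dB, length len(weights) + 1
    weights                : one weight W_k per spectral slope
    k_offset               : the additive constant K (undefined on the slide)
    """
    slope_s = np.diff(np.asarray(mag_db_s, dtype=float))          # |s(k+1)| - |s(k)|
    slope_s_hat = np.diff(np.asarray(mag_db_s_hat, dtype=float))
    weights = np.asarray(weights, dtype=float)
    if slope_s.shape != weights.shape or slope_s_hat.shape != weights.shape:
        raise ValueError("need one more band magnitude than weights")
    return float(k_offset + np.sum(weights * (slope_s - slope_s_hat) ** 2))


# Example with three bands, giving two slopes per signal.
print(wssm_distance([0.0, 1.0, 3.0], [0.0, 2.0, 3.0], [1.0, 2.0], k_offset=0.5))
```

Because only slopes are compared, adding the same dB offset to every band of one signal leaves the summed term unchanged.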