Assessment of a Video Quality Metric presenter Sarnoff


Results of the ATIS/T1A1.1
Ad Hoc Group on Full-Reference Video
Quality Metrics (FR-VQM)
VSF Meeting
October 3, 2001
John Pearson
Sarnoff Corporation
Take Home Messages
• Tariffs can now include Visual Quality
Metrics (Full Reference)
• The basis for this is a family of 4
Technical Reports by ATIS/T1A1
• The T1A1 approach is extensible to
additional Visual Quality Metrics, and
does NOT establish a Standard
Outline
• Why is measuring Visual Quality
important?
• Why is measuring Visual Quality hard?
• International Standards for VQMs
• T1A1 Technical Reports
FR-VQM Needs of US Telecom
[Diagram: Site of Video Origination (e.g., Denver) → Company A, uses VQM-A → Transfer Between Network A & B → Company B, uses VQM-B → Site of Video Consumption (e.g., Mexico City). Quality labels in the diagram: Q-A, Q-B, Q-??]
• Digital video processing can create objectionable noise
• End-to-End QoS across the networks of multiple companies
requires agreement on Quality at Transfer Points (Tariffs)
• Tariffs require ANSI-sanctioned technical documentation
Digital Video Creates “Patterned” Noise
... Human visual response to patterned noise highly non-linear ...
Random “Analog” Noise: MSE = 27.10
Blocky “Digital” Noise: MSE = 21.26
Measures like MSE that are suitable for analog noise no longer work for digital noise
Patterned noise in the sky is much more perceptible, even though it is much smaller in terms of pixel differences
[Figure panels: Source Frame, Difference Map, Codec Frame]
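The point that MSE ignores how errors are arranged in space can be illustrated with a minimal sketch. It does not reproduce the slide's images or its MSE values; the flat 32x32 frame, the noise amplitude, and the helper names `flat_frame`, `add_random_noise`, and `add_blocky_noise` are all assumptions made for illustration.

```python
import random

def mse(ref, test):
    """Mean-squared error between two equally sized 2-D luma arrays."""
    total = 0.0
    count = 0
    for ref_row, test_row in zip(ref, test):
        for r, t in zip(ref_row, test_row):
            total += (r - t) ** 2
            count += 1
    return total / count

def flat_frame(width, height, level=128.0):
    """Hypothetical test frame: a uniform grey field (think of a clear sky)."""
    return [[level] * width for _ in range(height)]

def add_random_noise(frame, amplitude, seed=0):
    """Each pixel is shifted up or down by `amplitude`, independently at random."""
    rng = random.Random(seed)
    return [[p + rng.choice((-amplitude, amplitude)) for p in row] for row in frame]

def add_blocky_noise(frame, amplitude, block=8):
    """Each block x block tile is shifted as a whole; the sign alternates
    in a checkerboard pattern, giving visible block edges."""
    return [
        [p + (amplitude if (x // block + y // block) % 2 == 0 else -amplitude)
         for x, p in enumerate(row)]
        for y, row in enumerate(frame)
    ]

source = flat_frame(32, 32)
random_version = add_random_noise(source, 5.0)
blocky_version = add_blocky_noise(source, 5.0)

# Every pixel differs by exactly 5 in both cases, so both MSE values are 25.0,
# although one error pattern is unstructured and the other is tiled.
print(mse(source, random_version), mse(source, blocky_version))
```

Because both error fields have the same per-pixel magnitude, MSE cannot separate them; any perceptual difference between the two has to come from a metric that models spatial structure.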
Visual Quality Metrics
... correlate well across scene types, unlike MSE ...
[Figure: two scatter plots of DSIS Rating (vertical axis, 0 to 4, labeled from “Not perceptible” through “Barely”, “Mildly”, and “Clearly” to “Extremely perceptible”) for the sequences Bus, Cosby, Flower Garden, Mobile Calendar, and NBA.
Panel “Visual Discrimination Model”: horizontal axis VDM Fidelity Metric; correlation coefficient = 0.96.
Panel “Mean-Squared Error”: horizontal axis MSE Fidelity Metric; correlation coefficient = 0.39.
Notes on the plot: mean of 80 trials for 20 subjects; bars show 5% confidence intervals.]
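The correlation coefficients quoted for the two plots are the kind of figure produced by a Pearson correlation between metric values and mean subjective ratings. A minimal sketch follows; the data points are hypothetical and are not the values behind the slide.

```python
import math

def pearson(xs, ys):
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical (metric value, mean DSIS rating) pairs, for illustration only.
metric_scores = [9.0, 11.0, 13.0, 15.0, 17.0]
dsis_ratings = [0.5, 1.2, 2.1, 2.8, 3.6]
print(round(pearson(metric_scores, dsis_ratings), 3))
```

A value near 1 means the metric ranks and spaces the clips much as viewers do; pooling several scene types into one calculation is what exposes a metric that only tracks quality within a single scene.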
Vital Role of Subjective Database
• Goal of VQMs is to approximate
subjective quality assessments (SQA)
• The relevance of the SQA depends on:
– Test sequences (SRCs)
– Distortion generators (HRCs)
– Viewing conditions and testing protocols
• Producing a relevant SQA is hard
Three Kinds of VQMs
• Full Reference (FR)
– the complete reference video signal is available. This is a
double-ended method and is the subject of this Technical Report.
• Reduced Reference (RR)
– only reduced video reference information is available. This is also a
double-ended method.
• No Reference (NR)
– no reference video signal or information is available. This is a
single-ended method.
• It is generally believed that the FR method will provide the most
accurate measurement results while the RR and NR methods
will be more convenient for QoS monitoring.
• The T1A1 Technical Reports concern FR methods
Full-Reference VQMs with Normalization
[Block diagram: Reference Video → System Under Test → Processed Video → Normalization → Normalized Processed Video → Measurement Method → Picture Quality Rating (PQR). Normalization also outputs the Reported Normalization Adjustments.]
International Standards Progress
• VQEG may be several years from
recommending a FR-VQM standard to ITU
• It’s possible that no single FR-VQM will be a
clear “winner”
• The FR-VQM field is young, and significant,
steady improvements are expected over the
next decade
• It’s possible that several different FR-VQMs
may gain industry acceptance
T1A1.1 FR-VQM Strategy
… an extensible family of TRs for FR-VQMs, enabling
Industry to move ahead without Standards ...
• Provide guidelines for how Industry can
– specify its specific FR-VQM needs
– assess the suitability of existing documented FR-VQMs
– drive the development by FR-VQM proponents of new/improved
FR-VQM algorithms and products
– inter-operate with different FR-VQMs
• Provide guidelines for how FR-VQMs can be
– documented in algorithms, accuracy and limitations
– quantitatively cross-calibrated to one another
• Extensible framework enabling addition of FR-VQMs
– Start by specifying two already disclosed FR-VQMs
– Stimulates continued FR-VQM innovation
Primary Contributors
David Fibush          Tektronix
Dick Streeter         CBS
Alexander Woerner     Rohde & Schwarz
Harley Myler          University of Central Florida
Stephen Wolf          NTIA
Stephen Voran         NTIA
Margaret Pinson       NTIA
Ahmad Ansari          SBC
Pierre Costa          SBC
Debra Phillips        SBC
Michael H. Brill      Sarnoff Corporation
John Pearson          Sarnoff Corporation (co-chair)
Jeffrey Lubin         Sarnoff Corporation
John Grigg            Qwest (co-chair)
Greg Cermak           Verizon
Phil Corriveau        CRC
A. B. Watson          NASA Ames Research Center
Family of Technical Reports
• TR A1: Accuracy and Cross-Calibration (Mike Brill, Sarnoff)
– defines accuracy (statistical analysis), limitations of a FR-VQM
– defines transformation to common scale, for cross-calibration with other
applicable FR-VQMs
• TR A2: Normalization Methods (David Fibush, Tektronix)
– applied to source and processed video before VQM calculation
– e.g., spatial/temporal registration, gain/level offset calibration, ...
– may utilize special test signals
• TR A3: Peak Signal to Noise Ratio (Steve Wolf, NTIA)
– specifies the PSNR VQM, following TR A1 and TR A2 guidelines
• TR A4: Objective Perceptual FR-VQM Using a JND-Based
Full Reference Technique (David Fibush, Mike Brill)
– specifies the JND-based FR-VQM, following TR A1 and TR A2 guidelines
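The gain/level offset calibration mentioned under TR A2 can be sketched as a least-squares fit between reference and processed luma samples. This is a generic illustration, not the procedure specified in TR A2 (which these slides do not reproduce); the function names and the sample values are assumptions.

```python
def estimate_gain_offset(ref, proc):
    """Least-squares fit of proc ~= gain * ref + offset over paired samples.

    `ref` and `proc` are equal-length lists of luma values taken from
    spatially and temporally registered frames (registration is assumed
    to have been done already).
    """
    n = len(ref)
    mean_r = sum(ref) / n
    mean_p = sum(proc) / n
    var_r = sum((r - mean_r) ** 2 for r in ref)
    cov = sum((r - mean_r) * (p - mean_p) for r, p in zip(ref, proc))
    gain = cov / var_r
    offset = mean_p - gain * mean_r
    return gain, offset

def remove_gain_offset(proc, gain, offset):
    """Undo the estimated gain and level offset before the VQM is computed."""
    return [(p - offset) / gain for p in proc]

# Hypothetical samples: the system under test scaled luma by 0.9 and added 4.
reference = [16.0, 64.0, 128.0, 235.0]
processed = [0.9 * r + 4.0 for r in reference]

gain, offset = estimate_gain_offset(reference, processed)
normalized = remove_gain_offset(processed, gain, offset)
print(round(gain, 3), round(offset, 3))
```

Reporting `gain` and `offset` alongside the quality score keeps the correction visible, which matches the idea of reported normalization adjustments in the block diagram earlier in the deck.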
TR A1 Defines Basic Methods:
• “How to” specify VQM accuracy
– with respect to subjective assessments
– based on defined statistical analysis
• “How to” specify VQM scope/limitations
– type of scene content (“signal”)
• high/low motion, color/b&w, interlaced/progressive
– type/severity of artifacts (“noise”)
• e.g., encoding techniques, bit-rates, blurring, blockiness
– subjective testing characteristics
• behavior with viewing distance, resolution, gamma, …
• expert vs non-expert viewers
• “How to” cross-calibrate VQMs
– determination of mathematical transformation relating one
VQM’s outputs to another’s
[Diagram labels: SCOPE (“Works well, & has been well tested here”), LIMITATIONS]
VQEG Database: “SRCs”
Sequence            Characteristics
Balloon-pops        film, saturated color, movement
NewYork 2           masking effect, movement
Mobile&Calendar     available in both formats, color, movement
Betes_pas_betes     color, synthetic, movement, scene cut
Le_point            color, transparency, movement in all directions
Autumn_leaves       color, landscape, zooming, waterfall movement
Football            color, movement
Sailboat            almost still
Susie               skin color
Tempete             color, movement
Table B3. Test sequences used to determine test factors, coding technologies and applications
for which the PQR method has shown the accuracy specified in section 1.3.4
VQEG Database: “HRCs”
See the VQEG final report (ITU-T COM9-80, June 2000 – see Annex A) for further details
regarding the data in these tables. All data is for the 525-line system.
BIT RATE                                     RES            METHOD    COMMENTS
2 Mb/s                                       ¾ resolution   mp@ml     This is horizontal resolution reduction only
2 Mb/s                                       ¾ resolution   sp@ml
4.5 Mb/s                                                    mp@ml     With errors
3 Mb/s                                                      mp@ml     With errors
4.5 Mb/s                                                    mp@ml
3 Mb/s                                                      mp@ml
4.5 Mb/s                                                    mp@ml     Composite NTSC and/or PAL
6 Mb/s                                                      mp@ml
8 Mb/s                                                      mp@ml     Composite NTSC and/or PAL
8 & 4.5 Mb/s                                                mp@ml     Two codecs concatenated
19 Mb/s - NTSC - 19 Mb/s - NTSC - 12 Mb/s                   422p@ml   NTSC 3 generations
50-50-…-50 Mb/s                                             422p@ml   7th generation with shift / I frame
19-19-12 Mb/s                                               422p@ml   3rd generation
n/a                                                         n/a       Multi-generation Betacam with drop-out compensation (4 or 5, composite/component)
Table B1. Test factors, coding technologies and applications for which the PQR method
has shown the accuracy specified in section 1.3.4.
JND/PQR & PSNR Limitations: no H.263
BIT RATE    RES   METHOD   COMMENTS
1.5 Mb/s    CIF   H.263    Full Screen
768 kb/s    CIF   H.263    Full Screen
Other: The PQR method specified in this Technical Report is not
appropriate for video conferencing applications that repeat
fields or do not meet the latency and delay requirements of
the video classes. In addition, the PQR method is only
applicable to typical broadcast transmission systems with
very low error rates such as those included in the VQEG
tests.
Algorithm Documentation: JND/PQR
[Flow diagram, luma path: Y (from Front-End processing) → Luma Compression → Pyramid Decomposition (4 levels: Level 0, Level 1, Level 2, Level 3) → Spatial Filtering and Contrast Computation / Temporal Filtering and Contrast Computation → Contrast Energy Masking → Luma JND Map, with a branch “To Chroma Processing”]
Stripping for JND/PQR Registration
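Algorithm Documentation: PSNR
\[
\mathrm{PSNR}(t_n) = 20 \log_{10}\!\left(
\frac{Y_{\mathrm{peak}}}
{\sqrt{\dfrac{1}{N_h N_v}
\displaystyle\sum_{j=O_h}^{O_h+N_h-1}\;\sum_{i=O_v}^{O_v+N_v-1}
\left[\,Y_{\mathrm{ref}}(i,j,t_n-d) - \hat{Y}_{\mathrm{proc}}(i,j,t_n)\right]^{2}}}
\right)
\]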
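The PSNR definition on this slide (20·log10 of the peak luma over the root-mean-square difference between the delayed reference frame and the normalized processed frame, taken over a window of N_v rows and N_h columns starting at offsets O_v and O_h) can be sketched as follows. Representing time as a frame index, so that t_n − d becomes `n - d`, and the default `y_peak=255.0` are assumptions of this sketch, not values taken from the Technical Report.

```python
import math

def psnr(ref_frames, proc_frames, n, d, o_v, n_v, o_h, n_h, y_peak=255.0):
    """PSNR in dB for processed frame n against reference frame n - d.

    Frames are 2-D lists indexed as frame[i][j] with i the row (vertical)
    and j the column (horizontal). Only the window of n_v rows by n_h
    columns starting at (o_v, o_h) is compared.
    """
    ref = ref_frames[n - d]
    proc = proc_frames[n]
    squared_error = 0.0
    for i in range(o_v, o_v + n_v):
        for j in range(o_h, o_h + n_h):
            diff = ref[i][j] - proc[i][j]
            squared_error += diff * diff
    mean_squared_error = squared_error / (n_h * n_v)
    if mean_squared_error == 0.0:
        return math.inf  # identical windows: PSNR is unbounded
    return 20.0 * math.log10(y_peak / math.sqrt(mean_squared_error))

# Hypothetical 4x4 frames: the processed frame is uniformly 10 levels brighter.
ref_frames = [[[100.0] * 4 for _ in range(4)]]
proc_frames = [[[110.0] * 4 for _ in range(4)]]
print(round(psnr(ref_frames, proc_frames, n=0, d=0, o_v=0, n_v=4, o_h=0, n_h=4), 2))
```

The window offsets let the calculation skip border rows and columns that a codec may blank or that registration may have shifted out of the valid picture area.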







Normalization Requirements
JND/PQR
Parameter                          Normalization Tolerance
Luminance level                    < 0.2 dB of peak white
Color-difference level             < 0.2 dB of max allowed excursion
Luminance DC level                 < 0.5 % of peak white
Color-difference DC level          < 0.5 % of max allowed excursion
Channel-to-channel delay offset    < 2 ns
Horizontal pixel shift             < 0.1 pixel
Vertical line shift                0 lines (limited to integer line shifts)
Temporal shift                     0 fields
Table 1. Normalization parameters and tolerance
PSNR
Parameter                          Normalization Tolerance
Luminance gain                     < 0.2 dB
Luminance DC level                 < 0.5 % of signal max
Horizontal pixel shift             < 0.1 pixel
Vertical line shift                < 0.1 line
This tolerance implies field-accurate temporal registration.
Table 1. Normalization Requirements for PSNR
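A residual-error check against the PSNR tolerances listed above might look like the sketch below. It assumes that the dB tolerance refers to 20·log10 of the residual amplitude gain ratio, and that the residual gain, DC offset, and shifts have already been measured by some other means; the function names are hypothetical.

```python
import math

def gain_error_db(gain_ratio):
    """Magnitude of a residual amplitude gain error, expressed in dB."""
    return abs(20.0 * math.log10(gain_ratio))

def within_psnr_tolerances(gain_ratio, dc_offset, signal_max,
                           h_shift_pixels, v_shift_lines):
    """True when all residual errors are inside the listed PSNR tolerances."""
    return (
        gain_error_db(gain_ratio) < 0.2                    # luminance gain
        and abs(dc_offset) / signal_max * 100.0 < 0.5      # luminance DC level, % of signal max
        and abs(h_shift_pixels) < 0.1                      # horizontal pixel shift
        and abs(v_shift_lines) < 0.1                       # vertical line shift
    )

# Hypothetical residuals after normalization.
print(within_psnr_tolerances(1.01, 0.5, 255.0, 0.05, 0.0))
print(within_psnr_tolerances(1.05, 0.5, 255.0, 0.05, 0.0))
```

In this sketch a 1% gain residual is about 0.09 dB and passes, while a 5% residual is about 0.42 dB and fails the 0.2 dB limit.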
VQEG data & Logistic-mapped PQR
[Plot: MDOS (vertical axis, ticks from (10) to 60) versus PQR (horizontal axis, ticks from 2 to 12); series: VQEG Data, Logistic]
Logistic-mapped PQR for Common Scale
… provides approach for cross-calibration...
[Plot: Common VQM Scale (vertical axis, 0 to 1) versus Native PQR values (horizontal axis, 0 to 14)]
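The idea of mapping each metric's native values onto a common 0-to-1 scale with a logistic curve, and using that common scale to cross-calibrate two metrics, can be sketched as below. The two-parameter logistic form and every parameter value here are assumptions for illustration; the fitted function and coefficients from TR A1 are not given in these slides.

```python
import math

def to_common_scale(x, slope, midpoint):
    """Logistic map from a native VQM value to a common scale in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-slope * (x - midpoint)))

def from_common_scale(y, slope, midpoint):
    """Inverse logistic: the native VQM value that maps to common-scale value y."""
    return midpoint - math.log(1.0 / y - 1.0) / slope

def cross_calibrate(x, from_params, to_params):
    """Translate a value of one VQM into the equivalent value of another,
    by passing through the common scale."""
    return from_common_scale(to_common_scale(x, *from_params), *to_params)

# Hypothetical fits: (slope, midpoint) for a PQR-like and a PSNR-like metric.
# The negative slope reflects that higher PSNR means less impairment.
pqr_like = (0.5, 6.0)
psnr_like = (-0.3, 30.0)

print(round(to_common_scale(6.0, *pqr_like), 3))
print(round(cross_calibrate(8.0, pqr_like, psnr_like), 2))
```

Since each metric only needs its own fit to the subjective data, a newly documented FR-VQM can be added without refitting the existing ones, which fits the extensible-framework goal stated earlier.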
Accuracy -- 3 Methods
• RMSE
• Resolving Power
• Classification of Errors
Confidence vs. Δ-VQM: JND/PQR
Confidence vs. Δ-VQM: PSNR
RMSE
• RMSE: root mean square error between
subjective and objective normalized scores
A first-order calculation of resolving power can be made by simply calculating the root mean
square error (RMSE) of the subjective scores versus the objective values in the normalized
domain. Differences in VQM values equal to the RMSE provide a 68% confidence level, and 1.96
times the RMSE provides a 95% confidence level. While this method does not give the same
result as the more complex approach, it is easily understood and may be quite useful considering
the accuracy levels in operational environments.
VQM_RMSE = 0.06723
This corresponds approximately to the more accurate curve of figure 4 as shown below.
Confidence level    Figure 4    Per RMSE
68%                 0.053       0.066
95%                 0.187       0.132
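The first-order rule described on this slide (a VQM difference of one RMSE for 68% confidence, 1.96 times the RMSE for 95%) can be sketched as follows. The score lists in the example are hypothetical; only the RMSE value 0.06723 comes from the slide.

```python
import math

def rmse(subjective, objective):
    """Root mean square error between normalized subjective and objective scores."""
    n = len(subjective)
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(subjective, objective)) / n)

def confidence_thresholds(rmse_value):
    """VQM differences needed for 68% and 95% confidence under the first-order rule."""
    return {"68%": rmse_value, "95%": 1.96 * rmse_value}

# Using the RMSE reported on the slide.
thresholds = confidence_thresholds(0.06723)
print(round(thresholds["95%"], 3))

# Hypothetical normalized scores, to show the RMSE calculation itself.
print(round(rmse([0.10, 0.50, 0.90], [0.20, 0.40, 0.80]), 3))
```

The 95% threshold computed this way rounds to 0.132, matching the “Per RMSE” entry in the table.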
Classification of Errors
[Diagram: VQM Differences (regions Wo, Eo, Bo, with thresholds Δo marked on either side of 0) against Subjective Score Diffs. (regions Ws, Es, Bs, with thresholds Δz marked on either side of 0)]
        Wo                       Eo                  Bo
Bs      False Ranking            False Tie           Correct Decision
Es      False Differentiation    Correct Decision    False Differentiation
Ws      Correct Decision         False Tie           False Ranking
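The decision table can be turned into a small classifier for a pair of clips. This sketch assumes that B, E, and W stand for "better", "equal", and "worse", that the suffixes s and o mark the subjective and objective (VQM) differences, that a positive difference means "better", and that Δz and Δo are the thresholds below which a difference counts as a tie. None of these readings is spelled out on the slide.

```python
def region(diff, threshold):
    """Classify a score difference as better (B), equal (E), or worse (W)."""
    if diff > threshold:
        return "B"
    if diff < -threshold:
        return "W"
    return "E"

# Keys are (subjective region, objective region), as in the table.
OUTCOMES = {
    ("B", "W"): "False Ranking",
    ("B", "E"): "False Tie",
    ("B", "B"): "Correct Decision",
    ("E", "W"): "False Differentiation",
    ("E", "E"): "Correct Decision",
    ("E", "B"): "False Differentiation",
    ("W", "W"): "Correct Decision",
    ("W", "E"): "False Tie",
    ("W", "B"): "False Ranking",
}

def classify(subjective_diff, vqm_diff, delta_z, delta_o):
    """Outcome of comparing two clips with a VQM versus the subjective result."""
    return OUTCOMES[(region(subjective_diff, delta_z), region(vqm_diff, delta_o))]

# Hypothetical differences and thresholds.
print(classify(0.3, -0.2, delta_z=0.1, delta_o=0.1))
print(classify(0.0, 0.2, delta_z=0.1, delta_o=0.1))
```

Counting each outcome over all clip pairs in a subjective database gives the error rates; False Ranking is the most serious of the three, since the metric then orders the pair opposite to the viewers.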
Progress
• T1A1.1 Ad Hoc Group created Feb.
2001, co-chairs John Grigg, John
Pearson
• Mail Ballot Approval August 2001
• Approved by T1A1.1 25 September
2001
• Approved at Plenary meeting of T1A1,
28 September 2001