Fault and Performance Management for Next Generation IP Communication Alan Clark, Telchemy

Outline
• Problems affecting VoIP performance
• Tools for Measuring and Diagnosing Problems
• Protocols for Reporting QoS
• Performance Management Architecture
• What to ask for/integrate?
Enterprise VoIP Deployment
[Diagram: IP phones, an IP VPN, a branch office, a teleworker, and a gateway]
VoIP Deployment - Issues
[Diagram: IP phones, IP VPN, and gateway annotated with problem sources: route flapping and link failure; codec distortion; LAN congestion, duplex mismatch, long cables; echo; access link congestion]
Call Quality Problems
• Packet Loss
• Jitter (Packet Delay Variation)
• Codecs and PLC
• Delay (Latency)
• Echo
• Signal Level
• Noise Level
Packet Loss and Jitter
[Diagram: IP network, jitter buffer, codec, distorted speech. Packets are lost in the network; packets are discarded due to jitter.]
Routers, Loss and Jitter
[Diagram: arriving packets, input queue (queuing delay), prioritize/route (processing delay), output queue (queuing delay, serialization delay)]
• Packet loss due to buffer overflow or RED
• Voice packet delayed by one or more data packets
Queuing Delays
[Chart: max delay (ms) vs transmission speed (kbit/s) for 1, 2, and 3 x 1500 byte MTU packets]
Added delay due to wait for data packets to be sent = jitter
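The relationship plotted on this slide can be approximated with simple arithmetic: a voice packet that queues behind N full-size data packets waits for each of them to be serialized onto the link. The sketch below is illustrative only; the function names and the sample link speeds are assumptions, not values from the presentation.

```python
def serialization_delay_ms(packet_bytes: int, link_kbps: float) -> float:
    """Time to clock one packet onto the link, in milliseconds.

    bits / (kbit/s) works out to milliseconds.
    """
    return packet_bytes * 8 / link_kbps


def max_queuing_delay_ms(packets_ahead: int, link_kbps: float,
                         mtu_bytes: int = 1500) -> float:
    """Worst-case wait behind `packets_ahead` full-size data packets."""
    return packets_ahead * serialization_delay_ms(mtu_bytes, link_kbps)


if __name__ == "__main__":
    # Hypothetical link speeds chosen for illustration.
    for kbps in (128, 512, 1500):
        delays = [round(max_queuing_delay_ms(n, kbps), 1) for n in (1, 2, 3)]
        print(f"{kbps} kbit/s: {delays} ms behind 1, 2, 3 data packets")
```

On slow access links the wait behind even one 1500 byte packet is large compared with a typical voice packet interval, which is why this shows up as jitter.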
Jitter
[Chart: delay (ms) vs time (seconds)]
Average jitter level (PPDV) = 4.5 ms
Peak jitter level = 60 ms
WiFi can also cause jitter
[Chart: delay (ms) and RSSI vs time]
Effects of Jitter
• Low levels of jitter are absorbed by the jitter buffer
• High levels of jitter
o lead to packets being discarded
o cause an adaptive jitter buffer to grow, increasing delay but reducing discards
• Packets that arrive too late to be played out are dropped by the jitter buffer and counted as “discarded”
• Packets that arrive extremely late are counted as “lost”, so some “lost” packets did in fact arrive
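The distinction between "discarded" and "lost" packets can be sketched as a simple classification on how late each packet arrives. This is a minimal illustration assuming a fixed jitter buffer; the thresholds `buffer_ms` and `give_up_ms` and the function name are hypothetical, and real endpoints use their own rules.

```python
from typing import Optional


def classify_packet(lateness_ms: Optional[float],
                    buffer_ms: float = 40.0,
                    give_up_ms: float = 200.0) -> str:
    """Classify one packet by its delay relative to the nominal arrival time.

    lateness_ms is None when the packet never arrived at all.
    """
    if lateness_ms is None or lateness_ms > give_up_ms:
        # Never arrived, or so late that the receiver already gave up on it.
        return "lost"
    if lateness_ms > buffer_ms:
        # Arrived, but after its playout time had passed.
        return "discarded"
    return "played"


if __name__ == "__main__":
    for lateness in (5.0, 60.0, 350.0, None):
        print(lateness, classify_packet(lateness))
```

Note that a larger `buffer_ms` turns some discards into played packets at the cost of extra delay, which is the trade-off an adaptive jitter buffer makes.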
Packet Loss
[Chart: packet loss rate (%) averaged over 500 ms vs time (seconds)]
Average packet loss rate = 2.1%
Peak packet loss = 30%
Packet Loss is bursty
• Packet loss (and packet discard) tends to occur in sparse bursts, say 20-30% in density and one second or so in length
• Terminology
o Consecutive burst
o Sparse burst
o Burst of Loss vs Loss/Discard
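One way to make the idea of a sparse burst concrete is to group loss events that are separated by fewer than some minimum number of received packets. The sketch below is a simplified illustration, not the exact algorithm of any standard or product; the threshold name `gmin`, its default of 16, and the reading of "weight" as the number of lost packets in a burst are assumptions.

```python
from typing import List, Tuple


def find_bursts(lost: List[bool], gmin: int = 16) -> List[Tuple[int, int, int]]:
    """Group loss events into sparse bursts.

    `lost[i]` is True when packet i was lost or discarded. Two losses belong
    to the same burst when fewer than `gmin` packets were received between
    them. Returns (start_index, length_in_packets, weight) per burst, where
    weight is the number of lost packets. Isolated losses are returned as
    bursts of length 1 in this simplified version.
    """
    bursts = []
    start = last = None
    weight = 0
    for i, is_lost in enumerate(lost):
        if not is_lost:
            continue
        if last is not None and i - last - 1 < gmin:
            weight += 1            # same burst continues
        else:
            if start is not None:  # close the previous burst
                bursts.append((start, last - start + 1, weight))
            start, weight = i, 1
        last = i
    if start is not None:
        bursts.append((start, last - start + 1, weight))
    return bursts


if __name__ == "__main__":
    trace = [False] * 5 + [True, False, True, True] + [False] * 20 + [True]
    for start, length, weight in find_bursts(trace):
        print(f"start={start} length={length} weight={weight} "
              f"density={weight / length:.0%}")
```

A consecutive burst is the special case where weight equals length (density of 100%).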
Example Packet Loss Distribution
[Chart: burst weight (packets) vs burst length (packets)]
Loss and Discard
• Loss is often associated with periods of high
congestion
• Jitter is due to congestion (usually) and leads to
packet discard
• Hence Loss and Discard often coincide
• Other factors can apply - e.g. duplex mismatch,
link failures etc.
Example Loss/Discard Distribution
[Chart: burst weight (packets) vs burst length (packets)]
Leads To Time Varying Call Quality
[Chart: voice and data bandwidth (kbit/s) and MOS vs time, with periods of high jitter/loss/discard marked]
Packet Loss Concealment
[Diagram: lost speech segment estimated by PLC]
• Mitigates the impact of packet loss/discard by replacing lost speech segments
• Very effective for isolated lost packets, less effective for bursty loss/discard
• But isn’t loss/discard bursty?
• Need to be able to deal with 10-20-30% loss!
Effectiveness of PLC
[Chart: ACR MOS vs packet loss/discard rate (%) for G.711 without PLC, G.711 with PLC, and G.729A, showing codec distortion and the impact of loss/discard and PLC]
Call Quality Problems
• Packet Loss
• Jitter (Packet Delay Variation)
• Codecs and PLC
• Delay (Latency)
• Echo
• Signal Level
• Noise Level
Effect of Delay on Conversational Quality
[Chart: MOS score vs round trip delay (milliseconds) for 55 dB and 35 dB echo return loss]
Causes of Delay
[Diagram: sending side: echo control, codec (accumulate and encode), RTP, UDP/TCP, IP; network delay; receiving side: IP, UDP/TCP, RTP, codec (jitter buffer, decode and playout), echo control; external delay]
Cause of Echo
[Diagram: IP network, gateway with echo canceller, line echo and acoustic echo; round trip delay typically 50 ms+]
• Additional delay introduced by VoIP makes existing echo problems more obvious
• Also: “convergence” echo
Echo problems
• Echo with very low delay sounds like “sidetone”
• Echo with some delay makes the line sound hollow
• Echo with over 50 ms delay sounds like… echo
• Echo Return Loss
o 55 dB or above is good
o 25 dB or below is bad
Call Quality Problems
• Packet Loss
• Jitter (Packet Delay Variation)
• Codecs and PLC
• Delay (Latency)
• Echo
• Signal Level
• Noise Level
Signal Level Problems
• Amplitude clipping: speech sounds loud and “buzzy”
• Temporal clipping occurs with VAD or echo suppressors: gaps in speech, start/end of words missing
[Diagram: signal level scale marked at 0 dBm0 and -36 dBm0]
Noise
• Noise can be due to
o Low signal level
o Equipment/encoding (e.g. quantization noise)
o External local loops
o Environmental (room) noise
• From a service provider perspective, how to distinguish between
o Room noise (not my problem)
o Network/equipment/circuit noise (is my problem)
Measuring VoIP performance
• Test types
o Active test: measure test calls
o Passive test: measure live calls
• VoIP specific: VQmon, ITU G.107, ITU P.VTQ
• Analog signal based: ITU P.862 (PESQ), ITU P.563
“Gold Standard” - ACR Test
• Speech material
o Phonetically balanced speech samples 8-10 seconds in length
o Test designed to eliminate bias (e.g. presentation order different for each listener)
o Known files included as anchors (e.g. MNRU)
• Listening conditions
o Panel of listeners
o Controlled conditions (quiet environment with known level of background noise)
Example ACR test results
• Extract from an ITU subjective test
• Mean Opinion Score (MOS) was 2.4
• Opinion scale: 1=Unacceptable, 2=Poor, 3=Fair, 4=Good, 5=Excellent
[Chart: number of votes vs opinion score (1 to 5)]
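A Mean Opinion Score is the arithmetic mean of the listeners' votes on the 1 to 5 scale. The short sketch below shows the calculation; the vote counts are hypothetical and are not the data from the ITU test shown on the slide.

```python
from typing import Dict


def mean_opinion_score(votes: Dict[int, int]) -> float:
    """Average opinion score from a histogram {score: number_of_votes}."""
    total = sum(votes.values())
    if total == 0:
        raise ValueError("no votes recorded")
    return sum(score * count for score, count in votes.items()) / total


if __name__ == "__main__":
    # Hypothetical vote counts for one test condition.
    votes = {1: 10, 2: 20, 3: 10, 4: 0, 5: 0}
    print(f"MOS = {mean_opinion_score(votes):.2f}")
```

Because the result is a mean, two conditions with the same MOS can have very different vote distributions, which is why the histogram is worth looking at.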
Packet based approaches
[Diagram: Test call: two VoIP test systems connected across the IP network measure the call. Live call: two VoIP end systems connected across the IP network, with passive test points using VQmon, G.107, or P.VTQ.]
Packet based approaches
• ITU G.107
o R = Ro - Is - Ie - Id + A
o Really a network planning tool
o Missing many essential monitoring features
• VQmon
o ITU G.107 + ETSI TS 101 329-5 Annex E + …
o Proprietary but widely used (superset of G.107 & P.VTQ)
• ITU P.VTQ
o Available late 2005, very limited functionality
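The G.107 R factor from the formula above is commonly converted to an estimated MOS using the mapping given in ITU-T G.107. The sketch below combines the two steps; the function names are illustrative, and the impairment values in the usage example are hypothetical inputs rather than measured or default values.

```python
def r_factor(ro: float, is_: float, ie: float, id_: float, a: float = 0.0) -> float:
    """R = Ro - Is - Ie - Id + A, as on the slide."""
    return ro - is_ - ie - id_ + a


def r_to_mos(r: float) -> float:
    """Map an R factor to an estimated MOS using the G.107 conversion."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


if __name__ == "__main__":
    # Hypothetical impairment values, for illustration only.
    r = r_factor(ro=94.0, is_=1.5, ie=11.0, id_=2.0)
    print(f"R = {r:.1f}, estimated MOS = {r_to_mos(r):.2f}")
```

The mapping is nonlinear, so equal steps in R do not correspond to equal steps in MOS.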
Extended E Model - VQmon
[Diagram: arriving packets pass through the jitter buffer (some discarded) to the codec; loss/discard events feed a 4 state Markov model that gathers detailed packet loss info in real time; together with signal level, noise level, and echo level, a metrics calculation produces call quality scores and diagnostic data]
Modeling transient effects
[Chart: Ie(burst), Ie(gap), and Ie(VQmon) vs time (seconds), comparing measured call quality with user reported call quality]
VQmon - computational model
[Diagram components: burst loss rate and gap loss rate; Ie mapping; perceptual model; recency model (ETSI TS 101 329-5); signal level and noise level, used to calculate Ro and Is; echo and delay, used to calculate Id (ITU-T G.107); outputs R-LQ, MOS-LQ, R-CQ, MOS-CQ]
Accuracy: Non-bursty conditions
[Chart: Comparison of VQmon vs ACR MOS - iLBC 15.2k: MOS score (ACR MOS, VQmon MOS-LQ) vs packet loss rate (%)]
[Chart: Comparison of VQmon vs PESQ - iLBC 15.2k: PESQ score (PESQ, VQmon MOS-PQ) vs packet loss rate (%)]
Accuracy: Bursty conditions
• G.107
o Well established model for network planning
o No way to represent jitter
o Few codec models
o Inaccurate for bursty loss
o Conversational Quality only
• VQmon
o Extended G.107
o Transient impairment model
o Wide range of codec models
o Narrow & Wideband
o Jitter Buffer Emulator
o Listening and Conversational Quality
[Chart: estimated MOS vs ACR MOS: comparison of VQmon and E Model for severely time varying conditions]
Signal based approaches
P.862 is an Active Test Approach
[Diagram: test call between two VoIP end systems across the IP network, with a P.862 tester]
P.563 is a Passive Test Approach
[Diagram: two VoIP end systems across the IP network, with a P.563 tester]
ITU P.862 - Active testing
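[Diagram: PESQ audio files are sent across the tested segment of the IP connection; the original and received signals are time aligned, transformed (FFT…), and compared to produce a PESQ score]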
ITU P.862 - Active testing
• Send speech file
• Compare received file with original using FFT
• MOS-like score in the range -0.5 to 4.5
• Takes typically 50-100 MIPS per call
• Widely used within the industry
[Chart: PESQ scores vs packet loss rate. Results for G.729A codec for a set of speech files (i.e. for each packet loss rate the only thing changed is the speech source file).]
ITU P.563 - Passive monitoring
• Analyses received speech file (single ended)
• Produces a MOS score
• Correlates well with MOS when averaged over many calls
• Requires 100 MIPS per call
[Chart: ACR MOS vs P.563 score. Comparison of P.563 estimated MOS scores with actual ACR test scores. Each point is the average per-file ACR MOS with 16 listeners compared to the P.563 score.]
Performance Monitoring - Passive Test
[Diagram: embedded monitoring function reporting via RTCP XR and SIP QoS report]
SLA Monitoring - Active Test
[Diagram: test call between active test functions]
Active or Passive Testing?
• Active testing
o Works for pre-deployment testing and on-demand troubleshooting
• But!
o IP problems are transient
• Passive monitoring
o Monitors every call made, but needs a call to monitor
o Captures information on transient problems
o Provides data for post-analysis
• Therefore, you need both
VoIP Performance Management Framework
[Diagram: VoIP endpoint and VoIP gateway with embedded monitoring (VQ) exchange an RTP stream (possibly encrypted) and media path reporting (RTCP XR); a network probe, analyzer, or router with VQ sits on the path; signaling based QoS reporting goes to the call server and CDR database; SNMP reporting goes to the network management system]
VoIP Performance Management Framework
• Embedded monitoring function in IP phones, residential gateways…
o Close to the user
o Least cost + widest coverage
• Protocol support developed
o RTCP XR (RFC 3611), SIP, MGCP, H.323, Megaco
o Draft SNMP MIB
• Works in encrypted environments
• Already being deployed by equipment vendors
The role of RTCP XR
RTCP XR (RFC 3611)
1. Provides a useful set of metrics for VoIP performance monitoring and diagnosis
2. Supports both real time monitoring and post-analysis
3. Extracts signal level, noise level and echo level from DSP software in the endpoint
4. Exchanges info on endpoint delay and echo to allow remote endpoint to assess echo impact
5. Provides midstream probes/analyzers access to analog metrics if secure RTP is used
6. Goes through firewalls…
RFC 3611 - RTCP XR
[Packet layout: VoIP metrics report block fields, one row per 32-bit word]
Loss Rate | Discard Rate | Burst Density | Gap Density
Burst Duration (ms) | Gap Duration (ms)
Round Trip Delay (ms) | End System Delay (ms)
Signal Level | Noise Level | RERL | Gmin
R Factor | Ext R | MOS-LQ | MOS-CQ
Rx Config | - | Jitter Buffer Nominal
Jitter Buffer Max | Jitter Buffer Abs Max
SIP Service Quality Reporting Event
PUBLISH sip:[email protected] SIP/2.0
Via: SIP/2.0/UDP pc22.example.com;branch=z9hG4bK3343d7
………
Content-Type: application/rtcpxr
Content-Length: ...
VQSessionReport
LocalMetrics:
TimeStamps=START:10012004.18.23.43 STOP:10012004.18.26.02
SessionDesc=PT:0 PD:G.711 SR:8000 FD:20 FPP:2 PLC:3 SSUP:on
[email protected]
………
Signal=SL:2 NL:10 RERL:14
QualityEst=RLQ:90 RCQ:85 EXTR:90 MOSLQ:3.4 MOSCQ:3.3
QoEEstAlg:VQMonv2.1
DialogID:38419823470834;to-tag=8472761;from-tag=9123dh311
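A collector receiving a report like the one above could split each `Name=KEY:value KEY:value` line into a dictionary. The sketch below handles only lines of that shape as they appear in this example; the function name is hypothetical and the full report syntax is not reproduced here.

```python
from typing import Dict, Tuple


def parse_metric_line(line: str) -> Tuple[str, Dict[str, str]]:
    """Split 'Name=K1:v1 K2:v2' into ('Name', {'K1': 'v1', 'K2': 'v2'})."""
    name, _, rest = line.partition("=")
    fields = {}
    for token in rest.split():
        # Split on the first colon only, so values may contain dots or colons.
        key, _, value = token.partition(":")
        fields[key] = value
    return name.strip(), fields


if __name__ == "__main__":
    name, fields = parse_metric_line(
        "QualityEst=RLQ:90 RCQ:85 EXTR:90 MOSLQ:3.4 MOSCQ:3.3")
    print(name, fields)
    if float(fields["MOSCQ"]) < 3.5:  # hypothetical alert threshold
        print("conversational quality below threshold")
```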
RTCP XR MIB
[Diagram: session table and history table; basic parameters, call quality metrics, alerting]
Passive Monitoring Framework
[Diagram: embedded VQ monitoring in the IP phones, branch office, teleworker, and gateway across the IP VPN, reporting to the NMS via SNMP and SIP QoS reports]
What to Implement/ Ask For
• Embedded monitoring functionality in IP Phones
and Gateways (e.g. VQmon)
• RTCP XR for mid-call data exchange between
endpoints
• SIP Service Quality Events for reporting end of call
quality
• RTCP XR MIB for SNMP support
Summary
• Problems affecting VoIP performance
• Tools for Measuring and Diagnosing Problems
• Protocols for Reporting QoS
• Performance Management Architecture
• What to ask for/integrate?