High Performance Logging System for Embedded UNIX and GNU/Linux Applications IEEE RTCSA 2013 (8/21/13) Cisco Systems Jaein Jeong.

Download Report

Transcript High Performance Logging System for Embedded UNIX and GNU/Linux Applications IEEE RTCSA 2013 (8/21/13) Cisco Systems Jaein Jeong.

High Performance Logging System for
Embedded UNIX and GNU/Linux Applications
IEEE RTCSA 2013 (8/21/13)
Cisco Systems
Jaein Jeong
Introduction
- Embedded UNIX in many places
Traditional UNIX Logging System
App
log
Process
App
log
Process
syslogd
…
App
log
Process
syslog
File System
USER
KERNEL
Buffer
2 / 25
Problem Statement
- Apps slow down w. large amount of logging
• Long latency to logging daemon
• Inefficiency of unbuffered writes to flash FS
• Long latency even with output buffering
App
log
Process
App
log
Process
syslogd
…
App
log
Process
Flash
Logger
syslog
Flash
File System
USER
KERNEL
Buffer
Named pipe
3 / 25
Our Approach
• Faster Message Transfer
• Compatibility with Existing Logging Apps
• Destination-Aware Message Formatting
4 / 25
Organization
• Related Work for UNIX Logging Systems
• Background
– Cisco UCS and Virtual Interface Card (VIC)
– Evolution of VIC Logging System
• Design Requirements and Implementation
• Evaluation and Optimization
• Conclusion
5 / 25
Related Work
- Logging Methods for UNIX Apps
• Not designed for embedded/flash logging
– Slow msg passing (msg copying over kernel)
– Unbuffered message writes
Rsyslog
• An extension used in latest distros
• Multi-threading.
Syslog-ng
• An extension based on nsyslogd
• Reliable transport, encryption, and
richer set of information and filtering
Syslog
• Introduced in early 80’s
• Still most notable one
6 / 25
Background
- Cisco UCS and Virtual Interface Card
Cisco UCS server
Cisco UCS datacenter
server system
10GBASE-KR
Unified Network
Fabric, 1 to Each
Fabric Extender
Cisco UCS
Virtual Interface Card (VIC)
Mgmt CPU
MIPS proc core
(500MHz, MIPS 24Kc)
VIC ASIC
Mgmt CPU
FCPU 0
Embedded Linux
(Linux kernel 2.6.23-rc5)
FCPU 1
128
Programmable
Virtual
Interfaces
Ethernet NICs
Fibre Channel HBAs
7 / 25
Background
- Evolution of VIC Logging System
App
Process
log
App
Process
log
logd
…
App
Process
log
Flash
Logd – a simple logging daemon
Buffered
• Logging from
syslogd
Switch
JFFS2
System
Process
…
App
Process
log
Flash
Flash
Logger
syslog
JFFS2
USER
KERNEL
log
App
Process
log
Switch
System
Process
System
Process
syslogd
…
System
Process
…
log
Flash
syslog
Switch
syslogd
log
App
Process
App
Process
Multiple Processes
System
Process
• Different Severity Levels
System
App
Process
log
• Process
Formatting
and flash writing …
App
Process
Switch
Unbuffered
syslogd
JFFS2
USER
KERNEL
Buffer
• Improves
Forwards serious
flash write
msgs
performance
to switches
• Functional,
of
unbufferedbut
syslogd
with worse write
• performance
Still suffers long latency
8 / 25
Buffer
Named pipe
Organization
• Related Work for UNIX Logging Systems
• Background
– Cisco UCS and Virtual Interface Card (VIC)
– Evolution of VIC Logging System
• Design Requirements & Implementation
• Evaluation & Optimization
• Conclusion
9 / 25
Design Requirements
- Faster Message Transfer
• Avoid kernel-to-user space msg copying
Switch
Syslogd Logging
App
Process
log
App
Process
log
Switch
System
Process
System
Process
syslogd
…
App
Process
log
KERNEL
…
System
Process
App
Process
Switch
Flash
App
Process
System
Process
log
mqlogd
log
dequeue
…
JFFS2
USER
Switch
System
Process
App
Process
Flash
Logger
syslog
Mqlogd Logging
Named pipe
KERNEL
System
Process
enqueue
log
Flash
Memory
Mapped
File
Flash
Logger
JFFS2
USER
Buffer
…
10 / 25
Named pipe
Design Requirements
- Faster Message Transfer
• Reduce message copying from 4 to 2
3
Syslogd
copy
1 App
locallocal
copy
4 Write to named
pipe
2
kernel buffer
2’ Write directly
from shared
memory
to named pipe
1’
to shared
memory
Switch
Syslogd Logging
App
Process
log
App
Process
log
System
Process
System
Process
1
syslogd
…
App
Process
Switch
2 3
log
syslog
4
Switch
Switch
System
Process
…
App
Process
System
Process
App
Process
Flash
App
Process
Flash
Logger
System
Process
log
log
…
JFFS2
USER
KERNEL
Mqlogd Logging
1’
mqlogd
dequeue
enqueue
2’
log
Named pipe
KERNEL
System
Process
Flash
Memory
Mapped
File
Flash
Logger
JFFS2
USER
Buffer
…
11 / 25
Named pipe
Design Requirements
- Compatibility with Existing Logging Apps
• Thru Logging API
– Replace syslog() with
share memory lib calls
Logging Client
app1
mcp
app2
fls
• Direct Syslog Calls
– Server receives msgs
through UDP Unix socket
Logging Client
…
klogd
fls
xinetd
…
Logging API :
log_info(), log_error(), …
syslog() library call
syslog()
call
Shared library
Memory
Logging
UDP
UnixLibrary
Socket
UDP Unix Socket
Logging Server (mqlogd)
(Syslogd)
Logging Server (mqlogd)
(Syslogd)
12 / 25
Design Requirements
- Destination-Aware Message Formatting
• Syslogd
– Working but limited
– Redundant
– Coarse time granularity (in seconds)
• Mqlogd
– Destination-aware formatting with space saving
– Uses system supported timing (in micro-seconds)
13 / 25
Implementation
- Shared Memory and Circular Queue
Queue Memory Layout
Header Entry
Circular Queue Header
Non-Header Entry
Non-Header Entry
Notification
Disable Flag
Non-Header
Entry
…
Logging
Event
Logging
Enqueue Client
…
Logging
Client
• Notification Mechanism
Shared Memory
Dequeue
Logging
Server
Notification
– Write-and-select
– Signal
• Locking Mechanism
– Semaphore lock
– Pthread lock
14 / 25
Organization
• Related Work for UNIX Logging Systems
• Background
– Cisco UCS and Virtual Interface Card (VIC)
– Evolution of VIC Logging System
• Design Requirements & Implementation
• Evaluation & Optimization
• Conclusion
15 / 25
Evaluation
• Metrics
– Request Latency
– Request Drop Rate
• Parameters
– Number of clients
– Number of iterations (Depth of queue size)
– Locking mechanism
– Notification mechanism
16 / 25
Performance Results
- Performance compared to syslogd
• Avg Latency: >10x speed-up
• Min Latency: >20x speed-up
• Max Latency: >2x speed-up
Maximum
Minimum
Average Request
RequestLatency
Latency- -11Client
Client
Latency (us)
800
50000
40000
600
30000
400
20000
200
10000
syslogd
mqlogd (select, semaphore)
mqlogd (signal, semaphore)
mqlogd (select, pthread)
0 0
100
100
1000
1000
5000
5000
10000
10000
Number
NumberofofIterations
Iterations
50000
50000
mqlogd (signal, pthread)
17 / 25
Performance Results
- Effect of Queue Size
• No drops within queue size (e.g. 10000)
• Queue size should be larger than max
expected burst size
Request Drop Rate - 1 Client
100%
Percent
80%
60%
mqlogd (select, semaphore)
40%
mqlogd (signal, semaphore)
20%
mqlogd (select, pthread)
0%
mqlogd (signal, pthread)
100
1000
5000
10000
Number of Iterations
50000
18 / 25
Performance Results
- Effect of Multiple Clients
• Avg request latency increases proportionally
• With 2 clients, request starts to drop with
smaller number of iterations
Avg
Request
Request
Drop
Latency
Rate - -11and
and22Clients
Clients
(us)
Latency
Percent
2000.0
100%
80%
1500.0
60%
1000.0
40%
500.0
20%
syslogd (1 client)
syslogd (2
mqlogd
(select,
clients)
1 client)
mqlogd (select, 2
1 clients)
client)
0%
0.0
mqlogd (select, 2 clients)
100
100
1000
1000
5000
5000
10000
10000
Number
Numberof
ofIterations
Iterations
50000
19 / 25
Performance Results
- Effect of Notification Mechanisms
• Makes little difference
Maximum
Minimum
Average Request
RequestLatency
Latency- -11Client
Client
Latency (us)
100
25
20000
20
80
15000
15
60
10000
10
40
5000
20
5
mqlogd (select, semaphore)
mqlogd (signal, semaphore)
00 0
100
100
100
1000
1000
1000
5000
5000
5000
10000
10000
10000
Number
Number
Number
of
ofof
Iterations
Iterations
Iterations
50000
50000
20 / 25
Performance Results
- Effect of Lock Mechanisms
• Pthread mutex is 40% faster than semaphore.
• Semaphore is used for our production code due to a
limitation of pthread mutex lock
(Linux kernel 2.6.23-rc5)..
Maximum
Minimum
Average Request
RequestLatency
Latency- -11Client
Client
Latency (us)
250
150
20000
200
15000
100
150
10000
100
50
5000
50
mqlogd (select, semaphore)
mqlogd (select, pthread)
0 0
100
100
1000
1000
5000
5000
10000
10000
Number
NumberofofIterations
Iterations
50000
50000
21 / 25
Performance Results
- Effect of Client Interface Type
• Logging using UNIX socket interface
– Backward compatibility is no faster
– About the same level as syslogd.
– For compatibility, not for general use.
Average Request Latency - 1 Client
Latency (us)
1000
800
600
syslogd
400
mqlogd (select, semaphore)
200
mqlogd (Unix socket)
0
100
1000
5000
10000
Number of Iterations
50000
22 / 25
Optimization
- Effects of deferred notification
• Sends one notification for a batch of msgs
• Measured time for host-to-adapter commands
(capability & macaddr) with and w.o. logging
• 2x speed-up in latency
Latency for 'capability' command
Latency for 'macaddr' command
1000
1800
1600
1400
1200
1000
800
600
400
200
0
800
logging
time (us)
msg xfer
time (us)
logging
time (us)
600
400
msgxfer
xfer
msg
time(us)
(us)
time
200
0
write-and-select
deferred
syslogd
write-and-select
deferred
syslogd
23 / 25
Future Works
• Reduce kernel msg copying even further
• Improve performance with faster lock
• Avoid loss of serious messages
App
log
Process
App
log
Process
enqueue
App
log
Process
…
Memory
Mapped
File
mqlogd
dequeue
Memory
Mapped
File
Flash
Logger
File System
USER
KERNEL
Named pipe
24 / 25
Conclusion
•
•
•
•
Logging system for embedded UNIX apps
Up to 100x speed-up in latency, 10x throughput
Backward Compatibility
Commercially used in Cisco UCS Virtual
Interface Cards
25 / 25