uClinux vs Linux Context Switching and IPC Performance

uClinux vs Linux
Context Switching and IPC
Performance Comparison
2004.09.10
Heechul Yun
Digital Media R&D Center

Contents

• Introduction
• ARM9 Cache Architecture
• Benchmark
• IPC Performance
• Context Switching Performance
• Conclusion
• Future Work

Introduction

• Objective
  - Fair performance comparison of uClinux and Linux
  - Focused on context switching and IPC
• Context switch
  - Restore the registers and address space of the next process
• IPC (Inter-Process Communication)
  - Send and receive messages between processes

uClinux vs Linux

• Linux
  - Separate virtual address space for each process
  - Must restore the address space on every context switch
• uClinux
  - Single address space shared by all processes
  - No need to restore the address space on a context switch
• uClinux may therefore be faster at context switching

ARM9 MMU Architecture

[Figure: block diagram of CPU, I-Cache, D-Cache, TLB (VA to PA translation with permission check), and Memory]

• Virtually-indexed caches
  - Cache is flushed on every context switch
  - Direct cost: 1k~18k cycles
  - Indirect cost: up to 54k cycles
  - Up to 270 us at 200 MHz
• Observation
  - The cache flush can be avoided if the address ranges do not overlap
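
The cycle counts above convert to time as cycles divided by clock frequency. A minimal sketch of that arithmetic, assuming the 200 MHz clock quoted on the slide (the function name `cycles_to_us` is illustrative, not from the original):

```python
def cycles_to_us(cycles, clock_mhz):
    """Convert a CPU cycle count to microseconds at the given clock (in MHz)."""
    return cycles / clock_mhz


# Cycle counts quoted on the slide, evaluated at an assumed 200 MHz clock.
for label, cycles in [("direct (low)", 1_000),
                      ("direct (high)", 18_000),
                      ("indirect (max)", 54_000)]:
    print(f"{label}: {cycles_to_us(cycles, 200):.0f} us")
```

The 54k-cycle indirect cost is the term that yields the 270 us figure.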
ARM926EJ-S Virtual Cache Architecture
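[Figure: cache lookup. The virtual address is split into Tag (bit 12 and above), Index (bits 11-5), Word (bits 4-2), and Byte (bits 1-0). The index selects one of 128 sets; a tag compare produces Hit and selects the read data from the 8-word line.]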
• Both index and tag are taken from the virtual address
• Separate 4-way set-associative 16K I-cache and D-cache
• 8 words per cache line
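
The geometry above fixes how an address is decomposed. A small sketch that derives the set count and the field split from the three figures on the slide, assuming 4-byte words and 32-bit addresses (the names `NUM_SETS` and `split_va` are illustrative):

```python
CACHE_BYTES = 16 * 1024   # 16K per cache (I and D are separate)
WAYS = 4                  # 4-way set-associative
LINE_BYTES = 8 * 4        # 8 words per line, assuming 4-byte words

# Each set holds one line per way, so sets = size / (ways * line size).
NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)


def split_va(va):
    """Split a 32-bit virtual address into (tag, index, word, byte)."""
    byte = va & 0x3                       # bits 1-0: byte within a word
    word = (va >> 2) & 0x7                # bits 4-2: word within a line
    index = (va >> 5) & (NUM_SETS - 1)    # bits 11-5: set number
    tag = va >> 12                        # bit 12 and above: tag
    return tag, index, word, byte


print(NUM_SETS)
print(split_va(0x12345678))
```

Because both tag and index come from the virtual address, two processes that use the same virtual address produce the same tag and index regardless of the physical page behind it, which is why overlapping address spaces force a flush.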

Benchmark

• Lmbench2 [LmBench’96]
  - Widely used OS benchmark suite
  - Modified for uClinux (vfork, FIFO)
  - lat_ctx, lat_fifo, and bw_pipe are used

[Figure: ‘lat_ctx’ FIFO architecture. The Master, Child1, and Child2 processes each have a read end and a write end and are connected through FIFO 0, FIFO 1, and FIFO 2.]
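
The FIFO arrangement in the figure can be approximated with a token passed around a ring of named pipes. The sketch below is not the lmbench source: it is a simplified illustration that assumes a POSIX host with an MMU (it uses `fork`, which the uClinux port replaces with vfork), and the function name `token_ring` and its parameters are hypothetical.

```python
import os
import tempfile
import time


def token_ring(nprocs=3, rounds=100):
    """Pass a one-byte token around a ring of FIFOs; return average us per hop."""
    tmpdir = tempfile.mkdtemp()
    paths = [os.path.join(tmpdir, "fifo%d" % i) for i in range(nprocs)]
    rfds, wfds = [], []
    for path in paths:
        os.mkfifo(path)
        # Open the read end non-blocking first so the write-end open cannot hang.
        rfd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
        wfd = os.open(path, os.O_WRONLY)
        os.set_blocking(rfd, True)  # later reads block until a token arrives
        rfds.append(rfd)
        wfds.append(wfd)

    pids = []
    for i in range(1, nprocs):
        pid = os.fork()
        if pid == 0:
            # Child i: wait for the token on FIFO i, forward it to the next FIFO.
            try:
                for _ in range(rounds):
                    token = os.read(rfds[i], 1)
                    os.write(wfds[(i + 1) % nprocs], token)
            finally:
                os._exit(0)
        pids.append(pid)

    start = time.perf_counter()
    for _ in range(rounds):
        os.write(wfds[1 % nprocs], b"x")  # master injects the token
        os.read(rfds[0], 1)               # and waits for it to come back
    elapsed = time.perf_counter() - start

    for pid in pids:
        os.waitpid(pid, 0)
    for fd in rfds + wfds:
        os.close(fd)
    for path in paths:
        os.unlink(path)
    os.rmdir(tmpdir)
    return elapsed / (rounds * nprocs) * 1e6


if __name__ == "__main__":
    print("average time per hop: %.2f us" % token_ring(3, 1000))
```

Each hop forces the kernel to block one process and wake the next, so the per-hop time includes both the FIFO transfer and a context switch. The numbers from a Python script on a desktop host are not comparable to the ARM9 results in this deck.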

Benchmark Setup

• App: LMBench2 (lat_fifo, lat_ctx, bw_pipe)
• Kernel: Linux 2.6.7 vs. uClinux 2.6.7
• H/W: SMDK24A0
  - ARM926EJ-S based S3C24A0
  - 16K I&D Cache
• The same hardware and the same benchmark programs are used
• The only difference is the kernel (uClinux vs. Linux)

Context Switching Performance

• 0KB workload
  - Each process immediately switches to the next
  - Comparison of pure context-switch overhead

[Figure: context switch time (us, axis 0 to 120) vs. number of processes (2 to 16), for uClinux and Linux]

IPC Performance

                    Linux     uClinux   Ratio
lat_fifo (us)       160.64    31.74     5.06
bw_pipe (MB/s)      12.58     25.55     2.03
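
The Ratio column is consistent with the measured values if each ratio is taken in the "uClinux is better by" direction. A short check of that arithmetic (the dictionary keys are illustrative names, not lmbench output fields):

```python
linux = {"lat_fifo_us": 160.64, "bw_pipe_mb_s": 12.58}
uclinux = {"lat_fifo_us": 31.74, "bw_pipe_mb_s": 25.55}

# Latency: lower is better, so the ratio is Linux / uClinux.
latency_ratio = linux["lat_fifo_us"] / uclinux["lat_fifo_us"]
# Bandwidth: higher is better, so the ratio is uClinux / Linux.
bandwidth_ratio = uclinux["bw_pipe_mb_s"] / linux["bw_pipe_mb_s"]

print(f"lat_fifo ratio: {latency_ratio:.2f}")
print(f"bw_pipe ratio:  {bandwidth_ratio:.2f}")
```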

Conclusion

• The first fair performance comparison between uClinux and Linux
  - Same H/W platform
  - Same benchmark S/W
• uClinux has better IPC and context-switching performance
  - Because the cache stays valid across context switches
• Beneficial for
  - Real-time critical applications
  - IPC-oriented applications

Future Work (?)

• Extending benchmarks
  - Interrupt latency, ...
  - Share the results with the community
• Improving uClinux
  - Need protection
    - Inherit ‘Single Address Space Operating System’ research for 64-bit processors
  - More compatibility
    - No fork()
    - Fixed heap & stack size

References

[FASS’03] Adam Wiggins et al. “Implementations of Fast Address-Space Switching and TLB Sharing on the StrongARM Processor”. In: Proceedings of the 8th Asia-Pacific Computer Systems Architecture Conference, Aizu-Wakamatsu City, Japan, September 2003.
[LmBench’96] McVoy, L., Staelin, C. “lmbench: Portable tools for performance analysis”. In: Proceedings of the 1996 USENIX Technical Conference, San Diego, CA, USA (1996).
[UC] uClinux/ARM 2.6 Project. http://opensrc.sec.samsung.com/
[24A0] Samsung S3C24A0 Product Datasheet. http://www.samsung.com/Products/Semiconductor/SystemLSI/MobileSolutions/MobileASSP/MobileComputing/S3C24A0/S3C24A0.htm
[926] ARM926EJ-S Technical Reference Manual. http://www.arm.com/pdfs/DDI0198D_926_TRM.pdf
[ARM] ARM Architecture Reference Manual. ARM Ltd.

Backup Slides

Context Switching Performance

[Figure: uClinux context switch time (us, axis 0 to 400) vs. number of processes (2 to 16), for three series: uc. -s 16k, uc. -s 1k, uc. -s 0k]

Basic Performance

                    Linux 2.6.7   uClinux 2.6.7   Note
Boot time (ms)      437.11        415.16          start_kernel -> shell
Kernel size (KB)    876           728
Memory size (KB)    1428          1200            Shell is different