K42 Research Operating System
K42: Building a Complete OS
Orran Krieger, Marc Auslander, Bryan Rosenburg,
Robert Wisniewski, Jimi Xenidis, Dilma Da Silva,
Michal Ostrowski, Jonathan Appavoo, Maria Butrico,
Mark Mergen, Amos Waterland, Volkmar Uhlig
http://www.research.ibm.com/K42
How it all started
Our Predictions 1996
• Microsoft Windows will dominate
• Large-scale SMMP increasingly important
• Within 5 years multi-core pervasive
• Traditional OS structures not maintainable
• Customizability and extensibility critical
• Within 5 years 64-bit pervasive
Sufficient motivation to design entirely new OS.
Small aggressive research team.
Resulting K42 Goals
• performance/scalability:
– up to large MP and large applications
– down for small-scale MP and small apps on large-scale MP
• flexibility/customizability:
– policies/implementations of resource instances can be customized to
application needs
– system can adapt without penalizing common case performance
• applicability
– full functionality with multiple personalities
– support client to embedded to server
• wide availability
– release open source and build community
– highly maintainable/extensible structure
• enable problem domain experts
• re-enable architectural innovation
• re-enable OS research community
Technical directions 1996
• Micro-kernel design
• User-level implementation
• OO design
• Extensive infrastructure & programming model
• Pervasive exploitation of 64 bits
• Application manager for fault containment
[Figure: K42 structure; labels: Application, Legacy OS emulation, K42 lib, Servers, Micro-kernel]
Key technology/work
Scaling existing OSes
• Incremental approach of optimizing global data structures/policies … focuses on concurrency rather than locality.
• Poor scaling of SW requires major HW investments to compensate, resulting in:
– Systems that are not cost competitive.
– Limits to the system scalability.
[Figure: scale-up diagram; labels: External Service Requests, Service Interface, Software Structures, Processors, Memory, Add brick]
Our solution
• Key elements of our solution:
– System services that avoid sharing when possible.
– OO design with per-resource instance objects
– Exploit sharing where workload demands or where performance is not critical.
• Tools to identify sharing problems, and develop basic design methodology and set of tools to simplify the task of fixing the SW.
[Figure: scale-up diagram; labels: External Service Requests, Service Interface, Software Structures, Processors, Memory, Add brick]
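The "avoid sharing when possible" point can be made concrete with a small sketch. This is not K42 code: `ShardedCounter`, its method names, and the 64-byte cache-line size are all assumptions chosen for illustration. The common-case update touches only the caller's own slot, and only the rare global read visits every slot.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration (not K42 code): a counter split into one slot
// per processor so that common-case updates never touch shared cache lines.
class ShardedCounter {
public:
    explicit ShardedCounter(std::size_t nprocs) : slots_(nprocs) {
        assert(nprocs > 0);
    }

    // Common case: update only the slot owned by processor `proc`.
    void add(std::size_t proc, std::uint64_t n) {
        slots_[proc % slots_.size()].value.fetch_add(
            n, std::memory_order_relaxed);
    }

    // Rare case: a global read pays the cost of visiting every slot.
    std::uint64_t total() const {
        std::uint64_t sum = 0;
        for (const Slot& s : slots_) {
            sum += s.value.load(std::memory_order_relaxed);
        }
        return sum;
    }

private:
    // Each slot is aligned to its own cache line to avoid false sharing.
    // The 64-byte line size is an assumption, not a measured value.
    struct alignas(64) Slot {
        std::atomic<std::uint64_t> value{0};
    };
    std::vector<Slot> slots_;
};
```

In use, each processor would call `add` with its own index. The design trades a more expensive `total()` for updates that scale with the number of processors, which is the locality-over-concurrency trade-off the slide describes.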
Independent workloads
[Figures: Linux vs. K42 on three benchmarks. Modified SDET: Scripts/Hour vs. Processors. Parallel Make (flex): Speedup vs. Processors. Parallel PostMark: Speedup vs. Processors.]
Memclone benchmark:
Memory intensive parallel application
[Figures: two panels, "All MM objects distributed" and "All MM objects shared"; y-axis: msec per thread; series labels: Linux 2.4.21, K42]
Customization
• User-level implementation allows per-application
customizations.
• Framework per service designed to:
– Separate mechanism/policy that can be independently customized.
– Application or agents can determine which implementation to use
for workload.
• Dynamic customizations: patches/updates, adaptive
algorithms, specializing common case, monitoring,
application optimizations
– Hot swapping: replacing O1 with O2 to adapt to new demands
– Dynamic upgrade: replace all objects of a type
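A minimal sketch of the hot-swapping idea follows, assuming callers reach an object through one level of indirection. The names (`PagingPolicy`, `LruPolicy`, `MruPolicy`, `SwappableRef`, `hot_swap`) are hypothetical and are not K42 interfaces. The sketch keeps the old object alive only through reference counting, so it omits the state transfer a real swap needs.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical policy interface; names are illustrative, not K42's.
struct PagingPolicy {
    virtual ~PagingPolicy() = default;
    virtual std::string name() const = 0;
    // Returns the index of the page to evict among `resident` pages.
    virtual std::size_t pick_victim(std::size_t resident) const = 0;
};

// O1: evict the oldest page (index 0 by this sketch's convention).
struct LruPolicy : PagingPolicy {
    std::string name() const override { return "LRU"; }
    std::size_t pick_victim(std::size_t resident) const override {
        assert(resident > 0);
        return 0;
    }
};

// O2: a stand-in alternative that evicts the newest page.
struct MruPolicy : PagingPolicy {
    std::string name() const override { return "MRU"; }
    std::size_t pick_victim(std::size_t resident) const override {
        assert(resident > 0);
        return resident - 1;
    }
};

// Indirection point through which all callers reach the current object.
class SwappableRef {
public:
    explicit SwappableRef(std::shared_ptr<PagingPolicy> initial)
        : current_(std::move(initial)) {
        assert(current_);
    }

    // Callers take a snapshot; an in-flight call keeps the old object alive.
    std::shared_ptr<PagingPolicy> get() const {
        std::lock_guard<std::mutex> guard(mu_);
        return current_;
    }

    // Replace the current object and hand back the old one, which is
    // destroyed once the last in-flight caller drops its snapshot.
    std::shared_ptr<PagingPolicy> hot_swap(std::shared_ptr<PagingPolicy> next) {
        assert(next);
        std::lock_guard<std::mutex> guard(mu_);
        std::swap(current_, next);
        return next;
    }

private:
    mutable std::mutex mu_;
    std::shared_ptr<PagingPolicy> current_;
};
```

A caller would write `ref.get()->pick_victim(n)` and never name the concrete type, so replacing O1 with O2 is invisible to it. A dynamic upgrade of all objects of a type could, under the same assumptions, apply `hot_swap` to every such reference.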
Hot-swapping
[Figures: two panels, "Adaptive paging" and "Adaptive file imp."; axis labels: SDET throughput (scripts/hour), Transactions per second, Number of concurrent background streams; series labels: LRU, Adaptive; Shared, Shared-Exclusive, Shared-Exclusive / Small-Large]
Infrastructure & Programming model
• Clustered objects
• Pervasive use of RCU to avoid existence locks
• Event based programming model
• Performance monitoring
• Scalable services
– Protected Procedure calls
– Locality aware memory allocation
– Processor specific memory
• Automated interface generator/xobject services automate security, garbage collecting, …
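The clustered-object idea can be sketched as one logical object backed by per-processor representatives that are created on first use. Everything here is an assumption for illustration: `Clustered`, `local`, `for_each_rep`, and `HitCounterRep` are hypothetical names, and the sketch assumes each processor touches only its own representative, so it has no internal locking.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical sketch (not the K42 API): one logical object with one
// representative ("rep") per processor, created lazily on first access.
// Assumes each processor only calls local() with its own index.
template <typename Rep>
class Clustered {
public:
    explicit Clustered(std::size_t nprocs) : reps_(nprocs) {
        assert(nprocs > 0);
    }

    // Common-case access: returns the calling processor's own rep,
    // creating it the first time that processor uses the object.
    Rep& local(std::size_t proc) {
        assert(proc < reps_.size());
        if (!reps_[proc]) {
            reps_[proc].reset(new Rep());
        }
        return *reps_[proc];
    }

    // Global operation: visit every rep that has been created so far.
    template <typename F>
    void for_each_rep(F f) const {
        for (const std::unique_ptr<Rep>& r : reps_) {
            if (r) {
                f(static_cast<const Rep&>(*r));
            }
        }
    }

    // Number of reps instantiated; processors that never touched the
    // object cost nothing beyond an empty pointer slot.
    std::size_t reps_created() const {
        std::size_t n = 0;
        for (const std::unique_ptr<Rep>& r : reps_) {
            if (r) {
                ++n;
            }
        }
        return n;
    }

private:
    std::vector<std::unique_ptr<Rep>> reps_;
};

// Example rep: a per-processor hit counter.
struct HitCounterRep {
    std::uint64_t hits = 0;
};
```

Clients see a single object and call `local(proc)`. Whether state is shared, replicated, or partitioned across reps stays an implementation choice hidden behind that interface.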
Massive investment in/on Linux
• In late 90s Linux appeared to be taking off & we abandoned
multiple personalities
• Linux API/ABI compatibility largely in library, exceptions:
– Server code for process groups, ptys…
– Fork has had way too pervasive impact on kernel MM (we violated our
programming style).
• Support both unmodified glibc via trap reflection, and
modified glibc.
• Applications with specialized needs can reach past Linux
personality, e.g., to instantiate object, handle events…
• We are also compatible with Linux kernel modules, including
device drivers, FS & TCP/IP stack:
– Tracking Linux is an ongoing nightmare
Bad predictions, mistakes and questions
Our Predictions 1996
• Microsoft Windows will dominate
– Wasted huge amount of time on multiple personality support.
• Large-scale SMMP increasingly important.
– True, but much slower than expected.
– Massive investment in HW:
    • Allows existing OSes to run reasonably well
    • Makes SMMP not cost effective
• Within 5 years multi-core pervasive
– Only common today, not compelling differentiator until now
• Traditional OS structures not maintainable.
• Customizability and extensibility critical
• Within 5 years 64-bit pervasive.
– Only common today, this has been a huge barrier to building community
Mistakes/Questions
• We should have had a 32-bit version.
• Application manager was a bad idea, we totally missed on
virtualization:
– Gets rid of the device driver nightmare
– Can deploy new OS to solve subset of problem.
• While user-level implementation & micro-kernel clean,
continuous challenge & orthogonal to OO design
• We implemented fork wrong!!!
• OO design, and infrastructure, obscures control flow:
– Much more difficult for Linux hacker to gain broad understanding.
– Requires more sophisticated debugging tools.
• Does OO really help maintainability?
Concluding remarks
The good news
• High degree of functionality:
– 32 & 64 bit apps, support standard gentoo tree, MPI.
– Applications/benchmarks include SPEC SDET, ReAIM, SPECfp,
many HPC apps (DARPA & DOE)
– Recently provided enough support to run commercial JVM (J9)
and DB2.
• Object-oriented design has advantages...
– have found special casing easy
– hot-swapping simpler than adaptive algorithm
– Clustered objects relatively simple to do
– local fixes, publish interface not structure
– Domain experts/students can easily develop specialized component.
• Have been able to work around global policies, e.g.,
paging.
The good news
• General performance monitoring infrastructure key to
identifying problems.
• We achieved excellent base performance (although since
degraded); can compensate for intrinsic overheads:
– advantages of Linux's hierarchical page tables: exception level traversal,
identify PT entry for fast unmap and avoid segment unmapping,
aggressive fork pre-mapping for anonymous memory
– user-level implementation: cost initialization, page fault costs on fork
– OO design: indirections, code replication, poor instruction cache locality,
per-object data structures…
– initialization costs of scalable implementations: compensate by lazy
initialization & hot swapping/specialization…
Ongoing projects
• IBM PERCS for DARPA HPCS
– PEM and CPO & architectural evaluation
• DOE/FastOS
– HEC with K42 at LBL, UNM, UofToronto
– SmartApps at Texas A&M
• New South Wales (dynamic upgrade)
• Device driver I/O & Super page support with LTC
Concluding remarks
• Sufficient functionality & performance to run real
workloads.
• A great framework for fundamental OS research and HW
architecture studies.
• Basic architecture/technologies largely successful.
• Virtualization, pervasive 64-bit processors, and pervasive
multi-core make design more relevant than at any time in
project history.
• Most of IBM team no longer have K42 as day job, but are
still passionate about it:
– We continue to be excited to support community.
– We are actively soliciting people to take over parts of the system.