UNIX - SigmaNet


Lecture 2: UNIX STRUCTURE
[Figure: The layers of a UNIX system. Surviving labels: UNIX, user interface.]
Essential Unix Architecture
Applications
System Libraries (libc)
Modules
System Call Interface
I/O Related
File Systems
Networking
Process Related
Scheduler
Memory Management
Device Drivers
IPC
Architecture-Dependent Code
Hardware
UNIX Utility Programs
A few of the more common UNIX utility programs required by POSIX
FreeBSD machine independent kernel code
Category                           Code lines   % of all kernel code
Headers                                 38158    4.8%
Initialization                           1663    0.2%
Kernel facilities                       53805    6.7%
Common interfaces                       22191    2.8%
IPC                                     10019    1.3%
Terminal management                      5798    0.7%
Virtual memory                          24714    3.1%
Vnode management                        22764    2.9%
Local file system                       28067    3.5%
Other file systems (19)                 58753    7.4%
Network file system                     22436    2.8%
Network communication                   46570    5.8%
IPv4 protocol support                   41220    5.2%
IPv6 protocol support                   45527    5.7%
IPsec                                   17956    2.2%
Netgraph                                74338    9.3%
Cryptography support                     7515    0.9%
GEOM layer                              11563    1.4%
CAM layer                               41805    5.2%
ATA layer                               14192    1.8%
ISA bus                                 10984    1.4%
PCI bus                                 72366    9.1%
PCCARD bus                               6916    0.9%
Linux compatibility subsystem           10474    1.3%
All                                    689794   86.4%
FreeBSD machine dependent kernel code
Category                           Code lines   % of all kernel code
Machine dependent headers               16115    2.0%
ISA bus                                 50882    6.4%
PCI bus                                  2266    0.3%
Virtual memory                           3118    0.4%
Other machine dependent code            26708    3.3%
Assembler procedures                     4400    0.6%
Linux compatibility subsystem            4857    0.6%
All                                    108346   13.6%
Kernel services
• The border between kernel-level and user-level code is enforced by hardware protection.
• The kernel runs in a completely isolated address range; user-level code cannot access that address space.
• User-level code can interact with the kernel only through system calls, which the kernel strictly controls.
• In most cases system calls are synchronous for the user-level application.
• However, the kernel may still complete some work after it has returned results to user level.
• In most cases a system call is implemented by means of a hardware exception (trap), which changes the CPU operating mode and the current virtual memory context.
• The kernel validates system call arguments strictly before executing the system call.
• All system call arguments are copied into the kernel address space to guarantee that they cannot change during the system call.
• The addresses where the result of the system call will be placed must belong to the process that made the call (this is checked).
• If a system call fails, it returns -1 and sets the global errno variable.
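The error convention in the last point can be seen from user level with a minimal sketch. The helper name try_open is an illustrative assumption and not part of the lecture; open(), close(), errno and strerror() are standard POSIX/C.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: try to open `path` read-only.
 * Returns 0 on success, or the errno value left by the failed open(). */
int try_open(const char *path)
{
    int fd = open(path, O_RDONLY);   /* system call: control passes to the kernel */

    if (fd == -1) {
        int saved = errno;           /* save errno before another call can overwrite it */
        fprintf(stderr, "open(%s) failed: %s\n", path, strerror(saved));
        return saved;
    }
    close(fd);
    return 0;
}
```

Calling try_open() on a path that does not exist should report ENOENT ("No such file or directory"), while a readable path returns 0.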
System Calls
• System calls for process control
    fork()
    wait()
    execl(), execlp(), execv(), execvp()
    exit()
    signal(sig, handler)
    kill(pid, sig)
• System calls for low-level file I/O
    creat(name, permissions)
    open(name, mode)
    close(fd)
    unlink(name)
    read(fd, buffer, n_to_read)
    write(fd, buffer, n_to_write)
    lseek(fd, offset, whence)
• System calls for IPC
    pipe(fildes)
    dup(fd)
• About 270 system calls in total in Linux kernel v2.6
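The low-level file I/O calls listed above can be combined as in the following sketch. The helper name roundtrip and any file path passed to it are illustrative assumptions; the calls themselves (open, write, lseek, read, close, unlink) are the POSIX ones. Note that unlink() takes a path name, not a file descriptor.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write `msg` to a new file at `path`, seek back to the
 * start, read the data into `out` (capacity `cap`), then close and remove
 * the file. Returns the number of bytes read back, or -1 on any error. */
ssize_t roundtrip(const char *path, const char *msg, char *out, size_t cap)
{
    size_t len = strlen(msg);
    ssize_t n = -1;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd == -1)
        return -1;
    if (write(fd, msg, len) == (ssize_t)len &&   /* all bytes written */
        lseek(fd, 0, SEEK_SET) == 0)             /* rewind to offset 0 */
        n = read(fd, out, cap);
    close(fd);
    unlink(path);                                /* remove the name from the directory */
    return n;
}
```

Every call returns -1 on failure, so a real program would check each result separately and inspect errno; the sketch folds the checks together to stay short.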





Portable Operating System Interface (POSIX)
ISO/IEC 9945
IEEE 1003
Single UNIX Specification (SUS)
Linux Standard Base
System Calls for Process Management
s is an error code
pid is a process ID
residual is the remaining time from the previous alarm
System Lifecycle: Ups & Downs
Power on → Bootloader → Kernel init → OS init → RUN! → Shut down → Power off
Processes
[Figure: processes at boot. The kernel runs in kernel mode; daemons such as lpd, httpd and inetd run in user mode.]
Process 0: Kernel bootstrap. Starts process 1.
Process 1 (/etc/init): creates processes to allow login.
init forks; the child then runs a chain of exec calls:
exec /etc/getty: condition terminal for login
exec /bin/login: check password
exec shell: command interpreter
Processes
• Processes can run in two different modes: user level and kernel level.
• A process switches between these two modes by means of system calls.
• Process resources are likewise divided into two parts: user-level resources and kernel-level resources.
• User-level process resources: CPU general-purpose registers, program counter, CPU status registers, stack registers and the process memory segments (text segments, data segments, shared libraries, stack).
• Kernel-level resources: in most cases the resources that matter to the underlying hardware, such as registers, program counter, stack pointer, scheduling information and system call information.
• The kernel state of a process is divided into two parts: the process structure and the user structure.
• The process structure contains information that must always stay in memory and cannot be swapped out. It has to contain pointers to all other resident structures.
• The user structure has to be resident in memory only while the process is executing; otherwise it can be swapped out to disk.
• User structures can be dynamically allocated to a process by the memory management routines.
• Multitasking is achieved by context switching. Because context switches happen very often, minimizing context switch time is an effective way to improve performance.
Parts of process memory structure
• Per-process kernel data: user-id, open files, saved register states, environment
• Kernel stack: switches on system call (trap, software interrupt)
• Stack: stack frames of invoked functions
• Arena/heap: malloc
• Data: initialised data, non-initialised data
• Text: program code
bash$ size testhand2
92763 + 7564 + 2320 = 102647 (text + data + bss = total)
• Every process has a unique identifier, the PID. It is the common handle by which the kernel and other processes refer to a process.
• The process structure contains:
    Process identifier (PID)
    Signal state: pending signals, signal mask and a summary of signal actions
    Profiling information
    Timers: real-time timers and CPU usage counters
    Various process substructures:
        Process group identification: the process group and session the process belongs to
        User credentials: real, effective and saved user and group identifiers
        Memory management: describes the virtual address space of the process
        File descriptors: an array of pointers to open files, indexed by file descriptor, plus the open file flags
        System call vector: object files compiled for different UNIX systems can be run by using a different system call vector for each kind of object file
        Resource accounting: the rlimit structure, which is used for accounting different system resources
        Statistics: information collected from the running process and written to the accounting file when the process exits, including process timers and, if necessary, profiling information
        Signal actions: the action to be taken when a signal is sent to the process
        Thread structure
Big Picture: Another look
[Figure: kernel memory holds the process structures and, for each process, a kernel stack/u area. Each of the three processes shown has its own stack and data; text is shared.]
struct proc {
    LIST_ENTRY(proc) p_list;            /* (d) List of all processes. */
    TAILQ_HEAD(, ksegrp) p_ksegrps;     /* (c)(kg_ksegrp) All KSEGs. */
    TAILQ_HEAD(, thread) p_threads;     /* (j)(td_plist) Threads. (shortcut) */
    TAILQ_HEAD(, thread) p_suspended;   /* (td_runq) Suspended threads. */
    struct ucred *p_ucred;              /* (c) Process owner's identity. */
    struct filedesc *p_fd;              /* (b) Open files. */
    struct filedesc_to_leader *p_fdtol; /* (b) Tracking node */
    /* Accumulated stats for all threads? */
    struct pstats *p_stats;             /* (b) Accounting/statistics (CPU). */
    struct plimit *p_limit;             /* (c) Process limits. */
    struct sigacts *p_sigacts;          /* (x) Signal actions, state (CPU). */

    /*
     * The following don't make too much sense.
     * See the td_ or ke_ versions of the same flags.
     */
    int p_flag;                         /* (c) P_* flags. */
    int p_sflag;                        /* (j) PS_* flags. */
    enum {
        PRS_NEW = 0,                    /* In creation */
        PRS_NORMAL,                     /* threads can be run. */
        PRS_ZOMBIE
    } p_state;                          /* (j/c) S* process status. */
    pid_t p_pid;                        /* (b) Process identifier. */
    LIST_ENTRY(proc) p_hash;            /* (d) Hash chain. */
    LIST_ENTRY(proc) p_pglist;          /* (g + e) List of processes in pgrp. */
    struct proc *p_pptr;                /* (c + e) Pointer to parent process. */
    LIST_ENTRY(proc) p_sibling;         /* (e) List of sibling processes. */
    LIST_HEAD(, proc) p_children;       /* (e) Pointer to list of children. */
    struct mtx p_mtx;                   /* (n) Lock for this struct. */

/* The following fields are all zeroed upon creation in fork. */
#define p_startzero p_oppid
    pid_t p_oppid;                      /* (c + e) Save ppid in ptrace. XXX */
    struct vmspace *p_vmspace;          /* (b) Address space. */
    u_int p_swtime;                     /* (j) Time swapped in or out. */
    struct itimerval p_realtimer;       /* (c) Alarm timer. */
    struct rusage_ext p_rux;            /* (cj) Internal resource usage. */
    struct rusage_ext p_crux;           /* (c) Internal child resource usage. */
    int p_profthreads;                  /* (c) Num threads in addupc_task. */
    int p_maxthrwaits;                  /* (c) Max threads num waiters */
    int p_traceflag;                    /* (o) Kernel trace points. */
    struct vnode *p_tracevp;            /* (c + o) Trace to vnode. */
    struct ucred *p_tracecred;          /* (o) Credentials to trace with. */
    struct vnode *p_textvp;             /* (b) Vnode of executable. */
    sigset_t p_siglist;                 /* (c) Sigs not delivered to a td. */
    char p_lock;                        /* (c) Proclock (prevent swap) count. */
    struct sigiolst p_sigiolst;         /* (c) List of sigio sources. */
    int p_sigparent;                    /* (c) Signal to parent on exit. */
    int p_sig;                          /* (n) For core dump/debugger XXX. */
    u_long p_code;                      /* (n) For core dump/debugger XXX. */
    u_int p_stops;                      /* (c) Stop event bitmask. */
    u_int p_stype;                      /* (c) Stop event type. */
    char p_step;                        /* (c) Process is stopped. */
    u_char p_pfsflags;                  /* (c) Procfs flags. */
    struct nlminfo *p_nlminfo;          /* (?) Only used by/for lockd. */
    struct kaioinfo *p_aioinfo;         /* (c) ASYNC I/O info. */
    struct thread *p_singlethread;      /* (c + j) If single threading this is it */
    int p_suspcount;                    /* (c) Num threads in suspended mode. */
    struct thread *p_xthread;           /* (c) Trap thread */
    int p_boundary_count;               /* (c) Num threads at user boundary */
    struct ksegrp *p_procscopegrp;
/* End area that is zeroed on creation. */
#define p_endzero p_magic

/* The following fields are all copied upon creation in fork. */
#define p_startcopy p_endzero
    u_int p_magic;                      /* (b) Magic number. */
    char p_comm[MAXCOMLEN + 1];         /* (b) Process name. */
    struct pgrp *p_pgrp;                /* (c + e) Pointer to process group. */
    struct sysentvec *p_sysent;         /* (b) Syscall dispatch info. */
    struct pargs *p_args;               /* (c) Process arguments. */
    rlim_t p_cpulimit;                  /* (j) Current CPU limit in seconds. */
    signed char p_nice;                 /* (c + j) Process "nice" value. */
/* End area that is copied on creation. */
#define p_endcopy p_xstat

    u_short p_xstat;                    /* (c) Exit status; also stop sig. */
    struct knlist p_klist;              /* (c) Knotes attached to this proc. */
    int p_numthreads;                   /* (j) Number of threads. */
    int p_numksegrps;                   /* (c) Number of ksegrps. */
    struct mdproc p_md;                 /* Any machine-dependent fields. */
    struct callout p_itcallout;         /* (h + c) Interval timer callout. */
    u_short p_acflag;                   /* (c) Accounting flags. */
    struct rusage *p_ru;                /* (a) Exit information. XXX */
    struct proc *p_peers;               /* (r) */
    struct proc *p_leader;              /* (b) */
    void *p_emuldata;                   /* (c) Emulator state data. */
    struct label *p_label;              /* (*) Proc (not subject) MAC label. */
    struct p_sched *p_sched;            /* (*) Scheduler-specific data. */
};
Processes in UNIX
Process creation in UNIX.
Threads
Threads in POSIX
The principal POSIX thread calls.
UNIX Scheduler
The UNIX scheduler is based on a multilevel queue structure.
• Process status: NEW, NORMAL (RUNNABLE, SLEEPING, STOPPED), ZOMBIE.
• The kernel uses two queues to hold processes in different states: zombproc and allproc.
• In most cases threads are organized in two queues: a runnable queue and a waiting queue.
• Threads that are ready to run go to the runnable queue; threads that are waiting for something are placed in the waiting queue.
• The queues are organized by process and thread priority values. The waiting queue is hashed by event ID to make lookups faster.
• Processes are organized in groups.
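The idea of a waiting queue hashed by event ID can be sketched as a small user-level data structure. This is only an illustration under stated assumptions, not the FreeBSD implementation: the names (struct waiter, struct sleepq, sleepq_add, sleepq_wakeup), the bucket count and the hash function are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 8                  /* illustrative bucket count (assumption) */

struct waiter {
    int tid;                        /* thread identifier */
    const void *event;              /* event ID the thread sleeps on */
    struct waiter *next;            /* next waiter in the same bucket */
};

struct sleepq {
    struct waiter *bucket[NBUCKETS];
};

/* Map an event ID (here: an address) to a bucket index. */
static size_t hash_event(const void *event)
{
    return (size_t)(((uintptr_t)event >> 4) % NBUCKETS);
}

/* Put waiter `w` to sleep on `event`. */
void sleepq_add(struct sleepq *q, struct waiter *w, const void *event)
{
    size_t h = hash_event(event);

    w->event = event;
    w->next = q->bucket[h];
    q->bucket[h] = w;
}

/* Remove every waiter sleeping on `event`; returns how many were woken.
 * Only one bucket is scanned, which is the point of hashing. */
int sleepq_wakeup(struct sleepq *q, const void *event)
{
    struct waiter **pp = &q->bucket[hash_event(event)];
    int woken = 0;

    while (*pp != NULL) {
        struct waiter *w = *pp;

        if (w->event == event) {
            *pp = w->next;          /* unlink from the bucket chain */
            w->next = NULL;
            woken++;
        } else {
            pp = &w->next;
        }
    }
    return woken;
}
```

A wakeup for one event scans a single bucket instead of every sleeping thread; waiters on other events that happen to share the bucket are skipped and stay asleep.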


A process can be created by using one of the following system calls:
pid_t fork(void);
pid_t rfork(int flags);
pid_t vfork(void);
A child process created by fork() is an exact copy of the parent process except for the following:

The child process has a unique process ID.

The child process has a different parent process ID (i.e., the process ID of the parent process).

The child process has its own copy of the parent's descriptors.
These descriptors reference the
same underlying objects, so that, for instance, file pointers in file objects are shared between the child and
the parent, so that an lseek(2) on a descriptor in the child process can affect a subsequent read(2) or
write(2) by the parent. This descriptor copying is also used by the shell to establish standard input and
output for newly created processes as well as to set up pipes.

The child process' resource utilizations are set to 0; see setrlimit(2).

All interval timers are cleared; see setitimer(2).
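The shared file pointer described above can be observed directly. The following is a minimal sketch; the helper name offset_after_child_seek and the scratch file it creates are illustrative assumptions. The child moves the file offset with lseek(), and the parent then reads the current offset from its own copy of the descriptor.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: create a scratch file at `path`, fork, let the child
 * lseek() to `pos`, and return the offset the parent sees afterwards.
 * Returns -1 on error. The scratch file is removed before returning. */
off_t offset_after_child_seek(const char *path, off_t pos)
{
    off_t seen;
    pid_t pid;
    int fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd == -1)
        return -1;
    pid = fork();
    if (pid == -1) {
        close(fd);
        unlink(path);
        return -1;
    }
    if (pid == 0) {                       /* child: move the shared offset */
        lseek(fd, pos, SEEK_SET);
        _exit(0);
    }
    waitpid(pid, NULL, 0);                /* parent: wait until the child is done */
    seen = lseek(fd, 0, SEEK_CUR);        /* query the current offset */
    close(fd);
    unlink(path);
    return seen;
}
```

The parent never seeks itself, yet it observes the offset set by the child, because both descriptors reference the same underlying file object.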

The rfork() system call gives finer control over what the child shares with the parent:
Forking, vforking or rforking are the only ways new processes are created. The flags argument to rfork()
selects which resources of the invoking process (parent) are shared by the new process (child) or initialized
to their default values. The resources include the open file descriptor table (which, when shared, permits
processes to open and close files for other processes), and open files.

The vfork() system call can be used to create new processes without fully copying the address space of the old
process, which is horrendously inefficient in a paged environment. It is useful when the purpose of fork(2) would
have been to create a new system context for an execve(2). The vfork() system call differs from fork(2) in that the
child borrows the parent's memory and thread of control until a call to execve(2) or an exit (either by a call to
_exit(2) or abnormally). The parent process is suspended while the child is using its resources.
• A process exits either by calling exit() or by receiving a signal. Either way, the process exit status is delivered to the parent process through the wait4() system call.
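How the exit status reaches the parent can be sketched with fork(), execvp() and waitpid(). The helper name run_and_wait is an illustrative assumption, and waitpid() is used here as the portable POSIX relative of the wait4() call mentioned above.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the program argv[0] with arguments argv in a
 * child process and return its exit status (0-255). Returns 127 if the
 * exec failed (a shell convention), or -1 if fork/wait failed or the
 * child was terminated by a signal. */
int run_and_wait(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {                       /* child: replace the process image */
        execvp(argv[0], argv);
        _exit(127);                       /* reached only if exec failed */
    }
    if (waitpid(pid, &status, 0) == -1)   /* parent: collect the exit status */
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);       /* value passed to exit() by the child */
    return -1;                            /* child was killed by a signal */
}
```

For example, running sh with the arguments -c and "exit 7" should make the helper return 7.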
POSIX Signals
The signals required by POSIX.
System Calls for Memory Management







s is an error code
b and addr are memory addresses
len is a length
prot controls protection
flags are miscellaneous bits
fd is a file descriptor
offset is a file offset
System Calls for File Management
• s is an error code
• fd is a file descriptor
• position is a file offset
UNIX File System (1)
Disk layout in classical UNIX systems
The lstat System Call
Fields returned by the lstat system call.
System Calls for Directory Management
• s is an error code
• dir identifies a directory stream
• dirent is a directory entry
System Calls for File Protection
• s is an error code
• uid and gid are the UID and GID, respectively
Disk vs. Filesystem
• The entire hierarchy can actually include many disk drives.
• Some directories can be on other computers.
/
bin
etc
hollid2
users
scully
tmp
usr
Architecture: File System Structure
 Hierarchical
/
bin dev etc home lib mnt proc tmp usr var
passwd group
bin
man sbin



Directory Structure
/bin
The bin directory is where all the executable binaries were kept in early Unix. Over time, as more and more executables were added to Unix, it became unmanageable to keep them all in one place, and the bin directory was split into multiple parts (/bin, /sbin, /usr/bin).
/dev
Device files through which device drivers are accessed (screen, keyboard, hard disks, etc.).
/etc
Unix designates the etc directory as the storage place for all the administrative files and information.
/lib
If programs want to include certain features, they can reference the shared copy of that utility in the Unix library rather than having their own unique copy.
/lost+found
When files are recovered after any sort of problem or failure, they are placed in the lost+found directory if the kernel cannot ascertain their proper location in the system.
/mnt
The mnt directory is an empty directory reserved for mounting removable filesystems like hard disks, removable cartridge drives, and so on.
/tmp
The tmp directory contains temporary files created by Unix system programs. You can remove any temporary file that does not belong to a running program.
/usr
The usr directory consists of several subdirectories that contain additional Unix commands and data files.
/home
Default location of user home directories.
/var
Log files, spools (mail queue).
Fedora Linux Directories
[root@unix /]# ls -l
total 237
drwxr-xr-x   2 root root   4096 Sep 20 17:19 bin
drwxr-xr-x   4 root root   1024 Sep 20 16:04 boot
drwxr-xr-x  23 root root 155648 Sep 20 16:13 dev
drwxr-xr-x  41 root root   4096 Sep 20 17:19 etc
drwxr-xr-x   2 root root   4096 Mar 12  2004 home
drwxr-xr-x   2 root root   4096 Mar 12  2004 initrd
drwxr-xr-x   9 root root   4096 Sep 20 17:19 lib
drwx------   2 root root  16384 Sep 20 19:00 lost+found
drwxr-xr-x   2 root root   4096 Apr 14 20:39 misc
drwxr-xr-x   5 root root   4096 Sep 20 16:13 mnt
drwxr-xr-x   2 root root   4096 Mar 12  2004 opt
dr-xr-xr-x  50 root root      0 Sep 20 19:12 proc
drwxr-x---   2 root root   4096 Sep 20 17:06 root
drwxr-xr-x   2 root root  12288 Sep 20 17:19 sbin
drwxr-xr-x   2 root root   4096 Mar 12  2004 selinux
drwxr-xr-x   8 root root      0 Sep 20 19:12 sys
drwxrwxrwt   2 root root   4096 Sep 20 17:28 tmp
drwxr-xr-x  14 root root   4096 Sep 20 16:03 usr
drwxr-xr-x  18 root root   4096 Sep 20 16:10 var
[root@unix /]#
Security in UNIX
Some examples of file protection modes
passwd, shadow, group files
unix etc # ls -l passwd shadow group
-rw-r--r-- 1 root root 705 Sep 23 15:36 group
-rw-r--r-- 1 root root 1895 Sep 24 18:20 passwd
-rw------- 1 root root 634 Sep 24 18:22 shadow
unix etc #
unix root # more /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/bin/false
daemon:x:2:2:daemon:/sbin:/bin/false
adm:x:3:4:adm:/var/adm:/bin/false
lp:x:4:7:lp:/var/spool/lpd:/bin/false
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
...
guest:x:405:100:guest:/dev/null:/dev/null
nobody:x:65534:65534:nobody:/:/bin/false
girtsf:x:1000:100::/home/girtsf:/bin/bash
dima:x:1001:100::/home/dima:/bin/bash
guntis:x:1002:100::/home/guntis:/bin/bash
students:x:1003:100::/home/students:/bin/bash
unix root #
unix root # more /etc/shadow
root:$1$VlYbWsrd$GUs2cptio.rKlGHgAMBzr.:12684:0:::::
halt:*:9797:0:::::
...
guest:*:9797:0:::::
nobody:*:9797:0:::::
girtsf:$1$u6UEWKT2$w5K28n2iAB2wNWtyPLycP1:12684:0:99999:7:::
dima:$1$BQCdIBdV$xzzlj4s8XT6L9cLAmcoV50:12684:0:99999:7:::
guntis:$1$fiJF/0BT$Py9JiQQL6icajjQVyMZ7//:12684:0:99999:7:::
students:$1$wueon8yh$nLpUpNOKr8yTYaEnEK6OJ1:12685:0:99999:7:::
unix root #
unix root # more /etc/group
root::0:root
bin::1:root,bin,daemon
daemon::2:root,bin,daemon
sys::3:root,bin,adm
adm::4:root,adm,daemon
tty::5:girtsf
disk::6:root,adm
lp::7:lp
mem::8:
kmem::9:
wheel::10:root,girtsf
floppy::11:root
mail::12:mail
...
users::100:games,girtsf
nofiles:x:200:
qmail:x:201:
postfix:x:207:
postdrop:x:208:
smmsp:x:209:smmsp
slocate::245:
portage::250:portage
utmp:x:406:
nogroup::65533:
nobody::65534:
unix root #
Terminal Management
The main POSIX calls for managing the terminal
Different Shells
• Bourne
• C Shell
• Korn Shell
• BASH
Last login: Tue Sep 21 07:58:17 2004 from 81.198.226.108
[root@unix root]#
[root@unix root]# ps
  PID TTY          TIME CMD
20879 pts/7    00:00:00 bash
20905 pts/7    00:00:00 ps
[root@unix root]#
[root@unix root]# ls -l
total 64
-rw-r--r-- 1 root root  1204 Sep 20 16:11 anaconda-ks.cfg
-rw-r--r-- 1 root root 49872 Sep 20 16:11 install.log
-rw-r--r-- 1 root root  2306 Sep 20 16:11 install.log.syslog
[root@unix root]#
[root@unix root]# pwd
/root
[root@unix root]#
Illustration of Process Control Calls
POSIX Shell
A highly simplified shell
Environment variables.
On login, some environment variables are set automatically for the user. To view them, run the env command. For example:
PWD=path                   # where we are (current directory)
TZ=(EET)                   # time zone (East European Time)
PAGER=(less, more)         # default pager
LOGNAME=name               # user name
HOME=/home/name            # user's home directory
HOSTNAME=host              # host name
LD_LIBRARY_PATH=:path      # dynamic libraries
MANPATH=:path              # path searched for manual pages
ENV=/etc/bash_common       # where the environment variables are defined
LESS=-fdeiMQw              # default options for the pager
Environment variables (continued).
EDITOR=/usr/local/bin/joe  # default editor
TERM=vt100                 # terminal type setting (for the user)
PS1= \u@\h (\w)            # prompt format
MACHTYPE=machine_type      # machine type (hardware)
MAIL=path                  # file where incoming mail is stored
RHOST=host_address         # which computer we are on
SHELL=path                 # user's command interpreter
HOSTTYPE=host_type         # host type
OSTYPE=OS_type             # OS type (solarisN.N, etc.)
PATH=path:new_path         # search list used to find executable files
LESSCHARSET=latin1         # character set used by the pager for display
_=path                     # where the corresponding file (env) is located
Environment Variables
[root@unix /]# env
HOSTNAME=unix.mii.lu.lv
TERM=vt100
SHELL=/bin/bash
HISTSIZE=1000
SSH_CLIENT=::ffff:81.198.226.108 1289 22
SSH_TTY=/dev/pts/3
USER=root
LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:bd=40;33;01:cd=40;33;01:
USERNAME=root
MAIL=/var/spool/mail/root
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/X11R6/bin:/root/bin
INPUTRC=/etc/inputrc
PWD=/
LANG=en_US.UTF-8
SHLVL=1
HOME=/root
BASH_ENV=/root/.bashrc
LOGNAME=root
SSH_CONNECTION=::ffff:81.198.226.108 1289 ::ffff:159.148.108.245 22
LESSOPEN=|/usr/bin/lesspipe.sh %s
G_BROKEN_FILENAMES=1
_=/bin/env
OLDPWD=/sys
[root@unix /]#
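A program can read the same variables that env prints. The sketch below is a minimal illustration; the helper name env_or is a hypothetical choice, while getenv() is standard C and setenv()/unsetenv() are POSIX.

```c
#include <stdlib.h>

/* Hypothetical helper: return the value of environment variable `name`,
 * or `fallback` if the variable is not set. */
const char *env_or(const char *name, const char *fallback)
{
    const char *value = getenv(name);   /* NULL when the variable is unset */

    return value != NULL ? value : fallback;
}
```

For instance, env_or("PAGER", "more") yields the user's pager if one is configured and "more" otherwise. A process changes its own environment with setenv(name, value, 1) and unsetenv(name); such changes are inherited by children created afterwards but never propagate back to the parent shell.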
The ls Command
Steps in executing the command ls typed to the shell
The /proc pseudo filesystem
 The /proc virtual filesystem is a switch in the
configuration of the Linux kernel, one that is turned on by
default. If, for whatever reason, you would like to
completely disable /proc on your system, de-select /proc
file system support within the File system configuration
section of config, menuconfig, or xconfig when rebuilding
your kernel. Alternatively, you can simply comment out
the /proc line in /etc/fstab to prevent it from being
mounted.
The /proc pseudo filesystem





The /proc directory contains virtual files that are windows into the current state of the running kernel.
This allows the user to peer into a vast array of information, effectively providing them with the
kernel's point-of-view within the system. In addition, the user can use the /proc directory to
communicate particular configuration changes to the kernel.
The /proc directory contains files that are not part of any filesystem associated with your hard disks, CD-ROM, or any other physical storage device connected to your system (except, arguably, your RAM).
Rather, these files are part of a virtual filesystem, enabled or disabled in the kernel when it is
compiled.
The /proc virtual files exhibit some interesting qualities. First, most of them are 0 bytes in size.
However, when the file is viewed, it likely contains quite a bit of information. In addition, most of their
time and date settings reflect the current time and date, meaning that they are constantly changing.
A system administrator can use /proc as an easy method of accessing information about the state of
the kernel, the attributes of the machine, the states of individual processes, and more. Most of the
files in this directory, such as interrupts, meminfo, mounts, and partitions, provide an up-to-the-moment glimpse of a system's environment.
The /proc pseudo filesystem


An interesting quality of virtual files can be seen when viewing them with the more command, which
usually gives your location in the file by displaying the percentage of the document you are
currently seeing. This percentage number usually climbs the further you navigate down a long file.
However, when viewing a /proc virtual file, the percentage amount never changes, always staying at
0%.
Be sure to avoid viewing the kcore file in /proc. This virtual file contains an image of the kernel's
memory, and the contents of the file will do strange things to your terminal. You may need to type
reset after hitting [Ctrl]-[C] to get back to a proper command line prompt.
Top-Level Files in /proc
• Most of the files at the top level of the /proc directory hold key pieces of information about the state of the Linux kernel and your system in general.
• It is important to remember that the content of the files in the /proc directory and its various subdirectories is entirely dependent on information concerning your system. In other words, do not expect to see the exact same information in the same /proc file on two different machines.
Top-Level Files in /proc






/proc/apm
This file provides information about the Advanced Power Management (APM) state and options on
the system. This information is used by the kernel to provide information for the apm command.
/proc/cmdline
This file essentially shows the parameters passed to the Linux kernel at the time it is started.
/proc/cpuinfo
This file changes based on the type of processor in your system. The output is fairly easy to
understand.
/proc/devices
This file displays the various character and block devices currently configured for use with the kernel.
It does not include modules that are available but not loaded into the kernel. The output from
/proc/devices includes the major number and name of the device.
/proc/dma
This file contains a list of the registered ISA direct memory access (DMA) channels in use.
/proc/execdomains
This file lists the execution domains currently supported by the Linux kernel, along with the range of
personalities they support. Think of execution domains as a kind of "personality" of a particular
operating system. Other binary formats, such as Solaris, UnixWare, and FreeBSD, can be used with
Linux. By changing the personality of a task running in Linux, a programmer can change the way the
operating system treats particular system calls from a certain binary.
Top-Level Files in /proc







/proc/fb
This file contains a list of frame buffer devices, with the frame buffer device number and the driver
that controls it.
/proc/filesystems
This file displays a list of the filesystem types currently supported by the kernel.
/proc/interrupts
This file records the number of interrupts per IRQ on the x86 architecture.
/proc/iomem
This file shows you the current map of the system's memory for its various devices
/proc/ioports
In a way similar to /proc/iomem, /proc/ioports provides a list of currently registered port regions used
for input or output communication with a device.
/proc/isapnp
This file lists Plug and Play (PnP) cards in ISA slots on the system. This is most often seen with
sound cards but may include any number of devices.
/proc/kcore
This file represents the physical memory of the system and is stored in the core file format. Unlike
most /proc files, kcore does display a size. This value is given in bytes and is equal to the size of
physical memory (RAM) used plus 4KB.
Top-Level Files in /proc






/proc/kmsg
This file is used to hold messages generated by the kernel. These messages are then picked up by
other programs, such as klogd.
/proc/ksyms
This file holds the kernel exported symbol definitions used by the modules tools to dynamically link
and bind loadable modules.
/proc/loadavg
This file provides a look at load average, or the utilization of the processor, over time, as well as
giving additional data used by uptime and other commands.
/proc/locks
This file displays the files currently locked by the kernel. The content of this file contains kernel
internal debugging data and can vary greatly, depending on the use of the system.
/proc/mdstat
This file contains the current information for multiple-disk, RAID configurations. If your system does
not contain such a configuration, then your mdstat file will look similar to this:
Personalities :
read_ahead not set
unused devices: <none>
/proc/meminfo
This is one of the more commonly used /proc files, as it reports back plenty of valuable information
about the current utilization of RAM on the system.
Top-Level Files in /proc
• /proc/misc
This file lists miscellaneous drivers registered on the miscellaneous major
device, which is number 10.
• /proc/modules
This file displays a list of all modules that have been loaded by the system. Its
contents will vary based on the configuration and use of your system.
• /proc/mounts
This file provides a quick list of all mounts in use by the system.
• /proc/mtrr
This file refers to the current Memory Type Range Registers (MTRRs) in use
with the system.
• /proc/partitions
This file contains detailed information on the various partitions currently available to the
system.
• /proc/pci
This file contains a full listing of every PCI device on your system. Depending on
the number of PCI devices you have, /proc/pci can get rather long.
Top-Level Files in /proc
• /proc/slabinfo
This file gives information about memory usage on the slab level. Linux kernels
greater than 2.2 use slab pools to manage memory above the page level.
Commonly used objects have their own slab pools.
• /proc/stat
This file keeps track of a variety of different statistics about the system since it
was last restarted.
• /proc/swaps
This file measures swap space and its utilization.
• /proc/uptime
This file contains information about how long the system has been on since its last
restart.
• /proc/version
This file tells you the versions of the Linux kernel and gcc.
Directories in /proc
• Common groups of information concerning the kernel are grouped
into directories and sub-directories within /proc.
• Process Directories

The /proc directory contains quite a few directories named with a number.
These directories are called process directories, as they refer to a process's
ID and contain information specific to that process. The owner and group of
each process directory is set to the user running the process. When the
process is terminated, its /proc process directory vanishes. However, while
the process is running, a great deal of information specific to that process is
contained in the process directory's various files. Each of the process
directories contains the following files:



cmdline — Contains the command line arguments that started the
process.
cpu — Provides specific information about the utilization of each of the
system's CPUs.
cwd — A link to the current working directory for the process.
Directories in /proc








environ — Gives a list of the environment variables for the process. The
environment variable is given in all upper-case characters, and the value is in
lower-case characters.
exe — A link to the executable of this process.
fd — A directory containing all of the file descriptors for a particular process.
maps — Contains memory maps to the various executables and library files
associated with this process.
mem — The memory held by the process.
root — A link to the root directory of the process.
stat — A status of the process.
statm — A status of the memory in use by the process.







The seven columns relate to different memory statistics for the process. In order of how
they are displayed, from left to right, they report different aspects of the memory used:
Total program size, in kilobytes
Size of memory portions, in kilobytes
Number of pages that are shared
Number of pages that are code
Number of pages of data/stack
Number of pages of library
Number of dirty pages
Directories in /proc




status — Provides the status of the process in a form that is much more readable
than stat or statm.
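Files such as cmdline and environ hold their strings separated by null bytes, so they cannot be read as ordinary text lines. A minimal sketch of reading such a file follows; the function name read_nul_list is invented for this example, and the caller is assumed to pass a buffer of at least one byte.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read a NUL-separated /proc file (for example cmdline or environ) into buf
 * and return the number of strings it contains, or -1 if the file cannot be
 * opened. At most bufsize - 1 bytes are read; buf is always NUL-terminated.
 */
int read_nul_list(const char *path, char *buf, size_t bufsize)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    size_t len = fread(buf, 1, bufsize - 1, fp);
    fclose(fp);
    buf[len] = '\0';            /* guarantees the last string is terminated */

    /* Step from one string to the next, counting each one. */
    int count = 0;
    for (size_t i = 0; i < len; i += strlen(buf + i) + 1)
        count++;
    return count;
}
```

After a successful call on /proc/self/cmdline, buf starts with the program name, and each following argument begins one byte after the previous string's terminating NUL.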
/proc/self
The /proc/self directory is a link to the currently running process. This allows a
process to look at itself without having to know its process ID. Within a shell
environment, a listing of the /proc/self directory produces the same contents as
listing the process directory for that process.
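Because /proc/self is a symbolic link whose target is the caller's process ID, a process can resolve it with readlink(). The sketch below assumes /proc is mounted; the function name proc_self_pid is invented for this example.

```c
#include <stdlib.h>
#include <unistd.h>

/* Resolve the /proc/self symlink and return the PID it names, or -1 on error. */
long proc_self_pid(void)
{
    char target[32];
    ssize_t len = readlink("/proc/self", target, sizeof target - 1);
    if (len < 0)
        return -1;

    target[len] = '\0';         /* readlink() does not NUL-terminate */
    return strtol(target, NULL, 10);
}
```

On a system with /proc mounted in the usual way, the value returned should match what getpid() reports.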
/proc/bus
This directory contains information specific to the various busses available on the
system. So, for example, on a standard system containing ISA, PCI, and USB
busses, current data on each of these busses is available in its directory under
/proc/bus. The contents of the sub-directories and files available vary greatly depending on
the precise configuration of your system. However, each of the directories for each
of the bus types contains at least one directory for each bus of that type.
/proc/driver
This directory contains information for specific drivers in use by the kernel.
A common file found here is rtc, which provides output from the driver for the
system's Real Time Clock (RTC), the device that keeps the time while the system is
switched off.
Directories in /proc
/proc/fs
This directory contains specific filesystem, file handle, inode, dentry and quota information. This
information is actually located in /proc/sys/fs.

/proc/ide
This directory holds an assorted array of information about IDE devices on the system. Each IDE
channel is represented as a separate directory, such as /proc/ide/ide0 and /proc/ide/ide1.

Device Directories

Some of the most useful data can be found in the device directories within the channel directory.
Each device, such as a hard drive or CD-ROM, on that channel will have its own directory containing
its own collection of information and statistics. The contents of these directories vary according to the
type of device connected. Some of the more useful files common to different devices include:
 cache — The device's cache.
 capacity — The capacity of the device, in 512 byte blocks.
 driver — The driver and version used to control the device.
 geometry — The physical and logical geometry of the device.
 media — The type of device, such as a disk.
 model — The model name or number of the device.
 settings — A collection of current parameters of the device.


Directories in /proc




/proc/irq
This directory is used to set IRQ-to-CPU affinity, which allows you to connect a
particular IRQ to only one CPU. Alternatively, you can exclude a CPU from handling
any IRQs. Each IRQ has its own directory, allowing each IRQ to be configured differently
from any other. The /proc/irq/prof_cpu_mask file is a bitmask that contains the default values for the
smp_affinity file in the IRQ directory. The values in smp_affinity specify which CPUs handle that
particular IRQ.
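The smp_affinity value is a hexadecimal bitmask in which bit N stands for CPU N, so a value of 3 selects CPU 0 and CPU 1. A minimal sketch of decoding such a mask follows. It assumes the mask fits in a single unsigned long (very large systems write the mask as several comma-separated groups, which this sketch does not handle), and the function name mask_allows_cpu is invented for this example.

```c
#include <stdlib.h>

/*
 * Decide whether a hexadecimal affinity mask, as read from an smp_affinity
 * file, includes the given CPU. Returns 1 if the CPU's bit is set, 0 if it
 * is clear, and -1 if the string does not start with a hexadecimal number.
 */
int mask_allows_cpu(const char *hexmask, unsigned cpu)
{
    char *end;
    unsigned long mask = strtoul(hexmask, &end, 16);
    if (end == hexmask)
        return -1;              /* no hexadecimal digits found */

    if (cpu >= sizeof mask * 8)
        return 0;               /* CPU number beyond the width of the mask */

    return (int)((mask >> cpu) & 1UL);
}
```

For example, mask_allows_cpu("8", 3) returns 1 because 8 is binary 1000, while mask_allows_cpu("8", 0) returns 0.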
/proc/net
This directory provides a comprehensive look at various networking parameters and statistics.
/proc/scsi
In the same way the /proc/ide directory only exists if an IDE controller is connected to the
system, the /proc/scsi directory is only available if you have a SCSI host adapter.
/proc/sys
This directory is special and different from the others in /proc, as it not only provides
a lot of information about the system but also allows you to make configuration
changes to a running kernel.
Warning: Never attempt to tweak your kernel's settings on a production system
using the various files in the /proc/sys directory. Occasionally, changing a setting
may render the kernel unstable, requiring a reboot of the system. As this would
obviously disrupt any users currently using the system, use a similar development
system to try out changes before utilizing them on any production machines.
Directories in /proc




The /proc/sys directory contains several different directories that control different
aspects of a running kernel.
/proc/sys/dev
This directory provides parameters for particular devices on the system. Most
systems have at least two directories, cdrom and raid, but customized kernels can
have others, such as parport, which provides the ability to share one parallel port
between multiple device drivers.
/proc/sys/fs
This directory contains an array of options and information concerning various
aspects of the filesystem, including quota, file handle, inode, and dentry information.
/proc/sys/kernel
This directory contains a variety of different configuration files that directly affect the
operation of the kernel.
/proc/sys/net
This directory contains assorted directories of its own concerning various networking
topics, including assorted protocols and centers of emphasis. Various configurations
at the time of kernel compilation make available different directories here, such as
appletalk, ethernet, ipv4, ipx, and ipv6. Within these directories, you can adjust the
assorted networking values for that configuration on a running system.
Directories in /proc



/proc/sys/vm
This directory facilitates the configuration of the Linux kernel's virtual
memory (VM) subsystem. The kernel makes extensive use of virtual memory,
part of which can be backed by swap space on disk.
/proc/sysvipc
This directory contains information about System V IPC resources. The files
in this directory relate to System V IPC calls for messages (msg),
semaphores (sem), and shared memory (shm).
/proc/tty
This directory contains information about the available and currently used tty
devices on the system. Originally called teletype devices, character-based
data terminals of any kind are called tty devices. In Linux, there are three
different kinds of tty devices. Serial devices are used with serial
connections, such as over a modem or using a serial cable. Virtual terminals
create the common console connection, such as the virtual consoles
available when pressing [Alt]-[<F-key>] at the system console. Pseudo
terminals create a two-way communication that is used by some higher level
applications, such as X11.
Using sysctl
 Setting kernel parameters in the /proc/sys directory need not be a manual process or one
that requires echoing values into a virtual file and hoping they are correct. The sysctl
command can make viewing, setting, and automating special kernel settings very easy.
 To get a quick overview of all settings configurable in the /proc/sys directory, type the
sysctl -a command as root. This will create a large, comprehensive list.
 This is the same basic information you would see if you viewed each of the files
individually. The only difference is how the location is written: the file /proc/sys/net/ipv4/route/min_delay
is written as net.ipv4.route.min_delay, with the directory slashes replaced by dots and
the /proc/sys prefix assumed.
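The mapping between the dotted names and the files under /proc/sys is mechanical, so it can be sketched in a few lines of C. The function names sysctl_name_to_path and read_sysctl are invented for this example, and the sketch assumes that no component of the name contains a dot of its own (some network interface names do).

```c
#include <stdio.h>
#include <string.h>

/*
 * Turn a dotted name such as "net.ipv4.route.min_delay" into the matching
 * path under /proc/sys. Returns 0 on success, -1 if the path does not fit.
 */
int sysctl_name_to_path(const char *name, char *path, size_t size)
{
    int n = snprintf(path, size, "/proc/sys/%s", name);
    if (n < 0 || (size_t)n >= size)
        return -1;

    /* Replace the dots in the name part with directory slashes. */
    for (char *p = path + strlen("/proc/sys/"); *p != '\0'; p++)
        if (*p == '.')
            *p = '/';
    return 0;
}

/*
 * Read the first line of a setting into value, without the trailing newline.
 * Returns 0 on success, -1 on any error.
 */
int read_sysctl(const char *name, char *value, size_t size)
{
    char path[256];
    if (size == 0 || sysctl_name_to_path(name, path, sizeof path) != 0)
        return -1;

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    char *line = fgets(value, (int)size, fp);
    fclose(fp);
    if (line == NULL)
        return -1;

    value[strcspn(value, "\n")] = '\0';
    return 0;
}
```

With this sketch, read_sysctl("kernel.ostype", buf, sizeof buf) reads /proc/sys/kernel/ostype, the same value that the command sysctl kernel.ostype prints.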
 While quickly setting single values in /proc/sys is helpful during testing, it does not work
as well on a production system, as all /proc/sys special settings are lost when the
machine is rebooted. To preserve settings that you want to apply permanently to your
kernel, add them to the /etc/sysctl.conf file.
 Even though the /proc filesystem is a great resource to exploit, sometimes it is just
missing. The filesystem is not vital to system operation, and there are cases when you
choose to leave it out of the kernel image or simply don't mount it. When you build an
embedded system, for example, saving 40-50 kB can be an interesting option; if you are
very concerned about security, on the other hand, you might decide to hide system
information and leave /proc unmounted.
Using sysctl
 The system call interface to kernel tuning, namely sysctl, is an alternative way to peek
into configurable parameters and to modify them. An additional advantage of the system
call interface is that it's faster, as no fork/exec is involved, nor any directory lookup.
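Anyway, unless you run a very old platform, the performance savings are irrelevant.
On Linux the system call was deprecated for many years and removed in kernel 5.5, so
current Linux systems offer only the /proc/sys interface.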
 To use the system call, the header <sys/sysctl.h> must be included: it declares the
function as:
int sysctl (int *name, int nlen, void *oldval, size_t *oldlenp, void *newval, size_t newlen);
The arguments of the function have the following meaning:
name points to an array of integers: each of the integer values identifies a sysctl item, either a
directory or a leaf node file. The symbolic names for such values are defined in <linux/sysctl.h>.
nlen states how many integers are listed in the array name: to reach a particular entry
you need to specify the path through the subdirectories, so you need to say how long that
path is.
oldval is a pointer to a data buffer where the old value of the sysctl item must be stored. If it is
NULL, the system call won't return values to user space.
oldlenp points to an integer number stating the length of the oldval buffer. The system call
changes the value to reflect how much data has been written, which can be less than the buffer
length.
newval points to a data buffer hosting replacement data: the kernel will read this buffer to
change the sysctl entry being acted upon. If it is NULL, the kernel value is not changed.
newlen is the length of newval. The kernel will read no more than newlen bytes from newval.
Using sysctl (FreeBSD specific)





The FreeBSD sysctl mechanism is based on the so-called linker set technology[1]. It lets us gather
information about a running kernel and configure it to some extent without rebuilding the kernel.
All the information is stored inside the kernel and is organized into a Management Information Base
(MIB) tree. To access the MIB tree, you use sysctl variables, whose names are
organized hierarchically.
Most sysctl variables have ASCII names separated by dots. For example, the read-only sysctl
variable kern.ostype contains the type of the kernel. This naming scheme is very similar to filenames,
where we use slashes to separate component names instead of using dots. To list all sysctl variables
by their ASCII names, you can issue the following command:
$ sysctl -a
The types of the sysctl variables include node, integer, string, structure and opaque data. A node is
like a directory in a filesystem. The kern.ostype variable is a string. Its value is "FreeBSD." The sysctl
command that you can use on a command line only accepts ASCII names of a sysctl variable. Unlike
filenames, wildcard characters like "*" and "?" are not accepted. But you do not have to specify a full
name to display sysctl variables: naming a node, such as kern, displays every variable beneath it.
All sysctl names are implemented internally as arrays of integers. These are called "integer names" here to
distinguish them from "ASCII names." The system call __sysctl() accepts only integer names. If the
user knows only the ASCII name of a sysctl variable, the special integer name {0,3} (see
below) must be used along with the ASCII name to obtain the integer name of the sysctl variable. This
indirection cannot be avoided.
Using sysctl (FreeBSD specific)
 The maximum number of integers making up a sysctl
name is limited to CTL_MAXNAME (12). The
corresponding internal name of kern.ostype is an array of
integers with two elements: {CTL_KERN,
KERN_OSTYPE} or {1,1}. Note some sysctl variables
only have integer names. For example, {CTL_KERN,
KERN_PROF, GPROF_STATE} is the name for the
kernel profiling sysctl variable recording whether the
kernel is currently being profiled. It has no corresponding
ASCII name and therefore cannot be accessed by the
sysctl command.
Resources
1. Red Hat - The Official Red Hat Linux Reference Guide
2. InformIT - The /proc File System
3. Jonathon T. Giffin, George S. Kola - Linux Process Control via the File System
4. Zhihui Zhang (Department of Computer Science, SUNY at Binghamton) - FreeBSD 4.0 Sysctl Mechanism, Daemonnews
5. Sean Davis - sysctl On NetBSD - An Easy Way To Get Process Data
6. Oskar Andreasson - Ipsysctl tutorial 1.0.4
7. FreeBSD Documentation Project - FreeBSD Handbook
8. Marshall Kirk McKusick, George V. Neville-Neil - Design and Implementation of the FreeBSD Operating System
9. Gerhard Mourani and Open Network Architecture, Inc. - Securing and Optimizing Linux: The Ultimate Solution