Dr A Sahu
Dept of Comp Sc & Engg.
IIT Guwahati
• NIC Cards
–Registers TX/RX, Statistics Counter
–Network Device Driver (Skeleton)
• Kernel Counter
–Jiffies, RTC, kernel timer
• File System, Block Devices
–An introduction
• There will be no class during Diwali week
• Wishing you a happy and safe Diwali
• The assignment will be uploaded to the course website
before this weekend.
• Assignment deadline: 13 Nov 2010
• The assignment carries 5 marks
• You have to show the demo on your lab machine
– No demo, no marks
– You will not get any marks for simply submitting the
assignment
• Memory-information registers
– TDBA(L/H) = Transmit-Descriptor Base-Address
Low/High (64-bits)
– TDLEN = Transmit-Descriptor array Length
– TDH = Transmit-Descriptor Head
– TDT = Transmit-Descriptor Tail
• Transmit-engine control registers
– TXDCTL = Transmit-Descriptor Control Register
– TCTL = Transmit Control Register
• Notification timing registers
– TIDV = Transmit Interrupt Delay Value
– TADV = Transmit-interrupt Absolute Delay Value
• Memory-information registers
– RDBA(L/H) = Receive-Descriptor Base-Address Low/High (64-bits)
– RDLEN = Receive-Descriptor array Length
– RDH = Receive-Descriptor Head
– RDT = Receive-Descriptor Tail
• Receive-engine control registers
– RXDCTL = Receive-Descriptor Control Register
– RCTL = Receive Control Register
• Notification timing registers
– RDTR = Receive-interrupt packet Delay Timer
– RADV = Receive-interrupt Absolute Delay Value
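The head and tail registers in both lists describe a circular array of descriptors. The index arithmetic can be sketched in user space as below. This is a minimal model, assuming the usual convention that the NIC advances the head, the driver advances the tail, and one slot is left unused so that head == tail means "empty". The type desc_ring and the ring_* helpers are hypothetical names for illustration, not part of any 82573L driver.

```c
#include <assert.h>

/* Hypothetical software model of a descriptor ring (not driver code). */
struct desc_ring {
    unsigned int count; /* number of descriptors in the array */
    unsigned int head;  /* next descriptor the NIC will process (TDH/RDH) */
    unsigned int tail;  /* one past the last descriptor handed to the NIC (TDT/RDT) */
};

/* Descriptors currently owned by the NIC: those in [head, tail). */
unsigned int ring_pending(const struct desc_ring *r)
{
    return (r->tail + r->count - r->head) % r->count;
}

/* Descriptors the driver may still fill; one slot stays unused. */
unsigned int ring_free(const struct desc_ring *r)
{
    return r->count - 1 - ring_pending(r);
}

/* Driver side: hand one more descriptor to the NIC. Returns 0, or -1 if full. */
int ring_post(struct desc_ring *r)
{
    assert(r->count > 1);
    if (ring_free(r) == 0)
        return -1;
    r->tail = (r->tail + 1) % r->count;
    return 0;
}

/* Simulated hardware side: finish the descriptor at head, if any. */
void ring_complete(struct desc_ring *r)
{
    if (ring_pending(r) > 0)
        r->head = (r->head + 1) % r->count;
}
```

With an 8-entry ring, seven ring_post() calls succeed and the eighth fails until ring_complete() frees a slot.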
• The 82573L has several dozen statistical
counters which automatically operate to keep
track of significant events affecting the
ethernet controller’s performance
• Most are 32-bit ‘read-only’ registers, and they
are automatically cleared when read
• Your module’s initialization routine could read
them all (to start counting from zero)
• The statistical counters all have address offsets in the range 0x04000 – 0x04FFF
• You can use a very simple program-loop to
‘clear’ each of these read-only registers
// Here ‘io’ is the virtual base-address
// of the nic’s i/o-memory region
{
    int r;
    // clear all of the Pro/1000 controller’s statistical counters
    for (r = 0x4000; r < 0x4FFF; r += 4) ioread32( io + r );
}
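The clear-on-read behaviour can be tried out without hardware. The sketch below is a user-space simulation only: fake_stats and fake_ioread32() are hypothetical stand-ins for the NIC's register block and for ioread32(), and the loop has the same shape as the one above.

```c
#include <assert.h>
#include <stdint.h>

#define STATS_BASE 0x4000u
#define STATS_END  0x5000u                        /* one past 0x4FFF */
#define STATS_REGS ((STATS_END - STATS_BASE) / 4) /* 32-bit registers in the range */

/* Hypothetical stand-in for the NIC's statistics block. */
uint32_t fake_stats[STATS_REGS];

/* Models a clear-on-read register: return the count, then zero it. */
uint32_t fake_ioread32(uint32_t offset)
{
    assert(offset >= STATS_BASE && offset < STATS_END && offset % 4 == 0);
    uint32_t idx = (offset - STATS_BASE) / 4;
    uint32_t value = fake_stats[idx];
    fake_stats[idx] = 0;
    return value;
}

/* One 32-bit read per register; returns how many reads were issued. */
unsigned int clear_all_stats(void)
{
    unsigned int reads = 0;
    for (uint32_t r = STATS_BASE; r < STATS_END; r += 4) {
        (void)fake_ioread32(r);
        reads++;
    }
    return reads;
}
```

Reading a simulated counter twice returns its value the first time and zero the second time, which is why a single pass over the range is enough to start counting from zero.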
Offset   Name      Description
0x4000   CRCERRS   CRC Errors Count
0x400C   RXERRC    Receive Error Count
0x4014   SCC       Single Collision Count
0x4018   ECOL      Excessive Collision Count
0x4074   GPRC      Good Packets Received
0x4078   BPRC      Broadcast Packets Received
0x407C   MPRC      Multicast Packets Received
0x40D0   TPR       Total Packets Received
0x40D4   TPT       Total Packets Transmitted
0x40F0   MPTC      Multicast Packets Transmitted
0x40F4   BPTC      Broadcast Packets Transmitted
• loopback.c, plip.c, e100.c are examples of
network drivers: /drivers/net/
• Device registration:
– Allocate net devices (request resources and offer
facilities)
• struct net_device *snull_dev[2]; // linux/netdevice.h
• snull_dev[0] = alloc_netdev(sizeof(struct snull_priv),
"sn%d", snull_init);
• alloc_etherdev(int sizeof_priv); // wrapper around
alloc_netdev
– After initialization is complete, register the devices
• register_netdev(snull_dev[i]); // returns 0 on success, a negative error code on failure
• struct snull_priv *priv = netdev_priv(dev);
struct snull_priv {
    struct net_device_stats stats;
    int status;
    struct snull_packet *ppool;
    struct snull_packet *rx_queue;
    int rx_int_enabled, tx_packetlen;
    u8 *tx_packetdata;
    struct sk_buff *skb;
    spinlock_t lock;
};
• Initialization
priv = netdev_priv(dev);
memset(priv, 0, sizeof(struct snull_priv));
spin_lock_init(&priv->lock);
snull_rx_ints(dev, 1); // enable receive interrupts
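The idea behind alloc_netdev() plus netdev_priv() is that the driver's private structure lives in the same allocation as the device structure. The following user-space sketch illustrates that layout only. All names (fake_netdev, fake_priv, fake_alloc_netdev, fake_netdev_priv) are hypothetical, and passing the unit number explicitly is a simplification made here; the kernel fills in the "%d" itself.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Forces the private area to a conservative alignment. */
typedef union { long long ll; long double ld; void *p; } fake_align_t;

/* Hypothetical, much-reduced stand-in for struct net_device. */
struct fake_netdev {
    char name[16];
    size_t priv_size;
    fake_align_t priv[];   /* private data follows the device struct */
};

/* Example private data, loosely modelled on snull_priv. */
struct fake_priv {
    int status;
    int rx_int_enabled;
};

/* One zeroed allocation holds both the device and its private area. */
struct fake_netdev *fake_alloc_netdev(size_t sizeof_priv, const char *prefix, int unit)
{
    assert(prefix != NULL);
    struct fake_netdev *dev = calloc(1, sizeof(*dev) + sizeof_priv);
    if (!dev)
        return NULL;
    dev->priv_size = sizeof_priv;
    snprintf(dev->name, sizeof(dev->name), "%s%d", prefix, unit);
    return dev;
}

/* Analogue of netdev_priv(): pointer to the private area. */
void *fake_netdev_priv(struct fake_netdev *dev)
{
    return dev->priv;
}
```

Because the allocation is zeroed, the private fields start at 0, and a single free() of the device releases the private data as well.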
• Global Information
– name: name of the device
– state: state of the device
– struct net_device *next; // pointer to the next device in the global list
– init: an initialization function called by register_netdev()
• Hardware Information
• Interface Information
• Device methods
• Low-level hardware information
• base_addr: I/O base address of the network interface
• unsigned char irq: the assigned interrupt number
(dev->irq), as shown by ifconfig
• unsigned char if_port: the port in use on a multiport
device (e.g. 10base)
• unsigned char dma; // DMA channel allocated by the device, for the ISA bus
• Device memory information: address of shared
memory used by the devices
– rmem (rx memory), mem (tx memory)
– rmem_start, rmem_end, mem_start, mem_end
• init sets up most of the information, but device-specific
information needs to be set up later on
• Non-Ethernet interfaces can use helper functions
– fc_setup, ltalk_setup, fddi_setup
– Fibre Channel, LocalTalk, Fiber Distributed Data Interface, Token
Ring, High-Performance Parallel Interface (hippi_setup)
• Non-default interface fields
– hard_header_len, mtu (maximum transfer unit = 1500 octets),
tx_queue_len (Ethernet = 1000, plip = 10), unsigned short type,
unsigned char addr_len; unsigned char dev_addr[MAX_ADDR_LEN],
broadcast[MAX_ADDR_LEN]
• Flags (bit mask): loopback, debug, noarp,
multicast
• Special hardware capabilities the device has: DMA
• Fundamental methods
– open, stop, hard_start_xmit
– hard_header, rebuild_header
– tx_timeout, get_stats, set_config
• Optional methods
– poll, poll_controller, do_ioctl, set_multicast_list
– set_mac_address, change_mtu, header_cache,
header_cache_update, hard_header_parse
• Utility fields (not methods)
– trans_start, last_rx, watchdog_timeo, *priv,
mc_list, mc_count, xmit_lock, xmit_lock_owner
• PIT
• Jiffies: a global timing counter variable
• User space timing
• Timer interrupt ISR
• do_timer()
• Accurate timing crucial for many aspects of OS
– Device-related timeouts
– File timestamps (created, accessed, written)
– Time-of-day (gettimeofday())
– High-precision timers (code profiling, etc.)
– Scheduling, cpu usage, etc.
• Intel timer hardware
– RTC: Real Time Clock
– PIT: Programmable Interval Timer
– TSC: TimeStamp Counter (cycle counter)
– Local APIC Timer: per-cpu alarms
• Timer implementations
– Kernel timers (dynamic timers)
– User “interval” timers (alarm(), setitimer())
• Need timing measurements to:
– Keep track of current time and date for use by
e.g. gettimeofday().
– Maintain timers that notify the kernel or a user
program that an interval of time has elapsed.
• Timing measurements are performed by
several hardware circuits, based on fixed
frequency oscillators and counters.
• Kernel keeps time by reading a clock device
(oscillator) and maintaining a kernel variable with
the current time
• Current time accessible to user-mode programs via
system calls
• gettimeofday() is the usual interface to the current
time maintained by system.
• Same is also used to determine when the currently
running process should be removed from CPU to
let others run
• Also used to keep track of the amount of time a
process runs in user or supervisor mode!
#include <sys/time.h>
struct timeval theTime;
gettimeofday(&theTime, NULL);
//Definition of struct timeval:
struct timeval {
long tv_sec;
long tv_usec;
};
The date command: this
command gives the time
according to the Gregorian
(modern Christian) calendar.
• The clock ISR
– timer_interrupt() in file arch/i386/kernel/time.c calls
– do_timer( ) function in file kernel/sched.c
• Increments a counter in the kernel variable called jiffies
each time the function (do_timer( )) runs.
• do_timer( ) then marks TIMER_BH (bottom-half) for
execution in the ret_from_sys_call
• For the system time, the timer bottom half uses the
current value of kernel variable jiffies to compute the
current time. It stores the value in struct timeval xtime,
can be read by kernel functions
– sys_gettimeofday( )
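The relation between a tick count and a seconds/microseconds pair is plain unit conversion. The sketch below shows only that arithmetic; it is not how the kernel maintains xtime (which is advanced tick by tick), and simple_time and ticks_to_time() are hypothetical names.

```c
#include <assert.h>

/* Hypothetical seconds/microseconds pair, similar in spirit to struct timeval. */
struct simple_time {
    long sec;
    long usec;
};

/* Convert a tick count to seconds and microseconds, given hz ticks per second. */
struct simple_time ticks_to_time(unsigned long ticks, unsigned long hz)
{
    struct simple_time t;
    assert(hz > 0 && hz <= 1000000UL);
    t.sec  = (long)(ticks / hz);
    /* remainder ticks scaled to microseconds; 64-bit math avoids overflow */
    t.usec = (long)((unsigned long long)(ticks % hz) * 1000000ULL / hz);
    return t;
}
```

At 100 ticks per second, 250 ticks correspond to 2 s and 500000 us, and a single tick is 10000 us.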
• Real-Time Clock (RTC):
– Often integrated with CMOS RAM on separate
chip from CPU: e.g., Motorola 146818.
– Issues periodic interrupts on IRQ line (IRQ 8) at
programmed frequency (e.g., 2-8192 Hz).
– In Linux, used to derive time and date.
– Kernel accesses RTC through 0x70 and 0x71 I/O
ports.
• Intel Pentium (and up), AMD K6 etc incorporate a
TSC.
• Processor’s CLK pin receives a signal from an
external oscillator e.g., 400 MHz crystal.
• TSC register is incremented at each clock signal.
• The rdtsc assembly instruction reads the 64-bit timing value.
• Most accurate timing method on above
platforms.
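Two TSC readings only give a cycle count; turning that into time needs the clock frequency. A minimal sketch of the conversion follows, assuming the frequency is known in MHz (for example the 400 MHz figure above). cycles_to_ns() is a hypothetical helper, and the rdtsc read itself is left out so the code stays portable.

```c
#include <assert.h>
#include <stdint.h>

/* Convert the difference of two cycle-counter readings to nanoseconds. */
uint64_t cycles_to_ns(uint64_t start, uint64_t end, uint64_t cpu_mhz)
{
    assert(cpu_mhz > 0);
    uint64_t cycles = end - start;    /* unsigned subtraction tolerates one wrap */
    return cycles * 1000u / cpu_mhz;  /* cycles / (MHz) = microseconds; x1000 = ns */
}
```

At 400 MHz each cycle lasts 2.5 ns, so 400 cycles come out as 1000 ns.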
• Programmable Interval Timers (PITs):
– e.g., 8254 chip. Already discussed
• PIT issues timer interrupts at programmed
frequency.
• In Linux, the PC-based 8254 is programmed to
interrupt HZ (=100) times per second on IRQ 0.
– HZ is defined in <linux/param.h>
– PIT is accessed on ports 0x40-0x43.
• Provides the system “heartbeat” or “clock tick”.
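Programming the 8254 for a given tick rate means loading its 16-bit counter with the input clock divided by the desired frequency. The sketch below assumes the usual PC input clock of about 1.193182 MHz; pit_latch() is a hypothetical helper that does a rounded division, similar in spirit to the kernel's LATCH constant.

```c
#include <assert.h>

#define PIT_INPUT_HZ 1193182UL  /* assumed 8254 input clock on PCs, ~1.193182 MHz */

/* Reload value for the 16-bit PIT counter giving roughly hz interrupts per second. */
unsigned long pit_latch(unsigned long hz)
{
    assert(hz > 0);
    unsigned long latch = (PIT_INPUT_HZ + hz / 2) / hz;  /* rounded division */
    assert(latch >= 1 && latch <= 65536);                /* must fit the 16-bit counter */
    return latch;
}
```

For HZ = 100 the reload value is 11932, and for 1000 ticks per second it is 1193. The 16-bit limit also explains why the PIT cannot tick much slower than about 18 times per second.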
• unsigned long volatile jiffies;
• global kernel variable (used by scheduler)
• initialized to zero when system reboots
• gets incremented during a timer interrupt
• so it counts ‘clock-ticks’ since cpu restart
• ‘tick-frequency’ is a ‘configuration’ option
• Won’t overflow for at least 16 months
• Linux kernel got modified to ‘fix’ overflow
• Now the declaration is in ‘linux/jiffies.h’:
unsigned long long jiffies_64;
and a new instruction in ‘do_timer()’
(*(u64*)&jiffies_64)++;
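The "16 months" figure follows from the counter width and the tick rate, and code that compares tick counts can be written so that it survives the wrap. The sketch below works that out for a 32-bit counter; days_until_wrap32() and tick_after32() are hypothetical names, and the comparison is written in the style of the kernel's time_after() macro.

```c
#include <assert.h>
#include <stdint.h>

/* Whole days until a 32-bit tick counter wraps at hz ticks per second. */
unsigned long days_until_wrap32(unsigned long hz)
{
    assert(hz > 0);
    return (unsigned long)((1ULL << 32) / hz / 86400ULL);
}

/*
 * Wrap-safe "a is later than b" for 32-bit tick counts.
 * Valid as long as the two values are less than 2^31 ticks apart.
 */
int tick_after32(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}
```

At 100 ticks per second the counter wraps after 497 days, which is a little over 16 months. A tick value of 5 taken just after the wrap still compares as later than 0xFFFFFFF0 taken just before it.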
• jiffies is incremented every timer
interrupt.
– Number of clock ticks since OS was booted.
• Scheduling and preemption done at
granularities of time-slices calculated in units
of jiffies.
• Every timer interrupt:
– Update jiffies.
– Determine how long a process has been executing
and preempt it, if it finishes its allocated timeslice.
– Update resource usage statistics.
– Invoke functions for elapsed interval timers.
• Signal on IRQ 0 is generated:
• timer_interrupt() is invoked w/ interrupts
disabled (SA_INTERRUPT flag is set to denote
this).
• do_timer() is ultimately executed:
– Simply increments jiffies & allocates other tasks
to “bottom half handlers”.
– Bottom half (bh) handlers update time and date,
statistics, execute fns after specific elapsed intervals
and invoke schedule() if necessary, for
rescheduling processes.
• lost_ticks (lost_ticks_system) store
total (system) “ticks” since update to xtime,
which stores approximate current time. This is
needed since bh handlers run at convenient time
and we need to keep track of when exactly they
run to accurately update date & time.
• TIMER_BH refers to the queue of bottom halves
invoked as a consequence of do_timer().
• Declare a timer: struct timer_list mytimer;
• Initialize this timer: init_timer( &mytimer );
mytimer.function = mytimeraction;
mytimer.data = (unsigned long)mydata;
mytimer.expires = <number-of-jiffies>;
• Install this timer: add_timer( &mytimer );
• Modify this timer: mod_timer( &mytimer, <jifs> );
• Delete this timer: del_timer( &mytimer );
• Delete it safely: del_timer_sync( &mytimer );
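The life cycle of a dynamic timer (arm it, let ticks go by, fire once at expiry, or delete it first) can be modelled in user space. The sketch below is only such a model: sim_timer and the sim_* functions are hypothetical and do not reproduce the kernel's timer data structures or locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model of a one-shot timer. */
struct sim_timer {
    unsigned long expires;            /* tick at which the timer should fire */
    void (*function)(unsigned long);  /* callback */
    unsigned long data;               /* argument passed to the callback */
    struct sim_timer *next;
};

static struct sim_timer *sim_pending;  /* unsorted list of armed timers */
int sim_fired_total;                   /* updated by the example callback */

/* Example callback: adds its argument to a global counter. */
void sim_example_action(unsigned long data)
{
    sim_fired_total += (int)data;
}

/* Arm a timer (models add_timer()). */
void sim_add_timer(struct sim_timer *t)
{
    assert(t != NULL && t->function != NULL);
    t->next = sim_pending;
    sim_pending = t;
}

/* Disarm a timer (models del_timer()); returns 1 if it was pending. */
int sim_del_timer(struct sim_timer *t)
{
    for (struct sim_timer **pp = &sim_pending; *pp; pp = &(*pp)->next) {
        if (*pp == t) {
            *pp = t->next;
            t->next = NULL;
            return 1;
        }
    }
    return 0;
}

/* Called once per simulated tick; fires expired timers, returns how many. */
int sim_run_timers(unsigned long now)
{
    int fired = 0;
    struct sim_timer **pp = &sim_pending;
    while (*pp) {
        struct sim_timer *t = *pp;
        if (now >= t->expires) {   /* wraparound ignored for simplicity */
            *pp = t->next;         /* one-shot: unlink before calling */
            t->next = NULL;
            t->function(t->data);
            fired++;
        } else {
            pp = &t->next;
        }
    }
    return fired;
}
```

A timer armed for tick 10 does nothing when sim_run_timers(9) is called and fires exactly once at tick 10; a timer removed with sim_del_timer() never fires.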
• RTC
– battery backed (packaged with CMOS RAM)
– registers to access current date/time (ports 0x70, 0x71)
– includes programmable timer (2-8192Hz)
– accessible as /dev/rtc
– sampled by kernel (only) on startup
– set by “clock” command (synched at shutdown)
• TSC time stamp (MSR: model-specific register)
– 64 bit counter increments at CPU cycle speed
– accessible via user space assembly instruction rdtsc
– provides high-resolution timing capability
– kernel determines frequency at boot (calibrate_tsc())
• PIT
– heartbeat timer; drives timer interrupt (tick)
– 100 Hz on PC; 1024 Hz on fast chips (alpha, itanium)
– patches to change clock speed via /proc!
– jiffies: # of ticks since boot
– xtime: struct with secs, usecs since Jan 1, 1970 (“epoch”)
• CPU Local (APIC) Timers
– when available does per-cpu timing (e.g. quantum)
– if not available, driven by PIT
– 32 bit (instead of PIT 16 bit) so lower frequency possible
– decrements in multiples of bus cycles (1, 2, 4, 8, .. 128)
• xtime.tv_sec, xtime.tv_usec
– seconds since Jan 1, 1970
• update_times()
– wall_jiffies: time of last xtime update
– update_wall_time(ticks) // handles usec wrap
– calc_load(ticks) // load average
void update_times(void) {
    unsigned long ticks;
    write_lock_irq(&xtime_lock);
    ticks = jiffies - wall_jiffies;   // ticks since the last xtime update
    if (ticks) {
        wall_jiffies += ticks;        // advance the last-update marker
        update_wall_time(ticks);
    }
    write_unlock_irq(&xtime_lock);
    calc_load(ticks);
}
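The load average computed by calc_load() is an exponentially decaying average kept in fixed point. The sketch below shows one update step of such an average. The constants (11 fractional bits, 1884 as the one-minute decay factor for a 5-second sampling interval) are assumptions made for this example, and load_step() is a hypothetical name.

```c
#include <assert.h>

#define FSHIFT  11               /* assumed bits of fractional precision */
#define FIXED_1 (1UL << FSHIFT)  /* 1.0 in fixed point */
#define EXP_1   1884UL           /* assumed decay factor, about exp(-5s/60s) * FIXED_1 */

/* One update step: new = old * decay + active * (1 - decay), in fixed point. */
unsigned long load_step(unsigned long load, unsigned long decay,
                        unsigned long active_tasks)
{
    assert(decay <= FIXED_1);
    unsigned long n = active_tasks * FIXED_1;
    return (load * decay + n * (FIXED_1 - decay)) >> FSHIFT;
}
```

Starting from a load of 0 with one runnable task, the first step gives 164/2048, about 0.08. A load of exactly 1.0 with one runnable task stays at 1.0, and with no runnable tasks it decays to 1884/2048, about 0.92.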
• checking cpu resource limits
– update user and kernel mode ticks for times()
– per_cpu_utime, per_cpu_stime
– over cpu limit? send SIGXCPU, SIGKILL
• updating system load averages <1.0 is good
– average tasks in run queue last 1, 5, 15 minutes
– includes UNINTERRUPTIBLE (but not pid 0)
• kernel profiling
– samples eip on each interrupt
– activated by kernel option profile=
– results exported via /proc/profile (readprofile command)
• NMI watchdogs (detecting system freeze)
– clever use of APIC to detect freezes (failure to re-enable
interrupts)
– broadcast NMI periodically, check for increasing interrupt
count!
• gettimeofday(): sec, usec
– delay since last bottom half (xtime update)
– delay since last interrupt (jiffies update)
– samples TSC if available for high precision
• settimeofday(): updates xtime (not RTC!); requires
root
• adjtimex(): gradual clock time change
• alarm(), setitimer()
– user mode interval timers
– three different timers
• Block Devices (Disk)
– Sector, inode
• File systems (Operations)
– Read/write, open,close, lseek, type
• Component in the kernel that handles filesystems, directory and file access.
• Abstracts common tasks of many file-systems.
• Presents the user with a unified interface, via
the file-related system calls (open, stat, chmod
etc.).
• Filesystem-specific operations:- vector them to
the filesystem in charge of the file.
• $ mount -t iso9660 -o ro /dev/cdrom /mnt/cdrom
• Steps involved:
– Find the file system.(file_systems list)
– Find the VFS inode of the directory that is to be the
new file system's mount point.
– Allocate a VFS superblock and call the file system
specific read_super function.
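The first mount step, finding the filesystem named by the -t option in the list of registered filesystems, amounts to a linked-list lookup by name. The sketch below models just that step in user space; fs_type, fs_register() and fs_find() are hypothetical names and carry none of the kernel's locking or module handling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced record for a registered filesystem. */
struct fs_type {
    const char *name;
    struct fs_type *next;
};

static struct fs_type *fs_list;  /* models the list of registered filesystems */

/* Register a filesystem; returns -1 if the name is already taken. */
int fs_register(struct fs_type *fs)
{
    assert(fs != NULL && fs->name != NULL);
    for (struct fs_type *p = fs_list; p; p = p->next)
        if (strcmp(p->name, fs->name) == 0)
            return -1;
    fs->next = fs_list;
    fs_list = fs;
    return 0;
}

/* Look a filesystem up by name; returns NULL if it is not registered. */
struct fs_type *fs_find(const char *name)
{
    assert(name != NULL);
    for (struct fs_type *p = fs_list; p; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}
```

A mount request for an unregistered type fails at this step, before any superblock is allocated.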
• Operations for block devices
• In include/linux/fs.h :
struct block_device_operations {
int (*open) (struct inode *, struct file *);
int (*release) (struct inode *, struct file *);
int (*ioctl) (struct inode *, struct file *, unsigned,
unsigned long);
int (*check_media_change) (kdev_t);
int (*revalidate) (kdev_t);
};
• In include/linux/blkdev.h :
typedef void (request_fn_proc) (request_queue_t *q);
• Provides common functionality for all block devices
in Linux
– Uniform interface (to file system)
e.g. bread( ) block_prepare_write( )
block_read_full_page( ), ll_rw_block( ) // low level
– buffer management and disk caching
– Block I/O requests scheduling
• Generates and queues actual I/O requests in a
request queue (per device)
– Individual device driver services this queue (likely interrupt
driven)
• Generic block device layer
– Generates and queues I/O request
– If the request queue is initially empty, schedule a plug_tq
tasklet into tq_disk task queue
• Asynchronous run of task queue tq_disk
– Run in a few places (e.g., in kswapd)
– Take a request from the queue and call the request_fn
function:
• q->request_fn(q);
• To service all I/O requests in the queue
• Typical interrupt-driven procedure
– Service the first request in the queue
– Set up hardware so it raises interrupt when it is done
– Return
• Interrupt handler tasklet
– Remove the just-finished request from the queue
– Re-enter the request service routine (to service the next)
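The interrupt-driven procedure above can be followed step by step in a small simulation. This is a sketch under simplifying assumptions: the queue is a fixed-size FIFO, the device is kicked as soon as a request is queued (no plugging), and "hardware completion" is a direct call to sim_interrupt(). All sim_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_MAX 16

/* Hypothetical model of a per-device request queue (FIFO of sector numbers). */
struct sim_queue {
    unsigned long sectors[QUEUE_MAX];
    unsigned int head, count;
    int busy;                 /* 1 while the "hardware" works on the head request */
    unsigned int completed;   /* requests finished so far */
};

/* request_fn analogue: start the first request if the device is idle. */
void sim_request_fn(struct sim_queue *q)
{
    if (q->busy || q->count == 0)
        return;
    q->busy = 1;  /* program the hardware; it will interrupt when done */
}

/* Queue a request and kick the device. Returns 0, or -1 if the queue is full. */
int sim_submit(struct sim_queue *q, unsigned long sector)
{
    assert(q != NULL);
    if (q->count == QUEUE_MAX)
        return -1;
    q->sectors[(q->head + q->count) % QUEUE_MAX] = sector;
    q->count++;
    sim_request_fn(q);
    return 0;
}

/* Interrupt handler analogue: retire the finished request, service the next. */
void sim_interrupt(struct sim_queue *q)
{
    if (!q->busy)
        return;  /* spurious interrupt */
    q->head = (q->head + 1) % QUEUE_MAX;
    q->count--;
    q->completed++;
    q->busy = 0;
    sim_request_fn(q);  /* re-enter the service routine */
}
```

After three submissions the device is busy with the first request; each simulated interrupt retires one request and starts the next, and the device goes idle once the queue is empty.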
• Device operation structure:
– static struct block_device_operations xxx_fops =
{
open: xxx_open,
release: xxx_release,
ioctl: xxx_ioctl,
check_media_change: xxx_check_change,
revalidate: xxx_revalidate,
owner: THIS_MODULE,
};
• Block device driver
– 1 class (Lect 36)
• Creative Sound blaster
– 1 class (Lect 37)
• USB2.0
– 2 class (Lect 38-39)
• Summary after mid-semester & question
patterns
– Last class (Lect 40)