Digging into ext2 - The Blavatnik School of Computer Science
Digging into ext2
Nezer J. Zaidenberg
Agenda
Linux kernel module programming (summary from TLDP)
Abstract (Virtual) in C
Initial code review and work on ext2
How to start working on ex 3 + more digging methods
Some clues on ex3
References
The linux kernel - www.kernel.org (or local mirror)
The linux documentation project – Kernel module programming – 2.6
www.tldp.org (http://www.tldp.org/LDP/lkmpg/2.6/html/lkmpg.html)
Under KERNEL tree
Documentation/Kbuild
fs/*.c
fs/ext2/*.c
UNIX filesystems
Understanding The Linux Kernel
The Linux Kernel (from tldp.org) has a fairly reasonable description of
ext2.h and file systems in general, but it refers to version 2.0 of Linux
(so the concepts are correct, but some files have different names!)
Hello world module (1/2) in kernel 2.6
#include <linux/module.h>	/* Needed by all modules */
#include <linux/kernel.h>	/* Needed for KERN_INFO */
#include <linux/init.h>		/* Needed for the macros */

static int __init hello_2_init(void)
{
	printk(KERN_INFO "Hello, world 2\n");
	return 0;
}
Hello world module (2/2)
static void __exit hello_2_exit(void)
{
	printk(KERN_INFO "Goodbye, world 2\n");
}

module_init(hello_2_init);
module_exit(hello_2_exit);
Makefile for kernel module
obj-m += hello-2.o

all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean
Makefile for multiple objects
obj-m += startstop.o
startstop-objs := start.o stop.o

all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean
Implementing abstract functions in C
class base
{
public:
	virtual void foo();
	virtual void bar();
	Widget W;
};
class derived : public base // Extends class base
{
public:
	void foo(); // Overrides base::foo; bar is inherited
};
Calling base and derived functions
base b;
derived d;
base * ptr=&b;
(*ptr).foo();
(*ptr).bar();
ptr=&d;
(*ptr).foo();
(*ptr).bar();
How does the compiler know
which function to call?
The Virtual implementation
When virtual functions are implemented, every object keeps a
pointer to a “virtual table”
The virtual table holds pointers to functions (one entry per
virtual function name)
When we call (*ptr).foo() the function call goes through the
virtual table and looks up the entry for foo.
The entry it finds points to the right function, which is then called.
What does it have to do with C?
Abstract functions in C
The fact that C has no virtual table built by the compiler for us does not mean
we cannot build one ourselves. (If you think it through, everything in C++/Java
compiles to assembler (close to C), which has no classes either.)
We just have to build the virtual table ourselves…
To every struct widget we can add a struct widget_operations * that holds
pointers to the functions that operate on a widget (and these can be implemented
any way we want).
Since the C compiler lacks the class and inheritance mechanism, we have to do
lots of things ourselves, such as:
Deliver the this pointer to the functions
Initialize the struct widget_operations
Call the function in a slightly awkward way: b.foo() becomes (*b.ops->foo)(&b)
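The three chores listed above can be tried out in plain user-space C. The following is a minimal sketch, not kernel code: struct widget, struct widget_operations, and the base_*/derived_* functions are hypothetical names chosen to mirror the C++ slide.

```c
#include <stdio.h>

struct widget;                        /* forward declaration */

/* The hand-built "virtual table": one function pointer per operation. */
struct widget_operations {
    int (*foo)(struct widget *self);
    int (*bar)(struct widget *self);
};

struct widget {
    const struct widget_operations *ops;  /* which implementation to use */
    int value;                            /* per-object data */
};

/* "base" implementations; self plays the role of C++'s this */
static int base_foo(struct widget *self) { return self->value; }
static int base_bar(struct widget *self) { return -self->value; }

/* "derived" overrides only foo */
static int derived_foo(struct widget *self) { return self->value * 2; }

const struct widget_operations base_ops = {
    .foo = base_foo,
    .bar = base_bar,
};

const struct widget_operations derived_ops = {
    .foo = derived_foo,
    .bar = base_bar,      /* "inherited": reuse the base implementation */
};

/* The same call site reaches different functions depending on ops. */
void widget_demo(void)
{
    struct widget b = { .ops = &base_ops, .value = 21 };
    struct widget d = { .ops = &derived_ops, .value = 21 };
    struct widget *ptr = &b;

    printf("%d %d\n", ptr->ops->foo(ptr), ptr->ops->bar(ptr));
    ptr = &d;
    printf("%d %d\n", ptr->ops->foo(ptr), ptr->ops->bar(ptr));
}
```

Calling widget_demo() prints "21 -21" for the base object and "42 -21" for the derived one. Keeping the table const and shared means each object pays for only one pointer, which is the same trade-off the kernel makes with its *_operations structs.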
The abstract interface
struct file_operations {
	struct module *owner;
	loff_t (*llseek) (struct file *, loff_t, int);
	ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
	ssize_t (*aio_read) (struct kiocb *, char __user *, size_t, loff_t);
	ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
	ssize_t (*aio_write) (struct kiocb *, const char __user *, size_t,
	…
The complete struct (and others) can be found in the kernel source tree under include/linux/fs.h
Filling the abstract interface (GCC)
struct file_operations fops = {
	read: device_read,
	write: device_write,
	open: device_open,
	release: device_release
};
Filling the abstract interface (C99)
struct file_operations fops = {
	.read = device_read,
	.write = device_write,
	.open = device_open,
	.release = device_release
};
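One consequence of the C99 form is worth seeing in isolation: members that are not named in the initializer are set to zero, so an unimplemented operation is simply a NULL pointer that the generic layer can test for. The sketch below is user-space C with hypothetical names (struct device, my_ops, device_release); it is not the kernel's dispatch code.

```c
#include <stddef.h>

struct device;   /* hypothetical object type */

struct device_operations {
    int (*open)(struct device *dev);
    int (*read)(struct device *dev);
    int (*release)(struct device *dev);
};

struct device {
    const struct device_operations *ops;
    int opened;
};

static int my_open(struct device *dev) { dev->opened = 1; return 0; }
static int my_read(struct device *dev) { return dev->opened ? 42 : -1; }

/* .release is deliberately left out, so it is initialized to NULL. */
const struct device_operations my_ops = {
    .open = my_open,
    .read = my_read,
};

/* Generic layer: call the operation if present, else a default action. */
int device_release(struct device *dev)
{
    if (dev->ops->release)
        return dev->ops->release(dev);
    dev->opened = 0;     /* default behaviour when no hook is supplied */
    return 0;
}
```

With a device whose ops point at my_ops, device_release() takes the default branch because my_ops.release is NULL.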
fs/ext2/super.c
static void __exit exit_ext2_fs(void)
{
	unregister_filesystem(&ext2_fs_type);
	destroy_inodecache();
	exit_ext2_xattr();
}

module_init(init_ext2_fs)
module_exit(exit_ext2_fs)
fs/ext2/super.c
static int __init init_ext2_fs(void)
{
	int err = init_ext2_xattr();
	if (err)
		return err;
	err = init_inodecache();
	if (err)
		goto out1;
	err = register_filesystem(&ext2_fs_type);
	if (err)
		goto out;
	return 0;
out:
	destroy_inodecache();
out1:
	exit_ext2_xattr();
	return err;
}
What have we just seen
Init the module
Exit
Register/unregister the file system
It would be perfectly reasonable for your module to just call
register_filesystem and unregister_filesystem in its init and
exit functions (you don’t REALLY need the inode cache, xattr,
etc.)
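The init function's goto labels undo the earlier steps in reverse order when a later step fails. A small user-space sketch of the same unwinding pattern follows; step_setup, step_teardown, and demo_init are hypothetical stand-ins, and the comments only indicate which ext2 call each step is modelled on.

```c
/* Hypothetical stand-ins for the three setup steps. Each returns 0 on
 * success or a negative error code; fail_step selects which one fails. */
static int fail_step;      /* 0 = no step fails */
static int live_resources; /* number of steps currently set up */

static int step_setup(int n)
{
    if (fail_step == n)
        return -n;
    live_resources++;
    return 0;
}

static void step_teardown(void)
{
    live_resources--;
}

int demo_init(int which_fails)
{
    int err;

    fail_step = which_fails;
    live_resources = 0;

    err = step_setup(1);            /* modelled on init_ext2_xattr() */
    if (err)
        return err;                 /* nothing to undo yet */
    err = step_setup(2);            /* modelled on init_inodecache() */
    if (err)
        goto out1;
    err = step_setup(3);            /* modelled on register_filesystem() */
    if (err)
        goto out;
    return 0;
out:
    step_teardown();                /* undo step 2 */
out1:
    step_teardown();                /* undo step 1 */
    return err;
}

int demo_live_resources(void)
{
    return live_resources;
}
```

Whichever step fails, demo_live_resources() returns 0 afterwards, which is the property the label ordering is meant to guarantee.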
Struct ext2_fs_type (fs/ext2/super.c)
static struct file_system_type ext2_fs_type = {
	.owner		= THIS_MODULE,
	.name		= "ext2",
	.get_sb		= ext2_get_sb,
	.kill_sb	= kill_block_super,
	.fs_flags	= FS_REQUIRES_DEV,
};
What have we just seen
We have 5 fields
.owner = THIS_MODULE (so while this file system is mounted the
module cannot be unloaded)
.name = "ext2" (the name that will be passed to mount
with -t)
.get_sb = how to get the super block (calls an ext2 function)
.kill_sb = how to free the super block (calls a generic Linux function)
.fs_flags = FS_REQUIRES_DEV (this file system lives on a block device)
ext2_get_sb (from fs/ext2/super.c)
static int ext2_get_sb(struct file_system_type *fs_type,
	int flags, const char *dev_name, void *data,
	struct vfsmount *mnt)
{
	return get_sb_bdev(fs_type, flags, dev_name,
			   data, ext2_fill_super, mnt);
}
What have we just seen
We call a standard Linux kernel function to get the super block
for a file system based on a BLOCK device. This function
initializes the block device.
We pass this function, as an argument, a function that fills the
file system's private super block (reads it from disk), called
ext2_fill_super
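The division of labour described above, generic code that prepares the device and then calls back into the file system, can be sketched in user space. Everything here is hypothetical and heavily simplified: struct toy_super, toy_get_sb, and uxfs_fill_super are illustration names, the "device" is a byte buffer, and the two-byte signature is made up.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-simplified superblock: not the kernel's struct. */
struct toy_super {
    const unsigned char *dev;  /* "block device": just a byte buffer here */
    size_t dev_size;
    const void *fs_info;       /* private data owned by the file system */
};

typedef int (*fill_super_fn)(struct toy_super *sb, void *data);

/* Generic code: sets up the device fields, then hands control to the
 * file system through the callback it was given. */
int toy_get_sb(const unsigned char *dev, size_t dev_size, void *data,
               fill_super_fn fill_super, struct toy_super *sb)
{
    int err;

    memset(sb, 0, sizeof(*sb));
    sb->dev = dev;
    sb->dev_size = dev_size;

    err = fill_super(sb, data);
    if (err)
        memset(sb, 0, sizeof(*sb));   /* undo the generic setup */
    return err;
}

/* File-system-specific callback: accepts the device only if it starts
 * with a made-up two-byte signature. */
int uxfs_fill_super(struct toy_super *sb, void *data)
{
    (void)data;                       /* mount options are unused here */
    if (sb->dev_size < 2 || sb->dev[0] != 'U' || sb->dev[1] != 'X')
        return -1;
    sb->fs_info = sb->dev;            /* remember where our data lives */
    return 0;
}
```

The generic function never needs to know the on-disk format; it only needs the callback's signature.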
Continuing the dig… fs/super.c
You’ll find get_sb_bdev at lines 751-813 (note that it’s in a different file).
This function does a lot that we don’t really care about (initializing the block device, etc.), but in lines 795-800 you will find:
795	error = fill_super(s, data, flags & MS_SILENT ? 1 : 0);
796	if (error) {
797		up_write(&s->s_umount);
798		deactivate_super(s);
799		goto error;
800	}
Let’s take it from there
Back to fs/ext2/super.c
static int ext2_fill_super(struct super_block *sb, void *data, int silent)
{
	struct buffer_head * bh;
	…
	sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);
	if (!sbi)
		return -ENOMEM;
	sb->s_fs_info = sbi;
	sbi->s_sb_block = sb_block;
	…
fs/ext2/super.c
	if (blocksize != BLOCK_SIZE) {
		logic_sb_block = (sb_block*BLOCK_SIZE) / blocksize;
		offset = (sb_block*BLOCK_SIZE) % blocksize;
	} else {
		logic_sb_block = sb_block;
	}
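The arithmetic in the listing above is easy to check in user space. This sketch wraps it in a hypothetical helper, locate_super, and uses TOY_BLOCK_SIZE in place of the kernel's BLOCK_SIZE, assumed here to be 1024 bytes.

```c
/* Stand-in for the kernel's BLOCK_SIZE (assumed to be 1 KiB here). */
#define TOY_BLOCK_SIZE 1024UL

struct sb_location {
    unsigned long logic_sb_block;  /* device block holding the superblock */
    unsigned long offset;          /* byte offset inside that block */
};

/* sb_block counts in TOY_BLOCK_SIZE units; blocksize is the size of the
 * blocks the device is actually read in. */
struct sb_location locate_super(unsigned long sb_block,
                                unsigned long blocksize)
{
    struct sb_location loc = { sb_block, 0 };

    if (blocksize != TOY_BLOCK_SIZE) {
        loc.logic_sb_block = (sb_block * TOY_BLOCK_SIZE) / blocksize;
        loc.offset = (sb_block * TOY_BLOCK_SIZE) % blocksize;
    }
    return loc;
}
```

For example, with sb_block = 1 and a 4096-byte device block, the superblock starts at byte 1024 of device block 0, so locate_super(1, 4096) yields logic_sb_block 0 and offset 1024. With 1024-byte device blocks it is simply block 1 at offset 0.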
fs/ext2/super.c
	if (!(bh = sb_bread(sb, logic_sb_block))) {
		printk ("EXT2-fs: unable to read superblock\n");
		goto failed_sbi;
	}
	/*
	 * Note: s_es must be initialized as soon as possible because
	 *       some ext2 macro-instructions depend on its value
	 */
	es = (struct ext2_super_block *) (((char *)bh->b_data) + offset);
	sbi->s_es = es;
fs/ext2/super.c
…
	if (sb->s_magic != EXT2_SUPER_MAGIC)
		goto cantfind_ext2;
	…
	sb->s_op = &ext2_sops;
	…
	root = ext2_iget(sb, EXT2_ROOT_INO);
fs/ext2/super.c
static const struct super_operations ext2_sops = {
	.alloc_inode	= ext2_alloc_inode,
	.destroy_inode	= ext2_destroy_inode,
	.write_inode	= ext2_write_inode,
	.delete_inode	= ext2_delete_inode,
	.put_super	= ext2_put_super,
	.write_super	= ext2_write_super,
	.statfs		= ext2_statfs,
	.remount_fs	= ext2_remount,
	.clear_inode	= ext2_clear_inode,
	.show_options	= ext2_show_options,
#ifdef CONFIG_QUOTA
	.quota_read	= ext2_quota_read,
	.quota_write	= ext2_quota_write,
#endif
};
What have we just seen
The function finds the exact block to read, based on the block
device block size and the super block block size
The function reads the super block via the sb_bread API for
block devices (the data is read into a struct buffer_head)
The function stores private data
The function checks the magic number (you should do this too)
The function sets the super block ops
The function reads the root directory inode
Saving private data
Linux struct super_block has a void * called s_fs_info
Linux struct inode has a void * called i_private
Those pointers can be used to save private data (your file
system structure)
You can get your data back through those pointers later
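A user-space sketch of the private-data pattern follows. struct generic_super stands in for the kernel's struct super_block, and struct uxfs_sb_info with its fields is made up for illustration; the UXFS_SB accessor is written in the spirit of ext2's EXT2_SB helper, which performs the same cast of s_fs_info.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the generic struct super_block. */
struct generic_super {
    unsigned long magic;
    void *s_fs_info;        /* opaque to generic code */
};

/* Private, file-system-specific information (made-up fields). */
struct uxfs_sb_info {
    unsigned long free_blocks;
    unsigned long free_inodes;
};

/* Typed accessor so the cast lives in exactly one place. */
static struct uxfs_sb_info *UXFS_SB(struct generic_super *sb)
{
    return sb->s_fs_info;
}

/* Called once while filling the super block. */
int uxfs_attach_info(struct generic_super *sb,
                     unsigned long blocks, unsigned long inodes)
{
    struct uxfs_sb_info *sbi = calloc(1, sizeof(*sbi));

    if (!sbi)
        return -1;
    sbi->free_blocks = blocks;
    sbi->free_inodes = inodes;
    sb->s_fs_info = sbi;
    return 0;
}

/* Any later operation can get the private data back from sb alone. */
unsigned long uxfs_free_blocks(struct generic_super *sb)
{
    return UXFS_SB(sb)->free_blocks;
}

/* Called when the super block is released. */
void uxfs_detach_info(struct generic_super *sb)
{
    free(sb->s_fs_info);
    sb->s_fs_info = NULL;
}
```

In the kernel the allocation would use kzalloc and kfree, as in the ext2_fill_super listing, rather than calloc and free.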
Include files and structs
include/linux/fs.h has most struct definitions, including
struct super_block (lines 1106-1176)
struct inode (lines 623-688)
struct block_device (lines 550-580)
struct buffer_head is found at line 60 of buffer_head.h
Operations
super_operations: fs.h, line 1358
inode_operations: fs.h, line 1311
file_operations: fs.h, line 1281
address_space_operations: fs.h, line 486 (for mmap)
Many times we will go from our file
system code to Linux code and back
Our uxfs_get_sb -> calls Linux get_sb_bdev -> which calls our
uxfs_fill_sb
Our inode ops -> use Linux do_sync_read -> which calls our
get_block
This makes sense, so don’t be surprised when you see it
How to start working (from scratch)
on ex3
Learn :
Build hello world module
Build a blank (do nothing) file system module
Think :
What are you going to hold in your file system? Think about
super block, Inodes
Implement :
Write mkfs.uxfs (a user program)
Write fsdebug (a user program that prints/manipulates your file
system)
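Since mkfs.uxfs and fsdebug are ordinary user programs, the super block format can be developed and tested before any kernel code exists. The sketch below formats and validates an in-memory image; the layout (three little-endian 32-bit fields at the start of block 0), the magic value, and all names are assumptions made for illustration, not a prescribed uxfs format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UXFS_MAGIC      0x58465355u  /* made-up magic number */
#define UXFS_BLOCK_SIZE 512u         /* made-up block size */

/* In-memory copy of the hypothetical on-disk super block. */
struct uxfs_super {
    uint32_t magic;
    uint32_t nblocks;
    uint32_t ninodes;
};

/* Fixed little-endian encoding keeps the image portable across CPUs. */
static void put_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}

static uint32_t get_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* mkfs side: zero the image and write the super block into block 0. */
int uxfs_mkfs(unsigned char *image, size_t image_size, uint32_t ninodes)
{
    if (image_size < UXFS_BLOCK_SIZE)
        return -1;
    memset(image, 0, image_size);
    put_le32(image + 0, UXFS_MAGIC);
    put_le32(image + 4, (uint32_t)(image_size / UXFS_BLOCK_SIZE));
    put_le32(image + 8, ninodes);
    return 0;
}

/* fsdebug side: decode block 0 and reject images with a wrong magic. */
int uxfs_read_super(const unsigned char *image, size_t image_size,
                    struct uxfs_super *out)
{
    if (image_size < UXFS_BLOCK_SIZE)
        return -1;
    out->magic = get_le32(image + 0);
    out->nblocks = get_le32(image + 4);
    out->ninodes = get_le32(image + 8);
    return out->magic == UXFS_MAGIC ? 0 : -1;
}
```

A real mkfs.uxfs would write the same bytes to the device file instead of a buffer, and the kernel module could then reuse the decoding logic when it fills its super block.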
Ex 3
Move to kernel
Make your file system module mountable (mount your
floppy)
(make sure you read SB correctly)
Implement directories (so we can ls the root of the module)
Implement files (via read/write/lseek)
Implement mmap
You’re done; you can work on the bonus now
More digging methods + some
advice
Install the kernel-devel-2.6.27.5 RPM from
ftp://mirror.isoc.org.il/fedora/releases/10/Fedora/i386/os/Packages/kernel-devel-2.6.27.5-117.fc10.i686.rpm
rpm -ivh --force kernel-devel-2.6.27.5-117.fc10.i686.rpm
Download the complete kernel source from www.kernel.org
You can add a printk to each function that ext2 calls, to see
which ones run in each operation as you traverse the file system
Some clues on EX3
You can use the UNIX filesystems solution (but some porting is required)
You can use the ext2 solution (which is a complete solution + all bonuses)
but is “encumbered” with many performance optimizations you don’t
need
If ext2 has its own implementation of a function you need -> you should
probably implement it as well.
If ext2 uses the Linux default implementation -> it should be OK for you too.
All ext2 functions are prefixed with ext2
Use something to help you navigate in the kernel (I use vi with ctags, grep
and cscope)
minix
Minix is a “mini-Unix” (an OS built by Andrew Tanenbaum for
educational purposes, which later inspired Linus Torvalds to
write Linux). It has a Unix-like file system MUCH simpler than
ext2.
I am NOT familiar enough with minix to recommend it, but
from what I browsed it has all the features you need for
the ex. (+ some extra) and is much simpler than ext2
The implementation of minixfs under Linux was authored
by Linus Torvalds in 1991-1992
Number of lines of code to
understand
Fs name                   Lines of code
ext4                      26404
ext3                      16
ext2                      8431
minix                     2243
What is required of you   ~1000 (+500 bonus)
For those of you who want to base
your work on MINIX FS
Ignore everything that has to do with minix versions (for
simplicity assume minix version 1)
Ignore everything that has to do with aio (your file system
is not required to support aio)
Ignore everything that has to do with inode_cache
(a performance optimization that is not required)
The rest is pretty much your homework… (with some
bonuses such as bitmaps)
More minix
Zones = blocks
Zmap = blocks map
Minix is documented (including the file system) in Andrew
Tanenbaum’s book Operating Systems (latest edition with
Albert Woodhull). This documentation describes the MINIX
code for the minixfs (which is different from the Linux code);
still, it can help if you don’t understand the structure
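A zone map of this kind is a bitmap with one bit per zone (block): 1 means in use, 0 means free. The sketch below shows the idea in user-space C; the map size and all names are hypothetical, and this is not the MINIX or Linux minixfs code.

```c
#include <limits.h>
#include <stddef.h>

#define ZMAP_ZONES 64u   /* hypothetical number of zones in the map */

struct zmap {
    unsigned char bits[ZMAP_ZONES / CHAR_BIT];
};

/* Find the first free zone, mark it used, and return its number.
 * Returns -1 when every zone is taken. */
long zmap_alloc(struct zmap *zm)
{
    for (size_t z = 0; z < ZMAP_ZONES; z++) {
        unsigned char mask = (unsigned char)(1u << (z % CHAR_BIT));

        if (!(zm->bits[z / CHAR_BIT] & mask)) {
            zm->bits[z / CHAR_BIT] |= mask;
            return (long)z;
        }
    }
    return -1;
}

/* Mark a zone free again; out-of-range zone numbers are ignored. */
void zmap_free(struct zmap *zm, size_t z)
{
    if (z < ZMAP_ZONES)
        zm->bits[z / CHAR_BIT] &= (unsigned char)~(1u << (z % CHAR_BIT));
}

/* Report whether a zone is currently marked as used. */
int zmap_in_use(const struct zmap *zm, size_t z)
{
    return z < ZMAP_ZONES && ((zm->bits[z / CHAR_BIT] >> (z % CHAR_BIT)) & 1);
}
```

Starting from an all-zero map, successive zmap_alloc calls hand out zones 0, 1, 2, and a freed zone is the first to be reused. An inode map can be handled the same way with one bit per inode.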
Some VMWARE and Linux admin
clues
Install with rpm -ivh (package you downloaded) (may require
--force if you are installing an older package)
yum install emacs (or eclipse or anything)