Final recitation - The Blavatnik School of Computer Science


File system 3
Nezer J. Zaidenberg
Agenda
• Administration (minhala)
• Final presentation of Linux FS code
• My personal summary of the semester
Review session (shiur hazara)
• 12/2/09 Dach 14:00-16:00
• We shall review all questions from exam sample and
oral exams
• Attendance is not required.
• But if you come bring food and soft drinks 
Userland reading
APUE e2
• Files and dirs - Chapter 3 (and a little of 4)
• Process - Chapter 7 + 8 + relevant subsections of 9 + 10 (signals) + 13 (Daemon)
• Threads and sync (11+12)
• I/O multiplexing and select (beginning of chapter 14)
• Sockets (Internet and UDS) - Chapter 16, 17.3 or Beej
APUE e1
• Chapter 3
• Chapter 7 (up to subsection 10)
• Chapter 8
• Chapter 9 (relevant subsections)
• Chapter 10
• Chapter 11 (select)
• Chapter 13
• Also Sockets + Threads
• Relevant subsections from chapters 4, 5
Kernel reading for the exam
• Understanding the Linux Kernel, 3rd edition, chapters:
  • Chapter 12. The Virtual Filesystem
  • Chapter 14. Block Device Drivers
  • Chapter 15. The Page Cache
  • Chapter 16. Accessing Files
  • Chapter 18. The Ext2 and Ext3 Filesystems
• UNIX filesystems
  • Chapter 10
  • Chapter 12
  • Chapter 14
• As kernel versions progress and the API changes (and given my desire to keep the course on the bleeding edge), not all APIs (such as XX_iget()) can be found in books. Check my slides.
Last train to Clarksville
• I’ve sent the questions of the oral exam to Amir; he will upload them on Sunday
• IF YOU HAVE SUBMITTED AND NOT BEEN TESTED, TODAY IS YOUR ABSOLUTELY POSITIVELY DEFINITELY REALLY LAST CHANCE
• Please approach me at 17:00 or between 15:00 and 16:00
• If you have not taken the oral exam, your grade in the exercise will be ZERO even if you have submitted!
Please return DVD’s
• Even though I created 10 DVDs for about 30 teams, each of which needed one for about 10 minutes, there are now 2 left, and much of the time there were 0 (ZERO).
• PLEASE: if you have taken a DVD from the envelope, return it!!!!
And back to digging
Links and mmap
What we should know
• Where to find kernel sources
• How to mount file systems
• Basic familiarity with the source code that creates, accesses and removes files and directories
• How to read from and write to a block device
• I’ve also given you a short manual on how to write a simple file system in memory
• We have only two gaps
  • Building links
  • Building directories
• Following a request we will also breeze through mkfs
mkfs
• What is mkfs supposed to do?
• mkfs just MAKES a blank file system on a device (in Windows we would call it FORMATTING a disk)
• Usually that means:
  • Initializing the super block (writing the magic number, marking blocks as empty etc.) and writing it to disk
  • Initializing the root directory and writing it to disk
  • Initializing other structs (bitmaps) and writing them to disk
  • Initializing other directories (lost+found) and writing them to disk
• Just to be clear: mkfs is a userland program. It should not be more than a few hundred lines of code.
Book mkfs 1/4
main(int argc, char **argv)
{
    … (variable definitions and sanity checks skipped)
    devfd = open(argv[1], O_WRONLY); // line 31
    error = lseek(devfd, (off_t)(nsectors * 512), SEEK_SET); // make sure the device is large enough // line 36
    lseek(devfd, 0, SEEK_SET); // line 42
    sb.s_magic = UX_MAGIC;
    sb.s_mod = UX_FSCLEAN;
    sb.s_nifree = UX_MAXFILES - 4;
    sb.s_nbfree = UX_MAXBLOCKS - 2;
    sb.s_inode[0] = UX_INODE_INUSE;
Book mkfs 2/4
    sb.s_inode[1] = UX_INODE_INUSE;
    sb.s_inode[2] = UX_INODE_INUSE;
    sb.s_inode[3] = UX_INODE_INUSE;
    for (i = 4; i < UX_MAXFILES; i++) {
        sb.s_inode[i] = UX_INODE_FREE;
    }
    sb.s_block[0] = UX_BLOCK_INUSE;
    sb.s_block[1] = UX_BLOCK_INUSE;
    for (i = 2; i < UX_MAXBLOCKS; i++) {
        sb.s_block[i] = UX_BLOCK_FREE;
    }
    write(devfd, (char *)&sb, sizeof(struct ux_superblock)); // line 89
Book mkfs 3/4
    time(&tm);
    memset((void *)&inode, 0, sizeof(struct ux_inode));
    inode.i_mode = S_IFDIR | 0755;
    inode.i_nlink = 3; /* ".", ".." and "lost+found" */
    inode.i_atime = tm;
    inode.i_mtime = tm;
    inode.i_ctime = tm;
    inode.i_uid = 0;
    inode.i_gid = 0;
    inode.i_size = UX_BSIZE;
    inode.i_blocks = 1;
    inode.i_addr[0] = UX_FIRST_DATA_BLOCK;
    lseek(devfd, UX_INODE_BLOCK * UX_BSIZE + 1024, SEEK_SET);
    write(devfd, (char *)&inode, sizeof(struct ux_inode)); /* write exactly one inode */
    // the process repeats itself
Book mkfs 4/4
    /* zero the root directory's data block, then rewind to its start */
    lseek(devfd, UX_FIRST_DATA_BLOCK * UX_BSIZE, SEEK_SET);
    memset((void *)&block, 0, UX_BSIZE);
    write(devfd, block, UX_BSIZE);
    lseek(devfd, UX_FIRST_DATA_BLOCK * UX_BSIZE, SEEK_SET);
    /* the three entries of the root directory */
    dir.d_ino = 2;
    strcpy(dir.d_name, ".");
    write(devfd, (char *)&dir, sizeof(struct ux_dirent));
    dir.d_ino = 2;
    strcpy(dir.d_name, "..");
    write(devfd, (char *)&dir, sizeof(struct ux_dirent));
    dir.d_ino = 3;
    strcpy(dir.d_name, "lost+found");
    write(devfd, (char *)&dir, sizeof(struct ux_dirent));
mmap
• mmap is basically read and write, but it works with pages
• A new operations struct is defined to deal with pages
• Reading and writing pages is just like reading and writing blocks, and is implemented in a similar way
• Let’s take a look at ext2
Fs/ext2/inode.c
const struct address_space_operations ext2_aops = {
    .readpage              = ext2_readpage,
    .readpages             = ext2_readpages,
    .writepage             = ext2_writepage,
    .sync_page             = block_sync_page,
    .write_begin           = ext2_write_begin,
    .write_end             = generic_write_end,
    .bmap                  = ext2_bmap,
    .direct_IO             = ext2_direct_IO,
    .writepages            = ext2_writepages,
    .migratepage           = buffer_migrate_page,
    .is_partially_uptodate = block_is_partially_uptodate,
};
The address space struct
• The code is from fs/ext2/inode.c
• I removed some if-clauses
• Basically we set the mmap function pointers just like we set the inode operations

if (S_ISREG(inode->i_mode)) {
    inode->i_op = &ext2_file_inode_operations;
    inode->i_mapping->a_ops = &ext2_aops;
    inode->i_fop = &ext2_file_operations;
} else if (S_ISDIR(inode->i_mode)) {
    inode->i_op = &ext2_dir_inode_operations;
    inode->i_fop = &ext2_dir_operations;
    inode->i_mapping->a_ops = &ext2_aops;
}
Fs/ext2/inode.c
static int ext2_writepage(struct page *page, struct writeback_control *wbc)
{
    return block_write_full_page(page, ext2_get_block, wbc);
}

static int ext2_readpage(struct file *file, struct page *page)
{
    return mpage_readpage(page, ext2_get_block);
}
Fs/ext2/inode.c
static int
ext2_readpages(struct file *file, struct address_space *mapping,
               struct list_head *pages, unsigned nr_pages)
{
    return mpage_readpages(mapping, pages, nr_pages, ext2_get_block);
}
translation
• Linux functions are called to get the memory page. Those functions get a pointer to an ext2 function that ACTUALLY does the block reading from disk. Once the block is read, it is placed in memory page(s) by the kernel.
• The kernel functions that we call, mpage_readpage(s), are found at lines 370-420 of fs/mpage.c; block_write_full_page is at lines 2859-2894 of fs/buffer.c
• I will not delve into those functions because of time limitations. (Those functions call other functions and are beyond our time limits. However, they are contained inside those files and are not really required.)
• Take note that readpage is the same function that would be called when we read(2)
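From userland, the claim that mmap is "read and write with pages" can be checked directly. The following is a minimal sketch, not taken from the slides: the function name mmap_matches_read and the scratch file under /tmp are assumptions made for illustration. It writes a short message to a temporary file, maps the file, and compares the mapped bytes with what pread(2) returns.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations under strict -std modes */
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 0 if the bytes seen through mmap(2) equal those returned by
 * pread(2), 1 if they differ, -1 if any system call fails. */
int mmap_matches_read(void)
{
    char path[] = "/tmp/mmap_demo_XXXXXX";   /* hypothetical scratch file */
    const char msg[] = "hello from the page cache";
    char buf[sizeof(msg)];
    int result = -1;

    int fd = mkstemp(path);                  /* opened read/write */
    if (fd < 0)
        return -1;
    unlink(path);                            /* file lives until fd is closed */

    if (write(fd, msg, sizeof(msg)) != (ssize_t)sizeof(msg))
        goto out;

    /* Map the file; touching the mapping makes the kernel supply the page. */
    char *map = mmap(NULL, sizeof(msg), PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out;

    /* Read the same range the classic way and compare. */
    if (pread(fd, buf, sizeof(buf), 0) == (ssize_t)sizeof(buf))
        result = memcmp(map, buf, sizeof(msg)) != 0;

    munmap(map, sizeof(msg));
out:
    close(fd);
    return result;
}
```

Calling mmap_matches_read() from a small main() and running it under strace shows the mmap and pread64 system calls side by side.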
Links
• HARD LINKS
  • Just another pointer (in a directory) to the same file
  • (no new inode)
• SOFT (symbolic) link (bonus in ex.)
  • The link’s block contains the path of the file we point to; the kernel does the rest
• Let’s dig into ext2 (hard links first)
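Before reading the kernel side, the "no new inode" point can be observed from userland. This is a small sketch under assumptions: the function name hard_link_shares_inode and the temporary directory under /tmp are made up for the example. It creates a file, adds a second name with link(2), and checks with stat(2) that both names report the same inode number and a link count of 2.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations under strict -std modes */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if both names share one inode with st_nlink == 2,
 * 0 if they do not, -1 if the setup fails. */
int hard_link_shares_inode(void)
{
    char dir[] = "/tmp/link_demo_XXXXXX";    /* hypothetical scratch dir */
    char a[64], b[64];
    struct stat sa, sb;
    int ok = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(a, sizeof(a), "%s/original", dir);
    snprintf(b, sizeof(b), "%s/alias", dir);

    int fd = open(a, O_CREAT | O_WRONLY | O_EXCL, 0600);
    if (fd >= 0) {
        close(fd);
        /* link(2) adds a directory entry; it does not allocate an inode. */
        if (link(a, b) == 0 && stat(a, &sa) == 0 && stat(b, &sb) == 0)
            ok = (sa.st_ino == sb.st_ino && sa.st_nlink == 2);
        unlink(b);
        unlink(a);
    }
    rmdir(dir);
    return ok;
}
```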
The struct
const struct inode_operations ext2_dir_inode_operations = {
    .create     = ext2_create,
    .lookup     = ext2_lookup,
    .link       = ext2_link,
    .unlink     = ext2_unlink,
    .symlink    = ext2_symlink,
    .mkdir      = ext2_mkdir,
    .rmdir      = ext2_rmdir,
    .mknod      = ext2_mknod,
    .rename     = ext2_rename,
    /* … */
    .setattr    = ext2_setattr,
    .permission = ext2_permission,
};
translation
• The struct can be found in fs/ext2/namei.c
• The relevant functions are called when we link(2) and unlink(2)
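The way a generic layer reaches a file system's functions through such a struct can be modelled in plain userland C. The sketch below is a toy, not kernel code: every toy_ name is hypothetical, and the -EPERM result for a missing handler is an assumption about how a generic layer might behave.

```c
#include <errno.h>
#include <stddef.h>

struct toy_inode;

/* A table of function pointers, filled in by each "file system". */
struct toy_inode_operations {
    int (*link)(struct toy_inode *inode);
    int (*unlink)(struct toy_inode *inode);
};

struct toy_inode {
    unsigned nlink;                              /* number of names */
    const struct toy_inode_operations *i_op;     /* who handles this inode */
};

static int toy_link(struct toy_inode *inode)   { inode->nlink++; return 0; }
static int toy_unlink(struct toy_inode *inode) { inode->nlink--; return 0; }

/* Designated initializers, in the same style as ext2_dir_inode_operations. */
const struct toy_inode_operations toy_dir_iops = {
    .link   = toy_link,
    .unlink = toy_unlink,
};

/* The generic layer only knows the table, never the concrete function. */
int toy_vfs_link(struct toy_inode *inode)
{
    if (inode->i_op == NULL || inode->i_op->link == NULL)
        return -EPERM;                           /* operation not supported */
    return inode->i_op->link(inode);
}
```

An inode whose i_op points at toy_dir_iops gets its link count bumped by toy_vfs_link(); an inode with no table gets -EPERM back.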
Fs/ext2/namei.c
static int ext2_link (struct dentry * old_dentry, struct inode * dir,
                      struct dentry *dentry)
{
    struct inode *inode = old_dentry->d_inode;

    if (inode->i_nlink >= EXT2_LINK_MAX)
        return -EMLINK;

    inode->i_ctime = CURRENT_TIME_SEC;
    inode_inc_link_count(inode);
    atomic_inc(&inode->i_count);

    return ext2_add_nondir(dentry, inode);
}
Fs/ext2/namei.c
static inline int ext2_add_nondir(struct dentry *dentry, struct inode *inode)
{
    int err = ext2_add_link(dentry, inode);
    if (!err) {
        d_instantiate(dentry, inode);
        return 0;
    }
    inode_dec_link_count(inode);
    iput(inode);
    return err;
}
Fs/ext2/dir.c
int ext2_add_link (struct dentry *dentry, struct inode *inode)
{
    // find page
    /* … */
    if (de->inode) {
        ext2_dirent *de1 = (ext2_dirent *) ((char *) de + name_len);
        de1->rec_len = ext2_rec_len_to_disk(rec_len - name_len);
        de->rec_len = ext2_rec_len_to_disk(name_len);
        de = de1;
    }
    /* … */
    mark_inode_dirty(dir);
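The splitting step above (shrink an in-use entry to its minimum size and place the new entry in the freed tail) can be tried in userland. The following sketch is a simplified model and not the ext2 code: the toy_ names, the 8-byte header, the 4-byte rounding and the single-block directory are assumptions made for the example.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct toy_dirent {            /* hypothetical layout modelled on ext2's */
    uint32_t inode;            /* 0 means "entry not in use" */
    uint16_t rec_len;          /* distance to the next entry */
    uint8_t  name_len;
    uint8_t  file_type;
    char     name[];           /* not NUL-terminated */
};

/* Bytes an entry needs: 8-byte header plus name, rounded up to 4. */
static unsigned toy_rec_len(unsigned name_len)
{
    return (8 + name_len + 3) & ~3u;
}

/* Allocate an empty directory block: one unused entry spanning the block. */
char *toy_new_block(unsigned size)
{
    char *block = calloc(1, size);
    if (block != NULL)
        ((struct toy_dirent *)block)->rec_len = (uint16_t)size;
    return block;
}

/* Returns the byte offset of the new entry, or -1 if there is no room. */
int toy_add_link(char *block, unsigned size, const char *name, uint32_t ino)
{
    unsigned namelen = (unsigned)strlen(name);
    unsigned need = toy_rec_len(namelen);
    unsigned off = 0;

    while (off + 8 <= size) {
        struct toy_dirent *de = (struct toy_dirent *)(block + off);
        unsigned rec_len = de->rec_len;
        unsigned used = de->inode ? toy_rec_len(de->name_len) : 0;

        if (rec_len == 0)
            return -1;                       /* corrupt block */
        if (rec_len >= used + need) {
            if (de->inode) {                 /* split: keep old, use its tail */
                struct toy_dirent *de1 =
                    (struct toy_dirent *)((char *)de + used);
                de1->rec_len = (uint16_t)(rec_len - used);
                de->rec_len = (uint16_t)used;
                de = de1;
            }
            de->inode = ino;
            de->name_len = (uint8_t)namelen;
            de->file_type = 0;
            memcpy(de->name, name, namelen);
            return (int)((char *)de - block);
        }
        off += rec_len;
    }
    return -1;
}
```

Adding ".", ".." and "lost+found" to a fresh 64-byte block places them at offsets 0, 12 and 24; the last entry always owns the remaining slack.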
Using kernel API
d_instantiate (9)
Name
d_instantiate - fill in inode information for a dentry
Synopsis
void d_instantiate(struct dentry * entry, struct inode * inode);
Arguments
entry: dentry to complete
inode: inode to attach to this dentry
Description
Fill in inode information in the entry.
This turns negative dentries into productive full members of society.
NOTE! This assumes that the inode count has been incremented (or otherwise set) by the caller to indicate that it is now in use by the dcache.
Symbolic links
• Briefly… they are created using the symlink(2) syscall
• From fs/ext2/inode.c

if (ext2_inode_is_fast_symlink(inode))
    inode->i_op = &ext2_fast_symlink_inode_operations;
The symlink functions
const struct inode_operations ext2_dir_inode_operations = {
    .create     = ext2_create,
    .lookup     = ext2_lookup,
    .link       = ext2_link,
    .unlink     = ext2_unlink,
    .symlink    = ext2_symlink,
    .mkdir      = ext2_mkdir,
    .rmdir      = ext2_rmdir,
    .mknod      = ext2_mknod,
    .rename     = ext2_rename,
    /* … */
    .setattr    = ext2_setattr,
    .permission = ext2_permission,
};
fs/ext2/namei.c
static int ext2_symlink (struct inode * dir, struct dentry * dentry,
                         const char * symname)
{
    struct super_block * sb = dir->i_sb;
    int err = -ENAMETOOLONG;
    unsigned l = strlen(symname)+1;
    struct inode * inode;

    if (l > sb->s_blocksize)
        goto out;

    inode = ext2_new_inode (dir, S_IFLNK | S_IRWXUGO);
    err = PTR_ERR(inode);
    if (IS_ERR(inode))
        goto out;

    if (l > sizeof (EXT2_I(inode)->i_data)) {
        /* slow symlink */
        inode->i_op = &ext2_symlink_inode_operations;
        /* … */
        inode->i_mapping->a_ops = &ext2_aops;
        err = page_symlink(inode, symname, l);
        if (err)
            goto out_fail;
    } /* … else branch (fast symlink) not shown on the slide */
    mark_inode_dirty(inode);

    err = ext2_add_nondir(dentry, inode);
out:
    return err;
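The userland view of the same mechanism can be sketched as follows. This is an illustration under assumptions, not part of the slides: the function names and the /tmp scratch directory are hypothetical, and the 60-byte figure assumes that ext2's i_data holds 15 block pointers of 32 bits each. The first function repeats the size test from ext2_symlink; the second checks that readlink(2) returns exactly the path handed to symlink(2), whether or not that path exists.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations under strict -std modes */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: sizeof(i_data) in ext2 is 15 * 4 = 60 bytes. */
#define ASSUMED_I_DATA_BYTES 60

/* Same test as the slide: l = strlen(symname) + 1; slow if l > sizeof(i_data). */
int would_be_slow_symlink(const char *symname)
{
    return strlen(symname) + 1 > ASSUMED_I_DATA_BYTES;
}

/* Returns 1 if readlink(2) gives back the stored target unchanged,
 * 0 if not, -1 if the setup fails. */
int symlink_round_trip(const char *target)
{
    char dir[] = "/tmp/symlink_demo_XXXXXX";   /* hypothetical scratch dir */
    char path[64];
    char buf[256];
    struct stat st;
    int ok = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof(path), "%s/sym", dir);

    if (symlink(target, path) == 0) {
        /* lstat(2) looks at the link itself, not at what it points to. */
        if (lstat(path, &st) == 0) {
            ssize_t n = readlink(path, buf, sizeof(buf) - 1);
            if (n >= 0) {
                buf[n] = '\0';                 /* readlink does not terminate */
                ok = S_ISLNK(st.st_mode) && strcmp(buf, target) == 0;
            }
        }
        unlink(path);
    }
    rmdir(dir);
    return ok;
}
```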
General home work QA
• Where can we find the kernel?
  • www.kernel.org
• Why is there no WORKING solution supplied by you?
  • There are at least 10 solutions you can find in the kernel (minix, ext2/3/4, ufs, reiserfs, xfs, sysv, jfs, hpfs) and the entire point was to get you to dig there
  • Since when is homework supplied with solutions anyway?
• Where can we find mkfs.minix?
  • mkfs.minix is NOT part of the Linux kernel. It is a user application that is part of util-linux
  • I found it in util-linux version 2.12q (downloaded and untarred the sources, ran ./configure and make). Compilation broke, but only after mkfs.minix had already been built (I didn’t bother to correct the errors; not needed)
  • I understand newer versions are available but I never tried them. I understand some students got them to work as well
We are still at a loss: how do we start working on this ex.?
• Copy the minix tree to your playground directory and build mkfs.minix (use man --path “path of mkfs.minix.8”, which comes with util-linux, if you want to see the man page)
• Edit the minix Makefile so that it calls the Linux Makefile recursively
• In each function add a “printk” that prints the function name (and __FILE__ and __LINE__)
• Format the floppy as minix and mount it. Check syslog
• Start running operations. Check syslog. Check which functions are called immediately and which functions are called after a while (writing buffers to disk); check the timestamps
• Either take minix, remove the dead code and the stuff you don’t need, and learn it, or take the book example and start fixing things until it works
• The experience should be entertaining
• It may be best to access the file system via C code that calls system calls, and not via shell commands, which may do other things (that may confuse you)
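The printk step can be wrapped in a macro so that every function gets the same one-line trace. This is a minimal sketch: the macro name FS_TRACE and the function demo_unlink are hypothetical, and the userland branch exists only so the macro can be tried outside the kernel.

```c
#ifdef __KERNEL__
#include <linux/kernel.h>   /* printk */
#define FS_TRACE() \
    printk(KERN_DEBUG "%s (%s:%d)\n", __func__, __FILE__, __LINE__)
#else
#include <stdio.h>
/* Userland stand-in: same message, written to stderr instead of syslog. */
#define FS_TRACE() \
    fprintf(stderr, "%s (%s:%d)\n", __func__, __FILE__, __LINE__)
#endif

/* Hypothetical stand-in for a file system function such as an unlink handler. */
int demo_unlink(void)
{
    FS_TRACE();             /* first statement: record that we were called */
    return 0;
}
```

With FS_TRACE() as the first statement of each file system function, the timestamps in syslog show which operations run immediately and which are deferred.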
General test FAQ
• There are 36 quiz questions in the test (multiple choice, 1 correct answer out of 4)
• 3 points for a correct answer and no deduction for an incorrect answer (total 108)
• Not an open-book exam (we added the synopsis of all functions you may need)
• No WINDOWS
• Most of the exam is based on the homework (including ex. 3), but there are theoretical questions on mmap
• About 50% of the exam questions are based on the questions we asked in the homework (modified to fit the quiz format)
• To study: learn your code and follow the presentations on file systems, sync, virtual memory, GRID
• We put in some questions about using Linux for software development (i.e. regarding gdb, gcc, man, make, strace etc.)
• Don’t be surprised if you finish the exam in 15 minutes…
emphasis on
• Using system calls
• Using UNIX
• Multitasking (processes, threads)
• Communications
• I/O multiplexing (select)
• File I/O (open/read/write/mmap etc.)
• File systems and kernel
Just some personal
summary
My successful objectives for this course
• Provide the class with a solid understanding of the userland interface to the OS
  • Achieved
• Provide the class with the means to add and review code in the Linux kernel
  • Partially achieved (we have studied the file system interface)
• Make the students better programmers, more proficient debuggers and more efficient at solving their own problems…
  • Achieved for most students (making some students better programmers was easier than I expected)
• Provide the students with confidence in their ability to code against new interfaces, studying from the user manual and existing code
  • Achieved
• Survive this class without being burned at the stake ;-)
  • Achieved (but then I am getting married, so maybe being burned at the stake is not that bad…)
My failed objectives for this course
• I’ve hoped to get to do a lot more kernel stuff
  • Replace the scheduler
  • Write a char device driver
  • Write a kernel module that communicates with userland programs
  • Maybe next year I will have more luck
• I’ve hoped to get more guest lectures from the industry and failed (most companies no longer need to make a name for themselves)
• I’ve hoped to infect most students with the Linux germ as the most efficient programming environment
Things I’ve done well
• Adjust the course to the level of the class
• Teach most of the class to work on a strict timeline
• Receive working exercises with no resubmissions
• Maintain interest
• Gain respect
Things that I haven’t
• Assisted the students too much, possibly to a level where self-learning was damaged
• There were errors in some presentations, which will be removed next time I teach the course:
  • Daemon code (I removed error checking to fit things on one slide and ended up with if (pid = fork()<0))
  • Kernel 2.4 build method on some slides (used to work on 2.6 as well… not anymore)
  • Changed interface to the file system on 2.6 (compared to the book)
  • Hard link == +1 inode (this is true in some implementations where a circular list of inodes is kept, but not on Linux)
• Got dragged into too many arguments with the class during lectures
If you have enjoyed the course and want to do some hacking
• There are plenty of OS-related projects out there to which you can volunteer to contribute code
  • OSs: Linux, FreeBSD (BSD on Intel with great performance), Darwin (OS X open source), NetBSD (BSD for max portability), OpenBSD (BSD for max security)
  • Servers: Apache web server and many others
  • Tools: gcc, vi, emacs
  • Games: QFG2/KQ1 VGA remake, Ur-Quan Masters, GNU Chess, FreeCiv etc.
  • Libraries: ACE, nspr, ICE and others
  • Documentation: there are many parts that would benefit from good documentation or a manual
  • And you can start your own
• There are plenty of other things you can do: install Linux (websites do constant surveys), use open standards where you work, contribute helper projects to the community
  • In one of my companies we used a DB which at the time didn’t have a Linux GUI. Developers that worked for me developed a Linux GUI that managed the DB. This work was donated to the Linux community
  • Similarly, one of my employees has donated Object Pascal syntax highlighting to vi (to use vi inside Borland Delphi)
  • Another one has donated kernel netlink sockets (a type of socket that allows user and kernel space to communicate) frameworks to ACE
  • None of the above damaged our company IP, and in fact when it was picked up, updated and tested by the community it helped us
Does it pay?
Salary market survey
• CP&S (market survey October 2008)
• Job info (market survey Dec 2008)
[Two salary tables; the cell layout was lost in extraction. Rows list roles: C++/.NET programmer, Java programmer, Web engineer, C/C++ engineer, Dot Net engineer, Java engineer, QA (testing), configuration control, RT Embedded engineer, Real Time engineer, Drivers/Kernel developer, Linux Kernel engineer, DSP and algorithms engineer, information security engineer, technical writer, system engineer, support engineer, Pre/Post Sale engineer, project manager, software project manager, product manager. Columns give salary ranges by experience band (student, grad, 0-2, 2-3, 3-5, 5-10, 5+, manager).]
How to reach me
• [email protected]
• Students that have submitted all 3 exercises and want
a reference from me should email me.
• I’ll be glad to assist in most UNIX/Linux real world
scenarios you may encounter.
Good luck on the test
And remember… each time you use .Net a penguin dies.
