Transcript Bioinformatics Programming - National Cheng Kung University
Bioinformatics Programming
EE, NCKU Tien-Hao Chang (Darby Chang)
1
Background
Preparation for this class
2
We talk about
Terminology
3
http://farm3.static.flickr.com/2109/2178878189_56a2d16d39.jpg
Synchronization
4
Linux
Difference to UNIX
5
UNIX
To put it very generically, Linux is an operating system kernel, and UNIX is a certification for operating systems.
The UNIX standard evolved from the original Unix system developed at Bell Labs (1969). After Unix System V, it ceased to be developed as a single operating system, and was instead developed by various competing companies into systems such as Solaris (from Sun Microsystems), AIX (from IBM), HP-UX (from Hewlett-Packard), and IRIX (from Silicon Graphics).
UNIX is a specification for baseline interoperability between these systems, even though there are many major architectural differences between them.
6
Linux was born out of the desire to create a free software alternative to the commercial UNIX environments. Its history dates back to 1991, or further back to 1983, when the GNU project, whose original aims were to provide a free alternative to UNIX, was introduced.
Linux has never been certified as being a version of UNIX, so it is described as being “Unix-like.”
7
UNIX
History
1960s: Multics project (MIT, GE, AT&T)
1970s: AT&T Bell Labs
1970s/80s: UC Berkeley
1980s: DOS imitated many Unix ideas; commercial Unix fragmentation; GNU Project
1990s: Linux
now: Unix is widespread and available from many sources, both free and commercial
8
http://upload.wikimedia.org/wikipedia/commons/5/51/Unix_history.svg
9
UNIX
Flavors
Sun's Solaris, Hewlett-Packard's HP-UX, and IBM's AIX® are all flavors of UNIX that have their own unique elements and foundations.
Windows has two main lines. The older flavors are referred to as "Win9x" and consist of Windows 95, 98, 98SE and Me. The newer flavors are referred to as "NT class" and consist of Windows NT, 2000, XP, Vista, and 7. Microsoft no longer supports Windows NT or any of the 9x versions.
The flavors of Linux are referred to as distributions (or "distros").
10
Linux
Distributions
All the Linux distributions released around the same time frame will use the same kernel. They differ in the
– add-on software
– GUI
– install process
– price
– documentation
– technical support
All the flavors of Windows come from Microsoft; the various distributions of Linux come from different companies/vendors such as Linspire, Red Hat, SuSE, Ubuntu, Xandros, Knoppix, Slackware, Lycoris, and so on.
11
UNIX
Philosophy
Multiuser / Multitasking
Flexibility / Freedom
Everything is a file
File system has places, processes have life
Designed by programmers for programmers
12
UNIX
Structure
Programs Kernel Hardware
13
UNIX
The File System
http://www.comsci.us/fs/notes/images/unixfs.gif
14
UNIX
Programs
Shell is the command line interpreter
Shell is just another program
A program or command
– interacts with the kernel
– may be any of:
• built-in shell command
• interpreted script
• compiled object code file
15
Any Questions?
16
Vs. Windows
Which is better?
Of course, this is an open question.
17
Terminology
Operating System
18
Vs. Windows
To you, are Linux and Windows the same thing? Or is Linux a platform only for specific uses?
19
Terminology
Terminal
20
http://www.linuxmail.info/images/windows-xp/putty-terminal-vncserver.png
What is inside the terminal?
21
http://linux.vbird.org/linux_server/0310telnetssh/Xserver_client.png
22
http://rohansplace.com/TSWeb/Remote_desktop_connection_icon.png
Yes, Remote Desktop is a terminal
23
http://images.ptt.cc/connect.gif
Similar to anything you use to access BBS, conceptually
24
Getting Started
25
You’re welcome to
Interrupt me, anytime!
26
Getting Started
Logging In
Login and password prompt to log in
– login is user’s unique name
– password is changeable; known only to user, not to system staff
Unix is case sensitive
– issued login and password (usually in lower case)
27
Getting Started
Passwords
Do:
– make sure nobody is looking over your shoulder when you are entering your password
– change your password often
– choose a password you can remember
– use eight characters, more on some systems
– use a mixture of character types
– include punctuation and other symbols
28
Getting Started
Passwords
Don’t:
– use a word (or words) in any language
– use a proper name
– use information in your wallet
– use information commonly known about you
– use control characters
– write your password anywhere
– EVER give your password to anybody
Your password is your account security:
– To change your password, use the passwd command
– Change your initial password immediately
29
Getting Started
Unix Command Line Structure
A command is a program that tells the Unix system to do something. It has the form:
command options arguments
– “Whitespace” separates parts of the command line
– An argument indicates what the command is to act on
– An option modifies the command and usually starts with “-”
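The command / options / arguments form can be tried out with ls. This is only a minimal sketch: the directory and file names are made up for the demo, and mktemp is assumed to be available (it is on most Linux systems).

```shell
#!/bin/sh
# Work in a throwaway directory so the example has no side effects.
demo_dir=$(mktemp -d)
touch "$demo_dir/notes.txt" "$demo_dir/.hidden"

# command + argument: the argument says what ls should act on.
plain=$(ls "$demo_dir")

# command + option + argument: -a modifies ls so hidden entries are listed too.
with_option=$(ls -a "$demo_dir")

echo "ls:    $plain"
# Left unquoted on purpose so the multi-line listing prints on one line.
echo "ls -a:" $with_option

rm -r "$demo_dir"
```

Without the option, ls hides names that start with a dot; with -a the same command on the same argument also lists them.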
30
Getting Started
Getting Help
Not all Unix commands follow the same standards
Options and syntax for a command are listed in the “man page” for the command
man: on-line manual
– $ man command
– $ man -k keyword
31
Getting Started
Directory Navigation
pwd print working directory
cd change working directory (“go to” directory)
mkdir make a directory
rmdir remove directory
ls list directory contents
32
Getting Started
Permissions
Each line (when using -l option of ls) includes the following:
– type field (first character)
– access permissions (characters 2–10):
• first 3: user/owner
• second 3: assigned unix group
• last 3: others
Permissions are designated:
– r read permission
– w write permission
– x execute permission
– - no permission
33
Getting Started
File Maintenance Commands
chmod change the file or directory access permission
chgrp change the group of the file
chown change the owner of a file
rm remove (delete) a file
cp copy file
mv move (or rename) file
chmod [options] file
– Using + and - with a single letter:
• u user owning file
• g those in assigned group
• o others
– $ chmod u+w file # gives the user (owner) write permission
– $ chmod g+r file # gives the group read permission
– $ chmod o-x file # removes execute permission for others
34
chmod [options] file
– using numeric representations for permissions:
• r = 4
• w = 2
• x = 1
– $ chmod 777 file
• gives user, group, and others r, w, x permissions
– $ chmod 750 file
• gives the user read, write, execute
• gives group members read, execute
• gives others no permissions
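To see how the numeric and symbolic forms map onto the ls -l permission field, here is a small sketch on a scratch file. The file is created with mktemp (an assumption about the system); cut -c1-10 keeps only the type field and the nine permission characters.

```shell
#!/bin/sh
# Create a scratch file whose permissions we can change freely.
demo_file=$(mktemp)

# 7 = r+w+x (4+2+1) for the user, 5 = r+x (4+1) for the group, 0 = nothing for others.
chmod 750 "$demo_file"
numeric_mode=$(ls -l "$demo_file" | cut -c1-10)
echo "after chmod 750: $numeric_mode"

# The symbolic form changes one permission at a time: remove write from the user.
chmod u-w "$demo_file"
symbolic_mode=$(ls -l "$demo_file" | cut -c1-10)
echo "after chmod u-w: $symbolic_mode"

rm -f "$demo_file"
```

The first line shows -rwxr-x--- and the second -r-xr-x---, so each digit of 750 can be read off as one group of three characters.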
35
Getting Started
Display Commands
echo echo the text string to stdout
cat concatenate (list)
head display first -n lines of file
tail display last -n lines of file
Useful in pipes
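A short sketch of why these commands are useful in pipes. The five input lines are generated with printf instead of a real file, and five_lines is just a helper name made up for this example.

```shell
#!/bin/sh
# Helper (hypothetical name) that prints five numbered lines to stdout.
five_lines() {
    printf 'line1\nline2\nline3\nline4\nline5\n'
}

# head keeps the first 3 lines; tail then keeps the last 1 of those,
# so the pipeline extracts exactly the third line.
third=$(five_lines | head -n 3 | tail -n 1)
echo "$third"

# cat with "-" reads standard input and passes it on unchanged.
last_two=$(five_lines | cat - | tail -n 2)
echo "$last_two"
```

Each command only reads stdin and writes stdout, which is what lets them be chained with | in any order.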
36
Any Questions?
37
Getting Started
System Resource Commands
df report file system disk space usage
du estimate file space usage
ps show status of processes (options vary from system to system; see the man pages)
kill terminate a process
whereis report program locations
which report the command found
hostname reports the name of the machine the user is logged into
uname has additional options to print info about system hardware and software
date print or set the system date and time
38
Getting Started
More Fun with Files
ln: link to another file
– symbolic link (soft link)
• $ ln -s source target
• A symbolic link is used to create a new path to another file or directory. Useful when the target file has versions.
– hard link
• $ ln source target
• A hard link creates a new directory entry pointing to the same inode as the original file. The file will not be deleted until all the hard links to it are removed.
– The two behave very differently when you delete the original file.
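The difference after deleting the original file can be checked directly. A minimal sketch in a temporary directory; the file names are invented for the demo and mktemp is assumed to exist.

```shell
#!/bin/sh
demo_dir=$(mktemp -d)

echo "hello" > "$demo_dir/original.txt"
# Hard link: a second directory entry for the same inode.
ln "$demo_dir/original.txt" "$demo_dir/hard.txt"
# Symbolic link: stores the path "original.txt", resolved relative to the link.
ln -s original.txt "$demo_dir/soft.txt"

# Delete the original name.
rm "$demo_dir/original.txt"

# The hard link still reaches the data.
hard_content=$(cat "$demo_dir/hard.txt")
echo "hard link reads: $hard_content"

# The symbolic link now points at a name that no longer exists.
# [ -e ] follows the link, so it reports false for a dangling link.
if [ -e "$demo_dir/soft.txt" ]; then
    soft_state="ok"
else
    soft_state="dangling"
fi
echo "symbolic link is: $soft_state"

rm -r "$demo_dir"
```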
39
sort: sort file contents
uniq: remove adjacent duplicate lines
file: file type
tr: translate characters (reads standard input)
– $ tr '[a-z]' '[A-Z]' < file
find: find files
– $ find . -name ay
– $ find . -newer empty
– $ find . -type d -print
gzip: compression
– often use .gz extension
tar: archive files
– use .tar extension
– use .tgz extension when combining gzip
wc: word count
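These small tools are usually combined. The sketch below counts distinct names in a list regardless of case; the gene names are made-up sample data and sample is a hypothetical helper.

```shell
#!/bin/sh
# Sample input with duplicates and mixed case (made-up data).
sample() {
    printf 'brca1\nTP53\nbrca1\negfr\ntp53\n'
}

# tr reads standard input, so the data is piped into it.
# uniq only removes adjacent duplicates, which is why sort comes first.
unique_names=$(sample | tr '[a-z]' '[A-Z]' | sort | uniq)
echo "$unique_names"

# wc -l counts lines; tr -d strips the padding some wc versions add.
unique_count=$(sample | tr '[a-z]' '[A-Z]' | sort | uniq | wc -l | tr -d ' ')
echo "$unique_count distinct names"
```

Dropping the sort step would leave the two BRCA1 lines separated by TP53, and uniq would keep both.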
40
Any Questions?
41
Shells
42
Shells
The shell sits between you and the operating system
– acts as a command interpreter
– reads input
– translates commands into actions to be taken by the system
To see what your current login shell is:
– $ echo $SHELL
43
Shells
Basic Shells
Bourne Shell (sh)
– good features for I/O control, often used for scripts
– other shells based on Bourne may be suited for interactive users
– default prompt is $
C Shell (csh)
– uses C-like syntax for scripting
– I/O more awkward than Bourne shell
– job control
– history
– default prompt is %
– uses ~ symbol to indicate a home directory (user’s or others’)
44
Shells
Other Shells
Based on the Bourne Shell:
– Korn (ksh)
– Bourne-Again Shell (bash)
• job control
• history
• uses ~ symbol to indicate a home directory (user’s or others’)
– Z Shell (zsh)
Based on the C Shell:
– T-C shell (tcsh)
45
Shells
Built-in Shell Commands
The shells have a number of built-in commands:
– executed directly by the shell
– don’t have to call another program to be run
– different for the different shells
– cd, echo, exit, for, if, pwd, …
46
Shells
Environment Variables
Environment variables are used to provide information to the programs you use.
Global environment variables are set by your login shell and new programs and shells inherit the environment of their parent shell.
– GROUP your login group, e.g. staff
– HOME path to your home directory, e.g. /home/frank
– HOST the hostname of your system, e.g. nyssa
– PATH paths to be searched for commands, e.g. /usr/bin:/usr/ucb:/usr/local/bin
– SHELL the login shell you’re using, e.g. /usr/bin/csh
– USER your username, e.g. frank
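Inheritance from the parent shell can be observed by starting a child shell with sh -c. A minimal sketch; DEMO_LOCAL and DEMO_EXPORTED are names invented for this example, not standard variables.

```shell
#!/bin/sh
# A plain shell variable stays in this shell; export puts it in the environment.
DEMO_LOCAL="not inherited"
DEMO_EXPORTED="inherited"
export DEMO_EXPORTED

# sh -c starts a child shell; single quotes stop this shell from expanding the names.
# ${NAME:-unset} prints "unset" when the child has no such variable.
child_sees_local=$(sh -c 'echo "${DEMO_LOCAL:-unset}"')
child_sees_exported=$(sh -c 'echo "${DEMO_EXPORTED:-unset}"')

echo "DEMO_LOCAL in child:    $child_sees_local"
echo "DEMO_EXPORTED in child: $child_sees_exported"

# PATH is a colon-separated list; print one directory per line.
echo "$PATH" | tr ':' '\n'
```

Only the exported variable reaches the child, which is why login shells export settings such as PATH and HOME.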
47
Any Questions?
48
http://en.wikipedia.org/wiki/File:Tux.svg
Now, we are more familiar with this penguin
49
http://blog.sherweb.com/wp-content/uploads/linux-vs-windows.jpg
50
Linux Vs. Windows
Interface
• Kernel/GUI-Based
• Target Users
Support
• Developers
• Drivers/Games/Virus
Popularity
• Users
• Habits
Business
• Pirate Copy
• Open Source
51
Linux Vs. Windows
History
Linux was originally built by Linus Torvalds at the University of Helsinki in 1991. Linux is a Unix-like, kernel-based, fully memory-protected, multitasking operating system. It runs on a wide range of hardware from PCs to Macs.
Windows 3.1, the first widely adopted version of Windows, was released in 1992 by Microsoft (Windows 1.0 dates back to 1985). Windows is a GUI-based operating system. It has powerful networking capabilities, is multitasking, and is extremely user friendly.
52
Linux Vs. Windows
Functionalities
Linux seems to be more reliable, flexible and generous.
Ironically, even though Linux is open source, it falls short in the number of different applications available for it.
Windows seems to be less mature (at first) in most measures of evaluating a good OS.
However, it shows that appearance can matter more than anything else. Cruel but real.
53
http://www.nudonation.com/archivos/bill-gates.jpg
Of course, this guy is probably the most successful salesman ever
54
http://msnbcmedia2.msn.com/j/msnbc/Components/Photos/060615/060615_gatesFoundation_hmed_5p.hmedium.jpg
He has supported a great deal of biomedical research
55
http://i5.tinypic.com/4yqudc7.jpg
As time goes by
56
http://4.bp.blogspot.com/_5irnbDcN0to/SwG_4mVCUlI/AAAAAAAAAfY/YRLLzWZE_po/S740/LinuxDistributions.jpg
Linux has many partners
57
Linux Vs. Windows
Things Changed
Linux has a much improved UI
– To me, the installation procedure of some distributions seems easier than that of Windows
Windows keeps strengthening its ability to be a good OS, no matter what the reason is
– For example, Microsoft improved IE to eliminate Netscape (it succeeded with IE3). Microsoft now wants to do the same against Firefox. Both IE7 and IE8 failed. But who knows?
Although the functionality difference is decreasing, the popularity difference is increasing.
– Habit (this is critical even in the search engine war)
– Support (the hateful Windows update)
– Is the flexibility of Linux an advantage?
58
http://static-p4.fotolia.com/jpg/00/11/93/51/400_F_11935145_JyxCv7ufq6qk48jfPraVyKoxrDs4obfy.jpg
Which distribution? (probably scared many beginners)
59
http://www.iconfinder.net/ajax/download/png/?id=33647&s=128 Ubuntu
60
http://art4linux.org/system/files/ubuntu-girls-mini.jpg
61
Ubuntu
Ubuntu is based on the Debian distribution (good package management). It is named after the Southern African ethical ideology Ubuntu (“humanity towards others”).
Ubuntu provides an up-to-date, stable operating system for the average user, with a strong focus on usability and ease of installation.
Web statistics from late 2009 suggest that Ubuntu's share of Linux desktop usage is between 40 and 50%.
Ubuntu is sponsored by the UK-based company Canonical Ltd., owned by South African entrepreneur Mark Shuttleworth.
By keeping Ubuntu free and open source, Canonical is able to utilize the talents of community developers in Ubuntu's constituent components. Instead of selling Ubuntu for profit, Canonical creates revenue by selling technical support and from creating several services tied to Ubuntu.
62
http://upload.wikimedia.org/wikipedia/commons/7/78/Mark_Shuttleworth_by_Martin_Schmitt.jpg
63
Mark Shuttleworth
Born on 18 September 1973.
Founded Thawte in 1995, which specialised in digital certificates and Internet security, and then sold it in December 1999, earning about USD 575 million.
In September 2000, Shuttleworth formed HBD Venture Capital, a business incubator and venture capital provider.
In March 2004 he formed Canonical Ltd., for the promotion and commercial support of free software projects.
64
http://www.openfoundry.org/
There are some really valuable talks here; do some homework
65
To Sum Up
Ubuntu is as friendly as any version of Windows. Everyone can start to use it without any introduction.
66
http://poietes.files.wordpress.com/2009/04/yoda-1.jpg
However, if you choose a dual system, you will never become a master
67
Shell Scripts
68
Shell Scripts
Similar to DOS batch files
Quick and simple programming
Text file interpreted by shell, effectively new command
List of shell commands to be run sequentially
Execute permissions, no special extension necessary
Magic first line
– #!
– Include full path to interpreter (shell)
• #!/bin/sh
69
Shell Scripts
Interacting
Special variables for processing arguments
– $# number of arguments on command line
– $0 name that script was called as
– $1 – $9 command line arguments
– $@ all arguments (separately quoted)
– $* all arguments
– $? numeric result code of previous command
– $$ process ID of this running script
Interacting With User
– Talk to user (or ask questions) first, then get input from user, put it in variable
• echo prompt
• read variable
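The special variables are easiest to understand by printing them. This sketch writes a tiny script to a temporary file (created with mktemp, an assumption) and runs it with two arguments; the argument values and the exit code 3 are arbitrary choices for the demo.

```shell
#!/bin/sh
script=$(mktemp)

# The quoted 'EOF' keeps $0, $#, $1 and $@ unexpanded until the script itself runs.
cat > "$script" <<'EOF'
#!/bin/sh
echo "called as: $0"
echo "argument count: $#"
echo "first argument: $1"
# "$@" keeps each argument as its own word, even if it contains spaces.
for arg in "$@"; do
    echo "arg: $arg"
done
exit 3
EOF

# Run the script; $? right after the failing command holds its result code.
status=0
output=$(sh "$script" alpha "beta gamma") || status=$?

echo "$output"
echo "result code seen by caller: $status"
echo "process ID of this shell: $$"

rm -f "$script"
```

The second argument contains a space but still arrives as one word, because the caller quoted it and the script loops over "$@".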
70
Shell Scripts
Control Structure
if [ … ]; then …
fi
for variable in … ; do …
done
Check sh man page for details, also look at examples.
#!/bin/sh
if [ $# -ge 2 ]
then
    echo "$2"
elif [ $# -eq 1 ]; then
    echo "$1"
else
    echo No input
fi
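The for and if forms can be combined, as in the sketch below. The classify function and the file names are hypothetical, chosen only to give the loop something to test; mktemp is assumed to be available.

```shell
#!/bin/sh
# classify prints one line per argument saying what kind of path it is.
classify() {
    for path in "$@"; do
        if [ -d "$path" ]; then
            echo "$path: directory"
        elif [ -f "$path" ]; then
            echo "$path: regular file"
        else
            echo "$path: missing or special"
        fi
    done
}

demo_dir=$(mktemp -d)
touch "$demo_dir/data.txt"

report=$(classify "$demo_dir" "$demo_dir/data.txt" "$demo_dir/nope")
echo "$report"

rm -r "$demo_dir"
```

The tests -d and -f are options of the [ command; the sh man page lists the others.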
71
Any Questions?
72
Can you
Use shell script to change filenames from lower- to uppercase? Remember that the wild card symbol * can help you get all files.
73
#!/bin/sh
for file in *; do
    echo "processing $file"
    mv "$file" "$(echo "$file" | tr '[a-z]' '[A-Z]')"
done
How would you do it in Windows?
BTW, why Perl? It can be done in one line
– $ ls | perl -nle 'my $o=$_; tr/a-z/A-Z/; rename $o, $_'
How would you do it with C?
74
Any Questions?
75
Code Size Calculator
In: a file
Out: code size
Requirement
- input from command line
- do not count space characters
- do not count comments (C style)
- must complete in Unix
- if you don’t have one, contact me ASAP
- using C would be the best
Bonus
- write a shell script version
76
Deadline
2010/3/9 23:59
Zip your code, a step-by-step README on how to execute the code, and anything worthy of extra credit. Email to [email protected].
77
gcc
78
gcc
gcc is the GNU C Compiler, and g++ is the GNU C++ compiler, while cc and CC are the Sun C and C++ compilers also available on Sun workstations.
Notice that C++ differs from C to a certain extent. A safe approach is to regard them as two different languages with very similar basic structures.
79
gcc
Compiling a Simple Program
Consider the following example: let “hello.c” be a file that contains the following C code
#include <stdio.h>
int main() {
    printf("Hello\n");
    return 0;
}
The standard way to compile this program is with the command
– $ gcc hello.c -o hello
This command compiles hello.c into an executable program named “hello”. It does nothing more than print the word “Hello” on the screen.
– $ chmod 755 hello
– $ ./hello
Hello
80
Alternatively, the above program could be compiled using the following two commands
– $ gcc -c hello.c
– $ gcc hello.o -o hello
The end result is the same, but this two-step method first compiles hello.c into a machine code file named “hello.o” and then links hello.o with some system libraries to produce the final program “hello”.
In fact the first method also does this two-stage process of compiling and linking, but the stages are done transparently, and the intermediate file “hello.o” is deleted in the process.
81
gcc
Frequently Used Options
The examples below demonstrate how to use many of the more commonly used options. Some options can be combined, although it is generally not useful to use “debugging” and “optimization” options together.
Make the resulting executable contain symbolic information for the gdb debugger
– $ gcc -g myprog.c -o myprog
Have the compiler generate many warnings about syntactically correct but questionable-looking code. It is good practice to always use this option with gcc and g++
– $ gcc -Wall myprog.c -o myprog
Generate optimized code. The -O is a capital letter O and not the number 0!
– $ gcc -O myprog.c -o myprog
Compile a C program that uses math functions such as “sqrt”
– $ gcc myprog.c -o myprog -lm
82
gcc
Multiple Source Files
If there are multiple source files
– $ gcc file1.c file2.c -o myprog
Or
– $ gcc -c file1.c
– $ gcc -c file2.c
– $ gcc file1.o file2.o -o myprog
The second one compiles source files separately. If only file1.c was modified
– $ gcc -c file1.c
– $ gcc file1.o file2.o -o myprog
Notice that file2.c does not need to be recompiled.
– significant time savings when there are numerous source files
This process, though somewhat complicated, is generally handled automatically by a makefile.
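The decision a makefile automates is roughly "recompile a source file only if it is newer than its object file". The sketch below imitates that check with find -newer. It is not how make is implemented: needs_rebuild is a hypothetical helper, the files are empty placeholders, and gcc is only echoed, never run.

```shell
#!/bin/sh
# needs_rebuild SRC OBJ succeeds when OBJ is missing or older than SRC.
needs_rebuild() {
    [ ! -e "$2" ] || [ -n "$(find "$1" -newer "$2")" ]
}

demo_dir=$(mktemp -d)

# touch -t sets explicit timestamps (CCYYMMDDhhmm) so the outcome is deterministic:
# file1.c was edited after file1.o was built, file2.c was not.
touch -t 201001010000 "$demo_dir/file1.o" "$demo_dir/file2.c"
touch -t 201001020000 "$demo_dir/file1.c" "$demo_dir/file2.o"

plan=$(
    for name in file1 file2; do
        if needs_rebuild "$demo_dir/$name.c" "$demo_dir/$name.o"; then
            echo "$name.c changed: would run gcc -c $name.c"
        else
            echo "$name.o is up to date"
        fi
    done
)
echo "$plan"

rm -r "$demo_dir"
```

Only file1.c is reported for recompilation, matching the case where only file1.c was modified.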
83