Bioinformatics Programming - National Cheng Kung University



Bioinformatics Programming

EE, NCKU Tien-Hao Chang (Darby Chang)

1

Background

Preparation for this class

2

We talk about

Terminology

3

http://farm3.static.flickr.com/2109/2178878189_56a2d16d39.jpg

Synchronization

4

Linux

Difference to UNIX

5

  

UNIX

To put it very generically, Linux is an operating system kernel, and UNIX is a certification for operating systems.

The UNIX standard evolved from the original Unix system developed at Bell Labs (1969). After Unix System V, it ceased to be developed as a single operating system, and was instead developed as competing products by various companies, such as Solaris (from Sun Microsystems), AIX (from IBM), HP-UX (from Hewlett-Packard), and IRIX (from Silicon Graphics).

UNIX is a specification for baseline interoperability between these systems, even though there are many major architectural differences between them.

6

Linux was born out of the desire to create a free software alternative to the commercial UNIX environments. Its history dates back to 1991, or further back to 1983, when the GNU project, whose original aims were to provide a free alternative to UNIX, was introduced.

Linux has never been certified as being a version of UNIX, so it is described as being “Unix-like.”

7

    UNIX

History

1960s: Multics project (MIT, GE, AT&T)
1970s: AT&T Bell Labs
1970s/80s: UC Berkeley
1980s: DOS imitated many Unix ideas; commercial Unix fragmentation; GNU Project
1990s: Linux
now: Unix is widespread and available from many sources, both free and commercial

8

http://upload.wikimedia.org/wikipedia/commons/5/51/Unix_history.svg

9

   UNIX

Flavors

Sun's Solaris, Hewlett-Packard's HP-UX, and IBM's AIX® are all flavors of UNIX that have their own unique elements and foundations.

Windows has two main lines. The older flavors are referred to as "Win9x" and consist of Windows 95, 98, 98SE and Me. The newer flavors are referred to as "NT class" and consist of Windows NT, 2000, XP, Vista, and 7. Microsoft no longer supports Windows NT or any of the 9x versions.

The flavors of Linux are referred to as distributions (or "distros").

10

  Linux

Distributions

All the Linux distributions released around the same time frame will use the same kernel. They differ in the
– add-on software
– GUI
– install process
– price
– documentation
– technical support

All the flavors of Windows come from Microsoft; the various distributions of Linux come from different companies/vendors such as Linspire, Red Hat, SuSE, Ubuntu, Xandros, Knoppix, Slackware, Lycoris, and so on.

11

 UNIX

Philosophy

Multiuser / Multitasking

Flexibility / Freedom

Everything is a file

File system has places, processes have life

Designed by programmers for programmers

12

UNIX

Structure

Programs Kernel Hardware

13

UNIX

The File System

http://www.comsci.us/fs/notes/images/unixfs.gif

14

  UNIX

Programs

Shell is the command line interpreter
Shell is just another program

A program or command
– interacts with the kernel
– may be any of:
  • built-in shell command
  • interpreted script
  • compiled object code file

15

Any Questions?

16

Vs. Windows

Which is better?

Of course, this is an open question.

17

Terminology

Operating System

18

Vs. Windows

To you, are Linux and Windows the same thing? Or is Linux a platform for only specific uses?

19

Terminology

Terminal

20

http://www.linuxmail.info/images/windows-xp/putty-terminal-vncserver.png

What is inside the terminal?

21

http://linux.vbird.org/linux_server/0310telnetssh/Xserver_client.png

22

http://rohansplace.com/TSWeb/Remote_desktop_connection_icon.png

Yes, Remote Desktop is a terminal

23

http://images.ptt.cc/connect.gif

Similar to anything you use to access BBS, conceptually

24

Getting Started

25

You’re welcome to

Interrupt me, anytime!

26

  Getting Started

Logging In

Login and password prompt to log in
– login is user’s unique name
– password is changeable; known only to user, not to system staff

Unix is case sensitive

– issued login and password (usually in lower case)

27

 Getting Started

Passwords

Do:
– make sure nobody is looking over your shoulder when you are entering your password
– change your password often
– choose a password you can remember
– use eight characters, more on some systems
– use a mixture of character types
– include punctuation and other symbols

28

 Getting Started

Passwords

Don’t:
– use a word (or words) in any language
– use a proper name
– use information in your wallet
– use information commonly known about you
– use control characters
– write your password anywhere
– EVER give your password to anybody

Your password is your account security:
– To change your password, use the passwd command
– Change your initial password immediately

29

 Getting Started

Unix Command Line Structure

A command is a program that tells the Unix system to do something. It has the form:

command options arguments

– “Whitespace” separates parts of the command line
– An argument indicates on what the command is to perform its action
– An option modifies the command, usually starts with “-”
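To make the "command options arguments" form concrete, here is a minimal sketch that runs one command with one option and one argument. The file name words.txt and the scratch directory are assumptions made up for this example.

```shell
#!/bin/sh
# Work in a scratch directory so nothing in your own files is touched.
workdir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$workdir/words.txt"   # hypothetical sample file

# command: sort    option: -r (reverse order)    argument: the file to sort
sorted=$(sort -r "$workdir/words.txt")
echo "$sorted"

rm -r "$workdir"
```

Whitespace is what separates the three parts; removing the space between -r and the file name would turn them into a single, meaningless word.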

30

  Getting Started

Getting Help

Not all Unix commands will follow the same standards
Options and syntax for a command are listed in the “man page” for the command

man: On-line manual
– $ man command
– $ man -k keyword

31

 Getting Started

Directory Navigation

pwd     print working directory
cd      change working directory (“go to” directory)
mkdir   make a directory
rmdir   remove directory
ls      list directory contents
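The five navigation commands can be tried together in a short session. This is only a sketch: the directory name project is made up, and everything happens inside a temporary directory created with mktemp.

```shell
#!/bin/sh
# Start in a scratch directory so the example is safe to run anywhere.
base=$(mktemp -d)
cd "$base" || exit 1

mkdir project            # make a directory
cd project || exit 1     # go to it
here=$(pwd)              # print working directory
echo "now in: $here"

cd ..                    # go back up one level
listing=$(ls)            # list directory contents
echo "contents: $listing"

rmdir project            # remove the (now empty) directory
after=$(ls)              # nothing is left to list
cd / && rmdir "$base"    # clean up the scratch directory
```

Note that rmdir only removes empty directories, which is why it is a safe habit for cleaning up.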

32

  Getting Started

Permissions

Each line (when using -l option of ls) includes the following:
– type field (first character)
– access permissions (characters 2–10):
  • first 3: user/owner
  • second 3: assigned unix group
  • last 3: others

Permissions are designated:
– r  read permission
– w  write permission
– x  execute permission
– -  no permission

33

      Getting Started

File Maintenance Commands

chmod   change the file or directory access permission
chgrp   change the group of the file
chown   change the owner of a file
rm      remove (delete) a file
cp      copy file
mv      move (or rename) file

chmod [options] file
– Using + and - with a single letter:
  • u user owning file
  • g those in assigned group
  • o others
– $ chmod u+w file # gives the user (owner) write permission
– $ chmod g+r file # gives the group read permission
– $ chmod o-x file # removes execute permission for others
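The effect of the + and - forms is easiest to see by comparing the ls -l permission field before and after. The sketch below assumes a throwaway file named demo.txt; the starting mode is set with the = form of chmod, which sets the bits exactly and is not shown on the slide.

```shell
#!/bin/sh
workdir=$(mktemp -d)
f="$workdir/demo.txt"                # hypothetical file for the demonstration
: > "$f"                             # create it empty

chmod u=rw,g=r,o=r "$f"              # known starting point: rw-r--r--
before=$(ls -l "$f" | cut -c1-10)    # keep the type field plus nine permission characters

chmod u+x "$f"                       # give the owner execute permission
chmod g+w "$f"                       # give the group write permission
chmod o-r "$f"                       # remove read permission for others
after=$(ls -l "$f" | cut -c1-10)

echo "$before -> $after"
rm -r "$workdir"
```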

34

chmod [options] file
– using numeric representations for permissions:
  • r = 4
  • w = 2
  • x = 1
– $ chmod 777 file
  • gives user, group, and others r, w, x permissions
– $ chmod 750 file
  • gives the user read, write, execute
  • gives group members read, execute
  • gives others no permissions
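Each digit of a numeric mode is just the sum of the values for r, w and x. The sketch below works that arithmetic out for rwxr-x--- and applies the result; triple_to_digit is a hypothetical helper written for this example, not a standard command.

```shell
#!/bin/sh
# Hypothetical helper: turn a triple such as "r-x" into its digit
# by adding r=4, w=2, x=1 for each permission that is present.
triple_to_digit() {
    digit=0
    case "$1" in r??) digit=$((digit + 4)) ;; esac
    case "$1" in ?w?) digit=$((digit + 2)) ;; esac
    case "$1" in ??x) digit=$((digit + 1)) ;; esac
    echo "$digit"
}

# user: rwx, group: r-x, others: ---
mode="$(triple_to_digit rwx)$(triple_to_digit r-x)$(triple_to_digit ---)"
echo "rwxr-x--- is mode $mode"

workdir=$(mktemp -d)
: > "$workdir/demo.txt"                          # throwaway file
chmod "$mode" "$workdir/demo.txt"
shown=$(ls -l "$workdir/demo.txt" | cut -c1-10)  # what ls -l reports afterwards
echo "$shown"
rm -r "$workdir"
```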

35

 Getting Started

Display Commands

echo   echo the text string to stdout
cat    concatenate (list)
head   display first -n lines of file
tail   display last -n lines of file

Useful in pipe
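As an illustration of why these commands are useful in a pipe, the sketch below picks out a single line from a stream by chaining head and tail, then joins two files with cat. The file names a.txt and b.txt are made up for the example.

```shell
#!/bin/sh
# Five numbered lines go in; head keeps the first 3, tail keeps the last 1
# of those, so the pipe as a whole selects line 3.
third=$(printf 'line1\nline2\nline3\nline4\nline5\n' | head -n 3 | tail -n 1)
echo "$third"

# cat given several files writes them out one after another.
workdir=$(mktemp -d)
echo "first"  > "$workdir/a.txt"
echo "second" > "$workdir/b.txt"
joined=$(cat "$workdir/a.txt" "$workdir/b.txt")
echo "$joined"
rm -r "$workdir"
```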

36

Any Questions?

37

         Getting Started

System Resource Commands

df         report file system disk space usage
du         estimate file space usage
ps         show status of processes (options vary from system to system; see the man pages)
kill       terminate a process
whereis    report program locations
which      report the command found
hostname   reports the name of the machine the user is logged into
uname      has additional options to print info about system hardware and software
date       print or set the system date and time

38

 Getting Started

More Fun with Files

ln: link to another file
– symbolic link (soft link)
  • $ ln -s source target
  • A symbolic link is used to create a new path to another file or directory. Useful when the target file has versions.
– hard link
  • $ ln source target
  • A hard link creates a new directory entry pointing to the same inode as the original file. The file will not be deleted until all the hard links to it are removed.
– The two behave very differently when you delete the original file.
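The difference on deletion can be checked directly. In this sketch (file names are made up, and everything lives in a temporary directory) the original name is removed and each kind of link is then tested.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "data" > original.txt
ln original.txt hard.txt        # hard link: a second name for the same inode
ln -s original.txt soft.txt     # symbolic link: a path that points at original.txt

rm original.txt                 # delete the original name

hard_content=$(cat hard.txt)    # still readable: the inode has one link left
if [ -e soft.txt ]; then        # -e follows the link, which now points at nothing
    soft_state="ok"
else
    soft_state="broken"
fi
echo "hard link: $hard_content, soft link: $soft_state"

cd / && rm -r "$workdir"
```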

39

sort: sort file contents
uniq: remove duplicate (adjacent) lines
file: file type
tr: translate characters
– $ tr '[a-z]' '[A-Z]' < file
find: find files
– $ find . -name ay
– $ find . -newer empty
– $ find . -type d -print
gzip: compression
– often use .gz extension
tar: archive files
– use .tar extension
– use .tgz extension when combining gzip
wc: word count
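Several of these commands are typically combined. The following sketch, using a made-up file fruit.txt, shows sort feeding uniq (which only drops duplicates that sit next to each other), wc counting the result, and tr reading from redirected input.

```shell
#!/bin/sh
workdir=$(mktemp -d)
printf 'pear\napple\npear\nfig\napple\n' > "$workdir/fruit.txt"   # hypothetical data

# uniq only removes adjacent duplicates, so sort first.
unique=$(sort "$workdir/fruit.txt" | uniq)

# Count the distinct lines; tr -d strips the padding some wc versions add.
count=$(sort "$workdir/fruit.txt" | uniq | wc -l | tr -d ' ')

# tr reads standard input, so the file is redirected in with <.
upper=$(tr '[a-z]' '[A-Z]' < "$workdir/fruit.txt" | head -n 1)

echo "$unique"
echo "$count distinct lines"
echo "$upper"
rm -r "$workdir"
```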

40

Any Questions?

41

Shells

42

 

Shells

The shell sits between you and the operating system
– acts as a command interpreter
– reads input
– translates commands into actions to be taken by the system

To see what your current login shell is:
– $ echo $SHELL

43

  Shells

Basic Shells

Bourne Shell (sh)
– good features for I/O control; often used for scripts
– other shells based on Bourne may be suited for interactive users
– default prompt is $

C Shell (csh)
– uses C-like syntax for scripting
– I/O more awkward than Bourne shell
– job control
– history
– default prompt is %
– uses ~ symbol to indicate a home directory (user’s or others’)

44

  Shells

Other Shells

Based on the Bourne Shell:
– Korn (ksh)
– Bourne-Again Shell (bash)
  • job control
  • history
  • uses ~ symbol to indicate a home directory (user’s or others’)
– Z Shell (zsh)

Based on the C Shell:
– T-C shell (tcsh)

45

 Shells

Built-in Shell Commands

The shells have a number of built-in commands:
– executed directly by the shell
– don’t have to call another program to be run
– different for the different shells
– cd, echo, exit, for, if, pwd, …

46

  Shells

Environment Variables

Environment variables are used to provide information to the programs you use.

Global environment variables are set by your login shell, and new programs and shells inherit the environment of their parent shell.

– GROUP  your login group, e.g. staff
– HOME   path to your home directory, e.g. /home/frank
– HOST   the hostname of your system, e.g. nyssa
– PATH   paths to be searched for commands, e.g. /usr/bin:/usr/ucb:/usr/local/bin
– SHELL  the login shell you’re using, e.g. /usr/bin/csh
– USER   your username, e.g. frank
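Inheritance by child shells can be observed with a small experiment. In this sketch COURSE is a made-up variable name; a plain shell variable is invisible to a child shell until it is exported into the environment.

```shell
#!/bin/sh
unset COURSE                                      # make sure we start clean
COURSE="bioinfo"                                  # plain shell variable, not exported
child_before=$(sh -c 'echo "${COURSE:-unset}"')   # the child shell does not see it

export COURSE                                     # now part of the environment
child_after=$(sh -c 'echo "${COURSE:-unset}"')    # the child shell inherits it

echo "before export: $child_before, after export: $child_after"
echo "commands are searched in: $PATH"
```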

47

Any Questions?

48

http://en.wikipedia.org/wiki/File:Tux.svg

Now, we are more familiar with this penguin

49

http://blog.sherweb.com/wp-content/uploads/linux-vs-windows.jpg

50

Linux Vs. Windows

Interface
• Kernel/GUI-Based
• Target Users

Support
• Developers
• Drivers/Games/Virus

Popularity
• Users
• Habits

Business
• Pirate Copy
• Open Source

51

  Linux Vs. Windows

History

Linux was originally built by Linus Torvalds at the University of Helsinki in 1991. Linux is a Unix-like, kernel-based, fully memory protected, multitasking operating system. It runs on a wide range of hardware from PCs to Macs.

Windows 3.1, the first widely adopted version of Windows, was released by Microsoft in 1992 (Windows 1.0 dates from 1985). Windows is a GUI-based operating system. It has powerful networking capabilities, is multitasking, and is extremely user friendly.

52

    Linux Vs. Windows

Functionalities

Linux seems to be more reliable, flexible and generous.

Ironically, even though Linux is open source, it falls short in the number of different applications available for it.

Windows seemed to be less mature (at first) in most measures of evaluating a good OS.

However, it proved that appearance is more important than everything else. Cruel but real.

53

http://www.nudonation.com/archivos/bill-gates.jpg

Of course, this guy is probably the most successful salesman ever

54

http://msnbcmedia2.msn.com/j/msnbc/Components/Photos/060615/060615_gatesFoundation_hmed_5p.hmedium.jpg

He has supported a lot of biomedical research

55

http://i5.tinypic.com/4yqudc7.jpg

As time goes by

56

http://4.bp.blogspot.com/_5irnbDcN0to/SwG_4mVCUlI/AAAAAAAAAfY/YRLLzWZE_po/S740/LinuxDistributions.jpg

Linux has many partners

57

  Linux Vs. Windows

Things Changed

Linux has a much improved UI
– To me, the installation procedure of some distributions seems easier than that of Windows

Windows keeps strengthening its ability to be a good OS, no matter what the reason is
– For example, Microsoft improved IE to eliminate Netscape (it succeeded with IE3). Microsoft now wants to do the same against Firefox. Both IE7 and IE8 failed. But who knows?

Although the functionality difference is decreasing, the popularity difference is increasing.
– Habit (this is even critical in the search engine war)
– Support (the hateful Windows update)
– Is the flexibility of Linux an advantage?

58

http://static-p4.fotolia.com/jpg/00/11/93/51/400_F_11935145_JyxCv7ufq6qk48jfPraVyKoxrDs4obfy.jpg

Which distribution? (probably scared many beginners)

59

http://www.iconfinder.net/ajax/download/png/?id=33647&s=128 Ubuntu

60

http://art4linux.org/system/files/ubuntu-girls-mini.jpg

61

    

Ubuntu

Ubuntu is based on the Debian distribution (good package management). It is named after the Southern African ethical ideology Ubuntu (“humanity towards others”).

Ubuntu provides an up-to-date, stable operating system for the average user, with a strong focus on usability and ease of installation.

Web statistics from late 2009 suggest that Ubuntu's share of Linux desktop usage is between 40 and 50%.

Ubuntu is sponsored by the UK-based company Canonical Ltd., owned by South African entrepreneur Mark Shuttleworth.

By keeping Ubuntu free and open source, Canonical is able to utilize the talents of community developers in Ubuntu's constituent components. Instead of selling Ubuntu for profit, Canonical creates revenue by selling technical support and from creating several services tied to Ubuntu.

62

http://upload.wikimedia.org/wikipedia/commons/7/78/Mark_Shuttleworth_by_Martin_Schmitt.jpg

63

   

Mark Shuttleworth

Born on 18 September 1973.

Founded Thawte in 1995, which specialised in digital certificates and Internet security, and then sold it in December 1999, earning about USD 575 million.

In September 2000, Shuttleworth formed HBD Venture Capital, a business incubator and venture capital provider.

In March 2004 he formed Canonical Ltd., for the promotion and commercial support of free software projects.

64

http://www.openfoundry.org/ There are some really valuable talks here; do some homework

65

To Sum Up

Ubuntu is as friendly as any version of Windows. Everyone can start to use it without any introduction.

66

http://poietes.files.wordpress.com/2009/04/yoda-1.jpg

However, if you choose a dual system, you will never become a master

67

Shell Scripts

68

    

Shell Scripts

Similar to DOS batch files
Quick and simple programming
Text file interpreted by shell, effectively new command
List of shell commands to be run sequentially
Execute permissions, no special extension necessary

Magic first line
– #!
– Include full path to interpreter (shell)
  • #!/bin/sh
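Putting these points together, the sketch below writes a two-line script to a file, gives it execute permission, and runs it as a new command. The script name greet and its message are made up, and the example assumes the temporary directory allows programs to be executed from it.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Write a tiny script; its first line names the interpreter to use.
cat > "$workdir/greet" <<'EOF'
#!/bin/sh
echo "hello from a script"
EOF

chmod u+x "$workdir/greet"      # execute permission turns the text file into a command
output=$("$workdir/greet")      # run it by path; no special extension is needed
echo "$output"
rm -r "$workdir"
```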

69

  Shell Scripts

Interacting

Special variables for processing arguments
– $#        number of arguments on command line
– $0        name that script was called as
– $1 – $9   command line arguments
– $@        all arguments (separately quoted)
– $*        all arguments
– $?        numeric result code of previous command
– $$        process ID of this running script

Interacting With User
– Talk to user (or ask questions) first, then get input from user, put it in variable
  • echo prompt
  • read variable
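A short sketch of the argument variables in action follows. It uses a function, show_args (a hypothetical name), because inside a function $#, $1 and "$@" refer to the function's arguments in the same way they refer to a script's command line arguments.

```shell
#!/bin/sh
# Hypothetical helper that reports on whatever arguments it receives.
show_args() {
    echo "count: $#"
    echo "first: $1"
    for arg in "$@"; do          # "$@" keeps each argument as one word
        echo "arg: $arg"
    done
}

report=$(show_args alpha "two words" gamma)
echo "$report"

# $? holds the numeric result code of the previous command.
sh -c 'exit 3' || status=$?
echo "result code: $status"
```

Quoting "$@" is what keeps "two words" together as a single argument; with an unquoted $* it would be split in two.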

70

   Shell Scripts

Control Structure

if [ condition ]; then
    commands
fi

for variable in list; do
    commands
done

Check sh man page for details, also look at examples.

#!/bin/sh
if [ $# -ge 2 ]
then
    echo $2
elif [ $# -eq 1 ]; then
    echo $1
else
    echo No input
fi
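The for loop can be combined with if in the same way. The following sketch, with a made-up list of numbers, adds them up and counts how many are even.

```shell
#!/bin/sh
total=0
evens=0
for n in 1 2 3 4 5; do
    total=$((total + n))              # shell arithmetic
    if [ $((n % 2)) -eq 0 ]; then     # remainder 0 means n is even
        evens=$((evens + 1))
    fi
done
echo "total=$total evens=$evens"
```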

71

Any Questions?

72

Can you

Use shell script to change filenames from lower- to uppercase? Remember that the wild card symbol * can help you get all files.

73

#!/bin/sh
# Rename every file in the current directory to its uppercase name.
for file in *; do
    echo "processing $file"
    # Quoting keeps names that contain spaces in one piece.
    mv "$file" "$(echo "$file" | tr '[a-z]' '[A-Z]')"
done

How would you do it in Windows?

BTW, why Perl? It can be done in one line
– $ ls | perl -nle 'my $o=$_; tr/a-z/A-Z/; rename $o, $_'

How would you do it with C?

74

Any Questions?

75

Code Size Calculator

In: a file
Out: code size

Requirement
- input from command line
- do not count space characters
- do not count comments (C style)

Must complete in Unix
- if you don’t have one, contact me ASAP
- using C would be the best

Bonus
- write a shell script version

76

Deadline

2010/3/9 23:59

Zip your code, a step-by-step README on how to execute the code, and anything worthy of extra credit. Email to [email protected].

77

gcc

78

 

gcc

gcc is the GNU C Compiler, and g++ is the GNU C++ compiler, while cc and CC are the Sun C and C++ compilers also available on Sun workstations.

Note that C++ differs from C to a certain extent. It is safest to regard them as two different languages with very similar basic structures.

79

   gcc

Compiling a Simple Program

Consider the following example: let “hello.c” be a file that contains the following C code

#include <stdio.h>

int main()
{
    printf("Hello\n");
    return 0;
}

The standard way to compile this program is with the command
– $ gcc hello.c -o hello

This command compiles hello.c into an executable program named “hello”. It does nothing more than print the word “Hello” on the screen.
– $ chmod 755 hello
– $ ./hello
  Hello

80

Alternatively, the above program could be compiled using the following two commands
– $ gcc -c hello.c
– $ gcc hello.o -o hello

The end result is the same, but this two-step method first compiles hello.c into a machine code file named “hello.o” and then links hello.o with some system libraries to produce the final program “hello”.

In fact the first method also does this two stage process of compiling and linking, but the stages are done transparently, and the intermediate file “hello.o” is deleted in the process.

81

     gcc

Frequently Used Options

The examples below demonstrate how to use many of the more commonly used options. Some options can be combined, although it is generally not useful to use “debugging” and “optimization” options together.

Make the resulting executable contain symbolic information for the gdb debugger
– $ gcc -g myprog.c -o myprog

Have the compiler generate many warnings about syntactically correct but questionable looking code. It is good practice to always use this option with gcc and g++
– $ gcc -Wall myprog.c -o myprog

Generate optimized code. The -O is a capital o and not the number 0!
– $ gcc -O myprog.c -o myprog

Compile a C program that uses math functions such as “sqrt”
– $ gcc myprog.c -o myprog -lm

82

     gcc

Multiple Source Files

If there are multiple source files
– $ gcc file1.c file2.c -o myprog

Or
– $ gcc -c file1.c
  $ gcc -c file2.c
  $ gcc file1.o file2.o -o myprog

The second one compiles source files separately. If only file1.c was modified
– $ gcc -c file1.c
  $ gcc file1.o file2.o -o myprog

Notice that file2.c does not need to be recompiled.
– significant time savings when there are numerous source files

This process, though somewhat complicated, is generally handled automatically by a makefile.

83