Slide 1 - National Cheng Kung University


Bioinformatics Programming
EE, NCKU
Tien-Hao Chang (Darby Chang)
1
In the last slide

More Unix features worth
mentioning
– job control
– I/O redirection and piping
– text processing (vi, grep, sed, awk, …)

Programming vs. language
2
Programming
3
Before
Learning advanced data structures
and the associated algorithms
4
struct
A building block for constructing advanced data structures in C
5
struct

struct is similar to array in that
both of them aggregate a set of objects
into a single object (not an object in the
object-oriented sense)
– array: aggregates objects of the same type
– struct: aggregates objects of different types


struct is an abbreviation of ‘structure’
Each entry in a struct declaration is usually
called a ‘field’ or ‘member’
6
struct
Declaration

A struct declaration consists of a list of fields, each of
which can have any type
– struct mydata { // declare the structure of mydata
      char name[8];
      char id[10];
      int math;
      int eng;
  };
– defines a type, referred to as struct mydata

To create a new variable of this type
– // define a variable ‘student’ of the type ‘struct mydata’
  struct mydata student;
7
struct
The Memory Space
[Figure: memory layout of the variable student, with the fields name, id, math, and eng stored in order]
8
struct
Test Memory Space


#include<stdio.h>
#include<stdlib.h>
int main(void) {
    struct data {
        char name[10];
        char sex[2];
        int math;
    };
    struct data student;
    printf("sizeof(student)=%zu\n", sizeof(student)); // sizeof yields a size_t, so use %zu
    return 0;
}
Result: 16 (on a typical platform with a 4-byte int)
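The result can also be broken down field by field. The following is a minimal sketch, not part of the original slides: the helper names data_field_bytes, data_padding_bytes, and data_math_offset are hypothetical. It compares the bytes used by the fields alone with sizeof(struct data), so any padding inserted by the compiler becomes visible.

```c
#include <stddef.h>  /* offsetof, size_t */

/* Same layout as the struct data on the slide. */
struct data {
    char name[10];
    char sex[2];
    int  math;
};

/* Bytes occupied by the fields alone, ignoring any padding.
   sizeof does not evaluate its operand, so the null pointer is never dereferenced. */
size_t data_field_bytes(void) {
    return sizeof(((struct data *)0)->name)
         + sizeof(((struct data *)0)->sex)
         + sizeof(((struct data *)0)->math);
}

/* Padding bytes added by the compiler (may be 0). */
size_t data_padding_bytes(void) {
    return sizeof(struct data) - data_field_bytes();
}

/* Byte offset of the field math from the start of the struct. */
size_t data_math_offset(void) {
    return offsetof(struct data, math);
}
```

With a 4-byte int, the two char arrays take 10 + 2 = 12 bytes, so math can start at offset 12 and no padding is needed; on other platforms the numbers may differ.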
9
struct
Access Fields

The dot (.) operator
– struct_variable.field_name

For example
– student.math = 90;
– student.eng = 20;
– printf("%s’s Math score is %d\n",
student.name, student.math);

A convenient shortcut for initializing the
members of a struct is shown below (values are
assigned to the fields in declaration order)
– struct data student={"Mary Wang",74};
10
struct
Array of Structures

You may define an array of structures
– struct student { //declare the structure of student
char name[8];
char id[10];
int math;
int eng;
};
// define an array of 3 variable of the type ‘student’
struct student stu[3];
[Figure: memory layout of stu[0], stu[1], and stu[2]; each element holds name[0] … name[7], id[0] … id[9], math, and eng]
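As an illustration of working with an array of structures, here is a small sketch that averages one field over all elements. The function name average_math is an assumption made for this example and does not come from the slides.

```c
#include <stddef.h>  /* size_t */

struct student {
    char name[8];
    char id[10];
    int  math;
    int  eng;
};

/* Average math score over the first n elements; returns 0.0 when n is 0. */
double average_math(const struct student stu[], size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += stu[i].math;   /* index first, then select the field */
    return n ? (double)total / (double)n : 0.0;
}
```

Each stu[i] is a complete struct student, so stu[i].math combines array indexing with the dot operator.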
11
struct
Pointer to Structure

Pointers can be used to refer to a struct by its address
– struct mydata {          // declare the structure of mydata
      char name[8];
      char id[10];
      int math;
      int eng;
  } student;               // define a mydata variable, student
  struct mydata * ptr;     // define a pointer to mydata
  ptr = &student;          // point ptr to the variable student

Access fields through struct pointers
– the arrow (->) operator
– struct_pointer_variable->field_name
– ptr->math = 90;
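A common reason to use a pointer to a struct is to let a function modify the caller's variable. The sketch below assumes a hypothetical helper, add_bonus, that is not in the slides; it only illustrates the -> operator.

```c
struct mydata {
    char name[8];
    char id[10];
    int  math;
    int  eng;
};

/* Adds bonus to both scores through the pointer, capping each at 100. */
void add_bonus(struct mydata *p, int bonus) {
    p->math += bonus;                 /* same as (*p).math += bonus */
    if (p->math > 100) p->math = 100;
    p->eng += bonus;
    if (p->eng > 100) p->eng = 100;
}
```

Calling add_bonus(&student, 10) changes student itself, because only the address is passed and no copy of the struct is made.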
12
struct
Nested Structures

Since a struct declaration constructs a new type, struct types can be used for fields
just like normal types such as int, double, …
–
#include<stdio.h>
#include<stdlib.h>
int main(void) {
    struct date { // declare date
        int month;
        int day;
    };
    struct student { // declare nested structure, student
        char name[10];
        int math;
        struct date birthday;
    } s1={"David Li", 80, {2,10}}; // define a student variable, s1
    printf("student name:%s\n",s1.name);
    printf("birthday:%d month, %d day\n", s1.birthday.month, s1.birthday.day);
    printf("math grade:%d\n",s1.math);
    return 0;
}
13
struct
Self-referential Structure



Fields are not allowed to be defined as the
same type as the declaration they belong to
But fields can be defined as pointers to the
same type as the declaration they belong to
Such a struct, with pointer fields referencing
the same struct type, is called a self-referential
structure
– struct PERSON {
      char name[8];
      int age;
      struct PERSON * son; // self-referential pointer
  };
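To show how such a structure is used, the following sketch walks a chain of PERSON records by following the son pointers. The function count_generations is a hypothetical name introduced for this example only.

```c
#include <stddef.h>  /* NULL */

struct PERSON {
    char name[8];
    int  age;
    struct PERSON *son;   /* NULL when there is no son */
};

/* Counts how many persons are reachable by following son pointers. */
int count_generations(const struct PERSON *p) {
    int n = 0;
    while (p != NULL) {
        ++n;
        p = p->son;       /* move on to the next record in the chain */
    }
    return n;
}
```

A chain of three records (grandfather, father, son) linked this way gives a count of 3; this pattern is the basis of linked lists.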
14
Any Questions?
15
Why
Fields are not allowed to be defined as the
same type as the declaration they belong to?
But fields can be defined as pointers to the
same type as the declaration they belong to?
Hint: think from the perspective of memory
16
The Closeness
Between C and the actual machine
representation is the reason both a)
why C programs are so fast and b)
why C is suitable for teaching
17
Languages Comparison


Since the 1950s, computer scientists have devised
thousands of programming languages. Many are obscure,
perhaps created for a Ph.D. thesis and never heard of since.
Compiling to machine code
– some languages transform programs directly into Machine
Code—the instructions that a CPU understands directly
– this transformation process is called compilation
– assembly, C, and C++

Interpreted languages
– other languages are interpreted, such as Basic, Perl, and
JavaScript
– or are a mixture of both, being compiled to an intermediate
language, including Java and C#
18
Languages Comparison
Compile vs. Interpret

An interpreted language is processed at runtime. Every line is
read, analyzed, and executed. Having to reprocess a line every
time through a loop is what makes interpreted languages slow.
– this overhead means that interpreted code is often cited as running
5–10 times slower than compiled code
– their advantage is that they do not need to be recompiled after changes,
which is handy when you're learning to program.


Because compiled programs almost always run faster than
interpreted ones, languages such as C and C++ tend to be the most
popular for writing games.
Java and C# both compile to an intermediate language that is
interpreted very efficiently. Because the Virtual Machine that interprets Java
and the .NET framework that runs C# are heavily optimized, it's
claimed that applications in those languages are as fast as, if not
faster than, compiled C++.
19
Languages Comparison
Level of Abstraction





How close is a particular language to the hardware?
Machine Code is the lowest level followed by assembly.
C++ is higher than C because C++ offers greater
abstraction.
Java and C# are higher than C++ because they compile to
an intermediate language called bytecode.
When computers first became popular in the 1950s,
programs were written in machine code. Programmers had
to physically flip switches to enter values. This is such a
tedious and slow way of creating an application that higher
level computer languages had to be created.
20
http://www.evula.org/dragoon/pics/supercoder.jpg
Super coder!
21

Assembler: Fast to run, slow to write
– The readable version of Machine Code
• Mov A,$45
– Because it is tied to a particular CPU, assembly is not very portable.
– Languages like C have reduced the need for assembly except where
memory is limited or time critical code is needed. This is typically in the
kernel code or in a driver.

Basic: For beginners
– Basic is an acronym for Beginner's All-purpose Symbolic Instruction Code
and was created to teach programming in the 1960s.
– Microsoft has made the language its own with many different versions
including VBScript for websites and the very successful Visual Basic.
– It is an interpreted language whose only advantage is being easy to learn. But
now it is more like a syntax alternative to C because most programmers
are lazy.

Pascal: Conscientious programming
– Pascal was devised as a teaching language a few years before C but had
limited usage.
– Only after Borland's Turbo Pascal (for DOS) and Delphi (for Windows) appeared
did it become suitable for commercial development.
– However, Borland was up against Microsoft and lost the battle.
22

C: System programming
– C was devised in the early 1970s by Dennis Ritchie. It can be thought of as
a general purpose tool—very useful and powerful but very easy to let bugs
through that can make systems insecure.
– C has been described as portable assembly.
– The syntax of many scripting languages is based on C.

C++: A classy language
– C++ (or C with Classes, as it was originally known) came about ten years
after C and successfully introduced Object Oriented Programming to C, as
well as features like exceptions and templates.
– Learning all of C++ is a big task. It is by far the most complicated of the
programming languages here, but once you have mastered it, you'll have
little difficulty with any other language.

C#: Microsoft's big bet
– C# was created by Delphi's architect Anders Hejlsberg after he moved to
Microsoft and Delphi developers will feel at home with features such as
Windows forms.
– C# syntax is very similar to Java, which is not surprising as Hejlsberg also
worked on J++ after he moved to Microsoft.
– Learn C# and you are well on the way to knowing Java. Both languages are
semi-compiled, so that instead of compiling to machine code, they compile
to bytecode and are then interpreted.
23

Perl: Websites and utilities
– Very popular in the Linux world, Perl was one of the first web languages
and remains very popular today.
– For doing ‘quick and dirty’ programming on the web it remains unrivalled
and drives many websites.
– It has though been somewhat eclipsed by PHP as a web scripting language.

PHP: Websites coding
– PHP was designed as a language for Web Servers and is very popular in
conjunction with Linux, Apache, and MySQL (together with PHP, LAMP for short).
– It is interpreted, but pre-compiled so code executes reasonably quickly.
– It can be run on desktop computers but is not as widely used for
developing desktop applications.
– Based on C syntax, it also includes Objects and Classes.

JavaScript : Programs in your browser
– JavaScript is nothing like Java; instead it's a scripting language based on C
syntax but with the addition of Objects, and it is used mainly in browsers.
– JavaScript is interpreted and a lot slower than compiled code but works
well within a browser.
– Invented by Netscape and in the doldrums for years. Popular again because of
AJAX (Asynchronous JavaScript and XML). This allows parts of web pages to
update from the server without redrawing the entire page.
24
Position 2010  Position 2009  Delta in Position  Language        Ratings 2010  Delta 2009  Status
1              1              =                  Java            17.509%       -2.29%      A
2              2              =                  C               17.279%       +1.42%      A
3              4              ↑                  PHP             9.908%        +0.42%      A
4              3              ↓                  C++             9.610%        -0.75%      A
5              5              =                  (Visual) Basic  6.574%        -1.71%      A
6              7              ↑                  C#              4.264%        -0.06%      A
7              6              ↓                  Python          4.230%        -0.95%      A
8              9              ↑                  Perl            3.821%        +0.40%      A
9              10             ↑                  Delphi          2.684%        -0.03%      A
10             8              ↓↓                 JavaScript      2.651%        -0.96%      A
11             11             =                  Ruby            2.327%        -0.27%      A
12             32             ↑↑↑↑↑↑↑↑↑↑         Objective-C     1.970%        +1.79%      A
13             -              ↑↑↑↑↑↑↑↑↑↑         Go              0.921%        +0.92%      A
14             15             ↑                  SAS             0.769%        -0.03%      A
15             13             ↓↓                 PL/SQL          0.737%        -0.31%      A
16             22             ↑↑↑↑↑↑             MATLAB          0.661%        +0.20%      B
17             17             =                  ABAP            0.639%        +0.00%      B
18             16             ↓↓                 Pascal          0.603%        -0.13%      B
19             19             =                  ActionScript    0.594%        +0.11%      B
20             27             ↑↑↑↑↑↑↑            Fortran         0.563%        +0.24%      B
25
http://www.simplyhired.com/a/jobtrends/graph/q-Perl%2C+Ruby%2C+Python%2C+Php%2C+Javascript%2C+Flex%2C+Groovy/t-line
26
Languages Comparison
Summary
Higher Level
» more readable
» faster to develop
» more syntactic sugar
» avoid careless mistakes
Lower Level
» easy to debug
» faster program
» general purpose
» powerful to do evil

Other noteworthy programming languages
– Java, Python, Ruby, Go, …

Popularity arises for many reasons
– history (programmers are lazy), business, and functionality

Lasting wars
– Java vs. .NET (C will, in some form, live forever)
– Perl vs. PHP vs. Ruby (web programming)
– Perl vs. Python (scripting)
– There might be a dominant system language and a dominant scripting language in the future,
but it will probably converge to a world of coexistence.
27
Any Questions?
28
Algorithm
29
Algorithm

Specification
– a finite set of instructions that accomplishes a particular
task
– criteria
• input: zero or more quantities that are externally supplied
• output: at least one quantity is produced
• definiteness: each instruction is clear and unambiguous
• finiteness: the algorithm terminates after a finite number of steps
• effectiveness: each instruction is basic enough to be carried out

Representation
– a natural language, like English or Chinese
– a graphic, like flowcharts
– a computer language, like C
30
Algorithm
Selection Sort

From those integers that are currently unsorted, find the
smallest and place it next in the sorted list
i    [0]  [1]  [2]  [3]  [4]
-    30   10   50   40   20
0    10   30   50   40   20
1    10   20   50   40   30
2    10   20   30   40   50
3    10   20   30   40   50
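The description above can be written in C as follows. This is a minimal sketch, with selection_sort as an assumed function name; it sorts in place by repeatedly swapping the smallest unsorted integer into position i.

```c
/* Sorts list[0..n-1] in ascending order using selection sort. */
void selection_sort(int list[], int n) {
    for (int i = 0; i < n - 1; ++i) {
        int min = i;                       /* index of the smallest value seen so far */
        for (int j = i + 1; j < n; ++j)
            if (list[j] < list[min])
                min = j;
        int tmp = list[i];                 /* place the smallest at position i */
        list[i] = list[min];
        list[min] = tmp;
    }
}
```

For the five integers 30, 10, 50, 40, 20, the outer loop runs for i = 0 to 3, one pass per row of the trace.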

31
32
Algorithm
Binary Search


Sorted list
[0]  [1]  [2]  [3]  [4]  [5]  [6]
8    14   26   30   43   50   52

Searching for 43
left  right  middle  list[middle] : target
0     6      3       30 <  43
4     6      5       50 >  43
4     4      4       43 == 43 (found)

Searching for 18
left  right  middle  list[middle] : target
0     6      3       30 >  18
0     2      1       14 <  18
2     2      2       26 >  18
2     1      -       (not found)

Searching a sorted list
while (there are more integers to check) {
    middle = (left + right) / 2;
    if (target < list[middle])
        right = middle - 1;
    else if (target == list[middle])
        return middle;
    else left = middle + 1;
}
33

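/* COMPARE is not defined on the slide; assumed here to yield
   -1 if x < y, 0 if x == y, and 1 if x > y */
#define COMPARE(x, y) (((x) < (y)) ? -1 : ((x) == (y)) ? 0 : 1)

int binsearch(
    int list[], int target,
    int left, int right)
{
    int middle;
    while (left <= right) {
        middle = (left + right) / 2;
        switch (COMPARE(list[middle], target)) {
            case -1: left = middle + 1;   /* target is in the right half */
                     break;
            case 0:  return middle;       /* found */
            case 1:  right = middle - 1;  /* target is in the left half */
        }
    }
    return -1;                            /* not found */
}
» Program 1.6: Searching an ordered list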
34
Algorithm
Recursive Algorithms

Beginning programmers view a function as something
that is invoked (called) by another function
– it executes its code and then returns control to the
calling function


This perspective ignores the fact that functions can
call themselves (direct recursion)
They may call other functions that invoke the calling
function again (indirect recursion)
– extremely powerful
– frequently allow us to express an otherwise complex
process in very clear terms

We should express a recursive algorithm when the
problem itself is defined recursively
35

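/* COMPARE is not defined on the slide; assumed here to yield
   -1 if x < y, 0 if x == y, and 1 if x > y */
#define COMPARE(x, y) (((x) < (y)) ? -1 : ((x) == (y)) ? 0 : 1)

int binsearch(
    int list[], int target,
    int left, int right)
{
    int middle;
    if (left <= right) {   /* if, not while: every case returns, so recursion replaces the loop */
        middle = (left + right) / 2;
        switch (COMPARE(list[middle], target)) {
            case -1: return
                binsearch(list,target,middle+1,right);
            case 0:  return middle;
            case 1:  return
                binsearch(list,target,left,middle-1);
        }
    }
    return -1;
}
» Program 1.7: Recursive implementation of binary search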
36
Any Questions?
37
Data Abstraction
38
Data Abstraction

Data type
– A data type is a collection of objects and a set of operations that act on
those objects
– For example, the data type int consists of the objects {0, +1, -1, +2, -2, …,
INT_MAX, INT_MIN} and the operations +, -, *, /, and %

The data types of C
– basic data types: char, int, float, and double
– group data types: array and struct
– pointer data type
– user-defined types

Abstract data type
– An abstract data type (ADT) is a data type that is organized in
such a way that the specification of the objects and the
operations on the objects is separated from the representation
of the objects and the implementation of the operations.
– We know what it does, but not necessarily how it will do it.
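As one possible illustration of this separation, the sketch below defines a small integer stack. The type IntStack and the stack_* functions are hypothetical names chosen for this example; they are not taken from the slides. Callers are meant to use only the operations, so the array-based representation could later be swapped (for example for a linked list) without changing calling code.

```c
#include <stdbool.h>

#define STACK_CAPACITY 16

/* Representation: callers should treat these fields as private. */
typedef struct {
    int items[STACK_CAPACITY];
    int size;
} IntStack;

/* Operations: the only way callers are meant to touch an IntStack. */
void stack_init(IntStack *s)           { s->size = 0; }
bool stack_is_empty(const IntStack *s) { return s->size == 0; }

/* Pushes value; returns false if the stack is full. */
bool stack_push(IntStack *s, int value) {
    if (s->size == STACK_CAPACITY) return false;
    s->items[s->size++] = value;
    return true;
}

/* Stores the top item in *out; returns false if the stack is empty. */
bool stack_pop(IntStack *s, int *out) {
    if (s->size == 0) return false;
    *out = s->items[--s->size];
    return true;
}
```

The specification is "last pushed, first popped"; how the items are stored is an implementation detail behind the four functions.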
39
40
The array as an ADT
41
To
Evaluate which algorithm is better
42
Algorithm
Performance Analysis

Criteria
– Is it correct?
– Is it readable?
–…

Performance analysis (machine independent)
– space complexity: storage requirement
– time complexity: computing time

Performance measurement (machine
dependent)
43
Performance Analysis
Space Complexity


S(P) = C + S_P(I)
Fixed space requirements (C)
– independent of the inputs and outputs
– instruction, constants, simple variables

Variable space requirements (S_P(I))
– depend on the instance characteristic I
– number, size, values of inputs and outputs
associated with I
– recursive stack space, including formal
parameters, local variables, and return address
44
Any Questions?
45
Analyze
Someone’s exercise
46
The recursion stack space needed is 6(n+1),
since the depth of recursion is n+1.
47
Performance Analysis
Time Complexity



T(P) = C + T_P(I)
The time, T(P), taken by a program, P, is
the sum of its compile time C and its run (or
execution) time, T_P(I)
T_P(I) = c_a ADD(I) + c_s SUB(I) + …
– Program step: A syntactically or semantically meaningful
program segment whose execution time is independent
of the instance characteristics.
– Introduce a new variable, count, into the
program
– Tabular method
48
Time Complexity
Iterative Summation


int count = 0; // global step counter

float sum(float list[], int n) {
    float tmp = 0; ++count; // for assignment
    int i;
    for (i = 0; i < n; ++i) {
        ++count; // for the for loop
        tmp += list[i];
        ++count; // for assignment
    }
    ++count; // last execution of for
    ++count; // for return
    return tmp;
}
2n+3 steps
49
Time Complexity
Tabular Method
Statement                        s/e  Frequency  Total Steps
float sum(float list[], int n)   0    0          0
{                                0    0          0
float tmp=0;                     1    1          1
int i;                           0    0          0
for (i=0; i<n; ++i)              1    n+1        n+1
tmp+=list[i];                    1    n          n
return tmp;                      1    1          1
}                                0    0          0
Total                                            2n+3
50
Any Questions?
51
Asymptotic notation
52
Asymptotic Notation
Basic Concepts

There are two programs, one with
complexity c1n^2+c2n and the other with
complexity c3n
– for sufficiently large values of n, c3n will
be faster than c1n^2+c2n
– for small values of n, either could be
faster
• c1=1, c2=2, c3=100 ⇒ c1n^2+c2n ≤ c3n for n ≤ 98
• c1=1, c2=2, c3=1000 ⇒ c1n^2+c2n ≤ c3n for n ≤ 998
53
Asymptotic Notation
O, Ω, Θ

O [big “oh”]
– f(n) = O(g(n)) iff there exist positive constants c and n0 such that f(n) ≤ cg(n)
for all n, n ≥ n0
– upper bound, worst case

Ω [big omega]
– f(n) = Ω(g(n)) (read as “f of n is big omega of g of n”) iff there exist
positive constants c and n0 such that f(n) ≥ cg(n) for all n, n ≥ n0
– lower bound, best case

Θ [big theta]
– f(n) = Θ(g(n)) iff there exist positive constants c1, c2, and n0 such that
c1g(n) ≤ f(n) ≤ c2g(n) for all n, n ≥ n0
– upper and lower bound

Notice the relationship between analyses and notations. For
example, sometimes we would analyze the big theta of the
worst case of an algorithm.
54
Asymptotic Notation
Theorems




If f(n) = a_m n^m + … + a_1 n + a_0, then f(n) = O(n^m)
If f(n) = a_m n^m + … + a_1 n + a_0 and a_m > 0, then f(n) = Ω(n^m)
If f(n) = a_m n^m + … + a_1 n + a_0 and a_m > 0, then f(n) = Θ(n^m)
Examples
– f(n) = 3n+2
3n+2 ≤ 4n, for all n ≥ 2, ∴ 3n+2 = O(n)
3n+2 ≥ 3n, for all n ≥ 1, ∴ 3n+2 = Ω(n)
3n ≤ 3n+2 ≤ 4n, for all n ≥ 2, ∴ 3n+2 = Θ(n)
– f(n) = 10n^2+4n+2
10n^2+4n+2 ≤ 11n^2, for all n ≥ 5, ∴ 10n^2+4n+2 = O(n^2)
10n^2+4n+2 ≥ n^2, for all n ≥ 1, ∴ 10n^2+4n+2 = Ω(n^2)
n^2 ≤ 10n^2+4n+2 ≤ 11n^2, for all n ≥ 5, ∴ 10n^2+4n+2 = Θ(n^2)
– 10n^2+4n+2 = O(n^2) // 10n^2+4n+2 ≤ 11n^2 for n ≥ 5
– 6*2^n+n^2 = O(2^n)
// 6*2^n+n^2 ≤ 7*2^n for n ≥ 4
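The constants in such examples can be sanity-checked numerically. The sketch below assumes a hypothetical helper, theta_bounds_hold, that tests whether c_lo * n^2 ≤ 10n^2+4n+2 ≤ c_hi * n^2 over a finite range. A finite check is not a proof; it only helps catch a wrong choice of c or n0.

```c
#include <stdbool.h>

/* f(n) = 10n^2 + 4n + 2, the example polynomial from the slide. */
static long f(long n) { return 10 * n * n + 4 * n + 2; }

/* True if c_lo*n^2 <= f(n) <= c_hi*n^2 for every n in [n0, n_max]. */
bool theta_bounds_hold(long c_lo, long c_hi, long n0, long n_max) {
    for (long n = n0; n <= n_max; ++n) {
        long g = n * n;
        if (c_lo * g > f(n) || f(n) > c_hi * g)
            return false;
    }
    return true;
}
```

With c_lo = 1 and c_hi = 11 the bounds hold from n0 = 5 onward, but not from n0 = 4, since f(4) = 178 exceeds 11 * 16 = 176.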
55
Practical Complexity
To get a feel for how the various
functions grow with n, you are advised to
study the following three figures
56
57
58
59
Performance Measurement

Although performance analysis gives us a
powerful tool for assessing an algorithm’s
space and time complexity, at some point
we also must consider how the algorithm
executes on our machine
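One simple way to measure is the standard clock() function from <time.h>. The sketch below is an assumption about how such a measurement might be set up, not the course's prescribed method; time_sum is a hypothetical helper name. It repeats the call many times because a single call is usually too fast for the clock resolution.

```c
#include <time.h>  /* clock, clock_t, CLOCKS_PER_SEC */

/* Sums list[0..n-1]; this is the function being measured. */
float sum(const float list[], int n) {
    float tmp = 0;
    for (int i = 0; i < n; ++i)
        tmp += list[i];
    return tmp;
}

/* Average CPU seconds per call of sum, measured over `repeats` calls. */
double time_sum(const float list[], int n, long repeats) {
    volatile float sink = 0;  /* volatile store discourages the compiler from discarding the calls */
    clock_t start = clock();
    for (long r = 0; r < repeats; ++r)
        sink = sum(list, n);
    clock_t stop = clock();
    (void)sink;
    return (double)(stop - start) / CLOCKS_PER_SEC / (double)repeats;
}
```

The measured value depends on the machine, the compiler, and the optimization flags, which is why measurement complements rather than replaces the analysis.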
60
Any Questions?
61
Fibonacci
In: n
Out: the n-th Fibonacci number
Requirement
- a recursive version and an iterative version
- report
- time/space complexity
- practical time
- code size (less meaningful in C)
- using C would be the best
Bonus
- an algorithm of O(n) time and O(1) space complexity
- the best time complexity is O(1)
- use Makefile to automate the report
62
Fibonacci
A Reference


Kenji Mikawa and Ichiro Semba (2005). "An
O (1) time algorithm for generating
Fibonacci strings." Electronics and
Communications in Japan (Part II:
Electronics) 88(9): 67-72.
Provided by 陳偉銘
– “However, the majority in this course is male,
so…”
63
Deadline
2010/3/23 23:59
Zip your code, a step-by-step README on
how to execute the code, and anything
worthy of extra credit. Email to
[email protected].
64
http://www.dianadepasquale.com/ThinkingMonkey.jpg
Recall that
65
gcc
Multiple Source Files

If there are multiple source files
– $ gcc file1.c file2.c -o myprog

Or
– $ gcc -c file1.c
$ gcc -c file2.c
$ gcc file1.o file2.o -o myprog

The second one compiles source files separately. If only
file1.c was modified
– $ gcc -c file1.c
$ gcc file1.o file2.o -o myprog

Notice that file2.c does not need to be recompiled.
– significant time savings when there are numerous source files

This process, though somewhat complicated, is generally
handled automatically by a makefile.
66
http://faculty.northseattle.edu/tfurutani/che140/labbook_files/image005.jpg
But how do you know
which files should be re-compiled?
67
http://www.morphcoaching.com/mypics/Wheel_invention.jpg
Don’t invent the wheel
68
Makefile
69
Makefile



A Makefile is the configuration file used by a
standard program called “make”
make is like a project manager in a graphical
development environment, but includes many
extra features
Allows an entire project to be intelligently built
with one command on the command line
– make avoids re-building targets which are up-to-date, thus saving a lot of typing and compiling time
– Makefiles are largely similar to the Project and
Workspace files you might be used to from Visual
C++, JBuilder, Eclipse, etc
70
Makefile
Filenames

When you type make, it looks for the default
filenames in the current directory. For GNU make
these are
– GNUmakefile
– makefile
– Makefile


If more than one of the above exists in the current
directory, the first one in the above order is chosen
It is possible to name the Makefile any way you want;
then, for make to interpret it
– $ make -f <your-filename>
71
Makefile
Dependencies

Sometimes one file depends on another file
– e.g. a C file depends on its header files

If a header file changes, the C files that
#include that header file should be recompiled
to take into account the changes to the header
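[Figure: dependency tree. The final executable file (my_project) depends on main.o and interface.o; main.o depends on main.c and interface.h; interface.o depends on interface.c and interface.h]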
72
Makefile
A Simple Makefile “Rule”





hello: hello.c
gcc hello.c -o hello
Save this text in a file named “Makefile” in the
same directory as the source code
To build the project, type “make”
The result is an executable named hello
If the hello file exists, and its modification time
is newer than that of hello.c, what should “make”
do?
– nothing
73
Makefile
Generic Form of a Rule






target1 target2 ..: prerequisite1 prerequisite2 ...
<tab>command1
<tab>command2
Target is the output file
Prerequisites are the files that are needed by target (and
that can cause target to be recompiled if they change).
Command (or action) is the actual command to turn the
prerequisites into the target.
Characters after “#” are regarded as comments
Line oriented
– If the dependencies or commands are too long and you would
like to span them across several lines for clarity and
convenience, escape the end of line by “\” at the end.
– Make sure NOT to use tabs for such lines.
74
Makefile
Target



make performs the actions corresponding to specific
targets
A target can be a filename that you want to generate
or a phony target; the latter is especially useful
for automating actions
Suggested phony targets from GNU
– all: default action (build/compile the executable)
– install: install the previously built executable
– clean: clean temporary files generated during the build
process, usually the .o or .obj files

The first target listed in the file will be used if no
target is explicitly specified
75
Makefile
Multiple Targets


MyProject: main.o interface.o
gcc main.o interface.o -o MyProject
main.o: main.c interface.h
gcc -c main.c -o main.o
interface.o: interface.c interface.h
gcc -c interface.c -o interface.o
[Figure: dependency tree. The final executable file (my_project) depends on main.o and interface.o; main.o depends on main.c and interface.h; interface.o depends on interface.c and interface.h]
Build MyProject
– $ make
– $ make MyProject
– make will figure out the appropriate order from
the prerequisites

Compile a non-default target
– $ make main.o
76
Makefile
Command





A list of actions needed to generate the rule’s target
May be empty (just indicate dependencies)
Every action is usually a typical shell command you would
normally type to do the same thing
You can hide commands with a preceding ‘@’ symbol
Every command MUST be preceded with a tab!
– This is how make identifies actions as opposed to variable
assignments and targets. Do not indent actions with spaces!

Each action line invokes a sub shell to execute the
commands
– The sub shell ends after that line
– Some changes (such as cd to another directory or set shell
variables) won’t pass to the next line
– Use ‘;’ symbol to execute multiple commands in one line
77
Makefile
Variables


In a large Makefile, a good idea is to use
variables to make later changes easy
For example, rather than typing ‘gcc’ in the
command part of every rule, create a variable
at the top of the Makefile
– CC = gcc

Commands can then be
– ${CC} source_file.c -o executable_file



Case sensitive
Use only letters, numbers, and ‘_’
Both $(VAR) and ${VAR} are okay
78
Makefile
Other Features

Implicit rules
– GNU make provides some implicit rules for common practices, such as
the object file of foo.c being foo.o. For example, the following rule is
unnecessary
• foo.o: foo.c
gcc -c -o foo.o foo.c

Phony target
– The target is always out-of-date and thus the actions are always performed
– e.g. ‘.PHONY: clean’

Automatic variables (internal macros)
– $@ the filename of the target of the rule
– $< the name of the first prerequisite
– $? the names of all the prerequisites that are newer than the target
– $^ the names of all the prerequisites
– $* the main filename (stem) of the target of the rule

Flow control
– ifeq, ifneq, ifdef, ifndef, for, if-then-else, …
79