CS235102 Data Structures - National Cheng Kung University


Bioinformatics Programming
EE, NCKU
Tien-Hao Chang (Darby Chang)
1
Data Abstraction
2
Data Abstraction

Data type
– A data type is a collection of objects and a set of operations that act on
those objects
– For example, the data type int consists of the objects {0, +1, -1, +2, -2, …,
INT_MAX, INT_MIN} and the operations +, -, *, /, and %

The data types of C
– basic data types: char, int, float, and double
– group data types: array and struct
– pointer data type
– user-defined types

Abstract data type
– An abstract data type (ADT) is a data type that is organized in
such a way that the specification of the objects and the
operations on the objects is separated from the representation
of the objects and the implementation of the operations.
– We know what it does, but not necessarily how it will do it.
3
4
The array as an ADT
5
Any Questions?
6
Stack
7
The Stack ADT



A stack is an ordered list in which insertions
and deletions are made at one end called the
top
If we add the elements A, B, C, D, and E to
the stack, in that order, then E is the first
element we delete from the stack
A stack is also known as a Last-In-First-Out
(LIFO) list
8
9
Implementation with an array
10
11
http://www.beaconelevator.com/i/elevator_myth.jpg
Why do we need such a data structure?
12
Stack
Evaluation of Expressions

The representation and evaluation of expressions is of great
interest to computer scientists
– (rear+1==front) || (rear==MAX_QUEUE_SIZE-1)   (3.1)
– x=a/b-c+d*e-a*c   (3.2)

If we examine these expressions, we notice that they contain:
– operators: ==, +, -, ||, &&, !
– operands: a, b, c, e
– parentheses: ()

Understanding the meaning of expressions
– assume a=4, b=c=2, d=e=3 in the statement (3.2)
• interpretation 1: ((4/2)-2)+(3*3)-(4*2) = 0+9-8 = 1
• interpretation 2: (4/(2-2+3))*(3-4)*2 = (4/3)*(-1)*2 = -2.66666…

The challenge is to efficiently generate the machine instructions
corresponding to a given expression while respecting its precedence and
associativity rules
13
Evaluation of Expressions
Postfix Expressions

The standard way of writing expressions is known as infix notation
– binary operator in-between its two operands

Infix notation is not the one used by compilers to evaluate
expressions
– Actually, Java virtual machine is a stack machine

Instead compilers typically use a parenthesis-free notation
referred to as postfix notation

14
Evaluation of Expressions
Evaluate Postfix Expressions

Evaluating postfix expressions is much
simpler than the evaluation of infix
expressions
– no parentheses
– no precedence



There are no parentheses to consider
To evaluate an expression we make a
single left-to-right scan of it
We can evaluate an expression easily
by using a stack
15
Evaluating 62/3-42*+
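The single left-to-right scan can be sketched in C as follows. This is a minimal sketch, not the slide's own code: the function name eval_postfix and the fixed stack size are assumptions, operands are limited to single digits, and the input is assumed to be a well-formed postfix string with no spaces.

```c
#include <ctype.h>

#define MAX_STACK_SIZE 100

/* Evaluate a postfix expression made of single-digit operands and the
 * binary operators + - * / %. Assumes well-formed input (no checks). */
int eval_postfix(const char *expr)
{
    int stack[MAX_STACK_SIZE];
    int top = -1;                       /* -1 means the stack is empty */

    for (const char *p = expr; *p != '\0'; p++) {
        if (isdigit((unsigned char)*p)) {
            stack[++top] = *p - '0';    /* operand: push its value */
        } else {
            int op2 = stack[top--];     /* right operand is popped first */
            int op1 = stack[top--];
            switch (*p) {
            case '+': stack[++top] = op1 + op2; break;
            case '-': stack[++top] = op1 - op2; break;
            case '*': stack[++top] = op1 * op2; break;
            case '/': stack[++top] = op1 / op2; break;
            case '%': stack[++top] = op1 % op2; break;
            }
        }
    }
    return stack[top];                  /* the single remaining value */
}
```

For the expression above, eval_postfix("62/3-42*+") computes 6/2 = 3, then 3-3 = 0, then 4*2 = 8, and finally 0+8 = 8.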
16
Evaluation of Expressions
Data Representation

We now consider the representation of
both the stack and the expression
17
get_token()
18
19
Any Questions?
20
A further question
Can you write a program to evaluate
expressions?
If not, what’s missing?
21
Evaluation of Expressions
Infix to Postfix

We can describe an algorithm for producing a
postfix expression from an infix one as follows
– fully parenthesize the expression
• a / b - c + d * e - a * c
• ((((a / b) - c) + (d * e)) - (a * c))
– move each operator so that it replaces its corresponding right
parenthesis
– delete all parentheses
• a b / c - d e * + a c * -

The order of operands is the same in infix and
postfix
22
(Figure: the isp and icp precedence arrays)
23
Evaluation of Expressions
From Infix to Postfix

Assumptions
– operators: (, ), +, -, *, /, %
– operands: single digit integer or variable of one character

Operands are taken out immediately
Operators are taken out of the stack as long as their in-stack
precedence (isp) is higher than or equal to the
incoming precedence (icp) of the new operator
– if (isp >= icp) pop

‘(’ has low isp and high icp

op    isp   icp
(     0     20
)     19    19
+     12    12
-     12    12
*     13    13
/     13    13
%     13    13
eos   0     0
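The isp/icp rule can be sketched in C as follows. This is a hedged illustration under the slide's assumptions (one-character operands, the seven operators listed); the names infix_to_postfix and classify, the enum ordering, and the sentinel at the bottom of the stack are choices made for this sketch. The input is assumed to be well formed with balanced parentheses.

```c
#include <string.h>

#define MAX_STACK_SIZE 100

/* Token codes; the first seven index into symbol[], isp[] and icp[]. */
enum { LPAREN, RPAREN, PLUS, MINUS, TIMES, DIVIDE, MOD, EOS, OPERAND };

static const char symbol[] = "()+-*/%";
static const int isp[] = {  0, 19, 12, 12, 13, 13, 13, 0 };
static const int icp[] = { 20, 19, 12, 12, 13, 13, 13, 0 };

static int classify(char c)
{
    if (c == '\0') return EOS;
    const char *q = strchr(symbol, c);
    return q != NULL ? (int)(q - symbol) : OPERAND;
}

/* Convert infix to postfix; postfix must have room for the result. */
void infix_to_postfix(const char *infix, char *postfix)
{
    int stack[MAX_STACK_SIZE];
    int top = 0;
    size_t n = 0;

    stack[0] = EOS;                     /* sentinel with isp 0 */
    for (const char *p = infix; *p != '\0'; p++) {
        if (*p == ' ') continue;        /* ignore blanks */
        int tok = classify(*p);
        if (tok == OPERAND) {
            postfix[n++] = *p;          /* operands go out immediately */
        } else if (tok == RPAREN) {
            while (stack[top] != LPAREN)    /* unstack until matching '(' */
                postfix[n++] = symbol[stack[top--]];
            top--;                      /* discard the '(' itself */
        } else {
            while (isp[stack[top]] >= icp[tok])
                postfix[n++] = symbol[stack[top--]];
            stack[++top] = tok;
        }
    }
    while (stack[top] != EOS)           /* flush remaining operators */
        postfix[n++] = symbol[stack[top--]];
    postfix[n] = '\0';
}
```

Because ‘(’ has icp 20 it is always pushed, and because its isp is 0 it stays on the stack until the matching ‘)’ arrives. Applied to a/b-c+d*e-a*c, the sketch produces ab/c-de*+ac*-.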
24
25
Such a two-phase strategy (a. infix to
postfix and then b. evaluate postfix) is
used in practice
26
Precedence hierarchy and associativity for C
27
Any Questions?
About stack
28
Queue
29
The Queue ADT



A queue is an ordered list in which all insertions take
place at one end, called the rear, and all deletions take
place at the opposite end, called the front
If we insert the elements A, B, C, D, E, in that order,
then A is the first element we delete from the queue
A queue is also known as a First-In-First-Out (FIFO)
list
30
31
Implementation with a 1D array and two variables
32
There might be available space when IsFullQ is true
Answer
(movement is required)
33
Queue
Regard Array as Circular

We can obtain a more efficient
representation if we regard the array
queue[MAX_QUEUE_SIZE] as circular
– front: one position counterclockwise
from the first element
– rear: current end

Only one space left when full
34
35
addq() and deleteq() are slightly more complicated
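A minimal sketch of the circular representation follows, assuming int items, a small MAX_QUEUE_SIZE, and global front and rear variables; the bool return values are an addition for this sketch. As on the slide, front sits one position before the first element and rear is the current end, so one slot is always left unused.

```c
#include <stdbool.h>

#define MAX_QUEUE_SIZE 8

static int queue[MAX_QUEUE_SIZE];
static int front = 0;   /* one position before the first element */
static int rear = 0;    /* position of the last element */

/* Add item at the rear; returns false if the queue is full. */
bool addq(int item)
{
    int next = (rear + 1) % MAX_QUEUE_SIZE;   /* wrap around the array */
    if (next == front)
        return false;                         /* full: one slot stays empty */
    rear = next;
    queue[rear] = item;
    return true;
}

/* Remove the front item into *item; returns false if the queue is empty. */
bool deleteq(int *item)
{
    if (front == rear)
        return false;                         /* empty */
    front = (front + 1) % MAX_QUEUE_SIZE;
    *item = queue[front];
    return true;
}
```

With front == rear reserved to mean "empty", the queue holds at most MAX_QUEUE_SIZE - 1 items, and no elements ever need to be moved.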
36
http://devcentral.f5.com/weblogs/images/devcentral_f5_com/weblogs/Joe/WindowsLiveWriter/PowerShellABCsQisforQueues_919A/queue_2.jpg
Queues are very common in everyday life
37
A Maze Problem

The most obvious choice is a 2D array
– 0s the open paths and 1s the barriers


Notice that not every position has eight
neighbors
To avoid checking for these border
conditions we can surround the maze by a
border of ones
– an mp maze requires an (m+2)(p+2) array
– from [1][1] to [m][p]
38
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 0 1 0 0 0 1 1 0 0 0 1 1 1 1 1 1
1 1 0 0 0 1 1 0 1 1 1 0 0 1 1 1 1
1 0 1 1 0 0 0 0 1 1 1 1 0 0 1 1 1
1 1 1 0 1 1 1 1 0 1 1 0 1 1 0 0 1
1 1 1 0 1 0 0 1 0 1 1 1 1 1 1 1 1
1 0 0 1 1 0 1 1 1 0 1 0 0 1 0 1 1
1 0 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1
1 0 0 1 1 0 1 1 0 1 1 1 1 1 0 1 1
1 1 1 0 0 0 1 1 0 1 1 0 0 0 0 0 1
1 0 0 1 1 1 1 1 0 0 0 1 1 1 1 0 1
1 0 1 0 0 1 1 1 1 1 0 1 1 1 1 0 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
39
Possible moves from maze[row][col]
40
A Maze Problem
Implementation of Move


typedef struct {
short int vert;
short int horiz;
} offsets;
offsets move[8]; // array of moves for each direction
If we are at maze[row][col] and we wish to find the position of
the next move, maze[next_row][next_col]
– next_row = row + move[dir].vert;
next_col = col + move[dir].horiz;
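One possible way to fill the move table is sketched below. The clockwise ordering starting from north is an assumption of this sketch (the slide's figure is not reproduced here), and next_position is a hypothetical helper name.

```c
typedef struct {
    short int vert;
    short int horiz;
} offsets;

/* Eight directions, clockwise from north: N, NE, E, SE, S, SW, W, NW.
 * This ordering is an assumption made for the sketch. */
static const offsets move[8] = {
    {-1,  0}, {-1,  1}, { 0,  1}, { 1,  1},
    { 1,  0}, { 1, -1}, { 0, -1}, {-1, -1}
};

/* Compute the neighbour of maze[row][col] in direction dir (0..7). */
void next_position(int row, int col, int dir, int *next_row, int *next_col)
{
    *next_row = row + move[dir].vert;
    *next_col = col + move[dir].horiz;
}
```

With this ordering, direction 1 (northeast) from [3][3] gives [2][4].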
41
A Maze Problem
Maze Traversal Algorithm


Maintain a second two-dimensional
array, mark, to record the maze
positions already checked
Use stack to keep path history
– typedef struct {
short int row;
short int col;
short int dir;
} element;
element stack[MAX_STACK_SIZE];
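The traversal can be sketched as follows. This is a simplified illustration rather than the slide's path() function: it only reports whether a path exists, the array bounds MAX_ROWS and MAX_COLS and the name path_exists are assumptions, and the maze is expected to be surrounded by a border of ones with the entrance at [1][1].

```c
#include <string.h>

#define MAX_ROWS 6
#define MAX_COLS 6
#define MAX_STACK_SIZE (MAX_ROWS * MAX_COLS)

typedef struct { short int vert; short int horiz; } offsets;
typedef struct { short int row; short int col; short int dir; } element;

/* N, NE, E, SE, S, SW, W, NW (assumed ordering) */
static const offsets move[8] = {
    {-1, 0}, {-1, 1}, {0, 1}, {1, 1}, {1, 0}, {1, -1}, {0, -1}, {-1, -1}
};

/* Returns 1 if an open path leads from [1][1] to [exit_row][exit_col]. */
int path_exists(int maze[MAX_ROWS][MAX_COLS], int exit_row, int exit_col)
{
    int mark[MAX_ROWS][MAX_COLS];       /* positions already checked */
    element stack[MAX_STACK_SIZE];      /* path history */
    int top = 0;

    if (maze[1][1] != 0 || maze[exit_row][exit_col] != 0) return 0;
    if (exit_row == 1 && exit_col == 1) return 1;
    memset(mark, 0, sizeof mark);
    mark[1][1] = 1;
    stack[0] = (element){1, 1, 0};

    while (top >= 0) {
        element pos = stack[top--];     /* back up to the last position */
        int row = pos.row, col = pos.col, dir = pos.dir;
        while (dir < 8) {
            int next_row = row + move[dir].vert;
            int next_col = col + move[dir].horiz;
            if (next_row == exit_row && next_col == exit_col)
                return 1;
            if (maze[next_row][next_col] == 0 && !mark[next_row][next_col]) {
                mark[next_row][next_col] = 1;
                /* remember where to resume if this branch fails */
                stack[++top] = (element){row, col, dir + 1};
                row = next_row; col = next_col; dir = 0;
            } else {
                dir++;
            }
        }
    }
    return 0;
}
```

Each position is marked at most once, so the stack never holds more entries than there are cells in the array.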
42
43
44
Any Questions?
45
A further question
Can we use a queue to do the maze problem?
If yes, what are the differences?
46
A Maze Problem
Analysis of path()


The worst case of computing time of
path is O(mp), where m and p are the
number of rows and columns of the
maze respectively
The choice of add() and delete()
decides the search behavior
47
List
48
List
Ordered List

Consider the following alphabetized list of three letter
English words
– bat, cat, sat, vat

If we store this list in an array
– add the word mat to this list
• move sat and vat one position to the right before we insert
mat
– remove the word cat from the list
• move sat and vat one position to the left

Problems of a sequential representation (ordered list)
– arbitrary insertion and deletion from arrays can be very
time-consuming
– waste storage
49
List
Linked Representation




An elegant solution to the ordered list problem
Items may be placed anywhere in
memory
Store the address, or location, of the
next element for accessing elements in
the correct order
Associated with each element is a node
which contains both a data component
and a pointer to the next item
50
List
Pointers in C

Two most important operators used with the pointer
type :
– & the address operator
– * the dereferencing (or indirection) operator

Example
– int i, *pi;
• i is an integer variable and pi is a pointer to an integer
– pi = &i;
• &i returns the address of i and is assigned as the value of pi
– to assign a value to i we can use
• i = 10;
• *pi = 10;
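The two operators can be seen working together in a short sketch; the function name pointer_demo and the extra arithmetic step are illustrative additions.

```c
/* Shows that i and *pi name the same object once pi = &i. */
int pointer_demo(void)
{
    int i, *pi;

    pi = &i;         /* pi now holds the address of i */
    i = 10;          /* direct assignment */
    *pi = *pi + 5;   /* indirect assignment through pi changes i as well */
    return i;        /* 15 */
}
```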
51
List
Dynamically Allocated Storage




When programming, you may not know how much
space you will need, nor do you wish to allocate some
very large area that may never be required
C provides a heap for allocating storage at runtime
You may call a function, malloc, and request
the amount of memory you need
When you no longer need an area of memory,
you may free it by calling another function, free,
and return the area of memory to the system
Dynamically Allocated Storage
Example
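A hedged sketch of the malloc/free pattern follows; the original slide's example is not available in this transcript, so the function sum_of_squares is purely illustrative. It requests an array whose size is only known at run time, uses it, and returns it to the system.

```c
#include <stdlib.h>

/* Sum 1*1 + 2*2 + ... + n*n using a heap-allocated scratch array.
 * Returns -1 if n is not positive or the allocation fails. */
long sum_of_squares(int n)
{
    if (n <= 0)
        return -1;

    int *squares = malloc((size_t)n * sizeof *squares);
    if (squares == NULL)
        return -1;                    /* malloc can fail; always check */

    long total = 0;
    for (int k = 0; k < n; k++) {
        squares[k] = (k + 1) * (k + 1);
        total += squares[k];
    }

    free(squares);                    /* give the memory back */
    return total;
}
```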
List
Singly Linked Lists

Linked lists are drawn as an ordered sequence of nodes
with links represented as arrows
– the name of the pointer to the first node in the list is the
name of the list (the list of Figure 4.1 is called ptr)
– notice that we do not explicitly put in the values of
pointers, but simply draw arrows to indicate that they
are there

54
List
Insertion





To insert the word mat between cat and sat, we must
Get a node that is currently unused; let its address be paddr
Set the data field of this node to mat
Set paddr’s link field to point to the address found in the link field
of the node containing cat
Set the link field of the node containing cat to point to paddr
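The four steps map directly onto a few lines of C. In this sketch the node layout (a 4-character data field plus a link) and the function name insert_after are assumptions; paddr is the name used in the steps above.

```c
#include <stdlib.h>
#include <string.h>

typedef struct list_node {
    char data[4];                 /* a three-letter word plus '\0' */
    struct list_node *link;
} list_node;

/* Insert a new node holding word right after prev.
 * Returns the new node, or NULL if no memory is available. */
list_node *insert_after(list_node *prev, const char *word)
{
    list_node *paddr = malloc(sizeof *paddr);   /* get an unused node */
    if (paddr == NULL)
        return NULL;
    strncpy(paddr->data, word, sizeof paddr->data - 1);   /* set data */
    paddr->data[sizeof paddr->data - 1] = '\0';
    paddr->link = prev->link;    /* new node points to the old successor */
    prev->link = paddr;          /* predecessor points to the new node */
    return paddr;
}
```

The order of the last two assignments matters: setting prev->link first would lose the address of the node containing sat.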

55
List
Deletion



Delete mat from the list
We only need to find the element that immediately
precedes mat, which is cat, and set its link field to
point to mat’s link (Figure 4.3)
We have not moved any data, and although the link
field of mat still points to sat, mat is no longer in the
list
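Deletion can be sketched the same way. The node layout and the name delete_after are assumptions; the sketch only unlinks the node and hands it back, leaving the caller to decide whether to free it or keep it for reuse.

```c
#include <stddef.h>

typedef struct list_node {
    char data[4];
    struct list_node *link;
} list_node;

/* Unlink the node that follows prev and return it (NULL if none). */
list_node *delete_after(list_node *prev)
{
    list_node *removed = prev->link;
    if (removed == NULL)
        return NULL;
    prev->link = removed->link;   /* bypass the removed node */
    return removed;               /* its own link field is left untouched */
}
```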

56
List
Implementation


We need the following capabilities to make linked
representations possible
Defining a node’s structure, that is, the fields it
contains
– self-referential structures

Create new nodes when we need them
– malloc()
– new in C++

Remove nodes that we no longer need
– free()
– delete in C++
57
List
Invert


For a list of length ≧1 nodes, the while loop
is executed length times and so the
computing time is linear or O(length)
Two extra pointers are required
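A sketch of in-place inversion with two extra pointers is given below; the int data field and the pointer names lead, middle, and trail are choices made for this illustration.

```c
#include <stddef.h>

typedef struct list_node {
    int data;
    struct list_node *link;
} list_node;

/* Reverse the list in place and return the new first node. */
list_node *invert(list_node *lead)
{
    list_node *middle = NULL;     /* head of the already-inverted part */
    list_node *trail;

    while (lead != NULL) {        /* executed once per node */
        trail = middle;
        middle = lead;
        lead = lead->link;        /* advance before the link is overwritten */
        middle->link = trail;     /* point the node back at its predecessor */
    }
    return middle;
}
```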

58
List
More about Lists

Circularly linked lists
– the link field of the last node points to
the first node in the list

Maintain an available List
– the space of freed nodes can be reused
later

Doubly linked lists
59
Any Questions?
60
A further question
How about using linked lists to implement
stacks and queues instead of arrays?
Which one is better? Give me some
advantages and disadvantages.
61
List
Stacks and Queues




When several stacks and queues coexisted, there was no
efficient way to represent them sequentially
The solution presented above to the n-stack, m-queue
problem is both computationally and conceptually simple
We no longer need to shift stacks or queues to make space
Computation can proceed as long as there is memory
available
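A linked stack along these lines can be sketched as follows; the names stack_node, push, and pop and the int item type are assumptions. Each stack is just a pointer to its top node, so any number of stacks can coexist without shifting.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct stack_node {
    int item;
    struct stack_node *link;
} stack_node;

/* Push item on the stack whose top pointer is *top. */
bool push(stack_node **top, int item)
{
    stack_node *node = malloc(sizeof *node);
    if (node == NULL)
        return false;             /* only fails when memory runs out */
    node->item = item;
    node->link = *top;
    *top = node;
    return true;
}

/* Pop the top item into *item; returns false if the stack is empty. */
bool pop(stack_node **top, int *item)
{
    stack_node *node = *top;
    if (node == NULL)
        return false;
    *item = node->item;
    *top = node->link;
    free(node);
    return true;
}
```

An empty stack is simply a NULL pointer, for example stack_node *s1 = NULL, *s2 = NULL; for two independent stacks.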
62
Longest Common Subsequence
In: two strings
Out: length of the longest common subsequence
Requirement
- dynamic programming
- time/space analyses
- using C would be the best
Bonus
- output a longest common subsequence
- output all longest common subsequences
63
Dynamic Programming


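Like divide-and-conquer, perform iterative
calculations
The main difference is that the divided
sub-problems are overlapped
(or say, dependent)
(Figure: problem P(n) is divided into sub-problems P(m1), P(m2), …, P(mk),
whose solutions S1, S2, …, Sk are combined into the solution S)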
64
Dynamic Programming
Matrix Multiplication


Given a sequence of matrices $\langle A_1, A_2, \ldots, A_n \rangle$, where
the size of $A_i$ is $p_{i-1} \times p_i$, find the multiplication order that
needs the minimum number of scalar multiplications
For example
– $A_1 A_2 A_3 A_4$ with $p_0, \ldots, p_4$ = 13, 5, 89, 3, 34
– 5 possibilities
• $(A_1(A_2(A_3A_4)))$ costs 26418
• $(A_1((A_2A_3)A_4))$ costs 4055
• $((A_1A_2)(A_3A_4))$ costs 54201
• $((A_1(A_2A_3))A_4)$ costs 2856
• $(((A_1A_2)A_3)A_4)$ costs 10582
$n$ matrices result in $\binom{2(n-1)}{n-1}/n = \Omega(4^n/n^{3/2})$ orders (5 for $n = 4$)
65
Matrix Multiplication
Observation of Sub-problems

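Let $T$ be an order for $\langle A_1, A_2, \ldots, A_n \rangle$, $T_1$ an order for
$\langle A_1, A_2, \ldots, A_k \rangle$, and $T_2$ an order for
$\langle A_{k+1}, A_{k+2}, \ldots, A_n \rangle$
– if $T$ is an optimal solution for $\langle A_1, A_2, \ldots, A_n \rangle$, then $T_1$
and $T_2$ are optimal solutions for $\langle A_1, A_2, \ldots, A_k \rangle$ and
$\langle A_{k+1}, A_{k+2}, \ldots, A_n \rangle$, respectively

Let $m[i,j]$ be the minimum number of scalar
multiplications needed to compute the product $A_i \cdots A_j$,
for $1 \le i \le j \le n$
If the optimal solution splits the product
$A_i \cdots A_j = (A_i \cdots A_k)(A_{k+1} \cdots A_j)$, for some $k$, $i \le k < j$, then
$m[i,j] = m[i,k] + m[k+1,j] + p_{i-1} p_k p_j$
– we have $m[i,j] = \min_{i \le k < j} \{ m[i,k] + m[k+1,j] + p_{i-1} p_k p_j \}$, with $m[i,i] = 0$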
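The recurrence for m[i,j] can be filled in bottom-up, shortest chains first. The sketch below assumes the dimensions are passed as an array p of n+1 integers and that 1 ≤ n ≤ MAX_N; the function name matrix_chain_cost and the bound MAX_N are choices made for this illustration, and it returns only the minimum cost, not the order itself.

```c
#include <limits.h>

#define MAX_N 16

/* p[0..n] holds the dimensions: A_i is p[i-1] x p[i].
 * Returns m[1][n], the minimum number of scalar multiplications. */
long matrix_chain_cost(const int p[], int n)
{
    long m[MAX_N + 1][MAX_N + 1];

    for (int i = 1; i <= n; i++)
        m[i][i] = 0;                        /* a single matrix costs nothing */

    for (int len = 2; len <= n; len++) {    /* chain length */
        for (int i = 1; i + len - 1 <= n; i++) {
            int j = i + len - 1;
            m[i][j] = LONG_MAX;
            for (int k = i; k < j; k++) {   /* try every split point */
                long cost = m[i][k] + m[k + 1][j]
                          + (long)p[i - 1] * p[k] * p[j];
                if (cost < m[i][j])
                    m[i][j] = cost;
            }
        }
    }
    return m[1][n];
}
```

For the dimensions 13, 5, 89, 3, 34 the result is 2856, matching the cheapest of the five orders listed earlier. The three nested loops give O(n^3) time and the table O(n^2) space.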
66
Dynamic Programming
Elements

Optimal sub-structure (a problem exhibits optimal
sub-structure if an optimal solution to the problem
contains within it optimal solutions to sub-problems)
Overlapping sub-problems
Memoization (usually by a table, i.e., a 2D array)

Procedure


– characterize the structure of an optimal solution
– derive a recursive formula for computing the values of
optimal solutions
• the relation between the problem and its sub-problems
67
Dynamic Programming
Longest Common Subsequence


Given two sequences X=<x1, x2, … ,
xm> and Y=<y1, y2, … , yn>, find a
maximum-length common subsequence
of X and Y
For example
– X is 'ABCBDAB' and Y is 'BDCABA'
– common subsequences: 'AB', 'ABA', 'BCB',
'BCAB', 'BCBA' …
– longest common subsequences:
'BCAB', 'BCBA', … (length = 4)
68
Longest Common Subsequence
The Recursive Formula


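Let $L[i,j]$ be the length of an LCS of the
prefixes $X_i = \langle x_1, x_2, \ldots, x_i \rangle$ and $Y_j = \langle y_1, y_2, \ldots, y_j \rangle$,
for $1 \le i \le m$ and $1 \le j \le n$, with $L[i,0] = L[0,j] = 0$
$L[i,j] = L[i-1,j-1] + 1$ if $x_i = y_j$
$L[i,j] = \max(L[i,j-1], L[i-1,j])$ if $x_i \ne y_j$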
      B  D  C  A  B  A
A     0  0  0  1  1  1
B     1  1  1  1  2  2
C     1  1  2  2  2  2
B     1  1  2  2  3  3
D     1  2  2  2  3  3
A     1  2  2  3  3  4
B     1  2  2  3  4  4

An LCS: BCBA
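A minimal sketch of filling the table in C follows. The name lcs_length and the length limit MAX_LEN are assumptions of this sketch; row 0 and column 0 hold the value 0 for empty prefixes, and only the length is computed.

```c
#include <string.h>

#define MAX_LEN 64

/* Length of a longest common subsequence of x and y.
 * Returns -1 if either string is longer than MAX_LEN. */
int lcs_length(const char *x, const char *y)
{
    int L[MAX_LEN + 1][MAX_LEN + 1];
    size_t m = strlen(x), n = strlen(y);

    if (m > MAX_LEN || n > MAX_LEN)
        return -1;

    for (size_t i = 0; i <= m; i++) L[i][0] = 0;   /* empty prefix of y */
    for (size_t j = 0; j <= n; j++) L[0][j] = 0;   /* empty prefix of x */

    for (size_t i = 1; i <= m; i++) {
        for (size_t j = 1; j <= n; j++) {
            if (x[i - 1] == y[j - 1])              /* x_i equals y_j */
                L[i][j] = L[i - 1][j - 1] + 1;
            else if (L[i][j - 1] >= L[i - 1][j])
                L[i][j] = L[i][j - 1];
            else
                L[i][j] = L[i - 1][j];
        }
    }
    return L[m][n];
}
```

For X = ABCBDAB and Y = BDCABA the bottom-right entry is 4, as in the table. Both time and space are O(mn).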
69