Deadline-constrained workflow scheduling algorithms for Infrastructure as a Service Clouds

Future Generation Computer Systems (FGCS), journal homepage: www.elsevier.com/locate/fgcs

Saeid Abrishami a,∗, Mahmoud Naghibzadeh a, Dick H.J. Epema b

Tai, Yu-Chang

4/29/2013

Outline

* Introduction
* Scheduling system model
* IaaS cloud partial critical paths algorithms
* An illustrative example
* Time complexity
* Performance evaluation
* Conclusions

Introduction

* Clouds are different from utility Grids:
  - on-demand resource provisioning
  - homogeneous networks
  - the pay-as-you-go pricing model
* The paper considers the benefits of using Cloud computing for executing scientific workflows
  - several commercial Clouds exist, such as Amazon

Introduction

Infrastructure as a Service (IaaS) Clouds have some potential benefits for executing scientific workflows:

1. users can dynamically obtain and release resources on demand, and are charged on a pay-as-you-go basis
2. resource provisioning
3. the illusion of unlimited resources

An important parameter is economic cost:

- faster resources are more expensive than slower ones
- there is a time-cost tradeoff in selecting appropriate services
- the problem belongs to the multi-criteria optimization problems

Goal: minimize the execution cost of the workflow, while completing the workflow before the user-specified deadline.

Proposed algorithms:

* IaaS Cloud Partial Critical Paths (IC-PCP)
* IaaS Cloud Partial Critical Paths with Deadline Distribution (IC-PCPD2)

Scheduling system model

* An application is modeled by a directed acyclic graph G(T, E)
* T is a set of n tasks {t_1, t_2, ..., t_n}
* E is a set of dependencies e_i,j = (t_i, t_j)
* Two dummy tasks, t_entry and t_exit, are added to the beginning and the end of the workflow; they have zero execution time and are connected with zero-weight dependencies to the actual entry and exit tasks
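The model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `Workflow`, its method names, and the three-task graph are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Workflow:
    """Workflow DAG G(T, E); each edge stores its data transfer weight."""
    tasks: set = field(default_factory=set)
    edges: dict = field(default_factory=dict)  # (t_i, t_j) -> weight

    def add_dependency(self, ti, tj, weight=0):
        self.tasks.update((ti, tj))
        self.edges[(ti, tj)] = weight

    def parents(self, t):
        return sorted(a for (a, b) in self.edges if b == t)

    def children(self, t):
        return sorted(b for (a, b) in self.edges if a == t)

    def add_dummy_tasks(self, entry="t_entry", exit_="t_exit"):
        # Find the real entry and exit tasks before adding any dummy edge.
        roots = [t for t in sorted(self.tasks) if not self.parents(t)]
        leaves = [t for t in sorted(self.tasks) if not self.children(t)]
        for t in roots:
            self.add_dependency(entry, t, 0)   # zero-weight dependency
        for t in leaves:
            self.add_dependency(t, exit_, 0)   # zero-weight dependency


# Hypothetical three-task workflow: t1 and t2 both feed t3.
wf = Workflow()
wf.add_dependency("t1", "t3", 2)
wf.add_dependency("t2", "t3", 1)
wf.add_dummy_tasks()
```

After `add_dummy_tasks()`, `t_entry` is the only task without parents and `t_exit` the only task without children, which lets a scheduler start from a single node.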

Scheduling system model

* Services S = {s_1, s_2, ..., s_m} have different QoS parameters, such as CPU type and memory size, and different prices
* The pricing model is pay-as-you-go, similar to current commercial Clouds: users are charged for the number of time intervals in which they have used the resource, even if they have not completely used the last time interval
* Example prices shown on the slide: c1 = 5, c2 = 2, c3 = 1
* ET(t_i, s_j): execution time of task t_i on computation service s_j
* The average bandwidth between the computation services is roughly equal
* TT(e_i,j): data transfer time of a dependency e_i,j
* MET(t_i): Minimum Execution Time of a task t_i, that is, the execution time of t_i on the service s_j ∈ S that has the minimum ET(t_i, s_j) among all available services
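A minimal sketch of the interval-based billing and of MET, assuming an interval length of 10 time units and an instance that runs from time 0 to 28. Both numbers, and the function names, are illustrative assumptions; the prices 5, 2, 1 are the c1, c2, c3 values from the slide.

```python
import math


def instance_cost(start, end, interval, price_per_interval):
    """Charge every started time interval in full (pay-as-you-go)."""
    intervals = math.ceil((end - start) / interval)
    return intervals * price_per_interval


def minimum_execution_time(exec_times):
    """MET(t_i): the smallest ET(t_i, s_j) over all services."""
    return min(exec_times.values())


prices = {"s1": 5, "s2": 2, "s3": 1}  # c1, c2, c3

# 28 time units with interval 10 means 3 started intervals are billed.
cost_on_s2 = instance_cost(0, 28, interval=10, price_per_interval=prices["s2"])

# Execution times of one task on the three services.
met = minimum_execution_time({"s1": 2, "s2": 5, "s3": 8})
```

The last two time units of the third interval are paid for but unused, which is the "extra time" a scheduler may try to fill with other tasks.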

* SS(t_i) = s_j,k: Selected Service for each scheduled task t_i, where s_j,k is the kth instance of service s_j
* AST(t_i): Actual Start Time of t_i
* Assigned node: a node that has already been assigned to (scheduled on) a service
* Critical Parent: the Critical Parent of a node t_i is the unassigned parent of t_i that has the latest data arrival time at t_i, that is, the parent t_p of t_i for which EFT(t_p) + TT(e_p,i) is maximal
* PCP: the Partial Critical Path of a node t_i is
  - empty if t_i does not have any unassigned parents
  - otherwise, the Critical Parent t_p of t_i followed by the Partial Critical Path of t_p, if t_p has any unassigned parents
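The two definitions above translate almost directly into code. The sketch below is an illustration, not the authors' implementation; the graph, the EFT values and the transfer times are hypothetical.

```python
def critical_parent(task, parents, assigned, eft, tt):
    """Unassigned parent of `task` with the latest data arrival time."""
    candidates = [p for p in parents.get(task, []) if p not in assigned]
    if not candidates:
        return None
    return max(candidates, key=lambda p: eft[p] + tt[(p, task)])


def partial_critical_path(task, parents, assigned, eft, tt):
    """PCP of `task`, listed from the earliest task to the latest."""
    path = []
    current = critical_parent(task, parents, assigned, eft, tt)
    while current is not None:
        path.insert(0, current)
        current = critical_parent(current, parents, assigned, eft, tt)
    return path


# Hypothetical workflow: a -> c, a -> d, b -> d, and c, d -> t_exit.
parents = {"a": ["t_entry"], "b": ["t_entry"], "c": ["a"],
           "d": ["a", "b"], "t_exit": ["c", "d"]}
eft = {"a": 3, "b": 5, "c": 9, "d": 8}
tt = {("a", "c"): 1, ("a", "d"): 2, ("b", "d"): 1,
      ("c", "t_exit"): 0, ("d", "t_exit"): 0}

first_pcp = partial_critical_path("t_exit", parents, {"t_entry"}, eft, tt)
# Once a and c are assigned, the next PCP of t_exit goes through d.
second_pcp = partial_critical_path(
    "t_exit", parents, {"t_entry", "a", "c"}, eft, tt)
```

Calling the function again after marking a path as assigned yields the next partial critical path, which is how the workflow is consumed path by path.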

IC-PCP

Algorithm 1

Example

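Initial state of the example workflow (deadline D = 30, time axis from 0 to 30). For each task the slide shows its execution times on s1, s2, s3 and a label of the form "a~b __ c", read here as EST~EFT and LFT:

* t1: ET = (2, 5, 8); 0~2; LFT 19
* t2: ET = (5, 12, 16); 0~5; LFT 16
* t3: ET = (3, 5, 9); 0~3; LFT 16
* t4: ET = (4, 6, 10); 3~7; LFT 24
* t5: ET = (3, 8, 11); 7~10; LFT 23
* t6: ET = (4, 8, 11); 7~11; LFT 22
* t7: ET = (5, 8, 11); 8~13; LFT 30
* t8: ET = (3, 6, 8); 14~17; LFT 30
* t9: ET = (5, 8, 14); 14~19; LFT 30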
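The slides do not show how the start and finish bounds of each task are obtained. A common way, assumed here, is a forward pass for EST and EFT and a backward pass from the deadline for LFT, using MET and TT. The two-task chain and all of its numbers are hypothetical.

```python
def compute_est_eft_lft(order, parents, children, met, tt, deadline):
    """Forward pass for EST/EFT, backward pass for LFT.

    `order` must be a topological order of the tasks.
    """
    est, eft, lft = {}, {}, {}
    for t in order:
        # A task can start once the data of every parent has arrived.
        est[t] = max((eft[p] + tt[(p, t)] for p in parents.get(t, [])),
                     default=0)
        eft[t] = est[t] + met[t]
    for t in reversed(order):
        # A task must finish early enough for every child to meet its LFT.
        lft[t] = min((lft[c] - met[c] - tt[(t, c)]
                      for c in children.get(t, [])),
                     default=deadline)
    return est, eft, lft


# Hypothetical chain a -> b with transfer time 3 and deadline 30.
est, eft, lft = compute_est_eft_lft(
    order=["a", "b"],
    parents={"b": ["a"]},
    children={"a": ["b"]},
    met={"a": 2, "b": 4},
    tt={("a", "b"): 3},
    deadline=30,
)
```

Here task a may start at 0 and must finish by 23, because b needs 3 time units of transfer plus 4 of execution before the deadline of 30.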

Example (continued): the partial critical paths are scheduled one at a time (D = 30). Values are start~finish times on the chosen instance.

* Path {t2, t6, t9} on a new instance S2,1: t2 0~12, t6 12~20, t9 20~28
* Path {t3} on a new instance S3,1: t3 0~9
* Path {t5, t8} on a new instance S2,2: t5 14~22, t8 22~28
* Path {t1, t4} on a new instance S3,2: t1 0~8, t4 8~18
* Path {t7} on a new instance S3,3: t7 18~29
* COST = 2*5 + 1*4 = 14

After each step the values of the unassigned tasks are updated. For example, after the first step t5 moves from 7~10 to 14~17 and the LFT of t3 drops from 16 to 12.

* Applicable instance: an instance is applicable for a path if it satisfies two conditions:
  - the path can be scheduled on the instance such that each task of the path finishes before its latest finish time
  - the new schedule uses (a part of) the extra time of the instance, which is the remaining time of the last time interval of that instance

[Figure: PCP placement on an instance, annotated "Cost = zero"]
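The two applicability conditions can be checked with a short function. This is a simplified sketch under several assumptions that are not in the slides: the instance started at time 0, the path is appended after the work already on the instance, and data transfer times are ignored. All names and numbers are hypothetical.

```python
import math


def is_applicable(instance_end, interval, path, exec_time, lft, ready_time=0):
    """Return True if `path` can be appended to the instance.

    instance_end: time at which the instance finishes its current work.
    ready_time:   earliest time the first task of the path may start.
    """
    # End of the last time interval that is already being paid for.
    paid_until = math.ceil(instance_end / interval) * interval
    start = max(instance_end, ready_time)
    # Condition 2: the new schedule uses part of the extra (paid) time.
    uses_extra_time = start < paid_until
    # Condition 1: every task of the path finishes before its LFT.
    clock = start
    for task in path:
        clock += exec_time[task]
        if clock > lft[task]:
            return False
    return uses_extra_time


# Instance busy until 14 with interval 10: paid until 20, so 6 units are free.
fits = is_applicable(14, 10, ["x"], {"x": 5}, {"x": 23})
no_extra_time = is_applicable(20, 10, ["x"], {"x": 5}, {"x": 30})
misses_lft = is_applicable(14, 10, ["x"], {"x": 5}, {"x": 18})
```

A full implementation would also consider inserting the path before or between the tasks already on the instance; the sketch only covers appending.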

IC-PCPD2

Algorithm 2

* Assign a subdeadline to each node on a PCP (the node is then marked as assigned)
* Call PLANNING(G(T,E))
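The slides do not state the rule used to split a deadline along a path. The sketch below assumes a proportional rule with rounding down: each task's EFT is scaled so that the last task of the path ends exactly at its LFT. With the EST, EFT and LFT values of the example workflow, this assumption reproduces the subdeadlines shown on the slides for the paths {t2, t6, t9} and {t1, t4}. The function name is hypothetical.

```python
def distribute_deadline(path, est_first, eft, lft_last):
    """Assign a subdeadline to every task of `path` (proportional rule)."""
    span = eft[path[-1]] - est_first   # unscaled duration of the path
    psd = lft_last - est_first         # time available to the whole path
    return {t: est_first + (eft[t] - est_first) * psd // span for t in path}


# Path {t2, t6, t9}: EFTs 5, 11, 19 and LFT(t9) = 30.
sub_a = distribute_deadline(["t2", "t6", "t9"], 0,
                            {"t2": 5, "t6": 11, "t9": 19}, 30)

# Path {t1, t4}: EFTs 2, 7 and LFT(t4) = 24.
sub_b = distribute_deadline(["t1", "t4"], 0, {"t1": 2, "t4": 7}, 24)
```

Integer division keeps the subdeadlines on whole time units, which matches the values 7, 17, 30 and 6, 24 in the example.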

Example (IC-PCPD2, D = 30)

Subdeadlines after deadline distribution (sb = 0 for t_entry):

* t1: 6
* t2: 7
* t3: 16
* t4: 24
* t5: 13
* t6: 17
* t7: 30
* t8: 30
* t9: 30

The tasks are then scheduled one at a time, in the order t_entry, t1, ..., t9. Start~finish times and instances shown on the slides:

* t1: 0~5 on S2,1
* t2: 0~5 on S1,1
* t3: 0~9 on S3,1
* t4: 6~16 on S3,2
* t5: 7~10 on S1,1
* t6: 11~15 on S1,2
* t7: 16~28
* t8: 17~25
* t9: 18~26

[Figure: final Gantt chart with instances S1,1, S1,2, S2,1, S2,2, S3,1, S3,2, S3,3]

COST = 5*2 + 2*2 + 1*4 = 18

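Time complexity

Here n is the number of tasks, e the number of dependencies and m the number of services.

IC-PCP (step costs annotated on Algorithm 1):

* O(n+e) ~ O(n^2)
* O(n), O(n-1), O(n^2)
* O(m*n) = O(n^2)
* Overall: IC-PCP = O(n^2)

IC-PCPD2 (step costs annotated on Algorithm 2):

* Assign subdeadline on PCP node: O(n)
* Call PLANNING(G(T,E)): O(n^2)
* Overall: IC-PCPD2 = O(n^2)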
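The step from O(n+e) to O(n^2) is not spelled out on the slide. It follows from the standard bound on the number of edges in a directed acyclic graph with n nodes, written out here as a short derivation:

```latex
% A DAG on n nodes has at most one edge per unordered pair of nodes.
e \le \binom{n}{2} = \frac{n(n-1)}{2}
\quad\Longrightarrow\quad
O(n + e) \subseteq O\!\left(n + \frac{n(n-1)}{2}\right) = O(n^2)
```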

Performance evaluation

* Compared algorithms: Algo1 IC-PCP, Algo2 IC-PCPD2, Algo3 IC-LOSS
* Fastest schedule: each workflow task is scheduled on a distinct instance of the fastest computation service, while all data transmission times are considered to be zero
* MF = makespan of the Fastest schedule
* Deadline factor α: the deadline is set to α · MF. Since the problem has no solution for α = 1, α ranges from 1.5 to 5 in the experiments, with a step length of 0.5
* Cheapest schedule: all workflow tasks are scheduled on a single instance of the cheapest computation service; it is used to normalize the total cost of each workflow execution
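The deadline sweep can be written down as follows. The value MF = 20 is a made-up makespan, and dividing by the cost of the Cheapest schedule is an assumption about how the normalization is done; the slide only says that the total cost is normalized.

```python
def deadlines(mf, start=1.5, stop=5.0, step=0.5):
    """Deadlines alpha * MF for alpha = start, start + step, ..., stop."""
    count = int(round((stop - start) / step)) + 1
    return [(start + i * step) * mf for i in range(count)]


def normalized_cost(schedule_cost, cheapest_cost):
    """Cost of a schedule relative to the Cheapest schedule (assumed rule)."""
    return schedule_cost / cheapest_cost


sweep = deadlines(20)            # hypothetical MF = 20
ratio = normalized_cost(14, 7)   # hypothetical costs
```

With these settings the sweep contains eight deadlines, one for each α in 1.5, 2.0, ..., 5.0.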

Performance evaluation

Rankings annotated on the result plots, in slide order (1 = IC-PCP, 2 = IC-PCPD2, 3 = IC-LOSS):

* 1 > 2 > 3; 1 > 2 > 3; 1 ≈ 2 > 3; 1 > 2 > 3; 2 > 1 > 3
* 1 > 2 > 3; 1 > 2 > 3; 1 ≈ 2 > 3; 1 > 2 > 3; 2 > 1 > 3

Conclusions

* The new algorithms consider the main features of current commercial Clouds, such as on-demand resource provisioning, homogeneous networks, and the pay-as-you-go pricing model
* The time complexity of both algorithms is O(n^2); this polynomial time complexity makes them suitable options for large workflows
* IC-PCP outperforms both IC-PCPD2 and IC-LOSS in most cases
* Experiments show that the computation times of the algorithms are very low, less than 500 ms for the large workflows
* Future work: the authors intend to improve the algorithms for real Cloud environments