Monitoring Data Dependencies to Support
Recovery in Concurrent Process Execution*
Susan D. Urban
Department of Computer Science
February 6, 2009
*This research is partially supported by NSF Grant No. CCF-0820152.
The Challenge of Concurrent Execution in a
Service-Oriented Environment
Serializability
• The concurrent execution of two or more transactions must be equivalent to the serial execution of those transactions
• Two-phase locking and two-phase commit support serializability in controlled distributed environments
Isolation
• Data changes should not be released before the commit of a transaction
• Lack of isolation leads to cascaded rollbacks when transaction failure occurs.
  • Transaction A fails and performs rollback
  • If transaction B reads modified data from transaction A, transaction B must also roll back
The problem: Serializability and isolation are not generally applicable to long-running workflow or process scenarios composed of distributed, autonomous services.
• Compensation can be used to logically undo a process
• Compensation does not account for the effect of the failure and recovery process on concurrently executing processes
Concurrent Process Execution Scenario
[Figure: Process1 invokes service operations 1 through m and Process2 invokes service operations 2, 4, 5, and onward through n. The operations are hosted by Service Provider1, Service Provider2, and Service Provider3, and the two processes share service operations 2, 4, and 5.]
Scenario
• Process1 fails at service operation5
• Compensation can be executed to restore Process1
• Process2 may be operating with incorrect data
Research Challenges
Harnessing Moore's Law, by Mark Hill
"Our Success in hiding computers when they work brings with it a responsibility to hide them when they fail. Imagine Web Services as available as telephones ….we will have to design systems assuming that they will fail…. we should seek to ensure that all systems mask almost all of those failures from users."
From Computer Science: Reflections on the Field, Reflections from the Field, National Research Council of the National Academies, 2004.
• Can we capture and share data changes and data dependencies among concurrently executing processes that invoke Grid/Web Services?
• Can we provide a more intelligent way to dynamically analyze the relationships that exist between concurrently executing processes?
• Can we determine how the recovery of one process can affect other concurrently executing processes based on application semantics?
Overview of Presentation
• Related Work
• The DeltaGrid Approach
  • Overview of the Approach
  • Delta-Enabled Grid Services (DEGS)
  • Process Dependency Model
  • Service Composition and Recovery Model
  • Process Interference Rules and Recovery Algorithm
  • Implementation, Simulation, and Performance Evaluation
• DeltaGrid Research Contributions
• Current Directions (NSF Grant No. CCF-0820152)
  • The D3 Project: Decentralized Data Dependency Analysis and Recovery for Concurrent Processes
THE REACTIVE BEHAVIOR AND DATA
MANAGEMENT RESEARCH TEAM
Past Members from Arizona State University
• Luther Blake (M.S.), The Design and Implementation of Delta-Enabled Grid Services, 2006.
• Yang Xiao (Ph.D.), Using Deltas to Analyze Data Dependencies and Semantic Correctness in the Recovery of Concurrent Processes, 2006.
• Vidya Gopalan (M.S.), Simulation and Evaluation of an Object-Oriented Condition Evaluator for Process Interference Rules, 2008.
Current Team from Texas Tech University
• Ziao Liu, M.S. student, Decentralized Data Dependency Analysis for Concurrent Process Execution (in progress)
• Le Gao, Ph.D. student (in progress)
• Andrew Courter, B.S./M.S. student (in progress)
http://reactive.cs.ttu.edu
Related Work: Transactions and Workflows
Transactional Workflow
• The ConTract Model (compensation, pre-/post-condition) (Wachter and Reuter 1992)
• METEOR (pre-defined hierarchical error model) (Worah 1997)
• CREW (explicitly specify data dependency) (Kamath and Ramamritham 1998)
• WAMO (automatic exception handling for workflow execution) (Eder and Liebhart 1995)
Exception handling in service composition environments
• Transaction protocols: WS-Transaction (Cabrera et al. 2002)
• Transactional Attitude (Mikalsen, Tai, and Rouvellou 2002)
• Web Service Composition Action (contingency) (Tartanoglu et al. 2003)
• BPEL4WS (Andrews et al. 2003)
• BPML (Arkin 2002)
Our Research
• Supports relaxed isolation and user-defined semantic correctness
• Rule-based approach to resolving failure and recovery impact on concurrent processes.
• Dynamically analyzes data dependencies from streaming database log files.
The DeltaGrid Approach
Overview of the Approach
The DeltaGrid Approach
A semantically-robust execution environment for processes
that execute over distributed, autonomous services
[Figure: DeltaGrid architecture. Components: Delta-Enabled Grid Services, DeltaGrid Event Processor, Process History Capture System, Failure Recovery System, Process Execution Engine, Rule Processor, and Metadata Manager (Process Metadata, Rule Metadata). Labeled interactions: invoke services; deltas; application exceptions and system failure recovery events; use analysis interface; query history and write process info; rule-based failure recovery; read process script; execute rules. The legend distinguishes one-way from two-way interaction between system components.]
The DeltaGrid Abstract Execution Model
[Figure: layered view of the model]
• Service Composition and Recovery Model: Composition Structure, Execution Semantics, Recovery Algorithms
• Process Interference Rules: Rule Specification, Triggering Procedure
• Global Execution History Interface (read/write access to the Global Execution History; read dependency and write dependency)
• Process Dependency Model
The DeltaGrid Approach
Delta-Enabled Grid Services
Delta-Enabled Grid Services
[Figure: a client application invokes a service operation on a Delta-Enabled Grid Service, which invokes a DML activity through OGSA-DAI to execute a DML statement on the source database. Deltas propagate from the source database to the delta repository and reach the Delta Event Processor either by delta notification (push mode) or by query (pull mode).]
• Delta: an incremental change in a data element
• Captures data changes using either
  • Triggers
  • Oracle Streams
• Sends deltas back to the delta event processor in either a push or pull mode using XML
• Provides a way to externalize the DB log file as a stream of data change events
Triggers vs. Streams
Triggers
• Tightly coupled to update transaction
• Doubles time for update
• Push of deltas is not automatic
• Easy to use but inflexible
Oracle Streams
• Decoupled from update transaction
• Offload delta repository to limit effect on updates
• Automatic streaming to multiple destinations
• Complex but versatile
Expanding Investigation to DB2 and SQL Server
S. Urban, Y. Xiao, L. Blake, and S. Dietrich, Monitoring Data Dependencies in Concurrent Process Execution Through Delta-Enabled Grid Services, to appear in International Journal of Web and Grid Services, 2009.
Use of Object Deltas
[Figure: two timelines in which processes p1 and p2 execute operations op11, op12, op21, and op22. The operations produce object deltas x1, x2, x3 on object X (initial value x0) and y1, y2 on object Y (initial value y0).]
• Dynamically analyze data dependencies in concurrent process execution to identify process interference when failures occur.
• Delta-Enabled Rollback (DE-rollback) can be used if recoverability conditions are satisfied.
The DeltaGrid Approach
Process Dependency Model
Write/Potential Read Dependency
Write Dependency
• Process-level: A write dependency exists if a process pi writes a data item x that has been written by another process pj before pj completes (i≠j).
• Operation-level
• Write dependency set
Potential Read Dependency
• Process-level: A read dependency exists if a process pi reads a data item x that has been written by another process pj before pj completes (i≠j).
• Operation-level
• Potential read dependency set
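The process-level write dependency definition above can be sketched in Python. This is a minimal illustration under stated assumptions, not the DeltaGrid implementation: the Delta fields, the end_time map, and the function name write_dependencies are hypothetical, and a process with no recorded end time is assumed to be still active. Reads do not appear as deltas, so potential read dependencies are not covered by this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delta:
    # Hypothetical, simplified delta record (names are assumptions).
    oid: str      # identifier of the modified object
    attr: str     # modified attribute
    ts: int       # timestamp of the change
    process: str  # process whose operation produced the change


def write_dependencies(deltas, end_time):
    """Return the set of pairs (pi, pj) such that pi wrote a data item
    that pj had already written, while pj had not yet completed."""
    deps = set()
    writers = {}  # (oid, attr) -> processes that have written the item so far
    for d in sorted(deltas, key=lambda d: d.ts):
        key = (d.oid, d.attr)
        for pj in writers.get(key, []):
            # A missing end time means pj is assumed to be still running.
            if pj != d.process and d.ts < end_time.get(pj, float("inf")):
                deps.add((d.process, pj))
        item_writers = writers.setdefault(key, [])
        if d.process not in item_writers:
            item_writers.append(d.process)
    return deps
```

Calling write_dependencies with the merged deltas of all services and a map of process completion times yields the process-level write dependency pairs; an operation-level variant would key the result on operations instead of processes.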
Global Execution History
[Figure: the deltas in the local execution histories of DEGS1 through DEGSn, each ordered by time, are merged into a single time-ordered list of deltas in the global execution history. Write dependencies and potential read dependencies are derived from the global execution history.]
Execution context
• operation1 (input, output, state, degsID, tss, tse) ... operationn (input, output, state, degsID, tss, tse)
• process1 (input, output, state, tss, tse) ... processm (input, output, state, tss, tse)
Y. Xiao and S. Urban, Process Dependencies and Process Interference Rules for Analyzing the Impact of Failure in a Service Composition Environment, Journal of Information Science and Technology, 2008. Special issue from 10th International Conference on Business Information Systems, Poznan, Poland, 2007.
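Process Execution Scenario
[Figure: timeline from tss to tse with timestamps ts1 to ts6. Process p1 executes operations op11, op12, op13, and op14; process p2 executes op21 and op22. The system invocation event sequence drives DEGS1 and DEGS2, producing deltas x1 to x4 on object X (initial value x0), y1 on object Y (initial value y0), and z1, z2 on object Z (initial value z0). The local execution histories of DEGS1 and DEGS2 combine into the global execution history.]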
The DeltaGrid Approach
Service Composition and Recovery Model
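Service Composition Structure
Execution Entities:
• Operation
• Compensation
• Contingency
• Atomic Group
• Composite Group
• Process
[Figure: UML class diagram of the abstract composition structure. A Process corresponds to one top-level Composite Group. A Composite Group contains any number (*) of Composite Groups and Atomic Groups. An Atomic Group has exactly one Operation and optionally (0..1) a Compensation and a Contingency. A Composite Group may also have an optional (0..1) Compensation and Contingency.]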
Abstract Process Definition Example
[Figure: nested structure of process p1 = cg1, built from composite groups cg11 and cg12 and atomic groups ag111, ag112, ag113, ag121, ag122, and ag13. Operations op11 to op16 appear with compensations cop11, cop12, cop15, cop16 and contingencies top11, top13, top16; op14 in ag121 is marked non-critical. Group-level entries: cg12.top, cg11.cop, cg11.top, cg1.cop, cg1.top.]
Atomic Group
• Compensation
• Contingency
Composite Group
• Deep/Shallow compensation
• Contingency
Supports DE-Rollback
Provides state diagrams and algorithms for recovery semantics of the service composition model (single and concurrent execution cases)
Yang Xiao and Susan D. Urban, The DeltaGrid Service Composition and Recovery Model, to appear in International Journal of Web Services Research, 2009.
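The composition structure above can be sketched as nested Python data classes. This is an illustrative sketch only: the class and function names are hypothetical, every group is assumed to have executed, and the reading of deep compensation as "compensate each nested group in reverse order" versus shallow compensation as "invoke the composite group's own compensation" is an assumption, since the slides name the two modes without defining them. The placement of ag111, ag112, and ag113 inside cg11 in the usage note is likewise inferred from the group names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class AtomicGroup:
    name: str
    op: str                    # the single operation of the group
    cop: Optional[str] = None  # optional compensation (0..1)
    top: Optional[str] = None  # optional contingency (0..1)


@dataclass
class CompositeGroup:
    name: str
    members: List[Union["CompositeGroup", AtomicGroup]] = field(default_factory=list)
    cop: Optional[str] = None  # optional group-level compensation
    top: Optional[str] = None  # optional group-level contingency


def shallow_compensate(group: CompositeGroup) -> List[str]:
    """Assumed semantics: use only the composite group's own compensation."""
    return [group.cop] if group.cop else []


def deep_compensate(group: Union[CompositeGroup, AtomicGroup]) -> List[str]:
    """Assumed semantics: compensate nested groups in reverse order.
    Atomic groups without a compensation are skipped in this sketch."""
    if isinstance(group, AtomicGroup):
        return [group.cop] if group.cop else []
    plan: List[str] = []
    for member in reversed(group.members):
        plan.extend(deep_compensate(member))
    return plan
```

For a composite group cg11 holding ag111 (op11, cop11, top11), ag112 (op12, cop12), and ag113 (op13, top13), with group-level compensation cg11.cop, shallow_compensate returns the group-level compensation while deep_compensate returns cop12 followed by cop11.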
Example: Process Interference Caused by Write Dependency
[Figure: timeline (ts1 to ts9) of three concurrent processes. Pc1 = placeClientOrder executes receiveClientOrder, checkCredit, checkInventory, chargeCreditCard, decInventory, and packOrder. Pr = replenishInventory executes verifyVO and incInventory, and is write dependent on Pc1. Pc2 = placeClientOrder executes receiveClientOrder, checkCredit, checkInventory, chargeCreditCard, decInventory, and packBackOrder, and is write dependent on Pc1 and Pr. Compensating operations shown: cop:unpackBackOrder, cop:incInventory, cop:decInventory. On DEGS1, InventoryItem (I0) receives deltas I1 to I6. On DEGS2, ClientOrder (CA0) receives deltas CA1 and CA2, and ClientOrder (CB0) receives delta CB1.]
The DeltaGrid Approach
Process Interference Rules and
Recovery Algorithm
PIR Specification
  create rule   ruleName
  event         failureRecoveryEvent
  define        [viewName as <OQL expression>]
  condition     [when condition]
  action        recovery commands
• event: <processName>ReadDependency(pf, rdp) or <processName>WriteDependency(pf, wdp)
• define: query over the global execution history interface
• condition: determine if process interference exists
• action: deepCompensate/re-execute process, or post-commitRecover/re-execute operation
Process Interference Rule Example
Compensation of replenishInventory removed inventory items needed in placeClientOrder
  Create rule   inventoryDecrease
  Event         placeClientOrderWriteDependency(failedProcess, wdProcess)
  Define        decreasedItems as
                  select fd.oId
                  from fd in failedProcess.getDeltasByRecovery("InventoryItem", "quantity")
                  group by fd.oId
                  having sum(fd.newValue - fd.oldValue) < 0
  Condition     when exists decItem in decreasedItems:
                  decItem in
                    (select d
                     from d in wdProcess.getDeltas("InventoryItem", "quantity"))
  Action        deepCompensate(wdProcess);
• The event is triggered after failure recovery of failedProcess
• The define clause queries deltas using the object model
• The condition uses application semantics to determine if process interference exists
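The condition of the inventoryDecrease rule can be mirrored in a few lines of Python. This is a sketch of the rule's logic, not of the DeltaGrid rule processor: the Delta record and both function names are hypothetical, and the membership test is assumed to compare object identifiers, since the rule checks whether a decreased item also appears among the deltas of the write dependent process.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Delta:
    # Hypothetical, simplified delta on InventoryItem.quantity.
    oid: str
    old_value: int
    new_value: int


def decreased_items(recovery_deltas):
    """Object ids whose net quantity change during recovery is negative
    (the 'group by fd.oId having sum(newValue - oldValue) < 0' view)."""
    net = defaultdict(int)
    for d in recovery_deltas:
        net[d.oid] += d.new_value - d.old_value
    return {oid for oid, change in net.items() if change < 0}


def interference_exists(recovery_deltas, wd_process_deltas):
    """True if the write dependent process touched an item that the
    recovery of the failed process decreased."""
    touched = {d.oid for d in wd_process_deltas}
    return bool(decreased_items(recovery_deltas) & touched)
```

When interference_exists returns True, the rule's action (deepCompensate on the write dependent process) would be issued; otherwise the dependent process continues unchanged.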
Concurrent Process Recovery
• Execution queue holding active processes
• Generate recovery commands for the failed process p1
• Generate process dependency graph (PDG) for p1
• Dependent processes are temporarily suspended to evaluate PIRs.
• Breadth-first traversal for PDG and PIR evaluation
[Figure: process dependency graph rooted at P1 with nodes P2 to P9. The legend marks a process that depends on multiple processes and a process whose PIR evaluated to false.]
Results show the correctness of the PDG formation, the traversal process, use of DE-rollback, and the PIR evaluation process
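The breadth-first traversal of the process dependency graph with PIR evaluation can be sketched as follows. This is a simplified illustration under stated assumptions: the graph is given as a plain dictionary, pir_holds is a hypothetical callback standing in for rule evaluation, and suspension of dependent processes and generation of recovery commands are left out.

```python
from collections import deque


def cascaded_recovery(failed, dependents, pir_holds):
    """Return the processes to recover, in breadth-first order.

    dependents: dict mapping a process to the processes that depend on it.
    pir_holds(p, q): True if the recovery of p interferes with dependent q.
    """
    to_recover = [failed]
    scheduled = {failed}
    queue = deque([failed])
    while queue:
        p = queue.popleft()
        for q in dependents.get(p, []):
            if q in scheduled:
                continue  # already scheduled; this also breaks cycles
            if pir_holds(p, q):
                scheduled.add(q)
                to_recover.append(q)
                queue.append(q)
            # If the rule is false, q is left alone for now. It can still be
            # scheduled later through another process it depends on.
    return to_recover
```

A process that depends on several recovered processes is evaluated once per dependency edge, so a false result for one edge does not prevent recovery triggered through another edge.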
Cascaded Process Recovery Example
[Figure: two copies of the process dependency graph over P1 to P9, marking the processes for which recovery is needed and those for which recovery is not needed.]
Special Cases to Consider
[Figure: example dependency graphs over processes P1 to P5.]
• Handles cyclic dependencies
• Guarantees that updates are not lost in the recovery process.
  • Compensation has higher priority than DE-rollback
  • DE-rollback is only performed if no write dependencies exist.
• Two failed processes p1 and p2 can have a common dependent process p3.
  • Recovery of failed processes p1 and p2 is ordered by timestamps
  • If p3 is recovered with p1, p3 does not appear in the dependency graph of p2, but dependencies introduced by the recovery of p3 are considered in determining DE-rollback applicability in the recovery of p2
The DeltaGrid Approach
Implementation, Simulation, and
Performance Evaluation
Process History Capture System (PHCS) and Process Recovery System (PRS)
[Figure: architecture of the Process History Capture System.]
• External components: Delta-Enabled Grid Service, DeltaGrid Event Processor, Process Execution Engine, Failure Recovery System
• Service Layer: Delta Receiver, Parser, Process History Analyzer
• Data Access Layer: GlobalScheduleAccess, DeltaAccess, ProcessInfoAccess
• Data Storage Layer (OODB): Global Delta Object Schedule, Delta Repository, Process Runtime Info
• Data flows: deltas arrive as XML files and are parsed into Java objects; the Process Execution Engine writes the process execution context; the Failure Recovery System queries the process history
Simulation and Evaluation Framework
• DEVSJAVA (B. Zeigler & H. Sarjoughian)
• Implemented PHCS and PRS
• Simulated DEGS and Execution Engine
Evaluation Setup for WD Retrieval
• Vary number of concurrent processes (10~100, 100~1000)
• Vary an operation's distribution over objects (100 objects, 1000 objects)
Evaluation Result Analysis
• An operation's distribution over objects does not matter
• Exponential increase without optimization
• Linear increase with optimization based on segmenting the global schedule
• Advocates a distributed PHCS
[Figure: Write Dependency Retrieval Time (n: 10~100). X axis: number of concurrent processes (10 to 100). Y axis: processing time in milliseconds (0 to 500). Series: 100 objects, 1000 objects.]
[Figure: Write Dependency Retrieval Time (n: 100~1000). X axis: number of concurrent processes (100 to 1000). Y axis: processing time in milliseconds (0 to 120000). Series: 100 objects, 1000 objects, segment.]
Other Evaluation Results
Evaluation setup for Recovery Algorithm
• Vary number of concurrent processes (10~100, 100~1000)
• Vary process nesting level (1-5)
Evaluation result and analysis
• Linear increase when the number of concurrent processes grows
  • Delta parsing/storage time (increases faster than global schedule)
  • Global schedule construction time
  • Operation-level read dependency retrieval time
• Exponential increase in PDG construction time with high process density
• Constant cascaded recovery processing time
• Advocates distributed PHCS
  • Large number of concurrent deltas
  • High process dependency density
Improved delta object model interface performance through the use of the SODA (Simple Object Data Access) interface.
The DeltaGrid Approach
Research Contributions
DeltaGrid Research Contributions
Defined the functionality required for the capture and use
of incremental changes to autonomous data sources in
a distributed Grid Service environment.
Designed a flexible approach to recovery of service
execution failure, providing multi-level protection and
maximizing forward recovery
Defined algorithms for analysis of data dependencies
among concurrently executing processes based on
deltas collected from distributed sites
Designed a rule-based approach for process interference
handling based on application semantics
Designed, implemented, and evaluated the DeltaGrid simulation framework
The DeltaGrid Approach
Current Directions: The Decentralized Data
Dependency (D3) Analysis Project
The D3 Project
NSF Grant No. CCF-0820152 (Software for Real-World Systems Program)
A Decentralized and Rule-Based Approach to Data Dependency Analysis and Failure Recovery in a Service-Oriented Environment
Objective: To enhance service-oriented environments with theories and methods that support dynamic, flexible, and user-defined approaches to the recovery of failed processes that execute in a loosely-coupled environment without isolation guarantees.
Builds on and integrates three main concepts:
• The DEGS capability of externalizing database log files.
• Decentralized, peer-to-peer techniques for sharing and merging log files.
• Event and rule-driven techniques for dynamic process recovery and exception handling.
Decentralized Process Execution Units
Deltas are stored locally
for services that execute
at the PEXA site.
A decentralized community
of PEXAs, each controlling
the execution of multiple
processes.
PEXAs communicate in a decentralized manner
to dynamically discover data dependencies and
to support event and rule driven recovery among
concurrent processes.
Research Challenges
• Decentralized data dependency analysis
  • Representation, communication, correctness, performance
• Dynamic aspects of service composition
  • Event-driven service composition
  • Refinement of process interference rules
  • Introduce application exception events and rules
• Correctness of execution and recovery with respect to intended user semantics.
  • Using formal methods to express execution and recovery correctness in a dynamic, decentralized, concurrent execution environment.
• Decentralized algorithms for data dependency analysis, rule execution, and recovery procedures.
Questions?
S. D. Urban, Y. Xiao, L. Blake, and S. Dietrich, Monitoring Data Dependencies in Concurrent Process Execution Through Delta-Enabled Grid Services, to appear in International Journal of Web and Grid Services, 2009.
Y. Xiao and S. D. Urban, The DeltaGrid Service Composition and Recovery Model, to appear in International Journal of Web Services Research, 2009.
Y. Xiao and S. D. Urban, Process Dependencies and Process Interference Rules for Analyzing the Impact of Failure in a Service Composition Environment, Journal of Information Science and Technology, 2008.
Y. Xiao and S. D. Urban, Using Data Dependencies to Support the Recovery of Concurrent Processes in a Service Composition Environment, Proceedings of the Cooperative Information Systems Conference (COOPIS), Monterrey, Mexico, November 2008.
Y. Xiao and S. D. Urban, Process Dependencies and Process Interference Rules for Analyzing the Impact of Failure in a Service Composition Environment, Proceedings of the 10th International Conference on Business Information Systems, Poznan, Poland, April 2007, pp. 67-81.
Y. Xiao, S. D. Urban, and N. Liao, The DeltaGrid Abstract Execution Model: Service Composition and Process Interference Handling, Proceedings of the 25th Int. Conference on Conceptual Modeling, Tucson, Arizona, 2006, pp. 40-53.
Y. Xiao, S. D. Urban, and S. W. Dietrich, A Process History Capture System for Analysis of Data Dependencies in Concurrent Process Execution, Proceedings of the 2nd Int. Workshop on Data Engineering Issues in E-Commerce and Services, San Francisco, California, 2006, pp. 152-166.
H. Ma, S. D. Urban, Y. Xiao, and S. W. Dietrich, GridPML: A Process Modeling Language and Process History Capture System for Grid Service Composition, Proceedings of IEEE Int. Conference on e-Business Engineering, Beijing, China, 2005, pp. 433-440.
Global Execution History
Delta: an incremental change in a data value.
• Δ(oID, a, Vold, Vnew, tsn, opij)
DEGS Local Execution History
• lh(degsID) = <tss, tse, δ(degsID)>
• δ(degsID) = [Δ(oIDA, a, Vold, Vnew, tsx, opij) | opij.degsID = degsID and tss ≤ tsx ≤ tse]
  ([] indicates a list of elements ordered by timestamp)
Execution Context
• Operation execution context ec(opij) = <tss, tse, Input, Output, State>
• Process execution context ec(pi) = <tss, tse, Input, Output, State>
• Global execution context gec = [ec(entity) | (entity = opij or entity = pi) and (tss ≤ ec(entity).tss < ec(entity).tse ≤ tse)]
Global execution history
• gh = <tss, tse, δg, gec>
• δg = [Δ(oIDA, a, Vold, Vnew, tsx, opij) | tss ≤ tsx ≤ tse]
System Invocation Event Sequence
• Eseq = [e_entity | entity = opij or entity = pi]
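The relationship between the local delta lists δ(degsID) and the global delta list δg can be illustrated by a timestamp-ordered merge. This is a sketch under stated assumptions: the Delta tuple mirrors Δ(oID, a, Vold, Vnew, ts, op) with hypothetical field names, each local history is assumed to be already sorted by timestamp, and global_delta_list is a made-up helper name rather than part of the PHCS interface.

```python
import heapq
from typing import NamedTuple


class Delta(NamedTuple):
    # Mirrors Δ(oID, a, Vold, Vnew, ts, op); field names are assumptions.
    oid: str
    attr: str
    v_old: object
    v_new: object
    ts: int
    op: str


def global_delta_list(local_histories, tss, tse):
    """Merge per-DEGS delta lists (each sorted by ts) into one list ordered
    by timestamp, keeping only deltas with tss <= ts <= tse."""
    merged = heapq.merge(*local_histories, key=lambda d: d.ts)
    return [d for d in merged if tss <= d.ts <= tse]
```

Because heapq.merge consumes already sorted inputs lazily, the merge does not need to re-sort the combined deltas, which matches the idea of combining time-ordered local histories into one global history.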
A Process Definition Example
Process placeClientOrder (p1 = cg1)
• ag11: receiveClientOrder (cop:chgOrderStatus)
• ag12: checkCredit
• good credit?
  • no: ag14: rejectClientOrder
  • yes: ag13: checkInventory
• sufficient inventory items?
  • yes: cg15
    • ag151: chargeCreditcard (cop:creditBack, top:eCheckPay)
    • ag152: decInventory (cop:incInventory)
  • no: cg16
    • ag161: chargeCreditcard (cop:creditBack, top:eCheckPay)
    • ag162: addBackorder (cop:rmvBackorder)
• ag17: packOrder (cop:unpackOrder)
• ag18: upsShipOrder (cop:upsShipback, top:fedexShipOrder)
Atomic Group
• Compensation
• Contingency
Composite Group
• Deep/Shallow compensation
• Contingency
Delta-Enabled Rollback
State diagrams and algorithms for defining recovery semantics of the service composition model (single and concurrent execution cases)
The Global Delta Object Schedule
[Figure: index structure and instance view of the data storage layer. The OODB holds the Delta Repository and Process Runtime Info.]
Operation Index
  OperationIndex   processId   operationId
  oIndex1          p1          op1
  oIndex2          p1          op2
  oIndex3          p2          op3
  ...
  oIndexN          px          opy
Time-sequence Index (ordered by time)
  TimeSequenceIndex   processId   operationId   Timestamp   seqNum
  tsIndex1            p1          op1           ts1         1
  tsIndex2            p2          op1           ts2         1
  tsIndex3            p1          op1           ts3         1
  ...
  tsIndexN            p5          op2           tsN         1
Node
  Node    className   ObjectId   propertyName
  node1   classA      Object1    property1
  node2   classB      Object1    property2
  node3   classA      Object1    property1
  ...
  nodeN   classC      Object3    property2
The Global Execution History Interface Supported by the PHCS
[Figure: layered view of the interface.]
• Data sources: DEGS, Process Execution Engine
• Data access: GlobalScheduleAccess, DeltaAccess, ProcessInfoAccess
• Data storage: Global Delta Object Schedule, Delta repository, Process runtime info repository
• Global Execution History Object Model: Process (1 to * Operation), Operation (1 to * Delta), used by the Process History Analyzer
• Stored classes: DataChange, Delta, DeltaProperty, DeltaValue, PropertyValue, ProcessInfo (1 to * OperationInfo)
Global Execution History Object Model
[Figure: UML class diagram with classes Process, Operation, and Delta.]
Process
• Attributes: pID, pName, startTime, endTime, state
Operation
• Attributes: opID, opName, startTime, endTime, state, degsID
Delta
• Attributes: oID, className, attrName, oldValue, newValue, dataType, timestamp
Methods (Process and Operation)
• getOperation(in opName), getCurrentOperation()
• getDeltas(), getDeltas(in className), getDeltas(in className, in attrName)
• getDeltasBeforeRecovery(), getDeltasBeforeRecovery(in className), getDeltasBeforeRecovery(in className, in attrName)
• getMostRecentDeltaBeforeRecovery(in className, in attrName)
• getDeltasByRecovery(), getDeltasByRecovery(in className), getDeltasByRecovery(in className, in attrName)
• getMostRecentDeltaByRecovery(in className, in attrName)
Associations
• Process to Process: wdProcessForP, rdProcessForP (1 to *)
• Process to Operation: getOperations, getProcess (1 to *)
• Operation to Operation: wdOperations, rdOperations (1 to *)
• Operation to Process: wdProcessesForOP, rdProcessesForOP
• Operation to Delta: getDeltas, getOperation (1 to *)
• Operation to Operation: getContingency, getCompensation (1 to 1)
Application Exception Rules