Monitoring Data Dependencies to Support Recovery in Concurrent Process Execution*
Susan D. Urban
Department of Computer Science
February 6, 2009
*This research is partially supported by NSF Grant No. CCF-0820152.

The Challenge of Concurrent Execution in a Service-Oriented Environment
• Serializability
  • The concurrent execution of two or more transactions must be equivalent to the serial execution of those transactions.
  • Two-phase locking and two-phase commit support serializability in controlled distributed environments.
• Isolation
  • Data changes should not be released before the commit of a transaction.
  • Lack of isolation leads to cascaded rollbacks when transaction failure occurs.
    • Transaction A fails and performs rollback.
    • If transaction B reads modified data from transaction A, transaction B must also roll back.
• The problem: Serializability and isolation are not generally applicable to long-running workflow or process scenarios composed of distributed, autonomous services.
  • Compensation can be used to logically undo a process.
  • Compensation does not account for the effect of the failure and recovery process on concurrently executing processes.

Concurrent Process Execution Scenario
[Figure: Process1 (service operation1, operation3, operation2, operation4, operation5, ..., operationm) and Process2 (service operation2, operation4, operation5, ..., operationn) invoke operations hosted by Service Provider1, Service Provider2, and Service Provider3.]
• Scenario
  • Process1 fails at service operation5.
  • Compensation can be executed to restore Process1.
  • Process2 may be operating with incorrect data.

Research Challenges
• Can we capture and share data changes and data dependencies among concurrently executing processes that invoke Grid/Web Services?
• Can we provide a more intelligent way to dynamically analyze the relationships that exist between concurrently executing processes?
• Can we determine how the recovery of one process can affect other concurrently executing processes based on application semantics?

Harnessing Moore's Law, by Mark Hill:
"Our success in hiding computers when they work brings with it a responsibility to hide them when they fail. Imagine Web Services as available as telephones ….we will have to design systems assuming that they will fail…. we should seek to ensure that all systems mask almost all of those failures from users."
From Computer Science: Reflections on the Field, Reflections from the Field, National Research Council of the National Academies, 2004.

Overview of Presentation
• Related Work
• The DeltaGrid Approach
  • Overview of the Approach
  • Delta-Enabled Grid Services (DEGS)
  • Process Dependency Model
  • Service Composition and Recovery Model
  • Process Interference Rules and Recovery Algorithm
  • Implementation, Simulation, and Performance Evaluation
  • DeltaGrid Research Contributions
• Current Directions (NSF Grant No. CCF-0820152)
  • The D3 Project: Decentralized Data Dependency Analysis and Recovery for Concurrent Processes

THE REACTIVE BEHAVIOR AND DATA MANAGEMENT RESEARCH TEAM
• Past Members from Arizona State University
  • Luther Blake (M.S.), The Design and Implementation of Delta-Enabled Grid Services, 2006.
  • Yang Xiao (Ph.D.), Using Deltas to Analyze Data Dependencies and Semantic Correctness in the Recovery of Concurrent Processes, 2006.
  • Vidya Gopalan (M.S.), Simulation and Evaluation of an Object-Oriented Condition Evaluator for Process Interference Rules, 2008.
• Current Team from Texas Tech University
  • Ziao Liu, M.S. Student, Decentralized Data Dependency Analysis for Concurrent Process Execution (in progress)
  • Le Gao, Ph.D. Student (in progress)
  • Andrew Courter, B.S./M.S.
    Student (in progress)
http://reactive.cs.ttu.edu

Related Work: Transactions and Workflows
• Transactional Workflow
  • The ConTract Model (compensation, pre-/post-condition) (Wachter and Reuter 1992)
  • METEOR (pre-defined hierarchical error model) (Worah 1997)
  • CREW (explicitly specify data dependency) (Kamath and Ramamritham 1998)
  • WAMO (automatic exception handling for workflow execution) (Eder and Liebhart 1995)
• Exception handling in service composition environment
  • Transaction protocols: WS-Transaction (Cabrera et al. 2002), Transactional Attitude (Mikalsen, Tai, and Rouvellou 2002)
  • Web Service Composition Action (contingency) (Tartanoglu et al. 2003)
  • BPEL4WS (Andrews et al. 2003)
  • BPML (Arkin 2002)
• Our Research
  • Supports relaxed isolation and user-defined semantic correctness.
  • Rule-based approach to resolving failure and recovery impact on concurrent processes.
  • Dynamically analyzes data dependencies from streaming database log files.

The DeltaGrid Approach: Overview of the Approach

The DeltaGrid Approach
• A semantically-robust execution environment for processes that execute over distributed, autonomous services
[Figure: DeltaGrid architecture. Components: Delta-Enabled Grid Services, DeltaGrid Event Processor, Process History Capture System, Failure Recovery System (rule-based failure recovery), Process Execution Engine, Metadata Manager (Process Metadata, Rule Metadata), Event Rule Processor. Labeled flows: invoke services, deltas, application exceptions, system failure and recovery events, use analysis interface, query history and write process info, read process script, execute rules. Legend: one-way and two-way interaction between system components.]

DeltaGrid Abstract Execution Model
[Figure: The DeltaGrid Abstract Execution Model. Service Composition and Recovery Model (Composition Structure, Execution Semantics, Recovery Algorithms); Process Interference Rules (Rule Specification, Triggering Procedure); Global Execution History and Global Execution History Interface; Process Dependency Model (read and write dependency).]

The DeltaGrid Approach: Delta-Enabled Grid Services

Delta-Enabled Grid Services
[Figure: A Client Application invokes a service operation on a Delta-Enabled Grid Service, which invokes a DML activity through OGSA-DAI to execute a DML statement on the Source Database. Delta propagation fills the Delta Repository. Deltas reach the Delta Event Processor by delta notification (push mode) or delta query (pull mode).]
• Delta: an incremental change in a data element
• Captures data changes using either
  • Triggers
  • Oracle Streams
• Sends deltas back to the delta event processor in either a push or pull mode using XML
• Provides a way to externalize the DB log file as a stream of data change events

Triggers vs. Streams
• Triggers
  • Tightly coupled to update transaction
  • Doubles time for update
  • Push of deltas is not automatic
  • Easy to use but inflexible
• Oracle Streams
  • Decoupled from update transaction
  • Offload delta repository to limit effect on updates
  • Automatic streaming to multiple destinations
  • Complex but versatile
• Expanding Investigation to DB2 and SQL Server
S. Urban, Y. Xiao, L. Blake, and S. Dietrich, Monitoring Data Dependencies in Concurrent Process Execution Through Delta-Enabled Grid Services, to appear in International Journal of Web and Grid Services, 2009.

Use of Object Deltas
[Figure: Processes p1 (op11, op12) and p2 (op21, op22) update objects X (x0) and Y (y0), producing object deltas x1, x2, x3 and y1, y2.]
• Dynamically analyze data dependencies in concurrent process execution to identify process interference when failures occur.
• Delta-Enabled Rollback (DE-rollback) can be used if recoverability conditions are satisfied.

The DeltaGrid Approach: Process Dependency Model

Write/Potential Read Dependency
• Write Dependency
  • Process-level: A write dependency exists if a process pi writes a data item x that has been written by another process pj before pj completes (i≠j).
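The process-level write dependency defined above can be checked mechanically against a time-ordered list of deltas. The following is a minimal Python sketch, not the DeltaGrid implementation: the `Delta` record, the `end_times` map, and the function name are hypothetical, and the sketch assumes that any earlier write by a still-active process counts (not only the most recent write).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delta:
    """Simplified, hypothetical delta record (not the DeltaGrid schema)."""
    object_id: str    # the data item x that was written
    process_id: str   # the process that performed the write
    timestamp: int    # position in the merged, time-ordered history


def write_dependencies(deltas, end_times):
    """Return the set of (p_i, p_j) pairs where p_i is write dependent on p_j.

    deltas: iterable of Delta records from all sites.
    end_times: process_id -> completion timestamp; a process that is
        still running is simply absent from the map.
    """
    writers_so_far = {}  # object_id -> set of processes that wrote it earlier
    deps = set()
    for d in sorted(deltas, key=lambda delta: delta.timestamp):
        for writer in writers_so_far.get(d.object_id, ()):
            still_active = end_times.get(writer, float("inf")) > d.timestamp
            if writer != d.process_id and still_active:
                deps.add((d.process_id, writer))
        writers_so_far.setdefault(d.object_id, set()).add(d.process_id)
    return deps


if __name__ == "__main__":
    history = [
        Delta("InventoryItem#1", "Pc1", 1),
        Delta("InventoryItem#1", "Pr", 3),
        Delta("InventoryItem#1", "Pc2", 5),
    ]
    # No process has completed, so every later writer depends on every earlier one.
    print(sorted(write_dependencies(history, {})))
```

A potential read dependency could be sketched the same way if read events were recorded alongside the deltas; that recording is an assumption here, since deltas capture only writes.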
  • Operation-level: Write dependency set
• Potential Read Dependency
  • Process-level: A read dependency exists if a process pi reads a data item x that has been written by another process pj before pj completes (i≠j).
  • Operation-level: Potential read dependency set

Global Execution History
[Figure: The local execution histories of DEGS1 through DEGSn, each a time-ordered list of deltas, are merged into the time-ordered deltas of the Global Execution History. The execution context records operation1 ... operationn (input, output, state, degsID, tss, tse) and process1 ... processm (input, output, state, tss, tse). Write dependencies and potential read dependencies are derived from the global execution history.]
Y. Xiao and S. Urban, Process Dependencies and Process Interference Rules for Analyzing the Impact of Failure in a Service Composition Environment, Journal of Information Science and Technology, 2008. Special issue from 10th International Conference on Business Information Systems, Poznan, Poland, 2007.

Process Execution Scenario
[Figure: Timeline (ts1 to ts6) of operations op11, op12, op13, op14 of process p1 and op21, op22 of process p2. The operations produce deltas x1 to x4 on X (x0), y1 on Y (y0), and z1, z2 on Z (z0) across DEGS1 and DEGS2. The figure shows the system invocation event sequence, the local execution histories of DEGS1 and DEGS2, and the resulting global execution history.]

The DeltaGrid Approach: Service Composition and Recovery Model

Service Composition Structure
• Abstract Process Execution Entities:
  • Operation
  • Compensation
  • Contingency
  • Atomic Group
  • Composite Group
  • Process
[Figure: UML class diagram relating Process, Composite Group, Atomic Group, Operation, Compensation (0..1), and Contingency (0..1).]

Abstract Process Definition Example
[Figure: p1 = cg1, containing composite group cg11 (atomic groups ag111, ag112, ag113), composite group cg12 (atomic groups ag121, ag122), and atomic group ag13, over operations op11 to op16 with compensations (cop11, cop12, cop15, cop16) and contingencies (top11, top13, top16); op14 is marked non-critical. Group-level entries: cg11.cop, cg11.top, cg12.top, cg1.cop, cg1.top. Legend: Atomic Group, Composite Group, Compensation, Deep/Shallow compensation, Contingency.]
• Supports DE-rollback
• Provides state diagrams and algorithms for recovery semantics of the service composition model (single and concurrent execution cases)
Yang Xiao and Susan D. Urban, The DeltaGrid Service Composition and Recovery Model, to appear in International Journal of Web Services Research, 2009.

Example: Process Interference Caused by Write Dependency
[Figure: Timeline (ts1 to ts9) of three concurrent processes: Pc1 = placeClientOrder (receiveClientOrder, checkCredit, checkInventory, chargeCreditCard, decInventory, packOrder), Pr = replenishInventory (verifyVO, incInventory), and Pc2 = placeClientOrder (receiveClientOrder, checkCredit, checkInventory, chargeCreditCard, decInventory, packBackOrder), with compensations cop:incInventory, cop:decInventory, and cop:unpackBackOrder. DEGS1 holds InventoryItem (I0) with deltas I1 to I6; DEGS2 holds ClientOrder (CA0) with deltas CA1, CA2 and ClientOrder (CB0) with delta CB1. Annotations: "Write dependent on Pc1" and "Write dependent on Pc1 and Pr".]

The DeltaGrid Approach: Process Interference Rules and Recovery Algorithm

PIR Specification
    create rule   ruleName
    event         failureRecoveryEvent
    define        [viewName as <OQL expression>]
    condition     [when condition]
    action        recovery commands
• event: <processName>ReadDependency(pf, rdp) or <processName>WriteDependency(pf, wdp)
• define: query over the global execution history interface
• condition: determine if process interference exists
• action: deepCompensate/re-execute process, or post-commitRecover/re-execute operation

Process Interference Rule Example
• Compensation of replenishInventory removed inventory items needed in placeClientOrder.
• Triggered after failure recovery of failedProcess.
    create rule   inventoryDecrease
    event         placeClientOrderWriteDependency(failedProcess, wdProcess)
    define        decreasedItems as
                    select fd.oId
                    from fd in failedProcess.getDeltasByRecovery("InventoryItem", "quantity")
                    group by fd.oId
                    having sum(fd.newValue - fd.oldValue) < 0
    condition     when exists decItem in decreasedItems:
                    decItem in (select d from d in wdProcess.getDeltas("InventoryItem", "quantity"))
    action        deepCompensate(wdProcess);
• Query deltas using object model.
• Use application semantics to determine if process interference exists.

Concurrent Process Recovery
• Execution queue holding active processes
• Generate recovery commands for the failed process p1
• Generate process dependency graph (PDG) for p1
• Dependent processes are temporarily suspended to evaluate PIRs.
• Breadth-first traversal for PDG and PIR evaluation
[Figure: Process dependency graph over P1 to P9, highlighting a process that depends on multiple processes and a process whose PIR evaluates to false.]
• Results show the correctness of the PDG formation, the traversal process, use of DE-rollback, and the PIR evaluation process.

Cascaded Process Recovery Example
[Figure: Two views of the process dependency graph over P1 to P9, marking each process as "Recovery Not Needed" or "Recovery Needed".]

Special Cases to Consider
[Figure: Dependency graph over P1 to P5 containing a cycle through P2.]
• Handles cyclic dependencies
• Guarantees that updates are not lost in the recovery process.
  • Compensation has higher priority than DE-rollback.
  • DE-rollback is only performed if no write dependencies exist.
• Two failed processes p1 and p2 can have a common dependent process p3.
  • Recovery of failed processes p1 and p2 is ordered by timestamps.
  • If p3 is recovered with p1, p3 does not appear in the dependency graph of p2, but dependencies introduced by the recovery of p3 are considered in determining DE-rollback applicability in the recovery of p2.

The DeltaGrid Approach: Implementation, Simulation, and Performance Evaluation

Process History Capture System (PHCS) and Process Recovery System (PRS)
[Figure: The Delta-Enabled Grid Service sends XML files (deltas) to the DeltaGrid Event Processor, which forwards them to the Process History Capture System. The Failure Recovery System and the Process Execution Engine query process history, and the Process Execution Engine writes the process execution context. PHCS components: Delta Receiver, Parser (XML files to Java delta objects), and Process History Analyzer in the Service Layer; GlobalScheduleAccess, DeltaAccess, and ProcessInfoAccess in the Data Access Layer; an OODB holding the Global Delta Object Schedule, Delta Repository, and Process Runtime Info in the Data Storage Layer.]

Simulation and Evaluation Framework
• DEVSJAVA (B. Zeigler & H. Sarjoughian)
  • Implemented PHCS and PRS
  • Simulated DEGS and Execution Engine
• Evaluation Setup for WD Retrieval
  • Vary number of concurrent processes (10~100, 100~1000)
  • Vary an operation's distribution over objects (100 objects, 1000 objects)
• Evaluation Result Analysis
  • An operation's distribution over objects does not matter
  • Exponential increase without optimization
  • Linear increase with optimization based on segmenting the global schedule
  • Advocates a distributed PHCS
[Figure: Write Dependency Retrieval Time (n: 10~100). x-axis: number of concurrent processes; y-axis: processing time (milliseconds); series: 100 objects, 1000 objects.]
[Figure: Write Dependency Retrieval Time (n: 100~1000). x-axis: number of concurrent processes; y-axis: processing time (milliseconds); series: 100 objects, 1000 objects, segment.]

Other Evaluation Results
• Evaluation setup for Recovery Algorithm
  • Vary number of concurrent processes (10~100, 100~1000)
  • Vary process nesting level (1-5)
• Evaluation result and analysis
  • Linear increase when the number of concurrent processes grows
    • Delta parsing/storage time (increases faster than global schedule)
    • Global schedule construction time
    • Operation-level read dependency retrieval time
  • Exponential increase in PDG construction time with high process density
  • Constant cascaded recovery processing time
  • Advocates distributed PHCS
    • Large amount of concurrent deltas
    • High process dependency density
• Improved delta object model interface performance through the use of SODA (Simple Object Data Access) interface.

The DeltaGrid Approach: Research Contributions

DeltaGrid Research Contributions
• Defined the functionality required for the capture and use of incremental changes to autonomous data sources in a distributed Grid Service environment.
• Designed a flexible approach to recovery of service execution failure, providing multi-level protection and maximizing forward recovery
• Defined algorithms for analysis of data dependencies among concurrently executing processes based on deltas collected from distributed sites
• Designed a rule-based approach for process interference handling based on application semantics
• Design, implementation, and evaluation of the DeltaGrid simulation framework

The DeltaGrid Approach: Current Directions: The Decentralized Data Dependency (D3) Analysis Project

The D3 Project
• NSF Grant No. CCF-0820152 (Software for Real-World Systems Program)
• A Decentralized and Rule-Based Approach to Data Dependency Analysis and Failure Recovery in a Service-Oriented Environment
• Objective: To enhance service-oriented environments with theories and methods that support dynamic, flexible, and user-defined approaches to the recovery of failed processes that execute in a loosely-coupled environment without isolation guarantees.
• Builds on and integrates three main concepts:
  • The DEGS capability of externalizing database log files.
  • Decentralized, peer-to-peer techniques for sharing and merging log files.
  • Event and rule-driven techniques for dynamic process recovery and exception handling.

Decentralized Process Execution Units
• A decentralized community of PEXAs, each controlling the execution of multiple processes.
• Deltas are stored locally for services that execute at the PEXA site.
• PEXAs communicate in a decentralized manner to dynamically discover data dependencies and to support event and rule driven recovery among concurrent processes.

Research Challenges
• Decentralized data dependency analysis
  • Representation, communication, correctness, performance
• Dynamic aspects of service composition
  • Event-driven service composition
• Refinement of process interference rules
  • Introduce application exception events and rules
• Correctness of execution and recovery with respect to intended user semantics.
  • Using formal methods to express execution and recovery correctness in a dynamic, decentralized, concurrent execution environment.
• Decentralized algorithms for data dependency analysis, rule execution, and recovery procedures.

Questions?
• S. D. Urban, Y. Xiao, L. Blake, and S. Dietrich, Monitoring Data Dependencies in Concurrent Process Execution Through Delta-Enabled Grid Services, to appear in International Journal of Web and Grid Services, 2009.
• Y. Xiao and S. D. Urban, The DeltaGrid Service Composition and Recovery Model, to appear in International Journal of Web Services Research, 2009.
• Y. Xiao and S. Urban, Process Dependencies and Process Interference Rules for Analyzing the Impact of Failure in a Service Composition Environment, Journal of Information Science and Technology, 2008.
• Y. Xiao and S. D. Urban, Using Data Dependencies to Support the Recovery of Concurrent Processes in a Service Composition Environment, Proceedings of the Cooperative Information Systems Conference (COOPIS), Monterrey, Mexico, November 2008.
• Y. Xiao and S. D. Urban, Process Dependencies and Process Interference Rules for Analyzing the Impact of Failure in a Service Composition Environment, Proceedings of the 10th International Conference on Business Information Systems, Poznan, Poland, April 2007, pp. 67-81.
• Y. Xiao, S. D. Urban, and N. Liao, The DeltaGrid Abstract Execution Model: Service Composition and Process Interference Handling, Proceedings of the 25th Int. Conference on Conceptual Modeling, Tucson, Arizona, 2006, pp. 40-53.
• Y. Xiao, S. D. Urban, and S. W. Dietrich, A Process History Capture System for Analysis of Data Dependencies in Concurrent Process Execution, Proceedings of the 2nd Int. Workshop on Data Engineering Issues in E-Commerce and Services, San Francisco, California, 2006, pp. 152-166.
• H. Ma, S. D. Urban, Y. Xiao, and S. W. Dietrich, GridPML: A Process Modeling Language and Process History Capture System for Grid Service Composition, Proceedings of IEEE Int. Conference on eBusiness Engineering, Beijing, China, 2005, pp. 433-440.

Global Execution History
• Delta: an incremental change in a data value.
    Δ(oID, a, Vold, Vnew, tsn, opij)
• DEGS Local Execution History
    lh(degsID) = <tss, tse, δ(degsID)>
    δ(degsID) = [Δ(oIDA, a, Vold, Vnew, tsx, opij) | opij.degsID = degsID and tss ≤ tsx ≤ tse]
    ([] indicates a list of elements ordered by timestamp)
• Execution Context
  • Operation execution context: ec(opij) = <tss, tse, Input, Output, State>
  • Process execution context: ec(pi) = <tss, tse, Input, Output, State>
  • Global execution context: gec = [ec(entity) | (entity = opij or entity = pi) and (tss ≤ ec(entity).tss < ec(entity).tse ≤ tse)]
• Global execution history
    gh = <tss, tse, δg, gec>
    δg = [Δ(oIDA, a, Vold, Vnew, tsx, opij) | tss ≤ tsx ≤ tse]
• System Invocation Event Sequence
    Eseq = [eentity | entity = opij or entity = pi]

A Process Definition Example
Process placeClientOrder (p1 = cg1)
• ag11: receiveClientOrder (cop: chgOrderStatus)
• ag12: checkCredit
• good credit?
  • no: ag14: rejectClientOrder
  • yes: ag13: checkInventory
• sufficient inventory items?
  • yes: cg15
    • ag151: chargeCreditcard (cop: creditBack, top: eCheckPay)
    • ag152: decInventory (cop: incInventory)
  • no: cg16
    • ag161: chargeCreditcard (cop: creditBack, top: eCheckPay)
    • ag162: addBackorder (cop: rmvBackorder)
• ag17: packOrder (cop: unpackOrder)
• ag18: upsShipOrder (cop: upsShipback, top: fedexShipOrder)
Legend: Atomic Group, Composite Group, Compensation, Deep/Shallow compensation, Contingency
• Delta-Enabled Rollback
• State diagrams and algorithms for defining recovery semantics of the service composition model (single and concurrent execution cases)

The Global Delta Object Schedule
[Figure: Data storage in the OODB (Global Delta Object Schedule, Delta Repository, Process Runtime Info), with the index structure and an instance view along a time axis.]

Operation Index (OperationIndex)
    index      processId   operationId
    oIndex1    p1          op1
    oIndex2    p1          op2
    oIndex3    p2          op3
    ...
    oIndexN    px          opy

Time-sequence Index (TimeSequenceIndex)
    index      processId   operationId   Timestamp   seqNum
    tsIndex1   p1          op1           ts1         1
    tsIndex2   p2          op1           ts2         1
    tsIndex3   p1          op1           ts3         1
    ...
    tsIndexN   p5          op2           tsN         1

Node
    node       className   ObjectId   propertyName
    node1      classA      Object1    property1
    node2      classB      Object1    property2
    node3      classA      Object1    property1
    ...
    nodeN      classC      Object3    property2

The Global Execution History Interface Supported by the PHCS
[Figure: Data sources (DEGS, Process Execution Engine) feed the data access components GlobalScheduleAccess, DeltaAccess, and ProcessInfoAccess, which sit over the data storage (Global Delta Object Schedule, Delta repository, Process runtime info repository). The Process History Analyzer exposes the Global Execution History Object Model. Classes shown include Process, Operation, Delta, DeltaValue, DeltaProperty, DataChange, PropertyValue, ProcessInfo, and OperationInfo.]

Global Execution History Object Model
[Figure: UML class diagram of Process, Operation, and Delta. Associations: wdProcessForP, rdProcessForP, wdOperations, rdOperations, wdProcessesForOP, rdProcessesForOP, getOperations, getProcess, getOperation, getDeltas, getContingency, getCompensation.]
• Process
  • Attributes: pID, pName, startTime, endTime, state
  • Methods:
    • getOperation(in opName)
    • getCurrentOperation()
    • getDeltas(), getDeltas(in className), getDeltas(in className, in attrName)
    • getDeltasBeforeRecovery(), getDeltasBeforeRecovery(in className), getDeltasBeforeRecovery(in className, in attrName)
    • getMostRecentDeltaBeforeRecovery(in className, in attrName)
    • getDeltasByRecovery(), getDeltasByRecovery(in className), getDeltasByRecovery(in className, in attrName)
    • getMostRecentDeltaByRecovery(in className, in attrName)
• Operation
  • Attributes: opID, opName, startTime, endTime, state, degsID
  • Methods: getDeltas(in className), getDeltas(in className, in attrName)
• Delta
  • Attributes: oID, className, attrName, oldValue, newValue, dataType, timestamp

Application Exception Rules