3070 CICS TG and CICS in a High Availability Environment
A customer experience with CA-SILCA
Sylvie Constans, Manager of the CICS & IMS team at CA-SILCA
IBM IMPACT 2014 Conference, 27 April - 1 May, Las Vegas
07/11/2015

CA-SILCA and the Crédit Agricole SA group
The activities of the Crédit Agricole SA group are organized in 4 business lines:
- Retail banking in France and worldwide
- Asset management
- Specialised financial services
- Corporate and investment banking
CA-SILCA is the IT subsidiary of the Crédit Agricole SA group, located in France (Paris region).

CA-SILCA: Overview
- Founded in 2005, initially grouping the IT production of its 3 founding members
- Currently more than 40 customers, all subsidiaries of the group
- Center of expertise for the group:
  o Operation services for IT applications
  o Building infrastructure services: telephony, network, office automation
  o Services for providing employee workstations

CA-SILCA: Overview - some key figures
- 3 PB of storage for the servers
- 36,000 workstations
- 33,000 mailboxes
- 550 hosted web sites
- 4,000 logical servers
- 16,000 phone lines
- A new data center (5,000 m2) composed of 2 sites

CA-SILCA: Mainframe infrastructure
- 4 zEC12 located in a 2-site environment (bi-site), the sites separated by 10 km
- 2 CPCs, active/passive, in each site; each LPAR on an active CPC has its image on the passive CPC of the other site
- 2827-731 models, 60,000 MIPS
- Primary disk array (active data) in one site, secondary disk array in the second site
- Third site for data replication

CA-SILCA: Mainframe activity
- 9 customers (59 LPARs, 60,000 MIPS)
- CA-SILCA manages the system environment of 5 customers:
  o 400 CICS regions (currently migrating from CICS TS 4.1 to CICS TS 5.1)
  o 45 CICS TG (version 8.1)
  o 75 DB2 (DB2 V10 migration in CM mode in 2014)
  o 60 WebSphere MQ (version 7.1)
  o 15 IMS (migration from IMS V11 to IMS V13 in 2014)
  o 40 LPARs on z/OS 1.13 (z/OS 2.1 migration planned in 2015)
  o 6 sysplexes
  o Coupling facilities on each CPC

LCL: Le Crédit Lyonnais
Our main customer is LCL (retail bank):
- Founded in Lyon in 1863
- 1,925 agencies, 6 million customers in France
- 70 private banking divisions (150,000 customers)
- Retail bank for professionals
Its technical environment:
- 130 CICS, 40 CICS TG, 25 DB2, 25 WebSphere MQ
  o Some applications in a non-HA TOR/AOR architecture
  o Benefits from a High Availability environment for e-banking and intranet applications

LCL: At the beginning
The agencies were connected to the mainframe (data center) based on their location.
[Diagram: three regional LPARs - Région Parisienne (LPAR1), Région Sud Est (LPAR2), Région Centre (LPAR3) - each with its own TOR, AOR, local DB2 and local VSAM]

LCL: Current architecture
[Diagram]

LCL: At the beginning
In 1995, the mainframes were grouped in the region of Paris on 2 sites.
- The agencies are still connected based on their location
- Several production problems led LCL to think about a High Availability architecture
- In the early 2000s a sysplex was implemented with the help of IBM: 9 production LPARs and one DEV
  o Implementation of RLS/SMSVSAM
  o DB2 data sharing, first on the DEV LPAR (mono partition) to validate the cost

LCL: At the beginning
At the same time, a major project of merging data was started:
- Regional files become national
- Regional DB2 databases become national
- TORs are accessed with generic resources
A 3-tier architecture was implemented:
- WAS are connected to the SNA servers
- The applications must comply with this architecture
- Very few 3270 applications left
- 4 LPARs are dedicated to this architecture: 2 for network purposes, 2 for applications

LCL architecture: the start of High Availability
[Diagram: production WAS farm and pilot WAS farm connected via TCP/IP to SNA gateways (RES1/RES2, VTAM), then via LU6.2 to production and pilot TORs and AORs on LP1 and LP2]

LCL architecture: the start of High Availability
Pilot:
- Some agencies are connected to the pilot WAS
- They have access to the pilot CICS TG and the pilot CICS
- We can deploy new versions of programs without impacting all of production
- The pilot CICS regions have one specific load library ahead in the DFHRPL
Production: the rest

LCL architecture: the start of High Availability
- The routing of transactions to the AORs is managed by the dynamic routing program (DFHDYP) that we have customised
- At the end of 2003, EYU9XLOP, the dynamic routing program of CICSPlex SM (TS 2.2), was implemented, because our routing program didn't satisfy us entirely:
  o Round-robin algorithm
  o Simplistic
- Number of transactions per day: 4 million (TOR+AOR)
- Implementation in goal mode, using the service class definitions
  o Provides an average response time and not a percentile
- These CICS regions are clones: no affinities between transactions

LCL architecture: the start of High Availability
- Implementation of shared TSQ servers
- Implementation of named counter servers: gives each application a unique id in the sysplex
- The DEV environments have the same architecture (1 TOR, 2 AORs), to be sure not to generate affinities between transactions during application development

LCL architecture: the start of High Availability
After one year we decided to stop using EYU9XLOP, because the failover of one LPAR had a business impact:
  o CICS response times degraded heavily in one LPAR (DB2)
  o CICSPlex SM continued to route to this LPAR
- The service classes were probably not correctly set
- The CICSPlex SM/WLM delay in reacting was too long:
  o The LPAR failed; we had to do an IPL
  o The remaining LPAR couldn't handle the workload
  o The WAS failed to handle the incoming requests
We decided to rewrite our routing program to better fit our needs.

LCL architecture: the start of High Availability
What we learned about this architecture:
- Having only one remaining LPAR is not sufficient in case of a failover during the day
- We have to restart the critical applications first
- We decided to have 4 application LPARs.
- In case of a failover, only 25% of the workload has to be dispatched onto the 3 others
- The SNA servers have been replaced by CICS TGs on z/OS
- We noticed some affinities between LU6.2 connections and TORs
- Loose coupling between WAS and CICS (logical names instead of APPLIDs)
White paper: ftp://public.dhe.ibm.com/software/htp/cics/tserver/v32/library/WSW14020-USEN00_systemz_harmony_0324A.pdf

LCL architecture: the start of High Availability
- At that time, CICS TGs were seen as black boxes (prior to version 7.0): no statistics available
- At first the CICS TG architecture was mapped onto the existing CICS one (multi-channel)
- No standard monitoring available with the tools on the z platform
- Introscope (Wily Technology) was implemented on one CICS TG
  o Only one CICS TG because of the overhead
- A dashboard was implemented with the help of the vendor
  o To monitor the JCA pool activity
  o To monitor the activity and the CICS response time

The tools: Introscope (Wily Technology)
[Screenshot]

Monitoring tools
Before:
- Day D: MAINVIEW CICS/DB2/WMQ (real-time vision; system administration and tuning; analysis with 3 tools, LPAR by LPAR; history (LCL) < 1 day, TP DB2: 15 min); INTROSCOPE (limited problem analysis, impossible for DB2/LCL)
- Day D+1: CICSPA (analysis of the CICS and CTG SMF records, with DB2 and WMQ information); SAS / METROLOGY figures (aggregated SMF records; aggregation over 1 hour; monthly consolidation)
Current:
- Day D: MAINVIEW CICS/DB2/WMQ (real-time vision; system administration and tuning; problem analysis in 5 minutes; trend analysis); INTROSCOPE (no problem analysis); CICSPA (analysis of the CICS and CTG SMF records, with DB2 and WMQ information)
- From D+2: no problem analysis with Introscope; CICSPA (problem analysis; trend analysis); SAS / METROLOGY figures (monthly consolidation; problem analysis; trend analysis)

The tools: CICS Performance Analyzer (IBM)
CICS Performance Analyzer allows us to do:
- Performance analysis by exploiting the SMF 110(1) records
- Tuning for our CICS TGs (SMF 111 records since V7.0)
  o Do we have enough connection managers?
  o Worker information is taken from the DFHXCURM
  o We would like to have "cross-domain" information in batch reports: peak number of connection managers, number of requests, CICS response time, daemon response time...
  o An RFE has been raised (no. 46252): you can vote for it!

LCL: Current architecture
- 2 sysplex distributors with a round-robin algorithm (they can back each other up)
- CICS TG on z/OS using port sharing (dedicated by business)
- DB2 data sharing, RLS, WebSphere MQ queue-sharing group, shared TSQ servers, named counter servers
- 4 application LPARs
- 2,100 transactions/sec at peak hour (Tuesday morning)
- 95 million transactions per day
- 30 ms average response time

LCL: Current architecture
4 coupling facilities for the LCL production sysplex:
- 1 on each of the active CPCs
- 2 external coupling facilities (passive CPCs)
  o For the DB2 and RLS lock structures
- 11 GB of memory each
Use of duplexing:
  o Expensive, only for the DB2 group buffer pools
  o 20% CPU saved on IRLM since the suppression of duplexing for the DB2 lock structure
Automatic rebuild for the other structures.
Double failure not handled: simultaneous loss of DB2 and its lock structure.

LCL: Current architecture
[Diagram: Sysplex Distributor in front of System P production servers and System P pilot servers]

Transaction routing CICS TG - TOR: DFHXCURM
The routing between CICS TGs
and TORs is performed by a customised DFHXCURM.
The ECI request provides:
  o The sysplex distributor address
  o The CICS TG port number to be joined
  o A logical server name (so we are independent if we need to add or remove a TOR)
  o The program name to be executed
  o The transaction id (best practice)
It integrates a routing table which takes the following criteria into account:
  o The LPAR on which the CICS TG is running
  o A set of target TORs with a "handicap" (local TORs are preferred)

Transaction routing CICS TG - TOR: DFHXCURM
Part of the routing table integrated in the module:

  XCURMTAB TYPE=SYSTEM,SYSNAME=SYSA
  XCURMTAB TYPE=GROUP,GROUPID=ZZPL1,TARGET=((PL11,0),(PL12,05))
  XCURMTAB TYPE=GROUP,GROUPID=ZZPL9,TARGET=((PL91,0),(PL92,05))
  XCURMTAB TYPE=ENDSYS
  XCURMTAB TYPE=SYSTEM,SYSNAME=SYSB
  XCURMTAB TYPE=GROUP,GROUPID=ZZPL1,TARGET=((PL12,0),(PL11,05))
  XCURMTAB TYPE=GROUP,GROUPID=ZZPL9,TARGET=((PL92,0),(PL91,05))
  XCURMTAB TYPE=ENDSYS

Transaction routing TOR - AOR: DFHDYP
- The request is routed to the least busy AOR
  o The program knows the number of sessions defined between the TOR and each AOR
  o It counts the current number of tasks between the TOR and each AOR
- We can isolate one or several AORs from routing, to lighten an LPAR
- We can route a transaction or a set of transactions to an AOR or a set of AORs thanks to a configuration file (in case of affinity)
- The CSMI transaction is forbidden for routing
  o In order to be able to tune and analyze performance
  o It can't be disabled in case of problems

Exploitation
- Monthly IPL for all the LPARs
- For LCL, only 2 out of 4 LPARs are eligible for batch processing
  o During the IPL of one LPAR, batch starts on the second one (OPCplex schedule environment)
- For each IPL a change request is associated
  o Our changes (that we declare)
can be IPL-dependent (e.g. a migration)
  o If the IPL is delayed, we know all the changes associated with it

Exploitation
- All the CICS regions in the same LPAR are stopped/started once a week
  o Start = AUTO, but overridden as a cold start after analyzing the DFHGCD (if the previous stop was OK), to be sure not to forget changes done dynamically
- CICS TGs are stopped/started one after another every midnight
  o To suppress the affinity between the CICS TGs and the TORs: CTG_PIPE_REUSE=ALL
  o To avoid memory problems

Migration strategy (1/2)
- Use of aliases for libraries and of symbols: member IEASYMxx in the z/OS parmlib
- Transparent for the developers: no JCL modification needed for compiling
- Example:
  o CICSTS.CIC.SDFHAUTH: alias (used by developers), pointing for instance at CICSTS.CIC10.SDFHAUTH
  o CICSTS.CIC&VERCIC..SDFHAUTH is referenced in the z/OS parmlib
  o SYMDEF(&VERCIC.='10')

Migration strategy (2/2)
- A new level is generated for the migration (e.g. 15)
- Definitions are hard-coded in the parmlib so that the new libraries coexist with the 2 CICS versions (linklist, LPA, APF):
  o CICSTS.CIC15.SDFHLINK
  o CICSTS.CIC&VERCIC..SDFHLINK (with &VERCIC = '10')
- Taken into account during the IPL; this allows us not to migrate all the CICS regions of the same partition at one time
- The new CICS procedure has the CICS libraries hardcoded
- The symbol is set to the new level once the last CICS of the last production LPAR has been migrated
- REXX procedures have been written (we provide the name of the CICS region to be migrated); they:
  o Generate the CICS files
  o Assemble the PLTxx, SIT and other
tables
  o Create the new DFHCSD (one CSD per version)
  o Create the new CICS procedure

Failover (1/2)
Continuity of service is only ensured for the components in high availability.
- Failover of an LPAR:
  o The sysplex distributor routes the requests to one of the CICS TGs running on the 3 remaining LPARs
  o If DB2 held retained locks, it is automatically restarted on another LPAR to release the locks, then stops
- Failover of a CICS TG:
  o The sysplex distributor sends the request to another CICS TG listening on the same port
- Failover of a TOR:
  o DFHXCURM detects the 'NO CICS' error and routes the request to the TOR of another LPAR

Failover (2/2)
- Failover of an AOR:
  o DFHDYP can't send requests to this CICS anymore (no connection is available); it scans the remaining AORs and sends the request to the least busy one
- Failover of DB2:
  o DB2 abend: DB2 is restarted by ARM
  o DB2 frozen: the number of current tasks increases (TOR/AOR sessions) and DFHDYP sends the request to another LPAR

IT contingency plan
Every year, we perform 2 IT contingency plan exercises for our customers (one for each site):
- We isolate one site
  o The LPARs are restarted on the "passive" CPC of the other site
  o Activation of CBU
  o Depending on the preference of our customers, their production can run on this CPC for the weekend or the whole week
  o We must have the product keys for the "passive" CPC if they depend on the serial number

High Availability: the pros and the cons
Pros:
- Continuity of service in case of unavailability of components
- CICS and CICS TG migrations in production, during service
Cons:
- It's a real project, not only CICS
- It is not magic: if there is an application problem (loop, lock on data), it is propagated to the whole
CICSplex

LCL architecture: what next
- Planning a CICSPlex SM implementation, to use the new CICS TS 5.1 functionalities
  o Lack of assembler skills: use the routing program of CICSPlex SM
  o The algorithm has been enhanced and uses data spaces
  o Service classes specified as a percentile
- WUI (Web User Interface): centralized administration of CICS (SPOC)
  o But we would like to keep the same flexibility for migration during service
- We don't use CICS Explorer yet
  o It must be installed on virtual servers (Citrix)
  o The flows must be opened from the IP addresses of these servers to the mainframe

Questions?
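The handicap-based CICS TG to TOR routing described above (the XCURMTAB table, plus the 'NO CICS' fallback) can be sketched in a few lines. This is only an illustration of the idea, not the customer's code: the real exit is an assembler DFHXCURM user-replaceable module, and every Python name below is hypothetical.

```python
# Sketch of the handicap-based CICS TG -> TOR routing (slide "Transaction
# routing CICS TG - TOR: DFHXCURM"). Per LPAR and per logical server group,
# a list of (target TOR, handicap); lower handicap = preferred (0 = local).
ROUTING_TABLE = {
    "SYSA": {"ZZPL1": [("PL11", 0), ("PL12", 5)],
             "ZZPL9": [("PL91", 0), ("PL92", 5)]},
    "SYSB": {"ZZPL1": [("PL12", 0), ("PL11", 5)],
             "ZZPL9": [("PL92", 0), ("PL91", 5)]},
}

def route(lpar, logical_server, available):
    """Pick the available TOR with the lowest handicap.

    `available` models the 'NO CICS' failover check: a TOR that is
    down is skipped and the request goes to the next-best target.
    """
    candidates = ROUTING_TABLE[lpar][logical_server]
    for tor, _handicap in sorted(candidates, key=lambda t: t[1]):
        if tor in available:
            return tor
    raise RuntimeError("no TOR available for " + logical_server)
```

With both TORs up, a CICS TG on SYSA routes ZZPL1 requests to its local TOR PL11; if PL11 is down, the same call falls back to PL12 on the other LPAR, which is the failover behaviour described for the TOR.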
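The least-busy AOR selection performed by the customised DFHDYP (slide "Transaction routing TOR - AOR: DFHDYP") can be sketched similarly, under the assumption that "least busy" means the largest number of free TOR-AOR sessions (sessions defined minus current tasks); the isolation list models taking AORs out of routing to lighten an LPAR. Again, all names are hypothetical and the real program is an assembler dynamic-routing exit.

```python
# Sketch of least-busy AOR selection with AOR isolation.
def pick_aor(sessions, active_tasks, isolated=()):
    """Return the AOR with the most free TOR->AOR sessions.

    sessions:     dict AOR -> number of sessions defined to that AOR
    active_tasks: dict AOR -> tasks currently routed to that AOR
    isolated:     AORs temporarily removed from routing
    """
    free = {aor: sessions[aor] - active_tasks.get(aor, 0)
            for aor in sessions if aor not in isolated}
    if not free or max(free.values()) <= 0:
        # Models the AOR-failover case: no usable connection left here,
        # so the real exit would route to an AOR on another LPAR.
        raise RuntimeError("no AOR available")
    return max(free, key=free.get)
```

This also captures the DB2-frozen scenario described under Failover (2/2): when tasks pile up on the AORs of the sick LPAR, their free-session counts shrink and the selection naturally drifts to the AORs of another LPAR.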