3070 CICS TG and CICS in a High Availability Environment: a customer experience with CA-SILCA. Sylvie Constans, Manager of the CICS & IMS team at CA-SILCA. 07/11/2015

Transcript

3070 CICS TG and CICS in a High Availability Environment
A customer experience with CA-SILCA
Sylvie Constans
Manager of the CICS & IMS team at CA-SILCA
IBM IMPACT 2014 Conference, 27 April to 1 May, Las Vegas
CA-SILCA and the Crédit Agricole SA group

The activities of the Crédit Agricole SA group are organized in 4 business lines:
- Retail banking in France and worldwide
- Asset management
- Specialised financial services
- Corporate and investment banking

CA-SILCA is the IT subsidiary of the Crédit Agricole SA group.
It is located in France, in the Paris region.
CA-SILCA: Overview

- Founded in 2005; at the beginning it grouped the IT production of its 3 founding members
- Currently we have more than 40 customers
  o Only subsidiaries of the group
- Center of expertise for the group
- Operation of IT applications
- Building infrastructure services: telephony, network, office automation
- Provision of workstations for employees
CA-SILCA: Overview

Some key figures:
- 3 PB of storage for the servers
- 36,000 workstations
- 33,000 mailboxes
- 550 hosted web sites
- 4,000 logical servers
- 16,000 phone lines
- A new data center (5,000 m2) spread over 2 sites
CA-SILCA: Mainframe infrastructure

- 4 zEC12 machines located in a two-site environment, the sites being 10 km apart
- 2 CPCs in each site, one active and one passive
- Each LPAR on an active CPC has its image on the passive CPC of the other site
- 2827-731 models
- 60,000 MIPS
- Primary disk array (active data) in one site
- Secondary disk array in the second site
- A third site for data replication
CA-SILCA: Mainframe activity

- 9 customers (59 LPARs, 60,000 MIPS)
- CA-SILCA manages the system environment of 5 of these customers
  o 400 CICS regions (currently migrating from CICS TS 4.1 to CICS TS 5.1)
  o 45 CICS TGs (version 8.1)
  o 75 DB2 subsystems (DB2 V10 migration in CM mode in 2014)
  o 60 WebSphere MQ queue managers (version 7.1)
  o 15 IMS subsystems (migration from IMS V11 to IMS V13 in 2014)
  o 40 z/OS 1.13 LPARs (z/OS 2.1 migration planned in 2015)
  o 6 sysplexes
  o Coupling facilities on each CPC
LCL: Le Crédit Lyonnais

Our main customer is LCL (retail bank)
- Founded in Lyon in 1863
- 1,925 branches, 6 million customers in France
- 70 private banking divisions (150,000 customers)
- Retail banking for professionals

Its technical environment
- 130 CICS regions, 40 CICS TGs, 25 DB2 subsystems, 25 WebSphere MQ queue managers
  o Some applications still run in a non-HA TOR/AOR architecture
  o The e-banking and intranet applications benefit from a High Availability environment
LCL: At the beginning
The branches are connected to the mainframe (data center) based on their location.
LCL: At the beginning
[Diagram: three regional LPARs (Région Parisienne, Région Centre, Région Sud Est); each LPAR hosts its own TOR and AOR (TOR1/AOR1 on LPAR1, TOR2/AOR2 on LPAR2, TOR3/AOR3 on LPAR3), with a local DB2 and local VSAM files.]
LCL: Current architecture
[Diagram: overview of the current architecture, detailed in the later slides.]
LCL: At the beginning

- In 1995, the mainframes were consolidated in the Paris region, on 2 sites
- The branches were still connected based on their location

Several production problems led LCL to think about a High Availability architecture
- In the early 2000s, a sysplex was implemented with the help of IBM
- 9 production LPARs and one DEV LPAR
  o Implementation of RLS/SMSVSAM
  o DB2 data sharing
- First on the DEV LPAR (a single partition) to validate the cost
LCL: At the beginning

- At the same time, a major data-merging project was started
- Regional files became national
- Regional DB2 databases became national

- The TORs are accessed through generic resources

- A 3-tier architecture was implemented
- The WAS servers are connected to the SNA servers
- The applications must comply with this architecture
- Very few 3270 applications are left
- 4 LPARs are dedicated to this architecture
  o 2 for network purposes, 2 for applications
LCL architecture: the start of High Availability
[Diagram: the production WAS farm and the pilot WAS farm connect over TCP/IP to the SNA gateways on the two network LPARs (RES1 and RES2, running VTAM); from there, LU6.2 connections reach the production and pilot TORs, which route work to the production and pilot AORs on the two application LPARs, LP1 and LP2.]
LCL architecture: the start of High Availability


Pilot
- Some branches are connected to the pilot WAS servers
- They have access to the pilot CICS TGs and the pilot CICS regions
- We can deploy new versions of programs without impacting all of production
- The pilot CICS regions have one specific load library placed ahead in the DFHRPL concatenation

Production
- Everything else
LCL architecture: the start of High Availability

The routing of transactions to the AORs is managed by the dynamic routing program (DFHDYP) that we have customised.

At the end of 2003, EYU9XLOP, the dynamic routing program of CICSPlex SM (CICS TS 2.2), was implemented
- Because our own routing program did not entirely satisfy us
  o A round-robin algorithm
  o Simplistic
- Number of transactions per day: 4 million (TOR+AOR)
- Implementation in goal mode, which uses the service class definitions (a conceptual sketch follows this list)
  o It provides an average response time goal, not a percentile
- These CICS regions are clones
- No affinities between transactions
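For readers less familiar with the two routing styles mentioned above, here is a small illustrative Java sketch contrasting a plain round-robin choice (what our original program did) with a goal-oriented choice that weights candidate regions by their observed average response time against a target. This is only a conceptual model, not the EYU9XLOP algorithm or our assembler exit; the region names and the 30 ms goal are invented.

import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Conceptual comparison of round-robin vs goal-oriented AOR selection. */
public class RoutingStyles {
    private final AtomicInteger next = new AtomicInteger();

    /** Round robin: rotate through the candidate AORs, ignoring their health. */
    public String roundRobin(List<String> aors) {
        return aors.get(Math.floorMod(next.getAndIncrement(), aors.size()));
    }

    /**
     * Goal oriented: prefer the AOR whose observed average response time (ms)
     * is lowest relative to the goal, i.e. the one with the most headroom.
     */
    public String goalOriented(Map<String, Double> avgResponseTimeMs, double goalMs) {
        return avgResponseTimeMs.entrySet().stream()
                .min((a, b) -> Double.compare(a.getValue() / goalMs, b.getValue() / goalMs))
                .map(Map.Entry::getKey)
                .orElseThrow(() -> new IllegalStateException("no candidate AOR"));
    }

    public static void main(String[] args) {
        RoutingStyles r = new RoutingStyles();
        List<String> aors = List.of("AOR1", "AOR2", "AOR3");
        System.out.println(r.roundRobin(aors));                           // AOR1, then AOR2, ...
        System.out.println(r.goalOriented(
                Map.of("AOR1", 45.0, "AOR2", 20.0, "AOR3", 32.0), 30.0)); // AOR2
    }
}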
LCL architecture: the start of High Availability



- Implementation of shared temporary storage queue (TSQ) servers
- Implementation of named counter servers
  o They give each application a unique id across the sysplex
- The DEV environments have the same architecture (1 TOR, 2 AORs)
  o To make sure that no affinities between transactions are created during application development
LCL architecture: the start of High Availability

After one year, we decided to stop using EYU9XLOP
- Because of the failover of one LPAR, which had a business impact
  o CICS response times degraded heavily in one LPAR (DB2 problem)
  o CICSPlex SM kept routing work to that LPAR
- The service classes were probably not set correctly
- The CICSPlex SM/WLM reaction delay was too long
  o The LPAR failed and we had to re-IPL it
  o The remaining LPAR could not handle the workload
  o The WAS servers failed to handle the incoming requests

We decided to rewrite our routing program to better fit our needs.
LCL architecture: the start of High Availability

What we learned from this architecture:
- Having only one LPAR left is not sufficient in case of a failover during the day
- We have to restart the critical applications first

We decided to have 4 application LPARs.
- In case of a failover, only 25% of the workload has to be redistributed onto the 3 remaining LPARs

The SNA servers have been replaced by CICS TGs on z/OS.
- We had noticed some affinities between the LU6.2 connections and the TORs
- Loose coupling between WAS and CICS (logical server names instead of APPLIDs)

White paper: ftp://public.dhe.ibm.com/software/htp/cics/tserver/v32/library/WSW14020-USEN00_systemz_harmony_0324A.pdf
LCL architecture: the start of High Availability

- At that time, the CICS TGs were seen as black boxes (prior to version 7.0)
  o No statistics were available
- At first, the CICS TG architecture was mapped onto the existing CICS one
  o Multi-channel
- No standard monitoring was available with the tools on the z platform
- Introscope (Wily Technology) was implemented on one CICS TG
  o Only one CICS TG, because of the overhead
- A dashboard was implemented with the help of the vendor
  o To monitor the JCA pool activity
  o To monitor the activity and the CICS response times
The tools: Introscope (Wily Technology)
[Screenshot of the Introscope dashboard.]
Monitoring tools

[Original slide: a table comparing the monitoring tooling before and now, by time horizon (real time, D+1, from D+2). The table layout did not survive extraction; the recoverable content is listed below.]
- MAINVIEW CICS/DB2/WMQ: real-time view; system administration and tuning; before: analysis with 3 separate tools, LPAR by LPAR, with history (LCL) of less than 1 day (TP DB2: 15 min); now: problem analysis in 5 minutes, trend analysis
- Introscope: limited problem analysis at first (impossible for DB2/LCL), now no problem analysis
- CICS PA (from D+1): analysis of the CICS and CTG SMF records (DB2, WMQ information); problem analysis; trend analysis
- SAS / metrology figures (from D+2): aggregated SMF records, aggregated over 1 hour; monthly consolidation; problem analysis; trend analysis
The tools: CICS Performance Analyzer (IBM)

CICS Performance Analyzer allows us to do:
- Performance analysis by exploiting the SMF 110 (subtype 1) records
- Tuning of our CICS TGs (SMF 111 records, available since V7.0)
  o Do we have enough connection managers?
  o Worker information is taken from DFHXCURM
  o We would like to have "cross domain" information in the batch reports
    • Peak number of connection managers, number of requests, CICS response time, daemon response time, and so on

An RFE has been raised, no. 46252: you can vote for it!
LCL: Current architecture

- 2 sysplex distributors: round-robin algorithm (they can back each other up)
- CICS TG on z/OS using port sharing (ports dedicated per business line)
- DB2 data sharing, RLS, WebSphere MQ queue sharing group, shared TSQ servers, named counter servers
- 4 application LPARs
- 2,100 transactions per second at the peak hour (Tuesday morning)
- 95 million transactions per day
- 30 ms average response time
LCL: Current architecture

4 coupling facilities for the LCL production sysplex
- 1 on each of the active CPCs
- 2 external coupling facilities (on the passive CPCs)
  o For the DB2 and RLS lock structures
- 11 GB of memory each
- Use of duplexing
  o Expensive, so it is kept only for the DB2 group buffer pools
  o 20% of CPU saved on IRLM since duplexing of the DB2 lock structure was removed
- Automatic rebuild for the other structures
- A double failure is not handled: the simultaneous loss of DB2 and of its lock structure
LCL: Current architecture
[Diagram: the sysplex distributor in front of the production servers and the pilot servers (System p).]
Transaction routing CICS TG – TOR: DFHXCURM

The routing between the CICS TGs and the TORs is performed by a DFHXCURM exit that we have customised.
- The ECI request provides (a hedged client-side sketch follows this list):
  o The sysplex distributor address
  o The CICS TG port number to connect to
  o A logical server name (so we remain independent if we need to add or remove a TOR)
  o The name of the program to be executed
  o The transaction id (best practice)

- DFHXCURM integrates a routing table which takes the following criteria into account:
  o The LPAR on which the CICS TG is running
  o A set of target TORs with a "handicap" weight (local TORs are preferred)
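To show what such an ECI request looks like from the client side, here is a minimal sketch using the CICS TG Java base classes (com.ibm.ctg.client). It assumes the seven-argument ECIRequest constructor used in the product samples (call type, server, user id, password, program, transaction id, COMMAREA); the host name, port, logical server name, program and transaction id below are invented placeholders, and the exact API can vary with the CICS TG level.

import com.ibm.ctg.client.ECIRequest;
import com.ibm.ctg.client.JavaGateway;

public class EciCallSketch {
    public static void main(String[] args) throws Exception {
        // The client targets the sysplex distributor address and the shared
        // CICS TG port, not a specific LPAR (placeholders below).
        JavaGateway gateway = new JavaGateway("tcp://sysplex-distributor.example.com", 2006);

        byte[] commarea = new byte[100];
        ECIRequest request = new ECIRequest(
                ECIRequest.ECI_SYNC,   // synchronous ECI call
                "LOGSRV1",             // logical server name, resolved by DFHXCURM to a real TOR
                null,                  // user id
                null,                  // password
                "PGMNAME",             // program to run
                "TRN1",                // transaction id (best practice)
                commarea);             // COMMAREA

        int rc = gateway.flow(request); // 0 normally means the flow itself worked
        System.out.println("flow rc=" + rc + ", CICS rc=" + request.getCicsRc());

        gateway.close();
    }
}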
Transaction routing CICS TG – TOR: DFHXCURM

Part of the routing table integrated into the module:

XCURMTAB TYPE=SYSTEM,SYSNAME=SYSA
XCURMTAB TYPE=GROUP,GROUPID=ZZPL1,TARGET=((PL11,0),(PL12,05))
XCURMTAB TYPE=GROUP,GROUPID=ZZPL9,TARGET=((PL91,0),(PL92,05))
XCURMTAB TYPE=ENDSYS
XCURMTAB TYPE=SYSTEM,SYSNAME=SYSB
XCURMTAB TYPE=GROUP,GROUPID=ZZPL1,TARGET=((PL12,0),(PL11,05))
XCURMTAB TYPE=GROUP,GROUPID=ZZPL9,TARGET=((PL92,0),(PL91,05))
XCURMTAB TYPE=ENDSYS
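Reading the table: for a CICS TG running on SYSA, logical group ZZPL1 prefers PL11 (handicap 0) and falls back to PL12 (handicap 05), and the preferences are reversed on SYSB. The Java sketch below is only a conceptual model of that selection logic; the real DFHXCURM is written in assembler, and the availability test here stands in for the 'NO CICS' feedback the exit actually receives.

import java.util.List;
import java.util.function.Predicate;

/** Conceptual model of the handicap-based TOR selection driven by XCURMTAB. */
public class XcurmTableSketch {

    /** One candidate TOR with its handicap (lower is preferred; local TORs get 0). */
    record Target(String torApplid, int handicap) {}

    /** One GROUPID entry of the table for a given LPAR (TYPE=GROUP line). */
    record GroupEntry(String sysname, String groupId, List<Target> targets) {}

    // Excerpt equivalent to the XCURMTAB statements on the slide.
    static final List<GroupEntry> TABLE = List.of(
            new GroupEntry("SYSA", "ZZPL1", List.of(new Target("PL11", 0), new Target("PL12", 5))),
            new GroupEntry("SYSA", "ZZPL9", List.of(new Target("PL91", 0), new Target("PL92", 5))),
            new GroupEntry("SYSB", "ZZPL1", List.of(new Target("PL12", 0), new Target("PL11", 5))),
            new GroupEntry("SYSB", "ZZPL9", List.of(new Target("PL92", 0), new Target("PL91", 5))));

    /** Pick the available target with the lowest handicap for this LPAR and logical group. */
    static String chooseTor(String sysname, String groupId, Predicate<String> isAvailable) {
        return TABLE.stream()
                .filter(e -> e.sysname().equals(sysname) && e.groupId().equals(groupId))
                .flatMap(e -> e.targets().stream())
                .filter(t -> isAvailable.test(t.torApplid()))
                .min((a, b) -> Integer.compare(a.handicap(), b.handicap()))
                .map(Target::torApplid)
                .orElseThrow(() -> new IllegalStateException("no TOR available for " + groupId));
    }

    public static void main(String[] args) {
        // Normal case on SYSA: the local TOR PL11 wins.
        System.out.println(chooseTor("SYSA", "ZZPL1", applid -> true));                   // PL11
        // PL11 is down ("NO CICS"): fall back to PL12 on the other LPAR.
        System.out.println(chooseTor("SYSA", "ZZPL1", applid -> !applid.equals("PL11"))); // PL12
    }
}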
Transaction routing TOR - AOR: DFHDYP

The request is routed to the least busy AOR (see the sketch after this list)
- The program knows the number of sessions defined between the TOR and each AOR
- It counts the current number of tasks between the TOR and each AOR

We can isolate one or several AORs from routing, to lighten the load on an LPAR.

We can route a transaction or a set of transactions to an AOR or a set of AORs thanks to a configuration file
- In case of affinity

The CSMI transaction is forbidden for dynamic routing
- So that we can tune and analyze performance
- It cannot be disabled in case of problems
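Below is a conceptual Java sketch of the least-busy selection described above: each candidate AOR is scored by its current task count relative to the sessions defined from the TOR, isolated AORs are skipped, and an affinity entry from the configuration file wins outright. The real DFHDYP is an assembler user-replaceable program; the class, record and field names here are invented for illustration.

import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Conceptual model of the customised DFHDYP routing decision. */
public class DfhdypSketch {

    /** One candidate AOR as seen from the TOR (invented field names). */
    record Aor(String name, int definedSessions, int currentTasks) {
        double busyRatio() { return (double) currentTasks / definedSessions; }
    }

    private final Set<String> isolatedAors;          // AORs removed from routing to lighten an LPAR
    private final Map<String, String> affinityTable; // transaction id -> forced AOR (from the config file)

    DfhdypSketch(Set<String> isolatedAors, Map<String, String> affinityTable) {
        this.isolatedAors = isolatedAors;
        this.affinityTable = affinityTable;
    }

    /** Route a transaction: affinity override first, otherwise the least busy non-isolated AOR. */
    String route(String transId, List<Aor> candidates) {
        String forced = affinityTable.get(transId);
        if (forced != null) {
            return forced;
        }
        return candidates.stream()
                .filter(a -> !isolatedAors.contains(a.name()))
                .min(Comparator.comparingDouble(Aor::busyRatio))
                .map(Aor::name)
                .orElseThrow(() -> new IllegalStateException("no AOR available"));
    }

    public static void main(String[] args) {
        DfhdypSketch router = new DfhdypSketch(Set.of("AOR3"), Map.of("TRNA", "AOR2"));
        List<Aor> aors = List.of(new Aor("AOR1", 100, 40), new Aor("AOR2", 100, 10), new Aor("AOR3", 100, 5));
        System.out.println(router.route("TRN1", aors)); // AOR2 (least busy; AOR3 is isolated)
        System.out.println(router.route("TRNA", aors)); // AOR2 (affinity override)
    }
}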
Operations

- Monthly IPL for all the LPARs

- For LCL, only 2 of the 4 LPARs are eligible for batch processing

- During the IPL of one of these two LPARs, batch starts on the second one
  o OPCplex
  o Scheduling environments

- A change request is associated with each IPL

- The changes that we declare can be IPL-dependent (e.g. a migration)
  o If the IPL is delayed, we know all the changes associated with it
Operations

- All the CICS regions in the same LPAR are stopped and restarted once a week
  o START=AUTO, but overridden to a cold start after analyzing the DFHGCD (if the previous shutdown was clean)
  o To be sure not to forget changes that were made dynamically

- The CICS TGs are stopped and restarted one after another every midnight
  o To remove the affinity between the CICS TGs and the TORs (CTG_PIPE_REUSE=ALL)
  o To avoid memory problems
Migration Strategy (1/2)

Use of aliases for the libraries, and of symbols (IEASYMxx member in the z/OS parmlib)
- Transparent for the developers: no JCL modification needed for compiling
- Example:
  o CICSTS.CIC.SDFHAUTH: the alias (used by the developers)
  o It points, for instance, at CICSTS.CIC10.SDFHAUTH
  o CICSTS.CIC&VERCIC..SDFHAUTH is what is referenced in the z/OS parmlib
  o SYMDEF(&VERCIC.='10')
Migration Strategy (2/2)

- A new level is generated for the migration (e.g. 15)
- Definitions are hard-coded in the parmlib so that the new libraries coexist with the two CICS versions (LINKLIST, LPA, APF)
  o CICSTS.CIC15.SDFHLINK
  o CICSTS.CIC&VERCIC..SDFHLINK (with &VERCIC = '10')
  o Taken into account at the next IPL
- This allows us not to migrate all the CICS regions of the same LPAR at the same time
- The new CICS procedure has the CICS libraries hard-coded
- The symbol is set to the new level once the last CICS region of the last production LPAR has been migrated
- REXX procedures have been written (we provide the name of the CICS region to be migrated); they:
  o Generate the CICS files
  o Assemble the PLTxx, SIT and other tables
  o Create the new DFHCSD (one CSD per version)
  o Create the new CICS procedure
Failover (1/2)

The continuity of service is only ensured for the components that are in high availability
- Failover of an LPAR
  o The sysplex distributor routes the requests to one of the CICS TGs running on the 3 remaining LPARs
  o If DB2 holds retained locks, it is automatically restarted on another LPAR to release them, then stops
- Failover of a CICS TG
  o The sysplex distributor sends the request to another CICS TG listening on the same port
- Failover of a TOR
  o DFHXCURM detects the 'NO CICS' error and routes the request to the TOR of another LPAR
Failover (2/2)

- Failover of an AOR
  o DFHDYP can no longer send requests to this CICS region: since no connection is available, it scans the remaining AORs and sends the request to the least busy one

- Failover of DB2
  o DB2 abend: DB2 is restarted by ARM
  o DB2 frozen: the number of current tasks increases (on the TOR/AOR sessions)
    • DFHDYP then sends the requests to another LPAR
IT contingency plan

Every year, we carry out 2 IT contingency exercises for our customers (one for each site)
- We isolate one site
  o The LPARs are restarted on the "passive" CPC of the other site
  o CBU (Capacity Backup) is activated
  o Depending on the preference of our customers, their production can run on this CPC for the weekend or for the whole week
  o We must have the product keys for the "passive" CPC if the products depend on the serial number
High Availability: the pros and the cons

Pros
- Continuity of service in case of unavailability of components
- CICS and CICS TG migrations can be done in production, during service

Cons
- It is a real project, not only a CICS one
- It is not magic
  o If there is an application problem (a loop, a lock on data), it is propagated to the whole CICSplex

LCL architecture: what next?

We are planning a CICSPlex SM implementation
- To use the new CICS TS 5.1 functions
- Lack of assembler skills: use the CICSPlex SM routing program instead
  o The algorithm has been enhanced, and it uses data spaces
  o Service classes can be specified as a percentile
- WUI (Web User Interface): centralized administration of CICS (single point of control)
- But we would like to keep the same flexibility for migrating while in service

We do not use CICS Explorer yet
- It must be installed on virtual servers (Citrix)
- The network flow has to be opened from the IP addresses of these servers to the mainframe
Questions?