3070 CICS TG and CICS in a High Availability Environment
A customer experience with CA-SILCA
Sylvie Constans, Manager of the CICS & IMS team at CA-SILCA
07/11/2015
07/11/2015 PAGE 1
IBM IMPACT 2014 Conference 27 April- 1 May LAS VEGAS
CA-SILCA and the Crédit Agricole SA group
The activities of the Crédit Agricole SA group are organized into 4 business lines:
Retail banking in France and worldwide
Asset management
Specialised financial services
Corporate and investment banking
CA-SILCA is the IT subsidiary of the Crédit Agricole SA group
Located in France (Paris region)
CA-SILCA: Overview
Founded in 2005, initially grouping the IT production of its 3 founding members
Currently more than 40 customers (subsidiaries only)
Center of expertise for the group
Operation services for IT applications
Building infrastructure services: telephony, network, office automation
Workstation provisioning services for employees
CA-SILCA: Overview
Some key figures
3 PB of server storage
36,000 workstations
33,000 mailboxes
550 hosted web sites
4,000 logical servers
16,000 phone lines
A new data center (5,000 m2) composed of 2 sites
CA-SILCA: Mainframe infrastructure
4 zEC12 machines in a 2-site environment, the sites separated by 10 km
2 CPCs (one active, one passive) on each site
Each LPAR on an active CPC has its image on the passive CPC of the other site
2827-731 models
60,000 MIPS
Primary disk array (active data) on one site
Secondary disk array on the second site
Third site for data replication
CA-SILCA: Mainframe activity
9 customers (59 LPARs, 60,000 MIPS)
CA-SILCA manages the system environment of 5 customers
o 400 CICS regions (currently migrating from CICS TS 4.1 to CICS TS 5.1)
o 45 CICS TG (version 8.1)
o 75 DB2 (DB2 V10 migration in CM mode in 2014)
o 60 WebSphere MQ (version 7.1)
o 15 IMS (migration from IMS V11 to IMS V13 in 2014)
o 40 z/OS 1.13 LPARs (z/OS 2.1 migration planned in 2015)
o 6 sysplexes
o Coupling facilities on each CPC
LCL: Le Crédit Lyonnais
Our main customer is LCL (retail bank)
Founded in Lyon in 1863
1,925 branches, 6 million customers in France
70 private banking divisions (150,000 customers)
Retail bank for professionals
Its technical environment:
130 CICS, 40 CICS TG, 25 DB2, 25 WebSphere MQ
o Some applications remain in a non-HA TOR/AOR architecture
o Benefits from a High Availability environment for e-banking and intranet applications
LCL: At the beginning
The branches are connected to the mainframe (data center) based on their location
LCL: At the beginning
[Diagram: three regions (Région Parisienne, Région Sud Est, Région Centre), each connected to its own LPAR (LPAR1, LPAR2, LPAR3); each LPAR hosts its own TOR (TOR1 to TOR3), AOR (AOR1 to AOR3), local DB2 and local VSAM files]
LCL: Current architecture
LCL: At the beginning
In 1995, the mainframes were grouped in the Paris region on 2 sites
The branches were still connected based on their location
Several production problems led LCL to consider a High Availability architecture
In the early 2000s a sysplex was implemented with the help of IBM
9 production LPARs and one DEV LPAR
o Implementation of RLS/SMSVSAM
o DB2 data sharing
First on the DEV LPAR (single partition) to validate the cost
LCL: At the beginning
At the same time, a major data-merging project was started
Regional files became national
DB2 regional databases became national
TORs are accessed through generic resources
A 3-tier architecture was implemented
The WAS servers were connected to the SNA servers
The applications must comply with this architecture
Very few 3270 applications are left
4 LPARs are dedicated to this architecture
o 2 for network purposes, 2 for applications
LCL architecture: the start of High Availability
[Diagram: a production WAS farm and a pilot WAS farm connect over TCP/IP to two SNA gateways (RES1, RES2); each gateway uses LU6.2 sessions through VTAM to reach production and pilot TORs on two LPARs (LP1, LP2), and each TOR routes to production and pilot AORs]
LCL architecture: the start of High Availability
Pilot
Some agencies are connected to the pilot WAS
They have access to the pilot CICS TGs and the pilot CICS regions
We can deploy new versions of programs without impacting all of production
The pilot CICS regions have one extra load library at the front of the DFHRPL concatenation
Production
The rest of the agencies
LCL architecture: the start of High Availability
The routing of transactions to the AORs is managed by the dynamic routing program (DFHDYP) that we customised
At the end of 2003, EYU9XLOP, the dynamic routing program of CICSPlex SM (TS 2.2), was implemented
Because our routing program did not fully satisfy us
Round-robin algorithm
Simplistic
Number of transactions per day: 4 million (TOR+AOR)
Implemented in goal mode, using the service class definitions
o Provides an average response time goal, not a percentile
These CICS regions are clones
No affinities between transactions
LCL architecture: the start of High Availability
Implementation of shared TSQ servers
Implementation of named counter servers
Gives each application a unique id in the sysplex
The DEV environments have the same architecture (1 TOR, 2 AORs)
To be sure not to generate affinities between transactions during application development
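A named counter server hands out values that are unique across the whole sysplex, which is what lets every clone CICS region stamp requests with a unique id. A toy sketch of that behaviour (hypothetical names; the real facility is the EXEC CICS GET COUNTER API backed by the coupling facility, not application code):

```python
import threading

class NamedCounterServer:
    """Toy stand-in for a CICS named counter server: hands out
    unique, monotonically increasing values per counter name."""
    def __init__(self):
        self._lock = threading.Lock()
        self._counters = {}

    def get_next(self, name: str) -> int:
        # The coupling facility makes this atomic sysplex-wide;
        # here a process-wide lock plays that role.
        with self._lock:
            value = self._counters.get(name, 0) + 1
            self._counters[name] = value
            return value

server = NamedCounterServer()
ids = [server.get_next("APPL1") for _ in range(3)]
print(ids)  # [1, 2, 3]
```

Because every AOR draws from the same counter, no two requests ever receive the same id, regardless of which clone served them.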
LCL architecture: the start of High Availability
After one year we decided to stop using EYU9XLOP
Because of the failover of one LPAR, with business impact:
o CICS response times degraded heavily in one LPAR (DB2)
o CICSPlex SM continued to route to this LPAR
The service classes were probably not set correctly
The CICSPlex SM/WLM delay in reacting was too long
o The LPAR failed; we had to do an IPL
o The remaining LPAR couldn't handle the workload
o The WAS servers failed to handle the incoming requests
We decided to rewrite our routing program to better fit our needs
LCL architecture: the start of High Availability
What we learned about this architecture:
Having only one LPAR left is not sufficient in case of failover during the day
We have to restart the critical applications first
We decided to have 4 application LPARs
In case of failover, only 25% of the workload has to be dispatched onto the 3 others
The SNA servers have been replaced by CICS TGs on z/OS
We noticed some affinities between LU6.2 connections and TORs
Loose coupling between WAS and CICS (logical names instead of applids)
White paper: ftp://public.dhe.ibm.com/software/htp/cics/tserver/v32/library/WSW14020-USEN00_systemz_harmony_0324A.pdf
LCL architecture: the start of High Availability
At that time, CICS TGs were seen as black boxes (prior to version 7.0)
No statistics available
At first, the CICS TG architecture was mapped onto the existing CICS one
Multi channel
No standard monitoring available with the tools on the z platform
Introscope (Wily Technology) was implemented on one CICS TG
o Only one CICS TG because of the overhead
A dashboard was implemented with the help of the vendor
o To monitor the JCA pool activity
o To monitor the activity and the CICS response times
The tools: Introscope (Wily Technology)
Monitoring tools

BEFORE
Real time: MAINVIEW CICS/DB2/WMQ
. Real-time vision
. System administration and tuning
. Analysis with 3 tools, LPAR by LPAR
. History (LCL) < 1 day (TP DB2: 15 min)
INTROSCOPE
. Limited problem analysis (impossible for DB2/LCL)
D+1: CICS PA
. Analyzes the CICS and CTG SMF records (DB2, WMQ information)
SAS / Metrology figures
. Aggregated SMF records, aggregated over 1 hour
. Monthly consolidation

CURRENT
Real time: MAINVIEW CICS/DB2/WMQ
. Real-time vision
. System administration and tuning
. Problem analysis in 5 minutes
. Trend analysis
INTROSCOPE
. No problem analysis
D+1: CICS PA
. Analyzes the CICS and CTG SMF records (DB2, WMQ information)
. Problem analysis
. Trend analysis
From D+2: SAS / Metrology figures
. Monthly consolidation
. Problem analysis
. Trend analysis
The tools: CICS Performance Analyzer (IBM)
CICS Performance Analyzer allows us to do:
Performance analysis by exploiting SMF 110(1) records
Tuning of our CICS TGs (SMF 111 records since V7.0)
o Do we have enough connection managers?
o Worker information is taken from DFHXCURM
o We would like to have "cross domain" information in batch reports
• Peak number of connection managers, number of requests, CICS response time, daemon response time...
An RFE has been raised, n°46252: you can vote for it!
LCL: Current architecture
2 sysplex distributors with a round-robin algorithm (they can back each other up)
CICS TG on z/OS using port sharing (ports dedicated by business line)
DB2 data sharing, RLS, WebSphere MQ queue sharing group, shared TSQ servers, named counter servers
4 application LPARs
2,100 transactions/sec at peak hour (Tuesday morning)
95 million transactions per day
30 ms average response time
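The front end can be pictured as a round-robin distributor spreading incoming connections over the CICS TG instances that share a port. A minimal sketch (hypothetical gateway names; the real sysplex distributor works at the TCP/IP layer, not in application code):

```python
import itertools

class RoundRobinDistributor:
    """Toy model of a sysplex distributor: cycles over the CICS TG
    instances listening on the same shared port."""
    def __init__(self, gateways):
        self._cycle = itertools.cycle(gateways)

    def route(self):
        # Each incoming connection goes to the next gateway in turn.
        return next(self._cycle)

dist = RoundRobinDistributor(["CTG_LPAR1:2006", "CTG_LPAR2:2006",
                              "CTG_LPAR3:2006", "CTG_LPAR4:2006"])
targets = [dist.route() for _ in range(5)]
print(targets)
```

With four application LPARs behind the distributor, a failed gateway simply stops receiving its share, which is how the architecture absorbs the loss of one LPAR.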
LCL: Current architecture
4 coupling facilities for the LCL production sysplex
1 on each of the active CPCs
2 external coupling facilities (on the passive CPCs)
o For the DB2 and RLS lock structures
11 GB of memory each
Use of duplexing
o Expensive, so only for the DB2 group buffer pools
o Saved 20% CPU on IRLM since the suppression of duplexing for the DB2 lock structure
Automatic rebuild for the other structures
Double failure not handled: simultaneous loss of DB2 and its lock structure
LCL: Current architecture
[Diagram: the sysplex distributor in front of the z systems routes requests to production servers and pilot servers]
Transaction routing CICS TG – TOR: DFHXCURM
The routing between CICS TGs and TORs is performed by a customised DFHXCURM
The ECI request provides:
o The sysplex distributor address
o The CICS TG port number to connect to
o A logical server name (so we stay independent if we need to add or remove a TOR)
o The name of the program to be executed
o The transaction id (best practice)
It integrates a routing table which takes into account the following criteria:
o The LPAR on which the CICS TG is running
o A set of target TORs with a "handicap" (local TORs are preferred)
Transaction routing CICS TG – TOR: DFHXCURM
Part of the routing table integrated in the module:

* Entries used when running on SYSA (PL11/PL91 local, handicap 0)
XCURMTAB TYPE=SYSTEM,SYSNAME=SYSA
XCURMTAB TYPE=GROUP,GROUPID=ZZPL1,TARGET=((PL11,0),(PL12,05))
XCURMTAB TYPE=GROUP,GROUPID=ZZPL9,TARGET=((PL91,0),(PL92,05))
XCURMTAB TYPE=ENDSYS
* Entries used when running on SYSB (PL12/PL92 local, handicap 0)
XCURMTAB TYPE=SYSTEM,SYSNAME=SYSB
XCURMTAB TYPE=GROUP,GROUPID=ZZPL1,TARGET=((PL12,0),(PL11,05))
XCURMTAB TYPE=GROUP,GROUPID=ZZPL9,TARGET=((PL92,0),(PL91,05))
XCURMTAB TYPE=ENDSYS
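The XCURMTAB entries above can be read as: from each system, prefer the TOR with handicap 0 (the local one) and fall back to the TOR with handicap 05. A sketch of that selection logic, rendering the table in Python (the availability check is an assumption; the real exit reacts to errors rather than querying availability up front):

```python
# The routing table from the slide: per system, per logical group,
# a list of (TOR, handicap) pairs; the lowest handicap wins.
XCURMTAB = {
    "SYSA": {"ZZPL1": [("PL11", 0), ("PL12", 5)],
             "ZZPL9": [("PL91", 0), ("PL92", 5)]},
    "SYSB": {"ZZPL1": [("PL12", 0), ("PL11", 5)],
             "ZZPL9": [("PL92", 0), ("PL91", 5)]},
}

def choose_tor(sysname, groupid, available):
    """Pick the available TOR with the lowest handicap, the way a
    customised DFHXCURM might resolve a logical server name."""
    candidates = [(handicap, tor)
                  for tor, handicap in XCURMTAB[sysname][groupid]
                  if tor in available]
    if not candidates:
        raise RuntimeError("no TOR available for " + groupid)
    return min(candidates)[1]

print(choose_tor("SYSA", "ZZPL1", {"PL11", "PL12"}))  # PL11 (local)
print(choose_tor("SYSA", "ZZPL1", {"PL12"}))          # PL12 (fallback)
```

The symmetry of the two TYPE=SYSTEM blocks is the point: the same module runs everywhere, and each system simply prefers its own TOR.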
Transaction routing TOR - AOR: DFHDYP
The request is routed to the least busy AOR
The program knows the number of sessions defined between the TOR and each AOR
It counts the current number of tasks between the TOR and each AOR
We can isolate one or several AORs from routing to lighten an LPAR
We can route a transaction or a set of transactions to an AOR or a set of AORs thanks to a configuration file
In case of affinity
The CSMI transaction is excluded from routing
In order to be able to tune and analyze performance
Since it can't be disabled in case of problems
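The least-busy choice in the customised DFHDYP can be sketched as picking the AOR with the lowest ratio of in-flight tasks to defined sessions, skipping any AOR that has been isolated (all names and numbers below are hypothetical):

```python
def pick_aor(aors, isolated=frozenset()):
    """aors maps AOR name -> (current_tasks, defined_sessions).
    Returns the eligible AOR with the lowest load ratio, mimicking
    the least-busy rule of the customised DFHDYP."""
    eligible = {name: tasks / sessions
                for name, (tasks, sessions) in aors.items()
                if name not in isolated and sessions > 0}
    if not eligible:
        raise RuntimeError("no AOR available for routing")
    return min(eligible, key=eligible.get)

aors = {"AOR1": (40, 100), "AOR2": (10, 100), "AOR3": (30, 100)}
print(pick_aor(aors))                     # AOR2 (lowest load)
print(pick_aor(aors, isolated={"AOR2"}))  # AOR3 (next least busy)
```

Using the task/session ratio rather than the raw task count keeps the rule fair when AORs have different numbers of defined sessions.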
Operations
Monthly IPL for all the LPARs
For LCL, only 2 out of 4 LPARs are eligible for batch processing
During the IPL of one LPAR, batch starts on the second one
OPCplex
Scheduling environments
A change request is associated with each IPL
Our declared changes can be IPL-dependent (e.g. a migration)
If the IPL is delayed, we know all the associated changes
Operations
All the CICS regions in the same LPAR are stopped/started once a week
Start = AUTO, but overridden as a cold start after analyzing the DFHGCD (if the previous stop was OK)
To be sure not to forget changes done dynamically
CICS TGs are stopped/started one after another every midnight
To suppress the affinity between the CICS TGs and the TORs: CTG_PIPE_REUSE=ALL
To avoid memory problems
Migration Strategy (1/2)
Use of aliases for libraries and symbols: member IEASYM in the z/OS parmlib
Transparent for the developers: no JCL modification needed for compiling
Example:
o CICSTS.CIC.SDFHAUTH: alias (used by developers)
o Points, for instance, at CICSTS.CIC10.SDFHAUTH
o CICSTS.CIC&VERCIC..SDFHAUTH is referenced in the z/OS parmlib
o SYMDEF(&VERCIC.='10')
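The substitution works like z/OS system symbols: &VERCIC. in a dataset name is replaced at IPL with the SYMDEF value, so flipping one SYMDEF retargets every library reference. A toy illustration (simplified; the real resolution is done by z/OS itself, not by application code):

```python
import re

SYMDEFS = {"VERCIC": "10"}  # from SYMDEF(&VERCIC.='10') in IEASYM

def resolve(dsname, symdefs=SYMDEFS):
    """Replace &SYM. occurrences the way z/OS resolves system symbols
    in dataset names (the trailing dot ends the symbol)."""
    return re.sub(r"&(\w+)\.", lambda m: symdefs[m.group(1)], dsname)

print(resolve("CICSTS.CIC&VERCIC..SDFHAUTH"))  # CICSTS.CIC10.SDFHAUTH
```

Note the double dot in the source name: the first dot terminates the symbol and is consumed, the second is the real dataset qualifier separator.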
Migration Strategy (2/2)
A new level is generated for the migration (e.g. 15)
Definitions are hard coded in the parmlib so that the new libraries coexist with the 2 CICS versions (LINKLIST, LPA, APF)
CICSTS.CIC15.SDFHLINK
CICSTS.CIC&VERCIC..SDFHLINK (with &VERCIC = '10')
Taken into account during the IPL
Allows us not to migrate all the CICS regions of the same LPAR at one time
The new CICS procedure has the CICS libraries hard coded
The symbol is set to the new level once the last CICS region of the last production LPAR has been migrated
REXX procedures have been written (we provide the name of the CICS region to be migrated):
Generates the CICS files
Assembles the PLTxx, SIT, etc. tables
Creates the new DFHCSD (one CSD per version)
Creates the new CICS procedure
Failover (1/2)
Continuity of service is ensured only for the components in high availability
Failover of an LPAR
o The sysplex distributor will route the requests to one of the CICS TGs running on the 3 remaining LPARs
o If DB2 had retained locks, it is automatically restarted on another LPAR to release the locks, then stops
Failover of a CICS TG
o The sysplex distributor will send the request to another CICS TG listening on the same port
Failover of a TOR
o DFHXCURM detects the 'NO CICS' error and routes the request to the TOR of another LPAR
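Each tier above follows the same pattern: try the preferred instance, and on failure move to the next healthy one. A sketch of the TOR-level retry that DFHXCURM performs on a 'NO CICS' style response (names and the error type are hypothetical stand-ins):

```python
def route_with_failover(tors, send):
    """Try each candidate TOR in preference order; on a failure,
    retry the next one, as the customised DFHXCURM does on 'NO CICS'."""
    last_error = None
    for tor in tors:
        try:
            return send(tor)
        except ConnectionError as err:  # stands in for the NO CICS response
            last_error = err
    raise RuntimeError("all TORs unavailable") from last_error

def fake_send(tor):
    if tor == "TOR_LPAR1":              # simulate a failed local TOR
        raise ConnectionError("NO CICS")
    return "routed to " + tor

print(route_with_failover(["TOR_LPAR1", "TOR_LPAR2"], fake_send))
# routed to TOR_LPAR2
```

Because the retry happens inside the exit, the caller (the CICS TG, and ultimately WAS) never sees the failure of an individual TOR.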
Failover (2/2)
Failover of an AOR
o DFHDYP can no longer send requests to this CICS region (no connection is available); it scans the remaining AORs and sends the request to the least busy one
Failover of DB2
o DB2 abend: DB2 is restarted by ARM
o DB2 frozen: the number of current tasks increases (TOR/AOR sessions)
DFHDYP then sends the request to another LPAR
IT contingency plan
Every year, we perform 2 IT contingency plan exercises for our customers (one for each site)
We isolate one site
o The LPARs are restarted on the "passive" CPC of the other site
o Activation of CBU
o Depending on the preference of our customers, their production can run on this CPC for the weekend or the whole week
o We must have the keys for the products on the "passive" CPC if they depend on the serial number
High Availability: the pros and the cons
Pros
Continuity of service in case of unavailability of components
CICS and CICS TG migrations in production while the service is running
Cons
It's a real project, not only CICS
It's not magic
If there is an application problem (loop, lock on data), it is propagated to the whole CICSplex
LCL Architecture: what next
Planning a CICSPlex SM implementation
To use the new CICS TS 5.1 functionalities
Lack of assembler skills: use the routing program of CICSPlex SM
o The algorithm has been enhanced, with use of data spaces
o Service classes specified as a percentile
WUI (Web User Interface): centralized administration of CICS (SPOC)
But we would like to keep the same flexibility for migration in service
We don't use CICS Explorer yet
It must be installed on virtual servers: Citrix
The network flow must be opened from the IP addresses of these servers to the mainframe
Questions?