High Availability for IBM Power i
www.quick-software-line.com

Troubleshooting

Contents: Communications, Replication jobs, Audit & journaling, Database, IFS, Other objects, Performances, How to contact support
Communications

How does the communication daemon work?

Jobs involved: EDH_xx_SND (sender, source system), PMSYSDEM (TCP/IP daemon, target system), EDH_xx_RCV (receiver, target system).

• The source job EDH_xx_SND is started on the source system; xx is the name of the environment. It sends a connection request to the target system.
• The TCP/IP daemon « PMSYSDEM » is active on the target system. It receives the request and submits the target job EDH_xx_RCV. Then it does a « GiveDescriptor » to transfer control of the communication to the « RCV » job.
• The job EDH_xx_RCV starts. It does a « TakeDescriptor » to take control of the communication with the sender job.

Note: Once communication is established between SND and RCV, the daemon no longer intervenes in the communication process.

Communications

Analysis of communication errors

Step 1
  Function: The sender job is submitted through the Quick-EDD/HA menu (S=Start), by the command PMEDHCTL, or by the command PMEDHSTR.
  In case of issue: Check that the job is submitted correctly and active: WRKACTJOB SBS(PMEDH)
Step 2
  Function: Communication with the target system. The call is received by the TCP/IP daemon.
  In case of issue: Do your systems communicate? Perform a PING to check. Is the daemon active? Use the command PMSYSDEM to check it. Also check with NETSTAT that the TCP/IP port is in « Listen » status.
Step 3
  Function: The daemon submits the receiver job.
  In case of issue: Check the joblog of PMSYSDEMON, which is active in subsystem QSYSWRK, to make sure that the request has been received and that a job has been submitted.
Step 4
  Function: The « communication descriptor » is transmitted.
  In case of issue: Check the PMSYSDEMON joblog for the « Give descriptor » order.
Step 5
  Function: The « communication descriptor » is received.
  In case of issue: Check the receiver joblog for the « Take descriptor » order at the beginning of its execution.
Step 6
  Function: Communications are active between the source and the target.
  In case of issue: The communications scheme is functional. The problem lies within Quick-EDD/HA or is an execution issue: check the joblogs of the sender and receiver jobs.

Communications

Communication daemon PMSYSDEM

Additional elements
• As long as the communications are not properly established, the « sender » and « receiver » jobs of Quick-EDD/HA can be stopped in *IMMED mode, without any specific control.
• The « TCP/IP DAEMON » is only needed to establish the communications. Afterwards it can be stopped and restarted whenever you need, without any risk for the running jobs.

Problem ?
The PMSYSDEMON job uses two PMSYSDEM objects, one *DTAQ and one *USRSPC. These objects can become damaged. This is the case if the job uses a lot of CPU and new communications can't be started.
Procedure to follow:
- Stop the job PMSYSDEMON manually (ENDJOB)
- Delete the two objects: WRKOBJ QUSRSYS/PMSYSDEM, then option 4=Delete
- Restart the daemon: PMSYSDEM OPTION(*STR)

Replication jobs

Diagram of replication jobs
• The journal server(s) (EDH_xx_Jnn, journal reading) feed the job EDH_xx_SND.
• The « SND » job (EDH_xx_SND, sender) transmits the events to the « RCV » job (EDH_xx_RCV, receiver).
• The RCV job transmits the events to the data servers.
• Once processed by the data servers (EDH_xx_Xnn, I/O processing), the events are acknowledged.
• In case of negative acknowledgement, or for new objects, a synchronization is performed by the synchronization jobs (EDH_xx_S01 on the source system, EDH_xx_R01 on the target system).
• The synchronization is acknowledged through a journal entry.

Jobs Analysis

In case the replication has suddenly stopped
• If the replication stops abnormally, the jobs concerned are EDH_xx_SND on the source system and EDH_xx_RCV on the target system. Check the JOBLOG of each of these jobs.
• You can also check the JOBLOG of the data servers EDH_xx_Xnn on the target system.

In case a synchronization job stops suddenly
• If there is a severe error on an object, the synchronization job can end abnormally. REPLICATION KEEPS RUNNING. A new server will be launched.
• Depending on the kind of error, the object concerned will be synchronized again, or a manual action will be needed if the error is « fatal ».
• The error message shows the number of the server. Check the JOBLOG of the job EDH_xx_Snn on the source system and of the corresponding job EDH_xx_Rnn on the target system, « nn » being the number of the server.

Jobs Analysis - Replication
• On the source system, enter WRKJOB EDH_xx_SND, then option 4 (spooled files). The QPJOBLOG file contains the joblog of EDH_xx_SND.
• On the target system, enter WRKJOB EDH_xx_RCV, then option 4 (spooled files). The QPJOBLOG file contains the joblog of EDH_xx_RCV.
NB: WRKJOB JOB(NUMBER/USER/JOB) OUTPUT(*PRINT) OPTION(*JOBLOG) prints the log of an active job.

Jobs Analysis - Synchronization

Abnormal stop of a job
On the target system, enter WRKJOB EDH_xx_Rnn, then option 4 (spooled files).
The QPJOBLOG file contains the joblog of EDH_xx_Rnn.

Jobs Analysis - Synchronization

Error without abnormal end of a job - 1
On the source system, enter + in front of the line that shows a « nok » object. Press F8, then enter 1 in the « Synchro » part. The object in error appears in blue reverse video.

Error without abnormal end of a job - 2
Enter M in front of the object in error. The error message shows the source synchronization job, here EDH_xx_S01. Enter W to access this job, then its JOBLOG.
If the joblog of EDH_xx_Snn is not explicit enough, check the joblog of EDH_xx_Rnn on the target system.

Audit

During the installation of Quick-EDD/HA, system auditing is activated automatically:
- Creation of the audit journal and its associated receiver
- Activation of the system values QAUDCTL and QAUDLVL
• All the objects included in the perimeter of Quick-EDD/HA are automatically audited at the *CHANGE level (during start in 9 or 0 and when a new object appears).

Problem ?
• In case of general trouble, check that auditing is still active on the system and that the audit journal is present. Quick-EDD/HA can't start if the audit journal is missing. The system value QAUDCTL must not be equal to *NONE.
• In case of issues on some objects, check that the object is audited (command DSPOBJD; the audit level must be *CHANGE).
• The audit journal grows by large volumes every day. Check the contents of the journal with the command DSPJRN to determine which kinds of journal entries it contains.
  The most probable cause is an audit level that is too high (*ALL) on some objects.

Journaling

• Journaling is mandatory to process the database properly. It is required in order to replicate in real time. Quick-EDD/HA supports all options (*AFTER or *BOTH, with or without Open/Close, MINENTDTA, Journal Cache, …).
• Journaling is optional for the IFS. Only the applications that are able to UPDATE inside the IFS need journaling: txt files, Java files, SAP, Movex, JDEdwards, Adobe. The same rules apply as for the database.

Problem ?
• A file is not replicated
  - Check that it is journaled (command DSPFD)
  - Check that the journal is taken into account by Quick-EDD/HA and read in sequence
  - Check that the replication has no delay, both in general and for that journal
• No object is replicated
  - Check the list of journals which are processed by Quick-EDD/HA
  - Check the journal server jobs EDH_xx_J01, 02, …
  - Check the target jobs which apply the entries, EDH_xx_X01, 02, …
  - Check the communication jobs EDH_xx_SND and EDH_xx_RCV

Journal receivers management

• Quick-EDD/HA manages the journal receivers automatically:
  - Entirely: management rules for the detachment and deletion of the receivers
  - Partially: for a given journal, you can choose:
    - To use the standard management rules
    - To keep the receivers: detachment is managed but there is no deletion
    - No action: the journal is entirely managed externally

Problem ?
• The journal receivers are not deleted on the source system
  - Check the standard management rules. Do you have the issue for only one journal?
  - Check whether the journal has specific options
  - This operation is managed by the « SND » job: check its JOBLOG
  - Check that there is no receiver in partial status for that journal
• The journal receivers are not deleted on the target system
  - Check the setting « Receivers management » in the target system description to make sure that option 1 or 2 is activated
  - Check that the receivers library is replicated
  - This operation is managed by the « RCV » job: check its JOBLOG

Database

Database replication represents the biggest part of the replication, often more than 90% of the activity. Several issues can appear on those objects:
• Management of the database object
  - Complex object structure, e.g. BLOB and CLOB fields
  - Object dependencies (LF, joined files, referential constraints, …)
  - Trigger management
  - Number of access paths, which has an impact on performance
• Data management
  - Replication in real time: all the journal entries are taken into account
  - SQL management follows rules which are different from classical DB2
• Use of the target data
  - To provide R.O.I., Quick-EDD/HA allows you to access the data on the target system in read mode
  - The different needs on the target system can create constraints for the real-time replication

IFS

IFS replication is often very simple because, most of the time, it deals with files which are created and then stored (EDI, archiving, …). The main difficulty with the IFS is checking the contents. The IFS files themselves are often simple and small; however, coherence checks become tough because of the tree structure and the number of objects (you can have millions of objects on hundreds of levels).
• Audit and journaling
  - As for any other object, the IFS files are audited
  - Journaling is rarely mandatory (txt files, Java files, SAP, Movex, JDEdwards, Adobe)

Problem ?
• The replication is not done
  - As for any other object, check the audit level
  - The replication is managed by the synchronization jobs: display the messages of the objects to find the synchronization job concerned, then check the JOBLOGs of the source and target jobs
• Journaled IFS?
  - In case of a journaled IFS, as for the database, check that the object is journaled properly, then check that the journal is included in the list of Quick-EDD/HA and that it is properly processed

IFS - QDLS

QDLS comes from older releases of the OS and corresponds to a « DOS » structure. It is integrated into the IFS with some specific considerations:
• Audit and journaling
  - The files of QDLS are audited like any other object
  - The journaling of QDLS is IMPOSSIBLE
  - To use QDLS, the user profile must be registered in the system directory (WRKDIRE)

Problem ?
• Replication is not done
  - As for any other object, check the audit level of the object
  - Check that the user profile used for the replication is properly registered in the system directory (WRKDIRE)
  - As for the IFS, check the messages at the object level, and display the JOBLOG of the corresponding source and target synchronization jobs

Other objects

There are many object types in the system. However, they all work the same way:
• For Quick-EDD/HA, all the objects are managed with the same rules
  - Definition inside a group
  - Real-time replication of the journal events by the data servers
  - The synchronization servers process all types of objects: system objects, IFS, system values or spooled files

Problem ?
• The replication is not done
  - For any type of object, check the audit level
  - Check the messages at the object level, and display the JOBLOG of the corresponding source and target jobs
  - For spooled files, check that the system value QAUDLVL uses the special value *SPLFDTA

Performances

Performance relies on three distinct points:
• The ability to read the journal entries on the SOURCE system
• The communications, with the bandwidth of the line between the source and the target
• The ability of the target system to process the I/Os

Problem ?
• Does the SOURCE system have enough resources for the EDH_xx_Jnn servers?
  Check the memory pool usage. By default, jobs run in the *BASE pool. It can be beneficial to create a dedicated pool.
• Is the bandwidth of the communication line suited to the replication needs?
  Check that the line capacity matches the replication needs.
• Is the communication line dedicated to the replication?
  Check the line usage.
• Does the target system have enough resources to process the I/Os at the same rhythm as the SOURCE system?
  The disks and the number of arms of the target system are very important and must be equivalent to the ones on the source system. This point is often neglected: either the target system uses old-generation disks, or it has fewer arms because of large-capacity disks.
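As a rough illustration of the bandwidth question above, the following Python sketch compares a line's capacity with the volume of journal entries to ship. It is only a back-of-the-envelope aid and not part of Quick-EDD/HA: the function names, the 20% protocol overhead and the sample figures are all assumptions chosen for the example, and the real volume has to be measured on your own journals.

```python
# Hypothetical sizing helpers; not part of Quick-EDD/HA.
# Units: MB = 10**6 bytes, Mbit = 10**6 bits.

def required_mbit_per_s(journal_mb_per_hour: float, overhead: float = 1.2) -> float:
    """Average line rate needed to ship one hour of journal entries in one hour.

    `overhead` is an assumed multiplier for protocol and framing costs.
    """
    megabits_per_hour = journal_mb_per_hour * 8 * overhead
    return megabits_per_hour / 3600


def catch_up_hours(backlog_mb: float, line_mbit_per_s: float,
                   journal_mb_per_hour: float, overhead: float = 1.2) -> float:
    """Hours needed to clear a backlog while new entries keep arriving.

    Returns infinity when the line cannot even keep up with the new entries.
    """
    spare = line_mbit_per_s - required_mbit_per_s(journal_mb_per_hour, overhead)
    if spare <= 0:
        return float("inf")
    backlog_megabits = backlog_mb * 8 * overhead
    return backlog_megabits / spare / 3600


if __name__ == "__main__":
    # Sample figures only: 4500 MB of journal entries per hour, a 20 Mbit/s
    # line, and 9000 MB of backlog after an outage.
    print(required_mbit_per_s(4500))
    print(catch_up_hours(9000, 20, 4500))
```

If the catch-up time comes out infinite or very long, the line is undersized for the journal volume, or it is shared with other traffic, which is the situation the checks above are meant to detect.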
Contact the SUPPORT

Several ways to contact support:
- By phone: +33 153 102 767
- By email: [email protected]
- Via Skype: support.traders.fr

To register your issue properly, the Support will probably ask for the following elements:
- the release of Quick-EDD/HA used on your systems
- the release of OS/400 on the source and target machines

If your issue concerns a product abnormality, you will have to provide:
- the JOBLOG of the source jobs
- the JOBLOG of the target jobs

Note: If you have a specific issue, the Support may need the object concerned, so that the development teams can analyze the issue on our test systems.