High Availability for IBM Power i
www.quick-software-line.com
Troubleshooting
• Communications
• Replication jobs
• Audit & journaling
• Database
• IFS
• Other objects
• Performances
• How to contact support

Communications
How does the communication daemon work?
Jobs involved: EDH_xx_SND (sender, source system), PMSYSDEM (TCP/IP daemon, target system), EDH_xx_RCV (receiver, target system).
• The source job EDH_xx_SND is started on the source system; xx is the name of the environment.
• It sends a connection request to the target system, where the TCP/IP daemon « PMSYSDEM » is active.
• The daemon receives the request and submits the target job EDH_xx_RCV.
• Then it does a « GiveDescriptor » to transfer control of the communication to the « RCV » job.
• The job EDH_xx_RCV starts. It does a « TakeDescriptor » to take control of the communication with the sender job.
Note: Once communication is established between SND and RCV, the daemon no longer intervenes in the communication process.
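
The hand-off described above can be illustrated with a small Python analogy. This is only a sketch under stated assumptions: it uses threads and a queue inside one process, whereas the product passes the connection between separate jobs. The function names `daemon`, `receiver` and `run_demo` are hypothetical, and the link to the IBM i socket APIs givedescriptor() and takedescriptor() is inferred from the names on the slide, not confirmed by it.

```python
import queue
import socket
import threading


def receiver(handoff: queue.Queue, log: list) -> None:
    """Plays the role of EDH_xx_RCV: take the connection, then talk to the sender."""
    conn = handoff.get()              # analogue of « TakeDescriptor »
    with conn:
        data = conn.recv(1024)
        log.append(data)
        conn.sendall(b"ACK " + data)  # the receiver answers the sender directly


def daemon(listener: socket.socket, log: list) -> threading.Thread:
    """Plays the role of PMSYSDEM: accept one request, start a receiver, hand over."""
    conn, _addr = listener.accept()
    handoff = queue.Queue(maxsize=1)
    worker = threading.Thread(target=receiver, args=(handoff, log))
    worker.start()                    # analogue of submitting EDH_xx_RCV
    handoff.put(conn)                 # analogue of « GiveDescriptor »
    return worker                     # from here on the daemon never touches conn


def run_demo() -> bytes:
    log = []
    result = {}
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        listener.listen(1)
        port = listener.getsockname()[1]
        d = threading.Thread(
            target=lambda: result.update(worker=daemon(listener, log))
        )
        d.start()
        # Plays the role of EDH_xx_SND: connect and send one event.
        with socket.create_connection(("127.0.0.1", port), timeout=5) as snd:
            snd.sendall(b"event-1")
            reply = snd.recv(1024)
        d.join()
        result["worker"].join()
    return reply


if __name__ == "__main__":
    print(run_demo())
```

Once the connection has been handed over, only the sender and the receiver exchange data, which matches the note that the daemon can be stopped without affecting established communications.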

Communications
Analysis of communication errors
Step 1
Function: The sender job is submitted through the Quick-EDD/HA menu (S=Start), by the command PMEDHCTL, or by the command PMEDHSTR
In case of issue: Check that the job is submitted correctly and active: WRKACTJOB SBS(PMEDH)
Step 2
Function: Communication with the target system; the call is received by the TCP/IP daemon
In case of issue: Do your systems communicate? Perform a PING to check. Is the daemon active? Use the command PMSYSDEM to check. Also check with NETSTAT that the TCP/IP port is in « Listen » status
Step 3
Function: The daemon submits the receiver job
In case of issue: Check the PMSYSDEMON joblog (job active in subsystem QSYSWRK) to ensure that the request has been received and that a job has been submitted
Step 4
Function: The « communication descriptor » is transmitted
In case of issue: Check the PMSYSDEMON joblog for the « Give descriptor » order
Step 5
Function: The « communication descriptor » is received
In case of issue: Check the receiver joblog for the « Take descriptor » order at the beginning of its execution
Step 6
Function: Communications are active between the source and the target
In case of issue: The communications scheme is functional, so the problem lies within Quick-EDD/HA or is an execution issue. Check the joblogs of the sender and receiver jobs
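
Step 2 above (is the daemon port reachable and listening?) can also be checked from a workstation with a plain TCP connection attempt. The sketch below is an assumption-laden helper, not part of Quick-EDD/HA: the function name `port_accepts_connections` is hypothetical, and the host name and port number you pass in must come from your own configuration, since the slides do not state which port the daemon uses.

```python
import socket


def port_accepts_connections(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    A True result means something is listening on that port and the network
    path is open. It does not prove that the listener is the PMSYSDEM daemon.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

A False result does not say which part failed: it can mean that the target system is unreachable, that a firewall blocks the port, or that the daemon is not active, so it complements the PING and NETSTAT checks rather than replacing them.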

Communications
Communication daemon PMSYSDEM
Additional elements
• As long as the communications are not properly established, the « sender » and « receiver » jobs of Quick-EDD/HA can be stopped in *IMMED mode, without any specific control
• The « TCP/IP daemon » is needed only to establish the communications. Afterwards it can be stopped and restarted at any time, without any risk for the running jobs
Problem?
The PMSYSDEMON job uses two objects named PMSYSDEM, one *DTAQ and one *USRSPC. These objects can become damaged. The typical symptoms are that the job uses a lot of CPU and that new communications cannot be started.
Procedure to follow:
- Manually stop the job PMSYSDEMON (ENDJOB)
- Delete the two objects: WRKOBJ QUSRSYS/PMSYSDEM, then option 4=Delete
- Restart the daemon: PMSYSDEM OPTION(*STR)

Replication jobs
Diagram of replication jobs
• EDH_xx_Jnn (journals reading): the journal server(s) feed(s) the job EDH_xx_SND
• EDH_xx_SND (sender): the « SND » job transmits the events to RCV
• EDH_xx_RCV (receiver): the RCV job transmits the events to the data servers. Once processed, the events are acknowledged
• EDH_xx_Xnn (I/Os processing): the data servers
• EDH_xx_S01 and EDH_xx_R01 (Synchro 1): in case of negative acknowledgement, or for new objects, a synchronization is performed. The synchro is acknowledged by means of a journal entry
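
When working through joblogs it helps to know at a glance what a given job does and on which system it runs. The sketch below maps the job names used in these slides to a role and a system. It assumes that « xx » is alphanumeric and that « nn » is always two digits, which the slides suggest but do not state; `JobInfo` and `classify_job` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class JobInfo(NamedTuple):
    environment: str
    role: str
    system: str              # "source" or "target"
    server: Optional[int]    # server number nn, None for SND and RCV


# Suffixes without a server number.
_FIXED = {
    "SND": ("sender", "source"),
    "RCV": ("receiver", "target"),
}

# One-letter prefixes followed by a two-digit server number.
_NUMBERED = {
    "J": ("journal reader", "source"),
    "X": ("data server (I/O processing)", "target"),
    "S": ("synchronization", "source"),
    "R": ("synchronization", "target"),
}

# Assumption: environment name is alphanumeric, server number is two digits.
_PATTERN = re.compile(r"^EDH_([A-Za-z0-9]+)_(SND|RCV|[JXSR]\d{2})$")


def classify_job(name: str) -> Optional[JobInfo]:
    """Return role and system for a replication job name, or None if it does not match."""
    match = _PATTERN.match(name.strip().upper())
    if match is None:
        return None
    environment, suffix = match.groups()
    if suffix in _FIXED:
        role, system = _FIXED[suffix]
        return JobInfo(environment, role, system, None)
    role, system = _NUMBERED[suffix[0]]
    return JobInfo(environment, role, system, int(suffix[1:]))
```

For example, `classify_job("EDH_PR_R01")` reports a synchronization job on the target system with server number 1, where « PR » is a made-up environment name.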

Jobs Analysis
In case the replication has suddenly stopped
• If the replication stops abnormally, the jobs concerned are EDH_xx_SND on the source system and EDH_xx_RCV on the target system
  - Check the JOBLOG of each of those jobs
  - You can also check the JOBLOG of the data servers EDH_xx_Xnn on the target system
In case a synchronization job stops suddenly
• If there is a severe error on an object, the synchronization job can stop abnormally.
  - REPLICATION KEEPS RUNNING. A new server will be launched
  - Depending on the kind of error, the object concerned will be synchronized again, or a manual action will be needed if the error is « fatal »
  - The error message shows the number of the server. Check the JOBLOG of the job EDH_xx_Snn on the source system and of the corresponding job EDH_xx_Rnn on the target system, « nn » being the number of the server

Jobs Analysis - Replication
On the source system, enter WRKJOB EDH_xx_SND
Enter option 4 to work with the spooled files.
The QPJOBLOG file contains the joblog of EDH_xx_SND
On the target system, enter WRKJOB EDH_xx_RCV
Enter option 4 to work with the spooled files.
The QPJOBLOG file contains the joblog of EDH_xx_RCV
NB: WRKJOB JOB(NUMBER/USER/JOB) OUTPUT(*PRINT) OPTION(*JOBLOG) prints the log of an active job

Jobs Analysis – Synchronization
Abnormal stop of a job
On the target system, enter WRKJOB EDH_xx_Rnn
Enter option 4 to work with the spooled files.
The QPJOBLOG file contains the joblog of EDH_xx_Rnn

Jobs Analysis – Synchronization
Error without abnormal end of a job - 1
On the source system, enter + in front of the line that shows a « nok » object.
Press F8, then enter 1 in the « Synchro » section.
The object in error appears in blue reverse video.

Jobs Analysis – Synchronization
Error without abnormal end of a job - 2
Enter M in front of the object in error.
The error message shows the source synchronization job, here EDH_xx_S01.
Enter W to access this job, then its JOBLOG.
If the joblog of EDH_xx_Snn is not explicit enough, check the joblog of EDH_xx_Rnn on the target system.

Audit
• During the installation of Quick-EDD/HA, system auditing is activated automatically
  - Creation of the audit journal and its associated receiver
  - Activation of the system values QAUDCTL and QAUDLVL
• All the objects included in the perimeter of Quick-EDD/HA are automatically audited at *CHANGE level (during start in 9 or 0 and when a new object appears)
Problem?
• In case of general trouble, check that auditing is still active on the system and that the audit journal is present. Quick-EDD/HA cannot start if the audit journal is missing, and the system value QAUDCTL must not be *NONE
• In case of issues on some objects, check that the object is audited (command DSPOBJD: the audit level must be *CHANGE)
• The audit journal generates large volumes every day: check the contents of the journal with the command DSPJRN to determine which kinds of journal entries it contains
  - The most probable cause is an audit level that is too high (*ALL) on some objects

Journaling
• Journaling is mandatory to process the database properly
  - It is required to replicate in real time
  - Quick-EDD/HA supports all options (*AFTER or *BOTH, with or without Open/Close, MINENTDTA, journal cache, …)
• Journaling is optional for the IFS
  - Only applications that UPDATE files inside the IFS need journaling: txt files, Java files, SAP, Movex, JDEdwards, Adobe
  - Same rules as for the database
Problem?
• A file is not replicated
  - Check that it is journaled (command DSPFD)
  - Check that the journal is taken into account by Quick-EDD/HA and read in sequence
  - Check that the replication has no delay, both in general and for that journal
• No object is replicated
  - Check the list of journals which are processed by Quick-EDD/HA
  - Check the journal server jobs EDH_xx_J01, 02, …
  - Check the target jobs which apply the entries, EDH_xx_X01, 02, …
  - Check the communication jobs EDH_xx_SND and EDH_xx_RCV

Journal receivers management
• Quick-EDD/HA manages the journal receivers automatically:
  - Entirely: management rules for the detachment and deletion of the receivers
  - Partially: for a journal, you can choose:
    - To use the standard management rules
    - To keep the receivers: detachment is managed, but not deletion
    - No action: the journal is entirely managed externally
Problem?
• The journal receivers are not deleted on the source system
  - Check the standard management rules. Does the issue affect only one journal?
  - Check whether the journal has specific options
  - This operation is managed by the « SND » job: check its JOBLOG
  - Check that there is no receiver in partial status for that journal
• The journal receivers are not deleted on the target system
  - Check the setting « Receivers management » in the target system description to confirm that option 1 or 2 is activated
  - Check that the receivers library is replicated
  - This operation is managed by the « RCV » job: check its JOBLOG

Database
Database replication represents the largest part of the replication, often more than 90% of the activity. Several issues can appear on those objects:
• Management of the database object
  - Complex object structure, e.g. BLOB and CLOB fields, …
  - Object dependencies (LF, joined file, referential constraints …)
  - Trigger management
  - Number of access paths, which has an impact on performance
• Data management
  - Replication in real time: all the journal entries are taken into account
  - SQL management follows rules that differ from classical DB2
• Use of the target data
  - To provide a return on investment (ROI), Quick-EDD/HA allows read-only access to the data on the target system
  - The different needs on the target system can create constraints for the real-time replication

IFS
IFS replication is often very simple because, most of the time, it deals with files which are created, then stored (EDI, archiving …). The main difficulty with the IFS is checking the contents. The IFS files themselves are often simple and small; however, consistency checks become tough because of the tree structure and the number of objects (you can have millions of objects on hundreds of levels).
• Audit and journaling
  - Like any other object, the IFS files are audited
  - Journaling is rarely mandatory (txt files, Java files, SAP, Movex, JDEdwards, Adobe)
Problem?
• The replication is not done
  - As for any other object, check the audit level
  - The replication is managed by the synchronization jobs. Display the messages of the objects to find the synchronization job concerned, then check the JOBLOGs of the source and target jobs
• Journaled IFS?
  - In case of a journaled IFS, as for the database, check that the object is journaled properly, then check that the journal is included in the Quick-EDD/HA list and that it is properly processed

IFS - QDLS
QDLS comes from older releases of the OS and corresponds to a « DOS » structure. It is integrated into the IFS, with some specific considerations:
• Audit and journaling
  - The files of QDLS are audited like any other object
  - Journaling of QDLS is IMPOSSIBLE
  - To use QDLS, the user profile must be registered in the system directory (WRKDIRE)
Problem?
• Replication is not done
  - As for any other object, check the audit level of the object
  - Check that the user profile used for the replication is properly registered in the system directory (WRKDIRE)
  - As for the IFS, check the messages at the object level, and display the JOBLOG of the corresponding source and target synchronization jobs

Other objects
There are many object types in the system. However, they all work the same way:
• For Quick-EDD/HA, all the objects are managed with the same rules
  - Definition inside a group
  - Real-time replication of the journal events by the data servers
  - The synchronization servers process all types of objects: system objects, IFS, system values or spooled files
Problem?
• The replication is not done
  - For any type of object, check the audit level
  - Check the messages at the object level, and display the JOBLOG of the corresponding source and target jobs
  - For spooled files, check that the system value QAUDLVL includes the special value *SPLFDTA

Performances
Performance relies on three distinct points:
• The ability to read the journal entries on the SOURCE system
• The communications, with the bandwidth of the line between the source and the target
• The ability of the target system to process the I/Os
Problem?
• Does the SOURCE system have enough resources for the EDH_xx_Jnn servers?
  - Check the memory pool usage. By default, jobs run in the *BASE pool. It can be beneficial to create a dedicated pool.
• Is the bandwidth of the communication line suited to the replication needs?
  - Check that the line capacity matches the replication needs
• Is the communication line dedicated to the replication?
  - Check the line usage
• Does the target system have enough resources to process the I/Os at the same rhythm as the SOURCE system?
  - The disks and the number of arms of the target system are very important and must be equivalent to those of the source system. This point is often neglected: either the target system uses old-generation disks, or it has fewer arms because of large-capacity disks.
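
The bandwidth question above comes down to simple arithmetic: the volume of journal entries to transmit in a given period, converted to bits per second, must fit within the share of the line available to replication. The sketch below is a rough estimate only; it ignores protocol overhead, compression and peaks within the period, and the function names and parameters are hypothetical.

```python
def required_mbit_per_s(journal_bytes: float, period_seconds: float) -> float:
    """Average line rate (Mbit/s) needed to ship journal_bytes within period_seconds."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    return journal_bytes * 8 / period_seconds / 1_000_000


def line_is_sufficient(
    journal_bytes: float,
    period_seconds: float,
    line_mbit_per_s: float,
    usable_share: float = 1.0,
) -> bool:
    """Compare the required rate with the share of the line usable for replication.

    usable_share is 1.0 for a dedicated line, lower when the line is shared.
    """
    needed = required_mbit_per_s(journal_bytes, period_seconds)
    return needed <= line_mbit_per_s * usable_share
```

As a made-up example, 36 GB of journal entries generated during a one-hour peak needs an average of 80 Mbit/s. A dedicated 100 Mbit/s line covers that, but the same line shared half-and-half with other traffic does not.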
Contact the SUPPORT
Several ways to contact support :
By phone
+33 153 102 767
By Email
[email protected]
Via Skype
support.traders.fr

Contact the SUPPORT
To register your issue properly, Support will probably ask for the following elements:
- release of Quick-EDD/HA used on your systems
- release of OS/400 on the source and target machines
If your issue concerns abnormal behavior of the product, you will have to provide:
- the JOBLOG of the source jobs
- the JOBLOG of the target jobs
Note:
- If you have a specific issue, Support may need the object concerned, so that the development teams can analyze the issue on our test systems