Slide 1
WWTSS-11
Lessons from Large Implementations
Presenters:
Peter VonTluck
Tony Vella
Invensys Operations Management
October, 2012
© 2012 Invensys. All Rights Reserved. The names, logos, and taglines identifying the products and services of Invensys are proprietary marks of
Invensys or its subsidiaries. All third party trademarks and service marks are the proprietary marks of their respective owners.
Contents:
• Optimizing Runtime HMI Performance in a Multi-User Galaxy
• Duplicate Alarms
• Long Time Deployment
• System is Running at High CPU Load
• Unwanted Import of UDO Version IAS 3.1 SP3 P1 into a Galaxy 3.1 SP3
• Tuning Platform\Engine for Large Applications:
– Platform XXX exceed maximum heartbeat timeout of XXX
– Engines\Platforms fail to start up or shut down (watchdog time)
• Tips and Tricks Nice to Know:
– Customize Application Manager to show Name and Description for new
managed apps
– Good query for objects in checked-out state + force everybody checked in
– Automatically generate minidump for App Engines
– Enable PAE to support maximum RAM memory
Slide 5
What do we mean by Large System?
We have observed these issues happening on many large customer
systems: TetraPak, Porsche, Nestle, Thales
• Porsche Zuffenhausen, Germany (Paintshop)
– 150 Platforms
– 120 HMI nodes
– >60 Engines
– >20k Objects (240K Scripts)
– 250K I/O (>50 PLCs)
• Thales, UK (Network Rail)
– 20 Platforms
– 4 HMI nodes (1400 users!)
– 80 Engines
– 30k Objects
– 600K I/O
Slide 6
What do we mean by Large System?
We have observed these issues happening on many large customer
systems: TetraPak, Porsche, Nestle, Thales
• TetraPak SP 2012 (Vinamilk, project in progress)
– 20 platforms
– 80 engines
– 3000 objects
– 6000 I/O
• Nestle (Biessenhofen, Avenches, Timashevsk, Torun)
– 60 platforms
– 40 HMI
– 100 engines
– 20k objects
Slide 7
Optimizing Runtime HMI Performance in
a Multi-User Galaxy
Ideal situation: define a real Multi-User Environment
TN 665 from WDN documents this topic well
Slide 8
Optimizing Runtime HMI performance in
a Multi-Users Galaxy
ISSUE:
When the GR node is busy with deploy operations (for example imports,
object check-ins, deploy\undeploy), the InTouch client application
may display data in ArchestrA Graphics (AG) more slowly (for example,
a delay of 10-20 seconds before values appear in the ArchestrA
Graphic).
Slide 9
Optimizing Runtime HMI performance in
a Multi-Users Galaxy: Solution part 1
Solution:
If the delay is observed only when the GR is busy or when the GR node
is shut down, you can add the registry DWORD value
“ResolverShortcutEnabled” in the following location and set the
value to 1.
HKEY_LOCAL_MACHINE\Software\ArchestrA\Framework\Lmx
Note that the registry value does not exist by default.
Note: this setting changes the standard behavior of the system and
enables Round Robin for the Anonymous Engine cache file.
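The registry change above can also be prepared as a .reg file and imported on each client node. The following is only a sketch: the key path and value name are the ones given on this slide, while the helper name reg_dword_fragment is made up for illustration. It renders the text of a .reg file and does not touch the registry.

```python
def reg_dword_fragment(key_path, value_name, value):
    """Render the text of a .reg file that sets one DWORD value.

    Hypothetical helper for illustration; it only builds a string.
    """
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 bits")
    return "\n".join([
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key_path}]",
        # .reg files store DWORDs as 8 hexadecimal digits
        f'"{value_name}"=dword:{value:08x}',
        "",
    ])


fragment = reg_dword_fragment(
    r"HKEY_LOCAL_MACHINE\Software\ArchestrA\Framework\Lmx",
    "ResolverShortcutEnabled",
    1,
)
print(fragment)
```

Review the generated text before importing it, and back up the registry first.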
Slide 10
Optimizing Runtime HMI performance in
a Multi-User Galaxy: What is Round
Robin? What is Anonymous Engine?
Round Robin: a simple algorithm in which time slices are assigned to
each process in equal portions and in circular order, handling all
processes without priority.
Anonymous engine file: this file holds the object handle cache that is
needed for resolving the object part of a reference. The file is
created when a platform is deployed. Depending on the OS, it is
located in one of the following locations:
<RootDrive>\Documents and Settings\All Users\Application
Data\ArchestrA\Cache (or)
<RootDrive>\ProgramData\ArchestrA\Cache
Slide 11
Optimizing Runtime HMI performance in
a Multi-User Galaxy: How does the
System work?
Standard Behavior:
The HMI application first checks whether the object portion of the
reference is already resolved by looking it up in the object handle
cache (i.e. the anonymous engine cache file). If the object handle is
not found in the cache, the reference is resolved by the GR.
As a consequence, while the GR is busy, all indirect references linked
in the HMI application remain on standby, waiting to be resolved.
Practical effect: a series of ####### instead of good values
Slide 12
Optimizing Runtime HMI performance in
a Multi-User Galaxy: Registry Setting by
Default
Slide 13
Optimizing Runtime HMI performance in
a Multi-User Galaxy: Registry setting
with the keys created
Slide 14
Optimizing Runtime HMI performance in
a Multi-User Galaxy: Changing Standard
behavior
With the above registry setting, when the GR is busy the references
are resolved from the cache file present in the first available
engine of the galaxy*.
IMPORTANT*:
The system starts the search for available Anonymous engines based on
the Platform ID. Because it goes from engine to engine while the GR
is busy, it can take a significant time to return to WindowViewer if
there are a lot of engines and platforms to examine. It is therefore
highly recommended to have the AOS on a Platform with a low
Platform ID.
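To see why a low Platform ID helps, here is a toy model of the search described above. It is not the product's actual algorithm. It assumes platforms are examined one by one in ascending Platform ID order, and the function name and sample IDs are invented for illustration.

```python
def platforms_examined(platform_ids, available_ids):
    """Count platforms visited before an available one is found.

    Toy model: assumes the search walks platforms in ascending ID order.
    Returns (number_examined, chosen_platform_id_or_None).
    """
    for examined, pid in enumerate(sorted(platform_ids), start=1):
        if pid in available_ids:
            return examined, pid
    return len(platform_ids), None


all_platforms = list(range(1, 151))           # e.g. a 150-platform galaxy
low_id = platforms_examined(all_platforms, {2})
high_id = platforms_examined(all_platforms, {140})
print(low_id, high_id)
```

In this model, an AOS on Platform 2 is found after two lookups, while the same AOS on Platform 140 is found only after 140.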
Slide 15
Enable Round Robin for Anonymous
Engine cache file: Optimization –
Solution Part 2
Optimization: It has been observed that even after enabling round
robin for anonymous engines, runtime performance was not as
expected. On further investigation, we discovered the issue was due
to a high number of broken references.
Broken references: Broken references cause massive reference
binding on the GR. It is important to find out whether there are
broken references in the PLC and also in the Graphics, and to delete
them.
Slide 16
Optimization–Solution Part 2: How to
Find Out if You Have Broken References
You can check how many bind requests are resolved by the GR by
monitoring the following attributes:
GR.BindCnt: Number of bind requests resolved by GR
GR.BindFailCnt: Number of bind failures
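One simple way to use these two counters is to sample them twice and compare how much each has grown. The sketch below assumes the values have already been read by whatever means you use, for example copied from Object Viewer. The sample numbers and the function name are invented.

```python
def bind_counter_deltas(before, after):
    """Return how much each GR bind counter grew between two samples."""
    names = ("GR.BindCnt", "GR.BindFailCnt")
    return {name: after[name] - before[name] for name in names}


# Made-up samples taken a few minutes apart
before = {"GR.BindCnt": 1000, "GR.BindFailCnt": 40}
after = {"GR.BindCnt": 1600, "GR.BindFailCnt": 340}

deltas = bind_counter_deltas(before, after)
print(deltas)
```

A GR.BindFailCnt that keeps growing while clients are running is a hint that broken references are still hitting the GR.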
Slide 17
How to Find Out if you have Broken
References: GR Log Flags
By enabling the following log flags on the GR, you can search for
reference binding requests hitting the GR:
wwPackageServer: ReferenceBinding
LMX: ReferenceBinding
Slide 18
How to Find out if you have Broken
References: What you Need to Find in
the GR Logger
Slide 19
Optimizing Runtime HMI Performance in
a Multi-Users Galaxy: GR log Flags –
Good References
How the logger looks:
Good references have Galaxy, Platform, and Engine addressed in «exit
results»
Slide 20
Optimizing Runtime HMI Performance in
a Multi-Users Galaxy: Log Flags –
Broken References
Broken references: exit results show a series of Galaxy 0, Platform 0,
Engine 0....
Slide 21
Optimizing Runtime HMI Performance in
a Multi-User Galaxy: Client Logger
“Invalid Reference”
Slide 22
Optimizing Runtime HMI Performance in
a Multi-User Galaxy: Note on the
Registry Setting
The HF is included as of WAS 3.1 SP3.
Check the ReadMe file (Resolved Issues of InTouch 10.1 SP3) for details.
Xref HF 1996 - L00104841
Note: for previous versions, contact Tech Support to get the right HF.
Slide 23
Duplicate Alarms
If your WWALMDB is growing quickly, it might contain duplicate
alarms. A duplicate alarm is a row stored in the AlarmDB multiple
times (same values for every field, including datetime).
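The definition above, rows identical in every field including the timestamp, can be illustrated with a small sketch. It works on rows already exported from the database. The column layout and sample values are hypothetical and do not reflect the real WWALMDB schema.

```python
from collections import Counter


def count_duplicates(rows):
    """Count the surplus copies of rows that are identical in every field."""
    counts = Counter(rows)
    return sum(n - 1 for n in counts.values() if n > 1)


# Hypothetical exported rows: (datetime, tag, alarm type, state)
rows = [
    ("2012-10-01 08:00:00.000", "Tank1.Level", "HiHi", "UNACK_ALM"),
    ("2012-10-01 08:00:00.000", "Tank1.Level", "HiHi", "UNACK_ALM"),
    ("2012-10-01 08:00:05.000", "Tank1.Level", "HiHi", "ACK_ALM"),
]

print(count_duplicates(rows))
```

Only the second row counts as a duplicate here. The third row differs in datetime and state, so it is a legitimate record.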
Invensys-Wonderware has released the following HFs to prevent this
issue:
InTouch 10.1 SP2: CR L00112069, HF 2467
InTouch 10.1 SP2 P01: CR L00117164
InTouch 10.1 SP3: CR L00114473
InTouch 10.1 SP3 P01: CR L00113075, HF 2514
Note: the HF is included in SP2012.
Slide 24
Duplicate Alarms: Cleaning Up
WWALMDB
The HF will prevent the AlarmDB logger from storing duplicate alarms
in your DB.
However, you still need to clean up the DB.
Many queries are available, with differing performance.....
And the winner is....!
Do not use it in production!!!!
Slide 25
Duplicate Alarms: Cleaning up
WWALMDB – Test Results
Tests done:
• 7 GB DB
• 9,000,000 rows
• more than 8,500,000 duplicates
• Deleting duplicates from alarm master and detailed\consolidated:
running time about 1 h!!! (2 days with the first query)
Slide 26
Long time Deployment
ISSUE:
In some cases, InTouch view applications take a long time to deploy,
and at the end graphics are missing.
The problem mainly occurs if a lot of interlaced graphics are used.
Slide 27
Long time Deployment
Cause of the behavior:
Take a look at SQL Profiler during deployment:
Slide 28
Long Time Deployment
Solution:
This issue was caused by repeated calls to a stored procedure
(internal_get_clientcontrol_and_feature_files.sql).
This has been changed to a single call.
The hotfix (L00114486), available for IAS 3.1 SP3, will be part of
IAS 3.5 SP1.
Slide 29
System is Running at High CPU Load
ISSUE:
Historian and Wonderware Information Server, installed on the same
node, are running at high CPU load.
aaRetSVC.exe and w3wp.exe are the processes consuming the CPU.
As a consequence, connection problems occur.
Slide 30
System is Running at High CPU Load
Solution:
Due to repeated connection problems (caused by high CPU load),
history blocks become fragmented. Retrieval therefore takes longer
and consumes more CPU.
As a consequence, connection problems occur again.
Historian and Information Server should not run on the same node, as
they affect each other's performance.
Slide 31
Unwanted Import of UDO version IAS
3.1 SP3 P1 into a Galaxy 3.1 SP3
ISSUE:
In a multi-user environment, it may be necessary to import objects from
a Galaxy with a higher version. In that case (UserdefinedObject), all
the Platforms are then marked with the following icon:
[Diagram: export/import of an aapkg file between the Production GR
(version 3.1 SP3 P1) and the Local GR (version 3.1 SP3)]
Slide 32
Unwanted Import of UDO version IAS
3.1 SP3 P1 into a Galaxy 3.1 SP3
Consequences:
• No deployment of objects is possible BEFORE the Platform nodes are
redeployed.
• Unwanted software changes are deployed to all the Platforms.
Slide 33
Unwanted Import of UDO version IAS
3.1 SP3 P1 into a Galaxy 3.1 SP3
Worst-case scenario:
Restore the backup and redeploy all the nodes during
production!
Any other solutions?
Slide 34
Unwanted Import of UDO version IAS 3.1
SP3 P1 into a Galaxy 3.1 SP3
Solution:
First answer the question:
Which files are changed when a UserdefinedObject
with a higher version is imported into a galaxy?
Slide 35
Recover the Galaxy
Slide 36
Changed files
Recover the Galaxy
1. Run the following commands on the Galaxy Database:
   UPDATE dbo.gobject
   SET software_upgrade_needed = 0
   WHERE software_upgrade_needed <> 0
     AND template_definition_id = 15;
   DELETE FROM dbo.file_pending_update;
2. Stop the GR Platform Engine.
3. Copy (overwrite) the original binaries in the above 2 locations.
4. If the customer opened any fresh remote IDE sessions after
importing the object, some of the (new) editor and package binaries
might have been copied to those remote machines, too. In that case
they need to close those IDE sessions and overwrite those files
with the original binaries in the 2 locations mentioned above.
Changes such as modifying the registry are not required.
Slide 37
Tuning the Platform for Large
Applications
Issue:
«Warning - Platform XXX exceed maximum heartbeat timeout of XXX
ms (The component - NmxSvc)»
Slide 38
Tuning the Platform for Large
Applications
Solution:
Set the proper values in your Platform and AppEngine Configuration
Editors.
Slide 39
Tuning the Engine for Large Applications
Explanation:
maximum heartbeats timeout =
WinPlatform.NetNMXHeartbeatPeriod *
(WinPlatform.NetNMXHeartbeatsMissedConsecMax + 1)
By default, the value of this formula is:
2000 * (3 + 1) = 8000 ms
This corresponds to the timeout message in the Logger: "Platform
1 exceed maximum heartbeats timeout of 8000ms"
Slide 40
Tuning the Engine for Large Applications
Example: after increasing the number of consecutive missed heartbeats
to 6, you see the same message with a timeout of 14000 ms
(2000 * (6 + 1)).
• Maximum heartbeats timeout must be higher than the existing time
difference. (Any communication failure depends on this time limit.)
• Time difference will never exceed the one defined by the formula
mentioned above.
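The formula is easy to check with a few lines of Python. This sketch only reproduces the arithmetic from the slides. The function name is made up, and its parameters stand for the NetNMXHeartbeatPeriod and NetNMXHeartbeatsMissedConsecMax attributes.

```python
def max_heartbeat_timeout_ms(heartbeat_period_ms, missed_consec_max):
    """Maximum heartbeats timeout = period * (consecutive misses allowed + 1)."""
    return heartbeat_period_ms * (missed_consec_max + 1)


default_timeout = max_heartbeat_timeout_ms(2000, 3)  # default settings
tuned_timeout = max_heartbeat_timeout_ms(2000, 6)    # missed heartbeats raised to 6
print(default_timeout, tuned_timeout)
```

The two results, 8000 and 14000, are the values that appear in the Logger message for the default and the tuned configuration.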
Slide 41
Tuning the Engine for Large Applications
Issue: Engines\Platforms fail to start up or shut down
Solution: increase the watchdog timeout (default value is 30000 ms)
Insert the following Registry Settings:
[HKEY_LOCAL_MACHINE\SOFTWARE\ArchestrA\Framework\Platform]
Enter the following values:
• "WatchdogStartupTimeout"=dword:000493e0 (300000 ms)
• "WatchdogShutdownTimeout"=dword:000493e0 (300000 ms)
Note: This should be sufficient for a large system. Setting the values
too high could delay the discovery that the Engine has hung or
crashed during startup or shutdown, since the Bootstrap considers
the Engine healthy until the timeout expires.
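The dword values in a .reg file are hexadecimal, which makes them easy to mistype. A small sketch for converting between milliseconds and the dword notation used above follows. The helper names are made up for illustration.

```python
def ms_to_reg_dword(milliseconds):
    """Format a millisecond value the way a .reg file writes a DWORD."""
    if not 0 <= milliseconds <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 bits")
    return f"dword:{milliseconds:08x}"


def reg_dword_to_ms(text):
    """Parse 'dword:xxxxxxxx' back into an integer number of milliseconds."""
    prefix, _, digits = text.partition(":")
    if prefix.lower() != "dword":
        raise ValueError("expected a value of the form dword:xxxxxxxx")
    return int(digits, 16)


print(ms_to_reg_dword(300000))            # value suggested on this slide
print(ms_to_reg_dword(30000))             # default watchdog timeout
print(reg_dword_to_ms("dword:000493e0"))  # back to milliseconds
```

Converting in both directions is a quick way to confirm that 000493e0 really is 300000 ms before changing a production node.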
Slide 42
Tips & Tricks Nice to Know:
Customize Application Manager to show Name and
Description for new managed apps.
Default: when you create a new InTouch managed app, the default name
provided is IntouchBlanktemplate. You can rename it from the IDE, but
the default name remains in InTouch Application Manager on the
client machines.
How?
Slide 43
Tips & Tricks Nice to Know
Scripts to list all objects in checked-out state
Scripts to force all objects to checked-in state (NOT
supported – ONLY for diagnostics!!!)
Slide 44
Tips & Tricks Nice to Know
Enable PAE to support maximum RAM memory
Physical Address Extension (PAE) is a feature that allows
32-bit x86 processors to access a physical address space
(including random access memory and memory-mapped devices)
larger than 4 gigabytes.
You can enable PAE by opening the Boot.ini file and adding
the /PAE parameter to the ARC path, as shown in the following
example.
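As an illustration of the edit described above, the sketch below appends /PAE to a single Boot.ini operating-system entry. The sample entry is hypothetical, since your ARC path and description will differ, and the function name is made up. Boot.ini exists only on older Windows versions such as Windows XP and Server 2003.

```python
def add_pae_switch(entry):
    """Append /PAE to one Boot.ini OS entry unless it is already present."""
    # Switches follow the closing quote of the description text
    switches = entry.split('"')[-1].upper().split()
    if "/PAE" in switches:
        return entry
    return entry.rstrip() + " /PAE"


# Hypothetical entry from the [operating systems] section
entry = (r'multi(0)disk(0)rdisk(0)partition(1)\WINDOWS='
         r'"Windows Server 2003, Enterprise" /fastdetect')

updated = add_pae_switch(entry)
print(updated)
```

Running the function on an entry that already carries the switch leaves it unchanged, so it is safe to apply twice. Keep a copy of the original Boot.ini before editing it.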
Slide 45
References:
• Wonderware Tech Notes & Articles available:
– Multi-User Development of an ArchestrA Galaxy: Best Practices (TN 665)
– Optimizing InTouch Application Performance (WDN Article)
– Improving Application Performance with ArchestrA Graphics (TN 644)
– Deleting InTouch Application Files Without Affecting the Application (TN 85)
– Industrial Application Server Platform Deployment Checklist (TN 478)
– Fine-Tuning AppEngine Redundancy Settings (TN 401)
– Tuning Recommendations for Redundancy in Large Systems (WDN Article)
Slide 46
Questions?
THANK YOU