MBUF Problems and solutions on VxWorks SNS Integrated Control System


MBUF Problems and solutions on VxWorks

Dave Thompson and cast of many.

SNS Integrated Control System

MBUF Problems

This is usually how it lands in my inbox:

On Tue, 2003-05-06 at 20:38, Kay-Uwe Kasemir wrote:
> Hi:
>
> Neither ics-accl-srv1 nor the CA gateway were able to get to dtl-hprf-ioc3.
>
> Via "cu", the IOC looked fine except for error messages
> (CA_TCP): CAS: Client accept error was "S_errno_ENOBUFS"
> (CA_online): ../online_notify.c: CA beacon error was "S_errno_ENOBUFS"

• This has been a problem since before our front end commissioning, even though we are using PowerPC IOCs and a fully switched, full-duplex, 100 Mbit/s Cisco-based network infrastructure.
• The error is coming from the Channel Access Server.


Contributing Circumstances

(According to Jeff Hill)

• The total number of connected clients is high.
• The server's sustained (data) production rate is higher than the client's sustained consumption rate.
• Clients subscribe for monitor events but do not call ca_pend_event() or ca_poll() to process their CA input queue.
• The server does not get a chance to run.
• The server has multiple stale connections.

And also probably:

• tNetTask does not get to run.
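The second and third circumstances can be illustrated with a toy model. The sketch below is a hypothetical simulation, not EPICS or VxWorks code: the function name `simulate_pool` and its parameters are invented for illustration. It assumes a fixed pool of buffers, a server that queues a fixed number of buffers per tick, and a client that drains a fixed number per tick, and it counts how often an allocation would fail.

```c
/* Hypothetical model, not EPICS or VxWorks code.
 * A server queues `produce` buffers per tick for one client, the client
 * drains up to `consume` buffers per tick, and the pool holds `pool_size`
 * buffers in total. Returns the number of allocations that failed because
 * the pool was empty (the ENOBUFS situation). */
unsigned simulate_pool(unsigned pool_size, unsigned produce,
                       unsigned consume, unsigned ticks)
{
    unsigned in_use = 0;    /* buffers currently queued for the client */
    unsigned failures = 0;  /* allocations refused so far */

    for (unsigned t = 0; t < ticks; t++) {
        /* Server side: try to queue `produce` new buffers. */
        for (unsigned i = 0; i < produce; i++) {
            if (in_use < pool_size)
                in_use++;
            else
                failures++;
        }
        /* Client side: drain what it can this tick. */
        in_use -= (consume < in_use) ? consume : in_use;
    }
    return failures;
}
```

With a pool of 100 buffers over 1000 ticks, a client that keeps up (10 produced, 10 consumed) gives 0 failures, a slightly slow client (10 produced, 8 consumed) gives 1908, and a client that never processes its queue (consume = 0) gives 9900. In this model the pool size only delays the first failure; it does not change the steady-state failure rate.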


Contributing Circumstances

• SNS now has a number of different IOCs:
  » 21 VxWorks IOCs
  » 21 +/- Windows IOCs
  » 1 Linux IOC
• 4 OPIs in the control room and many others on site
• Servers running CA clients like the archiver
• Users remotely logged in running edm via ssh's X tunnel
• CA Gateway
• Other IP clients and services running on vxWorks and servers
• Other IP applications running on IOCs, such as log tasks, etherIP, and serial devices running over IP


Our experience to date

• At SNS we have seen all of the contributing circumstances that Jeff mentions.
• At BNL, Larry Hoff saw the problem on an IOC where the network tasks were being starved.
• Many of our IOCs have heavy connection loads.
• There are some CA client and Java CA client applications which need to be checked.
• IOCs get hard reboots to fix problems and thus leave stale connections.
• Other network problems have existed and been “fixed”, including CA gateway loopback.


Late breaking:

• Jeff Hill was at ORNL last week.
• One of the things he suspected was that noise on the Ethernet wiring causes the link to re-negotiate speed and full/half-duplex operation.
• He confirmed that the combination of the MV2100 and the Cisco switches is prone to frequent auto-negotiation, shutting down Ethernet I/O on the IOC.
• This is not JUST a boot-up problem.


What is an mbuf anyway?

VxWorks uses this structure to avoid calls to the heap functions malloc() and free() from within the network driver.

• mBlks are the nodes that make up a linked list of clusters.
• The clusters store the data while it is in the network stack.
• There is a fixed number of clusters of differing sizes.
• Since a given cluster block can exist on more than one list, you need 2X as many mBlks as clusters.
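The relationship between mBlks and clusters can be sketched with a small model. This is a simplified, hypothetical illustration and NOT the VxWorks definitions: the struct names, fields, pool sizes, and helper functions (`mblk_get`, `buf_alloc`, `buf_share`, `buf_free`, `free_clusters`) are all invented here. It assumes static pools (so no malloc() or free()) and a reference count on each cluster so that two mBlks can point at the same data.

```c
#include <stddef.h>

#define N_CLUSTERS 4
#define N_MBLKS    (2 * N_CLUSTERS)   /* the 2X rule from the text */
#define CL_SIZE    128

struct cluster {
    unsigned char data[CL_SIZE];
    int refs;               /* number of mBlks pointing at this cluster */
};

struct mblk {
    struct mblk *next;      /* next node in a chain */
    struct cluster *cl;     /* data buffer this node refers to */
    int in_use;
};

static struct cluster clusters[N_CLUSTERS];
static struct mblk mblks[N_MBLKS];

/* Take a free mBlk from the static pool; NULL means the pool is exhausted. */
struct mblk *mblk_get(void)
{
    for (size_t i = 0; i < N_MBLKS; i++) {
        if (!mblks[i].in_use) {
            mblks[i].in_use = 1;
            mblks[i].next = NULL;
            mblks[i].cl = NULL;
            return &mblks[i];
        }
    }
    return NULL;
}

/* Pair a free cluster with a fresh mBlk. */
struct mblk *buf_alloc(void)
{
    for (size_t i = 0; i < N_CLUSTERS; i++) {
        if (clusters[i].refs == 0) {
            struct mblk *m = mblk_get();
            if (m != NULL) {
                clusters[i].refs = 1;
                m->cl = &clusters[i];
            }
            return m;
        }
    }
    return NULL;
}

/* Put the same data on a second list without copying it. */
struct mblk *buf_share(const struct mblk *src)
{
    struct mblk *m;
    if (src == NULL || src->cl == NULL)
        return NULL;
    m = mblk_get();
    if (m != NULL) {
        m->cl = src->cl;
        m->cl->refs++;
    }
    return m;
}

/* Release an mBlk; its cluster is free again once the last reference goes. */
void buf_free(struct mblk *m)
{
    if (m->cl != NULL)
        m->cl->refs--;
    m->cl = NULL;
    m->in_use = 0;
}

/* Count clusters that no mBlk refers to. */
int free_clusters(void)
{
    int n = 0;
    for (size_t i = 0; i < N_CLUSTERS; i++)
        if (clusters[i].refs == 0)
            n++;
    return n;
}
```

In this model, sharing every cluster once uses two mBlks per cluster, which is why the mBlk pool is sized at twice the cluster count. With fewer mBlks, `mblk_get()` would return NULL while clusters were still free.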


Mbuf and cluster pools

• Each network interface has its own mbuf pool: netStackDataPoolShow() (aka mbufShow)
• The system has a separate mbuf/cluster pool used for routing, socket information, and the ARP table: netStackSysPoolShow()


Output from mbufShow

number of mbufs: 400
number of times failed to find space: 0
number of times waited for space: 0
number of times drained protocols for space: 0

size     clusters   free     usage
------------------------------------------------------------------------------
64       200        199      1746
128      400        400      190088    <- High turnover rate
256      80         80       337
512      80         80       0
1024     50         50       1
2048     50         50       0
4096     50         50       0
8192     50         50       0

Added at SNS

This one is mis-configured. Why?
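One plausible answer, assuming the 2X rule stated earlier applies to this pool, is that the mbuf count was left at 400 while the cluster counts were raised. The sketch below checks the arithmetic using the figures from the output; the function names are hypothetical and this is not part of any VxWorks API.

```c
#include <stddef.h>

/* Cluster counts copied from the "clusters" column of the mbufShow output. */
static const int cluster_counts[] = { 200, 400, 80, 80, 50, 50, 50, 50 };

/* Sum of all clusters in the pool. */
int total_clusters(void)
{
    int sum = 0;
    for (size_t i = 0; i < sizeof cluster_counts / sizeof cluster_counts[0]; i++)
        sum += cluster_counts[i];
    return sum;
}

/* mBlks needed under the 2X rule. */
int mblks_needed(void)
{
    return 2 * total_clusters();
}

/* How many mBlks are missing for a given configured count (0 if enough). */
int mblk_shortfall(int configured)
{
    int missing = mblks_needed() - configured;
    return missing > 0 ? missing : 0;
}
```

The pool has 960 clusters, so the rule calls for 1920 mBlks; with 400 configured, `mblk_shortfall(400)` is 1520. Under this reading the stack could run out of mBlks long before it runs out of clusters.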


Our Default Net Pool Sizes

You should add these lines to config.h or maybe configAll.h

#define NUM_64         100   /* no. 64 byte clusters */
#define NUM_128        200   /* no. 128 byte clusters */
#define NUM_256         40   /* no. 256 byte clusters */
#define NUM_512         40   /* no. 512 byte clusters */
#define NUM_1024        25   /* no. 1024 byte clusters */
#define NUM_2048        25   /* no. 2048 byte clusters */
/* NUM_4096 and NUM_8192 must also be defined; their values are not shown here. */
#define NUM_CL_BLKS    (NUM_64 + NUM_128 + NUM_256 + \
                        NUM_512 + NUM_1024 + NUM_2048 + \
                        NUM_4096 + NUM_8192)
#define NUM_NET_MBLKS  (2 * (NUM_CL_BLKS))

These will override the definitions in usrNetwork.c.
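To get a feel for what these settings cost, the sketch below totals the cluster count and the raw data bytes for the six sizes listed. It is a rough estimate under stated assumptions: the struct and function names are hypothetical, per-cluster and per-mBlk overhead is ignored, and the 4096 and 8192 byte pools are left out because their values are not given above.

```c
#include <stddef.h>

struct pool_entry {
    int size;   /* cluster size in bytes */
    int count;  /* number of clusters of that size */
};

/* Values from the #define lines above (4096 and 8192 omitted: not given). */
static const struct pool_entry pool[] = {
    {   64, 100 }, {  128, 200 }, {  256, 40 },
    {  512,  40 }, { 1024,  25 }, { 2048, 25 },
};

/* Total clusters, i.e. NUM_CL_BLKS without the two omitted sizes. */
int pool_cl_blks(void)
{
    int sum = 0;
    for (size_t i = 0; i < sizeof pool / sizeof pool[0]; i++)
        sum += pool[i].count;
    return sum;
}

/* mBlks under the 2X rule, mirroring NUM_NET_MBLKS. */
int pool_net_mblks(void)
{
    return 2 * pool_cl_blks();
}

/* Raw cluster data bytes when every count is multiplied by `scale`. */
long pool_payload_bytes(int scale)
{
    long bytes = 0;
    for (size_t i = 0; i < sizeof pool / sizeof pool[0]; i++)
        bytes += (long)pool[i].size * pool[i].count * scale;
    return bytes;
}
```

For the six sizes shown this gives 430 clusters, 860 mBlks, and 139520 bytes of cluster data; `pool_payload_bytes(4)` gives 558080 bytes, which is the kind of figure to weigh against available memory before scaling the pools up.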


What we are doing at SNS

• We are using a kernel addition that provides for setting the network stack sizes on the bootline.
• 4X the vxWorks default sizes are working well.
• We see high use rates for the 128 byte clusters, so that allocation is set extra high.
• Use huge numbers only if trying to diagnose a problem such as a resource leak.
• We are configuring the network interfaces to disable auto-negotiation of speed and duplex.

Code for the kernel addition is available at http://ics web1.sns.ornl.gov/EPICS-S2003
