SNMP process crashes randomly with signal 6 after running switch for long time.

Symptoms
If system uptime exceeds 497 days, the SNMP master process (snmpMaster) can crash with signal 6:
Backtrace (top 20 frames):
#0  0xb7fdf424 in __kernel_vsyscall ()
#1  0xb7ccc086 in epoll_wait () at ../sysdeps/unix/syscall-template.S:82
#2  0xb7eaa6a8 in ipmlRecvEpoll (epollfd=5, peerId=peerId@entry=0xbffff2bc, pkt=pkt@entry=0x8104ce8, flags=flags@entry=0x0, timeout=timeout@entry=0xbffff2d0, status=status@entry=0xbffff22c) at /data3/release-manager/v15_5_3_4/everest_mm/code/system/ipml/src/ipml.c:1416
#3  0xb7eac35b in ipmlRecvOnServiceGroupInternal (group=0x80fb960, peerId=peerId@entry=0xbffff2bc, pkt=pkt@entry=0x8104ce8, flags=0x0, flags@entry=0x8104ce8, timeout=timeout@entry=0xbffff2d0) at /data3/release-manager/v15_5_3_4/everest_mm/code/system/ipml/src/ipmlServiceGroup.c:434
#4  0xb7e9b6dd in dispatchHandler () at dispatch.c:764
#5  0x08082699 in main (argc=1, argv=0xbffff894, envp=0xbffff89c) at mastmain_extr.c:271

The following messages are logged (newest entry first):
03/28/2016 11:56:01.29 <Erro:EPM.proc_conn_lost> MM-B: Connection lost with process snmpMaster
03/28/2016 11:56:01.01 <Crit:Kern.Alert> MM-B: b7fdf42c:  73 74   72   74   61   62   00   2e
03/28/2016 11:56:01.01 <Crit:Kern.Alert> MM-B: b7fdf424: <5d>  5a   59   c3   00   2e   73   68
03/28/2016 11:56:01.01 <Crit:Kern.Alert> MM-B: b7fdf41c:  90 90   90   90   90   90   eb   f3
03/28/2016 11:56:01.01 <Crit:Kern.Alert> MM-B: b7fdf414:  51 52   55   89   e5   0f   34   90
03/28/2016 11:56:01.01 <Crit:Kern.Alert> MM-B: b7fdf40c:  b8 ad   00   00   00   cd   80   90
03/28/2016 11:56:01.01 <Crit:Kern.Alert> MM-B: b7fdf404:  00 00   cd   80   90   8d   76   00
03/28/2016 11:56:01.01 <Crit:Kern.Alert> MM-B: Code:
03/28/2016 11:56:01.01 <Crit:Kern.Alert> MM-B: Process snmpMaster pid 7055 died with signal 6
03/28/2016 11:56:00.54 <Warn:EPM.proc_kill> MM-B: Process snmpMaster ID 7055 killed
03/28/2016 11:56:00.54 <Erro:EPM.Msg.timer_thread> MM-B: Because the snmpMasterMainThread (3082660720) thread of process 7055, has not responded within 41 periods of 15 seconds, the process will be terminated.
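
To check a saved log for this symptom, the "died with signal 6" line above can be matched with a regular expression. The following is a minimal sketch, assuming the log has been exported as plain text in the format shown; the function name find_crashes and the sample list are hypothetical and not part of EXOS.

```python
import re

# Matches lines such as:
#   "... MM-B: Process snmpMaster pid 7055 died with signal 6"
# The pattern is derived only from the log excerpt in this article.
CRASH_RE = re.compile(
    r"Process (?P<name>\S+) pid (?P<pid>\d+) died with signal (?P<signal>\d+)"
)


def find_crashes(lines, process="snmpMaster", signal=6):
    """Return the PIDs of `process` reported as killed by `signal`."""
    pids = []
    for line in lines:
        match = CRASH_RE.search(line)
        if (
            match
            and match.group("name") == process
            and int(match.group("signal")) == signal
        ):
            pids.append(int(match.group("pid")))
    return pids


# Hypothetical input: two lines copied from the excerpt above.
sample = [
    "03/28/2016 11:56:01.29 <Erro:EPM.proc_conn_lost> MM-B: Connection lost with process snmpMaster",
    "03/28/2016 11:56:01.01 <Crit:Kern.Alert> MM-B: Process snmpMaster pid 7055 died with signal 6",
]
print(find_crashes(sample))
```

A match only confirms that snmpMaster was terminated with signal 6; it does not by itself prove that this defect was the cause.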

The issue can also be triggered by changing the system time.
Environment
  • EXOS 15.4.1 or higher
  • SNMP
  • 497 days of system uptime
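
The 497-day figure matches the point at which an unsigned 32-bit counter of hundredths of a second wraps around (2^32 centiseconds is roughly 497.1 days). This article does not state that such a wrap is the mechanism behind the defect, so the link is an assumption. Under that assumption, the arithmetic can be sketched as follows; the function names are hypothetical.

```python
# Assumption: the 497-day threshold corresponds to a 32-bit counter that
# ticks 100 times per second. The article itself does not confirm this.
TICKS_PER_SECOND = 100
COUNTER_MODULUS = 2 ** 32
SECONDS_PER_DAY = 86_400


def wrap_days(ticks_per_second=TICKS_PER_SECOND):
    """Days of uptime after which the 32-bit tick counter wraps."""
    return COUNTER_MODULUS / ticks_per_second / SECONDS_PER_DAY


def days_until_wrap(uptime_seconds):
    """Days remaining before the first wrap, or 0.0 if already past it."""
    wrap_seconds = COUNTER_MODULUS / TICKS_PER_SECOND
    return max(wrap_seconds - uptime_seconds, 0.0) / SECONDS_PER_DAY


print(round(wrap_days(), 2))  # 497.1
```

A helper such as days_until_wrap could be used to decide when to schedule the upgrade or the workaround restart before the threshold is reached.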
Cause
The root cause is xos0064114, "SNMP process crashes randomly with signal 6 after running switch for long time".
Resolution
Upgrade to one of the following EXOS versions or newer:
  • 22.1.1
  • 21.1.2
  • 16.2.2
  • 16.1.3.6-patch1-11
  • 15.6.5.2-patch1-3
  • 15.7.4.2-patch1-2
     
As a temporary workaround, you can periodically restart the snmpMaster process to avoid this issue:
X460G2.1 # restart process "snmpMaster" 
Do you want to save configuration changes to currently selected configuration
file (primary.cfg)? (y or n) Yes
Saving configuration on master ..... done!
Step 1: terminating process snmpMaster gracefully ...
Step 2: starting process snmpMaster ...
Restarted process snmpMaster successfully
X460G2.2 #