Article

Why did an SLX switch unexpectedly reload with a kernel panic after DCMd terminated, with no user activity or intervention?

Symptoms
The switch reloaded with a kernel panic after the DCMd daemon terminated unexpectedly.
No user configuration or network configuration change was in progress at the time.
After the reload, the switch has been running normally.

The switch crashed and logged the following:

[HASM-1000], 1780588,, CRITICAL, SLX9640, Daemon dcm terminated. System initiated reload/failover for recovery.

from corepd ...
This is Snowball (0xfa1) at slot 0 (0)
module_pre_init():Set snowball_version_b0 detected new snowball fruid_ver : 5
xml_parse_signature_node(977): Total SFP media found: [170]
xml_parse_signature_node(980): Total QSFP media found: [40]
xml_parse_signature_node(983): Total QSFP28 media found: [53]
...
********************************************************************************************************
asserted in OM/Worker (DcmNs::ConfdInterfaceObjectManager::readControlAndWorkerSockets())
********************************************************************************************************
DcmNs::ConfdInterfaceObjectManager::readControlAndWorkerSockets()
DcmNs::ConfdInterfaceObjectManager::bootCompleteForThisLocationEventHandler(WaveNs::BootCompleteForThisLocationEvent const*&)
WaveNs::WaveObjectManager::handlePrismEvent(WaveNs::PrismEvent const*&)
WaveNs::PrismThread::start()
WaveNs::PrismPosixThread::pthreadStartMethod(WaveNs::PrismPosixThread*)
0x73d4) [0x7feafb46d3d4]
clone
[HASM-1200], 1780584, FFDC, WARNING, SLX9640, Detected termination of process Dcmd:3774.
[HASM-1000], 1780588,, CRITICAL, SLX9640, Daemon dcm terminated. System initiated reload/failover for recovery.
********************************************
Daemon dcm:3774 died (0)
Failover/Reboot (died), Tue May 19 08:26:26 2020
********************************************

Core file generated!
Tue May 19 08:27:16 2020
: Panic dump in 10 seconds
Graceful shutting down of Databases:
[ RUNNING ] : /fabos/bin/shutdowndcmdb
waiting for server to shut down.... done
server stopped
[1068571.117491] kSWD: jump into panic dump ...
[1068571.168817] Kernel Offset: 0x11000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[1068571.300175] PD Start
[1068571.328406] PD: collecting panicdump
[1068571.374344] sysmodPanicdump size=65568
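
When reviewing collected logs for this signature, the two HASM messages above are the useful markers. The following is a minimal sketch, not a vendor tool: it assumes the RASlog fields are laid out exactly as in the excerpt above (message ID, sequence number, flags, severity, platform, text), and the names RASLOG_RE and find_dcm_termination are hypothetical.

```python
import re

# Assumed field layout, taken from the excerpt above:
# [MSGID], sequence, flags, severity, platform, message text
# The text field stops at the next "[MSGID]," so that two messages
# that ended up on one line are still parsed separately.
RASLOG_RE = re.compile(
    r"\[(?P<msg_id>[A-Z]+-\d+)\],\s*(?P<seq>\d+),\s*(?P<flags>[^,]*),\s*"
    r"(?P<severity>[A-Z]+),\s*(?P<platform>[^,]+),\s*"
    r"(?P<text>.*?)(?=\[[A-Z]+-\d+\],|$)"
)


def find_dcm_termination(lines):
    """Return parsed HASM-1000 entries that report termination of daemon dcm."""
    hits = []
    for line in lines:
        for match in RASLOG_RE.finditer(line):
            entry = match.groupdict()
            if (entry["msg_id"] == "HASM-1000"
                    and "Daemon dcm terminated" in entry["text"]):
                hits.append(entry)
    return hits


sample = [
    "[HASM-1200], 1780584, FFDC, WARNING, SLX9640, "
    "Detected termination of process Dcmd:3774.",
    "[HASM-1000], 1780588,, CRITICAL, SLX9640, "
    "Daemon dcm terminated. System initiated reload/failover for recovery.",
]

for hit in find_dcm_termination(sample):
    print(hit["seq"], hit["severity"], hit["text"])
```

A match on HASM-1000 alone does not prove this specific defect; the stack trace through ConfdInterfaceObjectManager::readControlAndWorkerSockets() shown above is what ties the reload to the control socket timeout.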

 
Environment
SLX 9640
SLXOS 20.1.1
Cause
This is a rare reload scenario that occurs when the control sockets between Confd and DCMd time out.
 
Resolution
The switch must reload in order to recover.
Once rebooted, it functions normally, in most cases without recurrence.

A code fix was added in SLXOS 20.2.1b to eliminate the failover/reload by preventing the Confd and DCMd control socket timeout.

This issue was tracked as defect SLXOS-50960 and was fixed under SLXOS-51294 in SLXOS 20.2.1b.
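
To check a fleet inventory against the fixed release, a version comparison along these lines may help. This is a sketch under stated assumptions: version strings look like 20.1.1 or 20.2.1b, a release with no patch letter sorts before the lettered patches of the same number, and patch letters sort alphabetically. The function names are hypothetical, and whether any particular later release carries the fix should be confirmed in its release notes.

```python
import re


def parse_slxos_version(version):
    """Split a version such as '20.2.1b' into (20, 2, 1, 'b')."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)([a-z]?)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch, letter = match.groups()
    return (int(major), int(minor), int(patch), letter)


# Release named in this article as containing the fix.
FIXED_IN = parse_slxos_version("20.2.1b")


def at_or_after_fix(version):
    # Assumption: tuple ordering reflects release ordering, with an empty
    # patch letter sorting before 'a', 'b', and so on.
    return parse_slxos_version(version) >= FIXED_IN


for candidate in ["20.1.1", "20.2.1", "20.2.1b", "20.2.2"]:
    print(candidate, at_or_after_fix(candidate))
```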

 