Network unstable due to core VSP9000s

Symptoms
The customer's network has been unstable for a long time. Two VSP 9000s form an IST cluster in legacy SMLT. The symptoms and their impact have recently become more severe: one of the VSP 9000s (Core A) reboots with a core file and repeatedly logs task-yielding messages ("# of uSecs elapsed since smltTick last ran.  tMAIN latency is HIGH !"), whereas its IST peer (Core C) does not reboot but logs similar task-yielding messages.

Core A failed almost 20 times with core files in the last 30 days, and the core files and the exceptions recorded in them are almost identical. The exception looks generic, whereas the runprof and wd_stats data indicate that multicast-related activity (IGMP and PIM) is probably keeping the CPU too busy.

Exception :

Lifecycle Crash Reporter: Process Name: cbcp-main.x, Thread Name: tMainTask, Signal 6, Slot: 1, PID 2685, LWP: 3091
[bt] Execution path:
[bt] /opt/appfs/lib/cp/libndlcs.so.1(_Z30nd_lcs_crash_exception_handleriP7siginfoPv+0x138) [0xe657274]
[bt] /opt/appfs/lib/cp/libee_infrastructure.so.1(ee_sigaction_dispatcher+0x47c) [0xdd4ad54]
[bt] [0x100350]
[bt] [0x6a157c80]
[bt] /lib/libc.so.6(abort+0x254) [0xdf0fa04]
[bt] /opt/appfs/lib/cp/libndlog.so.1(_Z13nd_log_dumpbtv+0) [0xe9b2358]
[bt] /opt/appfs/lib/cp/libndlog.so.1(nd_log_report+0x5a0) [0xe9b381c]
[bt] /opt/appfs/lib/cp/libcpp.so.1(ProcessCpHbStopped+0x284) [0xf98c338]
[bt] /opt/appfs/lib/cp/libcpp.so.1(cppHbTic+0x194) [0xf98c548]
[bt] cbcp-main.x(comTimerTask+0x1d0) [0x10644010]
[bt] /opt/appfs/lib/cp/libcpp.so.1(cppScheduleBody+0x528) [0xf987ce8]
[bt] /opt/appfs/lib/cp/libcpp.so.1(cppMainTask+0x1dc) [0xf988670]
[bt] /opt/appfs/lib/cp/libv2l.so.1(task_wrapper+0x278) [0xe35198c]
[bt] cbcp-main.x(ckrmThreadStarter+0x664) [0x1150b434]
[bt] /opt/appfs/lib/cp/libee_infrastructure.so.1(ee_thread_create_start_routine+0x100) [0xdd4cf08]
[bt] /lib/libpthread.so.0(+0x6844) [0xe2f1844]
[bt] /lib/libc.so.6(clone+0x84) [0xdfc1518]  



Snippet from Wd_stats :

Lifecycle: current wd stats for process cbcp-main.x (PID 2685) sub-process main (Sub-PID 1):
Total watchdog outage events: 111
Last missed feed detected at timestamp: 7801472976 (12:24:03 09-04-2019)
Last feeding recovered at timestamp: 7814947050 (12:24:12 09-04-2019)
Last watchdog outage took: 13474 msec
Last missed feed count: 9
[bt] Execution path:
[bt] /opt/appfs/lib/cp/libndlcs.so.1(_Z34nd_lcs_sw_wd_backtrace_sig_handleriP7siginfoPv+0x80) [0xe65c50c]
[bt] /opt/appfs/lib/cp/libee_infrastructure.so.1(ee_sigaction_dispatcher+0x47c) [0xdd4ad54]
[bt] [0x100350]
[bt] cbcp-main.x(_ZN5MCPIM20find_pimrcvrport_recEtjP8mrtentry+0x114) [0x10863268]
[bt] cbcp-main.x(_ZN5MCPIM23k_compute_mfc_EvifEntryEP8mrtentryjjtt+0x19f0) [0x108534bc]
[bt] cbcp-main.x(_ZN5MCPIM9k_chg_mfcEP8mrtentryjjt10VifBitListj+0x1a70) [0x1085567c]
[bt] cbcp-main.x(_ZN5MCPIM17change_interfacesEP8mrtentryt10VifBitListS2_S2_S2_t+0x11ec) [0x1089590c]
[bt] cbcp-main.x(_ZN5MCPIM34check_and_adjust_vifs_for_mrtentryEtP8mrtentry+0x260) [0x1088a8ec]
[bt] cbcp-main.x(_ZN5MCPIM22delete_pimrcvrport_recEtjP8mrtentry+0xf8) [0x10861514]
[bt] cbcp-main.x(_ZN5MCPIM31set_leaf_ports_in_pimrcvr_portsEtP8mrtentryP8PortList+0x238) [0x1087dcd0]
[bt] cbcp-main.x(_ZN5MCPIM8add_leafEtjjjj+0x97c) [0x1087ed78]
[bt] cbcp-main.x(pim_add_leaf+0x64) [0x1084d52c]
[bt] cbcp-main.x(_ZN4IGMP12igmpPimEventEjP15tIGMP_INTERFACEP11tIGMP_GROUPjjP15igmp_access_tbl+0x118) [0x1
[bt] cbcp-main.x(_ZN4IGMP15igmpTrigerEventEjP15tIGMP_INTERFACEP11tIGMP_GROUPjjP15igmp_access_tbl+0x514) [
[bt] cbcp-main.x(_ZN4IGMP19igmpReporterExpiredEP12tIGMP_MEMBERP10tIGMP_PORT+0xcec) [0x10798bb8]
[bt] cbcp-main.x(igmpReporterExpired+0xd8) [0x10761cc8]
[bt] cbcp-main.x(_ZN4IGMP7igmpTicEv+0x4cc) [0x1078b4bc]
[bt] cbcp-main.x(igmpTicToAllVrf+0x58) [0x10761778]
[bt] cbcp-main.x(comTimerTask+0x1d0) [0x10644010]
[bt] /opt/appfs/lib/cp/libcpp.so.1(cppScheduleBody+0x528) [0xf987ce8]
[bt] /opt/appfs/lib/cp/libcpp.so.1(cppMainTask+0x1dc) [0xf988670]
[bt] /opt/appfs/lib/cp/libv2l.so.1(task_wrapper+0x278) [0xe35198c]
[bt] cbcp-main.x(ckrmThreadStarter+0x664) [0x1150b434]
[bt] /opt/appfs/lib/cp/libee_infrastructure.so.1(ee_thread_create_start_routine+0x100) [0xdd4cf08]
[bt] /lib/libpthread.so.0(+0x6844) [0xe2f1844]
[bt] /lib/libc.so.6(clone+0x84) [0xdfc1518]   
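
The watchdog figures in the wd_stats snippet can be cross-checked with simple arithmetic. The sketch below assumes that the two timestamps are microsecond counters. This is an assumption, supported only by the fact that their difference matches the reported outage duration.

```python
# Values copied from the wd_stats snippet.
missed_feed_ts = 7801472976  # "Last missed feed detected at timestamp"
recovered_ts = 7814947050    # "Last feeding recovered at timestamp"

# Assumption: both counters tick in microseconds.
outage_usec = recovered_ts - missed_feed_ts
outage_msec = outage_usec // 1000

print(outage_usec)  # 13474074
print(outage_msec)  # 13474, matching "Last watchdog outage took: 13474 msec"
```

Under that assumption, the main task failed to feed the watchdog for roughly 13.5 seconds in this single event.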


Snippet from Runprof data :

Profile data for thread tMainTask (6a180c80, lwp 3091), started by Lifecycle Software Watchdog
Profiler ran from 12:24:03 09-04-2019
              to 12:24:12 09-04-2019 (9010325 usec)
Total samples taken: 893, trace backs stored: 338
Total timer count: 900, signal pending count:6
Back trace count:        74 hash:0x6c0695ed
       108534bc _ZN5MCPIM23k_compute_mfc_EvifEntryEP8mrtentryjjtt+0x19f0
       1085567c _ZN5MCPIM9k_chg_mfcEP8mrtentryjjt10VifBitListj+0x1a70
       1089590c _ZN5MCPIM17change_interfacesEP8mrtentryt10VifBitListS2_S2_S2_t+0x11ec
       1088a8ec _ZN5MCPIM34check_and_adjust_vifs_for_mrtentryEtP8mrtentry+0x260
       10861514 _ZN5MCPIM22delete_pimrcvrport_recEtjP8mrtentry+0xf8
       1087dcd0 _ZN5MCPIM31set_leaf_ports_in_pimrcvr_portsEtP8mrtentryP8PortList+0x238
       1087ed78 _ZN5MCPIM8add_leafEtjjjj+0x97c
       1084d52c pim_add_leaf+0x64
       10777318 _ZN4IGMP12igmpPimEventEjP15tIGMP_INTERFACEP11tIGMP_GROUPjjP15igmp_access_tbl+0x118
       107965dc _ZN4IGMP15igmpTrigerEventEjP15tIGMP_INTERFACEP11tIGMP_GROUPjjP15igmp_access_tbl+0x514
       10798bb8 _ZN4IGMP19igmpReporterExpiredEP12tIGMP_MEMBERP10tIGMP_PORT+0xcec
       10761cc8 igmpReporterExpired+0xd8
       1078b4bc _ZN4IGMP7igmpTicEv+0x4cc
       10761778 igmpTicToAllVrf+0x58
       10644010 comTimerTask+0x1d0
       0f987ce8 cppScheduleBody+0x528
       0f988670 cppMainTask+0x1dc
       0e35198c task_wrapper+0x278
       1150b434 ckrmThreadStarter+0x664
       0dd4cf08 ee_thread_create_start_routine+0x100
       0e2f1844 +0xe2f1844
       0dfc1518 clone+0x84

The profile data points to IGMP cleaning up receivers due to IGMP group membership expiring, resulting in PIM being notified and doing its recalculation.
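
To put the runprof numbers in proportion, the following sketch computes how much of the sampling window the single call path shown above accounts for. It assumes that "Back trace count" is the number of samples that hit this exact call path and that "Total timer count" is the number of profiler ticks. Both readings are inferred from the field names, not from product documentation.

```python
# Values copied from the runprof snippet.
total_samples = 893  # "Total samples taken"
trace_count = 74     # "Back trace count" for hash 0x6c0695ed
run_usec = 9010325   # profiler run time in usec
timer_count = 900    # "Total timer count"

# Share of samples spent in this one IGMP -> PIM call path.
share_pct = 100 * trace_count / total_samples

# Approximate interval between profiler ticks, in msec.
interval_msec = run_usec / timer_count / 1000

print(f"{share_pct:.1f}% of samples")         # 8.3% of samples
print(f"{interval_msec:.1f} msec per tick")   # 10.0 msec per tick
```

This covers only the one backtrace shown. The other stored backtraces are not included in the article, so the total share of multicast processing cannot be derived from this snippet alone.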

Environment
  • VSP 9000 - v4.2.1.6.0
Cause
Looking at the multicast routes and IGMP sender records shows that one particular multicast group, 239.83.100.109, has many sources in multiple VLANs. On Core A there are 1304 different mroutes (different sources) for this group. Core A is the RP (Rendezvous Point) for this stream.

Command:[141] [ show ip igmp sender count member-subnet default ]
---------------
*******************************************************************************
             Command Execution Time: Thu Sep 05 12:53:22 2019 UTC 
*******************************************************************************
IGMP Sender Count on vrf GlobalRouter: 1337



So 1304 of the 1337 senders are sources for multicast group 239.83.100.109. These sources are in different VLANs that have PIM-SM enabled, mostly on passive interfaces: there are 32 PIM-SM enabled VLANs in total, and 31 of the 32 interfaces are passive.
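
As a rough illustration of how dominant this group is, the sketch below computes the share from the two figures quoted above and shows one way to count distinct sources per group. The `sources_per_group` helper and the sample records are hypothetical. They are not taken from the switch output, and parsing of the actual `show` command output is not attempted here.

```python
from collections import Counter


def sources_per_group(pairs):
    """Count distinct sources per multicast group from (source, group) pairs."""
    return Counter(group for _source, group in set(pairs))


def share_pct(part, total):
    """Percentage of `total` accounted for by `part`."""
    return 100 * part / total


# Figures from the article: 1304 sources for 239.83.100.109, 1337 senders in total.
print(f"{share_pct(1304, 1337):.1f}%")  # 97.5%

# Hypothetical sample records, for illustration only.
sample = [
    ("10.1.1.10", "239.83.100.109"),
    ("10.1.2.20", "239.83.100.109"),
    ("10.1.2.20", "239.83.100.109"),  # duplicate record, counted once
    ("10.1.3.30", "239.1.1.1"),
]
print(sources_per_group(sample).most_common(1))  # [('239.83.100.109', 2)]
```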

An internet search indicates that an application called "Landesk" uses this multicast group and that it can cause high CPU utilization on networking devices. The customer confirmed that this application is installed on their PCs.
Resolution
The customer implemented an ACL rule to drop this multicast stream on the VSP 9000s, which stabilized their environment. Sample ACL:

filter acl ace 121 30 name "Drop_MC_239.83.100.109"
filter acl ace action 121 30 deny count
filter acl ace ethernet 121 30 ether-type eq ip
filter acl ace ip 121 30 dst-ip eq 239.83.100.109
filter acl ace 121 30 enable
Additional notes
Instead of the ACL rule, an IGMP filter list for this multicast group could be used. Sample list:

(config)# ip prefix-list "filter_landesk" 239.83.100.109/32 ge 32 le 32

(Note: VLAN 5 is just a sample; the list should be applied to all relevant VLANs.)

(config)# interface vlan 5
(config-if)# ip igmp access-list "filter_landesk" 172.16.0.0/255.255.0.0 deny-both
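
Because the filter has to be applied to every relevant VLAN, generating the commands from a VLAN list can reduce typing errors. The following is a minimal sketch that only formats the two commands shown above. The function name, the VLAN IDs and the subnets are hypothetical, and the `exit` line between interfaces is an assumption about how the CLI session is driven. Review the output before pasting it into a switch.

```python
def igmp_filter_config(vlan_subnets, prefix_list="filter_landesk",
                       group="239.83.100.109"):
    """Build CLI lines for the IGMP access list on each VLAN.

    vlan_subnets maps a VLAN ID to a (network, mask) tuple for its hosts.
    """
    lines = [f'ip prefix-list "{prefix_list}" {group}/32 ge 32 le 32']
    for vlan_id, (network, mask) in sorted(vlan_subnets.items()):
        lines.append(f"interface vlan {vlan_id}")
        lines.append(
            f'ip igmp access-list "{prefix_list}" {network}/{mask} deny-both'
        )
        lines.append("exit")  # assumption: leave interface mode before the next VLAN
    return lines


# Hypothetical VLANs and subnets, for illustration only.
example = {
    5: ("172.16.0.0", "255.255.0.0"),
    6: ("172.17.0.0", "255.255.0.0"),
}
for line in igmp_filter_config(example):
    print(line)
```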

 
