
Flapping of the fabric ports causing impact to traffic in the module with error message "MSM-A: Sys-Health-Check: Switch fabric port state is Down for Module in MSM-A"

Symptoms
  • The fabric ports between the I/O module in the slot and the MSM flap (transition repeatedly between up and down), causing routing issues. The following error message appears in the switch log:

MSM-A: Sys-Health-Check: Switch fabric port state is Down for Module in MSM-A

Environment
  • BD-8810
  • EXOS 15.7.2.9
Cause


 
Resolution
To resolve the fabric port transitions, follow these recommendations:
  • Check which MSM continues to report incrementing transitions for the suspected slot, and reseat that MSM. (If it is the backup MSM, reseating has minimal to no effect on traffic. If it is the master MSM, first fail over to the backup MSM, then reseat it.)
  • If the problem persists after the MSM reseat, reboot the chassis during a maintenance window.
  • As long as the fabric port transitions are not incrementing and the port status is up, no intrusive actions (failover, visual inspection, or diagnostics) are needed on the chassis.
  • If further assistance is required, contact GTAC.
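
The decision logic in the recommendations above can be sketched as follows. This is an illustrative Python sketch only, not an Extreme Networks tool. The function names, the input format (a list of transition-counter readings noted by hand from successive outputs), and the fallback for the case the article does not cover (counter stable but port down) are all assumptions.

```python
from typing import Sequence


def is_incrementing(readings: Sequence[int]) -> bool:
    """Return True if the counter grew between any two consecutive readings."""
    return any(later > earlier for earlier, later in zip(readings, readings[1:]))


def recommend_action(
    readings: Sequence[int],
    port_up: bool,
    msm_is_master: bool,
    already_reseated: bool = False,
) -> str:
    """Map the article's recommendations onto manually collected readings.

    `readings` is a hypothetical input: fabric port transition counts for one
    slot/MSM pair, noted from successive sys-health-check outputs.
    """
    if not is_incrementing(readings):
        if port_up:
            return "no intrusive action needed"
        # Not covered by the article; escalating is an assumption of this sketch.
        return "contact GTAC"
    if already_reseated:
        return "reboot chassis during a maintenance window"
    if msm_is_master:
        return "fail over to backup MSM, then reseat this MSM"
    return "reseat this MSM"
```

For example, three identical readings with the port up map to "no intrusive action needed", while rising readings on the master MSM map to a failover followed by a reseat.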
Additional notes
  • Clear the sys-health-check counters by issuing the following command:

            "debug hal clear sys-health-check counters"

          (This is a hidden debug command and must be typed exactly as shown; Tab auto-completion is not available.)

  • Monitor the sys-health-check output for any recurrence.

  • Run the "debug hal show sys-health-check" command 3 times at 2-minute intervals.

  • If new fabric port transitions appear between the I/O module in the slot and the MSMs, they may be caused by data path communication errors between that I/O module and the MSMs. To check for such errors, enable sys-health-check on the suspected slot:

             "enable sys-health-check slot <slot number>"

  • After 3 minutes, check the "debug hal show sys-health-check" output for any data path errors reported between the I/O module in the suspected slot and each MSM. These appear under the "DataPath MsmA" and "MsmB" columns, as in the following example output:

"debug hal show sys-health-check"

   
[Card State (Mask = 7FCF)]
   Slot Hardware Abstraction Layer(HAL) Boot/  DataPath   AsyncQueue
   No.        CardType CardState        #HFO  MsmA  MsmB  Curr/Hi/Total/Last 
   -------------------------------------------------------------------------
   1    8900-10G8X-xl OPERATIONAL      Cold              0/207/14491941/2699
   2    8900-10G8X-xl OPERATIONAL      Cold              0/193/14491960/2699
   3    8900-10G24X-c OPERATIONAL      Cold              0/404/14360181/2699
   4    8900-10G24X-c OPERATIONAL      Cold              0/378/14352634/2699
   7     8900-G48T-xl OPERATIONAL      Cold              0/248/14287411/2699
   8     8900-G48T-xl OPERATIONAL      Cold              0/216/14311870/2699
   9     8900-G48T-xl OPERATIONAL      Cold              0/269/14349860/2699
  10    8900-10G24X-c OPERATIONAL      Cold              0/865/13419005/2699
   A      8900-MSM128 OPERATIONAL      Cold 
   B      8900-MSM128 OPERATIONAL      Cold

  • If data path errors are seen, run the "debug hal show sys-health-check" command 2 more times at 2-minute intervals.

  • Capture the outputs from the above steps, along with the "show log" and "show log messages nvram" outputs, and send them to GTAC for further analysis.

  • Disable sys-health-check on the suspected slot by issuing the following command:

                 "disable sys-health-check slot <slot number>"
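
When several captures of the Card State table have been saved, a small script can help spot rows that need a closer look. The following Python sketch is a rough aid under stated assumptions, not a parser for an officially documented format: it assumes each row is whitespace-separated as slot, card type, card state, boot type, optional extra fields, and an optional AsyncQueue value of the form Curr/Hi/Total/Last. Any extra field between the boot type and the AsyncQueue value is treated as a possible DataPath entry to review by hand. The name `rows_to_review` is hypothetical.

```python
import re
from typing import Dict, List

# AsyncQueue column as shown in the example output: Curr/Hi/Total/Last
ASYNC_QUEUE = re.compile(r"\d+/\d+/\d+/\d+")
# Slot identifiers seen in the example output: numbered slots and MSMs A/B
SLOT_ID = re.compile(r"\d+|[AB]")


def rows_to_review(output: str) -> Dict[str, List[str]]:
    """Return {slot: extra_fields} for rows with fields beyond the clean layout.

    Assumption: a row with no errors has only slot, card type, card state,
    boot type, and (for I/O modules) the AsyncQueue value.
    """
    flagged: Dict[str, List[str]] = {}
    for line in output.splitlines():
        tokens = line.split()
        # Skip headers, separators, and anything too short to be a card row.
        if len(tokens) < 4 or not SLOT_ID.fullmatch(tokens[0]):
            continue
        extras = tokens[4:]
        # Drop the trailing AsyncQueue value if present.
        if extras and ASYNC_QUEUE.fullmatch(extras[-1]):
            extras = extras[:-1]
        if extras:
            flagged[tokens[0]] = extras
    return flagged
```

Applied to the example output in this article, the function returns an empty dictionary, since no row carries fields between the boot type and the AsyncQueue value. Any flagged row should still be checked manually against the raw output before it is reported.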
