FN-2018-422 - SLX 9540 File System Becomes Read-Only

Information

 
Notice Summary
Due to an SSD (Solid State Disk) defect, the file system on an SLX 9540 may be set to read-only. Power-cycling the device clears the fault and fully recovers the device, with no risk of data loss.

This is an update to the originally released FN-2018-422, which included a new SSD firmware image to correct this issue but inadvertently introduced a performance issue. A new SSD firmware image FW1198 is now available that corrects the original issue without any performance impact.
Background
A defect in the firmware of the SSD used in the SLX 9540 may cause the SSD to stop responding. The defect is triggered by a timing relationship between specific disk operations. Because these operations are commonly used, the failure appears random to the user. When this happens, SLX-OS places the file system into read-only mode to indicate that the file system is hung. A power cycle fully recovers the device.
Impact
Login access is lost, and the only way to recover is to power cycle the SLX 9540. Traffic stops during the power cycle but no SSD data is lost or corrupted.

SLX 9540 units that were upgraded with SSD firmware FW1166 may show a performance issue (for example, a longer boot time or a longer time to execute a support save).
Products Affected
SLX 9540
Software Affected
All SLX-OS Versions
Symptoms
Login access is lost. A console or dmesg log shows these two lines:
ata1: COMRESET failed (errno=-16)
Read-Only filesystem

This indicates that the OS has tried to reset the link to the SSD and failed because the SSD is not responding. The OS then places the file system in read-only mode so that applications can respond to the failure gracefully.
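
The check for the two symptom lines can be sketched in a few lines of Python. This is a minimal illustration, not an Extreme-supplied tool: it assumes the console or dmesg output has already been captured as text, and the function name `matches_ssd_hang` and the loose, case-insensitive match on the read-only message are assumptions made here.

```python
import re

# Patterns based on the two log lines quoted in this notice.
# The port number (ata1) is generalized to any ATA port as an assumption.
COMRESET_RE = re.compile(r"ata\d+: COMRESET failed \(errno=-16\)")
# Matched loosely (case, spacing) because exact wording may vary; assumption.
READ_ONLY_RE = re.compile(r"read-only\s*file\s*system", re.IGNORECASE)


def matches_ssd_hang(log_text: str) -> bool:
    """Return True only if both symptom lines appear in the captured log."""
    has_reset_failure = COMRESET_RE.search(log_text) is not None
    has_read_only = READ_ONLY_RE.search(log_text) is not None
    return has_reset_failure and has_read_only


# The sample reuses the two lines shown above.
sample = "ata1: COMRESET failed (errno=-16)\nRead-Only filesystem\n"
print(matches_ssd_hang(sample))  # prints True
```

Requiring both lines avoids flagging a unit that shows only a read-only file system for an unrelated reason.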

 
Workaround
Power-cycle the SLX 9540 to recover from the fault. The performance issue requires no workaround, because its only effect is that some operations run more slowly.
Solution
The root cause is a defect in the SSD firmware that causes the SSD to stop responding. The defect is triggered by a specific timing relationship between two commands. New firmware that corrects this is available from Extreme. The firmware is installed on the SSD using the standard Linux “hdparm” utility, which is already bundled with SLX-OS. The installation can be done remotely, so there is no need to be physically present at the device. Data on the drive is not affected. The upgrade must be done during a maintenance window, because a power cycle is required to activate the new firmware once it is loaded.

SLX 9540 units should be upgraded with the FW1198 version of the SSD firmware.  
Please refer to the procedure document “Diag Smart SSD Controller Firmware Update_v1.5.pdf” to determine if your system requires this FW update. You can find this document in the ZIP file linked below.
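
As a rough aid only, the firmware revision that a drive reports can be read from the "Firmware Revision:" line of `hdparm -I` output. The sketch below parses text that has already been captured. The function name `firmware_revision` and the sample values are hypothetical, and the revision string the SSD actually reports may not literally read "FW1198", so the procedure document remains the authority on whether an update is needed.

```python
import re
from typing import Optional


def firmware_revision(hdparm_identify_output: str) -> Optional[str]:
    """Extract the value of the 'Firmware Revision:' line, or None if absent."""
    match = re.search(
        r"^\s*Firmware Revision:\s*(\S+)",
        hdparm_identify_output,
        re.MULTILINE,
    )
    return match.group(1) if match else None


# Illustrative text only; the model and revision values are placeholders,
# not output captured from an SLX 9540.
sample_output = """
ATA device, with non-removable media
	Model Number:       EXAMPLE SSD
	Firmware Revision:  EXAMPLE01
"""
print(firmware_revision(sample_output))  # prints EXAMPLE01
```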

The upgrade files and instructions are available here.
MD5: a600c5916121b0fc18a94a7b5c0e6348
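
Before using the download, its checksum can be compared with the value above. The following is a minimal sketch, assuming the published MD5 applies to the downloaded ZIP file. The function names and the file name in the usage note are hypothetical.

```python
import hashlib

# MD5 value published in this notice.
EXPECTED_MD5 = "a600c5916121b0fc18a94a7b5c0e6348"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 matches the expected digest."""
    return md5_of_file(path).lower() == expected.lower()
```

For example, `verify_download("ssd_firmware_update.zip")` (a placeholder file name) returns True only when the file matches the published digest. A mismatch suggests an incomplete or altered download, which should not be installed.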
