November 15th, 2013 07:00
cx300i burped!
YIKES!
Our CX300i just dropped and rebooted. I got the following errors, in this sequence:
(I removed most of the N/A fields and reformatted to try to cut down on the size of the post.)
7:44 Event Code:0x9
Description:The device, \Device\Scsi\fcdmtl3, did not respond within the timeout period. 00 00 10 00 01 00 66 00 00 00 00 00 09 00 04 c0 01 01 00 50 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 07 00 00 00
Subsystem:APM00060903193 Device:N/A SP:N/A Host:cx300_1_spb Source:fcdmtl
07:44:11 AM Event Code:0x873 Description:Flare's ATM detects one CMI connection is down.
Subsystem:APM00060903193 Device:SP B SP:SPB Host:cx300_1_spb
07:44:11 AM Event Code:0x908 Description:Fault - Cache Disabling
Subsystem:APM00060903193 Device:SP B SP:SPB
07:44:12 AM Event Code:0x9
Description:The device, \Device\Scsi\fcdmtl5, did not respond within the timeout period. 00 00 10 00 01 00 66 00 00 00 00 00 09 00 04 c0 01 01 00 50 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 07 00 00 00
Subsystem:APM00060903193 Device:N/A SP:N/A Host:cx300_1_spb Source:fcdmtl
07:44:12 AM Event Code:0x944 Description:Hard Peer Bus Error
Subsystem:APM00060903193 Device:SP B SP:SPB Host:cx300_1_spb
Source:N/A Category:N/A Log:Storage Array Sense Key:0x2 Ext Code1:0xebd77c5c Ext Code2:0x0
07:44:12 AM Event Code:0x944 Description:Hard Peer Bus Error Subsystem:APM00060903193
Device:SP B SP:SPB Host:cx300_1_spb
Source:N/A Category:N/A Log:Storage Array Sense Key:0x1 Ext Code1:0xebd77cc0 Ext Code2:0x0 Type:Error
07:44:12 AM Event Code:0x40004001 Description:#THREADO: Peer died in Run: 1073774611 40 00 40 01
Subsystem:APM00060903193 Device:N/A SP:N/A Host:cx300_1_spb
Source:MessageDispatcher Category:NT Application Log Log:NT Application Log
Type:Warning
07:44:12 AM Event Code:0xa23 Description:Peer SP Down.
Subsystem:APM00060903193 Device:SP A SP:SPB Host:cx300_1_spb
Source:N/A Category:N/A Log:Storage Array Sense Key:0x3 Ext Code1:0x0 Ext Code2:0x0
Type:Critical Error
07:44:12 AM Event Code:0x944 Description:Hard Peer Bus Error
Subsystem:APM00060903193 Device:SP B SP:SPB Host:cx300_1_spb
Source:N/A Category:N/A Log:Storage Array Sense Key:0x13 Ext Code1:0xebd77bf8 Ext Code2:0x0
Type:Error
07:44:58 AM Event Code:0x2580
Description:Storage Array Faulted Bus 0 Enclosure 0 : Faulted Bus 0 Enclosure 0 SPS A : Removed SP A : Removed
Subsystem:APM00060903193 Device:N/A SP:N/A Host:cx300_1_spb Source:N/A Category:N/A Log:Application
Type:Error
07:44:58 AM Event Code:0x1
Description:EV_HBAPort::_handleHBASPStateChanges() - list lengths differ, 3 4
Subsystem:APM00060903193 Device:N/A SP:N/A Host:cx300_1_spb Source:Navisphere Agent
Category:NT Application Log Log:NT Application Log
Type:Warning
07:44:58 AM Event Code:0x1 Description:Cabling status is unknown
Subsystem:APM00060903193 Device:N/A SP:N/A Host:cx300_1_spb
Source:Navisphere Agent Category:NT Application Log Log:NT Application Log
Type:Warning
7:44:58 AM Event Code:0x1 Description:EV_Object::~EV_Object, entries: 1
Subsystem:APM00060903193 Device:N/A SP:N/A Host:cx300_1_spb Source:Navisphere Agent
Category:NT Application Log Log:NT Application Log
Type:Warning
07:44:58 AM Event Code:0x6 Description:11/15/13 07:44:58 SP A - SP has been removed on host
Subsystem:APM00060903193 Device:SP A SP:N/A Host:cx300_1_spb
Type:Error
07:48:28 AM Event Code:0x7404
Description:Standby Power Supply (Bus 0 Enclosure 0 SPS A) is faulted. See Navisphere Manager Alerts for details.
Subsystem:APM00060903193 Device:Enclosure 0 SPS A SP:N/A Host:cx300_1_spb Log:Application
Type:Error
07:48:32 AM Event Code:0x7409
Description:Disk Processor Enclosure (Bus 0 Enclosure 0) is faulted. Servers may have lost access to disk drives in this storage system. See Navisphere Manager Alerts for details.
Subsystem:APM00060903193 Host:cx300_1_spb Log:Application
Type:Error
07:48:38 AM Event Code:0x743a
Description:Navisphere can no longer manage (SP A). This does not impact server I/O to the storage system. See Navisphere Manager Alerts for details.
Subsystem:APM00060903193 Host:cx300_1_spb Log:Application
Type:Error
07:48:38 AM Event Code:0x720e
Description:Initiator (iqn.1991-05.com.microsoft:Srv062n.dom.com) on Server (srv062N.dom.com) registered with the storage system is now inactive. It does not have a working physical connection. See Navisphere Manager for details.
Subsystem:APM00060903193 Host:cx300_1_spb Log:Application
Type:Warning
(The above message repeated for several servers.)
After the outage, which lasted about 5 minutes, all servers reconnected, mostly with no problems. One server showed empty folders for all of its shares and had to be rebooted, at which point it was OK too.
Can someone explain what happened, and whether remedial action is needed? The SAN appears to be functioning normally now, with no alerts.
What I think I can guess from the errors is:
\Device\Scsi\fcdmtl3 did not respond in a timely fashion (is this the vault area?)
A CMI rebooted SPA?
Of course as a result of that, SPB could not see SPA
Caches were disabled, etc
SPS A has an error (probably unrelated?)
Things come back up and servers reregister
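For anyone wanting to sanity-check a sequence like the one guessed at above, here is a rough Python sketch that turns a pasted excerpt of this log into a timeline. It is only an illustration, not an EMC tool: the names parse_events, to_seconds and SAMPLE are made up for this example, the field layout is assumed from the lines as pasted in this post, and AM/PM is ignored on the assumption that all events fall in the same half of the day.

```python
import re

# Layout assumed from the pasted log: "<time> [AM|PM] Event Code:<hex> ..."
HEADER = re.compile(
    r"^(\d{1,2}:\d{2}(?::\d{2})?)\s*(?:AM|PM)?\s+Event Code:(0x[0-9a-fA-F]+)"
)
# The description runs to the end of the line, or up to a "Subsystem:" field.
DESC = re.compile(r"Description:(.*?)(?=\s+Subsystem:|$)")
# "SP:" field naming the reporting SP (not the "Device:SP B" text).
SP = re.compile(r"\bSP:(\S+)")

# A few events copied from the post above.
SAMPLE = """\
07:44:11 AM Event Code:0x873 Description:Flare's ATM detects one CMI connection is down.
Subsystem:APM00060903193 Device:SP B SP:SPB Host:cx300_1_spb
07:44:11 AM Event Code:0x908 Description:Fault - Cache Disabling
Subsystem:APM00060903193 Device:SP B SP:SPB
07:44:12 AM Event Code:0x944 Description:Hard Peer Bus Error Subsystem:APM00060903193
Device:SP B SP:SPB Host:cx300_1_spb
07:44:12 AM Event Code:0xa23 Description:Peer SP Down.
Subsystem:APM00060903193 Device:SP A SP:SPB Host:cx300_1_spb
07:48:28 AM Event Code:0x7404
Description:Standby Power Supply (Bus 0 Enclosure 0 SPS A) is faulted. See Navisphere Manager Alerts for details.
Subsystem:APM00060903193 Device:Enclosure 0 SPS A SP:N/A Host:cx300_1_spb Log:Application
"""


def parse_events(text):
    """Group the pasted lines into one dict per event."""
    events = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = HEADER.match(line)
        if header:
            current = {"time": header.group(1), "code": header.group(2).lower(),
                       "desc": "", "sp": None}
            events.append(current)
        if current is None:
            continue  # text before the first event header
        desc = DESC.search(line)
        if desc and not current["desc"]:
            current["desc"] = desc.group(1).strip()
        # Look for the SP field outside the description text only.
        sp = SP.search(DESC.sub("", line))
        if sp and current["sp"] is None:
            current["sp"] = sp.group(1)
    return events


def to_seconds(hms):
    """Convert "H:MM" or "H:MM:SS" to seconds; AM/PM is deliberately ignored."""
    parts = [int(p) for p in hms.split(":")]
    while len(parts) < 3:
        parts.append(0)
    hours, minutes, seconds = parts
    return hours * 3600 + minutes * 60 + seconds


if __name__ == "__main__":
    events = parse_events(SAMPLE)
    for ev in events:
        print(ev["time"], ev["code"], ev["sp"], ev["desc"])
    span = to_seconds(events[-1]["time"]) - to_seconds(events[0]["time"])
    print("span between first and last event (s):", span)
```

Run against the full log, this makes it easier to see which SP reported each event and how far apart the 07:44 and 07:48 groups are. It does not explain why the peer went down.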
The errors seem to indicate it still doesn't like SPA, but I don't see any actual evidence of that. There are no trespassed LUNs and SPA status looks fine...
Thanks for any insight you can provide!
kelleg
November 22nd, 2013 13:00
From this message, it appears that SPA may have rebooted. Hard Peer Bus Error usually means that the SP reporting the error cannot talk to the other SP.
07:44:12 AM Event Code:0x944 Description:Hard Peer Bus Error
Subsystem:APM00060903193 Device:SP B SP:SPB Host:cx300_1_spb
Source:N/A Category:N/A Log:Storage Array Sense Key:0x2 Ext Code1:0xebd77c5c Ext Code2:0x0
But without the complete SPcollects, it would be difficult to determine why this occurred. Check the version of FLARE running on the array and make sure it is the most current one; for the CX300i, the current version is 26.032.
glen