Unsolved

October 26th, 2015 10:00

Storage pool has failed capacity

I have a small ScaleIO test setup on vSphere 6.x, which is reporting "Storage pool has failed capacity". I also have "Device failed" errors on the SDS, like this: "ScaleIO-7304a552 (esx-a2-4.etc.etc)".

Version : R1_32.2451.4

I looked in /opt/emc/scaleio/sds/logs/trc.0 and saw these messages reported constantly:

26/10 11:22:01.314821 raidComb_SendMdmCombState:00529: Failed to send Comb State, a new try is needed 195a00000312, gen 5, tgt 55b041c300000002, state 16, type REBUILD, rc ABORTED

26/10 11:22:01.314857 ioh_NewRequest:02927: Write to comb 195a00000312 - Done rc is IO_HARD_ERROR (Lba 260096 1), volume dff4f35d00000000 (dit)
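For anyone hitting the same thing, here is a rough sketch of how the repeated trace messages could be tallied to see which combs and return codes dominate. The line layout (date, time, function:line, message) and the 12-hex-digit comb ID are assumptions inferred only from the two lines quoted above, not a documented ScaleIO log format, and `parse_line` and `summarize` are hypothetical helper names.

```python
import re
from collections import Counter

# Assumed layout, inferred from the two quoted trace lines only:
# "DD/MM HH:MM:SS.ffffff function:line: message"
LINE_RE = re.compile(
    r"^(?P<date>\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<func>\w+):(?P<line>\d+): (?P<msg>.*)$"
)
# Return code appears as "rc ABORTED" or "rc is IO_HARD_ERROR".
RC_RE = re.compile(r"\brc(?: is)? (?P<rc>[A-Z_]+)")
# Assumption: a comb ID is a standalone 12-digit hex token.
COMB_RE = re.compile(r"\b[0-9a-f]{12}\b")

SAMPLE = [
    "26/10 11:22:01.314821 raidComb_SendMdmCombState:00529: Failed to send "
    "Comb State, a new try is needed 195a00000312, gen 5, tgt 55b041c300000002, "
    "state 16, type REBUILD, rc ABORTED",
    "26/10 11:22:01.314857 ioh_NewRequest:02927: Write to comb 195a00000312 - "
    "Done rc is IO_HARD_ERROR (Lba 260096 1), volume dff4f35d00000000 (dit)",
]


def parse_line(line):
    """Split one trace line into fields; return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    msg = m.group("msg")
    rc = RC_RE.search(msg)
    comb = COMB_RE.search(msg)
    return {
        "time": m.group("time"),
        "func": m.group("func"),
        "rc": rc.group("rc") if rc else None,
        "comb": comb.group(0) if comb else None,
    }


def summarize(lines):
    """Count occurrences of each (function, return code, comb) triple."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec and rec["rc"]:
            counts[(rec["func"], rec["rc"], rec["comb"])] += 1
    return counts


if __name__ == "__main__":
    for key, n in summarize(SAMPLE).most_common():
        print(n, key)
```

To run it against the real file, the lines could be read with `open("/opt/emc/scaleio/sds/logs/trc.0", errors="replace")` and passed to `summarize`. If a single comb accounts for nearly all of the IO_HARD_ERROR entries, that would at least narrow down which device to look at.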

In vCenter, my VMs using this storage are now dead, and I can't even delete them. vSphere does not report anything useful.


Not sure what to do apart from rebuilding this setup again.

The symptoms appear to be similar to this issue:

devices go offline, how to debug

The steps I went through that got me here were something like this:

1. Installed a 4-node ScaleIO system using the web installer

2. Created two VMs that use volumes from a single storage pool

3. On one of the ESX hosts, the physical switch lost power, so the vmnic ports the SDS was using were down. Out-of-band management was still up, so it was still talking to the MDM. In this case the SDS had no network access during the weekend.

4. The ESX host above became unresponsive. I was able to ssh to it, but esxcli commands would not run, and I was not able to log in to the console

5. Rebooted ESX server (pulled the power)

Now I'm in the state above.

October 26th, 2015 12:00

Update before this is posted: another of my ESX servers has gone into a disconnected state. This is the same state I found this morning on the ESX server that I rebooted. I have not rebooted this one yet.

Although I can ssh into it, I cannot run esxcli commands on it (they just hang). I ran vm-support on it and it hung while gathering storageHostProfiles data, I assume because the storage is not working.

18:58:47: Gathering output from /usr/lib/vmware/vm-support/bin/storageHostProfiles.sh

April 11th, 2016 20:00

I have the same issue: the ScaleIO dashboard shows "Storage pool has failed capacity", and I found that one of the devices has failed.
When I tried to remove the device, the removal stayed pending and never completed.


Hoping someone can resolve this problem.

June 14th, 2019 17:00

Hi everyone!
I have the same error. Has anyone fixed it? Please give me some suggestions for fixing it.
Thank you very much!
