September 11th, 2015 10:00
ScaleIO behavior when a disk fails during a rebuild
Can someone explain what happens in a double-failure scenario? Say a fault set goes down and the system starts a rebuild. Before the rebuild completes, a single disk in another fault set fails. The disk is still present but returns read errors for a few sectors, and the rebuild status shows a small amount of data that still has not been rebuilt.
How does ScaleIO handle this situation? How long will ScaleIO keep trying to read data from the bad sectors? If the failed disk is removed, is there any way to determine which blocks are unrecoverable? Is the volume still available if data is missing?
Thanks,
Steve
sschultz1
September 11th, 2015 15:00
I bet you can guess why I am asking this question.
Steve
sschultz1
September 11th, 2015 22:00
Update: I pulled the failed disk out of the second failed fault set for about 30 seconds, then put it back in. ScaleIO didn't seem to notice and kept trying to read the bad sectors over and over. Several hours later I pulled the disk again, this time for about 60 seconds. That seemed to be long enough to make the system angry: the storage pool status reported failed capacity. I rebooted the SDS and then marked the failed drive's errors as cleared. ScaleIO is now attempting the rebuild again.
Surprisingly, my main servers running in this environment haven't noticed a change. I have been running checksums over the main multi-TB file store, and reads of a few files return I/O errors. I have since restored the affected files from backup. I guess ScaleIO returning an I/O error for missing data is the correct behavior.
Luckily this is a lab with non-production data. Hopefully someone finds this useful.
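A minimal sketch of the kind of read-everything checksum pass described above, in Python (the post doesn't say which tool was actually used). It walks a mounted file tree, computes a SHA-256 per file, and records any file whose read raises an `OSError`; data the cluster can no longer serve typically reaches the host as an I/O error (EIO). The function name `scan_tree` and the mount path in the usage note are hypothetical.

```python
import hashlib
import os


def scan_tree(root, chunk_size=1 << 20):
    """Hash every regular file under root (hypothetical helper).

    Returns (good, failed): good maps path -> SHA-256 hex digest,
    failed maps path -> errno of the read error (may be None).
    """
    good = {}
    failed = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, FIFOs, sockets, etc.; only read regular files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            digest = hashlib.sha256()
            try:
                with open(path, "rb") as fh:
                    # Read in chunks so multi-GB files don't need to fit in RAM.
                    for chunk in iter(lambda: fh.read(chunk_size), b""):
                        digest.update(chunk)
            except OSError as exc:
                # Unrecoverable blocks on the backing volume usually show up as EIO.
                failed[path] = exc.errno
            else:
                good[path] = digest.hexdigest()
    return good, failed
```

For example, `good, failed = scan_tree("/mnt/filestore")` (hypothetical mount point) returns digests for every readable file plus the errno for each file that failed. The `failed` entries are the candidates to restore from backup, and the digests can be compared against an earlier run to catch files that changed silently.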
Steve
BMoorthy
October 25th, 2015 14:00
The volume stays available to the SDC even with faulted data, but if there is active I/O on the volume, the files themselves get corrupted. In my case I rebooted multiple SDSs and checked: all my VHDX files were corrupted, but static files were OK. So for static, object-storage-style data it's OK.
RuNguyen
June 14th, 2019 23:00
Hi everyone!
Have you fixed this problem? I have the same error. I cannot remove the failed disk because the storage pool has failed capacity.
Please help me, thanks!
Antonio Nguyen
December 29th, 2019 23:00
I have the same behavior and problem on ScaleIO 2.0.
Has anyone fixed this problem?
Thanks