Unsolved
22 Posts
1
1125
October 1st, 2013 17:00
After the 5876 microcode upgrade we have seen a lot of disk failures on our VMAX, i.e. one disk every day. I won't say the microcode is necessarily the reason; it might be a coincidence. Over the past 15 days we have had 15 disk failures. Can anyone help me find the cause?
taz_at_emc
54 Posts
0
October 1st, 2013 21:00
Kiran,
The code is "never" the problem with VMAX. As an EMC customer I have never faced this problem in our environment, nor seen it in other environments. A code level is released only after the EMC Engineering team and other high-level reviewers have reviewed it and given the go-live signal. As for the disk failures, they depend on the type of disk and on a high disk read/write ratio.
I strongly recommend opening a high-priority support case and calling in the EMC CE onsite to physically inspect your VMAX.
Note: You can provide the code level and the VMAX type in this forum.
Taz
umichklewis
3 Apprentice
•
1.2K Posts
0
October 2nd, 2013 06:00
One thing to consider is that the drives are most likely not "failed" in the classic sense. The VMAX leverages predictive analysis on the drives and attempts to spare them "before" they actually fail out. There's always a chance that the code upgrade is now seeing issues that meet the criteria for sparing them out. Be sure to check with your CE to see if a hotfix or patch is needed to address the issue.
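Just to make that concrete, here is a rough Python sketch of what a predictive rule like that could look like. It is purely illustrative, not Enginuity's actual algorithm; the function name, window size, and error counts are all made up:

def should_spare(weekly_errors, rising_weeks=3):
    # Flag a drive whose weekly error count has risen for N consecutive
    # weeks -- i.e. spare it proactively instead of waiting for a hard fail.
    recent = weekly_errors[-(rising_weeks + 1):]
    return len(recent) == rising_weeks + 1 and all(
        a < b for a, b in zip(recent, recent[1:]))

print(should_spare([0, 1, 3, 7]))  # True  -> trend is worsening, spare it
print(should_spare([2, 2, 1, 2]))  # False -> noisy but stable, keep it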
Hope this helps!
kiranmims
22 Posts
0
October 3rd, 2013 13:00
This is what EMC has to say:
With the 5876 code there is an enhancement that proactively spares out a drive after a certain number of disk errors.
As the code was upgraded just two weeks back, you would see a high replacement count. It should gradually reduce over the next week or so.
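To illustrate why the spike happens and then tapers off, here is a minimal Python sketch of a threshold rule like the one EMC describes. The threshold value and drive IDs are invented for the example; this is not EMC's actual implementation:

ERROR_THRESHOLD = 30  # hypothetical value; the real threshold is internal to EMC

def drives_to_spare(error_counts):
    # Spare out any drive whose cumulative error count crosses the threshold.
    return [d for d, errors in error_counts.items() if errors >= ERROR_THRESHOLD]

# Errors accumulated over months while the old code tolerated them.
pre_upgrade = {"1A:C2": 45, "2B:D0": 31, "7F:C1": 12, "9C:D3": 52}

# Right after the upgrade, every drive already over the threshold trips the
# new rule at once -- hence the burst of near-daily replacements, which
# should taper off once the backlog is cleared.
print(drives_to_spare(pre_upgrade))  # ['1A:C2', '2B:D0', '9C:D3']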
kiranmims
22 Posts
0
October 3rd, 2013 13:00
Thanks for your reply. I've opened a SEV2 case with EMC to inspect this, as one disk is failing every day.
Fenglin1
4 Operator
•
2.1K Posts
0
October 7th, 2013 22:00
It might also be historic events in your array that already invoked sparing previously. We experienced a similar issue where, after a code upgrade, Enginuity would raise disk failure events in the symmwin log indicating that a disk had failed, even though the disk had actually been replaced during earlier maintenance. You should confirm with your CE, or call a PSE, to check the status of the failed disks in your array, and then ignore the older disk failure events.
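Once your CE confirms which disks are physically healthy, the triage boils down to a date comparison. Here is a small Python sketch with hypothetical event records (not real symmwin output):

from datetime import datetime

last_maintenance = datetime(2013, 9, 16)  # date the drives were last replaced

events = [
    {"disk": "1A:C2", "failed_at": datetime(2013, 8, 30)},  # pre-dates maintenance
    {"disk": "9C:D3", "failed_at": datetime(2013, 10, 2)},  # new failure
]

stale = [e["disk"] for e in events if e["failed_at"] < last_maintenance]
fresh = [e["disk"] for e in events if e["failed_at"] >= last_maintenance]

print("safe to ignore:", stale)  # ['1A:C2'] -- already handled in maintenance
print("investigate:", fresh)    # ['9C:D3'] -- genuinely new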