VMware Hosts not Responding after OE Update

Question

Hi my friends,

I have a VNX 5300 with 4 pools. The LUNs are presented only to VMware ESXi 5.5 hosts. I have four BL660c blades and about 20 DL380 G8 and G7 servers.

All of my blade hosts (except one) went into a Not Responding state and their VMs showed as disconnected in vCenter. The problem happened during the OE upgrade, after the second SP was rebooted (in the middle of the DAE update), but my DL hosts did not have any trouble. Some VMs hosted on the blade servers kept working fine and some did not.

It is worth mentioning that I could not even connect to the ESXi hosts directly with the vSphere Client.

I tried restarting the management agents from the DCUI. One host came back after 5 hours, but this did not work for the other hosts.

Would you please let me know what happened during this update? Why did my hosts go into a Not Responding state?

P.S. The LEDs on four NL-SAS disks also turned amber.

Thanks in advance.

Rainer_EMC · Answer

I would suggest making sure your clients' multipathing is configured correctly. If you want a root cause analysis of what happened, I would suggest opening a service request with support.

ITstarlight · Answer

Thank you, Rainer_EMC. For some reason I don't have EMC support right now. As you mentioned, it may be a multipathing issue. Could it be related to a wrong configuration of the blades' Virtual Connect?

Rainer_EMC · Answer

Could be any number of reasons, most likely outside of the VNX. See the ESX host config guide on elabnavigator, or the VNX VMware TechBook.

kelleg · Answer

Has the array returned to normal - do you still see the four NL-SAS disks with amber lights? Are there any errors reported in Unisphere?

The amber lights on the four NL-SAS disks would seem to indicate that there is something wrong with the disks. You may need to reboot the array, one SP at a time, to allow for proper failover for the hosts. Start with SPA, wait for SPA to come back online, then reboot SPB.

As Rainer mentioned, you may have issues with the way the VMware environment has been configured. I would check with VMware to see if the hosts are all set up properly.

glen

ITstarlight · Answer

Dear Glen

Thank you very much for your quick response.

1-Has the array returned to normal - do you still see the four NL-SAS disks with amber lights?

I reseated the disks with amber LEDs after the OE update, while my ESXi hosts were in the Not Responding state. They started equalizing and the pool was rebuilt. I had a reason for reseating the disks: no hot spare had replaced them and the disks seemed to be OK, but I don't know why they turned amber. Three of the amber disks were members of a RAID 5 pool, so if they had really been faulty, all of my data should have been lost. After reseating and equalizing, all of them came back into the pool.

It is good to know that all of the amber disks were 7.2K 2 TB NL-SAS disks in one DAE, next to each other (disks 11, 12, 13 and 14). Disks 11, 12 and 13 are presented via iSCSI to a Linux server.

2-Are there any errors reported in Unisphere?

When I ran the SPA or SPB check, it returned a lot of "unit shutdown for trespass" messages (Enclosure X Bus Y Disk Z). I don't know whether that is normal.

3-VMware environment

From what I found on different forums, this issue is related to VMware ESXi 5.5 Update 3: hosts go into a Not Responding state in response to delays in completing heartbeat I/O. I should disable the use of ATS for heartbeat I/O and revert to the 5.5 Update 1 heartbeat method.

IBM Host Disconnects Using VMware vSphere 5.5 Update 2, 5.5 Update 3, 6.0, 6.0 Update 1 and 6.0 Update 2 - United States

VmWare Vsphere running on Blade (ESXi 5.5 U3 problems) - Jose Antonio Roa - Blog Blog

What do you think about item 3 and the links above?

Thank you

ITstarlight · Answer

Thanks Rainer_EMC

What do you think about VmWare Vsphere running on Blade (ESXi 5.5 U3 problems) - Jose Antonio Roa - Blog Blog and IBM Host Disconnects Using VMware vSphere 5.5 Update 2, 5.5 Update 3, 6.0, 6.0 Update 1 and 6.0 Update 2 - United States?

kelleg · Answer

For Item #2 - trespassing is an indication of pathing issues on the host. This is typically a problem with hosts that have access to the same LUN: all hosts should show the same Host LUN Number for the same LUN. It can also be caused when a host loses a path to the LUN and tries to trespass the LUN to the peer SP.
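The Host LUN Number check described above can be sketched in a few lines of Python. This is only an illustration: the `inventory` dictionary, the host names, and the LUN names are hypothetical, and in practice the mapping would have to be collected from the array's storage groups or from the hosts themselves.

```python
from collections import defaultdict


def find_hlu_mismatches(host_lun_map):
    """Return {lun: {host: hlu}} for every LUN whose Host LUN Number differs between hosts."""
    seen = defaultdict(dict)
    for host, luns in host_lun_map.items():
        for lun, hlu in luns.items():
            seen[lun][host] = hlu
    # A LUN is consistent when every host reports the same HLU for it.
    return {lun: hosts for lun, hosts in seen.items()
            if len(set(hosts.values())) > 1}


# Hypothetical inventory: host name -> {array LUN name: Host LUN Number}
inventory = {
    "blade1": {"LUN_10": 0, "LUN_11": 1},
    "blade2": {"LUN_10": 0, "LUN_11": 2},
    "dl380-1": {"LUN_10": 0, "LUN_11": 1},
}

mismatches = find_hlu_mismatches(inventory)
for lun, hosts in mismatches.items():
    print(f"{lun}: inconsistent Host LUN Numbers {hosts}")
```

Any LUN reported here would be a candidate to re-present with the same Host LUN Number on every host.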

For Item #3 - yes, you should follow the VMware KB for disabling ATS Heartbeat, as that is a known issue. I believe that this may be fixed in the latest ESX 6 update 3 release from VMware.

See KB https://support.emc.com/kb/463284
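For reference, here is a minimal Python sketch that only builds the `esxcli` commands for switching the ATS heartbeat setting, so they can be reviewed before being run on each host. The option name `/VMFS3/UseATSForHBOnVMFS5` is taken from memory of the VMware KB on ATS heartbeat, so please verify it against the KB for your ESXi build. The helper name `ats_heartbeat_commands` is made up for this example.

```python
def ats_heartbeat_commands(enable):
    """Build the esxcli commands that set, then display, the ATS heartbeat option."""
    # Option name assumed from the VMware KB; confirm it for your ESXi build.
    option = "/VMFS3/UseATSForHBOnVMFS5"
    value = 1 if enable else 0
    return [
        f"esxcli system settings advanced set -i {value} -o {option}",
        f"esxcli system settings advanced list -o {option}",
    ]


# Print the commands for disabling ATS heartbeat; nothing is executed here.
for cmd in ats_heartbeat_commands(enable=False):
    print(cmd)
```

The second command is there so the new value can be confirmed on the host after the change.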

glen

ITstarlight · Answer

Based on https://community.emc.com/thread/221797?start=0&tstart=0

It seems there is no need to disable VAAI if I update the OE to the 221 version; this issue is solved in OE 05.32.000.5.221.

I should have disabled VAAI before updating the OE, but unfortunately I was not aware of that, hence I faced the host disconnect issue.
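As a rough sketch, checking whether an array is already at or past the fixed OE release could look like the Python below. It assumes that OE version strings compare correctly field by field as integers, and the installed versions in the example are hypothetical.

```python
def parse_oe_version(version):
    """Split an OE version string such as '05.32.000.5.221' into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


# Release named in the linked thread as containing the fix.
FIXED_IN = parse_oe_version("05.32.000.5.221")


def has_fix(installed):
    """Return True if the installed OE version is at or above the fixed release."""
    return parse_oe_version(installed) >= FIXED_IN


# Hypothetical installed versions, for illustration only.
for installed in ("05.32.000.5.218", "05.32.000.5.221"):
    print(installed, "has fix:", has_fix(installed))
```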
