Host disconnects related to RPA?

Question

Literally my 3rd day on the job. I don't know the infrastructure well, but the VMware guy sent me these logs from vCenter.

RPA (I believe they are virtual) is new to me, so it's not my strong point. It's a shot in the dark, but can anyone glean anything from the logs below, or point me in a direction?

I have 4 hosts that lost connectivity to the SAN (everything is back up now) at about the time these errors began. Looking further back in the logs, these errors don't appear before then. I spoke with others working on this, and they tell me the SAN and switches have been eliminated as the problem.

Anyone have any insights?

2017-05-25T17:40:36.273Z cpu38:60805698)esx_splitter: KL_ERROR:940: #0 - CTOR(EsxUscsiSender): device not found naa.6001248001030000b32d81af49a41d21

2017-05-25T17:40:36.273Z cpu38:60805698)esx_splitter: KL_ERROR:940: #0 - spl_esx_ioctl_uspace_uscsi: failed to create EsxUscsiSender. Failing Uscsi request

2017-05-25T17:40:36.273Z cpu38:60805698)esx_splitter: KL_ERROR:940: #0 - CTOR(EsxUscsiSender): device naa.6001248000020000b22d81af49a41d21 is in error state TO_DISCONNECT

2017-05-25T17:40:36.273Z cpu38:60805698)esx_splitter: KL_ERROR:940: #0 - spl_esx_ioctl_uspace_uscsi: failed to create EsxUscsiSender. Failing Uscsi request

2017-05-25T17:40:36.273Z cpu34:60805663)esx_splitter: KL_ERROR:940: #0 - CTOR(EsxUscsiSender): device not found naa.6001248001020000b32d81af49a41d21

2017-05-25T17:40:36.273Z cpu34:60805663)esx_splitter: KL_ERROR:940: #0 - spl_esx_ioctl_uspace_uscsi: failed to create EsxUscsiSender. Failing Uscsi request

2017-05-25T17:40:36.273Z cpu34:60805663)esx_splitter: KL_ERROR:940: #0 - CTOR(EsxUscsiSender): device not found naa.6001248000030000b22d81af49a41d21

2017-05-25T17:40:36.273Z cpu34:60805663)esx_splitter: KL_ERROR:940: #0 - spl_esx_ioctl_uspace_uscsi: failed to create EsxUscsiSender. Failing Uscsi request

2017-05-25T17:40:36.311Z cpu111:60384986)esx_splitter: KL_INFO:865: #2 - IfSendIoEsx s_disconnect: Disconnecting from RPA worldID = 0, scsiHandle = 0x410b899abf80

2017-05-25T17:40:37.025Z cpu90:60384986)esx_splitter: KL_INFO:865: #2 - PathRpa_v_check_status_impl: Rpa Port 4116ca51e120 (state OK) lease expired (120 < 234294034 - 234290821)

2017-05-25T17:40:37.025Z cpu90:60384986)esx_splitter: KL_INFO:865: #2 - PathRpa_s_refresh_inactive_path: path(0x4116ca4e8948), state(TO_DISCONNECT)

2017-05-25T17:40:37.025Z cpu90:60384986)esx_splitter: KL_INFO:865: #2 - PathRpaEsx_disconnect: starting to flush splits. path naa.6001248002020000b42d81af49a41d21

2017-05-25T17:40:37.025Z cpu46:60805698)esx_splitter: KL_ERROR:940: #0 - CTOR(EsxUscsiSender): device not found naa.6001248000020000b22d81af49a41d21

2017-05-25T17:40:37.025Z cpu46:60805698)esx_splitter: KL_ERROR:940: #0 - spl_esx_ioctl_uspace_uscsi: failed to create EsxUscsiSender. Failing Uscsi request

2017-05-25T17:40:37.026Z cpu30:60805663)esx_splitter: KL_ERROR:940: #0 - CTOR(EsxUscsiSender): device not found naa.6001248000030000b22d81af49a41d21

2017-05-25T17:40:37.026Z cpu30:60805663)esx_splitter: KL_ERROR:940: #0 - spl_esx_ioctl_uspace_uscsi: failed to create EsxUscsiSender. Failing Uscsi request

2017-05-25T17:40:37.085Z cpu34:60805698)esx_splitter: KL_ERROR:940: #0 - CTOR(EsxUscsiSender): device not found naa.6001248003020000b52d81af49a41d21

2017-05-25T17:40:37.085Z cpu34:60805698)esx_splitter: KL_ERROR:940: #0 - spl_esx_ioctl_uspace_uscsi: failed to create EsxUscsiSender. Failing Uscsi request

2017-05-25T17:40:37.085Z cpu30:60805663)esx_splitter: KL_ERROR:940: #0 - CTOR(EsxUscsiSender): device not found naa.6001248003030000b52d81af49a41d21

2017-05-25T17:40:37.085Z cpu30:60805663)esx_splitter: KL_ERROR:940: #0 - spl_esx_ioctl_uspace_uscsi: failed to create EsxUscsiSender. Failing Uscsi request
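For anyone going through a longer vmkernel log than the excerpt above, a small script can tally the esx_splitter messages per device. This is a minimal sketch, not an EMC or VMware tool. The regular expressions are written against the exact line format quoted above and are an assumption for other splitter versions. It also pulls the numbers out of the "lease expired (120 < 234294034 - 234290821)" message: the difference is 3213, which is larger than the 120 limit. The log doesn't state the units or which counter is which, so reading the second number as the current tick and the third as the last renewal is an assumption.

```python
import re
from collections import defaultdict

# Assumed format, based on the lines quoted in the post:
# <timestamp>Z cpu<N>:<world>)esx_splitter: KL_<LEVEL>:<n>: #<k> - <message>
LINE_RE = re.compile(
    r"^(?P<ts>\S+Z) cpu(?P<cpu>\d+):(?P<world>\d+)\)esx_splitter: "
    r"KL_(?P<level>[A-Z]+):\d+: #\d+ - (?P<msg>.*)$"
)
NAA_RE = re.compile(r"naa\.[0-9a-f]+")
LEASE_RE = re.compile(r"lease expired \((\d+) < (\d+) - (\d+)\)")

# A few lines copied from the post, used as sample input.
SAMPLE = """\
2017-05-25T17:40:36.273Z cpu38:60805698)esx_splitter: KL_ERROR:940: #0 - CTOR(EsxUscsiSender): device not found naa.6001248001030000b32d81af49a41d21
2017-05-25T17:40:36.273Z cpu38:60805698)esx_splitter: KL_ERROR:940: #0 - CTOR(EsxUscsiSender): device naa.6001248000020000b22d81af49a41d21 is in error state TO_DISCONNECT
2017-05-25T17:40:37.025Z cpu90:60384986)esx_splitter: KL_INFO:865: #2 - PathRpa_v_check_status_impl: Rpa Port 4116ca51e120 (state OK) lease expired (120 < 234294034 - 234290821)
2017-05-25T17:40:37.025Z cpu46:60805698)esx_splitter: KL_ERROR:940: #0 - CTOR(EsxUscsiSender): device not found naa.6001248000020000b22d81af49a41d21
"""


def summarize(lines):
    """Tally 'device not found' per NAA ID, record error states and lease checks."""
    not_found = defaultdict(int)  # NAA ID -> number of 'device not found' lines
    error_state = {}              # NAA ID -> state named in 'is in error state X'
    leases = []                   # (lease limit, elapsed) from 'lease expired' lines
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # blank lines or anything that isn't an esx_splitter message
        msg = m.group("msg")
        dev = NAA_RE.search(msg)
        if dev and "device not found" in msg:
            not_found[dev.group()] += 1
        elif dev and "is in error state" in msg:
            error_state[dev.group()] = msg.rsplit(" ", 1)[-1]
        lease = LEASE_RE.search(msg)
        if lease:
            limit, current, last = map(int, lease.groups())
            # Assumption: elapsed = current tick minus last renewal.
            leases.append((limit, current - last))
    return dict(not_found), error_state, leases


if __name__ == "__main__":
    not_found, states, leases = summarize(SAMPLE.splitlines())
    for dev, count in sorted(not_found.items()):
        print(f"{dev}: device not found x{count}")
    for dev, state in states.items():
        print(f"{dev}: error state {state}")
    for limit, elapsed in leases:
        print(f"lease limit {limit}, elapsed {elapsed}")
```

Pointed at the full log instead of SAMPLE, the counts show which NAA devices the splitter kept failing on, which is a reasonable thing to hand to support alongside the SR.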

Idan · Answer

Hi there,

What RP4VM and splitter versions are we talking about here? Is there an SR open?

Regards,

Idan Kentor

RecoverPoint Corporate Systems Engineering

@IdanKentor

csbrown28 · Answer

Yes. It took several days to get this posted, and in that time it was determined that there's a bug in RP4VM. Honestly, I don't know the versions. If you're still interested, I'll ask my VMware guy.

Idan · Answer

Hi,

Can you please send me the SR# at the email below? I want to see what "bug" we're talking about here.

Thanks,

Idan Kentor

RecoverPoint Corporate Systems Engineering

idan.kentor@emc.com

@IdanKentor

csbrown28 · Answer

Idan, thanks for getting back to me. I'm getting a lot of data from my VMware person. I'm now being told that the cause of our problem is that the datastore access modes (image attached) were mixed. They should all be 'Public' or all 'Public ATS-only', but not a mix of the two. The problems we're seeing have nothing to do with RP4VM; that was bad info.

The RCA states:

"After working with VMware and the feedback from EMC, I am comfortable saying that the issues we experienced in the XXXX environment were most likely caused by the difference in configurations on the datastore access modes. Some of the datastores are set with access mode 'Public' and some are set with 'Public ATS-only'. In public access mode, each host will boot using ATS as the locking mechanism, and if performance falls too low, the locking mechanism will fall back to SCSI-2 and stay there. This sets up contention between hosts that are still attempting to lock the VMFS volume using ATS and the ones that have fallen back to SCSI-2. If a host still using ATS against a particular LUN attempts to place a lock on a particular block of data and finds the volume locked by SCSI-2, it will throw an error message and retry the lock. The longer this goes on, the more contention builds up."

See below.
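To check whether datastores are mixed the way the RCA describes, one option is to capture the output of `vmkfstools -Ph` for each datastore and compare the "Mode:" lines. This is a minimal sketch that assumes the output contains a line such as "Mode: public" or "Mode: public ATS-only" (the VMFS5 format; verify on your own build). The datastore names and sample text below are made up for illustration.

```python
import re

# Assumed: vmkfstools -Ph prints a line like "Mode: public" or "Mode: public ATS-only".
MODE_RE = re.compile(r"^[ \t]*Mode:[ \t]*(.+?)[ \t]*$", re.MULTILINE)

# Hypothetical captured output, one entry per datastore.
SAMPLE_OUTPUTS = {
    "DS01": "File system label (if any): DS01\nMode: public ATS-only\n",
    "DS02": "File system label (if any): DS02\nMode: public\n",
    "DS03": "File system label (if any): DS03\nMode: public ATS-only\n",
}


def access_modes(outputs):
    """Map datastore name -> access mode parsed from its vmkfstools output."""
    modes = {}
    for name, text in outputs.items():
        m = MODE_RE.search(text)
        modes[name] = m.group(1) if m else "unknown"
    return modes


def is_mixed(modes):
    """True if the datastores do not all share the same access mode."""
    return len(set(modes.values())) > 1


def group_by_mode(modes):
    """Group datastore names under each access mode, sorted by name."""
    groups = {}
    for name, mode in sorted(modes.items()):
        groups.setdefault(mode, []).append(name)
    return groups


if __name__ == "__main__":
    modes = access_modes(SAMPLE_OUTPUTS)
    print("mixed" if is_mixed(modes) else "consistent")
    for mode, names in group_by_mode(modes).items():
        print(f"{mode}: {', '.join(names)}")
```

Grouping by mode makes the outliers easy to spot, which is the list you would want before making the modes consistent across the environment.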

Idan · Answer

Thanks for the update.
