
2 Intern • 172 Posts • September 17th, 2015 08:00

protocol error in packet received from initiator

I have a Windows 2008 R2 VM connected to a PS group (6210XS and 6000). I have 4 iSCSI LUNs for Windows data volumes, two on the 6210 and two on the 6000. The VM's own LUN is on the 6210. We set up the 6210 back around the beginning of the year, and all of the data volumes were set up with MPIO 5 or 6 months ago.


This morning, one of the two data volumes on the 6210 disappeared. Windows Disk Management did not show it online or offline, but the Windows iSCSI initiator showed the LUN as connected. I disconnected the LUN and reconnected it, and the volume immediately came back online and was fully accessible. The EQ monitor had an info message for the disconnection ending with "protocol error in packet received from initiator". Two seconds later, the LUN reconnected.

This sounds like a Windows initiator problem rather than an EQ problem, but I thought I'd check here anyway. It's interesting that it only happened to 1 volume out of 4. This volume has been connected since May with no problems up until today, and I haven't made any config changes since May. Both of the multipathed vnics are vmxnet3 with all of the same settings (jumbo frames enabled, offloading disabled, specific numbers for Rx rings, etc.).
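In case it's useful, here's a minimal sketch of how I'd dump each vmxnet3 instance's advanced parameters from the registry for a side-by-side comparison of the two vnics. The GUID is the standard Windows network adapter device class; parameter names vary by driver version, so this just prints whatever each instance defines rather than assuming specific names:

```python
# Rough sketch: dump the advanced parameters of each vmxnet3 instance so the
# two multipathed vnics can be compared side by side. Run in the guest with
# admin rights; the GUID is the standard Windows network adapter device class.
import winreg

NIC_CLASS = (r"SYSTEM\CurrentControlSet\Control\Class"
             r"\{4D36E972-E325-11CE-BFC1-08002BE10318}")

with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, NIC_CLASS) as cls:
    index = 0
    while True:
        try:
            sub = winreg.EnumKey(cls, index)
        except OSError:
            break
        index += 1
        if not sub.isdigit():               # skip non-instance keys like "Properties"
            continue
        with winreg.OpenKey(cls, sub) as nic:
            try:
                desc, _ = winreg.QueryValueEx(nic, "DriverDesc")
            except OSError:
                continue
            if "vmxnet3" not in str(desc).lower():
                continue
            print(f"--- instance {sub}: {desc}")
            v = 0
            while True:
                try:
                    name, value, _ = winreg.EnumValue(nic, v)
                except OSError:
                    break
                v += 1
                print(f"    {name} = {value}")
```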

Any ideas on any settings I should check or modify, particularly with the vnics? I'll probably post this on TechNet, too.

Thanks.

Tim

2 Intern • 172 Posts • September 17th, 2015 10:00

Update -

FWIW, I did update the EQ firmware from 7.1.4 to 7.1.7 a couple of weeks ago, and I've found a fair number of iScsiPrt disconnect errors in the Windows System event log starting about that time. Generally three errors at a time, followed by an informational event:

Target failed to respond in time to a NOP request (id 48)

Connection to the target was lost. The initiator will attempt to retry the connection (20)

The initiator could not send an iSCSI PDU. Error status is given in the dump data. (7)

(Info) A connection to the target was lost, but Initiator successfully reconnected to the target. (34)

These show up for different volumes, so apparently after today's event on the one volume, it just wasn't able to reconnect on its own.
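To get a handle on how often this is actually happening, something like the following could tally the iScsiPrt events by ID. A rough sketch only: it assumes wevtutil is on the PATH and that the provider shows up as "iScsiPrt", which is how these entries are logged here:

```python
# Rough sketch: count iScsiPrt events in the System log by event ID so the
# disconnect/reconnect pattern (7, 20, 34, 48) is easy to see at a glance.
# Assumes wevtutil is on the PATH and the provider is named "iScsiPrt".
import collections
import re
import subprocess

query = "*[System[Provider[@Name='iScsiPrt']]]"
out = subprocess.run(
    ["wevtutil", "qe", "System", f"/q:{query}", "/f:text", "/c:1000", "/rd:true"],
    capture_output=True, text=True, check=True,
).stdout

counts = collections.Counter(
    int(m.group(1)) for m in re.finditer(r"Event ID:\s*(\d+)", out)
)
for event_id, n in sorted(counts.items()):
    print(f"event ID {event_id}: {n} occurrence(s)")
```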

My large receive buffer is set to 1024. Perhaps I should increase to 2048 or more?

2 Intern • 172 Posts • September 17th, 2015 11:00

By coincidence, we're scheduled to meet with our support rep about the EQ next Tuesday, so I have a 24-hour DPACK. Unfortunately, there were no disk-related errors during that run. Would a SAN HQ archive contain that kind of info if I grab one from the last few days?

I used to see a lot of the disconnect/reconnect messages, but I believe those almost disappeared after I set up MPIO. The Windows event errors appear to have started up again around the day I upgraded the firmware to 7.1.7 (I'm searching for the actual date), but there are no disk or iSCSI errors before Sep 3rd. I'm still gathering info, though.
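To pin down the actual start date, the same kind of query can be bucketed by day. Same assumptions as the earlier sketch (wevtutil on the PATH, provider named "iScsiPrt"), plus the assumption that the text-formatted output includes a "Date:" field:

```python
# Rough sketch: bucket iScsiPrt events by date to confirm roughly when the
# disconnects started relative to the 7.1.7 firmware upgrade.
# Assumes wevtutil is on the PATH, the provider is named "iScsiPrt", and the
# text-formatted output includes a "Date:" field.
import collections
import re
import subprocess

out = subprocess.run(
    ["wevtutil", "qe", "System",
     "/q:*[System[Provider[@Name='iScsiPrt']]]", "/f:text", "/c:2000"],
    capture_output=True, text=True, check=True,
).stdout

per_day = collections.Counter(
    m.group(1) for m in re.finditer(r"Date:\s*(\d{4}-\d{2}-\d{2})", out)
)
for day in sorted(per_day):
    print(day, per_day[day])
```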

2 Intern • 172 Posts • September 17th, 2015 14:00

OK. We have redundant Dell PowerConnect 5424 switches, and the paired MPIO connections are sent through different switches.

2 Intern • 172 Posts • September 17th, 2015 15:00

We have SAN HQ and I have SupportAssist enabled. I'll look into getting more robust switches. We have 4 ports trunked right now, so I'll log into the switches and check the stats. I'll also open a ticket.

I had iSCSI optimization disabled for years, but I enabled it at the suggestion of Bill Tarvin (Dell Sales Engineer) at the time we joined the 6210 to the group with the 6000.

2 Intern • 172 Posts • September 17th, 2015 17:00

OK, I'll turn off the iSCSI optimization, verify flow control, and look at adding more trunk ports.

As always,  thanks for all the info.

2 Intern • 172 Posts • September 18th, 2015 06:00

OK. I just downloaded 7.1.8. I usually like to wait a couple of weeks and check the forums before applying updates when I can, but I'll probably shoot for early next week based on a couple of the fixes.

Question about our 5424 switches. We were going to get 10GbE switches when we got the 6210, but we then discovered that we couldn't have the 6000 and 6210 in the same group with 10GbE switches unless we went through a pretty complicated reconfiguration. Our network saturation had never reached 15%, so we decided to wait until we replace the 6000 in about a year and then get new switches. We're still only using 17% or so on average, with a couple of peaks at less than 30%. Are there newer 1GbE switches we should consider, or should we just make your suggested changes on the current ones and monitor?
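For context on those percentages, here's a quick back-of-the-envelope conversion to MB/s so they can be lined up against the SAN HQ throughput numbers. It assumes the utilization figures are per 1GbE link, which is an assumption on my part; adjust link_gbps if they actually describe the 4-port trunk:

```python
# Back-of-the-envelope: translate link utilization percentages into MB/s so
# they can be compared against SAN HQ throughput numbers. Assumes the
# percentages are per 1GbE link (an assumption; change link_gbps otherwise).
def utilization_to_mb_per_sec(percent: float, link_gbps: float = 1.0) -> float:
    bits_per_sec = link_gbps * 1e9 * (percent / 100.0)
    return bits_per_sec / 8 / 1e6   # decimal megabytes per second

for pct in (17, 30):
    print(f"{pct}% of a 1GbE link ~ {utilization_to_mb_per_sec(pct):.0f} MB/s")
```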

4 Operator • 2.3K Posts • September 18th, 2015 06:00

We started back in 2008 with an EQL solution that came with PC5448 switches, and to be honest they were never good storage switches. The buffers aren't large enough, latency was high, and the iSCSI optimization wasn't designed for EQL.

With the first PS6010XV we switched to the PC8024F, and later to the Force10 S4810. The Force10 is also available as the S4820T with Base-T instead of SFP+. With Base-T you can attach all of your EQLs and servers and migrate slowly to 10GbE if you want.

Dell also has the N4032(F) and N4064 in the portfolio. We have a pair of N4032F switches with a PS6210X and two hosts, and it uses SyncRepl to another location that has PC8024F switches, a PS6110X/6010X, and also two hosts.

Regards,

Joerg
