fox_inti1

12 Posts

1487

June 10th, 2015 02:00

devices go offline, how to debug

i set up scaleio 1.32 on a vsphere 6 test environment consisting of 4 esx servers.

3 servers have a local datastore with some free space, 1 server has an extra volume available

the sds servers have 2x1Gb network connections

all 4 servers are sdc and sds, using the free space either in vmdk mode or as direct device

everything is up and running fine until i start to put some load on it, then the device gets an error. this is what happens:

- the device that gets the error is always on the esx where the vm creating the load resides

- when i vmotion a vm into the scaleio volume, the error happens after a random time

- even when booting only a little test vm, the device of the esx server where the vm boots gets an error

- the error can always be cleared and everything seems to continue fine

- never lost access to the volume

are there logs somewhere where i can look at this in more detail?

Responses(5)

alexkh

60 Posts

0

June 10th, 2015 03:00

Are these physical ESX servers or nested ESX?

fox_inti1

12 Posts

0

June 10th, 2015 04:00

Physical Dell R610, H700 RAID controller, Intel Xeon X5660 CPUs, 96GB RAM, 4 to 6 SAS disks of different sizes.

Apart from the fact that i would normally set up the disks differently to allow scaleio to access them directly, i think it's a setup where it should run pretty stably

alexkh

60 Posts

0

June 10th, 2015 06:00

Please correct me if I'm wrong, did you use the local datastore as an SDS device?

Did you install ScaleIO manually or using the web plugin?

fox_inti1

12 Posts

0

June 10th, 2015 12:00

i used the local datastore as sds device

i followed the vmware installation guide for the installation. so i installed the vsphere plugin, changed the advanced setting to "enable vmdk creation" = true.

this whole process led to a gw instance, four sds instances (which are very unbalanced from a storage point of view) and the sdc driver installed on each esx. each sds instance has a mgmt network and two storage networks in different subnets.

apart from using vmdk storage instead of rdm, everything looks pretty much as it should.

chi_sox

19 Posts

0

June 16th, 2015 20:00

fox_inti1, you can check the following logs on the primary MDM to see if there are more details on the device errors. Please feel free to post any errors you find in this thread.

General info

/opt/emc/scaleio/mdm/bin/showevents.py

More verbose logging

/opt/emc/scaleio/mdm/logs/trc.0
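
If the trace log turns out to be large, a small filter can help narrow it down before posting. This is only a minimal sketch: it assumes the trace file is plain text, and the keywords "error" and "fail" are guesses, not the actual wording ScaleIO uses. The function names are made up for illustration.

```python
import re


# Assumed keywords; the real messages in the trace log may be worded differently.
DEFAULT_KEYWORDS = ("error", "fail")


def matching_lines(lines, keywords=DEFAULT_KEYWORDS):
    """Return the lines that contain any keyword (case-insensitive)."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


def scan_trace(path, keywords=DEFAULT_KEYWORDS):
    """Read a trace file from the given path and return the matching lines."""
    # errors="replace" keeps the scan going if the file contains odd bytes.
    with open(path, errors="replace") as handle:
        return matching_lines(handle, keywords)
```

Call `scan_trace()` with the path of the verbose trace log on the primary MDM and print whatever it returns. Adjust the keywords once you have seen what the device error lines actually look like.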
