
February 7th, 2017 09:00

ScaleIO failed capacity

Hello, we have a problem with our ScaleIO cluster.

Failed capacity is reported when only one SDS is down.

48830707 2017-02-07 17:55:20.649 SDS_DISCONNECTED          ERROR         SDS: devvirtp0024l00 (id: 1ce2de4e00000003) decoupled.

48830716 2017-02-07 17:55:21.650 MDM_DATA_FAILED           CRITICAL      The system is now in DATA FAILURE state. Some data is unavailable.

48830745 2017-02-07 17:55:23.490 SDS_RECONNECTED           INFO          SDS: devvirtp0024l00 (ID 1ce2de4e00000003) reconnected

48830770 2017-02-07 17:55:24.649 MDM_DATA_DEGRADED         ERROR         The system is now in DEGRADED state.

The cluster is running 3 fault sets.

There is enough capacity:

SDS Summary:

        Total 15 SDS Nodes

        15 SDS nodes have membership state 'Joined'

        15 SDS nodes have connection state 'Connected'

        51.7 TB (52935 GB) total capacity

        28.5 TB (29227 GB) unused capacity

        201.9 GB (206752 MB) snapshots capacity

        15.3 TB (15682 GB) in-use capacity

        15.1 TB (15480 GB) thin capacity

        15.3 TB (15682 GB) protected capacity

Any ideas?

Thanks,

Matas


February 8th, 2017 05:00

Hi Matas,

In a correctly configured, healthy cluster, I can't recall a situation when a single disconnected SDS would cause a DATA FAILURE state.

Can you please check in the GUI (or CLI - scli --query_sds) if there are any devices in the Error state?
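For example, something along these lines (a minimal sketch, assuming scli on the primary MDM; output layout differs a bit between versions):

        # Log in to the MDM (prompts for the password)
        scli --login --username admin

        # Per-SDS view, including its devices and their states
        scli --query_sds --sds_name devvirtp0024l00

        # Summary of all SDSs and the overall capacity counters
        scli --query_all_sds
        scli --query_all

A device reported in the Error state there would be the first thing to look at.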

Probably the best way to investigate this would be through a Service Request, so we can view all the logs and see exactly what was going on there - please open an SR and we'll take it from there.

Thank you,

Pawel

February 7th, 2017 22:00

Was there any rebuild activity already going on when the SDS disconnected?

This can happen when some rebuild activity is already in progress and another SDS fails. In that case, both combs (copies) of certain data may be unavailable.

The following two events show that even after the SDS reconnected, the data was still in a degraded state:

48830745 2017-02-07 17:55:23.490 SDS_RECONNECTED           INFO          SDS: devvirtp0024l00 (ID 1ce2de4e00000003) reconnected

48830770 2017-02-07 17:55:24.649 MDM_DATA_DEGRADED         ERROR         The system is now in DEGRADED state.


February 7th, 2017 23:00

Hi, no rebuild or rebalance activity was noticed when the SDS was shut off.

The data was degraded after the SDS reconnected because I/O was happening and some data needed to be rebuilt. The DRL was also cleared, so a degraded state is normal after the SDS reconnects - but failed capacity is not expected and is not acceptable.

We tried it on 4 SDS nodes (each time the cluster was healthy), and every time failed capacity was reported.
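For reference, the state can be checked from the CLI like this (a minimal sketch; the grep filters are only for convenience):

        # Confirm there is no rebuild/rebalance running before the test
        scli --query_all | grep -i -E "rebuild|rebalance"

        # After shutting one SDS down, check the capacity counters for failed capacity
        scli --query_all | grep -i "capacity"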

Anything else?


February 9th, 2017 04:00

Hello, it seems we were hitting a software bug related to checksum protection being enabled. We disabled checksum protection and then tried both maintenance mode and shutting an SDS off - both worked perfectly. So the root cause was checksum protection. Thank you.
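For anyone hitting the same thing, the commands involved look roughly like this (a sketch based on the 2.x scli syntax; the exact checksum flag names are an assumption here, so verify with scli --help before running; <PD> and <POOL> are placeholders for your protection domain and storage pool):

        # Disable checksum protection on the affected storage pool
        # (flag names to be verified against your scli version)
        scli --set_checksum_mode --protection_domain_name <PD> --storage_pool_name <POOL> --disable_checksum

        # Put an SDS into maintenance mode before taking it down, and bring it back afterwards
        scli --enter_maintenance_mode --sds_name devvirtp0024l00
        scli --exit_maintenance_mode --sds_name devvirtp0024l00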
