Unsolved
27 Posts
0
2132
June 19th, 2012 11:00
SRDF/A link issues: some RDF groups got suspended despite Transmit Idle being enabled
I encountered an odd scenario. We have Transmit Idle enabled for our RDF groups, and during an extended link outage all of the groups went into the TransIdle state, except a few, which went into Suspended instead.
The SRDF/A maximum cache usage was set to 94% (the default), and the overall cache consumption by SRDF/A was 64%. As other applications on the DMX-4 were starting to get affected, especially the BCV syncs, I had to suspend all the RDF groups with a "suspend -immediate" to prevent the capture cycles from accumulating in cache.
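For reference, a sketch of the symcli command I used, assuming a device-file-based suspend (the Symmetrix ID, RDF group number, and device file name below are placeholders, not the real ones from this event):

```
# Suspend an SRDF/A group immediately, dropping the session rather than
# waiting for the current cycle set to drain.
# (-force may be required while the SRDF/A session is active)
symrdf -sid 000190101234 -rdfg 18 -f rdfg18_devs.txt suspend -immediate -force
```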
Cache Size (Mirrored) is 163840 (MB).
I didn't get a chance to analyze why some groups went TransIdle while others were suspended, as I had to prevent production workloads from being affected. But I am still wondering why it happened.
Can anyone throw some light on this one?
anoopcr
148 Posts
0
June 21st, 2012 07:00
It may be due to a difference in session priority.
debmdig
27 Posts
0
June 21st, 2012 09:00
All the RDF groups have the same priority (33); I already checked that.
I analyzed the STP data for the R1 devices, and the %write-miss for those devices was around 5% during the time the event took place. I am assuming the devices may have hit their write pending limit, but I am unable to correlate the write pending limit with SRDF/A behavior.
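In case it helps anyone checking the same things, the per-group session priority, cycle state, and cache slot usage can all be seen in the SRDF/A session display (the SID below is a placeholder):

```
# Show SRDF/A session details for all RDF groups: session state,
# cycle time, session priority, and cache slots in use.
symcfg -sid 000190101234 list -rdfg all -rdfa
```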
debmdig
27 Posts
0
June 21st, 2012 11:00
I managed to identify the error (one of them is below):
===========================
Detection time Dir Src Category Severity Error Num
------------------------ ------ ---- ------------ ------------ ----------
Tue May 01 01:14:53 2012 RF-10D Symm RDF (18) Error 0x004a
SRDF/A Session dropped, write pending limit reached. Host throttling disable
===========================
But I am still unable to find any "write pending limit reached" condition on either the R1 or the R2 devices for RDF group 18.
R1 devs:
R2 Devs:
The WP count for the R2s is below 10.
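For anyone retracing these checks, the system-wide write pending limits and the logged RDF errors can be pulled roughly as below (the SID and date range are placeholders, and the exact -start/-end date format may differ by Solutions Enabler version):

```
# System-wide cache configuration and write pending limits
# (look for the system and device write pending slot maximums):
symcfg -sid 000190101234 list -v

# RDF-related errors logged around the event window:
symevent -sid 000190101234 list -error -start "05/01/2012" -end "05/02/2012"
```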
There is no point involving EMC PS, as they only suggested a bandwidth increase. For now, we have mitigated the issue by moving several I/O-intensive devices out of SRDF/A and putting them under SRDF/AR. I am trying to identify what actually went wrong and how these factors are related, to avoid the same situation in the future.