Unsolved
27 Posts
0
2132
June 19th, 2012 11:00
SRDF/A link issues: some RDF groups got suspended despite Transmit Idle being enabled
I encountered an odd scenario. We have Transmit Idle enabled for our RDF groups, and during an extended link outage all of the groups went into the TransIdle state, except a few, which went into Suspended instead.
The SRDF/A maximum cache usage was set to 94% (the default), and the overall cache consumption by SRDF/A was 64%. As other applications on the DMX-4 were starting to get affected, especially the BCV syncs, I had to suspend all the RDF groups with a "suspend -immediate" to prevent the capture cycles from accumulating in cache.
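For reference, a sketch of the symcli command I used, assuming a device-file-based suspend (the Symmetrix ID, RDF group number, and device file name below are placeholders, not the real ones from this event):

```
# Suspend an SRDF/A group immediately, dropping the session rather than
# waiting for the current cycle set to drain.
# (-force may be required while the SRDF/A session is active)
symrdf -sid 000190101234 -rdfg 18 -f rdfg18_devs.txt suspend -immediate -force
```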
Cache Size (Mirrored) is 163840 (MB).
I didn't get a chance to analyze why some groups went TransIdle while others were suspended, as I had to prevent production workloads from being affected. But I am still wondering why it happened.
Can anyone throw some light on this one?
anoopcr
148 Posts
0
June 21st, 2012 07:00
It may be due to a difference in session priority.
debmdig
27 Posts
0
June 21st, 2012 09:00
All the RDF groups have the same priority (33); I already checked that.
I analyzed the STP data for the R1 devices, and the %write-miss for those devices was around 5% during the time the event took place. I am assuming the devices may have hit their write pending limit, but I am unable to correlate the write pending limit with SRDF/A behavior.
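In case it helps anyone checking the same things, the per-group session priority, cycle state, and cache slot usage can all be seen in the SRDF/A session display (the SID below is a placeholder):

```
# Show SRDF/A session details for all RDF groups: session state,
# cycle time, session priority, and cache slots in use.
symcfg -sid 000190101234 list -rdfg all -rdfa
```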
debmdig
27 Posts
0
June 21st, 2012 11:00
I managed to identify the error (one of them is below):
===========================
Detection time Dir Src Category Severity Error Num
------------------------ ------ ---- ------------ ------------ ----------
Tue May 01 01:14:53 2012 RF-10D Symm RDF (18) Error 0x004a
SRDF/A Session dropped, write pending limit reached. Host throttling disable
===========================
But I am still unable to find any "write pending limit reached" condition on either the R1 or the R2 devices for RDF group 18.
R1 devs:
R2 Devs:
The WP count for the R2s is below 10.
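For anyone retracing these checks, the system-wide write pending limits and the logged RDF errors can be pulled roughly as below (the SID and date range are placeholders, and the exact -start/-end date format may differ by Solutions Enabler version):

```
# System-wide cache configuration and write pending limits
# (look for the system and device write pending slot maximums):
symcfg -sid 000190101234 list -v

# RDF-related errors logged around the event window:
symevent -sid 000190101234 list -error -start "05/01/2012" -end "05/02/2012"
```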
There is no point involving EMC PS, as they only suggested a bandwidth increase. For now, we have mitigated the issue by moving several I/O-intensive devices out of SRDF/A and putting them under SRDF/AR. I am trying to identify what actually went wrong and how these factors are related, to avoid the same situation in the future.