Unsolved

February 8th, 2012 23:00

Trace mysterious IO burst

Hi DMX experts,

We have a DMX4 doing around 12-14K IOPS at peak; each front-end director usually doesn't exceed 5K IOPS.  Recently we had an incident where performance degraded on multiple VMs (VMware ESXi 4.1 cluster).  Looking at the performance stats from ECC Performance Manager, one thing jumped out: one of the front-end directors shows a mysterious IO burst, jumping to a million IOPS in one short 15-minute interval (WLA collects data points every 15 minutes on our ECC).  I know 1 million IOPS sounds crazy, but that's what the graph shows for that single interval.  This front-end director serves VMware, Windows, and Linux hosts as one of their paths.  EMC Support checked our box and confirmed the hardware is OK.  Since the hardware is OK, the other possibility I can think of is that one of the SAN hosts misbehaved and sent a flood of bad SCSI operations.  Has anyone experienced a similar situation?  Any suggestions on how to look for the rogue SAN host would be greatly appreciated.

Thank you
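
For anyone reading later: one way to make "this interval jumped out" less of an eyeball judgment is to export the per-director samples and flag any interval that sits far above the typical value. The snippet below is only a minimal sketch under assumptions: the sample list, the timestamps, and the `factor` threshold are made-up illustrations (not real data from this array), and it assumes the stats have already been exported into (timestamp, value) pairs by some means outside the script.

```python
from statistics import median

def find_bursts(samples, factor=20.0):
    """Return (timestamp, value) pairs whose value exceeds factor x the median.

    samples: list of (timestamp, value) pairs, one per collection interval.
    factor:  hypothetical threshold; tune it to your own baseline.
    """
    values = [v for _, v in samples]
    baseline = median(values)
    if baseline <= 0:
        return []  # no meaningful baseline to compare against
    return [(ts, v) for ts, v in samples if v > factor * baseline]

# Illustrative 15-minute samples for one front-end director (invented values).
samples = [
    ("22:00", 4200.0),
    ("22:15", 4600.0),
    ("22:30", 1_000_000.0),
    ("22:45", 4400.0),
    ("23:00", 4100.0),
]

bursts = find_bursts(samples)
for ts, value in bursts:
    print(f"possible burst at {ts}: {value:,.0f}")
```

The median is used as the baseline instead of the mean because a single huge sample would drag the mean up and hide itself. Running the same check per host or per port, if those stats are available, would narrow down where the burst came from.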

1.3K Posts

February 9th, 2012 05:00

I doubt it is a rogue host; more likely it is a rogue data collection issue.  I would assume that those IOPS are not real.

February 10th, 2012 10:00

The performance impact was real; multiple systems were sluggish during that time.  This counter jumped out as an abnormality, though I'm not sure whether it will lead to anything.  Thanks

1.3K Posts

February 10th, 2012 10:00

Again, I would bet really good money that it is a counter error and isn't real.

February 10th, 2012 10:00

Thanks Quincy56,

I dug deeper into the ECC Performance Manager stats and, like you said, these are not real IOPS.  Rather, the counter is REQUESTS per second (read and write requests).  I researched the definition of requests per second: it has to do with the number of cache slots accessed by the FE.  So what caused millions of requests per second?  I'll send an update when I have anything from EMC Support.

1.3K Posts

February 10th, 2012 10:00

So my guess is that whatever caused the performance issue also caused the counter issue.

Can you run a symaudit command and see if there was some event that happened at the same time?
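
Once the audit records are in hand, matching them against the spike interval is a simple time-window filter. The following is only a sketch under assumptions: it does not parse real symaudit output, the event list and dates are hypothetical placeholders, and it assumes the records have already been turned into (datetime, description) pairs.

```python
from datetime import datetime, timedelta

def events_near(events, spike_start, interval_minutes=15, slack_minutes=5):
    """Return events that fall inside the spike interval, plus some slack.

    events:      list of (datetime, description) pairs (hypothetical format).
    spike_start: start of the collection interval that shows the burst.
    """
    start = spike_start - timedelta(minutes=slack_minutes)
    end = spike_start + timedelta(minutes=interval_minutes + slack_minutes)
    return [(ts, text) for ts, text in events if start <= ts <= end]

# Placeholder events; the timestamps and descriptions are invented.
events = [
    (datetime(2012, 2, 1, 21, 10), "hypothetical event A"),
    (datetime(2012, 2, 1, 22, 37), "hypothetical event B"),
    (datetime(2012, 2, 1, 23, 30), "hypothetical event C"),
]

spike_start = datetime(2012, 2, 1, 22, 30)
matches = events_near(events, spike_start)
for ts, text in matches:
    print(ts.isoformat(), text)
```

The slack on either side is there because the stats are only sampled every 15 minutes, so the triggering event may sit just outside the interval that shows the burst.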

1.3K Posts

May 8th, 2012 12:00

Did you get to the root of this eventually?
