June 14th, 2011 21:00

PowerPath problem

Hi, gurus.

I have a question about EMC PowerPath.

Is it normal that when one of two paths dies, I/O on the surviving path is also blocked for about 10 to 15 seconds?

One of our V-MAX FA directors died suddenly, and at that moment all of our Oracle RAC instances crashed simultaneously even though one path was still alive. In a test environment I found that when one path dies (for whatever reason: a SAN switch fault, a storage director fault, and so on), I/O on the other path is blocked for about 10 to 15 seconds. In this case Oracle wait events queued up heavily, and after waiting 70 seconds the Oracle CRS daemon (LMON) killed the instance (according to the alert log).

So I am wondering whether a 10 to 15 second I/O block on the surviving path is normal. If it is not, I think something is misconfigured in the storage, the OS, or PowerPath itself.

Storage: V-MAX with 4 FA directors

Server: IBM p570 (AIX 5.3 / 6.1)

PowerPath version: PowerPath (c) Version 5.3 HF 02 (build 6)

SAN Switch: DB-5100B

Thanks in advance, and sorry for my poor English. :)

June 15th, 2011 00:00

Below is nmon data captured while I was trying to reproduce the situation.

Test:

Ran dd if=/dev/rldev of=/dev/null, then unplugged the FA port's link to the switch.

As you can see, I/O stopped at 3:02:29, and read I/O on fcs4 (the surviving path) was also 0 for about 10 seconds.

Disk Adapter bzdbs01 (KB/s)
Time     fcs2_read  fcs2_write  fcs4_read  fcs4_write
3:02:20    60180.8           0   120361.6           0
3:02:21          0           0     120297           0
3:02:22    60091.2           0   120182.4     15022.8
3:02:23    60150.4           0   240601.7           0
3:02:24    60130.4           0   120260.8           0
3:02:25          0           0   120315.8           0
3:02:26    60160.7        3760   120321.4           0
3:02:27    60161.4           0   240645.7           0
3:02:29          0           0          0           0
3:02:30          0           0          0           0
3:02:31          0           0          0     15050.1
3:02:32          0           0          0           0
3:02:33          0           0          0           0
3:02:34          0           0          0           0
3:02:35          0           0          0           0
3:02:36          0           0          0           0
3:02:37          0           0          0           0
3:02:38          0           0          0           0
3:02:39          0           0          0           0
3:02:41          0           0          0           0
3:02:42          0           0   120507.3           0
3:02:43          0           0          0           0
3:02:44          0           0          0           0
3:02:45          0           0   119529.7           0
3:02:46          0           0   118584.9           0
3:02:47          0           0   119338.4           0
3:02:48          0           0   119512.7           0
3:02:49          0           0   119570.1           0
3:02:50          0           0   120200.7           0
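
For anyone repeating this test, the length of the stall can be read off the samples mechanically instead of by eye. The following is only a rough sketch: longest_stall is a made-up helper name, and it assumes the nmon rows have been saved as plain "time fcs2_read fcs2_write fcs4_read fcs4_write" lines like the ones above. It reports the longest run of consecutive samples in which a given column is 0.

```shell
# longest_stall COLUMN  (hypothetical helper, reads nmon rows on stdin)
# Prints "<first zero sample> <last zero sample> <number of samples>" for
# the longest run of consecutive samples whose COLUMN value is 0.
longest_stall() {
    awk -v col="$1" '
        # Skip header lines or anything that does not start with h:mm:ss.
        $1 !~ /^[0-9]+:[0-9]+:[0-9]+$/ { next }
        # Sample with no traffic in the chosen column: extend the current run.
        $col + 0 == 0 {
            if (n == 0) start = $1
            n++
            if (n > best) { best = n; first = start; last = $1 }
            next
        }
        # Any non-zero sample ends the current run.
        { n = 0 }
        END { if (best) print first, last, best }
    '
}

# Samples from 3:02:27 to 3:02:45, copied from the capture above.
samples='3:02:27 60161.4 0 240645.7 0
3:02:29 0 0 0 0
3:02:30 0 0 0 0
3:02:31 0 0 0 15050.1
3:02:32 0 0 0 0
3:02:33 0 0 0 0
3:02:34 0 0 0 0
3:02:35 0 0 0 0
3:02:36 0 0 0 0
3:02:37 0 0 0 0
3:02:38 0 0 0 0
3:02:39 0 0 0 0
3:02:41 0 0 0 0
3:02:42 0 0 120507.3 0
3:02:43 0 0 0 0
3:02:44 0 0 0 0
3:02:45 0 0 119529.7 0'

# Column 4 is fcs4_read, the surviving path.
printf '%s\n' "$samples" | longest_stall 4
# prints: 3:02:29 3:02:41 12
```

Column 2 (fcs2_read) can be checked the same way. nmon skipped the 3:02:28 and 3:02:40 samples in this capture, so the sample count is slightly lower than the elapsed seconds.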

June 22nd, 2011 03:00

Have you looked into these parameters on AIX? The AIX host connectivity guide mentions an initial 15 seconds during which the host decides that the path is dead and fails over, and it sounds like yours is within that time. Not sure why Oracle is crashing, though.

Fast Fail (fc_err_recov) and Dynamic Tracking (dyntrk)

Look in the host connectivity guide, because you can get different behavior depending on the combination of these two parameters.
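
As a rough sketch of how these two attributes are usually inspected and staged on AIX: the device names fscsi2 and fscsi4 are an assumption (the FC protocol devices that normally sit under the fcs2 and fcs4 adapters from the nmon capture), and the function names are made up for illustration. Whether fast_fail with dyntrk=yes is the right combination for this fabric is something to confirm in the host connectivity guide. The script only prints the commands unless RUN is set to an empty string.

```shell
# Dry run by default: each command is printed, not executed.
# Export RUN="" before running this on the AIX host to execute for real.
RUN=${RUN-echo}

# Show the current fc_err_recov and dyntrk values of each FC protocol device.
show_fc_attrs() {
    for dev in "$@"; do
        $RUN lsattr -El "$dev" -a fc_err_recov -a dyntrk
    done
}

# Stage fast fail and dynamic tracking on each device. With -P the change is
# written to the ODM only and takes effect at the next reboot, so no paths
# have to be taken down while the disks are in use.
set_fast_fail() {
    for dev in "$@"; do
        $RUN chdev -l "$dev" -a fc_err_recov=fast_fail -a dyntrk=yes -P
    done
}

# fscsi2 and fscsi4 are assumed names; check yours with: lsdev -C | grep fscsi
show_fc_attrs fscsi2 fscsi4
```

Calling set_fast_fail fscsi2 fscsi4 prints the matching chdev commands in the same way. Afterwards, powermt display dev=all shows how PowerPath sees the state of each path.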
