5 Posts

October 15th, 2019 21:00

Isilon (6.5.2): SmartFail is running and the FlexProtectLin job failed

Version:

Isilon OneFS v6.5.5.12 B_6_5_5_164(RELEASE)

Node-6# isi devices
Node 6, [ATTN]
Bay 1 Lnum 14 [HEALTHY] SN:XSV52J3A /dev/da12
Bay 2 Lnum 13 [HEALTHY] SN:XPV1R2ZA /dev/da11
Bay 3 Lnum 6 [SMARTFAIL] SN:JPW9J0HD1E9PPC /dev/da6
Bay 4 Lnum 12 [SMARTFAIL] SN:JPW9H0N013GRJV /dev/da3
Bay 5 Lnum 1 [HEALTHY] SN:JPW9K0HD2S8N8L /dev/da10
Bay 6 Lnum 4 [HEALTHY] SN:JPW9J0HD1HTK5C /dev/da8
Bay 7 Lnum 7 [SMARTFAIL] SN:JPW9K0HD2B7G5L /dev/da5
Bay 8 Lnum 10 [SMARTFAIL] SN:JPW9K0HD2AY83L /dev/da2
Bay 9 Lnum 2 [HEALTHY] SN:JPW9K0HD2NJDGL /dev/da9
Bay 10 Lnum 5 [HEALTHY] SN:JPW9K0HD2S8KJL /dev/da7
Bay 11 Lnum 8 [SMARTFAIL] SN:JPW9K0HD2S7X1L /dev/da4
Bay 12 Lnum 11 [SMARTFAIL] SN:JPW9K0HD2JA8DL /dev/da1

 

isi job status -v

Running jobs:
Job Impact Pri Policy Phase Run Time
-------------------------- ------ --- ---------- ----- ----------
FlexProtectLin[225484] Medium 1 MEDIUM 1/2 10:17:57
Progress: Processed 94829185 LINs and 7961 GB: 27009769 files, 67819343
directories; 73 errors
Last 10 of 73 errors
10/15 16:15:14 Node 6: LIN { item={ done=false }
linsid=1:1a56:0bcf::HEAD btree_iter={ done=false depth=0
key_high=0x0000000000000000 key_low=0x0000000000000000 } } fstat failed:
Bad file descriptor
10/15 16:15:14 Node 6: LIN { item={ done=false }
linsid=1:1a56:0be4::HEAD btree_iter={ done=false depth=0
key_high=0x0000000000000000 key_low=0x0000000000000000 } } fstat failed:
Bad file descriptor
10/15 16:15:14 Node 6: LIN { item={ done=false }
linsid=1:3362:a691::HEAD btree_iter={ done=false depth=0
key_high=0x0000000000000000 key_low=0x0000000000000000 } } fstat failed:
Bad file descriptor
10/15 16:15:15 Node 6: LIN { item={ done=false }
linsid=1:3362:a6ff::HEAD btree_iter={ done=false depth=0
key_high=0x0000000000000000 key_low=0x0000000000000000 } } fstat failed:
Bad file descriptor
10/15 16:15:16 Node 6: LIN { item={ done=false }
linsid=1:1a56:0d16::HEAD btree_iter={ done=false depth=0
key_high=0x0000000000000000 key_low=0x0000000000000000 } } fstat failed:
Bad file descriptor
10/15 16:15:16 Node 6: LIN { item={ done=false }
linsid=1:3362:a707::HEAD btree_iter={ done=false depth=0
key_high=0x0000000000000000 key_low=0x0000000000000000 } } fstat failed:
Bad file descriptor
10/15 16:15:16 Node 6: LIN { item={ done=false }
linsid=1:3362:a70e::HEAD btree_iter={ done=false depth=0
key_high=0x0000000000000000 key_low=0x0000000000000000 } } fstat failed:
Bad file descriptor
10/15 16:15:16 Node 6: LIN { item={ done=false }
linsid=1:3362:a71e::HEAD btree_iter={ done=false depth=0
key_high=0x0000000000000000 key_low=0x0000000000000000 } } fstat failed:
Bad file descriptor
10/15 16:15:16 Node 6: LIN { item={ done=false }
linsid=1:3362:a725::HEAD btree_iter={ done=false depth=0
key_high=0x0000000000000000 key_low=0x0000000000000000 } } fstat failed:
Bad file descriptor
10/15 16:15:17 Node 6: LIN { item={ done=false }
linsid=1:1a56:0d40::HEAD btree_iter={ done=false depth=0
key_high=0x0000000000000000 key_low=0x0000000000000000 } } fstat failed:
Bad file descriptor

Paused and waiting jobs:
Job Impact Pri Policy Phase Run Time State
-------------------------- ------ --- ---------- ----- ---------- -------------
SnapshotDelete[225483] Medium 2 MEDIUM 1/1 0:00:00 System Paused
Progress: n/a
FSAnalyze[225468] Low 6 LOW 1/2 12:13:04 System Paused
Progress: Processed 155854989 LINs; 0 errors
MediaScan[190752] Low 8 LOW 1/7 1:44:03 System Paused
Progress: Found 0 ECCs on 1 drive; last completed: 9:0; 1 error
03/31 23:41:54 Node 5: drive 0, sector 524288: Input/output error

Failed jobs:
Job Errors Run Time End Time Retries Left
-------------------------- ------ ---------- --------------- ------------
FlexProtectLin[225482] 400 4d 3:56 10/15 12:44:22 2
Progress: Processed 384986083 LINs and 39 TB: 200862417 files, 184123193
directories; 399 errors
Last 5 of 400 errors
10/14 17:03:16 Node 6: LIN { item={ done=false }
linsid=2:bde2:bf83::HEAD btree_iter={ done=false depth=0
key_high=0x0000000000000000 key_low=0x0000000000000000 } } fstat failed:
Bad file descriptor
10/14 17:03:16 Node 6: LIN { item={ done=false }
linsid=2:bde2:bfa1::HEAD btree_iter={ done=false depth=0
key_high=0x0000000000000000 key_low=0x0000000000000000 } } fstat failed:
Bad file descriptor
10/14 17:03:16 Node 6: LIN { item={ done=false }
linsid=3:1fc9:292b::HEAD btree_iter={ done=false depth=0
key_high=0x0000000000000000 key_low=0x0000000000000000 } } fstat failed:
Bad file descriptor
10/14 17:43:16 Node 6: Bad file descriptor
10/15 12:44:22 Node 6: Phase failed with 399 previous errors

Recent job results:
Time Job Event
--------------- -------------------------- ------------------------------
08/17 17:05:04 SnapshotDelete[225026] Succeeded (MEDIUM)
08/17 17:14:57 SnapshotDelete[225027] Succeeded (MEDIUM)
08/17 17:35:05 SnapshotDelete[225028] Succeeded (MEDIUM)
08/17 17:45:02 SnapshotDelete[225029] Succeeded (MEDIUM)
08/17 17:54:53 SnapshotDelete[225030] Succeeded (MEDIUM)
08/17 21:35:20 SnapshotDelete[225031] Succeeded (MEDIUM)
08/22 01:52:42 SnapshotDelete[225063] Succeeded (MEDIUM)
10/15 12:44:22 FlexProtectLin[225482] Failed

 

Could you please let us know how to handle this situation?

 

Regards,

DDSabale

 

3 Apprentice

 • 

637 Posts

October 16th, 2019 20:00

Is the Isilon cluster still under a maintenance contract? If so, please open an SR.

5 Posts

October 16th, 2019 21:00

Hi Sir, the Isilon is out of support, which is why I raised the issue on the forum. Could you please assist with it? Regards, Dnyaneshwar

2 Intern

 • 

309 Posts

October 17th, 2019 18:00

Since multiple disks appear to be SmartFailing at the same time, FlexProtectLin is not working properly.

Try running FlexProtect instead.
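
Before re-running the job, it may help to confirm where the errors come from. The following is only a rough sketch, not an Isilon tool: it assumes you have saved the `isi job status -v` output as text, and the function name `errors_per_node` is made up for illustration. It counts error entries per node by matching the `MM/DD HH:MM:SS Node N:` prefix seen in the output above.

```python
import re
from collections import Counter

# Start of an error entry, e.g. "10/15 16:15:14 Node 6: LIN { ..."
# (pattern inferred from the pasted output; an assumption, not a documented format)
ERROR_START = re.compile(r"^\d{2}/\d{2} \d{2}:\d{2}:\d{2} Node (\d+):")

def errors_per_node(job_status_text):
    """Count error entries per node; wrapped continuation lines are ignored."""
    counts = Counter()
    for line in job_status_text.splitlines():
        match = ERROR_START.match(line.strip())
        if match:
            counts[int(match.group(1))] += 1
    return counts

sample = """\
10/15 16:15:14 Node 6: LIN { item={ done=false }
linsid=1:1a56:0bcf::HEAD btree_iter={ done=false depth=0
key_high=0x0000000000000000 key_low=0x0000000000000000 } } fstat failed:
Bad file descriptor
10/14 17:43:16 Node 6: Bad file descriptor
03/31 23:41:54 Node 5: drive 0, sector 524288: Input/output error
"""

for node, count in sorted(errors_per_node(sample).items()):
    print(f"Node {node}: {count} error entries")
```

Note that summary lines such as "Phase failed with 399 previous errors" also carry the node prefix, so they are counted as entries too. In the output posted above, every FlexProtectLin error shown comes from Node 6.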

 

4 Operator

 • 

1.2K Posts

October 18th, 2019 06:00

It looks like exactly the right half of the node has lost connectivity.

If I recall correctly, the 12-disk SATA nodes like the X200 and earlier have one controller and two expanders, each serving six drives.

Check the expander for the right half (seen from the front). If you're lucky it's only a cabling/connection problem; otherwise it may be the expander itself.

hth

-- Peter
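
To check the "right half" pattern against the `isi devices` output, something like the sketch below could be used. It is illustrative only: it assumes the bays are numbered left to right, four per row (so bays 3, 4, 7, 8, 11 and 12 form the right two columns), which matches the description above but is not confirmed for every chassis. The helper names `smartfail_bays` and `column_of` are hypothetical.

```python
import re

# Matches lines such as "Bay 3 Lnum 6 [SMARTFAIL] SN:... /dev/da6"
BAY_LINE = re.compile(r"Bay\s+(\d+)\s+Lnum\s+\d+\s+\[(\w+)\]")

def smartfail_bays(isi_devices_text):
    """Return the sorted bay numbers reported as SMARTFAIL."""
    return sorted(
        int(m.group(1))
        for m in BAY_LINE.finditer(isi_devices_text)
        if m.group(2) == "SMARTFAIL"
    )

def column_of(bay, bays_per_row=4):
    """Column (1-based) of a bay, ASSUMING left-to-right numbering, 4 per row."""
    return (bay - 1) % bays_per_row + 1

sample = """\
Bay 1 Lnum 14 [HEALTHY] SN:XSV52J3A /dev/da12
Bay 2 Lnum 13 [HEALTHY] SN:XPV1R2ZA /dev/da11
Bay 3 Lnum 6 [SMARTFAIL] SN:JPW9J0HD1E9PPC /dev/da6
Bay 4 Lnum 12 [SMARTFAIL] SN:JPW9H0N013GRJV /dev/da3
Bay 5 Lnum 1 [HEALTHY] SN:JPW9K0HD2S8N8L /dev/da10
Bay 6 Lnum 4 [HEALTHY] SN:JPW9J0HD1HTK5C /dev/da8
Bay 7 Lnum 7 [SMARTFAIL] SN:JPW9K0HD2B7G5L /dev/da5
Bay 8 Lnum 10 [SMARTFAIL] SN:JPW9K0HD2AY83L /dev/da2
Bay 9 Lnum 2 [HEALTHY] SN:JPW9K0HD2NJDGL /dev/da9
Bay 10 Lnum 5 [HEALTHY] SN:JPW9K0HD2S8KJL /dev/da7
Bay 11 Lnum 8 [SMARTFAIL] SN:JPW9K0HD2S7X1L /dev/da4
Bay 12 Lnum 11 [SMARTFAIL] SN:JPW9K0HD2JA8DL /dev/da1
"""

failed = smartfail_bays(sample)
columns = sorted({column_of(bay) for bay in failed})
print("SMARTFAIL bays:", failed)
print("Columns affected:", columns)
```

With the output from the original post, all six SMARTFAIL bays land in columns 3 and 4 under that layout assumption, which fits a shared expander or cabling fault better than six independent drive failures.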
