October 15th, 2019 21:00
Isilon (6.5.2): SMARTFAIL is running and the FlexProtectLin job failed
Version:
Isilon OneFS v6.5.5.12 B_6_5_5_164(RELEASE)
Node-6# isi devices
Node 6, [ATTN]
Bay 1 Lnum 14 [HEALTHY] SN:XSV52J3A /dev/da12
Bay 2 Lnum 13 [HEALTHY] SN:XPV1R2ZA /dev/da11
Bay 3 Lnum 6 [SMARTFAIL] SN:JPW9J0HD1E9PPC /dev/da6
Bay 4 Lnum 12 [SMARTFAIL] SN:JPW9H0N013GRJV /dev/da3
Bay 5 Lnum 1 [HEALTHY] SN:JPW9K0HD2S8N8L /dev/da10
Bay 6 Lnum 4 [HEALTHY] SN:JPW9J0HD1HTK5C /dev/da8
Bay 7 Lnum 7 [SMARTFAIL] SN:JPW9K0HD2B7G5L /dev/da5
Bay 8 Lnum 10 [SMARTFAIL] SN:JPW9K0HD2AY83L /dev/da2
Bay 9 Lnum 2 [HEALTHY] SN:JPW9K0HD2NJDGL /dev/da9
Bay 10 Lnum 5 [HEALTHY] SN:JPW9K0HD2S8KJL /dev/da7
Bay 11 Lnum 8 [SMARTFAIL] SN:JPW9K0HD2S7X1L /dev/da4
Bay 12 Lnum 11 [SMARTFAIL] SN:JPW9K0HD2JA8DL /dev/da1
isi job status -v
Running jobs:
Job Impact Pri Policy Phase Run Time
-------------------------- ------ --- ---------- ----- ----------
FlexProtectLin[225484] Medium 1 MEDIUM 1/2 10:17:57
Progress: Processed 94829185 LINs and 7961 GB: 27009769 files, 67819343
directories; 73 errors
Last 10 of 73 errors
10/15 16:15:14 Node 6: LIN { item={ done=false }
linsid=1:1a56:0bcf::HEAD btree_iter={ done=false depth=0
key_high=0x0000000000000000 key_low=0x0000000000000000 } } fstat failed:
Bad file descriptor
10/15 16:15:14 Node 6: LIN { item={ done=false }
linsid=1:1a56:0be4::HEAD btree_iter={ done=false depth=0
key_high=0x0000000000000000 key_low=0x0000000000000000 } } fstat failed:
Bad file descriptor
10/15 16:15:14 Node 6: LIN { item={ done=false }
linsid=1:3362:a691::HEAD btree_iter={ done=false depth=0
key_high=0x0000000000000000 key_low=0x0000000000000000 } } fstat failed:
Bad file descriptor
10/15 16:15:15 Node 6: LIN { item={ done=false }
linsid=1:3362:a6ff::HEAD btree_iter={ done=false depth=0
key_high=0x0000000000000000 key_low=0x0000000000000000 } } fstat failed:
Bad file descriptor
10/15 16:15:16 Node 6: LIN { item={ done=false }
linsid=1:1a56:0d16::HEAD btree_iter={ done=false depth=0
key_high=0x0000000000000000 key_low=0x0000000000000000 } } fstat failed:
Bad file descriptor
10/15 16:15:16 Node 6: LIN { item={ done=false }
linsid=1:3362:a707::HEAD btree_iter={ done=false depth=0
key_high=0x0000000000000000 key_low=0x0000000000000000 } } fstat failed:
Bad file descriptor
10/15 16:15:16 Node 6: LIN { item={ done=false }
linsid=1:3362:a70e::HEAD btree_iter={ done=false depth=0
key_high=0x0000000000000000 key_low=0x0000000000000000 } } fstat failed:
Bad file descriptor
10/15 16:15:16 Node 6: LIN { item={ done=false }
linsid=1:3362:a71e::HEAD btree_iter={ done=false depth=0
key_high=0x0000000000000000 key_low=0x0000000000000000 } } fstat failed:
Bad file descriptor
10/15 16:15:16 Node 6: LIN { item={ done=false }
linsid=1:3362:a725::HEAD btree_iter={ done=false depth=0
key_high=0x0000000000000000 key_low=0x0000000000000000 } } fstat failed:
Bad file descriptor
10/15 16:15:17 Node 6: LIN { item={ done=false }
linsid=1:1a56:0d40::HEAD btree_iter={ done=false depth=0
key_high=0x0000000000000000 key_low=0x0000000000000000 } } fstat failed:
Bad file descriptor
Paused and waiting jobs:
Job Impact Pri Policy Phase Run Time State
-------------------------- ------ --- ---------- ----- ---------- -------------
SnapshotDelete[225483] Medium 2 MEDIUM 1/1 0:00:00 System Paused
Progress: n/a
FSAnalyze[225468] Low 6 LOW 1/2 12:13:04 System Paused
Progress: Processed 155854989 LINs; 0 errors
MediaScan[190752] Low 8 LOW 1/7 1:44:03 System Paused
Progress: Found 0 ECCs on 1 drive; last completed: 9:0; 1 error
03/31 23:41:54 Node 5: drive 0, sector 524288: Input/output error
Failed jobs:
Job Errors Run Time End Time Retries Left
-------------------------- ------ ---------- --------------- ------------
FlexProtectLin[225482] 400 4d 3:56 10/15 12:44:22 2
Progress: Processed 384986083 LINs and 39 TB: 200862417 files, 184123193
directories; 399 errors
Last 5 of 400 errors
10/14 17:03:16 Node 6: LIN { item={ done=false }
linsid=2:bde2:bf83::HEAD btree_iter={ done=false depth=0
key_high=0x0000000000000000 key_low=0x0000000000000000 } } fstat failed:
Bad file descriptor
10/14 17:03:16 Node 6: LIN { item={ done=false }
linsid=2:bde2:bfa1::HEAD btree_iter={ done=false depth=0
key_high=0x0000000000000000 key_low=0x0000000000000000 } } fstat failed:
Bad file descriptor
10/14 17:03:16 Node 6: LIN { item={ done=false }
linsid=3:1fc9:292b::HEAD btree_iter={ done=false depth=0
key_high=0x0000000000000000 key_low=0x0000000000000000 } } fstat failed:
Bad file descriptor
10/14 17:43:16 Node 6: Bad file descriptor
10/15 12:44:22 Node 6: Phase failed with 399 previous errors
Recent job results:
Time Job Event
--------------- -------------------------- ------------------------------
08/17 17:05:04 SnapshotDelete[225026] Succeeded (MEDIUM)
08/17 17:14:57 SnapshotDelete[225027] Succeeded (MEDIUM)
08/17 17:35:05 SnapshotDelete[225028] Succeeded (MEDIUM)
08/17 17:45:02 SnapshotDelete[225029] Succeeded (MEDIUM)
08/17 17:54:53 SnapshotDelete[225030] Succeeded (MEDIUM)
08/17 21:35:20 SnapshotDelete[225031] Succeeded (MEDIUM)
08/22 01:52:42 SnapshotDelete[225063] Succeeded (MEDIUM)
10/15 12:44:22 FlexProtectLin[225482] Failed
Could you please let us know how to handle this situation?
Regards,
DDSabale
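For anyone triaging similar output: the wrapped error lines above can be reduced to a list of the affected LINs. This is only a minimal sketch, not an Isilon tool. It assumes the `isi job status -v` text has been saved or pasted as a string, and the helper names (`extract_lins`, `group_by_prefix`) are made up for illustration. Grouping by the first two LIN fields just shows which LINs are numerically close to each other; it does not by itself say anything about which directory or drive they belong to.

```python
import re
from collections import Counter

# Excerpt copied from the `isi job status -v` output in this post
# (line wrapping kept exactly as it was pasted).
SAMPLE = """\
10/15 16:15:14 Node 6: LIN { item={ done=false }
linsid=1:1a56:0bcf::HEAD btree_iter={ done=false depth=0
key_high=0x0000000000000000 key_low=0x0000000000000000 } } fstat failed:
Bad file descriptor
10/15 16:15:14 Node 6: LIN { item={ done=false }
linsid=1:3362:a691::HEAD btree_iter={ done=false depth=0
key_high=0x0000000000000000 key_low=0x0000000000000000 } } fstat failed:
Bad file descriptor
10/15 16:15:16 Node 6: LIN { item={ done=false }
linsid=1:1a56:0d16::HEAD btree_iter={ done=false depth=0
key_high=0x0000000000000000 key_low=0x0000000000000000 } } fstat failed:
Bad file descriptor
"""

# Matches the colon-separated hex LIN that precedes "::HEAD".
LIN_RE = re.compile(r"linsid=([0-9a-f]+(?::[0-9a-f]+)*)::HEAD")


def extract_lins(text):
    """Return the distinct LIN strings in first-seen order."""
    return list(dict.fromkeys(LIN_RE.findall(text)))


def group_by_prefix(lins):
    """Count LINs by their first two colon-separated fields (hypothetical grouping)."""
    return Counter(":".join(lin.split(":")[:2]) for lin in lins)


lins = extract_lins(SAMPLE)
groups = group_by_prefix(lins)

for lin in lins:
    print(lin)
for prefix, count in groups.items():
    print(prefix, count)
```

The resulting LIN list is what you would typically hand to support along with the SR.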


Phil.Lam
October 16th, 2019 20:00
Is the Isilon cluster still under a maintenance contract? If so, please open a service request (SR).
DDSabale
October 16th, 2019 21:00
Go.Y
October 17th, 2019 18:00
It looks like multiple disks are being SmartFailed at the same time, so FlexProtectLin is not working properly.
Try running FlexProtect instead.
Peter_Sero
October 18th, 2019 06:00
It seems that exactly the right half of the node has lost connectivity.
If I recall correctly, the 12-disk SATA nodes like the X200 and earlier
have one controller and two expanders, one for each group of six drives.
Check the expander for the right half (seen from the front); if you're lucky
it's only a cabling/connection problem, otherwise it may be the expander itself.
hth
-- Peter
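The "right half" pattern can be checked mechanically from the `isi devices` output in the original post. The sketch below is an illustration only. It ASSUMES the 12 bays sit in 3 rows of 4 and are numbered left to right, top to bottom as seen from the front; that layout is not confirmed anywhere in this thread, so verify it against the node's bay labels. The function names are made up for the example.

```python
import re

# `isi devices` lines copied from the original post.
DEVICES = """\
Bay 1 Lnum 14 [HEALTHY] SN:XSV52J3A /dev/da12
Bay 2 Lnum 13 [HEALTHY] SN:XPV1R2ZA /dev/da11
Bay 3 Lnum 6 [SMARTFAIL] SN:JPW9J0HD1E9PPC /dev/da6
Bay 4 Lnum 12 [SMARTFAIL] SN:JPW9H0N013GRJV /dev/da3
Bay 5 Lnum 1 [HEALTHY] SN:JPW9K0HD2S8N8L /dev/da10
Bay 6 Lnum 4 [HEALTHY] SN:JPW9J0HD1HTK5C /dev/da8
Bay 7 Lnum 7 [SMARTFAIL] SN:JPW9K0HD2B7G5L /dev/da5
Bay 8 Lnum 10 [SMARTFAIL] SN:JPW9K0HD2AY83L /dev/da2
Bay 9 Lnum 2 [HEALTHY] SN:JPW9K0HD2NJDGL /dev/da9
Bay 10 Lnum 5 [HEALTHY] SN:JPW9K0HD2S8KJL /dev/da7
Bay 11 Lnum 8 [SMARTFAIL] SN:JPW9K0HD2S7X1L /dev/da4
Bay 12 Lnum 11 [SMARTFAIL] SN:JPW9K0HD2JA8DL /dev/da1
"""

BAY_RE = re.compile(r"^Bay\s+(\d+)\s+Lnum\s+(\d+)\s+\[(\w+)\]", re.MULTILINE)

# ASSUMPTION: 4 bays per row, numbered left to right (not confirmed in the thread).
COLUMNS = 4


def parse_states(text):
    """Map bay number -> state string, e.g. {3: 'SMARTFAIL'}."""
    return {int(bay): state for bay, _lnum, state in BAY_RE.findall(text)}


def failed_columns(states, columns=COLUMNS):
    """Return the sorted chassis columns (1-based) that contain a SMARTFAIL drive."""
    return sorted({(bay - 1) % columns + 1
                   for bay, state in states.items() if state == "SMARTFAIL"})


def render(states, columns=COLUMNS):
    """Draw the front of the node: 'X' for SMARTFAIL, '.' for anything else."""
    rows = []
    for start in range(1, max(states) + 1, columns):
        rows.append(" ".join(
            "X" if states.get(bay) == "SMARTFAIL" else "."
            for bay in range(start, start + columns)))
    return "\n".join(rows)


states = parse_states(DEVICES)
print(render(states))
print("columns with SMARTFAIL drives:", failed_columns(states))
```

Under that layout assumption, all six SMARTFAIL drives fall in columns 3 and 4, which is what you would expect from a shared component (expander or cabling) rather than six independent drive failures.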