1 Rookie
•
19 Posts
0
2094
June 22nd, 2022 17:00
Getting correctable ECC error event details from iDRAC
Hi
I have a few Dell PowerEdge R720, R620, etc. machines running ESXi 6.5.
I see a correctable ECC error event in the iDRAC UI on one of them. When I SSH into its iDRAC IP and run
/admin1-> racadm getsel
it gives:
Record: 4
Date/Time: 06/15/2022 17:21:59
Source: system
Severity: Non-Critical
Description: Persistent correctable memory errors detected on a memory device at location(s) DIMM1,DIMM2,DIMM3,DIMM4,DIMM5,DIMM6,DIMM7,DIMM8.
I want to know how I can get this information outside the iDRAC, on the ESXi server. Is there any way to do an HTTP GET or something similar to fetch such errors from the iDRAC back to the ESXi server?
I tried running
ipmitool -v sel list
which gives the output below:
SEL Record ID : 0004
Record Type : 02
Timestamp : 06/15/2022 17:21:59
Generator ID : 0041
EvM Revision : 04
Sensor Type : Memory
Sensor Number : 53
Event Type : Sensor-specific Discrete
Event Direction : Assertion Event
Event Data : 00ffff
Description : Correctable ECC
But ipmitool does not tell me which DIMM had the error. I can see that in the iDRAC UI, but I want to know what command or API I can run on the ESXi host to get the failing DIMM information outside the iDRAC. Or can I somehow make an HTTP GET request to the iDRAC IP to get the 'racadm getsel' data?
Thank you
DELL-Young E
Moderator
•
5.2K Posts
0
June 29th, 2022 20:00
"How do I install it on esx host or idrac IP?"
On the iDRAC IP.
DELL-Shine K
4 Operator
•
3K Posts
1
June 22nd, 2022 21:00
You can install iDRAC Tools on the ESXi OS and run the "racadm getsel" or "racadm lclog" command to retrieve the logs.
iDRAC Tools for ESXi 6.5 can be downloaded from the link below:
https://www.dell.com/support/home/en-us/drivers/driversdetails?driverid=jywk1&oscode=xi65&productcode=poweredge-r620
Details on the racadm lclog command can be found at the link below:
https://www.dell.com/support/manuals/en-us/idrac7-8-lifecycle-controller-v2.50.50.50/idrac_2.50.50.50_racadm/lclog?guid=guid-a0e7cfb8-d01c-4695-afa9-dff5976feebd&lang=en-us
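Once racadm runs on the host, its text output can be turned into structured records. The following is only a sketch, assuming the output keeps the "Key: value" layout shown earlier in this thread and that the racadm binary from iDRAC Tools is on PATH. The function names parse_getsel, dimm_locations and run_getsel are made up for illustration and are not part of any Dell tool.

```python
import subprocess

# Sample text copied from the 'racadm getsel' output quoted in this thread.
SAMPLE = """\
Record: 4
Date/Time: 06/15/2022 17:21:59
Source: system
Severity: Non-Critical
Description: Persistent correctable memory errors detected on a memory device at location(s) DIMM1,DIMM2,DIMM3,DIMM4,DIMM5,DIMM6,DIMM7,DIMM8.
"""


def parse_getsel(text):
    """Split 'Key: value' text into one dict per SEL record."""
    records = []
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank lines, separators, prompts without a colon
        key, value = key.strip(), value.strip()
        if key == "Record":
            # A 'Record' line starts a new entry.
            current = {}
            records.append(current)
        if current is not None:
            current[key] = value
    return records


def dimm_locations(record):
    """Pull the comma-separated location list out of the Description text."""
    desc = record.get("Description", "")
    marker = "location(s) "
    if marker not in desc:
        return []
    tail = desc.split(marker, 1)[1].rstrip(".")
    return [loc.strip() for loc in tail.split(",")]


def run_getsel(racadm="racadm"):
    """Run racadm locally. Assumption: the binary is on PATH on this host."""
    return subprocess.check_output([racadm, "getsel"], universal_newlines=True)


if __name__ == "__main__":
    for rec in parse_getsel(SAMPLE):
        print(rec["Record"], rec["Severity"], dimm_locations(rec))
```

On a real host, SAMPLE would be replaced by the string returned from run_getsel(). The location names are whatever the iDRAC reports, so this does not map them to the silkscreen labels on the board.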
upceo
1 Rookie
•
19 Posts
0
June 23rd, 2022 19:00
Hi
Thanks, it worked. I downloaded the driver and can run the racadm commands that I need.
I just want to know whether I need to download different drivers for different server models (such as C-series or tower models) and different generations. Is there any way to automate the installation process on ESXi hosts?
Thanks again.
upceo
1 Rookie
•
19 Posts
0
June 25th, 2022 00:00
Hi
The racadm getsel output shows the following:
Description: Persistent correctable memory errors detected on a memory device at location(s) DIMM1,DIMM2,DIMM3,DIMM4,DIMM5,DIMM6,DIMM7,DIMM8.
But the actual DIMM slot silkscreen labels on the Dell PowerEdge R720 machine are DIMM_A1, DIMM_A2, DIMM_A3, DIMM_A4, and DIMM_B1, DIMM_B2, DIMM_B3, DIMM_B4.
DELL-Shine K
4 Operator
•
3K Posts
0
June 26th, 2022 05:00
The same iDRAC Tools installer covers a set of operating systems and a set of platforms. See the "Compatible Systems" and "Supported Operating Systems" sections at the link below for details.
https://www.dell.com/support/home/en-us/drivers/driversdetails?driverid=jywk1
DELL-Shine K
4 Operator
•
3K Posts
0
June 26th, 2022 05:00
Do you have the latest iDRAC and BIOS firmware installed on the server? Can you also check whether the Lifecycle log shows the correct information?
upceo
1 Rookie
•
19 Posts
0
June 27th, 2022 16:00
Hi DELL-Shine K,
This is what I see in racadm lclog view:
SeqNumber = 5339
Message ID = MEM0000
Category = System
AgentID = SEL
Severity = Warning
Timestamp = 2022-06-23 22:37:24
Message = Persistent correctable memory errors detected on a memory device at location(s) DIMM1,DIMM2,DIMM3,DIMM4,DIMM5,DIMM6,DIMM7,DIMM8.
Message Arg 1 = DIMM1,DIMM2,DIMM3,DIMM4,DIMM5,DIMM6,DIMM7,DIMM8
RawEventData = 0x07,0x00,0x02,0x23,0xEB,0xB4,0x62,0x81,0x10,0x04,0x0C,0x53,0x6F,0x00,0xFF,0xFF
FQDD =
--------------------------------------------------------------------------------
It shows the same DIMM locations, but there are two extra fields, RawEventData and FQDD.
What do they mean?
Thank you
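On the RawEventData question: the 16 bytes look like a standard IPMI system event record (record type 0x02), which is what ipmitool -v sel list prints field by field. FQDD is Dell's Fully Qualified Device Descriptor and is simply empty in this entry. The sketch below decodes the bytes under that assumption. The field layout follows the IPMI 2.0 SEL event record format, decode_sel_record is a made-up name, and how Dell encodes the DIMM location in the last two event data bytes is OEM-specific and not decoded here.

```python
from datetime import datetime, timezone

# RawEventData copied from the lclog entry quoted in this thread.
RAW = "0x07,0x00,0x02,0x23,0xEB,0xB4,0x62,0x81,0x10,0x04,0x0C,0x53,0x6F,0x00,0xFF,0xFF"


def decode_sel_record(raw):
    """Decode a 16-byte IPMI system event record (assumed layout, see above)."""
    data = bytes(int(tok, 16) for tok in raw.split(","))
    if len(data) != 16:
        raise ValueError("expected 16 bytes, got %d" % len(data))
    return {
        "record_id": int.from_bytes(data[0:2], "little"),
        "record_type": data[2],
        # Seconds since 1970-01-01, stored least significant byte first.
        "timestamp": int.from_bytes(data[3:7], "little"),
        "generator_id": int.from_bytes(data[7:9], "little"),
        "evm_revision": data[9],
        "sensor_type": data[10],    # 0x0C is the Memory sensor type
        "sensor_number": data[11],
        # Bit 7 clear means assertion; the low 7 bits are the event type.
        "assertion": (data[12] & 0x80) == 0,
        "event_type": data[12] & 0x7F,  # 0x6F is sensor-specific discrete
        "event_data": data[13:16].hex(),
    }


if __name__ == "__main__":
    rec = decode_sel_record(RAW)
    when = datetime.fromtimestamp(rec["timestamp"], timezone.utc)
    print("record", rec["record_id"], "at", when.strftime("%Y-%m-%d %H:%M:%S"))
    print("sensor type 0x%02X, sensor number 0x%02X, event data %s"
          % (rec["sensor_type"], rec["sensor_number"], rec["event_data"]))
```

Read this way, sensor number 0x53 and event data 00ffff match the ipmitool fields shown in the first post, and the timestamp bytes, read as UTC, land within a second of the Timestamp field in this lclog entry.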
DELL-Young E
Moderator
•
5.2K Posts
0
June 27th, 2022 17:00
Hi, have you ever changed the memory? It doesn't look like the configuration the system had when you first got it.
https://dell.to/3bxe0j4
DELL-Shine K
4 Operator
•
3K Posts
0
June 27th, 2022 20:00
Can you share the iDRAC and BIOS firmware versions installed on the server?
upceo
1 Rookie
•
19 Posts
0
June 28th, 2022 16:00
Hi DELL-Shine K,
This is what I see in UI.
Bios Version: 2.2.2
Firmware version: 1.56.55 (Build 05)
How do I upgrade if I have to?
Thanks
upceo
1 Rookie
•
19 Posts
0
June 28th, 2022 16:00
Hi DELL-Young E,
I haven't changed the memory. I need to know the exact location.
The racadm command shows:
Message = Persistent correctable memory errors detected on a memory device at location(s) DIMM1,DIMM2,DIMM3,DIMM4,DIMM5,DIMM6,DIMM7,DIMM8.
But the actual slots are DIMM A1 to A12 and DIMM B1 to B12.
Thanks
DELL-Young E
Moderator
•
5.2K Posts
0
June 28th, 2022 17:00
https://dell.to/3bxrThl
We can try a BIOS and iDRAC update.
For the R620, the official website lists BIOS 2.9.0 and iDRAC 2.65.65.65.
Support for PowerEdge R620 | Drivers & Downloads | Dell Cayman Islands
upceo
1 Rookie
•
19 Posts
0
June 29th, 2022 17:00
https://www.dell.com/support/home/en-ky/drivers/driversdetails?driverid=0ghf4&oscode=xi65&productcode=poweredge-r720
Is this where I download iDRAC with Lifecycle Controller v. 2.65.65.65?
It looks like a .exe file.
How do I install it: on the ESXi host or on the iDRAC IP?
I tried to upload this .exe file in the iDRAC console under Overview > iDRAC Settings > Update and Rollback > Update,
but I keep getting an "extraction failed" message.
Thanks
upceo
1 Rookie
•
19 Posts
0
June 29th, 2022 20:00
I was able to update the iDRAC to v. 2.65.65.65 by extracting the .exe first and then uploading the extracted file.
But the racadm commands still show a weird DIMM location, such as DIMM62:
SeqNumber = 5418
Message ID = MEM0000
Category = System
AgentID = SEL
Severity = Warning
Timestamp = 2022-06-29 00:14:52
Message = Persistent correctable memory errors detected on a memory device at location(s) Card A DIMM62.
Message Arg 1 = Card A DIMM62
RawEventData = 0x0E,0x00,0x02,0x7C,0x99,0xBB,0x62,0x81,0x10,0x04,0x0C,0x53,0x6F,0x00,0x07,0x20
This entry is for an error I injected using the ipmitool event command.
There are only DIMM A1-12 and B1-12 slots. I don't understand why it shows Card A DIMM62.
Would appreciate the help
thanks
DELL-Shine K
4 Operator
•
3K Posts
0
June 29th, 2022 20:00
Can you also update the BIOS to the latest version and check the behavior? You can download the .EXE file (Windows Update Package) and use iDRAC to perform the BIOS update from the "Overview > iDRAC Settings > Update and Rollback > Update" page.