1 Rookie

 • 

19 Posts

2094

June 22nd, 2022 17:00

Getting correctable ECC error event details from iDRAC

Hi

I have a few Dell PowerEdge R720, R620, etc. machines that are running ESXi 6.5.

I see a correctable ECC error event in the iDRAC UI on one of them. When I SSH into its iDRAC IP and run
 /admin1-> racadm getsel
it gives:
Record:     4
Date/Time: 06/15/2022 17:21:59
Source:     system
Severity:   Non-Critical
Description: Persistent correctable memory errors detected on a memory device at location(s) DIMM1,DIMM2,DIMM3,DIMM4,DIMM5,DIMM6,DIMM7,DIMM8.

I want to know how I can get this information outside the iDRAC, on the ESXi server. Is there any way to do an HTTP GET or something similar to fetch such errors from the iDRAC to the ESXi server?

I tried running
ipmitool -v sel list
which gives the output below:
SEL Record ID   : 0004
 Record Type           : 02
 Timestamp             : 06/15/2022 17:21:59
 Generator ID         : 0041
 EvM Revision         : 04
 Sensor Type           : Memory
 Sensor Number         : 53
 Event Type           : Sensor-specific Discrete
 Event Direction       : Assertion Event
 Event Data           : 00ffff
 Description           : Correctable ECC

But ipmitool does not tell me which DIMM had the error. I can see that in the iDRAC UI, but I want to know what command or API I can run on the ESXi host to get the failing DIMM information outside the iDRAC. Or can I make an HTTP GET request to the iDRAC IP to get the 'racadm getsel' data?

Thank you

Moderator

 • 

5.2K Posts

June 29th, 2022 20:00

"How do I install it on esx host or idrac IP?" Use the iDRAC IP.

4 Operator

 • 

3K Posts

June 22nd, 2022 21:00

You can install iDRAC Tools on the ESXi OS and run the "racadm getsel" or "racadm lclog" command to retrieve the logs.

iDRAC Tools for ESXi 6.5 can be downloaded from the link below:

https://www.dell.com/support/home/en-us/drivers/driversdetails?driverid=jywk1&oscode=xi65&productcode=poweredge-r620 

Details on the racadm lclog command can be found at the link below:

https://www.dell.com/support/manuals/en-us/idrac7-8-lifecycle-controller-v2.50.50.50/idrac_2.50.50.50_racadm/lclog?guid=guid-a0e7cfb8-d01c-4695-afa9-dff5976feebd&lang=en-us 
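
For anyone who wants to script this, here is a minimal Python sketch that parses "racadm getsel" text into records and pulls out the DIMM locations. It assumes the output keeps the "Key: value" layout shown in the first post. The function names parse_getsel and dimm_locations are hypothetical, and the sketch only parses text that you have already captured from the command.

```python
# Sample copied from the "racadm getsel" output in the first post.
SAMPLE = """Record:     4
Date/Time: 06/15/2022 17:21:59
Source:     system
Severity:   Non-Critical
Description: Persistent correctable memory errors detected on a memory device at location(s) DIMM1,DIMM2,DIMM3,DIMM4,DIMM5,DIMM6,DIMM7,DIMM8.
"""


def parse_getsel(text):
    """Split getsel-style text into one dict per record (hypothetical helper)."""
    records = []
    current = None
    for line in text.splitlines():
        # Split at the first colon only, so timestamps stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Record":
            # A "Record" line starts a new entry.
            current = {}
            records.append(current)
        if current is not None:
            current[key] = value
    return records


def dimm_locations(description):
    """Return the comma-separated items that follow 'location(s)'."""
    marker = "location(s)"
    if marker not in description:
        return []
    tail = description.split(marker, 1)[1].strip().rstrip(".")
    return [item.strip() for item in tail.split(",") if item.strip()]


if __name__ == "__main__":
    for record in parse_getsel(SAMPLE):
        print(record["Record"], dimm_locations(record.get("Description", "")))
```

The locations come back exactly as iDRAC reports them, so they still need to be matched to the slot labels on the board by some other means.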

1 Rookie

 • 

19 Posts

June 23rd, 2022 19:00

Hi,

Thanks. It worked. I downloaded the driver and can run the racadm commands that I need.
I just want to know whether I need to download different drivers for different server models, such as C-series or tower models, and for different generations. Is there any way to automate the installation process on ESXi hosts?

Thanks again.

1 Rookie

 • 

19 Posts

June 25th, 2022 00:00

Hi

The racadm getsel output shows the following:

Description: Persistent correctable memory errors detected on a memory device at location(s) DIMM1,DIMM2,DIMM3,DIMM4,DIMM5,DIMM6,DIMM7,DIMM8.

But the actual DIMM slot silkscreen labels on the Dell PowerEdge R720 are DIMM_A1, DIMM_A2, DIMM_A3, DIMM_A4 and DIMM_B1, DIMM_B2, DIMM_B3, DIMM_B4.

How can I map DIMM1, DIMM2, DIMM3, DIMM4, DIMM5, DIMM6, DIMM7, DIMM8 to the actual DIMM slots DIMM_A1, DIMM_A2, DIMM_A3, DIMM_A4 and DIMM_B1, DIMM_B2, DIMM_B3, DIMM_B4?

Why is the iDRAC not showing the actual DIMM slot labels in the SEL log?
 
Thanks
 

4 Operator

 • 

3K Posts

June 26th, 2022 05:00

The iDRAC Tools installer is the same for a given set of operating systems and platforms. You can use the link below for those details; refer to the "Compatible Systems" and "Supported Operating Systems" sections.

https://www.dell.com/support/home/en-us/drivers/driversdetails?driverid=jywk1 

4 Operator

 • 

3K Posts

June 26th, 2022 05:00

Do you have the latest iDRAC and BIOS firmware installed on the server? Can you also check whether the Lifecycle log is showing the correct information?

1 Rookie

 • 

19 Posts

June 27th, 2022 16:00

Hi DELL-Shine K,

This is what I see in racadm lclog view:

SeqNumber = 5339
Message ID = MEM0000
Category = System
AgentID = SEL
Severity = Warning
Timestamp = 2022-06-23 22:37:24
Message = Persistent correctable memory errors detected on a memory device at location(s) DIMM1,DIMM2,DIMM3,DIMM4,DIMM5,DIMM6,DIMM7,DIMM8.
Message Arg 1 = DIMM1,DIMM2,DIMM3,DIMM4,DIMM5,DIMM6,DIMM7,DIMM8
RawEventData = 0x07,0x00,0x02,0x23,0xEB,0xB4,0x62,0x81,0x10,0x04,0x0C,0x53,0x6F,0x00,0xFF,0xFF

FQDD =
--------------------------------------------------------------------------------

It shows similar DIMM locations, but there are extra RawEventData and FQDD fields.

What do they mean?

Thank you
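
On the RawEventData question: the 16 bytes appear to be the raw IPMI SEL record, since the last three bytes (0x00, 0xFF, 0xFF) match the "Event Data: 00ffff" that ipmitool printed and byte 12 matches sensor number 53. FQDD is Dell's abbreviation for Fully Qualified Device Descriptor; it is empty in this entry. The sketch below decodes the bytes under the assumption that they follow the standard 16-byte system event record layout from the IPMI specification. The function name decode_sel_record is hypothetical, and how iDRAC derives a DIMM name from these bytes is not something this sketch covers.

```python
import struct

# RawEventData copied from the lclog entry above.
RAW = "0x07,0x00,0x02,0x23,0xEB,0xB4,0x62,0x81,0x10,0x04,0x0C,0x53,0x6F,0x00,0xFF,0xFF"


def decode_sel_record(raw):
    """Decode a 16-byte IPMI system event record (assumed record type 0x02)."""
    data = bytes(int(tok.strip(), 16) for tok in raw.split(","))
    if len(data) != 16:
        raise ValueError("expected 16 bytes, got %d" % len(data))
    # Little-endian: record ID (2 bytes), record type (1), timestamp (4),
    # generator ID (2), then seven single-byte fields.
    (record_id, record_type, timestamp, generator_id, evm_rev,
     sensor_type, sensor_number, dir_type, ed1, ed2, ed3) = struct.unpack(
        "<HBIHBBBBBBB", data)
    return {
        "record_id": record_id,
        "record_type": record_type,
        "timestamp": timestamp,                # seconds since 1970-01-01
        "generator_id": generator_id,
        "evm_rev": evm_rev,
        "sensor_type": sensor_type,            # 0x0C is the Memory sensor type
        "sensor_number": sensor_number,
        "assertion": (dir_type & 0x80) == 0,   # top bit clear means assertion
        "event_type": dir_type & 0x7F,         # 0x6F is sensor-specific discrete
        "event_data": (ed1, ed2, ed3),
    }


if __name__ == "__main__":
    for name, value in decode_sel_record(RAW).items():
        print(name, value)
```

In this record the second and third event data bytes are both 0xFF, which would explain why ipmitool has no DIMM detail to print from the standard fields.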

Moderator

 • 

5.2K Posts

June 27th, 2022 17:00

Hi, have you ever changed the memory? It doesn't look like the configuration from when you first got the system.

 

https://dell.to/3bxe0j4

1 Attachment

4 Operator

 • 

3K Posts

June 27th, 2022 20:00

Can you share the iDRAC and BIOS firmware versions installed on the server?

1 Rookie

 • 

19 Posts

June 28th, 2022 16:00

Hi DELL-Shine K,

This is what I see in the UI:

Bios Version: 2.2.2

Firmware version: 1.56.55 (Build 05)

How do I upgrade if I have to?

Thanks

1 Rookie

 • 

19 Posts

June 28th, 2022 16:00

Hi DELL-Young E,

I haven't changed the memory. I need to know the exact location.

The racadm command shows:
Message = Persistent correctable memory errors detected on a memory device at location(s) DIMM1,DIMM2,DIMM3,DIMM4,DIMM5,DIMM6,DIMM7,DIMM8. 

But the actual slots are DIMM A1 to A12 and DIMM B1 to B12.

Thanks

Moderator

 • 

5.2K Posts

June 28th, 2022 17:00

https://dell.to/3bxrThl

 

We can try a BIOS and iDRAC update.

For the R620, the official website lists BIOS 2.9.0 and iDRAC 2.65.65.65.

 

Support for PowerEdge R620 | Drivers & Downloads | Dell Cayman Islands

 

1 Rookie

 • 

19 Posts

June 29th, 2022 17:00


https://www.dell.com/support/home/en-ky/drivers/driversdetails?driverid=0ghf4&oscode=xi65&productcode=poweredge-r720
Is this where I download iDRAC with Lifecycle Controller v. 2.65.65.65?

It looks like an .exe file.
How do I install it, on the ESXi host or through the iDRAC IP?
I tried to upload this .exe file in the iDRAC console under Overview > iDRAC Settings > Update and Rollback > Update, but I keep getting an "extraction failed" message.

Thanks

1 Rookie

 • 

19 Posts

June 29th, 2022 20:00

I was able to update the iDRAC to v. 2.65.65.65 after first extracting the .exe and then uploading.
But the racadm commands still show a weird DIMM location like DIMM 62:

SeqNumber       = 5418
Message ID      = MEM0000
Category        = System
AgentID         = SEL
Severity        = Warning
Timestamp       = 2022-06-29 00:14:52
Message         = Persistent correctable memory errors detected on a memory device at location(s) Card A DIMM62.
Message Arg   1 = Card A DIMM62
RawEventData    = 0x0E,0x00,0x02,0x7C,0x99,0xBB,0x62,0x81,0x10,0x04,0x0C,0x53,0x6F,0x00,0x07,0x20

This entry is for an error I injected using the ipmitool event command.
There are only DIMM A1-A12 and B1-B12 slots. I don't understand how it shows Card A DIMM62.

Would appreciate the help

thanks
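
To compare entries like these across hosts, here is a minimal Python sketch that parses "racadm lclog view" text into records and lists the reported location for each MEM0000 entry. It assumes the "Key = value" layout and the dashed separator shown in this thread. The names parse_lclog and memory_locations are hypothetical, and the sketch works on text that has already been captured from the command.

```python
# Two entries copied from the lclog output posted in this thread.
SAMPLE = """SeqNumber = 5339
Message ID = MEM0000
Category = System
AgentID = SEL
Severity = Warning
Timestamp = 2022-06-23 22:37:24
Message Arg 1 = DIMM1,DIMM2,DIMM3,DIMM4,DIMM5,DIMM6,DIMM7,DIMM8
RawEventData = 0x07,0x00,0x02,0x23,0xEB,0xB4,0x62,0x81,0x10,0x04,0x0C,0x53,0x6F,0x00,0xFF,0xFF

FQDD =
--------------------------------------------------------------------------------
SeqNumber       = 5418
Message ID      = MEM0000
Category        = System
AgentID         = SEL
Severity        = Warning
Timestamp       = 2022-06-29 00:14:52
Message Arg   1 = Card A DIMM62
RawEventData    = 0x0E,0x00,0x02,0x7C,0x99,0xBB,0x62,0x81,0x10,0x04,0x0C,0x53,0x6F,0x00,0x07,0x20
"""


def parse_lclog(text):
    """Split lclog-style text into one dict per entry (hypothetical helper)."""
    records = []
    current = {}
    for line in text.splitlines():
        if line.startswith("---"):
            # A dashed line closes the current entry.
            if current:
                records.append(current)
                current = {}
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        # Collapse padding so "Message Arg   1" and "Message Arg 1" match.
        key = " ".join(key.split())
        if key == "SeqNumber" and current:
            records.append(current)
            current = {}
        current[key] = value.strip()
    if current:
        records.append(current)
    return records


def memory_locations(records):
    """Return (SeqNumber, location text) for each MEM0000 entry."""
    return [(r.get("SeqNumber"), r.get("Message Arg 1", ""))
            for r in records if r.get("Message ID") == "MEM0000"]


if __name__ == "__main__":
    for seq, location in memory_locations(parse_lclog(SAMPLE)):
        print(seq, location)
```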

4 Operator

 • 

3K Posts

June 29th, 2022 20:00

Can you also update the BIOS to the latest version and check the behavior? You can download the .EXE file (Windows Update Package) and use iDRAC to perform the BIOS update from the "Overview > iDRAC Settings > Update and Rollback > Update" page.
