1 Rookie

 • 

9 Posts

March 24th, 2023 04:00

R720 Diagnostics - how to?

I need a step-by-step guide to running HW diagnostics / troubleshooting on my R720, which does not have an OS installed (I'm trying to run Unraid on it).

The server seems fine but, after a varying amount of time (anything from 10 minutes to 4+ hours), it locks up completely. I have the Enterprise level on the Lifecycle Controller, but the HW diagnostics are missing from it.

Can someone give me a guide on how to approach this and which downloads I'd need? I've checked everything on this system and even given it new memory, but the result is the same, so I feel it could be a HW fault somewhere. The support results are confusing, to say the least.

Thanks

Moderator

 • 

9.5K Posts

March 24th, 2023 08:00

Martouff,

 

You can find the walkthrough on running the hardware diagnostics here. If you don't have the hardware diagnostics option, you may want to check whether the server is up to date. Would you confirm the BIOS, iDRAC/LCC, and RAID controller versions?
Another option would be one of the diagnostics found under the R720 on the support page here.

 

Let me know if this helps.

 

 

 

1 Rookie

 • 

9 Posts

March 31st, 2023 03:00

Forgot to add: the Service Tag is {{{Svc Tag Removed by Moderator}}} (expired)

1 Rookie

 • 

9 Posts

March 31st, 2023 03:00

Thanks Chris, and sorry for the late reply.

I'm struggling to run the hardware diagnostics. They are missing from the LC, and I need to locate the correct file to update the LC with.

I loaded the SLI system from USB and ran the memtest: no issues. I tried to run the processor test but couldn't find a start button, possibly due to the resolution of my older monitor. I'll try again today in a virtual console or, alternatively, with a different monitor.

I had real issues with it yesterday, though, and gave up. After a varying amount of time, the system locks up and stops responding, and it needs a reboot each time. I think I would like to remove all the components and check the connections for each one. From the logs I'm able to see in the LC, no issues are being reported apart from a 'runtime critical stop has occurred' entry from when I was working on the server in December. There is no information available for this error, and I put the investigation to one side for a few months.

My BIOS is v2.9 and LC/iDRAC is 2.65.65.65, so pretty much up to date (apart from the HW diagnostics).

Any suggestions about pulling each component to check?

When I purchased this server, the SAS board was loose on arrival (due to shipping), but it is working OK and I've flashed it to IT mode with no issues, as I want to run Unraid/NAS. I think I'll pull all but the minimum memory and all the drives, and start with the LAN card before the processors (luckily I have thermal compound for them).

I thought this would be an easy system to work with.  My PE2900 is much less trouble!
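
Since the lock-ups happen at unpredictable times, it may help to record exactly when the server stops responding. Below is a minimal sketch, not a tested tool, that polls the server from a second machine over TCP and logs each check. The host address, port, polling interval, and log file name are all hypothetical placeholders, and it assumes something on the server (for example the Unraid web UI) is listening on that port.

```python
import socket
import time
from datetime import datetime


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_failure(samples):
    """Given (timestamp, reachable) pairs in time order, return the timestamp
    of the first failed check that follows at least one successful check,
    or None if the server never dropped after being seen up."""
    seen_up = False
    for stamp, ok in samples:
        if ok:
            seen_up = True
        elif seen_up:
            return stamp
    return None


def watch(host: str, port: int, interval: float = 30.0,
          logfile: str = "heartbeat.log") -> None:
    """Poll until interrupted (Ctrl+C), appending one line per check."""
    with open(logfile, "a") as log:
        while True:
            state = "up" if is_reachable(host, port) else "DOWN"
            log.write(f"{datetime.now().isoformat()} {state}\n")
            log.flush()  # make sure the line survives if this machine is interrupted
            time.sleep(interval)
```

Calling something like `watch("192.168.1.50", 80)` (address and port are placeholders) on another PC would leave a log whose first `DOWN` line approximates the time of the freeze, which can then be compared with the timestamps in the LC log.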

 

Moderator

 • 

3.8K Posts

March 31st, 2023 07:00

Hello,

A random freeze is not easy to diagnose.

I think the idea of removing all the components and testing is the right one. You can test one memory bank at a time and see whether you get the same behaviour. Often this kind of issue can be resolved by replacing the motherboard and processor. You can get an estimate by contacting support.
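
Because the freezes were reported anywhere from 10 minutes to 4+ hours after start-up, a short clean run does not prove that a given configuration is good. The following is a rough sketch of how the results of each pull-and-test round could be recorded and judged. The 8-hour pass threshold and the configuration names are assumptions made for illustration only, not Dell guidance.

```python
from dataclasses import dataclass

# Assumption: freezes were seen up to 4+ hours in, so a run only counts as a
# provisional pass if it lasts well beyond that. 8 hours is an arbitrary choice.
MIN_PASS_HOURS = 8.0


@dataclass
class Run:
    config: str   # what was installed, e.g. "CPU1 + one memory bank"
    hours: float  # how long the run lasted
    froze: bool   # True if the system locked up during the run


def verdict(run: Run, min_pass_hours: float = MIN_PASS_HOURS) -> str:
    """Classify one test run as 'fail', 'pass', or 'inconclusive'."""
    if run.froze:
        return "fail"
    if run.hours >= min_pass_hours:
        return "pass"
    # No freeze, but the run was too short to rule out a late lock-up.
    return "inconclusive"


def summarize(runs):
    """Map each configuration name to the verdict of its run."""
    return {run.config: verdict(run) for run in runs}


if __name__ == "__main__":
    # Hypothetical results, for illustration only.
    runs = [
        Run("minimum memory, no drives, no LAN card", 12.0, False),
        Run("minimum memory + LAN card", 1.5, True),
        Run("minimum memory + drives", 2.0, False),
    ]
    for config, result in summarize(runs).items():
        print(f"{result:13s} {config}")
```

Any run marked "inconclusive" would need to be repeated for longer before the parts in that configuration are cleared.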

Thanks

 

Moderator

 • 

3.8K Posts

March 31st, 2023 07:00

Could you please remove the service tag from the public forum? It is private data. Thanks

1 Rookie

 • 

9 Posts

March 31st, 2023 08:00

Marco

I'll do the pull and test and post any results, but it just isn't financially viable to replace whole components. The whole system with 16 drives cost approx. GBP200, but I will, if necessary, look at second-hand items with warranty. I really cannot justify spending much more on this.

1 Rookie

 • 

9 Posts

March 31st, 2023 08:00

Marco

The service tag was asked for by Chris. Thanks for removing it; I didn't know where I should have posted it.

1 Rookie

 • 

9 Posts

April 3rd, 2023 08:00

Hi again

My memory is not at fault. I did think the network daughter card was causing the problem, but I'm still not convinced after removing and refitting it.

Can you tell me how to install the LC repair from the download I have? It asks for the drive (no problem) and then for the file path, but the USB stick hasn't been assigned a drive letter from what I can see. I've tried every drive letter, as well as just \ and / on their own, but it cannot find the .usc file on the drive. If I could get the HW diagnostics working, it might help identify the issue(s). In the meantime, I'll carry on testing until I hear from you guys. I'll also try the server update utility.

Thanks

Moderator

 • 

9.5K Posts

April 3rd, 2023 09:00

Do you have access to the server via the iDRAC GUI? If so, you can run it from the update tab.
