February 15th, 2016 02:00

SDC drv_cfg --query_guid causes kernel panic on CentOS 7

Steps to reproduce:

  • Install CentOS 7 (minimal install)
  • Execute yum update
  • Compile and install ixgbe 4.3.13 (needed for Ethernet controller: Intel Corporation Ethernet Connection X552/X557-AT 10GBASE-T)
  • Install ScaleIO using the gateway
  • Execute /opt/emc/scaleio/sdc/bin/drv_cfg --query_guid on any SDC

Output of the crash utility for the kdump vmcore:

      KERNEL: /usr/lib/debug/lib/modules/3.10.0-327.4.5.el7.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2016-02-12-17:41:36/vmcore  [PARTIAL DUMP]
        CPUS: 8
        DATE: Fri Feb 12 17:41:23 2016
      UPTIME: 00:12:39
LOAD AVERAGE: 1.21, 1.10, 0.69
       TASKS: 199
    NODENAME: node2
     RELEASE: 3.10.0-327.4.5.el7.x86_64
     VERSION: #1 SMP Mon Jan 25 22:07:14 UTC 2016
     MACHINE: x86_64  (2200 Mhz)
      MEMORY: 31.9 GB
       PANIC: "BUG: unable to handle kernel paging request at 00007fff3e4b8570"
         PID: 3366
     COMMAND: "drv_cfg"
        TASK: ffff88084d073980  [THREAD_INFO: ffff880836604000]
         CPU: 7
       STATE: TASK_RUNNING (PANIC)
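A quick way to read the PANIC line: on x86_64 with 48-bit virtual addresses (4-level paging, which this 3.10 kernel uses), user-space addresses lie at or below 0x00007fffffffffff and kernel addresses start at 0xffff800000000000. The faulting address 00007fff3e4b8570 falls in the user-space half, which suggests (but does not prove) that kernel code dereferenced a user-space pointer while handling the drv_cfg request. The Python sketch below is a hypothetical helper, not part of ScaleIO; it only applies those two boundaries.

```python
# Canonical x86_64 address ranges with 48-bit virtual addressing (4-level paging).
USER_TOP = 0x0000_7FFF_FFFF_FFFF       # highest user-space address
KERNEL_BOTTOM = 0xFFFF_8000_0000_0000  # lowest kernel-space address


def classify_x86_64_address(addr: int) -> str:
    """Return 'user', 'kernel' or 'non-canonical' for a 64-bit virtual address."""
    if not 0 <= addr <= 0xFFFF_FFFF_FFFF_FFFF:
        raise ValueError("address must fit in 64 bits")
    if addr <= USER_TOP:
        return "user"
    if addr >= KERNEL_BOTTOM:
        return "kernel"
    # Anything in between is the non-canonical hole.
    return "non-canonical"


if __name__ == "__main__":
    # Values copied from the crash output above.
    for label, value in (("PANIC address", "00007fff3e4b8570"),
                         ("TASK", "ffff88084d073980")):
        print(f"{label} {value}: {classify_x86_64_address(int(value, 16))}")
```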

Output of lspci:

00:00.0 Host bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DMI2 (rev 02)
00:01.0 PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 1 (rev 02)
00:02.0 PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 2 (rev 02)
00:02.2 PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 2 (rev 02)
00:03.0 PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 3 (rev 02)
00:05.0 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Map/VTd_Misc/System Management (rev 02)
00:05.1 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D IIO Hot Plug (rev 02)
00:05.2 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D IIO RAS/Control Status/Global Errors (rev 02)
00:05.4 PIC: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D I/O APIC (rev 02)
00:14.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI (rev 05)
00:16.0 Communication controller: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 (rev 04)
00:16.1 Communication controller: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #2 (rev 04)
00:1a.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2 (rev 05)
00:1c.0 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #1 (rev d5)
00:1c.4 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #5 (rev d5)
00:1d.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1 (rev 05)
00:1f.0 ISA bridge: Intel Corporation C224 Series Chipset Family Server Standard SKU LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] (rev 05)
00:1f.3 SMBus: Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller (rev 05)
00:1f.6 Signal processing controller: Intel Corporation 8 Series Chipset Family Thermal Management Controller (rev 05)
02:00.0 System peripheral: Intel Corporation Xeon Processor D Family QuickData Technology Register DMA Channel 0
02:00.1 System peripheral: Intel Corporation Xeon Processor D Family QuickData Technology Register DMA Channel 1
02:00.2 System peripheral: Intel Corporation Xeon Processor D Family QuickData Technology Register DMA Channel 2
02:00.3 System peripheral: Intel Corporation Xeon Processor D Family QuickData Technology Register DMA Channel 3
03:00.0 Ethernet controller: Intel Corporation Ethernet Connection X552/X557-AT 10GBASE-T
03:00.1 Ethernet controller: Intel Corporation Ethernet Connection X552/X557-AT 10GBASE-T
06:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 03)
07:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 30)
ff:0b.0 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D R3 QPI Link 0/1 (rev 02)
ff:0b.1 Performance counters: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D R3 QPI Link 0/1 (rev 02)
ff:0b.2 Performance counters: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D R3 QPI Link 0/1 (rev 02)
ff:0b.3 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D R3 QPI Link Debug (rev 02)
ff:0c.0 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent (rev 02)
ff:0c.1 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent (rev 02)
ff:0c.2 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent (rev 02)
ff:0c.3 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent (rev 02)
ff:0f.0 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent (rev 02)
ff:0f.4 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent (rev 02)
ff:0f.5 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent (rev 02)
ff:0f.6 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Caching Agent (rev 02)
ff:10.0 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D R2PCIe Agent (rev 02)
ff:10.1 Performance counters: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D R2PCIe Agent (rev 02)
ff:10.5 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Ubox (rev 02)
ff:10.6 Performance counters: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Ubox (rev 02)
ff:10.7 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Ubox (rev 02)
ff:12.0 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Home Agent 0 (rev 02)
ff:12.1 Performance counters: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Home Agent 0 (rev 02)
ff:13.0 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Target Address/Thermal/RAS (rev 02)
ff:13.1 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Target Address/Thermal/RAS (rev 02)
ff:13.2 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder (rev 02)
ff:13.3 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder (rev 02)
ff:13.4 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder (rev 02)
ff:13.5 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Memory Controller 0 - Channel Target Address Decoder (rev 02)
ff:13.6 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DDRIO Channel 0/1 Broadcast (rev 02)
ff:13.7 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DDRIO Global Broadcast (rev 02)
ff:1e.0 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 02)
ff:1e.1 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 02)
ff:1e.2 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 02)
ff:1e.3 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 02)
ff:1e.4 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 02)
ff:1f.0 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 02)
ff:1f.2 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit (rev 02)

I have been able to reproduce it several times and I'm currently out of ideas to make it work.

Thanks a lot in advance for any help and best regards,

Luis Semprun

February 15th, 2016 07:00

Can you please post the output of:

1. uname -a

2. cat /etc/centos-release

February 15th, 2016 23:00

uname -a

Linux node2 3.10.0-327.4.5.el7.x86_64 #1 SMP Mon Jan 25 22:07:14 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux


cat /etc/centos-release

CentOS Linux release 7.2.1511 (Core)

February 16th, 2016 04:00

Unfortunately, CentOS 7.2 is not yet supported.

Please downgrade to CentOS 7.1 and make sure SELinux is disabled.
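Since support here depends on the CentOS minor release, it can help to map the uname -r output to that release. RHEL/CentOS 7 minor releases ship distinct base kernel builds: 3.10.0-123 (7.0), 3.10.0-229 (7.1) and 3.10.0-327 (7.2), and later errata kernels keep the base build number (3.10.0-327.4.5 is still 7.2). A minimal Python sketch relying only on that numbering; the function name is hypothetical, and /etc/centos-release remains the authoritative source.

```python
import platform
import re
from typing import Optional

# Base kernel build numbers shipped with RHEL/CentOS 7 minor releases.
EL7_BASE_BUILDS = {123: "7.0", 229: "7.1", 327: "7.2"}

# Matches e.g. "3.10.0-327.4.5.el7.x86_64" or "3.10.0-229.el7.x86_64".
_EL7_KERNEL = re.compile(r"^3\.10\.0-(\d+)\..*el7")


def el7_release_from_kernel(release: str) -> Optional[str]:
    """Return the EL7 minor release for a `uname -r` string, or None if unknown."""
    match = _EL7_KERNEL.match(release)
    if match is None:
        return None
    return EL7_BASE_BUILDS.get(int(match.group(1)))


if __name__ == "__main__":
    running = platform.release()  # same string that `uname -r` prints
    print(running, "->", el7_release_from_kernel(running) or "not a known EL7 base build")
```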

February 18th, 2016 04:00

Hi,


Unfortunately, it also crashes on CentOS 7.1.


uname -a

Linux node1 3.10.0-229.el7.x86_64 #1 SMP Fri Mar 6 11:36:42 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

cat /etc/centos-release

CentOS Linux release 7.1.1503 (Core)

getenforce

Disabled

Mounting, reading, and writing work as expected (so far). My guess is that there is a bug in the drv_cfg tool when it is installed on bare metal, since it works when the OS is virtualized.

Any ideas?

February 18th, 2016 14:00

semprunl,

I just tested this on an (admittedly different, but similar) virtual machine on both 7.1.1503 and 7.2.1511.

I am unable to get drv_cfg to fail this way, which jibes with your experience of "if virtualized, no issue".

Do you have successful volume access otherwise? Is it only --query_guid that it crashes on, or any drv_cfg command?

Did you check the md5sum of the RHEL 7 installer archive, or re-download and extract it on other boxes and compare md5sums there, to verify it's not just a faulty copy of the SDC binary?
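A minimal sketch of that check in Python, assuming you have copied the SDC package from two or more sources onto one host. The script name md5check.py and the file paths passed on the command line are up to you (none come from this thread). It streams each file through hashlib.md5 and reports whether all digests agree, the same comparison you would make by eye with md5sum.

```python
import hashlib
import sys


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(paths: list[str]) -> tuple[bool, dict[str, str]]:
    """Hash every path; the copies match if exactly one distinct digest is seen."""
    digests = {path: md5_of_file(path) for path in paths}
    return len(set(digests.values())) == 1, digests


if __name__ == "__main__":
    paths = sys.argv[1:]
    if len(paths) < 2:
        print("usage: md5check.py FILE FILE [FILE ...]")
    else:
        same, digests = copies_match(paths)
        for path, digest in digests.items():
            print(f"{digest}  {path}")
        print("all copies match" if same else "MISMATCH: copies differ")
```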

Since it works virtualized, I'd investigate the differences in the driver modules loaded on bare-metal versus virtual installs, and test with and without them.
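One way to act on the module comparison, sketched in Python: save /proc/modules from a bare-metal SDC and from a virtualized SDC (the file names modules.baremetal and modules.vm are hypothetical), then diff the sets of module names. The first whitespace-separated field of each /proc/modules line is the module name; the remaining fields are ignored here.

```python
from pathlib import Path


def loaded_modules(proc_modules_text: str) -> set[str]:
    """Return module names (first field of each non-empty /proc/modules line)."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def diff_modules(bare_metal: str, virtual: str) -> tuple[set[str], set[str]]:
    """Return (only on bare metal, only on the VM) module-name sets."""
    bm, vm = loaded_modules(bare_metal), loaded_modules(virtual)
    return bm - vm, vm - bm


if __name__ == "__main__":
    # Hypothetical file names: copy /proc/modules from each host beforehand.
    bm_path, vm_path = Path("modules.baremetal"), Path("modules.vm")
    if bm_path.exists() and vm_path.exists():
        only_bm, only_vm = diff_modules(bm_path.read_text(), vm_path.read_text())
        print("only on bare metal:", sorted(only_bm))
        print("only on VM:", sorted(only_vm))
    else:
        print("save /proc/modules as modules.baremetal and modules.vm first")
```

Modules that appear only on the bare-metal side (for example the self-compiled NIC driver) are the natural candidates to unload or swap out one at a time.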


The fact that you need to compile the Ethernet driver also doesn't reassure me, and I wonder whether the kernel is 100% happy with the NIC.

A non-answer, I know, but let me know what you find.

February 19th, 2016 03:00

Hi Rush,

I do have successful volume access, but the following drv_cfg commands crash the kernel:

--query_diag_counters

--query_guid


The md5sum is valid, and the compiled driver is the latest release of the ixgbe driver for Linux, which supports kernel versions 2.6.18 through 4.3.3. It has also been tested on the following distributions:

  - RHEL 6.7

  - RHEL 7.2

  - SLES 11SP4

  - SLES 12 SP1

I also tried an older version of ixgbe (4.1.2), which was tested with RHEL 7.1, but the outcome was the same (kernel crash).

Thanks for taking a look.

February 19th, 2016 03:00

It may be useful to know that I cannot reproduce the crash on CentOS 6.7 using the same server and the same Ethernet driver.

April 21st, 2016 05:00

Can you please supply the get_info output (including the crash dump, if you have it)?

Either run Collect Logs from the IM web UI (Maintenance view), or run /opt/emc/scaleio/mdm/bin/get_info.sh

April 22nd, 2016 05:00

Hi Tomer,

We will try to reproduce it in the lab and escalate to L3 if necessary.

Thank you,

Pawel

September 22nd, 2016 01:00

Hi, was anyone able to solve the kernel crash problem? We ran into the same issue, and we found that the crash is related to the Intel Xeon E5 v4 CPU (on v3, the SDC works fine).

Thanks,

Matas
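One hypothesis that fits the reports in this thread (crashes on bare-metal Xeon E5 v4 / Xeon D, none on Xeon E5 v3, in virtual machines, or on CentOS 6.7), though it is not confirmed here or in the KB reply, is Supervisor Mode Access Prevention (SMAP). SMAP arrived with Broadwell-generation CPUs such as Xeon E5 v4 and Xeon D; when the kernel enables it, a kernel-mode access to a user-space address outside the sanctioned copy routines faults, which would match the user-space address in the PANIC line. Hypervisors do not always expose SMAP to guests, and older kernels may not use it. The Python sketch below only reads which feature flags the kernel reports in /proc/cpuinfo; the helper name is hypothetical.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect the CPU feature flags listed on 'flags' lines of /proc/cpuinfo."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    flags = cpu_flags(cpuinfo.read_text() if cpuinfo.exists() else "")
    # smap/smep: supervisor-mode protections; hypervisor: running as a guest.
    for name in ("smap", "smep", "hypervisor"):
        print(f"{name}: {'yes' if name in flags else 'no'}")
```

If smap shows up on the crashing hosts and not on the working ones, that points in this direction; booting once with the nosmap kernel parameter would be a diagnostic test of the idea, not a fix.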

September 22nd, 2016 05:00

Hi, it seems we found the solution: SDC version EMC-ScaleIO-sdc-2.0-7120.0.el7.x86_64.rpm doesn't crash the kernel.

Our setup: CentOS 7, kernel 3.10.0-327.36.1.el7.x86_64

Host bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DMI2 (rev 01)

Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+

September 22nd, 2016 08:00

Hi Matas,

It sounds like this KB article:

https://support.emc.com/kb/486909

The drv_cfg problem is fixed in ScaleIO 2.0.0.2 and later.

Thanks!

Pawel
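To check a node against the fix release quoted above, a small Python sketch that compares dotted version strings numerically (so 2.0.0.10 sorts after 2.0.0.2). It assumes you already have the installed ScaleIO release in the four-part dotted form; the RPM file name mentioned earlier (2.0-7120.0) uses a different build numbering, and the mapping between the two is not stated in this thread. The function names are hypothetical.

```python
FIXED_IN = "2.0.0.2"  # drv_cfg fix release quoted in this thread


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '2.0.0.2' into (2, 0, 0, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.strip().split("."))


def has_drv_cfg_fix(installed: str, fixed_in: str = FIXED_IN) -> bool:
    """True if the installed dotted version is at or above the fix release."""
    return parse_version(installed) >= parse_version(fixed_in)


if __name__ == "__main__":
    for candidate in ("2.0.0.1", "2.0.0.2", "2.0.1"):
        print(candidate, "fixed" if has_drv_cfg_fix(candidate) else "affected")
```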
