1 Rookie
•
87 Posts
0
34
December 30th, 2024 12:24
Random loss of gateway address on N3000 stack
Issue:
Every few hours the customer complains they lose access to the internet and servers. Pings to the default gateway, which is an L3 address on the VLAN within the same N3000 stack, fail for anywhere between 5 and 30 seconds before returning, and service is restored. Every device is affected, regardless of type or VLAN. It's as if the routing engine is stopping. The issue started about 4 weeks ago, but no one is able to tell me whether anything has changed or been plugged in.
Environment:
2x N3048P switches stacked together. 10 VLANs, all with an L3 interface (.254). There is a default route out to the firewalls and onwards to the internet. The switch stack is the RSTP root and there are no other managed switches in the building.
Observation:
It can sometimes go hours without happening, then can be quite frequent.
show process cpu shows nothing higher than 15%, even over 300 seconds.
The switch logs aren't being very helpful.
The network has been stable for 6 years, customer doesn't have access to the config.
Unlikely to be a malicious attack.
Mitigations:
Firmware updated, no change
Reloaded the stack, no change.
Any help gratefully received.
DELL-Charles R
Moderator
•
4.4K Posts
0
December 30th, 2024 21:18
Hello,
Yes, you can configure “storm-control multicast” on that interface initially, as described on page 1156 of the N3048 CLI document.
Dell EMC Networking N-Series Switches CLI Reference Guide version 6.6.1
https://dell.to/3W0XYTj
We refer to the official documentation for switch configurations. As far as I can see we do not have a "best practice" guide, because customers' requirements vary, so I don't think there is a recommended set of values, just the CLI guide.
DELL-Charles R
Moderator
•
4.4K Posts
0
December 30th, 2024 16:41
Hello,
Random occurrences can be hard to find.
Have you tried verifying that all stacking cables are securely connected and in good condition?
Loose or damaged cables can cause intermittent connectivity problems.
austin-t
1 Rookie
•
87 Posts
0
December 30th, 2024 17:16
Hi,
No, however there are no errors on the stacking ports and the cables have been fine for 6 years. Remembering how solidly they clip in, I'm not sure it's that. Since posting, I have found a PC that was only connected at 10 Mbps and was causing millions of multicasts. Within 5 minutes of clearing the counters it was back to 6 figures. I've toggled the port and all has been quiet since.
If this was the cause, then why was there nothing in the logs to suggest data plane flooding?
I'll post back if this was the fix.
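For a rough sense of scale, the "six figures within 5 minutes" observation can be turned into a frame rate. This is only a sketch: the 5 minute window and the 100,000 to 999,999 range are read off the description above rather than measured, and the function name is made up for illustration.

```python
# Rough multicast rate implied by "six figures within 5 minutes of clearing counters".
# Assumption: the counter was read about 5 minutes (300 s) after being cleared.
WINDOW_SECONDS = 5 * 60


def frames_per_second(frame_count: int, seconds: int = WINDOW_SECONDS) -> float:
    """Average frame rate over the window (hypothetical helper)."""
    return frame_count / seconds


# "Six figures" means somewhere between 100,000 and 999,999 frames.
low = frames_per_second(100_000)
high = frames_per_second(999_999)

print(f"roughly {low:.0f} to {high:.0f} multicast frames per second")
```

This is an average over the window, so short bursts could be considerably higher than either bound.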
DELL-Charles R
Moderator
•
4.4K Posts
0
December 30th, 2024 18:06
Hello,
Good find on the PC. We have seen another case where a server connected to a switch was causing a loop and similar intermittent connectivity problems. So hopefully you have found the trigger for the issue to be the PC.
If there are ping drops in the network and they are being dropped on the switch, there would be dropped packets on an N3048 interface somewhere.
On an N3048 stack running OS6 firmware, the interfaces can be checked for various interface errors using the command “Show Interfaces Counters Errors”. Stack port counters can also be checked for errors using “show switch stack-ports counters”.
austin-t
1 Rookie
•
87 Posts
0
December 30th, 2024 19:43
Thanks for your assistance Charles.
The stack counters are clean:
                 ------------TX-------------- ------------RX--------------
                 Data   Error                 Data   Error
                 Rate   Rate       Total      Rate   Rate       Total
Interface        (Mb/s) (Errors/s) Errors     (Mb/s) (Errors/s) Errors
---------------- ------ ---------- ---------- ------ ---------- ----------
Tw1/0/1          0      0          0          0      0          0
Tw1/0/2          0      0          0          0      0          0
Tw2/0/1          0      0          0          0      0          0
Tw2/0/2          0      0          0          0      0          0
Here are the interface counters from the affected port.
SW#show interfaces counters gi2/0/15
Port      InTotalPkts      InUcastPkts      InMcastPkts      InBcastPkts
--------- ---------------- ---------------- ---------------- ----------------
Gi2/0/15  3316585          55240            3261319          26
Port      OutTotalPkts     OutUcastPkts     OutMcastPkts     OutBcastPkts
--------- ---------------- ---------------- ---------------- ----------------
Gi2/0/15  120226           87783            16713            15730
FCS Errors: ................................... 6
Single Collision Frames: ...................... 0
Late Collisions: .............................. 0
Excessive Collisions: ......................... 0
Multiple Collisions: .......................... 0
Received packets dropped > MTU: ............... 0
Transmitted oversized packets: ................ 0
Internal MAC Rx Errors: ....................... 7
Received Pause Frames: ........................ 0
Transmitted Pause Frames: ..................... 0
Receive Packets Discarded: .................... 12
Transmit Packets Discarded: ................... 0
SW#show interfaces gi2/0/15
Interface Name................................. Gi2/0/15
SOC Hardware Info.............................. BCM56340_A0
Link Status.................................... Up /None
Keepalive Enabled.............................. FALSE
Err-disable Cause.............................. None
VLAN Membership Mode........................... Access Mode
VLAN Membership................................ 30
MTU Size....................................... 9216
Port Mode [Duplex]............................. Full
Port Speed..................................... 10
Link Debounce Flaps............................ 0
Auto-Negotation Status......................... Auto
Burned In MAC Address.......................... E4F0.xxxx.xxxx
L3 MAC Address................................. E4F0.xxxx.xxxx
Sample Load Interval........................... 300
Received Input Rate Bits/Sec................... 0
Received Input Rate Packets/Sec................ 0
Transmitted Input Rate Bits/Sec................ 632
Transmitted Input Rate Packets/Sec : .......... 1
Total frames received without errors........... 3316585
Unicast frames received........................ 55240
Multicast frames received...................... 3261319
Broadcast frames received...................... 26
Total frames received with MAC errors.......... 7
Jabbers received............................... 0
Fragments/Undersize received................... 1
Alignment errors............................... 0
FCS errors..................................... 6
Overruns....................................... 0
Total received frames not forwarded............ 12
Total frames transmitted successfully.......... 120276
Unicast frames transmitted..................... 87783
Multicast frames transmitted................... 16745
Broadcast frames transmitted................... 15748
Transmit frames discarded...................... 0
Total transmit errors.......................... 0
Total transmit frames discarded................ 0
Single collision frames........................ 0
Multiple collision frames...................... 0
Excessive collision frames..................... 0
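To put the receive counters above in proportion, here is a small sketch that works out what share of the frames received on Gi2/0/15 were multicast. The numbers are copied from the output above; the dictionary layout and function name are just for illustration. Note the counters are cumulative since they were last cleared, so this gives a share, not a rate.

```python
# Receive counters for Gi2/0/15, copied from the "show interfaces" output above.
rx_frames = {
    "unicast": 55240,
    "multicast": 3261319,
    "broadcast": 26,
}
rx_total = 3316585  # "Total frames received without errors"


def share_percent(kind: str) -> float:
    """Percentage of received frames of the given kind (hypothetical helper)."""
    return 100 * rx_frames[kind] / rx_total


# Sanity check: the three categories add up to the reported total.
assert sum(rx_frames.values()) == rx_total

multicast_share = share_percent("multicast")
for kind in rx_frames:
    print(f"{kind:<10} {share_percent(kind):6.2f}%")
```

On these figures multicast is a little over 98% of everything the port received, which fits the suspicion that this PC was the source of the flood.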
I presume I need to configure storm-control multicast level to protect against this happening again. I understand every customer is different, but are you able to suggest any best practice values please?
I appreciate this model of switch is no longer in mainstream support, but I am curious to know why the switch can't log some sort of data plane flooding. It's also odd that we don't see any CPU spikes.
Memory Utilization Report
status KBytes
------ ----------
free 240684
alloc 791768
CPU Utilization:
PID Name 5 Secs 60 Secs 300 Secs
---------- ------------------- -------- -------- --------
3 (ksoftirqd/0) 0.09% 0.02% 0.01%
12 (ksoftirqd/1) 0.09% 0.02% 0.01%
1088 (procmgr) 0.00% 0.07% 0.08%
1167 (syncdb) 0.00% 0.02% 0.01%
1199 envMonitorTask 0.09% 0.01% 0.00%
1208 osapiTimer 0.00% 0.02% 0.02%
1211 bcmINTR 0.58% 0.16% 0.09%
1212 socdmadesc.0 0.00% 0.18% 0.18%
1213 bcmMEM_SCAN.0 0.00% 0.01% 0.02%
1215 bcmIbodSync.0 0.00% 0.04% 0.04%
1216 bcmL2X.0 3.50% 3.51% 3.56%
1217 bcmCNTR.0 1.16% 1.32% 1.35%
1222 bcmRX 0.19% 0.09% 0.09%
1224 bcmATP-TX 0.00% 0.01% 0.02%
1225 bcmATP-RX 0.09% 0.02% 0.02%
1236 bcmLINK.0 0.77% 0.81% 0.85%
1237 cpuUtilMonitorTask 0.19% 0.13% 0.13%
1253 tap_monitor_task 0.00% 0.02% 0.02%
1271 dtlTask 0.00% 0.02% 0.01%
1275 hapiRxTask 0.00% 0.00% 0.01%
1283 hapiBroadBufferUsag 0.00% 0.03% 0.03%
1286 hapiBroadBfdCtrlTas 0.19% 0.11% 0.11%
1310 dot1s_timer_task 0.00% 0.01% 0.00%
1324 snoopTask 0.09% 0.04% 0.05%
1335 dhcpsPingTask 0.00% 0.01% 0.01%
1361 ipMapForwardingTask 0.00% 0.00% 0.02%
1382 openrTask 0.58% 0.55% 0.62%
1394 (ospf_app) 0.00% 0.01% 0.00%
1402 bgpMapNbrAutodetect 0.00% 0.01% 0.00%
1414 ip6MapLocalDataTask 0.00% 0.01% 0.00%
1420 lldpTask 0.00% 0.04% 0.05%
1430 isdpTask 0.00% 0.02% 0.02%
1432 RMONTask 0.00% 0.06% 0.07%
1451 StatsAppTask 0.48% 0.50% 0.50%
1492 poeRead.0 0.00% 0.02% 0.02%
1594 poe_monitor 0.09% 0.02% 0.03%
2801 (kworker/0:0) 0.00% 0.02% 0.02%
------------------------------ -------- -------- --------
Total CPU Utilization 8.26% 8.14% 8.29%