(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and bastardization of classic Unix
SUSE 11 SP1 and other Linux flavors running on servers with Broadcom gigabit Ethernet cards have an interesting bug: the network connection periodically (but randomly) experiences outages that last several seconds. In rare cases connectivity disappears completely and is never restored. In our case the card was the four-port version:
03:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
03:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
04:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
04:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
The frequency can vary. In our case it was approximately once or twice a week. A typical syslog fragment looks like this:
Case 1:
Jan 29 12:37:24 box1 kernel: [6003080.641907] do_IRQ: 3.172 No irq handler for vector (irq -1)
Jan 29 12:37:32 box1 kernel: [6003089.013055] bnx2: eth0 DEBUG: intr_sem[0]
Jan 29 12:37:32 box1 kernel: [6003089.013061] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[40000088]
Jan 29 12:37:32 box1 kernel: [6003089.013067] bnx2: eth0 DEBUG: MCP_STATE_P0[0003610e] MCP_STATE_P1[0003610e]
Jan 29 12:37:32 box1 kernel: [6003089.013070] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[01f70008]
Jan 29 12:37:32 box1 kernel: [6003089.013073] bnx2: eth0 DEBUG: PBA[00000000]
Jan 29 12:37:32 box1 kernel: [6003089.071200] bnx2: eth0 NIC Copper Link is Down
Jan 29 12:37:35 box1 kernel: [6003092.181218] bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
Case 2:
Jan 18 15:44:04 box1 kernel: [5065630.877480] do_IRQ: 0.82 No irq handler for vector (irq -1)
Jan 18 15:44:04 box1 kernel: klogd 1.4.1, ---------- state change ----------
Jan 18 15:44:11 box1 kernel: [5065638.250502] bnx2: eth0 DEBUG: intr_sem[0]
Jan 18 15:44:11 box1 kernel: [5065638.250507] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[40000088]
Jan 18 15:44:11 box1 kernel: [5065638.250513] bnx2: eth0 DEBUG: MCP_STATE_P0[0003610e] MCP_STATE_P1[0003610e]
Jan 18 15:44:11 box1 kernel: [5065638.250516] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[01bf0040]
Jan 18 15:44:11 box1 kernel: [5065638.250519] bnx2: eth0 DEBUG: PBA[00000000]
Jan 18 15:44:11 box1 kernel: [5065638.308650] bnx2: eth0 NIC Copper Link is Down
Jan 18 15:44:14 box1 kernel: [5065641.580332] bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
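In both cases the link comes back about three seconds after it drops. As a rough sketch, the length of each outage can be computed from the kernel timestamps in square brackets. The function name link_outages is our own, and the sketch assumes a single interface whose Down and Up messages alternate, as in the fragments above:

```shell
# Print the duration (in seconds) of each bnx2 link outage found in
# syslog lines read from stdin. Hypothetical helper; it assumes one
# interface and that every "Link is Up" follows its "Link is Down".
link_outages() {
    awk '
        /bnx2: .* Link is (Down|Up)/ && match($0, /\[ *[0-9]+\.[0-9]+\]/) {
            # Kernel timestamp without the surrounding brackets
            ts = substr($0, RSTART + 1, RLENGTH - 2) + 0
            if ($0 ~ /Link is Down/) {
                down = ts
                have_down = 1
            } else if (have_down) {
                printf "%.3f\n", ts - down
                have_down = 0
            }
        }
    '
}

# Example: link_outages < /var/log/messages
```

Kernel timestamps restart from zero after a reboot, so a Down/Up pair that straddles a reboot would give a meaningless value.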
We have two servers on which this is observed. On one the outages are rare (just two instances in the last 6 months). On the other they are really frequent, roughly every week:
# bzip2 -cd mes*bz2 | grep "Link is Down"
Feb 7 08:54:47 box1 kernel: [1176597.882294] bnx2: eth0 NIC Copper Link is Down
Feb 16 18:22:43 box1 kernel: [1988273.802087] bnx2: eth0 NIC Copper Link is Down
Apr 10 09:18:31 box1 kernel: [6617621.854161] bnx2: eth0 NIC Copper Link is Down
Apr 19 12:38:39 box1 kernel: [7407229.802312] bnx2: eth0 NIC Copper Link is Down
Apr 22 09:24:58 box1 kernel: [7654808.874319] bnx2: eth0 NIC Copper Link is Down
Apr 22 22:39:16 box1 kernel: [7702466.802227] bnx2: eth0 NIC Copper Link is Down
Apr 24 19:00:59 box1 kernel: [7862169.814121] bnx2: eth0 NIC Copper Link is Down
May 7 07:12:33 box1 kernel: [8942863.790287] bnx2: eth0 NIC Copper Link is Down
May 8 15:22:26 box1 kernel: [9058656.814135] bnx2: eth0 NIC Copper Link is Down
May 9 16:55:49 box1 kernel: [9150659.802288] bnx2: eth0 NIC Copper Link is Down
May 21 14:01:58 box1 kernel: [10177028.797703] bnx2: eth0 NIC Copper Link is Down
May 24 14:17:06 box1 kernel: [10437136.862092] bnx2: eth0 NIC Copper Link is Down
Jun 5 15:49:14 box1 kernel: [11479464.790159] bnx2: eth0 NIC Copper Link is Down
Jun 12 10:30:12 box1 kernel: [12065122.790354] bnx2: eth0 NIC Copper Link is Down
Jul 10 08:21:39 box1 kernel: [14476610.813667] bnx2: eth0 NIC Copper Link is Down
Jul 13 10:12:47 box1 kernel: [14742478.790192] bnx2: eth0 NIC Copper Link is Down
Jul 18 16:07:05 box1 kernel: [15195736.814217] bnx2: eth0 NIC Copper Link is Down
Jul 31 19:48:53 box1 kernel: [16332244.802367] bnx2: eth0 NIC Copper Link is Down
Aug 6 15:21:51 box1 kernel: [16834622.814217] bnx2: eth0 NIC Copper Link is Down
Aug 15 12:37:19 box1 kernel: [17602350.789994] bnx2: eth0 NIC Copper Link is Down
Aug 23 11:23:42 box1 kernel: [18289133.790364] bnx2: eth0 NIC Copper Link is Down
Aug 23 14:55:30 box1 kernel: [18301841.802322] bnx2: eth0 NIC Copper Link is Down
Aug 28 14:44:04 box1 kernel: [18733155.790356] bnx2: eth0 NIC Copper Link is Down
Aug 29 14:26:47 box1 kernel: [18818518.882215] bnx2: eth0 NIC Copper Link is Down
Sep 13 15:42:50 box1 kernel: [20119081.874202] bnx2: eth0 NIC Copper Link is Down
Sep 17 13:15:29 box1 kernel: [20455840.814129] bnx2: eth0 NIC Copper Link is Down
Oct 5 14:47:36 box1 kernel: [75141.760185] bnx2: eth0 NIC Copper Link is Down
Oct 10 09:35:14 box1 kernel: [487629.769073] bnx2: eth0 NIC Copper Link is Down
Nov 5 11:26:38 box1 kernel: [2740108.822741] bnx2: eth0 NIC Copper Link is Down
Nov 8 13:50:31 box1 kernel: [3007442.753450] bnx2: eth0 NIC Copper Link is Down
Nov 28 10:23:03 box1 kernel: [648217.869074] bnx2: eth0 NIC Copper Link is Down
Dec 4 15:59:51 box1 kernel: [1185822.121589] bnx2: eth0 NIC Copper Link is Down
Dec 10 06:09:24 box1 kernel: [1667895.065491] bnx2: eth0 NIC Copper Link is Down
Dec 14 15:25:08 box1 kernel: [2046132.868741] bnx2: eth0 NIC Copper Link is Down
Dec 20 10:12:17 box1 kernel: [2544830.746232] bnx2: eth0 NIC Copper Link is Down
Jan 3 10:36:05 box1 kernel: [3753601.877507] bnx2: eth0 NIC Copper Link is Down
Jan 7 16:40:24 box1 kernel: [4120376.092928] bnx2: eth0 NIC Copper Link is Down
Jan 8 14:16:02 box1 kernel: [4197969.222614] bnx2: eth0 NIC Copper Link is Down
Jan 9 12:59:10 box1 kernel: [4279604.816098] bnx2: eth0 NIC Copper Link is Down
Jan 9 17:01:29 box1 kernel: [4294116.717905] bnx2: eth0 NIC Copper Link is Down
Jan 11 09:31:13 box1 kernel: [4439629.059153] bnx2: eth0 NIC Copper Link is Down
Jan 11 15:18:16 box1 kernel: [4460413.248058] bnx2: eth0 NIC Copper Link is Down
Jan 14 14:12:59 box1 kernel: [4715220.529843] bnx2: eth0 NIC Copper Link is Down
Jan 17 16:04:17 box1 kernel: [4980603.066416] bnx2: eth0 NIC Copper Link is Down
# grep "Link is Down" messages
Jan 18 15:44:11 box1 kernel: [5065638.308650] bnx2: eth0 NIC Copper Link is Down
Jan 27 17:13:45 box1 kernel: [5847153.265359] bnx2: eth0 NIC Copper Link is Down
Jan 28 08:14:54 box1 kernel: [5901121.437566] bnx2: eth0 NIC Copper Link is Down
Jan 29 12:37:32 box1 kernel: [6003089.071200] bnx2: eth0 NIC Copper Link is Down
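To see how the frequency changes over time, the same grep can be condensed into a per-month count. This is only a sketch: the function name link_down_per_month is ours, and it relies on the log lines arriving in chronological order, as they do in the listings above.

```shell
# Count "Link is Down" events per month from syslog lines on stdin.
# Hypothetical helper. uniq -c only merges adjacent lines, so the input
# must be in chronological order; the same month of two different years
# is reported separately unless the two runs happen to be adjacent.
link_down_per_month() {
    awk '/Link is Down/ { print $1 }' | uniq -c | awk '{ print $2, $1 }'
}

# Example: bzip2 -cd mes*bz2 | link_down_per_month
```

Each output line is a month abbreviation followed by the number of link-down events logged in that month.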
The culprit is the version of the bnx2 driver used on SUSE 11 SP1. The installed version is pretty old: Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v2.0.4 (Mar 03, 2010).
cd /lib/modules/2.6.32.59-0.7-default/kernel/drivers/net
# ll bnx*
-rw-r--r-- 1 root root 118840 2012-07-14 13:58 bnx2.ko
-rw-r--r-- 1 root root 376944 2012-07-14 13:58 bnx2x.ko
Hardware-wise we have
03:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
03:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
04:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
04:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
In our case bnx2x.ko is not needed, as we do not have a 10 Gb card, only one gigabit card.
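The driver version actually bound to an interface can be confirmed with ethtool -i, which prints a "version:" line among other fields. The small filter below is a sketch for pulling out that one field; the name driver_version is ours, and eth0 in the example is taken from the logs above.

```shell
# Extract the "version:" field from `ethtool -i` output read on stdin.
# Hypothetical helper; "firmware-version:" is deliberately not matched.
driver_version() {
    awk -F': *' '$1 == "version" { print $2; exit }'
}

# Example (requires ethtool and a real interface):
#   ethtool -i eth0 | driver_version
```

On the servers described here this should report 2.0.4 before the driver is updated.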
There are two ways to deal with this situation: disable MSI, or update the driver to version 7.4.27, available from the Broadcom NetXtreme II 1 Gigabit Drivers page.
One optional parameter "disable_msi" can be supplied as a command line argument to the modprobe command for bnx2. This parameter is used to disable Message Signaled Interrupts (MSI) and MSI-X. The parameter is only valid on 2.6/3.x kernels that support MSI/MSI-X. By default, the driver will enable MSI or MSI-X if it is supported by the kernel. MSI-X is only supported on 5709 devices.
The driver will run an interrupt test during initialization to determine if MSI/MSI-X is working. If the test passes, the driver will enable MSI/MSI-X. Otherwise, it will use legacy INTx mode.
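Which mode the driver ended up in can usually be read from /proc/interrupts, where MSI and MSI-X vectors are labelled with a type containing "MSI" (for example PCI-MSI-edge) and legacy interrupts with an IO-APIC type. The following is a sketch, not an official Broadcom tool: the function name irq_mode is ours, and it assumes the driver names its vectors either "eth0" or "eth0-<n>".

```shell
# Report whether an interface's interrupts are MSI/MSI-X or legacy INTx.
# Usage: irq_mode IFACE [FILE]   (FILE defaults to /proc/interrupts)
# Hypothetical helper; assumes vectors are named IFACE or IFACE-<n>.
irq_mode() {
    iface=$1
    file=${2:-/proc/interrupts}
    # The interface name appears at the end of the line or inside a
    # comma-separated list when a legacy IRQ line is shared.
    lines=$(grep -E "[ ,]${iface}(-[0-9]+)?(,|\$)" "$file")
    if [ -z "$lines" ]; then
        echo "unknown"
    elif printf '%s\n' "$lines" | grep -q 'MSI'; then
        echo "msi"
    else
        echo "intx"
    fi
}

# Example: irq_mode eth0
```

Running it before and after setting disable_msi is a quick way to check that the parameter really took effect.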
Set the "disable_msi" parameter to 1 as shown below to always disable MSI/MSI-X on all NetXtreme II NICs in the system.
modprobe bnx2 disable_msi=1
The parameter can also be set in modprobe.conf :
options bnx2 disable_msi=1
In SUSE the preferred place is modprobe.conf.local, not modprobe.conf. Entries made in modprobe.conf can and will be overwritten during an RPM installation, which means you can lose the setting when, for example, the kernel is updated. Entries in modprobe.conf.local override those in modprobe.conf and are never overwritten by an update or installation.
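When the option is rolled out to several servers, it helps to add the line only if it is not already there. The helper below is a minimal sketch of that; the name ensure_line is ours, and the /etc/modprobe.conf.local path in the example is an assumption about where the file lives on your system.

```shell
# Append LINE to FILE unless an identical line is already present.
# Hypothetical helper; returns non-zero if FILE cannot be created.
ensure_line() {
    file=$1
    line=$2
    touch "$file" || return 1
    # -x: match the whole line, -F: treat LINE as a fixed string
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (as root; the path is an assumption):
#   ensure_line /etc/modprobe.conf.local 'options bnx2 disable_msi=1'
```

The setting takes effect the next time the bnx2 module is loaded.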
Broadcom package includes
The current versions of the drivers have been tested on all 2.6.x kernels. The driver may not compile on kernels older than 2.4.24. Testing was performed mainly on i386 and x86_64 architectures. Only limited testing has been done on some other architectures. Minor changes to some source files and Makefile may be needed on some kernels.
Additionally, the Makefile will not compile the cnic driver on kernels older than 2.6.16. iSCSI offload is only supported on 2.6.16 and newer kernels. FCoE offload is only supported on 2.6.32 and newer kernels.
The driver is released in two packaging formats: source RPM and compressed tar formats. The structure of the file name for the source RPM is:
netxtreme2-<version>.src.rpm
The file name for the tar archive is:
netxtreme2-<version>.tar.gz
Identical source files to build the drivers are included in both packages.
Following is a list of files included
1. Install the source RPM package:
rpm -ivh netxtreme2-<version>.src.rpm
2. CD to the RPM path and build the binary driver for your kernel:
cd /usr/src/{redhat,OpenLinux,turbo,packages,rpm ..}
(For RHEL 6.0 and above, cd ~/rpmbuild )
rpm -bb SPECS/netxtreme2.spec
or
rpmbuild -bb SPECS/netxtreme2.spec (for RPM version 4.x.x)
Note that the RPM path is different for different Linux distributions. The driver will be compiled for the running kernel by default. To build the driver for a kernel different than the running one, specify the kernel by defining it in KVER:
rpmbuild -bb SPECS/netxtreme2.spec --define "KVER <kernel version>"
where <kernel version> in the form of 2.x.y-z is the version of another kernel that is installed on the system.
3. Install the newly built package (driver and man page):
rpm -ivh RPMS/<arch>/netxtreme2-<version>.<arch>.rpm
where <arch> is the machine architecture such as i386:
rpm -ivh RPMS/i386/netxtreme2-<version>.i386.rpm
Note that the --force option may be needed on some Linux distributions if conflicts are reported.
The drivers will be installed in the following path:
2.6.16 and newer kernels:
/lib/modules/<kernel_version>/kernel/drivers/net/bnx2.ko
/lib/modules/<kernel_version>/kernel/drivers/net/bnx2x.ko
/lib/modules/<kernel_version>/kernel/drivers/net/cnic.ko
Newer RHEL and SLES distros:
/lib/modules/<kernel_version>/updates/bnx2.ko
/lib/modules/<kernel_version>/updates/cnic.ko
/lib/modules/<kernel_version>/updates/bnx2x.ko
/lib/modules/<kernel_version>/updates/bnx2i.ko
/lib/modules/<kernel_version>/updates/bnx2fc.ko
4. Unload existing driver if necessary:
rmmod bnx2
rmmod bnx2x
If the cnic driver is loaded, it should also be unloaded along with dependent drivers:
rmmod bnx2fc
rmmod bnx2i
rmmod cnic
5. Load the bnx2 driver for the BCM5706/BCM5708/5709/5716 devices:
insmod bnx2.ko
or
modprobe bnx2
To load the bnx2x driver for the BCM57710/BCM57711/BCM57711E/BCM57712 devices:
modprobe bnx2x
a) Reboot the server OR
NOTES:
6. To configure network protocol and address, refer to the documentation for your Linux distribution.
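After the package is installed, the old module under kernel/drivers/net and the new one under updates may both be present, so it is worth checking which copies exist before reloading the driver. The function below is a sketch for that check; its name and its optional ROOT argument (useful for inspecting a mounted image) are our own additions.

```shell
# List every copy of MODULE.ko under ROOT/lib/modules/KVER.
# Usage: list_module_copies MODULE [KVER] [ROOT]
# Hypothetical helper; KVER defaults to the running kernel, ROOT to "".
list_module_copies() {
    module=$1
    kver=${2:-$(uname -r)}
    root=${3:-}
    find "$root/lib/modules/$kver" -name "$module.ko" 2>/dev/null | sort
}

# Example: list_module_copies bnx2
```

Once the copies are known, modinfo -n bnx2 shows the file that modprobe would load, and modinfo -F version bnx2 shows its version.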
Jan 21, 2013
This article describes a specific condition. If you observe an FTQ Dump without the Tx Ring Full error also being logged, the workaround and fix described may not be applicable. If you observe a loss of network connectivity to an ESX/ESXi host without these symptoms, see Determining why an host is labeled as Not Responding and multiple virtual machines are labeled as Disconnected (1019082).
Resolution
This issue occurs when the IRQ balancer disables the Message Signaled Interrupt vector (MSI-X) during a chip reset. The MSI-X vector gets remapped at the beginning of the Base Address Register (BAR). The driver attempts to disable the MSI, but the memory access bit is disabled instead.
The Broadcom bnx2 driver did not complete a chip reset correctly after some condition (eg, a transmission timeout). This results in corruption of the PCI configuration space, which can cause invalid address references (such as 0xffffffff), also seen in dump and logs.
This issue has been observed in bnx2 driver version 2.0.7c.
This issue is resolved in the following asynchronous Broadcom driver releases:
- ESX/ESXi 4.0 – Broadcom driver version 2.0.15g.8.v40.1
- ESX/ESXi 4.1 – Broadcom driver version 2.0.15g.8.v41.1
To resolve this issue, ensure that your ESX/ESXi host has one of these driver versions installed. To download the latest Broadcom NetXtreme II Ethernet Network Controller driver version, see the VMware Download Center.
To work around this issue, disable MSI support in the Broadcom bnx2 driver. This causes the driver to fall back to the PIN-IRQ assertion method of raising an interrupt.
To disable MSI:
- Log into the ESX/ESXi host's terminal directly or via SSH. For additional information, see Connecting to an ESX host using a SSH client (1019852).
- Reconfigure the driver module using the command:
esxcfg-module -s 'disable_msi=1' bnx2
- Reboot the server. The changes are loaded next time the module loads.
- After the ESX/ESXi host has finished booting, verify that disable_msi is set by running the command:
esxcfg-module -g bnx2
NetXtreme II 1 Gigabit Drivers Broadcom
VMware KB ESX-ESXi host loses network connectivity with a Broadcom bnx2 driver FTQ dump
linux - bnx2 and e1000e drivers on RHEL 5.3 detects repeated link
Linux Kernel Documentation PCI MSI-HOWTO.txt
Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
Last modified: March 12, 2019