IT Resource Center Forums > Linux > general

NIC Bonding on RHEL 4

Ross Kennedy
Jul 20, 2005 05:53:44 GMT    Attachment is 263459.Zip 

Hi
I'm trying to set up Bonding on a DL380G4.
OS is RHEL 4 with the latest patches plus the latest HP Softpaqs (7.30).

While the box is still on the network, I don't think I have bonding set up properly (if at all), as only eth0 is "live". It doesn't fail over to eth1 if I disconnect eth0.

Before attempting bonding, I did check that both NICs work!

I've attached a file with my config files and the output from ifconfig. Somewhere I read that ifconfig should show bond0 as well as eth0 and eth1 but on my system ifconfig only shows bond0.

I've lost count of the number of manuals I've read. Also trawled through this forum and everything looks as though bonding is set correctly.

Can anyone suggest how I go about troubleshooting?
Patrick Terlisten This member has accumulated 2500 or more points
Jul 20, 2005 06:36:43 GMT  6 pts

Hi Ross,

look at this Bugzilla Report:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=159500

Other question: Can you reach the server with ping or ssh? Is the server available on the network? Can you bring eth0 and eth1 up with "ifup eth0" or "ifup eth1"?

Regards,
Patrick
Stuart Browne This member has accumulated 7500 or more points
Jul 20, 2005 06:48:08 GMT  6 pts

Hrm.. Start by trying the HP supplied 'bcm5700' driver instead of the RH provided 'tg3' driver ( http://h18000.www1.hp.com/support/files/server/us/download/22318.html ).

If that still fails, also look at the 'HP tested bonding driver' ( http://h18000.www1.hp.com/support/files/server/us/download/22271.html ).

Also, what sort of switch are the other ends of the network cables plugged into? Have you set up trunking for the two ports (or does the switch auto-detect it)?
Ross Kennedy
Jul 20, 2005 07:17:43 GMT    N/A: Question Author Attachment is 263462.txt 

Thanks for the reply.
I can connect to the box quite happily. PING and SSH both work.

Here is what happens when I run ifup.

[root@penguin ~]# ifup eth0
tg3 device eth0 does not seem to be present, delaying initialization.
[root@penguin ~]# ifup eth1
Enslaving eth1 to bond0

See attachment

eth1 makes an appearance in ifconfig (good) and netstat (which isn't what I would expect) so you are on the right track in that eth1 appears to start disabled. I can see in netstat -i that the RX count for eth1 is increasing but the TX count is not changing.

However, the bonding function still isn't working. To me, it looks as though I have 2 NICs called bond0 and eth1 with the same IP and Ethernet addresses, but they are not bonded.
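
One way to check whether the slaves are really enslaved is to read the status file the Linux bonding driver exposes under /proc/net/bonding. The sketch below is a minimal helper, assuming the usual "Slave Interface:" / "MII Status:" layout of that file; the function name bond_slaves is made up for illustration.

```shell
#!/bin/sh
# Print "<slave> <mii-status>" for every slave listed in a bonding status file.
# The default path is the file the bonding driver creates for bond0.
# (bond_slaves is a hypothetical helper name, not part of any package.)
bond_slaves() {
    status_file="${1:-/proc/net/bonding/bond0}"
    awk '
        # Remember the slave name, then report the next MII Status line.
        /^Slave Interface:/ { slave = $3 }
        /^MII Status:/ && slave != "" { print slave, $3; slave = "" }
    ' "$status_file"
}
```

Running bond_slaves with no argument on the server should list eth0 and eth1 if both are part of bond0; a missing file suggests the bonding module never created bond0.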
Ross Kennedy
Jul 20, 2005 07:37:20 GMT    N/A: Question Author

Stuart,
Thanks for the reply.

The system is in our lab just now. Both NICs are cabled into a hub, which takes care of the age-old problem of HP/Compaq NICs failing to auto-detect 100M/Full on our switches. ethtool shows both bond0 and eth1 have auto-detected 100M/Full OK.

I've checked that I DO have bcm5700 installed.
[rkennedy@penguin ~]$ rpm -q bcm5700
bcm5700-7.4.12b-1

I confess I've tried changing modules.conf and it doesn't seem to make any difference, but I'll definitely go back to alias eth0/1 bcm5700.

I did initially try to install the bonding driver but read somewhere that bonding is natively supported on this server and OS combination.
Steven E. Protter This member has accumulated 80000 or more points
Jul 20, 2005 08:05:54 GMT  6 pts

This is very problematic on Linux, and it gets worse if you use Gigabit cards.

The only Gigabit cards I've gotten to work in the RedHat/CentOS world are Intel. You have to add code to your /etc/init.d/network script to make the bonding work.

Also if you use ethtool after doing the bonding, you will find the speed does not show accurately.

If you want specific configuration files, use this webform to contact me.

http://www.isnamerica.com/contactsep.shtml

SEP
Daniel Hili
Jul 20, 2005 21:25:42 GMT  6 pts

Hi Ross,

Bonding is supported natively by RHEL and I have tested with both tg3 and e1000 drivers.

I've recently had a lot of problems configuring two e1000 NICs as bond0 on an RHEL 3 system.

We could successfully load the bonding module using modprobe whilst the operating system was up and running, but for some unknown reason, after every reboot the NICs would start up wrong and we wouldn't see all devices listed properly. Using ifconfig -a we would see NICs listed as devXXXXX rather than the usual eth0, eth1, etc.

Anyway to keep a long story short, we renamed our script /etc/sysconfig/network-scripts/ifcfg-bond0 to ifcfg-zbond0. Adding the "z" effectively changed the starting lineup of the bonding interface to after the slave interfaces. Ever since we made this change bonding has worked a dream but I've still got a call logged with RH to figure out what is going on.

Hope it helps.

Dan
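
The renaming trick can be illustrated with a plain sort. This is only a sketch: it assumes the network start-up script visits the ifcfg-* files in C-locale name order, which the post implies but does not confirm, and ifcfg_order is a made-up helper name.

```shell
#!/bin/sh
# Print the given file names in plain C-locale sort order.
# (ifcfg_order is a hypothetical helper used only for this illustration.)
ifcfg_order() {
    printf '%s\n' "$@" | LC_ALL=C sort
}

# With the original name, the bond sorts ahead of its slaves.
ifcfg_order ifcfg-eth1 ifcfg-bond0 ifcfg-eth0
# With the "z" prefix, the bond sorts after eth0 and eth1.
ifcfg_order ifcfg-eth1 ifcfg-zbond0 ifcfg-eth0
```

Under that assumption, "b" sorting before "e" would explain why bond0 was being handled before its slaves, and why the "z" prefix moved it to the end.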
Matt Palmer Expert in this area This member has accumulated 250 or more points
Jul 21, 2005 04:07:46 GMT  6 pts

Hi,

Have you looked here:

http://docs.hp.com/en/B9903-90046/ch05s04.html

That is Red Hat specific. I have got this working successfully on SuSE SLES8. I'm not sure there is much difference in the procedure, but one problem I had was that the instructions were slightly back to front in places, namely adding the bond lines to /etc/modules.conf before trying to use ifenslave to enlist the specific NICs to bond0.

HTH

regards

Matt Palmer
Matt Palmer Expert in this area This member has accumulated 250 or more points
Jul 21, 2005 04:14:25 GMT  6 pts

Hi, the other thing I forgot to mention is that installing the ProLiant Support Pack (PSP) is quite handy, as it shows you in a user-friendly format how your cards are working. For example, if you've set the mode= line to '1', the SIM web front end (http://serverip:2301) should tell you under NIC that bond0 is up using active-backup mode; if you set mode to 3 it will say 'switch assisted load-balancing', and if 2, 'balanced-xor'.

One thing to note is that mode 3 does not work well if both NICs are going into the same network switch.

One last thing: did you rebuild the kernel when you did this to get the bonding rpm, or did you use another method?

regards

Matt Palmer
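
For reference, the lookup below maps the numeric mode= values to the names used in the Linux kernel bonding documentation. It is a sketch with a made-up function name (bond_mode_name); the labels HP SIM displays may not line up one to one with these names, so treat any comparison with the SIM page as an assumption to verify.

```shell
#!/bin/sh
# Translate a numeric bonding mode into the name used by the kernel
# bonding documentation. (bond_mode_name is a hypothetical helper.)
bond_mode_name() {
    case "$1" in
        0) echo balance-rr ;;      # round-robin, the driver default
        1) echo active-backup ;;
        2) echo balance-xor ;;
        3) echo broadcast ;;
        4) echo 802.3ad ;;
        5) echo balance-tlb ;;
        6) echo balance-alb ;;
        *) echo unknown; return 1 ;;
    esac
}
```

For a pure failover setup, mode 1 (active-backup) is the one that needs no switch-side configuration.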
Stuart Browne This member has accumulated 7500 or more points
Jul 21, 2005 04:17:52 GMT  6 pts

To counter SEP's issues with gigabit cards, I've had success with the Broadcom series that are in the HP server range.

Just don't have any here to play with at the moment.
Matt Palmer Expert in this area This member has accumulated 250 or more points
Jul 21, 2005 04:18:57 GMT  6 pts

Also, have you tried this? It works for me:

ftp://ftp.compaq.com/pub/products/servers/supportsoftware/linux/bonding-1.0.4d-1.src.txt

regards

Matt
Ross Kennedy
Jul 21, 2005 06:04:06 GMT    N/A: Question Author Attachment is 263498.doc 

First off. Thanks for all replies.

Dan
Renaming ifcfg-bond0 to ifcfg-zbond0 didn't help.

Matt
I didn't rebuild the kernel after installing the softpaq. Can I show my ignorance and ask how you rebuild the kernel?

Even though modules.conf aliases eth0/1 to bcm5700, messages.log is still referencing tg3 when I disconnect cables. Maybe my kernel isn't what I hoped it was?

Jul 21 12:23:29 penguin kernel: tg3: bond0: Link is down.
Jul 21 12:23:48 penguin kernel: tg3: bond0: Link is up at 100 Mbps, full duplex.
Jul 21 12:23:48 penguin kernel: tg3: bond0: Flow control is on for TX and on for RX.
Jul 21 12:23:49 penguin snmpd[2157]: Received SNMP packet(s) from 147.114.178.59
Jul 21 12:24:02 penguin kernel: tg3: eth1: Link is up at 100 Mbps, full duplex.
Jul 21 12:24:02 penguin kernel: tg3: eth1: Flow control is on for TX and on for RX.
Jul 21 12:24:50 penguin kernel: tg3: bond0: Link is down.
Jul 21 12:25:04 penguin kernel: tg3: bond0: Link is up at 100 Mbps, full duplex.
Jul 21 12:25:04 penguin kernel: tg3: bond0: Flow control is on for TX and on for RX.


The Insight Manager page for the NICs doesn't make any mention of fault tolerance. It only shows eth1 (no mention of eth0 or bond0). I've attached a copy/paste from the SIM page.
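
To confirm which driver an interface is actually bound to, rather than inferring it from the log prefix, the "driver:" line of ethtool -i can be read directly. A minimal sketch, where driver_of is a made-up helper that parses that output from standard input:

```shell
#!/bin/sh
# Extract the driver name from `ethtool -i <iface>` output read on stdin.
# (driver_of is a hypothetical helper name.)
driver_of() {
    awk -F': *' '$1 == "driver" { print $2; exit }'
}

# Real use on the server (needs the interface to exist):
#   ethtool -i eth1 | driver_of
```

If this prints tg3 for eth1, the bcm5700 alias is not taking effect, whatever the configuration file says.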
Stuart Browne This member has accumulated 7500 or more points
Jul 21, 2005 06:18:46 GMT  6 pts

Bring down 'bond0', 'eth0', and 'eth1', run 'rmmod tg3', then try to bring them up again.
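
The sequence above could be scripted roughly as follows. This is a sketch, not a tested procedure: run and reload_nic_driver are made-up names, the interface names are the ones from this thread, and by default the script only prints the commands. Running it for real over SSH would drop the session, so use the console.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints each command;
# set DRY_RUN=0 as root, on the console, to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Take the bond and its slaves down, unload the old driver, bring them back.
# (reload_nic_driver is a hypothetical helper; tg3 is the default to unload.)
reload_nic_driver() {
    old_driver="${1:-tg3}"
    for ifc in bond0 eth0 eth1; do run ifdown "$ifc"; done
    run rmmod "$old_driver"
    for ifc in bond0 eth0 eth1; do run ifup "$ifc"; done
}
```

Calling reload_nic_driver tg3 with DRY_RUN unset prints the seven commands in order so they can be reviewed first.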
Ross Kennedy
Jul 21, 2005 08:45:01 GMT    N/A: Question Author

Stuart

Removing tg3 didn't help. It just came back again after restarting. Even though I have installed the bcm5700 package, I'm worried I need to rebuild the kernel but don't know how to!

Ross
Matt Palmer Expert in this area This member has accumulated 250 or more points
Jul 21, 2005 10:47:45 GMT  6 pts

Hi,

If you follow the bonding link that I posted previously, it gives a step-by-step method of rebuilding the kernel on a specific platform, RH included. It details using cloneconfig, mrproper, etc. You are only really adding the bonding into the existing kernel.

good luck

Matt
Steven E. Protter This member has accumulated 80000 or more points
Jul 21, 2005 20:42:32 GMT  6 pts

Apologies for not replying when promised.

http://www.hpuxconsulting.com/bond.tar

That is my config.

Hopefully it's helpful.

SEP
Eric van Dijken Expert in this area This member has accumulated 250 or more points
Jul 25, 2005 02:03:17 GMT  10 pts

Is /etc/modules.conf still used in RHEL4? Didn't they move it to /etc/modprobe.conf?

Maybe that's why you still load the tg3 module instead of bcm5700.
Ross Kennedy
Jul 25, 2005 04:42:01 GMT    N/A: Question Author

Success! But I'm very confused. Generous points to be awarded for all.
Especially Eric who came up with the answer (although I did stumble on
the answer on Friday - honest!).
All the instructions for bonding say you should be editing /etc/modules.conf
but my breakthrough came when I stumbled upon /etc/modprobe.conf while trying
to figure out why bcm5700 didn't appear in lsmod.
I deleted /etc/modules.conf and added the "alias bond0 bonding" directive
to modprobe.conf and it all seems to work.
Is this a difference between 2.4 and 2.6 kernels?
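
On the 2.4 versus 2.6 question: 2.4 kernels with modutils read /etc/modules.conf, while 2.6 kernels with module-init-tools read /etc/modprobe.conf, which matches what happened here. The sketch below picks the file from a kernel release string and checks a file for the alias line; both function names are made up for illustration.

```shell
#!/bin/sh
# Pick the module configuration file by kernel series.
# (module_conf_for is a hypothetical helper name.)
module_conf_for() {
    case "$1" in
        2.4.*) echo /etc/modules.conf ;;
        2.6.*) echo /etc/modprobe.conf ;;
        *)     echo unknown; return 1 ;;
    esac
}

# Succeed if the given file has an uncommented "alias bond0 bonding" line.
# (has_bond_alias is a hypothetical helper name.)
has_bond_alias() {
    grep -Eq '^[[:space:]]*alias[[:space:]]+bond0[[:space:]]+bonding([[:space:]]|$)' "$1"
}

# Real use on the server:
#   has_bond_alias "$(module_conf_for "$(uname -r)")" && echo "alias present"
```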

At this time:
bond0 appears as "Switch-assisted Load Balancing (round-robin)" in HP SIM.
The NICs show up as team members in HP SIM.
TCP stays up when I pull either cable, and the HP NIC agents generate SNMP traps
as expected:
NIC Redundancy Decreased (Rev 1): Major Event
instead of
penguin: NIC Connectivity Lost Trap (Rev 1): Major Event

Things I have learned.

Forget modules.conf and use modprobe.conf.

Re-install HP softpaqs when you patch the kernel (oops). I installed the softpaq
then patched the kernel and bcm5700.ko wasn't carried over to the kernel drivers
directory (/lib/modules/2.6.9-11.ELsmp/kernel/drivers/net/) until I re-installed the softpaq

This didn't help me solve the problem of loading the bcm5700 module as I get
a fatal error from modprobe. What does the error mean?

[root@penguin ~]# modprobe bcm5700
FATAL: Error inserting bcm5700 (/lib/modules/2.6.9-11.ELsmp/kernel/drivers/net/bcm5700.ko): Invalid module format
[root@penguin ~]# ls -l /lib/modules/2.6.9-11.ELsmp/kernel/drivers/net/bcm5700.ko
-rwxr--r-- 1 root root 1293907 Jul 22 12:10 /lib/modules/2.6.9-11.ELsmp/kernel/drivers/net/bcm5700.ko
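
"Invalid module format" commonly means the module was built for a different kernel than the one running; the kernel log (dmesg) usually says more. The thread does not confirm that this is the cause here, so the sketch below is only a first check. It compares the kernel release at the start of a module's vermagic string with a given release; vermagic_matches is a made-up name, and a match on the release alone is necessary but not sufficient, since the other vermagic fields must agree too.

```shell
#!/bin/sh
# Compare the kernel release a module was built for (the first word of its
# vermagic string) with a kernel release such as the output of `uname -r`.
# (vermagic_matches is a hypothetical helper name.)
vermagic_matches() {
    built_for=${1%% *}
    [ "$built_for" = "$2" ]
}

# Real use on the server:
#   ko=/lib/modules/$(uname -r)/kernel/drivers/net/bcm5700.ko
#   vermagic_matches "$(modinfo -F vermagic "$ko")" "$(uname -r)" \
#       || echo "module was built for another kernel"
```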

I'm successfully running with the tg3 driver and native bonding driver
(the HP bonding driver lists bcm5700 as a pre-req so I gave up on it).

The syntax for insmod does not match the HP install notes.

[root@penguin ~]# insmod bcm5700
insmod: can't read 'bcm5700': No such file or directory
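
The insmod error above is about the argument, not the driver: on a 2.6 system insmod expects the path to a .ko file, whereas modprobe takes a module name and resolves the path itself. A small sketch that builds the path seen earlier in this post (ko_path is a made-up helper name):

```shell
#!/bin/sh
# Build the path of a network driver module for a given kernel release,
# following the directory layout shown earlier in this post.
# (ko_path is a hypothetical helper name.)
ko_path() {
    printf '/lib/modules/%s/kernel/drivers/net/%s.ko\n' "$1" "$2"
}

# Real use on the server, either of:
#   insmod "$(ko_path "$(uname -r)" bcm5700)"
#   modprobe bcm5700
```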

Does any of this make sense to people?
Eric van Dijken Expert in this area This member has accumulated 250 or more points
Jul 25, 2005 06:17:48 GMT  4 pts

Makes sense to me :)

You came as far as I did before I stopped using bonding on RHEL4. I'll just put it on hold until the next Update comes out (or a PSP > 7.30).

Instead of using insmod, try modprobe (not so sure anymore, but worth a try)

I even got the driver for the BCM card from Broadcom (8.1.55), and that was worse: debug info on screen for about 2 minutes, then a system panic.
Steven E. Protter This member has accumulated 80000 or more points
Jul 25, 2005 11:19:03 GMT    Unassigned

The fact you got it working with that hardware at all shows considerable skill and a bit of good luck.

Do make sure it's stable, and is actually providing decent throughput, before you move this project off your radar screen.

Congrats.

SEP
 
© 2009 Hewlett-Packard Development Company, L.P.