Ethernet problems when CPU is not being used


Hello everyone,

I hope someone can help me with a very strange problem.

I have two brand new HP EliteBook 8730w laptops, with quad core processors. One is running Vista, the other has XP on it.

The problem I am having is that when streaming data packets over a TCP connection from either an embedded device or from another computer running some test software, the laptop either loses some of the data, or it arrives late after a number of retries. This is using the built in Intel gigabit ethernet card, not the wireless.

I'm seeing errors in the received column if I do a netstat -e. And I see "Packets received discarded" in Windows Performance Monitor.
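To put a number on those counters, two `netstat -e` snapshots can be compared over a known interval. The following is only a rough sketch: the sample text is made up for illustration, the row labels are assumed to match the usual Windows layout, and `parse_netstat_e` and `received_delta` are hypothetical helper names.

```python
import re

# Illustrative sample only; real counter values and spacing will differ.
SAMPLE_BEFORE = """\
Interface Statistics

                           Received            Sent

Bytes                     104857600         2097152
Unicast packets               80000           40000
Non-unicast packets             120              35
Discards                       1500               0
Errors                          200               0
Unknown protocols                 0
"""


def parse_netstat_e(text):
    """Return {counter name: received value} from `netstat -e` style text."""
    received = {}
    for line in text.splitlines():
        # A counter row is a label followed by one or two integer columns.
        match = re.match(r"^(\D+?)\s+(\d+)(?:\s+(\d+))?\s*$", line)
        if match:
            received[match.group(1).strip()] = int(match.group(2))
    return received


def received_delta(before, after, seconds):
    """Per-second growth of each received counter between two snapshots."""
    return {
        name: (after[name] - before[name]) / seconds
        for name in before
        if name in after
    }
```

Taking one snapshot with the CPU idle and one with it loaded, each over the same interval, would give discard and error rates that can be compared directly.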

The really strange thing is that if I run something other than the data receiver program, either a bit of test software written by me, or something like Prime95, so that the CPU is loaded, suddenly the data errors stop and all the packets are received correctly.

It doesn't matter which of the two laptops I use as the receiver; both have the same problem. I've tried different network drivers and made sure all other drivers are up to date. The laptops already came with the most up-to-date BIOS.

I've even tried using ExpressCard and USB network adapters instead of the built in Intel card with no luck.

Has anyone ever seen anything like this?

Any help would be greatly appreciated.

Alex


  • 2 weeks later...

Since they have common symptoms then it might be the hub/switch/router that they are plugged into. Or some other device on the network. We'd have to know more about your topology to figure out which component to troubleshoot first.

I've tried it both with a switch between them and by simply connecting back to back, there are no other devices on the network. Both ways of connecting display the same problems.

Edited by ajh499

What service packs do you have installed on XP and Vista?

What OS is running on the embedded device?

You initially said that the packets arrive late; that would implicate the source or the network.

What is the speed of the link? Perhaps you are trying to push a gigabit of traffic over a fast ethernet link?

Maybe you get fewer errors with the CPU loaded because you are receiving fewer packets? Prime95 will only load 1 core per instance.

Are the laptops plugged in or running on battery?

The embedded device runs VxWorks, I think, but I might be wrong.

The Vista laptop has SP1, the XP one is SP2.

I think the packets arrive late because they have to be resent due to the first attempt containing an error for whatever reason.

All devices used in trying to find the cause of this problem are gigabit.

The laptops have been tested plugged in and running on battery, it doesn't seem to make any difference.

The latest version of Prime95 will load all cores by default. However only one core (doesn't seem to matter which) needs to be loaded for the packet errors to disappear.
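For anyone who wants to reproduce the "one busy core" condition without Prime95, a minimal sketch of a single-threaded load generator is shown below. `busy_spin` is a hypothetical name, and this is not the test software mentioned earlier in the thread; it simply keeps one thread spinning for a set time.

```python
import time


def busy_spin(seconds):
    """Keep the calling thread busy for roughly `seconds`; return the loop count."""
    deadline = time.perf_counter() + seconds
    iterations = 0
    while time.perf_counter() < deadline:
        # No sleep here on purpose: the point is to keep one core occupied.
        iterations += 1
    return iterations
```

Calling `busy_spin(60)` in a separate console while the network test runs should occupy roughly one core for a minute, since a single Python thread cannot spread across cores.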


The problem with the claim that loading the CPU on the receiving machine affects packet reception is that error detection is usually offloaded to the NIC. Hence the CPU would not be involved in the error detection.

Check whether this is the case by opening the Device Manager snap-in (devmgmt.msc): expand Network Adapters, open the Properties for the Intel Gb NIC, go to the Advanced tab, and check that IP/TCP/UDP Checksum Offload is enabled.

The only other place I can think of where the CPU might be involved is the link speed. Maybe if the CPU is taxed the link speed drops from 1000BASE-T to 100BASE-TX or 10BASE-T, and there is a poor link between the two nodes.

First verify that Auto Negotiation is set on both NICs (sender and receiver) and that the cable is CAT-6 or better. You could try forcing 10BASE-T Half Duplex and see if the quality improves.

I think it is more likely we are overlooking something.

Thank you for your suggestions, I've just given them a try.

TCP/UDP Checksum Offload makes no difference whether it is enabled or disabled.

I've used a variety of cables when testing this, and they don't seem to make a difference and all work with a different computer as the receiver. Or sending from the laptop to another machine, for that matter.

I don't think that the link speed is dropping due to the CPU being loaded as the number of bytes sent and the average transfer rate are very similar regardless of the CPU loading. I guess it would actually be slightly lower due to the retries of the discarded packets, but the readout in Performance Test is not showing it in enough detail for that.

The one thing that did seem to make a difference was the link speed and duplex setting. I tried all combinations of half and full duplex at 10, 100 and 1000 Mbit/s (except for 1000 half duplex, as it is not an option). The transfer rate would be the same at the same speed setting, but the half duplex mode would not have any discarded packets, but the full duplex modes would. 100 BASE - Full Duplex discarded more packets than even Gigabit.

The trouble is we need around 200 Mbit/s to receive the data from the embedded device, so we have to be able to use gigabit.

I'm still very confused :wacko:

Alex


Well we are not going to eliminate packet errors entirely, and I am not sure how "good" or "bad" your connection really is.

What about IP checksum?

Please verify that you're using Certified CAT-6 or better patch cable(s), meaning not self terminated or dollar store stuff.

IP checksum makes no difference either.

Of course there is always a chance of packet errors, but I'm seeing hundreds, or even thousands of errors per second, when the CPU is doing nothing. Then no errors at all when the CPU has a bit of load on it.

I'll check out what the cables are that I've tried and make sure at least one of them is Cat 6, but I don't think it's that.

The connection should be as "Good" as it can get, two machines with gigabit cards connected back to back with a 2m-ish long cable.

I could understand a connection problem if the same cable lost packets on another machine, but it doesn't. Or, if it was something to do with the laptop's built in network adapter, but I've tried two different cards and both have the same problem. And why does loading the CPU stop the packets from apparently containing errors?

Very, very odd!!

Alex


I think our major clue is now in this observation (emphasis mine):

the half duplex mode would not have any discarded packets, but the full duplex modes would. 100 BASE - Full Duplex discarded more packets than even Gigabit.

It indicates a duplex mismatch

http://en.wikipedia.org/wiki/Duplex_mismatch

Maybe your VxWorks device has problems with Full Duplex mode? Many embedded devices do have cheap NICs, often not gigabit rated.
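One common way a duplex mismatch arises on 10/100 links is when one end is forced to a fixed setting while the other is left on auto-negotiation: the auto end gets no negotiation partner and falls back to half duplex. The sketch below models only that rule. It is a simplification, it assumes both ends are capable of full duplex, and `resolve_duplex` and `is_mismatch` are hypothetical names. Gigabit links are deliberately left out.

```python
AUTO = "auto"


def resolve_duplex(side_a, side_b):
    """Return the duplex each side ends up using, as (duplex_a, duplex_b).

    Each side is either AUTO or a forced (speed_mbit, duplex) tuple such as
    (100, "full"). Simplified model for 10/100 links only.
    """
    if side_a == AUTO and side_b == AUTO:
        # Both ends negotiate; assume both are capable of full duplex.
        return ("full", "full")
    if side_a == AUTO:
        # The auto end has no negotiation partner and falls back to half.
        return ("half", side_b[1])
    if side_b == AUTO:
        return (side_a[1], "half")
    # Both ends forced: each simply uses what it was told.
    return (side_a[1], side_b[1])


def is_mismatch(side_a, side_b):
    """True when the two ends settle on different duplex modes."""
    duplex_a, duplex_b = resolve_duplex(side_a, side_b)
    return duplex_a != duplex_b
```

Under this model, forcing one end to half duplex happens to agree with the fallback of the auto end, while forcing it to full duplex does not, which would match the pattern of discards appearing only in the full duplex modes.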

Sorry, the half / full duplex problem appears to be a red herring. My fault. :blushing:

I don't think I explained what I'm doing very well.

I'm not actually testing this problem most of the time using the embedded device, as it is quite complicated to test against and there is always the potential for bugs in our data receiving code.

Most of the testing I'm doing is using the Network test from Performance Test 7 by Passmark software, and two computers (the HP laptop with the problem and a Dell desktop) connected back-to-back. I'm using this software as it is easy to use to show up the same problem on the HP laptop as we see with either our own test software on two computers back to back, or with the embedded device connected to the laptop with more complex software running.
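A rough sketch of this kind of streaming test in Python is shown below. It is neither the Passmark tool nor the in-house test software mentioned above; the function names and the chunk size are assumptions, and the demo runs over loopback so that it is self-contained. For a real back-to-back test the receiver and sender halves would run on separate machines.

```python
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes per send/recv call; an arbitrary choice


def receive_all(server_sock, result):
    """Accept one connection and count bytes until the sender closes it."""
    conn, _ = server_sock.accept()
    total = 0
    start = time.perf_counter()
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:
                break
            total += len(data)
    result["bytes"] = total
    result["seconds"] = time.perf_counter() - start


def send_stream(host, port, total_bytes):
    """Stream total_bytes of zero-filled payload to host:port."""
    payload = bytes(CHUNK)
    with socket.create_connection((host, port)) as sock:
        sent = 0
        while sent < total_bytes:
            size = min(CHUNK, total_bytes - sent)
            sock.sendall(payload[:size])
            sent += size


def loopback_demo(total_bytes=8 * 1024 * 1024):
    """Run sender and receiver in one process over 127.0.0.1."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    result = {}
    receiver = threading.Thread(target=receive_all, args=(server, result))
    receiver.start()
    send_stream("127.0.0.1", port, total_bytes)
    receiver.join()
    server.close()
    return result
```

Because TCP retransmits lost segments, the byte count at the receiver will still match what was sent; lost or discarded packets show up as a lower transfer rate and in the interface counters, not as missing data.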

While I was trying out the half / full duplex options yesterday, I was forcing the laptop to the speed and duplex mode that I wanted but leaving the other machine on Auto, and apparently this was causing a duplex mismatch. I've just tried it again, this time setting the speed and duplex on both machines, and it works correctly, with no packet errors except at gigabit.

This is with two computers back to back with some test software, however the embedded device needs gigabit as the data rate is far too high for a 100BASE network.

I also tried out a 1m long CAT6 cable, and at gigabit it still shows packet errors when the CPU is not busy.

Any more suggestions? Anyone?

Alex


At this point I would be running Network Monitor or Wireshark on both interfaces and try to figure out what is happening.

http://www.microsoft.com/downloads/details...;displaylang=en

http://www.wireshark.org/

Basically we need to know how the packets looked when transmitted and how they were received.

You mentioned that the cables "all work with a different computer as the receiver".

Time to get on the phone/email HP and Intel about this and see what they say.

I've already tried phoning HP technical support, and they were completely useless.

I don't think Intel offer any support for notebook products, they just direct you to the manufacturer of the machine.

I've had a go with Wireshark already, and I can see that every so often the TCP sequence number stops increasing for a while, then carries on again. I guess that is the point at which the data is being resent.
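The stalls in the sequence numbers can be picked out mechanically from an exported capture. The sketch below assumes a list of (sequence number, payload length) pairs for one direction of one TCP stream, in capture order, for example taken from the `tcp.seq` and `tcp.len` columns. It is a simplified check, not Wireshark's own analysis, it ignores sequence number wraparound, and `find_retransmissions` is a hypothetical name.

```python
def find_retransmissions(segments):
    """Return the indexes of segments that carry no new data.

    segments: iterable of (seq, length) pairs for one direction of one TCP
    stream, in capture order. Sequence wraparound is not handled.
    """
    highest_end = None  # one past the highest byte seen so far
    repeats = []
    for index, (seq, length) in enumerate(segments):
        end = seq + length
        if length > 0 and highest_end is not None and end <= highest_end:
            # Every byte in this segment was already seen earlier.
            repeats.append(index)
        if highest_end is None or end > highest_end:
            highest_end = end
    return repeats
```

Since the capture on the receiving laptop may never see the discarded frames, running this against a capture taken on the sending machine is likely to be more informative.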

I'll have another go with it tomorrow, but I think that the erroneous packets do not make it as far as Wireshark. Which doesn't really help very much.


I've actually contacted Intel Support about their Network Adapters before and I've had good experiences with them.

http://supportmail.intel.com/Welcome.aspx?id=

Family: Network Connectivity

Line: Intel Desktop Adapters

Product: *model name from device manager*

What differences do you observe in Wireshark while throttling your CPU? I am still curious if that really affects the network...
