On Aug 7, 3:13 am, James Cunnane <james.cunnane+ag...@gmail.com>
wrote:
On Tue, 5 Aug 2008 16:07:34 -0700 (PDT), justin.pear...@gmail.com
wrote:
Oh, and I just remembered another piece of the puzzle: The VxWorks
machine is also exchanging data with another box on the network over
UDP. We have timers in the VxWorks app that make it panic if it stops
receiving UDP packets. It appears that during each of these anomalies,
the VxWorks box continues to receive UDP packets just fine. That is,
it appears as though it stops hearing from the TCP stream, but
continues to receive UDP packets as normal.
Perhaps your ARP cache has become corrupted. I had a system which, after
about 26 days of continuous connection, would respond to ping but not
to telnet; it turned out that the ARP cache had been corrupted by a
nanosecond timer overflow. The mechanism of corruption is probably
not timer-related in your case, but the end result seems similar. Can
you devise ARP diagnostics that can run periodically on the sending
device, both before and after the TCP failure?
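A periodic ARP diagnostic of this kind could be as simple as saving a snapshot of the ARP table at intervals and comparing the snapshot taken before the failure with one taken after. The following is only a rough sketch in Python: it assumes the snapshots have already been collected somehow and reduced to plain {ip: mac} dictionaries, and the function name `diff_arp_snapshots` is made up for illustration. How the table is actually dumped on the device is not covered here.

```python
def diff_arp_snapshots(before, after):
    """Compare two ARP-table snapshots given as {ip: mac} dicts.

    Returns {ip: (kind, old_mac, new_mac)} for every entry that was
    added, removed, or changed between the two snapshots.
    """
    # Normalize MAC case so "AA:BB" and "aa:bb" do not count as a change.
    before = {ip: mac.lower() for ip, mac in before.items()}
    after = {ip: mac.lower() for ip, mac in after.items()}

    changes = {}
    for ip in sorted(set(before) | set(after)):
        old, new = before.get(ip), after.get(ip)
        if old == new:
            continue
        if old is None:
            changes[ip] = ("added", None, new)
        elif new is None:
            changes[ip] = ("removed", old, None)
        else:
            changes[ip] = ("changed", old, new)
    return changes
```

An entry reported as "changed" for a peer whose hardware address should never change would be the interesting case, since that is what a corrupted cache would look like in such a comparison.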
Hmm... In your case you said the system would respond to ping, but
not telnet. It's hard to classify that as a problem with the ARP
cache, _if_ you tried to ping the target from the same host that you
also tried to telnet to it from. If you can ping target A from host
B, then ARP resolution between A and B is working (or at least, the
ARP entries haven't timed out yet). Ping (ICMP over IP) and telnet
(TCP over IP) both rely on ARP, so if it worked for one, it should
have worked for the other.
However, if you tried to ping target A from host B, and that worked,
but trying to telnet to target A from host C did not work, that could
be an ARP problem. (The target still had an unexpired ARP entry for
host B, but was unable to perform ARP resolution for the previously
unknown host C.)
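To make the two cases concrete, here is a toy model in Python. It is purely illustrative and does not reflect how any real network stack is implemented; the class `ToyArpCache`, the function `can_send`, and the host names are all invented for the example. The point it shows is that the ARP lookup does not depend on whether the payload is ICMP or TCP, only on the destination host.

```python
class ToyArpCache:
    """Toy ARP cache: known entries plus a flag saying whether new
    resolutions still work (False models a broken resolver)."""

    def __init__(self, entries, can_resolve=True):
        self.entries = dict(entries)
        self.can_resolve = can_resolve

    def lookup(self, ip):
        if ip in self.entries:
            return self.entries[ip]      # cached, unexpired entry
        if self.can_resolve:
            mac = "resolved-" + ip       # placeholder for a real ARP reply
            self.entries[ip] = mac
            return mac
        return None                      # resolution failed


def can_send(cache, dst_ip, protocol):
    """ICMP and TCP both ride on IP, so both need the same ARP lookup.
    The protocol argument is deliberately ignored."""
    return cache.lookup(dst_ip) is not None


# Target A still has host B cached but cannot resolve new hosts.
target = ToyArpCache({"hostB": "00:11:22:33:44:55"}, can_resolve=False)
print(can_send(target, "hostB", "icmp"))  # ping reply to B
print(can_send(target, "hostB", "tcp"))   # telnet reply to B
print(can_send(target, "hostC", "tcp"))   # telnet reply to C
```

In this model, replies to host B succeed for both protocols, while replies to the previously unknown host C fail regardless of protocol.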
In Justin's case, he said once his app got into its error state, he
could see the target still sending TCP segments to his Windows host
using Wireshark (but not responding to ACKs from the Windows host).
This implies the target's ARP entry for the Windows host was still
valid (otherwise it would have started sending ARP "who has" requests
instead).
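If the capture is exported as raw Ethernet frames, the distinction between "still sending TCP segments" and "sending ARP requests" can be checked mechanically. The sketch below is one possible way to do that in Python; the function name `classify_frame` is hypothetical, and it assumes untagged Ethernet II frames carrying IPv4 (no VLAN tags), which may not match every capture.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ARP = 0x0806
IPPROTO_TCP = 6
IPPROTO_UDP = 17


def classify_frame(frame: bytes) -> str:
    """Classify a raw, untagged Ethernet II frame by its headers."""
    if len(frame) < 14:
        return "short"
    # EtherType sits in bytes 12-13 of the Ethernet header.
    (ethertype,) = struct.unpack("!H", frame[12:14])

    if ethertype == ETHERTYPE_ARP:
        # ARP opcode is bytes 6-7 of the ARP payload (frame offset 20).
        if len(frame) >= 22:
            (oper,) = struct.unpack("!H", frame[20:22])
            if oper == 1:
                return "arp-request"   # the "who has" case
            if oper == 2:
                return "arp-reply"
        return "arp-other"

    if ethertype == ETHERTYPE_IPV4 and len(frame) >= 24:
        # IPv4 protocol field is byte 9 of the IP header (frame offset 23).
        proto = frame[23]
        if proto == IPPROTO_TCP:
            return "tcp"
        if proto == IPPROTO_UDP:
            return "udp"
        return "ipv4-other"

    return "other"
```

A run of "tcp" frames from the target with no "arp-request" frames in between would be consistent with a valid ARP entry for the Windows host.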
-Bill
Regards
James Cunnane