

svn hangs caused by crappy router/NAT topology
Posted by drgraefy on Thu 8 Nov 2007 at 16:01

For the better part of the last year, I have been struggling with a frustrating svn problem. My group's svn is served through apache2+ssl. Seemingly at random, svn checkouts/updates would hang indefinitely, with a "C-c" the only way to escape. Certain files seemed to be especially problematic, but again not consistently. Most times, but not always, hangs would be accompanied by the following messages in /var/log/apache/error.log:

[Thu Feb 22 12:46:37 2007] [error] [client 1.2.3.4] Provider encountered an error while streaming a REPORT response.  [500, #0]
[Thu Feb 22 12:46:37 2007] [error] [client 1.2.3.4] A failure occurred while driving the update report editor  [500, #104]
[Thu Feb 22 12:46:37 2007] [error] [client 1.2.3.4] Error writing base64 data: Connection reset by peer  [500, #104]
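For anyone chasing the same symptom, here is a minimal sketch for tallying which clients hit the reset, assuming the error log format shown above. The function name reset_clients is made up for illustration, and the log path is whatever your apache uses.

```shell
# reset_clients LOGFILE: print "count client" for every client that hit
# "Connection reset by peer", most frequent first.
reset_clients() {
    grep 'Connection reset by peer' "$1" \
        | sed -n 's/.*\[client \([^]]*\)\].*/\1/p' \
        | sort | uniq -c | sort -rn
}
```

Usage would be something like `reset_clients /var/log/apache/error.log`. Seeing which addresses dominate the list may help narrow down where the affected machines sit.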

I have been trying for months to figure out what the problem is, but to no avail. Numerous google searches turned up people with similar issues, but never with any indication of what the problem might actually be, or how to get around it. Finally, after much struggle, we had a breakthrough yesterday.

Our network is on a private NAT'd lan. We finally noticed that the hangs were only occurring on machines located on our internal lan, and not on machines on the wan. This was a curious and important revelation. Internally, the fqdn of our web site (foo.bar for the sake of argument), which resides on a server in our private lan, maps to our external IP address (1.2.3.4), just as it does externally. That address in turn corresponds to the wan port of our crappy D-Link router/gateway. External ports 80 and 443 at 1.2.3.4 are then mapped to the web server (10.0.0.5).

What does this mean? Well, it means that internally, requests for foo.bar are first routed to the wan port of the router, which then sends them back to the web server. Apparently this was causing our crappy little router to choke, and drop connections. To confirm this, we changed the internal DNS to point foo.bar to the web server 10.0.0.5 directly. Once this was done, no more svn checkout/update hangs.
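A rough sketch of how one might check, from an internal machine, which kind of address a name resolves to. The function names are hypothetical, the check is IPv4 only, and it assumes a system where `getent hosts` is available (it follows /etc/hosts and the configured resolver).

```shell
# is_private_ipv4 ADDR: succeed if ADDR falls in an RFC 1918 private range.
is_private_ipv4() {
    case "$1" in
        10.*|192.168.*)                        return 0 ;;
        172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
        *)                                     return 1 ;;
    esac
}

# check_internal_dns NAME: report whether NAME resolves to a private address
# from the machine this runs on.
check_internal_dns() {
    addr=$(getent hosts "$1" | awk '{print $1; exit}')
    if [ -z "$addr" ]; then
        echo "$1: no address found"
        return 2
    fi
    if is_private_ipv4 "$addr"; then
        echo "$1 -> $addr (private: requests go straight to the server)"
    else
        echo "$1 -> $addr (public: an internal client would go via the router)"
    fi
}
```

Running `check_internal_dns foo.bar` on a lan machine before and after the DNS change should show the address switching from the public one to 10.0.0.5.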

It's funny this never seemed to manifest itself elsewhere, but we don't do much heavy data transfer internally from the web server over ssl, except via svn. Basically the router couldn't turn around ssl packets fast enough. Ultimately, the problem is that we are running our network on a private, NAT'd lan. We shouldn't have to do this, and it's always a pain in the ass for one reason or another. As a wise man once quoted to me: 'NAT is not the answer. "NAT?" is the question, and the answer is "NOT!"'.

 

Comments on this Entry

Re: svn hangs caused by crappy router/NAT topology
Posted by Wayne (82.144.xx.xx) on Thu 8 Nov 2007 at 23:22
I'm surprised this worked at all. Normally you cannot test or connect to services in the LAN by going out your WAN port and then back in, unless you are going out via a different WAN IP from the one which is port forwarded to the server.


Strange



Re: svn hangs caused by crappy router/NAT topology
Posted by dkg (216.254.xx.xx) on Fri 9 Nov 2007 at 15:22
Really? With a simple NAT configuration, this is probably true, but most consumer-grade routers (and most iptables management scripts) allow for this by re-mapping the client's LAN IP (the source IP address) to the router's internal IP address.

So it would be a quartet of iptables rules to achieve this effect of forwarding external HTTP connections (on port 80, that is) to $HTTP_TARGET, like so:

iptables -A INPUT -m state --state NEW -p tcp -d "$HTTP_TARGET" --dport 80 -j ACCEPT
iptables -A FORWARD -m state --state NEW -p tcp -d "$HTTP_TARGET" --dport 80 -j ACCEPT
iptables -t nat -A PREROUTING -d "$ROUTER_WAN_IP" -p tcp --dport 80 -j DNAT --to "$HTTP_TARGET"
iptables -t nat -A POSTROUTING -d "$HTTP_TARGET" -s "$LAN_NETWORK" -p tcp --dport 80 -j SNAT --to "$ROUTER_LAN_IP"

The first two lines say it's ok to pass traffic to the internal device. The third line says "any connection coming in from the outside to my exterior port 80 should be mangled so that the destination IP points to the internal device". And the fourth line says "any connection coming from the internal network that wants to go to the internal web server should get mangled so that the source IP address is the router's own internal IP address".
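Since the setup in the post forwards both 80 and 443, here is a hedged sketch that prints (but does not run) the same four rules for each port. The addresses 1.2.3.4 and 10.0.0.5 are the placeholders from the post; ROUTER_LAN_IP, LAN_NETWORK and the function name print_forward_rules are assumptions for illustration.

```shell
# Placeholder values; only 1.2.3.4 and 10.0.0.5 come from the post.
ROUTER_WAN_IP=1.2.3.4
ROUTER_LAN_IP=10.0.0.1      # assumed
LAN_NETWORK=10.0.0.0/24     # assumed
HTTP_TARGET=10.0.0.5

# print_forward_rules PORT: echo the four rules for one TCP port so they can
# be reviewed before anything touches the live ruleset.
print_forward_rules() {
    port=$1
    echo "iptables -A INPUT -m state --state NEW -p tcp -d $HTTP_TARGET --dport $port -j ACCEPT"
    echo "iptables -A FORWARD -m state --state NEW -p tcp -d $HTTP_TARGET --dport $port -j ACCEPT"
    echo "iptables -t nat -A PREROUTING -d $ROUTER_WAN_IP -p tcp --dport $port -j DNAT --to $HTTP_TARGET"
    echo "iptables -t nat -A POSTROUTING -d $HTTP_TARGET -s $LAN_NETWORK -p tcp --dport $port -j SNAT --to $ROUTER_LAN_IP"
}

for port in 80 443; do
    print_forward_rules "$port"
done
```

Printing rather than executing keeps the sketch safe to run as an ordinary user; the output could be inspected and then fed to a root shell once it looks right.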

Why would any LAN device pass packets to the router to get to an internal device? Shouldn't they just pass it along the LAN? They would, normally. But they'd pass traffic intended for the external IP address to the router, which would rewrite the destination in its PREROUTING step. Then, before the packet leaves the router, the last rule would trigger.

This last rule is necessary because otherwise the $HTTP_TARGET would respond directly to the local machine, which would ignore the response because it is looking for a response from $ROUTER_WAN_IP. So the packet has to make an additional hop back to the router for de-mangling (or re-mangling, depending on your perspective).

This scenario is wasteful of LAN bandwidth, of course: each request/response exchange now crosses the LAN 4 times instead of twice:

  1. client to router
  2. router to targeted server
  3. server to router
  4. router to client

This overhead is often considered worthwhile for ease of management, because the internal machines can think they're talking to the same IP address as everyone else. But when you run into a router that can't keep up with all this mangling at LAN speeds, like the device drgraefy describes above, things fall apart.
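The four hops, and what happens without the SNAT rule, can be sketched as a toy trace. This only prints addresses; it does not touch the network. CLIENT and ROUTER_LAN_IP are assumed values, and hairpin_trace is a hypothetical name.

```shell
# Placeholder addresses: CLIENT and ROUTER_LAN_IP are assumed for illustration.
CLIENT=10.0.0.20
ROUTER_WAN_IP=1.2.3.4
ROUTER_LAN_IP=10.0.0.1
HTTP_TARGET=10.0.0.5

# hairpin_trace MODE: print src/dst at each hop. MODE "snat" applies the
# fourth rule; any other value skips it.
hairpin_trace() {
    src=$CLIENT
    dst=$ROUTER_WAN_IP
    echo "1. client to router: src=$src dst=$dst"

    dst=$HTTP_TARGET                  # PREROUTING: DNAT rewrites the destination
    if [ "$1" = snat ]; then
        src=$ROUTER_LAN_IP            # POSTROUTING: SNAT rewrites the source
    fi
    echo "2. router to server: src=$src dst=$dst"

    # The server answers whatever source address it saw.
    if [ "$src" = "$ROUTER_LAN_IP" ]; then
        echo "3. server to router: src=$HTTP_TARGET dst=$ROUTER_LAN_IP"
        # The router undoes both rewrites on the reply.
        echo "4. router to client: src=$ROUTER_WAN_IP dst=$CLIENT"
        echo "client accepts: reply comes from $ROUTER_WAN_IP"
    else
        echo "3. server to client: src=$HTTP_TARGET dst=$CLIENT"
        echo "client ignores: expected $ROUTER_WAN_IP, got $HTTP_TARGET"
    fi
}
```

Comparing `hairpin_trace snat` with `hairpin_trace none` shows why the reply has to take the extra trip through the router: without the source rewrite, the server's answer arrives from an address the client never contacted.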


Re: svn hangs caused by crappy router/NAT topology
Posted by drgraefy (128.59.xx.xx) on Fri 9 Nov 2007 at 15:49
wow, dkg. thank you. your response is better than anything i could have cooked up. thanks also for the iptables rules suggestions. there is a bunch of alchemy involved in constructing those, and it's easy to forget rules that would be relevant for certain odd situations, as this one clearly is.


Re: svn hangs caused by crappy router/NAT topology
Posted by Wayne (89.105.xx.xx) on Fri 9 Nov 2007 at 16:14
I've not come across a consumer-grade router that would do this by default. Maybe they have moved on, and I have never tried it. I normally use shorewall for my firewalls and their advice is

"You cannot test your firewall from the inside. Just because you send requests to your firewall external IP address does not mean that the request will be associated with the external interface or the net zone. Any traffic that you generate from the local network will be associated with your local interface and will be treated as loc->fw traffic."

I'm not surprised you can redirect the traffic with iptables but I have just never looked into it. Still this is a great tip and one I will remember, thanks.


Re: svn hangs caused by crappy router/NAT topology
Posted by dkg (216.254.xx.xx) on Fri 9 Nov 2007 at 17:08
Ah, i see what you're saying now, Wayne: it's true that you can't guarantee that any external firewall rules are working just by testing within the LAN. You'd need to use traffic from an actual external IP address to verify any port forwarding or access restrictions to be sure that they're in place properly.

But that doesn't mean that you couldn't do an effective NAT'ed port forwarding that copes with connections from inside as well as outside the LAN, only that you can't test your external rules properly from the inside.

I'm glad the tip above is useful!
