
Re: PF and Foundry ServerIron load balancer - potential TIME_WAIT problem.



Guys,

Thanks for the info on this. From what is being said, with tcp.closed at 30s
and interval at 10s I should see the TIME_WAIT state being removed after
30-40 seconds. This is not happening. I've left it all night and it's not
going anywhere. This is while the ServerIron is still performing its
healthcheck. That entails transmitting 3 SYN packets, waiting about 3 seconds
for a response, then sending a RST when none is received. It then waits
4 seconds and resends the 3 SYNs. This goes on and on and on.


If I remove the binding on the ServerIron so it stops doing the healthcheck,
the TIME_WAIT is removed after the tcp.closed + interval timeout periods.
This seems to suggest the timer *is* being reset....
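
To make the suspicion concrete, here is a rough model of it in Python. It is only a sketch under a stated assumption, not a description of what pf actually does: it assumes every probe packet that matches the existing state resets that state's expiry to "now + tcp.closed", and that the purge routine runs once every "interval" seconds. The function name and the 7 second probe period (about 3s wait plus 4s pause, as observed above) are illustrative.

```python
def state_expires(probe_period, tcp_closed, interval, horizon):
    """Return the second at which the state is purged, or None if it
    is still present after `horizon` seconds.

    ASSUMPTION (not confirmed pf behaviour): each probe that matches
    the state resets its expiry to now + tcp_closed.
    probe_period=None models the load balancer sending no probes.
    """
    expire_at = tcp_closed          # state entered TIME_WAIT at t=0
    next_probe = probe_period       # time of the next healthcheck burst
    for now in range(horizon + 1):
        if next_probe is not None and now == next_probe:
            expire_at = now + tcp_closed   # assumed timer reset
            next_probe += probe_period
        # The purge routine only looks at states every `interval` seconds.
        if now % interval == 0 and now >= expire_at:
            return now
    return None


# Healthcheck running roughly every 7s: the state never goes away.
print(state_expires(7, 30, 10, 3600))      # None
# Healthcheck binding removed: purged within tcp.closed + interval.
print(state_expires(None, 30, 10, 3600))   # 30
```

Under that assumption, any probe period shorter than tcp.closed keeps the state alive indefinitely, which matches what I am seeing.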

Is something in pf broken?

Cheers,
Adrian.

-----Original Message-----
From: Daniel Hartmeier [mailto:daniel@benzedrine.cx] 
Sent: Wednesday, 27 March 2002 9:37 PM
To: Darren Reed
Cc: Adrian Buxton; 'misc@openbsd.org'
Subject: Re: PF and Foundry ServerIron load balancer - potential TIME_WAIT
problem.


On Wed, Mar 27, 2002 at 08:53:44PM +1100, Darren Reed wrote:

> In short, do not be so quick to criticise products for port reuse 
> because neither does OpenBSD adhere to the RFC specs on this.

It's one thing to ignore 2MSL on crash/reboot, but quite another to ignore
it during normal operation. You can bind to a used port with REUSEADDR in
OpenBSD, but I don't think you can actually connect.

I don't mean to bash the product, just wondering why it can't use different
source ports for its queries (I'm assuming it wasn't rebooted between the
connections quoted by the original poster).

> > You can decrease the timeout for TIME_WAIT with pfctl -t 
> > tcp.closed=x, and pf will remove all TIME_WAIT states after x 
> > seconds.
> 
> It will only work that way if you also do "pfctl -t interval=1".

Ok, I'll rephrase. TIME_WAIT states are removed after tcp.closed to
tcp.closed + interval seconds. If the default interval of 10 seconds makes
the resolution too low, you should decrease it.

For instance, if the load balancer sends probes every 60 seconds, using
tcp.closed=49 and interval=10 would work. If you don't want to wait at most
60 seconds for the manually initiated probes to succeed, you have to reduce
tcp.closed further.

Using tcp.closed=59 and interval=1 has the advantage that the states are
purged precisely (+/- one second) after 59 seconds, while with the previous
settings, they are purged 'randomly' after 49 to 59 seconds. The cost is
that purges happen ten times as often, but that will still be a negligible
part of the overall load.
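
The arithmetic behind those two settings can be sketched in a few lines of Python. This is only an illustration of the numbers quoted above (purge between tcp.closed and tcp.closed + interval seconds); the function names are made up for the example and are not part of pf or pfctl.

```python
def purge_window(tcp_closed, interval):
    """Earliest and latest second at which a TIME_WAIT state is purged."""
    return (tcp_closed, tcp_closed + interval)


def purges_per_hour(interval):
    """How often the purge routine runs in one hour."""
    return 3600 // interval


probe_period = 60  # load balancer probes every 60 seconds (example value)

for tcp_closed, interval in [(49, 10), (59, 1)]:
    earliest, latest = purge_window(tcp_closed, interval)
    # Seconds left between the latest possible purge and the next probe.
    margin = probe_period - latest
    print(tcp_closed, interval, earliest, latest, margin,
          purges_per_hour(interval))
# prints:
# 49 10 49 59 1 360
# 59 1 59 60 0 3600
```

Both settings purge the state by the time the next 60 second probe is due; the second one narrows the window from ten seconds to one at the cost of ten times as many purge runs.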

Daniel