Re: PF and Foundry ServerIron load balancer - potential TIME_WAIT problem.
On Thu, Mar 28, 2002 at 12:39:02PM +1100, Adrian Buxton wrote:
> thanks for the info on this. From what is being said, with tcp.closed at 30s
> and interval at 10s I should see the TIME_WAIT state being removed after
> 30-40 seconds. This is not happening. I've left it all night and it's not
> going anywhere. This is while the ServerIron is still performing its
> healthcheck - this entails 3 SYN packets transmitted, wait about 3 seconds
> for a response then send a RST when one is not received. Wait 4 seconds and
> resend the 3 SYNs... this goes on and on and on.
It's doing the healthcheck every 4 seconds even when it succeeds? Then
tcp.closed=30 is still too long. In that case, you'd actually need
tcp.closed=2 (or 3) and interval=1.
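
To make the arithmetic explicit: a closed state can linger for up to
tcp.closed + interval seconds, so that sum has to fit inside the healthcheck
period. A minimal sketch of the check, where the function name is made up
for illustration and is not part of pf:

```shell
# Hypothetical helper, not a pf tool: succeeds when a closed state is
# purged no later than the next healthcheck.
# Worst-case lifetime of a closed state = tcp.closed + interval (seconds).
# Equality is accepted here, which is borderline (this is the "or 3" case).
state_expires_in_time() {
    closed=$1
    interval=$2
    period=$3
    [ $((closed + interval)) -le "$period" ]
}

# Values from this thread: 30s/10s against a 4 second healthcheck period.
if state_expires_in_time 30 10 4; then echo "30/10: ok"; else echo "30/10: too long"; fi
# The suggested values: 2s/1s.
if state_expires_in_time 2 1 4; then echo "2/1: ok"; else echo "2/1: too long"; fi
```

With the current settings the worst case is 40 seconds against a 4 second
period, so the old state is always still there when the next SYN arrives.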
> If I remove the binding on the ServerIron so it stops doing the healthcheck,
> the TIME_WAIT is removed after the tcp.closed + interval timeout periods.
> This seems to suggest the timer *is* being reset....
When in doubt, verify. Run
  while true; do pfctl -vss | grep -A 2 ":1025"; sleep 1; done
or similar, and watch the state entry. Do you see a single state entry
in TIME_WAIT:TIME_WAIT? Do you see the expiry time decrease
monotonically? Does the "pkts" counter increase beyond 2? Note that when
you see the "age" time jump to 00:00:00 (and the sequence number windows
change), you're seeing a new state.
Also, enable verbose logging with pfctl -x m, and see /var/log/messages
for "pf: State failure" entries, which should occur when the sender is
reusing the address/port pairs with a new sequence number range while
the old state still exists.
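
A quick way to tell whether those entries keep arriving is to count them and
compare the count a few seconds apart. A small sketch, assuming the messages
land in /var/log/messages as described; the function name is hypothetical:

```shell
# Hypothetical helper: count "pf: State failure" lines in a log file.
# Defaults to /var/log/messages; pass another path as the first argument.
count_state_failures() {
    # grep -c prints the number of matching lines (0 if there are none)
    grep -c "pf: State failure" "${1:-/var/log/messages}"
}
```

Run count_state_failures, wait for a couple of healthcheck rounds, and run it
again. If the number grows by roughly one per round, the ServerIron is
hitting the old state each time.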
Maybe also tcpdump on the internal interface of the firewall to confirm
that only the SYN, SYN+ACK and RST we talked about so far go through.