Re: soekris net4526 pf freeze



On Thu, 24 Feb 2005, Andrew Daugherity wrote:

> On Thu, 24 Feb 2005 11:57:26 +0100, Marc Champion <m.champion@chaosys.ch> wrote:
> > After a week or so they are not
> > accessible anymore by network (wi0 and sis0).
> > It's even not possible to ping an internal interface
> > on the device itself (connected by nullmodem cable).
> > 
> 
> Odd, I had a similar problem with my firewall (which is not a Soekris
> box, but an old IBM 486 box), but I thought I had tracked it down to a
> hardware problem.  

Could very well be a software problem. I had a very similar problem on a
pentium-166 which was quite busy with handling interrupts from the NICs.

After several weeks of operation, I could not ping 127.0.0.1, even with
an empty ruleset loaded (pfctl -f /dev/null). With pf disabled (pfctl -d)
it worked. So I put some DPFPRINTF calls into pf.c at several places
where pf returns PF_DROP. It turned out that
pf_check_congestion(ipintrq) was the reason for the blocking. This
function is called first thing in some of pf's rule test functions. So I
looked around in CVS and found these two patches that fix a problem with
the congestion timeout:

http://www.openbsd.org/cgi-bin/cvsweb/src/sys/net/if.c.diff?r1=1.93&r2=1.94
http://www.openbsd.org/cgi-bin/cvsweb/src/sys/net/if.h.diff?r1=1.59&r2=1.60

Without these patches, I think the congestion flag can end up set for
two queues at the same time but only be reset for one of them after the
timeout.

But perhaps one of the kernel experts can confirm my ideas; I'm not
really familiar with the code.
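
To illustrate the suspected failure mode, here is a small user-space
sketch in C. It is a toy model of the hypothesis only, not the kernel
code: struct toy_queue, struct toy_timer and all function names are made
up for this example, and the assumption that both queues share a single
timer is a guess at the mechanism, not something confirmed from if.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an input queue; NOT the real struct ifqueue. */
struct toy_queue {
	bool congested;
};

/* Hypothetical one-shot timer that remembers a single queue to clear. */
struct toy_timer {
	struct toy_queue *target;
};

/* Mark a queue congested and arm the timer to clear it later.
 * Arming a timer that is already armed overwrites its earlier target. */
static void
mark_congested(struct toy_timer *t, struct toy_queue *q)
{
	q->congested = true;
	t->target = q;
}

/* Timer expiry: clears only the queue the timer currently points at. */
static void
timer_fire(struct toy_timer *t)
{
	if (t->target != NULL) {
		t->target->congested = false;
		t->target = NULL;
	}
}

/* Congest two queues, let every timer expire, and return how many
 * queues are still flagged as congested afterwards. */
int
stuck_after_timeout(bool per_queue_timer)
{
	struct toy_queue a = { false }, b = { false };
	struct toy_timer t1 = { NULL }, t2 = { NULL };

	mark_congested(&t1, &a);
	/* Assumed bug: the second queue reuses the first queue's timer. */
	mark_congested(per_queue_timer ? &t2 : &t1, &b);

	timer_fire(&t1);
	timer_fire(&t2);
	return (int)a.congested + (int)b.congested;
}
```

In this model, stuck_after_timeout(false) leaves one queue flagged
forever because the shared timer only remembers the last queue it was
armed for, while stuck_after_timeout(true) clears both. Whether the real
code fails in exactly this way is for someone who knows if.c to say.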

> Anyway, what happened to me was that at seemingly random times
> (although it seemed to be brought on by heavy net activity, bittorrent
> etc.) pf would just "die" -- the box would not respond to pings on any
> interface, or allow new connections through; however existing states
> remained open, and I could still send data through those (such as
> sending an IM to someone over AIM, or doing stuff in an active SSH
> session).  

Makes sense, as the state table is checked before the rules are
evaluated. I saw the same behaviour.
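
The ordering can be sketched roughly like this in plain C. This is a
simplified illustration, not pf's actual code: toy_filter, struct
toy_packet and the TOY_* constants are hypothetical names, and the final
"pass" assumes a ruleset that would let the packet through.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_action { TOY_PASS, TOY_DROP };

/* Hypothetical packet summary; real pf looks the state up in a table. */
struct toy_packet {
	bool matches_state;	/* belongs to an already established state */
};

/* Simplified decision order as described in this thread. */
enum toy_action
toy_filter(const struct toy_packet *pkt, bool queue_congested)
{
	/* 1. State lookup first: established traffic never reaches the
	 *    congestion check, so it keeps flowing. */
	if (pkt->matches_state)
		return TOY_PASS;

	/* 2. Rule evaluation begins with the congestion check; a flag
	 *    that is never reset drops every new connection here. */
	if (queue_congested)
		return TOY_DROP;

	/* 3. Assumed permissive ruleset (e.g. an empty one). */
	return TOY_PASS;
}
```

With queue_congested stuck at true, a packet with matches_state set
still passes while anything that would need a new state is dropped, even
under an empty ruleset. That matches what both of us observed.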

> However, "ping6 ::1" still worked after the problem popped up (I have
> no rules dealing with inet6 explicitly, but a "block log all" and
> "pass quick on lo0 all"). 

I think IPv6 uses a different input queue.

> I'd give more detail (and I have tons, as I was originally planning to
> post to the mailing list), but I think I've resolved my problem, as it
> only ever occurs with that one 3c574 card, and never when that card
> isn't installed, so I'm reasonably sure it's a hardware problem.  

It would be interesting to know whether the problem goes away with the
patches applied and the 3c574 installed. Since I started running the
patched kernel, I have not seen the problem again, but that doesn't mean
much, as the box is not always under high load.

Ralf.