
Re: crashes on heavy disk usage

Hi Grant,

Thanks for your input.

> I'd actually been having the same problems with a 2.6 kernel - exactly the
> same error in the kcore from /var/crash etc.  What triggered it in my case
> was a script that did the following:
> 1) Copied apache logfiles while running (only 3 files, but from a
>    moderately busy site)
> 2) apachectl restart
> 3) process logfiles (ie extract stats)
> Now, the problem always cropped up at part 2; that is, the machine was
> panic'ing at 2) and the logs never got processed.  Originally I thought it
> was a disk based thing but as a test I delayed processing by about an hour
> a few times and discovered that the machine was crashing regardless.
> The problem was sporadic.  It cropped up maybe once every two weeks...
> Now, the machine this was happening on serves up large numbers of large
> files, often by http.  By "large" I mean between 30 megs (Mac game
> demos) and 700 megs (ISO images).  Obviously doing an apachectl restart
> would close off those connections that were in progress in the course of
> restarting.

I don't use apachectl restart; I use kill -USR1 [pid of apache], which is
equivalent to apachectl graceful and causes Apache to close and reopen all
logfiles and reread its configuration file. This happens at 0:00h, but the
crashes don't happen at that time...
BTW, on another machine, our webserver, which does a graceful restart much
more often (every time a new virtual host is added), the problems have been
gone since I dramatically increased maxprocs and maxfiles (I set
maxusers=128 after verifying that this helps).
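For anyone who wants to script the kill -USR1 step mentioned above instead of typing it by hand: Apache 1.3 treats SIGUSR1 as a graceful restart, the same signal that apachectl graceful sends. Below is a minimal Python sketch, assuming the parent httpd writes its PID to a pidfile. The function name graceful_restart and the pidfile path in the usage note are hypothetical and not taken from our setup.

```python
import os
import signal


def graceful_restart(pidfile: str) -> int:
    """Send SIGUSR1 to the process whose PID is stored in `pidfile`.

    Assumption: the pidfile belongs to the parent Apache 1.3 httpd, which
    treats SIGUSR1 as a graceful restart (reopen logfiles, reread config).
    Returns the PID that was signalled.
    """
    # The pidfile holds the PID as decimal text, usually followed by a newline.
    with open(pidfile) as f:
        pid = int(f.read().strip())
    # Same effect as running `kill -USR1 <pid>` from the shell.
    os.kill(pid, signal.SIGUSR1)
    return pid
```

Run from a midnight cron job as graceful_restart("/var/www/logs/httpd.pid") (path hypothetical, adjust to the local httpd), it would mirror the 0:00h graceful restart described above without the downtime of a stop/start cycle.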

[...]

> Mostly to placate the issue, I tried a different tack.  Instead of
> apachectl restart I tried apachectl stop; sleep 30; apachectl start.

Not serving any pages for 30 seconds is not acceptable for our webservers,
as they are _very_ busy (serving 5-10 gigs a day). The machine we are
talking about isn't one of them, though; it is our main mail server.

[...]

> I'm happy to publish a dmesg etc if anyone's interested but given that 2.8
> is about to roll out the door and 2.6 is all but a distant memory, I'm
> resigned to avoiding the issue instead of actually tracing it and fixing
> it, if only because I can't afford the A$150 every time something serious
> happens to the machine that causes it to pause during boot-up waiting for
> me to enter the root password for a manual fsck.

Not an issue for me, as I sit 3 meters away from the servers ;-))

> Oh well,

> Grant

Greetings from Germany

Henning

------------------------------------------------------------
Henning Brauer      | Hostmaster BSWS
BS Web Services     | www.bsws.de
Roedingsmarkt 14    | hostmaster@bsws.de
20459 Hamburg
Germany