Re: crashes on heavy disk usage
Hi Grant,
thanks for your input.
> I'd actually been having the same problems with a 2.6 kernel - exactly the
> same error in the kcore from /var/crash etc. What triggered it in my case
> was a script that did the following:
> 1) Copied apache logfiles while running (only 3 files, but from a
> moderately busy site)
> 2) apachectl restart
> 3) process logfiles (ie extract stats)
> Now, the problem always cropped up at part 2; that is, the machine was
> panic'ing at 2) and the logs never got processed. Originally I thought it
> was a disk based thing but as a test I delayed processing by about an hour
> a few times and discovered that the machine was crashing regardless.
> The problem was sporadic. It cropped up maybe once every two weeks...
> Now, the machine this was happening on serves up large numbers of large
> files, often by http. By "large" I mean between 30 megs (Mac game
> demos) and 700 megs (ISO images). Obviously doing an apachectl restart
> would close off those connections that were in progress in the course of
> restarting.
I don't use apachectl restart; I use kill -USR1 [pid of apache], which is
equivalent to apachectl graceful and makes Apache close and reopen all
logfiles and reread its configuration file. This happens at 0:00h, but the
crashes don't...
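For reference, a minimal sketch of what such a graceful logfile rotation might look like. The pidfile and log locations and the helper name graceful_rotate are assumptions for illustration, not the actual setup on this machine:

```shell
#!/bin/sh
# Sketch only: PIDFILE and LOGDIR are assumed locations, and
# graceful_rotate is a hypothetical helper name.
PIDFILE="${PIDFILE:-/var/www/logs/httpd.pid}"
LOGDIR="${LOGDIR:-/var/www/logs}"

graceful_rotate() {
    # Optional first argument: suffix for the rotated files
    # (defaults to today's date).
    stamp="${1:-$(date +%Y%m%d)}"

    # Rename the live logs. Apache keeps writing to the renamed files
    # through its already-open descriptors until it reopens its logs.
    for f in "$LOGDIR"/access_log "$LOGDIR"/error_log; do
        if [ -f "$f" ]; then
            mv "$f" "$f.$stamp"
        fi
    done

    # USR1 is the signal behind "apachectl graceful": Apache reopens
    # its logfiles and rereads its configuration, while requests that
    # are already in progress are allowed to finish.
    kill -USR1 "$(cat "$PIDFILE")"
}
```

Run from cron at 0:00h, this would be called as plain graceful_rotate. Since old children may still write to the renamed files while they finish their requests, it is safer to wait a few minutes before processing them.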
BTW, another machine, the webserver, does a graceful restart much more often
(every time a new virtual host is added). Its problems have been gone since
I dramatically raised maxprocs and maxfiles (I set maxusers=128 after
validating that it helps).
[...]
> Mostly to placate the issue, I tried a different tack. Instead of
> apachectl restart I tried apachectl stop; sleep 30; apachectl start.
30 seconds without serving any pages is not acceptable for our webservers
(the machine we are talking about isn't one of them), as they are _very_
busy (serving 5-10 gigs a day). The machine we are talking about is our
main mail server.
[...]
> I'm happy to publish a dmesg etc if anyone's interested but given that 2.8
> is about to roll out the door and 2.6 is all but a distant memory, I'm
> resigned to avoiding the issue instead of actually tracing it and fixing
> it, if only because I can't afford the A$150 every time something serious
> happens to the machine that causes it to pause during boot-up waiting for
> me to enter the root password for a manual fsck.
Not an issue for me, sitting 3 meters away from the servers ;-))
> Oh well,
> Grant
Greetings from Germany
Henning
------------------------------------------------------------
Henning Brauer | Hostmaster BSWS
BS Web Services | www.bsws.de
Roedingsmarkt 14 | hostmaster@bsws.de
20459 Hamburg
Germany