
Re: crashes on heavy disk usage



Hi,

I'd actually been having the same problems with a 2.6 kernel - exactly the
same error in the kcore from /var/crash etc.  What triggered it in my case
was a script that did the following:

1) Copy the Apache logfiles while the server is running (only 3 files,
   but from a moderately busy site)
2) apachectl restart
3) Process the logfiles (i.e. extract stats)
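
A minimal sketch of that three-step sequence, for anyone trying to
reproduce it. This is not the actual script: LOG_DIR, WORK_DIR, APACHECTL
and process_logs are placeholders, and the default log path is only an
assumption.

```sh
#!/bin/sh
# Sketch only: every name and path below is a placeholder.
LOG_DIR="${LOG_DIR:-/var/www/logs}"          # assumed Apache log directory
WORK_DIR="${WORK_DIR:-/var/tmp/logstats}"    # where the copies end up
APACHECTL="${APACHECTL:-apachectl}"

process_logs() {
    # Stand-in for the real stats extraction: count lines per copied log.
    wc -l "$WORK_DIR"/*
}

rotate_and_process() {
    mkdir -p "$WORK_DIR" || return 1
    # 1) copy the live logfiles while Apache is still running
    cp "$LOG_DIR"/* "$WORK_DIR"/ || return 1
    # 2) restart Apache (the step at which the panic showed up)
    "$APACHECTL" restart || return 1
    # 3) process the copies
    process_logs
}
```

Calling rotate_and_process from cron would run all three steps in order
and stop at the first one that fails.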

Now, the problem always cropped up at step 2; that is, the machine was
panicking at 2) and the logs never got processed.  Originally I thought it
was a disk-based thing, but as a test I delayed the processing by about an
hour a few times and discovered that the machine was crashing regardless.

The problem was sporadic.  It cropped up maybe once every two weeks...

Now, the machine this was happening on serves up large numbers of large
files, often by http.  By "large" I mean between 30 megs (Mac game
demos) and 700 megs (ISO images).  Obviously doing an apachectl restart
would close off those connections that were in progress in the course of
restarting.

After a particularly nasty crash one day where I had to go into the
co-location facility to manually fsck the logging partition (nothing
major, this wasn't a problem really), it struck me that the problem might
lie in how Apache was dealing with the files when it was abruptly
restarted (though according to the manual, this is a perfectly legal way
to restart it).  The same _blksize error was showing up in the stack trace
in the kcore.

Mostly to work around the issue, I tried a different tack.  Instead of
apachectl restart I tried apachectl stop; sleep 30; apachectl start.  It
worked, and has been working since.  The thing is, I've never figured out
what the issue is, but the heavy disk usage theory fits in with mine - the
crash only occurred when the apachectl restart ran while the machine was
under heavy load (like 20 people all downloading the one 700MB ISO).
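
As a rough sketch, the stop/sleep/start replacement can be wrapped in a
small shell function. The function name slow_restart and the APACHECTL and
RESTART_PAUSE variables are made up for illustration; only the three
commands and the 30-second pause come from what I actually ran.

```sh
#!/bin/sh
# Sketch of the workaround: a full stop, a pause, then a fresh start,
# instead of "apachectl restart".
# APACHECTL and RESTART_PAUSE are hypothetical knobs, not Apache settings.
APACHECTL="${APACHECTL:-apachectl}"
RESTART_PAUSE="${RESTART_PAUSE:-30}"    # seconds to wait between stop and start

slow_restart() {
    # Stop the server; give up if the stop itself fails.
    "$APACHECTL" stop || return 1
    # Wait before starting again, so the restart is not back-to-back.
    sleep "$RESTART_PAUSE"
    # Bring the server back up; the function returns this command's status.
    "$APACHECTL" start
}
```

Calling slow_restart wherever the script previously ran apachectl restart
gives the same sequence as the one-liner above.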

I'm happy to publish a dmesg etc if anyone's interested but given that 2.8
is about to roll out the door and 2.6 is all but a distant memory, I'm
resigned to avoiding the issue instead of actually tracing it and fixing
it, if only because I can't afford the A$150 every time something serious
happens to the machine that causes it to pause during boot-up waiting for
me to enter the root password for a manual fsck.

Oh well,

Grant

-------------------------------------------------------
Grant Bayley                         gbayley@ausmac.net
-IT Manager @ Foster Nunn Loveder      (www.fnl.com.au)
-Admin @ AusMac Archive, Wiretapped.net, 2600 Australia
 www.ausmac.net   www.wiretapped.net   www.2600.org.au
-------------------------------------------------------

On Fri, 17 Nov 2000 henning.brauer@bsmail.de wrote:

> Date: Fri, 17 Nov 2000 23:17:53 +0100
> From: henning.brauer@bsmail.de
> To: misc@openbsd.org
> Subject: crashes on heavy disk usage
> 
> Hi all,
> 
> I'm having problems with obsd. The machine crashes on heavy disk usage
> (which I unhappily proved while trying to copy the coredump to a
> webserver...). This is reproducible.
> Running 2.7, custom kernel, no softupdates. cvs'ed up yesterday or so and
> built a new kernel. DMA transfers switched off after the first problems,
> doesn't solve the problem.
> Machine: AMD K6-III 400, Asus P5A board (Ali Aladdin V chipset), IBM 10gig
> drive (IDE). I'm putting as much information as possible at
> http://misc.bsws.de/crash/ , including dmesg and coredump. If any
> information is missing, please tell me so I can add it.
> On failure, the ddb shows me something like panic: _blkfree ... /var or / .
> The machine is mainly used as mailserver (qmail-ldap) and dns (djbdns),
> LDAP (OpenLDAP)  and some not so heavily used services like ftp (proftpd),
> http (apache) and so on. I would call it "heavily loaded".