openbsd under heavy load corrupts fs and crash ?
- To: misc@openbsd.org
- Subject: openbsd under heavy load corrupts fs and crash ?
- From: Per-Erik Persson <pere@fos.su.se>
- Date: Tue, 03 Feb 2004 10:15:32 +0100
- User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3) Gecko/20030313
I have had a delicate problem following me for the last year.
My only two OpenBSD servers are a 1.7 GHz Celeron with a "ServerWorks CSB6
IDE" chipset and a 500 MHz PIII with an "Intel 82371AB IDE" chipset.
The problem has been the same all through 3.2 and 3.3 (DMA on the CSB6
chipset became supported in 3.3).
Both machines have two IDE disks that are equally heavily loaded with
disk access (postfix, imap, apache, nfs and scp); CPU and memory are not a
problem.
If I enable softdeps, the machines crash after a day or two, always with
errors about ffs not being able to allocate data or an ffs timeout.
With synchronous mounts the computers can run for several months without
showing the same behaviour. Usually I need to run fsck manually after
rebooting, because a file or two has been badly corrupted. Enabling softdeps
on only one partition still increases the chance of a failure.
The interesting information I got last time was:
Feb 2 02:01:52 meso named[7465]: ---w2k machines trying to update the nameserver all the time...----
Feb 2 02:12:53 meso /bsd: wd0(pciide0:0:0): timeout
Feb 2 02:12:54 meso /bsd: type: ata
Feb 2 02:12:54 meso /bsd: c_bcount: 8192
Feb 2 02:12:54 meso /bsd: c_skip: 0
Feb 2 02:12:54 meso /bsd: pciide0:0:0: bus-master DMA error: missing interrupt, status=0x20
I know that some people would suggest that I buy SCSI hardware, but
that is not an option.
These two machines are in production, so debugging is not that easy. I
have memory dumps from the crash, but how do I get the trace and ps info
out of them and into a file without halting the machines? This is not
covered in any FAQ that I know of!
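One possible approach, sketched under assumptions: savecore(8) copies the dump to /var/crash at boot as a kernel/core pair (assumed here to be named bsd.N and bsd.N.core), and both can then be read on the running system. OpenBSD's ps(1) accepts -N (kernel name list) and -M (core file instead of /dev/mem), and OpenBSD's gdb can open a dump with "target kvm" before printing a backtrace. The helper below and its function names (dump_paths, ps_argv, gdb_argv, run_to_file) are hypothetical and untested against a real dump.

```python
"""Hypothetical helper: write ps and backtrace output from a saved
OpenBSD crash dump to text files, without halting the machine."""
import subprocess
import sys
from pathlib import Path
from typing import List, Tuple

CRASH_DIR = Path("/var/crash")  # assumed savecore(8) output directory


def dump_paths(n: int, crash_dir: Path = CRASH_DIR) -> Tuple[Path, Path]:
    # Assumed savecore naming: bsd.N (kernel) and bsd.N.core (memory image).
    return crash_dir / f"bsd.{n}", crash_dir / f"bsd.{n}.core"


def ps_argv(kernel: Path, core: Path) -> List[str]:
    # -N: take the name list from the saved kernel; -M: read the core, not /dev/mem.
    return ["ps", "-axl", "-N", str(kernel), "-M", str(core)]


def gdb_script(core: Path) -> str:
    # Commands for OpenBSD's gdb: open the dump, then print the stack trace.
    return f"target kvm {core}\nbt\n"


def gdb_argv(kernel: Path, script: Path) -> List[str]:
    # -batch makes gdb exit after running the commands read with -x.
    return ["gdb", "-batch", "-x", str(script), str(kernel)]


def run_to_file(argv: List[str], out: Path) -> int:
    # Send stdout and stderr of one command into a text file.
    with out.open("w") as f:
        try:
            return subprocess.run(argv, stdout=f, stderr=subprocess.STDOUT).returncode
        except FileNotFoundError:
            f.write(f"command not found: {argv[0]}\n")
            return 127


def main(n: int) -> None:
    kernel, core = dump_paths(n)
    run_to_file(ps_argv(kernel, core), Path(f"ps.{n}.txt"))
    script = Path(f"trace.{n}.gdb")
    script.write_text(gdb_script(core))
    run_to_file(gdb_argv(kernel, script), Path(f"trace.{n}.txt"))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(int(sys.argv[1]))  # e.g. "python3 dumpinfo.py 0" for bsd.0 / bsd.0.core
```

Running it with a dump number writes ps.N.txt and trace.N.txt in the current directory; nothing is run without an argument, and a missing ps or gdb is recorded in the output file instead of raising.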