Re: openbsd under heavy load corrupts fs and crash ?
That sounds like a bad cable or drive to me.
On Feb 3, 2004, at 3:15 AM, Per-Erik Persson wrote:
> I have this delicate problem that has been following me for the last
> year.
>
> My only two OpenBSD servers are a 1.7GHz Celeron with a "ServerWorks
> CSB6 IDE" chipset and a 500MHz PIII with an "Intel 82371AB IDE"
> chipset. The problem has been the same all through 3.2 and 3.3 (where
> DMA on the CSB6 chipset became supported).
> Both machines have two IDE disks that are equally heavily loaded with
> disk access (postfix, imap, apache, nfs and scp); CPU and memory are
> not a problem.
> If I enable softdeps the machines crash after a day or two, always
> with errors about ffs not being able to allocate data or some ffs
> timeout.
> With synchronous mounts the computers can run for several months
> without showing the same behaviour. Usually I need to do a manual fsck
> after rebooting, because a file or two has been badly corrupted.
> Enabling softdeps on even a single partition increases the chance of
> a failure.
>
>
> The interesting information I got last time was:
>
> Feb 2 02:01:52 meso named[7465]: ---w2k machines trying to update the
> nameserver all the time...----
> Feb 2 02:12:53 meso /bsd: wd0(pciide0:0:0): timeout
> Feb 2 02:12:54 meso /bsd: type: ata
> Feb 2 02:12:54 meso /bsd: c_bcount: 8192
> Feb 2 02:12:54 meso /bsd: c_skip: 0
> Feb 2 02:12:54 meso /bsd: pciide0:0:0: bus-master DMA error: missing interrupt, status=0x20
>
>
> I know that some people would suggest that I purchase SCSI hardware,
> but that is not an option.
>
> These two machines are in production, so debugging is not that easy. I
> have memory dumps from the crash, but how do I get the trace and ps
> info out of them and into a file without halting the machines? This is
> not covered in any FAQ that I know of!
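
One way to tell a flaky cable or drive from a one-off event is to count how often each channel logs a timeout or DMA error over time. The following is only a rough sketch in Python. The two patterns are assumptions derived solely from the log lines quoted above, so other drivers or releases may word these messages differently, and the function name count_disk_errors is made up for illustration.

```python
import re
from collections import Counter

# Patterns modelled on the quoted excerpt only (an assumption, not an
# exhaustive list of what the wd/pciide drivers can print).
TIMEOUT_RE = re.compile(r"/bsd: (wd\d+)\((pciide\d+:\d+:\d+)\): timeout")
DMA_RE = re.compile(r"/bsd: (pciide\d+:\d+:\d+): bus-master DMA error")


def count_disk_errors(lines):
    """Count timeouts and DMA errors per controller channel.

    Returns a Counter keyed by (channel, kind), where kind is
    "timeout" or "dma_error".
    """
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(match.group(2), "timeout")] += 1
            continue
        match = DMA_RE.search(line)
        if match:
            counts[(match.group(1), "dma_error")] += 1
    return counts


# Sample input copied from the quoted log excerpt.
SAMPLE = [
    "Feb 2 02:12:53 meso /bsd: wd0(pciide0:0:0): timeout",
    "Feb 2 02:12:54 meso /bsd: type: ata",
    "Feb 2 02:12:54 meso /bsd: pciide0:0:0: bus-master DMA error: "
    "missing interrupt, status=0x20",
]

for (channel, kind), n in sorted(count_disk_errors(SAMPLE).items()):
    print(channel, kind, n)
```

To run it against a real log, pass the open file object instead of SAMPLE, for example the lines of /var/log/messages. If the errors pile up on one channel only, that points at that cable or drive. Errors on every channel would suggest looking at the controller or power supply instead.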