
Re: occasional kernel freezes possibly related to aac(4) 2410SA



Hi Ingo.

I have (or maybe had...) a freezing problem with my new home server, a Dell
400SC with an Adaptec 2410SA (firmware 4.1-0 [5934]) set up as a mirror.
The freezes came during the test phase, a couple of days apart. The server
is new, and I have only run OpenBSD 3.6 on it, and only with my 2410SA
card. When the server locked up, I could not even see anything on the
attached monitor. I have not yet enabled any debugging in the kernel to
find out whether the Adaptec card is what causes my hangs. I was going to
try what Antonios Anastasiadis did (see the misc thread) and disable both
the uhci and the ehci drivers to see if that gets rid of the freezes (it
did for him, and he pointed to a very buggy *hci driver).

However... before disabling *hci in the kernel, I tried disabling the
driver cache on my 2410 card (a long shot), and the hangs have not shown
up since.


Can you try disabling all caching and reply with the result? But I want to
point out that it only s e e m s to be OK with the cache disabled. I am not
100% sure; all I can say is that the hang hasn't shown up on my system
since I disabled it. I really want to solve my issue AND have the cache
enabled, so maybe we can help each other with this.


Thanks
Per-Olov


Ingo Schwarze said:
> Hi Marco Peereboom, hi Jim Razmus, hi misc,
>
> last Xmas, i reported occasional problems with our i386 based
> nfs file server using an Adaptec 2410SA SATA RAID controller, see
>
> Message-ID: <20041223233510.GA5131@athene.usta.de>
> Message-ID: <20041224042231.GB28407@mail.bonetruck.org>
> Message-ID: <ED62BEBC-55D2-11D9-8DF9-000A95908CA4@peereboom.us>
> Message-ID: <20041227194636.GA5438@athene.usta.de>
>
> With controller firmware 4.0-0 (factory default), i had three
> or four freezes during two weeks in december.  With controller
> firmware 4.1-0 (=Build 7244), i had half a dozen freezes
> during a few hours in a single day (Dec 26).  With controller
> firmware 4.2-0 (=Build 7348), i now have one freeze after two
> weeks: The machine has been working from Dec 27 until Jan 08;
> today, it died once again.  Its last words were, as usual:
>
>   sd0(aac0:0:0): timed out
>
> My conclusion is:
>  a) The freezes are very probably in some way or other related
>     to the controller since their frequency varies with the
>     firmware version in use, and
>  b) 4.2-0 is better than 4.0-0 is better than 4.1-0,
>     but 4.2-0 still does not seem to be perfect.
>
> I had already gone back to the GENERIC kernel after one week
> or so of stable operation, so once more i cannot tell which
> SCSI command was the last one before the hang.  I will now
> once more boot my "AAC_DEBUG=0x0C"-kernel and wait for the
> next incident.
>
> Please tell me when anybody is working on aac(4) or when
> i can help with any testing.
>
> Yours,
>   Ingo
>
> P.S. off-topic:
> Btw, i'm pondering whether i should contact Adaptec asking
> for better firmware or better hardware. The collected evidence
> that this is due to the controller itself or its firmware,
> rather than to the rest of my box and the OS, is not that
> bad after all, is it?  There's no new info on the Adaptec
> homepage yet; i just checked once more...