
Re: Failover



On Thu, May 01, 2003 at 04:39:07PM +0100, Peter Galbavy wrote:
> Derick Siddoway wrote:
> > Right, so you don't just ping the active host, you test the
> > application on that host.  If you're doing failover on a
> > database, for instance, you actually execute some SQL for
> > the "aliveness" test.  At least, that's how it worked on both
> > sun and veritas clusters.
> 
> Yep, and then you find that the 'failure' is actually either a break
> between the primary and secondary boxes, so the secondary goes live without
> the primary really being down, or a short, complete isolation of the
> secondary, which brings itself into service and then kills the primary when
> it is reattached to the world.
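
For anyone following along, the application-level "aliveness" test quoted
above might look roughly like the sketch below. It is only an illustration:
the function name database_is_alive is made up, and SQLite stands in for
whatever database is really being clustered. It is not how Sun or Veritas
clusters implement their probes.

```python
import sqlite3


def database_is_alive(db_path: str, timeout_s: float = 2.0) -> bool:
    """Return True only if the database actually answers a trivial query.

    Hypothetical helper: a ping would only prove the host is up, whereas
    this proves the application on that host can still do work.
    """
    try:
        conn = sqlite3.connect(db_path, timeout=timeout_s)
        try:
            # Execute real SQL instead of merely opening a socket or file.
            row = conn.execute("SELECT 1").fetchone()
            return row == (1,)
        finally:
            conn.close()
    except sqlite3.Error:
        # Any database-level failure counts as "not alive".
        return False
```

Note that a False result only says the probe failed from where it ran. As
Peter points out, it cannot tell a dead primary from a broken path to it.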

You, sir, are reminding me of things I'd successfully forgotten.
But this sort of thing is usually addressed by having dedicated,
duplicated network connections between the boxes, with quorum
devices as a last-ditch method.  And, yes, this can all still fail.
And worst-case, it corrupts your data.  Been there, done that.
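
To make the heartbeat-plus-quorum idea concrete, here is a minimal sketch
of the takeover decision. Everything in it is an assumption for the sake
of illustration (ClusterView, may_take_over, and the vote counting are
made-up names, not any vendor's API): a node takes over only when the peer
is silent on every dedicated link and the node still holds a majority of
votes, with the quorum device counting as one vote.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterView:
    """What one node currently believes about the cluster (hypothetical)."""
    peer_seen_on_links: List[bool]  # one entry per dedicated heartbeat link
    votes_held: int                 # own vote, plus the quorum device if reserved
    total_votes: int                # all node votes plus the quorum device


def may_take_over(view: ClusterView) -> bool:
    """Allow takeover only if the peer is silent everywhere AND we hold quorum."""
    # A single working link is enough to prove the peer is still running.
    peer_silent = not any(view.peer_seen_on_links)
    # Strict majority, so two isolated halves can never both win.
    has_quorum = view.votes_held * 2 > view.total_votes
    return peer_silent and has_quorum
```

With two nodes and one quorum device (three votes), an isolated secondary
that cannot reserve the quorum device holds one vote of three and stays
down. This narrows the split-brain window; it does not close it.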

And my favorite part is when the CFO drops by to tell you he's
calculated that the company is losing $5 million per minute of
downtime, so get that thing up Right Now!

<sigh>

-- 
Derick Siddoway      Yay, sleep!  That's where I'm a viking!
derick@bitflood.net            - Ralph Wiggam