
null fs panic while running umount -f



Hello,

I get a kernel panic after performing the following steps:
a) mount_null /src /mount_point
b) cat /mount_point/* > /dev/null
c) umount -f /mount_point 

The following was transcribed by hand from ddb:

panic: lockmgr PID 12918 not exclusive lock holder 26864 unlocking

ddb trace:

Debugger()
panic()
lockmgr()
ufs_unlock()
null_bypass()
null_unlock()
VOP_UNLOCK()
null_inactive()
VOP_INACTIVE()
vclean()
vgonel()
vflush_vnode()
vfs_mount_foreach_vnode()
vflush()
null_fs_unmount()
dounmount()
sys_unmount()
syscall()
--- syscall number 22

ps:
 CMD	WAIT	PID
unmount	 - 	12918
cat	 biowait 26864


I suspect that the race condition is due to the null_lock() routine
in /sys/miscfs/nullfs/null_vnops.c where the code is as follows:

        if ((ap->a_flags & LK_TYPE_MASK) == LK_DRAIN)
                return (0);
        ap->a_flags &= ~LK_INTERLOCK;

        return (null_bypass((struct vop_generic_args *)ap));

In the panic case, null_lock() is called as a by-product of the
VOP_INACTIVE() call in vclean() in /sys/kern/vfs_subr.c.

The code in vclean() is:

        if (active) {
                if (flags & DOCLOSE)
                        VOP_CLOSE(vp, FNONBLOCK, NOCRED, p);
                VOP_INACTIVE(vp, p);
        } else {


VOP_INACTIVE requires the vnode lock to be held by the caller,
but in vclean() the code that does this is:

        if (vp->v_flag & VXLOCK)
                panic("vclean: deadlock");
        vp->v_flag |= VXLOCK;
        /*
         * Even if the count is zero, the VOP_INACTIVE routine may still
         * have the object locked while it cleans it out. The VOP_LOCK
         * ensures that the VOP_INACTIVE routine is done with its work.
         * For active vnodes, it ensures that no other activity can
         * occur while the underlying object is being cleaned out.
         */
        VOP_LOCK(vp, LK_DRAIN | LK_INTERLOCK, p);

which is accidentally a no-op for null fs:

int
null_lock(v)
        void *v;
{
        struct vop_lock_args *ap = v;

#if 0
        vop_generic_lock(ap);
#endif
        if ((ap->a_flags & LK_TYPE_MASK) == LK_DRAIN)
                return (0);
        ap->a_flags &= ~LK_INTERLOCK;

        return (null_bypass((struct vop_generic_args *)ap));
}
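
To make the flag test concrete, here is a small user-space sketch of the
check that null_lock() performs. The numeric values of the LK_* constants
are assumptions chosen for illustration (the real definitions are in
<sys/lock.h>), and null_lock_is_noop() and forwarded_flags() are
hypothetical helpers, not kernel functions.

```c
#include <assert.h>

/* Assumed values, for illustration only; see <sys/lock.h> for the real ones. */
#define LK_TYPE_MASK  0x0000000f  /* bits that select the request type */
#define LK_EXCLUSIVE  0x00000002
#define LK_DRAIN      0x00000007
#define LK_INTERLOCK  0x00010000  /* modifier flag, outside the type mask */

/*
 * Hypothetical helper: returns 1 if null_lock() as quoted above would
 * return early without acquiring any lock, 0 if it would forward the
 * request to the lower layer through null_bypass().
 */
int null_lock_is_noop(int flags)
{
        return (flags & LK_TYPE_MASK) == LK_DRAIN;
}

/*
 * Hypothetical helper: the flags that a forwarded request carries,
 * i.e. the original flags with LK_INTERLOCK cleared.
 */
int forwarded_flags(int flags)
{
        return flags & ~LK_INTERLOCK;
}
```

With these assumed values, the request issued by vclean(),
LK_DRAIN | LK_INTERLOCK, takes the early return, while an LK_EXCLUSIVE
request is passed down with LK_INTERLOCK stripped.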


As a result, the code in vclean() (namely uvm_vnp_terminate()) runs
without the vnode lock held. The process blocks during this call,
another process (here cat) acquires the exclusive lock,
and the race shows up when the process running vclean() is woken up
and tries to unlock the vnode.
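
The sequence I have in mind can be replayed with a toy, single-threaded
model. This is only a sketch under my assumptions: struct toy_lock and the
toy_* functions are made up for illustration and are not lockmgr, sleeping
is modelled by returning -1, and a real drain does more than take an
exclusive lock. The PIDs are the ones from the panic message.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NOBODY (-1)

/* Toy lock: records only the PID of the exclusive holder. */
struct toy_lock {
        int holder;
};

/* Take the lock exclusively; 0 on success, -1 if it is already held
 * (a real lock would put the caller to sleep instead). */
int toy_lock_exclusive(struct toy_lock *lk, int pid)
{
        assert(lk != NULL);
        if (lk->holder != TOY_NOBODY)
                return -1;
        lk->holder = pid;
        return 0;
}

/* Drain request. With drain_is_noop set, mimic the quoted null_lock():
 * report success without acquiring anything. */
int toy_lock_drain(struct toy_lock *lk, int pid, int drain_is_noop)
{
        if (drain_is_noop)
                return 0;
        return toy_lock_exclusive(lk, pid);
}

/* Release the lock; -1 if the caller is not the holder, which is the
 * situation in which the real lockmgr panics. */
int toy_unlock(struct toy_lock *lk, int pid)
{
        assert(lk != NULL);
        if (lk->holder != pid)
                return -1;
        lk->holder = TOY_NOBODY;
        return 0;
}

/* Replay the suspected interleaving; returns the result of the final
 * unlock attempted by the umount process. */
int replay(int drain_is_noop)
{
        struct toy_lock lk = { TOY_NOBODY };
        const int umount_pid = 12918, cat_pid = 26864;

        /* vclean(): VOP_LOCK(vp, LK_DRAIN | LK_INTERLOCK, p) */
        toy_lock_drain(&lk, umount_pid, drain_is_noop);
        /* umount blocks; cat runs and locks the vnode (this succeeds
         * only if the drain above did not actually take the lock) */
        toy_lock_exclusive(&lk, cat_pid);
        /* umount wakes up; null_inactive() calls VOP_UNLOCK() */
        return toy_unlock(&lk, umount_pid);
}
```

In this model replay(1) ends with the unlock failing because cat is the
holder, which corresponds to the panic, while replay(0) keeps cat out and
the unlock succeeds.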

Could somebody explain to me why the code in null_lock() is a no-op
for lock requests of type LK_DRAIN?

Milos
--