[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

NFS mount hang



Dale Rahn suggested I write up this problem and send it to this
address.  He tells me that no one else seems to be having any
OpenBSD NFS problems.

I have a system that experiences an intermittent hang when mounting
an NFS file system.  I have found nothing that seems to make a
difference whether or not it will hang when I attempt the mount.
The hang can happen either during normal /etc/rc processing or
after the system has fully booted.

I've been having this problem since some time after upgrading from
2.8 base starting with the April kernel around 4/28 (the proposed
2.9 baseline) continuing up until the current 2.9-stable GENERIC
build with patches around July 11.  Both the NFS server and client
are running this kernel.

When the "mount -t nfs ..." hangs, I can interrupt the command, but
the mount_nfs child process stays.  Any attempt to do another "mount
-t nfs", that process will not be interruptible.  Also, shutting
down the system after any nfs mount problem results in a system hang
at the "syncing disks..." time.  No "done" ever comes out.

The mount_nfs process hangs in "netio" while the reboot process
hangs in "inode".  Dale says the mount_nfs though is blocked on
disk according to ps output.

My configuration is not that unusual.  Both systems are x86 clone
systems with Intel processors.  They're built from the ground up one
with an old Gateway NLX motherboard and the other from a Tyan ATX
motherboard.  The only things I can think of are:

        *) Client and server at different ethernet speeds.
             They are connected via an ethernet switch.  The server
             is running at 10baseT and the client at 100baseT.  I
             would be running the server at 100baseT since it has a
             DEC 21143, but due to independent bugs in both the 'de'
             and 'dc' drivers prevent this.  (Hi Aaron!)

        *) Server has a slower CPU and bus than the client.
             The server is a Pentium II/300MHz/66Mhz and the client
             is a Pentium III/550MHz/100Mhz.

        *) Client has a fast disk for the system.
             I do have a very fast Maxtor disk (ATA100 capable
             with 50MB/s to media) running on an ATA/66 (mode 4)
             interface.  This is not unusual in itself, but it might
             be for the relatively slow CPU and networking speed
	     of the system.

I have "vfs.nfs.iothreads=4" in the sysctl.conf on the client.
Before changing this parameter, it hung in that configuration as
well.

I have a crash dump from a "syncing disks..." hang.  I have included
the client's crash dump ps and vmstat output along with its dmesg.

I also have a tcpdump from the server's (bounce's/c248948-a's)
perspective after the client (rosie) has hung.  You can see it
continuing the attempt the retry every 15 minutes, but getting
a "udp port xxx unreachable" error.  It never comes out of this
problem.  The mount either works immediately or never.  (Well,
probably "never".  I'm not that patient.)

Any ideas to try or information I can extract from the crash dump
to help isolate this problem?

Quentin
# ps lauxww -M bsd.1.core
  UID   PID  PPID CPU PRI NI   VSZ   RSS WCHAN   STAT TT       TIME COMMAND          USER     %CPU %MEM STARTED
    0     1 -541079792   0  10 20   340     0 wait   Is    ??    0:00.02 (init)           root      0.0  0.0 Thu06PM 
    0  7615 -541079792   0   2 20   220     0 netio  D     C0-   0:00.01 (mount_nfs)      root      0.0  0.0 Thu06PM 
    0  6807 -541079792   1 -14 20   180     0 inode  D     C0-   0:00.09 (reboot)         root      0.0  0.0 Thu06PM 
OpenBSD 2.9-stable (GENERIC) #3: Wed Jul 11 15:36:53 CDT 2001
    qbarnes_(_at_)_rosie:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: disabling processor serial number
cpu0: Intel Pentium III ("GenuineIntel" 686-class, 512KB L2 cache) 551 MHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SYS,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SIMD
real mem  = 133738496 (130604K)
avail mem = 119083008 (116292K)
using 1658 buffers containing 6791168 bytes (6632K) of memory
mainbus0 (root)
bios0 at mainbus0: AT/286+(ad) BIOS, date 06/15/00, BIOS32 rev. 0 @ 0xfb290
apm0 at bios0: Power Management spec V1.2
apm0: AC on, battery charge unknown
pcibios0 at bios0: rev. 2.1 @ 0xf0000/0xb710
pcibios0: PCI IRQ Routing Table rev. 1.0 @ 0xfdb50/160 (8 entries)
pcibios0: PCI Interrupt Router at 000:07:0 ("VIA VT82C596A PCI-ISA" rev 0x00)
pcibios0: PCI Exclusive IRQs: 9 10 11 12
pcibios0: PCI bus #1 is the last bus
bios0: ROM list: 0xc0000/0x10000
pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
pchb0 at pci0 dev 0 function 0 "VIA VT82C691 Host-PCI" rev 0xc4
ppb0 at pci0 dev 1 function 0 "VIA VT82C598 PCI-AGP" rev 0x00
pci1 at ppb0 bus 1
vga1 at pci1 dev 0 function 0 "Nvidia Vanta" rev 0x15
wsdisplay0 at vga1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
pcib0 at pci0 dev 7 function 0 "VIA VT82C596A PCI-ISA" rev 0x23
pciide0 at pci0 dev 7 function 1 "VIA VT82C571 IDE" rev 0x10: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility
pciide0: Ultra/66 capable
wd0 at pciide0 channel 0 drive 0: <Maxtor 53073H4>
wd0: 16-sector PIO, LBA, 29311MB, 16383 cyl, 16 head, 63 sec, 60030432 sectors
pciide0: channel 0 interrupting at irq 14
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 4
atapiscsi0 at pciide0 channel 1
scsibus0 at atapiscsi0: 2 targets
cd0 at scsibus0 targ 0 lun 0: <Lite-On, LTN483S 48x Max, PD02> SCSI0 5/cdrom removable
pciide0: channel 1 interrupting at irq 15
cd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2
uhci0 at pci0 dev 7 function 2 "VIA VT83C572 USB" rev 0x11: irq 9
usb0 at uhci0: USB revision 1.0
uhub0 at usb0
uhub0: vendor 0x0000 UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
pchb1 at pci0 dev 7 function 3 "VIA VT82C596 Power Management" rev 0x30
unknown vendor 0x1813 product 0x4000 (class communications, subclass miscellaneous, rev 0x02) at pci0 dev 16 function 0 not configured
dc0 at pci0 dev 17 function 0 "Lite-On PNIC" rev 0x20: irq 11 address 00:c0:f0:3d:ac:37
ukphy0 at dc0 phy 1: Generic IEEE 802.3u media interface
ukphy0: OUI 0x00c0b4, model 0x0000, rev. 8
eap0 at pci0 dev 18 function 0 "Ensoniq AudioPCI97" rev 0x02: irq 12
ac97: codec id 0x54524103 (TriTech TR28023)
ac97: codec features No 3D Stereo
audio0 at eap0
isa0 at pcib0
isadma0 at isa0
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pcppi0 at isa0 port 0x61
midi0 at pcppi0: <PC speaker>
sysbeep0 at pcppi0
lpt0 at isa0 port 0x378/4 irq 7
npx0 at isa0 port 0xf0/16: using exception 16
pccom0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
pccom1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
biomask c240 netmask ca40 ttymask cac2
pctr: 686-class user-level performance counters enabled
mtrr: Pentium Pro MTRR support
dkcsum: wd0 matched BIOS disk 80
root on wd0a
rootdev=0x0 rrootdev=0x300 rawdev=0x302
WARNING: / was not properly unmounted
ukbd0 at uhub0 port 1 configuration 1 interface 0
ukbd0: ATEN CS124U V1.00, rev 1.10/1.00, addr 2, iclass 3/1
wskbd1 at ukbd0 mux 1
ums0 at uhub0 port 1 configuration 1 interface 1
ums0: ATEN CS124U V1.00, rev 1.10/1.00, addr 2, iclass 3/1
ums0: 5 buttons and Z dir.
wsmouse0 at ums0 mux 0
wskbd1: connecting to wsdisplay0
# vmstat -m -M bsd.1.core
Memory statistics by bucket size
    Size   In Use   Free   Requests  HighWater  Couldfree
      16      113      0        399       0       2650
      32     1005      0        147       0       2429
      64     1408      0        320       0       5428
     128      226      0         62       0      15107
     256      143      0        177       0       1098
     512       65      0         31       0        730
    1024      136      0         12       0       5461
    2048        7      0         13       0        468
    4096       26      0          8       0       1326
    8192        3      0         11       0       1342
   16384        0      0          0       0          1
   32768        1      0          0       0          1
  262144        1      0          0       0          1

Memory usage type by bucket size
    Size  Type(s)
      16  devbuf, pcb, routetbl, namecache, shm, proc, exec, VM swap, UVM amap,
          UVM aobj, USB, USB device, temp
      32  devbuf, pcb, routetbl, ifaddr, pgrp, vnodes, UFS mount, subproc,
          LFS segment, ether_multi, VM swap, UVM amap, USB, memdesc, temp
      64  devbuf, pcb, routetbl, ifaddr, session, NFS req, namecache, UFS mount,
          file, lockf, LFS segment, in_multi, pfkey data, UVM amap, USB,
          USB device, USB HC, NDP, temp
     128  mbuf, devbuf, routetbl, zombie, ifaddr, soopts, cred, vnodes, proc,
          ttys, exec, xform_data, UVM amap, USB, USB device, pipe, temp
     256  devbuf, socket, pcb, routetbl, ifaddr, NFS node, vnodes, VM map,
          subproc, NFS srvsock, NFS daemon, ttys, newblk, UVM amap, pool, USB
     512  devbuf, pcb, ifaddr, ioctlops, mount, NFS mount, UFS mount, file desc,
          proc, pfil, UVM amap, USB device, NDP
    1024  devbuf, namei, ioctlops, UFS mount, NQNFS Lease, ttys, exec, pagedep,
          UVM amap, UVM aobj, NDP, temp
    2048  devbuf, UFS mount, MSDOSFS mount, VM swap, UVM amap
    4096  mbuf, devbuf, NFS node, namecache, UFS quota, UFS mount, ISOFS mount,
          inodedep, UVM amap, memdesc, temp
    8192  devbuf, UFS mount, UVM amap
   16384  UFS mount
   32768  pfil
  262144  VM page bucket

Memory statistics by type                           Type  Kern
          Type InUse MemUse HighUse  Limit Requests Limit Limit Size(s)
          mbuf    69    13K     14K  9831K    13140    0     0  128,4096
        devbuf   309   133K    154K  9831K      538    0     0  16,32,64,128,256,512,1024,2048,4096,8192
        socket     2     1K     10K  9831K      146    0     0  256
           pcb     7     2K     13K  9831K      168    0     0  16,32,64,256,512
      routetbl    76    10K     11K  9831K      118    0     0  16,32,64,128,256
        zombie     0     0K      2K  9831K      169    0     0  128
        ifaddr    36     6K      6K  9831K       41    0     0  32,64,128,256,512
        soopts     0     0K      1K  9831K       79    0     0  128
         namei     0     0K      5K  9831K     4954    0     0  1024
      ioctlops     0     0K      1K  9831K       23    0     0  512,1024
          cred     2     1K      1K  9831K       12    0     0  128
          pgrp     3     1K      1K  9831K       25    0     0  32
       session     2     1K      2K  9831K       22    0     0  64
         mount     2     1K      3K  9831K        6    0     0  512
       NFS req     1     1K      1K  9831K        1    0     0  64
     NFS mount     1     1K      1K  9831K        1    0     0  512
      NFS node     2     5K      5K  9831K        2    0     0  256,4096
        vnodes   906    34K     36K  9831K      969    0     0  32,128,256
     namecache  1315    91K     91K  9831K     1315    0     0  16,64,4096
     UFS quota     1     4K      4K  9831K        1    0     0  4096
     UFS mount     5     9K     44K  9831K       17    0     0  32,64,512,1024,2048,4096,8192,16384
           shm     0     0K      1K  9831K        2    0     0  16
        VM map     4     1K      1K  9831K        4    0     0  256
          file     2     1K      6K  9831K     1489    0     0  64
     file desc    13     7K     16K  9831K      182    0     0  512
         lockf     0     0K      1K  9831K       14    0     0  64
          proc     4     2K      2K  9831K        4    0     0  16,128,512
       subproc    14     1K      3K  9831K      195    0     0  32,256
   LFS segment     0     0K      1K  9831K      288    0     0  32,64
   NQNFS Lease     1     1K      1K  9831K        1    0     0  1024
   NFS srvsock     2     1K      1K  9831K        2    0     0  256
    NFS daemon     1     1K      1K  9831K        1    0     0  256
      in_multi     7     1K      1K  9831K        7    0     0  64
   ether_multi     3     1K      1K  9831K        3    0     0  32
   ISOFS mount     1     4K      4K  9831K        1    0     0  4096
 MSDOSFS mount     1     2K      2K  9831K        1    0     0  2048
          ttys   240   140K    140K  9831K      240    0     0  128,256,1024
          exec     0     0K      2K  9831K      178    0     0  16,128,1024
          pfil     6    27K     27K  9831K        6    0     0  512,32768
    pfkey data     1     1K      1K  9831K        2    0     0  64
    xform_data     1     1K      1K  9831K        1    0     0  128
       pagedep     1     1K      1K  9831K        1    0     0  1024
      inodedep     1     4K      4K  9831K        1    0     0  4096
        newblk     1     1K      1K  9831K        1    0     0  256
VM page bucket     1   256K    256K  9831K        1    0     0  262144
       VM swap     8     3K      3K  9831K        8    0     0  16,32,2048
      UVM amap    37     2K     99K  9831K    11492    0     0  16,32,64,128,256,512,1024,2048,4096,8192
      UVM aobj     1     1K      2K  9831K        3    0     0  16,1024
          pool     3     1K      1K  9831K        3    0     0  256
           USB    27     3K      3K  9831K       29    0     0  16,32,64,128,256
    USB device     9     4K      4K  9831K        9    0     0  16,64,128,512
        USB HC     1     1K      1K  9831K        3    0     0  64
          pipe     0     0K      1K  9831K       58    0     0  128
       memdesc     1     4K      8K  9831K       26    0     0  32,4096
           NDP     1     1K      2K  9831K        3    0     0  64,512,1024
          temp     2     8K     10K  9831K       36    0     0  16,32,64,128,1024,4096

Memory Totals:  In Use    Free    Requests
                  776K    257K       36042
Memory resource pool statistics
Name        Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
phpool        40       99    0        0     1     0     1     1     0   inf    0
pmappl        68      172    0      169     1     0     1     1     0   inf    0
vmsppl       172      172    0      169     1     0     1     1     0   inf    0
vmmpepl       60     5514    0     5494     6     0     6     6     0   inf    3
uaoeltpl      80        0    0        0     0     0     0     0     0   inf    0
aobjpl        52        2    0        2     1     0     1     1     0   inf    1
amappl        40     1473    0     1462     2     0     2     2     0   inf    0
wdcspl        96     3724    0     3723     1     0     1     1     0   inf    0
procpl       312      182    0      169     3     0     3     3     0   inf    0
vnodes       156     1310    0        0    51     0    51    51     0   inf    0
ffsino       276     1378    0      327    94     0    94    94     0   inf    7
swp buf      152        0    0        0     0     0     0     0     0   inf    0
swp vnx       20        0    0        0     0     0     0     0     0   inf    0
swp vnd      128        0    0        0     0     0     0     0     0   inf    0
sigapl       308      172    0      169     2     0     2     2     0   inf    0

In use 493K, total allocated 652K; utilization 75.6%
23:09:19.174354 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
23:09:19.175684 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
23:09:19.175839 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
23:13:35.212210 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
23:13:35.215873 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
23:13:35.216027 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
23:17:51.250138 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
23:17:51.250579 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
23:17:51.250730 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
23:22:07.288059 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
23:22:07.289390 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
23:22:07.289550 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
23:26:23.325677 arp who-has bounce.qlt.org tell rosie.qlt.org
23:26:23.325813 arp reply bounce.qlt.org is-at 0:40:f6:34:b2:4d
23:26:23.326000 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
23:26:23.327544 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
23:26:23.327693 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
23:30:39.363742 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
23:30:39.365540 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
23:30:39.365693 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
23:34:55.401655 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
23:34:55.405523 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
23:34:55.405694 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
23:39:11.439595 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
23:39:11.440038 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
23:39:11.440190 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
23:43:27.477598 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
23:43:27.481763 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
23:43:27.481915 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
23:47:43.515599 arp who-has bounce.qlt.org tell rosie.qlt.org
23:47:43.515717 arp reply bounce.qlt.org is-at 0:40:f6:34:b2:4d
23:47:43.515903 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
23:47:43.516293 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
23:47:43.516443 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable



Visit your host, monkey.org