[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: NFS mount hang



I'm not sure this is a factor but the NFS server seems to be responding from a different
IP address than where the request was sent.

-Costa

Quentin Barnes wrote:

> Dale Rahn suggested I write up this problem and send it to this
> address.  He tells me that no one else seems to be having any
> OpenBSD NFS problems.
>
> I have a system that experiences an intermittent hang when mounting
> an NFS file system.  I have found nothing that seems to make a
> difference whether or not it will hang when I attempt the mount.
> The hang can happen either during normal /etc/rc processing or
> after the system has fully booted.
>
> I've been having this problem since some time after upgrading from
> 2.8 base starting with the April kernel around 4/28 (the proposed
> 2.9 baseline) continuing up until the current 2.9-stable GENERIC
> build with patches around July 11.  Both the NFS server and client
> are running this kernel.
>
> When the "mount -t nfs ..." hangs, I can interrupt the command, but
> the mount_nfs child process stays.  Any attempt to do another "mount
> -t nfs", that process will not be interruptible.  Also, shutting
> down the system after any nfs mount problem results in a system hang
> at the "syncing disks..." time.  No "done" ever comes out.
>
> The mount_nfs process hangs in "netio" while the reboot process
> hangs in "inode".  Dale says the mount_nfs though is blocked on
> disk according to ps output.
>
> My configuration is not that unusual.  Both systems are x86 clone
> systems with Intel processors.  They're built from the ground up one
> with an old Gateway NLX motherboard and the other from a Tyan ATX
> motherboard.  The only things I can think of are:
>
>         *) Client and server at different ethernet speeds.
>              They are connected via an ethernet switch.  The server
>              is running at 10baseT and the client at 100baseT.  I
>              would be running the server at 100baseT since it has a
>              DEC 21143, but due to independent bugs in both the 'de'
>              and 'dc' drivers prevent this.  (Hi Aaron!)
>
>         *) Server has a slower CPU and bus than the client.
>              The server is a Pentium II/300MHz/66Mhz and the client
>              is a Pentium III/550MHz/100Mhz.
>
>         *) Client has a fast disk for the system.
>              I do have a very fast Maxtor disk (ATA100 capable
>              with 50MB/s to media) running on an ATA/66 (mode 4)
>              interface.  This is not unusual in itself, but it might
>              be for the relatively slow CPU and networking speed
>              of the system.
>
> I have "vfs.nfs.iothreads=4" in the sysctl.conf on the client.
> Before changing this parameter, it hung in that configuration as
> well.
>
> I have a crash dump from a "syncing disks..." hang.  I have included
> the client's crash dump ps and vmstat output along with its dmesg.
>
> I also have a tcpdump from the server's (bounce's/c248948-a's)
> perspective after the client (rosie) has hung.  You can see it
> continuing the attempt the retry every 15 minutes, but getting
> a "udp port xxx unreachable" error.  It never comes out of this
> problem.  The mount either works immediately or never.  (Well,
> probably "never".  I'm not that patient.)
>
> Any ideas to try or information I can extract from the crash dump
> to help isolate this problem?
>
> Quentin
> # ps lauxww -M bsd.1.core
>   UID   PID  PPID CPU PRI NI   VSZ   RSS WCHAN   STAT TT       TIME COMMAND          USER     %CPU %MEM STARTED
>     0     1 -541079792   0  10 20   340     0 wait   Is    ??    0:00.02 (init)           root      0.0  0.0 Thu06PM
>     0  7615 -541079792   0   2 20   220     0 netio  D     C0-   0:00.01 (mount_nfs)      root      0.0  0.0 Thu06PM
>     0  6807 -541079792   1 -14 20   180     0 inode  D     C0-   0:00.09 (reboot)         root      0.0  0.0 Thu06PM
> OpenBSD 2.9-stable (GENERIC) #3: Wed Jul 11 15:36:53 CDT 2001
>     qbarnes@rosie:/usr/src/sys/arch/i386/compile/GENERIC
> cpu0: disabling processor serial number
> cpu0: Intel Pentium III ("GenuineIntel" 686-class, 512KB L2 cache) 551 MHz
> cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SYS,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SIMD
> real mem  = 133738496 (130604K)
> avail mem = 119083008 (116292K)
> using 1658 buffers containing 6791168 bytes (6632K) of memory
> mainbus0 (root)
> bios0 at mainbus0: AT/286+(ad) BIOS, date 06/15/00, BIOS32 rev. 0 @ 0xfb290
> apm0 at bios0: Power Management spec V1.2
> apm0: AC on, battery charge unknown
> pcibios0 at bios0: rev. 2.1 @ 0xf0000/0xb710
> pcibios0: PCI IRQ Routing Table rev. 1.0 @ 0xfdb50/160 (8 entries)
> pcibios0: PCI Interrupt Router at 000:07:0 ("VIA VT82C596A PCI-ISA" rev 0x00)
> pcibios0: PCI Exclusive IRQs: 9 10 11 12
> pcibios0: PCI bus #1 is the last bus
> bios0: ROM list: 0xc0000/0x10000
> pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
> pchb0 at pci0 dev 0 function 0 "VIA VT82C691 Host-PCI" rev 0xc4
> ppb0 at pci0 dev 1 function 0 "VIA VT82C598 PCI-AGP" rev 0x00
> pci1 at ppb0 bus 1
> vga1 at pci1 dev 0 function 0 "Nvidia Vanta" rev 0x15
> wsdisplay0 at vga1: console (80x25, vt100 emulation)
> wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
> pcib0 at pci0 dev 7 function 0 "VIA VT82C596A PCI-ISA" rev 0x23
> pciide0 at pci0 dev 7 function 1 "VIA VT82C571 IDE" rev 0x10: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility
> pciide0: Ultra/66 capable
> wd0 at pciide0 channel 0 drive 0: <Maxtor 53073H4>
> wd0: 16-sector PIO, LBA, 29311MB, 16383 cyl, 16 head, 63 sec, 60030432 sectors
> pciide0: channel 0 interrupting at irq 14
> wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 4
> atapiscsi0 at pciide0 channel 1
> scsibus0 at atapiscsi0: 2 targets
> cd0 at scsibus0 targ 0 lun 0: <Lite-On, LTN483S 48x Max, PD02> SCSI0 5/cdrom removable
> pciide0: channel 1 interrupting at irq 15
> cd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2
> uhci0 at pci0 dev 7 function 2 "VIA VT83C572 USB" rev 0x11: irq 9
> usb0 at uhci0: USB revision 1.0
> uhub0 at usb0
> uhub0: vendor 0x0000 UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> uhub0: 2 ports with 2 removable, self powered
> pchb1 at pci0 dev 7 function 3 "VIA VT82C596 Power Management" rev 0x30
> unknown vendor 0x1813 product 0x4000 (class communications, subclass miscellaneous, rev 0x02) at pci0 dev 16 function 0 not configured
> dc0 at pci0 dev 17 function 0 "Lite-On PNIC" rev 0x20: irq 11 address 00:c0:f0:3d:ac:37
> ukphy0 at dc0 phy 1: Generic IEEE 802.3u media interface
> ukphy0: OUI 0x00c0b4, model 0x0000, rev. 8
> eap0 at pci0 dev 18 function 0 "Ensoniq AudioPCI97" rev 0x02: irq 12
> ac97: codec id 0x54524103 (TriTech TR28023)
> ac97: codec features No 3D Stereo
> audio0 at eap0
> isa0 at pcib0
> isadma0 at isa0
> pckbc0 at isa0 port 0x60/5
> pckbd0 at pckbc0 (kbd slot)
> pckbc0: using irq 1 for kbd slot
> wskbd0 at pckbd0: console keyboard, using wsdisplay0
> pcppi0 at isa0 port 0x61
> midi0 at pcppi0: <PC speaker>
> sysbeep0 at pcppi0
> lpt0 at isa0 port 0x378/4 irq 7
> npx0 at isa0 port 0xf0/16: using exception 16
> pccom0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
> pccom1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
> fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
> fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
> biomask c240 netmask ca40 ttymask cac2
> pctr: 686-class user-level performance counters enabled
> mtrr: Pentium Pro MTRR support
> dkcsum: wd0 matched BIOS disk 80
> root on wd0a
> rootdev=0x0 rrootdev=0x300 rawdev=0x302
> WARNING: / was not properly unmounted
> ukbd0 at uhub0 port 1 configuration 1 interface 0
> ukbd0: ATEN CS124U V1.00, rev 1.10/1.00, addr 2, iclass 3/1
> wskbd1 at ukbd0 mux 1
> ums0 at uhub0 port 1 configuration 1 interface 1
> ums0: ATEN CS124U V1.00, rev 1.10/1.00, addr 2, iclass 3/1
> ums0: 5 buttons and Z dir.
> wsmouse0 at ums0 mux 0
> wskbd1: connecting to wsdisplay0
> # vmstat -m -M bsd.1.core
> Memory statistics by bucket size
>     Size   In Use   Free   Requests  HighWater  Couldfree
>       16      113      0        399       0       2650
>       32     1005      0        147       0       2429
>       64     1408      0        320       0       5428
>      128      226      0         62       0      15107
>      256      143      0        177       0       1098
>      512       65      0         31       0        730
>     1024      136      0         12       0       5461
>     2048        7      0         13       0        468
>     4096       26      0          8       0       1326
>     8192        3      0         11       0       1342
>    16384        0      0          0       0          1
>    32768        1      0          0       0          1
>   262144        1      0          0       0          1
>
> Memory usage type by bucket size
>     Size  Type(s)
>       16  devbuf, pcb, routetbl, namecache, shm, proc, exec, VM swap, UVM amap,
>           UVM aobj, USB, USB device, temp
>       32  devbuf, pcb, routetbl, ifaddr, pgrp, vnodes, UFS mount, subproc,
>           LFS segment, ether_multi, VM swap, UVM amap, USB, memdesc, temp
>       64  devbuf, pcb, routetbl, ifaddr, session, NFS req, namecache, UFS mount,
>           file, lockf, LFS segment, in_multi, pfkey data, UVM amap, USB,
>           USB device, USB HC, NDP, temp
>      128  mbuf, devbuf, routetbl, zombie, ifaddr, soopts, cred, vnodes, proc,
>           ttys, exec, xform_data, UVM amap, USB, USB device, pipe, temp
>      256  devbuf, socket, pcb, routetbl, ifaddr, NFS node, vnodes, VM map,
>           subproc, NFS srvsock, NFS daemon, ttys, newblk, UVM amap, pool, USB
>      512  devbuf, pcb, ifaddr, ioctlops, mount, NFS mount, UFS mount, file desc,
>           proc, pfil, UVM amap, USB device, NDP
>     1024  devbuf, namei, ioctlops, UFS mount, NQNFS Lease, ttys, exec, pagedep,
>           UVM amap, UVM aobj, NDP, temp
>     2048  devbuf, UFS mount, MSDOSFS mount, VM swap, UVM amap
>     4096  mbuf, devbuf, NFS node, namecache, UFS quota, UFS mount, ISOFS mount,
>           inodedep, UVM amap, memdesc, temp
>     8192  devbuf, UFS mount, UVM amap
>    16384  UFS mount
>    32768  pfil
>   262144  VM page bucket
>
> Memory statistics by type                           Type  Kern
>           Type InUse MemUse HighUse  Limit Requests Limit Limit Size(s)
>           mbuf    69    13K     14K  9831K    13140    0     0  128,4096
>         devbuf   309   133K    154K  9831K      538    0     0  16,32,64,128,256,512,1024,2048,4096,8192
>         socket     2     1K     10K  9831K      146    0     0  256
>            pcb     7     2K     13K  9831K      168    0     0  16,32,64,256,512
>       routetbl    76    10K     11K  9831K      118    0     0  16,32,64,128,256
>         zombie     0     0K      2K  9831K      169    0     0  128
>         ifaddr    36     6K      6K  9831K       41    0     0  32,64,128,256,512
>         soopts     0     0K      1K  9831K       79    0     0  128
>          namei     0     0K      5K  9831K     4954    0     0  1024
>       ioctlops     0     0K      1K  9831K       23    0     0  512,1024
>           cred     2     1K      1K  9831K       12    0     0  128
>           pgrp     3     1K      1K  9831K       25    0     0  32
>        session     2     1K      2K  9831K       22    0     0  64
>          mount     2     1K      3K  9831K        6    0     0  512
>        NFS req     1     1K      1K  9831K        1    0     0  64
>      NFS mount     1     1K      1K  9831K        1    0     0  512
>       NFS node     2     5K      5K  9831K        2    0     0  256,4096
>         vnodes   906    34K     36K  9831K      969    0     0  32,128,256
>      namecache  1315    91K     91K  9831K     1315    0     0  16,64,4096
>      UFS quota     1     4K      4K  9831K        1    0     0  4096
>      UFS mount     5     9K     44K  9831K       17    0     0  32,64,512,1024,2048,4096,8192,16384
>            shm     0     0K      1K  9831K        2    0     0  16
>         VM map     4     1K      1K  9831K        4    0     0  256
>           file     2     1K      6K  9831K     1489    0     0  64
>      file desc    13     7K     16K  9831K      182    0     0  512
>          lockf     0     0K      1K  9831K       14    0     0  64
>           proc     4     2K      2K  9831K        4    0     0  16,128,512
>        subproc    14     1K      3K  9831K      195    0     0  32,256
>    LFS segment     0     0K      1K  9831K      288    0     0  32,64
>    NQNFS Lease     1     1K      1K  9831K        1    0     0  1024
>    NFS srvsock     2     1K      1K  9831K        2    0     0  256
>     NFS daemon     1     1K      1K  9831K        1    0     0  256
>       in_multi     7     1K      1K  9831K        7    0     0  64
>    ether_multi     3     1K      1K  9831K        3    0     0  32
>    ISOFS mount     1     4K      4K  9831K        1    0     0  4096
>  MSDOSFS mount     1     2K      2K  9831K        1    0     0  2048
>           ttys   240   140K    140K  9831K      240    0     0  128,256,1024
>           exec     0     0K      2K  9831K      178    0     0  16,128,1024
>           pfil     6    27K     27K  9831K        6    0     0  512,32768
>     pfkey data     1     1K      1K  9831K        2    0     0  64
>     xform_data     1     1K      1K  9831K        1    0     0  128
>        pagedep     1     1K      1K  9831K        1    0     0  1024
>       inodedep     1     4K      4K  9831K        1    0     0  4096
>         newblk     1     1K      1K  9831K        1    0     0  256
> VM page bucket     1   256K    256K  9831K        1    0     0  262144
>        VM swap     8     3K      3K  9831K        8    0     0  16,32,2048
>       UVM amap    37     2K     99K  9831K    11492    0     0  16,32,64,128,256,512,1024,2048,4096,8192
>       UVM aobj     1     1K      2K  9831K        3    0     0  16,1024
>           pool     3     1K      1K  9831K        3    0     0  256
>            USB    27     3K      3K  9831K       29    0     0  16,32,64,128,256
>     USB device     9     4K      4K  9831K        9    0     0  16,64,128,512
>         USB HC     1     1K      1K  9831K        3    0     0  64
>           pipe     0     0K      1K  9831K       58    0     0  128
>        memdesc     1     4K      8K  9831K       26    0     0  32,4096
>            NDP     1     1K      2K  9831K        3    0     0  64,512,1024
>           temp     2     8K     10K  9831K       36    0     0  16,32,64,128,1024,4096
>
> Memory Totals:  In Use    Free    Requests
>                   776K    257K       36042
> Memory resource pool statistics
> Name        Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
> phpool        40       99    0        0     1     0     1     1     0   inf    0
> pmappl        68      172    0      169     1     0     1     1     0   inf    0
> vmsppl       172      172    0      169     1     0     1     1     0   inf    0
> vmmpepl       60     5514    0     5494     6     0     6     6     0   inf    3
> uaoeltpl      80        0    0        0     0     0     0     0     0   inf    0
> aobjpl        52        2    0        2     1     0     1     1     0   inf    1
> amappl        40     1473    0     1462     2     0     2     2     0   inf    0
> wdcspl        96     3724    0     3723     1     0     1     1     0   inf    0
> procpl       312      182    0      169     3     0     3     3     0   inf    0
> vnodes       156     1310    0        0    51     0    51    51     0   inf    0
> ffsino       276     1378    0      327    94     0    94    94     0   inf    7
> swp buf      152        0    0        0     0     0     0     0     0   inf    0
> swp vnx       20        0    0        0     0     0     0     0     0   inf    0
> swp vnd      128        0    0        0     0     0     0     0     0   inf    0
> sigapl       308      172    0      169     2     0     2     2     0   inf    0
>
> In use 493K, total allocated 652K; utilization 75.6%
> 23:09:19.174354 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
> 23:09:19.175684 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
> 23:09:19.175839 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
> 23:13:35.212210 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
> 23:13:35.215873 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
> 23:13:35.216027 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
> 23:17:51.250138 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
> 23:17:51.250579 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
> 23:17:51.250730 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
> 23:22:07.288059 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
> 23:22:07.289390 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
> 23:22:07.289550 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
> 23:26:23.325677 arp who-has bounce.qlt.org tell rosie.qlt.org
> 23:26:23.325813 arp reply bounce.qlt.org is-at 0:40:f6:34:b2:4d
> 23:26:23.326000 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
> 23:26:23.327544 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
> 23:26:23.327693 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
> 23:30:39.363742 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
> 23:30:39.365540 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
> 23:30:39.365693 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
> 23:34:55.401655 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
> 23:34:55.405523 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
> 23:34:55.405694 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
> 23:39:11.439595 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
> 23:39:11.440038 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
> 23:39:11.440190 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
> 23:43:27.477598 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
> 23:43:27.481763 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
> 23:43:27.481915 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
> 23:47:43.515599 arp who-has bounce.qlt.org tell rosie.qlt.org
> 23:47:43.515717 arp reply bounce.qlt.org is-at 0:40:f6:34:b2:4d
> 23:47:43.515903 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
> 23:47:43.516293 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
> 23:47:43.516443 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable