[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: NFS mount hang
I'm not sure this is a factor but the NFS server seems to be responding from a different
IP address than where the request was sent.
-Costa
Quentin Barnes wrote:
> Dale Rahn suggested I write up this problem and send it to this
> address. He tells me that no one else seems to be having any
> OpenBSD NFS problems.
>
> I have a system that experiences an intermittent hang when mounting
> an NFS file system. I have found nothing that seems to make a
> difference whether or not it will hang when I attempt the mount.
> The hang can happen either during normal /etc/rc processing or
> after the system has fully booted.
>
> I've been having this problem since some time after upgrading from
> 2.8 base starting with the April kernel around 4/28 (the proposed
> 2.9 baseline) continuing up until the current 2.9-stable GENERIC
> build with patches around July 11. Both the NFS server and client
> are running this kernel.
>
> When the "mount -t nfs ..." hangs, I can interrupt the command, but
> the mount_nfs child process stays. Any attempt to do another "mount
> -t nfs", that process will not be interruptible. Also, shutting
> down the system after any nfs mount problem results in a system hang
> at the "syncing disks..." time. No "done" ever comes out.
>
> The mount_nfs process hangs in "netio" while the reboot process
> hangs in "inode". Dale says the mount_nfs though is blocked on
> disk according to ps output.
>
> My configuration is not that unusual. Both systems are x86 clone
> systems with Intel processors. They're built from the ground up one
> with an old Gateway NLX motherboard and the other from a Tyan ATX
> motherboard. The only things I can think of are:
>
> *) Client and server at different ethernet speeds.
> They are connected via an ethernet switch. The server
> is running at 10baseT and the client at 100baseT. I
> would be running the server at 100baseT since it has a
> DEC 21143, but due to independent bugs in both the 'de'
> and 'dc' drivers prevent this. (Hi Aaron!)
>
> *) Server has a slower CPU and bus than the client.
> The server is a Pentium II/300MHz/66Mhz and the client
> is a Pentium III/550MHz/100Mhz.
>
> *) Client has a fast disk for the system.
> I do have a very fast Maxtor disk (ATA100 capable
> with 50MB/s to media) running on an ATA/66 (mode 4)
> interface. This is not unusual in itself, but it might
> be for the relatively slow CPU and networking speed
> of the system.
>
> I have "vfs.nfs.iothreads=4" in the sysctl.conf on the client.
> Before changing this parameter, it hung in that configuration as
> well.
>
> I have a crash dump from a "syncing disks..." hang. I have included
> the client's crash dump ps and vmstat output along with its dmesg.
>
> I also have a tcpdump from the server's (bounce's/c248948-a's)
> perspective after the client (rosie) has hung. You can see it
> continuing the attempt the retry every 15 minutes, but getting
> a "udp port xxx unreachable" error. It never comes out of this
> problem. The mount either works immediately or never. (Well,
> probably "never". I'm not that patient.)
>
> Any ideas to try or information I can extract from the crash dump
> to help isolate this problem?
>
> Quentin
> # ps lauxww -M bsd.1.core
> UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TT TIME COMMAND USER %CPU %MEM STARTED
> 0 1 -541079792 0 10 20 340 0 wait Is ?? 0:00.02 (init) root 0.0 0.0 Thu06PM
> 0 7615 -541079792 0 2 20 220 0 netio D C0- 0:00.01 (mount_nfs) root 0.0 0.0 Thu06PM
> 0 6807 -541079792 1 -14 20 180 0 inode D C0- 0:00.09 (reboot) root 0.0 0.0 Thu06PM
> OpenBSD 2.9-stable (GENERIC) #3: Wed Jul 11 15:36:53 CDT 2001
> qbarnes@rosie:/usr/src/sys/arch/i386/compile/GENERIC
> cpu0: disabling processor serial number
> cpu0: Intel Pentium III ("GenuineIntel" 686-class, 512KB L2 cache) 551 MHz
> cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SYS,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SIMD
> real mem = 133738496 (130604K)
> avail mem = 119083008 (116292K)
> using 1658 buffers containing 6791168 bytes (6632K) of memory
> mainbus0 (root)
> bios0 at mainbus0: AT/286+(ad) BIOS, date 06/15/00, BIOS32 rev. 0 @ 0xfb290
> apm0 at bios0: Power Management spec V1.2
> apm0: AC on, battery charge unknown
> pcibios0 at bios0: rev. 2.1 @ 0xf0000/0xb710
> pcibios0: PCI IRQ Routing Table rev. 1.0 @ 0xfdb50/160 (8 entries)
> pcibios0: PCI Interrupt Router at 000:07:0 ("VIA VT82C596A PCI-ISA" rev 0x00)
> pcibios0: PCI Exclusive IRQs: 9 10 11 12
> pcibios0: PCI bus #1 is the last bus
> bios0: ROM list: 0xc0000/0x10000
> pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
> pchb0 at pci0 dev 0 function 0 "VIA VT82C691 Host-PCI" rev 0xc4
> ppb0 at pci0 dev 1 function 0 "VIA VT82C598 PCI-AGP" rev 0x00
> pci1 at ppb0 bus 1
> vga1 at pci1 dev 0 function 0 "Nvidia Vanta" rev 0x15
> wsdisplay0 at vga1: console (80x25, vt100 emulation)
> wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
> pcib0 at pci0 dev 7 function 0 "VIA VT82C596A PCI-ISA" rev 0x23
> pciide0 at pci0 dev 7 function 1 "VIA VT82C571 IDE" rev 0x10: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility
> pciide0: Ultra/66 capable
> wd0 at pciide0 channel 0 drive 0: <Maxtor 53073H4>
> wd0: 16-sector PIO, LBA, 29311MB, 16383 cyl, 16 head, 63 sec, 60030432 sectors
> pciide0: channel 0 interrupting at irq 14
> wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 4
> atapiscsi0 at pciide0 channel 1
> scsibus0 at atapiscsi0: 2 targets
> cd0 at scsibus0 targ 0 lun 0: <Lite-On, LTN483S 48x Max, PD02> SCSI0 5/cdrom removable
> pciide0: channel 1 interrupting at irq 15
> cd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2
> uhci0 at pci0 dev 7 function 2 "VIA VT83C572 USB" rev 0x11: irq 9
> usb0 at uhci0: USB revision 1.0
> uhub0 at usb0
> uhub0: vendor 0x0000 UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> uhub0: 2 ports with 2 removable, self powered
> pchb1 at pci0 dev 7 function 3 "VIA VT82C596 Power Management" rev 0x30
> unknown vendor 0x1813 product 0x4000 (class communications, subclass miscellaneous, rev 0x02) at pci0 dev 16 function 0 not configured
> dc0 at pci0 dev 17 function 0 "Lite-On PNIC" rev 0x20: irq 11 address 00:c0:f0:3d:ac:37
> ukphy0 at dc0 phy 1: Generic IEEE 802.3u media interface
> ukphy0: OUI 0x00c0b4, model 0x0000, rev. 8
> eap0 at pci0 dev 18 function 0 "Ensoniq AudioPCI97" rev 0x02: irq 12
> ac97: codec id 0x54524103 (TriTech TR28023)
> ac97: codec features No 3D Stereo
> audio0 at eap0
> isa0 at pcib0
> isadma0 at isa0
> pckbc0 at isa0 port 0x60/5
> pckbd0 at pckbc0 (kbd slot)
> pckbc0: using irq 1 for kbd slot
> wskbd0 at pckbd0: console keyboard, using wsdisplay0
> pcppi0 at isa0 port 0x61
> midi0 at pcppi0: <PC speaker>
> sysbeep0 at pcppi0
> lpt0 at isa0 port 0x378/4 irq 7
> npx0 at isa0 port 0xf0/16: using exception 16
> pccom0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
> pccom1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
> fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
> fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
> biomask c240 netmask ca40 ttymask cac2
> pctr: 686-class user-level performance counters enabled
> mtrr: Pentium Pro MTRR support
> dkcsum: wd0 matched BIOS disk 80
> root on wd0a
> rootdev=0x0 rrootdev=0x300 rawdev=0x302
> WARNING: / was not properly unmounted
> ukbd0 at uhub0 port 1 configuration 1 interface 0
> ukbd0: ATEN CS124U V1.00, rev 1.10/1.00, addr 2, iclass 3/1
> wskbd1 at ukbd0 mux 1
> ums0 at uhub0 port 1 configuration 1 interface 1
> ums0: ATEN CS124U V1.00, rev 1.10/1.00, addr 2, iclass 3/1
> ums0: 5 buttons and Z dir.
> wsmouse0 at ums0 mux 0
> wskbd1: connecting to wsdisplay0
> # vmstat -m -M bsd.1.core
> Memory statistics by bucket size
> Size In Use Free Requests HighWater Couldfree
> 16 113 0 399 0 2650
> 32 1005 0 147 0 2429
> 64 1408 0 320 0 5428
> 128 226 0 62 0 15107
> 256 143 0 177 0 1098
> 512 65 0 31 0 730
> 1024 136 0 12 0 5461
> 2048 7 0 13 0 468
> 4096 26 0 8 0 1326
> 8192 3 0 11 0 1342
> 16384 0 0 0 0 1
> 32768 1 0 0 0 1
> 262144 1 0 0 0 1
>
> Memory usage type by bucket size
> Size Type(s)
> 16 devbuf, pcb, routetbl, namecache, shm, proc, exec, VM swap, UVM amap,
> UVM aobj, USB, USB device, temp
> 32 devbuf, pcb, routetbl, ifaddr, pgrp, vnodes, UFS mount, subproc,
> LFS segment, ether_multi, VM swap, UVM amap, USB, memdesc, temp
> 64 devbuf, pcb, routetbl, ifaddr, session, NFS req, namecache, UFS mount,
> file, lockf, LFS segment, in_multi, pfkey data, UVM amap, USB,
> USB device, USB HC, NDP, temp
> 128 mbuf, devbuf, routetbl, zombie, ifaddr, soopts, cred, vnodes, proc,
> ttys, exec, xform_data, UVM amap, USB, USB device, pipe, temp
> 256 devbuf, socket, pcb, routetbl, ifaddr, NFS node, vnodes, VM map,
> subproc, NFS srvsock, NFS daemon, ttys, newblk, UVM amap, pool, USB
> 512 devbuf, pcb, ifaddr, ioctlops, mount, NFS mount, UFS mount, file desc,
> proc, pfil, UVM amap, USB device, NDP
> 1024 devbuf, namei, ioctlops, UFS mount, NQNFS Lease, ttys, exec, pagedep,
> UVM amap, UVM aobj, NDP, temp
> 2048 devbuf, UFS mount, MSDOSFS mount, VM swap, UVM amap
> 4096 mbuf, devbuf, NFS node, namecache, UFS quota, UFS mount, ISOFS mount,
> inodedep, UVM amap, memdesc, temp
> 8192 devbuf, UFS mount, UVM amap
> 16384 UFS mount
> 32768 pfil
> 262144 VM page bucket
>
> Memory statistics by type Type Kern
> Type InUse MemUse HighUse Limit Requests Limit Limit Size(s)
> mbuf 69 13K 14K 9831K 13140 0 0 128,4096
> devbuf 309 133K 154K 9831K 538 0 0 16,32,64,128,256,512,1024,2048,4096,8192
> socket 2 1K 10K 9831K 146 0 0 256
> pcb 7 2K 13K 9831K 168 0 0 16,32,64,256,512
> routetbl 76 10K 11K 9831K 118 0 0 16,32,64,128,256
> zombie 0 0K 2K 9831K 169 0 0 128
> ifaddr 36 6K 6K 9831K 41 0 0 32,64,128,256,512
> soopts 0 0K 1K 9831K 79 0 0 128
> namei 0 0K 5K 9831K 4954 0 0 1024
> ioctlops 0 0K 1K 9831K 23 0 0 512,1024
> cred 2 1K 1K 9831K 12 0 0 128
> pgrp 3 1K 1K 9831K 25 0 0 32
> session 2 1K 2K 9831K 22 0 0 64
> mount 2 1K 3K 9831K 6 0 0 512
> NFS req 1 1K 1K 9831K 1 0 0 64
> NFS mount 1 1K 1K 9831K 1 0 0 512
> NFS node 2 5K 5K 9831K 2 0 0 256,4096
> vnodes 906 34K 36K 9831K 969 0 0 32,128,256
> namecache 1315 91K 91K 9831K 1315 0 0 16,64,4096
> UFS quota 1 4K 4K 9831K 1 0 0 4096
> UFS mount 5 9K 44K 9831K 17 0 0 32,64,512,1024,2048,4096,8192,16384
> shm 0 0K 1K 9831K 2 0 0 16
> VM map 4 1K 1K 9831K 4 0 0 256
> file 2 1K 6K 9831K 1489 0 0 64
> file desc 13 7K 16K 9831K 182 0 0 512
> lockf 0 0K 1K 9831K 14 0 0 64
> proc 4 2K 2K 9831K 4 0 0 16,128,512
> subproc 14 1K 3K 9831K 195 0 0 32,256
> LFS segment 0 0K 1K 9831K 288 0 0 32,64
> NQNFS Lease 1 1K 1K 9831K 1 0 0 1024
> NFS srvsock 2 1K 1K 9831K 2 0 0 256
> NFS daemon 1 1K 1K 9831K 1 0 0 256
> in_multi 7 1K 1K 9831K 7 0 0 64
> ether_multi 3 1K 1K 9831K 3 0 0 32
> ISOFS mount 1 4K 4K 9831K 1 0 0 4096
> MSDOSFS mount 1 2K 2K 9831K 1 0 0 2048
> ttys 240 140K 140K 9831K 240 0 0 128,256,1024
> exec 0 0K 2K 9831K 178 0 0 16,128,1024
> pfil 6 27K 27K 9831K 6 0 0 512,32768
> pfkey data 1 1K 1K 9831K 2 0 0 64
> xform_data 1 1K 1K 9831K 1 0 0 128
> pagedep 1 1K 1K 9831K 1 0 0 1024
> inodedep 1 4K 4K 9831K 1 0 0 4096
> newblk 1 1K 1K 9831K 1 0 0 256
> VM page bucket 1 256K 256K 9831K 1 0 0 262144
> VM swap 8 3K 3K 9831K 8 0 0 16,32,2048
> UVM amap 37 2K 99K 9831K 11492 0 0 16,32,64,128,256,512,1024,2048,4096,8192
> UVM aobj 1 1K 2K 9831K 3 0 0 16,1024
> pool 3 1K 1K 9831K 3 0 0 256
> USB 27 3K 3K 9831K 29 0 0 16,32,64,128,256
> USB device 9 4K 4K 9831K 9 0 0 16,64,128,512
> USB HC 1 1K 1K 9831K 3 0 0 64
> pipe 0 0K 1K 9831K 58 0 0 128
> memdesc 1 4K 8K 9831K 26 0 0 32,4096
> NDP 1 1K 2K 9831K 3 0 0 64,512,1024
> temp 2 8K 10K 9831K 36 0 0 16,32,64,128,1024,4096
>
> Memory Totals: In Use Free Requests
> 776K 257K 36042
> Memory resource pool statistics
> Name Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
> phpool 40 99 0 0 1 0 1 1 0 inf 0
> pmappl 68 172 0 169 1 0 1 1 0 inf 0
> vmsppl 172 172 0 169 1 0 1 1 0 inf 0
> vmmpepl 60 5514 0 5494 6 0 6 6 0 inf 3
> uaoeltpl 80 0 0 0 0 0 0 0 0 inf 0
> aobjpl 52 2 0 2 1 0 1 1 0 inf 1
> amappl 40 1473 0 1462 2 0 2 2 0 inf 0
> wdcspl 96 3724 0 3723 1 0 1 1 0 inf 0
> procpl 312 182 0 169 3 0 3 3 0 inf 0
> vnodes 156 1310 0 0 51 0 51 51 0 inf 0
> ffsino 276 1378 0 327 94 0 94 94 0 inf 7
> swp buf 152 0 0 0 0 0 0 0 0 inf 0
> swp vnx 20 0 0 0 0 0 0 0 0 inf 0
> swp vnd 128 0 0 0 0 0 0 0 0 inf 0
> sigapl 308 172 0 169 2 0 2 2 0 inf 0
>
> In use 493K, total allocated 652K; utilization 75.6%
> 23:09:19.174354 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
> 23:09:19.175684 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
> 23:09:19.175839 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
> 23:13:35.212210 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
> 23:13:35.215873 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
> 23:13:35.216027 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
> 23:17:51.250138 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
> 23:17:51.250579 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
> 23:17:51.250730 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
> 23:22:07.288059 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
> 23:22:07.289390 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
> 23:22:07.289550 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
> 23:26:23.325677 arp who-has bounce.qlt.org tell rosie.qlt.org
> 23:26:23.325813 arp reply bounce.qlt.org is-at 0:40:f6:34:b2:4d
> 23:26:23.326000 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
> 23:26:23.327544 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
> 23:26:23.327693 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
> 23:30:39.363742 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
> 23:30:39.365540 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
> 23:30:39.365693 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
> 23:34:55.401655 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
> 23:34:55.405523 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
> 23:34:55.405694 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
> 23:39:11.439595 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
> 23:39:11.440038 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
> 23:39:11.440190 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
> 23:43:27.477598 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
> 23:43:27.481763 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
> 23:43:27.481915 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
> 23:47:43.515599 arp who-has bounce.qlt.org tell rosie.qlt.org
> 23:47:43.515717 arp reply bounce.qlt.org is-at 0:40:f6:34:b2:4d
> 23:47:43.515903 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
> 23:47:43.516293 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
> 23:47:43.516443 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable