[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
NFS mount hang
- To: tech_(_at_)_openbsd_(_dot_)_org
- Subject: NFS mount hang
- From: Quentin Barnes <qbarnes_(_at_)_urbana_(_dot_)_css_(_dot_)_mot_(_dot_)_com>
- Date: Mon, 30 Jul 2001 18:55:46 -0500
- Cc: qbarnes_(_at_)_prairienet_(_dot_)_org
Dale Rahn suggested I write up this problem and send it to this
address. He tells me that no one else seems to be having any
OpenBSD NFS problems.
I have a system that experiences an intermittent hang when mounting
an NFS file system. I have found nothing that seems to make a
difference whether or not it will hang when I attempt the mount.
The hang can happen either during normal /etc/rc processing or
after the system has fully booted.
I've been having this problem since some time after upgrading from
2.8 base starting with the April kernel around 4/28 (the proposed
2.9 baseline) continuing up until the current 2.9-stable GENERIC
build with patches around July 11. Both the NFS server and client
are running this kernel.
When the "mount -t nfs ..." hangs, I can interrupt the command, but
the mount_nfs child process stays. Any attempt to do another "mount
-t nfs", that process will not be interruptible. Also, shutting
down the system after any nfs mount problem results in a system hang
at the "syncing disks..." time. No "done" ever comes out.
The mount_nfs process hangs in "netio" while the reboot process
hangs in "inode". Dale says the mount_nfs though is blocked on
disk according to ps output.
My configuration is not that unusual. Both systems are x86 clone
systems with Intel processors. They're built from the ground up one
with an old Gateway NLX motherboard and the other from a Tyan ATX
motherboard. The only things I can think of are:
*) Client and server at different ethernet speeds.
They are connected via an ethernet switch. The server
is running at 10baseT and the client at 100baseT. I
would be running the server at 100baseT since it has a
DEC 21143, but due to independent bugs in both the 'de'
and 'dc' drivers prevent this. (Hi Aaron!)
*) Server has a slower CPU and bus than the client.
The server is a Pentium II/300MHz/66Mhz and the client
is a Pentium III/550MHz/100Mhz.
*) Client has a fast disk for the system.
I do have a very fast Maxtor disk (ATA100 capable
with 50MB/s to media) running on an ATA/66 (mode 4)
interface. This is not unusual in itself, but it might
be for the relatively slow CPU and networking speed
of the system.
I have "vfs.nfs.iothreads=4" in the sysctl.conf on the client.
Before changing this parameter, it hung in that configuration as
well.
I have a crash dump from a "syncing disks..." hang. I have included
the client's crash dump ps and vmstat output along with its dmesg.
I also have a tcpdump from the server's (bounce's/c248948-a's)
perspective after the client (rosie) has hung. You can see it
continuing the attempt the retry every 15 minutes, but getting
a "udp port xxx unreachable" error. It never comes out of this
problem. The mount either works immediately or never. (Well,
probably "never". I'm not that patient.)
Any ideas to try or information I can extract from the crash dump
to help isolate this problem?
Quentin
# ps lauxww -M bsd.1.core
UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TT TIME COMMAND USER %CPU %MEM STARTED
0 1 -541079792 0 10 20 340 0 wait Is ?? 0:00.02 (init) root 0.0 0.0 Thu06PM
0 7615 -541079792 0 2 20 220 0 netio D C0- 0:00.01 (mount_nfs) root 0.0 0.0 Thu06PM
0 6807 -541079792 1 -14 20 180 0 inode D C0- 0:00.09 (reboot) root 0.0 0.0 Thu06PM
OpenBSD 2.9-stable (GENERIC) #3: Wed Jul 11 15:36:53 CDT 2001
qbarnes_(_at_)_rosie:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: disabling processor serial number
cpu0: Intel Pentium III ("GenuineIntel" 686-class, 512KB L2 cache) 551 MHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SYS,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SIMD
real mem = 133738496 (130604K)
avail mem = 119083008 (116292K)
using 1658 buffers containing 6791168 bytes (6632K) of memory
mainbus0 (root)
bios0 at mainbus0: AT/286+(ad) BIOS, date 06/15/00, BIOS32 rev. 0 @ 0xfb290
apm0 at bios0: Power Management spec V1.2
apm0: AC on, battery charge unknown
pcibios0 at bios0: rev. 2.1 @ 0xf0000/0xb710
pcibios0: PCI IRQ Routing Table rev. 1.0 @ 0xfdb50/160 (8 entries)
pcibios0: PCI Interrupt Router at 000:07:0 ("VIA VT82C596A PCI-ISA" rev 0x00)
pcibios0: PCI Exclusive IRQs: 9 10 11 12
pcibios0: PCI bus #1 is the last bus
bios0: ROM list: 0xc0000/0x10000
pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
pchb0 at pci0 dev 0 function 0 "VIA VT82C691 Host-PCI" rev 0xc4
ppb0 at pci0 dev 1 function 0 "VIA VT82C598 PCI-AGP" rev 0x00
pci1 at ppb0 bus 1
vga1 at pci1 dev 0 function 0 "Nvidia Vanta" rev 0x15
wsdisplay0 at vga1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
pcib0 at pci0 dev 7 function 0 "VIA VT82C596A PCI-ISA" rev 0x23
pciide0 at pci0 dev 7 function 1 "VIA VT82C571 IDE" rev 0x10: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility
pciide0: Ultra/66 capable
wd0 at pciide0 channel 0 drive 0: <Maxtor 53073H4>
wd0: 16-sector PIO, LBA, 29311MB, 16383 cyl, 16 head, 63 sec, 60030432 sectors
pciide0: channel 0 interrupting at irq 14
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 4
atapiscsi0 at pciide0 channel 1
scsibus0 at atapiscsi0: 2 targets
cd0 at scsibus0 targ 0 lun 0: <Lite-On, LTN483S 48x Max, PD02> SCSI0 5/cdrom removable
pciide0: channel 1 interrupting at irq 15
cd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2
uhci0 at pci0 dev 7 function 2 "VIA VT83C572 USB" rev 0x11: irq 9
usb0 at uhci0: USB revision 1.0
uhub0 at usb0
uhub0: vendor 0x0000 UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
pchb1 at pci0 dev 7 function 3 "VIA VT82C596 Power Management" rev 0x30
unknown vendor 0x1813 product 0x4000 (class communications, subclass miscellaneous, rev 0x02) at pci0 dev 16 function 0 not configured
dc0 at pci0 dev 17 function 0 "Lite-On PNIC" rev 0x20: irq 11 address 00:c0:f0:3d:ac:37
ukphy0 at dc0 phy 1: Generic IEEE 802.3u media interface
ukphy0: OUI 0x00c0b4, model 0x0000, rev. 8
eap0 at pci0 dev 18 function 0 "Ensoniq AudioPCI97" rev 0x02: irq 12
ac97: codec id 0x54524103 (TriTech TR28023)
ac97: codec features No 3D Stereo
audio0 at eap0
isa0 at pcib0
isadma0 at isa0
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pcppi0 at isa0 port 0x61
midi0 at pcppi0: <PC speaker>
sysbeep0 at pcppi0
lpt0 at isa0 port 0x378/4 irq 7
npx0 at isa0 port 0xf0/16: using exception 16
pccom0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
pccom1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
biomask c240 netmask ca40 ttymask cac2
pctr: 686-class user-level performance counters enabled
mtrr: Pentium Pro MTRR support
dkcsum: wd0 matched BIOS disk 80
root on wd0a
rootdev=0x0 rrootdev=0x300 rawdev=0x302
WARNING: / was not properly unmounted
ukbd0 at uhub0 port 1 configuration 1 interface 0
ukbd0: ATEN CS124U V1.00, rev 1.10/1.00, addr 2, iclass 3/1
wskbd1 at ukbd0 mux 1
ums0 at uhub0 port 1 configuration 1 interface 1
ums0: ATEN CS124U V1.00, rev 1.10/1.00, addr 2, iclass 3/1
ums0: 5 buttons and Z dir.
wsmouse0 at ums0 mux 0
wskbd1: connecting to wsdisplay0
# vmstat -m -M bsd.1.core
Memory statistics by bucket size
Size In Use Free Requests HighWater Couldfree
16 113 0 399 0 2650
32 1005 0 147 0 2429
64 1408 0 320 0 5428
128 226 0 62 0 15107
256 143 0 177 0 1098
512 65 0 31 0 730
1024 136 0 12 0 5461
2048 7 0 13 0 468
4096 26 0 8 0 1326
8192 3 0 11 0 1342
16384 0 0 0 0 1
32768 1 0 0 0 1
262144 1 0 0 0 1
Memory usage type by bucket size
Size Type(s)
16 devbuf, pcb, routetbl, namecache, shm, proc, exec, VM swap, UVM amap,
UVM aobj, USB, USB device, temp
32 devbuf, pcb, routetbl, ifaddr, pgrp, vnodes, UFS mount, subproc,
LFS segment, ether_multi, VM swap, UVM amap, USB, memdesc, temp
64 devbuf, pcb, routetbl, ifaddr, session, NFS req, namecache, UFS mount,
file, lockf, LFS segment, in_multi, pfkey data, UVM amap, USB,
USB device, USB HC, NDP, temp
128 mbuf, devbuf, routetbl, zombie, ifaddr, soopts, cred, vnodes, proc,
ttys, exec, xform_data, UVM amap, USB, USB device, pipe, temp
256 devbuf, socket, pcb, routetbl, ifaddr, NFS node, vnodes, VM map,
subproc, NFS srvsock, NFS daemon, ttys, newblk, UVM amap, pool, USB
512 devbuf, pcb, ifaddr, ioctlops, mount, NFS mount, UFS mount, file desc,
proc, pfil, UVM amap, USB device, NDP
1024 devbuf, namei, ioctlops, UFS mount, NQNFS Lease, ttys, exec, pagedep,
UVM amap, UVM aobj, NDP, temp
2048 devbuf, UFS mount, MSDOSFS mount, VM swap, UVM amap
4096 mbuf, devbuf, NFS node, namecache, UFS quota, UFS mount, ISOFS mount,
inodedep, UVM amap, memdesc, temp
8192 devbuf, UFS mount, UVM amap
16384 UFS mount
32768 pfil
262144 VM page bucket
Memory statistics by type Type Kern
Type InUse MemUse HighUse Limit Requests Limit Limit Size(s)
mbuf 69 13K 14K 9831K 13140 0 0 128,4096
devbuf 309 133K 154K 9831K 538 0 0 16,32,64,128,256,512,1024,2048,4096,8192
socket 2 1K 10K 9831K 146 0 0 256
pcb 7 2K 13K 9831K 168 0 0 16,32,64,256,512
routetbl 76 10K 11K 9831K 118 0 0 16,32,64,128,256
zombie 0 0K 2K 9831K 169 0 0 128
ifaddr 36 6K 6K 9831K 41 0 0 32,64,128,256,512
soopts 0 0K 1K 9831K 79 0 0 128
namei 0 0K 5K 9831K 4954 0 0 1024
ioctlops 0 0K 1K 9831K 23 0 0 512,1024
cred 2 1K 1K 9831K 12 0 0 128
pgrp 3 1K 1K 9831K 25 0 0 32
session 2 1K 2K 9831K 22 0 0 64
mount 2 1K 3K 9831K 6 0 0 512
NFS req 1 1K 1K 9831K 1 0 0 64
NFS mount 1 1K 1K 9831K 1 0 0 512
NFS node 2 5K 5K 9831K 2 0 0 256,4096
vnodes 906 34K 36K 9831K 969 0 0 32,128,256
namecache 1315 91K 91K 9831K 1315 0 0 16,64,4096
UFS quota 1 4K 4K 9831K 1 0 0 4096
UFS mount 5 9K 44K 9831K 17 0 0 32,64,512,1024,2048,4096,8192,16384
shm 0 0K 1K 9831K 2 0 0 16
VM map 4 1K 1K 9831K 4 0 0 256
file 2 1K 6K 9831K 1489 0 0 64
file desc 13 7K 16K 9831K 182 0 0 512
lockf 0 0K 1K 9831K 14 0 0 64
proc 4 2K 2K 9831K 4 0 0 16,128,512
subproc 14 1K 3K 9831K 195 0 0 32,256
LFS segment 0 0K 1K 9831K 288 0 0 32,64
NQNFS Lease 1 1K 1K 9831K 1 0 0 1024
NFS srvsock 2 1K 1K 9831K 2 0 0 256
NFS daemon 1 1K 1K 9831K 1 0 0 256
in_multi 7 1K 1K 9831K 7 0 0 64
ether_multi 3 1K 1K 9831K 3 0 0 32
ISOFS mount 1 4K 4K 9831K 1 0 0 4096
MSDOSFS mount 1 2K 2K 9831K 1 0 0 2048
ttys 240 140K 140K 9831K 240 0 0 128,256,1024
exec 0 0K 2K 9831K 178 0 0 16,128,1024
pfil 6 27K 27K 9831K 6 0 0 512,32768
pfkey data 1 1K 1K 9831K 2 0 0 64
xform_data 1 1K 1K 9831K 1 0 0 128
pagedep 1 1K 1K 9831K 1 0 0 1024
inodedep 1 4K 4K 9831K 1 0 0 4096
newblk 1 1K 1K 9831K 1 0 0 256
VM page bucket 1 256K 256K 9831K 1 0 0 262144
VM swap 8 3K 3K 9831K 8 0 0 16,32,2048
UVM amap 37 2K 99K 9831K 11492 0 0 16,32,64,128,256,512,1024,2048,4096,8192
UVM aobj 1 1K 2K 9831K 3 0 0 16,1024
pool 3 1K 1K 9831K 3 0 0 256
USB 27 3K 3K 9831K 29 0 0 16,32,64,128,256
USB device 9 4K 4K 9831K 9 0 0 16,64,128,512
USB HC 1 1K 1K 9831K 3 0 0 64
pipe 0 0K 1K 9831K 58 0 0 128
memdesc 1 4K 8K 9831K 26 0 0 32,4096
NDP 1 1K 2K 9831K 3 0 0 64,512,1024
temp 2 8K 10K 9831K 36 0 0 16,32,64,128,1024,4096
Memory Totals: In Use Free Requests
776K 257K 36042
Memory resource pool statistics
Name Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
phpool 40 99 0 0 1 0 1 1 0 inf 0
pmappl 68 172 0 169 1 0 1 1 0 inf 0
vmsppl 172 172 0 169 1 0 1 1 0 inf 0
vmmpepl 60 5514 0 5494 6 0 6 6 0 inf 3
uaoeltpl 80 0 0 0 0 0 0 0 0 inf 0
aobjpl 52 2 0 2 1 0 1 1 0 inf 1
amappl 40 1473 0 1462 2 0 2 2 0 inf 0
wdcspl 96 3724 0 3723 1 0 1 1 0 inf 0
procpl 312 182 0 169 3 0 3 3 0 inf 0
vnodes 156 1310 0 0 51 0 51 51 0 inf 0
ffsino 276 1378 0 327 94 0 94 94 0 inf 7
swp buf 152 0 0 0 0 0 0 0 0 inf 0
swp vnx 20 0 0 0 0 0 0 0 0 inf 0
swp vnd 128 0 0 0 0 0 0 0 0 inf 0
sigapl 308 172 0 169 2 0 2 2 0 inf 0
In use 493K, total allocated 652K; utilization 75.6%
23:09:19.174354 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
23:09:19.175684 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
23:09:19.175839 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
23:13:35.212210 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
23:13:35.215873 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
23:13:35.216027 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
23:17:51.250138 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
23:17:51.250579 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
23:17:51.250730 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
23:22:07.288059 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
23:22:07.289390 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
23:22:07.289550 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
23:26:23.325677 arp who-has bounce.qlt.org tell rosie.qlt.org
23:26:23.325813 arp reply bounce.qlt.org is-at 0:40:f6:34:b2:4d
23:26:23.326000 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
23:26:23.327544 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
23:26:23.327693 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
23:30:39.363742 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
23:30:39.365540 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
23:30:39.365693 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
23:34:55.401655 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
23:34:55.405523 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
23:34:55.405694 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
23:39:11.439595 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
23:39:11.440038 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
23:39:11.440190 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
23:43:27.477598 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
23:43:27.481763 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
23:43:27.481915 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
23:47:43.515599 arp who-has bounce.qlt.org tell rosie.qlt.org
23:47:43.515717 arp reply bounce.qlt.org is-at 0:40:f6:34:b2:4d
23:47:43.515903 rosie.qlt.org.703 > c248948-a.chmpgn1.il.home.com.nfsd: xid 0xc6f016d3 96 getattr [|nfs]
23:47:43.516293 bounce.qlt.org.nfsd > rosie.qlt.org.703: xid 0xc6f016d3 reply ok 112
23:47:43.516443 rosie.qlt.org > bounce.qlt.org: icmp: rosie.qlt.org udp port 703 unreachable
Visit your host, monkey.org