Kernel debug question
I ran into a kernel crash and am trying to debug it, but I can't get
GDB to do what I need. I have spent quite a bit of time looking for
something useful in the archives, without success.
Here is what I did:
* Create a GENERIC bsd.gdb kernel (which is the same as I run).
* Run the program that crashes the kernel
* boot dump
* savecore creates the core file in /var/crash during boot
Up to this point it is pretty simple ;) Now I am trying to figure out
why a.out crashed my kernel.
V1.00
[root@corona root]# cd /var/crash/
[root@corona crash]# gdb bsd.gdb
GNU gdb 4.16.1
Copyright 1996 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-unknown-openbsd3.0"...
(gdb) target kcore bsd.1.core
#0 0x1000 in ?? ()
(gdb) bt
#0 0x1000 in ?? ()
#1 0xe02da4af in boot (howto=2308) at /usr/src/sys/arch/i386/compile/GENERIC/../../../../arch/i386/i386/machdep.c:1610
#2 0xe0180523 in db_boot_crash_cmd ()
#3 0xe01800d0 in db_command ()
#4 0xe0180363 in db_command_loop ()
#5 0xe01837ae in db_trap ()
#6 0xe02d55d1 in kdb_trap ()
#7 0xe02e338c in trap ()
Hmmm, not what I was looking for. Let's try what I found in the
archives:
(gdb) proc curproc
(gdb) bt
#0 0x10292 in ?? ()
#1 0xe01aca84 in preempt ()
can not access 0x8, invalid address (8)
can not access 0x8, invalid address (8)
Cannot access memory at address 0x8.
Also not what I was looking for.
V2.00
Following some other hints I found in the mailing lists
[root@corona crash]# ps ax -Opaddr -N bsd.gdb -M bsd.1.core
  PID PADDR    TT STAT TIME    COMMAND
    1 170eb000 ?? Is   0:00.01 (init)
 3290 17114004 ?? Is   0:00.01 (dhclient)
13538 17114138 ?? Rs   0:00.02 (syslogd)
21023 1711426c ?? Is   0:00.00 (portmap)
31872 171144d4 ?? Is   0:00.02 (inetd)
29471 17114608 ?? Ss   0:00.01 (sendmail)
27393 17114870 ?? Is   0:00.32 (sshd)
22276 171149a4 ?? Is   0:00.00 (cron)
28780 170ebd3c C0 Ss   0:00.05 (bash)
 6659 1711473c C0 R+   0:00.00 (a.out)
30255 170ebe70 C1 Is+  0:00.01 (getty)
19453 17114ad8 C2 Is+  0:00.00 (getty)
32085 17114c0c C3 Is+  0:00.00 (getty)
 1264 17114d40 C4 Is+  0:00.00 (getty)
23324 17114e74 C5 Is+  0:00.00 (getty)
[root@corona crash]# gdb bsd.gdb
GNU gdb 4.16.1
Copyright 1996 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-unknown-openbsd3.0"...
(gdb) target kcore bsd.1.core
#0 0x1000 in ?? ()
(gdb) proc 0xf81711473c
can not access 0x1711485c, invalid address (1711485c)
can not access 0x1711485c, invalid address (1711485c)
cannot read u area ptr
OK, 0xf8 sounded weird, so let's not use it:
(gdb) proc 0x1711473c
can not access 0x1711485c, invalid address (1711485c)
can not access 0x1711485c, invalid address (1711485c)
cannot read u area ptr
quite similar! dang it!
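For what it is worth, the numbers in the two failed attempts can be checked with a little arithmetic. The sketch below only uses values copied from the session above; the variable and function names are made up for illustration. It assumes (this is an inference from the output, not something the transcript confirms) that GDB on this 32-bit target keeps only the low 32 bits of the argument, which would explain why both attempts fail at exactly the same address. It also shows that the failing read is 0x120 bytes past the pointer passed to "proc", and that the PADDR value lies far below every kernel address that appears in the backtraces.

```python
# All numeric values are copied from the GDB/ps output in this post.
# Names are hypothetical; nothing here is taken from kernel headers.

PROC_ARG_WITH_PREFIX = 0xF81711473C  # first "proc" attempt (0xf8 prefix)
PROC_ARG_PLAIN = 0x1711473C          # second attempt: PADDR of a.out from ps
FAILED_READ = 0x1711485C             # address GDB reported as invalid

# Kernel text addresses that appear in the two backtraces.
BACKTRACE_ADDRS = [
    0xE02DA4AF, 0xE0180523, 0xE01800D0, 0xE0180363,
    0xE01837AE, 0xE02D55D1, 0xE02E338C, 0xE01ACA84,
]


def truncate32(value):
    """Keep only the low 32 bits (assumed behaviour on a 32-bit target)."""
    return value & 0xFFFFFFFF


# Assumption: the 40-bit argument collapses to the plain PADDR value.
same_pointer = truncate32(PROC_ARG_WITH_PREFIX) == PROC_ARG_PLAIN

# Distance between the pointer given to "proc" and the failing read.
field_offset = FAILED_READ - PROC_ARG_PLAIN

# The PADDR value is below the lowest kernel address seen in the backtraces.
below_kernel_text = PROC_ARG_PLAIN < min(BACKTRACE_ADDRS)

print("same pointer after truncation:", same_pointer)
print("offset of failing read: 0x%x" % field_offset)
print("PADDR below backtrace addresses:", below_kernel_text)
```

If the assumption holds, the 0xf8 prefix never reached GDB at all, and in both cases GDB tried to follow a pointer stored 0x120 bytes into whatever lives at 0x1711473c. Whether that address needs a different base added before GDB can read it from the core is not something the output above settles.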
I know I am missing something trivial, but my brain is fried and I can't
figure it out. Does anyone out there know what I am doing wrong?
As a little background, my kernel crashes into ddb like this:
uvm_fault(xx,yy,0,1) -> 5
kernel: page fault trap, code=0
stopped at _fdrelease+0x17: movl 0(%eax), %edx
That is consistent with the GDB traces pasted above, but I want to see
*what* caused the trap. I already know the userland code that crashes
the kernel; what I want to get out of GDB is what gets corrupted. DDB at
least gives me a useful trace that shows the function that crashed my
kernel.
Thanks,
/marco