1. References
2. Practice
2.1 Kernel configuration
2.2 Locating the kernel crash
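1. References

Over the past few days I ran into a truly baffling kernel error that I could not track down just by staring at the code. Since I did not know how to read the kernel dump log, I am writing down some brief notes here.

Reference article: "定位内核模块crash的方法" (Methods for locating a kernel module crash)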
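2. Practice

2.1 Kernel configuration

First, the kernel must be built so that it can be debugged. For the relevant configuration, see my earlier post "用VSCode + QEMU跑起来能够可视化Debug的NOVA文件系统" (running the NOVA file system under VSCode + QEMU with visual debugging). The part that matters here is the kernel debug option configuration below: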
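# Next, configure the kernel debug options directly from the command line
# In the command below, -e means enable and -d means disable
./scripts/config -e DEBUG_INFO -e GDB_SCRIPTS -e CONFIG_DEBUG_SECTION_MISMATCH -e CONFIG_FRAME_POINTER -d CONFIG_RANDOMIZE_BASE

2.2 Locating the kernel crash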
When the kernel crashes, you will see stack output similar to the following:
[ 370.075682] hunter: hk_readdir: ino 0, size 0, pos 0x0
[ 370.076316] BUG: unable to handle kernel NULL pointer dereference at 000000000000000a
[ 370.076557] #PF error: [normal kernel read fault]
[ 370.076748] PGD 2365bd067 P4D 2365bd067 PUD 23750b067 PMD 0
[ 370.077071] Oops: 0000 [#1] SMP NOPTI
[ 370.077350] CPU: 0 PID: 1126 Comm: ls Not tainted 5.1.0-nova+ #94
[ 370.077537] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014
[ 370.077933] RIP: 0010:hk_readdir+0x203/0x30c
[ 370.078217] Code: 49 8b 04 24 49 8b 4c 24 08 41 83 e1 0f 4c 89 e7 e8 d1 93 ad 00 85 c0 75 40 48 8b 1b 48 85 db 74 52 48 85 db 74 49
[ 370.078766] RSP: 0018:ffffc90000d4be30 EFLAGS: 00000282
[ 370.078941] RAX: 000000000000001f RBX: ffff888237241020 RCX: 0000000000000000
[ 370.079179] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000001d32688
[ 370.079397] RBP: ffffc90000d4be80 R08: 0000000000000001 R09: 0000000000000008
[ 370.079599] R10: 206f6e69203a7269 R11: 20657a6973202c30 R12: ffffc90000d4bed0
[ 370.079800] R13: ffff8882366f4800 R14: 000000000000001f R15: ffff88823733c280
[ 370.080046] FS: 0000000001d313c0(0000) GS:ffff888238a00000(0000) knlGS:0000000000000000
[ 370.080269] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 370.080431] CR2: 000000000000000a CR3: 00000002367d0000 CR4: 00000000000006f0
[ 370.080674] Call Trace:
[ 370.081378] iterate_dir+0x8c/0x190
[ 370.081514] ksys_getdents64+0x97/0x130
[ 370.081625] ? iterate_dir+0x190/0x190
[ 370.081733] __x64_sys_getdents64+0x11/0x20
[ 370.081869] do_syscall_64+0x43/0xf0
[ 370.081974] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 370.082251] RIP: 0033:0x4e760b
[ 370.082354] Code: 04 48 81 ec 80 00 00 00 e8 b2 e4 f5 ff 48 81 c4 80 00 00 00 5b c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 d9 08
[ 370.082854] RSP: 002b:00007ffd0daf4888 EFLAGS: 00000246 ORIG_RAX: 00000000000000d9
[ 370.083074] RAX: ffffffffffffffda RBX: 0000000001d32610 RCX: 00000000004e760b
[ 370.083270] RDX: 0000000000008000 RSI: 0000000001d32640 RDI: 0000000000000003
[ 370.083460] RBP: 0000000001d32640 R08: 0000000000000003 R09: 0000000000888940
[ 370.083651] R10: 0000000000000000 R11: 0000000000000246 R12: ffffffffffffffe0
[ 370.083844] R13: 0000000000000000 R14: 0000000000400628 R15: 0000000000000001
[ 370.084084] Modules linked in:
[ 370.084278] CR2: 000000000000000a
...
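As a side note on reading the dump: since the `#PF error` line shows this is a page fault, the four hex digits after `Oops:` are the x86 page-fault error code, and its low bits say what kind of access failed. The following is only a minimal sketch for decoding it; the helper name `decode_oops_code` and the wording of the descriptions are my own, and it assumes the oops really is a page fault (for other traps the code means something else).

```python
import re

# x86 page-fault error code bits: (mask, meaning if set, meaning if clear).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]

def decode_oops_code(line):
    """Return human-readable facts for an 'Oops: XXXX' line (hypothetical helper)."""
    match = re.search(r"Oops: ([0-9a-fA-F]{4})", line)
    if match is None:
        raise ValueError("no 'Oops: XXXX' field found")
    code = int(match.group(1), 16)
    facts = []
    for mask, if_set, if_clear in PF_BITS:
        text = if_set if code & mask else if_clear
        if text is not None:
            facts.append(text)
    return facts

print(decode_oops_code("[ 370.077071] Oops: 0000 [#1] SMP NOPTI"))
# prints ['page not present', 'read access', 'kernel mode']
```

For the dump above this agrees with the `[normal kernel read fault]` line: kernel code tried to read from an unmapped address, and `CR2` shows that address was `0xa`, i.e. a small offset from a NULL pointer.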
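In this dump, the line to focus on is RIP: 0010:hk_readdir+0x203/0x30c. It gives the crash location: offset 0x203 inside the function hk_readdir (0x30c is the total size of the function). The next task is to find the source code that corresponds to it: run gdb on your Linux kernel binary (vmlinux) and set a breakpoint at that address.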
gdb vmlinux
b *hk_readdir+0x203
You should see output similar to the following:
Breakpoint 1 at 0xffffffff81327b24: file fs/hunter/dir.c, line 56.
Open the reported file at that line (here fs/hunter/dir.c, line 56) and you have found the code that crashed.
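If you do this often, the lookup above can be scripted. The sketch below is only an illustration under my own assumptions: the function name `gdb_command_from_oops` is made up, the kernel binary is assumed to be a file named `vmlinux` with debug info, and it uses gdb's `list *symbol+offset` form rather than a breakpoint, run non-interactively through the `-batch` and `-ex` options.

```python
import re

# Matches a symbolic RIP field such as "RIP: 0010:hk_readdir+0x203/0x30c".
# The user-space "RIP: 0033:0x4e760b" line has no "+offset/size" and is skipped.
RIP_RE = re.compile(r"RIP: [0-9a-f]{4}:([\w.]+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+)")

def gdb_command_from_oops(log_text, vmlinux="vmlinux"):
    """Build a gdb command line that lists the source at the faulting RIP."""
    match = RIP_RE.search(log_text)
    if match is None:
        raise ValueError("no symbolic RIP line found in log")
    symbol, offset, size = match.groups()
    # Sanity check: the offset must lie inside the function.
    if int(offset, 16) >= int(size, 16):
        raise ValueError("offset lies outside the function")
    return f"gdb -batch -ex 'list *{symbol}+{offset}' {vmlinux}"

log = """[ 370.077933] RIP: 0010:hk_readdir+0x203/0x30c
[ 370.082251] RIP: 0033:0x4e760b"""
print(gdb_command_from_oops(log))
# prints gdb -batch -ex 'list *hk_readdir+0x203' vmlinux
```

Running the printed command in the kernel build directory should show the source lines around the crash site, provided the kernel was built with DEBUG_INFO as configured in section 2.1.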



