Debugging System Faults
Even if you’ve used all the monitoring and debugging techniques, sometimes bugs remain in the driver, and the system faults when the driver is executed. When this happens, it’s important to be able to collect as much information as possible to solve the problem.
Note that “fault” doesn’t mean “panic.” The Linux code is robust enough to respond gracefully to most errors: a fault usually results in the destruction of the current process while the system goes on working. The system can panic, and it may do so if a fault happens outside of a process’s context or if some vital part of the system is compromised. But when the problem is due to a driver error, it usually results only in the sudden death of the process unlucky enough to be using the driver. The only unrecoverable damage when a process is destroyed is that some memory allocated to the process’s context is lost; for instance, dynamic lists allocated by the driver through kmalloc might be lost. However, since the kernel calls the close operation for any open device when a process dies, your driver can release what was allocated by the open method.
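The pairing of open and close described above can be modeled in user space. The following is only a sketch: struct model_file, model_open, and model_release are hypothetical stand-ins for the kernel's struct file and a driver's open and release methods, and calloc and free stand in for kmalloc and kfree.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct file. */
struct model_file {
	void *private_data;	/* per-open state owned by the driver */
};

/* Model of an open method: allocate the per-open state. */
int model_open(struct model_file *filp)
{
	filp->private_data = calloc(1, 64);
	return filp->private_data ? 0 : -1;
}

/*
 * Model of a release method: free exactly what open allocated.
 * Because the kernel calls close when a process dies, cleanup
 * placed here also runs after a fault kills the process.
 */
int model_release(struct model_file *filp)
{
	free(filp->private_data);
	filp->private_data = NULL;
	return 0;
}
```

Memory that is reachable only through some other path, such as a list the driver grew during write, is not covered by this pattern unless release walks and frees it as well.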
Even though an oops usually does not bring down the entire system, you may well find yourself needing to reboot after one happens. A buggy driver can leave hardware in an unusable state, leave kernel resources in an inconsistent state, or, in the worst case, corrupt kernel memory in random places. Often you can simply unload your buggy driver and try again after an oops. If, however, you see anything that suggests that the system as a whole is not well, your best bet is usually to reboot immediately.
We’ve already said that when kernel code misbehaves, an informative message is printed on the console. The next section explains how to decode and use such messages. Even though they appear rather obscure to the novice, processor dumps are full of interesting information, often sufficient to pinpoint a program bug without the need for additional testing.
Oops Messages
Most bugs show themselves as NULL pointer dereferences or as the use of other incorrect pointer values. The usual outcome of such bugs is an oops message.
Almost any address used by the processor is a virtual address and is mapped to physical addresses through a complex structure of page tables (the exceptions are physical addresses used with the memory management subsystem itself). When an invalid pointer is dereferenced, the paging mechanism fails to map the pointer to a physical address, and the processor signals a page fault to the operating system. If the address is not valid, the kernel is not able to “page in” the missing address; it (usually) generates an oops if this happens while the processor is in supervisor mode.
An oops displays the processor status at the time of the fault, including the contents of the CPU registers and other seemingly incomprehensible information. The message is generated by printk statements in the fault handler (arch/*/kernel/traps.c) and is dispatched as described earlier in Section 4.2.1.
Let’s look at one such message. Here’s what results from dereferencing a NULL pointer on a PC running Version 2.6 of the kernel. The most relevant information here is the instruction pointer (EIP), the address of the faulty instruction.
Unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip:
d083a064
Oops: 0002 [#1]
SMP
CPU: 0
EIP: 0060:[<d083a064>] Not tainted
EFLAGS: 00010246 (2.6.6)
EIP is at faulty_write+0x4/0x10 [faulty]
eax: 00000000 ebx: 00000000 ecx: 00000000 edx: 00000000
esi: cf8b2460 edi: cf8b2480 ebp: 00000005 esp: c31c5f74
ds: 007b es: 007b ss: 0068
Process bash (pid: 2086, threadinfo=c31c4000 task=cfa0a6c0)
Stack: c0150558 cf8b2460 080e9408 00000005 cf8b2480 00000000 cf8b2460 cf8b2460
fffffff7 080e9408 c31c4000 c0150682 cf8b2460 080e9408 00000005 cf8b2480
00000000 00000001 00000005 c0103f8f 00000001 080e9408 00000005 00000005
Call Trace:
[<c0150558>] vfs_write+0xb8/0x130
[<c0150682>] sys_write+0x42/0x70
[<c0103f8f>] syscall_call+0x7/0xb
Code: 89 15 00 00 00 00 c3 90 8d 74 26 00 83 ec 0c b8 00 a6 83 d0
This message was generated by writing to a device owned by the faulty module, a module built deliberately to demonstrate failures. The implementation of the write method of faulty.c is trivial:
ssize_t faulty_write(struct file *filp, const char __user *buf, size_t count, loff_t *pos)
{
	/* make a simple fault by dereferencing a NULL pointer */
	*(int *)0 = 0;
	return 0;
}
As you can see, what we do here is dereference a NULL pointer. Since 0 is never a valid pointer value, a fault occurs, which the kernel turns into the oops message shown earlier. The calling process is then killed.
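The number on the "Oops:" line of the message (0002 in the listing above) is, on x86, the page-fault error code supplied by the processor. Its three low bits say whether the page was missing or protected, whether the access was a read or a write, and whether the processor was in user or kernel mode. The decoder below is a user-space sketch; struct pf_info and decode_oops_code are names invented for this illustration, not kernel interfaces.

```c
/* Low bits of the x86 page-fault error code. */
#define PF_PROTECTION	0x1u	/* 0: page not present, 1: protection violation */
#define PF_WRITE	0x2u	/* 0: read access,      1: write access */
#define PF_USER		0x4u	/* 0: kernel mode,      1: user mode */

/* Hypothetical holder for the decoded bits. */
struct pf_info {
	int protection_violation;
	int write_access;
	int user_mode;
};

/* Split the code printed after "Oops:" into its three low bits. */
struct pf_info decode_oops_code(unsigned int code)
{
	struct pf_info info;

	info.protection_violation = (code & PF_PROTECTION) != 0;
	info.write_access = (code & PF_WRITE) != 0;
	info.user_mode = (code & PF_USER) != 0;
	return info;
}
```

Decoding 0002 this way gives a write to a page that is not present, performed in kernel mode, which matches the store through a NULL pointer in faulty_write.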
The faulty module has a different fault condition in its read implementation:
ssize_t faulty_read(struct file *filp, char __user *buf, size_t count, loff_t *pos)
{
	int ret;
	char stack_buf[4];

	/* Let's try a buffer overflow */
	memset(stack_buf, 0xff, 20);
	if (count > 4)
		count = 4; /* copy 4 bytes to the user */
	ret = copy_to_user(buf, stack_buf, count);
	if (!ret)
		return count;
	return ret;
}
This method copies a string of 0xff bytes into a local variable; unfortunately, the string (20 bytes) is longer than the destination array (4 bytes). The resulting buffer overflow overwrites the saved return address and causes an oops when the function returns. Since the return instruction brings the instruction pointer to nowhere land, this kind of fault is much harder to trace, and you can get something such as the following:
EIP: 0010:[<00000000>]
Unable to handle kernel paging request at virtual address ffffffff
printing eip:
ffffffff
Oops: 0000 [#5]
SMP
CPU: 0
EIP: 0060:[<ffffffff>] Not tainted
EFLAGS: 00010296 (2.6.6)
EIP is at 0xffffffff
eax: 0000000c ebx: ffffffff ecx: 00000000 edx: bfffda7c
esi: cf434f00 edi: ffffffff ebp: 00002000 esp: c27fff78
ds: 007b es: 007b ss: 0068
Process head (pid: 2331, threadinfo=c27fe000 task=c3226150)
Stack: ffffffff bfffda70 00002000 cf434f20 00000001 00000286 cf434f00 fffffff7
bfffda70 c27fe000 c0150612 cf434f00 bfffda70 00002000 cf434f20 00000000
00000003 00002000 c0103f8f 00000003 bfffda70 00002000 00002000 bfffda70
Call Trace:
[<c0150612>] sys_read+0x42/0x70
[<c0103f8f>] syscall_call+0x7/0xb
Code: Bad EIP value.
In this case, we see only part of the call stack (vfs_read and faulty_read are missing), and the kernel complains about a “bad EIP value.” That complaint and the offending address (ffffffff) listed at the beginning are both hints that the kernel stack has been corrupted.
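For contrast, here is a user-space sketch of the same logic with the fill bounded by the size of the array, so the saved return address is never touched. It is not the module's code: bounded_read and STACK_BUF_SIZE are hypothetical names, and memcpy stands in for copy_to_user.

```c
#include <string.h>

#define STACK_BUF_SIZE 4	/* same size as stack_buf in faulty_read */

/*
 * User-space model of a read method that cannot overflow.
 * Returns the number of bytes placed in dst.
 */
size_t bounded_read(char *dst, size_t count)
{
	char stack_buf[STACK_BUF_SIZE];

	/* Fill exactly the array; sizeof keeps the two in step. */
	memset(stack_buf, 0xff, sizeof(stack_buf));
	if (count > sizeof(stack_buf))
		count = sizeof(stack_buf);
	memcpy(dst, stack_buf, count);	/* stand-in for copy_to_user */
	return count;
}
```

Deriving both the fill length and the clamp from sizeof(stack_buf) removes the mismatch between 4 and 20 that produced the oops.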
In general, when you are confronted with an oops, the first thing to do is to look at the location where the problem happened, which is usually listed separately from the call stack. In the first oops shown above, the relevant line is:
EIP is at faulty_write+0x4/0x10 [faulty]
Here we see that we were in the function faulty_write, which is located in the faulty module (which is listed in square brackets). The hex numbers indicate that the instruction pointer was 4 bytes into the function, which appears to be 10 (hex) bytes long. Often that is enough to figure out what the problem is.
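When many oops reports have to be sorted through, the symbol+offset/size [module] notation can be split mechanically. The helper below is a user-space sketch using sscanf; struct oops_location and parse_oops_location are hypothetical names chosen for this example, and the 63-character limits are arbitrary.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the pieces of an "EIP is at" location. */
struct oops_location {
	char symbol[64];	/* function name, e.g. "faulty_write" */
	unsigned long offset;	/* bytes into the function */
	unsigned long size;	/* length of the function in bytes */
	char module[64];	/* module name; empty if none was listed */
};

/*
 * Parse text such as "faulty_write+0x4/0x10 [faulty]".
 * Returns 0 on success, -1 if the text does not have that shape.
 */
int parse_oops_location(const char *text, struct oops_location *loc)
{
	int fields;

	memset(loc, 0, sizeof(*loc));
	fields = sscanf(text, "%63[^+]+%lx/%lx [%63[^]]]",
			loc->symbol, &loc->offset, &loc->size, loc->module);
	/* The bracketed module name is optional: three fields suffice. */
	return fields >= 3 ? 0 : -1;
}
```

Given the line from the first oops, the fields come out as faulty_write, offset 0x4, size 0x10, and module faulty. A bare address such as 0xffffffff has no + sign and is rejected.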
If you need more information, the call stack shows you how you got to where things fell apart. The stack itself is printed in hex form; with a bit of work, you can often determine the values of local variables and function parameters from the stack listing. Experienced kernel developers can benefit from a certain amount of pattern recognition here; for example, if we look at the stack listing from the faulty_read oops:
Stack: ffffffff bfffda70 00002000 cf434f20 00000001 00000286 cf434f00 fffffff7
bfffda70 c27fe000 c0150612 cf434f00 bfffda70 00002000 cf434f20 00000000
00000003 00002000 c0103f8f 00000003 bfffda70 00002000 00002000 bfffda70
The ffffffff at the top of the stack is part of our string that broke things. On the x86 architecture, by default, the user-space stack starts just below 0xc0000000; thus, the recurring value 0xbfffda70 is probably a user-space stack address; it is, in fact, the address of the buffer passed to the read system call, replicated each time it is passed down the kernel call chain. On the x86 (again, by default), kernel space starts at 0xc0000000, so values above that are almost certainly kernel-space addresses, and so on.
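This kind of pattern recognition can be written down as a rough classifier for 32-bit stack words. The sketch below assumes the default x86 split at 0xc0000000 described above. The other two cutoffs are choices made for this example: words below 0x10000 are treated as plain numbers, and words in the top 4095 values are treated as small negative numbers (the range the kernel uses for error codes, which is why fffffff7 shows up in the listing).

```c
#include <stdint.h>

#define KERNEL_SPACE_START	0xc0000000u	/* default 32-bit x86 split */
#define SMALL_VALUE_LIMIT	0x00010000u	/* assumed cutoff for counts, fds, flags */
#define ERRNO_RANGE_START	0xfffff001u	/* -4095 as an unsigned 32-bit word */

enum word_guess {
	WORD_SMALL_VALUE,	/* probably a count, descriptor, or flag */
	WORD_USER_ADDRESS,	/* probably a user-space pointer */
	WORD_KERNEL_ADDRESS,	/* probably a kernel pointer or return address */
	WORD_NEGATIVE_VALUE	/* probably a small negative number or filler */
};

/* Guess what a 32-bit word from an oops stack dump represents. */
enum word_guess guess_stack_word(uint32_t word)
{
	if (word >= ERRNO_RANGE_START)
		return WORD_NEGATIVE_VALUE;
	if (word >= KERNEL_SPACE_START)
		return WORD_KERNEL_ADDRESS;
	if (word < SMALL_VALUE_LIMIT)
		return WORD_SMALL_VALUE;
	return WORD_USER_ADDRESS;
}
```

The result is only a guess: a large integer can look like an address, and a kernel built with a different memory split needs a different KERNEL_SPACE_START.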
Finally, when looking at oops listings, always be on the lookout for the “slab poisoning” values discussed at the beginning of this chapter. Thus, for example, if you get a kernel oops where the offending address is 0xa5a5a5a5, you are almost certainly forgetting to initialize dynamic memory somewhere.
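A check for the poison pattern is easy to automate when scanning saved oops output. The sketch below uses only the 0xa5 byte mentioned above; looks_poisoned and its max_offset parameter are hypothetical. The offset allowance is there because a fault on a member of a structure reached through a poisoned pointer lands slightly above the pattern itself.

```c
#include <stdint.h>

#define POISON_BYTE 0xa5u	/* poison byte named in the text */

/*
 * Return 1 if a 32-bit faulting address equals the repeated poison
 * byte, or lies at most max_offset bytes above it; otherwise 0.
 */
int looks_poisoned(uint32_t address, uint32_t max_offset)
{
	const uint32_t pattern = POISON_BYTE * 0x01010101u; /* 0xa5a5a5a5 */

	return address >= pattern && address - pattern <= max_offset;
}
```

A max_offset of 0 demands an exact match. A value on the order of the largest structure the driver allocates is a reasonable starting point, though that choice is an assumption of this example.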
Please note that you see a symbolic call stack (as shown above) only if your kernel is built with the CONFIG_KALLSYMS option turned on. Otherwise, you see a bare, hexadecimal listing, which is far less useful until you have decoded it in other ways.