How to debug a program that crashes intermittently

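Unix/Linux systems have a very useful kind of file called a core dump.

A core dump file records the state of a process (its memory image and registers) at the moment it terminated abnormally, and it can be loaded into gdb.

Loading a core dump into gdb is equivalent to reading the state the program was in at the instant it died, so you can reconstruct the scene of the crash.

Used well, core dumps let you work out why a program dies on some runs and behaves normally on others, and track down the root cause of odd behavior.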

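1. How do you enable core dumps on Linux?

Most mainstream Linux distributions ship with core dumps disabled, so the user has to enable them manually.

Run the following command to check whether core dumps are enabled: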

# ulimit -c

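If the output is 0, core dumps are disabled. Otherwise it prints a number or unlimited, which is the maximum size allowed for a generated core dump file.

If they are disabled, enable them for the current shell session by running: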

ulimit -c unlimited
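
A quick way to confirm the setting took effect is to crash a throwaway process on purpose. The following is a minimal sketch, not part of the original recipe: it uses sleep as the victim, and whether a core file actually appears (and where) still depends on the kernel.core_pattern setting covered later.

```shell
# Report the soft and hard limits. A non-root user can raise the soft limit
# only as far as the hard limit.
echo "soft core limit: $(ulimit -S -c)"
echo "hard core limit: $(ulimit -H -c)"

# Crash a throwaway process on purpose. The subshell keeps the raised limit
# from leaking into the calling shell.
status=0
(
    ulimit -c unlimited 2>/dev/null || echo "could not raise the core limit" >&2
    sleep 30 &
    kill -s SEGV "$!"
    wait "$!"
) || status=$?

# A child killed by signal N is reported as exit status 128+N;
# SIGSEGV is signal 11 on Linux, so 139 is expected here.
echo "exit status: $status"
```

If the shell also prints "(core dumped)" next to the segmentation fault message, the kernel wrote a core file.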

2. How do you keep core dumps enabled permanently?

Edit /etc/profile and append this command at the end:

ulimit -c unlimited >/dev/null 2>&1

If the file already contains a ulimit -c call, change that call instead of adding a second one.
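
To make that edit repeatable, the line can be appended only when no ulimit -c call is present yet. This is a sketch under assumptions: ensure_core_ulimit is a made-up helper name, and it is demonstrated on a scratch file rather than on the real /etc/profile.

```shell
# Append the ulimit line to a profile-style file unless the file already
# contains a "ulimit -c" call. ensure_core_ulimit is a hypothetical helper.
ensure_core_ulimit() {
    profile=$1
    line='ulimit -c unlimited >/dev/null 2>&1'
    if grep -q 'ulimit -c' "$profile" 2>/dev/null; then
        echo "$profile already calls ulimit -c; edit that line by hand"
    else
        printf '%s\n' "$line" >> "$profile"
        echo "appended to $profile"
    fi
}

# Try it on a scratch file first (delete it afterwards with rm -f "$scratch").
scratch=$(mktemp)
ensure_core_ulimit "$scratch"
ensure_core_ulimit "$scratch"   # second call finds the line and leaves the file alone
```

Once the behavior looks right, the same function can be pointed at /etc/profile as root.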

3. Configuring how core dump files are generated

Edit /etc/sysctl.conf and add the following:

kernel.core_uses_pid = 1
kernel.core_pattern = /tmp/core-%e-%s-%u-%g-%p-%t
fs.suid_dumpable = 2

What each setting does:

kernel.core_uses_pid = 1
– Appends the PID of the crashing process to the core file name.

fs.suid_dumpable = 2
– Makes sure core dumps are also produced for setuid programs.

kernel.core_pattern = /tmp/core-%e-%s-%u-%g-%p-%t
– When the application terminates abnormally, a core file should appear in /tmp. The kernel.core_pattern sysctl controls the exact location and name of the core file. The template may contain % specifiers, which are replaced by the following values when the core file is created:
- %%: a single % character
- %p: PID of the dumped process
- %u: real UID of the dumped process
- %g: real GID of the dumped process
- %s: number of the signal that caused the dump
- %t: time of the dump (seconds since 00:00 on 1 January 1970)
- %h: hostname (same as the nodename returned by uname(2))
- %e: executable filename
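
To see what file name a template produces, the substitution rules listed above can be mimicked in a few lines of shell. This is only an illustration of the naming scheme, not the kernel's code: expand_core_pattern is a made-up helper, the sample values (UID 48, PID 1234, host web01 and so on) are invented, and dropping unlisted specifiers is this sketch's choice.

```shell
# Mimic the substitution for the specifiers listed above.
# expand_core_pattern is a hypothetical helper; the arguments after the
# template are: executable, signal, uid, gid, pid, time, hostname.
expand_core_pattern() {
    pattern=$1 exe=$2 sig=$3 uid=$4 gid=$5 pid=$6 when=$7 host=$8
    out=
    while [ -n "$pattern" ]; do
        rest=${pattern#?}              # template minus its first character
        first=${pattern%"$rest"}       # the first character itself
        pattern=$rest
        if [ "$first" != "%" ]; then
            out="${out}${first}"       # ordinary character: copy it through
            continue
        fi
        rest=${pattern#?}
        spec=${pattern%"$rest"}        # character after the % (may be empty)
        pattern=$rest
        case $spec in
            %) out="${out}%" ;;
            p) out="${out}${pid}" ;;
            u) out="${out}${uid}" ;;
            g) out="${out}${gid}" ;;
            s) out="${out}${sig}" ;;
            t) out="${out}${when}" ;;
            h) out="${out}${host}" ;;
            e) out="${out}${exe}" ;;
            *) ;;                      # unlisted specifier or trailing %: dropped here
        esac
    done
    printf '%s\n' "$out"
}

# Prints /tmp/core-httpd-11-48-48-1234-1700000000
expand_core_pattern '/tmp/core-%e-%s-%u-%g-%p-%t' httpd 11 48 48 1234 1700000000 web01
```

Reading the name back the same way tells you which program crashed, on which signal, under which user, and when.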

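To enable core dumps for all processes, daemons included, run the following (the /etc/sysconfig/init line applies to RedHat-style distributions):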

# echo "DAEMON_COREFILE_LIMIT='unlimited'" >> /etc/sysconfig/init
# sysctl -p     # reload the sysctl configuration
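
After reloading, it is worth checking that the kernel really holds the new values. A small sketch that reads them straight from /proc/sys; show_core_settings is a made-up helper name, and on systems without these files it only reports that they are unreadable.

```shell
# Print the live value of each setting from /proc/sys.
# show_core_settings is a hypothetical helper name.
show_core_settings() {
    for key in kernel/core_uses_pid kernel/core_pattern fs/suid_dumpable; do
        file=/proc/sys/$key
        if [ -r "$file" ]; then
            printf '%s = %s\n' "$key" "$(cat "$file")"
        else
            printf '%s is not readable on this system\n' "$key"
        fi
    done
}

show_core_settings
```

If kernel/core_pattern still shows a different value, something else on the system (for example a crash-reporting service) may be overwriting it.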

4. How do you enable core dumps for one specific daemon only?

Taking httpd as an example, open /etc/init.d/httpd and add the following:

On RedHat distributions:

DAEMON_COREFILE_LIMIT='unlimited'

On other distributions:

ulimit -c unlimited >/dev/null 2>&1
echo /tmp/core-%e-%s-%u-%g-%p-%t > /proc/sys/kernel/core_pattern

Save the file and exit, then restart the httpd service.
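
Whether the restarted daemon actually picked up the new limit can be checked from its entry under /proc. The sketch below assumes a Linux system with /proc/PID/limits; core_limit_of is a made-up helper, and it is demonstrated on the current shell because a running httpd cannot be assumed here.

```shell
# Print the soft "Max core file size" limit of a running process.
# core_limit_of is a hypothetical helper; it relies on /proc/PID/limits.
core_limit_of() {
    limits=/proc/$1/limits
    [ -r "$limits" ] || { echo "cannot read $limits" >&2; return 1; }
    awk '/^Max core file size/ { print $5 }' "$limits"
}

# Demonstrated on the current shell; pass the PID of a running httpd instead.
core_limit_of "$$" || echo "limits not available on this system"
```

For the daemon itself, something like core_limit_of "$(pgrep -o httpd)" should print unlimited once the change is in effect.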

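5. How do you load a core dump in gdb?

Run gdb httpd <core-dump>, where httpd is the path to the executable that produced the dump and <core-dump> is the path to the core file.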
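
For a crash that only happens now and then, it can be handy to pull a backtrace out of each core file without opening an interactive session. A minimal sketch, assuming gdb is installed; core_backtrace is a made-up helper, and the paths in the usage comment are examples only.

```shell
# Print a backtrace of every thread in a core file without entering the
# interactive gdb prompt. core_backtrace is a hypothetical helper.
core_backtrace() {
    exe=$1 core=$2
    [ -r "$exe" ]  || { echo "executable not found: $exe" >&2; return 1; }
    [ -r "$core" ] || { echo "core file not found: $core" >&2; return 1; }
    # -batch exits after running the commands given with -ex.
    gdb -batch -ex 'thread apply all bt' "$exe" "$core"
}

# Example call (paths are illustrative):
#   core_backtrace /usr/sbin/httpd /tmp/core-httpd-11-48-48-1234-1700000000
```

The backtrace is far more readable when the executable was built with debugging symbols or the distribution's debug-info package is installed.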