Assembly and C Are Only a Step Apart: Small Talk on the C Language (19)

Author: 陈曦

Date: 2012-6-8 10:50:13

Environment: [Ubuntu 11.04, Intel-based x64, gcc 4.5.2, CodeBlocks 10.05, AT&T assembly, Intel assembly]

Please credit the source when reposting.

Q: Can you give an example?

A: The code below computes 1+2, stores the result in the variable temp, and prints it:

#include <stdio.h>
#include <string.h>

#define PRINT_D(longValue)       printf(#longValue" is %ld\n", ((long)longValue));
#define PRINT_STR(str)              printf(#str" is %s\n", (str));


static void assemble_func()
{
    int temp;
    __asm__("mov $1, %eax");
    __asm__("mov $2, %ebx");
    __asm__("add %ebx, %eax");  // 1 + 2
    __asm__("mov %%eax, %0":"=r"(temp));    // mov the value of register eax to the var "temp" 
    PRINT_D(temp)               // print temp 
}

int main()
{
    assemble_func();
    return 0;
}

Output:

temp is 3

Q: What does the assembly code of the assemble_func function look like?

A:

   0x08048404 <+0>:	push   ebp
   0x08048405 <+1>:	mov    ebp,esp
   0x08048407 <+3>:	push   ebx
   0x08048408 <+4>:	sub    esp,0x24
=> 0x0804840b <+7>:	mov    eax,0x1
   0x08048410 <+12>:	mov    ebx,0x2
   0x08048415 <+17>:	add    eax,ebx
   0x08048417 <+19>:	mov    ebx,eax
   0x08048419 <+21>:	mov    DWORD PTR [ebp-0xc],ebx
   0x0804841c <+24>:	mov    eax,0x8048510
   0x08048421 <+29>:	mov    edx,DWORD PTR [ebp-0xc]
   0x08048424 <+32>:	mov    DWORD PTR [esp+0x4],edx
   0x08048428 <+36>:	mov    DWORD PTR [esp],eax
   0x0804842b <+39>:	call   0x8048340 <printf@plt>
   0x08048430 <+44>:	add    esp,0x24
   0x08048433 <+47>:	pop    ebx
   0x08048434 <+48>:	pop    ebp
   0x08048435 <+49>:	ret    

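The listing above was obtained with the debugger's disassemble command after execution stopped at the start of assemble_func. The arrow to the left of the fifth line marks the instruction that is about to execute.

Q: The assembly above is embedded in C code. How would a standalone, pure assembly program implement hello world?

A: Fundamentally, writing only assembly requires a deeper understanding of the lower levels. From the compiler's point of view C code is not really different from assembly; only the notation and the things being called look different.

The following code writes to standard console output: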

.section .rodata
str:
.ascii "Hello,world.\n"

.section .text
.globl _main
_main:
movl  $4,    %eax    # the number of system call 
movl  $1,    %ebx    # file descriptor, 1 means stdout
movl  $str,  %ecx    # string address
movl  $13,   %edx    # string length
int   $0x80

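Save it as hello.s.

Q: How do we compile it? With gcc?

A: Yes, of course. This file clearly needs no preprocessing, and since it is already assembly it needs no compilation in the narrow sense either; the work starts at the assembling stage.

This directly produces the object file hello.o.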

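Q: What next? Can it be executed directly?

A: Let's try.

Now give hello.o execute permission and run it again:

Q: Why is that?

A: Let's look further at the properties of hello.o.

As you can see, it is not an executable yet. The reason is simple: hello.o is only an object file and has not been linked into an executable.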

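Q: And why is that? The entry symbol _start was not found. Is _start the default entry symbol for ld?

A: Yes. The code uses _main, so the linker has to be told that the entry symbol is _main; ld accepts this through its -e option.

Q: Now it should run. Let's run it:

Hello,world was printed, but why is it followed by a segmentation fault?

A: First let's see what the run above returned.

The return value is 139. What does it mean?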

Q: Searching the system's errno.h header and the files it pulls in gives all the system error codes:

The file /usr/include/asm-generic/errno-base.h:

#ifndef _ASM_GENERIC_ERRNO_BASE_H
#define _ASM_GENERIC_ERRNO_BASE_H

#define	EPERM		 1	/* Operation not permitted */
#define	ENOENT		 2	/* No such file or directory */
#define	ESRCH		 3	/* No such process */
#define	EINTR		 4	/* Interrupted system call */
#define	EIO		 5	/* I/O error */
#define	ENXIO		 6	/* No such device or address */
#define	E2BIG		 7	/* Argument list too long */
#define	ENOEXEC		 8	/* Exec format error */
#define	EBADF		 9	/* Bad file number */
#define	ECHILD		10	/* No child processes */
#define	EAGAIN		11	/* Try again */
#define	ENOMEM		12	/* Out of memory */
#define	EACCES		13	/* Permission denied */
#define	EFAULT		14	/* Bad address */
#define	ENOTBLK		15	/* Block device required */
#define	EBUSY		16	/* Device or resource busy */
#define	EEXIST		17	/* File exists */
#define	EXDEV		18	/* Cross-device link */
#define	ENODEV		19	/* No such device */
#define	ENOTDIR		20	/* Not a directory */
#define	EISDIR		21	/* Is a directory */
#define	EINVAL		22	/* Invalid argument */
#define	ENFILE		23	/* File table overflow */
#define	EMFILE		24	/* Too many open files */
#define	ENOTTY		25	/* Not a typewriter */
#define	ETXTBSY		26	/* Text file busy */
#define	EFBIG		27	/* File too large */
#define	ENOSPC		28	/* No space left on device */
#define	ESPIPE		29	/* Illegal seek */
#define	EROFS		30	/* Read-only file system */
#define	EMLINK		31	/* Too many links */
#define	EPIPE		32	/* Broken pipe */
#define	EDOM		33	/* Math argument out of domain of func */
#define	ERANGE		34	/* Math result not representable */

#endif

The file /usr/include/asm-generic/errno.h:

#ifndef _ASM_GENERIC_ERRNO_H
#define _ASM_GENERIC_ERRNO_H

#include <asm-generic/errno-base.h>

#define	EDEADLK		35	/* Resource deadlock would occur */
#define	ENAMETOOLONG	36	/* File name too long */
#define	ENOLCK		37	/* No record locks available */
#define	ENOSYS		38	/* Function not implemented */
#define	ENOTEMPTY	39	/* Directory not empty */
#define	ELOOP		40	/* Too many symbolic links encountered */
#define	EWOULDBLOCK	EAGAIN	/* Operation would block */
#define	ENOMSG		42	/* No message of desired type */
#define	EIDRM		43	/* Identifier removed */
#define	ECHRNG		44	/* Channel number out of range */
#define	EL2NSYNC	45	/* Level 2 not synchronized */
#define	EL3HLT		46	/* Level 3 halted */
#define	EL3RST		47	/* Level 3 reset */
#define	ELNRNG		48	/* Link number out of range */
#define	EUNATCH		49	/* Protocol driver not attached */
#define	ENOCSI		50	/* No CSI structure available */
#define	EL2HLT		51	/* Level 2 halted */
#define	EBADE		52	/* Invalid exchange */
#define	EBADR		53	/* Invalid request descriptor */
#define	EXFULL		54	/* Exchange full */
#define	ENOANO		55	/* No anode */
#define	EBADRQC		56	/* Invalid request code */
#define	EBADSLT		57	/* Invalid slot */

#define	EDEADLOCK	EDEADLK

#define	EBFONT		59	/* Bad font file format */
#define	ENOSTR		60	/* Device not a stream */
#define	ENODATA		61	/* No data available */
#define	ETIME		62	/* Timer expired */
#define	ENOSR		63	/* Out of streams resources */
#define	ENONET		64	/* Machine is not on the network */
#define	ENOPKG		65	/* Package not installed */
#define	EREMOTE		66	/* Object is remote */
#define	ENOLINK		67	/* Link has been severed */
#define	EADV		68	/* Advertise error */
#define	ESRMNT		69	/* Srmount error */
#define	ECOMM		70	/* Communication error on send */
#define	EPROTO		71	/* Protocol error */
#define	EMULTIHOP	72	/* Multihop attempted */
#define	EDOTDOT		73	/* RFS specific error */
#define	EBADMSG		74	/* Not a data message */
#define	EOVERFLOW	75	/* Value too large for defined data type */
#define	ENOTUNIQ	76	/* Name not unique on network */
#define	EBADFD		77	/* File descriptor in bad state */
#define	EREMCHG		78	/* Remote address changed */
#define	ELIBACC		79	/* Can not access a needed shared library */
#define	ELIBBAD		80	/* Accessing a corrupted shared library */
#define	ELIBSCN		81	/* .lib section in a.out corrupted */
#define	ELIBMAX		82	/* Attempting to link in too many shared libraries */
#define	ELIBEXEC	83	/* Cannot exec a shared library directly */
#define	EILSEQ		84	/* Illegal byte sequence */
#define	ERESTART	85	/* Interrupted system call should be restarted */
#define	ESTRPIPE	86	/* Streams pipe error */
#define	EUSERS		87	/* Too many users */
#define	ENOTSOCK	88	/* Socket operation on non-socket */
#define	EDESTADDRREQ	89	/* Destination address required */
#define	EMSGSIZE	90	/* Message too long */
#define	EPROTOTYPE	91	/* Protocol wrong type for socket */
#define	ENOPROTOOPT	92	/* Protocol not available */
#define	EPROTONOSUPPORT	93	/* Protocol not supported */
#define	ESOCKTNOSUPPORT	94	/* Socket type not supported */
#define	EOPNOTSUPP	95	/* Operation not supported on transport endpoint */
#define	EPFNOSUPPORT	96	/* Protocol family not supported */
#define	EAFNOSUPPORT	97	/* Address family not supported by protocol */
#define	EADDRINUSE	98	/* Address already in use */
#define	EADDRNOTAVAIL	99	/* Cannot assign requested address */
#define	ENETDOWN	100	/* Network is down */
#define	ENETUNREACH	101	/* Network is unreachable */
#define	ENETRESET	102	/* Network dropped connection because of reset */
#define	ECONNABORTED	103	/* Software caused connection abort */
#define	ECONNRESET	104	/* Connection reset by peer */
#define	ENOBUFS		105	/* No buffer space available */
#define	EISCONN		106	/* Transport endpoint is already connected */
#define	ENOTCONN	107	/* Transport endpoint is not connected */
#define	ESHUTDOWN	108	/* Cannot send after transport endpoint shutdown */
#define	ETOOMANYREFS	109	/* Too many references: cannot splice */
#define	ETIMEDOUT	110	/* Connection timed out */
#define	ECONNREFUSED	111	/* Connection refused */
#define	EHOSTDOWN	112	/* Host is down */
#define	EHOSTUNREACH	113	/* No route to host */
#define	EALREADY	114	/* Operation already in progress */
#define	EINPROGRESS	115	/* Operation now in progress */
#define	ESTALE		116	/* Stale NFS file handle */
#define	EUCLEAN		117	/* Structure needs cleaning */
#define	ENOTNAM		118	/* Not a XENIX named type file */
#define	ENAVAIL		119	/* No XENIX semaphores available */
#define	EISNAM		120	/* Is a named type file */
#define	EREMOTEIO	121	/* Remote I/O error */
#define	EDQUOT		122	/* Quota exceeded */

#define	ENOMEDIUM	123	/* No medium found */
#define	EMEDIUMTYPE	124	/* Wrong medium type */
#define	ECANCELED	125	/* Operation Canceled */
#define	ENOKEY		126	/* Required key not available */
#define	EKEYEXPIRED	127	/* Key has expired */
#define	EKEYREVOKED	128	/* Key has been revoked */
#define	EKEYREJECTED	129	/* Key was rejected by service */

/* for robust mutexes */
#define	EOWNERDEAD	130	/* Owner died */
#define	ENOTRECOVERABLE	131	/* State not recoverable */

#define ERFKILL		132	/* Operation not possible due to RF-kill */

#endif

139 is nowhere to be found.

A: So 139 does not appear to be an errno value. To confirm that no error code 139 is defined, we search recursively for the string 139 under /usr/include.

grep -R '139' *

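The output is long, so it is not listed here. It still contains no system error defined as 139. That is because 139 is not an errno at all: a shell reports 128 plus the signal number when a process is killed by a signal, and 139 = 128 + 11, where signal 11 is SIGSEGV, the segmentation fault seen above.

So let's look at the system log to see where the problem might be.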

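Q: The following command produced the error information:

The end of the log does show an entry for the failure of the hello program. It looks like an invalid memory access. Could the cause be that the end of the assembly code does not set up the stack registers and other registers properly?

A: That is quite possible here. To make it easier to see where the problem lies, write C code with similar functionality, get its assembly, and compare it with the assembly above.

Q: I wrote the following hello_1.c: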

#include <stdio.h>

int main()
{
    printf("Hello,world!\n");
    return 0;
}

Its assembly code:

	.file	"hello_1.c"
	.section	.rodata
.LC0:
	.string	"Hello,world!"
	.text
.globl main
	.type	main, @function
main:
	pushl	%ebp
	movl	%esp, %ebp
	andl	$-16, %esp
	subl	$16, %esp
	movl	$.LC0, (%esp)
	call	puts
	movl	$0, %eax
	leave
	ret
	.size	main, .-main
	.ident	"GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
	.section	.note.GNU-stack,"",@progbits

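Indeed, it differs from hello.s. Here ebp and esp are set up at the start of execution, and the leave and ret instructions are used at the end. Are they what makes the difference?

A: In practice, though, adding pushl %ebp and the like, or adding the leave and ret instructions, still ends in a segmentation fault. The likely reason is that _main is the raw entry point of the process here: nothing called it, so the top of the stack holds argc rather than a return address, and ret jumps to a meaningless address. Without ret, execution simply runs on past int $0x80 into whatever bytes follow. Calling the exit system call ends the program properly, and the segmentation fault goes away: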

.section .rodata
str:
.ascii "Hello,world.\n"

.section .text
.globl _main
_main:

movl  $4,    %eax    # the number of system call 
movl  $1,    %ebx    # file descriptor, 1 means stdout
movl  $str,  %ecx    # string address
movl  $13,   %edx    # string length
int   $0x80

movl  $1,    %eax
movl  $0,    %ebx
int   $0x80

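Output:

Q: When a system call is made through software interrupt 0x80, where are the arguments kept? In the registers written above?

A: Yes. On Linux, the call number and the return value are both held in eax. Up to six arguments are passed in registers, in the order ebx, ecx, edx, esi, edi, ebp. Both system calls above use ebx, ecx and edx in this way.

Q: What is system call number 4? Where can I look it up?

A: All system calls of the platform are listed in /usr/include/asm/unistd_32.h or /usr/include/asm/unistd_64.h. The beginning of unistd_32.h looks like this: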

#define __NR_restart_syscall      0
#define __NR_exit		  1
#define __NR_fork		  2
#define __NR_read		  3
#define __NR_write		  4
#define __NR_open		  5
#define __NR_close		  6
#define __NR_waitpid		  7
#define __NR_creat		  8
#define __NR_link		  9
#define __NR_unlink		 10
#define __NR_execve		 11
#define __NR_chdir		 12
#define __NR_time		 13
#define __NR_mknod		 14
#define __NR_chmod		 15
#define __NR_lchown		 16
#define __NR_break		 17

As you can see, system call 1 is exit and 4 is write, exactly the ones used in the code above.

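Q: How does assembly call a C library function?

A: Use the call instruction, but pass the arguments before the call. The following code calls the C library function printf: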

.section .rodata
str:
.asciz "Hello,world.\n"    # NUL-terminated, as printf requires

.section .text
.globl main
main:

pushl	$str
call	printf

pushl	$0
call	exit

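Save it as printf.s and compile:

Run:

Q: Can as and ld be used to assemble and link it?

A: Yes. Note, though, that because it uses the C library, the link step must name the C library: -lc.

Q: The multiply instruction mul takes only one operand. Where is the other number stored?

A: The other operand is in al, ax or eax, depending on whether the instruction is mulb, mulw or mull. The product is stored high part first: in ax for mulb, in dx:ax for mulw, and in edx:eax for mull.

Likewise, the divide instruction div takes only the divisor; the dividend is held in ax, dx:ax or edx:eax. The divisor is half the width of the dividend. Where the quotient and remainder go depends on the size of the dividend:

If the dividend is in ax, the quotient is in al and the remainder in ah; if it is in dx:ax, the quotient is in ax and the remainder in dx; if it is in edx:eax, the quotient is in eax and the remainder in edx.

Here is the test code: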

#include <stdio.h>
#include <string.h>

#define PRINT_D(longValue)       printf(#longValue" is %ld\n", ((long)longValue));
#define PRINT_STR(str)              printf(#str" is %s\n", (str));


static void assemble_func()
{
    int result_high, result_low;
    short result, remainder;

   // mul
    __asm__("mov $10, %eax");
    __asm__("mov $10, %ebx");
    __asm__("mull %ebx");
    __asm__("mov %%edx, %0":"=r"(result_high));
    __asm__("mov %%eax, %0":"=r"(result_low));
    PRINT_D(result_high)
    PRINT_D(result_low)

    // div
    __asm__("mov $0,   %dx");
    __asm__("mov $100, %ax");   // the dividend is dx:ax
    __asm__("mov $9,  %bx");
    __asm__("div %bx");         // the divisor is bx
    __asm__("movw %%ax, %0":"=r"(result));
    __asm__("movw %%dx, %0":"=r"(remainder));
    PRINT_D(result)
    PRINT_D(remainder)
}

int main()
{
    assemble_func();
    return 0;
}

Output:

result_high is 0
result_low is 100
result is 11
remainder is 1

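Q: How does the comparison instruction cmp work together with the jmp family of instructions?

A: cmp computes the difference of its two operands, discards it, and sets the flags accordingly. If the difference is 0, jz is taken; if it is not 0, jnz is taken. For example: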

#include <stdio.h>
#include <string.h>

#define PRINT_D(longValue)      printf(#longValue" is %ld\n", ((long)longValue));
#define PRINT_STR(str)          printf(#str" is %s\n", (str));
#define PRINT(str)              printf(#str"\n");


static void assemble_func()
{
    __asm__("mov $10, %eax");
    __asm__("cmp $10, %eax ");
    __asm__("jz  end");
    PRINT("below jz")
    __asm__("end:");
    PRINT("the end")

}

int main()
{
    assemble_func();
    return 0;
}

Clearly jz is taken, and the output is:

"the end"

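Q: Sometimes an addition may overflow. How can that be detected?

A: The CPU has a flags register (EFLAGS) that holds the overflow flag OF, which can be tested with jo or jno.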

#include <stdio.h>
#include <string.h>

#define PRINT_D(longValue)      printf(#longValue" is %ld\n", ((long)longValue));
#define PRINT_STR(str)          printf(#str" is %s\n", (str));
#define PRINT(str)              printf(#str"\n");


static void assemble_func()
{
    __asm__("movw   $0x7FFF,  %ax");
    __asm__("movw   $0x7FFF,  %bx");
    __asm__("addw   %bx,      %ax");

    __asm__("jo     overflow_set");

    __asm__("movl   $1,       %eax");
    __asm__("movl   $0,       %ebx");
    __asm__("int    $0x80");

    __asm__("overflow_set:");
    PRINT("overflow flag is set...")
}

int main()
{
    assemble_func();
    return 0;
}

Output:

"overflow flag is set..."

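Q: So how exactly is overflow determined?

A: Taking addition as an example: if two numbers with the same sign are added and the result has the opposite sign, an overflow has occurred.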

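Q: What is the difference between the OF and CF flags?

A: CF is the carry flag. A carry is not necessarily an overflow. For example, adding 1 to -1 (0xFFFF as a 16-bit value) produces a carry but no overflow: two's complement arithmetic allows the carry out while the signed result is still correct.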

#include <stdio.h>
#include <string.h>

#define PRINT_D(longValue)      printf(#longValue" is %ld\n", ((long)longValue));
#define PRINT_STR(str)          printf(#str" is %s\n", (str));
#define PRINT(str)              printf(#str"\n");


static void assemble_func()
{
    __asm__("movw   $0xFFFF,  %ax");
    __asm__("movw   $0x1,  %bx");
    __asm__("addw   %bx,      %ax");

    __asm__("jc     carry_set");   // jc tests CF; je would test ZF instead

    __asm__("movl   $1,       %eax");
    __asm__("movl   $0,       %ebx");
    __asm__("int    $0x80");

    __asm__("carry_set:");
    PRINT("carry flag is set...")
}

int main()
{
    assemble_func();
    return 0;
}

Output:

"carry flag is set..."

Of course, we can use jo to test whether the addition above overflows.

#include <stdio.h>
#include <string.h>

#define PRINT_D(longValue)      printf(#longValue" is %ld\n", ((long)longValue));
#define PRINT_STR(str)          printf(#str" is %s\n", (str));
#define PRINT(str)              printf(#str"\n");


static void assemble_func()
{
    __asm__("movw   $0xFFFF,  %ax");
    __asm__("movw   $0x1,  %bx");
    __asm__("addw   %bx,      %ax");

    __asm__("jo     overflow_set");

    __asm__("movl   $1,       %eax");
    __asm__("movl   $0,       %ebx");
    __asm__("int    $0x80");

    __asm__("overflow_set:");
    PRINT("overflow flag is set...")
}

int main()
{
    assemble_func();
    return 0;
}

Output:

It prints nothing, which means OF was not set.

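Copyright notice: this is an original article by cxsjabcabc, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.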