1 What exactly is “iowait”?
'iowait' is the percentage of time the CPU is idle AND there is at least one I/O in progress.
If the CPU is idle, the kernel then determines if there is at least one I/O currently in progress to either a local disk or a remotely mounted disk (NFS) which had been initiated from that CPU. If there is, then the 'iowait' counter is incremented by one. If there is no I/O in progress that was initiated from that CPU, the 'idle' counter is incremented by one.
%iowait = (time the CPU is idle with at least one I/O in progress) / (total CPU time) × 100
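The per-tick accounting rule described above can be sketched as follows. This is only an illustrative model, not kernel code: the `Tick` record and the `account` function are hypothetical names introduced here, and each list element stands for the state of one CPU at one clock interrupt.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tick:
    """State of one CPU at a clock interrupt (hypothetical model)."""
    running: Optional[str]  # "user", "sys", or None when the CPU is idle
    io_in_progress: bool    # True if an I/O initiated from this CPU is outstanding


def account(ticks):
    """Classify every tick and return the four counters as percentages."""
    counters = Counter()
    for t in ticks:
        if t.running is not None:
            counters[t.running] += 1   # CPU busy: user or sys
        elif t.io_in_progress:
            counters["iowait"] += 1    # idle AND an I/O is outstanding
        else:
            counters["idle"] += 1      # idle with nothing outstanding
    total = sum(counters.values())
    if total == 0:
        return {k: 0.0 for k in ("user", "sys", "iowait", "idle")}
    return {k: 100.0 * counters[k] / total for k in ("user", "sys", "iowait", "idle")}


# 100 ticks: 30 busy in user mode, 60 idle with I/O pending, 10 truly idle.
sample = (
    [Tick("user", False)] * 30
    + [Tick(None, True)] * 60
    + [Tick(None, False)] * 10
)
print(account(sample))
# {'user': 30.0, 'sys': 0.0, 'iowait': 60.0, 'idle': 10.0}
```

Note that a busy tick is never counted as iowait, even if an I/O is outstanding at that moment. The examples below all follow from this.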
2 iowait is sometimes meaningful and sometimes not
2.1 Optimizing disk I/O or switching to faster disks lowers iowait
Example 1: Let's say that a program needs to perform transactions on behalf of a batch job. For each transaction, the program performs some computations, which take 10 milliseconds, and then does a synchronous write of the results to disk. Since the file it is writing to was opened synchronously, the write does not return until the I/O has made it all the way to the disk. Let's say the disk subsystem does not have a cache and that each physical write I/O takes 20ms. This means that the program completes a transaction every 30ms. Over a period of 1 second (1000ms), the program can do 33 transactions (33 tps). If this program is the only one running on a 1-CPU system, then the CPU would be busy 1/3 of the time and waiting on I/O the rest of the time - so about 67% iowait and 33% CPU busy. Now suppose the I/O subsystem is improved (let's say a disk cache is added) such that a write I/O takes only 1ms. It then takes 11ms to complete a transaction, and the program can do around 90-91 transactions a second. Here the iowait time would be around 9% (1ms out of every 11ms). Notice that in this case a lower iowait time goes hand in hand with higher throughput for the program.
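The arithmetic in Example 1 can be checked with a few lines of Python. This is a minimal sketch under the example's own assumptions (one program, one CPU, compute and I/O strictly alternating); the function name `sync_write_profile` is made up for illustration.

```python
def sync_write_profile(compute_ms, io_ms):
    """Return (tps, busy %, iowait %) for a compute-then-sync-write loop."""
    cycle_ms = compute_ms + io_ms          # one transaction = compute + write
    tps = 1000.0 / cycle_ms                # transactions per second
    busy_pct = 100.0 * compute_ms / cycle_ms
    iowait_pct = 100.0 * io_ms / cycle_ms  # CPU idle while the write is pending
    return tps, busy_pct, iowait_pct


for io_ms in (20, 1):
    tps, busy, iowait = sync_write_profile(10, io_ms)
    print(f"io={io_ms}ms  tps={tps:.1f}  busy={busy:.1f}%  iowait={iowait:.1f}%")
# io=20ms  tps=33.3  busy=33.3%  iowait=66.7%
# io=1ms  tps=90.9  busy=90.9%  iowait=9.1%
```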
2.2 High iowait does not necessarily indicate a problem
Example 2: Let's say that there is one program running on the system - let's assume that this is the 'dd' program, and it is reading from the disk 4KB at a time. Let's say that the subroutine in 'dd' is called main() and it invokes read() to do a read. Both main() and read() are user space subroutines. read() is a libc.a subroutine which will then invoke the kread() system call, at which point it enters kernel space. kread() will then initiate a physical I/O to the device, and the 'dd' program is put to sleep until the physical I/O completes. The time to execute the code in main(), read(), and kread() is very small - probably around 50 microseconds at most. The time it takes for the disk to complete the I/O request will probably be around 2-20 milliseconds depending on how far the disk arm has to seek. This means that when the clock interrupt occurs, the chances are that the 'dd' program is asleep and that the I/O is in progress. Therefore, the 'iowait' counter is incremented. If the I/O completes in 2 milliseconds, then the 'dd' program runs again to do another read. But since 50 microseconds is so small compared to 2ms (2000 microseconds), the chances are that when the clock interrupt occurs, the CPU will again be idle with an I/O in progress. So again, 'iowait' is incremented. If 'sar -P <cpunumber>' is run to show the CPU utilization for this CPU, it will most likely show 97-98% iowait. If each I/O takes 20ms, then the iowait would be 99-100%. Even though the iowait is extremely high in either case, the throughput is about 10 times better in the 2ms case than in the 20ms case.
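To make the numbers in Example 2 concrete, here is a small sketch that computes the expected iowait percentage and read rate for a loop that spends 50 microseconds on the CPU per read. It assumes the idealized model from the example (one program, one CPU, no overlap); `read_loop_profile` is a hypothetical helper name.

```python
def read_loop_profile(cpu_us, io_ms):
    """Return (iowait %, reads per second) for a read loop on one CPU."""
    io_us = io_ms * 1000.0
    cycle_us = cpu_us + io_us                 # one read = CPU time + disk time
    iowait_pct = 100.0 * io_us / cycle_us     # share of time asleep on I/O
    reads_per_sec = 1_000_000.0 / cycle_us
    return iowait_pct, reads_per_sec


fast = read_loop_profile(50, 2)    # 2 ms per physical I/O
slow = read_loop_profile(50, 20)   # 20 ms per physical I/O

print(f"2ms:  iowait={fast[0]:.1f}%  reads/s={fast[1]:.1f}")
print(f"20ms: iowait={slow[0]:.1f}%  reads/s={slow[1]:.1f}")
print(f"throughput ratio: {fast[1] / slow[1]:.1f}x")
# 2ms:  iowait=97.6%  reads/s=487.8
# 20ms: iowait=99.8%  reads/s=49.9
# throughput ratio: 9.8x
```

The two iowait figures differ by about two percentage points while the throughput differs by nearly a factor of ten, which is why iowait alone says little about how well the disk is performing.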
2.3 An iowait of 0 does not mean there is no problem
Example 3: Let's say that there are two programs running on a CPU. One is a 'dd' program reading from the disk. The other is a program that does no I/O but spends 100% of its time doing computational work. Now assume that there is a problem with the I/O subsystem and that physical I/Os are taking over a second to complete. Whenever the 'dd' program is asleep while waiting for its I/Os to complete, the other program is able to run on that CPU. When the clock interrupt occurs, there will always be a program running in either user mode or system mode. Therefore, the %idle and %iowait values will be 0. Even though iowait is 0, that does not mean there is no I/O problem: there obviously is one if physical I/Os are taking over a second to complete.
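In short: a single CPU running two programs:
1. dd: reading from the disk
2. A purely CPU-bound program, 100% utilization, no I/O
Suppose each physical I/O takes 1 second to complete. While dd is asleep, the CPU can run the other program, so %idle = 0 and %iowait = 0. That does not mean there is no disk bottleneck, because a physical I/O taking 1 second to complete is a problem in itself.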
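The effect in Example 3 can be reproduced with a toy tick sampler. This is a simplified sketch, not a model of any real scheduler: it ignores the brief moments when 'dd' itself runs, and the function name `sample_cpu` and its parameters are hypothetical.

```python
def sample_cpu(ticks, io_outstanding, cpu_hog_present):
    """Classify each clock tick on one CPU and return percentages.

    io_outstanding:  dd is asleep with a physical I/O in flight
    cpu_hog_present: a compute-only program is always runnable
    """
    counters = {"busy": 0, "iowait": 0, "idle": 0}
    for _ in range(ticks):
        if cpu_hog_present:
            counters["busy"] += 1      # something is running: never iowait
        elif io_outstanding:
            counters["iowait"] += 1    # idle with an I/O in progress
        else:
            counters["idle"] += 1
    return {k: 100.0 * v / ticks for k, v in counters.items()}


# Same slow disk in both runs; only the presence of the CPU-bound program differs.
print(sample_cpu(100, io_outstanding=True, cpu_hog_present=True))
print(sample_cpu(100, io_outstanding=True, cpu_hog_present=False))
# {'busy': 100.0, 'iowait': 0.0, 'idle': 0.0}
# {'busy': 0.0, 'iowait': 100.0, 'idle': 0.0}
```

The disk is equally slow in both runs, yet iowait reads 0% in one and 100% in the other. The CPU-bound program hides the I/O problem from the counter.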
2.4 The same workload can show double the iowait on a different system
Example 4: Let's say that there is a 4-CPU system where there are 6 programs running. Let's assume that four of the programs spend 70% of their time waiting on physical read I/Os and the other 30% actually using CPU time. Since these four programs have to enter kernel space to execute the kread() system calls, they will spend a percentage of their time in the kernel; let's assume that 25% of the time is in user mode and 5% of the time is in kernel mode. Let's also assume that the other two programs spend 100% of their time in user code doing computations and no I/O, so that two CPUs will always be 100% busy. Since the other four programs are busy only 30% of the time, they can share the two CPUs that are not busy. If we run 'sar -P ALL 1 10' to run 'sar' at 1-second intervals for 10 intervals, then we'd expect to see this for each interval:

cpu  %usr  %sys  %wio  %idle
  0    50    10    40      0
  1    50    10    40      0
  2   100     0     0      0
  3   100     0     0      0
  -    75     5    20      0

Notice that the average CPU utilization will be 75% user, 5% sys, and 20% iowait. The values one sees with 'vmstat' or 'iostat' or most tools are the average across all CPUs.

Now let's say we take this exact same workload (same 6 programs with the same behavior) to another machine that has 6 CPUs (same CPU speeds and same I/O subsystem). Now each program can be running on its own CPU. Therefore, the CPU usage breakdown would be as follows:

cpu  %usr  %sys  %wio  %idle
  0    25     5    70      0
  1    25     5    70      0
  2    25     5    70      0
  3    25     5    70      0
  4   100     0     0      0
  5   100     0     0      0
  -    50     3    47      0

So now the average CPU utilization will be 50% user, 3% sys, and 47% iowait. Notice that the same workload on another machine has more than double the iowait value.
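The averages in Example 4 can be recomputed from the per-program profiles. The sketch below assumes the example's idealized placement (I/O-bound programs sharing a CPU never contend for it, and a CPU that hosts an I/O-bound program always has an I/O outstanding when it is not busy); the names `IO_PROGRAM`, `COMPUTE_PROGRAM`, `cpu_row`, and `average` are introduced here for illustration only.

```python
IO_PROGRAM = {"usr": 25, "sys": 5, "wio": 70}        # profile when alone on a CPU
COMPUTE_PROGRAM = {"usr": 100, "sys": 0, "wio": 0}   # pure computation, no I/O


def cpu_row(programs):
    """Per-CPU sar-style row for the programs placed on one CPU."""
    usr = sum(p["usr"] for p in programs)
    sys_ = sum(p["sys"] for p in programs)
    not_busy = 100 - usr - sys_
    does_io = any(p["wio"] > 0 for p in programs)
    # Non-busy time counts as iowait only if some program on this CPU does I/O.
    return {
        "usr": usr,
        "sys": sys_,
        "wio": not_busy if does_io else 0,
        "idle": 0 if does_io else not_busy,
    }


def average(rows):
    """System-wide average across CPUs, rounded to one decimal place."""
    return {
        k: round(sum(r[k] for r in rows) / len(rows), 1)
        for k in ("usr", "sys", "wio", "idle")
    }


# 4 CPUs: two CPUs each host two I/O-bound programs, two host compute programs.
four_cpu = [cpu_row([IO_PROGRAM, IO_PROGRAM])] * 2 + [cpu_row([COMPUTE_PROGRAM])] * 2
# 6 CPUs: every program gets its own CPU.
six_cpu = [cpu_row([IO_PROGRAM])] * 4 + [cpu_row([COMPUTE_PROGRAM])] * 2

print(average(four_cpu))  # {'usr': 75.0, 'sys': 5.0, 'wio': 20.0, 'idle': 0.0}
print(average(six_cpu))   # {'usr': 50.0, 'sys': 3.3, 'wio': 46.7, 'idle': 0.0}
```

The total amount of work done is identical in both configurations. Only the denominator (the number of CPUs) and the way busy time overlaps with waiting time have changed.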
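Copyright notice: This is an original article by Adrian503, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.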