More on Processes and Threads



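First, why do processes exist at all? They exist to manage programs. In an operating system that does nothing but schedule programs, all programs share the same memory and get in each other's way, so a unified memory-management mechanism is needed to guarantee that every program has its own independent space to run in.

Source: the comment by user viho_he on http://www.ruanyifeng.com/blog/2013/04/processes_and_threads.html (PS: sometimes the comments are better than the post).

Setting the technical details aside and looking at it from the application's point of view: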

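  1. On a single-core computer there is one resource that multiple programs cannot use in parallel: the CPU.

    Without an operating system, one program monopolizes the whole CPU.

    If two tasks are to share the same CPU, the programmer has to plan the schedule by hand: at one moment the CPU belongs exclusively to program A, and at the next moment to program B.

    This schedule later became a core component of the OS with its own name, the "scheduler". Its only concern is how to cut the running time of a single CPU into "time slices" and hand them to different programs in turn. Because the switching is extremely fast, it creates the illusion, at the macro level, of several programs running in parallel on one CPU.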
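  2. On a single-core computer there is one resource that multiple programs can share, but sharing it causes trouble: memory.

    On an operating system that has a scheduler but no memory-management component, the programmer has to assign each program its memory by hand: program A uses physical addresses 0x00-0xff, program B uses physical addresses 0x100-0x1ff, and so on.

    This has a serious problem: every program has to agree with every other program on which part of the same memory it may use, and software and hardware vary so widely that such hand-tailored arrangements are not feasible.

    To solve this, computer systems introduced the concept of the "virtual address", tackled on three fronts:

    2.1. In hardware, the CPU gained a dedicated module called the MMU, which translates virtual addresses to physical addresses.

    2.2. In the operating system, another core component was added: memory management, which handles everything related to physical and virtual memory.

    2.3. For applications, a model called the 【process】 was invented. (Note) Every process sees 【exactly the same】 virtual address space, yet the operating system and the hardware MMU cooperate to map it onto different physical memory. Different 【processes】 each have their own physical memory, and without special means one process cannot access the physical memory of another.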
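  3. Now applications no longer need to care how physical memory is allocated underneath, nor how the CPU is shared. One problem remains: some programs want to share the CPU 【and also share the same physical memory】. This is where a model called the 【thread】 comes in. Threads are wrapped inside a process. They share the CPU under the scheduler's management, have the same virtual address space, and share the same physical address space. However, they cannot reach past the process that wraps them to access the physical address space of another process.
  4. How can processes share the same physical memory? The method differs between systems. POSIX-compliant operating systems provide an interface called mmap, which (with MAP_SHARED) can map the same physical memory into different processes so that they share it.
  5. PS: on some operating systems the process is not a schedulable unit (the scheduler does not act on it). The thread is the basic unit of scheduling, and the scheduler schedules only threads, never processes. VxWorks is one example.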

For a more detailed understanding of processes and threads, see this one:


Processes and Threads


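This is the English original. Its analogy is closer to reality and easier to follow than the one in the article cited above. There is also a translation on CSDN, but it is not good and was probably produced directly with Google Translate.

Below are the more interesting passages, each followed by a short restatement and my own notes.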



A process as a house
  • Let’s base our analogy for processes and threads using a regular, everyday object — a house.
  • A house is really a container, with certain attributes (such as the amount of floor space, the number of bedrooms, and so on).
  • If you look at it that way, the house really doesn’t actively do anything on its own — it’s a passive object. This is effectively what a process is. We’ll explore this shortly.


The occupants as threads
  • The people living in the house are the active objects — they’re the ones using the various rooms, watching TV, cooking, taking showers, and so on.


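In short:

  • The process is the house: a container with certain attributes (floor space, number of bedrooms, and so on). It does nothing on its own; it is a passive object, and that is effectively what a process is.
  • The threads are the occupants: the people living in the house are the active objects, the ones who actually use the rooms, watch TV, cook and take showers.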


Single threaded
  • If you’ve ever lived on your own, then you know what this is like — you know that you can do anything you want in the house at any time, because there’s nobody else in the house. If you want to turn on the stereo, use the washroom, have dinner — whatever — you just go ahead and do it.
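  • In short: when you are alone in the house you can do anything at any time, because nobody else is there. Turn on the stereo, use the washroom, have dinner: whatever you like.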


Multi threaded
  • Things change dramatically when you add another person into the house. Let’s say you get married, so now you have a spouse living there too. You can’t just march into the washroom at any given point; you need to check first to make sure your spouse isn’t in there!
  • If you have two responsible adults living in a house, generally you can be reasonably lax about “security” — you know that the other adult will respect your space, won’t try to set the kitchen on fire (deliberately!), and so on.


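  • In short:
  • Once another person moves in, things change dramatically. Say you get married and your spouse lives there too: you can no longer walk into the washroom whenever you like, you have to check first that your spouse is not in there.
  • With two responsible adults in the house you can be fairly relaxed about "security": the other adult will respect your space and will not deliberately set the kitchen on fire.
  • PS: with a small child you are in trouble. A child has no notion of boundaries and can cause problems at any moment.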



Back to processes and threads
  • Just as a house occupies an area of real estate, a process occupies memory. And just as a house’s occupants are free to go into any room they want, a processes’ threads all have common access to that memory. If a thread allocates something (mom goes out and buys a game), all the other threads immediately have access to it (because it’s present in the common address space — it’s in the house). Likewise, if the process allocates memory, this new memory is available to all the threads as well. The trick here is to recognize whether the memory should be available to all the threads in the process. If it is, then you’ll need to have all the threads synchronize their access to it. If it isn’t, then we’ll assume that it’s specific to a particular thread. In that case, since only that thread has access to it, we can assume that no synchronization is required — the thread isn’t going to trip itself up!
  • As we know from everyday life, things aren’t quite that simple. Now that we’ve seen the basic characteristics (summary: everything is shared), let’s take a look at where things get a little more interesting, and why.


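In short:
  • Just as a house occupies a plot of land, a process occupies memory. Just as the occupants may enter any room, all threads of a process can access that process's memory. If one thread allocates something (mom buys a game and brings it home, and the whole family can play it), every other thread can use it immediately, because it sits in the common address space: it is in the house. The same holds for memory the process allocates. The key is to decide whether a piece of memory should be available to all threads. If it should, every thread must synchronize its access to it. If not, it belongs to one particular thread, and since only that thread touches it, no synchronization is needed.
  • As we know, things are not quite that simple. Having seen the basic characteristic (summary: everything is shared), let's look at where it gets more interesting, and why.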
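A minimal sketch of that distinction, using POSIX threads. The names `shared_total`, `per_thread_counts` and `run_shared_counter_demo` are made up for illustration. Each thread owns one slot of `per_thread_counts` and updates it without locking, while `shared_total` is visible to every thread and is therefore protected by a mutex:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4
#define INCREMENTS_PER_THREAD 100000

/* Shared by all threads: every access goes through total_lock. */
static long shared_total;
static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;

/* One slot per thread: only the owning thread writes to its slot. */
static long per_thread_counts[NUM_THREADS];

static void *count_up(void *arg)
{
    long *my_count = arg;
    for (int i = 0; i < INCREMENTS_PER_THREAD; i++) {
        (*my_count)++;                      /* thread-specific: no lock needed */

        pthread_mutex_lock(&total_lock);    /* shared: synchronize the access */
        shared_total++;
        pthread_mutex_unlock(&total_lock);
    }
    return NULL;
}

/* Runs NUM_THREADS threads and returns the final shared total. */
long run_shared_counter_demo(void)
{
    pthread_t threads[NUM_THREADS];

    shared_total = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        per_thread_counts[i] = 0;
        int rc = pthread_create(&threads[i], NULL, count_up,
                                &per_thread_counts[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);

    /* All threads have been joined, so reading without the lock is safe. */
    return shared_total;
}
```

With the lock in place the total is always `NUM_THREADS * INCREMENTS_PER_THREAD`. Removing the lock around `shared_total++` would make the result unpredictable, which is exactly the "synchronize their access" point of the passage.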


The kernel’s role

In our house, we had many threads running simultaneously. However, in a real live system, there’s typically only one CPU, so only one “thing” can run at once.

In short: in our house many threads were running at the same time, but a real system typically has only one CPU, so only one thread can run at any moment.



The kernel as arbiter
  • The kernel determines which thread should be using the CPU at a particular moment, and switches context to that thread.

In short: the kernel chooses which thread should use the CPU at a given moment and switches context to that thread.

  • When the kernel decides that another thread should run, it needs to:
  1. save the currently running thread’s registers and other context information
  2. load the new thread’s registers and context into the CPU

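In short: a context switch is a save of the outgoing thread's registers and other context, followed by a load of the incoming thread's registers and context into the CPU.

PS: see the original article for how this is actually done.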
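The save-then-load step can be imitated in user space with the `ucontext` functions that glibc provides. This is only an analogy under that assumption, not how a kernel (or QNX in particular) implements a context switch, and `run_context_switch_demo` is a made-up name. `swapcontext` saves the current registers into its first argument and loads the registers stored in its second:

```c
#include <assert.h>
#include <ucontext.h>

static ucontext_t main_context;      /* saved state of the caller */
static ucontext_t worker_context;    /* saved state of the worker */
static char worker_stack[64 * 1024]; /* the worker needs its own stack */
static int steps_run;

static void worker(void)
{
    steps_run = 1;
    /* Save the worker's registers, load the caller's: a "switch out". */
    swapcontext(&worker_context, &main_context);
    steps_run = 2;
    /* Returning resumes uc_link, which points at main_context. */
}

/* Switches to the worker twice and records how far it had run each time. */
void run_context_switch_demo(int *after_first, int *after_second)
{
    steps_run = 0;

    int rc = getcontext(&worker_context);
    assert(rc == 0);
    worker_context.uc_stack.ss_sp = worker_stack;
    worker_context.uc_stack.ss_size = sizeof worker_stack;
    worker_context.uc_link = &main_context;
    makecontext(&worker_context, worker, 0);

    /* Save our registers, load the worker's: the worker runs to its swap. */
    swapcontext(&main_context, &worker_context);
    *after_first = steps_run;    /* 1: the worker stopped at its swapcontext */

    /* Switch again: the worker resumes exactly where it left off. */
    swapcontext(&main_context, &worker_context);
    *after_second = steps_run;   /* 2: the worker ran to completion */
}
```

The point to notice is that the worker continues after its own `swapcontext` call on the second switch, because its registers (including the program counter and stack pointer) were saved and later restored.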



Why processes?

Reliability, though, is perhaps the most important point. A process, just like a house, has some well-defined “borders.” A person in a house has a pretty good idea when they’re in the house, and when they’re not. A thread has a very good idea — if it’s accessing memory within the process, it can live. If it steps out of the bounds of the process’s address space, it gets killed. This means that two threads, running in different processes, are effectively isolated from each other.

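In short: reliability is perhaps the most important point. A process, like a house, has well-defined "borders". A person knows quite well whether they are inside the house or not, and a thread knows even better: as long as it accesses memory inside its process it keeps running, and if it steps outside the process's address space it gets killed. Two threads running in different processes are therefore effectively isolated from each other.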

How the process provides reliability

  • The process address space is maintained and enforced by Neutrino’s process manager module. When a process is started, the process manager allocates some memory to it and starts a thread running. The memory is marked as being owned by that process.
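  • In short: the process address space is maintained and enforced by the operating system's process manager (Neutrino's, in the original). When a process starts, the process manager allocates some memory to it, starts a thread running in it, and marks that memory as owned by the process.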
  • This means that if there are multiple threads in that process, and the kernel needs to context-switch between them, it’s a very efficient operation — we don’t have to change the address space, just which thread is running. If, however, we have to change to another thread in another process, then the process manager gets involved and causes an address space switch as well.
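  • In short: a context switch between threads of the same process is very efficient, because the address space stays the same and only the running thread changes. Switching to a thread in another process also involves the process manager and an address-space switch.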


Where a thread is a good idea
  1. Threads are great where you can parallelize operations: a number of mathematical problems spring to mind (graphics, digital signal processing, etc.). (That is: data-parallel numeric work.)
  2. Threads are also great where you want a program to perform several independent functions while sharing data, such as a web-server that’s serving multiple clients simultaneously. (That is: independent tasks working on shared data.)

The following examples illustrate how threads are used:

1. Parallel computation over independent data

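// 1. Single-threaded: loop over every line x1; each line is computed independently
int main (int argc, char **argv)
{
    int x1;

    // perform initializations (num_x_lines is assumed to be set here)

    for (x1 = 0; x1 < num_x_lines; x1++) {
        do_one_line (x1);
    }

    // display results
    return 0;
}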

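// 2. On a multi-CPU system: first attempt, one thread per line
#include <pthread.h>
#include <stdint.h>   // intptr_t

int main (int argc, char **argv)
{
    int x1;

    // perform initializations

    for (x1 = 0; x1 < num_x_lines; x1++) {
        // One thread per line; do_one_line now needs the thread-function
        // signature void *do_one_line (void *). The QNX original passes NULL
        // as the thread ID; portable POSIX code should pass a pthread_t pointer.
        pthread_create (NULL, NULL, do_one_line, (void *) (intptr_t) x1);
    }

    // display results
    return 0;
}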

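// 3. Version 2 has a serious flaw: the number of threads grows with num_x_lines,
// and every thread needs its own stack, say 8 KB even when sized carefully.
// With num_x_lines = 1280 that is 1280 × 8 KB = 10 MB of stack, and on a 4-CPU
// system only 4 threads can run at a time while the other 1276 sit waiting.
// That is a large waste, so the next version creates one thread per CPU.
#include <stdint.h>   // intptr_t

int num_lines_per_cpu;
int num_cpus;

void *do_one_batch (void *c);

// Size the number of threads to the number of CPUs
int main (int argc, char **argv)
{
    int cpu;

    // perform initializations

    // get the number of CPUs (_syspage_ptr is specific to QNX Neutrino)
    num_cpus = _syspage_ptr -> num_cpu;
    // assumes num_x_lines is a multiple of num_cpus; any remainder lines are skipped
    num_lines_per_cpu = num_x_lines / num_cpus;
    for (cpu = 0; cpu < num_cpus; cpu++) {
        pthread_create (NULL, NULL, do_one_batch, (void *) (intptr_t) cpu);
    }

    // display results
    return 0;
}

// Each thread computes one contiguous batch of lines
void * do_one_batch (void *c)
{
    int cpu = (int) (intptr_t) c;
    int x1;

    for (x1 = 0; x1 < num_lines_per_cpu; x1++) {
        do_one_line (x1 + cpu * num_lines_per_cpu);
    }
    return NULL;
}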

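// 4. Finally, building on version 3, wait for every thread to finish
#include <stdint.h>   // intptr_t
#include <stdlib.h>   // malloc, free

int num_lines_per_cpu, num_cpus;

int main (int argc, char **argv)
{
    int cpu;
    pthread_t *thread_ids;

    // perform initializations (num_cpus must be set before the malloc below,
    // for example as in version 3)
    thread_ids = malloc (sizeof (pthread_t) * num_cpus);

    num_lines_per_cpu = num_x_lines / num_cpus;
    for (cpu = 0; cpu < num_cpus; cpu++) {
        pthread_create (&thread_ids [cpu], NULL,
                        do_one_batch, (void *) (intptr_t) cpu);
    }

    // synchronize to termination of all threads
    for (cpu = 0; cpu < num_cpus; cpu++) {
        // pthread_join blocks until the given thread has terminated, so the loop
        // waits for thread 0 first. Completion order does not matter: if thread 2
        // finished earlier, the join on thread 2 returns immediately and the loop
        // moves on to the next thread.
        pthread_join (thread_ids [cpu], NULL);
    }

    // display results
    free (thread_ids);
    return 0;
}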

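// 5. Another way of waiting is a barrier: the threads wait for a signal and then all proceed together. The details are omitted here.

// Back to the house analogy: suppose the family wants to go on a trip. The driver gets into the minivan and starts the engine,
// then waits for the family. Only when every family member is aboard does the van leave: nobody gets left behind!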

Finally, the passage on thread pools:



Pools of threads

You’ll often notice in your programs that you want to be able to run a certain number of threads, but you also want to be able to control the behavior of those threads within certain limits. For example, in a server you may decide that initially just one thread should be blocked, waiting for a message from a client. When that thread gets a message and is off servicing a request, you may decide that it would be a good idea to create another thread, so that it could be blocked waiting in case another request arrived.

This second thread would then be available to handle that request. And so on. After a while, when the requests had been serviced, you would now have a large number of threads sitting around, waiting for further requests. In order to conserve resources, you may decide to kill off some of those “extra” threads.

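In short: programs often want to run a certain number of threads while keeping their behavior within limits. A server may start with a single thread blocked waiting for a client message. When that thread goes off to service a request, the server creates another thread to wait for the next request, and so on, so the number of threads keeps growing. Once the requests have been serviced, many threads are left sitting idle, and to save resources some of these "extra" threads should be shut down.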

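  • This is where a thread pool comes in. A blocked thread consumes no CPU time (only a running one does), although it still holds memory such as its stack. Keeping a bounded set of threads around and managing them is exactly what a thread pool is for.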
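A minimal sketch of the idea, in portable POSIX threads rather than the QNX thread-pool API the original article goes on to describe. It simplifies the picture above to a fixed number of workers that never grows or shrinks, and all names (`pool_t`, `submit`, `serve_requests_with_pool`) are made up for illustration. Idle workers block on a condition variable, and "servicing" a request just means squaring a number:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define POOL_SIZE 3
#define QUEUE_CAPACITY 16

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  work_available;
    int  queue[QUEUE_CAPACITY];   /* ring buffer of pending requests */
    int  head, count;
    int  shutting_down;
    long total;                   /* sum of all results */
} pool_t;

static void *worker_loop(void *arg)
{
    pool_t *pool = arg;
    for (;;) {
        pthread_mutex_lock(&pool->lock);
        /* Blocked here, a worker uses no CPU time. */
        while (pool->count == 0 && !pool->shutting_down)
            pthread_cond_wait(&pool->work_available, &pool->lock);
        if (pool->count == 0) {   /* shutting down and queue drained */
            pthread_mutex_unlock(&pool->lock);
            return NULL;
        }
        int request = pool->queue[pool->head];
        pool->head = (pool->head + 1) % QUEUE_CAPACITY;
        pool->count--;
        pthread_mutex_unlock(&pool->lock);

        long result = (long) request * request;   /* "service" the request */

        pthread_mutex_lock(&pool->lock);
        pool->total += result;
        pthread_mutex_unlock(&pool->lock);
    }
}

/* Queues one request; returns 0 on success, -1 if the queue is full. */
static int submit(pool_t *pool, int request)
{
    int rc = -1;
    pthread_mutex_lock(&pool->lock);
    if (pool->count < QUEUE_CAPACITY) {
        pool->queue[(pool->head + pool->count) % QUEUE_CAPACITY] = request;
        pool->count++;
        pthread_cond_signal(&pool->work_available);   /* wake one worker */
        rc = 0;
    }
    pthread_mutex_unlock(&pool->lock);
    return rc;
}

/* Serves all requests with POOL_SIZE workers; returns the sum of squares. */
long serve_requests_with_pool(const int *requests, int num_requests)
{
    assert(num_requests >= 0 && num_requests <= QUEUE_CAPACITY);

    pool_t pool = { .head = 0 };
    pthread_t workers[POOL_SIZE];
    pthread_mutex_init(&pool.lock, NULL);
    pthread_cond_init(&pool.work_available, NULL);

    for (int i = 0; i < POOL_SIZE; i++) {
        int rc = pthread_create(&workers[i], NULL, worker_loop, &pool);
        assert(rc == 0);
    }
    for (int i = 0; i < num_requests; i++) {
        int rc = submit(&pool, requests[i]);
        assert(rc == 0);
    }

    /* Ask the workers to exit once the queue is empty, then wait for them. */
    pthread_mutex_lock(&pool.lock);
    pool.shutting_down = 1;
    pthread_cond_broadcast(&pool.work_available);
    pthread_mutex_unlock(&pool.lock);
    for (int i = 0; i < POOL_SIZE; i++)
        pthread_join(workers[i], NULL);

    pthread_cond_destroy(&pool.work_available);
    pthread_mutex_destroy(&pool.lock);
    return pool.total;
}
```

However many requests arrive, at most `POOL_SIZE` threads (and stacks) exist, which is the resource control the passage is after. A pool that grows and shrinks between low and high limits, as the original article describes, would add bookkeeping on top of this skeleton.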



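Copyright notice: this is an original article by u010700066, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.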