Problems with sharing data between threads
Every problem with sharing data between threads comes from modifying that data. If all shared data is read-only, there is no problem, because the data read by one thread is unaffected by whether or not another thread is reading the same data.
If one or more threads can modify the data, however, there is plenty of room for trouble, and care is needed.
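One concept that helps when reasoning about shared data is the invariant: a statement that is always true of a particular data structure. Invariants are often broken temporarily while an update is in progress, especially when the data structure is complex or the update touches more than one value.
Take a doubly linked list as an example. Each node holds a pointer to the next node and a pointer to the previous one, and one invariant is that if node A's "next" pointer refers to node B, then B's "previous" pointer refers back to A. Deleting a node has to update the nodes on both sides of it, so the invariant is broken after the first pointer is updated and holds again only once the second update is complete.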
The steps in deleting an entry from such a list are shown in figure 3.1:
1 Identify the node to delete (N).
2 Update the link from the node prior to N to point to the node after N.
3 Update the link from the node after N to point to the node prior to N.
4 Delete node N.
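The four steps above can be sketched in code to make the broken intermediate state visible. This is a minimal single-threaded illustration, not the book's implementation: the `Node` type and the helper names `links_consistent`, `unlink_forward` and `unlink_backward` are hypothetical, and the sketch assumes the node being removed has a neighbor on both sides.

```cpp
#include <cassert>

// Hypothetical node type for illustration only.
struct Node {
    int value;
    Node* prev;
    Node* next;
    explicit Node(int v) : value(v), prev(nullptr), next(nullptr) {}
};

// Invariant: for every adjacent pair, a->next == b implies b->prev == a.
bool links_consistent(const Node* head) {
    for (const Node* n = head; n != nullptr && n->next != nullptr; n = n->next) {
        if (n->next->prev != n) return false;
    }
    return true;
}

// Step 2: the node before n now points past n.
// After this call the invariant is broken until step 3 runs.
void unlink_forward(Node* n) {
    assert(n->prev != nullptr && n->next != nullptr);
    n->prev->next = n->next;
}

// Step 3: the node after n now points back past n, restoring the invariant.
void unlink_backward(Node* n) {
    assert(n->prev != nullptr && n->next != nullptr);
    n->next->prev = n->prev;
}
```

Between the two calls, a walk along the forward links and a walk along the backward links disagree about what the list contains. A second thread that reads the list in that window sees exactly this inconsistent state. Step 4, releasing the node, is left to whoever owns the storage.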
A broken invariant of this kind leads to the most common cause of bugs in concurrent code: a race condition.
Race conditions
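Suppose you are buying tickets at a large cinema with many cashiers, so several people can buy tickets at the same time. If someone at another till is buying for the same showing, the seats left for you depend on whose purchase goes through first. When only a few seats remain, whether you get a ticket at all becomes a race.
In concurrency, a race condition is anything where the outcome depends on the relative ordering of operations executed on two or more threads; each thread races to perform its own operations. Many races are benign, because every possible outcome is acceptable. When people talk about race conditions in concurrent programming, they usually mean the problematic kind, where the race leads to broken invariants and bugs.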
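The C++ standard also defines the term data race for the specific kind of race condition that arises from concurrent modification of a single object; data races cause the dreaded undefined behavior.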
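The original English text below describes how these problems show up and why. It is worth rereading several times, because it explains the strange bugs that concurrency can cause: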
Problematic race conditions typically occur where completing an operation requires modification of two or more distinct pieces of data, such as the two link pointers in the example. Because the operation must access two separate pieces of data, these must be modified in separate instructions, and another thread could potentially access the data structure when only one of them has been completed. Race conditions can often be hard to find and hard to duplicate because the window of opportunity is small. If the modifications are done as consecutive CPU instructions, the chance of the problem exhibiting on any one run-through is very small, even if the data structure is being accessed by another thread concurrently. As the load on the system increases, and the number of times the operation is performed increases, the chance of the problematic execution sequence occurring also increases. It’s almost inevitable that such problems will show up at the most inconvenient time. Because race conditions are generally timing sensitive, they can often disappear entirely when the application is run under the debugger, because the debugger affects the timing of the program, even if only slightly.
Protecting shared data with mutexes
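If a modification is exclusive, so that every other thread has to wait while one thread modifies the data, then no thread can observe the broken intermediate state and the race condition goes away.
The tool for this is a mutex (mutual exclusion). Before accessing the shared data you lock the mutex associated with it, and when you are done you unlock it. The thread library ensures that once one thread has locked a mutex, every other thread that tries to lock it has to wait until the first thread unlocks it. As a result, no other thread sees the invariants in their broken state.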
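As a minimal sketch of the lock, modify, unlock pattern just described, the following example has several threads increment one counter. The names `shared_counter`, `counter_mutex`, `add_many` and `run_workers` are hypothetical and chosen for illustration only.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

long shared_counter = 0;     // hypothetical shared data
std::mutex counter_mutex;    // protects shared_counter

// Each increment happens with the mutex held, so no other thread
// can read or write shared_counter in the middle of the update.
void add_many(int times) {
    for (int i = 0; i < times; ++i) {
        counter_mutex.lock();
        ++shared_counter;
        counter_mutex.unlock();
    }
}

// Runs thread_count workers, waits for all of them, and returns the total.
long run_workers(int thread_count, int times) {
    shared_counter = 0;
    std::vector<std::thread> workers;
    for (int i = 0; i < thread_count; ++i) {
        workers.emplace_back(add_many, times);
    }
    for (auto& w : workers) {
        w.join();
    }
    return shared_counter;
}
```

With the mutex in place, `run_workers(4, 10000)` returns 40000 on every run. Without the lock and unlock calls, the unsynchronized increments would be a data race and the result would be unpredictable.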
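Mutexes are the most general tool for protecting shared data, but they are not a silver bullet: they bring problems of their own, such as deadlock.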
Using mutexes in C++
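You create a mutex by constructing a std::mutex, lock it with lock() and unlock it with unlock(). Calling these member functions directly is not recommended, because you then have to remember to call unlock() on every path out of the function, including the paths taken when an exception is thrown. The C++ standard library provides the more convenient std::lock_guard class template, which follows the RAII idiom: it locks the supplied mutex on construction and unlocks it on destruction, so a locked mutex is always released correctly. Both std::mutex and std::lock_guard are declared in the <mutex> header.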
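To illustrate why the RAII guard is preferred over calling lock() and unlock() by hand, here is a small sketch in which the protected function can throw. The names `log_mutex`, `log_entries` and `append_checked` are hypothetical and not part of the original text.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <stdexcept>
#include <vector>

std::mutex log_mutex;            // hypothetical mutex
std::vector<int> log_entries;    // hypothetical data protected by log_mutex

// The guard's destructor unlocks log_mutex on every exit path,
// including the early exit caused by the exception.
void append_checked(int value) {
    std::lock_guard<std::mutex> guard(log_mutex);
    if (value < 0) {
        throw std::invalid_argument("negative value");
    }
    log_entries.push_back(value);
}

// Calls append_checked with a bad value, then a good one.
// Returns the number of entries stored afterwards.
std::size_t demo_exception_safety() {
    log_entries.clear();
    try {
        append_checked(-1);      // throws while the mutex is locked
    } catch (const std::invalid_argument&) {
        // The mutex was already released during stack unwinding.
    }
    append_checked(7);           // locks the same mutex again without blocking
    return log_entries.size();
}
```

If `append_checked` had called `log_mutex.lock()` directly and thrown before reaching `unlock()`, the mutex would have stayed locked and the second call would never have returned.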
//Protecting a list with a mutex
#include <list>
#include <mutex>
#include <algorithm>

std::list<int> some_list;    // shared data
std::mutex some_mutex;       // protects some_list

void add_to_list(int new_value)
{
    std::lock_guard<std::mutex> guard(some_mutex);   // held until guard is destroyed
    some_list.push_back(new_value);
}

bool list_contains(int value_to_find)
{
    std::lock_guard<std::mutex> guard(some_mutex);   // same mutex, so the two functions exclude each other
    return std::find(some_list.begin(),some_list.end(),value_to_find)
        != some_list.end();
}
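A better design puts the mutex and the data it protects together in a class instead of using global variables. This is a standard object-oriented design rule: placing them in a class makes it clear that they belong together, and lets you encapsulate the functions and enforce the protection.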
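One way this encapsulation might look is sketched below. The class name `guarded_int_list` and its member names are hypothetical; the sketch only shows the mutex and the list becoming private members, with every access going through a member function that locks first.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <mutex>

// Hypothetical wrapper: the list and its mutex live and die together,
// and outside code can only reach the list through locking functions.
class guarded_int_list {
public:
    void add(int new_value) {
        std::lock_guard<std::mutex> guard(mutex_);
        items_.push_back(new_value);
    }

    bool contains(int value_to_find) const {
        std::lock_guard<std::mutex> guard(mutex_);
        return std::find(items_.begin(), items_.end(), value_to_find)
            != items_.end();
    }

private:
    mutable std::mutex mutex_;   // mutable so const member functions can lock it
    std::list<int> items_;
};
```

Because `items_` is private and no member function hands out a pointer or reference to it, code outside the class has no way to touch the list without going through the lock.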
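You may already have spotted the hole: if one of the member functions returns a pointer or reference to the protected data, it no longer matters that the member functions all lock the mutex, because the protection has been bypassed.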
Any code that has access to that pointer or reference can now access (and potentially modify) the protected data without locking the mutex.
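Protecting data with a mutex therefore requires careful interface design, so that the mutex is locked whenever the protected data is accessed and there are no back doors.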
//Accidentally passing out a reference to protected data
#include <mutex>
#include <string>

class some_data
{
    int a;
    std::string b;
public:
    void do_something();
};

class data_wrapper
{
private:
    some_data data;
    std::mutex m;
public:
    template<typename Function>
    void process_data(Function func)
    {
        std::lock_guard<std::mutex> l(m);
        func(data);    //Pass "protected" data to user-supplied function
    }
};

some_data* unprotected;

void malicious_function(some_data& protected_data)
{
    unprotected=&protected_data;    //Stash a pointer that outlives the lock
}

data_wrapper x;

void foo()
{
    x.process_data(malicious_function);    //Pass in a malicious function
    unprotected->do_something();    //Unprotected access to protected data
}
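process_data looks harmless, because it locks the mutex with std::lock_guard. But it calls the user-supplied function func, so foo() can pass in malicious_function, which stores a pointer to the protected data. foo() then calls do_something() through that pointer without the mutex being locked. The code made every access inside data_wrapper mutually exclusive, but it missed the code in foo() that calls unprotected->do_something(). Unfortunately the C++ standard library cannot help here; it is up to the programmer to lock the right mutex to protect the data. There is, however, a guideline that helps: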
Don’t pass pointers and references to protected data outside the scope of the lock, whether by returning them from a function, storing them in externally visible memory, or passing them as arguments to user-supplied functions.
This is a common mistake when using mutexes, but it is far from the only pitfall.
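One way to follow the guideline, among several, is to hand out copies of the protected data instead of pointers or references. The sketch below assumes a hypothetical class `guarded_name`; it is an illustration of the idea and not code from the original text.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical wrapper that never lets a pointer or reference
// to name_ escape the scope of the lock.
class guarded_name {
public:
    void set(const std::string& new_name) {
        std::lock_guard<std::mutex> guard(mutex_);
        name_ = new_name;
    }

    // Returns by value: the copy is made while the lock is still held,
    // and the caller only ever sees its own private copy.
    std::string get() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return name_;
    }

private:
    mutable std::mutex mutex_;
    std::string name_;
};
```

A caller that holds the result of `get()` can read or modify it freely, because later calls to `set()` on other threads change only the protected original. The cost is a copy per access, which may matter for large objects.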
Race conditions inherent in interfaces
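// A race condition inherent in the interface of a stack
std::stack<int> s;
if(!s.empty())    // another thread may pop the last element after this check
{
    int const value=s.top();    // another thread may read the same top value
    s.pop();
    do_something(value);
}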
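This code is safe in single-threaded code, but not when the stack object is shared between threads. Calling top() on an empty stack is undefined behavior, and between the call to empty() and the call to top() another thread may have popped the last element, so the check no longer protects the call. This is a race condition built into the interface: it remains even if every member function locks a mutex internally.
There is a second race, between top() and pop(). If two threads run this code on the same stack, both may call top() before either calls pop(), so both process the same value, and one value is popped without ever being seen. This is arguably worse than the undefined behavior of calling top() on an empty stack, because nothing visibly goes wrong: the program does not crash, it silently produces the wrong result, and the resulting bug is hard to trace.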
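One common way to close both gaps is to merge the emptiness check, top() and pop() into a single operation performed under one lock. The sketch below is a minimal illustration under that assumption; the class name `locked_stack` and the member function `try_pop` are hypothetical, not a standard library interface.

```cpp
#include <cassert>
#include <mutex>
#include <stack>
#include <utility>

// Hypothetical stack wrapper whose interface has no separate
// empty()/top()/pop() calls for another thread to slip between.
template <typename T>
class locked_stack {
public:
    void push(T value) {
        std::lock_guard<std::mutex> guard(mutex_);
        data_.push(std::move(value));
    }

    // Checks for emptiness, reads the top element, and removes it,
    // all while holding the same lock. Returns false if the stack was empty.
    bool try_pop(T& out) {
        std::lock_guard<std::mutex> guard(mutex_);
        if (data_.empty()) {
            return false;
        }
        out = std::move(data_.top());
        data_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::stack<T> data_;
};
```

With this interface, two threads that call `try_pop` on the same stack each receive a different element or a `false` result, and no thread can call top() on a stack that has just been emptied. The trade-off is that the caller must supply an object to pop into, which requires the element type to be assignable.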