: Programming with POSIX Threads

3.2.5.1 Lock hierarchy

3.2.5.1 Lock hierarchy

If you can apply two separate mutexes to completely independent data, do it. You'll almost always win in the end by reducing the time when a thread has to wait for another thread to finish with data that this thread doesn't even need. And if the data is independent you're unlikely to run into many cases where a given function will need to lock both mutexes.

The complications arise when data isn't completely independent. If you have some program invarianteven one that's rarely changed or referencedthat affects data protected by two mutexes, sooner or later you'll need to write code that must lock both mutexes at the same time to ensure the integrity of that invariant. If one thread locks mutex_a and then locks mutex_b, while another thread locks mutex_b and then mutex_a, you've coded a classic deadlock, as shown in Table 3.1.

First thread Second thread
pthread_mutex_lock (&mutex_a); pthread_mutex_lock (&mutex_b);
pthread_mutex_lock (&mutex_b); pthread_mutex_lock (&mutex_a);

TABLE 3.1 Mutex deadlock

Both of the threads shown in Table 3.1 may complete the first step about the same time. Even on a uniprocessor, a thread might complete the first step and then be timesliced (preempted by the system), allowing the second thread to complete its first step. Once this has happened, neither of them can ever complete the second step because each thread needs a mutex that is already locked by the other thread.

Consider these two common solutions to this type of deadlock:

Fixed locking hierarchy: All code that needs both mutex_a and mutex_b must always lock mutex_a first and then mutex_b.

Try and back off: After locking the first mutex of some set (which can be allowed to block), use pthread_mutex_trylock to lock additional mutexes in the set. If an attempt fails, release all mutexes in the set and start again.

There are any number of ways to define a fixed locking hierarchy. Sometimes there's an obvious hierarchical order to the mutexes anyway, for example, if one mutex controls a queue header and one controls an element on the queue, you'll probably have to have the queue header locked by the time you need to lock the queue element anyway.

When there's no obvious logical hierarchy, you can create an arbitrary hierarchy; for example, you could create a generic "lock a set of mutexes" function that sorts a list of mutexes in order of their identifier address and locks them in that order. Or you could assign them names and lock them in alphabetical order, or integer sequence numbers and lock them in numerical order.

To some extent, the order doesn't really matter as long as it is always the same. On the other hand, you will rarely need to lock "a set of mutexes" at one time. Function A will need to lock mutex 1, and then call function B, which needs to also lock mutex 2. If the code was designed with a functional locking hierarchy, you will usually find that mutex 1 and mutex 2 are being locked in the proper order, that is, mutex 1 is locked first and then mutex 2. If the code was designed with an arbitrary locking order, especially an order not directly controlled by the code, such as sorting pointers to mutexes initialized in heap structures, you may find that mutex 2 should have been locked before mutex 1.

If the code invariants permit you to unlock mutex 1 safely at this point, you would do better to avoid owning both mutexes at the same time. That is, unlock mutex 1, and then lock mutex 2. If there is a broken invariant that requires mutex 1 to be owned, then mutex 1 cannot be released until the invariant is restored. If this situation is possible, you should consider using a backoff (or "try and back off") algorithm.

"Backoff means that you lock the first mutex normally, but any additional mutexes in the set that are required by the thread are locked conditionally by

calling pthread_mutex_trylock. If pthread_mutex_trylock returns EBUSY, indicating that the mutex is already locked, you must unlock all of the mutexes in the set and start over.

The backoff solution is less efficient than a fixed hierarchy. You may waste a lot of time trying and backing off. On the other hand, you don't need to define and follow strict locking hierarchy conventions, which makes backoff more flexible. You can use the two techniques in combination to minimize the cost of backing off. Follow some fixed hierarchy for well-defined areas of code, but apply a backoff algorithm where a function needs to be more flexible.

The program below, backoff.c, demonstrates how to avoid mutex deadlocks by applying a backoff algorithm. The program creates two threads, one running function lock_forward and the other running function lock_backward. The two threads loop ITERATIONS times, each iteration attempting to lock all of three mutexes in sequence. The lock_forward thread locks mutex 0, then mutex 1, then mutex 2, while lock_backward locks the three mutexes in the opposite order. Without special precautions, this design will always deadlock quickly (except on a uniprocessor system with a sufficiently long timeslice that either thread can complete before the other has a chance to run).

15You can see the deadlock by running the program as backoff 0. The first argument is used to set the backoff variable. If backoff is 0, the two threads will use pthread_mutex_lock to lock each mutex. Because the two threads are starting from opposite ends, they will crash in the middle, and the program will hang. When backoff is nonzero (which it is unless you specify an argument), the threads use pthread_mutex_trylock, which enables the backoff algorithm. When the mutex lock fails with EBUSY, the thread will release all mutexes it currently owns, and start over.

16It is possible that, on some systems, you may not see any mutex collisions, because one thread is always able to lock all mutexes before the other thread has a chance to lock any. You can resolve that problem by setting the yield_flag variable, which you do by running the program with a second argument, for example, backoff 1 1. When yield_flag is 0, which it is unless you specify a second argument, each thread's mutex locking loop may run uninterrupted, preventing a deadlock (at least, on a uniprocessor). When yield_flag has a value greater than 0, however, the threads will call sched_yield after locking each mutex, ensuring that the other thread has a chance to run. And if you set yield_ flag to a value less than 0, the threads will sleep for one second after locking each mutex, to be really sure the other thread has a chance to run.

70-75 After locking all of the three mutexes, each thread reports success, and tells how many times it had to back off before succeeding. On a multiprocessor, or when you've set yield_flag to a nonzero value, you'll usually see a lot more nonzero backoff counts. The thread unlocks all three mutexes, in the reverse order of locking, which helps to avoid unnecessary backoffs in other threads. Calling sched_yield at the end of each iteration "mixes things up" a little so one thread doesn't always start each iteration first. The sched_yield function is described in Section 5.5.2.

? backoff.c

1#include <pthread.h>
2#include "errors.h" 3
4#define ITERATIONS 10 5
6/*
7* Initialize a static array of 3 mutexes.
8*/
9pthread_mutex_t mutex[3] = {
10PTHREAD_MUTEX_INITIALIZER,
11PTHREAD_MUTEX_INITIALIZER,
12PTHREAD_MUTEX_INITIALIZER
13}; 14
15int backoff = 1;/* Whether to backoff or deadlock */
16int yield_flag = 0;/* 0: no yield, >0: yield, <0: sleep */ 17
18/*
19* This is a thread start routine that locks all mutexes in
20* order, to ensure a conflict with lock reverse, which does the
21* opposite.
22*/
23void *lock_forward (void *arg)
24{
25int i, iterate, backoffs;
26int status;

27
28for (iterate = 0; iterate < ITERATIONS; iterate++) {
29backoffs = 0;
30for (i = 0; i < 3; i++) {
31if (i == 0) {
32status = pthread_mutex_lock (&mutex[i]);
33if (status != 0)
34err_abort (status, "First lock");
35} else {
36if (backoff)
37status = pthread_mutex_trylock (&mutex[i]);
38else
39status = pthread_mutex_lock (&mutex[i]);
40if (status == EBUSY) {
41backoffs++;
42DPRINTF ((
43" [forward locker backing off at %d]n",
44i));
45for (; i >= 0; i--) {
46status = pthread_mutex_unlock (&mutex[i]);
47if (status != 0)
48err_abort (status, "Backoff");
49}
50} else {
51if (status != 0)
52err_abort (status, "Lock mutex");
53DPRINTF ((" forward locker got %dn", i));
54}
55}
56/*
57* Yield processor, if needed to be sure locks get
58* interleaved on a uniprocessor.
59*/
60if (yield_flag) {
61if (yield_flag > 0)
62sched_yield ();
63else
64sleep (1);
65}
66}
67/*
68* Report that we got 'em, and unlock to try again.
69*/
70printf (
71"lock forward got all locks, %d backoffsn", backoffs);
72pthread_mutex_unlock (&mutex[2]);
73pthread_mutex_unlock (&mutex[1]);
74pthread_mutex_unlock (&mutex[0]);
75sched_yield ();
76}
77return NULL;
78}

79
80/*
81* This is a thread start routine that locks all mutexes in
82* reverse order, to ensure a conflict with lock_forward, which
83* does the opposite.
84*/
85void *lock_backward (void *arg)
86{
87int i, iterate, backoffs;
88int status;

89
90for (iterate = 0; iterate < ITERATIONS; iterate++) {
91backoffs = 0;
92for (i = 2; i >= 0; i--) {
93if (i == 2) {
94status = pthread_mutex_lock (&mutex[i]);
95if (status != 0)
96err_abort (status, "First lock");
97} else {
98if (backoff)
99status = pthread_mutex_trylock (&mutex[i]);
100else
101status = pthread_mutex_lock (&mutex[i]);
102if (status == EBUSY) {
103backoffs++;
104DPRINTF ((
105" [backward locker backing off at %d]n",
106i));
107for (; i < 3; i++) {
108status = pthread_mutex_unlock (&mutex[i]);
109if (status != 0)
110err_abort (status, "Backoff");
111}
112} else {
113if (status != 0)
114err_abort (status, "Lock mutex");
115DPRINTF ((" backward locker got %dn", i));
116}
117}
118/*
119* Yield processor, if needed to be sure locks get
120* interleaved on a uniprocessor.
121*/
122if (yield_flag) {
123if (yield_flag > 0)
124sched_yield ();
125else
126sleep (1);
127}
128}
129/*
130* Report that we got 'em, and unlock to try again.
131*/
132printf (
133"lock backward got all locks, %d backoffsn", backoffs);
134pthread_mutex_unlock (&mutex[0]);
135pthread_mutex_unlock (&mutex[l]);
136pthread_mutex_unlock (&mutex[2]);
137sched_yield ();
138}
139return NULL;
140 } 141
142int main (int argc, char *argv[])
143{
144pthread_t forward, backward;
145int status;

146
147#ifdef sun
148/*
149* On Solaris 2.5, threads are not timesliced. To ensure
150* that our threads can run concurrently, we need to
151* increase the concurrency level.
152*/
153DPRINTF (("Setting concurrency level to 2n"));
154thr_setconcurrency (2);
155#endif

156
157/*
158* If the first argument is absent, or nonzero, a backoff
159* algorithm will be used to avoid deadlock. If the first
160* argument is zero, the program will deadlock on a lock
161* "collision."
162*/
163if (argc > 1)
164backoff = atoi (argv[l]);

165
166/*
167* If the second argument is absent, or zero, the two threads
168* run "at speed." On some systems, especially uniprocessors,
169* one thread may complete before the other has a chance to run,
170* and you won't see a deadlock or backoffs. In that case, try
171* running with the argument set to a positive number to cause
172* the threads to call sched_yield() at each lock; or, to make
173* it even more obvious, set to a negative number to cause the
174* threads to call sleep(l) instead.
175*/
176if (argc > 2)
177yield_flag = atoi (argv[2]);
178status = pthread_create (
179&forward, NULL, lock_forward, NULL);
180if (status != 0)
181err_abort (status, "Create forward");
182status = pthread_create (
183&backward, NULL, lock_backward, NULL);
184if (status != 0)
185err_abort (status, "Create backward");
186pthread_exit (NULL);
187}

Whatever type of hierarchy you choose, document it, carefully, completely, and often. Document it in each function that uses any of the mutexes. Document it where the mutexes are defined. Document it where they are declared in a project

header file. Document it in the project design notes. Write it on your whiteboard. And then tie a string around your finger to be sure that you do not forget.

You are free to unlock the mutexes in whatever order makes the most sense. Unlocking mutexes cannot result in deadlock. In the next section, I will talk about a sort of "overlapping hierarchy" of mutexes, called a "lock chain," where the normal mode of operation is to lock one mutex, lock the next, unlock the first, and so on. If you use a "try and back off algorithm, however, you should always try to release the mutexes in reverse order. That is, if you lock mutex 1, mutex 2, and then mutex 3, you should unlock mutex 3, then mutex 2, and finally mutex 1. If you unlock mutex 1 and mutex 2 while mutex 3 is still locked, another thread may have to lock both mutex 1 and mutex 2 before finding it cannot lock the entire hierarchy, at which point it will have to unlock mutex 2 and mutex 1, and then retry. Unlocking in reverse order reduces the chance that another thread will need to back off.


: 0.462. /Cache: 3 / 1