Tuesday, December 24, 2013

ThreadRAII + Thread Suspension = Trouble?

Topic 1: An RAII class for std::thread objects

In my GoingNative 2013 talk, I explained how destruction of joinable std::thread objects leads to program termination, and I introduced an RAII class, ThreadRAII, to make sure that joinable std::threads aren't destroyed. The details of ThreadRAII aren't important for this post, so here's a stripped-down version that always does a join before permitting a joinable std::thread to be destroyed:
class ThreadRAII {
public:
  ThreadRAII(std::thread&& thread): t(std::move(thread)) {}
  ~ThreadRAII() { if (t.joinable()) (t.join(); }

private:
  std::thread t;
};
Using an RAII class to make sure that a std::thread object is brought into an unjoinable state on every path out of a block seems like an obviously-reasonable thing to do.

Topic 2: Using void promises/futures to "start threads suspended"

This Q&A at StackOverflow explains why it might be useful to start threads in a suspended state. There's no direct support for that in the C++11 threading API, but one way to implement it is to create a std::thread running a lambda that waits on a std::future<void> before starting its real work. For example, if the "real work" is funcToRun, we could do this:
std::promise<void> p;

std::thread t([&p]{ p.get_future().wait();    // start t and  suspend it
                    funcToRun(); }            // (conceptually)
              );

...                      // t is "suspended" waiting  for p to be set

p.set_value();           // t may now continue
This isn't the only way to suspend thread execution after creation of the thread and before execution of the work it's supposed to do, but it seems reasonable. In contrast to having the lambda spin on an atomic bool waiting for the flag to be set, for example, there is no need for the lambda to poll.

Putting the two together

The use of a std::thread object in that last code example leads to the possibility of there being some flow of control that could cause that object to be destroyed in a joinable state, and that would lead to program termination. So it seems natural to use RAIIThread:
std::promise<void> p;                         // as before

std::thread t([&p]{ p.get_future().wait();    // as before
                    funcToRun(); }
             );             

ThreadRAII tr(std::move(t));                  // USE RAII CLASS HERE

...                                           // as before

p.set_value();                                // as before
The problem is that if an exception is thrown in the "...", we'll never set the value of the std::promise, and that means that the destructor of the RAII object (tr) will block forever. That's no better than program termination, and it might be worse.

My question

I'm trying to figure out what the fundamental problem is here. Is there something wrong with using an RAII class to keep joinable threads from being destroyed (hence causing program termination)? Is there something wrong with using void promises/futures to emulate starting thread execution in a suspended state?

If the two techniques are independently valid, am I combining them incorrectly, or are they somehow fundamentally incompatible?

I'd like to write about this in Effective C++11/14, but I want to offer my readers good advice. What should that advice be?

Thanks,

Scott


34 comments:

Rein Halbersma said...

What if you made a `ResumableThreadRAII` that contains both a thread and a reference to the promise? The constructor simply takes a promise and a lambda (e.g. `funcToRun()`) and waits on a future. If an exception happens in the ellipsis part, then the thread destructor can set the future value and join the thread after running its assigned lambda.

Norbert Wenzel said...

When futures do not signal when the corresponding promise is deleted (without being set before), shouldn't futures/promises also get some RAII wrapper in general anyway?
If I remember correctly futures may return exceptions in case of an error, but I don't see how the future<void> situation is any different from computing a non-void result and getting an exception during that computation.
In both cases a future may be waiting forever, or am I missing something important here?

Anonymous said...

I think the problem is in std::promise<void> p; you're using it in a RAII context. However you can think of the promise as a resource itself so that the manual set_value() goes against the grain of RAII. A solution to this problem would be to wrap the promise with a RAII object that calls set_value() in its destructor. Thus if an exception is thrown in the ... region the promise will be signaled and the thread can continue running so that the program doesn't deadlock.

The problem with this is that the order of the destructors isn't what we want since the promise was constructed first it will be destructed last :(

Anonymous said...

@Norbert Wenzel: Promises do signal that they're destructed. Their futures emit a broken_promise exception.

So when the promise destructor is called before the thread destructor then all is fine. Only the other way around is a problem, because we can't detect that the promise won't be fulfilled when the thread is about to be destroyed.

Alexander Motzkau said...

@Norbert Wenzel: Promises do signal that they're destructed. Then their futures emit a broken_promise exception. (Then the thread could continue or abort.)

So all is fine as long as the promise is destructed before the thread. Only the other way around is a problem (like in the example above), like lanzkron said. So RAII wrappers for the promise won't help either. The only solution that I see is to remember the promise in the thread, and set an exception in the thread's destructure to it if it hasn't got a value at that time.

In my code I usually use the boost implementation that has an interrupt() function that also interrupts futures. So in the destructor I call interrupt() before join() so that the thread won't wait any further.

Argenet said...

Perhaps my point will look moot but I am not quite sure the proposed ThreadRAII concept is RAII indeed.
Initially "Resource Acquisition Is Initialization" applied to C++ semantics stands for taking ownership of some resource in constructor and releasing this ownership in destructor.
In that sense thread.join() doesn't seem to match the "release" part.
To release the resource (thread in our case) thread.detach() looks to be a better choice as it indeed releases the ownership (thread continues its execution independently) and doesn't block.

In order to resolve the case with promise and set_value case described here it may be sensible to look at the scope_guard idiom to make set_value trigger regardless of the possible exceptions thrown halfway.

Anonymous said...

I think that when the promise is destroyed, it needs to "do something" to signal to waiting futures. I think that is a requirement in the general case, not just in this specific use.

Scott Meyers said...

As a number of people have observed, destructor order is a problem for this kind of situation. I don't think that even ScopeGuard would help here, because a ScopeGuard created for the std::promise would typically be created immediately after the std::promise was constructed, and that means that it would be destroyed after the ThreadRAII object, but the ThreadRAII object's destructor is where we end up waiting for the std::promise to get set.

Scott

Argenet said...

In my vision the ScopeGuard here should solely protect the region marked with "..." in the example above and in that case it will not get created immediately after std::promise's creation.

Something like this.
(originally):

std::promise p;
std::thread t([&p]{ p.get_future().wait();
funcToRun(); }
);
ThreadRAII tr(std::move(t));
... // something that may throw
p.set_value();

(after wrapping the "..." region into a protected sub-scope):

std::promise p;
std::thread t([&p]{ p.get_future().wait(); // as before
funcToRun(); }
);
ThreadRAII tr(std::move(t));
{
SET_SCOPE_GUARD(p, &std::promise::set_value)
... // something that may throw
}

This matches the initial purpose of triggering promise regardless of any exceptions' presence and still guarantees it will get triggered prior to ThreadRAII destruction.

I am not sure

Jamie said...

IMO, std::promise is an implementation detail of the specific construct you are creating and as such should be hidden away from the client.

Jamie said...
This comment has been removed by a blog administrator.
Unknown said...

I think Rein is on the right track here. Since the behavior of the promise and the thread are so tightly coupled and we want to enforce invariants it might make sense to manage them using a class that automatically guarantees proper behavior. For example (disclaimer that I haven't compiled this):

class delayed_thread {
using promise_t = std::promise;
using thread_t = std::thread;

enum class execution_state {
WAITING, TRIGGERED, DISMISSED
};

promise_t m_promise;
thread_t m_thread;

execution_state m_state = WAITING;

public:

// Construction.

delayed_thread() = delete;
delayed_thread(delayed_thread const &) = delete;

delayed_thread(delayed_thread &&other)
: m_promise(std::move(other.m_promise))
, m_thread (std::move(other.m_thread ))
, m_state (std::move(other.m_state ))
{
other.m_state = DISMISSED;
}

template
delayed_thread(op_t &&op)
: m_thread([op = std::forward(op), &m_promise]() {
m_promise.get_future().wait();
op();
})
{}

// Destruction.

~delayed_thread() {
if(m_state == DISMISSED) {
return;
}

trigger();

if(m_thread.joinable()) {
m_thread.join();
}
}

// Assignment.

delayed_thread &operator = (delayed_thread &&rhs) {
m_promise = std::move(rhs.m_promise);
m_thread = std::move(rhs.m_thread );
m_state = std::move(rhs.m_state );

rhs.m_state = DISMISSED;
return *this;
}

delayed_thread &operator = (delayed_thread const &rhs) = delete;

// Execution.

void trigger() {
if(m_state == TRIGGERED || m_state == DISMISSED) {
return;
}

m_state = TRIGGERED;
m_promise.set_value();
}

}; // end class delayed_thread

Scott Meyers said...

@Christopher Hayden: I agree, this seems to be a promising approach. There are a couple of problems with your code (e.g., std::promise is a template, not a class), so I'm going to play around with it before I reply more thoroughly, but the idea of creating a class to represent a thread that starts suspended seems sound, and it should (I hope) avoid things like destructor order problems.

Zahir said...

Regarding Herb Sutter's "never use naked threads" idiom one may use some active objects (namely some task execution loops of their own threads) via passing "packaged_task" for getting the job done.

Other than that, tt seems to me if "triple-dots" throws an exception or includes a return statement, this mostly effects the "funcToRun" too, it is waiting for "triple-dots" is a serious clue for me. So, if there is an exception/exit it should be handled inside this context or inside the thread.

I can think of a solution including "packaged_task" like this:

std::packaged_task tripleDots([&]{
...
});

std::thread t([&tripleDots]{
//handle here or ignore here
tripleDots.get_future().wait();
funcToRun();
});

ThreadRAII tr(std::move(t));
tripleDots();


Herb Sutter's related post: http://herbsutter.com/2010/07/12/effective-concurrency-prefer-using-active-objects-instead-of-naked-threads/

Zahir said...

I have missed that I should be careful writing less-than, greater-than signs, my post is missing "packaged_task" template argument
which is "int()" regarding the return code. And I forgot to put a return statement (e.g. return OK;) just after "..."

Scott Meyers said...

@Zahir: Your idea is interesting. We can think of the work in tripleDots as setting up a context in which the new thread should do its work, and if an exception is thrown during this setup, the code I originally posted would propagate that exception instead of starting funcToRun. I don't see a simple way to get that same behavior from your approach. It's easy to avoid running funcToRun if tripleDots throws, but propagating that exception back to the thread's creator is less straightforward.

Scott

Scott Meyers said...

@ Christopher Hayden: With a little tinkering, I got your code to work, at least on simple test cases. There are still some refinements that could be added (e.g., dealing with the possibility of an exception being thrown by std::promise::get_future), but you certainly addressed everything in my original question. Thanks very much for that!

Here's the code I was able to get to compile, link, and run using gcc at http://rextester.com/runcode. (I had to add -pthread to the default command line.)

#include <thread>
#include <chrono>
#include <iostream>
#include <future>

// Per C++11 30.2.6
template <class T> typename std::decay<T>::type decay_copy(T&& v)
{ return std::forward<T>(v); }


class delayed_thread {
using promise_t = std::promise<void>;
using thread_t = std::thread;

enum execution_state {
WAITING, TRIGGERED, DISMISSED
};

promise_t m_promise;
thread_t m_thread;

execution_state m_state = WAITING;

public:

// Construction.

delayed_thread() = delete;
delayed_thread(delayed_thread const &) = delete;

delayed_thread(delayed_thread &&other)
: m_promise(std::move(other.m_promise))
, m_thread (std::move(other.m_thread ))
, m_state (std::move(other.m_state ))
{
other.m_state = DISMISSED;
}

template<typename op_t>
delayed_thread(op_t &&op)
: m_thread([op = decay_copy(std::forward<op_t>(op)), &m_promise = m_promise]() {
m_promise.get_future().wait();
op();
})
{}

// Destruction.

~delayed_thread() {
if(m_state == DISMISSED) {
return;
}

trigger();

if(m_thread.joinable()) {
m_thread.join();
}
}

// Assignment.

delayed_thread &operator = (delayed_thread &&rhs) {
m_promise = std::move(rhs.m_promise);
m_thread = std::move(rhs.m_thread );
m_state = std::move(rhs.m_state );

rhs.m_state = DISMISSED;
return *this;
}

delayed_thread &operator = (delayed_thread const &rhs) = delete;

// Execution.

void trigger() {
if(m_state == TRIGGERED || m_state == DISMISSED) {
return;
}

m_state = TRIGGERED;
m_promise.set_value();
}

}; // end class delayed_thread

void funcToRun()
{
std::cout << "In funcToRun...\n";
return;
}

int main()
{
delayed_thread t(funcToRun);
std::cout << "Working while thread 'suspended'...\n";
t.trigger();
std::cout << "Working while thread runs asynchronously...\n";
std::this_thread::sleep_for(std::chrono::milliseconds(500));
std::cout << "Done!\n";

}

Unknown said...

@Scott Meyers I don't think triggering on destruction is actually a good idea as it may create gotchas and problems that are hard to solve for the end user.
If we take into account, as you pointed earlier, that the thread may be waiting on environment setup, triggering on destruction may lead to undesired behaviors where op is called without the proper settings when an exception is thrown while setting up.
That being said, I would consider skipping the op call when destructing a delayed_thread on WAITING state.

Scott Meyers said...

@Tiago Gomes: I agree. But if we don't trigger, we can't do a join, because the join call will never return. (The lambda will never finish running.) Your suggestion to skip the call to op when we're destroying a delayed_thread in a WAITING state is reasonable, but how can we communicate the state information to the thread running the lambda? In the current version of delayed_thread, we could let the lambda capture m_state by reference, but that would work only because we do a join in delayed_thread. In the more general version of ThreadRAII that I discussed in my Going Native talk (delayed_thread is an RAII class like ThreadRAII), ThreadRAII could be configured to do either a join or a detach, and if we add that generalization to delayed_thread, there is no guarantee that the lifetime of m_state would extend to the point where the lambda could check it.

So I agree that we need to find a way to avoid running op if trigger isn't called, but it's not clear to me this morning what the best way to do that is.

Suggestions?

Unknown said...
This comment has been removed by the author.
Unknown said...

@Scott Meyers After giving it some thought I think we are overengineering this problem.
From the point where we decided to create a class to wrap all the logic, we can stop using the future/promise trick and just delay the thread creation to achieve the desired behavior and avoid most of the problems we were facing.

This is the code I've being testing ( http://rextester.com/runcode gcc with -pthread ) and seems ok:

#include <thread>
#include <chrono>
#include <iostream>
#include <future>

template<class func_t>
class delayed_thread
{
using thread_t = std::thread;

enum execution_state {
WAITING, TRIGGERED, DISMISSED
};

func_t m_op;
thread_t m_thread;
execution_state m_state = WAITING;

public:
// Construction.
delayed_thread() = delete;
delayed_thread(delayed_thread const &) = delete;

template<typename op_t>
delayed_thread(op_t &&op)
: m_op(std::forward<op_t>(op))
{ }

delayed_thread(delayed_thread &&other)
: m_op (std::move(other.m_op ))
, m_thread(std::move(other.m_thread))
, m_state (std::move(other.m_state ))
{
other.m_state = DISMISSED;
}

// Destruction.
~delayed_thread() {
if(m_state == DISMISSED) {
return;
}

if(m_thread.joinable()) {
m_thread.join();
}
}

// Assignment.
delayed_thread &operator=(delayed_thread const &rhs) = delete;

delayed_thread &operator=(delayed_thread &&rhs) {
m_op = std::move(rhs.m_op );
m_thread = std::move(rhs.m_thread);
m_state = std::move(rhs.m_state );

rhs.m_state = DISMISSED;

return *this;
}

// Execution.
void trigger() {
if(m_state == TRIGGERED || m_state == DISMISSED) {
return;
}

m_state = TRIGGERED;
m_thread = std::thread( m_op );
}

}; // end class delayed_thread

template<class func_t>
delayed_thread<func_t> make_delayed_thread(func_t&& func)
{
return delayed_thread<func_t>(std::forward<func_t>(func));
}

void funcToRun()
{
std::cout << "In funcToRun...\n";
return;
}

int main()
{
auto t = make_delayed_thread(funcToRun);
auto t1 = std::move(t);
std::cout << "Working while thread 'suspended'...\n";
t.trigger();
t1.trigger();
std::cout << "Working while thread runs asynchronously...\n";
std::this_thread::sleep_for(std::chrono::milliseconds(500));
std::cout << "Done!\n";
}

Scott Meyers said...

@Tiago Gomes: My concern with your approach is that it doesn't achieve the effect of starting a thread in a suspended state. Instead, your version of trigger creates a brand new thread object. One of the original goals was to pay all the overhead for creating a new thread, then suspend execution of that thread until some event occurs. I don't think your code does that. Am I overlooking something?

Unknown said...

@Scott Meyers: You are actually right, I overlooked this part of the original problem.

Unknown said...

@Scott Meyers: Would it be any worse if we used promise/future of bool instead of void ?
If we use the bool version, we may use the value to indicate if op shall be executed or not.Then we change the lambda to:

if(m_promise.get_future().get()) {
op();
}

Change trigger to use m_promise.set_value(true) and the destructor would call m_promise.set_value(false).

Would that be ok ?

Scott Meyers said...

@Tiago Gomes: This looks like a great suggestion! We'd also have to modify the lambda to handle the possibility that get_future throws, but that's something we have to deal with in any case.

Thanks for this idea!

Unknown said...

@Scott Meyers: I've being toying with this class throughout the day and I think now it's covering all the problems mentioned before.

Here is the live code demo: http://rextester.com/EXCC9898

I tried to mimic std::thread as much as possible, including behavior definition for moved-from objects.

I think the way the constructor which takes a function is setup right now makes it impossible for std::promise::get_future to throw an exception, as it's just being created and the call to get_future is being done synchronously before any other call.

This change also fixed a problem where moving from a recently created delayed_thread would throw sometimes because the lambda was executed after the move and it was holding a reference to the moved-from promise.

One thing that I wanted to add but was not satisfied with my solution was the possibility of creating delayed_thread with function that take parameters, as std::thread does. Here is the code I used, which compiles under gcc on http://rextester.com, but it feels wrong to me:

template< typename op_t, typename... args_t >
delayed_thread( op_t&& op, destruction_action on_destruction, args_t&&... args )
: m_thread(
[ op = decay_copy(std::forward<op_t>(op)),
should_execute = m_promise.get_future( ) ]
( typename std::decay<args_t>::type... args ) mutable
{
if( should_execute.get( ) )
{
op( args... );
}
}, std::forward< args_t >( args )...
)
, m_on_destruction( on_destruction )
{}

Any insight on that ?

Also, working with this code made me wonder, do you think this class should really behave as a mix of ThreadRAII and std::thread with suspension or would it be better to implement it solely as a std::thread and use your original RAII code with it ?

ThreadRAII tr( delayed_thread( funcToRun ), &delayed_thread::join ) // Maybe provide delayed_thread::dismiss for this ?

or

delayed_thread( funcToRun, delayed_thread::action::join )

Scott Meyers said...

@Tiago Gomes: The standard specifies the behavior of the variadic std::thread constructor more or less this way,

template <class F, class ...Args> explicit thread(F&& f, Args&&... args);

The new thread of execution executes decay_copy(std::forward<F>(f)), decay_copy(std::forward<Args>(args))...).

so I think that's how you'd have to write the implementation. The tricky part is coming up with a good calling API. Your design interposes on_destruction between f and args, which I find odd. The simplest solution would be to put on_destruction first.

I like your idea of separating delayed_thread from ThreadRAII. It would be nice if ThreadRAII would work with anything supporting a std::thread-like API, and of course the general idea of separating concerns argues for it, too.

I'm less enthusiastic about having to replicate the std::thread API for delayed_thread, especially because one thing I didn't mention in the post is that I'd ultimately like to make it possible for a single promise to control multiple delayed_threads, i.e., for a client to create a bunch of delayed_threads, then trigger them all with a single call to what Christopher Hayden called "trigger" and what your code calls "execute". Implementing this behavior is easy (just use a std::shared_future), but I'm afraid it will lead to two versions of delayed_thread, much like we currently have both std::future and std::shared_future. We'd then have std::thread, std::delayed_thread, and std::shared_delayed_thread, and that doesn't seem very pretty.

I'll play around with this more today and post again. I encourage you to do the same.

Scott Meyers said...

@Tiago Gomes: I don't think you want to offer "dismiss" as an option of what to do to the std::thread in the delayed_thread destructor, because by the time that destructor is running, the std::thread may have been "unsuspended" (via a call to trigger/execute), and in that case, your only available actions are the ones offered by std::thread, i.e, join or detach.

The notion of dismissal makes sense only before a call to trigger/execute, and before such a call, the notion of joining or detaching isn't really meaningful, because the thread is blocked waiting for a call to trigger/execute.I think the current semantics are the proper ones: dismissal of the function to be run on the std::thread if the delayed_thread is destroyed before trigger/execute has been called, and either join or detach (at the client's choice) if the delayed_thread is destroyed after trigger/execute has been called.

In the meantime, on further reflection, I think it makes sense to replicate the std::thread API in delayed_thread. The other option would be to provide just enough API to let delayed_thread do its work (notably including the special member functions), plus offer a "get" member function to access the underlying std::thread, but then it would be harder to use a delayed_thread in a context where a std::thread-like object was required.

Scott

Unknown said...

@Scott Meyers: I agree with your point about not offering "dismiss" for the user, it would probably cause more harm than good.

After giving it some though, I do think we should push strongly towards std::thread interface and behavior, not only to work nicely in contexts where std::thread-like objects are required, as with ThreadRAII, but also for consistency sake. Providing behavior as close to std::thread's behavior as possible would make using delayed_thread simpler for users of the former.

With that in mind, I modified the code I was using to strip ThreadRAII behavior from delayed_thread and still keep it std::thread-like.
Here is a live demo code of this implementation which also shows ThreadRAII compatibility:
http://rextester.com/AUUTS48533

About your concern of std::thread-like interface making it hard to implement the possibility of triggering/executing multiple delayed_thread with a single call, I don't think that would be the case.

Keeping the interface as it is, I think all we would have to do is add to some constructors the possibility of passing in a std::shared_future, maybe wrapped in some meaningful class such as trigger_receiver. This way a user can provide shared_futures from a single promise, maybe wrapped as something like delayed_thread_trigger, to multiple delayed_thread and trigger/execute all at once by setting the promise's value.

The only problem I'm still somewhat unable to solve, with c++11, is how to offer both this possibilities of triggering/executing at the same time, as we can't just do "if ( m_shared_future.get( ) || should_execute.get( ) ) { op(); }" because it would block until first value or both values are set.

I would envision something like using shared_future::then, if/when the proposal actually passes, to call "trigger"/"execute" upon definition of outer promise's value and this way not changing our current lambda definition.

What are your thoughts on those subjects ?

Scott Meyers said...

@Tiago Gomes: Right now, I'm thinking that a better name for delayed_thread is latched_thread, and the corresponding class using a shared_future would be something like shared_latch_thread. latched_thread would offer a share member function, thus making it possible to create a shared_latch_thread from a latched_thread, just like you can currently create a shared_future from a future. Internally, the shared_latch_thread would contain a shared_ptr to a promise.

I don't care for the idea of having clients pass in a shared_future, because the promise/future implementation should be hidden from clients, just like the different kinds of shared states are hidden from clients using promises and futures/shared_futures.

I'm hoping to work on this later today, but I may not have time to get to it.

Scott

bert Dvornik said...

I see two interesting problems here. The first is the specific problem of starting the thread suspended, and I like the direction Tiago and Scott have been taking.

But this is of course just a special case of the much broader problem of coping with long-blocking threads interfering with thread termination. A truly satisfying RAII thread wrapper would have some way of controlling interaction with all sorts of resources that may block for a long time, not just limited to futures used in the thread.

Maybe I'm a pessimist, but this seems like an impossible nut to tackle outside of the C++ standard library itself (which is not to say that a C++ library based solution would be easy). In C++2x, maybe? =)

Scott Meyers said...

@bert Dvornik: I think the general problem of what to do about threads that prevent termination (due to blocking calls or otherwise) is likely to be bigger than just the library or even the language. Well-behaved software will have to be written in ways that permit orderly thread shutdown. I suspect it will turn out to be akin to writing exception-safe code: achieving the desired behavior will require particular programming techniques, and while the language or the library can provide useful building blocks, it will still be up to software developers to use those building blocks in a correct fashion.

bert Dvornik said...

@Scott Meyers: yes, I think that's spot on.

I should have thought through my last post better. =) I realized after I posted that in all but the most benign cases, the library/language don't have a good way to do the right thing. (If you don't believe this, think about the case where a thread is blocking on a join of another thread.)

That said, it does feel like a really solid set of programming practices to deal with prevention of termination may need better tools and paradigms than have emerged thus far. The comparison with thread safety seems very apt, though I guess only time will tell. Given how fast and how well C++ has been evolving, I'm feeling pretty hopeful.

bert Dvornik said...

I mean exception safety, of course (not thread).