Wednesday, March 20, 2013

std::futures from std::async aren't special!

This is a slightly-revised version of my original post. It reflects information I've since received that confirms some of the suppositions I'd been making, and it rewords some things to clarify them.


It's comparatively well known that the std::future returned from std::async will block in its destructor until the asynchronously running thread has completed:
void f()
{
  std::future<void> fut = std::async(std::launch::async, 
                                     [] { /* compute, compute, compute */ });

}                                    // block here until thread spawned by
                                     // std::async completes
Only std::futures returned from std::async behave this way, so I had been under the impression that they were special. But now I believe otherwise. I now believe that all futures must behave the same way, regardless of whether they originated in std::async. This does not mean that all futures must block in their destructors. The story is more nuanced than that.

There's definitely something special about std::async, because futures you get from other sources (e.g., from a std::promise or a std::packaged_task) don't block in their destructors. But how does the specialness of std::async affect the behavior of futures?

C++11 futures are the caller's end of a communications channel that begins with a callee that's (typically) called asynchronously. When the called function has a result to communicate to its caller, it performs a set operation on the std::promise corresponding to the future.  That is, an asynchronous callee sets a promise (i.e., writes a result to the communication channel between it and its caller), and its caller gets the future (i.e., reads the result from the communications channel).

(As usual, I'm ignoring a host of details that don't affect the basic story I'm telling. Such details include return values versus exceptions, waiting versus getting, unshared versus shared futures, etc.)
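To make the channel concrete, here's a minimal sketch (purely illustrative, not code from the standard or from any library) of a callee setting a std::promise and its caller getting the corresponding std::future:

#include <future>
#include <iostream>
#include <thread>

int main()
{
  std::promise<int> p;                        // callee's end of the channel
  std::future<int> fut = p.get_future();      // caller's end of the channel

  std::thread t([&p]{ p.set_value(42); });    // callee sets its promise

  std::cout << fut.get() << '\n';             // caller gets the result: 42
  t.join();
}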

Between the time a callee sets its promise and its caller does a corresponding get, an arbitrarily long time may elapse. (In fact, the get may never take place, but that's a detail I'm ignoring.) As a result, the std::promise object that was set may be destroyed before a get takes place.  This means that the value with which the callee sets the promise can't be stored in the promise--the promise may not have a long enough lifetime.  The value also can't be stored in the future corresponding to the promise, because the std::future returned from std::async could be moved into a std::shared_future before being destroyed, and the std::shared_future could then be copied many times to new objects, some of which would subsequently be destroyed. In that case, which future would hold the value returned by the callee?

Because neither the promise nor the future ends of the communications channel between caller and callee are suitable for storing the result of an asynchronously invoked function, it's stored in a neutral location. This location is known as the shared state.  There's nothing in the C++ standard library corresponding to the shared state.  No class, no type, no function. In practice, I'm guessing it's implemented as a class that's templatized on at least the type of the result to be communicated between callee and caller.
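A plausible shape for such a shared state might be something like the following. This is a hypothetical sketch with invented names (SharedState, refCount, asyncThread, etc.); nothing like it is specified by the standard:

#include <atomic>
#include <exception>
#include <thread>

// Hypothetical sketch only; not actual library code.
template<typename T>
struct SharedState
{
  std::atomic<unsigned>  refCount{0};     // how many futures/promises refer to this state
  bool                   ready{false};    // has the callee set a result yet?
  T                      result;          // value written by the callee's "set"
  std::exception_ptr     exception;       // or an exception, if the callee threw one
  std::thread            asyncThread;     // present only for states created by std::async
};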

The special behavior commonly attributed to futures returned by std::async is actually determined by the shared state. Once you know what to look for, this is indicated in only moderately opaque prose (for the standard) in 30.6.8/3, where we learn that
The thread object [for the function to be run asynchronously] is stored in the shared state and affects the behavior of any asynchronous return objects [e.g., futures] that reference that state.
and in 30.6.8/5, where we read:
the thread completion [for the function run asynchronously] synchronizes with [i.e., occurs before] [1] the return from the first function that successfully detects the ready status of the shared state or [2] with the return from the last function that releases the shared state, whichever happens first.
It's provision [2] that's relevant to us here. It tells us that if a future holds the last reference to the shared state corresponding to a call to std::async, that future's destructor must block until the thread for the asynchronously running function finishes. This is a requirement for any future object. There is nothing special about std::futures returned from std::async. Rather, the specialness of std::async is manifested in its shared state.

By the way, when I write that the "future's destructor must block," I don't mean it literally. The standard just says that the function releasing the last reference to a shared state corresponding to a std::async call can't return as long as the thread for the asynchronously running function is still executing. That behavior doesn't have to be implemented by having a future's destructor directly block. The future's destructor might simply call a member function to decrement the reference count on the shared state. Inside that call, if the result of the decrement was zero and the shared state corresponded to a std::async call, the member function would simply wait for the asynchronously running function's thread to complete before returning to the future's destructor. From the future's point of view, it merely made a synchronous call to a function that decrements the reference count on the shared state. The runtime behavior, however, is that it could block until the asynchronously running thread completed.
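Continuing the hypothetical SharedState sketch from above, the release logic might look like this (again, invented names, not an actual implementation):

// Hypothetical release function; any blocking is hidden inside it.
template<typename T>
void releaseSharedState(SharedState<T>* state)
{
  if (--state->refCount == 0)              // this was the last reference
  {
    if (state->asyncThread.joinable())     // state came from a std::async call
      state->asyncThread.join();           // wait for the async function's thread
    delete state;                          // then destroy the shared state
  }
}

// A future's destructor would then just make an ordinary-looking call:
//   if (state) releaseSharedState(state);
// and any blocking happens inside that call, not in the destructor itself.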

The provision requiring, in essence, that the last future referring to the shared state for a std::async call block until the associated thread has finished running is not popular. A change has been proposed, and some standard library implementations (e.g., Microsoft's) have already revised their implementations to eliminate the "futures from std::async block in their destructors" behavior. That makes it trickier for you to test the behavior of this part of the standard, because the library you use may be deliberately nonconformant in this area.
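If you want to probe what your own library does, a crude, timing-based test (suggestive, not definitive) is to destroy an async-generated future while its task is still sleeping and see how long the destruction takes:

#include <chrono>
#include <future>
#include <iostream>
#include <thread>

int main()
{
  auto start = std::chrono::steady_clock::now();
  {
    auto fut = std::async(std::launch::async, []
                          { std::this_thread::sleep_for(std::chrono::seconds(1)); });
  }                                        // fut destroyed here
  auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
              std::chrono::steady_clock::now() - start).count();
  std::cout << ms << " ms\n";              // roughly 1000 ms if the destructor blocked
}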

Scott

PS - The reason I got caught up in this matter was that I was trying to find a way to perform the moral equivalent of a detach on a thread spawned via std::async.  Because I believed it was the std::future returned from std::async that was special, I started experimenting with things like moving that std::future into a std::shared_future in an attempt to return from the function calling std::async before the asynchronously running function had finished. But since it's the shared state that's special, not the std::future, this approach seems doomed. If you know how to get detach-like behavior when using std::async (without the cooperation of the function being run asynchronously), please let me know!

16 comments:

Martinho Fernandes said...

> This is a requirement for any future object, not just the ones returned from std::async calls.

I don't see why this is a requirement for all futures: this particular requirement you quote is from the requirements upon std::async, not the requirements on std::future. The only requirement on the destructor of std::future is that it "releases the shared state". It happens that the shared state from std::async adds the requirement you quote, but no other shared state in the standard library has such a requirement.

Scott Meyers said...

@Martinho Fernandes: As you note, the destructor for a std::future releases the shared state. And per 30.6.8/5, the last function that releases the shared state must not return until the asynchronously running thread has completed. This is actually a constraint on the function's execution, but a common way to implement that requirement, I believe, is to have the entity performing the final release of the shared state block until the asynchronous thread has finished running. That entity could be any kind of future. A simple implementation mechanism would be for the shared state to have a decrementRefCount member function, which would decrement the reference count and, if the result was zero, destroy the shared state object. For shared state objects produced by calling std::async, the decrementRefCount function would not return until the thread (which, per 30.6.8/3, is stored in the shared state object) had finished. From the point of view of the future's destructor, all it did was decrement the reference count. When the shared state didn't come from std::async, that operation would sometimes complete much faster than when the shared state came from std::async.

Anthony Williams said...

Yes, it is the shared state that is special, not the std::future object. You can move or copy (via std::shared_future) that future object as many times as you like, and the last reference to that shared state must still obey the constraint.

The "function that releases the shared state" is the destructor of the last future referencing that shared state, or the assignment operator of that future, should a new value be assigned to it. The destructor or assignment operator must therefore block if the async thread has not completed, as the thread completion synchronizes with the return to the caller [of the destructor or assignment operator].

If you want to detach your async function, stick the future in a global container of "detached" async functions. That way the future won't be destroyed until program exit, and so there will be no waiting until then. If you really want to detach it, put the future on the heap and leak it, but that's wrong in so many ways.
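A minimal sketch of that global-container idea (the names detachedFutures and detachAsync are invented for illustration, and it assumes void-returning callables that can safely outlive their calling scope):

#include <future>
#include <mutex>
#include <utility>
#include <vector>

std::vector<std::future<void>> detachedFutures;  // lives until program exit
std::mutex                     detachedFuturesMutex;

template<typename F>
void detachAsync(F f)
{
  auto fut = std::async(std::launch::async, std::move(f));
  std::lock_guard<std::mutex> lock(detachedFuturesMutex);
  detachedFutures.push_back(std::move(fut));     // the future stays alive, so the
}                                                // calling scope doesn't block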

Really, if you want a detached thread, use std::thread.

Zenju said...

I really don't get the problem here:
Yes, std::async returns a std::future that blocks on scope exit.

But I have no idea why this is not simply implemented like the canonical beginner's example of a Boost packaged_task and wrapped in an "async" template:
http://www.boost.org/doc/libs/1_53_0/doc/html/thread/synchronization.html#thread.synchronization.futures.creating

Is there any problem with boost's example I'm missing? (except that they forgot to detach the thread in the example)

Kometes said...

If you're interested in the inner workings of a std::future/async implementation (like the shared state stuff), my thread library code is highly readable, mostly follows the standard and contains a complete implementation of the composable future extensions from the proposal N3428 (then, when_any/all, unwrap, and a work-stealing thread pool scheduler).

What's the reasoning that the enclosing scope to async should block? It seems contrived to me, it's unexpected behaviour.

Kometes said...

As an addendum to my comment above here's the future/async test code, it helps to see the end-user side of things to know what you're looking at =)

Scott Meyers said...

@Zenju: I'm not sure what your point is regarding the Boost documentation. The question I'm addressing is how a requirement in the standard (that std::future objects returned from std::async appear to have special behavior) is implemented in the library. The code at the page you reference shows the use of futures, not the implementation of them.

Note also that futures returned from std::packaged_task don't have the behavior of futures returned from std::async.

Scott Meyers said...

@Kometes: I don't know for sure why the committee specified the behavior they did, but their implicit-join approach may well have been a reaction to the problems of an implicit-detach approach detailed in N2802.

Herb Sutter said...

tl;dr: Martinho already answered this correctly -- the article is not correct, the blocking applies only to futures returned from std::async with launch policy launch::async.

Scott said:
and in 30.6.8/5, we see that [...] This is a requirement for any future object, not just the ones returned from std::async calls.


This does not follow. 30.6.8 is the specification of std::async. Anything said in there is specific to std::async.

Scott is correct that:
the specialness of std::async is manifested in its shared state.


Yes, as Anthony confirmed, that's the mechanics of how the standardese implements the (controversial) exception to make std::async-produced futures -- and those only -- block in their destructors even if you didn't ask to .get() or .wait() on them.

But as Martinho already correctly said, this exception is in clause 30.6.8 "Function template async" only, and does not apply to futures in general.

If you wanted to find something that applies to futures in general, it would be in 30.6.4-7 which covers shared state, promise, future, and shared_future generally. And indeed 30.6.4 "Shared state" tells us exactly what happens by default to shared state when the last future goes away. First, paragraph 3 defines:


An asynchronous return object is an object that reads results from a shared state. ...


such as a future.

Then, paragraph 5 clearly states (quoted in full):


When an asynchronous return object or an asynchronous provider is said to release its shared state, it means:
— if the return object or provider holds the last reference to its shared state, the shared state is destroyed;
and
— the return object or provider gives up its reference to its shared state.


Note that no synchronization is specified, just basic reference counting cleanup actions. The first bullet covers the case where if this future is the last to leave (all other futures, and the promise/task providing the value, have already gone away before this last future goes away) then it turns off the lights. The second bullet says that always, whether last or not, it decrements the refcount.

That's just reference counting. No threads were blocked in the releasing of this shared state from any future destructor...

... unless you were so unlucky as to have a future returned from std::async, in which case the specially sharpened fangs in 30.6.8/5 which apply to async-generated futures only are deployed, and deliver their charming sudden snake-bite to your nether regions.

Having said all that, Scott's actual intuition was entirely correct:
The reason I got caught up in this matter was that I was trying to find a way to perform the moral equivalent of a detach on a thread spawned via std::async. Because I believed it was the std::future returned from std::async that was special, I started experimenting with things like moving that std::future into a std::shared_future in an attempt to return from the function calling std::async before the asynchronously running function had finished. But since it's the shared state that's special, not the std::future, this approach seems doomed. If you know how to get detach-like behavior when using std::async (without the cooperation of the function being run asynchronously), please let me know!


Your approach was exactly right -- if you unhappily receive an async-generated future, the workaround is to make the future object (or move/copy thereof) live as long as you need it to, which means get it out of its structured/bounded lifetime in any of the usual ways -- put it in any non-local location, such as returning it, writing it to an "out" reference parameter, or writing it to a heap location or global variable.
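As a minimal sketch of the first of those options, returning the future out of its bounded scope (the function name startWork is invented for illustration):

#include <future>

std::future<void> startWork()
{
  return std::async(std::launch::async,
                    [] { /* compute, compute, compute */ });
}                                    // no blocking here; the caller now owns
                                     // the future and decides when it dies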

Zenju said...

> @Zenju: I'm not sure what your point is regarding the Boost documentation.

AFAIK the behavior of a blocking std::future will be removed with the next C++ standard update (Herb Sutter mentioned this in some talk).

So the remaining question for me as a user is, how to get this behavior right now. This is where the boost example comes in.

Therefore all my problems are currently solved and I'm wondering why you bother about std::future implementation issues.
Yeah, maybe I simply do not understand :)

Scott Meyers said...

@Zenju: As far as I know, the proposed revision of the behavior of std::async has not yet been adopted into a revised draft standard, though Herb would certainly know whether it has. (It's worth noting that many proposals that seem like slam-dunks to be adopted don't necessarily end up getting into a revised standard. Concepts didn't make it into C++11, nor did range-based algorithms, nor did the suggestion that destructors in classes with virtual functions automatically become virtual.)

My interest is in the behavior of the current standard. My motivation for the blog post was the realization that, contrary to what I had believed (and what I believe others hold to be true), even with the current standard, there is nothing special about futures returned from std::async. Rather, there is something special about the shared state corresponding to std::async calls. If you care only about Boost and not about the standard, then my blog post is unlikely to be of interest to you.

Scott Meyers said...

@Kometes: How do you implement the "futures returned by std::async are special" behavior? Do you have special code in the destructors for futures to handle this case, or do you somehow encode this behavior in the shared state?

Kometes said...

@Scott Meyers
I didn't implement this. The shared state would need a wait-on-destruction flag set from async, and both future and shared_future would need to check in their dtor:

if (state && state->waitOnDestroy && !state->ready && state.use_count() <= 2) wait();

The use_count term is only for the shared_future dtor -- if the async task isn't ready then it's still sharing the state, so 2 means we're the final shared_future. Also, note that if future.share() is called then the state shared_ptr is moved out, thus in the future dtor state would be null.

Scott Meyers said...

@Kometes: Wouldn't it also be possible for the shared state to offer a decrementRefCount function that hides the blocking/waiting behavior, such as I've described in another comment as well as in the revised blog post? Such an implementation, if viable, would mean that futures of all kinds could be implemented without having to worry about special rules for shared states corresponding to std::async calls.

Kometes said...

@Scott Meyers
Yeah, that would work just as well. One drawback of generic handling that I can think of, though, is that future, being unique, could be optimized to not check use_count() and not lock the state. For shared futures a lock on the state would be necessary to avoid a race condition where another shared future is created and destroyed in the same moment that use_count() is called in the dtor.

Scott Meyers said...

@Herb Sutter: I think we agree on what the standard specifies, we just word it differently. My argument is that any future that happens to refer to a shared state corresponding to a call to std::async must block in its destructor if it's the last future referring to that shared state. Maybe that future came from std::async, but maybe it was just the last of many std::shared_futures that, through some sequence of moves and copies, ended up referring to a shared state for a std::async call. The future itself is not special. It simply decrements the reference count for the shared state and, if the result is zero, it blocks if the asynchronously running thread has not completed. All futures must contain this logic in their destructor. What's special about std::async is not the std::future it returns. What's special is the shared state.