Thursday, July 24, 2014

Should move-only types ever be passed by value?

[For this post, I'm going to pretend that std::unique_ptr is a type, instead of a template, because the issue being examined is independent of what a std::unique_ptr points to.]

Suppose I want to pass a std::unique_ptr to a constructor, where the std::unique_ptr will be moved into a data member. The std::unique_ptr parameter thus acts as a sink. To the extent that we have enough experience with C++11 for wisdom about it to be conventional, said wisdom seems to be that the std::unique_ptr should be passed by value. In his GotW 91 solution, Herb Sutter argues for it. The High Integrity Coding Standard has it as a guideline. (It cites Herb's article as the source.) In his C++ Reference Guide, Danny Kalev argues for it. Many StackOverflow answers repeat this advice.

But recently Matthew Fioravante brought a StackOverflow question to my attention showing a problem resulting from declaring a std::unique_ptr by-value sink parameter, and later Matthew suggested that sink parameters of move-only types should be passed by rvalue reference. This is a very interesting idea.

Suppose you see this function signature:
void f(SomeType&& param);
What does this tell you about param? The fact that it's an rvalue reference tells you that it's a candidate to be moved from, and the usual expectation is that it will be. In other words, it's a sink parameter. Note that this is completely independent of param's type. Even without knowing anything about SomeType, we can conclude that param is a sink parameter.

If SomeType happens to be std::unique_ptr, nothing changes: param is still a sink. There's no need for a special rule for std::unique_ptrs that tells us to pass them by value to indicate that they're sinks, because we already have a way to unambiguously say that: pass them by rvalue reference.

Going back to the idea of a constructor moving a std::unique_ptr into a data member, this is what the code looks like using pass by value:
class Widget {
public:
  explicit Widget(std::unique_ptr ptr): p(std::move(ptr)) {}

private:
  std::unique_ptr p;
};
Now consider this calling code:
std::unique_ptr up;

Widget(std::move(up));
What's the cost of getting up into p? Well, the parameter ptr has to be constructed, and the data member p does, too. Each costs a move construction, so the total cost is two move constructions (modulo optimizations).

Now consider the same thing using pass by rvalue reference:

class Widget {
public:
  explicit Widget(std::unique_ptr&& ptr): p(std::move(ptr)) {}

private:
  std::unique_ptr p;
};

std::unique_ptr up;

Widget(std::move(up));
Here, only the data member p will be constructed, so the total cost is only one move construction.
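
To make those counts concrete, here is a minimal sketch that swaps std::unique_ptr for a hypothetical move-only type, Tracked, whose move constructor bumps a counter. The names Tracked, WidgetByValue, WidgetByRvalueRef, and movesToConstruct are mine, invented for illustration. Because the argument is an xvalue produced by std::move and the move constructor has a visible side effect, copy elision can't remove either move here.

```cpp
#include <cassert>
#include <utility>

// Hypothetical move-only stand-in for std::unique_ptr that counts moves.
struct Tracked {
    static int moves;
    Tracked() = default;
    Tracked(const Tracked&) = delete;
    Tracked& operator=(const Tracked&) = delete;
    Tracked(Tracked&&) noexcept { ++moves; }
};
int Tracked::moves = 0;

// Sink parameter passed by value: the parameter itself must be constructed.
class WidgetByValue {
public:
    explicit WidgetByValue(Tracked t) : p(std::move(t)) {}
private:
    Tracked p;
};

// Sink parameter passed by rvalue reference: only the member is constructed.
class WidgetByRvalueRef {
public:
    explicit WidgetByRvalueRef(Tracked&& t) : p(std::move(t)) {}
private:
    Tracked p;
};

// Returns how many move constructions it takes to build a W from an
// lvalue that the caller casts to an rvalue with std::move.
template <typename W>
int movesToConstruct() {
    Tracked::moves = 0;
    Tracked source;
    W w(std::move(source));
    (void)w;
    return Tracked::moves;
}

void demo() {
    assert(movesToConstruct<WidgetByValue>() == 2);      // parameter + member
    assert(movesToConstruct<WidgetByRvalueRef>() == 1);  // member only
}
```

For a real std::unique_ptr an optimizer may well fold the extra move away after inlining, which is the "modulo optimizations" caveat above.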

Unless I'm overlooking something, passing sink parameters of type std::unique_ptr by value is inconsistent with our usual idiom for expressing the idea of a sink parameter (i.e., to pass by rvalue reference), and it's less efficient, too. My sense is that the conventional wisdom regarding sink parameters of type std::unique_ptr is all messed up.

Which leads to the question: how did it get messed up? I believe what happened was that people noticed that, for maximal efficiency when passing lvalues and rvalues of a particular type that needed to be copied inside the function, you needed either to overload on lvalue references and rvalue references or to pass by universal reference. Both approaches have problems (overloading doesn't scale to multiple parameters, and universal references suffer from the shortcomings of perfect forwarding, lousy error messages, and sometimes being too greedy). For cheap-to-move types, people found, you can use pass by value with only a modest efficiency loss, and the conventional wisdom, in large part based on David Abrahams' blog post, "Want Speed? Pass by value", came to embrace that idea.
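
For readers who haven't seen the three options side by side, here is a rough sketch using std::string as the copyable, cheap-to-move type. The class names (NameOverloaded, NameForwarded, NameByValue) and the exercise helper are hypothetical, made up for this illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Option 1: overload on lvalue and rvalue references.
class NameOverloaded {
public:
    void setName(const std::string& n) { name = n; }        // copies an lvalue
    void setName(std::string&& n) { name = std::move(n); }  // moves an rvalue
    std::string name;
};

// Option 2: a universal (forwarding) reference.
class NameForwarded {
public:
    template <typename T>
    void setName(T&& n) { name = std::forward<T>(n); }
    std::string name;
};

// Option 3: pass by value, then move. One function, but always one
// extra move compared with options 1 and 2.
class NameByValue {
public:
    void setName(std::string n) { name = std::move(n); }
    std::string name;
};

// All three accept lvalues (which are copied) and rvalues (which are moved).
template <typename W>
void exercise(W& w) {
    std::string lv = "lvalue";
    w.setName(lv);
    assert(w.name == "lvalue" && lv == "lvalue");  // the lvalue is left intact
    w.setName(std::string("rvalue"));
    assert(w.name == "rvalue");
}

void demo() {
    NameOverloaded a;
    NameForwarded b;
    NameByValue c;
    exercise(a);
    exercise(b);
    exercise(c);
}
```

With two such parameters, option 1 would need four overloads, which is the scaling problem mentioned above.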

The thing is, for move-only types like std::unique_ptr, you don't need to worry about dealing with lvalues, because lvalues get copied, and move-only types aren't copyable. So there's no need to overload for lvalues and rvalues, hence no scalability problem for multiple parameters. Which means that the motivations for replacing pass by reference--which is what the conventional wisdom from C++98 always dictated--with pass by value don't exist for move-only types.
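
A small sketch of why the lvalue case never arises, written with std::unique_ptr<int> since real code can't pretend std::unique_ptr is a type. The holdsValue member and the demo function are hypothetical additions for illustration.

```cpp
#include <memory>
#include <type_traits>
#include <utility>

class Widget {
public:
    explicit Widget(std::unique_ptr<int>&& ptr) : p(std::move(ptr)) {}
    bool holdsValue() const { return p != nullptr; }
private:
    std::unique_ptr<int> p;
};

// A move-only type can't be copied, so there is no lvalue case to handle...
static_assert(!std::is_copy_constructible<std::unique_ptr<int>>::value,
              "std::unique_ptr is move-only");
// ...and the rvalue reference parameter rejects lvalue arguments outright.
static_assert(!std::is_constructible<Widget, std::unique_ptr<int>&>::value,
              "an lvalue can't bind to the rvalue reference parameter");
static_assert(std::is_constructible<Widget, std::unique_ptr<int>&&>::value,
              "an rvalue binds without any overloads");

// The caller must write std::move explicitly; afterwards the source is null.
bool demo() {
    auto up = std::make_unique<int>(42);
    Widget w(std::move(up));
    return w.holdsValue() && up == nullptr;
}
```

One constructor covers every legal call, so there's nothing to overload.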

My feeling is that Matthew Fioravante may well have hit the nail on the head here: there is no reason to use by-value parameters to express "sinkness" for move-only types. Instead, the usual rule of passing sink parameters by rvalue reference should apply.

The special case of considering the use of pass by value for always-copied parameters really only applies to types that are both copyable and movable, and only in situations where overloading and the use of a universal reference is not desired.

What do you think? Is there ever a time where move-only types should be passed by value?

Scott

72 comments:

  1. What I'm really seeing about that question is that since there's even the potential for a mistake, then he's already doing it wrong.

    The problem with taking by rvalue ref is that when you involve templates, you can get some unpleasant interactions with perfect forwarding that don't occur when taking by value, and decltype works differently.

    Another potential pitfall is the possibility of the caller keeping a reference to your dead value, which is impossible for a value.

    There are other more subtle interactions like evaluation order (as we see) and destruction order if an exception is thrown or if it turns out at runtime that the function just doesn't need that object, which I'd say is far more pertinent.

    Ultimately, a unique_ptr move construction has negligible cost. Taking it by value has suboptimal semantics only in a niche case that's pretty questionable to begin with, and it has no potential forwarding pitfalls.

    But I'd say what's almost as important is that you can treat every type the same. It's very generic to say "Just take it by value". It's easy for people to remember and get right. It's easy to express that semantic in a template. If you want to add new special cases to the rule, then it's going to have a cost w.r.t. people understanding that case, why it exists, and when to apply it. And the benefits, or rather, almost total lack thereof, of adding a new case to the rule absolutely do not justify that cost.

    I would only take by rvalue ref if there is a runtime condition in which you might want to not move from the argument.

  2. Ahem, just to clarify my last point above, the most obvious condition there is "Exception with strong guarantee". However this applies to all moves, not just move-only types.

  3. While the Want Speed? Pass By Value article is seemingly missing, the fortunate thing is that C++Next is pretty well archived.

    Anyway, I appreciate you questioning C++'s conventions. It's a great way to discuss and learn, so thank you.

  4. Hey, just an FYI, the HTML code that comments out unneeded syntax highlighters is using C++ multi-line comments instead of HTML comments, so the very top of the page has "/* */ /* */", and the extra highlighters are loading anyway!

  5. @Anonymous: We agree that all types should be treated the same way. But the standard way to say "this is a sink parameter" is to pass it by rvalue reference. It's how move constructors and move assignment operators work, and they're the poster children for functions acting as sinks. Using pass by value to express "this is a sink" for std::unique_ptr is an exception to the general rule, and I think it makes sense to get rid of the exception.

  6. @Anonymous: Thanks for the link to Dave's article. I've revised the blog post to include it.

  7. That's not really true. Rvalue references for sinks is only an implementation necessity. It's not the general rule for the class's users. They take by value. If you *could* take by value as a constructor argument, I would. In fact, as we've discussed before, in the assignment operator we often *do* take by value.

    I think that if you're using an rvalue ref to a class, in a non-generic context (counting auto&& as generic), outside the implementation, then you're doing it wrong. That's an easy simple rule.

    Accepting by value if you need a value is also a super simple rule, and it templates a lot better than the rvalue ref rule and produces a consistent rule for all value types, regardless of their copyability or movability or whether or not the value type is a template param or not.

    Having the rule be consistent for interfaces is far more important than having it be consistent for implementations, IMO. We apply the same considerations when saying, for example, "Don't use new[] and delete[], use std::vector.". This rule is obviously inapplicable if you're implementing std::vector. The same holds true here.

    If you're a user of class X, and you need a value, take a value. That's a simple rule, with no exceptions or special cases. Even for the implementation, it's arguable since move constructors do not need to take values.

  8. IMHO, the problem with passing moveable types by rvalue reference instead of by value, is that there are cases where you don't know the state of the moved from object after the function call.

    Consider this:

    void f(SomeType&& param);

    SomeType instance;
    f(std::move(instance));

    // was instance moved or not?

    Instead, if f() took the argument by value, there's no ambiguity about the state of instance after the call.

    Here's another answer that advocates passing unique_ptr by value: https://stackoverflow.com/a/8114913/241631

  9. @Anonymous: I guess we just disagree. To me, pass by rvalue reference is the proper way to express "this is a sink." Pass by value is the way you say "I want a copy of the passed-in argument." Pass by value incurs the cost of parameter construction and destruction, and when there are chains of calls, even small costs can add up. Pass by value for UDTs was unidiomatic in C++98, and I think it should be no more idiomatic in C++11.

    Even with move semantics, it makes no sense to me to construct and destruct objects you don't need. Passing expensive-to-copy lvalues by value (e.g., containers) is still prohibitively expensive, so it's not practical to adopt a "pass by value all the time" philosophy. So you have to decide what kind of uniformity you want. I like "pass all sink parameters by rvalue reference--no exceptions." You can't make the same sweeping claim about pass by value for all sink arguments, because some types may have nontrivial move costs; an example is std::array.
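
A quick sketch of the std::array point, assuming int elements. Moving a std::array moves element by element into the destination's own storage, whereas moving a std::vector just hands over the heap buffer. The demo function is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <utility>
#include <vector>

void demo() {
    // std::array stores its elements inline, so a "move" has to visit
    // every element; for int elements that is simply a copy.
    std::array<int, 4> a{{1, 2, 3, 4}};
    const int* aData = a.data();
    std::array<int, 4> b = std::move(a);
    assert(b.data() != aData);  // b has its own storage
    assert(a == b);             // the int elements in a are unchanged

    // std::vector owns a heap buffer, so a move just transfers the pointer.
    std::vector<int> v{1, 2, 3, 4};
    const int* vData = v.data();
    std::vector<int> w = std::move(v);
    assert(w.data() == vData);  // same buffer, now owned by w
}
```

The cost of moving a std::array therefore grows with its size, which is why "moves are cheap" can't be assumed for every type.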

  10. @Ashish Sadanandan: Can you give me an example of (1) a function that would take an rvalue reference and would not necessarily move from it and (2) a caller of that function who would care whether the move was performed? I'd expect all such callers to write code downstream of the call assuming that a move had taken place, because passing an argument to an rvalue reference parameter means "this argument may be moved from."

  11. map::insert and map::emplace are very similar to (1). I know they involve universal references, and not rvalue reference parameters, but it's possible that there exist functions that take rvalue reference parameters and conditionally move depending on some internal logic. There's even a proposal to address the ambiguous nature of those map member functions: http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2014/n3873.html

    I agree with (2), for any sane type, it should be possible for the caller to inspect whether the type is "empty" after the call. But it's more lines of code that need to be written.

    I guess my argument is that, given a typical, cheap-to-move type, passing the argument by value is not going to affect the vast majority of software. If your application is so performance sensitive that it cannot tolerate a few extra pointer copies and nullptr assignments (unique_ptr move), then you should be profiling the crap out of your code, and not relying on any catch-all guidelines.

    Note that I'm not arguing your guideline is incorrect either, it too works in the vast majority of cases. I'm just not sure about the *always do this* part.

  12. If the function may throw and you want to guarantee strong exception safety, you have to accept arguments that you want to move from by rvalue reference.

  13. Sorry, I'm on vacation so I don't have time to read the whole article (or the private email thread which preceded it that's still in my inbox), but I got as far as the following and I think you have a wrong premise here:


    > Suppose you see this function signature:
    > void f(SomeType&& param);
    > What does this tell you about param?

    Answer: It tells you that:

    - f accepts its parameter by reference (e.g., so no copy will be performed).

    - f can accept both named and temporary objects (l/r/whatever-values).

    - f can modify the argument you pass (because it’s not const).

    I’m not aware of pass-by-&& being an idiom (we're still learning C++11/14 idioms), but if it were one my first guess would be that it means “in/out parameter where you explicitly don’t care about losing side effects since you’re enabling passing a temporary object”...


    > The fact that it's an rvalue reference tells you that it's a candidate to be moved from

    No, I don't think so. This does not tell you it’s a candidate to be moved from any more than passing by plain &-to-non-const would (you can move from those too) -- and both require an explicit std::move and aren't allowed implicitly. In fact, move operations usually take their source parameters by plain &-to-non-const...

    This tells you at most that what you pass is being taken by reference etc. If anything it means a writable in/out parameter that can also accept a temporary (unlike the usual pass-by-&).

    Sorry if I missed something.

  14. I'm surprised by Herb's reply. If a function accepts a parameter by rvalue reference I expect the function to move from the argument, just like move constructors or move assignment operators do. And if a function accepts a parameter by a plain reference to non-const, I expect the function to mutate the argument, but generally not move the guts out of it.

  15. The fact that Herb says SomeType&& accepts "l/r/whatever-values" makes me think he mistook SomeType for a template parameter and assumed a universal reference. (Those are technically rvalue references, and I'm not sure whether Herb prefers to use the term universal reference.)

  16. @Herb Sutter: Look at the signature for f again. It can’t accept lvalues, and it would rarely make sense for param to be an in/out parameter, because it can bind to temporaries.

    Contrary to what you wrote, move operations (i.e., move constructor and move assignment operator) definitely take their parameters by rvalue reference, not lvalue reference.

    I suggest you read the post again more carefully.

  17. @Ashish Sadanandan: My goal is to make things as simple and uniform as possible. Being able to say "always pass sink arguments by rvalue reference" is simpler and more uniform than "pass expensive-to-move sink arguments by rvalue reference and cheap-to-move sink arguments by value." Furthermore, I don't see any advantage to the more complicated rule. What are you gaining in return for having to think about how expensive it is to move and the assumption that the additional cost won't matter?

    I think Stephan's point about strong exception safety is relevant, too. Passing by value can preclude it in a way that pass by rvalue reference can't.

    Under what conditions is pass by value a win (vis-a-vis pass by rvalue reference) for a move-only sink parameter?

  18. @Anonymous: Thanks for the heads-up about the syntax highlighting snafu. Unfortunately, how that works is a black box to me. What little I know is described here.

    Getting C++ code into blogger is a pain, so if you know a better way for me to do it, please let me know.

  19. "Getting C++ code into blogger is a pain, so if you know a better way for me to do it, please let me know."

    Use wordpress: http://crazycpp.wordpress.com/wtf/

  20. " It can’t accept lvalues, and it would rarely make sense for param to be an in/out parameter, because it can bind to temporaries. "

    There actually was a case in which I was sorely tempted to misuse rvalue references in just this manner. I had a visitation framework for a complex container of objects that could be almost anything. Sometimes I wanted the visitor to be actively gathering data that I could retrieve from it later. Other times I wanted to just pass in a temporary and be done with it.

    Options that I seemed to have were either to write distinct accept() functions that accepted the different reference types, or to use rvalue references.

    With an adequately descriptive, renamed copy of "std::move" I could get rid of the move semantic. I could just do something like accept(make_visit(my_visitor)). So long as the accept function didn't actually move the variable I'd be fine.

    In the end I decided this was just too risky. Left too much to convention. Decided instead to go with the multiple accept function route.

    I think Herb's point is on the mark. You don't know what fun(type&&) does. It *can* take an lvalue because it's too easy to turn one into an rvalue.

    The exception safety thing is indeed an issue. I hit up Abrahams about it when I was bothered by std::auto_ptr semantics and exception safety:

    > From: Noah Roberts
    > Subject: Exception guarantees and auto_ptr parameters
    >
    > Message Body:
    > I have a question regarding your exception guarantees and how one
    > should classify a certain class of functions.
    > struct thing;
    > struct has_things
    > {
    >     void add_thing(std::auto_ptr<thing>);
    > private:
    >     std::list<thing*> things;
    > };
    > void has_things::add_thing(std::auto_ptr<thing> t)
    > {
    >     things.push_back(t.get());
    >     t.release();
    > }
    >
    > Since the only thing that can throw here, and it isn't very likely, is
    > the allocation of new space for the list node we can guarantee that
    > things is not changed if the function fails. No memory is leaked
    > either since the auto_ptr won't release if this happens and scope will
    > delete the thing.
    >
    > The question I have here is whether this function conceptually fits
    > within the strong guarantee classification. On the one hand, program
    > state was altered because the pointer was deleted. From the
    > perspective of any client or the 'has_things' instance itself though
    > nothing is any different than if the call had succeeded, except that
    > has_things hasn't been altered. Thus on the other hand it seems that
    > "state" in the important sense has not been altered.
    >
    > I would be very interested in your opinion on the matter.

    Hi Noah,

    This is clearly a borderline case—i.e. it's clearly unclear ;-).

    I draw a conceptual line after all the arguments have been passed and
    before the body of the function begins to execute, and ask whether any
    subsequent effects are visible outside the function in case of an
    exception. In this case, they might be visible but only if there's a
    stray pointer/reference into *t.get() somewhere. Reference semantics
    complicates everything :(

    I might classify this as a strong guarantee function but
    probably wouldn't rely on that classification as the entire
    documentation of the exception guarantee. auto_ptr is weird, since it
    doesn't obey regular value semantics rules. You can only really
    understand such functions in terms of the expected transfer of
    responsibility that goes with passing auto_ptr arguments.

    Sorry I can't say something more definitive, and I hope this helps.

  21. I wrote up a comment, submitted, and it took me to another view. My comment didn't show (yet). Here is the short version.

    If I were consistently using std::vector&& to indicate a sink, how/where/when would I use std::vector? I feel like const std::vector& would be the only other one that mattered.

  22. @The Buzz Saw: Pass by value is for when (1) you unconditionally want to make a copy of whatever is passed in and (2) you don't want to overload on lvalues and rvalues and (3) you don't want to use a universal reference.

    If you want maximal efficiency, pass by value never makes sense for types that are cheaper to move than to copy. But overloading can be a hassle (multiple functions to maintain, and the number of needed overloads grows geometrically with the number of parameters you want to handle), and universal references can be a pain, too (for reasons I summarized in an earlier comment). In such situations, you may decide to give up some efficiency in exchange for not having to deal with the hassle and pain of overloading or universal references.

  23. @Crazy Eddie:

    I think Herb's point is on the mark. You don't know what fun(type&&) does. It *can* take an lvalue because it's too easy to turn one into an rvalue.

    No, no, no, no, no. fun(type&&) cannot take an lvalue. Period. Full stop. Rvalue references can bind only to rvalues. A caller with an lvalue can cast the lvalue to an rvalue (typically using std::move), but then the caller is saying "I give you my express permission to treat this as an rvalue, meaning that you should move from it if that lets you do your job more efficiently." When called in that way, fun receives an rvalue, because fun always gets an rvalue.

    If you have an lvalue that you don't want to have messed with, don't cast it to an rvalue and pass it to a function taking an rvalue reference. Similarly, if you have a const object whose value you don't want messed with, don't cast it to a non-const and pass it to a function taking an lvalue reference to non-const.

  24. Scott, when you want to take a copy, but the type is cheaper to move than copy, how does copy elision come into play? (vis-a-vis Dave Abrahams' seminal article linked to above)

    That is, for a constructor as follows, will copy elision come into play, and will it be as efficient as passing by &&?

    ctor(type val) : val_(std::move(val)) {}

  25. Taking sink parameters by rvalue reference makes it more difficult to pass in a copy of an lvalue. For example, if a parameter is shared_ptr I'd quite often want to pass in a shared_ptr lvalue ptr I created earlier and still access it afterwards; I need a facility to create a prvalue copy of ptr (and preferably without having to repeat the type of ptr or write decltype).

    Perhaps if there were such a facility in ? I think template<typename T> T copy(T const& t) { return t; } would work?
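
A minimal sketch of how such a helper could be used with a sink that takes std::shared_ptr by rvalue reference. The Cache class, its adopt member, and demo are hypothetical names for illustration, and copy here is a user-written function, not a standard library facility.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Returns a prvalue copy of its argument, so the caller can hand a copy
// to an rvalue reference sink without naming the argument's type.
template <typename T>
T copy(T const& t) { return t; }

// Hypothetical sink that takes ownership through an rvalue reference.
class Cache {
public:
    void adopt(std::shared_ptr<int>&& p) { held = std::move(p); }
    std::shared_ptr<int> held;
};

void demo() {
    auto ptr = std::make_shared<int>(7);
    Cache c;

    c.adopt(copy(ptr));            // pass a copy; ptr stays usable
    assert(ptr && *ptr == 7);
    assert(ptr.use_count() == 2);  // shared between ptr and c.held

    c.adopt(std::move(ptr));       // or give up ownership explicitly
    assert(ptr == nullptr);
}
```

The call site then says which one is intended: copy(ptr) keeps the caller's pointer alive, std::move(ptr) gives it away.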

  26. @lori: In your example code, there are two named objects that have to be constructed: val and val_. Copy elision can't change that. (After inlining, compilers might find ways to apply the as-if rule to eliminate the code constructing and destructing val, but that's not related to copy elision.)

    Somewhere along the line, the idea that copy elision makes everything free seems to have arisen and taken root, but that's not an accurate reflection of reality. In your example, if val were an rvalue reference, the number of constructions would be one instead of two.
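
As a rough illustration, here is a sketch with a hypothetical copyable type, Counted, that counts its copy and move constructions. Holder mirrors the constructor in the question; the names are made up for this example.

```cpp
#include <cassert>
#include <utility>

// Hypothetical copyable type that counts copy and move constructions.
struct Counted {
    static int copies;
    static int moves;
    Counted() = default;
    Counted(const Counted&) { ++copies; }
    Counted(Counted&&) noexcept { ++moves; }
};
int Counted::copies = 0;
int Counted::moves = 0;

class Holder {
public:
    explicit Holder(Counted val) : val_(std::move(val)) {}
private:
    Counted val_;
};

void demo() {
    // Lvalue argument: one copy to build val, one move to build val_.
    Counted source;
    Holder fromLvalue(source);
    assert(Counted::copies == 1 && Counted::moves == 1);

    // Temporary argument: elision can construct val in place, but val_
    // still has to be move-constructed from the named parameter.
    Counted::copies = Counted::moves = 0;
    Holder fromTemporary{Counted{}};
    assert(Counted::copies == 0 && Counted::moves >= 1);
}
```

Elision can remove the construction of the parameter from a temporary, but it never removes the construction of val_ from val.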

  27. With regards to performance, I actually expect passing unique_ptr by value to be *more* efficient, not less.

    Keep in mind:

    - I would expect all unique_ptr move constructor and destructor code to be inlined
    - More often than not, sink functions for unique_ptrs will *not* be inlined.

    So, under those circumstances, if I pass a unique_ptr by (rvalue) reference:

    - The compiler is likely forced to commit the unique_ptr itself to memory to pass the address (as the rvalue reference)
    - The sink function then has to read the passed pointer (reference) to the unique_ptr, and then read again to get the pointee.
    - Upon return of the sink function, the compiler doesn't know if the unique_ptr has actually been moved from, so...
    - When it comes time to destruct or reset the unique_ptr, it has to read it back from memory
    - and have a branch to see if the pointee should be destructed.

    If I pass the unique_ptr by value:

    - The compiler doesn't need to commit it to memory, as the pointee within the unique_ptr can be passed directly via a register.
    - The compiler knows at call site that the moved from unique_ptr is now 'null', and in most cases, will not need to check if the unique_ptr has a value which needs to be deleted after the sink returns
    - The sink function has the unique_ptr value already available via a register
    - If it moves from that argument, the compiler also knows it won't need to check for deleting there.

  28. So just to reiterate the 2 issues I've found in particular with passing a move only type by value.

    First, passing by value means that the move construction happens at the call site. Move construction has side effects, and this can lead to bugs:

    void foo(std::unique_ptr<int> p, int x);

    std::unique_ptr<int> ip = /*something*/;

    foo(std::move(ip), *ip); //undefined behavior

    This is essentially the same as doing something like this:

    void bar(int x, int y);

    bar(i++, i);

    If foo instead passed p by rvalue reference, then the move construction happens inside the body of the function and is sequenced after the dereference.
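
A minimal sketch of the rvalue reference version, assuming a foo that returns the sum of the pointee and x so the result can be checked (the return value and demo are additions for illustration). Since std::move(ip) is only a cast, ip is still non-null when *ip is evaluated, whichever argument the compiler evaluates first.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// The move out of the caller's unique_ptr happens inside the body,
// after every argument expression has already been evaluated.
int foo(std::unique_ptr<int>&& p, int x) {
    std::unique_ptr<int> owned = std::move(p);
    return *owned + x;
}

void demo() {
    auto ip = std::make_unique<int>(21);
    int result = foo(std::move(ip), *ip);  // *ip reads a live object
    assert(result == 42);
    assert(ip == nullptr);                 // ownership moved inside foo
}
```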

    The second issue is with exception unwinding semantics.

    Let's look at this call again:

    foo(std::move(ip), 0);

    If foo() throws an exception, we have no possible way to "undo" the move operation because the body of foo() has no alias to ip. The object pointed to by ip is lost.

    If on the other hand, we pass by rvalue reference, we can design our function to only do the move operation if there are no errors.

    void foo(std::unique_ptr<int>&& p) {
        if (*p <= 0) {
            throw some_error();  // thrown before the move, so the caller still owns the object
        }
        auto something = std::move(p);
        //...
    }


    try {
        foo(std::move(ip));
    }
    catch (some_error& e) {
        std::cout << "Invalid value for p : " << *ip << std::endl;
    }

    Essentially this allows us to design interfaces which will "Move the object if and only if no exception occurs". I think in general this design is more robust than a by-value approach, which always moves the object and causes it to be unconditionally destroyed when an exception is thrown.
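
Here is a self-contained sketch of that contrast. The names sinkByRvalueRef, sinkByValue, and stored are hypothetical, and some_error is given a concrete definition only so the example compiles.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>

struct some_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

std::unique_ptr<int> stored;  // hypothetical final destination of the sink

// Validates first and moves only on success.
void sinkByRvalueRef(std::unique_ptr<int>&& p) {
    if (*p <= 0) throw some_error("non-positive value");
    stored = std::move(p);
}

// Same body, but the move into p already happened at the call site.
void sinkByValue(std::unique_ptr<int> p) {
    if (*p <= 0) throw some_error("non-positive value");
    stored = std::move(p);
}

void demo() {
    auto a = std::make_unique<int>(-1);
    try { sinkByRvalueRef(std::move(a)); } catch (const some_error&) {}
    assert(a != nullptr && *a == -1);  // the caller still owns the object

    auto b = std::make_unique<int>(-1);
    try { sinkByValue(std::move(b)); } catch (const some_error&) {}
    assert(b == nullptr);              // the object died with the parameter
}
```

Whether callers should rely on the "not moved on failure" behavior is a separate design question; the sketch only shows that the rvalue reference signature makes it possible.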

  29. If you take by value you're absolutely, 100% certain that the copy is yours and that nobody else can possibly mess with it. I find this an important thing to assert, through language syntax and semantics, because I run into all sorts of nonsense in the field. If someone *can* do it...it *will* be done.

    For example, spinning off a thread and handing a unique_ptr to a function taking unique_ptr&& and then using get() on that same ptr to hand it to someone else. This is mind numbingly stupid stuff for sure, but if you can use the value you "moved" into a sink function you can bet your life that someone will.

    If you take by value this is simply not possible to accomplish no matter how much casting you perform. Once you move into a parameter value your copy of the ptr is toast. If you try to access it you're going to immediately see undesirable results.

    I myself consider this to be a whole lot more important than performance in most situations. I see more damage done in the name of "performance" than any single other metric.

    I think that if you're scared of exceptions to general rules then you're in for trouble. Everything you might suggest to someone has an exception to it somewhere. Taking by rvalue reference seems to me to be an excellent example of micro-optimization at the expense of more important ideals, such as code clarity, security, and certainty. You want to, "Design your interfaces to be easy to use correctly and difficult to use incorrectly." Using rvalue reference to advertise "sinkness" fails that criterion because it's too easy to use incorrectly.

  30. @Anonymous

    A std::unique_ptr value arg will actually not be passed in a register on the common platforms (due to the non-trivial/deleted copy constructor and destructor).

  31. "I think Stephan's point about strong exception safety is relevant, too. Passing by value can preclude it in a way that pass by rvalue reference can't."

    I think this actually speaks toward the problem I have with this idea. Unless I'm misunderstanding the idea here, we want to call a function that won't change program state unless it can complete entirely...and in that we include the parameters that were sent to us to perhaps take ownership of. So then if we can't take that ownership we don't and the caller can recover.

    The problem there is that we are relying on a convention to protect us from abusive callers, and at the same time assuming that they'll break it. The convention I speak of is that simply calling 'move' on a variable and using it in an expression should result in the assumption that it's gone. We should not assume that it's only gone if an exception is not thrown within that expression...it's gone as soon as we made the call. As such we should not assume that taking the variable we're going to own by rvalue reference absolves us of the responsibility to deal with it. We have had ownership of that variable since we were first called.

    Breaking the convention that move destroys the source is a pretty dangerous one I think. It would make rvalue references perhaps the most dangerous language feature I've ever seen. If we don't keep with that convention then we're doing things akin to making '==' mean less than or something equally crazy. You can never be sure what you're reading unless you have read the code of the thing being called.

    Short of the possible performance considerations, I just don't see any good coming of taking a value you intend to own, which is what I assume 'sink' to mean, as anything other than a value. The rvalue reference does not clearly indicate that it's yours, it instead indicates that you can take it. IMO although these seem semantically similar statements they're not and the distinction is important.

  32. I've not felt the need to think much about how to pass move-only sink parameters because with C++11 it looks straightforward: 1. Sink parameters should be moved in. 2. C++11 has rvalue references precisely for the purpose of modelling "move". 3. Use "fun(Type&& param)" to mark param as being moved from. 4. Client code *has* to use "fun(std::move(blub))" to make the function call compile. => Crystal clear (C++11) semantics on both caller and callee side, if you ask me. That's simply the C++11 model with two copy-constructor variants "C(const C& other) C(C&& tmp)" applied to ordinary functions.

  33. @Crazy Eddie:

    I just don't see any good coming of taking a value you intend to own, which is what I assume 'sink' to mean, as anything other than a value. The rvalue reference does not clearly indicate that it's yours, it instead indicates that you can take it. IMO although these seem semantically similar statements they're not and the distinction is important.

    Can you elaborate on why you think the distinction is important? The only behavioral difference I can think of is the timing of resource release. With pass-by-value, it's presumably the time of destruction of the parameter (though it may be sooner or later, depending on what happens inside the function). With pass-by-rvalue-reference, I'd assume the same thing (because I assume the function would take ownership), but under your "maybe the function takes ownership, but maybe it doesn't" assumption, resource release is presumably at the time of destruction of whatever expression is passed to the function (though it may be sooner or later, depending on what happens inside the called function as well as what happens inside the calling function). Both scenarios cover a lot of uncertainty, in my view, and I'm having trouble envisioning scenarios where the difference matters.

  34. Well, note Matthew Fioravante's post above. He's using the value passed in after it's been passed into a function taking an rvalue reference. He's also using it after he's moved it in his earlier examples.

    This is bad jujitsu. It's something that should be avoided. The function that wishes to take ownership of a resource should claim that it's doing so in no uncertain terms. The rvalue reference parameter doesn't do that and like I said, and as you can see, if you allow someone to abuse a construct they will.

    The value makes it quite clear that calling that function means passing ownership of the resource. There's no doubt at all whether it's legal to use what was moved after that. So by taking by value the function says, "This will be mine now." Taking by reference it's instead saying, "This could be mine if I decide I want it."

    On a first-time written perspective I think you're right. There's no big difference. Resource destruction is really it. But when you get multiple people working on the two sides of the call, people who don't understand fully the rvalue reference convention, people who've accessed the moved value afterward (such as the example I cite above), people coming in later after the guys who made that setup are gone, etc... I think you're creating a maintenance nightmare and it's one that isn't needed. Someone comes along, decides they really need to actually move that object before the exception is thrown, and now client code breaks. If you assume rvalue and value mean the same thing then this should be a legal change to make and yet it blows up in your face...or the customer's.

    So if you want to claim ownership over the resource you should just do so, and do so right away. It's the cleanest, most certain thing to do leaving no room for the kind of bad stuff people will be tempted to do otherwise. Otherwise take by lvalue reference. IMO the use of rvalue references should be limited to a very few select places, like move constructors and "universal reference" uses in templates. If and when you do take by rvalue reference outside of those conditions I think you need to be very careful to document concerns and make sure nobody's breaking the convention you have in place...because at that point convention is all that saves you, the language can't help. I like to avoid those situations when possible and in this case it's really, really easy. I don't like making policy for the lowest common denominator but in this case I think the benefits outweigh the cost...this is a thing I could screw up.

  35. If I have to/can provide two overloads for foo:

    void foo(T&&);
    void foo(T);

    I usually end up with:

    template <class T> T copy(const T& t){ return t; }
    void foo(T&&);
    // usage:
    foo(std::move(t));
    foo(copy(t));

    and I'm sure I don't get any unnecessary copies.
    Greetings.
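A small sketch of how the `copy` helper from this comment behaves with an rvalue-reference sink. The `Sink` struct and the choice of `std::string` are assumptions for illustration only.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Helper from the comment above: returns a copy (a prvalue) of its argument.
template <class T>
T copy(const T& t) { return t; }

// Hypothetical sink that steals the contents of whatever it is given.
struct Sink {
    std::string stored;
    void foo(std::string&& s) { stored = std::move(s); }
};

// Returns true if the source is untouched after passing an explicit copy.
bool copyLeavesSourceIntact() {
    Sink sink;
    std::string t = "hello";
    sink.foo(copy(t));        // t is copied; the temporary is moved from
    assert(t == "hello");     // the original still holds its value
    return t == "hello" && sink.stored == "hello";
}

// Returns what the sink holds after the caller gives up its object.
std::string moveHandsOverValue() {
    Sink sink;
    std::string t = "world";
    sink.foo(std::move(t));   // t is now valid but unspecified
    return sink.stored;
}
```

Either way the call site states whether the caller keeps its object.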

  36. Update: Please ignore my previous thinko -- right, f can’t accept an lvalue unless you std::move it explicitly at the call site (i.e., cast it to an rvalue). Sorry for the noise.

    I have now read the article and am not convinced of the problem. However, it's great to discuss and debate these things as we're still learning the conventions of C++11/14 including &&.

    tl;dr: I continue to believe you should express sinks using pass-by-value by default. Of course, do fall back to the option of pass-by-&& if you have demanding needs for it, notably if you can prove you need the incremental performance gain.

    I believe the correct advice is still "to express a sink (always move ownership in), pass by value." This is simple, clear, correct, and consistent -- consistent across both copyable and move-only types, stays consistent if you turn the function parameter into a 'universal reference'. And the performance overhead for unique_ptr in particular is guaranteed to be nearly-or-close-to a single raw pointer copy operation (if it isn't optimized away) -- there are rare and subtle cases where unique_ptr is not exactly as cheap as a raw pointer but I don't think this is one of the cases where it's materially different.

    As was pointed out, a *default* style of "pass by && (sometimes?)" is harder to teach and learn (what are the exact rules?), can leave dangling references, changes the code's meaning if you change to/from being a template (e.g., for forwarding), and in general seems complicated without sufficient motivation for the complication.

    But yes, do fall back to pass by && as an advanced option (not default) if you can prove a need. As far as I can tell, the argument for proving you do need it boils down to proving that you need the performance of reducing two moves to one move. (Another comment mentioned passing by && for exception safety, but if I understood that correctly it would apply to going from one move to no moves, whereas the motivating case in the article was going from two to one -- that is, you're going to move anyway.)

    Re && being the rule already: I don't see it that way, it's the exception rather than the rule today IMO. We correctly teach pass by value the vast majority of the time where the alternative of pass by && would arise. In move constructors and operators (primarily) you have to write &&, but it seems to me that's where you *implement* move, and so that doesn't make it the default for actually *using* move where the default is definitely pass/return by value (e.g., in every by-value return statement).

    Re this:
    > Pass by value for UDTs was unidiomatic in C++98, and I think it should be no more idiomatic in C++11.

    But that's just not right -- pass and return by value is very much more idiomatic in C++11 and rightfully so (we added a major fundamental optimization to copying, and so of course that changes copying idioms). Examples include the return statement (return by value is much more recommended including for containers), the assignment operator taking its parameter by value (something only a few like Andrei argued for in C++98 and is now common in C++11), and other cases.

    Value semantics are good -- simple, clear, consistent. They are also the default way to use move operations today, plumbed through into return statements etc. Use them by default. *Teach* them as the default. (Absent a strong argument to the contrary of course but I don't think we have that here yet...) Of course, you can always resort to a non-default when you have the need, it's a rich language, but I still think && is an advanced knob not a default.

  37. I did a fair share of moving the last year or so, and I too was often in doubt whether to use lvalues, rvalues or even references. In my particular case it boiled down to having functions accept a buffer, and optionally a handle - the buffer being within the handle's resource.
    bool send(const void *pStart, unsigned size, handle_t &&handle = handle_t())
    {
    // do some stuff and checks (1)
    _queue.emplace(pStart, size, std::move(handle)); // accept data and take ownership away (2)
    //....
    }

    The thing that motivated this scheme was that the move handle was optional (you could send some static read-only buffer too) and this way you can use a default argument.
    But more importantly, passing-by-value would mean that the handle is lost when the function returns or throws in (1). This is not the strong exception guarantee; even if you make sure nothing returns or throws, it's still something unintuitive you have to write down so no one else will mess it up later.

    Given that I work with custom containers with fixed size, a full queue would just mean to retry later. I think most embedded and realtime guys will use queues with a fixed size instead of the infinite-ram and infinite-delay STL Implementations?

    I am looking forward to some books for good C++11 style, but I really hope they will fit the whole range of C++ applications. The consistent and good-everywhere pattern would be just to use rvalues whenever stuff gets moved.

  38. (post from July 25, 2014 at 11:15 AM was mine also)

    Herb wrote: 'As was pointed out, a *default* style of "pass by && (sometimes?)" is harder to teach and learn (what are the exact rules?), can leave dangling references, changes the code's meaning if you change to/from being a template (e.g., for forwarding), and in general seems complicated without sufficient motivation for the complication.'

    I'm not aware of the template vs non-template thing. Can you point me in the right direction?

    Some other - potentially related - thing: what would best practices be for functions with callbacks:

    template <class T>
    void foo(T cb); // bad for big function objects
    template <class T>
    void foo(const T &cb); // function objects immutable!
    template <class T>
    void foo(T &cb); // no temporaries!
    template <class T>
    void foo(T &&cb); // Winner?
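A hedged sketch of what the last signature does, not a ruling on the commenter's question. The `Counter` function object is a made-up example, and `std::forward` is used on the assumption that the callback is invoked exactly once per call.

```cpp
#include <cassert>
#include <utility>

// Hypothetical function object whose call operator mutates its own state.
struct Counter {
    int calls = 0;
    void operator()() { ++calls; }
};

// In a template, T&& deduces to an lvalue or rvalue reference as needed.
template <class F>
void foo(F&& cb) {
    std::forward<F>(cb)();  // invoke the callback with its original value category
}

// Returns how often the caller's own object was invoked.
int callTwiceOnLvalue() {
    Counter c;
    foo(c);          // binds to the caller's object: no copy, mutation is visible
    foo(c);
    foo(Counter{});  // temporaries are accepted as well
    assert(c.calls == 2);
    return c.calls;
}
```

This form copies nothing, allows mutable function objects, and accepts temporaries, which are the three objections listed against the other signatures.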

  39. @Herb Sutter
    > "Another comment mentioned passing by && for exception safety, but if I understood that correctly it would apply to going from one move to no moves, whereas the motivating case in the article was going from two to one -- that is, you're going to move anyway."

    If you accept a value by rvalue reference, you can delay the actual moving until you can be sure that whatever your function does won't fail, e.g. until you have acquired some required resource. (Or in more complicated cases you might move from the argument and then, if an exception is thrown, move back the value into the argument in an exception handler -- assuming the type has a nothrow move assignment operator).

    If however the function accepts a parameter by value, the value will be irrevocably gone when the function throws an exception.

    Note that std::move doesn't actually move anything, it just creates an rvalue reference.

    Consider for example a bounded concurrent queue of move-only work items. If you want to give the caller of a push operation the chance to recover from a failed push without losing the work item and you want to report errors with an exception, then you cannot accept the work item by value.
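The bounded-queue scenario can be sketched as follows. This is a single-threaded simplification with no locking, and `BoundedQueue`, `WorkItem`, and the choice of `std::length_error` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <stdexcept>
#include <utility>

using WorkItem = std::unique_ptr<int>;  // stand-in for a move-only work item

class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

    // Rvalue-reference parameter: nothing is moved until the check has passed.
    void push(WorkItem&& item) {
        if (items_.size() >= capacity_)
            throw std::length_error("queue full");
        items_.push_back(std::move(item));
    }

    std::size_t size() const { return items_.size(); }

private:
    std::size_t capacity_;
    std::deque<WorkItem> items_;
};

// Returns true if the caller still owns the item after a failed push.
bool callerKeepsItemOnFailure() {
    BoundedQueue q(1);
    q.push(WorkItem(new int(1)));
    WorkItem item(new int(2));
    try {
        q.push(std::move(item));  // throws: the queue is already full
    } catch (const std::length_error&) {
        assert(item != nullptr);  // std::move alone did not move anything
        return item != nullptr && *item == 2 && q.size() == 1;
    }
    return false;
}
```

Had `push` taken `WorkItem` by value, the item would have been moved into the parameter before the capacity check and destroyed when the exception propagated.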

  40. I just replied to Herb, but got an error from the blog software. I hope the comment went through anyway.

  41. Stephan: Right. I was also thinking that there were some reported issues with, I believe, map::insert, which doesn't overwrite the existing value if there is one in the map already, but the argument was moved from anyway, whereas you could (and I believe it was) argued that it should not move from the argument if the value wasn't stored in the map. So there are conditions for maybe moving apart from just strong exception guarantee.

    I guess that the point I'm making here is that these are exceptions, not the rule. Passing by value will be fine for most functions. You can take by value now and come back later if you need more complex semantics. The always-move semantics of taking by value are plain for all to see and easy to use and understand, and much more consistent and generic than taking move-only types by rvalue ref.

    > I think Stephan's point about strong exception safety is relevant, too.

    Hey, it is I, the Great Anonymous, who originally raised the strong exception guarantee point in comment #2. But I guess that if you want credit, you should probably not comment anonymously :P Of course, arguably, moving from the argument is an effect of the caller, not the callee, so the strong exception guarantee I feel need not necessarily cover not moving from the argument. Whether or not that's the optimal behaviour remains to be seen, but I certainly feel like it is the intuitive behaviour. There is no ambiguity over what will occur when you pass by value.

  42. @Stefan - I really must dispute that what you're talking about is a good thing. We should stick to the convention that moving a value destroys it, even if that happens to sometimes not be the case. The rvalue reference is the wrong thing for the job you're expecting of it. Instead, you should provide an actual type that provides the behavior you want...namely that of being able to recover ownership if transfer fails.

  43. I've been marking sink parameters as rvalue refs since they were introduced and I have to say it works very well. There is a clear message being sent, that the argument will get "taken" and there is no worry for types that may be/become copyable. After all, if the intention is to move the argument - be explicit about it.

  44. Let's go back to basics.

    1. The language says that sink parameters should be rvalue references. This is why the signatures for the compiler-generated move operations (move constructor and move assignment operator) include rvalue reference parameters.

    2. The standard library says that sink parameters should be rvalue references. This is why all the insert, push_back, and push_front functions for rvalue arguments use rvalue reference parameters. This applies even for unique_ptr, as can be seen in the shared_ptr interface: the shared_ptr constructor taking a unique_ptr rvalue takes it by rvalue reference. An analogous example for a different move-only type occurs in the shared_future interface: the shared_future constructor taking a future rvalue takes it by rvalue reference. There are, as far as I know, zero occurrences of by-value parameters for move-only types in the standard library. (I don't know of any for user-defined types that are both copyable and movable, either, but this blog post addresses only move-only types.)

    3. In the general case, using pass by value for move-only types can lead to nontrivial performance costs. Consider the active object class Active that Herb published here. Active is a move-only type containing two unique_ptrs and a message queue. Moving that is already probably three to four times as expensive as moving a unique_ptr (it depends on the cost of moving a message queue), but let's make things more interesting by assuming that the Active sometimes engages in encrypted I/O using PGP, and as such it caches its public key as a data member. I know nothing about PGP (but I hope this example makes sense in concept), but when I see public key blocks, they're generally a few hundred characters in length, so let's assume that Active has a data member of something like std::array<char, 500>. (We could use a std::string, but why put a fixed-size string on the heap?) Now moving an Active calls for moving two unique_ptrs, a message queue, and a std::array<char, 500> (and moving a std::array<char, 500> calls for copying it). Do you still think that passing sink arguments of this type by value is preferable to passing it by rvalue reference? Passing it by rvalue reference has essentially zero cost.

    Using pass-by-value to express "sinkness" for parameters of move-only types is thus contrary to the way the language is designed, contrary to the way the standard library is designed (including the specific case of passing unique_ptrs as sink arguments), and contrary to the goal of adopting conventions that tend to avoid performance problems (i.e., avoiding premature pessimization).
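The difference in move count can be made visible with a small instrumented type. `Tracked`, `ByValue`, and `ByRvalueRef` are hypothetical names; for a type like the `Active` example, each counted move would stand for moving two unique_ptrs, a queue, and a 500-byte array.

```cpp
#include <cassert>
#include <utility>

// Hypothetical move-only type that counts its move constructions.
struct Tracked {
    static int moves;
    Tracked() = default;
    Tracked(const Tracked&) = delete;
    Tracked(Tracked&&) noexcept { ++moves; }
};
int Tracked::moves = 0;

// Sink taking its parameter by value.
struct ByValue {
    Tracked t;
    explicit ByValue(Tracked p) : t(std::move(p)) {}
};

// Sink taking its parameter by rvalue reference.
struct ByRvalueRef {
    Tracked t;
    explicit ByRvalueRef(Tracked&& p) : t(std::move(p)) {}
};

int movesByValue() {
    Tracked src;
    Tracked::moves = 0;
    ByValue holder(std::move(src));      // move into the parameter, then into the member
    assert(Tracked::moves == 2);
    return Tracked::moves;
}

int movesByRvalueRef() {
    Tracked src;
    Tracked::moves = 0;
    ByRvalueRef holder(std::move(src));  // reference binds; only the member is move-constructed
    assert(Tracked::moves == 1);
    return Tracked::moves;
}
```

The counts apply when the argument is an existing object passed through `std::move`; a temporary passed by value can have its first move elided.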

  45. @Herb:

    I continue to believe you should express sinks using pass-by-value by default. Of course, do fall back to the option of pass-by-&& if you have demanding needs for it, notably if you can prove you need the incremental performance gain.

    This is the opposite of what we recommend for possibly-copied arguments. We don't say "pass them by value, unless that's too slow." We say "pass them by reference (by const T&, to be precise), unless that's too slow (as it can sometimes be for very small objects, e.g., of built-in type)." The default is to pass by reference, because efficiency is a good habit to get into. Why should we adopt a different philosophy for sink arguments?

    I believe the correct advice is still "to express a sink (always move ownership in), pass by value." This is simple, clear, correct, and consistent -- consistent across both copyable and move-only types, stays consistent if you turn the function parameter into a 'universal reference'.

    It's actually inconsistent, and that's what prevents it from being simple and clear. It's contrary to the general C++ philosophy of doing the most efficient thing by default (which is to pass by reference). It's contrary to the conventions used in the STL (per my other post). It can't be reasonably applied to types that aren't trivially cheap to move (per my other post). It can't be seamlessly replaced with pass-by-universal reference, because (1) you lose the side effects of parameter construction and destruction and (2) some calling code may break, because there are arguments that can be passed to a type that can't be passed to a reference (e.g., braced initializers, overloaded function names, and a hodge-podge of other perfect forwarding failure cases).
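One of the perfect-forwarding failure cases mentioned here, braced initializers, can be sketched like this. The function names and the use of `std::vector<int>` are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// By-value parameter: it can be initialized directly from a braced list.
std::size_t sinkByValue(std::vector<int> v) { return v.size(); }

// Universal-reference version: T must be deduced from the argument.
template <class T>
std::size_t sinkByUniversalRef(T&& v) { return std::forward<T>(v).size(); }

std::size_t demoBracedInit() {
    std::size_t n = sinkByValue({1, 2, 3});           // OK: builds the vector parameter
    // sinkByUniversalRef({1, 2, 3});                 // error: T cannot be deduced from a braced list
    n += sinkByUniversalRef(std::vector<int>{4, 5});  // OK once the type is spelled out
    assert(n == 5);
    return n;
}
```

Calling code that passed a braced list to the by-value version would stop compiling after a switch to the template.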

    The simple, clear, consistent rule is to declare sink parameters of move-only types by rvalue reference.

    Re && being the rule already: I don't see it that way, it's the exception rather than the rule today IMO. We correctly teach pass by value the vast majority of the time where the alternative of pass by && would arise.

    I'm not sure who "we" is. As far as I know, there are zero examples in the standard library--not even for unique_ptr (per my other post). I certainly don't teach people to pass by value, and I go out of my way to warn them that move semantics doesn't eliminate the need to worry about unnecessary object constructions and destructions. When did we decide to teach people that creating objects willy-nilly is proper software design?

    But that's just not right -- pass and return by value is very much more idiomatic in C++11 and rightfully so (we added a major fundamental optimization to copying, and so of course that changes copying idioms). Examples include the return statement (return by value is much more recommended including for containers), the assignment operator taking its parameter by value (something only a few like Andrei argued for in C++98 and is now common in C++11), and other cases.

    Returning is not the topic under discussion. I limited my post to sink parameters of move-only types for a reason.

    As for parameters for assignment operators, my preference is still to pass them by reference, and the C++ Standard and its library agree.

    [Continued in next comment, because blogger limits comments to 4096 characters, sigh.]

  46. [Continued from previous comment.]

    Value semantics are good -- simple, clear, consistent. They are also the default way to use move operations today, plumbed through into return statements etc. Use them by default. *Teach* them as the default. (Absent a strong argument to the contrary of course but I don't think we have that here yet...) Of course, you can always resort to a non-default when you have the need, it's a rich language, but I still think && is an advanced knob not a default.

    Rvalue references are the most pervasive new feature in C++11. They're used everywhere. I don't believe you can be a decent C++11 programmer (much less an effective one) without understanding them and being comfortable with them. They are no more advanced than lvalue references, which I hope we can agree is not an advanced knob.

    Again, the simple, clear, consistent rule is to declare sink parameters of move-only types by rvalue reference. Defaulting to pass by value is to default to a technique offering suboptimal performance that's inconsistent with the conventions of the standard library (and, I'm guessing, most other industrial-strength libraries, though I have not checked).

    Replies
    1. This is a well reasoned and convincing rebuttal. As Scott says, there is simply no substitute for understanding. I am utterly baffled why the author of the "Exceptional C++" books is making these kinds of arguments. Surely there are much better targets for simplification in C++ than this one.

  47. @Edward Catmur: I'm having trouble understanding what you want to do. If you want to create a copy of an lvalue, pass it to a sink function, then access it later, why not just do this?

    auto copyOfLValue = myLValue;
    sinkFunction(std::move(copyOfLValue));

    I think this gives you everything you say you want. It requires no type repetition and no decltype, and, unlike your solution, it gives you access to the copy of the lvalue after the call (though at that point the copy probably has a different value than the original lvalue).

  48. 1. The language says that sink parameters should be rvalue references. This is why the signatures for the compiler-generated move operations (move constructor and move assignment operator) include rvalue reference parameters.

    This is true, but irrelevant. First, the language definition is created from way back before anybody had any clue about how to optimally use rvalue references. Secondly, you can't go back and fix the language definition if it happens to be too slow for your specific case. This is not true for this specific interface consideration. Thirdly, it's arguable that a different definition would be optimal for assignment operators in general, and move constructors simply don't have a choice to make here so there's no meaning in what decision they made. Ultimately, you're talking about the same people who voted in prioritization for initializer list constructors, who are trying to get in VLAs, who voted in export. We're not talking about a group of people with a track record of getting the big stuff right, let alone the small stuff.

    2. The standard library says that sink parameters should be rvalue references. This is why all the insert, push_back, and push_front functions for rvalue arguments use rvalue reference parameters.

    The stdlib has to deal with many conditions that user-written code rarely has to face. But they fall victim to the same issues as the language definition- it's pretty debatable which version would actually be better in a good few of these cases, there's the potential for compatibility screwups, and the stdlib's authors did not have five years experience using rvalue refs to call upon when making these decisions.

    (We could use a std::string, but why put a fixed-size string on the heap?)

    Because they're O(1) move, which is a very serious performance consideration that could, in many cases, far outweigh the price of dynamically allocating it in the first place. It's far from a foregone conclusion that allocating it as an automatic storage duration data member is the more efficient choice. The very argument you're using is the reason why.

    This is the opposite of what we recommend for possibly-copied arguments. We don't say "pass them by value, unless that's too slow." We say "pass them by reference (by const T&, to be precise), unless that's too slow (as it can sometimes be for very small objects, e.g., of built-in type)." The default is to pass by reference, because efficiency is a good habit to get into. Why should we adopt a different philosophy for sink arguments?

    Not sure what "we" you're referring to here, I always take by value every time unless it's too slow. The advice you're referring to may have been valid C++03, but it's certainly not C++11. If you want to take a copy of the argument std::string, and I have an rvalue std::string, then you waste a copy when it could have been a move. That's exactly the opposite of the efficiency you want.

    continued...

  49. continuation...

    It's actually inconsistent, and that's what prevents it from being simple and clear. It's contrary to the general C++ philosophy of doing the most efficient thing by default (which is to pass by reference).
    That is a broken, primitive philosophy that has no place in the modern programming world. Screw efficiency. The only thing that matters about efficiency is that I can have it in the 0.1% of my code hot path that my profiler found me. Efficiency in the rest of my codebase is utterly irrelevant. Having clear, correct and safe semantics is of far greater value.

    Also, I think that "Take everything by value unless you happen to really need a reference for some important reason" is pretty consistent, simple and clear.

    When did we decide to teach people that creating objects willy-nilly is proper software design?

    When it became clear that people could manage them far better than references, and easily create simple, correct, maintainable code, and with the extra time they didn't have to spend rooting out bugs, optimize their hot paths better than before, leading to an overall performance increase on top of reducing errors and increasing maintainability. Creating objects willy-nilly is having your cake and eating it.

    The real way to gain performance increases in non-hotpath code is to increase the power of automated optimizers. Having a human do the optimizing is a total waste of the human's time since he will very rarely make any appreciable difference, there's far more code than he could possibly manage, and he will just get it wrong anyway. If the cost of moves seriously concerns you, you should instead consider lobbying to substantially increase the compiler's scope for object elision. The current rules are excessively restrictive. Then instead of obsessing over object moves by hand, you could have an automated program take care of it for you- the very thing we endeavour to accomplish by writing programs in the first place. Why do a thing when you can have someone else write a program to do that thing for you.

    I'm not sure who "we" is. As far as I know, there are zero examples in the standard library--not even for unique_ptr (per my other post).

    The Standard library is full of mistakes. std::async, std::thread, std::vector::at/operator[] are the wrong way around, unconstrained std::function, std::auto_ptr, no Unicode, iostreams, iterators instead of ranges, I could go on. The best that can be said in this case is that it's not obviously incorrect.

  50. (Reposting with fixed typos -- I wish there was an "edit my comment" feature.)

    Scott, thanks for the thoughtful reply. I am writing a detailed response, but I need two points of clarification please to be sure I understand your position correctly.

    1. In various places in your reply, I’m unsure whether you are claiming one or both directions of the following:

    (a) If a parameter will be moved from, then it should be passed by rvalue reference.

    and/or

    (b) If a parameter is passed by rvalue reference, then [you can infer that] it will be moved from.

    Are you claiming (a), (b), or both? They are potentially different arguments and I want to be sure I understand which one(s) you’re saying.

    2. In various places you seem to use "parameter will be moved from" and "sink parameter" mostly interchangeably. For the purposes of this discussion, do you think the two are the same or different?

    Thanks.

    Herb

  51. @Herb Sutter: Sorry for the long delay in replying. This is my first opportunity to get back to a computer since early this morning.

    I believe that the way a function interface communicates that a parameter's value may be destructively modified (typically moved from) is by declaring that parameter to be an rvalue reference. (Here I mean a "real" rvalue reference, not an rvalue reference that may turn into an lvalue reference after reference collapsing. So universal references don't count.) This includes parameters that are passed into functions that will act as sinks for that parameter's value. So I believe both 1a and 1b, and for purposes of this discussion, I am not aware of any meaningful distinction between "sink parameter" and "parameter that may be moved from."

    I will remark that for functions acting as sinks for resource-managing objects, they will typically have a responsibility to unconditionally take over management of the resource. As a specific example, the shared_ptr constructor taking a unique_ptr is declared like this,

    template <class Y, class D> shared_ptr(unique_ptr<Y, D>&& r);

    but even if an exception is thrown during construction of the shared_ptr, it takes ownership of the object managed by the unique_ptr (and releases the managed resource prior to propagation of the exception). One could argue that this could be made explicit by taking the unique_ptr by value, but this has a cost, and though the cost may be small in the case of a unique_ptr, in general the cost may be arbitrarily large (as in the case of the active object example I posted earlier).
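A short sketch of that constructor in use, assuming nothing beyond the standard `<memory>` header. It shows only the normal, non-throwing path, in which the unique_ptr is left empty and the shared_ptr becomes the sole owner.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Returns true if ownership moved from the unique_ptr to the shared_ptr.
bool sharedTakesOwnership() {
    std::unique_ptr<int> up(new int(7));
    std::shared_ptr<int> sp(std::move(up));  // selects shared_ptr(unique_ptr<Y, D>&&)
    assert(up == nullptr);                   // the unique_ptr no longer owns anything
    return up == nullptr && sp.use_count() == 1 && *sp == 7;
}
```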

    I'll also remark that I think focusing on unique_ptr is misleading, because I'm proposing a general-purpose no-exceptions-necessary guideline--one that applies to all move-only types, including those where the cost of a move may be more substantial than is the case for unique_ptr. My feeling is that if you're going to argue that sink parameters should be passed by value, you need to make that argument for types where moving has more than a trivial cost. Either that or you need to explain why two rules (one for cheap-to-move types and a second for not-cheap-to-move types) is better than the single one I'm proposing.

  52. @Scott: Yes, that's what I'm trying to do - I think that, depending on your definition of a sink function, passing in a copy of an lvalue will be quite a common thing to do, so you'll need to have an idiom established for that use case.

    I note that above Pawel Turkowski comes to the same conclusion I do, using a helper copy function. I don't like your solution of a local variable as it imposes the cognitive overhead of thinking up an appropriate name for the variable that then pollutes the local scope; I don't see being able to examine the moved-from value afterwards as having any real benefit.

    For completeness I'll also throw out the option of using an immediate lambda:

    sinkFunction([&]{return myLValue;}());

  53. @Edward Catmur: First, let me note that your question/issue is irrelevant to the topic of my blog post, because my blog post is about move-only types, and you're interested in copyable types.

    Setting that aside, I'll remark that even for types that are both copyable and movable, there is always a performance cost associated with pass by value (compared to pass by reference). So one of the considerations that a function author would have to take into account would be whether to impose the cost of an unnecessary move on callers who want to pass "true" rvalues (what the standard calls prvalues, I think), i.e., rvalues returned from functions, in order to make life easier for callers who want to pass copies of lvalues. Given that it's so easy for callers to make copies of lvalues, I'd personally give that consideration very little weight in my interface design.

    But, as I said, that has nothing to do with this post, which is only about move-only types.

  54. @Scott Meyers: But the issue *is* relevant to types which are both movable and copyable, since sink functions don't care whether their parameter is copyable. Surely you should extend this guideline to cover all sink functions, rather than just those which take move-only parameters?

    Also, the author in your example doesn't need to impose an unnecessary copy on the caller, since he can provide both r-value and l-value overloads (e.g. push_back et al.) The decision he must make is whether to provide an l-value overload or impose a slightly awkward syntax on the caller in the name of performance. In this case, would you recommend providing the l-value overload? If so, would you always recommend providing separate overloads rather than a single pass-by-value version?

  55. *unnecessary move

    (sorry)

  56. @Joe: I agree, the issue is relevant to both copyable and movable types, but the common desire to treat lvalues and rvalues differently complicates matters for such types. My proposed guideline applies only for move-only types, because for such types, the complications don't arise.

    In a private thread spawned by this blog post, I summarized things this way:

    The fundamental parameter-passing rules are simple:
    • Pass read-only parameters by const T&. Same as C++98.
    • Pass read-write parameters by T&. Same as C++98.
    • Pass sink lvalues by const T&. Same as C++98.
    • Pass sink rvalues by T&&.
    This generally gives you optimal efficiency, and, as you can see, there’s only one new rule compared to C++98.
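
    As a rough illustration of the rules just listed, here is a sketch in which every name (totalLength, appendSuffix, Widget, setName) is hypothetical. It assumes the lvalue sink overload takes a reference to const, since it only copies from its argument.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Read-only parameter: const T&.
std::size_t totalLength(const std::vector<std::string>& words) {
    std::size_t n = 0;
    for (const std::string& w : words) n += w.size();
    return n;
}

// Read-write parameter: T&. The caller sees the modification.
void appendSuffix(std::string& s) { s += "!"; }

class Widget {
public:
    // Sink lvalue: copy from a reference to const.
    void setName(const std::string& n) { name_ = n; }
    // Sink rvalue: T&&, the one rule that is new in C++11.
    void setName(std::string&& n) { name_ = std::move(n); }
    const std::string& name() const { return name_; }
private:
    std::string name_;
};
```

    A caller that passes an lvalue to setName keeps its own string intact, while a caller that passes a temporary or std::move(x) gets the cheaper move.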

    The complication arises when all of the following are true:
    1. You want to write a function accepting a type that’s both copyable and movable (hence can accept both lvalues and rvalues).
    2. Moving objects of that type is cheap.
    3. You don’t want to overload for lvalues and rvalues (because there are multiple functions to maintain, and the number of overloads grows geometrically with the number of parameters to be dealt with).
    4. You don’t want to use a universal reference (because not every kind of argument can be passed to such a reference, e.g., braced initializers).
    Under those conditions, pass by value can be a viable design.
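
    To make the trade-off concrete, the following sketch counts copies and moves for a single by-value setter versus an lvalue/rvalue overload pair. Tracer, ByValue, Overloaded, and resetCounts are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <utility>

// Hypothetical type that counts its copy and move operations.
struct Tracer {
    static int copies;
    static int moves;
    Tracer() = default;
    Tracer(const Tracer&) { ++copies; }
    Tracer(Tracer&&) noexcept { ++moves; }
    Tracer& operator=(const Tracer&) { ++copies; return *this; }
    Tracer& operator=(Tracer&&) noexcept { ++moves; return *this; }
};
int Tracer::copies = 0;
int Tracer::moves = 0;

void resetCounts() { Tracer::copies = 0; Tracer::moves = 0; }

// One function handles lvalues and rvalues, at the cost of an extra move.
class ByValue {
public:
    void set(Tracer t) { value_ = std::move(t); }
private:
    Tracer value_;
};

// Two overloads: more code to maintain, but no extra move.
class Overloaded {
public:
    void set(const Tracer& t) { value_ = t; }
    void set(Tracer&& t) { value_ = std::move(t); }
private:
    Tracer value_;
};
```

    With these definitions, ByValue::set costs one copy plus one move for an lvalue and two moves for std::move(x), while Overloaded::set costs exactly one copy or one move. Whether the extra move matters depends on condition 2 above, i.e., on how cheap moving really is.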

    Somewhere along the line, the idea arose that these four things are almost always true, and that’s when some people decided that passing sink parameters by value should be the default. My sense is that Herb is in that camp. I’m not. Whether he and I will come to a common way of viewing things remains to be seen. I expect we’ll have another couple of rounds on the blog. It will be interesting to see how it turns out.

  57. @Scott, I have a question about the (just posted) parameter-passing rules in the context of optimal efficiency (I'd actually like to have a "short & sweet" list like this that I could also recommend to the others, thus my question).

    How about scalar types (e.g., `int` or `double`) and the aliasing problem?

    While modern optimizing compilers have improved type-based alias analysis capabilities, wouldn't passing these by value still constitute the best practice?

  58. @Matt: This blog post is about move-only types, so the built-ins aren't among those I'm addressing.

    Nevertheless, regarding your question, my understanding is that the most efficient way to pass built-in types is by value, so unless you need a function to be able to modify the value of a built-in such that the caller sees the revised value, my recommendation for maximum efficiency would be to pass the built-in by value. However, if I were writing a template, I'd pass a read-only parameter by const T& by default (i.e., by reference). It would be possible to also write a pass-by-value companion template for use with built-ins and then choose between them using std::enable_if, but I'd consider that overkill unless there was compelling empirical data showing that the difference in performance was worth it.
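
    For readers curious what such a companion template might look like, here is a minimal sketch. The function name describe is hypothetical, and using std::is_arithmetic as the test for "built-in" is an assumption made for brevity (it does not cover pointers or enums).

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Default: take read-only parameters by reference to const.
template <typename T>
typename std::enable_if<!std::is_arithmetic<T>::value, const char*>::type
describe(const T&) { return "by const reference"; }

// Companion for arithmetic built-ins: take them by value.
template <typename T>
typename std::enable_if<std::is_arithmetic<T>::value, const char*>::type
describe(T) { return "by value"; }
```

    Exactly one of the two templates survives substitution for any given T, so calls such as describe(42) and describe(std::string("x")) are never ambiguous.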

  59. "Some work is more work than no work." - quoting Andrei, IIRC :-)

    (re-post, because it seems the last comment got lost)

  60. [I wish I could edit comments... :)]

    P.S.: To be clear, I should add "shouldn't use && for sink parameters by default."

    Of course using && is still a valid optimization, if you have performance justification that the extra move is material in a specific case, and && is indeed required to implement move functions (which is not the same as to use move, as already correctly noted by other commenters). But it should be viewed as an optimization and therefore not done prematurely because it does trade off complexity (there's another concept, &&) and correctness (there are more ways to go wrong, not only that the parameter is no longer guaranteed to be moved from, but also the other valid objections already raised by others in this thread such as the ability to keep a reference and get aliasing, etc.).

    Bottom line, using && is a tradeoff because it incurs additional uncertainty/complexity, and so like any such optimization it shouldn't be reached for until there is data to show you need it in a given case.

    Just like exception safety is not about writing try/catch frequently (it's about RAII and destructors), using move semantics is not about writing && or even move() frequently.

  61. @Herb Sutter: It's clear we're not going to agree, but I have three questions for you (the answers for which will give you the chance to get the last word in on this :-}):

    1. If pass-by-value for move-only types has a different meaning from pass by rvalue reference, how can you argue that one is an optimization of the other? Optimizations don't change semantics.

    2. In my reply to Crazy Eddie on July 25, I explained that the only behavioral difference between pass by value and pass by rvalue reference that I could discern was the timing of resource release, and I asked for an example when this would make a difference. In my view, he didn't give one. Can you?

    3. Do you believe the shared_ptr constructor taking a unique_ptr should be changed to take it by value? It currently takes it by rvalue reference, yet always takes ownership.
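
    The behavior referred to in question 3 can be checked with a few lines. The function name below is hypothetical; the shared_ptr constructor itself is the standard one taking std::unique_ptr<Y, D>&&.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Returns true if the unique_ptr is empty after being handed to the
// shared_ptr constructor and the shared_ptr now owns the object.
bool sourceIsEmptyAfterConversion() {
    std::unique_ptr<int> up(new int(42));
    std::shared_ptr<int> sp(std::move(up));  // parameter is an rvalue reference
    return up == nullptr && *sp == 42;
}
```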

    Unrelatedly, I wish Blogger allowed editing of comments, too :-)

  62. @Scott:

    1. I'm not the one arguing it's an optimization or has the same semantics. This blog post is the one arguing that && expresses "takes ownership of" or "will move from" (I don't think this is correct) and arguing that it's an optimization because as you wrote "passing sink parameters of type std::unique_ptr by value... is less efficient" because of doing two moves (I think that's premature esp. for the motivating example given of unique_ptr which is dirt cheap to move).

    2. I'm not sure what you mean. For example, I just gave what I called the "killer argument" in my previous comment, namely that for a move-only type pass-by-&& does not in fact express "takes ownership" or "will move from" (it does not guarantee that, which invalidates a primary premise in this blog post) and pass-by-value does. And some others were mentioned, such as that pass-by-&& creates the possibility of the callee keeping an alias of the argument.

    3. No, please see my P.S. comment immediately above. That && is part of how you implement move in what is clearly termed a "special member function" (note the word: "special"), not part of the general guidance of how to use move in general code. Also, && is perfectly fine as a general optimization, including for parameters in generic code which more often does reach for more advanced optimizations/techniques, unlike most user code.
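
    The difference in guarantees described in point 2 can be seen in a small sketch. Both functions below are hypothetical and deliberately do nothing with their parameter, which is the worst case for the caller's expectations.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// By value: the argument is moved into the parameter before the body runs,
// so the caller's pointer is empty no matter what the body does.
void takeByValue(std::unique_ptr<int> p) { (void)p; }

// By rvalue reference: nothing is moved unless the body moves from p.
void takeByRvalueRef(std::unique_ptr<int>&& p) { (void)p; }

bool emptyAfterByValue() {
    std::unique_ptr<int> up(new int(1));
    takeByValue(std::move(up));
    return up == nullptr;
}

bool emptyAfterByRvalueRef() {
    std::unique_ptr<int> up(new int(1));
    takeByRvalueRef(std::move(up));
    return up == nullptr;
}
```

    Here emptyAfterByValue() yields true and emptyAfterByRvalueRef() yields false: with the rvalue reference, whether ownership is transferred depends on the function body (or its documentation), not on the signature.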

  63. @Herb Sutter: The bulk of your post from this morning at 7:19 AM argues that passing by rvalue reference is an optimization.

    ReplyDelete
  64. Then let me be clear: I understood that your post presented && in part as having better efficiency because of saving a unique_ptr move. I'm simply acknowledging that and arguing that it shouldn't be a first-order question to be considering up front; clarity and correctness come first, and using &&, in part with the justification that it saves a move, is premature by default because it has downsides for clarity and correctness that I and others have listed. Of course && is a fine tool for non-default special situations when you have a need, such as to implement move (the language specially recognizes SMFs that take &&'s) or for efficiency (if you've proven that the extra unique_ptr move is a hot spot, which will be virtually never but anyway, or are writing generic code which like the STL itself naturally uses more general tools by default than typical application code would).

  65. The fact that by-value guarantees ownership transfer seems rather irrelevant to me. I think Scott already asked it: Provide an example, where it matters to the caller that:
    * it's been moved from
    * this is guaranteed by the function signature alone (vs. documented behaviour)
    * and every caller of a function needs this guarantee (because if only some callers need the guarantee, *they* should just additionally wipe the object, if the function's doc doesn't guarantee moved-from)

  66. I would add however, that I think this cannot be considered in isolation for move only types. If I get things correctly, copy+move types benefit greatly from pass by value, because you can roll both the move case as well as the copy case into one signature.

    Unifying the move-only case with the "normal" type case *is* a compelling reason to advocate pass-by-value, but IMHO it's got nothing to do with what pass-by-val does or does not guarantee for the move-only types.

  67. IMHO, the position of supporters of pass-by-value for move-only types would be stronger if copy elision were required of the compiler in those circumstances where it is currently only 'permitted' by the C++ Standard.

    My 2 cents, Niels

  68. With pass-by-reference, runtime type information of the argument is preserved (if it has a virtual table). Is that a good or a bad thing, compared to pass-by-value (within the context of this discussion)? At least, it allows adding a check, e.g., for debugging, to ensure that no accidental object slicing takes place:

    void f(SomeType&& param)
    {
        assert(typeid(param) == typeid(SomeType));

        // Now go stealing from the argument...
    }
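
    A self-contained version of this check, with hypothetical Base and Derived classes standing in for SomeType and a type derived from it, might look like the sketch below. It returns the result of the comparison instead of asserting so that both outcomes can be observed.

```cpp
#include <cassert>
#include <typeinfo>

struct Base { virtual ~Base() = default; };  // polymorphic, so typeid is dynamic
struct Derived : Base {};

// By rvalue reference: param still refers to the caller's object, so typeid
// reports its dynamic type and a Derived argument is detected.
bool exactTypeByRvalueRef(Base&& param) {
    return typeid(param) == typeid(Base);
}

// By value: a Derived argument is sliced while initializing param, so the
// check always succeeds and cannot reveal the slicing.
bool exactTypeByValue(Base param) {
    return typeid(param) == typeid(Base);
}
```

    Passing a Derived temporary makes exactTypeByRvalueRef return false, whereas exactTypeByValue returns true for the same argument.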

  69. @Niels Dekker: If slicing would lead to incorrect program behavior, pass by value should never be used.

  70. Scott, thanks for re-enabling comments on this post.

    I just wanted to point out that (unless I am missing something) passing by value can be replaced by passing by rvalue reference both for move-only and copyable types. In terms of usage, it only makes a difference when a non-movable lvalue is passed in as an argument, and that situation can be handled easily for both kinds of types through temporaries. Please have a look at my post on this topic for more details.

    In addition, getting a compile-time error when an lvalue reference is passed to the function accepting only rvalue references can be seen as an advantage as it forces the developer to explicitly indicate whether a move or a copy is intended. Moreover, the behavior that Herb referred to as the "killer argument" in the comments above can also be easily reproduced when passing by rvalue reference with exactly the same semantics. Taking this into account, I see only advantages in passing by rvalue reference and thus a new question - is there a case when passing by value should actually be preferred to passing by rvalue reference?
