Monday, September 7, 2015

Thoughts on the Vagaries of C++ Initialization

If I want to define a local int variable, there are four ways to do it:
int x1 = 0;
int x2(0);
int x3 = {0};
int x4{0};
Each syntactic form has an official name:
int x1 = 0;              // copy initialization
int x2(0);               // direct initialization
int x3 = {0};            // copy list initialization
int x4{0};               // direct list initialization
Don't be misled by the word "copy" in the official nomenclature. Copy forms might perform moves (for types more complicated than int), and in practice, implementations often elide both copy and move operations in initializations using the "copy" syntactic forms.

(If you engage in written communication with a language lawyer about these matters and said lawyer has its pedantic bit set, you'll be reprimanded for hyphen elision. I speak from experience. The official terms are "copy-initialization," "direct-initialization," "copy-list-initialization," and "direct-list-initialization." When dealing with language lawyers in pedantic mode, it's wise to don a hazmat suit or to switch to oral communication.)

But my interest here isn't terminology, it's language design.

Question #1: Is it good language design to have four ways to say the same thing?

Let's suppose that instead of wanting to define an int, we want to define a std::atomic<int>. std::atomics don't support copy initialization (the copy constructor is deleted), so that syntactic form becomes invalid. Copy list initialization continues to succeed, however, because for std::atomic, it's treated more or less like direct initialization, which remains acceptable. So:
std::atomic<int> x5 = 0;    // error!
std::atomic<int> x6(0);     // fine
std::atomic<int> x7 = {0};  // fine
std::atomic<int> x8{0};     // fine
(I frankly expected copy list initialization to be treated like copy initialization, but GCC and Clang thought otherwise, and 13.3.1.7 [over.match.list] in C++14 backs them up. Live and learn.)
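A minimal sketch of the three forms that are accepted for std::atomic<int>, written as small functions so the stored values can be checked. The function names are made up for illustration; the copy initialization form is left as a comment because it is the one reported above as an error under the C++11/14 rules.

```cpp
#include <atomic>
#include <cassert>

// Direct initialization: calls std::atomic<int>'s converting constructor.
int direct_init_value() {
    std::atomic<int> x6(1);
    return x6.load();
}

// Copy list initialization: accepted even though the copy constructor is deleted.
int copy_list_init_value() {
    std::atomic<int> x7 = {2};
    return x7.load();
}

// Direct list initialization.
int direct_list_init_value() {
    std::atomic<int> x8{3};
    return x8.load();
}

// std::atomic<int> x5 = 0;  // copy initialization: the form reported as an error

// Exercises the three accepted forms and checks the values they store.
void check_atomic_forms() {
    assert(direct_init_value() == 1);
    assert(copy_list_init_value() == 2);
    assert(direct_list_init_value() == 3);
}
```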

Question #2: Is it good language design to have one of the four syntaxes for defining an int be invalid for defining a std::atomic<int>?

Now let's suppose we prefer to use auto for our variable instead of specifying the type explicitly. All four initialization syntaxes compile, but two yield std::initializer_list<int> variables instead of ints:
auto x9 = 0;                // x9's type is int
auto x10(0);                // x10's type is int
auto x11 = {0};             // x11's type is std::initializer_list<int>
auto x12{0};                // x12's type is std::initializer_list<int>
This would be the logical place for me to pose a third question, namely, whether these type deductions represent good language design. The question is moot; it's widely agreed that they don't. Since C++11's introduction of auto variables and "uniform" braced initialization syntax, it's been a common error for people to accidentally define a std::initializer_list when they meant to define, e.g., an int.

The Standardization Committee acknowledged the problem by adopting N3922 into draft C++17. N3922 specifies that an auto variable, when coupled with direct list initialization syntax and exactly one value inside the braces, no longer yields a std::initializer_list. Instead, it does what essentially every programmer originally expected it to do: define a variable with the type of the value inside the braces. However, N3922 leaves the auto type deduction rules unchanged when copy list initialization is used. Hence, under N3922:
auto x9 = 0;                // x9's type is int
auto x10(0);                // x10's type is int
auto x11 = {0};             // x11's type is std::initializer_list<int>
auto x12{0};                // x12's type is int
Several compilers have implemented N3922. In fact, it can be hard (maybe even impossible) to get such compilers to adhere to the C++14 standard, even if you want them to. GCC 5.1 follows the N3922 rule even when expressly in C++11 or C++14 modes, i.e., when compiled with -std=c++11 or -std=c++14. Visual C++ 2015 is similar: type deduction is performed in accord with N3922, even when /Za ("disable language extensions") is used.
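One way to see which rule a given compiler applies is to ask it directly with static_assert. This is a sketch that assumes a compiler implementing N3922; on a compiler that follows the published C++11/14 wording instead, the last assertion would fail to compile.

```cpp
#include <initializer_list>
#include <type_traits>

auto x9 = 0;      // copy initialization
auto x10(0);      // direct initialization
auto x11 = {0};   // copy list initialization
auto x12{0};      // direct list initialization

static_assert(std::is_same<decltype(x9), int>::value,
              "x9 is deduced as int");
static_assert(std::is_same<decltype(x10), int>::value,
              "x10 is deduced as int");
static_assert(std::is_same<decltype(x11), std::initializer_list<int>>::value,
              "x11 is deduced as std::initializer_list<int>");
static_assert(std::is_same<decltype(x12), int>::value,
              "x12 is deduced as int under N3922");
```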

Question #3: Is it good language design for copy list initialization (i.e., braces plus "=") to be treated differently from direct list initialization (i.e., braces without "=") when deducing the type of auto variables?

Note that these questions are not about why C++ has the rules it has. They're about whether the rules represent good programming language design. If we were designing C++ from scratch, would we come up with the following?
int x1 = 0;                 // fine
int x2(0);                  // fine
int x3 = {0};               // fine
int x4{0};                  // fine
std::atomic<int> x5 = 0;    // error!
std::atomic<int> x6(0);     // fine
std::atomic<int> x7 = {0};  // fine
std::atomic<int> x8{0};     // fine
auto x9 = 0;                // x9's type is int
auto x10(0);                // x10's type is int
auto x11 = {0};             // x11's type is std::initializer_list<int>
auto x12{0};                // x12's type is int
Here's my view:
  • Question #1: Having four ways to say one thing constitutes bad design. I understand why C++ is the way it is (primarily backward-compatibility considerations with respect to C or C++98), but four ways to express one idea leads to confusion and, as we've seen, inconsistency.
  • Question #2: Removing copy initialization from the valid initialization syntaxes makes things worse, because it introduces a seemingly gratuitous inconsistency between ints and std::atomic<int>s.
  • Non-question #3: I thought the C++11 rule about deducing std::initializer_lists from braced initializers was crazy from the day I learned about it. The more times I got bitten by it in practice, the crazier I thought it was. I have a lot of bite marks.
  • Question #3: N3922 takes the craziness of C++11 and escalates it to insanity by eliminating only one of two syntaxes that nearly always flummox developers. It thus replaces one source of programmer confusion (auto + braces yields counterintuitive type deduction) with an even more confusing source (auto + braces sometimes yields counterintuitive type deduction). One of my earlier blog posts referred to N2640, where deducing a std::initializer_list for auto variables was deemed "desirable," but no explanation was offered as to why it's desirable. I think that much would be gained and little would be lost by abandoning the special treatment of braced initializers for auto variables. For example, doing that would reduce the number of sets of type deduction rules in C++ from five to four.
But maybe it's just me. What do you think about the vagaries of C++ initialization?

Scott

43 comments:

Rein Halbersma said...

On top of all the vagaries you describe, there's also the auto/template deduction discrepancy for initializer_list, as well as the overly greedy matching of initializer_list constructors. Fixing initialization in order to get a more consistent language should probably also take into account those two aspects. And maybe (probably?) it's impossible to get there without breaking backward compatibility.

Scott Meyers said...

@Rein Halbersma: the only difference between template type deduction and auto type deduction is the treatment of braced initializers, so if you eliminate the special provision for auto and direct-list-initialization, you also eliminate what I assume you mean by the auto/template deduction discrepancy for initializer_list.
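The discrepancy mentioned here can be sketched as follows: auto deduces a std::initializer_list from a braced initializer, while a plain template parameter T cannot be deduced from one. The function names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <type_traits>

// Template type deduction: a bare T cannot be deduced from a braced initializer.
template <typename T>
std::size_t count_by_value(T) { return 1; }

// Naming std::initializer_list<T> in the parameter makes T deducible again.
template <typename T>
std::size_t count_elements(std::initializer_list<T> items) { return items.size(); }

// auto type deduction: the braced initializer becomes a std::initializer_list<int>.
std::size_t auto_deduced_size() {
    auto items = {1, 2, 3};
    static_assert(std::is_same<decltype(items), std::initializer_list<int>>::value,
                  "auto deduces std::initializer_list<int>");
    return items.size();
}

// count_by_value({1, 2, 3});  // would not compile: T cannot be deduced

// Exercises the calls that do compile.
void check_deduction() {
    assert(count_by_value(7) == 1);
    assert(count_elements({1, 2, 3}) == 3);
    assert(auto_deduced_size() == 3);
}
```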

TartanLlama said...

Looks like we need uniform uniform initialization. I propose using backticks as delimiters then nuking the whole thing from outer space.

Dan Schmidt said...

I don't think N3922 is all that crazy, since it's not hard for me to read the equals sign in "auto x11 = {0}" as creating the meaning "x11 is the initializer list {0}" while it is not hard to interpret "auto x12{0}" as "x12 is initialized with 0". But clearly the initialization syntax situation is not ideal no matter what choices the language designers make at this point.

Anonymous said...

I think you missed another initialization syntax: int x{}; also initializes x to 0. (i.e., value initialization)

Scott Meyers said...

@Anonymous: I was limiting my scope to syntaxes where I specify an explicit initial value. The scope could be broadened to syntaxes that allow you to define an int with the value zero, but then you also have to include "int x;" with static storage duration. That's a different can of worms.

Marco said...

Hi Scott, there is also:

auto value = T{initializers...};

Honestly, I don't like it because it's not always applicable (e.g. non-moveable type).

Guy Davidson said...

When I introduced Almost Always Auto to the coding standards at my studio it was an unpopular move, precisely because of this ambiguity. I can't see any solution that doesn't break old code. Having said that, C++ is not devoid of nooks and crannies: adding a couple extra while removing a few more seems unavoidable.

cmeerw said...

Actually, do we even have a consistent view of what the type of x12 (auto x12{0};) should be according to C++11, C++14 and C++17?

Well, for C++17 it will obviously be int. But if you look at the published C++11 and C++14 ISO standards, they will tell you it's supposed to be std::initializer_list, but then n3922 says "Direction from EWG is that we consider this a defect in C++14". Which would mean that the type is supposed to be int in C++14 as well?

BTW, (the informative) annex C doesn't list any incompatibility between the versions.

Martin Brandstätter said...

As usual, a great article.
I also think that these four types of initialization are a mess.
Nevertheless, backwards compatibility is a good thing (actually it's a must-have) and sometimes you just have to pay the price.
This only stresses the fact that developers using C++ absolutely need coding rules and best practices to avoid the confusion, inconsistencies and time spent debugging the damn thing.
I personally prefer "copy-initialization" for simple types and "direct-initialization" for complex types.
If you are about to initialize a huge array with "copy list initialization", you probably should put that into some data file...
Just wondering if this could be easier/simpler with some option like: I do not care about C, C++98 [yes/no]

Mark Atkinson said...

[edit]
What do you think about the idea that initializer_list<> should be {{0}} ? So no contextual deduction - double braces is an initializer_list<>, otherwise not.

Ref: http://stackoverflow.com/questions/22501368/why-wasnt-a-double-curly-braces-syntax-preferred-for-constructors-taking-a-std

Chris Glover said...

My personal opinion is that initializer_list should be removed from the language because it's not very useful and causes many side effects. I say it's not very useful because real code doesn't initialize containers that way. Real code, in real environments rarely has any sort of list hard coded with some sort of initialization list; the data comes from disk, or the network, etc. On the occasion that one needs such a list, std::array can be used, which can be initialized using aggregate initialization via array = { 1, 2, 3 }. This has always been the case so why do we even need initializer_list?

Unknown said...

I totally agree, Scott, it's a ridiculous state of affairs and is indicative of C++'s sprawling mess in its attempt to be all things to all men. It's reached the point where it is no longer practical to fully understand all of C++ and its syntactic constructs, and that makes for poor readability, difficult debugging and inconsistent coding styles. In short, everyone loses. If I had my way, the committees who keep extending C++ as a language would be called to account and asked to stop extending the language.

Malte Skarupke said...

The problem with initializer_list that Rein Halbersma was talking about is that
std::vector a(2, 3);
and
std::vector b{2, 3};
are different things.

In fact if you use brace initialization {}, initializer_list constructors are always preferred, even if they are a worse match:

std::vector c(size_t(2), 3.0f);
and
std::vector d{size_t(2), 3.0f};
are still different things, even though the second constructor has to convert from size_t to float, and there is a perfectly good direct match available.

This means that if you add an initializer_list constructor to an existing class, you can accidentally change code all over the place because that is the only way to make the language pick a worse overload. Try it.

Because of that, you only get two chances of adding initializer_list constructors to your class: 1. When you first switch your codebase to C++11 (which we are about to do at work, so it's good that I received this reminder) and 2. when you first write a new class. You should never add an initializer_list constructor to a class that's already used in C++11 code. If you do, you might accidentally change code all over the place.

If you think that somebody might, at some point in the future, get the bright idea of adding an initializer_list constructor to a class that you're already using, make sure to never use brace initialization {} for that class. There are some classes where this is more likely to happen, like custom containers. Just never use {} initialization for those. Otherwise some well-meaning colleague will probably break your code soon (if you're lucky) or they will change what the code does without breaking it (if you're unlucky) when they add an initializer_list constructor to that custom container.

Malte Skarupke said...

Ah and in my comment above the blogging software swallowed the templates. Here are the vectors again using [] for the template arguments
std::vector[int] a(2, 3);
std::vector[int] b{2, 3};
std::vector[float] c(size_t(2), 3.0f);
std::vector[float] d{size_t(2), 3.0f};
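The difference between the parenthesized and braced forms can be checked directly. This sketch uses (10, 20) for the int case so that the two results differ in size as well as content; the float case mirrors the c/d pair above. The function names are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Parentheses select the (count, value) constructor: ten elements, each 20.
std::vector<int> with_parens() { return std::vector<int>(10, 20); }

// Braces prefer the std::initializer_list constructor: two elements, 10 and 20.
std::vector<int> with_braces() { return std::vector<int>{10, 20}; }

// An exact (size_t, float) match exists, and parentheses use it: {3.0f, 3.0f}.
std::vector<float> float_with_parens() {
    return std::vector<float>(std::size_t(2), 3.0f);
}

// Braces still pick the initializer_list constructor: {2.0f, 3.0f}.
std::vector<float> float_with_braces() {
    return std::vector<float>{std::size_t(2), 3.0f};
}

// Checks the sizes and contents described in the comments above.
void check_vector_forms() {
    assert(with_parens().size() == 10);
    assert(with_parens()[0] == 20);
    assert(with_braces().size() == 2);
    assert(with_braces()[0] == 10);
    assert(float_with_parens()[0] == 3.0f);
    assert(float_with_braces()[0] == 2.0f);
}
```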

Scott Meyers said...

@Marco: Syntactically,

auto value = T{initializers...};

is copy initialization.

Scott Meyers said...

@cmeerw: There's no mention of N3922 in N4458, which seems to be the most recent document summarizing core language defects. Note that even if the issue were listed there, the document points out that "[Issues listed here] should not be considered definitive until or unless they appear in an approved Technical Corrigendum or revised International Standard for C++." To me, it's clear that in C++11 and C++14, an auto-declared variable using direct-list-initialization is of type std::initializer_list.

Not that this makes much difference in practice. From what I can tell, there is no way to get the current compilers from Gnu and Microsoft to behave as C++14 dictates. Both appear to unconditionally implement the N3922 behavior.

Johannes Schaub said...

I would also like "auto x = { ... }" to be ill-formed, for the reason that no type can be determined (just like with function template arguments). And I would also like automatic return type deduction to use the same principle, so that this can be written:

auto f() {
if(...) {
std::vector f;
return f;
}
return {};
}

Currently ill-formed because the "{}" would deduce to initializer_list which is nonsense for functions to return. If we make the rules the same as for "auto x = ..." and do not deduce that as initializer_list, it would just work and the "return {}" would be treated as a non-deduced context and be well-formed.

Scott Meyers said...

@Mark Atkinson: I'm not convinced that we need a syntactic shorthand for std::initializer_list objects. In my heart of hearts, I'd like to agree with Chris Glover (above) that such lists just don't occur in real code often enough to worry about. Unfortunately, I know that several people tried to come up with C++98 constructs to serve the purpose (typically by declaring a special type and then overloading the comma operator), and there was even a Boost library that did just that. So there seems to be a demand for the idea, possibly because it's useful in small test programs, demos, and unit test code.

As for using double curly braces to indicate an initializer list, I haven't given it much thought. One concern that comes to mind is that brace initialization can be nested (e.g., for aggregates), so double curly braces already have a meaning in some contexts. Whether that would be problematic in developing a proposal for double-curly-brace-based initializer lists, I don't know.

Anonymous said...

I would prefer if

x = {...}

always attempted to use std::initializer_list (typically to do something like aggregate initialization), while

x{...}

always attempted to call a constructor (ie, the same as `x(...)`, but without narrowing), regardless of the type specified. It really sucks that, currently, whether `x{...}` calls a constructor or std::initializer_list is not immediately apparent (you have to know whether the type has a std::initializer_list constructor).

IMHO, it was a pretty big mistake in C++11 to 'overload' on the syntax of `x{...}` for these two things.

Paul Jurczak said...

Hazardous materials lawyer with pedantic bit set would write "hazmat suit" instead of "hazmet suit" ;-)

Scott Meyers said...

@Paul Jurczak: Right you are. Fixed, thanks :-)

cmeerw said...

@Scott There is no mention of n3922 in n4458 because it wasn't raised as a core language issue, but came in directly via the Evolution Working Group (it's issue 161 in n4540)

tkamin said...

The difference between direct and copy initialization is a tool in the hands of the programmer that can express the intent of the code and prevent future errors. In cases where the initializer has a type and value visible at the point of declaration, it may look superfluous. But if we consider initialization from the result of a function, we can use direct/copy initialization to express our intent.

For example:
std::chrono::seconds s = get_timeout();
//In that case I will get a compilation error if get_timeout returns a bare number (without a unit) or if the conversion would be lossy

std::chrono::seconds s(get_timeout());
//I am the one asserting that get_timeout returns seconds, and I am responsible for the errors if it starts returning a bare number of milliseconds.

Using the auto-everywhere syntax actually prevents us from writing the statement in a form that guarantees that a unit mismatch produces a compile-time error:
auto s = get_timeout(); //Will not declare seconds
And using ETII:
auto s = std::chrono::seconds(get_timeout()); //Will work even if get_timeout returns a bare number (unknown unit).

Saying that direct and copy initialization should work the same is like saying that:
foo(new T());
should work even if foo accepts shared_ptr/unique_ptr. It effectively removes the ability to differentiate safe and unsafe conversions.

For the explanation of need of differentiation between direct and copy list initialization I recommend reading: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4387.html (voted in C++17).
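The std::chrono behavior this comment relies on can be expressed with type traits. A sketch: is_convertible models what copy initialization accepts, is_constructible models what direct initialization accepts. The helper get_timeout_ticks is hypothetical and stands in for a get_timeout that returns a bare number.

```cpp
#include <chrono>
#include <type_traits>

using std::chrono::milliseconds;
using std::chrono::seconds;

// Copy initialization from a bare number is rejected: the constructor
// taking a tick count is explicit.
static_assert(!std::is_convertible<int, seconds>::value,
              "seconds s = 30; does not compile");

// Direct initialization accepts the bare number, units unchecked.
static_assert(std::is_constructible<seconds, int>::value,
              "seconds s(30); compiles");

// Lossless unit conversions are implicit, so copy initialization works.
static_assert(std::is_convertible<seconds, milliseconds>::value,
              "milliseconds ms = seconds(1); compiles");

// Lossy unit conversions are not implicit.
static_assert(!std::is_convertible<milliseconds, seconds>::value,
              "seconds s = milliseconds(1500); does not compile");

// Hypothetical stand-in for a function that returns a bare number.
constexpr int get_timeout_ticks() { return 30; }

// Direct initialization: the caller vouches that the ticks are seconds.
constexpr seconds timeout_from_ticks() { return seconds(get_timeout_ticks()); }

static_assert(timeout_from_ticks().count() == 30,
              "the bare number is taken as a count of seconds");
```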

Scott Meyers said...

@tkamin: There is no mention of list initialization in N4387. Can you please clarify what you mean?

tkamin said...

@Scott Meyers:

What differentiates copy and direct initialization is that the latter allows explicit constructors to be called. N4387 introduces conditionally explicit constructors for pair/tuple from elements/other tuples. The idea is that the constructor will be explicit if at least one of the elements requires conversion by direct-initialization.

As a consequence, the following:
std::pair p1 = {get_timeout(), get_retries()};
std::pair p2 = get_timeout_and_retries();
will compile successfully if get_timeout returns a compatible time unit; the same applies to the first element of the tuple-like type returned from get_timeout_and_retries().

This would be exactly the same as for:
void foo(std::pair);
foo({get_timeout(), get_retries()});
foo(get_timeout_and_retries());

Direct initialization is a mechanism that allows you to perform, at compile time, the same checks for a return from a function to its caller as the ones used for passing arguments to a function.

tkamin said...

The pair should have std::chrono::seconds and int as the template arguments.

Scott Meyers said...

@tkamin: Let's assume that it's worthwhile to distinguish the concepts of direct and copy initialization, i.e., to distinguish between initialization contexts where explicit conversions are and are not permitted. That says nothing about whether those concepts should be differentially applied in the syntactic constructs I wrote about. For example, the syntax

T var = expr;

is currently defined to be copy initialization, but it could just as easily be defined to be direct initialization. As somebody with a lot of experience trying to explain to programmers that

T var(expr);

and

T var = expr;

do the same thing and typically generate the same code if both compile (and that the latter construct has nothing to do with assignment), I'm inclined to think that this would be a net win.

Getting back to the fundamental question, do you think the examples I posted represent good language design?

tkamin said...

@Scott:

"Let's assume that it's worthwhile to distinguish the concepts of direct and copy initialization, i.e., to distinguish between initialization contexts where explicit conversions are and are not permitted. That says nothing about whether those concepts should be differentially applied in the syntactic constructs I wrote about."

In my example, I am trying to show that when you are declaring a variable that needs to be initialized with some expression, you should have the ability to decide whether you want to allow explicit conversions. So if we assume that the distinction is worthwhile, we accept the fact that we need two initialization syntaxes (I will not argue about how they should look).

"As somebody with a lot of experience trying to explain to programmers that [...] do the same thing and typically generate the same code if both compile (and that the latter construct has nothing to do with assignment), I'm inclined to think that this would be a net win."

The whole point lies in the line "if they both compile". The value in the differentiation is that the code will not compile after a change. This is the same as explaining that:
double a = 2, b = 5;
double c = 2 + 5;
generates the same code (in terms of performance and memory use) as:
std::chrono::duration a = 2, b = 5;
std::chrono::duration c = 2 + 5;
My point is that we should not try to hide the difference between direct and non-direct initialization syntax, but embrace them and describe when they are actually useful.

For my answers:
Q1: It is necessary to have two ways to initialize, and except in the simple cases, they do not do the same thing. I agree that the fact that {} behaves differently than () in initialization (the vector-with-size constructor case) is bad design.

Q2: The fact that you cannot copy-initialize non-copyable atomic seems reasonable. The fact that you can copy-list-initialize them is insane.

I think that the intent of §13.3.1.7 was to make the following ill-formed:
struct A
{
A(int);
explicit A(double);
};
A a1 = 10.0; //will call A(int);
A a2 = {10.0}; //ill-formed
Not to remove the requirement of an accessible copy constructor.

Q3: The whole paper was written to make generalized capture initializers work. Using [x{10}] in a capture is equivalent to auto x{10}, and it declared an initializer list. The current resolution is a compromise that makes lambdas work. We should still say that auto + {} is a problem and avoid it.

I think that the original initializer_list design is not as good as it should be. Having a separate syntax that forces creation of an initializer_list, with no special rules for deduction, would be more pragmatic. There are more problems with initializer_list itself: (https://groups.google.com/a/isocpp.org/forum/#!topic/std-proposals/cYcqxtsyH2c).
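The struct A example from the answer to Q2 can be made checkable by recording which constructor ran. This is a sketch: the Ctor enum and the chosen member are additions for illustration, not part of the original example.

```cpp
// Records which constructor was selected (an addition for this sketch).
enum class Ctor { FromInt, FromDouble };

struct A {
    constexpr A(int) : chosen(Ctor::FromInt) {}
    constexpr explicit A(double) : chosen(Ctor::FromDouble) {}
    Ctor chosen;
};

// Copy initialization ignores explicit constructors, so A(int) is used
// after converting 10.0 to int.
constexpr A copy_initialized = 10.0;

// Direct initialization considers every constructor, so A(double) wins.
constexpr A direct_initialized(10.0);

// A rejected = {10.0};  // ill-formed: copy list initialization selects the
//                       // explicit A(double) and then may not use it

static_assert(copy_initialized.chosen == Ctor::FromInt,
              "copy initialization calls A(int)");
static_assert(direct_initialized.chosen == Ctor::FromDouble,
              "direct initialization calls A(double)");
```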

Trevor Hickey said...

Also note that if you have a default constructor, direct initialization without any arguments (which, I suppose, isn't actually direct initialization) will cause C++ to read a function declaration, not a variable instantiation. This is another gotcha that seems to get a lot of people.

Perhaps a good rule of thumb would be: "use direct list initialization as your default method of initialization". I've always referred to it simply as brace initialization.

Unknown said...

The undocumented incompatibility with C++14 is an open Core issue (http://wg21.link/cwg2038) and wording should appear in Annex C in the next standard.

Unknown said...

This "copy initialization" is misleading especially when dealing with move-only types.

auto x = std::unique_ptr{new int}; // ok
auto y = x; // error! copy-constructor is deleted
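A compilable variant of this example, as a sketch: the template argument int is an assumption (the comment's snippet does not show one), and the function name is invented.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

static_assert(!std::is_copy_constructible<std::unique_ptr<int>>::value,
              "std::unique_ptr cannot be copied");
static_assert(std::is_move_constructible<std::unique_ptr<int>>::value,
              "std::unique_ptr can be moved");

int move_only_demo() {
    // "Copy" initialization syntax, yet nothing is copied.
    auto x = std::unique_ptr<int>{new int(42)};
    // auto y = x;          // error: the copy constructor is deleted
    auto y = std::move(x);  // fine: ownership moves from x to y
    assert(!x);             // a moved-from unique_ptr is null
    return *y;
}
```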

Aaron McDaid said...

@Mark Atkinson, @Scott Meyers,

The question of using double braces for lists is very interesting, because it highlights what is so strange about the status quo. In my opinion, every brace or parenthesis should do one thing, and do it well.

For example, we're well used to the idea of parentheses to group constructor arguments.

int x(0); // parentheses around the constructor args.

Imagine taking c++03 and adding support for initializer lists. Also, imagine a constructor that takes two arguments, a string and a list of ints:

MyType y("primes", {2,3,5,7});

Now a parallel proposal which allows braces to be used instead of parentheses around constructor args:

int x{0};

Bringing these two (unrelated) proposals together allows:

MyType y{"primes", {2,3,5,7}};

Here, each brace does one thing and does it well. It either groups the elements of a list, or a collection of arguments for a constructor. So far so good.

Imagine now that MyType has a constructor that takes one arg, which happens to be a list. i.e. the string is not needed. The following code is very readable to me, braces for lists and parentheses around the constructor args:

MyType y( {2,3,5,7} );

Combined with the other proposal we have

MyType y{ {2,3,5,7} };

I wish the committee had stopped here. Most of the problems would go away. There were two unrelated proposals that play nicely together.

The problem is that the committee went further. As well as adopting these two proposals, they added an extra "twist" that allows developers to drop one set of braces where one set (the list braces) are nested directly inside the other set (the constructor-arg braces). In other words, the compiler will (sometimes) infer an extra set of braces, leading to the problem that some vector constructors are inaccessible if you want to use "uniform" initialization.

I guess it's too late to undo the "twist", due to backwards compatibility, but I do hope that it should be possible to fix all this somehow.

(I must admit that I have zero knowledge of the actual history of the proposals, but the (fake) story I've told above helps me to understand the current rules.)

Scott Meyers said...

@Aaron McDaid: Interestingly, you can use double braces to distinguish invoking a constructor with zero arguments (one set of braces) versus invoking a constructor with an empty initializer list (as in w5 below):

class Widget {
public:
Widget();
Widget(std::initializer_list<int> il);
};

Widget w1; // calls default ctor
Widget w2{}; // also calls default ctor
Widget w3(); // most vexing parse! declares a function!
Widget w4({}); // calls std::initializer_list ctor with empty list
Widget w5{{}}; // ditto
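Which constructor each form selects can be checked by having the constructors record themselves. A sketch: the Called enum and the helper functions are additions for illustration, the element type int is an assumption, and only the choice of constructor is checked. w3 is declared at namespace scope here, where the same syntax is just as much a function declaration.

```cpp
#include <cassert>
#include <initializer_list>
#include <type_traits>

// Records which constructor ran (an addition for this sketch).
enum class Called { Default, InitList };

class Widget {
public:
    Widget() : called(Called::Default) {}
    Widget(std::initializer_list<int>) : called(Called::InitList) {}
    Called called;
};

Called from_no_initializer()       { Widget w1;     return w1.called; }
Called from_empty_braces()         { Widget w2{};   return w2.called; }
Called from_parenthesized_braces() { Widget w4({}); return w4.called; }
Called from_double_braces()        { Widget w5{{}}; return w5.called; }

// Empty parentheses declare a function returning Widget, not an object.
Widget w3();
static_assert(std::is_function<decltype(w3)>::value,
              "w3 is a function declaration");

// Checks the constructor chosen by each form.
void check_widget_forms() {
    assert(from_no_initializer() == Called::Default);
    assert(from_empty_braces() == Called::Default);
    assert(from_parenthesized_braces() == Called::InitList);
    assert(from_double_braces() == Called::InitList);
}
```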

Bart Vandewoestyne said...

So with direct initialization, we get no errors if we change from int to
std::atomic, and using direct initialization with auto does not change the type to std::initializer_list.

That makes me wonder whether a good guideline would be "Always use direct
initialization." Probably I'm overlooking a lot of things, but from a learning
perspective, I would be happy to hear comments on this line of thought.

Scott Meyers said...

@Bart Vandewoestyne: There are problems with trying to use only direct initialization. To detect narrowing conversions or to initialize aggregates (e.g., structs and arrays), you have to use list initialization. Direct initialization syntax isn't valid for default member initializers. Also, direct initialization syntax can lead to the most vexing parse, which list initialization avoids.

I do my best to cover this territory in Item 7 of Effective Modern C++.
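A short sketch of the cases listed above where list initialization is needed or helpful. The types Point and RetryPolicy and the function names are invented for the example; the two narrowing lines are comments because the first one is rejected by the compiler.

```cpp
#include <cassert>

struct Point { int x; int y; };   // an aggregate

struct RetryPolicy {
    int max_attempts{3};          // default member initializer using braces
    int delay_ms = 250;           // "=" also works here; parentheses do not
};

// Aggregates (structs and arrays) are initialized with braces.
int sum_point() {
    Point p{1, 2};
    return p.x + p.y;
}

int array_total() {
    int values[]{1, 2, 3};
    return values[0] + values[1] + values[2];
}

int default_attempts() {
    RetryPolicy policy;
    return policy.max_attempts;
}

// int narrowed{2.5};   // rejected: braces diagnose the double-to-int narrowing
// int truncated(2.5);  // accepted: the value silently becomes 2

// Checks the values produced by the forms above.
void check_list_init_cases() {
    assert(sum_point() == 3);
    assert(array_total() == 6);
    assert(default_attempts() == 3);
}
```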

Bart Vandewoestyne said...

Thanks for redirecting me to Item 7. I had quickly skimmed the table of contents, but apparently missed that item ;-) I was quite sure there were things I overlooked. Thanks for pointing them out.

Andreas Quast said...

I think type deduction for auto where no type is specified (as the return value of a function or explicitly) should be ill-formed and an error from the compiler.
That is bad design: it leaves the reader with too much confusion about the type, and it lets the developer avoid expressing his intentions (see CppCoreGuidelines Philosophy P1 and P3).

When you look at the designs, the double brace is clearer and would have avoided the problems with vector (in its current design) and copy list initialization.
The current design was chosen so that you can initialise a plain array and a std::vector with the same syntax, using only one pair of curly braces:

int array[5] = { 1, 2 };
std::vector<int> vector = { 1, 2 };

And that's what leads to the problems with the size + value constructors of vector now, when you direct list initialize a vector.

But somewhere else I read a post saying that the design of the vector constructors should be changed (which is also hard with backward compatibility). I can't remember the details, but they had good points there.

And this is really a problem when you work with template code T{1,2} where you don't know what T is and what constructors T has. There I think T{{1,2}} would also have been a better design and a clearer statement of what you are trying to do.

Andreas Weis said...

One use case that I can see for auto deducing std::initializer_list from braced initializers is that you can do this:

for(auto const& i : {1, 2, 3}) {
...
}

Since std::initializer_list provides non-member begin() and end(), this is a very handy way of specifying an ad-hoc range of stuff to iterate over.

Whether this single use case is worth all the confusion is of course debatable.
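The ad-hoc range idiom from this comment, as a small runnable sketch (the function name is made up):

```cpp
#include <cassert>
#include <initializer_list>

// Iterates over a braced list directly; the range is a std::initializer_list<int>.
int sum_ad_hoc_range() {
    int total = 0;
    for (auto const& i : {1, 2, 3}) {
        total += i;
    }
    return total;
}

// Checks the sum of the three listed values.
void check_ad_hoc_range() {
    assert(sum_ad_hoc_range() == 6);
}
```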

Scott Meyers said...

@Andreas Weis: There is no need to create a general rule regarding auto and braced initializers in order to have range-based for loops be able to iterate over braced initializers. All you have to do is create a special rule that applies only to range-based fors. Range-based for loops already have special treatment for arrays, so special treatment for a braced initializer would not be breaking new ground. Furthermore, such an approach would avoid the surprising behavior and inconsistency that I mention in the blog post.