Friday, November 13, 2015

Breaking all the Eggs in C++

If you want to make an omelet, so the saying goes, you have to break a few eggs. Think of the omelet you could make if you broke not just a few eggs, but all of them! Then think of what it'd be like to not just break them, but to replace them with newer, better eggs. That's what this post is about: breaking all the eggs in C++, yet ending up with better eggs than you started with.

NULL, 0, and nullptr

NULL came from C. It interfered with type-safety (it depends on an implicit conversion from void* to typed pointers), so C++ introduced 0 as a better way to express null pointers. That led to problems of its own, because 0 isn't a pointer, it's an int. C++11 introduced nullptr, which embodies the idea of a null pointer better than NULL or 0. Yet NULL and 0-as-a-null-pointer remain valid. Why? If nullptr is better than both of them, why keep the inferior ways around?

Backward-compatibility, that's why. Eliminating NULL and 0-as-a-null-pointer would break existing programs. In fact, it would probably break every egg in C++'s basket. Nevertheless, I'm suggesting we get rid of NULL and 0-as-a-null-pointer, thus eliminating the confusion and redundancy inherent in having three ways to say the same thing (two of which we discourage people from using).
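In case the deficiencies of 0 and NULL seem abstract, here's a minimal sketch (the overloads of f are made up for illustration; which overload NULL selects depends on how your standard library defines it):

#include <cstddef>     // for NULL

void f(int)   {}       // overload taking an integer
void f(void*) {}       // overload taking a pointer

int main()
{
    f(0);              // calls f(int): 0 is an int, not a pointer
    f(NULL);           // calls f(int) on most implementations, is ambiguous on others; never calls f(void*)
    f(nullptr);        // calls f(void*): nullptr can only be a null pointer
}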

But read on.

Uninitialized Memory

If I declare a variable of a built-in type and I don't provide an initializer, the variable is sometimes automatically set to zero (null for pointers). The rules for when "zero initialization" takes place are well defined, but they're a pain to remember. Why not just zero-initialize all built-in types that aren't explicitly initialized, thus eliminating not only the pain of remembering the rules, but also the suffering associated with debugging problems stemming from uninitialized variables?
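(For the record, here's a small sketch of the status quo those rules describe; the variable names are made up, and nothing here is part of the proposal:)

int g;                 // namespace scope, static storage duration: zero-initialized today
int* pg;               // likewise zero-initialized (to null) today

void f()
{
    int x;             // automatic storage duration: indeterminate value today
    static int y;      // static storage duration again: zero-initialized today
    int z{};           // explicitly value-initialized: zero
}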

Because it can lead to unnecessary work at runtime. There's no reason to set a variable to zero if, for example, the first thing you do is pass it to a routine that assigns it a value.

So let's take a page out of D's book (in particular, page 30 of The D Programming Language) and zero-initialize built-ins by default, but specify that void as an initial value prevents initialization:
int x;              // always zero-initialized
int x = void;       // never zero-initialized
The only effect such a language extension would have on existing code would be to change the initial value of some variables from indeterminate (in cases where they currently would not be zero-initialized) to specified (they would be zero-initialized). That doesn't lead to any backward-compatibility problems in the traditional sense, but I can assure you that some people will still object. Default zero initialization could lead to a few more instructions being executed at runtime (even taking into account compilers' ability to optimize away dead stores), and who wants to tell developers of a finely-tuned safety-critical realtime embedded system (e.g., a pacemaker) that their code might now execute some instructions they didn't plan on?

I do. Break those eggs!

This does not make me a crazy man. Keep reading.

std::list::remove and std::forward_list::remove

Ten standard containers offer a member function that eliminates all elements with a specified value (or, for map containers, a specified key): list, forward_list, set, multiset, map, multimap, unordered_set, unordered_multiset, unordered_map, unordered_multimap. In eight of these ten containers, the member function is named erase. In list and forward_list, it's named remove. This is inconsistent in two ways. First, different containers use different member function names to accomplish the same thing. Second, the meaning of "remove" as an algorithm is different from that as a container member function: the remove algorithm can't eliminate any container elements, but the remove member functions can.
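Here's a minimal sketch of both inconsistencies (everything below is plain C++11; the element values are arbitrary):

#include <algorithm>
#include <list>
#include <set>
#include <vector>

int main()
{
    std::list<int> li { 1, 2, 3, 2 };
    li.remove(2);                                      // list: the member function is named remove

    std::set<int> si { 1, 2, 3 };
    si.erase(2);                                       // set: the member function is named erase

    std::vector<int> v { 1, 2, 3, 2 };
    auto newEnd = std::remove(v.begin(), v.end(), 2);  // remove algorithm: eliminates nothing
    v.erase(newEnd, v.end());                          // actual elimination requires erase
}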

Why do we put up with this inconsistency? Because getting rid of it would break code. Adding a new erase member function to list and forward_list would be easy enough, and it would eliminate the first form of inconsistency, but getting rid of the remove member functions would render code calling them invalid. I say scramble those eggs!

Hold your fire. I'm not done yet.

override

C++11's override specifier enables derived classes to make explicit which functions are meant to override virtual functions inherited from base classes. Using override makes it possible for compilers to diagnose a host of overriding-related errors, and it makes derived classes easier for programmers to understand. I cover this in my trademark scintillating fashion (ahem) in Item 12 of Effective Modern C++, but in a blog post such as this, it seems tacky to refer to something not available online for free, and that Item isn't available for free--at least not legally. So kindly allow me to refer you to this article as well as this StackOverflow entry for details on how using override improves your code.
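For readers who want the one-screen version, here's a minimal sketch of the kind of mistake override catches (the class and function names are invented):

class Base {
public:
    virtual void doWork(int x);
};

class Derived1 : public Base {
public:
    void doWork(long x);            // does NOT override Base::doWork (parameter types differ),
                                    // yet compilers accept it without a peep
};

class Derived2 : public Base {
public:
    void doWork(long x) override;   // error: override forces the compiler to flag the mismatch
};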

Given the plusses that override brings to C++, why do we allow overriding functions to be declared without it? Making it possible for compilers to check for overriding errors is nice, but why not require that they do it? It's not like we make type checking optional, n'est-ce pas?

You know where this is going. Requiring that overriding functions be declared override would cause umpty-gazillion lines of legacy C++ to stop compiling, even though all that code is perfectly correct. If it ain't broke, don't fix it, right? Wrong!, say I. Those old functions may work fine, but they aren't as clear to class maintainers as they could be, and they'll cause inconsistency in code bases as newer classes embrace the override lifestyle. I advocate cracking those eggs wide open.

Backward Compatibility 

Don't get me wrong. I'm on board with the importance of backward compatibility. Producing software that works is difficult and expensive, and changing it is time-consuming and error-prone. It can also be dangerous. There's a reason I mentioned pacemakers above: I've worked with companies who use C++ as part of pacemaker systems. Errors in that kind of code can kill people. If the Standardization Committee is going to make decisions that outlaw currently valid code (and that's what I'd like to see it do), it has to have a very good reason.

Or maybe not. Maybe a reason that's merely decent suffices as long as existing code can be brought into conformance with a revised C++ specification in a way that's automatic, fast, cheap, and reliable. If I have a magic wand that allows me to instantly and flawlessly take all code that uses NULL and 0 to specify null pointers and revises the code to use nullptr instead, where's the downside to getting rid of NULL and 0-as-a-null-pointer and revising C++ such that the only way to specify a null pointer is nullptr? Legacy code is easily updated (the magic wand works instantly and flawlessly), and we don't have to explain to new users why there are three ways to say the same thing, but they shouldn't use two of them. Similarly, why allow overriding functions without override if the magic wand can instantly and flawlessly add override to existing code that lacks it?

The eggs in C++ that I want to break are the old ways of doing things--the ones the community now acknowledges should be avoided. NULL and 0-as-a-null-pointer are eggs that should be broken. So should variables with implicit indeterminate values. list::remove and forward_list::remove need to go, as do overriding functions lacking override. The newer, better eggs are nullptr, variables with indeterminate values only when expressly requested, list::erase and forward_list::erase, and override. 

All we need is a magic wand that works instantly and flawlessly.

In general, that's a tall order, but I'm willing to settle for a wand with limited abilities. The flawless part is not up for negotiation. If the wand could break valid code, people could die. Under such conditions, it'd be irresponsible of the Standardization Committee to consider changing C++ without the above-mentioned very good reason. I want a wand that's so reliable, the Committee could responsibly consider changing the language for reasons that are merely decent.

I'm willing to give ground on instantaneousness. The flawless wand must certainly run quickly enough to be practical for industrial-sized code bases (hundreds of millions of lines or more), but as long as it's practical for such code bases, I'm a happy guy. When it comes to speed, faster is better, but for the speed of the magic wand, good enough is good enough.

The big concession I'm willing to make regards the wand's expressive power. It need not perform arbitrary changes to C++ code bases. For Wand 1.0, I'm willing to settle for the ability to make localized source code modifications that are easy to algorithmically specify. All the examples I discussed above satisfy this constraint:
  • The wand should replace all uses of NULL and of 0 as a null pointer with nullptr. (This alone won't make it possible to remove NULL from C++, because experience has shown that some code bases exhibit "creative" uses of NULL, e.g., "char c = (char) NULL;". Such code typically depends on undefined behavior, so it's hard to feel too sympathetic towards it, but that doesn't mean it doesn't exist.)
  • The wand should replace all variable definitions that lack explicit initializers and that are currently not zero-initialized with an explicit initializer of void. 
  • The wand should replace uses of list::remove and forward_list::remove with uses of list::erase and forward_list::erase. (Updating the container classes to support the new erase member functions would be done by humans, i.e. by STL implementers. That's not the wand's responsibility.)
  • The wand should add override to all overriding functions.
Each of the transformations above is semantics-preserving: the revised code would have exactly the same behavior under C++ with the revisions I've suggested as it currently does under C++11 and C++14. Two of the rewrites are sketched below.
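Here's a hedged before-and-after sketch for the first and last bullets (Widget, SpecialWidget, and lookup are made up; the erase and = void bullets can't be shown as compilable code, because they depend on the library and language changes proposed above):

#include <cstddef>                   // for NULL

// Before the wand runs (valid C++11, old eggs):
class Widget {
public:
    virtual void process(int id);
};

class SpecialWidget : public Widget {
public:
    void process(int id);            // overrides Widget::process, but doesn't say so
};

Widget* lookup(int id)
{
    if (id < 0) return NULL;         // NULL as a null pointer
    return 0;                        // 0 as a null pointer
}

// After the wand runs (same behavior, new eggs):
//
//    void process(int id) override;     // in SpecialWidget
//    if (id < 0) return nullptr;        // in lookup
//    return nullptr;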

Clang

The magic wand exists--or at least the tool needed to make it does. It's called Clang. All hail Clang! Clang parses and performs semantic analysis on C++ source code, thus making it possible to write tools that modify C++ programs. Two of the transformations I discussed above appear to be part of clang-tidy (the successor to clang-modernize): replacing NULL and 0 as null pointers with nullptr and adding override to overriding functions. That makes clang-tidy, if nothing else, a proof of concept. That has enormous consequences.

Revisiting Backward Compatibility 

In recent years, the Standardization Committee's approach to backward compatibility has been to preserve it at all costs unless (1) it could be demonstrated that only very little code would be broken and (2) the cost of the break was vastly overcompensated for by a feature enabled by the break. Hence the Committee's willingness to eliminate auto's traditional meaning in C and C++98 (thus making it possible to give it new meaning in C++11) and its C++11 adoption of the new keywords alignas, alignof, char16_t, char32_t, constexpr, decltype, noexcept, nullptr, static_assert, and thread_local.

Contrast this with the perpetual deprecation of setting bool variables to true by applying ++ to them. When C++14 was adopted, that construct had been deprecated for some 17 years, yet it remains part of C++. Given its lengthy stint on death row, it's hard to imagine that a lot of code still depends on it, but my guess is that the Committee sees nothing to be gained by actually getting rid of the "feature," so, failing part (2) of the break-backward-compatibility test, they leave it in.

Incidentally, code using ++ to set a bool to true is another example of the kind of thing that a tool like clang-tidy should be able to easily perform. (Just replace the use of ++ with an assignment from true.)

Clang makes it possible for the Standardization Committee to retain its understandable reluctance to break existing code without being quite so conservative about how they do it. Currently, the way to avoid breaking legacy software is to ensure that language revisions don't affect it. The sole tool in the backward-compatibility toolbox is stasis: change nothing that could affect old code. It's a tool that works, and make no mistake about it, that's important. The fact that old C++ code continues to be valid in modern C++ is a feature of great importance to many users. It's not just the pacemaker programmers who care about it.

Clang's contribution is to give the Committee another way to ensure backward compatibility: by recognizing that tools can be written to automatically modify old code to conform to revised language specifications without any change in semantics. Such tools, provided they can be shown to operate flawlessly (i.e., they never produce transformed programs that behave any differently from the code they're applied to) and at acceptable speed for industrial-sized code bases, give the Standardization Committee more room to get rid of the parts of C++ where there's consensus that we'd rather not have them in the language.

A Ten-Year Process

Here's how I envision this working:
  • Stage 1a: The Standardization Committee identifies features of the language and/or standard library that they'd like to get rid of and whose use they believe can be algorithmically transformed into valid and semantically equivalent code in the current version or a soon-to-be-adopted version of C++. They publish a list of these features somewhere. The Standard is probably not the place for this list. Perhaps a technical report would be a suitable avenue for this kind of thing. 
  • Stage 1b: Time passes, during which the community has the opportunity to develop tools like clang-tidy for the features identified in Stage 1a and to get experience with them on nontrivial code bases. As is the case with compilers and libraries, the community is responsible for implementing the tools, not the Committee.
  • Stage 2a: The Committee looks at the results of Stage 1b and reevaluates the desirability and feasibility of eliminating the features in question. For the features where they like what they see, they deprecate them in the next Standard.
  • Stage 2b: More time passes. The community gets more experience with the source code transformation tools needed to automatically convert bad eggs (old constructs) to good ones (the semantically equivalent new ones).
  • Stage 3: The Committee looks at the results of Stage 2b and again evaluates the desirability and feasibility of eliminating the features they deprecated in Stage 2a. Ideally, one of the things they find is that virtually all code that used to employ the old constructs has already been converted to use the new ones. If they deem it appropriate, they remove the deprecated features from C++. If they don't, they either keep them in a deprecated state (executing the moral equivalent of a goto to Stage 2b) or they eliminate their deprecated status. 
I figure that the process of getting rid of a feature will take about 10 years, where each stage takes about three years. That's based on the assumption that the Committee will continue releasing a new Standard about every three years.

Ten years may seem like a long time, but I'm not trying to optimize for speed. I'm simply trying to expand the leeway the Standardization Committee has in how they approach backward compatibility. Such compatibility has been an important factor in C++'s success, and it will continue to be so.

One Little Problem

The notion of algorithmically replacing one C++ construct with a different, but semantically equivalent, construct seems relatively straightforward, but that's only because I haven't considered the biggest, baddest, ruins-everythingest aspect of the C++-verse: macros. That's a subject for a post of its own, and I'll devote one to it in the coming days. [The post now exists here.] For now, I'm interested in your thoughts on the ideas above.

What do you think?

87 comments:

DeadMG said...

I think that if you're going to make a magic backwards compatibility wand and wave it, then the samples you've identified are a tiny tiny subset of what could be achieved. Just make a new language with the desired semantics and then use the wand to provide backwards compatibility with your new language instead.

KjellKod said...

I like it! And to be honest it's way overdue.

It wouldn't break code either if it comes with a new language version. Legacy projects are likely to be very careful about upgrading compilers and language versions.

Shea Levy said...

What about code bases that need to compile with older and newer compilers? I think before starting stage 1a, the target of the automatic replacement must be well-established (e.g. feel free to do b = true instead of b++, but might be too soon to try inserting nullptrs everywhere).

Unknown said...

Bold Stance. I like it :)

One solution would probably be for compilers to provide a "strict" mode where deprecated constructs wouldn't be available ?

I know Qt used clang-tidy to add override everywhere, with great effect (while keeping compatibility with old compilers through the use of a macro that conditionally resolves to override)

Ben Craig said...

I offer a counterexample.

https://docs.python.org/2/library/2to3.html

Despite the existence of an automated tool to transform most python2 code to python3, python2 remains quite popular.

One of the reasons I've heard cited for the difficulty in porting to python3 is getting dependencies ported as well. I can see the same argument applying here. If I am unable to port to (a hypothetical) C++50 until all the C90 headers get their act together, then I won't be able to upgrade to C++50. I'm not going to port Windows.h or unistd.h or QT or whatever on behalf of those vendors / suppliers.

Damián Lezama said...
This comment has been removed by the author.
Damián Lezama said...

Please enlighten me: why don't we have a "#pragma version" we can put in our sources and crack even bigger eggs? Nothing would break, but every new file we added to a legacy project would be written in a way superior language...

Craig Henderson said...

I agree backward compatibility cannot be an ever-increasing burden on the language, and I applaud your proposals in principle.

Standard compliance could be a big problem with the wand tool though, and I speak from experience with MSVC. Correct and compilable code that is accepted by MSVC is often not Standard compliant, so I assume Clang would reject it. Equally, MSVC cannot compile many Standard constructs, so a re-written source from Clang may not compile in MSVC. I imagine the latter would be less of an issue if Clang only made very localised changes relevant to the deprecations.

Anonymous said...

I fear that macros will make a lot of legacy code wand-proof, ruining the day once again.

That said, I'm all for deprecating awful things that have a replacement, and seeing if we can remove them 10 years later, be that thanks to automated tools, manual churn, or legacy codebases not planning to update to the latest standard anyways.

It's nice to see that C++17 will remove some old entirely superseded stuff, maybe there is some hope for ramping up the deprecations a little bit.

Rein Halbersma said...

Removing operator++ on bool has been voted into C++17, see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0002r0.html

Jussi Pakkanen said...

The major fly in this ointment is C compatibility. Being able to compile random C code as C++ is really useful and getting rid of 0 and NULL for nullptr would break that. The proper solution would be to get nullptr & co into core C but that will be even harder. C does not seem to have a group of people working to improve it through standardisation like C++ has.

John said...

Thanks for writing this. It's a fun thing to consider, and I know others have not-unrelated things in mind. It's all quite exciting really.

One of the advantages of expressing your thought in English rather than C++ is that no one can claim you meant ++b should become b = true when you actually meant (b = true). Operator precedence is one tiny example of why this whole thing isn't going to be easy.

To me it seems there's an important distinction between a standard change that turns valid code invalid and requiring a compiler diagnostic vs. leaving it valid but with a different meaning at runtime. It shouldn't be hard to guess that I'd strongly prefer the former, but for thoroughness here's why I'm thinking that:

* No one has to die. These deaths only come from runtime bugs.
* Someone might forget to run clang-tidy. Compiler errors make that a non-option.
* Requiring that the compiler detect the situation makes it not insane to think that many compiler vendors will provide an option (or separate binary from the same sources) that will fix your source for you. Point being if everyone has to implement detecting it, and the fix is straightforward enough, the tools for automatically fixing it may become abundant and convenient.
* The other category of changes can still be made, they just have to be broken into multiple decade-long steps.

For what it's worth I think there's plenty of other changes we could make if we're willing to take this approach.

How about const being the default and non-const being a (better-chosen) keyword?

I know some people advocate statically-checked throw specifications similar to what is found in Java, and that others hate the idea on the face of it. But it's always been a purely academic discussion because there's no way we could make such a radically breaking change. Or can we?

Martin said...

"break exiting programs", perhaps not what you intend to say.

Daniel said...

Jussi Pakkanen:
in case of 0/NULL/nullptr this can be easily worked around to compile C as C++, I think:

#if defined(__cplusplus) && __cplusplus >= 201103L
#undef NULL
#define NULL nullptr
#endif


but I agree that being able to compile C code as C++ is valuable, especially for libraries.
And so is supporting both old and newer versions of C++, also for libraries. As long as supporting C++98 is a thing, updating libs to only support C++11 and newer is a bad idea.
Adding some hacky #defines for NULL and override might be doable, but workarounds for renamed methods are harder/more invasive.

Arseny Kapoulkine said...

Is there an actual benefit to nullptr that's so important as to break compatibility?

Let me rephrase that.
Imagine a C++11 program that consistently uses nullptr instead of 0/NULL.
Now imagine I do a search & replace of nullptr with 0.

What's the absolute worst thing that can realistically happen?

Rein Halbersma said...

@Arseny Kapoulkine: suppose you have two overloads: void fun(int) and void fun(double*). What should fun(NULL) do? It would call fun(int) if NULL was defined as 0, but it would be ambiguous if NULL was defined as 0L. In contrast, fun(nullptr) calls fun(double*).

Anonymous said...

Hi Scott,

I'm a big fan of yours but I feel like I'm missing something really basic here.

So you want a magic wand, that when run will magically update someone's code to a newer version of the standard. So this is a tool which effectively has to be a backwards compatible compiler for a language, yet exists so that the language itself- and the compilers which implement it- does *not* have to be backwards compatible.

First off: as someone who has worked in "modernization" companies producing tools that magically upgrade old code bases, those tools by their very nature do not work as perfectly as you describe. There is always hardship and pain and manual labor.

Second: we have those tools today. They are called C++ compilers, which are backwards compatible because the language is backwards compatible.

If you want a new version of the language that is not backwards compatible, that also exists in the form of the warnings given off by most compilers. If you don't allow any warnings when compiling, you effectively have a non-backwards compatible version of C++. Additionally, the static analysis tool demo'd by Herb Sutter at CppCon seems to take things another several steps in the right direction.

So let me get to my point. If a company making pacemakers won't make builds pass with zero warnings or update their CI process to involve additional static analysis tools even when the world's foremost C++ consultant says to, they for damn sure aren't going to deal with upgrading to a backwards-incompatible compiler that advertises breaking changes even if some magic wand tool claims to exist. The day C++0z shipped without support for the version their code used would be the day they never upgraded C++ again.

MAJOR KUDOS to the person who mentions Python 2 / 3 above. As someone who spends all day working on Python code I can guarantee you that the grass is *NOT* greener on the other side. There is nothing really that sexy about Python 3 compared to 2, yet the thought leaders of that community thought it was crucial they violate backward compatibility to achieve their goals. In the end they left behind everybody who had a large and already working code base, as well as people who enjoyed or needed to be using libraries that did not yet work in Python 3. I have literally never met a Python fan in real life who strongly advocated dropping 2 in favor of 3. Even people who are in love with the language act like Python 3 is idiotic because it's just different enough to make things a huge pain in a language that is all about ease of use.

If C++11 had taken the same path as Python 3, nobody would be using it today but a small fraction of the C++ community that didn't care about backwards compatibility and could afford to update everything, including their libraries. In that nightmare of a parallel timeline C++11 would be competing with other languages like D, Haskell, Rust, etc, because if you used C++11 you would already need to write so much from scratch that you might as well consider something completely different.

So the best state of the world is to leave things optional, meaning we can have our cake and eat it too, not alienating the legions of devs who can't or won't update 100% of their code while still keeping them involved with the language itself while the rest of us can continue to apply best practices, use ever better static analysis tools and get to the world you're describing for free.

Arseny Kapoulkine said...

@Rein Halbersma: Yeah, sure, but that is not catastrophic - you just get an ambiguous overload error?

I can imagine weird overloads being picked up - I think I have seen code like this once or twice:

std::string s(0);

Where the developer intended to do a number to string conversion but got a crash; but these cases almost never come up in my experience.

Anonymous said...

If you want to break some eggs, why not do
#undef NULL
#define NULL nullptr

"NULL" is three characters less than "nullptr", so let's save some typing.

In my opinion, C++ breaking C's "NULL", and requiring "0" was the original problem. Why not fix it? Requiring yet another way to declare a null pointer just seems silly.

Björn Michaelsen said...

Everyone suspicious about the magic wand (clang plugins and rewriters) should have a deep look at:

https://github.com/LibreOffice/core/tree/master/compilerplugins/clang/

and

https://github.com/LibreOffice/core/tree/master/compilerplugins/clang/store

for stuff that is used in production today. The future is now.

Unknown said...

If I may,

Being a (recent) member of WG21 (the C++ standard committee), what I can say from your (interesting) post is that the committee already pretty much follows a path like the one you recommend here.

In Kona, a number of features that were already marked as deprecated were removed from C++ starting at C++17, including such wonderfully (!) weird things as operator++(bool), which behaves differently in C and C++ and for which there was no operator--(bool) in C++ anyway. In some cases, there was a push to remove features that we could legitimately place in the «bad eggs» basket, but that was blocked due to the fact that these features had not been marked as deprecated previously. This might not be the proper way to do things according to some, but it does make sense from the perspective of some not-insignificant parts of the industry, and the removal of such features will probably happen around C++20, after «deprecated» has had a chance to sink in.

Clang tools indeed do a very good job, and I'm glad you mentioned them. I doubt the standard will advocate specific tools, but proof-of-concept is a good approach. We need more tools like these.

I think, thus, that what you are suggesting is close to what's actually being done. WG21 is a big group with various interests and viewpoints, which slows it down according to some perspectives but I actually think it's a good thing.

Cheers!
Thanks for the interesting suggestions!

Unknown said...

Python is not a good comparison.

The move from Python 2 to Python 3 involved semantic changes for which automatic translation was impossible. There were several major changes which were effectively impossible to see syntactically, and which required major modifications of logic. In many cases, the fallback option an automatic conversion would require just didn't exist.

Further, Python also changed the C API. This meant that any C library (and there were a lot) needed to go through a manual migration process, and for some of these this took a long time and a lot of work. There is certainly no automatic tool for that.

Discussing only those changes verified to have perfectly semantics preserving automatic transformations makes this a totally different ballgame. Whether this is the right option is, of course, a different question.

Unknown said...

Suppose you had such tools, compiler and translator. Their composition, compiler · translator, would be exactly identical to a backwards compatible standards revision.

The only difference, then, is that with the two separate tools one would have to run the translator over one's code - and its dependencies too. Having them pre-composed, as we do, with an optional clang-tidy step is functionally identical and much less likely to aggravate or cause issues.

One should thus focus not on separating them, but in making changes in the C++ standard purposely compatible with such code transformations. That, to me, is the best of both worlds.

Jeremiah said...

I'd really like to see something like the Python 2 -> 3 break. Along the lines of the first C++ after 2020 becomes C++2, and has no guarantee of backwards compatibility, and just sweeps out a whole bunch of crap.

It's incredibly frustrating that there are so many situations where the simple and obvious code is now considered bad practice. To make things worse, the new and improved ways of doing stuff are usually complicated and ugly because they had to be crammed in some way that would be backwards compatible. It's negative reinforcement for people to do things the wrong way. The correct code should be simple and obvious. If I need to do something unusual, I don't mind jumping through hoops, but I shouldn't have to jump through them all the time for everyday stuff.

The reality is there are 30 years of books, and 20 years of internet tutorials telling people to use C++ in ways that are frowned upon now, and those aren't going away. As new people start using the language they're just going to add to the backwards compatibility burden when they see some 15 year old tutorial and all the code still works.

rnburn said...

For the override proposal, I don't think a magic wand is possible. With templates, you can have code that would be invalid with override and (under proposed rules) invalid without it. Here's an example: http://melpon.org/wandbox/permlink/9DMqAwWXlrpUKBrQ
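Here's the shape of the problem (a minimal sketch with made-up names, not the code behind the link):

struct WithVirtualF    { virtual void f() {} };
struct WithoutVirtualF { };

template<typename Base>
struct D : Base {
    void f() {}        // overrides only when Base declares a virtual f
};

D<WithVirtualF>    a;  // f overrides here, so a mandatory-override rule would demand override
D<WithoutVirtualF> b;  // here writing override on f would be a compile error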

How would the new uninitialized memory rule affect arrays? or the special rules around dynamically allocated PODs (http://stackoverflow.com/questions/620137/do-the-parentheses-after-the-type-name-make-a-difference-with-new)

Nevin ":-)" said...

If we are going to break backwards compatibility, this seems like a half-measure:

int x;              // always zero-initialized
int x = void;       // never zero-initialized

Why should we still allow

int x;

to compile at all? The problem is that allowing the implicit zero initialization can still hide bugs that sanitizers would no longer be able to find. All it does is make buggy code produce repeatable results, which can make it harder, not easier, to find the bug.

How about making it so you always have to specify an initializer or void?

Unknown said...

I would also use this chance to remove two-phase name lookup and replace it with something more sane. Two-phase name lookup is what requires you to write this->m_x instead of m_x if and only if m_x was declared in a base class that has a type parameter (template).

It makes absolutely no sense that accessing base-class members is different depending on whether the base class has a type parameter or not. When I write m_x, I don't care what the base class looks like. Having to put 'this->' everywhere pollutes the code badly.

The current two-phase name lookup feels like a compiler writers' cop-out. The MS compiler shows that one can do without. This is the first rotten egg I would address in C++.

RobertoParolin said...

The semantics for uninitialized variables 'int a = void' is nice. I'd prefer something that's easier to grep for though... especially in the context of a magic wand tool that is going to automatically update my source.

Scott Meyers said...

Thanks to everyone for the comments to date. I won't try to address each individually, but I will try to offer a few clarifications:
- Tools such as I propose would be useful in practice only if they handle real source code, and that means being able to process platform-specific extensions. Few, if any, nontrivial systems are written in 100% pure ISO Standard C++. I don't know how well current Clang-based tools handle this, but Microsoft already has some support for Clang in Visual Studio, and if they haven't done it yet, I'd be surprised if they didn't do what it takes to get their version of Clang to handle MS-specific language extensions. (A wonderful article about the difference between "standard" programs and "real" ones is A Few Billion Lines of Code Later: Using Static Analysis to Find Bugs in the Real World.)
- The situation as regards Python 2 and Python 3 is a cautionary tale for the Committee, and I'm sure they're well aware of it. If the Committee tries to lead where the community as a whole doesn't want to go, the result is likely to be a splintered community. To avoid that problem, you avoid breaking changes that are not widely accepted by the community. Clang-based tools can demonstrate to the community that a change they already accept in concept is practical, but the tools alone are unlikely to build consensus about breaking changes that important parts of the community oppose in concept.
- I agree that the need to compile some headers under both C and C++ would complicate attempts to get rid of NULL. Perhaps C++ could be specified to define NULL only if another preprocessor symbol--something like __C_COMPATIBILITY--is defined. This wouldn't break new conceptual ground. The behavior of the assert macro already depends on whether NDEBUG is defined.
- I'm pleased to hear that it looks like C++17 will finally get rid of ++ on bools, but the fact that it will have remained deprecated for nearly 20 years underscores my point that the Committee is currently very conservative about removing language features.

More later when I have more time. I'm truly grateful for all the feedback.

Anonymous said...

This might be a minor detail given the intent of the article, but as former C programmer, this seems like a very strange attempt at revisionism to me:

"NULL came from C. It interfered with type-safety (it depends on an implicit conversion from void* to typed pointers), so C++ introduced 0 as a better way to express null pointers."

This is just flat out wrong. From the very beginning (in C), the null pointer was the integer "0". NULL is just a preprocessor macro that turns into "0" to distinguish (in the code, not in the compiler) between an integer and a null pointer. See also http://c-faq.com/null/index.html

I wouldn't usually bother with such a statement on the Internet, but from Scott Meyers? C'mon!! :)

beelz said...

I think this would be great in getting C++ to be more competitive with the newer languages nipping at its heels that aren't burdened with design mistakes made 20 years ago.

The one place where it becomes challenging is third-party libraries as someone mentioned because unless you fork them, you have to wait for them to update their code. And they may be prevented from upgrading due to another customer.

For example, imagine a third-party header:

struct bar { int x; int y; };

bar foo() {
    bar b;
    b.x = 5;
    return b;
}

If compiled under the new mode, this would give y a value of 0. If compiled in the old mode this would be an uninitialized value.

If you did also run the tool on that header, then the problem is that now you've had to fork the 3rd-party library.

I don't think it's an unsolvable problem. Perhaps it can be managed with compiler switches that only apply the new behavior on files that exist within certain paths or have a pragma & fall back to old behaviour otherwise.

Alternatively, it's quite possible this just becomes a non-issue with modules but it's going to take a few years to get good penetration with modules.

Germán Diago said...
This comment has been removed by the author.
Johan Tibell said...

+1

In Haskell we didn't do this and the new improved spelling for lots of things is now

#ifdef
Old
#else
New
#endif

Unknown said...

Couldn't disagree more.

Your criticism is inconsistent and picks on issues that hardly matter in real life programming.

Uninitialised arrays? If every local array is zeroed, you won't make many friends with those who use C++ for writing highly optimized code. Just as we despise nanny-states we should not make our compilers act like nannies for us by taking care to initialize every variable for us.

Backwards compatibility also means compatibility with C; algorithms should work on bare-metal embedded systems as well as on a modern CPU.

Against macros you fail to provide any argument but your personal dislike, and I am happy to know that these will never go. As soon as some committee favours their abolition, I shall happily roll my own preprocessor. Macros are a powerful thing of C/C++ that most other languages fail to provide. If for some undisclosed reason you don't like them, there is an easy solution: don't use them.

Zenju said...

The Python 2/3 argument IMHO is a showstopper argument against introducing versions of C++ code (once the wand did its job, the code should not be compiled with an older compiler anymore).
So the most basic requirement should be that by default all old code is compiled with backwards-compatibility just as it is right now.

However (!) there is no reason why proactive C++ developers who want to get rid of the old cruft cannot be allowed a compiler option to say so.
By default, all the old constructs should cause a special compiler warning like "ISO deprecated", which all the legacy maintainers are free to ignore. However everyone who systematically wants to write modern constructs now has a tool to find and fix all the places in his code.

Knowing me, knowing you, a-ha said...


"Your criticism is inconsistent and picks on issues that hardly matter in real life programming" - how can Scott Meyers know what a real life programming is. He is an amateur who never wrote a line of production code, not even wanted to do so.
He is a sneak oil salesman, that's all he is.

Unknown said...

The various ways, old and new, to write code make it harder to teach/learn. It also reduces the coherency of a code base shared by people with various habits, and adds a useless cognitive load when reading code. Also, newcomers may not be very receptive to the backward compatibility necessity, it's difficult to accept that one must know all the language's rusty constructs...

My suggestion in N points.
First, the committee should define a set of syntactic constructs, covering features that can be written in several ways (such as defining 3 syntactic constructs for null pointer initialisation, one for "= NULL", one for "= 0" and the last one for "= nullptr")
Step 2: the committee marks a subset of those syntactic constructs as deprecated.
Step 3: compilers should warn when using deprecated syntactic constructs
Step 4: ignored by legacy code maintainers, enjoyed by everyone else.

Another example, for the [default initialized value]: detect "int i;" as one (deprecated) syntactic construct, and "int i = void;" as a preferred syntactic construct, with equivalent semantics.

- I'm not a compiler developer, at all, but I feel like it's not the hardest thing to implement (sorry if I'm wrong, I'm just ignorant).
- I feel like nothing is broken in legacy code as nothing is removed, added or modified in the language itself
- the standard is shared by everyone as the deprecated syntactic constructs are in the ISO text.
- easy to handle as a newcomer, or a more experienced programmer cause we are all used to warnings. Something like "x.cpp, line:42 : prefer using int* i = nullptr, instead of the deprecated form" would be great.

I'd love a feedback on this :)

Scott Meyers said...

@Anonymous: Regarding C, NULL, 0, and C++, this is from Bjarne Stroustrup's 1994 The Design and Evolution of C++, page 230:

"Unfortunately, there is no portable correct definition of NULL in K&R C. In ANSI C, (void*) 0 is a reasonable and increasingly popular definition for NULL."

"However, (void*) 0 is not a good choice for the null pointer in C++. ... A void* cannot be assigned to anything without a cast. Allowing implicit conversions of void* to other pointer types would open a serious hole in the type system."

I believe this backs what I wrote in the blog post. I'm certain that when I was working with C++ in the late 1980s, C's NULL was often defined as (void*)0, and if this Wikipedia article is correct, "In C, ... the macro NULL is defined as an implementation-defined null pointer constant, which in C99 can be portably expressed as the integer value 0 converted implicitly or explicitly to the type void*."

Scott Meyers said...

@Martin Moene: Typo fixed, thanks.

Knowing me, knowing you, a-ha said...

@Scott Meyers
You lack not only practical knowledge but also academic knowledge. Nobody uses Wikipedia to back their arguments. Unless you're an amateur. Then you do things like you just did.

Scott Meyers said...

Here are a few more general remarks motivated by comments that have been posted:
- C++ weighs backward compatibility highly, but the Committee has been willing to introduce breaking changes when they felt it was worth it. As I mentioned, C++11 changed the semantics of auto and introduced a number of new keywords. (Introduction of new keywords is always a breaking change.) The new idea in my post is not that the Committee introduce breaking changes, it's that they consider being more aggressive about it by taking into account how the impact of such changes can be mitigated by Clang-based source-to-source transformation tools.
- Experience shows that relying on compiler warnings to inform programmers about "bad" practices is not reliable. In Item 12 of Effective Modern C++, I show code with four different overriding-related mistakes (i.e., valid code where derived class functions look like they should override base class virtuals, but don't). I then write: "With two of the compilers I checked, the code was accepted without complaint, and that was with all warnings enabled. (Other compilers provided warnings about some of the issues, but not all of them.)" To date, the Standardization Committee has shied away from addressing compiler warnings, so there is no such thing as a "mandatory" warning.
- If C++ were to adopt zero-initialization by default, I'd expect a provision for opting out in every context where the current language doesn't require initialization (e.g., arrays, heap objects without constructors, etc.). What the opt-out syntax would be for the various contexts, I don't know, though the first place I'd look would be D to see what it does.

Scott Meyers said...

@rnburn: Good point about templates and the indeterminacy of whether a function should be declared override. I don't know if that's necessarily a deal-killer for requiring override on overriding functions, but it's certainly a notable obstacle. Thanks for pointing this out.

Anonymous said...

As the teenagers say, "True Dat."

Also, I think the "pacemaker example" is a bit of a strawman argument. No one building a pacemaker changes any part of their toolchain without a complete regression test.

Death concerns don't need to be a consideration when changing C++. The "keep them living" responsibility lies solely with the people that are creating safety-critical devices. Either they see the benefit of the new version and then update all of their code to conform (and test that they have done so), or they stay on the older version.

Paul Jurczak said...

I'm wholeheartedly with you on cleaning up C++ and breaking some legacy C compatibility. But if you want to break all the eggs, I suggest you go much further. Lets drop all the crud and leave "the much smaller and cleaner language struggling to get out" Bjarne is talking about. This will be painful, but probably not as much as Python 2 vs. Python 3 struggle, which accomplished not that much considering the ramifications. Breaking backward compatibility is a serious business, so let's break it good.

Anonymous said...

I'm pretty sure DEC C on VAX/VMS is just one example of a compiler that used to define NULL as (void*) 0. To "Knowing me, knowing you, a-ha": Scott has written lots of good books that I've found incredibly useful so I don't care what Scott's academic or employment record is - he does good work and that's all that matters. Leave Scott alone and get your facts right.

Scott Meyers said...

@Anonymous: Yes, my use of the pacemaker example and the risk of people dying was simplified and exaggerated for dramatic effect. I'd expect any company developing safety-critical systems to employ extensive regression testing any time any part of the build process changed. In addition, I'd expect such companies to employ detailed code reviews for any kind of change to their safety-critical source code. Adoption of any new compiler version presumably means that the company incurs the costs associated with regression testing, but breaking changes to the language may additionally cause such companies to incur costs associated with changes to their source code, which, for all I know, involve not just internal code reviews but also new rounds of government certification.

My fundamental point is that revising C++ such that old code requires modification in order to retain its semantics can have a dramatic and costly impact on the language's users. This is why the Standardization Committee is very reluctant to adopt breaking changes.

Krzesimir Nowak said...

@Arseny Kapoulkine: If that C++11 program uses a C interface of some library then the problem might exist when C interface exposes a function with variadic parameters. On 64bit platforms passing a nullptr will send 64 bits set to zero (or to any pattern interpreted as a NULL pointer). Passing literal zero - usually only 32 bits, as literal numbers are usually treated as ints and usually int is 32 bits wide even on 64 bit platforms. So replacing nullptr with a 0 in this case will end with C library expecting a pointer reading 32 zero bits and 32 bits of garbage. An example fix done by me: https://github.com/frankosterfeld/qtkeychain/commit/a60acabf1c57cb63b9addc285bed7f4ff0b12abc

Passing NULL would probably solve it too. But that's another problem - sometimes you can pass NULL, but not 0, and probably sometimes you can pass 0, but not NULL. With nullptr, you don't have that problem.
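A minimal sketch of the failure mode (take_ptr is a made-up stand-in for a C-style variadic API; the last call is the undefined-behavior case described above):

#include <cstdarg>
#include <iostream>

void take_ptr(int count, ...)
{
    va_list args;
    va_start(args, count);
    const char* p = va_arg(args, const char*);   // reads a full pointer-sized argument
    std::cout << (p == nullptr) << '\n';
    va_end(args);
}

int main()
{
    take_ptr(1, nullptr);                        // pointer-sized null passed: well-defined
    take_ptr(1, static_cast<const char*>(0));    // explicit pointer: also fine
    take_ptr(1, 0);                              // int 0 passed, pointer read: undefined behavior
}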

Knowing me, knowing you, a-ha said...

@Anonymous
"Scott has written lots of good books that I've found incredibly useful so I don't care what Scott's academic or employment record is - he does good work and that's all that matters. Leave Scott alone and get your facts right."

I got my facts right. His books? Books written by an amateur, who never wrote production code. Those are **facts**.

I prefer books written by professionals who actually wrote production code in their lives. Like Alexandrescu or Bjarne.

ReBoot said...

Amen brother. C++ can be an elegant language, if one consistently avoids the old and embraces the new features. While I can do that myself, it's not nice to read someone else's code and find all the old idiosyncrasies. It would be great to get a C++ compiler that simply refuses to support the legacy stuff unless some obscure #pragma is set. Or at least a #pragma to support only new features that I can set.

CdrJameson said...

Well, you could also drop K&R compatibility, some of the freaky ways of calling functions and I'm guessing C-style casting would get the purge.

I'm pretty sure you'd just split the development community.

While all these things are lovely in theory, in practice there are large areas where such modern innovations as STL and exceptions are still a bit racy and C++11 is just crazy talk (and for practical reasons too, not just techno-dinosauring. Good luck doing all this if Visual Studio 2005 is the only compiler that works on your codebase).

Given that only about one in five developers is even interested in using no-brainer static analysers, or paying attention to compiler warnings, they aren't going to take kindly to forced backwards incompatibility.

Anonymous said...

Dear internet troll of literally gigantic proportions named "knowing me knowing you,aha",

Could you please do yourself - and by that, us as well - a favor and consult psychiatric assistance as soon as possible? In your dissing of every single post of Scott Meyers - for weeks or months now - you are resembling a Tibetan prayer wheel, constantly repeating the same phrases over and over again as a response. It is clear to everyone in here - except you - that your issues are serious and probably pathologic. Get help!

Should you opt for ignoring my sound advice, rest assured instead that the whole community reading the blog has already recognized that you are obviously a production code writing genius who knows everything better than the entire abysmal rest of the "programming world", or at least better than Scott Meyers.
So, there is actually no need for you adding further comments, right?

Anonymous said...

Scott:
I need to save this blog so I can look at your book. At least for me, I am happy to see that you have written something on the subject of making C++ robust and not offended at the shameless plug.

I surely hope you are talking about the programmer device for pacemakers and not the actual pacemaker inside someone's body. I worked for Intermedics until we got bought by Guidant on Monday and shut down on Tuesday. We had a project at that time that was being written in C++ and it was likely the compiler did not even have a standard year attached. I was never comfortable with that project given the really ugly tendencies of both compilers and software engineers to do awful things in code. The ugly things in compilers were behind the push for standards in both C and C++!

The actual pacemaker likely has so little memory and power that it would be very strange to be written even in C (but more likely after 16 years of improved technology). It is more likely that the pacemaker code is still being written in assembler and the whole program is likely less than a few thousand lines.

I am confused by your assertions. It would be *very* unlikely once a device is released to production that the compiler would be changed to a newer version. Medical device software that is done properly must undergo massive amounts of verification and validation before it is released. Changing the compiler would require that the compiler itself be exhaustively validated against the old compiler and then the verification and validation of the device would be required to be repeated. That whole process would likely cost hundreds of thousands of dollars (perhaps even a million) in engineer/clinician time to verify that the device is still safe and effective.

It is very likely that all properly managed medical device companies continue to use the initially validated compiler for a *very* long time. As an example, when I worked in arthroscopy, we used the same C compiler for our micro-controllers for 6 years before we even entertained updating to the very latest. And arthroscopy is not nearly as mission critical as pacemakers.

If the company you did contract work for was not that diligent, I would sure like to know who it is so I can tell my Dad to decline to use that manufacturer's pacemakers.


Scott Meyers said...

@Anonymous: Please see my earlier comment that starts with "my use of the pacemaker example and the risk of people dying was simplified and exaggerated for dramatic effect" and concludes with "My fundamental point is that revising C++ such that old code requires modification in order to retain its semantics can have a dramatic and costly impact on the language's users."

Anonymous said...

I definitely missed your point. What I said amplifies your point that changes can have a dramatic and costly impact.

I tend to think of each new version of the standard as a new language and thus there is less reason to have backward compatibility. The whole point of a new version of the standard is to create a new version that is better than the last. C++11 and C++14 are *not* the same language. Driving a stake through the heart of things like ++ on a boolean should absolutely break old code. The idiots who did that operation deserve as much pain as possible since a numeric operation on a boolean has no meaning. FALSE++ and --TRUE are just plain stupid.

I like your suggestions on what "features" to just kill.

I am looking forward to getting your book.

Nevin ":-)" said...

Anonymous CdrJameson said...

"Good luck doing all this if Visual Studio 2005 is the only compiler that works on your codebase)."

If you are stuck using an older toolchain for the foreseeable future, why does it matter what direction future C++ takes? People will inevitably embrace new features, making their code incompatible with your toolchain.

Ben Craig said...

"Driving a stake through the heart of things like ++ on a boolean should absolutely break old code. The idiots who did that operation deserve as much pain as possible since a numeric operation on a boolean has no meaning."
Those people are usually not the ones affected. The people that inherited their code base typically are. Those people may be unable to convince management to upgrade because the new version has too many incompatibilities.

"If you are stuck using an older toolchain for the foreseeable future, why does it matter what direction future C++ takes? "
If the standard breaks a lot of things, then upgrading becomes expensive, making it even more likely that you stay stuck with the old toolchain.

The new C++11 features are absolutely better than what they replace. Were the old features so broken and error prone that they need to be broken at compile time though? How many bugs will that fix?

If the goal is to fix / prevent bugs, then I can definitely see significant justification for changing the behavior of uninitialized variables, and possibly see justification for forcing override. I think that those are two of the more expensive changes though.

ScarvedOne said...

People who keep wanting to turn C and C++ into Java need to rethink their priorities or just switch to Java. One of the first descriptions I ever heard of "What is C?" (this was back in the early 90's) was "Gentleman's Assembly". You're trying to morph a language that had a philosophy of "trust the programmer" into a philosophy (much agreed upon by Java developers everywhere) that "Programmers are too stupid to handle pointers and other language features, so let's take those features away from these incompetent developers". When you start believing in your own incompetence, you're going to be incompetent.

You're trying to make a system that can prevent developers from doing things that might create bugs. You might as well try to get people to use adverbs correctly when asked, "How are you doing?" Five out of six well educated checkers at my local grocery store answer using "well". People in general tend to answer using "good". I get the feeling there are folks in your community of readers who would like to have such people outfitted with shock collars.

Adam Romanek said...

Interesting blog post. Thanks Scott!

As a matter of fact, I had thoughts on this subject some time ago; I even posted a question on StackOverflow [1], as I had no idea where else to ask... I'm not sure my idea would be a step in the right direction, but I agree this part of C++ needs improvement.

[1] http://stackoverflow.com/questions/28572642/has-anyone-ever-considered-a-more-strict-flavor-of-c-in-which-variables-are

Scott Meyers said...

@Adam Romanek: I think many people agree that unintentionally uninitialized memory is a problem waiting to happen, but there are also many who feel passionately that initialization of built-ins should be optional. I like D's solution (zero-initialization by default, no initialization by request), and I think that Clang makes it possible to create a path from the current C/C++ approach to the D approach in a way that breaks no code and forces nothing on anybody (other than a slightly different syntax for new code that wants to avoid zero initialization).

Craig Henderson said...

I'm struggling with default initialisation. Is the argument that defined is better than undefined, or that zero-initialised is 'best'? I don't have experience with languages that default-initialise to zero, but my thinking (possibly biased, because I grew up with C and then C++) is that best practice should be to explicitly initialise before use regardless. An object is default-initialised to a state via its ctor, and conceptually has a 'good' default state defined by the implemented behaviour. But the same isn't true for a fundamental type -- what makes 0 a better default than -1, std::numeric_limits<>::min() or max(), or 18374? The more I think about this, the more I think that default-uninitialised is actually the right choice :)

Nevin ":-)" said...

Scott, what are your thoughts on sanitizers? If there is broken code because of an uninitialized value, that code may still be broken if you implicitly initialize it to 0, but you can no longer use a sanitizer to look for it, because the sanitizer cannot tell the difference between accidental implicit zero initialization and deliberate implicit zero initialization. This is why, as I mentioned earlier, I'd rather just ban default initialization in these circumstances.

Scott Meyers said...

@Craig Henderson: My argument for using 0 is consistency. Both C and C++ already perform zero-initialization on objects of static storage duration. Aggregates that are list-initialized and given too few initializers use 0 as a default value for the members with no corresponding initializer. In the STL, vectors of built-in types are zero-initialized by default.

Like it or not, 0 is the default initialization value in many places in C++. I think there's much to be gained and nothing to be lost by extending this from "many places" to everywhere--with the understandings that (1) there will be a way to opt out of zero initialization where you don't want it and (2) there will be a practical way to migrate legacy code to the new default-initialization rules without any change in semantics.
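To make those existing cases concrete, here's a small sketch (the names are just for illustration):

#include <vector>

int g;                        // static storage duration: zero-initialized

struct Point { int x, y; };

int main()
{
    Point p{1};               // too few initializers: p.y is 0
    std::vector<int> v(3);    // vector of a built-in type: all elements are 0
}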

Scott Meyers said...

@Nevin ":-)": In this post, my interest is in a way to change the language in a way that preserves full backward compatibility, so any context in which sanitizers are currently useful would remain a context where sanitizers would be useful on programs that had been transformed. Note that due to my focus on maintaining strict backward compatibility in this post, nothing I'm proposing would cause existing code that currently has an uninitialized value to become implicitly zero-initialized.

My sense is that you'd like to get rid of implicit zero initialization entirely, and I believe that that, too, is something amenable to "magic wand" legacy program transformation: have a Clang-based tool replace all code that currently gets implicitly zero-initialized with the corresponding syntax for explicit zero initialization.
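As a rough before/after sketch of the kind of rewrite such a tool might perform (the variable name is made up):

// Before migration (implicitly zero-initialized today, because it has
// static storage duration):
//     static int counter;
//
// After migration (same semantics, now spelled out explicitly, so a later
// change to the default-initialization rules couldn't silently alter it):
static int counter = 0;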

Anonymous said...

Backwards compatibility is indeed necessary and desired, but C++ has grown too big, and for new code there are lots of bits that one should be able to enable only selectively. Pretty much everything in Effective C++ (or a more modern equivalent) that can be automatically checked should be checked, and it should be done by default. It's far too easy to write incorrect code (declare a unique_ptr, move from it, access it, and watch it dump core -- something that should have been caught statically). I should not have to read three books of C++ gotchas to get some basic code written.

The language doesn't need to change, but the compilers should help you pick a more sensible language subset suited to your task. Libraries would need to change to allow this. If you then want to enable a language feature for your specific use case, go ahead -- in a scoped manner.

// disables things that make it easy for you to shoot yourself in the foot.
lang mode strict;

// modern code, sensible language subset

lang enable c_varargs;
// some code that interfaces C
lang disable c_varargs;

As an aside, it's also INSANE that -Wall on gcc doesn't really mean all warnings. This, again, is for backwards compatibility. No: if I say everything, I really mean everything. If I'm upgrading the compiler and I want yesterday's -Wall, then it should be versioned: -Wall-gcc49.
Otherwise, what's the point of all those compiler devs spending their time and effort trying to make my life easier if it's so hard to access their efforts?


At the same time, as someone who spends quite a lot of time writing C99, it's also incredible that C++ is not a superset of C. I know that's not completely possible, but a number of things differ when there is clearly no need for them to.

Nevin ":-)" said...

You are basically allowing more things to legally compile and run, so there is less checking that compilers and sanitizers can do.

Example #1: the following is diagnosed at compile time when warnings are enabled:

int i;
std::cout << i << std::endl;

If the behavior were changed so that i were initialized to 0, this code would have to compile cleanly.

Example #2: this code compiles but is caught by the sanitizer:

void foo(int& ii) { std::cout << ii << std::endl; }

int i;
foo(i);

With your proposed change, sanitizers could no longer diagnose such code, as it would be perfectly legal.


So, why make the change?

Reason #1: this bug is common in the wild. Unfortunately, if we make this change, detecting such a bug becomes harder, not easier: we can no longer use tools like sanitizers to find it, because the previously erroneous code would now be well defined. It seems like a big presumption to assume that all the uninitialized values were meant to be 0.

Reason #2: this is a common mistake for beginners. But beginners (as well as experts and everyone in between) ought to be using sanitizers.

On the whole, this kind of change seems to mask bugs instead of preventing bugs. What am I missing?

Scott Meyers said...

@Nevin ":-)": I'm proposing changing C++ only if there is an essentially foolproof way to migrate legacy code with no change in semantics. If you have code where you want the current behavior, migrate it using the migration tool (during the decade or more where the practicality of the change is being considered), and nothing will behave differently. If you choose not to use the migration tool and the language is changed, then the semantics of your program may change.

You're essentially arguing that C++ should retain the current rule whereby some memory is implicitly uninitialized on some platforms some of the time so that the subset of users who use sanitizers can have those sanitizers diagnose problems arising from reads of uninitialized memory. That doesn't make a lot of sense to me.

My guess is that if all memory holding built-ins were zero-initialized by default, sanitizer implementers would find a way to identify zero-initialization writes and offer an option to disregard them when looking for reads of uninitialized memory. If that were to happen, sanitizers could offer the same functionality they offer now.

Anonymous said...

I'm inclined to agree with Nevin on default initialization not being 0, at least with regards to simple stack types like int.

I don't want to look at a declaration and assume that the author intended to initialise to zero; I'd rather know explicitly.

I also often find bugs precisely because the values are random each time rather than zero: sooner or later the bug results in a crash. Whereas with 0, if the bug doesn't cause a crash immediately, it becomes an ongoing but subtle error; the value is consistent and often goes unseen.

One could argue it the other way, though: that you'd rather auto-initialize to 0 so that, if that wasn't intended, you can better predict what will happen (or has happened) to the data/system.

If one prefers that latter view, one could say that default initialization to zero is done, but the compiler should still not let it affect its compile-time analysis of errors. It will likely affect runtime behavior, though.

But overall, so far, I'm not in favour of automatic initialization, at least of simple types like int. I want the compiler and the user to be able to assume that anything not explicitly initialized is a potential source of error.

I might be more inclined to want PODs default-initialized to zero when there is no default constructor, as that is a pretty common use case (see the Windows SDK). When debugging, if you see a POD that is all zeros, it's easier to tell that it hasn't been further initialized with something extra when it probably should have been, which helps you see that something is wrong. But as interesting as this idea is, I'm just as inclined to think that what we already have is workable and not the most pressing problem. I hope good things come out of it all being discussed, though, so thanks for raising the issues.

Unknown said...

Great idea! Except, there are already enough languages which are perfect candidates for this. So just use one of them instead of making a new one!

Rein Halbersma said...

@Scott Re: the compiler warnings on your Item 12. I tested this with both clang 3.4 through 3.8SVN and gcc 6.0SVN. clang does warn with -Wall on mf1 through mf3 (even providing the reason for mf1 and mf2). gcc is silent even with -Wsuggest-override. I filed a bug report: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68391

Scott Meyers said...

@Rein Halbersma: Thanks for the test results with Clang and gcc. The current MSVC issues no warnings with /W4, but with /Wall (an option I didn't know about until just now), it warns about mf1-mf3. I'll make an errata list entry to update the book, because it now seems that both Clang and MSVC issue warnings for mf1-mf3 with full warnings enabled, though gcc remains silent.

Ben Craig said...

Note that /Wall is extremely noisy, as it turns on all the warnings. It is roughly equivalent to Clang's -Weverything.

Unknown said...

Hi Scott, I think you are right and it would be an improvement, but I think it shouldn't be the language that encourages it. Instead, leave C++ as it is and have the IDEs and compilers do it: null pointer = error instead of a warning (if the IDE flags it at all).

This way you don't break any legacy code and you get better use of the tools. You should probably go to the Visual Studio community and make a proposal for it; the VS guys are making great improvements and they listen to the community. Or to CLion -- it would make their product better and set it apart, so more people would use it, and there would be fewer eggs in the basket.

P.S. I am ready for your next book ;-)

jensa said...

Hi Scott,

I cannot agree more that it is time to break some eggs. As for the general argument that it is important to compile C code as C++, I think that extern "C" solves this issue. Maybe that would be a way for the standard to evolve, e.g., define an extern "C++14" for when there are breaking changes. However, I can see why a compiler vendor would not want this, because it basically leads to different compilers that have to be maintained.

I think that a different wording of the standard could help pave the way to removing redundant features and make it easier to implement tools. As an example, I looked at typedef and type aliases. The standard defines typedef and type aliases in one section and defines a translation from the type-alias syntax to typedef. Let's assume that type aliases should replace typedefs in the long run. In that case, I think it would be better to define type aliases as the primary construct and then define the semantics of typedef by referring to that definition, maybe in a separate section for deprecated features. That would make it very clear that type aliases are the good feature and that typedef will vanish. Putting it in a special "deprecated features" section would help the beautiful language hidden inside C++ to eventually reach the surface.

Another example would be uniform initialization. The standard defines all forms of initialization in one section and, e.g., defines the semantics of T a{b} and T a(b) in the same paragraph. I think it would be much more explicit if uniform initialization were defined as the primary form, with the alternative forms defined in separate sections, possibly as deprecated.
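For instance, these spellings, which the standard currently treats together (a trivial illustration using int):

int a{5};       // braced ("uniform") initialization
int b(5);       // parenthesized (direct) initialization
int c = 5;      // copy initialization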

Cheers,
Jens

Unknown said...

I agree it's time to do this and break those eggs. As for the comment on Python's 2to3, two points:
1) The change is happening, but slowly.
2) The change has been much slowed by the fact that the 2to3 utility was sloppily implemented; it couldn't even do trivial changes like print args --> print(args). We can do better.

npl said...

I have found one use for nullptr, and deprecating or removing the implicit conversion from 0 to nullptr would break it.

When you want to provide enums as bitmask types (17.5.2.1.3), you have to define a few operations on them. Set, clear, xor, and negate are easy and even documented in the standard.
Now take an enum bitmask and two instances X, Y, defined as:
enum class bitmask : int_type { .... }; bitmask X, Y;

you would have to support 2 additional operations (noted in the standard):
(X & Y) == 0; (X & Y) != 0;

In other words, you need operator== and operator!=, which ideally ONLY TAKE CONSTANT 0. The solution I came up with was:
constexpr bool operator ==(bitmask X, const decltype(nullptr))
{ return X == bitmask(); }

Maybe this is a bit off-topic, but if nullptr is an egg that has to be broken, what would the best solution be for bitmask types? I found using ints or other types to be more troublesome, since comparing with anything but the constant 0 would be undefined and might behave differently depending on the implementation and the size of the enum.

Vincent G. said...

About override: why not just write:

class ... {
...
override type fun(paramlist);
};

instead of:

virtual type fun(paramlist) override;

Scott Meyers said...

@npl: From what I can tell, your operator== function returns whether all bits in the bitmask are 0, so I don't see why you want a binary operator to test that. Why don't you just define a function something like this?

constexpr bool noBitsAreSet(bitmask X)
{ return X == bitmask(); }

Scott Meyers said...

@Vincent G.: I'm not familiar with the history of the placement of "override" at the end of the function declaration, sorry.

Greg Marr said...

It's a contextual keyword, which means it has to go at the end of the function declaration. To put it at the front, it would have to be a full keyword, meaning that it would be a reserved word, and all programs that use it as a type or variable name would become invalid.
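For example (a minimal sketch; the names are made up):

struct Base {
    virtual void f();
};

struct Derived : Base {
    void f() override;   // "override" acts as a keyword only in this position
};

int override = 42;       // still valid: "override" is not a reserved word,
                         // so existing code using the name keeps compiling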

Scott Meyers said...

@Greg Marr: I should have remembered that; I write about it in Effective Modern C++, Item 12. Thanks for reminding me!

Vincent G. said...

@Greg Marr: Thank you for the explanation.
But IMHO 'virtual' should also be contextual, and so allowing override in the same place wouldn't require making them full keywords. It is compiler-implementation dependent. Or at least, for convenience, it could be a 'local' (optional) keyword (mmmmm, sounds like contextual, doesn't it?). So no problem with backward compatibility.

npl said...

Because the interface of a bitmask type requires it: http://en.cppreference.com/w/cpp/concept/BitmaskType
It's supposed to be identical in use to, and interchangeable with, a pre-C++11 plain enum.

Scott Meyers said...

@npl: Okay, I think I see what you mean. However, I believe that [bitmask.types]/4 is simply defining terminology, not required expressions. ("The following terms apply...") As such, I think cppreference's interpretation of that clause of the Standard is incorrect.

Even if we assume that [bitmask.types]/4 requires that the expression "(X & Y)" be testable to see if it's nonzero, I don't see any requirement that the zero value be a compile-time constant. That is, I believe this would be valid:

template<typename T>
void setToZero(T& param) { param = 0; }

int variable;
setToZero(variable);

if ((X & Y) == variable) ....

As such, if you choose to define operators to test the result of X & Y against zero, I think you have to support variables with the runtime value zero, not just zero as a compile-time value. (If somebody were to do something like test X & Y against 42, results would presumably be undefined.)

If you really want to ensure that the value passed in is a compile-time zero, I suspect you can find a way to do that using either static_assert or enable_if. That is, you can still do what you want to do without relying on 0 being interpreted as the null pointer constant.

npl said...

@Scott Meyers:
You know, I am talking about an ideal, egg-free omelet world (sounds somehow implausible).
There are a lot of ways I could check for "zero" (a simple template function would do), but ideally I would want to take an existing codebase and replace a plain old enum with an enum class by simply transforming the member names (given an existing naming scheme with the enum name as a prefix).

The upsides are cleaning up the namespace and defining an underlying type. The last part is more important than some might think; clang and gcc seem to have different defaults for "short enums" (on ARM, at least). The enum below would be either one or four bytes, and this has bitten me already. Egg:

enum EType {
    eType_Somebit  = 1 << 0,
    eType_Otherbit = 1 << 1,
};

bool foo(EType e)
{
    return (e & eType_Somebit) != 0;
}

// easily transformed (mostly search + replace) to omelet:
enum class EType : unsigned {
    Somebit  = 1 << 0,
    Otherbit = 1 << 1,
};
// define & | ~ &= |= == != operators for EType; a MACRO can do this
// (one hand-written version is sketched below)

bool foo(EType e)
{
    return (e & EType::Somebit) != 0;
}
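For reference, here is roughly what those hand-written operators might look like, continuing the EType example above -- just a sketch, assuming comparisons are only ever against the literal 0 via the nullptr trick discussed earlier:

constexpr EType operator&(EType x, EType y)
{ return EType(unsigned(x) & unsigned(y)); }

constexpr EType operator|(EType x, EType y)
{ return EType(unsigned(x) | unsigned(y)); }

constexpr EType operator~(EType x)
{ return EType(~unsigned(x)); }

inline EType& operator&=(EType& x, EType y) { return x = x & y; }
inline EType& operator|=(EType& x, EType y) { return x = x | y; }

// The comparisons deliberately accept only a null pointer constant, so
// "(e & EType::Somebit) != 0" compiles, but comparing against 1 does not.
constexpr bool operator==(EType x, decltype(nullptr)) { return x == EType(); }
constexpr bool operator!=(EType x, decltype(nullptr)) { return x != EType(); }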

BTW, I had written some lines to explain why there can't be a "standard" way to test for a constant-0 argument -- but while writing, I might have found one =)

Scott Meyers said...

@npl: I don't have a solution for you, I'm sorry. Perhaps others here do. If it makes you feel any better (and it probably won't), because your current codebase only compares bitmasks with the compile-time constant 0, if you modify the equality and inequality comparison functions for bitmasks to accept an arbitrary int, the behavior of your current code won't change, because only zeros will be passed in.
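To make that concrete, the relaxed comparisons might look something like this (a sketch only, reusing npl's EType and assuming its enumerator values fit comfortably in an int):

enum class EType : unsigned { Somebit = 1 << 0, Otherbit = 1 << 1 };

// Accept any int, not just the literal 0. In a codebase that only ever
// compares against 0, behavior is unchanged.
constexpr bool operator==(EType x, int v) { return static_cast<int>(x) == v; }
constexpr bool operator!=(EType x, int v) { return static_cast<int>(x) != v; }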

npl said...

@Scott: It's fine; it helped me think about the problem again. Thanks for your time.
The reason I want only the constant zero is simply to enforce one simple, canonical "bit check" that has a well-defined meaning whatever the type is. x == 0 is a good candidate because it's pretty simple, "built in" for plain enums, and widely known and used.

olvap olvap said...

The approach with third-party tools is completely wrong and likely to fail.
No "magic wand" will help if we have 10M lines of source with a requirement to do code reviews and write (and physically sign!) "formal code review reports" (this is what the FDA requires for a healthcare-related project!).
The best solution to this problem is to adopt for C++ something like what "use strict" does for JavaScript.

Now it is up to the developer (!) to decide: do we update all 10,000 files of project sources with the "magic wand" (and write tons of those "formal code review reports"), or do we use this new "strict" mode only for new or refactored files?

This new "#pragma strict" or "using strict" will not be the same thing as all those "MISRA" or "embedded C++" or other "ugly ducklings" like "special safe coding conventions" - every big enough company (or even division within this company) has invented their own "special safe coding convention" to work around C/C++ flaws and has their own set of ugly buggy tools to support this hell.
The new "#pragma strict" or "using strict" or whatever we call it will be different from this hell just because it is part of C++ standard and every conforming compiler is forced to support this new feature! No more reinvention the wheel and no more trying to tie square wheels invented by my company to triangle wheels invented by third party companies!