In 1992, I was responsible for organizing the Advanced Topics Workshop that accompanied the USENIX C++ Technical Conference. The call for workshop participation said:
The focus of this year's workshop will be support for C++ software development tools. Many people are beginning to experiment with the idea of having such tools work off a data structure that represents parsed C++, leaving the parsing task to a single specialized tool that generates the data structure.

As the workshop approached, I envisioned great progress in source code analysis and transformation tools for C++. Better lints, deep architectural analysis tools, automatic code improvement utilities--all these things would soon be reality! I was very excited.
By the end of the day, my mood was different. Regardless of how we approached the problem of automated code comprehension, we ran into the same problem: the preprocessor. For tools to understand the semantics of source code, they had to examine the code after preprocessing, but to produce acceptable transformed source code, they had to modify what programmers work on: files with macros unexpanded and preprocessor directives intact. That means tools had to map from preprocessed source files back to unpreprocessed source files. That's challenging even at first glance, but when you look closer, the problem gets harder. I found out that some systems #include a header file, modify preprocessor symbols it uses, then #include the header again--possibly multiple times. Imagine back-mapping from preprocessed source files to unpreprocessed source files in such systems!
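To make that concrete, here is a minimal sketch of the kind of multiple-inclusion idiom I have in mind--the so-called "X macro" pattern. (The file and macro names are hypothetical; they're mine, not taken from any particular system.) The header is deliberately written without an include guard so that each inclusion can expand its contents differently:

    // counters.h -- deliberately no include guard; meant to be included repeatedly
    COUNTER(cacheHits)
    COUNTER(cacheMisses)

    // counters.cpp
    #define COUNTER(name) int name = 0;      // first inclusion: define the counters
    #include "counters.h"
    #undef COUNTER

    void resetCounters()
    {
    #define COUNTER(name) name = 0;          // second inclusion: generate reset code
    #include "counters.h"
    #undef COUNTER
    }

After preprocessing, the two lines of counters.h have become two entirely different chunks of code in the translation unit. A tool that edits the preprocessed code has to work out which inclusion--and which macro definition--its change corresponds to before it can write anything back to the files programmers actually maintain.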
Dealing with real C++ source code means dealing with real uses of the preprocessor, and at that workshop nearly a quarter century ago, I learned that real uses of the preprocessor doomed most tools before they got off the drawing board. It was a sobering experience.
In the ensuing 23 years, little has changed. Tools that transform C++ source code still have to deal with the realities of the preprocessor, and that's still difficult. In my last blog post, I proposed that the C++ Standardization Committee take into account how source-to-source transformation tools could reduce the cost of migrating old code to new standards, thus permitting the Committee to be more aggressive about adopting breaking changes to the language. In this post, I simply want to acknowledge that preprocessor macros make the development of such tools harder than my last post implied.
Consider this very simple C++:
    #define ZERO 0

    auto x = ZERO;
    int *p = ZERO;

In the initialization of x, ZERO means the int 0. In the initialization of p, ZERO means the null pointer. What should a source code transformation tool do with this code if its job is to replace all uses of 0 as the null pointer with nullptr? It can't change the definition of ZERO to nullptr, because that would change the semantics of the initialization of x. It could, I suppose, get rid of the macro ZERO and replace all uses with either the int 0 or nullptr, depending on context, but (1) that's really outside its purview (programmers should be the ones to determine if macros should be part of the source code, not tools whose job it is to nullptr-ify a code base), and (2) ZERO could be used inside other macros that are used inside other macros that are used inside other macros..., and especially in such cases, reducing the macro nesting could fill the transformed source code with redundancies and make it harder to maintain. (It'd be the moral equivalent of replacing all calls to inline functions with the bodies of those functions.)
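To illustrate point (2), suppose--purely hypothetically--that ZERO is also used inside other macros (CLEAR and RESET are made-up names for the sake of the example):

    #define ZERO 0
    #define CLEAR(p)    ((p) = ZERO)                              // ZERO as a null pointer
    #define RESET(n, p) do { (n) = ZERO; CLEAR(p); } while (0)    // ZERO as an int, too

    int count;
    int *buf;
    RESET(count, buf);   // one expansion of ZERO should stay 0, the other should become nullptr

A tool that decides to eliminate ZERO now has to rewrite CLEAR and RESET as well, and the deeper the nesting goes, the more the transformed code ends up with the macro abstraction flattened into repeated, context-specific text.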
I don't recall a lot of talk about templates at the workshop in 1992. At that time, few people had experience with them. (The first compiler to support them, cfront 3.0, was released in 1991.) Nevertheless, templates can give rise to the same kinds of problems as the preprocessor:
    template<typename T>
    void setToZero(T& obj) { obj = 0; }

    int x;
    setToZero(x);    // "0" in setToZero means the int

    int *p;
    setToZero(p);    // "0" in setToZero means the null pointer

I was curious about what clang-tidy did in these situations (one of its checks is modernize-use-nullptr), but I was unable to find a way to enable that check in the version of clang-tidy I downloaded (LLVM version 3.7.0svn-r234109). Not that it matters. The way that clang-tidy approaches the problem isn't the only way, and one of the reasons I propose a decade-long time frame to go from putting a language feature on a hit list to actually getting rid of it is that it's likely to take significant time to develop source-to-source translation tools that can handle production C++ code, macros and templates and all.
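For what it's worth, a human maintainer might sidestep the ambiguity in setToZero by value-initializing instead of assigning 0--something like the sketch below. But that turns "assign zero" into "assign a default-constructed value," and deciding whether that's an acceptable change of intent is exactly the kind of judgment call that belongs to programmers, not to a nullptr-ifying tool:

    template<typename T>
    void setToZero(T& obj) { obj = T{}; }    // T{} is 0 for int, a null pointer for pointer types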
The fact that the problem is hard doesn't mean it's insurmountable. The existence of refactoring tools like clang-tidy (far from the only example of such tools) demonstrates that industrial-strength C++ source transformation tools can be developed. It's nonetheless worth noting that such tools have to take the existence of templates and the preprocessor into account, and those are noteworthy complicating factors.
-- UPDATE --
A number of comments on this post include references to tools that chip away at the problems I describe here. I encourage you to pursue those references. As I said, the problem is hard, not insurmountable.