Thursday, April 5, 2012

C++11 is Almost Here for Real!

This entry is the first under a slightly expanded charter for this blog.  Until now, I've restricted blog entries to announcements about my professional activities.  Hereafter, entries will simply be related to my professional activities and interests. The difference is subtle, and, in all likelihood, you will rarely notice it.



At the beginning of my Overview of C++11, I show a simple program to compute the most common words in a set of input files.  I write the program once using "old" C++ (i.e., standard C++98/03), then again using features from C++11. 

In 2009, when I first published the C++11 program (at that time, what became C++11 was still known as C++0x), there was no compiler that could come anywhere near compiling it. Testing the code required replacing standard C++11 library components with similar components available in TR1 or from Boost or Just Software Solutions, and language features like auto, range-based for loops, lambda expressions, and template aliases had to be replaced with typically clumsier C++98/03 constructs that were more or less equivalent in meaning.

This week I tested my simple C++11 sample program with Stephan T. Lavavej's excellent distribution of gcc 4.7 for Windows as well as Microsoft's VC11 beta.  gcc 4.7 has lots of support for C++11, but the concurrency API still seems to be largely missing, at least for Windows, so my sample program doesn't get very far with that compiler.  [Update 6 April 2012: As noted in the comments below, when invoked in the proper manner on the proper platform, gcc 4.7 compiles and runs my program without modification!]

The situation with the VC11 beta is a lot better.  Only two lines have to be changed.  The template alias
  using WordCountMapType = std::unordered_map<std::string, std::size_t>;
needs to be replaced by its typedef equivalent:
  typedef std::unordered_map<std::string, std::size_t> WordCountMapType;
And the z length modifier in this call to printf,
  std::printf("  %-10s%10zu\n", (*it)->first.c_str(), (*it)->second);
needs to be replaced with its VC++ equivalent, I:
  std::printf("  %-10s%10Iu\n", (*it)->first.c_str(), (*it)->second);
Other than that, the demonstration program I wrote three years ago (which, in fairness to compiler writers, was two and a half years before the C++11 standard was ratified) compiles cleanly with VC11.

If you have access to a compiler that compiles my program without modification, please let me know! The program itself is below.  You can see a more colorful version of it, along with some commentary, and an example invocation and the corresponding output, on slides 13-15 of the free sample of my C++11 training materials.

Scott

#include <cstdio>
#include <iostream>
#include <iterator>
#include <string>
#include <fstream>
#include <algorithm>
#include <vector>
#include <unordered_map>  
#include <future>

using WordCountMapType = std::unordered_map<std::string, std::size_t>;

WordCountMapType wordsInFile(const char * const fileName)   // for each word
{                                                           // in file, return
  std::ifstream file(fileName);                             // # of
  WordCountMapType wordCounts;                              // occurrences

  for (std::string word; file >> word; ) {  
    ++wordCounts[word];
  }
  return wordCounts;
}

template<typename MapIt>                                            // print n most  
void showCommonWords(MapIt begin, MapIt end, const std::size_t n)   // common words
{                                                                   // in [begin, end)
  // typedef std::vector<MapIt> TempContainerType;
  // typedef typename TempContainerType::iterator IterType;
  std::vector<MapIt> wordIters;
  wordIters.reserve(std::distance(begin, end));
  for (auto i = begin; i != end; ++i) wordIters.push_back(i);

  auto sortedRangeEnd = wordIters.begin() + n;

  std::partial_sort(wordIters.begin(), sortedRangeEnd, wordIters.end(),
                    [](MapIt it1, MapIt it2){ return it1->second > it2->second; });

  for (auto it = wordIters.cbegin(); 
       it != sortedRangeEnd; 
       ++it) {
    std::printf("  %-10s%10zu\n", (*it)->first.c_str(), (*it)->second);
  }
}

int main(int argc, const char** argv)   // take list of file names on command line,
{                                       // print 20 most common words within;
                                        // process files concurrently
  std::vector<std::future<WordCountMapType>> futures;

  for (int argNum = 1; argNum < argc; ++argNum) {
    futures.push_back(std::async([=]{ return wordsInFile(argv[argNum]); }));
  }

  WordCountMapType wordCounts;
  for (auto& f : futures) {
    const auto wordCountInfoForFile = f.get();  // move map returned by wordsInFile

    for (const auto& wordInfo : wordCountInfoForFile) {
      wordCounts[wordInfo.first] += wordInfo.second;
    }
  }

  std::cout << wordCounts.size() << " words found.  Most common:\n" ;
 
  const std::size_t maxWordsToShow = 20;
  showCommonWords(wordCounts.begin(), wordCounts.end(), 
                  std::min(wordCounts.size(), maxWordsToShow));
}

25 comments:

  1. It compiles with gcc 4.8, though it doesn't seem to run:

    g++-4.8 --version
    g++-4.8 (GCC) 4.8.0 20120311 (experimental)
    g++-4.8 --std=c++11 prog.cpp
    ./a.out prog.cpp
    terminate called after throwing an instance of 'std::system_error'
    what(): Unknown error 18446744073709551615
    Aborted

    I did not investigate much.

    With clang it fails to compile in the GNU libstdc++ headers (lots of the errors seem related to atomics support).

    clang++ --version
    clang version 3.1 (trunk 150359)
    Target: x86_64-unknown-linux-gnu
    Thread model: posix

    fx

  2. So it compiles with gcc 4.8 under Linux. Cool! Did you happen to try gcc 4.7 as well? Unfortunately, I have access only to Windows, so I can't check myself.

    If you get a chance to get some more information about the runtime failure, I'd be interested to know the details. It seems to work correctly for me under Windows (once I make the changes needed to get it to compile).

    Scott

  3. Threading support works if you compile gcc-4.7 on Cygwin and choose thread=posix.

    Paul

  4. Can you check to see if the program I posted compiles under the conditions you describe? If so, can you let me know if it seems to run properly?

    Thanks,

    Scott

  5. Sure,

    Can you repost your code between pre tags or some other tag that will preserve the original format? I'll check your code on a Mac with gcc-4.7 and on Cygwin (Windows 7).

    Paul

  6. I've used the following easy way to test things with Linux compilers on Windows: (1) install VirtualBox, which is free, and (2) install Debian in a virtual machine. The only problem I had with that was the lack of built-in support for a Norwegian keyboard. I found a config file on the net that fixed that, and then even copy and paste between Windows and the Linux box worked (that is, works)! Cheers, - Alf

  7. @Anonymous: the code is between pre tags, and copying it from the blog post (either from my blog site or from the entry's version in Google Reader) using Firefox lets me paste it into a text editor with formatting intact. Are you not able to do the same thing?

    Scott

  8. It compiled on Linux with GCC 4.7.0 with only one warning:
    warning: ISO C++ does not support the ‘z’ gnu_printf length modifier [-Wformat]
    I've used -Wall -Wextra -pedantic -std=c++11.

    When it comes to running it, the situation changes. It works fine without any arguments, but throws a std::system_error ("Unknown error -1") if I pass a file name to it.

  9. It turns out one has to pass -pthread to g++, and then it starts working. I hope it will be on by default when compiling in C++11 mode in future versions of gcc.

    To sum up, your example works perfectly without any code modifications on GCC 4.7.0.

  10. @Scott

    Sorry about that, apparently when I press "Show Original Post" when I'm in the comment zone, I'm redirected to a version of your page without proper formatting.

    Using a direct link to your blog works perfectly:

    http://scottmeyers.blogspot.ca/2012/04/c11-is-almost-here-for-real.html

    So, I've tested your code on Mac OSX Lion and Cygwin (under Windows 7) with a custom compiled gcc-4.7.0 and it works perfectly. Here is a link to some screenshots (I'll keep them for a few days in my Dropbox folder), feel free to use them as you wish:

    https://www.dropbox.com/gallery/56297644/1/tests?h=824eb5

    Paul

  11. I've revised the title of this blog entry and updated the content to reflect the fact that, on the proper platform and with the proper command line options, gcc 4.7 accepts my sample program. I'm very excited about this, and I thank Wojciech and Anonymous for letting me know about it. C++11, at least for the simple demonstration I wrote three years ago, is finally here!

    Scott

  12. It compiles on Mac with the macports g++-4.7:

    g++-mp-4.7 (GCC) 4.7.0 20120225 (experimental). It runs and seems to produce the correct output.

  13. As Wojciech Cierpucha realized, pthread is not linked automatically, and nothing warns about any missing symbol.

    Here is the test with gcc-4.8 again: no errors, no warnings, and it seems to run fine:

    g++-4.8 --version
    g++-4.8 (GCC) 4.8.0 20120311 (experimental)
    g++-4.8 --std=c++11 -lpthread prog.cpp
    ./a.out prog.cpp
    155 words found. Most common:
    // 13
    #include 9
    { 8
    } 8
    for 7
    = 7
    const 5
    return 4
    WordCountMapType 4
    wordCounts; 3
    words 3
    of 2
    : 2
    << 2
    std::vector 2
    most 2
    it 2
    in 2
    common 2
    != 2

    Regards,
    fx

  14. Adding -Wall -pedantic prints this warning, though:

    prog.cpp:41:5: warning: ISO C++ does not support the ‘z’ gnu_printf length modifier [-Wformat]

    I guess it's just a matter of detail ;)

    fx

  15. Regarding gcc's diagnostic, "warning: ISO C++ does not support the ‘z’ gnu_printf length modifier," this is incorrect as of C++11. C++11 relies on C99 for the specification of printf formatting strings, and "z" is part of C99 (in 7.19.6.1/7).

    Scott

  16. Yep, it's a matter of details compared to the first time I played with gcc support for C++0x! ;)

  17. gcc 4.6 is perfectly fine as well. It needs the using alias changed to a typedef, as with the MSVC11 beta, apparently.

    Output below. I would like to ask a sneaky wee question if possible (I bought the C++11 overview; very good, can't grovel enough). Could you tell me:

    1: Will std::async launch automatically in async mode, or do we need to make sure by passing a launch policy to it (std::launch::async)?

    2: Are there any thread pool policies compilers should be expected to follow, i.e. do I need to worry about thread pools/efficient reuse etc.?

    (I did read where you say we need to care about cleaning up thread local storage objects and that's OK)


    ~/ggcov-0.8.3 $ gcc --version
    gcc (Ubuntu/Linaro 4.6.3-1ubuntu4) 4.6.3
    Copyright (C) 2011 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions. There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE

    g++ -Wall -Wextra -Weffc++ -std=c++0x -pthread d.cc

    Output

    155 words found. Most common:
    // 13
    #include 9
    { 8
    } 8
    for 7
    = 6
    const 5
    return 4
    WordCountMapType 3
    typedef 3
    words 3
    wordCounts; 3
    i 2
    word; 2
    in 2
    MapIt 2
    it 2
    std::vector 2
    of 2
    auto 2

  18. @David:

    1. By default, async may choose whether to run its function synchronously or asynchronously, the idea being to give it the flexibility to avoid oversubscription. If you want to guarantee that the function passed to std::async will run asynchronously, you need to specify a launch policy of std::launch::async.

    2. The standard gives no guarantees about thread pools or efficient use of threads or scheduling fairness, etc. All that is considered QoI (Quality of Implementation) stuff. It's reasonable to assume that once implementers have the basic functionality under control, they will turn their attention to QoI issues. Bartosz Milewski's blog post from last October is worth reading in this regard.

    Scott

  19. @Scott

    Thanks very much, Scott, for answering my cheekily posted questions. I am now on packaged_task and looking at opportunities there. The videos http://www.youtube.com/watch?v=80ifzK3b8QQ&feature=relmfu by Bartosz are very much worth a watch for anyone interested in C++11 concurrency support, plus of course Anthony Williams's blog.

    Can I just add: great books, Scott! All our developers get at least Effective C++ and More Effective C++ on their desks when starting with us. They are excellent. I keenly await the C++11 version, although I think it will take a lot of time to find all the nuances there to keep to your high standards.

  20. It compiles with -Wall -Wextra and runs correctly with gcc-4.7.0 (built from source) on OS X 10.7.3, in both 32- and 64-bit builds.

  21. Compiles and works fine on my MacBook Pro OSX Lion using gcc-4.7.0 :)

    >: g++ --version
    g++ (GCC) 4.7.0
    Copyright (C) 2012 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions. There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

    >: g++ -std=c++11 myers.cpp -o myers
    >: ./myers myers.cpp
    155 words found. Most common:
    // 13
    #include 9
    { 8
    } 8
    for 7
    = 7
    const 5
    return 4
    WordCountMapType 4
    wordCounts; 3
    words 3
    of 2
    : 2
    << 2
    std::vector 2
    most 2
    it 2
    in 2
    common 2
    != 2

    As I remember it, compiling gcc 4.7.0 on my Mac was hard because Apple ships an LLVM-based gcc with Xcode 4. I'll build 4.7.1 and see if it still works.

  22. BTW, rather than testing with gcc under Windows (which I have found to be a pain), you can use one of the great free virtualisation programs like virtualbox.org.

    Virtualisation lets you install linux for just such demos and tests even though you have a windows computer.

    1) Install virtualbox from virtualbox.org (< 5 mins with high-speed corporate internet)
    2) Install Ubuntu from ubuntu.com (< 30 mins)
    3) sudo apt-get install g++ (< 5 mins) - for the latest g++ that version of ubuntu has
    3a) follow the build instructions for gcc and apt-get install any prerequisites for the latest full gcc/g++ version (3 hours)

    You will then have a Linux desktop ready to compile using a full version of g++ in a window on your windows machine (pretty cool - you really ought to try this technology out).

  23. It compiled with clang (svn3.2) and libc++ (and probably has for a long time; I don't see any C++11 features that haven't been in clang for a long time now).

    It also appears to run fine unless I build using -fcatch-undefined-behavior. Running the program built with that flag indicates that there's some kind of undefined behavior in the program. The output doesn't give any hints as to the cause though. I understand some work is being done to improve that so maybe more info will be available soon.

  24. This comment has been removed by a blog administrator.

  25. GCC printf warning bug should be fixed as of 4.8:
    http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52818
