Tuesday, April 10, 2012

std::string, SSO, and Move Semantics

I'm becoming increasingly concerned that C++11's support for move semantics is leading some people to conclude that we no longer have to pay as much attention to the cost of copying objects as we used to.  "Many copies will silently become moves," they exclaim, "and moves are cheap!" Such reasoning is invalid in a number of ways.

First, not all copy requests can be replaced by moves.  Only copy requests for rvalues are eligible for the optimization.  Second, not all types support move operations that are more efficient than copying operations.  An example is a std::array<double, 1000>.  Third, even types that support efficient move operations may support them only some of the time.  Case in point: std::string.  It supports moves, but in cases where std::string is implemented using SSO (the small string optimization), small strings are just as expensive to move as to copy!

What it means to be "small" is up to the implementation.  Unless I'm misreading the source files, std::strings with a capacity of up to 15 are "small" in Visual C++ 11 beta, so std::string objects of up to that capacity will not benefit when copies become moves.  (The SSO buffer size seems to be hardwired to be 16 bytes in VC11, so the maximum capacity of a std::wstring that fits in the SSO buffer is smaller:  7 characters.)

gcc 4.7 seems to continue to use reference-counting instead of SSO for its std::string implementation.  This is non-conforming in C++11 (reference-counting may no longer be used for std::string), and it looks like gcc is aware that it must be changed. However, gcc 4.7 offers the non-standard __versa_string with a std::basic_string-compatible interface (other than the name of the template and the namespace in which it's located), and __versa_string offers the SSO.  In this case, the maximum capacity of "small" strings seems to be hardwired to 15 characters (not bytes).

While googling around for a good page to link to for the term "small string optimization" above, I came across John Ahlgren's blog entry of March 30 on how SSO interacts with move semantics.  He refers to the fact that short strings with the SSO move more slowly than longer strings as the "small string deterioration" :-)

Scott

11 comments:

Herb Sutter said...

I mostly agree, but I think this point needs some clarification:

> Case in point: std::string. It supports moves, but in cases where std::string
> is implemented using SSO (the small string optimization), small strings are
> just as expensive to move as to copy!

I think the right thing to say here is, "small strings are just as cheap to copy as to move!" This is not a pedantic distinction -- the situation is that it does not benefit further from move because it is already optimized! There really is a difference between "copy is as fast as move" (correct here) vs. "move is as slow as copy" when copy is already not deep (i.e., not involving reallocations).

Scott Meyers said...

@Herb: Your viewpoint is legitimate, of course, but the fact that the efficiency of move vis-a-vis copy is dependent on the value of the string is something that is likely to surprise many people, I think, especially in view of John Ahlgren's measurements showing that adding a single character to a string can increase the speed of moving it by a factor of three. It's a peculiar definition of "optimized" when adding data to the payload improves its performance significantly.

Scott

John Ahlgren said...

@Herb: If std::string was implemented without SSO, it would copy the pointer (usually 4 or 8 bytes) instead of the 16 bytes (in the worst case). It's not a huge difference, and it's always a constant time penalty since stack memory is fixed-sized (C++ doesn't support VLAs). So if I understand your argument, "small strings are almost as cheap to copy as to move"?

It's easy to understand Scott's concern when programmers are dealing with objects that have a mix of large stack and free store memory - the penalty would still be the constant amount of stack that needs to be copied, but the move operation gives the illusion that it should happen much faster than what would be (I don't think this is a concern with any STL containers).

Andreas Bergmeier said...

I am wondering whether SSO is platform dependent or more precise, whether it also holds up on embedded devices like Smartphones (e.g. ARM) since the instruction set is different.

Dave Abrahams said...

@LCID Fire: The small string optimization (SSO) is a library-level optimization that's independent of machine architecture. Either the standard library that ships with your compiler implements it, or it doesn't.

@all: Actually there are lots of tradeoffs here worth experimenting with. For example, if you have 8-byte pointers and can store 15 characters without dynamic allocation, once you do have to allocate memory, do you store the end() pointer in the local memory, which makes size() (and maybe some other things) faster, or in the dynamic memory, which makes move faster?

Dave Abrahams said...

@Scott: it's not that we don't have to pay attention to the cost of copies—the cost of copying hasn't changed. What's new is that lots of things that used to copy don't anymore, so we can stop reflexively distorting code (e.g. with "out parameters") to avoid copies in those cases where copies have become moves.

Scott Meyers said...

@Dave: Regarding "What's new is that lots of things that used to copy don't anymore, so we can stop reflexively distorting code (e.g. with "out parameters") to avoid copies in those cases where copies have become moves.", such blanket reasoning is precisely my concern. I agree that this is valid reasoning if you know that the type supports efficient moving, but not all types support efficient moving (again I mention std::array<1000, int>), and if you are writing a template, you typically don't know if the types involved support move at all.

The prospective guideline I've been kicking around is "Assume that move operations are neither present nor cheap." If you know that the assumption does not hold for the types you are using, great, take advantage of that knowledge. But in the same way that we generally assume that copying is expensive unless we have specific information to the contrary, we should assume that moving is no cheaper than copying unless we somehow know better.

Dave Abrahams said...

@Scott: I think you should re-read what I wrote before you call it "blanket reasoning." It was qualified in at least two ways. I have never told anyone, "don't worry, be happy, copies are fast now" or "all copies have turned into moves." And I don't believe that anything I've said even sounds remotely like that.

Also, allow me to quote an argument from Martin B. that showed up on comp.lang.c++.moderated today, one that I often forget to make myself:

"You should consider the issue of aliasing. It may be cheaper to pass a 4-int rectangle by const reference, but using it may be more expensive because it's harder for the compiler to be sure it is not changed through an alias."

The compiler doesn't generally optimize based on constness because of aliasing and separate compilation. It's actually not const but pass-by-value that lets the compiler make the optimizations that most people are hoping for in those circumstances.

I you're looking for a description that I won't object to of my position, use this one:
There were always good reasons to seriously consider value semantics—an approach whose benefits have been under-recognized, partly due to fear of copying costs—as a normal part of your programming practice, and there are even more good reasons to do so today.

Anonymous said...

@Scott:

1. It should not be a surprise to anyone that the efficiency of copying a container depends on its content.

2. If they have done C++-level close-to-the-metal programming with performance concerns for a while, they will be familiar with the idea that such correlations might be nonlinear and counterintuitive.

3. Copying those 15 bytes is still the fasted you get without move semantics, so — as Herb already said — you haven't lost anything. There's just an area (which is already fast) where you might haven't gained something.

4. If copying 15 bytes (that's just 2 doubles!) is of a performance concern to someone, they should not rely on guesses anyway, but profile and measure. If it then turns out that adding a '\0' to the string makes it faster, that's on par with strange things like adding NOPs into code making it faster. (see #2)

5. On 64bit machines, copying a pointer and a std::size_t might be 16 bytes, too.

Marmot said...

std::array< 1000,double > should be
std::array< double,1000 >?

Scott Meyers said...

@Marmot: Yes, of course, thanks for pointing this out. I've fixed it in the post.

Scott