The April 2000 C++ Report included an important article by Matt Austern: "Why You Shouldn't Use set—and What to Use Instead." It explained why lookup-heavy applications typically get better performance from applying binary search to a sorted std::vector than they'd get from the binary search tree implementation of a std::set. Austern reported that in a test he performed on a Pentium III using containers of a million doubles, a million lookups in a std::set took nearly twice as long as in a sorted std::vector.
Austern also reported that the std::set used nearly three times as much memory as the std::vector. On Pentium (a 32-bit architecture), doubles are 8 bytes in size, so I'd expect a std::vector storing a million of them to require about 7.6MB. Storing the same million doubles in a std::set would call for allocation of a million nodes in the underlying binary tree, and assuming three 4-byte pointers per node (pointer to left child, pointer to right child, pointer to parent), each node would take 20 bytes. A million of them would thus require about 19MB. That's only 2.5 times the memory required for the std::vector, but in 2000, I believe it was fairly common for each dynamic memory allocation to tack on a word indicating the size of the allocated storage, and if that was the case in Austern's test, each tree node would require 24 bytes—precisely three times the memory needed to store a single double in a std::vector.
The difference in memory utilization explains why searching a sorted std::vector can be faster than searching a std::set holding the same data. In a std::vector, the per-element data structure overhead present in a search tree is missing, so more container elements fit onto a memory page or into a cache line. We'd thus expect fewer page faults and/or cache misses when looking things up in a std::vector, and faster memory accesses mean faster lookups.
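To make the comparison concrete, here's a minimal sketch of the two lookup strategies (the function names are mine, not Austern's): the std::set uses its find member function, while the sorted std::vector uses binary search via std::lower_bound followed by an equality check.

    #include <algorithm>
    #include <set>
    #include <vector>

    // Lookup in a std::set: walk the binary search tree via find.
    bool inSet(const std::set<double>& s, double x)
    {
        return s.find(x) != s.end();
    }

    // Lookup in a sorted std::vector: binary search with std::lower_bound,
    // then check that the element found actually equals the key.
    bool inSortedVector(const std::vector<double>& v, double x)  // v must be sorted
    {
        auto it = std::lower_bound(v.begin(), v.end(), x);
        return it != v.end() && *it == x;
    }

When all you need is a yes/no answer, std::binary_search expresses the same test even more directly.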
Many people were influenced by Austern's article and by independent experiments bolstering his conclusions. I was among them. Item 23 of Effective STL is "Consider replacing associative containers with sorted vectors." Boost was convinced, too: Boost.Container offers the flat_(multi)map/set associative containers, citing as inspiration both Austern's article and the discussion of the sorted std::vector-based AssocVector in Andrei Alexandrescu's Modern C++ Design. (In his book, Alexandrescu references the C++ Report article. All roads in this area lead to Matt Austern.)
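For reference, here's a minimal sketch of what using one of Boost's flat containers looks like. The values inserted are arbitrary, but the header and the set-like interface are those documented for Boost.Container:

    #include <boost/container/flat_set.hpp>

    int main()
    {
        boost::container::flat_set<double> fs;   // elements kept sorted in contiguous storage
        fs.insert(3.0);
        fs.insert(1.0);
        fs.insert(2.0);

        // find performs a binary search over the underlying sorted sequence.
        return fs.find(2.0) != fs.end() ? 0 : 1;
    }
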
There's nothing in the article about containers based on hash tables (i.e., the standard unordered containers), presumably because there were no hash-table-based containers in the C++ standard library in 2000. Nevertheless, the same basic reasoning would seem to apply. Hash tables are based on nodes, and node-based containers incur overhead for pointers and dynamic allocations that std::vectors avoid.
On the other hand, hash tables offer O(1) lookup complexity, while sorted std::vectors offer only O(lg n). This suggests that there should be a point at which the more conservative memory usage of a std::vector is compensated for by the superior computational complexity of an unordered container.
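The hash-table lookup itself looks just like the std::set lookup, only against std::unordered_set. This sketch (again, the function name is my own) is the average-case O(1) competitor in the comparisons below:

    #include <unordered_set>

    // Lookup in a hash table: average-case O(1), but each element lives in
    // its own dynamically allocated node, much as in a std::set.
    bool inHashTable(const std::unordered_set<double>& s, double x)
    {
        return s.find(x) != s.end();
    }
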
After a recent presentation I gave discussing these issues, Paul Beerkens showed me data he'd collected showing that on simple tests, the unordered containers (i.e., hash tables) essentially always outperformed sorted std::vectors for container sizes beyond about 50. I was surprised that the crossover point was so low, so I did some testing of my own. My results were consistent with his. Here's the data I got for containers between size 20 and 1000; the X-axis is container size, the Y-axis is average lookup time:
The lines snaking across the bottom of the graph are for the unordered containers. The other lines are for the tree-based containers (std::set and std::map) and for Boost's flat containers (i.e., sorted std::vectors). For containers in this size range, the superiority of the hash tables is clear, but the advantage of the flat containers over their tree-based counterparts isn't. (They actually tend to be a bit slower.) If we bump the maximum container size up to 20,000, that changes:
Here it's easy to see that for containers with about 5000 elements or more, lookups in the flat containers are faster than those in the binary trees (though still slower than those in the hash tables), and that remains true for containers up to 10 million elements (the largest I tested):
For very small containers (no more than 100 elements), Beerkens added a different strategy to the mix: linear search through an unsorted std::vector. He found that this O(n) approach performed better than everything else for container sizes up to about 35 elements, a result that's consistent with the conventional wisdom that, for small data sets, linear search runs faster than more complicated hash-based and binary-search-based algorithms.
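That strategy needs no setup at all. A sketch (function name mine) is simply std::find over the unsorted std::vector:

    #include <algorithm>
    #include <vector>

    // Lookup by linear search: O(n), but no sorting, no hashing, and no
    // pointer chasing, which is hard to beat for a handful of contiguous elements.
    bool inUnsortedVector(const std::vector<double>& v, double x)
    {
        return std::find(v.begin(), v.end(), x) != v.end();
    }
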
The graphs above correspond to tests I ran on an Intel Core i7-820 using GCC 5.1.0 and MinGW under 64-bit Windows 7. Optimization was set to -O3. I ran the same tests using Visual C++ 2015's compiler, but for reasons I have yet to investigate, all the timings for the hash tables were zero under that compiler. I've therefore omitted data for VC++. Interestingly, the code to perform linear searches through unsorted std::vectors took zero time under GCC, though non-zero time under VC++. This is why I show no data comparing lookup speeds for all of binary search trees, hash tables, sorted std::vectors, and unsorted std::vectors: neither GCC nor VC++ generated non-zero lookup times for all approaches.
Maybe GCC optimized out the loop doing the lookups in the unsorted std::vectors, while VC++ optimized away the loops doing the lookups in the hash tables. Or maybe my test code is flawed. I don't know. Perhaps you can figure it out: here's the test code. (It's based on code Paul Beerkens shared with me, but I made substantial changes, so if there's something wrong with the test code, it's my fault.) Feel free to play around with it. Let me know what you find out, either about the code or about the results of running it.
If Paul Beerkens' and my results are valid and generalize across hardware, compilers, standard library implementations, and types of data being looked up, I'd conclude that the unordered standard associative containers (i.e., hash tables) should typically be the ones to reach for when you need high-performance lookups in an associative container and element ordering is not important. I'd also conclude that for very small containers, unsorted std::vectors are likely to be the way to go.
As for the title of this blog post, it looks like what you should use instead of using std::set (and std::map and their multi cousins) these days is probably a truly "unordered" container: a hash table or, for small containers, an unsorted std::vector.
Scott
UPDATE ON 17 SEPTEMBER
Jonathan Wakely posted source code showing how to replace my Windows-specific timing code with portable standards-conformant C++, and Tomasz Kamiński took my code incorporating it and sent a revised program that (1) uses the modify-a-volatile trick to prevent compilers from optimizing loops away and (2) checks the result of the lookup against the container's end iterator (because, in practice, you'd always need to do that). When I run Kamiński's code, I get non-zero lookup times for all containers for both GCC and VC++.
Here are the results I got for containers of up to 100 elements with GCC:
And here they are for VC++:
With both compilers, linear search through an unsorted std::vector is fastest for very small containers, but it doesn't take more than about 20-50 elements for the hash tables to take the lead. Whether that remains the case across hardware, compilers, standard library implementations, and types of data being looked up, I don't know. (That caveat is present in the original post above, but some commenters below seem to have overlooked it.)