Cache-friendly associative container

Few weeks ago I stumbled upon a nice blogpost describing a flat tree layout optimized for cache locality. Another profound work, in the form of scientific paper, entitled “Array layouts for comparison-based searching” also deserves mentioning. The basic idea there is that the commonly used layout of flat maps, i.e. an array of sorted key values, has the negative side-effect of a not-ideal cache locality. When doing a binary search, memory is accessed on distant positions and every such lookup might cause a cache miss.

The level-order layout (aka BFS or Eytzinger layout) utilizes a different approach. Instead of placing keys in a non-descending order, they are laid out in the order of binary search traversal. This leads to fewer cache misses and thus to better lookup times, especially on big trees. Of course, nothing comes for free, and there are two major drawbacks of this data structure:

  • Terrible complexity of per-element insertion/deletion, which makes such modifying operations barely usable. However, many containers are modified only on initialization stage and accessed in read-only mode afterwards. This especially plays well with the immutable data ideology.
  • Lack of “cheap” ordered traversal, like std::map/boost::flat_map has. A simple answer to this issue might be accepting the fact that traversing such structures is not ordered, exactly like in the case of std::unordered_map.

That said, I’ve decided to try to implement a generic STL-like associative map with Eytzinger layout, providing interface as close to drop-in replacement as it can be. Here’s the fixed_eytzinger_map source code.

 

Design considerations

  • This map is supposed to be a read-only structure, i.e. support only construction and per-element access. However, while keys order is defined upon construction, it’s still possible to alter the values these keys are mapping to.
  • Construction of a new map can be done with std::initializer_list or with a range of InputIterators.
  • operator= makes it possible to change the map’s content, filling it with another map or with a pair of iterators as the source.
  • operator[] has different semantics compared with std::map/std::unordered_map. It can’t add new entries, but [] is too handy for element access to discard it. In case of absent key the map will simply throw an exception, just as at() does.
  • This map uses two memory blocks to store keys and values separately. Instead of using std::pair<key, value>, which creates gaps between keys and thus decreasing SNR when performing lookups, this approach leads to a better memory locality.
  • Since the data is stored in separate locations, iterators can’t provide references to “real” std::pair<key, value> objects. Instead, they have to return proxy std::pair<key&, value&> objects.
  • C++14’s heterogeneous lookup can be enabled via std::less<> comparison operator or any other declaring is_transparent type. Heterogeneous lookup is available for all methods involving search: at, operator[], count, find, equal_range, lower_bound and upper_bound.
  • There’re no rbegin/rend methods, as iteration is not ordered by “less” operator and reverting the internal order wouldn’t make any sense.

 

Performance tests

Various measurements were done to compare fixed_eytzinger_map with the following containers: std::map (libc++3.7), std::unordered_map (libc++3.7), boost::flat_map (1.55).
Compilation was made by clang-800.0.42.1, x64 target, -Os.
Executed on i7-3615QM, running macOS 10.12.2.
X-axis represents amount of elements in the container, Y-axis shows μs/bytes per element.

Lookup times.
This measures how much time is spent on .count(..) method. The sequence of keys to search is semi-random with the fixed seed.

 

Lookup & fetch times.
Similar to the previous graph, but uses the .at(..) method to actually access the values pointed by keys.

 

Construction times.
This measures the time spent on the construction of an associative container from a semi-random std::vector<std::pair<Key,Value>>, performing a single lookup and cleaning up afterwards.

 

Memory consumption.
How much memory is used to construct a container. It’s a bit hard to measure directly, so instead what is shown here is a delta between the memory consumed by test process before and after building a container.

 

Results

It’s not easy to compare these data structures directly, as each has it’s unique properties. However, what can be said, is that Eytzinger layout is always outperforms a sorted array (boost::flat_map) and consumes 2x to 5x less memory than std::unordered_map does. If immutability is not an issue, fixed_eytzinger_map might be worth trying at least.