1. 5
  1. 2

    I often find myself reaching for map over unordered_map not because I care about an order but because iterators that are stable in the presence of insertion/deletion are useful. Am I missing something? Nobody else seems to be talking about this use case, does it work for unordered_map too and I’m operating under the misconception that it doesn’t?

    1. 1

      Iterators in both kinds of map are stable in the presence of deletions (except iterators pointing to the deleted element of course). However you’re correct that unordered_map can invalidate iterators on insertion, while map doesn’t.

    2. 2

      This is one of the weirdest things about the C++ stdlib. The unordered map and set were added in C++03, as I recall. The ordered ones were in the first standard. I don’t think I’ve ever had a use case where I care about stable ordering. Maybe once or twice in a couple of decades of C++. Pretty much every program I’ve written in any language wants a dictionary though, and when a standard library says ‘map’ or ‘dictionary’ (or ‘set’), I assume it’s some form of hash table, unless it’s a variant that’s specialised for very small sizes. Having the default for these be a tree is just confusing.

      1. 4

        unordered_map is from C++11, so it’s relatively modern. What boggles my mind is that, despite being modern C++, standard still messes up and hard-codes a particular (and slow) implementation strategy via exposing bucket iteration. Like, I can see how std::map could be a tree because, when Stepanov was designing what were to become STL, we didn’t care about cache that much. But fixing that by including a crippled hash map is beyond me.

        where I care about stable ordering

        natural ordering is indeed rare (and often, when you have order, you want just a sorted array). However, having some stable iteration order is a much better default than non-deterministic iteration, at leas if you don’t need 100% performance. So, I’d expect more or less every GC language to do what Python and JS are doing, for the default container.

        1. 2

          What boggles my mind is that, despite being modern C++, standard still messes up and hard-codes a particular (and slow) implementation strategy via exposing bucket iteration. Like, I can see how std::map could be a tree because, when Stepanov was designing what were to become STL, we didn’t care about cache that much. But fixing that by including a crippled hash map is beyond me.

          See “Chaining vs Open Addressing” in the proposal for std::unordered_map for historical background about why they went with chaining in 2003.

          1. 2

            In hindsight the justification was terrible: “nobody has written a good implementation yet”. And later Google has written swisstable.

          2. 1

            unordered_map is from C++11

            It was in TR1 and merged into C++ in C++03, so it’s been around for 20 years. There are C++ programmers who started programming after that. I first used it many years before C++11 was standardised.

            What boggles my mind is that, despite being modern C++, standard still messes up and hard-codes a particular (and slow) implementation strategy via exposing bucket iteration

            I’d have to check the spec, but I don’t believe that there’s a requirement that the size of a bucket is >1.

            natural ordering is indeed rare (and often, when you have order, you want just a sorted array). However, having some stable iteration order is a much better default than non-deterministic iteration, at leas if you don’t need 100% performance.

            After writing my original comment, I remembered the one case where I have recently wanted a stale (sorted) ordering. I haven’t ever wanted the Python behavior of insertion order and, for the any case where I might, it’s easy to add in a wrapper, but impossible to remove in a wrapper. I believe the main reason that languages like Python keep this is to make JSON serialisation simpler, which isn’t a consideration in most places where C++ is a sensible choice.

            1. 2

              I believe the main reason that languages like Python keep this is to make JSON serialisation simpler,

              Python had a json module before dict iteration order was made stable by setting it to insertion order.

              A much bigger reason is that a change landed is Python 3.6 that shrunk the memory usage of Python dicts by making the hash buckets be indexes into a dense array of entries (1-8 bytes each, picking the smallest possible size for each dict at runtime during resizing), instead of the buckets being typedef struct { Py_ssize_t me_hash; PyObject* me_key; PyObject* me_value; }; (12 or 24 bytes depending on arch). Offhand I believe Python dicts typically have a load factor of about 0.5ish.

              dict iteration order matching insertion order fell out of this almost for free. It got added to the language documentation for python 3.7. Arguments in favour included “it already does iteration order as of python 3.6” and “it’s helpful to remove one possible source of non determinism”.

              1. 1

                It was in TR1 and merged into C++ in C++03

                Hm, I am 0.95 sure unordered map is TR1, but not C++03. According to https://en.cppreference.com/w/cpp/language/history TR1 came after C++03 standard?

                1. 2

                  Ah, you’re right. I think libstdc++ exposed it only in c++03/gnu++03 modes. I was definitely using it (possibly in the experimental namespace?) long before 2011.

                  1. 1

                    Uhu, my “from C++11” was also wrong: 11 is when it became standard, but it was totally available before that as a part of TR1.

                2. 1

                  +1 for TR1…in a previous life my gamedev buddies and I had to wrap unordered_map to account for some minor differences of some form or another between MSVC and GCC. I think that’s all in the past now, one hopes.

              2. 1

                I personally still end up defaulting to std::set and std::map unless profiling shows that it’s a bottleneck, even when I don’t care about ordering, because they Just Work for just about any reasonable type, while the coverage of std::hash is weirdly spotty. For example I fairly often want a tuple as dictionary key, but there is no standard std::hash<std::tuple> specialization (or one for std::pair, or std::array), and I don’t want to go down a rabbit hole of rolling my own hash function or pulling in boost just to use a dictionary.

                1. 1

                  Defaulting to map makes it harder for someone to understand the code later one. They will have to figure out that the ordering doesn’t matter and that maps guarantees aren’t needed.

                2. 1

                  Map and Set have insertion ordering in JS - it was our solution to ensuring definable behavior (and also consistency with the existing object/dictionary use cases. JS has been hit by Hyrum’s law numerous times over the decades and having a defined order meant that there was no leaky abstraction horror.

                  Take the “always randomize iteration order” solution people often propose - does that mean that getting two iterators of the same map will enumerate in the same order. Will the enumeration order change over time? How does mutation impact ordering, how does gc impact things, etc.

                  The problem with unordered_map is that it overprescribes the implementation scheme, but what it has prescribed also doesn’t provide any useful gains to end users/developers. I haven’t seen any large scale C++ projects that don’t end up using a non-std implementation of unordered maps due to the std definition due to the poor unordered_map performance mandated by the spec.

                  1. 1

                    I’ve used unordered map a lot in C++. I often don’t have a sensible total order to use for key types, which means that map is a pain. Where performance really matters, it’s easy to drop in something like LLVM’s dense map or tsl’s robin map, but only after profiling shows me that this is necessary.

                    1. 1

                      Oh I agree, I don’t think I’ve ever needed an ordered map (outside of code puzzles/maybe interview style quizzes). The defined ordering in JS is specifically about removing any kind of non-determinism or leaked implementation semantics.

                      A few committee members would produce examples of problems that can be made “better” or “smaller” with insertion order enumeration[1], but for implementor committee members it was entirely about ensuring no hyrum’s rule horrors.

                      [1] Though those examples either have bad memory use problems or hurt the cpu/runtime performance of maps in all other cases

                3. 1

                  I hate the C++ map API, although I understand the reasons it is the way it is. The natural square-bracket syntax is usually the wrong thing to use, so you have to go through the verbose Iterator API. At least I’ve gotten good at typing it.

                  It’s also pretty well known that std::unordered_map is slow and memory-hungry. There are many faster replacements, like the ones in Abseil and Folly, but IIRC they have slightly different APIs to avoid the inefficiencies of the standard one.