Has anyone here used integer keys in a hash table? What was the context? Is it as vanishingly rare as I think it is?
There’s so much literature measuring the performance of associative data structures with integer keys, and it all seems to me to be largely useless for learning about the more common case of string keys.
Any kind of serious API to a financial exchange will likely result in the clients of that API having hash tables with integer keys, as product IDs and/or order IDs will be received as small integers. For example, see the message layouts described at https://cdn.cboe.com/resources/membership/US_EQUITIES_OPTIONS_MULTICAST_PITCH_SPECIFICATION.pdf, which contain integer order IDs, and are likely to end up in a hash table.
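As a rough sketch (the struct and field names here are invented for illustration, not taken from the PITCH spec), the client side often ends up with something like:

```rust
use std::collections::HashMap;

// Hypothetical resting-order record; the fields are illustrative,
// not taken from the PITCH spec.
struct Order {
    price: u64,    // price in fixed-point tick units
    quantity: u32, // remaining shares
}

fn main() {
    // Order IDs arrive on the feed as plain integers, so the natural
    // client-side book is a hash table keyed by them.
    let mut live_orders: HashMap<u64, Order> = HashMap::new();
    live_orders.insert(1_052_773, Order { price: 101_250, quantity: 200 });

    // A later execution or cancel message references the same integer ID.
    if let Some(order) = live_orders.get_mut(&1_052_773) {
        order.quantity -= 100;
    }
}
```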
To add to the other replies here: Hashtables keyed by integers come up in compilers a bunch. So much so that rustc uses fxhash, which performs great for integers but poorly for larger data types like strings.
Often in these systems you have many different items (basic blocks, instructions, …) that need secondary maps/properties. In C++ (LLVM), pointers to the objects are usually used as keys (and pointers are also just integers at the end of the day), while in Rust it’s more common to store objects in a list/Vec and reference them by (newtype) IDs (so again, integer keys for hashmaps).
It’s really a design pattern I’ve adopted in many other domains: data is stored “entity-component” style across many arrays/hashmaps (arrays for dense information, hashmaps for sparse information or for information associated with tuples of objects), and objects are represented by (newtype) integers.
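A minimal sketch of that pattern in Rust, assuming the `rustc-hash` crate (which provides the fxhash-based `FxHashMap` mentioned above); the compiler-ish types are invented for illustration:

```rust
use rustc_hash::FxHashMap; // the fxhash-based map used inside rustc

// Newtype wrapper: an instruction is identified by its index into a Vec.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct InstrId(u32);

struct Instr {
    opcode: &'static str,
}

fn main() {
    // Dense primary storage: every instruction lives in a Vec,
    // and an InstrId is just an index into it.
    let instrs = vec![Instr { opcode: "add" }, Instr { opcode: "mul" }];

    // Sparse secondary property: only some instructions have an entry,
    // so a map keyed by the integer id beats yet another full-length Vec.
    let mut source_line: FxHashMap<InstrId, u32> = FxHashMap::default();
    source_line.insert(InstrId(1), 42);

    for (i, instr) in instrs.iter().enumerate() {
        println!("{}: {:?}", instr.opcode, source_line.get(&InstrId(i as u32)));
    }
}
```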
I use them often enough; any time I have a bunch of random-ish identifiers but any particular set of them is going to be sparse. Interned strings are a good example of that, ironically, and they appear all the time in my compiler project. Sometimes just wasting some memory on a mostly-empty array would be better, sure, but a hashmap is usually more convenient. It’s just a mostly-empty array that is always Big Enough to hold whatever my maximum integer is.
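For instance, a toy interner (hand-rolled here purely for illustration; real projects often reach for a crate) hands out dense integer symbols, and later passes key their tables on those integers:

```rust
use std::collections::HashMap;

// Toy interner: maps each distinct string to a small, dense integer symbol.
#[derive(Default)]
struct Interner {
    ids: HashMap<String, u32>,
    strings: Vec<String>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> u32 {
        if let Some(&id) = self.ids.get(s) {
            return id;
        }
        let id = self.strings.len() as u32;
        self.strings.push(s.to_owned());
        self.ids.insert(s.to_owned(), id);
        id
    }
}

fn main() {
    let mut interner = Interner::default();
    let foo = interner.intern("foo");
    assert_eq!(foo, interner.intern("foo")); // same string, same symbol

    // Later passes key their tables on the integer symbol, not the string.
    let mut use_counts: HashMap<u32, usize> = HashMap::new();
    *use_counts.entry(foo).or_insert(0) += 1;
    println!("{:?}", use_counts);
}
```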
Yeah, any time I have a big array of structs and one of the fields is mostly null, it might make more sense to pull it out into a hashtable.
I also often have small structs of integers as keys, which probably has similar behaviour in benchmarks.
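A quick sketch of both ideas with made-up types: the mostly-null field moves into a side table keyed by index, and a tuple of integers serves as a composite key.

```rust
use std::collections::HashMap;

// Made-up struct: a rarely-set field like `label: Option<String>` would
// pad out every element of the array, so it lives elsewhere.
struct Particle {
    x: f32,
    y: f32,
}

fn main() {
    let particles: Vec<Particle> =
        (0..1000).map(|i| Particle { x: i as f32, y: 0.0 }).collect();

    // The mostly-null field, pulled out into a side table keyed by index.
    let mut labels: HashMap<usize, String> = HashMap::new();
    labels.insert(7, "special".to_owned());

    // Small structs/tuples of integers also work as keys, e.g. a sparse
    // pairwise property stored only for the pairs that actually have one.
    let mut affinity: HashMap<(usize, usize), f32> = HashMap::new();
    affinity.insert((3, 7), 0.5);

    println!("{} particles, label[7] = {:?}", particles.len(), labels.get(&7));
    println!("affinity(3,7) = {:?}", affinity.get(&(3, 7)));
}
```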
Hashtables keyed by integers are also pretty common in query engines, for hash joins on a foreign-key id.
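Concretely, a hash join builds a table keyed by the integer id from the smaller side, then probes it while scanning the other side; a minimal sketch with made-up row types:

```rust
use std::collections::HashMap;

// Made-up row types: customers keyed by id, orders carrying a customer id.
struct Customer { id: u32, name: &'static str }
struct OrderRow { customer_id: u32, total: u64 }

fn main() {
    let customers = vec![
        Customer { id: 1, name: "alice" },
        Customer { id: 2, name: "bob" },
    ];
    let orders = vec![
        OrderRow { customer_id: 2, total: 30 },
        OrderRow { customer_id: 1, total: 10 },
    ];

    // Build phase: hash the smaller side on the integer join key.
    let by_id: HashMap<u32, &Customer> = customers.iter().map(|c| (c.id, c)).collect();

    // Probe phase: stream the other side, looking up each foreign id.
    for o in &orders {
        if let Some(c) = by_id.get(&o.customer_id) {
            println!("{} spent {}", c.name, o.total);
        }
    }
}
```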