Great write up. The error messages look really nice.
> Hashing strings with for example FNV-1a is as expensive as the string is long, because the hash has to take every character into account.
Is that really true? I would expect that the hash algorithm could just use the first N bytes. It is going to have to compare the whole string regardless, because two things could hash to the same value just by chance. That might explain why you “only” saw a 2x speedup with hashing ints.
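To make the idea concrete, here is a rough Go sketch of a prefix-limited FNV-1a (hypothetical code, not from the article; the `fnv1a` helper and the sample strings are mine). It shows both the win and the catch: two strings that share the first N bytes get the same hash, so the table still needs a full key comparison to be correct:

```go
package main

import "fmt"

// fnv1a hashes at most the first n bytes of s with 32-bit FNV-1a.
// Sketch only: a real hash table would still compare full keys on
// lookup, since distinct strings can share a prefix (and thus a hash).
func fnv1a(s string, n int) uint32 {
	const (
		offset32 = 2166136261 // FNV-1a 32-bit offset basis
		prime32  = 16777619   // FNV-1a 32-bit prime
	)
	if n > len(s) {
		n = len(s)
	}
	h := uint32(offset32)
	for i := 0; i < n; i++ {
		h ^= uint32(s[i])
		h *= prime32
	}
	return h
}

func main() {
	// Both strings share the 8-byte prefix "json_doc", so a
	// prefix-limited hash cannot tell them apart.
	fmt.Println(fnv1a("json_doc_one", 8) == fnv1a("json_doc_two", 8)) // true
	// Hashing the full strings includes the differing bytes.
	fmt.Println(fnv1a("json_doc_one", 12), fnv1a("json_doc_two", 12))
}
```

The cost becomes O(min(N, len)) per key instead of O(len), at the price of more collisions on common-prefix data.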
Also I am curious what gains came from the “fast paths” optimization. There were no benchmarks provided.
…huh. I never really considered that you could only hash the first N bytes of the string.
> hashing the first N bytes sounds like a very good idea. :)

It can perform really badly on certain strings. For example, JSON and XML documents have very low entropy in the first characters. I recall one system that did this and ended up discovering that every single one of the strings in their sets had the same hash, because they all had the same prefix. Switching to hashing the suffix improved things in that case, but a lot of other use cases have common suffixes.
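For what it's worth, the suffix variant is a one-line change; here is a hedged Go sketch (the `fnv1aTail` helper and the sample filenames are made up for illustration). It distinguishes common-prefix keys, but common-suffix data, such as filenames sharing an extension, collides just as badly:

```go
package main

import "fmt"

// fnv1aTail hashes at most the last n bytes of s with 32-bit FNV-1a.
// Sketch only: this trades prefix collisions for suffix collisions.
func fnv1aTail(s string, n int) uint32 {
	start := len(s) - n
	if start < 0 {
		start = 0
	}
	h := uint32(2166136261) // FNV-1a 32-bit offset basis
	for i := start; i < len(s); i++ {
		h ^= uint32(s[i])
		h *= 16777619 // FNV-1a 32-bit prime
	}
	return h
}

func main() {
	// Common prefix, distinct suffixes: the tail hash sees the
	// differing bytes that a prefix-limited hash would skip.
	fmt.Println(fnv1aTail("json_doc_one", 4), fnv1aTail("json_doc_two", 4))
	// Common suffix: every string ending in ".xml" now collides.
	fmt.Println(fnv1aTail("report.xml", 4) == fnv1aTail("invoice.xml", 4)) // true
}
```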
I see. It all depends on the kind of data being hashed in the end. :)
Since I implemented the fast paths, token references, and the format helper in a single commit, I merged those into one benchmark.
Nice error messages: https://xnacly.me/programming-lang-performance/errors.png
Thanks :^), it really took me a while to get them right, but I wanted to have good and user-friendly feedback. You can take a look at https://xnacly.github.io/Sophia/Internal.html#error-handling if you’re interested in more examples on how sophia handles and displays errors.
Will have a look. This also reminds me that I need to profile my lisp interpreter.
“Interal Documentation” << typo
Thanks :) will change