1. 172
    1. 33

      This was a fun read. Explained well, with examples, and all to shave a few minutes off loading times. Scratching your own itch to the max.

    2. 33

      This is nuts. Who would have expected sscanf() to call strlen() on the /input string/ every time it was called? Not me!

      Turns out that the vast majority of libc implementations implement sscanf() by converting the input string to a fake FILE* object & then passing that to their fscanf() implementation. (The FILE* object needs the length of the input string so that it can signal EOF correctly I presume.)

      Pretty sure that I would have made exactly the same mistake in the same circumstances: how else are you going to convert input strings into floats / ints? Write your own float or decimal parser? Are you mad? No, obviously you call sscanf(). And here we are.

      What a footgun.
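      To make the footgun concrete, here’s roughly the shape of code that gets bitten (names and format string are invented). The loop reads as linear, but if each sscanf call does a strlen of the remaining input, the whole thing is quadratic in the buffer size:

        #include <cstdio>
        #include <string>
        #include <vector>

        // Sketch: pulling doubles out of a big buffer with sscanf + %n.
        // On most libcs each sscanf call runs strlen() over the *remaining*
        // input, so this loop quietly turns O(n^2) in the buffer length.
        std::vector<double> parse_all(const std::string& buf) {
            std::vector<double> out;
            const char* p = buf.c_str();
            double value = 0.0;
            int consumed = 0;
            while (std::sscanf(p, " %lf%n", &value, &consumed) == 1) {
                out.push_back(value);
                p += consumed;        // advance past what we just parsed...
                if (*p == ',') ++p;   // ...and the separator, if any
            }
            return out;
        }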

      1. 13

        It’s a good reminder that writing in C / using stdlib doesn’t make your code fast. I’ve been guilty of choosing libraries written in C because I believed this meant they’d be fast. Often they are, but I need to remember that performance should always be tested when it matters and shouldn’t be assumed to be as good as it could or should be.

      2. 10

        Or just use an existing JSON parser.

      3. 3

        Typically you should use strtod, strtol, and the rest of the strto* family for conversion after strtok, or strsep if you have it.
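        A rough sketch of that (separator handling is made up): strtod reports where it stopped via endptr, so a cursor just moves forward and nothing ever re-scans the input.

          #include <cstdlib>
          #include <vector>

          // Parse a separated list of doubles without sscanf.
          std::vector<double> parse_doubles(const char* p) {
              std::vector<double> out;
              char* end = nullptr;
              for (;;) {
                  double v = std::strtod(p, &end);
                  if (end == p) break;   // nothing more to parse
                  out.push_back(v);
                  p = end;
                  while (*p == ',' || *p == ' ' || *p == '\n') ++p;   // skip separators
              }
              return out;
          }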

        1. 3

          Oh sure, but if you’re ad-hoc parsing little chunks of string encoded data that happens to have a regular structure, sscanf is right there, begging to be used so you can blow your feet off with it.

        2. 1

          If you strtok it first, this is probably not a big deal, because strtok already null-terminates the returned token, so you aren’t running strlen over the whole input.

      4. 3

        > Turns out that the vast majority of libc implementations implement sscanf() by converting the input string to a fake FILE* object & then passing that to their fscanf() implementation. (The FILE* object needs the length of the input string so that it can signal EOF correctly I presume.)

        That is … bizarre. Not implementing sscanf in terms of fscanf—that’s fine, and sprintf usually does it too—but precalculating the length! It’s trivial to signal EOF: just look for the null terminator.
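        To illustrate the point, a string-backed reader can discover EOF lazily, one character at a time. This is just a toy sketch of the idea, not how any particular libc is written:

          // Toy string-backed character source: hitting the null terminator
          // *is* the EOF signal, so no up-front strlen is needed.
          struct StringReader {
              const char* cur;
              int next() {
                  return *cur ? static_cast<unsigned char>(*cur++) : -1;   // -1 == EOF
              }
          };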

    3. 25

      That’s a good counterargument to “why pull in dependencies if I can write my own in 20 lines of code”.

      1. 2

        I mean, it’s more like:

        if your program is a bit slow, spend a few minutes to understand why.

        I dunno if it has anything to do with dependencies or not.

        1. 6

          There are JSON parsers that parse at 400-800MB/s. There are SIMD-optimized hashmaps.

          Get these two, and the 60MB file would have been processed in a fraction of a second. You wouldn’t even need to dig out a profiler and reinvent these two very well reinvented wheels.

          I’m not saying you shouldn’t profile your code, but choose your battles. Profiling effort could be spent on optimizing something unique to the game, rather than solving an already solved problem.
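          For example (library choice, file name, and field name are all assumptions on my part; simdjson is one of the parsers in that speed class), the whole parse-and-dedupe job can stay this small:

            #include "simdjson.h"   // https://github.com/simdjson/simdjson
            #include <string>
            #include <string_view>
            #include <unordered_set>

            int main() {
                simdjson::dom::parser parser;
                std::unordered_set<std::string> seen;   // hash lookups instead of a linear scan

                // Hypothetical file and field names; simdjson's exception-throwing DOM API.
                for (simdjson::dom::object item : parser.load("inventory.json")) {
                    std::string_view key = item["key"];
                    seen.emplace(key);
                }
                // seen.size() unique entries after a single pass over the file.
                return 0;
            }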

    4. 5

      I wonder if this could improve Red Dead Online load times too, if there’s code overlap between the two games? Would love for some rando to drastically improve this for so many users.

    5. 4

      Disassembling and improving GTA games seems to be becoming a trend nowadays. Meanwhile the reverse-engineered and improved GTA 3 and GTA VC are still taken down by Rockstar…

    6. 4

      What would it take to have tooling (editor, test harness, …?) recognise patterns like this? If it seems difficult, is it more difficult than an English-language grammar checker?

      “You appear to be using a linear scan on an array with an equality test, would you like to use a map/dict/assoc-array?”

      Could that test be compile-time or runtime-only?
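      For concreteness, this is the pattern such a check would look for, next to the rewrite it would suggest (a made-up example, of course):

        #include <string>
        #include <unordered_map>
        #include <vector>

        struct Entry { std::string key; int value; };

        // What the linter would flag: linear scan with an equality test and early exit.
        int find_slow(const std::vector<Entry>& entries, const std::string& key) {
            for (const Entry& e : entries) {
                if (e.key == key) return e.value;   // O(n) per lookup
            }
            return -1;
        }

        // The suggested rewrite: build the index once, then each lookup is O(1) on average.
        int find_fast(const std::unordered_map<std::string, int>& index, const std::string& key) {
            auto it = index.find(key);
            return it == index.end() ? -1 : it->second;
        }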

      1. 8

        It’s difficult, because linear scan on an array isn’t always the wrong thing. LLVM, for example, has a SmallDenseMap class that does exactly this. It’s designed for sets where the keys fit in 1-2 cache lines and is very fast. If you use this on a 10 MiB data set then performance is going to be incredibly painful. You need some dynamic profiling to indicate that you’re doing this on a large amount of data. If you’re going to do dynamic profiling, then you may as well do simple sample-based profiling and this kind of thing will appear at the top (as it did for the author of this article).

        My guess here is that the Rockstar devs did profile this on an early version where there were far fewer objects and perf was fine, then didn’t look at the loading code again because other things were more urgent.

        1. 3

          I understand the 100% case is hard (and profiling is the current way of approaching this), but an IDE plugin or linter which could recognise “linear scan over array with comparison and break” would be right 90+% of the time (as a wild-assed guess).

          Again - I think it’s like grammar checker. The rules are “mainstream guidelines” and you can ignore them / disable them in a specific case if you know what you are doing.

          If you could calm the linter/plugin with an “assert N < 1000” you could also have it pop an assertion fail at runtime if your estimate of the size was wrong or changed.
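          As a sketch, the “calming” annotation could be something like this (macro name and threshold are invented); the linter treats it as a suppression, and the assert fires at runtime if the size estimate stops being true:

            #include <cassert>
            #include <vector>

            // Invented annotation: documents and enforces the "this stays small"
            // assumption that justifies the linear scan below.
            #define ASSUME_SMALL(container, limit) assert((container).size() < (limit))

            int find_linear(const std::vector<int>& v, int wanted) {
                ASSUME_SMALL(v, 1000);   // linter: linear scan accepted because of this bound
                for (size_t i = 0; i < v.size(); ++i) {
                    if (v[i] == wanted) return static_cast<int>(i);
                }
                return -1;
            }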

          1. 3

            > an IDE plugin or linter which could recognise “linear scan over array with comparison and break” would be right 90+% of the time (as a wild-assed guess).

            90% of the time means that 10% of the time you’ll see a false positive. I suspect 90% is pretty optimistic and even then it’s way too high a false positive rate for most developers to turn on in practice. In any of the nontrivial codebases I’ve worked with, you’d see hundreds of warnings for this pattern and almost all of them would be false positives.

            In C++ code, the problem is not that this is the access pattern for a dense map data structure, it’s that the user picked a dense-map data structure when they had a large amount of data. Flagging a warning in the implementation of the dense map is a waste of time: the author knows that it’s not intended to be fast in the cases where the warning is intended for. You’d need to raise the warning to the point where the dense map is instantiated (and that may require some cross-module analysis, depending on whether the lookup path is in the header for inlining or not). You’d then raise a warning on every use of SmallDenseMap, but that’s not helpful because everyone who used that data structure chose to use it explicitly. The bug comes from either not understanding the characteristics of SmallDenseMap or from understanding them but incorrectly estimating the amount of data that you’ll use it for.

            1. 1

              So the warning would be disabled for SmallDenseMap.

              My browser has underlined “SmallDenseMap” with a wiggly red line. I didn’t spell it incorrectly; that’s a false positive. It’s not a problem, and it is a useful affordance when I do spell something incorrectly. I could, if I chose, add it to my dictionary, or I could ignore the warning, or I could craft a rule which said CamelCaseIsExemptFromSpelling.

              The same analogy holds with the tooling suggestion above. It’s basically a linting check. The reason linting checks aren’t compile failures is that they can have false positives, but they are useful.

              Thinking about this more, some of the better linters I’ve used have rules almost like this. But they tend to be focused on idiom and correctness rather than performance, maybe what I’m after is just an extension of that.

              1. 2

                > So the warning would be disabled for SmallDenseMap.

                So then it wouldn’t catch the (real) bugs where you’re using SmallDenseMap but on data of a size where it’s not the right solution. Most people don’t roll their own generic data structures; this kind of bug comes from choosing a generic data structure that has the wrong performance profile for what they want. SmallDenseMap is O(n), but with a very small constant, so if n is small it’s the best choice. std::map is O(log(n)), but with a comparatively high constant, and so may be doing too many compares on small data and will be slower below a certain size. std::unordered_map is O(1) if you have a good hash function, but has a high constant cost and may be O(n) in terms of the key length to compute the hash for lookups.

                Any one of these off-the-shelf data structures is wrong for some data sets. Your proposed linting tool would tell you SmallDenseMap does a thing that is slow for large data sets, but you know that when you implement SmallDenseMap, so you disable it for that class. Now no one sees the warning for SmallDenseMap.
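                To make that trade-off concrete, the honest answer is a toy benchmark along these lines (sizes and key types are whatever your data actually is), not a static rule:

                  #include <chrono>
                  #include <cstdio>
                  #include <map>
                  #include <unordered_map>
                  #include <utility>
                  #include <vector>

                  // Time n lookups against a container and report milliseconds.
                  template <typename Lookup>
                  double time_lookups(int n, Lookup&& lookup) {
                      long sink = 0;   // keep the results observable so the work isn't optimized away
                      auto t0 = std::chrono::steady_clock::now();
                      for (int i = 0; i < n; ++i) sink += lookup(i);
                      auto t1 = std::chrono::steady_clock::now();
                      std::printf("(checksum %ld) ", sink);
                      return std::chrono::duration<double, std::milli>(t1 - t0).count();
                  }

                  int main() {
                      const int n = 100000;   // rerun with 100, 10'000, 1'000'000...
                      std::vector<std::pair<int, int>> vec;
                      std::map<int, int> tree;
                      std::unordered_map<int, int> hash;
                      for (int i = 0; i < n; ++i) { vec.push_back({i, i}); tree[i] = i; hash[i] = i; }

                      double tv = time_lookups(n, [&](int k) {
                          for (const auto& p : vec) if (p.first == k) return p.second;   // linear scan
                          return -1;
                      });
                      double tm = time_lookups(n, [&](int k) { return tree.at(k); });
                      double th = time_lookups(n, [&](int k) { return hash.at(k); });
                      std::printf("\nvector %.1f ms, std::map %.1f ms, std::unordered_map %.1f ms\n", tv, tm, th);
                  }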

      2. 2

        A CI/daily test could generate a flame graph of loading time, and a graph of the history of loading times. Then someone would of course need to keep watching that, or you’d have to have a bunch of alert thresholds that would spot issues without generating a lot of false positives.
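        Even without flame graphs, a blunt version of that alert threshold is easy to sketch (the hook name, the baseline handling, and the 20% budget are all invented):

          #include <chrono>
          #include <cstdio>
          #include <cstdlib>

          // Stand-in for the real build's level-loading entry point.
          void load_level(const char* /*name*/) { /* ... */ }

          // Nightly-test sketch: fail the job if loading regresses past a recorded baseline.
          int main(int argc, char** argv) {
              if (argc < 2) { std::fprintf(stderr, "usage: loadcheck <baseline_seconds>\n"); return 2; }
              const double baseline_s = std::atof(argv[1]);

              auto t0 = std::chrono::steady_clock::now();
              load_level("test_level");
              const double took_s =
                  std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();

              std::printf("load took %.2fs (baseline %.2fs)\n", took_s, baseline_s);
              return took_s > baseline_s * 1.2 ? 1 : 0;   // nonzero exit = alert
          }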

        1. 4

          Given the type of issue, I’m 90% sure they wouldn’t send the production JSON to the CI but a short anonymized sample, and then it would be 10s of loading time, like we all would ;)

          1. 1

            Most gamedev studios will do a daily build, which includes tests for loading every level.

        2. 2

          Yes, existing profiling tools would catch this if used.

          I’m thinking more of an IDE plugin which points out to people they are doing something which is likely to be problematic.

    7. 1

      For tasks like this, JSON parsing should be done once per revision of the file. Heck, for tasks like this, JSON probably isn’t even the right data structure.

      Also, if it’s so important to validate the data, why not do that server-side and sign validated files with a private key?
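      A sketch of the “once per revision” idea (helper names and the cache scheme are invented): key a cached binary blob on a fingerprint of the JSON, and only reparse when that changes.

        #include <fstream>
        #include <functional>
        #include <sstream>
        #include <string>

        // Stand-in for the real parser; it would emit whatever binary format the game loads directly.
        std::string parse_to_binary_blob(const std::string& json) { return json; }

        // Reparse the JSON only when its content changes; otherwise load the cached blob.
        std::string load_items(const std::string& json_path) {
            std::ostringstream ss;
            ss << std::ifstream(json_path).rdbuf();
            const std::string json = ss.str();

            // Cheap content fingerprint; a real pipeline would use a proper hash or the asset's revision id.
            const std::string cache_path =
                json_path + "." + std::to_string(std::hash<std::string>{}(json)) + ".bin";

            if (std::ifstream cached{cache_path, std::ios::binary}) {
                std::ostringstream cs;   // cache hit: no JSON parsing at all
                cs << cached.rdbuf();
                return cs.str();
            }
            const std::string blob = parse_to_binary_blob(json);   // cache miss: parse once
            std::ofstream(cache_path, std::ios::binary) << blob;
            return blob;
        }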