1. 98
  1. 34

    I still think that its a big failure if a project as well funded as GTA can’t be bothered to track down minutes of loading time to a sscanf. That’s not some hickup or such, that’s a major dent in this games experience (speaking of experience). So while the author surely showed us how easy it is to get wrong I’d expect a company as big as this one to find something that takes not some seconds but minutes.

    1. 18

      Agreed; it seems absurd that you even have to point out that there’s a difference between “I’m hacking on this fun toy in my spare time and I made an oopsie that wasted an entire 1.8 seconds” vs “I work on a multimillion dollar game and can’t be bothered to investigate why it’s wasting four extra minutes every time it loads”

      Edit: this is not a little embarassing coding failure; it’s a tremendous organizational failure.

      1. 8

        I would agree that it is an organizational failure, but I can also imagine why it didn’t get priority. There are always dozens to hundreds of things on the backlog, so there is always something to do. Long loading time isn’t always as pressing as it should be.

        1. 3

          Long loading time isn’t always as pressing as it should be.

          I understand that the backlog can be huge, but a long-loading issue facing all users (and the users being quite vocal about it) should definitely get bumped up in priorities, because it degrades the experience and trust of the people who will decide whether they buy your next title or not.

          1. 3

            because it degrades the experience and trust of the people who will decide whether they buy your next title or not

            I can’t imagine how mad I’d be if I had actually spent money on this game and then found out they had this little regard for how my time was wasted.

      2. 9

        A few months ago I played Pathfinder: Kingmaker, and I noticed that as the game progressed the save/load times became longer and longer to the point of ridiculousness. I checked the save directory and the save games were huge! Some basic inspection showed they’re just zip files, and after unzipping it turned out it was saving this huge logfile of hundreds of megabytes containing many log entries since I started the game.

        I deleted the logfile, rezipped, and my save/load times were fast again. I later discovered there’s actually a mod to do this automatically.

        I get that these mistakes happen: that’s okay and we do all make them. It’s the not fixing it that annoys me, and the closed source which means it’s very hard/impossible to fix it as a user. My Pathfinder: Kingmaker case was actually pretty simple, but I don’t really have the experience (or patience!) to muck around with dissemblers like the GTA article.

      3. 26

        I maintain my original position that sscanf calculating the entire length of its input is absolutely ridiculous. Are *scanf difficult to use safely, not very robust, and somewhat baroque? Yes. Should sscanf("%f") be a correct (not performance-killing) way of reading floats? Also yes. (Though aside: the OP seems to be reading data from files, so they could have just used fscanf, which has correct performance already.)

        Glibc is guilty of this (the trail is convoluted, but ends up at _IO_str_init_static_internal), as is the freebsd libc (and thus also the apple and android libcs, as well as those of the other BSDs). Since the original bug was in GTA, which only runs on windows, I must presume msvcrt has the same problem. Musl, however, does not!

        1. 16

          A few more:

          • managarm doesn’t strlen but looks broken for unrelated reasons. (Assumes nul byte means eof.) Also has codebloat because of templates.

          • serenityos tries to implement fscanf in terms of sscanf, not the other way around! Unfortunately that means it chomps a whole line of input at every call, so it doesn’t even work correctly. Horrifying.

          • uclibc and newlib are the same as bsd

          • pdclib has ok performance, but with an interesting implementation: it duplicates code between sscanf and fscanf, though the heavyweight format parsing is shared.

          • dietlibc and sortix have the sensible, simple implementation

          1. 6

            Of course this news has led to a patch: https://reviews.freebsd.org/D29007 :)

          2. 17

            Lord knows I’m sure I’ve done the exact same thing with sscanf, but reading the article where it talks about 3D models reminds me of a similar story.

            I can’t remember where I read it but it was, I believe, a person at Pixar talking about something they wrote for RenderMan. The software used a simple hash function to look up files based on the first character of the filename with a linked list in each of the buckets; not particularly fancy, but it worked fine during testing and everything seemed great.

            In production it was super slow. Turned out in production, all of the filenames were passed as absolute paths, so every file started with /, so it turned into a linked list traversal for every lookup.

            I can’t remember where I read it, so I’m sure I’ve forgotten something, but I remember liking the story and thought it was interesting.

            1. 1

              That Renderman hash table turning into a linked list one sounds familiar, I believe I’ve heard it too with the same details.

            2. 3

              Not really a C expert, but I try to use std::istream whenever C++ is available while parsing anything that needs speed. Seek until ‘v’, peek ‘e’, ‘r’, ‘t’, ‘e’, ‘x’, ’ ’, then read bool ohyeah = in>>x>>y>>z; No string scans, no gulping the file, just plain old single pass.

              1. 2

                I’m wondering if any parser / lexer generators protect you from doing this to yourself by am always memcpy()ing out the matched part for a given production into a little null terminated buffer?

                1. 3

                  Indeed! It’d probably be cheaper overall to carry around the matches as little struct span { char const *offset; usize_t length; }s instead.

                  1. 4

                    This is also how approximately every other language (C++, D, Java, C#) saves strings.

                    1. 3

                      A difference is that such a struct is not for storing strings, but merely reference into it. This is what Go and Rust calls a string slice, and C++17 calls a string_view. No allocation, no memcpy.

                      1. 3

                        Yes, sorry, that’s what I meant. D doesn’t differentiate between them (slices are the only user-visible type), so I’m not used to thinking of them as separate things.

                        1. 2

                          Box<str> in Rust is exactly { char *data; size_t length; }. Ownership is orthogonal to representation. Parsers like serde can return either one, depending on whether you want to keep the input string in memory, or parse from an unbuffered stream.

                    2. 3

                      Off the top of my head, jsmn does, and OpenBSD’s patterns stuff (which is ripped from Lua) does if you grab the code.

                      In general it’s easy to do and obviously good, but C has no nice way to print non terminated strings (printf %.*s warns if you implement your strings correctly lol) and C++ people can’t handle pointers so everything is std::string. C++17 has span/stringview but anyone who cared about this wrote their own implementations long ago so I’m not sure they really changed anything.

                    3. 2

                      The scanf family does a lot more things than converting numbers. It’s like using a nuke to swap a fly. Using scanf and expecting it to be as fast as strto* family for numbers is ridiculous. In this case a triple of strtod followed by strstr would do the job.