1. 2

    With GPU compute shader/GPGPU performance issues, memory access should always be suspect number one unless proven innocent. In some cases I’ve ended up including conditional compilation for each memory access so I could stub out reads with a fixed value or value computed from scratch, and skip a write entirely in my compute kernels.
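
    For illustration, a minimal sketch of that kind of stubbing, written as plain C++ standing in for shader code; the STUB_INPUT_READ and SKIP_OUTPUT_WRITE switches are made-up names:

    void kernel_body(const float* input, float* output, int i)
    {
    #ifdef STUB_INPUT_READ
        // Replace the suspect read with a value derived from the index alone,
        // so the ALU work stays but the memory traffic disappears.
        float x = i * 0.001f;
    #else
        float x = input[i];
    #endif

        float result = x * x + 1.0f;   // stand-in for the real computation

    #ifndef SKIP_OUTPUT_WRITE
        output[i] = result;
    #else
        // With the write skipped, the compiler may also delete the computation,
        // so timings from this variant need interpreting with care.
        (void)result;
    #endif
    }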

    1. 4

      What would it take to have tooling (editor, test harness, …?) recognise patterns like this? If it seems difficult, is it more difficult than an English language grammar checker?

      “You appear to be using a linear scan on an array with an equality test, would you like to use a map/dict/assoc-array?”

      Could that test be compile-time or runtime-only?

      1. 8

        It’s difficult, because linear scan on an array isn’t always the wrong thing. LLVM, for example, has a SmallDenseMap class that does exactly this. It’s designed for sets where the keys fit in 1-2 cache lines and is very fast. If you use this on a 10 MiB data set then performance is going to be incredibly painful. You need some dynamic profiling to indicate that you’re doing this on a large amount of data. If you’re going to do dynamic profiling, then you may as well do simple sample-based profiling and this kind of thing will appear at the top (as it did for the author of this article).

        My guess here is that Rockstar devs did profile this on an early version where there were far fewer objects and perf was fine, then didn’t look at the loading code again because other things were more urgent.

        1. 3

          I understand the 100% case is hard (and profiling is the current way of approaching this), but an IDE plugin or linter which could recognise “linear scan over array with comparison and break” would be right 90+% of the time (as a wild-assed guess).

          Again - I think it’s like a grammar checker. The rules are “mainstream guidelines” and you can ignore them / disable them in a specific case if you know what you are doing.

          If you could calm the linter/plugin with an “assert N < 1000” you could also have it pop an assertion fail at runtime if your estimate of the size was wrong or changed.
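
          As a sketch of that idea (Item and find_by_name are hypothetical names), the assert both documents the small-N assumption and trips in debug builds if it stops holding:

          #include <cassert>
          #include <string>
          #include <vector>

          struct Item { std::string name; };   // hypothetical element type

          const Item* find_by_name(const std::vector<Item>& items, const std::string& name)
          {
              // Documents - and, in debug builds, enforces - the assumption that
              // makes the linear scan acceptable; a linter could treat this as the
              // "I know what I'm doing" annotation described above.
              assert(items.size() < 1000 && "linear scan assumes a small collection");

              for (const Item& item : items)
                  if (item.name == name)
                      return &item;
              return nullptr;
          }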

          1. 3

            an IDE plugin or linter which could recognise “linear scan over array with comparison and break” would be right 90+% of the time (as a wild-assed guess).

            90% of the time means that 10% of the time you’ll see a false positive. I suspect 90% is pretty optimistic and even then it’s way too high a false positive rate for most developers to turn on in practice. In any of the nontrivial codebases I’ve worked with, you’d see hundreds of warnings for this pattern and almost all of them would be false positives.

            In C++ code, the problem is not that this is the access pattern for a dense map data structure, it’s that the user picked a dense-map data structure when they had a large amount of data. Flagging a warning in the implementation of the dense map is a waste of time: the author knows that it’s not intended to be fast in the cases where the warning is intended for. You’d need to raise the warning to the point where the dense map is instantiated (and that may require some cross-module analysis, depending on whether the lookup path is in the header for inlining or not). You’d then raise a warning on every use of SmallDenseMap, but that’s not helpful because everyone who used that data structure chose to use it explicitly. The bug comes from either not understanding the characteristics of SmallDenseMap or from understanding them but incorrectly estimating the amount of data that you’ll use it for.

            1. 1

              So the warning would be disabled for SmallDenseMap.

              My browser has underlined “SmallDenseMap” with a wiggly red line. I didn’t spell it incorrectly; that’s a false positive. It’s not a problem, and it is a useful affordance when I do spell something incorrectly. I could - if I chose - add it to my dictionary, or I could ignore the warning, or I could craft a rule which says CamelCaseIsExemptFromSpelling.

              The same analogy holds with the tooling suggestion above. It’s basically a linting check. The reason linting checks aren’t compile failures is that they can have false positives, but they are useful.

              Thinking about this more, some of the better linters I’ve used have rules almost like this. But they tend to be focused on idiom and correctness rather than performance, maybe what I’m after is just an extension of that.

              1. 2

                So the warning would be disabled for SmallDenseMap.

                So then it wouldn’t catch the (real) bugs where you’re using SmallDenseMap but on data of a size where it’s not the right solution. Most people don’t roll their own generic data structures; this kind of bug comes from choosing a generic data structure that has the wrong performance profile for what they want. SmallDenseMap is O(n), but with a very small constant, so if n is small it’s the best choice. std::map is O(log(n)), but with a comparatively high constant, so it may be doing too many compares on small data and will be slower for data below a certain size. std::unordered_map is O(1) if you have a good hash function, but has a high constant cost and may be O(n) in terms of the key length to compute the hash for lookups.

                Any one of these off-the-shelf data structures is wrong for some data sets. Your proposed linting tool would tell you SmallDenseMap does a thing that is slow for large data sets, but you know that when you implement SmallDenseMap, so you disable it for that class. Now no one sees the warning for SmallDenseMap.
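
                To illustrate the trade-off being described (lookup_linear and lookup_hashed are made-up names, not LLVM code), both of these are “correct”, and which one is slow depends entirely on the data size:

                #include <string>
                #include <unordered_map>
                #include <utility>
                #include <vector>

                // A linear scan over a small vector of pairs often wins: it touches one
                // or two cache lines and computes no hash. Past a few dozen elements the
                // O(n) scan loses badly.
                int lookup_linear(const std::vector<std::pair<std::string, int>>& kv,
                                  const std::string& key)
                {
                    for (const auto& [k, v] : kv)        // O(n), tiny constant
                        if (k == key)
                            return v;
                    return -1;
                }

                // O(1) buckets, but the hash itself is O(key length) on every call and
                // the constant factor is much higher for small n.
                int lookup_hashed(const std::unordered_map<std::string, int>& kv,
                                  const std::string& key)
                {
                    auto it = kv.find(key);
                    return it != kv.end() ? it->second : -1;
                }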

        2. 2

          CI/daily test could generate a flame graph of loading time, and a graph of the history of loading times. And then someone would of course need to keep watching that or you’d have to have a bunch of alert thresholds that would spot issues without generating a lot of false positives.

          1. 4

            Given the type of issue, I’m 90% sure they wouldn’t send the production JSON to the CI but a short anonymized sample, and then it would be 10s of loading time, like we all would ;)

            1. 1

              Most gamedev studios will do a daily build, which includes tests for loading every level.

            2. 2

              Yes, existing profiling tools would catch this if used.

              I’m thinking more of an IDE plugin which points out to people they are doing something which is likely to be problematic.

          1. 1

            Shouldn’t this be easy to implement in a GC’d language? Indirect pointers are used so that one can move the objects around, right?

            1. 1

              Many (most?) modern GC’d languages and runtimes do not use indirect pointers.

              1. 1

                Aren’t indirect pointers required for compaction? How else do you reduce fragmentation?

                1. 1

                  Not all GC’d languages use compaction - alternatively, you can use zoned/pooled allocators, etc.

            1. 4

              Ranged numeric types, especially combined with linear types (as per my understanding; not a PLT expert), are a pretty awesome feature; it’s encouraging to see the idea being put into practical use.

              Unfortunately, the language seems to target an extremely narrow niche. I get that it’s a Google-internal project, and that niche is probably worth a cool $100M+ for them so there’s no pressure to expand its scope.

              Things that look like they’re dealbreakers for just about anything I’d like to use this for:

              • It looks like you’re only supposed to implement “leaf” libraries with it. It doesn’t look like there’s any kind of FFI, so if what I want to implement would require internally calling down to a C API for another (system) library, I’d effectively have to extend the language or standard library.
              • Memory safety is primarily achieved by not supporting dynamic memory allocations. It’d be a different story if combined with something like Rust’s borrow checker. I mean the support for linear types is already there to some extent…

              On the other hand, the ability to compile down to C is a bonus compared to Rust.

              I guess if you fix those 2 aspects I mentioned you probably end up with ATS anyway… and I have to admit I haven’t quite wrapped my head around that one yet.

              1. 1

                Ranged numeric types…it’s encouraging to see the idea being put into practical use.

                Pascal had subrange types in 1970.

                1. 1

                  It’s been a few decades since I worked with Pascal, but I’m fairly sure Pascal’s support wasn’t as comprehensive as this is. Without linear types, they’re fairly cumbersome to use comprehensively. Specifically, I’m talking about:

                  if (a < 200)
                  {
                     // inside here, a's type is automatically restricted with an upper bound of 200 exclusive
                  }
                  

                  I don’t think Pascal supported this?

              1. 2

                std::any is one of those things that simultaneously seems like an abomination and also brilliant. At least the article advises using std::variant where possible, although I’ll point out that the memory use characteristics of the two are very different, so it’s not just a question of whether you’re storing values from a bounded or unbounded set of types.
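
                A small sketch of that difference; the exact sizes printed are implementation-defined, but the shape of the result is the point:

                #include <any>
                #include <cstdio>
                #include <string>
                #include <variant>

                int main()
                {
                    // A variant is stored inline: its size is roughly the largest
                    // alternative plus a discriminator, known at compile time.
                    using V = std::variant<int, double, std::string>;
                    std::printf("variant: %zu bytes\n", sizeof(V));

                    // std::any is a fixed-size handle; values that don't fit its
                    // small-object buffer live on the heap, so its footprint doesn't
                    // grow with the types you store (the buffer size is up to the
                    // implementation).
                    std::printf("any:     %zu bytes\n", sizeof(std::any));
                }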

                1. 1

                  Instead of using a temporary file, can’t you (ab)use POSIX shared memory objects, i.e. shm_open? Or will these not mmap with CoW semantics?

                  1. 1

                    There’d be no CoW because that would not be a copy at all. Writing to the second array modifies the first one too.

                    1. 1

                      Surely if you use MAP_PRIVATE it’ll use CoW semantics? Or it’ll fail, if that type of mapping is not supported for whatever reason. But it should surely not silently create a shared mapping.
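
                      A minimal sketch of the question being asked, assuming a POSIX system (on Linux this behaves as described; other platforms may refuse or differ, and older glibc needs -lrt):

                      #include <cstdio>
                      #include <cstring>
                      #include <fcntl.h>
                      #include <sys/mman.h>
                      #include <unistd.h>

                      int main()
                      {
                          const size_t len = 4096;

                          int fd = shm_open("/cow-demo", O_CREAT | O_RDWR, 0600);
                          if (fd < 0) { std::perror("shm_open"); return 1; }
                          shm_unlink("/cow-demo");                // name no longer needed
                          if (ftruncate(fd, len) != 0) { std::perror("ftruncate"); return 1; }

                          // The "original": a shared mapping we write through.
                          char* orig = static_cast<char*>(
                              mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));
                          if (orig == MAP_FAILED) { std::perror("mmap shared"); return 1; }
                          std::strcpy(orig, "hello");

                          // The "copy": MAP_PRIVATE asks for copy-on-write, so writes here
                          // must not show through the shared mapping - or mmap fails if the
                          // platform can't do a private mapping of this object.
                          char* copy = static_cast<char*>(
                              mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0));
                          if (copy == MAP_FAILED) { std::perror("mmap private"); return 1; }
                          copy[0] = 'H';

                          std::printf("original: %s, copy: %s\n", orig, copy);  // hello, Hello
                          return 0;
                      }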

                    2. 1

                      shm_open is Sys V shared memory I think. That might work, yes.

                    1. 3

                      So did the author get arrested or not? Does anyone have any information?

                      1. 13

                        The author was not arrested, and in fact got to have a civil and informal phone chat with our ex-Prime Minister.

                        Sometimes I’m proud to be Australian. Really should finish my citizenship application and make it official :)

                        1. 5

                          I know that, but there is this one line in the article:

                          Update: I have been arrested.

                          This implies to me that there was some further development of the story.

                          1. 23

                            If the rest of the article is anything to go on, this is a joke.

                            1. 3

                              I missed that when I first read it! No-one else is reporting that he’s been arrested, though, so I assume it’s an example of the humour found throughout.

                              E.g.:

                              https://www.theguardian.com/australia-news/2020/sep/16/former-australian-pm-tony-abbotts-passport-details-and-phone-number-obtained-by-hacker

                              1. 4

                                Oh, well… I sure hope that it was a joke, and now I’m also embarrassed that it went over my head.

                            2. 2

                              Sometimes I’m proud to be Australian.

                              Even with this?

                              1. 2

                                Yeah, it’s a mixed bag.

                                1. 1

                                  Of the actions my government has taken that I’m ashamed of, that wouldn’t break the top 20, and it’s still one of the better governments around.

                            1. 3

                              I suspect the “BBB” abbreviation for Bulk-Only Transport is intended to contrast with “CBI” - Bulk/Bulk/Bulk vs. Control/Bulk/Interrupt.

                              I never quite worked out what the UAS host controller support was about, but I got the impression it was something Windows-specific. Something about the UAS driver only being enabled if the USB host controller’s driver explicitly allowed it, or something along those lines, and only in the earliest version of Windows that supported UAS. But I never bothered to fully track this down. It could also just be someone in the standardisation -> HW impl -> SW impl -> marketing chain either misunderstanding something or just making things up.

                              1. 5

                                I’ve made it about a third of the way through, but this looks like a very good all-round introduction which goes into enough depth for answering concrete questions you might encounter when implementing anything SCSI based.

                                I wrote a macOS driver for the virtio SCSI Controller a few years ago, and more recently prototyped a mass storage USB device implementation on an Arduino Due, and I think I would have had less trouble had I found this article at the time.

                                1. 12

                                  I gave up after a few minutes. These DTKs exist so that all the apps he complains about can be ported by their developers. Why review a beta OS on beta hardware and say at this point that the situation is bad? Of course it is bad - nobody has done the work yet.

                                  1. 1

                                    Fair enough. I originally had much higher hopes for the DTK and do understand the purpose of them. I thought that it was going to be a much different video than what it turned out to be. However, it is what it is for now. I spoke to the community about my findings and they still wanted to see the video.

                                    1. 4

                                      Based on the demo, I would have expected Rosetta to be able to run every app as is. These are early days though, so I’m not really concerned.

                                      I am really curious how people will manage docker images. Are they going to start building all their images to support arm & amd64? Or will people prefer to use cloud servers that run on arm? Or will this be enough to convince people to switch away from Apple Laptops for development work?

                                      1. 5

                                        If you look at the DTK release notes, they document an issue with page size support in the DTK hardware only, which will not exist in the final hardware, and which prevents Rosetta from running apps which expect to do memory protection operations on 4K pages. This includes just about any App that relies on JIT compilation - this means browsers and Electron apps of course.

                                        They were apparently not exaggerating about the bit about the DTK not being representative of final hardware.

                                        Re: Docker, well, you could run an x86-64 Linux VM inside Qemu on an aarch64 Mac…

                                        1. 4

                                          The wwdc architecture talk said that the Rosetta on the DTK has page size restrictions, so some uses of mmap, etc fail.

                                          Mono’s JIT runs under Rosetta once they recompiled with 16k page support
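
                                          Not Mono’s actual patch, but a sketch of the general class of fix: query the page size instead of hardcoding 4096 (real JITs on macOS also need extras like MAP_JIT that are omitted here):

                                          #include <cstdio>
                                          #include <sys/mman.h>
                                          #include <unistd.h>

                                          int main()
                                          {
                                              // JIT code buffers get flipped between writable and
                                              // executable with mprotect, which works on whole pages.
                                              // Hardcoding 4096 breaks on 16K-page hardware, so ask
                                              // the OS instead.
                                              const size_t page = (size_t)sysconf(_SC_PAGESIZE);

                                              void* buf = mmap(nullptr, page, PROT_READ | PROT_WRITE,
                                                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
                                              if (buf == MAP_FAILED) { std::perror("mmap"); return 1; }

                                              // ... emit machine code into buf ...

                                              if (mprotect(buf, page, PROT_READ | PROT_EXEC) != 0)
                                                  std::perror("mprotect");

                                              std::printf("page size: %zu\n", page);
                                              munmap(buf, page);
                                              return 0;
                                          }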

                                          1. 2

                                            It’s really interesting. So, the default included Ruby version is 2.6.3 and it is Universal.ARM based. However, when installing Ruby from ASDF, it was Intel based. So, we could probably keep developing using a version manager Ruby install if it is going through Rosetta.

                                            The whole docker issue will be up for debate. I bet we’ll be able to run x86 docker containers and things will work as if we were on an Intel processor. Things might just be a bit slower. From a consumer of a cloud service perspective, I just don’t see the benefit right now to using ARM. I’m sure we’ll get to a point where it is a viable option, but currently, it’s no cheaper than x86.

                                            Lately, I’ve been considering going back to Windows/Linux for my development machine. Haven’t moved on this though as my current rig still has plenty of years left.

                                      1. 7

                                        I’ve recently got a NUC8i5BEK. Great machines: small, very powerful for their size, very free software friendly. I used a 7th gen one as my main desktop for two years before that. Not missing my old Mac Mini that it replaced at all.

                                        However, I have a big problem with the 10th generation and their direction. All 10th generation NUCs come with Comet Lake CPUs that in turn come with a UHD 620 instead of an Iris GPU. That is the reason I went for an 8th gen for my upgrade—I thought I’d better grab one while they are around. Iris 6xx is a very reasonable GPU for what it is; you can do a bit of video rendering or play an occasional 3D game.

                                        UHD 620 is way worse than Iris, and their “gaming” NUCs with discrete GPUs are an entirely different class in terms of power consumption and thermal profile. I also have no idea what gamer would go for a machine with the GPU soldered on. So, I’ll wait for the post-10th gen and see, or switch to something else, if I can find something to switch to.

                                        1. 3

                                          problem with the 10th generation and their direction

                                          Doesn’t seem like an intentional direction. It’s just that Comet Lake is sort of a weird stopgap “fake 10th generation” they don’t really care about. Arse Ice Lake — the 10nm 10th generation — will have “Iris Plus” branded (higher than GT2 tier) GPUs. When 10nm stops being vaporware.

                                          1. 1

                                            Wait, I thought they are already shipping Ice Lake chips, just not making NUCs with them. Am I wrong?

                                            1. 1

                                              Yeah, barely. Looks like there are some laptop chips now that ended up in the XPS 13, Surface Laptop 3 and a few others. I’m not surprised that laptop OEMs have priority over NUCs.

                                              1. 1

                                                And high-volume: the 2020 MacBook Air.

                                                1. 1

                                                  13” MBP and MS Surface Book 3 too. The main problem with Ice Lake is it tops out at 4 cores to maximise yield.

                                                  1. 2

                                                    13” MBP

                                                    Thanks, forgot about that! But not the baseline models, which still use an 8th gen CPU (and are basically the previous MBP iteration with the fixed keyboard).

                                          2. 2

                                            I have had a NUC8i5BEH for over a year. And I love it! All the hardware (that I use, haven’t tested Bluetooth) works out-of-the-box on NixOS and suspend-resume works great. I looked this year whether it would be interesting to purchase a new NUC and also found that the 10th generation was a step back.

                                            The only thing I’d have wished for are more USB-C ports (I use the single USB-C port for DisplayPort).

                                          1. 5

                                            I am currently thinking of investing in an ultra wide monitor for my work from home office: a completely different direction than what is described in this article. I basically want more space at the expense of the pixel density.

                                            As I am getting older, I enjoy a clean layout and a clean desk a lot more. It is most certainly subjective, but minimalism brings me joy.

                                            As such, I am excited by the new LG lineup; their 34” and 38” are now good compromises for gamers and software developers. I happen to be both!

                                            1. 6

                                              I’ve been using a 34” ultra-wide display with 3440x1440 pixels for… (checks) my goodness! Almost 5 years now. I’ve tried various multi-monitor setups but using this one display seems to be the sweet spot for most of what I do. The 38” 3840x1600 displays seem to have a similar pixel size (110 dpi vs 109?) so would probably be even better, though they weren’t as readily available at the time I bought this one. I believe these days you can even get 5120x1440 monsters?

                                              For testing purposes, I’ve also got an LG 27” UHD 4K display (~163 dpi). I can’t get on with this as a primary display with macOS. At native retina resolution (“looks like 1920x1080”) everything seems huge and I’m constantly fighting with the lack of screen real estate. And as the article says, at native 1:1 resolution, everything is too tiny, and the scaled modes are blurry. So I’m going to dissent on the advice of going for a 27” 4K. The ultra-wide 5120x2160 displays have the same pixel size so I’d imagine I’d have similar problems with those, though the bit of extra real estate probably would help.

                                              Don’t get me wrong, I like high resolutions. But I think for this to work with raster based UI toolkits such as Apple’s, you basically have to go for around 200 dpi or higher. And there’s very little available on the market in that area right now:

                                              I can find a few 24” 4K displays which come in at around 185dpi. That wouldn’t solve the real estate issue, but perhaps a multi-monitor setup would work. But then you’ve got to deal with the bezel gap etc. again, and each display only showing 1080pt in the narrow dimension still seems like it might be a bit tight even when you can move windows to the other display.

                                              Above 200dpi, there are:

                                              • The LG Ultrafine 5K. Famously beset with problems, plus only works at 5K with Thunderbolt 3 inputs, and can’t switch inputs.
                                              • Dell UltraSharp UP3218K. This is an 8K (!) display at 31.5”. So it actually comes in at around 280dpi, plus of course it costs over 3 grand. I mean I’d be happy to give it a try… (I suspect I’d have to use an eGPU to drive it from my Mac Mini though - what the article’s author fails to realise is that DisplayPort 1.4 support depends primarily on the GPU, not port type, and to date I believe Intel GPUs only go up to DP 1.2.)
                                              • ASUS ProArt PQ22UC. Only 4K, but higher pixel density as the panel is only 21.6”. 4 grand though! I’m guessing this has awesome colour reproduction, but that’s wasted on me, so if I was to experiment with 4K displays, I’d go for the 24” ones which cost an order of magnitude less.
                                              • Apple’s Pro Display XDR. At 6K and 216dpi, I’m sure this one is lovely, but I don’t think it’s worth the price tag to me, particularly as it once again can’t seem to switch between inputs.

                                              That seems to be it? I unfortunately didn’t seize the opportunity a few years ago when Dell, HP, and Iiyama offered 5K 27” displays.

                                              Now, perhaps 27” 4K displays work better in other windowing systems. I’ve briefly tried mine in Windows 10 and wasn’t super impressed. (I really only use that OS for games though) It didn’t look great in KDE a few years ago, but to be fair I didn’t attempt any fine tweaking of toolkit settings. So for now I’m sticking with ~110dpi; I’ve got a 27” 2560x1440 display for gaming, the aforementioned 3440x1440 for work, and the 4K 27” for testing and occasional photo editing.

                                              I’m sure 27” at 4K is also great for people with slightly poorer vision than mine. Offloading my 27” 4K onto my dad when he next upgrades his computer would give me a good excuse to replace it with something that suits me better. Maybe my next side project actually starts making some money and I can give that 8K monitor a try and report back.

                                              Another thing to be careful with: high-speed displays aren’t necessarily good. At the advertised 144Hz, my Samsung HDR gaming display shows terrible ghosting. At 100Hz it’s OK, though I would still not recommend this specific display.

                                              (Now, don’t get me started on display OSDs; as far as I can tell, they’re all awful. If I were more of a hardware hacker I’d probably try to hack my displays’ firmware to fix their universally horrible OSD UX. Of course Apple cops out of this by not having important features like input switching in their displays and letting users control only brightness, which can be done from the OS via DDC.)

                                              1. 1

                                                I switched from a single LG 27” 4k monitor to two LG 24” 4k monitors for around $300/each. I’m happy with the change. Looking forward to the introduction of a 4k ultrawide to eliminate the bezel gap in the middle; currently all such ultrawides are 1440p.

                                              2. 1

                                                The 34WK95U-W is a high-DPI ultrawide with good color reproduction. It has temporary burn-in problems but I’ve been using two (stacked) for a year and overall I’m happy with them.

                                                They aren’t high refresh though (60hz).

                                              1. 9

                                              First time C code has made me laugh out loud. I needed that today.

                                                There are wonderful “answers” in the rest of the SO thread, I liked this one illustrating how you can change the speed of the decrement.

                                                int x = 10;
                                                
                                                while( 0 <---- x )
                                                {
                                                   printf("%d ", x);
                                                }
                                                
                                                1. 2

                                                  Won’t that fall afoul of the single-modification rule? After all, the result of

                                                  x += ++x;
                                                  

                                                  is famously undefined because x is modified twice without a sequence point in between. I’d also have to look up if --x is even an lvalue in C. It is in C++ I believe, but this is the sort of weird edge case where the two languages might actually disagree. (I believe the UB/sequence point issue applies to both.)

                                                  1. 3

                                                    I don’t think --x is an lvalue. It is equivalent to x-=1, which is equivalent to x=x-1 except that the lvalue x is evaluated only once. An assignment expression has the value of the left operand after the assignment, but is not an lvalue.

                                                1. 2

                                                I had a Noppoo Choc mini with NKRO, but the implementation was buggy and I’d get double letters in macOS (unusable) and occasional double letters in Linux. I used a blue cube adapter to force it into the boot protocol.

                                                  Also, isn’t it also a limitation on how you wire your keyboard?

                                                  1. 2

                                                  I hear some old USB NKRO keyboards used ridiculous hacks like enumerating as multiple keyboards behind a hub, with the first keyboard reporting the first six scancodes, the second reporting the next six, etc., or something. Of course, this is a completely ridiculous and unnecessary hack which implies that the people designing the keyboard don’t understand HID (or that the HID stacks of major OSes were too buggy at the time to work properly, perhaps?)

                                                    As for keyboard wiring, that’s a separate matter. My post discusses the limitations of the USB protocol. What the keyboard microcontroller does to ascertain which keys are pressed is entirely up to it. In practice, to save cost keyboards use a key matrix, which creates key rollover limitations. More expensive NKRO keyboards tend to still use key matrices, as I understand it, but add some diodes to the matrix which facilitates NKRO if and only if the assumption that only one key will change between key scans is not violated (a fair assumption if the scan rate is high enough, due to the infeasibility of pressing two keys at exactly the same time.)
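
                                                  A rough sketch of such a matrix scan with the GPIO access stubbed out (all names and the 8x16 size are made up):

                                                  #include <array>
                                                  #include <cstdint>

                                                  constexpr int kRows = 8;
                                                  constexpr int kCols = 16;
                                                  static_assert(kRows <= 8, "row bits fit in a uint8_t");

                                                  // Stub for the GPIO work: pretend the key at row 2,
                                                  // column 5 is held down. Real firmware would drive
                                                  // the column low and read the row pins here.
                                                  std::uint8_t read_rows_for_column(int col)
                                                  {
                                                      return col == 5 ? std::uint8_t(1u << 2) : 0;
                                                  }

                                                  // One scan pass over the matrix. Without per-key
                                                  // diodes, current can sneak through other pressed
                                                  // keys and make "ghost" keys appear - that's where
                                                  // the rollover limits come from.
                                                  std::array<std::uint8_t, kCols> scan_matrix()
                                                  {
                                                      std::array<std::uint8_t, kCols> state{};
                                                      for (int col = 0; col < kCols; ++col)
                                                          state[col] = read_rows_for_column(col);
                                                      return state;
                                                  }

                                                  int main() { return scan_matrix()[5]; }  // exit code 4 = row 2 bit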

                                                    FWIW, I also seem to recall that it’s common for modern “NKRO” keyboards to actually only be 10-key rollover, on the premise that humans only have 10 fingers (feels like dubious marketing to me.) I’m unsure as to whether this is to do with the key matrix, or whether they just decided to use a 10-element array as their reporting format rather than a bitfield.

                                                    However, nothing stops you from making a keyboard which, for example, wires every key individually up to a microcontroller with hundreds of pins (and thus has the truest possible NKRO). It would simply be prohibitively expensive to do so, less because of the MCU, more because of the PCB layers it would require; I worked this out some time ago and suspect it would take about an 8-layer PCB.

                                                  The Model F keyboard is known for supporting NKRO as an inherent benefit of its capacitive sensing, unlike its successor the Model M. Someone made an open hardware controller for existing Model F keyboards, enabling them to be retrofitted with USB, with full NKRO support.

                                                    1. 1

                                                      Can you explain why a hundred traces would require multiple PCB layers? In my mind, the MCU goes in the middle, with traces spidering out to each of the keys, and a ground belt surrounding the board. A second layer would be used to get the data and power into the MCU.

                                                      1. 1

                                                        Maaaaaybe this would be feasible with a large QFP/QFN package? The chip I was looking at was only available as BGA with the necessary pin count; the escape routing seemed infeasible with a low number of layers, and the manufacturer recommended 6-8, IIRC.

                                                        1. 1

                                                          Oh yeah, pin arrays are dark magic as far as I’m concerned.

                                                    2. 2

                                                    I had a Noppoo Choc mini with NKRO, but the implementation was buggy and I’d get double letters in macOS (unusable) and occasional double letters in Linux. I used a blue cube adapter to force it into the boot protocol.

                                                      Unfortunately, buggy firmware in USB devices is ridiculously common.

                                                      HID stacks in OSes/windowing systems also don’t necessarily treat edge cases or rarely used report descriptor patterns equally, so you can end up with macOS, Linux/X11, and Windows doing slightly different things.

                                                    It’s likely your issue could have been worked around on the software side too; I assume it worked “correctly” in Windows? I’m not aware of a generic HID driver for macOS which lets you arbitrarily rewrite report descriptors and reports into a format that WindowServer/Core Graphics deals with as intended. I’m guessing there might be some kind of built-in system for this in Linux or Xorg though.

                                                      Also, isn’t it also a limitation on how you wire your keyboard?

                                                      Yes, definitely, though that’s not as simple as supporting a hard limit of N simultaneous key presses, but rather that certain combinations of key presses become ambiguous, depending on which keys are wired to the same matrix rows and columns.

                                                    1. 7

                                                      The UK government has its many failings, but its web services are second to none.

                                                      1. 2

                                                        Well, the web services they do have are indeed pretty good, but other countries have e.g. tax filing fully digitised. As a non-UK resident with UK-based income, I’ve so far not been able to file self-assessment online in the UK, as various components of the form which are relevant in my case still haven’t been digitised. My situation can’t be that rare, so I strongly suspect there are various other situations where you can’t file online either. Meanwhile, I’ve submitted absolutely every single tax filing in Austria, no matter how obscure, online - since 2008, when I first needed to start filing here.

                                                        Offline or online, the explanation sheets on the UK filings are much clearer however. The Austrian ones are largely quotes or paraphrasings of the relevant part of the tax laws, which requires a lot of unpicking, or help from an accountant. (In fact the edge cases are often so difficult to nail down that we’ve regularly had accountants give us incorrect advice, so we’ve ended up going back to doing the filing ourselves. Whenever I raised something that didn’t seem quite right with the accountants, they just asked the tax authorities for clarification, who invariably didn’t reply, so I was left to research and interpret the relevant laws myself anyway.)

                                                        I do also really appreciate the team publishing stuff like the linked article.

                                                      1. 6

                                                        This looks like it hasn’t been updated since before C++11. While plenty of it is still quite valid, I’d be really curious to see how much has changed in recent years.

                                                        1. 2

                                                          At first glance, the FQA section is largely still up to date; obviously, none of the new features are asked about, but the answers to the existing questions mostly look accurate and complete.

                                                          Some improvements have been made regarding constructors, particularly forwarding constructors, which would probably warrant slight modifications of some of the answers.

                                                          Perhaps unsurprisingly, a couple of the issues listed in the defects section have been addressed:

                                                          No way to locate definitions

                                                          This one is basically a complaint about the lack of a module system. Modules are being added in C++20 and are supported by some compilers already. I haven’t used them yet myself and don’t know much about them, so I can’t really say much about the specific points mentioned.

                                                          No high-level built-in types

                                                          The lack of syntactic support for higher-level types (you can’t initialize std::vector with {1,2,3} or initialize an std::map with something like {“a”:1,“b”:2} or have large integer constants like 3453485348545459347376) is the small part of the problem.

                                                          This is now largely possible.
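
                                                          For example, both of the quoted initialisations have been legal since C++11 (arbitrary-precision integer literals are still missing, though):

                                                          #include <map>
                                                          #include <string>
                                                          #include <vector>

                                                          int main()
                                                          {
                                                              std::vector<int> v = {1, 2, 3};
                                                              std::map<std::string, int> m = {{"a", 1}, {"b", 2}};
                                                              return static_cast<int>(v.size()) + m.at("a");
                                                          }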

                                                          Cryptic multi-line or multi-screen compiler error messages, debuggers that can’t display the standard C++ types and slow build times unheard of anywhere outside of the C++ world are the larger part of the problem.

                                                          clang made leaps and bounds of progress over earlier compilers in this regard, to the extent that I hardly run into the compiler output vomit problem at all anymore these days.

                                                          1. 3

                                                            I think, for example, 6.1 is different now. Post C++11, and more so post C++14, consistency across compilers has drastically improved in my experience.

                                                            For 6.3, I think the need for recompilation cycles has been reduced, their length has been reduced, and it’s much easier to deliver a C++ interface now than it was in 2009.

                                                            For 6.4, templates have gotten easier, compilation times shorter, and post LLVM especially, error messages are significantly clearer.

                                                            I can’t go through them all point-by-point, but in other cases the standards process and documentation has definitely improved since the first ISO standard, and many of the gripes seem to be with that.

                                                            But my point isn’t really to challenge this FQA… I’d just be really interested in seeing an examination of it for the last 3 versions of the standard to see how the facts on the ground have improved or deteriorated.

                                                            I’ve only been able to move to (compilers that support C++11 and later) over the past year or so, so both my agreement with the criticisms here and my perception of the improvements is very likely distorted.

                                                        1. 5

                                                          To me Objective-C has been eye-opening: it has shown that a language can do just fine without constructors. There’s no need for all these complications and special cases. Not having constructors makes object construction more flexible and powerful.

                                                          1. 1

                                                            A significant difference between C++ and Objective-C is that Objective-C zero/nil-initialises all fields on alloc, so object state is well-defined from the beginning; though if you need a field to start with a different value to ensure the invariants hold, you’re back to relying on the (init…) convention. C++’s designers decided against automatic initialisation well over 2 decades ago, presumably for performance reasons (not backwards compatibility; they could have made this another difference between struct and class). Obviously, this has a bunch of drawbacks…
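
                                                            A small sketch of the C++ side of that difference (Widget is a made-up type):

                                                            #include <cstdio>

                                                            struct Widget {
                                                                int count;       // no initializer: indeterminate unless value-initialized
                                                                int limit = 64;  // C++11 default member initializer: always set
                                                            };

                                                            int main()
                                                            {
                                                                Widget a;                  // a.count is uninitialized garbage
                                                                Widget b{};                // value-initialized: b.count == 0
                                                                Widget* c = new Widget;    // c->count uninitialized
                                                                Widget* d = new Widget();  // d->count zero-initialized
                                                                std::printf("%d %d\n", b.count, d->count);  // prints: 0 0
                                                                (void)a;
                                                                delete c;
                                                                delete d;
                                                            }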

                                                            1. 1

                                                              ObjC, being so dynamic, can’t do much about it. But in a C++like language the performance issue could have been avoided. The optimizer could simply see the zeroing is a redundant store and eliminate it. Or it could take Rust’s approach of having only struct literals and combine that with C++’s copy elision.

                                                              I suppose in C++ case the design went in the wrong direction already at the C stage, and C++ just dug deeper.

                                                              This:

                                                              struct foo f = { magic aggregate };
                                                              

                                                              could have been this from the start:

                                                              let f = (struct foo){ just an expression };
                                                              
                                                              1. 2

                                                                My parent post wasn’t a defense of C++; I’ve been using it for 20 years and it’s been my primary language for well over 50% of my career earnings, so I’ve seen enough of the good, the bad, and especially the ugly that I think it’s fundamentally flawed in various ways.

                                                                Many design decisions need to be viewed in the context of their time though. (Near) full backwards compatibility with C was an enormous advantage initially, and trivial C interop still is very important now. Class & inheritance-based OOP was a big tickbox buzzword back then. Optimisers were much more basic. Rust’s compile times today are awkward on multi-gigahertz multicore monsters; imagine what it’d be like on a 486 with 33MHz and 4MB RAM. C++‘s build times were significantly slower than C’s, but for many the trade-off was worth it. Fancier, expensive-to-compile features would probably have tipped it too far though.

                                                                Hence my point about class vs struct. class never had any backwards compat to uphold, so they probably should have made struct the legacy POD aggregate and made classes never-POD, always-safe.

                                                          1. 2

                                                            Does anyone know what would’ve happened if spectre and all of the other issues hadn’t come to light? Does that even affect the outcome here?

                                                            1. 5

                                                              I doubt the bottom line requests/Joule figure for the Intel system would increase by anywhere near 25% if you disabled all the vulnerability mitigations. The effect is measurable, but not that big on most realistic workloads. Intel perhaps would have pulled ahead on some of the microbenchmarks that were tight. Also bear in mind that AMD’s CPUs were also affected by some of the issues, so those results include a partial mitigation penalty too.

                                                              1. 5

                                                                I don’t know about that. People saw a wide range of impacts, from negligible to 50%, and everything in between. I recall 10–15% was a common range. I’m not sure what impact it has on Cloudflare’s workload, but I can easily imagine that lacking the mitigations, it would no longer have been such a clear victory for AMD.

                                                                1. 2

                                                                    “Realistic workloads” usually don’t include a ton of context switches, which I expect CloudFlare’s gear does a lot of.

                                                                  1. 2

                                                                    CloudFlare’s team have mentioned that they use a lot of eBPF.

                                                                    1. 2

                                                                      That’s just for filtering, though. I’d expect a ton of context switches on a heavily loaded cache node constantly moving data between user and kernel space so the bits can go from disk/application memory to the kernel and through the network stack. It’s not like they’ve moved Nginx & friends into the kernel.

                                                              1. 5

                                                                Unfortunately, the comparison is between Intel Skylake SP (2018) and AMD Rome (2019) systems. Comparing Rome with Cascade Lake SP (2019) would be a bit more representative for deciding on what system to build right now. (Though I suspect the conclusion would not be that different as Cascade Lake only has some minor microarchitecture and fab process tweaks vs Skylake.)

                                                                1. 2

                                                                  This is about C++ move semantics, as you may have guessed if you’re familiar with big names in C++ land, but probably not if you aren’t.

                                                                    The initial explanation and summary is good. The Q&A goes off into the weeds a bit by beating about the bush on the definition of “unspecified” - mostly talking about what it doesn’t mean instead of what it does. Still, as someone who learned C++ shortly after the ‘98 standardisation and has only picked up bits & pieces of the parts of the language added since then, it’s a nice confirmation that my understanding of move semantics was correct.

                                                                  Then, however, there’s the assertion that a class is buggy if moving leaves it in an unusable state; is it just me or does this apply to a substantial part of the standard library?

                                                                  But what about a third option, that the class intends (and documents) that you just shouldn’t call operator< on a moved-from object… that’s a hard-to-use class, but that doesn’t necessarily make it a buggy class, does it?

                                                                  Yes, in my view it does make it a buggy class that shouldn’t pass code review.

                                                                  So for example, most of the STL containers would be considered buggy because using their iterators will likely be undefined after a move?

                                                                  E.g.

                                                                  std::vector<int> foo = { 1, 2, 3 };
                                                                  auto two = std::find(foo.begin(), foo.end(), 2);
                                                                  std::vector<int> bar(std::move(foo));
                                                                  if (two == foo.end()) // pretty sure this is not valid
                                                                     printf("2 not found\n");
                                                                  

                                                                  An interesting aside that stood out to me:

                                                                  (Other not-yet-standard proposals to go further in this direction include ones with names like “relocatable” and “destructive move,” but those aren’t standard yet so it’s premature to talk about them.)

                                                                  This sounds to me like a suggestion that C++ should gain something similar to Rust’s move semantics, where values can be “consumed” so that once moved, it is an error detectable at compile time to attempt to access it - does anyone here know any more about this? Any early implementations in compilers that we could start playing with?

                                                                  1. 1

                                                                    I don’t think Herb is saying that if a move invalidates iterators, we should consider the class buggy. If you copy a vector then .clear() the original, that also invalidates any iterators you had on the original. That’s just a sharp edge that iterators have in general.

                                                                    I think he’s saying that move shouldn’t put the vector into a state where calling methods on the vector itself is illegal. Imagine if after a move, calling .size() on a vector was undefined behavior. That big a gotcha seems hard to justify: isn’t it pretty cheap to reset the vector to its initial, valid, empty state?
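
                                                                      A small sketch building on the vector example upthread; the standard only promises a valid-but-unspecified state, and the “cheap reset” is what mainstream implementations do in practice:

                                                                      #include <cstdio>
                                                                      #include <utility>
                                                                      #include <vector>

                                                                      int main()
                                                                      {
                                                                          std::vector<int> foo = {1, 2, 3};
                                                                          std::vector<int> bar(std::move(foo));

                                                                          // Calling size()/clear() on the moved-from vector is
                                                                          // fine; indexing into it is not, because its contents
                                                                          // are unspecified. Common implementations leave it
                                                                          // empty, which is the cheap "reset" described above.
                                                                          std::printf("moved-from size: %zu\n", foo.size());

                                                                          foo.clear();        // now in a known state by construction
                                                                          foo.push_back(4);   // safe to reuse
                                                                          std::printf("%d %zu\n", foo[0], bar.size());
                                                                      }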

                                                                    1. 1

                                                                      Calling operator[] on such a vector would be UB though. So the distinction is rather subtle: If calling a method is not UB in any other state of the object, then it should also not be UB after a move. If there are states where some methods have UB, reliable methods for detecting and recovering from those states should at minimum also apply to the state after a move.

                                                                      Either way, this of course just highlights the language shortcoming of not being able to terminate the lifetime of an object before it goes out of scope, so you have to be able to reason about these post-move zombies.