Threads for fanf

  1. 5

    There’s more about this in the feedback submitted by a group of open source internet infrastructure software developers, including my employer isc.org. (Although isc.org is based in the US, most of our software engineers are in Europe.)

    And previously on the NLnetLabs blog and more recently on the isc.org blog

    1. 2

      When I was a teenager learning to code on the Acorn Archimedes, I was fascinated by articles in computer magazines about Unix. The closest I could get to it myself was using the Norcroft C compiler, which had a very unix-like suite of tools somewhat uncomfortably ported to RISC OS.

And now, 30 years later, I have an M1 MacBook Pro. An ARM-powered Unix workstation. Teenaged me would have been very pleased that we have achieved the Right and Proper result of progress in technology, except for the fact that it is made by Apple, not Acorn!

      1. 11

        What makes a password memorable has little to do with the password itself, but rather how it is used. If you use a password a lot, you will have memorized it within a week or two, regardless of how it was generated.

        I have a couple of passwords which I use frequently, for things like logging into my workstation and my password manager. These are generated randomly, and I memorize them. When I change them (which is very infrequently) I keep a written copy to hand until I have memorized them, and I keep a written copy in a safe place. Everything else my password manager remembers for me.

        My memorized passwords do not have to obey other people’s rules about character selection, so I can tune them to be easier to type: they are only lower case letters. This makes it more reasonable to use a long password even on my phone where FaceID’s unreliability means I have to type it more than I would like.

        My current passwords are 16 letters (75 bits, more than enough), which I wrote down as 4 groups of 4. The 4x4 rhythm helped with memorising. I found I remembered the groups first, then it was a matter of remembering the right groups in the right order in each password.
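
For illustration, here’s roughly how you could generate one (a little C sketch of my own, not the exact tool I use), reading bytes from /dev/urandom with rejection sampling so the letters stay uniform; 16 * log2(26) ≈ 75.2 bits:

#include <stdio.h>

int main(void) {
    FILE *rnd = fopen("/dev/urandom", "rb");
    if (rnd == NULL)
        return 1;
    char pw[16];
    for (int i = 0; i < 16; ) {
        int b = fgetc(rnd);
        if (b == EOF)
            return 1;
        if (b >= 26 * 9)            /* reject 234..255 to avoid modulo bias */
            continue;
        pw[i++] = 'a' + b % 26;
    }
    fclose(rnd);
    for (int i = 0; i < 16; i++)    /* print in the 4x4 grouping described above */
        printf("%c%s", pw[i], (i % 4 == 3 && i != 15) ? " " : "");
    putchar('\n');
    return 0;
}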

        1. 3

          Woah very interesting! I’ve been thinking about escaping and quoting a lot recently, and the mess of JSON, CSV, HTML, shell, etc.

          • JSON: C-style backslash escapes like \n and \\ [1]
          • CSV: double quote fields with commas, and then sometimes double the double quotes? Wah?
          • HTML: &quot;-style escapes. Sometimes single quotes are special, but you need a hex escape for that?
          • Shell: bash has 7 different kinds of string literal with different escaping rules
            1. double quoted, respecting $ and backticks and \
            2. unquoted: respecting $ and backticks, but a different meaning for \ [2]
            3. raw single quoted strings, unlike C/Python/JS
            4. strings with C-style $'\n' (or the POSIX echo -e format, which is slightly different)
            5. here docs that respect substitution
            6. here docs that don’t
            7. dynamic printf format strings
          • arguably JSON strings are an 8th – they appear in bash scripts, and are different than bash’s rules

          So it’s a big mess


          The solution looks a little unconventional, but I guess if it weren’t, there would be a solution already! Definitely will take a closer look and see how other languages including data languages could be adapted (e.g. I’m sure people here have encountered CSV inside JSON, HTML inside JSON, or even JSON inside JSON, which seems to be a common bug)

          FWIW Bryan Ford is the inventor of Parsing Expression Grammars (2004), which are now used in the parsers of CPython (as of ~2020) and Zig. A nice case of theory to practice in a relatively short amount of time :)

          [1] aside: I discovered that in JS and Python, \xff is treated as a synonym for \u{ff}, which IMO is a huge mistake. \xff should be a byte, not a code point.
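
          For contrast, C treats it as a byte; a tiny check (this prints 1 byte(s), first is 0xff):

#include <stdio.h>
#include <string.h>

int main(void) {
    const char s[] = "\xff";                 /* one byte with value 0xFF, then NUL */
    printf("%zu byte(s), first is 0x%02x\n",
           strlen(s), (unsigned char)s[0]);
    return 0;
}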

          [2] echo \n outputs n, while echo "\n" usually outputs \n, or sometimes a newline!

          1. 2

            This reminds me of my old idea for generalized string literals.

            A couple of years ago I did a survey of the state of string literals in various programming languages because a lot of languages have grown more complicated literal syntax in the decade since I wrote down my idea. When I started thinking about generalized literals, the syntax I had in mind was rather outlandishly complicated, but by today’s standards it seems quite reasonable!

            1. 2

              Update: I think I’ve decided that, like C++ (thanks to your page), the tag can be a suffix.

              Honestly I’ve never used that in C++, but it seems like it’s OK? This seems to make sense:

              echo "foo $var"
              echo "foo $var"html
              echo y"foo \n \{var}"
              echo y"foo \n \{var}"html
              

              And then multiline versions work the same way:

              proc p {
                echo """
                foo $var
                line two, with leading space stripped
                ""html
              }
              

              I feel like this makes sense:

              • the prefix r or y, similar to Python and Rust, tells you how you PARSE it
              • the suffix like html, similar to C++, tells you how to EVALUATE it

              ?

              Interested in feedback

              https://github.com/oilshell/oil/issues/1442

              1. 1

                Interestingly, Python is experimenting with tagged strings, too: https://github.com/jimbaker/tagstr

                1. 1

                  Hm interesting! didn’t know that

                  Definitely seems so

                  https://github.com/jimbaker/tagstr/blob/6ac5fcf23bdf81a9a2f26534d39ec9eb5d2e724b/examples/sql.py#L180

                  Although I wonder whether sql"" goes where you put the r"" or the b"" etc.

                  Also it says PEP 999 on GitHub, but that doesn’t exist: https://peps.python.org/pep-0000/

                  1. 1

                    I think the “999” is a placeholder.

              2. 1

                I’ve been following this project to add something very similar to Python: https://github.com/jimbaker/tagstr

                1. 1

                  Hm great survey! I hadn’t seen that page, and some of the languages have indeed gotten more elaborate than I thought

                  About choosing {} [] () as delimiters, I made a similar choice >10 years ago for a “JSON Template” language that ended up influencing Go’s text/template. Although Go abandoned that feature and chose {{ }}.

                  So I see the logic, but for some reason (maybe not a good one?) I’m reluctant to make that choice again … But I’d be curious about your experience with it.


                  I am trying to reduce the number of string literal types from ~8 to 3 or 4. I think I have “one string literal to rule them all”, called YSTR – JSON-compatible, C escapes, Swift-style interpolation

                  It’s a successor to QSN. https://www.oilshell.org/release/latest/doc/qsn.html

                  The main flaw of QSN is that it’s not JSON compatible. It was designed to be consistent with Oil string literals, but I think it’s actually more important to be consistent with JSON/JS/Python string literals.

                  Probably the best description of YSTR is in the feedback I just gave about matchertext:

                  https://github.com/dedis/matchertext/issues/1#issuecomment-1371826392

                  There is also a ton of disjointed brainstorming on #oil-discuss-public : https://oilshell.zulipchat.com/#narrow/stream/325160-oil-discuss-public/topic/Unifying.20String.20Notation.3A.20JSON.2C.20CSV.2C.20HTML.2C.20C.2C.20Shell

                  I’d definitely be interested in feedback. I think what you’re suggesting is very similar, although I think YSTR is a little more conventional in the common cases, while still supporting rare cases.


                  Though your post did remind me that I have not managed to stuff in JavaScript-style tags.

                  I wonder if that can be done in the parser, not in the lexer.

                  Oil has an “atom” / symbol syntax %myatom, which is just like an opaque string, with different syntax.

                  So I wonder if we could do something like:

                  var x = "raw interpolation of ${var}"
                  var y = %html "escaped interpolation of $[var]"
                  

                  It’s basically having the “space” as an operator.

                  I guess one downside is that you can’t put that in commands

                  echo %html "escaped interpolation of $[var]"
                  

                  Doesn’t make sense …

                  The reason I don’t want a prefix is that we already have r'' like Python, and probably y"" for YSTR

              1. 7

                When programmers discuss the problems caused by leap seconds, they usually agree that the solution is to “just” keep time in TAI, and apply leap second adjustments at the outer layers of the system, in a similar manner to time zones.

                Abolishing leap seconds will be helpful for users that have very tight accuracy requirements. UTC is the only timescale that is provided by national laboratories in a metrologically traceable manner, i.e. in a way that provides real-time access, and allows users to demonstrate exactly how accurate their timekeeping is.

                TAI is not directly traceable in the same way: it is only a “paper” clock, published monthly in arrears in Circular T as a table of corrections to the various national timescales. (See the explanatory supplement for more details).

                The effect of this is that users who require high-accuracy uniformity and traceability have to implement leap seconds to recover a uniform timescale from UTC - not the other way round as the programmers’ falsehood would have it.

                “Disseminate TAI (along with the current leap offset) and implement the leap seconds at the point of use” might be a “programmers’ falsehood” but it’s also what 3 out of 4 GNSS systems actually do, so it has something going for it.

                The fact that UTC (and not TAI-like timescales) is blessed for dissemination and traceability is downstream of ITU’s request that only UTC should be broadcast; not because of any technical difficulty with disseminating/processing traceable TAI-like timescales:

                • GNSS system times are not representations of UTC, and being broadcast they are not fulfilling requests of ITU, which is recommending only UTC to be broadcast
                • GNSS system times shall be considered as internal technical parameters used only for internal system synchronization, but this is not the case.

                The time that comes out of an atomic clock looks like TAI; adding the leap seconds is something that has to come afterwards, and a system designer gets to choose when. Leap seconds don’t factor into calculations involving the precise timing of moving objects, whether the objects are airplanes, the angles/phases of generators in a power grid, particles in an accelerator, RF signals, etc. Unless you’re OK with grievously anomalous results whenever there’s a leap second, you want a timescale that looks like TAI, not one that looks like UTC. Why bake the leap seconds into your timescale early on if you’re going to need to unbake them every time you calculate with times?

                The designers of GPS, BeiDou, and Galileo all wisely flout this ITU recommendation: their constellations broadcast a TAI-like timescale (alongside the current leap second offset). The designers of PTP also flout this recommendation – by default, the timescale for PTP packets is TAI (usually derived from a GNSS receiver), not UTC. Should you want to reconstitute a UTC time, there is a currentUtcOffset field in PTP Announce messages, whose value you can subtract from a TAI time.
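
                For example, recovering UTC from a PTP time is just a subtraction (a minimal sketch; the function name is mine, not anything from IEEE 1588):

#include <stdint.h>

/* PTP's timescale is TAI; currentUtcOffset (carried in Announce messages)
   is TAI minus UTC, currently 37 s, so recovering UTC is a subtraction. */
int64_t utc_from_ptp(int64_t tai_seconds, int32_t current_utc_offset) {
    return tai_seconds - current_utc_offset;
}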

                This “disseminate UTC only” ITU recommendation has been at fundamental odds with real-world PNT systems ever since GPS was known as “NAVSTAR GPS”.

                1. 1

                  Except GLONASS does disseminate UTC, including leap seconds.

                  If you want your time signal to be broadly useful for celestial navigation, UTC is the way to go as it’s (for now) guaranteed to be within 1s of UT1. I believe that’s where the ITU’s recommendation comes from. That said, it’s probably time for this usage application to take a step back compared to the broader issues caused by leap seconds.

                  1. 1

                    You aren’t really arguing against what I said, because my “falsehood” did not talk about disseminating TAI along with the UTC offset (that isn’t “just” TAI). Those paragraphs were really an introduction to the next section where I explain that systems can’t avoid working with UTC. And GPS and PTP do not avoid working with UTC: as you said, they tackle its awkwardness head-on.

                    The way GPS handles UTC is even more complicated, though. Take a look at the GPS interface specification, in particular IS-GPS-200 section 20.3.3.5.2.4 (sic!) where it specifies that UTC is not “just” GPS time plus the leap second offset: there are also parameters A_0 and A_1 that describe a more fine-grained rate and phase adjustment. Section 3.3.4 says more about how GPS time relates to UTC:

                    The OCS shall control the GPS time scale to be within one microsecond of UTC (modulo one second). The LNAV/CNAV data contains the requisite data for relating GPS time to UTC. The accuracy of this data during the transmission interval shall be such that it relates GPS time (maintained by the MCS of the CS) to UTC (USNO) within 20 nanoseconds (one sigma).

                    This is (basically) related to the fact that atomic clocks need to be adjusted to tick at the same rate as UTC owing to various effects, including special and general relativity, e.g. NIST and the GPS operations centre are a mile high in Colorado, so their clocks tick at a different rate to the USNO in Washington DC, and a different rate from the clocks in the satellites whizzing around in orbit.
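
                    To make the A_0/A_1 part concrete, the correction in 20.3.3.5.2.4 has roughly this shape (a sketch with my own variable names, not the spec’s):

/* dt_LS = leap-second count, A0 = phase term, A1 = rate term,
   t_ot = reference time of week, WN_t = reference week number,
   t_E = the user's estimate of GPS time, 604800 = seconds per week. */
double gps_utc_offset(double t_E, int WN, double dt_LS,
                      double A0, double A1, double t_ot, int WN_t) {
    return dt_LS + A0 + A1 * (t_E - t_ot + 604800.0 * (WN - WN_t));
}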

                    And you are right that UTC is a spectacularly poor design for a reference timescale, hence the effort to get rid of leap seconds.

                  1. 10

                    A related useful fact I’ve learned recently:

                    Conversion from a Unix timestamp (a number) to date/time in UTC (year-month-day h:m:s) is a pure function; it needn’t think about leap seconds.

                    As a corollary, software can use human-readable times in config without depending on OS timezone information.

                    https://blog.reverberate.org/2020/05/12/optimizing-date-algorithms.html
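
                    A minimal sketch of that pure function in C (following Howard Hinnant’s civil_from_days algorithm rather than the exact code in the linked post):

#include <stdint.h>
#include <stdio.h>

/* Pure conversion: no leap-second table, no timezone database. */
static void civil_from_unix(int64_t t, int *y, int *m, int *d,
                            int *hh, int *mm, int *ss) {
    int64_t days = t / 86400, secs = t % 86400;
    if (secs < 0) { secs += 86400; days -= 1; }
    *hh = (int)(secs / 3600); *mm = (int)(secs / 60 % 60); *ss = (int)(secs % 60);

    days += 719468;                 /* shift epoch from 1970-01-01 to 0000-03-01 */
    int64_t era = (days >= 0 ? days : days - 146096) / 146097;
    int64_t doe = days - era * 146097;                               /* [0, 146096] */
    int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
    int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100);           /* [0, 365] */
    int64_t mp  = (5 * doy + 2) / 153;                               /* [0, 11] */
    *d = (int)(doy - (153 * mp + 2) / 5 + 1);
    *m = (int)(mp < 10 ? mp + 3 : mp - 9);
    *y = (int)(yoe + era * 400 + (*m <= 2));
}

int main(void) {
    int y, m, d, hh, mm, ss;
    civil_from_unix(1000000000, &y, &m, &d, &hh, &mm, &ss);
    printf("%04d-%02d-%02d %02d:%02d:%02d\n", y, m, d, hh, mm, ss);  /* 2001-09-09 01:46:40 */
    return 0;
}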

                    1. 11

                      Which, incidentally, means that all the documentation which describes UNIX time as “the number of seconds since 01/01/1970 UTC” is wrong. Wikipedia, for example, says that “It measures time by the number of seconds that have elapsed since 00:00:00 UTC on 1 January 1970”, which is incorrect. (Though the POSIX spec it links to seems vague enough to at least not be incorrect; it says the “Seconds since epoch” is “A value that approximates the number of seconds that have elapsed since the Epoch”.)

                      I spent a long time trying to figure out the best way to correctly convert between an ISO-8601 timestamp and a UNIX timestamp based on the assumption that UNIX time counted actual seconds since 01/01/1970 UTC, before I found through experimentation that everything I had read and thought I knew about UNIX time was wrong.

                      1. 11

                        I would fix that Wikipedia article, but you (or the others in the discussion) seem to be better prepared to come up with correct wording, so I most humbly encourage someone to take a whack at it, in the spirit of encouraging people to get involved. Don’t worry, you won’t get reverted. (Probably. I’ll take a look if that happens.)

                        1. 8

                          Quoting from that article:

                          In Unix time, every day contains exactly 86400 seconds but leap seconds are accounted for. Each leap second uses the timestamp of a second that immediately precedes or follows it.

                          Well, that’s certainly one way to handle them…

                          1. 1

                            Yeah, exactly the same story here.

                          2. 3

                            My favourite versions of these functions are on my blog: broken-down date to day number and day number to broken-down date. Including the time as well as the date (in POSIX or NTP style) is comparatively trivial :-)

                          1. 29

                            Why Twitter didn’t go down … yet

                            I was hoping for some insights into the failure modes and timelines to expect from losing so many staff.

                            This thread https://twitter.com/atax1a/status/1594880931042824192 has some interesting peeks into some of the infrastructure underneath Mesos / Aurora.

                            1. 12

                              I also liked this thread a lot: https://twitter.com/mosquitocapital/status/1593541177965678592

                              And yesterday it was possible to post entire movies (in few-minute snippets) on Twitter, because the copyright enforcement systems were broken.

                              1. 5

                                That tweet got deleted. At this point it’s probably better to archive tweets and post links to the archives instead.

                                1. 11

                                  It wasn’t deleted - there’s an ongoing problem over the last few days where the first tweet of a thread doesn’t load on the thread view page. The original text of the linked tweet is this:

                                  I’ve seen a lot of people asking “why does everyone think Twitter is doomed?”

                                  As an SRE and sysadmin with 10+ years of industry experience, I wanted to write up a few scenarios that are real threats to the integrity of the bird site over the coming weeks.

                                  1. 12

                                    It wasn’t deleted - there’s an ongoing problem over the last few days where the first tweet of a thread doesn’t load on the thread view page.

                                    It’s been a problem over the last few weeks at least. Just refresh the page a few times and you should eventually see the tweet. Rather than the whole site going down at once, I expect these kinds of weird problems will start to appear and degrade Twitter slowly over time. Major props to their former infrastructure engineers/SREs for making the site resilient to the layoffs/firings though!

                                    1. 2

                                      Not only to the infra/SREs but also to the backend engineers. Much of the built-in fault-tolerance of the stack was created by them.

                                  2. 2

                                    https://threadreaderapp.com/thread/1593541177965678592.html

                                    I have this URL archived too, but it seems to still be working.

                                    1. 1

                                      hm, most likely someone would have a mastodon bridge following these accounts RT-ing :-)

                                    2. 2

                                      FWIW, I just tried to get my Twitter archive downloaded and I never received an SMS from the SMS verifier. I switched to verify by email and it went instantly. I also still haven’t received the archive itself. God knows how long that queue is…

                                      1. 2

                                        I think it took about 2 or 3 days for my archive to arrive last week.

                                    3. 2

                                      oh, so they still run mesos? thought everyone had by now switched to k8s…

                                      1. 13

                                        I used to help run a fairly decent sized Mesos cluster – I think at our pre-AWS peak we were around 90-130 physical nodes.

                                        It was great! It was the definition of infrastructure that “just ticked along”. So it got neglected, and people forgot about how to properly manage it. It just kept on keeping on with minimal to almost no oversight for many months while we got distracted with “business priorities”, and we all kinda forgot it was a thing.

                                        Then one day one of our aggregator switches flaked out and all of a sudden our nice cluster ended up partitioned … two, or three ways? It’s been years, so the details are fuzzy, but I do remember

                                        • some stuff that was running still ran – but if you had dependencies on the other end of the partition there were lots of systems failing health checks & trying to get replacements to spin up
                                        • Zookeeper couldn’t establish a quorum and refused to elect a new leader, so the Mesos master went unavailable, meaning you didn’t get to schedule new jobs
                                        • a whole bunch of business critical batch processes wouldn’t start
                                        • we all ran around like madmen trying to figure out who knew enough about this cluster to fix it

                                        It was a very painful lesson. As someone on one of these twitter threads posted, “asking ‘why hasn’t Twitter gone down yet?’ is like shooting the pilot and then saying they weren’t needed because the plane hasn’t crashed yet”.

                                        1. 8

                                          Twitter is well beyond the scale where k8s is a plausible option.

                                          1. 2

                                            I wonder what is the largest company that primarily runs on k8s. The biggest I can think of is Target.

                                            1. 3

                                              There’s no limit to the size of company that can run on kube if you can run things across multiple clusters. The problem comes if you routinely have clusters get big rather than staying small.

                                              1. 1

                                                Alibaba, probably.

                                                1. 1

                                                  Oh, I didn’t realize that was their main platform.

                                                2. 1
                                                  1. 2

                                                    I was thinking about that too, but I’m guessing that CFA has a fraction of the traffic of Target (especially this time of year). Love those sandwiches though…

                                              2. 2

                                                Had they done so, I bet they’d already be down :D

                                                1. 1

                                                  I work at a shop with about 1k containers being managed by mesos and it is a breath of fresh air after having been forced to use k8s. There is so much less cognitive overhead to diagnosing operational issues. That said, I think any mesos ecosystem will be only as good as the tooling written around it. Setting up load balancing, for instance . . . just as easy to get wrong as right.

                                              1. 2

                                                Nice article. The first link (“I rewrote BIND’s DNS name compression algorithm”) does not work for me. It goes to https://dotat.at/@/2022-07-01-dns-compress.md

                                                1. 2

                                                  oops! now fixed, thanks for spotting the mistake

                                                1. 8

                                                  If curl ever does decide to go C99+, VLA usage should be banned, IMHO. That’s pretty easy to enforce.

                                                  They were required in C99, then became optional in C11+, and I have yet to hear of any decent justification for using them in clean/secure code. In my mind, they’re better abandoned as a past misfeature.

                                                  Either you’re letting the variable sizing part get big enough at runtime that it could exhaust stack space (ouch!), or you’re sure the variable size is constrained to a stack-reasonable limit, in which case you’re better off just using a fixed stack array len at that limit value (it’s just an SP bump after all, there’s not much real “cost” to “allocating” more). There are some edge case arguments you could make about optimizing for data cache locality for variable and commonly-small stack arrays that are occasionally larger (but still reasonable and limited), but it doesn’t seem worth the risks of allowing them in general.
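
                                                  A sketch of the two alternatives, where len is a hypothetical input that may not be trustworthy:

#include <stddef.h>
#include <string.h>

void vla_version(const char *src, size_t len) {
    char buf[len];                  /* stack growth is bounded only by the caller */
    memcpy(buf, src, len);
    /* ... */
}

#define BUF_MAX 4096
void capped_version(const char *src, size_t len) {
    if (len > BUF_MAX)
        return;                     /* or fall back to malloc() */
    char buf[BUF_MAX];              /* constant, predictable stack usage */
    memcpy(buf, src, len);
    /* ... */
}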

                                                  1. 6

                                                    VLAs are worse than that: the compiler has to add code to touch each page that might be needed for the VLA so that the kernel isn’t surprised by attempts to access unmapped pages a long way from the current stack allocation.

                                                    One of the first things I did after starting my current job was to evict a rogue VLA and make sure no more of them appeared. https://gitlab.isc.org/isc-projects/bind9/-/issues/3201

                                                    1. 1

                                                      Do you mean flexible array members with VLAs? Because that’s something I’m currently trying to find out more about. I find them very useful to avoid indirection and easily deep-copy things in a GC.

                                                      1. 3

                                                        I wouldn’t think so. They are unrelated. Yes, flexible array members have their good uses (to group allocations), whereas VLAs are mostly a trap. You can totally disable VLAs (-Werror=vla) and keep flexing the last struct member.

                                                        VLAs are about dynamic memory allocation on the stack, whereas flexible array members are about using dynamic memory for dynamically sized structs. So they could be used together, but that doesn’t mean you should.
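
                                                        A minimal sketch of the grouped-allocation use (a made-up struct, just to illustrate):

#include <stdlib.h>
#include <string.h>

struct packet {
    size_t len;
    unsigned char data[];           /* flexible array member (C99) */
};

/* Header and payload share one allocation, so a deep copy is a single
   malloc() + memcpy() of sizeof *p + len bytes, with no extra indirection. */
struct packet *packet_new(const unsigned char *src, size_t len) {
    struct packet *p = malloc(sizeof *p + len);
    if (p == NULL)
        return NULL;
    p->len = len;
    memcpy(p->data, src, len);
    return p;
}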

                                                        Nearly all VLAs I’ve come across have been accidental. In particular, the constant-sized VLA trap:

                                                         // VLA in disguise because BUFSIZE is not a constant expression:
                                                        const size_t BUFSIZE = 42;
                                                        char buf[BUFSIZE];
                                                        
                                                        1. 1

                                                          I meant the whole general concept of VLAs, at least any use-case I’ve seen of them. Could you give an example of what you mean? I have a hard time wrapping my head around how flexible array members would fit together with VLAs.

                                                      1. 58

                                                        .sh domains considered harmful… for the exact same reasons. It’s also owned by ICB.

                                                        1. 21

                                                          The issue is that the Chagos islanders were evicted by the British so that Diego Garcia could be handed over to the USA for a military base. The Chagossians were not compensated in any way. Using a .io domain name (unwittingly) shows disregard for colonial land theft and ethnic cleansing.

                                                          Whereas Saint Helena and Ascension Island (.ac, ICB again) were uninhabited before European colonization, so their domain names are not associated with that kind of appalling evil. There are legit reasons for disliking ICB’s involvement and the misappropriation of the domain registration money, enough reasons to boycott them, but it is in no way exactly the same as what happened to the Chagossians.

                                                          1. 28

                                                            Does anything get better for Chagos Islanders by not using .io domains? If not, what’s the difference between this and all the other horrid shit which goes into the products we all use every day?

                                                            1. 4

                                                              Probably not. But we shouldn’t let the existence of great wrongs overwhelm our ability to address lesser wrongs. That’s just paralysis, or compassion fatigue.

                                                            2. 3

                                                              Ah, my bad. I can’t edit the original comment… stop upvoting me, this is off-topic anyway! :P

                                                              1. 3

                                                                Whereas Saint Helena and Ascension Island were uninhabited before European colonization […]

                                                                AIUI Chagos Islands were also uninhabited before European (French) colonisation.

                                                                1. 1

                                                                  I treat the use of a .io domain as a medium-scale negative signal. It doesn’t mean the people involved are definitely racists, but it indicates that they don’t care enough to do a little research. There are plenty of gTLDs to choose from.

                                                              1. 2

                                                                A nitpick - loop buffers (not “loop caches”) are common, and not just at Intel or specific to x86. In context, the author appears to mean uop caches, which Intel and AMD have both used for some years. (ARM now does too - the Cortex A77 can retrieve six uops from the uop cache on hit, for instance, while it can only decode four ops.)

                                                                Generally good article, though “instruction set doesn’t matter technically” is perhaps an overly broad thesis. “Instruction set doesn’t matter to final performance as long as it’s not egregiously mis-designed” is perhaps more accurate - a modern high-performance implementation of VAX would still hurt. Something that is, IMO, important to note - and not really touched on in the article - is that ISA complexity has a direct impact on design costs (a little) and design validation costs (a fair bit.)

                                                                1. 17

                                                                  Generally good article, though “instruction set doesn’t matter technically” is perhaps an overly broad thesis.

                                                                  There was an ISCA paper making this claim a few years ago (with a deeply questionable methodology) and it’s been used to excuse a lot of RISC-V design decisions. There are some design decisions that have quite a lot of impact on performance. Our internal evaluation suggests that, with the same investment in microarchitecture and within the same power budget, the difference between a mediocre ISA and a good one is over 20% end-to-end system performance. A couple of examples with big knock-on effects:

                                                                  If you don’t have conditional move in your ISA then the compiler can’t do if-conversion at all. This means that you have a lot more branches and so you need a lot more branch predictor state to get the same performance. Depending on the exact branch predictor and the pipeline design, the extra state is 25-75%, which translates directly to power and area overhead in your branch predictor. The overall impact of this depends on how big your branch predictor is relative to your overall core.
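
                                                                  A toy example of the kind of code this applies to; with a conditional move the compiler can emit it branch-free, without one it has to spend branch predictor state on it:

/* With cmov/csel this becomes straight-line code and never
   touches the branch predictor; without one it needs a branch. */
int clamp(int x, int limit) {
    return x > limit ? limit : x;
}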

                                                                  Addressing modes have a big impact on modern systems with register rename. If you don’t have rich addressing modes then you need an intermediate register to hold the value of address computation. That’s mildly painful because you need to assign it in the register rename unit, but it’s not that bad. The problem is if you don’t then immediately clobber that register in the same basic block. Now the CPU has no way of knowing whether this value is going to be used (you may clobber it at the start of the next basic block, but you’re executing that in speculation and you might be executing complete nonsense instructions that need to be rolled back) and so it has to keep that value live. This leads to increased register pressure, which causes back-pressure in the entire system.

                                                                  Aside from the tone and the fact that it uses terms of art in non-standard ways, I mostly like the article. The impact of compilers was really important and the shift from ISA-as-programming-environment to ISA-as-compiler-target can’t be overstated. Similarly, the knock-on effect of a lot of the tradeoffs is important.

                                                                  The article conflates OoO (executes instructions in dependency order, rather than in program order) with superscalar (executes multiple instructions in different pipelines in parallel), which is understandable but undermines some of the argument. For example, load-store architectures are a much better fit for superscalar architectures because you want separate load-store pipelines and you want to minimise the book keeping that you need to reassemble the architectural state. If you crack a register-memory op into a load, op, store sequence (or just a load-op sequence if the target is a register) then you need to track a lot more state that’s shared between pipelines, which impacts power.

                                                                  There are a few things that I disagree with:

                                                                  Instruction-set decoding requiring 20k transistors is important with small budgets, but meaningless with large budgets.

                                                                  This is true only if you ignore power consumption.

                                                                  Moore’s law didn’t end, but Dennard Scaling did. 20K transistors that you can power up for accelerating a specific phase of computation and turn off at other times are now basically free (ignoring validation, which is the most expensive part of bringing a new core to market). 20K transistors that you have to power all of the time is a huge drain and will impact both the maximum performance and the maximum power efficiency of your microarchitecture. This is a big part of the reason that Intel has struggled to scale x86 down to mobile devices. An Arm core with a 250mW power budget has a lot left over for compute, an x86 CPU burns a large fraction of that on the decoder. For any battery-powered workload where you’re doing a small amount of compute (just enough to prevent the core from sleeping), a design like x86 really struggles. If you’re very lucky, an x86 core will execute purely from a micro-op cache for a long time, but in general this isn’t long enough to usefully power-gate the real decoder.

                                                                  More importantly, a large micro-op cache consumes SRAM, offsetting one of the earlier advantages of CISC. When CPU speeds started to be noticeably much faster than DRAM, x86 looked pretty good as a compression scheme. An 8 KiB instruction cache full of x86 instructions held a lot more of a program than an 8 KiB instruction cache full of most RISC instructions (though not Arm, especially not Arm with Thumb-2). As with decoders, these close caches need to be powered all of the time and so hurt power consumption if they need to be big (Itanium needed really huge ones, to the extent that it couldn’t even fit within a desktop power budget very easily).

                                                                  This, the relaxed memory model, good overall code density (more for M32 than A64), and so on are all reasons why it’s easier to scale an Arm core down to a lower power budget even with a high transistor budget than an x86 chip. Power budget matters in almost all market segments, but the impacts are different depending on the absolute numbers. If your power budget is measured in mW (or uW) then you want to throw away absolutely anything that isn’t essential to computation. If it’s measured in multiple W per core, then you want to make sure that you’re able to feed instructions and data into multiple pipelines without wasting too much on book keeping (decoder matters, register rename overhead matters far more).

                                                                  But more to the point, it performs almost identically running x86 code – it runs x86 code at native x86 speeds but on an ARM CPU. The processors are so similar architecturally that instruction-sets could be converted on the fly – it simply reads the x86 program, converts to ARM transparently on the fly, then runs the ARM version

                                                                  This is glossing over a lot of cleverness. The M1 has a flag that switches it into TSO mode, which incurs a performance (and power) penalty over the relaxed memory model (things have to stay in store queues longer), but is much more efficient than having an emulator insert fences all over the place. I suspect that Apple will drop x86 emulation support at some point and the M5 or whatever will lose this mode.

                                                                  I also have no idea where the 100,000 transistors number that the article repeats comes from. ARM1 was around 30K transistors (Cortex-M0 is a similar size), our CHERI microcontroller is quite a bit under that.

                                                                  1. 7

                                                                    I think Robert Graham overstates his argument, and as you (David Chisnall) say there are reasons that RISC still matters.

                                                                    My favourite reference on this topic is John Mashey’s RISC-vs-CISC usenet comp.arch post, https://www.yarchive.net/comp/risc_definition.html Mashey’s post dates from the mid-1990s which Graham correctly identifies as a turning point, although I think Graham’s reasons are off the mark.

                                                                    Mashey analyses instruction sets numerically and shows that there is a fairly clear dividing line between RISC and CISC ISA designs. But a striking outcome of the analysis is that the two architectures that survived and thrived are the least RISCy RISC (ARM) and the second-least CISCy CISC (x86). (The least CISCy CISC is IBM z / 360 / 370 which is also doing much better than other old CISC designs.)

                                                                    Instead of “the end of RISC” a more accurate subtitle would have been “the end of CISC”. There were a lot of non-technical reasons for the death of many instruction set architectures, like consolidation in the unix market and the difficulty of making enough money to compete with Intel. In particular, the 68060 was very challenging to make superscalar because of its almost VAX-like complexity. For instance, 68k has memory-to-memory ALU ops with complex addressing modes, whereas x86 mostly limits the programmer to reg-to-mem or mem-to-reg, with relatively simple addressing modes. When 68k was retargeted at the embedded market as ColdFire, it was stripped back to a simpler kind of CISC. And when amd64 was designed, it lost some 80286-era CISCy baggage and became even more RISCy.

                                                                    I suppose my complaint is that Graham characterises all CISCs as VAX-like and all RISCs as MIPS-like. He implies it’s as easy to make a superscalar VAX or 68k as it was for x86. On the RISC side, he says they were all designed as compiler targets, but ARM was designed by assembly language programmers to be nice to program in assembly; and the Archimedes was not a great platform for Unix because of its weird MMU. So the most successful RISC doesn’t have a Unix heritage despite Graham arguing that RISC couldn’t happen without C and Unix.

                                                                    And after the Itanium expensively demonstrated that VLIW, static scheduling, and magical compilers are not the future of instruction set design (killing at least two RISCs in the process), the reaction from instruction set designers was to go back to RISC principles because they work well for superscalar processors as well as for 1980s transistor budgets.

                                                                    1. 8

                                                                      It’s possibly worth explaining why mem-mem ops are painful in superscalar (and, especially, out-of-order) designs. It basically boils down to alias analysis. The really nice thing about registers is that there’s no possible indirection that reaches a register. If you know two instructions refer only to registers and you know that those register numbers are disjoint, you know that they are independent. This is a big part of the reason that FPU designs that emerged after FPUs became on-die components retained separate register files: you could schedule any of them except the memory ones completely independently of integer ops. With a memory operand, this all becomes a lot more complex. If one op writes to memory and another reads then you can’t tell if they’re independent until you’ve finished computing their addresses. The store forwarding logic that you need for good performance gets a lot more complex the more you have loads that feed into other instructions on a critical path. With a load-store architecture and 16 or more registers, it’s easy for a compiler to keep live values in registers, which reduces silicon overhead (I still believe that a SPARC-like ring of register windows with asynchronous write-back and reload requiring an explicit serialisation instruction if you ever wanted to guarantee ordering between the spill stack and normal loads and stores would be even lower overhead than modern RISC designs, for precisely this reason: stack spills and reloads consume a lot of store queue space).
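
                                                                      A small C illustration of the difference (hypothetical functions):

/* With registers, independence is visible from the register numbers alone;
   with memory operands the CPU can't know whether the two ops are independent
   until both addresses have been computed (p and q might alias). */
void reg_ops(int a, int b, int out[2]) {
    int x = a + 1;                  /* clearly independent of ... */
    int y = b + 1;                  /* ... this */
    out[0] = x;
    out[1] = y;
}

void mem_ops(int *p, int *q) {
    *p += 1;                        /* a store ... */
    *q += 1;                        /* ... whose result the next load needs if p == q */
}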

                                                                      The core lesson for any ISA that wants to scale to modern designs is simple: it’s all about the metadata. The biggest area and power overheads come from tracking state. If you require too much hidden state then you will have a lot of non-compute area. If you require too much explicit book keeping, you will have area dedicated to compute that is used to execute instructions that are not critical to the workload. Striking the right balance is very hard.

                                                                    2. 1

                                                                      Wow thx.

                                                                      TIL Apple M1 has a TSO mode. That is a brilliant solution.

                                                                  1. 15

                                                                    I have this weird problem where I’m simultaneously excited and compelled by rust (open! memory safe! community! cute crab!), and disgusted by it (magic operators? borrow checker? error boilerplate?)

                                                                    can anyone recommend a solid, simple place to start? every time I read Rust I feel like I’m looking at a Rails codebase.

                                                                    maybe I have rust fomo or something

                                                                    1. 18

                                                                      I just went through the Programming Rust book (O’Reilly) and thought it was really well done. I would recommend reading it before the “official” Book.

                                                                      I was familiar with the type system features (generics, traits, etc.) — I’m one of those people who says “oh, it’s sort of a monad” — so it was mostly the lifetime stuff I had to contend with.

                                                                      I would describe the barrier by saying that in C++ you can make any data structure you want and it’s up to you to ensure you follow your own rules on things like ownership, immutability, passing between threads, etc. Whereas in Rust, you tell the compiler what you’re going to do, it checks whether that makes sense, and then insists that you actually do it. But compilers are dumber than people, so you can come up with a data structure that you “know” is fine, yet you cannot explain to the compiler so it can prove that it’s fine. You have to work within the model of “move by default” and unique/shared references that is quite unlike the C++ assumptions.

                                                                      Your choice at that point is (1) just relax, clone it or use a reference count or whatever, those wasted cycles probably don’t matter; (2) use a different data structure, and remember there are lots of good ones waiting for you in the library; (3) go ahead and do what you want using a minimal amount of unsafe code and a thorough comment about the safety guarantees you put in but couldn’t express to the compiler. (Number 3 is what the library code does that you should be using.)

                                                                      Once this clicked, I started to really enjoy the language while working through Advent of Code and various toy problems.

                                                                      1. 11

                                                                        Admittedly, beginner and intermediate resources are a bit of an ongoing problem that the Rust community is working to solve, which this article is meant to be a small part of. Apart from the book, which is great but can be a bit overwhelming at first, a good place to start is Rust By Example, which is a tad more practically oriented.

                                                                        1. 7

                                                                          Seconding, thirding, and fourthing the Programming Rust recommendation, notably the 2nd edition. I struggled with other attempts at learning Rust before (incl. The Rust Book), and the O’Reilly one was a breakthrough for me, suddenly I seemed to not get blocked by the language anymore afterward. (I still get slowed down from time to time, but I seem able to find a more or less pretty/ugly workaround every time, and thus I now feel I am in the intermediate phase of honing and perfecting my skills.)

                                                                          As for your “excited and compelled” note: my areas were a bit different, but what made me finally accept Rust’s downsides, was when I eventually realized that Rust manages to be very good at marrying two things, that IMO were previously commonly seen as opposites: performance + safety. It’s not always 100% perfect at it, but really close to that. And where it is not, it gives you a choice in unsafe, and notably that’s where you can see some sparks flowing in the community. (That’s because two camps of people previously quite far away from each other, are now able to successfully live together as a Happy Family™ — just having occasional Healthy Family Quarrels™.)

                                                                          As for error handling, as discussed in the OP article, personally I’m quite confused by two things:

                                                                          1. Why not use the thiserror & anyhow libs? I believe reaching for them should be the initial instinct, only possibly hand-unrolling the error types if and only if run-/compile-time performance is found to be impacted by the libs too much (iff proven through profiling). Notably, thiserror & anyhow are described in the Programming Rust, 2nd ed. book, which was one of the reasons I evaluated the book as worth reading. And the creation of those two libraries was a notable argument for me to try learning Rust again, precisely due to how they kill error boilerplate.
                                                                          2. Why does the article recommend and_then over the ? operator? Wasn’t ? introduced specifically to combat and streamline callback-style code required by and_then?
                                                                          1. 1
                                                                            1. I briefly pointed out the existence of crates that can reduce the boilerplate in the article, though I didn’t name any, because I decided it was beyond my ideal scope. I didn’t want to write an entire expose on Rust error handling, because many others had done that better than I ever could, but rather a short and accessible guide on combinators, which are often neglected by people just getting comfortable with Rust.

                                                                            2. I never recommended and_then over ?; they’re not mutually exclusive. Sometimes, coming up with names for every individual step’s outputs can be a bit difficult, or a bunch of steps are very tightly logically related, or you simply want to avoid cluttering the code with too many ? operators, which can be a bit hard to notice sometimes, especially in the middle of a line as opposed to at the end. I perhaps should have worded myself a bit better, but overall I think and_then simply serves as a good motivator for a reader to go look at the rest of the more specific combinators in the Result type, and I never implied it would at all replace ? in most Rust code.

                                                                          2. 4

                                                                            If you’re used to TDD, rustlings can be a great resource for getting into the swing of things with Rust: https://github.com/rust-lang/rustlings

                                                                            It walks you through the same things as the The Rust Programming Language, but programmatically. I tend to learn better by doing, so it works for me. But the “puzzles” can be cryptic if you don’t use the hints.

                                                                            1. 3

                                                                              can anyone recommend a solid, simple place to start?

                                                                              I started on Rust doing last year’s Advent of Code. It was tremendously fun, and TBH a much softer learning curve than i imagined. It helps that the AoC challenges can usually be solved by very straightforward algorithms of parse input -> process data -> output, so i didn’t really have to “fight the borrow checker” or even think about lifetimes. And also Rust standard library and docs are excellent.

                                                                              I think only on one of the problems i got a bit stuck by Rust’s semantics and i had to learn a little bit about lifetimes. I tried modelling that problem in a more OO way, in the usual “graph of objects” way. Turns out, that’s usually not a very happy path on Rust, as you suddenly have to consider lifetimes seriously and the “who owns who” problem. Still, nothing that a coat of Rc (reference-counted pointers) on all those objects could not fix. And the problem could also be solved using a built-in data structure and some indexing, instead of having objects that point to each other.

                                                                              Anyways, i digress, my point being that i found Advent of Code an entertaining and effective way of approaching the language for the first time :)

                                                                              1. 1

                                                                                Rust has good PR, but when it comes down to brass tacks, most of the truly innovative stuff is still written in C. Maybe Rust will eventually dominate over C. But it’s been 8 years already. It’s also possible Rust will end up like C++ or Java. Only time will tell. I’m in no rush to learn the language.

                                                                                1. 3

                                                                                  Replacing C in our current infrastructure is a totally impossible task, especially in 8 years. I don’t think Rust needs to replace C to be meaningful and useful apart from any hype.

                                                                                  1. 1

                                                                                    I think the problem with anything replacing C is that the people left writing C are those that have convinced themselves it’s a good idea.

                                                                                    1. 3

                                                                                      I am employed to write C. I would prefer to be writing Rust, but BIND is critical infrastructure that needs to be maintained. I think it’s more true that C and C++ will continue to be used because of inertia, in terms of programmers’ skills and existing code. But you are right that this inertia will often get rationalised as a good idea, rather than due to the more realistic expense of change.

                                                                                      1. 1

                                                                                        Oh yeah, you’re totally right about inertia especially when it comes to large existing projects.

                                                                                1. 1

                                                                                  Company: Internet Systems Consortium

                                                                                  Company site: https://www.isc.org

                                                                                  Position(s): technical support engineer

                                                                                  Location: remote (preferably US business hours)

                                                                                  Description: Do you have some experience in successfully running critical network infrastructure and want to help others do the same? ISC is looking for another Technical Support Engineer to help sysadmins running our BIND9 and Kea DHCP software.

                                                                                  Tech stack: RT for customer support tickets, Mattermost for chat, Zimbra for mail and calendars, GitLab for software engineering. Kea DHCP is C++, BIND9 is C11.

                                                                                  Compensation: TBD

                                                                                  Contact: see https://www.isc.org/careers/ (please mention I referred you!)

                                                                                  1. 4

I’d like to see offset integers, which sort naturally in their byte representation. So 00000000 is -128 and it goes up sequentially from there (01111111 is -1, 10000000 is 0, 10000001 is 1). Converting from two’s complement just involves flipping the first (sign) bit.

                                                                                    You can attempt similar transformations on (non-subnormal) floats. I have played with an Avro-esque data format where you can compare (eg sort into order) data directly via memcmp without decoding or understanding the semantics of the data format. I still haven’t decided whether this is an awesome or terrible idea…
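(A minimal Rust sketch of the idea, an assumed illustration rather than the format itself: integers get their sign bit flipped, floats get the sign bit flipped when positive and all bits flipped when negative, and the resulting big-endian bytes then compare in numeric order.)

```rust
/// i32 -> 4 bytes whose lexicographic (memcmp-style) order equals signed numeric order:
/// big-endian bytes with the sign bit flipped, i.e. an offset/excess encoding.
fn encode_i32(x: i32) -> [u8; 4] {
    ((x as u32) ^ 0x8000_0000).to_be_bytes()
}

/// f64 -> 8 bytes with the same property, for non-NaN values:
/// positive values get the sign bit flipped, negative values get all bits flipped.
fn encode_f64(x: f64) -> [u8; 8] {
    let bits = x.to_bits();
    let key = if bits & (1 << 63) == 0 { bits ^ (1 << 63) } else { !bits };
    key.to_be_bytes()
}

fn main() {
    // Sorting the encoded keys byte-wise gives the same order as sorting the numbers.
    let mut ints = vec![3_i32, -128, -1, 0, 1, i32::MIN, i32::MAX];
    let mut keys: Vec<[u8; 4]> = ints.iter().map(|&x| encode_i32(x)).collect();
    keys.sort(); // byte-wise comparison
    ints.sort(); // numeric comparison
    assert_eq!(keys, ints.iter().map(|&x| encode_i32(x)).collect::<Vec<_>>());

    // Same check for floats: numerically sorted values yield byte-sorted keys.
    let mut floats = vec![1.5_f64, -0.0, 0.0, -2.25, f64::INFINITY, f64::NEG_INFINITY];
    floats.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let fkeys: Vec<[u8; 8]> = floats.iter().map(|&x| encode_f64(x)).collect();
    assert!(fkeys.windows(2).all(|w| w[0] <= w[1]));
    println!("ok");
}
```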

                                                                                    1. 1

                                                                                      Data structures for ordered lookups can be a lot more efficient when they are specialized to dumb lexicographic order instead of some arbitrary ordering function. Radix trees and my qp-trie depend on lexicographic ordering to give meaningfully ordered lookups.

                                                                                    1. 2

The article talks about how they have less than 1U (1.75”) of space between trays, which have a lip on the bottom, so a particular piece of hardware he wants won’t fit. Is there a reason they can’t just not install the trays in every mounting hole on the rack? Just move a shelf up one mounting hole. This seems like an artificial constraint.

                                                                                      1. 3

They didn’t have any more holes at the top of the rack. And the way these shelves work, it would have been wonky without both holes screwed in (they are two-post shelves, so they need four screws on the front side to really stand up).

The reason you don’t do this in real racks is that airflow always goes from the cold aisle to the hot aisle, and you don’t want to let hot air spill back through the front. But they don’t seem to have plugged many of the other holes to keep airflow unidirectional.

                                                                                        Though I agree, it does seem to be more aesthetics than functionality in this case.

                                                                                        1. 1

                                                                                          Sadly the holes on the vertical mounting posts are not evenly spaced: the gap between the hole at the top of one U and the bottom of the next U is less than the gap between the centre hole in the U and its top and bottom holes. You can see this if you look closely at the diagrams in TFA, and the photo that shows one of the rear posts.

                                                                                        1. 3

This sounds like a thing which might be more convenient with some tooling support. Like, you have a (partial) ordering over all .h files, the IDE knows about it, and if you type auto foo = std::make_unique<Bar>() then the IDE automatically inserts #include directives for <bar.h> and also all the headers that <bar.h> depends on, in the right order so that everything works out.

                                                                                          …at which point you’ve invented like half of a proper import system, but oh well.

                                                                                          1. 4

                                                                                            …at which point you’ve invented like half of a proper import system, but oh well.

Maybe? A proper import system is, I think, an unsolved problem for languages with C++/Rust-style monomorphisation. Semantics-wise, Rust’s crate/module system is great (that’s my favorite feature apart from unsafe). But in terms of physical architecture (what the article talks about) it’s not so great.

                                                                                            • There’s nothing analogous to pimpl/forward declaration, which significantly hamstrings separate compilation. C++ is better here.
• Although parsing and typechecking of generic code happens once, monomorphisation is repeated for every compilation unit that instantiates it, which bloats compile time and binary size in a big way.
                                                                                            1. 1

                                                                                              analogous to pimpl/forward declaration

Box<>’d opaque types? I’ve seen multiple blog posts mentioning using this for mitigating dependency chains.

                                                                                              Although parsing and…

I miss the SPECIALIZE pragma from GHC Haskell. Your generic functions get a slow, fully polymorphic version generated (with an implicitly passed dictionary object holding typeclass method pointers), and then you can easily write out a list of SPECIALIZE pragmas to generate monomorphic copies for the specific types whose performance you really care about.

It feels like it ought to be possible, in principle, to deduplicate monomorphisations happening in different compilation units with a mutex and a big hash table.

                                                                                              1. 1

Box<>’d opaque types? I’ve seen multiple blog posts mentioning using this for mitigating dependency chains.

I don’t believe there’s a functional analogue to pimpl in Rust, but I’d need to see a specific example to argue why it isn’t one.

What you could do in Rust is introduce dynamic dispatch, but it has significantly different semantics, is rather heavyweight syntactically (it requires introducing single-implementation interfaces and a separate crate), and only marginally improves compilation time (the CU which “ties the knot” still needs to be recompiled, and you generally want to tie the knot for tests).
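(For concreteness, a rough hypothetical sketch of that workaround, with modules standing in for separate crates so it stays self-contained: downstream code depends only on a trait object, and only the knot-tying unit sees the concrete implementation.)

```rust
mod api {
    // The single-implementation interface that downstream code depends on,
    // instead of depending on the concrete implementation directly.
    pub trait Storage {
        fn get(&self, key: &str) -> Option<String>;
    }
}

mod storage_impl {
    // The concrete implementation; only the unit that "ties the knot" needs it.
    use crate::api::Storage;
    use std::collections::HashMap;

    pub struct InMemory {
        data: HashMap<String, String>,
    }

    impl InMemory {
        pub fn new() -> Self {
            let mut data = HashMap::new();
            data.insert("hello".to_string(), "world".to_string());
            InMemory { data }
        }
    }

    impl Storage for InMemory {
        fn get(&self, key: &str) -> Option<String> {
            self.data.get(key).cloned()
        }
    }
}

mod app {
    // Downstream code only sees `dyn Storage`, so (when these are real crates)
    // changes to the implementation don't force it to be recompiled.
    use crate::api::Storage;

    pub fn run(store: Box<dyn Storage>) {
        println!("{:?}", store.get("hello"));
    }
}

fn main() {
    // main() is the knot-tying compilation unit: it still sees both sides.
    app::run(Box::new(storage_impl::InMemory::new()));
}
```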

                                                                                            2. 2

                                                                                              Tooling increasingly supports modules, which require you to do the opposite thing: have a single header for each library, parse it once, serialise the AST, and lazily load the small subset that you need. This composes with additional tooling such as Sony’s ‘compilation database’ work that caches template instantiations and even IR for individual snippets.

                                                                                              The approach advocated in this article imposes a much larger burden on the programmer and makes it very hard for tooling to improve the situation.

                                                                                              1. 2

                                                                                                This reminds me a lot of Robert Dewar’s paper on the GNAT compilation model, https://dl.acm.org/doi/abs/10.1145/197694.197708

He ditched the traditional Ada library database, and instead implemented Ada’s with clauses (its dependency declarations) in a similar manner to C’s #include, which made the compiler both simpler and faster.

                                                                                                1. 1

                                                                                                  Interesting, thanks. I am vastly out of touch with what’s happened in C++ since 1998.

                                                                                                  1. 1

                                                                                                    In 2004 the approach advocated in the article paid off. And the larger burden was not quite enough of an ongoing thing to really hurt.

Modules would be much nicer if the ecosystem support were there. (I’m kind of thankful not to need to know whether it is… I spend a lot less time with my C++ tooling in 2022 than I did in 2004.)

                                                                                                    And this:

                                                                                                    additional tooling such as Sony’s ‘compilation database’ work that caches template instantiations

                                                                                                    sounds like the stuff dreams are made of.