Threads for Verdagon

  1. 1

    Generics are a Nightmare. Generics are much harder to implement than templates.

    I got curious about the implementation… I’ve been reading through the Vale blog posts for a while but it’s gonna take me a while to digest some of this!

    struct Tup<T1, T2> {
      0 T1;
      1 T2;
    func drop<T1, T2>(tup Tup<T1, T2>)
    where func drop(T1)void, func drop(T2)void {
      [a, b] = tup;

    It looks like generic functions can query the existence of function overloads? I think it’s like an anonymous trait impl lookup?

    The implementation of Option (Opt) is also interesting

    abstract func isEmpty<T>(virtual opt Opt<T>) bool
    where func drop(T)void;
    func isEmpty<T>(opt None<T>) bool
    where func drop(T)void
    { return true; }
    func isEmpty<T>(opt Some<T>) bool
    where func drop(T)void
    { return false; }
    1. 2

      Your understanding is correct! These generics don’t require traits, they only require that a function exists for a certain type. It’s a nice way to decouple functions from data. They’re explained more here:

      Also, that Tup is actually a casualty of the great generics endeavor. Before generics, it used to work like in and handle any number of elements. It’ll be fun to get that working in a post-generics world.

      #!DeriveStructDrop means “don’t automatically generate a drop function for this, I’ll be doing it myself”

      Some planned syntactic improvements that might be relevant:

      • where ~T instead of where func drop(T)void
      • Default to #!DeriveStructDrop, so the automatic drop function is opt-in instead of opt-out.
      • Have some sort of block to surround e.g. those isEmpty funcs, so that we can write the common where clause once.

      Happy to clarify anything, and feel free to drop by the discord too! Lots of folks toss around questions in there.

    1. 19

      Well, at least it’s short.

      The Company needs to be clear on what IP it owns and has rights to. Its customers, employees, and investors depend on the Company having the legal rights to the products and services it is providing so that the Company can continue operating and doing business.

      I think that this is a smokescreen. Investors want IP agreements so that they can claim that the company is duly diligent in sourcing its software, regardless of how undocumented the internal software supply chain actually is.

      The Company also believes that it’s important to be clear on what it doesn’t own. The Company doesn’t want you looking over your shoulder every time you work on something personal or worrying that the Company will someday seize your open source non-lethal mousetrap simulation software. In other words, the Company isn’t interested in appropriating your personal projects.

      I wish I could believe this! Multiple prior employers have tried to seize my personal projects.

      Due to issues of scale, fairness, and consistency, the Company cannot, by and large, negotiate its terms.

      This is sneerworthy.

      In some cases, the Company may need rights to Your IP.

      You mean that the company may want rights. Honestly, this sort of thing makes me wish that companies just proffered CLAs to their employees on a per-project basis. That way, the company doesn’t have any uncertainty about which rights have been granted; they’re in the license.

      The Company might someday need to show the work that went into the development of IP that it uses or has used, or to establish that it owns the IP or has rights to it. To help in those situations, you agree to maintain all records relating to the development of any Company IP, and, if the Company asks, to provide those records to the Company.

      Nope. In fact, this entire paragraph is rude. In general, at the time of expression of idea or concrete implementation, an employee will have already turned all such records over to the company. The company should require employees to keep records in systems of record anyway.

      1. 6

        If you’re comfortable / able to share, what projects did employers try to seize from you?

        1. 13

          Probably the funniest one was when Google wanted copyright over a reverse-engineered Minecraft server. Other contributors helped push back, and Google eventually gave up. They also wanted one of my programming languages; apparently it was an unwritten priority to keep tight control of such projects to avoid repeating the setup for a then-ongoing lawsuit.

          1. 1

            The setup for Oracle v Google was an un-seized personal project from an employee? Can I read about it somewhere? Just skimmed the background section on Wikipedia but it doesn’t seem to come up.

            1. 3

              No. Rather, Google v. Oracle was presented to me as a pretext for why Google ought to have control over the Monte programming language. It’s a sort of implicit threat which is easy to deny as a misunderstanding: if you don’t cooperate, Oracle will sue you in the middle of the night and you’ll need a big strong employer to fund your legal defense. To be fair, in the Cover Oregon scandal, Oracle sued their customers, and Monte is distantly related to Java, but it struck me as tone-deaf; Monte’s largest influences are E and Python, which have liberally-licensed documentation and reference implementations.

      1. 5

        Hah, I think I know what I’m doing this weekend.

        I’m super enthusiastic about CHERI because I secretly and privately suspect that hardware-level security is essential for systems security. The CHERI approach is sometimes contrasted to e.g. Rust’s, in that Rust prevents unsafe program flow at compile-time, whereas CHERI disallows it at runtime even if it leads to a crash, but I think the two are complementary and that CHERI’s is actually essential for some types of software. I don’t think CHERI’s value lays primarily in protecting unsafe legacy code, I think it’s a very good foundation for hosting future code.

        If you look at the kind of vulnerabilities that were found in unsafe Rust code over the years, many (if not most) of them aren’t really textbook buffer overflows, things like “I have this eight-item buffer, and this index, and I didn’t check the index, so now I’m writing the 65536th item and oooooh there goes your stack”.

        Lots of them are actually due to things like premature returns out of an unsafe context which introduces inconsistent state in otherwise safe constructs, so you wind up doing the wrong thing in the safe code that’s executed afterwards. That is, many of them work by injecting incorrect state at runtime into contexts that have been checked at compile time. As long as hardware models are memory unsafe, they will need memory unsafe constructs to manipulate them, and will forever be subject to the tyranny of Shea’s law, that interfaces are the primary locations for screwing up a system.

        1. 9

          I’m a huge fan of safe languages (my main objection to Rust is that it has so many caveats next to its safety) for any domain where they’re usable. One of my goals with this project was to demonstrate that being able to rely on memory safety for you lowest-level systems code (including assembly) has a huge impact on the things that you can build.

          One of the examples in the repo has a JavaScript VM. The FPGA on my desk is using a lightly compartmentalised MQTT/TLS/TCP/IP stack to fetch JavaScript bytecode from the Azure IoT hub, which it then runs in a compartment with access only to the memory allocator, the UART and the MMIO register for the LEDs on the board. The JavaScript VM is not trusted to isolate the JavaScript, a bug in it gives no more rights than the JS has anyway. It’s currently running some code that shows some patterns on the LEDs. A complete compromise of this compartment gives no way of accessing the network stack.

          1. 6

            Disclaimer - I’m a co-author of this blog.

            I highly agree with you. Regarding Rust and CHERI, I think it’s a mistake to capture the debate as “Rust vs CHERI”. We need both - Rust is the right, wise choice for most codebases (runtimes, parsers, and most parts of a modern OS). However, as we wrote in our first blog on our project - “The core parts of any OS, including an RTOS, involve doing unsafe things. The memory allocator, for example, has to construct a notion of objects out of a flat address range. Safe Rust can’t express these things and unsafe Rust doesn’t give us significant benefits over modern C++ code that uses the type system to convey security properties.” (ref: ).

            That’s precisely where CHERI comes into play. Unlike other options, it introduces deterministic mitigations to most memory safety bug classes. And in addition, it doesn’t rely on secrets (unlike other mitigation strategies that are relied on secrets and are vulnerable to architectural side channels).

            I’m very excited about our approach because scaling CHERI to small cores can revolutionize the IoT and embedded industries. Think about how many different C/C++ codebases these environments have (mostly C, by the way). And in top of that, IoT/embedded products usually don’t have modern memory safety mitigations in place. CHERIoT is a fantastic solution to that - get the new hardware, use the toolchain to rebuild your code - and that’s it. You have deterministic mitigations to spatial safety, heap temporal safety, pointer integrity, and so on.

            1. 1

              Once you compile your C/C++ program to target CHERIoT, is it very loosely analogous to compiling it with (much stronger versions of) sanitizers? So code that in the past had seemingly “worked” while doing some weird undefined-behavior memory stuff will cause a crash?

              1. 4

                In addition to Saar’s comment, it won’t necessarily crash. If the compartment has an error handler then this will be called before any corruption occurs. The compartment can then try to recover (typically by resetting to a known-good state and returning to the calling compartment).

                1. 3

                  The short answer is “yes”, but emphasizing “much stronger versions of” :)

                  Sanitizers can help you with spatial safety guarantees - they have bounds check before any dereference, so all loads/stores should be protected for bounds (by the way, Firebloom does that in iBoot, see this for details: ).

                  Besides spatial safety, there are a few chosen additional points I would like to emphasize:

                  1. Lifetime ownerships: CHERIoT has deterministic mitigation for heap temporal safety - all UAFs/double free/dangling pointers are not exploitable because of the Load Barrier and revocation. However, with sanitizers, if you free an instance of type A, and reclaim it with another object while holding a dangling pointer of type A*, your sanitizer can’t catch that.

                  2. Pointer integrity: sanitizers can’t enforce pointer inregrity. Attackers can corrupt/fake pointers freely or confuse them with integers. With CHERIoT, it’s impossible to forge pointers, and there is no confusion possible between reference to memory and integers. This is a powerful property enforced in the architecture.

                  3. In our case, the compiler is not in the TCB for all the security properties (as it is with sanitizers). Look at the “Intracompartment” subsection in the blog for details.

                  And of course, on top of that, the cost of sanitizers is high: it highly increases code size and hurts perf.

                2. 1

                  I have a weird question for you, along those lines: is it possible to combine software-based mechanisms (like the borrow checker) and hardware-based ones like CHERI, to get the best of both worlds? IIRC, CHERI has some (small amount of) overhead, and it would be interesting to see what a language could do for memory safety if it knew it was targeting CHERI.

                  From my reading on CHERI, it likely isn’t possible because it basically means bypassing very pervasive and universal restrictions, and I’m not sure it even makes sense to ask if it can be bypassed. Perhaps if the compiler threaded a “universal permission” value of sorts through every function, and the compiler could use that permission when trying to dereference something it knows is live?

                  Context: I’m the lead of the Vale programming language (, and once we finish our “region borrow checker” proof of concept (hopefully this week, fingers crossed!) which currently falls back to generational references (think memory-tagging) but I’ve been wondering how it might integrate well with CHERI.

                  1. 4

                    IIRC, CHERI has some (small amount of) overhead, and it would be interesting to see what a language could do for memory safety if it knew it was targeting CHERI.

                    Combining in this way is hard because (as you say) you need to have a mechanism that allows verified code to turn off checks, but which can’t be used by untrusted code (if it can, you have no security). The overhead of CHERI is pretty low. It’s basically the overhead of doubling pointer sizes. I think the worst we see on CHERIoT Ibex is around 15% and it’s that high because it’s aggressively optimised for area at the expense of performance and so doesn’t increase the size of the memory bus. I’m exchange for that, you get spatial and temporal memory safety, no pointer-data type confusion, and no ability to fake pointers.

                    It’s more interesting to me to use static type system properties to get a larger set of guarantees than the ones that the hardware can provide.

                    That said, you should look at the JavaScript interpreter in the repo. It uses 16-bit integers for JS pointers. They are treated as indexes into a small set of capabilities that come from the heap allocator. Within type-safe JavaScript code, you can use tiny pointers. When you interop with C or assembly, you can expand them to full pointers. For extra fun, the JS GC integrates with our temporal safety model so any pointer to the GC heap is implicitly invalidated when the GC runs.

                3. 1

                  This is something I don’t hear often, I’d love to learn more! Do you happen to recall any real-world examples of unsafe code causing unsafety in the surrounding safe code? I can work up a trivial example of it, but it would be interesting to read about a real case.

                  Also, are there certain domains that should care more about this kind of thing than others? Safety critical systems, or ones particularly likely to be attacked perhaps?

                  (I’ve never heard of Shea’s law, that’s a good one.)

                  1. 1

                    This is something I don’t hear often, I’d love to learn more! Do you happen to recall any real-world examples of unsafe code causing unsafety in the surrounding safe code? I can work up a trivial example of it, but it would be interesting to read about a real case.

                    Sure, there’s actually a cool paper that inventories several classes of such bugs, and develops a formal framework for their analysis: . The one I was aluding to above was CVE-2020-36317.

                    Also, are there certain domains that should care more about this kind of thing than others? Safety critical systems, or ones particularly likely to be attacked perhaps?

                    That’s kind of why I found that bug interesting – actually, that’s how I ran into the RUDRA paper. I wasn’t specifically looking into memory safety violation patterns, what I was looking into was error handling in unsafe contexts, because that’s pretty relevant for safety critical systems. I was recently reminded of it in another thread here, I think @david_chisnall posted a link a few weeks ago and I kept coming back to it afterwards because I am now actually specifically interested in its subject.

                    But I think it’s a matter that has significantly wider relevance than just safety-critical code. Interfacing with underlying OS features routinely involves interfacing via unsafe code. So, frankly, I think pretty much all fields should care about this kind of thing, because pretty much all of them rely on it to some degree.

                    (I’ve never heard of Shea’s law, that’s a good one.)

                    It’s #15 on an equally notorious list :-).

                1. 5

                  Very interesting. I held a talk with essentially the opposite message :).

                  The TL;DW of the talk is: constraints can actually bring about new and better abstractions. Adding constraints can enforce standardization, which in turn enables new abstractions to be created by being able to make certain assumptions. For example: the Rust borrow checker enforces constraints which leads to stronger guarantees, enabling race-free concurrency (modulo unsafe).

                  I don’t disagree with the article, so I’ll need to merge these two perspectives.

                  1. 4

                    I skimmed the talk, and from what I can tell you’re talking about constraints that are “limits on your expressiveness”, while talking about constraints that are “numerical goals of the system”. It’s the same word, with two very different meanings, which we both use in software.

                    ie, the Rust borrow checker enforces constraints, which lead to race-free concurrency, which we need to use to satisfy the performance constraints of the program.

                    1. 2

                      Yes, that’s right. I’m curious though if the line is always that clear, or if there are fuzzy things happening at the boundary? If you generalize performance constraints into “non-functional constraints”, it’s easy to imagine self-imposed non-functional constraints that serve to reduce complexity.

                    2. 3

                      Thank you for this talk. As an everyday developer, I subscribe to this viewpoint both personally and professionally. I learnt this from a podcast episode Constraints Liberate with Mark Seemann.

                      1. 2

                        Thank you, it’s great to hear it resonates :) hopefully it wasn’t too abstract (pun not intended), it was quite challenging to make practical and concrete.

                      2. 2

                        Two valid perspectives, to be sure. I think one of the distinguishing factors is how much a constraint helps with inherent complexity and how much it adds unnecessary artificial complexity.

                        The borrow checker is a great example of this. It surfaces (and enables us to better handle) the complexity inherent to zero-cost memory safety and zero-cost fearless concurrency. However, its core mechanism (eliminating shared mutability) also adds some artificial complexity, which can be felt when e.g. trying to implement an observer pattern within the borrow checker (hint: you can’t).

                        For some domains, the zero-cost aspect is worth the artificial complexity. For other domains, something like Pony works better, and the borrow checker’s constraints’ artificial complexity becomes a net negative.

                      1. 5

                        Wonderful article, I think anyone who wants to be a better software engineer should read it. I was recently considering how constraints affect complexity and developer velocity in programming languages, and my findings resonate a lot with what’s in this article.

                        My favorite part:

                        Constraints can break modularity … punch holes in abstractions

                        This is the most damaging thing a constraint can do, and it’s often incredibly subtle and hard to notice unless you’re the one writing it. A good engineer will try (or at least consider) the simplest and most decoupled solution first. But then if they discover a constraint that it breaks / doesn’t uphold, the engineer must unfortunately diverge from the simple solution to satisfy those constraints.

                        It gets even more insidious: the more one works with a certain constraint, the more often they default to the more complex solution that satisfies it. It’s harder to spot when just reading the code, because we don’t have the original, simplest design handy to compare to.

                        It happens particularly often with infectious constraints like async/await, borrow checking, and often even with static typing. Luckily, there are solutions: Go’s goroutines and JVM’s Loom successfully decouple async/await’s constraints out of the code, GC (and RC) decouple memory safety concerns out of the code, and interfaces resolve this for static typing. These sometimes have a run-time cost, but it’s often worth it to maintain a codebase’s simplicity: upholding developer velocity is often more important than losing a little run-time speed, especially if not in the hot path.

                        I would love to hear anyone’s thoughts or experiences in reducing the impact of constraints, or on how to isolate constraints so their resulting complexity doesn’t infect neighboring areas of the codebase.

                        Great article!

                        1. 5

                          Great article!

                          If one wanted an actual conclusive number, how might they get it? Higher sample sizes? Longer run time?

                          I also wonder how much LLVM’s own optimization and assumptions are in play, and whether one might fork LLVM to take out bounds checks instead.

                          1. 2

                            I’d start with a better representation of the results. Definitely more than one number. can handle different distributions and outliers for example. Step 1 is basically “show me the distribution of the results”.

                          1. 1

                            I just don’t understand this need to be first out the door with features. Microsoft, a behemoth of a company, has never been first to market anything in its lifetime, except maybe BASIC interpreters (maybe). Operating system for the PC? Nope (there were three available up on the release of the IBM PC, and Microsoft wasn’t the first choice). GUI? Nope. Spreadsheet? Nope. Word processor? Nope. Slide presentations? Nope. Cloud offerings? Nope. Linux support? Nope. It hasn’t seemed to hurt them being second, or third, or fourth, to market one bit.

                            1. 5

                              I think it’s more about getting your product out there before you run out of money.

                            1. 7

                              I’m absolutely not convinced by the Google Earth argument:

                              For example, the Google Earth app is written in a non-memory-safe language but it only takes input from the user and from a trusted first-party server, which reduces the security risk.

                              As long as that server isn’t compromised, or a rogue DNS server doesn’t direct clients to a host controller by attackers, or… PDF viewers also nominally only “take input from the user”, but I will run out of fingers if I try to count arbitrary code execution vulnerabilities in PDF clients.

                              Anything that can take arbitrary input must never trust that input blindly.

                              1. 8

                                Note that the author worked in Google Earth team, and is speaking from experience.

                                I agree you should not trust input blindly, but it is a spectrum, isn’t it? I hope we can agree that memory safety is less important in Google Earth compared to Google Chrome. That’s all the author is saying.

                                1. 5

                                  I think by “take input from the user” he meant GUI events like click/drag/keystroke, which we can agree are pretty safe. (Mostly. I just remembered that “hey, type this short gibberish string into your terminal” meme that triggers a fork bomb…)

                                  “Open this random PDF file” is technically a user event, but that event comes with a big payload of untrusted 3rd party data, so yeah, vulns galore.

                                  1. 4

                                    I feel like Earth is a bad example, then. Google Earth lets you read/write your GIS data as a KMZ file. KMZ files are zipped(!) XML(!!) documents – that’s quite a bit of surface area for a malicious payload to work with.

                                    1. 1

                                      Keep in mind Earth is sandboxed, since it runs on the web (and Android and iOS) so theres just not much damage that can be done by a malicious KMZ.

                                      1. 1

                                        😬 At least there probably aren’t too many people using that feature. Unless they start getting urgent emails from “the IT department” telling them “you need to update your GIS data in Google Earth right away!!! Download this .kmz file…”

                                        1. 2

                                          But some of the people who do use that are very juicy targets who make heavy use of features like that, like people mapping human rights abuses or wars. It’s not like this software is just a toy that doesn’t see serious use.

                                    2. 2

                                      Earth passed some pretty intense security reviews before it was launched and before every major feature release, and those security teams know their stuff.

                                      Likely a big help was how well sandboxing works in WebAssembly, Android, and iOS.

                                    1. 20

                                      I love this topic because it’s so nuanced

                                      It’s not nuanced - you always want memory safety. Memory unsafety is a bug, the severity of which might depend, but it is always a bug.

                                      1. 9

                                        I’d argue that up until these last few years, it was nuanced. The proving out of Rust has destroyed the nuance for all cases except assembler.

                                        1. 4

                                          That could be true, except for the artificial complexity and iteration costs that can sometimes be introduced by the borrow checker’s restrictions. Sometimes, the cure is worse than the disease. Not often, but sometimes.

                                        2. 6

                                          Not sure if you read all the way through, but the article mentions a few reasons where one might opt for less safety, mostly in the costs of all memory safety approaches.

                                          1. 5

                                            It’s nuanced because the definition of memory safety depends on some abstraction for memory regions. Memory safety is a spectrum. An MMU enforces that every memory access must be to some valid memory. With a sanitizer-like approach, it means that every memory access is to a valid live object. In a typical programming language it adds a provenance model so every pointer is derived from a pointer to a valid object and any memory access via that pointer can only access the object and only for the lifetime of that object.

                                            If you are implementing a memory allocator then a definition of memory safety that is built on top of an object model doesn’t really help because you are the thing responsible for providing the definition of an object. This is even more true for an OS kernel because you are providing the process abstraction and are responsible for page table setup. You can pretend that you have a language with object-level memory safety but anything that can modify page tables can bypass it and if most of your code pokes the page tables directly then you have no benefit. You need to understand what the benefits that you want from memory safety are, what is able to bypass memory safety, and how you limit the danger there.

                                            Even if you are writing application code, C libraries can bypass language-level memory safety and the OS kernel can bypass everything (and needs to for I/O). You still need to think about what you’re getting from memory safety, whether it’s reducing cognitive load (I have a GC and plenty of RAM, I don’t need to think about lifetimes!), reducing bugs (my compiler errors if I write this, I don’t have to debug it!), or providing a strong security guarantee for isolating untrusted code (I can provide a plugin API without violating the integrity of data owned by my own code!).

                                            1. 4

                                              Memory unsafety is a bug, the severity of which might depend, but it is always a bug.

                                              Be careful what words you are choosing, because this sentence right there is false. Even though more likely than not you know everything I’m outlining below.

                                              • A bug is when a program does not behave as desired. This includes crashes and security vulnerabilities.
                                              • Something is unsafe when using it allows us to program bugs. A C function that dereference pointers they accept as argument are unsafe, because if we give them an invalid pointer we get Undefined Behaviour™, nasal demons…

                                              For instance, this API is unsafe:

                                              // Prints "Hello <name>!"
                                              // Does nothing if name is NULL
                                              void hello(const char *name)
                                              	if (name == 0) return;
                                              	printf("Hello %s!\n", name);

                                              Because if the string I provide is not NULL terminated I’m going to run into trouble:

                                              const char tom[3] = {'T', 'o', 'm', };

                                              The unsafety of hello() allowed the bug in user code. For hello() itself to have a bug, I need to make a mistake, such as forgetting the NULL check:

                                              // Prints "Hello <name>!"
                                              // Does nothing if name is NULL
                                              void hello(const char *name);
                                              	printf("Hello %s!\n", name);

                                              C is dangerous and a huge source of bugs. But it is not itself a bug.

                                              Likewise, walking a slackline in the mountain with no safety equipment is insanely dangerous, but you won’t die because of the line itself. Most likely it will be your inadequate competence, lack of attention, or hubris. (Nevermind adequate competence may be unattainable to begin with).

                                              1. 3

                                                Not wearing a seatbelt while driving, on the other hand, could very well ding your car insurance payment if you get into a collision:

                                                1. 2

                                                  The bug there is then the lack of a comment on hello() saying “Takes a string that must be null-terminated”. The implementation is under-defined.

                                                  1. 3

                                                    The hello() does not take a length parameter. It’s obvious to anyone it expects a null-terminated string. How else are you going to determine the end of the string, look for an EOM control character?

                                                    But that’s not important. Even if I conceded your point and fix the comment, my main point would remain: hello() can be unsafe and bug-free.

                                              1. 12

                                                Replicating my take from hackernews:

                                                Chris makes some points which are interesting in their own right, particularly the one comparing the amount of research which has gone into tracing/copying gc vs rc. But I think there is an unmentioned bias which is worth pointing out. Tracing gc has found massive success in enterprise java applications, which:

                                                • Need to scale in terms of code size and contributor count, and therefore need the modularity afforded by tracing gc

                                                • Need to scale in terms of heap size, and therefore run into the pathological fragmentation issues that come from non-compacting gc

                                                • Need the higher throughput afforded by tracing gc

                                                • Generally run on massive servers that have the cores, memory size, memory speed, power budget, and heat dissipation required to support high-quality tracing gc

                                                Contrariwise, swift is a product of apple, a laptop and phone company, and is used to make laptop and phone apps which:

                                                • Are unlikely to ever get really huge (people balk at large downloads)

                                                • May run with limited memory, and need to share that memory with other apps

                                                • Run on battery-powered hardware with little or no active cooling, and so need to attempt to conserve power

                                                No silver bullet, as they say. I implement APL, which absolutely cannot do without reference counting, except perhaps in some novel configurations. I think that for the more usual case of a pointer-chasing language, tracing gc is almost certainly the right choice.

                                                Lastly, however, I would be remiss not to bring up bacon et al, ‘A Unified Theory of Garbage Collection’, which notes that tracing gc and reference counting are not really opposed to one another, and that a sufficiently advanced implementation of either will approach the other. Lattner mentions barriers (though it is important to note that barriers are much cheaper than reference count twiddling); on the other side of the coin, consider deferred reference counting, one-bit reference counts, …

                                                The bottom line being that memory management (or more generally allocation of finite resources; see e.g. register allocation) is hard, in all cases; a sufficiently robust and general solution will end up being rather complicated no matter what it does; and, again, there is no silver bullet.

                                                1. 5

                                                  I think the first time I met Chris was at FOSDEM 2011, when I had almost exactly this conversation with him. I was giving a main track talk, he was giving a keynote and so we were staying in the same hotel and had breakfast together. I complained a lot about GC in Objective-C. I had two main objections:

                                                  First, the type system didn’t differentiate between memory that could and couldn’t store Objective-C pointers. NSAllocateCollectable had a flag indicating whether the GC’d buffer could contain pointers to GC’d objects but the return type was void* in both cases, as was the return type of malloc. If you cast a void* buffer to id*, you could store pointers in it without any compiler warnings. If the underlying storage for the buffer were on the stack, in a global, the return from NSAllocateCollectable with NSScannedOption, or a field (possibly an array) in an object, then everything worked fine. If it were anything else, then sooner or later the GC would deallocate the object and you’d have a dangling pointer. This meant that Objective-C with GC had more memory safety bugs than with manual retain / release.

                                                  Second, there was no good incremental adoption story. You had to port your Objective-C to work with GC’d mode and then you could also do manual memory management and compile it in a mode where it ran with both modes, but this significantly increased your testing burden and meant that you couldn’t actually use the GC’d version until all of your dependencies had been upgraded. Destruction and finalisation were completely different code paths and you needed to implement both.

                                                  To address the second (on the train to a prior FOSDEM), I’d hacked together something based on the Bacon paper that used the GC barriers that the compiler inserted in GC’d mode but used them to do retain / release operations and then added a cycle detector on top of this. The cycle detector used Objective-C reflection metadata and a fallback path that allowed an object to explicitly walk data structures that were not exposed in this way (e.g. malloc‘d buffers). This let you use retain/release and GC’d object files together and get the cycle detection spanning both. The code had some concurrency bugs but it served as a reasonable proof of concept. During collection of a garbage cycle, it deferred the actual deletion until all destructors had run.

                                                  After listening to my rant, Chris smiled and told me I was going to be very happy that summer. The fact that ARC was announced at WWDC on my birthday is, I’m sure, a coincidence, but I like to think it was Apple’s birthday present to me.

                                                  I still like the determinism of reference counting but it has one very big problem: it does not play nicely with modern cache coherency protocols. For scalable memory use, you want every object to be in one of two states:

                                                  • Mutable, used by a single core at a time (exclusive state in that core’s cache, invalid everywhere else)
                                                  • Immutable, used by as many cores as you like (shared state in all caches)

                                                  Reference counting for mutable objects in this scheme is completely fine. For immutable objects, it is bad because every reference count update is a read-modify-write operation. This is solvable, but only if your type system understands and guarantees global immutability.

                                                  In Verona, when you freeze a region (which creates an immutable object graph from the objects in that region and all child regions), we walk the graph to do SCC construction and generate reference counts for each set of strongly connected components (SCCs). This lets us use reference counting even for cyclic data structures by taking advantage of the fact that the immutability guarantee means that the shape of the object graph can’t change, so any cycle shares a refcount between all objects. Having one reference count for each SCC makes this problem worse in the first instance because taking a reference to any object potentially introduces false sharing with the refcount of another object in the same SCC. The immutability guarantee should offset this significantly by allowing us to significantly reduce refcount manipulations. There was a lot of work on refcount elision in the ’80s that was obsolete in the ‘90s when multithreading became popular and all of the invariants stopped holding in the presence of concurrent mutation. This all becomes possible again in the immutable case because there’s no concurrent mutation. This means that if we hold a reference to object A, then we don’t need to do refcount manipulations for any transient references that we hold to any object reachable from A. We need to incref / decref when assigning to a field in the heap, but we can (if we need to) optimise that as well by keeping a remembered set per region so that we have a local refcount for each SCC that we’re holding in a particular region and only increment / decrement the global reference count on transitions between 0 and 1 for the local refcount.

                                                  The Verona type system also guarantees that no mutable object is shared between threads and so we can do non-atomic refcount manipulation, which simplifies a bunch of things. Reference counting requires a tradeoff between refcount size and cost of handling overflow. You can use a 64-bit refcount and guarantee that you never see overflow (okay, if you incref once per cycle on a 2GHz machine for 2000 years, you’ll see an overflow, but if your program has 2000 years of uptime then you’re winning), but then you’re burning 8 bytes per object. That’s fine for Objective-C, where objects are generally large, but not so fine for languages that encourage small objects with 1-2 fields. You can steal the low bits from the descriptor (/ vtable / isa) pointer, but then you have to handle the overflow case. This is really annoying for atomic refcounting because now you have to do a load/cmpxchg pair and branch on overflow. It’s much simpler for non-atomic cases because you can ignore the case of two increfs coming in at the same time, but it still requires the fallback code.

                                                  One of the starting points for Verona was the acknowledgement that there is no on-size-fits-all solution to memory management. We want you to be able to choose a memory management strategy for each region, independently. If you know that you have linear ownership then we can avoid any dynamic book keeping. If you know that you have no cycles (or you’re happy for cycles to leak until the region is destroyed), then we can do refcounting. If you want complex object graphs, we can do tracing (but the fact that regions are single-owner means that we don’t need to do concurrent GC to avoid pauses, we can do local collection on the region at any point when its owner doesn’t have anything else to do). If you have a load of short-lived objects then we can do arena allocation and deallocate them all at once with no sub-region memory management policy at all.

                                                  1. 4

                                                    I still like the determinism of reference counting

                                                    ‘RC as deterministic’ is a bad meme. For resources other than memory (files, mutexes…), scope-bound or linear handling is more predictable and suffices for simple cases; complex cases (say, caching server) will degenerate to manual management no matter what you do (e.g. in the case of a caching server where you’ve kept around a reference to a file that you want to close, you would rather close it and then crash and burn when somebody tries to read through the other reference, rather than leak it, so if you write something like that in a gc or rc language, you will bypass those mechanisms). For memory, the possibility of cycles and rc cascade means you sacrifice modularity for that predictability, and pay in leaks or pauses when you make a mistake. Whereas gc won’t pause if you use a good concurrent or incremental implementation, and eliminates one class of leak; both of which make it easier to reason about the behaviour of programs using code you didn’t write.

                                                    One of the starting points for Verona was the acknowledgement that there is no on-size-fits-all solution to memory management

                                                    I’m curious: what are your thoughts on ‘A Framework for Gradual Memory Management’?

                                                    1. 5

                                                      Sorry, ‘deterministic’ has a lot of overloaded meanings and you’re right that I’m implying a lot more than I actually want to claim. In the context of reference counting, I specifically mean that your worst-case peak memory consumption is deterministic. This is important for a lot of use cases. On a mobile device, it’s the number that causes your app to be killed. For a server / cloud deployment, it’s often the bounding function for how much you’re paying. With a global tracing implementation, your peak memory consumption depends on the concurrent interaction between your mutator threads and the collector. The collector is not under your control.

                                                      Any GC implementation is a tradeoff in three dimensions:

                                                      • Memory overhead
                                                      • Mutator throughput
                                                      • Mutator latency

                                                      You can optimise for any of these. Stop-the-world collectors optimise for mutator throughput. They impose no penalty on mutators while they’re running (no barriers), but they introduce latency spikes and large memory overhead (memory consumption grows until the pause). Fully concurrent collectors need to synchronise mutator and collector thread states and so impose a mutator throughput penalty from barriers (as does reference counting, form the refcount manipulations) and from having GC threads running (RC can be thought of as having a GC thread that preempts the mutator thread and runs on every decref-to-zero) but don’t impose a latency penalty. Most GC implementations (including ObjC / Swift’s ARC) are tuned for one or two places in this tradeoff space. Our hypothesis for Verona is that, as programs grow in size and complexity, there is no point in this tradeoff space that is a good fit for all of the program.

                                                      1. 1

                                                        Thank you for your fascinating and thorough comments!

                                                        I thought I’d been keeping up to date with Verona’s docs, but apparently not, as a lot of this is new to me! I was particularly surprised that Verona was exploring the idea of making regions immutable.

                                                        Is there anything public that I can read on this? I’d love to compare it to our approach for Vale at [0], where we can immutably open a mutex, or implicitly freeze all pre-existing memory when entering a pure function.

                                                        Also, where is the Verona team based? If it’s in the US, I’d love to stop by and buy a round for them all and talk regions!


                                                        1. 1

                                                          I thought I’d been keeping up to date with Verona’s docs, but apparently not, as a lot of this is new to me! I was particularly surprised that Verona was exploring the idea of making regions immutable.

                                                          Freezing regions to create immutable graphs has always been part of our abstract machine. It’s something that you can do only if your type system has viewpoint adaptation, to allow you to express the idea that an object is immutable because the object whose field you read to access it was also immutable.

                                                          Is there anything public that I can read on this? I’d love to compare it to our approach for Vale at [0], where we can immutably open a mutex, or implicitly freeze all pre-existing memory when entering a pure function.

                                                          Not a huge amount, we’ve open sourced it to collaborate with a few universities but we’re not great at documentation yet.

                                                          Also, where is the Verona team based? If it’s in the US, I’d love to stop by and buy a round for them all and talk regions!

                                                          We’re in Cambridge (UK).

                                                    2. 1

                                                      Having one reference count for each SCC makes this problem worse in the first instance because taking a reference to any object potentially introduces false sharing with the refcount of another object in the same SCC.

                                                      Sorry, what does “in the first instance” refer to here?

                                                      1. 1

                                                        First-order effect. It enables later optimisations but without those it makes the false sharing worse. As second-order effects, we believe, the combination lets us avoid any tracing after the freeze and makes it easy to elide refcounts via simple local dominance analysis but as a first-order effect it increases false sharing on refcount manipulations.

                                                    3. 3

                                                      Yeah–the sort of Occam’s Razor explanation of Apple going this direction is that your initial reference counting implementation is likely to have have decent latency characteristics, which is handy if you want 60fps UI updates on a phone, and then you go back and work to improve throughput.

                                                      With GC you can get good multithreaded throughput earlier, but there’s a long road to combining that with almost-pauseless operation.

                                                      And other factors: trying to be compatible with C complicates moving collectors, and like you mention GC really benefits from spare RAM, which was not really available on early smartphones.

                                                      To me, an area that seems interesting to explore is that Rust has shown that humans can handle annotating their code with a lot of info on lifetimes and what is or is not shared. We’ve seen how that works out with Rc/Arc as the main escape hatch for lifetimes that are hard to capture statically (an almost inevitable choice for Rust since it’s trying to work in a systems context). It doesn’t seem like it’s been worked out quite as far on the GC side.

                                                      1. 1

                                                        And other factors: trying to be compatible with C complicates moving collectors

                                                        Having implemented a moving collector for C, I would go a step further: you cannot have a moving collector without breaking large amounts of real-world C code or imposing very large performance penalties. In C, it is very common to cast a pointer to an integer and then use that as a key in a hash table or tree. Trees are kind-of okay, because you just need to keep a stable ordering, so a moving collector that compacts within an arena works. Hash tables are not because address equality is also used for identity. Identity is where the problems start.

                                                        In Java, for example, there is a hashCode method that returns the address of an object the first time that it’s called. Subsequent calls will return the address that the object had on the first call. If the object is relocated then the old address stays in the header. This is fine in Java because hashCode is not guaranteed to be unique, it is only guaranteed to have a low probability of collision. It’s also possible in Java because arrays are objects with an indexing operator. In C, you cannot grow an object during relocation because it would break array address computation, so you’d need to unconditionally allocate space for the stashed address (and, because arrays don’t really exist in the C type system, you can’t us the index of an object in an array to find the address of the start of an array and amortise this in arrays). Worse, you can take the address of fields in C, so this problem get a whole lot worse if you’re using field-address-equality comparisons to define identity and an object is relocated.

                                                        To support a moving collector in C, you’d need (at the very least) to add padding to every object and make pointers fat pointers so that interior pointers can find the start of the object. That would be a completely new ABI that broke assumptions that a lot of C code has.

                                                        1. 1

                                                          you cannot have a moving collector without breaking large amounts of real-world C code or imposing very large performance penalties

                                                          There is another option: virtual memory tricks. (Interleave sparse pages with disjoint coverage.) I came up with this scheme on a lark, and was shocked to find an actual academic paper had been written on the subject. I don’t have a pointer handy, but I presume it was written by emery berger.

                                                          In Java, for example, there is a hashCode method that returns the address of an object the first time that it’s called. Subsequent calls will return the address that the object had on the first call. If the object is relocated then the old address stays in the header

                                                          There is an interesting alternative to this approach, which I believe is used by SBCL: when objects are relocated, tell hash tables containing them to rehash themselves. This is nice because you can still use an object’s address as a hash. But SBCL only has a copying nursery; its oldspace is immobile; I expect that for the more advanced java gcs, the overhead of rehashing tables in oldspace ends up not being worth it.

                                                          That would be a completely new ABI that broke assumptions that a lot of C code has.

                                                          I find it rather unfortunate the extent to which c programs rely on vendor-specific behaviour. It is of course valuable that mcu vendors can provide bizarro compilers, but this is of fairly little significance to most application programmers, and somewhat limits the use of the standard for the latter. Because it is not practical to change the entire body of existing c code, I would like to see some standardisation–whether optional, unofficial, whatever—of these language features which are exploited by nearly every real c program. Appendix J does this in principle, but not in practice.

                                                          1. 1

                                                            I came up with this scheme on a lark, and was shocked to find an actual academic paper had been written on the subject. I don’t have a pointer handy, but I presume it was written by emery berger.

                                                            Emery Berger did some fairly recent work in this space, but the original paper proposing this was actually written by Vikram Adve, Chris Lattner’s supervisor.

                                                            There is an interesting alternative to this approach, which I believe is used by SBCL: when objects are relocated, tell hash tables containing them to rehash themselves.

                                                            That works if you can track everything that depends on the address. This isn’t the case for C, where you can cast a pointer to a long and the lost that integer value under an unbounded amount of data flow through type punning.

                                                            Because it is not practical to change the entire body of existing c code, I would like to see some standardisation–whether optional, unofficial, whatever—of these language features which are exploited by nearly every real c program

                                                            Take a look at our PLDI paper from a few years back.

                                                      2. 1

                                                        (Random thought: maybe the huge caches on apple cpus are in part a way of compensating for the locality loss incurred by the lack of a moving gc?)

                                                      1. 4

                                                        Oh, so you’re the author of Vale. I had no idea that you were making it to build a game (engine to build a game) - what an incredible yak-shave. Vale has some very neat ideas, and seems to improve on Rust in some ways (e.g. better ergonomics around lifetimes) - props to you!

                                                        1. 1

                                                          Thank you for the kind words!

                                                        1. 5

                                                          To be fair, I feel like way too little projects are made with a certain project in mind, resulting in catch-all languages, frameworks, libraries that just add every feature they can think of and people using them doing things that might not be the best idea simply because it’s possible.

                                                          Of course that’s not anti-flexibility, but usually bigger projects have to make certain design decisions and later bending the whole project to also fit that other use case or have that feature that looks nice on paper or simply is “in fashion” these days tends to result in a mess after the next thing is hyped, causing lot of deprecated, unmaintained code people still have to support for some very specific use case that people in one way or another depend on, maybe without evening knowing.

                                                          So kudos for sticking with actually implementing a project. Certainly speaking for the language. :)

                                                          1. 5

                                                            Thanks =)

                                                            I originally made this language with roguelike games in mind, so I optimized for that. Ironically, I realized over the years that roguelike games are so broad in their requirements and needs, that I ended up accidentally also optimizing for the best general purpose language!

                                                            Still, it’s always been my dream to make a roguelike game in my own language, so now I’m just using roguelike games as a litmus test and convenient testing suite. At least, that’s what I tell myself. I really just wanted to make a roguelike game, I think.

                                                          1. 5

                                                            Of course “break” here is not meant literally. All of this works exactly as designed, and still upholds all safety guarantees.

                                                            It’s important to remember that Rust’s references aren’t mutable/immutable, but exclusive/shared. It is perfectly fine to mutate data via a shared reference, as long as something ensures it’s still safe (e.g mutex or atomic).

                                                            1. 4

                                                              Yeah I don’t like the title here much. The rules here aren’t being broken, they’re being enforced at runtime rather than compile time.

                                                              1. 1

                                                                I do with rust had true immutable more easily. I think it was intended to be mutable/imutable at some point but they changed their minds…

                                                                1. 2

                                                                  Rust has changed a lot before 1.0. Earliest versions were even supposed to have a per-thread GC!

                                                                  But current Rust has a tension between what its memory model really is, and what programmers expect it to be. The prime example is the mut keyword in bindings (let mut): it doesn’t do anything for the memory model. It’s a meaningless talisman. In Rust every owned object is always mutable, but people strongly dislike that, expecting immutable objects or immutable variables to exist. So Rust has a lint that weakly fakes immutable objects, and requires you to add and remove mut to pretend so (not to be confused with &mut which is important and strongly enforced)

                                                                2. 1

                                                                  That perspective makes a lot of sense. Is there somewhere I can read more about it?

                                                                  1. 1

                                                                    Rust calls it “interior mutability”, and there’s UnsafeCell type for making primitives that “break” the borrow checker. The fine details of actual exclusivity rules are still a work in progress, but Stacked Borrows is the current candidate.

                                                                1. 3

                                                                  This came as a surprise to me. While I knew we needed mutexes (and other good multi-threading practices) in Python, I also vaguely believed the GIL helped the user with their threading in some way. It turns out, it doesn’t!

                                                                  Also, I realize that there are a dozen different definitions of “data race”. I prefer the one in the Rustonomicon, but YMMV!

                                                                  1. 1

                                                                    I’ve wasted so much time trying to explain to people in the past just that (a big one being telling people that they shouldn’t be using collections.deque for communication across threads but rather queue.Queue and company). The GIL is an implementation detail, and is there only to make the interpreter threadsafe, not Python code itself.

                                                                    If it makes you feel any better, you’ve written a great example of why people still have to do proper concurrency control, and I’ll be saving it to send to people in the future.

                                                                  1. 2

                                                                    The article claims that Rust doesn’t have seamless structured concurrency. Rayon (and crossbeam’s scoped threads under the hood) solves most of this problem.

                                                                    We can use its trait implementations:

                                                                    use rayon::iter::*;

                                                                    Define some expensive functions:

                                                                    /// A number p is prime if iff: (p-1)! mod p = p-1
                                                                    /// (apparently this is called "Wilson's theorem")
                                                                    fn is_prime(p: u64) -> bool {
                                                                        factorial_mod_n(p - 1, p) == p - 1
                                                                    fn factorial_mod_n(k: u64, n: u64) -> u64 {
                                                                        (1..=k).reduce(|acc, item| (acc * item) % n).unwrap()

                                                                    And then pretty quickly max out every core on your computer finding primes.

                                                                    fn main() {
                                                                        // Find primes up to N.
                                                                        const MAX: u64 = 200000;
                                                                        // This works.
                                                                        let primes: Vec<u64> = (2..=MAX).into_par_iter().filter(|&n| is_prime(n)).collect();

                                                                    We can even reference data on the stack. This works because crossbeam’s “scoped threads” library enforces that the thread joins back into its parent thread before the reference becomes invalid.

                                                                        let my_cool_prime = 503;
                                                                        primes.as_slice().par_iter().for_each(|&prime| {
                                                                            if prime == my_cool_prime {
                                                                                println!("found my prime!");
                                                                    1. 3

                                                                      As mentioned in sidenote 10, one can only sometimes reference data on the stack in parent scopes, only things that happen to be Sync. One will often find that their data doesn’t implement Sync, because of some trait object buried within. In that situation, one has to refactor to make it Sync somehow.

                                                                      If one has to refactor existing code, it’s not seamless, IMO.

                                                                      1. 3

                                                                        Something not being Sync usually means it uses interior mutability without a mutex. It’s pretty hard to make something magically have a mutex in it; even if you can, it’s a 50/50 chance that it’ll be heavily contested and you’ll need to restructure it anyway. The other main time I’ve had something not be Sync is when it contains an OS resource or thread-local or such that is explicitly not allowed to be shared between threads, and you generally can’t fix that just by find-and-replacing it with Mutex<T>.

                                                                        Sometimes complicated things are just complicated, I guess.

                                                                        1. 1

                                                                          See the Rust example linked from the article (I think it was in a side note), something as simple as having a regular trait object somewhere in your data is enough to make something not Sync.

                                                                          1. 2

                                                                            Trait objects can be sync if you write the type as dyn Trait + Sync.

                                                                            This needs to be written explicitly, because otherwise the trait allows non-sync implementations, and you could have a data race.

                                                                            1. 1

                                                                              Yep, that’s precisely the issue that’s in the way of Rust’s seamless concurrency.

                                                                              1. 2

                                                                                Oh, come on. If it doesn’t accept buggy code it’s not seamless? Running of single-threaded code on multiple threads should not be “seamless”.

                                                                                1. 1

                                                                                  Excuse me? I didn’t say anything about accepting buggy code.

                                                                                  1. 2

                                                                                    You’ve complained that non-Sync (i.e. non-thread-safe) trait objects are prevented from being used in multi-threaded code. If they were allowed, that would be unsound, and “seamlessly” allow data race bugs.

                                                                                    Just to be clear: Rust trait objects can be compatible with multi-threaded code. You don’t need to eliminate them or work around them. They just come in two flavors: thread-safe and non-thread-safe. If you have a non-thread-safe trait object, then of course you can’t “seamlessly” use it, as that would be incorrect. But when you have dyn Trait + Send + Sync, it just works in both single-threaded and multi-threaded contexts.

                                                                                    1. 1

                                                                                      That’s a pretty big misinterpretation of my words. I’m happy to clarify what I’ve said.

                                                                                      I am not suggesting any changes to Rust to make safe seamless concurrency possible; I’m not sure that’s even possible, without introducing a lot of complexity to the language. I’m not saying it should change its rules to allow use of non-Sync/non-Send data in multi-threaded code. I’m not saying anything at all about changing Rust.

                                                                                      I’m simply stating that, because Rust requires refactoring existing code (for example deeply changing trait objects and other existing data to be Sync), it cannot do this seamless concurrency. If you’re really curious about the root cause of why Rust requires Sync and Send, and therefore why it can’t do seamless concurrency, let me know and I can enlighten. All I mean so far is that Rust cannot do this, because of its own decisions and limitations.

                                                                                      1. 3

                                                                                        But by this definition, languages that care about correctness of multi-threaded code can’t be “seamless”. Because otherwise you can always be in a situation where you’re dealing with code that either actually isn’t thread-safe, or incorrectly declares itself to be thread-unsafe, and you need to fix it instead of having compiler ignore the problem and let you run it anyway. From Rust’s perspective the refactoring you’re describing is a bug fix, because this situation would not happen in properly written code.

                                                                                        1. 1

                                                                                          That’s not true; the article shows an example where a language can safely have multiple threads concurrently reading the same data, without refactoring.

                                                                                          It also links to an example showing that it is not possible to do safely in Rust, without refactoring. Given that it is possible, but not in Rust, it indicates that the problem is with Rust here.

                                                                                          From Rust’s perspective the refactoring you’re describing is a bug fix, because this situation would not happen in properly written code.

                                                                                          Yes, if we tried this in Rust, it would be a bug in that Rust code, because Rust can’t safely express this pattern. That doesn’t mean it would be a bug in other languages.

                                                                                          1. 2

                                                                                            You’re disqualifying Rust for not being able to safely express this pattern? But you include OpenMP, which doesn’t even have these guarantees in the first place? How is that not favoring lack of safety checks as seamlessness?

                                                                                            1. 1

                                                                                              I see you’re bringing up OpenMP suddenly, when our conversation so far hasn’t been comparing C to Rust, it has been about safe languages.

                                                                                              Perhaps our conversation’s context didn’t make it clear enough, so I’ll be more specific: Rust cannot support seamless fearless structured concurrency, but the article shows that we can have seamless fearless structured concurrency in another language, such as Vale.

                                                                                              I’m sure you’re discussing in good faith, and not intentionally straw-manning what I’m saying, so I’m happy to keep clarifying. I always enjoy exploring what’s possible in language design with those who are honest and curious.

                                                                                              1. 1

                                                                                                I’m just contesting what can be called “seamless”, which is more of argument about semantics of the word, and tangential to the Vale language, so it’s probably not worth digging this further. Thank you for your responses.

                                                                            2. 2

                                                                              Yeah, it would be interesting to have trait objects follow the same rules as deriving traits, same way that new types are automatically Send+Sync+Drop when they can be unless you specify otherwise. ie, saying dyn Trait usually means dyn Trait+Send+Sync+Drop. It would probably be a breaking change to do that, so will probably never happen, but I wonder whether it’s one of those things would abruptly spiral out of control and make everything in the world break, or whether it would be a minor inconvenience to implement and then mostly just work everywhere.

                                                                      1. 2

                                                                        The opening sort of intertwines simple concurrent dispatch into the definition of “structured concurrency”. The latter isn’t about how easy it is to make things parallel, but instead having constructs which automatically wait for concurrently dispatched work to complete in a structured manner.

                                                                        1. 1

                                                                          Great point, I focused on the benefit but didn’t really include that kind of definition. Adding now, thanks!

                                                                        1. 2

                                                                          Nice read. Thanks!

                                                                          OT, but how did you get the notes in the sidebar? I like it a lot. I was wondering if you used a third-party library.

                                                                          1. 4

                                                                            Thanks! I hand-rolled that feature =)

                                                                            Basically, I slice the page into multiple “rows”, where the left side of the row is text, and the right side is a list of sidenotes. One can see it more clearly when appending ?nojs to the URL. The left side of each section is white, and the right side of each section is blue. It only looks like the page is two columns.

                                                                            It’s nice because it still looks good when javascript is disabled. But if javascript is available, then it goes in and better aligns the notes with their “anchors” in the actual text.

                                                                            There probably are libraries out there for it. I haven’t found any, but then again, I probably didn’t look too hard!

                                                                            1. 1


                                                                              I’m on about my 12th iteration of blogging, and currently I’m trying to let ghost do all of the heavy lifting while I just keep a server running and patched. It’s funny: I could get into the templates and hack around a bit, but it’d would be just as easy (or easier) to hand-roll something myself.

                                                                              It’s a nice look, cool idea, and I love the noscript fallback!

                                                                          1. 8

                                                                            If I was to argue why rust is not the future of software I would make the following claims:

                                                                            • It is a complex monolithic piece of software with no standard and only one implementation.
                                                                            • There is reason to believe that our improved understanding of type theory will let us leapfrog rust with better security guarantees and simpler descriptions of programs.
                                                                            • Speaking of the descriptions, rust has a complicated syntax, not as complicated as C++, but complicated enough that it is unlikely that someone happens to come up with the same one or a compatible one. In that sense I believe that “rust the syntax” is not the future of software either since I believe we will use nicer ways of writing type signatures.
                                                                            • “rust the compiler” is also not the future of software, it does too much in one step. The collection of tools approach in the C world may have just been to try and make C safe after the fact but I believe the future of programming is using the same language and syntax in an interactive, hot-reloadable, repl-style as you then compile with dependent programming and proofs of correctness. The trick is to evaluate the same data in different ways.

                                                                            However I think rust is a perfectly fine step on the way towards the future and a sufficient tool for this gap that C++ was not managing to serve sufficiently well.

                                                                            1. 2

                                                                              There is reason to believe that our improved understanding of type theory will let us leapfrog rust with better security guarantees and simpler descriptions of programs.

                                                                              Would love to read more on this!

                                                                              1. 3

                                                                                I guess the most convincing evidence is something like ATS:

                                                                                But it’s what all the smart people are competing to solve so it’s hard to believe no one will ever improve on rust.

                                                                                Actually I kinda hope that rust is going to end up in a role like java, monolithic enterprise language with solid platform. The FOSS world is better served by lisps imo (and I realize that this statement is all kinds of stupid but it makes sense to me).

                                                                                1. 1
                                                                                2. 1

                                                                                  I think this comment is a better articulated version of what the linked post was trying to achieve.

                                                                                1. 2

                                                                                  What is the idiomatic way to get data out of a thread in this model? Or to gather outputs from each iteration of a parallel foreach?

                                                                                  1. 2

                                                                                    If we produce a value from the body of the foreach, then they’ll be assembled into a list, that we can capture in a local:

                                                                                      results =
                                                                                        foreach i in range(0, 5) {
                                                                                          pow(i, exponent)

                                                                                    Part 2 will also explore using mutexes and channels; threads can all write to the same channel, so we can get data from them earlier.

                                                                                  1. 2

                                                                                    That’s probably the main downside of AOT-compiled languages: you have to decide to inline or not during compilation, not during run time, and you can’t inline after dynamic linking. And function calls are relatively expensive.

                                                                                    I’m curious: are there examples of JIT compilers that can also monomorphize at run time? Probably JITs for dynamic languages do almost that, but are there JITs for static languages that do monomorphization?

                                                                                    1. 3

                                                                                      JVM probably does it.

                                                                                      Dart and TypeScript pretend to have types at “compile” time, and then run in a dynamically-typed VM with hidden classes optimization.

                                                                                      1. 3

                                                                                        Note that inlining itself is, in some sense, an AOT specific concept. In a JIT, you don’t need to care about function boundaries at all, you can do a tracing JIT.

                                                                                        The TL;DR is that you observe a program at runtime and identify a runtime loop: a sequence of instructions that is repeatedly executed. You than compile this loop as a whole. The loop can span multiple source functions/runtime function calls. In each function, the loop includes only the hot path parts.

                                                                                        So, a powerful JIT can transparently tear through arbitrary many layers of dynamic dispatch, the code is fully monomorphized in terms of instruction sequence.

                                                                                        What a JIT can’t do transparently, without the help from source level semantics, is optimize the data layout. If a Point is heap allocated, and a bunch of Points is stored in a HashMap, the JIT can’t magically specialize the map to store the points inline. Layout of data in memory has to be fixed, as it must be compatible with non-optimized code. The exception here is that, when an object doesn’t escape, JIT might first stack-allocate it, and then apply scalar replacement of aggregates.

                                                                                        1. 1

                                                                                          The exception here is that, when an object doesn’t escape, JIT might first stack-allocate it, and then apply scalar replacement of aggregates.

                                                                                          JITs can be a bit more clever with escape analysis: they don’t have to prove that an object never escapes in order to deconstruct its parts, they just have to make sure that any deconstruction of an object is never visible to the outside world. In other words, one can deconstruct an object temporarily provided it’s reconstructed at exit points from the JIT compiled code.

                                                                                          1. 1

                                                                                            For dynamic language JIT compilers—e.g. LuaJIT—the compiler has to insert type check guards into the compiled code, right? And other invariant checks. How much do these guards typically cost?

                                                                                            I can imagine how a large runtime loop (100s of instructions) could place guards only at the entry point, leaving the bulk of the compiled section guard-free. I can also imagine eliding guards if the compiler can somehow prove a variable can only ever be one type. But for dynamic languages like Lua it could be too hard to perform meaningful global analysis in that way. If you have any insight I’d appreciate it, I’m just speculating.

                                                                                            1. 2

                                                                                              I am not really an expert, but here’s my understanding.

                                                                                              Language dynamism is orthogonal to AOT/JIT and inlining. JITs for dynamic languages do need more deoptimization guards. The guards themselves are pretty cheap: they are trivial predicated branches.

                                                                                              As usual, what kills you is not the code, it’s data layout in memory. In a static language, an object is a bunch of fields packed tightly in memory, in a dynamic language, a general object is some kind of hash-map. Optimizing those HashMap lookups to direct accesses via offset is where major performance gains/losses are.

                                                                                          2. 2

                                                                                            I believe C#’s CLR does this, it acts like Java at compile time but then monomorphizes generics at run time.

                                                                                            1. 1

                                                                                              .NET generics use monomorphization for value types and a shared instantiation for reference types. .NET Generics Under the Hood shows some of the implementation details.

                                                                                            2. 2

                                                                                              To alleviate this, I think Clang has a feature to profile a program and to use that information to guide the optimisation when you compile it again. This is used by Apple when building Clang itself.