Threads for mqudsi

  1. 13

    Nice article!

    To C programmers, this can sound like a downside of Rust - there are many places in real-world code where you do in fact statically know that your index is in-bounds

    It’s worth mentioning that Rust is able to elide bounds checks in a lot of cases, but this is a product of the optimiser, not a static guarantee.

    1. 12

      This isn’t specific to Rust. LLVM is very good at this kind of elision; it’s part of the scalar evolution framework, which can compute the range of possible values for a loop induction variable. If the upper bound can be shown to be less than or equal to the length, the check is known to pass for all possible inputs and can be removed. The same benefit is visible with C++ bounds-checked types and with other safer front ends.
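
      For example, in a loop like the following (a rough sketch of the pattern, not taken from any particular codebase), the range of the induction variable only needs to be established once:

      // i ranges over 0..v.len(), so the optimiser can prove i < v.len()
      // and remove the bounds check inside v[i] for the whole loop
      pub fn sum_indexed(v: &[u64]) -> u64 {
          let mut total = 0;
          for i in 0..v.len() {
              total += v[i];
          }
          total
      }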

      That said, the folks at King’s College London who are working on CHERI Rust reported about a 10% speed-up from a compiler mode that disables all dynamic memory-safety checks (geomean across the benchmarks that they ran), so I’m not sure I believe this except for codebases where the bounds are all very static. That’s probably the case for encryption libraries, where basically everything except the input and output is fixed length, but encryption libraries are not even slightly representative of other code.

      That’s not to say that you shouldn’t use Rust. If you write correct C code then you will need to manually insert bounds checks in almost all of the same places and the performance delta will be much less (and probably favour Rust because the type system can communicate more information to optimisers).

      1. 3

        It’s a huge part of a small but important problem I have with rust: it relies extensively on llvm to make what it does possible. Without llvm’s “magic” there is no way that rust could compete with any other language, let alone C and C++ (in terms of performance). The compiler emits so much unoptimized, redundant IR that’s chock-full of branches and relies on llvm to figure out what can be stripped altogether. Unfortunately every time rust updates its llvm backend, there are regressions (and wins!) in terms of codegen. There are no guarantees in the way that llvm processes IR.

        1. 4

          I think that’s unavoidable for any nontrivial language. Clang has the same problem; the difference is that LLVM developers are more likely to test clang and there are some vaguely decent C/C++ benchmark suites. Rust could significantly improve this by having a nightly build bot that built and ran a load of Rust code with LLVM trunk and reported regressions.

          So far, I’ve seen very little engagement from the Rust compiler folks with LLVM upstream. There have been several threads on subjects like pointer provenance that will directly impact Rust, with no one saying ‘I work on the Rust compiler and this is what would work for us’.

          1. 2

            Without llvm’s “magic” there is no way that rust could compete with any other language, let alone C and C++ (in terms of performance).

            AIUI without LLVM’s “magic” there is no way C and C++ would compete in terms of performance.

        2. 1

          When you say “a lot of cases”, which cases do you have in mind? I read about Rust’s ability to elide bounds checks but so far I’ve never actually seen it happen. I don’t often program in Rust though.

          1. 8

            One case is when you have two array indexing operations in the same function where one index is known to be at least as large as the other, for example: https://godbolt.org/z/Y3sbhe9YG (note the cmpq $4, %rsi followed by the jbe to the panicking branch only appears once despite two indexing ops). At the end of the day this is “just” a pretty simple local optimization as far as LLVM is concerned.
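
            The code behind that link is roughly this shape (my own sketch here, so details may differ): the check on the larger index dominates, so the second one can be dropped.

            pub fn sum_two(slice: &[u64]) -> u64 {
                // checking slice[4] first proves slice.len() > 4, which already
                // implies the in-bounds condition for slice[2]
                slice[4] + slice[2]
            }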

            1. 3

              Thank you. A Compiler Explorer example too - perfect!

              Update: I wonder how often that situation comes up in practice. Do you need to have both indices be hard-coded? Rust defines integer overflow in release mode as two’s-complement wrapping, so you can’t be sure that x+1 is at least as large as x.

              This has two bounds checks: https://godbolt.org/z/rbGWhMrEY
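
              Roughly this kind of thing (my sketch, not necessarily the exact code in the link): because x + 1 can wrap around to 0 when overflow checks are off, proving x + 1 < slice.len() doesn’t prove x < slice.len(), so both checks stay.

              pub fn get_pair(slice: &[u64], x: usize) -> u64 {
                  // two separate bounds checks: the compiler can't assume x + 1 >= x
                  slice[x + 1] + slice[x]
              }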

            2. 1

              I have not confirmed this, but I am pretty confident that if you iterate over an array you don’t have bounds checks, because the iteration is already bounded.

              EDIT: here’s how this happens for vectors; it’s very hard to grok, but basically when iterating over a vector the code doesn’t do bounds checking (because we have already confirmed that the vector has N elements). So a lot of code that is “do thing over X collection” just completely removes these checks!
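
              A rough sketch of the kind of code I mean (mine, not from the linked source):

              fn sum(v: &[u64]) -> u64 {
                  // the iterator is bounded by the vector's own length, so there is
                  // no per-element bounds check to remove in the first place
                  v.iter().sum()
              }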

              I do think that a lot of this code is… like maybe Rust is expecting the compiler to do a lot of work to simplify a lot of this.

              1. 1

                Yes, using iterators is the only situation I’ve ever come across before in my own Rust programming where there are no bounds checks. However, I thought that was because Rust didn’t inject bounds checks in that situation, rather than because they were added but later elided.

          1. 4

            FWIW, to prevent the bug where a = b is slow for big types, Google’s C++ style guide used to mandate DISALLOW_COPY_AND_ASSIGN (which used to be DISALLOW_EVIL_CONSTRUCTORS I think) on all types (most types?)

            Instead you would do something like a = b.Clone() I believe

            Looks like that’s been gone for a while in favor of C++11 stuff, which I don’t really like:

            https://google.github.io/styleguide/cppguide.html#Copyable_Movable_Types

            A lot of good software was written in that style, but it has grown bureaucratic over time as the C++ language evolved.

            1. 6

              Indeed, C++ also makes it easy to create accidental copies, because it’s kind of the default. Relevant example:

              for (const auto accidental_copy : collection) {}
              

              Rust is like you describe with Google’s style guide: no implicit copying, but a Clone trait you can implement and call. It’s one of those things that make the language a bit harder, but mostly because it makes mistakes harder.
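
              A tiny sketch of what that looks like (hypothetical type):

              #[derive(Clone)]
              struct Big { data: Vec<u8> }

              fn example(b: Big) {
                  let a = b;          // a move: the buffer is not copied
                  // let c = b;       // error: `b` was moved, there is no implicit copy
                  let c = a.clone();  // copying is explicit and visible at the call site
                  drop(c);
              }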

              Even python seems to agree, in principle:

              > python -c 'import this' | grep Explicit
              Explicit is better than implicit.
              
              1. 8

                Rust’s plain-data types can opt in to implicit copies (derive Copy). But it’s really nice to have no copies as the default.

                Also, the ability to return a temporary read-only reference helps in cases like this, because you can give access to an element without copying it and without risk of the caller modifying it or being left with a dangling pointer.
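
                A small sketch of both points (hypothetical types):

                #[derive(Clone, Copy)]
                struct Point { x: f64, y: f64 } // plain data: copying is just a memcpy

                struct Path { points: Vec<Point> }

                impl Path {
                    // hands out a temporary read-only borrow: no copy, the caller
                    // can't mutate it, and the borrow checker keeps it from dangling
                    fn first(&self) -> Option<&Point> {
                        self.points.first()
                    }
                }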

                1. 2

                  You’re not supposed to derive Copy unless your type can be blitted (or just memcpy’d) rather than needing actual construction. I don’t know if implementing it when you’re not supposed to can break compiler assumptions (as I’ve observed Clone still gets called), but as a rule, it would avoid the worst of these slow copy cases.

                  1. 2

                    Yeah, Copy types are the exception that proves the rule. They aren’t a performance concern in the same way, because they can’t own a resource (because of that memcpy copyability: if it needs a destructor, it should need a custom clone function).

                    Because this exception is really a sensible distinction, you can still search the codebase for “clone” and be confident you aren’t copying resources, which is mostly what matters. It’s also more readable than C++, where you have to delete operators to see if they are being called, which is easier said than done for non-custom types.

                2. 4

                  C++ got here via a very complex path. First, C had structures and structure assignment was defined as memcpy. Removing that would break compatibility with any C headers that contained functions or macros that do structure assignment. Then C++ decided that struct and class were equivalent except for the default visibility, which meant that the rule applied for any C++ object as well as any C structure. But some C++ classes needed to do non-trivial things (not just memcpy) on copy, and so added copy constructors and copy assignment.

                  Smalltalk-derived languages assumed garbage collection and so required all objects to be allocated on the heap, but C++ did not want to do this (and couldn’t for C compatibility) and so came with the additional constraint that you had to be able to copy an object into space reserved by the caller of the copy operation (this includes copying objects into globals and onto the stack). This meant that you couldn’t implement something like clone(), where the return type depends on the subclass relationship; you had to model these things as separate allocation and initialisation.

                  This is further complicated by the fact that returning a structure is, in C++, modelled as copying the structure to the caller. For a long time, this copy has been optional (compilers may elide it: if you declare a local structure and then return it, it will actually be created in the space that the caller passes down the stack). This was fun because copy constructors can have side effects and so it was observable in the abstract machine whether the compiler did this optimisation. C++17 made this elision mandatory, but this meant that any object that you wanted to return as an on-stack value needed to have a copy constructor.

                  So now we’re in a state where you need copy constructors, and you need them to exist by default, and where deleting them breaks some language features. Now C++11 came along with auto types. C++ has syntactic modifiers to say ‘pointer-to-X’ and ‘reference-to-X’ but nothing (outside of exciting type_traits things) to say ‘bare-type-that-this-reference-thing-applies-to’. If auto deduces T, then you can modify it to a reference by adding &, but if auto deduces T& then it’s hard to modify it back to T. This meant that auto had to default to deducing T, which meant that it was very easy to accidentally copy things. Compilers try to warn about this.

                  There are also some interesting corner cases. You can make copy constructors explicit (so they won’t be invoked without explicit syntax saying that you meant to call them) but you can’t do the same for copy assignment. Herb’s new syntax cleans up a lot of this, but the Rust approach (which doesn’t have any of the legacy baggage) is much better than the current state of C++.

                  1. 1

                    I actually am glad that auto usually gives T and not &T in these contexts - imagine how hard it would be to tell at a glance if a copy was being made or not (even if avoiding the copy requires an extra symbol in the soup).

                  2. 3

                    Yeah I definitely like opt-in better than opt-out, especially since code is generated! Too bad Go doesn’t have a mechanism for either one.

                  3. 1

                    It is really hard to create a type in Go where copies like this are slow, because almost all primitive Go types have reference semantics. In practice the only way is to define types containing arrays (not slices) of substantial size. I haven’t seen a lot of Go code that has e.g. type x struct with buf [10000]uintptr or whatever.

                    1. 2

                      Yeah that is an interesting difference, but I wouldn’t say “really hard” – since the OP had the problem, and someone else pointed to a comment where Russ Cox fixed the same problem in the Go standard library for a 40 byte struct, which was presumably all primitives


                      Digression …. I think contrasting the languages is interesting:

                      1. Rust: opt in to a = b being a copy for value types
                      2. C++: opt out of a = b being a copy for value types
                      3. Go: a = b is always a copy for value types

                      Thinking about it a bit more, I think it’s a misfeature to reuse the same syntax for 2 different things

                      1. a = b is a pointer assignment – O(1) with respect to the size of the type
                      2. a = b is a value assignment – O(N) with respect to the size of the type

                      e.g. in my head I have been making a big list of things where “syntax doesn’t match semantics”, and it hadn’t occurred to me that this is one

                      I get your caveat that Go and C++/Rust structs aren’t equivalent though

                      1. 1

                        in my head I have been making a big list of things where “syntax doesn’t match semantics”,

                        I’m interested: what else is on that list?

                        1. 1

                          A perf bug I have fixed repeatedly in other people’s Python code is:

                          if item in L:  # linear search through list
                            pass
                          
                          if item in D:  # dict or set membership
                            pass
                          

                          If this is in a loop, then the first one is O(N^2), whereas the second is O(N), and I’ve seen that make scripts go from a 5-minute runtime to a second or so.

                          So that is very similar to a = b, the syntax is too close for operations that mean different things. Most people just look at the syntax and don’t “visualize” what’s going on underneath, which is very different.


                          A really trivial one that I remember someone having an extremely hard time with: static means 3 or 4 different things in C and C++:

                          static void f();  
                          static int x;
                          

                          A Java programmer simply couldn’t wrap his head around the fact that the annotation on f() means “local to this translation unit” / not exposed to the linker. It’s a gratuitous “pun” because they didn’t want to reserve a new keyword, which is extremely confusing.


                          A lot of these syntactic conflicts are about shell. For example it took me a long time to realize that quoting means different things in different contexts in shell!

                          echo $x
                          echo "$x"
                          echo hi > $x
                          echo hi > "$x"
                          a=$x
                          a="$x"
                          local a=$x
                          local a="$x"
                          

                          I never wrote a blog post about that, but most shell experts won’t be able to tell you the difference between them. Most of all because common shells don’t even agree on what they mean :)


                          Another example: People don’t know how to use the find command because its syntax doesn’t match what they are used to for expression semantics: https://www.oilshell.org/blog/2021/04/find-test.html

                          More:

                          https://github.com/oilshell/oil/wiki/Shell-WTFs

                          https://github.com/oilshell/oil/wiki/Language-Design-Principles

                  1. 1

                    My first and actually only reason for switching (from zsh) to fish way back when was that my terminal started up significantly faster with fish as my default shell rather than with zsh.

                    I’ve become a core fish maintainer since then, and while I cannot speak to zsh, we’re always profiling fish startup and have made massive gains since when I first adopted fish for the same reason.

                    So if you’re on the market for a faster shell, give fish a try!

                      1. 4

                        To accept these terms, you must be enrolled to make regular payments through any of the payment platforms pages listed above, in amounts qualifying you for a tier that includes a “patron license” or otherwise identifies a license under these terms as a reward.

                        How long does one have to be a patron? Indefinitely?

                        1. 2

                          Yes. Your license terminates if you stop being a patron.

                        2. 2

                          It does not seem to have been written or edited by a lawyer. The language is rather weak and I would be extremely hesitant to use this for anything of value.

                          1. 3

                            It was written by a lawyer who specialises in software licensing, @kemitchell.

                            1. 3

                              It does not seem to have been written or edited by a lawyer.

                              Made my day.

                              The language is rather weak and I would be extremely hesitant to use this for anything of value.

                              What strikes you as weak about the language?

                              1. 1

                                Most free software legalese is excessive in its verbosity.

                            1. 2

                              I wish they would slow down a bit more or focus more on a longer lasting LTS. We’re still on .NET Core 3.1. Fire and motion I guess.

                              1. 9

                                I believe .NET is now on a yearly release cycle, with LTS (three years) releases every two years. .NET Core 3.1 support ends in five weeks, so you should strongly consider moving to .NET 6 (the current LTS release).

                                As for the pace, I think they’ve hit a good balance. .NET Core was definitely a tumultuous period, but the future of .NET looks good! I’m glad we now have a dependable release cycle.

                                I don’t think I’d want them to support LTS versions for more than three years. That gives you a full calendar year of support to upgrade to the latest LTS release, so it should be reasonable to keep up with releases.

                                1. 4

                                  Compared to previous upgrades, .NET 3.1 to 5 or 6 is fairly painless on the “how much do you have to rewrite for the sake of rewriting” axis but there are new features and better ways of doing things available. The bigger issue is that some APIs behave mildly to radically differently beneath the hood in ways that can either not make even the smallest difference to you/your app or can be huge surprises when you discover them in production. YMMV, of course.

                                1. 6

                                  This is the tweet that I believe prompted the blog post. https://twitter.com/antirez/status/1587581541022142464

                                  I was one of the voices saying that I wish people had some other go-to data structure than linked lists for learning a new language.

                                  Mainly because I love Rust (especially the borrow checker), and I hate the idea of people getting turned off the language due to ownership rules messing with their first attempt coding something.

                                  1. 8

                                    Now I’m wondering how many C books and tutorials include buggy linked lists with behavior worse than O(n) append.

                                    There are some pretty bad C books, and even good ones do have errors. https://wozniak.ca/blog/2018/06/25/1/index.html

                                    1. 3

                                      Thanks for sharing that link. I fear I have permanently lost some coding IQ from stepping through the (mentally corrected) version of the code sample from that page.

                                    2. 7

                                      messing with their first attempt coding something.

                                      It’s totally fine if Rust is not a good choice for somebody’s first language. C++ is a horrible first language, as are several others.

                                      1. 2

                                        Yeah. I meant “coding something in Rust”.

                                        Whether Rust is a good first language, I don’t know. I think it could be, because there’s plenty of stuff you can do without being even close to “fight the borrow checker”.

                                        Maybe it comes down to whether teaching pass by reference vs value is something that’s OK to learn early or not.

                                      2. 3

                                        The borrow checker vs linked lists conflict is so unfortunate. I wonder if borrow checking hasn’t been done before, because every time a language researcher considered such design they’ve thought “it won’t even work on linked lists, obviously a dead-end idea”.

                                        1. 6

                                          I wonder if borrow checking hasn’t been done before, because every time a language researcher considered such design they’ve thought “it won’t even work on linked lists, obviously a dead-end idea”.

                                          I think the historical record (read: breadcrumbs through the literature) suggests otherwise; there’s been continual progress on this kind of thing stretching back 30 years (some original type theory work in the early 90’s, some work on applying it to systems programming in the context of Cyclone in the early 00’s, various research-grade projects, and Rust first showed up on the scene in 2010 but it takes a while for a language to become mature). I think the reason it hasn’t been done before is because it wasn’t actually trivial figuring out how, and it took a while to get there.

                                          1. 4

                                            I wonder if borrow checking hasn’t been done before, because every time a language researcher considered such design they’ve thought “it won’t even work on linked lists, obviously a dead-end idea”

                                            This makes me wonder if a prerequisite for Rust’s creation was a critical mass of people who all hold the opinion “eh, linked lists suck anyway, no big deal”.

                                            1. 10

                                              prerequisite for Rust’s creation was a critical mass of people who all hold the opinion “eh, linked lists suck anyway, no big deal”.

                                              I don’t know why this is how people think about rust. For me, as a low-level dev that does tons of crazy stuff that isn’t necessarily borrowck-friendly, I just figured “free safety checks when/where possible” then back to unsafe when I can’t. It’s not like you lose anything compared to before, you just don’t get to take advantage of the safety guarantees the language has to offer as much as you would like to at all times.

                                              (I also don’t end up reaching for unsafe as often as I thought I would.)

                                              1. 13

                                                The Rust community is pretty big now, and opinions on unsafe vary a lot. Some people write really twisted or inefficient code just to avoid unsafe {}. I guess it depends whether you see Rust as safer C, or faster Python.

                                                1. 1

                                                  I’ll bite my tongue here and refrain from saying anything other than I agree with you.

                                                2. 3

                                                  It’s not like you lose anything compared to before,

                                                  Ah, this is a very good way to look at it

                                            2. 2

                                              Anyone know what Rust blog post antirez was reading?

                                            1. 6

                                              One of the only times linked lists are “undoubtedly the right option” is if you’re writing some sort of intrusive data structure to avoid an allocation by reusing caller stack from another location. Not sure how many run into that often.

                                              Also in an osdev context, where you have drivers/filters/etc that chain operations to the next installed one, etc. and you want the control flow to be left to each filter/driver/etc in turn rather than making the decision top-down.

                                              1. 5

                                                  Linked lists are great for stack or queue data structures. They have O(1) insert at the start and end. Traversal is painful for caches, but inserting at one end and removing from either that end or the other end is very fast and does not require allocation. There are a lot of situations where those are useful properties. They also allow constant-time insertion into the middle.

                                                  Doubly linked lists allow constant-time insertion and removal from the middle without traversal. Array-like data structures are linear complexity for this. Again, this is a useful property for some use cases.

                                                Linked lists are really bad for cache usage (and TLB pressure) for traversal. If your algorithm requires iterating over the list anywhere near a hot path, they’re probably the wrong data structure.
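
                                                  For the stack case, a minimal sketch in safe Rust (note that this owning-Box version does pay an allocation per node, unlike the intrusive variants discussed elsewhere in the thread):

                                                  struct Node<T> { value: T, next: Option<Box<Node<T>>> }

                                                  struct Stack<T> { head: Option<Box<Node<T>>> }

                                                  impl<T> Stack<T> {
                                                      // O(1) push at the head; no existing elements are moved or reallocated
                                                      fn push(&mut self, value: T) {
                                                          let node = Box::new(Node { value, next: self.head.take() });
                                                          self.head = Some(node);
                                                      }

                                                      // O(1) pop from the head
                                                      fn pop(&mut self) -> Option<T> {
                                                          self.head.take().map(|node| {
                                                              self.head = node.next;
                                                              node.value
                                                          })
                                                      }
                                                  }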

                                                1. 1

                                                    The problem with the use cases you listed can sometimes be that the 8-16 bytes of extra memory per item are rather inefficient. For that, you often just use a linked list of arrays to amortize the pointer size across multiple items.

                                                  1. 1

                                                      You only save that overhead if the objects are inline in the arrays, but then removal is more expensive because you either defragment the array with copies (cost scales with object size) or you have a more expensive search for the next element (lots of data-dependent branches to hurt prediction).

                                                    Lists of arrays are great for things like text, where you want fast iteration and fairly cheap insertion. There, though, the goal isn’t to amortise the cost of the pointer (though that’s a nice benefit), it’s to improve cache locality on traversal. If the list elements are cache-line aligned then you can also vectorise most per-character operations with the widest vector unit available and get good performance. Even in this use case though, you often want random access and so use a richer indexing structure than a list.

                                                2. 4

                                                  Could you give a code example of that usecase? I’m a bit dense today.

                                                  1. 9

                                                    Not being dense at all; it’s not something most people use outside of low-level dev. Instead of having a list that exists independent of the object(s) and storing the objects in the list, the list is incorporated into the structure of the object (not very OOP!). The clever part is that these objects may be pre-existing in very different contexts/locations (such as the stacks of callers in different threads) and so you now have a way to navigate between all these objects without having dynamically allocated them anywhere.

                                                      It’s also used as an esoteric concept for writing performant low-level multithreading primitives or frameworks. Thread A wants to acquire a user-space mutex. Instead of the mutex containing a vector of some struct { thread_id, timeout, etc } and needing to dynamically allocate, reallocate, etc. all that each time a new thread is waiting on the mutex, Thread A reserves space for that structure (with an additional next pointer in it now) on its stack inside the mutex.wait() function but before the actual blocking wait is carried out internally. The mutex just has a single pointer current_waiter that’s either null or points to the first thread to block waiting for the mutex. Thread A either sets current_waiter to point to the area it reserved on its own stack or traverses the what-we-now-call-intrusive linked list to add itself at the end (no allocations needed).

                                                    Also, intrusive data structures let you do crazy things like store the same object in multiple lists/trees at once without needing to refcount.

                                                  2. 2

                                                    I do this quite often. It’s basically a stack-allocated linked list where nodes reside in different frames of the call stack. Here is one example: https://github.com/build2/libbutl/blob/master/libbutl/diagnostics.hxx#L284

                                                    Another common case where I tend to use a list instead of a vector is when I need iterator and/or node stability.

                                                    1. 1

                                                      (Disclaimer: 30 years systems programming, still going strong…)

                                                      “One of the only times.. intrusive data structure.. stack re-use”

                                                      I don’t necessarily agree with this framing. I’d prefer to say, you can make extreme gains by using a linked list, for as long as you well-order your data according to performance requirements .. whereby the stack is hardly relevant as anything but pointer angels pointing to a haven of text segments.

                                                    1. 11

                                                          I find it funny that RST (reStructuredText) is never mentioned in these debates.

                                                      1. 10

                                                        On the contrary, the only time I ever see it mentioned is precisely in these debates.

                                                        1. 4

                                                          Rst has its problems (terrible headers, no nested inline markup) but the extensibility you get is just wonderful.

                                                          1. 2

                                                            Yes, glad someone mentioned the headers.

                                                            A lot of python stuff still uses RST.

                                                            1. 1

                                                              I don’t really care about extensibility if it means every time I want an in-line code block with part or all of it linked to another document I need to write my own role. Not supporting nested in-line markup is just brain-dead.

                                                            2. 3

                                                              It was a more robust and better designed option, retaining essentially the same mindset as markdown. It is unfortunate that markdown won the popularity contest. But marketing and hype dictated the outcome.

                                                              1. 8

                                                                But marketing and hype dictated the outcome.

                                                                It’s funny, but OG Markdown was just a dude with a then popular blog and a whole mess of Perl: https://daringfireball.net/projects/markdown/

                                                                Markdown is a text-to-HTML conversion tool for web writers.

                                                                The overriding design goal for Markdown’s formatting syntax is to make it as readable as possible. The idea is that a Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions. While Markdown’s syntax has been influenced by several existing text-to-HTML filters, the single biggest source of inspiration for Markdown’s syntax is the format of plain text email.

                                                                (my emphasis)

                                                                The list of filters (from https://daringfireball.net/projects/markdown/syntax#philosophy), links from the original:

                                                                Gruber never intended MD to become the be-all and end-all of authoring. In fact I think that’s why he didn’t let CommonMark use the Markdown name.

                                                                1. 2

                                                                  Gruber never intended MD to become the be-all and end-all of authoring. In fact I think that’s why he didn’t let CommonMark use the Markdown name.

                                                                  Yes, also because he didn’t want Markdown to be a single standard. Which is why he has no problems with “GitHub-flavored” markdown and didn’t want CommonMark using the markdown name.

                                                              2. 3

                                                                RST is mentioned as an inspiration to MD, see my comment down below https://lobste.rs/s/zwugbc/why_is_markdown_popular#c_o1ibid

                                                                1. 1

                                                                  I recently fell down a bit of a Sphinx rabbit hole that got me into RST after years of Markdown and org. I really really appreciate how easy they make it to add your own plugins that can manipulate the parsed tree. That project is temporarily on the shelf but I’m hoping to get back into it when the snow falls more.

                                                                1. 0

                                                                  Do we allow paywalled articles now?

                                                                  1. 2

                                                                    I use a regex redirect extension to rewrite Medium URLs to scribe.rip, the alternative viewer-respecting frontend to Medium.

                                                                    Here you go: https://tomaszs2.scribe.rip/how-rust-1-64-became-10-20-faster-on-windows-3a8bb5e81d70

                                                                    That said, if you already know what PGO is, there’s no point in reading the article.

                                                                    1. 1

                                                                      My apologies, I didn’t realize there was a paywall… loads just fine for me.

                                                                      1. 1

                                                                        Medium and its A/B tested monetization is fickle and part of the reason nobody should use it for publishing content.

                                                                    1. 3

                                                                      Writing a debugger as a process parasite is an interesting exercise, on top of how small the distance is between such a tool and an implant. I took this to quite some lengths a while back, and it’s something that I still use on a daily basis: https://arcan-fe.com/2020/02/10/leveraging-the-display-server-to-improve-debugging/

                                                                      1. 3

                                                                        That is truly awesome and refreshing to see from a holistic “everything really meshes together” approach. I have to try Arcan.

                                                                      1. 1

                                                                        I saw this before; it looks for some predefined log message formats and turns them into structured data, which, surprise surprise, compresses much better. Honestly it seems like you could accomplish the same thing with logstash, fluentbit, etc. by defining a parser and turning messages into JSON. Any moderately intelligent backend should be able to compress data that looks largely the same pretty effectively.

                                                                        1. 1

                                                                          The columnar storage aspect also contributes greatly to the increased compressibility.

                                                                        1. 3

                                                                          Wow! It seems to me you’re really painting yourself into a corner with your requirements, specifically this one:

                                                                          A semaphore can be created/declared with an “initially available concurrency” and a “maximum possible concurrency,” which may differ (indeed, “initially available concurrency” is often zero),

                                                                          There may be more sophisticated variants, but by default, a semaphore is little more than a thread-safe integer with a claim/release API. The “available concurrency” of a semaphore is what you declare it to be upon instantiation, and by default isn’t mutable. If you drop the mutability requirement – which I think is appropriate – is the problem more tractable?
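
                                                                          Something like this is what I have in mind (a minimal fixed-capacity sketch, not the article’s implementation):

                                                                          use std::sync::{Condvar, Mutex};

                                                                          // a thread-safe counter with a claim/release API; capacity is fixed at creation
                                                                          struct Semaphore {
                                                                              available: Mutex<usize>,
                                                                              cond: Condvar,
                                                                          }

                                                                          impl Semaphore {
                                                                              fn new(capacity: usize) -> Self {
                                                                                  Semaphore { available: Mutex::new(capacity), cond: Condvar::new() }
                                                                              }

                                                                              fn claim(&self) {
                                                                                  let mut n = self.available.lock().unwrap();
                                                                                  while *n == 0 {
                                                                                      n = self.cond.wait(n).unwrap();
                                                                                  }
                                                                                  *n -= 1;
                                                                              }

                                                                              fn release(&self) {
                                                                                  *self.available.lock().unwrap() += 1;
                                                                                  self.cond.notify_one();
                                                                              }
                                                                          }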

                                                                          1. 1

                                                                            Only partially. The “available concurrency” is necessarily always mutable, no? It’s true that the max concurrency is optional and makes some things harder (like all of release) but that value is immutable (or never observably mutated). Only the available concurrency count is mutated, as it has to be to track remaining (unreturned) borrows.

                                                                            pthread semaphores are indeed more basic; I modeled the features of this semaphore after the one in .NET which lets you limit the maximum.

                                                                            1. 1

                                                                              The “available concurrency” is necessarily always mutable, no? It’s true that the max concurrency is optional and makes some things harder (like all of release)

                                                                              Ah, sorry! I mis-wrote:

                                                                              The ~~available concurrency~~ maximum concurrency of a semaphore is what you declare it to be . . .

                                                                              Apologies.

                                                                              pthread semaphores are indeed more basic; I modeled the features of this semaphore after the one in .NET which lets you limit the maximum.

                                                                              I understand a semaphore to be a primitive that enforces a (well-defined) maximum level of concurrency for something. Given that definition, a semaphore with a dynamic capacity doesn’t seem to be a semaphore at all. Say you take a claim when S is at capacity 100, and, while you hold that claim, the capacity of S reduces to 10. So S claims it’s enforcing a limit of 10, but it isn’t! There are as many as 100 outstanding claims, and there’s no way to know when those claims will be returned.

                                                                              You can address those problems by layering on complexity — you can play some games with claim timeouts, or by reducing the guarantees provided by the type, or maybe some other stuff. But I don’t think that’s a semaphore any more, I think it’s actually an ersatz worker pool (without the workers).

                                                                              1. 2

                                                                                Thanks for clearing that up! (I also meant to link to the Windows semaphores, which are a better overview, rather than the .NET semaphores, but they’re one and the same really. They unlock more features than POSIX semaphores but aren’t actually too different - a pthread semaphore is a Win32 semaphore (or the semaphore demonstrated in the article) with a hard-coded max count of SEM_VALUE_MAX.)

                                                                                The maximum concurrency even for the semaphores in TFA is actually still hard-defined and unchanged, even if it can be (temporarily) artificially reduced - and this is actually no different from even a pthreads semaphore. To use unix lingo, when a caller calls sem_wait() you have no guarantee that they’ll ever call sem_post(). I’m not sure if you consider sem_init(10) to create a semaphore with a maximum of 10 or with a maximum of SEM_VALUE_MAX, which of course makes a difference to how everything is interpreted.

                                                                                But the issue you mention

                                                                                So S claims it’s enforcing a limit of 10, but it isn’t! There are as many as 100 outstanding claims, and there’s no way to know when those claims will be returned.

                                                                                Is actually exactly what I address in the article, but it affects only the first prototype Semaphore modeled in the post. In the final prototype and the actual library, I solve this via some additional bookkeeping, without any timeouts, etc., by adding a separate variable to track the total number of concurrency tokens (both available and issued) that is incremented when a free-standing call to release() is made but not incremented when a release() comes after/is paired with a wait()/acquire() - something the Rust type system enables me to do and enforces for me.

                                                                                But I don’t think that’s a semaphore any more, I think it’s actually an ersatz worker pool (without the workers).

                                                                                I don’t necessarily disagree - this is what makes Win32 semaphores so much more powerful than their unixy counterparts (though they both just call themselves “semaphore”) but the semaphore that I demonstrate takes that utility a step further by providing additional safety guarantees that make it possible to use this “enhanced semaphore” type without the footguns.

                                                                                1. 1

                                                                                  But the issue you mention . . . Is actually exactly what I address in the article

                                                                                  I saw that, but the picture isn’t quite complete in my head. (Probably because I lack sufficient Rust knowledge!) If the maximum concurrency is held fixed, then I know a priori the limit which is enforced. But if I can change that limit over time, how do I know what limit is currently in effect?

                                                                                  1. 1

                                                                                    The immutable maximum concurrency a semaphore was initialized with is always the limit that’s in effect as far as any individual caller is concerned.

                                                                                    The discussion about artificially further reducing concurrency is kind of a red herring since it’s really just a special case of “you never know at any given time before calling wait() how much currently available concurrency is still left (not “what is the max concurrency”) in the semaphore to borrow” because any other number of threads/stacks could have called wait() and not yet called release() before you.

                                                                                    In normal cases, you have n threads always calling wait() and once available concurrency is given to them, they spend variable time t bounded by (0,∞) in the restricted-concurrency scope doing some work before calling release() and returning their concurrency token to the semaphore pool, making it available for another thread.

                                                                                    Artificially reducing concurrency (sem.wait().forget()) is functionally equal to doing just that but as t → ∞. Remember that we also have a free-standing try_release() so just like at any time some thread can call sem.wait().forget() to reduce the concurrency, any thread can also call sem.try_release() to increase the concurrency effectively turning a previous sem.wait().forget() (from the same or any other thread) into sem.wait(); ...; sem.release().

                                                                                    No matter how many threads call sem.wait().forget(), the available count will never drop below zero (by definition). Likewise, no matter how many threads call sem.release() or sem.try_release() before or after calling sem.wait(), the available count will never exceed max_count.

                                                                                    To clarify further, since there’s no concept of ownership, this makes calls to release() completely fungible no matter if thread x called release() or if it was thread y that did. This means there’s no difference between these two cases:

                                                                                    fn t1(sem: &Semaphore) {
                                                                                        sem.wait();
                                                                                        std::thread::sleep_ms(rand(0, u32::MAX));
                                                                                        sem.release();
                                                                                    }
                                                                                    
                                                                                    fn t3(sem: &Semaphore) {
                                                                                        std::thread::sleep_ms(1000);
                                                                                        sem.wait();
                                                                                        std::thread::sleep_ms(rand(0, u32::MAX));
                                                                                        sem.release();
                                                                                    }
                                                                                    

                                                                                    where one thread obtains a semaphore first and holds it for some variable amount of time before returning it, and another thread later tries to obtain that same semaphore, and the following:

                                                                                    fn t1(sem: &Semaphore) {
                                                                                        sem.wait().forget();
                                                                                    }
                                                                                    
                                                                                    fn t2(sem: &Semaphore) {
                                                                                        std::thread::sleep_ms(rand(1500, 3000));
                                                                                        sem.try_release();
                                                                                    }
                                                                                    
                                                                                    fn t3(sem: &Semaphore) {
                                                                                        std::thread::sleep_ms(1000);
                                                                                        sem.wait();
                                                                                        std::thread::sleep_ms(rand(0, u32::MAX));
                                                                                        sem.release();
                                                                                    }
                                                                                    

                                                                                    At all times, regardless of how many calls to sem.wait().forget() or sem.try_release() are made, the available concurrency is still bounded by [0, sem.max_count]. The minimum is always 0, the maximum is always the immutable max_count. Any number of threads/stacks could have called wait() and not yet called release() before t3 tries to obtain the semaphore or any number of threads/stacks could have called wait().forget() while any number of threads may have called try_release() successfully before (or while) t3 tries to obtain the semaphore.

                                                                                    1. 1

                                                                                      The immutable maximum concurrency a semaphore was initialized with is always the limit that’s in effect as far as any individual caller is concerned.

                                                                                      There are two classes of callers for a semaphore: the “users” who claim and release it, and the “owners” who use it to assert (or interrogate) constraints on other resources. The first class of callers doesn’t care at all about this limit; their API is just claim and release. I’m speaking about the other class of callers, who need to know how many concurrent actors are presently allowed to access the resource. Concretely: what method returns the current effective concurrency limit?

                                                                                      1. 1

                                                                                        What I’m trying to say is that there is no “current effective concurrency limit” that in any meaningful way differs from “currently available concurrency.” The value of sem.max_count is always the hard limit and the value of sem.count is the currently available concurrency (for your “owners” or your “users” alike). The method to retrieve that wasn’t part of the public api, but I’ve just corrected that.

                                                                                        1. 1

                                                                                          OK, so this matches my understanding, but it seems to conflict with the earlier statement that

                                                                                          A semaphore can be created/declared with an “initially available concurrency” and a “maximum possible concurrency,” which may differ (indeed, “initially available concurrency” is often zero)

                                                                                          “Available concurrency” is the difference between the number of actors currently using the semaphore, and the capacity limit of the semaphore. At creation, the number of actors using the semaphore is gonna be zero, like, always. If it’s not zero, that means that the semaphore has loaned out flags to things that aren’t actually interacting with the resource it’s supposed to be guarding, I think? If so, that’s a problem, right? Because it’s my understanding that maximum concurrency minus available concurrency is meant to be equal to the number of actors currently interacting with the guarded resource. Right? No? Is my understanding incorrect?

                                                                                          1. 1

                                                                                            It depends on what you are modeling with the semaphore/what it is marshalling access to.

                                                                                            If the resource being modeled is a resource that exists independent of your code, then you’ll create a semaphore with initial == max_count (e.g. the semaphore represents the number of physical cores). But there are many cases (as described in TFA, e.g. rate limiting tokens, earned resources, etc.) where your code dynamically creates instances of the object (up to max_count) and uses the semaphore to marshal access to these instances, either before or while the instances are still being created. In these cases, the assumption in your previous post no longer holds.

                                                                                            So this semaphore lets you represent the traditional case by setting max and initial to the same value or handle much trickier and more advanced resource marshalling by letting them differ, in which case you need to reframe the context in which you understand the semaphore.

                                                                                            1. 1

                                                                                              It depends on what you are modeling with the semaphore/what it is marshalling access to.

                                                                                              I don’t understand why this would matter. A semaphore is an abstraction with no concrete relation to the resource it guards.

                                                                                              But there are many cases (as described in TFA, e.g. rate limiting tokens, earned resources, etc) where your code dynamically creates instances of the object (up to max_count) and uses the semaphore to marshal access to these instances, either before or while the instances are still being created. In these cases, the assumption in your previous post no longer holds.

                                                                                              I see. I’ve always understood a semaphore to be a very simple primitive, always of fixed capacity, and without capabilities like you describe here. It seems to me like this use case is better modeled by a worker or resource pool. But, who knows!

                                                                          1. 10

                                                                              When presented with an additional responsibility/task x, there are two possible scenarios that can play out. /u/cetera seems to focus only on the first one, where you take on more work than you can carry (implicit read: that someone else can carry instead) and you should think twice or say no; but the other scenario is that if you don’t do it, no one else will, in which case maybe you carrying more than you can handle (and dropping the ball some, because it IS a lot) is still globally better (at least for everyone else) than you not doing it and no one else doing it.

                                                                            I’m not talking about fancy $work where the company has to make sure tasks get done and hire personnel as needed, but more big picture. Maybe it’s somewhere you volunteer. Maybe it is for work but it’s a small company that genuinely can’t afford to hire someone else. Maybe it’s just among friends. Maybe it’s your startup and the job has to get done one way or the other. You know, life.

                                                                            1. 3

                                                                              I agree, but there’s only so long you can overburden yourself before burning out. IMO if you find yourself regularly having too much work, the first step is to find a way to shed some of the load; if the company really can’t afford anyone, look for a new job; if it’s your startup, burn your money instead of your time; if it’s for life, hire someone to do the work.

                                                                              If you don’t, you’ll still end up dropping the work, but you won’t get a choice of what to drop, you’ll burn out all at once.

                                                                              1. 4

                                                                                I absolutely agree but wanted to point out that in real life there are tradeoffs involved and a cost that goes beyond “just say no.” It’s really very easy to sit there and recommend “don’t bite off more than you can chew,” and I’m sure there’s no one that will disagree with that academically speaking, but the real world isn’t so cut and dried.

                                                                            1. 2

                                                                              If I’m understanding correctly, this requires you to be using Tailscale already, right? I.e. the client-proxy connection is secured only when they are both peers in a Tailscale network.

                                                                              1. 4

                                                                                Yes, at least on the client-end. The server can be on the open internet and not part of a Tailscale network:

                                                                                As the name suggests, this proxy sits between your Postgres client and your cloud-hosted database. It only accepts connections from clients over Tailscale, by using our tsnet library.

                                                                                This is, of course, pure marketing. There is no vulnerability in making this pgproxy also listen (TLS or no TLS) on localhost and via unix domain sockets so that you can deploy it on the same machine your app server is deployed to and use it to enforce proper TLS to the remote postgres instance. That would also reduce latency as compared to connecting from your app server to pgproxy over Tailscale and then connecting from there to your postgres instance, but then they wouldn’t make any money off of it.

                                                                                1. 10

                                                                                  The more charitable, and true outlook: pgproxy is something we wrote for our own internal use, and then subsequently open-sourced. It uses Tailscale because that’s what fit best into our production infrastructure.

                                                                                  I’ll also note that simply making this listen on localhost doesn’t solve the problem in a satisfying way, because in doing that you can no longer have an allowlist on the server side that is restricted to a single entity that you know is doing TLS correctly. Instead, you’re relying on every client (including desktop and CLI clients run by humans) to remember to connect to the localhost proxy, and not to the server directly. Unfortunately the latter is the most obvious thing that people would do if they were trying to connect to the database.

                                                                                  To fix that, you need to move the proxy away from the clients somehow, so that you can wield the single tool most hosted DBs offer (IP allowlists) in a way that prevents this security regression. And once you’ve moved the proxy off the client’s machine, you’re right back to the problem of securing the client<>proxy datapath, with the constraint that you cannot rely on the postgres client to get it right. The simplest way to achieve that is a VPN. And yes, we built this tool to work over Tailscale, since that’s naturally the VPN we’re using.

                                                                                  With all that said, the code’s open source, so you’re more than welcome to take the couple hundred lines that implement the proxying logic, and stick whatever TCP listener and method of securing the client connection on it. Should be less than an hour of work once you’ve figured out the transport security bit.

                                                                                  1. 2

                                                                                    To address the problem from another angle: the assumption is you’re using a hosted database, so you can’t just put the DB on Tailscale too, right? Seems like in cases where you control the DB server that would be the simplest choice.

                                                                                    An issue I ran into recently was that I wanted to run Postgres on Fly.io and connect to it from AWS Lambda, but I couldn’t figure out how to get Lambda to connect to the Fly.io WireGuard network, so I had to give up and use RDS instead. It seems like it should have been a trivial problem if there was just “Let’s Encrypt Postgres” for servers and clients.

                                                                                    1. 1

                                                                                      It uses Tailscale because that’s what fit best into our production infrastructure.

                                                                                      That’s fair, but the marketing spin on this being the only secure way to use postgres isn’t.

                                                                                      [..] because in doing that you can no longer have an allowlist on the server side that is restricted to a single entity that you know is doing TLS correctly.

                                                                                      There’s no risk to the server that only accepts TLS connections from clients that don’t validate the TLS certificates (this would be akin to claiming the existence of curl -k makes my HTTPS server insecure). Configuring your server so it only accepts connections from a single address associated with the pgproxy instance that only accepts connections over Tailscale is at best an attempt at encouraging best practices by clients and for the safety of the clients but doesn’t affect the server.

                                                                                      The simplest way to achieve that is a VPN.

                                                                                      I would argue that this means people should really just look into putting their postgres nodes on a VPN and only listening for connections on the VPN subnet (whether that’s Tailscale or otherwise), leaving pgproxy aside altogether. (Which would also be in your benefit, anyway.)

                                                                                      1. 9

                                                                                        the marketing spin on this being the only secure way to use postgres isn’t.

                                                                                        That’s certainly not what I intended when I wrote the article. I intended it as “here’s a problem you can encounter in some system architectures, and here’s a thing we made to help with that.” As you say, this is definitely not the only way of doing it, and it has a bunch of tradeoffs that may or may not match what someone else needs. In my defense, the time from blank page to publishing this article was maybe 1h, so I may have gotten the tone and message wrong in places.

                                                                                        There’s no risk to the server that only accepts TLS connections from clients

                                                                                        This is technically true, but also misses the more important point: what you’re trying to keep safe is the data stored within the database server, not the clients or server per se. Allowing clients to connect insecurely compromises that security, even if both endpoints are completely happy doing so.

                                                                                        Yes, you can achieve a secure transport in a number of ways, but the downside of most strategies is that it relies on all clients being correctly configured all the time, and if you forget one or someone spins something up in a default configuration, you lose. That’s a very brittle setup that’s ripe for regressions, in my experience.

                                                                                        I would argue that this means people should really just look into putting their postgres nodes on a VPN

                                                                                        The article opens with exactly this, yes :). There’s a realpolitik thing going on here: yes, in the abstract ideologically pure system architecture, the database would be hosted somewhere that can only be accessed securely in the first place. However, as the world moves towards more hosted as-a-service type things, that line is becoming increasingly hard to hold, and in a lot of realistic deployments the question becomes “okay, how do we secure this architecture”, rather than preventing it from happening.

                                                                                        That’s what happened to us, incidentally: we had a database running directly on Tailscale for some business analytics stuff, but then the need for fancier analytical capability, combined with lack of time to do a good job of self-hosting, took us to a TimescaleDB hosted database - and so we ended up having to secure a DB connection with an internet leg.

                                                                                1. 3

                                                                                  This is the one and only font I’ve ever purchased for personal use. I use it in all my coding and terminal environments and love every second of it.

                                                                                  For those not wanting to shell out $ for a font, Fantasque Sans Mono is relatively similar.

                                                                                  1. 4

                                                                                    There’s also Comic Mono.

                                                                                    I kinda like Fantasque too.

                                                                                    If you squint, Cascadia Code is closer to Comic Sans than people might be comfortable with…

                                                                                    1. 3

                                                                                      If you squint, Cascadia Code is closer to Comic Sans than people might be comfortable with…

                                                                                      You’re not wrong and no squinting is needed. I can’t believe the same division behind “Consolas” was behind “Cascadia Code” (although, come to think of it, it’s probably not the same division at all!).

                                                                                      1. 1

                                                                                        Yeah, pretty sure it’s intended to be an update to “FIXEDSYS”, but it definitely has Comic Sans energy. It’s okay in the terminal, but I can’t look at it in a text editor without cringing.

                                                                                    2. 2

                                                                                      Can I ask why you feel that way? What do you love about it? Just its “friendly” feel or actual, demonstrable typographic qualities like readability, kerning, aesthetics, etc?

                                                                                      (Because I wouldn’t download this even if it were free.)

                                                                                      1. 2

                                                                                        To me there’s something about it that’s easier on my eyes and brain…I can’t exactly explain it. AFAIK I’m not dyslexic, but to me it feels like there’s a “softening” of cognitive load when reading it.

                                                                                        Clearly this is personal and I can see how some would see it as hot garbage :-)

                                                                                        1. 2

                                                                                          Iosevka Comfy seems to have the same feel you’re talking about without going overboard like Comic Sans. I don’t really like it, myself, but lots of people do.

                                                                                      2. 1

                                                                                        I’m curious if this has the same reading benefits for dyslexic people that Comic Sans does. And whether it captures the typographic abominations of Comic Sans or whether it’s actually closer to Chalkboard.

                                                                                        1. 1

                                                                                          Just from looking at the samples, (at least in the heavier font styles) I think it’s closer to a Comic Sans MS than the other “kid-friendly” handwriting fonts.

                                                                                      1. 4

                                                                                        While a strictly binary semaphore (max concurrency == 1) can guarantee that there will never be multiple writers accessing a memory region, there’s not much theoretical benefit to such a binary semaphore over a mutex – in fact, they’re interchangeable

                                                                                        Not as the terms are normally used, though I admit that there’s some ambiguity. The key difference in common parlance is that a mutex must be released by the same thread that acquired it. A binary semaphore does not have this restriction. This means that it is more flexible (you can acquire the semaphore, then run a pipeline of operations on some data, with each stage on a separate thread, which you can’t with a mutex). In contrast, the fact that a mutex has a notion of an owning thread lets it be extended in several useful ways:

                                                                                        • Error-checking mutexes, which abort if unlocked from the wrong thread. These are probably not useful in Rust, where I’d expect mutex ownership to be tracked by creating a mutex owner object (they’re not useful in modern C++ either, where lock_guard and unique_lock give the same abstractions).
                                                                                        • Recursive mutexes, where your goal is to protect against concurrency but not reentrancy. These can be locked multiple times by the same thread and are unlocked only when that thread has unlocked the same number of times that it locked.
                                                                                        • Priority-propagating mutexes, where a high-priority thread that tries to lock a mutex can loan its priority to a low-priority thread that currently owns it (without this, it’s easy to build priority inversion, where a low-priority thread is starved by a medium-priority thread and so doesn’t make the progress required to allow a high-priority thread to run).
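
                                                                                        To make the pipeline point above concrete, here’s a small sketch (using tokio’s semaphore, which is my own choice of example and not something from the article): the permit is acquired in one task and released, by dropping it, in another, which is exactly the thing a mutex’s owning-thread requirement forbids:

                                                                                          use std::sync::Arc;
                                                                                          use tokio::sync::{mpsc, OwnedSemaphorePermit, Semaphore};

                                                                                          #[tokio::main]
                                                                                          async fn main() {
                                                                                              // At most two items in flight across the whole pipeline.
                                                                                              let sem = Arc::new(Semaphore::new(2));
                                                                                              let (tx, mut rx) = mpsc::channel::<(u32, OwnedSemaphorePermit)>(8);

                                                                                              // Stage 2: finishes the work and, by dropping the permit, releases the
                                                                                              // semaphore from a different task than the one that acquired it.
                                                                                              let stage2 = tokio::spawn(async move {
                                                                                                  while let Some((item, _permit)) = rx.recv().await {
                                                                                                      println!("stage 2 handling {item}");
                                                                                                      // _permit dropped at the end of this iteration -> semaphore released
                                                                                                  }
                                                                                              });

                                                                                              // Stage 1: acquires the permit and hands it downstream along with the data.
                                                                                              for item in 0..5u32 {
                                                                                                  let permit = sem.clone().acquire_owned().await.unwrap();
                                                                                                  println!("stage 1 handling {item}");
                                                                                                  tx.send((item, permit)).await.unwrap();
                                                                                              }
                                                                                              drop(tx);
                                                                                              stage2.await.unwrap();
                                                                                          }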

                                                                                        Looking at the implementation in the article, there’s a lot of subtlety hidden in the standard library. Consider the following sequence:

                                                                                        1. Thread A sees the value 0.
                                                                                        2. Thread B increments from 0 to 1.
                                                                                        3. Thread B sends a wake event.
                                                                                        4. Thread A waits.

                                                                                        This is probably handled by AutoResetEvent, but this is generally something that’s very difficult to build an abstraction over because you have to either deal with spurious wakes or ABA problems (C++20 made the decision to force programmers to deal with ABA problems; the underlying OS APIs force programmers to deal with spurious wakes. I’ve no idea why C++20 made the choice it did, spurious wakes are generally much easier to deal with and they’ve made the standard library deal with this in a way that must introduce ABA problems). AutoResetEvent has two states and a limited transition state, so it’s fine as a primitive, but I’m not convinced that this is safe in the presence of more complex orderings as used here. Normally, I’d expect a semaphore to have some reasoning about why it can’t lose wakes.

                                                                                        I don’t understand why they want a try-release. I’d expect the acquire to hand back a mut object and release to consume one of these. If the returned thing is sendable then you can pass it between threads but only something that has acquired a semaphore can release it. This is what the semaphore abstraction is meant to do: it’s a bucket of semaphore flags that let you take one if there is one and put one back if you have one. It doesn’t couple the flag to the identity of the person who takes it, you can pass it around however you want, but only flags that were previously removed from the bucket can be put back. It looks as if this is what they do in the next step, but I don’t understand the digression around the edge. You get there directly if you think about the physical system that it’s trying to model.

                                                                                        1. 2

                                                                                          [A semaphore] doesn’t couple [a taken flag] to the identity of the person who takes it, you can pass it around however you want, but only flags that were previously removed from the bucket can be put back.

                                                                                          I’ve never seen a semaphore that uniquely identifies the taken and returned things. Am I missing out? 😅

                                                                                          (I’ve always seen take as an operation, with neither parameters nor return values, that mutates internal state in one direction; and put as an operation that mutates the same state in the opposite direction.)

                                                                                          1. 1

                                                                                            I’ve never seen one because I’ve only seen semaphores used in languages like C that lack linear types and you need linear types for this to be usefully implementable (if the returned thing can be copied then you may as well use an integer). The original abstraction is intended to model a bucket of flags (a synchronisation mechanism used for mutual exclusion on railways) though and so the physical system that it’s modelling does have this property.

                                                                                            If I were implementing a semaphore in a new language, I would start by thinking about how I map a model of the physical system into that language, rather than how I would approximate the API that another language’s implementation has. In a language with linear types, you can do better.
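
                                                                                            As a rough sketch of what that looks like in Rust (mine, not from the article): move semantics get you most of the way there, because a non-Clone flag type can only be put back if it was previously taken out:

                                                                                              use std::sync::atomic::{AtomicUsize, Ordering};

                                                                                              /// A flag taken from the bucket. It is deliberately not Copy or Clone, so
                                                                                              /// the only way to obtain one is to take it out of a bucket.
                                                                                              pub struct Flag(());

                                                                                              pub struct FlagBucket {
                                                                                                  available: AtomicUsize,
                                                                                              }

                                                                                              impl FlagBucket {
                                                                                                  pub fn new(flags: usize) -> Self {
                                                                                                      FlagBucket { available: AtomicUsize::new(flags) }
                                                                                                  }

                                                                                                  /// Take a flag out of the bucket if one is available (non-blocking).
                                                                                                  pub fn try_take(&self) -> Option<Flag> {
                                                                                                      let mut cur = self.available.load(Ordering::Relaxed);
                                                                                                      while cur > 0 {
                                                                                                          match self.available.compare_exchange_weak(cur, cur - 1, Ordering::Acquire, Ordering::Relaxed) {
                                                                                                              Ok(_) => return Some(Flag(())),
                                                                                                              Err(seen) => cur = seen,
                                                                                                          }
                                                                                                      }
                                                                                                      None
                                                                                                  }

                                                                                                  /// Put a flag back. Consuming `Flag` by value means the caller must have
                                                                                                  /// been handed one that was previously taken; which thread returns it is irrelevant.
                                                                                                  pub fn put_back(&self, _flag: Flag) {
                                                                                                      self.available.fetch_add(1, Ordering::Release);
                                                                                                  }
                                                                                              }

                                                                                              fn main() {
                                                                                                  let bucket = FlagBucket::new(1);
                                                                                                  let flag = bucket.try_take().expect("one flag available");
                                                                                                  assert!(bucket.try_take().is_none()); // bucket is now empty
                                                                                                  bucket.put_back(flag); // flags can only be returned, never conjured
                                                                                                  assert!(bucket.try_take().is_some());
                                                                                              }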

                                                                                          2. 2

                                                                                            Thanks for taking the time to reply. You’re right about the possible differences between a semaphore and a mutex; I skipped past those finer points to avoid getting lost in the woods. However, do note that in practice the most commonly used mutexes (from pthreads) don’t store owner information and function closer to a binary semaphore (down to allowing being unlocked from another thread). There’s a small demo in TFA that demonstrates that (at least on Linux) pthread mutexes don’t actually track thread ownership (though you’re supposed to treat them as if they do), and allow a different thread to (successfully) call unlock() then lock() even while another thread supposedly has the mutex locked – and the spec specifically allows for ownership information to be elided from the implementation in favor of a faster, slimmer mutex (at least with the default pthread mutex attributes, when not specifically requesting a recursive mutex).

                                                                                            AutoResetEvent does indeed take care of synchronizing the cache/memory between the releasing and acquiring cores (and simultaneously masks spurious wakes), but as I realized in discussing this on r/rust, the event-bypassing optimization for the case where the semaphore has available concurrency means that, as things currently stand, it’s possible to acquire the semaphore without the memory barriers, which is something I have to either fix or document (which of the two depends on whether a semaphore is primarily used to limit concurrency or to synchronize views; personally, I would use an AutoResetEvent or ManualResetEvent to synchronize/signal when handling shared work, but that doesn’t mean much).

                                                                                            Regarding try_release() and why release() doesn’t take the concurrency object returned from wait() (or acquire(), if you prefer): as mentioned, the design motivation there is that you can create a semaphore where initial concurrency is less than maximum concurrency (which, as was mentioned by u/peterbourgon, is somewhat of a self-imposed design constraint, though it does unlock additional features and functionality as described in the post). If the initial concurrency is, say, zero, you still need a way to at least initially increment that (up to the max) and you can’t if release() consumes a token only produced by acquire().

                                                                                            I think the explanation for this and for the rest of the divergences you are asking about all stem from “which” semaphore we are talking about: there are many and they all go by the same name. This library draws its inspiration primarily from Win32 semaphores which are quite different (and more directly useful) than pthread semaphores, except for the footguns which the semaphore I demonstrate tries to solve aided and abetted by RAII and rust’s type system.

                                                                                            (Also note that a posix semaphore is just like a Windows semaphore (or a TFA semaphore) with an initial count of x and a hard-coded maximum count of SEM_VALUE_MAX instead of user-provided.)

                                                                                            1. 1

                                                                                              However, do note that in practice the most commonly used mutexes (from pthreads) don’t actually store owner information

                                                                                              POSIX error checking mutexes and recursive mutexes both do perform this check. I’ve not checked the Linux implementations (glibc, musl, and bionic all do it slightly differently), but on FreeBSD and Darwin (at least), the owner is always tracked so that you can debug deadlocks. I’d be surprised if it were not the case on Linux (if the mutex is not an error-checking mutex, the same-thread unlock is not enforced).

                                                                                              The man page that you cite is not the spec, this is the spec. It specifically says that it is UB to unlock from a different thread for normal cases and will raise an error in other cases (robust or error checking).

                                                                                              This doesn’t just permit eliding the owner information, it permits eliding atomic operations that would be required for some simultaneous updates. I don’t think that’s useful with modern weak memory models, but on Alpha you could implement a mutex that would be fast but potentially end up in an undefined state if you locked it on one core and unlocked it on another very rapidly.

                                                                                              AutoResetEvent does indeed take care of synchronizing the cache/memory between the releasing and acquiring cores (and simultaneously masks spurious wakes

                                                                                              Missed wakes are the things that worry me. These things are usually implemented on top of futex or similar abstractions (Windows, FreeBSD, and XNU all have slightly different variants of the same idea). In the simple case, these things acquire a mutex (logically one per memory location, though practically often one per group of memory locations, since it never acquires two at once, so that it can bound the space overhead in the kernel). It then does an atomic compare and sleeps if the compare returns true. The wake acquires the same lock and wakes at least one of the sleeping threads, then releases the lock.

                                                                                              Building the AutoResetEvent abstraction on top of a futex requires you to implement something like a counting semaphore so that you can ensure that two wakes always trigger two waits to complete. This hides all of the difficult parts of implementing a semaphore: you are effectively building a semaphore on top of something that is built on top of a semaphore. A semaphore implemented on top of the low-level OS primitives typically needs to implement a sleepers count as well if it wants to avoid either lost wakes or thundering herds.
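
                                                                                              For anyone who hasn’t seen the primitive being described, here’s a Linux-only sketch of that compare-and-sleep/wake protocol using raw futex(2) calls via the libc crate (illustrative only; it’s not how either library under discussion is built):

                                                                                                use std::sync::atomic::{AtomicI32, Ordering};

                                                                                                /// Sleep only if `*word` still equals `expected` when the kernel checks it;
                                                                                                /// a concurrent change makes the call return immediately, which is what
                                                                                                /// closes the race between "I saw zero" and "I went to sleep".
                                                                                                fn futex_wait(word: &AtomicI32, expected: i32) {
                                                                                                    unsafe {
                                                                                                        libc::syscall(
                                                                                                            libc::SYS_futex,
                                                                                                            word as *const AtomicI32,
                                                                                                            libc::FUTEX_WAIT,
                                                                                                            expected,
                                                                                                            std::ptr::null::<libc::timespec>(),
                                                                                                        );
                                                                                                    }
                                                                                                }

                                                                                                /// Wake at most one thread currently sleeping on `word`.
                                                                                                fn futex_wake_one(word: &AtomicI32) {
                                                                                                    unsafe {
                                                                                                        libc::syscall(libc::SYS_futex, word as *const AtomicI32, libc::FUTEX_WAKE, 1);
                                                                                                    }
                                                                                                }

                                                                                                /// Take one unit from the counter, sleeping while it is zero.
                                                                                                fn acquire(count: &AtomicI32) {
                                                                                                    loop {
                                                                                                        let cur = count.load(Ordering::Acquire);
                                                                                                        if cur > 0 {
                                                                                                            if count.compare_exchange(cur, cur - 1, Ordering::Acquire, Ordering::Relaxed).is_ok() {
                                                                                                                return;
                                                                                                            }
                                                                                                            continue; // lost a race with another acquirer; retry without sleeping
                                                                                                        }
                                                                                                        futex_wait(count, 0); // may return spuriously; we simply re-check
                                                                                                    }
                                                                                                }

                                                                                                /// Add one unit and wake a single sleeper. There is no sleeper count here,
                                                                                                /// so this can issue unnecessary wakes; a real semaphore tries to avoid that.
                                                                                                fn release(count: &AtomicI32) {
                                                                                                    count.fetch_add(1, Ordering::Release);
                                                                                                    futex_wake_one(count);
                                                                                                }

                                                                                                fn main() {
                                                                                                    static COUNT: AtomicI32 = AtomicI32::new(0);
                                                                                                    let waiter = std::thread::spawn(|| acquire(&COUNT));
                                                                                                    std::thread::sleep(std::time::Duration::from_millis(50));
                                                                                                    release(&COUNT);
                                                                                                    waiter.join().unwrap();
                                                                                                }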

                                                                                              It would be interesting to see a cache-efficient Rust implementation of a spin-wait semaphore. This requires keeping a linked list of sleepers so that you are not generating cache-coherency traffic except for wakes. Your implementation punts this to a standard library, which punts it to the kernel. I don’t believe that this is possible to implement in safe Rust because it requires a linked list where every node is aliased between two threads.

                                                                                              Regarding try_release() and why release() doesn’t take the concurrency object returned from wait() (or acquire(), if you prefer): as mentioned, the design motivation there is that you can create a semaphore where initial concurrency is less than maximum concurrency

                                                                                              This is also possible if you provide a create-flag API. If you want a capability-secure version of this, then this should take a token that is returned when the semaphore is created, so only code that is authorised to create new flags can do so.

                                                                                              I think the explanation for this and for the rest of the divergences you are asking about all stem from “which” semaphore we are talking about

                                                                                              Unless otherwise specified, I would assume any discussion of semaphores is referring to Dijkstra’s paper.

                                                                                              1. 1

                                                                                                I’d be surprised if it were not the case on Linux (if the mutex is not an error-checking mutex, the same-thread unlock is not enforced).

                                                                                                I guess I should clarify that I am specifically speaking about normal, default-attribute-initialized pthread mutexes.

                                                                                                It specifically says that it is UB to unlock from a different thread for normal cases

                                                                                                Thanks for correcting the link to the spec, but that’s what I mean, though. It’s not mandated that the error be reported; in the case of a normal mutex it’s just left as UB, with the burden on the caller to avoid it rather than on the implementation to catch it and error out, resulting in what is functionally a binary semaphore rather than what CS books would describe as a mutex – just like I could make my binary semaphore a normal mutex by saying it’s UB to call release() from a thread other than the one that most recently called wait(). But this is really just pedantry at that point, and for the purposes of the article I think calling a binary semaphore equivalent to a normal mutex is fine so long as no one stares too hard.

                                                                                                Missed wakes are the things that worry me […] Building the AutoResetEvent abstraction on top of a futex requires you to implement something like a counting semaphore so that you can ensure that two wakes alway trigger two waits to complete.

                                                                                                I agree that missed wakes are the hardest thing when implementing events or semaphores. In this case, all that is taken care of by the semaphore’s AutoResetEvent. The AutoResetEvent is intended to be simultaneously free of both spurious and missed wakes and has stress tests (including to try and ensure that two wakes always wake two separate threads) run under miri (though more can always be added!) to try and find any violations.

                                                                                                Your implementation punts this to a standard library, which punts it to the kernel.

                                                                                                My library actually uses the rust port of WebKit’s WTF Parking Lot as its underlying futex, rather than delegating anything to the standard library. It uses a linked list of waiters (with std::cell::Cell of raw ptrs) in unsafe rust, spin-waits where it makes sense (though I’m not sure if it’s optimally cache friendly or only part of the way there), and explicitly uses the keyed event/futex/address wait/whatever OS primitive when it needs to park/sleep a waiting thread rather than delegate anything to the rust standard library.

                                                                                          1. 15

                                                                                            I’m uneasy about this, given the well-known downsides to forking. SQLite is already pretty extensible through UDFs, virtual tables, virtual filesystems. It’s also not the most hacking-friendly codebase, being written in dense, somewhat archaic C.

                                                                                            1. 12

                                                                                              Forking is basically the core point of open source, people don’t do it enough.

                                                                                              1. 4

                                                                                                No, it just creates more confusion and headaches. Imagine if we had 15 different versions of nginx or mysql or whatever else runs this world of ours. Every time you fork the complexity to understand it all multiplies.

                                                                                                1. 5

                                                                                                  Why would you ever release software with a free license that allows people to make their own version if you don’t want them to make their own version? Just because there aren’t 15 websites for nginx or mysql or whatever doesn’t mean there aren’t tens of thousands of functional forks out there in the wild in private use. Making a public fork is a practical act, the same way that making a private one is. If you really don’t want people to do that, don’t do open source.

                                                                                                  1. 5

                                                                                                    We do have those exactly now. They’re just rebranded as ‘AWS Aurora’, ‘CloudSQL’, ‘TimescaleDB’, and so on.

                                                                                                    1. 2

                                                                                                      Only because version control software is inadequate. Competitors should be able to agree on things to ease the complexity burden without needing to break things out into separate libraries.

                                                                                                      1. 7

                                                                                                        I don’t understand. A fork is typically done over disagreement on the direction of the project. That is a social issue and no amount of tooling can fix that.

                                                                                                        See mysql vs. mariadb or OpenOffice.org vs LibreOffice or ownCloud vs. NextCloud etc. etc.

                                                                                                        1. 8

                                                                                                          A fork is typically done over disagreement on the direction of the project.

                                                                                                          This is a problem. People have got a bad taste for “forking” because most famous forks were like this. Yet it doesn’t have to be this way. Ubuntu, Mint, and all the other Debian forks are generally not hostile to Debian. Conversations has a large cloud of friendly forks that add their own spin or functionality. It doesn’t have to be about a fight.

                                                                                                          1. 4

                                                                                                            A disagreement doesn’t have to be complete. There will still be some things they agree on, otherwise it wouldn’t be a fork, it would be a new project from scratch.

                                                                                                            1. 1

                                                                                                              E.g. some projects mostly keep patchsets on top of upstream projects around. Is that the kind of thing you think better VCS could help with?

                                                                                                              1. 1

                                                                                                                I have really a lot to say on the topic, the entry point is datalisp.is.

                                                                                                                But the short answer to your question is “yes” - it could also help discover and stabilize common core functionalities (like these “$language; only the good parts” books or linters do to shrink the api of programming languages with footguns). Really it is about making a better web.

                                                                                                      2. 1

                                                                                                        I politely disagree. The core points of open source are freedom to redistribute and clear attribution/derivation.

                                                                                                        1. 3

                                                                                                          derivation

                                                                                                          Also known as “forking”…

                                                                                                          1. 3

                                                                                                            | Also known as “forking”…

                                                                                                            Or linking, or any number of things, one of which is forking. So yeah I don’t agree that forking is “the core point” of open source.

                                                                                                      3. 11

                                                                                                        What are the downsides? Splitting the contributor pool?

                                                                                                        If the main project doesn’t take contributions, then that’s not an issue … the contributor pool is already split

                                                                                                        Either way, I think some amount of forking is healthy, even if the original project is awesome. You can call it “parallel research”

                                                                                                        I don’t know about this case, but I think, in most cases, it’s not wasting anyone’s time

                                                                                                        Though one that confused me was the ffmpeg fork, I remember I had to choose between 2 things to install once, and I didn’t understand the difference

                                                                                                        1. 3

                                                                                                          No, the biggest downside is if this gets popular and splits the user base and then we have different feature sets available depending on which “SQLite” you are targeting and nothing is portable any more.

                                                                                                          1. 10

                                                                                                            That’s also true if they write their own thing that is not a fork. A fork of sqlite is not sqlite and should not be expected to be. It is its own thing

                                                                                                            1. 5

                                                                                                              Eh the whole point of forking is to add features! They list 3 at the end.

                                                                                                          2. 4

                                                                                                            I agree. I will probably never use this, however if they pilot replication hooks and drh picks up some version of the idea, I’ll be happy. Even though SQLite is written in an archaic, opinionated dialect of C, drh is not at all opposed to picking up modern ideas and paradigms.

                                                                                                            Native WASM UDFs are stupid though. Every SQLite library that I know of allows defining UDFs in the native language, since all you have to do is hand a C function pointer and user data pointer to the SQLite library. And anything that wraps SQLite obviously can already interface with C.

                                                                                                            In other words, with the current UDF interface, you can already trivially extend SQLite to run WASM UDFs with any runtime of your choice.
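
                                                                                            For example, with the rusqlite bindings (my choice of example, assuming its “functions” feature is enabled; the function name here is made up), registering a scalar UDF is a few lines, and the closure body is ordinary native code that could just as well hand its arguments to whatever WASM runtime you’ve embedded:

                                                                                              use rusqlite::{functions::FunctionFlags, Connection, Result};

                                                                                              fn main() -> Result<()> {
                                                                                                  let conn = Connection::open_in_memory()?;

                                                                                                  // Register a one-argument scalar function callable from SQL as is_ipv4(x).
                                                                                                  // The closure is plain native code; it could just as easily dispatch to an
                                                                                                  // embedded WASM runtime of your choice.
                                                                                                  conn.create_scalar_function(
                                                                                                      "is_ipv4",
                                                                                                      1,
                                                                                                      FunctionFlags::SQLITE_UTF8 | FunctionFlags::SQLITE_DETERMINISTIC,
                                                                                                      |ctx| {
                                                                                                          let text: String = ctx.get(0)?;
                                                                                                          Ok(text.parse::<std::net::Ipv4Addr>().is_ok())
                                                                                                      },
                                                                                                  )?;

                                                                                                  let ok: bool = conn.query_row("SELECT is_ipv4('10.0.0.1')", [], |row| row.get(0))?;
                                                                                                  assert!(ok);
                                                                                                  Ok(())
                                                                                              }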

                                                                                                          1. 2

                                                                                                            When I ditched macOS many years ago, one of the things I missed most dearly was the ability to switch backwards and forwards between windows of the same app (cmd + tilde/backtick). I ended up writing a daemon that brings that functionality to Windows, you might also like it: https://neosmart.net/blog/2022/easy-window-switcher-1-3-0-released/

                                                                                                            1. 1

                                                                                                              That seems pretty useful! Here’s me trying to reproduce that in AHK:

                                                                                                              !`::                                     ; Alt+backtick hotkey
                                                                                                              WinGet, CurrExe, ProcessName, A          ; get the process name of the active window
                                                                                                              WinActivateBottom % "ahk_exe " . CurrExe ; activate the bottom-most window of that process
                                                                                                              return
                                                                                                              

                                                                                                              Unfortunately it only goes one way, due to limitations on how WinActivate works. I might be able to get around that by doing something fancy with WinGet,, List but this seems alright for a quick spike.

                                                                                                              1. 2

                                                                                                                Nice and short! I’m assuming it works across processes? Does that exclude minimized windows from consideration (or can it be modified to do that)? I had to do that to mimic macOS behavior. Also, getting it to work with virtual desktops (not calling up windows of the same app from another VD) required an update when Windows 10 added that functionality out of the box.

                                                                                                            1. 9

                                                                                                              A couple of months ago there was a small discussion on r/rust about why the standard library doesn’t include a semaphore and I mentioned that it’s a deceptively difficult synchronization primitive to safely model in a rusty way. I ended up nerd-sniping myself into trying anyway (docs.rs link), and decided to share a writeup with some of the issues I ran into trying to come up with a safe and no/low-cost rusty interface for a semaphore. It ended up being a great example of some of the things I love about the rust ecosystem (though it did also reveal some of the weaknesses of the rust ro^rw borrow semantics) in terms of the thought and care it takes to make an api that’s resistant to misuse but still (hopefully) ergonomic.

                                                                                                              1. 3

                                                                                                                I was thinking the other day — wouldn’t it be neat if databases did more WASM? Like, what if we could encode constraints with WASM? Then you could write constraint check functions in pure Rust, compile them to a WASM blob, and send a blob over?

                                                                                                                #[sqlite::constraint]
                                                                                                                fn check_is_ipv4(data: &[u8]) -> Result<(), ConstraintError> {
                                                                                                                    Ipv4Addr::try_from(data)?;
                                                                                                                    Ok(())
                                                                                                                }
                                                                                                                

                                                                                                                I would be incredibly excited if this was possible. And one of their goals is: We would like to allow for WASM-based user-defined functions. So definitely keep an eye on this one!

                                                                                                                1. 8

                                                                                                                  I don’t know if it’s part of the standard (I assume it is) but SQLite and Postgres support constraints with generic expressions, so presumably that could include calling a UDF.

                                                                                                                  I don’t know why you’d need WASM additionally?

                                                                                                                  1. 5

                                                                                                                      Wasm would allow you to write the UDF in a language of your choice. SQL is not a very good language for many use cases. And writing them in C has attendant security issues.

                                                                                                                    1. 9

                                                                                                                      I write my SQLite UDFs in Go. Any language that can call C can write a UDF for Postgres and SQLite AFAIK.

                                                                                                                      1. 4

                                                                                                                          Technically correct, the best kind. But that is still a fairly high barrier that Wasm can remove for you. Many languages including Go do not make integrating with C very easy. Wasm has a much better and safer story here.

                                                                                                                        1. 8

                                                                                                                          Many languages including Go do not make integrating with C very easy.

                                                                                                                            I can’t tell if this is pedantically correct or not. In truth, to someone that’s studied and compared FFI in most of the popular languages, I feel your comment should read

                                                                                                                          Unlike most languages, go makes it very difficult to integrate with C.

                                                                                                                          1. 3

                                                                                                                            I’ll certainly cop to soft pedalling that one. Go made it much harder than many other languages. But I think the point still stands. C FFI can be tricky to get in the best of languages. Wasm with clearly defined API boundaries significantly lowers the effort to integrate as well as increasing the security of that integration.

                                                                                                                  2. 3

                                                                                                                    One technical problem with this is that WASM runtimes are not generic / interchangeable, so you’d have to make some decisions, for example:

                                                                                                                    1. do you ship your own runtime/VM? (what if you want to run sqlite within wasm, would you have a nested VM?)
                                                                                                                    2. when you send that WASM blob (the module) to sqlite do you run it with some standardised API in the module imports? what can the blob access?
                                                                                                                    3. is this gaining you anything? Probably yes if you run untrusted code since it’s a sandbox, but not if you just want to ship your own stuff. You can already run UDFs by passing function pointers to the sqlite API so your own native code would win every time
                                                                                                                    4. Continuing from (2), there’s no standard way to specify bindings in web assembly. You’d have to support this by either rolling your own setup for your API or binding yourself to wasmtime’s wit-bindgen which is wasmtime specific and unstable as an interface
                                                                                                                    1. 2

                                                                                                                      Yeah, it seems like the easiest thing is to just keep the existing ABI/API, and just let embedding applications choose what runtime they want, using the same or a tweaked API as needed. I don’t see how specifying the wasm boundary in SQLite would be easier here?

                                                                                                                      1. 1

                                                                                                                        But you’re not sending the runtime, you’re sending the.. AST or something, which you can add versioning to? Is wasm API versioned?

                                                                                                                        1. 4

                                                                                                                          There’s no wasm api for interacting with runtimes! it’s implementation specific. That’s what I meant by “WASM runtimes are not generic / interchangeable”

                                                                                                                          1. 1

                                                                                                                            Yeah, i got it from the other comments about FFI

                                                                                                                      2. 2

                                                                                                                        Honestly, this makes more sense as a data type than as a constraint. It’s not great that most databases make defining your own types/domains annoying or impossible.