Threads for honeyryderchuck

    1. 2

      I am looking forward to a practical example of this concept in use.

      I’m using LISTEN/NOTIFY in a couple of simple setups where I don’t even have a queue (which risks losing messages/jobs both when workers are down and when there are too many of them; there aren’t in these specific simple setups) - it would be interesting to see a production setup, with some numbers on maximum queue sizes, worker processes etc.

      I have seen situations where a job queue in PostgreSQL (an oooold TheSchwartz setup) started to get unwieldy when the number of jobs exceeded a couple of million, whereas RabbitMQ only started to struggle at about 100x that - I’m sure the specifics matter in both cases, though :-)
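      The production shape I’d want to see written up, sketched in Python: keep jobs in a durable table and use NOTIFY purely as a wake-up signal, so nothing is lost while workers are down. (Single-worker sketch; the table/channel names are invented and psycopg2 is assumed as the driver.)

```python
import select

# Claim the oldest queued job; with several workers you would add
# FOR UPDATE SKIP LOCKED to the subquery.
CLAIM_NEXT = """
    UPDATE jobs SET state = 'running'
    WHERE id = (SELECT id FROM jobs
                WHERE state = 'queued'
                ORDER BY id LIMIT 1)
    RETURNING id, payload;
"""

def worker_loop(dsn):
    import psycopg2  # assumed driver; any LISTEN-capable one works
    conn = psycopg2.connect(dsn)
    conn.autocommit = True
    with conn.cursor() as cur:
        cur.execute("LISTEN jobs_channel;")
    while True:
        # Drain the table first: jobs enqueued while the worker was
        # down are still there, unlike a bare NOTIFY payload.
        with conn.cursor() as cur:
            cur.execute(CLAIM_NEXT)
            row = cur.fetchone()
        if row:
            continue  # ...process the job here, then look for more
        # Table empty: block until a NOTIFY arrives, with a 1s safety
        # timeout so a missed notification delays a job, never loses it.
        if select.select([conn], [], [], 1.0)[0]:
            conn.poll()
            del conn.notifies[:]
```

      The enqueueing side is just an INSERT into the table plus `NOTIFY jobs_channel;` in the same transaction.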

      1. 3

        The first time I heard about this approach was Sage Griffin mentioning they designed crates.io’s queuing system like this.

        1. 1

          Good tip! https://github.com/rust-lang/crates.io/blob/main/src/background_jobs.rs

          I see sage did the original implementation too.

      2. 3

        FWIW, I use LISTEN/NOTIFY for writing updated static files to disk when the info in the database changes:

        https://sive.rs/shc

        (In this case, blog comments write HTML so when someone leaves a comment, the static HTML cache of those comments is written, and viewers just see the static cache.)

      3. 3

        I’m currently trialling good_job for $work as a replacement for sidekiq (it would let us drop redis, and one less dependency is always tempting; it also avoids a race condition where jobs start before the transaction that enqueues them commits).

        So far I haven’t run into any trouble, and my dive through the source code didn’t turn up anything terrifying, but I’d want to wait another month or so before firmly recommending it.

        1. 4

          Don’t enqueue directly to redis in your transaction; see https://microservices.io/patterns/data/transactional-outbox.html

          It also works for enqueueing tasks.
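          As a rough Python sketch of that shape (the table, topic, and function names are all invented for illustration; `cur` is any DB-API cursor and `redis_client` anything with an lpush):

```python
def place_order_with_outbox(cur, order_id, payload):
    # Both INSERTs share one transaction: the job row commits or
    # rolls back together with the business row it belongs to, so a
    # worker can never see a job for data that was rolled back.
    cur.execute("INSERT INTO orders (id, payload) VALUES (%s, %s)",
                (order_id, payload))
    cur.execute("INSERT INTO outbox (topic, body) VALUES (%s, %s)",
                ("order_created", payload))

def relay_once(cur, redis_client):
    # Separate relay process, own transaction: move committed outbox
    # rows into Redis. Run inside a transaction so the DELETE only
    # commits after every push succeeds; a crash re-delivers rather
    # than drops (at-least-once). SKIP LOCKED lets several relays
    # run in parallel without fighting over rows.
    cur.execute(
        "DELETE FROM outbox WHERE id IN "
        "(SELECT id FROM outbox ORDER BY id LIMIT 100 "
        "FOR UPDATE SKIP LOCKED) RETURNING topic, body")
    rows = cur.fetchall()
    for topic, body in rows:
        redis_client.lpush(topic, body)
    return len(rows)
```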

          1. 2

            I mean, sure, you could adjust your existing software to do this (a senior dev will knock it out in a few weeks’ work, if you count monitoring, documenting what to do if it falls behind, setting up CI etc. for the new process).

            Or, you could pick a toolset that eliminates the cost / ops work of an extra runtime dependency and lets you lean on the tools you’re already using.

        2. 2

          Read https://honeyryderchuck.gitlab.io/2023/04/29/introducing-tobox.html for more details.

          good_job does not use SKIP LOCKED, btw; it instead uses session-level (per-connection) advisory locks to achieve exclusive access, which, among other drawbacks, rules out options such as transaction-level connection poolers. That, and being rails-only, makes it a no-go for me.
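          To make the contrast concrete, here is the rough shape of the two dequeue styles (illustrative SQL only, not good_job’s actual queries; note the advisory variant also has the classic hazard that the WHERE predicate may take locks on rows it never returns):

```python
# SKIP LOCKED: the row lock lives and dies with the transaction, so
# it behaves correctly behind a transaction-level connection pooler.
DEQUEUE_SKIP_LOCKED = """
    SELECT id, payload FROM jobs
    WHERE state = 'queued'
    ORDER BY id LIMIT 1
    FOR UPDATE SKIP LOCKED;
"""

# Advisory lock: pg_try_advisory_lock() is held by the *session*,
# not the transaction. The job stays locked until the same server
# connection releases it, which falls apart once a pooler starts
# handing that connection to other clients between transactions.
DEQUEUE_ADVISORY = """
    SELECT id, payload FROM jobs
    WHERE state = 'queued' AND pg_try_advisory_lock(id)
    ORDER BY id LIMIT 1;
"""
```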

          1. 1

            Obviously if you’re not using rails then a rails-only tool is not going to help you.

            Well aware of the tradeoffs to be made between explicit locks and advisory; in particular, the tradeoffs good_job makes become very expensive if you have more than 100 or so worker processes.

      4. 1

        django-mailer is an outgoing mail queue that stores mail in the DB and uses SELECT FOR UPDATE as a locking mechanism to ensure each mail is sent exactly once. I’ve now also added a runmailer_pg command that uses NOTIFY/LISTEN to enable it to send mail on the queue immediately, which has fixed the major downside of using a queue for mail (i.e. mail wasn’t being sent immediately).

        Source code for reference: https://github.com/pinax/django-mailer/blob/master/src/mailer/postgres.py

        1. 1

          If you flush successful jobs and move failed jobs to a dead letter queue, it should be plenty performant to poll the queue table on 1s ticks, or even faster if necessary. I like NOTIFY/LISTEN for general-purpose systems with heterogeneous queues, but for something like django-mailer, it might make sense to keep using tried and true polling.

        2. 1

          This reminds me of an incident many years ago when MySociety were running the petitions website for 10 Downing Street, and they had a petition that went mega-viral, over a million signatures. They had to send out an official email response after the petition closed.

          Originally the petitions website kept its mail queue in a table in MySQL, but it was not able to deliver a million messages in a reasonable timescale. I think the main problem was lack of parallelism (though the table scanning might have been accidentally quadratic as well).

          One of my friends was working for MySociety at the time, and I was postmaster for Cambridge University and an Exim expert, so I helped out with a bit of advice. It boiled down to dumping the messages into Exim (local, therefore fast) and letting it do the mail delivery in parallel. Much more effective use of the network!

    2. 2

      Use gitlab CI

    3. 6

      Javascript avoids this by essentially making everything async and non-blocking. Python & Rust have to juggle async/non-async separately.

      So when you combine goroutines and fully integrated non-blocking I/O that’s when you get strong multicore performance and a platform that can “cheaply” handle a large number of network connections while still avoiding “callback hell” or the “function coloring” problem. It’s not everyone’s desired tradeoff. They’ve made C interop more expensive, and calls that can’t be made non-blocking have to be done in a threadpool (just like node.js and DNS resolution).

      I don’t think it’s fair to say Go “avoided” the function colouring problem. They just put different syntactic sugar on “making everything async and non-blocking”.

      Gor Nishanov’s Fibers under the magnifying glass is basically showing how Go’s decision to use stackful coroutines was tried in the 90s and abandoned by everyone else because they couldn’t figure out how to make it perform “better enough” to justify the negative effects on in-process FFI.

      Rust and Python made the trade-off they did because their raison d’etre is to interop well with C, either to enable partial rewrites of C codebases or to act as a glue language.

      Heck, fasterthanlime’s “Lies we tell ourselves to keep using Golang” has a whole section named Go is an island talking about how much Go’s developers have doubled down on “you should rewrite the universe in Go for easy cross-compilation and deployment” and “the only good boundary with Go is a network boundary”, yet Rewrite it in Rust is the one everyone memes because Rust is the language that excites those with little project management experience.

      1. 12

        My co-workers make faces of horror and disgust when I tell them I want to import a package that uses CGo to wrap a C library. It’s like I’ve confessed to heresy.

        Did you know someone went to the insane effort of writing a C-to-Go transpiler, so they could have a SQLite binding that doesn’t use CGo?

        1. 5

          Ouch. Given how much of SQLite’s value proposition is the literal avionics-grade testing they apply to the C version to verify how it interacts with all sorts of OS edge-cases and hardware faults, I wouldn’t want to rely too heavily on how differing semantics of the Go execution model might undermine that.

          1. 2

            Execution is faithfully preserved; it has to be, if the transpiler is coherent. The issue, if there is one, is performance.

            1. 5

              This is exactly backwards – anybody who has written a transpiler/compiler will tell you that :)

              It depends on the source language, target language, and the runtime interface

              If someone says, “I wrote a transpiler that’s correct, and therefore the translated sqlite library behaves exactly the same”, then I have a bridge to sell you

              The only way to have confidence in correctness is to actually run the sqlite tests against the Go version (and maybe they did that)

              In some cases that may be difficult as the tests run in-process, not in a separate process

              1. 2

                […] to actually run the sqlite tests against the Go version (and maybe they did that)

                In some cases that may be difficult as the tests run in-process, not in a separate process

                Also, most of SQLite’s tests are proprietary, including the “avionics-grade” ones.

            2. 2

              Execution is faithfully preserved under normal circumstances. The test suite for the original C version of SQLite simulates all sorts of abnormal cases where the abstractions presented by the programming language and its runtime are leaky.

              (eg. The “Executive Summary” section of the How SQLite Is Tested page includes bullet points such as “Out-of-memory tests”, “I/O error tests”, and “Crash and power loss tests”… all of which could have semantics that vary subtly but significantly between the C and Go runtimes.)

              I’m reminded of how… Bryan Cantrill, I believe… talked about how achieving compatibility with the subtle nuances of the intersection of Linux’s… vfork and SIGCHLD semantics, if I remember correctly, for the purposes of being able to run Linux binaries was a terrible experience.

              1. 4

                You do remember correctly! Details are described in: lx vfork and signal handling still racey. While this was a bug in our emulation, it was surprising the degree that Go relied upon the murky semantics at the intersection of these two leaky abstractions…

                1. 2

                  Oh, hey! While you’re here, I really wanted to thank you for this:

                  And it’d be easy to be like “Well, that’s just B-Trees! …not AVL trees.” Yeah, but… the reason I could use a B-Tree and not an AVL tree, is because of that composability of Rust.

                  A B-Tree in C is gnarly. A B-Tree is just… I mean talk about intrusive. It’s VERY intrusive. It’s very kind of up in everything’s underwear. And someone, of course, on the Internet’s like “Well, you should go to use this B-Tree implementation.” You look at the B-Tree implementation and are like “This is rocky! And if there’s a memory corruption issue in here, I do not want to have to go find it.”

                  I would still use an AVL tree in C even though I know I’m giving up some small amount of performance, but in Rust, I get to use a B-Tree.

                  – Bryan Cantrill @ https://youtu.be/HgtRAbE1nBM?t=2450

                  It’s been so useful to have on hand to help support my arguments about the important difference between what performance a language is capable of and what a programmer is willing to do with it.

                  1. 2

                    Oh, that’s awesome to hear – and thank you for the kind words! If you haven’t already seen it, I expanded on the relative performance of C and Rust, which explains how I came to the conclusion that B-trees were providing much of the win of using Rust over C.

                    1. 1

                      Thanks. :)

                      I have an approach to blog-reading that’s very haphazard (stochastic, if you want to put it euphemistically), so I missed “The relative performance of C and Rust” despite having quoted other things like the “But the nothing that C provides reflects history more than minimalism; it is not an elegant nothing, but rather an ill-considered nothing” paragraph from “Rust after the honeymoon”.

                      Also, it looks like one of your WordPress plugins has gone awry, because every comment is topped with its own “Notice: get_the_author_email is deprecated since version 2.8! Use get_the_author_meta(‘email’) instead.” message.

        2. 2

          It’s similar to Java – JNI is frowned upon from what I remember

          And yeah it’s a big reason I’ve avoided Go – I’m supposed to throw out all my C / C++ knowledge and decades of libraries? The sqlite3 thing raised my eyebrows too

          It is a big shame, because I would like to use a higher level and faster language like Go or Java/Kotlin, but I stick with Python because there is always a “way out”

          Python has different problems, but I can always make it work somehow. With Go or Java it seems like I’ll find myself in a corner with a Catch 22 eventually


          I have a rant about the “monoglot anti-pattern” – language designers assuming that every system in the language will be written ONLY in that language. Everybody wants to “own the world”

          As a software system becomes more successful and gets to a certain size, the probability of that approaches zero rapidly

          I mean Go barely has GUI libraries from what I remember, so if you even add a GUI to your system, now you have this awkward boundary, and a code reuse problem

          1. 6

            It’s similar to Java – JNI is frowned upon from what I remember

            It made more sense for Java because pure Java code was binary portable to any platform with a JVM and was part of the same sandboxing as the rest of the Java code. In the case of Go, neither of those reasons hold.

          2. 4

            I switched from Python to Go precisely because Python rarely if ever offers a “way out”. People think “just rewrite the hot path in a fast language” will save their bacon, but they always forget that calling into C requires marshaling their data into a C-friendly structure, which often outweighs any performance gains. People also fail to think about the complexity it adds to your build system, or about how you debug into the C code (it can be done, but it’s tedious).

            In every Python system I’ve worked in, we found ourselves boxed in because there are no good optimization options. Whereas with Go you can usually do a little refactoring of the naive implementation to consolidate allocations and you will often see something like a 10X improvement. Plus Go has shared memory parallelism, so you can go even further.

            Go has bindings to other GUI toolkits, but yes, it’s a cross-language boundary, which is awkward in Go like it is in most languages. Even writing GTK in C or Qt in C++ is awkward, because they make heavy use of conventions, metaprogramming, etc. that are unusual in their host languages. GUI toolkits are just hard in any language, unfortunately.

            1. 1

              Yeah I don’t doubt that experience

              • fixing memory leaks in Python/C bindings isn’t fun
              • finding where to release the GIL in Python/C bindings isn’t fun
              • debugging SWIG-type crap in bindings isn’t fun

              I have done all these things, and I’ve also been surprised when I managed to rewrite the hot loop of a Python program in C++, and it reduced memory usage by 10x and increased speed by 10x. That’s the theory, but it doesn’t work in many types of programs, for the reasons you mention

              But I really meant it when I say “way out” … It’s more work in those situations, but you can always apply some more knowledge/effort to get it done

              I AM very tempted by Go – it does have many appealing qualities. But for the problems I’ve worked on and am working on, the “closed world” property makes me very hesitant

              Actually right now I have a choice between a (small) existing Go codebase, Python, and also looking at Elixir … genuinely not sure what direction I’ll go in

          3. -2

            I’m supposed to throw out all my C / C++ knowledge and decades of libraries?

            Yep, and replace them with Go libraries, which are almost entirely of pretty low quality. It’s common to see very clumsy, unsafe interfaces; incomprehensible error messages (usually just many levels of complex error strings glued together); no type safety; no escape hatches; and so on.

            Whenever I touch Go, everything that would take 10 minutes in basically any other modern language is a 1-hour adventure and often ends in “and now you have to copy&paste half of this library into your project because they didn’t think of that”. And it never gets better. The purest and stickiest tarpit our industry has at the moment, imo.

            1. 5

              This seems misinformed. Go doesn’t have Rust’s type safety, but it certainly has more type safety than C and certainly not “none”. I also haven’t perceived any lower quality in Go code bases than other languages (and a fair bit higher than C or C++), of course this is all subjective.

              Regarding error messages, Go’s approach isn’t my favorite, but the typical C approach is returning a single integer devoid of any context. It can map to a string like “file not found”, but you’re not going to get the salient context.

              Regarding “no escape hatches”, Go lets you escape the type system either by using (essentially) dynamic typing or unsafe. I’m not sure what kind of “escape hatches” you’re expecting (particularly those from C/C++).

              Whenever I touch Go, everything that would take 10 minutes in basically any other modern language is a 1 hour adventure

              This is the exact opposite of my experience. Go is my go-to language for “I need to get stuff done quickly in a way that others can understand”. It’s boring and it gets the job done. I use other languages for hobby stuff where I want to play with new ideas and I don’t necessarily care if it takes 10x longer to get the job done (I actually started using Go because I got tired of spending soooo much time scripting CMake or Gradle and I wanted something that just let me hit the ground running).

              Anyway, this doesn’t invalidate your experience; just adding my datapoint.

      2. 12

        Go absolutely avoided the function coloring problem. Go functions are all the same color, which is synchronous. This fact motivates a few best practices, summarized by the idiom of “leave concurrency to the caller”.

        Go treats FFI as a second-class citizen, for sure. But very few programs make use of FFI.

        1. 0

          *nod* And a big part of the reason Pascal lost out to C is that it took so long to step out of the “That functionality would impede its suitability to be a safe teaching language” sandbox.

          The function colouring problem exists in no small part because there was a time not too long ago when we hadn’t yet recognized how desirable asynchronous programming would become as a feature. Go had the benefit of hindsight and chose to willingly walk toward being “Pascal, but for microservices instead of teaching”.

          (Thus, “the only good boundary with Go is a network boundary”.)

          …and yes, Pascal’s support for things like arrays whose length isn’t part of the type signature, and strings that aren’t size-limited by a length field with a range below what available memory allows, is a function colouring-esque change that needs to be plumbed throughout a codebase to change things if you chose incorrectly.

      3. 11

        I don’t think it’s fair to say Go “avoided” the function colouring problem. They just put different syntactic sugar on “making everything async and non-blocking”.

        Surely it is fair? The problem is that the developer needs to know about function colours. In go, they don’t?

        The fact that the go runtime may be able to reap the benefits of async I/O without bothering the developer is - imho - kind of the whole point of go.

        1. 1

          The reason I quoted what I did is because I think it’s unfair to say “Javascript avoids this by essentially making everything async and non-blocking.” and then not say the same thing about Go.

          Sure, the syntactic sugar is different, but Go and Node.js take the same approach.

          1. 8

            Node does not, right? You have to color async functions with the “async” prefix sigil, and sync functions can’t call async functions without await (which is a bit safer than what python does, or does not do in this case), but you do color them.

            1. 2

              You don’t need async to implement “everything is async and non-blocking”. That’s why so many Node.js APIs take callbacks. They were doing “everything is async and non-blocking” before JavaScript gained the async syntax construct.

              1. 6

                If you’re saying “straight line golang code” is merely “syntactic sugar” compared to “callback hell” then I think you’re fairly far off into “all languages are the same” territory.

                (If you’re not saying that, I’m misunderstanding you).

                1. 1

                  I’m saying that the execution models embodied in what the standard library APIs allow you to do are the same, and that it weakens the argument not to acknowledge that just because one language has existed longer and had more time to accumulate multiple syntaxes that serve the same purpose.

                  1. 6

                    Your initial point was that it wasn’t fair to say golang avoided the function-colouring problem.

                    I understand that problem to be “you have two different - mostly non-interoperable - kinds of function, and the developer has the burden of choosing which to use each time”.

                    Golang does not have this problem (due to fairly novel runtime support for calling blocking APIs and shuffling userspace threads across OS threads), so in my view it is pretty clear it did avoid this problem.

                    1. 0

                      And, when it was introduced, Node.js didn’t have that problem either, because the standard library went all in on callback-based async APIs.

                      To argue that Go doesn’t have that problem is like saying “Go doesn’t have the problem of a bolted-on generics system because Go doesn’t have generics”… oops. It’s a weak argument to say “This language is better because, thus far, upstream fiat has managed to keep it from growing bodged-on support for use-cases X and Y while the other language had many more years to get hammered into doing so.”

                      1. 5

                        The “function coloring problem” doesn’t refer to the different concurrency syntaxes in JS, it refers to one specific syntax (async/await) which has multiple colors within itself—sync functions and async functions.

                        The criticism isn’t “JS has too much stuff bolted on”, it’s “JS requires you to choose between a function coloring problem (async/await) and callback hell, while Go has neither”.

                        1. 2

                          And, in a decade, I wouldn’t be surprised if Go were being criticized for having a “requires you to choose between” relationship with some feature Rust’s in the middle of popularizing.

                          (eg. A garbage collector gets you memory-safety, and generics extends Go’s ability to give you type-safety, but I’ve already seen posts like Fixing the Next 10,000 Aliasing Bugs which ends with “Null checking was once unusual as well, but now, in this age of Kotlin and strictNullChecks TypeScript, nobody will take a language without null checking seriously. I think that someday, the same will be true of alias checking as well.”. If Go has to retrofit support for “alias-safety”, it’s going to make the language uglier.)

                          It’s a weak argument to compare two languages based on something one of them predates and had to add in a backwards-compatible way without acknowledging that fact.

                          For example, Rust vs. C++ arguments are stronger for readily acknowledging that C++ is hamstrung by the need to remain compatible with a long history of legacy code.

                          1. 1

                            And, in a decade, I wouldn’t be surprised if Go were being criticized for having a “requires you to choose between” relationship with some feature Rust’s in the middle of popularizing.

                            Maybe. 🤷‍♂️ Hopefully newer languages advance over older ones, right? That’s how progress works.

                            1. 1

                              *nod* …and that’s a good thing.

                              I’m just saying that the argument being made was weakened by not acknowledging the role that played in Go having a more pleasant, less function-coloured async story than JavaScript.

                      2. 1

                        Go’s generics certainly aren’t perfect, but I don’t think they can be accurately described as “bolted on”.

                        1. 3

                          There is a whole ecosystem full of APIs that were designed before Go had generics. Thus, I consider it bolted-on.

                          It’s like comparing Rust’s Option<T> to C++’s std::optional<T> based on their use in their respective ecosystems.

                          1. 2

                            I’m not sure that it’s reasonable to describe generics as “bolted on” simply because they came in a later version of the language.

                            1. 2

                              It depends on whether you’re looking at it from the point of view of the language’s semantics or how they fit into the ecosystem.

                              If you’re looking at the language in isolation, you’re correct… but if you’re looking at how they fit into the standard library APIs and the APIs provided by the ecosystem then, yes, they’re very much bolted on later and that’s an unavoidable side-effect of needing to remain backwards compatible.

                              Personally, as a programmer, what I care about is what I have to interact with in real-world scenarios, which means the latter case.

                              1. 1

                                if you’re looking at how they fit into the standard library APIs and the APIs provided by the ecosystem then, yes, they’re very much bolted on later and that’s an unavoidable side-effect of needing to remain backwards compatible.

                                I don’t really agree. Pre-generics “generic” code necessarily used runtime reflection to do its dark work, but the introduction of generics didn’t totally obviate that stuff, it just introduced an alternative way to express type-invariant code, which makes sense to use in a subset of use cases.

                                1. 1
                                  1. Swap out a few words and you’ve just described function colouring vs. callbacks as a way to handle async in JavaScript.

                                  2. Please understand that I’m one of those people who gravitates to Rust, not for its performance, but for the maintainability advantage of having such a strong type system combined with an ecosystem that embraces “fearless upgrades” more than Haskell’s ecosystem. Having stuff that can’t be shifted to being checked at compile-time, despite the language now supporting it, is a big downside.

              2. 2

                “You don’t need” is not the same as “you cannot”. Javascript supports async/await and has function coloring, whether you use it or not.

                1. 2

                  And a big part of my point is that Node.js was designed before JavaScript had async/await, so saying Go is better without acknowledging that is like a fit 20-something mocking an old man for not having the same physical strength.

                  It hurts your argument to not acknowledge that one of the languages didn’t have that benefit of hindsight when being designed, and had an existing ecosystem to remain compatible with when extending its capabilities.

                  1. 3

                    I don’t get your argument. Other languages have big ecosystems to maintain too. Go is just historically conservative when it comes to expanding the language. And the choice of expanding the language and maintaining legacy is one the javascript designers were already familiar with (promises promises), and they still took it, so yeah, they had the benefit of hindsight.

                    1. 2

                      My point is that, of course Go doesn’t have as much baggage. It’s younger. JavaScript has baggage because it’s grown far beyond its original designed use case… so it hurts how people will perceive the argument if you don’t acknowledge that.

                      Compare, for example, how Rust readily acknowledges that one of the big reasons it’s more pleasant to work with than C++ in many situations is that it learned from C++’s decades of evolution without needing to maintain backwards compatibility.

                      1. 2

                        That’s not the argument for function coloring which is a relatively recent feature of JS. Which Go predates by several years, and could have adopted if worth it. Rust adopted it, and I don’t think it avoids function coloring either (although it’s not been promoted to a language feature AFAIK). It has nothing to do with JS perceived baggage, and I don’t know how being younger would have fixed it, but perhaps you can explain.

                        1. 2

                          What I’m saying is that:

                          1. JavaScript originated as a quick-and-dirty web content scripting language in 1995, so it’s not surprising that it began with only sync APIs.
                          2. That and the need for backwards compatibility meant that the only way it could reasonably add async was through function colouring.
                          3. Go originated in 2009, when the need for good async abstractions was much more visible than in 1995, and it didn’t have a sync ecosystem to remain compatible with, so it could be async-only.

                          That’s how being younger would have fixed it.

                          Not at least making a nod to this in a comparison is like implicitly faulting C for lacking Rust design decisions which must be present from the beginning and could never have been implemented on a PDP-11.

                          Rust adopted it, and I don’t think it avoids function coloring either

                          Because Rust’s primary goal is good “full-duplex integration” with C and suitability for replacing C and/or C++ in all scenarios, which means:

                          1. Sync-coloured APIs must be available.
                          2. Async cannot be baked into the language too deeply or it prevents things like the Linux kernel devs’ interest in using async/await on top of their own kernel-mode task APIs.
                          3. Rust’s view on explicitness means Zig’s approach is a non-starter.

                          …though the async WG has announced their intent to explore the idea of “keyword generics”, where you’d be able to declare a function as being generic over async-ness.

      4. 4

        Maybe it’s the Stockholm syndrome, but I kind of like that stuff gets a native implementation in Go rather than pulling in a bunch of C stuff. FFI sounds great, but cross-language building, cross-compiling, debugging, etc. are a pain (e.g., Python). I’m glad that Go makes these things a little costlier, so something has to be “better enough” to justify the non-obvious costs. Moreover, even with Rust’s great FFI story, most things are still rewritten, so the practical effects are similar.

        Go might be a walled garden, but it’s a pretty nice one IMHO.

        1. 3

          The problem is that better is different. Forcing a reimplementation of everything to be in Go to coexist with the Go runtime means that things can be better than C-world by being different, but also things are different and different is bad in itself and often not actually better.

          Anyway, Go is a language based on trade offs and some people don’t like the trade offs it makes and some people don’t like the idea of having trade offs at all.

    4. 12

      Damn, this is the post I’ve always wanted to write myself but could never put into words.

      I use python at work. It’s not my favourite language, nor is it a great development experience. It’s decent. There’s also some asyncio-based python, which accounts for the absolute worst experience I’ve ever had programming in my life: a second python ecosystem mostly incompatible with the first, nearly impossible debugging of production issues, terrible unit testing APIs, the occasional “crap, forgot to write async again and now this function does nothing”… all to support a product with barely 5 transactions per second.

      Explicit async/await based functional coloring is the worst API for managing concurrent I/O. It works for javascript because there’s nothing else, so zero chances of paradigm interference.

      Ruby got this right. Its fibers, along with the fiber scheduler interface shipped in v3, which switches automatically on certain I/O and process management APIs, allow for an event loop implementation that supports the same synchronous API as vanilla ruby, which means even very old ruby network libraries “just work”. While not perfect (there’s still no default scheduler implementation, only 3rd-party libs), it will at least ensure that its vast ecosystem will not be left behind, and rubyists won’t need to learn a second dialect.
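      A stdlib-only toy showing the cooperative switching underneath (a real scheduler, e.g. the 3rd-party async gem, triggers these switches on blocking I/O, so callers never see them):

      ```ruby
      # Two fibers interleave by yielding control; the calling code stays
      # plain synchronous Ruby. A fiber scheduler performs these switches
      # automatically whenever a fiber would block on I/O.
      order = []
      a = Fiber.new { order << :a1; Fiber.yield; order << :a2 }
      b = Fiber.new { order << :b1; Fiber.yield; order << :b2 }
      [a, b, a, b].each(&:resume)
      order # => [:a1, :b1, :a2, :b2]
      ```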

      1. 5

        I feel like explicit await points just doesn’t fit with Python’s goal of being executable pseudocode. It’s a mechanism that doesn’t fit the rest of the language.

      2. 3

        terrible unit testing APIs

        What’s terrible about:

        @pytest.mark.asyncio
        async def test_my_func():
            assert await my_func()
        

        ?

    5. 10

      We use RabbitMQ for similar things at my work but I have never run into this problem. Without knowing more of the technical details I couldn’t say for sure why that is, but my gut instinct is that it has to do with the queue topology. The way we use it, messages are always sent to topic exchanges and never directly to a queue; publishing directly to a queue always seemed to me sort of antithetical to how AMQP is designed to be used.

      Workers come in two flavors: either we need a persistent queue, in which case we establish a simple persistent queue, bind to that exchange, and then have workers reading from it. RabbitMQ automatically round-robins messages to workers. I have never really seen the bad behavior described in the article in this arrangement but it seems like, if the article’s analysis is correct, I should have. The other scenario we have is where the queue doesn’t need to be persistent, and then the worker creates an unnamed, transient, auto-delete queue and binds to the exchange with that.

      I do have one scenario in which we have data in Postgres and a RabbitMQ queue related to the same work. We did this because I thought it was unsafe to use Postgres as a queue with multiple workers, on account of MVCC and race conditions. So what I do there is something akin to, create a transient queue in RabbitMQ, load it up with data from the database, and then have workers reading off that queue; as they finish, they mark things done in Postgres as well. It would be interesting to explore a solution using row locking as described in the article.

      I really like using RabbitMQ, because it’s very responsive and the whole AMQP topology concept is very interesting to me and affords a lot of possibilities. I don’t see abandoning it, although it would be good to eliminate the double-entry system I have above if it isn’t really necessary. Overall, I see AMQP as enabling a kind of cross-language, event-driven system integration platform. If you just need a simple queue it is probably overkill.

      1. 10

        unsafe to use Postgres as a queue with multiple workers, on account of MVCC and race conditions.

        The rest of this post all makes sense to me, but postgresql has much, much better tools to deal with “ensure only one worker processes this, and only with the latest version of the data” than AMQP does.

          1. 6

            SELECT … FOR UPDATE SKIP LOCKED

            1. 1

              Right, and this also means that jobs are not available to be worked on until you commit the transaction inserting them - so workers cannot pick up a job before the associated records are present.

            2. 1

              I use the following code in a few different projects:

              DELETE FROM task
              WHERE task_id IN
              ( SELECT task_id
                FROM task
                ORDER BY RANDOM()
                FOR UPDATE
                SKIP LOCKED
                LIMIT 1
              )
              RETURNING task_id, task_type, params::JSONB AS params
              

              Note that RANDOM() is not ideal for larger tables – use TABLESAMPLE instead.

    6. 7

      sequel, the most complete, flexible and extensible database toolkit / ORM library you can find in any ecosystem. Period. In itself a bigger reason to learn ruby than rails itself.

    7. 5

      I thought this was a good list, but it’s missing one really important aspect: write documentation.

      1. 3

        And, related, write tests. Tests in a library can help users because they can read them and see if they’re using the library in an expected way. A good testing framework also makes it easy for downstream users to submit tests that ensure that you don’t break something that they depend on.

      2. 2

        I think the author didn’t want to state the obvious about making the library. It’s all about the extra bits around the design. Which is great advice. Besides that sure, you’ll have to write the functionality, the tests, the docs.

    8. 3

      rbs supports type definitions like this one, i.e. a “string” is either a primitive String or an object implementing #to_str. Wouldn’t such a thing solve your use case?
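      For instance, a hypothetical signature using RBS’s builtin `string` alias (defined as `String | _ToStr`):

      ```rbs
      # sig/greeter.rbs -- hypothetical file; `string` accepts a String
      # or any object implementing #to_str
      class Greeter
        def greet: (string name) -> String
      end
      ```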

    9. 7

      First, it’s crazy that Shopify even allowed a non-Ruby tool, given how much they invest in the Ruby ecosystem. I know a few people who have worked there, and while they do use some other programming languages, the vast majority is Ruby, and they are pretty forceful about using Ruby whenever possible.

      Second - why not just say “we switched to Node because we flipped a coin?” None of the reasons given in the article are compelling.

      Moreover, the Hydrogen team, which had built some tools on Node, started to consider building a new CLI instead of building Hydrogen workflows into the Shopify Ruby CLI so their users didn’t need a Ruby runtime in their system. Hydrogen developers expect the npm install command to resolve all the dependencies that they need to work on a project. With Ruby as a dependency, that mental model breaks and they can easily run into issues where the CLI refuses to run because of additional steps needed.

      I would have just given devs a script to setup their local env vs. rewrite the whole tool.

      1. 5

        So they don’t need a ruby runtime. But they need a node runtime. Lol.

        I would have understood if they had used the static binary excuse (and even then, mruby). They could have been honest and said “the creator left the team, we don’t like ruby in this team, but we need a story to sell node to engineering management”.

        1. 2

          Sounds exactly right to me. It’s silly - they even state outright that

          Ruby is the language most developers are familiar with

          followed by

          Shopify wants internal teams to contribute new ideas into the CLI.

          and then the illogical conclusion

          We were left with either Ruby or Node.

          followed by some blah blah nonsensical bit about Node allowing multiple versions of the same module in one project (which is highly questionable IMO).

      2. 2

        yeah that is very crazy reasoning on shopify’s part

    10. 6

      When will rust community stop begging for adoption?

      1. 19

        The dynamic is exactly the opposite: a lot of people do web, people want to try Rust => people want to do web in Rust => “are we web yet” was created as a canned answer to the stream of “how do I do $web_thing in Rust?” questions.

    11. 1

      That is, it could handle queued tasks about 5 times as quickly as an untuned Mastodon instance.

      Provided that you have 25 vCPUs, that is. I’m sure scaling up the worker count is beneficial even beyond the CPU count, but I think the overhead is noticeable.

      1. 3

        Ruby has the GIL and can’t scale beyond a single core with a single process. Having multiple vCPUs won’t do anything except let you saturate each one with a Ruby process for 25x the memory usage of a different language.

        1. 1

          Sidekiq is mostly calling imagemagick or ffmpeg. The subprocess can run on a different cpu, and the GIL (but not the current thread) is released while waiting for the subprocess to finish running.

          Ruby also has Fiber, which lets you yield control during IO without using an extra thread.

          If your workload is “Download a file, pass it to a subprocess, upload the result elsewhere” then you might want 20x as many threads as you have CPUs.

          1. 2

            What you’re stating is only half-correct. While it’s true that the GIL may be released more often during I/O-heavy loads, you still have to prove that this is where your process spends most of its time. Increasing the number of sidekiq thread workers without tuning the VM parameters, such as MALLOC_ARENA_MAX, and potentially memory compaction, will only result in a lot of page faults, and your process spending more time garbage collecting than it should.

            While ruby has fibers, sidekiq makes no use of that. So this is a non-factor for your setup.

            Another thing: sidekiq is written in ruby, not Ruby on Rails. But one downside of running rails in sidekiq (at least for older versions of rails?) is having the workers fetch a database connection early, before doing actual work, and holding on to it until the work is done. This is suboptimal, as you’ll always have to keep the max DB pool size the same as the thread count in order to avoid timeout errors acquiring connections. This severely hampers horizontal scaling. If mastodon were built on a different ruby stack, using the sequel gem, you could have a max DB pool size of M for N thread workers, where N > M, no problem. But mastodon isn’t gonna be rewritten anytime soon, so be aware of this known (anti-)pattern.

            However, if you can work your way around that known limitation, it’s always preferable to run, as an example, 5 sidekiq processes of 5 workers each, than a single process with 25 workers, due to the limitations explained above.

            Another option is running jruby.

            1. 1

              While it’s true that the GIL may be released more often during I/O loads, you still have to prove that this is where your process spends most of the time.

              I’ve worked on 30+ rails codebases since 2007, including multiple sites in the alexa 10k list. I think I’ve seen one case, in all that time, where background workers were CPU-bound. You definitely want to check, but IMO it’s a reasonable starting assumption.

              one downside of running rails in sidekiq (at least for older versions of rails?) is having the workers fetching a database connection early

              Rails added lazy-loaded connection pooling in version 2.2 (2008). If your knowledge of performance tuning rails dates to before 2008, then perhaps include that caveat in your advice?

              It’s always preferable to run, as an example, 5 sidekiq processes of 5 workers each, than a single process with 25 workers, due to the limitations explained above.

              Sidekiq doesn’t offer preload/fork (or if it does, the docs are well hidden). Without forking, 5x5 has a higher memory footprint (for mid-sized rails apps that’s plausibly ~1gb of additional ram use).

              1. 1

                Lazy-loaded connection doesn’t stop the behaviour I described.

                As for sidekiq, it offers managed forks in the enterprise version.

                1. 1

                  Perhaps I’ve misunderstood. If you aren’t starting a transaction (which reserves a connection until you finish it), why would the worker hold a connection?

                  1. 1

                    You may acquire a connection only to perform a read query. In such cases, sequel acquires, selects, returns the results, and sends the connection back to the pool. Active Record would acquire, select, and keep the connection until the request is over and the rack middleware is reached. This was probably done because the assumption is that you want to run multiple queries, so it’s better to keep the connection around. In practice, this means that the contention threshold is higher, hence the recommendation of thread / DB pool size parity.

                    I haven’t researched rails 6 or more recently, so I don’t know how much changed.
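                    A stdlib-only toy (hypothetical ToyPool class, not the Sequel or Active Record API) of why per-query checkout lets the DB pool stay smaller than the thread count:

                    ```ruby
                    # Hypothetical toy pool: each query checks a connection out and
                    # returns it immediately afterwards (the behaviour described for
                    # sequel above).
                    class ToyPool
                      def initialize(size)
                        @q = Queue.new
                        size.times { |i| @q << "conn-#{i}" }
                      end

                      def with_conn
                        conn = @q.pop            # blocks until a connection is free
                        yield conn
                      ensure
                        @q << conn if conn
                      end
                    end

                    pool = ToyPool.new(1)        # pool deliberately smaller than thread count
                    threads = 5.times.map { Thread.new { pool.with_conn { |c| c } } }
                    results = threads.map(&:value)
                    results # => ["conn-0", "conn-0", "conn-0", "conn-0", "conn-0"]
                    ```

                    With the keep-until-request-over style, each of the five threads would need its own connection, forcing pool size == thread count.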

    12. 1

      I believe it’s not that simple. A lot of applications are deployed behind a reverse proxy. Among other features, these usually decode the payload before sending it upstream, because the app server would inflate it more slowly, or might not even support the algorithm (brotli is not yet pervasive), and they remove the entry from “accept-encoding”. Then there’s the vary header to add to this salad.

      So when the request arrives at the app server and the conditional headers are sent, how does it know whether the request refers to a gzip cache or an “identity” cache?

      So the best course of action sometimes is to do nothing, until this problem becomes your biggest bottleneck.

    13. 2

      How in the world did ruby end up in a “dreaded” list?? I presume the author doesn’t like it.

      1. 6

        The 2022 Stack Overflow developer survey is out!

        And what’s fascinating to me is which popular programming languages are either loved or dreaded.

      2. 2

        Put it in perspective against the rest. Most dreaded langs are actually what runs the Internet, but see very little corporate-sponsored marketing, or usage in some popular niche fields such as ML and data science.

        1. 4

          Most dreaded langs are actually what runs the Internet,

          Not entirely fair: JavaScript was in the loved list.

          1. 3

            There’s a whole new tribe of programmers, coming out of bootcamps, who don’t know anything else. SO engagement is dominated by newcomers asking questions. I barely saw questions about cobol.

            1. 1

              Adding “How many languages are you proficient in?” (or some better phrased version) might be a good way to get to the bottom of this. You could drill into questions like do newbies like their first language and what languages do experienced programmers enjoy.

              I do agree when I think about the people I know. As they get more experienced they start leaning away from JS towards TS.

          2. 2

            I wonder if survey responders lumped TS/JS together in their head? I don’t mind working TS these days, but I ran across an older JS codebase at work recently and it’s a nightmare.

        2. 2

          Most dreaded langs are actually what runs the Internet, but see very little corporate-sponsored marketing

          Good observation. All of them but one (Java) have not had mega-corp backing, while almost all of the loved languages have.

          1. 1

            And that megacorp (if you mean oracle) hurts more than helps, in a lot of ways.

            1. 3

              Java has had the backing of many major corps, none of them oracle. Even Sun, the former owner, was small potatoes here. Every enterprise product used Java for years, and IBM and Google both put a lot of weight there for a long time.

        3. 1

          Usually I expect “dreaded” to be based on issue with the language or ecosystem, such as PHP or Java – or fear, such as C or C++.

          “It’s kinda popular even though it has no marketing.” How did that even happen if something is dreaded?

          Anyway, I’m not saying it’s not true, just that it surprised me since I’ve not encountered the people with the dread in this case.

    14. 5

      Why does nobody complain about how OpenSSL doesn’t follow the UNIX philosophy of “Do one thing well”?

      1. 34

        Probably because there’s already so many other things to complain about with openssl that it doesn’t make the top 5 cut.

      2. 18

        Because the “Unix philosophy” is incredibly vague and an ex post facto rationalization. That, and I suspect cryptography operations would be hard to do properly like that.

      3. 4

        Does UNIX follow the UNIX philosophy?

        I mean, ls has 11 options and 4 of them deal with sorting. According to the UNIX philosophy, sort should’ve been used for sorting. So “Do one thing well” doesn’t hold here. Likewise, other tenets are not followed too closely. For example, most of these sorting options were added later (“build afresh rather than complicate old programs” much?).

        The first UNIX actually didn’t have sort, so it’s understandable why an option might’ve been added (only t at the time) and why it might’ve stayed (backwards compatibility). The addition of sort kinda follows the UNIX philosophy, but adding more sorting options to ls after sort existed goes completely contrary to it.

        1. 3

          Theoretically, yes: it seems that Bell Labs’ UNIX followed the UNIX philosophy, but BSD broke it.

          Reference: http://harmful.cat-v.org/cat-v/

      4. 4

        Everyone’s still wondering if the right way to phrase it is that “it does too many things” or “it doesn’t do any of them well” ¯\_(ツ)_/¯

      5. 2

        Maybe because it’s not really a tool you’re expected to use beyond a crypto swiss army knife. I mean, it became a defacto certificate request generator, because people have it installed by default, but there are better tools for that. As a debug tool it is a “one thing well” tool. The one thing is “poke around encryption content / functions”.

        Otherwise, what would be the point of extracting things like asn1parse, pkey, or the others if they’d still be backed by the same library? Would it change anything if you called openssl-asn1parse as a separate tool instead of openssl asn1parse?

      6. 1

        For the same reason no one complains about curl either?

        1. 1

          related, here’s a wget gui that looks similarly complex https://www.jensroesner.com/wgetgui/#screen

    15. 6

      I like the idea of GitLab and even bought into it for my last company with an Ultimate subscription for the features they offered, but I probably should’ve gone with something I was familiar with (GitHub) because of stability issues of the platform itself. Everything felt a little half-baked as they tried to give you everything, just not well.

      1. 8

        Bringing it back to the topic, I’m pretty sure Gitlab had CI before Github had actions. Not that this means anything I guess, you’d use a third party service with Github.

        Competition is good but I think I prefer the CI format of Gitlab even though I’ve been a user on Github for longer. The smaller self-hosted git apps afaik are simple to run but don’t have the feature set. Gitlab is quite a thing to install and manage last I did, so not without trade-offs. I’m impressed with the Gitlab Team’s iteration speed. You can open an issue if there’s a problem, because it is open source. Github itself is not open source at all and this is usually a huge sticking point for any other tool thread.

        But whatever, not trying to argue. We can’t really measure or describe software, so it’s just chit-chat. 🌻

        1. 2

          I empathize with people wanting an open source tool, but GitLab is not approachable and it’s hard to engage them even when you’re a paying customer for features. There was also the stability problem they were having and they took everyone off of working on features and deliverables to improve platform stability. I don’t know if that ever had an effect, but they did it.

          Its CI does predate Actions, but before Actions, you’d just plug in Travis or Circle anyway (and from the article, I may choose to do so for a non-GitLab, non-GitHub offering in the future).

      2. 3

        Half-baked? Can you even rerun a single job in github yet? If anything is half-baked, it’s actions.

          1. 1

            Wow, since yesterday. My point still stands then :)

      3. 2

        “Half baked” is a good way to put it. I felt that even with the little time I spent with it. I know GitLab has its fans, and I’m sure it’s great under the right circumstances, but if you want to host your code and run some CI all in the same place, Github is just way ahead of GitLab.

        1. 5

          The thing that bugs me is that a lot of people think about moving off GitHub (which is great; they totally should; monopolies are bad, etc) but then they go look at GitLab and assume that because it’s the biggest “github alternative” that it’s also the best. Then they see this half-baked stuff and then go back to GitHub not realizing that they are missing out on much better alternatives like Gitea and Sourcehut.

          1. 4

            But gitea isn’t hosted. You’ll have to host it yourself. I think it’s great to have that option, but it’s very different from simply moving from github to gitlab.

            1. 2

              Sure; I guess in that context https://codeberg.org or one of the other hosted gitea sites would be a better comparison.

    16. 2

      So as usual, consider not writing Ruby, but if you do, use type signatures, and maybe 100% branch coverage too, as annoying as it seems.

      Could someone more into the Ruby ecosystem explain why the OP would write this paragraph about “not writing Ruby”? Is Ruby a dying language?

      1. 5

        strong opinion syndrome

      2. 4

        People who think only strictly typed languages are worthy are prone to not recommending dynamically typed languages like Ruby. I like Ruby and I have made extensive refactors without Sorbet & co. by having confidence in my tests. My colleague who swears by C# and Typescript will talk your ears off “proving” how such a thing is impossible.

        To each their own and let’s stop trying to find the one size fits all magical tool.

      3. 4

        OP was a big ruby community member with many contributions to his name, who fell out with ruby, first for not solving the language and runtime issues he thought were the most important, and then for solving them in a way he didn’t like. He does go now.

      4. 3

        If you want what this post wants (good types and safety rails) Ruby is a bad choice for those things. Sure there is sorbet and writing a billion tests, but even Crystal would be a better choice if those things are your priority.

        If what you want is to play with a highly malleable environment and don’t want smalltalk, then Ruby is where it’s at, though.

      5. 2

        I can’t speak about the OP, but Ruby definitely isn’t dying. It might not be growing, but it is keeping a good level and there are new devs coming to Ruby all the time.

      6. 1

        They are into Go, based on the other posts on the blog.

    17. 1

      I used to work for an ISP where even development was on machines you had to ssh into, behind a proxy to the Internet, like what’s described in the article.

      I wondered what the EXTRA_CERTS variables were for. HTTPS is tunneled via CONNECT; does that mean that certificates to verify Internet TLS connections have to be provided and exposed by the sysadmin?

      The author doesn’t mention ruby, but there is a very handy function in the stdlib, URI.find_proxy, which abstracts the proxy env var discovery described in the article and which every other stdlib network library uses. Sadly, there are a bunch of non-stdlib libraries that ignore it, among them a few popular HTTP libraries.
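      For example (stdlib only; hypothetical proxy host):

      ```ruby
      require "uri"

      # Clean slate for the demo, then point http_proxy at a hypothetical host.
      ENV.delete("no_proxy")
      ENV.delete("NO_PROXY")
      ENV["http_proxy"] = "http://proxy.example:3128"

      # URI#find_proxy performs the env var discovery (http_proxy/https_proxy/
      # no_proxy) that many third-party HTTP libraries skip.
      proxy = URI("http://example.com/").find_proxy
      proxy.to_s # => "http://proxy.example:3128"
      ```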

      1. 1

        I assume they had to add to the CA bundle for either or both of:

        • a bunch of services (e.g. caching pypi mirror) with certs signed by an internal CA because someone thought that was easier than using letsencrypt, or…
        • a proxy that MITMs all https connections (by generating its own certs on the fly from its own CA cert) and requires the client to trust the CA cert that belongs to the MITMing proxy (¹). These are often used as “data loss prevention” proxies by big organisations. (I hate these things because they tend to be unreliable and buggy, and users report the DLP proxy’s bugs and downtime to the vendors of every SaaS product they use, rather than to the people who run the broken proxy.)

        (¹ pedantically, there might be an intermediate CA cert in between, but it doesn’t matter much.)

    18. 6

      “Pay them” is bullshit. Why? These OSS projects are public and free because the creator chose so. Individuals from companies decided to use them because they chose so, out of their perceived usefulness in solving their problem. Their employers allowed and incentivised it. And when shit happened, the project maintainers were not obliged to spend “countless hours” fixing the problem for those companies which don’t pay them; they willingly chose to. And if they don’t care, they can happily leave the project behind right now and let the rest of the enterprise world pick up the pieces of what they left behind. This is the contract.

      This thankless martyrdom speech misses the point. The whole java world uses log4j because it’s free. If it wasn’t, they’d probably reconsider using a logging library from which they need 10% of the features, which they could implement themselves in reasonable time and maintain in isolation. Because it’s free, they should have probably audited it. But the githubs and snyks and multiple package managers and CIs made it so trivial to outsource this concern that we’re never prepared for the blast radius of popular packages chosen based on the number of github stars, and then we run to the public discussion forums that are github issues, where maintainers are publicly harassed about why jeff bezos is not going to get the same revenue dividend this month. How is money supposed to fix that?

      I personally do OSS because I enjoy it, I get a kick out of it, and I get to go deep on topics I can’t afford to spend time on at my day job. I try to balance the time I spend doing it, and when I am not feeling it, I stop for a bit. I am cordial when people report bugs, and I don’t give them timelines; it’s fixed when it’s fixed. And I don’t have donations set up. I work when I want. I quit when I want. If you don’t like it, fork it, and respect the terms of the license I chose. And that’s that.

    19. 70

      Nobody knows how to correctly install and package Python apps.

      That’s a relief. I thought I was the only one.

      1. 8

        Maybe poetry and pyoxidize will have a baby and we’ll all be saved.

        One can hope. One can dream.

        1. 4

          After switching to poetry, I’ve never really had any issues.

          pip3 install --user poetry
          git clone...
          cd project
          poetry install
          poetry run python -m project
          

          You can put the whole install sequence in a Docker container, push it in your CI/CD to ECR/Gitlab or whatever repo you use, and just include both the manual steps and the docker command in your readme. Everyone on your team can use it. If you find an issue, you can add that gotcha to the docs.

          Python is fine for systems programming so long as you write some useful unit tests and enforce pycodestyle. You lose the type safety of Go and Rust, yes, but I’ve found it’s way faster to write. Of course, if you need something super high-performance, Go or Rust should be what you look towards (or JVM languages like Kotlin/Java/Scala if you don’t care about startup time or memory footprint). And of course, it depends on what talent pools you can hire from. Use the right tool for the right job.

          1. 2

            I’ve switched to poetry over the last several months. It’s the sanest installing python dependencies has felt in quite a few years. So far I prefer to export it to requirements.txt for deployment. But it feels like about 95% of the right answer.

            It does seem that without some diligence, I could be signing up for some npm-style “let’s just lock in all of our vulnerabilities several versions ago” and that gives me a little bit of heartburn. From that vantage point, it would be better, IMO, to use distro packages that would at least organically get patched. I feel like the answer is to “just” write something to update my poetry packages the same way I have a process to keep my distro packages patched, but it’s a little rotten to have one more thing to do.

            Of course, “poetry and pyoxidize having a baby” would not save any of this. That form of packaging and static linking might even make it harder to audit for the failure mode I’m worrying about here.

        2. 1

          What are your thoughts on pipenv?

      2. 5

        I’d make an exception to this point: “…unless you’re already a Python shop.” I did this at $job and it’s going okay because it’s just in the monorepo where everyone has a Python toolchain set up. No installation required (thank god).

      3. 4

        I think the same goes for running Python web apps. I had a conversation with somebody here… and we both agreed it took us YEARS to really figure out how to run a Python web app. Compared to PHP where there is a good division of labor between hosting and app authoring.

        The first app I wrote was CGI in Python on shared hosting, and that actually worked. So that’s why I like Unix – because it’s simple and works. But it is limited because I wasn’t using any libraries, etc. And SSL at that time was a problem.

        Then I moved from shared hosting to a VPS. I think I started using mod_python, which is the equivalent of mod_php – a shared library within Apache.

        Then I used a CherryPy server and WSGI. (mod_python was before WSGI existed) I think it was behind Apache.

        Then I moved to gunicorn behind nginx, and I still use that now.

        But at the beginning of this year, I made another small Python web app with Flask. I managed to configure it on shared hosting with FastCGI, so Python is just like PHP now!!! (Although I wouldn’t do this for big apps, just personal apps).

        So I went full circle … while all the time I think PHP stayed roughly the same :) I just wanted to run a simple app and not mess with this stuff.

        There were a lot of genuine improvements, like gunicorn is better than CherryPy, nginx is easier to config than Apache, and FastCGI is better than CGI and mod_python … but it was a lot of catching up with PHP IMO. Also FastCGI is still barely supported.

        1. 2

          nginx, uWSGI, supervisord. Pretty simple to setup for Flask or Django. A good shared hosting provider for Python is OpalStack, made by the people who created Webfaction (which, unfortunately, got gobbled up by GoDaddy).

          I cover the deployment options and reasoning in my popular blog post, “Build a web app fast: Python, JavaScript & HTML resources”. Post was originally written in 2012 but updated over the years, including just this month. See especially the recommended stack section at the end, starting at “Conclusion: pick a stack”, if you want to ctrl+f for that section. You can also take a peek at how OpalStack describes their Python + uWSGI + nginx shared hosting setup here. See also my notes on the under the hood configuration for nginx, uWSGI, and supervisord in this presentation, covered in the 5-6 sections starting from this link.

          You’re right that there are a lot of options for running a Python web app. But nginx, uWSGI, supervisord is a solid option that is easy to configure, high performance, open source, UNIXy, and rock solid. For dependency management in Python 3.x you can stick with pip and venv, remotely configured on your server via SSH.

          My companies have been using this stack in production at the scale of hundreds of thousands of requests per second and billions of requests per month – spanning SaaS web apps and HTTP API services – for years now. It just works.

          1. 2

            I’m curious, now that systemd is available in almost all Linux distributions by default, why are you still using supervisord? To me it feels like it is redundant. I’m very interested.

            1. 1

              I think systemd can probably handle the supervisord use cases. The main benefit of supervisord is that it runs as whatever $USER you want without esoteric configuration, and it’s super clear it’s not for configuring system services (since that’s systemd’s job). So when you run supervisorctl and list on a given node, you know you are listing “my custom apps (like uwsgi or tornado services)”, not all the system-wide services as well as my custom app’s ones. Also this distinction used to matter more when systemd was less standard across distros.

              1. 1

                Understood! Thanks very much for taking the time to explain!

          2. 1

            Hm thanks for the OpalStack recommendation, I will look into it. I like shared hosting / managed hosting but the Python support tends to be low.

            I don’t doubt that combination is solid, but I think my point is more about having something in the core vs. outside.

            PHP always had hosting support in the core. And also database support. I recall a talk from PHP creator Rasmus saying how in the early days he spent a ton of time inside Apache, and contributed code to Apache. He also added some kind of data-limiting support to SQL databases to keep them stable. So he really did create “LAMP”, whereas Python had a much different history (which is obviously good and amazing in its own way, and why it’s my preferred language).

            Similar to package management being outside the core and evolving lots of 3rd party solutions, web hosting was always outside the core in Python. Experts knew how to do it, but the experience for hobbyists was rough. (Also I 100% agree about not developing on Windows. I was using Python on Windows to make web apps from ~2003-2010 and that was a mistake …)

            It obviously can be made to work, I mean YouTube was developed in Python in 2006, etc. I just wanted to run a Python web app without learning about mod_python and such :) Similarly I wish I didn’t know so much about PYTHONPATH!

            1. 1

              I agree with all that. This is actually part of the reason I started playing with and working on the piku open source project earlier this year. It gives Python web apps (and any other Python-like web app programming environments) a simple git-push-based deploy workflow that is as easy as PHP/Apache used to be, but also a bit fancier, too. Built atop ssh and a Linux node bootstrapped with nginx, uWSGI, anacrond, and acme.sh. See my documentation on this here:

              https://github.com/amontalenti/webappfast-piku#build-a-web-app-fast-with-piku

              1. 1

                Very cool, I hadn’t seen piku! I like that it’s even simpler than dokku. (I mentioned dokku on my blog as an example of something that started from a shell script!)

                I agree containers are too complex and slow. Though I think that’s not fundamental, and is mostly Docker … In the past few days, I’ve been experimenting with bubblewrap to run containers without Docker, and with different tools for building containers without Docker. (podman is better, but it seems like it’s only starting to get packaged on Debian/Ubuntu, and I ran into packaging bugs.)

                I used containers many years ago pre-Docker, but avoided them since then. But now I’m seeing where things are at after the ecosystem has settled down a bit.

                I’m a little scared of new Python packaging tools. I’ve never used pyenv or pipx; I use virtualenv when I need it, but often I just manually control PYTHONPATH with shell scripts :-/ Although my main language is Python, I also want something polyglot, so I can reuse components in other languages.

                That said I think piku and Flask could be a very nice setup for many apps and I may give it a spin!

                1. 1

                  It’s still a very new and small project, but that’s part of what I love about it. This talk on YouTube gives a really nice overview from one of its committers.

          3. 1

            In addition to @jstoja’s question about systemd vs supervisord, I’d be very curious to hear what’s behind your preference for nginx and uWSGI as opposed to caddy and, say, gunicorn. I kind of want caddy to be the right answer because, IME, it makes certificates much harder to screw up than nginx does.

            Have you chosen nginx over caddy because of some gotcha I’m going to soon learn about very unhappily?

            1. 2

              Simple answer: age/stability. nginx and uWSGI have been running fine for a decade+ and keep getting incrementally better. We handle HTTPS with acme.sh or certbot, which integrate fine with nginx.

              1. 1

                That’s a super-good point. I’m going to need to finish the legwork to see whether I’m willing to bet on caddy/gunicorn being as reliable as nginx/uWSGI. I really love how terse the Caddy config is for the happy path. Here’s all it is for a service that manages its own certs using LetsEncrypt, serves up static files with compression, and reverse proxies two backend things. The “hard to get wrong” aspect of this is appealing. Unless, of course, that’s hiding something that’s going to wake me at 3AM :)

      4. 3

        Why is Python’s packaging story so much worse than Ruby’s? Is it just that dependencies aren’t specified declaratively in Python, but in code (i.e. setup.py), so you need to run code to determine them?
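        For concreteness, a classic setup.py is ordinary Python that *computes* its dependency list, so a tool has to execute the file just to learn what a package needs. A hypothetical sketch (package and dependency names are illustrative):

        ```python
        import sys

        # Dependencies are logic, not declaration: this list changes
        # depending on which interpreter runs the file.
        deps = ["requests"]
        if sys.version_info < (3, 8):
            deps.append("importlib-metadata")  # conditional backport

        # In a real setup.py this would be handed to setuptools:
        #   from setuptools import setup
        #   setup(name="demo", version="1.0", install_requires=deps)
        print(deps)
        ```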

        1. 9

          I dunno; if it were me I’d treat Ruby exactly the same as Python. (Source: worked at Heroku for several years and having the heroku CLI written in Ruby was a big headache once the company expanded to hosting more than just Rails apps.)

          1. 3

            I agree. I give perl the same handling, too. While python might be able to claim a couple of hellish innovations in this area, it’s far from alone here. It might simply be more attractive to people looking to bang out a nice command line interface quickly.

        2. 6

          I think a lot of it is mutable global state like PYTHONPATH, which is what populates sys.path. The OS, the package managers, and the package authors often fight over it, which leads to unexpected consequences.

          It’s basically a lack of coordination… it kinda has to be solved in the core, or everybody else is left patching up their local problems, without thinking about the big picture.
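          A minimal sketch of how mutable that state is (the inserted path is hypothetical):

          ```python
          import sys

          # sys.path is a plain mutable list, seeded from PYTHONPATH and the
          # interpreter's install layout -- and anything can rewrite it at runtime.
          sys.path.insert(0, "/opt/vendored-libs")

          # Every subsequent import now resolves against the new entry first,
          # which is how the OS, package managers, and application code end up
          # silently shadowing each other's packages.
          print(sys.path[0])
          ```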

          Some other reasons off the top of my head:

          • Python’s import mechanism is very dynamic, and also inefficient. So the language design kind of works against the grain of easy distribution, although it’s a tradeoff.
          • There’s a tendency to pile more code and “solutions” on top rather than redoing things from first principles. That is understandable because Python has a lot of users. But there is definitely a big mess with distutils + setuptools + pip + virtualenv, plus a few other things.
          • Package managers are supposed to solve versioning issues, and then you have the tricky issue of the version of the package manager itself. So in some sense you have to get a few things right in the design from the beginning!
        3. 5

          Ruby’s packaging story is pretty bad, too.

            1. 4

              I don’t know, it’s been a long time since I’ve written any Ruby. All I know is that we’re migrating the Alloy website from Jekyll to Hugo because nobody could get Jekyll working locally, and a lot of those issues were dependency related.

        4. 4

          Gemfile and gemspec are both just ruby DSLs and can contain arbitrary code, so that’s not much different.

          One thing is that PyPI routinely distributes binary blobs called “wheels”, which can be built in arbitrarily complex ways, whereas rubygems always builds from source.

          1. 5

            Not true. Ruby has always been able to package and distribute precompiled native extensions; it’s just that it wasn’t the norm in a lot of popular gems, including nokogiri. Which, by the way, ships precompiled binaries now, taking a couple of seconds to install where it used to take 15 minutes, and now there’s an actual toolchain for targeting multi-arch packaging, and the community is catching up.

            1. 2

              Hmm, that’s very unfortunate. I haven’t run into any problems with gems yet, but if this grows in popularity the situation could easily get as bad as pypi.

          2. 1

            Thanks for the explanation, so what is the fundamental unfixable issue behind Python’s packaging woes?

            1. 1

              I could be wrong but AFAICT it doesn’t seem to be the case that the Ruby crowd has solved deployment and packaging once and for all.

      5. 2

        I just run pkg install some-python-package-here using my OS’s package manager. ;-P

        It’s usually pretty straightforward to add Python projects to our ports/package repos.

        1. 3

          Speaking from experience, that works great up until it doesn’t. I have “fond” memories of an ex-coworker who developed purely on Mac (while the rest of the company at the time was a Linux shop), aggressively using docker and virtualenv to handle dependencies. It always worked great on his computer! Sigh. Lovely guy, but his code still wastes my time to this day.

          1. 1

            I guess I’m too spoiled by BSD where everything’s interconnected and unified. The ports tree (and the package repo that is built off of it) is a beauty to work with.

            1. 4

              I’m as happy to be smug as the next BSD user but it isn’t justified in this case. Installing Python packages works for Python programs installed from packages but:

              • They don’t work well in combination with things not in packages, so if you need to use pip to install some things you may end up with conflicts.
              • The versions in the package repo may or may not be the ones needed by the thing you want to install that isn’t in packages, and may conflict with the ones it needs.
              • The Python thing may depend on one of the packages that depends on Linux-specific behaviour. The most common of these is that signals sent to the process are delivered to the first thread in the process.

              In my experience, there’s a good chance that a Python program will run on the computer of the author. There’s a moderately large chance that it will run on the same OS and version as the author. Beyond that, who knows.

            2. 3

              I mean, we used Ubuntu, which is pretty interconnected and unified. (At the time; they’re working on destroying that with snap.) It just often didn’t have quiiiiiite what we, or at least some of us, wanted and so people reached for pip.

              1. 1

                Yeah. With the ports tree and the base OS, we have full control over every single aspect of the system. With most Linux distros, you’re at the whim of the distro. With BSD, I have free rein. :-)

                1. 3

                  But it could still be the case that application X requires Python 3.1 when application Y requires Python 3.9, right? Or X requires version 1.3 of library Z which is not backwards compatible with Z 1.0, required by Y?

                  1. 3

                    The Debian/Ubuntu packaging system handles multiple versions without any hassle. That’s one thing I like about it.

                    1. 1

                      Does it? Would love to read more about this if you have any pointers!

                      1. 2

                        I guess the main usability thing to read about is the alternatives system.

                  2. 2

                    The ports tree handles multiple versions of Python fine. In fact, on my laptop, here’s the output of: pkg info | grep python:

                    py37-asn1crypto-1.4.0          ASN.1 library with a focus on performance and a pythonic API
                    py37-py-1.9.0                  Library with cross-python path, ini-parsing, io, code, log facilities
                    py37-python-docs-theme-2018.2  Sphinx theme for the CPython docs and related projects
                    py37-python-mimeparse-1.6.0    Basic functions for handling mime-types in Python
                    py37-requests-toolbelt-0.9.1   Utility belt for advanced users of python-requests
                    py38-dnspython-1.16.0          DNS toolkit for Python
                    python27-2.7.18_1              Interpreted object-oriented programming language
                    python35-3.5.10                Interpreted object-oriented programming language
                    python36-3.6.15_1              Interpreted object-oriented programming language
                    python37-3.7.12_1              Interpreted object-oriented programming language
                    python38-3.8.12_1              Interpreted object-oriented programming language
                    
      6. 1

        Fwiw, I’ve had good luck using Pyinstaller to create standalone binaries. Even been able to build them for Mac in Circleci.

      7. 1

        It can feel a bit like overkill at times, but I’ve had good luck with https://www.pantsbuild.org/ to manage python projects.

    20. 9

      Also from TFA:

      This attack vector isn’t unique to npm. Other package managers like pip and RubyGems allow for the same thing.

      1. 7

        Yup, Cargo as well. From rust-secure-code/cargo-sandbox#3:

        tl;dr: build-time attacks are stealthier than trojans in build targets, and permit lateral movement between projects when attacking a build system. The threat of a build-time trojan, versus a source code trojan, is an attack that does not leave behind forensic evidence and is therefore harder to investigate. Attacking a build system also potentially permits lateral movement between build targets.

        I don’t know what the state of the working group is, but there was definitely some interest in fixing this for Cargo.

      2. 2

        Worth noting that the Python world at least has been trying to move away from an install-time “build” step/executable package manifest. The first big chunk of that was the introduction, years ago, of the wheel (.whl) package format, which contains everything – including compiled extensions – already built and organized, so that the install step can just consist of unpacking it and putting files in the correct locations.

        There also has been support for many years for declarative package manifests; originally as setup.cfg, and now with the package-related APIs being standardized and genericized, coalescing around the pyproject.toml file.
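        For illustration, here is a minimal hypothetical pyproject.toml manifest: the dependencies are plain data that tools can read without executing anything (names and versions are made up):

        ```toml
        [project]
        name = "demo"
        version = "1.0"
        dependencies = [
            "requests>=2.0",
        ]

        [build-system]
        requires = ["setuptools>=61"]
        build-backend = "setuptools.build_meta"
        ```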

        1. 1

          I don’t think wheels opt you out of a post-install step, right? A Python package can ship with precompiled extensions and still invoke the post-install. So I don’t think wheels solve the underlying security issue here; they just avoid forcing everyone to install gfortran and make/gcc and sit through long compilation times.

          FWIW RubyGems integrates with diffend, a service which continuously inspects uploaded packages and does automated security auditing.

          1. 3

            I know of no hooks in the wheel format (spec is here) which would allow a wheel package to execute custom code on the destination machine, whether as pre-, during-, or post-install.

            Again, a wheel is literally a zip file containing everything pre-built; installing a wheel consists of unzipping it and moving the files to their destinations (which are statically determined). You can read the full process spec, but I do not see any mention of a post-install scripting hook in there, and strongly suspect you are either misremembering or misinformed.
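            To make the point concrete, this builds a toy (entirely hypothetical) wheel in memory: just package code plus a .dist-info directory of static metadata, with no setup.py anywhere to run.

            ```python
            import io
            import zipfile

            # A wheel is literally a zip archive with a fixed layout.
            buf = io.BytesIO()
            with zipfile.ZipFile(buf, "w") as whl:
                whl.writestr("demo/__init__.py", "")
                whl.writestr("demo-1.0.dist-info/METADATA",
                             "Metadata-Version: 2.1\nName: demo\nVersion: 1.0\n")
                whl.writestr("demo-1.0.dist-info/WHEEL",
                             "Wheel-Version: 1.0\nRoot-Is-Purelib: true\n")
                whl.writestr("demo-1.0.dist-info/RECORD", "")

            # "Installing" it is just unzipping and moving files into place:
            with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as whl:
                names = sorted(whl.namelist())
            print(names)
            ```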

            1. 1

              I’m not super familiar with the spec, but I thought it was possible to release a package with a wheel and a setup.py script, where the post install hook would be defined. I’ll read the links.

      3. 1

        Is maven similar? I’m pretty certain it’s doing the same, but I’m not sure.

        Although, most people use some private repository. But they still initialize it with some public code.