Threads for mperham

    1. 5

      Congrats on the release!

      It was an absolute delight to discover that modern CSS, with its support for variables, can be used without a build process.

      I went on my journey to that realization over the last 18 months. I haven’t phased out SCSS everywhere yet, but once native CSS nesting crosses 95% support, I’ll default to vanilla CSS, and be skeptical of deviating in any new projects.

      1. 1

        I was curious how this looked in terms of code… I’m a little surprised at how concise/spartan it is. There doesn’t seem to be a css reset even? Is that not needed anymore?

        Stylesheet:

        https://github.com/sidekiq/sidekiq/blob/main/web/assets/stylesheets/style.css

        Layout (html page template):

        https://github.com/sidekiq/sidekiq/blob/main/web/views/layout.erb

        A view:

        https://github.com/sidekiq/sidekiq/blob/main/web/views/scheduled.erb

        Looks quite reasonable - but also looks like they’re not really using their semantic variables that much (yet)? Like theme/custom color support?

        1.  

          There doesn’t seem to be a css reset even? Is that not needed anymore?

          Modern browsers differ much less in terms of default styles, so not really, unless you want an absolutely pixel-perfect, identical design across browsers.

          1.  

            The main purpose was to DRY up the CSS. It makes theming possible but not really something that makes sense for Sidekiq and my users so I haven’t spent any time or thought on it.

            1.  

              That makes sense. I suppose I expected to find a complete bootstrap replacement - rather than “just the laces we thread”. Obviously the simpler solution is better when that’s all you need.

        2. 4

          As a person in tech, my only advice for future generations is avoid adding tech to your daily life. More than anything, Big Tech’s main purpose is to add rent seekers to your life.

          1. 3

            This always gets me - as a technologist, people are always so surprised to hear I use a dumb phone, hate AI and typically go for the low-tech option when there is one. Why is this still surprising though? I guess “normal” humans don’t spend a single second thinking about the negative sides of technology…

          2. 4

            Random sidenote: I wish there were standard shortcuts or aliases for frequently typed commands. It’s annoying to type systemctl daemon-reload after editing a unit; why not systemctl dr? Or debugging a failed unit: journalctl -xue myunit seems unnecessarily arcane, why not --debug or something friendlier?

            1. 5

              I’m using these:

              alias sc="sudo LESSSECURE_ALLOW=lesskey SYSTEMD_LESS='$LESS' systemctl"
              alias jc="sudo LESSSECURE_ALLOW=lesskey SYSTEMD_LESS='$LESS' journalctl"
              

              This is shorter to type, completion still works, and I get my less options.

              1. 3

                Typing this for me looks like sy<tab><tab> d<tab> - doesn’t your shell have systemd completions?

                1. 1

                  It does but what you describe doesn’t work for me.

                  $ systemctl d
                  daemon-reexec  daemon-reload  default        disable
                  
                  1. 2

                    What doesn’t work? In any modern shell, when you are here and type tab twice, you will get to daemon-reload. Ex: https://streamable.com/jdedh6

                    1. 1

                      Your shell doesn’t show a tab-movable highlight when such a prompt appears? If so, try that out. It’s a very nice feature.

                  2. 3

                    journalctl -u <service> --follow is equally annoying

                    1. 15

                      journalctl -fu

                      1. 3

                        My favorite command in all linux. Some daemon is not working. F U Mr. Daemon!

                        1. 2

                          so this does exist - I could swear I tried that before and it didn’t work

                          1. 19

                            I wasn’t sure whether to read it as short args or a message directed at journalctl.

                            1. 1

                              Thankfully it can be both! :)

                            2. 1

                              You gotta use -fu not -uf, nothing makes you madder than having to follow some service logs :rage:

                              1. 13

                                That’s standard getopt behaviour.
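                                For anyone puzzled: with getopt-style short options, flags bundle, and an option that takes a value consumes the rest of the bundle (or, if the bundle is exhausted, the next word). A sketch with POSIX getopts; the demo function and the optstring fu: are hypothetical stand-ins, not journalctl's actual parser:

```shell
# POSIX getopts sketch: in the optstring "fu:", f takes no argument and
# u requires one, so "-fu myunit" parses as -f plus "-u myunit",
# while "-uf myunit" parses as -u with argument "f".
demo() {
  OPTIND=1
  follow=no
  unit=
  while getopts fu: opt "$@"; do
    case $opt in
      f) follow=yes ;;
      u) unit=$OPTARG ;;
    esac
  done
  echo "follow=$follow unit=$unit"
}

demo -fu myunit   # prints: follow=yes unit=myunit
demo -uf myunit   # prints: follow=no unit=f
```

                                So -uf really does make f the "unit", which is why only -fu does what you want.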

                                1. 2

                                  Well, I guess fu rolls off the tongue better than uf. But I remember literally looking up whether there wasn’t anything like -f and having issues with that. Oh well.

                          2. 3

                            Would it be “too clever” for systemd to watch for unit files to change and reload the affected unit automagically when one changes?

                            1. 13

                              I’m not sure it would be “clever”. At best it would make transactional changes (i.e. changes that span several files) hard, at worst impossible. It would also be a weird editing experience when just saving activates the changes.

                              1. 2

                                I wonder why changes should need to be transactional? In Kubernetes we edit resource specs—which are very similar to systemd units—individually. Eventual consistency obviates transactions. I think the same could have held for systemd, right?

                                1. 6

                                  I wonder why changes should need to be transactional

                                  Because the services sd manages are more stateful. If sd restarted every service the moment its on-disk unit file changes [1], desktop users, database admins, etc. would have a terrible experience.

                                  [1] say during a routine distro upgrade.

                            2. 3

                              Shorter commands would be easier to type accidentally. I approve of something as powerful as systemctl not being that way.

                              Does tab completion not work for you, though?

                            3. 4

                              The wording of the blog post confused me, because in my mind “FFI” (Foreign Function Interface) usually means “whatever the language provides to call C code”, so in particular the comparison between “the function written as a C extension” and “the function written using the FFI” is confusing, as they sound like they are talking about the exact same thing.

                              The author is talking specifically about a Ruby library called FFI, where people write Ruby code that describes C functions and then the library is able to call them. I would guess that it interprets foreign calls, in the sense that the FFI library (maybe?) inspects the Ruby-side data describing their interface on each call, to run appropriate datatype-conversion logic before and after the call. I suppose that JIT-ing is meant to remove this interpretation overhead – which is probably costly for very fast functions, but not that noticeable for longer-running C functions.

                              Details about this would have helped me follow the blog post, and probably other people unfamiliar with the specific Ruby library called FFI.
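                              For the unfamiliar, that model (describe a C function's signature in Ruby, let the runtime convert arguments and results on every call) can be sketched with Fiddle, Ruby's stdlib FFI; binding libc's strlen here is just a convenient example, not anything from the blog post:

```ruby
require 'fiddle'
require 'fiddle/import'

# Fiddle, like the ffi gem, lets Ruby code describe a C function's
# signature; the runtime then converts Ruby objects to C types (and
# back) on every call, which is the per-call overhead a JIT could remove.
module LibC
  extend Fiddle::Importer
  dlload Fiddle::Handle::DEFAULT  # search symbols already loaded into the process
  extern 'unsigned long strlen(const char *s)'
end

LibC.strlen('hello')  # => 5
```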

                              1. 1

                                Replying to myself: I wonder why the author needs to generate assembly code for this. I would assume that it should be possible, on the first call to this function, to output C code (using the usual Ruby runtime libraries that people use for C extensions) to a file, call the C compiler to produce a dynamic library, dlopen the result, and then (on this first call and all future calls) just call the library code. This would probably get similar performance benefits and be much more portable.

                                1. 2

                                  I would guess because that would require shipping a C compiler? Ruby FFI/C extensions are compiled before runtime; the only thing you need to ship to prod is your code, Ruby, and the build artifacts.

                                  1. 1

                                    This is essentially how MJIT worked.

                                    https://www.ruby-lang.org/en/news/2018/12/06/ruby-2-6-0-rc1-released/ https://github.com/vnmakarov/ruby/tree/rtl_mjit_branch#mjit-organization

                                    Ruby has since evolved very fast on the JIT side, spawning YJIT, RJIT, now FJIT…

                                    I’m also not sure the portability is needed here. Ruby is predominantly run on x86, at least at the scale where these optimisations matter.

                                    1. 3

                                      Apple Silicon exists and is quite popular for development

                                      1. 1

                                        You’re correct. I was referring to deployment systems (where the last bit of performance matters) and should have been clearer about that.

                                        1. 2

                                          Even in production, ARM64 is getting more common these days, because of AWS Graviton et al.

                                          But yes, x86_64 is still the overwhelming majority of production deployments.

                                          1. 1

                                            Yeah, that’s why I wrote “predominantly”. Also, for such a localised JIT, a second port to aarch64 is not that hard. You just won’t have an eBPF port falling out of your compiler (this is absurd for effect, I know this isn’t a reasonable thing).

                                      2. 2

                                        Note that the point here is not to JIT arbitrary Ruby code, which is probably quite hard, but a “tiny JIT” to use compilation rather than interpretation for the FFI wrappers around external calls. (In fact it’s probably feasible, if a bit less convenient, to set things up to compile these wrappers ahead-of-time.)

                                  2. 9

                                    The paper:

                                    https://arxiv.org/abs/2501.02305

                                    This is the first PoC I could find, please post others if you find them.

                                    https://github.com/MWARDUNI/ElasticHashing

                                    1. 14

                                      I haven’t read the full paper so I can’t compare the PoC to it properly, but I doubt the PoC is correct. The storage partition sizes are never used! In the code this translates to the contents of self.arrays never being read; only its length matters.

                                      Also it has weird implementation details, the one that surprised me most is it uses a set to track what was inserted, and checks that on each insertion. This defeats the point of implementing a custom hash table!

                                      1. 8

                                        Maybe the implementation was written by an LLM?

                                        1. 1

                                          This defeats the point of implementing a custom hash table!

                                          Hilarious!

                                          1. 1

                                            Teacher: “implement a list” Me: “return list(x)”

                                        2. 2

                                          Wait but why would I want the laptop to randomly turn on while in sleep mode

                                          1. 3

                                            Nice to see that Spritely has been noticed by Mike Perham. Before Spritely I worked on Rails applications and of course Sidekiq was essential. :)

                                            1. 4

                                              ❤️ I hope something comes of it! I’m a fan of Mastodon and want to see more decentralized tech in general.

                                              1. 1

                                                We’re working hard on it! We’re also trying to find new sources of funding to keep the project going. If you have any connections to funders for decentralized tech that might be interested in Spritely we’d love to start a conversation.

                                              1. 7

                                                Those test improvements are really nice. I see myself using {t,b}.Context and b.Loop immediately.

                                                1. 3

                                                  Contexts are my favorite part of Go right now. Setting up a context with a timeout at the top of my tests and errgroup waits at the end has been very useful to catch any accidental goroutine leaks.

                                                2. 16

                                                  There’s nothing super groundbreaking [about rust modules], it just pretty much works.

                                                  I disagree :P Rust doesn’t have a single global shared namespace, and that is a pretty big-brained idea! When a crate participates in the compilation graph, the crate doesn’t have a name. The name is a property of the dependency edge between two crates (so, the same thing could be known under different names to different users).

                                                  This in turn unlocks the killer feature of “you can link foo 1.2.3 and foo 2.0.0 together”, without which we simply wouldn’t have sprawling dependency trees.
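                                                  Concretely, Cargo exposes this through dependency renaming; the crate names here are hypothetical:

```toml
[dependencies]
foo_v1 = { package = "foo", version = "1.2.3" }
foo_v2 = { package = "foo", version = "2.0.0" }
```

                                                  In the code you then use foo_v1::… and foo_v2::…, and both versions of foo link into the same binary.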

                                                  1. 12

                                                    This in turn unlocks the killer feature of “you can link foo 1.2.3 and foo 2.0.0 together”, without which we simply wouldn’t have sprawling dependency trees.

                                                    Some may argue that that would be a good thing :P

                                                    1. 3

                                                      And Ruby is considering adding Namespaces which would allow Ruby apps to have giant dependency trees with multiple versions too, looking forward to that disasterfun!

                                                      1. 10

                                                        disasterfun should be a word to describe the general Ruby experience.

                                                  2. 5

                                                    One thing that I found interesting is that many changes in each Ruby release would be considered a big no for any other language (for example the “Keyword splatting nil” change), since the possibility of breaking existing code is huge, but the Ruby community seems to just embrace those changes.

                                                    I always think about the transition between Python 2 and 3: the major change was adopting UTF-8, and everyone lost their minds thanks to the breakage, but Ruby did a similar migration around version 2.0 and I don’t remember anyone complaining.

                                                    I am not sure if this is just because the community is smaller, if the developers of Ruby are just better at deprecating features, or something else. But I still find it interesting.

                                                    1. 7

                                                      Ruby’s version of the Python 2 to 3 experience (by my memory) came years earlier, going from 1.8 to 1.9. It certainly still wasn’t as big of an issue as Python’s long-lingering legacy version, but it was (again, my perception at the time) the Ruby version that had the most lag in adoption.

                                                      1. 6

                                                        Yes, and it was very well managed. For example, some changes were deliberately chosen in a way that you had to take care, but you could relatively easily write Ruby 1.8/1.9 code that worked on both versions.

                                                        The other part is that Ruby 1.8 got a final release that implemented as much of the 1.9 stdlib as possible. Other breaking things, like the default file encoding and so on, were gradually introduced. A new Ruby version is always some work, but not too terrible. It was always very user centric.

                                                        It was still a chore, but the MRI team was pretty active at making it less of a chore and getting important community members on board to spread knowledge and calm the waves.

                                                        Honestly, I think Ruby is not getting enough cred for its change management. I wish Python had learned from it; the mess of 2 vs 3 could have been averted.

                                                        1. 3

                                                          Yep, that’s my take too. IIRC 1.9 had a number of breaking API changes which were really low value. For instance, File.exists? -> File.exist?

                                                          1. 3

                                                            File.exists? started emitting deprecation warnings in Ruby 2.1 (2013) and was finally removed in Ruby 3.2 (2022)

                                                        2. 6

                                                          I feel like Python was pretty deeply ingrained in a bunch of operating systems and scripts that were excruciating to update.

                                                          Ruby is mostly run as web apps

                                                          1. 4

                                                            Interesting POV. As a long-time Rubyist, I’ve often felt that Ruby-core was too concerned with backwards compatibility. For instance, I would have preferred a more aggressive attempt to minimize the C extension API in order to make more performance improvements via JIT. I’m happy to see them move down the path of frozen strings by default.

                                                            1. 3

                                                              One thing that I found interesting is that many changes in each Ruby release would be considered a big no for any other language (for example the “Keyword splatting nil” change), since the possibility of breaking existing code is huge, but the Ruby community seems to just embrace those changes.

                                                              Like others already said, the Ruby core team’s stance is almost exactly the opposite: it is extremely concerned with backward compatibility and not breaking existing code (to the extent that, during discussion of many changes, some of the core team members run grep over the codebase of all existing gems to confirm or refute an assumption about the required scale of the change).

                                                              As an example, string literal freezing was discussed for many years and attempted before Ruby 3.0, but was considered too big a change (despite the major version bump); only a pragma for opt-in was introduced, and now the deprecation is being introduced on the assumption that the existence of the pragma has prepared most codebases for the future change. This assumption was recently challenged, though, and the discussion is still ongoing.

                                                              Keyword splatting nil change might break only the code that relies on the impossibility of the nil splatting, which is quite a stretch (and the one that is considered acceptable in order to make any progress).
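                                                              A minimal sketch of what the change allows (greet is a hypothetical method, not from the release notes):

```ruby
# Before Ruby 3.4, splatting nil as keywords raised TypeError;
# since 3.4, `**nil` behaves like `**{}` (no keywords passed).
def greet(**opts)
  opts.empty? ? "hi" : "hi #{opts[:name]}"
end

greet(**{})          # => "hi"
greet(name: "ruby")  # => "hi ruby"
opts = nil
# greet(**opts)      # Ruby >= 3.4: "hi"; earlier versions: TypeError
```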

                                                              1. 1

                                                                Keyword splatting nil change might break only the code that relies on the impossibility of the nil splatting, which is quite a stretch (and the one that is considered acceptable in order to make any progress).

                                                                This seems like really easy code to write and accidentally rely on.

                                                                def does_stuff(argument)
                                                                  output = do_it(argument)
                                                                  run_output(output)  # now `output` might be `{}`
                                                                rescue StandardError => e
                                                                  handle(e)
                                                                end

                                                                def do_it(arg)
                                                                  splats(arg)
                                                                end


                                                                If nil was expected but was just rolled up into the general error handling, this code feels very easy to write.

                                                                1. 1

                                                                  Well… it is relatively easy to write, yes, but in practice, this exact approach (blanket error catching as a normal flow instead of checking the argument) is relatively rare—and would rather be a part of an “unhappy” path, i.e., “something is broken here anyway” :)

                                                                  But I see the point from which this change might be considered too brazen. It never came up during the discussion of the feature. (And it was done in the most localized way: instead of defining nil.to_hash—which might’ve behaved unexpectedly in some other contexts—it is just support for **nil on its own.)

                                                                  1. 1

                                                                    is relatively rare

                                                                    I have to doubt that. It’s extremely common in Python, for example, to catch ‘Exception’ and I know myself when writing Ruby I’ve caught StandardError.

                                                                    I don’t have strong opinions.

                                                                    1. 1

                                                                      I don’t mean catching StandardError is rare, I mean the whole combination of circumstances that will lead to “nil was frequently splatted there and caught by rescue, and now it is not raising, and the resulting code is not producing an exception that would be caught by rescue anyway, but is broken in a different way”.

                                                                      But we’ll see.

                                                                2. 1

                                                                  Like others already said, the Ruby core team’s stance is almost exactly the opposite: it is extremely concerned with backward compatibility and not breaking existing code (to the extent that, during discussion of many changes, some of the core team members run grep over the codebase of all existing gems to confirm or refute an assumption about the required scale of the change).

                                                                  But this doesn’t really matter, because there are always huge proprietary codebases that are affected by every change, and you can’t run grep on them for obvious reasons. And those are the people that generally complain the most about these breaking changes.

                                                                  1. 2

                                                                    Well, it matters in the sense that the set of code from all existing gems covers a wide range of possible approaches and views on how Ruby code might be written. Though, of course, it doesn’t exclude some “fringe” approaches that never see the light outside the corporate dungeons.

                                                                    So, well… From inside the community, the core team’s stance feels pretty cautious/conservative, but I believe it might not seem so compared to other communities.

                                                                    1. 1

                                                                      It doesn’t seem anything special, really. Of course Python 2 to 3 was a much bigger change (since they decided “oh, we are going to do breaking changes anyway, let’s fix all those small things that were bothering us for a while”), but at the tail end of the migration most of the holdouts were random scripts written by a Ph.D. trying to run some experiments. If anything, it seems to me that big corporations were among the biggest pushers for Python 3 once it became clear that Python 2 was going to go EOL.

                                                                3. 2

                                                                  I’d say that the keyword splatting nil change is probably not as breaking as the frozen string literal change or even the “it” change (though I do not know the implementation details of the latter, so it might not be as breaking as I think). And for frozen string literals, they’ve been trying to make it happen for years now. It was scheduled to be the default in 3.0 and was put off for 4 whole years because they didn’t want to break existing code.

                                                                  1. 2

                                                                    Over the years I feel like Ruby shops have been dedicated to keeping the code tidy and up-to-date. Every Ruby shop I’ve been at has had linting fail the build. Rubocop (probably the main linter now) is often coming out with rule adjustments, and often they have an autocorrect as well making it very easy to update the code. These days I just write the code and rubocop formats and maybe adjusts a few lines, I don’t mind.

                                                                    1. 2

                                                                      I always think about the transition between Python 2 and 3: the major change was adopting UTF-8, and everyone lost their minds thanks to the breakage, but Ruby did a similar migration around version 2.0 and I don’t remember anyone complaining.

                                                                      From what I remember, UTF-8 itself wasn’t the problem—most code was essentially compatible with it. The problem was that in Python 2 you marked unicode literals with a u prefix (like u"text"), and early Python 3 made that a syntax error. This meant a lot of safe Python 2 code had to be made unsafe in Python 2 in order to run in Python 3. Python 3.3 re-added the u prefix just to make migrations possible.

                                                                      On top of that, Python 3 had a lot of other breaking changes, like making print() a function and changing the type signatures of many of the list functions.

                                                                      1. 4

                                                                        As someone who was maintaining a python package and had to make it compatible with 2 and 3, it was a nightmare. For instance the try/except syntax changed.

                                                                        Python 2

                                                                        try:
                                                                          something
                                                                        except ErrorClass, error:
                                                                          pass
                                                                        

                                                                        Python 3

                                                                        try:
                                                                          something
                                                                        except ErrorClass as error:
                                                                          pass
                                                                        

                                                                        Basically the same thing, but each is a syntax error in the other version, which was a nightmare to handle. You can argue the version 3 syntax is more consistent with other constructs, but it’s hard to believe it would have been particularly hard to support both syntaxes for a while to ease the transition.

                                                                        Ruby changes way more things, but tries its best to support old and new code for a while to allow a smooth transition. It’s still work to keep up, but it’s smoothed out over time, making it acceptable to most users.
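                                                                        For what it’s worth, there was a workaround that parsed on both: instead of either version-specific except binding syntax, fetch the in-flight exception via sys.exc_info(). A sketch (safe_int is a hypothetical example function):

```python
import sys

def safe_int(text):
    try:
        return int(text)
    except ValueError:
        # Portable across Python 2 and 3: grab the current exception
        # object via sys.exc_info() instead of binding it in the
        # except clause, whose syntax differed between versions.
        error = sys.exc_info()[1]
        return "error: %s" % error

safe_int("12")    # returns 12
```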

                                                                        1. 2

                                                                          It’s been a while, and I was just starting out with Python at the time, so take this with a grain of salt, but I think the problem was deeper than that. Python 2’s unicode handling worked differently to Python 3, so even when Python 3 added unicode literals, that didn’t solve the problem because the two string types would still behave differently enough that you’d run into compatibility issues. Certainly I remember reading lots of advice to just ignore the unicode literal prefix because it made things harder than before.

                                                                          Googling a bit, I think this was because of encoding issues — in Python 2, you could just wrap things in unicode() and the right thing would probably happen, but in Python 3 you had to be more explicit about the encoding when using files and things. But it’s thankfully been a while since I needed to worry about any of this!

                                                                          1. 1

                                                                            My recollection at Dropbox was that UTF-8 was the problem, and the solution was basically to use mypy everywhere so that the code could differentiate between utf8 and non-utf8 strings.

                                                                            1. 1

                                                                              In my experience the core issue was unicode strings and the removal of implicit encoding/decoding, as well as updating a bunch of APIs to try and clean things up (not always successfully). This was full of runtime edge cases, as it’s essentially all dynamic behaviour.

                                                                              Properly doing external IO was of some concern but IME pretty minor.

                                                                            2. 1

                                                                              On top of that, Python 3 had a lot of other breaking changes, like making print() a function and changing the type signatures of many of the list functions.

                                                                              This is why I said the “major” change was UTF-8. I remember lots of changes were trivial (like making print a function, you could run 2to3 and it would mostly fix it except for a few corner cases).

                                                                              1. 2

To me, the big problem wasn’t so much to convert code from 2 to 3, but to make code run on both. So many of the “trivial” syntax changes were actually very challenging to make work on both versions with the same codebase.

                                                                                1. 1

                                                                                  It was a challenge early on, after ~3.3 it was mostly a question of having a few compatibility shims (some very cursed, e.g. if you used exec) and a bunch of lints to prevent incompatible constructs.

The string model change and APIs moving around both physically and semantically were the big ticket which kept lingering, and 2to3 (and later modernize/futurize) did basically nothing to help there.
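A typical shim of the era looked something like this (a sketch; `string_types` and `text_type` are illustrative names similar to what the `six` library provided):

```python
import sys

# A common 2/3 compatibility pattern: branch once on the interpreter
# version, then write the rest of the code against the shimmed names.
PY2 = sys.version_info[0] == 2

if PY2:
    string_types = (str, unicode)  # noqa: F821 -- `unicode` only exists on py2
    text_type = unicode            # noqa: F821
else:
    string_types = (str,)
    text_type = str

print(isinstance("hello", string_types))  # True on both interpreters
```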

                                                                            3. 2

It wasn’t an easy transition. As others said, you’re referring to the 1.8-1.9 migration. It was a hard migration. It took around 6-7 years. An entirely new VM was developed. It took several releases until there was a safe 1.9 to migrate to, which was 1.9.3. Before that, there were memory leaks, random segfaults, and one learned to avoid APIs which caused them. Because of this, a big chunk of the community didn’t even try 1.9 for years. It was going so poorly that github maintained a fork called “ruby enterprise edition”, 1.8 with a few GC enhancements.

In the end, the migration was successful. That’s because, once it stabilised, 1.9 was significantly faster than 1.8, which offset the incompatibilities. That’s why the Python migration failed for so long: all work and no carrot. For years, Python 3 was the same order of performance as Python 2, or worse. That only changed around 3.5 or 3.6.

                                                                              Fwiw the ruby core team learned to never do that again, and ruby upgrades since 1.9 are fairly uneventful.

                                                                              1. 2

                                                                                Minor correction: Ruby Enterprise Edition was maintained by Phusion (who did Passenger), not GitHub.

                                                                              2. 1

                                                                                Ruby 2 was a serious pain for many large projects. Mainly with extensions behaving slightly differently with the encoding. I remember being stuck on custom builds of 1.9 for ages at work.

                                                                              3. 4

                                                                                I think Integer#success is a typo (I don’t see ri docs for it). But good post, I enjoyed it!

                                                                                1. 4

                                                                                  It should be Integer#succ.

                                                                                  > 1.succ
                                                                                  => 2
                                                                                  
                                                                                  1. 2

                                                                                    Whoops! Thanks for the typo find! Fixed.

                                                                                  2. 6

So merry ChristMatz 2022, we got 3.2 where YJIT is stable but off by default. In 2023 came 3.3, with YJIT on by default in a paused state, so you can enable it in code rather than compiling it in and passing a flag to the executable. No relevant change this year.

                                                                                    Anyone got a sense of when it’ll be on by default? Or remove the flag altogether? Why might someone avoid YJIT, and why can’t we give em an LTS version?

                                                                                    I dunno, seeing all that feature flagging in the low level implementation puts a squirm in my spine. I don’t think it’s like wrong, I just wouldn’t want to be maintaining it. OTOH, that’s all focused and prolly low churn stuff, so I suppose it ain’t hurting no one.

                                                                                    1. 8

                                                                                      I believe another reason it is still off by default is more people-related: ruby-core is mostly Japanese and C-based. YJIT is maintained by Shopify employees and Rust-based.

                                                                                      Ruby has many of the same problems as Linux – some groups want to move to Rust for safety and speed but the existing maintainers often don’t know Rust at all.

                                                                                      1. 2

                                                                                        Ruby has improved quite a lot. I love Ruby and used it extensively in the late 2000s. MRI was a disaster. Slow and had memory leak issues in long-running processes. Right now, it’s much faster and more stable. Night and day difference. Leaving Ruby aside, it never ceases to amaze me how good and performant the JVM is. The results in this benchmark are quite stunning.

                                                                                      2. 7

                                                                                        yea, it’s a great question, and something I thought of mentioning in the article (then forgot).

                                                                                        I think the primary reason it’s still off by default is because of memory overhead. any JIT will usually add a not insignificant amount of memory usage on top of the runtime.

That said, YJIT is really configurable about the amount of memory it will consume. By default I think it’s 48 megs per process max? I know GitHub uses it but tunes it down a bit, to more like 16 megs. So possibly in the future it will be on, but by default set to a lower max.

                                                                                        Would be curious to hear from the YJIT team on their thoughts on that!

                                                                                        1. 5

48MiB is how much executable memory it will use, but it also has to keep track of some metadata, which usually accounts for 2 or 3x the executable memory. So you’ll easily end up with a 150-200MB overhead with the default settings.

                                                                                          3.4 will have a much more ergonomic memory setting: https://github.com/ruby/ruby/pull/11810

And yes, you are right, the main blocker for enabling it by default really is extra memory usage. The YJIT team never formally proposed it, but from informal discussions with other Ruby committers, it’s clear that memory would be the blocker.

                                                                                          1. 1

                                                                                            Ah right, thanks for the clarification byroot. Not the first time I’ve thought through that incorrectly - glad to have a clearer setting. Thanks!

                                                                                        2. 3

I think the main reason not to flip it on by default in 3.3+ was that it could break container / memory-constrained environments if folks upgrade blindly and don’t monitor and increase memory limits appropriately. It can also increase startup time for interactive use cases, such as CLIs.

I dunno if that was really the right call, but it seems that the more conservative approach still holds: I haven’t heard of any change in the default YJIT settings for 3.3+.

                                                                                        3. 10

                                                                                          I flagged this as spam, it is just a press release with no technical details or even any substantive news aside from “we’re releasing a new thing”

                                                                                          1. 4

                                                                                            no technical details

It is a meta article that links to other articles with much more “in the weeds” detail, such as https://blog.heroku.com/planting-new-platform-roots-cloud-native-fir, which is penned by former Bundler (Ruby package manager) maintainer and co-founder of the CNCF org Cloud Native Buildpacks, Terence Lee.

Where I’ve linked to that, people complained it was too technical.

I posted this also because the top article on Lobsters right now is titled “The Next Platform”, about whether or not Kubernetes is actually a good base to build on (https://www.macchaffee.com/blog/2024/the-next-platform/). And we are launching our “next generation” platform, and we chose Kubernetes. So I thought the word choice and timing would make for a good discussion.

If you want some harder technical content than that, here’s a tutorial for using a cloud native buildpack locally: https://github.com/heroku/buildpacks/blob/main/docs/ruby/README.md. This is a core building block of that effort and works anywhere that supports OCI images.

                                                                                            1. 4

                                                                                              Justify it however you want, it is just a press release…

                                                                                              I posted this also because the top article of Lobsters right now is titled “The Next Platform” about whether or not Kubernetes is actually a good base to build

                                                                                              [snip]

                                                                                              So I thought the word choice and timing would make for a good discussion.

                                                                                              So instead of participating in the discussion around that actual article you dropped a comment telling people about this one? Sorry this just further cements it as spam.

                                                                                              1. 12

                                                                                                Sheesh, man, chill out.

                                                                                                The community has been really negative on Heroku (rightfully so) because of their lack of investment in their platform for years. Now they decide to share details of how they want to modernize the platform. I, for one, am interested in learning more.

                                                                                              2. 2

                                                                                                I’m not sure who or where someone complained about an article being too technical, but for me technical is exactly the point of Lobsters. Please share the technical posts, and if someone complains then that seems like their problem. I do agree that this specific article is more of a press release than a useful article in itself.

                                                                                              3. 1

At the same time, they announced .NET support in another blog post, which an app developer on HN felt had too many details: “I pay so I don’t need to follow or care.”

                                                                                                Dunno if I should submit that as a story.

                                                                                              4. 3

                                                                                                For an article that references XKCD, let me provide another: https://xkcd.com/927/

                                                                                                Gotta be honest, I don’t see the point:

                                                                                                This is intended to be comfortable to follow strictly, so more likely to be reliable in practice:

I find it less comfortable. Current SemVer is strictly objective - if breaking, then major. Else: if it affects externally-visible behaviour, minor; else patch. BreakVer requires me to make a subjective judgement about what constitutes a major breakage - which requires knowing about all my customers - which is, practically, impossible. Plus, depending on my confidence/arrogance/move-fast-and-break-things-iness, I’ll make classifications that do not match up with customers’ expectations.

                                                                                                There’s only 2 types of version bumps: Those that are definitely safe (non breaking) / Those that require checking the CHANGELOG

                                                                                                That is currently true. You’ve just shifted the boundary of specificity - instead of subdividing non-breaking changes, you’re sub-dividing breaking changes. In practice, either:

                                                                                                • folks will be appropriately cautious in checking every major or minor change for impact - in which case, they have no benefit from this method
• they will be lazy and not check minor changes - in which case some breakages will still slip by, as they do currently, but the software publisher will not even be in the wrong, and thus will have no reason to correct their behaviour. A situation in which every actor can make rational, justified choices and still end up with a bad outcome is not one we should actively try to design.
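To sketch that boundary shift (a hypothetical helper; the function name and the plain major.minor.patch parsing are my own simplification):

```python
def needs_changelog_review(old: str, new: str, scheme: str = "semver") -> bool:
    """Return True if upgrading old -> new requires reading the changelog.

    Assumes simple "major.minor.patch" strings for illustration.
    """
    old_parts = [int(p) for p in old.split(".")]
    new_parts = [int(p) for p in new.split(".")]
    if scheme == "semver":
        # SemVer: only a major bump signals possible breakage.
        return new_parts[0] != old_parts[0]
    # BreakVer: minor bumps may also break "minorly", so only
    # patch-level bumps are promised safe.
    return new_parts[:2] != old_parts[:2]

# Same upgrade, different promise:
print(needs_changelog_review("1.2.3", "1.3.0", "semver"))    # False
print(needs_changelog_review("1.2.3", "1.3.0", "breakver"))  # True
```

Either way, the set of "must check" upgrades just moves; it doesn’t shrink.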

                                                                                                People strongly resist bumping major version numbers for every breaking change.

                                                                                                …do they? Why? Genuine question.

                                                                                                1. 10

                                                                                                  My descriptivist take is that SemVer isn’t what the spec says, but what the tooling does with it, and how people are affected by that.

                                                                                                  If you bump the first number, people won’t update quickly, and this may be a good thing or a bad thing, depending on what your goal is.

                                                                                                  If you bump the other numbers you may or may not upset a lot of people and get complaints.

                                                                                                  So you weigh the risk of getting an avalanche of complaints vs how inconvenient it would be for you if users didn’t update your software quickly, and maybe also complained it doesn’t update automatically often enough.

                                                                                                  1. 7

                                                                                                    SemVer is objective.

                                                                                                    Is it?

                                                                                                    As the article states, “There’s an old joke that every change is breaking for someone.” I think that the joke is true to a large extent.

                                                                                                    1. 4

Yes, it is. It doesn’t matter whether the change is breaking “for someone” - it matters that the change is breaking to a public contract. If you take a dependency on behaviour that isn’t an explicit part of an established contract, you have no right to expect warning when it changes.

                                                                                                    2. 4

                                                                                                      …do they? Why? Genuine question.

                                                                                                      I find it jarring when a piece of software goes from version 16.213.26 to 17.0.0 just because the developers removed spacebar heating.

                                                                                                      1. 2

                                                                                                        Current SemVer is strictly objective - if breaking, then major.

                                                                                                        “breaking” is not objective. Remember Hyrum’s law. Someone is going to need you to re-enable spacebar heating if you pretend it is.

                                                                                                        1. 6

                                                                                                          Breaking in semvar is objective. It’s defined by the spec as a change in the behavior of your public interface. If you haven’t documented your public interface, you aren’t following semvar.

                                                                                                          1. 2

                                                                                                            While this may be strictly true, it also implies that almost no one is actually following semver, which doesn’t seem like a very productive framing to me.

                                                                                                            1. 5

                                                                                                              Huh? When the semvar spec says “public API”, I imagine some docs you can read that list all of the functions in a package along with their type signatures and a description of their behavior. Most of the packages you use have this, no?

                                                                                                              That’s the package’s public interface. If the package changes one of those type signatures, that’s a breaking change. If it introduces a new function, that’s not breaking. If it makes a change that violates the documentation for one of its functions, that’s a breaking change. If it makes a change to the behavior of a function that’s consistent with that function’s docs… well either that’s not a breaking change, or as is common the function was inadequately documented.

                                                                                                              This all seems fairly unambiguous to me, excepting changes to the behavior of poorly documented functions. Am I missing something?

                                                                                                              1. 7

                                                                                                                The example I’ve gone round and round a bunch of times with people on is: Go 1.13 introduced the ability to use underscores as grouping separators in integer literals, like 1_000_000 instead of 1000000.

This also changed the behavior of Go’s integer-parsing functions. For example, strconv.ParseInt() (when called with base 0) suddenly started accepting and parsing inputs with underscore characters rather than returning an error. And the Go team seem to have been aware that there were people whose code was broken by this change, which would be a problem for Go’s claim that there will never be breaking changes, ever, for any reason.

                                                                                                                Generally people have argued with me that although ParseInt() was a public function, it was somehow underspecified or ambiguously specified prior to Go 1.13 and therefore it was acceptable to clarify its behavior in Go 1.13 by suddenly changing the inputs it accepted. But this just points to the real purpose of SemVer: it’s about protecting the developer of the code from the user of the code, by giving the developer endless subjective loopholes and ways to say “sure, that change broke your code, but it’s still not technically a breaking change”. For example, any function which does not specify up-front the entire set of potential inputs it will accept and the results it will return for them is subject to the ParseInt() style of “oops, we underspecified it” loophole.
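Python went through the same thing, incidentally: PEP 515 (Python 3.6) added underscore separators to numeric literals, and int() started accepting them in strings as well, so validation code that leaned on the old rejection broke in exactly this way. A rough sketch (`looks_like_plain_digits` is a hypothetical validator):

```python
# PEP 515 (Python 3.6) added underscore digit separators to literals,
# and int() started accepting them in strings too.
print(int("1_000"))  # 1000 on 3.6+; ValueError on 3.5 and earlier

# A hypothetical validator that relied on int() rejecting such input:
def looks_like_plain_digits(s: str) -> bool:
    try:
        int(s)
        return True
    except ValueError:
        return False

print(looks_like_plain_digits("1_000"))  # True since 3.6 -- the behaviour changed
# If you need the old strictness, check explicitly:
print("1_000".isdigit())  # False
```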

                                                                                                                1. 4

                                                                                                                  Ah, I get it. There are three things the Go docs for ParseInt() could say:

                                                                                                                  • They could positively state what sorts of integers ParseInt() accepts, so that a reader could confirm it would not parse underscores. E.g. giving a regex of accepted inputs.
                                                                                                                  • They could carve out negative space to allow a variety of behaviors. E.g. saying “parses integers in the same way that the Go language does, which is subject to change”.
                                                                                                                  • They could be silent in the matter.

                                                                                                                  Reading the actual docs, I’d frankly put them in the first case: they state what the accepted integer syntax is, and give examples, and all of this makes it rather clear that underscores aren’t part of the integer syntax, any more than “thirty seven” would be.

                                                                                                                  But even if the docs weren’t clear, you don’t get to say “LOL no change is breaking change because I forgot to write docs”. That just means you’ve entered a gray area, and you should be cautious about what counts as a breaking change and your users should be cautious about not relying on too much. It should be a “meet in the middle” sort of a thing, not a “how C++ compiler writers interpret undefined behavior” sort of a thing.

                                                                                                                  tldr; I’m sorry that people are treating “incompletely documented” the same as “explicitly unspecified”, those are very different things.

                                                                                                                  1. 1

                                                                                                                    Isn’t accepting “1_000” in Go source also a breaking change by the same reasoning as it would be for ParseInt? Input that used to result in an error no longer does.

                                                                                                                    1. 8

Maybe in some technical sense, but people rely on both the positive and negative behavior of ParseInt() (e.g. rejecting invalid user input), while generally relying only on positive Go language behavior. If “this program started to compile when it used to be an error” were considered a breaking change, every change in language behavior would be breaking.

                                                                                                                  2. 3

                                                                                                                    Why do you keep calling it “semvar”? It’s “semver”, semantic versioning.

                                                                                                                    1. 3

                                                                                                                      Just a typo. Past the edit window so I can’t fix it now…

                                                                                                                    2. 3

                                                                                                                      What I meant was that in the real world, it’s very common for an API to be underdocumented, with the result that it’s not well-defined whether a given change breaks the API or not. Like, you can look at certain changes and say “this really seems like it breaks the API,” but the API was defined vaguely enough that it’s impossible to make any judgment like that.

                                                                                                                      You say “…excepting changes to the behavior of poorly documented functions,” but I think that’s actually a very large category, in practice :-) Obviously there are some libraries and frameworks that take defining their APIs very seriously, but I would guess that the set of libraries that use SemVer is an order of magnitude larger than the set of ones that are strict about it in this way.

                                                                                                                      1. 2

                                                                                                                        Yeah, that all makes sense. I’d argue that if it’s hard to make that judgement call, the library should be conservative and bump the major version number.

                                                                                                                        Is there a versioning system you think does better in the presence of a poorly defined API?

                                                                                                                        1. 4

                                                                                                                          I don’t know of one, and I suspect that carefully defining the API is a prerequisite for any versioning system to be able to give you the kind of guarantees we want.

                                                                                                                2. 5

                                                                                                                  Hyrum’s Law describes a common pathological condition of dependency relationships between software modules, it doesn’t define a de facto baseline/standard guarantee or expectation of compatibility.

                                                                                                                  1. 3

                                                                                                                    That person is welcome to be upset and to ask for consideration, but they are by no means entitled to it. SemVer is about breaking the explicit, published contract of software, not about breaking any hypothetical consumer. If you take a dependency on undefined behaviour, you have no rights to complain when it changes, nor any justification to expect warning when it does.

                                                                                                                3. 53

I’m concerned that Bluesky has taken money from VCs, including Blockchain Capital. The site is still in the honeymoon phase, but they will have to pay that money back 10x.

This is Twitter all over again, including the risk of a hostile takeover. I don’t think they’re stupid enough to just let the allegedly-decentralized protocol take away their control when billions are at stake. They will keep users captive if they have to.

                                                                                                                  1. 8

                                                                                                                    Hypothetically, if BlueSky turned evil, they could:

                                                                                                                    1. ban outside PDSes to be able to censor more content
                                                                                                                    2. block outside AppViews from reading their official PDS

                                                                                                                    This would give them more or less total control. Other people could start new BlueSky clones, but they wouldn’t have the same network.

                                                                                                                    Is this a real risk? I’m not sure. I do know it’s better than Twitter or Threads which are already monolithic. Mastodon is great but I haven’t been able to get many non-nerds to switch over.

                                                                                                                    1. 4

                                                                                                                      Hypothetically, the admins of a handful of the biggest Mastodon instances, or even just the biggest one, could turn evil and defederate from huge swathes of the network, fork and build in features that don’t allow third-party clients to connect, require login with a local account to view, etc. etc.

                                                                                                                      Other people could start clones, of course, but they wouldn’t have the same network.

                                                                                                                      (also, amusingly, the atproto PDS+DID concept actually enables a form of account portability far above and beyond what Mastodon/ActivityPub allow, but nobody ever seems to want to talk about that…)

                                                                                                                      1. 2

                                                                                                                        The two situations are not comparable. If mastodon.social disappeared or defederated the rest of the Mastodon (and AP) ecosystem would continue functioning just fine. The userbase is reasonably well distributed. For example in my personal feed only about 15% of the toots are from mastodon.social and in the 250 most recent toots I see 85 different instances.

                                                                                                                        This is not at all the case for Bluesky today. If bsky.network went away the rest of the network (if you could call it that at that point) would be completely dead in the water.

                                                                                                                        1. 1

                                                                                                                          While I generally agree with your point (my timelines on both accounts probably look similar), just by posting here we’ve probably disqualified ourselves from the mainstream ;) I agree with the post you replied to in that joe random (not a software developer) who came from twitter will probably end up on one of the big instances.

                                                                                                                          1. 1

                                                                                                                            For what it’s worth, I did the sampling on the account where I follow my non-tech interests. A lot of people ended up on smaller instances dedicated to a topic or geographical area.

                                                                                                                    2. 7

                                                                                                                      While it’s sometimes possible to get code at scale without paying – via open source – it’s never possible to get servers and bandwidth at scale without someone dumping in a lot of money. Which means there is a threshold past which anything that connects more than a certain number of people must receive serious cash to remain in operation. Wikipedia tries to do it on the donation model, Mastodon is making a go at that as well, but it’s unclear if there are enough people willing to kick in enough money to multiple different things to keep them all running. I suspect Mastodon (the biggest and most “central” instances in the network) will not be able to maintain their present scale through, say, an economic downturn in the US.

                                                                                                                      So there is no such thing as a network which truly connects all the people you’d want to see connected and which does not have to somehow figure out how to get the money to keep the lights on. Bluesky seems to be proactively thinking about how they can make money and deal with the problem, which to me is better than the “pray for donations” approach of the Fediverse.

                                                                                                                      1. 9

                                                                                                                        Your point is valid, though a notable difference with the fediverse is the barrier to entry is quite low - server load starts from zero and scales up more or less proportionally to following/follower activity, such that smallish but fully-functional instances can be funded out of the hobby money of the middle+ classes of the world. If they’re not sysadmins they can give that money to masto.host or another vendor and the outcome is the same. This sort of decentralisation carries its own risks (see earlier discussion about dealing with servers dying spontaneously) but as a broader ecosystem it’s also profoundly resilient.

                                                                                                                        1. 3

                                                                                                                          a notable difference with the fediverse is the barrier to entry is quite low

                                                                                                                          The problem with this approach is the knowledge and effort and time investment required to maintain one’s own Mastodon instance, or an instance for one’s personal social circle. The average person simply is never going to self-host a personal social media node, and even highly technical and highly motivated people often talk about regretting running their own personal single-account Mastodon instances.

                                                                                                                          1. 3

                                                                                                                            I think Mastodon needs a better server implementation, one that is very low-maintenance and cheap to run. The official server has many moving parts, and the protocol de-facto needs an image cache that can get expensive to host. This is solvable.

                                                                                                                            1. 1

                                                                                                                              Right! I’ve been eyeing off GoToSocial but haven’t had a chance to play with it yet. They’re thinking seriously about how to do DB imports from Mastodon, which will be really cool if they can pull it off: https://github.com/superseriousbusiness/gotosocial/issues/128

                                                                                                                        2. 5

                                                                                                                          Worst case one moves off again. That’s a problem for a future date.

                                                                                                                          1. 1

                                                                                                                            That’s true, but I was hooked on Twitter quite heavily (I was an early adopter), and invested in having a presence there. The Truth Social switcheroo has been painful for me, so now I’d rather have a smaller network than risk falling into the same trap again.

                                                                                                                          2. 3

                                                                                                                            Relevant blog post from Bluesky. I’d like to think VCs investing into a PBC with an open source product would treat this differently than Twitter, but only time will tell.

                                                                                                                            1. 39

                                                                                                                              OpenAI was a “non profit” until it wasn’t.

                                                                                                                              1. 3

                                                                                                                                OpenAI never open sourced their code, so Bluesky is a little bit different. It still has risks, but the level of risk is quite a bit lower than it was for OpenAI.

                                                                                                                                1. 4

                                                                                                                                  OpenAI open sourced a lot and of course made their research public before GPT3 (whose architecture didn’t change much[1]). I understand the comparison, but notably OpenAI planned to do this pseudo-non-profit crap from the start. Bluesky in comparison seems to be “more open”. If Bluesky turned evil, then the protocols and software will exist beyond their official servers, which cannot be said for ChatGPT.

                                                                                                                                  [1]: not that we actually know that for a fact since their reports are getting ever more secretive. I forget exactly how open that GPT3 paper was, but regardless the industry already understood how to build LLMs at that point.

                                                                                                                          3. 4

                                                                                                                            Would it be reasonable for GitHub to make this default behavior?

                                                                                                                            1. 1

                                                                                                                              No, but it should be optional. Gitlab supports this: https://docs.gitlab.com/ee/ci/pipelines/settings.html#auto-cancel-redundant-pipelines.

                                                                                                                              There are some cases where I do want different commits on the same branch/pull request to run all actions. For example, if I am updating workflow dependencies for a workflow that normally runs on merges to main, I will temporarily make them run on my branch to test/validate. I make two commits—one with the updates, and another that runs the workflow on my branch. Once I have validated the change, I remove the last commit.

                                                                                                                              1. 1

                                                                                                                                It is optional, using exactly the process described in the linked article!
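
                                                                                                                                For reference, GitHub’s counterpart to GitLab’s auto-cancel setting is the workflow-level `concurrency` key; a minimal sketch (the group name here is just one common convention):

                                                                                                                                ```yaml
                                                                                                                                # Cancel any still-running run of this workflow for the same ref when
                                                                                                                                # a new commit arrives; runs on other branches are unaffected.
                                                                                                                                concurrency:
                                                                                                                                  group: ${{ github.workflow }}-${{ github.ref }}
                                                                                                                                  cancel-in-progress: true
                                                                                                                                ```

                                                                                                                                Because it is set per workflow, you can leave it off the workflows where you want every commit to run.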

                                                                                                                            2. 5

                                                                                                                              I love this opinionated POSIX standard. Instead of baking in a bunch of hacks to support filenames with newlines, they just said “don’t do that, not supported going forward”. That’s a change I can get behind.
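
                                                                                                                              The pain the standard is reacting to is easy to demonstrate; a small sketch (file names here are arbitrary) showing how a newline in a filename corrupts line-oriented output while NUL-separated output survives:

                                                                                                                              ```shell
                                                                                                                              # Create two files, one with an embedded newline in its name.
                                                                                                                              dir=$(mktemp -d)
                                                                                                                              nl=$(printf '\nX'); nl=${nl%X}   # a literal newline character
                                                                                                                              touch "$dir/innocent.txt" "$dir/evil${nl}name.txt"

                                                                                                                              # Newline-separated records miscount: one name splits into two lines.
                                                                                                                              lines=$(find "$dir" -type f | wc -l)

                                                                                                                              # NUL-separated records stay correct: exactly one NUL per file.
                                                                                                                              nulls=$(find "$dir" -type f -print0 | tr -dc '\0' | wc -c)

                                                                                                                              echo "line records: $lines, NUL records: $nulls"
                                                                                                                              rm -rf "$dir"
                                                                                                                              ```

                                                                                                                              Every tool downstream of the first pipeline sees three records for two files, which is exactly the ambiguity the new wording lets utilities refuse to produce.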

                                                                                                                              1. 3

                                                                                                                                what will happen to existing files tho? i have some, currently all programs support those without any hacks, except one (ls). will i be unable to even rename or delete them if i mount my storage on a new posix 2024 system?

                                                                                                                                1. 6

                                                                                                                                  The article says:

                                                                                                                                  the following utilities are now either encouraged to error out if they are to create a filename that contains a newline, and/or encouraged to error out if they are about to print a pathname that contains a newline in a context where newlines may be used as a separator

                                                                                                                                  1. 3

                                                                                                                                    thank you, i need to read more carefully (either i missed “create” or assumed create a filename means create a parsed path object from a string)

                                                                                                                                    1. 1

                                                                                                                                      I’m curious how this plays with locales. The most common places I’ve seen this issue are where the file is created with something like a Big5 locale and then viewed in a C locale.

                                                                                                                                      1. 1

                                                                                                                                        The trail byte in Big5 is 0x40 or higher, so it can never be a line feed.

                                                                                                                                        It would really make POSIX ready for 2024 to drop non-UTF-8 locales, though. (Which probably won’t happen as long as someone finds it important to be able to claim a Big5 AIX system POSIX-compliant.)

                                                                                                                                2. 3

                                                                                                                                  I see the issue as one of compatibility:

                                                                                                                                  • You can’t build safely on top of a foundation of quicksand.
                                                                                                                                  • Improvements which break compatibility fork the community and add work to every user downstream.
                                                                                                                                  • Retaining compatibility limits the improvements which can be made or greatly increases the cost of making that improvement.
                                                                                                                                  • Boring tech always values compatibility over improvement.
                                                                                                                                  • I build my systems on top of boring tech because I want to minimize the effort needed to maintain them.
                                                                                                                                  1. 20

                                                                                                                                    At some point I need to write an actual article about this, but I’ve recently been thinking that “blocking” APIs in general are a design mistake: everything is fundamentally concurrent, but OSes use blocking APIs to hide this concurrency, which tends to inevitably break, and mixes concurrency with parallelism. Even something as simple as “spawn a process, collecting both its stdout and stderr” requires proper concurrency. See how Rust’s Command::output can’t be expressed using std-exposed APIs.
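
                                                                                                                                    To make the stdout/stderr point concrete: a sketch of the thread-per-pipe capture that something like `Command::output` has to do internally (details here are illustrative, not the actual std implementation):

                                                                                                                                    ```rust
                                                                                                                                    use std::io::Read;
                                                                                                                                    use std::process::{Command, Stdio};
                                                                                                                                    use std::thread;

                                                                                                                                    // Reading the two pipes one after the other can deadlock once the
                                                                                                                                    // unread pipe's buffer fills, so each stream gets its own reader.
                                                                                                                                    fn run_capture(cmd: &mut Command) -> std::io::Result<(String, String)> {
                                                                                                                                        let mut child = cmd.stdout(Stdio::piped()).stderr(Stdio::piped()).spawn()?;
                                                                                                                                        let mut stdout_pipe = child.stdout.take().unwrap();
                                                                                                                                        let mut stderr_pipe = child.stderr.take().unwrap();

                                                                                                                                        // Drain stderr on a second thread while this thread drains stdout.
                                                                                                                                        let err_thread = thread::spawn(move || {
                                                                                                                                            let mut buf = String::new();
                                                                                                                                            stderr_pipe.read_to_string(&mut buf).map(|_| buf)
                                                                                                                                        });
                                                                                                                                        let mut out = String::new();
                                                                                                                                        stdout_pipe.read_to_string(&mut out)?;
                                                                                                                                        let err = err_thread.join().expect("stderr thread panicked")?;
                                                                                                                                        child.wait()?;
                                                                                                                                        Ok((out, err))
                                                                                                                                    }

                                                                                                                                    fn main() -> std::io::Result<()> {
                                                                                                                                        // Demo child process; any program writing to both streams works.
                                                                                                                                        let (out, err) =
                                                                                                                                            run_capture(Command::new("sh").args(["-c", "echo out; echo err >&2"]))?;
                                                                                                                                        print!("stdout: {out}");
                                                                                                                                        print!("stderr: {err}");
                                                                                                                                        Ok(())
                                                                                                                                    }
                                                                                                                                    ```

                                                                                                                                    Even this tiny task needs a second thread (or an event loop): the “simple” blocking read of one pipe is only safe because something else is concurrently draining the other.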

                                                                                                                                    Which is the smaller problem! The bigger problem is that we don’t actually know how to write concurrent programs. This is an unsolved area in language design. And the systems reason why the problem is unsolved is that, thanks to the existence of blocking APIs, languages are allowed to shirk here, moving what is fundamentally a language-design issue to OS design.

                                                                                                                                    io_uring (and previously, hilariously, JavaScript) models the problem of concurrency more directly, without introducing a midlayer. Hopefully it will create enough evolutionary pressure for language design to finally get the problem solved.

                                                                                                                                    1. 5

                                                                                                                                      There was an interesting period about 25 years ago, when the web was growing fast and really stressing the capabilities of the hardware and operating systems of the time. This is when the c10k problem was identified. The performance pressure led to a bunch of interesting operating systems research.

                                                                                                                                      I’m pretty sure I read a paper around that time which described a kernel/userland API with a design roughly along the lines of io_uring — though its point of comparison was hardware interfaces with command queues and DMA buffers. I had a very vague memory of Vivek Pai’s IO-Lite but that has a different shape so there must be another paper I’m failing to remember.

                                                                                                                                      I wondered if a fully asynchronous kernel API could solve the M:N threading problems that were happening around the same time. But that whole area of research dried up, as the problems were dealt with by sendfile(), 1:1 threading, and computers getting faster. Boring!

                                                                                                                                      1. 1

                                                                                                                                        the c10k problem was identified.

                                                                                                                                        I do not actually see a clear specification of what the problem is in that article, though.

                                                                                                                                        1. 6

                                                                                                                                          It was really hard to handle 10,000 concurrent connections from one server in 1999. The challenge was to make it easier, to make the kind of performance engineering that went into cdrom.com available to a lot more systems.

                                                                                                                                          1. 1

                                                                                                                                            IIRC, the fundamental challenge is handling 10K long-running connections that are not very active. If each of those connections is actually producing significant load on the hardware, either CPU load or I/O load, then the hardware can’t handle 10K connections anyway, because we have neither 10K cores nor 10K independent I/O channels.

                                                                                                                                          1. 2

                                                                                                                                            I found a copy of the SEDA SOSP paper which is a bit shorter than Welsh’s PhD thesis :-) It seems to be about getting more concurrency out of Java, not about changing the userland/kernel API.

                                                                                                                                        2. 4

                                                                                                                                          It’s a compelling argument, but I think in some ways async APIs end up constraining the design space of the framework/OS/whatever offering it. For example, QNX’s main message passing primitives are blocking so that, on the fast path, context is switched to the callee without a pass through the scheduler.

                                                                                                                                          More broadly, this paper argues that it’s not possible to make a fast and DoS-resistant async message passing system: https://www.researchgate.net/publication/4015956_Vulnerabilities_in_synchronous_IPC_designs

                                                                                                                                          1. 4

                                                                                                                                            You absolutely hit the nail on the head. Given how absolutely fundamental asynchronous computation is to how computers work, it’s surprisingly underexplored.

                                                                                                                                            I think this is partly due to a decline of groundbreaking OS research, and partly because there are few, if any, modern languages purpose-built for deep systems research. The problems become apparent almost instantly. At the latest when you’re a few hundred lines into your first kernel, you’ll be presented with DMA interfaces for all kinds of peripherals. What now?

                                                                                                                                            This is not the kind of problem that most language designers are facing. Most PLs are made to solve application-level problems, instead of having a principled, hardware-first approach to language design. I think Rust with embassy-rs has been solving some of these challenges in a pretty nice way, but it shouldn’t end there.

                                                                                                                                            1. 6

                                                                                                                                              I agree on the lack of OS research. Mothy gave a keynote at SOSP a few years ago pointing out that twenty years ago there were multiple new kernel papers each year at OSDI / SOSP whereas now we average less than one. I’m slightly biased, because the last paper I submitted with a new OS design was rejected on the ground that we didn’t do a performance comparison to an OS that required ten times as much hardware and that we were possibly vulnerable to an attack that, by construction, cannot happen on our system. Ho hum.

                                                                                                                                              I disagree on language research though. There’s a lot of interesting work on asynchrony. Most PL research takes at least ten years to make it into mainstream programming languages. Verona’s behaviour-oriented concurrency model is designed to work directly with things like DMA and we’ve ported it to C++, Rust, and Python. It works best with a memory management model that puts object graphs in regions and allows them to be atomically transferred, which depends on a type system that can do viewpoint adaptation. We’re not the only people working on things like this and I expect mainstream languages in the 2030s to have a lot of these concepts.

                                                                                                                                              1. 4

                                                                                                                                                Timothy Roscoe, “it’s time for operating systems to rediscover hardware”

                                                                                                                                                Which kicks off with the observation that only 6% of the papers at the USENIX OSDI conference were about operating system design and implementation.

                                                                                                                                                1. 1

                                                                                                                                                  Mothy gave a keynote at SOSP a few years ago pointing out that twenty years ago there were multiple new kernel papers each year at OSDI / SOSP whereas now we average less than one.

                                                                                                                                                  That’s brutal. Too brutal, in my opinion, to come only from spurious rejections. From the outside, I would suspect the main driver is that nobody makes kernels any more: apart from specific niches like real time or formal verification, the world has basically been taken over by NT, Linux, and XNU. And a major reason behind such concentration is that every kernel needs dedicated drivers for the insane diversity of hardware we use — except NT, hardware vendors being business-savvy enough to write the drivers for us. (See The 30 Million Lines Problem.)

                                                                                                                                                  Those three kernels are big. Making any addition or change to them is likely a significant endeavour. We likely need the ability to quickly write practical kernels from scratch if research is to take off again. But no one can possibly write a practical kernel if they have to, let’s be kind, port all the driver code from Linux. So the first step is for hardware vendors to design interfaces for humans, and then give us the manual. But given how suicidal this would be business-wise (who would ever buy hardware that doesn’t come with a Windows driver out of the box?), we probably need regulators to step in: say the US, or the entirety of the EU, ban the sale of any hardware for which the same company distributes software — or at the very least, proprietary software. With such a setup, I would expect hardware vendors to quickly converge on relatively uniform hardware interfaces that aren’t stupidly complex.

                                                                                                                                                  But then there’s the other side of the fence: user space. Since the ossification of OSes we came to depend on a huge stack of software that would take significant effort to port anywhere not POSIX-ish. And that’s for the stuff we have the source code of…

                                                                                                                                                  1. 7

                                                                                                                                                    I don’t really think that’s true. For a research project, you need a few drivers and most people just pick up the NetBSD ones (the RUMP kernel work makes this trivial). There’s a big unexplored space for both big and small operating systems. With a very small team, we were able to write an RTOS that has a solid FreeRTOS compat layer and can run most existing embedded software (source compatible) and we were designing and building the core that ran it as well. No one has really built an OS for cloud computing, but it could remove a lot of stuff that desktop operating systems need (no IPC or storage layers, since those will both sit on top of a network).

                                                                                                                                                    1. 2

                                                                                                                                                      No one has really built an OS for cloud computing

                                                                                                                                                      fwiw I think existing unikernels fit that bill (e.g. unikraft, nanos). They mostly target virtualized hardware like KVM, and seem pretty focussed on the cloud use-case (fast boot, small images, and efficiency). Being in KVM-land certainly removes a lot of implementation complexity, since bringup and platform init is mostly done.

                                                                                                                                                      1. 4

                                                                                                                                                        I disagree. I think unikernels are applications for cloud computing. They’re hampered by the fact that the OS that they run on (Xen, Hyper-V, Linux) is not designed for the cloud. They have network devices, for example, but a good cloud OS would make it easy to take advantage of hardware TLS offload and terminate connections directly in a unikernel.

                                                                                                                                                        Unikernels have had almost no commercial adoption precisely because there isn’t a good OS to run them on. The benefits of unikernels are incompatible with the billing models of cloud systems. A unikernel invocation may consume a few MiBs of RAM and a few CPU seconds (or hundreds of CPU milliseconds). The operating systems that run them are not set up to handle this at all. They are intended to account for CPU usage in minutes and RAM in GiBs.

                                                                                                                                                        If you’re running a unikernel in KVM, there’s little benefit in it over just running a Linux process. You then get to share a TCP/IP stack and so on with other processes and amortise more costs.

                                                                                                                                                        1. 3

                                                                                                                                                          Ah then I misunderstood your original comment. So it sounds like you’re unhappy about the currently dominant hypervisor abstraction, that I would agree with. The interfaces for efficient resource sharing are pretty borked.

                                                                                                                                                          A unikernel invocation may consume a few MiBs of RAM and a few CPU seconds

                                                                                                                                                          Are you talking about serverless here? Unikernels can be long-running services, at least that’s how I’ve run them in the past.

                                                                                                                                                          1. 3

                                                                                                                                                            For me, the big benefit of unikernels is their ability to scale down. If you’re handling sustained throughputs of thousands of network messages per second or more, there are some benefits, but if you’re implementing services that have long idle periods of no activity and bursts of very high usage, unikernels’ ability to either exit completely and then restart in one network round-trip time, or scale back to almost no resource usage, is a huge win. Cloud billing systems are not set up for that at all.

                                                                                                                                                            I don’t really like the term serverless, because you don’t get rid of servers. The way most FaaS services are implemented today comes with ludicrous amounts of overhead. Each service starts in a lightweight Linux VM, which runs an OCI container, which runs a language runtime and some support code, which then runs the customer code. We estimated that an efficient system with good hardware-software co-design, building an attestable base with TLS flow isolation to individual services, would allow at least an order of magnitude denser hosting and provide better security properties.

                                                                                                                                                  2. 1

                                                                                                                                                    Thanks for the link. In your view, would the BoC as a concurrency paradigm allow for a competitive implementation for e.g. performance-critical embedded systems? Are there places where you’d say “this place needs an escape hatch to close the last few percent perf gap” or so?

                                                                                                                                                    I’m genuinely interested, because I was looking for things beside stackless async/await that would work well for highly constrained environments.

                                                                                                                                                    1. 1

                                                                                                                                                      Possibly. BoC is really equivalent to stackless coroutines, but with a coordination layer on top and an ownership model that makes it safe. Each `when` captures the things it needs to complete and runs without the parent’s stack.

                                                                                                                                                      The problem for BoC on resource-constrained systems is that any behaviour can spawn arbitrary numbers of other behaviours, which can exhaust memory, so you’d need to be a bit careful. It’s probably no different to avoiding deep recursion though.

                                                                                                                                                      You definitely wouldn’t want to use the Verona runtime for small embedded systems (it’s nice in a kernel, though; I ported it to run in the FreeBSD kernel a few years ago), but I have been pondering what an embedded BoC runtime would look like.
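For a loose analogy in plain Rust (ordinary threads, not the Verona runtime, and not BoC’s actual `when` syntax): each spawned closure moves exactly the data it needs, shares no parent stack, and can itself spawn further work, which is also where the unbounded-spawning concern above comes from.

```rust
use std::thread;

fn main() {
    let data = vec![1, 2, 3];
    // The closure captures `data` by move, loosely like a `when`
    // capturing the cowns it needs; the parent's stack is not shared.
    let outer = thread::spawn(move || {
        let sum: i32 = data.iter().sum();
        // A behaviour can itself spawn further behaviours, so nothing
        // bounds the total number in flight except discipline.
        let inner = thread::spawn(move || sum * 2);
        inner.join().unwrap()
    });
    assert_eq!(outer.join().unwrap(), 12);
    println!("ok");
}
```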

                                                                                                                                                  3. 2

                                                                                                                                                    I think this is partly due to a decline of groundbreaking OS research

                                                                                                                                                    I mean, utah2k was written nearly a quarter-century ago. (There is a certain irony of someone from Bell Labs writing this, considering the output of Bell Labs doing a lot to slow systems research…)

                                                                                                                                                  4. 2

                                                                                                                                                    In the 1970s, this was addressed. Quite a few popular OSs back then treated all base I/O calls as async. Blocking was a unix thing. Out of necessity, the Berkeley unix APIs tried to walk that back. (They also hacked around a few other unix design choices.)

                                                                                                                                                    Programmers on those older OSs successfully built multiuser database and timesharing systems.

                                                                                                                                                    For simple programs, many liked the simplicity of the unix style.

                                                                                                                                                    1. 2

                                                                                                                                                      IMNSHO, the underlying problem is that we decided a long time ago that the procedure call (subroutine call, function, method, etc.) is our fundamental model of abstraction. Not just for interacting with the operating system, but pretty much for everything. (“Decided” might be overstating it…it just sort of happened for various reasons, many good at the time).

                                                                                                                                                      I talked about this, somewhat haphazardly, here: Can Programmers Escape the Gentle Tyranny of call/return?

                                                                                                                                                      You can also find some of that in Mary Shaw’s Myths and mythconceptions: what does it mean to be a programming language, anyhow?, and of course many, many others have made observations about the limitations of procedure calls and proposed alternatives as the basic building blocks.

                                                                                                                                                      1. 1

                                                                                                                                                        Myths and mythconceptions: what does it mean to be a programming language, anyhow?

                                                                                                                                                        Lobsters thread: https://lobste.rs/s/vypmkr/myths_mythconceptions_what_does_it_mean

                                                                                                                                                        Can Programmers Escape the Gentle Tyranny of call/return?

                                                                                                                                                        I don’t think this has been submitted here, but a piece commenting on it was submitted: https://lobste.rs/s/alzaim/thoughts_on_gentle_tyranny_call_return.

                                                                                                                                                      2. 1

                                                                                                                                                        I’m not 100% convinced that concurrent-only is the way to go (and I say this as a JS dev who also uses Javascript as a good example of what concurrent-only could look like). But I agree that most languages need to choose one or the other. You either go all in on concurrency (like in Javascript, where synchronous APIs are rare and usually only exist as a special-case exception alongside an equivalent async API), or you accept that concurrency is always going to be an opt-in feature with less support, one that will require its own way of doing things.

                                                                                                                                                        I’ve seen this a lot in Python, and I think we’re seeing it now again in Rust, where async support gets added, and creates a split ecosystem. You can’t mix and match, so you need to decide: am I using the sync APIs or the async ones? I wonder if there are other concurrency ideas that could mitigate this somewhat, but I think it’s just a fundamental divide. Reading files, interacting with sockets, executing processes etc are fundamental OS-level tasks, and deciding how you’re going to do those cuts to the core of a language’s standard library. In Rust, I know that some of the big proponents of the Async Way are aware of this problem, but I think their goal is to push async as the predominant way, and I’m not sure Rust is set up for that yet.

                                                                                                                                                        (As an aside: I know C# also was synchronous and then added the async keyword - are there similar issues there? Is there a divide between the sync and async worlds?)

                                                                                                                                                        1. 3

                                                                                                                                                          The thing is, you can’t choose only one! Here’s a mundane, boring example:

                                                                                                                                                          You spawn a process. You want to collect its stdout and stderr. You can’t do just

                                                                                                                                                          let stdout_bytes = child.stdout.read_to_end();
                                                                                                                                                          let stderr_bytes = child.stderr.read_to_end();
                                                                                                                                                          

                                                                                                                                                          because this might deadlock. The child process might interleave printing to stderr and stdout, so you have to interleave reading them. What you want to do here is to issue both read syscalls concurrently and wait for either of them to finish. So you end up writing platform-specific code like this:

                                                                                                                                                          https://github.com/rust-lang/cargo/blob/master/crates/cargo-util/src/read2.rs
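A portable (if heavier) workaround, rather than the platform-specific polling above, is to drain one stream on a helper thread so the child can never block on a full pipe. A minimal sketch (not the cargo implementation), assuming a Unix shell is available:

```rust
use std::io::Read;
use std::process::{Command, Stdio};
use std::thread;

fn main() -> std::io::Result<()> {
    let mut child = Command::new("sh")
        .args(["-c", "echo out; echo err >&2"])
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;

    let mut stderr = child.stderr.take().unwrap();
    // Drain stderr on a separate thread so the child can never stall
    // on a full stderr pipe while we are reading stdout here.
    let handle = thread::spawn(move || {
        let mut buf = Vec::new();
        stderr.read_to_end(&mut buf).map(|_| buf)
    });

    let mut stdout_bytes = Vec::new();
    child.stdout.take().unwrap().read_to_end(&mut stdout_bytes)?;
    let stderr_bytes = handle.join().unwrap()?;
    child.wait()?;

    assert_eq!(stdout_bytes, b"out\n");
    assert_eq!(stderr_bytes, b"err\n");
    Ok(())
}
```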

                                                                                                                                                          Another case you always hit when programming mundane things is that something does a read call, and you really want to “cancel” that read (even if it currently blocks). This again necessitates pretty horrific work-arounds with installing a process-global signal handler: https://github.com/rust-lang/jobserver-rs/commit/4c8a5985e32fa193350f988f6a09804e565f0448

                                                                                                                                                          In the opposite direction, consider DNS (and, until io_uring, even file IO): there are just no non-blocking APIs for these!

                                                                                                                                                          So, I don’t think it’s up to the languages to fix this, really. It’s the OS which first needs to provide a reasonable API. Only then can designers of production languages figure out how to express the concurrency in the source code (to echo David’s comment, I suppose we might have figured out how to deal with concurrency in academia, but this certainly hasn’t quite percolated into day-to-day languages. Go and Rust seem like improvements, but both feel quite far from optimal).

                                                                                                                                                          1. 3

                                                                                                                                                            There are plenty of nonblocking DNS APIs. The gap is in the higher-level name service APIs that bundle up lots of non-DNS name databases, such as the hosts file, mDNS, NetBIOS, NIS, etc.

                                                                                                                                                            1. 1

                                                                                                                                                              You can’t perfectly choose one, but I think you can abstract surprisingly well by choosing one route or the other. As you point out yourself, we’ve done pretty well pretending that things like file IO (and even memory access) are synchronous, idealised things. Yes, there are weird edge cases where the abstraction breaks, but most of the time it holds surprisingly well.

                                                                                                                                                              Going the fully async route brings its own problems, but it’s still in many ways just a different abstraction. Many async libraries handling file IO just run synchronous APIs on a different thread, because it’s simple, it works, and we still get the concurrency we were expecting.

                                                                                                                                                              So I think a lot of this is about the abstractions we chose to use on top of these sorts of APIs, and I think language designers have a lot of leeway there. Yes, the OS APIs will affect those abstractions - what works well, what works efficiently, what leaks constantly into the abstraction layer etc - but there is no shortage of interesting approaches to concurrency on top of largely synchronous APIs.
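The thread-offload pattern mentioned above can be sketched in plain Rust (a toy, not how any particular async runtime implements it): the blocking read runs on a worker thread, and the caller gets a channel it can wait on instead of blocking directly.

```rust
use std::sync::mpsc;
use std::thread;

// Toy version of the "run blocking IO on a worker thread" pattern many
// async runtimes use for file IO: the caller receives a Receiver it can
// poll or block on, while the synchronous read happens elsewhere.
fn read_file_async(path: String) -> mpsc::Receiver<std::io::Result<Vec<u8>>> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Ignore the send error: it only occurs if the caller dropped rx.
        let _ = tx.send(std::fs::read(&path));
    });
    rx
}

fn main() {
    let path = std::env::temp_dir().join("demo_async_read.txt");
    std::fs::write(&path, b"hello").unwrap();

    let rx = read_file_async(path.to_string_lossy().into_owned());
    // ...the caller is free to do other work here...
    let bytes = rx.recv().unwrap().unwrap();
    assert_eq!(bytes, b"hello".to_vec());
    println!("read {} bytes", bytes.len());
}
```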

                                                                                                                                                              1. 1

                                                                                                                                                                Many async libraries handling file IO just run synchronous APIs on a different thread, because it’s simple, it works, and we still get the concurrency we were expecting.

                                                                                                                                                                Until recently, Linux didn’t have usable async file IO, so you had to do that.