Threads for schneems

    1. 2

      Great job!

      I hope this isn’t too off-topic, but it’s about queues for Rails.

      Did anyone switch from Sidekiq to SolidQueue and can share their experience?

      I am curious because, while Sidekiq is nice, systems like this have a fairly limited scope, and having to buy the for-pay features might not be feasible for small or non-profit projects that rely heavily on queues because of the nature of the application.

      1. 3

        In general running your queue on the same database as your app is gonna be a bad time, performance wise. Now, Solid Queue can go on a different database on the same machine (SQLite), but usually the database is the first thing that needs to be scaled, and competing with it for resources is also not a great idea. If you run it, I would run it on a different machine, separate from your main database. At that point you have to pay for another box/service for your queue datastore anyway.

        My preference has been to stick with Sidekiq.

        1.  

          In general running your queue on the same database as your app is gonna be a bad time, performance wise.

          Could you go into that a bit more?

          Sounds a bit like something breaking down quickly. We are talking about relatively simple tables with relatively simple queries. Stuff where I’d expect that more complex applications would potentially make dozens of requests (or alternatively run somewhat complex joins, conditions, and queries), so it feels like a query that just locks the next item shouldn’t really carry much weight overall.

          usually the database is the first thing that needs to be scaled

          That also really depends on the application. Both on doing silly things in application code and doing silly things with database queries. A DB server with, let’s say, 1 TB of NVMe and 128 GiB of RAM, plus an identical secondary, for around 150 USD/month can get you really far, even if you do somewhat complex stuff (e.g. on-demand GIS with measuring, manipulating, and relating polygons) and don’t invest a lot of effort into making everything super efficient. And having a couple of hundred thousand entries for your queue doesn’t sound like it should be that big of a deal. And if it is then just use a separate DB for your queue?

          So in other words: Yes, you are technically competing for resources, but we are talking about a pretty easy task for any DB?

          1.  

            I work for Heroku, where I get performance tickets, and I’ve seen hundreds, if not thousands, of Rails apps in a performance context. I’m also on the Nate Berkopec performance Slack. And database load is pretty much the primary bottleneck for most Rails apps.

            It’s not about “complex” or “easy”, it’s about load and locking. Congestion.

            And if it is then just use a separate DB for your queue?

            I’m not sure why there’s a question mark; that was a recommendation in my comment. The next logical step, though: if you’re going to use a different data store anyway, why not just use redis/keyval, which has persistent queue data structures?

            One weird thing though: it’s surprisingly hard to migrate off of a queue backend (or can be). It’s easier than migrating from MySQL to PostgreSQL, but perhaps harder than people realize.

            We use delayedjob for a legacy but important app, and without realizing it we’re relying on database transaction semantics for some critical behavior. The effort and cost to identify and replicate all that behavior that would allow us to move to a different queue backend isn’t justifiable, so we are kinda just…stuck with it. At least for now, on that one app.

            Even migrating from a queue in the same DB to a different DB could yield slightly different behavior. Not saying it can’t be done or that it will be prohibitively difficult or expensive, but rather: make a plan now for how you might scale your database and queue systems in the future.

            For me: I choose Sidekiq and keyval/redis. In addition to what I’ve already mentioned, I know I can get support if required and I know it won’t be community abandonware like resque or webpacker. I also have a personal relationship with Mike and hang with him at confs and he’s generally accessible online and active in the communities. Mentioning both for disclosure and to contrast with Dave, who isn’t accessible unless you’re in an inner circle.

            1.  

              I work for Heroku, where I get performance tickets, and I’ve seen hundreds, if not thousands, of Rails apps in a performance context. I’m also on the Nate Berkopec performance Slack. And database load is pretty much the primary bottleneck for most Rails apps.

              Are you able to provide more insight there?

              In my experience, say you take a DB setup like the one above for ~150 USD (or 76 without a replica - in other words, we will only use the replica for failover, no reads from it), and say 10k active users a day gives you an average of 1k DB queries/second, with around 10k DB queries/second at peak times. In my experience, for relatively basic CRUD apps with little optimization and somewhat idiomatic Rails code throwing out JSON for web and mobile apps to ingest, that will be barely noticeable in terms of DB load. Even if you don’t optimize it. Even if you turn up logging for every statement for monitoring. Even if you use something like ZFS so you can make point-in-time snapshots. Yet your Rails application will start to struggle at peak times, eating up CPU.

              But I have to say, while I work on a Rails app right now that is way bigger than the above, I am certainly not an expert when it comes to Rails or even Ruby.

              1.  

                Here’s an article that shows load average on basically the smallest crud app you can think of, and it’s also open source. In this case the article walks through finding and eliminating a bug/problem, but it shows the correlation between usage and load to help build up intuition.

                https://schneems.com/2017/07/18/how-i-reduced-my-db-server-load-by-80/

                1.  

                  I don’t really understand the relevance. As mentioned the database is mostly bored even during peak times. That’s the whole point I’m making.

                  1.  

                    As mentioned the database is mostly bored even during peak times

                    As mentioned, the database is the performance bottleneck of most Rails apps experiencing performance problems that I’ve seen 🤷

                2.  

                  That’s 6 queries per minute per user on average, which seems like A LOT. How many queries per page load is that?

                  1.  

                    Let’s see

                    • You do one query for authentication.
                    • You get something related to the resource, to authorize it (if you use any kind of ACL mechanism like cancancan); this might be two queries if, for example, authorization is based on relationships to other users
                    • You get the thing that the endpoint is about
                    • It might be something more involved where you end up doing additional queries

                    That’s already at least 3-4 queries. And that’s the most basic a request can be, if you separate authentication and authorization into their own things.

                    Now think about the fact that oftentimes a page load does more than one AJAX request, so you have a multitude of them. E.g. you might have whatever the page is about, plus a check of something like notifications or some inbox. You might have a general endpoint for the current user profile, settings, etc. You might in some situations do non-user-triggered updates, for example to mark stuff as read or similar. In addition, if it’s something social you might have something to give suggestions, etc. So the first page load has multiple of these 3-4 query batches. Sometimes you also want to do some more generic calls, for example a geoip call that might still be authenticated, or simply load some news that might still be authenticated or tailored to the user, so again you have to do the authentication part. In addition, sometimes stuff ends up in the queue, to come back to the topic, so that ID has to be queried again. Sometimes you interact with the outside world, so you query, then communicate, then query again. Sometimes stuff is callbacks. Sometimes requests fail and are retried (again, it’s a mobile app, so network quality varies; it’s an outdoorsy app). Then you have some caching queries, and since it’s Rails there is the classic touch: true association.

                    If you then go further, like having activities, you might do stuff where the user uploads something and then uploads pictures individually, so that it’s not one big request but multiple smaller ones (since it’s a mobile app), which makes retrying easier when only one part changes (think wifi-to-cell switches).

                    Now if your site is more complex you multiply those 3-4 queries even further. It’s very different, of course, if you just use Rails directly for rendering. But if you have endpoints for a lot of features, that stuff of course adds up when they are queried independently so the client can compose them.

                    So overall you end up with a bigger amount of initial queries on first page load (because of the many AJAX calls) and then it tapers off. Same for when the app is opened: a couple of initial requests that often mostly do the authentication part and then their main thing.

                    Just checked. The initial page load for a logged in user is 12 AJAX requests. All with at least Authentication -> ACL -> main request(s). Could be optimized for sure.

                    But then again, as mentioned, the 75 USD/month primary DB is pretty bored, as it should be. So the need to improve there hasn’t yet arisen. The additional Rails instances are also mostly there for failover. Compared to so many other things, the infrastructure cost is pretty much irrelevant. Pretty much every few minutes of unnecessary meetings covers the DB costs for months.

                    All of that is without any requests not initiated by users directly. Think of accepting callbacks (payment, etc.), plus reporting, monitoring, alerting, etc. related queries regarding the current state of the DB. Or status endpoints that trigger requests that are polled at regular intervals 24/7. Not all of these go through Rails of course. A lot of them go either directly to the DB or through something else. And it feels like the queue would be nothing more than one of these small side things.

                    I hope that clarifies things a bit. :)

              2.  

                Yeah, it shouldn’t be a problem, but the database needs to be operated properly, like having the right indexes and using SKIP LOCKED for queue ops.

                https://web.archive.org/web/20160416164956/https://blog.2ndquadrant.com/what-is-select-skip-locked-for-in-postgresql-9-5/

                (Sadly EnterpriseDB have fucked the 2nd Quadrant blog.)
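
                For anyone who hasn’t seen the pattern, here’s a minimal sketch of what a SKIP LOCKED dequeue usually looks like (hypothetical jobs table with id, payload, and locked_at columns; written as Rust with sqlx purely to have something runnable around the SQL, since the query itself is the interesting part):

                ```rust
                use sqlx::PgPool;

                // Claim the next unlocked job, if any. SKIP LOCKED means concurrent
                // workers simply skip rows another worker has already locked instead
                // of queueing up behind that row lock.
                async fn claim_next_job(pool: &PgPool) -> sqlx::Result<Option<(i64, String)>> {
                    sqlx::query_as(
                        r#"
                        UPDATE jobs
                           SET locked_at = now()
                         WHERE id = (
                                 SELECT id
                                   FROM jobs
                                  WHERE locked_at IS NULL
                                  ORDER BY id
                                  LIMIT 1
                                    FOR UPDATE SKIP LOCKED
                               )
                        RETURNING id, payload
                        "#,
                    )
                    .fetch_optional(pool)
                    .await
                }
                ```

                A partial index on the unclaimed rows (e.g. on id WHERE locked_at IS NULL) is the kind of “right index” that keeps the inner SELECT cheap.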

                1.  

                  Yes, but that’s Sidekiq’s or SolidQueue’s job.

                  Ages ago I built my own queues, both in Redis and PG.

          2. 0

            Planned and enforced obsolescence via certificates.

            This is the future the “HTTPS everywhere” crowd wants ;)

            It will be interesting to see if Google fixes this. On the one hand, brand value. On the other, it’s a chance to force purchase of new hardware!

            1. 47

              This is the future the “HTTPS everywhere” crowd wants ;)

              Not me. I want HTTPS Everywhere and I also don’t want this.

              1. 6

                What’s your marketing budget? If you aren’t aligned with the marketing budget havers on this, how do you expect them to treat you when your goals diverge?

                See also, fast expiring certificates making democratized CT logs infeasible, DNS over HTTPS consolidating formerly distributed systems on cloudflare. It’s not possible to set up a webpage in 2025 without interacting with a company that has enough money and accountability to untrustworthy governments to be a CA, and that sucks.

                HTTPS is cool and all, but I wish there was a usable answer that wasn’t “just centralize the authority.”

                1. 3

                  Sigh. Lobsters won’t let me post. I must be getting rate limited? It seems a bit ridiculous, I’ve made one post in like… hours. And it just shows me “null” when I post. I need to bug report or something, this is quite a pain and this is going to need to be my last response as dealing with this bug is too frustrating.

                  See also, fast expiring certificates making democratized CT logs infeasible, DNS over HTTPS consolidating formerly distributed systems on cloudflare.

                  Can you tell me more about these? I think “infeasible” is not accurate but maybe I’m wrong. I don’t see how DoH consolidates anything as anyone can set up a DoH server.

                  It’s not possible to set up a webpage in 2025 without interacting with a company that has enough money and accountability to untrustworthy governments to be a CA, and that sucks.

                  You can definitely set up a webpage in 2025 pretty easily with HTTPS, especially as you can just issue your own CA certs, which your users are welcome to trust. But if your concern is that a government can exert authority within its jurisdiction, I have no idea how you think HTTP is helping you with that or how HTTPS is enabling that specifically. These don’t feel like HTTPS issues, they feel like regulatory issues.

                  HTTPS is cool and all, but I wish there was a usable answer that wasn’t “just centralize the authority.”

                  There are numerous, globally distributed CAs, and you can set one up at any time.

                  1. 2

                    Lobsters has been having some issues, I had the same trouble yesterday too.

                    The CT log thing is something I read on here IIRC: basically, CT logs are already pretty enormous and difficult to maintain, and if there are 5x as many cert transactions because they expire in 1/5 the time, the only people who will be able to keep them are people with big budgets.

                    I suppose I could set up a DoH server, but the common wisdom is to use somebody else’s, usually Cloudflare’s. The fact that something is technically possible doesn’t matter in a world where nobody does it.

                    especially as you can just issue your own CA certs

                    Are you joking? “please install my CA cert to browse my webpage” may technically count as setting up a web page but the barrier to entry is so high I might as well not. Can iphones even do that?

                    There are numerous, globally distributed CAs, and you can set one up at any time.

                    That’s a lot more centralized than “I can do it without involving a third party at all.”

                    I dunno, maybe I’m just romanticizing the past but I miss being able to publish stuff on the internet without a Big Company helping me.

                    1.  

                      The CT log thing is something I read on here IIRC: basically, CT logs are already pretty enormous and difficult to maintain, and if there are 5x as many cert transactions because they expire in 1/5 the time, the only people who will be able to keep them are people with big budgets.

                      Strange but I will have to learn more.

                      I suppose I could set up a DoH server, but the common wisdom is to use somebody else’s, usually Cloudflare’s

                      Sure, because that’s by far the easiest option and most people don’t really care about centralizing on Cloudflare, but nothing is stopping people from using another DoH.

                      Are you joking? “please install my CA cert to browse my webpage” may technically count as setting up a web page but the barrier to entry is so high I might as well not. Can iphones even do that?

                      iPhones being able to do that isn’t really relevant to HTTPS. If you want to say that users should be admins of their own devices, that’s cool too.

                      As for joking, no I am not. You can create a CA, anyone can. You don’t get to decide who trusts your CA, that would require work. Some companies do that work. Most individuals aren’t interested. That’s why CAs are companies. If you’re saying you want a CA without involving any company, including non-profits that run CAs, then there is in fact an “open” solution - host your own. No one can stop you.

                      You can run your own internet if you want to. HTTPS is only going to come up when you take on the responsibility of publishing content to the internet that everyone else has to use. No one can stop you from running your own internet.

                      That’s a lot more centralized than “I can do it without involving a third party at all.”

                      As opposed to running an HTTP server without a third party at all? I guess technically you could go set up a server at your nearest Starbucks but I think “at all” is a bit hard to come by and always has been. Like I said, if you want to set up a server on your own local network no one is ever going to be able to stop you.

                      I dunno, maybe I’m just romanticizing the past but I miss being able to publish stuff on the internet without a Big Company helping me.

                      What did that look like?

                2. 1

                  I want the benefits of HTTPS without the drawbacks. I also want the benefits of DNS without the drawbacks.

                  On the one hand, I am completely sincere about this. On the other, I feel kind of foolish for wanting things without wanting their consequences.

                  1. 1

                    Which drawbacks? I ask not because I believe there are none, but I’m curious which concern you the most. I’m sympathetic to wanting things and not wanting their consequences haha that’s the tricky thing with life.

                    1. 4

                      HTTPS: I want the authentication properties of HTTPS without being beholden to a semi-centralized and not necessarily trustworthy CA system. All proposed alternatives are, as far as I know, bad.

                      DNS: I want the convenience of globally unique host names without it depending on a centralized registry. All proposed alternatives are, as far as I know, bad.

                3. 42

                  These kinds of accusations are posts that make me want to spend less time on lobsters. Who knows if it’s planned or accidental obsolescence? Many devices and services outlive their teams by much longer than anticipated. Everyone working in software for a long while has experienced situations like those. I also find the accusation that HTTPS is leading to broken devices rather wild…

                  I want to offer a different view: how cool is it that the device was fixable despite Google’s failure to extend/exchange their certificate? Go tell your folks that the Chromecast is fixable and help them :)

                  1. 14

                    For me, it’s takes like yours that irritate me. Companies that are some of the largest on the planet don’t need people like you to defend them, to make excuses for them, to try to squelch the frustration directed towards them because they’re either evil or incompetent.

                    By the way, there is no third option - either they’re evil and intended to force obsolescence upon these devices, or they’re incompetent and didn’t know this was going to happen because of this incompetence.

                    The world where we’re thinking it’s cool that these devices are fixable tidily neglects the fact that 99% of the people out there will have zero clue how to fix them. That it’s fixable means practically nothing.

                    1. 10

                      For me, it’s takes like yours that irritate me. Companies that are some of the largest on the planet don’t need people like you to defend them, to make excuses for them, to try to squelch the frustration directed towards them because they’re either evil or incompetent.

                      Who cares? No one is defending Google. People are defending deploying HTTPS as a strategy to improve security. Who cares if it’s Google or anyone else? The person you’re responding to never defends Google, none of this has to do with Google.

                      By the way, there is no third option - either they’re evil and intended to force obsolescence upon these devices, or they’re incompetent and didn’t know this was going to happen because of this incompetence.

                      Who cares? Also, there is a very obvious 3rd option - that competent people can make a mistake.

                      Nothing you’ve said is relevant at all to the assertion that, quoting here:

                      This is the future the “HTTPS everywhere” crowd wants ;)

                      1. 3

                        Even though you’re quoting me, you must be mistaken - this post is about Google, and my response was about someone who is defending Google’s actions (“Who knows if it’s planned or accidental obsolescence?”).

                        I haven’t a clue how you can think that a whole post about Google breaking Google devices isn’t about Google…

                        To the last point, “https everywhere” means things like this can keep being used as an excuse to turn fully functional products into e-waste over and over, and we’re left wondering if the companies responsible are evil or dumb (or both). People pretending not to get the connection aren’t really making a good case for Google not being shit, or for how the “https everywhere” comment is somehow a tangent.

                        1. 1

                          Nope, not mistaken. I think my points all stand as-is.

                    2. 2

                      Take what you want from my employment by said company, but I would guess absolutely no-one in privacy and security has any wish/intention/pressure to not renew a certificate.

                      I have no insider knowledge about what has happened (nor could I share it if I did! But I really don’t). But I do know that the privacy and security people take their jobs extremely seriously.

                      1. 7

                        Google has form in these matters, and the Chromecast as a brand even has an entry here:

                        https://killedbygoogle.com/

                        But in the future I’ll be more polite in criticizing one of the world’s biggest companies so that this place is more welcoming to you.

                        1. 17

                          This isn’t about who you criticize, I would say the same if you picked the smallest company on earth. This is about the obvious negativity.

                          This is because the article isn’t “Chromecast isn’t working and the devices all need to go to the trash”. Someone actually found out why, and people replied with instructions for how to fix these devices, which is rather brilliant. And all of that despite Google’s announcement that it would be discontinued.

                          1. 14

                            This is the future the “HTTPS everywhere” crowd wants ;)

                            I’m not exactly sure what you meant by that, and even the winky face doesn’t elide your intent and meaning much. I don’t think privacy and security advocates want this at all. I want usable and accessible privacy and security and investment in long term maintenance and usability of products. If that’s what you meant, it reads as a literal attack rather than sarcasm. Poe’s law and all.

                            1. 8

                              Not all privacy and security advocates wanted ‘HTTPS everywhere’. Not all of the ‘HTTPS everywhere’ crowd wanted centralized control of privacy and encryption solutions. But the privacy and security discussion has been captured by corporate interests to an astonishing degree. And I think @gerikson is right to point that out.

                              1. 4

                                Do you seriously think that a future law in the US forcing Let’s Encrypt (or any other CA) to revoke the certificates of any site the government finds objectionable is outside the realms of possibility?

                                HTTPS everywhere is handing a de facto publishing license to every site that can be revoked at will by those that control the levers of power.

                                I admit this is orthogonal to the issue at hand. It’s just an example I came up with when brewing some tea in the dinette.

                                1. 19

                                  In an https-less world the same people in power can just force ISPs to serve different content for a given domain, or force DNS providers to switch the NS to whatever they want, etc. Or worse, they can maliciously modify the content you want served, subtly.

                                  Only being able to revoke a cert is an improvement.

                                  Am I missing something?

                                  1. 3

                                    Holding the threat of cutting off 99% of internet traffic over the head of media companies is a great way to enforce self-censorship. And the best part is that the victim does all the work themselves!

                                    The original sin of HTTPS was wedding it to a centralized CA structure. But then, the drafters of the Weimar constitution also believed everything would turn out fine.

                                    1. 8

                                      They’ve just explained to you that HTTPS changes nothing about what the government can do to enact censorship. Hostile governments can turn your internet off without any need for HTTPS. In fact, HTTPS directly attempts to mitigate what the government can do with things like CT logs, etc, and we have seen this work. And in the singular instance where HTTPS provides an attack (revoke cert) you can just trust the cert anyways.

                                      edit: Lobsters is basically completely broken for me (anyone else just getting ‘null’ when posting?) so here is my response to the reply to this post. I’m unable to reply otherwise and I’m getting no errors to indicate why. Anyway…

                                      Yeah, “trust the cert anyway” is going to be the fig leaf used to convince a compliant SCOTUS that revoking a certification is not a blatant violation of the 1st amendment. But at least the daily mandatory webcast from Dear Leader will be guaranteed not to be tampered with during transport!

                                      This is getting ridiculous, frankly.

                                      You’ve conveniently ignored everything I’ve said and focused instead on how a ridiculous attack scenario that has an obvious mitigation has 4 words that somehow you’re relating to SCOTUS and 1st amendment rights. Just glossing over that this attack makes almost no sense whatsoever, glossing over that the far easier attacks apply to HTTP at least as well as (often better than) HTTPS, glossing over the fact that even more attacks are viable against HTTP that aren’t viable against HTTPS, glossing over that we’ve seen CT logs actually demonstrate value against government attackers, etc etc etc. But uh, yeah, SCOTUS.

                                      SCOTUS is going to somehow detect that I trusted a certificate? And… this is somehow worse under HTTPS? They can detect my device accepting a certificate but they can’t detect me accessing content over HTTP? Because somehow the government can’t attack HTTP but can attack HTTPS? This just does not make any sense and you’ve done nothing to justify your points. Users have been more than charitable in explaining this to you, even granting that an attack exists on HTTPS but helpfully explaining to you why it makes no sense.

                                      1. 3

                                        Going along with your broken threading

                                        My scenario was hypothetical.

                                        In the near future, on the other side of an American Gleichschaltung, a law is passed requiring CAs to revoke specific certificates when ordered.

                                        If the TLS cert for CNN.com is revoked, users will reach a scary warning page telling them the site cannot be trusted. Depending on the status of “HTTPS Everywhere”, it might not be possible to proceed past this page. But crucially, CNN.com remains up, it might be accessible via HTTP (depending on HSTS settings) and the government has done nothing to impede the publication.

                                        But the end effect is that CNN.com is unreadable for the vast number of visitors. This will make the choice of CNN to tone down criticism of the government very easy to make.

                                        The goal of a modern authoritarian regime is not to obsessively police speech to enforce a single worldview. It’s to make it uneconomical or inconvenient to publish content that will lead to opposition to the regime. Media will parrot government talking points or peddle harmless entertainment. There will be an opposition and it will be “protected” by free speech laws, but in practice accessing its speech online will be hard to impossible for the vast majority of people.

                                        1. 4

                                          But crucially, CNN.com remains up, it might be accessible via HTTP

                                          I feel like your entire argument hinges on this and it just isn’t true.

                                          1. 3

                                            If the USA apparatus decides to censor CNN, revoking the TLS cert wouldn’t be the way. It’ll be secret court orders (not unlike the recent one the British government sent to Apple) and, should they not comply, apprehension of key staff.

                                            And, even if such a cert revocation happened, CNN would be able to get a new one within seconds by contacting any other ACME CA; there are even some operating in the EEA.

                                            I think your whole argument is misguided, and not aimed at understanding the failures of Google but at lashing out at an only tangentially related problem space.

                                            And my comment is not a defence of Google or Cloudflare; I consider both to be malicious for a plethora of reasons.

                                            1. 1

                                              You’re still thinking like the USSR or China or any totalitarian government. The point isn’t to enforce a particular view. The point is to prevent CNN or any other media organization from publishing anything other than pablum, by threatening their ad revenue stream. They will cover government talking points, entertainment, even happily fake news. Like in Russia, “nothing is true and everything is possible”.

                                              And, even if such cert revocation happened, CNN would be able to get new one within seconds by contacting any other ACME CA, there are even some operating in EEA.

                                              Nothing is preventing the US from only allowing certs from US based issuers. Effectively, if you’re using a mainstream browser, the hypothetical law I have sketched out will also affect root CAs.[1]

                                              I think your whole argument is misguided, and not aimed at understanding failures of Google, but at lashing at only tangentially related problem space.

                                              I proposed a semi-plausible failure mode of the current CA-based certification system and suddenly I’ve gotten more flags than ever before. I find it really interesting.


                                              [1] note that each and every one of these attempts to block access will have quite easy and trivial workarounds. That’s fine, because as stated above, having 100% control of some sort of “truth” is not the point. If nerds and really motivated people can get around a block by installing their own root store or similar, it will just keep them happy to have “cheated the system”. The point is having an atomized audience, incapable of organizing a resistance.

                                              1. 4

                                                I proposed a semi-plausible failure mode of the current CA-based certification system and suddenly I’ve gotten more flags than ever before. I find it really interesting.

                                                The flags are me and they’re because your posts have been overwhelmingly low quality, consisting of cherry picking, trolling, rhetoric, and failing to engage with anyone’s points. You also never proposed any such attack, other users did you the favor of explaining what attack exists.

                                                The closest thing you’ve come to defining an attack (before others stepped in to hand you one) is this:

                                                Holding the threat of cutting off 99% of internet traffic over the head of media companies

                                                It’s not that interesting why you’re getting flagged. IMO flags should be required to have a reason + should be open, but that’s just me, and that’s why I virtually always add a comment when I flag a post.

                                                This is one of the only posts where you’ve almost come close to saying what you think the actual problem is, which if I very charitably interpret and steel-man on your behalf I can take as essentially “The US will exert power over CAs in order to make it hard for news sites to publish content”. This utterly fails, to be clear (as so many people have pointed out that there are far more attacks on HTTP that would work just as well or infinitely better, and as I have pointed out that we have seen HTTPS explicitly add this threat model and try to address it WITH SUCCESS using CT Logs), but at least with enough effort I can extract a coherent point.

                                                1. 4

                                                  I have around 30 flags right now in these threads (plus some from people who took time off their busy schedule to trawl through older comments for semi-plausible ones to flag). You’re not the only one I have pissed off.[1]

                                                  (I actually appreciate you replying to my comments but to be honest I find your replies quite rambling and incoherent. I guess I can take some blame for not fully cosplaying as a Project 2025 lawyer, instead relying on vibes.)

                                                  It’s fine, though. I’ve grown disillusioned by the EFF style of encryption boosting[2]. I expect them to fold like a cheap suit if and when the gloves come off.


                                                  [1] but I’m still net positive on scores, so there are people on the other side too.

                                                  [2] they’ve been hyperfocussed on the threat of government threats to free speech, while giving corporations a free pass. They never really considered corporations taking over the government.

                                                  1. 3

                                                    Hm, I see. No, I certainly have not flagged all of your posts or anything, just 2 or 3 that I felt were egregious. I think lobsters should genuinely ban more people for flag abuse, tbh, but such is the way.

                                                    It’s interesting that my posts come off as rambly. I suppose I just dislike tree-style conversations and lobsters bugs have made following up extremely annoying as my posts just disappear and show as “null”.

                                                    1. 1

                                                      I’ve been getting the “null” response too. There’s nothing in the bug tracker right now, and I don’t have IRC access. Hopefully it will be looked at soon.

                                                      As to the flags, people might legitimately feel I’m getting too political.

                                                    2. 1

                                                      I can take some blame for not fully cosplaying as a Project 2025 lawyer, instead relying on vibes.

                                                      Genuine question, is this aimed at me?

                                                      1. 1

                                                        Nope. Unless you are a lawyer for Project 2025.

                                            2. 2

                                              Yeah, “trust the cert anyway” is going to be the fig leaf used to convince a compliant SCOTUS that revoking a certification is not a blatant violation of the 1st amendment. But at least the daily mandatory webcast from Dear Leader will be guaranteed not to be tampered with during transport!

                                              1. 4

                                                Wouldn’t you agree that certificate transparency does a better job detecting this kind of thing than surreptitiously redirecting DNS would?

                                                1. 2

                                                  The point of this hypothetical scenario would be that the threat of certificate revocation would be out in the open, to enforce self-censorship to avoid losing traffic/audience. See my comment here:

                                                  https://lobste.rs/s/mxy0si/chromecast_2_s_device_authentication#c_lyenlf

                                    2. 11

                                      But in the future I’ll be more polite in criticizing one of the world’s biggest companies so that this place is more welcoming to you.

                                      Flagged as trolling. I’m also extremely critical of Google’s killing of various services.

                                      1. 3

                                        I’m not sure any of those are good examples of planned obsolescence. As far as I can tell, they’re all services that didn’t perform very well that Google didn’t want to support, tools that got subsumed into other tools, or ongoing projects that were halted.

                                        I think it’s reasonable to still wish that some of those things were still going, or that they’d been open-sourced in some way so that people could keep them going by themselves, or even that Google themselves had managed them better. But planned obsolescence is quite specifically the idea that you should create things with a limited lifespan so that you can make money by selling their replacements. As far as I can tell, that doesn’t apply to any of those examples.

                                        1. 0

                                          Trust Google to not even manage to do planned obsolescence right either…

                                    3. 13

                                      This is the future the “HTTPS everywhere” crowd wants ;)

                                      Please refrain from smirky, inflammatory comments.

                                      1. 7

                                        I get that it’s a tongue in cheek comment, but this is what falls out of “we want our non-https authentication certificates to chain through public roots”.

                                        There is no reason for device authentication to be tied to PKI - it is inherently a private (as in “only relevant to the vendor”, not secret) authentication mechanism, so it should not be trying to chain through PKI, or PKI-like, roots.

                                        1. 9

                                          Hyperbole much? Sometimes an expired certificate is just an expired certificate

                                          1. 10

                                            Why is this a hyperbole? It is clear that even an enterprise the size of Google, famous for its leetcode-topping talent, is unable to manage certificates at scale. This makes it a pretty good point against uncritical deployment of cryptographic solutions.

                                            1. 10

                                              Microsoft let microsoft.com lapse that one time. Should we give up on DNS?

                                              1. 6

                                                When Microsoft did that I wasn’t standing embarrassed in front of my family failing to cast cartoons on the TV. So it was their problem, not my problem.

                                                (It is still bricked today btw)

                                              2. 6

                                                No one has ever argued for “uncritical deployment” of any solution, let alone cryptographic ones.

                                                1. 2

                                                  Maybe I’m reading too much into “HTTPS everywhere” then.

                                                  1. 3

                                                    Maybe. I think there are two ways to interpret it - “HTTPS Everywhere” means “literally every place” or it means “everywhere that makes sense, which is the vast majority of places”. But, to me, neither of these implies “you should deploy in a way that isn’t considered and that will completely destroy a product in the future”, it just means that you should very likely be aiming for a reliable, well supported deployment of HTTPS.

                                                2. 2

                                                  I was replying more to the “planned and enforced obsolescence” conspiracy theorizing.

                                                  It is true that managing certificates at scale is something not a lot of large organizations seem to be able to pull off, and that’s a legitimate discussion to have… but I didn’t detect any good faith arguments here, just ranting

                                            2. 2

                                              I’m curious if anyone has any team wide practices for knowledge sharing in this area.

                                              I ask because “better” depends on values. For example, I’m writing a Rust tutorial and I COULD introduce more abstraction to make invalid state impossible at more layers, however that code would add indirection, and it’s unclear if it’s meaningful or not. I chose to make the example simpler and faster to write instead of choosing to harden it against accidental misuse. Neither is better, they prioritize different values. Sometimes things are messy because the author was learning and knows better now, sometimes they are messy because…

                                              In my ideal world developers would share strong “do this” and “not that” opinions, the always and nevers. And even some “this is okay” cases of “we can’t make this better; even though it seems suboptimal it’s fine”. Ideally listing out their values for why they feel some code is better than other code, so that if we turn these feelings into a linter or shared style guide we preserve enough info to revisit old decisions/opinions. But I’ve not seen that done in a way that’s not just one dev ad-hoc writing down their preferences. I genuinely want an inclusive team-wide framework for continuous improvement and learning. Does that make sense? Any suggestions?

                                              The author makes the assumption that everyone is in alignment on which dirty tasks and things everyone wishes someone else would clean up, but in my experience alignment is truly the hardest part. The best devs I’ve worked with would happily nerdsnipe themselves all day every day over refactorings and tinkering. But it makes them terribly slow sometimes and the process takes forever. I feel that if we spent time learning how to improve continuously (yes, the wording strikes me as a bit on the nose) they would feel empowered to ship suboptimal things and take more risks, knowing that there was space for tidying later (I like that more than the loaded “clean” word). I know how to do that as an individual, but don’t know how to drive that as a process as a team. Suggestions?

                                              1. 1

                                                “Cleanliness” is not formalizable. To your point, people disagree on what is or is not “clean.”

                                                The only antidote I’ve found for this is team-building / team “connecting.” What I mean by this is looking for the moments where you bring up a topic with a teammate, and something just clicks between you. This happened to me the other day with a teammate on the subject of Go channels for example.

                                                This kind of stinks, because this is a pretty fuzzy suggestion. But, I think there’s extreme value in nurturing these connections between everyone on the team. And there are things you can do to be more intentional about making them happen.

                                                A group code review is another good example. Where you review code amongst multiple people. This gives a place for the team to analyze and comment on real code, and share that process together to build connections.

                                              2. 2

                                                I adopted Hedy for an elementary school curriculum for one year, several years ago. I prefer Python syntax over JavaScript (code.org) and I like the ambition and concepts behind Hedy. When I tried it (several years ago) the implementation was lacking. We hit edge cases fairly often in a relatively small class size (negative). The implementers were gracious and welcoming to feedback (positive). But it was more a research project than a robust production teaching tool. I don’t know how it’s changed since.

                                                IIRC at the time the backend was implemented in typescript. A more robust backend in something like Rust would probably have helped with edge cases.

                                                At the time every program submitted was recorded, so we had to warn students to not put in any PII like their name or address into a program. Which was not great.

                                                I would like to see more experimentation with how to slowly frog boil syntax knowledge. I would also like to see code.org expand their curriculum beyond block and javascript based coding to other languages. It’s really an amazing thing they’ve built.

                                                1. 3

                                                  I would like to see more experimentation with how to slowly frog boil syntax knowledge.

                                                  The decades-long research program that created the HtDP curriculum may be of interest. There’s a related teaching language and community, Pyret, that looks more like Python but shares many concepts with the Racket-based HtDP languages.

                                                  1. 1

                                                    Thanks for the consideration. I clicked through. I think your expectations are off by an order of magnitude or two. When I start teaching kids they struggle with “what does the shift key do” and later “why do I need to put quote marks around both sides of my string” (not to mention “what is a string”).

                                                    Honestly, watching young 3rd grade minds smashed to bits by the minor amount of indirection provided by variables in a block based language is deeply humbling when I reflect on some of the complexity and abstraction I’m able to reason about relatively intuitively at this point.

                                                    Say you need to compute the sin of some angle

                                                    My students have never even heard of sin, much less want to compute something with it.

                                                    Hedy worked wonderfully in gradually introducing syntax, but it missed (quality) gamification and polish (the implementation was unreliable). The thing I most want to preserve is joy and the ability to create. Blocks give that to kids. Text syntax is a huge leap already.

                                                    The move has been to use straight Python rather than a dialect. An open question of mine is whether such frog-boil syntax rules helped in the long term or if throwing kids into the deep end would have been less confusing, i.e. not starting with bare words and then gradually introducing quoting. The hardest thing with this age group is to keep them slightly challenged so they are learning but not so much that they are stuck. Joy and creation.

                                                    1. 1

                                                      HtDP is a college curriculum! I think it’s reasonable for something like an AP high school course, but I wouldn’t try to teach third graders with it. Quite honestly, I wouldn’t try to teach kids “textual programming” until they’re already comfortable with a keyboard and with grammar and punctuation in their native language, as well as arithmetic. Seems like a recipe for frustration. What’s the rush?

                                                      I completely agree about joy and creation, though. I have a ten-year-old who’s taught himself quite a lot of programming basics or prerequisites just by creating custom items and command blocks in Minecraft. Sometimes he asks me for help, but mostly he’s learning by absorbing his environment, just like we all do.

                                                      1. 1

                                                        AP high school course, but I wouldn’t try to teach third graders with it.

                                                        Why did you recommend it in reply to a comment from an elementary school teacher?

                                                        Seems like a recipe for frustration. What’s the rush?

                                                        3rd is too young, but 5th is not. We want to teach them that there’s a bigger world out there, beyond blocks, before they get locked into a single paradigm of coding. Our curriculum also involves teaching typing.

                                                        1. 1

                                                          I didn’t think of your comment as coming from an elementary school teacher. I was thinking about pedagogical language design, and pointing to the prior art that I’m aware of. If you’re not building a language, just trying to use something that already exists, and specifically for elementary school, then HtDP is probably not that helpful, and I’m sorry about that!

                                                          1. 1

                                                            Thanks for the apology. And I genuinely appreciate the link, I just couldn’t connect the dots, which you just did.

                                                          2. 1

                                                            Let me try again… here’s a few-years-old lobsters story linking to a blog review of a much older book about how children relate to programming that I’ve personally found very useful in thinking about conceptual scaffolding: https://lobste.rs/s/r9thsc/mindstorms

                                                            For what it’s worth, if you’re using Python for teaching, you might check out the turtle graphics package in the standard library. “Batteries included!”

                                                        2. 1

                                                          Isn’t third grade a bit too young? I’d say picking up some programming is OK for 16-year-olds; any younger than that and they wouldn’t really pick up anything very useful, even as a foundation for the future.

                                                          1. 4

                                                            Isn’t third grade a bit too young?

                                                            I don’t think so. I experimentally taught some Scratch to a bunch of second-graders during my brief stint as a school informatics teacher, and they were pretty responsive. (I quit the job for unrelated reasons the next year.)

                                                            Some decade later, my own daughters have Scratch in their school curriculum, and my youngest one (she will be 10 this year) additionally attends children’s programming courses of her own accord, and as far as I can see, the children are extremely interested.

                                                            The goal, as far as I understand it, is not to prepare for a career in software development, but to introduce “constructing algorithms” as a tool of thought, as well as demystify computing a bit; like, “see, you can make things do this and that on your own, and it is all just ifs and loops and responding to input/events.”

                                                            1. 1

                                                              Isn’t third grade a bit too young?

                                                              Nope. They learn iteration (loops), variables, logic, and plenty more.

                                                      2. 8

                                                        Hi all! I finally decided to write the monad tutorial of my dreams, using Rust and property-based testing as the vehicle to understand them.

                                                        A lot of monad explanations focus on either the formalism or an IMHO excessive level of casualness. While those have their place, I think they tend to distract from an important and profound design pattern. My explanation has:

                                                        • no analogies to food items or anything else
                                                        • no Haskell
                                                        • no category theory

                                                        Instead, this post talks about:

                                                        • real-world problems
                                                        • production-grade Rust
                                                        • with performance numbers, including a very pretty log-log cdf!

                                                        There’s also a small Rust program to accompany the post.

                                                        1. 6

                                                          Hi, it’s commendable that you set out to bridge communities and languages with this post, but, as with other TCS/algebraic concepts, monads tend to get conflated with their implementation in this or that concrete language. However, you make some pretty strong claims in the post, and some downright inaccurate ones.

                                                          The thing we care about is composition of computations that might have effects. As such bind/flatMap are generalizations of the pure function composition “dot” operator.

                                                          This is the essence of monadic composition: powerful, unconstrained, and fundamentally unpredictable.

                                                          Not really. Lists and Optional computations can be composed with monadic bind, but would you say composition of programs that return these values is unpredictable?

                                                          Some of this misunderstanding is a historical accident; Haskell lets you talk about effect types that do not involve I/O, so it proved to be a good test bench for this concept.

                                                          monadic composition is Turing-complete

                                                          This is a property of the language, not of the composition operator.

                                                          1. 7

                                                            Thank you for the feedback! For context, I studied some category theory in graduate school and I can also talk about natural transformations and endofunctors, but I made a deliberate decision to avoid all of that. There are a large number of developers who would benefit from recognition of monadic patterns in their daily professional lives, but have been traumatized by overly technical monad tutorials – that’s the audience for my post. If you’re not in that audience, there are plenty of other monad tutorials for you :)

                                                            Even in the appendix, I talk about the monad laws in terms of their application to strategies with the concrete Just and prop_flat_map. That’s a deliberate pedagogical decision too. I enjoy working in abstract domains, I just think it really helps to be motivated by practical concerns before going deeper into them.

                                                            but among other TCS/algebraic concepts monads tend to get variously conflated with their implementation in this or that concrete language.

                                                            This is absolutely true. In my post I was trying to focus on why monadic bind is of deep interest and not just a Haskell/FP curiosity — why working programmers should at least notice it whenever it pops up, no matter what environment or language they are in. (Speaking personally, in one of my first projects at Oxide, very far away from grad school, I had the opportunity to add monadic bind to a system, but deliberately chose not to do so based on this understanding.)

                                                            I think of it as similar to group theory, where you can either introduce groups formally as a set with an attached operation that has certain properties, or motivate them through an “implementation” in terms of symmetries (maybe even through the specific example of solving the quintic). I personally prefer the latter because it gives me really good reasons for why I should care about them, and also helps explain things like why group theory is central to physics. In my experience teaching people, this preference is shared by most of them. There’s always time later to understand the most general concept.

                                                            Not really. Lists and Optional computations can be composed with monadic bind, but would you say composition of programs that return these values is unpredictable?

                                                            Absolutely, relative to fmap in each of the monads. I mention an example at the end with lists, where with bind you don’t know the size of the resulting list upfront – for Rust this has performance implications while collecting into a vector, since you can’t allocate the full capacity upfront (though that’s a tiny difference compared to the exponential behavior of test case shrinking). Even with optionals, fmap means a Some always stays a Some, while bind can turn a Some into a None. Relatively speaking that’s quite unpredictable.

                                                            The extent to which monadic bind’s unpredictability matters is specific to each monad (it matters less for simpler monads like optionals, and more for strategies, futures and build systems.) But in all cases it is unpredictable relative to functor (or applicative functor) composition within that monad.
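
                                                              To make that concrete, here’s a tiny Rust sketch (purely illustrative, not code from the post):

                                                                fn half(n: i32) -> Option<i32> {
                                                                    if n % 2 == 0 { Some(n / 2) } else { None }
                                                                }

                                                                fn main() {
                                                                    let x: Option<i32> = Some(3);

                                                                    // fmap/map: the shape is preserved; a Some can only ever map to a Some.
                                                                    let mapped = x.map(|n| n + 1); // Some(4)

                                                                    // bind/and_then: the closure gets to decide, so a Some can become a None.
                                                                    let bound = x.and_then(half); // None, because 3 is odd
                                                                    println!("{mapped:?} {bound:?}");
                                                                }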

                                                            This is a property of the language, not of the composition operator.

                                                            This is true. What I was trying to say here is that in a typical Turing-complete environment, the Turing completeness means that introspecting the lambda is impossible in general. (And this is true in non-Turing-complete environments too – for example, in primitive recursive/bounded loop environments it’s still quite unpredictable.) I’ll try and reword this tomorrow.

                                                            1. 5

                                                              I appreciate the thoughtful approach. I learn much better starting with the specific and moving to the general than starting with the abstract.

                                                                I wrapped up Georgia Tech’s OMSCS and one of my favorite classes was Knowledge-Based AI, which focused on how humans learn things. (The AI in question is classical, not LLMs.) One core takeaway for me was the general/specific learning dichotomy.

                                                              A really interesting learning approach was called “version spaces” which acted like bidirectional search by building general and specific info at the same time. Basically, people need both models to fully absorb a concept, but how individuals learn best varies.

                                                              All that to say: thanks again, I think it takes a lot of work and effort to make something approachable and I appreciate your post.

                                                              1. 4

                                                                You may enjoy this post: https://terrytao.wordpress.com/career-advice/theres-more-to-mathematics-than-rigour-and-proofs/

                                                                  This three-stage learning process is very common, in my experience, and I feel like general/specific is similar to intuitive/rigorous.

                                                              2. 2

                                                                There are a large number of developers who would benefit from recognition of monadic patterns in their daily professional lives, but have been traumatized by overly technical monad tutorials

                                                                When intersected with “Rust developer”, do you think that’s still a large group…? If someone finds monad tutorials to be “overly technical” then they’re never going to make it past JavaScript, much less discover systems programming languages like Rust.

                                                                I’m one of the fools who once upon a time tried to use Haskell for systems programming, and of all of the Rust/C/C++ developers I’ve met, their primary difficulty with monads is that all the documentation was non-technical (i.e. prose) and did the normal academic math thing of using five different words to identify the same concept in different contexts.

                                                                This article is a new way of writing a confusing explanation of monads, which is that it starts off by diving deep into an obscure testing strategy that’s pretty much only used by people really into functional programming, and then slowly works its way back into the shallow waters of monads, then dives right into a discussion on how function composition is Turing-complete[0] and how exemplar reduction in property-based testing can have unpredictable runtime performance. You’ve got, like, three different articles there!

                                                                If you want someone who can’t spell “GHC” to understand monads, there’s only three types you need:

                                                                • Lists (Rust’s Vec, C++ std::vector) with flat_map
                                                                • Maybe (Option, std::optional or local equivalent) with and_then
                                                                • Either (Result, std::expected or local equivalent) with and_then

                                                                Don’t go off into the weeds about property testing, only teach one alien concept at a time. Those three types have super-simple implementations of (>>=), they’re universally familiar to everyone who’s not been frozen in Arctic sea ice since 1995, and it’s easy to go from ? to do-notation if you want to demystify the smug “monads are programmable semicolons” in-joke while you’re at it.
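
                                                                  For concreteness, a minimal std-only Rust sketch of those three (illustrative, not from the article):

                                                                    use std::num::ParseIntError;

                                                                    fn main() {
                                                                        // Option / and_then: the closure can drop the value entirely.
                                                                        let port: Option<u16> = Some(8080);
                                                                        let addr: Option<String> =
                                                                            port.and_then(|p| if p > 1024 { Some(format!("0.0.0.0:{p}")) } else { None });

                                                                        // Result / and_then: the error case is threaded through, much like chained ? expressions.
                                                                        let sum: Result<i32, ParseIntError> =
                                                                            "42".parse::<i32>().and_then(|a| "7".parse::<i32>().map(|b| a + b));

                                                                        // Vec (via iterators) / flat_map: the output length isn't knowable upfront.
                                                                        let tokens: Vec<&str> = ["a b", "c"].iter().flat_map(|s| s.split(' ')).collect();

                                                                        println!("{addr:?} {sum:?} {tokens:?}");
                                                                    }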

                                                                Then, once your reader is comfortable with the concept of nested inline functions as a control flow primitive, you can link them to your follow-up article about the performance implications of monadic combinators or whatever.

                                                                [0] The article acts as if this is a surprising and meaningful revelation, so I might be misunderstanding what’s actually being discussed, but when you say “monadic composition is Turing-complete” you mean something like “bind :: (a -> T b) -> T a -> T b is Turing-complete”, yes? I feel like among people who know what “Turing-complete” means, the knowledge of a Turing machine’s equivalence to function composition is well-known.

                                                                1. 7

                                                                  When intersected with “Rust developer”, do you think that’s still a large group…? If someone finds monad tutorials to be “overly technical” then they’re never going to make it past JavaScript, much less discover systems programming languages like Rust.

                                                                  Yes, it’s a large group. The number of Rust people that come from Haskell or other FP languages is tiny.

                                                                  This article is a new way of writing a confusing explanation of monads, which is that it starts off by diving deep into an obscure testing strategy that’s pretty much only used by people really into functional programming

                                                                  Property-based testing is quite widely used by Rust projects! It’s not the most common way to test software, but it’s far from obscure. It’s also really effective at finding bugs in systems code. I haven’t done any professional work in FP languages, but I’ve used PBT extensively.

                                                                    I even picked a very systems-y example, writing a production-grade sort function that’s resilient to comparator misbehavior. This is the kind of thing Rust developers enjoy.
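
                                                                    For anyone who hasn’t seen it, here’s roughly what that kind of dependent generation looks like with the proptest crate (a simplified sketch, not code from the post):

                                                                      use proptest::prelude::*;

                                                                      proptest! {
                                                                          // Generate a non-empty Vec first, *then* an index that depends on its
                                                                          // length -- the second strategy is built from a value produced by the
                                                                          // first, which is exactly the bind/flat_map shape.
                                                                          #[test]
                                                                          fn index_stays_in_bounds(
                                                                              (v, i) in prop::collection::vec(any::<u8>(), 1..100)
                                                                                  .prop_flat_map(|v| {
                                                                                      let len = v.len();
                                                                                      (Just(v), 0..len)
                                                                                  })
                                                                          ) {
                                                                              prop_assert!(i < v.len());
                                                                          }
                                                                      }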

                                                                  and then slowly works its way back into the shallow waters of monads, then dives right into a discussion on how function composition is Turing-complete[0] and how exemplar reduction in property-based testing can have unpredictable runtime performance. You’ve got, like, three different articles there!

                                                                  Yes, deliberately so. This kind of progressive complexity enhancement, with a tutorial for A also being a tutorial for B in disguise, is the style I teach in. It’s one I’ve practiced and refined over a decade. It doesn’t work for everyone (what does?) but it reaches a lot of people who bounce off other explanations.

                                                                  If you want someone who can’t spell “GHC” to understand monads, there’s only three types you need

                                                                  With respect, I quite emphatically disagree. I personally believe this approach is a colossal mistake, and I’m far from the only one to believe this (I’ve had a number of offline conversations about this in the past, and after I published this post a professor reached out to me privately about this as well.) I deliberately picked PBT as one of the simplest examples of monadic bind being weird and fascinating.

                                                                  1. 4

                                                                    With respect, I quite emphatically disagree. I personally believe this approach is a colossal mistake, and I’m far from the only one to believe this (I’ve had a number of offline conversations about this in the past, and after I published this post a professor reached out to me privately about this as well.) I deliberately picked PBT as one of the simplest examples of monadic bind being weird and fascinating.

                                                                      The problem with only using List, Option and (perhaps to a lesser extent) Either as examples is that they’re “containers” (List is commonly understood; Option can be understood as “a list with length < 2”; Either can be understood as Option whose “empty” case provides a “reason” (e.g. an error message)). Containers come with all sorts of intuitions that make interfaces like Monad less appealing. For example, what’s the point of y = x.flat_map(f) compared to ordinary, everyday, first-order code like for (element : x) { y += f(element); }?[0]
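
                                                                      To make the contrast concrete, a rough Rust rendering (illustrative only):

                                                                        fn f(n: u32) -> Vec<u32> {
                                                                            (0..n).collect()
                                                                        }

                                                                        fn main() {
                                                                            let x = vec![1u32, 2, 3];

                                                                            // The "monadic" spelling: flat_map is List's bind.
                                                                            let y1: Vec<u32> = x.iter().copied().flat_map(f).collect();

                                                                            // The first-order spelling: a plain loop plus extend does the same job,
                                                                            // which is why containers alone don't make the interface feel necessary.
                                                                            let mut y2 = Vec::new();
                                                                            for element in &x {
                                                                                y2.extend(f(*element));
                                                                            }

                                                                            assert_eq!(y1, y2);
                                                                        }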

                                                                    List (and Option) are definitely good anchors for our understanding, e.g. as sanity checks when reading a generic type signature or a commutative diagram; but we also need examples that aren’t “containers”, to show why these interfaces aren’t just some weird alternative to “normal” looping. A set of examples which aren’t “containers”, but may still be familiar, are things which “generate” values; e.g. parsers, random generators, IO, etc.[1]. Those are situations where our intuition probably isn’t “just loop”[2], so other interfaces can get a chance[3]. The fact that Monad & friends apply to both “containers” and “generators” is then a nice justification for their existence. Once we’re happy that this is a reasonable interface, we can go further into the weeds by deriving some examples purely from the interface, to get a better feel for what it does/doesn’t say, in general (e.g. a “container” that’s always empty, i.e. a parameterised unit type; or a Delay type, which captures general recursion; etc.).

                                                                      [0] Indeed, Scala’s for is syntactic sugar for monadic operations, similar to Haskell’s do. Although that does require extra concepts like yield and <-, which don’t appear in e.g. a Java for-loop; and may require understanding of monad-like-things (I can’t say for sure, since I first tried Scala after many years of Haskell programming!).

                                                                    [1] Parsers and random generators are actually very closely related, e.g. we can think of a random generator as a parser that’s been given a random (but valid) input. Parser combinators are the most obvious way to understand what it means for parsers to be monads: they used to be quite academic, but I think are now common enough to be a motivating example for “why care”, even for those who tend to use other approaches. Random generators like those in QuickCheck (i.e. built using composition) seem much less common than e.g. only generating random ints, and operating on those directly; which makes them a less motivating example up-front. However, passing around PRNGs may be a good motivation for monadic state, and combining this with monadic parsing to get QuickCheck-style generators might be a nice climax :)

                                                                    [2] We can do equivalent things with loops if we’re comfortable with yield; but (a) that’s a smaller subset of those that are comfortable with for, and (b) that rabbit hole leads more towards delimited continuations, algebraic effects, etc. which are alternatives to monads that are just as fascinating to think about :)

                                                                    [3] I think the intuition for “generators” motivates the idea of composition. For “containers”, composition is merely an optimisation: we can just do multiple passes instead. Whereas for “generators”, it feels like we “don’t have the values yet”, so we need some way to plug them together before they’re “run”. IO is not a good motivator here, since imperative languages automatically compose statements for us. It seems better to come back to it after grokking parsers, random generators, etc.; even then it might be better to first describe an abstraction like a “task queue”, and only later introduce IO as a sort of “task queue for the whole language”.

                                                                    1. 4

                                                                      Thanks for the thoughtful reply.

                                                                      List (and Option) are definitely good anchors for our understanding, e.g. as sanity checks when reading a generic type signature or a commutative diagram; but we also need examples that aren’t “containers”, to show why these interfaces aren’t just some weird alternative to “normal” looping. A set of examples which aren’t “containers”, but may still be familiar, are things which “generate” values; e.g. parsers, random generators, IO, etc.

                                                                      I really love this way of looking at it, as well as you pointing out later that IO is not a good monad to introduce to people either because it is implicit in imperative code.

                                                                      For me, there were two things that made monads really click:

                                                                      • Build Systems a la Carte, which draws a distinction between monadic and applicative build systems – this one made me first realize that monads are a general system property much more than a Haskell curiosity
                                                                      • The PBT example in the post, where generation isn’t hugely affected by monadic composition, but shrinking is
                                                                  2. 3

                                                                    If you want someone who can’t spell “GHC” to understand monads, there’s only three types you need:

                                                                    • Lists
                                                                    • Maybe
                                                                    • Either

                                                                    I dunno if it’s because I learned monads when they had only recently been discovered as the solution to purely functional IO, but I expect that many programmers who have vaguely heard about monads know that they are how you do IO in Haskell. So I think a practical monad tutorial should try to bridge the gap between monads as pure algebra (lists, maybe, either) and how they are useful for impure programming.

                                                                    (I have noticed in several cases that many programmers find it hard to get to grips with very abstract ideas, if the ideas don’t come with concrete practical examples of how the abstraction is used in practice and what benefits the abstraction provides. This is especially true the less complicated the abstraction is, which is why monads are so troublesome.)

                                                                    The problem with list/either/maybe is that they are all parametric types, so all the monad is able to do is faff around with the layout of the values. It’s hard for a tutorial to illustrate what benefit you get from an explicit monad as opposed to less polymorphic list/either/maybe combinators.

                                                                    So I think a monad tutorial should show an example of something more imperative such as the state monad. That allows you to show monads in use with functions that do practical things with the container type, and how the monad sequences those functions. (Perhaps also emphasise Either as the exception monad.) It’s then only a small step to monadic IO.

                                                                    1. 3

                                                                      ISTM that now that many languages have features like promises, there’s more relevant common knowledge among imperative programmers than there used to be. This might be an easier on-ramp than what the initial discoverers had to struggle through. A Promise<Promise<a>> can be flattened into a Promise<a>, and you can write a version of bind. Thinking of bind as “map + join” also helps avoid the “but I don’t have an a so how can I run my a -> m b function?” that I struggled with when understanding monads as they applied to things other than concrete data structures.
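
                                                                        A minimal sketch of that “map + join” reading, with Rust’s Option standing in for Promise (since std has both map and flatten):

                                                                          fn lookup_name(id: u32) -> Option<String> {
                                                                              if id == 1 { Some("admin".to_string()) } else { None }
                                                                          }

                                                                          fn main() {
                                                                              let id: Option<u32> = Some(1);

                                                                              // map alone produces a nested value...
                                                                              let nested: Option<Option<String>> = id.map(lookup_name);
                                                                              // ...and flatten (join) collapses it; together they are exactly and_then (bind).
                                                                              let joined: Option<String> = nested.flatten();

                                                                              assert_eq!(joined, id.and_then(lookup_name));
                                                                          }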

                                                                    2. 2

                                                                        Dealing with your footnote: even as someone fairly familiar with function composition, I wouldn’t immediately notice that “bind :: (a -> T b) -> T a -> T b” qualifies as function composition while “fmap :: (a -> b) -> T a -> T b” does not. Sure, if I sat down and wrote it out, it would become clear quickly, but leaving this as an exercise for the reader is just poor pedagogy.

                                                                      1. 1

                                                                        Would it be clearer if you considered composeM :: (a -> T b) -> (b -> T c) -> (a -> T c)? Because you can write it in terms of bind and vice-versa, provided you also have pure. (Final parens are redundant but added for clarity.)
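
                                                                          A rough Rust rendering of that, specialised to Option since Rust has no higher-kinded types (the names are made up):

                                                                            // Kleisli composition (composeM) for Option, written in terms of and_then (bind).
                                                                            fn compose_m<A, B, C>(
                                                                                f: impl Fn(A) -> Option<B>,
                                                                                g: impl Fn(B) -> Option<C>,
                                                                            ) -> impl Fn(A) -> Option<C> {
                                                                                move |a| f(a).and_then(&g)
                                                                            }

                                                                            fn main() {
                                                                                let halve = |n: i32| if n % 2 == 0 { Some(n / 2) } else { None };
                                                                                let recip = |n: i32| if n != 0 { Some(1.0 / n as f64) } else { None };

                                                                                let halve_then_recip = compose_m(halve, recip);
                                                                                assert_eq!(halve_then_recip(4), Some(0.5));
                                                                                assert_eq!(halve_then_recip(3), None); // halve bails out
                                                                                assert_eq!(halve_then_recip(0), None); // recip bails out
                                                                            }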

                                                                    3. 1

                                                                      Relatively speaking that’s quite unpredictable.

                                                                      Yep, thank you for clarifying. But I still think that not preserving the number of elements of a container is not the same thing as being unpredictable. For example, there are things for which you can define monadic bind (e.g. functions, https://hackage.haskell.org/package/base-4.12.0.0/docs/src/GHC.Base.html#line-828 ), for which binding means piling applications on top of each other.

                                                                  3. 2

                                                                    Do you think it’s perverse that, when I first read a Rust tutorial, I was perplexed about not putting semicolons at the end of a block until I decided that semicolons are just monadic bind? (I don’t think I got around to writing any Rust.)

                                                                    1. 3

                                                                      It is true that semicolons are a monadic bind, but I also think that lens confuses more than it enlightens :)

                                                                      1. 3

                                                                        It’s the sunglasses from They Live but they show you the monads underlying all computation

                                                                  4. 28

                                                                    I’m mixed on interviews like fizzbuzz. I don’t perform well on them despite going on two decades of experience, being a ruby core contributor, and open source maintainer with billions of downloads: it’s just not an interview environment I thrive in. To the point that avoiding having to do a technical interview is a non-trivial reason I’m still at my current job.

                                                                    I’ve also worked with developers who seemed to be unable to write code and wished we had done some more low-pass coding filtering. So I understand where interviewers are coming from.

                                                                    The last intern I hired worked out incredibly well and I got to pick the technical interview tactic. I chose “show me some code you’ve written (that isn’t proprietary and you have permission to share/show) and walk me through some part of it.” I also gave them the option to choose a whiteboard-style question to pair on if they didn’t have something they were able or willing to show. It worked very well. The candidate we chose showed me a PR of theirs and was able to explain what it did, why it was done that way, and answer some questions about the general codebase. The code showed they could generate code; the interview part showed their communication skills. You could also get a hint of their values: were they a fast prototyper with single-letter variable names, or did they write tests and refactor? Did they stop when it worked or did they go back and clean up the implementation? No wrong answers; every org needs a mix of people with skills and abilities.

                                                                    I’ve also sat through amazing interviews from people I wish we could have hired, but we chose another candidate due to reasons outside of that candidate’s control (such as another candidate having a unique skill or experience fit). A lot of not getting a job has more to do with the company’s needs and candidate pool than with individual performance (assuming it’s above some baseline). Anywhoo, this is an interesting post and has some neat solutions. I hope to never need them IRL.

                                                                    1. 10

                                                                      IMHO one of the best interview tactics regarding skills was the one I had at my current job’s interview. It was an in-person interview, and we didn’t have computers at hand, except for the HR person writing notes.

                                                                      To test my coding skills, they just handed me two sheets of paper with printed code (a C++ header and implementation) and asked me to do a code review.

                                                                      It was super pressure-free, as you get to apply knowledge but don’t have to come up with solutions under pressure.

                                                                      From an interviewer’s standpoint it’s also valuable, as you get to test how the applicant communicates.

                                                                      1. 5

                                                                        do a code review

                                                                        Which, frankly, is much harder than writing code.

                                                                        1. 5

                                                                          Yeah, I’ve had this before and I’ve found it very difficult if you’re just given a piece of code. I mean, I can look for things that look surprising or incorrect, but if I can’t see the broader context for the change - the ticket and discussion that triggered it, the rest of the changes being made, the developer(s) involved, the code that was there beforehand, the libraries and internal APIs that are being used and available, etc etc - then it’s very difficult to know what to focus on.

                                                                          I used to think this was a good way of doing interviews, but since experiencing it as an interviewee, I’ve gone off it a lot, to the point that it’s now a bit of an amber flag for me.

                                                                          1. 2

                                                                            It depends on the context. Here, it was an isolated piece of code, fully visible.

                                                                            It wasn’t the worst code either, but it had flaws an experienced programmer could quickly recognize.

                                                                            But yes, in general I agree that reviewing code is harder, but also less stressful under pressure.

                                                                      2. 7

                                                                        This is a very interesting article. I was originally taken aback by the initial “Not IO bound” comment, but pointing out that our current understanding of IO is actually conflated with the OS scheduling of threads was very on point. I hadn’t considered that before. I think my reaction still stands though, but in a pedantic way. Looking at:

                                                                        YJIT only speeds up Ruby by 2 or 3x

                                                                        and

                                                                        Like Discourse seeing a 15.8-19.6% speedup with JIT 3.2, Lobsters seeing a 26% speedup, Basecamp and Hey seeing a 26% speedup or Shopify’s Storefront Renderer app seeing a 17% speedup.

                                                                        I still feel that if a component sees a 2-3x perf increase and that translates to a 1.15x-1.27x overall improvement, then it’s a significant component (and well worth optimizing), but it isn’t the dominant/limiting factor.

                                                                        Towards the end of the article Jean gets into some specific numbers regarding “truly IO bound” being 95% and “kinda” being 50%. I asked him on Mastodon about them. https://ruby.social/@byroot/113877928374636091. I guess in my head “more than 50%” would be what I would classify as “IO bound.” Though I’ve never put a number to it before.

                                                                        Someone tagged an old thread of mine in a private Slack recently, where I linked to this resource https://www.youtube.com/watch?app=desktop&v=r-TLSBdHe1A with this comment:

                                                                        Samuel shared this in Ruby core chat, and (spoiler) that’s actually one trick for debugging performance. They want to answer the question “Is this code worth optimizing” i.e. “if we made this code 2x faster…would anyone care.” Because if you make something 100x faster that only accounts for 1% of your total time, people aren’t really going to notice.

                                                                        So they can’t arbitrarily make code faster, but they CAN make code arbitrarily slower. So the program simulates a speedup of one code section by making all the other code slower, to report whether it’s worth optimizing or not. An all-around interesting talk. It’s very approachable as well.

                                                                        It would be interesting to have some kind of an IO backend where you could simulate a slowdown. I.e. perform the query on the database and time it, then sleep for some multiplier of that time before returning. It would (in theory) let you put a number to how much your app is affected by (database) IO. If you set a 2x multiplier and you see requests take 2x as long…then you’re approaching 100% IO bound.
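
                                                                        Mechanically it could be as simple as wrapping the query call, something like this toy sketch (language aside; the real thing would live in the app’s database adapter):

                                                                          use std::thread;
                                                                          use std::time::{Duration, Instant};

                                                                          // Run a "query", measure it, then sleep for (multiplier - 1) times its duration,
                                                                          // so the total time is roughly multiplier x the real IO time.
                                                                          fn with_simulated_io_slowdown<T>(multiplier: f64, query: impl FnOnce() -> T) -> T {
                                                                              let start = Instant::now();
                                                                              let result = query();
                                                                              let elapsed = start.elapsed();
                                                                              thread::sleep(elapsed.mul_f64((multiplier - 1.0).max(0.0)));
                                                                              result
                                                                          }

                                                                          fn main() {
                                                                              // Stand-in for a real database call.
                                                                              let rows = with_simulated_io_slowdown(2.0, || {
                                                                                  thread::sleep(Duration::from_millis(50));
                                                                                  vec![1, 2, 3]
                                                                              });
                                                                              println!("{rows:?}");
                                                                          }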

                                                                        The linked GVL timing gem is new and interesting. Overall, thanks for writing all this down. Great stuff.

                                                                        1. 4

                                                                          I guess in my head “more than 50%” would be what I would classify as “IO bound.”

                                                                          Perhaps I should have written about that in my conclusion, but ultimately “IO-bound” isn’t a perfect term for everything I want to say.

                                                                          In a way it’s a good term, because the implication of an IO-bound app is that the only way to improve its performance is to somehow parallelize the IOs it does, or to speed up the underlying system it does IOs with.

                                                                          With that strict definition, I think YJIT proved that it isn’t the case, given it was able to substantially speed up applications.

                                                                          A more relaxed way I tend to use the IO-bound definition, in the context of Ruby applications, is whether or not you can substantially increase your application throughput without degrading its latency by using a concurrent server (typically Puma, but also Falcon, etc).

                                                                          That’s where the 50% mark is important. 50% IO implies 50% CPU, and one Ruby process can only accommodate “100%” CPU usage. And given threads won’t perfectly line up when they need the CPU, you need substantially more than 50% IO if you wish to process concurrent requests in threads without impacting latency because of GVL contention.

                                                                          So beyond trying to say whether apps are IO-bound or not, I mostly want to explain under which conditions it makes sense to use threads or fibers, and how many.

                                                                          1. 3

                                                                            50% IO implies 50% CPU, and one Ruby process can only accommodate “100%” CPU usage. And given threads won’t perfectly line up when they need the CPU, you need substantially more than 50% IO if you wish to process concurrent requests in threads without impacting latency because of GVL contention.

                                                                            Are you comparing multiple single threaded processes to a single multithreaded process here? Otherwise, I don’t understand your logic.

                                                                            If a request takes 500msec of cpu time and 500msec of “genuine” io time, then 1 request per second is 100% utilization for a single threaded server, and queue lengths will grow arbitrarily. With two threads, the CPU is only at 50% utilization, and queue lengths should stay low. You’re correct that there will be some loss due to requests overlapping, and competing for CPU time, but it’ll be dominated by the much lower overall utilization.

                                                                            In the above paragraph, genuine means “actually waiting on the network”, to exclude time spent on CPU handling the networking stack/deserializing data.

                                                                            P.S. Not expressing an opinion on “IO bound”; it’s not one of my favorite terms.

                                                                            1. 2

                                                                              Are you comparing multiple single threaded processes to a single multithreaded process here?

                                                                              Yes. Usually when you go with single-threaded servers like Unicorn (or even just Puma configured with a single thread per process), you still account for some IO wait time by spawning a few more processes than you have CPU cores. Often it’s 1.3 or 1.5 times as many.

                                                                              1. 2

                                                                                I don’t think there’s anything special about 50% CPU. The more CPU time the worse, but I don’t think anything changes significantly at that point; I think it’s going to be a relatively linear relationship between 40% and 60%.

                                                                                You’re going to experience some slowdown (relative to multiple single-threaded processes), as long as the arrival rate is high enough that there are ever overlapping requests to the multithreaded process. Even if CPU is only 1/4 of the time, your expectation is a 25% slowdown for two requests that arrive simultaneously.

                                                                                I think, but am not sure, that the expected slowdown is “percent cpu time * expected queue length (for the cpu)”. If queue length is zero, then no slowdown.

                                                                          2. 1

                                                                            You can achieve some of that with a combination of cgroups limiting the database performance and https://github.com/Shopify/toxiproxy for extra latency/slowdown on TCP connections.

                                                                          3. 2

                                                                              You might be interested in https://github.com/zombocom/rundoc. It acts as a test for your tutorial and guarantees the output in your doc is the same as what the user will see. For an example of a tutorial built with it, see https://github.com/heroku/buildpacks/blob/main/docs/ruby/README.md and https://devcenter.heroku.com/articles/getting-started-with-rails8.

                                                                            1. 1

                                                                              Thanks, I’ll check it out!

                                                                              I remember Bob Nystrom describing the custom build system he made for ensuring all the code in Crafting Interpreters worked, and it sounded like a great idea but a pain to roll your own.

                                                                              1. 2

                                                                                LMK if you have questions. Source code is visible for both of those on github. Also for Heroku/Buildpacks we have the docs built in GHA which sends a PR to the repo every week https://github.com/heroku/buildpacks/blob/main/.github/workflows/rundoc-pr.yml which I think is useful/neat.

                                                                              1. 31

                                                                                Strong disagree by someone who lived in a world before lockfiles.

                                                                                Are there any implementations that would meet the author’s suggested criteria? The author doesn’t list any.

                                                                                1. 6

                                                                                  The (implicit?) proposal is not a regression to the world before lockfiles (which I also remember with zero fondness), but a world after lockfiles. I really empathise with this part:

                                                                                  Lock files are a crutch for non-deterministic processes; they make downstream steps reproducible, at the cost of an extra dependency (the contents of the lock file). At best they are unverifiable, unreproducible build artifacts with no corresponding source; at worst they are plausibly-deniable attack vectors. In this sense, they embody all the same anti-patterns, foot-guns and time-bombs as other toxic practices like e.g. Docker images.

                                                                                  Just yesterday I was trying to manoeuvre a combination of a Gemfile.lock and gemset.nix (via bundix) for my partner’s Ruby project into alignment. I already had the exact same setup done on a project of mine — there’s an annoying bug wherein bundix doesn’t understand Bundler platforms very well, and on my project I had something going well which forced Ruby/non-native gems — but I simply couldn’t recall or find my way to the exact sequence of commands necessary to produce the correct lockfile. In the end I had to just hack up her Gemfile.lock with a text editor, but her resolved versions differed from mine and so there was even more grief.

                                                                                  And Gemfile.lock is one of the easier kinds of lockfile to deal with as a human!

                                                                                  1. 2

                                                                                      Working with Ruby-related software on Nix is a pain, but a counter-example could be the Rust-Nix ecosystem mix; here you simply specify your Cargo.lock file and that is sufficient for Nix to build the package deterministically. The cargo2nix tool works fairly reliably. Both will build projects just fine.

                                                                                      The author of the post is proposing to do away with lock files by storing a content hash of dependencies… which is just a lock file with a new name. The only difference from what e.g. Rust currently does is that they want Cargo.lock with Minimal Version Selection from Go.

                                                                                    1. 1

                                                                                      The author of the post is proposing to do away with lock files by proposing storing a content hash of dependencies… which is just a lock file with a new name.

                                                                                      Lock files are unnecessary, since all that information is already stored. For example, Cargo doesn’t need a lock file since all that information is already available in the crates.io index; we just need to say what revision we’re using (current HEAD is https://github.com/rust-lang/crates.io-index/commit/10e6ccc602255b181d5187b9926dc0abeaebfd9e )

                                                                                      They only difference to what f.e. Rust currently does it that they want Cargo.lock with Minimal Version Selection from Go.

                                                                                      Nope, no need for Cargo.lock, and version selection is mostly orthogonal (so long as it gives the same output for the same input). I’ve never used Go, so can’t really comment on how well its version selection works.

                                                                                      1. 6

                                                                                          I think your article would have been easier to understand if you had been more explicit about how the (package name, version) -> hash lookup occurs and how it is made deterministic and reproducible. This one example gets your point across far more effectively.

                                                                                        But it still isn’t clear to me why you think it’s better to lock the package index rather than the individual dependencies. What if I want to get the dependencies from somewhere other than a centralized repository?

                                                                                        1. 3

                                                                                          What if I want to get the dependencies from somewhere other than a centralized repository?

                                                                                          I’m not tied to any particular implementation; indeed, I’m not even advocating the use of centralised repositories, that’s just the default behaviour of many existing package managers.

                                                                                          My guiding principle is that it’s better for repos to contain “source code”, which is roughly “instructions written by a person”. Lock files are not source code, since they’re auto-generated, not the sort of thing a person would write, and (crucially) were not chosen or decided-on by a human. Their contents were calculated automatically, so why keep them in git? The only reason to do so is that the calculation was not reproducible; if we fix that, then we don’t need them.

                                                                                            On the other hand, something like <the crates.io-index revision above>, or <hackage.haskell.org at 2024-01-01T00:00:00Z>, or <revision ABC of git://foo> are the sorts of thing a programmer might decide to write down.

                                                                                          Hence, the answer to your question would be: if you choose to use some specific thing, then write that down that choice, in a way that’s precise-enough to be reproduced elsewhere. That would probably look like a lock file, but the crucial difference is that it’s “source code”, in the sense that you wrote it to capture a specific decision you made.

                                                                                        2. 3

                                                                                          Storing the revision of the cargo registry is just a lock file, though it would solve the version selection issue at the cost of being unable to deterministically upgrade a dependency version without possibly dragging along other versions.

                                                                                          1. 1

                                                                                            being unable to deterministically upgrade a dependency version without possibly dragging along other versions.

                                                                                            There’s no reason a package manager has to stick with one registry, or one version of one registry. It’s also useful to allow manually-specified versions/hashes (which look like a lock file, but are hand-written instructions; not an auto-generated cache of everything). Overrides, jail-breaking, etc. can be useful too. There’s a large design space for that; but none of that needs lock files.

                                                                                            1. 3

                                                                                              Having to maintain that, especially if I use tools to automatically create PRs for security updates, starts sounding like the file will end up being automatically generated anyway because few people will bother to go into crates.io and copy-paste the latest revision from there.

                                                                                                Cargo would support the entire workflow if you really wanted, using a local repository mirror. It’d just be really painful for near-zero gain over having an autogenerated lock file.

                                                                                                The points mentioned are great and would be achievable with lock files; I see no reason to abandon them.

                                                                                          2. 2

                                                                                            For example, Cargo doesn’t need a lock file since all that information is already available in the crates.io index; we just need to say what revision we’re using (current HEAD is https://github.com/rust-lang/crates.io-index/commit/10e6ccc602255b181d5187b9926dc0abeaebfd9e )

                                                                                            Not quite, no. cargo update --precise exists.

                                                                                            1. 1

                                                                                              I’m not super familiar with Cargo, but from reading the manual it looks like --precise is just a way to explicitly specify a subset of choices? Writing down those choices somewhere in our git repo makes perfect sense (either by adding cargo update --precise to our build scripts, or doing something fancier).

                                                                                              Running that command manually then adding its result to git doesn’t make sense to me; the same way that it doesn’t make sense to put .o files in git, when we could instead write down the GCC command that creates them.

                                                                                              1. 2

                                                                                                I think it makes sense to me ¯\_(ツ)_/¯

                                                                                      2. 5

                                                                                        Golang uses minimal version selection, which I have mixed feelings about in general but does meet the criteria as far as I know.

                                                                                        But simply checking in the lock file also adds determinism.

                                                                                        1. 14

                                                                                          I think even with minimal version selection it isn’t resilient to an upstream silently replacing the code in a version in place (without a new version number). Go has its .sum file with hashes for (I assume) this reason.

                                                                                          1. 1

                                                                                            Right, so you still need that. But I think that is different in kind since it is purely derived, unlike Cargo.lock which is both derived and contains information not found anywhere else.

                                                                                        2. 2

                                                                                          Are there any implementations that would meet the author’s suggested criteria? The author doesn’t list any.

                                                                                          Maven is very close to this: in the absence of version ranges it will do the right thing and be 100% reproducible without any need for lock files. Version ranges fuck this up but luckily they are very rare in the Maven world. Leiningen uses Maven dependency resolution but it watches for ranges and defaults to aborting when they are detected.

                                                                                          Strong disagree by someone who lived in a world before lockfiles

                                                                                          You’re talking about pre-bundler Ruby, right? The problem with pre-bundler Ruby was not that it didn’t have lockfiles, the problem was that basically every gem in existence used version ranges, which are inherently nondeterministic. In that situation you can’t solve the root problem (bad dependency specifications in a large library ecosystem) so the best you can do is work around it by layering lockfiles on top of it.

                                                                                          But using lockfiles in a brand new ecosystem is an unforced error. At that stage you can simply omit support for version ranges and get the determinism you want the right way.

                                                                                          1. 3

                                                                                            Maven is very close to this: in the absence of version ranges it will do the right thing and be 100% reproducible without any need for lock files.

                                                                                            I consider version numbers to be documentation. They shouldn’t be relied upon for anything security-critical, like which files to fetch and run off the Internet. Besides which, I like semver and find dependency solvers very useful.

                                                                                            Maven itself doesn’t actually have much security: everything is addressed using a name and version, but those are both arbitrary strings that have no relation to the package contents. Maven repos keep content hashes alongside artifacts (e.g. .jar and .pom files), so anyone compromising the artifacts can also compromise the hashes.

                                                                                            The good thing about Maven is that it’s easy enough to run in an offline sandbox, using a local directory for package metadata. That, in turn, makes it easily reproducible. Unfortunately, I’ve not found an authoritative, reproducible source of Maven metadata (akin to all-cabal-hashes, or crates.io-index). Hence my Maven uses tend to generate something like a lockfile (but cached in /nix/store, not version-controlled, and with only its fixed-output hash kept in git) which relies on the repository-provided hashes via trust-on-first-use. Still, that setup was able to spot some repository corruption e.g. https://discuss.gradle.org/t/plugins-gradle-org-serving-incorrect-pom/44161

                                                                                            1. 2

                                                                                              Yes, I should have said “in the absence of version ranges and given trustworthy repositories” because it doesn’t offer much protection against a compromised repository. Definitely the Correct Solution is a content-addressed system, but Maven comes the closest of the “first generation” package systems to doing the right thing.

                                                                                            2. 1

                                                                                              But using lockfiles in a brand new ecosystem is an unforced error. At that stage you can simply omit support for version ranges and get the determinism you want the right way.

                                                                                              Golang, I believe, does this by having you specify a version. When you pull, your dependency list has the version, and then you have a checksum file (the .sum file) that has the checksums of every single version you’ve downloaded.

                                                                                              So I’m curious what you would improve about the Golang situation to not have the two files that essentially are lock files.

                                                                                              1. 2

                                                                                                I’ve never used golang so I could be wrong here, but my understanding is that the checksums it stores do not affect the dependency resolution algorithm; they are used after dependency resolution has occurred as a way to detect mismatches that could be caused by attackers or mistakes.

Superficially they look similar to lockfiles, but if your main dependency declaration always resolves to exactly what you asked for, then they don’t actually work the same way at all.

                                                                                                I don’t believe there’s any downside to having a separate safety pass layered on top of a dependency resolution algorithm that is already built on a deterministic foundation.
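To make the “separate safety pass” idea concrete, here’s a minimal Rust sketch of the shape such a check takes (the sha2 and hex crates and the artifact bytes are my own illustration, not Go’s actual implementation): resolution has already decided which artifact to fetch; the recorded checksum only confirms the bytes haven’t changed since they were first seen.

```rust
use sha2::{Digest, Sha256};

// Post-resolution verification: compare the fetched bytes against the
// checksum recorded the first time this dependency was downloaded.
fn verify(artifact: &[u8], recorded_hex: &str) -> bool {
    hex::encode(Sha256::digest(artifact)) == recorded_hex
}

fn main() {
    let artifact = b"pretend these are the bytes of foo v1.2.3";

    // Trust-on-first-use: record the checksum the first time we see the artifact.
    let recorded = hex::encode(Sha256::digest(artifact));

    assert!(verify(artifact, &recorded));           // unchanged artifact passes
    assert!(!verify(b"tampered bytes", &recorded)); // any change is caught
    println!("checksum verification passed");
}
```

Nothing here influences which version gets picked; the check only runs after the resolver has already made its (deterministic) choice.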

                                                                                            3. 2

                                                                                              Are there any implementations that would meet the author’s suggested criteria? The author doesn’t list any.

I tend to use Nix for everything, and this post was originally written as part of http://www.chriswarbo.net/projects/nixos/nix_dependencies.html but I made it separate since it’s largely orthogonal. That link gives some concrete advice for those writing package managers, gives a worked example of getting Haskell’s Cabal package manager to resolve dependencies from a local copy of Hackage metadata inside a Nix derivation, and describes the problems I encountered along the way (e.g. the use of bespoke binary formats, the semantics of local repos not offering all the same features as remote repos, etc.).

                                                                                              1. 1

Haskell Stack will suggest extra-deps with commitment hashes. You can leave out the hash, but then you might get broken builds when a revision is published on Hackage. So it’s not perfect. But it seems like it should be realistic to verify whether a stack.yaml is sufficiently pinned down.

You could also consider Nix package expressions a conforming implementation, since I believe you always attach a hash commitment every time you put a URL in a Nix expression.

                                                                                                1. 1

                                                                                                  Nixpkgs’s Haskell infrastructure does a decent job, e.g. it pins a default Hackage state for lookups, and that state can be overridden if needed.

                                                                                                  One downside with the current way Cabal and Hackage work (I’m not overly familiar with Stack) is that it’s hard to make them resolve using a local folder of Hackage state, e.g. see www.chriswarbo.net/projects/nixos/nix_dependencies.html#solving-dependencies-in-nix-haskell-example

                                                                                              2. 5

                                                                                                I feel this is the right advice for the wrong audience (but is effective clickbait so the right people will see it).

                                                                                                Developers need to break out of the passive mindset and contribute high quality issues, reproductions, and fixes. Full stop.

                                                                                                As a maintainer, I don’t think the problem is us. I’ve tried the above suggestion (for the purposes of encouraging contribution) and it’s really hard. Building things in public is hard mode. Building a community of people building things in public is extra hard mode. Not to say we (the maintainers) have no responsibility here, but I feel contributors need the most support and by supporting them we (the community) support maintainers. That’s the thesis of my open source work with CodeTriage and my HTOS book.

                                                                                                1. 1

… only reason we needed the mutex is because there’s another task that reads the field.

                                                                                                  Is there a reason you’re using a mutex instead of RWLock (or similar)?

                                                                                                  1. 3

Why use an rwlock when you could use a mutex? The only time you should use an rwlock is when you have many more readers than writers, and you can’t use an RCU for some reason (e.g. the data can’t ever be stale).

                                                                                                    1. 1

                                                                                                      The only time you should use an rwlock is when you have many more readers than writers, and you can’t use an RCU for some reason

                                                                                                      The documentation says nothing like this. I too thought RwLock was appropriate if not all code accessing the lock would need to write through it. Maybe I would know better if I had formal education in multi-threading?

                                                                                                      1. 3

Using an RwLock with one reader and one writer is equivalent to a mutex. The RwLock allows either any number of readers or at most one writer at any point in time. Hence, with just one reader and one writer, it’s either one or the other and the whole thing behaves like a mutex.

                                                                                                        1. 1

                                                                                                          Oh, well, I was thinking “many more” sounded greater than 1.

                                                                                                          1. 1

There is a performance and complexity penalty to using an rwlock vs a mutex. So if you do not actually have many more readers than writers, you are paying that price every time you take the lock. If you don’t care about performance, just use a mutex, since it is simpler. And if you really do have many more readers than writers, you should consider using something like RCU (e.g. crossbeam), which is wait-free for readers. There are scenarios where you can’t use RCU, such as when all readers must see an update as soon as the writer finishes. But these are rare, since the scheduler can introduce a similar delay anyway just by scheduling readers before/after the writer.
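As a rough Rust sketch of the trade-off described above (plain std::sync types, not the code from the article being discussed): with one reader and one writer a Mutex is the simpler fit, while an RwLock only pays off when several readers need to hold the lock at once.

```rust
use std::sync::{Arc, Mutex, RwLock};
use std::thread;

fn main() {
    // One writer and one occasional reader: an RwLock buys no extra
    // concurrency here, so the simpler Mutex is the better fit.
    let counter = Arc::new(Mutex::new(0u64));
    let writer = {
        let c = Arc::clone(&counter);
        thread::spawn(move || {
            for _ in 0..1_000 {
                *c.lock().unwrap() += 1;
            }
        })
    };
    writer.join().unwrap();
    println!("counter = {}", counter.lock().unwrap());

    // Many readers, rare writes: the shape RwLock is designed for, since
    // read guards can be held by several threads concurrently.
    let config = Arc::new(RwLock::new(String::from("v1")));
    let readers: Vec<_> = (0..4)
        .map(|_| {
            let cfg = Arc::clone(&config);
            thread::spawn(move || cfg.read().unwrap().len())
        })
        .collect();
    *config.write().unwrap() = String::from("v2");
    for r in readers {
        r.join().unwrap();
    }
    println!("config = {}", config.read().unwrap());
}
```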

                                                                                                  2. 36

                                                                                                    I ran into numerical identifiers getting silently corrupted during roundtripping, and had to encode them as strings. It’s the worst of everything. Restrictive and under-specified at the same time.
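For anyone who hasn’t been bitten by this yet: the usual culprit is a parser that stores every JSON number as an IEEE-754 double, which is only exact for integers up to 2^53. A tiny Rust illustration of that round trip (the identifier is a made-up value, 2^53 + 1, the first integer a double can’t represent exactly):

```rust
fn main() {
    // Many JSON implementations (JavaScript's included) keep every number as
    // an f64, which is only exact for integers up to 2^53.
    let id: u64 = 9_007_199_254_740_993; // 2^53 + 1
    let round_tripped = id as f64 as u64;

    assert_ne!(id, round_tripped); // silently became 9_007_199_254_740_992
    println!("sent {id}, got back {round_tripped}");

    // Encoding the identifier as a string sidesteps the problem.
    let wire = format!("{{\"id\":\"{id}\"}}");
    println!("{wire}");
}
```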

                                                                                                    1. 12

In the Cloud Native Buildpack ecosystem there’s a bug where you store data as TOML, but due to an implementation detail it’s persisted via JSON and then re-serialized into TOML. The round trip turns an int (which TOML has) into a float, so if you’re using a strongly typed language you end up with values that aren’t even the same type as what you put in.

So we end up having to do the same: either make it a float up front or make it a string. The rest of the ecosystem is pretty great, but that’s a subtle yet meaningful gotcha.
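A small sketch of what that looks like from the strongly typed side (using the toml and serde crates as stand-ins, not the actual buildpack lifecycle code): once the integer has become a float, it no longer deserializes into the integer field it came from.

```rust
use serde::Deserialize;

#[derive(Deserialize)]
struct Metadata {
    count: i64,
}

fn main() {
    // What was originally written out.
    let original: Result<Metadata, _> = toml::from_str("count = 1");
    // What comes back after the int -> float round trip described above.
    let round_tripped: Result<Metadata, _> = toml::from_str("count = 1.0");

    assert!(original.is_ok());
    assert!(round_tripped.is_err()); // 1.0 is a TOML float, not an integer
    println!("the round-tripped value no longer parses as an integer");
}
```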

                                                                                                      1. 7

We had this same issue when we first added a REST API to Neo4j, which until then had been embedded-Java-only.

What got really messy later on was realizing that some implementations of JSON do support real ints. The .NET one, if I recall correctly, will happily round-trip i64s without precision loss… but that just makes it even worse, because now some devs think it’s safe and fine, because it usually is in their ecosystem.

                                                                                                      2. 7

                                                                                                        Given you can’t have a JSON object with non-string keys, you end up dealing with this mismatch a lot.

                                                                                                        1. 5

This is bad, but a while back I discovered that somehow GraphQL finds a way to make it even worse; the only integer type is defined as 32 bits.

                                                                                                          1. 3

Heh! Guilty as charged. I had a bug like that in my JSON parser; it corrupted very large integers, such as those used by tweet IDs back in the day: https://github.com/SBJson/SBJson/pull/171#issuecomment-19842731

The bug wasn’t even in the parsing, exactly, but in my assumption that converting an NSDecimalNumber to a long long integer would retain as much precision as possible. This turned out to be wrong.

                                                                                                            1. 2

Likewise, we used it for a stock exchange data API: we started with numbers for price data and then switched to strings for the very reasons explained in the article. It made me sad!

                                                                                                            2. 3

Very interesting; however, I don’t understand how engaging multiple cores uses less power. That seems counterintuitive. I would have guessed it was the same or slightly more power.

                                                                                                              The linked paper says:

Number of Active Cores. Figure 11a shows the relationship between the number of active cores and power draw. Using multiple cores, energy usage grows up to around 2× the base power draw compared to using a single core. We fit a log curve to the data (R² = 0.70), as the relationship does not appear linear, but results may vary across platforms. Denoting Power(x) as the average power draw using x cores, …

But I’m not sure why that’s the case. Is that due to chip design, or is it coincidental with data locality and this specific benchmark? It might be covered elsewhere in the paper and I’ve missed it. I’m curious to hear more if someone has info.

                                                                                                              1. 5

Multiple cores use more energy per second, but they make the program go quicker, which means fewer seconds are necessary. If, say, by using two cores, we make our program execute in half the time, but the energy usage per second is only 1.5× the energy usage with one core, then overall we’ve made our program use less energy.
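Written out with the illustrative numbers from this comment (not measurements from the paper), where E is energy, P is single-core power draw, and t is single-core runtime:

```latex
E = P \cdot t,
\qquad
E_{\text{1 core}} = P \cdot t,
\qquad
E_{\text{2 cores}} = (1.5\,P) \cdot \tfrac{t}{2} = 0.75\,P\,t
```

So the two-core run draws 50% more power while it’s running, but finishes having used 25% less energy overall.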

The “2×” comment in the paragraph you quoted is a bit confusing, but I think it’s saying that, when all 128 cores on their test machine(s?) were being used, the power draw was twice that of using just one core. That means that if you could fully utilise all cores in your software, the energy usage per second would be at most twice as high, but you could potentially run your programs 128 times as quickly (under very ideal conditions, of course!).

                                                                                                                1. 2

Multiple cores use more energy per second, but they make the program go quicker, which means fewer seconds are necessary.

                                                                                                                  This can be true but it isn’t necessarily true. Consider an operator that races two algorithms toward the same result and only keeps the one that finishes first.

                                                                                                                  1. 1

This is true; my assumption is that you are fully utilising all the cores to produce useful work. That said, the paper points out that a parallelisation overhead of up to 87% will still (on their system at least) use less energy overall. So if racing the two algorithms means you can still produce, on average, faster results than if you were to just pick one or the other algorithm, then you might still end up saving energy.

                                                                                                                  2. 1

                                                                                                                    I took an HPC course and understand things like diminishing computational returns and Amdahl’s law and embarrassingly parallel problems (speaking to the linear speed up comment). I think what I most don’t understand is this:

                                                                                                                    the energy usage per second is only 1.5× the energy usage with one core

                                                                                                                    Something like that must be true for the article to be true. But why? What about the CPU design/architecture makes 2x cores not use 2x power?

                                                                                                                    1. 3

                                                                                                                      The paper discusses this a bit in section 3.2 where they talk about how they made the measurements. From what I can tell, each core, even if it’s not running, contributes some power draw. I assume there are other components on the system that also draw constant power regardless of the number of CPUs running, but I think the tool they’re using to measure power usage tries to limit itself to only reading energy usage from the CPUs and memory.

                                                                                                                      1. 1

                                                                                                                        Given the scale of difference in wall-clock time (scalar backend: 4.38s 1c vs 0.33s nc (13.3x); SIMD backend: 0.95s 1c vs 0.10s nc (9.5x)), it doesn’t seem extremely surprising to me; there are resources being used that belong to the system as a whole which are being used for ~10x as long. That the net energy used is about 2x in each 1c case compared to the respective nc case suggests tying up those resources for a longer period of time uses more electricity. The example in the article generates a 100 megapixel @32bpp image; perhaps tying up 400MB of RAM or using the system bus for 10x longer explains the difference?

                                                                                                                        It’s also entirely possible that the measurement method is just reporting the general state of the computer being on for that much longer. The SIMD case runs for an additional 0.85s and consumes an additional 9 J, or 10.6 J/s. The scalar case runs for an additional 4.05s and consumes an additional 34 J, or 8.4 J/s. Feels equivocal without a lot more or higher quality data (e.g. run on bare metal).

                                                                                                                    2. 2

                                                                                                                      The amount of available wall-clock time is fixed. If the program is useful, I will run it more often. If it runs faster, I can run it more often. So 2x parallelism will let me run the program twice in the same amount of time, doubling power consumption.

                                                                                                                      1. 2

                                                                                                                        If the program is useful, I will run it more often.

                                                                                                                        This is clearly not generally true. I often run programs to achieve particular goals, not “to run 20 seconds worth of program”.

Using multiple cores, energy usage grows up to around 2×

But I’m not sure why that’s the case.

So 2x parallelism will let me run the program twice in the same amount of time, doubling power consumption.

                                                                                                                        The article being quoted by GP (end of p12–p13) is stating that energy usage grows up to around 2x as the number of active cores increases from 1 (baseline) to past 120 in a logarithmic curve, and we’re collectively wondering why it’s not much closer to linear:

Denoting Power(x) as the average power draw using x cores, we find that, on our system, doubling the number of cores used increases average power draw by only roughly 30W
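Spelled out, the log fit they describe has roughly this shape (the 30W-per-doubling constant is theirs; the formula is just my reading of the quoted sentence):

```latex
\mathrm{Power}(x) \approx \mathrm{Power}(1) + 30\,\mathrm{W} \cdot \log_2(x)
```

So going from 1 core to 128 cores adds about seven of those 30W steps (log₂ 128 = 7) on top of the base draw, rather than anything close to 128× the single-core figure.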

                                                                                                                    3. 4

                                                                                                                      Phil worked for Heroku back in the day. A neat trick that we do: YAML is a superset of JSON and it supports comments. So we parse as YAML then as JSON (IIRC, I don’t maintain a tool that directly interacts with customer JSON).
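A minimal Rust sketch of that trick (the serde_yaml and serde_json crates here are stand-ins, not the actual Heroku tooling): flow-style YAML is, for practical purposes, a superset of JSON and allows # comments, so a YAML parser accepts a commented JSON document that a strict JSON parser rejects.

```rust
use serde_json::Value;

fn main() {
    // Strict JSON forbids comments; YAML allows them, and YAML's flow
    // syntax is (for practical purposes) a superset of JSON.
    let input = r#"
# human-editable note: bump replicas before the launch
{"app": "demo", "replicas": 2}
"#;

    // A strict JSON parse fails on the comment...
    assert!(serde_json::from_str::<Value>(input).is_err());

    // ...but parsing the same bytes as YAML works fine.
    let parsed: Value = serde_yaml::from_str(input).expect("valid YAML");
    println!("{parsed}");
}
```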

The meta is interesting: lots of people commenting for or against it. This feels like prescriptive versus descriptive linguistics. The standard says “ain’t” ain’t a word, but people use it anyway. Hopefully standards converge to match consensus, just as Webster now has an entry for “ain’t”, but taking an absolutist position on either side is likely not the answer.

I feel like OP has a pragmatic view of “here’s what I want to see more of, let’s talk about it and promote it, even if it’s not technically Alex compliant.” Everyone arguing with the title cannot see the forest for the trees. He’s not literally arguing against DC; he’s advocating for something he wants to see in the world. It’s a technical “yes, and” and I like it.

                                                                                                                      1. 11

I love Firefox and open source. But the little chips in the armor, plus the gravity of the Chrome ecosystem for my day job, were too much.

I dropped Firefox over a weird bug in Google Hangouts/Meet/video where my audio, when using an analog-to-USB converter, would make me sound like a robot. No clue if it’s their fault, Google’s, or the maker of the Motu 2 I’m using. I tried making some videos to report it, but it was really hard as it needed 2-3 devices and a video session, and the devices couldn’t be in the same room since I needed to demo audio to reproduce the problem.

Prior to that I hit an annoying but non-blocking bug that I reported about 6 years ago. Others have commented that they have the same issue. No progress: https://bugzilla.mozilla.org/show_bug.cgi?id=1528442

I wish I could “trade” some open source time: hours where I’m equipped to take on Ruby or Rust contributions, in exchange for someone else’s hours in a foreign ecosystem where they’re better equipped. I’ve wondered if I’m alone in that want, or if there could be some sort of bartering marketplace for work hours on open projects rather than for money.

                                                                                                                        1. 9

I ended up just using the Google Meet PWA with Chrome (in its own segregated window, as if it were a regular application), and Firefox for everything else.

                                                                                                                          1. 7

                                                                                                                            I dropped Firefox for a weird bug in Google hangouts/meet/video where my audio when using an analog to USB converter would make me sound like a robot.

This very much sounds like a sample rate mismatch between your audio interface and the OS, e.g. the MOTU M2 running at 48 kHz while the OS expects a sample rate of 44.1 kHz for the device. It’s a bit odd that this is limited to Firefox.

                                                                                                                            1. 2

                                                                                                                              I have the same situation, but my solution is to use Chrome for Google Workspace pages exclusively, and Firefox (esp. w/ Containers) for everything else, on both work and personal devices. Aside from some YouTube hiccups that crop up about once a year, Firefox works just as well for everything that isn’t Google Workspaces.

For me, Firefox Containers feel like the “right” way to do what Chrome (I assume?) instead expects you to do via multiple signed-in accounts in the browser itself.

                                                                                                                            2. 2

                                                                                                                              It’s interesting. I would think that you’d want everything written in Ruby (which would allow for JIT optimizations), and then hand pick things to optimize in C-Ruby (for interpretation), but they’re doing the exact opposite! :D

                                                                                                                              1. 6

                                                                                                                                If they were starting from scratch, but with YJIT, I’m sure they would. They’re undoing earlier manual optimizations in favor of the new thing that’ll do it automagically.

                                                                                                                                1. 2

In the past (10+ years ago) the primary path to improving Ruby MRI perf was to optimize all the hot code paths in C. Or you could go with alternative Ruby runtimes like Rubinius or JRuby or one of the many others, but MRI has always been the primary runtime. This pre-dated any sort of production-ready JIT development for MRI.

So now I think there are a lot of perf-sensitive code paths where you have to carefully unwind the C (or essentially feature-flag it, as the post shows) in ruby-core to let YJIT do its magic.

                                                                                                                                  1. 1

                                                                                                                                    IIRC Ruby 3.3 does not have YJIT on by default but 3.4 will. With that change they can modify the codebase to favor refactors that help JIT versus today they need to balance both on and off execution mode performance.

                                                                                                                                    1. 2

                                                                                                                                      No, YJIT still isn’t on by default in 3.4.

                                                                                                                                  2. 9

                                                                                                                                    I love these posts man. Very deep dives into lots of interesting topics. Still a consistent story line that’s easy to follow.

                                                                                                                                    1. 3

Thanks, schneems! I had a lot of fun writing this one. It’s something I haven’t really dug into before.

                                                                                                                                      1. 3

They’re even fascinating for people like me who don’t know much about Ruby at all.

                                                                                                                                      2. 3

                                                                                                                                        How do Cloud Native Buildpacks differ from traditional buildpacks? We’ve had at least one request to convert the jemalloc buildpack but I’ve yet to understand what work is involved or what the benefits are.

                                                                                                                                        1. 3

It’s a different spec; here’s a comparison doc: https://devcenter.heroku.com/articles/classic-vs-cloud-native-buildpacks

And here’s a doc on converting an existing buildpack into a CNB: https://devcenter.heroku.com/articles/creating-cloud-native-buildpacks-from-classic-buildpacks

From a maintainer’s standpoint, a lot of the APIs are more purposeful and less hacky. The biggest end-user benefit is the ability to generate OCI images (Docker containers) locally.

These docs just went live today, so let me know if you hit problems or still have unanswered questions.

                                                                                                                                          1. 1

                                                                                                                                            Thank you!

I will take a look when I have a chance. Can you comment on whether Fir will continue to support classic buildpacks?

                                                                                                                                            1. 1

I skimmed the doc on updating, and it mentions that Fir will only support Cloud Native Buildpacks.

                                                                                                                                              1. 1

                                                                                                                                                It will not support classic buildpacks.

We’ve done some experimentation with wrappers and shims, etc., but it’s easier to go from a more explicit standard to a less explicit one. For example, the .NET classic buildpack wraps and calls the .NET Cloud Native Buildpack. That’s an option if you don’t want to maintain two.

Also, not sure if you saw or not, but we have a suite of Rust tools for Cloud Native Buildpacks: https://github.com/buildpacks/libcnb

                                                                                                                                                1. 1

                                                                                                                                                  I haven’t seen that yet but I’ll add it to my notes for when I get the chance to work on it.

                                                                                                                                                  Is there any way to support both in a single repository? There is a lot of documentation out there for folks to set up the buildpack that would suddenly become obsolete if we need to change to a new URL for the CNB version.

                                                                                                                                                  1. 1

Yes, you can support both in one repo. That’s the approach taken here: https://devcenter.heroku.com/articles/creating-cloud-native-buildpacks-from-classic-buildpacks#copy-bin-compile-to-bin-build.

                                                                                                                                                    The CNB entry point is bin/build rather than bin/compile. So you can have both or call one from the other.