Threads for twotwotwo

  1.  

    In the world at large, there are much more important things to spend a billion dollars on than software, like, I dunno, malaria prevention or vaccine research. In software, I would try to make an open, complete, easy-to-get-started-with way to build everyday Web apps. The motivation:

    • $product_from_work is built/operated/supported by a mid-sized team (10 folks) and we’re well-served by Django, an RDBMS, and some AWS services. It still bugs me how much of what we built is “factorable” stuff we and a zillion other teams each have to build ourselves.
    • Integrated “code it and go” services often lock you into a vendor and have a huge markup on the underlying hardware for steady-state uses (Lambda, Vercel–I know, the markup is how they get the billion dollars), and/or you hit a cliff where you can’t do what you need in their environment. You need it to be easy to start with, but flexible when you outgrow it.
    • I’ve got a soft spot for how people who aren’t expert programmers could strap together something useful for their work in FileMaker or such, or folks familiar with a framework can churn out a CRUD site quick. People tend to pooh-pooh it as not-really-programming but if you’re trying to make the potential of computers available to as many folks as possible, degree of difficulty is a bad thing!

    The project is to assemble and package up the stuff that’s missing around CRUD frameworks:

    • Front-end batteries: there’s a range of widgets and form idioms that are almost universal out on the Web but still tend to take a lot of reimplementation in every app, from “decent multiselects” to complex interactions between fields (field Y is required only if box X is checked; see the sketch after this list). You need to make decent forms easy w/out creating a cliff where people need to leave your framework in order to do anything custom.
    • Back-end batteries: the development and operations environment, from things like the repo, deployments, a Web cluster, monitoring and alerting, your DB setup (incl., say, backups and schema migrations), backend and cron jobs, miscellaneous services like logging and caches. People are often either stringing this together or getting a managed environment at the cost of actual dollars and/or vendor lock-in.
    • Administration and multitenancy: Many apps have at least a few types of user: customer administrators, customer end-users, and developers/operators of the application, and many can operate different instances with some degree of separation. CRUD frameworks’ authentication also needs some awareness of the modern world (2FA, outside identity providers).
    • Openness: The whole thing should be open source and able to run on bare metal. Not that you couldn’t use a service for blob storage, CDN, or a managed DB, but even if you don’t, you get a complete stack. As a corollary, if some bundled/“blessed” component doesn’t work quite right for you, you can integrate a different one. (This also complicates the “back-end batteries” part; you need a fallback to AWS’s or GCP’s approaches to things.)
    • Using existing stuff: Despite the billion dollars, no one is writing a new database or wholesale reinventing the CRUD framework. That loses the benefits of the existing ecosystem and gives you the chance to remake old mistakes or make new ones.
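
    To make the front-end batteries point concrete, here’s roughly the cross-field rule from above as every Django app hand-rolls it today (a minimal sketch; the form and field names are invented):

      from django import forms

      class ReturnForm(forms.Form):
          # Hypothetical fields: choosing "other" makes the explanation required.
          reason = forms.ChoiceField(choices=[("damaged", "Damaged"), ("other", "Other")])
          other_reason = forms.CharField(required=False)

          def clean(self):
              cleaned = super().clean()
              # "Field Y is required only if box X is checked", written by hand;
              # the matching live client-side behavior is a separate JS job.
              if cleaned.get("reason") == "other" and not cleaned.get("other_reason"):
                  self.add_error("other_reason", "Please explain.")
              return cleaned

    Proper batteries would let you declare that dependency once and get the server-side check and the live client-side behavior from the same declaration.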

    I specifically think it’s not right to tackle instant/infinite/transparent scalability (vertical RDBMS and frontend/worker cluster, or one per tenant, scale reasonably large!) or to replace the programming models people are actually using. Those eat your project and are not really where the barriers to entry are.

    Now if you manage to get all that done, I have a few silly specific ideas where some elbow grease could probably make things better:

    • There are particular parts of the stack where I think the right way to do things may involve something not super common in CRUD frameworks now, like templating, frontend performance, and what to do about reporting when you get medium-sized.
    • Once you reduce the overhead of coding, you can start to think about a super-easy entry point–“low-code” as they say these days–that ramps up to really writing code. I know various projects try to do this on top of existing frameworks.
    • If you’re actually operating a large instance of this you have the motivation to work on scalability/efficiency. That isn’t necessarily Big Ideas, ‘just’ the (interesting!) loop of profiling or organically noticing pain points and figuring out how you get past ’em.

    But, again, I think a satisfactory factoring of the things every mid-sized app already needs to do is an enormous win already!

    1. 3

      I know there’s no trivial fix here, but, related to “Why Share What You Don’t Want to Share?”, it’s notable how unclear the expectations are around the majority of public code out there. Your language’s standard library and lots of big or long-standing projects have an org behind them, some momentum, and a policy, but most projects depend on some code that has none of those.

      Sometimes this plays out as people who just released some code being approached on a public issue tracker as if they were paid maintainers. For me, sometimes publishing code was held up by it not feeling like a proper product, and I’ve heard the same for others. Now and then we end up with dependencies actually proving untrustworthy, and picking packages can involve lore or frustration (“use Y instead of X; X looks great but you’ll find nasty bugs two days into using it”).

      Platforms like GitHub, pkg.go.dev, or crates.io do some things for you–you get some indicators of popularity, recency and level of activity, folks can use badges to suggest e.g. they at least have some kind of tests, and once a dependency passes the smell test you can dive in for more thorough checks. But it all feels kind of unnecessarily hazy, both for those trying to make educated guesses about what to depend on and those putting code out there.

      If I had a magic wand to wave, maybe there would be more explicit and systematic (but not binding) statements of project intent and status out there, e.g. ‘this is just for fun’ or ‘it’s a personal project, but I intend to keep it stable’ or ‘we at jfkldajskl corp. depend on this in production and maintain it’. Trusted folks doing curation (less “awesome X”, more orgs you recognize who use packages in anger blessing them), maybe linked to some resources for projects/maintainers, could help too.

      The point of this isn’t to rail about the economics, but it also isn’t lost on me that this is a second-order effect of how a lot of open source turns people’s hobby work into value for well-funded companies. Maybe part of a durable solution would be for those that want to rely on a project to kick in enough to help make it reliable–improved channels/systems for that to happen and the companies involved deciding to use those channels.

      1. 3

        Two common classes of database query you might do in Python are “pull one row into a tuple” and “pull a column of IDs or such as a list”.

        Our DB utility library handled these two situations with 1) a function that would accumulate all the values from all the rows in a query into one big tuple (so that one function could handle a single-row or single-column query), and 2) a wrapper to call the tuple-returning function and convert its result to a list.

        In retrospect it’d’ve made more sense to handle those two use cases with totally independent functions, where the row one enforces that the query returns exactly one row and the column one enforces that it returns one column. But ten years ago I was–uh, we were capricious and foolish.

        Unfortunately, adding lots of values to a tuple one-by-one is O(n^2). Retrieving lots of values through this code still completed quickly enough that it took surprisingly long to notice the problem–it might add a few seconds pulling a million IDs, and often those retrievals were the type of situation where a legitimate few-second runtime was plausible.

        When we did fix it, it was a very small diff.
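
        In case the shape of the bug isn’t clear, a hypothetical reconstruction (not our actual code; all names invented):

          def fetch_values(cursor, sql, params=()):
              """One function for both one-row and one-column queries."""
              cursor.execute(sql, params)
              values = ()
              for row in cursor.fetchall():
                  values += tuple(row)  # a brand-new tuple per row: O(n^2) overall
              return values

          def fetch_values_list(cursor, sql, params=()):
              return list(fetch_values(cursor, sql, params))

          # The very small diff: accumulate in a list, convert once at the end.
          def fetch_values_fixed(cursor, sql, params=()):
              cursor.execute(sql, params)
              values = []
              for row in cursor.fetchall():
                  values.extend(row)
              return tuple(values)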

        1. 2

          “Strictly speaking, a CPU is an interpreter” is a part that would have blown my mind growing up with in-order CPUs. It appears to be a fundamental thing, not just a historical quirk: processors rely on a JIT-like process of observing actual execution, rather than an up-front analysis of the code itself.

          I’m almost sure that if you’d asked me to predict what a highly parallel processor would look like before I knew anything about OoO, I’d’ve imagined something closer to Itanium: a compiler analyzes code a ton to find what instructions can run together. Maybe I’d’ve speculated about some other approach (SIMD?), but definitely not today’s implied, dynamically-figured-out parallelism within one stream of instructions.

          And then it turns out there are tons of performance challenges that are hard to solve by up-front analysis but feasible to tackle by watching execution (including values and the path taken through the code), often by making a guess and being able to recover when it turns out you were wrong:

          • To figure out if two instructions with memory references depend on each other, you might need to know if two pointer values turned out to be the same; that might be very hard to figure out up front, but you can observe whether the pointers are the same if you’re deciding parallelism at runtime.
          • Bounds checking involves a ton of branches that are almost never (or always) taken, but completely explode your security model if you don’t have them; branch prediction, even versions of it much simpler than you’d find in a fast chip today, makes that relatively cheap by just watching if the branch is taken a lot.
          • Indirect branches, needed for any sort of dynamic language where you can call obj.foo() on any kind of obj, would normally have to stall the processor ’til it can be completely sure where that pointer will lead and what instructions it will run; indirect branch prediction can often help a lot.

          It’s not OoO-specific, but even cache hierarchies have a little bit of this dynamic flavor: deal with the slowness of main memory, without complicating the code with explicit data movement between types of RAM, by observing patterns code typically follows.

          1. 2

            Go’s rule against circular dependencies at the package level makes splitting by dependency particularly important.

            If one utility depends on any package in your app and another, independently useful utility doesn’t, but they live in the same util package, it’s possible someone will want the no-deps utility yet be unable to import it, because pulling in the package would create a circular dep through the other utility’s dependency. Besides the hard rule, avoiding circularity just tends to make things easier to think about.

            I also sort of agree with another comment that I’d locate e.g. email utilities ‘near’ any other email code that lives in your world if possible, e.g. email/emailutil or email itself instead of util/emailutil. If someone writing future code knows they’re trying to do something with email they know the ‘email’ part for sure, and only might know that the particular thing they want is a ‘util’. However, if your utils are related to code outside your app (stdlib or otherwise third-party), “put the utils with the other code” may just not apply and the stdx sort of approach seems appealing.

            1. 3

              A non-cloud deployment seems a lot more reasonable than you’d think given how little you hear the option talked about. (As context, the app I work on started on-prem and moved to AWS years back, and other parts of the larger company we’re part of still have space at a datacenter.)

              You need to overprovision for flexibility/reliability. You need some staff time for the actual maintenance of the servers; it may help to keep your physical deployment simple, a different thing from its size. You also need to adjust to the absence of some nice convenient cloud tools. You can still come out fine even given all those things.

              Echoing another comment, you can get some pretty incredible hardware outside the cloud–Amazon’s high-I/O local-SSD hardware lagged what you could get from a Supermicro reseller for as long as I was tracking it.

              We’re now pretty committed to AWS. We integrate S3, SQS, ALBs, Athena, etc. and some features/processes count on the easy replaceability of instances. Flexibility is also useful in less tangible ways. I’d also note this blog post shares the common weakness of talking up expected upsides of a change before dealing with the downsides.

              Still, I don’t at all think the non-cloud approach is unreasonable. In a way, it’d be neat to hear about some successes outside the major clouds, and I wouldn’t mind more people taking another look at other hosting options, both because it could make some cool things possible and it could nudge the cloud providers to be more competitive.

              1. 1

                Scattered thoughts:

                • I really like that this starts with Cost Management. It’s kind of like profiling a program–when you look, the easy room for improvement often turns out not to be where you’d guess. You can also export cost data to query in Athena.
                • I’m a big fan of the flexible Savings Plans. Knowing only that you’ll need a certain level of base load for 1 or 3 years, you can save a lot, and you keep the flexibility to change type, family (including new families that aren’t yet an option when you sign up for the savings plan), or region (for lower cost, DR, data-location requirements, etc.). Note RDS, etc. don’t fall under these plans and have their own, less flexible, reserved instances.
                • Some places you can reduce costs outside of compute:
                  • Useless storage: over enough time it’s easy to accumulate junk in S3 or EBS snapshots you don’t need. An occasional manual cleaning can be worth your time (S3 Storage Lens is nice), and lifecycle rules to delete stuff like logs, Athena results, or temporary files can help (see the sketch after this list). (Also, gzip and zstd work wonders on things like logs or data dumps and also make them faster to read!) Archive-tier EBS snaps are kind of interesting if you need to keep them 90 days, though we haven’t found a use case.
                  • S3 storage classes: for anything like backups, Intelligent Tiering looks awesome. Glacier Instant lets you store backups you’re keeping for 90 days for cheap, without the slow Glacier retrieval process if you need them. (And Infrequent Access’ minimum is just 30d.) None of these have super punitive costs when you do use your backups.
                  • Bandwidth out: AWS’s bandwidth out is relatively expensive. If you serve enough TB/mo of static or cacheable content to care, multiple CDNs and Cloudflare R2 can do it for you cheaper. (If you’re relatively small but your bandwidth costs for static content are noticeable, CloudFront’s free 1TB/mo may be nice.)
                  • Inter-AZ bandwidth: I’m not sure how common this is, but we run cross-AZ, and trying to keep heavy data flows within one AZ can reduce costs, but it’s tricky and app-dependent.
                • Picking the right instance family (e.g. c6 vs r6 vs m6) can help.
                  • A couple times we’ve considered increasing size and been able to move “laterally” instead (e.g. ‘m’ family to ‘r’ when RAM was the limiter), or to a newer variation that had what we wanted at the time (i3 to i3en for space).
                  • Data/benchmarks may not line up with intuition: our Web tier was more RAM-bound than you’d expect a Web tier to be, and an OLAP database ended up more CPU-sensitive than expected.
                  • With flexible RIs/savings plans, it’s worth checking now and then whether moving to the latest variation of your instance types is an efficiency gain.
                  • The m6a/r6a/c6a instances can be cheaper and faster than Intel options in the same gen and are worth checking out.
                  • The ‘t’ instances are awesome for small utility servers.
                  • Graviton’s sneaky advantage is that each “vCPU” is a physical core, whereas on Intel/AMD a vCPU is a hardware thread that may share a core w/another thread. It won’t win a single-threaded benchmark against AMD/Intel, but because you have twice the physical cores at a similar price, many-thread throughput can be better. Certainly interesting for greenfield or easy-to-port stuff.
                • It’s worth comparing alternative/related services for a use case, and looking for people’s opinions of them. Like, Athena does wonders for us, and I hear iffy things about DynamoDB and Managed NAT Gateway. (A data lake queryable by Athena taking advantage of compressed formats and partitioning can be pretty great.)
                • Considering the app and deployment together can help. If you had to grow an instance because of OOMs around spikes of activity, maybe you can smooth the spikes or reduce the RAM usage or both. I’ve gotten useful ideas from trying to follow costs back to the code that incurred them.
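
                On the lifecycle-rules point above, the setup can be a few lines of boto3 (a sketch, not our actual config; the bucket and prefix are made up):

                  import boto3

                  s3 = boto3.client("s3")
                  # Auto-expire Athena results after 30 days instead of hand-cleaning.
                  s3.put_bucket_lifecycle_configuration(
                      Bucket="my-data-bucket",
                      LifecycleConfiguration={
                          "Rules": [{
                              "ID": "expire-athena-results",
                              "Status": "Enabled",
                              "Filter": {"Prefix": "athena-results/"},
                              "Expiration": {"Days": 30},
                          }]
                      },
                  )
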
                1. 6

                  One of its mistakes is:

                  Question 1, point 1: fails to call getPoints or goalReached on a level object (it tries to access a levels array which doesn’t exist)

                  An interesting thing about this problem space is you could cheaply try compiling the output and feeding the error back in an automated way. Ditto for linters or other non-AI tools. You could also ask the machine to write a test (or the human writes one) and report back if and how it fails. Someone’s experiments with Advent of Code show AI models can work with feedback like that.
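
                  A minimal sketch of that loop, where generate() stands in for whatever model call you’re using (everything here is invented for illustration):

                    import pathlib, subprocess, tempfile

                    def generate_until_it_compiles(generate, prompt, compiler, suffix, max_rounds=3):
                        """generate(prompt) -> source text; compiler is e.g. ["javac"]."""
                        for _ in range(max_rounds):
                            code = generate(prompt)
                            src = pathlib.Path(tempfile.mkdtemp()) / ("Candidate" + suffix)
                            src.write_text(code)
                            result = subprocess.run(compiler + [str(src)],
                                                    capture_output=True, text=True)
                            if result.returncode == 0:
                                return code  # it compiles; hand it to tests/linters next
                            # Feed the compiler's complaint back and retry.
                            prompt += "\n\nThat failed to compile with:\n" + result.stderr + "\nPlease fix it."
                        return None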

                  Those wouldn’t pass the spec for the task here (which is to pass a test that humans pass closed-book), but it does suggest there are strategies outside of the models themselves to make models better at real-world tasks.

                  Also, that’s using tools built to help humans. In current deployments, AI shows some non-human-like failure modes, like producing runs of highly repetitive code or blending two potential interpretations of the input text (notably, unlike a human, AI can’t, or at least doesn’t usually, seek clarification about intended meaning when it’s not sure). That suggests there might be other AI-specific improvements possible, again outside of making the bare model better at getting the right result the first time.

                  1. 12

                    An interesting thing about this problem space is you could cheaply try compiling the output and feeding the error back in an automated way.

                    I spent much of today “pair programming” with ChatGPT. Just feeding the error back will sometimes help. But not at all consistently. ChatGPT struggles with conditional branching, with non-determinism, and a number of other things. But I was eventually able to talk it through things: http://www.randomhacks.net/2019/03/09/pair-programming-with-chatgpt/

                    The kind of things that only rarely work:

                    • Feeding ChatGPT error messages.
                    • Explaining that it has a bug in a particular expression.

                    Things that work surprisingly well (paraphrased):

                    • Restarting the session when it starts to get confused.
                    • “Let’s try writing that string parsing in Python.” (Not C. sscanf is no way to live.)
                    • “Here are some example inputs and outputs.”
                    • “Please rewrite the parsing function using regular expressions.”
                    • “Show me an efficient formula to calculate X. Now translate that formula to a Python function with the signature Y.”
                    • “OK, now that the Python version is working, translate it to idiomatic Rust.”
                    • “Please write a bunch of unit tests for function X.”

                    It’s really more like mentoring a very diligent junior programmer than anything else. You have to be specific, you have to give examples of what you want, and you have to realize when it’s getting in over its head and propose a better approach. But it’s willing to do quite a lot of the scut work, and it appears to have memorized StackOverflow.

                    I suspect that there is some CoPilot-adjacent tool that would be one hell of a pair programmer, given a human partner that knew how to guide it. Or at least, such a thing will exist within 5 years.

                    But the more I played with ChatGPT, the more I felt like there was a yet-to-be-closed “strange loop” in its cognition. Like, I’m pretty sure that if you asked ChatGPT if there were better ways to parse strings than lots of split and if, it could probably suggest trying regular expressions. I’d give you even odds it can explain the Chomsky hierarchy. But when it gets stuck trying to fix an underpowered parsing function, it can’t recursively query itself about better parsing techniques, and then feed the answer back to itself as new instructions. At least not consistently. I need to close the “strange loop” for it.

                    When it’s clever, it’s terribly clever. When it fails, it sometimes has most of the knowledge needed to do better.

                    I figure that we’re still several breakthroughs away from general-purpose AI (happily). But I think it’s also a mistake to focus too much on the ways ChatGPT breaks. The ways in which it succeeds, sometimes implausibly well, are also really interesting.

                    1. 1

                      Yes! I rarely see somebody notice this.

                      It feels like the GPT family is a (super?)humanlike implementation of about half of a human cognition.

                    2. 2

                      An interesting thing about this problem space is you could cheaply try compiling the output and feeding the error back in an automated way.

                      It exists! There’s a neat little open-source CLI program called UPG that does exactly this (using Codex IIRC). I started a new thread for it over here: https://lobste.rs/s/0gi7bi/upg_create_edit_programs_with_natural

                    1. 4

                      Scattered thoughts:

                      • Sending a post to a few thousand federated servers feels like a lot, but is going to be dwarfed by sending posts from servers to end users. As long as there are far fewer nodes than users, iffy scaling in that network is just not as big a problem.
                      • The “wasted” traffic is sending content server-to-server that no user views–user1@A is subscribed to user2@B’s posts, but they don’t happen to log in at a time they’d ever see this particular post.
                      • Even that traffic isn’t that much, given the small posts, without media. Federating media serving is possible in theory: you embed a link to your buddy’s server or a mutually trusted CDN like you’d embed a YouTube video. If no operator pays another for the cost, somebody is subsidizing somebody else, but that doesn’t make it impossible.
                        • Nobody is going to do a Twitter clone’s media serving as a charity but if some CDN wanted to get some attention now, they could do worse than a first-taste-free kind of deal on CDN’ing for nodes in this alternative social network that’s growing like wild.
                      • It seems worth thinking about what the next couple steps at the data layer are when a node gets too big to be backed by One Big RDBMS (“next couple steps” very different from “rebuilding Twitter”). Caching everywhere? Partitioning old data off? Super-optimized timeline updates when you follow thousands of users of whom only 10 posted since your last check?
                        • Big Nodes™️ presumably get some economies of scale the small ones don’t, for all the reasons being discussed.
                        • I don’t think having a lot of users on Big Nodes breaks the federation idea. Even a dozen big nodes is different from one Facebook or Twitter. And it’s plausible lots of people pick the easy big-node route and a few pick the option that gives them the most control, like in lots of other areas.
                      • It kind of sounds as if optimizing the server code itself would really be useful right now? Would be fun to work on GoToSocial at a time that more-efficient frontends could really help some folks.

                      When looking for Twitter alternatives started to get real, I got an account on Cohost. It’s fun and quirky and I like the Tumblr-ish design, community, and the HTML and CSS crimes, and I figured when I signed up that one team running one instance of the app might have less trouble scaling. All the nice design is still nice, but Mastodon, partly by virtue of being around a while, is just running better for me and has great tools for, say, finding people, which makes it hard for me to count it out despite arguments about what theoretically should or shouldn’t work.

                      1. 8

                        Some of the criticisms are weird because something like Twitter or Facebook is also a distributed system. The difference is that it’s one where the topology is (mostly) under the control of a single organisation. It’s not like there’s one big computer somewhere that runs Twitter, there are a bunch of data centers that all run instances of parts of the logical ‘Twitter server’.

                        Use cases like Twitter are perfect for distributed systems for a few key reasons:

                        • Very weak consistency is fine - no one really cares if two people send two messages to other people and they arrive in the wrong order.
                        • Eventual consistency is also fine - no one cares if two people look at the same feed and one sees an older snapshot.
                        • Latency that users care about is measured in minutes (or, at most, tens of seconds), not milliseconds.

                        All of this is very different from the ‘economies of scale’ argument. You can benefit a lot from economies of scale from other people by buying cloud services rather than running your own. Mastodon does this, optionally using cloud storage for large data. I’m not sure if it proxies these back, but (at least with Azure storage, not sure about other services, but I presume they’re similar) you can trivially mint short-lived tokens that allow a user to read some resource, so you can punt serving things like images and videos to a cloud service and just have to give them updated URLs periodically. You could implement most of the control messaging with Azure Functions or AWS Lambda and probably end up with something where anyone can deploy an instance but the cost is much lower than needing to host a full VM.
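
                        For the token part, that’s roughly this much code with the azure-storage-blob SDK (account and blob names invented; just a sketch of the idea):

                          from datetime import datetime, timedelta, timezone
                          from azure.storage.blob import BlobSasPermissions, generate_blob_sas

                          # Mint a read-only URL for one blob, valid for 15 minutes.
                          token = generate_blob_sas(
                              account_name="myinstance",
                              container_name="media",
                              blob_name="attachments/photo.jpg",
                              account_key="...",  # from configuration, not hardcoded
                              permission=BlobSasPermissions(read=True),
                              expiry=datetime.now(timezone.utc) + timedelta(minutes=15),
                          )
                          url = ("https://myinstance.blob.core.windows.net"
                                 "/media/attachments/photo.jpg?" + token)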

                        And it does this over HTTP, hence TCP/IP. All of which is chatty and relatively inefficient.

                        This is not necessarily true with HTTP/3. Then each s2s connection can be a single long-lived QUIC connection and each request can be a separate stream. The overhead there is very low.

                        My biggest issue with Mastodon is scalability in the opposite direction. If I want to run a single-user instance, or one for my family, then the cost is very high per user. I’d love to see a ‘cloud native’ (I hate that phrase) version that I could deploy for $1/user/year on any major cloud provider.

                        1. 1

                          Scaling down is a way harder problem than scaling up. For one, you basically can’t get a VPS for less than ~5$/month - and that number has been pretty much constant for as long as I can remember. So that would immediately put the minimum number of users for the limit you give at 60. You’d need to be able to get instances for at least 10 times cheaper than are currently available if you want to do so.

                          If you want to go the serverless way, I don’t think you’d get much savings there either. My single-user instance handles about 31k incoming requests per day, which sums to close to a million requests per month. That does not include outgoing requests, which for me have been averaging roughly 500 per day, ignoring retries. So I’d say that’s at least 1 million function invocations per month, which is exactly the number of invocations that AWS offers for Lambda for free. But then you also need to add object storage for media (grows depending on usage, I’d guess ~0.1$/month after a while), queuing for outgoing and some incoming requests (would probably fit into SQS’s 1M message limit), and importantly, the main database, which for maximum cheapness would probably be DynamoDB - I think you might be able to fit into free tier limits, but I’m not that sure because some operations commonly done by fediverse servers aren’t that efficient on key-value databases.
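
                          Working those numbers out (back-of-envelope, using the counts above):

                            incoming_per_day = 31_000
                            outgoing_per_day = 500
                            per_month = (incoming_per_day + outgoing_per_day) * 30
                            print(per_month)  # 945,000 -- right around Lambda's 1M/month free tier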

                          So, you could probably fit it into the free limits of various cloud providers. But if you’d take the free limits away, I’m fairly certain that you’d soon see the costs grow way higher than 5$/month of a VPS.

                          1. 1

                            Scaling down is a way harder problem than scaling up. For one, you basically can’t get a VPS for less than ~5$/month - and that number has been pretty much constant for as long as I can remember

                            This is why you don’t implement it as something that needs a VM.

                            So I’d say, that’s at least 1 million function invocations per month

                            Azure Functions cost $0.000016/GB-s of memory and $0.20 per million executions, so I’d expect your uses to come in below $1/month.
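
                            Plugging in some invented duty-cycle numbers to sanity-check that (assumptions, not measurements):

                              invocations = 1_000_000            # per month, from the estimate above
                              gb, seconds = 0.125, 0.2           # assume 128 MB and 200 ms per call
                              compute = invocations * gb * seconds * 0.000016
                              executions = (invocations / 1_000_000) * 0.20
                              print(f"${compute + executions:.2f}/month")  # ~$0.60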

                            But then you also need to add object storage for media (grows depending on usage, I’d guess ~0.1$/month after a while)

                            and importantly, the main database, which for maximum cheapness, would probably be DynamoDB, and I think you might be able to fit into free tier limits, but I’m not that sure because some operations commonly done by fediverse servers aren’t that efficient on key-value databases.

                            Azure Data Lake has some rich querying interfaces and charges per-GB. I presume AWS has something similar.

                            1. 1

                              I don’t think that Azure Data Lake is fit for the purpose. I see it mentioned more as a big data storage, meant for OLAP workloads rather than for OLTP workloads. I think CosmosDB would be the better example on Azure.

                        2. 1

                          Good assumptions but need data. :-P

                        1. 7

                          Did a pretty good job of making me care about JPEG XL even though I have little practical use for it. The backwards compat/re-encoding bit is pretty baller.

                          1. 4

                            FWIW, if others like the back-compat part, a Chrome bug for supporting the re-encoding as a gzip-like transparent filter has not been closed. That may effectively be a clerical error, but I also think it’s a legitimately different tradeoff: doesn’t require the same worldwide migration of formats, just CDNs transcoding for faster transmission, and it’s a fraction of the surface area of the full JXL standard.

                            (It should also be possible to polyfill with ServiceWorkers + the existing Brunsli WASM module, but I don’t have a HOWTO or anything.)

                            They don’t like “me too” comments, but stars on the issue, comments from largish potential users, relevant technical insight (like if someone gets a polyfill working), or other new information could help.

                          1. 3

                              I’m sorry, lossless jpeg recompression is not a feature that is a selling point: it doesn’t impact new images, and people aren’t going to go out and recompress their image library. I really don’t understand why people think this is such an important/useful feature.

                            1. 15

                              Realistically I think the JPEG recompression is not something you sell to end users, it’s something that transparently benefits them under the covers.

                              The best comparison is probably Brotli, the modernized DEFLATE alternative most browsers support. Chrome has a (not yet closed) bug to support the JXL recompression as a Content-Encoding analogous to Brotli, where right-clicking and saving would still get you a .jpg.

                              Most users didn’t replace gzip with brotli locally. It’s not worth it for many even though it’s theoretically drop-in improved tech. Same things are true of JPEG recompression. But large sites use Brotli to serve up your HTML/JS/CSS, CDNs handle it; Cloudflare does, Fastly’s experimenting, if you check the Content-Encoding of the JS bundle on various big websites, it’s Brotli. Same could be true of JPEG recompression.
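
                                  (If you want to check for yourself, a quick sketch; the URL is a placeholder:)

                                    import urllib.request

                                    req = urllib.request.Request(
                                        "https://example.com/static/app.js",
                                        headers={"Accept-Encoding": "br, gzip"},
                                    )
                                    with urllib.request.urlopen(req) as resp:
                                        print(resp.headers.get("Content-Encoding"))  # often "br"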

                              I don’t think you stop thinking about handling existing JPEGs better because you have a new compressor; existing content doesn’t go away, and production doesn’t instantly switch over to a new standard. I think that’s how you get to having this in the JXL standard alongside the new compression.

                              Separately, if JXL as a whole isn’t adopted by Chrome and AVIF is the next open format, there’s a specific way JPEG recompression could help: AVIF encoding takes way more CPU effort than JPEG (seconds to minutes, depending on effort). GPU assists are coming, e.g. the new gen of discrete GPUs has AV1 video hardware. But there’s a gap where you can’t or don’t want to deal with that. JPEG+recompression would be a more-efficient way to fill that gap.

                              1. 2

                                AVIF encoding takes way more CPU effort than JPEG (seconds to minutes, depending on effort).

                                Happily most modern cameras (i.e. smartphones) have dedicated hardware encoders built in.

                                1. 4

                                  None I know of has a hardware AV1 encoder, though. Some SoCs have AV1 decoders. Some have encoders that can do AVIF’s older cousin HEIF but that’s not the Web’s future format because of all the patents.

                                  I’d quite like good AV1 encoders to get widespread, and things should improve with future HW gens, but today’s situation, and everywhere you’d like to make image files without a GPU, is what I’m thinking of.

                                  Companies already do stuff like this internally, and there’s a path here that doesn’t require end-users know the other wire format even exists. It seems like a good thing when we’re never really getting rid of .jpg!

                                  1. 1

                                    Ah, sorry my bad I was thinking of the HEVC encoders, durrrrrr

                              2. 9

                                It’s just good engineering. We’ve all seen automated compression systems completely destroy reuploaded images over the years. It’s not something that users should care about.

                                1. 1

                                  Indeed, but my original (now unfixable) comment meant for end users. The original context of the current jpegxl stuff is the removal of jpegxl support from chrome, which meant I approached this article from the context of end users rather than giant server farms.

                                  1. 3

                                    I don’t quite understand how you get here. Browsers are used for getting images from servers to regular users’ faces. If they support JXL, servers with lots of JPEG images for showing to regular users can get them to the regular users’ faces faster, via the browser, by taking advantage of the re-encoding. Isn’t that an advantage for regular users?

                                2. 6

                                  The conversion can be automatically applied by a service like Cloudinary. Such services currently offer automatic conversion of JPEG to WebP, but that always loses quality.

                                  1. 6

                                    people aren’t going to go out and recompress their image library

                                      I’m not sure why you’d assume that. For many services that store lots of images it is an attractive option, especially given that it isn’t just an identical image, but can enable recreating the original file.

                                    E.g. Dropbox has in the past come up with their own JPEG recompression algorithm, even though that always required recreating the source file for display.

                                    1. 2

                                      You’re right - I didn’t make this clear.

                                          Regular users are not the ones who care, but they’re the people for whom it would need to be useful in order to justify re-encoding as a feature worth the attack surface of adding JPEG XL to the web. The fact that it kept being pulled up to the top of these lists just doesn’t make sense in the context of browser support.

                                      A much more compelling case can be made for progressive display - not super relevant to many home connections now, but if you’re on a low performance mobile network in remote and inaccessible locations or similar (say AT&T or TMobile in most US cities :D ) that can still matter.

                                          That said I think google is right to remove it from Chrome if they weren’t themselves going to be supporting it, and it doesn’t seem that many/any phones are encoding to it (presumably because it’s new, but also modern phones have h/w encoders that might help with HEIF, etc.?)

                                      1. 3

                                            Most regular users don’t host their own photos on the web; they use some service that they upload images to. If that service can recompress the images and save 20% of both their storage and bandwidth costs, that’s a massive cost and energy saving. Given that the transform appears to be reversible, I wouldn’t be surprised if they’re already doing the transcoding on the server side and just keeping small caches of JPEG images for frequently downloaded ones, transcoding everything else on the fly. If browsers support JPEG XL then their CPU and bandwidth costs go down.

                                        The surprising thing here to me is that the Google Photos team isn’t screaming at the Chrome team.

                                        1. 2

                                              the jpeg-xl<->jpeg transcoding is simply an improvement of the entropy coder, but more importantly, as long as there is no change in the image data the “cloud storage” provider is more than welcome to transcode however they feel - I would not be surprised if the big storage services are already doing something as good as or better than jpeg-xl.

                                              The reason it can do lossless transcoding is that it is able to essentially rewrite a jpeg, using a different extension to indicate a different entropy coder. There was nothing at all stopping cloud storage providers from doing this long before jpegxl existed or was a standard, and they don’t have the compatibility requirements a standards body is worried about, so I would not be surprised if providers were already transcoding, nor would I be surprised if they were transcoding using their own system and not telling anyone for “competitive advantage”.

                                          The surprising thing here to me is that the Google Photos team isn’t screaming at the Chrome team.

                                              Why? They’re already free to transcode on the server end, which I assume they do anyway, and I would assume they do a better job than jpeg-xl. For actual users, Chrome already supports a variety of other formats superior to jpeg (not xl), and seemingly on par with jpeg-xl (+/- tradeoffs). In my experience online views (vs. the “download image…” button) use resized images that are smaller than the corresponding JS most such sites use (and shrinking a hypothetical full-resolution image isn’t relevant, because they essentially treat online viewing as “preview by default”; besides, why provide a 3x4k version of a file to be displayed in a non-full-screen window on a display with a smaller resolution than the image?).

                                          The downside for Chrome of having jpeg-xl is that it’s yet another image format, a field renowned for its secure and robust parsers. I recall Safari having multiple vulnerabilities over the years due to exposing parsers for all the image formats imaginable, so this isn’t an imaginary worry.

                                          Obviously in a year or so, if phones have started using jpeg-xl the calculus changes, it also gives someone time to either implement their own decoder in a secure language, or the chrome security folk have time to spend a lot of effort breaking the existing decoder library, and getting it fixed.

                                          But for now jpeg-xl support in chrome (or any browser) is a pile of new file parsing code, in a field with a sub optimal track record, for a format that doesn’t have any producers.

                                          To me the most repeated feature that is used to justify jpeg-xl is the lossless transcoding, but there’s nothing stopping the cloud providers transcoding anyway, and moreover those providers aren’t constrained to requirements specified by a standard.

                                    2. 4

                                      I would actually go and re-encode my share of images, which for some reason* exist as JPEG, if I knew that this won’t give me even more quality loss.

                                      * Archival systems are a typical reason to have high amounts of jpeg stuff laying around.

                                      1. 4

                                        This repacking trick has been around for a while, e.g. there’s Dropbox Lepton: https://github.com/dropbox/lepton

                                        1. 1

                                          But the vast majority of users aren’t going to be doing that.

                                        2. 2

                                          You realized your first order mistake. The second order mistake is what really should be corrected.

                                          1. tone, “I’m sorry”, this is passive aggressive and not productive.

                                          2. Not realizing why this would be an advantage for a new format. This is your Chesterton’s Fence moment.

                                        1. 1

                                          There are some good potentially portable ideas embodied in the reference JXL encoder, like the option to target a perceptual distance from the original rather than a bitrate.

                                          There are also deep differences (spatial prediction!) but I bet the folks that worked on JXL could help advance AVIF encoding in terms of CPU efficiency (which AVIF badly needs) and reliable quality even with the set of encoding tools already fixed.

                                          1. 22

                                            The Web platform hasn’t figured out how to remove anything yet. It’s a Katamari Damacy of code. Every new feature will have to be maintained forever. Every new feature will be bloating every web view, and you’ll be redownloading them endlessly with every update of every Electron app you have.

                                            On a technical level, vendors are reluctant to add more code. It’s a maintenance cost. It’s a security risk. It may become a compat headache if they ever need to change it. What vendors actually add is somewhat complex and you could say “political” (sometimes their own interest for their main business, sometimes pressure from users and web standards, sometimes fire and motion play to make competitors look bad).

                                            Adoption of the AV1 codec was a pressing need, so it easily overcame the reluctance. Video bandwidth is way larger, with major business involved, and the open web had no answer to the commercial H.265 taking over. Without AV1 we’d have another decade of paying MPEG-LA and true free software being screwed (free-as-in-you’re-the-product browsers can purchase the license, but free-as-in-freedom browsers can’t).

                                            And with AV1 in, AVIF was relatively cheap and easy to add. AVIF images are pretty much just 1-frame videos.

                                            Non-Chrome(ium) browser vendors were very reluctant to adopt WebP before (for good reasons IMHO — WebP settled on the VP8 codec before it was fully baked in VP9). To avoid delayed adoption again, especially from Apple’s side, AVIF has been designed for Apple. The spec is basically “take the HEIF you already have, and put AV1 in it instead”. And it worked — we have AVIF in macOS, iOS, and Safari in a relatively short time.

                                            JPEG XL does compress a bit better and much faster, but that just wasn’t enough to convince all browser vendors to take on yet another C++ library they will never be able to remove, given that AVIF is already here. We’ve had JPEG 2000 and JPEG XR and a few other formats that were a bit better than the format before it, and they didn’t go anywhere on the Web. The Web is lagging behind on image compression. We still have 1989 GIF and Chrome refusing to support MP4 in all the places where GIF works! With AVIF in, satisfying the HDR and wide gamut use-cases that didn’t work well before, and miraculously being interoperable in all major browsers, there’s just little appetite to start over again for an incremental gain.

                                            1. 3

                                              One idea the “we can’t add things because we can’t remove them” aspect raises for me: the JPEG1 recompressor (formerly Brunsli) could work as a Content-Encoding. You can drop Content-Encodings (sdch!), and it’s smaller than JXL as a whole. There are enough JPEGs out there, and will be for a while, that saving a fifth or so of the bandwidth spent on them is a real win, and recompression is so much cheaper than software AVIF encoding.

                                              [After writing this, I said something similar in a comment on the Content-Encoding bug linked above; why not.]

                                              I suspect, organizationally/“politically,” the decision not to adopt JXL means they’re going to focus on AVIF and other ideas are mostly dead. It’s still kind of a nice thought.

                                              1. 4

                                                      sdch got dropped easily only because it never got widely adopted. Once anything is supported by a majority of browsers for a while, Hyrum’s law will find a way to make it unremovable. For example, you can’t remove support for gzip content encoding, because there certainly are some servers that send it unconditionally (violating the HTTP protocol), just because that happens to work.

                                                Lossless JPEG recompression is a great feature.

                                                It may be possible to find a way to adopt Web features temporarily. Maybe browsers should add new image formats, but refuse to decode them on Wednesdays, so that all sites have to maintain a working fallback, so that it never becomes unremovable.

                                                1. 2

                                                  Eventually that becomes true, though arguably if the jxl encoding gets widespread enough to have the gzip problem, your experiment has shown that the new codec is worthwhile. It’s mostly if it’s unadopted like sdch that you want to drop it like sdch. Doing the experiment in the open, versus behind a flag, helps with the chicken-and-egg problem of browsers and authors each not wanting to get ahead of the other.

                                                  On the “doesn’t work on Wednesdays” concept:

                                                  I could imagine browsers refusing to use a new format and issuing a diagnostic if the image doesn’t appear to have a fallback in a common format. Of course, that’s a hack: the fallback may not really work, or a <picture> might have an <img> fallback whose format can’t be guessed from the URL. But it might help keep reliance on new formats from being the path of least resistance.

                                                  Separately I wonder if there is some sort of natural condition where a new browser might make a reasonable choice to ask for the plain JPEG and so make it less attractive to be that noncompliant server that assumes it can always serve JXL. Like, maybe Brunsli’s not worth much on small-dimension images, or with slow enough CPUs or fast enough connections.

                                                  1. 2

                                                    I mean more than just experimenting to see if it’s going to get used enough to be unremovable. I mean being able to remove it once it is widely used, because JXL2/AVIF2/WebP2 will come and one-up it, and we’ll be back with “why do browsers support only this older codec when the new one is better?”.

                                                    You have to grease the fallbacks with hard breakage. On the web scale, approximately nobody pays attention to warnings in the console, and perhaps not even always to pages being visibly broken (there’s a bunch of sites developed once and left to rot). The fallback problem gets brought up when people ask “why don’t browsers bundle jQuery, even from a well-known URL with subresource integrity” and the answer is that if the URL isn’t actually used, it will go down or out of sync, and devs won’t notice.

                                              2. 1

                                                (Also, strong agree that a VP9-based WebP released around when VP9 came out would’ve been a much stronger contender.)

                                              1. 11

                                                Huge bummer. JXL has things that are qualitatively different from AVIF: progressive decoding for better perceived load speed, .jpg recompression to save bandwidth on a ton of existing content out there while allowing bit-for-bit recovery of the original, a really good lossless mode, and a faster software encoder.

                                                I think there’s a good case for having both around: JXL has everything above, but AVIF handles extreme photo compression really well and seems natural as (say) an Android phone camera’s next-gen codec, especially when phones have AV1 hardware for video already.

                                                I wonder if, and on what timeframe, AVIF gets pickup outside of the highly-optimized pipelines that serve WebP and multiple resolutions now. Apple just added AVIF decoding to their OSes (making it work in Safari) and next-gen dGPUs are getting hardware AV1 encoders. Faster software encoding would help.

                                                1. 2

                                                  One implementation of this for Rust is in the im crate.

                                                  An idea I’d be interested to see done, but can’t imagine having the tuits for, is something kind of like it that’s mutable and has medium-sized gap buffers as the leaves.

                                                  The gap buffers can handle a lot of patterns efficiently (like indexing/swapping for a sort, inserting/deleting a chunk of items anywhere, filtering, merging two lists), and because of that, you can probably make the leaves a little larger while still having decent practical performance in many cases. Larger leaves can also reduce the memory overhead, and the cost of common things like iteration. If you’re doing a bunch of totally random inserts and deletes you copy a lot of items around, but it’s “just” a large constant-factor slowdown (proportional to the leaf page size, not the total list size).
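
                                                      To make the gap-buffer part concrete, a toy version over a plain Python list (illustration only; a real leaf would have fixed capacity and split when full):

                                                        class GapBuffer:
                                                            def __init__(self, items=(), capacity=16):
                                                                n = max(capacity, 2 * len(items))
                                                                self.buf = list(items) + [None] * (n - len(items))
                                                                self.gap_start, self.gap_end = len(items), n  # gap = buf[gap_start:gap_end]

                                                            def _move_gap(self, i):
                                                                # Cost is proportional to distance moved, so clustered edits are cheap.
                                                                while self.gap_start > i:
                                                                    self.gap_start -= 1; self.gap_end -= 1
                                                                    self.buf[self.gap_end] = self.buf[self.gap_start]
                                                                while self.gap_start < i:
                                                                    self.buf[self.gap_start] = self.buf[self.gap_end]
                                                                    self.gap_start += 1; self.gap_end += 1

                                                            def insert(self, i, item):
                                                                self._move_gap(i)  # (growing when the gap is empty is omitted)
                                                                self.buf[self.gap_start] = item
                                                                self.gap_start += 1

                                                            def delete(self, i):
                                                                self._move_gap(i)
                                                                self.gap_end += 1  # absorb the item into the gap

                                                            def __iter__(self):
                                                                yield from self.buf[:self.gap_start]
                                                                yield from self.buf[self.gap_end:]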

                                                  For the specific case of strings, someone did something spiritually similar with jumprope (also in Rust as jumprope-rs), with a skip list instead of a tree. For what it was designed for (applying edits quickly) it does perform quite well.

                                                  1. 1

                                                    But this is a persistent data structure, so the nodes are immutable, i.e. copy-on-write. The benefit of a gap buffer is fewer bytes to move when you update it in place, but nothing’s ever updated in place here.

                                                    1. 1

                                                      But this is a persistent data structure, so the nodes are immutable, i.e. copy-on-write. The benefit of a gap buffer is fewer bytes to move when you update it in place, but nothing’s ever updated in place here.

                                                      Yes, I didn’t mean that as an idea for Clojure’s vectors and should have spelled that out. Different use case that just comes to mind because you might use a loosely related data structure.

                                                      I’m saying, if you want a mutable vector but could use cheap inserts/deletes in the middle, a mutable RRB tree with weird leaves might be interesting. It’d take an implementation to know if there is any value there, including whether those gap buffer leaves are worth having in practice.

                                                  1. 3

                                                    FYI, this isn’t by the Robinhood trading platform; the cache policy itself is named RobinHood because it takes cache resources from ‘cache-rich’ applications and gives them to ‘cache-poor’ ones. (The title and researcher affiliations are in the paper and all of the affiliations are with academic institutions, not industry.) I was also confused for a bit, so you’re not alone. [ed: I submitted a title-change suggestion to reflect this.]

                                                    It’s an interesting idea. In our system at work, we still don’t have any notion of “miss cost” or “priority” to decide what to keep in cache, and we get by despite that because we just don’t need too much RAM in the big scheme of things. So I guess I’m interested in practical/widely available tools for (metaphorically) walking in addition to ideas for how to run faster.

                                                    1. 1

                                                      thanks! I definitely overlooked it.

                                                    1. 8

                                                      taking containers and upgrading them to full-fledged virtual machines

                                                      I’m not at all an expert on containers but wasn’t one of the first reasons to move to containers that they were lightweight? I’ve seen this other places too, I’m wondering why there’s been the shift to VMs now? Or what’s the benefit of VMs that plain containers don’t have?

                                                      1. 21

                                                        The main downside of containers is that they don’t really have any security guarantees: the Linux kernel interface is a big attack surface for apps to escape through.

                                                        I believe Amazon started using Firecracker VMs a few years ago; they’re lighter-weight VMs that provide more security than the Linux kernel does for containers.

                                                        Google has something like this as well, I think gVisor.

                                                        Either way, if you can take OCI images and run them in a sandbox with stronger guarantees, without making startup too slow or making it hard to mount directories or file systems, that’s a win.

                                                        1. 19

                                                          Firecracker is exactly the technology that fly.io uses.

                                                          1. 3

                                                            Not meant as a correction as much as a fun rabbit hole: gVisor is not always a VM-based hypervisor like Firecracker but this oddball sui generis thing: in one of its modes, a Go process intercepts the sandboxed process’s system calls with ptrace and makes other system calls to the Linux kernel as needed. The calls it makes to the “real” Linux kernel are a subset of all the possible ones, which was enough to mitigate some kernel bugs.

                                                            I think that link is saying that even in KVM mode, it’s still running its own pseudo-kernel vs. doing more traditional virtualization, but using KVM to provide some extra isolation and speed some things up.

                                                            (gVisor is also a place where race conditions are unusually relevant, because it’s meant to be a security boundary and you can poke at it using a multithreaded local process.)

                                                            1. 1

                                                              And when we get security guarantees for containers then containers will be the new VMs for your containers for VMs.

                                                              1. 3

                                                                And when we ditch Linux for something with security guarantees for native processes then native processes will be the new VMs for containers for VMs for native processes.

                                                            2. 10

                                                              To save you some external reading if you just want a quick answer: VMs are fast now, thanks to mature CPU extensions for things like nested page tables, paravirtual driver improvements, and technologies like Firecracker that boot VMs extremely quickly on minimal virtual hardware profiles.

                                                              From a security perspective, it’s easier to rely on hardware extensions that isolate page tables and IO devices than on multitudes of permission checks added into every corner of the kernel to support containers. It reduces the security problem to a few key areas.

                                                              From a performance perspective, it’s faster to rely on hardware extensions that keep guest pages separate from host pages than on multitudes of nested or partitioned data structures throughout the kernel. Theoretically there should be no difference, but the kernel has already gone through decades of optimization for the base case, whereas these containerized/namespaced data structures are relatively new. And certain security features can never be implemented with equal performance: syscall filtering, for example, just isn’t necessary for a VM, since the VM doesn’t interact with the host on that level. And these container security features don’t have hardware acceleration the way modern VMs do.

                                                              Ironically, all this hardware virtualization acceleration was done for the cloud, so you could argue that VMs are more literally “cloud native” than containers, as VMs are natively supported by all modern server hardware. 😉

                                                              1. 3

                                                                Operating Proxmox (a KVM/QEMU/Ceph distribution), I can say it’s a breeze. Snapshots, hardware config, HA, etc. are really easy. So yeah, I’m a fan of VMs, and if your CPU isn’t from 20 years ago, you’ll have no performance issues operating them.

                                                                1. 2

                                                                  I’m glad VMs are seeing a resurgence in popularity now, but one thing I am happy that came out of the Docker/containers world is consistent tooling, management, configuration and deployment. I don’t have to mount in a .conf file somewhere, a .yaml file somewhere else, ssh in and manually edit some file. For the most part, my deployments all look the same: a bunch of key-value environment variables, loaded the same way, written the same way and easily understood compared to many different configuration formats with their own syntax and rules.

                                                                  So the main reason I love Fly.io is I can keep all of that consistency that Docker gave me while benefitting from Firecracker turning those container images into nice fast VM images. It’s also portable so I can test my containers locally with a simple docker command!
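
                                                                  As a toy example of the kind of consistency that buys (12-factor style; the variable names here are hypothetical):

                                                                  ```python
                                                                  import os

                                                                  # 12-factor-style config: every deployment-specific value is a
                                                                  # key-value environment variable, loaded the same way everywhere.
                                                                  DATABASE_URL = os.environ["DATABASE_URL"]            # required: fail fast if unset
                                                                  CACHE_TTL = int(os.environ.get("CACHE_TTL", "300"))  # optional, with a default
                                                                  DEBUG = os.environ.get("DEBUG", "false").lower() == "true"
                                                                  ```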

                                                                  1. 1

                                                                    > VMs are fast now.

                                                                    Assuming you mean “fast to start up” — can you quantify “fast”?

                                                                    1. 3

                                                                      ~125ms.

                                                                      But I also mean fast as in low execution overhead. Any performance difference you see between a VM on a cloud provider and a bare metal core on the same hardware has a lot more to do with noisy neighbors or CPU frequency scaling limits than actual virtualization overhead.

                                                                      1. 1

                                                                        > ~125ms.

                                                                        Gotcha. This is fine for a ton of use cases. I usually work in environments where SLOs are defined in terms of roundtrip or time-to-first-byte latency, and the p99 targets are typically on the order of single-digit milliseconds. Alas.

                                                                  2. 6

                                                                    They’ve written a bit about their Docker-container-to-VM architecture:

                                                                    1. 2

                                                                      VMs got a bad rap when they were just traditional servers in a box.

                                                                      The easiest way to minimize state in applications was to change technologies entirely, to containerization.

                                                                      VMs have not yet hit their limits. Just don’t do “traditional server in a box”.

                                                                      1. 1

                                                                        While being lightweight was the original point, people have since adopted “lightweight VMs” (like Firecracker, as other commenters have pointed out).

                                                                        Containers are still the packaging/distribution method du jour, though, so it makes sense to keep them around.

                                                                      1. 1

                                                                        A fun thing about this is that while most of us write for CPUs with a multiply instruction, this is one hop away from how you implement fancier operations used in cryptography, like elliptic-curve scalar multiplication for ECC or modular exponentiation for RSA or DH. For modular exponentiation, you can square-and-multiply: repeatedly square a number instead of doubling it, then multiply some of the results together instead of adding them. For elliptic curves, you can repeatedly double and add, but “addition” is defined differently. (Wikipedia mentions faster ways for both, but those basic methods get the right result.) Unlike with plain multiplication/division, those operations don’t have a known fast, general way to compute the inverse, which is why they’re interesting for cryptography.
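
                                                                        A minimal sketch of square-and-multiply (the function name is mine; for real use, Python’s built-in pow(base, exp, mod) already does this):

                                                                        ```python
                                                                        # Square-and-multiply: compute (base ** exp) % mod by scanning the
                                                                        # bits of exp. Squaring plays the role of doubling in double-and-add,
                                                                        # and multiplying plays the role of adding.
                                                                        def modexp(base, exp, mod):
                                                                            result = 1
                                                                            base %= mod
                                                                            while exp > 0:
                                                                                if exp & 1:                   # low bit set: "multiply" step
                                                                                    result = result * base % mod
                                                                                base = base * base % mod      # "square" step (the doubling analogue)
                                                                                exp >>= 1
                                                                            return result

                                                                        assert modexp(5, 117, 19) == pow(5, 117, 19)  # matches the built-in
                                                                        ```

                                                                        (This sketch isn’t constant-time, which is one of many reasons for the caveat below.)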

                                                                        That does not mean go implement your own crypto, but it is cool that at least that one piece of what crypto libraries are doing is relatively accessible using our intuitions about regular old arithmetic. And if someone more informed about this wants to fill in details go for it–just trying to give a general impression of the similarities.

                                                                        1. 15

                                                                          This is an incredible writeup for really showing the dead ends (the motivation, the effort, and how it turned out), the practicalities (time lost due to random bugs or random things tanking performance), and noting what they don’t know. It’s common not to do any of that, and besides those details being interesting in themselves, the overall picture is encouraging to all of us actual humans who hit dead ends, have bugs in our code, and don’t know everything. I love it.

                                                                          (As a random specific thing, it was kind of neat that they found removing .clone()s in their Rust code had less effect than expected because the compiler was already optimizing the copies away for them. In particular, it might make me feel a little freer to try cloning my way around a borrow-checker roadblock if I play with Rust and hit one.)

                                                                          1. 2

                                                                            I really enjoyed it for similar reasons. I’m not especially familiar with ML techniques, but at every step in this blog post I was able to go “huh, yeah that sounds like an interesting avenue, I wonder how it will turn out”.

                                                                            It’s not easy to write like that, so huge kudos to the authors.

                                                                          1. 3

                                                                            At work, we have a kind of soft deletion we call “hide”/“hidden” in the UI. It’s used on tables with under a million rows so we haven’t had to think much about subtle effects on scaling. We subclassed Django’s “managers” so that searching for lists of items, by default, excludes hidden objects, with an escape hatch for when we want the hidden stuff. (Around the same place, we also add in some timestamping, etc.) Users can navigate to a list of the hidden objects of a given type and unhide them–it’s much more Move to Trash than rm.
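
                                                                            A minimal sketch of that manager pattern (model and field names are hypothetical, not our actual code):

                                                                            ```python
                                                                            from django.db import models

                                                                            # A manager that excludes hidden rows by default, plus a plain
                                                                            # manager as the escape hatch that includes them.
                                                                            class VisibleManager(models.Manager):
                                                                                def get_queryset(self):
                                                                                    return super().get_queryset().filter(hidden=False)

                                                                            class PageTemplate(models.Model):
                                                                                name = models.CharField(max_length=200)
                                                                                hidden = models.BooleanField(default=False)

                                                                                objects = VisibleManager()      # default: hidden rows excluded
                                                                                all_objects = models.Manager()  # escape hatch: includes hidden rows

                                                                            # PageTemplate.objects.all()     -> only visible templates
                                                                            # PageTemplate.all_objects.all() -> everything, hidden included
                                                                            ```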

                                                                            Hiding is non-cascading, and by default hidden objects still show up when searched for by ID or accessed via a relation, which has usually worked out to be a feature. For example, you can hide outdated webpage templates so they’re no longer offered when creating new pages, but not affect the pages already published. And hiding a page with a form doesn’t undo/lose submissions that came in through it (not something everyone would want, but we do).

                                                                            It helps to have had this from early on, and that some UI bits are factored so some of the trickiness of e.g. hidden-aware item lists only needs to happen once. I recall we may’ve had to hack in some stuff to show the right UI when unhidden objects reference hidden ones.

                                                                            It’s just one strategy and we use others depending on context. For some other tables, certain values of status columns can have soft-delete-like effects. We track history for some tables, but would like it for more. We also have hard deletes in various places: for some non-hideable things of course, but also via the API for most (all?) hideable stuff, and in the GUI where not having the data really is the goal (e.g. “delete my data” requests from users).

                                                                            It has created some work: the combo of hiding and unique indexes can confuse users (“it says I can’t reuse this unique name, but I don’t see anything using it!”). But overall this particular feature has been pretty low-regret after a number of years with it. I do want to repeat the caveat that we don’t use it everywhere and don’t think it would make sense to.