Threads for hyperpape

    1. 2

      Strange that the article makes no mention of the well-known metaphor of technical debt. (Comparing metaphors is useful!)

      Aside: It looks like the author has been blogging since 2008.

      1. 2

        I found it refreshing. Debt is a poor metaphor and we really need to learn how to talk about these things another way.

        1. 2

          What examples (or “conceptual limitations” if you prefer) illustrate when the technical debt metaphor fails? How does your preferred metaphor do better? (Just hoping for a comparative discussion.)

          For me, some limitations of the tech debt metaphor are:

          1. the implication that tech debt is commensurate / fungible. (You mentioned this in another comment). To be frank, though, I struggle to find a clear example showing how this conceptual limitation tanks the metaphor. Seems to me people simply assign a higher price tag to fixing harder (or more embedded) forms of tech debt, which is good enough for most purposes.

          2. the implication that one can pay down the debt in the order of one's choosing … but with software (as in physical construction) there are strong dependencies between components. One might have to spend a lot of effort fixing a broken foundation, which only unlocks the potential to then address the more painful symptoms.

          For a metaphor to be really broken, it has to actively lead people astray, I think. Where does the tech debt metaphor do that? I think we agree that having it as a metaphor is better than nothing.

          So what are some better metaphors? What alternative metaphors address these concerns while also being clear and easy to apply? (How complicated are they? Are they worth the conceptual cost?)

          1. 2

            If you have a reason that the tech debt metaphor is bad, you should actually say it. If you did that, your comment might be useful. Right now, it isn’t.

            Doing my part: the metaphor of tech debt suggests that sometimes, you can make a choice that will have a cost, but only later (aka, a debt).

            If that choice has an immediate benefit, then you have a delicate decision about how to handle it. Depending on the size of the immediate benefit, and the nature of the long-term cost, it may be worth it. Whether it is worth it may depend on whether you actually pay down the cost, or make the situation worse by piling on more debt.

            Is that a good metaphor? I think it has one thing going for it: technical choices often do work that way. A choice is convenient today, but has a cost to you later on. And then you have to decide: is it worth it to speed things up today? Can we fix the problems later? Will we? Or is taking on that debt a mistake?

            1. 2

              Is that a good metaphor? I think it has one thing going for it: technical choices often do work that way. A choice is convenient today, but has a cost to you later on. And then you have to decide: is it worth it to speed things up today? Can we fix the problems later? Will we? Or is taking on that debt a mistake?

              It’s a bad metaphor because debt is abstract and fungible. The reasons why a technical decision is wrong or inapposite are always concrete, and non-fungible. “Technical debt” is caused by code that reflects concrete circumstances that are no longer true. Thinking about that in financial terms does more harm than good, because all of the ways you would attack that problem in the realm of finance don’t apply.

              1. 2

                I’ll jump in. financial debt is mostly one dimensional. “Tech debt” is not. Tech debt has many different and hard to understand implications. Calling it like that gives non technical people a wrong idea about it.

                For example, tech debt can turn the code base into a minefield. New features are not only slower or more costly to build; they can also easily break things, the timeline becomes unpredictable, and the potentially nonsensical work involved demotivates the developers. I've seen people quit because of that.

                So when you say "tech debt", many people will never make the association with "potentially increased risk of attrition", but they should.

                Hence I agree, it’s not a good term.

                1. 3

                  I like the term "debt" here because it indicates not only that the debt has to be paid back, but also that there are "interest payments". That interest is exactly the hours spent working in the minefield you mention!

                  And just like with real debt, these interest payments can make the decision spicy: if I add a new feature quickly (and take on tech debt), we can show it at the customer demo scheduled next week, and hopefully win a new customer and more revenue. And in the weeks after, we can clean up the quick-and-dirty feature implementation, so that we don't have to pay the interest (unreadable code base, more interdependencies, more complexity) for a long time. That might be a good investment.
                  But if I just take on debt out of convenience, without getting any additional revenue from the investment, that sounds like a bad deal.

                  Also, similar to financial debt, a company can take on so much technical debt that it spends all its time paying off the interest (BTDT). That's quite an illuminating similarity, I think.


                  But I agree that the “technical debt” metaphor has some drawbacks:

                  • in contrast to financial debt, sometimes you can erase technical debt without repercussions (e.g. if the affected module is later deleted anyway, I never have to pay back that debt). Not sure if that ever happens in finance.
                  • managers with a background in economics might have different associations with "debt", e.g. "do I have to provide collateral for technical debt?!". So the intended meaning of the metaphor might be lost on them.
                  • worst of all, I find it really difficult to measure and quantify technical debt. In contrast to financial debt, I cannot easily tell my manager how much interest we are currently paying; at best I can say "this feature took longer to implement due to tech debt", but a) only in hindsight and b) not very accurately.

                  So I think the term is far from perfect. But I can’t think of a better one either.

                  1. 1

                    Yeah, so the drawbacks you mention (especially the first one) have such a big impact that I think calling it "tech debt" is a bad idea. It's not sufficient that there's "some" kind of overlap, IMHO.

          2. 1

            There is a ton of material here, and while I'm not finished, it seems very detailed and well thought out. It cites Joe Duffy's post from a decade or so ago; if you're a fan of that post, you'll probably enjoy this one.

            To highlight one point: this article very clearly lays out the case that error handling is contextual, and whether a particular error is recoverable or not is almost never based on the error itself, but on the full context the error happens in. For instance, it gives examples where (at least in principle) reading past the bounds of an array is a recoverable error.

            I don’t think the contextual nature of error handling is new to this piece (I’ve been thinking about it for years, and I suspect I didn’t come up with it on my own), but it’s very well stated.

            1. 7

              I’m a layperson on this topic, but here’s my best understanding of what’s going on.

              Take the function unique :: List[Int] -> Bool. There’s an obvious O(n) time algorithm: iterate through the list, throw everything into a hash table, return false if a number appears twice. This also requires (auxiliary) O(n) space, with the worst case being “the only duplicate is in the last element”.

              There’s another algorithm you could do: sort the list and then compare adjacent elements. This requires O(nlog(n)) time but only O(1) auxiliary space (for the sort), meaning the “unique problem” has a more spatially efficient solution than it has a temporally efficient solution.

              (And here’s where I think I’m already wrong: the two solutions have different auxiliary space usage, but they both have the same overall space complexity O(n), as you need already need that much space to store the whole input! I just don’t know any good examples of total space being more efficient, having thought about this for all of ten minutes).

              It’s been known for a long time that this is true in the general case: any problem’s certain time complexity is “larger” than its space complexity. The best bound since 1977 is “not that much more efficiently”: O(f(n)) time can be solved in O(f(n)/log(f(n))) space. So, like O(n²) can be solved in O(n²/log(n)) space, which grows almost as fast as n² anyway.

              This paper claims that O(f(n)) time can be solved in O(sqrt(f(n) log f(n))) space, so O(n²) time can be solved in O(n*sqrt(log n)) space, which is an enormous improvement. The blog says that if the paper passes peer review, it'll be a major step towards resolving P ≠ PSPACE!
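
              For the arithmetic behind that last step (my own working, not the paper's): plugging f(n) = n² into the new bound gives

                  \sqrt{f(n)\,\log f(n)} \;=\; \sqrt{n^{2}\cdot \log(n^{2})} \;=\; \sqrt{2\,n^{2}\,\log n} \;=\; n\sqrt{2\log n} \;=\; O\!\left(n\sqrt{\log n}\right)

              whereas the 1977 bound only gets you down to n²/log n.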

              Anyone who’s an actual complexity theorist should please weigh in, I have no idea how much of my understanding is garbage

              1. 3

                A lazy space-efficient solution for unique is just O(n²): compare each element to every other element.

                You might be able to get away with the sort if you have some sub-linear auxiliary structure (permutation index?) that lets you un-sort the list when done (if I understand catalytic computing right, which I likely don’t). Edit: never mind, log(n!) to even count the permutations is going to have n outside the log anyway.

                1. 3

                  Iirc from the lectures I attended a couple of months ago, when it comes to log-space, or at least less-than-linear space, we treat the input tape as read-only and the output tape as write-only (and maybe also append-only). The auxiliary space is all you measure, but that read-only-ness is what breaks your sorting solution, because sorting requires mutation or copying. (I'm sure there are other ways of formalising it.)

                  The simple log-space solution would be the brute-force compare that @hcs mentioned. I think you would use one pointer/index for each of the two elements you are currently comparing, which would take log(n) bits each.
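
                  Concretely, a sketch of that brute-force check (my own illustration, not from the lectures): the only working state is the two indices, each of which needs about log(n) bits, and the input is only ever read.

                      // O(n^2) time, but the auxiliary state is just two indices
                      // (~log n bits each); the input slice is never mutated,
                      // matching the read-only input tape model.
                      fn unique_two_indices(xs: &[i64]) -> bool {
                          for i in 0..xs.len() {
                              for j in (i + 1)..xs.len() {
                                  if xs[i] == xs[j] {
                                      return false; // found a duplicate
                                  }
                              }
                          }
                          true
                      }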

                  1. 1

                    Ah, that makes so much sense now. Thank you!

                    Edit: when you say “write only”, does that mean you can’t write some intermediate result to the aux space and then read it back later?

                    1. 1

                      I mentioned an output tape as the third tape because I was remembering log-space reductions (transforming the input from one problem to another using only log auxiliary space).

                      The auxiliary space is read+write. So for sublinear space machines you have read only input, sublinear auxiliary space, and the machine either accepts, rejects, or diverges.

                      In the log space reduction/transducer case I was remembering, you also then output a transformed result onto a write-only output tape, the key being you can't store data there (and get around the aux space restriction), only append more output.

                      Another interesting thing (also pointed out on the wiki linked below) is that you can't do O(n + log n) with a read-write input tape, because linear working space can be recovered using the linear speedup theorem (I always think of it as SIMD for Turing machines: you can transform four input binary bits into one hex symbol, for example, and make your input 4x smaller at the cost of a bigger state machine).


                      Some other bits:

                      Formally, the Turing machine has two tapes, one of which encodes the input and can only be read, whereas the other tape has logarithmic size but can be written as well as read. Logarithmic space is sufficient to hold a constant number of pointers into the input and a logarithmic number of Boolean flags, and many basic logspace algorithms use the memory in this way.

                      https://en.wikipedia.org/wiki/L_(complexity)

                      And the textbook referenced by Wikipedia and my lecturer: Sipser 1997 https://fuuu.be/polytech/INFOF408/Introduction-To-The-Theory-Of-Computation-Michael-Sipser.pdf The definition of a log-space machine is on page 321 (342 of the PDF).

                  2. 1

                    I think they have the same time complexity as well, since integer sort is O(n) (with any lexicographic sort like radix sort).

                    1. 3

                      Radix sort is O(n) if the elements of the array have a limited size. More precisely it’s O(nk) where k is the element size, which for integer elements is k = log m where m is the integer value of the largest element.

                      Edit: and the question of the existence of duplicate elements is only interesting if m > n (otherwise it's trivially true), so in this context radix sort can't do better than O(n log n).
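
                      (Spelled out, as my own working: with k = log m and the interesting case m > n,

                          nk \;=\; n\log m \;>\; n\log n,

                      so in this setting the radix-sort route is no faster than the comparison sort.)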

                  3. 7

                    I’m surprised it works on Windows, because the kernel docs suggest it shouldn’t. DRM’d media is sent to the GPU as an encrypted stream, with the key securely exchanged between the GPU and the server. It’s decrypted as a special kind of texture and you can’t (at least in theory) copy that back, you can just composite it into frames that are sent to the display (the connection between the display and GPU is also end-to-end encrypted, though I believe this is completely broken).

                    My understanding of Widevine was that it required this trusted path to play HD content and would downgrade to SD if it didn’t exist.

                    No one is going to create bootleg copies of DRM-protected video one screenshotted still frame at a time — and even if they tried, they’d be capturing only the images, not the sound

                    If you have a path that goes from GPU texture back to the CPU, then you can feed this straight back into something that recompresses the video and saves it. And I don't know why you'd think this wouldn't give you sound: the secure path for sound usually goes the same way, but most things also support sound via other paths because headphones typically don't support the secure path. It's trivial to write an Audio Unit for macOS that presents as an output device and writes audio to a file (several exist, I think there's even an Apple-provided sample that does). That just leaves you having to synchronise the audio and video streams.

                    1. 13

                      I’m pretty sure that what Gruber is describing is basically just “hardware acceleration is not being enabled on many Windows systems”, but because he has his own little narrative in his head he goes on about how somehow the Windows graphics stack must be less integrated. Windows is the primary platform for so much of this stuff!

                      I would discount this entire article’s technical contents and instead find some other source for finding out why this is the case.

                      1.  

                        Well, it depends on the type of acceleration we're speaking of. But I've tried forcing hardware acceleration for video decode, and honestly you'd be surprised how often it failed, and I did this on rather new hardware. It was actually shockingly unreliable. I'm fairly certain it's significantly worse if you extend your view to older hardware and other vendors.

                        I’m also fairly sure, judging by people’s complaints, that throwing variable refresh rate, higher bit depths and hardware-accelerated scheduling in the mix has not resulted in neither flagship reliability or performance.

                        It can be the primary platform but this doesn’t mean it’s good or always does what it should or promises it’ll do.

                        1. -1

                          I’m pretty sure that what Gruber

                          Wait wait wait is this,,, checks URL, oh, lmao. Yeah, Gruber is useless; there's literally no point in ever reading a single word he says.

                        2. 4

                          I think it means: enabling the feature to screenshot DRM-protected media would not by itself enable piracy, since people would not use screenshots to pirate media a frame at a time.

                          What you are saying reads like “one technical implementation of allowing screenshots would enable piracy.” I trust that you’re probably right, but that doesn’t contradict the point that people would not use that UI affordance itself for piracy.

                          1. 17

                            No one would use screenshots for piracy because all the DRM is already cracked. Every 4K Netflix, Disney, etc. show is already on piracy websites, and they're not even re-encoded from the video output or anything; it's straight up the original H.264 or H.265 video stream. Same with Blu-rays.

                            1. 3

                              Yup, if you go through GitHub there are several reverse-engineered implementations of Widevine, which just allow you to decrypt the video stream itself with no need to re-encode. That moves the hard part to getting the key. The lower-security keys are fairly easy to get, since you can just root an Android device (and possibly even get one from Google's official emulator? At least it supports playing Widevine video!). The higher-security ones are hardcoded into secure enclaves on the GPU/CPU/video decoder, but clearly people have found ways to extract them; those no-name TV streaming boxes don't exactly have a good track record on security, so if I were to guess, that's where they're getting the keys.

                              Still, no point blocking screenshots: pirates are already able to decrypt the video file itself, which is way better than re-encoding.

                              1. 6

                                Those no-name TV streaming boxes usually use the vendor's recommended way to do it, which is mostly secure, but it's not super-unusual for provisioning data never to be deleted off the filesystem, even on big-brand devices.

                                The bigger issue with the DRM ecosystem is that all it takes is for one secure enclave implementation to be cracked, and they have a near infinite series of keys to use. Do it on a popular device, and Google can’t revoke the entire series either.

                                Personally, I’m willing to bet the currently used L1 keys have come off Tegra based devices, since they have a compromised boot chain through the RCM exploit, as made famous by the Nintendo Switch.

                            2. 6

                              Early DVD rips, before DoD Speed Ripper, jacked into a less-than-protected PowerDVD player and did just screenshot every frame.

                          2. 4

                            I get the feeling that Uncle Bob developed TDD in the environment he found himself in most: a legacy system that needs to be modified, and thus the smallest amount of code is preferred over large amounts of code. For a green-field program, I think TDD would be a miserable failure, much like Ron Jeffries' sudoku solver.

                            1. 9

                              Just fyi, Bob did not invent TDD. Your thought still stands as a possibility, but wanted to point that out to readers.

                              1. 3

                                TDD is test driven design. If you already have a large amount of badly written code, you can't really design it. Not saying that testing is bad or that no design happens at all, just that in this context it is more of a refactoring tool than a design tool.

                                1. 6

                                  As far as I know TDD means Test Driven Development, or at least that's how I've always seen it. Was UB referring to the D in TDD as Design?

                                  1. 2

                                    Sorry, I stand corrected. Somehow I believed all my life that the last D stands for design.

                                    1. 4

                                      There are certainly some hardcore TDD fans who insist that TDD is a design methodology! But that’s not what it was initially coined as.

                                      1. 2

                                        Yes there is a school of thought that does preach the D as “design”. The idea is, you never add new code unless you’re looking at a red test, and thereby you guarantee both that your tests really do fail when they should, and that your code truly is unit testable.

                                        I’m not really an advocate except in maybe rare circumstances, but that’s the idea.

                                        So the original D meant “Development”, and then another camp took it further and advocated for it to mean “Design”.

                                2. 3

                                  I’m curious what you mean? My experience is that TDD is much better in a greenfield project, because you’re not beholden to existing interfaces. So you can use it to experiment with what feels right a lot easier.

                                  1. 2

                                    In my understanding of TDD, one writes a test first, then writes only enough code to satisfy that one test. Then you write the next test, enough code to pass that test, and repeat.

                                    Now, with that out of the way, a recent project I wrote was a 6809 assembler. If you were to approach the same problem, what would be your very first test? How would you approach such a project? I know me, and having to write tests before writing code would drive me insane, especially in a green field project.

                                    1. 7

                                      I wrote a Uxn assembler recently, and while I don't practice TDD at all in my day-to-day, in this case it felt natural: get a sample program and a sample assembled output, add it to a test, and build enough until that test passed; then add a second, more complex example and do the same; and so on. I ended up with 5 tests covering quite a bit of the implementation. At the start I just had the bare minimum in terms of operations and data flow. By the fifth test I had an implementation that has done well so far (it's not perfect, but that was an explicit tradeoff; once I find limitations I'll fix them).

                                          #[test]
                                          fn test_basic_assemble() {
                                              let src = "|100 #01 #02 ADD BRK".to_string();
                                              let program = assemble(src).unwrap();
                                      
                                              assert_eq!(program.symbol_table, BTreeMap::new());
                                              assert_eq!(
                                                  program.rom,
                                                  vec![0x80, 0x01, 0x80, 0x02, 0x18, 0x00],
                                              )
                                          }
                                      
                                      1. 3

                                        I’ve not done this with an assembler, but I’ve tried to do this with projects with a similar level of complexity, including a Java library for generating code at runtime. This is probably a skill issue, but I always end up with a lot of little commits, then I end up with some big design issue I didn’t anticipate and there’s a “rewrite everything” commit that ends up as a 1000 line diff.

                                        I still aim to do TDD where I can, but it’s like the old 2004 saying about CSS: “spend 50 minutes, then give up and use tables.”

                                      2. 4

                                        First, you are totally correct that “true” TDD proponents say that you have to drive every single change with a failing test. Let me say that I don’t subscribe to that, so that might end the discussion right there.

                                        But I still believe in the value of using tests to drive development. For example, in your assembler, the first test I would write is an end-to-end test, probably focusing on a single instruction.

                                        To get that to pass, you’ll solve so many problems that you might have spent a bunch of time going back and forth on. But writing the test gets you to make a choice. It drives the development.

                                        From there, more specific components arise, and you can test those independently as you see fit. But, an assembler is an interesting example to pick, because it’s so incredibly easy to test.
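
                                        Concretely, that first end-to-end test might look something like this (a hypothetical sketch, not your code: `assemble` is an assumed function that takes source text and returns the output bytes, and on the 6809 RTS assembles to the single byte 0x39):

                                            #[test]
                                            fn assembles_a_single_instruction() {
                                                // End-to-end: source text in, machine code out.
                                                let rom = assemble("RTS").unwrap();
                                                assert_eq!(rom, vec![0x39]);
                                            }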

                                        1. 1

                                          First, you are totally correct that “true” TDD proponents say that you have to drive every single change with a failing test. Let me say that I don’t subscribe to that, so that might end the discussion right there.

                                          Then I can counter with, “then you haven’t exactly done TDD with a greenfield project, have you?”

                                          When I wrote my assembler, yes, I would write small files and work my way up, but in no way did I ever do test, fail, code, test, ad nauseam. I was hoping to get an answer from someone who does, for lack of a better term, "pure TDD" even for greenfield development, because I just don't see it working for that. My own strawman approach to this would be:

                                          #include <stdio.h>
                                          #include <string.h>
                                          
                                          int main(void)
                                          {
                                            char buffer[BUFSIZ];
                                            scanf("%s\n",buffer);
                                            if (strcmp(buffer,"RTS") == 0)
                                              putchar('\x39');
                                            else
                                              fprintf(stderr,"bad opcode\n");
                                            return 0;
                                          }
                                          

                                          That technically parses an opcode, generates output and passes the test “does it assemble an instruction?”

                                          1. 2

                                            What is the problem with the code you posted?

                                            I have done “true” TDD on a greenfield project, and it was fine. It’s just an unnecessary thing to adhere to blindly. From the test you have here, you would add more cases for more opcodes, and add their functionality in turn.

                                            Alternatively, you could write a test of a whole program involving many opcodes if you want to implement a bunch at once, or test something more substantial.

                                            1. 1

                                              It’s just that I would never start with that code. And if the “test” consists of an entire assembly program, can that really be TDD with a test for every change? What is a change? Or are TDD proponents more pragmatic about it than they let on? “TDD for thee, but I know best for me” type of argument.

                                              1. 2

                                                Yes, you could make new tests that consist of new programs which expose some gap in functionality, and then implement that functionality.

                                                A change can be whatever you want it to be, but presumably you’re changing the code for some reason. That reason can be encoded in a test. If no new behavior is being introduced, then you don’t need to add a test, because that’s a refactor. And that’s what tests are for: to allow you to change the internal design and know if you broke existing behavior or not.

                                                1. 2

                                                  I guess I was beaten over the head by my manager at my previous job. He demanded not only TDD, but the smallest possible test, and the smallest code change to get that test to pass. And if I could find an existing test to cover the new feature, so much the better [1].

                                                  [1] No way I was going to do that, what with 17,000+ tests.

                                                  1. 2

                                                    Yea that’s a whole different story. We were talking about what’s possible for a bit, but you’re asking should you do this.

                                                    Dogmatic TDD is not necessary, and doesn't even meet the desired goal of ensuring quality by checking all cases. There are better tests for getting higher coverage, for example property-based tests.

                                                    For me, the sweet spot is simply writing tests first when I’m struggling to make progress on something. The test gives me a concrete goal to work towards. I don’t even care about committing it afterwards.
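
                                                    As a sketch of the property-based idea (hypothetical: it assumes the proptest crate and an `assemble` function shaped like the Uxn example upthread), here is one property you could check across many generated inputs rather than one hand-picked case:

                                                        use proptest::prelude::*;

                                                        proptest! {
                                                            // Any literal byte we assemble should show up in the output ROM
                                                            // (in Uxn, `#xx` emits a LIT opcode followed by the byte itself).
                                                            #[test]
                                                            fn literal_bytes_appear_in_rom(value in 0u8..=255u8) {
                                                                let src = format!("|100 #{:02x} BRK", value);
                                                                let program = assemble(src).unwrap();
                                                                prop_assert!(program.rom.contains(&value));
                                                            }
                                                        }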

                                  2. 10

                                    I mean…it's about time he put on his BDFL hat for this issue, which has been going on for way too long. If I were the Rust binding maintainer, though, I'd be worried about this:

                                    So when you change the C interfaces, the Rust people will have to deal with the fallout, and will have to fix the Rust bindings. That’s kind of the promise here: there’s that “wall of protection” around C developers that don’t want to deal with Rust issues in the promise that they don’t have to deal with Rust.

                                    If the DMA maintainer wants to continue his petty anti-Rust crusade, all he has to do is start subtly breaking the bindings in ways that primarily affect the downstream Rust code. Not that he’ll necessarily do that, but…he has both license and plausible deniability to do so.

                                    Everyone in this situation needs to be put in a room, told to play nice, and hash out their disagreements like adults.

                                    1. 19

                                      This also doesn’t address the Asahi problem, where the subsystem doesn’t want to accept improvements written in C to enable new use cases that happen to be implemented in Rust.

                                      1. 12

                                        However, elsewhere in the thread Ted Tso provides a pretty good response to that. Given his original behaviour (which led to the departure of Wedson), it's a pretty heartening, if bittersweet, change of heart: https://lore.kernel.org/rust-for-linux/20250219170623.GB1789203@mit.edu/

                                        I do understand (now) what Wedson was trying to do, was to show off how expressive and powerful Rust can be, even in the face of a fairly complex interface. It turns out there were some good reasons for why the VFS handles inode creation, but in general, I’d encourage us to consider whether there are ways to change the abstractions on the C side so that:

                                        (a) it makes it easier to maintain the Rust bindings, perhaps even using automatically generation tools,
                                        (b) it allows Rust newbies having at least some hope of updating the manually maintained bindings,
                                        (c) without causing too much performance regressions, especially on hot paths, and
                                        (d) hopefully making things easier for new C programmers from understanding the interface in question.

                                        1. 4

                                          That is good. We’ll see if that attitude propagates. By “the Asahi problem” I meant that in graphics driver land, tussles over a device abstraction and scheduler lifetimes resulted in the R4L folks abandoning the effort to improve(*) the C side of things (and in some cases abandoning the R4L side too).

                                          (*) Of course, “improvement” is in the eye of the beholder.

                                          1. 2

                                            That’s what Ted’s comment is about as far as I can tell, it’s about being willing to clarify the C side, and to modify it in order to improve the interface / abstraction. Especially the (B) and (D) bits which are about making the interface easier for callers to grasp.

                                      2. 11

                                        I don’t think that’s the case. I believe this is just restating the existing contract that since Rust in the kernel is still experimental, breaking Rust code does not break the entire build.

                                        1. 6

                                          I’d be pretty surprised if they could even do that to the rust bindings without introducing all sorts of subtle bugs in downstream C code. The only real difference between rust and downstream C here is that rust writes down what behaviour it expects, not that the downstream C code doesn’t have expectations. Fixing those C bugs would be a lot harder than updating the rust… because rust wrote down what it expects.

                                          Ultimately I’m not impressed that we aren’t seeing some COC style enforcement here, but other than that I don’t think there’s anything for the leadership to do but wait and react to whatever happens next.

                                          1. 1

                                            True, but the kernel has always explicitly reserved the right to change kernel-internal APIs. Such changes are accompanied by tree-wide patches of all the users, which is why out-of-tree driver maintenance is so difficult.

                                          2. 4

                                            I think the spirit of

                                            But then you take that stance to mean that the Rust code cannot even use or interface to code you maintain.

                                            So let me be very clear: if you as a maintainer feel that you control who or what can use your code, YOU ARE WRONG.

                                            implies that Linus would take a very dim view of intentionally/frivolously breaking consumers, and I imagine he’d be loud about it.

                                            1. 2

                                              Everyone in this situation needs to be put in a room, told to play nice, and hash out their disagreements like adults.

                                              He encourages exactly that in the last two paragraphs.

                                            2. 6

                                              The interesting thing to me is that his alternative makes more "assumptions" than the systems he's comparing it to. He can "assume" a fixed goal and optimize the code for that goal. The "big data systems" are trying to be general-purpose compute platforms and have to pick a model that supports the widest range of possible problems.

                                              At the end of Mike Acton’s now infamous “Data-Oriented Design and C++” talk at CppCon 2014, the second question he is asked (around the 1:11:20 mark on YouTube) is: “All of these examples that you provided about optimizing for the cache line are very extraordinary, it’s very very interesting. Totally agree with that. I find it extremely hard to deal with when we’re dealing with different platforms that we have no idea where the program is going to run, how big the cache line is going to be, or anything like that”

                                              Acton interrupts the question asker “Sorry, I’m going to interrupt you before you get lost… ‘I have so many platforms that I don’t know the characteristics of those platforms’ that’s not true, it may be true that you don’t know, but it isn’t true that there isn’t a finite set of characteristics, finite set of range. It should be within some range, you should know the min, the max, the average of the things you’re likely to be dealing with. You should know the common case. You’re not going to be putting the same solution on a Z80 [8-bit processor launched in 1976] and a Google server farm. It’s unlikely that this gonna solve that range of problem space. So what is the finite space you’re working with? What is the finite set of chip designs you’re going to be working with? What are [their] requirements?… There is a finite set of requirements that you have to have, you need to articulate them, you need to understand what that range is. This idea of general portability is a fool’s errand, honestly.”

                                              1. 5

                                                Acton, Muratori and Blow live in their own little world (which is fine) and are extremely arrogant about it and don’t acknowledge the existence of people who don’t work in their domain (that’s bad, if anyone is wondering).

                                                SQLite does not get to make a lot of assumptions about what it runs on. It certainly can try to optimize for the typical case (not sure what that is, maybe an Android mobile phone?), but even that probably isn’t the majority of its deployments.

                                                Or another example: Hotspot only recently deprecated the 32bit windows port: https://openjdk.org/jeps/449.

                                                Fwiw, in my day job I get to make a lot of assumptions: I'm writing code for a specific JVM version. It will only run on cloud servers or developer laptops.[0] The RAM will probably be within a 2-4x range of a median value, etc. So, in fact, I'm probably closer to Acton's world in that regard than I am to SQLite. Of course, I'm also much more at the whims of actual users; I'm not writing a game where there are plausible constraints on how many of each object will be in memory. On any given instance, I may have 100 users doing X, and 0 doing Y, or vice versa.

                                                [0] Well, excepting the one project that will only run in CI or on end-user laptops on another specific version, but same principle.

                                              2. 10

                                                It does read like the combination of the microbenchmarks and the GC tuning doesn't let the generational hypothesis apply, which I think is where the author ends up.

                                                Fwiw, 16 MB in a 300 MB heap does seem like a small nursery to me. Iirc, the HotSpot JVM allocates 25% of the total heap to the nursery if you don't otherwise configure it.
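
                                                For reference, the nursery size can also be pinned explicitly on HotSpot rather than left to the default heuristics; a sketch (the sizes are just illustrative, and MyApp is a placeholder):

                                                    # Give a 300 MB heap an explicit 75 MB (25%) young generation:
                                                    java -Xms300m -Xmx300m -Xmn75m MyApp

                                                    # Or express it as a ratio (old gen = 3x young gen, i.e. young gen is 1/4 of the heap):
                                                    java -Xms300m -Xmx300m -XX:NewRatio=3 MyApp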

                                                1. 4

                                                  Perhaps they didn’t read the Blackburn, Cheng, and McKinley paper they referenced in the 2nd paragraph?

                                                  • the best GC is never GCing, just let the OS clean up when you quit

                                                  • if you can’t afford the address space to never GC (aka your program is too long-running) then the next best thing it to make the bump-ptr nursery as large as you can afford i.e. half the free memory space, so you’re guaranteed to be able to copy everything to the mature generation if it turns out to be all live.

                                                  1. 9

                                                    That may work for servers, or devices running a single process. It gets more complicated when you’re one of many processes in memory. What does “half the free memory space” even mean? What happens when five or six processes all try to allocate that much and fill it with objects? How much does GC slow down when a bunch of your nursery has been paged out by the time you collect it?

                                                    1. 9

                                                      Deciding the initial size of the heap is one of those engineering questions that has no perfect answer. Make it too large and indeed you cause conflict with other users / tasks; make it too small and you can waste a lot of time on GCs while you slowly grow your heap.

                                                      On a machine with 16 GB or 128 GB or whatever I’d suggest that starting with, say, a 64 KB heap would be unnecessarily restrictive. 8 GB would be too much. Pick the answer in between that you are comfortable with. Perhaps do some research and find out what heap size most programs settle down to and allocate that, or something close to it (a half or a quarter, say) right from the start.

                                                      I’ve been contributing code to and tuning real-world GCs for about 25 years on a wide variety of devices, from embedded systems with a few KB of RAM, to mid 2000’s “feature phones” with 400 KB to 2 MB RAM (at Innaworks), to modern Android and iOS phones and PCs and servers. I’ve worked for example at Samsung R&D tuning and improving GC algorithms for Android Dalvik/ART and the Tizen DotNet JIT/runtime.

                                                      1. 5

                                                        Paging anonymous memory to disk and having in process scanning garbage collection are indeed basically oil and water.

                                                        1. 1

                                                          So I haven’t built a system of this sort, and I’m sure there are better choices than what I can come up with in the comments here, but start by allocating a block of memory.

                                                          When that block approaches being half full, do a GC, copying to the free space.

                                                          If that didn’t free enough memory, ask the OS for more—not a small amount more, but a lot more (maybe double what you had before). If you’re using a “large enough” proprortion of physical memory, don’t ask for more. Continue processing but you’ll probably enter a GC death spiral where you’re repeatedly calling GC without doing much work. When that happens, die.

                                                          If you did start swapping…curse overcommit? (You can actually track the amount of time spent in GC and die in this scenario).
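
                                                          As a sketch of that policy (purely illustrative, not a real collector), the decision after each collection might look like:

                                                              // Illustrative only: decide what to do after a semispace copy.
                                                              enum AfterGc {
                                                                  KeepGoing,
                                                                  Grow { new_capacity: usize },
                                                                  // Can't grow further: keep running, but track time spent in GC
                                                                  // and die if it starts to dominate (the death spiral case above).
                                                                  WatchForDeathSpiral,
                                                              }

                                                              fn decide(live_bytes: usize, capacity: usize, physical_budget: usize) -> AfterGc {
                                                                  if live_bytes <= capacity / 4 {
                                                                      // Survivors fill less than half of one semispace: plenty of headroom.
                                                                      AfterGc::KeepGoing
                                                                  } else if capacity * 2 <= physical_budget {
                                                                      // Didn't free enough: grow a lot (double), not a little.
                                                                      AfterGc::Grow { new_capacity: capacity * 2 }
                                                                  } else {
                                                                      AfterGc::WatchForDeathSpiral
                                                                  }
                                                              }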

                                                          1. 2

                                                            I think systems that are willing to overcommit anonymous memory are probably less likely to come configured with disk swap, because it’s less necessary for correct operation in the average case (i.e., when there’s not a run on the bank for pages). Instead you tend to have the OOM killer, a dubious foreclosure mechanism.

                                                            In a system without overcommit, an allocation request requires a strict reservation of either physical memory, or somewhere else to be able to promise you’ll be able to store the data so that you’re never caught in a situation that the OOM killer would putatively solve. Classical UNIX things like fork() don’t work very well if you don’t have a bunch of swap to paper over the short term duplication of reservation. Or, you don’t get to make good use of the physical memory you actually have – some of it has to remain fallow for things to work.

                                                            1. 2

                                                              macOS has both overcommit and disk swap. There's no OOM killer (unlike iOS, which does not swap). If paging gets excessive or disk space runs too low, a supervisor process starts suspending processes and pops up an alert asking you to select apps to be killed. It's happened to me a few times, and it works pretty well at letting you fix things without having to reboot.

                                                              1. 1

                                                                It also has the “compressed memory” business, which I find somewhat fascinating. It’s novel, as far as swap targets go, except that it feels like it would be impossible to assess the capacity of it like you can with, say, a disk partition. I guess the fact that all Mac OS X systems are interactive allows you to do things like the pause and prompt you’re describing.

                                                                1. 2

                                                                  It isn’t novel, Windows and Linux both have memory compression paging support. In Linux their is both zram and zswap (I’ve only ever used zswap). I even remember seeing on Phoronix that Linux recently got support for using the hardware accelerated compression engine on Intel’s server chips for zswap. Windows has memory compression, the easiest way to tell that it does is it shows how much memory has been saved by compression in task manager.

                                                                  EDIT: Unless you meant asking the user what apps to kill part, I agree that is novel.

                                                                  1. 1

                                                                    The concept goes back at least as far as Connectix’s RAMDoubler utility from the mid-90s.

                                                        2. 1

                                                          The article also makes the point that nursery size is important. Can a generational collector self-tune in this regard?

                                                          e.g. increase/decrease proportional nursery size after every nursery GC depending on proportion of nursery items freed. (High %age => bigger nursery may be useful and vice versa). Of course, this is swapping one heuristic for another, but this one might help tune the system to different kinds of workload?

                                                          1. 4

                                                            Some collectors can self-tune, yes. The G1GC for Hotspot has a large number of independent regions, and after each young-gen collection, it can adjust the number of them to try and hit pause and throughput goals (which are the standard way of tuning that collector).

                                                        3. 9

                                                          I have a pair of old domains where renewal automation is broken, and every 3 months, I manually renew and say “I really gotta automate this.”

                                                          …I guess we’ll find out if ending email notifications is the stick that gets me to do it.

                                                          1. 3

                                                            Every few days it seems like I see a new post which basically goes: ‘I did this thing with an LLM, I used to think LLMs were bad now I think they are good but not so good that I will lose my job’. I don’t disagree with the sentiment but it has been said already imo.

                                                            1. 3

                                                              I agree with the sentiment, and about the quantity of posts on this.

                                                              It is my experience too (I've been using Cursor / Windsurf to develop apps).

                                                              I still welcome such posts and read them, because it is "comforting" to know that people have approached these LLMs from different angles and all reach the same conclusion.

                                                              It also gives me a buffer to take a step back, stop being filled with FOMO, read others' experiments, and decide to get in only if I see 10x results different from mine.

                                                              1. 3

                                                                I remain skeptical of the “no threat to my job” point, despite hoping for it to be true. I think too many of the people who say this sort of thing are in a position where it would be Very Bad for their job to become obsolete. Which means that they evaluate these tools looking for a reason why it cannot replace them.

                                                                1. 6

                                                                  I’m in a position of hiring software developers on a constrained budget, so it would be Very Good for my job if I could hire fewer people to achieve the same things (or, ideally, more).

                                                                  Everything I’ve seen indicates that, except in situations where the developer is coming to a completely new environment (e.g. a systems programmer writing some in-browser JavaScript for the first time or doing some numerical analysis in Python), they are a net productivity drain. They let experienced developers accomplish less in the same time because they’re spending more time fixing bugs that they would never have introduced. The code that these things generate is the worst kind of buggy code: code that looks correct (complete with misleading comments, in some cases).

                                                                  1. 1

                                                                    I would love to hear your thoughts on this blog post in that case, since its author espouses Claude’s productivity benefits to them.

                                                                    1. 5

                                                                      Three things I’d note from the post:

                                                                      • It’s a translation problem. LLMs are fairly good at translation for natural languages but they tend to fall down on nuance and homonyms where the context matters. Programming languages, by design, typically don’t have that property. Translating between languages is something I’d expect them to be moderately good at, with the caveat that the program as written must be representable in both languages. It looks as if the input was using flat or tree-structured data. Trying it with something that included cyclic data structures (which can’t be expressed in Rust without some explicit cycle breaking) would be interesting. I very rarely encounter translation problems in software development though, so this is an outlier.
                                                                      • The post explicitly says that the author ‘ didn’t bother verifying how well Claude’s Rust code matched the original elisp, because I don’t actually care until and unless it has bugs I notice’. That kind of YOLO development is totally fine for a personal project that no one else uses. Not something you should rely on for anything you might want a customer to use. My experience with LLM-generated code is that the bugs are much harder to find than in human-written code because, by design, they produce code that looks correct.
                                                                      • There is no mention at all of how maintainable the code is (except near the end ‘ You can now generate thousands of lines of code at a price of mere cents; but no human will understand them’). If LLMs could completely replace programmers, that wouldn’t matter: if requirements change, just tell them to generate new code. But with LLMs, that will often introduce changes that subtly break existing functionality.

                                                                      I agree with the premise of the article: for low stakes development (no one cares if it’s wrong, nearly right code is better than no code, which covers a lot of places where currently there is no code being written), LLMs are probably a win. I’d still be concerned about the studies that link LLM use to a reduction in critical thinking ability and to reduced domain-specific learning there, because I suspect they will widen, rather than narrow, the gap between people who can and can’t program.

                                                                      1. 1

                                                                        Thanks for sharing, I enjoyed reading what you had to say.

                                                              2. 2

                                                                To go meta: the prevailing sentiment on lobsters is so negative on LLMs that I think we need more people with credibility (like Nelhage or Simon Willison) to post their experiences.

                                                                Everyone should make up their own mind how useful LLMs are, but we need to break the meme that the only people interested in them are wild-eyed futurists or management types scheming to deskill programmers (that argument is a sort of reverse argument from authority).

                                                                1. 10

                                                                  I’d like to object to your characterization of the Lobsters “prevailing sentiment” opposition to genAI for programming or in general. Off the top of my head, here are a few reasons for opposition that you didn’t mention:

                                                                  • Awareness of hype cycles in general, and experience thereof. (Remember the Metaverse? Blockchain everything?)
                                                                  • Ethical and legal issues around the provenance of training data and process, almost entirely proprietary and secret even for current open-weight models
                                                                  • Dependency on cloud services in general (there’s a long tradition of that here)
                                                                  • Dependency on heavy GPU computing even when local, with associated energy costs
                                                                  • Introduction of black boxes into large projects, where nobody ever understood how it works
                                                                  • Hallucinations and all their ramifications
                                                                  • Amplification of incumbent technologies which are highly represented in training data
                                                                  • Damage to long-established learning and career growth pathways for junior engineers

                                                                  I welcome the debate, and I do think that there are useful perspectives to be heard from the pro-genAI camp. But dismissive strawmanning basically never furthers that end. Lobsters is a rare oasis that maintains a culture of encouraging quality discourse. More pro-genAI experience reports from more credible sources will continue to experience pushback here, for the reasons above and probably some others I missed. Join the debate, by all means! But don’t just try to drown out arguments against your favored position. We can go anywhere else on the Internet for that kind of shouting-past style.

                                                                  1. 3

                                                                    Simon Willison is a member and his posts have been submitted multiple times: https://lobste.rs/domains/simonwillison.net

                                                                    My take from reading them is that LLMs can be a good rubber duck, but that in that case they are the world’s most expensive rubber duck.

                                                                    To go meta on your meta: why do we need “fair and balanced” views on LLMs here on lobste.rs? Those members of the community who find them useful and productive can just… use them, and refrain from posting if they’re getting flamed for doing so. I’m sure there are plenty of members of this community who do good productive work in “unpopular” programming languages who don’t feel the need to broadcast that.

                                                                    What makes LLM use special?

                                                                    1. 4

                                                                      To go meta on your meta: why do we need “fair and balanced” views on LLMs here on lobste.rs? Those members of the community who find them useful and productive can just… use them, and refrain from posting if they’re getting flamed for doing so. I’m sure there are plenty of members of this community who do good productive work in “unpopular” programming languages who don’t feel the need to broadcast that.

                                                                      Fundamentally, we need accuracy. And the memes that I’m describing are inaccurate.

                                                                      I’m struggling to even understand your perspective. You seem to be saying we should just live with flaming users of unpopular languages. I’d normally consider that a reductio, and I would’ve considered saying “the current LLM reaction is as if anytime someone posted an article about a C/C++ tool, most comments were to say it’s dumb because no one should write in C.”

                                                                      While I’m happy to say that certain languages are badly designed, and you have to live with the occasional “it would be better to rewrite it in Rust” or “we should avoid just rewriting software in Rust” comment, I do not think we should be flaming users of unpopular languages, or users of LLMs, or people who don’t use LLMs. A certain degree of criticism is fine. A knee-jerk echo chamber is bad.

                                                                      Note that despite Simon being a very productive member of the community, his last post got flagged as spam for no good reason, and many of his other LLM posts have several spam votes. https://lobste.rs/s/oclya6/building_python_tools_with_one_shot.

                                                                      1. 2

My point is that a productive user of PHP, say, might find that Lobsters isn’t the best venue to discuss PHP, because there will probably be a vocal minority of hecklers dumping on their language choice. But that’s OK, because there are other venues which are more welcoming.

                                                                        It’s the same with GenAI. There’s a section of the userbase that doesn’t like the technology, and who are prepared to let others know they don’t like it. Either tune them out, or discuss GenAI somewhere else, or just use it in your daily life and be happy and productive.

                                                                2. 11

                                                                  Publishing would be slower—in some cases, much slower

                                                                  Is this really that much of a downside? I can’t imagine publishing to be something that is done often enough to warrant concern about this.

                                                                  1. 4

                                                                    It’s probably an issue for companies that publish private packages many times a day from CI.

                                                                    1. 2

                                                                      Is a marginal slowdown really that important in CI, as opposed to something being run on a laptop where the developer is interactively waiting for npm publish to complete?

                                                                      (To be fair, I’m kind of just stirring the pot - an obvious retort to this question might be, actually, private packages tend to be huge and this would balloon the time on the order of tens of minutes. I don’t know whether that’s true or not.)

                                                                      1. 3

                                                                        You could also make it opt-out. Default to slow, strong compression, but if it causes a major performance regression on your deployment, toggle a flag and you’re back on the old behaviour.
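To make the tradeoff concrete, here’s a minimal sketch using Node’s built-in node:zlib. The opt-out flag itself is hypothetical; this only measures the size/time difference such a flag would toggle between:

```typescript
// Sketch: the size/time tradeoff an opt-out flag would switch between.
// Usage (with any TS runner, e.g. tsx): tsx gzip-levels.ts path/to/package.tar
import { gzipSync } from "node:zlib";
import { readFileSync } from "node:fs";

const tarball = readFileSync(process.argv[2]); // an uncompressed .tar to recompress

for (const level of [1, 6, 9]) {
  // zlib levels: 1 = fastest/largest output, 9 = slowest/smallest output
  const start = process.hrtime.bigint();
  const out = gzipSync(tarball, { level });
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`level ${level}: ${out.length} bytes in ${ms.toFixed(1)} ms`);
}
```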

                                                                  2. 17

                                                                    At a technical level, while I understand the appeal of sticking to DEFLATE compression, the more appealing long term approach is probably to switch to zstd–it offers much better compression without slowdowns. It’s a bigger shift, but it’s a much clearer win if you can make it happen.

                                                                    I admit to being a bit disappointed by the “no one will notice” line of thinking. It’s probably true for the vast majority of users, but this would rule out a lot of useful performance improvements. The overall bandwidth used by CI servers and package managers is really tremendous.

                                                                    1. 14

Node already ships Brotli, and Brotli works quite well on JS; it was basically designed for it.
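For anyone who wants to check this themselves, here’s a minimal sketch using only Node built-ins (the file path is a placeholder; point it at any real .js file):

```typescript
// Sketch: Brotli vs. gzip on a JavaScript source file, using Node's bundled zlib/Brotli.
import { brotliCompressSync, gzipSync, constants } from "node:zlib";
import { readFileSync } from "node:fs";

const source = readFileSync("bundle.js"); // placeholder path

const gz = gzipSync(source, { level: 9 });
const br = brotliCompressSync(source, {
  params: {
    // BROTLI_MODE_TEXT hints that the input is UTF-8 text; Brotli's built-in
    // dictionary was trained on web content, which is why it does well on JS.
    [constants.BROTLI_PARAM_MODE]: constants.BROTLI_MODE_TEXT,
    [constants.BROTLI_PARAM_QUALITY]: 11, // maximum quality, roughly the gzip -9 of Brotli
  },
});

console.log(`original ${source.length} B, gzip -9 ${gz.length} B, brotli q11 ${br.length} B`);
```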

                                                                      1. 1

                                                                        Took me a minute to realize that by “on JS” you meant, on the contents of .js/.mjs files. At first I thought you meant, to be implemented in JS. Very confusing :D

                                                                      2. 5

                                                                        the more appealing long term approach is probably to switch to zstd–it offers much better compression without slowdowns.

                                                                        Yes, especially since the change can’t recompress older versions anyway because of the checksum issue. Having a modern compression algorithm could result in smaller packages AND faster/equivalent performance (compression/decompression).

                                                                        1. 3

                                                                          I agree. Gzip is just about as old as it gets. Surely npm can push for progress (I’m a gzip hater, I guess). That said,

                                                                          Dictionaries can have a large impact on the compression ratio of small files, so Zstandard can use a user-provided compression dictionary.

                                                                          I do wonder if npm could/would come up with a custom dictionary that would be optimized for, well, anything at all, be it the long tail of small packages or a few really big cornerstones.

                                                                          [1] https://en.wikipedia.org/wiki/Zstd
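Node’s zlib doesn’t expose zstd (as far as I know), but its preset-dictionary support for DEFLATE illustrates the same idea. A minimal sketch follows; the dictionary below is just made-up package.json boilerplate, whereas a real one would be trained on actual registry data:

```typescript
// Sketch: how a shared dictionary helps tiny inputs. zlib's preset dictionary is
// used here as a stand-in for a trained zstd dictionary; the principle is the same.
import { deflateSync, inflateSync } from "node:zlib";

// Illustrative only: fragments that show up in almost every package.json.
const dictionary = Buffer.from(
  '{"name":"","version":"1.0.0","description":"","main":"index.js",' +
    '"license":"MIT","dependencies":{},"devDependencies":{}}'
);

const pkg = Buffer.from(
  JSON.stringify({
    name: "tiny-example",
    version: "1.0.0",
    main: "index.js",
    license: "MIT",
    dependencies: { "left-pad": "^1.3.0" },
  })
);

const plain = deflateSync(pkg);
const withDict = deflateSync(pkg, { dictionary });
console.log(`no dictionary: ${plain.length} B, with dictionary: ${withDict.length} B`);

// The decompressor needs the exact same dictionary, which is why a registry-wide
// dictionary would have to be versioned and shipped with clients.
console.log(inflateSync(withDict, { dictionary }).equals(pkg)); // true
```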

                                                                          1. 3

                                                                            HTTP is adding shared dictionary support, including a Brotli archive that is like zip + Brotli + custom dictionaries:

                                                                            https://datatracker.ietf.org/doc/draft-vandevenne-shared-brotli-format/13/

                                                                          2. 2

I agree a better compression algorithm is always nice, but here back-compat is really important, given how many tools and users there are.
It’s a whole other level of pain to add a format that existing tools won’t support, and it’s not even clear the npm protocol was built with that in mind. A non-backwards-compatible compression change might even make things worse in the grand scheme of things: you need two versions of the packages, so more storage space, and if you can’t add metadata to list the available formats, you get clients trying more than one URL, increasing server load.
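A purely hypothetical sketch of that metadata idea (none of these fields exist in the npm registry today; it only shows how advertising formats would avoid clients probing URLs):

```typescript
// Hypothetical sketch only: the npm registry has no such "tarballs" field today.
// Idea: metadata lists every available format, so a client picks one it supports
// instead of probing multiple URLs, and old clients keep using the gzip tarball.
type TarballEntry = { format: "tgz" | "tar.zst"; url: string; integrity: string };

// Formats this client can decompress, in order of preference.
const supported = ["tar.zst", "tgz"] as const;

function pickTarball(entries: TarballEntry[]): TarballEntry {
  for (const format of supported) {
    const match = entries.find((e) => e.format === format);
    if (match) return match;
  }
  throw new Error("no supported tarball format advertised");
}

console.log(
  pickTarball([
    { format: "tgz", url: "https://registry.example/pkg/-/pkg-1.0.0.tgz", integrity: "sha512-aaa" },
    { format: "tar.zst", url: "https://registry.example/pkg/-/pkg-1.0.0.tar.zst", integrity: "sha512-bbb" },
  ]).url // -> the .tar.zst URL, since this client prefers zstd
);
```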

                                                                          3. 2

                                                                            Wayland bad. X11 good. /s

                                                                            Imagine if all that Wayland effort was put into fixing the supposed issues with X11…

                                                                            1. 20

They can’t be fixed; these are design issues. And this is said by the very developers and maintainers of X, who moved on to develop Wayland.

                                                                              1. 22

                                                                                This is false, completely false, but repeated over and over and over and over and over and over and over again, then upvoted over and over and over again.

It can be fixed, and the Wayland developers said it can be fixed; they just didn’t want to.

                                                                                See the official wayland faq: https://wayland.freedesktop.org/faq.html#heading_toc_j_5

“It’s entirely possible to incorporate the buffer exchange and update models that Wayland is built on into X.”

                                                                                From their own website, in their own words.

                                                                                1. 29

                                                                                  they just didn’t want to.

                                                                                  And that’s a good enough reason. As far as I know, none of us are paying them for their work. Since we benefit from what they (or their employers) are freely giving, we have no right to complain.

                                                                                  1. 6

                                                                                    I’m not an X11 partisan, it seems plausible enough that Wayland is a good idea, but you’re just begging the question that we’re benefiting from what they’re freely giving.

We’re using what they’re freely giving, but it’s not guaranteed that we’re benefiting.

                                                                                  2. 16

At the risk of engaging with someone whose words read as an emotional defence…

                                                                                    What do you suppose the benefits of sticking with X.ORG were?

I know of no person who could reasonably debug it, even as a user, and the code itself had become incredibly complex over time, accumulating a vast array of workarounds.

The design (I’m told) was gloriously inefficient but hacked over time to be performant, often violating the principles on which it was built in the first place.

                                                                                    It also forced single-threaded compositing due to the client-server architecture…

                                                                                    Fixing these issues in X11 would have been incredibly disruptive, and they chose to do the python3 thing of a clean break, which must have been refreshing.

                                                                                    The “Hate” seems to be categorisable in a few ways:

                                                                                    • “My global hotkeys don’t work” - which, is something your window manager should have had a say in, not any random program.
                                                                                    • “Screensharing is broken” - which, hasn’t been an issue in 7 years, with the xdg-portal system.
• “My program does not support Wayland”; which is odd, because every UI toolkit supports it. The exception was Java’s old Swing toolkit, but even that has supported it for 3 years now, waiting for projects to get on board.
                                                                                    • “It has bugs”; like all software, it matures over time. It has made tremendous strides in recent years and is very mature today.
• “X11 wasn’t broke”; but it was broke, and the authors said as much, based on what I said above.
• “Wayland is a black box”; this is probably the only truly fair criticism in all the online vitriol that I see. It works more often out of the box without any tweaking or hacks by distro makers, but, when it doesn’t work, it’s completely opaque. I levy this same criticism against systemd.

                                                                                    Do you have something else to add to this list, or do you think I’ve mischaracterised anything?

                                                                                    1. 5

                                                                                      What do you suppose the benefits of sticking with X.ORG were?

                                                                                      It just works for tons of people. X runs the software I want it to run, and it does it today. Wayland does not.

                                                                                      I think I could make it work with probably hundreds of hours of effort but…. why? I have better things to do than run on the code treadmill.

                                                                                      I know of no person who could reasonably debug it

I’ve encountered one bug in X over the last decade. I made a repro script, git cloned the X server, built it, ran my reproduction in gdb, wrote a fix, and submitted it upstream in about 30 minutes of work. (Then subsystem maintainer Peter Hutterer rewrote it, since my fix didn’t get to the root cause of the issue: my fix was to fail the operation when the pointer was null, while his fix ensured the pointer was never null in the first place. So the total time was more than the 30 minutes I spent, but even if my original patch had been merged unmodified, it would have solved the issue from my perspective.)

                                                                                      After hearing all the alleged horror stories, I thought it’d be harder than it was! But it wasn’t that bad at all. Maybe other parts of the code are worse, I don’t know. But I also haven’t had a need to know, since it works for me (and also for a lot of other people).

It’s also important to realize that huge parts of the X ecosystem are outside the X server itself. The X server is, by design, hands-off about a lot of decisions, allowing independent innovation. You can make a major, meaningful contribution to the core user experience without ever touching the core server code, by working on inter-client protocols, window managers, input methods, toolkits, compositors, desktop panels, etc., etc., etc.

                                                                                      clean break, which must have been refreshing.

Every programmer likes the idea of a clean break. Experienced programmers usually know that the grass might be greener now if you rip out Chesterton’s fence, but it won’t stay that way for long.

                                                                                      1. 3

                                                                                        X runs the software I want it to run, and it does it today.

Keep running it then; nobody’s stopping you if it works, it just won’t be updated.

                                                                                        With all the posturing you’d think someone would have stepped up to maintain it; based on this comment you have the capability.

                                                                                        1. 6

                                                                                          With all the posturing you’d think someone would have stepped up to maintain it;

                                                                                          Like I said before, I had one issue with it recently, last year, and the bug report got a quick reply and it was fixed.

                                                                                          I don’t know what you expect from an open source maintainer, but replying promptly to issues, reviewing and merging merge requests, and fixing bugs is what I expect from them. And that’s what I got out of the upstream X team, so I can’t complain. Maybe I’m the lucky one, idk, but worst case, if it does ever actually get abandoned, yeah, compiling it was way easier than I thought it would be, so I guess I could probably do it myself. But as of this writing, there’s no need to.

                                                                                      2. 4

                                                                                        I know of no person who could reasonably debug it- even as a user, and the code itself had become incredibly complex over time, accumulating a vast array of workarounds.

Yes, I remember looking at the code of xinit (which is supposed to be a small shell script just to start the X server) and it was… bad, to say the least. I also have some experience implementing a protocol extension (Xrandr) in python-xlib, and while I was surprised it worked first try with me just following the X11 documentation at the time, it was really convoluted to implement, even considering that the code base of python-xlib already abstracted a lot for me.

I don’t know whether, if I designed a replacement for X.org, it would look like Wayland; I think the least X.org needed was a full rewrite with lots of testing, but that wouldn’t happen considering even the people most knowledgeable about its codebase don’t touch parts of it out of fear. If anything, Wayland is good because there are people who have enthusiasm for hacking on its code base and constantly improving it, something that couldn’t be said about X.org (which, even before Wayland existed, had pretty slow development).

                                                                                        1. 3

                                                                                          I think a lot of the hate happens because “it has bugs” magnifies every other complaint.

                                                                                          The xdg-portal system certainly addresses things like screensharing on a protocol level. But pretty much every feature that depends on it – screen sharing, recording etc. – is various kinds of broken at an application level.

                                                                                          E.g. OBS, which I have to use like twice a year at most, currently plays a very annoying trick on me where it will record something in a Wayland session, but exactly once. In slightly different ways – under labwc the screen video input only shows up the first time, then it’s gone; on KDE it will produce a completely blank video.

I’m sure there’s something wrong with my system, but a) I did my five years of Gentoo back in 2005, and I am not spending any time hunting bugs and misconfiguration until 2035 at the very least, and b) that just works fine on X11.

                                                                                          If one’s goal is to hate Wayland, b) is juuuust the kind of excuse one needs.

                                                                                          1. 4

                                                                                            I understand the frustration, but I was raised on X11 and these bugs sound extremely milquetoast compared to the graphical artifacting, non-starting and being impossible to debug, mesa incompatibility, proprietary extensions not driving the screen and the fucking “nvidia-xconfig” program that worked 20% of the time.

X11 is not flawless either. Distro maintainers got better at the various hacks needed to make it function well enough, especially out of the box, but it remains one of the more brittle components of desktop Linux by a pretty wide margin.

                                                                                            1. 4

                                                                                              Oh, no, I’m not disagreeing, that was meant as a sort of “I think that’s why Wayland gets even more hate than its design warrants”, not as a “this is why I hate Wayland”. I mean, I use labwc, that occasional OBS recording is pretty much the only reason why I keep an X11 WM installed.

I’m old enough for my baseline in terms of quirks to be XFree86; nvidia-xconfig felt like graduating from ed to Notepad to me at some point :-D.

                                                                                              1. 1

Mesa incompatibility will bite people on Wayland too.

                                                                                          2. 7

                                                                                            But then they’d be maintaining this new model and the rest of X.Org, which the maintainers could not do. It might not matter that it’s technically possible if it’s not feasible.

                                                                                            Wayland is the new model plus xwayland so that the useful parts of X11 keep working without them having to maintain all the old stuff.

                                                                                            1. 8

                                                                                              But then they’d be maintaining this new model and the rest of X.Org, which the maintainers could not do.

Again, provably false: they are maintaining the rest of X.Org. First, like you said, xwayland is a thing… and that’s all the parts they said were obsolete and wanted to eliminate! But also, from the same FAQ link:

                                                                                              Why duplicate all this work?

                                                                                              Wayland is not really duplicating much work. Where possible, Wayland reuses existing drivers and infrastructure. One of the reasons this project is feasible at all is that Wayland reuses the DRI drivers, the kernel side GEM scheduler and kernel mode setting. Wayland doesn’t have to compete with other projects for drivers and driver developers, it lives within the X.org, mesa and drm community and benefits from all the hardware enablement and driver development happening there.

A lot of the code is actually shared, and much of the rest needs very little work. So compared to what is actually happening today, not developing Wayland would have been less maintenance work, not more. All that stuff in the kernel, the DRI stuff, even the X server via xwayland, is maintained either way. The only difference is that now they’ve spent 15 years (poorly) reinventing every wheel to barely regain usability parity, because it turns out most of the stuff they asserted was useless actually was useful and in demand.

                                                                                              I like to say Graphics are, at best, 1/3 of GUI. OK, let’s assume you achieved “every frame is perfect”. You still have a long way to go before you have a usable UI.

                                                                                              1. 14

That’s like saying that we are maintaining both the English and the German language. Who are “they”? And X is very much in life-support mode only; take a look at the commit logs.

XWayland is much smaller than the whole of X; it’s just the API surface necessary to keep existing apps working. That’s like saying that a proxy is a web browser.

                                                                                                Code sharing:

The DRM subsystem and GPU code is “shared”, as in it is properly modularized in Linux and both make use of it. If anything, Wayland compositors make much better use of the actual Linux kernel APIs, and are not huge monoliths with optional proprietary binary blobs. It’s pretty trivial to write a Wayland compositor from scratch with no external libraries at all; where is the shared X code?

                                                                                                1. 8

                                                                                                  Where is the shared X code?

                                                                                                  All right here, notice how xwayland is just one of the many “hw” backends of the same xserver core:

                                                                                                  https://gitlab.freedesktop.org/xorg/xserver

There are also backends for other X servers (xnest), Macs (xquartz), Windows (xwin), and, of course, the big one for physical hardware (xfree86, which is much smaller than it used to be - over 100,000 lines of code were deleted from there around 2008 - git checkout a commit from 2007 and find … 380k lines according to a crude find | wc for .c files, git checkout master and find 130k lines by the same measure in that hw/xfree86 folder. Yeah, that’s still a lot of code, but much less than it was, because yes, some of it was simply deleted, but also a lot of it was moved - so nowadays both the X server and Wayland compositors can use it).

                                                                                                  But outside of the hw folder, notice how much of the X server core code is shared among all these implementations, including pretty much every user-facing api and associated implementation bookkeeping in the X server.

                                                                                                  XWayland not only is a whole X server, it is a build of the same X.org code.

                                                                                                  1. 10

                                                                                                    “Maintaining” can mean different things. The X server is “maintained” in the sense that it’s kept working for the benefit of XWayland, but practically nobody is adding new features any more. Not having to maintain the “bare metal” backend also removes a lot of the workload in practice.

This is much simpler and less work than continuing to try to add features to keep the X protocol up to date with the expectations of a modern desktop.

                                                                                                    In other words, yes, the X server is maintained, in the sense that it’s in maintenance mode, but there’s very little active development outside of XWayland-specific stuff.

                                                                                                    If you look at the Git tags on that repo, you’ll see that XWayland is also on a completely different release cadence now, with a different version number and more frequent releases. So even though it’s the same repo, it’s a different branch.

                                                                                                    1. 5

XWayland is not a mandatory part of the Wayland protocol, though. Of course they chose the easiest/most compatible way to implement the functionality, which was to build on the real thing, but it’s a bit dishonest on your part to say that an optional part, meant to provide backwards compatibility, could be considered “shared code”.

                                                                                                  2. 7

                                                                                                    I have to admit that I don’t know much about X.Org’s internals, but I think a lot of that stuff extracted to libraries and into the kernel is in addition to, not replacing, the old stuff.

                                                                                                    For example X.Org still has to implement its own modesetting in addition to KMS. It has to support two font systems in addition to whatever the clients use. It has to support the old keyboard configuration system and libxkbcommon. It has to support evdev and libinput. It has to implement a whole graphics API in addition to kernel DRM.

Wayland can drop all this old stuff and just use the new. Xwayland isn’t X.Org; I don’t think it has to implement any of this. It’s “just” a translation layer for the protocol.

                                                                                                    not developing Wayland would have been less maintenance work, not more.

                                                                                                    Please be careful about assuming what somebody else will find easy / less work. If the maintainers said they can’t support it anymore, I’m inclined to believe them. Sometimes cutting your losses is the easier option.

                                                                                                    Only difference is now they’ve spent 15 years (poorly) reinventing every wheel to barely regain usability parity because it turns out most the stuff they asserted was useless actually was useful and in demand.

I think this is unfair to the people working on Wayland. I know you don’t like it, but it’s an impressive project that works really well for many people.

                                                                                                    I don’t think anybody was “asserting [features] are useless”, they just needed the right person to get involved. I’m assuming you mean things like screen sharing, remote forwarding, and colour management. People do this work voluntarily, it might not happen immediately if at all, and that’s fine.

                                                                                                    1. 9

                                                                                                      XWayland does have to implement all the font/drawing/XRender/etc. stuff, since X11 clients need that to work. That’s part of its job as a “translation” layer.

                                                                                                      (I should know, Xorg/XWayland does some unholy things with OpenGL in its GLAMOR graphics backend that no other app does, and it has tripped up our Apple Mesa driver multiple times!)

                                                                                                      But the most important part is that XWayland doesn’t have to deal with the hardware side (including modesetting, but also multi screen, input device management, and more), which is where a lot of the complexity and maintenance workload of Xorg lies.

                                                                                                2. 7

                                                                                                  The core Wayland protocol is indeed just buffer exchange and update based on a core Linux subsystem. It’s so lean that it is used in car entertainment systems.

And HTTP 1.0 is another protocol that can also be added to other programs. They are pretty trivial; obviously they can be incorporated into X. I can also attach an electric plug to my water plumbing, but it wouldn’t make much sense either. Having two ways to do largely overlapping stuff that will just overstep each other’s boundaries would be bad design.

                                                                                                  Wayland has a ready-made answer to “every frame is perfect” – adding that to X would just make some parts of a frame perfect. Wayland also supports multiple displays that have different DPIs, which is simply not possible at all in X.


GPUs weren’t even a thing when X was designed - it had a good run, but let it rest now. I really don’t support reinventing the wheel unnecessarily, which seems all too common in IT, but there are legitimate points where we have to ask if we really are heading in the direction we want to go. In the case of X vs Wayland it was the correct decision, as shown by the display protocols of literally every other OS, whose internals are pretty similar to Wayland’s.

                                                                                                  1. 10

                                                                                                    Wayland also supports multiple displays that have different DPIs, which is simply not possible at all in X.

                                                                                                    Then how do you explain it being implemented and functional and working in X (with Qt or KDE applications) since at least 2016? Its configuration is a bit janky https://github.com/qt/qtbase/blob/1da7558bfd7626bcc40a214a90ae5027f32f6c7f/src/gui/kernel/qhighdpiscaling.cpp#L488 but it does work.

                                                                                                    1. 13

                                                                                                      EDIT: I posted a video comparison in this comment because I’m tired of arguing in text about something that is obvious when you see it in real life. X11 and Wayland are not the same, and only Wayland can seamlessly handle mixed DPI.

                                                                                                      We already went over this in another thread here in the past. X11 does not implement different DPIs for different monitors today, and it doesn’t work out of the box (where you say “janky”, what you really mean is “needs manual hacks and configuration and cannot handle hotplug properly”).

                                                                                                      Even if you did add the metadata to the protocol (which is possible, but hasn’t been done), it’s only capable of sudden DPI switching when you move windows from one screen to another.

                                                                                                      X11 cannot do seamless DPI transitions across monitors, or drawing a window on two monitors at once with the correct DPI on both, the way macOS and KDE Wayland can, because its multi-monitor model, which is based on a layout using physical device pixels, is incompatible with that. There’s no way to retroactively fix that without breaking core assumptions of the X11 protocol, which would break backwards compatibility. At that point you might as well use Wayland.

                                                                                                      1. 5

                                                                                                        We already went over this in another thread

                                                                                                        Indeed and I am not sure why you keep repeating the same misconceptions despite being told multiple times that all these things do work in X11.

                                                                                                        The links posted by @adam_d_ruppe are good, but I think the oldest post talking about it is oblomov’s blog post. He even had patches that implement mixed DPI for GTK on X11, but obviously the GTK folks would never merge something that would improve their X11 backend (mind you, they are still stuck on xlib).

                                                                                                        It gets even funnier, because proper fractional mixed DPI scaling was possible in X11 long before Wayland got the fractional scaling protocol, so there was ironically a brief period not long ago where scaling was working better in XWayland than in native Wayland (that particular MR was a prime example of Wayland bikeshedding practice, being blocked for an obnoxious amount of time mostly because GTK didn’t even support fractional scaling itself, even quite some time after the protocol was added).

                                                                                                        X11 cannot do seamless DPI transitions across monitors

Yes it can, and it works out of the box with any proper toolkit (read: not GTK), rendering at the native DPI of each monitor and switching seamlessly in between. We don’t even have to limit ourselves to Qt; even ancient toolkits like wxWidgets support it.

                                                                                                        drawing a window on two monitors at once with the correct DPI on both

This doesn’t work on either Wayland or X11, and not a single major toolkit supports this use case, for good reason: the complexity needed would be insane, and you can basically forget about any optimizations across the whole rendering stack.

Also, the whole use case is a literal edge case in its own right; I don’t think many people are masochistic enough to keep a window permanently on the edge of different-DPI monitors for a long time.

                                                                                                        the way macOS and KDE Wayland can

It doesn’t work on KDE Wayland the way you think it does; you get the same sudden switch as soon as you move more than halfway over to the other monitor. If you don’t believe me, try setting one monitor to a scale factor that corresponds to a different physical size. Obviously you will not notice any sudden DPI changes if the scale factors end up at the same physical size, but that works in X11 too.

At this point the only difference is that X11 does not have native per-window atoms for DPI, so every toolkit adds its own variables, but that hardly makes any difference in practice. And to get a bit diabolical here, since Wayland is so keen on shoehorning every use case they missed (i.e. everything going beyond a basic kiosk use case) through a sideband D-Bus API, wp-fractional-scale-v1 might as well have become a portal API that would have worked the same way on X11.

                                                                                                        After a decade of “Wayland is the future” it is quite telling that all the Wayland arguments are still mostly based on misconceptions like this (or the equally common “the Wayland devs are the Xorg devs” - not realizing that all of the original Wayland devs have long jumped the burning ship), while basic features such as relative-window positioning or god forbid I mention network transparency are completely missing.

                                                                                                        1. 16

                                                                                                          Sigh.

                                                                                                          I’m tired of pointless arguing, so here’s a video. This is what happens on X11, with everything perfectly manually configured through environment variables (which took some experimentation because the behavior wrt the global scale is a mess and unintuitive):

                                                                                                          https://photos.app.goo.gl/H9TRvexd2SQWxLg28

                                                                                                          This is what happens on Wayland out of the box, just setting your favorite scale factors in System Settings, with no environment variable tweaks:

                                                                                                          https://photos.app.goo.gl/XjS36F2MbHye1F276

                                                                                                          In both tests the screen configuration, resolution, relative layout (roughly*), and scale factors are the same (1.5 left, 2.0 right).

                                                                                                          These two videos are not the same.

                                                                                                          I guess I should also point out the tearing on the left display and general jank in the X11 video, which are other classic symptoms of how badly X11 works for some systems. This is with the default modesetting driver, which is/was supposed to be the future of X11 backends and is the only one that is driver-independent and relies on the Linux KMS backend exclusively, but alas… it still doesn’t work well. Wayland doesn’t need hardware-specific compositor backends to work well.

                                                                                                          Also, the reason why I used a KDialog window is that it only works as expected with fixed size windows. With resizeable windows, when you jump from low scale to high scale, the window expands to fit the (now larger) content, but when you jump back, it keeps the same size in pixels (too large relative to the content now), which is even more broken. That’s something that would need even more window manager integration to make work as intended on X11. This is all a consequence of the X11 design that uses physical pixels for all window management. Wayland has no issue since it deals in logical/scale-independent units only for this, which is why everything is seamless.

                                                                                                          Also, note how the window decorations are stuck at 2.0 scale in X11, even more jank.

                                                                                                          * X11 doesn’t support the concept of DPI-independent layout of displays with different DPI, so it’s impossible to literally achieve the same layout anyway. It just works completely differently.

                                                                                                          1. 0

                                                                                                            Funnily enough on my system the KDE Wayland session behaves exactly like what you consider so broken in the X11 session.

                                                                                                            There is a lot to unpack here, but let’s repeat the obvious since you completely ignored my comment and replied with basically a video version of “I had a hard time figuring out the variables, let’s stick with the most broken one and just dump it as a mistake of X11”: What’s happening in the Wayland session is most certainly not what you think it is, Qt (or any toolkit for that matter) is not capable of rendering at multiple DPIs at the same time in the same window. As I stated before, something like that would require ridiculous complexity in the entire render path. Imagine drawing an arc somewhere in a low-level library and you suddenly have to change all your math, because you cross a monitor boundary. Propagating that information alone is a huge effort and let’s not even start with more advanced details, like rotated windows.

                                                                                                            The reason why you see no jump in the Wayland session is because you chose the scaling factor quite conveniently so that it is identical in physical size on both monitors (and that would work on X11 too). Instead of doubling down, it would have taken you 5 seconds to try out what I suggested, i.e. set one of the scale factors to one with a different physical size (maybe 1.0 left, 2.0 right) and you will observe that indeed also Wayland cannot magically render at different DPIs at the same time and yes you will observe those jumps.

                                                                                                            Now obviously your scaling factors of 1.5 vs 2.0 should produce the same result on X11. I don’t know your exact configuration, so I can only reach to my magic crystal ball, but since you already said you had a hard time figuring out the variables, a configuration error is not far fetched: Maybe you are applying scaling somewhere twice, e.g. from leftover hacks with xrandr scaling or hardcoding font DPI, or kwin is interfering in a weird way (set PLASMA_USE_QT_SCALING to prevent that). But honestly, given that you start your reply with “Sigh” and proceed to ignore my entire comment, I don’t think you are really interested in finding out what was wrong to begin with. If you are though, feel free to elaborate.

                                                                                                            In my case I have the reverse setup of yours, i.e. my larger monitor is 4k and my laptop is 1080p, so I apply the larger scale factor to my larger screen (I don’t even know my scaling factors, I got lucky in the EDID lotto and just use QT_USE_PHYSICAL_DPI). So yes this means also on Wayland I get the “jump” once a window is halfway over to the next monitor. The size explosions are not permanent as you say though, neither on Wayland nor on X11.

                                                                                                            1. 9

                                                                                                              What’s happening in the Wayland session is most certainly not what you think it is, Qt (or any toolkit for that matter) is not capable of rendering at multiple DPIs at the same time in the same window.

                                                                                                              She obviously doesn’t think that that is what’s happening. In the comment you yourself linked, she explains how it works:

                                                                                                              KWin Wayland is different. It arranges monitors in a logical coordinate space that is DPI-independent. That means that when you move a window from a 200 scale monitor to a 150 scale monitor, it’s already being scaled down to 150 scale the instant it crosses into the second monitor, only on that monitor. This doesn’t require applications to cooperate in any way, and it even works for X11 applications with XWayland, and has worked that way for over two years. Windows never “jump” size or appear at the wrong DPI, partially or fully, on any monitor. It’s completely seamless, like macOS.

                                                                                                              What you need application cooperation for is to then adjust the window buffer scale and re-render the UI to optimize for the monitor the window is (mostly) on. That’s more recent functionality, and only works for Wayland apps that implement the fractional scaling protocol, not X11 apps. For X11 apps on XWayland, KWin chooses the highest screen scale, and scales down on other screens. The only visual difference is that, for Wayland apps with support, the rendering remains fully 1:1 pixel perfect and sharp once a window is moved to a new screen. The sizing behavior doesn’t change.

                                                                                                              With this correct understanding, we can see that the rest of your comment is incorrect:

                                                                                                              The reason why you see no jump in the Wayland session is because you chose the scaling factor quite conveniently so that it is identical in physical size on both monitors (and that would work on X11 too). Instead of doubling down, it would have taken you 5 seconds to try out what I suggested, i.e. set one of the scale factors to one with a different physical size (maybe 1.0 left, 2.0 right) and you will observe that indeed also Wayland cannot magically render at different DPIs at the same time and yes you will observe those jumps.

I just tried using the wrong DPI and there was no jump (I’m on Sway). On one screen, the window I moved was much bigger, and on the other, it was much smaller. But it never changed in size. The only thing that changed was the DPI it was rendering at, while seamlessly occupying the exact same space on each monitor as it did so. This works because Wayland uses logical coordinates instead of physical pixels to indicate where windows are located and how big they are. So when a window is told to render at a different scale, it remains in the same logical position, at the same logical size.

There is a noticeable change, but it’s just the rendering scale adjustment kicking in: the text on the monitor the window is being moved onto becomes pixel-sharp, and the text on the old monitor gets a bit of a blur.
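If it helps, here’s a toy sketch of that logical-coordinate model (made-up numbers, not real compositor code): the window’s logical size and position never change; only the buffer scale it renders at does.

```typescript
// Toy model of the logical-coordinate scheme described above (illustrative only).
// The compositor places the window by logical size on every output, so its
// on-screen footprint never jumps; only the pixel buffer the client draws changes.
type Monitor = { name: string; scale: number };

const win = { logicalWidth: 800, logicalHeight: 600 }; // fixed in logical units

function renderTargetFor(monitor: Monitor) {
  return {
    monitor: monitor.name,
    logicalSize: [win.logicalWidth, win.logicalHeight], // what the user sees, same everywhere
    bufferPixels: [win.logicalWidth * monitor.scale, win.logicalHeight * monitor.scale],
  };
}

console.log(renderTargetFor({ name: "left", scale: 1.5 }));  // 800x600 logical, 1200x900 px buffer
console.log(renderTargetFor({ name: "right", scale: 2.0 })); // 800x600 logical, 1600x1200 px buffer
```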

                                                                                                        2. 4

                                                                                                          drawing a window on two monitors at once with the correct DPI on both, the way macOS and KDE Wayland can

                                                                                                          My Mac refuses to show a window spanning two monitors.

                                                                                                          1. 9

                                                                                                            This changed a few years ago to allow per-display spaces (what Linux calls “workspaces”) - presumably they decided they didn’t want to deal with edge cases where a window is on an active space on one monitor and an inactive space on another. (Or what happens if you move a space containing half a window across monitors?)

                                                                                                            You can get the old behavior back by turning off Settings > Desktop & Dock > Mission Control > Displays have separate spaces.

                                                                                                          2. -1

                                                                                                            I’m tired of arguing in text about something that is completely obvious when you see it in real life.

Indeed - the fact that it does actually work, albeit with caveats, proves that it is not, in fact, “simply not possible at all”.

We can discuss the pros and cons of various implementations for various use cases; there are legitimate shortcomings in Qt and KWin, some of which are easy to fix* (e.g. the configuration UI, the hotplugging of different configurations), some of which are not (the window shape straddling monitor boundaries), and there are some advantages to it too (possibly better performance and visual fidelity). But a prerequisite to a productive technical discussion is to do away with the blatant falsehoods that universally start these threads.

                                                                                                            “You can do it, but….” is something reasonable people can discuss.

                                                                                                            “It is simply not possible at all” is provably flat-out false.

* I know these are easy to fix because they do work on my computer with my toolkit. But I prefer to work with verifiable primary sources that interested people can try for themselves - notice how most of my comments here have supporting links - and Qt/KDE is mainstream enough that you can try it yourself, likely using programs you already have installed, so you don’t have to take my word for it.

                                                                                                            I appreciate that you’ve now tried it yourself. I hope you’ll never again repeat the false information that it is impossible.

                                                                                                            1. 8

                                                                                                              “It is simply not possible at all”

                                                                                                              The use of quote marks here implies that the commenter you’re replying to used this exact term in their comment, but the only hit for searching the string is your comment.

                                                                                                              I’m flagging this comment as unkind, because my reading of this and other comments by you in this thread is that you are arguing in bad faith.

                                                                                                              1. 3

                                                                                                                The use of quote marks here implies that the commenter you’re replying to used this exact term in their comment, but the only hit for searching the string is your comment.

                                                                                                                https://lobste.rs/s/oxtwre/hard_numbers_wayland_vs_x11_input_latency#c_argozj

                                                                                                                Try not to accuse people of personal attacks - which is itself a personal attack, you’re calling me an unkind liar - without being damn sure you have your facts right.

                                                                                                                1. 10

                                                                                                                  That’s a different person. The person you replied to did not say that.

                                                                                                                  This is what the person you replied to actually said:

                                                                                                                  X11 cannot do seamless DPI transitions across monitors, or drawing a window on two monitors at once with the correct DPI on both, the way macOS and KDE Wayland can, because its multi-monitor model, which is based on a layout using physical device pixels, is incompatible with that.

                                                                                                              2. 6

Looking at the two videos, it’s pretty obvious that they are not doing the same thing at all. That dialog window is not being drawn with the correct DPI on each monitor; it’s either one or the other. “Mixed” is a sufficiently elastic word that I’m sure some semantic tolerance helps, but I’m not exactly inclined to call that behaviour “mixed”, just like I can’t point at the bottle of Coke in my fridge, the stack of limes in my kitchen and the bottle of rum on my shelf and claim that what I actually have is a really large Cuba Libre. (Edit:) I.e. because they’re not mixed, they’re obviously exclusive.

I don’t know if that’s all that X11 can do, or if it’s literally impossible to achieve what @lina is showing in the second video – at the risk of being an embarrassment to nerddom everywhere, I stopped caring a few years back and I’m just happy if the pixels are pixeling. But from what I see in the video, that’s not false information at all.

                                                                                                                1. 4

Doing 100% the same thing as Wayland is impossible in X. It can’t handle arranging mixed-DPI monitors in a DPI-independent coordinate space such that rendering is still pixel-perfect on every monitor (for windows that don’t straddle monitors). X11 has no concept of a window buffer scale that is independent of the window’s dimensions.
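
For anyone who hasn’t looked at the protocol, here is roughly what that per-window buffer scale looks like on the Wayland client side. This is only a minimal sketch: the boilerplate to connect to the compositor and create the surface and buffer is omitted, and it assumes a wl_surface of version 3 or later (which is what adds set_buffer_scale). The point is just that the buffer’s pixel density is communicated separately from the surface’s logical size, which is the concept X11 lacks.

    #include <stdint.h>
    #include <wayland-client.h>

    /* Present a buffer that was rendered at `scale` times the surface's logical
     * size. The compositor keeps the window at the same logical size and
     * position; it just knows the pixels are denser. */
    void present_at_scale(struct wl_surface *surface, struct wl_buffer *buffer, int32_t scale)
    {
        wl_surface_attach(surface, buffer, 0, 0);
        wl_surface_set_buffer_scale(surface, scale);   /* e.g. 2 on a HiDPI output, 1 elsewhere */
        wl_surface_damage(surface, 0, 0, INT32_MAX, INT32_MAX);
        wl_surface_commit(surface);
    }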

                                                                                                                  The closest you can get is defining the entire desktop as the largest DPI and all monitors in that unit, then having the image scaled down for all the other monitors. This means you’re rendering more pixels though, so it’s less efficient and makes everything slightly blurry on the lower DPI monitors. It’s impossible to have pixel perfect output of any window on those monitors in this setup, and in practice, depending on your hardware, it might perform very poorly. It’s basically a hacky workaround.
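
For concreteness, this is roughly what that workaround looks like with xrandr. It is a hypothetical example: the output names (DP-1, HDMI-1), modes, and positions are placeholders for a setup with a 4K HiDPI panel on the left and a 1080p monitor on the right.

    # Everything renders at the 4K monitor's density; the 1080p monitor shows a
    # 2x2-downscaled view of its 3840x2160 slice of the virtual framebuffer.
    xrandr --fb 7680x2160 \
           --output DP-1   --mode 3840x2160 --pos 0x0    --scale 1x1 \
           --output HDMI-1 --mode 1920x1080 --pos 3840x0 --scale 2x2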

                                                                                                                  This is actually what XWayland fakes when you use KDE. If you have mixed DPI monitors, it sets the X11 DPI to the largest value. Then, in the monitor configuration presented via fake XRandR to X11 clients, all monitors with a lower DPI have their pixel dimensions scaled up to what they would be at the max DPI. So X11 sees monitors with fake, larger resolutions, and that allows the relative layout to be correct and the positioning to work well. If I had launched KDialog under KDE Wayland with the backend forced to X11, it would have looked the same as Wayland in the video in terms of window behavior. It also wouldn’t have any tearing or glitches, since the Wayland compositor behind the scenes is doing atomic page flips for presentation properly, unlike Xorg. The only noticeable difference would have been that it’s slightly less sharp on the left monitor, since the window would be getting downscaled there.

                                                                                                                  That all works better than trying to do it in a native X11 session, because XWayland is just passing the window buffers to Wayland so only the X11 windows get potentially scaled down during compositing, not the entire screen.

                                                                                                                  Where it falls apart is hotplug and reconfiguration. There’s no way to seamlessly transition the X11 world to a higher DPI, since you have to reset all window positions, dimensions, monitor dimensions and layout, and client DPI, to new numbers. X11 can’t do that without glitches. In fact, in general, changing DPI under X11 requires restarting apps for most toolkits. So that’s where the hacky abstraction breaks, and where the proper Wayland design is required. X11 also doesn’t have any way for clients to signal DPI awareness and can’t handle mixed DPI clients either, so any apps that aren’t DPI aware end up tiny (in fact, at less than 1.0 scale on monitors without the highest DPI). This affects XWayland too and there’s no real way around it.

                                                                                                                  At best, in XWayland, you could identify which clients aren’t DPI aware somehow (like manual user config) and give them a different view of the X11 world with 1.0 monitor scales. That would mostly work as long as X11 windows from both “worlds” don’t try to cooperate/interact in some way. KDE today just gives you two global options, either what I described or just always using 1.0 scale for X11 (which makes everything very blurry on HiDPI monitors, but all apps properly scaled).

                                                                                                                  1. 2

                                                                                                                    The closest you can get is defining the entire desktop as the largest DPI and all monitors in that unit, then having the image scaled down for all the other monitors [without hotplug or per-client DPI awareness].

                                                                                                                    That’s what I thought was happening, too, but I wasn’t really sure if my knowledge was up-to-date here. Like I said, I’m just happy if my pixels are pixeling, and I didn’t want to go down a debate where I’d have to read my way through source code. This end of the computing stack just isn’t fun for me.

                                                                                                                  2. 2

                                                                                                                    I’m not exactly inclined to call that behaviour “mixed”,

                                                                                                                    This isn’t a term we invented in this thread, it is very common, just search the web for “mixed dpi” and you’ll find it, or click the links elsewhere in this thread and see how it is used.

                                                                                                                    A blog posted in a cousin comment sums it up pretty well: “A mixed-DPI configuration is a setup where the same display server controls multiple monitors, each with a different DPI.”

                                                                                                                    (DPI btw formally stands for “dots per inch”, but in practice, it refers to a software scaling factor rather than the physical size because physical size doesn’t take into account the distance the user’s eyes are from the display. Why call it DPI then? Historical legacy!)

                                                                                                                    Or, if that’s too far, go back to the grandfather post that spawned this very thread:

                                                                                                                    Wayland also supports multiple displays that have different DPIs, which is simply not possible at all in X.

                                                                                                                    “displays that have different DPIs”, again, the common definition spelled out.

                                                                                                                    What, exactly, happens when a window straddles two different monitors is implementation-dependent. On Microsoft Windows and most X systems, the window adopts the scaling factor for the monitor under its center point, and uses that across the whole window. If the monitors are right next to each other, this may cause the window to appear non-rectangular and larger on one monitor than the other. This is satisfactory for millions of people. (I’d be surprised if many people actually commonly straddle windows between monitors at all, since you still have the screen bezel at least right down the middle of it… I’d find that annoying. It is common for window managers to try to snap to monitor boundaries to avoid this, and some versions of Apple MacOS (including the Monterey 12.7.6 I have on my test computer) will not even allow you to place a window between monitors! It makes you choose one or the other.)

                                                                                                                    edit: just was reminded of this comment: https://lobste.rs/s/oxtwre/hard_numbers_wayland_vs_x11_input_latency#c_1f0zhn and yes that setting is available on my mac version, but it requires a log out and back in. woof. not worth it for a demo here, but interesting that Apple apparently also saw fit to change their default behavior to prohibit straddling windows between monitors! They apparently also didn’t see much value in this rare use case. /edit

On Apple operating systems and most (perhaps all?) Wayland implementations… and some X installs, using certain xrandr settings (such as described here https://blog.summercat.com/configuring-mixed-dpi-monitors-with-xrandr.html), they do it differently: the window adopts the highest scaling factor of the monitors it appears on (or that is present in the configuration? to be honest I’m not exactly sure), using a virtual coordinate space, and then the system downscales that to the target area on screen. This preserves its rectangular appearance - assuming the monitors are physically arranged next to each other, the software config mirrors that physical arrangement… and the OS lets you place it there permanently (you can still see it while dragging, at least) - but it has its own trade-offs: there is a performance cost, and it can lose visual fidelity (e.g. blurriness), especially if the scale factors are not integer multiples of each other, but sometimes even if they are, because the application is drawing to a virtual screen which is then scaled by a generic algorithm with limited knowledge of what it is scaling.

In all these cases, there is just one scale factor per window. Doing it below that level is possible, but it is so horribly messy to implement (massive complexity for near-zero benefit; again, how often do people actually straddle windows between monitors?) that nobody does it in real life. The difference is that the Mac/Wayland approach makes it easier to pretend this works… but it is still pretending. The illusion can be pretty convincing a lot of the time, though; like I said in that whole other lobsters thread with lina before, I can understand why people like this experience, even if it doesn’t matter to me.

                                                                                                                    The question isn’t if the abstraction leaks. It is when and where it leaks.

                                                                                                                    1. 4

                                                                                                                      This isn’t a term we invented in this thread, it is very common, just search the web for “mixed dpi” and you’ll find it, or click the links elsewhere in this thread and see how it is used.

                                                                                                                      I tried to before posting that since it’s one of those things that I see people talking past each other about everywhere, and virtually all the results I get are… both implementation-specific and kind of useless, because the functional boundary is clearly traced somewhere and different communities seem to disagree on where.

A single display server controlling multiple monitors, each with a different DPI, is something that X11 has basically always supported; I’m not sure how that’s controversial. Even before Xinerama (or if your graphics card didn’t work with Xinerama, *sigh*) you could always just set up two X screens, one for each monitor. Same display server, two monitors, different DPIs – glorious, I was doing mixed DPI before everyone was debating it, and all thanks to shitty S3 drivers and not having money to buy proper monitors.

                                                                                                                      But whenever this is discussed somewhere, it seems that there’s a whole series of implicit “but also” attached to it, having to do with fractional scaling, automatic configuration, what counts as being DPI-aware and whatnot.

                                                                                                                      So it’s not just something we invented in this thread, it’s something everyone invents in their own thread. In Windows land, for example, where things like getting WM_DPICHANGED when the window moves between monitors are a thing, you can query DPI per window, and set DPI awareness mode per thread, I’m pretty sure you’ll find developers who will argue that the xrandr-based global DPI + scaling system we’ve all come to know and love isn’t mixed-DPI, either.

(Edit:) To be clear – I haven’t used that system in a while, but as I recall, the way it worked was that it set a global DPI, and you relied on the display server for scaling to match the viewports’ sizes. There was no way for an application to “know” what DPI/scaling factor combination it was working with on each monitor, so it could adjust its rendering for whatever monitor it was on (for their implementation-defined definition of “on”, sure – midpoint, immediate transition, complete transition, whatever). Toolkits tried to shoehorn that in, but that, too, was weird in all sorts of ways and assumed a particular setup, at least back in 2016-ish or however long ago it was.

                                                                                                                      1. 2

                                                                                                                        I’m pretty sure you’ll find developers who will argue that the xrandr-based global DPI + scaling system we’ve all come to know and love isn’t mixed-DPI, either.

Well, I wouldn’t call that not mixed DPI, but I would call it suboptimal. So it seems you’re familiar with the way it works on Windows: move between monitors or change the settings in display properties, and the system broadcasts the WM_DPICHANGED message to top-level windows that opted into the new protocol. Other windows are bitmap-scaled to the new factor, as needed.

Applications use the current DPI for their monitor in their drawing commands - some of this is done automatically by the system APIs, the rest you multiply out yourself. You need to take some care not to double-multiply (doing it yourself and then having the system API do it again), so it is important to apply it in the right places.

                                                                                                                        Your window is also automatically resized, as needed, as it crosses scaling boundaries, by the system.
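
If it helps to see that concretely, here is a minimal sketch of the Windows side in C. It assumes a window declared as per-monitor DPI aware (v2) in its manifest, omits everything else (class registration, WinMain), and just follows the documented pattern of taking the new DPI from wParam and the suggested rectangle from lParam:

    #include <windows.h>

    static UINT g_dpi = 96;  /* DPI of the monitor the window is currently "on" */

    LRESULT CALLBACK WndProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam)
    {
        switch (msg) {
        case WM_CREATE:
            g_dpi = GetDpiForWindow(hwnd);        /* initial per-monitor DPI */
            return 0;

        case WM_DPICHANGED: {
            g_dpi = HIWORD(wParam);               /* new DPI after crossing a monitor boundary */
            const RECT *suggested = (const RECT *)lParam;  /* system-suggested new size/position */
            SetWindowPos(hwnd, NULL,
                         suggested->left, suggested->top,
                         suggested->right - suggested->left,
                         suggested->bottom - suggested->top,
                         SWP_NOZORDER | SWP_NOACTIVATE);
            InvalidateRect(hwnd, NULL, TRUE);     /* repaint content at the new scale */
            return 0;
        }

        case WM_DESTROY:
            PostQuitMessage(0);
            return 0;
        }
        return DefWindowProc(hwnd, msg, wParam, lParam);
    }

The drawing code (not shown) would then scale its coordinates by g_dpi / 96.0, which is where the “don’t double-multiply” caveat above comes in.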

Qt/KDE tries to apply similar rules… but they half-assed it. Instead of sending a broadcast message (a PropertyChange notification would be about the same in the X world), they settled for an environment variable. (The reason I know where that is in the source is that I couldn’t believe that’s the best they did… for debugging, sure, but shipping that to production? I had to verify, but yes, that’s what they shipped :( The XWayland side has proposed a property - see here https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/1197 - but they couldn’t agree on the details and dropped it, alas.) There’s also no standard protocol for opting out of auto scaling, though I think XWayland proposed one too; I can’t find that link in my browser history, so I might be remembering wrong.

                                                                                                                        The KDE window manager, KWin, tries to apply scale as it crosses monitor boundaries right now, just like Windows does, but it seems to only ever scale up, not back down. I don’t know why it does this, could be a simple bug. Note that this is KWin’s doing, not Qt’s, since the same application in a different window manager does not attempt to resize the window at all, it just resizes the contents of the window.

But even in the half-assed implementation it works; the UI content is automatically resized for each monitor’s individual scale factor. The user informs it of each monitor’s scale factor, either by position or by port name (again half-assed, it should have used some other identifier that would work better with hotplugging, but it does still work). If a monitor configuration changes, xrandr sends out a notification. The application queries the layout to determine which scaling factor applies to which bounding box in the coordinate space, then listens to ConfigureNotify messages from the window manager to learn where it is. A quick check of rectangle.contains(window_coordinate) tells it what scale it has, and that fires off the internal DPI-changed event, if necessary. At this point the codepaths between X and Windows merge as the toolkit applies the new factor. The actual scaling is done client side, and the compositor should not double-scale it… but whether this works or not is hit and miss, since there’s no standardization! (The one nice thing about XWayland is they’re finally dismissing the utter nonsense that X cannot do this and dealing with reality - if the standard comes from Wayland, I don’t really care, I just want something defined!)
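
To make that sequence concrete, here is a rough sketch of the client-side part in C with Xlib/XRandR. The scale_for_output callback stands in for whatever per-monitor configuration mechanism the toolkit uses (an assumption for this sketch, not an existing API), and translating ConfigureNotify coordinates to root coordinates (e.g. with XTranslateCoordinates) is left out:

    #include <X11/Xlib.h>
    #include <X11/extensions/Xrandr.h>

    struct monitor_scale { int x, y, w, h; double scale; };

    /* Build a table mapping each active CRTC's bounding box to a user-configured
     * scale factor. Rebuild this whenever an RRScreenChangeNotify arrives. */
    static int query_layout(Display *dpy, Window root,
                            struct monitor_scale *out, int max,
                            double (*scale_for_output)(const char *name))
    {
        int n = 0;
        XRRScreenResources *res = XRRGetScreenResourcesCurrent(dpy, root);
        for (int i = 0; i < res->noutput && n < max; i++) {
            XRROutputInfo *oi = XRRGetOutputInfo(dpy, res, res->outputs[i]);
            if (oi->crtc) {
                XRRCrtcInfo *ci = XRRGetCrtcInfo(dpy, res, oi->crtc);
                out[n].x = ci->x;
                out[n].y = ci->y;
                out[n].w = (int)ci->width;
                out[n].h = (int)ci->height;
                out[n].scale = scale_for_output(oi->name);
                n++;
                XRRFreeCrtcInfo(ci);
            }
            XRRFreeOutputInfo(oi);
        }
        XRRFreeScreenResources(res);
        return n;
    }

    /* On ConfigureNotify: which monitor's scale applies to a window whose
     * top-left corner (in root coordinates) is at (x, y)? */
    static double scale_at(const struct monitor_scale *mons, int n, int x, int y)
    {
        for (int i = 0; i < n; i++)
            if (x >= mons[i].x && x < mons[i].x + mons[i].w &&
                y >= mons[i].y && y < mons[i].y + mons[i].h)
                return mons[i].scale;
        return 1.0;  /* off-screen or unknown: fall back to no scaling */
    }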

A better way would be if the window manager sent the scale factor as a ClientMessage (similar to other EWMH messages) as the window crosses the boundary, so the application need not look it up itself. That would also empower the user (through the window manager) to change the scale factor of individual windows on demand - a kind of generic zoom functionality - and to opt individual windows out of automatic bitmap scaling, even if the application itself isn’t written to support it. I haven’t actually implemented this in my window manager or toolkit; the thought only came to mind a few weeks ago in the other thread with lina, but I’d like to, I think it would be useful and a nice little innovation.
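
Purely as an illustration of the idea (this protocol does not exist; the atom name below is invented for the sketch), the window-manager side might look something like this:

    #include <X11/Xlib.h>

    /* Hypothetical: the window manager tells a client what scale factor applies
     * to it, expressed as a numerator/denominator pair so fractional scales work
     * (e.g. 3/2 for 1.5x). "_EXAMPLE_WINDOW_SCALE" is a made-up atom name. */
    static void send_scale_hint(Display *dpy, Window client, long numerator, long denominator)
    {
        XClientMessageEvent ev = {0};
        ev.type = ClientMessage;
        ev.window = client;
        ev.message_type = XInternAtom(dpy, "_EXAMPLE_WINDOW_SCALE", False);
        ev.format = 32;
        ev.data.l[0] = numerator;
        ev.data.l[1] = denominator;
        XSendEvent(dpy, client, False, NoEventMask, (XEvent *)&ev);
        XFlush(dpy);
    }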

As a practical matter, even if the window manager protocol is better, applications would probably still want to fall back to doing it themselves if there is no window manager support; probably query _NET_SUPPORTED and, if it’s absent, keep the DIY implementation.

                                                                                                                        None of this is at all extraordinary. Once I implement mine, I might throw it across the freedesktop mailing list, maybe even the xwayland people, to try to get some more buy-in. Working for me is great - and I’ll take it alone - but would be even nicer if it worked for everybody.

                                                                                                              3. 7

                                                                                                                If a framework has to go way out of its way to implement some hack to make it work despite X’s shortcomings, but all the other frameworks don’t support it at all, then X simply doesn’t support this feature.

                                                                                                                Also, take a look at @lina ’s videos.

                                                                                                              4. 2

                                                                                                                I want to preface this by saying: 1) I run neither X nor Wayland 99% of the time; the kernel’s framebuffer is usually enough for my personal needs; 2) it’s been months since I tried Wayland on any hardware.

                                                                                                                That said, the one thing I seemed to notice in my “toe dip” into the Wayland world was pretty problematic to me. When I would start X to run some graphical config tool on a remote machine with no GPU and low-power CPU, it seemed to me that “ssh -X” put almost no load on the remote computer; however, attempting to run the same via waypipe put a lot of load on the remote machine, making it essentially unusable for the only things I ever needed a graphical interface for.

                                                                                                                If I’ve incorrectly understood the root cause here, I’d love to have someone explain it better. While I don’t use either one very often, it’s clear that X11 is not getting the development resources wayland is, and I’d like to be able to continue using my workflow decades into the future…

                                                                                                              5. 3

                                                                                                                The primary reason for Wayland was the security model, was it not? That I believe is truly unfixable. And if you’ve decided to prioritize that, then it makes sense to stop working on X.

                                                                                                                1. 10

                                                                                                                  No, the primary reason for Wayland is, in Kristian Høgsberg’s words, the goal of “every frame is perfect”.

                                                                                                                  And even if security was the priority… X also “fixed” it (shortly before Wayland was born; X Access Control Extension 2.0 released 10 Mar 2008, Wayland initial release 30 September 2008), but it never caught on (I’d argue because it doesn’t solve a real world problem, but the proximate cause is probably that the XACE is all code hooks, no user-friendly part. But that could be solved if anybody actually cared.)

Worth noting that Microsoft faced many of the same questions with Windows: they wanted to add a compositor, elevated-process isolation, per-monitor fractional scaling, all the stuff people talk about, and they successfully did it with near zero compatibility breaks, despite Win32 and X sharing a lot of functionality. If it were fundamentally impossible for X, it would have been fundamentally impossible for Windows too.

                                                                                                                  1. 8

                                                                                                                    Security can’t be solved retroactively. You can’t plug all the holes of a swiss cheese.

                                                                                                                    “Solutions” were nesting X servers into one another and such, at that point I might as well run a whole other VM.

And good for Windows. Maybe if we had the legacy X API available for use under Wayland, so that Wayland’s security benefits could apply while also not losing decades of programs already written, we could have that for Linux too… maybe we could call it WaylandX! [1]

                                                                                                                    [1] Arguably this name would make more sense

                                                                                                                    1. 11

                                                                                                                      “Solutions” were nesting X servers into one another and such, at that point I might as well run a whole other VM

                                                                                                                      No they weren’t. X11 is a client-server protocol. The only things that a client (app) sees are messages sent from the server (X.org). The default policy was that apps were trusted or untrusted. If they were untrusted, they couldn’t connect to the server. If they were trusted, they could do anything.

The security problems came from the fact that ‘anything’ meant read any key press, read the mouse location, and inspect the contents of any other window. Some things needed these abilities. For example, a compositing window manager needed to be able to redirect window contents and composite them. A window manager that did focus-follows-mouse needed to be able to read all mouse clicks to determine which window was the currently active one and tell the X server to send keyboard events there. A screenshot or screen-sharing app needed to be able to see the rendered window or screen contents. Generally, these were exceptions.

                                                                                                                      The X Access Control Extensions provided a general mechanism (with pluggable policies) to allow you to restrict which messages any client could see. This closed the holes for things like key loggers, while allowing you to privilege things like on-screen keyboards and screenshot tools without needing them to be modified. In contrast, Wayland just punted on this entirely and made it the compositor’s problem to solve all of these things.

                                                                                                                    2. 6

                                                                                                                      Microsoft has literal orders of magnitude more people to throw at backcompat work.

                                                                                                                      1. 1

                                                                                                                        They also have literal orders of magnitude more users to create backcompat work in the first place, though.

                                                                                                            2. 7

This is a very interesting article. I was originally taken aback by the initial “Not IO bound” comment, but pointing out that our current understanding of IO is actually conflated with the OS scheduling of threads was very on point. I hadn’t considered that before. I think my reaction still stands, though, if in a pedantic way. Looking at:

                                                                                                              YJIT only speeds up Ruby by 2 or 3x

                                                                                                              and

                                                                                                              Like Discourse seeing a 15.8-19.6% speedup with JIT 3.2, Lobsters seeing a 26% speedup, Basecamp and Hey seeing a 26% speedup or Shopify’s Storefront Renderer app seeing a 17% speedup.

I still feel that if a component sees a 2-3x perf increase and that translates to a 1.15x-1.27x overall improvement, then it’s a significant component (and well worth optimizing), but it isn’t the dominant/limiting factor.

                                                                                                              Towards the end of the article Jean gets into some specific numbers regarding “truly IO bound” being 95% and “kinda” being 50%. I asked him on Mastodon about them. https://ruby.social/@byroot/113877928374636091. I guess in my head “more than 50%” would be what I would classify as “IO bound.” Though I’ve never put a number to it before.

Someone recently tagged an old thread of mine in a private Slack where I linked to this resource: https://www.youtube.com/watch?app=desktop&v=r-TLSBdHe1A, with this comment:

                                                                                                              Samuel shared this in Ruby core chat, and (spoiler) that’s actually one trick for debugging performance. They want to answer the question “Is this code worth optimizing” i.e. “if we made this code 2x faster…would anyone care.” Because if you make something 100x faster that only accounts for 1% of your total time, people aren’t really going to notice.

So they can’t arbitrarily make code faster, but they CAN make code arbitrarily slower. So the program simulates a speedup of one code section by making all the other code slower, to report whether it’s worth optimizing or not. An all-around interesting talk; it’s very approachable as well.

                                                                                                              It would be interesting to have some kind of an IO backend where you could simulate a slowdown. I.e. perform the query on the database and time it, then sleep for some multiplier of that time before returning. It would (in theory) let you put a number to how much your app is affected by (database) IO. If you set a 2x multiplier and you see requests take 2x as long…then you’re approaching 100% IO bound.

                                                                                                              The linked GVL timing gem is new and interesting. Overall, thanks for writing all this down. Great stuff.

                                                                                                              1. 4

                                                                                                                I guess in my head “more than 50%” would be what I would classify as “IO bound.”

                                                                                                                Perhaps I should have written about that in my conclusion, but ultimately IO-bound isn’t perfect to describe all I want to say.

In a way it’s a good term, because the implication of an IO-bound app is that the only way to improve its performance is to somehow parallelize the IOs it does, or to speed up the underlying system it does IOs with.

With that strict definition, I think YJIT proved that it isn’t the case, given it was able to substantially speed up applications.

                                                                                                                A more relaxed way I tend to use the IO-bound definition, in the context of Ruby applications, is whether or not you can substantially increase your application throughput without degrading its latency by using a concurrent server (typically Puma, but also Falcon, etc).

                                                                                                                That’s where the 50% mark is important. 50% IO implies 50% CPU, and one Ruby process can only accommodate “100%” CPU usage. And given threads won’t perfectly line up when they need the CPU, you need substantially more than 50% IO if you wish to process concurrent requests in threads without impacting latency because of GVL contention.

                                                                                                                So beyond trying say whether apps are IO-bound or not, I mostly want to explain under which conditions it makes sense to use threads or fibers, and how many.

                                                                                                                1. 3

                                                                                                                  50% IO implies 50% CPU, and one Ruby process can only accommodate “100%” CPU usage. And given threads won’t perfectly line up when they need the CPU, you need substantially more than 50% IO if you wish to process concurrent requests in threads without impacting latency because of GVL contention.

                                                                                                                  Are you comparing multiple single threaded processes to a single multithreaded process here? Otherwise, I don’t understand your logic.

                                                                                                                  If a request takes 500msec of cpu time and 500msec of “genuine” io time, then 1 request per second is 100% utilization for a single threaded server, and queue lengths will grow arbitrarily. With two threads, the CPU is only at 50% utilization, and queue lengths should stay low. You’re correct that there will be some loss due to requests overlapping, and competing for CPU time, but it’ll be dominated by the much lower overall utilization.

                                                                                                                  In the above paragraph, genuine means “actually waiting on the network”, to exclude time spent on CPU handling the networking stack/deserializing data.

P.S. Not expressing an opinion on “IO bound”; it’s not one of my favorite terms.

                                                                                                                  1. 2

                                                                                                                    Are you comparing multiple single threaded processes to a single multithreaded process here?

Yes. Usually when you go with single-threaded servers like Unicorn (or even just Puma configured with a single thread per process), you still account for some IO wait time by spawning a few more processes than you have CPU cores. Often it’s 1.3 or 1.5 times as many.

                                                                                                                    1. 2

I don’t think there’s anything special about 50% CPU. The more CPU time the worse, but I don’t think anything changes significantly at that point; I think it’s going to be a relatively linear relationship between 40% and 60%.

                                                                                                                      You’re going to experience some slowdown (relative to multiple single-threaded processes), as long as the arrival rate is high enough that there are ever overlapping requests to the multithreaded process. Even if CPU is only 1/4 of the time, your expectation is a 25% slowdown for two requests that arrive simultaneously.

                                                                                                                      I think, but am not sure, that the expected slowdown is “percent cpu time * expected queue length (for the cpu)”. If queue length is zero, then no slowdown.

                                                                                                                2. 1

                                                                                                                  You can achieve some of that with a combination of cgroups limiting the database performance and https://github.com/Shopify/toxiproxy for extra latency/slowdown on TCP connections.

                                                                                                                3. 11

                                                                                                                  If I had to summarize, you need to have a mental model of what your code is doing, and roughly how long that should take in order to do effective performance work. The article takes a very scenic route to get to the point, though.

                                                                                                                  1. 4

                                                                                                                    and left us hanging by not mentioning how much better the code was after their fix!

                                                                                                                  2. 10

                                                                                                                    an attacker to grab the location of any target within a 250 mile radius

                                                                                                                    Hm I was wondering whether this was

                                                                                                                    1. “you can find the EXACT location of a target, as long as they are located in a 250 mile radius” or
2. “you can find a rough location for people with a vulnerable app installed, within an accuracy of +- 250 miles”

                                                                                                                    It is the latter apparently

                                                                                                                    I don’t want to be dismissive, since there is maybe some real world consequence I haven’t thought of, but it doesn’t seem that severe to me

                                                                                                                    It apparently relies on the granularity of Cloudflare data centers. I think there are probably many worse things going on with Cloudflare than this.

                                                                                                                    I doubt they will ever have a cache every 1 mile :)

                                                                                                                    1. 31

                                                                                                                      Locating someone to within 250 miles one time is often useless. However I can think of two relevant exceptions:

                                                                                                                      1. With a known person (think a stalker) you might be able to say “if they’re in New York, they could be anywhere in the city, but they’ve gone back to Iowa, I bet they’re visiting a family member.”

2. For deanonymizing a whistleblower or political target, tracking their movements over time can be very informative. Sure, “you’re within 200 miles of DC” is tens of millions of people, but if you’re spotted in DC, then San Francisco, then Texas on a certain range of days, you start to get the kind of information that could narrow down a list of a few hundred or thousand people to a single one.

                                                                                                                      1. 10

                                                                                                                        I’m fairly sure that journalists for ‘respectable’, mainstream news orgs like the BBC have entered countries like Russia before illicitly, without permission from that country’s government. I’m sure they would very much not want to be geolocated even to an accuracy of ~250miles. Same for entering a country legally but moving to another part of it for which you don’t have permission (which I think is sometimes a thing in countries at war, or governed by repressive regimes.)

                                                                                                                        1. 3

                                                                                                                          If your location is that sensitive, use Orbot to tunnel your traffic through TOR.

                                                                                                                          1. 12

                                                                                                                            You’re not wrong, but (to speak to the usefulness of this bug) this kind of research is definitely helpful to demonstrate why you’re not wrong.

                                                                                                                        2. 5

                                                                                                                          i think it’s not so much that it’s a small area on its own, it’s that it’s a massive reduction from ??somewhere in the world??

                                                                                                                          for de-anonymization and especially doxxing, being able to intersect other information you know with a reduced area like this can make a big difference.

                                                                                                                          1. 3

                                                                                                                            Agreed, I’m not seeing the seriousness of this attack. All it tells you is “this user is probably closer to this Cloudflare data center than any other Cloudflare data center”. Wouldn’t this be defeated by even the most basic VPN?

                                                                                                                          2. 12

                                                                                                                            I posted this in another thread, but it’s perhaps more relevant here. There’s a paper from December titled Training on the Test Task Confounds Evaluation and Emergence. The author summarized in a Tweet:

                                                                                                                            “After some fine-tuning, today’s open models are no better than those of 2022/2023.”

                                                                                                                            1. 11

I don’t know why people are so optimistic about the future of AI. I remember watching an interview with Yann LeCun, who’s been called one of the fathers of convolutional neural networks. He was saying that neural nets were basically plateauing. In his opinion, the next leap will be achieved by using something other than neural nets.

My opinion is that this new leap is decades away, if it comes at all. (I’m leaning towards never, because I personally believe that, by then, oil will be rationed and declining, renewables will not compensate, and nobody will have ramped up nuclear enough. No one will have the resources to do AI. But this is off-topic for Lobste.rs.)

Anyway, my point is: every 6 months OpenAI and Google release a new model and announce a jump in benchmark metrics. And every 6 months I don’t see any difference… The thing still hallucinates like hell… It still sucks at maths and problem solving. It’s great for synonyms, rephrasing and spell checking, though… even so, it was already great at that 2 years ago.

                                                                                                                              1. 5

                                                                                                                                I don’t know why people are so optimistic about the future of AI.

I assume because of the observed progress. I mean, okay, models today aren’t better at math than… ones from 3 years ago? That’s not exactly damning. I think there’s still plenty of room for LLM-like solutions to grow; we’re very early on in the technology and it seems incredibly premature to assume we’ve already tapped it. Google’s come out with the Titans models, and that seems at least interesting enough to warrant exploration.

I feel like maybe you’re setting too high a bar. 6 months is a tiny amount of time; I don’t think it says much that models are only incrementally better over the course of a few short years.

                                                                                                                                1. 5

                                                                                                                                  My opinion is that this new leap is decades away, if at all.

                                                                                                                                  The good news is the fusion reactors to power it will be ready by then ;)

                                                                                                                                  1. 4

                                                                                                                                    And every 6 months I don’t see any difference…

                                                                                                                                    How hard are you looking?

                                                                                                                                    I follow the space pretty closely and the quality differences seem very clear to me.

                                                                                                                                    These new inference-scaling models - o1, R1, QwQ, Gemini 2.0 Thinking - really do appear to be able to solve problems that the older models couldn’t.

                                                                                                                                    The NYT Connections puzzle is solved now. That’s definitely a new capability compared to the previous generation.

                                                                                                                                    1. 10

                                                                                                                                      The NYT Connections puzzle is solved now

                                                                                                                                      I enjoy that puzzle, but it’s basically a word search. It’s the kind of thing a very simple rule-based system could solve. It doesn’t require any context or understanding. The search space is bounded and small. A non-LLM approach running on a cheap GPU should be able to do it in well under a millisecond.

                                                                                                                                      1. 2

                                                                                                                                        Sure, but one weakness of LLMs has been that they couldn’t solve certain simple structured problems that were trivially solvable via brute force. Removing (some) of those limitations is interesting.

                                                                                                                                        1. 4

                                                                                                                                          I’d expect older LLMs to fail at this kind of problem because of how they do tokenisation. Word searches are hard for a model that doesn’t know that words are made of letters. That’s the same reason it can’t count the number of Rs in strawberry.

                                                                                                                                          1. 1

                                                                                                                                            I think you might be thinking of Strands instead of Connections? Connections is definitely not what I’d think of as a word search, nor the kind of thing that is susceptible to a brute force approach.

                                                                                                                                            1. 4

Yes, you’re right, I got them mixed up. Connections is the one where there are 16 words and you have to select four groups of four that are related in some way. That’s a bit harder to win with a rule-based system, but it’s something I’d expect an LLM to do well, so I’m surprised the early ones didn’t. A lot of the answers are collocations (which four have the same high-probability predecessor or successor?). Others are just things that should be nearby in any similarity space for more classic NLP techniques, which should be captured in an LLM’s latent space. They’re often hard for non-US humans because they contain a lot of cultural bias, but that means they’re probably easier for something trained on US newspapers. They also repeat approaches a lot, so if an LLM is trained on a few hundred of them then it can probably just learn the patterns that the creators use.

                                                                                                                                      2. 9

                                                                                                                                        The NYT Connections puzzle is solved now. That’s definitely a new capability compared to the previous generation.

                                                                                                                                        This sort of spot check is no substitute for good research. It only raises more questions— is it solved because the models have improved or because they updated the training data? Have the models also “lost” other capabilities? If so, how do we compare them?

                                                                                                                                      3. 4

(I’m leaning towards never, because I personally believe that, by then, oil will be rationed and declining, renewables will not compensate, and nobody will have ramped up nuclear enough. No-one will have the resources to do AI. But this is off-topic for Lobste.rs)

                                                                                                                                        Why would you expect renewables to not compensate? Solar buildout has been smashing every projection and some experts estimate we’re past peak emissions per year.

                                                                                                                                        1. 3

If experts are correct, and 2020 was peak all-oil[1], renewables will have to fulfill an exponentially growing energy demand, and compensate for a linearly declining fossil-energy supply.

Ramping up renewable energy production has, for now and the foreseeable future, relied heavily on fossil fuels. Mining silicon for solar panels and iron ore for windmills relies on off-grid heavy machinery running on petroleum. Raw ore is transported across the globe to refineries by trucks and by cargo ships fed with heavy oil. Natural gas is used to melt steel when manufacturing windmills, and coal is the main source of energy for manufacturing solar panels in Asia at our current pace.

We will have to at least double the rate at which we build out renewable energy production, while we’re electrifying most of our infrastructure (cars, home heating, …), and while the source of energy used to expand that renewable production is itself declining.

I don’t see any way this will scale at all.

We’re entering uncharted waters, and I haven’t even mentioned that most mining is at maximum capacity (copper and zinc, which will be the basis of electrification, as well as phosphate, which is needed to produce fertilizer for biofuels).

                                                                                                                                          [1] This is as opposed to 2008, which was called “peak oil” at the time, but was actually “peak conventional-oil”. Fracking and oil sands allowed the world to keep growing its oil production after 2008.

                                                                                                                                          1. 3

If experts are correct, and 2020 was peak all-oil[1], renewables will have to fulfill an exponentially growing energy demand, and compensate for a linearly declining fossil-energy supply.

There’s good news and bad news about the “peak all-oil” claim. The good news is that the claim was about peak oil demand. While supply was healthy, the rapid transition to electric vehicles and renewable power meant that supposedly we’d never need as much oil again as we did in 2020.

                                                                                                                                            The bad news is that might have been a COVID thing, and experts are now predicting peak demand to be somewhere in 2030-2035.

                                                                                                                                            (Note that oil demand can go up and emissions can go down: the worst contributor to CO2 is coal.)

                                                                                                                                            1. 2

an exponentially growing energy demand

                                                                                                                                              Are you using “exponentially” rhetorically or literally here?

                                                                                                                                              The same people pushing GenAI tend to fret about a coming “population collapse”, so energy demand should at most scale linearly.

                                                                                                                                              Energy supply won’t stop suddenly. Instead prices will rise, leading to incentives to increase efficiency and when it comes to crunch time, rationing.

                                                                                                                                              1. 2

                                                                                                                                                The same people pushing GenAI tend to fret about a coming “population collapse”, so energy demand should at most scale linearly.

You have billions of people who want their first smartphone and their first TV. And I’m not blaming them; I would want the same in their situation. Many of them live in hot climates and don’t even have a fridge, so they’ll also, rightfully, want one before the phone and the TV. And, with climate change, many partially and fully developed countries will also need widespread air conditioning. This won’t be linear.

I don’t believe in the population collapse. I can’t find it now, but Hans Rosling gave a great talk about that. He argued that population growth will be curbed by educating young girls in underdeveloped countries and making them independent members of the workforce. I agree with his analysis.

                                                                                                                                                Energy supply won’t stop suddenly. Instead prices will rise, leading to incentives to increase efficiency and when it comes to crunch time, rationing.

I never said that energy supply will stop. I was trying to say it will be heavily constrained. I believe that efficiency is a fallacy because of the rebound effect. A good example of this is 5G. It is much more energy efficient, per byte, than 3G. But now, instead of reading Facebook posts on their phones, people are streaming YouTube videos. So energy consumption still doubled, even though there were efficiency gains (if energy per byte drops tenfold while traffic grows twentyfold, total consumption doubles).

I could go on and on about energy price elasticity and plasticity, which, IMHO, will lead to a lowering of living standards for the middle and lower classes of the global north. But I’ll stop there, we’re already wayyy off topic :).

                                                                                                                                          2. 3

(I’m leaning towards never, because I personally believe that, by then, oil will be rationed and declining, renewables will not compensate, and nobody will have ramped up nuclear enough. No-one will have the resources to do AI. But this is off-topic for Lobste.rs)

This is a very rational take, and I share the sentiment. I mentioned in another comment months ago that human progress has always been linked with a massive energy surplus. It seems to be a fixed rule. I’m sensing a new “dotcom bubble” brewing there, just related to AI.

                                                                                                                                        2. 86

For anyone who just sees the headline or comments, the issue is not that OpenAI funded the benchmark. In a perfect world, they wouldn’t even fund it, but sources of funding for this sort of thing might be scarce. If they had funded the project and kept themselves at arm’s length, that would be ok.

                                                                                                                                          The real problem is that they funded it, signed an agreement to keep the funding secret, got access to many (or even all–the details are unclear[0]) of the questions and answers, and made a “verbal agreement” not to train on them.

                                                                                                                                          Another reminder that we’re dealing with someone who is…“not consistently candid.”

[0] A holdout set has been mentioned, but on Reddit, an EpochAI employee implied that the holdout set does not yet exist; it is merely in the process of being created.

                                                                                                                                          1. 10

This is a project that a friend of mine, Will (@savage), has been working on. The basic idea is to obfuscate HTML adversarially and decode it with CSS, so that bots can’t see the content of the page. I invited him to Lobsters a little while ago, so if anyone has any technical questions I’ll let him know so he can answer them. I think this post also links to a demo if anyone’s curious to see how it works.

I posted this because it’s also interesting to compare with Anubis (also being discussed). Both try to use browser features to separate the way people vs. bots use browsers: Anubis primarily through compute with JS, and Versara primarily through display with CSS. Someone also mentioned hiding a “ban-me link” in the footer or in ‘robots.txt’. I wonder whether, over time, it will become impossible to find a website on the clear web that isn’t obfuscated or otherwise protected from scrapers.

                                                                                                                                            1. 15

                                                                                                                                              Does that interfere with screen readers?

                                                                                                                                              1. 7

I tried reading the page with VoiceOver on macOS and it didn’t read any of the garbage.

The way the markup works is that each word in the article is wrapped in a span with a UUID id. The stylesheet applies display:none to the garbage span IDs. Screen readers will not read semantically-hidden elements like this, but that also means it would be relatively trivial for a headless client to apply CSS rules to the DOM and query for element visibility. (Even without a headless browser, it’d be easy to special-case this with a Python script using BeautifulSoup or similar.) The article’s claim that OCR would be required to read the article is wrong.
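
For what it’s worth, here is roughly what that special-casing could look like, assuming the markup and stylesheet behave exactly as described above (word-level spans with UUID ids, junk hidden via display:none rules). It’s a minimal sketch, not an actual Versara bypass tool: the function name visible_text() is mine, and it assumes the page’s HTML and stylesheet text are already in hand (e.g. dumped from a browser, since the page needs JS to populate itself).

```python
# Minimal sketch: drop any element whose id is targeted by a display:none
# rule, then read out the remaining text. Assumes the obfuscation works as
# described above.
import re
from bs4 import BeautifulSoup

def visible_text(html: str, css: str) -> str:
    # Collect every #id selector whose rule body sets display:none.
    hidden_ids = set()
    for selectors, body in re.findall(r"([^{}]+)\{([^}]*)\}", css):
        if re.search(r"display\s*:\s*none", body):
            for sel in selectors.split(","):
                sel = sel.strip()
                if sel.startswith("#"):
                    hidden_ids.add(sel[1:])

    # Remove the hidden spans and join whatever text is left.
    soup = BeautifulSoup(html, "html.parser")
    for el in soup.find_all(id=True):
        if el["id"] in hidden_ids:
            el.decompose()
    return " ".join(soup.get_text(" ", strip=True).split())
```

A scraper running a headless browser could skip the CSS parsing entirely and just query each element’s computed visibility, which is even less effort.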

                                                                                                                                                However the page does require JS to work. You get an empty, white page without JS enabled. (Screen reader also reports the page is empty without JS enabled.)

If they were to do less semantic obfuscation (e.g. making garbage text very small and the same colour as the background) then screen readers would suffer.

                                                                                                                                                1. 2

Re-reading the article, I agree with your criticism of our claim about OCR. What we were trying to convey is that OCR is the only way for an automated AI crawler to be completely sure that it’s getting the correct content. The point of Versara is not to stop somebody from special-casing your site. If somebody has the desire to do that, they’ll eventually be successful no matter what. Our goal is to protect content from being scraped by OpenAI/Perplexity-scale crawlers, which aren’t special-casing every website.

We have been experimenting with the strategy you mention of rendering garbage text in a non-visible way, but, as you say, it comes with accessibility issues. We’ll probably leave that decision to users.

                                                                                                                                                  1. 2

That’s a fair goal. @sknebel pointed out below how VoiceOver reads individual span elements in a page as lists of words rather than flowing text. This is the case with your example page. There’s a Stack Overflow question about fixing this which suggests using role="text" on the element containing the spans, but this didn’t help when I tried it in FF/Chrome.

                                                                                                                                                    Could you get most of the benefit of your approach by putting junk content on either end of the real content, rather than interspersing it? That way it’d be less likely to cause accessibility problems.

                                                                                                                                                2. 5

                                                                                                                                                  My intuition is that either it does, or it (at best) only temporarily blocks scrapers. If a tool can read the content without a problem, then AI labs could use a similar tool.

                                                                                                                                                  1. 3

I remember text being split into small <span>s causing some trouble with Apple VoiceOver; other screen readers will at least read continuous visible spans as flowing text, but I’m not 100% sure that generalizes to invisible gaps. Certainly something that should be tested before promoting it as a solution.

                                                                                                                                                    1. 3

You’re right: VoiceOver reads the page’s paragraphs as individual words, and you have to manually advance through each word, so it’s not a good reading experience.

                                                                                                                                                  2. 7

                                                                                                                                                    The example page remains white without JS, so “JS and CSS” seems more accurate? I’d also expect a scraper that goes to the trouble of running a headless browser to take visibility into account… OTOH a lot of this stuff seems to be built so shoddily that I might be overestimating them.