There is a regression in clang 19 (in fact in LLVM) that made the Python interpreter slower than with clang 18 by about 10% on some systems, and the 15% performance improvement reported for the new tail-call interpreter comes mostly from working around this regression. Once the LLVM regression is fixed, the non-tail-call Python interpreter speed gets back to normal, and the tail-call version only improves performance by 5%. (These numbers are ballpark; they depend on the benchmark, the architecture, and…)
The clang 19 bug was found and reported in August 2024 by Mikulas Patocka, who develops a hobby programming language with an interpreter. In the issue thread, several other hobbyist language implementors chime in, well before the impact on Python (and basically all other interpreted languages) is identified. This is another example of how hobbyist language implementors help us move forward.
The LLVM regression was caused by another bugfix, one that solves a quadratic blowup in compile time for some programs using computed gotos. This is an example of how tricky compiler development is.
The LLVM regression was fixed by DianQK, who contributes to both LLVM and rustc, by reading the GCC source code, which had already identified this potential regression and explicitly points out that one has to be careful with computed gotos as used in threaded bytecode interpreters. This is an example of how GCC/LLVM cross-fertilization remains relevant to compiler development and the broader open source ecosystem, even to this day.
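For readers who have not met the technique: a “computed goto” interpreter keeps a table of label addresses and ends each opcode handler with its own indirect jump, instead of looping back to one central switch, which gives each dispatch site its own branch-predictor state. A minimal sketch in C (GCC/clang “labels as values” extension, toy opcodes of my own invention, nothing to do with CPython’s actual loop):

#include <stdio.h>

enum { OP_INCR, OP_DECR, OP_HALT };

/* Threaded dispatch with computed gotos: every opcode handler ends
   with its own indirect jump through the dispatch table instead of
   returning to a central switch statement. */
static int run(const unsigned char *pc) {
    static void *dispatch[] = { &&op_incr, &&op_decr, &&op_halt };
    int acc = 0;
    goto *dispatch[*pc++];
op_incr:
    acc++;
    goto *dispatch[*pc++];
op_decr:
    acc--;
    goto *dispatch[*pc++];
op_halt:
    return acc;
}

int main(void) {
    const unsigned char prog[] = { OP_INCR, OP_INCR, OP_DECR, OP_HALT };
    printf("%d\n", run(prog));   /* prints 1 */
    return 0;
}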
I found the Privacy Notice rather clear. It lists several distinct uses of data, and clarifies which use involves which sorts of data. (In particular, the data I’m most worried about is “Content”, the stuff I type into Firefox; but most uses discussed in the Privacy Notice are restricted in the data they cover and do not include my Content.)
They also seem to be careful about which data remains on the device and which is shared with their servers; for example, being explicit that some (opt-in) AI features are implemented with a small language model that runs locally is a nice touch.
This privacy notice gives me the impression that the Firefox people still mean well – these terms of use do not look like a secret plot to siphon all my content away and store it on their servers, for vague model-training purposes that they haven’t completely thought through yet.
This means the data stays on your device and is not sent to Mozilla’s servers unless it says otherwise in this Notice.
So unless I have a diff bot watching this notice for changes, at any moment Mozilla can gain additional privileges relative to my data, and I’ll be none the wiser. It’s a rather large escape hatch for them.
Typical companies offering online services write to their users when terms of service change, and I would expect Mozilla to do the same if it changes its privacy notice. You may be assuming that Firefox is acting in bad faith, or is likely to act in bad faith in the future. I’m personally still willing to believe that Mozilla would not outright lie or try to trick us with this – the fact that this Privacy Notice is generally reasonable and displays a care for details that I think are important (such as whether the data is processed locally or on their servers) tends to support my assessment that they are trying to do things right.
It is of course entirely possible that Mozilla becomes adversarial at some later point in the future (I have disagreed with some of their decisions in the past), but I have the impression that the amount of risk I am tolerating by still using their product is okay, until I learn in the news that they sneakily changed their Privacy Notice to do something bad.
I’m not of the opinion that Mozilla is acting in bad faith. My concern is that their terms and privacy notice now give them considerable latitude to do so at any time in the future. In other words, I have no faith in institutions remaining “good” or neutral. Google once believed in “don’t be evil”, and Mozilla once believed in user autonomy and privacy. I believe that buyouts happen, leadership changes, and values shift over time. And these changes to the terms are perhaps an indication of such a shift.
In short, my feeling is that these terms open the door to malicious action made legal by their breadth and malleability. And I think where you and I differ is our estimation of that risk.
I went back to the Privacy Notice and it says, in the Changes section:
We may need to change this policy and our notices, in which case the updates will be posted online and we will update the effective date of this notice. If the changes are substantive, we will also announce the update more prominently through Mozilla’s usual channels for such announcements, such as blog posts and forums.
I can’t recall a notification of terms/policy change that has enumerated actual changes. I just had a look through my mail archive at messages matching “updated our” (there were a lot of matches).
None of them list what the material changes are, and simply link to the new policy document. The ones that make a feeble attempt at enumerating the changes make vague statements like “made more readable”, “explanations of data we collect”. Which, to be fair to your argument, is what you want to know – that a change has been made.
If these notices came with a diff, I’d be satisfied.
This is also more robust: if the user comes back and edits the title to something else, the link will still work. It also removes the requirement that slugs be injective – there is no issue if two different posts have the same slug.
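Concretely, the scheme I have in mind is something like /posts/<id>-<slug>, where only the numeric id is used for the lookup and the slug is purely cosmetic. A toy sketch in C (hypothetical paths, not any particular site’s routing code):

#include <stdio.h>

/* Resolve a permalink of the (hypothetical) form /posts/<id>-<slug>
   by its numeric id only; the slug is never used for lookup, so
   edited titles and duplicate slugs cannot break old links. */
static long post_id_from_path(const char *path) {
    long id;
    if (sscanf(path, "/posts/%ld", &id) == 1)
        return id;
    return -1;   /* not a post permalink */
}

int main(void) {
    printf("%ld\n", post_id_from_path("/posts/123-original-title"));
    printf("%ld\n", post_id_from_path("/posts/123-edited-title"));   /* same post */
    return 0;
}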
The presentation is impressive; it looks like a personal project that has been pushed very far towards being a usable language – and that is a lot of work.
This said, I must say that I feel skeptical that the basic idea works. It is the same basic idea that most people have when they first think of linear logic for programming languages: if we preserve complete linearity we never need a GC. Well, that is true, but the cost is adding lots of copies, everywhere, and that is going to be much slower than not copying; the natural next step is to optimize copying away by allowing sharing of values, and for this you introduce reference counting or a tracing GC.
Now the authors of this language obviously made the bet that the naive idea, where we do copy all over the place, was worth pushing to the max to get a usable programming language. Have they found a good reason why this is not an issue in their setting, or have they just been lucky so far and not encountered this deal-breaking issue yet?
I would naively expect the following to hold:
Most semi-complex programs in their language have unpredictable performance because of the copies happening in various places.
Dealing with data structures that contain sharing, for example graphs, is going to be a pain because the language data model does not support it. (No aliasing!) The replacement is to use integers (which are duplicable) as fake pointers, for example store all graph nodes in a big array and use indices as indirection. This makes programming more difficult, and in some cases ends up implementing manual garbage-collection strategies or accepting a large memory overhead compared to approaches that use some form of garbage collection.
The code examples in the repository are all small snippets with tree-shaped data structures without sharing, so they don’t help much in answering these questions. How would the authors implement, say, a union-find data structure, as is often used to store type nodes in type-inference engines (to implement efficient unification)?
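For reference, this is the conventional array-index encoding I have in mind, in plain C (not their language): nodes are just indices into a parent array, so the “pointers” are duplicable integers and all the sharing and reclamation is managed by hand.

#include <stdio.h>

#define N 8

/* Union-find where nodes are indices into a parent array: the
   "pointers" are plain integers, trivially duplicable, but any
   sharing is managed entirely by hand. */
int parent[N];

void uf_init(void) {
    for (int i = 0; i < N; i++)
        parent[i] = i;
}

int uf_find(int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];   /* path halving */
        x = parent[x];
    }
    return x;
}

void uf_union(int x, int y) {
    parent[uf_find(x)] = uf_find(y);
}

int main(void) {
    uf_init();
    uf_union(1, 2);
    uf_union(2, 5);
    printf("%d\n", uf_find(1) == uf_find(5));   /* prints 1 */
    return 0;
}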
My feedback for the authors would be to have some documentation on their website of what they think about this category of issues. For now I don’t know whether they are unaware of it, or whether they know of a solution (or a reason why this isn’t really a problem for them).
I think it is talking about this, except that instead of saying “we think this won’t be an issue in practice”, the page is more like “here is a bunch of more complex stuff you can use when this issue occurs”. (The more complex stuff uses words I had never heard, “noema” (which is more or less defined) and “hyle” (for which I didn’t find any definition), but it is an implementation of local borrowing as presented, for example, in Wadler’s 1992 “Linear types can change the world” paper, the let! construct.)
Personally I worry more about the cost (and not just energy cost) of training models than the cost of inference, especially for models where inference can be run from a laptop – a single inference is not going to consume much energy, because laptops are reasonably low-power tools. Fine-tuning an existing model looks to be feasible in a reasonable amount of compute (and thus energy), but training a new foundational model from scratch is counted in the TWh range these days, has been increasing rapidly in the last few years, and is something that all large tech companies (and all technologically advanced states, all militaries, etc.) are going to want to learn to do to remain technologically independent and avoid poisoning issues that nobody knows how to detect.
Said otherwise: using an LLM is not that expensive energy-wise. But collectively getting used to depending on LLMs more and more gives money and power to entities that can train new ones, which incentivizes a lot more training, and that is expensive.
I’m not sure how to estimate this total training energy cost – as previously discussed, the easiest estimate I could think of is to look at data-center GPU sales to estimate the global (training + inference) power usage – and we are looking at millions of new units sold each year, which translate into several GW (several nuclear plants worth) of instantaneous consumption – and the current trend is that this is going up, fast.
If you read through the cited article by Husom et al. you’ll see that a laptop actually uses more energy than a better specced workstation or server. I would guess the main reason is that it takes longer to process and thus stays at an increased power draw for longer.
Also consider the fact that training a model is a one-off energy cost, that gets less severe the more the model is used. For inference, the opposite is true.
About laptops, I think we agree. I pointed out that if we can reasonably run inference from a laptop, then that means a single inference is reasonably cheap – even cheaper on specialized hardware.
Training may be a one-off energy cost for a given model, but that does not mean that it is small. From the post that we are commenting on, it looks like training one model is on the order of 10^12 times more energy-consuming than running one user query (ChatGPT: 1.3 TWh of training, 2.7 Wh per query). Do most models actually get a thousand billion queries over their lifetime, before they are dropped for a better model? ChatGPT appears to serve one billion queries per day these days; at this rate they would need to run the same model for 500 days to break even. From a distance it appears that they are not breaking even, they change the foundational models more often than every 500 days – but maybe they avoid retraining them from scratch each time? And this is going to be much worse the more entities start training their own foundational model (there are already more companies doing this than just OpenAI, including those we don’t know about), as they will each get a fraction of the userbase.
Okay, then it seems like we mostly agree. All in all, it is very difficult when the only information one has are estimations like I made in this post, and not any actual information. “Open” AI really is a misnomer.
Doesn’t this analysis show that training cost is still not that important? Even if they only use a model for 100 days, that means that training cost is 5x inference cost, but inference cost is tiny, and 6x tiny is still tiny. If, say, drinking an extra cup of coffee costs a comparable amount of energy as typical LLM use including training, that seems entirely reasonable.
I suspect that future growth is fairly well limited by the payoff it is bringing us. Investors will bet a couple of billion dollars on something that may not pay off, but it seems unlikely that they’d invest several trillion without solid evidence that it will pay off in terms of value generated.
I suspect that future growth is fairly well limited by the payoff it is bringing us.
The problem is that we don’t understand yet what the payoff is, and the natural response in this case is to experiment like crazy. The LLM boom came from the fact that we got substantial quality improvements by making models much larger and training them a lot more. The next thing to try of course is to give them a lot more data again (scraped all the textual web? maybe we could train them on all of Netflix this time!), and to train them a lot more again. There is a very real risk that, three years from now, half the R&D budget in the world will be spent powering GPUs to do matrix multiplications. (It would also be bad for the environment to spend the same amount roasting and grinding coffee beans, but that does not make me feel particularly better about it.) It is of course possible that this risk-taking pays off and we are ushered into a new world of… (what? robot-assisted comfortable living for all?), but it is also possible that current techniques reach a plateau and we only notice after burning way too much fuel on them.
The payoff is also not an objective metric; it depends in large part on how people view these things. As long as people are convinced that LLMs are godlike powers of infinite truth, they are going to pay for them, and the payoff will be estimated as fairly large. If we try to have a more balanced evaluation of the benefits and the costs of those tools, we might take the rosy glasses off (low confidence here, some people still believe in the blockchain), and the payoff – and incentive to spend energy on it – will be smaller.
Sometimes it looks like the only thing that can force people to reduce pollution is not data, information or basic common sense, but just energy becoming fucking expensive. I suspect that this would definitely make people behave more rationally about LLMs.
Blockchain was never meaningfully useful; it was (and is) a speculative bubble. LLMs already are useful, and millions of people use them. Spending a lot of R&D money on it is entirely justified, and this R&D is mostly privately funded anyway. The glasses aren’t at all too rosy, in my view, rather the opposite. Sure, there are AI hypers that have highly inflated expectations, but most people underestimate what LLMs can do.
Even the basic ability to have a conversation with your computer would have been considered absolute sci-fi 5 years ago. These things type out hundreds of lines of code in a few seconds, often without making any mistakes, without ever even running that code. When I gave o3-mini-high some math puzzle I came up with, it solved it in a minute. Out of colleagues and students, only one was able to solve it. Maybe it existed on the internet somewhere, and maybe it was just a glorified fuzzy lookup table, but fuzzy lookup tables are very useful too (e.g., Google).
Yesterday I “wrote” 98% of the code of a UI for some decision procedure that I was working on by literally talking to my computer out loud for an hour. It compiled the Rust decision procedure to WASM, set up a JS api for it, created the interface, wrote a parser, generated examples based on the Rust unit tests, etc. I wrote maybe 20 lines total myself. Heck, I haven’t even read 80% of the code it wrote.
This is all for some infinitesimally small fraction of the energy use of various frivolous things people regularly do. This seems to me like getting a functional magic wand from the Harry Potter world and then complaining that a branch of a tree had to be cut off to get the wood. Yes, sure, we don’t want to cut down the entire Amazon rainforest to make magic wands, nor may the magic wand solve all of our problems. But the more pertinent fact here is that it does magic, not that a branch was cut off from a tree.
I don’t disagree. I should clarify that my worry is not, per se, that people invest too much effort in LLMs – I think of them as the alchemy of the modern times, a pre-scientific realm of misunderstandings and wonders, and I find it perfectly understandable that people would be fascinated and decide to study them or experiment with them. My worry is that this interest translates into very large amounts of energy consumption (again: half the R&D money could be spent powering machine learning hardware), as long as the dominant way to play with neural networks is to try to make them bigger. Most sectors of human activity have been trying to reduce energy consumption in the face of expanded usage, but the current AI boom is a sector that has been increasing its energy needs sharply, and shows no signs of stopping. The availability of clean-ish energy has not scaled up to match that demand.
I do also think that there are unreasonable expectations about AI in general – people turning to neural networks for tasks that could reasonably be done without them, especially in industry. For instance a friend working in a garbage-sorting plant was telling me about futile attempts by the company to train neural networks to classify garbage from a video feed, which were vastly underperforming simple heuristics that were already in place.
The next thing to try of course is to give them a lot more data again (scraped all the textual web?
I think this isn’t the case.
My layman understanding of the consensus is that people agree that there are limits for scaling based on input data. We maybe aren’t there yet (new data sets such as video are probably still ripe) but I don’t think we’re far, personally.
DeepSeek’s whole thing is that it’s a much smaller, faster model, with competitive performance to vastly larger ones - all because they changed the training process.
We’ve already observed optimization on model sizes, training costs, etc, which is unsurprising since companies are incentivized to do this (faster training == faster iteration == money).
energy becoming fucking expensive.
This is tricky from a policy perspective (raising the cost of all energy consumption seems sort of obviously bad, but we’ve seen things like “off-grid hours” work, etc) but regardless energy costs are, of course, already passed on to consumers. Perhaps VC money is making the difference right now, but ultimately consumers eat the costs of electricity with a margin, otherwise companies wouldn’t be able to stay up and running.
It is one-off, but I think it’s a bit misleading - the commenter above mentions this exactly: more and more entities are training more and more (and more powerful) models. So while it is a one-off cost, it’s also constant.
using an LLM is not that expensive energy-wise. But collectively getting used to depending on LLMs more and more gives money and power to entities that can train new ones, which incentivizes a lot more training, and that is expensive.
You’re not wrong but that’s how pretty much everything works. Isn’t it disingenuous to focus on LLMs in this regard?
I understand that it’s easier as it’s the new thing, but we should do environmental impact analysis on everything else, too. How much do we spend making video games? How much more do we spend playing them? How about the whole of TV? Movies? Books? Fast fashion? Cars? Housing? Food? I haven’t seen much backlash on environmental-impact grounds against anything else the way I have with LLMs.
And that makes a valid argument in isolation feel like another attack blown out of proportion. Sure, training data set is dubiously sourced. Sure, there are all sorts of moral questions about the use of LLM output. Let’s tack on environmental issues as well, why not? And let’s not bring it up for anything else to make it look like LLMs alone will drain all our water and use all the energy. Rings a bit hollow.
You’re not wrong but that’s how pretty much everything works. Isn’t it disingenuous to focus on LLMs in this regard?
The specificity of LLMs is that they are a new technology that is exploding in popularity and is highly energy-consuming. This is not unique but it is still fairly rare. People were also very concerned about the energy cost of blockchains (and still are), which were in a similar situation, at least when proof-of-work was the only scheme widely deployed. Of course computing-focused websites like Lobsters are going to discuss the energy impact (and environmental impact) of computer-adjacent technologies much more than non-computer-related things like cars and housing – this is the area that we have some expertise in and some agency over. Digital technology is also one of the main drivers of increased energy consumption and pollution these days – right now it is comparable to transportation, but growing fast every year, while the impact of most other areas of human activity is increasing more slowly, staying stable, or even decreasing. (This could of course become much worse with a few bad policy decisions.)
How about the whole of TV? Movies? Books? Fast fashion? Cars? Housing? Food? I haven’t seen much backlash on environmental-impact grounds against anything else the way I have with LLMs.
This reads like whataboutism. I don’t know what your cultural and political context is, but around me there has been a lot of discussion of the impact of cars (why are people pushing for electric cars?), housing (entire laws have been passed, for example to force landlords to improve thermal insulation), food (why do you think the proportion of flexitarian / vegetarian / vegan people is growing fast in western countries?), planes (why are people talking about reducing flights to limit their carbon impact?), etc. You may live in a different environment where these discussions have been less visible, or have personally decided not to care about this and not get informed on the matter.
And that makes a valid argument in isolation feel like another attack blown out of proportion.
More facts, less feelings, please. If you can demonstrate that the energy/pollution costs of LLMs are negligible compared to other reasonable causes of concern, that would be a very valid way to say that this is out of proportion. We could have more, better data about this, but there is already enough available to make ballpark estimates and make an informed decision. (For example, see this December 2024 article reporting that US data-center energy consumption tripled over the last decade, consumed 4.4% of total US electricity in 2023, and that this proportion is expected to rise to 7%–12% by 2028.)
If you can demonstrate that the energy/pollution costs of LLMs are negligible compared to other reasonable causes of concern, that would be a very valid way to say that this is out of proportion.
Well, that’s exactly the point I’m trying to make. Environmental-impact concerns around LLMs are never given in any context relating them to other things. Sure, the data is out there. I can bust out a spreadsheet and put in all the data and crunch the numbers and find the cost of my LLM query in burgers, ICE miles, books, TV minutes, or—like in the OP—shower minutes.
The issue in the discourse quality I’m trying to point out is that I have to go out of my way and do my own research. A valid critique of LLMs is rarely if ever put into an understandable context. And this is why I think it largely has no impact. What you call whataboutism is better reframed as putting things in context. Is 1.5 l of water for my LLM query a lot? An average person doesn’t know what to do with that number. They have nothing to relate it to. Maybe an hour of TV in the background is worse? Maybe an hour of playing a game on their maxed-out gaming PC is better? How would they know?
The false comparison with search engines in this discussion bothers me. If you look at the four principles of fair use doctrine in US law, search engines pass all of them easily and uncontroversially, whereas it requires sophistry and hair-splitting to make an LLM pass them.
Seems plausible, but I would like to learn more. Do you have a favorite article that elaborates on your point? (As one counter-example, the Association for Research Libraries disagrees in a January 23, 2024 article, “Training Generative AI Models on Copyrighted Works Is Fair Use”: https://www.arl.org/blog/training-generative-ai-models-on-copyrighted-works-is-fair-use/)
Intuitively, using LLMs for research seems to fit the definition of a public good. But using an LLM to generate some derivative work without crediting (much less compensating) the original author seems wrong. My point: this is non-obvious to me at present.
I have no vested interest one way or the other. I am not “on any side” here. I’m just seeking understanding of the issues.
A search engine is really straightforwardly fair use. It doesn’t reproduce the core of the work, it doesn’t commercially compete…very straightforward. LLMs are much less clear. If an LLM can be used to reproduce works in the style of an artist then commercial competition and the essence of a work come into play. If an academic uses it as a means of musical analysis, then it’s noncommercial and noncompetitive and fair use is plausible. If Spotify generates music from an LLM to be streamed instead of the music it was trained on then it gets into far less clear territory, and raises questions about copyright terms. Someone using an LLM to write business emails is on clearer ground because those emails will probably get written either way and there probably isn’t a large corpus of internal business emails in training sets. A programmer is on shakier ground. If there is a lot of GPL code in the training set then it’s very unclear that producing commercial works using that LLM is possible without the commercial work being GPL’d as well.
I tried to find justifications in this article because I find their conclusion rather surprising. (It has been shown that AI models can regurgitate parts of their training dataset essentially unchanged, which is obviously problematic from a copyright perspective when said training data is copyrighted material.) They point to a document for more details, Library Copyright Alliance Principles for Copyright and Artificial Intelligence (1-page PDF), and I find the arguments interesting and surprising. Let me quote them, with comments.
Based on well-established precedent, the ingestion of copyrighted works to create large language models or other AI training databases generally is a fair use.
Because tens—if not hundreds—of millions of works are ingested to create an LLM, the contribution of any one work to the operation of the LLM is de minimis; accordingly, remuneration for ingestion is neither appropriate nor feasible.
I would find this reasoning convincing if the LLM compressed the ingested documents in a form that does not allow retrieving them in recognizable form. But LLMs have billions of parameters, so they have enough internal data to store the copyrighted works in a barely-compressed way. To make a weird comparison, libgen ingests millions of books, so one could claim that the contribution of any one work is de minimis and therefore fair use?
Further, copyright owners can employ technical means such as the Robots Exclusion Protocol to prevent their works from being used to train AIs.
Do they? Do people seriously believe that AI crawlers respect data-mining best practices and do not collect unwanted data? How does this argument apply to the millions of copyrighted works that were crawled before the authors realized that LLMs exist and could consider excluding their work from training?
If an AI produces a work that is substantially similar in protected expression to a work that was ingested by the AI, that new work infringes the copyright in the original work
This makes the argument that it’s okay for trainers to use copyrighted work, but that it is the responsibility of the user of the LLM to make sure that they are not infringing copyright when they reuse its output. This is an interesting argument, but the proposal seems basically inapplicable in practice: how could the user of the LLM, who observes an output without any form of citation or attribution, possibly know whether this output infringes on someone’s copyright?
Applying traditional principles of human authorship, a work that is generated by an AI might be copyrightable if the prompts provided by the user sufficiently controlled the AI such that the resulting work as a whole constituted an original work of human authorship.
People like the OP often assume that using more energy automatically means more pollution and environmental damage, but it’s because they are thinking about coal plants.
What really matters is how we generate that energy, not how much we use. Take countries that rely almost entirely on hydroelectric power, where turbines harness the natural force of falling water, or those powered by nuclear plants. In these cases, if we don’t use the available energy, it simply goes to waste. The energy will be generated whether we use it or not.
So when we’re discussing the ethics of training LLMs, their energy consumption has nothing to do with being ethical or unethical. The nature of the energy sources is a different topic of conversation, regardless of how much energy they use.
What really matters is how we generate that energy, not how much we use. Take countries that rely almost entirely on hydroelectric power, where turbines harness the natural force of falling water, or those powered by nuclear plants. In these cases, if we don’t use the available energy, it simply goes to waste. The energy will be generated whether we use it or not.
Okay, but (say) the OpenAI data centers are located in the US, and we know the energy mix in the US. For example this webpage of the U.S. Energy Information Administration currently states:
In 2023, utility-scale electric power plants that burned coal, natural gas, or petroleum were the source of about 60% of total annual U.S. utility-scale electricity net generation.
And even climate friendly alternatives like hydro require damming up rivers and destroying rivers, windmills require paving new roads into the wilderness and introduce constant noise and visual pollution, solar requires land and mined minerals, nuclear … well ok nuclear is probably fine, but it’s expensive. Not that coal is better than all these, but focus on energy consumption is absolutely warranted as long as there is no single climate and nature-friendly power source that is available everywhere.
(In Norway there has been massive protests against windmills because they destroy so much wild land, and politicians are saying “yes but we must prepare for the AI future, also Teslas, so turning our forests paving-gray is actually the green thing to do also we got some money from those foreign investors which we can make half a kindergarten out of”. And that in a country where we already export much of our electricity and the rest is mostly used for the aluminum industry precisely because power is already fairly cheap here due to having dammed most of our rivers long ago.)
I agree with much of this comment, but I have two quibbles:
hydro require damming up rivers and destroying rivers
One can draw off only a fraction of the water in a river and feed it to a turbine with or without a dam, although of course there are trade-offs.
solar requires land and mined minerals, nuclear … well ok nuclear is probably fine
Nuclear power, at least in practice, implies mining minerals too, specifically radioactive ones (on top of the mining of iron, copper, etc. implied by any electric infrastructure).
The listed figures are high, and 500 MWh is about the cost of a large jet flying for 7 hours. But we do it once per foundational model, and then the cost is spread across all the remaining usage.
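(As a rough sanity check on that comparison, with my own ballpark numbers: a long-haul wide-body burns on the order of 7 t of kerosene per hour, and jet fuel holds roughly 12 kWh per kg, so 7 h × 7,000 kg/h × 12 kWh/kg ≈ 590 MWh, which is indeed the claimed order of magnitude.)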
Also one is burning jet fuel and one is using electricity, which could be from a number of different sources. I realize that the conclusion to this section is that this isn’t that big of a deal but I just want to note the asymmetry here.
Where I come down is that I think we need a robust mechanism to opt out of use of data for training
I think this is a nice-to-have but I wonder if it’s really reasonable. You don’t have to put your content on the web. I think it’s a big ask to say that the internet should be sliced up into “you are allowed to view this otherwise public document, and you over there are not”. But I don’t have strong feelings.
And when the hardware to run the biggest models is only accessible to a few companies
Way more than a few, right? I don’t know that there are models that a motivated middle class American couldn’t afford to run, let alone a company.
I am not sure whether or not using LLMs is unethical.
Yeah I haven’t taken a stance. I thought the ethical considerations made here represented what I’ve encountered online. Most claims that AI is unethical seem to me to really just be claims that capitalism is inevitable or something along those lines, and I think then you have to shape your claim as “AI is unethical under capitalism”, which itself is a huge restriction and requires a lot of justification.
I think that the costs of training are grossly under-estimated by taking just one model. Sure, training GPT-4 consumed a massive amount of energy, but still negligible compared to, say, world-scale transportation. But:
For one model that was trained and then publicly talked about, how many other models were trained by the same company, completely or in part, and did not pan out?
We have heard of a few models trained from scratch, maybe 10 or 20. But how many other “foundational” models are being trained today by large companies with a deep budget that want to get into the race?
Nvidia is now one of the richest companies in the world because everyone is buying GPUs. What are these GPUs going to do, if not being plugged into a power input and then work on AI stuff? We are talking about a massive amount of GPUs, whose total power consumption is way above that of training one single model. (It is reported that Nvidia sold 3.76 million data-center GPUs in 2023, probably more in 2024; I attempt a rough multiplication below. See this article with scary estimates of datacenter GPU power consumption.)
There is the energy consumption cost of training, and then there are the energy and material costs of producing the hardware used for training. The latter is much larger than the energy consumption impact in many cases, and it does more harm than just burning energy (rare metals, extremely high water consumption, etc.).
The amount of pollution caused by the current LLM craze is several orders of magnitude more than what you can estimate from training energy alone.
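To put a rough number on the GPU-sales point above (my own multiplication, assuming something like 700 W drawn per data-center GPU, which is in the range of current accelerator TDPs): 3.76 × 10^6 GPUs × 700 W ≈ 2.6 GW, and that is a single year of sales, before counting cooling and the rest of the datacenter built around the GPUs.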
You can make these same claims about a 7 hour flight. How many engineers commuted via car to work on the plane? How many customers took cars to get there? How many hours of test flights to train the pilots? How about the transportation of parts and materials, transportation of fuel? etc.
People also buy GPUs for video games. Video games are unethical because of power consumption? Development costs behind them? Is it a matter of degree? Value?
What about the fact that models are getting more efficient, as are GPUs? How do we factor that in? You mention data centers, but data centers are getting much better at conservation of water etc as well. Seemingly at a much faster rate than air travel is improving.
What about the fact that people get value from these models?
You can inflate or deflate these numbers in a million ways, you can make ethical determinations based on a number of properties. I think we’re probably all on board with reducing energy costs here, I know I certainly am.
This also all assumes an ethical model based on quantifying costs, value, and moral responsibilities. What are my responsibilities here as a consumer? What are my responsibilities given unknowns about environmental impact?
People work to estimate the energy and pollution costs of entire industries, including transportation by plane, and there are figures computing estimates for these. (I think some of your questions are a bit disingenuous, for example it’s easy to convince ourselves that people take the car to get to a plane for a much shorter distance than they fly, and therefore pollute much less in the car part.)
Video games are unethical because of power consumption?
It is, as you hint, a matter of comparing costs and benefits. Note also that data-center GPUs are likely to show higher utilization than gaming GPUs. (The article I linked guesstimated utilization at 61%; few gaming computers are used 61% of the time.) Finally, gaming GPUs are on a stable-to-downward trend in terms of sales, while AI GPUs are exploding. This is an entirely new form of tech usage (and benefits, and costs) whose impact is worth discussing.
What about the fact that models are getting more efficient, as are GPUs? How do we factor that in? […] Seemingly at a much faster rate than air travel is improving.
Tech in general, and AI in particular (but consumer tech products are also non-negligible and worth looking at), is consuming a larger and larger fraction of energy and resources every year. For carbon-equivalent pollution, the current estimates are about a 6% annual increase, and they come from before the LLM boom. Plane transportation pollutes less than computers & tech, and its impact increases at a lower pace – I read figures around 2.2% per year for the twenty years before COVID.
Individual efficiency gains in GPUs or datacenters are typically offset by the fact that we replace them often, and by the fact that we use more of them, the famous rebound effect.
What about the fact that people get value from these models?
Sure! But the costs of these new technologies should also be presented clearly, so that people can compare the value they get with the cost that is imposed. People will see the commercial price, but environmental impacts are less directly visible, and it is important to discuss them for new technologies so that people realize that they exist and are growing. (There are of course other forms of externalities around machine learning than just the environment.)
People work to estimate the energy and pollution costs of entire industries, including transportation by plane, and there are figures computing estimates for these.
For sure, I’m fairly familiar with that.
think some of your questions are a bit disingenuous, for example it’s easy to convince ourselves that people take the car to get to a plane for a much shorter distance than they fly, and therefore pollute much less in the car part.
None of my questions are disingenuous, my point was to get across that tertiary impact is really complex and requires huge effort. If we start asking questions about the tertiary impact of AI it’s only reasonable to ask those about the comparison made in the article, that’s all.
It is, as you hint, a matter of comparing costs and benefits.
I wasn’t really hinting at this fwiw, it was one example of how one might begin to form an ethical framework. I’m not a utilitarian so I would argue that it’s about more than cost vs benefit, but I would also likely commit to cost and benefit being critical components too.
whose impact is worth discussing.
Oh yeah, totally. I’d like to see a lot more research done on this and I’d also reallllly like to see legal requirements for data centers to start rapidly adopting cleaner approaches, especially new data centers.
Sure! But the costs of these new technologies should also be presented clearly
I completely agree, I think it should be a legal requirement, in fact.
If we start asking questions about the tertiary impact of AI
The sources of pollution I mentioned are (1) the training of other models than the most well-known ones and (2) the production costs, rather than merely the utilization costs, of the hardware used in this training. I wouldn’t call this “tertiary”, these are not very indirect costs.
Hm. If you’re referring to intermediary models, I agree. If you’re just referring to some random person training a model, or some other company, I don’t agree. But I don’t think it matters - I agree with you that we should incorporate more global, tertiary costs, I’m only advocating that if our starting point is to create a sort of utilitarian comparison to planes that we maintain that symmetry.
It’s hard to make really strong statements here as different sources compute things differently and are tricky to compare. But for example this document claims 2-2.5% of all emissions come from the plane sector, which goes up to 3.5% when you take radiative forcing into account – which we should – and around 2% yearly growth. And this document claims 3.5% of emissions come from digital technologies, with a 6% projected yearly growth. (And it comes from 2021, so they may not have predicted the consumption increase due to the LLM craze, but of course machine learning was already going strong at the time.)
Is that based on data center energy costs or does it include the manufacturing and usage costs for every digital device?
It includes manufacturing and usage costs. As of a few years ago, consumer devices would tend to dominate data-center impact, and manufacturing typically has larger impact than usage, at least in countries with low-carbon energy sources.
So you’re arguing that if some things are bad, it’s fine to do something else bad? The article isn’t about cars or planes, it’s about LLMs. Not to mention that you’ve completely neglected the “powered by plagiarism” aspect.
What about the fact that people get value from these models?
Is the value commensurate with the cost? If you truly believe that, go with god, I guess.
I didn’t make any argument whatsoever, I only attempted to show that if we’re going to take the initial 1:1 comparison and start adding tertiary costs to one side that we should do the same to the other side. The article is also where the plane comparison comes from, not me.
Not to mention that you’ve completely neglected the “powered by plagiarism” aspect.
Neglected? I didn’t address it but I’ve made no argument about LLMs / AI being ethical or not, I stated that I thought it was a good overview of the cases being made currently. My post was not an attempt to rebut any of these arguments presented.
edit: I’ll even state this plainly. I’m agnostic as to whether LLMs are ethical or not, but my personal moral framework chooses a libertarian approach given ambiguity, i.e., given ambiguity I believe that it’s best to default to moral permissibility. If there’s evidence that LLMs are more or less ethical in the future I’ll update that judgment accordingly. Today, I think the strongest evidence for LLMs being unethical is the issue of copyright as well as the environmental impact, but I think we’re too early on for me to make a judgment - I look forward to the various copyright cases moving forward, and I would always advocate for environmental impact to be something we push hard to understand and improve.
We should have open source foundational models per language/geo area trained on all intellectual property available and released for public use. Problem solved.
I’m surprised by the number of people around here that seem to consider that vague documentation that says “we will generate stuff using an AI model on your project” is enough intent warning for a feature that zips the current git repository and sends it, without asking for any form of explicit confirmation, to a remote server.
If I were a manager in a proprietary software company, I would freak out if an employee zipped our source code and sent it to a remote server. I would seriously consider (1) firing the employee right away and (2) suing the owner of the remote server (this is unauthorized access that results in obtaining private, sensitive data). I’m not saying I would actually do this – it is likely that the employee was not aware that this would be a consequence of trying out a cool new feature of a hip nix-adjacent tool, and it might be excessive to sue someone whose intent is probably not malicious – but pondering both those things would be my first reaction.
Writing code that exfiltrates an entire zipped repository is really not okay. And “but it’s AI” is not a good defense:
AI does not require copying people’s private/sensitive data. For example you could do inference on the client side. Or you could work harder to figure out which features are useful for making good predictions and are less likely to leak sensitive data, compute the features on-site and send them over for evaluation.
If you are going to copy people’s private/sensitive data, you should be very explicit and very careful about this. Enabling this code by default without clear warning signs and intent-checking popups is a red flag. (“We are going to send the code to our server https://... for AI-assisted analysis. We will not keep any data afterwards, unless you enabled option Foo Bar. Here is a link to our data policy: <…>. Do you confirm? (You can disable this warning by setting the configuration option …”)
I would not touch a development-environment solution whose maintainers are unable to understand this before releasing such a feature.
As someone who has been vocally supportive of devenv both here on lobste.rs and privately amongst colleagues at various software companies, this was embarrassing. Devenv is an objectively good piece of software that I appreciate as part of my development workflow every day. I’ll continue using devenv because of that, but pay more attention to release notes and the PRs that go into releases because of devenv 1.4.
Now I’m in the awkward position of having convinced people at multiple commercial software shops to use devenv, knowing that you released a feature that I’m morally and commercially opposed to. Because of that, I’m notifying everyone I know who is using devenv because of me that 1.4 was released with a feature that is potentially hostile to them and their commercial interests. They can make their own decisions based on that information. I won’t continue recommending devenv freely, and if I do recommend it, it will be with an asterisk. This is what a loss of goodwill means to me.
I appreciate the action you’ve taken based on pushback. Once devenv has a long pattern of behavior in users’ interest, I’ll re-evaluate my position on it. Until then, I’ll remain suspicious that your response is just another “we heard you” post from the CEO fuck-up playbook.
I personally don’t have any interest in the feature, but it probably just needs more warnings / opt-ins in tooling. Anyway, thanks for responding to the furore in good time :)
Anyway, thanks for responding to the furore in good time :)
Yep, posting an apology and acknowledging that this was done very, very poorly is the right move. Except it hasn’t happened yet. (This may be in-preparation, as the release only happened last Thursday.)
I have been a long-time contributor and this was somewhat surprising to me. I use devenv at work, and recommend it to all of my coworkers at every opportunity when they ask “how to do local development”. Due to the nature of my work, though, (contracting) I can’t send any client IP to any third party without express written permission by the client. That means no CoPilot (or similar), no Zed multi-user editing, and definitely not sending a git listing of the client repo to a service. If I make a StackOverflow post, I’m not even supposed to use the same variable names.
I am not trying to hate on the feature. I’m sure there are a lot of folks that would like some of the benefits of the nix and devenv ecosystem, but just don’t want to fuss with writing a bunch of nix. I personally don’t use LLMs or agents for code but I can’t / won’t stop anyone else from using them.
To clarify, though, you can still use devenv init without the AI features, right? It’s only devenv generate that uses the third party? If that’s the case, then maybe just be really noisy / obvious about what you send and to whom. I’m sure a lot of people would be satiated if you had to answer Y or yes in order to send a list of stuff that got printed on the screen to some URL that is also on the screen.
Thanks for the quick response. You have always been super-fast in merging my PRs.
I’m surprised this is a publishable result. Not only is counter mode well known to be a cryptographically secure prng itself, its parallelism has long been obvious. GCM mode is built on top of counter mode explicitly due to the parallelism. And several real systems divide up the counter-space per thread or per process.
I think this was the first work that tried applying counter mode to non-cryptographic PRNGs and managed to get decent performance while passing the usual statistical tests. It was published in 2011 at a supercomputing conference; HPC is a field that has longstanding difficulties getting enough random numbers fast enough – the paper’s introduction explains the unsatisfactory state of the art. Similarly, the seekability of LCGs was “just” an application of modexp-style repeated squaring, but it was publishable at a supercomputing conference in 1994 for roughly the same reasons (but was no longer relevant in 2011 because raw LCGs are crappy).
Counter mode is itself a cryptographic PRNG. If you’re talking about the security bounds when seeding without sufficient entropy, it is akin to talking about the bits of entropy in the key.
But it does make sense that most people in HPC wouldn’t have been deep in the crypto literature. At the time they would have been likely to know counter mode, but probably not as likely to know it was a cryptographic PRNG. So point taken.
Counter mode is a cryptographic PRNG if the mixing function is cryptographically strong. The point of this paper is to get good results fast by using a weak mixing function to make an insecure PRNG. (In supercomputing they use known seeds for reproducibility; I dunno how they choose seeds (I bet it’s just the time) but that’s not what the paper is about.)
Their starting point is a reduced-round version of AES (10 -> 5) with a simplified key schedule. AES-NI was new at the time so this was probably its first use in a fast insecure PRNG.
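To make the general recipe concrete, stripped of AES: the state is just a counter (plus a key or stream id), each output is mix(key, counter), and skipping ahead or splitting streams is plain integer arithmetic. A toy sketch in C, with a splitmix64-style finalizer standing in for the weak mixing function (my own illustration, not the construction from the paper):

#include <stdio.h>
#include <stdint.h>

/* Counter-mode PRNG sketch: the only state is a counter, and each
   output is mix(key ^ counter). Any stream position is reachable in
   O(1), which is what makes the scheme attractive for parallel use. */
uint64_t mix(uint64_t x) {               /* splitmix64-style finalizer */
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

int main(void) {
    uint64_t key = 42;                   /* per-thread or per-stream seed */
    for (uint64_t ctr = 0; ctr < 4; ctr++)
        printf("%016llx\n", (unsigned long long)mix(key ^ ctr));
    return 0;
}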
This was published in 2011, and the audience is the High-Performance Computing (HPC) community (not crypto people or PRNG experts). I don’t know what the PRNG field was like in 2011, and it might be that the paper mostly contains not-so-novel ideas but was accepted because those ideas were little-known in the HPC community and benefited from more publicity.
GCM mode was 2005, and a NIST standard by 2007. The result about counter mode probably came in the paper that first formalized the notion of a pseudo-random-permutation, which I believe was Luby and Rackoff in ’88.
The drive for GCM was hardware manufacturers like Cisco and Intel that were concerned about supporting high performance computing.
But I get the point that the reviewers may not have had the background.
Looking at the comments, it seems that it is the time of the month to complain about open-source desktop stacks. Let me add my own complaint: why aren’t “window manager” and “desktop environment” separate things in practice? I’m using Gnome with keybinding hacks to feel somewhat like a tiling wm. I would prefer to use a proper wm, but I want the “desktop environment” part of Gnome more: providing me with configuration screens to decide the display layout when plugging in an external monitor, having plugging in a USB disk just work, having configuration screens to configure bluetooth headsets, having an easy time setting up a printer, having a secrets manager handle my SSH connections, etc.
None of this should be intrinsically coupled with window management logic (okay, maybe the external-monitor configuration one), yet for some reason I don’t know of any project that has succeeded in taking the “desktop environment” of Gnome or Kde or XFCE, and swapping the window manager for something nice. (There have been hacks on top of Kwin or gnome-shell, some of them like PaperWM are impressive, but they feel like piling complexity and corner cases on top of a mess rather than a proper separation of concerns.)
The alternative that I know of currently is to spend days reading the ArchLinux wiki to find out how to set up a systray on your tiling WM to get the NetworkManager applet (for some reason the NetworkManager community can’t be bothered to come up with a decent TUI, although it would clearly be perfectly appropriate for its configuration), re-learn about another system-interface layer to get USB keys to automount, figure out which bluetooth daemon to run manually, etc. (It may be that Nix or other declarative-minded systems make it easier than old-school distributions.) This is also relevant for the Wayland discussion because Wayland broke things for several of these subsystems, and forced people to throw away decades of such manual configuration to rebuild it in various ways.
Another approach, of course, would be to have someone build a pleasant, consistent “desktop environment” experience that is easy to reuse on top of independent WM projects. But I suspect that this is actually the more painful and less fun part of the problem – this plumbing gets ugly fast – so it may be that only projects that envision themselves with a large userbase of non-expert users can be motivated enough to pull this through. Maybe this would have more chances of succeeding if we had higher-level abstractions to talk to these subsystems (maybe syndicate and its system layer project which discusses exactly this, maybe Goblins, whatever), that various subsystems owner would be willing to adopt, and that would make it easier to have consistent tools to manipulate and configure them.
The latest versions of lxqt and xfce support running on pretty much any compositor that supports xdg-layer-shell (and in fact, neither lxqt nor xfce ship a compositor of their own). Cosmic also has some support for running with other compositors, although it does ship its own. There’s definitely room for other desktop environments to support this, too.
Another approach, of course, would be to have someone build a pleasant, consistent “desktop environment” experience that is easy to reuse on top of independent WM projects.
I think this is the main reason I use NixOS nowadays: you configure things the way you want, and they will be there even if you reinstall the system. In some ways I think NixOS is more of a meta-distro, where you customize the way you want, and to make things easier there are lots of modules that make configuring things like audio or systray easier.
You will still need to spend days reading documentation and code to get up there, but once it is working this rarely breaks (of course it does break eventually, but generally it is only one thing instead of several of them, so it is relatively easy to get it working again).
What you describe is a declarative configuration of the hodgepodge of services that form a “desktop environment” today, which is easy to transfer to new systems and to tweak as things change. This is not bad (and I guess most tiling-WM-with-not-much-more users have a version of this), it is a way to manage the heterogeneity that exists today.
But I had something better in mind. Those services could support a common format/protocol to export their configuration capabilities, and it would be easy for user-facing systems to export unified configuration tools for them (in your favorite GUI toolkit, as a TUI, whatever). systemd standardized a lot of things about actually running small system services, not much about exposing their options/configurations to users.
Yes, there are expert-level tools that can make Git somewhat pleasant and diminish the added value of jj.
But these tools are only used by a minority of Git users – for example Magit assumes Emacs, and most new people entering programming these days are not using Emacs.
Bringing others up to speed with advanced Git workflows is a pain, and I’m directly affected by the way they use git when we collaborate together.
A tool like jj (or a better git porcelain that would actually get traction) can improve the field for everyone, without relying on extra advanced tools.
Besides, jj is not just a porcelain, it has the idea of conflicts being first-class commit objects which is a net conceptual improvement.
It’s no surprise that expert magit or lazygit users don’t get that many jj benefits for themselves (I guess there still are, in particular around conflicts; but jj is also less pleasant to use in various ways). But maybe they could think of jj as a gift for others.
I’ve been having similar feelings: while git feels a bit like a bloated mess, as a Magit (/gitu) user I have not really seen where jj solves a significant problem for me.
I also like the same workflow of making a bunch of changes, then coalescing them into commits. I actually think the “decide on a change, then implement it” flow is nicer, almost Pomodoro-esque, but it’s not how I work in practice.
Otherwise I have years of accumulated git fixing experience that doesn’t help me, and commit signing is painfully slow which isn’t great for jj’s constant committing.
I do hope we get some progress in the space, Pijul seemed promising, I just don’t see the value for me personally at this point.
Otherwise I have years of accumulated git fixing experience that doesn’t help me, and commit signing is painfully slow which isn’t great for jj’s constant committing.
I’ve been using git practically since it was initially released; I have a 4-digit GitHub user id. I’m on my 3rd or 4th attempt to switch to JJ. Breaking habits that are so old is really painful. Also there’s a bunch of nice things that JJ could have but still doesn’t, and one can tell right away that the project is still early. Still, I hope it will be worth it…
In git, managing a lot of commits is just too inconvenient, and despite all the deficiencies a lot of people (me included) can immediately see the potential of the whole philosophy.
commit signing is painfully slow
With git I use gpg on yubikey, so I need to touch the yubikey for every signature (paranoia hurts), and I had to just disable it in JJ, because it’s unworkable.
I hope eventually JJ will have the ability to sign only when switching away from a change and/or when pushing changes. Similarly, I’m missing git pre-commit hooks. I hope soon JJ will have a hook system optimized for that flow.
I also really, really miss the git rebase -i.
Pijul seemed promising
Pijul seems great in theory, but JJ wins by being (at least initially) fully compatible with Git. One can’t ignore network effects as strong as Git has at this point.
Same here, but with a single unlock. I’ve been experimenting with signing through SSH on Yubikey instead, which seems to be somewhat faster. I guess GPG is also just waiting to get replaced by something that isn’t as user-hostile. I get a pit in my stomach every time I have to fix anything related to it.
One can’t ignore network effects as strong as Git has at this point.
This actually reminds me, I really like the fact that branches aren’t named, but in reality we’re all pushing to GitHub, which means you need to name your branches after all, and JJ even adds the extra step of moving your branch/bookmark before every push, which I thought was a bit of a drag.
JJ even adds the extra step of moving your branch/bookmark before every push, which I thought was a bit of a drag.
People are discussing big ideas with “topics” that would bring some of the auto-forwarding functionality of branches back. In the meantime, I found that making a few aliases is perfectly fine. This one in particular is great:
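(The alias itself seems to have been lost in the quoting; here is a sketch of what such a tug alias can look like in the jj config, assuming the usual bookmarks() and heads() revset functions; the exact revset is my reconstruction.)
[revset-aliases]
# closest ancestor of "to" that carries a bookmark
'closest_bookmark(to)' = 'heads(::to & bookmarks())'

[aliases]
# move that bookmark up to the parent of the working-copy commit
tug = ["bookmark", "move", "--from", "closest_bookmark(@-)", "--to", "@-"]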
It finds the closest ancestor with a bookmark and moves it to the parent of your working-copy commit. For me, pushing a single branch is usually jj tug ; jj p
I also have a more advanced alias that creates a new commit and places it at the tip of an arbitrary bookmark in a single step. This is great in combination with the megamerge workflow and stacked PRs:
cob = ["util", "exec", "--", "bash", "-c", """
#!/usr/bin/env bash
set -euo pipefail
# usage: jj cob BOOKMARK_NAME [COMMIT_ARGS...]
if test -z "${1+x}"
then
  echo "You need to specify a bookmark onto which to place the commit!"
  exit 1
fi
target_bookmark="$1" ; shift
# commit the working copy (remaining arguments are passed on to jj commit)
jj commit "$@"
# grab the change id of the commit we just created, now sitting at @-
change_id="$(jj --ignore-working-copy log --revisions @- --no-graph --template change_id)"
# move that change on top of the target bookmark, then advance the bookmark to it
jj --ignore-working-copy rebase --revisions "$change_id" --insert-after "$target_bookmark"
jj --ignore-working-copy bookmark move "$target_bookmark" --to "$change_id"
""", ""]
I’ve been experimenting with signing through SSH on Yubikey instead
I have not looked into it. Does it work? I would definitely consider. Just not having to touch gpg is always a plus.
It generally does, and seems well enough supported (mainly GitHub). It does require registering the key as an allowed signing key in a separate file in ~/.ssh, but I can live with that I guess.
Note that you can still use git rebase -i in a colocated repo. Make sure to add the --colocate flag to jj git clone and jj git init. All my repos are always colocated; it makes all the git tooling work out-of-the-box.
The next time you run jj after a git rebase, it will import those changes from the git repo, so everything just works. With one big exception: jj will not be able to correlate the old commit hashes with the new ones, so you lose the evolog of every commit that changed its hash. But then again, git doesn’t have an evolog in the first place, so you’re not losing anything compared to the baseline.
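(For the record, the colocated setup looks roughly like this; the repository URL and the rebase target are placeholders.)
jj git clone --colocate https://example.com/some/repo.git
cd repo
git rebase -i HEAD~3   # plain git tooling keeps working against the colocated .git
jj log                 # the next jj command re-imports the rewritten commits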
It could be a fun side-project to add the git-like rebase UI to jj. A standalone binary that opens your editor with the same git rebase todo list, parses its content and submits equivalent jj rebase commands instead of letting git do it.
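(To give a concrete idea of the mapping such a tool would implement, here is a hypothetical todo list with the jj commands it could translate to; the jj subcommands are real, the mapping itself is my sketch.)
# git-rebase-todo                    # equivalent jj operations
pick   aaa111  add parser            # (kept as-is)
squash bbb222  fix parser typo       # jj squash --from <change of bbb222> --into <change of aaa111>
reword ccc333  frontend tweaks       # jj describe <change of ccc333>
drop   ddd444  debug logging         # jj abandon <change of ddd444>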
It could be a fun side-project to add the git-like rebase UI to jj. A standalone binary that opens your editor with the same git rebase todo list, parses its content and submits equivalent jj rebase commands instead of letting git do it.
Any particular reason you suggest a separate tool, rather than something to contribute to jj upstream? I also wouldn’t mind a convenient tool to reorder changes, and I would rather have it integrated in jj rebase directly. (This could also be a TUI instead, whatever.)
There are other interactive modes that git supports, for example the usual -p workflows with a basic TUI to approve/reject patches in the terminal (as an alternative to jj’s split interface relying on a 2-diff tool), which I wouldn’t mind seeing supported in jj as well. (I don’t think it’s a matter of keeping the builtin feature set minimalistic, given that the tool already embeds a builtin pager, etc.)
Any particular reason you suggest a separate tool, rather than something to contribute to jj upstream? I also wouldn’t mind a convenient tool to reorder changes, and I would rather have it integrated in jj rebase directly. (This could also be a TUI instead, whatever.)
I think nobody has yet invented a text-based interface for general graph editing:
For example: It’s not obvious how to intuitively represent divergent branches in the git rebase -i interface.
Ideally, there would be a simple text-oriented way to express the topology (via indentation, maybe?), rather than relying on label/reset commands in git rebase -i.
(I don’t think git rebase -i can currently initiate edits which would affect multiple branches (with no common descendant), so you incidentally don’t encounter this situation that often in practice.)
For example: The “mega-merge” workflow, which involves continually rewriting a set of commits and re-merging them, seems particularly painful in the git rebase -i interface.
A non-text-based TUI might work (perhaps based on GitUp’s control scheme?), but nobody has implemented that, either.
There are other interactive modes that git supports, for example the usual -p workflows with a basic TUI to approve/reject patches in the terminal (as an alternative to jj’s split interface relying on a 2-diff tool), which I wouldn’t mind seeing supported in jj as well.
A naive idea for a TUI would be to reuse the current terminal-intended visualization approach of jj log, or git log --oneline --graph: they show one change per line, with an ASCII-art rendering of the commit placement on the left to visualize the position in the commit graph. In the TUI we could move the cursor to any commit and use simple commands to move it “up” or “down” in its own linear history (the default case), and other commands to move it to another branch that is displayed in parallel to the current branch (or maybe to another parent or child of the current node). Of course, this visualization allows other operations than commit displacement: arbitrary interactive-rebase-style operations could be performed, or maybe just prepared, on this view.
You should already be able to do this with jj’s :builtin TUI
Woah, thanks! Turns out I read the doc before trying jj split, and I dutifully followed the recommendation to go with meld from the start, so I never got to try this.
Replying to myself: it looks like jjui offers a workflow similar to the text-based graph editing I described above, see their demonstration video for Rebase.
(I don’t think it’s a matter of keeping the builtin feature set minimalistic, given that the tool already embeds a builtin pager, etc.)
I do think it would bloat the UI. And I have a hunch that the maintainers would see it the same way, but do feel free to open a feature request! It’s always good to talk these things through.
My opinion is that the rebase todo list workflow is not very good and doesn’t fit with the way jj does things. When you edit the todo file, you are still making several distinct operations (reorder, squash, drop, reword etc.). In jj those map to a single command each. By just using the CLI, you can always confirm that what happens is what you expected. If it isn’t, you can easily jj undo that single step. With the todo file, you just hope for the best and have to start all over if something didn’t go well. The todo file also is not a great graphical representation of your commit tree. In git, it’s really optimized for a single branch. You can configure git so interactive rebase can be used in the context of a mega-merge-like situation… but the todo file will become very ugly and difficult to manage. On the other hand, the output of jj log is great! So I think offering the todo-file approach in jj would make for an inconsistent UI, in the sense that it discourages workflows that other parts of the UI intentionally encourage.
Regarding the comparison with the pager, I don’t think the maintainers are too concerned about binary size or number of dependencies, but rather having a consistent UI. A pager doesn’t really intrude on that.
When you edit the todo file, you are still making several distinct operations (reorder, squash, drop, reword etc.). In jj those map to a single command each. By just using the CLI, you can always confirm that what happens is what you expected. If it isn’t you can easily jj undo that single step.
I find this argument very unconvincing. When I operate on a patchset that I am preparing for external review, I have a global view of the patchset, and it is common to think of edits that affect several changes in the set at once. Reordering a line of commits, for example (let’s forget about squash, drop, reword for this discussion), is best viewed as a global operation: instead of A-B-C-D I want to have C-B-A-D. The CLI forces me to sequentialize this multi-change operation into a sequence of operations on individual changes, and doing this (1) is unnatural, and (2) introduces needless choices. Exercise time: can you easily describe a series of jj rebase commands to do this transformation on commits in that order?
The todo file also is not a great graphical representation of your commit tree.
I agree! But the command-line is even worse, as it offers no graphical representation at all. It would be nice to have a TUI or a keyboard-driven GUI that is good at displaying trees when we do more complex things, but the linear view of an edit buffer is still better than the no-view-at-all of the CLI when I want to operate on groups of changes as a whole.
Exercise time: can you easily describe a series of jj rebase command to do this transformation on commits in that order?
Yeah, that’s not hard with jj.
jj rebase -r C -B A # insert C between A and its parent(s)
jj rebase -r B -B A # insert B between A and its parent, which is now C
And I would still insist that in a realistic scenario, these commits have semantic meaning so there are naturally going to be thoughts like “X should be before Y” which trivially translates to jj rebase -r X -B Y.
To make it clear though, I’m not saying your perspective is wrong. Just that I don’t think this workflow would be a good addition upstream. I’d be very happy if there was an external tool that implemented this workflow for you, and I don’t think the experience would be any worse than as a built-in option (apart from the one-time install step I guess.)
But the command-line is even worse, as it offers no graphical representation at all.
What do you mean it’s not graphical? Have you seen the output of jj log? For example:
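(The sample output did not survive the quoting; for illustration, a two-branch jj log renders roughly like this, with the working copy marked @ and the graph drawn on the left; change ids, names and bookmarks are made up.)
@  yqosqzyt martin@example.com 2025-03-01 11:50:33 7b5c1d0e
│  (no description set)
○  kkmpptxz martin@example.com 2025-03-01 11:44:02 push-kkmpptxz 4a9e2f11
│  frontend: add pagination to the results view
│ ○  rlvkpnrz martin@example.com 2025-02-28 18:02:47 fix-parser 9c03d5b2
├─╯  parser: handle empty input
◆  zzzzzzzz root() 00000000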
The cli forces me to sequentialize this multi-change operation into a sequence of operations on individual changes, and doing this (1) is unnatural, and (2) introduces needless choices.
I hear what you’re saying, and I think it’s kinda funny: from a different perspective, git rebase forces you into a serial sequence of operations, whereas jj rebase never does. Doesn’t mean you’re wrong, of course, it just took me a moment to grok what you meant, given that I usually view it as the opposite!
(Another pain point with the CBAD thing is that last time I had to do this, it introduced a lot of conflicts thanks to the intermediate state being, well, not what I wanted, and so seeing all that red was stressful. They disappeared after moving another commit around, but in the moment, I was not psyched about it.)
Note that you can still use git rebase -i in a colocated repo. Make sure to add the --colocate flag to jj git clone and jj git init. All my repos are always colocated; it makes all the git tooling work out-of-the-box.
Oh, that’s good to know. Generally, I am afraid to do anything with git directly after enabling jj in a given repo. I’m afraid of confusing myself, and I’m afraid of confusing the tooling.
It could be a fun side-project to add the git-like rebase UI to jj. A standalone binary that opens your editor with the same git rebase todo list, parses its content and submits equivalent jj rebase commands instead of letting git do it.
I’m looking forward to it being built-in, at least eventually. git has it. And jj already opens a commit message editor on jj desc, so it’s not some new type of UI.
Can you describe what you’re doing with interactive rebases that you can’t do (or can’t do as efficiently) with JJ? Is it specifically this interface to rebasing commits that you’re missing, or a particular feature that only works with git rebase -i?
Just the interface. Editing lines in a text editor is a perfect blend of (T)UI and CLI. Just being able to reorder commits would be great. With squash/fixup, that’s 99.9% of my usage of git rebase -i.
I have not really seen where it solves a significant problem.
The main problem I encounter that it could solve is when I talk to someone who doesn’t already know git and have to kinda sheepishly say “welllll, yeah you can get the code you want with this one tool, but it suuuuuuucks; it’s so bad, I must apologize on behalf of all programmers everywhere except Richard Hipp”
I’m fully aware that I’m just Stockholm-syndromed to git. Having tried to explain how to use git to someone myself, I completely agree that it’s incredibly opaque and inconsistent. I do think that a lot of that only surfaces once you use git in non-trivial ways, clone-edit-stage-commit-push might not be optimal, but it’s fine.
For casual users I feel like the biggest overall usability win would be if GitHub could find a way to let you contribute to a repository without having to fork it.
For casual users I feel like the biggest overall usability win would be if GitHub could find a way to let you contribute to a repository without having to fork it.
This is one of the reasons that as a serial drive-by contributor, I much prefer projects hosted on Codeberg (or random Forgejo instances, perhaps even SourceHut, though I’m not a fan of the email based workflow): I can submit PRs without forking.
After your comment I actually went back and signed into Codeberg, but I’m not finding how you’re supposed to PR without forking. Even their documentation talks about the fork-based workflow. Am I missing something?
It is the AGit workflow that lets you do this. It’s not advertised, because there’s no UI built around it (yet?), and is less familiar than the fork+PR model. But it’s there, even if slightly hidden.
While I’m in general not a big fan of it, that’s one useful thing about GitHub’s gh command-line tool: a simple gh repo fork in a clone of the upstream will create a fork and register it as a remote. Now if they only added a way to automatically garbage-collect forks that have no open branches anymore…
That’s still 1-2 commands more, and an additional tool, compared to a well crafted git push.
Of course, I could build a small cron job that iterates over my GH repos, and finds forks that have no open PRs, and are older than a day or so, and nukes them. It can be automated. But with Codeberg, I don’t have to automate anything.
I’m assuming “being screwed” means it was hard to fix the conflict, that sucks, I’m sorry to hear that.
What “first class” means in this context is that jj stores the information that the conflict exists as part of the metadata of the commit itself. Git doesn’t really let you keep a conflict around: it detects conflicts and then has you resolve them immediately. jj will happily let a change sit there conflicted for as long as you’d like. That is different from being “good at making conflicts not happen in the first place,” which is a separate thing, and one that it seems hasn’t been true for you. I don’t know what differences, if any, jj has with git in that regard.
I had it again today and managed to work my way through it but not because there’s a good “first aid in case of conflict” documentation or anything. That’s the major issue: the tricks and skills we collectively built up to work with and around git aren’t there for jj yet.
But then while resolving the conflict, gg showed my bookmark to have split into 3 which was mildly surprising to say the least.
Happy to see vc-jj.el mentioned! We just made an organization for it on codeberg and will work on getting it into gnu elpa next. It’s not feature-complete yet, but happy to get feature requests etc, https://codeberg.org/emacs-jj-vc/vc-jj.el
I was talking about what I’d need in an editor plugin: give me the magit collapsible diff UI, plus two or three shortcuts to common operations (like new, describe or commit), and I’ll be happy!
gg is not as rich as magit, and of course isn’t integrated into the editor. But it demonstrates the power of the simpler model of jj, because it’s like 70% of what you need even though it doesn’t have a ton of functions.
It could use better docs/tutorials, because it can do some very useful things that are not very discoverable (basically, drag and drop is pretty magical). If it had a jj split view it would be in a good spot — almost no need for the CLI anymore.
I used to do a lot in GitX and gg is almost to that point now.
gg looks nice, although it hasn’t been updated in a few months now, so it’s a bit behind the latest jj releases. But it doesn’t really fit the magit use case; I’m sure there’s an audience for these standalone GUIs, similar to GitKraken and others for git, but they’re not for me, I really need something integrated into my editor.
The wording of the blog post confused me, because in my mind “FFI” (Foreign Function Interface) usually means “whatever the language provides to call C code”, so in particular the comparison between “the function written as a C extension” and “the function written using the FFI” is confusing, as they sound like they are talking about the exact same thing.
The author is talking specifically about a Ruby library called FFI, where people write Ruby code that describes C functions, and then the library is able to call them. I would guess that it interprets foreign calls, in the sense that the FFI library (maybe?) inspects the Ruby-side data describing their interface on each call, to run the appropriate datatype-conversion logic before and after the call. I suppose that JIT-ing is meant to remove this interpretation overhead – which is probably costly for very fast functions, but not that noticeable for longer-running C functions.
Details about this would have helped me follow the blog post, and probably other people unfamiliar with the specific Ruby library called FFI.
Replying to myself: I wonder why the author needs to generate assembly code for this. I would assume that it should be possible, on the first call to this function, to output C code (using the usual Ruby runtime libraries that people use for C extensions) to a file, call the C compiler to produce a dynamic library, dlopen the result, and then (on this first call and all future calls) just call the library code. This would probably get similar performance benefits and be much more portable.
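(A rough sketch of that alternative, with a made-up wrapper, just to illustrate the shape of the idea: emit C once, compile it to a shared object, and let the runtime dlopen it on this and later calls.)
# generate the wrapper C source (normally this would be emitted by the FFI layer)
cat > ffi_wrapper.c <<'EOF'
#include <math.h>
/* hypothetical generated wrapper: converts and forwards arguments for one foreign call */
double wrapped_cos(double x) { return cos(x); }
EOF
# compile it once into a shared object...
cc -O2 -fPIC -shared ffi_wrapper.c -o ffi_wrapper.so -lm
# ...which the language runtime can then dlopen() and call directly on every later invocation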
I would guess because that would require shipping a C compiler? Ruby FFI/C extensions are compiled before runtime; the only thing you need to ship to prod is your code, Ruby, and the build artifacts.
Yeah, that’s why I wrote “predominantly”. Also, for such a localised JIT, a second port to aarch64 is not that hard. You just won’t have an eBPF port falling out of your compiler (this is absurd for effect, I know this isn’t a reasonable thing).
Note that the point here is not to JIT arbitrary Ruby code, which is probably quite hard, but a “tiny JIT” to use compilation rather than interpretation for the FFI wrappers around external calls. (In fact it’s probably feasible, if a bit less convenient, to set things up to compile these wrappers ahead-of-time.)
It sounds like the performance win is an algorithmic complexity win in the context of extremely heavily loaded hash tables. In which case it might theoretically be a “faster” hash table, but that win comes in the case of load factors that far exceed what any real world hash table would actually permit.
I’m still digesting the paper, but to comment about the heavily loaded hash tables…
Real world hash tables do get heavily loaded. Specifically, lots of hot hashtables grow through several resizes while they are hot. And each time they get close to growing they end up heavily loaded.
So I at least definitely end up caring about loaded performance of hashtables.
I’m curious what you use as the load factor that triggers growing the table (shrinking matters, but as a memory use optimization, so it isn’t as important in this discussion). My scanning of the paper leads me to think the perf win would not show up as beneficial until well over the common growth points; on the other hand, I also just looked at the libc++ hash table implementation (and I recognize C++’s unordered_map is known to be specified in a way that is not great for perf) and its default max load factor is 100%, so :-O
unordered_map is effectively forced to be a closed-addressing hash table, so the load factor discussion is different from that for open-addressing hash tables.
Abseil and Carbon’s hash tables use a max load factor of 7/8ths, and grow if exceeded.
The STL hash table is constrained to use less efficient implementation strategies and must compensate for this with a lower max load factor (spending some memory to reduce the performance overhead).
But see my top-level comment – this paper is talking about a totally unrelated hashtable design that doesn’t really have relevance for any in-memory production hash tables I’m aware of…
Naive question as someone who does not know much about hash tables: isn’t there a tradeoff between load factor and memory use, where we are forced to resize to a larger backing array to keep the load factor down? If a probing strategy lets me increase the load factor without degrading performance too much, it may be interesting for use-cases interested in paying time (increased constant factors for their more complex strategy) to save space.
You are correct that the load factor is more or less the fraction of the memory you can use for data.
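(To put rough numbers on it: a table holding n entries at load factor α needs about n/α slots, so pushing α up directly shrinks the backing array.)
$$ \text{slots} \approx \frac{n}{\alpha}, \qquad \frac{n}{0.875} \approx 1.14\,n, \qquad \frac{n}{0.95} \approx 1.05\,n $$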
However, the result is about getting an insertion-only hash table to perform better than was thought possible at >95% utilisation, without access to the elements in advance and without reordering them after initial placement.
A practical use case probably will negotiate no-moves before using this.
From the point of view of algorithm design as a coherent field of study, it surely is important that such a restricted case had an open problem for a long time, and now implications of some techniques are better understood.
Maybe, but… In tightly space constrained environments you may not be able to afford to use a hash table that could fill up and need to expand, anyway. And with extremely large tables where the extra space is an appreciable expense, there’s probably a more complex architecture that would be more appropriate.
(Not to say there’s nothing interesting here, I haven’t read the paper yet)
There is a trade-off between memory use and runtime performance. The problem being addressed (at least per the article) is cases where the load factor is extremely high, far beyond what I would expect even in resource-constrained environments. Essentially, the naive performance drop-off for hash tables (e.g. not doing clever things, incl. potentially what happens in the article) is pretty catastrophic at high load factors - I assume it’s probably exponential perf cost, but honestly it’s been so long since I was at uni that the theoretical costs are long gone from my memory. For the vast majority of use cases you’re unlikely to ever have anything approaching those load factors in hash tables, and for the “I have gigantic tables using huge amounts of memory” cases people seem to migrate to varying tree structures, because you can get much more controllable performance characteristics (both memory and runtime), and can use low-load-factor hash tables for smaller frequent/recent use caches.
The tone in the discussion here is fairly negative so far (people don’t like an implementation they found online, which was not written by the authors of the paper). I skimmed the paper, and from a distance, as a non-expert, it seems reasonable: it is co-authored by people who are knowledgeable about this research field, there are detailed proof arguments, and the introduction shows a good understanding of the context and previous attempts in this area. Unlike the sensationalist Quanta title, the overall claim is (impressive but) fairly reasonable: they disprove a conjectured lower bound by doing better with a clever trick. They point out that this trick is in fact already used in some more advanced parts of the hashtable literature, but maybe people had not done the work of proving complexity bounds, and therefore had not realized that it worked much better than expected – no pun intended.
Is this correct? I don’t know – I timed out without reading this in any level of detail. But it has the appearance of serious work, so I would be inclined to assume that it is unless proven otherwise.
The PoC was written by a random person AFAIK, and it shouldn’t reflect in any way on the paper or its authors. That could’ve been made more explicit in the other thread though :)
And I agree the paper seems serious though I don’t have the background to understand everything.
I think the paper might help in crafting some novelty implementation with specific worst case tradeoffs. But it would not impact the generic implementations such as Abseil (Google) or Folly F14 (Meta).
TL;DR: Author does not know how to solve Sudoku and eventually just prints a static answer to a static input.
This article isn’t about over-engineering, but about the futility of using the wrong approach. The author uses an “enumerate everything” brute force solution and then tries to use multi-threading for a solution. Brute-forcing Sudoku is similar to the “8 Queens Problem”, where pruning your enumeration cuts down the number of solutions considered by many orders of magnitude. If “2 in the 10th box” fails, I really don’t need to check “2 in the 10th box and 1 in the 11th box”; I just move to “3 in the 10th box”. This removes so many orders of magnitude from the search that no amount of better compilation or more processors will help.
Still, the author learned one trick about reading the actual requirements, got to write some fun code, and the day was not wasted.
My impression is that your TL;DR is wrong. It’s not very kind to assume that the author “does not know how to solve Sudoku” when there is zero indication to that effect in their blog post – the author does not describe how they solve the sudoku.
The problem being described is a variant of Sudoku paired with a global optimization constraint on the GCD (greatest common divisor) of the rows, each read as a 9-digit number. All of the blog post is about how to efficiently enumerate numbers that can be the GCD of a set of rows, and then trying to solve the sudoku board with this additional constraint.
The sudoku solving function, solve, is never shown in the blog post, and is just described as “a backtracking search”. You might believe that the author means this in a very naive way, but “a backtracking search” could also be used to describe proper constraint-propagating implementations, when their decisions just pick one (valid) value for one undetermined position. In fact, the last benchmarked version of the program solves an unspecified number of sudoku boards in 360ms, which would suggest that the solver is not completely naive.
I’m attracted to purism. I need innovation plus an application to keep reading. I need something that’s thoroughly thought through to really like it. [..] For an example Pijul satisfies almost all. But git is kinda “good enough” (for me at least). The benefits don’t seem to really justify replacing a whole ecosystem of how we manage code. So there isn’t really an application for me to start using it. I’d definitely use it if others did, but I wouldn’t push it.
I followed the same train of thought and decided to go with Jujutsu (jj). Like pijul, it also has new, strong design ideas – but it is less radical. Unlike pijul, it is compatible with git repositories and forges, which means that it has a much easier path to adoption. (My application is “much better experience than git when rewriting histories in preparation for one or several github PRs”.)
Yes true! I have it in my backlog to take another look (didn’t get it last time I tried it, I didn’t realize I should change the workflow :)).
I wonder if the longterm goal of Jujutsu is to introduce an alternative backend which is more “native” to Jujutsu. In that case it would be taking “the long road” very similarly to Oils. Reimplement, then improve.
I have the feeling this is a currently very popular approach. E.g. coreutils is one other example (though this is more about the language/development ergonomics than features/usage economics).
Interesting tidbits from following links:
There is a regression in clang 19 (in fact in LLVM) that made the Python interpreter slower than with clang 18 by about 10% on some systems, and the 15% performance improvement reported for the new tail-call interpreter comes mostly from working around this regression. Once the LLVM regression is fixed, the non-tail-call Python interpreter speed gets back to normal, and the tail-call version only improves performance by 5%. (These numbers are ballpark, it depends on the benchmark and architecture and…)
The clang19 bug was found and reported in August 2024 by Mikulas Patocka, who develops a hobby programming language with an interpreter. In the issue thread, several other hobbyist language implementors chime in, well before the impact on Python (and basically all other interpreted languages) is identified. This is another example of how hobbyist language implementors help us move forward.
The LLVM regression was caused by another bugfix, that solves a quadratic blowup in compile time for some programs using computed gotos. This is an example that compiler development is tricky.
The LLVM regression was fixed by DianQK, who contributes to both LLVM and rustc, by reading the GCC source code, who already identified this potential regression and explicitly points out that one has to be careful with the computed gotos as used in threaded bytecode interpreters. This is an example that GCC/LLVM cross-fertilization remains relevant to compiler development and the broader open source ecosystem, even to this day.
I found the Privacy Notice rather clear. It lists several distinct uses of data, and clarifies which usage uses which sorts of data. (In particular, the data I’m most worried about is “Content”, the stuff I type into Firefox; but most usages discussed in the Privacy Notice restrict their data and do not include my Content.)
They also seem to be careful about which data remains on the device and which is shared with their servers, for example being explicit about some (opt-in) AI features being implemented with a small language model that runs locally is a nice touch.
This privacy notice gives me the impression that the Firefox people still mean well – these terms of use do not look like a secret plot to siphon all my content away, store them in their servers, for vague model-training purposes that they haven’t completely thought about yet.
For me, the problem is this statement
So unless I have a diff bot watching this notice for changes, at any moment Mozilla can gain additional privileges relative to my data, and I’ll be none the wiser. It’s a rather large escape hatch for them.
Typical companies offering online services write to their users when terms of service change, and I would expect Mozilla to do the same if it changes its privacy notice. You may be assuming that Firefox is acting in bad faith, or is likely to act in bad faith in the future. I’m personally still willing to believe that Mozilla would not outright lie or try to trick us with this – the fact that this Privacy Notice is generally reasonable and displays a care for details that I think are important (such as whether the data is processed locally or on their serves) tends to support my assessment that they are trying to do things right.
It is of course entirely possible that Mozilla becomes adversarial at some later point in the future (I have disagreed with some of their decisions in the past), but I have the impression that the amount of risk that I am tolerating is okay, by still using their product until I learn in the news that they sneakily changed their Privacy Notice to do something bad .
Your position is very reasonable.
I’m not of the opinion that Mozilla is acting in bad faith. My concern is that their terms and privacy notice now give them considerable latitude to do so at any time in the future. In other words, I have no faith in institutions remaining “good” or neutral. Google once believed in “don’t be evil”, and Mozilla once believed in user autonomy and privacy. I believe in the fact that buyouts happen, leadership changes, and values shift over time. And these changes are perhaps an indication of the latter.
In short, my feeling is that these terms open the door to malicious action made legal by their breadth and malleability. And I think where you and I differ is our estimation of that risk.
Time will tell :)
[edit]
I wasn’t aware of this leadership change when I wrote the above comment: https://blog.mozilla.org/en/mozilla/mozilla-leadership-growth-planning-updates/
Why would they give themselves the right to transgress your expectations, if they didn’t want to maintain the possibility of doing just that?
I went back to the Privacy Notice and it says, in the Changes section:
This sounds reasonable to me.
We probably differ in opinion here too.
I can’t recall a notification of terms/policy change that has enumerated actual changes. I just had a look through my mail archive at messages matching “updated our” (there were a lot of matches).
None of them list what the material changes are, and simply link to the new policy document. The ones that make a feeble attempt at enumerating the changes make vague statements like “made more readable”, “explanations of data we collect”. Which, to be fair to your argument, is what you want to know – that a change has been made.
If these notices came with a diff, I’d be satisfied.
If done properly, it can work as a built-in shortener.
https://example.com/p/58473/photos-of-cute-kittens
https://example.com/p/58473
This is also more robust in case the user comes back and edits the title to something else: the link will still work. Also, it removes the requirement that slugs be injective – there is no issue if two different posts have the same slug.
But in this case the slug is meaningless. I avoid URLs like this because they look like they’ve been generated from the mind of a SEO guru.
No, when present, it tells the user a bit more about what the post is about. The OP mentioned it in #1 above.
The presentation is impressive, it looks like a personal project that has been pushed very far towards being a usable language – and that is a lot of work.
This said, I must say that I feel skeptical that the basic idea works. It is the same basic idea that most people have when they first think of linear logic for programming languages: if we preserve complete linearity we never need a GC. Well, that is true, but the cost of this is to add lots of copies, everywhere, and that is going to be much slower than not copying, and then the natural next step is to optimize copying away by allowing sharing of values, and for this you introduce reference counting or a tracing GC.
Now the authors of this language obviously made the bet that the naive idea, where we do copy all over the place, was worth pushing to the max to get a usable programming language. Have they found a good reason why this is not an issue in their setting, or have they just been lucky so far and not encountered this deal-breaking issue yet?
I would naively expect the following to hold:
The code examples in the repository are all small snippets with tree-shaped data structures without sharing, so they don’t help much answering these questions. How would the authors implement, say, a union-find data structure, as is often used to store type nodes in type-inference engines (to implement efficient unification)?
My feedback for the authors would be to have some documentation on their website of what they think about this category of issues. For now I don’t know if they are not aware of it, or they know of a solution (or a reason that this isn’t really a problem for them).
Am I misunderstanding your comment, or is this page not talking about the same issues in-depth?
I think it is talking about this, except that instead of saying “we think this won’t be an issue in practice”, the page is more like “here is a bunch of more complex stuff you can use when this issue occurs”. (The more complex stuff uses words I had never heard, “noema” (which is more or less defined) and “hyle” (for which I didn’t find any definition), but it is an implementation of local borrowing as presented, for example, in Wadler’s 1992 “Linear types can change the world” paper, the let! construct.)
Reminds me of this take on async: https://www.youtube.com/watch?v=bzkRVzciAZg
Personally I worry more about the cost (and not just energy cost) of training models than the cost of inference, especially for models where inference can be run from a laptop – a single inference is not going to consume much energy, because laptops are reasonably low-power tools. Fine-tuning an existing model looks to be feasible in a reasonable amount of compute (and thus energy), but training a new foundational model from scratch is counted in the TWh range these days, has been increasing rapidly in the last few years, and is something that all large tech companies (and all technologically advanced states, all militaries, etc.) are going to want to learn to do to remain technologically independent and avoid poisoning issues that nobody knows how to detect.
Said otherwise: using an LLM is not that expensive energy-wise. But collectively getting used to depending on LLMs more and more gives money and power to entities that can train new ones, which incentivizes a lot more training, and that is expensive.
I’m not sure how to estimate this total training energy cost – as previously discussed, the easiest estimate I could think of is to look at data-center GPU sales to estimate the global (training + inference) power usage – and we are looking at millions of new units sold each year, which translates into several GW (several nuclear plants’ worth) of instantaneous consumption – and the current trend is that this is going up, fast.
If you read through the cited article by Husom et al. you’ll see that a laptop actually uses more energy than a better specced workstation or server. I would guess the main reason is that it takes longer to process and thus stays at an increased power draw for longer.
Also consider the fact that training a model is a one-off energy cost, that gets less severe the more the model is used. For inference, the opposite is true.
About laptops, I think we agree. I pointed out that if we can reasonably run inference from a laptop, then that means a single inference is reasonably cheap – even cheaper on specialized hardware.
Training may be a one-off energy cost for a given model, but that does not mean that it is small. From the post that we are commenting on, it looks like training one model is roughly 5×10^11 times more energy-consuming than running one user query (ChatGPT: 1.3TWh of training, 2.7Wh per query). Do most models actually get half a trillion queries over their lifetime, before they are dropped for a better model? ChatGPT appears to serve one billion queries per day these days; at this rate they would need to run the same model for about 500 days to break even. From a distance it appears that they are not breaking even, they change the foundational models more often than every 500 days – but maybe they avoid retraining them from scratch each time? And this is going to be much worse the more entities start training their own foundational model (there are already more companies doing this than just OpenAI, including those we don’t know about), as they will each get a fraction of the userbase.
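(Spelling out that ballpark arithmetic with the figures quoted above:)
$$ \frac{1.3\ \text{TWh}}{2.7\ \text{Wh/query}} \approx 4.8 \times 10^{11}\ \text{queries}, \qquad \frac{4.8 \times 10^{11}\ \text{queries}}{10^{9}\ \text{queries/day}} \approx 480\ \text{days} $$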
Okay, then it seems like we mostly agree. All in all, it is very difficult when the only information one has is estimations like the ones I made in this post, and not any actual information. “Open” AI really is a misnomer.
Doesn’t this analysis show that training cost is still not that important? Even if they only use a model for 100 days, that means that training cost is 5x inference cost, but inference cost is tiny, and 6x tiny is still tiny. If, say, drinking an extra cup of coffee costs a comparable amount of energy as typical LLM use including training, that seems entirely reasonable.
I suspect that future growth is fairly well limited by the payoff it is bringing us. Investors will bet a couple of billion dollars on something that may not pay off, but it seems unlikely that they’d invest several trillion without solid evidence that it will pay off in terms of value generated.
The problem is that we don’t understand yet what the payoff is, and the natural response in this case is to experiment like crazy. The LLM boom came from the fact that we got substantial quality improvements by making models much larger and training them a lot more. The next thing to try of course is to give them a lot more data again (scraped all the textual web? maybe we could train them on all of Netflix this time!), and to train them a lot more again. There is a very real risk that, three years from now, half the R&D budget in the world is being spent powering GPUs to do matrix multiplications. (It would also be bad for the environment to spend the same amount roasting and grinding coffee beans, but that does not make me feel particularly better about it.) It is of course possible that this risk-taking pays off and we are ushered into a new world of… (what? robot-assisted comfortable living for all?), but it is also possible that current techniques reach a plateau and we only notice after burning way too much fuel on them.
The payoff is also not an objective metric, it depends in large part on how people view these things. As long as people are convinced that LLMs are godlike powers of infinite truth, they are going to pay for them, and the payoff will be estimated as fairly large. If we try to have a more balanced evaluation of the benefits and the costs of those tools, we might take the rosy glasses off (low confidence here, some people still believe in the blockchain), and the payoff – and incentive to spend energy on it – will be smaller.
Sometimes it looks like the only thing that can force people to reduce pollution is not data, information or basic common sense, but just energy becoming fucking expensive. I suspect that this would definitely make people behave more rationally about LLMs.
Blockchain was never meaningfully useful; it was (and is) a speculative bubble. LLMs already are useful, and millions of people use them. Spending a lot of R&D money on it is entirely justified, and this R&D is mostly privately funded anyway. The glasses aren’t at all too rosy, in my view, rather the opposite. Sure, there are AI hypers that have highly inflated expectations, but most people underestimate what LLMs can do.
Even the basic ability to have a conversation with your computer would have been considered absolute sci-fi 5 years ago. These things type out hundreds of lines of code in a few seconds, often without making any mistakes, without ever even running that code. When I gave o3-mini-high some math puzzle I came up with, it solved it in a minute. Out of colleagues and students, only one was able to solve it. Maybe it existed on the internet somewhere, and maybe it was just a glorified fuzzy lookup table, but fuzzy lookup tables are very useful too (e.g., Google).
Yesterday I “wrote” 98% of the code of a UI for some decision procedure that I was working on by literally talking to my computer out loud for an hour. It compiled the Rust decision procedure to WASM, set up a JS api for it, created the interface, wrote a parser, generated examples based on the Rust unit tests, etc. I wrote maybe 20 lines total myself. Heck, I haven’t even read 80% of the code it wrote.
This is all for some infinitesimally small fraction of the energy use of various frivolous things people regularly do. This seems to me like getting a functional magic wand from the Harry Potter world and then complaining that a branch of a tree had to be cut off to get the wood. Yes, sure, we don’t want to cut down the entire Amazon rainforest to make magic wands, nor may the magic wand solve all of our problems. But the more pertinent fact here is that it does magic, not that a branch was cut off from a tree.
I don’t disagree. I should clarify that my worry is not, per se, that people invest too much effort in LLMs – I think of them as the alchemy of the modern times, a pre-scientific realm of misunderstandings and wonders, and I find it perfectly understandable that people would be fascinated and decide to study them or experiment with them. My worry is that this interest translates into very large amounts of energy consumption (again: half the R&D money could be spent powering machine learning hardware), as long as the dominant way to play with neural networks is to try to make them bigger. Most sectors of human activity have been trying to reduce energy consumption in the face of expanded usage, but the current AI boom is a sector that has been increasing its energy needs sharply, and shows no signs of stopping. The availability of clean-ish energy has not scaled up for that demand.
I do also think that there are unreasonable expectations about AI in general – people turning to neural networks for tasks that could reasonably be done without them, especially in industry. For instance, a friend working in a garbage-sorting plant was telling me about futile attempts by the company to train neural networks to classify garbage from a video feed, which were vastly underperforming simple heuristics that were already in place.
I think this isn’t the case.
My layman understanding of the consensus is that people agree that there are limits for scaling based on input data. We maybe aren’t there yet (new data sets such as video are probably still ripe) but I don’t think we’re far, personally.
DeepSeek’s whole thing is that it’s a much smaller, faster model, with competitive performance to vastly larger ones - all because they changed the training process.
We’ve already observed optimization on model sizes, training costs, etc, which is unsurprising since companies are incentivized to do this (faster training == faster iteration == money).
This is tricky from a policy perspective (raising the cost of all energy consumption seems sort of obviously bad, but we’ve seen things like “off-grid hours” work, etc) but regardless energy costs are, of course, already passed on to consumers. Perhaps VC money is making the difference right now, but ultimately consumers eat the costs of electricity with a margin, otherwise companies wouldn’t be able to stay up and running.
It is one-off, but I think it’s a bit misleading - the commenter above mentions this exactly: more and more entities are training more and more (and more powerful) models. So while it is a one-off cost, it’s also constant.
You’re not wrong but that’s how pretty much everything works. Isn’t it disingenuous to focus on LLMs in this regard?
I understand that it’s easier as it’s the new thing, but we should do environmental impact analysis on everything else, too. How much do we spend making video games? How much more do we spend playing them? How about the whole of TV? Movies? Books? Fast fashion? Cars? Housing? Food? I haven’t seen much backlash on environmental impact grounds against anything else the way there has been against LLMs.
And that makes a valid argument in isolation feel like another attack blown out of proportion. Sure, training data set is dubiously sourced. Sure, there are all sorts of moral questions about the use of LLM output. Let’s tack on environmental issues as well, why not? And let’s not bring it up for anything else to make it look like LLMs alone will drain all our water and use all the energy. Rings a bit hollow.
The specificity of LLMs is that they are a new technology that is exploding in popularity and is highly energy-consuming. This is not unique but it is still fairly rare. People were also very concerned about the energy cost of blockchains (and still are), which were in a similar situation, at least when proof-of-work was the only scheme widely deployed. Of course computing-focused websites like Lobsters are going to discuss the energy impact (and environmental impact) of computer-adjacent technologies much more than non-computer-related things like cars and housing – this is the area that we have some expertise in and some agency over. Digital technology is also one of the main drivers of increased energy consumption and pollution these days – right now it is comparable to transportation, but growing fast every year, while the impact of most other areas of human activity is increasing less fast, stable or even decreasing. (This could of course become much worse with a few bad policy decisions.)
This reads like whataboutism. I don’t know what your cultural and political context is, but around me there has been a lot of discussion of the impact of cars (why are people pushing for electric cars?), housing (entire laws have been passed, for example to force landlords to improve thermal insulation), food (why do you think the proportion of flexitarian / vegetarian / vegan people is growing fast in western countries?), planes (why are people talking about reducing flights to limit their carbon impact?), etc. You may live in a different environment where these discussions are less visible, or have personally decided not to care about this and not get informed on the matter.
More facts, less feelings, please. If you can demonstrate that the energy/pollution cost of LLMs is negligible compared to other reasonable causes of concern, that would be a very valid way to say that this is out of proportion. We could have more, better data about this, but there is already enough available to make ballpark estimates and make an informed decision. (For example, see this December 2024 article reporting that US data-center energy consumption tripled over the last decade, consumed 4.4% of total US electricity in 2023, and that this proportion is expected to rise to 7%-12% by 2028.)
Well, that’s exactly the point I’m trying to make. Environmental impact concerns around LLMs are never given in any context relating to other things. Sure, the data is out there. I can bust out a spreadsheet and put in all the data and crunch the numbers and find the cost of my LLM query in burgers, ICE miles, books, TV minutes, or—like in the OP—shower minutes.
The issue in the discourse quality I’m trying to point out is that I have to go out of my way and do my own research. A valid critique of LLMs is rarely if ever put into an understandable context. And this is why I think it largely has no impact. What you call whataboutism is better reframed as putting things in context. Is 1.5 l of water for my LLM query a lot? An average person doesn’t know what to do with that number. They have nothing to relate it to. Maybe an hour of TV in the background is worse? Maybe an hour of playing a game on their maxed-out gaming PC is better? How would they know?
The false comparison with search engines in this bothers me. If you look at the four principles of fair use doctrine in US law, search engines pass all of them easily and uncontroversially, whereas it requires sophistry and hair splitting to make an LLM pass them.
Seems plausible, but I would like to learn more. Do you have a favorite article that elaborates on your point? (As one counter-example, the Association for Research Libraries disagrees in a January 23, 2024 article, “Training Generative AI Models on Copyrighted Works Is Fair Use”: https://www.arl.org/blog/training-generative-ai-models-on-copyrighted-works-is-fair-use/)
Intuitively, using LLMs for research seems to fit the definition of a public good. But using an LLM to generate some derivative work without crediting (much less compensating) the original author seems wrong. My point: this is non-obvious to me at present.
I have no vested interest one way or the other. I am not “on any side” here. I’m just seeking understanding of the issues.
There are four criteria for fair use: https://www.law.cornell.edu/uscode/text/17/107
A search engine is really straightforwardly fair use. It doesn’t reproduce the core of the work, it doesn’t commercially compete…very straightforward. LLMs are much less clear. If an LLM can be used to reproduce works in the style of an artist, then commercial competition and the essence of the work come into play. If an academic uses it as a means of musical analysis, then it’s noncommercial and noncompetitive and fair use is plausible. If Spotify generates music from an LLM to be streamed instead of the music it was trained on, then it gets into far less clear territory, and raises questions about copyright terms. Someone using an LLM to write business emails is on clearer ground, because those emails will probably get written either way and there probably isn’t a large corpus of internal business emails in training sets. A programmer feels on shakier ground. If there is a lot of GPL code in the training set, then it’s very unclear that producing commercial works using that LLM is possible without the commercial work being GPL’d as well.
I tried to find justifications in this article because I find their conclusion rather surprising. (It has been shown that AI models can regurgitate parts of their training dataset essentially unchanged, which is obviously problematic from a copyright perspective when said training data is copyrighted material.) They point to a document for more details, Library Copyright Alliance Principles for Copyright and Artificial Intelligence (1-page PDF), and I find the arguments interesting and surprising. Let me quote them, with comments.
I would find this reasoning convincing if the LLM would compress the ingested documents in a form that does not allow to retrieve them in recognizable form. But LLMs have billions of parameters, so they have enough internal data to store the copyrighted works in a barely-compressed way. To make a weird comparison,
libgen ingests millions of books, so one could claim that the contribution of any one work is de minimis and therefore fair use?
Do they? Do people seriously believe that AI crawlers respect data-mining best practices and avoid collecting unwanted data? How does this argument apply to the millions of copyrighted works that were crawled before the authors realized that LLMs exist and could consider excluding their work from training?
This makes the argument that it’s okay for trainers to use copyrighted work, but that it is the responsibility of the user of the LLM to make sure that they are not infringing copyright when they reuse its output. This is an interesting argument, but the proposal seems basically inapplicable in practice: how could the user of the LLM, who observes an output without any form of citation or attribution, possibly know whether this output infringes on someone’s copyright?
This sounds reasonable.
The focus on energy consumption misses the point.
People like the OP often assume that using more energy automatically means more pollution and environmental damage, but that is because they are thinking of coal plants.
What really matters is how we generate that energy, not how much we use. Take countries that rely almost entirely on hydroelectric power, where turbines harness the natural force of falling water, or those powered by nuclear plants. In these cases, if we don’t use the available energy, it simply goes to waste. The energy will be generated whether we use it or not.
So when we’re discussing the ethics of training LLMs, their energy consumption has nothing to do with being ethical or unethical. The nature of the energy sources is a different topic of conversation, regardless of how much energy they use.
Okay, but (say) the OpenAI data centers are located in the US, and we know the energy mix in the US. For example this webpage of the U.S. Energy Information Administration currently states:
And even climate-friendly alternatives like hydro require damming up and destroying rivers, windmills require paving new roads into the wilderness and introduce constant noise and visual pollution, solar requires land and mined minerals, nuclear… well, ok, nuclear is probably fine, but it’s expensive. Not that coal is better than all these, but a focus on energy consumption is absolutely warranted as long as there is no single climate- and nature-friendly power source that is available everywhere.
(In Norway there have been massive protests against windmills because they destroy so much wild land, and politicians are saying “yes but we must prepare for the AI future, also Teslas, so turning our forests paving-gray is actually the green thing to do, also we got some money from those foreign investors which we can make half a kindergarten out of”. And that in a country where we already export much of our electricity and the rest is mostly used for the aluminum industry, precisely because power is already fairly cheap here due to having dammed most of our rivers long ago.)
I agree with much of this comment, but I have two quibbles:
One can draw off only a fraction of the water in a river and feed it to a turbine with or without a dam, although of course there are trade-offs.
Nuclear power, at least in practice, implies mining minerals too, specifically radioactive ones (on top of the mining of iron, copper, etc. implied by any electric infrastructure).
In theory, yes. In practice, small hydro has had a much worse impact on the environment than large hydro per kWh: https://www-nrk-no.translate.goog/klima/xl/unik-kartlegging_-sma-kraftverk-legger-mer-km-elv-i-ror-enn-store-for-a-lage-like-mye-strom-1.16982097?_x_tr_sl=auto&_x_tr_tl=en&_x_tr_hl=no&_x_tr_pto=wapp
tl;dr actual small hydro plants in Norway have used 5x as much river as large ones per kWh. Though sure, in theory it could be better.
Also one is burning jet fuel and one is using electricity, which could be from a number of different sources. I realize that the conclusion to this section is that this isn’t that big of a deal but I just want to note the asymmetry here.
I think this is a nice-to-have but I wonder if it’s really reasonable. You don’t have to put your content on the web. I think it’s a big ask to say that the internet should be sliced up into “you are allowed to view this otherwise public document, and you over there are not”. But I don’t have strong feelings.
Way more than a few, right? I don’t know that there are models that a motivated middle class American couldn’t afford to run, let alone a company.
Yeah I haven’t taken a stance. I thought the ethical considerations made here represented what I’ve encountered online. Most claims that AI is unethical seem to me to really just be claims that capitalism is inevitable or something along those lines, and I think then you have to shape your claim as “AI is unethical under capitalism”, which itself is a huge restriction and requires a lot of justification.
I think that the costs of training are grossly under-estimated by taking just one model. Sure, training GPT-4 consumed a massive amount of energy, but still negligible compared to, say, world-scale transportation. But:
The amount of pollution caused by the current LLM craze is several orders of magnitude more than what you can estimate from training energy alone.
You can make these same claims about a 7 hour flight. How many engineers commuted via car to work on the plane? How many customers took cars to get there? How many hours of test flights to train the pilots? How about the transportation of parts and materials, transportation of fuel? etc.
People also buy GPUs for video games. Video games are unethical because of power consumption? Development costs behind them? Is it a matter of degree? Value?
What about the fact that models are getting more efficient, as are GPUs? How do we factor that in? You mention data centers, but data centers are getting much better at conservation of water etc as well. Seemingly at a much faster rate than air travel is improving.
What about the fact that people get value from these models?
You can inflate or deflate these numbers in a million ways, you can make ethical determinations based on a number of properties. I think we’re probably all on board with reducing energy costs here, I know I certainly am.
This also all assumes an ethical model based on quantifying costs, value, and moral responsibilities. What are my responsibilities here as a consumer? What are my responsibilities given unknowns about environmental impact?
People work to estimate the energy and pollution costs of entire industries, including transportation by plane, and there are figures computing estimates for these. (I think some of your questions are a bit disingenuous; for example, it’s easy to convince ourselves that people take the car to get to a plane for a much shorter distance than they fly, and therefore pollute much less in the car part.)
It is, as you hint, a matter of comparing costs and benefits. Note also that data-center GPUs are likely to show higher utilization than gaming GPUs. (The article I linked guesstimated utilization at 61%; few gaming computers are used 61% of the time.) Finally, gaming GPUs are on a stable-to-downward trend in terms of sales, while AI GPUs are exploding. This is an entirely new form of tech usage (and benefits, and costs) whose impact is worth discussing.
Tech in general, and AI in particular (but consumer tech products are also non-negligible and worth looking at), are consuming a larger and larger fraction of energy and resources every year. For carbon-equivalent pollution, the current estimates are about a 6% annual increase, and they come from before the LLM boom. Plane transportation pollutes less than computers & tech, and its impact increases at a lower pace – I read figures around 2.2% per year for the twenty years before COVID.
Individual efficiency gains in GPUs or datacenters are typically lessened by the fact that we replace them often, and by the fact that we use more of them – the famous rebound effect.
Sure! But the costs of these new technologies should also be presented clearly, so that people can weigh the value they get against the cost that is imposed. People will see the commercial price, but environmental impacts are less directly visible, and it is important to discuss them for new technologies so that people realize that they exist and are growing. (There are of course other forms of externalities around machine learning than just environment.)
For sure, I’m fairly familiar with that.
None of my questions are disingenuous, my point was to get across that tertiary impact is really complex and requires huge effort. If we start asking questions about the tertiary impact of AI it’s only reasonable to ask those about the comparison made in the article, that’s all.
I wasn’t really hinting at this fwiw, it was one example of how one might begin to form an ethical framework. I’m not a utilitarian so I would argue that it’s about more than cost vs benefit, but I would also likely commit to cost and benefit being critical components too.
Oh yeah, totally. I’d like to see a lot more research done on this and I’d also reallllly like to see legal requirements for data centers to start rapidly adopting cleaner approaches, especially new data centers.
I completely agree, I think it should be a legal requirement, in fact.
The sources of pollution I mentioned are (1) the training of other models than the most well-known ones and (2) the production costs, rather than merely the utilization costs, of the hardware used in this training. I wouldn’t call this “tertiary”, these are not very indirect costs.
Hm. If you’re referring to intermediary models, I agree. If you’re just referring to some random person training a model, or some other company, I don’t agree. But I don’t think it matters - I agree with you that we should incorporate more global, tertiary costs, I’m only advocating that if our starting point is to create a sort of utilitarian comparison to planes that we maintain that symmetry.
Really? Is that based on data center energy costs or does it include the manufacturing and usage costs for every digital device?
It’s hard to make really strong statements here, as different sources compute things differently and are tricky to compare. But for example this document claims that 2–2.5% of all emissions are due to the plane sector, going up to 3.5% when you take radiative forcing into account – which we should – and around 2% yearly growth. And this document claims 3.5% of emissions due to digital technologies, with a 6% projected yearly growth. (And it comes from 2021, so they may not have predicted the consumption increase due to the LLM craze, but of course machine learning was already going strong at the time.)
It includes manufacturing and usage costs. As of a few years ago, consumer devices would tend to dominate data-center impact, and manufacturing typically has larger impact than usage, at least in countries with low-carbon energy sources.
So you’re arguing that if some things are bad, it’s fine to do something else bad? The article isn’t about cars or planes, it’s about LLMs. Not to mention that you’ve completely neglected the “powered by plagiarism” aspect.
Is the value commensurate with the cost? If you truly believe that, go with god, I guess.
I didn’t make any argument whatsoever, I only attempted to show that if we’re going to take the initial 1:1 comparison and start adding tertiary costs to one side that we should do the same to the other side. The article is also where the plane comparison comes from, not me.
Neglected? I didn’t address it but I’ve made no argument about LLMs / AI being ethical or not, I stated that I thought it was a good overview of the cases being made currently. My post was not an attempt to rebut any of these arguments presented.
edit: I’ll even state this plainly. I’m agnostic as to whether LLMs are ethical or not, but my personal moral framework chooses a libertarian approach given ambiguity ie: given ambiguity I believe that it’s best to default to moral permissibility. If there’s evidence that LLMs are more or less ethical in the future I’ll update that judgment accordingly. Today, I think the strongest evidence for LLMs being unethical is the issue of copyright as well as the environmental impact, but I think we’re too early on for me to make a judgment - I look forward to the various copyright cases moving forward, and I would always advocate for environmental impact to be something we push hard to understand and improve.
We should have open source foundational models per language/geo area trained on all intellectual property available and released for public use. Problem solved.
I’m surprised by the number of people around here that seem to consider that vague documentation that says “we will generate stuff using an AI model on your project” is enough intent warning for a feature that zips the current git repository and sends it, without asking for any form of explicit confirmation, to a remote server.
If I was a manager in a proprietary software company, I would freak out if an employee zipped our source code and sent it to a remote server. I would seriously consider (1) firing the employee right away and (2) suing the owner of the remote server (this is unauthorized access that results in obtaining private, sensitive data). I’m not saying I would actually do this – it is likely that the employee was not aware that this would be a consequence of trying out a cool new feature of a hip nix-adjacent tool, and it might be excessive to sue someone whose intent is probably not malicious – but pondering both those things would be my first reaction.
Writing code that exfiltrates an entire zip repository is really not okay. And “but it’s AI” is not a good defense:
AI does not require copying people’s private/sensitive data. For example you could do inference on the client side. Or you could work harder to figure out which features are useful for making good predictions and are less likely to leak sensitive data, compute the features on-site and send them over for evaluation.
If you are going to copy people’s private/sensitive data, you should be very explicit and very careful about this. Enabling this code by default without clear warning signs and intent-checking popups is a red flag. (“We are going to send the code to our server https://... for AI-assisted analysis. We will not keep any data afterwards, unless you enabled option Foo Bar. Here is a link to our data policy: <…>. Do you confirm? (You can disable this warning by setting the configuration option …”)
I would not touch a development-environment solution whose maintainers are unable to understand this before releasing such a feature.
We heard your feedback. I was personally very excited that we could generate developer environments from source code.
We’ll remove it for now and think about how to introduce it in a saner way.
Thank you all!
As someone who has been vocally supportive of devenv both here on lobste.rs and privately amongst colleagues at various software companies, this was embarrassing. Devenv is an objectively good piece of software that I appreciate as part of my development workflow every day. I’ll continue using devenv because of that, but pay more attention to release notes and the PRs that go into releases because of devenv 1.4.
Now I’m in the awkward position of having convinced people at multiple commercial software shops to use devenv, knowing that you released a feature that I’m morally and commercially opposed to. Because of that, I’m notifying everyone I know who is using devenv because of me that 1.4 was released with a feature that is potentially hostile to them and their commercial interests. They can make their own decisions based on that information. I won’t continue recommending devenv freely, and if I do recommend it, it will be with an asterisk. This is what a loss of goodwill means to me.
I appreciate the action you’ve taken based on pushback. Once devenv has a long pattern of behavior in users’ interest, I’ll re-evaluate my position on it. Until then, I’ll remain suspicious that your response is just another “we heard you” post from the CEO fuck-up playbook.
I hope we can get your trust back, I’ll write a reflection on how we came here and lessons learned.
I personally don’t have any interest in the feature, but it probably just needs more warnings / opt-ins in tooling. Anyway, thanks for responding to the furore in good time :)
Yep, posting an apology and acknowledging that this was done very, very poorly is the right move. Except it hasn’t happened yet. (This may be in-preparation, as the release only happened last Thursday.)
I have been a long-time contributor and this was somewhat surprising to me. I use devenv at work, and recommend it to all of my coworkers at every opportunity when they ask “how to do local development”. Due to the nature of my work, though, (contracting) I can’t send any client IP to any third party without express written permission by the client. That means no CoPilot (or similar), no Zed multi-user editing, and definitely not sending a git listing of the client repo to a service. If I make a StackOverflow post, I’m not even supposed to use the same variable names.
I am not trying to hate on the feature. I’m sure there are a lot of folks that would like some of the benefits of the nix and devenv ecosystem, but just don’t want to fuss with writing a bunch of nix. I personally don’t use LLMs or agents for code but I can’t / won’t stop anyone else from using them.
To clarify, though, you can still use devenv init without the AI features, right? It’s only devenv generate that uses the third party? If that’s the case, then maybe just be really noisy / obvious about what you send and to whom. I’m sure a lot of people would be satiated if you had to answer Y or yes in order to send a list of stuff that got printed on the screen to some URL that is also on the screen.
Thanks for the quick response. You have always been super-fast in merging my PRs.
I’m surprised this is a publishable result. Not only is counter mode well known to be a cryptographically secure prng itself, its parallelism has long been obvious. GCM mode is built on top of counter mode explicitly due to the parallelism. And several real systems divide up the counter-space per thread or per process.
I think this was the first work that tried applying counter-mode to non-cryptographic PRNGs and managed to get decent performance while passing the usual statistical tests. It was published in 2011 at a supercomputing conference which is a field that has longstanding difficulties getting enough random numbers fast enough – the paper’s introduction explains the unsatisfactory state of the art. Similarly, the seekability of LCGs was “just” an application of modexp repeated squaring, but it was publishable at a supercomputing conference in 1994 for roughly the same reasons (but was no longer relevant in 2011 because raw LCGs are crappy).
Counter mode is itself a cryptographic PRNG. If you’re talking about the security bounds when seeding without sufficient entropy, it is akin to talking about the bits of entropy in the key.
But it does make sense that most people in HPC wouldn’t have been deep in the crypto literature. At the time they would have been likely to know counter mode, but probably not as likely to know it was a cryptographic PRNG. So point taken.
Counter mode is a cryptographic PRNG if the mixing function is cryptographically strong. The point of this paper is to get good results fast by using a weak mixing function to make an insecure PRNG. (In supercomputing they use known seeds for reproducibility; I dunno how they choose seeds (I bet it’s just the time) but that’s not what the paper is about.)
Their starting point is a reduced-round version of AES (10 -> 5) with a simplified key schedule. AES-NI was new at the time so this was probably its first use in a fast insecure PRNG.
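As a sketch of the general shape (not the paper’s ARS/Philox generators – here the weak mixer is a splitmix64-style finalizer, my stand-in for reduced-round AES), a counter-based PRNG computes output i purely from (key, i), which is what makes it seekable and trivially parallel:

```python
# Counter-mode PRNG sketch: output i depends only on (key, i).
# The mixer is a splitmix64-style finalizer, deliberately non-cryptographic.
MASK64 = (1 << 64) - 1

def mix(key: int, counter: int) -> int:
    z = (counter + key) & MASK64
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return (z ^ (z >> 31)) & MASK64

def stream(key: int, start: int = 0):
    """Yield the stream; jumping ahead or splitting work is just picking `start`."""
    i = start
    while True:
        yield mix(key, i)
        i += 1
```

Two threads can use disjoint counter ranges (or distinct keys) with no shared state, which is the property the HPC audience cares about.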
This was published in 2011, and the audience is the High-Performance Computing (HPC) community (not crypto people or PRNG experts). I don’t know what the PRNG field was like in 2011, and it might be that the paper mostly contains not-so-novel ideas but was accepted because those ideas were little-known in the HPC community and benefited from more publicity.
GCM mode was 2005, and a NIST standard by 2007. The result about counter mode probably came in the paper that first formalized the notion of a pseudo-random-permutation, which I believe was Luby and Rackoff in ’88.
The drive for GCM was hardware manufacturers like Cisco and Intel that were concerned about supporting high performance computing.
But I get the point that the reviewers may not have had the background.
Looking at the comments, it seems that it is the time of the month to complain about open-source desktop stacks. Let me add my own complaint: why aren’t “window manager” and “desktop environment” separate things in practice? I’m using Gnome with keybinding hacks to feel somewhat like a tiling wm. I would prefer to use a proper wm, but I want the “desktop environment” part of Gnome more: providing me with configuration screens to decide the display layout when plugging an external monitor, having plugging in a USB disk just work, having configuration screens to configure bluetooth headsets, having an easy time setting up a printer, having a secrets manager handle my SSH connections, etc.
None of this should be intrinsically coupled with window management logic (okay, maybe the external-monitor configuration one), yet for some reason I don’t know of any project that succeeded in taking the “desktop environment” of Gnome or Kde or XFCE, and swapping the window manager to something nice. (There have been hacks on top of Kwin or gnome-shell, some of them like PaperWM are impressive, but they feel like piling complexity and corner cases on top of a mess rather than a proper separation of concerns.)
The alternative that I know of currently is to spend days reading the ArchLinux wiki to find out how to set up a systray on your tiling WM to get the NetworkManager applet (for some reason the NetworkManager community can’t be bothered to come up with a decent TUI, although it would clearly be perfectly appropriate for its configuration), re-learn about another system-interface layer to get USB keys to automount, figure out which bluetooth daemon to run manually, etc. (It may be that Nix or other declarative-minded systems make it easier than old-school distributions.) This is also relevant for the Wayland discussion, because Wayland broke things for several of these subsystems and forced people to throw away decades of such manual configuration to rebuild it in various ways.
Another approach, of course, would be to have someone build a pleasant, consistent “desktop environment” experience that is easy to reuse on top of independent WM projects. But I suspect that this is actually the more painful and less fun part of the problem – this plumbing gets ugly fast – so it may be that only projects that envision themselves with a large userbase of non-expert users can be motivated enough to pull this through. Maybe this would have more chances of succeeding if we had higher-level abstractions to talk to these subsystems (maybe syndicate and its system layer project which discusses exactly this, maybe Goblins, whatever), that the various subsystem owners would be willing to adopt, and that would make it easier to have consistent tools to manipulate and configure them.
The latest versions of lxqt and xfce support running on pretty much any compositor that supports xdg-layer-shell (and in fact, neither lxqt nor xfce ship a compositor of their own). Cosmic also has some support for running with other compositors, although it does ship its own. There’s definitely room for other desktop environments to support this, too.
I think this is the main reason I use NixOS nowadays: you configure things the way you want, and they will be there even if you reinstall the system. In some ways I think NixOS is more of a meta-distro, where you customize the way you want, and to make things easier there are lots of modules that make configuring things like audio or systray easier.
You will still need to spend days reading documentation and code to get up there, but once it is working this rarely breaks (of course it does break eventually, but generally it is only one thing instead of several of them, so it is relatively easy to get it working again).
What you describe is a declarative configuration of the hodgepodge of services that form a “desktop environment” today, which is easy to transfer to new systems and to tweak as things change. This is not bad (and I guess most tiling-WM-with-not-much-more users have a version of this); it is a way to manage the heterogeneity that exists today.
But I had something better in mind. Those services could support a common format/protocol to export their configuration capabilities, and it would be easy for user-facing systems to export unified configuration tools for them (in your favorite GUI toolkit, as a TUI, whatever). systemd standardized a lot of things about actually running small system services, not much about exposing their options/configurations to users.
My reasoning:
jj (or a better git porcelain that would actually get traction) can improve the field for everyone, without relying on extra advanced tools. jj is not just a porcelain: it has the idea of conflicts being first-class commit objects, which is a net conceptual improvement.
It’s no surprise that expert magit or lazygit users don’t get that many jj benefits for themselves (I guess there still are some, in particular around conflicts; but jj is also less pleasant to use in various ways). But maybe they could think of jj as a gift for others.
I’ve been having similar feelings: while git feels a bit like a bloated mess, as a Magit (/gitu) user I have not really seen where it solves a significant problem.
I also like the same workflow of making a bunch of changes, then coalescing them into commits. I actually think the “decide on a change, then implement it” flow is nicer, almost Pomodoro-esque, but it’s not how I work in practice.
Otherwise I have years of accumulated git fixing experience that doesn’t help me, and commit signing is painfully slow which isn’t great for jj’s constant committing.
I do hope we get some progress in the space, Pijul seemed promising, I just don’t see the value for me personally at this point.
I’ve been using git practically since it was initially released; I have a 4-digit github user id. I’m on my 3rd or 4th attempt to switch to JJ. Breaking habits that are so old is really painful. Also there’s a bunch of nice things that JJ could have but still doesn’t, and one can tell right away that the project is still early. Still, I hope it will be worth it…
In git managing a lot of commits is just too inconvenient, and despite all the deficiencies a lot of people (me included) can immediately see the potential of the whole philosophy.
With git I use gpg on yubikey, so I need to touch the yubikey for every signature (paranoia hurts), and I had to just disable it in JJ, because it’s unworkable.
I hope eventually JJ will have the ability to sign only when switching away from a change and/or when pushing changes. Similarly, I’m missing git pre-commit hooks. I hope soon JJ will have a hook system optimized for that flow.
I also really, really miss the
git rebase -i.
Pijul seems great in theory, but JJ wins by being (at least initially) fully compatible with Git. One can’t ignore network effects as strong as Git has at this point.
Rejoice! This feature was released last week in v0.26.0: https://jj-vcs.github.io/jj/latest/config/#sign-commits-only-on-jj-git-push
That’s awesome! Thanks!
Same here, but with a single unlock. I’ve been experimenting with signing through SSH on Yubikey instead, which seems to be somewhat faster. I guess GPG is also just waiting to get replaced by something that isn’t as user-hostile. I get a pit in my stomach every time I have to fix anything related to it.
This actually reminds me, I really like the fact that branches aren’t named, but in reality we’re all pushing to GitHub, which means you need to name your branches after all, and JJ even adds the extra step of moving your branch/bookmark before every push, which I thought was a bit of a drag.
People are discussing big ideas with “topics” that would bring some of the auto-forwarding functionality of branches back. In the meantime, I found that making a few aliases is perfectly fine. This one in particular is great:
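A sketch of what such a tug alias can look like in the jj configuration (a commonly shared variant; the exact details may differ from the one described below):

```toml
# Hypothetical reconstruction of a commonly shared "tug" alias.
[revset-aliases]
'closest_bookmark(to)' = 'heads(::to & bookmarks())'

[aliases]
# Move the nearest ancestor bookmark up to the parent of the working copy (@-).
tug = ['bookmark', 'move', '--from', 'closest_bookmark(@-)', '--to', '@-']
```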
It finds the closest ancestor with a bookmark and moves it to the parent of your working-copy commit. For me, pushing a single branch is usually
jj tug ; jj p.
I also have a more advanced alias that creates a new commit and places it at the tip of an arbitrary bookmark in a single step. This is great in combination with the megamerge workflow and stacked PRs.
Oh, welcome, fellow gpg-sufferer.
I have not looked into it. Does it work? I would definitely consider it. Just not having to touch gpg is always a plus.
Same. I’m looking forward to some automatically-moving bookmarks
It generally does, and seems well enough supported (mainly GitHub). It does require registering the key as an allowed signing key in a separate file in
~/.ssh, but I can live with that I guess.
I followed https://calebhearth.com/sign-git-with-ssh, and that worked just fine with my key from the Yubikey.
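For reference, the usual git-side setup looks roughly like this (a sketch; the key path and allowed-signers location are illustrative, not taken from the linked guide):

```
git config --global gpg.format ssh
git config --global user.signingkey ~/.ssh/id_ed25519.pub
git config --global gpg.ssh.allowedSignersFile ~/.ssh/allowed_signers
# ~/.ssh/allowed_signers contains lines like: "you@example.com ssh-ed25519 AAAA..."
```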
Note that you can still use
git rebase -i in a colocated repo. Make sure to add the --colocate flag to jj git clone and jj git init. All my repos are always colocated; it makes all the git tooling work out-of-the-box. The next time you run
jj after a git rebase, it will import those changes from the git repo, so everything just works. With one big exception: jj will not be able to correlate the old commit hashes with the new ones, so you lose the evolog of every commit that changed its hash. But then again, git doesn’t have an evolog in the first place, so you’re not losing anything compared to the baseline.
It could be a fun side-project to add the git-like rebase UI to jj. A standalone binary that opens your editor with the same git rebase todo list, parses its content and submits equivalent
jj rebase commands instead of letting git do it.
Any particular reason you suggest a separate tool, rather than something to contribute to
jj upstream? I also wouldn’t mind a convenient tool to reorder changes, and I would rather have it integrated in jj rebase directly. (This could also be a TUI instead, whatever.) There are other interactive modes that
git supports, for example the usual -p workflows with a basic TUI to approve/reject patches in the terminal (as an alternative to jj’s split interface relying on a 2-diff tool), which I wouldn’t mind seeing supported in jj as well. (I don’t think it’s a matter of keeping the builtin feature set minimalistic, given that the tool already embeds a builtin pager, etc.)
I think nobody has yet invented a text-based interface for general graph editing:
[…] the git rebase -i interface. […] the label/reset commands in git rebase -i. […] git rebase -i can currently initiate edits which would affect multiple branches (with no common descendant), so you incidentally don’t encounter this situation that often in practice.) […] the git rebase -i interface.
A non-text-based TUI might work (perhaps based on GitUp’s control scheme?), but nobody has implemented that, either.
If you’re only relying on approving/rejecting patches (and not e.g. editing hunks), then you should already be able to do this with jj’s
:builtin TUI (see https://jj-vcs.github.io/jj/v0.26.0/config/#editing-diffs).
A naive idea for a TUI would be to reuse the current terminal-intended visualization approach of
jj log, or git log --oneline --graph: they show one change per line, with an ascii-art rendering of the commit placement on the left to visualize the position in the commit graph. In the TUI we could move the cursor to any commit and use simple commands to move it “up” or “down” in its own linear history (the default case), and other commands to move it to another branch that is displayed in parallel to the current branch (or maybe to another parent or child of the current node). Of course, this visualization allows other operations than commit displacement; arbitrary interactive-rebase-style operations could be performed, or maybe just prepared, on this view.
Woah, thanks! Turns out I read the doc before trying
jj split, and I dutifully followed the recommendation to go with meld from the start, so I never got to try this.
Replying to myself: it looks like
jjui offers a workflow similar to the text-based graph editing I described above, see their demonstration video for Rebase.
I do think it would bloat the UI. And I have a hunch that the maintainers would see it the same way, but do feel free to open a feature request! It’s always good to talk these things through.
My opinion is that the rebase todo list workflow is not very good and doesn’t fit with the way jj does things. When you edit the todo file, you are still making several distinct operations (reorder, squash, drop, reword etc.). In jj those map to a single command each. By just using the CLI, you can always confirm that what happens is what you expected. If it isn’t you can easily
jj undo that single step. With the todo file, you just hope for the best and have to start all over if something didn’t go well. The todo file also is not a great graphical representation of your commit tree. In git, it’s really optimized for a single branch. You can configure git so interactive rebase can be used in the context of a mega-merge-like situation… but the todo file will become very ugly and difficult to manage. On the other hand, the output of jj log is great! So I think offering the todo-file approach in jj would be inconsistent UI, in the sense that it discourages workflows that other parts of the UI intentionally encourage.
Regarding the comparison with the pager, I don’t think the maintainers are too concerned about binary size or number of dependencies, but rather having a consistent UI. A pager doesn’t really intrude on that.
I find this argument very unconvincing. When I operate on a patchset that I am preparing for external review, I have a global view of the patchset, and it is common to think of changes that affect several changes in the set at once. Reordering a line of commits, for example (let’s forget about squash, drop, reword for this discussion), is best viewed as a global operation: instead of A-B-C-D I want to have C-B-A-D. The CLI forces me to sequentialize this multi-change operation into a sequence of operations on individual changes, and doing this (1) is unnatural, and (2) introduces needless choices. Exercise time: can you easily describe a series of
jj rebase commands to do this transformation on commits in that order?
I agree! But the command-line is even worse, as it is no graphical representation at all. It would be nice to have a TUI or a keyboard-driven GUI that is good at displaying trees when we do more complex things, but the linear view of an edit buffer is still better than the no-view-at-all of the CLI when I want to operate on groups of changes as a whole.
Yeah, that’s not hard with jj.
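For the A-B-C-D to C-B-A-D exercise, one possible sequence (a sketch, assuming A is the oldest commit and using jj rebase’s --insert-before, aka -B) would be:

```
jj rebase -r C --insert-before A   # extract C and reinsert it before A: C-A-B-D
jj rebase -r B --insert-before A   # extract B and reinsert it before A: C-B-A-D
```

Descendants are rebased automatically at each step, so two commands cover the whole reorder.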
And I would still insist that in a realistic scenario, these commits have semantic meaning so there are naturally going to be thoughts like “X should be before Y” which trivially translates to
jj rebase -r X -B Y. To make it clear though, I’m not saying your perspective is wrong. Just that I don’t think this workflow would be a good addition upstream. I’d be very happy if there was an external tool that implemented this workflow for you, and I don’t think the experience would be any worse than as a built-in option (apart from the one-time install step, I guess).
What do you mean it’s not graphical? Have you seen the output of jj log? For example:
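An illustrative jj log output (made-up change IDs and descriptions; the exact rendering depends on your configuration) looks roughly like this:

```
@  yqosqzyt me@example.com 2025-02-20 11:42:31 7b5a4c2e
│  fix flaky test in parser
○  kxryzmor me@example.com 2025-02-20 10:15:02 my-feature a1b2c3d4
│  add streaming API
│ ○  mzvwutvl me@example.com 2025-02-19 18:03:47 e5f6a7b8
├─╯  experiment: alternative encoder
◆  zzzzzzzz root() 00000000
```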
I’d say that’s quite graphical. jj even has a templating language that let’s you customize this output in a very powerful and ergonomic way.
You don’t get this visual tree structure in git rebase’s todo file.
I hear what you’re saying, and I think it’s kinda funny: from a different perspective,
git rebaseforces you into a serial sequence of operations, whereasjj rebasenever does. Doesn’t mean you’re wrong, of course, it just took me a moment to grok what you meant, given that I usually view it as the opposite!(Another pain point with the CBAD thing is that last time i had to do this, it introduced a lot of conflicts thanks to the intermediate state being, well, not what i wanted, and so seeing all that red was stressful. they disappeared after moving another commit around, but in the moment, i was not psyched about it)
Oh, that’s good to know. Generally, I am afraid to do anything with git directly after enabling jj in a given repo. I’m afraid of confusing myself, and I’m afraid of confusing the tooling.
I’m looking forward for it to be built-in, at least eventually. git has it. And jj already opens commit message editor on
jj desc, so it’s not some new type of UI.
Can you describe what you’re doing with interactive rebases that you can’t do (or can’t do as efficiently) with JJ? Is it specifically this interface to rebasing commits that you’re missing, or a particular feature that only works with git rebase -i?
Just the interface. Editing lines in a text editor is a perfect blend of (T)UI and CLI. Just being able to reorder commits would be great. With squash/fixup, that’s 99.9% of my usage of
git rebase -i.
TUIs like jjui are really good for that.
The main problem I encounter that it could solve is when I talk to someone who doesn’t already know git and have to kinda sheepishly say “welllll, yeah you can get the code you want with this one tool, but it suuuuuuucks; it’s so bad, I must apologize on behalf of all programmers everywhere except Richard Hipp”
I’m fully aware that I’m just Stockholm-syndromed to git. Having tried to explain how to use git to someone myself, I completely agree that it’s incredibly opaque and inconsistent. I do think that a lot of that only surfaces once you use git in non-trivial ways, clone-edit-stage-commit-push might not be optimal, but it’s fine.
For casual users I feel like the biggest overall usability win would be if GitHub could find a way to let you contribute to a repository without having to fork it.
This is one of the reasons that as a serial drive-by contributor, I much prefer projects hosted on Codeberg (or random Forgejo instances, perhaps even SourceHut, though I’m not a fan of the email based workflow): I can submit PRs without forking.
After your comment I actually went back and signed into Codeberg, but I’m not finding how you’re supposed to PR without forking. Even their documentation talks about the fork-based workflow. Am I missing something?
It is the AGit workflow that lets you do this. It’s not advertised, because there’s no UI built around it (yet?), and is less familiar than the fork+PR model. But it’s there, even if slightly hidden.
While I’m in general not a big fan of it, that’s one useful thing about GitHub’s
gh command-line tool: a simple gh repo fork in a clone of the upstream will create a fork and register it as a remote. Now if they only added a way to automatically garbage-collect forks that have no open branches anymore…
That’s still 1-2 commands more, and an additional tool, compared to a well crafted
git push.
Of course, I could build a small cron job that iterates over my GH repos, finds forks that have no open PRs and are older than a day or so, and nukes them. It can be automated. But with Codeberg, I don’t have to automate anything.
Correct. Git is a tool I’d be embarrassed to show a new developer. Jujutsu is one to be proud of.
I at this point have been screwed most of the times that jj ran into a conflict. Not sure how first class it is…
I’m assuming “being screwed” means it was hard to fix the conflict, that sucks, I’m sorry to hear that.
What “first class” means in this conflict is that
jj stores the information that the conflict exists as part of the metadata of the commit itself. Git doesn’t really let you keep a conflict around: it detects conflicts and then has you resolve them immediately. jj will happily let a change sit there conflicted for as long as you’d like. Rather than “good at making conflicts not happen in the first place,” which is a separate thing, and one that seems like it hasn’t been true for you. I don’t know if and what differences jj has with git in that regard.
I had it again today and managed to work my way through it, but not because there’s a good “first aid in case of conflict” documentation or anything. That’s the major issue: the tricks and skills we collectively built up to work with and around git aren’t there for jj yet.
But then while resolving the conflict, gg showed my bookmark to have split into 3 which was mildly surprising to say the least.
Thanks for writing this comparison! I am looking forward to getting a magit-like interface to jj. How well does gg compare?
+1. I’m keen on the idea of jj, but am too comfortable with magit.
I might only need 5-10% of what magit can do: collapsible diff and interactive staging.
Yeah I’ve been running jj split in a separate terminal window but magit staging for n-level splits would be great
Yes, I’d also settle for that (plus some commands to create a new change when i’m done, and maybe describe them)
I’ve wanted to play around with https://github.com/caldwell/commit-patch?tab=readme-ov-file#commit-patch-bufferel for a while, which abstracts “commit this hunk” using vc-diff and diff-mode.
Maybe that could work with vc-jj.el, easier to write than a whole new UI.
Happy to see vc-jj.el mentioned! We just made an organization for it on codeberg and will work on getting it into gnu elpa next. It’s not feature-complete yet, but happy to get feature requests etc, https://codeberg.org/emacs-jj-vc/vc-jj.el
Your parenthetical wish sounds like either
jj new -m <description> or jj commit -m <description>, I’m not sure which.
I was talking about what I’d need in an editor plugin: give me the magit collapsible diff UI, plus two or three shortcuts to common operations (like
new, describe or commit), and I’ll be happy!
gg is not as rich as magit, and of course isn’t integrated into the editor. But it demonstrates the power of the simpler model of jj, because it’s like 70% of what you need even though it doesn’t have a ton of functions.
It could use better docs/tutorials, because it can do some very useful things that are not very discoverable (basically, drag and drop is pretty magical). If it had a
jj split view it would be in a good spot — almost no need for the CLI anymore.
I used to do a lot in GitX and gg is almost to that point now.
gg looks nice, although it hasn’t been updated in a few months now, so it’s a bit behind the latest jj releases. But it doesn’t really fit the magit use case; I’m sure there’s an audience for these standalone GUIs, similar to GitKraken and others for git, but they’re not for me, I really need something integrated into my editor.
The wording of the blog post confused me, because in my mind “FFI” (Foreign Function Interface) usually means “whatever the language provides to call C code”, so in particular the comparison between “the function written as a C extension” and “the function written using the FFI” is confusing, as they sound like they are talking about the exact same thing.
The author is talking specifically about a Ruby library called
FFI, where people write Ruby code that describes C functions and then the library is able to call them. I would guess that it interprets foreign calls, in the sense that the FFI library (maybe?) inspects the Ruby-side data describing their interface on each call, to run appropriate datatype-conversion logic before and after the call. I suppose that JIT-ing is meant to remove this interpretation overhead – which is probably costly for very fast functions, but not that noticeable for longer-running C functions.
Details about this would have helped me follow the blog post, and probably other people unfamiliar with the specific Ruby library called
FFI.
Replying to myself: I wonder why the author needs to generate assembly code for this. I would assume that it should be possible, on the first call to this function, to output C code (using the usual Ruby runtime libraries that people use for C extensions) to a file, call the C compiler to produce a dynamic library,
dlopen the result, and then (on this first call and all future calls) just call the library code. This would probably get similar performance benefits and be much more portable.
I would guess because that would require shipping a C compiler? Ruby FFI/C extensions are compiled before runtime; the only thing you need to ship to prod is your code, Ruby, and the build artifacts.
This is essentially how MJIT worked.
https://www.ruby-lang.org/en/news/2018/12/06/ruby-2-6-0-rc1-released/ https://github.com/vnmakarov/ruby/tree/rtl_mjit_branch#mjit-organization
Ruby has since evolved very fast on the JIT side, spawning YJIT, RJIT, now FJIT…
I’m also not sure if the portability is needed here. Ruby is predominantly done on x86, at least at the scale where these optimisations matter.
Apple Silicon exists and is quite popular for development
You’re correct. I was referring to deployment systems (where the last bit of performance matters) and should have been clearer about that.
Even in production, ARM64 is getting more common these days, because of AWS Graviton et al.
But yes, x86_64 is still the overwhelming majority of production deployments.
Yeah, that’s why I wrote “predominantly”. Also, for such a localised JIT, a second port to aarch64 is not that hard. You just won’t have an eBPF port falling out of your compiler (this is absurd for effect, I know this isn’t a reasonable thing).
Note that the point here is not to JIT arbitrary Ruby code, which is probably quite hard, but a “tiny JIT” to use compilation rather than interpretation for the FFI wrappers around external calls. (In fact it’s probably feasible, if a bit less convenient, to set things up to compile these wrappers ahead-of-time.)
It sounds like the performance win is an algorithmic complexity win in the context of extremely heavily loaded hash tables. In which case it might theoretically be a “faster” hash table, but that win comes in the case of load factors that far exceed what any real world hash table would actually permit.
This is a pretty good point, but if you talk like that, millions of graduate students and professors will be out of jobs.
Translation/Desnarkified: You’re right, but the research is still likely to be valuable.
I’m still digesting the paper, but to comment about the heavily loaded hash tables…
Real world hash tables do get heavily loaded. Specifically, lots of hot hashtables grow through several resizes while they are hot. And each time they get close to growing they end up heavily loaded.
So I at least definitely end up caring about loaded performance of hashtables.
I’m curious what you use as the load factor that triggers growing the table (shrinking matters but as a memory use optimization so isn’t as important in the discussion)? My scanning of the paper leads me to think the perf win would not show up as beneficial until well over the common growth points; on the other hand I also just looked at the libc++ hash table implementation (and I recognize C++’s unordered_map is known to be specified in a way that is not great for perf) and its default max load factor is 100%, so :-O
libc++hash table implementation (and I recognize c++‘s unordered_map is known to be specified in a way that is not great for perf) and it’s default max load factor is 100%, so :-Ounordered_map is effectively forced to be a closed-address hash table, so the load factor discussion is different relatively to open adressing hash tables.
Abseil and Carbon’s hash tables use a max load factor of 7/8ths, and grow if exceeded.
The STL hash table is constrained to use less efficient implementation strategies and most compensate for this with a lower max load factor (spending some memory to reduce the performance overhead).
But see my top-level comment – this paper is talking about a totally unrelated hashtable design that doesn’t really have relevance for any in-memory production hash tables i’m aware of…
Naive question as someone who does not know much about hash tables: isn’t there a tradeoff between load factor and memory use, where we are forced to resize to a larger backing array to keep the load factor down? If a probing strategy lets me increase the load factor without degrading performance too much, it may be interesting for use-cases interested in paying time (increased constant factors for their more complex strategy) to save space.
You are correct that the load factor is more or less the fraction of the memory you can use for data.
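To make the trade-off concrete, here is a small back-of-the-envelope sketch (my own illustration, not from the paper):

```python
# Memory overhead of a hash table as a function of its maximum load factor:
# storing n elements needs roughly n / load_factor slots.
def overhead_percent(load_factor: float) -> float:
    """Extra slots beyond n, as a percentage of n."""
    return (1.0 / load_factor - 1.0) * 100.0

for lf in (0.5, 7 / 8, 0.95, 0.99):
    print(f"max load factor {lf:.2f}: ~{overhead_percent(lf):.0f}% extra slots")
# 0.50 -> ~100% extra, 7/8 -> ~14%, 0.95 -> ~5%, 0.99 -> ~1%
```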
However, the result is about getting insertion-only hash tables to perform better than was thought possible at >95% utilisation, without access to the elements in advance and without reordering them after initial placement.
A practical use case probably will negotiate no-moves before using this.
From the point of view of algorithm design as a coherent field of study, it surely is important that such a restricted case had an open problem for a long time, and now implications of some techniques are better understood.
Maybe, but… In tightly space constrained environments you may not be able to afford to use a hash table that could fill up and need to expand, anyway. And with extremely large tables where the extra space is an appreciable expense, there’s probably a more complex architecture that would be more appropriate. (Not to say there’s nothing interesting here, I haven’t read the paper yet)
There is a performance trade off for memory use vs runtime performance. The problem being addressed (at least per the article) is cases where the load factor is extremely high, far beyond what I would expect even in resource constrained environments. Essentially the naive performance drop off for hash tables (e.g not doing clever things, incl. potentially what happens in the article) is pretty catastrophic at high load factors - I assume it’s probably exponential perf cost, but honestly it’s been so long since I was at uni that the theoretical costs are long gone from my memory. For the vast majority of use cases you’re unlikely to ever have anything approaching those load factors in hash tables, and for the “I have gigantic tables using huge amounts of memory” cases people seem to migrate to varying tree structures because you can get much more controllable performance characteristics (both memory and runtime), and can use low load factor hash tables for smaller frequent/recent use caches.
The tone in the discussion here is fairly negative so far (people don’t like an implementation they found online, which was not written by the authors of the paper). I skimmed the paper and from a distance and as a non-expert it seems reasonable, it is co-authored with people that are knowledgeable about this research field, there are detailed proof arguments, and the introduction shows a good understanding of the context and previous attempts in this area. Unlike the sensationalist Quanta title, the overall claim is (impressive but) fairly reasonable, they disprove a conjectured lower bound by doing better with a clever trick. They point out that this trick is in fact already used in some more advanced parts of the hashtable literature, but maybe people had not done the work of proving complexity bounds, and therefore not realized that it worked much better than expected – no pun intended.
Is this correct? I don’t know – I timed out without reading this in any level of details. But it has the appearance of serious work, so I would be inclined to assume that it is unless proven otherwise.
The PoC was written by a random person AFAIK, and it shouldn’t reflect in any way on the paper or its authors. That could’ve been made more explicit in the other thread though :)
And I agree the paper seems serious though I don’t have the background to understand everything.
I think the paper might help in crafting some novelty implementation with specific worst case tradeoffs. But it would not impact the generic implementations such as Abseil (Google) or Folly F14 (Meta).
TL;DR: Author does not know how to solve Sudoku and eventually just prints a static answer to static input.
This article isn’t about over-engineering, but about the futility of using the wrong approach. The author uses an “enumerate everything” brute force solution and then tries to use multi-threading for a solution. Brute forcing Sudoku is similar to the “8 Queens Problem”, where pruning your enumeration cuts down the number of solutions considered by many orders of magnitude. If “2 in the 10th box” fails, I really don’t need to check “2 in the 10th box and 1 in the 11th box”; I just move to “3 in the 10th box”. This cuts so many orders of magnitude off the search that no amount of better compilation or more processors will help.
Still, the author learned one trick about reading the actual requirements, got to write some fun code, and the day was not wasted.
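As an illustration of the pruning idea above (a generic sketch, not the article’s code): a backtracking solver abandons a partial board as soon as a placement conflicts, instead of enumerating complete assignments.

```python
def solve(board):  # board: 9x9 list of lists, 0 = empty cell
    def ok(r, c, v):
        # Reject v if it already appears in the row, column, or 3x3 box.
        if v in board[r] or any(board[i][c] == v for i in range(9)):
            return False
        br, bc = 3 * (r // 3), 3 * (c // 3)
        return all(board[i][j] != v for i in range(br, br + 3) for j in range(bc, bc + 3))

    for r in range(9):
        for c in range(9):
            if board[r][c] == 0:
                for v in range(1, 10):
                    if ok(r, c, v):          # prune: only recurse on consistent placements
                        board[r][c] = v
                        if solve(board):
                            return True
                        board[r][c] = 0      # backtrack
                return False                 # no value fits here: abandon this whole subtree
    return True                              # no empty cell left: solved
```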
My impression is that your TL;DR is wrong. It’s not very kind to assume that the author “does not know how to solve Sudoku” when there is zero indication to that effect in their blog post – the author does not describe how they solve the sudoku at all.
The problem being described is a variant of Sudoku paired with a global optimization constraint on the GCD (greatest common divisor) of each row, read as a 9-digit number. All of the blog post is about how to efficiently enumerate numbers that can be the GCD of a set of rows, and then trying to solve the sudoku board with this additional constraint.
The sudoku solving function,
solve, is never shown in the blog post, and is just described as “a backtracking search”. You might believe that the author means this in a very naive way, but “a backtracking search” could also be used to describe proper constraint-propagating implementations, when their decisions are just picking one (valid) value for one undetermined position. In fact, the last benchmarked version of the program solves an unspecified number of sudoku boards in 360ms, which would suggest that the solver is not completely naive.
I followed the same train of thought and decided to go with Jujutsu (jj). Like pijul, it also has new, strong design ideas – but it is less radical. Unlike pijul, it is compatible with git repositories and forges, which means that it has a much easier path to adoption. (My application is “much better experience than git when rewriting histories in preparation for one or several github PRs”.)
Yes true! I have it in my backlog to take another look (didn’t get it last time i tried it, I didn’t realize I should change the workflow :)). I wonder if the longterm goal of Jujutsu is to introduce an alternative backend which is more “native” to Jujutsu. In that case it would be taking “the long road” very similarly to Oils. Reimplement, then improve.
I have the feeling this is a currently very liked approach. E.g. coreutils is one other example (though this is more about the language/development ergonomics than features/usage economics).
jj already has its own backend, but from the beginning it has had git-compatible storage as well as sync protocol.
I also bounced off half a year ago or so, but things have come a long way and it’s really much better now and usable as a daily driver.
Yes, see our roadmap.