The interface of Git and its underlying data models are two very different things that are best treated separately.
The interface is pretty bad. If I wasn't so used to it I would be fairly desperate for an alternative. I don't care much for the staging area, I don't like having to clean up my working directory every time I need to switch branches, and I don't like how easy it is to lose commits from a detached HEAD (though there's always git reflog, I guess).
The underlying data model however is pretty good. We can probably ditch the staging area, but apart from that, viewing the history of a repository as a directed graph of snapshots is nice. It captures everything we need. Sure, patches have to be derived from those snapshots, but we care less about the patches than we care about the various versions we saved. If there's one thing we need to get right, it's those snapshots. You get reproducible builds & tests from them, not from patches. So I think patches are secondary. I used to love DARCS, but I think patch theory was probably the wrong choice.
Now one thing Git really really doesn’t like is large binary files. Especially if we keep changing them. But then that’s just a compression problem. Let the data model pretend there’s a blob for each version of that huge file, even though in fact the software is automatically compressing & decompressing things under the hood.
What’s wrong with the staging area? I use it all the time to break big changes into multiple commits and smaller changes. I’d hate to see it removed just because a few people don’t find it useful.
Absolutely, I would feel like I’m missing a limb without the staging area. I understand that it’s conceptually difficult at first, but imo it’s extremely worth the cost.
Do you actually use it, or do you just do git commit -p
, which only happens to use the staging area as an implementation detail?
And how do you test the code you're committing? How do you make sure that the staged hunks aren't missing another hunk that, for example, changes the signature of the function you're calling? It's a serious slowdown in workflow to need to wait for CI rounds, stash and rebase to get a clean commit, and push again.
I git add -p
to the staging area and then diff it before generating the commit. I guess that could be done without a staging area using a different workflow but I don’t see the benefit (even if I have to check git status for the command every time I need to unstage something (-: )
As for testing, since I’m usually using Github I use the PR as the base unit that needs to pass a test (via squash merges, the horror I know). My commits within a branch often don’t pass tests; I use commits to break things up into sections of functionality for my own benefit going back later.
Just to add on, the real place where the staging area shines is with git reset -p
. You can reset part of a commit, amend the commit, and then create a new commit with your (original) changes or continue editing. The staging area becomes more useful the more you do commit surgery.
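For anyone who hasn't tried that kind of surgery, a rough sketch (the final commit message is just a placeholder):
# pull some hunks back out of the last commit (index only; the working tree keeps them)
git reset -p HEAD~
# rewrite the last commit without those hunks
git commit --amend
# the removed hunks now sit in the working tree as unstaged changes:
# keep editing, or commit them separately
git commit -a -m "second half"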
Meh, you don’t need a staging area for that (or anything). hg uncommit -i
(for --interactive
) does quite the same thing, and because it has no artificial staging/commit split it gets to use the clear verb.
I guess that could be done without a staging area using a different workflow but I don’t see the benefit
I don’t see the cost.
My commits within a branch often don’t pass tests;
If you ever need to git bisect
, you may come to regret that. I almost never use git bisect
, but for the few times I did need it, it was a life saver, and passing tests greatly facilitates it.
I bisect every so often, but on the squashed PR commits on main, not individual commits within a PR branch. I’ve never needed to do that to diagnose a bug. If you have big PRs, don’t squash, or don’t use a PR-based workflow, that’s different of course. I agree with the general sentiment that all commits on main should pass tests for the purposes of bisection.
I use git gui for committing (the built-in git gui command), which lets you pick by line, not just hunks. Normally the things I'm excluding are stuff like enabling debug flags, or just extra logging, so it's not really difficult to make sure it's correct. Not saying I never push bad code, but I can't recall an instance where I pushed bad code because of that. I also use the index to choose parts of my unfinished work to save in a stash (git stash --keep-index), and sometimes if I'm doing something risky and iterative I'll periodically add things to the staging area as I go so I can have some way to get back to the last known good point without actually making a bunch of commits (I could rebase after, yeah, but meh).
It being just an implementation detail in most of that is a fair point though.
I personally run the regression test (which I wrote) to test changes.
Then I have to wait for the code review (which in my experience has never stopped a bug going through; when I have found bugs in code reviews, it was always "out of scope for the work, so don't fix it") before checking it in. I'm dreading the day when CI is actually implemented, as it would slow down an already glacial process [1].
Also, I should mention I don’t work on web stuff at all (thank God I got out of that industry).
[1] Our customer is the Oligarchic Cell Phone Company, which has a sprint of years, not days or weeks, with veto power over when we deploy changes.
Author of the Jujutsu VCS mentioned in the article here. I tried to document at https://github.com/martinvonz/jj/blob/main/docs/git-comparison.md#the-index why I think users don’t actually need the index as much as they think.
I missed the staging area for at most a few weeks after I switched from Git to Mercurial many years ago. Now I miss Mercurial’s tools for splitting commits etc. much more whenever I use Git.
Thanks for the write up. From what I read it seems like with Jujutsu if I have some WIP of which I want to commit half and continue experimenting with the other half I would need to commit it all across two commits. After that my continuing WIP would be split across two places: the second commit and the working file changes. Is that right? If so, is there any way to tag that WIP commit as do-not-push?
Not quite. Every time you run a command, the working copy is snapshotted and becomes a real commit, amending the previous working-copy commit. The changes in the working copy are thus treated just like any other commit. The corresponding thing to git commit -p
is jj split
, which creates two stacked commits from the previous working-copy commit, and the second commit (the child) is what you continue to edit in the working copy.
Your follow-up question still applies (to both commits instead of the single commit you seemed to imagine). There’s not yet any way of marking the working copy as do-not-push. Maybe we’ll copy Mercurial’s “phase” concept, but we haven’t decided yet.
Way I see it, the staging area is a piece of state needed specifically for a command line interface. I use it too, for the exact reason you do. But I could do the same by committing it directly. Compare the possible workflows. Currently we do:
# most of the time
git add .
git commit
# piecemeal
git add -p .
# review changes
git commit
Without a staging area, we could instead do this:
# most of the time
git commit
# piecemeal
git commit -p
# review changes
git reset HEAD~ # if the changes are no good
And I’m not even talking about a possible GUI for the incremental making of several commits.
Personally I use git add -p
all of the time. I’ve simply been burned by the other way too many times. What I want is not to save commands but to have simple commands that work for me in every situation. I enjoy the patch selection phase. More often than not it is what triggers my memory of a TODO item I forgot to jot down, etc. The patch selection is the same as reviewing the diff I’m about to push but it lets me do it incrementally so that when I’m (inevitably) interrupted I don’t have to remember my place.
From your example workflows it seems like you’re interested in avoiding multiple commands. Perhaps you could use git commit -a
most of the time? Or maybe add a commit-all
alias?
Never got around to writing that alias, and if I'm being honest I quite often git diff --cached
to see what I’ve added before I actually commit it.
I do need something that feels like a staging area. I was mostly wondering whether that staging area really needed to be implemented differently than an ordinary commit. Originally I believed commits were enough, until someone pointed out pre-commit hooks. Still, I wonder why the staging area isn’t at least a pointer to a tree
object. It would have been more orthogonal, and likely require less effort to implement. I’m curious what Linus was thinking.
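For what it's worth, the plumbing can already round-trip the index through a tree object, which is roughly the design being wondered about here (the "wip" message is just a placeholder):
tree=$(git write-tree)                      # serialize the current index as a tree object
git commit-tree "$tree" -p HEAD -m "wip"    # optionally wrap that tree in a commit (prints its hash)
git read-tree "$tree"                       # and a tree can be loaded back into the index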
Very honourable to revise your opinion in the face of new evidence, but I’m curious to know what would happen if you broadened the scope of your challenge with “and what workflow truly requires pre-commit hooks?”!
Hmm, that’s a tough one. Strictly speaking, none. But I can see the benefits.
Take Monocypher for instance: now it’s pretty stable, and though it is very easy for me to type make test
every time I modify 3 characters, in practice I may want to make sure I don’t forget to do it before I commit anything. But even then there are 2 alternatives:
I use git add -p
all the time, but only because Magit makes it so easy. If I had an equally easy interface to something like hg split
or jj split
, I don’t think I’d care about the lack of an index/staging area.
# most of the time
git add .
Do you actually add your entire working directory most of the time? Unless I’ve just initialized a repository I essentially never do that.
Here’s something I do do all the time, because my mind doesn’t work in a red-green-refactor way:
Get a bug report
Fix bug in foo_controller
Once the bug is fixed, I finally understand it well enough to write an automated regression test around it, so go do that in foo_controller_spec
Run test suite to ensure I didn’t break anything and that my new test is green
Add foo_controller and foo_controller_spec to staging area
Revert working copy (but not staged copy!) of foo_controller (but not its spec)
Run test suite again and ensure I have exactly one red test (the new regression test). If yes, commit the stage.
If no, debug spec against old controller until I understand why it’s not red, get it red, pull staged controller back to working area, make sure it’s green.
—
Yeah, I could probably simulate this by committing halfway through and then doing some bullshit with cherry-picks from older commits and in some cases reverting the top commit but, like, why? What would I gain from limiting myself to just this awkward commit dance as the only way of working? That’s just leaving me to cobble together a workflow that’s had a powerful abstraction taken away from it, just to satisfy some dogmatic “the commit is the only abstraction I’m willing to allow” instinct.
Do you actually add your entire working directory most of the time?
Yes. And when I get a bug report, I tend to first reproduce the bug, then write a failing test, then fix the code.
Revert working copy (but not staged copy!) of foo_controller (but not its spec)
Sounds useful. How do you do that?
Revert working copy (but not staged copy!) of foo_controller (but not its spec)
Sounds useful. How do you do that?
You can checkout
a file into your working copy from any commit.
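For example (path and commit are placeholders; note that plain checkout also updates the index, while restore -s touches only the working copy):
git checkout <commit> -- path/to/foo_controller   # copies into the index and the working copy
git restore -s <commit> path/to/foo_controller    # copies into the working copy only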
Yes. And when I get a bug report, I tend to first reproduce the bug, then write a failing test, then fix the code.
Right, but that was just one example. Everything in your working copy should always be committed at all times? I’m almost never in that state. Either I’ve got other edits in progress that I intend to form into later commits, or I’ve got edits on disk that I never intend to commit but in files that should not be git ignored (because I still intend to merge upstream changes into them).
I always want to be intentionally forming every part of a commit, basically.
Sounds useful. How do you do that?
git add foo_controller <other files>; git restore -s HEAD foo_controller
and then
git restore foo_controller
will copy the staged version back into the working set.
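Spelled out with the file names from the example above (a sketch, not the poster's exact commands):
git add foo_controller foo_controller_spec   # stage the fix and the new spec
git restore -s HEAD foo_controller           # working copy of the controller back to HEAD; the staged fix is untouched
# run the suite here and check that exactly the new spec is red
git restore foo_controller                   # copy the staged version back into the working copy
git commit                                   # commit what's staged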
TBH, I have no idea what “git add -p” does off hand (I use Magit), and I’ve never used staging like that.
I had a great example use of staging come up just yesterday. I’m working in a feature branch, and we’ve given QA a build to test what we have so far. They found a bug with views, and it was an easy fix (we didn’t copy attributes over when copying a view).
So I switched over to views.cpp and made the change. I built, tested that specific view change, and in Magit I staged that specific change in views.cpp. Then I committed, pushed it, and kicked off a pipeline build to give to QA.
I also use staging all the time if I refactor while working on new code or fixing bugs. Say I’m working on “foo()”, but while doing so I refactor “bar()” and “baz()”. With staging, I can isolate the changes to “bar()” and “baz()” in their own commits, which is handy for debugging later, giving the changes to other people without pulling in all of my changes, etc.
Overall, it’s trivial to ignore staging if you don’t want it, but it would be a lot of work to simulate it if it weren’t a feature.
What’s wrong with the staging area? I use it all the time to break big changes into multiple commits and smaller changes.
I’m sure you do – that’s how it was meant to be used. But you might as well use commits as the staging area – it’s easy to commit and squash. This has the benefit that you can work with your whole commit stack at the same time. I don’t know what problem the staging area solves that isn’t better solved with commits. And yet, the mere existence of this unnecessary feature – this implicitly modified invisible state that comes and crashes your next commit – adds cognitive load: Commands like git mv
, git rm
and git checkout
pollute the state, then git diff
hides it, and finally, git commit --amend
accidentally invites it into the topmost commit.
The combo of being not useful and a constant stumbling block makes it bad.
I don’t know what problem the staging area solves that isn’t better solved with commits.
If I’ve committed too much work in a single commit how would I use commits to split that commit into two commits?
Using e.g. hg split
or jj split
. The former has a text-based interface similar to git commit -p
as well as a curses-based TUI. The latter lets you use e.g. Meld or vimdiff to edit the diff in a temporary directory and then rewrites the commit and all descendants when you’re done.
That temporary directory sounds a lot like the index – a temporary place where changes to the working copy can be batched. Am I right to infer here that the benefit you find in having a second working copy in a temp directory is that it works better with some other tools that expect to work on files?
The temporary directory is much more temporary than the index - it only exists while you split the commit. For example, if you’re splitting a commit that modifies 5 files, then the temporary directory will have only 2*5 files (for before and after). Does that clarify?
The same solution for selecting part of the changes in a commit is used by jj amend -i
(move into parent of specified commit, from working-copy commit by default), jj move -i --from <rev> --to <rev>
(move changes between arbitrary commits) etc.
I use git revise. Interactive revise is just like interactive rebase, except that it has a cut
subcommand. This can be used to split a commit by selecting and editing hunks like git commit -p
.
Before git-revise, I used to manually undo part of the commit, commit that, then revert it, and then squash the undo-commit into the commit to be split. The revert-commit then contains the split-off changes.
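Roughly, that manual dance looks like this (commit messages are placeholders):
# 1. edit the files to undo the part you want to split off, then
git commit -a -m "undo: changes to split off"
# 2. revert that undo; the revert commit now carries exactly the split-off changes
git revert --no-edit HEAD
# 3. squash the undo commit into the original commit being split
git rebase -i HEAD~3     # mark the "undo" commit as a fixup of the original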
I don't know, I find it useful. Maybe if git built in Mercurial's "place changes into a commit that isn't the most recent" amend thing then I might have an easier time doing things, but just staging up relevant changes in a patch-based flow is pretty straightforward and helpful IMO
I wonder if this would be as controversial if patching was the default
What purpose does it serve that wouldn't also be served by first-class rollback and an easier way of collapsing changesets on their way upstream? I find that most of the benefits of smaller changesets disappear when they don't have commit messages, and when using the staging area for this you can only roll back one step without having to get into the hairy parts of git.
The staging area is difficult to work with until you understand what's happening under the hood. In most version control systems, an object under version control would be in one of a handful of states: either the object has been cataloged and stored in its current state, or it hasn't. From a DWIM standpoint, a new git user would expect git add to catalog and store the object in its current state. With the stage, you can stage, and change, stage again, and change again. I've used this myself to logically group commits, so I agree with you that it's useful. But I do see how it breaks people's DWIM view on how git works.
Also, if I stage, and then change, is there a way to have git restore the file as I staged it if I haven't committed?
Also, if I stage, and then change, is there a way to have git restore the file as I staged it if I haven't committed?
git restore .
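With no -s/--source, git restore pulls from the index, so that overwrites the working copy with whatever you staged. For a single file (path is a placeholder):
git restore path/to/file   # working copy <- staged (index) version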
I’ve implemented git from scratch. I still find the staging area difficult to use effectively in practice.
Try testing your staged changes atomically before you commit. You can’t.
A better design would have been an easy way to unstage, similar to git stash but with range support.
Interesting, that would solve the problem. I’m surprised I’ve not come across that before.
In terms of “what’s wrong with the staging area”, what I was suggesting would work better is to have the whole thing work in reverse. So all untracked files are “staged” by default and you would explicitly un-stage anything you don’t want to commit. Firstly this works better for the 90% use-case, and compared to this workaround it’s a single step rather than 2 steps for the 10% case where you don’t want to commit all your changes yet.
The fundamental problem with the staging area is that it’s an additional, hidden state that the final committed state has to pass through. But that means that your commits do not necessarily represent a state that the filesystem was previously in, which is supposed to be a fundamental guarantee. The fact that you have to explicitly stash anything to put the staging area into a knowable state is a bit of a hack. It solves a problem that shouldn’t exist.
The way I was taught this, the way I've taught this to others, and the way it's represented in at least some GUIs, is not compatible with that view.
I mean, sure, you can have staged and unstaged changes in a file and need to figure it out for testing, or unstage parts, but mostly it’s edit
-> stage
-> commit
-> push
.
That feels, to me and to newbies who barely know what version control is, like a logical additive flow. Tons of cases you stage everything and commit so it’s a very small operation.
The biggest gripe may be devs who forget to add files in the proper commit, which makes bisect
hard. Your case may solve that for sure, but I find it a special case of bad GUIs and sloppy devs who do that. Also, at some point the fs layout gets fewer new files.
Except that in a completely linear flow the distinction between edit and stage serves no purpose. At best it creates an extra step for no reason and at worst it is confusing and/or dangerous to anyone who doesn’t fully understand the state their working copy is in. You can bypass the middle state with git add .; git commit
and a lot of new developers do exactly that, but all that does is pretend the staging state doesn’t exist.
Staging would serve a purpose if it meant something similar to pushing a branch to CI before a merge, where you have isolated the branch state and can be assured that it has passed all required tests before it goes anywhere permanent. But the staging area actually does the opposite of that, by creating a hidden state that cannot be tested directly.
As you say, all it takes is one mistake and you end up with a bad commit that breaks bisect later. That’s not just a problem of developers being forgetful, it’s the bad design of the staging area that makes this likely to happen by default.
I think I sort of agree but do not completely concur.
Glossing over the staging can be fine in some projects and dev sloppiness is IMO a bigger problem than an additive flow for clean commits.
These are societal per-project issues - what’s the practice or policy or mandate - and thus they could be upheld by anything, even using the undo buffer for clean commits like back in the day. Which isn’t to say you never gotta do trickery like that with Git, just that it’s a flow that feels natural and undo trickery less common.
Skimming the other comments, maybe jj
is more like your suggestion, and I wouldn’t mind “a better Git”, but I can’t be bothered when eg. gitless
iirc dropped the staging and would make clean commits feel like 2003.
If git stash --keep-index
doesn't do what you want then you could help further the conversation by elaborating on what you want.
It’s usually not that hard.
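For the "test exactly what you're about to commit" case, the usual recipe is roughly this (make test standing in for whatever your test command is):
git stash --keep-index   # set aside everything that isn't staged
make test                # the working tree now matches what's staged
git stash pop            # bring the unstaged work back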
https://lobste.rs/s/yi97jn/is_it_time_look_past_git#c_ss5cj3
The underlying data model however is pretty good. We can probably ditch the staging area,
Absolutely not. The staging area was a godsend coming from Subversion – it’s my favorite part of git bar none.
Everyone seems to suppose I would like to ditch the workflows enabled by the staging area. I really don't. I'm quite sure there are ways to keep those workflows without using a staging area. If there aren't, well… I can always admit I was wrong.
Well, what I prize being able to do is to build up a commit piecemeal out of some but not all of the changes in my working directory, in an incremental rather than all-in-one-go fashion (i.e. I should be able to form the commit over time and I should be able to modify a file, move its state into the "pending commit", and continue to modify the file further without impacting the pending commit). It must be possible for any commit coming out of this workflow to both not contain everything in my working area, and to contain things no longer in my working area. It must be possible to diff my working area against the pending commit and against the last actual commit (separately), and to diff the pending commit against the last actual commit.
You could call it something else if you wanted but a rose by any other name etc. A “staging area” is a supremely natural metaphor for what I want to work with in my workflow, so replacing it hardly seems desirable to me.
How about making the pending commit an actual commit? And then adding the porcelain necessary to treat it like a staging area? Stuff like git commit -p foo
if you want to add changes piecemeal.
No. That’s cool too and is what tools like git revise
and git absorb
enable, but making it an actual commit would have other drawbacks: it would imply it has a commit message and passes pre-commit hooks and things like that. The staging area is useful precisely for what it does now—help you build up the pieces necessary to make a commit. As such it implies you don't have everything together to make a commit out of it. As soon as I do, I commit, then if necessary --amend
, --edit
, or git revise
later. If you don’t make use of workflows that use staging then feel free to use tooling that bypasses it for you, but don’t try to take it away from the rest of us.
pre-commit hooks
Oh, totally missed that one. Probably because I've never used it (instead I rely on CI or manually pushing a button). Still, that's the strongest argument so far, and I have no good solution that doesn't involve an actual staging area there. I guess it's time to change my mind.
I don't think the final word has been said. These tools could also run hooks. It may be that new hooks need to be defined.
Here is one feature request: run git hooks on new commit
I think you missed the point; my argument is that the staging area is useful as a place to stage stuff before things like commit-related hooks get run. I don't want tools like git revise
to run precommit hooks. When I use git revise
the commit has already been made and has presumably passed the precommit phase.
For the problem that git revise
“bypasses” the commit hook when using it to split a commit, I meant the commit hook (not precommit hook).
I get that the staging area lets you assemble a commit before you can run the commit hook. But if this was possible to do statelessly (which would only be an improvement), you could do without it. And for other reasons, git would be so much better without this footgun:
Normally, you can look at git diff
and commit what you see with git commit -a
. But if the staging area is clobbered, which you might have forgotten, you also have invisible state that sneaks in!
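A minimal illustration of that footgun (file names are made up):
git mv old_name.c new_name.c   # quietly stages the rename
git diff                       # shows only unstaged changes, so the rename is invisible here
git commit -a                  # ...and yet the rename rides along into the commit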
Normally, you can look at git diff and commit what you see with git commit -a.
Normally I do nothing of the kind. I might have used git commit -a
a couple times in the last 5 years (and I make dozens to hundreds of commits per day). The statefulness of the staging area is exactly what benefits my workflow and not the part I would be trying to eliminate. The majority of the time I stage things I'm working on from my editor one hunk at a time. The difference between my current buffer and the last git commit is highlighted, and after I make some progress I start adding related hunks and shaping them into commits. I might fiddle around with a couple things in the current file, then when I like it stage up pieces into a couple different commits.
The most aggressive I’d get is occasionally (once a month?) coming up with a use for git commit -u
.
A stateless version of staging that “lets you assemble a commit” sounds like an oxymoron to me. I have no idea what you think that would even look like, but a state that is neither the full contents of the current file system nor yet a commit is exactly what I want.
Why not allow an empty commit message, and skip the commit hooks if a message hasn’t been set yet?
Why deliberately make a mess of things? Why make a discrete concept of a "commit" into something else with multiple possible states? Why not just use staging like it is now? I see no benefit to jury-rigging more states on top of a working one. If the point is to simplify the tooling you won't get there by overloading one clean concept with an indefinite state and contextual markers like "if commit message empty then this is not a real commit".
Again, what’s the benefit?
Sure, you could awkwardly simulate a staging area like this. The porcelain would have to juggle a whole bunch of shit to avoid breaking anytime you merge a bunch of changes after adding something to the fake “stage”, pull in 300 new commits, and then decide you want to unstage something, so the replacement of the dedicated abstraction seems likely to leak and introduce merge conflict resolution where you didn’t previously have to worry about it, but maybe with enough magic you could do it.
But what’s the point? To me it’s like saying that I could awkwardly simulate if
, while
and for
with goto
, or simulate basically everything with enough NAND
s. You're not wrong, but what's in it for me? Why am I supposed to like this any better than having a variety of fit-for-purpose abstractions? It just feels like I'd be tying one hand behind my back so there can be one less abstraction, without explaining why having N-1 abstractions is even more desirable than having N.
Seems more like an "a foolish consistency is the hobgoblin of little minds" desire than anything beneficial, really.
Again, what’s the benefit?
Simplicity of implementation. Implementing the staging area like a commit, or at least like a pointer to a tree
object, would likely make the underlying data model simpler. I wonder why the staging area was implemented the way it is.
At the interface level however I’ve had to change my mind because of pre-commit hooks. When all you have is commits, and some tests are automatically launched every time you commit anything, it’s pretty hard to add stuff piecemeal.
Yes, simplicity of implementation and UI. https://github.com/martinvonz/jj (mentioned in the article) makes the working copy (not the staging area) an actual commit. That does make the implementation quite a lot simpler. You also get backups of the working copy that way.
Simplicity of implementation.
No offence but, why would I give a shit about this? git is a tool I use to enable me to get other work done, it’s not something I’m reimplementing. If “making the implementation simpler” means my day-to-day workflows get materially more unpleasant, the simplicity of the implementation can take a long walk off a short pier for all I care.
It's not just pre-commit hooks that get materially worse with this. "Staging" something would then have to have a commit message, I would effectively have to branch off of HEAD before doing every single "staging" commit in order to be able to still merge another branch and then rebase it back on top of everything without fucking about in the reflog to move my now-buried-in-the-past stage commit forward, etc., etc. "It would make the implementation simpler" would be a really poor excuse for a user-hostile change.
If “making the implementation simpler” means my day-to-day workflows get materially more unpleasant, the simplicity of the implementation can take a long walk off a short pier for all I care.
I agree. Users shouldn’t have to care about the implementation (except for minor effects like a simpler implementation resulting in fewer bugs). But I don’t understand why your workflows would be materially more unpleasant. I think they would actually be more pleasant. Mercurial users very rarely miss the staging area. I was a git developer (mostly working on git rebase
) a long time ago, so I consider myself a (former) git power user. I never miss the staging area when I use Mercurial.
“Staging” something would then have to have a commit message
Why? I think the topic of this thread is about what can be done differently, so why would the new tool require a commit message? I agree that it’s useful if the tool lets you provide a message, but I don’t think it needs to be required.
I would effectively have to branch off of HEAD before doing every single "staging" commit in order to be able to still merge another branch and then rebase it back on top of everything without fucking about in the reflog to move my now-buried-in-the-past stage commit forward
I don’t follow. Are you saying you’re currently doing the following?
git add -p
git merge <another branch>
git rebase <another branch>
I don’t see why the new tool would bury the staging commit in the past. That’s not what happens with Jujutsu/jj anyway. Since the working copy is just like any other commit there, you can simply merge the other branch with it and then rebase the whole stack onto the other branch after.
I’ve tried to explain a bit about this at https://github.com/martinvonz/jj/blob/main/docs/git-comparison.md#the-index. Does that help clarify?
Mercurial users very rarely miss the staging area.
Well, I’m not them. As somebody who was forced to use Mercurial for a bit and hated every second of it, I missed the hell out of it, personally (and if memory serves, there was later at least one inevitably-nonstandard Mercurial plugin to paper over this weakness, so I don’t think I was the only person missing it).
I’ve talked about my workflow elsewhere in this thread, I’m not really interested in rehashing it, but suffice to say I lean on the index for all kinds of things.
Are you saying you’re currently doing the following? git add -p git merge
I’m saying that any number of times I start putting together a commit by staging things on Friday afternoon, come back on Monday, pull in latest from main, and continue working on forming a commit.
If I had to (manually, we're discussing among other things the assertion that you could eliminate the stage because it's pointless, and you could "just" commit whenever you want to stage and revert the commit whenever you want to unstage) commit things on Friday, forget I'd done so on Monday, pull in 300 commits from main, and then whoops, I want to revert a commit 301 commits back, so now I get to back out the merge and etc. etc., this is all just a giant pain in the ass to even type out.
Does that help clarify?
I’m honestly not interested in reading it, or in what “Jujutsu” does, as I’m really happy with git and totally uninterested in replacing it. All I was discussing in this thread with Loup-Vaillant was the usefulness of the stage as an abstraction and my disinterest in seeing it removed under an attitude of “well you could just manually make commits when you would want to stage things, instead”.
I’m honestly not interested in reading it, or in what “Jujutsu” does
Too bad, this link you’re refusing to read is highly relevant to this thread. Here’s a teaser:
As a Git power-user, you may think that you need the power of the index to commit only part of the working copy. However, Jujutsu provides commands for more directly achieving most use cases you’re used to using Git’s index for.
What “jujutsu” does under the hood has nothing whatsoever to do with this asinine claim of yours, which is the scenario I was objecting to: https://lobste.rs/s/yi97jn/is_it_time_look_past_git#c_k6w2ut
At this point I’ve had enough of you showing up in my inbox with these poorly informed, bad faith responses. Enough.
I was claiming that the workflows we have with the staging area, we could achieve without. And Jujutsu here has ways to do exactly that. It has everything to do with the scenario you were objecting to.
Also, this page (and what I cited specifically) is not about what jujutsu does under the hood, it’s about its user interface.
No offence but, why would I give a shit about [simplicity of implementation]?
It’s because people don’t give a shit that we have bloated (and often slow) software.
And it’s because of developers with their heads stuck so far up their asses that they prioritize their implementation simplicity over the user experience that so much software is actively user-hostile.
Let’s end this little interaction here, shall we.
Sublime Merge is the ideal git client for me. Unlike all other GUI clients I've used, it doesn't pretend it's not git, so you don't have to learn something new and you don't unlearn git. It uses simple git commands and shows them to you. Most of git's day-to-day problems go away if you can just see what you're doing (including what you've mentioned).
CLI doesn’t cut it for projects of today’s size. A new git won’t fix that. The state of a repository doesn’t fit in a terminal and it doesn’t fit in my brain. Sublime Merge shows it just right.
I like GitUp for the same reasons. Just let me see what I’m doing… and Undo! Since it’s free, it’s easy to get coworkers to try it.
I didn’t know about GitUp but I have become a big fan of gitui as of late.
I use Fork for the same purpose and the staging area has never been a problem since it is visible and diffable at any time, and that’s how you compose your commits.
See Game of Trees for an alternative to the git tool that interacts with normal git repositories.
Have to agree with others about the value of the staging area though! It’s the One Big Thing I missed while using Mercurial.
Well, on the one hand people could long for a better way to store the conflict resolutions to reuse them better on future merges.
On the other hand, of all approaches to DAG-of-commits, Git’s model is plain worse than the older/parallel ones. Git is basically intended to lose valuable information about intent. The original target branch of the commit often tells as much as the commit message… but it is only available in reflog… auto-GCed and impossible to sync.
Half of my branches are called werwerdsdffsd
. I absolutely don’t want them permanently burned in the history. These scars from work-in-progress annoyed me in Mercurial.
Honestly I have completely the opposite feeling. Back in the days before git crushed the world, I used Mercurial quite a lot and I liked that Mercurial had both the ephemeral “throw away after use” model (bookmarks) and the permanent-part-of-your-repository-history model (branches). They serve different purposes, and both are useful and important to have. Git only has one and mostly likes to pretend that the other is awful and horrible and nobody should ever want it, but any long-lived project is going to end up with major refactoring or rewrites or big integrations that they’ll want to keep some kind of “here’s how we did it” record to easily point to, and that’s precisely where the heavyweight branch shines.
And apparently I wrote this same argument in more detail around 12 years ago.
This is a very good point. It would be interesting to tag and attach information to a group of related commits. I'm curious about the Linux kernel workflows. If everything is an emailed patch, maybe features are done one commit at a time.
If you go further, there are many directions to extend what you can store and query in the repository! And of course they are useful. But even the data Git forces you to have (unlike, by the way, many other DVCSes where if you do not want a meaningful name you can just have multiple heads in parallel inside a branch) could be used better.
I can’t imagine a scenario where the original branch point of a feature would ever matter, but I am constantly sifting through untidy merge histories that obscure the intent.
Tending to your commit history with intentionality communicates to reviewers what is important, and removes what isn’t.
It is not about the point a branch started from. It is about which of the recurring branches the commit was in. Was it in the quick-fix-train branch or in the update-major-dependency-X branch?
The reason why this isn’t common is because of GitHub more than Git. They don’t provide a way to use merge commits that isn’t a nightmare.
When I was release managing by hand, my preferred approach was rebasing the branch off HEAD but retaining the merge commit, so that the branch commits were visually grouped together and the branch name was retained in the history. Git can do this easily.
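A sketch of doing that by hand (branch names are placeholders):
git checkout feature/foo
git rebase main                 # replay the branch on top of the current tip
git checkout main
git merge --no-ff feature/foo   # force a merge commit so the branch stays grouped, with its name in the history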
I never understood the hate for Git’s CLI. You can learn 99% of what you need to know on a daily basis in a few hours. That’s not a bad time investment for a pivotal tool that you use multiple times every day. I don’t expect a daily driver tool to be intuitive, I expect it to be rock-solid, predictable, and powerful.
This is a false dichotomy: it can be both (as Mercurial is). Moreover, while it’s true that you can learn the basics to get by with in a few hours, it causes constant low-level mental overhead to remember how different commands interact, what the flag is in this command vs. that command, etc.—and never mind that the man pages are all written for people thinking in terms of the internals, instead of for general users. (That this is a common failing of man pages does not make it any less a problem for git!)
One way of saying it: git has effectively zero progressive disclosure of complexity. That makes it a continual source of paper cuts at minimum unless you’ve managed to actually fully internalize not only a correct mental model for it but in many cases the actual implementation mechanics on which it works.
Its manpages are worthy of a parody: https://git-man-page-generator.lokaltog.net
Its predecessors CVS and svn had much more intuitive commands (even if they were clumsy to use in other ways). DARCS has been mentioned many times as being much easier to use as well. People migrating from those tools really had a hard time, especially because git changed the meanings of some commands, like checkout.
Then there were some other tools that came up around the same time or shortly after git but didn’t get the popularity of git like hg and bzr, which were much more pleasant to use as well.
I think the issues people have are less about the CLI itself and more about how it interfaces with the (for some developers) complex and hard to understand concepts at hand.
Take rebase for example. Once you grok what it is, it’s easy, but trying to explain the concept of replaying commits on top of others to someone used to old school tools like CVS or Subversion can be a challenge, especially when they REALLY DEEPLY don’t care and see this as an impediment to getting their work done.
I’m a former release engineer, so I see the value in the magic Git brings to the table, but it can be a harder sell for some :)
The interface is pretty bad.
I would argue that this is one of the main reasons for git’s success. The CLI is so bad that people were motivated to look for tools to avoid using it. Some of them were motivated to write tools to avoid using it. There’s a much richer set of local GUI and web tools than I’ve seen for any other revision control system and this was true even when git was still quite new.
I never used a GUI with CVS or Subversion, but I wanted to as soon as I started touching the git command line. I wanted features like PRs and web-based code review, because I didn’t want to merge things locally. I’ve subsequently learned a lot about how to use the git CLI and tend to use it for a lot of tasks. If it had been as good as, say, Mercurial’s from the start then I never would have adopted things like gitx
/ gitg
and GitHub and it’s those things that make the git ecosystem a pleasant place to be.
The interface of Git and its underlying data models are two very different things that are best treated separately.
Yes a thousand times this! :) Git’s data model has been a quantum leap for people who need to manage source code at scale. Speaking as a former release engineer, I used to be the poor schmoe who used to have to conduct Merge Day, where a branch gets merged back to main.
There was exactly one thing you could always guarantee about merge day: There Will Be Blood.
So let’s talk about looking past git’s god awful interface, but keep the amazing nubbins intact and doing the nearly miraculous work they do so well :)
And I don't just mean throwing a GUI on top either. Let's rethink the platonic ideal for how developers would want their workflow to look in 2022. Focus on the common case. Let the ascetics floating on a cloud of pure intellect script their perfect custom solutions, but make life better for the "cold dark matter" developers, who are legion.
I would say that you simultaneously give credit where it is not due (there were multiple DVCSes before Git, and approximately every one had a better data model, and then there are things that Subversion still has better than everyone else, somehow), and ignore the part that actually made your life easier — the effort of pushing Git down people's throats, done by Linus Torvalds, spending orders of magnitude more of his time on this than on getting things right beyond basic workability in Git.
Not a DVCS expert here, so would you please consider enlightening me? Which earlier DVCS were forgotten?
My impressions of Mercurial and Bazaar are that they were SL-O-O-W, but they’re just anecdotal impressions.
Well, Bazaar is technically earlier. Monotone is significantly earlier. Monotone has a quite interesting and nicely decoupled data model where the commit DAG is just one thing; changelog, author — and branches get the same treatment — are not parts of a commit, but separately stored claims about a commit, and this claim system is extensible and queriable. And of course Git was about Linus Torvalds speedrunning implementation of the parts of BitKeeper he really really needed.
It might be that in the old days running on Python limited the speed of both Mercurial and Bazaar. Rumour has it that the Monotone version Torvalds found too slow was indeed a performance regression (they had one particularly slow release at around that time; Monotone is not in Python).
Note that one part of what makes Git fast is that it enables some optimisations that systems like Monotone make optional (it is quite optimistic about how quickly you can decide that a file must not have been modified, for example). Another is that it was originally only intended to be FS-safe on ext3… and then everyone forgot to care, so now it is quite likely to break the repository in case of an unclean shutdown mid-operation. Yes, I have damaged repositories that way to a state where I could not find advice on how to avoid re-cloning to get even a partially working repository.
As for Subversion, it has narrow checkouts, which are a great feature, and DVCSes could also have them, but I don't think anyone properly has them. You kind of can hack something with remote-automate in Monotone, but probably flakily.
Let the data model pretend there’s a blob for each version of that huge file, even though in fact the software is automatically compressing & decompressing things under the hood.
Ironically, that’s part of the performance problem – compressing the packfiles tends to be where things hurt.
Still, this is definitely a solvable problem.
I used to love DARCS, but I think patch theory was probably the wrong choice.
I have created and maintain the official test suite for Pijul; I am the happiest user ever.
Hmm, knowing you I’m sure you’ve tested it to death.
I guess they got rid of the exponential conflict resolution that plagued DARCS? If so perhaps I should give patch theory another go. Git ended up winning the war before I got around to actually study patch theory, maybe it is sounder than I thought.
Pijul is a completely different thing than Darcs, the current state of a repository in Pijul is actually a special instance of a CRDT, which is exactly what you want for a version control system.
Git is also a CRDT, but HEAD isn’t (unlike in Pijul), the CRDT in Git is the entire history, and that is not a very useful property.
Best test suite ever. Thanks again, and again, and again for that. It also helped debug Sanakirja, a database engine used as the foundation of Pijul, but usable in other contexts.
There are git-compatible alternatives that keep the underlying model and change the interface. The most prominent of these is probably gitless.
I’ve been using git entirely via UI because of that. Much better overview, much more intuitive, less unwanted side effects.
You can’t describe Git without discussing rebase and merge: these are the two most common operations in Git, yet they don’t satisfy any interesting mathematical property such as associativity or symmetry:
Associativity is when you want to merge your commits one by one from a remote branch. This should intuitively be the same as merging the remote HEAD, but Git manages to make it different sometimes. When that happens, your lines can be shuffled around more or less randomly.
Symmetry means that merging A and B is the same as merging B and A. Two coauthors doing the same conflictless merge might end up with different results. This is one of the main benefits of GitHub: merges are never done concurrently when you use a central server.
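Written out (my notation, with ⊔ standing for "merge", nothing more):
(A ⊔ B) ⊔ C = A ⊔ (B ⊔ C)    (associativity: merging the remote commits one by one should match merging the remote HEAD)
A ⊔ B = B ⊔ A                (symmetry: the result should not depend on which side runs the merge)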
Well, at least this is not the fault of the data model: if you have all the snapshots, you can deduce all the patches. It’s the operations themselves that need fixing.
My point is that this is a common misconception: no datastructure is ever relevant without considering the common operations we want to run on it.
For Git repos, you can deduce all the patches indeed, but merge and rebase can’t be fixed while keeping a reasonable performance, since the merge problem Git tries to solve is the wrong one (“merge the HEADs, knowing their youngest common ancestor”). That problem cannot have enough information to satisfy basic intuitive properties.
The only way to fix it is to fetch the entire sequence of commits from the common ancestor. This is certainly doable in Git, but merges become O(n) in time complexity, where n is the size of history.
The good news is, this is possible. The price to pay is a slightly more complex datastructure, slightly harder to implement (but manageable). Obviously, the downside is that it can’t be consistent with Git, since we need more information. On the bright side, it’s been implemented: https://pijul.org
no datastructure is ever relevant without considering the common operations we want to run on it.
Agreed. Now, how often do we actually merge stuff, and how far is the common ancestor in practice?
My understanding of the usage of version control is that merging two big branches (with an old common ancestor) is rare. Far more often we merge (or rebase) work units with just a couple commits. Even more often than that we have one commit that’s late, so we just pull in the latest change then merge or rebase that one commit. And there are the checkout operations, which in some cases can occur most frequently. While a patch model would no doubt facilitate merges, it may not be worth the cost of making other, arguably more frequent operations, slower.
(Of course, my argument is moot until we actually measure. But remember that Git won in no small part because of its performance.)
I agree with all that, except that:
the only proper modelling of conflicts, merges and rebases/cherry-picking I know of (Pijul) can’t rely on common ancestors only, because rebases can make some future merges more complex than a simple 3-way merge problem.
I know many engineers are fascinated by Git's speed, but the algorithm running on the CPU is almost never the bottleneck: the operator's brain is usually much slower than the CPU in any modern version control system (even Darcs has fixed its exponential merge). Conflicts do happen, so do cherry-picks and rebases. They aren't rare in large projects, and can be extremely confusing without proper tools. Making these algorithms fast is IMHO much more important from a cost perspective than gaining 10% on an operation already taking less than 0.1 second. I won't deny the facts though: if Pijul isn't used more in industry, it could be partly because that opinion isn't widely shared.
some common algorithmic operations in Git are slower than in Pijul (pijul credit
is much faster than git blame
on large instances), and most operations are comparable in speed. One thing where Git is faster is browsing old history: the datastructures are ready in Pijul, but I haven’t implemented the operations yet (I promised I would do that as soon as this is needed by a real project).
Depending on your problem, a large peach monoseed might be much more manageable than apple style microseeds. YMMV
Scattering the shader compiles + resource allocation around does sound like it will result in a lot of stutter around startup. Do you have a way to deal with that?
Also, and this is a more subjective comment, I have the general feeling that using standard OpenGL/DX/Vulkan can actually be more maintainable than a custom system like this, because GPU programmers are familiar with it. A heavyweight wrapper like this will be very unfamiliar to any new people joining the project, and while it might make things easier for people who don't know GPU programming, I think it might make things harder for people who do.
Basically, I have a vague sense that it might be falling into the trap of making easy things easier at the expense of making hard things harder.
Actually the framework is designed to let revealed preference solve that. Your vague sense is misplaced.
The most basic drawing abstraction is literally just a handful of lines of code that gather functions to call, and then calls them, with no idea what they do.
Within a render pass, the same applies: it just passes a normal webgpu render pass encoder to a lambda, which can do anything it wants.
Everything beyond that is opt-in. If you want to construct naked draw calls from pure, uncomposed shaders, you can.
There is no overarching scene abstraction, and the extensions to WGSL are extremely minimal, unlike almost every other engine out there. Specifically, what I wanted to avoid is exactly what you describe, which you run into in e.g. three.js: if you wish to render something that doesn’t fit into three’s scene model, you still need to pretend it is a scene, just to render e.g. a full screen quad.
Furthermore, the abstractions Use.GPU does have, rely as much as possible on native webgpu types which are not wrapped in any way. I call this “No API” design.
In short: I recommend you actually look at its code before judging. It may surprise you. Most of the work has not gone into towering abstractions, but rather, decomposing the existing practices along saner lines, that allow for a la carte, opt-in composition.
As for the start up problem: I compile shaders async, and hence it loads similar to a webpage, with different elements popping in when available. If you don’t want this, you can use a Suspense like mechanism to render fallback content/code, or to keep rendering the previous content until the new content is ready.
I use https://www.blacknight.com/ because I’m Irish, and like to support a “local” business. (Local in quotes because I don’t live in Ireland anymore :p) Quite happy with them so far.
I use plenty of open source code at work that I would be glad for us to pay for, but I don’t necessarily have any interest in a talk from the authors.
I asked for a raise at work and was told that I am “useless” and a “net drain on the company” and that it was an insult for me to ask. Which, even if those things were true (I personally think they’re not), that is absolutely not something you say to an employee lol. They didn’t fire me so I must not be useless, at least, but I will be applying to new jobs starting tomorrow.
And they won’t be getting two weeks notice, lol.
Damn, if you can afford to, you should leave immediately IMO. Screw that shit, nobody gets to talk to you like that, it doesn’t even matter if they’re right or not.
There’s a lot more to complain about. The boss accusing a coworker who wasn’t present of having dementia, calling the police on the last person who quit, etc. I’m getting out of there ASAP, but I feel for my coworkers who have to stay.
I read it with a laughing and a crying eye. Much of it is true. That said:
So, I'm looking for a portable case now… Or I'll have one custom built, so that I can take my "desktop" with me to the cabin, to the hotel, to the garden allotment for the weekend. DIY is probably needed, much like in the case of decent keyboards.
FWIW, I got tired of wearing eyeglasses and tilting my neck down and realized there’s a tradeoff between tiny displays and having them closer to your face - so after my laptop died I went with a ‘raised tablet’ setup: https://www.reddit.com/r/ErgoMobileComputers/comments/s6k1qr/raised_tablet_pc_setup/ . So it’s bulkier yet enabled more pandemic-time outdoor usage, HiDPI screen has crisp fonts, and somehow ended up with a small-but-mechanical keyboard.
Notice how the tablet sits over the keyboard and trackpad closer to the face - the reduced distance between display and face has made for less glare - like when you walk up to a window to see inside. I usually work outdoors where there’s some tree cover or some shade canopy to help - so far it’s been good enough to manage ~4-hour sessions of work.
I’m someone who spent all sorts of time looking into eink and high-brightness screens and this setup made me realize I may not need such innovations. Only other tradeoff is battery life so I’ve caved into carrying around an external battery.
Have a look at the Louqe Ghost S1. It's suuuper small, but fits a lot of hardware if you're careful about your choices. Fits in a backpack.
There's a surprising number of "small form factor" PCs around; a cursory Google gives me this result:
https://www.pcmag.com/picks/the-best-windows-mini-pcs
I personally purchased a used NUC from work, which runs as a server at home.
I think the biggest (ha!) issue is the screen. Most screens are not designed to be easily portable.
As an aside, if you haven’t looked at terminal glasses (spectacles) I suggest you do. I’ve worn glasses all my life but I now have 2 pairs, one being a dedicated terminal glasses pair. I find them useful for all sorts of close-up viewing, such as home DIY.
Most laptops have a light indicating that the camera is on, so you can tell if someone is spying on you.
I’m pretty sure it’s hardwired so that the camera can only receive power through the LED, making it impossible to bypass.
I'm in the same situation and at some point was researching whether there are modular cases where I could fit the peripherals of my choice, the focus being on keyboard and screen, durability and modularity. Unfortunately I didn't find anything - either my research was bad or this is a niche to be filled. Please let us know if you find a solution.
I think I will go with a custom-built case. The bulkiest things, with the worst packaging issues for now, are the power supplies for all the stuff.
A monitor can be packed pretty flat, if it comes without its own case.
https://www.acmeportable.com/industrial-workstations
but they are expensive, and 17.3” displays are too small for my taste.
What’s the motivation for using this over another implementation, like musl? Not trying to crap on the project, just genuinely curious.
musl has been ported to quite a few environments and is fairly common in bare-metal toolchains. That said, it doesn’t have a clean OS abstraction and the ports generally work by providing a stub layer that looks like Linux, which isn’t ideal. Musl is also a C codebase and a C standard library can be a lot cleaner and simpler if implemented in a higher-level language. For example, there are a bunch of things that, on modern systems, end up being reference counted (e.g. bits of locale state) and having std::shared_ptr
available is much nicer than remembering to manually do the refcounting for these things.
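Roughly the kind of thing I mean - a made-up sketch, not actual LLVM libc code; the type and member names are invented:

#include <memory>
#include <string>

// Hypothetical illustration: locale data that several threads may hold a
// reference to. With shared_ptr the refcounting happens in copy construction
// and destruction; a manual C equivalent needs explicit retain/release calls
// on every code path, including error paths.
struct LocaleData {
    std::string name;
    // ... collation tables, ctype tables, and so on.
};

class Locale {
    std::shared_ptr<const LocaleData> data_;
public:
    explicit Locale(std::shared_ptr<const LocaleData> d) : data_(std::move(d)) {}
    // Copying a Locale bumps the count; destroying the last copy frees the data.
    const LocaleData& data() const { return *data_; }
};

int main() {
    auto c_locale = std::make_shared<const LocaleData>(LocaleData{"C"});
    Locale a(c_locale), b = a;  // two owners; the data is freed when both are gone
}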
The Fuchsia libc is based on musl but a lot of hacking and slashing was required to free it from the assumptions of a posixy unix world.
There seem to be a lot of Fuchsia people involved with LLVM libc, so I’m guessing that this experience wasn’t particularly pleasant for them?
My own involvement with Fuchsia’s libc has been pretty small so I’m not the best to answer authoritatively, but my sense is that it was great to have a relatively small libc to build our platform specific libc on, but they saw the value in having a portable libc that could be shared with other platforms. To get musl working on Fuchsia it had to be chopped up in ways that makes what we have a fork rather than a port - it would be nice to be able to take a cleaner approach.
Names in databases really do suck. While we’re on the subject of broken schemas, I have a bone to pick with addresses too.
People who live in apartments frequently have to fill out the “address line 2 field” for an apartment/suite number, but some sites expect to jam it all into “address line 1” and will complain that it’s “too long”. I’ve also seen sites that will only let you specify a single number for the suite/unit when it should be freeform. Some sites will also force everything to be all caps, which isn’t more correct than what I input. Even USPS does some of this stuff, as well as classifying addresses into “commercial” addresses, then banning you from submitting an address change request if they think you’re moving to a so-called commercial address.
The worst sites are the ones that refuse to accept an address unless it passes some address validation service. If you live in a new location, a location that was recently annexed into a city, or a location that’s just weird enough (aka not a single-family home in a rich suburb) not to be in the system, you’re screwed. Only use address validators if you let users override them. I know where I live, damn it!
And that’s just for places I’ve lived personally. There’s probably a bunch more issues for rural areas and international addresses.
For a few years I lived in an apartment building that had retail on the first floor, with an entrance for each. If a delivery went to the wrong door it usually was lost or stolen, but specifying the correct door went on the second line of the address. USPS always saw it, but the harried, closely-monitored package delivery services almost never did. So I’d swap to put that on the first line, which is invalid but always worked. Except when blocked by a site that required using an address their “validator” approved of. It’s very frustrating to have a site tell me I don’t know my own address.
Until recently we didn’t have postcodes in Ireland. It was always a fun dance trying to figure out which combo of “”, “NA”, “ N/A”, “000”, “ 000000” etc would be accepted.
And then of course the package would show up on your door with the nonsense postcode proudly printed on it.
The UK is interesting for this type of thing because UK addresses basically break almost every assumption about addresses. Houses without a number and houses on streets that don’t have a name are quite common. Another thing that annoys me to no end, systems insisting that the country has to be “United Kingdom” when it’s almost always better to put the actual country in the country field (e.g. England, Scotland, etc).
The nice thing is that house name or number + postcode always uniquely identifies an address in the UK. Unfortunately, a lot of things don’t take advantage of this. My address is {number}, {road name}, {postcode} but there are two blocks of flats on the same road that have the address {number}, {block of flats name}, {road name}, {postcode}, where the number and road name are the same as my address but the postcode is different. Delivery drivers (especially food delivery ones) often come to my house when they’re aiming for one of the flats.
Validators are such a pain. A store I go to always asks for a phone number for their rewards account. In the last few months, some idiot decided they should validate all of the phone numbers in their database…
Of course, the phone number I’ve used for seven years doesn’t validate. Along with hundreds of other people who shop there.
From the perspective of compilers of the Turbo Pascal era, compilers on Unix looked slow: they would call assemblers and linkers as a separate process whereas Turbo Pascal would create an executable directly. Certainly it is possible to create binaries more directly but other desirable features like FFI might still depend on doing this in steps.
You will always need a linker to call external functions from system libraries, but you could use this approach internally for almost everything, and then run a linker to link in libc.so and friends. I’m not sure how much link time would be saved, but my intuition is that it would be a lot.
Unless I misunderstood your point (i.e. we do need a dynamic linker at runtime for shared libraries), Go does not use a conventional linker step to link against a symbol in libc or any other shared library (see the //go:cgo_import_dynamic directive), nor is the presence of the target library required at build time (in fact, it is not accessed at all).
Older Windows/Mac IDEs like Turbo, Lightspeed/THINK, Metrowerks used the same compile-then-link architecture, they just had the compiler and linker built into the app instead of being separate binaries. I definitely recall waiting for THINK C++’s link phase to finish.
Our dear author, besides being a newbie to technical writing, also can’t into terminology:
A color value of 1.0 is full brightness, while a value of 0.0 is zero brightness.
…and so it sucks, just like SVG. Actually, SVG at least has wide support for gamma-correct filters.
Why can’t humanity learn.
One mildly redeeming feature is that it looks easy to implement in Cairo terms, because it’s also awfully broken. And even then it’s not ideal.
Instead of just ranting, feel free to contribute: file a GitHub issue on the specification repo. This is not a 1.0 release but a first draft of the specification, and I’m open to more precision and correctness. Keep that in mind.
- The file extension is implicit, undefined in the specification.
- No media type is even suggested either.
Both have been in the specification for an hour or so; the version on the website will be updated tomorrow at 5:00 UTC.
- Gradients are implicitly in sRGB’s pseudo-gamma space, therefore incorrect. See reference rendering.
Gradients are not yet specified properly, but they are defined to be blended in linear color space. See the reference implementation.
- Blending is undefined, therefore implicitly in sRGB’s pseudo-gamma space, and incorrect.
Blending is defined as linear color blending, as implemented in the reference implementation
- RGB 565 has an undefined colour space.
That is kinda correct. Feel free to propose a good color space that will fit real-world applications. I should figure out whether display controllers can properly set gamma curves; that would allow just fixing it to sRGB as well.
Our dear author, besides being a newbie to technical writing, also can’t into terminology:
I am not a native speaker, so excuse my bad English technical writing. My main language is German, and I can do better technical writing there, but I assume you wouldn’t be happy with a fully German spec either.
I’m a native grump, and I’ve seen way too much broken stuff, nice to meet you. I considered implementing the format, but the more I read, the faster I backed out.
The specification says 1.0, and there are no obvious traces of it being a WIP, so I criticized it as such. If you make changes to it now without bumping, you’re losing points in the technical documentation department already. If you don’t, your 1.0 is subpar. The document also unnecessarily craps on SVG in the introduction, which kind of set the tone for me.
[…] the color is linearly interpolated […]
is ambiguous at best; I assumed the straightforward interpretation. “Is interpolated in a linear colour space” would finally be clear.
My point about RGB 565 is concerned with what I see as an omission. Perhaps you’ve quoted inappropriately.
I’m no expert here, and colour is tricky, but scRGB in a floating point format would be future-proof and easy to convert. Find an expert.
A color value of 1.0 is full brightness
If you replace “brightness” with “intensity” (of primaries), that particular sentence will stop sounding funny.
Coming from the same-ish quadrant - here’s what a less grumpy wishlist (spec and otherwise) from my booster-jab-muddled mind would include, at least what I can think of right now:
Just because I know it sucks when someone shits on your work, I’ll say this: ignore that guy, they’re just being an asshole. Your library looks awesome, and I’m excited to see how it develops.
I mean, the critique is valid, only the presentation is bad ;)
This is actually all stuff I will incorporate into improving the spec.
Awesome. I was going to suggest something similar (cmyk support). If this could work for print, it could really take off.
Regardless, I think this is really cool.
To be frank, no project that uses the GPL post-GPL2 will ever be the ‘boring’ variant. Not because it can’t, but because you need to convince lawyers that it’s boring.
And GCC has done little to create a situation where it might be. clang/LLVM breaks the compiler in two, with a frontend and a backend that can evolve independently. Can you even do that with gcc? And I mean, in a practical sense. I know that the frontend and backend of gcc can technically be decoupled, but technically != plausibly.
What does a compiler’s choice of license have to do with its approach to undefined behaviour? Maybe just being dense, but I don’t understand what point you’re making here.
Your information is outdated. You can use GCC backend without GCC frontend, and it is an option supported by upstream. Since GCC 5. See https://gcc.gnu.org/wiki/JIT.
Since the compiler’s license has no effect on the license of your code, nor does GPL3 change anything much vs GPL2 (in reality, I understand there is a lot of spin to the contrary), this seems like an axe to grind more than a contribution
We couldn’t use gcc post-GPL3 when I was at Amazon (or recent versions of emacs for that matter). Do GOOG/MSFT/FB treat gcc differently?
There were many engineers using emacs, but the official line was that you weren’t allowed to install any GPL3/AGPL software on a work machine for any purpose, and that explicitly included recent versions of emacs (and also recent versions of gcc, which meant the build system was stuck with obsolete versions of gcc). I suspect everyone just ignored the emacs restriction, though. I’m sure a lot has changed since I left in 2014 (I bet the build system has moved to clang), and I don’t know the current policy on GPL software.
At Microsoft, the policy is surprisingly sane: You can use any open source program. The only problems happen when you need to either:
There are approvals processes for these. There’s no blanket ban on any license (that I’m aware of) but there are automatic approvals for some licenses (e.g. MIT), at least from the lawyers - the security folks might have different opinions if upstream has no coordinated disclosure mechanism or even a mechanism for reporting CVEs.
That sounds unsustainable. Do you not already need new builds of GCC to build Linux? Surely if not, then you will eventually. And I can’t see Amazon ditching Linux any time soon.
Keep in mind my information is 7 years out of date (I left in 2014, when Amazon was just starting to hire Linux kernel developers).
Eek.
Normally you would expect someone to backport patches and fixes; but web browser codebases are massive and ugly, so I suspect that’s a really hard job for volunteers. They would possibly have to invent their own fixes too, as upstream might have replaced whole systems within the codebase when fixing the bugs.
Options I can see:
Any other options?
When I used Debian I just used the Google Chrome deb repo. I used Debian testing, which is what Google tracks internally, so Chrome is guaranteed to work. That is, if Chrome were broken on Debian testing, it would be broken for Google developers. And the Google developer workflow heavily relies on web-based tooling. That’s as close to a “blessed” configuration you can get for web browsers on Linux as far as I know.
but then you’re introducing an untrusted binary package into the system (untrusted in that it was built by a 3rd party, not from source on debian-owned servers, etc)
Yeah, but most people don’t care about that and just want their computers to work. Even as a relatively security-conscious SRE, that includes me.
On the list of “people likely to distribute malware-infected binaries,” Google is pretty far down. Unless Chrome falls under your personal definition of malware I suppose.
Very much so. It’s amazing how much the goalposts of “malware” have shifted.
Chrome is spyware. Having a EULA or opt-in was never a reason for spyware not to be listed by AV tools in the past (at best this might make them get flagged as “PUPS” instead of “spyware”). If Chrome came out from a small company in the 2000’s then it would get flagged.
No-one dares mark Chrome as malware. You cannot upset such a large company nor such a large computer base without expecting users to think you are the one at fault. We are not malware, we are an industry leader, you must be mistaken sir :)
It seems that you can, indirectly, buy your way out of being considered malware simply by being a big player.
…from a small company in the 2000’s then it would get flagged.
I get your point, but c’mon… Stuff got flagged back then because it interrupted what the user was trying to do. If you don’t launch Chrome, you don’t see it, and it doesn’t attempt to interact with you. That’s what most users care about, that’s what most users consider to be malware, and, as far as I recall, that’s (largely) what got apps flagged as malware in the 2000s.
Chrome is like Internet Explorer with all those nasty toolbars installed, except the toolbars are hidden by default ¯\_(ツ)_/¯.
That’s a silly distinction. If you use Chrome, then you’re already executing tons of arbitrary code from Google. In practice, whether you get Chrome from Debian or Google, you still have no choice but trust Google.
Same here. Even as a long-term Debian user (20+ years), this is just the only way for me, for both my private and my regular workstation.
Remove the browser packages.
I’d go with that. Well, leave netsurf in so there’s at least a modicum of web browsing functionality OOTB. Motivated users can download Firefox themselves and the world won’t end; that’s what they have to do on Windows and macOS anyway. But trying to keep up with the merry-go-round is like trying to boil the ocean. Volunteer effort can then be spent on areas where the investment will actually pay off.
In previous Debian releases they had a section in the release notes about how the version of webkit they shipped was known to be behind on security patches and that it was only included so that you could use it to view trusted sources like your own HTML files or whatever. They were very specific about the fact that only Firefox and Chromium were safe to use with untrusted content.
But I only found out about it by a friend telling me about it in chat. I have my doubts that this could be communicated effectively.
Normally you would expect someone to backport patches and fixes; but web browser codebases are massive and ugly, so I suspect that’s a really hard job for volunteers. They would possibly have to invent their own fixes too, as upstream might have replaced whole systems within the codebase when fixing the bugs.
The article allows us an interesting glimpse into just how hard this is, and it’s not just because of the web browsers:
Debian’s official web browser is Mozilla Firefox (the ESR version). The last update of Firefox ESR in Debian stable has been version 78.15.0. This version also has quite a few unpatched security issues and the 78.x ESR branch is not maintained by Mozilla anymore. They need to update to the 91.x ESR branch, which apparently causes big problems in the current stable Debian platform. In an issue, people complain about freezing browser sessions with the 91.x release, which blocks the new Firefox ESR release from being pushed to “stable-security”. Somebody in the issue claims the reason: “Firefox-ESR 91.3 doesn’t use OpenGL GLX anymore. Instead it uses EGL by default. EGL requires at least mesa version 21.x. Debian stable (bullseye) ships with mesa version 20.3.5.”
“So just update mesa” doesn’t sound like the kind of thing you could do over just a couple of days, seeing how many packages depend on it. Assuming that even fixes the Firefox end of things, I’m not sure I want to think about how many things could break with that update, not before I’ve had my second coffee of the day in any case. Just testing the usual “I updated mesa and now it crashes/looks funny” suspects – Gnome, Plasma, a bunch of games – takes weeks. It’s something you can do in testing
but it takes a while.
Large commercial vendors are hitting release management problems like these, too, this is actually part of the reason why you see so many Linux gadgets unironically using tech stacks from three years ago. It’s worse for Debian because they’re trying to build a general-purpose system out of parts that are increasingly made for special-purpose systems that you can either freeze forever (embedded devices) or overwork DevOps teams into PTSD and oblivion in order to keep them running (cloud apps).
Not saying Debian should drop stable releases and become a rolling release, but perhaps there’s some slightly more rapid cadence they could adopt for releases? Like, is the issue highlighted in the article also a problem with openSUSE and Red Hat?
“Stable” means different things to different distros.
To Debian, “Stable” means that bugs will be patched, but features and usage will not. This does not fit with Mozilla and Google’s monthly release cadence; all changes need to be checked over by skilled devs.
SuSE just builds whatever Mozilla hands them, as far as I can tell.
For Firefox (and some other packages, iirc) Debian has already given up on that. They would package the latest Firefox ESR even if it introduced new features (and it would, of course). The issue is that even this is an insurmountable amount of work. The latest ESR needs much newer LLVM and Rust toolchain versions than the last one, and Debian also wants to build all packages for a given release with other packages in that release, so that means updating that whole stack too.
This is why I don’t really see the point in LTS Linux distros. By a couple of years into their lifetime, the only thing that you’re getting from the stability is needing to install most things that you actually want from a separate repo. If ‘stable’ means ‘does not get security fixes’ then it’s worse than useless. A company like Red Hat might have the resources to do security backports for a large set of packages but even they don’t have that ability for everything in their package repos.
It works a bit better in the BSD world, where there’s a strict distinction between the base system and third-party packages, so the base system can provide ABI stability over a multi-year period within a single release but other things can be upgraded. The down side of this is that the stability applies only to the base system. This is great if you’re building an appliance but for anything else you’re likely to have dependencies from the packages that are outside of the base system.
The Debian stable approach works really well for servers. It works moderately well for desktops, with the very notable exception of web browsers – which are, without a doubt, the most used, most exposed, and most insanely complicated subsystem on any desktop, so much so that Google’s ChromeOS is a tiny set of Linux vital organs supporting Chrome.
Even so, Debian is working on this and within a few weeks, I think, there will be new packages for stable and oldstable and even LTS.
I used to think that the “stability” was fine for servers, but in practice it meant that every couple of years I was totally screwed when I had to urgently fix a small thing, but it couldn’t be done without a major upgrade of the whole OS that upset everything. It also encourages having “snowflake” servers, which is problematic on its own.
I feel like the total amount of problems and hassle is the same whether you use a rolling release or snapshots, but the snapshot approach forces you to deal with all of them at once. You can’t never upgrade, and software is going to evolve whether you like it or not, so the only choice you have is whether you deal with upgrade problems one by one, or all at once.
The Debian release cadence is about 2 years, and has been for 16 years. How much faster would work? What’s Firefox ESR’s cadence? The best I could find from Mozilla was “on average 42 weeks” but I’m not sure that’s quite the right thing. ESR 78 only came out in September this year and is already unsupported. The latest ESR has very different toolchain requirements to build. It’s a confusing picture.
This “huge VM” approach works great and I am using it extensively in production (though not with a dynamic array interface). For 32-bit systems or other contexts in which allocating large virtual address ranges is not possible, another alternative that preserves pointer stability (but does not guarantee contiguity of elements in virtual memory) is the “segment array”: a small fixed-size array stores pointers to “segments” of exponentially increasing size (except the initial size is repeated once). The total memory doubles each time a segment is added to the array: 8, 8 + 8 = 16, 8 + 8 + 16 = 32, etc.
I first encountered the “segment array” idea in “The Design and Implementation of Dynamic Hashing for Sets and Tables in Icon” but have seen it in many other places since. The “huge VM” paradigm shows up in many, many places as well; one of the most interesting is the “virtual span” idea in scalloc. I use a similar approach to nearly eliminate the need for compaction due to virtual memory fragmentation in a compacting GC for an MVCC database (compaction is generally required only for physical memory fragmentation, i.e., sparse allocations within a single page).
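Here’s a minimal sketch of that indexing scheme (names mine, C++20 for std::bit_width; an illustration, not code from any of those papers). With a base segment of 8 slots the segment capacities run 8, 8, 16, 32, …, and elements never move once written, because growth only ever allocates a new segment:

#include <bit>       // std::bit_width (C++20)
#include <cstddef>

template <typename T>
class SegmentArray {
    static constexpr std::size_t kBase = 8;
    static constexpr std::size_t kMaxSegments = 64;    // enough for any realistic size
    T* segments_[kMaxSegments] = {};
    std::size_t size_ = 0;

    // Map a flat index to (segment, offset within that segment).
    static void locate(std::size_t i, std::size_t& seg, std::size_t& off) {
        if (i < kBase) { seg = 0; off = i; return; }
        const std::size_t top = std::bit_width(i) - 1;  // floor(log2(i)), >= 3 here
        seg = top - 2;                                  // segment 1 covers [8,16), segment 2 covers [16,32), ...
        off = i - (std::size_t{1} << top);
    }

    static std::size_t capacity_of(std::size_t seg) {
        return seg == 0 ? kBase : (kBase << (seg - 1)); // 8, 8, 16, 32, 64, ...
    }

public:
    SegmentArray() = default;
    SegmentArray(const SegmentArray&) = delete;          // sketch: no copying, to keep ownership simple
    SegmentArray& operator=(const SegmentArray&) = delete;
    ~SegmentArray() { for (T* s : segments_) delete[] s; }

    T& operator[](std::size_t i) {
        std::size_t seg, off;
        locate(i, seg, off);
        return segments_[seg][off];
    }

    void push_back(const T& v) {                          // assumes T is default-constructible
        std::size_t seg, off;
        locate(size_, seg, off);
        if (!segments_[seg]) segments_[seg] = new T[capacity_of(seg)];
        segments_[seg][off] = v;
        ++size_;
    }
};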
It was actually just after implementing a segment array that I thought of doing this. I’d been using the virtual memory ring buffer trick at work a few weeks before so it was fresh in my head, and I thought “wait, the page table could just put all my segments beside each other…”
You can exploit demand paging in so many applications. Another “huge VM” trick I found was to adapt linear hashing to a predetermined layout of page-size buckets in a single VM segment, so buckets can be split in any order (rather than strictly in order per standard linear hashing). It doesn’t matter how sparse allocated pages are, as long as 1) allocations within the pages themselves are dense, and 2) you can spare enough virtual memory (which you nearly always can).
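For anyone who hasn’t tried it, the core of the “huge VM” trick is just reserving far more address space than you will ever back with physical memory and letting demand paging do the rest. A Linux-flavoured sketch (not the article’s code; assumes MAP_NORESERVE is available):

#include <sys/mman.h>
#include <cstddef>
#include <cstdio>

int main() {
    // Reserve 1 TiB of address space. This is not 1 TiB of RAM: no physical
    // pages are allocated until they are first touched.
    const std::size_t reserve = std::size_t{1} << 40;
    void* base = mmap(nullptr, reserve, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    // Touching a page is what actually allocates physical memory for it.
    // These two writes land nearly a terabyte of addresses apart, yet resident
    // memory grows by roughly two pages, not by 1 TiB.
    char* bytes = static_cast<char*>(base);
    bytes[0] = 1;
    bytes[reserve - 1] = 1;

    munmap(base, reserve);
    return 0;
}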
They do mention it in passing, but I really can’t help but feel that the approach outlined here is probably not the best option in most cases. If you are measuring your memory budget in megabytes, you should probably just not use a garbage collected language.
Sure, but that’s tangential to my point. In a gced language, doing almost anything will generate garbage. Calling standard library functions will generate garbage. This makes it difficult to have really tight control of your memory usage. If you were to use, for example, c++ (or rust if you want to be trendy) you could carefully preallocate pretty much everything, and at runtime have no dynamic allocation (or very little, and carefully bounded, depending on your problem and constraints). This would be (for my skillset, at least) a much easier way to keep memory usage down. They do mention they have a lot of go internals expertise, so maybe the tradeoff is different for them, but that seems like an uncommon scenario.
I wouldn’t say that, because it’s likely that they wouldn’t have been short on memory to begin with if they hadn’t used a GC language. (And yes, I’m familiar with the pros and cons of GC; I’m writing a concurrent compacting GC right now for work.)
Only maybe. Without a gc long running processes can end up with really fragmented memory. With a gc you can compact and not waste address space with dead objects.
If you’re really counting megs, perhaps the better option is to forgo dynamic heap allocations entirely, like an embedded system does.
Technically yes. But they probably used this to deploy one code base for everything, instead of rewriting this only for the iOS part.
Exactly this. You can try to do this in a gced language, and even make some progress, but you will be fighting the language.
I feel like you’re being sarcastic, but making most of the app avoid dynamic allocations is not a crazy or extreme idea. It’s not super common in phone apps and the system API itself may force some allocations. But doing 90+% of work in statically allocated memory and indexed arenas is a valid path here.
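Something in this direction (type and names invented, obviously not Tailscale’s code): a fixed-capacity pool sized up front that hands out small integer handles instead of pointers, so steady-state work does no heap allocation at all.

#include <cstddef>
#include <cstdint>

// A made-up record type standing in for whatever the app tracks per peer.
struct Peer {
    std::uint32_t addr;
    std::uint16_t port;
    std::uint16_t flags;
};

template <typename T, std::size_t N>
class Arena {
    T slots_[N];                       // in static storage if the Arena itself is static
    std::uint32_t free_list_[N];
    std::uint32_t free_count_ = 0;
    std::uint32_t next_unused_ = 0;

public:
    static constexpr std::uint32_t kInvalid = ~0u;

    std::uint32_t alloc() {
        if (free_count_ > 0) return free_list_[--free_count_];
        if (next_unused_ < N) return next_unused_++;
        return kInvalid;               // pool exhausted: the caller must handle it explicitly
    }
    void release(std::uint32_t h) { free_list_[free_count_++] = h; }
    T& operator[](std::uint32_t h)   { return slots_[h]; }
};

// The whole peer table is one static object: no heap, no GC pressure, and the
// memory ceiling is decided at compile time.
static Arena<Peer, 4096> g_peers;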
Of course that would require a different language than Go, which they have good reasons not to do.
I’m being sarcastic. But one of the issues identified in the article is that different tailnets have different sizes and topologies - they rejected the idea of limiting the size of networks that would work with iOS, which is what they’d need to do if they wanted to do everything statically allocated.
they rejected the idea of limiting the size of networks
They’re already limited. They can’t use more than the allowed memory, so the difference is - does the app tell you that you reached the limit, or does it get silently killed.
I believe that fragment was about “how another team would solve it while keeping other things the same” (i.e. keeping Go). Preallocation/arenas require going away from Go, so they would give them more possible connections, not fewer.
That is absolutely not my experience with garbage collectors.
Few are compacting/moving, and even fewer are designed to operate well in low-memory environments[1]. golang’s collector is none of that.
On the other hand, it is usually trivial to avoid wasting address space in languages without garbage collectors, and an application-specific memory management scheme typically gives a 2-20x performance boost in a busy application. I would think that absolutely worth the limitations in an application like this.
[1]: not that I think 15mb is terribly low-memory. If you can syscall 500 times a second, that equates to about 2.5gb/sec transfer filling the whole thing - a speed which far exceeds the current (and likely next two) generations of iOS devices.
To back up what you’re saying, this presentation on the future direction that the Golang team are aiming to take is worth reading. https://go.dev/blog/ismmkeynote
At the end of that presentation there’s some tea-leaf reading about the likely direction that hardware development is likely to go in. Golang’s designers are betting on DRAM capacity improving in future faster than bandwidth improvements and MUCH faster than latency improvements.
Based on their predictions about what hardware will look like in future, they’re deliberately trading off higher total RAM usage in order to get good throughput and very low pause times (and they expect to move further in that direction in future).
One nitpick:
Few are compacting/moving,
Unless my memory is wildly wrong, Haskell’s generation 1 collector is copying, and I’m led to understand it’s pretty common for the youngest generation in a generational GC to be copying (which implies compaction) even if the later ones aren’t.
I believe historically a lot of functional programming languages have tended to have copying GCs.
At the end of that presentation there’s some tea-leaf reading about the likely direction that hardware development is likely to go in. Golang’s designers are betting on DRAM capacity improving in future faster than bandwidth improvements and MUCH faster than latency improvements.
Given the unprecedented semiconductor shortages, as well as crypto’s market influence slowly spreading out of the GPU space, that seems a risky bet to me.
That’s the short term, but it’s not super relevant either way. They’re betting on the ratios between these quantities changing, not on the exact rate at which they change. If overall price goes down slower than desired, that doesn’t really have any bearing.
Aren’t most GCs compacting and moving?
The first multi-user system I used heavily was a SunOS 4.1.3 system with 16MB of RAM. It was responsive with a dozen users so long as they weren’t all running Emacs. Emacs, written in a garbage-collected, interpreted language, would have run well on a much smaller system if there was only one user.
The first OS I worked on ran in 16MB of RAM and ran a Java VM and that worked well.
Any non-moving allocator is vulnerable to fragmentation from adversarial workloads (see Robson bounds), but modern size-class slab allocators (“segregated storage” in the classical allocation literature) typically keep fragmentation quite minimal on real-world workloads. (But see a fascinating alternative to compaction for libc-compatible allocators: https://github.com/plasma-umass/Mesh.)
This does strike me as a place where refcounting might be a better approach, if you’re going to have any dynamic memory at all.
With ref-counting you have problems with cycles and memory fragmentation. The short-term memory consumption is typically lower with ref-counting than with a compacting GC, but there are many more opportunities to have leaks and to grow over time. For a long-running process I’m skeptical that ref-counting is a sound choice.
Right. I was thinking that for this kind of problem with sharply limited space available you’d avoid the cycles problem by defining your structs so there’s no void* and the types form a DAG.
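Something like this, for instance (invented type names, just to show the shape): ownership edges only point “down” the type graph, so a reference cycle can’t be built in the first place.

#include <memory>
#include <string>
#include <vector>

struct Endpoint {
    std::string host;
    int port = 0;
};

struct Route {
    std::vector<std::shared_ptr<Endpoint>> endpoints;  // owns Endpoints only
};

struct Config {
    std::vector<std::shared_ptr<Route>> routes;        // owns Routes only
    // If a Route ever needed to refer back to its Config, it would hold an
    // index or a non-owning pointer, never a shared_ptr, keeping the ownership
    // graph acyclic and the refcounts leak-free.
};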
I will be playing + rating lots of Ludum Dare games! (It’s a game jam that took place last weekend; for two weeks afterwards you play and rate the games that other people made.) Anyone else on here take part? I’d be glad to rate your games.
My entry: https://ldjam.com/events/ludum-dare/49/drilbert-ii-diggeredoo
Slight tangent: as someone who has done some gamedev, and specifically multiplayer games using UDP, I’m unsure what the use case of a new project using raw TCP is these days. I have done custom UDP stuff, and used http, but I can’t think of why I would use raw TCP without http now.
Because you want streams, reliable delivery, and flow control, but you don’t need HTTP’s headers, request/response cycle, and so on?
The main thing that comes to mind is that HTTP isn’t a great wire format from the perspective of fast parsing and serialisation. You can get higher throughput with a binary protocol where e.g. fields are at fixed offsets and there isn’t much parsing to do.
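A toy example of what “fields at fixed offsets” buys you (the header layout and names are invented; a real protocol also needs explicit byte-order and versioning rules): decoding is a memcpy and a couple of loads, with no text parsing at all.

#include <cstdint>
#include <cstring>

struct MsgHeader {
    std::uint32_t length;   // payload length in bytes
    std::uint16_t type;     // message type tag
    std::uint16_t flags;
};

// Copy the first 8 bytes of the buffer into the struct; memcpy sidesteps
// alignment and strict-aliasing issues.
inline MsgHeader decode_header(const std::uint8_t* buf) {
    MsgHeader h;
    std::memcpy(&h, buf, sizeof h);
    return h;
}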
Not really, I think. HTTP 2 burns a lot of CPU cycles on both sides in order to make better use of a long, slow network. There’s a whole complicated compression system and just no.
If we just bumped pids to 64 bits, we could create a million processes a second without recycling ids for…
(2^63) / (1,000,000 x 60 x 60 x 24 x 365) ~= 292 thousand years. Obviously migration is a PITA, but this really seems like the ultimate solution.
I look forward to typing kill -9 9203847209374947.