1. 29

TL;DR: when creating links to code on github, use the magic ‘y’ keyboard shortcut instead of linking to a branch, which will change. Works with line number anchor links, too.

  1.  

  2. 14

    Quite frankly, this is just a usability bug in GitHub. Clicking a line number should implicitly do the y thing before appending #L to the uri.

    1. 5

      I agree that it’s a usability bug but I don’t think linking to a specific commit (the ‘y’ behavior) is desirable. It should link to the line you originally linked to at the current version (it may have moved, but we have that info in the git history), or if the line was deleted, should display “that line no longer exists, it last appeared in commit ab352cf”

      1. 9

        But the linker is not necessarily trying to link to a single line. Often they are trying to link to a larger context which may have disappeared by the current version even if the line they clicked on still exists. Or it could be the opposite. The line they linked may have disappeared by the overall aim of the link remains the same.

        1. 6

          Pretty sure you can’t identify a line that accurately through git history in any automated fashion. Git’s basic unit is the whole file object, the concept of lines being moved around or compared is all provided by higher-level tooling. That tooling still has problems with complicated scenarios.

          1. 3

            I think what cmm is trying to say is that the link should refer to the current latest commit. That way future viewers will see the content of the file in the context it was originally linked.

            It might also be cool to think about a semantic linking scheme, following a piece of code through history. I have no idea how hard it would be to implement something like that.

            1. 1

              Here’s my idea:

              Super hard. :-)

              1. 5

                Apologies for carrying this argument into multiple threads, but I think you are overstating this problem’s difficulty. Nobody seems to believe me that this is doable, so I quickly hacked up a solution.

                Clone https://github.com/Droogans/.emacs.d/ (the repo from the linked example) somewhere. I snooped the history and the original commit they linked to was 6ecd48 from 2014. Run fwd_lines.py 6ecd48 init.el 135 | less and scroll through. It will display both files with the corresponding lines marked with >>> <<<. You will see that we have correctly identified line 177 as the corresponding line in HEAD, 3 years later.

                In general, any time GitHub’s diff functionality would identify lines as the same, so will fwd_lines.py. Like I argued in my original thread, if this kind of identification wasn’t accurate enough for general usage, code reviews would be a lot more difficult than they currently are. And this is just using an off-the-shelf diff algorithm and only using the linked commit and HEAD. You can increase the precision by taking into account the entire history between those two commits, and you can add some error-tolerance by allowing lines to match w/ some noise. Finally, if there’s the occasional error, I would argue that’s better than the alternatives: linking arbitrary lines (currently) or linking stale results (‘y’).

                I haven’t exhaustively tested this so if anyone finds an interesting case where this fails, let me know.

                1. 1

                  Nice demo! It was cool to see it run. I still want to make a case though for preferring “linking stale results” as you put it to either the current or your proposed option. I’ll provide two justifications for my position:

                  a) You’re right that you can easily do as well as Github’s existing diff results. But there’s no reason to think that may be good enough. If I move a function twice in its history it’s not really acceptable for early links to inside it to forever error out with a message that it was deleted. Showing a function to have been deleted in one set of lines and added back in another is fine for a diff presentation, but not a good fit for taking lines forward.

                  b) Even in situations where you can take lines forward correctly, as @tome pointed out there’s perfectly valid use cases where you want to show a link as it was at the time you wrote it. In fact, in my experience I rarely encounter situations where I’m referring to a set of lines of code without wanting to talk about what’s there. In that situation explanations get very confusing if the code changes from under you. Even if you don’t agree with me that that’s the primary use case I hope it’s reasonable that it is a use case. So perhaps we need two different sets of links, one for the specific version and one with “smart take-forward”. But then that gets confusing. If I had to choose I’d rather create just permalinks that don’t try to be smart, just link to the code I want to link to. Going from a specific version of a file to head is pretty easy, so checking what the code looks like right now should, I argue, remain a manual process. That would be robust to the errors I described above, and it also gives me control to pick the precise tag or branch that I want to compare with (trunk vs release 1 vs release 2, etc.)

                  1. 1

                    If I move a function twice in its history it’s not really acceptable for early links to inside it to forever error out with a message that it was deleted.

                    Solvable by taking into account the entire history rather than just the linked commit and HEAD, and in my first message I acknowledged that we can just display the commit linked to if the line was deleted. You lose nothing over ‘y’

                    Even in situations where you can take lines forward correctly, as @tome pointed out there’s perfectly valid use cases where you want to show a link as it was at the time you wrote it.

                    I don’t remember advocating for GitHub to remove the ‘y’ functionality, I just don’t think it’s the best default. It’s certainly not what I want most of the time when I link people to a particular line. Outside of a code review the historical perspective doesn’t matter that much to me

                    So perhaps we need two different sets of links, one for the specific version and one with “smart take-forward”.

                    This is already the case, except replace “smart take-forward” with “arbitrary nonsense”. All of GitHub’s links are relative to HEAD unless you explicitly ask

                    1. 2

                      We can agree that the current default is “arbitrary nonsense”. Beyond that we’ll have to agree to disagree about what the best default is.

                      Falling back to ‘y’ on deletes is reasonable, yes. But here’s another example to show that this is a hard problem with tons of rare use cases that a normal codebase is going to run afoul of at some point (causing all older links to break each time). Say we have an intervening commit that replaces:

                      f(a, b, c);
                      

                      with

                      f(a, b, c, false);
                      f(a, b, c, true);
                      

                      What would you want your tool to do? I would just like to see the former. I would very much want my tools to not mislead me about what somebody meant when they sent me a link to the former.

                      Hmm, perhaps the best of both worlds would be to show the lines highlighted in the specific commit with a call-out link up top to perform the smart carry forward. I don’t object to smarts, but unless they’re utterly reliable I don’t want to be forcibly opted in to them.

                      1. 1

                        What would you want your tool to do? I would just like to see the former.

                        This indeed gets fuzzier if you allow lines to match with noise, but without that you would get what you want, because that line no longer exists in the source, so we fall back to the commit you linked at.

                        Hmm, perhaps the best of both worlds would be to show the lines highlighted in the specific commit with a call-out link up top to perform the smart carry forward. I don’t object to smarts, but unless they’re utterly reliable I don’t want to be forcibly opted in to them.

                        Probably a good compromise, I didn’t expect so much push-back on this idea. I don’t understand what you mean by “forcibly opted in” though. Do you think you are currently being “forcibly opted in” in the current relative-to-HEAD linking scheme? When you link other sites than GitHub, are you miffed that your browser doesn’t automatically replace your link with archive.org? I think relative-to-HEAD is a useful convenience and my suggestion is in the vein of making it better, not ruining all of your specific-commit links.

                        1. 1

                          My mental model is that when I make simple actions like taking a step or clicking on a link, I expect the response to them to be simple. If I see a link to a file on github I expect it to go to HEAD. If I try to go to a link and it doesn’t exist I want to know that so that I can decide whether I want to go to archive.org. I would be miffed if I was automatically taken to archive.org. I wouldn’t be miffed if that happened because I chose to install an extension or turn on some setting to automatically be taken to archive.org.

                          As a second example, the autocomplete on my phone has lately gotten smart enough to enter some uncanny valley where it’s often doing bullshit like replacing ‘have’ with ‘gave’. It has also started replacing words two back, so even if I check each word after I type it I still end up sending out crap I didn’t type. This kind of shit gives me hives. I’d rather go back to making typos with an utterly dumb keyboard than put up with crappy ‘smarts’ like this.

                          You’re right that the current default on github is “arbitrary nonsense”, but I’d actually prefer it to your smart carry-forward – unless I could choose when I want to try it at a time of my own choosing.

                          1. 1

                            If I see a link to a file on github I expect it to go to HEAD.

                            Do you then also disagree with auto-‘y’ (auto archive.org), the perspective of the person who started this thread?

                            You’re right that the current default on github is “arbitrary nonsense”, but I’d actually prefer it to your smart carry-forward – unless I could choose when I want to try it at a time of my own choosing.

                            But, arbitrary nonsense doesn’t work, just like your crappy smartphone autocomplete doesn’t work. That’s the whole point of the linked post, and the perspective of virtually everyone I’ve talked to about this issue. You are using simplicity and predictability as measuring sticks here, but the behavior of a GitHub relative-to-HEAD link a year from now is completely unpredictable. Isn’t it worth investigating alternatives that might work, even if they might end up sucking like phone autocomplete? I mean, I do acknowledge that you are saying “I could choose when I want to try it”, but I can’t help but feel like this is a way of ducking engaging with the idea.

                            1. 1

                              Yes, I definitely don’t mean to rain on your parade. My initial comment was just to inject a note of caution and a competing viewpoint. Absolutely, this is all fascinating and worth trying. I think I’m just frustrated with my phone :)

                              I didn’t find OP’s auto-y unpredictable because as the reader I can see that a shared URL refers to a particular tree state. Whereas I interpreted your proposal to be taking a URL to a specific tree state but then showing me something outside of that tree. Is that accurate?

                              Coda: after I first submitted this comment I noticed that my phone had autocompleted ‘auto-y’ to ‘auto-insertion’. Sigh..

                              1. 2

                                Whereas I interpreted your proposal to be taking a URL to a specific tree state but then showing me something outside of that tree. Is that accurate?

                                Technically accurate, but maybe missing intent. The intent of such a URL is to point to HEAD. To make this completely concrete, instead of links like

                                https://github.com/Droogans/.emacs.d/blob/mac/init.el#L135

                                we would have links like

                                https://github.com/Droogans/.emacs.d/blob/mac/init.el#6ecd48/L135 (if GitHub supported this type of URL, clicking this would take you here)

                                We use 6ecd48 here similar to how on the web we use redirects to ensure that URLs always point to the latest content, or in the case where the line was deleted, error 404 + “maybe check here?”

                                I didn’t find OP’s auto-y unpredictable because as the reader I can see that a shared URL refers to a particular tree state.

                                I think I would find this behavior more predictable if navigating to https://github.com/Droogans/.emacs.d/ started a new “session” by redirecting you to https://github.com/Droogans/.emacs.d/tree/aecc15138a407c69b8736325c0d12d1062f1f21b (the current commit HEAD points to). Otherwise, there is an inconsistency between your current browsing state (your URL is always pointing to HEAD) and the links you are sending to others.

                                Coda: after I first submitted this comment I noticed that my phone had autocompleted ‘auto-y’ to ‘auto-insertion’. Sigh..

                                Maybe if I owned a smartphone I would better understand everyone’s objections :)

                                1. 2

                                  Indeed, this comment helps understand our relative positions. I see now that you’re right – what we find ‘predictable’ is subjective. I find OP’s auto-y more predictable partly because it fits my needs more often than a link to HEAD. Your hash/line proposal seems pretty reasonable!

            2. 1

              Git’s storage model and the functions provided in their CLI toolkit/libgit2/whatever GitHub uses are irrelevant. With the history and a modified diff algorithm you can identify lines across commits. Make an index of that data. GitHub already has to maintain a separate index for search, nothing new.

              1. 3

                With the history and a modified diff algorithm you can identify lines across commits.

                What @jtdowney is saying is: No, you can’t. You can almost do it. You can often do it. But the only reliable way to know what code that link was talking about, is to show the original file where the line belongs.

                1. 1

                  All of what you’re saying applies to diff algorithms in general. It’s necessarily a fuzzy, ill-defined task, and yet we have effective algorithms for solving it. Our whole software development industry is powered by diff, if it’s good enough for pull requests, it’s good enough for forwarding links to a current commit. We’re not talking about mission critical software here.

                  edit: also, that is not what the parent comment was saying. Their criticism was rooted in Git’s storage model and tooling, as if I was implying GitHub should shell out to a non-existent “git find-corresponding-line-in-commit” command. If someone wants to argue that what I am suggesting is not feasible it should not be grounded in Git implementation details.

                  1. 1

                    Yes.

                    Good points. I agree that a good-enough algorithm could show us the line in an updated context or detect that it can’t and then fall back to showing us the place where that line number was originally anchored.

                    And yes, the low-level tooling is irrelevant. No matter how you store the data, unless you have a human confirm that yes, this line has the same identity as that line, it’s still just a succession of changesets (as svn stores it), which is just the dual of a succession of trees (as git stores it).

                    Maybe the way bzr and svn handle file copies can help things along, but it’s not an essential piece of information (as git shows us), and there’s no tool out there that explicitly tracks line identities. Bzr does it in the low-level layer, but it’s still algorithmic, not human-verified, so that metadata has no more value than what git can infer after the fact. Basically it’s done to amortize some of the work of the diff/merge and for space efficiency.

                    Thank you. Worse is better. Don’t let the perfect get in the way of the good.

        2. 3

          The internet in general has this problem too which can’t be solved so easily.

          1. 2

            I wish Gerrit could link to ranges of line numbers like github can.

            I linked to lines of code in some documentation the other day, and I made sure to always link to a specific version of a file.