1. 18

  2. 4

    It’s always been slightly weird to me that all these patch algorithms seem to operate at the level of individual lines of text, yet programs are not semantically broken up into lines.

    Fortunately for VCS systems, programmers tend to write programs so that semantically separate chunks are on successive lines, so these merge algorithms do the right thing most of the time. But wouldn’t an algorithm that was capable of breaking patches on lexical boundaries for the programming language in question give better results?
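
    To make the idea concrete, here is a rough illustrative sketch (not from the article) of how a diff over lexical tokens differs from a line diff, using Python's difflib and a deliberately crude regex lexer:

```python
import difflib
import re

def lex(src):
    # Crude, language-agnostic lexer for illustration only:
    # runs of word characters, or any single non-space symbol.
    return re.findall(r"\w+|[^\w\s]", src)

old = "total = price + tax"
new = "total = price + tax + shipping"

# A line-level diff can only report that the whole line changed...
line_changed = old != new

# ...while a token-level diff pinpoints the inserted tokens.
matcher = difflib.SequenceMatcher(None, lex(old), lex(new))
token_changes = [(tag, lex(new)[j1:j2])
                 for tag, i1, i2, j1, j2 in matcher.get_opcodes()
                 if tag != "equal"]
print(token_changes)  # only change: the inserted "+ shipping" tokens
```

    A merge tool working at this granularity could combine two edits to the same line, where a line-based tool would report a conflict.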

    1. 2

      AFAIK even semantic merging of XML documents is still a hard problem being researched in academia. Perhaps with some more advanced AI, we’ll eventually get there…

      1. 2

        Yes, but it would no longer be a general diff tool; you now need to know which lexer to use. E.g. Git supports choosing a diff driver based on file extension.

        Given that different versions of the same language can have different rules, wouldn’t you soon face a difficult task maintaining a list of known diffing tools?
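
        For reference, the Git mechanism alluded to above is the diff driver: a pattern in `.gitattributes` selects a driver per path, which is exactly the kind of list that needs maintaining:

```
# .gitattributes -- pick a diff driver by file pattern
*.py  diff=python
*.tex diff=tex
```

        As far as I know, the built-in drivers only affect presentation (hunk headers, word-diff tokenization), not merge behaviour.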

        1. 2

          Sure, but at the same time we’re currently in a situation where whether a merge is correct can depend on the indentation choice the programmer made. That’s not exactly an ideal situation either!

        2. 1

          If your comment means “why don’t the tools know about the structure of the language”, the problem is that languages change. Syntactic constructs get added and removed, so the parser for today’s code may not parse that same code a year down the line. Thus baking language-specific knowledge into the tool also means baking in a lot more information, and potentially dealing with diffs between two different schemas of ASTs, rather than just two different ASTs of the same schema.

        3. 2

          I would love to see a merge(1) style tool that uses this algorithm. This could then be plugged into any VCS.

          “any real VCS needs to deal with a lot of tedious details (directories, binary files, file renaming, etc.)”

          I would not call them “tedious details”. These are actually the “hard problems” you’re left with once textual merging is working.
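
          For a sense of the baseline such a tool would replace, here is a naive diff3-style three-way merge sketched in Python (illustrative only, not the paper’s algorithm): it merges changes when they are separated by lines both sides left untouched, and emits conflict markers otherwise.

```python
import difflib

def _matches(base, other):
    # Map each base line index to its matching line index in `other`,
    # for the lines the two-way diff considers unchanged.
    m = {}
    sm = difflib.SequenceMatcher(None, base, other)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            for k in range(i2 - i1):
                m[i1 + k] = j1 + k
    return m

def three_way_merge(base, ours, theirs):
    """Naive diff3-style merge; returns (merged_lines, had_conflict)."""
    ma, mb = _matches(base, ours), _matches(base, theirs)
    out, conflict = [], False
    i = o = t = 0  # cursors into base, ours, theirs
    n = len(base)
    while True:
        # Find the next "stable" base line: unchanged on both sides.
        j = i
        while j < n and not (j in ma and j in mb):
            j += 1
        oj = ma[j] if j < n else len(ours)
        tj = mb[j] if j < n else len(theirs)
        b, o_chunk, t_chunk = base[i:j], ours[o:oj], theirs[t:tj]
        if o_chunk == b:            # only theirs changed this chunk
            out.extend(t_chunk)
        elif t_chunk == b:          # only ours changed it
            out.extend(o_chunk)
        elif o_chunk == t_chunk:    # both made the identical change
            out.extend(o_chunk)
        else:                       # different changes to the same chunk
            conflict = True
            out.append("<<<<<<< ours")
            out.extend(o_chunk)
            out.append("=======")
            out.extend(t_chunk)
            out.append(">>>>>>> theirs")
        if j == n:
            return out, conflict
        out.append(base[j])         # emit the stable line itself
        i, o, t = j + 1, oj + 1, tj + 1
```

          Edits at opposite ends of a file merge cleanly; overlapping (or merely adjacent) edits produce a conflict, because the algorithm needs a stable line between chunks.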

          1. 1

            “I would love to see a merge(1) style tool that uses this algorithm.”

            I don’t think that’s possible without the patch history in a darcs or pijul repository.

            1. 1

              Ah, OK. That makes it more complicated than I thought it would be.

              1. 1

                You could probably reconstruct it from a git change history, but it would be very expensive & any cherry-picking or merging prior to the merge you’re trying to do will introduce ambiguity.

            2. 1

              I think the point of the paper / blog posts is that git / svn / mercurial / diff3 / meld style textual merging isn’t working!

              You can solve things like file renaming with a similar approach to the patch tracking that pijul already does. Binary files are just a pain regardless :)

              1. 1

                Yes, another solution to versioned file renaming (with branching and merging) would be very welcome.

                In SVN’s design it cannot really be solved; it can only be worked around.

                Git punts on the entire problem by instead guessing where content has gone to (which works most of the time but not always).
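
                Git’s guess is a content-similarity heuristic; a toy version (hypothetical code, not Git’s actual implementation) of what `git diff -M` does might look like:

```python
import difflib

def detect_renames(deleted, added, threshold=0.6):
    # `deleted` and `added` map path -> file content (hypothetical inputs).
    # Pair each deleted path with the most similar added path, if any
    # candidate is similar enough.
    renames = []
    for old_path, old_text in deleted.items():
        scored = [(difflib.SequenceMatcher(None, old_text, new_text).ratio(),
                   new_path)
                  for new_path, new_text in added.items()]
        if not scored:
            continue
        score, new_path = max(scored)
        if score >= threshold:
            renames.append((old_path, new_path))
    return renames
```

                When the file was also heavily edited in the same commit, the similarity drops below the threshold and the rename is simply missed, which is the “not always” part.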

                Some systems (e.g. ClearCase) assign IDs to objects and then keep track of which ID sits at which path. That does not scale to distributed systems, and the edge cases remain terribly complicated, because moving things between directories allows for essentially arbitrary differences between the two trees; just like with text, merging arbitrary tree changes 100% correctly all the time is hard.

                My BSc thesis dealt with this subject and barely scratched the surface.