It looks like the output isn’t actually a unified diff that could be used as a patch? That seems like a missed opportunity.
Patches operate on text; this tool shows semantic differences, which are not the same thing as lexical differences.
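To make the semantic/lexical distinction concrete, here is a minimal Python sketch using the standard library's `ast` module. It is only an illustration, not how this tool works: two snippets that differ in spacing, quote style and a comment compare unequal as text but parse to the same AST. The function name `same_semantics` is hypothetical, and the sources are only parsed, never executed.

```python
import ast


def same_semantics(a: str, b: str) -> bool:
    """Return True if two Python sources parse to identical ASTs.

    ast.dump() leaves out line/column attributes by default, so whitespace,
    layout and comments do not affect the comparison.
    """
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))


before = "x = foo(1,2, 'a')\n"
after = 'x = foo( 1, 2, "a" )  # same call, different layout\n'

print(before == after)                # lexically different: False
print(same_semantics(before, after))  # semantically identical: True
```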
The output of this tool is still text, though. You could produce a patch that only contained the lines with a semantic difference, which would be a huge win.
Wouldn’t you need to pick a target format? If one file is formatted with black and the other to satisfy flake8, which one do you want? Shouldn’t the first (or last) step be to run both the initial and the final file through a formatter? Then you run this tool to see whether there are differences, and it prints both in a “good” format. Or maybe you run it first and then pipe both snippets through a formatter to get them into a “good” format?
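A rough sketch of the "normalize both files, then diff" idea, which would also give patch-shaped output. It assumes Python 3.9+ (for `ast.unparse`) and uses that as a stand-in for a real formatter such as black; it is not this tool's method. Formatting-only edits drop out, but note that `ast.unparse` discards comments and the resulting patch applies to the normalized text, not the original files. `normalize` and `semantic_unified_diff` are hypothetical names.

```python
import ast
import difflib


def normalize(source: str) -> list[str]:
    """Re-emit source in one canonical layout via ast.unparse (Python 3.9+).

    Caveat: comments are lost, because they are not part of the AST.
    """
    # ast.unparse omits the final newline; add it so every diff line ends cleanly.
    return (ast.unparse(ast.parse(source)) + "\n").splitlines(keepends=True)


def semantic_unified_diff(old: str, new: str, name: str = "file.py") -> str:
    """Unified diff of the normalized sources; layout-only changes vanish."""
    return "".join(
        difflib.unified_diff(
            normalize(old),
            normalize(new),
            fromfile=f"a/{name}",
            tofile=f"b/{name}",
        )
    )


old = "x = foo(1,2)\ny = 3\n"
new = "x = foo( 1, 2 )\ny = 4\n"
print(semantic_unified_diff(old, new))
```

Here the respacing of the `foo` call shows up only as a context line; the single hunk changes just `y = 3` to `y = 4`.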
Think of working with code from two people whose style doesn’t match your preferred one, or of someone submitting patches that don’t match your style. In the first case you don’t want a patch to push the formatting in either direction, and in the second case you definitely don’t want it to push the formatting toward the submitted (post-patch) style.
I had a student many years ago who worked on something similar (for Python and C) as a first stage, but then extended it to try to infer refactorings. I don’t find the output from this tool much more readable than the input. His tool, by contrast, would tell you things like ‘function X renamed to Y and all callers updated’ or ‘function Y inlined into X and Z’. It was most useful when it said things like ‘variable X renamed to Y and all uses updated except in function Z in foo.c line 12’, because that’s very easy to miss in code review (and if X was renamed because it shadowed another variable, you may not get a compile failure). I’d love to see a maintained implementation of these ideas, and this looks like it might be a good building block.
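The rename-inference idea can be sketched on top of an AST comparison. What follows is a hypothetical, much-simplified Python illustration, not the student's tool (whose implementation isn't described here): a top-level function that disappears while another with an identical signature and body appears is reported as a rename, and any leftover reference to the old name inside a top-level function is flagged, in the spirit of ‘all uses updated except in function Z’. It only looks at top-level functions and plain-name references, and it would miss a recursive function, since its body mentions its own name.

```python
import ast


def _function_fingerprints(tree: ast.Module) -> dict[str, str]:
    """Map each top-level function name to a dump of its arguments and body.

    The name itself is deliberately left out, so a pure rename keeps
    the same fingerprint.
    """
    fingerprints = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            fingerprints[node.name] = ast.dump(node.args) + "".join(
                ast.dump(stmt) for stmt in node.body
            )
    return fingerprints


def infer_renames(old: str, new: str) -> list[str]:
    """Hypothetical sketch: report likely function renames and missed uses."""
    new_tree = ast.parse(new)
    old_fns = _function_fingerprints(ast.parse(old))
    new_fns = _function_fingerprints(new_tree)
    removed = {name: fp for name, fp in old_fns.items() if name not in new_fns}
    added = {name: fp for name, fp in new_fns.items() if name not in old_fns}

    report = []
    for old_name, old_fp in removed.items():
        for new_name, new_fp in added.items():
            if old_fp != new_fp:
                continue
            report.append(f"function {old_name} renamed to {new_name}")
            # Flag references to the old name that survive inside any
            # top-level function of the new version.
            for fn in new_tree.body:
                if not isinstance(fn, ast.FunctionDef):
                    continue
                for node in ast.walk(fn):
                    if isinstance(node, ast.Name) and node.id == old_name:
                        report.append(
                            f"  use of {old_name} not updated in function "
                            f"{fn.name}, line {node.lineno}"
                        )
    return report


old = "def f(x):\n    return x + 1\n\ndef g():\n    return f(1)\n\ndef h():\n    return f(2)\n"
new = "def inc(x):\n    return x + 1\n\ndef g():\n    return inc(1)\n\ndef h():\n    return f(2)\n"
for line in infer_renames(old, new):
    print(line)
```

On this input it reports that `f` was renamed to `inc` and that the call in `h` was missed, which is exactly the kind of leftover that slips through review.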
Similar: https://github.com/wilfred/difftastic (WIP). Parsing inspired by Comby.
It’s supported by Tree-sitter, so I’m guessing the exclusion is deliberate.