      Actually, does anyone know of a tool like this that's designed to be applied to source code? I have an interesting use case or two for asking “how similar are these programs?”, as opposed to checksums, which ask “are these programs identical?”

      Since I’m dealing with source code documents, I assume doc2vec or other general-purpose ML-y text analysis will eventually produce something useful, but I wondered if there's anything more specialized.

        You’d want to be operating on the AST, I think, rather than on the textual representation.
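
        A rough sketch of what I mean, assuming the programs are Python and that comparing sequences of AST node types is a good enough proxy for "similar" (both are my assumptions, not an established tool). The helper names `node_types` and `ast_similarity` are made up for illustration; only the standard library `ast` and `difflib` modules are used.

```python
import ast
from difflib import SequenceMatcher


def node_types(source: str) -> list[str]:
    """Parse source and return AST node type names in pre-order."""
    types = []
    stack = [ast.parse(source)]
    while stack:
        node = stack.pop()
        types.append(type(node).__name__)
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(list(ast.iter_child_nodes(node))))
    return types


def ast_similarity(source_a: str, source_b: str) -> float:
    """Return a ratio in [0, 1] comparing the two node-type sequences."""
    matcher = SequenceMatcher(
        None, node_types(source_a), node_types(source_b), autojunk=False
    )
    return matcher.ratio()


if __name__ == "__main__":
    a = "def f(x):\n    return x + 1\n"
    b = "def total(value):\n    return value + 1\n"
    c = "for i in range(3):\n    print(i)\n"
    print(ast_similarity(a, b))  # same structure, different names
    print(ast_similarity(a, c))  # different structure
```

        Because only node types are compared, renaming identifiers, reformatting, or changing comments doesn't move the score, which is the main thing you get over diffing the text. The flip side is that it also ignores literal values and names entirely, so two structurally identical programs that do different things will look the same.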