I’m working on a side project to identify audio clips that have been (poorly) pieced together. A short clip may actually be the result of splicing together excerpts from one or more of a series of larger works of approximately the same audio quality. These are voice clips from podcasts. The goal is to detect alterations that significantly change what the speaker said, e.g. by leaving out context as large as part of a sentence or as small as the single word “not”.
Ideally, I’d like to split the smaller clip at the silent moments (actual silence according to the waveforms in Audacity) and then identify where in the other clips these small sections came from.
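A minimal sketch of that idea in Python with NumPy, assuming both the short clip and the candidate source are already loaded as mono float arrays at the same sample rate. The function names (`split_on_silence`, `locate`) and the threshold values are hypothetical placeholders, not taken from any existing tool, and the approach (amplitude gating followed by normalized cross-correlation) is just one way this could be done.

```python
import numpy as np

def split_on_silence(samples, threshold=1e-4, min_silence=800):
    """Return (start, end) sample index pairs for the non-silent runs.

    A gap only counts as silence if it is longer than min_silence samples.
    Both defaults are guesses and would need tuning on real audio.
    """
    idx = np.flatnonzero(np.abs(samples) > threshold)
    if idx.size == 0:
        return []
    # A new segment begins wherever two loud samples are far apart.
    breaks = np.flatnonzero(np.diff(idx) > min_silence)
    starts = np.concatenate(([idx[0]], idx[breaks + 1]))
    ends = np.concatenate((idx[breaks], [idx[-1]])) + 1
    return list(zip(starts.tolist(), ends.tolist()))

def locate(segment, source):
    """Return (offset, score) for the best match of segment inside source.

    offset is a sample index into source; score is the normalized
    cross-correlation at that offset (1.0 means an exact match up to gain).
    """
    n = len(segment)
    if n == 0 or n > len(source):
        raise ValueError("segment must be non-empty and no longer than source")
    # Cross-correlate via FFT, zero-padded so nothing wraps around.
    nfft = 1 << (len(source) + n - 2).bit_length()
    spectrum = np.fft.rfft(source, nfft) * np.conj(np.fft.rfft(segment, nfft))
    corr = np.fft.irfft(spectrum, nfft)[: len(source) - n + 1]
    # Energy of every length-n window of source, from a running sum.
    cs = np.concatenate(([0.0], np.cumsum(np.square(source, dtype=float))))
    window_energy = cs[n:] - cs[:-n]
    seg_energy = float(np.dot(segment, segment))
    denom = np.sqrt(np.maximum(window_energy * seg_energy, 1e-12))
    score = corr / denom
    best = int(np.argmax(score))
    return best, float(score[best])
```

Dividing the returned offset by the sample rate gives a timestamp in the source. An exact-sample match like this is only likely to score near 1.0 when the clip was cut from the same encoding as the source; re-encoded audio would probably need a lower acceptance score or a more robust fingerprint.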
I’ve found a host of tools that are aimed at music: AcoustID, echoprint, etc. The tooling I’ve found seems to check whether a needle is present in the haystack but doesn’t expose where in the source it came from; music identification doesn’t need the precision I’m after.
I’d appreciate pointers to software that might be able to just do it, or advice on how to implement something like this. This is my first time doing anything with audio programmatically, so I’m out of my element but looking to learn.