1. 11
  1.  

  2. 3

    This oldie is a goodie; make sure to check out the sequel, Kind of Like That. Does anyone have tips or blog posts on doing something similar with audio instead? Feels a bit harder, given that it’s not static in time unlike images.

    1. 4

      Do you mean audio fingerprinting? Take a look at https://acoustid.org/chromaprint

      Here’s an article about using chromaprints that I recently borrowed code from: https://medium.com/@shivama205/audio-signals-comparison-23e431ed2207

      Or: https://mtg.github.io/essentia-labs/news/2019/09/05/cover-song-similarity/

      1. 3

        My naive guess: maybe you can convert the sound to a spectrogram, and then think of it as an image?

        “Feels a bit harder, given that it’s not static in time unlike images.”

        What do you mean? Audio varies over time (unlike images), but an image varies over position (unlike audio). Maybe with audio the query is a short clip, so you want to find both (1) which song contains it and (2) at which offset? That seems analogous to comparing cropped images.
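
        To make the "which song, at which offset" idea concrete, here is a rough sketch that slides the clip along each song and scores every position by normalized correlation. The names `best_offset` and `identify` and the random toy "songs" are made up for illustration. Matching raw samples like this is brittle; as far as I know, real fingerprinters such as Chromaprint compare spectral features instead.

```python
import math
import random

def best_offset(song, clip):
    """Return (offset, score) of the window in song that best matches clip."""
    clip_norm = math.sqrt(sum(c * c for c in clip))
    best = (0, -1.0)
    if clip_norm == 0:
        return best
    for offset in range(len(song) - len(clip) + 1):
        window = song[offset:offset + len(clip)]
        win_norm = math.sqrt(sum(w * w for w in window))
        if win_norm == 0:
            continue
        # Normalized correlation: 1.0 means identical shape, regardless of volume
        score = sum(w * c for w, c in zip(window, clip)) / (win_norm * clip_norm)
        if score > best[1]:
            best = (offset, score)
    return best

def identify(library, clip):
    """Return (name, offset, score) for the song containing the best match."""
    results = ((name, *best_offset(song, clip)) for name, song in library.items())
    return max(results, key=lambda r: r[2])

# Toy data: two "songs" of seeded random noise, and a quieter excerpt of one of them
rng = random.Random(0)
library = {name: [rng.gauss(0, 1) for _ in range(1000)] for name in ("song_a", "song_b")}
clip = [0.5 * s for s in library["song_b"][300:364]]

name, offset, score = identify(library, clip)
print(name, offset, round(score, 3))
```

        This is the audio version of searching for a cropped patch inside a larger image: the offset plays the role of the crop position.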

        1. 1

          Fair point on images varying along the X and Y axes while audio varies in amplitude over time (also two axes). But I think I was referring to the fact that different, fancier transformations are needed for audio? Like the signal goes up and down really fast instead of being the contour of the sound. I hardly have any knowledge of signals, so I wouldn’t know.

          Comparing cropped images, much like matching a short clip of audio within a song, really increases the difficulty as well. Thanks for the comment though, I think this problem is good food for thought.

          1. 2

            “Like the signal goes up and down really fast instead of being the contour of the sound.”

            That’s a good point! The individual peaks and troughs aren’t very meaningful to us. A spectrogram seems closer to what we perceive as the “contour of the sound”.

            You can use the Fourier transform to break up a signal into its component frequencies. 3blue1brown has some nice explanations of that (https://www.youtube.com/watch?v=spUNpyF58BY). To get frequency-over-time you could chop the signal up into little slices and take the Fourier transform of each slice (https://en.wikipedia.org/wiki/Short-time_Fourier_transform).
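
            The "chop it into slices and transform each slice" idea can be sketched in a few lines. This is only an illustration: it uses a naive O(n²) DFT from the standard library so it runs without dependencies, and the function names, frame length, hop size and test tones are all my own choices. In practice you would reach for an FFT routine such as `numpy.fft.rfft` or `scipy.signal.stft`.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency DFT bins of a real frame (naive O(n^2))."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def spectrogram(signal, frame_len=64, hop=32):
    """Short-time Fourier transform: one row of bin magnitudes per time slice."""
    # A Hann window tapers each slice so its edges do not smear energy across bins
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len) for i in range(frame_len)]
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        rows.append(dft_magnitudes(frame))
    return rows

def peak_frequency(row, sample_rate, frame_len=64):
    """Frequency in Hz of the strongest bin in one spectrogram row."""
    k = max(range(len(row)), key=row.__getitem__)
    return k * sample_rate / frame_len

def tone(freq, n, sample_rate):
    """n samples of a sine wave at freq Hz."""
    return [math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]

# Toy signal: a 64 Hz tone followed by a 256 Hz tone, sampled at 1024 Hz
sample_rate = 1024
signal = tone(64, 512, sample_rate) + tone(256, 512, sample_rate)

spec = spectrogram(signal)
print(peak_frequency(spec[0], sample_rate), peak_frequency(spec[-1], sample_rate))
```

            Each row of `spec` is one time slice and each column one frequency bin, so the result is a 2D grid you could treat like a grayscale image. The fast up-and-down wiggles of the raw signal disappear, and what is left shows which frequencies are present at each moment.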