1. 2
  1. 2

    Has anybody compared it with the original Python implementation? Does it run faster? How much more effort would it take to fine-tune or train the model?

    1. 1

      Transcribing a 16 kHz WAV converted from a 43:10.46 podcast, using medium.en.

      whisper.cpp, macOS 13.1 Beta, M1 with Accelerate, battery power: ~800 s

      whisper (PyTorch), Windows 10, 3080 using CUDA: ~1200 s

      whisper.cpp produces about 2:00 of output before whisper outputs anything, and the gap slowly widens until whisper.cpp finishes at 43:10 while whisper is only at 31:13. No idea why whisper is that slow; occasionally my Windows box just cannot be bothered to work above a snail’s pace.
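
      For reference, here is a minimal sketch of the PyTorch side of such a comparison, using the openai-whisper package; the file name and the exact timing logic are my own placeholders, not taken from the benchmark above:

          import time
          import whisper  # pip install openai-whisper

          # Load the medium English-only model (downloads weights on first use).
          model = whisper.load_model("medium.en")

          start = time.time()
          # "podcast_16k.wav" is a placeholder for the 16 kHz mono WAV.
          result = model.transcribe("podcast_16k.wav")
          print(f"Transcribed in {time.time() - start:.0f} s")
          print(result["text"])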

      1. 1

        As per the author:

        The implementation runs fully on the CPU and utilizes FP16, AVX intrinsics on x86 architectures and NEON + Accelerate framework on Apple Silicon. The latter is especially efficient and I observe that the inference is about 2-3 times faster compared to the current PyTorch implementation provided by OpenAI when running it on my MacBook M1 Pro.

      2. 2

        I have some minor issues with how the Makefile is written, or rather I’m missing some typical targets, which makes it a bit harder to package for NixOS, but overall it is quite easy to use. Maybe I’ll do a PR for that when I have a bit more free time on my hands. I still need to look into it a bit deeper, since I want to build a super basic voice-recognition assistant with this and I need different weighting for some key phrases. Let’s see how that one goes.
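
        This isn’t the per-phrase weighting I mean, but for comparison, the Python implementation lets you bias decoding toward expected vocabulary via initial_prompt; a rough sketch (the phrase list and file name are made up):

            import whisper  # pip install openai-whisper

            model = whisper.load_model("medium.en")

            # Hypothetical key phrases for the assistant. initial_prompt only
            # nudges the decoder toward this vocabulary; it is not a true
            # per-phrase weighting.
            key_phrases = "hey assistant, turn on the lights, what's the weather"

            result = model.transcribe("command_16k.wav", initial_prompt=key_phrases)
            print(result["text"])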