Has anybody compared it with the original Python implementation? Does it run faster? How much more work would it need before it could fine-tune or train the model?
Transcribing a 16 kHz WAV converted from a 43:10.46 podcast using medium.en:
whisper.cpp, macOS 13.1 Beta, M1 with Accelerate, battery power: ~800s
whisper, Windows 10, 3080 using CUDA: ~1200s
whisper.cpp gets about 2:00 of output before whisper outputs anything and then slowly widens the gap until it finishes at 43:10 when whisper is on 31:13. No idea why whisper is being that slow; occasionally my Windows box just cannot be bothered to work above a snail’s pace.
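To put those two wall-clock numbers in perspective, here is a rough sketch that turns them into realtime factors (audio length divided by transcription time). The helper names `to_seconds` and `realtime_factor` are made up for this illustration, and the inputs are just the approximate timings quoted above, so treat the result as ballpark only.

```python
def to_seconds(timestamp: str) -> float:
    """Convert 'MM:SS(.ff)' or 'HH:MM:SS(.ff)' into seconds."""
    seconds = 0.0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def realtime_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio transcribed per second of wall-clock time."""
    return audio_seconds / wall_seconds


audio = to_seconds("43:10.46")  # podcast length from the post

# Approximate wall-clock times reported above, in seconds.
runs = {
    "whisper.cpp (M1, Accelerate)": 800.0,
    "whisper (3080, CUDA)": 1200.0,
}

for name, wall in runs.items():
    print(f"{name}: {realtime_factor(audio, wall):.2f}x realtime")

# Relative speedup of the faster run over the slower one.
speedup = runs["whisper (3080, CUDA)"] / runs["whisper.cpp (M1, Accelerate)"]
print(f"whisper.cpp was about {speedup:.1f}x faster in this one test")
```

With these inputs that works out to roughly 3.2x realtime for whisper.cpp and 2.2x for whisper, i.e. about a 1.5x gap on this single, unscientific run.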
As per the author:
The implementation runs fully on the CPU and utilizes FP16, AVX intrinsics on x86 architectures and NEON + Accelerate framework on Apple Silicon. The latter is especially efficient and I observe that the inference is about 2-3 times faster compared to the current PyTorch implementation provided by OpenAI when running it on my MacBook M1 Pro.
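For anyone wondering what the FP16 part buys, here is a small sketch of the general idea: half-precision values take 2 bytes instead of 4, at the cost of some rounding. This is not code from whisper.cpp; it only uses the IEEE 754 half-precision format from Python's standard `struct` module, and the weight values are invented for the example.

```python
import struct


def fp16_roundtrip(x: float) -> float:
    """Store x as an IEEE 754 half-precision float and read it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Made-up "weights", purely for illustration.
weights = [0.1, -1.5, 3.14159, 1e-3]

# Same values packed as 32-bit floats vs 16-bit floats.
fp32_bytes = len(struct.pack(f"<{len(weights)}f", *weights))
fp16_bytes = len(struct.pack(f"<{len(weights)}e", *weights))
print(f"fp32: {fp32_bytes} bytes, fp16: {fp16_bytes} bytes")

# Rounding error introduced by the narrower format.
for w in weights:
    r = fp16_roundtrip(w)
    rel_err = abs(r - w) / abs(w)
    print(f"{w!r} -> {r!r} (relative error {rel_err:.1e})")
```

Halving the storage also halves the memory traffic when reading weights, which is one plausible reason FP16 helps on a CPU; the SIMD side (AVX, NEON) is not shown here.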
I have some minor issues with how the Makefile is written, or rather I’m missing some typical targets, which makes it a bit harder to package for NixOS, but overall it is quite easy to use. Maybe I’ll do a PR for that when I have a bit more free time on my hands. I still need to look into it a bit deeper, since I wanted to build a super basic voice recognition assistant with this and I need a different weighting for some key phrases. Let’s see how that one goes.