Whisper is a great tool for transcribing audio, but it has some drawbacks. Namely, the large model is too big to fit into the video RAM of an ordinary consumer GPU, and inference is painfully slow on plain CPUs.
This is where quantization comes into the picture. In my previous article I already covered the installation of whisper-ctranslate2, which offloads the processing to the GPU using a quantized model. Now I will cover how a CPU or a non-Nvidia GPU can be utilized with the whisper.cpp framework.
Preparing the environment
I assume you already have git, curl and Anaconda installed; if not, there are great resources explaining those on the Internet.
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
"C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvars64.bat"
Building for CPU
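If you want to be sure the vcvars64 step above worked, a quick sanity check before running CMake is to confirm the tools are now on the PATH:
where cl
cmake --version
msbuild -version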
cmake . --fresh
msbuild ALL_BUILD.vcxproj /p:Configuration=Release
copy bin\Release\main.exe ..\whisper_cpp.exe
copy bin\Release\whisper.dll ..
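As a quick smoke test of the fresh build, you can run the binary from the directory you just copied it to and ask for its help text (the full option list is reproduced later in this article):
cd ..
whisper_cpp.exe --help
cd whisper.cpp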
Building with OpenBLAS
Download OpenBLAS from https://github.com/xianyi/OpenBLAS/releases and extract the release into a folder in your source path (OpenBLAS-0.3.23-x64 at the time of writing).
set OPENBLAS_PATH=OpenBLAS-0.3.23-x64
cmake . --fresh -DWHISPER_OPENBLAS=ON -DBLAS_LIBRARIES=OpenBLAS-0.3.23-x64\lib\libopenblas.lib
msbuild ALL_BUILD.vcxproj /p:Configuration=Release
copy OpenBLAS-0.3.23-x64\bin\libopenblas.dll ..\libopenblas.exp.dll
copy bin\Release\whisper.dll ..
copy bin\Release\main.exe ..\whisper_cpp.exe
copy bin\Release\quantize.exe ..
Building with CLBlast
You will need the OpenCL libraries for your architecture. I am targeting the Intel Iris Xe GPU built into the i7, and the same build should also support the Arc family. For Intel you can download the SDK from here. Once downloaded, extract and install it to the recommended location.
Download CLBlast from https://github.com/CNugteren/CLBlast/releases and extract the release directory into your source path (1.6.1 at the time of writing).
Edit the CLBlast-1.6.1-windows-x64\lib\cmake\CLBlast\CLBlastConfig.cmake file and, in the INTERFACE_INCLUDE_DIRECTORIES line, change the ;C:/vcpkg/packages/opencl_x64-windows/include part to the path you installed the SDK into (C:\Program Files (x86)\IntelSWTools\system_studio_2020\OpenCL\sdk\include):
INTERFACE_INCLUDE_DIRECTORIES "${_IMPORT_PREFIX}/include;C:/Program Files (x86)/IntelSWTools/system_studio_2020/OpenCL/sdk/include/"
Edit the CLBlast-1.6.1-windows-x64\lib\cmake\CLBlast\CLBlastConfig-release.cmake file and change the IMPORTED_LINK_INTERFACE_LIBRARIES_RELEASE line to point to C:/Program Files (x86)/IntelSWTools/system_studio_2020/OpenCL/sdk/lib/x64/OpenCL.lib.
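For reference, the edited line should end up looking roughly like this; the exact layout of the surrounding set_target_properties block may differ between CLBlast releases, so treat it as a sketch:
IMPORTED_LINK_INTERFACE_LIBRARIES_RELEASE "C:/Program Files (x86)/IntelSWTools/system_studio_2020/OpenCL/sdk/lib/x64/OpenCL.lib"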
set CLBlast_DIR=CLBlast-1.6.1-windows-x64\lib\cmake\CLBlast
cmake . --fresh -DWHISPER_CLBLAST=ON
msbuild ALL_BUILD.vcxproj /p:Configuration=Release
copy CLBlast-1.6.1-windows-x64\lib\clblast.dll ..
copy bin\Release\whisper.dll ..
copy bin\Release\main.exe ..\whisper_cpp.exe
copy bin\Release\quantize.exe ..
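One note for machines with more than one OpenCL platform (for example an Intel iGPU next to a discrete GPU): the ggml OpenCL backend can pick the wrong device. As far as I know it honours the GGML_OPENCL_PLATFORM and GGML_OPENCL_DEVICE environment variables, so you can steer it before running:
set GGML_OPENCL_PLATFORM=Intel
set GGML_OPENCL_DEVICE=0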
I fixed an issue with the OpenCL kernel (a Windows line-ending problem), and the fix is already included upstream.
Building for NVIDIA
To use MSVC to build with CUDA support, you need to install MS Visual Studio as well as the cuda-toolkit on your computer (not just within Conda).
Download the toolkit from https://developer.nvidia.com/cuda-toolkit-archive.
All you really need is the CUDA\Runtime\Libraries, CUDA\Development and CUDA\Visual Studio Integration components, so you can select a custom install and untick everything else.
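Before configuring, it is worth confirming that both the toolkit and the driver are visible (nvcc ships with the toolkit, nvidia-smi with the driver):
nvcc --version
nvidia-smi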
cmake . --fresh -DWHISPER_CUBLAS=ON
msbuild ALL_BUILD.vcxproj /p:Configuration=Release
copy bin\Release\whisper.dll ..
copy bin\Release\main.exe ..\whisper_cpp.exe
Preparing your model
Switch to your whisper directory and create a directory to hold your models:
cd ..
md models
Download your favourite model using the following commands (if a previous attempt left a partial file behind, delete it first, to make sure you download the entire model). As the download script in whisper.cpp/models uses the extremely slow PowerShell method to fetch the file, you may use curl as follows if you are in a hurry.
set model=large
curl -L https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-%model%.bin -o models\ggml-%model%.bin
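If you want several model sizes in one go, a small cmd loop over the names published in the ggerganov/whisper.cpp Hugging Face repository works as well (double the percent signs, %%m, if you put this into a batch file):
for %m in (base small medium large) do curl -L https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-%m.bin -o models\ggml-%m.bin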
Quantize the model to make it smaller; you can use q4_0, q4_1, q5_0, q5_1 or q8_0 as the quantization type.
quantize.exe models\ggml-large.bin models\ggml-large-q4_0.bin q4_0
Run on existing sound files
whisper.cpp expects a 16 kHz, mono, 16-bit PCM WAV file, so use ffmpeg to convert your original first:
ffmpeg -i INPUT.MP3 -ar 16000 -ac 1 -c:a pcm_s16le OUTPUT.WAV
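If you have a whole folder of recordings, a cmd loop converts them in one go; this assumes .mp3 inputs, so adjust the pattern for other formats (%~nf expands to the file name without its extension; use %%f and %%~nf inside a batch file):
for %f in (*.mp3) do ffmpeg -i "%f" -ar 16000 -ac 1 -c:a pcm_s16le "%~nf.wav"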
Transcribing your files
whisper_cpp.exe --help

usage: whisper_cpp.exe [options] file0.wav file1.wav ...

options:
  -h,        --help              [default] show this help message and exit
  -t N,      --threads N         [4      ] number of threads to use during computation
  -p N,      --processors N      [1      ] number of processors to use during computation
  -ot N,     --offset-t N        [0      ] time offset in milliseconds
  -on N,     --offset-n N        [0      ] segment index offset
  -d  N,     --duration N        [0      ] duration of audio to process in milliseconds
  -mc N,     --max-context N     [-1     ] maximum number of text context tokens to store
  -ml N,     --max-len N         [0      ] maximum segment length in characters
  -sow,      --split-on-word     [false  ] split on word rather than on token
  -bo N,     --best-of N         [2      ] number of best candidates to keep
  -bs N,     --beam-size N       [-1     ] beam size for beam search
  -wt N,     --word-thold N      [0.01   ] word timestamp probability threshold
  -et N,     --entropy-thold N   [2.40   ] entropy threshold for decoder fail
  -lpt N,    --logprob-thold N   [-1.00  ] log probability threshold for decoder fail
  -su,       --speed-up          [false  ] speed up audio by x2 (reduced accuracy)
  -tr,       --translate         [false  ] translate from source language to english
  -di,       --diarize           [false  ] stereo audio diarization
  -tdrz,     --tinydiarize       [false  ] enable tinydiarize (requires a tdrz model)
  -nf,       --no-fallback       [false  ] do not use temperature fallback while decoding
  -otxt,     --output-txt        [false  ] output result in a text file
  -ovtt,     --output-vtt        [false  ] output result in a vtt file
  -osrt,     --output-srt        [false  ] output result in a srt file
  -olrc,     --output-lrc        [false  ] output result in a lrc file
  -owts,     --output-words      [false  ] output script for generating karaoke video
  -fp,       --font-path         [/System/Library/Fonts/Supplemental/Courier New Bold.ttf] path to a monospace font for karaoke video
  -ocsv,     --output-csv        [false  ] output result in a CSV file
  -oj,       --output-json       [false  ] output result in a JSON file
  -of FNAME, --output-file FNAME [       ] output file path (without file extension)
  -ps,       --print-special     [false  ] print special tokens
  -pc,       --print-colors      [false  ] print colors
  -pp,       --print-progress    [false  ] print progress
  -nt,       --no-timestamps     [false  ] do not print timestamps
  -l LANG,   --language LANG     [en     ] spoken language ('auto' for auto-detect)
  -dl,       --detect-language   [false  ] exit after automatically detecting language
             --prompt PROMPT     [       ] initial prompt
  -m FNAME,  --model FNAME       [models/ggml-base.en.bin] model path
  -f FNAME,  --file FNAME        [       ] input WAV file path
  -oved D,   --ov-e-device DNAME [CPU    ] the OpenVINO device used for encode inference
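Putting it together, a typical run on the file converted above, using the model quantized earlier, looks like this (all flags taken from the help output):
whisper_cpp.exe -m models\ggml-large-q4_0.bin -l auto -otxt -f OUTPUT.WAV
For longer recordings it is worth experimenting with the -p and -t flags; for example, the 3-processor, 6-thread configuration benchmarked below would be:
whisper_cpp.exe -m models\ggml-large-q4_0.bin -p 3 -t 6 -l auto -osrt -f OUTPUT.WAV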
Results
Transcribing a 10-minute sound file with the same model parameter size produced similar output on all implementations, but at very different speeds.
Implementation | Time
whisper (standard) | 11 033 s
whisper-ctranslate2 | 1 623 s
whisper.cpp CPU (q4_0), 1 processor, 4 threads | 1 001 s
whisper.cpp CPU (q4_0), 3 processors, 6 threads | 886 s
whisper.cpp CLBlast (q4_0), 1 processor, 4 threads | 743 s
whisper.cpp CLBlast (q4_0), 3 processors, 6 threads | 577 s
Hi,
When I run main.exe, there is no result displayed.
Also when I run your app in the command mode, there is a popup window and the whole app crashes.
Any idea why?
Hi,
First of all, I cannot take any credit for the app; all I did was list the steps to build the native Whisper implementation by the legendary G. Gerganov (https://github.com/ggerganov/whisper.cpp). I did the writeup mostly for myself, so that the process would be repeatable across a number of my machines.
Unfortunately, I can’t help you, as I have zero information on what machine you are running, what build you made, what the popup looks like, etc.
main.exe MUST be run from the command line, and if you built it for the proper environment, there is no way it would exit without a response.
Try building for the CPU first; that's the simplest. Once you're done with that, you can start building for your proper environment. OpenBLAS should run on most architectures; CLBlast needs vendor-specific OpenCL libraries, of which I got Intel's, as all my machines run Intel, so if you have AMD you'll need to research it yourself. If you have an Nvidia GPU, you may try the CUDA version.
If you find a particular issue, please let me know, so I can update the article!