https://github.com/rhinodevel/mt_stt
Pure C wrapper library to use Whisper.cpp with Linux and Windows as simple as possible.
https://github.com/rhinodevel/mt_stt
speech-to-text stt whisper whisper-cpp
Last synced: about 1 month ago
JSON representation
Pure C wrapper library to use Whisper.cpp with Linux and Windows as simple as possible.
- Host: GitHub
- URL: https://github.com/rhinodevel/mt_stt
- Owner: RhinoDevel
- License: mit
- Created: 2025-05-18T15:31:53.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-05-24T11:39:08.000Z (5 months ago)
- Last Synced: 2025-07-09T23:28:11.248Z (3 months ago)
- Topics: speech-to-text, stt, whisper, whisper-cpp
- Language: C++
- Homepage:
- Size: 18.6 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
Awesome Lists containing this project
README
# mt_stt
*Marcel Timm, RhinoDevel, 2025*
**mt_stt** is a C++ library for Linux and Windows that offers a pure C interface
to the great speech-to-text inference engine
[Whisper.cpp](https://github.com/ggml-org/whisper.cpp) by Georgi Gerganov that
itself runs [OpenAI Whisper](https://github.com/openai/whisper) models.With **mt_stt** you can:
- Transcribe from raw audio in memory to a string.
- Use a model to be loaded from file or already held in memory.
- Translate to English.
- Add an optional initial prompt (to bias/help the transcription process).
- Use a progress callback and the cancel option.
- Optionally transcribe a specific part of the audio data, only.
- Output probabilities of the transcribed words (how sure the model is about the
word representing the correct result).## How To
Clone the **mt_stt** repository:
`git clone https://github.com/RhinoDevel/mt_stt.git`
Enter the created folder:
`cd mt_stt`
Get the [Whisper.cpp](https://github.com/ggml-org/whisper.cpp) submodule
content:`git submodule update --init --recursive`
## Linux
No details for Linux here, yet, but you can take a look at the Windows
instructions below and at the [Makefile](./mt_stt/Makefile).## Windows
#### Note:
All the following examples are building static libraries, there may be use cases
where dynamically linked libraries are sufficient, too.### Build [Whisper.cpp](https://github.com/ggml-org/whisper.cpp)
#### Compile `whisper.lib` and `ggml.lib` as static libraries
Compile the necessary `whisper.lib` and `ggml.lib` libraries via Visual Studio
and `mt_stt/whisper.cpp/CMakeLists.txt` as static libraries.To do that, modify the file `mt_stt/whisper.cpp/CMakePresets.json` which is
created by Visual Studio:If the binary of `git` is not in your path, modify `"configurePresets"` entry
with `"name"` `"windows-base"` by adding the following entry to
`"cacheVariables"`:`"GIT_EXE": "C:\\Program Files\\Git\\bin\\git.exe"`
Add entry
```
{
"name": "mt-x64-release-static",
"displayName": "MT x64 Release Static (native)",
"description": "MT: Target Windows (64-bit), static, with the Visual Studio development environment. (RelWithDebInfo)",
"inherits": "x64-release",
"cacheVariables": {
"BUILD_SHARED_LIBS": "OFF"
}
}
```to `mt_stt/whisper.cpp/CMakePresets.json`'s `configurePresets` array.
#### OpenBLAS build
Download [OpenBLAS](http://www.openmathlib.org/OpenBLAS/) (e.g.
`OpenBLAS-0.3.29-x64.zip`) and unpack the content to `C:\openblas`.Additionally add entry
```
{
"name": "mt-x64-release-static-blas",
"displayName": "MT x64 Release Static BLAS",
"description": "MT: Target Windows (64-bit), static, BLAS, with the Visual Studio development environment. (RelWithDebInfo)",
"inherits": "mt-x64-release-static",
"cacheVariables": {
"GGML_BLAS": "ON",
"BLAS_LIBRARIES": "C:/openblas/lib/libopenblas.lib",
"BLAS_INCLUDE_DIRS": "C:/openblas/include"
}
}
```Put the `libopenblas.dll` (from `C:\openblas\bin\libopenblas.dll`) into the
folder of the executable file that will be linked with THIS project's resulting
DLL.#### CUDA build
Working with (e.g.): CUDA 12.4.131 and Whisper.cpp v1.7.5
Additionally add entry
```
{
"name": "mt-x64-release-static-cuda",
"displayName": "MT x64 Release Static CUDA (native)",
"description": "MT: Target Windows (64-bit), static, CUDA, with the Visual Studio development environment. (RelWithDebInfo)",
"inherits": "mt-x64-release-static",
"cacheVariables": {
"GGML_CUDA": "ON"
}
}
```to `mt_stt/whisper.cpp/CMakePresets.json`'s configurePresets array.
In **mt_stt**, link with these libraries (e.g. from `C:\cuda\lib\`):
- `x64\cublas.lib`
- `x64\cuda.lib`
- `x64\cudart.lib`Put the following files (e.g. from `C:\cuda\bin`) into the folder of the
executable file that will be linked with **this** project's resulting DLL:- `cublas64_12.dll`
- `cublasLt64_12.dll`
- `cudart64_12.dll`On a non-development PC, make sure that the most recent Nvidia drivers are
installed (they include CUDA support).#### Build for non-AVX processors (e.g. Celeron)
Additionally add entry
```
{
"name": "mt-x64-release-static-sse",
"displayName": "MT x64 Release Static SSE",
"description": "MT: Target Windows (64-bit), static, SSE, with the Visual Studio development environment. (RelWithDebInfo)",
"inherits": "mt-x64-release-static",
"cacheVariables": {
"GGML_NATIVE": "OFF",
"GGML_AVX": "OFF",
"GGML_AVX2": "OFF"
}
}
```to `mt_stt/whisper.cpp/CMakePresets.json`'s configurePresets array.
**and** change the line
`#if defined(_MSC_VER) && (defined(__AVX__) || defined(__AVX2__) || defined(__AVX512F__))`
to
`#if defined(_MSC_VER)// && (defined(__AVX__) || defined(__AVX2__) || defined(__AVX512F__))`
in the file
`mt_stt/whisper.cpp/ggml/src/ggml-cpu/ggml-cpu-impl.h`
before the line
`#ifndef __SSE3__`
to enable SSE3 and SSSE3.
#### Build for non-AVX processors (e.g. Celeron), with OpenBLAS
Download [OpenBLAS](http://www.openmathlib.org/OpenBLAS/) (e.g.
`OpenBLAS-0.3.29-x64.zip`) and unpack the content to `C:\openblas`.Additionally add entry (also don't forget `ggml-cpu-impl.h` - see above)
```
{
"name": "mt-x64-release-static-sse-blas",
"displayName": "MT x64 Release Static SSE and BLAS",
"description": "MT: Target Windows (64-bit), static, SSE, BLAS, with the Visual Studio development environment. (RelWithDebInfo)",
"inherits": "mt-x64-release-static-sse",
"cacheVariables": {
"GGML_BLAS": "ON",
"BLAS_LIBRARIES": "C:/openblas/lib/libopenblas.lib",
"BLAS_INCLUDE_DIRS": "C:/openblas/include"
}
}
```Put the `libopenblas.dll` (from `C:\openblas\bin\libopenblas.dll`) into the folder
of the executable file that will be linked with **this** project's resulting DLL.### Build mt_stt
- Open solution `mt_stt.sln` with Visual Studio (tested with 2022).
- Compile in release or debug mode.### Test mt_stt
- The sample code below is using [mt_tts](https://github.com/RhinoDevel/mt_tts),
which is kind of the counterpart to **this** project.
- Follow [Test mt_tts](https://github.com/RhinoDevel/mt_tts?tab=readme-ov-file#test-mt_tts)
first.
- Get the DLL and LIB files resulting from building **this** project, e.g. for
release mode `x64\Release\mt_stt.dll` and `x64\Release\mt_stt.lib`, copy them
to the folder from [Test mt_tts](https://github.com/RhinoDevel/mt_tts?tab=readme-ov-file#test-mt_tts).
- Also copy the file `mt_stt\mt_stt.h` to that folder.
- Copy a [Whisper(.cpp) model file](https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small-q5_1.bin) that supports translation to English to the same new folder.
- Open `x64 Native Tools Command Prompt for VS 2022` commandline.
- Go to the example folder and put the following code into the already existing file `main.c`:```
#include
#include#include "mt_tts.h"
#include "mt_stt.h"/** Example use of mt_stt transcribing & translating German language audio to
* text in English.
*
* The audio is generated first with mt_tts.
*/
int main(void)
{
int16_t* tts_result = NULL;
int sample_count = -1;
float* stt_input = NULL;
char* stt_result = NULL;// *************************************************************************
// *** TTS: Create raw audio data from a text given in German: ***
// *************************************************************************// Initialize TTS system with a model/voice for output in German:
mt_tts_reinit("de_DE-thorsten-high.onnx", "de_DE-thorsten-high.onnx.json");// Get the actual raw audio data:
tts_result = mt_tts_to_raw(
"Hallo! Dies ist ein Text in deutscher Sprache. Erst wird er in ein Tonsignal umgewandelt, welches dann wiederum in Text transkribiert wird, jedoch nun auf Englisch.",
&sample_count);// Convert the audio data into normalized floating-point representation:
stt_input = malloc(sample_count * sizeof *stt_input);
for(int i = 0; i < sample_count; ++i)
{
stt_input[i] = (float)tts_result[i] / 16384.0f;
}// Free memory and de-initialize TTS system:
mt_tts_free_raw(tts_result);
tts_result = NULL;mt_tts_deinit();
// *************************************************************************
// *** STT: Transcribe the audio while also translating it to English: ***
// *************************************************************************
stt_result = mt_stt_transcribe_with_file(
false,
4,
NULL,
true,
NULL,
"ggml-small-q5_1.bin",
stt_input,
sample_count,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
0);// Output the translated transcription of the spoken text:
printf("%s\n", stt_result);// Free memory and exit:
free(stt_result);
stt_result = NULL;
return 0;
}
```- Compile via `cl main.c mt_tts.lib mt_stt.lib`.
- Run `main.exe`, which should show the transcription/translation result.### Notes
- Install Microsoft Visual C++ Redistributable Version for Visual Studio 2015,
2017, 2019, and 2022 (e.g. version 14.42.34433.0).