https://github.com/rhinodevel/mt_llm
Pure C wrapper library to make using llama.cpp on Linux and Windows as simple as possible.
- Host: GitHub
- URL: https://github.com/rhinodevel/mt_llm
- Owner: RhinoDevel
- License: MIT
- Created: 2025-08-26T09:38:26.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-09-30T19:51:13.000Z (3 months ago)
- Last Synced: 2025-09-30T21:29:51.148Z (3 months ago)
- Topics: inference, llama-cpp, llm
- Language: C++
- Homepage:
- Size: 56.6 KB
- Stars: 14
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
# mt_llm
*Marcel Timm, RhinoDevel, 2025*
**mt_llm** is a C++ library for Linux and Windows that offers a pure C interface
to the awesome large-language model inference engine called
[llama.cpp](https://github.com/ggml-org/llama.cpp) by Georgi Gerganov.
**mt_llm** is intended to be used for single-user LLM inference.
**mt_llm** supports:
- Simplified/reduced configuration parameters.
- Simple init./query/reset/deinit. functions.
- A callback that receives each inferred token (and more) and decides when to stop
  inference.
- Snapshot interface to store/update/reset the current LLM state (using RAM).
- Letting the callback retrieve the probabilities of the digits 0 to 9 being the
  next inferred token, ignoring sampling (e.g. for categorization); see the sketch below.
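To illustrate the digit-probability feature, here is a minimal sketch (not taken from
the mt_llm sources) of a callback that picks the most probable digit as a category. It
assumes that `dig_probs` points to ten probabilities for the digits 0 to 9 and is `NULL`
whenever digit probabilities are not available; the callback signature matches the
example further below.
```
#include <stdbool.h>
#include <stdio.h>

// Sketch only: choose the digit (0-9) with the highest probability as
// the "category". Assumes dig_probs has ten entries and is NULL when
// digit probabilities are not available.
static bool my_categorize_callback(
    int tok,
    char const * piece,
    int type,
    float const * dig_probs)
{
    (void)tok;
    (void)piece;
    (void)type;

    if(dig_probs != NULL)
    {
        int best = 0;

        for(int i = 1; i < 10; ++i)
        {
            if(dig_probs[i] > dig_probs[best])
            {
                best = i;
            }
        }
        printf("Most probable category: %d\n", best);
    }

    // The return value lets the callback decide when to stop inference
    // (see the mt_llm headers for the exact convention).
    return false;
}
```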
## STT -> LLM -> TTS pipeline example in C
Take a look at the [example](./stt_llm_tts-pipeline-example) showing a simple
**S**peech-**To**-**T**ext, **L**arge-**L**anguage-**M**odel,
**T**ext-**T**o-**S**peech pipeline via
[mt_stt](https://github.com/RhinoDevel/mt_stt),
[mt_llm](./)
and [mt_tts](https://github.com/RhinoDevel/mt_tts)!
## How To
Clone the **mt_llm** repository:
`git clone https://github.com/RhinoDevel/mt_llm.git`
Enter the created folder:
`cd mt_llm`
Get the [llama.cpp](https://github.com/ggml-org/llama.cpp) submodule content:
`git submodule update --init --recursive`
## Linux
No details for Linux here, yet, but you can take a look at the Windows
instructions below and at the [Makefile](./mt_llm/Makefile).
## Windows
#### Note:
All of the following examples build static libraries; for some use cases, dynamically
linked libraries may also be sufficient.
### Build [llama.cpp](https://github.com/ggml-org/llama.cpp)
#### Compile `llama.lib`, `ggml.lib` and `common.lib` as static libraries
Compile the required `llama.lib`, `ggml.lib` and `common.lib` libraries as static
libraries via Visual Studio and `mt_llm/llama.cpp/CMakeLists.txt`.
To do that, select `x64-windows-msvc-debug` or `x64-windows-msvc-release` as the
configuration in Visual Studio.
Also modify the file `mt_llm/llama.cpp/CMakePresets.json`, which is created by
Visual Studio:
Add
```
, "BUILD_SHARED_LIBS": "OFF"
, "LLAMA_CURL": "OFF"
```
to the properties of `configurePresets.cacheVariables`, where the `"name"` is
`"base"`.
#### CUDA build
Modify the file `llama.cpp/CMakePresets.json`:
Add
```
, "GGML_CUDA": "ON"
```
to the properties of `configurePresets.cacheVariables`, where the `"name"` is
`"base"`.
Additionally link **mt_llm** with this (from the llama.cpp build result folder):
```
ggml\src\ggml-cuda\ggml-cuda.lib
```
Additionally link **mt_llm** with these (from the CUDA folder):
```
lib\x64\cublas.lib
lib\x64\cuda.lib
lib\x64\cudart.lib
```
### Test [llama.cpp](https://github.com/ggml-org/llama.cpp) (without mt_llm)
The [llama.cpp](https://github.com/ggml-org/llama.cpp) binaries are also created
by the build described above.
For a release build, for example, you can find them at
`mt_llm\llama.cpp\build-x64-windows-msvc-release\bin`.
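For a quick smoke test, you could run one of those binaries against a GGUF model, e.g.
something like `llama-cli -m gemma-3-1b-it-Q5_K_M.gguf -p "Hello!"` with the model file
used in the mt_llm test below (binary names and flags may differ between llama.cpp
versions).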
### Build mt_llm
- Open solution `mt_llm.sln` with Visual Studio (tested with 2022).
- Compile in release or debug mode.
### Test mt_llm
- Get the DLL and LIB files resulting from the build (e.g. for release mode,
  `x64\Release\mt_llm.dll` and `x64\Release\mt_llm.lib`) and copy them to a new
  folder.
- Copy the following header files to that new folder, too:
- `mt_llm\mt_llm.h`
- `mt_llm\mt_llm_lib.h`
- `mt_llm\mt_llm_p.h`
- `mt_llm\mt_llm_tok_type.h`
- `mt_llm\mt_llm_snapshot.h`
- Also copy a [supported](mt_llm/mt_llm_model.cpp)
[GGUF model file](https://huggingface.co/unsloth/gemma-3-1b-it-GGUF/resolve/main/gemma-3-1b-it-Q5_K_M.gguf?download=true)
to the same new folder.
- Go to the new folder and create a file `main.c` with the following code:
```
#include "mt_llm.h"
#include "mt_llm_p.h"
#include "mt_llm_tok_type.h"

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static bool my_callback(
    int tok,
    char const * piece,
    int type,
    float const * dig_probs)
{
    if(type == MT_TOK_TYPE_SAMPLED_NON_EOG_NON_CONTROL)
    {
        // This example may not display all characters correctly..
        printf("%s", piece);
    }
    else
    {
        if(type == MT_TOK_TYPE_SAMPLED_EOG)
        {
            printf("\n\n");
        }
    }
    return false;
}

int main(void)
{
    struct mt_llm_p p;

    // *****************************
    // *** Setup the parameters: ***
    // *****************************

    p.n_gpu_layers = 0;
    p.seed = -1;
    p.n_ctx = 2048;
    p.threads = 0;
    p.top_k = 40;
    p.top_p = 0.95;
    p.min_p = 0.05;
    p.temp = 0.8;
    p.grammar[0] = '\0';
    strncpy(
        p.model_file_path,
        "gemma-3-1b-it-Q5_K_M.gguf",
        MT_LLM_P_LEN_MODEL_FILE_PATH);
    strncpy(
        p.sys_prompt,
        "You are a helpful AI assistant.",
        MT_LLM_P_LEN_SYS_PROMPT);
    p.prompt_beg_delim[0] = '\0';
    p.prompt_end_delim[0] = '\0';
    p.sys_prompt_beg_delim[0] = '\0';
    p.sys_prompt_mid_delim[0] = '\0';
    p.sys_prompt_end_delim[0] = '\0';
    p.rev_prompt[0] = '\0';
    p.think_beg_delim[0] = '\0';
    p.think_end_delim[0] = '\0';
    p.try_prompts_by_model = true;
    p.callback = my_callback;

    // **************************
    // *** Initialize mt_llm: ***
    // **************************

    mt_llm_reinit(&p); // Ignoring return value, here..

    // **********************
    // *** Query the LLM: ***
    // **********************

    mt_llm_query("Please tell me a very short story about a dog!");
    // (inference is running here, and will call the callback for each token)

    // ****************************
    // *** Deinitialize mt_llm: ***
    // ****************************

    mt_llm_deinit();
    return 0;
}
```
- Open an `x64 Native Tools Command Prompt for VS 2022` command line.
- Compile via `cl main.c mt_llm.lib`.
- Run `main.exe`.
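If everything is in place, `main.exe` should load the model and print a short generated
story about a dog: the callback above prints each sampled token piece and adds a blank
line when the end-of-generation token is reached.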