# mt_llm

*Marcel Timm, RhinoDevel, 2025*

**mt_llm** is a C++ library for Linux and Windows that offers a pure C interface
to the awesome large-language model inference engine called
[llama.cpp](https://github.com/ggml-org/llama.cpp) by Georgi Gerganov.

**mt_llm** is intended to be used for single-user LLM inference.

**mt_llm** supports:

- Simplified/reduced configuration parameters.
- Simple init/query/reset/deinit functions.
- A callback that receives each sampled token (and more) and decides when to
  stop inference.
- A snapshot interface to store/update/reset the current LLM state (in RAM).
- Letting the callback retrieve the probabilities of the digits 0 to 9 being
  the next inferred token, ignoring sampling (e.g. for categorization); see
  the sketch after this list.

## STT -> LLM -> TTS pipeline example in C

Take a look at the [example](./stt_llm_tts-pipeline-example) showing a simple
**S**peech-**T**o-**T**ext, **L**arge-**L**anguage-**M**odel,
**T**ext-**T**o-**S**peech pipeline via
[mt_stt](https://github.com/RhinoDevel/mt_stt),
[mt_llm](./)
and [mt_tts](https://github.com/RhinoDevel/mt_tts)!

## How To

Clone the **mt_llm** repository:

`git clone https://github.com/RhinoDevel/mt_llm.git`

Enter the created folder:

`cd mt_llm`

Get the [llama.cpp](https://github.com/ggml-org/llama.cpp) submodule content:

`git submodule update --init --recursive`

## Linux

There are no detailed Linux instructions here yet, but you can take a look at
the Windows instructions below and at the [Makefile](./mt_llm/Makefile).

## Windows

#### Note:

All of the following examples build static libraries; there may be use cases
where dynamically linked libraries are sufficient, too.

### Build [llama.cpp](https://github.com/ggml-org/llama.cpp)

#### Compile `llama.lib`, `ggml.lib` and `common.lib` as static libraries

Compile the necessary `llama.lib`, `ggml.lib` and `common.lib` libraries via
Visual Studio and `mt_llm/llama.cpp/CMakeLists.txt` as static libraries.

To do that, select `x64-windows-msvc-debug` or `x64-windows-msvc-release` as
configuration, in Visual Studio.

Also modify the file `mt_llm/llama.cpp/CMakePresets.json` which is created by
Visual Studio:

Add

```
, "BUILD_SHARED_LIBS": "OFF"
, "LLAMA_CURL": "OFF"
```

to the properties of `configurePresets.cacheVariables`, where the `"name"` is
`"base"`.

#### CUDA build

Modify the file `mt_llm/llama.cpp/CMakePresets.json` again:

Add

```
, "GGML_CUDA": "ON"
```

to the properties of `configurePresets.cacheVariables`, where the `"name"` is
`"base"`.

Additionally link **mt_llm** with this (from the llama.cpp build result folder):

```
ggml\src\ggml-cuda\ggml-cuda.lib
```

Additionally link **mt_llm** with these (from the CUDA folder):

```
lib\x64\cublas.lib
lib\x64\cuda.lib
lib\x64\cudart.lib
```

### Test [llama.cpp](https://github.com/ggml-org/llama.cpp) (without mt_llm)

The [llama.cpp](https://github.com/ggml-org/llama.cpp) binaries are also created
by the build described above.

E.g. for a release build, you can find them at
`mt_llm\llama.cpp\build-x64-windows-msvc-release\bin`.

### Build mt_llm

- Open solution `mt_llm.sln` with Visual Studio (tested with 2022).
- Compile in release or debug mode.

### Test mt_llm

- Get the DLL and LIB files resulting from the build (e.g. for release mode
`x64\Release\mt_llm.dll` and `x64\Release\mt_llm.lib`) and copy them to a new
folder.

- Copy the following header files to that new folder, too:
  - `mt_llm\mt_llm.h`
  - `mt_llm\mt_llm_lib.h`
  - `mt_llm\mt_llm_p.h`
  - `mt_llm\mt_llm_tok_type.h`
  - `mt_llm\mt_llm_snapshot.h`

- Also copy a [supported](mt_llm/mt_llm_model.cpp)
[GGUF model file](https://huggingface.co/unsloth/gemma-3-1b-it-GGUF/resolve/main/gemma-3-1b-it-Q5_K_M.gguf?download=true)
to the same new folder.

- Go to the new folder and create a file `main.c` with the following code:

```
#include "mt_llm.h"
#include "mt_llm_p.h"
#include "mt_llm_tok_type.h"

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static bool my_callback(
    int tok,
    char const * piece,
    int type,
    float const * dig_probs)
{
    (void)tok; // Unused in this example.
    (void)dig_probs; // Unused in this example.

    if(type == MT_TOK_TYPE_SAMPLED_NON_EOG_NON_CONTROL)
    {
        // This example may not display all characters correctly.
        printf("%s", piece);
    }
    else
    {
        if(type == MT_TOK_TYPE_SAMPLED_EOG)
        {
            printf("\n\n");
        }
    }
    return false; // Returning false lets inference continue (the callback's
                  // return value decides when to stop, see feature list).
}

int main(void)
{
    struct mt_llm_p p;

    // *****************************
    // *** Setup the parameters: ***
    // *****************************

    p.n_gpu_layers = 0;

    p.seed = -1;
    p.n_ctx = 2048;
    p.threads = 0;

    p.top_k = 40;
    p.top_p = 0.95;
    p.min_p = 0.05;
    p.temp = 0.8;
    p.grammar[0] = '\0';

    strncpy(
        p.model_file_path,
        "gemma-3-1b-it-Q5_K_M.gguf",
        MT_LLM_P_LEN_MODEL_FILE_PATH);

    strncpy(
        p.sys_prompt,
        "You are a helpful AI assistant.",
        MT_LLM_P_LEN_SYS_PROMPT);

    p.prompt_beg_delim[0] = '\0';
    p.prompt_end_delim[0] = '\0';
    p.sys_prompt_beg_delim[0] = '\0';
    p.sys_prompt_mid_delim[0] = '\0';
    p.sys_prompt_end_delim[0] = '\0';
    p.rev_prompt[0] = '\0';
    p.think_beg_delim[0] = '\0';
    p.think_end_delim[0] = '\0';

    p.try_prompts_by_model = true;

    p.callback = my_callback;

    // **************************
    // *** Initialize mt_llm: ***
    // **************************

    mt_llm_reinit(&p); // Ignoring return value, here..

    // **********************
    // *** Query the LLM: ***
    // **********************

    mt_llm_query("Please tell me a very short story about a dog!");

    // (Inference runs here and calls the callback for each token.)

    // ****************************
    // *** Deinitialize mt_llm: ***
    // ****************************

    mt_llm_deinit();

    return 0;
}
```
- Open an `x64 Native Tools Command Prompt for VS 2022`.
- Compile via `cl main.c mt_llm.lib`.
- Run `main.exe`.
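
When run, `main.exe` should load the model and print the requested short story
piece by piece via the callback before exiting. Note that the example ignores
the return value of `mt_llm_reinit`; in real code you will probably want to
check it (e.g. to detect a missing model file).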