{"id":31615336,"url":"https://github.com/rhinodevel/mt_llm","last_synced_at":"2025-10-06T11:14:42.358Z","repository":{"id":315193091,"uuid":"1044842911","full_name":"RhinoDevel/mt_llm","owner":"RhinoDevel","description":"Pure C wrapper library to use llama.cpp with Linux and Windows as simple as possible. ","archived":false,"fork":false,"pushed_at":"2025-09-30T19:51:13.000Z","size":58,"stargazers_count":14,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-30T21:29:51.148Z","etag":null,"topics":["inference","llama-cpp","llm"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/RhinoDevel.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-08-26T09:38:26.000Z","updated_at":"2025-09-30T19:51:17.000Z","dependencies_parsed_at":"2025-09-17T08:42:03.655Z","dependency_job_id":"e9997e17-1f5a-4bef-9d28-e4789eb0733d","html_url":"https://github.com/RhinoDevel/mt_llm","commit_stats":null,"previous_names":["rhinodevel/mt_llm"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/RhinoDevel/mt_llm","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RhinoDevel%2Fmt_llm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RhinoDevel%2Fmt_llm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RhinoDevel%2Fmt_llm/releases","manifests_url":"
https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RhinoDevel%2Fmt_llm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/RhinoDevel","download_url":"https://codeload.github.com/RhinoDevel/mt_llm/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RhinoDevel%2Fmt_llm/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278598813,"owners_count":26013338,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-06T02:00:05.630Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["inference","llama-cpp","llm"],"created_at":"2025-10-06T11:14:38.579Z","updated_at":"2025-10-06T11:14:42.343Z","avatar_url":"https://github.com/RhinoDevel.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# mt_llm\n\n*Marcel Timm, RhinoDevel, 2025*\n\n**mt_llm** is a C++ library for Linux and Windows that offers a pure C interface\nto the awesome large-language model inference engine called\n[llama.cpp](https://github.com/ggml-org/llama.cpp) by Georgi Gerganov.\n\n**mt_llm** is intended to be used for single-user LLM inference.\n\n**mt_llm** supports:\n\n- Simplified/reduced configuration parameters.\n- Simple init./query/reset/deinit. 
functions.\n- Callback that receives each token (and more) and decides when to stop\n  inference.\n- Snapshot interface to store/update/reset the current LLM state (using RAM).\n- Let the callback retrieve the probabilities of the digits 0 to 9 being the\n  next inferred token while ignoring sampling (e.g. for categorization).\n\n## STT -\u003e LLM -\u003e TTS pipeline example in C\n\nTake a look at the [example](./stt_llm_tts-pipeline-example) showing a simple\n**S**peech-**T**o-**T**ext, **L**arge-**L**anguage-**M**odel,\n**T**ext-**T**o-**S**peech pipeline via\n[mt_stt](https://github.com/RhinoDevel/mt_stt),\n[mt_llm](./)\nand [mt_tts](https://github.com/RhinoDevel/mt_tts)!\n\n## How To\n\nClone the **mt_llm** repository:\n\n`git clone https://github.com/RhinoDevel/mt_llm.git`\n\nEnter the created folder:\n\n`cd mt_llm`\n\nGet the [llama.cpp](https://github.com/ggml-org/llama.cpp) submodule content:\n\n`git submodule update --init --recursive`\n\n## Linux\n\nThere are no Linux details here yet, but you can take a look at the Windows\ninstructions below and at the [Makefile](./mt_llm/Makefile).\n\n## Windows\n\n#### Note:\n\nAll of the following examples build static libraries; there may be use cases\nwhere dynamically linked libraries are sufficient, too.\n\n### Build [llama.cpp](https://github.com/ggml-org/llama.cpp)\n\n#### Compile `llama.lib`, `ggml.lib` and `common.lib` as static libraries\n\nCompile the necessary `llama.lib`, `ggml.lib` and `common.lib` libraries via\nVisual Studio and `mt_llm/llama.cpp/CMakeLists.txt` as static libraries.\n\nTo do that, select `x64-windows-msvc-debug` or `x64-windows-msvc-release` as the\nconfiguration in Visual Studio.\n\nAlso modify the file `mt_llm/llama.cpp/CMakePresets.json`, which is created by\nVisual Studio:\n\nAdd\n\n```\n, \"BUILD_SHARED_LIBS\": \"OFF\"\n, \"LLAMA_CURL\": \"OFF\"\n```\n\nto the properties of `configurePresets.cacheVariables`, where the `\"name\"` is\n`\"base\"`.\n\n#### CUDA build\n\nModify 
the file `llama.cpp/CMakePresets.json`:\n\nAdd\n\n```\n, \"GGML_CUDA\":  \"ON\"\n```\n\nto the properties of `configurePresets.cacheVariables`, where the `\"name\"` is\n`\"base\"`.\n\nAdditionally link **mt_llm** with this (from the llama.cpp build result folder):\n\n```\nggml\\src\\ggml-cuda\\ggml-cuda.lib\n```\n\nAdditionally link **mt_llm** with these (from the CUDA folder):\n\n```\nlib\\x64\\cublas.lib\nlib\\x64\\cuda.lib\nlib\\x64\\cudart.lib\n```\n\n### Test [llama.cpp](https://github.com/ggml-org/llama.cpp) (without mt_llm)\n\nThe [llama.cpp](https://github.com/ggml-org/llama.cpp) binaries are also created\nby the build described above.\n\nE.g. for a release build, you can find them at\n`mt_llm\\llama.cpp\\build-x64-windows-msvc-release\\bin`.\n\n### Build mt_llm\n\n- Open solution `mt_llm.sln` with Visual Studio (tested with 2022).\n- Compile in release or debug mode.\n\n### Test mt_llm\n\n- Get the DLL and LIB files resulting from the build (e.g. for release mode\n  `x64\\Release\\mt_llm.dll` and `x64\\Release\\mt_llm.lib`) and copy them to a new\n  folder.\n\n- Copy the following header files to that new folder, too:\n  - `mt_llm\\mt_llm.h`\n  - `mt_llm\\mt_llm_lib.h`\n  - `mt_llm\\mt_llm_p.h`\n  - `mt_llm\\mt_llm_tok_type.h`\n  - `mt_llm\\mt_llm_snapshot.h`\n\n- Also copy a [supported](mt_llm/mt_llm_model.cpp)\n  [GGUF model file](https://huggingface.co/unsloth/gemma-3-1b-it-GGUF/resolve/main/gemma-3-1b-it-Q5_K_M.gguf?download=true)\n  to the same new folder.\n\n- Go to the new folder and create a file `main.c` with the following code:\n\n```\n#include \"mt_llm.h\"\n#include \"mt_llm_p.h\"\n#include \"mt_llm_tok_type.h\"\n\n#include \u003cstdbool.h\u003e\n#include \u003cstring.h\u003e\n#include \u003cstdio.h\u003e\n\nstatic bool my_callback(\n    int tok,\n    char const * piece,\n    int type,\n    float const * dig_probs)\n{\n    if(type == MT_TOK_TYPE_SAMPLED_NON_EOG_NON_CONTROL)\n    {\n        // This example may not display all characters correctly..\n        printf(\"%s\", piece);\n  
  }\n    else\n    {\n        if(type == MT_TOK_TYPE_SAMPLED_EOG)\n        {\n            printf(\"\\n\\n\");\n        }\n    }\n    return false;\n}\n\nint main(void)\n{\n    struct mt_llm_p p;\n    \n    // *****************************\n    // *** Setup the parameters: ***\n    // *****************************\n    \n    p.n_gpu_layers = 0;\n    \n    p.seed = -1;\n    p.n_ctx = 2048;\n    p.threads = 0;\n    \n    p.top_k = 40;\n    p.top_p = 0.95;\n    p.min_p = 0.05;\n    p.temp = 0.8;\n    p.grammar[0] = '\\0';\n    \n    strncpy(\n        p.model_file_path,\n        \"gemma-3-1b-it-Q5_K_M.gguf\",\n        MT_LLM_P_LEN_MODEL_FILE_PATH);\n\n    strncpy(\n        p.sys_prompt,\n        \"You are a helpful AI assistant.\",\n        MT_LLM_P_LEN_SYS_PROMPT);\n    p.prompt_beg_delim[0] = '\\0';\n    p.prompt_end_delim[0] = '\\0';\n    p.sys_prompt_beg_delim[0] = '\\0';\n    p.sys_prompt_mid_delim[0] = '\\0';\n    p.sys_prompt_end_delim[0] = '\\0';\n    p.rev_prompt[0] = '\\0';\n    p.think_beg_delim[0] = '\\0';\n    p.think_end_delim[0] = '\\0';\n\n    p.try_prompts_by_model = true;\n\n    p.callback = my_callback;\n\n    // **************************\n    // *** Initialize mt_llm: ***\n    // **************************\n\n    mt_llm_reinit(\u0026p); // Ignoring return value, here..\n    \n    // **********************\n    // *** Query the LLM: ***\n    // **********************\n    \n    mt_llm_query(\"Please tell me a very short story about a dog!\");\n    \n    // (inference is running here, and will call the callback for each token)\n    \n    // ****************************\n    // *** Deinitialize mt_llm: ***\n    // ****************************\n\n    mt_llm_deinit();\n\n    return 0;\n}\n```\n- Open `x64 Native Tools Command Prompt for VS 2022` commandline.\n- Compile via `cl main.c mt_llm.lib`.\n- Run 
`main.exe`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frhinodevel%2Fmt_llm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frhinodevel%2Fmt_llm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frhinodevel%2Fmt_llm/lists"}