{"id":25696372,"url":"https://github.com/marcom/llamacpp.jl","last_synced_at":"2025-11-19T07:05:26.139Z","repository":{"id":154224087,"uuid":"616636185","full_name":"marcom/LlamaCpp.jl","owner":"marcom","description":"Julia interface to llama.cpp, a C/C++ library for running language models","archived":false,"fork":false,"pushed_at":"2025-02-25T19:13:24.000Z","size":345,"stargazers_count":31,"open_issues_count":4,"forks_count":4,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-11-10T03:22:54.038Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Julia","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/marcom.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-20T19:23:35.000Z","updated_at":"2025-08-06T20:19:58.000Z","dependencies_parsed_at":"2025-02-21T19:28:59.563Z","dependency_job_id":"df9cc805-b1e1-41fb-bd50-0f91484f9ea9","html_url":"https://github.com/marcom/LlamaCpp.jl","commit_stats":null,"previous_names":["marcom/llamacpp.jl","marcom/llama.jl"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/marcom/LlamaCpp.jl","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marcom%2FLlamaCpp.jl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marcom%2FLlamaCpp.jl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marcom%2FLlamaCpp.jl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marcom%2FLlamaCpp.jl/manifests","owner_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/owners/marcom","download_url":"https://codeload.github.com/marcom/LlamaCpp.jl/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marcom%2FLlamaCpp.jl/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":285200717,"owners_count":27131417,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-11-19T02:00:05.673Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-02-25T01:54:16.997Z","updated_at":"2025-11-19T07:05:26.125Z","avatar_url":"https://github.com/marcom.png","language":"Julia","funding_links":[],"categories":[],"sub_categories":[],"readme":"# LlamaCpp.jl\n\n[![Dev](https://img.shields.io/badge/docs-dev-blue.svg)](https://marcom.github.io/LlamaCpp.jl/dev/)\n[![Build Status](https://github.com/marcom/LlamaCpp.jl/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/marcom/LlamaCpp.jl/actions/workflows/ci.yml?query=branch%3Amain)\n[![Coverage](https://codecov.io/gh/svilupp/LlamaCpp.jl/branch/main/graph/badge.svg)](https://codecov.io/gh/marcom/LlamaCpp.jl)\n[![Aqua](https://raw.githubusercontent.com/JuliaTesting/Aqua.jl/master/badge.svg)](https://github.com/JuliaTesting/Aqua.jl)\n\nJulia interface to\n[llama.cpp](https://github.com/ggerganov/llama.cpp), a C/C++ port of\nMeta's [LLaMA](https://arxiv.org/abs/2302.13971) 
(a large language\nmodel).\n\n\u003e [!WARNING]\n\u003e This project has been renamed from Llama.jl to LlamaCpp.jl to avoid confusion with other projects.\n\u003e If you have an older version of Llama.jl, please remove it and install LlamaCpp.jl.\n\n## Installation\n\nPress `]` at the Julia REPL to enter pkg mode, then:\n\n```\nadd https://github.com/marcom/LlamaCpp.jl\n```\n\nThe `llama_cpp_jll.jl` package used behind the scenes currently works\non Linux, Mac, and FreeBSD on `i686`, `x86_64`, and `aarch64` (note: only\ntested on `x86_64-linux` and `aarch64-macos` so far).\n\n## Downloading the model weights\n\nYou will need a file with quantized model weights in the right format (GGUF).\n\nYou can either download the weights from the [HuggingFace Hub](https://huggingface.co) (search for \"GGUF\" to download the right format) or convert them from the original PyTorch weights (see [llama.cpp](https://github.com/ggerganov/llama.cpp) for instructions.)\n\nGood weights to start with are the Llama3-family fine-tuned weights ([here](https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF) with a Llama-specific licence) or Qwen 2.5 family, which are Apache 2.0 licensed and can be downloaded [here](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF). Click on the tab \"Files\" and download one of the `*.gguf` files. We recommend the Q5_K_M version (~5.5GB).\n\nIn the future, there might be new releases, so you might want to check for new versions.\n\nOnce you have a `url` link to a `.gguf` file, you can simply download it via:\n\n```julia\nusing LlamaCpp\n# Example for a 360M parameter model (c. 
0.3GB)\nurl = \"https://huggingface.co/bartowski/SmolLM2-360M-Instruct-GGUF/resolve/main/SmolLM2-360M-Instruct-Q5_K_S.gguf\"\nmodel = download_model(url)\n# Output: \"models/SmolLM2-360M-Instruct-Q5_K_S.gguf\"\n```\n\nYou can pass the `model` variable directly to the `run_*` functions, such as `run_server`.\n\n## Running example executables from llama.cpp\n\n### Simple HTTP Server\n\nServer mode is the easiest way to get started with LlamaCpp.jl. It provides both an in-browser chat interface and an OpenAI-compatible chat completion endpoint (for packages like [PromptingTools.jl](https://github.com/svilupp/PromptingTools.jl)).\n\n```julia\nusing LlamaCpp\n\n# Use the `model` downloaded above\nLlamaCpp.run_server(; model)\n```\n\nOpen the URL `http://127.0.0.1:10897` in your browser to see the chat interface, or send POST requests to the `/v1/chat/completions` endpoint.\n\nIf you use PromptingTools.jl, you can test your local server like this: `ai\"say hi!\"local` or `aigenerate(\"say hi!\")`.\n\n### Llama Text Generation\n\n```julia\nusing LlamaCpp\nmodel = \"models/SmolLM2-360M-Instruct-Q5_K_S.gguf\"\n\ns = run_llama(; model, prompt=\"Hello\")\n\n# Provide additional arguments to llama.cpp (check the documentation for more details or the help text below)\ns = run_llama(; model, prompt=\"Hello\", n_gpu_layers=0, args=`-n 16`)\n\n# print the help text with more options\nrun_llama(model=\"\", prompt=\"\", args=`-h`)\n```\n\n\u003e [!TIP]\n\u003e If you're getting gibberish output, it's likely that the model requires a \"prompt template\" (i.e., a specific structure for how you provide your instructions). 
Review the model page on HF Hub to see how to use your model or use the server.\n\n\n### Interactive chat mode\n\n```julia\nrun_chat(; model, prompt=\"Hello chat mode\")\n```\n\n## REPL mode\n\nThe REPL mode is currently non-functional, but stay tuned!\n\n## LibLlama\n\nThe `libllama` bindings are currently non-functional, but stay tuned!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmarcom%2Fllamacpp.jl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmarcom%2Fllamacpp.jl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmarcom%2Fllamacpp.jl/lists"}