{"id":24607156,"url":"https://github.com/ling0322/libllm","last_synced_at":"2025-05-05T21:33:49.072Z","repository":{"id":203494426,"uuid":"708742587","full_name":"ling0322/libllm","owner":"ling0322","description":"Efficient inference of large language models.","archived":false,"fork":false,"pushed_at":"2024-12-05T12:40:57.000Z","size":1635,"stargazers_count":146,"open_issues_count":1,"forks_count":7,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-04-28T02:05:19.824Z","etag":null,"topics":["ai","chinese","cpp","language-model","python"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ling0322.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-10-23T09:50:03.000Z","updated_at":"2025-03-11T01:32:04.000Z","dependencies_parsed_at":"2024-01-08T13:36:15.349Z","dependency_job_id":"fccf15d8-bd7e-4e7a-9466-16b4d6a90114","html_url":"https://github.com/ling0322/libllm","commit_stats":null,"previous_names":["ling0322/libllm"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ling0322%2Flibllm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ling0322%2Flibllm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ling0322%2Flibllm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ling0322%2Flibllm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ling0322","download_url":"ht
tps://codeload.github.com/ling0322/libllm/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252580021,"owners_count":21771252,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","chinese","cpp","language-model","python"],"created_at":"2025-01-24T17:21:41.133Z","updated_at":"2025-05-05T21:33:49.038Z","avatar_url":"https://github.com/ling0322.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# libLLM: Efficient inference of large language models.\n\n[![Linux](https://github.com/ling0322/libllm/actions/workflows/cmake-linux.yml/badge.svg?branch=main)](https://github.com/ling0322/libllm/actions/workflows/cmake-linux.yml) [![Windows](https://github.com/ling0322/libllm/actions/workflows/cmake-windows.yml/badge.svg?branch=main)](https://github.com/ling0322/libllm/actions/workflows/cmake-windows.yml) [![macOS](https://github.com/ling0322/libllm/actions/workflows/cmake-darwin.yml/badge.svg?branch=main)](https://github.com/ling0322/libllm/actions/workflows/cmake-darwin.yml)\n\nWelcome to libLLM, an open-source project designed for efficient inference of large language models (LLM) on ordinary personal computers and mobile devices. 
The core is implemented in C++14, without any third-party dependencies (such as BLAS or SentencePiece), enabling seamless operation across a variety of devices.\n\n## Model download\n\n| Model       | Download       |  llm Command  |\n|-------------|----------------|---------------|\n| Index-1.9B-Character (Role-playing) | [🤗[HF](https://huggingface.co/ling0322/bilibili-index-1.9b-libllm/blob/main/bilibili-index-1.9b-character-q4.llmpkg)] [[MS](https://modelscope.cn/models/ling0322/bilibili-index-libllm/file/view/master?fileName=bilibili-index-1.9b-character-q4.llmpkg\u0026status=2)] | llm chat -m index:character |\n| Index-1.9B-Chat | [🤗[HF](https://huggingface.co/ling0322/bilibili-index-1.9b-libllm/blob/main/bilibili-index-1.9b-chat-q4.llmpkg)] [[MS](https://modelscope.cn/models/ling0322/bilibili-index-libllm/file/view/master?fileName=bilibili-index-1.9b-chat-q4.llmpkg\u0026status=2)] | llm chat -m index |\n| Qwen2-1.5B-Instruct | [🤗[HF](https://huggingface.co/ling0322/qwen-libllm/blob/main/qwen2-1.5b-instruct-q4.llmpkg)] [[MS](https://modelscope.cn/models/ling0322/qwen2-libllm/file/view/master?fileName=qwen2-1.5b-instruct-q4.llmpkg\u0026status=2)] | llm chat -m qwen:1.5b |\n| Qwen2-7B-Instruct | [🤗[HF](https://huggingface.co/ling0322/qwen-libllm/blob/main/qwen2-7b-instruct-q4.llmpkg)] [[MS](https://modelscope.cn/models/ling0322/qwen2-libllm/file/view/master?fileName=qwen2-7b-instruct-q4.llmpkg\u0026status=2)] | llm chat -m qwen:7b |\n| Llama3.2-1B-Instruct | [🤗[HF](https://huggingface.co/ling0322/llama3.2-libllm/resolve/main/llama3.2-1b-instruct-q4.llmpkg)] [[MS](https://modelscope.cn/models/ling0322/whisper-libllm/file/view/master?fileName=whisper-large-v3-q4.llmpkg\u0026status=2)] | llm chat -m llama3.2:1b |\n| Llama3.2-3B-Instruct | [🤗[HF](https://huggingface.co/ling0322/llama3.2-libllm/resolve/main/llama3.2-3b-instruct-q4.llmpkg)] 
[[MS](https://modelscope.cn/models/ling0322/whisper-libllm/file/view/master?fileName=whisper-large-v3-q4.llmpkg\u0026status=2)] | llm chat -m llama3.2 |\n| Whisper-large-v3 | [🤗[HF](https://huggingface.co/ling0322/whisper-libllm/resolve/main/whisper-large-v3-q4.llmpkg)] [[MS](https://modelscope.cn/models/ling0322/whisper-libllm/file/view/master?fileName=whisper-large-v3-q4.llmpkg\u0026status=2)] |  llm transcribe -m whisper |\n\n`HF` = HuggingFace, `MS` = ModelScope\n\n## Kernel support matrix\n\n| OS       |  Platform | CUDA       |  avx2  |  avx512 | asimdhp |\n|----------|-----------|------------|--------|---------|---------|\n| Linux    | x64       | ✅         | ✅     | ✅       |         |\n| Windows  | x64       | ✅         | ✅     | ✅       |         |\n| macOS    | arm64     |            |        |         | ✅      |\n\n## Recent updates\n\n- [2024-09-28] Support Llama3.2 models.\n- [2024-08-12] Support Whisper models.\n- [2024-08-02] Support the translation command in llm.\n- [2024-07-30] Support model downloading from huggingface. 
For example, `llm chat -model index-character` will automatically download the `index-character` model from 🤗[Huggingface](https://huggingface.co/ling0322/bilibili-index-1.9b-libllm/blob/main/bilibili-index-1.9b-chat-q4.llmpkg).\n\n## Quickstart\n\nTo run and chat with Bilibili-Index-1.9B-Character:\n\n```bash\n$ llm chat -m index-character\n```\n\nThis automatically downloads the `Bilibili-Index-1.9B-Character` model from Hugging Face, or from ModelScope if your IP address is in China, and starts the chat CLI in `llm`.\n\n## llm command line\n\n```bash\n$ src/libllm/llm chat -m index-character\nINFO 2024-07-30T12:02:28Z interface.cc:67] ISA support: AVX2=1 F16C=1 AVX512F=1\nINFO 2024-07-30T12:02:28Z interface.cc:71] Use Avx512 backend.\nINFO 2024-07-30T12:02:30Z matmul.cc:43] Use GEMM from cuBLAS.\nINFO 2024-07-30T12:02:30Z cuda_operators.cc:51] cuda numDevices = 2\nINFO 2024-07-30T12:02:30Z cuda_operators.cc:52] cuda:0 maxThreadsPerMultiProcessor = 2048\nINFO 2024-07-30T12:02:30Z cuda_operators.cc:54] cuda:0 multiProcessorCount = 20\nINFO 2024-07-30T12:02:30Z thread_pool.cc:73] ThreadPool started. 
numThreads=20\nINFO 2024-07-30T12:02:30Z llm.cc:204] read model package: /home/xiaoych/.libllm/models/bilibili-index-1.9b-character-q4.llmpkg\nINFO 2024-07-30T12:02:30Z model_for_generation.cc:43] model_type = index\nINFO 2024-07-30T12:02:30Z model_for_generation.cc:44] device = cuda\nINFO 2024-07-30T12:02:31Z state_map.cc:66] 220 tensors read.\nPlease input your question.\n    Type ':new' to start a new session (clean history).\n    Type ':sys \u003csystem_prompt\u003e' to set the system prompt and start a new session .\n\u003e hi\n您好！我是Index，请问有什么我可以帮助您的吗？\n(12 tokens, time=0.76s, 63.47ms per token)\n\u003e \n```\n\n## Build\n\n### libLLM CPU only\n\n```bash\n$ mkdir build \u0026\u0026 cd build\n$ cmake ..\n$ make -j\n```\n\n#### For macOS\n\nInstall OpenMP with Homebrew before running cmake. NOTE: libllm on macOS is currently expected to be very slow, since there is no aarch64 kernel for it.\n\n```bash\n% brew install libomp\n% export OpenMP_ROOT=$(brew --prefix)/opt/libomp\n% mkdir build \u0026\u0026 cd build\n% cmake ..\n% make -j\n```\n\n### Build with CUDA\n\nNOTE: specify `-DCUDAToolkit_ROOT=\u003cCUDA-DIR\u003e` if multiple CUDA versions are installed on your system.\n\nRecommended versions:\n- CUDA: 11.7\n\n```bash\n$ mkdir build \u0026\u0026 cd build\n$ cmake -DWITH_CUDA=ON [-DCUDAToolkit_ROOT=\u003cCUDA-DIR\u003e] ..\n$ make -j\n```\n\n## API Examples\n\n### Python\n\n```python\nfrom libllm import Model, ControlToken\n\nmodel = Model(\"tools/bilibili_index.llmpkg\")\nprompt = [ControlToken(\"\u003c|reserved_0|\u003e\"), \"hi\", ControlToken(\"\u003c|reserved_1|\u003e\")]\n\nfor chunk in model.complete(prompt):\n    print(chunk.text, end=\"\", flush=True)\n\nprint(\"\\nDone!\")\n```\n\n### Go\n\n```go\npackage main\n\nimport (\n    \"fmt\"\n    \"log\"\n\n    \"github.com/ling0322/libllm/go/llm\"\n)\n\nfunc main() {\n    model, err := llm.NewModel(\"../../tools/bilibili_index.llmpkg\", llm.Auto)\n    if err != nil {\n        log.Fatal(err)\n    }\n\n    prompt := 
llm.NewPrompt()\n    prompt.AppendControlToken(\"\u003c|reserved_0|\u003e\")\n    prompt.AppendText(\"hi\")\n    prompt.AppendControlToken(\"\u003c|reserved_1|\u003e\")\n    comp, err := model.Complete(llm.NewCompletionConfig(), prompt)\n    if err != nil {\n        log.Fatal(err)\n    }\n\n    for comp.IsActive() {\n        chunk, err := comp.GenerateNextChunk()\n        if err != nil {\n            log.Fatal(err)\n        }\n\n        fmt.Print(chunk.Text)\n    }\n    fmt.Println()\n}\n\n```\n\n## Export Huggingface models\n\nHere is an example of exporting the Index-1.9B model from Hugging Face.\n\n```bash\n$ cd tools\n$ python bilibili_index_exporter.py \\\n    -huggingface_name IndexTeam/Index-1.9B-Character \\\n    -quant q4 \\\n    -output index.llmpkg\n```\n\nAll required modules related to `IndexTeam/Index-1.9B-Character`, including the model, tokenizer, and configs, will then be written to `index.llmpkg`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fling0322%2Flibllm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fling0322%2Flibllm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fling0322%2Flibllm/lists"}