{"id":13407343,"url":"https://github.com/getumbrel/llama-gpt","last_synced_at":"2025-05-13T21:11:09.634Z","repository":{"id":188721524,"uuid":"669601481","full_name":"getumbrel/llama-gpt","owner":"getumbrel","description":"A self-hosted, offline, ChatGPT-like chatbot. Powered by Llama 2. 100% private, with no data leaving your device. New: Code Llama support!","archived":false,"fork":false,"pushed_at":"2024-04-23T18:56:06.000Z","size":1788,"stargazers_count":10959,"open_issues_count":99,"forks_count":713,"subscribers_count":82,"default_branch":"master","last_synced_at":"2025-04-19T11:49:00.302Z","etag":null,"topics":["ai","chatgpt","code-llama","codellama","gpt","gpt-4","gpt4all","llama","llama-2","llama-cpp","llama2","llamacpp","llm","localai","openai","self-hosted"],"latest_commit_sha":null,"homepage":"https://apps.umbrel.com/app/llama-gpt","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/getumbrel.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-07-22T20:12:54.000Z","updated_at":"2025-04-19T06:42:27.000Z","dependencies_parsed_at":"2024-01-06T18:44:39.745Z","dependency_job_id":"0bb50b84-8173-4616-82cc-370a3456b4de","html_url":"https://github.com/getumbrel/llama-gpt","commit_stats":{"total_commits":343,"total_committers":99,"mean_commits":"3.4646464646464645","dds":0.6501457725947521,"last_synced_commit":"43994a365ffb067d58fc36cd363b2114a9037a48"},"previous_names":["getumbrel/llama-gpt"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/getumbrel%2Fllama-gpt","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getumbrel%2Fllama-gpt/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getumbrel%2Fllama-gpt/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getumbrel%2Fllama-gpt/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/getumbrel","download_url":"https://codeload.github.com/getumbrel/llama-gpt/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251326360,"owners_count":21571626,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","chatgpt","code-llama","codellama","gpt","gpt-4","gpt4all","llama","llama-2","llama-cpp","llama2","llamacpp","llm","localai","openai","self-hosted"],"created_at":"2024-07-30T20:00:38.182Z","updated_at":"2025-04-28T13:59:31.834Z","avatar_url":"https://github.com/getumbrel.png","language":"TypeScript","funding_links":[],"categories":["Libraries","Models and Tools","TypeScript","Tools for Self-Hosting","Chatbots","Apps","A01_文本生成_文本对话","HarmonyOS","Repos","GitHub projects","Deployment","chatgpt","ChatGPT-based applications for regular users and specialized problems","Chatbots \u0026 Virtual Companions","Desktop \u0026 Web UIs (22)"],"sub_categories":["LLM Boilerplate","LLMs","AI","大语言对话模型及数据","Windows Manager","Other sdk/libraries"],"readme":"\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://apps.umbrel.com/app/llama-gpt\"\u003e\n    \u003cimg width=\"150\" height=\"150\" 
src=\"https://i.imgur.com/LI59cui.png\" alt=\"LlamaGPT\" width=\"200\" /\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n  \u003ch1 align=\"center\"\u003eLlamaGPT\u003c/h1\u003e\n  \u003cp align=\"center\"\u003e\n    A self-hosted, offline, ChatGPT-like chatbot, powered by Llama 2. 100% private, with no data leaving your device.\n    \u003cbr/\u003e\n    \u003cstrong\u003eNew: Support for Code Llama models and Nvidia GPUs.\u003c/strong\u003e\n    \u003cbr /\u003e\n    \u003cbr /\u003e\n    \u003ca href=\"https://umbrel.com\"\u003e\u003cstrong\u003eumbrel.com (we're hiring) »\u003c/strong\u003e\u003c/a\u003e\n    \u003cbr /\u003e\n    \u003cbr /\u003e\n    \u003ca href=\"https://twitter.com/umbrel\"\u003e\n      \u003cimg src=\"https://img.shields.io/twitter/follow/umbrel?style=social\" /\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://t.me/getumbrel\"\u003e\n      \u003cimg src=\"https://img.shields.io/badge/community-chat-%235351FB\"\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://reddit.com/r/getumbrel\"\u003e\n      \u003cimg src=\"https://img.shields.io/reddit/subreddit-subscribers/getumbrel?style=social\"\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://community.umbrel.com\"\u003e\n      \u003cimg src=\"https://img.shields.io/badge/community-forum-%235351FB\"\u003e\n    \u003c/a\u003e\n  \u003c/p\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://umbrel.com/#start\"\u003e\n    \u003cimg src=\"https://i.imgur.com/sj5vqEG.jpg\" width=\"100%\" /\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n## Contents\n\n1. [Demo](#demo)\n2. [Supported Models](#supported-models)\n3. [How to install](#how-to-install)\n   - [On umbrelOS home server](#install-llamagpt-on-your-umbrelos-home-server)\n   - [On M1/M2 Mac](#install-llamagpt-on-m1m2-mac)\n   - [Anywhere else with Docker](#install-llamagpt-anywhere-else-with-docker)\n   - [Kubernetes](#install-llamagpt-with-kubernetes)\n4. 
[OpenAI-compatible API](#openai-compatible-api)\n5. [Benchmarks](#benchmarks)\n6. [Roadmap and contributing](#roadmap-and-contributing)\n7. [Acknowledgements](#acknowledgements)\n\n## Demo\n\nhttps://github.com/getumbrel/llama-gpt/assets/10330103/5d1a76b8-ed03-4a51-90bd-12ebfaf1e6cd\n\n## Supported models\n\nCurrently, LlamaGPT supports the following models. Support for running custom models is on the roadmap.\n\n| Model name                               | Model size | Model download size | Memory required |\n| ---------------------------------------- | ---------- | ------------------- | --------------- |\n| Nous Hermes Llama 2 7B Chat (GGML q4_0)  | 7B         | 3.79GB              | 6.29GB          |\n| Nous Hermes Llama 2 13B Chat (GGML q4_0) | 13B        | 7.32GB              | 9.82GB          |\n| Nous Hermes Llama 2 70B Chat (GGML q4_0) | 70B        | 38.87GB             | 41.37GB         |\n| Code Llama 7B Chat (GGUF Q4_K_M)         | 7B         | 4.24GB              | 6.74GB          |\n| Code Llama 13B Chat (GGUF Q4_K_M)        | 13B        | 8.06GB              | 10.56GB         |\n| Phind Code Llama 34B Chat (GGUF Q4_K_M)  | 34B        | 20.22GB             | 22.72GB         |\n\n## How to install\n\n### Install LlamaGPT on your umbrelOS home server\n\nRunning LlamaGPT on an [umbrelOS](https://umbrel.com) home server is one click. 
Simply install it from the [Umbrel App Store](https://apps.umbrel.com/app/llama-gpt).\n\n[![LlamaGPT on Umbrel App Store](https://apps.umbrel.com/app/llama-gpt/badge-light.svg)](https://apps.umbrel.com/app/llama-gpt)\n\n### Install LlamaGPT on M1/M2 Mac\n\nMake sure you have Docker and Xcode installed.\n\nThen, clone this repo and `cd` into it:\n\n```\ngit clone https://github.com/getumbrel/llama-gpt.git\ncd llama-gpt\n```\n\nRun LlamaGPT with the following command:\n\n```\n./run-mac.sh --model 7b\n```\n\nYou can access LlamaGPT at http://localhost:3000.\n\n\u003e To run 13B or 70B chat models, replace `7b` with `13b` or `70b` respectively.\n\u003e To run 7B, 13B or 34B Code Llama models, replace `7b` with `code-7b`, `code-13b` or `code-34b` respectively.\n\nTo stop LlamaGPT, press `Ctrl + C` in Terminal.\n\n### Install LlamaGPT anywhere else with Docker\n\nYou can run LlamaGPT on any x86 or arm64 system. Make sure you have Docker installed.\n\nThen, clone this repo and `cd` into it:\n\n```\ngit clone https://github.com/getumbrel/llama-gpt.git\ncd llama-gpt\n```\n\nRun LlamaGPT with the following command:\n\n```\n./run.sh --model 7b\n```\n\nOr if you have an Nvidia GPU, you can run LlamaGPT with CUDA support using the `--with-cuda` flag, like:\n\n```\n./run.sh --model 7b --with-cuda\n```\n\nYou can access LlamaGPT at `http://localhost:3000`.\n\n\u003e To run 13B or 70B chat models, replace `7b` with `13b` or `70b` respectively.\n\u003e To run Code Llama 7B, 13B or 34B models, replace `7b` with `code-7b`, `code-13b` or `code-34b` respectively.\n\nTo stop LlamaGPT, press `Ctrl + C` in Terminal.\n\n\u003e Note: On the first run, it may take a while for the model to be downloaded to the `/models` directory. 
You may also see lots of output like this for a few minutes, which is normal:\n\u003e\n\u003e ```\n\u003e llama-gpt-llama-gpt-ui-1       | [INFO  wait] Host [llama-gpt-api-13b:8000] not yet available...\n\u003e ```\n\u003e\n\u003e After the model has been automatically downloaded and loaded, and the API server is running, you'll see output like:\n\u003e\n\u003e ```\n\u003e llama-gpt-ui_1   | ready - started server on 0.0.0.0:3000, url: http://localhost:3000\n\u003e ```\n\u003e\n\u003e You can then access LlamaGPT at http://localhost:3000.\n\n---\n\n### Install LlamaGPT with Kubernetes\n\nFirst, make sure you have a running Kubernetes cluster and `kubectl` is configured to interact with it.\n\nThen, clone this repo and `cd` into it.\n\nTo deploy to Kubernetes, first create a namespace:\n\n```bash\nkubectl create ns llama\n```\n\nThen apply the manifests under the `/deploy/kubernetes` directory with:\n\n```bash\nkubectl apply -k deploy/kubernetes/. -n llama\n```\n\nFinally, expose the service however you normally would on your cluster.\n\n## OpenAI-compatible API\n\nThanks to llama-cpp-python, a drop-in replacement for the OpenAI API is available at `http://localhost:3001`. Open http://localhost:3001/docs to see the API documentation.\n\n## Benchmarks\n\nWe've tested LlamaGPT models on the following hardware with the default system prompt, and user prompt: \"How does the universe expand?\" at temperature 0 to guarantee deterministic results. 
Generation speed is averaged over the first 10 generations.\n\nFeel free to add your own benchmarks to this table by opening a pull request.\n\n#### Nous Hermes Llama 2 7B Chat (GGML q4_0)\n\n| Device                              | Generation speed |\n| ----------------------------------- | ---------------- |\n| M1 Max MacBook Pro (64GB RAM)       | 54 tokens/sec    |\n| GCP c2-standard-16 vCPU (64 GB RAM) | 16.7 tokens/sec  |\n| Ryzen 5700G 4.4GHz 4c (16 GB RAM)   | 11.50 tokens/sec |\n| GCP c2-standard-4 vCPU (16 GB RAM)  | 4.3 tokens/sec   |\n| Umbrel Home (16GB RAM)              | 2.7 tokens/sec   |\n| Raspberry Pi 4 (8GB RAM)            | 0.9 tokens/sec   |\n\n#### Nous Hermes Llama 2 13B Chat (GGML q4_0)\n\n| Device                              | Generation speed |\n| ----------------------------------- | ---------------- |\n| M1 Max MacBook Pro (64GB RAM)       | 20 tokens/sec    |\n| GCP c2-standard-16 vCPU (64 GB RAM) | 8.6 tokens/sec   |\n| GCP c2-standard-4 vCPU (16 GB RAM)  | 2.2 tokens/sec   |\n| Umbrel Home (16GB RAM)              | 1.5 tokens/sec   |\n\n#### Nous Hermes Llama 2 70B Chat (GGML q4_0)\n\n| Device                              | Generation speed |\n| ----------------------------------- | ---------------- |\n| M1 Max MacBook Pro (64GB RAM)       | 4.8 tokens/sec   |\n| GCP e2-standard-16 vCPU (64 GB RAM) | 1.75 tokens/sec  |\n| GCP c2-standard-16 vCPU (64 GB RAM) | 1.62 tokens/sec  |\n\n#### Code Llama 7B Chat (GGUF Q4_K_M)\n\n| Device                        | Generation speed |\n| ----------------------------- | ---------------- |\n| M1 Max MacBook Pro (64GB RAM) | 41 tokens/sec    |\n\n#### Code Llama 13B Chat (GGUF Q4_K_M)\n\n| Device                        | Generation speed |\n| ----------------------------- | ---------------- |\n| M1 Max MacBook Pro (64GB RAM) | 25 tokens/sec    |\n\n#### Phind Code Llama 34B Chat (GGUF Q4_K_M)\n\n| Device                        | Generation speed |\n| ----------------------------- | ---------------- 
|\n| M1 Max MacBook Pro (64GB RAM) | 10.26 tokens/sec |\n\n## Roadmap and contributing\n\nWe're looking to add more features to LlamaGPT. You can see the roadmap [here](https://github.com/getumbrel/llama-gpt/issues/8#issuecomment-1681321145). The highest priorities are:\n\n- [x] Moving the model out of the Docker image and into a separate volume.\n- [x] Add Metal support for M1/M2 Macs.\n- [x] Add support for Code Llama models.\n- [x] Add CUDA support for NVIDIA GPUs.\n- [ ] Add ability to load custom models.\n- [ ] Allow users to switch between models.\n\nIf you're a developer who'd like to help with any of these, please open an issue to discuss the best way to tackle the challenge. If you're looking to help but not sure where to begin, check out [these issues](https://github.com/getumbrel/llama-gpt/labels/good%20first%20issue) that have specifically been marked as being friendly to new contributors.\n\n## Acknowledgements\n\nA massive thank you to the following developers and teams for making LlamaGPT possible:\n\n- [Mckay Wrigley](https://github.com/mckaywrigley) for building [Chatbot UI](https://github.com/mckaywrigley).\n- [Georgi Gerganov](https://github.com/ggerganov) for implementing [llama.cpp](https://github.com/ggerganov/llama.cpp).\n- [Andrei](https://github.com/abetlen) for building the [Python bindings for llama.cpp](https://github.com/abetlen/llama-cpp-python).\n- [NousResearch](https://nousresearch.com) for [fine-tuning the Llama 2 7B and 13B models](https://huggingface.co/NousResearch).\n- [Phind](https://www.phind.com/) for [fine-tuning the Code Llama 34B model](https://www.phind.com/blog/code-llama-beats-gpt4).\n- [Tom Jobbins](https://huggingface.co/TheBloke) for [quantizing the Llama 2 models](https://huggingface.co/TheBloke/Nous-Hermes-Llama-2-7B-GGML).\n- [Meta](https://ai.meta.com/llama) for releasing Llama 2 and Code Llama under a permissive 
license.\n\n---\n\n[![License](https://img.shields.io/github/license/getumbrel/llama-gpt?color=%235351FB)](https://github.com/getumbrel/llama-gpt/blob/master/LICENSE.md)\n\n[umbrel.com](https://umbrel.com)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgetumbrel%2Fllama-gpt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgetumbrel%2Fllama-gpt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgetumbrel%2Fllama-gpt/lists"}