{"id":23011992,"url":"https://github.com/chriamue/chat-flame-backend","last_synced_at":"2025-08-14T06:32:23.509Z","repository":{"id":213075355,"uuid":"732954592","full_name":"chriamue/chat-flame-backend","owner":"chriamue","description":"ChatFlameBackend is an innovative backend solution for chat applications, leveraging the power of the Candle AI framework with a focus on the Mistral model","archived":false,"fork":false,"pushed_at":"2024-01-21T07:40:57.000Z","size":1429,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-12-13T18:09:23.179Z","etag":null,"topics":["backend-api","candle","huggingface-inference-endpoint","llama2","llm-inference","mistral","phi","rust-lang"],"latest_commit_sha":null,"homepage":"https://blog.chriamue.de/chat-flame-backend/chat_flame_backend/","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chriamue.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null},"funding":{"github":["chriamue"]}},"created_at":"2023-12-18T08:37:26.000Z","updated_at":"2024-08-09T20:20:34.000Z","dependencies_parsed_at":"2023-12-24T00:15:28.103Z","dependency_job_id":"1e4e6e83-a94f-4ce7-9f66-63297b0585e7","html_url":"https://github.com/chriamue/chat-flame-backend","commit_stats":{"total_commits":82,"total_committers":2,"mean_commits":41.0,"dds":0.03658536585365857,"last_synced_commit":"cbfcdf918e9452c99677eeb34310be67902b1f7e"},"previous_names":["chriamue/chat-flame-backend"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/repositories/chriamue%2Fchat-flame-backend","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chriamue%2Fchat-flame-backend/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chriamue%2Fchat-flame-backend/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chriamue%2Fchat-flame-backend/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chriamue","download_url":"https://codeload.github.com/chriamue/chat-flame-backend/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":229805136,"owners_count":18126808,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["backend-api","candle","huggingface-inference-endpoint","llama2","llm-inference","mistral","phi","rust-lang"],"created_at":"2024-12-15T10:12:05.737Z","updated_at":"2024-12-15T10:12:06.498Z","avatar_url":"https://github.com/chriamue.png","language":"Rust","funding_links":["https://github.com/sponsors/chriamue"],"categories":[],"sub_categories":[],"readme":"# chat-flame-backend\n\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![Doc](https://img.shields.io/badge/Docs-online-green.svg)](https://blog.chriamue.de/chat-flame-backend/chat_flame_backend/)\n[![codecov](https://codecov.io/gh/chriamue/chat-flame-backend/graph/badge.svg?token=MNHB75EJ2Z)](https://codecov.io/gh/chriamue/chat-flame-backend)\n\nChatFlameBackend is an innovative backend solution for chat applications, leveraging the power of 
the Candle AI framework with a focus on the Mistral model.\n\n## Quickstart\n\n### Installation\n\n```bash\ncargo build --release\n```\n\n### Running\n\nRun the server\n\n```bash\ncargo run --release\n```\n\nRun one of the models\n\n```bash\ncargo run --release -- --model phi-v2 --prompt 'write me fibonacci in rust'\n```\n\n### Docker\n\n```bash\ndocker-compose up --build\n```\n\nVisit http://localhost:8080/swagger-ui for the Swagger UI.\n\n## Testing\n\n### Test using the shell\n\n```bash\ncargo test\n```\n\nor with curl\n\n```bash\ncurl -X POST http://localhost:8080/generate \\\n     -H \"Content-Type: application/json\" \\\n     -d '{\"inputs\": \"Your text prompt here\"}'\n```\n\nor the stream endpoint\n\n```bash\ncurl -X POST -H \"Content-Type: application/json\" -d '{\"inputs\": \"Your input text\"}' http://localhost:8080/generate_stream\n```\n\n### Test using Python\n\nYou can find detailed documentation on how to use the Python client on [Hugging Face](https://huggingface.co/docs/text-generation-inference/basic_tutorials/consuming_tgi#inference-client).\n\n```bash\nvirtualenv .venv\nsource .venv/bin/activate\npip install huggingface-hub\npython test.py\n```\n\n## Architecture\n\nThe backend is written in Rust. 
The models are loaded using the [candle](https://github.com/huggingface/candle) framework.\nTo serve the models on an HTTP endpoint, axum is used.\nUtoipa is used to provide a Swagger UI for the API.\n\n## Supported Models\n\n- [x] [Mistral](https://huggingface.co/mistralai/Mistral-7B-v0.1)\n- [x] Zephyr\n- [x] OpenChat\n- [x] Starling\n- [x] [Phi](https://huggingface.co/microsoft/phi-2) (Phi-1, Phi-1.5, Phi-2)\n- [ ] GPT-Neo\n- [ ] GPT-J\n- [ ] Llama\n\n### Mistral\n\n[\"lmz/candle-mistral\"](https://huggingface.co/lmz/candle-mistral)\n\n### Phi\n\n[\"microsoft/phi-2\"](https://huggingface.co/microsoft/phi-2)\n\n## Performance\n\nThe following table shows the performance metrics of the models on different systems:\n\n| Model            | System                     | Tokens per Second |\n| ---------------- | -------------------------- | ----------------- |\n| 7b-open-chat-3.5 | AMD 7900X3D (12 Core) 64GB | 9.4 tokens/s      |\n| 7b-open-chat-3.5 | AMD 5600G (8 Core VM) 16GB | 2.8 tokens/s      |\n| 13b (llama2 13b) | AMD 7900X3D (12 Core) 64GB | 5.2 tokens/s      |\n| phi-2            | AMD 7900X3D (12 Core) 64GB | 20.6 tokens/s     |\n| phi-2            | AMD 5600G (8 Core VM) 16GB | 5.3 tokens/s      |\n| phi-2            | Apple M2 (10 Core) 16GB    | 24.0 tokens/s     |\n\n### Hint\n\nPerformance depends heavily on the memory bandwidth of the system.\nFor example, the Phi-2 model reaches 20.6 tokens/s on an AMD 7900X3D with 64GB of DDR5-4800 memory;\noverclocking the memory to DDR5-5600 raises this to 21.8 tokens/s.\n\n## Todo\n\n- [x] implement api for https://huggingface.github.io/text-generation-inference/#/\n- [x] model configuration\n- [x] generate stream\n- [x] docker image and docker-compose\n- [ ] add tests\n- [ ] add documentation\n- [ ] fix stop 
token\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchriamue%2Fchat-flame-backend","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchriamue%2Fchat-flame-backend","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchriamue%2Fchat-flame-backend/lists"}