{"id":21457841,"url":"https://github.com/grctest/fastapi-bitnet","last_synced_at":"2025-07-15T01:30:54.890Z","repository":{"id":262368924,"uuid":"886042315","full_name":"grctest/FastAPI-BitNet","owner":"grctest","description":"Running Microsoft's BitNet inference framework via FastAPI, Uvicorn and Docker.","archived":false,"fork":false,"pushed_at":"2025-07-02T13:18:21.000Z","size":109,"stargazers_count":33,"open_issues_count":0,"forks_count":8,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-07-10T05:29:24.344Z","etag":null,"topics":["1-bit","benchmarking","bitnet","docker","fastapi","inference","llm","model-context-protocol","multi-chat","perplexity","python","server-orchestration","uvicorn"],"latest_commit_sha":null,"homepage":"https://hub.docker.com/repository/docker/grctest/fastapi_bitnet","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/grctest.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"github":["grctest"]}},"created_at":"2024-11-10T03:11:03.000Z","updated_at":"2025-07-02T13:18:24.000Z","dependencies_parsed_at":"2024-11-12T04:18:16.218Z","dependency_job_id":"6db5088f-59ac-4024-bf01-ba06d0263e67","html_url":"https://github.com/grctest/FastAPI-BitNet","commit_stats":null,"previous_names":["grctest/fastapi-bitnet"],"tags_count":9,"template":false,"template_full_name":null,"purl":"pkg:github/grctest/FastAPI-BitNet","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grctest%2FFastAPI-BitNet","tags_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/repositories/grctest%2FFastAPI-BitNet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grctest%2FFastAPI-BitNet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grctest%2FFastAPI-BitNet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/grctest","download_url":"https://codeload.github.com/grctest/FastAPI-BitNet/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grctest%2FFastAPI-BitNet/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265385737,"owners_count":23756728,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["1-bit","benchmarking","bitnet","docker","fastapi","inference","llm","model-context-protocol","multi-chat","perplexity","python","server-orchestration","uvicorn"],"created_at":"2024-11-23T06:07:47.804Z","updated_at":"2025-07-15T01:30:54.881Z","avatar_url":"https://github.com/grctest.png","language":"Python","funding_links":["https://github.com/sponsors/grctest"],"categories":[],"sub_categories":[],"readme":"# FastAPI-BitNet\n\nThis project provides a robust REST API built with FastAPI and Docker to manage and interact with `llama.cpp`-based BitNet model instances. 
It allows developers and researchers to programmatically control `llama-cli` and `llama-server` processes for automated testing, benchmarking, and interactive chat sessions.\n\n## Key Features\n\n*   **Session Management**: Start, stop, and check the status of multiple persistent `llama-cli` and `llama-server` session-based chats.\n*   **Batch Operations**: Initialize, shut down, and chat with multiple instances in a single API call.\n*   **Interactive Chat**: Send prompts to running BitNet sessions and receive cleaned model responses.\n*   **Model Benchmarking**: Programmatically run benchmarks and calculate perplexity on GGUF models.\n*   **Resource Estimation**: Estimate maximum server capacity based on available system RAM and CPU threads.\n*   **VS Code Integration**: Connect directly to GitHub Copilot Chat as a tool via the Model Context Protocol.\n*   **Automatic API Docs**: Interactive API documentation powered by Swagger UI and ReDoc.\n\n## Technology Stack\n\n*   [FastAPI](https://github.com/fastapi/fastapi) for the core web framework.\n*   [Uvicorn](https://www.uvicorn.org/) as the ASGI server.\n*   [Docker](https://www.docker.com/) for containerization and easy deployment.\n*   [Pydantic](https://docs.pydantic.dev/) for data validation and settings management.\n*   [fastapi-mcp](https://github.com/tadata-org/fastapi_mcp) for VS Code Copilot tool integration.\n\n---\n\n## Getting Started\n\n### Prerequisites\n\n*   [Docker Desktop](https://www.docker.com/products/docker-desktop/)\n*   [Conda](https://www.anaconda.com/download) (or another Python environment manager)\n*   Python 3.10+\n\n### 1. 
Set Up the Python Environment\n\nCreate and activate a Conda environment:\n```bash\nconda create -n bitnet python=3.11\nconda activate bitnet\n```\n\nInstall the Hugging Face CLI to download the models:\n```bash\npip install -U \"huggingface_hub[cli]\"\n```\n\nDownload Microsoft's official BitNet model:\n```bash\nhuggingface-cli download microsoft/BitNet-b1.58-2B-4T-gguf --local-dir app/models/BitNet-b1.58-2B-4T\n```\n\n---\n\n## Running the Application\n\n### Using Docker (Recommended)\n\nThis is the easiest way to run the application.\n\n1.  **Build the Docker image:**\n    ```bash\n    docker build -t fastapi_bitnet .\n    ```\n\n2.  **Run the Docker container:**\n    This command runs the container in detached mode (`-d`) and maps port 8080 on your host to port 8080 in the container.\n    ```bash\n    docker run -d --name ai_container -p 8080:8080 fastapi_bitnet\n    ```\n\n### Local Development\n\nFor development, you can run the application directly with Uvicorn; the `--reload` flag enables auto-reloading.\n\n```bash\nuvicorn app.main:app --host 0.0.0.0 --port 8080 --reload\n```\n\n---\n\n## API Usage\n\nOnce the server is running, you can access the interactive API documentation:\n\n*   **Swagger UI**: [http://127.0.0.1:8080/docs](http://127.0.0.1:8080/docs)\n*   **ReDoc**: [http://127.0.0.1:8080/redoc](http://127.0.0.1:8080/redoc)\n\n---\n\n## VS Code Integration\n\n### As a Copilot Tool (MCP)\n\nYou can connect this API directly to VS Code's Copilot Chat to create and interact with models.\n\n1.  Run the application using Docker or locally.\n2.  In VS Code, open the Copilot Chat panel.\n3.  Click the wrench icon (\"Configure Tools...\").\n4.  Scroll to the bottom and select `+ Add MCP Server`, then choose `HTTP`.\n5.  
Enter the URL: `http://127.0.0.1:8080/mcp`\n\nCopilot will now be able to use the API to launch and chat with BitNet instances.\n\n### See Also: VS Code Extension\n\nFor a more integrated experience, check out the companion VS Code extension:\n*   **GitHub**: [https://github.com/grctest/BitNet-VSCode-Extension](https://github.com/grctest/BitNet-VSCode-Extension)\n*   **Marketplace**: [https://marketplace.visualstudio.com/items?itemName=nftea-gallery.bitnet-vscode-extension](https://marketplace.visualstudio.com/items?itemName=nftea-gallery.bitnet-vscode-extension)\n\n## License\n\nThis project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgrctest%2Ffastapi-bitnet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgrctest%2Ffastapi-bitnet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgrctest%2Ffastapi-bitnet/lists"}