{"id":24859656,"url":"https://github.com/aleefbilal/llama3.1-runpod-serverless","last_synced_at":"2026-05-06T20:36:08.349Z","repository":{"id":275035414,"uuid":"869088265","full_name":"AleefBilal/llama3.1-runpod-serverless","owner":"AleefBilal","description":"This project hosts the LLaMA 3.1 CPP model on RunPod's serverless platform using Docker. It features a Python 3.11 environment with CUDA 12.2, enabling scalable AI request processing through configurable payload options and GPU support.","archived":false,"fork":false,"pushed_at":"2024-10-07T18:01:07.000Z","size":18,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-01-30T20:33:06.639Z","etag":null,"topics":["docker","llama3","llamacpp","runpod","runpod-serverless"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AleefBilal.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-07T17:40:35.000Z","updated_at":"2024-10-15T06:51:56.000Z","dependencies_parsed_at":"2025-01-30T20:43:29.651Z","dependency_job_id":null,"html_url":"https://github.com/AleefBilal/llama3.1-runpod-serverless","commit_stats":null,"previous_names":["aleefbilal/llama3.1-runpod-serverless"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AleefBilal%2Fllama3.1-runpod-serverless","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AleefBilal%2Fllama3.1-runpod-serverless/tag
s","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AleefBilal%2Fllama3.1-runpod-serverless/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AleefBilal%2Fllama3.1-runpod-serverless/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AleefBilal","download_url":"https://codeload.github.com/AleefBilal/llama3.1-runpod-serverless/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245689477,"owners_count":20656418,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker","llama3","llamacpp","runpod","runpod-serverless"],"created_at":"2025-01-31T20:59:25.572Z","updated_at":"2026-05-06T20:36:08.300Z","avatar_url":"https://github.com/AleefBilal.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# LLaMA 3.1 CPP on RunPod Serverless with Docker\n\nThis project hosts a LLaMA 3.1 CPP model on RunPod's serverless service using Docker. The model processes requests, handles inputs, and outputs responses. 
It uses Python 3.11 and CUDA 12.2 and runs on Ubuntu 22.04.\n\n\n## Features\n- Dockerized environment\n- Python 3.11\n- CUDA 12.2\n- Ubuntu 22.04\n- LLaMA 3.1 8B CPP-based model for handling AI requests\n- Serverless deployment using RunPod\n\n## Docker Setup\n\n### Building the Docker Image\nTo build the Docker image, run the following command:\n\n```bash\nsudo docker build -t \u003cdocker_name\u003e:\u003cdocker_tag\u003e .\n```\n\n### Running the Docker Container\nTo run the Docker container with GPU support, use the following command:\n\n```bash\nsudo docker run --rm -it --gpus all \u003cdocker_name\u003e:\u003cdocker_tag\u003e\n```\n\nBecause the image is built as a RunPod serverless worker, the running container processes a predefined test input (`test_input.json`) and returns a response.\n\n## Main File: `app.py`\nThe core of the pipeline is implemented in `src/app.py`. It handles the model inference and input/output processing.\n\n## Payload Format\nThe container processes a payload in the following format:\n\n```json\n{\n    \"input\": {\n        \"llm_kwargs\": {\n            \"n_batch\": 2048,\n            \"max_tokens\": 1000,\n            \"temperature\": 0.8,\n            \"top_k\": 40,\n            \"top_p\": 0.9\n        },\n        \"text\": [\n            {\n                \"role\": \"system\",\n                \"content\": \"system_message here\"\n            },\n            {\n                \"role\": \"user\",\n                \"content\": \"user_query here\"\n            }\n        ]\n    }\n}\n```\n\n### Key Parameters:\n- **n_batch**: Batch size for processing (default: 2048)\n- **max_tokens**: Maximum number of tokens to generate (default: 1000)\n- **temperature**: Sampling temperature for randomness (default: 0.8)\n- **top_k**: Top-k sampling for the model (default: 40)\n- **top_p**: Nucleus sampling threshold (default: 0.9)\n\n\n## Important Note\n- Do not forget to update the path to your `llama-cpp` model in `src/app.py`\n- You can check 
the model-loading logs to see whether the model is using CUDA.\n- If an NVIDIA GPU is available and the Docker image builds successfully, but the model still does not use the GPU, the issue is most likely with the `llama-cpp` library, which can be unstable.\n- In this case, try different versions of the library.\n- Hope it gets fixed soon.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faleefbilal%2Fllama3.1-runpod-serverless","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faleefbilal%2Fllama3.1-runpod-serverless","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faleefbilal%2Fllama3.1-runpod-serverless/lists"}