{"id":25877193,"url":"https://github.com/awsdataarchitect/lambda-ollama-deepseek","last_synced_at":"2026-04-13T17:06:24.599Z","repository":{"id":280232128,"uuid":"940869692","full_name":"awsdataarchitect/lambda-ollama-deepseek","owner":"awsdataarchitect","description":"DeepSeek-R1 inference on AWS Lambda using Function URL: An Experimental Approach for AI Prototyping","archived":false,"fork":false,"pushed_at":"2025-03-02T05:46:33.000Z","size":44,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-02T06:25:06.758Z","etag":null,"topics":["aws-lambda","cdk","deepseek-r1","docker","ollama"],"latest_commit_sha":null,"homepage":"https://vivek-aws.medium.com/deepseek-r1-inference-on-aws-lambda-using-function-url-no-api-gateway-needed-4d4e4d183164","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/awsdataarchitect.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-03-01T00:25:32.000Z","updated_at":"2025-03-02T05:46:36.000Z","dependencies_parsed_at":"2025-03-02T06:37:31.743Z","dependency_job_id":null,"html_url":"https://github.com/awsdataarchitect/lambda-ollama-deepseek","commit_stats":null,"previous_names":["awsdataarchitect/lambda-ollama-deepseek"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awsdataarchitect%2Flambda-ollama-deepseek","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awsdataarchitect%2Flambda-ollama-deepseek/tags","re
leases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awsdataarchitect%2Flambda-ollama-deepseek/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/awsdataarchitect%2Flambda-ollama-deepseek/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/awsdataarchitect","download_url":"https://codeload.github.com/awsdataarchitect/lambda-ollama-deepseek/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241494514,"owners_count":19971931,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws-lambda","cdk","deepseek-r1","docker","ollama"],"created_at":"2025-03-02T11:19:11.559Z","updated_at":"2026-04-13T17:06:24.581Z","avatar_url":"https://github.com/awsdataarchitect.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DeepSeek-R1 inference on AWS Lambda using Function URL (no API Gateway needed): An Experimental Approach for AI Prototyping\n\nFull AWS-CDK code for LLM deployment on AWS Lambda-Docker Container.\n\nFor more details on how to deploy the infrastructure and the solution details, please refer to the Blog Post:\n* [DeepSeek-R1 inference on AWS Lambda using Function URL (no API Gateway needed)](https://vivek-aws.medium.com/deepseek-r1-inference-on-aws-lambda-using-function-url-no-api-gateway-needed-4d4e4d183164).\n\nOnce deployed, get the Function URL from CDK outputs.\n\nRun a test request (e.g.):\n\n```bash\ncurl -X POST https://amnfnya7regz5vbtc5cguxpfbm0iyogj.lambda-url.us-east-1.on.aws/ \\\n     -d 
'{\"prompt\": \"Explain quantum computing\"}' \\\n     -H \"Content-Type: application/json\"\n```\n\nExpected Response:\n\n```json\n{\n  \"response\": \"Quantum computing is a type of computing that uses quantum bits...\"\n}\n```\n\n## Comparison: Deployment Options for DeepSeek-R1 on AWS\n\n| **Service**                 | **Architecture Support** | **Memory Limits**               | **Storage Capacity**                               | **Execution Timeouts**                          | **Cost Model**                                                         | **Scaling Capabilities**                                                   | **Cold Start Impact**                           | **Infrastructure Management**          | **Model Updates**                             | **Integration Capabilities**                                                   | **Ideal Use Cases**                                                                 |\n|-----------------------------|-------------------------|--------------------------------|--------------------------------------------------|-------------------------------------------------|------------------------------------------------------------------------|----------------------------------------------------------------------------------|------------------------------------------------|--------------------------------------|--------------------------------------------|----------------------------------------------------------------------------------|--------------------------------------------------------------------------------------|\n| **AWS Lambda**              | x86_64, ARM64 (Graviton2) | 10GB max                      | Ephemeral /tmp (10GB max), EFS                   | 15 minutes maximum                              | Pay-per-invocation + compute duration (GB-seconds)                   | Automatic scaling to account limits; Provisioned Concurrency option            | Significant for large containers               | Minimal 
(serverless)                  | Redeployment required                       | Native with API Gateway, Function URL, CloudWatch, S3, DynamoDB, etc.              | Development, prototyping, low-traffic inference endpoints                            |\n| **Amazon SageMaker AI (JumpStart)** | x86_64, ARM64 (Graviton), GPU (NVIDIA) | Up to 768GB (on 24xlarge instances) | EBS volumes (up to several TB), FSx, S3 integration | No timeout for inference endpoints              | Hourly instance rates + storage costs; Savings Plans available                  | Auto-scaling based on invocations or custom metrics; Multi-model endpoints       | Minimal with persistent endpoints              | Medium (managed inference)            | Built-in model versioning and staging       | Deep integration with AWS ML services, including EFA for HPC                        | Production ML workloads, high-throughput inference, regulated environments          |\n| **Amazon Bedrock**          | Managed by AWS          | Managed by AWS                | Managed by AWS                                  | API timeout: 30 seconds for standard requests | Pay-per-token pricing (input/output tokens)                          | Transparent, fully-managed scaling                                       | None (always available)                        | None (fully managed)                  | Automatic updates by AWS                   | Native with all AWS services; Guardrails for content filtering                      | Enterprise applications, content generation, customer-facing applications          |\n| **Amazon EKS**              | x86_64, ARM64, GPU (NVIDIA), AWS Inferentia, Trainium | Limited by node type (up to 24TB with u-24tb1.metal) | EBS, EFS, FSx, persistent volumes, instance store | Configurable – no built-in limits              | EC2/Fargate costs + $0.10/hour per cluster                           | HPA/VPA/Cluster Autoscaler/Karpenter; Complex scaling strategies                 | Depends on 
warm pool configuration            | High (Kubernetes expertise required)  | CI/CD / GitOps pipelines can be used       | Native integration with numerous AWS services                                      | Complex ML pipelines, multi-model serving, custom scaling requirements             |\n| **Amazon ECS Fargate**      | x86_64, ARM64          | Up to 120GB per task          | EFS integration, ephemeral storage (up to 200GB) | No built-in task timeout                        | vCPU and memory per second; Fargate Savings Plans available           | Service Auto Scaling based on CloudWatch metrics, target tracking, step scaling | Moderate (task startup time: 10–15 seconds)  | Low-Medium (container orchestration) | Task definition updates for new models      | Native integration with CloudWatch, ALB, VPC                                       | Mid-scale deployments, containerized applications with moderate resource needs     |\n\n\n## Useful commands\n\nThe `cdk.json` file tells the CDK Toolkit how to execute your app.\n\n* `npm run build`   compile typescript to js\n* `npm run watch`   watch for changes and compile\n* `npm run test`    perform the jest unit tests\n* `npx cdk deploy`  deploy this stack to your default AWS account/region\n* `npx cdk diff`    compare deployed stack with current state\n* `npx cdk synth`   emits the synthesized CloudFormation template\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fawsdataarchitect%2Flambda-ollama-deepseek","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fawsdataarchitect%2Flambda-ollama-deepseek","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fawsdataarchitect%2Flambda-ollama-deepseek/lists"}