{"id":30855260,"url":"https://github.com/juliensimon/sagemaker-inference-container-cpu","last_synced_at":"2026-04-04T22:31:14.337Z","repository":{"id":309530141,"uuid":"1036567751","full_name":"juliensimon/sagemaker-inference-container-cpu","owner":"juliensimon","description":"An Amazon SageMaker Container for Hugging Face Inference on Graviton and Intel CPUs","archived":false,"fork":false,"pushed_at":"2025-10-06T07:52:28.000Z","size":121,"stargazers_count":11,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-02-08T02:21:37.740Z","etag":null,"topics":["amd64","arm64","aws","docker","docker-compose","graviton","helm","huggingface","inference","intel","kubernetes-deployment","llamacpp","local-deployment","python","sagemaker"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/juliensimon.png","metadata":{"files":{"readme":"README-multiarch.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-12T09:08:21.000Z","updated_at":"2025-12-02T21:37:26.000Z","dependencies_parsed_at":"2025-08-26T13:04:12.352Z","dependency_job_id":"088baa7e-9a13-4747-8757-607e9eba6ed8","html_url":"https://github.com/juliensimon/sagemaker-inference-container-cpu","commit_stats":null,"previous_names":["juliensimon/sagemaker-inference-container-graviton","juliensimon/sagemaker-inference-container-cpu"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/juliensimon/sagemaker-inference-container-cpu","repository_url":"https://repos.ecosyste.ms/api/v1/h
osts/GitHub/repositories/juliensimon%2Fsagemaker-inference-container-cpu","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/juliensimon%2Fsagemaker-inference-container-cpu/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/juliensimon%2Fsagemaker-inference-container-cpu/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/juliensimon%2Fsagemaker-inference-container-cpu/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/juliensimon","download_url":"https://codeload.github.com/juliensimon/sagemaker-inference-container-cpu/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/juliensimon%2Fsagemaker-inference-container-cpu/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31416770,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-04T20:09:54.854Z","status":"ssl_error","status_checked_at":"2026-04-04T20:09:44.350Z","response_time":60,"last_error":"SSL_read: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["amd64","arm64","aws","docker","docker-compose","graviton","helm","huggingface","inference","intel","kubernetes-deployment","llamacpp","local-deployment","python","sagemaker"],"created_at":"2025-09-07T11:03:59.395Z","updated_at":"2026-04-04T22:31:14.203Z","avatar_url":"https://github.com/juliensimon.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Multi-Architecture AFM-4.5B Inference Container\n\nThis repository provides a multi-architecture Docker container for running the AFM-4.5B model on both ARM64 and AMD64 platforms with architecture-specific optimizations.\n\n## 🏗️ Repository Structure\n\n```\nsagemaker-inference-container-graviton/\n├── docker/\n│   ├── arm64/                    # ARM64-specific configurations\n│   │   ├── Dockerfile           # ARM64-optimized Dockerfile\n│   │   └── docker-compose.yml   # ARM64-specific compose file\n│   ├── amd64/                    # AMD64/Intel-specific configurations\n│   │   ├── Dockerfile           # AMD64-optimized Dockerfile\n│   │   └── docker-compose.yml   # AMD64-specific compose file\n│   └── multiarch/                # Multi-architecture configurations\n│       ├── Dockerfile           # Multi-arch Dockerfile\n│       └── docker-compose.yml   # Multi-arch compose file\n├── scripts/\n│   ├── build-multiarch.sh       # Build for all architectures\n│   ├── build-arm64.sh           # Build for ARM64 only\n│   ├── build-amd64.sh           # Build for AMD64 only\n│   └── detect-architecture.sh   # 
Auto-detect and configure\n├── config/\n│   ├── arm64/                   # ARM64-specific build configs\n│   ├── amd64/                   # AMD64-specific build configs\n│   └── common/                  # Shared configurations\n├── docs/\n│   ├── arm64-setup.md          # ARM64 setup guide\n│   ├── amd64-setup.md          # AMD64 setup guide\n│   └── multiarch-deployment.md # Multi-arch deployment guide\n├── app/                         # Shared application code\n└── tests/                       # Architecture-specific tests\n```\n\n## 🚀 Quick Start\n\n### 1. Auto-Detect Your Architecture\n\n```bash\n# This will automatically configure everything for your platform\nsource scripts/detect-architecture.sh\n```\n\n### 2. Build for Your Platform\n\n```bash\n# Build for your detected architecture\n./scripts/build-$ARCH_NAME.sh\n\n# Or build for all architectures\n./scripts/build-multiarch.sh\n```\n\n### 3. Run the Service\n\n```bash\n# First run (download, convert, quantize)\ndocker-compose -f $COMPOSE_FILE --profile first-run up --build afm-first-run\n\n# Subsequent runs (fast startup)\ndocker-compose -f $COMPOSE_FILE --profile fast up afm-fast\n```\n\n## 📋 Prerequisites\n\n- Docker and Docker Compose installed\n- Hugging Face token for AFM-4.5B (gated model)\n- Sufficient disk space (~15GB for full model + conversions)\n\n## 🔧 Build Options\n\n### Single Architecture Build\n```bash\n# ARM64 only\n./scripts/build-arm64.sh\n\n# AMD64 only\n./scripts/build-amd64.sh\n```\n\n## 🚀 Deployment Options\n\n### Option 1: Auto-Detection (Recommended)\n```bash\nsource scripts/detect-architecture.sh\ndocker-compose -f $COMPOSE_FILE --profile fast up afm-fast\n```\n\n### Option 2: Manual Selection\n```bash\n# ARM64\ndocker-compose -f docker/arm64/docker-compose.yml --profile fast up afm-fast\n\n# AMD64\ndocker-compose -f docker/amd64/docker-compose.yml --profile fast up afm-fast\n```\n\n## 📊 Performance Comparison\n\n| Metric | ARM64 | AMD64 | Notes |\n|--------|-------|-------|-------|\n| 
Build Time | ~15-20 min | ~10-15 min | AMD64 typically faster |\n| Startup Time | ~30-45s | ~25-35s | Depends on hardware |\n| Inference Speed | ~12-20 tokens/s | ~15-25 tokens/s | CPU-dependent |\n| Memory Usage | ~8GB | ~8GB | Similar across platforms |\n| Power Efficiency | Better | Good | ARM64 more efficient |\n\n## 🧪 Testing\n\n### Health Check\n```bash\ncurl http://localhost:8080/ping\n```\n\n### API Test\n```bash\ncurl -X POST http://localhost:8080/v1/chat/completions \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"messages\": [\n      {\"role\": \"user\", \"content\": \"Hello, how are you?\"}\n    ],\n    \"max_tokens\": 50,\n    \"temperature\": 0.7\n  }'\n```\n\n## 📚 Documentation\n\n- [ARM64 Setup Guide](docs/arm64-setup.md) - Detailed ARM64 setup and optimization\n- [AMD64 Setup Guide](docs/amd64-setup.md) - Detailed AMD64 setup and optimization\n- [Original Docker Compose Guide](README-docker-compose.md) - Original setup guide\n\n## 🔍 Troubleshooting\n\n### Common Issues\n\n1. **Build failures**: Ensure you have the correct Docker platform support\n2. **Performance issues**: Check thread count and memory allocation\n3. **Model loading errors**: Verify sufficient disk space and memory\n\n### Debug Commands\n\n```bash\n# Check architecture\nuname -m\n\n# Check Docker platform\ndocker version\n\n# Check container logs\ndocker-compose -f $COMPOSE_FILE logs afm-fast\n\n# Check resource usage\ndocker stats\n```\n\n## 🤝 Contributing\n\nWhen contributing to this multi-architecture setup:\n\n1. **Test on both platforms**: Ensure changes work on ARM64 and AMD64\n2. **Update documentation**: Keep architecture-specific guides current\n3. **Add tests**: Include tests for both architectures\n4. 
**Performance testing**: Benchmark changes on both platforms\n\n## 📄 License\n\nThis project is licensed under the same terms as the original repository.\n\n## 🙏 Acknowledgments\n\n- Original AFM-4.5B model by Arcee AI\n- llama.cpp for the inference engine","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjuliensimon%2Fsagemaker-inference-container-cpu","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjuliensimon%2Fsagemaker-inference-container-cpu","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjuliensimon%2Fsagemaker-inference-container-cpu/lists"}