https://github.com/kingabzpro/deploying-llama-3.3-70b
Serve Llama 3.3 70B (with AWQ quantization) using vLLM and deploy it on BentoCloud.
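The repository packages the model as a BentoML service backed by vLLM. A minimal sketch of that pattern is below, assuming a vLLM async engine, an AWQ-quantized Llama 3.3 70B checkpoint, and BentoML 1.2+ service decorators; the model ID and GPU type shown are placeholders, not necessarily what the repo uses:

```python
import uuid

import bentoml
from vllm import AsyncEngineArgs, AsyncLLMEngine, SamplingParams

# Placeholder Hugging Face ID for an AWQ-quantized Llama 3.3 70B checkpoint.
MODEL_ID = "casperhansen/llama-3.3-70b-instruct-awq"


@bentoml.service(
    # GPU type is an assumed BentoCloud instance setting, not taken from the repo.
    resources={"gpu": 1, "gpu_type": "nvidia-a100-80gb"},
    traffic={"timeout": 300},
)
class LlamaAWQ:
    def __init__(self) -> None:
        # Load the AWQ-quantized weights through vLLM's async engine.
        self.engine = AsyncLLMEngine.from_engine_args(
            AsyncEngineArgs(model=MODEL_ID, quantization="awq", max_model_len=8192)
        )

    @bentoml.api
    async def generate(self, prompt: str, max_tokens: int = 512) -> str:
        params = SamplingParams(max_tokens=max_tokens)
        stream = self.engine.generate(prompt, params, request_id=uuid.uuid4().hex)
        text = ""
        async for output in stream:
            # Each yielded RequestOutput carries the cumulative decoded text.
            text = output.outputs[0].text
        return text
```

With a service like this defined (e.g. in `service.py`), deploying to BentoCloud typically amounts to `bentoml cloud login` followed by `bentoml deploy .` from the project root.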
- Host: GitHub
- URL: https://github.com/kingabzpro/deploying-llama-3.3-70b
- Owner: kingabzpro
- License: apache-2.0
- Created: 2025-01-27T17:29:27.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-01-27T20:01:23.000Z (3 months ago)
- Last Synced: 2025-01-27T21:22:30.445Z (3 months ago)
- Topics: bentocloud, bentoml, cloud, fastapi, llama3-3, mlops, vllm
- Language: Python
- Homepage:
- Size: 9.77 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
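Once deployed, BentoCloud exposes the service over HTTP. A client-side sketch using BentoML's HTTP client, with a placeholder endpoint URL and the `generate` method from the service sketch above:

```python
import bentoml

# Placeholder endpoint; BentoCloud assigns the real URL after `bentoml deploy`.
with bentoml.SyncHTTPClient("https://llama-3-3-70b-awq.example.bentoml.ai") as client:
    # `generate` mirrors the @bentoml.api method defined in the service.
    reply = client.generate(prompt="Explain AWQ quantization in one sentence.", max_tokens=128)
    print(reply)
```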