https://github.com/kingabzpro/deploying-llama-3.3-70b
Serve Llama 3.3 70B (with AWQ quantization) using vLLM and deploy it on BentoCloud.
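The repository packages the model as a BentoML service backed by vLLM. A minimal sketch of that pattern is below, assuming a vLLM async engine, an AWQ-quantized Llama 3.3 70B checkpoint, and BentoML 1.2+ service decorators; the model ID and GPU type shown are placeholders, not necessarily what the repo uses:

```python
import uuid

import bentoml
from vllm import AsyncEngineArgs, AsyncLLMEngine, SamplingParams

# Placeholder Hugging Face ID for an AWQ-quantized Llama 3.3 70B checkpoint.
MODEL_ID = "casperhansen/llama-3.3-70b-instruct-awq"


@bentoml.service(
    # GPU type is an assumed BentoCloud instance setting, not taken from the repo.
    resources={"gpu": 1, "gpu_type": "nvidia-a100-80gb"},
    traffic={"timeout": 300},
)
class LlamaAWQ:
    def __init__(self) -> None:
        # Load the AWQ-quantized weights through vLLM's async engine.
        self.engine = AsyncLLMEngine.from_engine_args(
            AsyncEngineArgs(model=MODEL_ID, quantization="awq", max_model_len=8192)
        )

    @bentoml.api
    async def generate(self, prompt: str, max_tokens: int = 512) -> str:
        params = SamplingParams(max_tokens=max_tokens)
        stream = self.engine.generate(prompt, params, request_id=uuid.uuid4().hex)
        text = ""
        async for output in stream:
            # Each yielded RequestOutput carries the cumulative decoded text.
            text = output.outputs[0].text
        return text
```

With a service like this defined (e.g. in `service.py`), deploying to BentoCloud typically amounts to `bentoml cloud login` followed by `bentoml deploy .` from the project root.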
- Host: GitHub
- URL: https://github.com/kingabzpro/deploying-llama-3.3-70b
- Owner: kingabzpro
- License: apache-2.0
- Created: 2025-01-27T17:29:27.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-01-27T20:01:23.000Z (3 months ago)
- Last Synced: 2025-01-27T21:22:30.445Z (3 months ago)
- Topics: bentocloud, bentoml, cloud, fastapi, llama3-3, mlops, vllm
- Language: Python
- Homepage:
- Size: 9.77 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
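Once deployed, BentoCloud exposes the service over HTTP. A client-side sketch using BentoML's HTTP client, with a placeholder endpoint URL and the `generate` method from the service sketch above:

```python
import bentoml

# Placeholder endpoint; BentoCloud assigns the real URL after `bentoml deploy`.
with bentoml.SyncHTTPClient("https://llama-3-3-70b-awq.example.bentoml.ai") as client:
    # `generate` mirrors the @bentoml.api method defined in the service.
    reply = client.generate(prompt="Explain AWQ quantization in one sentence.", max_tokens=128)
    print(reply)
```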