https://github.com/pchsu-hsupc/edge_ai_13th
This project optimizes the Llama-3.2-3B-Instruct model for fast inference on a single NVIDIA T4 GPU (16 GB), targeting high throughput and low perplexity for efficient edge deployment.
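A rough back-of-the-envelope sketch of why GGUF quantization matters for this setup: at a half-precision width, a ~3B-parameter model's weights alone approach 6 GiB, while a 4-bit GGUF quant fits comfortably in a 16 GB T4 alongside the KV cache. The parameter count and effective bit widths below are approximations for illustration, not figures from the repository.

```python
# Approximate weight-memory footprint of a ~3B-parameter model at
# different quantization widths. Real GGUF files add metadata and
# per-block scale factors, so actual file sizes differ somewhat.

PARAMS = 3e9  # assumed parameter count for Llama-3.2-3B-Instruct

def weight_gib(bits_per_weight: float) -> float:
    """Weight storage in GiB at a given (effective) bits-per-weight."""
    return PARAMS * bits_per_weight / 8 / 2**30

fp16 = weight_gib(16)   # unquantized half precision
q4 = weight_gib(4.5)    # typical effective width of a 4-bit GGUF quant

print(f"fp16 weights: ~{fp16:.1f} GiB")
print(f"4-bit GGUF weights: ~{q4:.1f} GiB")
```

The roughly 3.5x reduction in weight memory is what leaves headroom on a 16 GB card for the context cache and inference overhead.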
- Host: GitHub
- URL: https://github.com/pchsu-hsupc/edge_ai_13th
- Owner: pchsu-hsupc
- Created: 2025-05-06T07:49:03.000Z (6 months ago)
- Default Branch: master
- Last Pushed: 2025-06-04T15:25:47.000Z (5 months ago)
- Last Synced: 2025-06-04T17:50:50.187Z (5 months ago)
- Topics: gguf, llama-cpp-python, llama3, lora
- Language: Python
- Homepage:
- Size: 19.5 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0