https://github.com/FMInference/FlexLLMGen
Running large language models on a single GPU for throughput-oriented scenarios.
- Host: GitHub
- URL: https://github.com/FMInference/FlexLLMGen
- Owner: FMInference
- License: apache-2.0
- Archived: true
- Created: 2023-02-15T21:18:53.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-10-28T03:05:41.000Z (6 months ago)
- Last Synced: 2025-02-24T09:36:55.388Z (about 2 months ago)
- Topics: deep-learning, gpt-3, high-throughput, large-language-models, machine-learning, offloading, opt
- Language: Python
- Homepage: (none)
- Size: 37.1 MB
- Stars: 9,266
- Watchers: 111
- Forks: 558
- Open Issues: 58
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- my-awesome-starred - FMInference/FlexLLMGen - Running large language models on a single GPU for throughput-oriented scenarios. (Python)
- awesome-repositories - FMInference/FlexLLMGen - Running large language models on a single GPU for throughput-oriented scenarios. (Python)