https://github.com/llm-db/fineinfer
Deferred Continuous Batching in Resource-Efficient Large Language Model Serving (EuroMLSys 2024)
- Host: GitHub
- URL: https://github.com/llm-db/fineinfer
- Owner: llm-db
- License: mit
- Created: 2024-02-27T11:31:37.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-05-28T20:47:42.000Z (about 1 year ago)
- Last Synced: 2024-06-07T23:22:49.226Z (about 1 year ago)
- Topics: fine-tuning, inference, llm, lora, peft, pytorch
- Language: Python
- Homepage: https://dl.acm.org/doi/10.1145/3642970.3655835
- Size: 53.7 KB
- Stars: 9
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
FineInfer
[Paper](https://dl.acm.org/doi/10.1145/3642970.3655835)

FineInfer is a research prototype for fine-tuning and serving large language models.
FineInfer supports concurrent parameter-efficient fine-tuning and inference through the following features:
* Deferred continuous batching
* Hybrid system architecture
* Heterogeneous batching

## Get Started
[Installation and examples](https://github.com/llm-db/FineInfer/tree/main/benchmarks/fineinfer)

The current version removes some earlier features and functionality. If you need them, please download a [previous version](https://github.com/llm-db/FineInfer/releases).
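The repository documents deferred continuous batching only by name, so as a rough illustration (not FineInfer's actual implementation), the deferral idea can be sketched as a scheduler that interleaves fine-tuning iterations with inference: it runs a fine-tuning step whenever the most urgent pending inference request still has enough SLO slack, and otherwise serves a continuous batch of requests. All names here (`Request`, `DeferredScheduler`, `ft_iter_cost`) are hypothetical.

```python
# Illustrative sketch only; not taken from the FineInfer codebase.
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    arrival: float   # arrival time (seconds)
    deadline: float  # absolute time by which the request must be served (SLO)


class DeferredScheduler:
    """Decide, each iteration, whether to run a deferred fine-tuning step
    or to serve a batch of pending inference requests."""

    def __init__(self, ft_iter_cost: float, max_batch: int = 8):
        self.queue = deque()              # pending inference requests
        self.ft_iter_cost = ft_iter_cost  # time one fine-tuning iteration takes
        self.max_batch = max_batch        # continuous-batching batch size cap

    def submit(self, req: Request) -> None:
        self.queue.append(req)

    def next_action(self, now: float):
        """Return ("finetune", []) or ("infer", batch_of_requests)."""
        if self.queue:
            # Slack of the most urgent pending request.
            slack = min(r.deadline for r in self.queue) - now
            if slack < self.ft_iter_cost:
                # Not enough slack to defer: serve a batch now.
                n = min(self.max_batch, len(self.queue))
                batch = [self.queue.popleft() for _ in range(n)]
                return ("infer", batch)
        # Queue empty, or every request can tolerate one more fine-tuning step.
        return ("finetune", [])
```

The key design point the paper's title suggests is captured by the slack check: fine-tuning work proceeds between inference batches only while no request's latency SLO is at risk.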
## Citation
```
@inproceedings{FineInfer,
  author    = {He, Yongjun and Lu, Yao and Alonso, Gustavo},
  title     = {Deferred Continuous Batching in Resource-Efficient Large Language Model Serving},
  year      = {2024},
  booktitle = {Proceedings of the 4th Workshop on Machine Learning and Systems},
  pages     = {98--106},
  series    = {EuroMLSys '24}
}
```