Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/andrewn6/tinyllm
Minimal, fast inference engine for LLMs
Last synced: 6 days ago
- Host: GitHub
- URL: https://github.com/andrewn6/tinyllm
- Owner: andrewn6
- Created: 2024-11-19T17:21:41.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2024-12-27T07:11:04.000Z (12 days ago)
- Last Synced: 2024-12-27T08:22:02.744Z (12 days ago)
- Language: Python
- Size: 48.9 MB
- Stars: 5
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.MD
Awesome Lists containing this project
README
# TinyLLM
Minimal, high-performance inference engine for LLMs, intended for development environments
## Overview
TinyLLM streamlines the inference pipeline with minimal overhead, focusing on memory efficiency and throughput optimization. We include a custom tokenizer for self-developed models, and it is compatible with existing LLMs through our scheduling system.

## Features
- Memory management with pruning
- Efficient batch processing and response streaming
- Optimized scheduling for multi-model deployments
- Custom tokenizer implementation for self-developed models
- Inference API
- KV cache implementation
- Training CLI for development models
- Byte-level tokenization (a rough sketch of the idea appears below)

*This is very much still an experiment, especially the tokenizer; the scheduler is somewhat well-written and memory management is decent.*
I'll continue to slowly improve these components over my weekends.
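
To illustrate what byte-level tokenization means in general, here is a minimal sketch in Python. It is not TinyLLM's actual tokenizer; the class name and vocabulary layout are assumptions made purely for the example.

```python
# Minimal byte-level tokenizer sketch (illustrative only, not TinyLLM's implementation).
# Every UTF-8 byte maps to its own token ID, so the base vocabulary is just 256 entries
# and any string can be encoded without an out-of-vocabulary case.

class ByteTokenizer:
    vocab_size = 256  # one ID per possible byte value

    def encode(self, text: str) -> list[int]:
        # Encode to UTF-8, then treat each byte as a token ID in [0, 255].
        return list(text.encode("utf-8"))

    def decode(self, ids: list[int]) -> str:
        # Reassemble the bytes and decode back to text; invalid sequences are replaced.
        return bytes(ids).decode("utf-8", errors="replace")


if __name__ == "__main__":
    tok = ByteTokenizer()
    ids = tok.encode("hello, LLM")
    print(ids)              # e.g. [104, 101, 108, 108, 111, ...]
    print(tok.decode(ids))  # "hello, LLM"
```

The trade-off is that sequences get longer than with a learned subword vocabulary, which is part of why this kind of tokenizer is easiest to justify for small, self-developed models.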
## Scope
This is solely an inference engine. It does not:
- Implement large model architectures
- Include pre-trained models
- Support distributed training

## How to use?
Clone the repository
```
git clone https://github.com/andrewn6/tinyllm
```
Install in editable mode
```
pip install -e .
```

Register your trained model
```
tinyllm model register transformer-19m v1 \
--checkpoint models/tiny-19m.pt \
--model-type native \
--description "19M parameter transformer"
```

Serve and expose to localhost
```
tinyllm serve \
--model-name mymodel \
--port 8000 \
--model-type native
```

List models
```
tinyllm model list
```
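
Once a model is served, you can presumably query it over HTTP on the chosen port. The `/generate` route and the JSON fields below are assumptions for illustration only, not TinyLLM's documented API; check the repository for the actual routes and payload shape.

```python
# Hypothetical client for a model served with `tinyllm serve --port 8000`.
# The /generate endpoint, "prompt", and "max_tokens" fields are assumptions,
# not the project's documented API -- adjust to whatever the server exposes.
import json
import urllib.request

payload = {
    "prompt": "Once upon a time",  # input text
    "max_tokens": 64,              # how many tokens to sample
}

req = urllib.request.Request(
    "http://localhost:8000/generate",  # assumed endpoint on the serve port
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read().decode("utf-8")))
```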