# Ollama Bench

[![Version](https://img.shields.io/badge/version-2.0.0-blue.svg)](https://github.com/peguesj/ollama-bench)
[![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
[![Python](https://img.shields.io/badge/python-3.8%2B-blue.svg)](https://www.python.org)

A comprehensive benchmarking and optimization tool for Ollama models with both Terminal User Interface (TUI) and Command-Line Interface (CLI) modes.

## Author

**Jeremiah Pegues**

## Features

### 🚀 Core Features
- **Dual Interface**: Full-featured TUI with real-time graphs or simple CLI mode
- **Model Benchmarking**: Compare performance across multiple Ollama models
- **System Optimization**: Automatically tune models for your hardware
- **Resource Monitoring**: Real-time CPU, GPU, and RAM usage tracking
- **Batch Processing**: Optimize multiple models in parallel
- **Export Results**: Save benchmark data in CSV format

### 🎯 Version 2.0 Features
- **Modelfile Optimization**: Generate optimized configurations based on system specs
- **Batch Model Optimization**: Optimize all models with a single command
- **Hardware Detection**: Automatic detection of CPU, RAM, and GPU capabilities
- **Platform-Specific Tuning**: Special optimizations for Apple Silicon
- **Performance Profiling**: Detailed metrics including tokens/sec and memory usage

## Installation

### From Source

```bash
git clone https://github.com/peguesj/ollama-bench.git
cd ollama-bench
pip install -e .
```

### Dependencies

```bash
pip install psutil pynvml py-cpuinfo
```

## Quick Start

### TUI Mode (Interactive)

```bash
ollama-bench
# or
python -m ollama_bench
```

### CLI Mode (Non-Interactive)

```bash
ollama-bench --cli
```

### Direct Scripts

```bash
# Optimize a single model
python optimize_model.py llama2

# Optimize all models
python optimize_all.py --parallel

# Clean up optimized models
python optimize_all.py --cleanup
```

## Usage Guide

### TUI Interface

The TUI shows the model list, the available actions, the latest benchmark results, and a live resource graph on a single screen:

```
┌──────────────────────────────────────────────────────────┐
│                   Ollama Bench v2.0.0                    │
├─────────────────┬────────────────────────────────────────┤
│ === Models ===  │ Benchmark Results                      │
│ qwen2.5-coder   │ Model: qwen2.5-coder                   │
│ llama2:7b       │ Tokens/sec: 42.3                       │
│ codellama:34b   │ Time: 1.2s                             │
│                 │ Peak RAM: 7.2 GB                       │
│ === Actions === │                                        │
│> Run Benchmark  │ ┌─Performance Graph──────┐             │
│  Configuration  │ │ ████████████████       │             │
│  Optimize Model │ │ CPU: 45%  GPU: 80%     │             │
│  Export Results │ └────────────────────────┘             │
├─────────────────┴────────────────────────────────────────┤
│ [Up/Down] Navigate [Enter] Select [O] Optimize [Q] Quit  │
├──────────────────────────────────────────────────────────┤
│ Ready                                 CPU: 12%  RAM: 8GB │
└──────────────────────────────────────────────────────────┘
```

#### Keyboard Shortcuts
- **Arrow Keys**: Navigate menu
- **Enter**: Select menu item
- **Space**: Start/stop benchmark
- **O**: Optimize selected model (or all if none selected)
- **E**: Edit configuration
- **M**: Edit Modelfile
- **X**: Export results
- **Q**: Quit

### CLI Interface

The CLI provides a simple menu-driven interface:

```bash
$ ollama-bench --cli

============================================================
Ollama Bench CLI - Benchmarking Tool
============================================================

=== Main Menu ===
1. Run Benchmark
2. List Models
3. Show Configuration
4. Export Results
5. Optimize Single Model
6. Optimize All Models
7. Show System Info
8. Clean Optimized Models
Q. Quit

Enter choice:
```

### Model Optimization

#### System Analysis

```bash
$ ollama-bench --cli
# Select option 7

System Specifications
============================================================
Platform: Darwin (Apple Silicon)
CPU: 12 cores @ 3.2 GHz
RAM: 48.0 GB total, 32.0 GB available
GPU: Apple Silicon GPU (36.0 GB)

Optimal Parameters
============================================================
Context Size: 4096 tokens
Batch Size: 512
Threads: 11
GPU Layers: 999
```
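The parameters reported above could be derived from the detected hardware along the following lines. This is a minimal sketch, not the tool's actual logic: `suggest_parameters` and its RAM thresholds are hypothetical, chosen only so that the example machine shown above (12 cores, 32 GB available, GPU present) yields the same values.

```python
def suggest_parameters(cpu_cores, available_ram_gb, has_gpu):
    """Return illustrative Ollama runtime parameters for a machine.

    Every threshold here is an assumption made for this sketch.
    """
    # Leave one core free for the OS and the Ollama server.
    num_thread = max(1, cpu_cores - 1)

    # Scale context and batch size with available memory (assumed tiers).
    if available_ram_gb >= 16:
        num_ctx, num_batch = 4096, 512
    elif available_ram_gb >= 8:
        num_ctx, num_batch = 2048, 256
    else:
        num_ctx, num_batch = 1024, 128

    # 999 requests that every layer be offloaded; 0 keeps the model on the CPU.
    num_gpu = 999 if has_gpu else 0

    return {
        "num_ctx": num_ctx,
        "num_batch": num_batch,
        "num_thread": num_thread,
        "num_gpu": num_gpu,
    }


params = suggest_parameters(cpu_cores=12, available_ram_gb=32.0, has_gpu=True)
print(params)
```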

#### Batch Optimization

```bash
# Optimize all models with parallel processing
$ python optimize_all.py --parallel --workers 4

# Optimize specific models
$ python optimize_all.py llama2:7b codellama:13b

# Generate benchmark comparison script
$ python optimize_all.py --benchmark

# Clean up when done
$ python optimize_all.py --cleanup
```
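How `optimize_all.py` distributes work across its workers is not documented here. One plausible shape, assuming a thread pool and a per-model step, is sketched below. `optimize_one` is a hypothetical placeholder, and the `-optimized` suffix is invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def optimize_one(model):
    """Hypothetical stand-in for the real per-model optimization step."""
    return f"{model}-optimized"


def optimize_all(models, workers=4):
    """Optimize models concurrently and map each input name to its result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        return dict(zip(models, pool.map(optimize_one, models)))


results = optimize_all(["llama2:7b", "codellama:13b"], workers=4)
print(results)
```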

## Optimization Parameters

The optimizer automatically configures:

| Parameter | Description | Impact |
|-----------|-------------|--------|
| `num_ctx` | Context window size | Larger = more context retained, higher memory use |
| `num_batch` | Batch processing size | Larger = higher throughput |
| `num_gpu` | GPU layers to offload | 999 = offload all layers to the GPU |
| `num_thread` | CPU threads | Optimized for core count |
| `use_mlock` | Memory locking | Prevents swapping |
| `use_mmap` | Memory mapping | Efficient for large models |

## Model Recommendations by RAM

| Available RAM | Model Size | Example Models |
|--------------|------------|----------------|
| < 8 GB | 1B-3B | qwen2.5:3b, tinyllama |
| 8-16 GB | 7B | llama2:7b, mistral:7b |
| 16-32 GB | 13B | llama2:13b, codellama:13b |
| 32-64 GB | 34B | codellama:34b |
| > 64 GB | 70B+ | llama2:70b, mixtral:8x7b |

## Benchmark Results

Results are saved in CSV format with detailed metrics:

```csv
model,iteration,elapsed_s,tokens_per_sec,peak_rss_bytes,cpu_percent,gpu_percent
qwen2.5-coder,1,1.234,42.3,7516192768,45.2,78.9
llama2:7b,1,2.456,38.1,8589934592,52.1,82.3
```
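The exported file can be post-processed with the standard library. The sketch below uses the two sample rows above and reports mean tokens/sec and peak resident memory per model. `summarize` is a hypothetical helper, not part of the package.

```python
import csv
import io
from collections import defaultdict

SAMPLE = """\
model,iteration,elapsed_s,tokens_per_sec,peak_rss_bytes,cpu_percent,gpu_percent
qwen2.5-coder,1,1.234,42.3,7516192768,45.2,78.9
llama2:7b,1,2.456,38.1,8589934592,52.1,82.3
"""


def summarize(csv_text):
    """Return {model: (mean tokens/sec, peak RSS in GiB)} from exported results."""
    speeds = defaultdict(list)
    peak_rss = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        model = row["model"]
        speeds[model].append(float(row["tokens_per_sec"]))
        peak_rss[model] = max(peak_rss[model], int(row["peak_rss_bytes"]))
    return {
        model: (sum(values) / len(values), peak_rss[model] / 2**30)
        for model, values in speeds.items()
    }


summary = summarize(SAMPLE)
for model, (mean_tps, rss_gib) in summary.items():
    print(f"{model}: {mean_tps:.1f} tokens/sec, peak {rss_gib:.1f} GiB")
```

To summarize a real export, pass the file's contents instead of `SAMPLE`.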

## Configuration

Configuration is stored in `~/.config/ollama_bench/config.yaml`:

```yaml
benchmark:
  iterations: 3
  timeout: 120
  num_predict: 100
  temperature: 0.7
  seed: 42
  workdir: ~/.ollama_bench

resources:
  max_cpu_percent: 80
  max_gpu_percent: 90
  max_ram_gb: null
  throttle_enabled: false

ui:
  theme: default
  refresh_rate: 0.5
  show_graph: true

## Performance Improvements

Typical optimization results:
- **Speed**: 20-70% faster token generation
- **Memory**: 10-20% lower RAM usage
- **Stability**: Reduced out-of-memory errors
- **Efficiency**: Better CPU/GPU utilization
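
To check where your own models fall relative to these ranges, compare a baseline run with an optimized run. The sketch below computes the relative change. The numbers are hypothetical inputs for illustration, not measurements from the tool.

```python
def percent_change(baseline, optimized):
    """Relative change of `optimized` versus `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (optimized - baseline) / baseline * 100.0


# Hypothetical measurements, for illustration only.
speedup = percent_change(baseline=30.0, optimized=42.0)  # tokens/sec
ram_delta = percent_change(baseline=8.0, optimized=7.0)  # GB
print(f"speed: {speedup:+.1f}%  memory: {ram_delta:+.1f}%")
```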

## Development

### Project Structure

```
ollama-bench/
├── ollama_bench/                # Main package
│   ├── core/                    # Core functionality
│   │   ├── benchmark.py         # Benchmarking engine
│   │   ├── models.py            # Model management
│   │   ├── monitor.py           # Resource monitoring
│   │   ├── config.py            # Configuration
│   │   ├── system_optimizer.py  # Hardware optimization
│   │   └── batch_optimizer.py   # Batch processing
│   ├── tui/                     # Terminal UI
│   │   ├── app.py               # Main TUI application
│   │   ├── components/          # UI components
│   │   └── widgets/             # Interactive widgets
│   ├── cli.py                   # CLI interface
│   └── utils/                   # Utilities
├── optimize_model.py            # Single model optimizer
├── optimize_all.py              # Batch optimizer
└── setup.py                     # Package setup
```

### Testing

```bash
# Run tests
python test_optimization.py

# Test TUI import
python -c "from ollama_bench.tui import OllamaBenchTUI"

# Test CLI
python -m ollama_bench.cli
```

## Troubleshooting

### Terminal Issues
- If you see Unicode errors, the tool automatically falls back to ASCII
- For best results, use a terminal that supports UTF-8

### GPU Detection
- NVIDIA: Requires the NVML Python bindings (the `pynvml` module, available from the `pynvml` or `nvidia-ml-py` package)
- Apple Silicon: Automatic Metal acceleration
- No GPU: Falls back to CPU-only optimization
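
The fallback order above can be sketched as follows. `detect_gpu_backend` is a hypothetical helper, not the package's detection code. It relies only on `pynvml` calls that exist in the NVML bindings (`nvmlInit`, `nvmlDeviceGetCount`, `nvmlShutdown`) and on the standard `platform` module.

```python
import platform


def detect_gpu_backend():
    """Return 'nvidia', 'apple-silicon', or 'cpu' for the current machine."""
    try:
        import pynvml  # NVML bindings; may not be installed

        pynvml.nvmlInit()
        try:
            if pynvml.nvmlDeviceGetCount() > 0:
                return "nvidia"
        finally:
            pynvml.nvmlShutdown()
    except Exception:
        # Bindings missing or no NVIDIA driver present: try the next option.
        pass

    # Apple Silicon Macs report Darwin/arm64.
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "apple-silicon"

    return "cpu"


print(detect_gpu_backend())
```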

### Memory Issues
- Reduce `num_ctx` for lower memory usage
- Enable `low_vram` mode for limited GPU memory
- Use quantized models (q4_0, q4_K_M)

## Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Run tests and benchmarks
4. Submit a pull request

## License

MIT License - see [LICENSE](LICENSE) file

## Acknowledgments

- Ollama team for the excellent local LLM platform
- Python curses library for terminal UI capabilities
- psutil for cross-platform system monitoring

## Changelog

### Version 2.0.0 (2024)
- Added Modelfile optimization based on system specs
- Implemented batch model optimization
- Added hardware detection and profiling
- Improved TUI with optimization features
- Added parallel processing support
- Fixed terminal compatibility issues

### Version 1.0.0 (2024)
- Initial release with TUI and CLI interfaces
- Basic benchmarking functionality
- Resource monitoring
- Model management

## Contact

- **Author**: Jeremiah Pegues
- **Email**: jeremiah@pegues.io
- **GitHub**: [github.com/peguesj](https://github.com/peguesj)

---

*Built with ❤️ for the Ollama community*