https://github.com/peguesj/ollama-bench

Comprehensive benchmarking and optimization tool for Ollama models with TUI and CLI interfaces
Host: GitHub
URL: https://github.com/peguesj/ollama-bench
Owner: peguesj
License: mit
Created: 2025-09-18T02:03:06.000Z (9 months ago)
Default Branch: main
Last Pushed: 2025-09-18T02:03:47.000Z (9 months ago)
Last Synced: 2025-10-07T04:57:40.939Z (8 months ago)
Topics: benchmark, cli, modelfile, ollama, optimization, performance, python, tui
Language: Python
Homepage: https://github.com/peguesj/ollama-bench
Size: 273 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Ollama Bench

[![Version](https://img.shields.io/badge/version-2.0.0-blue.svg)](https://github.com/peguesj/ollama-bench)

[![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)

[![Python](https://img.shields.io/badge/python-3.8%2B-blue.svg)](https://www.python.org)

A comprehensive benchmarking and optimization tool for Ollama models with both Terminal User Interface (TUI) and Command-Line Interface (CLI) modes.

## Author

**Jeremiah Pegues** 

## Features

### 🚀 Core Features

- **Dual Interface**: Full-featured TUI with real-time graphs or simple CLI mode

- **Model Benchmarking**: Compare performance across multiple Ollama models

- **System Optimization**: Automatically tune models for your hardware

- **Resource Monitoring**: Real-time CPU, GPU, and RAM usage tracking

- **Batch Processing**: Optimize multiple models in parallel

- **Export Results**: Save benchmark data in CSV format

### 🎯 Version 2.0 Features

- **Modelfile Optimization**: Generate optimized configurations based on system specs

- **Batch Model Optimization**: Optimize all models with a single command

- **Hardware Detection**: Automatic detection of CPU, RAM, and GPU capabilities

- **Platform-Specific Tuning**: Special optimizations for Apple Silicon

- **Performance Profiling**: Detailed metrics including tokens/sec and memory usage

## Installation

### From Source

```bash
git clone https://github.com/peguesj/ollama-bench.git
cd ollama-bench
pip install -e .
```

### Dependencies

```bash
pip install psutil pynvml py-cpuinfo
```

## Quick Start

### TUI Mode (Interactive)

```bash
ollama-bench
# or
python -m ollama_bench
```

### CLI Mode (Non-Interactive)

```bash
ollama-bench --cli
```

### Direct Scripts

```bash
# Optimize a single model
python optimize_model.py llama2

# Optimize all models
python optimize_all.py --parallel

# Clean up optimized models
python optimize_all.py --cleanup
```

## Usage Guide

### TUI Interface

The TUI shows the model list and action menu on the left, benchmark results and a live performance graph on the right, and a status bar at the bottom:

```
┌─────────────────────────────────────────────────────────┐
│                    Ollama Bench v2.0.0                  │
├─────────────────┬───────────────────────────────────────┤
│  === Models === │      Benchmark Results                │
│  qwen2.5-coder  │  Model: qwen2.5-coder                │
│  llama2:7b      │  Tokens/sec: 42.3                    │
│  codellama:34b  │  Time: 1.2s                          │
│                 │  Peak RAM: 7.2 GB                    │
│  === Actions ===│                                       │
│> Run Benchmark  │  ┌─Performance Graph──────┐           │
│  Configuration  │  │ ████████████████      │           │
│  Optimize Model │  │ CPU: 45% GPU: 80%     │           │
│  Export Results │  └───────────────────────┘           │
├─────────────────┴───────────────────────────────────────┤
│ [Up/Down] Navigate  [Enter] Select  [O] Optimize  [Q] Quit │
├─────────────────────────────────────────────────────────┤
│ Ready                                    CPU: 12% RAM: 8GB │
└─────────────────────────────────────────────────────────┘
```

#### Keyboard Shortcuts

- **Arrow Keys**: Navigate menu

- **Enter**: Select menu item

- **Space**: Start/stop benchmark

- **O**: Optimize selected model (or all if none selected)

- **E**: Edit configuration

- **M**: Edit Modelfile

- **X**: Export results

- **Q**: Quit

### CLI Interface

The CLI provides a simple menu-driven interface:

```bash
$ ollama-bench --cli
============================================================
Ollama Bench CLI - Benchmarking Tool
============================================================

=== Main Menu ===
1. Run Benchmark
2. List Models
3. Show Configuration
4. Export Results
5. Optimize Single Model
6. Optimize All Models
7. Show System Info
8. Clean Optimized Models
Q. Quit

Enter choice: 
```

### Model Optimization

#### System Analysis

```bash
$ ollama-bench --cli
# Select option 7

System Specifications
============================================================
Platform: Darwin (Apple Silicon)
CPU: 12 cores @ 3.2 GHz
RAM: 48.0 GB total, 32.0 GB available
GPU: Apple Silicon GPU (36.0 GB)

Optimal Parameters
============================================================
Context Size: 4096 tokens
Batch Size: 512
Threads: 11
GPU Layers: 999
```
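
The mapping from detected hardware to parameters can be sketched roughly as follows. This is a minimal illustration, not the logic in `system_optimizer.py`: the function name `suggest_params`, the `SuggestedParams` type, and the RAM thresholds are all assumptions, chosen only so that the sample machine above (12 cores, 32 GB available, GPU present) yields the values shown.

```python
from dataclasses import dataclass


@dataclass
class SuggestedParams:
    num_ctx: int
    num_batch: int
    num_thread: int
    num_gpu: int


def suggest_params(cpu_cores: int, available_ram_gb: float, has_gpu: bool) -> SuggestedParams:
    """Illustrative heuristic only; the thresholds are assumptions."""
    # Leave one core free for the OS and the UI (12 cores -> 11 threads).
    num_thread = max(1, cpu_cores - 1)

    # Hypothetical RAM thresholds for context window and batch size.
    if available_ram_gb >= 16:
        num_ctx, num_batch = 4096, 512
    elif available_ram_gb >= 8:
        num_ctx, num_batch = 2048, 256
    else:
        num_ctx, num_batch = 1024, 128

    # 999 requests that every layer be offloaded; 0 keeps the model on the CPU.
    num_gpu = 999 if has_gpu else 0
    return SuggestedParams(num_ctx, num_batch, num_thread, num_gpu)


print(suggest_params(cpu_cores=12, available_ram_gb=32.0, has_gpu=True))
```

Keeping the heuristic a pure function of the detected specs makes it easy to unit-test without real hardware.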

#### Batch Optimization

```bash
# Optimize all models with parallel processing
$ python optimize_all.py --parallel --workers 4

# Optimize specific models
$ python optimize_all.py llama2:7b codellama:13b

# Generate benchmark comparison script
$ python optimize_all.py --benchmark

# Clean up when done
$ python optimize_all.py --cleanup
```

## Optimization Parameters

The optimizer automatically configures:

| Parameter | Description | Impact |
|-----------|-------------|--------|
| `num_ctx` | Context window size | Larger = better comprehension |
| `num_batch` | Batch processing size | Larger = higher throughput |
| `num_gpu` | GPU layers to offload | 999 = full GPU acceleration |
| `num_thread` | CPU threads | Optimized for core count |
| `use_mlock` | Memory locking | Prevents swapping |
| `use_mmap` | Memory mapping | Efficient for large models |
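
As a rough sketch of what a generated Modelfile might contain, the snippet below renders the parameters from the table as `FROM` and `PARAMETER` lines. The helper `render_modelfile` is hypothetical and is not taken from this project; whether a given Ollama version accepts every one of these parameters in a Modelfile is also an assumption worth checking against the Ollama documentation.

```python
def render_modelfile(base_model: str, params: dict) -> str:
    """Render a Modelfile with one PARAMETER line per setting (hypothetical helper)."""
    lines = [f"FROM {base_model}"]
    for name, value in params.items():
        # Write booleans in lowercase rather than Python's True/False.
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"PARAMETER {name} {value}")
    return "\n".join(lines) + "\n"


modelfile = render_modelfile(
    "llama2:7b",
    {
        "num_ctx": 4096,
        "num_batch": 512,
        "num_gpu": 999,
        "num_thread": 11,
        "use_mlock": True,
        "use_mmap": True,
    },
)
print(modelfile)
```

A file written this way could then be registered with `ollama create <name> -f Modelfile`, where the model name is whatever you choose for the optimized variant.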

## Model Recommendations by RAM

| Available RAM | Model Size | Example Models |
|--------------|------------|----------------|
| < 8 GB | 3B-7B | qwen2.5:3b, tinyllama |
| 8-16 GB | 7B | llama2:7b, mistral:7b |
| 16-32 GB | 13B | llama2:13b, codellama:13b |
| 32-64 GB | 34B | codellama:34b |
| > 64 GB | 70B+ | llama2:70b, mixtral:8x7b |
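
The table can be expressed as a simple lookup. The function below is an illustrative sketch, not part of ollama-bench. The table leaves the boundaries ambiguous (16 GB appears in two rows), so this version assumes each lower bound is inclusive and that exactly 64 GB still falls in the 34B tier.

```python
def recommend_model_size(available_ram_gb: float) -> str:
    """Map available RAM to the model size tier from the table (boundary handling is an assumption)."""
    if available_ram_gb < 8:
        return "3B-7B"
    if available_ram_gb < 16:
        return "7B"
    if available_ram_gb < 32:
        return "13B"
    if available_ram_gb <= 64:
        return "34B"
    return "70B+"


for ram in (4, 12, 24, 48, 128):
    print(f"{ram} GB -> {recommend_model_size(ram)}")
```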

## Benchmark Results

Results are saved in CSV format with detailed metrics:

```csv
model,iteration,elapsed_s,tokens_per_sec,peak_rss_bytes,cpu_percent,gpu_percent
qwen2.5-coder,1,1.234,42.3,7516192768,45.2,78.9
llama2:7b,1,2.456,38.1,8589934592,52.1,82.3
```
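
An exported file in this format can be summarized with the standard library alone. The sketch below assumes the column names shown above and uses the two sample rows as inline data; the `summarize` function is illustrative and not part of the tool.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

SAMPLE = """model,iteration,elapsed_s,tokens_per_sec,peak_rss_bytes,cpu_percent,gpu_percent
qwen2.5-coder,1,1.234,42.3,7516192768,45.2,78.9
llama2:7b,1,2.456,38.1,8589934592,52.1,82.3
"""


def summarize(csv_text: str) -> dict:
    """Return mean tokens/sec and peak RSS (in GiB) per model."""
    speeds = defaultdict(list)
    peaks = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        model = row["model"]
        speeds[model].append(float(row["tokens_per_sec"]))
        peaks[model] = max(peaks[model], int(row["peak_rss_bytes"]))
    return {
        model: {
            "mean_tokens_per_sec": mean(values),
            "peak_rss_gib": peaks[model] / 2**30,
        }
        for model, values in speeds.items()
    }


summary = summarize(SAMPLE)
for model, stats in summary.items():
    print(f"{model}: {stats['mean_tokens_per_sec']:.1f} tok/s, peak {stats['peak_rss_gib']:.1f} GiB")
```

To read a real export, replace `io.StringIO(csv_text)` with an open file handle.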

## Configuration

Configuration is stored in `~/.config/ollama_bench/config.yaml`:

```yaml
benchmark:
  iterations: 3
  timeout: 120
  num_predict: 100
  temperature: 0.7
  seed: 42
  workdir: ~/.ollama_bench
resources:
  max_cpu_percent: 80
  max_gpu_percent: 90
  max_ram_gb: null
  throttle_enabled: false
ui:
  theme: default
  refresh_rate: 0.5
  show_graph: true
```
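
If you override only a few keys, one common approach is to merge the file over a set of defaults. Whether ollama-bench handles partial files this way is not documented here, so treat the following as a sketch: `DEFAULTS` mirrors the values above, and `merge_config` is a hypothetical helper. Parsing the YAML itself (for example with PyYAML's `yaml.safe_load`) is left out to keep the example standard-library only.

```python
import copy

# Defaults mirroring the config.yaml shown above.
DEFAULTS = {
    "benchmark": {
        "iterations": 3,
        "timeout": 120,
        "num_predict": 100,
        "temperature": 0.7,
        "seed": 42,
        "workdir": "~/.ollama_bench",
    },
    "resources": {
        "max_cpu_percent": 80,
        "max_gpu_percent": 90,
        "max_ram_gb": None,
        "throttle_enabled": False,
    },
    "ui": {"theme": "default", "refresh_rate": 0.5, "show_graph": True},
}


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Recursively overlay overrides on a copy of defaults (hypothetical helper)."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# A user file that changes only the iteration count.
config = merge_config(DEFAULTS, {"benchmark": {"iterations": 5}})
print(config["benchmark"])
```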

## Performance Improvements

Typical optimization results:

- **Speed**: 20-70% faster token generation

- **Memory**: 10-20% lower RAM usage  

- **Stability**: Reduced out-of-memory errors

- **Efficiency**: Better CPU/GPU utilization
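
To check the speed figure on your own hardware, compare tokens/sec before and after optimization. The sketch below assumes the measurements come from Ollama's `/api/generate` response, which reports `eval_count` (generated tokens) and `eval_duration` (nanoseconds). The two response dictionaries are made-up values for illustration, and how ollama-bench computes its own numbers is not stated here.

```python
def tokens_per_sec(response: dict) -> float:
    """Generation speed from an Ollama generate response (eval_duration is in nanoseconds)."""
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def speedup_percent(baseline_tps: float, optimized_tps: float) -> float:
    """Relative improvement of the optimized run over the baseline, in percent."""
    return (optimized_tps - baseline_tps) / baseline_tps * 100.0


# Invented sample measurements: same token count, shorter generation time.
baseline = {"eval_count": 100, "eval_duration": 4_000_000_000}
optimized = {"eval_count": 100, "eval_duration": 3_125_000_000}

base_tps = tokens_per_sec(baseline)
opt_tps = tokens_per_sec(optimized)
speedup = speedup_percent(base_tps, opt_tps)
print(f"{base_tps:.1f} -> {opt_tps:.1f} tok/s ({speedup:.1f}% faster)")
```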

## Development

### Project Structure

```
ollama-bench/
├── ollama_bench/           # Main package
│   ├── core/              # Core functionality
│   │   ├── benchmark.py   # Benchmarking engine
│   │   ├── models.py      # Model management
│   │   ├── monitor.py     # Resource monitoring
│   │   ├── config.py      # Configuration
│   │   ├── system_optimizer.py  # Hardware optimization
│   │   └── batch_optimizer.py   # Batch processing
│   ├── tui/               # Terminal UI
│   │   ├── app.py        # Main TUI application
│   │   ├── components/   # UI components
│   │   └── widgets/      # Interactive widgets
│   ├── cli.py            # CLI interface
│   └── utils/            # Utilities
├── optimize_model.py      # Single model optimizer
├── optimize_all.py        # Batch optimizer
└── setup.py              # Package setup
```

### Testing

```bash
# Run tests
python test_optimization.py

# Test TUI import
python -c "from ollama_bench.tui import OllamaBenchTUI"

# Test CLI
python -m ollama_bench.cli
```

## Troubleshooting

### Terminal Issues

- If the terminal cannot render Unicode, the tool automatically falls back to ASCII

- For best results, use a terminal that supports UTF-8

### GPU Detection

- NVIDIA: Requires the NVML Python bindings (the `pynvml` module, provided by the `nvidia-ml-py` package)

- Apple Silicon: Automatic Metal acceleration

- No GPU: Falls back to CPU-only optimization
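
The detection order above can be sketched as follows. This is a minimal illustration rather than the project's actual detection code: `detect_gpu` and its return shape are assumptions, and it only inspects the first NVIDIA device.

```python
import platform


def detect_gpu() -> dict:
    """Return the GPU kind, trying NVIDIA (via NVML) first, then Apple Silicon, then CPU-only."""
    try:
        import pynvml  # optional dependency; absent on machines without NVML bindings

        pynvml.nvmlInit()
        try:
            if pynvml.nvmlDeviceGetCount() > 0:
                handle = pynvml.nvmlDeviceGetHandleByIndex(0)
                total_bytes = pynvml.nvmlDeviceGetMemoryInfo(handle).total
                return {"kind": "nvidia", "memory_gb": total_bytes / 2**30}
        finally:
            pynvml.nvmlShutdown()
    except Exception:
        # Bindings not installed, or no NVIDIA driver: fall through to other checks.
        pass

    # Apple Silicon Macs report Darwin/arm64.
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return {"kind": "apple_silicon", "memory_gb": None}

    return {"kind": "none", "memory_gb": None}


print(detect_gpu())
```

If this reports `none` on a machine with an NVIDIA card, the NVML bindings or the driver are the first things to check.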

### Memory Issues

- Reduce `num_ctx` for lower memory usage

- Enable `low_vram` mode for limited GPU memory

- Use quantized models (q4_0, q4_K_M)

## Contributing

Contributions are welcome! Please:

1. Fork the repository

2. Create a feature branch

3. Run tests and benchmarks

4. Submit a pull request

## License

MIT License - see [LICENSE](LICENSE) file

## Acknowledgments

- Ollama team for the excellent local LLM platform

- Python curses library for terminal UI capabilities

- psutil for cross-platform system monitoring

## Changelog

### Version 2.0.0 (2024)

- Added Modelfile optimization based on system specs

- Implemented batch model optimization

- Added hardware detection and profiling

- Improved TUI with optimization features

- Added parallel processing support

- Fixed terminal compatibility issues

### Version 1.0.0 (2024)

- Initial release with TUI and CLI interfaces

- Basic benchmarking functionality

- Resource monitoring

- Model management

## Contact

**Author**: Jeremiah Pegues  

**Email**: jeremiah@pegues.io  

**GitHub**: [github.com/peguesj](https://github.com/peguesj)

---

*Built with ❤️ for the Ollama community*