https://github.com/tom-doerr/llm_api_testing


# Deepseek Performance Monitoring




A performance monitoring solution for the Deepseek API, built on LiteLLM.

## Results
![Performance Plot](performance_results/performance_plot.png)

## 🚀 Features
- Measures response latency in milliseconds
- Calculates tokens processed per second
- Configurable monitoring intervals
- Comprehensive CSV logging
- Adjustable test duration
- Support for multiple Deepseek models
- Random prompt generation with configurable distribution

## 🛠️ Usage

1. Ensure LiteLLM is installed and configured
2. Set your Deepseek API key as an environment variable:
```bash
export DEEPSEEK_API_KEY='your_api_key_here'
```

3. Run the script with desired options:
```bash
# Basic usage with default settings
python3 deepseek_performance_monitor.py

# Run for 1 week with 30-second intervals
python3 deepseek_performance_monitor.py --duration 168 --interval 30

# Custom output file and increased reasoner model usage
python3 deepseek_performance_monitor.py --output custom_results.csv --reasoner-ratio 0.2
```

### Command-line Options
- `--duration`: Test duration in hours (default: 24)
- `--interval`: Time between requests in seconds (default: 60)
- `--output`: Output CSV file path (default: deepseek_performance.csv)
- `--reasoner-ratio`: Probability of using deepseek-reasoner model (default: 0.1)
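
The options above map naturally onto `argparse`. The following is a minimal sketch, assuming the flags and defaults listed here; the argument types and the `estimated_requests` helper are illustrative assumptions and are not taken from the script itself.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented options (a sketch, not the script's code)."""
    parser = argparse.ArgumentParser(description="Deepseek performance monitor")
    # Types are assumed; the README only documents names and defaults.
    parser.add_argument("--duration", type=float, default=24,
                        help="Test duration in hours")
    parser.add_argument("--interval", type=float, default=60,
                        help="Time between requests in seconds")
    parser.add_argument("--output", default="deepseek_performance.csv",
                        help="Output CSV file path")
    parser.add_argument("--reasoner-ratio", type=float, default=0.1,
                        help="Probability of using deepseek-reasoner")
    return parser


def estimated_requests(duration_hours: float, interval_seconds: float) -> int:
    """Upper bound on the request count, assuming requests take no time (hypothetical helper)."""
    return int(duration_hours * 3600 // interval_seconds)
```

Under these assumptions, `--duration 168 --interval 30` yields at most 20160 requests; the real count will be lower if the interval is measured from the end of each request.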

## 📈 Output

By default, the script writes its results to `deepseek_performance.csv` (change the path with `--output`). The file has the following columns:
- timestamp: Measurement time
- first_token_latency_ms: Time to first token in milliseconds
- total_latency_ms: Total response time in milliseconds
- tokens_per_second: Tokens processed per second
- completion_tokens: Number of tokens in the response
- prompt_tokens: Number of tokens in the prompt
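
To make the latency columns concrete, here is a hedged sketch of how they could be derived from a streamed response. The function name `measure_stream`, the injected `clock`, and the choice to divide completion tokens by total latency are assumptions. The last one is at least consistent with the sample log lines in this README (15 tokens over 1550.53 ms is about 9.67 tokens per second).

```python
import time
from typing import Callable, Iterable


def measure_stream(chunks: Iterable[str],
                   completion_tokens: int,
                   clock: Callable[[], float] = time.perf_counter) -> dict:
    """Time a streamed response and derive the latency columns (illustrative only)."""
    start = clock()
    first_token_at = None
    for chunk in chunks:
        # Treat the first non-empty chunk as the first token.
        if chunk and first_token_at is None:
            first_token_at = clock()
    end = clock()

    total_s = end - start
    # If no non-empty chunk arrived, fall back to the total time.
    first_s = (first_token_at if first_token_at is not None else end) - start
    return {
        "first_token_latency_ms": first_s * 1000,
        "total_latency_ms": total_s * 1000,
        # Assumed definition: completion tokens divided by total seconds.
        "tokens_per_second": completion_tokens / total_s if total_s > 0 else 0.0,
        "completion_tokens": completion_tokens,
    }
```

In practice, `chunks` would be the text pieces yielded by a streaming completion call, and `completion_tokens` would come from the usage data reported by the API. Injecting `clock` keeps the timing logic testable without network access.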

## 📋 Example Output

### Performance Statistics
```plaintext
Average TPS: 45.96
Max TPS: 47.43
Min TPS: 44.67

Average First Token Latency: 1030.93 ms
Max First Token Latency: 1184.10 ms
Min First Token Latency: 794.79 ms

Average Total Latency: 17826.14 ms
Max Total Latency: 20675.07 ms
Min Total Latency: 14632.69 ms

Total Completion Tokens Processed: 10657
Total Requests: 13
```

### Per-request Log Lines
```
2025-01-11 04:02:02 - Latency: 1550.53ms, TPS: 9.67, Tokens: 15
2025-01-11 04:03:04 - Latency: 1317.85ms, TPS: 11.38, Tokens: 15
2025-01-11 04:04:05 - Latency: 1375.23ms, TPS: 10.91, Tokens: 15
```
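
Statistics like those above can be recomputed from the CSV after a run. The sketch below assumes the column names listed in the Output section; the `summarize` function is hypothetical, and skipping rows with non-numeric values (for example, rows written for failed requests) is an assumption about how errors appear in the file.

```python
import csv
import statistics
from typing import Iterable, List


def summarize(lines: Iterable[str]) -> dict:
    """Compute avg/max/min per metric from CSV lines (an open file works too)."""
    rows = list(csv.DictReader(lines))

    def column(name: str) -> List[float]:
        values = []
        for row in rows:
            try:
                values.append(float(row[name]))
            except (KeyError, TypeError, ValueError):
                continue  # skip rows without a numeric value in this column
        return values

    # Counts every data row, including rows that were skipped above.
    summary = {"total_requests": len(rows)}
    for name in ("tokens_per_second", "first_token_latency_ms", "total_latency_ms"):
        values = column(name)
        if values:
            summary[name] = {
                "avg": statistics.fmean(values),
                "max": max(values),
                "min": min(values),
            }
    summary["total_completion_tokens"] = int(sum(column("completion_tokens")))
    return summary
```

A typical call would be `with open("deepseek_performance.csv", newline="") as f: print(summarize(f))`.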

## 📦 Requirements
- Python 3
- LiteLLM
- Deepseek API key

## 📝 Notes
- Errors are logged to the console and to the CSV file but don't stop execution
- Results are saved to CSV for later analysis
- Supports both deepseek-chat and deepseek-reasoner models
- Random prompt generation ensures diverse testing scenarios
- Error-rate tracking excludes "context size exceeded" errors
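
Since `--reasoner-ratio` is described as the probability of using deepseek-reasoner, the per-request model choice can be pictured as a single weighted coin flip. This is a minimal sketch under that reading; `pick_model` and its optional `rng` parameter are hypothetical names, and the script may select models differently.

```python
import random
from typing import Optional


def pick_model(reasoner_ratio: float, rng: Optional[random.Random] = None) -> str:
    """Return deepseek-reasoner with probability `reasoner_ratio`, else deepseek-chat."""
    if rng is None:
        rng = random.Random()
    # rng.random() is uniform on [0, 1), so a ratio of 0 never picks the reasoner
    # and a ratio of 1 always does.
    return "deepseek-reasoner" if rng.random() < reasoner_ratio else "deepseek-chat"
```

Passing a seeded `random.Random` makes a run reproducible. When calling through LiteLLM, the returned name may need a provider prefix (for example `deepseek/deepseek-chat`), depending on how LiteLLM is configured.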