https://github.com/tom-doerr/llm_api_testing
- Host: GitHub
- URL: https://github.com/tom-doerr/llm_api_testing
- Owner: tom-doerr
- License: mit
- Created: 2025-01-11T02:57:37.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-02-06T19:21:06.000Z (over 1 year ago)
- Last Synced: 2025-06-19T10:15:39.045Z (12 months ago)
- Language: Python
- Size: 742 KB
- Stars: 12
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Deepseek Performance Monitoring
A performance monitoring tool for the Deepseek API, built on LiteLLM.
## 🚀 Features
- Measures response latency in milliseconds
- Calculates tokens processed per second
- Configurable monitoring intervals
- Comprehensive CSV logging
- Adjustable test duration
- Support for multiple Deepseek models
- Random prompt generation with configurable distribution
## 🛠️ Usage
1. Ensure LiteLLM is installed and configured (for example, `pip install litellm`)
2. Set your Deepseek API key as an environment variable:
```bash
export DEEPSEEK_API_KEY='your_api_key_here'
```
3. Run the script with desired options:
```bash
# Basic usage with default settings
python3 deepseek_performance_monitor.py
# Run for 1 week with 30-second intervals
python3 deepseek_performance_monitor.py --duration 168 --interval 30
# Custom output file and increased reasoner model usage
python3 deepseek_performance_monitor.py --output custom_results.csv --reasoner-ratio 0.2
```
### Command-line Options
- `--duration`: Test duration in hours (default: 24)
- `--interval`: Time between requests in seconds (default: 60)
- `--output`: Output CSV file path (default: deepseek_performance.csv)
- `--reasoner-ratio`: Probability of using deepseek-reasoner model (default: 0.1)
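The options above could be declared with `argparse` roughly as follows. This is a minimal sketch, not the script's actual code: the flag names and defaults come from the list above, while the `build_parser` helper, the help strings, and the `float` types are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options and defaults."""
    parser = argparse.ArgumentParser(description="Monitor Deepseek API performance")
    parser.add_argument("--duration", type=float, default=24,
                        help="Test duration in hours")
    parser.add_argument("--interval", type=float, default=60,
                        help="Time between requests in seconds")
    parser.add_argument("--output", default="deepseek_performance.csv",
                        help="Output CSV file path")
    parser.add_argument("--reasoner-ratio", type=float, default=0.1,
                        help="Probability of using deepseek-reasoner")
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(["--duration", "168", "--interval", "30"])
print(args.duration, args.interval, args.output, args.reasoner_ratio)
```

Note that `argparse` exposes `--reasoner-ratio` as `args.reasoner_ratio`.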
## 📈 Output
The script writes a CSV file (`deepseek_performance.csv` by default, or the path given with `--output`) with the following columns:
- timestamp: Measurement time
- first_token_latency_ms: Time to first token in milliseconds
- total_latency_ms: Total response time in milliseconds
- tokens_per_second: Tokens processed per second
- completion_tokens: Number of tokens in the response
- prompt_tokens: Number of tokens in the prompt
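As a rough illustration of how one row with these columns might be derived from raw timings, here is a sketch. It assumes that `tokens_per_second` is `completion_tokens` divided by the total latency in seconds, which matches the example log lines in this README (15 tokens in 1550.53 ms gives 9.67). The `build_row` helper, its parameters, and the timestamp format are hypothetical.

```python
import csv
import io
from datetime import datetime

COLUMNS = [
    "timestamp", "first_token_latency_ms", "total_latency_ms",
    "tokens_per_second", "completion_tokens", "prompt_tokens",
]


def build_row(start, first_token_at, end, completion_tokens, prompt_tokens, now=None):
    """Turn raw timings (in seconds) into one CSV row; hypothetical helper."""
    total_s = end - start
    return {
        "timestamp": (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S"),
        "first_token_latency_ms": round((first_token_at - start) * 1000, 2),
        "total_latency_ms": round(total_s * 1000, 2),
        # Assumed definition: completion tokens per second of total latency.
        "tokens_per_second": round(completion_tokens / total_s, 2) if total_s > 0 else 0.0,
        "completion_tokens": completion_tokens,
        "prompt_tokens": prompt_tokens,
    }


# Write to an in-memory buffer so the example leaves no file behind.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
writer.writeheader()
row = build_row(start=0.0, first_token_at=0.8, end=1.5,
                completion_tokens=15, prompt_tokens=12)
writer.writerow(row)
print(buffer.getvalue())
```

In a real run, `start`, `first_token_at`, and `end` would typically come from `time.perf_counter()` readings taken around a streamed response.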
## 📋 Example Output
### Performance Statistics
```plaintext
Average TPS: 45.96
Max TPS: 47.43
Min TPS: 44.67
Average First Token Latency: 1030.93 ms
Max First Token Latency: 1184.10 ms
Min First Token Latency: 794.79 ms
Average Total Latency: 17826.14 ms
Max Total Latency: 20675.07 ms
Min Total Latency: 14632.69 ms
Total Completion Tokens Processed: 10657
Total Requests: 13
```
```plaintext
2025-01-11 04:02:02 - Latency: 1550.53ms, TPS: 9.67, Tokens: 15
2025-01-11 04:03:04 - Latency: 1317.85ms, TPS: 11.38, Tokens: 15
2025-01-11 04:04:05 - Latency: 1375.23ms, TPS: 10.91, Tokens: 15
```
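Summary statistics like the ones above can be recomputed from the CSV afterwards. The sketch below is one possible approach, not the script's own reporting code: `summarize` is a hypothetical helper, and the inline sample reuses the three log lines above in place of a real results file, with only the columns those lines provide.

```python
import csv
import io
from statistics import mean

# The three log lines above, expressed as CSV rows (subset of columns).
SAMPLE_CSV = """total_latency_ms,tokens_per_second,completion_tokens
1550.53,9.67,15
1317.85,11.38,15
1375.23,10.91,15
"""


def summarize(rows):
    """Compute average/max/min figures from CSV rows; hypothetical helper."""
    tps = [float(r["tokens_per_second"]) for r in rows]
    latency = [float(r["total_latency_ms"]) for r in rows]
    return {
        "Average TPS": mean(tps),
        "Max TPS": max(tps),
        "Min TPS": min(tps),
        "Average Total Latency (ms)": mean(latency),
        "Max Total Latency (ms)": max(latency),
        "Min Total Latency (ms)": min(latency),
        "Total Completion Tokens Processed": sum(int(r["completion_tokens"]) for r in rows),
        "Total Requests": len(rows),
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
stats = summarize(rows)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

To run this against real results, replace `io.StringIO(SAMPLE_CSV)` with an opened results file such as `open("deepseek_performance.csv", newline="")`.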
## 📦 Requirements
- Python 3
- LiteLLM
- Deepseek API key
## 📝 Notes
- Errors are logged to the console and to the CSV file but don't stop execution
- Results are saved to CSV for later analysis
- Supports both deepseek-chat and deepseek-reasoner models
- Random prompt generation ensures diverse testing scenarios
- Error rate tracking excludes context size exceeded errors
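Two of the notes above, model selection by ratio and continuing after errors, can be sketched as follows. This is an assumption about the general shape, not the script's implementation: `pick_model`, `run_requests`, and `flaky_request` are hypothetical names, the request itself is a stand-in callable rather than a real LiteLLM call, and the wait between requests is left out.

```python
import random


def pick_model(reasoner_ratio, rng=random):
    """Choose deepseek-reasoner with probability reasoner_ratio, else deepseek-chat."""
    return "deepseek-reasoner" if rng.random() < reasoner_ratio else "deepseek-chat"


def run_requests(send_request, count, reasoner_ratio=0.1, rng=random):
    """Issue `count` requests, recording errors instead of stopping on them."""
    results = []
    for _ in range(count):
        model = pick_model(reasoner_ratio, rng)
        try:
            results.append({"model": model, "result": send_request(model), "error": ""})
        except Exception as exc:  # keep monitoring even if one request fails
            print(f"Request to {model} failed: {exc}")
            results.append({"model": model, "result": None, "error": str(exc)})
    return results


def flaky_request(model):
    """Stand-in for a real API call; fails for one model to exercise the error path."""
    if model == "deepseek-reasoner":
        raise RuntimeError("simulated failure")
    return "ok"


results = run_requests(flaky_request, count=5, reasoner_ratio=0.5, rng=random.Random(0))
print(sum(1 for r in results if r["error"]), "of", len(results), "requests failed")
```

Passing the request function as a parameter keeps the loop testable without network access or an API key.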