https://github.com/ajcasagrande/llmshark
LLMShark: Comprehensive analysis tool for LLM streaming traffic from PCAP files
analysis http llm pcap sse streaming wireshark
- Host: GitHub
- URL: https://github.com/ajcasagrande/llmshark
- Owner: ajcasagrande
- License: mit
- Created: 2025-06-12T02:38:08.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-06-12T04:43:51.000Z (10 months ago)
- Last Synced: 2025-06-12T05:35:15.971Z (10 months ago)
- Topics: analysis, http, llm, pcap, sse, streaming, wireshark
- Language: Python
- Homepage:
- Size: 145 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# LLMShark
**Comprehensive analysis tool for LLM streaming traffic from PCAP files**
LLMShark is a powerful tool for analyzing Large Language Model (LLM) streaming traffic captured in PCAP files. It provides in-depth analysis of HTTP/SSE (Server-Sent Events) streaming sessions, extracting detailed timing statistics, detecting anomalies, and generating comprehensive reports.
## Features
### **Deep Analysis**
- **Time to First Token (TTFT)** analysis
- **Inter-Token Latency (ITL)** measurement and statistics
- HTTP session reconstruction from PCAP files
- SSE chunk parsing and timing analysis
- Throughput and performance metrics
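The SSE chunk parsing listed above can be illustrated with a minimal sketch. It follows the general `text/event-stream` framing rules (events separated by blank lines, `data:` fields, `:` comment lines) and is not LLMShark's own parser; `SSEEvent` and `parse_sse` are hypothetical names, and the `[DONE]` sentinel in the usage note is only an example payload.

```python
import re
from dataclasses import dataclass


@dataclass
class SSEEvent:
    """One parsed Server-Sent Event (hypothetical type, not LLMShark's model)."""
    data: str
    event: str = "message"


def parse_sse(payload: str) -> list[SSEEvent]:
    """Split a decoded SSE body into events using text/event-stream framing."""
    events: list[SSEEvent] = []
    data_lines: list[str] = []
    event_type = "message"
    # SSE lines end with CRLF, LF, or CR.
    for line in re.split(r"\r\n|\n|\r", payload):
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                events.append(SSEEvent("\n".join(data_lines), event_type))
            data_lines, event_type = [], "message"
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        else:
            field, _, value = line.partition(":")
            value = value.removeprefix(" ")  # strip one optional leading space
            if field == "data":
                data_lines.append(value)
            elif field == "event":
                event_type = value
    return events
```

For example, `parse_sse('data: {"token": "Hi"}\n\n: ping\n\ndata: [DONE]\n\n')` yields two events and skips the comment line. An event that is not followed by a blank line is treated as incomplete and dropped.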
### **Anomaly Detection**
- Large timing gap detection
- Silence period identification
- Statistical outlier detection
- Pattern anomaly recognition
- Configurable thresholds
### **Comparison & Reporting**
- Multi-capture comparison analysis
- Performance ranking and scoring
- Statistical significance testing
- HTML and JSON report generation
- Interactive visualizations (optional)
### **Beautiful CLI**
- Rich terminal interface with colors and progress bars
- Multiple output formats (console, JSON, HTML)
- Batch processing capabilities
- Verbose and quiet modes
## Installation
### From PyPI (Recommended)
```bash
pip install llmshark
```
### From Source
```bash
git clone https://github.com/ajcasagrande/llmshark.git
cd llmshark
pip install -e .
```
### Development Installation
```bash
git clone https://github.com/ajcasagrande/llmshark.git
cd llmshark
pip install -e ".[dev]"
```
### With Visualization Support
```bash
pip install "llmshark[viz]"
```
## Requirements
- Python 3.10 or higher
- Wireshark PCAP files containing HTTP/SSE traffic
- Root privileges may be required to capture live traffic (for example, with Wireshark or tcpdump) before analysis
### Dependencies
- **Core**: `scapy`, `pydantic`, `rich`, `typer`, `numpy`, `pandas`, `scipy`
- **Visualization**: `matplotlib`, `seaborn`, `plotly` (optional)
- **Development**: `pytest`, `black`, `ruff`, `mypy` (optional)
## Quick Start
### Basic Analysis
```bash
# Analyze a single PCAP file
llmshark analyze capture.pcap
# Analyze multiple files with detailed output
llmshark analyze *.pcap --verbose
# Save results to files
llmshark analyze capture.pcap --output-dir ./results --format all
```
### Comparison Analysis
```bash
# Compare multiple captures
llmshark analyze session1.pcap session2.pcap --compare
# Batch process directory
llmshark batch ./pcap_files/ --output-dir ./analysis_results
```
### Quick File Information
```bash
# Get PCAP file information without full analysis
llmshark info capture.pcap
```
## Usage Examples
### Single File Analysis
```bash
llmshark analyze llm_session.pcap --output-dir ./results --format html
```
### Multi-File Comparison
```bash
llmshark analyze before_optimization.pcap after_optimization.pcap \
    --compare --output-dir ./comparison --verbose
```
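As a rough illustration of what a before/after comparison might compute, the sketch below takes two samples of inter-token latency and reports the relative change in the mean plus Welch's t-statistic. This is an assumption about the kind of statistic involved, not LLMShark's comparison logic; `compare_itl` is a hypothetical helper, and no p-value is computed.

```python
import math
import statistics


def compare_itl(before_ms: list[float], after_ms: list[float]) -> dict[str, float]:
    """Compare two ITL samples: relative change in mean and Welch's t-statistic."""
    if len(before_ms) < 2 or len(after_ms) < 2:
        raise ValueError("each sample needs at least two values")
    mean_before = statistics.fmean(before_ms)
    mean_after = statistics.fmean(after_ms)
    # Welch's t does not assume equal variances in the two samples.
    standard_error = math.sqrt(
        statistics.variance(before_ms) / len(before_ms)
        + statistics.variance(after_ms) / len(after_ms)
    )
    if standard_error == 0:
        raise ValueError("both samples have zero variance")
    return {
        "mean_change_pct": (mean_after - mean_before) / mean_before * 100.0,
        "welch_t": (mean_after - mean_before) / standard_error,
    }
```

A negative `mean_change_pct` means the second capture has lower latency on average. The larger the magnitude of `welch_t`, the less likely the difference is due to noise alone.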
### Batch Processing
```bash
llmshark batch ./captures/ --output-dir ./analysis \
    --recursive --pattern "*.pcap"
```
### Custom Configuration
```bash
llmshark analyze capture.pcap \
    --detect-anomalies \
    --format json \
    --output-dir ./results \
    --verbose
```
## Architecture
LLMShark is built with modern Python practices and consists of several key components:
```
llmshark/
├── models.py          # Pydantic data models
├── parser.py          # PCAP parsing and session extraction
├── analyzer.py        # Statistical analysis and anomaly detection
├── comparator.py      # Multi-capture comparison logic
├── visualization.py   # Charts and HTML report generation
└── cli.py             # Command-line interface
```
### Key Models
- **StreamSession**: Complete HTTP streaming session
- **StreamChunk**: Individual SSE data chunk
- **TimingStats**: Comprehensive timing statistics
- **AnalysisResult**: Complete analysis results
- **ComparisonReport**: Multi-capture comparison results
## Analysis Metrics
### Timing Metrics
- **TTFT (Time to First Token)**: Time from request to first response chunk
- **ITL (Inter-Token Latency)**: Time between consecutive tokens
- **Mean, Median, P95, P99**: Summary statistics of the latency distribution
- **Throughput**: Tokens per second, bytes per second
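The timing metrics above can be sketched from raw timestamps. The example assumes a request timestamp and a list of chunk arrival timestamps in seconds, and it treats each chunk as one token, which real streams do not guarantee. `timing_summary` is a hypothetical helper, not part of LLMShark's API.

```python
import statistics


def timing_summary(request_ts: float, chunk_ts: list[float]) -> dict[str, float]:
    """Summarize TTFT and ITL in milliseconds from timestamps given in seconds."""
    if len(chunk_ts) < 3:
        raise ValueError("need at least three chunks to compute ITL percentiles")
    # TTFT: request sent until the first response chunk arrives.
    ttft_ms = (chunk_ts[0] - request_ts) * 1000.0
    # ITL: gap between each pair of consecutive chunk arrivals.
    itl_ms = [(b - a) * 1000.0 for a, b in zip(chunk_ts, chunk_ts[1:])]
    # quantiles(n=100) returns the 99 percentile cut points P1..P99.
    cuts = statistics.quantiles(itl_ms, n=100, method="inclusive")
    stream_s = chunk_ts[-1] - chunk_ts[0]
    return {
        "ttft_ms": ttft_ms,
        "mean_itl_ms": statistics.fmean(itl_ms),
        "median_itl_ms": statistics.median(itl_ms),
        "p95_itl_ms": cuts[94],
        "p99_itl_ms": cuts[98],
        "chunks_per_s": len(itl_ms) / stream_s if stream_s > 0 else 0.0,
    }
```

With short streams the P95 and P99 values are interpolated between a handful of samples, so they are only meaningful once a session contains many chunks.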
### Quality Metrics
- **Consistency**: Variance and coefficient of variation
- **Reliability**: Gap detection and silence periods
- **Performance**: Comparative scoring across sessions
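The consistency metric can be made concrete with the coefficient of variation, the sample standard deviation divided by the mean. The sketch below is one plausible way to compute it for a list of inter-token latencies; `coefficient_of_variation` is a hypothetical name.

```python
import statistics


def coefficient_of_variation(itl_ms: list[float]) -> float:
    """Return stdev / mean; lower values indicate steadier token delivery."""
    if len(itl_ms) < 2:
        raise ValueError("need at least two latency samples")
    mean = statistics.fmean(itl_ms)
    if mean == 0:
        raise ValueError("mean latency is zero")
    return statistics.stdev(itl_ms) / mean
```

Because the value is dimensionless, it allows sessions with very different average latencies to be compared for jitter on the same scale.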
### Anomaly Detection
- **Large Gaps**: Configurable threshold for timing gaps
- **Silence Periods**: Detection of inactive periods
- **Statistical Outliers**: Z-score based outlier detection
- **Pattern Analysis**: Unusual behavior identification
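A minimal sketch of the first and third checks above: flagging gaps above a fixed threshold and flagging Z-score outliers. The 3.0 Z-score default mirrors the `LLMSHARK_ANOMALY_THRESHOLD` example value, but how LLMShark applies that setting is an assumption here, and the 1000 ms gap threshold and the name `find_anomalies` are purely illustrative.

```python
import statistics


def find_anomalies(
    itl_ms: list[float], z_threshold: float = 3.0, gap_ms: float = 1000.0
) -> dict[str, list[int]]:
    """Return indices of ITL samples that are large gaps or Z-score outliers."""
    mean = statistics.fmean(itl_ms)
    spread = statistics.pstdev(itl_ms)
    # Z-score: distance from the mean measured in standard deviations.
    outliers = [
        i for i, value in enumerate(itl_ms)
        if spread > 0 and abs(value - mean) / spread > z_threshold
    ]
    # Large gap: any single latency at or above the absolute threshold.
    gaps = [i for i, value in enumerate(itl_ms) if value >= gap_ms]
    return {"outliers": outliers, "large_gaps": gaps}
```

Z-scores are unreliable on very small samples: with fewer than about a dozen values, no single point can exceed a threshold of 3.0, so the absolute gap check is the more useful one for short sessions.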
## Configuration
### Environment Variables
```bash
export LLMSHARK_LOG_LEVEL=INFO
export LLMSHARK_OUTPUT_DIR=./results
export LLMSHARK_ANOMALY_THRESHOLD=3.0
```
### Command Line Options
```bash
llmshark analyze --help
```
## Output Formats
### Console Output
Rich terminal interface with:
- Summary statistics tables
- Performance insights
- Anomaly warnings
- Recommendations
### JSON Output
```json
{
  "session_count": 5,
  "total_tokens_analyzed": 1250,
  "overall_timing_stats": {
    "ttft_ms": 245.6,
    "mean_itl_ms": 67.8,
    "p95_itl_ms": 124.5
  },
  "key_insights": [...],
  "recommendations": [...]
}
```
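A report with this shape can be consumed from Python with the standard `json` module. The sketch below assumes the field names shown in the sample and replaces the elided lists with empty lists so that the text parses; `summarize_report` is a hypothetical helper.

```python
import json

# Sample report text using the field names from the example output.
# The elided lists are left empty here so that the text is valid JSON.
REPORT_TEXT = """
{
  "session_count": 5,
  "total_tokens_analyzed": 1250,
  "overall_timing_stats": {
    "ttft_ms": 245.6,
    "mean_itl_ms": 67.8,
    "p95_itl_ms": 124.5
  },
  "key_insights": [],
  "recommendations": []
}
"""


def summarize_report(text: str) -> str:
    """Build a one-line summary from a JSON report string."""
    report = json.loads(text)
    stats = report["overall_timing_stats"]
    tokens_per_session = report["total_tokens_analyzed"] / report["session_count"]
    return (
        f"{report['session_count']} sessions, "
        f"{tokens_per_session:.0f} tokens/session, "
        f"TTFT {stats['ttft_ms']:.1f} ms, "
        f"P95 ITL {stats['p95_itl_ms']:.1f} ms"
    )
```

A summary line like this is convenient for CI logs, where a full HTML report is not practical to inspect.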
### HTML Reports
- Interactive charts and graphs
- Detailed session breakdowns
- Comparison tables
- Exportable results
## Testing
Run the test suite:
```bash
# Run all tests
pytest
# Run with coverage
pytest --cov=llmshark
# Run only unit tests
pytest -m unit
# Run only integration tests
pytest -m integration
```
## Contributing
We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.
### Development Setup
```bash
git clone https://github.com/ajcasagrande/llmshark.git
cd llmshark
pip install -e ".[dev]"
pre-commit install
```
### Code Quality
- **Code Formatting**: `black` and `ruff`
- **Type Checking**: `mypy`
- **Testing**: `pytest` with coverage
- **Pre-commit Hooks**: Automated quality checks
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- **Scapy**: For powerful packet analysis capabilities
- **Pydantic**: For robust data validation and modeling
- **Rich**: For beautiful terminal interfaces
- **Typer**: For an excellent CLI framework
## Documentation
- [User Guide](docs/user-guide.md)
- [API Reference](docs/api-reference.md)
- [Examples](docs/examples.md)
- [Troubleshooting](docs/troubleshooting.md)
## Bug Reports & Feature Requests
Please use the [GitHub Issues](https://github.com/ajcasagrande/llmshark/issues) page to report bugs or request features.
## Performance
LLMShark is designed for efficiency:
- Streaming processing of large PCAP files
- Memory-efficient chunk processing
- Parallel analysis capabilities
- Takes advantage of Python 3.10+ features
## Roadmap
- [ ] Real-time capture analysis
- [ ] WebUI dashboard
- [ ] Plugin system for custom analyzers
- [ ] Machine learning anomaly detection
- [ ] Distributed analysis capabilities
- [ ] Integration with monitoring systems
---
**Made with ❤️ for the LLM and networking communities**