{"id":28744262,"url":"https://github.com/ajcasagrande/llmshark","last_synced_at":"2025-06-16T11:39:12.003Z","repository":{"id":298640609,"uuid":"1000582892","full_name":"ajcasagrande/llmshark","owner":"ajcasagrande","description":"LLMShark: Comprehensive analysis tool for LLM streaming traffic from PCAP files","archived":false,"fork":false,"pushed_at":"2025-06-12T04:43:51.000Z","size":148,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-06-12T05:35:15.971Z","etag":null,"topics":["analysis","http","llm","pcap","sse","streaming","wireshark"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ajcasagrande.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-12T02:38:08.000Z","updated_at":"2025-06-12T04:43:55.000Z","dependencies_parsed_at":"2025-06-12T05:47:10.635Z","dependency_job_id":null,"html_url":"https://github.com/ajcasagrande/llmshark","commit_stats":null,"previous_names":["ajcasagrande/llmshark"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/ajcasagrande/llmshark","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ajcasagrande%2Fllmshark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ajcasagrande%2Fllmshark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ajcasagrande%2Fllmshark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aj
casagrande%2Fllmshark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ajcasagrande","download_url":"https://codeload.github.com/ajcasagrande/llmshark/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ajcasagrande%2Fllmshark/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260151879,"owners_count":22966595,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analysis","http","llm","pcap","sse","streaming","wireshark"],"created_at":"2025-06-16T11:39:09.724Z","updated_at":"2025-06-16T11:39:11.985Z","avatar_url":"https://github.com/ajcasagrande.png","language":"Python","readme":"# 🦈 LLMShark\n\n**Comprehensive analysis tool for LLM streaming traffic from PCAP files**\n\n[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/charliermarsh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)\n\nLLMShark is a powerful tool for analyzing Large Language Model (LLM) streaming traffic captured in PCAP files. 
It provides in-depth analysis of HTTP/SSE (Server-Sent Events) streaming sessions, extracting detailed timing statistics, detecting anomalies, and generating comprehensive reports.\n\n## ✨ Features\n\n### 🔍 **Deep Analysis**\n- **Time to First Token (TTFT)** analysis\n- **Inter-Token Latency (ITL)** measurement and statistics\n- HTTP session reconstruction from PCAP files\n- SSE chunk parsing and timing analysis\n- Throughput and performance metrics\n\n### 🚨 **Anomaly Detection**\n- Large timing gaps detection\n- Silence period identification\n- Statistical outlier detection\n- Pattern anomaly recognition\n- Configurable thresholds\n\n### 📊 **Comparison \u0026 Reporting**\n- Multi-capture comparison analysis\n- Performance ranking and scoring\n- Statistical significance testing\n- HTML and JSON report generation\n- Interactive visualizations (optional)\n\n### 🎨 **Beautiful CLI**\n- Rich terminal interface with colors and progress bars\n- Multiple output formats (console, JSON, HTML)\n- Batch processing capabilities\n- Verbose and quiet modes\n\n## 🚀 Installation\n\n### From PyPI (Recommended)\n```bash\npip install llmshark\n```\n\n### From Source\n```bash\ngit clone https://github.com/llmshark/llmshark.git\ncd llmshark\npip install -e .\n```\n\n### Development Installation\n```bash\ngit clone https://github.com/llmshark/llmshark.git\ncd llmshark\npip install -e \".[dev]\"\n```\n\n### With Visualization Support\n```bash\npip install \"llmshark[viz]\"\n```\n\n## 📋 Requirements\n\n- Python 3.10 or higher\n- Wireshark PCAP files containing HTTP/SSE traffic\n- Root privileges may be required for live packet capture\n\n### Dependencies\n- **Core**: `scapy`, `pydantic`, `rich`, `typer`, `numpy`, `pandas`, `scipy`\n- **Visualization**: `matplotlib`, `seaborn`, `plotly` (optional)\n- **Development**: `pytest`, `black`, `ruff`, `mypy` (optional)\n\n## 🎯 Quick Start\n\n### Basic Analysis\n```bash\n# Analyze a single PCAP file\nllmshark analyze capture.pcap\n\n# Analyze 
multiple files with detailed output\nllmshark analyze *.pcap --verbose\n\n# Save results to files\nllmshark analyze capture.pcap --output-dir ./results --format all\n```\n\n### Comparison Analysis\n```bash\n# Compare multiple captures\nllmshark analyze session1.pcap session2.pcap --compare\n\n# Batch process directory\nllmshark batch ./pcap_files/ --output-dir ./analysis_results\n```\n\n### Quick File Information\n```bash\n# Get PCAP file information without full analysis\nllmshark info capture.pcap\n```\n\n## 📖 Usage Examples\n\n### Single File Analysis\n```bash\nllmshark analyze llm_session.pcap --output-dir ./results --format html\n```\n\n### Multi-File Comparison\n```bash\nllmshark analyze before_optimization.pcap after_optimization.pcap \\\n  --compare --output-dir ./comparison --verbose\n```\n\n### Batch Processing\n```bash\nllmshark batch ./captures/ --output-dir ./analysis \\\n  --recursive --pattern \"*.pcap\"\n```\n\n### Custom Configuration\n```bash\nllmshark analyze capture.pcap \\\n  --detect-anomalies \\\n  --format json \\\n  --output-dir ./results \\\n  --verbose\n```\n\n## 🏗️ Architecture\n\nLLMShark is built with modern Python practices and consists of several key components:\n\n```\nllmshark/\n├── models.py          # Pydantic data models\n├── parser.py          # PCAP parsing and session extraction\n├── analyzer.py        # Statistical analysis and anomaly detection\n├── comparator.py      # Multi-capture comparison logic\n├── visualization.py   # Charts and HTML report generation\n└── cli.py            # Command-line interface\n```\n\n### Key Models\n- **StreamSession**: Complete HTTP streaming session\n- **StreamChunk**: Individual SSE data chunk\n- **TimingStats**: Comprehensive timing statistics\n- **AnalysisResult**: Complete analysis results\n- **ComparisonReport**: Multi-capture comparison results\n\n## 📊 Analysis Metrics\n\n### Timing Metrics\n- **TTFT (Time to First Token)**: Time from request to first response chunk\n- **ITL 
(Inter-Token Latency)**: Time between consecutive tokens\n- **Mean, Median, P95, P99**: Statistical distributions\n- **Throughput**: Tokens per second, bytes per second\n\n### Quality Metrics\n- **Consistency**: Variance and coefficient of variation\n- **Reliability**: Gap detection and silence periods\n- **Performance**: Comparative scoring across sessions\n\n### Anomaly Detection\n- **Large Gaps**: Configurable threshold for timing gaps\n- **Silence Periods**: Detection of inactive periods\n- **Statistical Outliers**: Z-score based outlier detection\n- **Pattern Analysis**: Unusual behavior identification\n\n## 🔧 Configuration\n\n### Environment Variables\n```bash\nexport LLMSHARK_LOG_LEVEL=INFO\nexport LLMSHARK_OUTPUT_DIR=./results\nexport LLMSHARK_ANOMALY_THRESHOLD=3.0\n```\n\n### Command Line Options\n```bash\nllmshark analyze --help\n```\n\n## 📈 Output Formats\n\n### Console Output\nRich terminal interface with:\n- Summary statistics tables\n- Performance insights\n- Anomaly warnings\n- Recommendations\n\n### JSON Output\n```json\n{\n  \"session_count\": 5,\n  \"total_tokens_analyzed\": 1250,\n  \"overall_timing_stats\": {\n    \"ttft_ms\": 245.6,\n    \"mean_itl_ms\": 67.8,\n    \"p95_itl_ms\": 124.5\n  },\n  \"key_insights\": [...],\n  \"recommendations\": [...]\n}\n```\n\n### HTML Reports\n- Interactive charts and graphs\n- Detailed session breakdowns\n- Comparison tables\n- Exportable results\n\n## 🧪 Testing\n\nRun the test suite:\n```bash\n# Run all tests\npytest\n\n# Run with coverage\npytest --cov=llmshark\n\n# Run only unit tests\npytest -m unit\n\n# Run only integration tests\npytest -m integration\n```\n\n## 🤝 Contributing\n\nWe welcome contributions! 
Please see our [Contributing Guide](CONTRIBUTING.md) for details.\n\n### Development Setup\n```bash\ngit clone https://github.com/llmshark/llmshark.git\ncd llmshark\npip install -e \".[dev]\"\npre-commit install\n```\n\n### Code Quality\n- **Code Formatting**: `black` and `ruff`\n- **Type Checking**: `mypy`\n- **Testing**: `pytest` with coverage\n- **Pre-commit Hooks**: Automated quality checks\n\n## 📝 License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n## 🙏 Acknowledgments\n\n- **Scapy**: For powerful packet analysis capabilities\n- **Pydantic**: For robust data validation and modeling\n- **Rich**: For beautiful terminal interfaces\n- **Typer**: For an excellent CLI framework\n\n## 📚 Documentation\n\n- [User Guide](docs/user-guide.md)\n- [API Reference](docs/api-reference.md)\n- [Examples](docs/examples.md)\n- [Troubleshooting](docs/troubleshooting.md)\n\n## 🐛 Bug Reports \u0026 Feature Requests\n\nPlease use the [GitHub Issues](https://github.com/llmshark/llmshark/issues) page to report bugs or request features.\n\n## 📊 Performance\n\nLLMShark is designed for efficiency:\n- Streaming processing for large PCAP files\n- Memory-efficient chunk processing\n- Parallel analysis capabilities\n- Optimized for Python 3.10+ features\n\n## 🔮 Roadmap\n\n- [ ] Real-time capture analysis\n- [ ] WebUI dashboard\n- [ ] Plugin system for custom analyzers\n- [ ] Machine learning anomaly detection\n- [ ] Distributed analysis capabilities\n- [ ] Integration with monitoring systems\n\n---\n\n**Made with ❤️ for the LLM and networking communities**\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fajcasagrande%2Fllmshark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fajcasagrande%2Fllmshark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fajcasagrande%2Fllmshark/lists"}