{"id":28815373,"url":"https://github.com/wronai/allama","last_synced_at":"2026-04-01T17:18:02.308Z","repository":{"id":297221232,"uuid":"996034889","full_name":"wronai/allama","owner":"wronai","description":"testing and benchmarking suite for Large Language Models (LLMs) focused on Python code generation. The project enables automatic quality assessment of generated code through various metrics and generates detailed HTML reports.","archived":false,"fork":false,"pushed_at":"2025-06-11T14:21:19.000Z","size":145,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-28T00:37:15.823Z","etag":null,"topics":["benchmark","chatbot","cline","ide","llm","ollama","openapi","plugin","windsurf"],"latest_commit_sha":null,"homepage":"https://wronai.github.io/allama/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wronai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-06-04T11:10:49.000Z","updated_at":"2025-06-11T14:21:22.000Z","dependencies_parsed_at":"2025-06-04T19:12:50.617Z","dependency_job_id":"86f9fea1-9324-4e18-a60c-a302aaddc6f3","html_url":"https://github.com/wronai/allama","commit_stats":null,"previous_names":["wronai/allama"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/wronai/allama","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wronai%2Fallama","tags_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wronai%2Fallama/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wronai%2Fallama/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wronai%2Fallama/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wronai","download_url":"https://codeload.github.com/wronai/allama/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wronai%2Fallama/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31290537,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-01T13:12:26.723Z","status":"ssl_error","status_checked_at":"2026-04-01T13:12:25.102Z","response_time":53,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark","chatbot","cline","ide","llm","ollama","openapi","plugin","windsurf"],"created_at":"2025-06-18T16:08:44.725Z","updated_at":"2026-04-01T17:18:02.284Z","avatar_url":"https://github.com/wronai.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"![allama-logo.svg](allama-logo.svg)\n\n# Allama - LLM Testing and Benchmarking Suite \n\nA comprehensive testing and benchmarking suite for Large Language Models (LLMs) focused on Python code generation. 
The project enables automatic quality assessment of generated code through various metrics and generates detailed HTML reports.\n\n## Features\n\n- **Automated Testing** of multiple LLM models with configurable prompts\n- **Code Quality Assessment** - syntax checking, execution, style, and functionality\n- **Detailed HTML Reports** with metrics, charts, and comparisons\n- **Interactive Code Diff** - visual comparison of code generated by different models\n- **Results Export** to CSV and JSON for further analysis\n- **Highly Configurable** - easily add new models and tests\n- **Multiple API Support** - Ollama, local servers, cloud services\n- **Model Ranking** based on performance and quality metrics\n- **Zero Configuration** - automatically generates default config files when needed\n- **Benchmark Publishing** - share your results with the community via the Allama server\n- **Local Results Storage** - automatically saves results in timestamped folders\n- **Prompt Analysis** - detailed information about prompts used in benchmarks\n- **Radar Charts** - visual comparison of model performance across multiple metrics\n\n## Quick Start\n![img.png](img.png)\n### 1. Installation\n\n#### Using Poetry (recommended)\n```bash\n# Clone the repository\ngit clone https://github.com/wronai/allama.git\ncd allama\n\n# Install dependencies\npip install poetry\npoetry install\n\n# Activate the virtual environment\npoetry shell\n```\n\n#### Using pip\n```bash\n# Clone the repository\ngit clone https://github.com/wronai/allama.git\ncd allama\n\npip install .\n```\n\n### 2. 
Model Configuration\n\nCreate or edit the `models.csv` file to configure your models:\n\n```csv\nmodel_name,url,auth_header,auth_value,think,description\nmistral:latest,http://localhost:11434/api/chat,,,false,Mistral Latest on Ollama\nllama3:8b,http://localhost:11434/api/chat,,,false,Llama 3 8B\ngpt-4,https://api.openai.com/v1/chat/completions,Authorization,Bearer sk-...,false,OpenAI GPT-4\n```\n\n**CSV Columns:**\n- `model_name` - Name of the model (e.g., mistral:latest, gpt-4)\n- `url` - API endpoint URL\n- `auth_header` - Authorization header (if required, e.g., \"Authorization\")\n- `auth_value` - Authorization value (e.g., \"Bearer your-api-key\")\n- `think` - Whether the model supports \"think\" parameter (true/false)\n- `description` - Description of the model\n\n## Configuration\n\nThe application is configured using external files, with `config.json` being the primary configuration file.\n\n### Main Configuration (`config.json`)\n\nThis file, located in the root directory, contains all the main settings for the application:\n\n- **`prompts_file`**: Path to the file containing test prompts (e.g., `prompts.json`).\n- **`evaluation_weights`**: Points awarded for different code quality metrics.\n- **`timeouts`**: Time limits for API requests and code execution.\n- **`report_config`**: Settings for the generated HTML report, such as the title.\n- **`colors`**: Color scheme used in the HTML report.\n\nYou can create your own configuration file (e.g., `my_config.yaml`) and use it with the `--config` flag during runtime.\n\n### Prompts Configuration (`prompts.json`)\n\nThis file contains a list of test cases (prompts) that will be sent to the language models. 
Each prompt is a JSON object with the following keys:\n\n- **`name`**: A descriptive name for the test (e.g., \"Simple Addition Function\").\n- **`prompt`**: The full text of the prompt to be sent to the model.\n- **`expected_keywords`**: A list of keywords that are expected to be present in the generated code.\n\n### Automatic Configuration Generation\n\nThe system will automatically generate default configuration files (`config.json` and `prompts.json`) if they don't exist when you run the tool. This means you can simply run the `allama` command without any setup, and the necessary configuration files will be created for you with sensible defaults.\n\n### Using Custom Configuration\n\nYou can run tests with a custom configuration file (in either JSON or YAML format) using the `--config` or `-c` flag. The settings from your custom file will be merged with the defaults.\n\n**Example with JSON:**\n```bash\nallama --config my_config.json\n```\n\n**Example with YAML:**\n```bash\nallama --config custom_settings.yaml\n```\n\n## Reports and Output\n\nAllama generates comprehensive reports to help you analyze and compare model performance:\n\n### HTML Report (`allama.html`)\n\nAn interactive HTML report is generated after each test run, containing:\n\n- **Summary Dashboard** - Overview of test results with key metrics\n- **Model Ranking** - Performance comparison of all tested models\n- **Detailed Results** - In-depth analysis of each model's performance\n- **Code Comparison** - Interactive diff viewer to compare code generated by different models\n\n### JSON Data (`allama.json`)\n\nAll test results are also saved in a structured JSON format for:\n- Further analysis with external tools\n- Integration with other systems\n- Custom visualization and reporting\n\nThe JSON file contains complete information about:\n- Test configuration\n- Model responses\n- Evaluation metrics\n- Generated code\n\n### CSV Summary (`*_summary.csv`)\n\nA CSV summary file is also generated with 
key metrics for quick analysis in spreadsheet applications.\n\n### Example Report Usage\n\nThe HTML report allows you to:\n1. View overall model rankings\n2. Examine detailed results for each model and prompt\n3. Compare code generated by different models using the interactive diff tool\n4. Filter and sort results based on various metrics\n\nTo view the report, simply open `allama.html` in any modern web browser after running tests.\n\n### Publishing Results Online\n\nAllama allows you to publish your benchmark results to a central repository at `allama.sapletta.com`, making it easy to share and compare results with others:\n\n```bash\n# Run benchmark and publish results\nallama --benchmark --publish\n\n# Specify a custom server URL\nallama --benchmark --publish --server-url https://your-server.com/upload.php\n```\n\nThe publishing system includes:\n- **Rate limiting**: Maximum 3 uploads per day per IP address\n- **Request throttling**: Minimum 1 second between requests\n- **Automatic organization**: Results are stored in timestamped directories\n- **Web interface**: Browse and compare published benchmarks\n- **Responsive design**: Optimized for both desktop and mobile devices\n- **Radar charts**: Visual comparison of model performance across multiple metrics\n- **Badge-style metrics**: Quick overview of key benchmark statistics\n\nAfter publishing, you'll receive a URL where you can view your results online.\n\n### Local Results Storage\n\nAll benchmark results are automatically saved locally in a timestamped folder structure:\n```\ndata/\n└── test_YYYYMMDD_HHMMSS/\n    ├── allama.json       # Complete benchmark results\n    ├── allama.html       # HTML report\n    └── prompts.json      # Detailed prompt information\n```\n\nThis allows you to:\n- Keep a history of all benchmark runs\n- Compare results over time\n- Share specific benchmark results with others\n\n### Benchmark Visualization\n\nThe benchmark server provides several visualization features:\n- 
**Responsive 3-column layout**: Displays benchmarks in an easy-to-scan grid (collapses to single column on mobile)\n- **Radar charts**: Each benchmark includes a radar chart showing model performance across 6 key metrics:\n  - Success rate\n  - Response speed\n  - Syntax correctness\n  - Execution success\n  - Keywords presence\n  - Code quality\n- **Badge-style metrics**: Key statistics displayed as GitHub-style badges for quick reference\n- **Model comparison**: Easy visual comparison of multiple models within each benchmark\n\n## Usage\n\n### Using Makefile (recommended)\n```bash\n# Install dependencies and setup\nmake install\n\n# Run tests\nmake test\n\n# Run all tests including end-to-end\nmake test-all\n\n# Run benchmark suite\nmake benchmark\n\n# Test a single model (set MODEL=name)\nmake single-model\n\n# Generate HTML report\nmake report\n\n# Run code formatters\nmake format\n\n# Run linters\nmake lint\n```\n\n### Basic Command-Line Usage\n```bash\n# Run all tests with default configuration\nallama\n\n# Run benchmark suite\nallama --benchmark\n\n# Test specific models\nallama --models \"mistral:latest,llama3:8b,gemma2:2b\"\n\n# Test a single model\nallama --single-model \"mistral:latest\"\n\n# Compare specific models\nallama --compare \"mistral:latest\" \"llama3:8b\"\n\n# Generate HTML report\nallama --output benchmark_report.html\n\n# Run with verbose output\nallama --verbose\n```\n\n### Advanced Usage\n```bash\n# Run with custom configuration\nallama --config custom_config.json\n\n# Test with a specific prompt\nallama --single-model \"mistral:latest\" --prompt-index 0\n\n# Set request timeout (in seconds)\nallama --timeout 60\n```\n\n## Evaluation Metrics\n\nThe system evaluates generated code based on the following criteria:\n\n### Basic Metrics (automatic)\n- Correct Syntax - whether the code compiles without errors\n- Executability - whether the code runs without runtime errors\n- Keyword Matching - whether the code contains expected elements from 
the prompt\n\n### Code Quality Metrics\n- Function/Class Definitions - proper code structure\n- Error Handling - try/except blocks, input validation\n- Documentation - docstrings, comments\n- Imports - proper library usage\n- Code Length - reasonable number of lines\n\n### Scoring System\n- Correct Syntax: **3 points**\n- Runs without errors: **2 points**\n- Contains expected elements: **2 points**\n- Has function/class definitions: **1 point**\n- Has error handling: **1 point**\n- Has documentation: **1 point**\n- **Maximum: 10 points**\n\n## Ansible Configuration\n\nCreate `tests/ansible/inventory.ini` with:\n\n```ini\n[all]\nlocalhost ansible_connection=local\n```\n\n## API Integration Examples\n\n### Ollama (local)\n```csv\nllama3:8b,http://localhost:11434/api/chat,,,false,Llama 3 8B\n```\n\n### OpenAI API\n```csv\ngpt-4,https://api.openai.com/v1/chat/completions,Authorization,Bearer sk-your-key,false,OpenAI GPT-4\n```\n\n### Anthropic Claude\n```csv\nclaude-3,https://api.anthropic.com/v1/messages,x-api-key,your-key,false,Claude 3\n```\n\n### Local Server\n```csv\nlocal-model,http://localhost:8080/generate,,,false,Local Model\n```\n\n## Project Structure\n\n```\nallama/\n├── allama/               # Main package\n│   ├── __init__.py      # Package initialization\n│   ├── main.py          # Main module\n│   ├── config_loader.py # Configuration loading and generation\n│   └── runner.py        # Test runner implementation\n├── tests/               # Test files\n│   └── test_allama.py   # Unit tests\n├── models.csv           # Model configurations\n├── config.json          # Main configuration (auto-generated if missing)\n├── prompts.json         # Test prompts (auto-generated if missing)\n├── pyproject.toml       # Project metadata and dependencies\n├── Makefile             # Common tasks\n└── README.md            # This file\n```\n\n## Example Output\n\nAfter running the benchmark, you'll get:\n\n1. **Console Output**: Summary of test results\n2. 
**HTML Report**: Detailed report with code examples and metrics\n3. **CSV/JSON**: Raw data for further analysis\n\n## Getting Help\n\nIf you encounter any issues or have questions:\n\n1. Check the [issues](https://github.com/wronai/allama/issues) page\n2. Create a new issue with detailed information about your problem\n\n## Contributing\n\nContributions are welcome! Please read our [Contributing Guidelines](CONTRIBUTING.md) for details on how to contribute to this project.\n\n## License\n\nThis project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.\n\n## Acknowledgments\n\n- Thanks to all the open-source projects that made this possible\n- Special thanks to the Ollama team for their amazing work\n\n---\n\n\u003cdiv align=\"center\"\u003e\n  Made with ❤️ by the Allama team\n\u003c/div\u003e","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwronai%2Fallama","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwronai%2Fallama","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwronai%2Fallama/lists"}