# QualBench — CI for AI-Generated Code

## AI Cost Tracking

![AI Cost](https://img.shields.io/badge/AI%20Cost-$2.85-green) ![AI Model](https://img.shields.io/badge/AI%20Model-openrouter%2Fqwen%2Fqwen3-coder-next-lightgrey)

This project uses AI-generated code. Total cost: **$2.8500** with **19** AI commits.

Generated on 2026-04-09 using [openrouter/qwen/qwen3-coder-next](https://openrouter.ai/models/openrouter/qwen/qwen3-coder-next)

---

> **Correct code is not the same as mergeable code.**
> eslint + code review, but for AI. Add to your pipeline in 2 minutes.

[![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](LICENSE)
[![Dataset: v0](https://img.shields.io/badge/dataset-v0_(10_issues)-green.svg)](dataset/)
[![CI](https://img.shields.io/badge/CI-qualbench--action-orange.svg)](action/)

---

## 60 seconds to your first score

```bash
pip install qualbench
qualbench quickstart
```

No config, no API keys. QualBench evaluates your current diff and prints a Quality Score.

## Add to CI in 2 minutes

```yaml
# .github/workflows/qualbench.yml
name: QualBench
on: [pull_request]
jobs:
  quality-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: semcod/qualbench-action@v1
        with:
          tool: prollama
          fail_on_score: 70
```

With this workflow, every pull request, including AI-generated ones, gets a quality review comment. Set `fail_on_score` and the pipeline fails when the Quality Score is below that threshold.

```
🧠 QualBench Review

Quality Score: 78/100

  ❌ Complexity increased (+12%)
  ⚠ Security: 1 new medium-severity finding
  ✔ Tests pass, no regressions

Verdict: needs_review
```

## CI/CD Examples

### GitHub Action (recommended)

```yaml
# .github/workflows/qualbench.yml
name: QualBench
on: [pull_request]
jobs:
  quality-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: semcod/qualbench-action@v1
        with:
          tool: prollama
          fail_on_score: 70
```

### GitLab CI

```yaml
# .gitlab-ci.yml
qualbench:
  stage: test
  image: python:3.12-slim
  before_script:
    - pip install qualbench
  script:
    - qualbench run --tool prollama --json --fail-on-score 70
  only:
    - merge_requests
```

### Azure DevOps

```yaml
# azure-pipelines.yml
steps:
  - task: UsePythonVersion@0
    inputs:
      versionSpec: '3.12'
  - script: |
      pip install qualbench
      qualbench run --tool prollama --json --fail-on-score 70
    displayName: 'QualBench Quality Check'
```

### Jenkins

```groovy
// Jenkinsfile
stage('Quality Check') {
    steps {
        sh '''
            pip install qualbench
            qualbench run --tool prollama --fail-on-score 70
        '''
    }
}
```

### CircleCI

```yaml
# .circleci/config.yml
version: 2.1
jobs:
  quality:
    docker:
      - image: python:3.12-slim
    steps:
      - checkout
      - run: pip install qualbench
      - run: qualbench run --tool prollama --fail-on-score 70
workflows:
  quality-check:
    jobs:
      - quality
```

## The problem

AI coding tools resolve 70–80% of benchmark tasks, but most AI-generated PRs are still not mergeable without human fixes. Existing benchmarks ask "do the tests pass?"; none ask "would a senior developer approve this PR?"

## Six dimensions of production readiness

| Dimension | What it measures | Weight |
|-----------|-----------------|--------|
| **Correctness** | All tests pass, no regressions | 25% |
| **Mergeability** | Would a senior dev merge this? (1–5) | 25% |
| **Security** | New vulnerabilities introduced | 15% |
| **Code quality** | Complexity delta, dead code | 15% |
| **Iterations** | Attempts to reach acceptable output | 10% |
| **Cost efficiency** | USD per successful patch | 10% |

**Verdicts:** `ready_to_merge` (≥85), `needs_review` (65–84), `not_merge_ready` (<65).

## CLI

```bash
qualbench run --tool prollama          # score the current diff
qualbench run --tool prollama --json   # portable JSON output
qualbench run --mode cheap             # use the lowest-cost models
qualbench quickstart                   # first score in 60 seconds
qualbench compare my_tool              # compare against the leaderboard
qualbench info                         # dataset summary
qualbench doctor                       # check dependencies
```

## One portable format everywhere

The CLI, the API, and the GitHub Action all emit the same JSON schema. See [docs/schema.md](docs/schema.md).

## Adding your tool

```bash
cp runners/template.py runners/my_tool.py
# Implement run() in runners/my_tool.py so it returns the portable schema
qualbench run --tool my_tool
# Then submit a PR with your results
```

## License

Licensed under Apache-2.0.

<!-- taskill:status:start -->

## Status

_Last updated by [taskill](https://github.com/oqlos/taskill) at 2026-04-25 13:46 UTC_

| Metric | Value |
|---|---|
| HEAD | `c199cf4` |
| Coverage | — |
| Failing tests | — |
| Commits in last cycle | 26 |

> The repository received a mix of fixes, refactors, and configuration updates. Tests were hardened (mocks added), a JSON test was fixed, many vallm/style issues and magic numbers were addressed, and release-related features (v0.3.0, Supervisor AI, new runners) were added. PyQual configuration thresholds and gates were adjusted, and documentation/TODOs were updated after a PyQual run.

<!-- taskill:status:end -->
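The *Six dimensions of production readiness* table lists a weight per dimension and three verdict bands. The README does not say how each raw measurement becomes a number. The Python sketch below therefore works under stated assumptions and is not QualBench's actual implementation.

It assumes each dimension has already been normalized to a 0–100 sub-score, and that the Quality Score is the weighted sum of those sub-scores. The helper names (`quality_score`, `verdict`, `passes_gate`, `mergeability_subscore`) are hypothetical, as is the linear mapping of the 1–5 mergeability rating onto 0–100.

```python
# Hypothetical sketch: QualBench's real scoring code may normalize and combine
# dimensions differently. Weights are copied from the README table, in percent.
WEIGHTS: dict[str, int] = {
    "correctness": 25,
    "mergeability": 25,
    "security": 15,
    "code_quality": 15,
    "iterations": 10,
    "cost_efficiency": 10,
}


def mergeability_subscore(rating: int) -> int:
    """Assumed linear mapping of a 1-5 reviewer rating onto 0-100."""
    if not 1 <= rating <= 5:
        raise ValueError("mergeability rating must be between 1 and 5")
    return (rating - 1) * 25


def quality_score(subscores: dict[str, float]) -> float:
    """Weighted sum of per-dimension sub-scores, each assumed to lie in [0, 100]."""
    missing = WEIGHTS.keys() - subscores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= subscores[name] <= 100:
            raise ValueError(f"{name} sub-score out of range: {subscores[name]}")
    # Integer percent weights keep the weighted sum exact for integer
    # sub-scores; only the final division by 100 can round.
    return sum(WEIGHTS[name] * subscores[name] for name in WEIGHTS) / 100


def verdict(score: float) -> str:
    """Map a 0-100 score onto the README's verdict bands."""
    if score >= 85:
        return "ready_to_merge"
    if score >= 65:
        return "needs_review"
    return "not_merge_ready"


def passes_gate(score: float, fail_on_score: float) -> bool:
    """The pipeline fails when the score is below the fail_on_score threshold."""
    return score >= fail_on_score


if __name__ == "__main__":
    example = {
        "correctness": 100,  # e.g. all tests pass, no regressions
        "mergeability": mergeability_subscore(4),
        "security": 80,
        "code_quality": 60,
        "iterations": 50,
        "cost_efficiency": 90,
    }
    score = quality_score(example)
    print(score, verdict(score), passes_gate(score, 70))  # 78.75 needs_review True
```

With these illustrative sub-scores the weighted score is 78.75. That falls in the `needs_review` band, so it would pass a `fail_on_score: 70` gate but fail a threshold of 80. Keeping the weights as integer percentages mirrors the table and keeps the weighted sum free of floating-point noise.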