{"id":50325740,"url":"https://github.com/saeg/qpa-v2","last_synced_at":"2026-05-29T06:02:50.048Z","repository":{"id":355493482,"uuid":"1228246557","full_name":"saeg/qpa-v2","owner":"saeg","description":"qpa is a tool that scans open-source quantum computing projects and automatically detects which known quantum algorithm patterns they implement.","archived":false,"fork":false,"pushed_at":"2026-05-03T21:30:42.000Z","size":3350,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-03T23:27:45.458Z","etag":null,"topics":["quantum-computing","quantum-software-engineering","quantum-software-patterns"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/saeg.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-03T19:35:01.000Z","updated_at":"2026-05-03T21:30:46.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/saeg/qpa-v2","commit_stats":null,"previous_names":["saeg/qpa-v2"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/saeg/qpa-v2","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saeg%2Fqpa-v2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saeg%2Fqpa-v2/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saeg%2Fqpa-v2/releases","manifests_url":
"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saeg%2Fqpa-v2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/saeg","download_url":"https://codeload.github.com/saeg/qpa-v2/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saeg%2Fqpa-v2/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33639056,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-29T02:00:06.066Z","response_time":107,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["quantum-computing","quantum-software-engineering","quantum-software-patterns"],"created_at":"2026-05-29T06:02:48.359Z","updated_at":"2026-05-29T06:02:50.042Z","avatar_url":"https://github.com/saeg.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# qpa - Quantum Patterns Analyzer\n\nqpa is an open-source Python tool that mines quantum computing pattern usage from open-source projects. 
It builds a knowledge base from major quantum frameworks, discovers and clones relevant GitHub repositories, and uses semantic search to detect pattern implementations in Jupyter Notebooks.\n\nThe full pipeline runs end-to-end with a single command:\n\n```bash\njust all\n```\n\n## How it works\n\nThe tool operates in three stages:\n\n**Stage 1 - Data collection.** The GitHub API is queried to find Python-based quantum repositories (filtered by stars, contributors, and activity). Repositories are cloned locally and the PlanQK Pattern Atlas is downloaded.\n\n**Stage 2 - Knowledge base construction.** Core quantum concepts are extracted from five seed frameworks (Qiskit, PennyLane, Classiq, Qiskit Algorithms, Qiskit Machine Learning) and classified against the pattern catalog. A *dynamic KB* is then built automatically: for each target project, qpa scans its library source code and promotes public functions whose docstrings semantically match a seed KB concept into a project-specific extension of the KB.\n\n**Stage 3 - Pattern detection.** Jupyter Notebooks are converted to Python scripts and scanned across seven semantic channels. Results are aggregated into a structured report.\n\n### Matching channels\n\n| Channel | Threshold | What is matched |\n|---|---|---|\n| `name` | 0.88 | AST-extracted function call names vs. KB concept short names |\n| `summary` | 0.78 | File comment block vs. KB concept docstring summaries |\n| `title` | 0.76 | Notebook heading vs. KB concept summaries |\n| `pattern_desc` | 0.80 | File comment block vs. pattern intent text |\n| `defined_doc` | 0.85 | Docstrings of classes/functions *defined* in the file vs. KB summaries |\n| `internal_keywords` | 0.78 | KB concept internal token signatures vs. call-site names |\n| `internal_comments` | 0.75 | KB concept inline comments vs. 
file comment block |\n\nThresholds are runtime-tunable in `data/analysis_config.json` without code changes.\n\n### Two-phase pipeline\n\nPhase 1 (`just build-dynamic-kbs`) scans each target project's library source with the seed KB and writes a project-specific dynamic KB under `data/dynamic_kb/\u003cproject\u003e/`. Phase 2 (`just run_main`) runs the full seven-channel analysis on converted notebooks, loading both the seed KB and all dynamic KBs automatically.\n\n## Current dataset\n\n| Metric | Value |\n|---|---|\n| Projects searched | 84 |\n| Python scripts analyzed | 1,363 |\n| Projects with matches | 41 |\n| Total pattern instances | 3,593 |\n| Files with matches | 576 |\n| Patterns detected (of 22) | 22 |\n| Avg. similarity score | 0.894 |\n\n**Qrisp held-out evaluation** (framework excluded from the KB):\n\n| | Precision | Recall | F1 |\n|---|---|---|---|\n| Micro | 0.800 | 0.667 | 0.727 |\n| Macro | 0.705 | 0.564 | 0.611 |\n\n## Requirements\n\n- Python 3.12+\n- [just](https://github.com/casey/just#installation) command runner\n- Git\n- A GitHub Personal Access Token in `.env`:\n\n```\nGITHUB_TOKEN=\"ghp_YourTokenHere\"\n```\n\n## Quickstart\n\n```bash\njust all        # full pipeline from scratch (60–90 min on first run)\n```\n\nOr run stages individually:\n\n```bash\njust search-repos              # GitHub discovery → data/filtered_repo_list.txt\njust clone-filtered            # clone/update repos from the list\njust identify-qiskit           # extract seed KB from Qiskit\njust identify-pennylane        # extract seed KB from PennyLane\njust identify-classiq          # extract seed KB from Classiq\njust identify-qiskit-algorithms\njust enrich-kb                 # add internal keywords/comments to seed KB\njust preprocess-notebooks      # extract .ipynb → .py\njust convert-archived-notebooks\njust build-dynamic-kbs         # phase 1: build per-project dynamic KBs\njust run_main                  # phase 2: detect patterns in notebooks\njust report           
         # generate docs/final_pattern_report.md\njust evaluate-qrisp-two-phase  # run held-out Qrisp evaluation\n```\n\n## Tuning thresholds\n\nEdit `data/analysis_config.json` to enable/disable channels or adjust thresholds, then re-run `just run_main`. No code changes required.\n\n## Project layout\n\n```\nqpa/\n├── src/\n│   ├── analysis/           # run.py (main matcher), generate_report.py\n│   ├── core_concepts/      # seed KB extraction pipelines per framework\n│   ├── data_acquisition/   # GitHub search, pattern atlas download\n│   ├── preprocessing/      # notebook conversion, repo cloning\n│   ├── evaluation/         # metrics, per-framework evaluation pipelines\n│   └── conf/               # config.py (paths, model name)\n├── scripts/\n│   ├── build_dynamic_kbs.py       # phase 1 dynamic KB builder\n│   ├── build_qrisp_name_kb.py     # manual Qrisp KB (evaluation target)\n│   ├── enrich_kb_with_internals.py\n│   └── evaluate_qrisp_metrics.py\n├── data/\n│   ├── analysis_config.json       # runtime channel config\n│   ├── dynamic_kb/                # auto-built per-project KBs\n│   ├── knowledge_base/            # classified seed KB CSVs\n│   └── qrisp_ground_truth.csv     # held-out evaluation labels\n├── converted_notebooks/           # .py files extracted from notebooks\n├── target_github_projects/        # cloned repositories\n├── paper/                         # companion papers (LaTeX)\n├── docs/\n│   └── final_pattern_report.md    # latest analysis report\n├── justfile                       # all pipeline commands\n└── pyproject.toml\n```\n\n## Key outputs\n\n| File | Description |\n|---|---|\n| `docs/final_pattern_report.md` | Main analysis report |\n| `data/quantum_concept_matches_with_patterns.csv` | Full match dataset |\n| `data/dynamic_kb/\u003cproject\u003e/` | Per-project dynamic KB entries |\n| `data/report/*.csv` | Individual breakdown tables |\n\n## Utility commands\n\n```bash\njust compare-runs                        # diff two archived 
runs\njust list-runs                           # list all run archives\njust kappa                               # compute inter-rater agreement\njust build-paper                         # compile LaTeX paper to PDF\njust build-dynamic-kb-project \u003cname\u003e     # rebuild one project's KB\njust evaluate-all-frameworks             # precision/recall on all GT frameworks\njust clean                               # remove all generated artifacts\n```\n\n## Embedding model\n\n`all-mpnet-base-v2` (sentence-transformers). Max 384 tokens per input. Chosen for reproducibility: there are no API calls, which reduces non-determinism.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaeg%2Fqpa-v2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsaeg%2Fqpa-v2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaeg%2Fqpa-v2/lists"}