{"id":40700965,"url":"https://github.com/epiforecasts/llm-epi-composition","last_synced_at":"2026-01-21T12:04:21.422Z","repository":{"id":332125310,"uuid":"1114354839","full_name":"epiforecasts/llm-epi-composition","owner":"epiforecasts","description":"Evaluating LLM ability to compose epidemic models with and without validated components","archived":false,"fork":false,"pushed_at":"2026-01-12T14:28:18.000Z","size":1880,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-01-12T21:06:47.615Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/epiforecasts.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-12-11T08:56:41.000Z","updated_at":"2026-01-12T14:28:22.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/epiforecasts/llm-epi-composition","commit_stats":null,"previous_names":["epiforecasts/llm-epi-composition"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/epiforecasts/llm-epi-composition","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epiforecasts%2Fllm-epi-composition","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epiforecasts%2Fllm-epi-composition/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epiforecasts%2Fllm-epi-composition/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epiforecasts%2Fllm-epi-composition/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/epiforecasts","download_url":"https://codeload.github.com/epiforecasts/llm-epi-composition/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epiforecasts%2Fllm-epi-composition/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28632781,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-21T04:47:28.174Z","status":"ssl_error","status_checked_at":"2026-01-21T04:47:22.943Z","response_time":86,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-21T12:04:20.758Z","updated_at":"2026-01-21T12:04:21.415Z","avatar_url":"https://github.com/epiforecasts.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"# LLM Epidemiological Code Composition\n\nCan large language models write epidemiologically correct code for estimating the time-varying reproduction number (Rt)?\n\n## Study Design\n\nThis study evaluates LLM-generated code for Rt estimation across:\n- **2 models**: Claude Sonnet 4, Llama 3.1 8B\n- **4 scenarios**: From basic Rt estimation to complex multi-stream models\n- **5 framework conditions**: Stan, PyMC, Turing, EpiAware, plain R\n- **3 runs each**: 120 total submissions\n\n## Important Limitations\n\nThis study tests **zero-context, single-shot prompting** - a deliberately harsh baseline:\n\n- No documentation or examples provided\n- No iterative refinement\n- No tool use or agentic behavior\n- No access to framework codebases\n\nThis is **not** how practitioners would realistically use LLMs. Real-world use involves iteration, documentation in context, and error feedback. Results should be interpreted as a lower bound on capability.\n\nSee [Issue #1](https://github.com/epiforecasts/llm-epi-composition/issues/1) and [Issue #2](https://github.com/epiforecasts/llm-epi-composition/issues/2) for discussion.\n\n## Repository Structure\n\n```\n├── prompts/                 # Scenario prompts sent to LLMs\n│   ├── scenario_1a/         # Open method choice\n│   ├── scenario_1b/         # Renewal equation specified\n│   ├── scenario_2/          # Complex model (day-of-week, ascertainment)\n│   └── scenario_3/          # Multiple data streams\n├── experiments/             # LLM responses (120 submissions)\n├── evaluation/              # Execution evaluation\n│   ├── run_evaluation.R     # Evaluation script\n│   └── results/             # Execution results\n├── expert_review/           # Expert assessment materials\n│   ├── README.md            # Reviewer instructions\n│   ├── all_code.md          # All submissions (blinded)\n│   └── scoresheet.md        # Scoring forms\n├── reference_solutions/     # Ground truth implementations\n├── data/                    # Synthetic COVID-19 case data\n└── analysis_plan.md         # Pre-registered analysis plan\n```\n\n## Scenarios\n\n| Scenario | Description | Method |\n|----------|-------------|--------|\n| 1a | Estimate Rt from daily cases | Open choice |\n| 1b | Estimate Rt using renewal equation | Specified |\n| 2 | Rt with day-of-week effects, time-varying ascertainment, NegBin noise | Specified |\n| 3 | Joint model of cases, hospitalisations, deaths with shared Rt | Specified |\n\n## Expert Review\n\nExpert reviewers assess each submission for epidemiological correctness:\n\n1. **Method identification** (Scenario 1a): Renewal equation, Wallinga-Teunis, etc.\n2. **Departures from reference**: List differences from gold standard\n3. **Departure classification**:\n   - A: Equivalent alternative\n   - B: Minor error\n   - C: Major error\n   - D: Fundamental misunderstanding\n4. **Overall assessment**: Acceptable / Minor issues / Major issues / Incorrect\n\nSee [`expert_review/README.md`](expert_review/README.md) for full instructions.\n\n## Reproducing\n\n### Run experiments\n```bash\nRscript experiments/run_all.R\n```\n\n### Evaluate execution\n```bash\nRscript evaluation/run_evaluation.R\n```\n\n### Generate review materials\n```bash\nRscript expert_review/generate_review_materials.R\n```\n\n## License\n\nMIT\n\n## Citation\n\n*Paper forthcoming*\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fepiforecasts%2Fllm-epi-composition","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fepiforecasts%2Fllm-epi-composition","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fepiforecasts%2Fllm-epi-composition/lists"}