{"id":28961733,"url":"https://github.com/getappmap/navie-benchmark","last_synced_at":"2025-06-24T02:04:48.502Z","repository":{"id":262720316,"uuid":"845225837","full_name":"getappmap/navie-benchmark","owner":"getappmap","description":"Navie benchmarks","archived":false,"fork":false,"pushed_at":"2025-05-21T16:50:16.000Z","size":17650,"stargazers_count":0,"open_issues_count":13,"forks_count":1,"subscribers_count":3,"default_branch":"develop","last_synced_at":"2025-05-21T17:30:52.378Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/getappmap.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-20T20:46:39.000Z","updated_at":"2025-05-21T16:50:19.000Z","dependencies_parsed_at":"2024-11-13T23:31:56.577Z","dependency_job_id":null,"html_url":"https://github.com/getappmap/navie-benchmark","commit_stats":null,"previous_names":["getappmap/navie-benchmark"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/getappmap/navie-benchmark","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getappmap%2Fnavie-benchmark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getappmap%2Fnavie-benchmark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getappmap%2Fnavie-benchmark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getappmap%2Fnavie-benchmark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/owners/getappmap","download_url":"https://codeload.github.com/getappmap/navie-benchmark/tar.gz/refs/heads/develop","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/getappmap%2Fnavie-benchmark/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261589912,"owners_count":23181437,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-24T02:04:47.691Z","updated_at":"2025-06-24T02:04:48.487Z","avatar_url":"https://github.com/getappmap.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## AppMap Navie SWE Bench Solver\n\nThis is a SWE Bench solver based on AppMap Navie.\n\n## Build Instructions\n\n### Clone with submodules\n\n```bash\ngit submodule update --init --recursive\n```\n\n### Create and activate virtualenv\n\nPython 3.12 is required.\n\n```bash\nvirtualenv .venv --python=python3.12\n. 
./.venv/bin/activate\n```\n\n### Install Python dependencies\n\n```bash\npip install \".[dev]\"\n```\n\n### Build appmap-js\n\n```bash\ncd submodules/appmap-js\nyarn \u0026\u0026 yarn build\n```\n\n## Solving Locally\n\n### Export LLM key\n\nOptions are:\n\n- `OPENAI_API_KEY`\n- `ANTHROPIC_API_KEY`\n- `GOOGLE_WEB_CREDENTIALS`\n\n### Export LLM model\n\nOptions are:\n\n- `gemini-1.5-pro-002`\n- `gpt-4o-2024-08-06`\n- `gpt-4o-2024-05-13`\n- `gpt-4.1-2025-04-14`\n- `o1-preview-2024-09-12`\n- `o1-mini-2024-09-12`\n- `claude-3-5-sonnet-20240620`\n- `claude-3-5-sonnet-20241022`\n- `claude-3-7-sonnet-20250219`\n\n### Run the \"smoke\" subset\n\n```bash\npython -m solver.solve \\\n    --instance_set smoke \\\n    --limit test_files=2 test_status_retry=2 code_files=2 code_status_retry=2 concurrency=1\n```\n\n## Solving in CI\n\nSolvers are provided as GitHub Workflows in the `.github/workflows` directory.\n\n### `solve.yml`\n\nThis is the main workflow to run the solver when you want to leverage the pre-generated synthetic test cases. That means that the results of this workflow are not independent of previous runs, which is by design.\n\nIt can be triggered manually or via a pull request with the `test-solve` label. The `test-solve` label is used for smoke\ntests of pull requests.\n\nThe workflow:\n\n1. Builds appmap-js dependencies\n2. Prepares matrix for parallel execution\n3. Runs solver instances across runners\n4. Collects and aggregates results\n5. Generates final report and artifacts\n\n**Options**\n\n- `use_synthetic_tests`: Whether to use synthetic tests (default true)\n- `observe_synthetic_tests`: Whether to observe synthetic test execution (default false)\n\n### `official.yml`\n\nRuns of this workflow are independent of previous runs. Existing synthetic tests that are present in the repo are not used by this workflow. They are created by the workflow itself in an initial step. 
Then, once synthetic tests are\navailable and no further tests are being discovered, the workflow moves on to finding solutions.\n\n## Run tests\n\n```bash\npython -m pytest solver/tests\n```\n\n## Logging\n\nMost logging is directed to files by default; otherwise, the console output from the project would be very verbose. Also, because the solver is run in parallel, the console output would be interleaved and hard to read.\n\nSo, you'll primarily find logs in the `solve` directory. Within this directory, the logs are organized by the instance id. Each Navie command is logged into a separate directory, with the inputs, options, and outputs in separate files.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgetappmap%2Fnavie-benchmark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgetappmap%2Fnavie-benchmark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgetappmap%2Fnavie-benchmark/lists"}