{"id":20369483,"url":"https://github.com/wschella/helm-data-downloader","last_synced_at":"2026-05-29T21:31:18.455Z","repository":{"id":194784489,"uuid":"687933403","full_name":"wschella/helm-data-downloader","owner":"wschella","description":"Download (all) evaluation data from the Stanford HELM benchmarking effort.","archived":false,"fork":false,"pushed_at":"2024-05-16T14:02:06.000Z","size":861,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-15T06:13:43.585Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wschella.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-06T09:59:56.000Z","updated_at":"2024-05-16T14:02:10.000Z","dependencies_parsed_at":"2023-09-15T04:43:59.468Z","dependency_job_id":"3bbec232-fbce-48dc-b27f-8f67e169180b","html_url":"https://github.com/wschella/helm-data-downloader","commit_stats":null,"previous_names":["wschella/helm-data-downloader"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wschella%2Fhelm-data-downloader","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wschella%2Fhelm-data-downloader/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wschella%2Fhelm-data-downloader/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wschella%2Fhelm-data-downloader/manifests"
,"owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wschella","download_url":"https://codeload.github.com/wschella/helm-data-downloader/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241915592,"owners_count":20041771,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-15T00:48:01.259Z","updated_at":"2026-05-29T21:31:18.201Z","avatar_url":"https://github.com/wschella.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# HELM Data Downloader (helmdd)\n\nDownload evaluation data from the Stanford _Holistic Evaluation of Language Models (HELM)_ project, including _HELM Lite_, _HELM Instruct_, and _HEIM_.\n\nAt the time of writing, the HELM evaluation effort is at release v0.4.0 and contains around 8500 evaluation runs, totalling more than 800GiB of prompts, model outputs, and metadata.\n\nThis tool lets you download all of it easily.\n\n## Install\n\n```shell\npip install git+https://github.com/wschella/helm-data-downloader\n```\n\nor with Rye:\n\n```shell\nrye install --git https://github.com/wschella/helm-data-downloader.git helmdd\n```\n\n## Usage\n\nRun the downloader:\n\n```shell\n$ helmdd --release latest\nFound 8526 runs online. No runs already downloaded found. 
Downloading all.\n  2%|██▋              | 171/8526 [07:05\u003c4:56\n  3%|██▋              | 172/8526 [07:07\u003c4:53\n  3%|██▋              | 173/8526 [07:10\u003c4:45\n  3%|██▊...\n```\n\nTo download _HELM Lite_ data, use `--project lite`; the same goes for `heim` and `instruct`.\n\n### Options\n\n```shell\n$ helmdd --help\nusage: helmdd [-h] [--project PROJECT_ID] [--release RELEASE] [--output-dir OUTPUT_DIR]\n              [--storage-url STORAGE_URL] [--redownload] [--max-runs MAX_RUNS]\n              [--dry-run] [--files FILES [FILES ...]]\n\nHELM Data Downloader\n\noptions:\n  -h, --help            show this help message and exit\n  --project PROJECT_ID  Project to download data from. Options: classic, heim, lite,\n                        instruct, all. Default: lite.\n  --release RELEASE     Release version to download data from. Example: v0.2.4. The\n                        default is 'latest', which will search for the latest release.\n  --output-dir OUTPUT_DIR\n                        Output directory to store downloaded data. Default: ./helm-\n                        data/\u003cPROJECT\u003e/\u003cRELEASE\u003e/\n  --storage-url STORAGE_URL\n                        The URL to download data from. Default behaviour is to search\n                        for it on the HELM website.It can be changed to e.g. use local\n                        mirror with similar folder structure, or adapted when HELM\n                        changes their storage location and this tool has not been\n                        updated yet.\n  --redownload          Redownload all data, even if present already.\n  --max-runs MAX_RUNS   Maximum number of runs to download.\n  --dry-run             Dry run. Do not download any runs.\n  --files FILES [FILES ...]\n                        Files to download for each run. Default: [scenario_state.json,\n                        instances.json, display_predictions.json]. 
Available:\n                        [run_spec.json, scenario.json, scenario_state.json, stats.json,\n                        instances.json, display_predictions.json,\n                        display_requests.json].You can also put 'all' to download all\n                        files.\n```\n\n### Further notes\n\nNot yet possible:\n\n- filtering which runs to download (as on the HELM/HEIM web pages)\n- selecting which data to download (prompts, model outputs, metadata)\n\nBoth should be easy to add yourself if needed. Feel free to open a PR.\n\nAnother useful file is `schema.json` (e.g. \u003chttps://storage.googleapis.com/crfm-helm-public/benchmark_output/releases/v0.4.0/schema.json\u003e), which lists all the models, metrics, adapters, etc.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwschella%2Fhelm-data-downloader","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwschella%2Fhelm-data-downloader","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwschella%2Fhelm-data-downloader/lists"}