{"id":19329726,"url":"https://github.com/outerbounds/metaflow-card-hf-dataset","last_synced_at":"2026-06-11T22:31:26.897Z","repository":{"id":249697431,"uuid":"829616253","full_name":"outerbounds/metaflow-card-hf-dataset","owner":"outerbounds","description":null,"archived":false,"fork":false,"pushed_at":"2024-08-19T21:20:18.000Z","size":17,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-02-24T06:46:39.649Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/outerbounds.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-16T19:45:52.000Z","updated_at":"2024-08-19T21:20:21.000Z","dependencies_parsed_at":"2024-11-10T02:29:42.756Z","dependency_job_id":"b29c46b4-d920-4fb8-a99a-fa46a0e35ca8","html_url":"https://github.com/outerbounds/metaflow-card-hf-dataset","commit_stats":null,"previous_names":["outerbounds/metaflow-card-hf-dataset"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/outerbounds/metaflow-card-hf-dataset","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/outerbounds%2Fmetaflow-card-hf-dataset","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/outerbounds%2Fmetaflow-card-hf-dataset/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/outerbounds%2Fmetaflow-card-hf-dataset/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposit
ories/outerbounds%2Fmetaflow-card-hf-dataset/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/outerbounds","download_url":"https://codeload.github.com/outerbounds/metaflow-card-hf-dataset/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/outerbounds%2Fmetaflow-card-hf-dataset/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34221150,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-11T02:00:06.485Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-10T02:29:39.243Z","updated_at":"2026-06-11T22:31:26.879Z","avatar_url":"https://github.com/outerbounds.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Installation\n\n```bash\npip install metaflow-card-hf-dataset\n```\n\n## Usage\n\nAfter installing the module, you can add any HuggingFace dataset to your Metaflow tasks by using the `@huggingface_dataset` decorator. There are two ways to use the decorator:\n- Via the `id` argument, which is the dataset ID from HuggingFace.\n- Via the `artifact_id` argument, which is the name of a FlowSpec artifact that contains the dataset ID.\n\nUse the first if your workflow always reads from the same HuggingFace dataset ID. 
\nUse the second if your workflow passes in dataset IDs as parameters or changes them dynamically.\n\n```python\nfrom metaflow import FlowSpec, step, huggingface_dataset, Parameter\n\nclass Flow(FlowSpec):\n\n    eval_ds = Parameter('eval_ds', default='argilla/databricks-dolly-15k-curated-en', help='HuggingFace dataset id.')\n    # Dynamically input: python flow.py run --eval_ds lighteval/mmlu\n\n    @huggingface_dataset(id=\"princeton-nlp/SWE-bench\")\n    @step\n    def start(self):\n        self.another_one = 'wikimedia/wikipedia'\n        self.next(self.end)\n\n    @huggingface_dataset(artifact_id=\"another_one\") # Use the dataset ID set to an artifact var.\n    @huggingface_dataset(artifact_id=\"eval_ds\") # Use the dataset ID passed as a parameter.\n    @step\n    def end(self):\n        pass\n\nif __name__ == '__main__':\n    Flow()\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fouterbounds%2Fmetaflow-card-hf-dataset","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fouterbounds%2Fmetaflow-card-hf-dataset","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fouterbounds%2Fmetaflow-card-hf-dataset/lists"}