Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/outerbounds/metaflow-card-hf-dataset
- Host: GitHub
- URL: https://github.com/outerbounds/metaflow-card-hf-dataset
- Owner: outerbounds
- License: apache-2.0
- Created: 2024-07-16T19:45:52.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-08-19T21:20:18.000Z (3 months ago)
- Last Synced: 2024-08-20T01:12:54.195Z (3 months ago)
- Language: Python
- Size: 16.6 KB
- Stars: 0
- Watchers: 4
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## Installation
```bash
pip install metaflow-card-hf-dataset
```

## Usage
After installing the module, you can add any HuggingFace dataset to your Metaflow tasks by using the `@huggingface_dataset` decorator. There are two ways to use the decorator:
- Via the `id` argument, which is the dataset ID from HuggingFace.
- Via the `artifact_id` argument, which is the name of a FlowSpec artifact that contains the dataset ID.

Use the first if your workflow always reads from the same HuggingFace dataset ID. Use the second if your workflow passes dataset IDs in as parameters or changes them dynamically, as in the example below.

```python
from metaflow import FlowSpec, step, huggingface_dataset, Parameter

class Flow(FlowSpec):

    eval_ds = Parameter('eval_ds', default='argilla/databricks-dolly-15k-curated-en', help='HuggingFace dataset id.')
    # Dynamically input: python flow.py run --eval_ds lighteval/mmlu

    @huggingface_dataset(id="princeton-nlp/SWE-bench")
    @step
    def start(self):
        self.another_one = 'wikimedia/wikipedia'
        self.next(self.end)

    @huggingface_dataset(artifact_id="another_one")  # Use the dataset ID set to an artifact var.
    @huggingface_dataset(artifact_id="eval_ds")      # Use the dataset ID passed as a parameter.
    @step
    def end(self):
        pass

if __name__ == '__main__':
    Flow()
```
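With the flow saved as `flow.py` (an assumed filename), a minimal sketch of running it and inspecting the resulting cards uses Metaflow's standard `run` and `card view` commands:

```bash
# Run the flow, overriding the eval_ds parameter from the command line.
python flow.py run --eval_ds lighteval/mmlu

# Render the cards attached to the end step of the latest run.
python flow.py card view end
```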