https://github.com/dagshub/open-source-data-pipeline
A repository that holds machine learning projects that use DVC for data pipeline orchestration
ai dagshub dvc dvc-pipeline hacktoberfest hacktoberfest-2023 machine-learning machinelearning mlops open-source
- Host: GitHub
- URL: https://github.com/dagshub/open-source-data-pipeline
- Owner: DagsHub
- Created: 2023-09-07T09:27:43.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-09-13T08:58:34.000Z (about 2 years ago)
- Last Synced: 2025-01-22T03:46:03.605Z (9 months ago)
- Topics: ai, dagshub, dvc, dvc-pipeline, hacktoberfest, hacktoberfest-2023, machine-learning, machinelearning, mlops, open-source
- Homepage:
- Size: 8.79 KB
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Open Source Data Pipeline
**Welcome to DagsHub's Data Pipeline contribution project for Hacktoberfest 2023!**

In this exciting Hacktoberfest challenge, DagsHub invites you to build data pipelines using DVC for automation and versioning of Open Source Machine Learning projects.
## What is DagsHub?
[DagsHub](https://dagshub.com/) is a centralized platform to host and manage machine learning projects including code, data, models, experiments, annotations, model registry, and more! DagsHub does the MLOps heavy lifting for its users. Every repository comes with configured S3 storage, an experiment tracking server, and an annotation workspace - all using popular open-source tools like MLflow, DVC, Git, and Label Studio.
## **What's the Challenge?**
DagsHub is excited to introduce the DVC Data Pipeline Contribution Challenge. In this challenge, we invite you to contribute DVC (Data Version Control) data pipelines to open-source projects on DagsHub. DVC pipelines are essential for efficiently managing, versioning, and sharing data workflows in machine learning and data science projects.
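
For readers new to DVC, here is a minimal sketch of what a pipeline definition in `dvc.yaml` can look like. The stage names, script paths, and data paths below are hypothetical placeholders, not taken from any particular project; adapt them to the repository you choose.

```yaml
# dvc.yaml: a hypothetical two-stage pipeline (all paths are illustrative)
stages:
  prepare:
    # Command DVC runs for this stage
    cmd: python src/prepare.py data/raw.csv data/prepared.csv
    # Files the stage reads; a change here marks the stage as outdated
    deps:
      - src/prepare.py
      - data/raw.csv
    # Files the stage produces; DVC tracks and caches them
    outs:
      - data/prepared.csv
  train:
    cmd: python src/train.py data/prepared.csv models/model.pkl
    deps:
      - src/train.py
      - data/prepared.csv   # output of `prepare`, so `train` runs after it
    outs:
      - models/model.pkl
    metrics:
      - metrics.json:
          cache: false      # keep the metrics file in Git rather than the DVC cache
```

With a file like this in place, running `dvc repro` executes the stages in dependency order and records the resulting file hashes in `dvc.lock`, and `dvc dag` prints the stage graph.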
## **How Can You Participate?**
Here's a step-by-step guide to get involved in this challenge:
1. **Choose a Project**: Explore [open-source projects on DagsHub](https://dagshub.com/explore/repos) and select one that interests you. It can be any project that utilizes data pipelines or would benefit from one.
2. **Create the DVC Pipeline**: Fork the project to your own account, then use DVC to design and **execute** a data pipeline that suits the project's needs. Ensure it follows best practices for data versioning, reproducibility, and scalability.
3. **Document Your Pipeline**: As you build the pipeline, maintain clear and concise documentation describing its purpose, data sources, processing steps, and any dependencies. This documentation is crucial for future users and contributors and should be added to the project's README file.
4. **Tag Your Project**: Add relevant labels to the DagsHub repository and its files, including `dvc`, `data-pipeline`, `hacktoberfest`, and `hacktoberfest-2023`.
5. **Submit Your Contribution**: Open a Pull Request to the project on DagsHub.
6. **Proof of Contribution**: Open a Pull Request [here](https://github.com/DagsHub/open-source-data-pipelin) with the `README.md`, `dvc.yaml` and `dvc.lock` files and a link to the DagsHub repo.
## **Why Join the Challenge?**
Participating in the DagsHub DVC Data Pipeline Contribution Challenge offers numerous benefits:
- **Skill Enhancement**: Sharpen your DVC skills and gain hands-on experience in creating robust data pipelines.
- **Collaborative Learning**: Collaborate with open-source project maintainers and fellow contributors, expanding your network and knowledge.
- **Contribution to Open Source**: Contribute to the open-source community by enhancing the data workflows of valuable projects.
- **Visibility**: Showcase your expertise to a wider audience within the data science and machine learning community.