Machine Learning Project Setup with DVC, Hydra, GCP and Docker
https://github.com/danhenriquex/machine_learning_project

docker dvc gcp hydra machine-learning mlops


🤖 Machine Learning Project


Learning MLOps.


Overview • Technologies and Tools Used • Getting Started • Author


🚧 MLOps Project 🚀 Finished 🚧

### Overview


This project demonstrates the setup of a Data Version Control (DVC) system using Google Cloud Storage (GCS) as the remote storage for data versioning. It leverages Docker for containerization, Hydra for configuration management, and Poetry for dependency management.

### Technologies and Tools Used

- **Docker**: Used to containerize the application, making it portable and easier to deploy in different environments.
- **GCP (Google Cloud Platform)**: Google Cloud Storage is used to store raw data and manage versioning through DVC.
- **Hydra**: Manages the configuration schema for the project, helping with flexible and hierarchical configuration setups.
- **DVC**: Used for versioning datasets and model files. It helps in tracking changes and managing large files efficiently.
- **Poetry**: Handles dependency management, ensuring all required packages are installed in a virtual environment.
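As a rough illustration of how DVC can be pointed at a Google Cloud Storage bucket, the sketch below wraps the two relevant DVC commands in a small helper. The remote name `gcs-remote` and the bucket path `gs://my-bucket/dvc-store` are hypothetical placeholders, not values taken from this repository, and the project's actual setup may differ. It assumes DVC is installed with GCS support (for example via the `dvc[gs]` extra) and that Google Cloud credentials are already configured.

```bash
#!/usr/bin/env bash
# Sketch: register a GCS bucket as the default DVC remote.
# Remote name and bucket path are hypothetical placeholders.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1 = remote name, $2 = remote URL (gs://<bucket>/<path>)
setup_dvc_remote() {
  # Create the .dvc/ directory; expects an existing Git repository
  # that has not been initialized for DVC yet.
  run dvc init
  # Register the bucket and mark it as the default remote (-d).
  run dvc remote add -d "$1" "$2"
}

# Example (prints the commands instead of running them):
# DRY_RUN=1 setup_dvc_remote gcs-remote gs://my-bucket/dvc-store
```

Running the helper with `DRY_RUN=1` only prints the commands, which makes it easy to check them before touching a real bucket.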

### Getting Started

To get started with this project, follow these steps:

1. **Clone the Repository:**

```bash
git clone https://github.com/danhenriquex/machine_learning_project
cd machine_learning_project
```

2. **Create the environment:**

```bash
# Install and update dependencies
make lock-dependencies

# Build the Docker container
make build
```

3. **Update the dataset:**

```bash
# Update the dataset in GCP and push the changes to the GitHub repository
make version-data
```
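
The contents of the `version-data` target are not shown here. As a minimal sketch of what a data-versioning step with DVC typically involves, the helper below tracks a data path, commits the resulting pointer file to Git, and pushes both the data and the commit. The path `data/raw` and the commit message are hypothetical, and the real Makefile target may do this differently.

```bash
#!/usr/bin/env bash
# Sketch: version a dataset with DVC and Git.
# The data path and commit message are hypothetical placeholders.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1 = path to the data to track, $2 = Git commit message
version_data() {
  data_path="$1"
  message="$2"

  # Track the data with DVC; this writes <path>.dvc and updates
  # the .gitignore next to the data.
  run dvc add "$data_path"
  # Commit the small pointer file, not the data itself.
  run git add "${data_path}.dvc" "$(dirname "$data_path")/.gitignore"
  run git commit -m "$message"
  # Upload the data to the default DVC remote, then push the commit.
  run dvc push
  run git push
}

# Example (prints the commands instead of running them):
# DRY_RUN=1 version_data data/raw "Update raw data"
```

Git stores only the `.dvc` pointer file, while the data itself goes to the DVC remote, so the repository stays small even when the dataset is large.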

### Author

---