https://github.com/shcheklein/dvc-sql-example
A simple DVC pipeline to fetch from an SQL DB, cache as parquet for reproducibility and faster processing
- Host: GitHub
- URL: https://github.com/shcheklein/dvc-sql-example
- Owner: shcheklein
- License: mit
- Created: 2023-06-21T03:25:49.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-12-15T17:59:53.000Z (about 2 years ago)
- Last Synced: 2025-01-25T22:35:42.637Z (about 1 year ago)
- Topics: azure, dvc, dvc-pipeline, example, pipeline
- Language: Python
- Homepage: https://dvc.org
- Size: 3.91 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# DVC Azure SQL example
A simple DVC pipeline to fetch from an SQL DB, cache as parquet for
reproducibility and faster processing.
## Install
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
Depending on your setup and machine, you might also need to install an ODBC
driver. The steps vary by OS, so refer to the Microsoft ODBC driver setup docs.
## Setup
Create an `.env` file with:
```env
AZURE_CONNECTION_STRING="DRIVER={ODBC Driver 18 for SQL Server};SERVER=.database.windows.net,1433;DATABASE=;UID=;PWD="
```
This file is in `.gitignore`.
> Note! There are better ways to manage Azure credentials (e.g. using AD
> or managed identities). This example is kept simple, but we recommend
> exploring other options.
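Before running the pipeline, it can help to check that the template above has actually been filled in. The following is a minimal sketch using only the standard library. The helper names (`read_env_value`, `parse_connection_string`, `incomplete_fields`) are hypothetical and not part of this repository, and the parser is deliberately naive: it assumes no `;` appears inside a braced value.

```python
# Hypothetical helpers for sanity-checking the .env file; not part of the repo.
REQUIRED_KEYS = ("DRIVER", "SERVER", "DATABASE", "UID", "PWD")


def read_env_value(text, name):
    """Return the value assigned to `name` in .env-style text, or None."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            value = value.strip()
            # Drop one pair of matching surrounding quotes, if present.
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
                value = value[1:-1]
            return value
    return None


def parse_connection_string(conn_str):
    """Split 'KEY=value;KEY=value' into a dict (naive: no ';' inside braces)."""
    fields = {}
    for part in conn_str.split(";"):
        if "=" in part:
            key, _, value = part.partition("=")
            fields[key.strip().upper()] = value.strip()
    return fields


def incomplete_fields(conn_str):
    """List required keys that are missing, empty, or still look like the template."""
    fields = parse_connection_string(conn_str)
    problems = []
    for key in REQUIRED_KEYS:
        value = fields.get(key, "")
        # The template's SERVER starts with "." until a server name is filled in.
        if not value or (key == "SERVER" and value.startswith(".")):
            problems.append(key)
    return problems
```

For example, passing the contents of `.env` to `read_env_value(..., "AZURE_CONNECTION_STRING")` and the result to `incomplete_fields` returns an empty list once every field has a value.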
## Running
Run `dvc repro` or `dvc exp run` to reproduce the pipeline. Use the regular
`dvc push`, `dvc pull`, etc., to save and load artifacts.
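
The stage script itself is not reproduced in this README. As a rough sketch of what a fetch stage run by `dvc repro` could look like, the code below queries the database and writes the result as parquet. It is an illustration under assumptions, not the repository's actual code: the function names are hypothetical, `pyodbc`, `pandas`, and a parquet engine such as `pyarrow` are assumed to be installed, and `AZURE_CONNECTION_STRING` is assumed to have been exported into the environment from `.env`.

```python
import os


def fetch_table(conn, query):
    """Run `query` on a DB-API connection; return (column names, rows as tuples)."""
    cursor = conn.cursor()
    try:
        cursor.execute(query)
        columns = [col[0] for col in cursor.description]
        # Convert driver-specific row objects to plain tuples.
        rows = [tuple(row) for row in cursor.fetchall()]
    finally:
        cursor.close()
    return columns, rows


def write_parquet(columns, rows, out_path):
    """Cache the fetched rows as a parquet file that DVC can track."""
    # Assumption: pandas and a parquet engine (e.g. pyarrow) are installed.
    import pandas as pd

    frame = pd.DataFrame.from_records(rows, columns=columns)
    frame.to_parquet(out_path, index=False)


def run_stage(query, out_path):
    """Fetch `query` from the Azure SQL database and cache it at `out_path`."""
    # Assumption: pyodbc is installed and the variable is already in the environment.
    import pyodbc

    conn = pyodbc.connect(os.environ["AZURE_CONNECTION_STRING"])
    try:
        columns, rows = fetch_table(conn, query)
    finally:
        conn.close()
    write_parquet(columns, rows, out_path)
```

Keeping the fetch separate from the parquet write means later stages can depend only on the cached file, so they do not need database access when the data is restored with `dvc pull`.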