https://github.com/longnguyen010203/databricks-etl-pipeline
- Host: GitHub
- URL: https://github.com/longnguyen010203/databricks-etl-pipeline
- Owner: longNguyen010203
- Created: 2024-06-09T07:37:03.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-06-09T07:43:11.000Z (about 1 year ago)
- Last Synced: 2025-01-23T09:11:18.739Z (5 months ago)
- Language: Python
- Size: 1.95 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# etl_pipeline
This is a [Dagster](https://dagster.io/) project scaffolded with [`dagster project scaffold`](https://docs.dagster.io/getting-started/create-new-project).
## Getting started
First, install your Dagster code location as a Python package. The `-e` (`--editable`) flag tells pip to install the package in ["editable mode"](https://pip.pypa.io/en/latest/topics/local-project-installs/#editable-installs), so local code changes apply automatically as you develop.
```bash
pip install -e ".[dev]"
```
Then, start the Dagster UI web server:
```bash
dagster dev
```
Open http://localhost:3000 in your browser to see the project.
You can start writing assets in `etl_pipeline/assets.py`. The assets are automatically loaded into the Dagster code location as you define them.
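For example, a pair of software-defined assets might look like the sketch below. The asset names and data are purely illustrative and are not part of the scaffolded project.
```python
# etl_pipeline/assets.py -- illustrative assets; the names and data are examples
# only, not part of the scaffolded project.
from dagster import asset


@asset
def raw_numbers():
    # Upstream asset; in a real pipeline this would read from an external source.
    return [1, 2, 3]


@asset
def number_total(raw_numbers):
    # Downstream asset; Dagster wires the dependency from the parameter name.
    return sum(raw_numbers)
```
Once saved, both assets appear in the asset graph in the Dagster UI and can be materialized from there.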
## Development
### Adding new Python dependencies
You can specify new Python dependencies in `setup.py`.
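Runtime dependencies go in `install_requires`, while dev-only tools belong in the `dev` extra installed by `pip install -e ".[dev]"`. The sketch below assumes the standard layout produced by the scaffold; the `pandas` entry is only an example of where a new dependency would be added.
```python
# setup.py -- sketch of the scaffolded layout; `pandas` is an example addition.
from setuptools import find_packages, setup

setup(
    name="etl_pipeline",
    packages=find_packages(exclude=["etl_pipeline_tests"]),
    install_requires=[
        "dagster",
        "dagster-cloud",
        "pandas",  # example of a newly added runtime dependency
    ],
    extras_require={"dev": ["dagster-webserver", "pytest"]},
)
```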
### Unit testing
Tests are in the `etl_pipeline_tests` directory and you can run tests using `pytest`:
```bash
pytest etl_pipeline_tests
```
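A test can materialize an asset in-process with Dagster's `materialize` helper. The sketch below assumes the illustrative `raw_numbers` asset from the example above.
```python
# etl_pipeline_tests/test_assets.py -- minimal sketch; assumes the illustrative
# `raw_numbers` asset shown earlier in etl_pipeline/assets.py.
from dagster import materialize

from etl_pipeline.assets import raw_numbers


def test_raw_numbers_materializes():
    # materialize() runs the asset in-process and returns a result object.
    result = materialize([raw_numbers])
    assert result.success
```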
### Schedules and sensors
If you want to enable Dagster [Schedules](https://docs.dagster.io/concepts/partitions-schedules-sensors/schedules) or [Sensors](https://docs.dagster.io/concepts/partitions-schedules-sensors/sensors) for your jobs, the [Dagster Daemon](https://docs.dagster.io/deployment/dagster-daemon) process must be running. This is done automatically when you run `dagster dev`.
Once your Dagster Daemon is running, you can start turning on schedules and sensors for your jobs.
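As a sketch, a daily schedule over all assets could be wired into whichever module defines the project's `Definitions` object. The job and schedule names below are illustrative, not part of the scaffold.
```python
# Illustrative schedule definition; names are examples only.
from dagster import (
    Definitions,
    ScheduleDefinition,
    define_asset_job,
    load_assets_from_modules,
)

from etl_pipeline import assets

all_assets = load_assets_from_modules([assets])

# A job that materializes every asset, triggered daily at midnight by the schedule.
daily_refresh_job = define_asset_job(name="daily_refresh_job", selection="*")
daily_refresh_schedule = ScheduleDefinition(
    job=daily_refresh_job,
    cron_schedule="0 0 * * *",
)

defs = Definitions(
    assets=all_assets,
    jobs=[daily_refresh_job],
    schedules=[daily_refresh_schedule],
)
```
With the daemon running, the schedule can then be switched on from the Dagster UI.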
## Deploy on Dagster Cloud
The easiest way to deploy your Dagster project is to use Dagster Cloud.
Check out the [Dagster Cloud Documentation](https://docs.dagster.cloud) to learn more.