https://github.com/cmpadden/dagster-essentials-capstone

Dagster Essentials Capstone Project - Letterboxd Movie Summary

# Dagster Essentials Capstone - Movie Summaries

Collect movie metadata from Letterboxd and OpenSubtitles, and generate a full movie
summary using LLMs and LangChain.

## Getting started

First, install your Dagster code location as a Python package. With the `--editable` (`-e`) flag, pip installs the package in ["editable mode"](https://pip.pypa.io/en/latest/topics/local-project-installs/#editable-installs), so local code changes apply automatically as you develop.

```bash
pip install -e ".[dev]"
```

Then, start the Dagster UI web server:

```bash
dagster dev
```

Open http://localhost:3000 in your browser to see the project.

You can start writing assets in `dagster_essentials_capstone/assets.py`. The assets are automatically loaded into the Dagster code location as you define them.

## Development

### Adding new Python dependencies

You can specify new Python dependencies in `setup.py`.

### Unit testing

Tests live in the `dagster_essentials_capstone_tests` directory. Run them with `pytest`:

```bash
pytest dagster_essentials_capstone_tests
```

### Schedules and sensors

If you want to enable Dagster [Schedules](https://docs.dagster.io/concepts/partitions-schedules-sensors/schedules) or [Sensors](https://docs.dagster.io/concepts/partitions-schedules-sensors/sensors) for your jobs, the [Dagster Daemon](https://docs.dagster.io/deployment/dagster-daemon) process must be running. `dagster dev` starts it automatically.

Once your Dagster Daemon is running, you can start turning on schedules and sensors for your jobs.

### Exploring DuckDB

Use the DuckDB CLI to explore the contents of the local DuckDB database:

```sh
duckdb data/data.duckdb
```

## Deploy on Dagster Cloud

The easiest way to deploy your Dagster project is to use Dagster Cloud.

Check out the [Dagster Cloud Documentation](https://docs.dagster.cloud) to learn more.