https://github.com/sayantikabanik/datajourney
Open-source Data Management Framework
allthingsopen dagster data-engineering flask gha holoviews intake llm mito open-source panel pytest vale
- Host: GitHub
- URL: https://github.com/sayantikabanik/datajourney
- Owner: sayantikabanik
- License: cc0-1.0
- Created: 2022-03-02T14:02:56.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2024-10-25T03:21:04.000Z (3 months ago)
- Last Synced: 2024-10-25T04:21:19.181Z (3 months ago)
- Topics: allthingsopen, dagster, data-engineering, flask, gha, holoviews, intake, llm, mito, open-source, panel, pytest, vale
- Language: HTML
- Homepage: https://sayantikabanik.github.io/DataJourney/
- Size: 80.1 MB
- Stars: 13
- Watchers: 2
- Forks: 2
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
[![License](https://img.shields.io/badge/license-CC0%201.0%20Universal-blue)](https://creativecommons.org/publicdomain/zero/1.0/)
[![Code of Conduct](https://img.shields.io/badge/Code_of_Conduct-Contributor%20Covenant-blue)](https://www.contributor-covenant.org/version/2/0/code_of_conduct/)\
[![CI](https://github.com/sayantikabanik/DataJourney/actions/workflows/CI.yml/badge.svg)](https://github.com/sayantikabanik/DataJourney/actions/workflows/CI.yml)
[![github-repo-stats](https://github.com/sayantikabanik/DataJourney/actions/workflows/github-repo-stats.yml/badge.svg)](https://github.com/sayantikabanik/DataJourney/actions/workflows/github-repo-stats.yml)
[![Deploy DataJourney Stats](https://github.com/sayantikabanik/DataJourney/actions/workflows/static.yml/badge.svg)](https://github.com/sayantikabanik/DataJourney/actions/workflows/static.yml)
[![Lint prose](https://github.com/sayantikabanik/DataJourney/actions/workflows/review.yml/badge.svg)](https://github.com/sayantikabanik/DataJourney/actions/workflows/review.yml)
[![Monitor GitHub API Rate Limit](https://github.com/sayantikabanik/DataJourney/actions/workflows/rate-limit-monitor.yml/badge.svg)](https://github.com/sayantikabanik/DataJourney/actions/workflows/rate-limit-monitor.yml)
### DataJourney
#### Short version
Design-first open source data management toolkit. Simplifies data workflows with modular, reproducible solutions.
#### Long version
DataJourney demonstrates how organizations can manage and use data effectively with open-source technologies. It's designed to help navigate the complex landscape of data tools, offering a structured approach to building **scalable** and **reproducible** data workflows.
Built on open-source principles, the framework guides users through essential steps, from **identifying** goals and **selecting tools** to **testing** and **customising** workflows. With its flexible, modular design, DataJourney can be tailored to individual needs, making it an invaluable toolkit for data professionals.
### Design Philosophy (LEGO)
Built with additive and subtractive capabilities, glued together with open source.
Each layer has a certain strength of communication built in:

- P0 (Base): Static home(s) to keep it together `(GitHub)`
- P1 (Tooling): Tooling, strings `(Powered by open source)`
- P2 (Maintenance + Monitoring): Env, automations `(Pixi + GHA)`
- P3 (Abstraction): Layer(s), CLI/task manager for users to interact with `(Pixi)`

![DJ Design](assets/design/dj_vision.png)
### Current workflows covered
{✨ = Experimental, ✅ = Implemented}

| Status | Workflow Description |
|--------|------------------------------------------------------------------------------------------------------------------|
| ✅ | `Python Packaging framework` design principles |
| ✅ | `GitHub actions` configured |
| ✅ | `Vale.sh` configured at PR level |
| ✅ | `Pre-commit hooks` configured for code linting/formatting |
| ✨ | `Hello world` LLM design example based on [LangChain](https://python.langchain.com/) |
| ✅ | Environment management via [pixi](https://prefix.dev/) |
| ✅ | Reading data from online sources using [intake](https://github.com/intake/intake) |
| ✅ | Sample pipeline built using [Dagster](https://github.com/dagster-io/dagster) |
| ✅ | Building Dashboard using [holoviews](https://holoviews.org/gallery/index.html) + [panel](https://panel.holoviz.org/reference/index.html) |
| ✅ | Exploratory data analysis (EDA) using [mito](https://www.trymito.io/) |
| ✅ | Web UI built on [Flask](https://flask.palletsprojects.com/en/3.0.x/) |
| ✅ | Web UI re-done and expanded with [FastHTML](https://docs.fastht.ml/) |
| ✅ | Leverage AI models to analyse data using [GitHub AI models Beta](https://docs.github.com/en/github-models/prototyping-with-ai-models) |

### Quickly getting started with DataJourney
- Clone DJ: `git@github.com:sayantikabanik/DataJourney.git`
- Generate & add `GITHUB_TOKEN`, instructions [here](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens#creating-a-personal-access-token-classic)
  - Required to run the LLM workflows
- Switch directory: `cd DataJourney`
- Download pixi: [prefix.dev](https://prefix.dev/)
- Activate env: `pixi shell`
- Install DJ framework locally: `pixi run DJ_package`
- List all the tasks: `pixi task list`
- Execute a task from the list: `pixi run <task_name>`
- Execute a task with verbosity enabled: `pixi run -v <task_name>`

### Active `tasks` under DJ
- GIT_TOKEN_CHECK
- DJ_package
- DJ_pre_commit
- DJ_dagster
- DJ_fasthtml_app
- DJ_flask_app
- DJ_mito_app
- DJ_panel_app
- DJ_llm_analysis
- DJ_hello_world_langchain

### About pre-commit-hooks and activating
As the name suggests, pre-commit hooks run before each commit and format the code to PEP standards. [More details](https://pre-commit.com/)

```shell
pixi run DJ_pre_commit
```

### Executing LLM script: Generate stock price recommendations
```shell
pixi run DJ_llm_analysis
```

### Execute pre-configured Dagster pipeline
```shell
pixi run DJ_dagster
```
![Dagit UI output](assets/pipeline/dagster_ui.png)

### Panel app
```shell
pixi run DJ_panel_app
```

*NOTE:*
The dashboard generated is exported to HTML and saved as [stock_price_twilio_dashboard](analytics_framework%2Fdashboard%2Fstock_price_twilio_dashboard.html)

![Panel app output](assets/dashboard/panel_app_stock.png)
### Mito
To explore further, visit [trymito.io](https://docs.trymito.io/)
```shell
pixi run DJ_mito_app
```

[//]: # (![mito output](assets/pipeline/mito_graph.png "Graph generated via mitosheet") ![mito output operation](assets/pipeline/mito_operations.png "Operations performed via mitosheet"))
### Display all data sources present via web UI
```shell
# Run FastHTML app
pixi run DJ_fasthtml_app
```
![data_sources_fasthtml.png](assets/pipeline/data_sources_fasthtml.png)