{"id":14958366,"url":"https://github.com/octo-technology/ddui","last_synced_at":"2025-10-24T14:31:59.205Z","repository":{"id":74925032,"uuid":"182095627","full_name":"octo-technology/ddui","owner":"octo-technology","description":"Airflow's plugin for Data Science pipeline visualisation","archived":false,"fork":false,"pushed_at":"2019-12-27T12:58:02.000Z","size":735,"stargazers_count":8,"open_issues_count":0,"forks_count":0,"subscribers_count":11,"default_branch":"master_github","last_synced_at":"2025-01-31T02:22:01.224Z","etag":null,"topics":["airflow","airflow-plugin","datadriver","datascience","ml","pandas-python","scikit-learn"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/octo-technology.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-04-18T13:45:39.000Z","updated_at":"2024-05-17T08:42:36.000Z","dependencies_parsed_at":null,"dependency_job_id":"2b5376cc-a9e3-4058-a106-23870a4f5114","html_url":"https://github.com/octo-technology/ddui","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/octo-technology%2Fddui","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/octo-technology%2Fddui/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/octo-technology%2Fddui/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/octo-technology%2Fddui/manifests",
"owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/octo-technology","download_url":"https://codeload.github.com/octo-technology/ddui/tar.gz/refs/heads/master_github","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":237990573,"owners_count":19398453,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["airflow","airflow-plugin","datadriver","datascience","ml","pandas-python","scikit-learn"],"created_at":"2024-09-24T13:16:52.991Z","updated_at":"2025-10-24T14:31:49.185Z","avatar_url":"https://github.com/octo-technology.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![PyPI version](https://badge.fury.io/py/ddui.svg)](https://badge.fury.io/py/ddui)\n[![Anaconda-Server Badge](https://anaconda.org/octo/ddui/badges/latest_release_date.svg)](https://anaconda.org/octo/ddui)\n[![Anaconda-Server Badge](https://anaconda.org/octo/ddui/badges/version.svg)](https://anaconda.org/octo/ddui)\n[![Build Status](http://ec2-52-212-162-0.eu-west-1.compute.amazonaws.com:8080/buildStatus/icon?job=dd-ui-airflow%2Fmaster)](http://ec2-52-212-162-0.eu-west-1.compute.amazonaws.com:8080/job/dd-ui-airflow/job/master/)\n\n# Airflow's DataDriver plugin\n\n## from Pandas' dataframes to Airflow pipelines\n\n#### WHY : \n\nIn a machine learning project, there is a recurring problem \nwith the difference between local interactive modeling source code \nand production pipelines source code. 
\nIt is very error-prone and, as a consequence, time-consuming because we \nswitch constantly between experimentation and production.\n \nThe Datadriver project aims to solve this issue by providing glue code **based on Pandas and sklearn**\nfor modeling, **and on Airflow** for automation, scheduling, and monitoring of training \nand prediction pipelines.\n\n#### Plugin description\n\n**Datadriver UI (ddui)** is the Airflow plugin we developed to track our models. \nCombined with the Datadriver API (pyddapi), it offers a DAG view to track machine learning workflows (or dataflows).\n\nMore specifically, it shows the **Output** of any Airflow Task with many metrics and\ncharts: \n\n - choose a DAG to track\n![img/ddui_titan1.png](img/ddui_titan1.png)\n - select a task to see charts and describe metrics on the output_table\n![img/ddui_titan3.png](img/ddui_titan3.png) \n - look at histograms to verify that columns are correct (distributions, number of NAs,\n  unique values, etc.)\n![img/ddui_titan2.png](img/ddui_titan2.png) \n\n## Getting started\n\nfrom [PyPI.org](https://pypi.org/project/ddui/):\n\n    pip install ddui\n    ddui install # link the plugin into Airflow's plugins folder    \n\nfrom source:\n\n    git clone git_url_of_this_project \u0026\u0026 cd this_project\n    pip install -e .\n    ddui install\n    \nwith Docker:\n\n    ./run_docker.sh\n\n\n## Package modules\n\n    ddui/\n        dash_app -\u003e the application, defined as a Dash application with callbacks and event handling. 
It is imported in plugin.py later\n        dash_components -\u003e custom HTML components like a Panel or an Alert Div\n        orm -\u003e function to access the Airflow metastore and retrieve the DAG list and details\n        plot -\u003e functions using plotly that return a Graph object\n        plugin -\u003e defines the DataDriverUI plugin that implements Airflow's Plugin interface https://airflow.apache.org/plugins.html#interface\n        views -\u003e a FlaskAdminView that also implements Dash, so that plotly charts can be included in Airflow\n        \n\n###### dependencies graph\n\n![pydeps ddui](img/dependencies_analysis.png)       \n\n## Developer setup\n\nThere is an existing DAG in tests/dags that mocks the behavior of Datadriver's API, but\nwithout any dependency on pyddapi.\n\nYou can use it to develop the User Interface, using the script located in tests/dev_tools.\n    \n    cd tests/dev_tools\n    python run_webserver.py\n    \nIt runs the Airflow webserver and overrides AIRFLOW__CORE__DAGS_FOLDER to look into tests/dags.\n\n### Set up your virtual env\n\n    virtualenv venv\n    source venv/bin/activate\n    pip install -e .\n    pip install -r ci/tests_requirements.txt\n    ddui install\n    \n\n\n# Contributors\n\nThis repository is a part of the DataDriver project.\n \nSince 2016, many people have contributed to this project: \n\n* Ali El Moussawi\n* Arthur Baudry\n* Augustin Grimprel\n* Aurélien Massiot\n* Benjamin Joyen-Conseil\n* Constant Bridon\n* Cyril Vinot\n* Eric Biernat\n* Jeffrey Lucas\n* Nicolas Cavallo\n* Nicolas Frot\n* Matthieu Lagacherie  \n* Mehdi Houacine\n* Pierre Baonla Bassom\n* Rémy Frenoy\n* Romain Ayres\n* Samuel Rochette\n* Thomas Vial\n* Veltin Dupont \n* Vincent Levorato\n* Yannick Drant\n* Yannick Schini\n* Yasir 
Khan\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Focto-technology%2Fddui","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Focto-technology%2Fddui","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Focto-technology%2Fddui/lists"}