{"id":20580267,"url":"https://github.com/maif/eurybia","last_synced_at":"2025-10-08T16:11:00.873Z","repository":{"id":37450274,"uuid":"487858444","full_name":"MAIF/eurybia","owner":"MAIF","description":"⚓ Eurybia monitors model drift over time and securizes model deployment with data validation ","archived":false,"fork":false,"pushed_at":"2024-10-24T09:42:04.000Z","size":34499,"stargazers_count":213,"open_issues_count":11,"forks_count":25,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-09-25T15:00:11.686Z","etag":null,"topics":["data-drift","data-validation","datadrift-classifier","domain-classifier","drift","drift-detection","html-report","machine-learning","model-drift","model-monitoring","production-machine-learning","python"],"latest_commit_sha":null,"homepage":"https://maif.github.io/eurybia/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MAIF.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-05-02T13:36:59.000Z","updated_at":"2025-09-09T12:30:50.000Z","dependencies_parsed_at":"2024-10-19T16:41:08.387Z","dependency_job_id":"89b7441f-cc63-414f-a153-1d581ecaad01","html_url":"https://github.com/MAIF/eurybia","commit_stats":null,"previous_names":[],"tags_count":14,"template":false,"template_full_name":null,"purl":"pkg:github/MAIF/eurybia","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MAIF%2Feurybia","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MAIF%2Feurybia/tags","releases_url":"https://repos.ecos
yste.ms/api/v1/hosts/GitHub/repositories/MAIF%2Feurybia/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MAIF%2Feurybia/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MAIF","download_url":"https://codeload.github.com/MAIF/eurybia/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MAIF%2Feurybia/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278973248,"owners_count":26078193,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-08T02:00:06.501Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-drift","data-validation","datadrift-classifier","domain-classifier","drift","drift-detection","html-report","machine-learning","model-drift","model-monitoring","production-machine-learning","python"],"created_at":"2024-11-16T06:22:13.996Z","updated_at":"2025-10-08T16:11:00.851Z","avatar_url":"https://github.com/MAIF.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003c!-- Tests --\u003e\n  \u003ca href=\"https://github.com/MAIF/eurybia/workflows/Build%20%26%20Test/badge.svg\"\u003e\n    \u003cimg src=\"https://github.com/MAIF/eurybia/workflows/Build%20%26%20Test/badge.svg\" alt=\"tests\"\u003e\n  \u003c/a\u003e\n  \u003c!-- PyPi 
--\u003e\n  \u003ca href=\"https://img.shields.io/pypi/v/eurybia\"\u003e\n    \u003cimg src=\"https://img.shields.io/pypi/v/eurybia\" alt=\"pypi\"\u003e\n  \u003c/a\u003e\n  \u003c!-- Python Version --\u003e\n  \u003ca href=\"https://img.shields.io/pypi/pyversions/eurybia\"\u003e\n    \u003cimg src=\"https://img.shields.io/pypi/pyversions/eurybia\" alt=\"pyversion\"\u003e\n  \u003c/a\u003e\n  \u003c!-- License --\u003e\n  \u003ca href=\"https://img.shields.io/pypi/l/eurybia\"\u003e\n    \u003cimg src=\"https://img.shields.io/pypi/l/eurybia\" alt=\"license\"\u003e\n  \u003c/a\u003e\n  \u003c!-- Doc --\u003e\n  \u003ca href=\"https://eurybia.readthedocs.io/en/latest/\"\u003e\n    \u003cimg src=\"https://readthedocs.org/projects/eurybia/badge/?version=latest\" alt=\"doc\"\u003e\n  \u003c/a\u003e\n  \u003c!-- Pre-commit --\u003e\n  \u003ca href=\"https://github.com/pre-commit/pre-commit\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit\" alt=\"pre-commit\"\u003e\n  \u003c/a\u003e\n\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"https://github.com/MAIF/eurybia/blob/master/docs/_static/eurybia-fond-clair.png?raw=true\" width=\"300\" title=\"eurybia-logo\"\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003cp align=\"center\"\u003e\n    \u003ca href=\"https://eurybia.readthedocs.io/en/latest/report.html\"\u003eView Demo\u003c/a\u003e\n    ·\n    \u003ca href=\"https://eurybia.readthedocs.io/en/latest/\"\u003eDocumentation\u003c/a\u003e\n    ·\n    \u003ca href=\"https://medium.com/oss-by-maif/eurybia-maif-releases-a-new-open-source-solution-for-quality-ia-models-in-production-57bd0266a77e\"\u003eEurybia Quick Tour\u003c/a\u003e\n    ·\n    \u003ca href=\"https://www.kdnuggets.com/2022/07/detecting-data-drift-ensuring-production-ml-model-quality-eurybia.html\"\u003eTutorial Article\u003c/a\u003e\n  \u003c/p\u003e\n\u003c/div\u003e\n\n## 🔍 Overview\n\n\n**Eurybia** is a Python library which aims to 
help with:\n  - **Detecting data drift and model drift**\n  - **Validating data** before putting a model in production.\n\nEurybia addresses challenges of **industrialisation** and **maintainability** of machine learning models over time.\nThus, it contributes to better model monitoring, model auditing and, more generally, AI governance.\n\nTo do so, Eurybia generates an HTML report:\n\n\u003cp align=\"center\"\u003e\n    \u003cimg src=\"https://github.com/MAIF/eurybia/blob/master/docs/_static/report_scrolling.gif?raw=true\" width=\"800\" title=\"eurybia_gif\"\u003e\n\u003c/p\u003e\n\n## 🕐 Quickstart\n\nThe 3 steps to display results:\n\n- Step 1: Declare SmartDrift Object\n  \u003e You need to pass at least 2 pandas DataFrames in order to instantiate the SmartDrift class (current or production dataset, baseline or training dataset)\n\n```python\nfrom eurybia import SmartDrift\n\nsd = SmartDrift(\n    df_current=df_current,\n    df_baseline=df_baseline,\n    deployed_model=my_model,  # Optional: put drift results in perspective with feature importance on the deployed model\n    encoding=my_encoder,  # Optional: encoder to use with deployed_model, if one is needed\n    dataset_names={\n        \"df_current\": \"Current dataset Name\",\n        \"df_baseline\": \"Baseline dataset Name\",\n    },  # Optional: Names for outputs\n)\n```\n\n- Step 2: Compile Model\n  \u003e There are different ways to compile the SmartDrift object\n\n```python\nsd.compile(\n    full_validation=True,  # Optional: to save time, leave the default False value. 
If True, analyze consistency on modalities between columns.\n    date_compile_auc=\"01/01/2022\",  # Optional: useful when computing the drift for a time that is not now\n    datadrift_file=\"datadrift_auc.csv\",  # Optional: name of the csv file that contains the performance history of data drift\n)\n```\n\n- Step 3: Generate report\n  \u003e The report's content will be enriched if you provided the datascience model (deployed) and its encoder.\n  Note that providing the deployed_model and encoding will only produce useful results if the datasets are both usable by the model (i.e. all features are present, dtypes are correct, etc).\n\n```python\nsd.generate_report(\n    output_file=\"output/my_report_name.html\",\n    title_story=\"my_report_title\",\n    title_description=\"my_report_subtitle\",  # Optional: add a subtitle to describe report\n    project_info_file=\"project_info.yml\",  # Optional: add information on report\n)\n```\n\n[Report Example](https://eurybia.readthedocs.io/en/latest/report.html)\n\n## 🛠 Installation\n\nEurybia is intended to work with Python versions 3.9 to 3.12. 
Installation can be done with pip:\n\n```\npip install eurybia\n```\n\nIf you encounter **compatibility issues** you may check the corresponding section in the Eurybia documentation [here](https://eurybia.readthedocs.io/en/latest/installation-instructions/index.html).\n\n## 🔥 Features\n\n- Display a clear, understandable and insightful report:\n\n\u003cp align=\"center\"\u003e\n  \u003cimg align=\"left\" src=\"https://github.com/MAIF/eurybia/blob/master/docs/_static/eurybia_features_importance.PNG?raw=true\" width=\"28%\"/\u003e\n  \u003cimg src=\"https://github.com/MAIF/eurybia/blob/master/docs/_static/eurybia_scatter_plot.PNG?raw=true\" width=\"28%\" /\u003e\n  \u003cimg align=\"right\" src=\"https://github.com/MAIF/eurybia/blob/master/docs/_static/eurybia_auc_plot.PNG?raw=true\" width=\"20%\" /\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg align=\"left\" src=\"https://github.com/MAIF/eurybia/blob/master/docs/_static/eurybia_contribution_plot.PNG?raw=true\" width=\"28%\" /\u003e\n  \u003cimg src=\"https://github.com/MAIF/eurybia/blob/master/docs/_static/eurybia-fond-clair.png?raw=true\" width=\"15%\" /\u003e\n  \u003cimg align=\"right\" src=\"https://github.com/MAIF/eurybia/blob/master/docs/_static/eurybia_univariate_continuous.PNG?raw=true\" width=\"28%\" /\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg align=\"left\" src=\"https://github.com/MAIF/eurybia/blob/master/docs/_static/eurybia_contribution_violin.PNG?raw=true\" width=\"28%\" /\u003e\n  \u003cimg src=\"https://github.com/MAIF/eurybia/blob/master/docs/_static/eurybia_univariate_categorial.PNG?raw=true\" width=\"28%\" /\u003e\n  \u003cimg align=\"right\" src=\"https://github.com/MAIF/eurybia/blob/master/docs/_static/eurybia_auc_evolution.PNG?raw=true\" width=\"28%\" /\u003e\n\u003c/p\u003e\n\n\n- Allow data scientists to quickly explore drift thanks to **dynamic reports** that make it easy to navigate between drift detection and dataset features.\n\n**In a nutshell** 
:\n\n- Monitor drift using a scheduler (like Airflow)\n\n- Evaluate the level of data drift\n\n- Facilitate collaboration between data analysts and data scientists, and easily share and discuss results with non-data users\n\n**More precisely** :\n- **Render** data drift and model drift over time through :\n    - Feature importance: features that best discriminate between the two datasets\n    - Scatter plot: feature importance relative to drift importance\n    - Dataset analysis: distribution comparison between variables from the baseline dataset and the newest one\n    - Predicted values analysis: distribution comparison between targets from the baseline dataset and the newest one\n    - Performance of the data drift classifier\n    - Features contribution for the data drift classifier\n    - AUC evolution: comparison of the data drift classifier's AUC across different periods.\n    - Model performance evolution: your model performances over time\n\n## 📢 Why we made Eurybia\n\nVisualizing the life cycle of a machine learning model helps explain why Eurybia matters. During their life, ML models go through the following phases: Model learning, Model deployment, Model monitoring.\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/MAIF/eurybia/blob/master/docs/_static/lifecycle_ml_models.png?raw=true\" width=\"90%\" /\u003e\n\u003c/p\u003e\n\nLet X and Y denote the features and the target of a model, and P(X, Y) their joint distribution. P(X, Y) can be decomposed as: P(X, Y) = P(Y|X)P(X), with P(Y|X) the conditional probability of the output given the model features, and P(X) the probability density of the model features.\n\nData Validation : Validate that data used for production prediction are similar to training data or test data before deployment. With formulas, P(Xtraining) should be similar to P(XtoDeploy) \u003cbr /\u003e\nData drift : Evolution of the production data over time compared to training or test data before deployment. 
With formulas, compare P(Xtraining) to P(XProduction) \u003cbr /\u003e\nModel drift : Evolution of model performance over time due to a change in the statistical properties of the target (Concept drift), or due to a change in the data (Data drift). With formulas, a change in P(Y|XProduction) compared to P(Y|Xtraining) is concept drift, and a change in P(Y,XProduction) compared to P(Y,Xtraining) is model drift.\n\nEurybia helps data analysts and data scientists to collaborate through a report that allows them to discuss drift monitoring and data validation before deploying a model into production. \u003cbr /\u003e\nEurybia also contributes to data science auditing by displaying useful information about any model and data in a single report.\n\n## ⚙️ How Eurybia detects data drift\n\n**Eurybia** works mainly with a binary classification model (named datadrift classifier) that tries to predict whether a sample belongs to the training dataset (or baseline dataset) or to the production dataset (or current dataset).\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/MAIF/eurybia/blob/master/docs/_static/data_drift_detection.png?raw=true\" width=\"90%\" /\u003e\n\u003c/p\u003e\n\nAs shown in the diagram above, there are 2 datasets, the baseline and the current one. These are the datasets we wish to compare in order to assess whether data drift has occurred. On the first one we create a column named “target” filled only with 0; on the second dataset we add the same column, this time filled only with 1. \u003cbr /\u003e\nOur goal is to build a binary classification model on top of those 2 datasets (concatenated). Once trained, this model will be helpful to tell if there is any data drift. To do so, we look at the model performance through the AUC metric. The greater the AUC, the greater the drift. 
(AUC = 0.5 means no data drift and AUC close to 1 means data drift is occurring)\n\nThe explainability of this datadrift classifier makes it possible to prioritise features that are important for drift and to focus on those that have the most impact on the model in production.\n\nTo use Eurybia to monitor drift over time, you can use a scheduler to make computations automatically and periodically. \u003cbr /\u003e\nOne of the schedulers you can use is Apache Airflow. To use it, you can read the [official documentation](https://airflow.apache.org/) and read blogs like this one: [Getting started with Apache Airflow](https://towardsdatascience.com/getting-started-with-apache-airflow-df1aa77d7b1b)\n\n## 🔬 Built With\nThis section lists libraries used in Eurybia.\n- [Shapash](https://github.com/MAIF/shapash/tree/master/shapash)\n- [Panel](https://github.com/holoviz/panel)\n- [Plotly](https://github.com/plotly/plotly.py)\n- [Catboost](https://github.com/catboost/catboost)\n\n## 📖  Tutorials\n\nThis GitHub repository offers a lot of tutorials to let you quickly start using Eurybia.\n\n### Overview\n- [Overview to compile Eurybia and generate Report](tutorial/tutorial01-Eurybia-overview.ipynb)\n\n### Validate Data before model deployment\n- [**Eurybia** Data Validation](tutorial/data_validation/tutorial01-data-validation.ipynb)\n- [Validate data in production for model deployment](tutorial/data_validation/tutorial02-data-validation-iteration.ipynb)\n\n### Measure and analyze Data drift\n- [**Eurybia** to monitor Data Drift](tutorial/data_drift/tutorial01-datadrift-over-years.ipynb)\n- [Detect high data drift over years](tutorial/data_drift/tutorial02-datadrift-high-datadrift.ipynb)\n\n### Measure and analyze Model drift\n- [**Eurybia** to detect Model Drift](tutorial/model_drift/tutorial01-modeldrift.ipynb)\n- [Detect high model drift over years](tutorial/model_drift/tutorial02-modeldrift-high-datadrift.ipynb)\n\n### More details about report and plots\n- [Customize colors in report 
and plots](tutorial/common/tuto-common01-colors.ipynb)\n- [Use **Shapash** Webapp to understand datadrift classifier](tutorial/common/tuto-common02-shapash-webapp.ipynb)\n\n## 🔭 Roadmap\n- [ ] Concept Drift\n\nDetecting concept drift and providing analysis and explainability of this drift. An issue is open: [Add Concept Drift](https://github.com/MAIF/eurybia/issues/8)\n- [ ] API mode\n\nAdapting Eurybia for models consumed in API mode. An issue is open: [Adapt Eurybia to API mode](https://github.com/MAIF/eurybia/issues/9)\n\nIf you want to contribute, you can contact us in the [discussion tab](https://github.com/MAIF/eurybia/discussions)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaif%2Feurybia","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmaif%2Feurybia","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaif%2Feurybia/lists"}