{"id":24310312,"url":"https://github.com/giordanodaloisio/demv","last_synced_at":"2026-04-29T16:37:47.460Z","repository":{"id":45146402,"uuid":"444145512","full_name":"giordanoDaloisio/demv","owner":"giordanoDaloisio","description":"Debiaser for Multiple Variables, a model- and data- agnostic method to improve fairness in binary and multi-class classification tasks","archived":false,"fork":false,"pushed_at":"2024-04-22T09:44:27.000Z","size":8667,"stargazers_count":0,"open_issues_count":1,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-08-18T23:16:23.216Z","etag":null,"topics":["bias-mitigation","numpy","pandas","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/giordanoDaloisio.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-01-03T17:29:28.000Z","updated_at":"2024-04-22T09:44:31.000Z","dependencies_parsed_at":"2023-02-16T20:01:08.097Z","dependency_job_id":"ac832e40-632e-463d-9295-739144bfcf00","html_url":"https://github.com/giordanoDaloisio/demv","commit_stats":{"total_commits":63,"total_committers":3,"mean_commits":21.0,"dds":0.2222222222222222,"last_synced_commit":"8cc9bf88a15830a6c96e009338986a867999d7bf"},"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/giordanoDaloisio/demv","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/giordanoDaloisio%2Fdemv","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/giordanoDaloisio%2Fdemv/
tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/giordanoDaloisio%2Fdemv/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/giordanoDaloisio%2Fdemv/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/giordanoDaloisio","download_url":"https://codeload.github.com/giordanoDaloisio/demv/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/giordanoDaloisio%2Fdemv/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":272014563,"owners_count":24858725,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-25T02:00:12.092Z","response_time":1107,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bias-mitigation","numpy","pandas","python"],"created_at":"2025-01-17T06:11:13.877Z","updated_at":"2026-04-29T16:37:47.405Z","avatar_url":"https://github.com/giordanoDaloisio.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DEMV : Debiaser for Multiple Variables\n\n![GitHub last commit](https://img.shields.io/github/last-commit/giordanoDaloisio/demv2022?style=for-the-badge) [![License: AGPL v3](https://img.shields.io/badge/License-AGPL%20v3-blue.svg?style=for-the-badge)](https://www.gnu.org/licenses/agpl-3.0)\n\n## Table of contents\n\n- [Installation](#installation)\n- [Citation 
request](#citation-request)\n- [General info](#general-info)\n- [DEMV class description](#demv-class-description)\n  - [Attributes](#attributes)\n  - [Methods](#methods)\n  - [Example usage](#example-usage)\n- [Credits](#credits)\n- [License](#license)\n\n## Installation\n\n### Pip\n\nThe easiest way to install DEMV is from the PyPI repository:\n\n```shell\npip install demv\n```\n\n### Manual installation\n\nClone this repository and then install the following libraries:\n\n- `pandas`\n- `numpy`\n- `scikit-learn`\n\nThe source code of DEMV is inside the `DEMV` folder.\n\n## Citation request\n\nPlease cite our papers if you use DEMV in your experiments:\n\n_Giordano d’Aloisio, Andrea D’Angelo, Antinisca Di Marco, Giovanni Stilo, Debiaser for Multiple Variables to enhance fairness in classification tasks, Information Processing \u0026 Management,\nVolume 60, Issue 2, 2023, 103226, ISSN 0306-4573, \u003chttps://doi.org/10.1016/j.ipm.2022.103226\u003e_\n\n```bibtex\n@article{daloisio_debiaser_2023,\ntitle = {Debiaser for Multiple Variables to enhance fairness in classification tasks},\njournal = {Information Processing \u0026 Management},\nvolume = {60},\nnumber = {2},\npages = {103226},\nyear = {2023},\nissn = {0306-4573},\ndoi = {https://doi.org/10.1016/j.ipm.2022.103226},\nurl = {https://www.sciencedirect.com/science/article/pii/S0306457322003272},\nauthor = {Giordano d’Aloisio and Andrea D’Angelo and Antinisca {Di Marco} and Giovanni Stilo},\nkeywords = {Machine learning, Bias and Fairness, Multi-class classification, Preprocessing algorithm, Equality},\n}\n```\n\n_d’Aloisio, G., Stilo, G., Di Marco, A., D’Angelo, A. (2022). Enhancing Fairness in Classification Tasks with Multiple Variables: A Data- and Model-Agnostic Approach. In: Boratto, L., Faralli, S., Marras, M., Stilo, G. (eds) Advances in Bias and Fairness in Information Retrieval. BIAS 2022. Communications in Computer and Information Science, vol 1610. Springer, Cham. 
\u003chttps://doi.org/10.1007/978-3-031-09316-6_11\u003e_\n\n```bibtex\n@inproceedings{d2022enhancing,\n  title={Enhancing Fairness in Classification Tasks with Multiple Variables: A Data- and Model-Agnostic Approach},\n  author={d’Aloisio, Giordano and Stilo, Giovanni and Di Marco, Antinisca and D’Angelo, Andrea},\n  booktitle={International Workshop on Algorithmic Bias in Search and Recommendation},\n  pages={117--129},\n  year={2022},\n  organization={Springer}\n}\n```\n\n## General info\n\nDEMV is a Debiaser for Multiple Variables that aims to increase the fairness of any given dataset, with either a binary or a categorical (multi-class) label and one or more sensitive variables, while keeping the accuracy of the classifier as high as possible.\nThe main idea behind the method is that, to effectively enhance a classifier’s fairness during pre-processing, it is necessary to consider all possible combinations of the values of the sensitive variables and of the label when defining the so-called _sensitive groups_.\n\nWe approach the problem by recursively identifying all the possible groups obtained by combining all the values of the sensitive variables with each label value (class). Next, for each group, we compute its expected (𝑊𝑒𝑥𝑝) and observed (𝑊𝑜𝑏𝑠) sizes and look at the ratio between these two values. If 𝑊𝑒𝑥𝑝/𝑊𝑜𝑏𝑠 = 1, the group is fully balanced. Otherwise, if the ratio is less than one, the group is larger than expected, so we remove an element from it according to a chosen deletion strategy. Finally, if the ratio is greater than one, the group is smaller than expected, so we add an item according to a chosen generation strategy. For each group, we repeat this balancing operation until 𝑊𝑒𝑥𝑝/𝑊𝑜𝑏𝑠 converges to one. 
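The balancing loop can be sketched in a few lines of pandas and NumPy. This is a minimal illustration under stated assumptions, not the `demv` implementation: the helper `balance_groups` and its `tol`, `max_iter` and `seed` parameters are hypothetical; 𝑊𝑒𝑥𝑝 is assumed to be the product of the marginal frequencies of the group's sensitive values and of its label, and 𝑊𝑜𝑏𝑠 the group's observed joint frequency (as in classic reweighing); the generation and deletion strategies are assumed to be duplicating and dropping a randomly chosen row of the group.

```python
import numpy as np
import pandas as pd


def balance_groups(x, y, sensitive_vars, tol=0.01, max_iter=10000, seed=0):
    # Hypothetical helper, NOT the demv API: a simplified sketch of the
    # per-group balancing loop, using row duplication as the generation
    # strategy and random row removal as the deletion strategy.
    rng = np.random.default_rng(seed)
    df = x.reset_index(drop=True)
    df['_label'] = np.asarray(y)  # assumes x has no column named _label
    n = len(df)
    kept = []  # row positions (possibly repeated) of the balanced dataset

    # One group per observed combination of sensitive values and label.
    for key, _ in df.groupby(sensitive_vars + ['_label'], observed=True):
        sens_vals, label = key[:-1], key[-1]
        in_sens = np.ones(n, dtype=bool)
        for col, val in zip(sensitive_vars, sens_vals):
            in_sens &= (df[col] == val).to_numpy()
        in_label = (df['_label'] == label).to_numpy()

        # Assumption: expected frequency = P(sensitive values) * P(label),
        # observed frequency = P(sensitive values, label), both over n rows.
        w_exp = in_sens.mean() * in_label.mean()
        members = list(np.flatnonzero(in_sens & in_label))
        w_obs = len(members) / n

        steps = 0
        while abs(w_exp / w_obs - 1) > tol and steps < max_iter:
            if w_exp / w_obs > 1:
                # Group too small: duplicate a randomly chosen member.
                members.append(members[int(rng.integers(len(members)))])
            elif len(members) > 1:
                # Group too large: drop a randomly chosen member.
                members.pop(int(rng.integers(len(members))))
            else:
                break  # never empty a group completely
            w_obs = len(members) / n
            steps += 1
        kept.extend(members)

    out = df.iloc[kept]
    return out.drop(columns='_label').reset_index(drop=True), out['_label'].to_numpy()
```

For instance, with eight rows, one binary sensitive variable `s`, and labels split 3/1 for `s = 0` and 1/3 for `s = 1`, every group has an expected size of two rows, so the sketch duplicates one row in each small group and drops one row from each large group; afterwards P(y = 1 | s) is 0.5 for both values of `s`. Because group sizes change one row at a time, a very small `tol` can make the ratio oscillate around one, which is why `max_iter` caps each group's loop, similar in spirit to DEMV's `stop` argument. Duplication is only the simplest generation strategy one could plug in here.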
It is worth noting that, in order to keep a high level of accuracy, the new items added to a group should be coherent in their values with the already existing ones.\n\nThe papers describing our work are available at:\n\n- \u003chttps://doi.org/10.1016/j.ipm.2022.103226\u003e\n- \u003chttp://dx.doi.org/10.1007/978-3-031-09316-6_11\u003e ([pdf](https://www.researchgate.net/profile/Giordano-Daloisio/publication/361406303_Enhancing_Fairness_in_Classification_Tasks_with_Multiple_Variables_A_Data-_and_Model-Agnostic_Approach/links/6357a1ca8d4484154a32cf02/Enhancing-Fairness-in-Classification-Tasks-with-Multiple-Variables-A-Data-and-Model-Agnostic-Approach.pdf)).\n\n## DEMV class description\n\n### Attributes\n\n- `round_level : float`\n\n  Tolerance value to balance the sensitive groups\n\n- `debug : bool`\n\n  Prints w_exp/w_obs, useful for debugging\n\n- `stop : int`\n\n  Maximum number of balance iterations\n\n- `iter : int`\n\n  Maximum number of iterations\n\n### Methods\n\n- `__init__(self, sensitive_vars, round_level=1, stop=10000, verbose=False)`\n\n      Args\n      ----------\n        sensitive_vars : list\n            List of sensitive variable names\n        round_level : float, optional\n            Tolerance value to balance the sensitive groups (default is 1)\n        stop : int, optional\n            Maximum number of iterations to balance the sensitive groups (default is 10000)\n        verbose : bool, optional\n            Prints w_exp/w_obs, useful for debugging (default is False)\n\n- `fit(self, x: pd.DataFrame, y: np.ndarray)`\n\n  Balances the dataset's sensitive groups\n\n        Args\n        ----------\n        x : pd.DataFrame\n            Dataset to be balanced\n        y : array-like\n            Labels of the dataset\n\n        Returns\n        -------\n         x: Balanced dataset\n         y: Balanced labels of the dataset\n\n- `transform(self, x: pd.DataFrame, y: np.ndarray)`\n\n  Balances the dataset's sensitive groups\n\n        Args\n  
      ----------\n        x : pd.DataFrame\n            Dataset to be balanced\n        y : array-like\n            Labels of the dataset\n\n        Returns\n        -------\n         x: Balanced dataset\n         y: Balanced labels of the dataset\n\n- `fit_transform(self, x: pd.DataFrame, y: np.ndarray)`\n\n  Balances the dataset's sensitive groups\n\n        Args\n        ----------\n        x : pd.DataFrame\n            Dataset to be balanced\n        y : array-like\n            Labels of the dataset\n\n        Returns\n        -------\n         x: Balanced dataset\n         y: Balanced labels of the dataset\n\n- `get_iters(self)`\n\n  Gets the maximum number of iterations\n\n        Returns\n        -------\n        int:\n            maximum number of iterations\n\n- `get_disparities(self)`\n\n  Returns the list of w_exp/w_obs ratios\n\n        Returns\n        -------\n        list:\n            list of disparity values\n\n### Example usage\n\nThe following example shows how to use DEMV:\n\n```python\nfrom demv import DEMV\nimport pandas as pd\n\n# Load a dataset that contains both the sensitive variables and the label\ndf = pd.read_csv('some_data.csv')\nprotected_attrs = ['s1', 's2']  # names of the sensitive variables\nlabel = 'l'  # name of the label column\n\ndemv = DEMV(sensitive_vars=protected_attrs, round_level=1)\nx = df.drop(label, axis=1)  # features, sensitive variables included\ny = df[label]\n# Balance the sensitive groups and get back the balanced features and labels\nx_new, y_new = demv.fit_transform(x, y)\nprint('Maximum number of iterations:', demv.get_iters())\n```\n\n## Credits\n\nThe original paper was written by Giordano d'Aloisio, Giovanni Stilo, Antinisca Di Marco and Andrea D'Angelo.\nThis work is partially supported by Territori Aperti, a project funded by Fondo Territori Lavoro e Conoscenza CGIL CISL UIL; by the SoBigData-PlusPlus H2020-INFRAIA-2019-1 EU project, contract number 871042; and by “FAIR-EDU: Promote FAIRness in EDUcation institutions”, a project funded by the University of L’Aquila. 
All the numerical simulations were carried out mostly on the Linux HPC cluster Caliban of the High-Performance Computing Laboratory of the Department of Information Engineering, Computer Science and Mathematics (DISIM) at the University of L’Aquila.\n\n## License\n\nThis work is licensed under the AGPL 3.0 license.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgiordanodaloisio%2Fdemv","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgiordanodaloisio%2Fdemv","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgiordanodaloisio%2Fdemv/lists"}