{"id":30957260,"url":"https://github.com/laureberti/learn2clean","last_synced_at":"2025-09-11T13:45:09.282Z","repository":{"id":37601898,"uuid":"178432205","full_name":"LaureBerti/Learn2Clean","owner":"LaureBerti","description":"Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Cleaning","archived":false,"fork":false,"pushed_at":"2022-12-26T20:53:17.000Z","size":36282,"stargazers_count":51,"open_issues_count":9,"forks_count":20,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-07-12T16:05:22.231Z","etag":null,"topics":["automated","data-cleaning","data-cleaning-pipeline","data-curation","data-preprocessing","reinforcement-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LaureBerti.png","metadata":{"files":{"readme":"readme.rst","changelog":null,"contributing":"docs/contributing.rst","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-03-29T15:35:49.000Z","updated_at":"2025-02-25T20:23:32.000Z","dependencies_parsed_at":"2023-01-31T01:31:16.572Z","dependency_job_id":null,"html_url":"https://github.com/LaureBerti/Learn2Clean","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/LaureBerti/Learn2Clean","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LaureBerti%2FLearn2Clean","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LaureBerti%2FLearn2Clean/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LaureBerti%2FLearn2Clean/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LaureBerti%2FLearn2Clean/manifests","owner
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LaureBerti","download_url":"https://codeload.github.com/LaureBerti/Learn2Clean/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LaureBerti%2FLearn2Clean/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274648319,"owners_count":25324299,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-11T02:00:13.660Z","response_time":74,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automated","data-cleaning","data-cleaning-pipeline","data-curation","data-preprocessing","reinforcement-learning"],"created_at":"2025-09-11T13:45:04.945Z","updated_at":"2025-09-11T13:45:09.271Z","avatar_url":"https://github.com/LaureBerti.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":".. image:: ./docs/images/learn2clean-text.png\n\n-----------------------\n\n**Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Data Cleaning**\n\n\n|Documentation Status| |PyPI version| |Build Status| |GitHub Issues| |codecov| |License|\n\n-----------------------\n\nLearn2Clean is a Python library for data preprocessing and cleaning based on Q-Learning, a model-free reinforcement learning technique. 
It selects, for a given dataset, an ML model, and a quality performance metric, the optimal sequence of tasks for preparing the data such that the quality of the ML model result is maximized. \n \nYou can try it for composing your own data preprocessing pipelines or for automating data preparation before clustering, regression, and classification.\n\n\n.. image:: ./docs/images/figure_Learn2Clean.jpeg\n\n\n**For more details**, please refer to the paper presented at the Web Conf 2019 and the related tutorial.\n\n- Laure Berti-Equille. Learn2Clean: Optimizing the Sequence of Tasks for Web Data Preparation. Proceedings of the Web Conf 2019, San Francisco, May 2019. `Preprint \u003chttps://github.com/LaureBerti/Learn2Clean/tree/master/docs/publications/theWebConf2019-preprint.pdf\u003e`__ \n\n- Laure Berti-Equille. ML to Data Management: A Round Trip. Tutorial Part I, ICDE 2018. `Tutorial \u003chttps://github.com/LaureBerti/Learn2Clean/tree/master/docs/publications/tutorial_ICDE2018.pdf\u003e`__ \n\n\n--------------------------\n\nHow to Contribute\n=================\n\nLearn2Clean is a research prototype. Your help is very valuable to make it better for everyone.\n\n- Check out `call for contributions \u003chttps://github.com/LaureBerti/Learn2Clean/labels/call-for-contributions\u003e`__ to see what can be improved, or open an issue if you want something.\n- Contribute to the `tests \u003chttps://github.com/LaureBerti/Learn2Clean/tree/master/tests\u003e`__ to make it more reliable. 
\n- Contribute to the `documentation \u003chttps://github.com/LaureBerti/Learn2Clean/tree/master/docs\u003e`__ to make it clearer for everyone.\n- Contribute to the `examples \u003chttps://github.com/LaureBerti/Learn2Clean/tree/master/examples\u003e`__ to share your experience with other users.\n- Open an `issue \u003chttps://github.com/LaureBerti/Learn2Clean/issues\u003e`__ if you run into problems during development.\n\nFor more details, please refer to `CONTRIBUTING \u003chttps://github.com/LaureBerti/Learn2Clean/blob/master/docs/contributing.rst\u003e`__.\n\n.. |Documentation Status| image:: https://readthedocs.org/projects/learn2clean/badge/?version=latest\n   :target: https://learn2clean.readthedocs.io/en/latest/\n.. |PyPI version| image:: https://badge.fury.io/py/learn2clean.svg\n   :target: https://pypi.python.org/pypi/learn2clean\n.. |Build Status| image:: https://travis-ci.org/LaureBerti/Learn2Clean.svg?branch=master\n   :target: https://travis-ci.org/LaureBerti/Learn2Clean\n.. |GitHub Issues| image:: https://img.shields.io/github/issues/LaureBerti/Learn2Clean.svg\n   :target: https://github.com/LaureBerti/Learn2Clean/issues\n.. |codecov| image:: https://codecov.io/gh/LaureBerti/Learn2Clean/branch/master/graph/badge.svg\n   :target: https://codecov.io/gh/LaureBerti/Learn2Clean\n.. |License| image:: https://img.shields.io/badge/License-BSD%203--Clause-blue.svg\n   :target: https://github.com/LaureBerti/Learn2Clean/blob/master/LICENSE\n   \n\n--------------------------\n\nLicense \n=================\n\nLearn2Clean is licensed under the BSD 3-Clause \"New\" or \"Revised\" License.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flaureberti%2Flearn2clean","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flaureberti%2Flearn2clean","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flaureberti%2Flearn2clean/lists"}