{"id":18734696,"url":"https://github.com/elysian01/data-purifier-dataset","last_synced_at":"2026-01-24T20:54:59.924Z","repository":{"id":106594610,"uuid":"366722645","full_name":"Elysian01/Data-Purifier-Dataset","owner":"Elysian01","description":"Data repository for Data Purifier examples","archived":false,"fork":false,"pushed_at":"2021-08-22T06:49:26.000Z","size":6205,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-05-20T00:39:24.449Z","etag":null,"topics":["data-purifer","data-science","datase","ml-datasets","nlp-datasets"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Elysian01.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-05-12T13:20:25.000Z","updated_at":"2021-08-22T06:49:29.000Z","dependencies_parsed_at":null,"dependency_job_id":"4d31a3ea-573b-4e7e-a5f0-4c1ec88b0607","html_url":"https://github.com/Elysian01/Data-Purifier-Dataset","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Elysian01/Data-Purifier-Dataset","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Elysian01%2FData-Purifier-Dataset","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Elysian01%2FData-Purifier-Dataset/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Elysian01%2FData-Purifier-Dataset/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/E
lysian01%2FData-Purifier-Dataset/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Elysian01","download_url":"https://codeload.github.com/Elysian01/Data-Purifier-Dataset/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Elysian01%2FData-Purifier-Dataset/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28736791,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-24T19:23:36.361Z","status":"ssl_error","status_checked_at":"2026-01-24T19:23:28.966Z","response_time":89,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-purifer","data-science","datase","ml-datasets","nlp-datasets"],"created_at":"2024-11-07T15:14:32.515Z","updated_at":"2026-01-24T20:54:59.907Z","avatar_url":"https://github.com/Elysian01.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data-Purifier-Dataset\nData repository for [Data-Purifier](https://pypi.org/project/data-purifier/) examples\n\nThis repository exists only to provide a convenient target for the datapurifier.load_dataset function to download sample datasets from. Its existence makes it easy to document datapurifier without confusing things by spending time loading and munging data. The datasets may change or be removed at any time if they are no longer useful for the datapurifier documentation. 
Some of the datasets have also been modified from their canonical sources.\n\nData is sourced from Kaggle.\n\n## Get Started\n\nInstall the packages\n\n```bash\npip install data-purifier\n```\n\n```bash\npython -m spacy download en_core_web_sm\n```\n\nLoad the module\n```python\nimport datapurifier as dp\nfrom datapurifier import Mleda, Nleda, Nlpurifier\n\nprint(dp.__version__)\n```\n\nGet the list of example datasets  \n```python\nprint(dp.get_dataset_names()) # to get all dataset names\nprint(dp.get_text_dataset_names()) # to get all text dataset names\n```\n\nLoad an example dataset by passing one of the dataset names from the example list as an argument.\n```python\ndf = dp.load_dataset(\"womens_clothing_e-commerce_reviews\")\n```\n\n\n## Example: \n[Colab Notebook](https://colab.research.google.com/drive/1J932G1uzqxUHCMwk2gtbuMQohYZsze8U?usp=sharing)\n\nOfficial Documentation: https://cutt.ly/CbFT5Dw\n\nPython Package: https://pypi.org/project/data-purifier/\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felysian01%2Fdata-purifier-dataset","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Felysian01%2Fdata-purifier-dataset","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felysian01%2Fdata-purifier-dataset/lists"}