{"id":15898692,"url":"https://github.com/facultyai/ipydataclean","last_synced_at":"2025-08-26T06:39:34.224Z","repository":{"id":62571574,"uuid":"127296800","full_name":"facultyai/ipydataclean","owner":"facultyai","description":"Interactive cleaning for Pandas DataFrames","archived":false,"fork":false,"pushed_at":"2019-11-29T15:20:53.000Z","size":64,"stargazers_count":15,"open_issues_count":1,"forks_count":3,"subscribers_count":17,"default_branch":"master","last_synced_at":"2025-07-25T09:40:04.015Z","etag":null,"topics":["data-cleaning","data-science","dataframe","jupyter-notebook","pandas"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/ipydataclean/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/facultyai.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-03-29T13:38:38.000Z","updated_at":"2023-01-18T14:14:17.000Z","dependencies_parsed_at":"2022-11-03T17:59:27.122Z","dependency_job_id":null,"html_url":"https://github.com/facultyai/ipydataclean","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/facultyai/ipydataclean","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facultyai%2Fipydataclean","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facultyai%2Fipydataclean/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facultyai%2Fipydataclean/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facultyai%2Fipydataclean/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/facul
tyai","download_url":"https://codeload.github.com/facultyai/ipydataclean/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facultyai%2Fipydataclean/sbom","scorecard":{"id":391175,"data":{"date":"2025-08-11","repo":{"name":"github.com/facultyai/ipydataclean","commit":"43fedcd2277a6f469d8e8c4c379b7b6e8a306056"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":4,"checks":[{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Code-Review","score":3,"reason":"Found 2/6 approved changesets -- score normalized to 
3","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed 
vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE.txt:0","Info: FSF or OSI recognized license: Apache License 2.0: LICENSE.txt:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":-1,"reason":"internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration","details":null,"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 14 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code 
analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-18T17:51:10.431Z","repository_id":62571574,"created_at":"2025-08-18T17:51:10.431Z","updated_at":"2025-08-18T17:51:10.431Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":272186209,"owners_count":24888332,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-26T02:00:07.904Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-cleaning","data-science","dataframe","jupyter-notebook","pandas"],"created_at":"2024-10-06T10:07:33.789Z","updated_at":"2025-08-26T06:39:34.207Z","avatar_url":"https://github.com/facultyai.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"ipydataclean\n============\n\nJupyter notebook extension and python library for interactive cleaning of\npandas DataFrames with a selection of techniques, from simple replacements of\nmissing values to imputation with a Restricted Boltzmann Machine.\n\nInstallation\n------------\n\n.. code-block:: bash\n\n    pip install ipydataclean\n    jupyter nbextension enable dataclean --py --sys-prefix\n\nUsage\n-----\n\nUse your Jupyter notebook as normal. When a pandas DataFrame is present in your\npython kernel you should see a new notification on the Data Cleaner icon in\nyour toolbar. 
DataFrames with names beginning with an underscore will be\nignored.\n\n.. figure:: https://user-images.githubusercontent.com/29061040/37827637-30cf156a-2e90-11e8-9b84-81a41cf94898.png\n   :width: 25 %\n   :alt: Data Cleaner toolbar icon.\n\n   Data Cleaner toolbar icon.\n\nClicking on the icon will open a floating window containing a summary of the\nDataFrames in your kernel. Clicking on the name of one of these DataFrames will\nshow some of the Data Cleaner controls and some summary statistics on the\nDataFrame columns.\n\n.. figure:: https://user-images.githubusercontent.com/29061040/37827939-520b095e-2e91-11e8-8a85-a4d8cb0dfed1.png\n   :width: 25 %\n   :alt: Data Cleaner window.\n\n   Data Cleaner window.\n\nClicking on the name of one of these columns will show data cleaning tools\nspecific to that column, with a histogram or bar chart showing the distribution\nof these values. As you create a step the effect that this will have on the\ndata distribution is shown as a preview.\n\n.. figure:: https://user-images.githubusercontent.com/29061040/37828167-169edb9c-2e92-11e8-88cd-f918d2c498df.png\n   :width: 50 %\n   :alt: Creating a data cleaning step on a column.\n\n   Creating a data cleaning step on a column.\n\nYou can also choose to fill in missing and mistyped values in your DataFrame\nwith a Restricted Boltzmann Machine. This uses the boltzmannclean package.\n\n.. figure:: https://user-images.githubusercontent.com/29061040/37828870-d096628e-2e94-11e8-9291-511fab3bdf7a.png\n   :width: 40 %\n   :alt: Creating a Restricted Boltzmann Machine cleaning step.\n\n   Creating a Restricted Boltzmann Machine cleaning step.\n\nOnce you create your steps they are added to a processing pipeline which can be\nviewed in the \"Pipeline\" widget.\n\n.. 
figure:: https://user-images.githubusercontent.com/29061040/37829003-4488afda-2e95-11e8-9995-9ebc1348d2bf.png\n   :width: 40 %\n   :alt: A data cleaning pipeline.\n\n   A data cleaning pipeline.\n\nThese steps can be modified or deleted using these controls, and when ready the\npipeline can be executed on the DataFrame or output to code. Executing your\npipeline will create a new DataFrame with the suffix \"_cleaned\" in your kernel,\nwhile exporting will create a new code cell in your notebook defining a python\nfunction which will carry out the pipeline cleaning steps.\n\n.. figure:: https://user-images.githubusercontent.com/29061040/37829131-bf920dd4-2e95-11e8-9e77-aaa3533c2095.png\n   :width: 40 %\n   :alt: An exported pipeline.\n\n   An exported pipeline.\n\nCaveats\n-------\n\nDuplicated or non-string column names are not supported.\n\nFor DataFrames over 1000 rows, a sample of 1000 rows will be used for\npreviewing and creating your processing pipeline, with the whole DataFrame only\noperated on when the pipeline is executed.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffacultyai%2Fipydataclean","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffacultyai%2Fipydataclean","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffacultyai%2Fipydataclean/lists"}