{"id":13738484,"url":"https://github.com/akanz1/klib","last_synced_at":"2026-02-01T17:00:48.256Z","repository":{"id":42014693,"uuid":"249966008","full_name":"akanz1/klib","owner":"akanz1","description":"Easy to use Python library of customized functions for cleaning and analyzing data.","archived":false,"fork":false,"pushed_at":"2025-11-06T10:58:20.000Z","size":50067,"stargazers_count":520,"open_issues_count":4,"forks_count":56,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-11-06T12:25:53.747Z","etag":null,"topics":["data-analysis","data-cleaning","data-preprocessing","data-science","data-visualization","feature-selection","klib","python"],"latest_commit_sha":null,"homepage":"https://medium.com/p/97191d320f80","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/akanz1.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":["akanz1"]}},"created_at":"2020-03-25T11:56:48.000Z","updated_at":"2025-11-06T10:58:23.000Z","dependencies_parsed_at":"2024-07-22T13:46:28.053Z","dependency_job_id":"de14bf5f-cf99-4b9f-802e-9b4517d20ce5","html_url":"https://github.com/akanz1/klib","commit_stats":{"total_commits":705,"total_committers":13,"mean_commits":54.23076923076923,"dds":"0.21843971631205672","last_synced_commit":"38f5af0a343f37c68b0fbac486fefd3293328b31"},"previous_names":[],"tags_count":39,"template":false,"template_full_name":null,"purl":"pkg:github/akanz1/klib","repository
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akanz1%2Fklib","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akanz1%2Fklib/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akanz1%2Fklib/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akanz1%2Fklib/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/akanz1","download_url":"https://codeload.github.com/akanz1/klib/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akanz1%2Fklib/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28983424,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-01T16:29:42.054Z","status":"ssl_error","status_checked_at":"2026-02-01T16:29:41.428Z","response_time":56,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-analysis","data-cleaning","data-preprocessing","data-science","data-visualization","feature-selection","klib","python"],"created_at":"2024-08-03T03:02:23.749Z","updated_at":"2026-02-01T17:00:48.246Z","avatar_url":"https://github.com/akanz1.png","language":"Python","readme":"\u003cp align=\"center\"\u003e\u003cimg src=\"https://raw.githubusercontent.com/akanz1/klib/main/examples/images/header.png\" alt=\"klib Header\" width=\"859\" height=\"304\"\u003e\u003c/p\u003e\n\n[![Flake8 
\u0026 PyTest](https://github.com/akanz1/klib/workflows/Flake8%20%F0%9F%90%8D%20PyTest%20%20%20%C2%B4/badge.svg)](https://github.com/akanz1/klib)\n[![Language](https://img.shields.io/github/languages/top/akanz1/klib)](https://pypi.org/project/klib/)\n[![Last Commit](https://badgen.net/github/last-commit/akanz1/klib/main)](https://github.com/akanz1/klib/commits/main)\n[![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=akanz1_klib\u0026metric=alert_status)](https://sonarcloud.io/dashboard?id=akanz1_klib)\n[![Scrutinizer](https://scrutinizer-ci.com/g/akanz1/klib/badges/quality-score.png?b=main)](https://scrutinizer-ci.com/g/akanz1/klib/)\n[![codecov](https://codecov.io/gh/akanz1/klib/branch/main/graph/badge.svg)](https://codecov.io/gh/akanz1/klib)\n\n**klib** is a Python library for importing, cleaning, analyzing and preprocessing data. Explanations of its key functionality can be found on [Medium / TowardsDataScience](https://medium.com/@akanz) and in the [examples](examples) section. 
Additionally, there are great introductions and overviews of the functionality on [PythonBytes](https://pythonbytes.fm/episodes/show/240/this-is-github-your-pilot-speaking) or on [YouTube (Data Professor)](https://www.youtube.com/watch?v=URjJVEeZxxU).\n\n## Installation\n\nUse the package manager [pip](https://pip.pypa.io/en/stable/) to install klib.\n\n[![PyPI Version](https://img.shields.io/pypi/v/klib)](https://pypi.org/project/klib/)\n[![Downloads](https://pepy.tech/badge/klib/month)](https://pypi.org/project/klib/)\n\n```bash\npip install -U klib\n```\n\nAlternatively, to install this package with conda, run:\n\n[![Conda Version](https://img.shields.io/conda/vn/conda-forge/klib)](https://anaconda.org/conda-forge/klib)\n[![Conda Downloads](https://img.shields.io/conda/dn/conda-forge/klib.svg)](https://anaconda.org/conda-forge/klib)\n\n```bash\nconda install -c conda-forge klib\n```\n\n## Usage\n\n```python\nimport klib\nimport pandas as pd\n\ndf = pd.DataFrame(data) # data: your own dataset, e.g. a dict of columns\n\n# klib.describe - functions for visualizing datasets\nklib.cat_plot(df) # returns a visualization of the number and frequency of categorical features\nklib.corr_mat(df) # returns a color-encoded correlation matrix\nklib.corr_plot(df) # returns a color-encoded heatmap, ideal for correlations\nklib.corr_interactive_plot(df, split=\"neg\").show() # displays an interactive correlation plot using Plotly\nklib.dist_plot(df) # returns a distribution plot for every numeric feature\nklib.missingval_plot(df) # returns a figure containing information about missing values\n\n# klib.clean - functions for cleaning datasets\nklib.data_cleaning(df) # performs data cleaning (drop duplicates \u0026 empty rows/cols, adjust dtypes, ...)\nklib.clean_column_names(df) # cleans and standardizes column names, also called inside data_cleaning()\nklib.convert_datatypes(df) # converts existing dtypes to more efficient ones, also called inside data_cleaning()\nklib.drop_missing(df) # drops missing values, also called inside data_cleaning()\nklib.mv_col_handling(df) # drops features with a high ratio of missing values based on their informational content\nklib.pool_duplicate_subsets(df) # pools subsets of columns based on duplicates with minimal loss of information\n```\n\n## Examples\n\nFind all available examples, as well as applications of the functions in **klib.clean()** with detailed descriptions, \u003ca href=\"https://github.com/akanz1/klib/tree/main/examples\"\u003ehere\u003c/a\u003e.\n\n```python\nklib.missingval_plot(df) # default representation of missing values in a DataFrame; many settings are available\n```\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"https://raw.githubusercontent.com/akanz1/klib/main/examples/images/example_mv_plot.png\" alt=\"Missing Value Plot Example\" width=\"1000\" height=\"1091\"\u003e\u003c/p\u003e\n\n```python\nklib.corr_plot(df, split='pos') # display only positive correlations; other settings include threshold, cmap, ...\nklib.corr_plot(df, split='neg') # display only negative correlations\n```\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"https://raw.githubusercontent.com/akanz1/klib/main/examples/images/example_corr_plot.png\" alt=\"Corr Plot Example\" width=\"720\" height=\"338\"\u003e\u003c/p\u003e\n\n```python\nklib.corr_plot(df, target='wine') # default representation of correlations with the target column\n```\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"https://raw.githubusercontent.com/akanz1/klib/main/examples/images/example_target_corr_plot.png\" alt=\"Target Corr Plot Example\" width=\"720\" height=\"600\"\u003e\u003c/p\u003e\n\n```python\nklib.corr_interactive_plot(df, split=\"neg\").show()\n\n# The interactive plot takes the same parameters as corr_plot(), plus additional kwargs for the Plotly heatmap graph object.\nklib.corr_interactive_plot(df, split=\"neg\", zmax=0)\n```\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"https://github.com/akanz1/klib/assets/124513922/27b79ece-dc22-410c-9816-fcc4cfc97edc\"  
alt=\"Interactive Corr Plot Simple Example\" width=\"720\" height=\"600\"\u003e\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"https://github.com/akanz1/klib/assets/124513922/d986dc6c-84dc-4693-b276-6b4df7023e0f\" alt=\"Interactive Corr Plot with zmax kwarg Example\" width=\"720\" height=\"600\"\u003e\u003c/p\u003e\n\n```python\n# Since corr_interactive_plot returns a Plotly graph object Figure, update_layout() can be chained onto it.\nklib.corr_interactive_plot(df, split=\"neg\").update_layout(template=\"simple_white\")\n```\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"https://github.com/akanz1/klib/assets/124513922/fb7ac102-6daf-4e30-a24c-db168c7cf42c\" alt=\"Interactive Corr Plot Chained Example\" width=\"720\" height=\"600\"\u003e\u003c/p\u003e\n\n```python\nklib.dist_plot(df) # default representation of a distribution plot; other settings include fill_range, histogram, ...\n```\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"https://raw.githubusercontent.com/akanz1/klib/main/examples/images/example_dist_plot.png\" alt=\"Dist Plot Example\" width=\"910\" height=\"130\"\u003e\u003c/p\u003e\n\n```python\nklib.cat_plot(df, top=4, bottom=4) # representation of the 4 most \u0026 least common values in each categorical column\n```\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"https://raw.githubusercontent.com/akanz1/klib/main/examples/images/example_cat_plot.png\" alt=\"Cat Plot Example\" width=\"1000\" height=\"1000\"\u003e\u003c/p\u003e\n\nFurther examples, as well as applications of the functions in **klib.clean()**, can be found \u003ca href=\"https://github.com/akanz1/klib/tree/main/examples#data-cleaning-and-aggretation\"\u003ehere\u003c/a\u003e.\n\n## Contributing\n\n[![Open in Visual Studio Code](https://open.vscode.dev/badges/open-in-vscode.svg)](https://open.vscode.dev/akanz1/klib)\n\nPull requests and ideas, especially for further functions, are welcome. 
For major changes or feedback, please open an issue first to discuss what you would like to change.\n\n## License\n\n[MIT](https://choosealicense.com/licenses/mit/)\n","funding_links":["https://github.com/sponsors/akanz1"],"categories":["Python"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fakanz1%2Fklib","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fakanz1%2Fklib","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fakanz1%2Fklib/lists"}