{"id":14989231,"url":"https://github.com/gagolews/datawranglingpy","last_synced_at":"2025-04-09T13:08:56.832Z","repository":{"id":37692147,"uuid":"474517869","full_name":"gagolews/datawranglingpy","owner":"gagolews","description":"Minimalist Data Wrangling with Python (Open-Access Textbook)","archived":false,"fork":false,"pushed_at":"2025-02-28T10:15:21.000Z","size":302634,"stargazers_count":79,"open_issues_count":0,"forks_count":4,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-02T11:08:27.034Z","etag":null,"topics":["data-analysis","data-science","data-visualisation","data-wrangling","jupyter","machine-learning","matplotlib","modelling","numpy","pandas","python","python3","scikit-learn","scipy","scipy-stats","seaborn","statistics"],"latest_commit_sha":null,"homepage":"https://datawranglingpy.gagolewski.com/","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gagolews.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-03-27T02:44:38.000Z","updated_at":"2025-03-14T09:16:02.000Z","dependencies_parsed_at":"2023-12-23T18:27:05.055Z","dependency_job_id":"b2911550-ed76-45ab-b736-7d43b3e0f818","html_url":"https://github.com/gagolews/datawranglingpy","commit_stats":{"total_commits":181,"total_committers":2,"mean_commits":90.5,"dds":"0.0055248618784530246","last_synced_commit":"0790da37b5fb85d4965945b673b43d74d109727d"},"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gagolews%2Fdatawranglingpy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gagolews%2Fdatawranglingpy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gagolews%2Fdatawranglingpy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gagolews%2Fdatawranglingpy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gagolews","download_url":"https://codeload.github.com/gagolews/datawranglingpy/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248045233,"owners_count":21038553,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-analysis","data-science","data-visualisation","data-wrangling","jupyter","machine-learning","matplotlib","modelling","numpy","pandas","python","python3","scikit-learn","scipy","scipy-stats","seaborn","statistics"],"created_at":"2024-09-24T14:17:54.414Z","updated_at":"2025-04-09T13:08:56.811Z","avatar_url":"https://github.com/gagolews.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003c!-- NOTE EDIT in *-src; *-public is a clone! --\u003e\n\n# [Minimalist Data Wrangling with Python](https://datawranglingpy.gagolewski.com/)\n\n\u003ca href=\"https://datawranglingpy.gagolewski.com/\"\u003e\u003cimg src=\"docs/_static/img/cover.png\" align=\"right\" height=\"225\" /\u003e\u003c/a\u003e\n\n*Minimalist Data Wrangling with Python* is envisaged as a student's first\nintroduction to data science, providing a high-level overview as well as\ndiscussing key concepts in detail. We explore methods for\ncleaning data gathered from different sources, transforming, selecting, and\nextracting features, performing exploratory data analysis and dimensionality\nreduction, identifying naturally occurring data clusters, modelling patterns in\ndata, comparing data between groups, and reporting the results.\n\nFor many students around the world, educational resources are hardly\naffordable. Therefore, I have decided that this book should remain\nan independent, non-profit, open-access project. You can read it at:\n\n* \u003chttps://datawranglingpy.gagolewski.com/\u003e (a browser-friendly version)\n* \u003chttps://datawranglingpy.gagolewski.com/datawranglingpy.pdf\u003e (PDF)\n\nYou can also order a\n[paper copy](https://datawranglingpy.gagolewski.com/order-paper-copy.html).\n\nWhilst, for some people, the presence of a \"designer tag\" from a\nmajor publisher might still be a proxy for quality, it is my hope\nthat this publication will prove useful to those who seek knowledge for\nknowledge's sake.\n\n\n**Please spread the news about this project.**\n\nConsider citing this book as:\n[Gagolewski M.][1] (2025), *Minimalist Data Wrangling with Python*,\nMelbourne,\nDOI: [10.5281/zenodo.6451068](https://dx.doi.org/10.5281/zenodo.6451068),\nISBN: 978-0-6455719-1-2,\nURL: \u003chttps://datawranglingpy.gagolewski.com/\u003e.\n\nAny remarks and bug fixes are appreciated. Please submit them via\nthis repository's *Issues* tracker. Thank you.\n\n\n\n## About the Author\n\n[Marek Gagolewski][1] is currently an Associate Professor\nin Data Science at the Faculty of Mathematics and Information Science,\nWarsaw University of Technology.\n\nHis research interests are related to data science, in particular: modelling\ncomplex phenomena, developing usable, general-purpose algorithms, studying\ntheir analytical properties, and finding out how people use, misuse,\nunderstand, and misunderstand methods of data analysis in scientific, business,\nand decision-making settings.\n\nHe is an author of ~100 publications, including journal papers\nin outlets such as *Proceedings of the National Academy of Sciences (PNAS)*,\n*Journal of Statistical Software*, *The R Journal*, *Journal of Classification*,\n*Information Fusion*, *International Journal of Forecasting*,\n*Statistical Modelling*, *Physica A: Statistical Mechanics and its Applications*,\n*Information Sciences*, *Knowledge-Based Systems*,\n*IEEE Transactions on Fuzzy Systems*, and *Journal of Informetrics*.\n\nIn his \"spare\" time, he writes books for his students\n(check out [*Deep R Programming*](https://deepr.gagolewski.com/))\nand [develops](https://github.com/gagolews) open-source software for data analysis, such as\n[`stringi`](https://stringi.gagolewski.com/) (one of the most often downloaded\nR packages) and\n[`genieclust`](https://genieclust.gagolewski.com/) (a fast and robust\nhierarchical clustering algorithm in both Python and R).\n\n\n--------------------------------------------------------------------------------\n\nCopyright (C) 2022–2025, [Marek Gagolewski][1]. Some rights reserved.\n\nThis material is licensed under the Creative Commons\n[Attribution-NonCommercial-NoDerivatives 4.0 International][2] License\n(CC BY-NC-ND 4.0).\n\n[1]: https://www.gagolewski.com/\n[2]: https://creativecommons.org/licenses/by-nc-nd/4.0\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgagolews%2Fdatawranglingpy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgagolews%2Fdatawranglingpy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgagolews%2Fdatawranglingpy/lists"}