https://github.com/gagolews/datawranglingpy
Minimalist Data Wrangling with Python (Open-Access Textbook)
- Host: GitHub
- URL: https://github.com/gagolews/datawranglingpy
- Owner: gagolews
- License: other
- Created: 2022-03-27T02:44:38.000Z (almost 3 years ago)
- Default Branch: master
- Last Pushed: 2025-01-09T14:25:19.000Z (23 days ago)
- Last Synced: 2025-01-22T16:06:17.567Z (10 days ago)
- Topics: data-analysis, data-science, data-visualisation, data-wrangling, jupyter, machine-learning, matplotlib, modelling, numpy, pandas, python, python3, scikit-learn, scipy, scipy-stats, seaborn, statistics
- Homepage: https://datawranglingpy.gagolewski.com/
- Size: 292 MB
- Stars: 80
- Watchers: 5
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Citation: CITATION.cff
# [Minimalist Data Wrangling with Python](https://datawranglingpy.gagolewski.com/)
*Minimalist Data Wrangling with Python* is envisaged as a student's first
introduction to data science, providing a high-level overview as well as
discussing key concepts in detail. We explore methods for
cleaning data gathered from different sources, transforming, selecting, and
extracting features, performing exploratory data analysis and dimensionality
reduction, identifying naturally occurring data clusters, modelling patterns in
data, comparing data between groups, and reporting the results.

For many students around the world, educational resources are hardly
affordable. Therefore, I have decided that this book should remain
an independent, non-profit, open-access project. You can read it at:

* (a browser-friendly version)
* (PDF)

You can also order a
[paper copy](https://datawranglingpy.gagolewski.com/order-paper-copy.html).

Whilst, for some people, the presence of a "designer tag" from a
major publisher might still be a proxy for quality, it is my hope
that this publication will prove useful to those who seek knowledge for
knowledge's sake.

**Please spread the news about this project.**

Consider citing this book as:
[Gagolewski M.][1] (2025), *Minimalist Data Wrangling with Python*,
Melbourne,
DOI: [10.5281/zenodo.6451068](https://dx.doi.org/10.5281/zenodo.6451068),
ISBN: 978-0-6455719-1-2,
URL: .

Any remarks and bug fixes are appreciated. Please submit them via
this repository's *Issues* tracker. Thank you.

## About the Author

[Marek Gagolewski][1] is currently an Associate Professor
in Data Science at the Faculty of Mathematics and Information Science,
Warsaw University of Technology.

His research interests are related to data science, in particular: modelling
complex phenomena, developing usable, general-purpose algorithms, studying
their analytical properties, and finding out how people use, misuse,
understand, and misunderstand methods of data analysis in scientific, business,
and decision-making settings.

He is an author of ~100 publications, including journal papers
in outlets such as *Proceedings of the National Academy of Sciences (PNAS)*,
*Journal of Statistical Software*, *The R Journal*, *Journal of Classification*,
*Information Fusion*, *International Journal of Forecasting*,
*Statistical Modelling*, *Physica A: Statistical Mechanics and its Applications*,
*Information Sciences*, *Knowledge-Based Systems*,
*IEEE Transactions on Fuzzy Systems*, and *Journal of Informetrics*.

In his "spare" time, he writes books for his students
(check out [*Deep R Programming*](https://deepr.gagolewski.com/))
and [develops](https://github.com/gagolews) open-source software for data analysis, such as
[`stringi`](https://stringi.gagolewski.com/) (one of the most often downloaded
R packages) and
[`genieclust`](https://genieclust.gagolewski.com/) (a fast and robust
hierarchical clustering algorithm in both Python and R).

--------------------------------------------------------------------------------

Copyright (C) 2022–2025, [Marek Gagolewski][1]. Some rights reserved.
This material is licensed under the Creative Commons
[Attribution-NonCommercial-NoDerivatives 4.0 International][2] License
(CC BY-NC-ND 4.0).

[1]: https://www.gagolewski.com/
[2]: https://creativecommons.org/licenses/by-nc-nd/4.0