https://github.com/ax-va/numpy-pandas-matplotlib-scikit-learn-vanderplas-2023
These examples provide an introduction to Data Science and classic Machine Learning using NumPy, Pandas, Matplotlib, and scikit-learn. They are taken, with some changes, from the book "Python Data Science Handbook: Essential Tools for Working with Data", Second Edition, written by Jake VanderPlas and published by O'Reilly Media in 2023.
- Host: GitHub
- URL: https://github.com/ax-va/numpy-pandas-matplotlib-scikit-learn-vanderplas-2023
- Owner: ax-va
- Created: 2023-04-23T14:59:06.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-07-07T13:03:11.000Z (6 months ago)
- Last Synced: 2024-07-07T14:26:06.933Z (6 months ago)
- Topics: ax-va, classic-machine-learning, data-science, machine-learning, matplotlib, numpy, pandas, python, scikit-learn
- Language: Python
- Homepage:
- Size: 36.8 MB
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 0
# NumPy-Pandas-Matplotlib-Scikit-Learn-VanderPlas-2023
These examples provide an introduction to Data Science and classic Machine Learning using **NumPy**, **Pandas**, **Matplotlib**, and **scikit-learn**. They are taken, with some changes, from the book *"Python Data Science Handbook: Essential Tools for Working with Data"*, Second Edition, written by Jake VanderPlas and published by *O'Reilly Media* in 2023. Some datasets are also taken from Jake VanderPlas's GitHub repositories: https://github.com/jakevdp.
The content is divided into four separate parts, each consisting of examples, datasets, and figures:
1. numpy
2. pandas
3. matplotlib
4. scikit-learn
My environment was Python 3.11 with the following packages and their dependencies (not listed here):
```
numpy==1.25.2
pandas==2.1.0
matplotlib==3.8.0
seaborn==0.12.2
scikit-learn==1.3.0
scikit-image==0.21.0
ipython==8.15.0 # optional
```

## Original code in Jupyter notebooks by Jake VanderPlas
https://github.com/jakevdp/PythonDataScienceHandbook

## How to run Jupyter QtConsole
1) Install PySide6 and qtconsole
2) Run in the terminal:
```unix
$ jupyter qtconsole
```

## How to run IPython
Run in the terminal:
```unix
$ ipython
```

## Use `%timeit` command in IPython
```ipython
from my_module import my_func
%timeit my_func(1, 2)
# 45.7 µs ± 1.67 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
```

## Get attributes, methods, and functions in `IPython`
```ipython
import my_module
my_module.<TAB>
# pressing the Tab key after the dot lists the module's attributes, methods, and functions
```

## Get the source code in `IPython`
```ipython
from my_module import my_func
my_func??
# Signature: my_func(x, y)
# Source:
# def my_func(x, y):
#     """
#     It is my function
#     """
#     for i in range(1000):
#         x += 1
#         y += 1
#     return x + y
# File: ~/PycharmProjects/NumPy-Pandas-Matplotlib-Scikit-learn-VanderPlas-2023/my_module.py
# Type: function
```

## Get the description in `IPython`
```ipython
my_func?
# Signature: my_func(x, y)
# Docstring: It is my function
# File: ~/PycharmProjects/NumPy-Pandas-Matplotlib-Scikit-learn-VanderPlas-2023/my_module.py
# Type: function
```

## Install `line_profiler`
```
pip install line_profiler
```

## Use `line_profiler` in `IPython`
```ipython
from my_module import my_func
%load_ext line_profiler
%lprun -f my_func my_func(1, 2)
# Line #      Hits         Time  Per Hit   % Time  Line Contents
# ==============================================================
#      6                                           def my_func(x, y):
#      7                                               """
#      8                                               It is my function
#      9                                               """
#     10      1000    1799776.0   1799.8     31.6      for i in range(1000):
#     11      1000    1879619.0   1879.6     33.0          x += 1
#     12      1000    2012540.0   2012.5     35.3          y += 1
#     13         1       2375.0   2375.0      0.0      return x + y
```

## Install `memory_profiler`
```
pip install memory_profiler
```

## Use `memory_profiler` in `IPython`
```ipython
from my_module import my_func
%load_ext memory_profiler
%memit my_func(1, 2)
# peak memory: 74.48 MiB, increment: 0.00 MiB
%mprun -f my_func my_func(1, 2)
# Line #    Mem usage    Increment  Occurrences   Line Contents
# =============================================================
#      6     82.1 MiB     82.1 MiB            1   def my_func(x, y):
#      7                                              """
#      8                                              It is my function
#      9                                              """
#     10     82.1 MiB      0.0 MiB         1001       for i in range(1000):
#     11     82.1 MiB      0.0 MiB         1000           x += 1
#     12     82.1 MiB      0.0 MiB         1000           y += 1
#     13     82.1 MiB      0.0 MiB            1       return x + y
```
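
Outside IPython, a rough counterpart to `%memit` can be sketched with the standard-library `tracemalloc` module. This is not part of the original examples: `measure_peak` is a hypothetical helper name, and `my_func` is an assumed copy of the function from `my_module`, repeated only so the snippet runs on its own. Note that `tracemalloc` counts only memory allocated by Python itself, so its numbers are not directly comparable to the process-level MiB values reported by `memory_profiler`.

```python
import tracemalloc


def my_func(x, y):
    """Assumed copy of my_func from my_module, repeated so the sketch is self-contained."""
    for i in range(1000):
        x += 1
        y += 1
    return x + y


def measure_peak(func, *args):
    """Hypothetical helper: call func(*args) once and return (result, peak_bytes).

    peak_bytes is the peak size of memory blocks traced by tracemalloc
    during the call, not the total memory used by the process.
    """
    tracemalloc.start()
    try:
        result = func(*args)
        # get_traced_memory() returns a (current, peak) tuple in bytes
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


result, peak = measure_peak(my_func, 1, 2)
print(f"result: {result}, peak traced memory: {peak} bytes")
```

For a function as small as `my_func`, the traced peak should be tiny, which is consistent with the `0.00 MiB` increment shown by `%memit` above.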