Pandas cookbook
===============

Try it in your browser with Jupyter Lite: [![lite-badge](https://jupyterlite.rtfd.io/en/latest/_static/badge.svg)](https://jvns.github.io/pandas-cookbook/lab/index.html)

[pandas](http://pandas.pydata.org/) is a Python library for doing
data analysis. It's really fast and lets you do exploratory work
incredibly quickly.

The goal of this cookbook is to give you some concrete examples for
getting started with pandas. The [docs](http://pandas.pydata.org/pandas-docs/stable/)
are really comprehensive. However, I've often had people
tell me that they have some trouble getting started, so these are
examples with real-world data, and all the bugs and weirdness
that entails.

It uses 3 datasets:

* 311 calls in New York
* How many people were on Montréal's bike paths in 2012
* Montreal's weather for 2012, hourly

It comes with batteries (data) included, so you can try out all the
examples right away.

Table of Contents
=================

* [A quick tour of the Jupyter Notebook](https://nbviewer.org/github/jvns/pandas-cookbook/blob/master/cookbook/A%20quick%20tour%20of%20%20Notebook.ipynb)

Shows off Jupyter's awesome tab completion and magic functions.
* [Chapter 1: Reading from a CSV](https://nbviewer.org/github/jvns/pandas-cookbook/blob/master/cookbook/Chapter%201%20-%20Reading%20from%20a%20CSV.ipynb)

Reading your data into pandas is pretty much the easiest thing. Even when the encoding is wrong!
* [Chapter 2: Selecting data & finding the most common complaint type](https://nbviewer.org/github/jvns/pandas-cookbook/blob/master/cookbook/Chapter%202%20-%20Selecting%20data%20&%20finding%20the%20most%20common%20complaint%20type.ipynb)

It's not totally obvious how to select data from a pandas dataframe. Here I explain the basics (how to take slices and get columns).
* [Chapter 3: Which borough has the most noise complaints? (or, more selecting data)](https://nbviewer.org/github/jvns/pandas-cookbook/blob/master/cookbook/Chapter%203%20-%20Which%20borough%20has%20the%20most%20noise%20complaints%20%28or%2C%20more%20selecting%20data%29.ipynb)

Here we get into serious slicing and dicing and learn how to filter dataframes in complicated ways, really fast.
* [Chapter 4: Find out on which weekday people bike the most with groupby and aggregate](https://nbviewer.org/github/jvns/pandas-cookbook/blob/master/cookbook/Chapter%204%20-%20Find%20out%20on%20which%20weekday%20people%20bike%20the%20most%20with%20groupby%20and%20aggregate.ipynb)

The groupby/aggregate is seriously my favorite thing about pandas and I use it all the time. You should probably read this.
* [Chapter 5: Combining dataframes and scraping Canadian weather data](https://nbviewer.org/github/jvns/pandas-cookbook/blob/master/cookbook/Chapter%205%20-%20Combining%20dataframes%20and%20scraping%20Canadian%20weather%20data.ipynb)

Here you get to find out if it's cold in Montreal in the winter (spoiler: yes). Web scraping with pandas is fun!
* [Chapter 6: String operations! Which month was the snowiest?](https://nbviewer.org/github/jvns/pandas-cookbook/blob/master/cookbook/Chapter%206%20-%20String%20Operations-%20Which%20month%20was%20the%20snowiest.ipynb)

Strings with pandas are great. It has all these vectorized string operations and they're the best. We will turn a bunch of strings containing "Snow" into vectors of numbers in a trice.
* [Chapter 7: Cleaning up messy data](https://nbviewer.org/github/jvns/pandas-cookbook/blob/master/cookbook/Chapter%207%20-%20Cleaning%20up%20messy%20data.ipynb)

Cleaning up messy data is never a joy, but with pandas it's easier <3
* [Chapter 8: Parsing Unix timestamps](https://nbviewer.org/github/jvns/pandas-cookbook/blob/master/cookbook/Chapter%208%20-%20How%20to%20deal%20with%20timestamps.ipynb)

This is basically a quick trick that took me 2 days to figure out.
* [Chapter 9: Loading data from SQL databases](https://nbviewer.org/github/jvns/pandas-cookbook/blob/master/cookbook/Chapter%209%20-%20Loading%20data%20from%20SQL%20databases.ipynb)

How to load data from an SQL database into Pandas, with examples using SQLite3, PostgreSQL, and MySQL.
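
Chapter 1's promise of reading a CSV even when the encoding is wrong mostly comes down to the `encoding` argument of `pd.read_csv`. Here's a minimal sketch, not taken from the notebook: the tiny Latin-1 CSV below is made up, and its column names only imitate the bike-path data.

```python
import io

import pandas as pd

# Hypothetical CSV bytes, encoded as Latin-1 rather than UTF-8.
# The column names only imitate the Montréal bike-path data.
raw = "Date;Berri 1;Côte-Sainte-Catherine\n01/01/2012;35;0\n02/01/2012;83;1\n".encode("latin1")

# Decoding these bytes as UTF-8 (the default) fails on the "ô" byte,
# so pass the real encoding explicitly. sep=";" handles the separator,
# and dayfirst=True reads 02/01/2012 as 2 January 2012.
df = pd.read_csv(
    io.BytesIO(raw),
    sep=";",
    encoding="latin1",
    parse_dates=["Date"],
    dayfirst=True,
    index_col="Date",
)
print(df)
```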
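
For the kind of filtering in Chapter 3, the core idea is boolean masks: comparing a column gives a Series of `True`/`False`, and `&` combines masks element by element. A minimal sketch on a few invented rows; the column names only imitate the 311 data and aren't copied from it.

```python
import pandas as pd

# A few hypothetical rows shaped like 311 complaint records.
complaints = pd.DataFrame({
    "Complaint Type": [
        "Noise - Street/Sidewalk", "HEATING",
        "Noise - Street/Sidewalk", "Noise - Street/Sidewalk",
    ],
    "Borough": ["BROOKLYN", "BRONX", "MANHATTAN", "BROOKLYN"],
})

# Each comparison yields a boolean Series; & combines them element-wise.
# If you write the condition inline instead, wrap each comparison in
# parentheses, because & binds tighter than ==.
is_noise = complaints["Complaint Type"] == "Noise - Street/Sidewalk"
in_brooklyn = complaints["Borough"] == "BROOKLYN"
noisy_brooklyn = complaints[is_noise & in_brooklyn]

# Count noise complaints per borough, most frequent first.
noise_by_borough = complaints[is_noise]["Borough"].value_counts()
print(noise_by_borough)
```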
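
Chapter 4's groupby/aggregate pattern can be sketched on two weeks of invented counts. The numbers are made up and `Berri 1` is just a stand-in column name; the point is the shape of the computation: add a weekday column, group on it, then aggregate.

```python
import pandas as pd

# Hypothetical daily cyclist counts on one bike path, two full weeks.
# 2012-01-02 was a Monday.
counts = pd.DataFrame(
    {"Berri 1": [120, 150, 130, 90, 200, 210, 60,
                 110, 160, 140, 100, 190, 230, 70]},
    index=pd.date_range("2012-01-02", periods=14, freq="D"),
)

# Tag each row with its weekday (0 = Monday ... 6 = Sunday), then group on it.
counts["weekday"] = counts.index.weekday
weekday_totals = counts.groupby("weekday").aggregate("sum")

# Swap the numeric keys for readable names.
weekday_totals.index = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
print(weekday_totals)
```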
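
The Chapter 6 idea of turning strings containing "Snow" into numbers can be sketched with the vectorized `.str` accessor plus `resample`. The hourly descriptions and the `Weather` column name here are assumptions for illustration, not the real weather data.

```python
import pandas as pd

# Hypothetical hourly weather descriptions at a few timestamps.
weather = pd.DataFrame(
    {"Weather": ["Snow", "Mostly Cloudy", "Snow Showers",
                 "Clear", "Rain,Snow", "Snow"]},
    index=pd.to_datetime([
        "2012-01-01 00:00", "2012-01-01 01:00", "2012-01-15 00:00",
        "2012-02-01 00:00", "2012-02-01 01:00", "2012-03-01 00:00",
    ]),
)

# .str.contains is vectorized: it returns one True/False per row.
is_snowing = weather["Weather"].str.contains("Snow")

# Booleans become 0.0/1.0, so each monthly mean is the fraction of snowy hours.
# "MS" labels each bucket with the first day of the month.
snowy_fraction = is_snowing.astype(float).resample("MS").mean()
print(snowy_fraction)
```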
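
Chapter 8's timestamp trick isn't spelled out here, so this is only one common way to do it: tell `pd.to_datetime` that the integers are seconds since the epoch. The values are made up, and treating `0` as "unknown" is an assumption.

```python
import pandas as pd

# Hypothetical access times stored as Unix timestamps
# (seconds since 1970-01-01 UTC), arriving as strings.
atimes = pd.Series(["1388534400", "1388620800", "0"])

# Convert to integers first, then say the unit is seconds.
# Without unit="s", integers would be interpreted as nanoseconds.
parsed = pd.to_datetime(atimes.astype("int64"), unit="s")
print(parsed)

# Assumption: a zero timestamp means "unknown", so drop those rows.
valid = parsed[parsed > pd.Timestamp("1970-01-01")]
```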
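
For Chapter 9, the SQLite case needs only the standard library, since `pd.read_sql` accepts a `sqlite3` connection directly. A minimal sketch with a made-up `weather` table in an in-memory database; for PostgreSQL or MySQL you'd pass a different connection, and the pandas docs recommend an SQLAlchemy engine for those.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database with a small hypothetical table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE weather (date TEXT, temp_c REAL)")
con.executemany(
    "INSERT INTO weather VALUES (?, ?)",
    [("2012-01-01", -1.5), ("2012-01-02", -8.0), ("2012-01-03", -12.3)],
)
con.commit()

# read_sql runs the (parameterized) query and returns a DataFrame;
# parse_dates turns the TEXT column into datetimes.
df = pd.read_sql(
    "SELECT * FROM weather WHERE temp_c < ?",
    con,
    params=(-5,),
    parse_dates=["date"],
)
con.close()
print(df)
```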

How to use this cookbook
========================

You can try it out instantly online using [Jupyter Lite](https://jvns.github.io/pandas-cookbook/lab/index.html), which will run Python with WebAssembly in your browser.

To install it locally, you'll need Jupyter notebook and pandas on your computer.

You can get these using `pip` (you may want to do this inside a virtual environment to avoid conflicting with your other libraries).

```bash
# Get the repository
git clone https://github.com/jvns/pandas-cookbook.git
cd pandas-cookbook

# Set up a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Start jupyter
jupyter notebook
```

A tab should open up in your browser at `http://localhost:8888`.

Happy pandas!

Running the cookbook inside a Docker container
==============================================

This repository contains a Dockerfile, so the cookbook can be built into a Docker image.
To build the image, run the following command from inside the repository directory:
```bash
docker build -t jvns/pandas-cookbook -f Dockerfile-Local .
```
Then run a container from that image:
```bash
docker run -d -p 8888:8888 -e "PASSWORD=MakeAPassword" jvns/pandas-cookbook
```
You can look up the image's ID by listing your local images:
```bash
docker images
```

After starting the container, you can access the Jupyter notebook with the cookbook
on port 8888.

License
=======

This work is licensed under a [Creative Commons Attribution-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-sa/4.0/).

Translations
============

There's [a translation into Chinese of this repo](https://github.com/ia-cas/pandas-cookbook).