Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/tonyfast/idiomatic-pandas

A subset of idioms for pandas

Last synced: about 1 month ago

README

        

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Idiomatic 🐞 pandas\n",
"\n",
"[![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/tonyfast/idiomatic-pandas/master)\n",
"[NBViewer](http://nbviewer.jupyter.org/github/tonyfast/idiomatic-pandas/blob/master/readme.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This repository will help demonstrate interactive data manipulation in [__JupyterLab__](https://jupyterlab.readthedocs.io) using a subset of the extensive [🐞 API](https://pandas.pydata.org/). During this demo, we will treat `🐞.DataFrame` and `🐞.Series` as first \n",
"class citizens in our notebooks. 🐞 is a [__NumFocus sponsored__](https://numfocus.org/sponsored-projects) open source Python package for working with [tidy data](https://en.wikipedia.org/wiki/Tidy_data).\n",
"\n",
"This repository was created for [Munmun DeChoudhury's](http://www.munmund.net/) [Social Computing course](http://www.munmund.net/CS6474_Fall2018.html) on October 3, 2018."
]
},
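{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of what first-class `DataFrame` and `Series` objects look like in a notebook: the object on the last line of a cell is rendered as rich output, so intermediate `print` calls are rarely needed. The data below are made up for illustration.\n",
"\n",
"```python\n",
"import pandas\n",
"\n",
"# a tiny illustrative frame; the last expression in the cell displays as a rich table\n",
"df = pandas.DataFrame({'user': ['a', 'b'], 'stars': [3, 7]})\n",
"df.set_index('user')['stars']\n",
"```"
]
},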
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> Please checkout the [`deathbeds` blog](http://deathbeds.github.io) for weekly posts about literate and scientific computing.\n",
"\n",
"> To interact with the local scientific computing community join [PyData Atlanta](https://www.meetup.com/PyData-Atlanta/) or [The Atlanta Jupyter User Group](https://www.meetup.com/Atlanta-Jupyter-User-Group/).\n",
"\n",
"> [A little news](https://gist.github.com/tonyfast/c7505b54166130a5836b0ece181bbb23) before we get going."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Objective\n",
"\n",
"> 10 Get into a rut early: Do the same process the same way. Accumulate idioms. Standardize. The only difference(!) between Shakespeare and you was the size of his idiom list - not the size of his vocabulary.\n",
">> [Alan Perlis - _Perlisisms_](http://www.cs.yale.edu/homes/perlis-alan/quotes.html)\n",
"\n",
"* We will treat public Github data as social data. \n",
"* We will manipulate, combine, and explore multiple data sources from the Github API.\n",
"* We will discuss how 🐞 objects may replace common Python operations.\n",
"* We will explore how JupyterLab can augment the data analysis experience.\n",
"* We will discuss third party libraries that extend 🐞s.\n",
"\n",
"> Some syntaxes in this demonstartion may look unfamiliar. Please ask questions when you have them. In this code, the audience will see a mixture if functional programming, asynchronous programming, and caching."
]
},
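{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sketch of the style above (assuming network access; the endpoint, column names, and helper function here are illustrative, not part of this repository), GitHub API responses can be cached with `functools.lru_cache` and loaded into a `DataFrame`:\n",
"\n",
"```python\n",
"import functools\n",
"\n",
"import pandas\n",
"import requests\n",
"\n",
"@functools.lru_cache(maxsize=None)\n",
"def issues(repo: str) -> pandas.DataFrame:\n",
"    # fetch the open issues for a repository and build a DataFrame; lru_cache memoizes repeated calls\n",
"    response = requests.get(f'https://api.github.com/repos/{repo}/issues')\n",
"    response.raise_for_status()\n",
"    return pandas.DataFrame(response.json())\n",
"\n",
"# repeated calls hit the cache instead of the GitHub API\n",
"issues('tonyfast/idiomatic-pandas')[['number', 'title', 'created_at']].head()\n",
"```"
]
},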
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Why does this demonstration exist?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> I feel a major difference between the [R culture](https://en.wikipedia.org/wiki/R_(programming_language)) and Python culture is that Python users seem to create code more often, whereas R users often use code. There seems to be a strong atmosphere of software engineering in the Python world: in the beginning was the custom class (with methods). For R users, in the beginning was the data.\n",
"\n",
">> [Yihui Xie - _The First Notebook War_](https://yihui.name/en/2018/09/notebook-war)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are a lot of 🐞 examples on the web, but they _feel_ more like software rather than data analysis. Having too many idioms may muddle data analysis & software engineering. Hopefully, this demonstration promotes the value of 🐞 as a productivity tool for data analysis, and accelerates insight during the development process."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Tidy Data\n",
"\n",
"[![image](https://user-images.githubusercontent.com/4236275/46415900-524e8e80-c6f4-11e8-8183-6732beeecafa.png)](https://vita.had.co.nz/papers/tidy-data.pdf)"
]
},
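{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of the tidy data idea above, a wide table with one column per year can be melted so that each row holds a single observation; the numbers below are made up.\n",
"\n",
"```python\n",
"import pandas\n",
"\n",
"# wide layout: one column per year\n",
"wide = pandas.DataFrame({'country': ['A', 'B'], '2017': [10, 20], '2018': [11, 21]})\n",
"\n",
"# tidy layout: one row per (country, year) observation\n",
"wide.melt(id_vars='country', var_name='year', value_name='cases')\n",
"```"
]
},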
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The Zen of Python, by Tim Peters\n",
"\n",
"Beautiful is better than ugly.\n",
"Explicit is better than implicit.\n",
"Simple is better than complex.\n",
"Complex is better than complicated.\n",
"Flat is better than nested.\n",
"Sparse is better than dense.\n",
"Readability counts.\n",
"Special cases aren't special enough to break the rules.\n",
"Although practicality beats purity.\n",
"Errors should never pass silently.\n",
"Unless explicitly silenced.\n",
"In the face of ambiguity, refuse the temptation to guess.\n",
"There should be one-- and preferably only one --obvious way to do it.\n",
"Although that way may not be obvious at first unless you're Dutch.\n",
"Now is better than never.\n",
"Although never is often better than *right* now.\n",
"If the implementation is hard to explain, it's a bad idea.\n",
"If the implementation is easy to explain, it may be a good idea.\n",
"Namespaces are one honking great idea -- let's do more of those!\n"
]
}
],
"source": [
" import this"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Modern Idioms\n",
"\n",
"The Python data science ecosystem is evolving rapidly.\n",
"\n",
"* [The Python 3 movement](https://python3statement.org/)\n",
"* [fstrings](https://www.python.org/dev/peps/pep-0498/)\n",
"* [Path](https://docs.python.org/3/library/pathlib.html)\n",
"* [Type Annotations](https://docs.python.org/3/library/typing.html)\n",
"* [walrus operator](https://speakerdeck.com/di_codes/pep-572-the-walrus-operator)\n",
"* [async Python](https://docs.python.org/3/library/asyncio-task.html)\n",
"* [IPython 7.0](https://blog.jupyter.org/ipython-7-0-async-repl-a35ce050f7f7)\n",
"* [conda forge](https://conda-forge.org/)\n"
]
},
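{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small, hypothetical sketch combining a few of the idioms above (`pathlib.Path`, f-strings, and type annotations) to list the notebooks in the working directory:\n",
"\n",
"```python\n",
"import typing\n",
"from pathlib import Path\n",
"\n",
"def notebooks(directory: str = '.') -> typing.List[Path]:\n",
"    # glob the notebook files in the directory, sorted by name\n",
"    return sorted(Path(directory).glob('*.ipynb'))\n",
"\n",
"for path in notebooks():\n",
"    print(f'{path.name}: {path.stat().st_size} bytes')\n",
"```"
]
},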
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# References\n",
"\n",
"I have been using 🐞 for ~4 years now. Below are some references that have influenced my approaches to data analysis in 🐞.\n",
"\n",
"* http://pandas.pydata.org/pandas-docs/stable/merging.html\n",
"* https://tomaugspurger.github.io/modern-1-intro\n",
"* https://blaze.readthedocs.io/en/latest/rosetta-pandas.html\n",
"* https://pandas.pydata.org/pandas-docs/stable/comparison_with_r.html\n",
"* http://matthewrocklin.com/blog/work/2015/06/18/Categoricals\n",
"* http://matthewrocklin.com/blog/work/2015/03/16/Fast-Serialization \n",
"* https://pandas.pydata.org/pandas-docs/stable/indexing.html\n",
"* https://toolz.readthedocs.io/en/latest/composition.html"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"[NbConvertApp] Converting notebook readme.ipynb to markdown\n",
"[NbConvertApp] Writing 5619 bytes to readme.md\n"
]
}
],
"source": [
" if __name__ == '__main__':\n",
" !jupyter nbconvert --to markdown readme.ipynb"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}