https://github.com/miraculixx/pandas-dfquery
Keyword-based & lazy queries for Pandas DataFrames (drop-in replacement for DataFrame.query)
- Host: GitHub
- URL: https://github.com/miraculixx/pandas-dfquery
- Owner: miraculixx
- License: mit
- Created: 2015-03-24T13:46:29.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2015-08-18T14:35:22.000Z (over 9 years ago)
- Last Synced: 2024-10-29T00:27:45.253Z (2 months ago)
- Language: Python
- Homepage:
- Size: 336 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.rst
- License: LICENSE
pandas-dfquery
--------------

.. _Django ORM queries: https://docs.djangoproject.com/en/1.7/topics/db/queries/#retrieving-specific-objects-with-filters

Provides keyword-style queries on Pandas DataFrames (see the examples below). Inspired by `Django ORM queries`_.
Why?
----

Ever get tired of writing code like this:
.. code:: python

    # standard subsetting syntax
    df[(df.YEAR == 2015) & (df.MONTH == 1)]
    df[(df.YEAR == 2015) & df.PRODUCT.str.contains('Fab')]

    # .query() style
    df.query('YEAR == 2015 & MONTH == 1')
    # oops: string functions raise an exception (Node call not implemented)
    df.query('YEAR == 2015 & PRODUCT.str.contains("Fab")')

and wish you could instead write:
.. code:: python

    df.query(YEAR=2015, MONTH=1)
    df.query(PRODUCT__contains='Fab')

    # querying for null values is straightforward
    df.query(YEAR__isnone=True)
    df.query(YEAR__isnone=False)

    # use string functions
    df.query(PRODUCT__islower=True)
    df.query(PRODUCT__isupper=False)

Then pandas-dfquery is for you. See the tutorial below.
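The keyword style above follows Django's ``COLUMN__lookup=value`` convention. As a rough illustration of how such keywords could map onto ordinary pandas boolean masks, here is a minimal sketch in plain pandas. It is *not* dfquery's implementation: ``keyword_query``, the ``LOOKUPS`` table and the ``sales`` sample data are hypothetical names introduced here, only the lookups used in this README (``contains``, ``isnone``, ``islower``, ``isupper``, ``gt``/``gte``/``lt``/``lte``) are covered, and all keywords are ANDed together.

.. code:: python

    import pandas as pd

    # Hypothetical lookup table for this sketch (not dfquery's actual internals):
    # each Django-style suffix maps to a function (column, value) -> boolean mask.
    LOOKUPS = {
        "eq": lambda col, v: col == v,
        "gt": lambda col, v: col > v,
        "gte": lambda col, v: col >= v,
        "lt": lambda col, v: col < v,
        "lte": lambda col, v: col <= v,
        "contains": lambda col, v: col.str.contains(v, regex=False, na=False),
        "isnone": lambda col, v: col.isnull() == v,
        "islower": lambda col, v: col.str.islower() == v,
        "isupper": lambda col, v: col.str.isupper() == v,
    }


    def keyword_query(df, **kwargs):
        """Return the rows of df that match every COLUMN__lookup=value keyword."""
        mask = pd.Series(True, index=df.index)
        for key, value in kwargs.items():
            # "PRODUCT__contains" -> ("PRODUCT", "contains"); "YEAR" -> ("YEAR", "")
            column, _, lookup = key.partition("__")
            # A bare column name means an equality test; keywords are ANDed together.
            mask = mask & LOOKUPS[lookup or "eq"](df[column], value)
        return df[mask]


    # Made-up sample rows; None in YEAR becomes NaN in a float column.
    sales = pd.DataFrame({
        "YEAR": [2015, 2015, 2014, None],
        "MONTH": [1, 2, 1, 1],
        "PRODUCT": ["Fabric", "glass", "Fab", "STEEL"],
    })
    print(keyword_query(sales, YEAR=2015, MONTH=1))
    print(keyword_query(sales, PRODUCT__contains="Fab"))
    print(keyword_query(sales, YEAR__isnone=True))

Splitting each keyword on the first ``__`` mirrors the Django convention; in this sketch a column whose own name contains ``__`` could not be queried this way.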
Tutorial
--------

.. code:: python

    from dfquery import QDataFrame, Q, Filter
    import pandas as pd
    import numpy as np

    # basic filtering
    iris = QDataFrame(pd.read_csv('https://raw.github.com/pydata/pandas/master/pandas/tests/data/iris.csv'))
    df = iris.query(SepalLength__gte=6.0, Name__contains='versicolor')
    df

.. code:: python

    # create Q objects as query terms, which can be combined with & (and), | (or)
    # and negated with ~
    q_versi = Q(SepalLength__lt=6.0, Name__contains='versi')
    q_setosa = Q(SepalLength__lt=6.0, Name__contains='setosa')
    iris.query(q_versi & ~q_setosa)

.. code:: python

    # combine Q objects with | to match rows that satisfy either term
    q_versi = Q(SepalLength__gt=6.0, Name__contains='versi')
    q_setosa = Q(SepalLength__lt=6.0, Name__contains='setosa')
    iris.query(q_versi | q_setosa)

.. code:: python

    # lazy evaluation: query() returns self instead of a new DataFrame;
    # calls to .query() build up a filter object which is only evaluated
    # on repr() or when accessing the .value property
    df = QDataFrame(iris).lazy()
    df.query(~Q(Name__contains='versicolor') & ~Q(Name__contains='setosa'))
    df.query(SepalLength=5.8)
    df.value
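The Q objects and lazy mode shown above can likewise be approximated in plain pandas. The sketch below is a hypothetical re-implementation under stated assumptions, not dfquery's source: ``Q`` here wraps a function that builds a boolean mask and overloads ``&``, ``|`` and ``~``, while ``LazyFrame`` (a name invented for this example) only records terms in ``query()`` and applies them, ANDed together, when ``.value`` or ``repr()`` is used. Only the ``eq``, ``gt``, ``lt`` and ``contains`` lookups are supported, and the small ``iris`` frame is made-up sample data.

.. code:: python

    import pandas as pd

    # Hypothetical lookups for this sketch (a small subset; not dfquery's code).
    LOOKUPS = {
        "eq": lambda col, v: col == v,
        "gt": lambda col, v: col > v,
        "lt": lambda col, v: col < v,
        "contains": lambda col, v: col.str.contains(v, regex=False, na=False),
    }


    class Q:
        """A query term: keyword lookups ANDed together, combinable with &, | and ~."""

        def __init__(self, mask_fn=None, **kwargs):
            # Either wrap a ready-made mask function (used by the operators below)
            # or build one from COLUMN__lookup=value keywords.
            self._mask_fn = mask_fn or self._from_keywords(kwargs)

        @staticmethod
        def _from_keywords(kwargs):
            def mask_fn(df):
                mask = pd.Series(True, index=df.index)
                for key, value in kwargs.items():
                    column, _, lookup = key.partition("__")
                    mask = mask & LOOKUPS[lookup or "eq"](df[column], value)
                return mask
            return mask_fn

        def mask(self, df):
            """Evaluate this term against df and return a boolean Series."""
            return self._mask_fn(df)

        def __and__(self, other):
            return Q(lambda df: self.mask(df) & other.mask(df))

        def __or__(self, other):
            return Q(lambda df: self.mask(df) | other.mask(df))

        def __invert__(self):
            return Q(lambda df: ~self.mask(df))


    class LazyFrame:
        """Hypothetical lazy wrapper: query() records terms, .value applies them."""

        def __init__(self, df):
            self._df = df
            self._terms = []

        def query(self, *terms, **kwargs):
            # Nothing is filtered here; the terms are stored and self is returned.
            self._terms.extend(terms)
            if kwargs:
                self._terms.append(Q(**kwargs))
            return self

        @property
        def value(self):
            # All recorded terms are ANDed together only at this point.
            mask = pd.Series(True, index=self._df.index)
            for term in self._terms:
                mask = mask & term.mask(self._df)
            return self._df[mask]

        def __repr__(self):
            return repr(self.value)


    # Made-up sample rows in the shape of the iris data set.
    iris = pd.DataFrame({
        "SepalLength": [5.1, 5.8, 6.4, 5.8, 6.3],
        "Name": ["Iris-setosa", "Iris-versicolor", "Iris-versicolor",
                 "Iris-virginica", "Iris-virginica"],
    })
    lazy = LazyFrame(iris)
    lazy.query(~Q(Name__contains="versicolor") & ~Q(Name__contains="setosa"))
    lazy.query(SepalLength=5.8)
    print(lazy.value)

Because ``query()`` returns the same object in this sketch, calls can be repeated or chained before anything is filtered; the trade-off is that mistakes such as a misspelled column name only surface when the result is evaluated.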

.. code:: python

    # use an existing DataFrame
    df = pd.DataFrame(...)
    dfsubset = Filter(df, Name__contains='versicolor').value
    # or use Q objects as before
    dfsubset = Filter(df, ~Q(Name__contains='versicolor') & ~Q(Name__contains='setosa')).value

Development
-----------

Installation
++++++++++++

.. code::

    $ pip install -r requirements.txt

Running unit tests
++++++++++++++++++

.. code::

    $ python -m unittest discover