https://github.com/miraculixx/pandas-dfquery

Keyword-based & lazy queries for Pandas DataFrames (drop-in replacement for DataFrame.query)

Host: GitHub
URL: https://github.com/miraculixx/pandas-dfquery
Owner: miraculixx
License: mit
Created: 2015-03-24T13:46:29.000Z (over 10 years ago)
Default Branch: master
Last Pushed: 2015-08-18T14:35:22.000Z (almost 10 years ago)
Last Synced: 2025-02-09T01:43:19.973Z (5 months ago)
Language: Python
Homepage:
Size: 336 KB
Stars: 0
Watchers: 3
Forks: 0
Open Issues: 2
Metadata Files:
- Readme: README.rst
- License: LICENSE


pandas-dfquery
--------------

.. _Django ORM queries: https://docs.djangoproject.com/en/1.7/topics/db/queries/#retrieving-specific-objects-with-filters

Provides keyword-style queries on Pandas DataFrames, inspired by `Django ORM queries`_. See the examples below.

Why?
----

Ever got tired of writing code like this:

.. code:: python

    # standard subsetting syntax
    # (each comparison needs parentheses because & binds tighter than ==)
    df[(df.YEAR == 2015) & (df.MONTH == 1)]
    df[(df.YEAR == 2015) & df.PRODUCT.str.contains('Fab')]

    # .query() style
    df.query('YEAR==2015 & MONTH==1')
    # -- oops, string functions raise an exception (Node call not implemented)
    df.query('df.YEAR == 2015 & df.PRODUCT.str.contains("Fab")')

and wish you could instead write:

.. code:: python

    df.query(YEAR=2015, MONTH=1)
    df.query(PRODUCT__contains='Fab')

    # querying for null values is straightforward
    df.query(YEAR__isnone=True)
    df.query(YEAR__isnone=False)

    # use string functions
    df.query(PRODUCT__islower=True)
    df.query(PRODUCT__isupper=False)

Then pandas-dfquery is for you. See the tutorial below.
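
To make the keyword syntax less magical, here is a minimal sketch of how a ``COLUMN__lookup=value`` keyword could be translated into a boolean mask using plain pandas. It is an illustration only, not the library's actual implementation: the ``LOOKUPS`` table, the ``kwargs_to_mask`` helper and the ``sales`` data are all hypothetical names made up for this example.

.. code:: python

    import pandas as pd

    # Hypothetical lookup table: suffix -> function(series, value) -> boolean mask
    LOOKUPS = {
        "exact": lambda s, v: s == v,
        "gte": lambda s, v: s >= v,
        "lt": lambda s, v: s < v,
        "contains": lambda s, v: s.str.contains(v, regex=False, na=False),
        "isnone": lambda s, v: s.isna() if v else s.notna(),
    }


    def kwargs_to_mask(df, **conditions):
        """Translate COLUMN__lookup=value keywords into one AND-combined mask."""
        mask = pd.Series(True, index=df.index)
        for key, value in conditions.items():
            # "PRODUCT__contains" -> ("PRODUCT", "__", "contains"); no suffix means "exact"
            column, _, lookup = key.partition("__")
            mask &= LOOKUPS[lookup or "exact"](df[column], value)
        return mask


    # made-up sample data
    sales = pd.DataFrame({
        "YEAR": [2015, 2015, 2014, None],
        "MONTH": [1, 2, 1, 1],
        "PRODUCT": ["Fabric", "Paint", "Fabric", "Glue"],
    })

    subset = sales[kwargs_to_mask(sales, YEAR=2015, PRODUCT__contains="Fab")]
    missing_year = sales[kwargs_to_mask(sales, YEAR__isnone=True)]

Splitting on the double underscore is the same convention Django uses for field lookups; a real implementation would also have to handle column names that themselves contain ``__``.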

Tutorial
--------

.. code:: python

    from dfquery import QDataFrame, Q, Filter
    import pandas as pd
    import numpy as np

    # basic filtering
    iris = QDataFrame(pd.read_csv('https://raw.github.com/pydata/pandas/master/pandas/tests/data/iris.csv'))
    df = iris.query(SepalLength__gte=6.0, Name__contains='versicolor')
    df
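
For comparison, the same kind of filter written with plain pandas boolean indexing looks roughly like this. The small ``flowers`` frame is a made-up stand-in for the iris data so that the sketch runs offline; it does not use pandas-dfquery at all.

.. code:: python

    import pandas as pd

    # made-up stand-in for the iris data
    flowers = pd.DataFrame({
        "SepalLength": [5.1, 6.4, 5.7, 6.3],
        "Name": ["Iris-setosa", "Iris-versicolor", "Iris-versicolor", "Iris-virginica"],
    })

    # the comparison is parenthesized because & binds tighter than >=
    mask = (flowers.SepalLength >= 6.0) & flowers.Name.str.contains("versicolor")
    result = flowers[mask]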

.. code:: python

    # create Q objects as query terms; combine them with & and |, negate with ~
    q_versi = Q(SepalLength__lt=6.0, Name__contains='versi')
    q_setosa = Q(SepalLength__lt=6.0, Name__contains='setosa')
    iris.query(q_versi & ~q_setosa)

.. code:: python

    # combine two Q objects with | (logical OR)
    q_versi = Q(SepalLength__gt=6.0, Name__contains='versi')
    q_setosa = Q(SepalLength__lt=6.0, Name__contains='setosa')
    iris.query(q_versi | q_setosa)
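
The way such query terms compose can be sketched with Python's operator overloading. The ``MiniQ`` class below is a hypothetical stand-in written for this example; it only assumes that a query term boils down to a function from a DataFrame to a boolean mask, and it is not how ``Q`` is necessarily implemented.

.. code:: python

    import pandas as pd


    class MiniQ:
        """Hypothetical stand-in for a Q object: wraps a function df -> boolean mask."""

        def __init__(self, mask_fn):
            self.mask_fn = mask_fn

        def __and__(self, other):
            return MiniQ(lambda df: self.mask_fn(df) & other.mask_fn(df))

        def __or__(self, other):
            return MiniQ(lambda df: self.mask_fn(df) | other.mask_fn(df))

        def __invert__(self):
            return MiniQ(lambda df: ~self.mask_fn(df))

        def apply(self, df):
            # the mask is only computed here, when a DataFrame is supplied
            return df[self.mask_fn(df)]


    # made-up stand-in for the iris data
    flowers = pd.DataFrame({
        "SepalLength": [5.1, 6.4, 5.7, 6.3],
        "Name": ["Iris-setosa", "Iris-versicolor", "Iris-versicolor", "Iris-virginica"],
    })

    long_versi = MiniQ(lambda df: (df.SepalLength > 6.0) & df.Name.str.contains("versi"))
    short_setosa = MiniQ(lambda df: (df.SepalLength < 6.0) & df.Name.str.contains("setosa"))

    either = (long_versi | short_setosa).apply(flowers)
    neither = (~(long_versi | short_setosa)).apply(flowers)

Because each operator returns a new ``MiniQ``, terms can be built once and reused in several combined queries.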

.. code:: python

    # lazy evaluation -- query() returns self instead of a new dataframe
    # calls to .query() build up a filter object which is only evaluated
    # on repr() or when accessing the .value property
    df = QDataFrame(iris).lazy()
    df.query(~Q(Name__contains='versicolor') & ~Q(Name__contains='setosa'))
    df.query(SepalLength=5.8)
    df.value
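
The lazy pattern described in the comments above can be sketched as follows. ``LazyFrame`` is a hypothetical class invented for this illustration (it takes mask functions rather than keywords) and is not part of pandas-dfquery; it only shows the idea of collecting conditions and deferring the filtering until ``.value`` is read.

.. code:: python

    import pandas as pd


    class LazyFrame:
        """Hypothetical wrapper: collects mask functions, filters only on .value."""

        def __init__(self, df):
            self._df = df
            self._pending = []

        def query(self, mask_fn):
            self._pending.append(mask_fn)
            return self  # returning self allows chaining without evaluating

        @property
        def value(self):
            # all pending conditions are AND-combined at evaluation time
            mask = pd.Series(True, index=self._df.index)
            for mask_fn in self._pending:
                mask &= mask_fn(self._df)
            return self._df[mask]


    # made-up stand-in for the iris data
    flowers = pd.DataFrame({
        "SepalLength": [5.1, 6.4, 5.7, 6.3],
        "Name": ["Iris-setosa", "Iris-versicolor", "Iris-versicolor", "Iris-virginica"],
    })

    lazy = LazyFrame(flowers)
    lazy.query(lambda df: ~df.Name.str.contains("setosa"))
    lazy.query(lambda df: df.SepalLength == 5.7)
    result = lazy.value  # filtering happens only here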

    

.. code:: python

   # use an existing data frame
   df = pd.DataFrame(...)
   dfsubset = Filter(df, Name__contains='versicolor').value

   # or use Q objects as before
   dfsubset = Filter(df, ~Q(Name__contains='versicolor') & ~Q(Name__contains='setosa')).value

Development
-----------

Installation
++++++++++++

.. code:: bash

   $ pip install -r requirements.txt

Running unit tests
++++++++++++++++++

.. code:: 

   $ python -m unittest discover
