https://github.com/kayak/pypika

PyPika is a python SQL query builder that exposes the full richness of the SQL language using a syntax that reflects the resulting query. PyPika excels at all sorts of SQL queries but is especially useful for data analysis.

PyPika - Python Query Builder

=============================

.. _intro_start:

|BuildStatus|  |CoverageStatus|  |Codacy|  |Docs|  |PyPi|  |License|

Abstract

--------

What is |Brand|?

|Brand| is a Python API for building SQL queries. The motivation behind |Brand| is to provide a simple interface for

building SQL queries without limiting the flexibility of handwritten SQL. Designed with data analysis in mind, |Brand|

leverages the builder design pattern to construct queries to avoid messy string formatting and concatenation. It is also

easily extended to take full advantage of specific features of SQL database vendors.

What are the design goals for |Brand|?

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

|Brand| is a fast, expressive and flexible way to replace handwritten SQL (or even ORM for the courageous souls amongst you).

Validation of SQL correctness is not an explicit goal of |Brand|. With such a large number of
SQL database vendors, providing robust validation of input data is difficult. Instead, you are encouraged to check the inputs you provide to |Brand| or to handle errors raised by
your SQL database appropriately, just as you would if you were writing SQL yourself.

.. _intro_end:

Read the docs: http://pypika.readthedocs.io/en/latest/

Installation

------------

.. _installation_start:

|Brand| supports Python ``3.6+``.  It may also work on PyPy, Cython, and Jython, but it is not tested against them.

To install |Brand| run the following command:

.. code-block:: bash

    pip install pypika

.. _installation_end:

Tutorial

--------

.. _tutorial_start:

The main classes in pypika are ``pypika.Query``, ``pypika.Table``, and ``pypika.Field``.

.. code-block:: python

    from pypika import Query, Table, Field

Selecting Data

^^^^^^^^^^^^^^

The entry point for building queries is ``pypika.Query``.  In order to select columns from a table, the table must
first be added to the query.  For simple queries with only one table, tables and columns can be referenced using
strings.  For more sophisticated queries, a ``pypika.Table`` must be used.

.. code-block:: python

    q = Query.from_('customers').select('id', 'fname', 'lname', 'phone')

To convert the query into raw SQL, it can be cast to a string.

.. code-block:: python

    str(q)

Alternatively, you can use the ``Query.get_sql()`` method:

.. code-block:: python

    q.get_sql()
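
As a minimal sketch of the two conversions above (the variable names are illustrative, and the double-quoted identifiers assume the default ``Query`` quoting), both calls return the same string:

.. code-block:: python

    from pypika import Query

    # Build a single-table query using plain string names.
    q = Query.from_('customers').select('id', 'fname', 'lname', 'phone')

    # Casting to str and calling get_sql() yield the same SQL text.
    sql_from_str = str(q)
    sql_from_method = q.get_sql()

    print(sql_from_str)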

Tables, Columns, Schemas, and Databases

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In simple queries like the above example, columns in the "from" table can be referenced by passing string names into

the ``select`` query builder function. In more complex examples, the ``pypika.Table`` class should be used. Columns can be

referenced as attributes on instances of ``pypika.Table``.

.. code-block:: python

    from pypika import Table, Query

    customers = Table('customers')

    q = Query.from_(customers).select(customers.id, customers.fname, customers.lname, customers.phone)

Both of the above examples result in the following SQL:

.. code-block:: sql

    SELECT id,fname,lname,phone FROM customers

An alias for the table can be given using the ``.as_`` method on ``pypika.Table``:

.. code-block:: python

    customers = Table('x_view_customers').as_('customers')

    q = Query.from_(customers).select(customers.id, customers.phone)

.. code-block:: sql

    SELECT id,phone FROM x_view_customers customers

A schema can also be specified. Tables can be referenced as attributes on the schema.

.. code-block:: python

    from pypika import Table, Query, Schema

    views = Schema('views')

    customers = views.customers

    q = Query.from_(customers).select(customers.id, customers.phone)

.. code-block:: sql

    SELECT id,phone FROM views.customers

References to databases can also be used. Schemas can be referenced as attributes on the database.

.. code-block:: python

    from pypika import Table, Query, Database

    my_db = Database('my_db')

    customers = my_db.analytics.customers

    q = Query.from_(customers).select(customers.id, customers.phone)

.. code-block:: sql

    SELECT id,phone FROM my_db.analytics.customers
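
The attribute lookups described above can be sketched end to end as follows. The variable names are illustrative, and the quoted, dot-separated table names assume the default quoting:

.. code-block:: python

    from pypika import Database, Query, Schema

    # Attribute access on a Schema returns a Table bound to that schema.
    views = Schema('views')
    customers = views.customers
    schema_sql = str(Query.from_(customers).select(customers.id, customers.phone))

    # Attribute access on a Database returns a Schema, and then a Table.
    my_db = Database('my_db')
    db_customers = my_db.analytics.customers
    database_sql = str(Query.from_(db_customers).select(db_customers.id))

    print(schema_sql)
    print(database_sql)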

Results can be ordered by using the following syntax:

.. code-block:: python

    from pypika import Order

    Query.from_('customers').select('id', 'fname', 'lname', 'phone').orderby('id', order=Order.desc)

This results in the following SQL:

.. code-block:: sql

    SELECT "id","fname","lname","phone" FROM "customers" ORDER BY "id" DESC

Arithmetic

""""""""""

Arithmetic expressions can also be constructed using pypika.  Operators such as ``+``, ``-``, ``*``, and ``/`` are implemented
by ``pypika.Field``, which can be used with a ``pypika.Table`` or directly.

.. code-block:: python

    from pypika import Field

    q = Query.from_('accounts').select(

        Field('revenue') - Field('cost')

    )

.. code-block:: sql

    SELECT revenue-cost FROM accounts

Using ``pypika.Table``

.. code-block:: python

    accounts = Table('accounts')

    q = Query.from_(accounts).select(

        accounts.revenue - accounts.cost

    )

.. code-block:: sql

    SELECT revenue-cost FROM accounts

An alias can also be used for fields and expressions.

.. code-block:: python

    q = Query.from_(accounts).select(

        (accounts.revenue - accounts.cost).as_('profit')

    )

.. code-block:: sql

    SELECT revenue-cost profit FROM accounts

More arithmetic examples

.. code-block:: python

    table = Table('table')

    q = Query.from_(table).select(

        table.foo + table.bar,

        table.foo - table.bar,

        table.foo * table.bar,

        table.foo / table.bar,

        (table.foo+table.bar) / table.fiz,

    )

.. code-block:: sql

    SELECT foo+bar,foo-bar,foo*bar,foo/bar,(foo+bar)/fiz FROM table
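
Arithmetic and aliasing can be combined. The following is a small sketch only: the ``margin`` alias and the column names are hypothetical, chosen to show a parenthesised expression being aliased like any other column:

.. code-block:: python

    from pypika import Query, Table

    accounts = Table('accounts')

    # Parenthesised arithmetic on fields builds a single expression term,
    # which can then be given an alias.
    margin = ((accounts.revenue - accounts.cost) / accounts.revenue).as_('margin')

    q = Query.from_(accounts).select(accounts.id, margin)
    sql = str(q)

    print(sql)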

Filtering

"""""""""

Queries can be filtered with ``pypika.Criterion`` by using equality or inequality operators

.. code-block:: python

    customers = Table('customers')

    q = Query.from_(customers).select(

        customers.id, customers.fname, customers.lname, customers.phone

    ).where(

        customers.lname == 'Mustermann'

    )

.. code-block:: sql

    SELECT id,fname,lname,phone FROM customers WHERE lname='Mustermann'

Query methods such as ``select``, ``where``, ``groupby``, and ``orderby`` can be called multiple times.  Multiple calls to the ``where``
method add further conditions combined with ``AND``.

.. code-block:: python

    customers = Table('customers')

    q = Query.from_(customers).select(

        customers.id, customers.fname, customers.lname, customers.phone

    ).where(

        customers.fname == 'Max'

    ).where(

        customers.lname == 'Mustermann'

    )

.. code-block:: sql

    SELECT id,fname,lname,phone FROM customers WHERE fname='Max' AND lname='Mustermann'

Filters such as IN and BETWEEN are also supported

.. code-block:: python

    customers = Table('customers')

    q = Query.from_(customers).select(

        customers.id,customers.fname

    ).where(

        customers.age[18:65] & customers.status.isin(['new', 'active'])

    )

.. code-block:: sql

    SELECT id,fname FROM customers WHERE age BETWEEN 18 AND 65 AND status IN ('new','active')
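
The slice syntax above is shorthand. As a rough sketch, assuming the ``between`` method available on terms, the following two queries should render identically:

.. code-block:: python

    from pypika import Query, Table

    customers = Table('customers')

    # Slice notation on a field builds a BETWEEN criterion.
    slice_form = str(
        Query.from_(customers).select(customers.id).where(customers.age[18:65])
    )

    # The explicit method call expresses the same criterion.
    method_form = str(
        Query.from_(customers).select(customers.id).where(customers.age.between(18, 65))
    )

    print(slice_form)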

Filtering with complex criteria can be created using boolean symbols ``&``, ``|``, and ``^``.

AND

.. code-block:: python

    customers = Table('customers')

    q = Query.from_(customers).select(

        customers.id, customers.fname, customers.lname, customers.phone

    ).where(

        (customers.age >= 18) & (customers.lname == 'Mustermann')

    )

.. code-block:: sql

    SELECT id,fname,lname,phone FROM customers WHERE age>=18 AND lname='Mustermann'

OR

.. code-block:: python

    customers = Table('customers')

    q = Query.from_(customers).select(

        customers.id, customers.fname, customers.lname, customers.phone

    ).where(

        (customers.age >= 18) | (customers.lname == 'Mustermann')

    )

.. code-block:: sql

    SELECT id,fname,lname,phone FROM customers WHERE age>=18 OR lname='Mustermann'

XOR

.. code-block:: python

    customers = Table('customers')

    q = Query.from_(customers).select(

        customers.id, customers.fname, customers.lname, customers.phone

    ).where(

        (customers.age >= 18) ^ customers.is_registered

    )

.. code-block:: sql

    SELECT id,fname,lname,phone FROM customers WHERE age>=18 XOR is_registered

Convenience Methods

"""""""""""""""""""

The ``Criterion`` class provides the static methods ``any`` and ``all``, which build chains of ``OR`` and ``AND`` expressions, respectively, from a list of terms.

.. code-block:: python

    from pypika import Criterion

    customers = Table('customers')

    q = Query.from_(customers).select(

        customers.id,

        customers.fname

    ).where(

        Criterion.all([

            customers.is_registered,

            customers.age >= 18,

            customers.lname == "Jones",

        ])

    )

.. code-block:: sql

    SELECT id,fname FROM customers WHERE is_registered AND age>=18 AND lname='Jones'
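
For the ``OR`` counterpart, a minimal sketch using ``Criterion.any`` might look like this (the surnames are placeholder values):

.. code-block:: python

    from pypika import Criterion, Query, Table

    customers = Table('customers')

    # Criterion.any joins the listed terms with OR.
    q = Query.from_(customers).select(customers.id).where(
        Criterion.any([
            customers.lname == 'Jones',
            customers.lname == 'Smith',
        ])
    )
    sql = str(q)

    print(sql)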

Grouping and Aggregating

""""""""""""""""""""""""

Grouping allows for aggregated results and works similarly to ``SELECT`` clauses.

.. code-block:: python

    from pypika import functions as fn

    customers = Table('customers')

    q = Query \

        .from_(customers) \

        .where(customers.age >= 18) \

        .groupby(customers.id) \

        .select(customers.id, fn.Sum(customers.revenue))

.. code-block:: sql

    SELECT "id",SUM("revenue") FROM "customers" WHERE "age">=18 GROUP BY "id"

After adding a ``GROUP BY`` clause to a query, the ``HAVING`` clause becomes available.  The method

``Query.having()`` takes a ``Criterion`` parameter similar to the method ``Query.where()``.

.. code-block:: python

    from datetime import date

    from pypika import functions as fn

    payments = Table('payments')

    q = Query \

        .from_(payments) \

        .where(payments.transacted[date(2015, 1, 1):date(2016, 1, 1)]) \

        .groupby(payments.customer_id) \

        .having(fn.Sum(payments.total) >= 1000) \

        .select(payments.customer_id, fn.Sum(payments.total))

.. code-block:: sql

    SELECT customer_id,SUM(total) FROM payments

    WHERE transacted BETWEEN '2015-01-01' AND '2016-01-01'

    GROUP BY customer_id HAVING SUM(total)>=1000
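
``HAVING`` works with other aggregate functions too. The sketch below assumes ``fn.Count`` and uses a hypothetical ``n_payments`` alias and threshold:

.. code-block:: python

    from pypika import Query, Table, functions as fn

    payments = Table('payments')

    # Keep only customers with more than ten payments.
    q = (
        Query.from_(payments)
        .groupby(payments.customer_id)
        .having(fn.Count(payments.id) > 10)
        .select(payments.customer_id, fn.Count(payments.id).as_('n_payments'))
    )
    sql = str(q)

    print(sql)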

Joining Tables and Subqueries

"""""""""""""""""""""""""""""

Tables and subqueries can be joined to any query using the ``Query.join()`` method.  Joins can be performed with either
a ``USING`` or an ``ON`` clause.  The ``USING`` clause can be used when both tables/subqueries contain the same field, and
the ``ON`` clause can be used with a criterion. To perform a join, ``...join()`` can be chained but must then be
followed immediately by ``...on()`` or ``...using(*field)``.

Join Types

~~~~~~~~~~

All join types are supported by |Brand|.

.. code-block:: python

    Query \

        .from_(base_table)

        ...

        .join(join_table, JoinType.left)

        ...

.. code-block:: python

    Query \

        .from_(base_table)

        ...

        .left_join(join_table) \

        .left_outer_join(join_table) \

        .right_join(join_table) \

        .right_outer_join(join_table) \

        .inner_join(join_table) \

        .outer_join(join_table) \

        .full_outer_join(join_table) \

        .cross_join(join_table) \

        .hash_join(join_table) \

        ...

See the full list of join types in ``pypika.enums.JoinType``.

Example of a join using `ON`

~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from pypika import Query, Tables

    history, customers = Tables('history', 'customers')

    q = Query \

        .from_(history) \

        .join(customers) \

        .on(history.customer_id == customers.id) \

        .select(history.star) \

        .where(customers.id == 5)

.. code-block:: sql

    SELECT "history".* FROM "history" JOIN "customers" ON "history"."customer_id"="customers"."id" WHERE "customers"."id"=5
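
A join type can be combined with an ``ON`` criterion in the same way. This is a sketch under the assumption that ``JoinType`` is importable from the top-level package; the selected columns are illustrative:

.. code-block:: python

    from pypika import JoinType, Query, Tables

    history, customers = Tables('history', 'customers')

    # Passing JoinType.left as the second argument produces a LEFT JOIN.
    q = (
        Query.from_(customers)
        .join(history, JoinType.left)
        .on(history.customer_id == customers.id)
        .select(customers.id, history.purchase_at)
    )
    sql = str(q)

    print(sql)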

As a shortcut, the ``Query.join().on_field()`` function is provided for joining the (first) table in the ``FROM`` clause

with the joined table when the field name(s) are the same in both tables.

Example of a join using `ON`

~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    history, customers = Tables('history', 'customers')

    q = Query \

        .from_(history) \

        .join(customers) \

        .on_field('customer_id', 'group') \

        .select(history.star) \

        .where(customers.group == 'A')

.. code-block:: sql

    SELECT "history".* FROM "history" JOIN "customers" ON "history"."customer_id"="customers"."customer_id" AND "history"."group"="customers"."group" WHERE "customers"."group"='A'

Example of a join using `USING`

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    history, customers = Tables('history', 'customers')

    q = Query \

        .from_(history) \

        .join(customers) \

        .using('customer_id') \

        .select(history.star) \

        .where(customers.id == 5)

.. code-block:: sql

    SELECT "history".* FROM "history" JOIN "customers" USING ("customer_id") WHERE "customers"."id"=5

Example of a correlated subquery in the `SELECT`

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    history, customers = Tables('history', 'customers')

    last_purchase_at = Query.from_(history).select(

        history.purchase_at

    ).where(history.customer_id==customers.customer_id).orderby(

        history.purchase_at, order=Order.desc

    ).limit(1)

    q = Query.from_(customers).select(

        customers.id, last_purchase_at.as_('last_purchase_at')

    )

.. code-block:: sql

    SELECT

      "id",

      (SELECT "history"."purchase_at"

       FROM "history"

       WHERE "history"."customer_id" = "customers"."customer_id"

       ORDER BY "history"."purchase_at" DESC

       LIMIT 1) "last_purchase_at"

    FROM "customers"

Unions

""""""

Both ``UNION`` and ``UNION ALL`` are supported. ``UNION DISTINCT`` is synonymous with ``UNION``, so |Brand| does not
provide a separate function for it.  Unions require that both queries select the same number of columns, so
casting a unioned query to a string raises a ``SetOperationException`` if the column counts do not match.

To create a union query, use either the ``Query.union()`` method or the ``+`` operator with two query instances. For a
union all, use ``Query.union_all()`` or the ``*`` operator.

.. code-block:: python

    provider_a, provider_b = Tables('provider_a', 'provider_b')

    q = Query.from_(provider_a).select(

        provider_a.created_time, provider_a.foo, provider_a.bar

    ) + Query.from_(provider_b).select(

        provider_b.created_time, provider_b.fiz, provider_b.buz

    )

.. code-block:: sql

    SELECT "created_time","foo","bar" FROM "provider_a" UNION SELECT "created_time","fiz","buz" FROM "provider_b"
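
For ``UNION ALL``, the method and the operator should be interchangeable. A short sketch, with the two-column selections chosen only for illustration:

.. code-block:: python

    from pypika import Query, Tables

    provider_a, provider_b = Tables('provider_a', 'provider_b')

    q_a = Query.from_(provider_a).select(provider_a.created_time, provider_a.foo)
    q_b = Query.from_(provider_b).select(provider_b.created_time, provider_b.fiz)

    # Method form and operator form of UNION ALL.
    union_all_sql = str(q_a.union_all(q_b))
    operator_sql = str(q_a * q_b)

    print(union_all_sql)

Depending on the installed version, each side of the set operation may be wrapped in parentheses in the rendered SQL.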

Intersect

"""""""""

``INTERSECT`` is supported. Intersects require that both queries select the same number of columns, so
casting an intersected query to a string raises a ``SetOperationException`` if the column counts do not match.

To create an intersect query, use the ``Query.intersect()`` method.

.. code-block:: python

    provider_a, provider_b = Tables('provider_a', 'provider_b')

    q = Query.from_(provider_a).select(

        provider_a.created_time, provider_a.foo, provider_a.bar

    )

    r = Query.from_(provider_b).select(

        provider_b.created_time, provider_b.fiz, provider_b.buz

    )

    intersected_query = q.intersect(r)

.. code-block:: sql

    SELECT "created_time","foo","bar" FROM "provider_a" INTERSECT SELECT "created_time","fiz","buz" FROM "provider_b"

Minus

"""""

``MINUS`` is supported. Minus requires that both queries select the same number of columns, so
casting a minus query to a string raises a ``SetOperationException`` if the column counts do not match.

To create a minus query, use either the ``Query.minus()`` method or the ``-`` operator with two query instances.

.. code-block:: python

    provider_a, provider_b = Tables('provider_a', 'provider_b')

    q = Query.from_(provider_a).select(

        provider_a.created_time, provider_a.foo, provider_a.bar

    )

    r = Query.from_(provider_b).select(

        provider_b.created_time, provider_b.fiz, provider_b.buz

    )

    minus_query = q.minus(r)

    # or, equivalently, using the - operator:

    minus_query = Query.from_(provider_a).select(

        provider_a.created_time, provider_a.foo, provider_a.bar

    ) - Query.from_(provider_b).select(

        provider_b.created_time, provider_b.fiz, provider_b.buz

    )

.. code-block:: sql

    SELECT "created_time","foo","bar" FROM "provider_a" MINUS SELECT "created_time","fiz","buz" FROM "provider_b"

EXCEPT

""""""

``EXCEPT`` is supported. Except requires that both queries select the same number of columns, so
casting an except query to a string raises a ``SetOperationException`` if the column counts do not match.

To create an except query, use the ``Query.except_of()`` method.

.. code-block:: python

    provider_a, provider_b = Tables('provider_a', 'provider_b')

    q = Query.from_(provider_a).select(

        provider_a.created_time, provider_a.foo, provider_a.bar

    )

    r = Query.from_(provider_b).select(

        provider_b.created_time, provider_b.fiz, provider_b.buz

    )

    except_query = q.except_of(r)

.. code-block:: sql

    SELECT "created_time","foo","bar" FROM "provider_a" EXCEPT SELECT "created_time","fiz","buz" FROM "provider_b"

Date, Time, and Intervals

"""""""""""""""""""""""""

Using ``pypika.Interval``, queries can be constructed with date arithmetic.  Any combination of intervals can be

used except for weeks and quarters, which must be used separately and will ignore any other values if selected.

.. code-block:: python

    from pypika import Interval, Query, Table, functions as fn

    fruits = Table('fruits')

    q = Query.from_(fruits) \

        .select(fruits.id, fruits.name) \

        .where(fruits.harvest_date + Interval(months=1) < fn.Now())

.. code-block:: sql

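    SELECT id,name FROM fruits WHERE harvest_date+INTERVAL 1 MONTH<NOW()

Query-building steps can also be written as functions that accept and return a ``QueryBuilder`` and then be chained with ``.pipe()``:

.. code-block:: python

    from pypika import Field, Query, functions as fn

    from pypika.queries import QueryBuilder

    def filter_days(query: QueryBuilder, col, num_days: int) -> QueryBuilder:

        if isinstance(col, str):

            col = Field(col)

        return query.where(col > fn.Now() - num_days)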

    def count_groups(query: QueryBuilder, *groups) -> QueryBuilder: 

        return query.groupby(*groups).select(*groups, fn.Count("*").as_("n_rows"))

    base_query = Query.from_("table")

    query = (

        base_query

        .pipe(filter_days, "date", num_days=7)

        .pipe(count_groups, "col1", "col2")

    )

This produces: 

.. code-block:: sql

    SELECT "col1","col2",COUNT(*) "n_rows"

    FROM "table" 

    WHERE "date">NOW()-7 

    GROUP BY "col1","col2"

.. _tutorial_end:

.. _contributing_start: 

Contributing

------------

We welcome community contributions to |Brand|. Please see the `contributing guide <6_contributing.html>`_ for more info.

.. _contributing_end:

.. _license_start:

License

-------

Copyright 2020 KAYAK Germany, GmbH

Licensed under the Apache License, Version 2.0 (the "License");

you may not use this file except in compliance with the License.

You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software

distributed under the License is distributed on an "AS IS" BASIS,

WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

See the License for the specific language governing permissions and

limitations under the License.

Crafted with ♥ in Berlin.

.. _license_end:

.. _appendix_start:

.. |Brand| replace:: *PyPika*

.. _appendix_end:

.. _available_badges_start:

.. |BuildStatus| image:: https://github.com/kayak/pypika/workflows/Unit%20Tests/badge.svg

   :target: https://github.com/kayak/pypika/actions

.. |CoverageStatus| image:: https://coveralls.io/repos/kayak/pypika/badge.svg?branch=master

   :target: https://coveralls.io/github/kayak/pypika?branch=master

.. |Codacy| image:: https://api.codacy.com/project/badge/Grade/6d7e44e5628b4839a23da0bd82eaafcf

   :target: https://www.codacy.com/app/twheys/pypika

.. |Docs| image:: https://readthedocs.org/projects/pypika/badge/?version=latest

   :target: http://pypika.readthedocs.io/en/latest/

.. |PyPi| image:: https://img.shields.io/pypi/v/pypika.svg?style=flat

   :target: https://pypi.python.org/pypi/pypika

.. |License| image:: https://img.shields.io/hexpm/l/plug.svg?maxAge=2592000

   :target: http://www.apache.org/licenses/LICENSE-2.0

.. _available_badges_end: