
PyPika - Python Query Builder
=============================

.. _intro_start:

|BuildStatus| |CoverageStatus| |Codacy| |Docs| |PyPi| |License|

Abstract
--------

What is |Brand|?

|Brand| is a Python API for building SQL queries. The motivation behind |Brand| is to provide a simple interface for
building SQL queries without limiting the flexibility of handwritten SQL. Designed with data analysis in mind, |Brand|
leverages the builder design pattern to construct queries to avoid messy string formatting and concatenation. It is also
easily extended to take full advantage of specific features of SQL database vendors.

What are the design goals for |Brand|?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

|Brand| is a fast, expressive and flexible way to replace handwritten SQL (or even ORM for the courageous souls amongst you).
Validation of SQL correctness is not an explicit goal of |Brand|. With such a large number of
SQL database vendors, providing robust validation of input data is difficult. Instead, you are encouraged to check the inputs you provide to |Brand| or to handle errors raised by
your SQL database appropriately, just as you would if you were writing SQL yourself.

.. _intro_end:

Read the docs: http://pypika.readthedocs.io/en/latest/

Installation
------------

.. _installation_start:

|Brand| is tested on supported Python versions, i.e. 3.9+, as well as on PyPy3.9 and PyPy3.10. It may also work with Cython and Jython, but these are not tested in the CI script.

To install |Brand| run the following command:

.. code-block:: bash

    pip install pypika

.. _installation_end:

Tutorial
--------

.. _tutorial_start:

The main classes in pypika are ``pypika.Query``, ``pypika.Table``, and ``pypika.Field``.

.. code-block:: python

    from pypika import Query, Table, Field

Selecting Data
^^^^^^^^^^^^^^

The entry point for building queries is ``pypika.Query``. In order to select columns from a table, the table must
first be added to the query. For simple queries with only one table, tables and columns can be referenced using
strings. For more sophisticated queries a ``pypika.Table`` must be used.

.. code-block:: python

    q = Query.from_('customers').select('id', 'fname', 'lname', 'phone')

To convert the query into raw SQL, it can be cast to a string.

.. code-block:: python

    str(q)

Alternatively, you can use the ``Query.get_sql()`` function:

.. code-block:: python

    q.get_sql()
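
As a minimal sketch of the two conversions above, the following compares them and prints the result. The ``quote_char`` keyword passed to ``get_sql()`` is an assumption about the installed |Brand| version. By default, identifiers are wrapped in double quotes, whereas several SQL listings in this tutorial show them unquoted for brevity.

.. code-block:: python

    from pypika import Query

    q = Query.from_('customers').select('id', 'fname', 'lname', 'phone')

    # Casting to str and calling get_sql() build the same statement.
    default_sql = q.get_sql()
    assert str(q) == default_sql

    # Assumption: get_sql() accepts a quote_char keyword; None disables quoting.
    unquoted_sql = q.get_sql(quote_char=None)

    print(default_sql)
    print(unquoted_sql)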

Tables, Columns, Schemas, and Databases
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In simple queries like the above example, columns in the "from" table can be referenced by passing string names into
the ``select`` query builder function. In more complex examples, the ``pypika.Table`` class should be used. Columns can be
referenced as attributes on instances of ``pypika.Table``.

.. code-block:: python

    from pypika import Table, Query

    customers = Table('customers')
    q = Query.from_(customers).select(customers.id, customers.fname, customers.lname, customers.phone)

Both of the above examples result in the following SQL:

.. code-block:: sql

    SELECT id,fname,lname,phone FROM customers

An alias for the table can be given using the ``.as_`` function on ``pypika.Table``:

.. code-block:: python

    customers = Table('x_view_customers').as_('customers')
    q = Query.from_(customers).select(customers.id, customers.phone)

.. code-block:: sql

    SELECT id,phone FROM x_view_customers customers

A schema can also be specified. Tables can be referenced as attributes on the schema.

.. code-block:: python

    from pypika import Table, Query, Schema

    views = Schema('views')
    customers = views.customers
    q = Query.from_(customers).select(customers.id, customers.phone)

.. code-block:: sql

    SELECT id,phone FROM views.customers

References to databases can also be used. Schemas can be referenced as attributes on the database.

.. code-block:: python

    from pypika import Table, Query, Database

    my_db = Database('my_db')
    customers = my_db.analytics.customers
    q = Query.from_(customers).select(customers.id, customers.phone)

.. code-block:: sql

    SELECT id,phone FROM my_db.analytics.customers
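
The sketch below puts the schema and database forms side by side and prints the SQL that is generated with default quoting. Passing ``schema='views'`` to ``Table`` is shown as an assumed alternative to attribute access; the table and column names are illustrative only.

.. code-block:: python

    from pypika import Database, Query, Schema, Table

    # Reach the table through attribute access on a schema ...
    views = Schema('views')
    by_attribute = views.customers

    # ... or name the schema when constructing the table (assumed equivalent).
    by_keyword = Table('customers', schema='views')

    schema_sql = str(Query.from_(by_attribute).select(by_attribute.id, by_attribute.phone))
    keyword_sql = str(Query.from_(by_keyword).select(by_keyword.id, by_keyword.phone))

    # A database adds one more level in front of the schema.
    analytics_customers = Database('my_db').analytics.customers
    database_sql = str(Query.from_(analytics_customers).select(analytics_customers.id))

    print(schema_sql)
    print(keyword_sql)
    print(database_sql)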

Results can be ordered by using the following syntax:

.. code-block:: python

    from pypika import Order
    Query.from_('customers').select('id', 'fname', 'lname', 'phone').orderby('id', order=Order.desc)

This results in the following SQL:

.. code-block:: sql

    SELECT "id","fname","lname","phone" FROM "customers" ORDER BY "id" DESC

Arithmetic
""""""""""

Arithmetic expressions can also be constructed using pypika. Operators such as ``+``, ``-``, ``*``, and ``/`` are implemented
by ``pypika.Field``, which can be used with a ``pypika.Table`` or directly.

.. code-block:: python

    from pypika import Field

    q = Query.from_('accounts').select(
        Field('revenue') - Field('cost')
    )

.. code-block:: sql

    SELECT revenue-cost FROM accounts

Using ``pypika.Table``:

.. code-block:: python

    accounts = Table('accounts')
    q = Query.from_(accounts).select(
        accounts.revenue - accounts.cost
    )

.. code-block:: sql

    SELECT revenue-cost FROM accounts

An alias can also be used for fields and expressions.

.. code-block:: python

    q = Query.from_(accounts).select(
        (accounts.revenue - accounts.cost).as_('profit')
    )

.. code-block:: sql

    SELECT revenue-cost profit FROM accounts

More arithmetic examples:

.. code-block:: python

    table = Table('table')
    q = Query.from_(table).select(
        table.foo + table.bar,
        table.foo - table.bar,
        table.foo * table.bar,
        table.foo / table.bar,
        (table.foo+table.bar) / table.fiz,
    )

.. code-block:: sql

    SELECT foo+bar,foo-bar,foo*bar,foo/bar,(foo+bar)/fiz FROM table
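
Operands do not have to be columns. The sketch below uses a hypothetical ``orders`` table to combine fields with plain Python numbers and to alias the result. It assumes that numeric literals are rendered inline and that sub-expressions are parenthesised as in the listing above.

.. code-block:: python

    from pypika import Query, Table

    orders = Table('orders')  # hypothetical table, for illustration only

    q = Query.from_(orders).select(
        # Field combined with a Python number
        orders.price * 2,
        # Parenthesised sub-expression divided by a number, with an alias
        ((orders.net + orders.tax) / 100).as_('total_in_units'),
    )

    sql = str(q)
    print(sql)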

Bitwise operations are also supported using the ``bitwiseand`` and ``bitwiseor`` methods.

.. code-block:: python

from pypika import Query, Field

q = Query.from_('flags').select('name').where(Field('permissions').bitwiseand(4) == 4)

.. code-block:: sql

SELECT "name" FROM "flags" WHERE ("permissions" & 4)=4

.. code-block:: python

q = Query.from_('flags').select('name').where(Field('permissions').bitwiseor(2) == 3)

.. code-block:: sql

SELECT "name" FROM "flags" WHERE ("permissions" | 2)=3

Filtering
"""""""""

Queries can be filtered with ``pypika.Criterion`` by using equality or inequality operators

.. code-block:: python

customers = Table('customers')
q = Query.from_(customers).select(
customers.id, customers.fname, customers.lname, customers.phone
).where(
customers.lname == 'Mustermann'
)

.. code-block:: sql

SELECT id,fname,lname,phone FROM customers WHERE lname='Mustermann'

Query methods such as ``select``, ``where``, ``groupby``, and ``orderby`` can be called multiple times. Multiple calls to the ``where``
method add additional conditions combined with ``AND``:

.. code-block:: python

customers = Table('customers')
q = Query.from_(customers).select(
customers.id, customers.fname, customers.lname, customers.phone
).where(
customers.fname == 'Max'
).where(
customers.lname == 'Mustermann'
)

.. code-block:: sql

SELECT id,fname,lname,phone FROM customers WHERE fname='Max' AND lname='Mustermann'

Filters such as ``IN`` and ``BETWEEN`` are also supported:

.. code-block:: python

    customers = Table('customers')
    q = Query.from_(customers).select(
        customers.id, customers.fname
    ).where(
        customers.age[18:65] & customers.status.isin(['new', 'active'])
    )

.. code-block:: sql

    SELECT id,fname FROM customers WHERE age BETWEEN 18 AND 65 AND status IN ('new','active')
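
The slice syntax is shorthand. As a sketch, assuming the ``between`` and ``notin`` methods are available on fields in the installed version, the same range filter can be spelled out explicitly, and a negated membership test can be written in the same style. The status values are made up for illustration.

.. code-block:: python

    from pypika import Query, Table

    customers = Table('customers')

    # Slice form, as used above
    by_slice = Query.from_(customers).select(customers.id).where(customers.age[18:65])

    # Explicit method form (assumed to be equivalent to the slice)
    by_method = Query.from_(customers).select(customers.id).where(
        customers.age.between(18, 65)
    )

    # NOT IN is the negated counterpart of isin()
    excluded = Query.from_(customers).select(customers.id).where(
        customers.status.notin(['closed', 'banned'])
    )

    print(by_method)
    print(excluded)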

Filtering with complex criteria can be created using boolean symbols ``&``, ``|``, and ``^``.

AND

.. code-block:: python

customers = Table('customers')
q = Query.from_(customers).select(
customers.id, customers.fname, customers.lname, customers.phone
).where(
(customers.age >= 18) & (customers.lname == 'Mustermann')
)

.. code-block:: sql

SELECT id,fname,lname,phone FROM customers WHERE age>=18 AND lname='Mustermann'

OR

.. code-block:: python

customers = Table('customers')
q = Query.from_(customers).select(
customers.id, customers.fname, customers.lname, customers.phone
).where(
(customers.age >= 18) | (customers.lname == 'Mustermann')
)

.. code-block:: sql

SELECT id,fname,lname,phone FROM customers WHERE age>=18 OR lname='Mustermann'

XOR

.. code-block:: python

customers = Table('customers')
q = Query.from_(customers).select(
customers.id, customers.fname, customers.lname, customers.phone
).where(
(customers.age >= 18) ^ customers.is_registered
)

.. code-block:: sql

SELECT id,fname,lname,phone FROM customers WHERE age>=18 XOR is_registered

Convenience Methods
"""""""""""""""""""

The ``Criterion`` class provides the static methods ``any`` and ``all``, which build chains of ``OR`` and ``AND`` expressions, respectively, from a list of terms.

.. code-block:: python

    from pypika import Criterion

    customers = Table('customers')
    q = Query.from_(customers).select(
        customers.id,
        customers.fname
    ).where(
        Criterion.all([
            customers.is_registered,
            customers.age >= 18,
            customers.lname == "Jones",
        ])
    )

.. code-block:: sql

    SELECT id,fname FROM customers WHERE is_registered AND age>=18 AND lname='Jones'
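
For the ``OR`` counterpart, a minimal sketch using ``Criterion.any`` with the same table follows. The particular conditions are illustrative only.

.. code-block:: python

    from pypika import Criterion, Query, Table

    customers = Table('customers')

    q = Query.from_(customers).select(customers.id, customers.fname).where(
        # Matches rows that satisfy at least one of the listed conditions
        Criterion.any([
            customers.age >= 18,
            customers.lname == 'Jones',
        ])
    )

    sql = str(q)
    print(sql)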

Grouping and Aggregating
""""""""""""""""""""""""

Grouping allows for aggregated results and works similarly to ``SELECT`` clauses.

.. code-block:: python

from pypika import functions as fn

customers = Table('customers')
q = Query \
.from_(customers) \
.where(customers.age >= 18) \
.groupby(customers.id) \
.select(customers.id, fn.Sum(customers.revenue))

.. code-block:: sql

SELECT id,SUM("revenue") FROM "customers" WHERE "age">=18 GROUP BY "id"

After adding a ``GROUP BY`` clause to a query, the ``HAVING`` clause becomes available. The method
``Query.having()`` takes a ``Criterion`` parameter similar to the method ``Query.where()``.

.. code-block:: python

    from datetime import date

    from pypika import functions as fn

    payments = Table('payments')
    q = Query \
        .from_(payments) \
        .where(payments.transacted[date(2015, 1, 1):date(2016, 1, 1)]) \
        .groupby(payments.customer_id) \
        .having(fn.Sum(payments.total) >= 1000) \
        .select(payments.customer_id, fn.Sum(payments.total))

.. code-block:: sql

SELECT customer_id,SUM(total) FROM payments
WHERE transacted BETWEEN '2015-01-01' AND '2016-01-01'
GROUP BY customer_id HAVING SUM(total)>=1000

The ``QUALIFY`` clause can be used to filter rows based on window function results. This is particularly useful
when you want to filter after window functions have been evaluated.

.. code-block:: python

from pypika import Query, Table, analytics as an

table = Table('events')
rank_expr = an.Rank().over(table.user_id).orderby(table.created_at)

q = Query.from_(table).select('*').qualify(rank_expr == 1)

.. code-block:: sql

SELECT * FROM "events" QUALIFY RANK() OVER(PARTITION BY "user_id" ORDER BY "created_at")=1

GROUP BY Modifiers
""""""""""""""""""

The ``ROLLUP`` modifier allows for aggregating to higher levels than the given groups, called super-aggregates.

.. code-block:: python

from pypika import Query, Table, Rollup, functions as fn

products = Table('products')
q = Query.from_(products).select(
products.id, products.category, fn.Sum(products.price)
).rollup(products.id, products.category)

.. code-block:: sql

SELECT "id","category",SUM("price") FROM "products" GROUP BY ROLLUP("id","category")

Joining Tables and Subqueries
"""""""""""""""""""""""""""""

Tables and subqueries can be joined to any query using the ``Query.join()`` method. Joins can be performed with either
a ``USING`` or an ``ON`` clause. The ``USING`` clause can be used when both tables/subqueries contain the same field, and
the ``ON`` clause can be used with a criterion. To perform a join, ``...join()`` can be chained, but it must be
followed immediately by ``...on()`` or ``...using(*field)``.

Join Types
~~~~~~~~~~

All join types are supported by |Brand|.

.. code-block:: python

Query \
.from_(base_table)
...
.join(join_table, JoinType.left)
...

.. code-block:: python

Query \
.from_(base_table)
...
.left_join(join_table) \
.left_outer_join(join_table) \
.right_join(join_table) \
.right_outer_join(join_table) \
.inner_join(join_table) \
.outer_join(join_table) \
.full_outer_join(join_table) \
.cross_join(join_table) \
.hash_join(join_table) \
...

See the list of join types in ``pypika.enums.JoinType``.
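
As a concrete sketch of the two forms above, the following builds the same left join once with an explicit ``JoinType`` and once with the ``left_join`` shortcut. The table and column names are illustrative only.

.. code-block:: python

    from pypika import Query, Tables
    from pypika.enums import JoinType

    history, customers = Tables('history', 'customers')

    # Explicit join type passed to join()
    explicit = (
        Query.from_(history)
        .join(customers, JoinType.left)
        .on(history.customer_id == customers.id)
        .select(history.star)
    )

    # Shortcut method for the same join type
    shortcut = (
        Query.from_(history)
        .left_join(customers)
        .on(history.customer_id == customers.id)
        .select(history.star)
    )

    print(explicit)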

Example of a join using `ON`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

history, customers = Tables('history', 'customers')
q = Query \
.from_(history) \
.join(customers) \
.on(history.customer_id == customers.id) \
.select(history.star) \
.where(customers.id == 5)

.. code-block:: sql

SELECT "history".* FROM "history" JOIN "customers" ON "history"."customer_id"="customers"."id" WHERE "customers"."id"=5

As a shortcut, the ``Query.join().on_field()`` function is provided for joining the (first) table in the ``FROM`` clause
with the joined table when the field name(s) are the same in both tables.

Example of a join using `on_field`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

history, customers = Tables('history', 'customers')
q = Query \
.from_(history) \
.join(customers) \
.on_field('customer_id', 'group') \
.select(history.star) \
.where(customers.group == 'A')

.. code-block:: sql

SELECT "history".* FROM "history" JOIN "customers" ON "history"."customer_id"="customers"."customer_id" AND "history"."group"="customers"."group" WHERE "customers"."group"='A'

Example of a join using `USING`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

history, customers = Tables('history', 'customers')
q = Query \
.from_(history) \
.join(customers) \
.using('customer_id') \
.select(history.star) \
.where(customers.id == 5)

.. code-block:: sql

SELECT "history".* FROM "history" JOIN "customers" USING "customer_id" WHERE "customers"."id"=5

Example of a correlated subquery in the `SELECT`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

history, customers = Tables('history', 'customers')
last_purchase_at = Query.from_(history).select(
history.purchase_at
).where(history.customer_id==customers.customer_id).orderby(
history.purchase_at, order=Order.desc
).limit(1)
q = Query.from_(customers).select(
customers.id, last_purchase_at.as_('last_purchase_at')
)

.. code-block:: sql

SELECT
"id",
(SELECT "history"."purchase_at"
FROM "history"
WHERE "history"."customer_id" = "customers"."customer_id"
ORDER BY "history"."purchase_at" DESC
LIMIT 1) "last_purchase_at"
FROM "customers"

Unions
""""""

Both ``UNION`` and ``UNION ALL`` are supported. ``UNION DISTINCT`` is synonymous with ``UNION``, so |Brand| does not
provide a separate function for it. Unions require that both queries select the same number of columns, so
trying to cast a unioned query to a string will throw a ``SetOperationException`` if the column counts are mismatched.

To create a union query, use either the ``Query.union()`` method or the ``+`` operator with two query instances. For a
union all, use ``Query.union_all()`` or the ``*`` operator.

.. code-block:: python

provider_a, provider_b = Tables('provider_a', 'provider_b')
q = Query.from_(provider_a).select(
provider_a.created_time, provider_a.foo, provider_a.bar
) + Query.from_(provider_b).select(
provider_b.created_time, provider_b.fiz, provider_b.buz
)

.. code-block:: sql

SELECT "created_time","foo","bar" FROM "provider_a" UNION SELECT "created_time","fiz","buz" FROM "provider_b"
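
The ``UNION ALL`` variant can be sketched in the same way. The example below builds it with both the method and the operator and checks that the two spellings agree. The exact rendering (for instance, whether each sub-query is wrapped in parentheses) may differ between versions, so the sketch only prints the result rather than asserting a fixed string.

.. code-block:: python

    from pypika import Query, Tables

    provider_a, provider_b = Tables('provider_a', 'provider_b')

    first = Query.from_(provider_a).select(provider_a.created_time, provider_a.foo)
    second = Query.from_(provider_b).select(provider_b.created_time, provider_b.fiz)

    # Method form and operator form of UNION ALL
    by_method = first.union_all(second)
    by_operator = first * second

    print(by_method)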

Intersect
"""""""""

``INTERSECT`` is supported. Intersects require that both queries select the same number of columns, so
trying to cast an intersected query to a string will throw a ``SetOperationException`` if the column counts are mismatched.

To create an intersect query, use the ``Query.intersect()`` method.

.. code-block:: python

provider_a, provider_b = Tables('provider_a', 'provider_b')
q = Query.from_(provider_a).select(
provider_a.created_time, provider_a.foo, provider_a.bar
)
r = Query.from_(provider_b).select(
provider_b.created_time, provider_b.fiz, provider_b.buz
)
intersected_query = q.intersect(r)

.. code-block:: sql

SELECT "created_time","foo","bar" FROM "provider_a" INTERSECT SELECT "created_time","fiz","buz" FROM "provider_b"

Minus
"""""

``MINUS`` is supported. Minus requires that both queries select the same number of columns, so
trying to cast a minus query to a string will throw a ``SetOperationException`` if the column counts are mismatched.

To create a minus query, use either the ``Query.minus()`` method or the ``-`` operator with two query instances.

.. code-block:: python

    provider_a, provider_b = Tables('provider_a', 'provider_b')
    q = Query.from_(provider_a).select(
        provider_a.created_time, provider_a.foo, provider_a.bar
    )
    r = Query.from_(provider_b).select(
        provider_b.created_time, provider_b.fiz, provider_b.buz
    )
    minus_query = q.minus(r)

    # or, equivalently, with the - operator
    minus_query = Query.from_(provider_a).select(
        provider_a.created_time, provider_a.foo, provider_a.bar
    ) - Query.from_(provider_b).select(
        provider_b.created_time, provider_b.fiz, provider_b.buz
    )

.. code-block:: sql

SELECT "created_time","foo","bar" FROM "provider_a" MINUS SELECT "created_time","fiz","buz" FROM "provider_b"

EXCEPT
""""""

``EXCEPT`` is supported. Except requires that both queries select the same number of columns, so
trying to cast an except query to a string will throw a ``SetOperationException`` if the column counts are mismatched.

To create an except query, use the ``Query.except_of()`` method.

.. code-block:: python

    provider_a, provider_b = Tables('provider_a', 'provider_b')
    q = Query.from_(provider_a).select(
        provider_a.created_time, provider_a.foo, provider_a.bar
    )
    r = Query.from_(provider_b).select(
        provider_b.created_time, provider_b.fiz, provider_b.buz
    )
    except_query = q.except_of(r)

.. code-block:: sql

SELECT "created_time","foo","bar" FROM "provider_a" EXCEPT SELECT "created_time","fiz","buz" FROM "provider_b"

Date, Time, and Intervals
"""""""""""""""""""""""""

Using ``pypika.Interval``, queries can be constructed with date arithmetic. Any combination of intervals can be
used except for weeks and quarters, which must be used separately and will ignore any other values if selected.

.. code-block:: python

    from pypika import Interval, Query, Table, functions as fn

    fruits = Table('fruits')
    q = Query.from_(fruits) \
        .select(fruits.id, fruits.name) \
        .where(fruits.harvest_date + Interval(months=1) < fn.Now())

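.. code-block:: sql

    SELECT id,name FROM fruits WHERE harvest_date+INTERVAL 1 MONTH<NOW()

Literal values in a query can also be collected as parameters by passing a parameter object to ``get_sql()``:

.. code-block:: python

    from pypika import Query, Table, QmarkParameter

    customers = Table('customers')
    q = Query.from_(customers).select('*').where(
        (customers.status == 'active') & (customers.age >= 18)
    )

    parameter = QmarkParameter()
    sql = q.get_sql(parameter=parameter)
    params = parameter.get_parameters()

    # sql: SELECT * FROM "customers" WHERE "status"=? AND "age">=?
    # params: ['active', 18]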

This works with all parameter types. For dict-based parameters like ``NamedParameter``:

.. code-block:: python

from pypika import Query, Table, NamedParameter

customers = Table('customers')
q = Query.from_(customers).select('*').where(customers.status == 'active')

parameter = NamedParameter()
sql = q.get_sql(parameter=parameter)
params = parameter.get_parameters()

# sql: SELECT * FROM "customers" WHERE "status"=:param1
# params: {'param1': 'active'}
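
To show how parameterised SQL fits a database driver, here is a minimal sketch using the standard-library ``sqlite3`` module. It writes the placeholder explicitly with ``Parameter('?')`` instead of relying on automatic collection, and the table contents are made up for illustration.

.. code-block:: python

    import sqlite3

    from pypika import Parameter, Query, Table

    customers = Table('customers')
    q = Query.from_(customers).select(customers.id).where(
        customers.status == Parameter('?')  # explicit qmark placeholder
    )
    sql = q.get_sql()

    # In-memory database with made-up rows, for illustration only
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE customers (id INTEGER, status TEXT)')
    conn.executemany(
        'INSERT INTO customers VALUES (?, ?)',
        [(1, 'active'), (2, 'new'), (3, 'active')],
    )

    # The driver binds the value; it is never interpolated into the SQL string
    rows = conn.execute(sql, ('active',)).fetchall()
    conn.close()

    print(sql)
    print(rows)

Keeping values out of the SQL string this way leaves escaping to the driver.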

Temporal support
^^^^^^^^^^^^^^^^

Temporal criteria can be added to the tables.

Select
""""""

Here is a select using system time.

.. code-block:: python

t = Table("abc")
q = Query.from_(t.for_(SYSTEM_TIME.as_of('2020-01-01'))).select("*")

This produces:

.. code-block:: sql

SELECT * FROM "abc" FOR SYSTEM_TIME AS OF '2020-01-01'

You can also use between.

.. code-block:: python

t = Table("abc")
q = Query.from_(
t.for_(SYSTEM_TIME.between('2020-01-01', '2020-02-01'))
).select("*")

This produces:

.. code-block:: sql

SELECT * FROM "abc" FOR SYSTEM_TIME BETWEEN '2020-01-01' AND '2020-02-01'

You can also use a period range.

.. code-block:: python

t = Table("abc")
q = Query.from_(
t.for_(SYSTEM_TIME.from_to('2020-01-01', '2020-02-01'))
).select("*")

This produces:

.. code-block:: sql

SELECT * FROM "abc" FOR SYSTEM_TIME FROM '2020-01-01' TO '2020-02-01'

Finally you can select for all times:

.. code-block:: python

t = Table("abc")
q = Query.from_(t.for_(SYSTEM_TIME.all_())).select("*")

This produces:

.. code-block:: sql

SELECT * FROM "abc" FOR SYSTEM_TIME ALL

A user defined period can also be used in the following manner.

.. code-block:: python

t = Table("abc")
q = Query.from_(
t.for_(t.valid_period.between('2020-01-01', '2020-02-01'))
).select("*")

This produces:

.. code-block:: sql

SELECT * FROM "abc" FOR "valid_period" BETWEEN '2020-01-01' AND '2020-02-01'

Joins
"""""

With joins, when the table object is used to specify columns, it is
important to use the table from which the temporal constraint was generated.
This is because ``Table("abc")`` is not the same table as ``Table("abc").for_(...)``.
The following example demonstrates this.

.. code-block:: python

t0 = Table("abc").for_(SYSTEM_TIME.as_of('2020-01-01'))
t1 = Table("efg").for_(SYSTEM_TIME.as_of('2020-01-01'))
query = (
Query.from_(t0)
.join(t1)
.on(t0.foo == t1.bar)
.select("*")
)

This produces:

.. code-block:: sql

SELECT * FROM "abc" FOR SYSTEM_TIME AS OF '2020-01-01'
JOIN "efg" FOR SYSTEM_TIME AS OF '2020-01-01'
ON "abc"."foo"="efg"."bar"

Update & Deletes
""""""""""""""""

An update can be written as follows:

.. code-block:: python

t = Table("abc")
q = Query.update(
t.for_portion(
SYSTEM_TIME.from_to('2020-01-01', '2020-02-01')
)
).set("foo", "bar")

This produces:

.. code-block:: sql

UPDATE "abc"
FOR PORTION OF SYSTEM_TIME FROM '2020-01-01' TO '2020-02-01'
SET "foo"='bar'

Here is a delete:

.. code-block:: python

t = Table("abc")
q = Query.from_(
t.for_portion(t.valid_period.from_to('2020-01-01', '2020-02-01'))
).delete()

This produces:

.. code-block:: sql

DELETE FROM "abc"
FOR PORTION OF "valid_period" FROM '2020-01-01' TO '2020-02-01'

Creating Tables
^^^^^^^^^^^^^^^

The entry point for creating tables is ``pypika.Query.create_table``, which is used with the class ``pypika.Column``.
As with selecting data, the table should be specified first. This can be either a
string or a ``pypika.Table``. The columns and constraints are specified next. Here's an example
that demonstrates much of the functionality.

.. code-block:: python

stmt = Query \
.create_table("person") \
.columns(
Column("id", "INT", nullable=False),
Column("first_name", "VARCHAR(100)", nullable=False),
Column("last_name", "VARCHAR(100)", nullable=False),
Column("phone_number", "VARCHAR(20)", nullable=True),
Column("status", "VARCHAR(20)", nullable=False, default=ValueWrapper("NEW")),
Column("date_of_birth", "DATETIME")) \
.unique("last_name", "first_name") \
.primary_key("id")

This produces:

.. code-block:: sql

CREATE TABLE "person" (
"id" INT NOT NULL,
"first_name" VARCHAR(100) NOT NULL,
"last_name" VARCHAR(100) NOT NULL,
"phone_number" VARCHAR(20) NULL,
"status" VARCHAR(20) NOT NULL DEFAULT 'NEW',
"date_of_birth" DATETIME,
UNIQUE ("last_name","first_name"),
PRIMARY KEY ("id")
)

There is also support for creating a table from a query.

.. code-block:: python

stmt = Query.create_table("names").as_select(
Query.from_("person").select("last_name", "first_name")
)

This produces:

.. code-block:: sql

CREATE TABLE "names" AS (SELECT "last_name","first_name" FROM "person")

TEMPORARY and UNLOGGED tables can also be created:

.. code-block:: python

from pypika import Query, Table, Columns

columns = Columns(('id', 'INT'), ('name', 'VARCHAR(100)'))

Query.create_table('temp_items').columns(*columns).temporary()
Query.create_table('fast_items').columns(*columns).unlogged()

.. code-block:: sql

CREATE TEMPORARY TABLE "temp_items" ("id" INT,"name" VARCHAR(100))

CREATE UNLOGGED TABLE "fast_items" ("id" INT,"name" VARCHAR(100))

Managing Table Indices
^^^^^^^^^^^^^^^^^^^^^^

Create Indices
""""""""""""""""

The entry point for creating indices is ``pypika.Query.create_index``.
An index name (as ``str``) or a ``pypika.terms.Index``, a table (as ``str`` or ``pypika.Table``), and
columns (as ``pypika.Column``) must be specified.

.. code-block:: python

my_index = Index("my_index")
person = Table("person")
stmt = Query \
.create_index(my_index) \
.on(person) \
.columns(person.first_name, person.last_name)

This produces:

.. code-block:: sql

CREATE INDEX my_index
ON person (first_name, last_name)

It is also possible to create a unique index

.. code-block:: python

my_index = Index("my_index")
person = Table("person")
stmt = Query \
.create_index(my_index) \
.on(person) \
.columns(person.first_name, person.last_name) \
.unique()

This produces:

.. code-block:: sql

CREATE UNIQUE INDEX my_index
ON person (first_name, last_name)

It is also possible to create an index if it does not exist

.. code-block:: python

my_index = Index("my_index")
person = Table("person")
stmt = Query \
.create_index(my_index) \
.on(person) \
.columns(person.first_name, person.last_name) \
.if_not_exists()

This produces:

.. code-block:: sql

CREATE INDEX IF NOT EXISTS my_index
ON person (first_name, last_name)

Drop Indices
""""""""""""""""

The entry point for dropping indices is ``pypika.Query.drop_index``.
It takes either ``str`` or ``pypika.terms.Index`` as an argument.

.. code-block:: python

my_index = Index("my_index")
stmt = Query.drop_index(my_index)

This produces:

.. code-block:: sql

DROP INDEX my_index

It is also possible to drop an index if it exists

.. code-block:: python

my_index = Index("my_index")
stmt = Query.drop_index(my_index).if_exists()

This produces:

.. code-block:: sql

DROP INDEX IF EXISTS my_index

Handling Different Database Platforms
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There can sometimes be differences between how database vendors implement SQL in their platform, for example
which quote characters are used. To ensure that the correct SQL standard is used for your platform,
the platform-specific Query classes can be used.

.. code-block:: python

from pypika import MySQLQuery, MSSQLQuery, PostgreSQLQuery, OracleQuery, VerticaQuery, ClickHouseQuery

You can use these query classes as a drop in replacement for the default ``Query`` class shown in the other examples.
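
As a small sketch of the drop-in behaviour, the same query is built below with three query classes and the results are printed. The helper ``build`` is hypothetical and exists only for this illustration.

.. code-block:: python

    from pypika import MySQLQuery, PostgreSQLQuery, Query

    def build(query_cls):
        """Build the same SELECT with the given query class."""
        return str(query_cls.from_('customers').select('id', 'fname'))

    default_sql = build(Query)
    mysql_sql = build(MySQLQuery)
    postgres_sql = build(PostgreSQLQuery)

    print(default_sql)
    print(mysql_sql)
    print(postgres_sql)

Only the class name changes; the builder calls stay the same, which keeps dialect selection in one place.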

ClickHouse-Specific Features
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

|Brand| provides several ClickHouse-specific query features through the ``ClickHouseQuery`` class.

FINAL
"""""

The ``FINAL`` modifier forces ClickHouse to fully merge data before returning results, useful with
ReplacingMergeTree and CollapsingMergeTree tables.

.. code-block:: python

from pypika import ClickHouseQuery, Table

t = Table('events')
q = ClickHouseQuery.from_(t).select(t.user_id, t.event).final()

.. code-block:: sql

SELECT "user_id","event" FROM "events" FINAL

SAMPLE
""""""

The ``SAMPLE`` clause enables approximate query processing on a fraction of data.

.. code-block:: python

from pypika import ClickHouseQuery, Table

t = Table('events')
q = ClickHouseQuery.from_(t).select(t.user_id).sample(10)

.. code-block:: sql

SELECT "user_id" FROM "events" SAMPLE 10

You can also specify an offset:

.. code-block:: python

q = ClickHouseQuery.from_(t).select(t.user_id).sample(10, 5)

.. code-block:: sql

SELECT "user_id" FROM "events" SAMPLE 10 OFFSET 5

DISTINCT ON
"""""""""""

ClickHouse supports ``DISTINCT ON`` to return distinct rows based on specific columns.

.. code-block:: python

from pypika import ClickHouseQuery, Table

t = Table('users')
q = ClickHouseQuery.from_(t).distinct_on('department', t.role).select('name', 'department', 'role')

.. code-block:: sql

SELECT DISTINCT ON("department","role") "name","department","role" FROM "users"

LIMIT BY
""""""""

The ``LIMIT BY`` clause limits the number of rows per group of column values.

.. code-block:: python

from pypika import ClickHouseQuery, Table

t = Table('events')
q = ClickHouseQuery.from_(t).select('user_id', 'event', 'timestamp').limit_by(3, 'user_id')

.. code-block:: sql

SELECT "user_id","event","timestamp" FROM "events" LIMIT 3 BY ("user_id")

You can also specify an offset with ``limit_offset_by``:

.. code-block:: python

q = ClickHouseQuery.from_(t).select('user_id', 'event').limit_offset_by(3, 1, 'user_id')

.. code-block:: sql

SELECT "user_id","event" FROM "events" LIMIT 3 OFFSET 1 BY ("user_id")

Oracle-Specific Features
^^^^^^^^^^^^^^^^^^^^^^^^

LIMIT and OFFSET
""""""""""""""""

Oracle queries support ``LIMIT`` and ``OFFSET`` using the ``FETCH NEXT ... ROWS ONLY`` and ``OFFSET ... ROWS`` syntax.

.. code-block:: python

from pypika import OracleQuery, Table

t = Table('employees')
q = OracleQuery.from_(t).select(t.name).limit(10)

.. code-block:: sql

SELECT name FROM employees FETCH NEXT 10 ROWS ONLY

With offset:

.. code-block:: python

q = OracleQuery.from_(t).select(t.name).limit(10).offset(20)

.. code-block:: sql

SELECT name FROM employees OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY

Jira Query Language (JQL)
^^^^^^^^^^^^^^^^^^^^^^^^^

|Brand| supports generating Jira Query Language expressions through the ``JiraQuery`` class.

.. code-block:: python

from pypika import JiraQuery

J = JiraQuery.Table()
query = (
JiraQuery.where(J.project.isin(["PROJ1", "PROJ2"]))
.where(J.issuetype == "Bug")
.where(J.labels.isempty() | J.labels.notin(["stale", "wontfix"]))
)

.. code-block:: sql

project IN ("PROJ1","PROJ2") AND issuetype="Bug" AND (labels is EMPTY OR labels NOT IN ("stale","wontfix"))

JQL fields support ``isempty()`` and ``notempty()`` methods for checking empty/non-empty values.

.. _advanced_end:

Chaining Functions
^^^^^^^^^^^^^^^^^^

The ``QueryBuilder.pipe`` method gives a more readable alternative while chaining functions.

.. code-block:: python

    # This
    (
        query
        .pipe(func1, *args)
        .pipe(func2, **kwargs)
        .pipe(func3)
    )

    # Is equivalent to this
    func3(func2(func1(query, *args), **kwargs))

Or for a more concrete example:

.. code-block:: python

    from pypika import Field, Query, functions as fn
    from pypika.queries import QueryBuilder

    def filter_days(query: QueryBuilder, col, num_days: int) -> QueryBuilder:
        if isinstance(col, str):
            col = Field(col)

        return query.where(col > fn.Now() - num_days)

    def count_groups(query: QueryBuilder, *groups) -> QueryBuilder:
        return query.groupby(*groups).select(*groups, fn.Count("*").as_("n_rows"))

    base_query = Query.from_("table")

    query = (
        base_query
        .pipe(filter_days, "date", num_days=7)
        .pipe(count_groups, "col1", "col2")
    )

This produces:

.. code-block:: sql

SELECT "col1","col2",COUNT(*) n_rows
FROM "table"
WHERE "date">NOW()-7
GROUP BY "col1","col2"

.. _tutorial_end:

.. _contributing_start:

Contributing
------------

We welcome community contributions to |Brand|. Please see the `contributing guide <6_contributing.html>`_ for more info.

.. _contributing_end:

.. _license_start:

License
-------

Copyright 2020 KAYAK Germany, GmbH

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Crafted with ♥ in Berlin.

.. _license_end:

.. _appendix_start:

.. |Brand| replace:: *PyPika*

.. _appendix_end:

.. _available_badges_start:

.. |BuildStatus| image:: https://github.com/kayak/pypika/workflows/Unit%20Tests/badge.svg
   :target: https://github.com/kayak/pypika/actions
.. |CoverageStatus| image:: https://coveralls.io/repos/kayak/pypika/badge.svg?branch=master
   :target: https://coveralls.io/github/kayak/pypika?branch=master
.. |Codacy| image:: https://api.codacy.com/project/badge/Grade/6d7e44e5628b4839a23da0bd82eaafcf
   :target: https://www.codacy.com/app/twheys/pypika
.. |Docs| image:: https://readthedocs.org/projects/pypika/badge/?version=latest
   :target: http://pypika.readthedocs.io/en/latest/
.. |PyPi| image:: https://img.shields.io/pypi/v/pypika.svg?style=flat
   :target: https://pypi.python.org/pypi/pypika
.. |License| image:: https://img.shields.io/hexpm/l/plug.svg?maxAge=2592000
   :target: http://www.apache.org/licenses/LICENSE-2.0

.. _available_badges_end: