https://github.com/ibis-project/ibis

the portable Python dataframe library

bigquery clickhouse database datafusion duckdb impala mssql mysql pandas polars postgresql pyarrow pyspark python snowflake sql sqlite trino

# Ibis

[![Documentation status](https://img.shields.io/badge/docs-docs.ibis--project.org-blue.svg)](http://ibis-project.org)
[![Project chat](https://img.shields.io/badge/zulip-join_chat-purple.svg?logo=zulip)](https://ibis-project.zulipchat.com)
[![Anaconda badge](https://anaconda.org/conda-forge/ibis-framework/badges/version.svg)](https://anaconda.org/conda-forge/ibis-framework)
[![PyPI](https://img.shields.io/pypi/v/ibis-framework.svg)](https://pypi.org/project/ibis-framework)
[![Build status](https://github.com/ibis-project/ibis/actions/workflows/ibis-main.yml/badge.svg)](https://github.com/ibis-project/ibis/actions/workflows/ibis-main.yml?query=branch%3Amain)
[![Build status](https://github.com/ibis-project/ibis/actions/workflows/ibis-backends.yml/badge.svg)](https://github.com/ibis-project/ibis/actions/workflows/ibis-backends.yml?query=branch%3Amain)
[![Codecov branch](https://img.shields.io/codecov/c/github/ibis-project/ibis/main.svg)](https://codecov.io/gh/ibis-project/ibis)

## What is Ibis?

Ibis is the portable Python dataframe library:

- Fast local dataframes (via DuckDB by default)
- Lazy dataframe expressions
- Interactive mode for iterative data exploration
- [Compose Python dataframe and SQL code](#python--sql-better-together)
- Use the same dataframe API for [nearly 20 backends](#backends)
- Iterate locally and deploy remotely by [changing a single line of code](#portability)

See the documentation on ["Why Ibis?"](https://ibis-project.org/why) to learn more.

## Getting started

You can `pip install` Ibis with a backend and example data:

```bash
pip install 'ibis-framework[duckdb,examples]'
```

> 💡 **Tip**
>
> See the [installation guide](https://ibis-project.org/install) for more installation options.

Then use Ibis:

```python
>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ int64             │ int64       │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │               181 │        3750 │ male   │  2007 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │               186 │        3800 │ female │  2007 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │               195 │        3250 │ female │  2007 │
│ Adelie  │ Torgersen │           NULL │          NULL │              NULL │        NULL │ NULL   │  2007 │
│ Adelie  │ Torgersen │           36.7 │          19.3 │               193 │        3450 │ female │  2007 │
│ Adelie  │ Torgersen │           39.3 │          20.6 │               190 │        3650 │ male   │  2007 │
│ Adelie  │ Torgersen │           38.9 │          17.8 │               181 │        3625 │ female │  2007 │
│ Adelie  │ Torgersen │           39.2 │          19.6 │               195 │        4675 │ male   │  2007 │
│ Adelie  │ Torgersen │           34.1 │          18.1 │               193 │        3475 │ NULL   │  2007 │
│ Adelie  │ Torgersen │           42.0 │          20.2 │               190 │        4250 │ NULL   │  2007 │
│ …       │ …         │              … │             … │                 … │           … │ …      │     … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
>>> g = t.group_by("species", "island").agg(count=t.count()).order_by("count")
>>> g
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ species   ┃ island    ┃ count ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string    │ string    │ int64 │
├───────────┼───────────┼───────┤
│ Adelie    │ Biscoe    │    44 │
│ Adelie    │ Torgersen │    52 │
│ Adelie    │ Dream     │    56 │
│ Chinstrap │ Dream     │    68 │
│ Gentoo    │ Biscoe    │   124 │
└───────────┴───────────┴───────┘
```

> 💡 **Tip**
>
> See the [getting started tutorial](https://ibis-project.org/tutorials/getting_started) for a full introduction to Ibis.

## Python + SQL: better together

For most backends, Ibis works by compiling its dataframe expressions into SQL:

```python
>>> ibis.to_sql(g)
SELECT
  "t1"."species",
  "t1"."island",
  "t1"."count"
FROM (
  SELECT
    "t0"."species",
    "t0"."island",
    COUNT(*) AS "count"
  FROM "penguins" AS "t0"
  GROUP BY
    1,
    2
) AS "t1"
ORDER BY
  "t1"."count" ASC
```

You can mix SQL and Python code:

```python
>>> a = t.sql("SELECT species, island, count(*) AS count FROM penguins GROUP BY 1, 2")
>>> a
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ species   ┃ island    ┃ count ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string    │ string    │ int64 │
├───────────┼───────────┼───────┤
│ Adelie    │ Torgersen │    52 │
│ Adelie    │ Biscoe    │    44 │
│ Adelie    │ Dream     │    56 │
│ Gentoo    │ Biscoe    │   124 │
│ Chinstrap │ Dream     │    68 │
└───────────┴───────────┴───────┘
>>> b = a.order_by("count")
>>> b
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ species   ┃ island    ┃ count ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string    │ string    │ int64 │
├───────────┼───────────┼───────┤
│ Adelie    │ Biscoe    │    44 │
│ Adelie    │ Torgersen │    52 │
│ Adelie    │ Dream     │    56 │
│ Chinstrap │ Dream     │    68 │
│ Gentoo    │ Biscoe    │   124 │
└───────────┴───────────┴───────┘
```

This allows you to combine the flexibility of Python with the scale and performance of modern SQL.

## Backends

Ibis supports nearly 20 backends:

- [Apache DataFusion](https://ibis-project.org/backends/datafusion/)
- [Apache Druid](https://ibis-project.org/backends/druid/)
- [Apache Flink](https://ibis-project.org/backends/flink)
- [Apache Impala](https://ibis-project.org/backends/impala/)
- [Apache PySpark](https://ibis-project.org/backends/pyspark/)
- [BigQuery](https://ibis-project.org/backends/bigquery/)
- [ClickHouse](https://ibis-project.org/backends/clickhouse/)
- [DuckDB](https://ibis-project.org/backends/duckdb/)
- [Exasol](https://ibis-project.org/backends/exasol)
- [MySQL](https://ibis-project.org/backends/mysql/)
- [Oracle](https://ibis-project.org/backends/oracle/)
- [Polars](https://ibis-project.org/backends/polars/)
- [PostgreSQL](https://ibis-project.org/backends/postgresql/)
- [RisingWave](https://ibis-project.org/backends/risingwave/)
- [SQL Server](https://ibis-project.org/backends/mssql/)
- [SQLite](https://ibis-project.org/backends/sqlite/)
- [Snowflake](https://ibis-project.org/backends/snowflake)
- [Trino](https://ibis-project.org/backends/trino/)

## How it works

Most Python dataframes are tightly coupled to their execution engine, and many databases only support SQL, with no Python API. Ibis solves this problem by providing a common API for data manipulation in Python and compiling that API into the backend's native language. This means you can learn a single API and use it across any supported backend (execution engine).

Ibis broadly supports two types of backend:

1. SQL-generating backends
2. DataFrame-generating backends

![Ibis backend types](./docs/images/backends.png)
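
As a rough illustration of these two backend types, here is a toy sketch that uses only the Python standard library. It is not how Ibis is implemented: `GroupCount`, `compile_to_sql`, `run_on_sqlite`, and `run_in_memory` are hypothetical names invented for this example, and the three sample rows are made up. The point is only that one lazy expression can either be compiled to SQL or evaluated directly over in-memory data.

```python
import sqlite3
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupCount:
    """A lazy expression (hypothetical): count rows of `table` per `key` value."""

    table: str
    key: str


def compile_to_sql(expr: GroupCount) -> str:
    """SQL-generating path: translate the expression into a query string."""
    return (
        f'SELECT "{expr.key}", COUNT(*) AS "count" '
        f'FROM "{expr.table}" GROUP BY "{expr.key}" ORDER BY "{expr.key}"'
    )


def run_on_sqlite(expr: GroupCount, con: sqlite3.Connection) -> list:
    """Send the compiled SQL to the database, which does the actual work."""
    return con.execute(compile_to_sql(expr)).fetchall()


def run_in_memory(expr: GroupCount, tables: dict) -> list:
    """DataFrame-style path: evaluate the same expression over in-memory rows."""
    counts = Counter(row[expr.key] for row in tables[expr.table])
    return sorted(counts.items())


# Made-up sample rows, loaded into both "backends".
rows = [
    {"species": "Adelie", "island": "Torgersen"},
    {"species": "Adelie", "island": "Biscoe"},
    {"species": "Gentoo", "island": "Biscoe"},
]
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE penguins (species TEXT, island TEXT)")
con.executemany("INSERT INTO penguins VALUES (:species, :island)", rows)

# Nothing is computed when the expression is built; only when it is run.
expr = GroupCount(table="penguins", key="species")
sql_result = run_on_sqlite(expr, con)
memory_result = run_in_memory(expr, {"penguins": rows})
print(compile_to_sql(expr))
print(sql_result == memory_result)
```

Both paths return the same per-species counts, which is the property that lets one expression API sit in front of very different engines.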

## Portability

To use different backends, you can set the backend Ibis uses:

```python
>>> ibis.set_backend("duckdb")
>>> ibis.set_backend("polars")
>>> ibis.set_backend("datafusion")
```

Typically, you'll create a connection object:

```python
>>> con = ibis.duckdb.connect()
>>> con = ibis.polars.connect()
>>> con = ibis.datafusion.connect()
```

And work with tables in that backend:

```python
>>> con.list_tables()
['penguins']
>>> t = con.table("penguins")
```

You can also read from common file formats like CSV or Apache Parquet:

```python
>>> t = con.read_csv("penguins.csv")
>>> t = con.read_parquet("penguins.parquet")
```

This allows you to iterate locally and deploy remotely by changing a single line of code.

> 💡 **Tip**
>
> Check out [the blog on backend agnostic arrays](https://ibis-project.org/posts/backend-agnostic-arrays/) for one example using the same code across DuckDB and BigQuery.

## Community and contributing

Ibis is an open source project and welcomes contributions from anyone in the community.

- Read [the contributing guide](https://github.com/ibis-project/ibis/blob/main/docs/CONTRIBUTING.md).
- We care about keeping the community welcoming for all. Check out [the code of conduct](https://github.com/ibis-project/ibis/blob/main/CODE_OF_CONDUCT.md).
- The Ibis project is open sourced under the [Apache License](https://github.com/ibis-project/ibis/blob/main/LICENSE.txt).

Join our community by interacting on GitHub or chatting with us on [Zulip](https://ibis-project.zulipchat.com/).

For more information visit https://ibis-project.org/.

## Governance

The Ibis project is an [independently governed](https://github.com/ibis-project/governance/blob/main/governance.md) open source community project to build and maintain the portable Python dataframe library. Ibis has contributors across a range of data companies and institutions.