https://github.com/mause/duckdb_engine

SQLAlchemy driver for DuckDB

Host: GitHub
URL: https://github.com/mause/duckdb_engine
Owner: Mause
License: mit
Created: 2020-09-28T07:09:14.000Z (about 5 years ago)
Default Branch: main
Last Pushed: 2025-04-03T17:18:51.000Z (9 months ago)
Last Synced: 2025-04-04T14:25:00.368Z (9 months ago)
Topics: duckdb, duckdb-engine, python, sql, sqlalchemy
Language: Python
Homepage:
Size: 2.34 MB
Stars: 407
Watchers: 3
Forks: 49
Open Issues: 46
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE.txt
- Code of conduct: CODE_OF_CONDUCT.md

# duckdb_engine

[![Supported Python Versions](https://img.shields.io/pypi/pyversions/duckdb-engine)](https://pypi.org/project/duckdb-engine/) [![PyPI version](https://badge.fury.io/py/duckdb-engine.svg)](https://badge.fury.io/py/duckdb-engine) [![PyPI Downloads](https://img.shields.io/pypi/dm/duckdb-engine.svg)](https://pypi.org/project/duckdb-engine/) [![codecov](https://codecov.io/gh/Mause/duckdb_engine/graph/badge.svg)](https://codecov.io/gh/Mause/duckdb_engine)

Basic SQLAlchemy driver for [DuckDB](https://duckdb.org/)

- [duckdb\_engine](#duckdb_engine)
  - [Installation](#installation)
  - [Usage](#usage)
  - [Usage in IPython/Jupyter](#usage-in-ipythonjupyter)
  - [Configuration](#configuration)
  - [How to register a pandas DataFrame](#how-to-register-a-pandas-dataframe)
  - [Things to keep in mind](#things-to-keep-in-mind)
    - [Auto-incrementing ID columns](#auto-incrementing-id-columns)
    - [Pandas `read_sql()` chunksize](#pandas-read_sql-chunksize)
    - [Unsigned integer support](#unsigned-integer-support)
  - [Alembic Integration](#alembic-integration)
  - [Preloading extensions (experimental)](#preloading-extensions-experimental)
  - [Registering Filesystems](#registering-filesystems)
  - [The name](#the-name)

## Installation

```sh
$ pip install duckdb-engine
```

DuckDB Engine also has a conda feedstock; instructions for using it are available in its [repository](https://github.com/conda-forge/duckdb-engine-feedstock).

## Usage

Once you've installed this package, you should be able to just use it, as SQLAlchemy does a Python path search to find the driver:

```python
from sqlalchemy import Column, Integer, Sequence, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm.session import Session

Base = declarative_base()


class FakeModel(Base):  # type: ignore
    __tablename__ = "fake"

    id = Column(Integer, Sequence("fakemodel_id_sequence"), primary_key=True)
    name = Column(String)


eng = create_engine("duckdb:///:memory:")
Base.metadata.create_all(eng)
session = Session(bind=eng)

session.add(FakeModel(name="Frank"))
session.commit()

frank = session.query(FakeModel).one()

assert frank.name == "Frank"
```

## Usage in IPython/Jupyter

With IPython-SQL and DuckDB-Engine you can query DuckDB natively in your notebook! Check out [DuckDB's documentation](https://duckdb.org/docs/guides/python/jupyter) or Alex Monahan's great demo of this on [his blog](https://alex-monahan.github.io/2021/08/22/Python_and_SQL_Better_Together.html#an-example-workflow-with-duckdb).

## Configuration

You can configure DuckDB by passing `connect_args` to the `create_engine` function:

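```python
from sqlalchemy import create_engine

create_engine(
    'duckdb:///:memory:',
    connect_args={
        # passed through to the DuckDB connection
        'read_only': False,
        # DuckDB configuration options
        'config': {
            'memory_limit': '500mb'
        }
    }
)
```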

The supported configuration parameters are listed in the [DuckDB docs](https://duckdb.org/docs/sql/configuration).

## How to register a pandas DataFrame

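```python
import pandas as pd
from sqlalchemy import create_engine, text

conn = create_engine("duckdb:///:memory:").connect()
df = pd.DataFrame(...)

# with SQLAlchemy 1.3
conn.execute("register", ("dataframe_name", df))

# with SQLAlchemy 1.4+
conn.execute(text("register(:name, :df)"), {"name": "dataframe_name", "df": df})

# query the registered DataFrame by the name it was registered under
conn.execute(text("select * from dataframe_name"))
```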

## Things to keep in mind

DuckDB's SQL parser is based on the PostgreSQL parser, but not all PostgreSQL features are supported in DuckDB. Because the `duckdb_engine` dialect is derived from the `postgresql` dialect, SQLAlchemy may try to use PostgreSQL-only features. Below are some caveats to look out for.

### Auto-incrementing ID columns

When defining an `Integer` column as a primary key, SQLAlchemy uses the `SERIAL` datatype for PostgreSQL. DuckDB does not yet support this datatype because it's a non-standard PostgreSQL legacy type, so a workaround is to use the `sqlalchemy.Sequence()` object to auto-increment the key. For more information on sequences, see the [`SQLAlchemy Sequence` documentation](https://docs.sqlalchemy.org/en/14/core/defaults.html#associating-a-sequence-as-the-server-side-default).

The following example demonstrates how to create an auto-incrementing ID column for a simple table:

```python
>>> import sqlalchemy
>>> engine = sqlalchemy.create_engine('duckdb:////path/to/duck.db')
>>> # MetaData is left unbound; the engine is passed to create_all() below
>>> metadata = sqlalchemy.MetaData()
>>> user_id_seq = sqlalchemy.Sequence('user_id_seq')
>>> users_table = sqlalchemy.Table(
...     'users',
...     metadata,
...     sqlalchemy.Column(
...         'id',
...         sqlalchemy.Integer,
...         user_id_seq,
...         server_default=user_id_seq.next_value(),
...         primary_key=True,
...     ),
... )
>>> metadata.create_all(bind=engine)
```

### Pandas `read_sql()` chunksize

**NOTE**: this is no longer an issue in versions `>=0.5.0` of `duckdb`

The `pandas.read_sql()` method can read tables from `duckdb_engine` into DataFrames, but the `sqlalchemy.engine.result.ResultProxy` trips up when `fetchmany()` is called. Therefore, for now `chunksize=None` (default) is necessary when reading duckdb tables into DataFrames. For example:

```python
>>> import pandas as pd
>>> import sqlalchemy
>>> engine = sqlalchemy.create_engine('duckdb:////path/to/duck.db')
>>> df = pd.read_sql('users', engine)                # Works as expected
>>> df = pd.read_sql('users', engine, chunksize=25)  # Throws an exception on duckdb < 0.5.0
```

### Unsigned integer support

Unsigned integers are supported by DuckDB, and are available in [`duckdb_engine.datatypes`](duckdb_engine/datatypes.py).

## Alembic Integration

SQLAlchemy's companion library `alembic` can optionally be used to manage database migrations.

This support can be enabled by adding an Alembic implementation class for the `duckdb` dialect.

```python
from alembic.ddl.impl import DefaultImpl


class AlembicDuckDBImpl(DefaultImpl):
    """Alembic implementation for DuckDB."""

    __dialect__ = "duckdb"
```

After loading this class with your program, Alembic will no longer raise an error when generating or applying migrations.

## Preloading extensions (experimental)

> DuckDB 0.9.0+ includes builtin support for autoinstalling and autoloading of extensions, see [the extension documentation](http://duckdb.org/docs/archive/0.9.0/extensions/overview#autoloadable-extensions) for more information.

Until the DuckDB Python client allows you to natively preload extensions, I've added experimental support via a `connect_args` parameter:

```python
from sqlalchemy import create_engine

create_engine(
    'duckdb:///:memory:',
    connect_args={
        'preload_extensions': ['https'],
        'config': {
            's3_region': 'ap-southeast-1'
        }
    }
)
```

## Registering Filesystems

> DuckDB allows registering filesystems from [fsspec](https://filesystem-spec.readthedocs.io/), see [documentation](https://duckdb.org/docs/guides/python/filesystems.html) for more information.

Support is provided via the `register_filesystems` key of the `connect_args` parameter:

```python
from sqlalchemy import create_engine
from fsspec import filesystem

create_engine(
    'duckdb:///:memory:',
    connect_args={
        'register_filesystems': [filesystem('gcs')],
    }
)
```

## The name

Yes, I'm aware this package should be named `duckdb-driver` or something. I wasn't thinking when I named it, and it's too hard to change the name now.
