{"id":20083517,"url":"https://github.com/mause/duckdb_engine","last_synced_at":"2025-05-14T10:12:04.096Z","repository":{"id":36956393,"uuid":"299220196","full_name":"Mause/duckdb_engine","owner":"Mause","description":"SQLAlchemy driver for DuckDB","archived":false,"fork":false,"pushed_at":"2025-04-03T17:18:51.000Z","size":2450,"stargazers_count":407,"open_issues_count":46,"forks_count":49,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-04-04T14:25:00.368Z","etag":null,"topics":["duckdb","duckdb-engine","python","sql","sqlalchemy"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Mause.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-09-28T07:09:14.000Z","updated_at":"2025-04-03T10:13:57.000Z","dependencies_parsed_at":"2023-11-24T12:30:26.833Z","dependency_job_id":"ece81930-ad6a-48e7-919f-6f17469515c6","html_url":"https://github.com/Mause/duckdb_engine","commit_stats":{"total_commits":1231,"total_committers":28,"mean_commits":"43.964285714285715","dds":"0.49553208773354995","last_synced_commit":"5bf8a59649895b10f8bc8bef886a5534b10e2bd3"},"previous_names":[],"tags_count":73,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mause%2Fduckdb_engine","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mause%2Fduckdb_engine/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mause%2Fduckdb_engine/releases","manif
ests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mause%2Fduckdb_engine/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Mause","download_url":"https://codeload.github.com/Mause/duckdb_engine/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248469136,"owners_count":21108963,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["duckdb","duckdb-engine","python","sql","sqlalchemy"],"created_at":"2024-11-13T15:47:30.827Z","updated_at":"2025-04-11T19:45:47.023Z","avatar_url":"https://github.com/Mause.png","language":"Python","readme":"# duckdb_engine\n\n[![Supported Python Versions](https://img.shields.io/pypi/pyversions/duckdb-engine)](https://pypi.org/project/duckdb-engine/) [![PyPI version](https://badge.fury.io/py/duckdb-engine.svg)](https://badge.fury.io/py/duckdb-engine) [![PyPI Downloads](https://img.shields.io/pypi/dm/duckdb-engine.svg)](https://pypi.org/project/duckdb-engine/) [![codecov](https://codecov.io/gh/Mause/duckdb_engine/graph/badge.svg)](https://codecov.io/gh/Mause/duckdb_engine)\n\nBasic SQLAlchemy driver for [DuckDB](https://duckdb.org/)\n\n\u003c!--ts--\u003e\n- [duckdb\\_engine](#duckdb_engine)\n  - [Installation](#installation)\n  - [Usage](#usage)\n  - [Usage in IPython/Jupyter](#usage-in-ipythonjupyter)\n  - [Configuration](#configuration)\n  - [How to register a pandas DataFrame](#how-to-register-a-pandas-dataframe)\n  - [Things to keep in mind](#things-to-keep-in-mind)\n    - [Auto-incrementing ID columns](#auto-incrementing-id-columns)\n    - 
[Pandas `read_sql()` chunksize](#pandas-read_sql-chunksize)\n    - [Unsigned integer support](#unsigned-integer-support)\n  - [Alembic Integration](#alembic-integration)\n  - [Preloading extensions (experimental)](#preloading-extensions-experimental)\n  - [Registering Filesystems](#registering-filesystems)\n  - [The name](#the-name)\n\n\u003c!-- Created by https://github.com/ekalinin/github-markdown-toc --\u003e\n\u003c!-- Added by: me, at: Wed 20 Sep 2023 12:44:27 AWST --\u003e\n\n\u003c!--te--\u003e\n\n## Installation\n```sh\n$ pip install duckdb-engine\n```\n\nDuckDB Engine also has a conda feedstock available; instructions for using it are in its [repository](https://github.com/conda-forge/duckdb-engine-feedstock).\n\n## Usage\n\nOnce you've installed this package, you should be able to use it right away, as SQLAlchemy does a Python path search to find the dialect.\n\n```python\nfrom sqlalchemy import Column, Integer, Sequence, String, create_engine\nfrom sqlalchemy.ext.declarative import declarative_base\nfrom sqlalchemy.orm.session import Session\n\nBase = declarative_base()\n\n\nclass FakeModel(Base):  # type: ignore\n    __tablename__ = \"fake\"\n\n    id = Column(Integer, Sequence(\"fakemodel_id_sequence\"), primary_key=True)\n    name = Column(String)\n\n\neng = create_engine(\"duckdb:///:memory:\")\nBase.metadata.create_all(eng)\nsession = Session(bind=eng)\n\nsession.add(FakeModel(name=\"Frank\"))\nsession.commit()\n\nfrank = session.query(FakeModel).one()\n\nassert frank.name == \"Frank\"\n```\n\n## Usage in IPython/Jupyter\n\nWith IPython-SQL and DuckDB-Engine you can query DuckDB natively in your notebook! 
Check out [DuckDB's documentation](https://duckdb.org/docs/guides/python/jupyter) or\nAlex Monahan's great demo of this on [his blog](https://alex-monahan.github.io/2021/08/22/Python_and_SQL_Better_Together.html#an-example-workflow-with-duckdb).\n\n## Configuration\n\nYou can configure DuckDB by passing `connect_args` to the `create_engine` function:\n```python\ncreate_engine(\n    'duckdb:///:memory:',\n    connect_args={\n        'read_only': False,\n        'config': {\n            'memory_limit': '500mb'\n        }\n    }\n)\n```\n\nThe supported configuration parameters are listed in the [DuckDB docs](https://duckdb.org/docs/sql/configuration).\n\n## How to register a pandas DataFrame\n\n```python\nimport pandas as pd\nfrom sqlalchemy import create_engine, text\n\nconn = create_engine(\"duckdb:///:memory:\").connect()\ndf = pd.DataFrame(...)\n\n# with SQLAlchemy 1.3\nconn.execute(\"register\", (\"dataframe_name\", df))\n\n# with SQLAlchemy 1.4+\nconn.execute(text(\"register(:name, :df)\"), {\"name\": \"dataframe_name\", \"df\": df})\n\n# the registered DataFrame can now be queried like a table\nconn.execute(text(\"select * from dataframe_name\"))\n```\n\n## Things to keep in mind\nDuckDB's SQL parser is based on the PostgreSQL parser, but not all PostgreSQL features are supported in DuckDB. Because the `duckdb_engine` dialect is derived from the `postgresql` dialect, `SQLAlchemy` may try to use PostgreSQL-only features. Below are some caveats to look out for.\n\n### Auto-incrementing ID columns\nWhen defining an Integer column as a primary key, `SQLAlchemy` uses the `SERIAL` datatype for PostgreSQL. DuckDB does not yet support this datatype because it's a non-standard PostgreSQL legacy type, so a workaround is to use the `sqlalchemy.Sequence()` object to auto-increment the key. 
For more information on sequences, see the [SQLAlchemy `Sequence` documentation](https://docs.sqlalchemy.org/en/14/core/defaults.html#associating-a-sequence-as-the-server-side-default).\n\nThe following example demonstrates how to create an auto-incrementing ID column for a simple table:\n\n```python\n\u003e\u003e\u003e import sqlalchemy\n\u003e\u003e\u003e engine = sqlalchemy.create_engine('duckdb:////path/to/duck.db')\n\u003e\u003e\u003e metadata = sqlalchemy.MetaData()\n\u003e\u003e\u003e user_id_seq = sqlalchemy.Sequence('user_id_seq')\n\u003e\u003e\u003e users_table = sqlalchemy.Table(\n...     'users',\n...     metadata,\n...     sqlalchemy.Column(\n...         'id',\n...         sqlalchemy.Integer,\n...         user_id_seq,\n...         server_default=user_id_seq.next_value(),\n...         primary_key=True,\n...     ),\n... )\n\u003e\u003e\u003e metadata.create_all(bind=engine)\n```\n\n### Pandas `read_sql()` chunksize\n\n**NOTE**: this is no longer an issue in versions `\u003e=0.5.0` of `duckdb`\n\nThe `pandas.read_sql()` method can read tables from `duckdb_engine` into DataFrames, but with older `duckdb` versions the `sqlalchemy.engine.result.ResultProxy` fails when `fetchmany()` is called. On those versions, `chunksize=None` (the default) is necessary when reading DuckDB tables into DataFrames. 
For example:\n\n```python\n\u003e\u003e\u003e import pandas as pd\n\u003e\u003e\u003e import sqlalchemy\n\u003e\u003e\u003e engine = sqlalchemy.create_engine('duckdb:////path/to/duck.db')\n\u003e\u003e\u003e df = pd.read_sql('users', engine)                ### Works as expected\n\u003e\u003e\u003e df = pd.read_sql('users', engine, chunksize=25)  ### Throws an exception\n```\n\n### Unsigned integer support\n\nUnsigned integers are supported by DuckDB, and the corresponding types are available in [`duckdb_engine.datatypes`](duckdb_engine/datatypes.py).\n\n## Alembic Integration\n\nSQLAlchemy's companion library `alembic` can optionally be used to manage database migrations.\n\nThis support can be enabled by adding an Alembic implementation class for the `duckdb` dialect.\n\n```python\nfrom alembic.ddl.impl import DefaultImpl\n\nclass AlembicDuckDBImpl(DefaultImpl):\n    \"\"\"Alembic implementation for DuckDB.\"\"\"\n\n    __dialect__ = \"duckdb\"\n```\n\nOnce this class has been loaded by your program, Alembic will no longer raise an error when generating or applying migrations.\n\n## Preloading extensions (experimental)\n\n\u003e DuckDB 0.9.0+ includes built-in support for autoinstalling and autoloading extensions; see [the extension documentation](http://duckdb.org/docs/archive/0.9.0/extensions/overview#autoloadable-extensions) for more information.\n\nUntil the DuckDB Python client allows you to natively preload extensions, I've added experimental support via a `connect_args` parameter:\n\n```python\nfrom sqlalchemy import create_engine\n\ncreate_engine(\n    'duckdb:///:memory:',\n    connect_args={\n        'preload_extensions': ['https'],\n        'config': {\n            's3_region': 'ap-southeast-1'\n        }\n    }\n)\n```\n\n## Registering Filesystems\n\n\u003e DuckDB allows registering filesystems from [fsspec](https://filesystem-spec.readthedocs.io/); see the [documentation](https://duckdb.org/docs/guides/python/filesystems.html) for more information.\n\nSupport is provided via the 
`connect_args` parameter\n\n```python\nfrom sqlalchemy import create_engine\nfrom fsspec import filesystem\n\ncreate_engine(\n    'duckdb:///:memory:',\n    connect_args={\n        'register_filesystems': [filesystem('gcs')],\n    }\n)\n```\n\n## The name\n\nYes, I'm aware this package should be named `duckdb-driver` or something, I wasn't thinking when I named it and it's too hard to change the name now\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmause%2Fduckdb_engine","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmause%2Fduckdb_engine","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmause%2Fduckdb_engine/lists"}