https://github.com/intake/intake-duckdb

Intake plugin for DuckDB

catalogs datasets duckdb intake sql


Host: GitHub
URL: https://github.com/intake/intake-duckdb
Owner: intake
License: bsd-2-clause
Created: 2023-02-27T22:51:28.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2023-05-01T21:07:03.000Z (over 2 years ago)
Last Synced: 2025-08-19T15:15:54.507Z (about 2 months ago)
Topics: catalogs, datasets, duckdb, intake, sql
Language: Python
Homepage: https://intake-duckdb.readthedocs.io/en/latest/
Size: 63.5 KB
Stars: 1
Watchers: 2
Forks: 3
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# Intake-DuckDB

[![Build Status](https://github.com/intake/intake-duckdb/actions/workflows/main.yaml/badge.svg)](https://github.com/intake/intake-duckdb/actions)

[![Documentation Status](https://readthedocs.org/projects/intake-duckdb/badge/?version=latest)](http://intake-duckdb.readthedocs.io/en/latest/?badge=latest)

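A DuckDB plugin for Intake: read DuckDB tables or SQL query results into DataFrames, and expose a DuckDB database as an Intake catalog.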

## Installation

From PyPI:

```shell
pip install intake-duckdb
```

Or from conda-forge:

```shell
conda install -c conda-forge intake-duckdb
```

## Usage

Load an entire table into a pandas DataFrame:

```python
import intake

source = intake.open_duckdb("path/to/dbfile", "tablename")
df = source.read()
```

Or load the result of a custom SQL query written in [valid DuckDB query syntax](https://duckdb.org/docs/sql/query_syntax/select):

```python
source = intake.open_duckdb("path/to/dbfile", "SELECT col1, col2 FROM tablename")
df = source.read()
```

You can also iterate over a table in chunks:

```python
source_chunked = intake.open_duckdb("path/to/dbfile", "tablename", chunks=10)
source_chunked.discover()

for chunk in source_chunked.read_chunked():
    # do something with each chunk
    ...
```

Create an Intake catalog from a DuckDB database file:

```python
cat = intake.open_duckdb_cat("path/to/dbfile")

# list the sources in 'cat'
list(cat)

df = cat["tablename"].read()
df_chunks = [chunk for chunk in cat["tablename"](chunks=10).read_chunked()]
```

Run DuckDB queries against other Intake sources in the same catalog, provided those sources produce pandas DataFrames:

```yaml
# cat.yaml
sources:
  csv_source:
    args:
      urlpath: https://data.csv
    description: Remote CSV source
    driver: csv
  duck_source:
    args:
      targets:
        - csv_source
      sql_expr: SELECT col FROM csv_source LIMIT 10
    description: Source referencing other sources in catalog
    driver: duckdb_transform
```

```python
import intake

cat = intake.open_catalog("cat.yaml")
df = cat.duck_source.read()
```
