https://github.com/intake/intake-duckdb
Intake plugin for DuckDB
catalogs datasets duckdb intake sql
- Host: GitHub
- URL: https://github.com/intake/intake-duckdb
- Owner: intake
- License: bsd-2-clause
- Created: 2023-02-27T22:51:28.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-05-01T21:07:03.000Z (over 2 years ago)
- Last Synced: 2025-08-19T15:15:54.507Z (about 2 months ago)
- Topics: catalogs, datasets, duckdb, intake, sql
- Language: Python
- Homepage: https://intake-duckdb.readthedocs.io/en/latest/
- Size: 63.5 KB
- Stars: 1
- Watchers: 2
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Intake-DuckDB

DuckDB Plugin for Intake

## Installation
From PyPI
```shell
pip install intake-duckdb
```

Or from conda-forge
```shell
conda install -c conda-forge intake-duckdb
```
## Usage

Load an entire table into a dataframe
```python
import intake

# Open a table in a DuckDB database file and load it into a dataframe
source = intake.open_duckdb("path/to/dbfile", "tablename")
df = source.read()
```
A custom SQL query in [valid DuckDB query syntax](https://duckdb.org/docs/sql/query_syntax/select) can be passed in place of the table name
```python
source = intake.open_duckdb("path/to/dbfile", "SELECT col1, col2 FROM tablename")
df = source.read()
```

You can also iterate over table chunks
```python
source_chunked = intake.open_duckdb("path/to/dbfile", "tablename", chunks=10)
source_chunked.discover()
for chunk in source_chunked.read_chunked():
    # do something
    ...
```

DuckDB catalog: create an Intake catalog from a DuckDB backend
```python
cat = intake.open_duckdb_cat("path/to/dbfile")

# list the sources in 'cat'
list(cat)

df = cat["tablename"].read()
df_chunks = [chunk for chunk in cat["tablename"](chunks=10).read_chunked()]
```

Run DuckDB queries on other Intake sources (that produce pandas DataFrames) within the same catalog
```yaml
# cat.yaml
sources:
  csv_source:
    args:
      urlpath: https://data.csv
    description: Remote CSV source
    driver: csv

  duck_source:
    args:
      targets:
        - csv_source
      sql_expr: SELECT col FROM csv_source LIMIT 10
    description: Source referencing other sources in catalog
    driver: duckdb_transform
```
```python
cat = intake.open_catalog("cat.yaml")
duck_source = cat.duck_source.read()
```