{"id":30523837,"url":"https://github.com/intake/intake-duckdb","last_synced_at":"2025-08-26T20:51:29.379Z","repository":{"id":142559294,"uuid":"607404983","full_name":"intake/intake-duckdb","owner":"intake","description":"Intake plugin for DuckDB","archived":false,"fork":false,"pushed_at":"2023-05-01T21:07:03.000Z","size":65,"stargazers_count":1,"open_issues_count":0,"forks_count":3,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-08-19T15:15:54.507Z","etag":null,"topics":["catalogs","datasets","duckdb","intake","sql"],"latest_commit_sha":null,"homepage":"https://intake-duckdb.readthedocs.io/en/latest/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/intake.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-02-27T22:51:28.000Z","updated_at":"2025-06-22T03:13:20.000Z","dependencies_parsed_at":"2023-05-02T15:00:58.565Z","dependency_job_id":null,"html_url":"https://github.com/intake/intake-duckdb","commit_stats":{"total_commits":32,"total_committers":1,"mean_commits":32.0,"dds":0.0,"last_synced_commit":"af8c2997c5945a77db440bf2a0eb644773c64ac0"},"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/intake/intake-duckdb","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/intake%2Fintake-duckdb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/intake%2Fintake-duckdb/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/intake%2Fintake-duckdb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/intake%2Fintake-du
ckdb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/intake","download_url":"https://codeload.github.com/intake/intake-duckdb/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/intake%2Fintake-duckdb/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":272254471,"owners_count":24901048,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-26T02:00:07.904Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["catalogs","datasets","duckdb","intake","sql"],"created_at":"2025-08-26T20:51:23.855Z","updated_at":"2025-08-26T20:51:29.358Z","avatar_url":"https://github.com/intake.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Intake-DuckDB\n\n[![Build Status](https://github.com/intake/intake-duckdb//actions/workflows/main.yaml/badge.svg)](https://github.com/intake/intake-duckdb/actions)\n[![Documentation Status](https://readthedocs.org/projects/intake-duckdb/badge/?version=latest)](http://intake-duckdb.readthedocs.io/en/latest/?badge=latest)\n\nDuckDB Plugin for Intake\n\n## Installation\n\nFrom PyPI\n```shell\npip install intake-duckdb\n```\n\nOr conda-forge\n```shell\nconda install -c conda-forge intake-duckdb\n```\n## Usage\n\nLoad an entire table into a dataframe\n```python\nsource = 
intake.open_duckdb(\"path/to/dbfile\", \"tablename\")\ndf = source.read()\n```\n\nOr run a custom SQL query in [valid DuckDB query syntax](https://duckdb.org/docs/sql/query_syntax/select)\n```python\nsource = intake.open_duckdb(\"path/to/dbfile\", \"SELECT col1, col2 FROM tablename\")\ndf = source.read()\n```\n\nYou can also iterate over table chunks\n```python\nsource_chunked = intake.open_duckdb(\"path/to/dbfile\", \"tablename\", chunks=10)\nsource_chunked.discover()\nfor chunk in source_chunked.read_chunked():\n    # do something\n    ...\n```\n\nDuckDB catalog: create an Intake catalog from a DuckDB backend\n```python\ncat = intake.open_duckdb_cat(\"path/to/dbfile\")\n\n# list the sources in 'cat'\nlist(cat)\n\ndf = cat[\"tablename\"].read()\ndf_chunks = [chunk for chunk in cat[\"tablename\"](chunks=10).read_chunked()]\n```\n\nRun DuckDB queries on other Intake sources (that produce pandas DataFrames) within the same catalog\n```yaml\n# cat.yaml\nsources:\n  csv_source:\n    args:\n      urlpath: https://data.csv\n    description: Remote CSV source\n    driver: csv\n\n  duck_source:\n    args:\n      targets:\n        - csv_source\n      sql_expr: SELECT col FROM csv_source LIMIT 10\n    description: Source referencing other sources in catalog\n    driver: duckdb_transform\n```\n\n```python\ncat = intake.open_catalog(\"cat.yaml\")\nduck_source = cat.duck_source.read()\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fintake%2Fintake-duckdb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fintake%2Fintake-duckdb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fintake%2Fintake-duckdb/lists"}