https://github.com/zachcp/bioduck
DuckDB for Biological Tables
- Host: GitHub
- URL: https://github.com/zachcp/bioduck
- Owner: zachcp
- Created: 2025-03-15T21:01:56.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-03-15T21:07:42.000Z (3 months ago)
- Last Synced: 2025-04-01T18:08:54.378Z (2 months ago)
- Language: Python
- Size: 14.6 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# BioDuck
A CLI tool for running SQL files against DuckDB and exploring data interactively.
## Installation
```bash
pip install .
```
This will install the BioDuck CLI tool along with all included SQL files and resources.
## Usage
### Create a SQL File
```bash
bioduck create my_query
```
This creates a file `./sql/my_query.sql` that you can edit with your SQL query.
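As a rough illustration of what a command like `create` could do, here is a minimal sketch using only the Python standard library. It is not BioDuck's actual implementation: the function name `create_sql_file`, the placeholder comment, and the refusal to overwrite an existing file are all assumptions made for the example.

```python
from pathlib import Path


def create_sql_file(name: str, sql_dir: str = "./sql") -> Path:
    """Create <sql_dir>/<name>.sql with a placeholder comment (hypothetical helper)."""
    directory = Path(sql_dir)
    directory.mkdir(parents=True, exist_ok=True)  # create ./sql if it is missing
    path = directory / f"{name}.sql"
    if path.exists():
        # Assumption: never overwrite a query the user has already edited.
        raise FileExistsError(f"{path} already exists")
    path.write_text(f"-- {name}: write your SQL query here\n")
    return path
```

Calling `create_sql_file("my_query")` would produce `./sql/my_query.sql`, matching the path described above.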
### Run a SQL File
```bash
bioduck run ./sql/my_query.sql
```
You can specify a database file:
```bash
bioduck run ./sql/my_query.sql --db my_database.duckdb
```
Save results to a CSV file:
```bash
bioduck run ./sql/my_query.sql --output results.csv
```
### Launch the DuckDB UI
Launch the DuckDB UI after initializing the database from your SQL files:
```bash
bioduck ui
```
By default, it will run all SQL files in the `./sql` directory in alphabetical order, then launch the DuckDB UI.
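The alphabetical ordering can be pictured with the short sketch below. It is an assumption about how such ordering might be done, not BioDuck's code; `ordered_sql_files` is a hypothetical name, and whether BioDuck compares names case-sensitively is not stated in this README.

```python
from pathlib import Path


def ordered_sql_files(sql_dir: str = "./sql") -> list[Path]:
    """Return the .sql files in sql_dir sorted alphabetically by file name."""
    # Plain string comparison: "10_views.sql" sorts before "2_load.sql".
    return sorted(Path(sql_dir).glob("*.sql"), key=lambda p: p.name)
```

Because the comparison is on strings, zero-padded prefixes such as `01_schema.sql` and `02_load.sql` are a simple way to make the execution order explicit.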
You can specify a different SQL directory:
```bash
bioduck ui --sql-dir ./my_queries
```
And a specific database file:
```bash
bioduck ui --db my_database.duckdb
```
## Project Structure
- Place your SQL files in the `./sql` directory
- SQL files are run in alphabetical order when using the `ui` command
- You can use the `create` command to create new SQL files
## NCBI Database
BioDuck includes built-in support for creating and managing NCBI biological data databases.
### Creating an NCBI Database
```bash
bioduck ncbi
```
This will:
1. Create a database at `~/.bioduck/ncbi.db` (if it doesn't already exist)
2. Initialize it with NCBI database schema and loaders for:
   - Taxonomy data
   - GenBank assembly data
   - RefSeq assembly data
The command will automatically:
1. Download required files from NCBI FTP servers with progress indicators
2. Extract necessary data from compressed archives
3. Process and load the data into a DuckDB database
4. Store the downloaded files in `~/.bioduck/data` by default so that later runs can reuse them
### Accessing an Existing NCBI Database
If the database already exists, the command will simply report its location:
```bash
bioduck ncbi
# Database already exists at /home/user/.bioduck/ncbi.db
```
### Launch UI for an NCBI Database
To open the DuckDB UI with your NCBI database:
```bash
bioduck ncbi --launch-ui
# or shorter form:
bioduck ncbi -u
```
### Recreate Database
To force recreation of the database even if it exists:
```bash
bioduck ncbi --force
```
### Specify Custom Locations
```bash
# Custom database location
bioduck ncbi --db-path /path/to/custom/ncbi.db

# Custom data directory for downloaded files
bioduck ncbi --data-dir /path/to/store/downloaded/data
```
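How the default locations, the custom paths, and `--force` interact can be summarized in a small sketch. This is an illustration under stated assumptions rather than BioDuck's implementation: `plan_ncbi_build` and its return shape are hypothetical, and only the default paths and the "build if missing or forced" behavior come from the text above.

```python
from pathlib import Path
from typing import Optional

# Defaults taken from the README text above.
DEFAULT_DB = Path.home() / ".bioduck" / "ncbi.db"
DEFAULT_DATA_DIR = Path.home() / ".bioduck" / "data"


def plan_ncbi_build(db_path: Optional[str] = None,
                    data_dir: Optional[str] = None,
                    force: bool = False) -> dict:
    """Resolve locations and decide whether the database must be (re)built."""
    db = Path(db_path).expanduser() if db_path else DEFAULT_DB
    data = Path(data_dir).expanduser() if data_dir else DEFAULT_DATA_DIR
    # Build when the database file is missing, or when --force is given.
    build = force or not db.exists()
    return {"db": db, "data_dir": data, "build": build}
```

With an existing database and no `--force`, `build` is `False`, which corresponds to the command simply reporting the database location.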