https://github.com/zachcp/bioduck

DuckDB for Biological Tables

Host: GitHub
URL: https://github.com/zachcp/bioduck
Owner: zachcp
Created: 2025-03-15T21:01:56.000Z (3 months ago)
Default Branch: main
Last Pushed: 2025-03-15T21:07:42.000Z (3 months ago)
Last Synced: 2025-04-01T18:08:54.378Z (2 months ago)
Language: Python
Size: 14.6 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# BioDuck

A CLI tool for running SQL files against DuckDB and exploring data interactively.

## Installation

```bash
pip install .
```

This will install the BioDuck CLI tool along with all included SQL files and resources.

## Usage

### Create a SQL File

```bash
bioduck create my_query
```

This creates a file `./sql/my_query.sql` that you can edit with your SQL query.

### Run a SQL File

```bash
bioduck run ./sql/my_query.sql
```

You can specify a database file:

```bash
bioduck run ./sql/my_query.sql --db my_database.duckdb
```

Save results to a CSV file:

```bash
bioduck run ./sql/my_query.sql --output results.csv
```
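
As a rough illustration of what running a SQL file and saving its results involves, the sketch below reads a query from a file, executes it on a database connection, and optionally writes the rows to CSV. It is not BioDuck's actual implementation: `run_sql_file` and its parameters are hypothetical names, and `conn` is assumed to be any connection whose `execute()` returns a DB-API style cursor (a connection from `duckdb.connect("my_database.duckdb")` should behave this way).

```python
import csv
from pathlib import Path


def run_sql_file(conn, sql_path, output=None):
    """Execute the SQL in sql_path on conn; optionally write the rows to CSV.

    Hypothetical helper for illustration only, not part of BioDuck.
    """
    query = Path(sql_path).read_text(encoding="utf-8")
    cursor = conn.execute(query)

    # No result set (description is None) means there is nothing to write.
    if cursor.description is None:
        return []

    columns = [col[0] for col in cursor.description]
    rows = cursor.fetchall()

    if output is not None:
        # newline="" lets the csv module control line endings itself.
        with open(output, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(columns)   # header row
            writer.writerows(rows)     # one line per result row
    return rows
```

The header row comes from the cursor's column names, so aliasing columns in the query (`SELECT x AS sample_id`) controls the CSV header.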

### Launch the DuckDB UI

Run the DuckDB UI after initializing with SQL files:

```bash
bioduck ui
```

By default, it will run all SQL files in the `./sql` directory in alphabetical order, then launch the DuckDB UI.

You can specify a different SQL directory:

```bash
bioduck ui --sql-dir ./my_queries
```

You can also specify a database file:

```bash
bioduck ui --db my_database.duckdb
```
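
The initialization step described above, running every SQL file in a directory in alphabetical order, can be sketched as follows. This is an assumption about the behavior rather than BioDuck's code: `sql_files_in_order` and `run_sql_dir` are hypothetical names, only `*.sql` files are considered, and Python's default string ordering (case-sensitive) stands in for whatever ordering the tool actually uses.

```python
from pathlib import Path


def sql_files_in_order(sql_dir="./sql"):
    """Return the *.sql files in sql_dir, sorted alphabetically by file name."""
    return sorted(Path(sql_dir).glob("*.sql"), key=lambda p: p.name)


def run_sql_dir(conn, sql_dir="./sql"):
    """Execute each SQL file in order on conn; return the names that were run.

    Hypothetical helper: conn is any object with an execute(sql) method,
    such as a connection opened with duckdb.connect().
    """
    executed = []
    for path in sql_files_in_order(sql_dir):
        conn.execute(path.read_text(encoding="utf-8"))
        executed.append(path.name)
    return executed
```

Because the order is purely alphabetical, numeric prefixes such as `01_schema.sql` and `02_load.sql` are a simple way to make sure tables are created before the files that query them.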

## Project Structure

- Place your SQL files in the `./sql` directory
- SQL files are run in alphabetical order when using the `ui` command
- You can use the `create` command to create new SQL files

## NCBI Database

BioDuck includes built-in support for creating and managing NCBI biological data databases.

### Creating an NCBI Database

```bash
bioduck ncbi
```

This will:
1. Create a database at `~/.bioduck/ncbi.db` (if it doesn't already exist)
2. Initialize it with the NCBI database schema and loaders for:
   - Taxonomy data
   - GenBank assembly data
   - RefSeq assembly data

The command will automatically:
1. Download the required files from NCBI FTP servers, showing progress indicators
2. Extract the necessary data from compressed archives
3. Process and load the data into a DuckDB database

Downloaded files are stored in `~/.bioduck/data` by default so they can be reused.
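
The reuse of downloaded files mentioned above can be pictured with a small sketch: download a file into the data directory only when it is not already there. The function name `fetch_if_missing`, its parameters, and the use of `urllib.request.urlretrieve` are assumptions for illustration. BioDuck's real downloader, its progress reporting, and the exact NCBI URLs are not shown in this README.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Default location taken from the README; the constant name is hypothetical.
DEFAULT_DATA_DIR = Path.home() / ".bioduck" / "data"


def fetch_if_missing(url, data_dir=DEFAULT_DATA_DIR, download=urlretrieve):
    """Download url into data_dir unless a file with that name already exists.

    download is any callable taking (url, destination); it defaults to
    urllib.request.urlretrieve and can be swapped out for testing.
    """
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)

    # Use the last path component of the URL as the local file name.
    target = data_dir / url.rsplit("/", 1)[-1]
    if not target.exists():
        download(url, target)
    return target
```

A second call with the same URL returns the cached path without touching the network, which is the behavior the default data directory is meant to provide.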

### Accessing an Existing NCBI Database

If the database already exists, the command will simply report its location:

```bash
bioduck ncbi
# Database already exists at /home/user/.bioduck/ncbi.db
```

### Launch UI for an NCBI Database

To open the DuckDB UI with your NCBI database:

```bash
bioduck ncbi --launch-ui
# or shorter form:
bioduck ncbi -u
```

### Recreate Database

To force recreation of the database even if it exists:

```bash
bioduck ncbi --force
```
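
Taken together, the existence check and the `--force` flag amount to a three-way decision. The sketch below spells it out; `plan_ncbi_action` and the returned labels are hypothetical and only mirror the behavior described in this README, not the tool's internals.

```python
from pathlib import Path

# Default location taken from the README; the constant name is hypothetical.
DEFAULT_DB_PATH = Path.home() / ".bioduck" / "ncbi.db"


def plan_ncbi_action(db_path=DEFAULT_DB_PATH, force=False):
    """Return what `bioduck ncbi` is described as doing for a given state.

    "create"   -> no database yet, build it
    "report"   -> database exists, just print its location
    "recreate" -> database exists but force was requested
    """
    exists = Path(db_path).expanduser().exists()
    if not exists:
        return "create"
    return "recreate" if force else "report"
```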

### Specify Custom Locations

```bash
# Custom database location
bioduck ncbi --db-path /path/to/custom/ncbi.db

# Custom data directory for downloaded files
bioduck ncbi --data-dir /path/to/store/downloaded/data
```
