https://github.com/zachcp/bioduck
DuckDB for Biological Tables
- Host: GitHub
- URL: https://github.com/zachcp/bioduck
- Owner: zachcp
- Created: 2025-03-15T21:01:56.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-15T21:07:42.000Z (about 1 year ago)
- Last Synced: 2025-04-01T18:08:54.378Z (about 1 year ago)
- Language: Python
- Size: 14.6 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# BioDuck
A CLI tool for running SQL files against DuckDB and exploring data interactively.
## Installation
```bash
pip install .
```
Run this from the root of a local clone of the repository. It installs the BioDuck CLI tool along with all included SQL files and resources.
## Usage
### Create a SQL File
```bash
bioduck create my_query
```
This creates a file `./sql/my_query.sql` that you can edit with your SQL query.
### Run a SQL File
```bash
bioduck run ./sql/my_query.sql
```
You can specify a database file:
```bash
bioduck run ./sql/my_query.sql --db my_database.duckdb
```
Save results to a CSV file:
```bash
bioduck run ./sql/my_query.sql --output results.csv
```
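For readers who want the same result from Python, here is a minimal sketch of writing a query result to CSV. It is not BioDuck's implementation: `write_result_csv` and `run_query_to_csv` are hypothetical helper names, and the code only assumes a DB-API style connection whose `execute()` result exposes `description` and `fetchall()`, as DuckDB's Python connection does.

```python
import csv


def write_result_csv(cursor, output_path):
    """Write the current result set of a DB-API style cursor to a CSV file."""
    # cursor.description holds one tuple per column; the name is the first item.
    header = [column[0] for column in cursor.description]
    with open(output_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(cursor.fetchall())


def run_query_to_csv(connection, sql, output_path):
    """Run one SQL statement on an open connection and save the rows as CSV.

    Hypothetical helper: `connection` is assumed to be something like the
    object returned by `duckdb.connect("my_database.duckdb")`.
    """
    cursor = connection.execute(sql)
    write_result_csv(cursor, output_path)
    return output_path
```

With the `duckdb` package installed, `run_query_to_csv(duckdb.connect("my_database.duckdb"), sql_text, "results.csv")` would be the rough equivalent of the command above.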
### Launch the DuckDB UI
Initialize the database from your SQL files, then launch the DuckDB UI:
```bash
bioduck ui
```
By default, it will run all SQL files in the `./sql` directory in alphabetical order, then launch the DuckDB UI.
You can specify a different SQL directory:
```bash
bioduck ui --sql-dir ./my_queries
```
You can also specify a database file:
```bash
bioduck ui --db my_database.duckdb
```
## Project Structure
- Place your SQL files in the `./sql` directory
- SQL files are run in alphabetical order when using the `ui` command
- You can use the `create` command to create new SQL files
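The alphabetical ordering described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the tool's actual code: `sorted_sql_files` and `run_sql_dir` are hypothetical names, and the sort is assumed to be a plain string sort on file names.

```python
from pathlib import Path


def sorted_sql_files(sql_dir="./sql"):
    """Return the .sql files directly inside sql_dir, sorted by file name."""
    return sorted(Path(sql_dir).glob("*.sql"), key=lambda path: path.name)


def run_sql_dir(connection, sql_dir="./sql"):
    """Execute each SQL file in name order on an open connection.

    `connection` is anything with an `execute(sql)` method, for example the
    object returned by `duckdb.connect("my_database.duckdb")`.
    """
    executed = []
    for path in sorted_sql_files(sql_dir):
        connection.execute(path.read_text(encoding="utf-8"))
        executed.append(path.name)
    return executed
```

Under a plain string sort, `10_load.sql` comes before `2_schema.sql`, so zero-padded prefixes such as `01_schema.sql` and `02_load.sql` keep the execution order predictable.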
## NCBI Database
BioDuck includes built-in support for creating and managing NCBI biological data databases.
### Creating an NCBI Database
```bash
bioduck ncbi
```
This will:
1. Create a database at `~/.bioduck/ncbi.db` (if it doesn't already exist)
2. Initialize it with the NCBI database schema and loaders for:
   - Taxonomy data
   - GenBank assembly data
   - RefSeq assembly data
The command will automatically:
1. Download required files from NCBI FTP servers with progress indicators
2. Extract necessary data from compressed archives
3. Process and load the data into a DuckDB database
4. Store the downloaded files in `~/.bioduck/data` by default so they can be reused
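The download, extract, and reuse steps in this list could look roughly like the following. This is a hedged sketch that uses only the standard library. The function names are hypothetical, and `TAXDUMP_URL` is an assumption: it points at NCBI's public taxonomy dump, but the files BioDuck actually fetches may differ. Progress indicators and the DuckDB loading step are left out.

```python
import tarfile
import urllib.request
from pathlib import Path

# Assumption: NCBI's public taxonomy dump; BioDuck may download other files.
TAXDUMP_URL = "https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz"


def fetch_cached(url, data_dir="~/.bioduck/data"):
    """Download url into data_dir unless a file with that name already exists."""
    data_dir = Path(data_dir).expanduser()
    data_dir.mkdir(parents=True, exist_ok=True)
    target = data_dir / url.rsplit("/", 1)[-1]
    if not target.exists():
        # Download to a temporary name first so an interrupted transfer
        # is never mistaken for a complete cached file.
        partial = target.with_name(target.name + ".part")
        urllib.request.urlretrieve(url, partial)
        partial.replace(target)
    return target


def extract_members(archive, names, dest_dir):
    """Extract only the named regular files from a .tar.gz archive."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with tarfile.open(archive, "r:gz") as tar:
        for name in names:
            source = tar.extractfile(tar.getmember(name))
            out_path = dest_dir / Path(name).name
            out_path.write_bytes(source.read())
            extracted.append(out_path)
    return extracted
```

A second call to `fetch_cached` with the same URL returns the existing file without touching the network, which is the reuse behavior described in the last step.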
### Accessing an Existing NCBI Database
If the database already exists, the command will simply report its location:
```bash
bioduck ncbi
# Database already exists at /home/user/.bioduck/ncbi.db
```
### Launch UI for an NCBI Database
To open the DuckDB UI with your NCBI database:
```bash
bioduck ncbi --launch-ui
# or shorter form:
bioduck ncbi -u
```
### Recreate Database
To force recreation of the database even if it exists:
```bash
bioduck ncbi --force
```
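The interaction between an existing database and `--force` can be summarized with a small sketch. It assumes the behavior described above and nothing more: `prepare_database` and the `build` callable are hypothetical, and the "created" message is invented for illustration.

```python
from pathlib import Path

# Default location stated in this README.
DEFAULT_DB_PATH = Path("~/.bioduck/ncbi.db")


def prepare_database(build, db_path=DEFAULT_DB_PATH, force=False):
    """Create the database with build(path) unless it already exists.

    `build` is a hypothetical callable that downloads the data and fills
    the database file. Returns a short status message.
    """
    db_path = Path(db_path).expanduser()
    if db_path.exists() and not force:
        # Without force, an existing database is only reported.
        return f"Database already exists at {db_path}"
    if db_path.exists():
        db_path.unlink()  # force: discard the old file before rebuilding
    db_path.parent.mkdir(parents=True, exist_ok=True)
    build(db_path)
    return f"Database created at {db_path}"
```

Passing a different `db_path` here plays the same role as pointing the command at a custom database location.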
### Specify Custom Locations
```bash
# Custom database location
bioduck ncbi --db-path /path/to/custom/ncbi.db
# Custom data directory for downloaded files
bioduck ncbi --data-dir /path/to/store/downloaded/data
```