https://github.com/refgenie/refget
Python tools for identification and distribution of reference sequences and sequence collections
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/refgenie/refget
- Owner: refgenie
- License: bsd-2-clause
- Created: 2020-06-22T20:17:32.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2026-03-13T12:10:27.000Z (3 months ago)
- Last Synced: 2026-03-14T00:57:07.452Z (3 months ago)
- Language: Python
- Homepage: https://refgenie.org/refget/
- Size: 1.37 MB
- Stars: 10
- Watchers: 1
- Forks: 1
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
# Refget

User-facing documentation is hosted at [refgenie.org/refget](https://refgenie.org/refget/).
This repository includes:
1. `/refget`: The `refget` Python package, which provides a Python interface to both remote and local use of refget standards. It has clients and functions for both refget sequences and refget sequence collections (seqcol).
2. `/seqcolapi`: Sequence collections API software, a FastAPI wrapper built on top of the `refget` package. It provides a bare-bones Sequence Collections API service.
3. `/deployment`: Server configurations for demo instances and public deployed instances. There are also GitHub workflows (in `.github/workflows`) that deploy the demo server instance from this repository.
4. `/test_fasta` and `/test_api`: Dummy data and a compliance test, to test external implementations of the Refget Sequence Collections API.
5. `/frontend`: A React front-end for seqcolapi.
## Deploy to AWS ECS
To deploy the public demo instance, you can either:
1. **Create a GitHub release** - This triggers the `deploy_release_software.yml` workflow, which builds and pushes the Docker image to DockerHub. After that completes, it automatically triggers `deploy_primary.yml` to deploy to AWS ECS.
2. **Manual dispatch** - You can manually trigger either workflow from the GitHub Actions tab.
The full pipeline builds seqcolapi, pushes the image to DockerHub, and deploys to ECS.
## Testing
### Unit tests
```bash
pytest
```
### Integration tests (requires Docker)
Integration tests run against an ephemeral PostgreSQL database in Docker:
```bash
./scripts/test-integration.sh
```
This starts the test database, runs tests, and cleans up automatically.
## Development and deployment: Backend
### Store-backed (no database)
The store-backed seqcolapi uses a RefgetStore (local files) instead of PostgreSQL. This is the simplest way to run the API.
#### Quick start
```console
bash deployment/store_demo_up.sh
```
This will:
- Build a local RefgetStore from test FASTA files
- Run the store-backed seqcolapi with uvicorn
- Block the terminal until you press Ctrl+C, which cleans up
No Docker or database required.
#### Step-by-step
1. Build a store from FASTA files:
```console
python data_loaders/demo_build_store.py test_fasta /tmp/refget_demo_store
```
2. Start the store-backed API:
```console
REFGET_STORE_PATH=/tmp/refget_demo_store uvicorn seqcolapi.main:store_app --reload --port 8100
```
#### Remote store
To run against a remote (S3) store:
```console
REFGET_STORE_URL=https://example.com/store uvicorn seqcolapi.main:store_app --port 8100
```
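The two environment variables above select between a local and a remote store. As a rough illustration of that choice, the Python sketch below reads them and reports which one would apply. The helper name `resolve_store_location` and the precedence it uses (the local path wins if both are set) are assumptions made for illustration; they are not taken from the seqcolapi source.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_store_location(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return ("local", path) or ("remote", url) based on the store env vars.

    Hypothetical helper: the precedence used here (REFGET_STORE_PATH first)
    is an assumption, not documented seqcolapi behavior.
    """
    env = os.environ if env is None else env

    path = env.get("REFGET_STORE_PATH")
    if path:
        return ("local", path)

    url = env.get("REFGET_STORE_URL")
    if url:
        return ("remote", url)

    raise RuntimeError("Set REFGET_STORE_PATH or REFGET_STORE_URL before starting the API")
```

Passing a plain dictionary instead of relying on `os.environ` keeps the check easy to exercise, for example `resolve_store_location({"REFGET_STORE_PATH": "/tmp/refget_demo_store"})`.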
### DB-backed (PostgreSQL)
If you need a database-backed instance (e.g., for mutable data or advanced queries), use the DB-backed workflow. The individual steps are described below, but the quickest way to get a development API running for testing is the demo shell script (no data persistence; it just loads demo data):
```console
bash deployment/demo_up.sh
```
This will:
- Populate environment variables
- Launch a PostgreSQL container with Docker
- Run the refget service with uvicorn
- Load the demo data
- Block the terminal until you press Ctrl+C, which shuts down all services
### Step-by-step process (DB-backed)
Alternatively, if you want to run each step separately to see what's really going on, start here.
#### Setting up a database connection
First configure a database connection through environment variables. Choose one of these:
```
source deployment/local_demo/local_demo.env # local demo (see below to create the database using docker)
source deployment/seqcolapi.databio.org/production.env # connect to production database
```
If you're using the `local_demo` configuration, use Docker to launch a local PostgreSQL database service like this:
```
docker run --rm --name refget-postgres -p 127.0.0.1:5432:5432 \
  -e POSTGRES_PASSWORD \
  -e POSTGRES_USER \
  -e POSTGRES_DB \
  -e POSTGRES_HOST \
  postgres:17.0
```
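Because the `docker run` command passes each variable by name only (`-e POSTGRES_USER` with no value), the values must already be set in the calling shell, which is what sourcing the `.env` file does. The Python sketch below checks that the four variables are present and assembles a standard PostgreSQL connection URI from them. The function name `postgres_url` is hypothetical, and whether refget builds its connection this way is an assumption; the port default matches the `5432` mapping shown above.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote

# Variable names taken from the docker command above.
REQUIRED_VARS = ("POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_HOST", "POSTGRES_DB")


def postgres_url(env: Optional[Mapping[str, str]] = None, port: int = 5432) -> str:
    """Build a postgresql:// URI from the environment (illustrative helper only)."""
    env = os.environ if env is None else env

    # Fail early with a clear message if the .env file was not sourced.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    # Percent-encode credentials so characters such as '@' do not break the URI.
    user = quote(env["POSTGRES_USER"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    return f"postgresql://{user}:{password}@{env['POSTGRES_HOST']}:{port}/{env['POSTGRES_DB']}"
```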
If you need to load test data into your server, then you have to install [gtars](https://docs.bedbase.org/gtars/) (with `pip install gtars`), a Python package for computing GA4GH digests. You can then load test data like this:
```
PYTHONPATH=. python data_loaders/load_demo_seqcols.py
```
or:
```
refget add-fasta -p test_fasta/test_fasta_metadata.csv -r test_fasta
```
#### Running the seqcolapi API backend
Run the demo `seqcolapi` service like this:
```
uvicorn seqcolapi.main:app --reload --port 8100
```
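Once the service is listening on port 8100, a quick request confirms that it is up. The sketch below uses only the Python standard library. The helper names `endpoint` and `get_json` are hypothetical, and the `service-info` and `collection/<digest>` paths are assumed from the GA4GH Sequence Collections specification rather than confirmed against this deployment.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def endpoint(base_url: str, *parts: str) -> str:
    """Join a base URL and path segments, percent-encoding each segment."""
    return "/".join([base_url.rstrip("/")] + [quote(part, safe="") for part in parts])


def get_json(url: str, timeout: float = 10.0):
    """Fetch a URL and decode the response body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With the server running, `get_json(endpoint("http://localhost:8100", "service-info"))` would be expected to return a JSON document describing the service, and `endpoint("http://localhost:8100", "collection", digest)` builds the URL for a single collection, assuming those paths are exposed.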
#### Running with docker
To run with Docker, first build the image from the root of this repository:
```
docker build -f deployment/dockerhub/Dockerfile -t databio/seqcolapi seqcolapi
```
To run the service in a container:
```
source deployment/seqcolapi.databio.org/production.env
docker run --rm -p 8000:80 --name seqcolapi \
  --env "POSTGRES_USER" \
  --env "POSTGRES_DB" \
  --env "POSTGRES_PASSWORD" \
  --env "POSTGRES_HOST" \
  databio/seqcolapi
```
#### Deploying container to dockerhub
Use the GitHub Action in this repo, which deploys on release or through manual dispatch.
## Running the frontend
Once you have a backend running, you can run a frontend to interact with it.
### Local client with local server
```
cd frontend
npm i
VITE_API_BASE="http://localhost:8100" npm run dev
```
### Local client with production server
```
cd frontend
npm i
VITE_API_BASE="https://seqcolapi.databio.org" npm run dev
```
### Development with local WASM
The `/digest` feature uses [@databio/gtars](https://www.npmjs.com/package/@databio/gtars) for WASM-based FASTA processing. To use a local gtars-wasm build instead of the npm package:
```
LOCAL_GTARS=../../gtars/gtars-wasm/pkg npm run dev
```
The `LOCAL_GTARS` env var should point to the `pkg/` directory of a built gtars-wasm package (run `wasm-pack build --target web` in gtars-wasm to build it).
### gtars WASM API Reference
The streaming API handles files of any size:
```javascript
import * as gtars from '@databio/gtars';
await gtars.default(); // Initialize WASM

// Streaming API (for large files)
const handle = gtars.fastaHasherNew();
gtars.fastaHasherUpdate(handle, chunk); // Feed Uint8Array chunks (call once per chunk)
const result = gtars.fastaHasherFinish(handle); // Get SeqColResult

// Batch API (for small files); named differently so both calls can share a scope
const batchResult = gtars.digestSeqcol(fastaBytes);
```
Result object:
```typescript
interface SeqColResult {
  digest: string; // Collection digest (SHA512t24u)
  names_digest: string;
  sequences_digest: string;
  lengths_digest: string;
  n_sequences: number;
  sequences: Array<{
    name: string;
    length: number;
    alphabet: string; // dna2bit, dna3bit, etc.
    sha512t24u: string;
    md5: string;
    description?: string;
  }>;
}
```
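The `sha512t24u` values in the result are GA4GH-style digests: a SHA-512 hash truncated to its first 24 bytes and then base64url-encoded, which yields a 32-character string. The Python sketch below shows that digest function in both a batch and a streaming form, mirroring the two calling styles above. The function names are illustrative, and the sketch covers only the digest itself; parsing FASTA records and assembling the collection-level digests is handled by gtars and is not reproduced here.

```python
import base64
import hashlib
from typing import Iterable


def sha512t24u(data: bytes) -> str:
    """SHA-512, truncated to 24 bytes, base64url-encoded (32 characters, no padding)."""
    truncated = hashlib.sha512(data).digest()[:24]
    return base64.urlsafe_b64encode(truncated).decode("ascii")


def sha512t24u_stream(chunks: Iterable[bytes]) -> str:
    """Same digest, computed incrementally so the input never has to fit in memory."""
    hasher = hashlib.sha512()
    for chunk in chunks:
        hasher.update(chunk)
    return base64.urlsafe_b64encode(hasher.digest()[:24]).decode("ascii")
```

Feeding the same bytes in one call or in several chunks gives the same digest, so `sha512t24u_stream([b"AC", b"GT"])` equals `sha512t24u(b"ACGT")`.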
### Deploying
1. Ensure the [refget](https://github.com/refgenie/refget/) package master branch is as you want it.
2. Deploy the updated [seqcolapi](https://github.com/refgenie/seqcolapi/) app to DockerHub (using manual dispatch, or deploy on GitHub release).
3. Finally, deploy the instance with manual dispatch using the included GitHub action.
## Developer notes
### Models
The objects and attributes are represented as SQLModel objects in `refget/models.py`. To add a new attribute:
1. Create a new model. This creates a table for that model.
2. Change the function that creates the objects so that it populates the new attribute.
## Example of loading reference FASTA datasets
```
refget add-fasta -p ref_fasta.csv -r $BRICKYARD/datasets_downloaded/pangenome_fasta/reference_fasta
```