https://github.com/databio/refgetstore-node-demo
Lightweight Node.js refget + seqcol API server backed by RefgetStore
- Host: GitHub
- URL: https://github.com/databio/refgetstore-node-demo
- Owner: databio
- Created: 2026-04-13T18:47:46.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2026-06-19T13:01:26.000Z (7 days ago)
- Last Synced: 2026-06-19T14:29:38.662Z (7 days ago)
- Language: TypeScript
- Size: 27.3 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# RefgetStore Node Server
A lightweight Node.js **proxy** for the GA4GH refget Sequences and Sequence Collections APIs, backed by a [RefgetStore](https://refgenie.org/refget/refgetstore/). The server never holds a whole sequence in memory: it either redirects clients to raw-store bytes in the backing store or stream-decodes encoded-store bytes directly into the HTTP response.
## Quick Start
```bash
npm install
npm run build
# Run the demo (builds a store from test FASTAs and starts the server)
bash demo_up.sh
```
## Live Demo
A demo server backed by the pangenome jungle RefgetStore runs at:
**http://ecs.databio.org:8150/**
Example links:
- [Service info](http://ecs.databio.org:8150/service-info)
- [List collections](http://ecs.databio.org:8150/collection)
- [Get a collection](http://ecs.databio.org:8150/collection/-Ffl-8v7R0Wh53_pRA4WtKoDQL9GmC-v)
## How it works
The server proxies sequence bytes in one of two ways, depending on how the backing RefgetStore is stored:
- **Redirect (Raw-mode stores).** The server returns `302` with a `Location` header pointing at `/sequences//.seq` in the backing store. Clients follow the redirect and fetch the bytes directly from the backing store (typically S3). A `Range` header on the original request travels with the client to the backing store, which responds with `206 Partial Content`. The server never loads sequence bytes. Query-parameter partials (`?start=&end=`) are rejected with `400` by default; use the `Range` header instead.
- **Stream-decode (Encoded-mode stores).** Stored bytes are 2-bit/3-bit packed, so they cannot be redirected verbatim. The server calls `RefgetStore.streamSequence(digest, start, end)`, which returns a `Readable` of decoded ASCII bases that is piped directly to the HTTP response. Memory use is bounded by the stream's internal buffer, regardless of sequence size.
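To illustrate why packed bytes have to be decoded rather than redirected, here is a minimal sketch of chunked 2-bit decoding. The layout is an assumption for illustration only (four bases per byte, first base in the two most significant bits, `A=00`, `C=01`, `G=10`, `T=11`); it is not RefgetStore's documented on-disk format, the 3-bit case is omitted, and `decode2Bit` is a hypothetical name. Because the generator yields at most `chunkSize` bases at a time, memory stays proportional to the chunk size rather than to the requested slice.

```typescript
// Hypothetical 2-bit alphabet: code 0..3 maps to these bases (assumption).
const BASES = ["A", "C", "G", "T"] as const;

/**
 * Decode bases [start, end) from a 2-bit packed buffer, yielding ASCII
 * strings of at most `chunkSize` bases. Assumes four bases per byte with
 * the first base in the two most significant bits. A real store would also
 * record the true sequence length, since the last byte may be padded.
 */
function* decode2Bit(
  packed: Uint8Array,
  start: number,
  end: number,
  chunkSize = 4096,
): Generator<string> {
  const capacity = packed.length * 4; // bases the buffer can hold
  if (start < 0 || end > capacity || start > end) {
    throw new RangeError(`invalid range [${start}, ${end}) for ${capacity} bases`);
  }
  let chunk: string[] = [];
  for (let i = start; i < end; i++) {
    const byte = packed[i >> 2]; // four bases per byte
    const shift = 6 - 2 * (i & 3); // base 0 -> bits 7-6, base 3 -> bits 1-0
    chunk.push(BASES[(byte >> shift) & 0b11]);
    if (chunk.length === chunkSize) {
      yield chunk.join(""); // emit a bounded chunk, then start a new one
      chunk = [];
    }
  }
  if (chunk.length > 0) yield chunk.join("");
}
```

In Node, a generator like this could be wrapped with `Readable.from(...)` from `node:stream` and piped to a response; per the description above, the server gets an equivalent `Readable` directly from `RefgetStore.streamSequence`.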
### Proxy mode matrix
| Store mode | `REFGET_PROXY_MODE=auto` | `redirect-only` | `stream-only` |
|---|---|---|---|
| Raw | redirect (302) | redirect (302) | stream (decode is a no-op) |
| Encoded | stream | startup error | stream |
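The matrix, together with the query-parameter rule described under "How it works", can be expressed as two small functions: one run once at startup, one per request. This is a sketch; the type and function names (`resolveStrategy`, `chooseDelivery`) are hypothetical and not taken from the server's source.

```typescript
type StoreMode = "raw" | "encoded";
type ProxyMode = "auto" | "redirect-only" | "stream-only";
type Strategy = "redirect" | "stream";

/** Store-level choice per the proxy mode matrix; throws for the unsupported combination. */
function resolveStrategy(store: StoreMode, proxy: ProxyMode): Strategy {
  if (proxy === "stream-only") return "stream"; // decode is a no-op for Raw stores
  if (store === "raw") return "redirect"; // auto and redirect-only both redirect Raw stores
  if (proxy === "redirect-only") {
    // Encoded bytes are packed and cannot be redirected verbatim.
    throw new Error("REFGET_PROXY_MODE=redirect-only requires a Raw-mode store");
  }
  return "stream"; // auto + Encoded
}

type Delivery =
  | { kind: "redirect" }
  | { kind: "stream" }
  | { kind: "error"; status: 400; message: string };

/** Per-request choice for GET /sequence/:digest. */
function chooseDelivery(
  strategy: Strategy,
  hasQueryPartial: boolean, // request carried ?start= and/or ?end=
  allowQueryParamPartials: boolean, // REFGET_ALLOW_QUERY_PARAM_PARTIALS
): Delivery {
  if (strategy === "stream") return { kind: "stream" }; // query partials are accepted when streaming
  if (!hasQueryPartial) return { kind: "redirect" }; // a Range header travels with the redirect
  if (allowQueryParamPartials) return { kind: "stream" }; // fall back to streaming
  return {
    kind: "error",
    status: 400,
    message: "Use the Range header for partial sequences in redirect mode",
  };
}
```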
## Configuration
| Env var | Default | Description |
|---|---|---|
| `REFGET_STORE_URL` | — | URL to a remote RefgetStore (S3 / HTTP). Required for redirect mode. |
| `REFGET_STORE_PATH` | — | Path to a local RefgetStore dir. Forces `stream-only` mode. |
| `REFGET_CACHE_PATH` | `/tmp/refgetstore_cache` | Metadata cache for remote stores. |
| `REFGET_PROXY_MODE` | `auto` | `auto` (redirect Raw, stream Encoded), `redirect-only`, `stream-only`. |
| `REFGET_ALLOW_QUERY_PARAM_PARTIALS` | `false` | When `true`, requests with `?start=&end=` in redirect mode fall back to streaming instead of returning `400`. |
| `PORT` | `3000` | HTTP port. |
Exactly one of `REFGET_STORE_URL` or `REFGET_STORE_PATH` must be set.
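A minimal sketch of how these variables might be read and validated at startup. The `ServerConfig` shape and the `loadConfig` name are hypothetical, and treating only the literal string `true` as enabling `REFGET_ALLOW_QUERY_PARAM_PARTIALS` is an assumption. It follows the table above: defaults for the cache path, proxy mode, and port, exactly one store source, and `stream-only` forced for local stores.

```typescript
type ProxyMode = "auto" | "redirect-only" | "stream-only";

interface ServerConfig {
  storeUrl?: string;
  storePath?: string;
  cachePath: string;
  proxyMode: ProxyMode;
  allowQueryParamPartials: boolean;
  port: number;
}

const PROXY_MODES: readonly ProxyMode[] = ["auto", "redirect-only", "stream-only"];

function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  // Treat empty strings as unset.
  const storeUrl = env.REFGET_STORE_URL || undefined;
  const storePath = env.REFGET_STORE_PATH || undefined;
  // Exactly one store source must be configured.
  if (Boolean(storeUrl) === Boolean(storePath)) {
    throw new Error("Set exactly one of REFGET_STORE_URL or REFGET_STORE_PATH");
  }
  const requested = env.REFGET_PROXY_MODE ?? "auto";
  if (!PROXY_MODES.includes(requested as ProxyMode)) {
    throw new Error(`Unknown REFGET_PROXY_MODE: ${requested}`);
  }
  // A local store has no URL to redirect to, so it is forced to stream.
  const proxyMode: ProxyMode = storePath ? "stream-only" : (requested as ProxyMode);
  const port = Number(env.PORT ?? "3000");
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  return {
    storeUrl,
    storePath,
    cachePath: env.REFGET_CACHE_PATH ?? "/tmp/refgetstore_cache",
    proxyMode,
    allowQueryParamPartials: env.REFGET_ALLOW_QUERY_PARAM_PARTIALS === "true",
    port,
  };
}
```

In a server this would be called once at startup, for example as `loadConfig(process.env)`, so that a misconfiguration fails before any request is served.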
## API Endpoints
### Service Info
| Endpoint | Description |
|---|---|
| `GET /service-info` | GA4GH service-info with store statistics |
### Refget Sequences (GA4GH refget v2)
| Endpoint | Description |
|---|---|
| `GET /sequence` | List all sequences (disabled for stores with > 10,000 sequences) |
| `GET /sequence/:digest` | Retrieve sequence bases (302 redirect or streaming, depending on proxy mode). Supports `Range` header; `?start=&end=` accepted in stream mode. |
| `GET /sequence/:digest/metadata` | Sequence metadata (length, md5, ga4gh digest) |
| `GET /sequence/service-info` | Refget service capabilities |
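In stream mode, a `Range` header has to be turned into `start`/`end` arguments for `streamSequence`. The sketch below parses a single HTTP `bytes=` range; it assumes, without confirmation from this README, that `streamSequence` takes 0-based, end-exclusive coordinates like the refget `?start=&end=` parameters. Since decoded output is one ASCII byte per base, byte offsets equal base offsets. Multi-range requests are ignored, and `parseRangeHeader` is a hypothetical name.

```typescript
// 0-based, end-exclusive coordinates (assumed to match streamSequence).
type SeqRange = { start: number; end: number };

/**
 * Parse "bytes=first-last", "bytes=first-" or "bytes=-suffix" for a sequence
 * of `length` bases. Returns null to serve the whole sequence (no header or
 * an unsupported form), or "unsatisfiable" for a 416 response.
 */
function parseRangeHeader(
  header: string | undefined,
  length: number,
): SeqRange | null | "unsatisfiable" {
  if (!header) return null; // no Range header: serve the whole sequence
  // Only a single byte range is handled; anything else is ignored.
  const m = /^bytes=(\d*)-(\d*)$/.exec(header.trim());
  if (!m || (m[1] === "" && m[2] === "")) return null;
  if (m[1] === "") {
    // Suffix range "bytes=-N": the final N bases.
    const n = Number(m[2]);
    if (n === 0) return "unsatisfiable";
    return { start: Math.max(0, length - n), end: length };
  }
  const first = Number(m[1]);
  if (first >= length) return "unsatisfiable";
  // An omitted last position means "to the end"; an inclusive last
  // position past the end is clamped to the final base.
  const last = m[2] === "" ? length - 1 : Math.min(Number(m[2]), length - 1);
  if (last < first) return "unsatisfiable";
  return { start: first, end: last + 1 }; // inclusive last -> exclusive end
}
```

A handler would answer a parsed range with `206 Partial Content` and `Content-Range: bytes <start>-<end - 1>/<length>`, an unsatisfiable range with `416` and `Content-Range: bytes */<length>`, and `null` with the full sequence and `200`.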
### Sequence Collections (GA4GH seqcol)
| Endpoint | Description |
|---|---|
| `GET /collection` | List all collections |
| `GET /collection/:digest` | Collection metadata |
| `GET /collection/:digest/metadata` | Collection metadata (explicit) |
## Building a Store from FASTA Files
```bash
node scripts/build_store.mjs --fasta path/to/genome.fa --output my_store
REFGET_STORE_PATH=my_store REFGET_PROXY_MODE=stream-only npm start
```
## Development (local-linked `@databio/gtars-node`)
Until `@databio/gtars-node` is published with `streamSequence`, link to a local build:
```bash
# In the gtars repo (paths assume both repos are checked out under repos/)
cd repos/gtars/gtars-node
npm run build
npm link
# Return to the starting directory
cd -
# In this repo
cd repos/refgetstore-node-demo
npm link @databio/gtars-node
npm run dev
```
## Docker
```bash
# Build
docker build -f deployment/dockerhub/Dockerfile -t refgetstore-server .
# Run (redirect-mode example)
docker run -p 80:80 \
-e PORT=80 \
-e REFGET_STORE_URL=https://my-bucket.s3.amazonaws.com/refget/store \
refgetstore-server
```
## Comparison to seqcolapi
| | seqcolapi | refgetstore-server |
|---|---|---|
| Runtime | Python + FastAPI | Node.js + Hono |
| Storage | PostgreSQL | RefgetStore (flat files, local or S3) |
| Infrastructure | Database server required | No database; store files on local disk or in an object store |
| Sequence delivery | Reads DB, builds response in Python | Redirect or stream-decode; no bytes buffered |
## Known Limitations
- No comparison endpoint (`/comparison/:digest1/:digest2`); this is pending support in the napi bindings
- Read-only: store must be pre-built from FASTA files