https://github.com/databiosphere/terra-drs-hub
- Host: GitHub
- URL: https://github.com/databiosphere/terra-drs-hub
- Owner: DataBiosphere
- License: bsd-3-clause
- Created: 2022-02-11T19:09:12.000Z (about 4 years ago)
- Default Branch: dev
- Last Pushed: 2025-06-17T13:08:51.000Z (9 months ago)
- Last Synced: 2025-06-17T14:23:00.547Z (9 months ago)
- Language: Java
- Homepage:
- Size: 607 KB
- Stars: 3
- Watchers: 21
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
- Codeowners: .github/CODEOWNERS
# DrsHub (Also known as Dr. Martha Shub, MD)
## Overview
DrsHub is the [DRS](https://ga4gh.github.io/data-repository-service-schemas/preview/develop/docs/) resolution service for Terra. It is the hub through which DRS requests are routed, hence the name DrsHub.
It is a Java Spring Boot rewrite of the deprecated Cloud Function [Martha](https://github.com/broadinstitute/martha), specifically of its v3 API.
### Background Info
- [The adoption of DRS across data repositories](https://docs.google.com/document/d/1Wf4enSGOEXD5_AE-uzLoYqjIp5MnePbZ6kYTVFp1WoM/edit#heading=h.qiwlmit3m9)
- [Global Alliance for Genomics and Health (GA4GH) DRS specifications](https://ga4gh.github.io/data-repository-service-schemas/preview/develop/docs/)
## DRS Providers
This is the short name, full name, and auth type(s) for each provider:
- **AnVIL** (NHGRI Analysis Visualization and Informatics Lab-space)
  - Ecm Provider Access Token
- **BDC** (BioData Catalyst)
  - Ecm Provider Access Token
- **CRDC** (NCI Cancer Research/Proteomics Data Commons)
  - Ecm Provider Access Token
- **KidsFirst** (Gabriella Miller Kids First DRC)
  - Ecm Provider Access Token
- **Passport Test** (Passport Test Provider)
  - Passport
  - Ecm Provider Access Token
- **TDR** (Terra Data Repo)
  - Bearer Token
- **Sage Bionetworks** (Synapse)
  - Ecm Provider Access Token
## Usage
To resolve a DRS URL, perform an HTTP `POST` to `/api/v4/drs/resolve`.
The `Content-Type` of your request should be `application/json`, with the request body encoded accordingly.
Request bodies should look like:
```json
{
  "url": "string",
  "cloudPlatform": ["gs", "azure", "s3", null],
  "fields": ["string"]
}
```
where `url` is the DRS URL to resolve and `fields` is any of:
```text
accessUrl
bondProvider
bucket
contentType
fileName
googleServiceAccount
gsUri
hashes
localizationPath
name
size
timeCreated
timeUpdated
```
If no `fields` are specified in the request, the response will include the following `fields` by default:
```text
bucket
contentType
fileName
gsUri
hashes
localizationPath
name
size
timeCreated
timeUpdated
googleServiceAccount
```
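The defaulting rule above can be sketched on the client side as follows. This is a minimal illustration, not DrsHub's own code: the class name `DrsFields` and method `effectiveFields` are hypothetical, treating a `null` or empty list as "no fields specified" is an assumption, and rejecting unknown field names is a choice made for this sketch rather than documented server behavior.

```java
import java.util.List;

class DrsFields {

  // Every field name the README lists as requestable.
  static final List<String> ALL_FIELDS = List.of(
      "accessUrl", "bondProvider", "bucket", "contentType", "fileName",
      "googleServiceAccount", "gsUri", "hashes", "localizationPath",
      "name", "size", "timeCreated", "timeUpdated");

  // Fields the README says are returned when the request names none.
  static final List<String> DEFAULT_FIELDS = List.of(
      "bucket", "contentType", "fileName", "gsUri", "hashes",
      "localizationPath", "name", "size", "timeCreated", "timeUpdated",
      "googleServiceAccount");

  // Returns the fields a response would be expected to contain.
  // Assumption: null or empty means "no fields specified".
  static List<String> effectiveFields(List<String> requested) {
    if (requested == null || requested.isEmpty()) {
      return DEFAULT_FIELDS;
    }
    for (String field : requested) {
      if (!ALL_FIELDS.contains(field)) {
        // Client-side check only; how the service treats unknown names is not documented here.
        throw new IllegalArgumentException("Unknown field: " + field);
      }
    }
    return List.copyOf(requested);
  }

  public static void main(String[] args) {
    System.out.println(effectiveFields(null));
    System.out.println(effectiveFields(List.of("accessUrl", "size")));
  }
}
```

Comparing the two lists shows that `accessUrl` and `bondProvider` are not among the defaults, so a caller that wants either one has to request it explicitly.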
The `cloudPlatform` field is optional and can be used to specify the preferred cloud platform to use for returning a signed URL.
If no option is found for the specified cloud platform, an attempt will be made to return a signed URL from a fall-back cloud platform.
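As a rough illustration of the request described above, the following sketch builds the JSON body and the `POST` request with the JDK's built-in `java.net.http` client. Several things here are assumptions rather than documented behavior: the class and method names are hypothetical, the example DRS URL is a placeholder, `cloudPlatform` is sent as a single value (`gs`, `azure`, `s3`, or `null`) on the reading that the array in the example body lists the allowed options, and the `Authorization: Bearer` header is assumed because the README does not say how callers authenticate.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.stream.Collectors;

class DrsResolveExample {

  // Builds the JSON request body by hand. Assumes the values contain no
  // characters that need JSON escaping; a JSON library is safer in real code.
  static String buildBody(String drsUrl, String cloudPlatform, List<String> fields) {
    String fieldsJson = fields.stream()
        .map(f -> "\"" + f + "\"")
        .collect(Collectors.joining(", ", "[", "]"));
    // A null platform is sent as JSON null, meaning "no preference" (assumption).
    String platformJson = cloudPlatform == null ? "null" : "\"" + cloudPlatform + "\"";
    return "{\"url\": \"" + drsUrl + "\", \"cloudPlatform\": " + platformJson
        + ", \"fields\": " + fieldsJson + "}";
  }

  // Builds the POST to /api/v4/drs/resolve. The bearer token header is an assumption.
  static HttpRequest buildRequest(String baseUrl, String bearerToken, String body) {
    return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v4/drs/resolve"))
        .header("Content-Type", "application/json")
        .header("Authorization", "Bearer " + bearerToken)
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();
  }

  public static void main(String[] args) throws Exception {
    // Placeholder DRS URL, for illustration only.
    String body = buildBody(
        "drs://example.org/hypothetical-object-id", "gs", List.of("accessUrl", "fileName", "size"));
    System.out.println(body);

    // The service is only called when a base URL and token are supplied as arguments.
    if (args.length == 2) {
      HttpResponse<String> response = HttpClient.newHttpClient()
          .send(buildRequest(args[0], args[1], body), HttpResponse.BodyHandlers.ofString());
      System.out.println(response.statusCode() + " " + response.body());
    }
  }
}
```

Run without arguments, the program only prints the request body. Passing a DrsHub base URL and a token as the two arguments sends the request as well.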
## Architecture
DrsHub is a Java 17 Spring Boot application running in Kubernetes. Because it only resolves URLs and holds no state, it has no database. For developer convenience, a Swagger UI is provided.
Some architecture diagrams can be found in [LucidChart](https://lucid.app/documents#/documents?folder_id=297026717).
## Development
### Setup
Install a Java 17 JDK from your preferred provider. A common way to install and manage different JDK versions is to use [sdkman](https://sdkman.io/).
If developing in IntelliJ, you can configure the Project SDK to use Java 17.
You'll also need to set the Gradle JVM, located at `Preferences | Build, Execution, Deployment | Build Tools | Gradle`.
You must use [git-secrets](https://github.com/awslabs/git-secrets) to protect against committing passwords
or other sensitive information to this repository. The linked repository gives instructions for
installing and setting it up.
DrsHub uses [Minnie Kenny](https://minnie-kenny.readthedocs.io/en/latest/), and is configured to run `minnie_kenny.sh` on `./gradlew test` tasks, ensuring that git-secrets is set up.
You can also run it manually to make sure `git-secrets` is set up without testing.
Before running anything, make sure to run `./render_config`, optionally followed by an environment name, to render secrets locally. The default environment is `dev`, which is what you should use for local running and testing.
DrsHub uses Gradle as a build tool. Some common Gradle commands you may want to run are:
```shell
./gradlew generateSwaggerCode # Generate Swagger code for models and Swagger UI
./gradlew bootRun # Run DrsHub locally (Swagger UI at localhost:8080)
./gradlew test # Run the unit tests
./gradlew jib # Build the DrsHub Docker image
```
### Run Integration or Performance Tests
DrsHub uses [TestRunner](https://github.com/DataBiosphere/terra-test-runner) to run its integration
and performance tests.
To run the integration test suite, run
```shell
./gradlew runTest --args="suites/FullIntegration.json /tmp/test-results"
```
To run the performance test suite, run
```shell
./gradlew runTest --args="suites/FullPerf.json /tmp/test-results"
```
Adding `--stacktrace` can give you more debugging information, if needed.
### Run Pact Tests
To run the Pact tests, run the following:
```shell
export PACT_BROKER_URL="pact-broker.dsp-eng-tools.broadinstitute.org"
export PACT_PROVIDER_COMMIT="$(git rev-parse HEAD)"
export PACT_PROVIDER_BRANCH="$(git rev-parse --abbrev-ref HEAD)"
export PACT_BROKER_USERNAME="$(gcloud --project broad-dsp-eng-tools secrets versions access latest --secret pact-broker-users-read-write | jq -r .basic_auth_username)"
export PACT_BROKER_PASSWORD="$(gcloud --project broad-dsp-eng-tools secrets versions access latest --secret pact-broker-users-read-write | jq -r .basic_auth_password)"
./gradlew verifyPacts
```
### Logging
By default, DrsHub will emit logs in the Stackdriver JSON format.
To disable this behavior for local development, add `DRSHUB_LOG_APPENDER=Console-Standard` to your environment when running DrsHub.
## Deployment
DrsHub runs in Kubernetes in GCP. Current deployments for each env can be found at:
- Dev
  - [Kubernetes Deployment](https://console.cloud.google.com/kubernetes/deployment/us-central1-a/terra-dev/terra-dev/drshub-deployment/overview?project=broad-dsde-dev)
  - [Swagger UI](https://drshub.dsde-dev.broadinstitute.org/)
- Alpha
  - [Kubernetes Deployment](https://console.cloud.google.com/kubernetes/deployment/us-central1-a/terra-alpha/terra-alpha/drshub-deployment/overview?project=broad-dsde-alpha)
  - [Swagger UI](https://drshub.dsde-alpha.broadinstitute.org/)
- Staging
  - [Kubernetes Deployment](https://console.cloud.google.com/kubernetes/deployment/us-central1-a/terra-staging/terra-staging/drshub-deployment/overview?project=broad-dsde-staging)
  - [Swagger UI](https://drshub.dsde-staging.broadinstitute.org/)
- Production
  - [Kubernetes Deployment](https://console.cloud.google.com/kubernetes/deployment/us-central1-a/terra-prod/terra-prod/drshub-deployment/overview?project=broad-dsde-prod)
  - [Swagger UI](https://drshub.dsde-prod.broadinstitute.org/)
### DRS Provider Compact ID/URIs per Environment
Note: there are a few tricky cases with the **compact IDs (CID)**:
- BioDataCatalyst uses the compact ID `dg.4503` in production, and `dg.712c` in non-prod environments
- The AnVIL currently has two compact IDs in use
  - `dg.anv0` for old gen3 and TDR hosted data
  - `drs.anv0` (note the **dg** vs **drs** prefix) for TDR hosted data; this will be used going forward
- Most of these providers have only `prod` and `not prod` URIs; TDR is the only one that has specific URIs for each lower environment
### Dev
| Provider | Compact Id (CIB) | Host URI |
|---------------------|-------------------|--------------------------------------------|
| AnVIL (TDR hosted) | dg.anv0 | jade.datarepo-dev.broadinstitute.org |
| AnVIL (TDR hosted) | drs.anv0 | jade.datarepo-dev.broadinstitute.org |
| BDC | dg.712c | staging.gen3.biodatacatalyst.nhlbi.nih.gov |
| CRDC | dg.4dfc | nci-crdc-staging.datacommons.io |
| KidsFirst | dg.f82a1a | gen3staging.kidsfirstdrc.org |
| Passport Test | dg.test0 | ctds-test-env.planx-pla.net |
### Alpha
| Provider | Compact Id (CIB) | Host URI |
|---------------------|-------------------|--------------------------------------------|
| AnVIL (TDR hosted) | dg.anv0 | data.alpha.envs-terra.bio |
| AnVIL (TDR hosted) | drs.anv0 | data.alpha.envs-terra.bio |
| BDC | dg.712c | staging.gen3.biodatacatalyst.nhlbi.nih.gov |
| CRDC | dg.4dfc | nci-crdc-staging.datacommons.io |
| KidsFirst | dg.f82a1a | gen3staging.kidsfirstdrc.org |
| Passport Test | dg.test0 | ctds-test-env.planx-pla.net |
### Staging
| Provider | Compact Id (CIB) | Host URI |
|---------------------|-------------------|--------------------------------------------|
| AnVIL (TDR hosted) | dg.anv0 | data.staging.envs-terra.bio |
| AnVIL (TDR hosted) | drs.anv0 | data.staging.envs-terra.bio |
| BDC | dg.712c | staging.gen3.biodatacatalyst.nhlbi.nih.gov |
| CRDC | dg.4dfc | nci-crdc-staging.datacommons.io |
| KidsFirst | dg.f82a1a | gen3staging.kidsfirstdrc.org |
| Passport Test | dg.test0 | ctds-test-env.planx-pla.net |
### Prod
| Provider | Compact Id (CIB) | Host URI |
|----------------------|-------------------|------------------------------------|
| AnVIL (TDR hosted) | dg.anv0 | data.terra.bio |
| AnVIL (TDR hosted) | drs.anv0 | data.terra.bio |
| BDC | dg.4503 | gen3.biodatacatalyst.nhlbi.nih.gov |
| CRDC | dg.4dfc | nci-crdc.datacommons.io |
| KidsFirst | dg.f82a1a | data.kidsfirstdrc.org |
| Passport Test | | |
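The production table above can be read as a lookup from compact ID to host. The sketch below illustrates that reading only and is not DrsHub's actual resolution logic. The class name `CompactIdHosts` and method `hostFor` are hypothetical, and it assumes a compact-ID DRS URI has the shape `drs://<compact-id>:<accession>` or `drs://<compact-id>/<accession>`, which this README does not spell out.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

class CompactIdHosts {

  // Compact ID to host, copied from the Prod table above.
  static final Map<String, String> PROD_HOSTS = Map.of(
      "dg.anv0", "data.terra.bio",
      "drs.anv0", "data.terra.bio",
      "dg.4503", "gen3.biodatacatalyst.nhlbi.nih.gov",
      "dg.4dfc", "nci-crdc.datacommons.io",
      "dg.f82a1a", "data.kidsfirstdrc.org");

  // Returns the production host for a compact-ID DRS URI, or empty if the
  // URI does not start with drs:// or its prefix is not a known prod compact ID.
  static Optional<String> hostFor(String drsUri) {
    String scheme = "drs://";
    if (!drsUri.startsWith(scheme)) {
      return Optional.empty();
    }
    String rest = drsUri.substring(scheme.length());
    // Assumed separators between compact ID and accession: ':' or '/'.
    int colon = rest.indexOf(':');
    int slash = rest.indexOf('/');
    int sep = colon < 0 ? slash : (slash < 0 ? colon : Math.min(colon, slash));
    if (sep <= 0) {
      return Optional.empty();
    }
    String compactId = rest.substring(0, sep).toLowerCase(Locale.ROOT);
    return Optional.ofNullable(PROD_HOSTS.get(compactId));
  }

  public static void main(String[] args) {
    // Placeholder accession, for illustration only.
    System.out.println(hostFor("drs://dg.4503:hypothetical-accession"));
    // dg.712c is the non-prod BDC compact ID, so it is absent from the prod map.
    System.out.println(hostFor("drs://dg.712c:hypothetical-accession"));
  }
}
```

A non-production environment would use the same structure with the rows from its own table, for example `dg.712c` mapping to `staging.gen3.biodatacatalyst.nhlbi.nih.gov`.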
## SonarCloud Status
[SonarCloud summary for DataBiosphere_terra-drs-hub](https://sonarcloud.io/summary/new_code?id=DataBiosphere_terra-drs-hub)