https://github.com/databiosphere/data-browser

boardwalk

Last synced: 3 months ago
JSON representation

Host: GitHub
URL: https://github.com/databiosphere/data-browser
Owner: DataBiosphere
License: apache-2.0
Created: 2018-03-12T20:19:09.000Z (over 8 years ago)
Default Branch: main
Last Pushed: 2026-02-27T22:44:54.000Z (5 months ago)
Last Synced: 2026-02-28T00:06:05.944Z (5 months ago)
Topics: boardwalk
Language: Jupyter Notebook
Homepage:
Size: 50 MB
Stars: 11
Watchers: 24
Forks: 5
Open Issues: 123
Metadata Files:
- Readme: .gitlab/README.md
- Changelog: CHANGELOG.md
- License: LICENSE
- Security: security.txt

Awesome Lists containing this project

README

The files in this directory define how GitLab builds Data Browser
distributions (a tarball of the site content) and uploads them to the GitLab
package registry. The Azul pipeline will download these distributions and
deploy them to the S3 bucket backing the CloudFront distribution for
browser & portal.

There is typically one site of the Data Browser per Azul atlas, an atlas being
a set of catalogs. Often there are multiple atlases hosted in an Azul
deployment and therefore multiple Data Browser sites accessing that Azul
instance, one per atlas.

In order to avoid duplicating substantial amounts of GitLab pipeline YAML, we
dynamically generate the GitLab jobs that produce the distributions. That's
made possible in GitLab via the parent-child pipeline mechanism.

`ci.yaml` defines the parent pipeline. The first job in the parent pipeline
builds the Docker image to be used for all subsequent jobs in the pipeline.
This image is defined in the `Dockerfile`. The first job uses the Azul runner
image, a minimal image containing a `docker` client. The Azul runner image is
pushed manually when the Azul project is first set up in a GitLab instance.
The GitLab job token that is issued to pipelines running for this project
must be granted access to the pull from the registry for the Azul project, so
that the runner can pull the Azul runner image for the first job.

The second job in the parent pipeline generates the GitLab pipeline YAML for
the child pipeline. The third job triggers a child pipeline using the YAML
generated by the second job, which that job exposes as an artifact. The
generation is done by a a Typescript program running on NodeJS, not because
Typescript is particularly good at generating YAML but simply because
building the Data Browser already requires NodeJS.

The generated child pipeline contains two stages: one stage builds the
distributions, the other stage uploads them to the package registry. There is
one job per atlas in each of these two stages: a `build` job and a `publish`
job.

The `sites` directory contains GitLab YAML job fragments that are merged into
the generated YAML for the child pipeline. There is one subdirectory per Azul
deployment and site. The Azul deployment is fixed as a job variable in
the _Settings_, _CI/CD_, and _Variables_ section of the GitLab project. The
name of the variable is `AZUL_DEPLOYMENT_STAGE`. The GitLab administrator
picks the Azul deployment by configuring the job variable, and the pipeline
builds a distribution per site in that deployment. Each site contains a
number of YAML job fragments:

- `sites///base.yaml` is merged into the `.base` job
definition that the `build` and `publish` jobs extend. It typically only
contains a `variables` section that sets the environment variables for both
jobs, but specific to a site and a deployment

- `sites///build.yaml` is merged into the `build` job
definition. It contains at least a `script` section that actually builds
the distribution

- `sites//pipeline.yaml` is merged into the child pipeline, at the
document root. It typically contains a `variables` section with environment
variables that are not specific to a site.

The merging uses https://github.com/kleber-swf/deepmerge-json which allows for
merging of JSON arrays, something that GitLab's native inheritance/inclusion
mechanisms doesn't support. This allows for customizing the `rules`
section of a job, say, by appending a rule. See
https://github.com/kleber-swf/deepmerge-json#array-merge

To reuse a YAML fragment file in another site or deployment, use symlinks.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/databiosphere/data-browser

Awesome Lists containing this project

README