{"id":20425031,"url":"https://github.com/databio/databio_genomes","last_synced_at":"2026-03-09T15:02:46.487Z","repository":{"id":84558891,"uuid":"200691385","full_name":"databio/databio_genomes","owner":"databio","description":"A list of lab genome assets to be built with refgenie","archived":false,"fork":false,"pushed_at":"2020-06-18T17:02:30.000Z","size":113,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-01-15T15:08:42.934Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/databio.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-08-05T16:28:42.000Z","updated_at":"2020-06-16T13:52:55.000Z","dependencies_parsed_at":"2023-03-12T23:37:54.463Z","dependency_job_id":null,"html_url":"https://github.com/databio/databio_genomes","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databio%2Fdatabio_genomes","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databio%2Fdatabio_genomes/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databio%2Fdatabio_genomes/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databio%2Fdatabio_genomes/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/databio","download_url":"https://codeload.github.com/databio/databio_g
enomes/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241966977,"owners_count":20050324,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-15T07:12:03.842Z","updated_at":"2026-03-09T15:02:46.425Z","avatar_url":"https://github.com/databio.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Databio genomes overview\n\nThis repository contains the files to build and archive our lab's reference genome assets to serve with [`refgenieserver`](https://github.com/databio/refgenieserver) at http://refgenomes.databio.org.\n\nThe whole process is scripted, starting from this repository. From here, we download the input data (FASTA files, GTF files, etc.), use `refgenie build` to create all of these assets in a local refgenie instance, use `refgenieserver archive` to build the server archives, and finally serve them with a refgenieserver instance by calling `refgenieserver serve`.\n\n# Asset PEP\n\nThe [asset_pep](asset_pep) folder contains a [PEP](https://pep.databio.org) with metadata for each asset. The contents are:\n\n- `assets.csv` - The primary sample_table. Each row is an asset.\n- `recipe_inputs.csv` - The subsample_table. This provides a way to define each individual value passed to any of the 3 arguments of the `refgenie build` command: `--assets`, `--params`, and `--files`.
\n- `refgenie_build_cfg.yaml` - The config file that defines a subproject (which is used to download the input data) and additional project settings.\n\nBelow are instructions for: 1) adding a new asset to this PEP, which will deploy that asset at http://refgenomes.databio.org; 2) processing this PEP to build, archive, and deploy on the server.\n\n## Adding an asset to this PEP\n\n### Step 1: Add the asset to the asset table.\n\nTo add an asset, you will need to add a row in `assets.csv`. Follow these directions:\n\n- `genome` - the human-readable genome (namespace) you want to serve this asset under\n- `asset` - the human-readable asset name you want to serve this asset under. It is identical to the asset recipe. Use `refgenie list` to see [available recipes](http://refgenie.databio.org/en/latest/build/)\n\nYour asset will be retrievable from the server with `refgenie pull {genome}/{asset_name}`.\n\n### Step 2: Add any required inputs to the recipe_inputs table\n\nNext, we need to add the source for each item required by your recipe. You can see what the recipe requires by using `-q` or `--requirements`, like this: `refgenie build {genome}/{recipe} -q`. If your recipe doesn't require any inputs, then you're done. If it requires any inputs (which can be one or more of the following: *assets*, *files*, *parameters*), then you need to specify these in the `recipe_inputs.csv` table.\n\nFor each required input, you add a row to `recipe_inputs.csv`. Follow these directions:\n- `sample_name` - must match the `genome` and `asset` values in the `assets.csv` file. Format it this way: `\u003cgenome\u003e-\u003casset\u003e`. This is how we match inputs to assets.\n\nNext you will need to fill in 3 columns:\n- `input_type` - one of the following: *files*, *params* or *assets*\n- `input_id` - must match the recipe requirement. Again, use `refgenie build \u003cgenome\u003e/\u003casset\u003e -q` to learn the ids\n- `input_value` - the value for the input, e.g. a URL in the case of *files*
\n\n### Step 3: See if you did it well!\n\n**Validate the PEP with [`eido`](http://eido.databio.org/en/latest/)**\n\nThe command below validates the PEP against a remote schema. Any PEP issues will result in a `ValidationError`:\n\n```\neido validate refgenie_build_cfg.yaml -s http://schema.databio.org/refgenie/refgenie_build.yaml\n```\n\n\n\n## Building assets using this PEP\n\n### Step 1: Download input files\n\nMany of the assets require some input files, and we have to make sure we have those files locally. In the `recipe_inputs.csv` file, we have entered these files as remote URLs, so the first step is to download them. We have created a subproject called `getfiles` for this. To programmatically download all the files required by `refgenie build`, run the following from this directory using [looper](http://looper.databio.org):\n\n```\nlooper run refgenie_build_cfg.yaml -p local --amend getfiles\n```\n\n### Step 2: Build assets\n\nOnce files are present locally, we can run `refgenie build` on each asset specified in the sample_table (`assets.csv`):\n\n```\nlooper run refgenie_build_cfg.yaml\n```\n\nThis will create one job for each *asset*.\n\n### Step 3: Archive assets\n\nAssets are built locally now, but to serve them, we must archive them using `refgenieserver`. The command is simple:\n\n```\nrefgenieserver archive -c \u003cpath/to/genomes.yaml\u003e\n```\n\nBecause archiving is generally lengthy, it makes sense to submit this job to the cluster. Since you have [divvy](http://divvy.databio.org/en/latest/) installed (with looper), you can easily create a SLURM submission script with `divvy write`:
\n\n```\ndivvy write -o archive_job.sbatch --code 'refgenieserver archive -c \u003cpath/to/genomes.yaml\u003e' ...\n```\nfor example:\n```\ndivvy write -o archive_job.sbatch \\\n  --code 'refgenieserver archive -c $PROJECT/genomes_staging/genomes.yaml' \\\n  --mem 12000 \\\n  --cores 8 \\\n  --logfile $HOME/refgenieserver_archive.log \\\n  --jobname refgenieserver_archive \\\n  --time 01-00:00:00\n```\nand submit it with:\n```\nsbatch archive_job.sbatch\n```\n\n### Step 4: Serve assets\n\n```\nrefgenieserver serve genomes.yaml\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatabio%2Fdatabio_genomes","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatabio%2Fdatabio_genomes","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatabio%2Fdatabio_genomes/lists"}