{"id":20425059,"url":"https://github.com/databio/rnapipe","last_synced_at":"2026-06-06T11:31:51.677Z","repository":{"id":86287792,"uuid":"93782814","full_name":"databio/rnapipe","owner":"databio","description":null,"archived":false,"fork":false,"pushed_at":"2019-04-06T13:43:18.000Z","size":14183,"stargazers_count":0,"open_issues_count":1,"forks_count":2,"subscribers_count":6,"default_branch":"master","last_synced_at":"2026-01-27T00:09:22.311Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/databio.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-06-08T19:04:57.000Z","updated_at":"2019-04-06T13:43:20.000Z","dependencies_parsed_at":"2023-03-10T20:15:36.881Z","dependency_job_id":null,"html_url":"https://github.com/databio/rnapipe","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/databio/rnapipe","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databio%2Frnapipe","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databio%2Frnapipe/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databio%2Frnapipe/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databio%2Frnapipe/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/databio","download_url":"https://codeload.github.com/databio/rnapipe/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/repositories/databio%2Frnapipe/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33981122,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-06T02:00:07.033Z","response_time":107,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-15T07:12:08.884Z","updated_at":"2026-06-06T11:31:51.639Z","avatar_url":"https://github.com/databio.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# RNA-seq pipelines\n\nThis repository contains pipelines to process RNA-seq data. You can download the latest version from the [releases page](https://github.com/databio/dnameth_pipelines/releases) and a history of version changes is in the [CHANGELOG](CHANGELOG.md).\n\n## Pipeline features at-a-glance\n\nThese features are explained in more detail later in this README.\n\nDescription pending.\n\n## Quick start\n\nIf your system has everything installed, run the examples like this:\n\n```\ncd examples\nlooper run test_config.yaml -d\n```\n\n## Installing\n\n**Prerequisite python packages**. 
This pipeline uses [pypiper](https://github.com/epigen/pypiper) to run a single sample, [looper](https://github.com/epigen/looper) to handle multi-sample projects (for either local or cluster computation), and [pararead](https://github.com/databio/pararead) for parallel processing of sequence reads. You can do a user-specific install of these like this:\n\n```\npip install --user https://github.com/databio/pypiper/zipball/master\npip install --user https://github.com/pepkit/looper/zipball/master\npip install --user https://github.com/databio/pararead/zipball/master\n```\n\n**Required executables**. You will need some common bioinformatics tools installed. The list is specified in the pipeline configuration files (`.yaml` files in [src/](src/)).\n\n**Genome resources**. This pipeline requires genome assemblies produced by [refgenie](https://github.com/databio/refgenie). You may [download pre-indexed references](http://cloud.databio.org/refgenomes) or you may index your own (see [refgenie instructions](https://github.com/databio/refgenie#indexing-your-own-reference-genome)).\n\n**Clone the pipeline**. Clone this repository using one of these methods:\n- using SSH: `git clone git@github.com:databio/rnapipe.git`\n- using HTTPS: `git clone https://github.com/databio/rnapipe.git`\n\n## Configuring\n\nThere are two configuration options: You can either set up environment variables to fit the default configuration, or change the configuration file to fit your environment. Choose one:\n\n**Option 1: Default configuration** (recommended; `.yaml` files in [src/](src/)). \n  - Make sure the executable tools (java, samtools, bowtie2, etc.) are in your PATH.\n  - Set up environment variables to point to `jar` files for the java tools (`picard` and `trimmomatic`).\n  ```\n  export PICARD=\"/path/to/picard.jar\"\n  export TRIMMOMATIC=\"/path/to/trimmomatic.jar\"\n  ```\n  \n  - Define environment variable `GENOMES` for refgenie genomes. 
\n  ```\n  export GENOMES=\"/path/to/genomes/folder/\"\n  ```\n  \n\n**Option 2: Custom configuration**. Instead, you can also put absolute paths to each tool or resource in the configuration file to fit your local setup. Just change the pipeline configuration file (`.yaml` files in [src/](src/)) appropriately. \n\n\n## Running the pipeline\n\nYou never need to interface with the pipeline directly, but you can if you want. Just run `python src/SCRIPTNAME.py -h` to see usage. But the best way to use this pipeline is to run it using looper. You will need to tell looper about your project. Example project data are in the [examples/](examples/) folder. Run the pipeline across all samples in the test project with this command:\n```\nlooper run examples/test_config.yaml\n```\n\nIf the looper executable is not in your `$PATH`, add the following line to your `.bashrc` or `.profile`:\n\n```\nexport PATH=$PATH:~/.local/bin\n```\n\nNow, adapt the example project to your project. Here's a quick start: You need to build two files for your project (follow examples in the [examples/](examples/) folder):\n\n- [project config file](examples/test_project/test_config.yaml) -- describes output locations, pointers to data, etc.\n- [sample annotation file](examples/test_project/test_annotation.csv) -- comma-separated value (CSV) list of your samples.\n\nYour annotation file must specify these columns:\n- sample_name\n- library\n- organism\n- read1\n- read2\n- whatever else you want\n\nRun your project as above, by passing your project config file to `looper run`. More detailed instructions and advanced options for how to define your project are in the [Looper documentation on defining a project](http://looper.readthedocs.io/en/latest/define-your-project.html). 
Of particular interest may be the section on [using looper derived columns](http://looper.readthedocs.io/en/latest/advanced.html#pointing-to-flexible-data-with-derived-columns).\n\n## Using a cluster\n\nOnce you've specified your project to work with this pipeline, you will also inherit all the power of looper for your project.  You can submit these jobs to a cluster with a simple change to your configuration file. Follow instructions in [configuring looper to use a cluster](http://looper.readthedocs.io/en/latest/cluster-computing.html).\n\nLooper can also summarize your results, monitor your runs, clean intermediate files to save disk space, and more. You can find additional details on what you can do with this in the [looper docs](http://looper.readthedocs.io/). \n\n## Contributing\n\nPull requests welcome. Active development should occur in a development or feature branch.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatabio%2Frnapipe","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatabio%2Frnapipe","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatabio%2Frnapipe/lists"}