https://github.com/databio/ppqc
QC samples for PEPPRO pipeline
https://github.com/databio/ppqc
Last synced: 3 months ago
JSON representation
QC samples for PEPPRO pipeline
- Host: GitHub
- URL: https://github.com/databio/ppqc
- Owner: databio
- Created: 2019-07-19T20:46:06.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2022-08-29T14:08:23.000Z (almost 4 years ago)
- Last Synced: 2025-01-15T15:08:44.047Z (over 1 year ago)
- Homepage:
- Size: 83 KB
- Stars: 1
- Watchers: 15
- Forks: 1
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Peppro QC PEP
## Get source samples using geofetch
```
geofetch -i sra_accessions.txt -n ppqc
```
## Grab the source sample metadata here:
```
wget -r http://big.databio.org/peppro/sra_meta/ppqc/ .
```
## Get RNA-seq spike-in samples here:
```
wget http://big.databio.org/peppro/fastq/K562_[1-9]0pct_RNArc_r2.fastq.gz .
```
## Validate the configuration file with [`eido`](https://github.com/pepkit/eido) like so:
```
eido -p peppro_paper.yaml -s http://schema.databio.org/pipelines/ProseqPEP.yaml
```
## Create user-specific environment variables
- `$CODE`: this is a path to the parent directory where scripts and code are stored
- For example:`CODE=/scratch/userid/src/`
- `$DATA`: this is a path to the parent directory where input files are stored
- For example:`DATA=/project/shefflab/data/`
- `$PROCESSED`: this is a path to the parent directory to where output should be written
- For example:`PROCESSED=/project/shefflab/processed/`
## Run in looper:
```
looper run ppqc/peppro_paper.yaml
```
The `peppro_paper.yaml` file is the working PEP for these samples.
The `peppro_paper.csv` file is the working annotation file for these samples.