https://github.com/databio/paqc

PEPATAC qc project

Host: GitHub
URL: https://github.com/databio/paqc
Owner: databio
Created: 2020-03-04T21:20:41.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2021-07-09T02:23:26.000Z (almost 5 years ago)
Last Synced: 2025-09-11T10:15:51.438Z (10 months ago)
Size: 18.6 KB
Stars: 0
Watchers: 6
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# PEPATAC QC PEP

The `paqc.yaml` file is the working PEP for these samples.
The `paqc_annotation.csv` file is the working annotation file for these samples.

## Download data

Download the source data with [`geofetch`](https://github.com/pepkit/geofetch), using the accessions listed in `accessions.txt`:
```
geofetch -i accessions.txt -n paqc -m paqc_geo_metadata
```
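
Before downloading, it can help to confirm that the command-line tools used in this README are on your `PATH`. The following is a minimal sketch, not part of the project itself; the helper name `check_tools` is hypothetical.

```shell
# Report any of the given commands that are missing from PATH.
# Returns 0 if all are found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Tools named in this README.
check_tools geofetch eido looper || echo "install the missing tools before continuing" >&2
```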

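You can set the default sratoolkit download location by pointing its repository root at your data directory:

```
export DATA="..."
mkdir -p "${HOME}/.ncbi"  # the config directory may not exist yet
# Note: this overwrites any existing sratoolkit user settings
echo "/repository/user/main/public/root = \"$DATA\"" > "${HOME}/.ncbi/user-settings.mkfg"
```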

## Format your PEP

You'll need to manually edit the geofetch output so that it conforms to the PEP 2.0.0 specification. Then validate the configuration file with [`eido`](https://github.com/pepkit/eido):
```
eido validate paqc.yaml -s http://schema.databio.org/pipelines/pepatac.yaml
```
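
For orientation, the general shape of a PEP 2.0.0 config with an `sra_convert` amendment might look like the sketch below. It is written to a separate file, `paqc_example.yaml`, so it does not touch the real config. Every path in it is a placeholder and an assumption; the actual `paqc.yaml` in this repository may be organized differently.

```shell
# Write an illustrative PEP 2.0.0 config. All paths below are placeholders
# (assumptions), not the contents of the real paqc.yaml.
cat > paqc_example.yaml <<'EOF'
pep_version: 2.0.0
sample_table: paqc_annotation.csv   # annotation file named in this README

sample_modifiers:
  append:
    # hypothetical location of the PEPATAC pipeline interface
    pipeline_interfaces: /path/to/pepatac/sample_pipeline_interface.yaml

project_modifiers:
  amend:
    sra_convert:                    # activated with `-a sra_convert`
      sample_modifiers:
        append:
          # hypothetical location of an SRA-to-FASTQ pipeline interface
          pipeline_interfaces: /path/to/sra_convert/pipeline_interface.yaml
EOF
```

Keeping the conversion pipeline in an amendment means the same sample table serves both steps: the default configuration runs PEPATAC, and `-a sra_convert` swaps in the conversion pipeline.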

## Convert the SRA files to FASTQ

Activate the `sra_convert` amendment so that the PEP points at the conversion pipeline, then run it with looper:
```
looper run paqc.yaml -a sra_convert --lump 25
```

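## Run PEPATAC

Set the `PROCESSED` and `DATA` environment variables and run the pipeline with looper. The `-d` flag makes this a dry run, so no jobs are submitted; drop it to submit for real: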
```
PROCESSED=/project/shefflab/processed DATA=/project/shefflab/data/ looper run paqc.yaml -d
```
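
The command above passes `PROCESSED` and `DATA` through the environment, presumably because `paqc.yaml` refers to them. A small guard like the following can catch an unset or mistyped path before any jobs are created. This is a sketch under that assumption; the helper name `require_dir_var` is hypothetical.

```shell
# Succeed only if the named environment variable is set and points at a directory.
require_dir_var() {
    name=$1
    eval "value=\${$name:-}"   # look up the variable whose name is in $name
    if [ -z "$value" ]; then
        echo "$name is not set" >&2
        return 1
    fi
    if [ ! -d "$value" ]; then
        echo "$name=$value is not a directory" >&2
        return 1
    fi
}

# Usage before submitting (not run here):
#   require_dir_var DATA && require_dir_var PROCESSED && looper run paqc.yaml -d
```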

The `peppro_paper.yaml` file is the working PEP for these samples.
The `peppro_paper.csv` file is the working annotation file for these samples.
