https://github.com/clinical-genomics/downsampling

Downsample fastq files in an automated way

Host: GitHub
URL: https://github.com/clinical-genomics/downsampling
Owner: Clinical-Genomics
Created: 2014-10-22T19:02:40.000Z (over 10 years ago)
Default Branch: master
Last Pushed: 2020-05-07T19:13:09.000Z (about 5 years ago)
Last Synced: 2023-03-12T06:28:35.229Z (over 2 years ago)
Language: Shell
Size: 15.6 KB
Stars: 9
Watchers: 9
Forks: 4
Open Issues: 3
Metadata Files:
- Readme: README.md
- Codeowners: .github/CODEOWNERS

README

# DOWNSAMPLING

A script to downsample a set of FASTQ files.
Reads are selected at random.

## Usage

```bash

./downsample.sh [-2] indir outdir reads [total reads]

with:
  -2:          To reduce memory footprint, do a double pass. Takes twice as long.
  indir:       The input directory. The script expects forward and reverse
               strand files that match the following patterns:
               - forward match pattern: *_1.fastq.gz
               - reverse match pattern: *_2.fastq.gz
  outdir:      The output directory. Will be created if it does not exist.
               One output file per strand will be created in this directory.
               The output file name will be the first file name in the input
               directory that matches the patterns above.
  reads:       The number of read pairs to keep.
  total reads: Optional. To reduce memory footprint, keeps an estimated
               rather than exact number of read pairs.
               Does NOT work with the -2 option.
               Only requires two cores.
```
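
The file-matching convention described above can be sketched as follows. This is a minimal illustration, not the script's actual code: the function name `list_read_pairs` is hypothetical, and it assumes that each forward file `X_1.fastq.gz` has a mate named `X_2.fastq.gz` in the same directory.

```bash
# Hypothetical helper (not part of downsample.sh): print every
# forward/reverse file pair found in a directory, one tab-separated
# pair per line. Forward files without a mate are reported on stderr.
list_read_pairs() {
    local indir=$1 forward reverse
    for forward in "$indir"/*_1.fastq.gz; do
        # If nothing matches, the glob stays literal; skip that case.
        [ -e "$forward" ] || continue
        # Derive the mate's name by swapping the _1 suffix for _2.
        reverse="${forward%_1.fastq.gz}_2.fastq.gz"
        if [ -e "$reverse" ]; then
            printf '%s\t%s\n' "$forward" "$reverse"
        else
            printf 'warning: no mate for %s\n' "$forward" >&2
        fi
    done
}
```

Calling `list_read_pairs indir` before a run is one way to check that the input directory follows the expected naming.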

## Dependencies

This script uses [seqtk](https://github.com/lh3/seqtk) to quickly downsample
fastq files.
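
To illustrate how seqtk can keep read pairs in sync, the sketch below subsamples both files of a pair with the same seed, as the seqtk README recommends for paired-end data. It is not taken from `downsample.sh`: the function names, the argument order, and the default seed are assumptions made for illustration. `seqtk sample` accepts either a read count or a fraction, so the `sampling_fraction` helper shows one way a total read count could be turned into a fraction; whether the script uses its `total reads` argument this way is not confirmed here.

```bash
# Hypothetical helper: fraction of reads to keep, given the number
# of read pairs wanted and the total number of read pairs available.
sampling_fraction() {
    awk -v keep="$1" -v total="$2" 'BEGIN { printf "%.6f\n", keep / total }'
}

# Hypothetical helper: subsample one read pair with seqtk.
# Passing the same seed (-s) for both files keeps mates paired.
# "amount" may be a read count or a fraction; the default seed is arbitrary.
downsample_pair() {
    local forward=$1 reverse=$2 outdir=$3 amount=$4 seed=${5:-11}
    mkdir -p "$outdir"
    seqtk sample -s"$seed" "$forward" "$amount" | gzip > "$outdir/$(basename "$forward")"
    seqtk sample -s"$seed" "$reverse" "$amount" | gzip > "$outdir/$(basename "$reverse")"
}
```

For example, `downsample_pair sample_1.fastq.gz sample_2.fastq.gz out "$(sampling_fraction 1000000 4000000)"` would keep roughly a quarter of the read pairs (the file names are placeholders). seqtk's `-2` flag, which selects its two-pass mode, could be added to both `seqtk sample` calls when an exact count is requested and memory is tight.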
