https://github.com/clinical-genomics/downsampling
Downsample fastq files in an automated way
- Host: GitHub
- URL: https://github.com/clinical-genomics/downsampling
- Owner: Clinical-Genomics
- Created: 2014-10-22T19:02:40.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2020-05-07T19:13:09.000Z (about 5 years ago)
- Last Synced: 2023-03-12T06:28:35.229Z (over 2 years ago)
- Language: Shell
- Size: 15.6 KB
- Stars: 9
- Watchers: 9
- Forks: 4
- Open Issues: 3
Metadata Files:
- Readme: README.md
- Codeowners: .github/CODEOWNERS
# DOWNSAMPLING
Script to downsample a bunch of fastq files.
Reads will be selected randomly.

## Usage
```bash
./downsample.sh [-2] indir outdir reads [total reads]

with:
  -2:          To reduce memory footprint, do a double pass. Takes twice as long.
  indir:       The input directory. The script expects forward and reverse
               strand files that match these patterns:
                 - forward match pattern: *_1.fastq.gz
                 - reverse match pattern: *_2.fastq.gz
  outdir:      The output directory. Will be created if it does not exist.
               One output file per strand will be created in this directory.
               The output file name will be the first file name in the input
               directory matched with the above-mentioned patterns.
  reads:       The number of read pairs to keep.
  total reads: To reduce memory footprint, the number of read pairs kept
               will be an estimate. Does NOT work with the -2 option.
               Only requires two cores.
```

## Dependencies
This script uses [seqtk](https://github.com/lh3/seqtk) to quickly downsample
fastq files.
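
For orientation, here is a minimal sketch of how paired fastq files can be downsampled with `seqtk sample`. It is not taken from `downsample.sh`: the helper names `sample_strand` and `sample_fraction` are hypothetical, and it is only an assumption that the script's `-2` option maps to seqtk's own `-2` (two-pass) flag or that `total reads` is used to derive a sampling fraction. The one firm requirement is that both strands are sampled with the same seed, so that the same read pairs are kept.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of downsample.sh.

# Sample one strand file with seqtk and write fastq to stdout.
#   $1 seed     same value for forward and reverse files keeps pairs in sync
#   $2 infile   fastq file (seqtk reads gzipped input directly)
#   $3 amount   number of reads, or a fraction between 0 and 1
#   $4 "-2"     optional: use seqtk's two-pass mode (less memory, slower)
sample_strand() {
    local seed="$1" infile="$2" amount="$3" two_pass="${4:-}"
    if [ "$two_pass" = "-2" ]; then
        seqtk sample -2 -s"$seed" "$infile" "$amount"
    else
        seqtk sample -s"$seed" "$infile" "$amount"
    fi
}

# Turn "reads to keep" and "total reads" into a sampling fraction.
# Assumption: this is one way an approximate read count could be obtained.
sample_fraction() {
    awk -v keep="$1" -v total="$2" 'BEGIN { printf "%.6f\n", keep / total }'
}

# Example usage (file names and counts are made up):
#   sample_strand 100 in/sample_1.fastq.gz 1000000 | gzip > out/sample_1.fastq.gz
#   sample_strand 100 in/sample_2.fastq.gz 1000000 | gzip > out/sample_2.fastq.gz
```

Passing a fraction instead of a read count, for example the output of `sample_fraction 1000000 40000000`, lets seqtk decide read by read whether to keep a record, which is why the resulting count is only approximate.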