https://github.com/blahah/graphsample
Subsample FASTQ by sampling connected components of a de-Bruijn graph
- Host: GitHub
- URL: https://github.com/blahah/graphsample
- Owner: blahah
- Created: 2015-05-26T15:49:00.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2015-06-20T00:27:50.000Z (over 9 years ago)
- Last Synced: 2024-04-16T01:01:02.887Z (7 months ago)
- Language: C++
- Size: 383 KB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
# graphsample
Subsample FASTQ by sampling connected components of a de-Bruijn graph
Taking a random subsample of reads from a FASTQ file can leave some regions poorly covered, so the subsample's informational properties are not representative of the full read set.
`graphsample` addresses this problem by building a [de-Bruijn graph](http://en.wikipedia.org/wiki/De_Bruijn_graph) from the reads, identifying all the [connected components](http://en.wikipedia.org/wiki/Connected_component_%28graph_theory%29), and randomly sampling those components. It outputs the reads that belong to the sampled components.
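The idea can be illustrated with a minimal Python sketch. This is not `graphsample`'s actual implementation (which is written in C++), and several things here are assumptions made for illustration: reads are plain sequence strings rather than parsed FASTQ records, graph nodes are k-mers joined only when they are adjacent within a read, reverse complements are ignored, and the names `sample_reads`, `fraction` and `seed` are hypothetical.

```python
import random
from collections import defaultdict


def kmers(seq, k):
    """Yield every k-mer (substring of length k) in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


class UnionFind:
    """Disjoint-set structure used to track connected components."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb


def sample_reads(reads, k, fraction, seed=0):
    """Return the reads belonging to a random sample of components.

    Assumption: k-mers are nodes, and two k-mers are connected when
    they appear next to each other in some read.
    """
    uf = UnionFind()
    for read in reads:
        ks = list(kmers(read, k))
        for a, b in zip(ks, ks[1:]):
            uf.union(a, b)

    # All k-mers of one read share a component, so the first k-mer
    # is enough to look up which component a read belongs to.
    components = defaultdict(list)
    for read in reads:
        if len(read) >= k:
            components[uf.find(read[:k])].append(read)

    roots = sorted(components)
    if not roots:
        return []
    n = min(len(roots), max(1, round(fraction * len(roots))))
    chosen = random.Random(seed).sample(roots, n)
    return [read for root in chosen for read in components[root]]


if __name__ == "__main__":
    reads = ["ACGTAC", "GTACGG", "TTTTTT", "CCCCCA"]
    # The first two reads share the k-mer GTAC, so they are always
    # kept or dropped together.
    print(sample_reads(reads, k=4, fraction=0.3, seed=1))
```

Because whole components are kept or dropped together, every sampled region retains all of the reads that cover it, which is the property a plain per-read random subsample lacks.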
What's the point? Glad you asked! `graphsample` allows you to take a small subsample from a large set of reads, and use the subsample to optimise the parameters of any tools and algorithms you want to run on the full set.
## Compiling
```bash
$ git clone --recursive https://github.com/Blahah/graphsample.git
$ cd graphsample
$ make
```
## Running
```bash
$ bin/graphsample --help
```