https://github.com/unkindpartition/bamcc
bamcc — split SAM/BAM files into connected components
alignment bioinformatics biology dna rna rna-seq
- Host: GitHub
- URL: https://github.com/unkindpartition/bamcc
- Owner: UnkindPartition
- License: mit
- Created: 2017-04-01T16:32:56.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2017-04-01T19:01:54.000Z (about 8 years ago)
- Last Synced: 2025-02-03T12:52:48.655Z (4 months ago)
- Topics: alignment, bioinformatics, biology, dna, rna, rna-seq
- Language: C++
- Size: 17.6 KB
- Stars: 3
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# bamcc
## Description
A set of sequence alignments (represented as a BAM/SAM/CRAM file) defines
a bipartite graph in which the nodes are reference and template sequences
and the edges are alignments between a template and a reference.

bamcc splits this graph into connected components and reports the mapping from
reference sequences to their components.

This is useful e.g. when analyzing multireads in an RNA-Seq experiment,
where connected components can be analyzed independently.

## Usage
```
bamcc input.bam output.tsv
```

The `output.tsv` file will look like this:
```
seqid seqname component
0 FBtr0005088 0
1 FBtr0006151 1
2 FBtr0070000 2
3 FBtr0070002 0
4 FBtr0070003 2
5 FBtr0070006 3
```

where:
1. `seqid` is the 0-based number of the reference sequence.
   Sequence numbers are defined by the input file and are stable.
2. `seqname` is the reference sequence name.
3. `component` is the 0-based component number to which the reference has been
   assigned.

To extract, say, component number 17 into a separate BAM file (taking
`rsem_orig.tsv` to be the `bamcc` output for `example.sorted.bam`), run
```
samtools view -bh -o example.17.bam example.sorted.bam \
$(awk 'BEGIN{ORS=" "} NR>1 && $3==17 {print $2}' rsem_orig.tsv)
```

## Building
### Dependencies
* A C++14 compiler
* [htslib](http://www.htslib.org/)
* [Boost](http://www.boost.org/)

### Compilation
```
CXXFLAGS=-O2 make
```

This will create an executable `bamcc` in the current directory.
## Testing
First, run `make`.
Then, run `./test`.
This will update all files `test_files/example*.tsv`.
Failures may appear either as messages from `bamcc` or differences in output
files reported by `git diff`.

## Security
Do not run this program on untrusted or potentially malformed input files.
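
## Illustration

The component computation outlined in the Description can be sketched with a disjoint-set (union-find) structure. This is a minimal sketch and not necessarily how `bamcc` is implemented: the `DisjointSet` type, the `Alignment` pair, the `reference_components` function, and the choice to number components in order of first appearance by `seqid` are all assumptions made for illustration. Reading alignments from a file with htslib is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Disjoint-set (union-find) over reference sequence ids 0..n-1.
struct DisjointSet {
    std::vector<std::size_t> parent;

    explicit DisjointSet(std::size_t n) : parent(n) {
        std::iota(parent.begin(), parent.end(), std::size_t{0});
    }

    std::size_t find(std::size_t x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];  // path halving
            x = parent[x];
        }
        return x;
    }

    void unite(std::size_t a, std::size_t b) { parent[find(a)] = find(b); }
};

// Hypothetical input record: (template id, reference id) for one alignment.
using Alignment = std::pair<std::size_t, std::size_t>;

// Returns, for each reference id, a 0-based component number.
// Two references end up in the same component when some chain of templates
// aligns to both of them. Components are numbered in order of first
// appearance by reference id (an assumption of this sketch).
std::vector<std::size_t> reference_components(
    std::size_t n_refs, std::size_t n_templates,
    const std::vector<Alignment>& alignments) {
    DisjointSet ds(n_refs);
    const std::size_t none = n_refs;  // sentinel: "nothing recorded yet"

    // For each template, remember the first reference it aligned to and
    // merge every later reference of that template with it.
    std::vector<std::size_t> first_ref(n_templates, none);
    for (const auto& a : alignments) {
        assert(a.first < n_templates && a.second < n_refs);
        std::size_t& seen = first_ref[a.first];
        if (seen == none) {
            seen = a.second;
        } else {
            ds.unite(seen, a.second);
        }
    }

    // Relabel set representatives as consecutive component numbers.
    std::vector<std::size_t> label(n_refs, none);
    std::vector<std::size_t> component(n_refs);
    std::size_t next = 0;
    for (std::size_t r = 0; r < n_refs; ++r) {
        const std::size_t root = ds.find(r);
        if (label[root] == none) label[root] = next++;
        component[r] = label[root];
    }
    return component;
}
```

For example, with six references and four templates aligned to references {0, 3}, {2, 4}, {1}, and {5} respectively, `reference_components` returns the components 0, 1, 2, 0, 2, 3. Template nodes never need to be stored in the disjoint set, because each template only serves to link the references it aligns to.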