https://github.com/dombennett/project-cluster

:hibiscus: Short pipeline for counting number of clusters across .sam files

cdhit clustering pipeline polyploid

# Project-cluster

Identify and count clusters across a series of .sam files.

## Usage

`python run.py --help`

## Install

`git clone https://github.com/DomBennett/Project-cluster.git`

Or download the zipped folder:

`wget https://github.com/DomBennett/Project-cluster/archive/master.zip`

## Requirements

* One .sam file stored per folder
* cdhit (CD-HIT)
* Python (v2 or v3)

## Steps

* Convert each .sam file to .fasta by extracting the orthologous sequence
identified within the .sam file.
* Run cdhit on the resulting .fasta file
* Count clusters containing more than `min_nsqs` sequences
* Report the number of clusters per .sam file in a .csv file
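
The steps above can be sketched roughly as follows. This is an illustration only, not the code in `run.py`: the function names (`sam_to_fasta`, `count_clusters`, `write_report`) and the CSV column names are hypothetical, the sketch assumes Python 3, and it assumes `min_nsqs` refers to the number of sequences in a cluster. The conversion simply writes every record's SEQ field, which may differ from how the pipeline selects the orthologous sequence. Running cdhit itself is left out; the counting function expects the text of a `.clstr` file that cdhit has already produced.

```python
import csv


def sam_to_fasta(sam_text):
    """Build FASTA text from SAM records using the QNAME and SEQ columns."""
    records = []
    for line in sam_text.splitlines():
        # Skip blank lines and SAM header lines, which start with '@'.
        if not line or line.startswith("@"):
            continue
        fields = line.split("\t")
        # A SAM alignment record has at least 11 mandatory fields.
        if len(fields) < 11:
            continue
        name, seq = fields[0], fields[9]
        # '*' means no sequence is stored for this record.
        if seq == "*":
            continue
        records.append(">{}\n{}".format(name, seq))
    return "\n".join(records) + ("\n" if records else "")


def count_clusters(clstr_text, min_nsqs):
    """Count clusters in .clstr text that hold more than min_nsqs sequences."""
    sizes = []
    for line in clstr_text.splitlines():
        if line.startswith(">Cluster"):
            # Each '>Cluster N' line opens a new cluster.
            sizes.append(0)
        elif line.strip() and sizes:
            # Every other non-empty line is one member of the current cluster.
            sizes[-1] += 1
    return sum(1 for size in sizes if size > min_nsqs)


def write_report(counts, handle):
    """Write one CSV row per .sam file (hypothetical column names)."""
    writer = csv.writer(handle)
    writer.writerow(["sam_file", "n_clusters"])
    for sam_file, n_clusters in sorted(counts.items()):
        writer.writerow([sam_file, n_clusters])
```

Keeping the parsing functions text-in, text-out makes them easy to check on small inputs before pointing them at real files.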

## Authors

D.J. Bennett & J.S. Eriksson