https://github.com/dombennett/project-cluster
🌺 Short pipeline for counting the number of clusters across .sam files
cdhit clustering pipeline polyploid
- Host: GitHub
- URL: https://github.com/dombennett/project-cluster
- Owner: DomBennett
- Created: 2018-02-28T12:08:37.000Z
- Default Branch: master
- Last Pushed: 2018-05-07T12:38:31.000Z
- Last Synced: 2025-02-08T06:12:58.622Z
- Topics: cdhit, clustering, pipeline, polyploid
- Language: Python
- Homepage:
- Size: 19.5 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Project-cluster
Identify and count clusters across a series of .sam files.
## Usage
`python run.py --help`
## Install
`git clone https://github.com/DomBennett/Project-cluster.git`
Or download the zipped folder:
`wget https://github.com/DomBennett/Project-cluster/archive/master.zip`
## Requirements
* One .sam file stored per folder
* cdhit
* Python (v2 or v3)
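The "one .sam file per folder" layout could be checked up front with a small helper like the one below. This is a hypothetical sketch, not code from `run.py`: it assumes every immediate subfolder of an input directory is one sample, and the name `find_sam_files` is illustrative only.

```python
import glob
import os


def find_sam_files(root):
    """Return {folder_name: sam_path} for each subfolder of `root`.

    Hypothetical helper: assumes each immediate subfolder holds exactly
    one .sam file and raises ValueError if a folder has none or several.
    """
    found = {}
    for entry in sorted(os.listdir(root)):
        folder = os.path.join(root, entry)
        if not os.path.isdir(folder):
            continue  # ignore stray files next to the sample folders
        sams = glob.glob(os.path.join(folder, "*.sam"))
        if len(sams) != 1:
            raise ValueError(
                "expected one .sam file in {}, found {}".format(folder, len(sams)))
        found[entry] = sams[0]
    return found
```

Failing early on a badly laid-out folder is cheaper than discovering the problem after cdhit has already run on the other samples.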
## Steps
* Convert each .sam file to .fasta by extracting the orthologous sequence identified within the .sam file.
* Run cdhit on each .fasta file to cluster the sequences.
* Count the clusters containing more than `min_nsqs` sequences.
* Report the number of clusters per .sam file in a .csv file.
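The steps above could be sketched in Python roughly as follows. This is not the code in `run.py`, and all function names are hypothetical. It assumes that "extracting the orthologous sequence" means writing the SEQ field (10th column) of every mapped read to FASTA, that cdhit is the standard `cd-hit` binary called with `-i`, `-o` and `-c`, and that cluster membership is read back from the `.clstr` file CD-HIT writes next to its output.

```python
import csv
import subprocess
from collections import Counter


def sam_to_fasta(sam_lines):
    """Yield one FASTA record per mapped read in an iterable of SAM lines.

    Assumption: the sequence to keep is the SEQ column (10th field) of
    each mapped read; '@' header lines and blank lines are skipped.
    """
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, seq = fields[0], int(fields[1]), fields[9]
        # FLAG bit 0x4 marks an unmapped read; '*' means SEQ was not stored
        if flag & 0x4 or seq == "*":
            continue
        yield ">{}\n{}\n".format(qname, seq)


def cdhit_command(fasta_path, out_prefix, identity=0.9):
    """Build a cd-hit argv list (0.9 is CD-HIT's default identity, assumed here)."""
    return ["cd-hit", "-i", fasta_path, "-o", out_prefix, "-c", str(identity)]


def run_cdhit(fasta_path, out_prefix):
    """Run cd-hit; it writes `out_prefix` plus `out_prefix + '.clstr'`."""
    subprocess.check_call(cdhit_command(fasta_path, out_prefix))
    return out_prefix + ".clstr"


def cluster_sizes(clstr_lines):
    """Map cluster number -> member count from CD-HIT .clstr lines."""
    sizes = Counter()
    current = None
    for line in clstr_lines:
        if line.startswith(">Cluster"):
            current = int(line.split()[1])
            sizes[current] = 0
        elif line.strip() and current is not None:
            sizes[current] += 1  # each non-header line is one member sequence
    return sizes


def count_clusters(clstr_lines, min_nsqs):
    """Count clusters holding more than `min_nsqs` sequences."""
    return sum(1 for n in cluster_sizes(clstr_lines).values() if n > min_nsqs)


def write_report(counts, csv_path):
    """Write one row per .sam file: its name and its cluster count."""
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sam_file", "n_clusters"])
        for name in sorted(counts):
            writer.writerow([name, counts[name]])
```

The parsing functions take iterables of lines rather than paths, so they can be fed an open file handle in a real run or a short list of strings when testing; only `run_cdhit` needs CD-HIT to be installed.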
## Authors
D.J. Bennett & J.S. Eriksson