https://github.com/yuanx749/py-cdhit
A Python package for CD-HIT, clustering protein or nucleotide sequences.
- Host: GitHub
- URL: https://github.com/yuanx749/py-cdhit
- Owner: yuanx749
- License: gpl-2.0
- Created: 2023-05-25T10:49:37.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-11-22T03:31:05.000Z (3 months ago)
- Last Synced: 2025-02-03T03:52:44.535Z (8 days ago)
- Topics: bioinformatics, package, sequence-analysis, tool
- Language: Python
- Homepage: https://yuanx749.github.io/py-cdhit/
- Size: 56.6 KB
- Stars: 122
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# py-cdhit
[![PyPI version](https://badge.fury.io/py/py-cdhit.svg)](https://badge.fury.io/py/py-cdhit)
[![Downloads](https://static.pepy.tech/badge/py-cdhit/month)](https://pepy.tech/project/py-cdhit)
[![Codacy Badge](https://app.codacy.com/project/badge/Grade/197a0be6dcd14961b919e666a0de39eb)](https://app.codacy.com/gh/yuanx749/py-cdhit/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade)

A Python package for CD-HIT, clustering protein or nucleotide sequences.
This package provides a Python interface for CD-HIT (Cluster Database at High Identity with Tolerance), a suite of programs for fast clustering of biological sequences. Specifically, it contains functions that run the CD-HIT commands and read their output files, which reduces the overhead of switching between languages and writing parsing code when Python is used in a data analysis workflow.
Read the documentation [here](https://yuanx749.github.io/py-cdhit/).
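To give a sense of the parsing code this saves, here is a minimal sketch that reads CD-HIT's `.clstr` cluster format by hand. It is not the package's `read_clstr`; the function name `parse_clstr`, the field names, and the sample identifiers are all made up for illustration.

```python
import re

# A member line looks like "0\t40aa, >seqA... *" (cluster representative)
# or "1\t38aa, >seqB... at 92.11%" (member with identity to the representative).
MEMBER = re.compile(r"^\d+\s+(\d+)(?:aa|nt), >(.+?)\.\.\. (\*|at .+)$")


def parse_clstr(text):
    """Parse .clstr text into a list of dicts, one per sequence.

    Hypothetical helper for illustration; not part of py-cdhit.
    """
    records = []
    cluster = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">Cluster"):
            # Header line such as ">Cluster 0" starts a new cluster.
            cluster = int(line.split()[1])
            continue
        match = MEMBER.match(line)
        if match:
            length, identifier, tail = match.groups()
            records.append(
                {
                    "cluster": cluster,
                    "identifier": identifier,
                    "length": int(length),
                    "is_representative": tail == "*",
                }
            )
    return records


# Made-up sample in the .clstr layout, not real CD-HIT output.
sample = (
    ">Cluster 0\n"
    "0\t40aa, >seqA... *\n"
    "1\t38aa, >seqB... at 92.11%\n"
    ">Cluster 1\n"
    "0\t25aa, >seqC... *\n"
)
for record in parse_clstr(sample):
    print(record)
```

With the package installed, `read_clstr` replaces this kind of hand-written parser.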
## Usage
A simple example on Linux is provided below. See the [notebook](docs/examples/examples.ipynb) for more details.
```Python
from pycdhit import cd_hit, read_clstr

# Run CD-HIT on the example FASTA file; keyword names match CD-HIT's options.
res = cd_hit(
    i="./docs/examples/apd.fasta",
    o="./docs/examples/out",
    c=0.7,
    d=0,
    sc=1,
)

# Read the cluster file written by CD-HIT.
df_clstr = read_clstr("./docs/examples/out.clstr")
```

Please visit CD-HIT's [documentation](https://github.com/weizhongli/cdhit/wiki) for installation instructions and command options.
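The keyword arguments in the example mirror CD-HIT's command-line options (`i` for `-i`, `c` for `-c`, and so on). As a rough illustration of that correspondence, the sketch below builds the equivalent command line with a hypothetical helper, `build_command`. The helper is not part of py-cdhit and makes no claim about how the package assembles its commands internally.

```python
import shlex


def build_command(program, **options):
    """Turn keyword options into a CD-HIT style argument list.

    Hypothetical helper for illustration only; not part of py-cdhit.
    """
    args = [program]
    for name, value in options.items():
        # Each keyword becomes a single-dash flag followed by its value.
        args += [f"-{name}", str(value)]
    return args


cmd = build_command(
    "cd-hit",
    i="./docs/examples/apd.fasta",
    o="./docs/examples/out",
    c=0.7,
    d=0,
    sc=1,
)
print(shlex.join(cmd))
# cd-hit -i ./docs/examples/apd.fasta -o ./docs/examples/out -c 0.7 -d 0 -sc 1
```

In CD-HIT, `-c` is the sequence identity threshold, `-d 0` keeps the FASTA defline up to the first space as the description in the `.clstr` file, and `-sc 1` sorts clusters by decreasing size.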
## Installation
First, install CD-HIT. [Mamba](https://mamba.readthedocs.io/) is recommended. For example, to create an environment and install CD-HIT:
```bash
mamba create -n myenv python=3.10
mamba activate myenv
```

```bash
mamba install -c bioconda cd-hit cd-hit-auxtools
```

Then install this package from PyPI:
```bash
pip install py-cdhit
```

## Development
After cloning the repository, install from source along with the development and documentation dependencies, then run the tests:
```bash
cd py-cdhit
pip install -e '.[dev]'
pip install -r docs/requirements.txt
python -m pytest --cov-report term-missing --cov=pycdhit tests/
```