https://github.com/yuanx749/py-cdhit
A Python package for CD-HIT, clustering protein or nucleotide sequences.
- Host: GitHub
- URL: https://github.com/yuanx749/py-cdhit
- Owner: yuanx749
- License: gpl-2.0
- Created: 2023-05-25T10:49:37.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2024-11-22T03:31:05.000Z (over 1 year ago)
- Last Synced: 2025-03-28T14:09:34.876Z (about 1 year ago)
- Topics: bioinformatics, package, sequence-analysis, tool
- Language: Python
- Homepage: https://yuanx749.github.io/py-cdhit/
- Size: 56.6 KB
- Stars: 121
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# py-cdhit
A Python package for CD-HIT, clustering protein or nucleotide sequences.
This package provides a Python interface for CD-HIT (Cluster Database at High Identity with Tolerance), a suite of fast programs for clustering biological sequences. Specifically, it contains functions that run the CD-HIT commands and read their output files, so a Python data analysis workflow no longer needs to switch between languages or include hand-written parsing code.
Read the documentation [here](https://yuanx749.github.io/py-cdhit/).
## Usage
A simple example on Linux is provided below. See the [notebook](docs/examples/examples.ipynb) for more details.
```Python
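from pycdhit import cd_hit, read_clstr

# Keyword arguments correspond to CD-HIT command-line options.
res = cd_hit(
    i="./docs/examples/apd.fasta",  # input FASTA file
    o="./docs/examples/out",  # output path; clusters are written to out.clstr
    c=0.7,  # sequence identity threshold
    d=0,  # use the sequence name up to the first whitespace in .clstr
    sc=1,  # sort clusters by size
)

# Load the cluster file produced by CD-HIT.
df_clstr = read_clstr("./docs/examples/out.clstr")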
```
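To give a sense of the parsing code that `read_clstr` saves you from writing, here is a minimal sketch of a `.clstr` parser that uses only the standard library. It is not this package's implementation: `parse_clstr_text` and the record keys are hypothetical names chosen for illustration, and the sample text assumes the usual CD-HIT layout of `>Cluster N` headers followed by one line per member, with `*` marking the representative sequence.

```python
import re
from typing import Dict, List

# Hypothetical helper for illustration only; not part of pycdhit.
# Assumed member line layout: "<index>\t<length>aa, ><name>... <tail>"
_MEMBER = re.compile(r"^(\d+)\s+(\d+)(aa|nt),\s+>(.+?)\.\.\.\s+(.*)$")


def parse_clstr_text(text: str) -> List[Dict[str, object]]:
    """Parse the text of a CD-HIT .clstr file into one record per sequence."""
    records: List[Dict[str, object]] = []
    cluster = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            # Header line such as ">Cluster 0" starts a new cluster.
            cluster = int(line.split()[1])
            continue
        match = _MEMBER.match(line)
        if match is None:
            continue  # skip lines this sketch does not understand
        _, length, _, identifier, tail = match.groups()
        records.append(
            {
                "cluster": cluster,
                "identifier": identifier,
                "length": int(length),
                # A trailing "*" marks the cluster representative.
                "is_representative": tail.strip() == "*",
            }
        )
    return records


# Made-up sample data in the assumed format.
sample = """>Cluster 0
0\t40aa, >seqA... *
1\t38aa, >seqB... at 92.11%
>Cluster 1
0\t25aa, >seqC... *
"""
records = parse_clstr_text(sample)
for record in records:
    print(record)
```

The sketch ignores the identity value that follows `at` on non-representative lines. In practice, prefer `read_clstr`.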
Please see CD-HIT's [documentation](https://github.com/weizhongli/cdhit/wiki) for installation instructions and the options of each command.
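The keyword arguments in the usage example appear to mirror CD-HIT's command-line flags (for instance, `c=0.7` would correspond to `-c 0.7`). The sketch below shows one way such a mapping could be built. It is an assumption about the general approach, not the package's actual code; `build_command` is a hypothetical helper, and the file names are placeholders.

```python
from typing import List


def build_command(program: str = "cd-hit", **options: object) -> List[str]:
    """Turn keyword arguments into a CD-HIT style argument list.

    Hypothetical helper for illustration; not part of pycdhit.
    """
    command = [program]
    for name, value in options.items():
        # Each keyword becomes a "-name value" pair, in the order given.
        command.extend([f"-{name}", str(value)])
    return command


command = build_command(i="in.fasta", o="out", c=0.7, d=0, sc=1)
print(" ".join(command))
# With CD-HIT installed, the list could be passed to
# subprocess.run(command, check=True) to execute the program.
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.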
## Installation
First, install CD-HIT. [Mamba](https://mamba.readthedocs.io/) is recommended. For example, to create an environment and install CD-HIT into it:
```bash
mamba create -n myenv python=3.10
mamba activate myenv
```
```bash
mamba install -c bioconda cd-hit cd-hit-auxtools
```
Then install this package from PyPI:
```bash
pip install py-cdhit
```
## Development
Clone the repository, then install from source and run the tests:
```bash
git clone https://github.com/yuanx749/py-cdhit.git
cd py-cdhit
pip install -e '.[dev]'
pip install -r docs/requirements.txt
python -m pytest --cov-report term-missing --cov=pycdhit tests/
```