https://github.com/althonos/pyncbitk
Cython bindings and Python interface to the NCBI C++ ToolKit (including BLAST+).
- Host: GitHub
- URL: https://github.com/althonos/pyncbitk
- Owner: althonos
- License: mit
- Created: 2024-07-30T18:17:59.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-02-17T19:42:21.000Z (12 months ago)
- Last Synced: 2025-08-21T06:38:23.742Z (6 months ago)
- Topics: bioinformatics, blast, cython-library, python-bindings, python-library, sequence-analysis
- Language: Cython
- Size: 1.52 MB
- Stars: 7
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: COPYING
# 🧬🧰 PyNCBItk
*(Unofficial) [Cython](https://cython.org/) bindings and Python interface to the [NCBI C++ Toolkit](https://www.ncbi.nlm.nih.gov/toolkit).*
***⚠️ This package is a work in progress and in a very experimental state. Expect segmentation faults, compilation issues, missing features, and incomplete documentation.***
## 🗺️ Overview
The [NCBI C++ Toolkit](https://ncbi.github.io/cxx-toolkit/) is a framework of
C++ libraries for working with biological sequence data, developed at the
[National Center for Biotechnology Information](https://www.ncbi.nlm.nih.gov/).
It features a flexible object model for representing sequences of various
origins, including composite or virtual sequences; a resource manager
to easily manipulate heterogeneous data sources; and a comprehensive API to the
various BLAST algorithms[\[1\]](#ref1) developed at the NCBI.
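Composite and virtual sequences are assembled from pieces instead of storing every residue; in the NCBI data model, for instance, a "delta" sequence mixes literal stretches with references to intervals of other records. The toy model below illustrates the idea in plain Python. `Literal`, `Interval` and `CompositeSequence` are hypothetical names for this sketch, not PyNCBItk or Toolkit types, and the `store` dictionary stands in for whatever data source would resolve the references:

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Literal:
    """A segment whose residues are stored inline."""
    residues: str


@dataclass
class Interval:
    """A segment referring to the half-open range [start, end) of another sequence."""
    seq_id: str
    start: int
    end: int


Segment = Union[Literal, Interval]


class CompositeSequence:
    """Toy 'virtual' sequence assembled from segments on demand (hypothetical)."""

    def __init__(self, segments: List[Segment], store: Dict[str, str]):
        self.segments = segments
        self.store = store  # hypothetical lookup table: sequence id -> residues

    def __len__(self) -> int:
        # The length is known from the segment bounds, without fetching residues.
        return sum(
            len(s.residues) if isinstance(s, Literal) else s.end - s.start
            for s in self.segments
        )

    def resolve(self) -> str:
        # Materialize the residues by concatenating every segment in order.
        parts = []
        for s in self.segments:
            if isinstance(s, Literal):
                parts.append(s.residues)
            else:
                parts.append(self.store[s.seq_id][s.start:s.end])
        return "".join(parts)
```

Building `CompositeSequence([Interval("chr1", 0, 4), Literal("NNNN"), Interval("chr1", 6, 10)], {"chr1": "ACGTACGTAC"})` gives a 12-residue sequence that resolves to `ACGTNNNNGTAC`; the length is computed without materializing any residues.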
PyNCBItk is a Python library that provides bindings to the NCBI C++ Toolkit
data model and BLAST+ interface using [Cython](https://cython.org). It exposes
the internals of the C++ Toolkit, allowing BLAST queries to be run directly
from the Python interpreter without external I/O.
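For comparison, a typical workflow without bindings writes sequences to disk, runs a BLAST+ executable such as `blastn -query queries.fna -subject subjects.fna -outfmt 6` as a subprocess, and parses its text output back in Python. The sketch below covers only that last parsing step for the 12-column tabular format; the field names mirror the BLAST+ column keywords, while the `TabularHit` type and the `parse_outfmt6` helper are hypothetical:

```python
import csv
import io
from typing import List, NamedTuple


class TabularHit(NamedTuple):
    # The 12 default columns of BLAST+ `-outfmt 6`, in order.
    qseqid: str
    sseqid: str
    pident: float
    length: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


def parse_outfmt6(text: str) -> List[TabularHit]:
    """Convert tab-separated BLAST+ output back into typed Python records."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:  # skip blank lines
            continue
        q, s, pident, length, mm, gaps, qs, qe, ss, se, ev, bits = row
        hits.append(TabularHit(
            q, s, float(pident), int(length), int(mm), int(gaps),
            int(qs), int(qe), int(ss), int(se), float(ev), float(bits),
        ))
    return hits
```

Every value here was formatted as text by BLAST+ and converted back by Python; running the search in-process skips both the temporary files and this round-trip.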
## Roadmap
The package is in a very experimental state, and only a few core features are
supported at the moment:
- [x] Loading sequences from a FASTA file.
- [x] Creating basic sequences through the Python API.
- [x] Running BLAST searches with default parameters.
- [ ] Thorough BLAST configuration.
- [ ] Error and warning management.
- [ ] Support for all kinds of sequence storage.
- [ ] Multi-threading for database searches using Python threads.
- [ ] Advanced interface for the object manager.
- [ ] Interface for all sequence and alignment types.
## 🔧 Installing
PyNCBItk is available for all modern Python versions (3.7+). Compilation is done
through [CMake](https://cmake.org) using [Scikit-build-core](https://scikit-build-core.readthedocs.io).
To install an alpha release, use `pip` with the `--pre` flag:
```console
$ pip install pyncbitk --pre
```
The `pyncbitk` package requires additional runtime libraries that are distributed
in the `pyncbitk-runtime` package. These libraries should be available as pre-compiled
wheels for Linux and macOS. Otherwise, they can be compiled during setup
with the [Conan C/C++ package manager](https://docs.conan.io/2/), which handles
the compilation of the NCBI C++ Toolkit. *The project will take ages to
compile the first time, but afterwards only the Cython code will have to be
recompiled.*
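Since `pyncbitk` depends on `pyncbitk-runtime`, a quick way to confirm that both distributions ended up in the current environment is to query their installed metadata. A minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); the `installed_versions` helper is hypothetical and not part of PyNCBItk:

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(dists: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in dists:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(["pyncbitk", "pyncbitk-runtime"]).items():
        print(f"{name}: {version or 'not installed'}")
```

A `None` entry means the corresponding distribution is missing from the active environment.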
## 💡 Example
```python
from pyncbitk.objects.seqset import BioSeqSet
from pyncbitk.objtools import FastaReader
from pyncbitk.algo.blast import BlastN

# read the sequences from FASTA-formatted files
queries = BioSeqSet(FastaReader("queries.fna", split=False))
subjects = BioSeqSet(FastaReader("subjects.fna", split=False))

# run `blastn` with default parameters
blastn = BlastN()
results = blastn.run(queries, subjects)
```
The result is a `SearchResultsSet`, which contains one `SearchResults` object
per query/subject pair. Each `SearchResults` object summarizes the search for
its pair and stores the hit alignments in a `SeqAlignSet`.
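As a worked example of that layout: searching 3 queries against 2 subjects gives 3 × 2 = 6 query/subject pairs, and therefore 6 `SearchResults` in the set. The helper below only enumerates those pairs in plain Python; `result_pairs` is hypothetical, and the query-major order it produces is an assumption of this sketch, not documented PyNCBItk behavior:

```python
from itertools import product
from typing import List, Tuple


def result_pairs(query_ids: List[str], subject_ids: List[str]) -> List[Tuple[str, str]]:
    """List every (query, subject) combination covered by a pairwise search."""
    # itertools.product yields query-major order: (q1, s1), (q1, s2), (q2, s1), ...
    return list(product(query_ids, subject_ids))


pairs = result_pairs(["q1", "q2", "q3"], ["s1", "s2"])
print(len(pairs))  # 6
```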
See the [Examples section](https://pyncbitk.readthedocs.io/en/latest/examples/index.html)
in the [online documentation](https://pyncbitk.readthedocs.io)
for more information.
## 💭 Feedback
### ⚠️ Issue Tracker
Found a bug? Have an enhancement request? Head over to the
[GitHub issue tracker](https://github.com/althonos/pyncbitk/issues)
if you need to report or ask something. If you are filing an issue about a bug,
please include as much information as you can about the issue, and try to
recreate the same bug in a simple, easily reproducible situation.
### 🏗️ Contributing
Contributions are more than welcome! See
[`CONTRIBUTING.md`](https://github.com/althonos/pyncbitk/blob/main/CONTRIBUTING.md)
for more details.
## 📋 Changelog
This project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html)
and provides a [changelog](https://github.com/althonos/pyncbitk/blob/main/CHANGELOG.md)
in the [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) format.
## ⚖️ License
This library is provided under the [MIT License](https://choosealicense.com/licenses/mit/).
The NCBI C++ Toolkit is a "United States Government Work" and therefore lies in
the public domain, but may be subject to copyright by the U.S. in foreign
countries. Some restrictions apply; see the
[NCBI C++ Toolkit license](https://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/lxr/source/LICENSE).
*This project is in no way affiliated with, sponsored by, or otherwise endorsed
by the NCBI or any associated entity. It was developed
by [Martin Larralde](https://github.com/althonos/) during his PhD
at the [Leiden University Medical Center](https://www.lumc.nl/en/) in
the [Zeller team](https://github.com/zellerlab).*
## 📚 References
- <a id="ref1"></a>\[1\] Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic local alignment search tool. *Journal of Molecular Biology*, 215(3), 403–410. [doi:10.1016/S0022-2836(05)80360-2](https://doi.org/10.1016/S0022-2836(05)80360-2)