{"id":18547585,"url":"https://github.com/kensho-technologies/sequence_align","last_synced_at":"2025-04-05T08:03:47.323Z","repository":{"id":151367881,"uuid":"623962592","full_name":"kensho-technologies/sequence_align","owner":"kensho-technologies","description":"Efficient implementations of Needleman-Wunsch and other sequence alignment algorithms written in Rust with Python bindings via PyO3.","archived":false,"fork":false,"pushed_at":"2025-03-05T16:18:21.000Z","size":206,"stargazers_count":71,"open_issues_count":3,"forks_count":4,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-03-29T07:04:24.275Z","etag":null,"topics":["bioinformatics","hirschberg","natural-language-processing","needleman-wunsch","nlp","pyo3","python","rust","sequence-alignment"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kensho-technologies.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.md","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-05T13:06:02.000Z","updated_at":"2025-03-28T15:41:39.000Z","dependencies_parsed_at":"2024-01-13T04:12:12.156Z","dependency_job_id":"3aa8551a-28de-4879-8ca2-bf009fbbcaed","html_url":"https://github.com/kensho-technologies/sequence_align","commit_stats":{"total_commits":18,"total_committers":2,"mean_commits":9.0,"dds":"0.38888888888888884","last_synced_commit":"782f0f8f2669faca7e0a8a5cce6ba75e530d8c43"},"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repos
itories/kensho-technologies%2Fsequence_align","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kensho-technologies%2Fsequence_align/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kensho-technologies%2Fsequence_align/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kensho-technologies%2Fsequence_align/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kensho-technologies","download_url":"https://codeload.github.com/kensho-technologies/sequence_align/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247305932,"owners_count":20917208,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","hirschberg","natural-language-processing","needleman-wunsch","nlp","pyo3","python","rust","sequence-alignment"],"created_at":"2024-11-06T20:30:01.843Z","updated_at":"2025-04-05T08:03:47.306Z","avatar_url":"https://github.com/kensho-technologies.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"  \u003ca href=\"https://github.com/kensho-technologies/sequence_align/actions?query=workflow%3A%22Tests+and+lint%22\"\u003e\u003cimg src=\"https://github.com/kensho-technologies/sequence_align/workflows/Tests%20and%20lint/badge.svg\" /\u003e\u003c/a\u003e\n  \u003ca href=\"https://codecov.io/gh/kensho-technologies/sequence_align\"\u003e\u003cimg src=\"https://codecov.io/gh/kensho-technologies/sequence_align/branch/main/graph/badge.svg\" /\u003e\u003c/a\u003e\n  \u003ca 
href=\"https://opensource.org/licenses/Apache-2.0\"\u003e\u003cimg src=\"https://img.shields.io/badge/License-Apache%202.0-blue.svg\" /\u003e\u003c/a\u003e\n  \u003ca href=\"http://www.repostatus.org/#active\"\u003e\u003cimg src=\"http://www.repostatus.org/badges/latest/active.svg\" /\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/psf/black\"\u003e\u003cimg src=\"https://img.shields.io/badge/code%20style-black-000000.svg\" /\u003e\u003c/a\u003e\n\n# sequence_align\nEfficient implementations of [Needleman-Wunsch](https://en.wikipedia.org/wiki/Needleman%E2%80%93Wunsch_algorithm)\nand other sequence alignment algorithms written in Rust with Python bindings via [PyO3](https://github.com/PyO3/pyo3).\n\n\u003cp\u003e\u003cimg width=\"800px\" src=\"https://raw.githubusercontent.com/kensho-technologies/sequence_align/main/docs/images/sequence_align.png\"\u003e\u003c/p\u003e\n\n## Installation\n`sequence_align` is distributed via [PyPi](https://pypi.org/project/sequence_align) for Python 3.9 - 3.13, making installation as simple as the following --\nno special setup required for cross-platform compatibility, Rust installation, etc.!\n\n``` bash\npip install sequence_align\n```\n\nAlternatively, if one wishes to develop for `sequence_align`, first ensure that both\n[Python](https://wiki.python.org/moin/BeginnersGuide/Download) and [Rust](https://www.rust-lang.org/tools/install)\nare installed on your system. 
Then, install [Maturin](https://www.maturin.rs/#usage) and run\n`maturin develop` (optionally with the `-r` flag to compile a release build, instead of an unoptimized debug build)\nfrom the root of your cloned repo to build and install `sequence_align` in your active Python environment.\n\n## Quick Start\nPairwise sequence algorithms are available in [sequence_align.pairwise](src/sequence_align/pairwise.py).\nCurrently, two algorithms are implemented: the [Needleman-Wunsch algorithm](https://en.wikipedia.org/wiki/Needleman%E2%80%93Wunsch_algorithm)\nand [Hirschberg’s algorithm](https://en.wikipedia.org/wiki/Hirschberg%27s_algorithm). Needleman-Wunsch is\ncommonly used for global sequence alignment, but suffers from the fact that it uses `O(M*N)` space,\nwhere `M` and `N` are the lengths of the two sequences being aligned. Hirschberg’s algorithm modifies Needleman-Wunsch\nto have the same time complexity (`O(M*N)`), but only use `O(min{M, N})` space, making it an appealing option\nfor memory-limited applications or extremely large sequences.\n\nOne may also compute the Needleman-Wunsch alignment score for alignments produced by either algorithm\nusing [sequence_align.pairwise.alignment_score](src/sequence_align/pairwise.py).\n\nUsing these algorithms is straightforward:\n\n``` python\nfrom sequence_align.pairwise import alignment_score, hirschberg, needleman_wunsch\n\n\n# See https://en.wikipedia.org/wiki/Needleman%E2%80%93Wunsch_algorithm#/media/File:Needleman-Wunsch_pairwise_sequence_alignment.png\n# Use Needleman-Wunsch default scores (match=1, mismatch=-1, indel=-1)\nseq_a = [\"G\", \"A\", \"T\", \"T\", \"A\", \"C\", \"A\"]\nseq_b = [\"G\", \"C\", \"A\", \"T\", \"G\", \"C\", \"G\"]\n\naligned_seq_a, aligned_seq_b = needleman_wunsch(\n    seq_a,\n    seq_b,\n    match_score=1.0,\n    mismatch_score=-1.0,\n    indel_score=-1.0,\n    gap=\"_\",\n)\n\n# Expects [\"G\", \"_\", \"A\", \"T\", \"T\", \"A\", \"C\", \"A\"]\nprint(aligned_seq_a)\n\n# Expects [\"G\", \"C\", 
\"A\", \"_\", \"T\", \"G\", \"C\", \"G\"]\nprint(aligned_seq_b)\n\n# Expects 0\nscore = alignment_score(\n    aligned_seq_a,\n    aligned_seq_b,\n    match_score=1.0,\n    mismatch_score=-1.0,\n    indel_score=-1.0,\n    gap=\"_\",\n)\nprint(score)\n\n\n# See https://en.wikipedia.org/wiki/Hirschberg%27s_algorithm#Example\nseq_a = [\"A\", \"G\", \"T\", \"A\", \"C\", \"G\", \"C\", \"A\"]\nseq_b = [\"T\", \"A\", \"T\", \"G\", \"C\"]\n\naligned_seq_a, aligned_seq_b = hirschberg(\n    seq_a,\n    seq_b,\n    match_score=2.0,\n    mismatch_score=-1.0,\n    indel_score=-2.0,\n    gap=\"_\",\n)\n\n# Expects [\"A\", \"G\", \"T\", \"A\", \"C\", \"G\", \"C\", \"A\"]\nprint(aligned_seq_a)\n\n# Expects [\"_\", \"_\", \"T\", \"A\", \"T\", \"G\", \"C\", \"_\"]\nprint(aligned_seq_b)\n\n# Expects 1\nscore = alignment_score(\n    aligned_seq_a,\n    aligned_seq_b,\n    match_score=2.0,\n    mismatch_score=-1.0,\n    indel_score=-2.0,\n    gap=\"_\",\n)\nprint(score)\n```\n\n## Performance Benchmarks\nAll tests below were conducted sequentially on an [AWS R5.4 instance](https://aws.amazon.com/ec2/instance-types/r5/)\nwith 16 cores and 128 GB of memory. The pair of sequences for alignment consists of a character sequence of randomly\nselected A/C/G/T nucleotide bases along with another that is identical, except with 10% of the characters randomly\nperturbed by deletion, insertion of another randomly-selected character after the entry, or replacement with a\ndifferent randomly-selected character.\n\nAs one can see, while `sequence_align` is comparable to some other toolkits in terms of speed, its memory performance\nis **best-in-class**, even when compared to toolkits using the same algorithm, such as `pyseq-align`, which also uses\nNeedleman-Wunsch. 
\n\n_(Please note that some lines terminate early, as some toolkits took prohibitively long and/or ran out of memory at higher scales.)_\n\n\u003cp\u003e\u003cimg width=\"800px\" src=\"https://raw.githubusercontent.com/kensho-technologies/sequence_align/main/docs/images/runtime_benchmark.png\"\u003e\u003c/p\u003e\n\n\u003cp\u003e\u003cimg width=\"800px\" src=\"https://raw.githubusercontent.com/kensho-technologies/sequence_align/main/docs/images/memory_benchmark.png\"\u003e\u003c/p\u003e\n\n## License\nLicensed under the Apache 2.0 License. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.\n\nCopyright 2023-present Kensho Technologies, LLC. The present date is determined by the timestamp of the most recent commit in the repository.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkensho-technologies%2Fsequence_align","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkensho-technologies%2Fsequence_align","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkensho-technologies%2Fsequence_align/lists"}