# 🔪🧅 Diced [![Star me](https://img.shields.io/github/stars/althonos/diced?style=social&label=Star&maxAge=3600)](https://github.com/althonos/diced/stargazers)

*A Rust re-implementation of the [MinCED](https://github.com/ctSkennerton/minced) algorithm to Detect Instances of [CRISPRs](https://en.wikipedia.org/wiki/CRISPR) in Environmental Data.*

[![Actions](https://img.shields.io/github/actions/workflow/status/althonos/diced/rust.yml?branch=main&logo=github&style=flat-square&maxAge=300)](https://github.com/althonos/diced/actions)
[![Coverage](https://img.shields.io/codecov/c/gh/althonos/diced?logo=codecov&style=flat-square&maxAge=3600)](https://codecov.io/gh/althonos/diced/)
[![License](https://img.shields.io/badge/license-GPLv3-blue.svg?style=flat-square&maxAge=2678400)](https://choosealicense.com/licenses/gpl-3.0/)
[![Crate](https://img.shields.io/crates/v/diced.svg?maxAge=600&style=flat-square)](https://crates.io/crates/diced)
[![Docs](https://img.shields.io/docsrs/diced?maxAge=600&style=flat-square)](https://docs.rs/diced)
[![Source](https://img.shields.io/badge/source-GitHub-303030.svg?maxAge=2678400&style=flat-square)](https://github.com/althonos/diced/)
[![Mirror](https://img.shields.io/badge/mirror-LUMC-001158?style=flat-square&maxAge=2678400)](https://git.lumc.nl/mflarralde/diced/)
[![GitHub issues](https://img.shields.io/github/issues/althonos/diced.svg?style=flat-square&maxAge=600)](https://github.com/althonos/diced/issues)
[![Changelog](https://img.shields.io/badge/keep%20a-changelog-8A0707.svg?maxAge=2678400&style=flat-square)](https://github.com/althonos/diced/blob/main/CHANGELOG.md)

## 🗺️ Overview

MinCED is a method developed by [Connor T. Skennerton](https://github.com/ctSkennerton)
to identify [Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs)](https://en.wikipedia.org/wiki/CRISPR)
in isolate and metagenome-assembled genomes. It was derived from the CRISPR
Recognition Tool [\[1\]](#ref1). It uses a fast scanning algorithm to identify
candidate repeats, combined with an extension step to find maximally spanning
regions of the genome that feature a CRISPR repeat.

Diced is a Rust reimplementation of the MinCED method, using the original
Java code as a reference. It produces exactly the same results as MinCED,
corrects some bugs ([minced#35](https://github.com/ctSkennerton/minced/issues/35)),
and is much faster. The Diced implementation is available as a Rust library for convenience.

*This is the Rust version; a [Python package](https://pypi.org/project/diced) is available as well.*

### 📋 Features

- **library interface**: The Rust implementation is written as a library to facilitate reuse in other projects. It is also used to build the Python package, as a native extension generated with PyO3.
- **zero-copy**: The `Scanner`, which iterates over candidate CRISPRs, is zero-copy when given a plain `&str` reference, and it also supports data behind smart pointers such as `Rc<str>` or `Arc<str>`.
- **fast string matching**: The Java implementation uses a handwritten implementation of the [Boyer-Moore algorithm](https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string-search_algorithm) [\[2\]](#ref2), while the Rust implementation uses the `str::find` method from the standard library, which implements the [Two-way algorithm](https://en.wikipedia.org/wiki/Two-way_string-matching_algorithm) [\[3\]](#ref3). In addition, the [`memchr`](https://crates.io/crates/memchr) crate can be used as a fast, SIMD-capable implementation of the `memmem` function.

## 💡 Example

Diced supports any sequence in string format. For instance, to scan the first
record of a FASTA file read with `noodles-fasta`:

```rust
fn main() {
    // Open the FASTA file and read its first record with `noodles-fasta`.
    let mut reader = std::fs::File::open("tests/data/Aquifex_aeolicus_VF5.fna")
        .map(std::io::BufReader::new)
        .map(noodles_fasta::Reader::new)
        .unwrap();
    let record = reader.records().next().unwrap().unwrap();
    // Diced scans string data, so decode the raw sequence bytes as UTF-8.
    let seq = std::str::from_utf8(record.sequence().as_ref()).unwrap();

    // Iterate over the CRISPRs found in the sequence, then over each CRISPR's repeats.
    for crispr in diced::Scanner::new(&seq) {
        println!("{} to {}: {} repeats", crispr.start(), crispr.end(), crispr.len());
        for repeat in crispr.repeats() {
            println!(" - at {}: {}", repeat.start(), repeat.as_str());
        }
    }
}
```

## 💭 Feedback

### ⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the [GitHub issue
tracker](https://github.com/althonos/diced/issues) if you need to report
or ask something. If you are filing a bug report, please include as much
information as you can about the issue, and try to reproduce the bug
in a simple, minimal setting.

<!-- ### 🏗️ Contributing

Contributions are more than welcome! See [`CONTRIBUTING.md`](https://github.com/althonos/diced/blob/main/CONTRIBUTING.md) for more details. -->

## 📋 Changelog

This project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html)
and provides a [changelog](https://github.com/althonos/diced/blob/main/CHANGELOG.md)
in the [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) format.

## ⚖️ License

This library is provided under the open-source
[GPLv3 license](https://choosealicense.com/licenses/gpl-3.0/), or any later version.
The code for this implementation was derived from the
[MinCED source code](https://github.com/ctSkennerton/minced), which is
available under the GPLv3 as well.

*This project is in no way affiliated with, sponsored, or otherwise endorsed
by the [original MinCED authors](https://github.com/ctSkennerton). It was developed
by [Martin Larralde](https://github.com/althonos/) during his PhD project at
the [Leiden University Medical Center](https://www.lumc.nl/en/) in the
[Zeller team](https://github.com/zellerlab).*

## 📚 References

- <a id="ref1">\[1\]</a> Bland, C., Ramsey, T. L., Sabree, F., Lowe, M., Brown, K., Kyrpides, N. C., & Hugenholtz, P. (2007). 'CRISPR recognition tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats'. BMC Bioinformatics, 8, 209. [PMID:17577412](https://pubmed.ncbi.nlm.nih.gov/17577412/) [doi:10.1186/1471-2105-8-209](https://doi.org/10.1186/1471-2105-8-209).
- <a id="ref2">\[2\]</a> Boyer, R. S. & Moore, J. S. (1977). 'A fast string searching algorithm'. Commun. ACM, 20(10), 762–772. [doi:10.1145/359842.359859](https://doi.org/10.1145/359842.359859).
- <a id="ref3">\[3\]</a> Crochemore, M. & Perrin, D. (1991). 'Two-way string-matching'. J. ACM, 38(3), 650–674. [doi:10.1145/116825.116845](https://doi.org/10.1145/116825.116845).