{"id":13623334,"url":"https://github.com/Daniel-Liu-c0deb0t/triple_accel","last_synced_at":"2025-04-15T14:32:42.590Z","repository":{"id":49608244,"uuid":"249540724","full_name":"Daniel-Liu-c0deb0t/triple_accel","owner":"Daniel-Liu-c0deb0t","description":"Rust edit distance routines accelerated using SIMD. Supports fast Hamming, Levenshtein, restricted Damerau-Levenshtein, etc. distance calculations and string search.","archived":false,"fork":false,"pushed_at":"2023-03-13T08:16:03.000Z","size":186,"stargazers_count":93,"open_issues_count":7,"forks_count":10,"subscribers_count":6,"default_branch":"master","last_synced_at":"2024-04-30T09:34:08.352Z","etag":null,"topics":["algorithms","avx2","dynamic-programming","hamming","levenshtein","rust","simd","sse","string-distance","string-matching","string-search","string-similarity"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Daniel-Liu-c0deb0t.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2020-03-23T20:48:06.000Z","updated_at":"2024-02-18T07:09:50.000Z","dependencies_parsed_at":"2023-01-24T00:46:35.218Z","dependency_job_id":"8b06fcd3-1922-4238-8ee4-5aec5f5ef8c0","html_url":"https://github.com/Daniel-Liu-c0deb0t/triple_accel","commit_stats":{"total_commits":143,"total_committers":4,"mean_commits":35.75,"dds":0.034965034965035,"last_synced_commit":"0f2119a6bbc3e6f007bcca0051660408c47ee883"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Daniel-Liu-c0deb0t%2Ftriple_acce
l","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Daniel-Liu-c0deb0t%2Ftriple_accel/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Daniel-Liu-c0deb0t%2Ftriple_accel/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Daniel-Liu-c0deb0t%2Ftriple_accel/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Daniel-Liu-c0deb0t","download_url":"https://codeload.github.com/Daniel-Liu-c0deb0t/triple_accel/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249089023,"owners_count":21210903,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algorithms","avx2","dynamic-programming","hamming","levenshtein","rust","simd","sse","string-distance","string-matching","string-search","string-similarity"],"created_at":"2024-08-01T21:01:30.504Z","updated_at":"2025-04-15T14:32:42.285Z","avatar_url":"https://github.com/Daniel-Liu-c0deb0t.png","language":"Rust","funding_links":[],"categories":["Libraries","库 Libraries"],"sub_categories":["Text processing","文本处理 Text processing"],"readme":"# triple_accel\n![Test](https://github.com/Daniel-Liu-c0deb0t/triple_accel/workflows/Test/badge.svg)\n![GitHub](https://img.shields.io/github/license/Daniel-Liu-c0deb0t/triple_accel)\n![Crates.io](https://img.shields.io/crates/v/triple_accel)\n![Docs.rs](https://docs.rs/triple_accel/badge.svg)\n\nRust edit distance routines accelerated using SIMD. Supports fast Hamming, Levenshtein,\nrestricted Damerau-Levenshtein, etc. 
distance calculations and string search.\n\nAlthough vectorized SIMD code allows for up to 20-30x speedups over its scalar counterparts,\nthe difficulty of handling platform-dependent SIMD code makes SIMD routines less attractive.\nThe goal of this library is to provide an easy-to-use abstraction over SIMD edit distance routines\nthat fall back to scalar routines if the target CPU architecture is not supported.\nAdditionally, all limitations and tradeoffs of the edit distance routines should be provided upfront\nso the user knows exactly what to expect.\nFinally, this library should lead to performance boosts on both short and longer strings, so it\ncan be used for a variety of tasks, from bioinformatics to natural language processing.\n`triple_accel` is very lightweight: it only has dependencies on other crates for benchmarking.\nIt can be built on machines whose CPUs lack AVX2 or SSE4.1 support. It can also run on\nmachines without SIMD support by automatically using scalar alternatives.\n\n## Install\nAdd\n```\ntriple_accel = \"*\"\n```\nto the `[dependencies]` section of your `Cargo.toml`. This library is available\n[here](https://crates.io/crates/triple_accel) on crates.io.\n\nAlternatively, you can clone this repository and run\n```\ncargo build --release\n```\nIn general, for maximum efficiency, use `RUSTFLAGS=\"-C target-cpu=native\"` if portability is not an issue.\n\n## Tests\nYou can run tests with\n```\ncargo test\n```\nafter cloning the repository.\n\nContinuous integration is used to ensure that the code passes all tests on the latest Linux, Windows,\nand Mac platforms. Additionally, crate feature flags like `jewel-sse`, `jewel-avx`, `jewel-8bit`,\n`jewel-16bit`, and `jewel-32bit` are used to override the default automatic detection of CPU features,\nso all features can be thoroughly tested in continuous integration. 
The `debug` feature flag is specified,\nso the exact underlying vector type that is used is printed.\n\n## Benchmarks\nBenchmarks can be run with\n```\ncargo bench\n```\n\n## Docs\nThe docs are available [here](https://docs.rs/triple_accel). To build them on\nyour machine, run\n```\ncargo doc\n```\n\n## Features\nThis library provides routines for both searching for some needle string in a haystack string\nand calculating the edit distance between two strings. Hamming distance (mismatches only),\nLevenshtein distance (mismatches + gaps), and restricted Damerau-Levenshtein distance\n(transpositions + mismatches + gaps) are supported, along with arbitrary edit costs for mismatch\nand gap open/extend. This library provides a simple interface, in addition to powerful lower-level\ncontrol over the edit distance calculations.\n\nAt runtime, the implementation for a certain algorithm is selected based on CPU support, going\ndown the list:\n\n1. Vectorized implementation with 256-bit AVX vectors, if AVX2 is supported.\n2. Vectorized implementation with 128-bit SSE vectors, if SSE4.1 is supported.\n3. Scalar implementation.\n\nCurrently, vectorized SIMD implementations are only available for x86 or x86-64 CPUs. However,\nafter compiling this library on a machine that supports those SIMD intrinsics, the library can\nbe used on other machines.\nAdditionally, the internal data structure for storing vectors and the bit width of the values\nin the vectors are selected at runtime for maximum efficiency and accuracy, given the lengths\nof the input strings.\n\n## Limitations\nDue to the use of SIMD intrinsics, only binary strings that are represented with `u8` bytes\nare supported. 
Unicode strings are not currently supported.\n\n## Examples\n`triple_accel` provides a very simple and easy to use framework for common edit distance operations.\nCalculating the Hamming distance (number of mismatches) between two strings is extremely simple:\n```Rust\nuse triple_accel::*;\n\nlet a = b\"abcd\";\nlet b = b\"abcc\";\n\nlet dist = hamming(a, b);\nassert!(dist == 1);\n```\nBy default, SIMD will be used if possible.\nSimilarly, we can easily calculate the Levenshtein distance (character mismatches and gaps all have\na cost of 1) between two strings with the following code:\n```Rust\nuse triple_accel::*;\n\nlet a = b\"abc\";\nlet b = b\"abcd\";\n\nlet dist = levenshtein_exp(a, b);\nassert!(dist == 1);\n```\nThis uses exponential search to estimate the number of edits between `a` and `b`, which makes it\nmore efficient than the alternative `levenshtein` function when the number of edits between `a`\nand `b` is low.\n\nIn addition to edit distance routines, `triple_accel` also provides search routines. These routines\nreturn an iterator over matches that indicate where the `needle` string matches the `haystack` string.\n`triple_accel` will attempt to maximize the length of matches that end at the same position and remove\nshorter matches when some matches fully overlap.\n```Rust\nuse triple_accel::*;\n\nlet needle = b\"helllo\";\nlet haystack = b\"hello world\";\n\nlet matches: Vec\u003cMatch\u003e = levenshtein_search(needle, haystack).collect();\n// note: start index is inclusive, end index is exclusive!\nassert!(matches == vec![Match{start: 0, end: 5, k: 1}]);\n```\nSometimes, it is necessary to use the slightly lower level, but also more powerful routines that\n`triple_accel` provides. 
For example, it is possible to allow transpositions (character swaps) that\nhave a cost of 1, in addition to mismatches and gaps:\n```Rust\nuse triple_accel::levenshtein::*;\n\nlet a = b\"abcd\";\nlet b = b\"abdc\";\nlet k = 2; // upper bound on allowed cost\nlet trace_on = false; // return edit traceback?\n\nlet dist = levenshtein_simd_k_with_opts(a, b, k, trace_on, RDAMERAU_COSTS);\n// note: dist may be None if a and b do not match within a cost of k\nassert!(dist.unwrap().0 == 1);\n```\nDon't let the name of the function fool you! `levenshtein_simd_k_with_opts` will still fall back to\nthe scalar implementation if AVX2 or SSE4.1 support is not available. It just prefers to use SIMD\nwhere possible.\n\nFor most common cases, the re-exported functions are enough, and the low level functions do not\nhave to be used directly.\n\n## License\n[MIT](LICENSE)\n\n## Contributing\nRead the contributing guidelines [here](CONTRIBUTING.md).\n\n## Code of Conduct\nRead the code of conduct [here](CODE_OF_CONDUCT.md).\n\n## Why the name \"triple_accel\"?\nBecause \"Time Altar - Triple Accel\" is a magical ability used by Kiritsugu Emiya to boost his speed\nand reaction time in Fate/Zero. There are also some other references to the Fate series...\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDaniel-Liu-c0deb0t%2Ftriple_accel","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FDaniel-Liu-c0deb0t%2Ftriple_accel","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDaniel-Liu-c0deb0t%2Ftriple_accel/lists"}