{"id":23029649,"url":"https://github.com/antononcube/raku-math-distancefunctions-edit","last_synced_at":"2025-06-17T08:04:19.241Z","repository":{"id":251642756,"uuid":"837981291","full_name":"antononcube/Raku-Math-DistanceFunctions-Edit","owner":"antononcube","description":"Raku package of fast Demerau-Levenshtein distance functions based on C code via NativeCall.","archived":false,"fork":false,"pushed_at":"2024-09-11T16:00:08.000Z","size":52,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-17T08:04:17.050Z","etag":null,"topics":["damerau-levenshtein","damerau-levenshtein-distance","distance-function","edit-distance","raku","rakulang"],"latest_commit_sha":null,"homepage":"https://raku.land/zef:antononcube/Math::DistanceFunctions::Edit","language":"Raku","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"artistic-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/antononcube.png","metadata":{"files":{"readme":"README-work.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-04T15:54:53.000Z","updated_at":"2024-11-25T18:53:18.000Z","dependencies_parsed_at":"2024-09-11T22:42:32.873Z","dependency_job_id":null,"html_url":"https://github.com/antononcube/Raku-Math-DistanceFunctions-Edit","commit_stats":null,"previous_names":["antononcube/raku-math-distancefunctions-edit"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/antononcube/Raku-Math-DistanceFunctions-Edit","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antononcube%2FRaku-Math-DistanceFunctions-Edit","tags_url":"https://repos.ecosyste.
ms/api/v1/hosts/GitHub/repositories/antononcube%2FRaku-Math-DistanceFunctions-Edit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antononcube%2FRaku-Math-DistanceFunctions-Edit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antononcube%2FRaku-Math-DistanceFunctions-Edit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/antononcube","download_url":"https://codeload.github.com/antononcube/Raku-Math-DistanceFunctions-Edit/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antononcube%2FRaku-Math-DistanceFunctions-Edit/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260318647,"owners_count":22991116,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["damerau-levenshtein","damerau-levenshtein-distance","distance-function","edit-distance","raku","rakulang"],"created_at":"2024-12-15T14:16:32.750Z","updated_at":"2025-06-17T08:04:19.213Z","avatar_url":"https://github.com/antononcube.png","language":"Raku","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Math::DistanceFunctions::Edit\n\n[![Actions Status](https://github.com/antononcube/Raku-Math-DistanceFunctions-Edit/actions/workflows/linux.yml/badge.svg)](https://github.com/antononcube/Raku-Math-DistanceFunctions-Edit/actions)\n[![Actions 
Status](https://github.com/antononcube/Raku-Math-DistanceFunctions-Edit/actions/workflows/macos.yml/badge.svg)](https://github.com/antononcube/Raku-Math-DistanceFunctions-Edit/actions)\n\u003c!--- [![Actions Status](https://github.com/antononcube/Raku-Math-DistanceFunctions-Edit/actions/workflows/windows.yml/badge.svg)](https://github.com/antononcube/Raku-Math-DistanceFunctions-Edit/actions) --\u003e\n\n\u003c!--- [![](https://raku.land/zef:antononcube/Math::DistanceFunctions::Edit/badges/version)](https://raku.land/zef:antononcube/Math::DistanceFunctions::Edit) --\u003e\n[![License: Artistic-2.0](https://img.shields.io/badge/License-Artistic%202.0-0298c3.svg)](https://opensource.org/licenses/Artistic-2.0)\n\nRaku package of fast Damerau-Levenshtein distance functions based on C code via \"NativeCall\".\n\nFor a pure Raku implementation see [\"Text::Levenshtein::Damerau\"](https://raku.land/github:ugexe/Text::Levenshtein::Damerau), [NLp1].\n\n-----\n\n## Usage examples\n\nThe main function provided by this package is `edit-distance`. \nHere is a comparison of its invocation with that of `dld` from \"Text::Levenshtein::Damerau\" \nover two string arguments:\n\n```perl6\nuse Math::DistanceFunctions::Edit;\nuse Text::Levenshtein::Damerau;\n\nmy ($w1, $w2) = ('examples', 'samples');\nsay 'edit-distance : ', edit-distance($w1, $w2);\nsay 'dld           : ', dld($w1, $w2);\n```\n\nVectors of integers, booleans, or strings can also be used:\n\n```perl6\nedit-distance(\u003cbark alma area arc\u003e, \u003cArc alma area\u003e):ignore-case;\n```\n\n```perl6\nedit-distance([True, False, False, True], [True, False, False]);\n```\n\n**Remark:** Currently, elements of integer lists are converted to `int32`. 
\nIf larger integers are used, convert them to `Str` first.\n\n-----\n\n## Motivation\n\nThe motivation for making this package was the slow performance of the DSL translation functions in the package\n[\"DSL::Translators\"](https://github.com/antononcube/Raku-DSL-Translators), [AAp1].\nAfter profiling, it turned out that about 50% of the time is spent in the function `dld` of \"Text::Levenshtein::Damerau\". \n\nThat is the case because of the fuzzy matching which \"DSL::Translators\" does:\n\n```perl6\nuse DSL::Translators;\n\ndsl-translation('use @dfTitanic; group by sex; show couns;', to =\u003e 'Raku')\u003cCODE\u003e\n```\n\nThe slowdown caused by the \"expensive\" to compute results of `dld` can be addressed in two ways:\n\n- Make certain clever checks before invoking `dld`.\n- Create a new function called `edit-distance` in C and set up a \"NativeCall\" connection to it.\n\nAt this point, both approaches have been taken: the first in \"DSL::Shared\", [AAp2], the second in \"Math::DistanceFunctions::Edit\".\n\n-----\n\n## Implementation\n\nThe design of the \"NativeCall\" hook-up is taken from [\"Algorithm::KdTree\"](https://raku.land/github:titsuki/Algorithm::KdTree), [ITp1].\n\nThe actual C implementation was made through several iterations of LLM code generation.\n\nI considered porting the Raku code of `dld` in [NLp1] to C, but since\n[Damerau-Levenshtein distance](https://en.wikipedia.org/wiki/Damerau–Levenshtein_distance) is a \n[very well known, popular topic](https://rosettacode.org/wiki/Levenshtein_distance), \nLLM generations with simple prompts were used instead.\n\n(And, yes, I read the code and tested it.)\n\n-----\n\n## Profiling and performance\n\nSince speed is the most important reason for this package, after its initial complete version,\nprofiling was done at each refactoring step. 
See the file [\"faster-word-distances.raku\"](./examples/faster-word-distances.raku).\n\n- For ASCII (non-UTF-8) strings `edit-distance` is ≈70 times faster than `dld`.\n- For UTF-8 strings it is ≈5 times faster.\n\nHere is an example output of the normalized profiling times obtained with the script \"faster-word-distances.raku\":\n\n```\nStrDistance =\u003e 1\ndld =\u003e 0.847204294559419\nedit-distance =\u003e 0.011560672845434399\nrosetta =\u003e 2.5342606961356466\nsift =\u003e 0.021171925438510746\n```\n\n**Remark:** The timing of Raku's built-in [`StrDistance`](https://docs.raku.org/type/StrDistance) is used to normalize the rest of the timings.  \n\n**Remark:** The profiling also used `sift4` from [\"Text::Diff::Sift4\"](https://raku.land/github:MasterDuke17/Text::Diff::Sift4), [MDp1]. \n(NQP-based implementation.)\n\n-----\n\n## References\n\n[AAp1] Anton Antonov,\n[DSL::Translators Raku package](https://github.com/antononcube/Raku-DSL-Translators),\n(2020-2024),\n[GitHub/antononcube](https://github.com/antononcube/).\n\n[AAp2] Anton Antonov,\n[DSL::Shared Raku package](https://github.com/antononcube/Raku-Shared),\n(2020-2024),\n[GitHub/antononcube](https://github.com/antononcube/).\n\n[ITp1] Itsuki Toyota,\n[Algorithm::KdTree Raku package](https://github.com/titsuki/p6-Algorithm-KdTree),\n(2016-2024),\n[GitHub/titsuki](https://github.com/titsuki).\n\n[MDp1] MasterDuke17,\n[Text::Diff::Sift4 Raku package](https://github.com/MasterDuke17/Text-Diff-Sift4),\n(2016-2021),\n[GitHub/MasterDuke17](https://github.com/MasterDuke17).\n\n[NLp1] Nick Logan,\n[Text::Levenshtein::Damerau Raku 
package](https://github.com/ugexe/Raku-Text--Levenshtein--Damerau),\n(2016-2022),\n[GitHub/ugexe](https://github.com/ugexe/).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fantononcube%2Fraku-math-distancefunctions-edit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fantononcube%2Fraku-math-distancefunctions-edit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fantononcube%2Fraku-math-distancefunctions-edit/lists"}