{"id":13411690,"url":"https://github.com/hbollon/go-edlib","last_synced_at":"2025-04-08T08:12:35.301Z","repository":{"id":44138310,"uuid":"288412767","full_name":"hbollon/go-edlib","owner":"hbollon","description":"📚 String comparison and edit distance algorithms library, featuring : Levenshtein, LCS, Hamming, Damerau levenshtein (OSA and Adjacent transpositions algorithms), Jaro-Winkler, Cosine, etc...","archived":false,"fork":false,"pushed_at":"2022-07-03T15:35:18.000Z","size":78,"stargazers_count":505,"open_issues_count":1,"forks_count":26,"subscribers_count":13,"default_branch":"master","last_synced_at":"2025-04-01T05:34:00.605Z","etag":null,"topics":["algorithms","cosine","damerau-levenshtein","edit-distance","edit-distance-algorithms","go","golang","golang-string-comparison","hamming","jaro-winkler","lcs","lcs-distance","levenshtein","levenshtein-distance","similarity-measures","string-comparison","string-distance","string-matching","unicode"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hbollon.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null},"funding":{"github":["hbollon"],"ko_fi":"hugobollon","custom":["paypal.me/hugobollon"]}},"created_at":"2020-08-18T09:30:59.000Z","updated_at":"2025-03-20T16:34:50.000Z","dependencies_parsed_at":"2022-09-18T00:00:23.271Z","dependency_job_id":null,"html_url":"https://github.com/hbollon/go-edlib","commit_stats":null,"previous_names":[],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hbollon%2Fgo-edlib","tags_url
":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hbollon%2Fgo-edlib/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hbollon%2Fgo-edlib/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hbollon%2Fgo-edlib/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hbollon","download_url":"https://codeload.github.com/hbollon/go-edlib/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247801169,"owners_count":20998339,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algorithms","cosine","damerau-levenshtein","edit-distance","edit-distance-algorithms","go","golang","golang-string-comparison","hamming","jaro-winkler","lcs","lcs-distance","levenshtein","levenshtein-distance","similarity-measures","string-comparison","string-distance","string-matching","unicode"],"created_at":"2024-07-30T20:01:15.813Z","updated_at":"2025-04-08T08:12:35.265Z","avatar_url":"https://github.com/hbollon.png","language":"Go","funding_links":["https://github.com/sponsors/hbollon","https://ko-fi.com/hugobollon","paypal.me/hugobollon"],"categories":["数据结构与算法","Data Structures","Data Structures and Algorithms","Data Integration Frameworks","Uncategorized","Go","数据结构`go语言实现的数据结构与算法`","数据结构","Generators"],"sub_categories":["文本分析","Advanced Console UIs","Standard CLI","Text Analysis","标准 CLI"],"readme":"\u003ch1 align=\"center\"\u003eGo-edlib : Edit distance and string comparison library\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca 
href='https://coveralls.io/github/hbollon/go-edlib?branch=master'\u003e\n    \u003cimg src='https://coveralls.io/repos/github/hbollon/go-edlib/badge.svg?branch=master' alt='Coverage Status' /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://goreportcard.com/report/github.com/hbollon/go-edlib\" target=\"_blank\"\u003e\n    \u003cimg alt=\"Go Report Card\" src=\"https://goreportcard.com/badge/github.com/hbollon/go-edlib\" /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/hbollon/go-edlib/blob/master/LICENSE.md\" target=\"_blank\"\u003e\n    \u003cimg alt=\"License: MIT\" src=\"https://img.shields.io/badge/License-MIT-yellow.svg\" /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://pkg.go.dev/github.com/hbollon/go-edlib\" target=\"_blank\"\u003e\n    \u003cimg src=\"https://pkg.go.dev/badge/github.com/hbollon/go-edlib\" alt=\"PkgGoDev\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\u003e Go string comparison and edit distance algorithms library featuring: Levenshtein, LCS, Hamming, Damerau-Levenshtein (OSA and adjacent transpositions algorithms), Jaro-Winkler, Cosine, etc.\n\n---\n\n## Table of Contents\n\n- [Requirements](#requirements)\n- [Introduction](#introduction)\n- [Features](#features)\n- [Installation](#installation)\n- [Benchmarks](#benchmarks)\n- [Documentation](#documentation)\n- [Examples](#examples)\n- [Author](#author)\n- [Contributing](#-contributing)\n- [License](#-license)\n\n\n---\n\n## Requirements\n- [Go](https://golang.org/doc/install) (v1.13+)\n\n## Introduction\nOpen-source Go library that includes most (and soon all) edit-distance and string comparison algorithms, with some extras! 
\u003cbr\u003e\nDesigned to be fully compatible with Unicode characters!\u003cbr\u003e\nThis library has 100% test coverage 😁\n\n## Features\n\n- [Levenshtein](https://en.wikipedia.org/wiki/Levenshtein_distance)\n- [LCS](https://en.wikipedia.org/wiki/Longest_common_subsequence_problem) (Longest common subsequence) with edit distance, backtrack and diff functions\n- [Hamming](https://en.wikipedia.org/wiki/Hamming_distance)\n- [Damerau-Levenshtein](https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance), with the following variants:\n  - OSA (Optimal string alignment)\n  - Adjacent transpositions\n- [Jaro \u0026 Jaro-Winkler](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance) similarity algorithms\n- [Cosine Similarity](https://en.wikipedia.org/wiki/Cosine_similarity)\n- [Jaccard Index](https://en.wikipedia.org/wiki/Jaccard_index)\n- [QGram](https://en.wikipedia.org/wiki/N-gram)\n- [Sorensen-Dice](https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient)\n- Similarity percentage functions based on all the edit distance algorithms available in this library\n- Fuzzy search functions based on edit distance, returning either a single string or multiple strings\n- Unicode compatibility 🥳\n\n## Benchmarks\nYou can check an interactive Google chart with a few benchmark cases for all similarity algorithms in this library, run through the **StringsSimilarity** function, [here](http://benchgraph.codingberg.com/q5)\n\nIf you want or need more details, you can also view the raw benchmark output [here](https://github.com/hbollon/go-edlib/blob/master/tests/outputs/benchmarks.txt), which also includes memory allocations and test case output (similarity results and errors).\n\nIf you are on Linux and want to run them on your setup, you can run the ``` ./tests/benchmark.sh ``` script.\n\n## Installation\nOpen a shell in your project folder and run:\n\n```bash\ngo get github.com/hbollon/go-edlib\n```\n\nAnd import it into your project:\n\n```go\nimport 
(\n\t\"github.com/hbollon/go-edlib\"\n)\n```\n\n### Run tests\nIf you are on Linux and want to run all unit tests, just run the ``` ./tests/tests.sh ``` script. \n\nOn Windows, you can run:\n\n```bash\ngo test ./... # Add desired parameters to this command if you want\n```\n\n## Documentation\n\n**You can find all the documentation here:** [Documentation](https://godoc.org/github.com/hbollon/go-edlib) \n\n## Examples\n\n### Calculate string similarity index between two strings\n\nYou can use the ``` StringsSimilarity(str1, str2, algorithm) ``` function.\nThe **algorithm** parameter must be one of the following constants: \n```go\n// Algorithm identifiers\nconst (\n\tLevenshtein Algorithm = iota\n\tDamerauLevenshtein\n\tOSADamerauLevenshtein\n\tLcs\n\tHamming\n\tJaro\n\tJaroWinkler\n\tCosine\n)\n```\n\nExample with Levenshtein:\n```go\nres, err := edlib.StringsSimilarity(\"string1\", \"string2\", edlib.Levenshtein)\nif err != nil {\n  fmt.Println(err)\n} else {\n  fmt.Printf(\"Similarity: %f\", res)\n}\n```\n\n### Execute fuzzy search based on string similarity algorithm\n\n#### 1. Most matching unique result without threshold\n\nYou can use the ``` FuzzySearch(str, strList, algorithm) ``` function.\n\n```go\nstrList := []string{\"test\", \"tester\", \"tests\", \"testers\", \"testing\", \"tsting\", \"sting\"}\nres, err := edlib.FuzzySearch(\"testnig\", strList, edlib.Levenshtein)\nif err != nil {\n  fmt.Println(err)\n} else {\n  fmt.Printf(\"Result: %s\", res)\n}\n\n```\n\n``` \nResult: testing \n```\n\n#### 2. 
Most matching unique result with threshold\n\nYou can use the ``` FuzzySearchThreshold(str, strList, minSimilarity, algorithm) ``` function.\n\n```go\nstrList := []string{\"test\", \"tester\", \"tests\", \"testers\", \"testing\", \"tsting\", \"sting\"}\nres, err := edlib.FuzzySearchThreshold(\"testnig\", strList, 0.7, edlib.Levenshtein)\nif err != nil {\n  fmt.Println(err)\n} else {\n  fmt.Printf(\"Result for 'testnig': %s\", res)\n}\n\nres, err = edlib.FuzzySearchThreshold(\"hello\", strList, 0.7, edlib.Levenshtein)\nif err != nil {\n  fmt.Println(err)\n} else {\n  fmt.Printf(\"Result for 'hello': %s\", res)\n}\n\n```\n\n``` \nResult for 'testnig': testing\nResult for 'hello':\n```\n\n#### 3. Most matching result set without threshold\n\nYou can use the ``` FuzzySearchSet(str, strList, resultQuantity, algorithm) ``` function.\n\n```go\nstrList := []string{\"test\", \"tester\", \"tests\", \"testers\", \"testing\", \"tsting\", \"sting\"}\nres, err := edlib.FuzzySearchSet(\"testnig\", strList, 3, edlib.Levenshtein)\nif err != nil {\n  fmt.Println(err)\n} else {\n  fmt.Printf(\"Results: %s\", strings.Join(res, \", \"))\n}\n\n```\n\n``` \nResults: testing, test, tester \n```\n\n#### 4. 
Most matching result set with threshold\n\nYou can use the ``` FuzzySearchSetThreshold(str, strList, resultQuantity, minSimilarity, algorithm) ``` function.\n\n```go\nstrList := []string{\"test\", \"tester\", \"tests\", \"testers\", \"testing\", \"tsting\", \"sting\"}\nres, err := edlib.FuzzySearchSetThreshold(\"testnig\", strList, 3, 0.5, edlib.Levenshtein)\nif err != nil {\n  fmt.Println(err)\n} else {\n  fmt.Printf(\"Result for 'testnig' with '0.5' threshold: %s\", strings.Join(res, \" \"))\n}\n\nres, err = edlib.FuzzySearchSetThreshold(\"testnig\", strList, 3, 0.7, edlib.Levenshtein)\nif err != nil {\n  fmt.Println(err)\n} else {\n  fmt.Printf(\"Result for 'testnig' with '0.7' threshold: %s\", strings.Join(res, \" \"))\n}\n\n```\n\n``` \nResult for 'testnig' with '0.5' threshold: testing test tester\nResult for 'testnig' with '0.7' threshold: testing\n```\n\n### Get raw edit distance (Levenshtein, LCS, Damerau–Levenshtein, Hamming)\n\nYou can use one of the following functions to get the edit distance between two strings:\n- [LevenshteinDistance](https://pkg.go.dev/github.com/hbollon/go-edlib#LevenshteinDistance)(str1, str2)\n- [DamerauLevenshteinDistance](https://pkg.go.dev/github.com/hbollon/go-edlib#DamerauLevenshteinDistance)(str1, str2)\n- [OSADamerauLevenshteinDistance](https://pkg.go.dev/github.com/hbollon/go-edlib#OSADamerauLevenshteinDistance)(str1, str2)\n- [LCSEditDistance](https://pkg.go.dev/github.com/hbollon/go-edlib#LCSEditDistance)(str1, str2)\n- [HammingDistance](https://pkg.go.dev/github.com/hbollon/go-edlib#HammingDistance)(str1, str2)\n\nExample with Levenshtein distance:\n```go\nres := edlib.LevenshteinDistance(\"kitten\", \"sitting\")\nfmt.Printf(\"Result: %d\", res)\n```\n\n```\nResult: 3\n```\n\n### LCS, LCS Backtrack and LCS Diff\n#### 1. 
Compute LCS (Longest Common Subsequence) between two strings\n\nYou can use the ``` LCS(str1, str2) ``` function.\n\n```go\nlcs := edlib.LCS(\"ABCD\", \"ACBAD\")\nfmt.Printf(\"Length of their LCS: %d\", lcs)\n```\n\n```\nLength of their LCS: 3\n```\n\n#### 2. Backtrack their LCS\n\nYou can use the ``` LCSBacktrack(str1, str2) ``` function.\n\n```go\nres, err := edlib.LCSBacktrack(\"ABCD\", \"ACBAD\")\nif err != nil {\n  fmt.Println(err)\n} else {\n  fmt.Printf(\"LCS: %s\", res)\n}\n```\n\n```\nLCS: ABD\n```\n\n#### 3. Backtrack all their LCS\n\nYou can use the ``` LCSBacktrackAll(str1, str2) ``` function.\n\n```go\nres, err := edlib.LCSBacktrackAll(\"ABCD\", \"ACBAD\")\nif err != nil {\n  fmt.Println(err)\n} else {\n  fmt.Printf(\"LCS: %s\", strings.Join(res, \", \"))\n}\n```\n\n```\nLCS: ABD, ACD\n```\n\n#### 4. Get LCS Diff between two strings\n\nYou can use the ``` LCSDiff(str1, str2) ``` function.\n\n```go\nres, err := edlib.LCSDiff(\"computer\", \"houseboat\")\nif err != nil {\n  fmt.Println(err)\n} else {\n  fmt.Printf(\"LCS Diff: \\n%s\\n%s\", res[0], res[1])\n}\n```\n\n```\nLCS Diff: \n h c o m p u s e b o a t e r\n + -   - -   + + + + +   - -\n```\n\n## Author\n\n👤 **Hugo Bollon**\n\n* GitHub: [@hbollon](https://github.com/hbollon)\n* LinkedIn: [@Hugo Bollon](https://www.linkedin.com/in/hugo-bollon-68a2381a4/)\n* Portfolio: [hugobollon.me](https://www.hugobollon.me)\n\n## 🤝 Contributing\n\nContributions, issues and feature requests are welcome!\u003cbr /\u003eFeel free to check the [issues page](https://github.com/hbollon/go-edlib/issues). 
\n\n## Show your support\n\nGive a ⭐️ if this project helped you!\n\n## 📝 License\n\nCopyright © 2020 [Hugo Bollon](https://github.com/hbollon).\u003cbr /\u003e\nThis project is licensed under the [MIT License](https://github.com/hbollon/go-edlib/blob/master/LICENSE.md).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhbollon%2Fgo-edlib","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhbollon%2Fgo-edlib","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhbollon%2Fgo-edlib/lists"}