{"id":13645381,"url":"https://github.com/hyperjumptech/beda","last_synced_at":"2025-05-14T15:30:39.509Z","repository":{"id":54546047,"uuid":"267521753","full_name":"hyperjumptech/beda","owner":"hyperjumptech","description":"Beda is a golang library for detecting how similar a two string","archived":false,"fork":false,"pushed_at":"2021-02-11T14:01:11.000Z","size":21,"stargazers_count":54,"open_issues_count":1,"forks_count":5,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-04-02T20:05:53.019Z","etag":null,"topics":["difference","go","golang","string-distance","string-matching","string-similarity"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hyperjumptech.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE-2.0.txt","code_of_conduct":"CODE_OF_CONDUCTS.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-05-28T07:23:36.000Z","updated_at":"2025-03-21T16:29:01.000Z","dependencies_parsed_at":"2022-08-13T19:20:15.404Z","dependency_job_id":null,"html_url":"https://github.com/hyperjumptech/beda","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hyperjumptech%2Fbeda","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hyperjumptech%2Fbeda/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hyperjumptech%2Fbeda/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hyperjumptech%2Fbeda/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hyperjumptech","download_url":"https://codeload.gi
thub.com/hyperjumptech/beda/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254171543,"owners_count":22026459,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["difference","go","golang","string-distance","string-matching","string-similarity"],"created_at":"2024-08-02T01:02:34.270Z","updated_at":"2025-05-14T15:30:38.687Z","avatar_url":"https://github.com/hyperjumptech.png","language":"Go","readme":"# BEDA\r\n\r\n[![Build Status](https://travis-ci.org/hyperjumptech/beda.svg?branch=master)](https://travis-ci.org/hyperjumptech/beda)\r\n[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)\r\n\r\n## Get BEDA\r\n\r\n```\r\ngo get github.com/hyperjumptech/beda\r\n```\r\n\r\n## Introduction\r\n\r\n**BEDA** is a Go library for detecting differences or similarities between two words or strings.\r\nSometimes you want to detect whether a string is \"the same as\" or \"somewhat similar to\" another string.\r\nSuppose your system needs to detect when a user puts a bad word in their user name, or\r\nto forbid users from using unwanted words in their posts. You would need to implement a *not so easy*\r\nalgorithm to do this.\r\n\r\n**BEDA** contains implementations of algorithms for detecting word differences. They are:\r\n\r\n1. Levenshtein Distance : A string metric for measuring the difference between two sequences. [Wikipedia](https://en.wikipedia.org/wiki/Levenshtein_distance)\r\n2. 
Trigram or n-gram : A contiguous sequence of n items from a given sample of text or speech. [Wikipedia](https://en.wikipedia.org/wiki/N-gram)\r\n3. Jaro \u0026 Jaro Winkler Distance : A string metric measuring an edit distance between two sequences. [Wikipedia](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance)\r\n\r\n**BEDA** is an Indonesian word for \"different\".\r\n\r\n## Usage\r\n\r\n```go\r\npackage main\r\n\r\nimport (\r\n\t\"fmt\"\r\n\r\n\t\"github.com/hyperjumptech/beda\"\r\n)\r\n\r\nfunc main() {\r\n\tsd := beda.NewStringDiff(\"The First String\", \"The Second String\")\r\n\tlDist := sd.LevenshteinDistance()\r\n\ttDiff := sd.TrigramCompare()\r\n\tjDiff := sd.JaroDistance()\r\n\tjwDiff := sd.JaroWinklerDistance(0.1)\r\n\r\n\t// Levenshtein returns an int; the other three return float32 scores.\r\n\tfmt.Printf(\"Levenshtein Distance is %d \\n\", lDist)\r\n\tfmt.Printf(\"Trigram Compare is %f \\n\", tDiff)\r\n\tfmt.Printf(\"Jaro Distance is %f \\n\", jDiff)\r\n\tfmt.Printf(\"Jaro Winkler Distance is %f \\n\", jwDiff)\r\n}\r\n```\r\n\r\n## Algorithms and APIs\r\n\r\nString comparison is not so easy.\r\nThere are several algorithms for this comparison, and each of them yields a different result,\r\nso one may suit a given purpose better than another.\r\n\r\nTo understand which algorithm best fits your string comparison task, and when to use it,\r\nplease read [String similarity algorithms compared](https://medium.com/@appaloosastore/string-similarity-algorithms-compared-3f7b4d12f0ff).\r\nRead it through; it will help you a lot.\r\n\r\n```go\r\ntype StringDiff struct {\r\n\tS1 string\r\n\tS2 string\r\n}\r\n```\r\n\r\n### Levenshtein Distance\r\n\r\nLevenshteinDistance is the minimum number of single-character edits\r\nrequired to change one word into the other, so the result is a non-negative\r\ninteger. The algorithm is sensitive to string length. 
This makes it more difficult to draw patterns.\r\n\r\nReading :\r\n\r\n- [https://github.com/mhutter/string-similarity](https://github.com/mhutter/string-similarity)\r\n- [https://en.wikipedia.org/wiki/Levenshtein_distance](https://en.wikipedia.org/wiki/Levenshtein_distance)\r\n\r\nAPI :\r\n\r\n```go\r\nfunc LevenshteinDistance(s1, s2 string) int\r\nfunc (sd *StringDiff) LevenshteinDistance() int\r\n```\r\n\r\n`s1` is the first string to compare\u003cbr\u003e\r\n`s2` is the second string to compare\u003cbr\u003e\r\nThe closer the return value is to 0, the more similar the two words are.\r\n\r\nExample :\r\n\r\n```go\r\nsd := beda.NewStringDiff(\"abcd\", \"bc\")\r\nlDist := sd.LevenshteinDistance()\r\nfmt.Printf(\"Distance is %d \\n\", lDist) // prints : Distance is 2\r\n```\r\n\r\nor\r\n\r\n```go\r\nfmt.Printf(\"Distance is %d \\n\", beda.LevenshteinDistance(\"abcd\", \"bc\"))\r\n```\r\n\r\n\r\n### Damerau-Levenshtein Distance\r\n\r\n(From [Wikipedia](https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance))\r\nDamerau-Levenshtein Distance is a string metric for measuring the edit distance between two \r\nsequences. 
Informally, the Damerau–Levenshtein distance between two words is the minimum \r\nnumber of operations (consisting of insertions, deletions or substitutions of a single \r\ncharacter, or transposition of two adjacent characters) required to change one word into the other.\r\n\r\nThe Damerau–Levenshtein distance differs from the classical Levenshtein distance by \r\nincluding transpositions among its allowable operations in addition to the three classical \r\nsingle-character edit operations (insertions, deletions and substitutions).\r\n\r\nReading :\r\n\r\n- [https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance](https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance)\r\n\r\nAPI :\r\n\r\n```go\r\nfunc DamerauLevenshteinDistance(s1, s2 string) int\r\nfunc (sd *StringDiff) DamerauLevenshteinDistance(deleteCost, insertCost, replaceCost, swapCost int) int\r\n```\r\n\r\n`func DamerauLevenshteinDistance` takes 2 arguments,\u003cbr\u003e\r\n`s1` is the first string to compare\u003cbr\u003e\r\n`s2` is the second string to compare\u003cbr\u003e\r\nThe closer the return value is to 0, the more similar the two words are.\r\nThis function uses the default value of 1 for `deleteCost`, `insertCost`, `replaceCost` and `swapCost`.\r\n\r\n`func (sd *StringDiff) DamerauLevenshteinDistance` takes 4 arguments,\u003cbr\u003e\r\n`deleteCost` is the multiplier factor for the delete operation\u003cbr\u003e\r\n`insertCost` is the multiplier factor for the insert operation\u003cbr\u003e\r\n`replaceCost` is the multiplier factor for the replace operation\u003cbr\u003e\r\n`swapCost` is the multiplier factor for the swap operation\u003cbr\u003e\r\nA multiplier lets us weight how much each operation \r\ncontributes to the distance.\r\n\r\n\r\nExample :\r\n\r\n```go\r\nsd := beda.NewStringDiff(\"abcd\", \"bc\")\r\nlDist := sd.DamerauLevenshteinDistance(1, 1, 1, 1)\r\nfmt.Printf(\"Distance is %d \\n\", lDist) // prints : Distance is 
2\r\n```\r\n\r\nor\r\n\r\n```go\r\nfmt.Printf(\"Distance is %d \\n\", beda.DamerauLevenshteinDistance(\"abcd\", \"bc\"))\r\n```\r\n\r\n\r\n### TriGram Compare\r\n\r\nTrigramCompare is a case of n-gram, a contiguous sequence of n (three, in this case) items from a given sample.\r\nIn our case, each input string is a sample and a character is an item.\r\n\r\nReading :\r\n\r\n- [https://github.com/milk1000cc/trigram/blob/master/lib/trigram.rb](https://github.com/milk1000cc/trigram/blob/master/lib/trigram.rb)\r\n- [http://search.cpan.org/dist/String-Trigram/Trigram.pm](http://search.cpan.org/dist/String-Trigram/Trigram.pm)\r\n- [https://en.wikipedia.org/wiki/N-gram](https://en.wikipedia.org/wiki/N-gram)\r\n\r\nAPI :\r\n\r\n```go\r\nfunc TrigramCompare(s1, s2 string) float32\r\nfunc (sd *StringDiff) TrigramCompare() float32\r\n```\r\n\r\n`s1` is the first string to compare\u003cbr\u003e\r\n`s2` is the second string to compare\u003cbr\u003e\r\nThe closer the result is to 1 (one), the more similar the two strings are in their trigram sequences.\r\n\r\nExample :\r\n\r\n```go\r\nsd := beda.NewStringDiff(\"martha\", \"marhta\")\r\ndiff := sd.TrigramCompare()\r\nfmt.Printf(\"Differences is %f \\n\", diff) \r\n```\r\n\r\nor\r\n\r\n```go\r\nfmt.Printf(\"Distance is %f \\n\", beda.TrigramCompare(\"martha\", \"marhta\"))\r\n```\r\n\r\n### Jaro Distance\r\n\r\nJaroDistance measures the similarity between two words based on\r\nthe number of matching characters and the number of transpositions\r\namong those matches.\r\n\r\nAPI :\r\n\r\n```go\r\nfunc JaroDistance(s1, s2 string) float32\r\nfunc (sd *StringDiff) JaroDistance() float32\r\n```\r\n\r\n`s1` is the first string to compare\u003cbr\u003e\r\n`s2` is the second string to compare\u003cbr\u003e\r\nThe closer the result is to 1 (one), the more similar the two words are.\r\n\r\nExample :\r\n\r\n```go\r\nsd := beda.NewStringDiff(\"martha\", \"marhta\")\r\ndiff := sd.JaroDistance()\r\nfmt.Printf(\"Differences is %f \\n\", diff) 
\r\n```\r\n\r\nor\r\n\r\n```go\r\nfmt.Printf(\"Distance is %f \\n\", beda.JaroDistance(\"martha\", \"marhta\"))\r\n```\r\n\r\n### Jaro Winkler Distance\r\n\r\nJaroWinklerDistance uses a prefix scale which gives more\r\nfavourable ratings to strings that match from the beginning\r\nfor a set prefix length.\r\n\r\nReading :\r\n\r\n- [https://github.com/flori/amatch](https://github.com/flori/amatch)\r\n- [https://fr.wikipedia.org/wiki/Distance_de_Jaro-Winkler](https://fr.wikipedia.org/wiki/Distance_de_Jaro-Winkler)\r\n- [https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance)\r\n\r\nAPI :\r\n\r\n```go\r\nfunc JaroWinklerDistance(s1, s2 string) float32\r\nfunc (sd *StringDiff) JaroWinklerDistance(p float32) float32\r\n```\r\n\r\n`s1` is the first string to compare\u003cbr\u003e\r\n`s2` is the second string to compare\u003cbr\u003e\r\n`p` is the constant scaling factor for how much the score is adjusted upwards for having common prefixes.\r\nThe standard value for this constant in Winkler’s work is `p = 0.1`.\r\n\r\nThe closer the result is to 1 (one), the more similar the two words are.\r\n\r\nExample :\r\n\r\n```go\r\nsd := beda.NewStringDiff(\"martha\", \"marhta\")\r\ndiff := sd.JaroWinklerDistance(0.1)\r\nfmt.Printf(\"Differences is %f \\n\", diff) \r\n```\r\n\r\nor\r\n\r\n```go\r\nfmt.Printf(\"Distance is %f \\n\", beda.JaroWinklerDistance(\"martha\", \"marhta\"))\r\n```\r\n\r\n# Tasks and Help Wanted.\r\n\r\nYes. 
We need contributors to make **BEDA** even better and more useful to the open source community.\r\n\r\nIf you want to help us, simply fork the project and open a pull request.\r\nPlease read our [Contribution Manual](CONTRIBUTING.md) and [Code of Conduct](CODE_OF_CONDUCTS.md).","funding_links":[],"categories":["Go"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhyperjumptech%2Fbeda","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhyperjumptech%2Fbeda","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhyperjumptech%2Fbeda/lists"}