{"id":22509172,"url":"https://github.com/adrg/strutil","last_synced_at":"2025-05-14T06:14:11.382Z","repository":{"id":38317666,"uuid":"221729455","full_name":"adrg/strutil","owner":"adrg","description":"Go metrics for calculating string similarity and other string utility functions","archived":false,"fork":false,"pushed_at":"2025-03-25T13:43:40.000Z","size":112,"stargazers_count":371,"open_issues_count":1,"forks_count":24,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-13T08:58:16.995Z","etag":null,"topics":["dice-coefficient","golang","hamming-distance","jaccard","jaccard-index","jaccard-similarity","jaro","jaro-winkler","levenshtein","n-gram","n-gram-intersection","overlap-coefficient","smith-waterman","smith-waterman-gotoh","sorensen-dice","string","string-distance","string-matching","string-metrics","string-similarity"],"latest_commit_sha":null,"homepage":"https://pkg.go.dev/github.com/adrg/strutil","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/adrg.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"ko_fi":"adrg"}},"created_at":"2019-11-14T15:30:07.000Z","updated_at":"2025-04-07T13:06:06.000Z","dependencies_parsed_at":"2023-09-27T21:18:06.085Z","dependency_job_id":"5d2d3094-9534-46c4-b359-5d5a976fe354","html_url":"https://github.com/adrg/strutil","commit_stats":{"total_commits":89,"total_committers":2,"mean_commits":44.5,"dds":0.1235955056179775,"last_synced_commit":"3a48a1777316e7709308be8f134d3fa15aad05f2"},"previous_names":[],"tags_count":7,"
template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adrg%2Fstrutil","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adrg%2Fstrutil/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adrg%2Fstrutil/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adrg%2Fstrutil/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/adrg","download_url":"https://codeload.github.com/adrg/strutil/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254083857,"owners_count":22011902,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dice-coefficient","golang","hamming-distance","jaccard","jaccard-index","jaccard-similarity","jaro","jaro-winkler","levenshtein","n-gram","n-gram-intersection","overlap-coefficient","smith-waterman","smith-waterman-gotoh","sorensen-dice","string","string-distance","string-matching","string-metrics","string-similarity"],"created_at":"2024-12-07T01:27:54.441Z","updated_at":"2025-05-14T06:14:11.357Z","avatar_url":"https://github.com/adrg.png","language":"Go","readme":"\u003ch1 align=\"center\"\u003estrutil\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"https://github.com/adrg/strutil/actions/workflows/tests.yml\"\u003e\n        \u003cimg alt=\"Tests status\" src=\"https://github.com/adrg/strutil/actions/workflows/tests.yml/badge.svg\"\u003e\n    \u003c/a\u003e\n    \u003ca 
href=\"https://codecov.io/gh/adrg/strutil\"\u003e\n        \u003cimg alt=\"Code coverage\" src=\"https://codecov.io/gh/adrg/strutil/branch/master/graphs/badge.svg?branch=master\" /\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://pkg.go.dev/github.com/adrg/strutil\"\u003e\n        \u003cimg alt=\"pkg.go.dev documentation\" src=\"https://pkg.go.dev/badge/github.com/adrg/strutil\" /\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://opensource.org/licenses/MIT\" rel=\"nofollow\"\u003e\n        \u003cimg alt=\"MIT license\" src=\"https://img.shields.io/github/license/adrg/strutil\" /\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://goreportcard.com/report/github.com/adrg/strutil\"\u003e\n        \u003cimg alt=\"Go report card\" src=\"https://goreportcard.com/badge/github.com/adrg/strutil\" /\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://github.com/adrg/strutil/issues\"\u003e\n        \u003cimg alt=\"GitHub issues\" src=\"https://img.shields.io/github/issues/adrg/strutil\" /\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://ko-fi.com/T6T72WATK\"\u003e\n        \u003cimg alt=\"Buy me a coffee\" src=\"https://img.shields.io/static/v1.svg?label=%20\u0026message=Buy%20me%20a%20coffee\u0026color=579fbf\u0026logo=buy%20me%20a%20coffee\u0026logoColor=white\" /\u003e\n    \u003c/a\u003e\n\u003c/p\u003e\n\nstrutil provides a collection of string metrics for calculating string similarity as well as\nother string utility functions.  \nFull documentation can be found at https://pkg.go.dev/github.com/adrg/strutil.\n\n## Installation\n\n```\ngo get github.com/adrg/strutil\n```\n\n## String metrics\n\n- [Hamming](#hamming)\n- [Levenshtein](#levenshtein)\n- [Jaro](#jaro)\n- [Jaro-Winkler](#jaro-winkler)\n- [Smith-Waterman-Gotoh](#smith-waterman-gotoh)\n- [Sorensen-Dice](#sorensen-dice)\n- [Jaccard](#jaccard)\n- [Overlap Coefficient](#overlap-coefficient)\n\nThe package defines the `StringMetric` interface, which is implemented by all\nthe string metrics. 
The interface is used with the `Similarity` function, which\ncalculates the similarity between the specified strings, using the provided\nstring metric.\n\n```go\ntype StringMetric interface {\n    Compare(a, b string) float64\n}\n\nfunc Similarity(a, b string, metric StringMetric) float64 {\n}\n```\n\nAll defined string metrics can be found in the\n[metrics](https://pkg.go.dev/github.com/adrg/strutil/metrics) package.\n\n#### Hamming\n\nCalculate similarity.\n```go\nsimilarity := strutil.Similarity(\"text\", \"test\", metrics.NewHamming())\nfmt.Printf(\"%.2f\\n\", similarity) // Output: 0.75\n```\n\nCalculate distance.\n```go\nham := metrics.NewHamming()\nfmt.Printf(\"%d\\n\", ham.Distance(\"one\", \"once\")) // Output: 2\n```\n\nMore information and additional examples can be found on\n[pkg.go.dev](https://pkg.go.dev/github.com/adrg/strutil/metrics#Hamming).\n\n#### Levenshtein\n\nCalculate similarity using default options.\n```go\nsimilarity := strutil.Similarity(\"graph\", \"giraffe\", metrics.NewLevenshtein())\nfmt.Printf(\"%.2f\\n\", similarity) // Output: 0.43\n```\n\nConfigure edit operation costs.\n```go\nlev := metrics.NewLevenshtein()\nlev.CaseSensitive = false\nlev.InsertCost = 1\nlev.ReplaceCost = 2\nlev.DeleteCost = 1\n\nsimilarity := strutil.Similarity(\"make\", \"Cake\", lev)\nfmt.Printf(\"%.2f\\n\", similarity) // Output: 0.50\n```\n\nCalculate distance.\n```go\nlev := metrics.NewLevenshtein()\nfmt.Printf(\"%d\\n\", lev.Distance(\"graph\", \"giraffe\")) // Output: 4\n```\n\nMore information and additional examples can be found on\n[pkg.go.dev](https://pkg.go.dev/github.com/adrg/strutil/metrics#Levenshtein).\n\n#### Jaro\n\n```go\nsimilarity := strutil.Similarity(\"think\", \"tank\", metrics.NewJaro())\nfmt.Printf(\"%.2f\\n\", similarity) // Output: 0.78\n```\n\nMore information and additional examples can be found on\n[pkg.go.dev](https://pkg.go.dev/github.com/adrg/strutil/metrics#Jaro).\n\n#### Jaro-Winkler\n\n```go\nsimilarity := 
strutil.Similarity(\"think\", \"tank\", metrics.NewJaroWinkler())\nfmt.Printf(\"%.2f\\n\", similarity) // Output: 0.80\n```\n\nMore information and additional examples can be found on\n[pkg.go.dev](https://pkg.go.dev/github.com/adrg/strutil/metrics#JaroWinkler).\n\n#### Smith-Waterman-Gotoh\n\nCalculate similarity using default options.\n```go\nswg := metrics.NewSmithWatermanGotoh()\nsimilarity := strutil.Similarity(\"times roman\", \"times new roman\", swg)\nfmt.Printf(\"%.2f\\n\", similarity) // Output: 0.82\n```\n\nCustomize gap penalty and substitution function.\n```go\nswg := metrics.NewSmithWatermanGotoh()\nswg.CaseSensitive = false\nswg.GapPenalty = -0.1\nswg.Substitution = metrics.MatchMismatch {\n    Match:    1,\n    Mismatch: -0.5,\n}\n\nsimilarity := strutil.Similarity(\"Times Roman\", \"times new roman\", swg)\nfmt.Printf(\"%.2f\\n\", similarity) // Output: 0.96\n```\n\nMore information and additional examples can be found on\n[pkg.go.dev](https://pkg.go.dev/github.com/adrg/strutil/metrics#SmithWatermanGotoh).\n\n#### Sorensen-Dice\n\nCalculate similarity using default options.\n```go\nsd := metrics.NewSorensenDice()\nsimilarity := strutil.Similarity(\"time to make haste\", \"no time to waste\", sd)\nfmt.Printf(\"%.2f\\n\", similarity) // Output: 0.62\n```\n\nCustomize n-gram size.\n```go\nsd := metrics.NewSorensenDice()\nsd.CaseSensitive = false\nsd.NgramSize = 3\n\nsimilarity := strutil.Similarity(\"Time to make haste\", \"no time to waste\", sd)\nfmt.Printf(\"%.2f\\n\", similarity) // Output: 0.53\n```\n\nMore information and additional examples can be found on\n[pkg.go.dev](https://pkg.go.dev/github.com/adrg/strutil/metrics#SorensenDice).\n\n#### Jaccard\n\nCalculate similarity using default options.\n```go\nj := metrics.NewJaccard()\nsimilarity := strutil.Similarity(\"time to make haste\", \"no time to waste\", j)\nfmt.Printf(\"%.2f\\n\", similarity) // Output: 0.45\n```\n\nCustomize n-gram size.\n```go\nj := metrics.NewJaccard()\nj.CaseSensitive 
= false\nj.NgramSize = 3\n\nsimilarity := strutil.Similarity(\"Time to make haste\", \"no time to waste\", j)\nfmt.Printf(\"%.2f\\n\", similarity) // Output: 0.36\n```\n\nThe input of the Sorensen-Dice example is the same as the one of Jaccard\nbecause the metrics bear a resemblance to each other. In fact, each of the\ncoefficients can be used to calculate the other one.\n\nSorensen-Dice to Jaccard.\n```\nJ = SD/(2-SD)\n\nwhere SD is the Sorensen-Dice coefficient and J is the Jaccard index.\n```\n\nJaccard to Sorensen-Dice.\n```\nSD = 2*J/(1+J)\n\nwhere SD is the Sorensen-Dice coefficient and J is the Jaccard index.\n```\n\nMore information and additional examples can be found on\n[pkg.go.dev](https://pkg.go.dev/github.com/adrg/strutil/metrics#Jaccard).\n\n#### Overlap Coefficient\n\nCalculate similarity using default options.\n```go\noc := metrics.NewOverlapCoefficient()\nsimilarity := strutil.Similarity(\"time to make haste\", \"no time to waste\", oc)\nfmt.Printf(\"%.2f\\n\", similarity) // Output: 0.67\n```\n\nCustomize n-gram size.\n```go\noc := metrics.NewOverlapCoefficient()\noc.CaseSensitive = false\noc.NgramSize = 3\n\nsimilarity := strutil.Similarity(\"Time to make haste\", \"no time to waste\", oc)\nfmt.Printf(\"%.2f\\n\", similarity) // Output: 0.57\n```\n\nMore information and additional examples can be found on\n[pkg.go.dev](https://pkg.go.dev/github.com/adrg/strutil/metrics#OverlapCoefficient).\n\n## References\n\nFor more information see:\n- [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance)\n- [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance)\n- [Jaro-Winkler distance](https://en.wikipedia.org/wiki/Jaro-Winkler_distance)\n- [Smith-Waterman algorithm](https://en.wikipedia.org/wiki/Smith-Waterman_algorithm)\n- [Sorensen-Dice coefficient](https://en.wikipedia.org/wiki/Sorensen–Dice_coefficient)\n- [Jaccard index](https://en.wikipedia.org/wiki/Jaccard_index)\n- [Overlap 
coefficient](https://en.wikipedia.org/wiki/Overlap_coefficient)\n\n## Stargazers over time\n\n[![Stargazers over time](https://starchart.cc/adrg/strutil.svg)](https://starchart.cc/adrg/strutil)\n\n## Contributing\n\nContributions in the form of pull requests, issues, or just general feedback\nare always welcome.  \nSee [CONTRIBUTING.md](CONTRIBUTING.md).\n\n## License\n\nCopyright (c) 2019 Adrian-George Bostan.\n\nThis project is licensed under the [MIT license](https://opensource.org/licenses/MIT).\nSee [LICENSE](LICENSE) for more details.\n","funding_links":["https://ko-fi.com/adrg","https://ko-fi.com/T6T72WATK"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadrg%2Fstrutil","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fadrg%2Fstrutil","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadrg%2Fstrutil/lists"}