{"id":13836760,"url":"https://github.com/schollz/closestmatch","last_synced_at":"2025-07-10T16:30:47.116Z","repository":{"id":57481373,"uuid":"86713880","full_name":"schollz/closestmatch","owner":"schollz","description":"Golang library for fuzzy matching within a set of strings :page_with_curl:","archived":false,"fork":false,"pushed_at":"2022-09-13T03:39:56.000Z","size":656,"stargazers_count":418,"open_issues_count":10,"forks_count":53,"subscribers_count":13,"default_branch":"master","last_synced_at":"2024-08-09T00:41:58.829Z","etag":null,"topics":["fuzzy-matching","golang-library","levenshtein","string-matching"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/schollz.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-03-30T14:43:15.000Z","updated_at":"2024-07-05T00:50:52.000Z","dependencies_parsed_at":"2022-09-26T17:50:29.746Z","dependency_job_id":null,"html_url":"https://github.com/schollz/closestmatch","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/schollz%2Fclosestmatch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/schollz%2Fclosestmatch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/schollz%2Fclosestmatch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/schollz%2Fclosestmatch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/schollz","download_url":"https://codeload.github.com/schollz/closestmatch/tar.gz/refs/head
s/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225647739,"owners_count":17502126,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fuzzy-matching","golang-library","levenshtein","string-matching"],"created_at":"2024-08-04T15:00:53.963Z","updated_at":"2024-11-20T23:31:39.801Z","avatar_url":"https://github.com/schollz.png","language":"Go","funding_links":[],"categories":["Go"],"sub_categories":[],"readme":"\n# closestmatch :page_with_curl:\n\n\u003ca href=\"#\"\u003e\u003cimg src=\"https://img.shields.io/badge/version-2.1.0-brightgreen.svg?style=flat-square\" alt=\"Version\"\u003e\u003c/a\u003e\n\u003ca href=\"https://travis-ci.org/schollz/closestmatch\"\u003e\u003cimg src=\"https://img.shields.io/travis/schollz/closestmatch.svg?style=flat-square\" alt=\"Build Status\"\u003e\u003c/a\u003e\n\u003ca href=\"http://gocover.io/github.com/schollz/closestmatch\"\u003e\u003cimg src=\"https://img.shields.io/badge/coverage-98%25-brightgreen.svg?style=flat-square\" alt=\"Code Coverage\"\u003e\u003c/a\u003e\n\u003ca href=\"https://godoc.org/github.com/schollz/closestmatch\"\u003e\u003cimg src=\"https://img.shields.io/badge/api-reference-blue.svg?style=flat-square\" alt=\"GoDoc\"\u003e\u003c/a\u003e\n\n*closestmatch* is a simple and fast Go library for fuzzy matching an input string to a list of target strings. *closestmatch* is useful for handling input from a user where the input (which could be misspelled or out of order) needs to match a key in a database. 
*closestmatch* uses a [bag-of-words approach](https://en.wikipedia.org/wiki/Bag-of-words_model) to precompute character n-grams to represent each possible target string. The closest matches have the highest overlap between the sets of n-grams. The precomputation scales well and is much faster and more accurate than Levenshtein for long strings.\n\n\nGetting Started\n===============\n\n## Install\n\n```\ngo get -u -v github.com/schollz/closestmatch\n```\n\n## Use \n\n####  Create a *closestmatch* object from a list of words\n\n```golang\n// Take a slice of keys, say band names that are similar\n// http://www.tonedeaf.com.au/412720/38-bands-annoyingly-similar-names.htm\nwordsToTest := []string{\"King Gizzard\", \"The Lizard Wizard\", \"Lizzard Wizzard\"}\n\n// Choose a set of bag sizes, more is more accurate but slower\nbagSizes := []int{2}\n\n// Create a closestmatch object\ncm := closestmatch.New(wordsToTest, bagSizes)\n```\n\n#### Find the closest match, or find the *N* closest matches\n\n```golang\nfmt.Println(cm.Closest(\"kind gizard\"))\n// returns 'King Gizzard'\n\nfmt.Println(cm.ClosestN(\"kind gizard\",3))\n// returns [King Gizzard Lizzard Wizzard The Lizard Wizard]\n```\n\n#### Calculate the accuracy\n\n```golang\n// Calculate accuracy\nfmt.Println(cm.AccuracyMutatingWords())\n// ~ 66 % (still way better than Levenshtein which hits 0% with this particular set)\n\n// Improve accuracy by adding more bags\nbagSizes = []int{2, 3, 4}\ncm = closestmatch.New(wordsToTest, bagSizes)\nfmt.Println(cm.AccuracyMutatingWords())\n// accuracy improves to ~ 76 %\n```\n\n#### Save/Load\n\n```golang\n// Save your current calculated bags\ncm.Save(\"closestmatches.gob\")\n\n// Open it again\ncm2, _ := closestmatch.Load(\"closestmatches.gob\")\nfmt.Println(cm2.Closest(\"lizard wizard\"))\n// prints \"The Lizard Wizard\"\n```\n\n### Advantages\n\n*closestmatch* is more accurate than Levenshtein for long strings (like in the test corpus). 
\n\n*closestmatch* is ~20x faster than [a fast implementation of Levenshtein](https://groups.google.com/forum/#!topic/golang-nuts/YyH1f_qCZVc). Try it yourself with the benchmarks:\n\n```bash\ncd $GOPATH/src/github.com/schollz/closestmatch \u0026\u0026 go test -run=None -bench=. \u003e closestmatch.bench\ncd $GOPATH/src/github.com/schollz/closestmatch/levenshtein \u0026\u0026 go test -run=None -bench=. \u003e levenshtein.bench\nbenchcmp levenshtein.bench ../closestmatch.bench\n```\n\nwhich gives the following benchmark (on Intel i7-3770 CPU @ 3.40GHz w/ 8 processors):\n\n```bash\nbenchmark                 old ns/op     new ns/op     delta\nBenchmarkNew-8            1.47          1933870       +131555682.31%\nBenchmarkClosestOne-8     104603530     4855916       -95.36%\n```\n\nThe `New()` function in *closestmatch* is much slower than in *levenshtein* because it has to do the precomputation up front.\n\n### Disadvantages\n\n*closestmatch* does worse for matching lists of single words, like a dictionary. For comparison:\n\n\n```\n$ cd $GOPATH/src/github.com/schollz/closestmatch \u0026\u0026 go test\nAccuracy with mutating words in book list:      90.0%\nAccuracy with mutating letters in book list:    100.0%\nAccuracy with mutating letters in dictionary:   38.9%\n```\n\nwhile levenshtein performs slightly better for a single-word dictionary (but worse for longer names, like book titles):\n\n```\n$ cd $GOPATH/src/github.com/schollz/closestmatch/levenshtein \u0026\u0026 go test\nAccuracy with mutating words in book list:      40.0%\nAccuracy with mutating letters in book list:    100.0%\nAccuracy with mutating letters in dictionary:   64.8%\n```\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fschollz%2Fclosestmatch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fschollz%2Fclosestmatch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fschollz%2Fclosestmatch/lists"}