{"id":13600218,"url":"https://github.com/dexyk/stringosim","last_synced_at":"2025-04-10T21:31:32.948Z","repository":{"id":57497587,"uuid":"55996413","full_name":"dexyk/stringosim","owner":"dexyk","description":"String similarity functions, String distance's, Jaccard, Levenshtein, Hamming, Jaro-Winkler, Q-grams, N-grams, LCS - Longest Common Subsequence, Cosine similarity...","archived":false,"fork":false,"pushed_at":"2017-09-22T10:59:21.000Z","size":17,"stargazers_count":60,"open_issues_count":0,"forks_count":8,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-12-11T09:23:22.339Z","etag":null,"topics":["comparison","cosine-distance","distance","jaccard","jaro-distance","jaro-winkler","levenshtein","string-distance"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dexyk.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-04-11T18:21:23.000Z","updated_at":"2024-08-19T08:13:58.000Z","dependencies_parsed_at":"2022-09-03T23:20:55.053Z","dependency_job_id":null,"html_url":"https://github.com/dexyk/stringosim","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dexyk%2Fstringosim","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dexyk%2Fstringosim/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dexyk%2Fstringosim/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dexyk%2Fstringosim/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/owners/dexyk","download_url":"https://codeload.github.com/dexyk/stringosim/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248301441,"owners_count":21080893,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["comparison","cosine-distance","distance","jaccard","jaro-distance","jaro-winkler","levenshtein","string-distance"],"created_at":"2024-08-01T18:00:32.532Z","updated_at":"2025-04-10T21:31:27.929Z","avatar_url":"https://github.com/dexyk.png","language":"Go","funding_links":[],"categories":["Go"],"sub_categories":[],"readme":"# stringosim\n\nThe plan for this package is to provide Go implementations of different string distance/similarity functions, such as Levenshtein (normalized, weighted, Damerau), Jaro-Winkler, Jaccard index, Euclidean distance, Hamming distance...\n\nCurrently implemented:\n - Levenshtein\n - Jaccard\n - Hamming\n - LCS\n - Jaro and Jaro-Winkler\n - Q-gram\n - n-gram based Cosine distance\n\nWork in progress...\n\n## Import and installation\n\nTo get the library, run:\n\n```shell\n    go get github.com/dexyk/stringosim\n```\n\nTo use the library, import it in your code:\n\n```go\n    import \"github.com/dexyk/stringosim\"\n```\n\nTo run the tests, go to the directory where the stringosim package is installed and run:\n\n```shell\n    go test\n```\n\n## Usage\n\nCurrently the Levenshtein, Jaccard, Hamming, LCS, Jaro, Jaro-Winkler, Q-gram and Cosine measures are implemented.\n\n#### Levenshtein\n\nLevenshtein distance can be calculated with default parameters (use DefaultSimilarityOptions) where cost of insert, delete 
and substitute operations are each 1. You can also use it with other parameters by passing a LevenshteinSimilarityOptions value. When CaseInsensitive is set to true in the options, the comparison ignores character case.\n\nExample:\n\n```go\n    fmt.Println(stringosim.Levenshtein([]rune(\"stringosim\"), []rune(\"stingobim\")))\n\n    fmt.Println(stringosim.Levenshtein([]rune(\"stringosim\"), []rune(\"stingobim\"),\n    stringosim.LevenshteinSimilarityOptions{\n        InsertCost:     3,\n        DeleteCost:     5,\n        SubstituteCost: 2,\n    }))\n\n    fmt.Println(stringosim.Levenshtein([]rune(\"stringosim\"), []rune(\"STRINGOSIM\"),\n    stringosim.LevenshteinSimilarityOptions{\n        InsertCost:      3,\n        DeleteCost:      4,\n        SubstituteCost:  5,\n        CaseInsensitive: true,\n    }))\n```\n\n#### Jaccard\n\nJaccard distance is calculated over n-grams, and the n-gram size used for the comparison can be set. If the size is omitted, the default value of 1 is used.\n\nExample:\n\n```go\n    fmt.Println(stringosim.Jaccard([]rune(\"stringosim\"), []rune(\"stingobim\")))\n\n    fmt.Println(stringosim.Jaccard([]rune(\"stringosim\"), []rune(\"stingobim\"), []int{2}))\n\n    fmt.Println(stringosim.Jaccard([]rune(\"stringosim\"), []rune(\"stingobim\"), []int{3}))\n```\n\n#### Hamming\n\nHamming distance can be calculated with options. By default, the standard Hamming distance is calculated with case-sensitive comparison. 
It can also be used with case-insensitive comparison.\n\nIf the strings to compare have different lengths, an error is returned.\n\nExample:\n\n```go\n    dis, _ := stringosim.Hamming([]rune(\"testing\"), []rune(\"restink\"))\n    fmt.Println(dis)\n\n    dis, _ = stringosim.Hamming([]rune(\"testing\"), []rune(\"FESTING\"), stringosim.HammingSimilarityOptions{\n        CaseInsensitive: true,\n    })\n    fmt.Println(dis)\n\n    _, err := stringosim.Hamming([]rune(\"testing\"), []rune(\"testin\"))\n    fmt.Println(err)\n```\n\n#### Longest Common Subsequence (LCS)\n\nLCS between two strings can be calculated with options. Called without options, the function uses its default settings; passing LCSSimilarityOptions with CaseInsensitive set to true makes the comparison ignore case.\n\nExample:\n\n```go\n    fmt.Println(stringosim.LCS([]rune(\"testing lcs algorithm\"), []rune(\"another l c s example\")))\n\n    fmt.Println(stringosim.LCS([]rune(\"testing lcs algorithm\"), []rune(\"ANOTHER L C S EXAMPLE\"),\n    stringosim.LCSSimilarityOptions{\n        CaseInsensitive: true,\n    }))\n```\n\n#### Jaro and Jaro-Winkler\n\nJaro and Jaro-Winkler can be calculated with options: case-insensitive comparison and, for Jaro-Winkler, specific values for the threshold, p value and l value.\n\nExample:\n\n```go\n    fmt.Println(stringosim.Jaro([]rune(\"abaccbabaacbcb\"), []rune(\"bababbcabbaaca\")))\n    fmt.Println(stringosim.Jaro([]rune(\"abaccbabaacbcb\"), []rune(\"ABABAbbCABbaACA\"),\n    stringosim.JaroSimilarityOptions{\n        CaseInsensitive: true,\n    }))\n\n    fmt.Println(stringosim.JaroWinkler([]rune(\"abaccbabaacbcb\"), []rune(\"bababbcabbaaca\")))\n    fmt.Println(stringosim.JaroWinkler([]rune(\"abaccbabaacbcb\"), []rune(\"BABAbbCABbaACA\"),\n    stringosim.JaroSimilarityOptions{\n        CaseInsensitive: true,\n        Threshold:       0.7,\n        PValue:          0.1,\n        LValue:          4,\n    }))\n```\n\n#### Q-gram\n\nQ-gram distance can be calculated using default options (DefaultQGramOptions): length 
of q-grams is 2 and comparison is case sensitive. Passing QGramSimilarityOptions to the function lets you set a custom q-gram length and choose whether the comparison is case sensitive.\n\nExample:\n\n```go\n    fmt.Println(stringosim.QGram([]rune(\"abcde\"), []rune(\"abdcde\")))\n\n    fmt.Println(stringosim.QGram([]rune(\"abcde\"), []rune(\"ABDCDE\"),\n    stringosim.QGramSimilarityOptions{\n        CaseInsensitive: true,\n        NGramSizes:      []int{3},\n    }))\n```\n\n#### Cosine\n\nCosine distance can be calculated using default options (DefaultCosineOptions): length of n-grams is 2 and comparison is case sensitive. Passing CosineSimilarityOptions to the function lets you set a custom n-gram length and choose whether the comparison is case sensitive.\n\nExample:\n\n```go\n    fmt.Println(stringosim.Cosine([]rune(\"abcde\"), []rune(\"abdcde\")))\n\n    fmt.Println(stringosim.Cosine([]rune(\"abcde\"), []rune(\"ABDCDE\"),\n    stringosim.CosineSimilarityOptions{\n        CaseInsensitive: true,\n        NGramSizes:      []int{3},\n    }))\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdexyk%2Fstringosim","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdexyk%2Fstringosim","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdexyk%2Fstringosim/lists"}