{"id":16254190,"url":"https://github.com/sumn2u/string-comparisons","last_synced_at":"2025-03-19T21:30:26.476Z","repository":{"id":92638336,"uuid":"173421344","full_name":"sumn2u/string-comparisons","owner":"sumn2u","description":"A collection of string comparisons algorithms","archived":false,"fork":false,"pushed_at":"2024-04-18T20:13:50.000Z","size":717,"stargazers_count":14,"open_issues_count":1,"forks_count":5,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-03-17T10:54:32.573Z","etag":null,"topics":["algorithms","cosine-similarity","damerau-levenshtein","distance","hamming-distance","jaccard-similarity","jaro-winkler-distance","javascript","levenshtein-distance","similarity-measures","smith-waterman","sorensen-dice-distance","string-comparison","string-distance","trigrams"],"latest_commit_sha":null,"homepage":"https://sumn2u.github.io/string-comparisons/","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sumn2u.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-03-02T08:10:34.000Z","updated_at":"2024-11-26T21:00:53.000Z","dependencies_parsed_at":"2024-10-27T21:32:30.582Z","dependency_job_id":"7e77b9f4-f02c-4fc3-8263-8099225da44d","html_url":"https://github.com/sumn2u/string-comparisons","commit_stats":null,"previous_names":["sumn2u/string-comparisons"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sumn2u%2Fstring-comparisons","tags_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/repositories/sumn2u%2Fstring-comparisons/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sumn2u%2Fstring-comparisons/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sumn2u%2Fstring-comparisons/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sumn2u","download_url":"https://codeload.github.com/sumn2u/string-comparisons/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244507826,"owners_count":20463689,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algorithms","cosine-similarity","damerau-levenshtein","distance","hamming-distance","jaccard-similarity","jaro-winkler-distance","javascript","levenshtein-distance","similarity-measures","smith-waterman","sorensen-dice-distance","string-comparison","string-distance","trigrams"],"created_at":"2024-10-10T15:20:24.822Z","updated_at":"2025-03-19T21:30:26.071Z","avatar_url":"https://github.com/sumn2u.png","language":"JavaScript","readme":"# String Comparisons\n\u003cspan class=\"badge-npmversion\"\u003e\u003ca href=\"https://npmjs.org/package/string-comparisons\" title=\"View this project on NPM\"\u003e\u003cimg src=\"https://img.shields.io/npm/v/string-comparisons.svg\" alt=\"NPM version\" /\u003e\u003c/a\u003e\u003c/span\u003e\n![npm](https://img.shields.io/npm/dm/string-comparisons)\n[![GitHub stars](https://img.shields.io/github/stars/sumn2u/string-comparisons)](https://github.com/sumn2u/string-comparisons/stargazers)\n[![GitHub 
license](https://img.shields.io/github/license/sumn2u/string-comparisons)](https://github.com/sumn2u/string-comparisons/blob/master/LICENCE)\n![example workflow](https://github.com/sumn2u/string-comparisons/actions/workflows/static.yml/badge.svg)\n\nThis library provides functions for measuring text similarity, so an application can quantify how alike two pieces of text are. It implements well-established similarity and distance metrics and currently supports the following algorithms:\n\n- **Cosine Similarity**\n- **Jaccard Similarity**\n- **Jaro Similarity**\n- **Damerau-Levenshtein Distance**\n- **Hamming Distance**\n- **Levenshtein Distance**\n- **Smith-Waterman Alignment**\n- **Sørensen-Dice Coefficient**\n- **Jaccard Similarity based on Trigrams**\n- **Szymkiewicz-Simpson Overlap**\n- **N-Gram**\n- **Q-Gram**\n- **Optimal String Alignment**\n\n## Installation\n\nAssuming you have [Node.js](https://nodejs.org/en) and [npm](https://www.npmjs.com)/[yarn](https://yarnpkg.com)/[pnpm](https://pnpm.io/) installed, install the library with your package manager of choice:\n\n```bash\n# Install the 'string-comparisons' package using npm\nnpm install string-comparisons\n\n# Alternatively, install the 'string-comparisons' package using yarn\nyarn add string-comparisons\n\n# Or, install the 'string-comparisons' package using pnpm\npnpm add string-comparisons\n```\n\n## Docs\nFor more information on each implemented [algorithm](algorithms.md), see the [class documentation](https://sumn2u.github.io/string-comparisons).\n\n## String Similarity Algorithm Comparison\n\n| Algorithm              | Normalized | Metric                                  | Similarity | Distance | Space Complexity |\n|------------------------|------------|-----------------------------------------|------------|----------|------------------|\n| cosine.js              | Yes        | Vector Space Model                      | ✓          |          | O(n)             |\n| jaro.js                | 
No         | Edit Distance                           | ✓          |          | O(min(n, m))     |\n| jaccard.js             | No         | Set Theory                              | ✓          |          | O(min(n, m))     |\n| damerauLevenshtein.js | No         | Edit Distance                           |            | ✓        | O(max(n, m)²)    |\n| hammingDistance.js     | No         | Bitwise Operations                      | ✓          |          | O(1)             |\n| jaroWinkler.js         | No         | Edit Distance                           | ✓          |          | O(min(n, m))     |\n| levenshtein.js         | No         | Edit Distance                           |            | ✓        | O(max(n, m)²)    |\n| smithWaterman.js       | No         | Dynamic Programming (Local Alignment)  | ✓          |          | O(n * m)         |\n| sorensenDice.js        | No         | Set Theory                              | ✓          |          | O(min(n, m))     |\n| trigram.js             | No         | N-gram Overlap                          | ✓          |          | O(n²)            |\n| szymkiewiczSimpsonOverlap.js             | Yes         | Overlap Coefficient                          |  ✓         |          | O(min(m, n))            |\n| nGram.js             | Yes         | Jaccard similarity coefficient                          | ✓          |          | O(m * n)            |\n| qGram.js             | Yes         | Jaccard similarity coefficient                          | ✓          |          | O(n + m)            |\n| optimalStringAlignment.js             | No         | Edit distance                          |         |      ✓      | O(max(n, m)²)             |\n\n**Explanation of Columns:**\n\n- **Normalized:** Indicates whether the algorithm produces a score between 0 and 1 (normalized).\n- **Metric:** The underlying mathematical concept used for comparison.\n- **Similarity:** Whether the algorithm outputs a higher score for more similar strings.\n- 
**Distance:** Whether the algorithm outputs a lower score for more similar strings. Similarity and distance convey opposite information: a higher similarity, or a lower distance, means the strings are more alike.\n- **Space Complexity:** The amount of extra memory the algorithm needs to run the comparison.\n\n**Notes:**\n\n- ✓ indicates the algorithm applies to that category.\n- Some algorithms can be used for both similarity and distance calculations, depending on how the score is interpreted.\n\n\n## Example Usage\n\n```javascript\nimport StringComparisons from 'string-comparisons';\n\nconst { Cosine, Jaccard, Jaro, DamerauLevenshtein, HammingDistance, JaroWrinker, Levenshtein, SmithWaterman, SorensenDice, Trigram } = StringComparisons;\n\nconst string1 = 'programming';\nconst string2 = 'programmer';\n\nconsole.log('Jaro-Winkler similarity:', JaroWrinker.similarity(string1, string2)); // Output: ~0.9054545454545454\nconsole.log('Levenshtein distance:', Levenshtein.similarity(string1, string2)); // Output: 3\nconsole.log('Smith-Waterman similarity:', SmithWaterman.similarity(string1, string2)); // Output: 16\n\nconst set1 = new Set([1, 2, 3]);\nconst set2 = new Set([2, 3, 4]);\n\nconsole.log('Sørensen-Dice similarity:', SorensenDice.similarity(set1, set2)); // Output: 0.6666666666666667\n\nconst trigram1 = 'hello';\nconst trigram2 = 'world';\n\nconsole.log('Trigram Jaccard similarity:', Trigram.similarity(trigram1, trigram2)); // Output: 0 (no shared trigrams)\n\n// ...and so on for the other algorithms\n```\n\n## Contributing\n\nWe welcome contributions to this library! Feel free to fork the repository, make your changes, and submit a pull request.\n\n## Support the Project \u003ca name=\"support-the-project\"\u003e\u003c/a\u003e⭐\n\nIf you feel awesome and want to support us in a small way, please consider starring and sharing the repo! This helps the project gain visibility and allows the community to grow. 
🙏\n\n## Contact Us\nIf you have any questions or feedback, please don't hesitate to contact us at sumn2u@gmail.com, or reach out to Suman directly. We hope you find this resource helpful 💜.\n\n## License Information\nThis project is licensed under the [MIT License](./LICENSE), which means that you are free to use, modify, and distribute the code as long as you comply with the terms of the license.\n\n## Resources\n- [String Similarity Comparison in JS with Examples](https://sumn2u.medium.com/string-similarity-comparision-in-js-with-examples-4bae35f13968)\n- [Cosine similarity between two sentences](https://sumn2u.medium.com/cosine-similarity-between-two-sentences-8f6630b0ebb7)\n- [The complete guide to string similarity algorithms](https://yassineelkhal.medium.com/the-complete-guide-to-string-similarity-algorithms-1290ad07c6b7)\n- [N-Gram Similarity and Distance](https://webdocs.cs.ualberta.ca/~kondrak/papers/spire05.pdf)\n- [Approximate string-matching with q-grams and maximal matches](https://www.sciencedirect.com/science/article/pii/0304397592901434)\n- [Research on string similarity algorithm based on Levenshtein Distance](https://ieeexplore.ieee.org/document/8054419)\n- [String similarity search and join: a survey](https://link.springer.com/article/10.1007/s11704-015-5900-5)","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsumn2u%2Fstring-comparisons","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsumn2u%2Fstring-comparisons","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsumn2u%2Fstring-comparisons/lists"}