{"id":18998705,"url":"https://github.com/foxxmd/string-sameness","last_synced_at":"2025-04-22T14:43:34.152Z","repository":{"id":150875096,"uuid":"623592443","full_name":"FoxxMD/string-sameness","owner":"FoxxMD","description":"Compare the sameness of two strings","archived":false,"fork":false,"pushed_at":"2024-06-25T13:46:55.000Z","size":260,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-29T16:04:54.800Z","etag":null,"topics":["compare","cosine-similarity","dice-coefficient","levenshtein-distance","sameness","string","text","typescript"],"latest_commit_sha":null,"homepage":"https://foxxmd.github.io/string-sameness/","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/FoxxMD.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-04T17:21:17.000Z","updated_at":"2024-06-25T13:46:27.000Z","dependencies_parsed_at":null,"dependency_job_id":"30b10ce8-abbe-4d86-8bc0-d035788eaff0","html_url":"https://github.com/FoxxMD/string-sameness","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FoxxMD%2Fstring-sameness","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FoxxMD%2Fstring-sameness/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FoxxMD%2Fstring-sameness/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FoxxMD%2Fstring-sameness
/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/FoxxMD","download_url":"https://codeload.github.com/FoxxMD/string-sameness/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249273915,"owners_count":21241992,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compare","cosine-similarity","dice-coefficient","levenshtein-distance","sameness","string","text","typescript"],"created_at":"2024-11-08T17:47:42.416Z","updated_at":"2025-04-16T20:31:53.451Z","avatar_url":"https://github.com/FoxxMD.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# string-sameness\n\nGenerate scores that represent how similar two strings are based on different string comparison algorithms.\n\nScores from all used algorithms are averaged and then weighted by the length of the content being compared (more weight for longer content).\n\nThe sameness is then given a **score of 0 to 100.**\n\n* 0 =\u003e Totally unique pieces of content\n* 100 =\u003e Identical content\n\n# Install/Usage\n\n```\nnpm install @foxxmd/string-sameness\n```\n\n```js\nimport {stringSameness} from '@foxxmd/string-sameness';\n\nconst result = stringSameness('This is one sentence', 'This is another sentence');\nconsole.log(result);\n// {\n//     \"strategies\": {\n//         \"dice\": {\n//             \"rawScore\": 0.6666,\n//             \"score\": 66.66\n//         },\n//         \"leven\": {\n//             \"rawScore\": 5,\n//             \"distance\": 5,\n//             \"score\": 79.16\n//     
    },\n//         \"cosine\": {\n//             \"rawScore\": 0.75,\n//             \"score\": 75\n//         }\n//     },\n//     \"highScore\": 73.61,\n//     \"highScoreWeighted\": 83.58\n// }\n```\n\n# Options\n\nAn optional third argument can be provided to `stringSameness` to customize how strings are normalized before comparison and what strategies are used for comparison.\n\n## Strategies\n\nPass a list of `ComparisonStrategy` objects using `{strategies: []}` to define which string comparisons should be performed on the given strings.\n\nThe average of the scores from all passed strategies is returned as `highScore` (and `highScoreWeighted`) from `stringSameness()`.\n\nWhen no strategies are explicitly passed, a default set of strategies is used, found in `import {defaultStrategies} from '@foxxmd/string-sameness';`:\n\n* [Dice's Coefficient](https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient) in [`diceSimilarity.ts`](/src/matchingStrategies/diceSimilarity.ts)\n* [Cosine Similarity](https://en.wikipedia.org/wiki/Cosine_similarity) in [`cosineSimilarity.ts`](/src/matchingStrategies/cosineSimilarity.ts)\n* [Levenshtein Distance](https://en.wikipedia.org/wiki/Levenshtein_distance) in [`levenSimilarity.ts`](/src/matchingStrategies/levenSimilarity.ts)\n\nStrategies can be accessed individually using `import {strategies} from '@foxxmd/string-sameness'`.\n\n### Bring Your Own Strategy\n\nUse your own strategy by creating an object that conforms to `ComparisonStrategy`:\n\n```ts\nexport interface ComparisonStrategy {\n    /**\n     * The name of this strategy\n     * */\n    name: string\n    /**\n     * A function that accepts two string arguments and returns a number between 0 and 100 signifying how closely similar the strings are:\n     * 0 =\u003e not similar at all\n     * 100 =\u003e identical\n     * */\n    strategy: (strA: string, strB: string) =\u003e number\n    /**\n     * An optional function that accepts two string arguments and 
returns whether this strategy should be used\n     * */\n    isValid?: (strA: string, strB: string) =\u003e boolean\n}\n```\n\nExample of using your own strategy with the defaults:\n\n```ts\nimport {stringSameness} from \"@foxxmd/string-sameness\";\nimport {ComparisonStrategy, levenStrategy, cosineStrategy, diceStrategy} from \"@foxxmd/string-sameness/strategies\";\n\nconst myStrat: ComparisonStrategy = {\n    name: 'MyCoolStrat',\n    strategy: (valA: string, valB: string) =\u003e {\n        // toy example only: a real strategy should return a score between 0 and 100\n        const a = valA.concat(valB);\n        return a.length;\n    },\n}\nconst strats = [\n    levenStrategy,\n    cosineStrategy,\n    diceStrategy,\n    myStrat\n]\n\nconst result = stringSameness('This is one sentence', 'This is another sentence', {strategies: strats});\n```\n\n## Normalization\n\nPass a list of functions using `{transforms: []}` to transform the strings before comparison. When not explicitly provided, a default set of functions is applied to normalize the strings (to remove trivial differences):\n\n* normalize unicode, e.g. convert Ö =\u003e O\n* convert to lowercase\n* trim (remove whitespace at beginning/end)\n* remove non-alphanumeric characters (punctuation and newlines)\n* replace any run of 2 or more consecutive whitespace characters with a single space\n\n* The default set of transformer functions is exported as `import {strDefaultTransforms} from '@foxxmd/string-sameness';`\n* All built-in transformers can be found at `import {transforms} from '@foxxmd/string-sameness';`\n\nExample of supplying your own transform functions:\n\n```js\nimport {stringSameness, defaultStrCompareTransformFuncs} from '@foxxmd/string-sameness';\n\nconst myFuncs = [\n    ...defaultStrCompareTransformFuncs,\n    // replace all vowels with the letter e\n    (str) =\u003e str.replace(/[aeiou]/ig, 'e')\n]\n\nconst result = stringSameness('This is one sentence', 'This is another sentence', {transforms: myFuncs});\n```\n\n## Token Re-ordering\n\nIf token (word) ordering in the strings is not 
important, you can choose to have string-sameness attempt to re-order all words before comparing sameness. This makes comparison scores much closer to \"absolute sameness in all characters within string\". For example:\n\n* `this is correct order`\n* `order correct this is`\n\nScores 60 **without** reordering\n\nScores 100 **with** reordering\n\nBehavior caveats:\n\n* The **second** string argument is reordered to match the **first** string argument\n* If the second string is longer than the first, then any non-matched words are concatenated to the end of the re-ordered string in the same order they were found\n\nTo use:\n\n```js\nimport {stringSameness} from '@foxxmd/string-sameness';\n\nconst res = stringSameness(strA, strB, {reorder: true});\n```\n\n## Factory\n\nFor convenience, a factory function is also provided:\n\n```ts\nimport {createStringSameness, strategies} from \"@foxxmd/string-sameness\";\nimport {myTransforms, myStrats} from './util';\n\nconst {levenStrategy} = strategies;\n\n// sets the default object to be used with the third argument for `stringSameness`\nconst myCompare = createStringSameness({transforms: myTransforms, strategies: [levenStrategy, ...myStrats]});\n\n// uses myTransforms and myStrats\nconst plainResult = myCompare('This is one sentence', 'This is another sentence');\n\n// override your defaults using the third argument like normal\nconst overrideResults = myCompare('This is one sentence', 'This is another sentence', {strategies: [levenStrategy]});\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffoxxmd%2Fstring-sameness","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffoxxmd%2Fstring-sameness","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffoxxmd%2Fstring-sameness/lists"}