{"id":20788744,"url":"https://github.com/preciz/similarity","last_synced_at":"2025-07-24T19:01:52.353Z","repository":{"id":34931413,"uuid":"192551914","full_name":"preciz/similarity","owner":"preciz","description":"A library for cosine similarity \u0026 simhash calculation","archived":false,"fork":false,"pushed_at":"2024-07-20T22:58:23.000Z","size":49,"stargazers_count":16,"open_issues_count":0,"forks_count":2,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-05-05T19:03:30.540Z","etag":null,"topics":["cosine-similarity","elixir","simhash","vector"],"latest_commit_sha":null,"homepage":"https://hex.pm/packages/similarity","language":"Elixir","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/preciz.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-06-18T14:03:58.000Z","updated_at":"2024-10-21T02:07:22.000Z","dependencies_parsed_at":"2025-05-05T18:35:25.508Z","dependency_job_id":null,"html_url":"https://github.com/preciz/similarity","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"purl":"pkg:github/preciz/similarity","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/preciz%2Fsimilarity","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/preciz%2Fsimilarity/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/preciz%2Fsimilarity/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/preciz%2Fsimilarity/manifests","owner_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/owners/preciz","download_url":"https://codeload.github.com/preciz/similarity/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/preciz%2Fsimilarity/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263980181,"owners_count":23538922,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cosine-similarity","elixir","simhash","vector"],"created_at":"2024-11-17T15:16:12.978Z","updated_at":"2025-07-06T22:07:56.978Z","avatar_url":"https://github.com/preciz.png","language":"Elixir","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Similarity\n\n[![test](https://github.com/preciz/similarity/actions/workflows/test.yml/badge.svg)](https://github.com/preciz/similarity/actions/workflows/test.yml)\n\nCosine similarity \u0026 Simhash implementation\n\nFull documentation can be found at [https://hexdocs.pm/similarity](https://hexdocs.pm/similarity).\n\n## Installation\n\nAdd `similarity` to your list of dependencies in `mix.exs`:\n\n```elixir\ndef deps do\n  [\n    {:similarity, \"~\u003e 0.4\"}\n  ]\nend\n```\n\n## Cosine Similarity\n\nCosine similarity is not sensitive to the scale of the vector:\n\n```elixir\nSimilarity.cosine([1,2,3], [1,2,3])\n1.0\nSimilarity.cosine([1,2,3], [2,4,6])\n1.0\n```\n\nModule `Similarity.Cosine` takes care of building a struct and streaming similarities:\n(It handles non matching attributes, elements added don't have to have the exact attributes)\n\n```elixir\ns = Similarity.Cosine.new()\ns = s |\u003e Similarity.Cosine.add(\"a\", 
[{\"bananas\", 9}, {\"hair_color_r\", 124}, {\"hair_color_g\", 8}, {\"hair_color_b\", 122}])\ns = s |\u003e Similarity.Cosine.add(\"b\", [{\"bananas\", 19}, {\"hair_color_r\", 124}, {\"hair_color_g\", 8}, {\"hair_color_b\", 122}])\ns = s |\u003e Similarity.Cosine.add(\"c\", [{\"bananas\", 9}, {\"hair_color_r\", 124}])\n\ns |\u003e Similarity.Cosine.stream |\u003e Enum.to_list\n[\n  {\"a\", \"b\", 1.9967471152702767},\n  {\"a\", \"c\", 1.4142135623730951},\n  {\"b\", \"c\", 1.409736747211141}\n]\n\ns |\u003e Similarity.Cosine.between(\"a\", \"b\")\n1.9967471152702767\n```\n\n`Similarity.cosine_srol/2`\nCosine similarity between two vectors, multiplied by the square root of the length of the vectors.\n(In my experience, where the number of common attributes doesn't match between some vectors, this gives a better value.)\n\n```elixir\na = [1,2,3,4]\nb = [1,2,3]\nc = [1,2,3,4]\n\nSimilarity.cosine_srol(a |\u003e Enum.take(3), b)\n1.7320508075688772\nSimilarity.cosine_srol(a, c)\n2.0\n```\n\nAbove even though the first 3 elements of `a` match with `b`, just like `a` with `c`,\nthe `a` \u0026 `c` cosine similarity returns higher value due to more elements matching.\nIn real world scenario I suggest using this if compared vectors aren't the same length.\n\n## Simhash\n\n```elixir\nleft = \"pork belly jerky brisket tenderloin shank kevin spare ribs\"\nright = \"porchetta pork loin. 
Leberkas ball tip biltong, beef ribs\"\n\nSimilarity.simhash(left, right, ngram_size: 3)\n0.484375\n```\n\n## Performance\n\n`Similarity.simhash` is about 2x faster than the simhash-ex v1.1.0 package (2.10x in the benchmark run shown here):\n\n```\nBenchmark suite executing with the following configuration:\nwarmup: 2 s\ntime: 5 s\nmemory time: 0 ns\nparallel: 1\ninputs: none specified\nEstimated total run time: 14 s\n\nBenchmarking simhash-ex...\nBenchmarking similarity.simhash...\n\nName                         ips        average  deviation         median         99th %\nsimilarity.simhash        3.67 K      272.69 μs     ±6.50%      267.84 μs      353.05 μs\nsimhash-ex                1.75 K      572.14 μs    ±12.31%      552.22 μs      781.02 μs\n\nComparison:\nsimilarity.simhash        3.67 K\nsimhash-ex                1.75 K - 2.10x slower +299.46 μs\n```\n\n## License\n\nSimilarity is [MIT licensed](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpreciz%2Fsimilarity","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpreciz%2Fsimilarity","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpreciz%2Fsimilarity/lists"}