https://github.com/UniversalAvenue/simhash-ex
Elixir implementation of Simhash
- Host: GitHub
- URL: https://github.com/UniversalAvenue/simhash-ex
- Owner: UniversalAvenue
- Created: 2016-07-04T15:47:00.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2020-03-06T16:57:30.000Z (over 4 years ago)
- Last Synced: 2024-10-03T06:39:48.020Z (about 1 month ago)
- Language: Elixir
- Size: 16.6 KB
- Stars: 22
- Watchers: 11
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- freaking_awesome_elixir - Elixir - Simhash implementation using Siphash and N-grams. (Algorithms and Data structures)
- fucking-awesome-elixir - simhash - Simhash implementation using Siphash and N-grams. (Algorithms and Data structures)
- awesome-elixir - simhash - Simhash implementation using Siphash and N-grams. (Algorithms and Data structures)
README
# Simhash
An Elixir implementation of [Moses Charikar's](http://www.cs.princeton.edu/courses/archive/spring04/cos598B/bib/CharikarEstim.pdf) Simhash.

## Examples
```elixir
iex> Simhash.similarity("Universal Avenue", "Universe Avenue")
0.71875
iex> Simhash.similarity("hocus pocus", "pocus hocus")
0.8125
iex> Simhash.similarity("Sankt Eriksgatan 1", "S:t Eriksgatan 1")
0.8125
iex> Simhash.similarity("Purple flowers", "Green grass")
0.5625
```

By default, trigrams (N-grams of size 3) are used as language features, but you can set a different N-gram size:
```elixir
iex> Simhash.similarity("hocus pocus", "pocus hocus", 1)
1.0
iex> Simhash.similarity("Sankt Eriksgatan 1", "S:t Eriksgatan 1", 6)
0.859375
iex> Simhash.similarity("Purple flowers", "Green grass", 6)
0.546875
```

## Installation
The package can be installed by adding `simhash` to your list of dependencies in `mix.exs`:

```elixir
def deps do
  [
    {:simhash, "~> 0.1.2"}
  ]
end
```
```
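
To make the N-gram examples above more concrete, here is a minimal, self-contained sketch of how a Simhash-style similarity score can be computed. It is an illustration under stated assumptions, not the code of this package: the module name `SimhashSketch` and its helper functions are hypothetical, features are hashed with `:erlang.phash2/2` instead of Siphash, and the fingerprint is 32 bits wide, so its scores will differ from the ones shown in the examples. (The scores in the examples are all multiples of 1/64, which would be consistent with a 64-bit fingerprint, but that is an inference, not something the README states.)

```elixir
defmodule SimhashSketch do
  # Illustrative only: NOT the simhash-ex implementation.
  # Assumptions: 32-bit fingerprints, :erlang.phash2/2 in place of Siphash.
  import Bitwise

  @bits 32

  # Character N-grams of a string, e.g. ngrams("abcd", 3) gives ["abc", "bcd"].
  def ngrams(text, n \\ 3) do
    text
    |> String.graphemes()
    |> Enum.chunk_every(n, 1, :discard)
    |> Enum.map(&Enum.join/1)
  end

  # Each N-gram hash votes +1 (bit set) or -1 (bit clear) per bit position.
  # A fingerprint bit is 1 when the tally for that position is positive.
  def fingerprint(text, n \\ 3) do
    hashes =
      text
      |> ngrams(n)
      |> Enum.map(fn gram -> :erlang.phash2(gram, 1 <<< @bits) end)

    Enum.reduce(0..(@bits - 1), 0, fn bit, acc ->
      tally =
        Enum.reduce(hashes, 0, fn hash, sum ->
          sum + 2 * ((hash >>> bit) &&& 1) - 1
        end)

      if tally > 0, do: acc ||| (1 <<< bit), else: acc
    end)
  end

  # Similarity is the fraction of bit positions on which both fingerprints agree.
  def similarity(a, b, n \\ 3) do
    differing = popcount(bxor(fingerprint(a, n), fingerprint(b, n)))
    1.0 - differing / @bits
  end

  # Number of set bits in a non-negative integer.
  defp popcount(0), do: 0
  defp popcount(x), do: (x &&& 1) + popcount(x >>> 1)
end
```

In this sketch, two strings that contain the same multiset of N-grams always get identical fingerprints. That is why an N-gram size of 1 scores `"hocus pocus"` against `"pocus hocus"` as `1.0`: both strings consist of exactly the same characters, only in a different order. Strings shorter than the N-gram size produce no features here and therefore an all-zero fingerprint.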