Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/dearmadman/minhash
An implementation of the minhash algorithm in golang
- Host: GitHub
- URL: https://github.com/dearmadman/minhash
- Owner: DearMadMan
- Created: 2019-03-31T02:21:08.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2019-07-09T09:44:50.000Z (over 5 years ago)
- Last Synced: 2024-07-20T08:29:01.311Z (6 months ago)
- Topics: go-minhash, golang, minhash
- Language: Go
- Size: 2.93 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
[Minhashing](https://en.wikipedia.org/wiki/MinHash) is an efficient technique for estimating the Jaccard similarity between sets, and it is often used to identify near-duplicate documents in large text collections. This package offers a Go implementation of the minhash algorithm.
## Usage
```go
package main

import (
	"fmt"

	"github.com/dearmadman/minhash" // import path assumed from the repository URL
)

func main() {
	m := minhash.New(128)
	set1 := m.NewSet([]string{
		"minhash", "is", "a", "probabilistic", "data", "structure", "for",
		"estimating", "the", "similarity", "between", "datasets",
	})
	set2 := m.NewSet([]string{
		"minhash", "is", "a", "probability", "data", "structure", "for",
		"estimating", "the", "similarity", "between", "documents",
	})
	set3 := m.NewSet([]string{
		"cats", "are", "tall", "and", "have", "been",
		"known", "to", "sing", "quite", "loudly",
	})
	// Print the pairwise similarity reported by Jaccard.
	fmt.Printf("set1 & set2: %f\n", set1.Jaccard(set2))
	fmt.Printf("set1 & set3: %f\n", set1.Jaccard(set3))
	fmt.Printf("set2 & set3: %f\n", set2.Jaccard(set3))
}
```

## References
[duhaime/minhash](https://github.com/duhaime/minhash)