https://github.com/anastasia/minhash
minhash
- Host: GitHub
- URL: https://github.com/anastasia/minhash
- Owner: anastasia
- Created: 2017-05-03T16:52:05.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2019-01-23T17:43:26.000Z (over 6 years ago)
- Last Synced: 2025-01-26T15:14:18.568Z (8 months ago)
- Topics: minhash
- Language: Python
- Size: 16.6 KB
- Stars: 2
- Watchers: 2
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
### MinHash
MinHash explanation: chapter 3 ("Finding Similar Items") of *Mining of Massive Datasets*, http://infolab.stanford.edu/~ullman/mmds/book.pdf (also archived at https://perma.cc/K9B4-QTX3).
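That chapter represents each document as a set of shingles and measures similarity with the Jaccard index, |A ∩ B| / |A ∪ B|, which MinHash then estimates. The sketch below computes that exact quantity for two short strings. The helper names `shingles` and `jaccard` and the choice of word-level 3-shingles are illustrative assumptions, not code from this repository.

```python
from __future__ import annotations


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of word k-shingles (contiguous k-word sequences) in text."""
    words = text.lower().split()
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one shingle
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard similarity |A & B| / |A | B|; defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


doc1 = "the quick brown fox jumps over the lazy dog"
doc2 = "the quick brown fox leaps over the lazy dog"
print(jaccard(shingles(doc1), shingles(doc2)))  # 4 shared shingles out of 10 total -> 0.4
```

Changing a single word breaks every shingle that contains it, which is why the two near-identical sentences share only 4 of their 10 distinct 3-shingles.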
A simple take here: https://moz.com/devblog/near-duplicate-detection/

This implementation borrows from Chris McCormick's MinHash tutorial:
https://github.com/chrisjmccormick/MinHash

#### To install (for now):

`pip install -e "git+https://github.com/anastasia/minhash.git@master#egg=minhash"`

#### To run in CLI:

`python minhash.py doc1 doc2`

#### To run in Python:
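```python
import minhash

# string_a and string_b are the two texts to compare
string_a = "the quick brown fox jumps over the lazy dog"
string_b = "the quick brown fox leaps over the lazy dog"
minhash.calculate(string_a, string_b)
```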
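For intuition about the technique, here is a minimal, self-contained MinHash sketch. It is not this package's code and does not describe what `minhash.calculate` returns. It assumes word 3-shingles, uses `zlib.crc32` as a base hash, and simulates random permutations with hash functions of the form (a·x + b) mod p, one common choice. The names `make_hash_coeffs`, `signature`, and `estimate_jaccard` are hypothetical.

```python
import random
import zlib

# A prime larger than any 32-bit crc32 value, so x -> (a * x + b) % PRIME
# (with a != 0) maps distinct ids to distinct values
PRIME = 4294967311


def shingles(text, k=3):
    """Return the set of word k-shingles in text (whole text as one shingle if shorter than k)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def make_hash_coeffs(num_hashes, seed=42):
    """Draw (a, b) pairs for num_hashes hash functions h(x) = (a * x + b) % PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_hashes)]


def signature(shingle_set, coeffs):
    """MinHash signature: for each hash function, the minimum hash value over all shingles."""
    if not shingle_set:
        raise ValueError("cannot build a MinHash signature for an empty set")
    # Base hash: map each shingle to a 32-bit integer id
    ids = [zlib.crc32(" ".join(s).encode("utf-8")) for s in shingle_set]
    return [min((a * x + b) % PRIME for x in ids) for a, b in coeffs]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions where the two minimums agree."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


coeffs = make_hash_coeffs(num_hashes=200)
doc1 = "the quick brown fox jumps over the lazy dog"
doc2 = "the quick brown fox leaps over the lazy dog"
sig1 = signature(shingles(doc1), coeffs)
sig2 = signature(shingles(doc2), coeffs)
print(estimate_jaccard(sig1, sig2))  # approximates the exact 3-shingle Jaccard similarity, 4/10
```

Under ideal random permutations, the probability that two sets share the same minimum equals their Jaccard similarity, so the fraction of matching positions estimates it. More hash functions (`num_hashes`) reduce the variance of the estimate at the cost of longer signatures.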