https://github.com/kailashbuki/fingerprint
Document fingerprint generator
- Host: GitHub
- URL: https://github.com/kailashbuki/fingerprint
- Owner: kailashbuki
- License: mit
- Created: 2011-06-25T19:48:31.000Z (over 13 years ago)
- Default Branch: master
- Last Pushed: 2022-06-10T04:21:44.000Z (over 2 years ago)
- Last Synced: 2024-11-28T23:40:57.417Z (26 days ago)
- Language: Python
- Homepage: https://github.com/kailashbuki/fingerprint
- Size: 38.1 KB
- Stars: 29
- Watchers: 6
- Forks: 13
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGES.txt
- License: LICENSE
Fingerprint -- Document Fingerprint Generator
---------------------------------------------

A fingerprint is a signature of a document: specifically, a representative subset of hash values chosen from the set of all hash values of the document. For more detail, see [Winnowing: Local Algorithms for Document Fingerprinting](http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf) *(specifically Figure 2)*.
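
To illustrate how a "representative subset" of hashes can be picked, here is a minimal sketch of the winnowing selection step, assuming the k-gram hashes have already been computed. The function name `winnow` and the sample hash sequence are hypothetical and are not part of this module's API; the module's own implementation may differ in detail.

```python
def winnow(hashes, window_len):
    """Return (hash, position) pairs selected by winnowing.

    Illustrative sketch only; not the fingerprint module's actual code.
    """
    selected = []
    for start in range(len(hashes) - window_len + 1):
        window = hashes[start:start + window_len]
        smallest = min(window)
        # Break ties by taking the rightmost occurrence in the window.
        offset = window_len - 1 - window[::-1].index(smallest)
        pick = (smallest, start + offset)
        # Record a position only once, even if it wins several windows.
        if not selected or selected[-1] != pick:
            selected.append(pick)
    return selected


# Hypothetical sequence of k-gram hashes for a short document.
hashes = [77, 74, 42, 17, 98, 50, 17, 98, 8, 88, 67, 39, 77, 74, 42, 17, 98]
fingerprints = winnow(hashes, window_len=4)
print(fingerprints)
```

Each window contributes its minimum hash, so any sufficiently long shared passage between two documents is likely to yield at least one shared fingerprint, while most hashes are discarded.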
Fingerprint Module Installation
-------------------------------

The recommended way to install the `fingerprint` module is to simply use `pip`:
```console
$ pip install fingerprint
```
Fingerprint officially supports Python >= 3.0.

How to use fingerprint?
-----------------------
```pycon
>>> import fingerprint
>>> fprint = fingerprint.Fingerprint(kgram_len=4, window_len=5, base=10, modulo=1000)
>>> fprint.generate(str="adorunrunrunadorunrun")
>>> fprint.generate(fpath="../CHANGES.txt")
```

The default values for the parameters are:
```python
kgram_len = 50
window_len = 100
base = 101
modulo = sys.maxsize  # Python 3 replacement for sys.maxint
```
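
To give a feel for how `kgram_len`, `base`, and `modulo` might interact, the sketch below hashes every k-gram of a string with a polynomial (Karp-Rabin style) hash. This is an assumption about the general technique, not a copy of the module's code; the helper name `kgram_hashes` is hypothetical.

```python
def kgram_hashes(text, kgram_len, base, modulo):
    """Hash every k-gram of text with a polynomial hash.

    Illustrative sketch only; the module's exact hashing scheme may differ.
    """
    hashes = []
    for start in range(len(text) - kgram_len + 1):
        value = 0
        for ch in text[start:start + kgram_len]:
            # Treat the k-gram as a number written in the given base,
            # reduced modulo `modulo` to keep the value bounded.
            value = (value * base + ord(ch)) % modulo
        hashes.append(value)
    return hashes


hashes = kgram_hashes("adorunrunrunadorunrun", kgram_len=4, base=10, modulo=1000)
print(len(hashes), hashes[:5])
```

A larger `modulo` reduces accidental collisions between different k-grams, and a larger `kgram_len` makes matches on short, common substrings less likely. The `window_len` parameter then controls how many of these hashes are kept.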