https://github.com/tgalopin/simhashphp
SimHash similarities algorithm implementation for PHP
- Host: GitHub
- URL: https://github.com/tgalopin/simhashphp
- Owner: tgalopin
- License: MIT
- Created: 2012-06-27T16:22:00.000Z (over 12 years ago)
- Default Branch: master
- Last Pushed: 2021-03-08T18:25:56.000Z (almost 4 years ago)
- Last Synced: 2024-10-04T22:05:43.476Z (2 months ago)
- Language: PHP
- Homepage: https://titouangalopin.com
- Size: 42 KB
- Stars: 144
- Watchers: 10
- Forks: 37
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE.md
Awesome Lists containing this project
- awesome-hacking-lists - tgalopin/simhashphp - SimHash similarities algorithm implementation for PHP (PHP)
README
SimHashPHP
==========

> This is the second version of SimHashPHP. If you are using version 1 and don't want to
> update your code, please refer to the `1.0-security` branch (https://github.com/tgalopin/SimHashPhp/tree/1.0-security).
> The 1.0 branch will be maintained until the release of v3, but only v2 will receive the latest features.

What is SimHashPHP?
-------------------

SimHashPHP is a PHP library that ports the SimHash algorithm to PHP.
This algorithm, created by Moses Charikar, provides an efficient way to compute a similarity index between two texts.
It is used internally by Google to detect duplicate content.

See ["SimHash or the way to compare quickly two datasets"](https://titouangalopin.com/2014/06/29/simhash/) for more information.

[![Build Status](https://secure.travis-ci.org/tgalopin/SimHashPhp.png?branch=master)](http://travis-ci.org/tgalopin/SimHashPhp)
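
To make the idea of a similarity index more concrete, here is a minimal, self-contained sketch of a SimHash-style fingerprint in plain PHP. It is not the code used by SimHashPHP: the function names `sketch_simhash` and `sketch_similarity`, the word-splitting rule, the use of `md5` as the token hash and the 32-bit fingerprint size are all assumptions made for illustration. Each word votes on every bit of the fingerprint, weighted by how often it occurs, and two fingerprints are then compared by counting the bits on which they differ.

```php
<?php
// Illustrative sketch only: not the implementation shipped in SimHashPHP.

/**
 * Build a SimHash-style fingerprint as a string of '0' and '1' characters.
 * $bits must not exceed 128, the number of bits md5 provides.
 */
function sketch_simhash(string $text, int $bits = 32): string
{
    // Split on non-word characters (ASCII-centric) and count each word.
    $tokens = preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $weights = array_count_values($tokens);

    $vector = array_fill(0, $bits, 0);
    foreach ($weights as $token => $weight) {
        // Expand the md5 hex digest of the token into a string of bits.
        $tokenBits = '';
        foreach (str_split(md5((string) $token)) as $hexDigit) {
            $tokenBits .= str_pad(decbin(hexdec($hexDigit)), 4, '0', STR_PAD_LEFT);
        }
        // Each token votes +weight on bits set to 1 and -weight on bits set to 0.
        for ($i = 0; $i < $bits; $i++) {
            $vector[$i] += ($tokenBits[$i] === '1') ? $weight : -$weight;
        }
    }

    // A fingerprint bit is 1 when the votes for that position are positive.
    $fingerprint = '';
    foreach ($vector as $sum) {
        $fingerprint .= ($sum > 0) ? '1' : '0';
    }

    return $fingerprint;
}

/**
 * Similarity between two equal-length fingerprints:
 * 1 minus the share of bit positions that differ (normalised Hamming distance).
 */
function sketch_similarity(string $a, string $b): float
{
    $distance = 0;
    $length = strlen($a);
    for ($i = 0; $i < $length; $i++) {
        if ($a[$i] !== $b[$i]) {
            $distance++;
        }
    }

    return 1 - $distance / $length;
}

$a = sketch_simhash('the quick brown fox jumps over the lazy dog');
$b = sketch_simhash('the quick brown fox jumped over the lazy dog');

printf("%s\n%s\nsimilarity: %.2f\n", $a, $b, sketch_similarity($a, $b));
```

Texts that share most of their words tend to produce fingerprints that differ in only a few bits, which is why comparing fingerprints is much cheaper than comparing the texts themselves. The library's own extractor and comparator classes may tokenize and score differently from this sketch.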

How to use it?
--------------

Install it with [Composer](https://getcomposer.org):

``` sh
composer require tga/simhash-php
```

Once installed, include `vendor/autoload.php` to load the library.

The concept of SimHash is described in [this article](https://titouangalopin.com/2014/06/29/simhash/). Here are a few examples:
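
``` php
// Assumes $simhash (hasher), $extractor (text extractor) and $comparator
// (fingerprint comparator) have already been created, and that $text1 and
// $text2 hold the two texts to compare.
$fp1 = $simhash->hash($extractor->extract($text1), \Tga\SimHash\SimHash::SIMHASH_64);
$fp2 = $simhash->hash($extractor->extract($text2), \Tga\SimHash\SimHash::SIMHASH_64);

var_dump($fp1->getBinary());
var_dump($fp2->getBinary());

// Index between 0 and 1 : 0.80073740291681
var_dump($comparator->compare($fp1, $fp2));
```

License
-------

This library is under the MIT license (see LICENSE.md).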

About
-----

SimHashPHP is mainly developed by Titouan Galopin.

Reporting an issue or a feature request
---------------------------------------

Issues and feature requests are tracked in the [GitHub issue tracker](https://github.com/tgalopin/SimHashPhp/issues).