An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with simhash

A curated list of projects in awesome lists tagged with simhash.

https://github.com/sean-public/python-hashes

Interesting (non-cryptographic) hashes implemented in pure Python.

bloom-filter duplicates geohashes hash hash-functions hashes-implemented nilsimsa python python3 simhash

Last synced: 17 Jan 2026

https://github.com/sing1ee/simhash-java

A simple implementation of the simhash algorithm in Java.

java simhash simhash-java

Last synced: 07 May 2025

https://github.com/vkandy/simhash-js

Simhash implementation in JavaScript

hash-functions simhash similarity-score

Last synced: 13 Mar 2026

https://github.com/holsee/spirit_fingers

Elixir SimHash NIFs written in Rust

elixir nif rust simhash

Last synced: 10 Oct 2025

https://github.com/haoyuhu/gosimhash

A simhasher for Chinese documents implemented in Go, a direct translation of yanyiwu/gosimhash

jenkins simhash siphash

Last synced: 17 Mar 2026

https://github.com/marcnuth/deduplication

Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc.

algorithms cv deduplication google imagehash shingling simhash

Last synced: 14 May 2025

https://github.com/preciz/similarity

A library for cosine similarity & simhash calculation

cosine-similarity elixir simhash vector

Last synced: 24 Jul 2025

https://github.com/oduwsdl/off-topic-memento-toolkit

This system evaluates a collection of mementos (archived web pages) to determine which are off topic. The collection can be part of an Archive-It collection, a single TimeMap, or stored in a WARC file.

cosine measure memento simhash timemap topic warc

Last synced: 09 Oct 2025

https://github.com/xenia101/keystroke-dynamics

⌨️ User Verification based on Keystroke Dynamics / Two-factor Authentication technology based on Key-Stroke

cross-validation k-means k-means-clustering keystore keystroke-dynamics knn machine-learning python3 simhash user-verification

Last synced: 09 Oct 2025

https://github.com/shangri-la-0428/thronglets

P2P shared memory substrate for AI agents — stigmergic knowledge network via libp2p

ai-agents collective-intelligence decentralized libp2p mcp-server model-context-protocol p2p rust simhash stigmergy

Last synced: 07 Apr 2026

https://github.com/hengfeiyang/simhash

A Golang implementation of the Simhash algorithm

simhash

Last synced: 04 Aug 2025

https://github.com/smarthi/pymuvera

Python library for MUVERA multi-vector retrieval via Fixed Dimensional Encodings. ColBERT / ColQwen2 / ColQwen3.5 compatible.

approximate-nearest-neighbor approximate-nearest-neighbor-search colbert colqwen2 embeddings late-interaction multi-vector-retrieval muvera rag simhash

Last synced: 12 Jun 2026

https://github.com/qyokizzzz/simhash

The extended version of simhash supports fingerprint extraction of documents and images.

document-search fingerprint image-deduplication image-search simhash

Last synced: 02 Apr 2025

https://github.com/php-lsys/simhash

Simhash extension for PHP: determines text similarity

php simhash

Last synced: 15 Jan 2026

https://github.com/innernull/osimhash

A deduplication library built on top of [SIMHASH](https://github.com/yanyiwu/simhash).

cpp deduplication lsh nlp python python3 simhash simhash-algorithm

Last synced: 18 May 2026

https://github.com/lifefloating/contentcore

Crawler content-processing service (for personal use)

bloomfilter python redis-queue simhash

Last synced: 25 May 2026

https://github.com/themankindproject/txtfp

Text fingerprinting: MinHash + LSH, SimHash, TLSH, ONNX semantic embeddings (BGE/E5/MiniLM), with byte-stable hash layouts and no_std + alloc default builds.

deduplication embeddings fingerprinting locality-sensitive-hashing lsh minhash near-duplicate no-std onnx rust sdk semantic-search simhash text-processing tlsh wasm

Last synced: 28 May 2026

https://github.com/justinfargnoli/simhash

A barebones implementation of the simhash data sketching algorithm.

data-sketches data-sketching go golang simhash

Last synced: 14 Jan 2026

https://github.com/manmolecular/history-fp

🐾 Create a behavioral fingerprint based on your zsh command line history

behavior python simhash similarity-search zsh

Last synced: 17 Apr 2026

https://github.com/nemosharma6/event-coding

Event coding using Spark and Stanford CoreNLP

kafka petrarch simhash spark stanford-corenlp

Last synced: 07 Sep 2025

https://github.com/linyshdhhcb/bert-simhashhomeworkcheck-backend

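A plagiarism-checking system for university student assignments based on SimHash and BERT. It combines the SimHash algorithm with the BERT-Base-Chinese model, Vue3, Spring Boot3, EasyExcel, and HanLP to provide intelligent duplicate checking. It supports batch file processing, comparison against past assignments, and automatic generation of detailed Excel plagiarism reports. It integrates Jaccard, Hamming distance, hash, cosine, image, and weighted similarity algorithms to assess file similarity accurately.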

bert mybatisplus simhash springboot3

Last synced: 16 Aug 2025

https://github.com/moe131/webcrawler

Python web crawler designed to scrape websites

crawler crawling-python python python-crawler scraping simhash web-crawler

Last synced: 09 Apr 2025

https://github.com/fpopic/avsp

(Class) Big Data Analysis Course Assignments

big-data bigdata message-passing pagerank-algorithm pcy simhash stream-processing

Last synced: 08 Jun 2026

https://github.com/luis-varona/shadowseek

A CLI tool for near-duplicate detection in text files, written in Rust with no dependencies on runtime environments.

duplicate-detection minhash near-duplicate-detection simhash text-classification

Last synced: 25 Jul 2025

https://github.com/majajuri/analiza-velikih-skupova-podataka

Implementation of the algorithms presented in the course Analysis of Large Data Sets (Analiza velikih skupova podataka, AVSP)

collaborative-filtering-algorithm dgim gna lsh-algorithm modularity node-ranking pcy simhash

Last synced: 05 Apr 2025

https://github.com/long-gong/datasets-e2h

Datasets Euclidean to Hamming Conversion

cpp datasets eigen3 euclidean2hamming hdf5 simhash

Last synced: 19 Jul 2025