Projects in Awesome Lists tagged with simhash
A curated list of projects in awesome lists tagged with simhash.
https://github.com/james-bowman/nlp
Selected Machine Learning algorithms for natural language processing and semantic analysis in Golang
feature-hash go golang latent-dirichlet-allocation latent-semantic-analysis latent-semantic-indexing lda locality-sensitive-hashing lsa lsh lsi machine-learning natural-language-processing nlp random-indexing random-projections simhash singular-value-decomposition svd tf-idf
Last synced: 23 Jan 2026
https://github.com/sean-public/python-hashes
Interesting (non-cryptographic) hashes implemented in pure Python.
bloom-filter duplicates geohashes hash hash-functions hashes-implemented nilsimsa python python3 simhash
Last synced: 17 Jan 2026
https://github.com/sing1ee/simhash-java
A simple implementation of the simhash algorithm in Java.
Last synced: 07 May 2025
https://github.com/dynatrace-oss/hash4j
Dynatrace hash library for Java
cardinality-estimation consistent-hashing data-sketches farmhash hash hash-algorithm hash-functions hashing-algorithm hyperloglog imohash java jumphash minhash murmur3 non-cryptographic-hash-functions simhash streaming-algorithms superminhash wyhash xxh3
Last synced: 13 Apr 2025
https://github.com/serega/gaoya
Locality Sensitive Hashing
locality-sensitive-hashing lsh minhash search simhash similarity
Last synced: 17 Mar 2026
https://github.com/vkandy/simhash-js
Simhash implementation in JavaScript
hash-functions simhash similarity-score
Last synced: 13 Mar 2026
https://github.com/holsee/spirit_fingers
Elixir SimHash NIFs written in Rust
Last synced: 10 Oct 2025
https://github.com/haoyuhu/gosimhash
A simhasher for Chinese documents implemented in Golang, simply translated from yanyiwu/gosimhash
Last synced: 17 Mar 2026
https://github.com/marcnuth/deduplication
Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc.
algorithms cv deduplication google imagehash shingling simhash
Last synced: 14 May 2025
https://github.com/preciz/similarity
A library for cosine similarity & simhash calculation
cosine-similarity elixir simhash vector
Last synced: 24 Jul 2025
https://github.com/oduwsdl/off-topic-memento-toolkit
This system evaluates a collection of mementos (archived web pages) to determine which are off topic. The collection can be part of an Archive-It collection, a single TimeMap, or stored in a WARC file.
cosine measure memento simhash timemap topic warc
Last synced: 09 Oct 2025
https://github.com/xenia101/keystroke-dynamics
⌨️ User Verification based on Keystroke Dynamics / Two-factor Authentication technology based on Key-Stroke
cross-validation k-means k-means-clustering keystore keystroke-dynamics knn machine-learning python3 simhash user-verification
Last synced: 09 Oct 2025
https://github.com/shangri-la-0428/thronglets
P2P shared memory substrate for AI agents — stigmergic knowledge network via libp2p
ai-agents collective-intelligence decentralized libp2p mcp-server model-context-protocol p2p rust simhash stigmergy
Last synced: 07 Apr 2026
https://github.com/hengfeiyang/simhash
A Golang implementation of the Simhash algorithm
Last synced: 04 Aug 2025
https://github.com/smarthi/pymuvera
Python library for MUVERA multi-vector retrieval via Fixed Dimensional Encodings. ColBERT / ColQwen2 / ColQwen3.5 compatible.
approximate-nearest-neighbor approximate-nearest-neighbor-search colbert colqwen2 embeddings late-interaction multi-vector-retrieval muvera rag simhash
Last synced: 12 Jun 2026
https://github.com/qyokizzzz/simhash
The extended version of simhash supports fingerprint extraction of documents and images.
document-search fingerprint image-deduplication image-search simhash
Last synced: 02 Apr 2025
https://github.com/innernull/osimhash
A deduplication lib built over [SIMHASH](https://github.com/yanyiwu/simhash).
cpp deduplication lsh nlp python python3 simhash simhash-algorithm
Last synced: 18 May 2026
https://github.com/luozijun/rust-jieba
Rust jieba
hamming-distance hidden-markov-model hmm jaccard jieba minhash mmseg simhash similarity
Last synced: 12 May 2026
https://github.com/lifefloating/contentcore
Crawler content processing service (for personal use)
bloomfilter python redis-queue simhash
Last synced: 25 May 2026
https://github.com/sskender/analysis-of-massive-datasets
Analysis of Massive Datasets FER labs
big-data data-flow data-flows frequency-analysis graph-algorithms graph-theory map-reduce mapreduce minhash node-ranking page-rank page-ranking recommendation-system recommender-system simhash similarity-search
Last synced: 18 Aug 2025
https://github.com/themankindproject/txtfp
Text fingerprinting: MinHash + LSH, SimHash, TLSH, ONNX semantic embeddings (BGE/E5/MiniLM), with byte-stable hash layouts and no_std + alloc default builds.
deduplication embeddings fingerprinting locality-sensitive-hashing lsh minhash near-duplicate no-std onnx rust sdk semantic-search simhash text-processing tlsh wasm
Last synced: 28 May 2026
https://github.com/justinfargnoli/simhash
A barebones implementation of the simhash data sketching algorithm.
data-sketches data-sketching go golang simhash
Last synced: 14 Jan 2026
https://github.com/manmolecular/history-fp
🐾 Create a behavioral fingerprint based on your zsh command line history
behavior python simhash similarity-search zsh
Last synced: 17 Apr 2026
https://github.com/xenia101/illegal-copyright-detection-system-web-
Illegal Copyright Detection System WEB
api csv detection flask html illegal-copyright pandas pickle python rest-api restful-api simhash web
Last synced: 01 May 2026
https://github.com/nemosharma6/event-coding
event coding using spark and stanford-core-nlp
kafka petrarch simhash spark stanford-corenlp
Last synced: 07 Sep 2025
https://github.com/linyshdhhcb/bert-simhashhomeworkcheck-backend
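A plagiarism-checking system for university student assignments based on SimHash and BERT. It combines the SimHash algorithm and the BERT-Base-Chinese model with Vue3, Spring Boot3, EasyExcel, and HanLP to provide intelligent duplicate checking. It supports batch file processing and comparison against past assignments, and automatically generates detailed Excel plagiarism reports. It integrates Jaccard, Hamming distance, hash, cosine, image, and weighted similarity algorithms to assess file similarity accurately.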
bert mybatisplus simhash springboot3
Last synced: 16 Aug 2025
https://github.com/moe131/webcrawler
Python web crawler designed to scrape websites
crawler crawling-python python python-crawler scraping simhash web-crawler
Last synced: 09 Apr 2025
https://github.com/fpopic/avsp
(Class) Big Data Analysis Course Assignments
big-data bigdata message-passing pagerank-algorithm pcy simhash stream-processing
Last synced: 08 Jun 2026
https://github.com/luis-varona/shadowseek
A CLI tool for near-duplicate detection in text files, written in Rust with no dependencies on runtime environments.
duplicate-detection minhash near-duplicate-detection simhash text-classification
Last synced: 25 Jul 2025
https://github.com/majajuri/analiza-velikih-skupova-podataka
Implementation of the algorithms presented in the course Analiza velikih skupova podataka (Analysis of Massive Datasets, AVSP)
collaborative-filtering-algorithm dgim gna lsh-algorithm modularity node-ranking pcy simhash
Last synced: 05 Apr 2025
https://github.com/long-gong/datasets-e2h
Datasets Euclidean to Hamming Conversion
cpp datasets eigen3 euclidean2hamming hdf5 simhash
Last synced: 19 Jul 2025