Projects in Awesome Lists tagged with bm25
A curated list of projects in awesome lists tagged with bm25.
https://github.com/manticoresoftware/manticoresearch
Easy to use open source fast database for search | Good alternative to Elasticsearch now | Drop-in replacement for E in the ELK soon
api bm25 cpp database full-text-search hacktoberfest json mysql search search-api search-engine search-server sphinxsearch sql stream-filtering
Last synced: 13 May 2025
https://github.com/paradedb/paradedb
ParadeDB is a modern Elasticsearch alternative built on Postgres. Built for real-time, update-heavy workloads.
aggregations analytics big-data bm25 database elasticsearch full-text-search htap hybrid-search mpp object-storage olap postgresql real-time-analytics similarity-search sparse-vector sql
Last synced: 13 May 2025
https://github.com/infiniflow/infinity
The AI-native database built for LLM applications, providing incredibly fast hybrid search of dense vector, sparse vector, tensor (multi-vector), and full-text
ai-native approximate-nearest-neighbor-search bm25 cpp20 cpp20-modules embedding full-text-search hnsw hybrid-search information-retrival nearest-neighbor-search rag search-engine tensor-database vector vector-database vector-search vectordatabase
Last synced: 12 May 2025
https://github.com/sylphai-inc/adalflow
AdalFlow: The library to build & auto-optimize LLM applications.
agent ai auto-prompting bm25 chatbot faiss framework generative-ai information-retrieval llm machine-learning nlp optimizer python question-answering rag reranker retriever summarization trainer
Last synced: 13 Dec 2025
https://github.com/SylphAI-Inc/AdalFlow
AdalFlow: The library to build & auto-optimize LLM applications.
agent ai bm25 chatbot faiss framework generative-ai information-retrieval llm machine-learning nlp optimizer python question-answering rag reranker retriever summarization trainer
Last synced: 06 May 2025
https://github.com/winkjs/wink-nlp
Developer friendly Natural Language Processing ✨
bm25 chatbot custom-entity-detection hacktoberfest named-entity-extraction natural-language-processing negation-handling ner nlp pattern-matching pos-tagging sbd sentence-boundary-detection sentiment-analysis tokenize vectorizer visualization wink wink-nlp word-vectors
Last synced: 13 May 2025
https://github.com/xhluca/bm25s
Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy
bm25 bm25-l bm25-plus information-retrieval lexical-search okapi-bm25 rag retrieval robertson search
Last synced: 14 May 2025
https://github.com/dorianbrown/rank_bm25
A Collection of BM25 Algorithms in Python
algorithm bm25 information-retrieval ranking
Last synced: 02 Apr 2025
https://github.com/shibing624/similarities
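Similarities: a toolkit for similarity calculation and semantic search. Supports text-to-text, text-to-image, and image-to-image search over hundreds of millions of records; developed in Python 3 and ready to use out of the box.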
bm25 deep-learning faiss image-search image-similarity matching nlp pytorch search-engine similarity similarity-search text-matching
Last synced: 14 May 2025
https://github.com/AmenRa/retriv
A Python Search Engine for Humans 🥸
bm25 dense-retrieval hybrid-retrieval information-retrieval numba search search-engine search-engine-optimization semantic-search sparse-retrieval tf-idf
Last synced: 06 Aug 2025
https://github.com/SeekStorm/SeekStorm
SeekStorm - sub-millisecond full-text search library & multi-tenancy server in Rust
apache2 bm25 enterprise-search faceting full-text-search geosearch index lexical-search okapi-bm25 query realtime rust saas search search-engine search-server search-service sparse-retrieval
Last synced: 03 Aug 2025
https://github.com/brunoarine/org-similarity
Emacs package that helps org-mode users (re)discover similar documents
bm25 elisp emacs org-mode org-roam python semantic-similarity similarity-search tf-idf
Last synced: 09 May 2025
https://github.com/winkjs/wink-bm25-text-search
Fast Full Text Search based on BM25
bm25 bm25f full-text-search in-memory-search natural-language-processing nlp semantic-search tf-idf tfidf
Last synced: 20 Aug 2025
https://github.com/lightonai/ducksearch
Efficient BM25 with DuckDB 🦆
bm25 duckdb information-retrieval
Last synced: 26 Aug 2025
https://github.com/kwang2049/easy-elasticsearch
Using business-level retrieval system (BM25) with Python in just a few lines.
bm25 docker elasticsearch information-retrieval
Last synced: 24 Mar 2025
https://github.com/jankovicsandras/plpgsql_bm25
BM25 search implemented in PL/pgSQL
bm25 bm25okapi document-search okapi plpgsql postgres postgresql search text-search
Last synced: 23 Oct 2025
https://github.com/jankovicsandras/bm25opt
faster BM25 search algorithms in Python
bag-of-words bm25 bm25l bm25okapi bm25plus document-search python3 text-search
Last synced: 14 Apr 2025
https://github.com/brunoarine/findlike
Command-line tool that finds lexically similar documents in relation to a reference text file or ad-hoc query
bm25 nlp similarity-search tfidf
Last synced: 18 Jul 2025
https://github.com/searchivarius/accuratelucenebm25
Improving the effectiveness of Lucene's BM25 (and testing it using Yahoo! Answers and Stack Overflow collections)
Last synced: 31 Jul 2025
https://github.com/samadpls/bestrag
BestRAG: A library for hybrid RAG, combining dense, sparse, and late interaction methods for efficient document storage and search.
best-rag bm25 embedding-vectors hybrid-rag llm opensource pypi-package qdrant rag retrival-augmented-generation
Last synced: 27 Oct 2025
https://github.com/fukurosan/jhaystack
A JavaScript search engine with zero dependencies.
bitap bm25 clustering fulltext fulltext-search fuzzy javascript nlp query search search-engine spellcheck tfidf typescript
Last synced: 12 Apr 2025
https://github.com/jbesomi/korono
👑Korono: question answering platform for COVID-19 papers
bm25 covid covid-19 covid19 qa question-answering search-engine
Last synced: 13 Apr 2025
https://github.com/tomfran/search-rs
Search engine written in Rust
bm25 boolean-retrieval htmx index-compression information-retrieval rust search-engine
Last synced: 23 Apr 2025
https://github.com/wittline/recommendation-system
Build a Content-Based Movie Recommender System (TF-IDF, BM25, BERT)
bert bm25 nlp python recommender-system recsys text-analysis tf-idf word2vec
Last synced: 03 Jul 2025
https://github.com/kyr0/clientside-search
A highly efficient, isomorphic, full-featured, multilingual text search engine library, providing full-text search, fuzzy matching, phonetic scoring, document indexing and more, with micro JSON state hydration/dehydration in-browser and server-side.
bk-tree bm25 browser client-side damerau-levenshtein-distance document-indexing document-search full-text-search fuzzy-matching lucene multilingual nodejs phonetics search-engine state-hydration text-processing text-search tf-idf trie
Last synced: 14 Jul 2025
https://github.com/inspirateur/fast-bm25
a fast implementation of BM25
bm25 ranking search search-engine search-in-text
Last synced: 14 Apr 2025
https://github.com/nasrmohammad4804/search-engine-concept
A repo for learning search engine concepts, covering ELK and web search engines such as Google, to grow software engineering knowledge
bm25 crwaler elasticsearch etl-pipeline google inverted-index kafka kibana microservice mongodb ranking redis search-engine tf-idf
Last synced: 13 May 2025
https://github.com/ryogrid/anime-illust-image-searcher
Anime Style Illustration Specific Image Search App with ViT Tagger x BM25/Doc2Vec
anime anime-style bm25 deep-learning doc2vec gensim illustration image-search machine-learning onnxruntime python pytorch search-engine streamlit transformer vector-search vision-transformer
Last synced: 22 Apr 2025
https://github.com/raphaelsty/cherche-api
Deploy Cherche using FastAPI and Docker
bm25 docker fastapi neural-search question-answering summarization tfidf
Last synced: 25 Oct 2025
https://github.com/fanzeyi/torchic
A generic search engine built using Go & Spring & Redis. Project for Google's CodeU event.
Last synced: 28 Apr 2025
https://github.com/dbozhinovski/relatinator
A humble library for finding related posts and content. Uses tf-idf and BM25 under the hood. Primarily aimed at static site generators.
astro bm25 related-posts static-site tfidf
Last synced: 10 Apr 2025
https://github.com/kbeaugrand/semantickernel.rankers
A robust C# library for reranking search results using Semantic Kernel
ai bm25 gpt llm reranking semantickernel
Last synced: 10 Oct 2025
https://github.com/justinhsu1019/general-rag-template
Template for building high-accuracy Retrieval-Augmented Generation (RAG) pipelines with hybrid search (semantic + keyword), reranking, and LLM-based generation.
bm25 gpt high-accuracy langchain llm rag
Last synced: 11 Apr 2025
https://github.com/dcarpintero/github-semantic-search
Semantic Search on Langchain Github Issues with Weaviate
bm25 embedding-vectors hybrid-search langchain large-language-models python semantic-search streamlit weaviate
Last synced: 13 Apr 2025
https://github.com/ddayguerrero/spimi-indexer
Boolean retrieval search engine with SPIMI indexing and BM25 ranking
bm25 bs4 inverted-index okapi python3 reuters-corpus search spimi
Last synced: 16 Mar 2025
https://github.com/asifhaider/llm-finetuning-prompting-project
Python Project Sample for Demonstration
bm25 code-review few-shot-prompting gpt35turbo instruction-following llama3 qlora supervised-finetuning
Last synced: 11 Sep 2025
https://github.com/zabuzard/lexisearchexercises
LexiSearch is an API for retrieving information in given datasets.
api benchmark bm25 dataset edit-distance inverted-index prefix-search q-gram query search-algorithm search-api search-engine web-demo
Last synced: 20 Jun 2025
https://github.com/dnlzrgz/housaku
A powerful yet simple personal search engine built on top of SQLite's FTS5.
bm25 cli fts5 python search search-engine sqlite sqlite3
Last synced: 20 Mar 2025
https://github.com/evalops/congress-bill-search
High-quality congressional bill search with hybrid BM25+vector similarity using DuckDB, TEI embeddings, and GovInfo API. Local deployment with Docker.
bm25 congressional-bills docker duckdb embeddings govinfo-api hybrid-search reranking search-engine tei text-search vector-search
Last synced: 07 Oct 2025
https://github.com/lopinx/wechatmpcopilot
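An automation tool for publishing articles to WeChat and websites. It generates article content from keywords or titles, uses AI models (such as GPT) to produce high-quality articles, then publishes them and saves them locally.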
automation bm25 chatgpt keybert textrank tf-idf wechat weixin
Last synced: 11 Jun 2025
https://github.com/ev2900/bm25_search_example
Example to help understand how the BM25 term based ranking model works in search applications
bm25 python search similarity-search vector-search
Last synced: 04 Oct 2025
https://github.com/ryomendev/codequest
The Document Search Engine is a web application designed to facilitate efficient searching and retrieval of information from a collection of documents. It utilizes various natural language processing techniques to preprocess the documents, extract keywords, calculate term frequencies, and generate relevant search results based on user queries.
bm25 express-ejs natural node-js wink-lemmatizer
Last synced: 17 Mar 2025
https://github.com/morriz/indy-news
Streamlit app and FastAPI that powers Indy News assistant
ai bm25 independent-news media vector-database youtube
Last synced: 28 Feb 2025
https://github.com/sambhav/ir-system
An information retrieval system for a comparative analysis of TF-IDF and BM25 ranking mechanisms
bm25 comparative-analysis information-retrieval reddit scraper tf-idf whoosh
Last synced: 17 Mar 2025
https://github.com/panodata/sphinx-sql-backend
SQL backend for the Sphinx documentation generator. The focus is fulltext search (FTS), but there may be more. [WIP]
bm25 cratedb fts lucene sphinx-doc sphinx-extension sphinx-fts sphinx-search
Last synced: 23 Mar 2025
https://github.com/jesse-c/notes-app-hybrid-search
Make your notes from Notes.app searchable via hybrid search.
bm25 hybrid-search macos notes-app semantic-search vespa
Last synced: 24 Mar 2025
https://github.com/rid17pawar/semantic-search-model-experiments
Experiments in semantic search using the BM25 algorithm and the mean of word vectors, along with state-of-the-art Transformer-based models, namely USE and SBERT.
bm25 fasttext fasttext-embeddings glove glove-embeddings information-retrieval sbert semantic-search universal-sentence-encoder word2vec word2vec-embeddinngs
Last synced: 17 Oct 2025
https://github.com/stefanoghinelli/salton
Information Retrieval class project, an IR system built upon a corpus of research papers. It ranks results using the BM25 function
bm25 information-retrieval nltk okapi python unimore-informatica whoosh
Last synced: 17 Oct 2025
https://github.com/cerno-ai/cerno-insight
High-performance RAG system for intelligent document Q&A with hybrid retrieval, GPU acceleration, and citation-backed answers. Upload docs, ask questions, get precise responses.
artificial-intelligence bm25 docker document-processing embeddings faiss fastapi llms local-first machine-learning natural-language-processing nextjs openai python rag rag-pipeline reranking retreival-augmented-generation semantic-search typescript
Last synced: 08 Nov 2025
https://github.com/farhanshoukat/information-retrieval
Parse HTML pages. Create inverted index. Search for pages
bm25 inverted-index inverter jelinek language-model okapi okapi-bm25 parser tf-idf
Last synced: 31 Mar 2025
https://github.com/kalifou/ri_tme1
Information retrieval - assignments for course at UPMC - Paris 6
bm25 evaluation-metrics hits-algorithm information-retrieval language-model language-modeling pagerank-algorithm python
Last synced: 29 Mar 2025
https://github.com/ambidextrous9/mtp-news-article-based-question-answering-system
MTP-FlanT5-SBERT-Model-for-NewsQA-and-Teacher-Student-Model
bm25 flan-t5 language-model newsqa nlp qa question-answering sbert transformer
Last synced: 10 Oct 2025
https://github.com/avishrantssh/pyranker
Python-based package consisting of several rankers for information retrieval
bm25 information-retrieval ranking search-engine tf-idf vectorspacemodel
Last synced: 12 Apr 2025
https://github.com/taha-kms/classmate-rag
A local, multilingual (EN/IT) study assistant that indexes course materials and answers questions with citations, using multilingual-e5-base for retrieval and Llama 3.1-8B for generation. CLI-only.
bm25 chromadb cli docker e5 information-retrieval llama3 llm rag retrieval-augmented-generation
Last synced: 08 Oct 2025
https://github.com/mina-faridi/document-ranking-with-galago
Galago-related homework for an Information Retrieval course
bm25 document-ranking galago information-retrieval map ndcg pivoted-length-normalisation recall stemming tokenizing university-of-tehran
Last synced: 30 Dec 2025
https://github.com/rohith-2/bm25-fusion
An ultra-fast BM25 retriever with support for multiple variants and meta-data filtering.
bm25 information-retrieval keyword-search lexical-search metadata-filtering numba py-search python rag search sparse-search
Last synced: 14 Dec 2025
https://github.com/e1washere/production-rag-service
Production-grade RAG service demonstrating enterprise MLOps practices with hybrid search, comprehensive observability, and automated deployment pipelines.
ai azure bm25 embeddings faiss fastapi github-actions hybrid-search llm mlops observability rag redis terraform testing
Last synced: 14 Oct 2025
https://github.com/wizo17/contextual_rag_application
RAG Application with Contextual Retrieval and Lexical Retrieval.
bm25 bm25-okapi langchain mlflow-tracking openai-api python rag streamlit
Last synced: 16 Oct 2025
https://github.com/hockyy/qmedigle-fe
bm25 information medline retrieval topk
Last synced: 24 Oct 2025
https://github.com/jhaayush2004/reranking-in-rag
Reranking from scratch using sentence-transformer, BM25, Cohere and Cross-Encoders !!!
bm25 cohere cross crossencoder flashrerank nlp rag reranking sentence-transformers
Last synced: 21 Feb 2025
https://github.com/atinyshrimp/tripadvisor-recommendation-ml-nlp
Machine Learning and NLP models for improving text-based recommendations on TripAdvisor, using BM25, TF-IDF, embeddings, and a Hybrid approach.
bm25 data-science embeddings kaggle-dataset machine-learning nlp nlp-machine-learning python recommandation-system sentence-embeddings sentence-transformers text-similarity tripadvisor
Last synced: 04 Oct 2025
https://github.com/rrayhka/information-retrieval-bert-bm25
Search engine for retrieving Indonesian Supreme Court (Mahkamah Agung) decisions using BERT embeddings and the BM25 model.
bert-embeddings bm25 information-retrieval mahkamahagung nlp putusan search-engine
Last synced: 22 Mar 2025
https://github.com/vickshan001/imdb-search-engine-project
NLP-powered IMDb search engine with Flask backend using BM25 and TF-IDF for smart movie retrieval and ranking.
bm25 flask imdb information-retrieval movie-search nlp python react search-engine tf-idf
Last synced: 30 Mar 2025
https://github.com/ffreemt/similarity-matrix
Similarity matrix based on doc-term-scores from textacy
Last synced: 15 Mar 2025
https://github.com/armanjscript/hybrid-rag-chatbot
A powerful web-based application designed to answer questions based on the content of uploaded PDF documents. This project leverages a Hybrid Retrieval-Augmented Generation (RAG) approach, combining the strengths of vector-based semantic search and keyword-based search to deliver accurate and relevant responses
bm25 chroma chromadb ensemble-retriever hybrid-rag langchain langchain-ollama ollama ollama-embeddings pypdf qwen2-5 rag rag-chatbot streamlit
Last synced: 30 Dec 2025
https://github.com/oaklight/vectorsearch
Dockerized vector database based on pgvector, pgvectorscale and pg_search
bm25 pgsearch pgvector postgres semantic-search vector-database vector-search
Last synced: 24 Mar 2025
https://github.com/paradedb/blog
Blog posts for ParadeDB as .mdx, hosted on Mintlify
aggregations analytics big-data bm25 database datalake elasticsearch full-text-search htap hybrid-search mpp object-storage olap paradedb pgsearch postgres real-time-analytics self-hosted similarity-search sql
Last synced: 10 Jun 2025
https://github.com/ictup/enhancing-qa-systems-through-integrated-reasoning-over-knowledge-bases-and-large-language-models
KG-RAG + ToT + multi-agent LLMs for evidence-grounded QA with Neo4j and fine-tuning; reproducible medical case study & evaluation.
autogen bm25 fine-tuning knowledge-graph llm llm-ranking lora mindmap neo4j nli peft prompt-engineering question-answering rag reasoning self-consistency tree-of-thoughts
Last synced: 18 Sep 2025
https://github.com/waitingsong/paradedb
ParadeDB JavaScript client library
analytics bm25 elasticsearch fts full-text-search hnsw hybrid-search mpp paradedb pgvector postgres similarity-search
Last synced: 21 Apr 2025
https://github.com/maryamyazdi/news_ranking
Ranks news retrieved from several categories by relevance to a given query, using the BM25 algorithm.
bm25 information-retrieval ranking
Last synced: 12 Mar 2025
https://github.com/louislefevre/information-retrieval-models
Ranks passages against queries using various models and techniques.
bm25 dirichlet-smoothing information-retrieval laplace-smoothing lidstone-smoothing query-likelihood tfidf vectorspace
Last synced: 30 Mar 2025
https://github.com/patelvivekdev/fast-bm25
BM25 (Okapi BM25) implementation in TypeScript with field boosting and parallel processing support.
Last synced: 22 Jul 2025
https://github.com/arnab-0053/song-identifier
Identifies songs and artists from lyric snippets using two distinct methods: a simple NLP-based approach and a BM25 (Best Match 25) approach.
bm25 nlp nltk python rank-bm25 scikit-learn song-lyrics spotify-dataset text-preprocessing
Last synced: 05 Mar 2025
https://github.com/griffio/sqldelight-bm25-module-app
SqlDelight module for VectorChord Bm25
bm25 kotlin postgersql sqldelight vectorchord
Last synced: 04 Sep 2025
https://github.com/ussarmy/wink
A tool to Win the week
aws bm25 chatbot codechef hackerearth materializecss python sbd spoj stopstalk timus tokenize web2py wink-nlp
Last synced: 28 Mar 2025
https://github.com/mohabdo21/hybridrec-contextenrichment
An advanced hybrid recommendation system that combines collaborative filtering and content-based filtering approaches, enhanced with temporal awareness and contextual personalization
als bm25 collaborative-filtering content-based-filtering context-aware-recommender-system cosine-similarity machine-learning matrix-factorization n-gram online-learning real-time-adaptation recommendation-system temporal-weighting tf-idf
Last synced: 09 Oct 2025
https://github.com/dnlzrgz/chercher
Chercher is a universal, extensible, and personal search engine.
bm25 cli search search-engine tui
Last synced: 19 Jul 2025