An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with bm25

A curated list of projects in awesome lists tagged with bm25.

https://github.com/manticoresoftware/manticoresearch

Easy to use open source fast database for search | Good alternative to Elasticsearch now | Drop-in replacement for E in the ELK soon

api bm25 cpp database full-text-search hacktoberfest json mysql search search-api search-engine search-server sphinxsearch sql stream-filtering

Last synced: 13 May 2025

https://github.com/paradedb/paradedb

ParadeDB is a modern Elasticsearch alternative built on Postgres. Built for real-time, update-heavy workloads.

aggregations analytics big-data bm25 database elasticsearch full-text-search htap hybrid-search mpp object-storage olap postgresql real-time-analytics similarity-search sparse-vector sql

Last synced: 13 May 2025

https://github.com/infiniflow/infinity

The AI-native database built for LLM applications, providing incredibly fast hybrid search of dense vector, sparse vector, tensor (multi-vector), and full-text

ai-native approximate-nearest-neighbor-search bm25 cpp20 cpp20-modules embedding full-text-search hnsw hybrid-search information-retrival nearest-neighbor-search rag search-engine tensor-database vector vector-database vector-search vectordatabase

Last synced: 12 May 2025

https://github.com/xhluca/bm25s

Fast lexical search implementing BM25 in Python using NumPy, Numba and SciPy

bm25 bm25-l bm25-plus information-retrieval lexical-search okapi-bm25 rag retrieval robertson search

Last synced: 14 May 2025

https://github.com/dorianbrown/rank_bm25

A Collection of BM25 Algorithms in Python

algorithm bm25 information-retrieval ranking

Last synced: 02 Apr 2025

https://github.com/shibing624/similarities

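Similarities: a toolkit for similarity calculation and semantic search. Supports text-to-text, text-to-image, and image-to-image search over datasets at the hundred-million scale; written in Python 3 and usable out of the box.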

bm25 deep-learning faiss image-search image-similarity matching nlp pytorch search-engine similarity similarity-search text-matching

Last synced: 14 May 2025

https://github.com/brunoarine/org-similarity

Emacs package that helps org-mode users (re)discover similar documents

bm25 elisp emacs org-mode org-roam python semantic-similarity similarity-search tf-idf

Last synced: 09 May 2025

https://github.com/lightonai/ducksearch

Efficient BM25 with DuckDB 🦆

bm25 duckdb information-retrieval

Last synced: 26 Aug 2025

https://github.com/kwang2049/easy-elasticsearch

Using business-level retrieval system (BM25) with Python in just a few lines.

bm25 docker elasticsearch information-retrieval

Last synced: 24 Mar 2025

https://github.com/stephanj/bm25

A BM25 Java implementation using streams, stop words and stemming.

bm25 llm nlp rerank stemming

Last synced: 13 Oct 2025

https://github.com/brunoarine/findlike

Command-line tool that finds lexically similar documents in relation to a reference text file or ad-hoc query

bm25 nlp similarity-search tfidf

Last synced: 18 Jul 2025

https://github.com/searchivarius/accuratelucenebm25

Improving the effectiveness of Lucene's BM25 (and testing it using Yahoo! Answers and Stack Overflow collections)

bm25 lucene

Last synced: 31 Jul 2025

https://github.com/logan-markewich/bm25-rs

Efficient BM25 indexing using Rust

bm25 index indexing retrieval rust search

Last synced: 05 Sep 2025

https://github.com/samadpls/bestrag

BestRAG: A library for hybrid RAG, combining dense, sparse, and late interaction methods for efficient document storage and search.

best-rag bm25 embedding-vectors hybrid-rag llm opensource pypi-package qdrant rag retrival-augmented-generation

Last synced: 27 Oct 2025

https://github.com/jbesomi/korono

👑Korono: question answering platform for COVID-19 papers

bm25 covid covid-19 covid19 qa question-answering search-engine

Last synced: 13 Apr 2025

https://github.com/wittline/recommendation-system

Build a Content-Based Movie Recommender System (TF-IDF, BM25, BERT)

bert bm25 nlp python recommender-system recsys text-analysis tf-idf word2vec

Last synced: 03 Jul 2025

https://github.com/kyr0/clientside-search

A highly efficient, isomorphic, full-featured, multilingual text search engine library, providing full-text search, fuzzy matching, phonetic scoring, document indexing and more, with micro JSON state hydration/dehydration in-browser and server-side.

bk-tree bm25 browser client-side damerau-levenshtein-distance document-indexing document-search full-text-search fuzzy-matching lucene multilingual nodejs phonetics search-engine state-hydration text-processing text-search tf-idf trie

Last synced: 14 Jul 2025

https://github.com/inspirateur/fast-bm25

a fast implementation of BM25

bm25 ranking search search-engine search-in-text

Last synced: 14 Apr 2025

https://github.com/nasrmohammad4804/search-engine-concept

A repo for learning search engine concepts, covering stacks such as ELK and web search engines such as Google, to grow software engineering knowledge

bm25 crwaler elasticsearch etl-pipeline google inverted-index kafka kibana microservice mongodb ranking redis search-engine tf-idf

Last synced: 13 May 2025

https://github.com/raphaelsty/cherche-api

Deploy Cherche using FastAPI and Docker

bm25 docker fastapi neural-search question-answering summarization tfidf

Last synced: 25 Oct 2025

https://github.com/fanzeyi/torchic

A generic search engine built using Go & Spring & Redis. Project for Google's CodeU event.

bm25 crawler search-engine

Last synced: 28 Apr 2025

https://github.com/dbozhinovski/relatinator

A humble library for finding related posts and content. Uses tf-idf and BM25 under the hood. Primarily aimed at static site generators.

astro bm25 related-posts static-site tfidf

Last synced: 10 Apr 2025

https://github.com/kbeaugrand/semantickernel.rankers

A robust C# library for reranking search results using Semantic Kernel

ai bm25 gpt llm reranking semantickernel

Last synced: 10 Oct 2025

https://github.com/justinhsu1019/general-rag-template

Template for building high-accuracy Retrieval-Augmented Generation (RAG) pipelines with hybrid search (semantic + keyword), reranking, and LLM-based generation.

bm25 gpt high-accuracy langchain llm rag

Last synced: 11 Apr 2025

https://github.com/ddayguerrero/spimi-indexer

Boolean retrieval search engine with SPIMI indexing and BM25 ranking

bm25 bs4 inverted-index okapi python3 reuters-corpus search spimi

Last synced: 16 Mar 2025

https://github.com/dnlzrgz/housaku

A powerful yet simple personal search engine built on top of SQLite's FTS5.

bm25 cli fts5 python search search-engine sqlite sqlite3

Last synced: 20 Mar 2025

https://github.com/evalops/congress-bill-search

High-quality congressional bill search with hybrid BM25+vector similarity using DuckDB, TEI embeddings, and GovInfo API. Local deployment with Docker.

bm25 congressional-bills docker duckdb embeddings govinfo-api hybrid-search reranking search-engine tei text-search vector-search

Last synced: 07 Oct 2025

https://github.com/lopinx/wechatmpcopilot

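An automation tool for publishing articles to WeChat and websites. It generates article content from keywords or titles, uses AI models (such as GPT) to produce high-quality articles, then publishes them and saves them locally.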

automation bm25 chatgpt keybert textrank tf-idf wechat weixin

Last synced: 11 Jun 2025

https://github.com/ev2900/bm25_search_example

Example to help understand how the BM25 term-based ranking model works in search applications

bm25 python search similarity-search vector-search

Last synced: 04 Oct 2025

https://github.com/ariya/text-match

demo of text matching using BM25

bm25

Last synced: 04 Oct 2025

https://github.com/ryomendev/codequest

The Document Search Engine is a web application designed to facilitate efficient searching and retrieval of information from a collection of documents. It utilizes various natural language processing techniques to preprocess the documents, extract keywords, calculate term frequencies, and generate relevant search results based on user queries.

bm25 express-ejs natural node-js wink-lemmatizer

Last synced: 17 Mar 2025

https://github.com/morriz/indy-news

Streamlit app and FastAPI that powers Indy News assistant

ai bm25 independent-news media vector-database youtube

Last synced: 28 Feb 2025

https://github.com/sambhav/ir-system

An information retrieval system for a comparative analysis of TF-IDF and BM25 ranking mechanisms

bm25 comparative-analysis information-retrieval reddit scraper tf-idf whoosh

Last synced: 17 Mar 2025

https://github.com/panodata/sphinx-sql-backend

SQL backend for the Sphinx documentation generator. The focus is fulltext search (FTS), but there may be more. [WIP]

bm25 cratedb fts lucene sphinx-doc sphinx-extension sphinx-fts sphinx-search

Last synced: 23 Mar 2025

https://github.com/jesse-c/notes-app-hybrid-search

Make your notes from Notes.app searchable via hybrid search.

bm25 hybrid-search macos notes-app semantic-search vespa

Last synced: 24 Mar 2025

https://github.com/rid17pawar/semantic-search-model-experiments

Experiments in the field of Semantic Search using BM-25 Algorithm, Mean of Word Vectors, along with state of the art Transformer based models namely USE and SBERT.

bm25 fasttext fasttext-embeddings glove glove-embeddings information-retrieval sbert semantic-search universal-sentence-encoder word2vec word2vec-embeddinngs

Last synced: 17 Oct 2025

https://github.com/stefanoghinelli/salton

Information Retrieval class project, an IR system built upon a corpus of research papers. It ranks results using the BM25 function

bm25 information-retrieval nltk okapi python unimore-informatica whoosh

Last synced: 17 Oct 2025

https://github.com/cerno-ai/cerno-insight

High-performance RAG system for intelligent document Q&A with hybrid retrieval, GPU acceleration, and citation-backed answers. Upload docs, ask questions, get precise responses.

artificial-intelligence bm25 docker document-processing embeddings faiss fastapi llms local-first machine-learning natural-language-processing nextjs openai python rag rag-pipeline reranking retreival-augmented-generation semantic-search typescript

Last synced: 08 Nov 2025

https://github.com/farhanshoukat/information-retrieval

Parse HTML pages. Create inverted index. Search for pages

bm25 inverted-index inverter jelinek language-model okapi okapi-bm25 parser tf-idf

Last synced: 31 Mar 2025

https://github.com/kalifou/ri_tme1

Information retrieval - assignments for course at UPMC - Paris 6

bm25 evaluation-metrics hits-algorithm information-retrieval language-model language-modeling pagerank-algorithm python

Last synced: 29 Mar 2025

https://github.com/avishrantssh/pyranker

Python-based package consisting of several rankers for information retrieval

bm25 information-retrieval ranking search-engine tf-idf vectorspacemodel

Last synced: 12 Apr 2025

https://github.com/taha-kms/classmate-rag

a local, multilingual (EN/IT) study assistant that indexes course materials and answers questions with citations—using multilingual-e5-base for retrieval and Llama 3.1-8B for generation. CLI-only.

bm25 chromadb cli docker e5 information-retrieval llama3 llm rag retrieval-augmented-generation

Last synced: 08 Oct 2025

https://github.com/rohith-2/bm25-fusion

An ultra-fast BM25 retriever with support for multiple variants and meta-data filtering.

bm25 information-retrieval keyword-search lexical-search metadata-filtering numba py-search python rag search sparse-search

Last synced: 14 Dec 2025

https://github.com/e1washere/production-rag-service

Production-grade RAG service demonstrating enterprise MLOps practices with hybrid search, comprehensive observability, and automated deployment pipelines.

ai azure bm25 embeddings faiss fastapi github-actions hybrid-search llm mlops observability rag redis terraform testing

Last synced: 14 Oct 2025

https://github.com/wizo17/contextual_rag_application

RAG Application with Contextual Retrieval and Lexical Retrieval.

bm25 bm25-okapi langchain mlflow-tracking openai-api python rag streamlit

Last synced: 16 Oct 2025

https://github.com/jhaayush2004/reranking-in-rag

Reranking from scratch using sentence-transformer, BM25, Cohere and Cross-Encoders !!!

bm25 cohere cross crossencoder flashrerank nlp rag reranking sentence-transformers

Last synced: 21 Feb 2025

https://github.com/atinyshrimp/tripadvisor-recommendation-ml-nlp

Machine Learning and NLP models for improving text-based recommendations on TripAdvisor, using BM25, TF-IDF, embeddings, and a Hybrid approach.

bm25 data-science embeddings kaggle-dataset machine-learning nlp nlp-machine-learning python recommandation-system sentence-embeddings sentence-transformers text-similarity tripadvisor

Last synced: 04 Oct 2025

https://github.com/rrayhka/information-retrieval-bert-bm25

Search engine for retrieving decisions of the Supreme Court of Indonesia (Mahkamah Agung) using BERT embeddings and a BM25 model.

bert-embeddings bm25 information-retrieval mahkamahagung nlp putusan search-engine

Last synced: 22 Mar 2025

https://github.com/vickshan001/imdb-search-engine-project

NLP-powered IMDb search engine with Flask backend using BM25 and TF-IDF for smart movie retrieval and ranking.

bm25 flask imdb information-retrieval movie-search nlp python react search-engine tf-idf

Last synced: 30 Mar 2025

https://github.com/ffreemt/similarity-matrix

Similarity matrix based on doc-term-scores from textacy

bm25 nlp textacy tfidf

Last synced: 15 Mar 2025

https://github.com/armanjscript/hybrid-rag-chatbot

A powerful web-based application designed to answer questions based on the content of uploaded PDF documents. This project leverages a Hybrid Retrieval-Augmented Generation (RAG) approach, combining the strengths of vector-based semantic search and keyword-based search to deliver accurate and relevant responses

bm25 chroma chromadb ensemble-retriever hybrid-rag langchain langchain-ollama ollama ollama-embeddings pypdf qwen2-5 rag rag-chatbot streamlit

Last synced: 30 Dec 2025

https://github.com/oaklight/vectorsearch

Dockerized vector database based on pgvector, pgvectorscale and pg_search

bm25 pgsearch pgvector postgres semantic-search vector-database vector-search

Last synced: 24 Mar 2025

https://github.com/maryamyazdi/news_ranking

Ranks news retrieved from several categories by relevance to a given query using the BM25 algorithm.

bm25 information-retrieval ranking

Last synced: 12 Mar 2025

https://github.com/patelvivekdev/fast-bm25

BM25 (Okapi BM25) implementation in TypeScript with field boosting and parallel processing support.

bm25 bm25-okapi fast-bm25

Last synced: 22 Jul 2025

https://github.com/arnab-0053/song-identifier

It identifies songs and artists from lyric snippets using two distinct methods: a simple NLP-based approach and a BM25 (Best Match 25) approach.

bm25 nlp nltk python rank-bm25 scikit-learn song-lyrics spotify-dataset text-preprocessing

Last synced: 05 Mar 2025

https://github.com/griffio/sqldelight-bm25-module-app

SqlDelight module for VectorChord Bm25

bm25 kotlin postgersql sqldelight vectorchord

Last synced: 04 Sep 2025

https://github.com/mohabdo21/hybridrec-contextenrichment

An advanced hybrid recommendation system that combines collaborative filtering and content-based filtering approaches, enhanced with temporal awareness and contextual personalization

als bm25 collaborative-filtering content-based-filtering context-aware-recommender-system cosine-similarity machine-learning matrix-factorization n-gram online-learning real-time-adaptation recommendation-system temporal-weighting tf-idf

Last synced: 09 Oct 2025

https://github.com/dnlzrgz/chercher

Chercher is a universal, extensible, and personal search engine.

bm25 cli search search-engine tui

Last synced: 19 Jul 2025