An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with document-search

A curated list of projects in awesome lists tagged with document-search.

https://github.com/deepsense-ai/ragbits

Building blocks for rapid development of GenAI applications

agents document-search evaluation guardrails llms optimization prompts rag vector-stores

Last synced: 25 Feb 2026

https://github.com/neuml/paperai

📄 🤖 Semantic search and workflows for medical/scientific papers

ai artificial-intelligence document-search machine-learning medical nlp python scientific-papers search txtai

Last synced: 31 Oct 2025

https://github.com/dtsola/xiaoyaosearch

XiaoyaoSearch (小遥搜索): Understands your words, reads your images, and finds any local file with AI. Making search as easy as chatting.

agent-skills ai-search document-search file-search local-search mcp multimodal-ai natural-language productivity semantic-search

Last synced: 01 Apr 2026

https://github.com/redis-developer/redis-arxiv-search

Vector search demo with the arXiv paper dataset, RedisVL, HuggingFace, OpenAI, Cohere, FastAPI, React, and Redis.

arxiv arxiv-papers cohere document-retrieval document-search huggingface machine-learning nlp openai react redis vector-database vector-search

Last synced: 17 Sep 2025

https://github.com/robindekoster/chatgpt-custom-knowledge-chatbot

This open source chatbot project lets you create a chatbot that uses your own data to answer questions, thanks to the power of the OpenAI GPT-3.5 model.

ai chatbot chatgpt chatgpt-api contextual-chatbot document-search gpt knowledge-base llama-index machine-learning openai openai-chatgpt python python3

Last synced: 18 Jul 2025

https://github.com/poloclub/mememo

A JavaScript library that brings vector search and RAG to your browser!

browser document-search gen-ai image-search llm rag vector-database vector-search

Last synced: 13 May 2025

https://github.com/flamehaven01/flamehaven-filesearch

Self-hosted RAG search engine — 34 formats, BM25+hybrid search, multi-LLM (Gemini/OpenAI/Claude/Ollama), FastAPI + Docker, production-ready in 3 min

bm25 crewai docker document-parsing document-search fastapi haystack hybrid-search knowledge-base langchain llamaindex llm ollama open-source python rag self-hosted semantic-search vector-search

Last synced: 23 Apr 2026

https://github.com/capjamesg/jamesql

An in-memory NoSQL database implemented in Python.

document-search nosql nosql-database python web-search

Last synced: 05 Apr 2025

https://github.com/kcubeterm/achoz

Search through all your personal data efficiently, the way you would search the web.

crawler document-search filesearch search-engine websearch

Last synced: 21 Aug 2025

https://github.com/gmickel/gno

Local AI-powered document search and editing with first-in-class hybrid retrieval, LLM answers, WebUI, REST API and MCP support for AI clients.

ai-assistant bun cli code-search document-search embeddings knowledge-base llm local-first mcp offline pkm rag second-brain semantic-search typescript vector-search

Last synced: 19 Apr 2026

https://github.com/daac-tools/find-simdoc

Finding all pairs of similar documents time- and memory-efficiently

all-pairs document-search rust similarity-search

Last synced: 22 Jun 2025

https://github.com/teilomillet/raggo

A lightweight, production-ready RAG (Retrieval Augmented Generation) library in Go.

ai chromadb document-search embeddings golang llm milvus openai question-answering rag retrieval-augmented-generation vector-database vector-search

Last synced: 11 Apr 2025

https://github.com/easonlai/chatbot_with_pdf_streamlit

This code example shows how to make a chatbot for semantic search over documents using Streamlit, LangChain, and various vector databases. The chatbot lets users ask questions and get answers from a document collection. The code is in Python and can be customized for different scenarios and data.

azure azure-cognitive-search azure-openai chroma document-search embedding-models gpt-3 gpt-35-turbo langchain langchain-python openai pinecone python semantic-search streamlit vector-database vector-search vector-similarity

Last synced: 26 Apr 2025

https://github.com/kyr0/clientside-search

A highly efficient, isomorphic, full-featured, multilingual text search engine library, providing full-text search, fuzzy matching, phonetic scoring, document indexing and more, with micro JSON state hydration/dehydration in-browser and server-side.

bk-tree bm25 browser client-side damerau-levenshtein-distance document-indexing document-search full-text-search fuzzy-matching lucene multilingual nodejs phonetics search-engine state-hydration text-processing text-search tf-idf trie

Last synced: 14 Jul 2025
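The clientside-search entry lists BK-trees and Damerau-Levenshtein distance for fuzzy matching. The Python sketch below is not that library's code. It illustrates the general technique and uses plain Levenshtein distance (no transpositions) for brevity. A BK-tree stores each word under its parent at an edge labelled with their edit distance. At query time, the triangle inequality means only children whose edge label lies within `max_dist` of the current distance can contain matches. The class and function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


class BKTree:
    """Hypothetical minimal BK-tree; each node is (word, {edge_distance: child})."""

    def __init__(self, distance=levenshtein):
        self.distance = distance
        self.root = None

    def add(self, word: str) -> None:
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            node_word, children = node
            d = self.distance(word, node_word)
            if d == 0:
                return  # word already stored
            if d in children:
                node = children[d]  # descend along the matching edge
            else:
                children[d] = (word, {})
                return

    def search(self, query: str, max_dist: int):
        """Return sorted (distance, word) pairs within max_dist of query."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            word, children = stack.pop()
            d = self.distance(query, word)
            if d <= max_dist:
                results.append((d, word))
            # Triangle inequality prunes children outside [d - max_dist, d + max_dist].
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)
```

For example, after adding `book`, `books`, `cake`, `boo`, `cape`, and `cart`, `search("bok", 1)` returns `[(1, "boo"), (1, "book")]`.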

https://github.com/lekt9/albert-launcher

AI-powered file launcher and semantic search assistant. Like Spotlight/Alfred but with advanced AI capabilities for understanding context and meaning. Features local processing, privacy-first design, and seamless integration with your workflow.

ai alfred desktop document-search electron llm local local-ai macos ollama privacy productivity-tool search-engine semantic-search spotlight typescript vector-search

Last synced: 08 May 2026

https://github.com/chrisryugj/docufinder

Don't search for files; search for their contents. The full text of thousands of HWPX, PDF, and Office documents in one second. 100% local, fully offline. (Tauri · React · Rust)

desktop-app document-search fts5 hwpx korean korean-nlp offline-first onnx rag react rust tauri

Last synced: 13 May 2026

https://github.com/lethalbit/bookwurm

dead simple document index and search, nothing fancy

document-indexing document-search

Last synced: 24 Feb 2026

https://github.com/gsidhu/buzee-releases

Public releases for Buzee

desktop-app document-search tauri-app

Last synced: 07 Mar 2026

https://github.com/mdietrichstein/ir-search-engine-rust

Rust-based text search engine, built from scratch, supporting multiple document-similarity metrics (TF-IDF, BM25, BM25VA)

document-search document-similarity information-retrieval nlp rust search search-engine

Last synced: 10 Apr 2025
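The ir-search-engine-rust entry lists BM25 among its similarity metrics. The following is a rough illustration of Okapi BM25 scoring over pre-tokenized documents. It is written in Python rather than that project's Rust and is not taken from its code. The defaults `k1=1.5` and `b=0.75` and the Lucene-style non-negative IDF are assumptions. BM25VA is not covered.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against a tokenized query with Okapi BM25."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs  # average document length
    # Document frequency: number of docs containing each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Lucene-style IDF, which never goes negative for common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

With `docs = [["rust", "search", "engine"], ["python", "web", "framework"], ["search", "search", "rust"]]` and the query `["rust", "search"]`, the third document scores highest because it repeats `search`, and the second scores zero.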

https://github.com/aimaster-dev/smartrag

SmartRAG is a terminal-based RAG system using LangGraph. It processes queries by retrieving relevant content from markdown or PDFs, then responds using OpenAI GPT. Supports webpage-to-PDF conversion, vector DB search, and modular flow control.

ai automation chatbot cli document-search gpt knowledge-base langchain langgraph markdown nlp openai pdf python query rag retrieval terminal vector-database web-scraping

Last synced: 09 Apr 2026

https://github.com/oxdev03/node-tantivy-binding

Node.js bindings for Tantivy. Provides indexing, querying, and advanced search features with TypeScript support.

document-search indexing lucene napi-rs nodejs search tantivy typescript

Last synced: 20 Oct 2025

https://github.com/qyokizzzz/simhash

An extended version of simhash that supports fingerprint extraction for both documents and images.

document-search fingerprint image-deduplication image-search simhash

Last synced: 02 Apr 2025
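For readers unfamiliar with the simhash entry's underlying idea, here is a minimal sketch of classic text simhash. It does not reproduce that repository's extended document/image version. Each token's hash votes +1 or -1 on every bit. The fingerprint keeps the bits with positive totals, and near-duplicate documents tend to have fingerprints with a small Hamming distance. Using the first 8 bytes of MD5 as the 64-bit token hash and weighting every token occurrence equally are assumptions made for illustration.

```python
import hashlib

BITS = 64  # fingerprint width; 64 bits are taken from each MD5 digest


def token_hash(token: str) -> int:
    """Stable 64-bit hash of a token (first 8 bytes of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")


def simhash(tokens) -> int:
    """Classic simhash: every token occurrence votes +1/-1 on each bit."""
    votes = [0] * BITS
    for token in tokens:
        h = token_hash(token)
        for i in range(BITS):
            votes[i] += 1 if (h >> i) & 1 else -1
    # Set bit i wherever the positive votes win.
    fingerprint = 0
    for i, v in enumerate(votes):
        if v > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits; small values suggest near-duplicates."""
    return bin(a ^ b).count("1")
```

Because the votes are summed, the fingerprint ignores token order. A single-token document's fingerprint equals that token's hash.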

https://github.com/hallelx2/vectorless-engine

A retrieval engine that reasons over document structure — not embeddings. No chunking, no top-K, no vector DB.

agent ai anthropic claude document-search gemini go golang llm mcp openai rag rag-alternative retrieval structured-retrieval vectorless

Last synced: 30 May 2026

https://github.com/goodguyady/querybaseai

AI-powered hybrid search engine combining keyword, vector, and LLM-based contextual search using RAG, with support for AI21, OpenAI, or any other LLM.

ai django django-rest-framework document-search elasticsearch llm milvus nlp rag vector-search

Last synced: 25 Feb 2026

https://github.com/mishraanuraagx/chatq

Local Retrieval-Augmented Generation (RAG) system built with FastAPI, integrating vector search, Elasticsearch, and optional web search to power LLM-based intelligent question answering using models like Mistral or GPT-4.

ai chatbot document-search elasticsearch fastapi knowledge-retrieval langchain llm local-llm mistral ollama rag retrieval-augmented-generation semantic-search vector-database

Last synced: 16 Feb 2026

https://github.com/jumjumiasbullah-08/kms-v01

📚 Knowledge Management System (KMS), document-management based: a web application for efficiently managing, storing, and searching documents using pure PHP and MySQL.

cms content-management-system document-management document-search file-storage information-management-system knowledge-management-system mysql php web-application

Last synced: 16 May 2026

https://github.com/shrimpy8/semantic-search-next

A full-stack RAG (Retrieval Augmented Generation) application with hybrid search, cross-encoder reranking, citation-verified AI answers, and LLM-as-Judge evaluation. Supports multiple AI providers including OpenAI, Anthropic, and Ollama for fully local operation.

ai-answers anthropic chromadb citation-validation document-search embeddings fastapi hybrid-search jina llm natural-language-processing nextjs ollama openai postgresql python rag retrieval-augmented-generation semantic-search vector-database

Last synced: 05 May 2026
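Several entries here, including semantic-search-next, advertise hybrid search. One common way to merge a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). The entries do not say which fusion method they use, so the sketch below is a generic illustration and does not describe any listed project's implementation. Each document receives `1 / (k + rank)` from every ranking it appears in. The constant `k=60` is the conventional default, assumed here.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs (best first) into one ranking.

    Returns (doc_id, fused_score) pairs sorted by descending score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

For example, fusing a keyword ranking `["d1", "d2", "d3"]` with a vector ranking `["d3", "d1", "d4"]` orders the documents `d1, d3, d2, d4`. Documents ranked well by both retrievers rise to the top without any need to normalize BM25 scores against cosine similarities.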

https://github.com/praarishtech/support-copilot-public

Slack-integrated assistant for Jira and Confluence. Word-based search across spaces — without leaving Slack. Available for exclusive buyout.

assistant buyout confluence document-search enterprise exclusive jira search single-tenant slack word-search

Last synced: 02 Nov 2025

https://github.com/gwicho38/legal-workspace-mcp

MCP server that gives Claude self-updating access to a local document directory. TF-IDF search, live file watching, PDF/DOCX/MD support. Built for legal workflows, works with any documents.

anthropic claude claude-code document-search legal legal-tech mcp mcp-server model-context-protocol tfidf

Last synced: 02 Mar 2026
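The legal-workspace-mcp entry mentions TF-IDF search. The sketch below shows TF-IDF ranking with cosine similarity over pre-tokenized documents. It is not that server's code, and its tokenization and weighting details are unknown. The smoothed IDF formula `ln((1 + n) / (1 + df)) + 1` mirrors scikit-learn's default and is an assumption, as is the hypothetical function name `tfidf_search`.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for tokenized docs."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    # Smoothed IDF, similar to scikit-learn's default (smooth_idf=True).
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vecs = [{t: c * idf[t] for t, c in Counter(d).items()} for d in docs]
    return vecs, idf


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def tfidf_search(query, docs):
    """Hypothetical helper: rank doc indices by TF-IDF cosine similarity."""
    vecs, idf = tfidf_vectors(docs)
    # Query terms absent from the corpus carry no weight and are dropped.
    qvec = {t: c * idf[t] for t, c in Counter(query).items() if t in idf}
    scored = [(i, cosine(qvec, vecs[i])) for i in range(len(docs))]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Searching `["breach", "damages"]` against `[["contract", "breach", "damages"], ["lease", "agreement", "tenant"], ["breach", "of", "lease"]]` ranks the first document highest and gives the second a score of zero.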

https://github.com/ckrough/retriever

AI-powered document Q&A using RAG (Retrieval-Augmented Generation). Built with FastAPI, Claude, and Chroma for accurate, cited answers.

chroma claude document-search fastapi knowledge-base llm python question-answering rag retrieval-augmented-generation self-service semantic-search vector-database

Last synced: 01 Apr 2026

https://github.com/krisluczka/osse

Open-source search engine with a built-in web/document crawler and its own indexing method.

cpp document-indexing document-search document-searching indexing-engine search-engine web-crawler web-crawling web-indexer web-indexing

Last synced: 15 Apr 2025

https://github.com/samjoesilvano/multi-source-knowledge-retrieval-system

An end-to-end multi-source knowledge retrieval system using LangChain, FAISS, and OpenAI embeddings. This Retrieval-Augmented Generation (RAG) pipeline intelligently searches across Wikipedia, arXiv, and custom websites, optimizing source selection and delivering precise, real-time results based on query relevance.

ai-pipeline document-search faiss information-retrieval knowledge-retrieval langchain langchain-agents langchain-tools machine-learning multi-source-retrieval natural-language-processing openai-embeddings python retrieval-augmented-generation semantic-search

Last synced: 10 May 2026

https://github.com/nishit00/document-qa-rag-system

📄 Transform documents into interactive AI conversations with ease, creating a searchable knowledge base for efficient information retrieval.

ai docker document-search faiss faiss-vector-database groq langchain llama3 llm nlp pdf python question-answering questions-and-answers rag rag-chatbot streamlit vector-store

Last synced: 12 Apr 2026