Projects in Awesome Lists tagged with document-search
A curated list of projects in awesome lists tagged with document-search.
https://github.com/deepsense-ai/ragbits
Building blocks for rapid development of GenAI applications
agents document-search evaluation guardrails llms optimization prompts rag vector-stores
Last synced: 25 Feb 2026
https://github.com/neuml/paperai
📄 🤖 Semantic search and workflows for medical/scientific papers
ai artificial-intelligence document-search machine-learning medical nlp python scientific-papers search txtai
Last synced: 31 Oct 2025
https://github.com/dtsola/xiaoyaosearch
XiaoyaoSearch: Understands your words, reads your images, finds any local file with AI. Making search as easy as chatting.
agent-skills ai-search document-search file-search local-search mcp multimodal-ai natural-language productivity semantic-search
Last synced: 01 Apr 2026
https://github.com/infinilabs/coco-app
🥥 Coco AI App - Search, Connect, Collaborate, Your Personal AI Search and Assistant, all in one space.
ai-search ai-search-engine assistant cmd-k deepseek desktop-search document-search enterprise-search image-search launcher lightweight raycast searchbox spotlight tauri-app unified-search video-search workspace-search
Last synced: 16 May 2025
https://github.com/redis-developer/redis-arxiv-search
Vector search demo with the arXiv paper dataset, RedisVL, HuggingFace, OpenAI, Cohere, FastAPI, React, and Redis.
arxiv arxiv-papers cohere document-retrieval document-search huggingface machine-learning nlp openai react redis vector-database vector-search
Last synced: 17 Sep 2025
https://github.com/robindekoster/chatgpt-custom-knowledge-chatbot
This open source chatbot project lets you create a chatbot that uses your own data to answer questions, thanks to the power of the OpenAI GPT-3.5 model.
ai chatbot chatgpt chatgpt-api contextual-chatbot document-search gpt knowledge-base llama-index machine-learning openai openai-chatgpt python python3
Last synced: 18 Jul 2025
https://github.com/poloclub/mememo
A JavaScript library that brings vector search and RAG to your browser!
browser document-search gen-ai image-search llm rag vector-database vector-search
Last synced: 13 May 2025
https://github.com/flamehaven01/flamehaven-filesearch
Self-hosted RAG search engine — 34 formats, BM25+hybrid search, multi-LLM (Gemini/OpenAI/Claude/Ollama), FastAPI + Docker, production-ready in 3 min
bm25 crewai docker document-parsing document-search fastapi haystack hybrid-search knowledge-base langchain llamaindex llm ollama open-source python rag self-hosted semantic-search vector-search
Last synced: 23 Apr 2026
https://github.com/capjamesg/jamesql
An in-memory NoSQL database implemented in Python.
document-search nosql nosql-database python web-search
Last synced: 05 Apr 2025
https://github.com/kcubeterm/achoz
Search through all your personal data efficiently like web search.
crawler document-search filesearch search-engine websearch
Last synced: 21 Aug 2025
https://github.com/gmickel/gno
Local AI-powered document search and editing with first-in-class hybrid retrieval, LLM answers, WebUI, REST API and MCP support for AI clients.
ai-assistant bun cli code-search document-search embeddings knowledge-base llm local-first mcp offline pkm rag second-brain semantic-search typescript vector-search
Last synced: 19 Apr 2026
https://github.com/daac-tools/find-simdoc
Finding all pairs of similar documents time- and memory-efficiently
all-pairs document-search rust similarity-search
Last synced: 22 Jun 2025
https://github.com/infinilabs/coco-server
🥥 Coco AI Server - Search, Connect, Collaborate, AI-powered enterprise search, all in one space.
ai-assistant ai-search cloud-search collaboration desktop-search document-search enterprise-assistant enterprise-search glean-opensource-alternative launcher rag raycast semantic-search workspace-search
Last synced: 22 Apr 2025
https://github.com/zayedrais/documentsearchengine
Document Search Engine project with TF-IDF and Google Universal Sentence Encoder model
data-science deep-learning document-search document-similarity juypter machine-learning python python-text-analysis semantic-search semantic-search-engine tensorflow tensorflow-models tensorflow-tutorials text-analysis text-search text-semantic-similarity tfidf tfidf-text-analysis tfidf-vectorizer universal-sentence-encoder
Last synced: 02 May 2025
https://github.com/teilomillet/raggo
A lightweight, production-ready RAG (Retrieval Augmented Generation) library in Go.
ai chromadb document-search embeddings golang llm milvus openai question-answering rag retrieval-augmented-generation vector-database vector-search
Last synced: 11 Apr 2025
https://github.com/jankovicsandras/plpgsql_bm25
BM25 search implemented in PL/pgSQL
bm25 bm25okapi document-search okapi plpgsql postgres postgresql search text-search
Last synced: 23 Oct 2025
https://github.com/jankovicsandras/bm25opt
faster BM25 search algorithms in Python
bag-of-words bm25 bm25l bm25okapi bm25plus document-search python3 text-search
Last synced: 14 Apr 2025
https://github.com/easonlai/chatbot_with_pdf_streamlit
This code example shows how to make a chatbot for semantic search over documents using Streamlit, LangChain, and various vector databases. The chatbot lets users ask questions and get answers from a document collection. The code is in Python and can be customized for different scenarios and data.
azure azure-cognitive-search azure-openai chroma document-search embedding-models gpt-3 gpt-35-turbo langchain langchain-python openai pinecone python semantic-search streamlit vector-database vector-search vector-similarity
Last synced: 26 Apr 2025
https://github.com/kyr0/clientside-search
A highly efficient, isomorphic, full-featured, multilingual text search engine library, providing full-text search, fuzzy matching, phonetic scoring, document indexing and more, with micro JSON state hydration/dehydration in-browser and server-side.
bk-tree bm25 browser client-side damerau-levenshtein-distance document-indexing document-search full-text-search fuzzy-matching lucene multilingual nodejs phonetics search-engine state-hydration text-processing text-search tf-idf trie
Last synced: 14 Jul 2025
https://github.com/lekt9/albert-launcher
AI-powered file launcher and semantic search assistant. Like Spotlight/Alfred but with advanced AI capabilities for understanding context and meaning. Features local processing, privacy-first design, and seamless integration with your workflow.
ai alfred desktop document-search electron llm local local-ai macos ollama privacy productivity-tool search-engine semantic-search spotlight typescript vector-search
Last synced: 08 May 2026
https://github.com/chrisryugj/docufinder
Don't look for files, look for content. Search the full text of thousands of HWPX, PDF, and Office documents in one second. 100% local, fully offline. (Tauri · React · Rust)
desktop-app document-search fts5 hwpx korean korean-nlp offline-first onnx rag react rust tauri
Last synced: 13 May 2026
https://github.com/lethalbit/bookwurm
dead simple document index and search, nothing fancy
document-indexing document-search
Last synced: 24 Feb 2026
https://github.com/gsidhu/buzee-releases
Public releases for Buzee
desktop-app document-search tauri-app
Last synced: 07 Mar 2026
https://github.com/bent10/boox
Search anything, instantly
boox document-search full-text-search fulltext-search fuzzy-matching fuzzy-search instantsearch inverted-index nlp search search-engine search-index tf-idf tfidf vector-search vector-space-model
Last synced: 10 Apr 2025
https://github.com/mdietrichstein/ir-search-engine-rust
Rust-based text search engine from scratch supporting multiple document similarity metrics (TF-IDF, BM25, BM25VA)
document-search document-similarity information-retrieval nlp rust search search-engine
Last synced: 10 Apr 2025
https://github.com/aimaster-dev/smartrag
SmartRAG is a terminal-based RAG system using LangGraph. It processes queries by retrieving relevant content from markdown or PDFs, then responds using OpenAI GPT. Supports webpage-to-PDF conversion, vector DB search, and modular flow control.
ai automation chatbot cli document-search gpt knowledge-base langchain langgraph markdown nlp openai pdf python query rag retrieval terminal vector-database web-scraping
Last synced: 09 Apr 2026
https://github.com/oxdev03/node-tantivy-binding
Node.js bindings for Tantivy. Provides indexing, querying, and advanced search features with TypeScript support.
document-search indexing lucene napi-rs nodejs search tantivy typescript
Last synced: 20 Oct 2025
https://github.com/opengento/magento2-document-search
This module aims to make documents searchable for customers in Magento 2.
document-management document-search magento magento-2 magento-extension magento-module magento-search magento2 magento2-extension magento2-extension-free magento2-module php search
Last synced: 23 Apr 2025
https://github.com/tomlin7/ai-research-assistant
Semantic document search system with pgvector and PGAI
ai assistant document-search machine-learning natural-language-processing ollama pgai pgvector postgres postgresql research-assistant semantic-search sentence-embeddings sentence-transformers sentiment-analysis summarization text-similarity text-summarization
Last synced: 19 Feb 2026
https://github.com/qyokizzzz/simhash
The extended version of simhash supports fingerprint extraction of documents and images.
document-search fingerprint image-deduplication image-search simhash
Last synced: 02 Apr 2025
https://github.com/opengento/magento2-document-product-search
This module aims to make documents searchable with product keywords in Magento 2.
document-management document-search magento magento-2 magento-extension magento-module magento-search magento2 magento2-extension magento2-extension-free magento2-module php product-search
Last synced: 23 Apr 2025
https://github.com/hallelx2/vectorless-engine
A retrieval engine that reasons over document structure — not embeddings. No chunking, no top-K, no vector DB.
agent ai anthropic claude document-search gemini go golang llm mcp openai rag rag-alternative retrieval structured-retrieval vectorless
Last synced: 30 May 2026
https://github.com/goodguyady/querybaseai
AI-powered hybrid search engine combining keyword, vector, and LLM-based contextual search using RAG with support for AI21, OpenAI or any other LLMs.
ai django django-rest-framework document-search elasticsearch llm milvus nlp rag vector-search
Last synced: 25 Feb 2026
https://github.com/mishraanuraagx/chatq
Local Retrieval-Augmented Generation (RAG) system built with FastAPI, integrating vector search, Elasticsearch, and optional web search to power LLM-based intelligent question answering using models like Mistral or GPT-4.
ai chatbot document-search elasticsearch fastapi knowledge-retrieval langchain llm local-llm mistral ollama rag retrieval-augmented-generation semantic-search vector-database
Last synced: 16 Feb 2026
https://github.com/jumjumiasbullah-08/kms-v01
📚 Knowledge Management System (KMS) - Document Management Based. A web-based application for managing, storing, and searching documents efficiently using plain PHP and MySQL.
cms content-management-system document-management document-search file-storage information-management-system knowledge-management-system mysql php web-application
Last synced: 16 May 2026
https://github.com/shrimpy8/semantic-search-next
A full-stack RAG (Retrieval Augmented Generation) application with hybrid search, cross-encoder reranking, citation-verified AI answers, and LLM-as-Judge evaluation. Supports multiple AI providers including OpenAI, Anthropic, and Ollama for fully local operation.
ai-answers anthropic chromadb citation-validation document-search embeddings fastapi hybrid-search jina llm natural-language-processing nextjs ollama openai postgresql python rag retrieval-augmented-generation semantic-search vector-database
Last synced: 05 May 2026
https://github.com/praarishtech/support-copilot-public
Slack-integrated assistant for Jira and Confluence. Word-based search across spaces — without leaving Slack. Available for exclusive buyout.
assistant buyout confluence document-search enterprise exclusive jira search single-tenant slack word-search
Last synced: 02 Nov 2025
https://github.com/gwicho38/legal-workspace-mcp
MCP server that gives Claude self-updating access to a local document directory. TF-IDF search, live file watching, PDF/DOCX/MD support. Built for legal workflows, works with any documents.
anthropic claude claude-code document-search legal legal-tech mcp mcp-server model-context-protocol tfidf
Last synced: 02 Mar 2026
https://github.com/rozhakxd/dochunter
developer-tools django document-search education open-source productivity python search-engine
Last synced: 08 Mar 2025
https://github.com/ckrough/retriever
AI-powered document Q&A using RAG (Retrieval-Augmented Generation). Built with FastAPI, Claude, and Chroma for accurate, cited answers.
chroma claude document-search fastapi knowledge-base llm python question-answering rag retrieval-augmented-generation self-service semantic-search vector-database
Last synced: 01 Apr 2026
https://github.com/krisluczka/osse
Open Source Search Engine with built-in web/document crawler and an indexing method.
cpp document-indexing document-search document-searching indexing-engine search-engine web-crawler web-crawling web-indexer web-indexing
Last synced: 15 Apr 2025
https://github.com/samjoesilvano/multi-source-knowledge-retrieval-system
An end-to-end multi-source knowledge retrieval system using LangChain, FAISS, and OpenAI embeddings. This Retrieval-Augmented Generation (RAG) pipeline intelligently searches across Wikipedia, arXiv, and custom websites, optimizing source selection and delivering precise, real-time results based on query relevance.
ai-pipeline document-search faiss information-retrieval knowledge-retrieval langchain langchain-agents langchain-tools machine-learning multi-source-retrieval natural-language-processing openai-embeddings python retrieval-augmented-generation semantic-search
Last synced: 10 May 2026
https://github.com/nishit00/document-qa-rag-system
📄 Transform documents into interactive AI conversations with ease, creating a searchable knowledge base for efficient information retrieval.
ai docker document-search faiss faiss-vector-database groq langchain llama3 llm nlp pdf python question-answering questions-and-answers rag rag-chatbot streamlit vector-store
Last synced: 12 Apr 2026
https://github.com/domwal/acervo-digital-pessoal
PHP website that indexes all PDF content and provides an easy way to find any text
ajax bootstrap css document-search full-text-search html indexing javascript jquery linux-debian mysql pdf pdf-search php php73 windows-10
Last synced: 27 Feb 2026