An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with embedding-vectors

A curated list of projects in awesome lists tagged with embedding-vectors.

https://github.com/Dicklesworthstone/swiss_army_llama

A FastAPI service for semantic text search using precomputed embeddings and advanced similarity measures, with built-in support for various file types through textract.

embedding-similarity embedding-vectors embeddings llama2 llamacpp semantic-search

Last synced: 09 Apr 2025

https://github.com/dicklesworthstone/swiss_army_llama

A FastAPI service for semantic text search using precomputed embeddings and advanced similarity measures, with built-in support for various file types through textract.

embedding-similarity embedding-vectors embeddings llama2 llamacpp semantic-search

Last synced: 15 May 2025

https://github.com/gurubase/gurubase

Gurubase lets you add an "Ask AI" button to your technical docs, turning your content into a searchable Q&A assistant. It uses web pages, PDFs, YouTube videos, and GitHub repos as sources to generate instant, accurate answers with references. Deploy it via Slack, Discord, or a web widget.

ai ask-ai chatbot discord-bot docker embedding-vectors genai llm nextjs openai rag self-hosted slack-bot

Last synced: 04 Apr 2025

https://github.com/dicklesworthstone/fast_vector_similarity

The Fast Vector Similarity Library is designed to provide efficient computation of various similarity measures between vectors.

embedding-vectors llm similarity vector vectorsearch

Last synced: 16 May 2025

https://github.com/Dicklesworthstone/fast_vector_similarity

The Fast Vector Similarity Library is designed to provide efficient computation of various similarity measures between vectors.

embedding-vectors llm similarity vector vectorsearch

Last synced: 18 Apr 2025

https://github.com/Gurubase/gurubase

Gurubase is an open-source RAG system that lets you create AI-powered Q&A assistants by indexing websites, PDF documents, YouTube videos, and GitHub code repositories.

ai chatbot docker embedding-vectors genai llm nextjs openai rag self-hosted

Last synced: 02 Mar 2025

https://github.com/nitaiaharoni1/vector-storage

Vector Storage is a vector database that enables semantic similarity searches on text documents in the browser's local storage. It uses OpenAI embeddings to convert documents into vectors and allows searching for similar documents based on cosine similarity.

cosine-similarity embedding-vectors javascript local-storage localstorage lru-cache npm open-source openai semantic-search semantic-similarity typescript vector-database vector-db vector-search vector-similarity vector-similarity-database vector-similarity-search

Last synced: 16 May 2025

https://github.com/yusufhilmi/client-vector-search

A client side vector search library that can embed, store, search, and cache vectors. Works on the browser and node. It outperforms OpenAI's text-embedding-ada-002 and is way faster than Pinecone and other VectorDBs.

embedding-models embedding-vectors embeddings openai search text-embeddings transformers vector vector-search

Last synced: 14 Feb 2026

https://github.com/IngestAI/embedditor

⚡ GUI for editing LLM vector embeddings. No more blind chunking. Upload content in any file extension, join and split chunks, edit metadata and embedding tokens + remove stop-words and punctuation with one click, add images, and download in .veml to share it with your team.

datapreprocessing datascience embedding-vectors embeddings genai laravel llm markup-language ml nlp nltk php vector-database vector-search vectorization veml

Last synced: 28 Mar 2025

https://github.com/awa-ai/awadb

AI Native database for embedding vectors

ai-native aigc chatgpt embedding-vectors llm vectordb

Last synced: 05 Apr 2025

https://github.com/messkan/rag-chunk

A Python CLI to test, benchmark, and find the best RAG chunking strategy for your Markdown documents.

chunking document-chunking embedding-vectors ia langchain llm nlp python rag rag-pipeline retrieval-augmented-generation text-splitting vector-search

Last synced: 05 Mar 2026

https://github.com/pentoai/vectory

Vectory provides a collection of tools to track and compare embedding versions.

deep-learning deep-neural-networks embedding-python embedding-vectors embeddings-similarity evaluation-framework

Last synced: 18 Feb 2026

https://github.com/shamspias/langchain-chat

langchain-chat is an AI-driven Q&A system that leverages OpenAI's GPT-4 model and FAISS for efficient document indexing. It loads and splits documents from websites or PDFs, remembers conversations, and provides accurate, context-aware answers based on the indexed data. Easy to set up and extend.

chatbot chatbots embedding-model embedding-models embedding-python embedding-similarity embedding-vectors faiss faiss-backend gpt-3 gpt-35-turbo gpt-4 gpt-j langchain langchain-python pinecone vector-database

Last synced: 30 Jul 2025

https://github.com/fredsiika/huxley-pdf

Upload personal docs and chat with your PDF files with this GPT-4-powered app. Built with LangChain and the Pinecone vector database, and deployed on Streamlit.

chatgpt chatpdf embedding embedding-vectors langchain langchain-python openai pinecone vector-database

Last synced: 15 Apr 2025

https://github.com/get-convex/dryad

Dryad talks to your tree! Easy semantic code search on any repository.

ai embedding-vectors gpt-4

Last synced: 08 Oct 2025

https://github.com/taherfattahi/recommendation-systems-by-llms

Enhancing Recommendation Systems with Large Language Models (RAG - LangChain - OpenAI)

embedding-vectors llm openai recommendation-system semantic semanticsearch

Last synced: 26 Jan 2026

https://github.com/patterns-ai-core/milvus

Ruby wrapper for the Milvus vector search database API

api-client embedding-vectors machine-learning milvus ml ruby rubyml search-engine vector-search

Last synced: 01 May 2025

https://github.com/kozistr/triton-grpc-proxy-rs

A proxy server, written in Rust, for a Triton gRPC server that runs inference on an embedding model.

docker docker-compose embedding-vectors embeddings grpc ntex onnx onnxruntime proxy pytorch restful-api rust triton-client triton-server

Last synced: 20 Jul 2025

https://github.com/samadpls/bestrag

BestRAG: A library for hybrid RAG, combining dense, sparse, and late interaction methods for efficient document storage and search.

best-rag bm25 embedding-vectors hybrid-rag llm opensource pypi-package qdrant rag retrival-augmented-generation

Last synced: 27 Oct 2025

https://github.com/dcarpintero/wikisearch

Multilingual Semantic Search with Reranking on a prepared large vectorized dataset comprising 10 million Wikipedia documents. It supports dense retrieval, keyword search, and hybrid search.

cohere cohere-rerank dense-retrieval embedding-vectors generative-ai large-language-models python reranking semantic-search streamlit weaviate wikipedia

Last synced: 06 Oct 2025

https://github.com/easonlai/chat_with_pdf_table

The contents of this repository showcase how to extract table data from a PDF file and preprocess it to facilitate word embedding. This preprocessing step enhances the readability of table data for language models and enables us to extract more contextual information from the tables.

azure-openai chroma chromadb embedding-models embedding-vectors embeddings langchain langchain-python pdf pdf-document-processor pdf-parser pdf-parsing python word-embeddings

Last synced: 25 Jun 2025

https://github.com/rajadilipkolli/ai-playground

AI implementations in Java using the langchain4j and Spring AI frameworks.

embedding-vectors lamma langchain4j pgvector spring-ai spring-ai-ollama spring-ai-openai spring-boot-3 testcontainers

Last synced: 22 Mar 2025

https://github.com/torys877/vectrain

Vectrain is a high-performance, modular Go service that ingests data, generates vector embeddings, and stores them in vector databases for semantic search, recommendations, and analytics.

ai embedding-vectors embeddings go golang kafka knowledge-management llm qdrant rag semantic-search text-embeddings vector-database vector-embeddings

Last synced: 10 Oct 2025

https://github.com/dcarpintero/llamaindexchat

LLM chatbot with Retrieval Augmented Generation using LlamaIndex.

embedding-vectors large-language-models llamaindex python retrieval-augmented-generation streamlit

Last synced: 08 Oct 2025

https://github.com/dcarpintero/athena

Scientific Research Assistant built with LLMs, Retrieval Augmented Generation, and Semantic Search.

cohere cohere-ai embedding-vectors langchain large-language-models prompt-engineering python retrieval-augmented-generation semantic-search streamlit weaviate

Last synced: 13 Apr 2025

https://github.com/taherfattahi/embedding-optimizer

Two approaches to generating optimized embeddings in the Retrieval-Augmented Generation (RAG) Pattern

embedding-vectors faiss langchain openai rag vectordatabase

Last synced: 31 Jul 2025

https://github.com/crclark/graph-anns

Efficient approximate nearest neighbor search data structure

approximate-nearest-neighbor-search embedding-vectors

Last synced: 12 Aug 2025

https://github.com/anthonypdawson/vector-inspector

The missing developer tool for working with vector databases. A comprehensive desktop app for visualizing, querying, and managing vector data.

chromadb clustering data-visualization developer-tools embedding-vectors embeddings hdbscan hnsw lancedb pgvector pinecone pyside6 qdrant rag retrieval-augmented-generation semantic-search vector-database vector-db vector-inspector vector-search

Last synced: 07 Apr 2026

https://github.com/better-with-models/tinyquant

TinyQuant is a CPU-only vector quantization codec that compresses high-dimensional embedding vectors to low-bit representations while preserving cosine similarity rankings.

embedding-models embedding-vectors embeddings embeddings-similarity pgvector

Last synced: 28 Apr 2026

https://github.com/geekmaxi/developer_assistant

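Developer Assistant, an open-source technical Q&A assistant for developers! Built on advanced large language model technology, it answers programming questions, with a knowledge base covering mainstream development languages such as Python and Java, to help you develop efficiently.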

assistant assistant-app assistant-chat-bots deepseek developer-tools embedding-vectors embeddings knowledge-base langchain llama3 llm openai

Last synced: 25 Feb 2026

https://github.com/sdsc-innovation/itembed

Python library to train shallow embeddings on unordered sequences

embedding-vectors python python-library word2vec

Last synced: 17 Jun 2025

https://github.com/olasunkanmi-se/intellisearch

IntelliSearch is an advanced retrieval-based question-answering and recommendation system that leverages embeddings and a large language model (LLM) to provide accurate and relevant information to users.

ai embedding-models embedding-vectors gemini-api gemini-pro generative-ai google-generative-ai large-language-models llm machine-learning non-supervised-learning pgvector rag retrival-augmented similarity-searches typescript vector-databases vector-search

Last synced: 07 Mar 2026

https://github.com/egermano/poc-rag-ollama

Playing with Generative AI

chromadb embedding-vectors llama2 ollama rag

Last synced: 22 Jan 2026

https://github.com/labrijisaad/llm-rag

A Streamlit app leveraging a RAG LLM with FAISS to offer answers from uploaded files.

chatgpt cicd embedding-vectors faiss llm machine-learning mlops openai rag rag-llm streamlit streamlit-webapp

Last synced: 26 Feb 2026

https://github.com/chetanxpro/chat-with-pdf

A web app that lets users chat with their PDFs: run a few scripts to ingest a PDF, then talk to it through the web interface.

embedding-vectors vector-similarity

Last synced: 08 Jun 2026

https://github.com/amirlayegh/airbnb-semantic-search

A semantic search system for Airbnb listings in Stockholm, built with Superlinked and Qdrant. It leverages multi-attribute vector search and Retrieval-Augmented Generation (RAG) to enhance search accuracy, embedding different data types (e.g., price, description) with specialized models. Powered by FastAPI and Streamlit.

embedding-vectors rag recommender-system semantic-search

Last synced: 29 Apr 2026

https://github.com/frankykyaw/deepmelodylstm

An LSTM-based music generation model trained on MIDI data. The model takes in a sequence of a certain length and learns to predict the next note.

deep-learning embedding-vectors lstm music-generation music-theory neural-network tensorflow

Last synced: 16 May 2026

https://github.com/Gabriellgpc/computer-vision-dataset-maker

The Power of Florence-2 with OpenVINO & FiftyOne: Real-World Applications in Image Analysis

computer-vision deep-learning embedding-vectors fiftyone florence-2 image image-captioning image-recognition openvino

Last synced: 11 Apr 2025

https://github.com/seonglae/tei

Text Embeddings Inference (TEI)'s unofficial python wrapper library for batch processing with asyncio

aiohttp asyncio embedding embedding-vectors embeddings tei text-embeddings text-embeddings-inference

Last synced: 30 Jun 2025

https://github.com/blacknahil/semantic_search

A semantic search system for Wikipedia articles using Weaviate and Cohere. It indexes articles with custom embeddings and provides a query interface to retrieve the most relevant matches. The system demonstrates the power of vector-based search for natural language queries.

cohere embedding-vectors semantic-search-algorithm weaviate

Last synced: 18 May 2026

https://github.com/sodalabsio/knowledge-mapper

Project for developing knowledge maps via deep text embeddings.

embedding-vectors high-dimensional-visualization natural-language-processing

Last synced: 23 Jul 2025

https://github.com/tegridydev/face-based-attention-circuits

Face-Based Attention Circuits (FBAC): A Theoretical Framework for Context-Aware Embeddings

attention-mechanism embedding-vectors interpretability m-f-d-m machine-learning neural-networks python research-project

Last synced: 18 Feb 2026

https://github.com/ziozzang/embedding-server

A test embedding server compatible with the OpenAI API, using models from LLaMA/Mistral.

embedding-models embedding-vectors flask openai-api

Last synced: 06 May 2026

https://github.com/mrseanryan/entity-classifier

Classify entities into clusters via embedding vectors, using a given list of category names

classification classification-algorithm embedding-vectors machine-learning

Last synced: 08 Nov 2025

https://github.com/mumtaz4118/transfer-learning-on-covid-data

This deep learning model (a CNN) uses transfer learning, via feature extraction and fine-tuning, to perform multiclass classification of images as COVID-19, pneumonia, or healthy.

data-science deep-learning embedding-vectors feature-engineering feature-extraction machine-learning pca-analysis research-project transfer-learning

Last synced: 10 Oct 2025

https://github.com/clark-labs-inc/clark-hash

Clark Hash, 32x smaller searchable sketches for embeddings

embedding-vectors embeddings embeddings-similarity lsh sketching-algorithm

Last synced: 01 Jun 2026

https://github.com/dappros/site_crawler

Site crawler used in Ethora platform as an option to import your specific business data into your AI agent chat bot.

crawler data-ingestion embedding-vectors embeddings ethora llm rag retrieval-augmented-generation retrieval-based-chatbots retrieval-chatbot semantic-search site-crawler vectorstore web-scraping website-indexing

Last synced: 20 Jan 2026

https://github.com/0xibra/linux-tower-gpt-embeddings-experiment

This project is a work-in-progress and serves as an experiment for context injection with GPT and code embeddings. The goal is to use GPT to develop the remaining features of the project.

code-embedding code-generation embedding-vectors embeddings gpt-3 openai-api

Last synced: 23 Feb 2025

https://github.com/venkat-a/text_processing_rnn_lstm

Text Processing RNN leverages RNN and LSTM models for advanced text processing. It features deep learning techniques for NLP tasks, utilizing GloVe for word embeddings, aimed at both educational and practical applications.

categorical-encoding deep-neural-networks embedding-vectors lstm-neural-networks model-evaluation-and-tuning nltk-tokenizer numpy-arrays pandas-dataframe pytorch-nlp rnn-encoder-decoder tokenization vocabulary-builder

Last synced: 25 Feb 2025

https://github.com/hummusonrails/couchbase-azure-blog-vector-search-cli

CLI tool for scraping dynamic iframe-based blog content, generating vector embeddings with Azure OpenAI, and enabling semantic search with Couchbase.

azure azure-openai cli couchbase embedding-vectors python selenium vector-search

Last synced: 07 May 2026

https://github.com/gopikrsmscs/isebetter

A semantic search transformer model fine-tuned on the PyTorch GitHub issues dataset, hosted on Hugging Face, and integrated with Streamlit for easy use.

embedding-vectors huggingface-transformers nlp-machine-learning strea training tran

Last synced: 11 Jul 2025

https://github.com/webobite/fact-chatbot

Fact Chatbot is a project that reads a text file of facts ahead of time and answers users' questions with relevant information drawn from those facts.

chatbot chatgpt chatgpt3 data data-visualization embedding-vectors generativeai nlp

Last synced: 04 May 2026

https://github.com/hi-tech-ai/help-scout-assistant-using-pinecone-vector-database

Help Scout Assistant is a document processing and query-response system that leverages Pinecone for vector storage and retrieval. The tool allows you to load PDF documents into a vector store, where they can be queried using OpenAI's language models.

ai-assistant embedding-vectors help-scout pinecone rag-chatbot vector-database

Last synced: 29 Sep 2025