Projects in Awesome Lists tagged with embedding-vectors
A curated list of projects in awesome lists tagged with embedding-vectors.
https://github.com/towhee-io/towhee
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
computer-vision convolutional-networks embedding-vectors embeddings feature-extraction feature-vector image-processing image-retrieval llm machine-learning milvus pipeline towhee transformer unstructured-data video-processing vision-transformer vit
Last synced: 13 May 2025
https://github.com/Dicklesworthstone/swiss_army_llama
A FastAPI service for semantic text search using precomputed embeddings and advanced similarity measures, with built-in support for various file types through textract.
embedding-similarity embedding-vectors embeddings llama2 llamacpp semantic-search
Last synced: 09 Apr 2025
https://github.com/gurubase/gurubase
Gurubase lets you add an "Ask AI" button to your technical docs, turning your content into a searchable Q&A assistant. It uses web pages, PDFs, YouTube videos, and GitHub repos as sources to generate instant, accurate answers with references. Deploy it via Slack, Discord, or a web widget.
ai ask-ai chatbot discord-bot docker embedding-vectors genai llm nextjs openai rag self-hosted slack-bot
Last synced: 04 Apr 2025
https://github.com/dicklesworthstone/fast_vector_similarity
The Fast Vector Similarity Library is designed to provide efficient computation of various similarity measures between vectors.
embedding-vectors llm similarity vector vectorsearch
Last synced: 16 May 2025
https://github.com/bbc-esq/vectordb-plugin
Plugin that lets you ask questions about your documents including audio and video files.
bark database-management embedding-models embedding-vectors embeddings gtts koboldai koboldcpp python rag retrieval-augmented-generation retrieval-chatbot tiledb vector-data-management vector-database vector-search vision whisper whispers2t whisperspeech
Last synced: 16 May 2025
https://github.com/geeks-of-data/knowledge-gpt
Extract knowledge from all information sources using GPT and other language models. Index the sources and run Q&A sessions over them.
context embedding embedding-vectors gpt gpt3-turbo gpt4 huggingface huggingface-transformers information-extraction language-model llama llm natural-language-processing openai python question-answering scraper sentence-embeddings sentence-similarity vector-search
Last synced: 04 Apr 2025
https://github.com/Gurubase/gurubase
Gurubase is an open-source RAG system that lets you create AI-powered Q&A assistants by indexing websites, PDF documents, YouTube videos, and GitHub code repositories.
ai chatbot docker embedding-vectors genai llm nextjs openai rag self-hosted
Last synced: 02 Mar 2025
https://github.com/nitaiaharoni1/vector-storage
Vector Storage is a vector database that enables semantic similarity searches on text documents in the browser's local storage. It uses OpenAI embeddings to convert documents into vectors and allows searching for similar documents based on cosine similarity.
cosine-similarity embedding-vectors javascript local-storage localstorage lru-cache npm open-source openai semantic-search semantic-similarity typescript vector-database vector-db vector-search vector-similarity vector-similarity-database vector-similarity-search
Last synced: 16 May 2025
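The Vector Storage entry describes ranking stored documents by cosine similarity to a query embedding. A minimal, library-agnostic sketch of that ranking step follows. It is not Vector Storage's API: the function names are hypothetical, and the tiny hand-written vectors stand in for real embeddings, since no embedding model is called.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query: list[float], docs: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional "embeddings"; real models produce hundreds of dimensions.
docs = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.3, 0.1],
    "stocks": [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.2, 0.0], docs))
```

A brute-force scan like this is fine for the small collections that fit in browser local storage; larger stores usually switch to an approximate nearest-neighbor index.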
https://github.com/yusufhilmi/client-vector-search
A client-side vector search library that can embed, store, search, and cache vectors. Works in the browser and in Node.js. The project claims it outperforms OpenAI's text-embedding-ada-002 and is much faster than Pinecone and other vector databases.
embedding-models embedding-vectors embeddings openai search text-embeddings transformers vector vector-search
Last synced: 14 Feb 2026
https://github.com/IngestAI/embedditor
⚡ GUI for editing LLM vector embeddings. No more blind chunking. Upload content in any file format, join and split chunks, edit metadata and embedding tokens, remove stop-words and punctuation with one click, add images, and download the result in .veml format to share it with your team.
datapreprocessing datascience embedding-vectors embeddings genai laravel llm markup-language ml nlp nltk php vector-database vector-search vectorization veml
Last synced: 28 Mar 2025
https://github.com/aws-samples/rag-using-langchain-amazon-bedrock-and-opensearch
RAG with LangChain using Amazon Bedrock and Amazon OpenSearch
aws bedrock embedding-vectors genai generative-ai opensearch rag retrieval-augmented-generation vector
Last synced: 03 Mar 2026
https://github.com/dadmatech/dadmatools
DadmaTools is a Persian NLP toolkit developed by Dadmatech Co.
chunker constituency-parser dataset-loader dependency-parser embedding-vectors embeddings lemmatizer natural-language-processing ner nlptoolkit persian persian-nlp postagger spacy tokenizer
Last synced: 25 Oct 2025
https://github.com/awa-ai/awadb
AI-native database for embedding vectors
ai-native aigc chatgpt embedding-vectors llm vectordb
Last synced: 05 Apr 2025
https://github.com/messkan/rag-chunk
A Python CLI to test, benchmark, and find the best RAG chunking strategy for your Markdown documents.
chunking document-chunking embedding-vectors ia langchain llm nlp python rag rag-pipeline retrieval-augmented-generation text-splitting vector-search
Last synced: 05 Mar 2026
https://github.com/ikergarcia1996/metavec
A monolingual and cross-lingual meta-embedding generation and evaluation framework
embedding embedding-evaluation embedding-models embedding-vectors embeddings emnlp2021 fasttext fasttext-embeddings meta-embedding meta-embeddings word2vec
Last synced: 17 Sep 2025
https://github.com/pentoai/vectory
Vectory provides a collection of tools to track and compare embedding versions.
deep-learning deep-neural-networks embedding-python embedding-vectors embeddings-similarity evaluation-framework
Last synced: 18 Feb 2026
https://github.com/shahules786/twitter-sentiment
Sentiment analyzer for your tweets.
embedding-vectors natural-language-processing pytorch sentiment-analyzer twitter-domain
Last synced: 02 Jul 2025
https://github.com/shamspias/langchain-chat
langchain-chat is an AI-driven Q&A system that leverages OpenAI's GPT-4 model and FAISS for efficient document indexing. It loads and splits documents from websites or PDFs, remembers conversations, and provides accurate, context-aware answers based on the indexed data. Easy to set up and extend.
chatbot chatbots embedding-model embedding-models embedding-python embedding-similarity embedding-vectors faiss faiss-backend gpt-3 gpt-35-turbo gpt-4 gpt-j langchain langchain-python pinecone vector-database
Last synced: 30 Jul 2025
https://github.com/patterns-ai-core/weaviate-ruby
Ruby wrapper for the Weaviate vector search database API
api-client embedding-vectors machine-learning ml ruby rubyml vector-search vector-search-engine weaviate
Last synced: 01 May 2025
https://github.com/fredsiika/huxley-pdf
Upload personal documents and chat with your PDF files using this GPT-4-powered app. Built with LangChain and the Pinecone vector database, and deployed on Streamlit.
chatgpt chatpdf embedding embedding-vectors langchain langchain-python openai pinecone vector-database
Last synced: 15 Apr 2025
https://github.com/get-convex/dryad
Dryad talks to your tree! Easy semantic code search on any repository.
Last synced: 08 Oct 2025
https://github.com/patterns-ai-core/qdrant-ruby
Ruby wrapper for the Qdrant vector search database API
api-client embedding-vectors machine-learning ml qdrant ruby rubyml vector-search vector-search-engine
Last synced: 01 May 2025
https://github.com/taherfattahi/recommendation-systems-by-llms
Enhancing Recommendation Systems with Large Language Models (RAG - LangChain - OpenAI)
embedding-vectors llm openai recommendation-system semantic semanticsearch
Last synced: 26 Jan 2026
https://github.com/patterns-ai-core/milvus
Ruby wrapper for the Milvus vector search database API
api-client embedding-vectors machine-learning milvus ml ruby rubyml search-engine vector-search
Last synced: 01 May 2025
https://github.com/kozistr/triton-grpc-proxy-rs
Proxy server, written in Rust, for a Triton gRPC server that runs inference on an embedding model
docker docker-compose embedding-vectors embeddings grpc ntex onnx onnxruntime proxy pytorch restful-api rust triton-client triton-server
Last synced: 20 Jul 2025
https://github.com/samadpls/bestrag
BestRAG: A library for hybrid RAG, combining dense, sparse, and late interaction methods for efficient document storage and search.
best-rag bm25 embedding-vectors hybrid-rag llm opensource pypi-package qdrant rag retrival-augmented-generation
Last synced: 27 Oct 2025
https://github.com/dcarpintero/wikisearch
Multilingual Semantic Search with Reranking on a prepared large vectorized dataset comprising 10 million Wikipedia documents. It supports dense retrieval, keyword search, and hybrid search.
cohere cohere-rerank dense-retrieval embedding-vectors generative-ai large-language-models python reranking semantic-search streamlit weaviate wikipedia
Last synced: 06 Oct 2025
https://github.com/hkproj/retrieval-augmented-generation-notes
Slides for "Retrieval Augmented Generation" video
embedding-vectors hnsw language-model retrieval-augmented-generation sentence-bert study-notes vector-db
Last synced: 06 May 2025
https://github.com/easonlai/chat_with_pdf_table
The contents of this repository showcase how to extract table data from a PDF file and preprocess it to facilitate word embedding. This preprocessing step enhances the readability of table data for language models and enables us to extract more contextual information from the tables.
azure-openai chroma chromadb embedding-models embedding-vectors embeddings langchain langchain-python pdf pdf-document-processor pdf-parser pdf-parsing python word-embeddings
Last synced: 25 Jun 2025
https://github.com/rajadilipkolli/ai-playground
AI implementations in Java using the LangChain4j and Spring AI frameworks
embedding-vectors lamma langchain4j pgvector spring-ai spring-ai-ollama spring-ai-openai spring-boot-3 testcontainers
Last synced: 22 Mar 2025
https://github.com/torys877/vectrain
Vectrain is a high-performance, modular Go service that ingests data, generates vector embeddings, and stores them in vector databases for semantic search, recommendations, and analytics.
ai embedding-vectors embeddings go golang kafka knowledge-management llm qdrant rag semantic-search text-embeddings vector-database vector-embeddings
Last synced: 10 Oct 2025
https://github.com/France-Travail/embcompare
A simple python tool for embedding comparison
comparison-tool embedding-python embedding-vectors embeddings embeddings-similarity embeddings-word2vec pypi-package python-package python3 streamlit-dashboard
Last synced: 19 Aug 2025
https://github.com/dcarpintero/llamaindexchat
LLM chatbot with retrieval-augmented generation using LlamaIndex
embedding-vectors large-language-models llamaindex python retrieval-augmented-generation streamlit
Last synced: 08 Oct 2025
https://github.com/dcarpintero/athena
Scientific Research Assistant built with LLMs, Retrieval Augmented Generation, and Semantic Search.
cohere cohere-ai embedding-vectors langchain large-language-models prompt-engineering python retrieval-augmented-generation semantic-search streamlit weaviate
Last synced: 13 Apr 2025
https://github.com/taherfattahi/embedding-optimizer
Two approaches to generating optimized embeddings in the Retrieval-Augmented Generation (RAG) Pattern
embedding-vectors faiss langchain openai rag vectordatabase
Last synced: 31 Jul 2025
https://github.com/crclark/graph-anns
Efficient approximate nearest neighbor search data structure
approximate-nearest-neighbor-search embedding-vectors
Last synced: 12 Aug 2025
https://github.com/brightxiaohan/facerecognizer
Deep face recognition.
embedding-vectors face-recognition insightface reba
Last synced: 07 Sep 2025
https://github.com/anthonypdawson/vector-inspector
The missing developer tool for working with vector databases. A comprehensive desktop app for visualizing, querying, and managing vector data.
chromadb clustering data-visualization developer-tools embedding-vectors embeddings hdbscan hnsw lancedb pgvector pinecone pyside6 qdrant rag retrieval-augmented-generation semantic-search vector-database vector-db vector-inspector vector-search
Last synced: 07 Apr 2026
https://github.com/better-with-models/tinyquant
TinyQuant is a CPU-only vector quantization codec that compresses high-dimensional embedding vectors to low-bit representations while preserving cosine similarity rankings.
embedding-models embedding-vectors embeddings embeddings-similarity pgvector
Last synced: 28 Apr 2026
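The TinyQuant entry describes compressing embedding vectors to low-bit codes while preserving cosine similarity rankings. The sketch below illustrates that idea with plain symmetric uniform scalar quantization (one float scale per vector, 4-bit signed integer codes). This is a generic technique chosen for illustration, not TinyQuant's codec; the function names and toy vectors are hypothetical.

```python
import math


def quantize(vec: list[float], bits: int = 4) -> tuple[list[int], float]:
    """Symmetric uniform scalar quantization: map each float to a signed integer
    in [-(2**(bits-1) - 1), 2**(bits-1) - 1] using one per-vector scale."""
    levels = 2 ** (bits - 1) - 1  # 7 for 4-bit codes
    max_abs = max(abs(x) for x in vec) or 1.0  # avoid dividing by zero for an all-zero vector
    scale = max_abs / levels
    codes = [round(x / scale) for x in vec]
    return codes, scale


def dequantize(codes: list[int], scale: float) -> list[float]:
    """Approximate reconstruction of the original floats."""
    return [c * scale for c in codes]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def rank(query: list[float], docs: dict[str, list[float]]) -> list[str]:
    """Document ids sorted by descending cosine similarity to the query."""
    return sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)


# Hypothetical 4-dimensional embeddings.
query = [0.5, -0.2, 0.8, 0.1]
docs = {
    "a": [0.6, -0.1, 0.7, 0.0],
    "b": [0.1, 0.9, 0.3, -0.3],
    "c": [-0.7, 0.2, -0.5, 0.4],
}

# Quantize, then reconstruct, both the query and the stored vectors.
q_hat = dequantize(*quantize(query))
docs_hat = {d: dequantize(*quantize(v)) for d, v in docs.items()}

print(rank(query, docs), rank(q_hat, docs_hat))
```

Because cosine similarity ignores vector length, the per-vector scale does not affect scores; only the rounding error in the integer codes does. On these toy vectors the ranking survives 4-bit quantization, but that is not a guarantee for arbitrary data.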
https://github.com/dcarpintero/github-semantic-search
Semantic Search on Langchain Github Issues with Weaviate
bm25 embedding-vectors hybrid-search langchain large-language-models python semantic-search streamlit weaviate
Last synced: 13 Apr 2025
https://github.com/ubos-tech/node-red-contrib-chromadb
Node-RED nodes for Chroma, the open-source embedding database
chromadb database embedding embedding-vectors llms node-red node-red-contrib node-red-flow ubos-tech vector vectordb
Last synced: 08 Mar 2026
https://github.com/geekmaxi/developer_assistant
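Developer Assistant: an open-source technical Q&A assistant for developers! Built on advanced large language model technology, it answers tough programming questions, and its knowledge base covers mainstream languages such as Python and Java, helping you develop efficiently!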
assistant assistant-app assistant-chat-bots deepseek developer-tools embedding-vectors embeddings knowledge-base langchain llama3 llm openai
Last synced: 25 Feb 2026
https://github.com/sdsc-innovation/itembed
Python library to train shallow embeddings on unordered sequences
embedding-vectors python python-library word2vec
Last synced: 17 Jun 2025
https://github.com/olasunkanmi-se/intellisearch
IntelliSearch is an advanced retrieval-based question-answering and recommendation system that leverages embeddings and a large language model (LLM) to provide accurate and relevant information to users.
ai embedding-models embedding-vectors gemini-api gemini-pro generative-ai google-generative-ai large-language-models llm machine-learning non-supervised-learning pgvector rag retrival-augmented similarity-searches typescript vector-databases vector-search
Last synced: 07 Mar 2026
https://github.com/egermano/poc-rag-ollama
Playing with Generative AI
chromadb embedding-vectors llama2 ollama rag
Last synced: 22 Jan 2026
https://github.com/nssharmaofficial/review-sentiment-classifier
Review classification in PyTorch using an LSTM
classification embedding-layer embedding-vectors imdb-dataset lstm lstm-layer nlp pytorch
Last synced: 24 Jan 2026
https://github.com/labrijisaad/llm-rag
A Streamlit app leveraging a RAG LLM with FAISS to offer answers from uploaded files.
chatgpt cicd embedding-vectors faiss llm machine-learning mlops openai rag rag-llm streamlit streamlit-webapp
Last synced: 26 Feb 2026
https://github.com/chetanxpro/chat-with-pdf
A web app that lets users chat with their PDFs: run a few scripts to ingest a PDF, then ask questions about it through the web interface.
embedding-vectors vector-similarity
Last synced: 08 Jun 2026
https://github.com/crate/langchain-cratedb
CrateDB provider for LangChain.
cratedb cratedb-client cratedb-connector cratedb-driver cratedb-sdk embedding-vectors embeddings langchain langchain-openai langchain-python llm openai vdbms vector-database vector-store
Last synced: 28 Apr 2025
https://github.com/amirlayegh/airbnb-semantic-search
A semantic search system for Airbnb listings in Stockholm, built with Superlinked and Qdrant. It leverages multi-attribute vector search and Retrieval-Augmented Generation (RAG) to enhance search accuracy, embedding different data types (e.g., price, description) with specialized models. Powered by FastAPI and Streamlit.
embedding-vectors rag recommender-system semantic-search
Last synced: 29 Apr 2026
https://github.com/frankykyaw/deepmelodylstm
An LSTM-based music generation model trained on MIDI data. The model takes in a sequence of fixed length and learns to predict the next note.
deep-learning embedding-vectors lstm music-generation music-theory neural-network tensorflow
Last synced: 16 May 2026
https://github.com/Gabriellgpc/computer-vision-dataset-maker
The Power of Florence-2 with OpenVINO & FiftyOne: Real-World Applications in Image Analysis
computer-vision deep-learning embedding-vectors fiftyone florence-2 image image-captioning image-recognition openvino
Last synced: 11 Apr 2025
https://github.com/seonglae/tei
Unofficial Python wrapper library for Text Embeddings Inference (TEI), supporting batch processing with asyncio
aiohttp asyncio embedding embedding-vectors embeddings tei text-embeddings text-embeddings-inference
Last synced: 30 Jun 2025
https://github.com/blacknahil/semantic_search
A semantic search system for Wikipedia articles using Weaviate and Cohere. It indexes articles with custom embeddings and provides a query interface to retrieve the most relevant matches. The system demonstrates the power of vector-based search for natural language queries.
cohere embedding-vectors semantic-search-algorithm weaviate
Last synced: 18 May 2026
https://github.com/sodalabsio/knowledge-mapper
Project for developing knowledge maps via deep text embeddings.
embedding-vectors high-dimensional-visualization natural-language-processing
Last synced: 23 Jul 2025
https://github.com/tegridydev/face-based-attention-circuits
Face-Based Attention Circuits (FBAC): A Theoretical Framework for Context-Aware Embeddings
attention-mechanism embedding-vectors interpretability m-f-d-m machine-learning neural-networks python research-project
Last synced: 18 Feb 2026
https://github.com/ziozzang/embedding-server
Test embedding server with an OpenAI-compatible API, using LLaMA/Mistral models
embedding-models embedding-vectors flask openai-api
Last synced: 06 May 2026
https://github.com/mrseanryan/entity-classifier
Classify entities into clusters via embedding vectors, using a given list of category names
classification classification-algorithm embedding-vectors machine-learning
Last synced: 08 Nov 2025
https://github.com/mumtaz4118/transfer-learning-on-covid-data
This deep learning model (a CNN) uses transfer learning, via feature extraction and fine-tuning, to perform multiclass classification of COVID-19, pneumonia, and healthy images.
data-science deep-learning embedding-vectors feature-engineering feature-extraction machine-learning pca-analysis research-project transfer-learning
Last synced: 10 Oct 2025
https://github.com/clark-labs-inc/clark-hash
Clark Hash: 32x smaller searchable sketches for embeddings
embedding-vectors embeddings embeddings-similarity lsh sketching-algorithm
Last synced: 01 Jun 2026
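The Clark Hash entry mentions searchable sketches that are 32x smaller than the embeddings they summarize. One generic way to reach that ratio is random-hyperplane LSH (SimHash): keep one sign bit per random projection and use as many bits as the embedding has float32 dimensions, so each 32-bit float is replaced by a single bit. The sketch below shows that technique only; it is an assumption for illustration, not Clark Hash's actual algorithm, and all names are hypothetical.

```python
import math
import random


def make_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> list[list[float]]:
    """Draw n_bits random Gaussian hyperplane normals of the given dimension."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def sketch(vec: list[float], planes: list[list[float]]) -> int:
    """Pack one bit per hyperplane into an int: 1 if vec is on its positive side."""
    bits = 0
    for i, plane in enumerate(planes):
        if sum(x * w for x, w in zip(vec, plane)) >= 0:
            bits |= 1 << i
    return bits


def estimated_angle(s1: int, s2: int, n_bits: int) -> float:
    """For random hyperplanes, P(bit differs) = angle / pi, so the Hamming
    distance between sketches estimates the angle between the vectors."""
    hamming = bin(s1 ^ s2).count("1")
    return math.pi * hamming / n_bits


dim = 64
# 64 float32 values (2048 bits) become a 64-bit sketch, i.e. 32x smaller.
planes = make_hyperplanes(dim, n_bits=dim)

rng = random.Random(1)
base = [rng.gauss(0.0, 1.0) for _ in range(dim)]
near = [x + 0.1 * rng.gauss(0.0, 1.0) for x in base]  # small perturbation of base
far = [rng.gauss(0.0, 1.0) for _ in range(dim)]       # unrelated random vector

s_base, s_near, s_far = (sketch(v, planes) for v in (base, near, far))
print(estimated_angle(s_base, s_near, dim), estimated_angle(s_base, s_far, dim))
```

Comparing sketches needs only an XOR and a popcount, which is what makes bit sketches cheap to search; the trade-off is that the angle estimate is noisy when few bits are used.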
https://github.com/dappros/site_crawler
Site crawler used in the Ethora platform as an option to import your specific business data into your AI agent chatbot.
crawler data-ingestion embedding-vectors embeddings ethora llm rag retrieval-augmented-generation retrieval-based-chatbots retrieval-chatbot semantic-search site-crawler vectorstore web-scraping website-indexing
Last synced: 20 Jan 2026
https://github.com/0xibra/linux-tower-gpt-embeddings-experiment
This project is a work-in-progress and serves as an experiment for context injection with GPT and code embeddings. The goal is to use GPT to develop the remaining features of the project.
code-embedding code-generation embedding-vectors embeddings gpt-3 openai-api
Last synced: 23 Feb 2025
https://github.com/venkat-a/text_processing_rnn_lstm
Text Processing RNN leverages RNN and LSTM models for advanced text processing. It features deep learning techniques for NLP tasks, utilizing GloVe for word embeddings, aimed at both educational and practical applications.
categorical-encoding deep-neural-networks embedding-vectors lstm-neural-networks model-evaluation-and-tuning nltk-tokenizer numpy-arrays pandas-dataframe pytorch-nlp rnn-encoder-decoder tokenization vocabulary-builder
Last synced: 25 Feb 2025
https://github.com/pngo1997/retrieval-augmented-retrieval-rag-for-cleantech-media
Implements a Retrieval-Augmented Generation (RAG) system.
chunking embedding-models embedding-vectors llm natural-language-processing prompt-engineering qa-generation rag-systems ragas-evaluation retrieval-augmented-generation splitter text-generation
Last synced: 18 Jun 2025
https://github.com/hummusonrails/couchbase-azure-blog-vector-search-cli
CLI tool for scraping dynamic iframe-based blog content, generating vector embeddings with Azure OpenAI, and enabling semantic search with Couchbase.
azure azure-openai cli couchbase embedding-vectors python selenium vector-search
Last synced: 07 May 2026
https://github.com/gopikrsmscs/isebetter
A semantic search transformer model fine-tuned on the PyTorch GitHub issues dataset, hosted on Hugging Face, and integrated with Streamlit for easy use.
embedding-vectors huggingface-transformers nlp-machine-learning strea training tran
Last synced: 11 Jul 2025
https://github.com/webobite/fact-chatbot
A fact chatbot that reads a .txt file of facts ahead of time and answers users' questions based on the facts provided in that file.
chatbot chatgpt chatgpt3 data data-visualization embedding-vectors generativeai nlp
Last synced: 04 May 2026
https://github.com/hi-tech-ai/help-scout-assistant-using-pinecone-vector-database
Help Scout Assistant is a document processing and query-response system that leverages Pinecone for vector storage and retrieval. The tool allows you to load PDF documents into a vector store, where they can be queried using OpenAI's language models.
ai-assistant embedding-vectors help-scout pinecone rag-chatbot vector-database
Last synced: 29 Sep 2025