An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with embedding-models

A curated list of projects in awesome lists tagged with embedding-models.

https://github.com/volcengine/MineContext

MineContext is your proactive, context-aware AI partner (Context-Engineering + ChatGPT Pulse)

agent context-engineering electron embedding-models memory proactive-ai python python3 rag react vector-database vision-language-model

Last synced: 21 Oct 2025

https://github.com/Sujit-O/pykg2vec

Python library for knowledge graph embedding and representation learning.

embedding-models embedding-python knowledge-graph representation-learning

Last synced: 29 Apr 2025

https://github.com/marl/openl3

OpenL3: Open-source deep audio and image embeddings

audio deep-learning embedding embedding-models image image-embeddings machine-listening

Last synced: 16 May 2025

https://github.com/yusufhilmi/client-vector-search

A client-side vector search library that can embed, store, search, and cache vectors. Works in the browser and in Node. It outperforms OpenAI's text-embedding-ada-002 and is way faster than Pinecone and other VectorDBs.

embedding-models embedding-vectors embeddings openai search text-embeddings transformers vector vector-search

Last synced: 14 Feb 2026

https://github.com/spcl/ncc

Neural Code Comprehension: A Learnable Representation of Code Semantics

code-analysis embedding-based embedding-models embeddings llvm-ir machine-learning neural-networks

Last synced: 12 Apr 2025

https://github.com/akutuzov/webvectors

Web-ify your word2vec: framework to serve distributional semantic models online

distributional-semantics embedding-models flask gensim web-app word2vec

Last synced: 18 Jan 2026

https://github.com/shobrook/weightgain

Train an adapter for any embedding model in under a minute

adapter embedding-models embeddings fine-tuning lora openai

Last synced: 22 Aug 2025

https://github.com/shamspias/langchain-chat

langchain-chat is an AI-driven Q&A system that leverages OpenAI's GPT-4 model and FAISS for efficient document indexing. It loads and splits documents from websites or PDFs, remembers conversations, and provides accurate, context-aware answers based on the indexed data. Easy to set up and extend.

chatbot chatbots embedding-model embedding-models embedding-python embedding-similarity embedding-vectors faiss faiss-backend gpt-3 gpt-35-turbo gpt-4 gpt-j langchain langchain-python pinecone vector-database

Last synced: 30 Jul 2025

https://github.com/alisonbma/aisfx

Representation Learning for the Automatic Indexing of Sound Effects Libraries (ISMIR 2022): Deep audio embeddings pre-trained on UCS & Non-UCS-compliant datasets.

deep-learning embedding-models machine-learning music-information-retrieval representation-learning sound-effects-library universal-category-system

Last synced: 10 Sep 2025

https://github.com/worldbank/gistembed

GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embeddings

deep-learning embedding-models fine-tuning huggingface mteb sentence-embeddings sentence-transformers

Last synced: 24 Apr 2025

https://github.com/maxscheurer/cppe

C++ and Python library for Polarizable Embedding

embedding-models hartree-fock quantum-chemistry

Last synced: 18 Feb 2026

https://github.com/fabiangroeger96/deep-embedded-music

Creation of an embedding space using unsupervised triplet loss and Tile2Vec that can be used for a variety of downstream tasks

audio contrastive-learning embedding-models embeddings music music-information-retrieval tensorflow tensorflow2 tile2vec triplet-loss unsupervised-learning unsupervised-machine-learning

Last synced: 09 May 2025

https://github.com/easonlai/chatbot_with_pdf_streamlit

This code example shows how to make a chatbot for semantic search over documents using Streamlit, LangChain, and various vector databases. The chatbot lets users ask questions and get answers from a document collection. The code is in Python and can be customized for different scenarios and data.

azure azure-cognitive-search azure-openai chroma document-search embedding-models gpt-3 gpt-35-turbo langchain langchain-python openai pinecone python semantic-search streamlit vector-database vector-search vector-similarity

Last synced: 26 Apr 2025

https://github.com/ashutosh1919/data2vec-pytorch

Ready to run PyTorch implementation of Data2Vec 2.0: Highly efficient self-supervised representation learning for vision, speech and text.

audio-machine-learning computer-vision data2vec deep-learning embedding-models multimodal-deep-learning nlp pytorch self-supervised-learning

Last synced: 19 Mar 2025

https://github.com/easonlai/chat_with_pdf_table

The contents of this repository showcase how to extract table data from a PDF file and preprocess it to facilitate word embedding. This preprocessing step enhances the readability of table data for language models and enables us to extract more contextual information from the tables.

azure-openai chroma chromadb embedding-models embedding-vectors embeddings langchain langchain-python pdf pdf-document-processor pdf-parser pdf-parsing python word-embeddings

Last synced: 25 Jun 2025

https://github.com/karolzak/images-vector-search

Simple implementation of search for visually similar images using deep learning and vector search. It's based on pretrained ImageNet weights, so it doesn't require any additional training.

cnn deep-learning deep-neural-networks embedding-models image image-processing images keras keras-tensorflow neural-networks resnet resnet-152 resnet-50 similarity-detection similarity-search transfer-learning vector-search vgg vgg19 visual-search

Last synced: 07 May 2025

https://github.com/olibomby/cm3p

CM3P (Contrastive Metadata-Map Masked Pre-training) multi-modal representation learning framework for osu! beatmaps

beatmaps classifier-model contrastive-learning embedding-models huggingface-transformers osugame

Last synced: 23 May 2026

https://github.com/touhi99/dl_dialogue_act_classification

DL Lab Project: given a subset of the Switchboard corpus, the goal is to classify dialogue acts from speech and text data. We define an RNN-LSTM model for text classification and a CNN model for speech classification, then ensemble both models to produce a more stable, higher-performance model.

classification cnn deep-learning deep-neural-networks dialogue dialogue-act embedding-models keras rnn

Last synced: 22 Apr 2025

https://github.com/sovit-123/local_file_search

Local file search using embedding techniques

embedding-models embeddings nlp vector-search

Last synced: 11 Apr 2025

https://github.com/natelalor/ai_report_generator

A tool that converts long audio files into a thorough, summarized report. Leverages OpenAI and its API (ChatGPT backend), Langchain for text processing, and Pinecone for vector database facilitation.

artificial-intelligence chatbot embedding-models langchain map-reduce object-oriented-programming openai openai-api pinecone python vector-database

Last synced: 26 Jul 2025

https://github.com/better-with-models/tinyquant

TinyQuant is a CPU-only vector quantization codec that compresses high-dimensional embedding vectors to low-bit representations while preserving cosine similarity rankings.

embedding-models embedding-vectors embeddings embeddings-similarity pgvector

Last synced: 28 Apr 2026

https://github.com/seven-33/langchain-chat

langchain-chat is an AI-driven Q&A system that leverages OpenAI's GPT-4 model and FAISS for efficient document indexing. It loads and splits documents from websites or PDFs, remembers conversations, and provides accurate, context-aware answers based on the indexed data. Easy to set up and extend.

chatbot chatbots embedding-model embedding-models embedding-python embedding-similarity faiss faiss-backend gpt-3 gpt-4 gpt-j gpt35turbo langchain language-python pinecone

Last synced: 26 Apr 2025

https://github.com/louisbrulenaudet/lemone-embed

All-in-one repo for the Lemone-embed project, a series of fine-tuned embedding models for tax retrieval-augmented generation (RAG).

e5 embedding-models embeddings fine-tuning huggingface lemone lemone-embed open-source rag roberta roberta-model sentence-transformers similarity-search tax taxation

Last synced: 22 Apr 2025

https://github.com/trustlelab/siteware-backend-v2

Siteware Backend - German Voice AI Agent provider - Deepgram + Twilio + Elevenlabs + OpenAI + Pinecone

deepgram elevenlabs embedding-models function-calling javascript openai pinecone prompt-engineering twilio-api typescript vector-database

Last synced: 10 Apr 2025

https://github.com/ksm26/embedding-models-from-architecture-to-implementation

Understand and build embedding models, focusing on word and sentence embeddings and dual-encoder architectures. Learn to train embedding models using contrastive loss and implement them in semantic search and RAG systems.

ai-applications ai-architecture bert bert-embeddings bert-fine-tuning bert-model contrastive-loss dual-encoder embedding-models machine-learning model-training question-answer-retrieval rag-systems semantic-search sentence-embeddings transformer-models word-embeddings word2vec

Last synced: 25 Jul 2025

https://github.com/yuniko-software/bge-m3-onnx

ONNX implementation of the BGE-M3 multilingual embedding model and tokenizer with native C#, Java, and Python implementations. Generates all three embedding types: dense, sparse, and ColBERT vectors.

bge-m3 csharp dotnet embedding-models hugginface inference java machine-learning onnx python pytorch tokenizer vector-database

Last synced: 26 Aug 2025

https://github.com/firojalam/crisis-embedding-models

Embedding models designed using crisis-related tweets collected by AIDR (http://aidr.qcri.org/)

crisis-related embedding embedding-models model tweets vector vector-model word

Last synced: 10 Nov 2025

https://github.com/olasunkanmi-se/intellisearch

IntelliSearch is an advanced retrieval-based question-answering and recommendation system that leverages embeddings and a large language model (LLM) to provide accurate and relevant information to users.

ai embedding-models embedding-vectors gemini-api gemini-pro generative-ai google-generative-ai large-language-models llm machine-learning non-supervised-learning pgvector rag retrival-augmented similarity-searches typescript vector-databases vector-search

Last synced: 07 Mar 2026

https://github.com/nanxstats/exp2vec

🧬 Tissue-specific gene embeddings trained on GTEx data

embedding-models embeddings gene-expression glove-embeddings gtex machine-learning

Last synced: 08 May 2026

https://github.com/singhxtushar/imdb-analysis

IMDB-Analysis is a sentiment analysis project that classifies movie reviews as positive or negative. The model is designed with a simple RNN architecture and uses word2vec embeddings. Deployed as a Streamlit web app on an open cloud service.

embedding-models imdb rnn streamlit-webapp tensorflow

Last synced: 16 Feb 2026

https://github.com/gatlenculp/embedding_translation

Alignment across Deep Neural Network Language Models’ Representations

embedding-models llm machine-learning

Last synced: 14 Feb 2026

https://github.com/gaurav-026/skillmatch-ai

The goal of this application is to generate suggestions based on a candidate's resume, store the candidate profile in a Pinecone database, and shortlist candidates according to their matched skills, with a match score.

embedding-models gemini-api nextjs pinecone-db tailwindcss typesscript vector-database

Last synced: 09 Jun 2026

https://github.com/rasyosef/text-embedding-models-training

Notebooks to train and evaluate Amharic Text Embedding Models based on BERT and RoBERTa for Passage Retrieval

bert embedding-models embeddings huggingface model-training roberta sentence-transformers transformers

Last synced: 05 Oct 2025

https://github.com/04bhavyaa/langchain-models

This project explores various LLMs and embedding models using LangChain, integrating OpenAI, Hugging Face, Google Gemini, and Anthropic. It includes chat models, document similarity search, and embeddings with cosine similarity for retrieval. The setup is simple, making it easy to experiment with LLMs and vector search. 🚀 (Big thank you to CampusX)

anthropic-claude chatmodel embedding-models embeddings google-generative-ai huggingface langchain langchain-models llm-model openai-api vectorization

Last synced: 28 Apr 2026

https://github.com/with-caer/curtana

Simplified zero-cost wrapper over llama.cpp, powered by the llama-cpp-2 crate.

crate embedding-models llama llm-inference rust

Last synced: 12 Oct 2025

https://github.com/doobidoo/agentnexus

A TypeScript-based autonomous agent framework with modular systems for memory, planning, and tool integration. Features vector-based recall, multi-strategy planning, and extensible tools for AI agent development.

action-execution ai-agent-framework anthropic autonomous-agents chromadb embedding-models llm-integration memory-system multi-agent-system planning-system playwright typescript vector-database vector-search

Last synced: 08 Sep 2025

https://github.com/ziozzang/embedding-server

Embedding server for testing (compatible with the OpenAI API), using models from LLaMa/Mistral.

embedding-models embedding-vectors flask openai-api

Last synced: 06 May 2026

https://github.com/subhangisati/rag-using-deepseek-r1

This repository highlights my learning journey in building Retrieval-Augmented Generation (RAG) pipelines using DeepSeek on Lightning AI, covering document ingestion, retrieval, and integration with generative AI. It showcases fine-tuning, evaluation, and optimization for accurate open-domain QA and knowledge management.

api deepseek document-indexing embedding-models fine-tuning generative-ai gpt huggingface-transformers langchain llm rag

Last synced: 04 May 2026

https://github.com/rbroc/simcat

A Python package to simulate multi-agent cognitive association tasks 🤖 🧠 👥

agent-based-simulations cognition embedding-models foraging social-interaction

Last synced: 30 Mar 2025

https://github.com/chatterjeesaurabh/natural-language-processing

Text Preprocessing, Embedding Methods such as BoW, TF-IDF and Word2Vec, Text Classification using LSTM, Topic Modeling with LDA and BERTopic.

embedding-models natural-language-processing nlp-machine-learning text-classification topic-modeling

Last synced: 14 Jun 2026

https://github.com/rakibhhridoy/natural-language-processing-steps

Preprocess data for NLP text classification and text sequence tasks in TensorFlow. Classification and sequence tasks each require different preprocessing steps. These steps are easy in TensorFlow once you get into it.

classification embedded embedding-models feed keras natural-language-processing nlp python sentence sequence tensorflow text-analysis text-classification text-processing

Last synced: 06 May 2026

https://github.com/yuniko-software/qwen3-tokenizer

Multi-language BPE tokenizer implementation for Qwen3 models. Lightweight byte-pair encoding for C#/.NET, Java, Rust

bpe-tokenizer csharp dotnet embedding-models huggingface inference java machine-learning onnx qwen rust vector-database

Last synced: 31 Oct 2025

https://github.com/mya-mya/novel2vecweb

A novel (fiction) version of Word2Vec

bootstrap bootstrap5 embedding-models html jinja2 python word2vec

Last synced: 18 May 2026

https://github.com/chiragsaini/textual-similarity

This notebook computes the textual similarity between two given paragraphs. Google's Universal Sentence Encoder is used to create the embeddings.

deep-learning embedding-models jupyter-notebook nlp python textual-similarity

Last synced: 11 Apr 2026

https://github.com/pranav-kural/ledaa-load-data

AWS Lambda function handling data ingestion in the RAG pipeline of the LEDAA project.

aws-lambda conversational-ai embedding-models pinecon python retrieval-augmented-generation

Last synced: 27 Oct 2025

https://github.com/yuniko-software/qwen3-tokenizer-dotnet

Multi-language BPE tokenizer implementation for Qwen3 models. Lightweight byte-pair encoding for C#/.NET

bpe-tokenizer csharp dotnet embedding-models huggingface inference llm machine-learning onnx qwen vector-database

Last synced: 13 Apr 2026

https://github.com/stackmodel/babyagi-autonomous-agents

Demonstrates how to implement BabyAGI by Yohei Nakajima.

agi ai aws babyagi bedrock embedding-models faiss langchain llm python serpapi

Last synced: 13 Apr 2026

https://github.com/celiason/museum-news

Web app for finding historical details about the museum

ai chatbot embedding-models llms rag

Last synced: 16 May 2026

https://github.com/mahnoorsheikh16/explainable-fake-news-detection-and-personalized-credible-recommendation-via-graphml

System for detecting fake news and suggesting credible alternatives. Takes a news URL and outputs a credibility score, explanation, and top reliable sources. Uses TF-IDF + Logistic Regression, XGBoost, and DistilBERT with hybrid BERT–LightGCN models, plus SHAP and GNNExplainer for interpretability.

bert-embeddings binary-classification distilbert embedding-models fake-news-detection gnn-explainer graphml graphsage lightgcn logistic-regression pytorch recommendation-system shap tf-idf xgboost-classifier

Last synced: 08 Oct 2025

https://github.com/vvkmnn/touristai

🇫🇷 English to French Translation via Python 3 and Keras RNNs.

bidirectional-rnn embedding-models keras rnn-model rnn-tensorflow

Last synced: 10 Oct 2025

https://github.com/huacenxu/embedding-models-for-ai-retrieval

This project develops a domain-specific embedding model to enhance document retrieval in AI-powered search systems. It incorporates techniques like synthetic data generation, model fine-tuning, and vector search using FAISS, evaluated with MRR@5 for performance.

document-retrieval embedding-models faiss machine-learning mrr nlp reallifeproject semantic-search

Last synced: 12 Mar 2025

https://github.com/hase3b/scprag

This repository implements a Retrieval-Augmented Generation (RAG) system for the Supreme Court of Pakistan, utilizing different LLMs, embedding models, and retrieval and generation enhancement strategies. It processes SCP judgments, applies chunking, and generates legal summaries and answers based on relevant case data.

beautifulsoup4 embedding-models huggingface langchain legal-corpus llama llm mistral nlp ocr pdf2image pinecone pymupdf pytesseract regex retreival retrieval-augmented-generation selenium vectorstore

Last synced: 06 Apr 2026

https://github.com/parthapray/ecotroph-rag

This repo contains the code for EcoTroph-RAG: A Retrieval-Augmented Ecological Intelligence Framework for Freshwater Fish Diet Analysis

bart-large-cnn bge-m3 bm25 diet ecological embedding-models fish huggingface llm minilm-l6-v2 nomic-ai-nomic-embed-text-v15 rag summarization t5-base

Last synced: 01 Jun 2026

https://github.com/aspadax/dim

Use LLMs for effective and refined vectorizations.

cv embedding-models gpt machine-learning nlp rust vector

Last synced: 11 Feb 2026

https://github.com/emapco/chem-mrl

Chem-MRL: SMILES Matryoshka Representation Learning Embedding Model

chemoinformatics embedding-models latent-space matryoshka-representation-learning smiles

Last synced: 01 May 2025

https://github.com/jonathanfavorite/ragamuffin

A lightweight, cross-platform .NET library for building RAG (Retrieval-Augmented Generation) pipelines with local embedding models and SQLite vector storage. Perfect for developers who need privacy-focused, offline-capable document search and AI-powered question answering without external API dependencies.

ai chunking document-processing dotnet embedding-models fluent-api local-ai metadata ml nlp offline-ai onnx pdf-processing privacy-focused rag retrieval-augmented-generation semantic-search sqlite vector-database vector-search

Last synced: 02 Jun 2026

https://github.com/matteospanio/mule-torch

Unofficial PyTorch port of MULE (SF-NFNet-F0), the music-audio embedding model.

embedding-models music torch

Last synced: 06 Jun 2026

https://github.com/huacenxu/covid-morality

This project builds a novel liberty dictionary to quantify liberty morality—a concept missing from the extended Moral Foundations Dictionary (eMFD)—and leverages it to study the relationship between audience engagement and COVID-related news.

academic-project ai coronavirus covid-19 embedding-models nlp nlp-machine-learning

Last synced: 30 Apr 2026

https://github.com/antoniakras/semantic-video-search

GPU-optimized semantic search on video transcripts, with benchmarking of FAISS, Pinecone, and PostgreSQL vector databases. Deployed via Docker on FORTH’s GPU infrastructure.

bert-embeddings bert-fine-tuning cuda dokcer embedding-models embeddings-word2vec faiss-vector-database gpu-computing huggingface-transformers nlp-machine-learning pgvector pineconedb postgresql python pytorch retrieval-augmented-generation similarity-search vector-database whisper-ai

Last synced: 03 May 2026