Projects in Awesome Lists tagged with rag-evaluation
A curated list of projects in awesome lists tagged with rag-evaluation.
https://github.com/giskard-ai/giskard
🐢 Open-Source Evaluation & Testing for AI & LLM systems
agent-evaluation ai-red-team ai-security ai-testing fairness-ai llm llm-eval llm-evaluation llm-security llmops ml-testing ml-validation mlops rag-evaluation red-team-tools responsible-ai trustworthy-ai
Last synced: 14 May 2025
https://github.com/Giskard-AI/giskard
🐢 Open-Source Evaluation & Testing for AI & LLM systems
agent-evaluation ai-red-team ai-security ai-testing fairness-ai llm llm-eval llm-evaluation llm-security llmops ml-testing ml-validation mlops rag-evaluation red-team-tools responsible-ai trustworthy-ai
Last synced: 15 Apr 2025
https://github.com/marker-inc-korea/autorag
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
analysis automl benchmarking document-parser embeddings evaluation llm llm-evaluation llm-ops open-source ops optimization pipeline python qa rag rag-evaluation retrieval-augmented-generation
Last synced: 03 Apr 2026
https://github.com/agenta-ai/agenta
The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place.
agents evaluation llm-as-a-judge llm-evaluation llm-framework llm-monitoring llm-observability llm-platform llm-playground llm-tools llmops observability prompt-engineering prompt-management rag-evaluation
Last synced: 11 Mar 2026
https://github.com/Agenta-AI/agenta
The all-in-one LLM developer platform: prompt management, evaluation, human feedback, and deployment all in one place.
human-annotation langchain large-language-models llama-index llm llm-evaluation llm-framework llm-tools llmops llms prompt-engineering prompt-management prompt-toolkit rag rag-evaluation
Last synced: 13 Mar 2025
https://github.com/vectara/open-rag-eval
Open source RAG evaluation package
evaluation-metrics metrics rag rag-evaluation retrieval-augmented-generation vectara
Last synced: 22 Apr 2025
https://github.com/LLAMATOR-Core/llamator
Framework for testing vulnerabilities of large language models (LLMs).
agent ai ai-security attack hallucinations jailbreak llm llm-read-team llm-security llm-testing misinformation nlp owasp python rag rag-evaluation red-team red-team-tools security-tools vulnerability
Last synced: 10 May 2025
https://github.com/llamator-core/llamator
Framework for testing vulnerabilities of large language models (LLMs).
ai ai-security attack hallucinations jailbreak llm llm-read-team llm-security llm-testing misinformation nlp owasp python rag rag-evaluation red-team red-team-tools red-teaming security-tools vulnerability-assessment
Last synced: 17 Jan 2026
https://github.com/romiconez/llamator
Framework for testing vulnerabilities of large language models (LLMs).
ai ai-security attack hallucinations jailbreak llm llm-read-team llm-security llm-testing misinformation nlp owasp python rag rag-evaluation red-team red-team-tools red-teaming security-tools vulnerability-assessment
Last synced: 22 Mar 2025
https://github.com/dokimos-dev/dokimos
LLM and agent evaluation for Java & Kotlin. Runs in JUnit and CI. Spring AI, LangChain4j, Koog.
agent-evaluation agentic-ai evaluation evaluation-framework evaluation-metrics java junit junit-extension koog kotlin langchain4j llm llm-evaluation llm-evaluation-framework llm-evaluation-metrics rag rag-evaluation retrieval-augmented-generation spring-ai spring-ai-evaluation
Last synced: 09 Jun 2026
https://github.com/mts-ai/rurage
information-retrieval llm-evaluation question-answering rag rag-evaluation
Last synced: 10 Nov 2025
https://github.com/vero-labs-ai/vero-eval
Open source framework for evaluating AI Agents
dataset-generation datasets evals evaluation evaluation-framework evaluation-metrics langgraph llm-evaluation llm-evaluation-framework python rag-evaluation rag-testing synthetic-dataset-generation testing testing-framework testing-library user-persona
Last synced: 07 Apr 2026
https://github.com/evaliphy/evaliphy
The E2E AI testing tool | No ML Overhead
ai ai-test-automation ai-testing ai-testing-tool end-to-end-testing llm-evaluation llm-evaluation-framework llm-evaluation-toolkit llm-testing rag rag-evaluation rag-pipeline test-automation test-automation-framework testing-tools
Last synced: 09 Jun 2026
https://github.com/oztrkoguz/rag-framework-evaluation
This project aims to compare different Retrieval-Augmented Generation (RAG) frameworks in terms of speed and performance.
autogen autogen-rag crewai crewai-rag langchain langchain-rag llamaindex llamaindex-rag rag rag-evaluation swarms swarms-rag
Last synced: 20 Mar 2025
https://github.com/xmpuspus/kb-arena
Benchmark 7 retrieval strategies on your own docs — naive vector, contextual, QnA pairs, knowledge graph, RAPTOR, PageIndex, and hybrid. Find which KB architecture fits your data.
benchmark chromadb cli document-retrieval evaluation graphrag hybrid-search knowledge-graph llm neo4j python rag rag-evaluation retrieval retrieval-augmented-generation vector-search
Last synced: 02 May 2026
https://github.com/simranjeet97/learn_rag_from_scratch_llm
Learn Retrieval-Augmented Generation (RAG) from scratch using LLMs from Hugging Face and LangChain or Python
artificial-intelligence datascience-machinelearning genai-domain genai-usecase generative-ai llm-apps llm-evaluation llm-framework llm-training rag rag-application rag-chatbot rag-embeddings rag-evaluation rag-implementation rag-llm rag-model rag-pipeline retrieval-augmented-generation
Last synced: 31 Jul 2025
https://github.com/shaadclt/evalrag
A comprehensive evaluation toolkit for assessing Retrieval-Augmented Generation (RAG) outputs using linguistic, semantic, and fairness metrics
Last synced: 22 Jul 2025
https://github.com/hallengray/rag-forge
Production-grade RAG pipelines with evaluation baked in
cli embeddings llm llm-evaluation mcp observability python rag rag-evaluation rag-pipeline ragas retrieval-augmented-generation vector-database
Last synced: 18 Apr 2026
https://github.com/fkapsahili/entrag
EntRAG - Enterprise RAG Benchmark
benchmark dataset evaluation evaluations generative-ai knowledge-graph llm llm-evaluation rag rag-evaluation retrieval retrieval-augmented-generation
Last synced: 09 Mar 2026
https://github.com/anasaber/mlflow_with_rag
Using MLflow to deploy your RAG pipeline with LlamaIndex, LangChain, and Ollama/Hugging Face LLMs/Groq
cicd deployment evaluation-metrics llamaindex llamaindex-rag mlflow mlflow-deployement mlflow-projects mlflow-tracking mlflow-tracking-server mlflow-ui mlops mlops-project mlops-template rag rag-evaluation rag-pipeline
Last synced: 06 Feb 2026
https://github.com/kaos599/betterrag
BetterRAG: Powerful RAG evaluation toolkit for LLMs. Measure, analyze, and optimize how your AI processes text chunks with precision metrics. Perfect for RAG systems, document processing, and embedding quality assessment.
chunking-optimization embeddings embeddings-extraction embeddings-optimization evaluation evaluation-framework optimization rag rag-application rag-evaluation rag-optimization
Last synced: 05 May 2026
https://github.com/unshdee/proofrag
Point your agent at your docs and your RAG app; get a golden test set + an LLM-as-judge & retrieval scorecard, in one command.
agent-skills ci claude claude-code codex evaluation llm llm-as-judge python rag rag-evaluation retrieval
Last synced: 01 Jun 2026
https://github.com/keitabroadwater/llm-eval-lab
A web sandbox for hands-on learning of LLM and RAG Evaluation
evaluation-framework fastapi gpt4 llm-evaluation llmops nextjs rag-evaluation ragas
Last synced: 19 Apr 2026
https://github.com/alexmartin1722/mirage
An evaluation framework for evaluating any modality to text generation and multimodal RAG.
multimodal multimodal-rag multimodal-summarization rag rag-evaluation
Last synced: 14 May 2026
https://github.com/jhaayush2004/rag-evaluation
Different approaches to evaluating RAG.
bert-score giskard hallucination-detection langchain rag rag-evaluation ragas vectara wandb
Last synced: 16 Oct 2025