Projects in Awesome Lists tagged with evaluation-framework
A curated list of projects in awesome lists tagged with evaluation-framework.
https://github.com/eleutherai/lm-evaluation-harness
A framework for few-shot evaluation of language models.
evaluation-framework language-model transformer
Last synced: 09 Sep 2025
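The few-shot evaluation that lm-evaluation-harness performs can be illustrated with a minimal sketch: build a k-shot prompt from solved examples and score exact-match accuracy. This is illustrative Python only, not the harness's actual API; `build_few_shot_prompt` and `accuracy` are hypothetical names.

```python
def build_few_shot_prompt(examples, query, k=2):
    """Prepend k solved examples to the query, as in k-shot evaluation."""
    shots = examples[:k]
    lines = [f"Q: {q}\nA: {a}" for q, a in shots]
    lines.append(f"Q: {query}\nA:")
    return "\n\n".join(lines)

def accuracy(predictions, references):
    """Fraction of exact matches between model outputs and gold answers."""
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)

examples = [("2+2?", "4"), ("3+3?", "6")]
print(build_few_shot_prompt(examples, "5+5?", k=2))
print(accuracy(["4", "7"], ["4", "6"]))  # 0.5
```

The real harness additionally handles tokenization, log-likelihood scoring, and hundreds of task configs; the prompt-building and scoring loop above is only the core idea.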
https://github.com/confident-ai/deepeval
The LLM Evaluation Framework
evaluation-framework evaluation-metrics llm-evaluation llm-evaluation-framework llm-evaluation-metrics
Last synced: 13 May 2025
https://github.com/promptfoo/promptfoo
Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.
ci ci-cd cicd evaluation evaluation-framework llm llm-eval llm-evaluation llm-evaluation-framework llmops pentesting prompt-engineering prompt-testing prompts rag red-teaming testing vulnerability-scanners
Last synced: 03 Mar 2026
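The declarative test-case idea behind promptfoo can be sketched in Python; promptfoo itself is configured with YAML and run from a Node CLI, so `run_prompt_tests` and `toy_model` here are hypothetical stand-ins.

```python
def run_prompt_tests(model_fn, cases):
    """Run each test case through the model and check its assertion."""
    results = []
    for case in cases:
        output = model_fn(case["prompt"])
        results.append({
            "prompt": case["prompt"],
            "output": output,
            "passed": case["assert"](output),
        })
    return results

def toy_model(prompt):
    # Stand-in for a real LLM call.
    return "Paris" if "capital of France" in prompt else "unknown"

cases = [
    {"prompt": "What is the capital of France?", "assert": lambda o: "Paris" in o},
    {"prompt": "What is the capital of Atlantis?", "assert": lambda o: o != ""},
]
results = run_prompt_tests(toy_model, cases)
print(sum(r["passed"] for r in results), "of", len(results), "passed")  # 2 of 2
```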
https://github.com/huggingface/lighteval
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
evaluation evaluation-framework evaluation-metrics huggingface
Last synced: 14 Oct 2025
https://github.com/MaurizioFD/RecSys2019_DeepLearning_Evaluation
This is the repository of our article published in RecSys 2019 "Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches" and of several follow-up studies.
bpr bprmf bprslim collaborative-filtering content-based-recommendation deep-learning evaluation-framework funksvd hybrid-recommender-system hyperparameters knn matrix-completion matrix-factorization neural-network recommendation-algorithms recommendation-system recommender-system reproducibility reproducible-research slimelasticnet
Last synced: 11 May 2025
https://github.com/relari-ai/continuous-eval
Data-Driven Evaluation for LLM-Powered Applications
evaluation-framework evaluation-metrics information-retrieval llm-evaluation llmops rag retrieval-augmented-generation
Last synced: 05 Apr 2025
https://github.com/servicenow/agentlab
AgentLab: An open-source framework for developing, testing, and benchmarking web agents on diverse tasks, designed for scalability and reproducibility.
agent agents benchmark evaluation-framework lab llm llm-agents prompting web-agents
Last synced: 25 Sep 2025
https://github.com/aiverify-foundation/moonshot
Moonshot - A simple and modular tool to evaluate and red-team any LLM application.
benchmarking evaluation-framework llm red-teaming trustworthy-ai
Last synced: 05 Feb 2026
https://github.com/athina-ai/athina-evals
Python SDK for running evaluations on LLM generated responses
evaluation evaluation-framework evaluation-metrics llm-eval llm-evaluation llm-evaluation-toolkit llm-ops llmops
Last synced: 29 Dec 2025
https://github.com/JinjieNi/MixEval
The official evaluation suite and dynamic data release for MixEval.
benchmark benchmark-mixture benchmarking-framework benchmarking-suite evaluation evaluation-framework foundation-models large-language-model large-language-models large-multimodal-models llm-evaluation llm-evaluation-framework llm-inference mixeval
Last synced: 14 Sep 2025
https://github.com/TonicAI/tonic_validate
Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications.
evaluation-framework evaluation-metrics large-language-models llm llmops llms rag retrieval-augmented-generation
Last synced: 04 Apr 2025
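Retrieval-quality metrics of the kind such RAG evaluators report reduce, in the simplest case, to set overlap between retrieved and relevant chunks. A minimal sketch under that assumption, not the tonic_validate API:

```python
def context_precision(retrieved, relevant):
    """Fraction of retrieved chunks that are actually relevant."""
    if not retrieved:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(retrieved)

def context_recall(retrieved, relevant):
    """Fraction of relevant chunks that were retrieved."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)

retrieved = ["doc1", "doc2", "doc3"]
relevant = ["doc1", "doc4"]
print(context_precision(retrieved, relevant))  # ≈0.333
print(context_recall(retrieved, relevant))     # 0.5
```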
https://github.com/zeno-ml/zeno
AI Data Management & Evaluation Platform
ai data-science evaluation evaluation-framework machine-learning python
Last synced: 18 Apr 2025
https://github.com/lartpang/PySODEvalToolkit
PySODEvalToolkit: A Python-based Evaluation Toolbox for Salient Object Detection and Camouflaged Object Detection
camouflaged-object-detection co-saliency co-salient-object-detection e-measure evaluation evaluation-framework evaluation-metrics evaluator f-measure fm-curve latex mae metrics metrics-visualization pr-curve python3 s-measure saliency saliency-detection salient-object-detection
Last synced: 21 Nov 2025
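Two of the standard salient-object-detection metrics such a toolbox computes, MAE and the F-measure, can be sketched in plain Python. The toolkit's real implementation is array-based and far more complete; this only shows the definitions, with the common beta^2 = 0.3 weighting.

```python
def mae(pred, gt):
    """Mean absolute error between a predicted saliency map and ground truth (values in [0, 1])."""
    flat_p = [v for row in pred for v in row]
    flat_g = [v for row in gt for v in row]
    return sum(abs(p - g) for p, g in zip(flat_p, flat_g)) / len(flat_g)

def f_measure(pred, gt, threshold=0.5, beta2=0.3):
    """F-measure after binarizing the prediction at a threshold."""
    tp = fp = fn = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            b = p >= threshold
            if b and g:
                tp += 1
            elif b:
                fp += 1
            elif g:
                fn += 1
    if tp == 0:
        return 0.0
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    return (1 + beta2) * prec * rec / (beta2 * prec + rec)

pred = [[0.9, 0.1], [0.8, 0.2]]
gt = [[1, 0], [1, 0]]
print(mae(pred, gt))        # 0.15
print(f_measure(pred, gt))  # 1.0
```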
https://github.com/bijington/expressive
Expressive is a cross-platform expression parsing and evaluation framework. The cross-platform nature is achieved through compiling for .NET Standard so it will run on practically any platform.
cross-platform evaluation evaluation-framework expression-evaluator expression-parser hacktoberfest netstandard parsing xamarin
Last synced: 31 Mar 2025
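Expressive targets .NET, but the parse-then-evaluate idea behind such engines is easy to sketch. Here is a toy Python analogue that safely evaluates arithmetic by walking Python's own AST instead of writing a parser; it is not related to Expressive's actual API.

```python
import ast
import operator

# Supported operators for a small, safe arithmetic subset.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv,
        ast.USub: operator.neg}

def evaluate(expr: str):
    """Parse an arithmetic expression and evaluate it by walking the AST."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression node")
    return walk(ast.parse(expr, mode="eval"))

print(evaluate("2 * (3 + 4)"))  # 14
```

Restricting evaluation to a whitelist of node types is what makes this safe, unlike a bare `eval()`.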
https://github.com/nlp-uoregon/mlmm-evaluation
Multilingual Large Language Models Evaluation Benchmark
datasets evaluation evaluation-datasets evaluation-framework language-model large-language-models multilingual natural-language-processing nlp
Last synced: 02 Aug 2025
https://github.com/AI21Labs/lm-evaluation
Evaluation suite for large-scale language models.
evaluation-framework language-model
Last synced: 23 Apr 2025
https://github.com/tsenst/crowdflow
Optical Flow Dataset and Benchmark for Visual Crowd Analysis
benchmark-suite computer-vision crowd-analysis crowd-counting dataset evaluation-framework motion-estimation multi-object-tracking optical-flow synthetic-images tracking tracking-by-detection trajectories tub-crowdflow-dataset video-analytics video-processing video-surveillance
Last synced: 06 Mar 2026
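Optical-flow benchmarks like this one are typically scored with average endpoint error (AEPE), the mean Euclidean distance between predicted and ground-truth flow vectors; a minimal sketch of that metric:

```python
import math

def average_endpoint_error(flow_pred, flow_gt):
    """Mean Euclidean distance between predicted and ground-truth (u, v) flow vectors."""
    errs = [math.hypot(pu - gu, pv - gv)
            for (pu, pv), (gu, gv) in zip(flow_pred, flow_gt)]
    return sum(errs) / len(errs)

pred = [(1.0, 0.0), (0.0, 2.0)]
gt = [(0.0, 0.0), (0.0, 0.0)]
print(average_endpoint_error(pred, gt))  # (1 + 2) / 2 = 1.5
```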
https://github.com/alibaba-damo-academy/MedEvalKit
MedEvalKit: A Unified Medical Evaluation Framework
evaluation-framework llm medicalai multimodal
Last synced: 28 Jul 2025
https://github.com/microsoft/eureka-ml-insights
A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings.
ai artificial-intelligence evaluation-framework llm machine-learning mllm
Last synced: 05 Apr 2025
https://github.com/x-plug/writingbench
WritingBench: A Comprehensive Benchmark for Generative Writing
ai benchmark evaluation-framework huggingface llm long-context long-text nlp text-generation writing
Last synced: 01 Sep 2025
https://github.com/kaiko-ai/eva
Evaluation framework for oncology foundation models (FMs)
evaluation-framework foundation-models machine-learning oncology
Last synced: 24 Dec 2025
https://github.com/codefuse-ai/codefuse-evaluation
Industrial-level evaluation benchmarks for coding LLMs across the full life-cycle of AI-native software development; an enterprise-grade code-LLM evaluation suite, being opened up continuously.
code-evaluation codecommenteval codefuse codetranseval evaluation-framework lcc repository-eval
Last synced: 07 Apr 2025
https://github.com/bmw-innovationlab/sordi-ai-evaluation-gui
This repository allows you to evaluate a trained computer vision model and get general information and evaluation metrics with little configuration.
ai bmw computer-vision dataset deeplearning docker evaluation evaluation-framework no-code python rest-api sordi synthetic-data tensorflow
Last synced: 02 Jul 2025
https://github.com/nouhadziri/DialogEntailment
The implementation of the paper "Evaluating Coherence in Dialogue Systems using Entailment"
bert dialogue-evaluation evaluation-framework natural-language-inference
Last synced: 02 Apr 2025
https://github.com/pentoai/vectory
Vectory provides a collection of tools to track and compare embedding versions.
deep-learning deep-neural-networks embedding-python embedding-vectors embeddings-similarity evaluation-framework
Last synced: 18 Feb 2026
https://github.com/jinzhuoran/RWKU
RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models. NeurIPS 2024
adversarial-attacks benchmark evaluation-framework forgetting large-language-models membership-inference-attack natural-language-processing privacy-protection right-to-be-forgotten unlearning
Last synced: 24 Mar 2025
https://github.com/letta-ai/letta-evals
Evaluation kit for testing stateful agents
agentevals agents evaluation-framework language-model letta letta-agents
Last synced: 26 Feb 2026
https://github.com/powerflows/powerflows-dmn
Power Flows DMN - Powerful decisions and rules engine
decision-engine decision-tables dmn dmn-engine dmn-model evaluation evaluation-framework feel groovy java javascript kotlin kotlin-dsl mvel rule-engine rules rules-engine xml yaml
Last synced: 09 Apr 2025
https://github.com/cedrickchee/vibe-jet
A browser-based 3D multiplayer flying game with arcade-style mechanics, created using the Gemini 2.5 Pro through a technique called "vibe coding"
evaluation-framework flight-simulator game-development gemini-2-5-pro-exp llm-evaluation vibe-check vibe-coding
Last synced: 05 May 2025
https://github.com/gair-nlp/scaleeval
Scalable Meta-Evaluation of LLMs as Evaluators
evaluation-framework generative-ai llm nlp
Last synced: 23 Jun 2025
https://github.com/adithya-s-k/indic_eval
A lightweight evaluation suite tailored specifically for assessing Indic LLMs across a diverse range of tasks
evaluation-framework llm-evaluation llms
Last synced: 03 Aug 2025
https://github.com/tohtsky/irspack
Train, evaluate, and optimize implicit feedback-based recommender systems.
eigen evaluation-framework hyperparameter-optimization knn-algorithm matrix-factorization optuna pybind11 recommender-systems
Last synced: 30 Apr 2025
https://github.com/vero-labs-ai/vero-eval
Open source framework for evaluating AI Agents
dataset-generation datasets evals evaluation evaluation-framework evaluation-metrics langgraph llm-evaluation llm-evaluation-framework python rag-evaluation rag-testing synthetic-dataset-generation testing testing-framework testing-library user-persona
Last synced: 07 Apr 2026
https://github.com/astrabert/sentrev
Simple customizable evaluation for text retrieval performance of Sentence Transformers embedders on PDFs
embedders evaluation-framework python python-package qdrant semantic-search sentence-transformers text-embedding vector-database
Last synced: 16 Apr 2025
https://github.com/davidheineman/thresh
Universal, customizable and deployable fine-grained evaluation for text generation.
annotation-tool evaluation-framework natural-language-processing nlp thresh
Last synced: 16 Jan 2026
https://github.com/vinid/quica
quica is a tool to run inter-coder agreement pipelines in an easy and effective way. Multiple measures are run and the results are collected in a single table that can be easily exported to LaTeX.
evaluation-framework evaluation-metrics inter-coder-agreement inter-rater-agreement python
Last synced: 15 May 2025
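A representative measure from this family is Cohen's kappa, which corrects raw agreement between two coders for chance; a self-contained sketch of the standard formula (quica's own API differs):

```python
def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(coder_a)
    labels = set(coder_a) | set(coder_b)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of each coder's marginal label probabilities.
    expected = sum(
        (coder_a.count(l) / n) * (coder_b.count(l) / n) for l in labels
    )
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)

print(cohens_kappa(["yes", "yes", "no", "no"],
                   ["yes", "no", "no", "no"]))  # 0.5
```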
https://github.com/ad-freiburg/elevant
Entity linking evaluation and analysis tool
entity-disambiguation entity-linking evaluation-framework
Last synced: 29 Oct 2025
https://github.com/AstraBert/diRAGnosis
Diagnose the performance of your RAG.
docker evaluation-framework fastapi gradio llamaindex llm python-package qdrant rag retrieval synthetic-dataset-generation vector-database
Last synced: 14 Mar 2025
https://github.com/hpai-bsc/turtle
A Unified Evaluation of LLMs for RTL Generation (MLCAD 2025)
Last synced: 18 Jul 2025
https://github.com/ma7555/evalify
Evaluate your biometric verification models literally in seconds.
evaluation evaluation-framework evaluation-metrics face-recognition face-verification python
Last synced: 07 May 2025
https://github.com/liaad/tieval
An Evaluation Framework for Temporal Information Extraction Systems
evaluation-framework information-extraction nlp temporal-relations
Last synced: 25 Apr 2025
https://github.com/hlt-mt/subsonar
Evaluate the quality of SRT files using the multilingual multimodal SONAR model.
evaluation-framework evaluation-metrics subtitles subtitling
Last synced: 16 Jan 2026
https://github.com/borgwardtlab/ggme
Official repository for the ICLR 2022 paper "Evaluation Metrics for Graph Generative Models: Problems, Pitfalls, and Practical Solutions" https://openreview.net/forum?id=tBtoZYKd9n
evaluation-framework evaluation-metrics generative-model graph-learning machine-learning
Last synced: 11 Jul 2025
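A simple instance of the graph-generative-model comparison the paper studies is the distance between degree distributions; the sketch below uses total variation distance, whereas the paper's proposed metrics are MMD-based and more involved. `degree_distribution` and `total_variation` are illustrative names.

```python
from collections import Counter

def degree_distribution(edges, n_nodes):
    """Normalized degree histogram of a simple undirected graph."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    counts = Counter(deg.get(i, 0) for i in range(n_nodes))
    return {d: c / n_nodes for d, c in counts.items()}

def total_variation(p, q):
    """Total variation distance between two degree distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0) - q.get(k, 0)) for k in keys)

path = degree_distribution([(0, 1), (1, 2)], 3)          # a 3-node path
triangle = degree_distribution([(0, 1), (1, 2), (0, 2)], 3)
print(total_variation(path, triangle))  # ≈0.667
```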
https://github.com/eduardogr/evalytics
HR tool to orchestrate a company's employee performance review cycle.
company evaluation-cycle evaluation-framework human-resources performance-evaluation python python-3
Last synced: 07 Jul 2025
https://github.com/GiovanniBaccichet/DNCS-HTTP3
Docker-based virtualized framework for analysing HTTP/3+QUIC performance and comparing it to HTTP/2 and TCP.
docker evaluation-framework http3 performace performance-evaluation quic ssl tcp vagrant video-streaming
Last synced: 07 Apr 2025
https://github.com/vectara/mirage-bench
Repository for multilingual generation, RAG evaluations, and surrogate judge training for the Arena RAG leaderboard (NAACL'25)
anyscale-endpoint arena azure-api claude-api cohere-api evaluation-framework gemini-api llm-inference openai-api rag retrieval-augmented-generation vllm
Last synced: 27 Feb 2026
https://github.com/rosinality/halite
Acceleration framework for Human Alignment Learning
evaluation-framework inference large-language-models proximal-policy-optimization reinforcement-learning reinforcement-learning-from-human-feedback transformers
Last synced: 28 Jul 2025
https://github.com/aigc-apps/PertEval
This is the accompanying repo of the NeurIPS '24 D&B Spotlight paper, PertEval, including code, data, and main results.
evaluation-framework evaluation-metrics large-language-models llm-evaluation machine-learning trustworthy-ai
Last synced: 09 Jul 2025
https://github.com/maximhq/maxim-cookbooks
Maxim is an end-to-end AI evaluation and observability platform that empowers modern AI teams to ship agents with quality, reliability, and speed.
evaluation evaluation-framework genai observability
Last synced: 03 Mar 2026
https://github.com/googlecloudplatform/evalbench
EvalBench is a flexible framework designed to measure the quality of generative AI (GenAI) workflows around database specific tasks.
databases eval evaluation-framework nl2sql text2sql
Last synced: 22 Jun 2025
https://github.com/feup-infolab/army-ant
An experimental information retrieval framework and a workbench for innovation in entity-oriented search.
ant evaluation-framework information-retrieval research
Last synced: 13 Jul 2025
https://github.com/jimmc414/claudecode_n_codex_swebench
Toolkit for measuring Claude Code and Codex performance over time against a baseline, using the SWE-bench Lite dataset **No API key required for Max subscribers**
claude-code claudecode eval evaluation-framework swebench
Last synced: 17 Sep 2025
https://github.com/seblemaguer/replikant
A flexible evaluation platform to enable researchers to conduct replicable subjective evaluation
evaluation evaluation-framework listening-test replicability
Last synced: 06 Sep 2025
https://github.com/vcerqueira/modelradar
Aspect-based Forecasting Accuracy
deep-learning evaluation-framework forecasting machine-learning time-series
Last synced: 22 Jan 2026
https://github.com/feup-infolab/army-ant-install
Army ANT installation via Docker Compose.
ant docker-compose-files evaluation-framework information-retrieval research
Last synced: 19 Mar 2026
https://github.com/stack-rs/mitosis
Mitosis: A Unified Transport Evaluation Framework
cli distributed distributed-systems evaluation evaluation-framework library rust transport-layer
Last synced: 04 Mar 2026
https://github.com/maastrichtu-ids/fair-enough-metrics
βοΈ API to publish FAIR metrics tests written in python
evaluation-framework evaluation-metrics fair-data
Last synced: 15 Jun 2025
https://github.com/dongli/esmdiag
This is a diagnostic package for earth system modeling.
earth-science evaluation-framework
Last synced: 04 Apr 2026
https://github.com/aidos-lab/rings
Relevant Information in Node features and Graph Structure
data-centric evaluation-framework geometric-deep-learning graph-learning icml-2025
Last synced: 05 Feb 2026
https://github.com/yukinagae/genkitx-promptfoo
Community Plugin for Genkit to use Promptfoo
ai evaluation evaluation-framework firebase genkit genkit-plugin genkitx llm llm-eval llm-evaluation llm-evaluation-framework llmops plugin prompt prompt-testing promptfoo prompts testing
Last synced: 27 Jul 2025
https://github.com/cmry/amica
Repository for the experiments described in "Current Limitations in Cyberbullying Detection: on Evaluation Criteria, Reproducibility, and Data Scarcity" submitted as pre-print to arXiv.
cyberbullying cyberbullying-detection cybersecurity evaluation evaluation-framework machine-learning reproduction text-mining
Last synced: 23 Apr 2025
https://github.com/sap-samples/llm-round-trip-correctness
This repo provides code for evaluating LLM round-trip correctness between text and process models (text to process model and back again)
benchmarking business evaluation-framework genai processes round-trip-correctness
Last synced: 13 Apr 2025
https://github.com/artefactop/promptdev
A prompt evaluation framework that provides comprehensive testing for AI agents across multiple providers.
ci-cd evaluation-framework llm llm-eval llm-evaluation llm-evaluation-framework prompt prompt-engineering prompt-toolkit red-team testing
Last synced: 30 Oct 2025
https://github.com/teilomillet/kushim
eval creator
dataset dataset-generation eval evaluation evaluation-framework llm openai
Last synced: 15 Mar 2026
https://github.com/bassrehab/spark-llm-eval
Spark-native LLM evaluation framework with confidence intervals, significance testing, and Databricks integration
databricks evaluation-framework llm-evaluation- machine-learning mlflow mlops nlp pyspark python
Last synced: 14 Jan 2026
https://github.com/leo310/rag-chunking-evaluation
Assess the effectiveness of chunking strategies in RAG systems via a custom evaluation framework.
chunking evaluation-framework retrieval retrieval-augmented-generation
Last synced: 22 Jan 2026
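The baseline strategy such chunking evaluations usually start from is fixed-size chunks with overlap; a minimal sketch (`chunk_text` is a hypothetical helper, not this repo's API):

```python
def chunk_text(text, size=200, overlap=50):
    """Split text into fixed-size character chunks, each overlapping the previous one."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # advance by the non-overlapping stride
    return chunks

chunks = chunk_text("a" * 500, size=200, overlap=50)
print(len(chunks))  # 4 chunks at starts 0, 150, 300, 450
```

Evaluations then compare such baselines against semantic or structure-aware splitting by measuring downstream retrieval quality.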
https://github.com/iaar-shanghai/guessarena
[ACL 2025] GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning
benchmark chatgpt deepseek domain-specific-eval evaluation-framework gamearena guessarena knowledge-evaluation large-language-models llm-eval openai qwen reasoning-evaluation reliable-evaluation
Last synced: 28 Jun 2025
https://github.com/kaos599/betterrag
BetterRAG: Powerful RAG evaluation toolkit for LLMs. Measure, analyze, and optimize how your AI processes text chunks with precision metrics. Perfect for RAG systems, document processing, and embedding quality assessment.
chunking-optimization embeddings embeddings-extraction embeddings-optimization evaluation evaluation-framework optimization rag rag-application rag-evaluation rag-optimization
Last synced: 27 Mar 2025
https://github.com/pedrodevog/synthecg
The first systematic evaluation framework for synthetic 10-second 12-lead ECGs from diagnostic class-conditioned generative models
deep-learning diffusion-models ecg electrocardiogram evaluation-framework gan generative-ai medical-ai ptb-xl python pytorch state-space-model synthetic-data time-series
Last synced: 17 Jul 2025
https://github.com/yukinagae/promptfoo-sample
Sample project demonstrating how to use Promptfoo, a test framework for evaluating the output of generative AI models
evaluation evaluation-framework llm llm-eval llm-evaluation llm-evaluation-framework llmops prompt-testing promptfoo prompts testing
Last synced: 25 Feb 2026
https://github.com/parthapray/llm_evaluation_metrics_localized
This repo contains code for localized LLM evaluation metrics via a framework using Ollama and edge resources, along with novel derived metrics
evaluation evaluation-framework evaluation-metrics evaluations flask large-language-models metrics ollama-api restful-api
Last synced: 25 Aug 2025
https://github.com/arclabs561/anno
Information extraction for Rust: NER, coreference resolution, and evaluation
bert candle coreference-resolution entity-extraction evaluation-framework gliner information-extraction ner nlp onnx rust
Last synced: 13 Jan 2026
https://github.com/ksm26/improving-accuracy-of-llm-applications
The course equips developers with techniques to enhance the reliability of LLMs, focusing on evaluation, prompt engineering, and fine-tuning. Learn to systematically improve model accuracy through hands-on projects, including building a text-to-SQL agent and applying advanced fine-tuning methods.
evaluation-framework instruction-fine-tuning iterative-fine-tuning llama-models llm-accuracy lora memory-tuning model-reliability mome performance-optimization prompt-engineering self-reflection text-to-sql
Last synced: 28 Mar 2025
https://github.com/yukinagae/genkit-promptfoo-sample
Sample implementation demonstrating how to use Firebase Genkit with Promptfoo
evaluation evaluation-framework genkit llm llm-eval llm-evaluation llm-evaluation-framework llmops prompt-testing promptfoo prompts testing
Last synced: 15 Aug 2025
https://github.com/keitabroadwater/llm-eval-lab
A web sandbox for hands-on learning of LLM and RAG Evaluation
evaluation-framework fastapi gpt4 llm-evaluation llmops nextjs rag-evaluation ragas
Last synced: 14 May 2025
https://github.com/aiflowml/hyperparams
HyperParams: A Decentralized Framework for AI Agent Assessment and Certification
agent agents evaluation evaluation-framework evaluation-functions evaluation-kit evaluation-metrics evaluation-test ml ml-engineering
Last synced: 31 Oct 2025
https://github.com/jplane/llm-function-call-eval
Demonstrates a workflow for LLM function calling evaluation. Uses GitHub Copilot to generate synthetic eval data and Azure AI Foundry for handling results.
azure-ai-foundry evaluation-framework function-calling llm synthetic-dataset-generation tool-use vscode
Last synced: 04 Mar 2025
https://github.com/theaiautomators/deepeval-wrapper
REST API wrapper for DeepEval Python library with authentication
evaluation evaluation-framework evaluation-metrics
Last synced: 18 Jan 2026
https://github.com/amadlaorg/judge
Judge verifies that system settings meet required configurations and resource specifications
auditing evaluation evaluation-framework
Last synced: 18 Jan 2026
https://github.com/syed-m-hussain/recap
RECAP (Review Engine for Critiquing and Advising Pitches) is an LLM-powered agentic system designed to help founders and entrepreneurs receive actionable, multi-perspective, and structured feedback on their startup pitch presentations
evaluation-framework langchain langgraph-agents
Last synced: 20 Jun 2025
https://github.com/szegedai/hun_ner_checklist
CHECKLIST-style test cases and the testing of three Hungarian Named Entity Recognition tools.
evaluation-framework hungarian-language ner nlp
Last synced: 01 Feb 2026