Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists by open-compass

A curated list of projects in awesome lists by open-compass.

https://github.com/open-compass/opencompass

OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) across 100+ datasets.

benchmark chatgpt evaluation large-language-model llama2 llama3 llm openai

Last synced: 29 Oct 2024

https://github.com/open-compass/vlmevalkit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 40+ benchmarks

chatgpt claude clip computer-vision evaluation gemini gpt gpt-4v gpt4 large-language-models llava llm multi-modal openai openai-api pytorch qwen vit vqa

Last synced: 08 Nov 2024

https://github.com/open-compass/mixtralkit

A toolkit for inference and evaluation of 'mixtral-8x7b-32kseqlen' from Mistral AI

llm mistral moe

Last synced: 08 Nov 2024

https://github.com/open-compass/MixtralKit

A toolkit for inference and evaluation of 'mixtral-8x7b-32kseqlen' from Mistral AI

llm mistral moe

Last synced: 07 Nov 2024

https://github.com/open-compass/VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4v, Gemini, QwenVLPlus, 40+ HF models, and 20+ benchmarks

chatgpt claude clip computer-vision evaluation gemini gpt gpt-4v gpt4 large-language-models llava llm multi-modal openai openai-api pytorch qwen vit vqa

Last synced: 08 Aug 2024

https://github.com/open-compass/lawbench

Benchmarking Legal Knowledge of Large Language Models

benchmark chatgpt law llm

Last synced: 08 Nov 2024

https://github.com/open-compass/LawBench

Benchmarking Legal Knowledge of Large Language Models

benchmark chatgpt law llm

Last synced: 02 Nov 2024

https://github.com/open-compass/t-eval

[ACL2024] T-Eval: Evaluating Tool Utilization Capability of Large Language Models Step by Step

Last synced: 08 Nov 2024

https://github.com/open-compass/T-Eval

[ACL2024] T-Eval: Evaluating Tool Utilization Capability of Large Language Models Step by Step

Last synced: 03 Aug 2024

https://github.com/open-compass/mmbench

Official Repo of "MMBench: Is Your Multi-modal Model an All-around Player?"

Last synced: 08 Nov 2024

https://github.com/open-compass/botchat

Evaluating LLMs' multi-round chatting capability via assessing conversations generated by two LLM instances.

Last synced: 08 Nov 2024

https://github.com/open-compass/mathbench

[ACL 2024 Findings] MathBench: A Comprehensive Multi-Level Difficulty Mathematics Evaluation Dataset

Last synced: 08 Nov 2024

https://github.com/open-compass/devbench

A Comprehensive Benchmark for Software Development.

Last synced: 08 Nov 2024

https://github.com/open-compass/MathBench

[ACL 2024 Findings] MathBench: A Comprehensive Multi-Level Difficulty Mathematics Evaluation Dataset

Last synced: 27 Oct 2024

https://github.com/open-compass/ada-leval

The official implementation of "Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks"

gpt4 llm long-context

Last synced: 08 Nov 2024

https://github.com/open-compass/criticeval

[NeurIPS 2024] A comprehensive benchmark for evaluating critique ability of LLMs

Last synced: 08 Nov 2024

https://open-compass.github.io/CriticEval/

[NeurIPS 2024] A comprehensive benchmark for evaluating critique ability of LLMs

Last synced: 01 Nov 2024

https://github.com/open-compass/anah

[ACL 2024] ANAH & [NeurIPS 2024] ANAH-v2

acl gpt hallucination-detection llms neurips

Last synced: 08 Nov 2024

https://github.com/open-compass/code-evaluator

A multi-language code evaluation tool.

Last synced: 08 Nov 2024

https://github.com/open-compass/prosa

[EMNLP 2024 Findings] ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs

Last synced: 08 Nov 2024

https://github.com/open-compass/gta

Official repository for the paper "GTA: A Benchmark for General Tool Agents"

Last synced: 08 Nov 2024

https://github.com/open-compass/compassbench

Demo data of CompassBench

Last synced: 08 Nov 2024

https://github.com/open-compass/cibench

Official Repo of "CIBench: Evaluation of LLMs as Code Interpreter"

Last synced: 08 Nov 2024