Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists by open-compass
A curated list of projects in awesome lists by open-compass.
https://github.com/open-compass/opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, Llama2, Qwen, GLM, Claude, etc.) over 100+ datasets.
benchmark chatgpt evaluation large-language-model llama2 llama3 llm openai
Last synced: 29 Oct 2024
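As a quick illustration, OpenCompass evaluations are typically launched through its documented run.py entry point with --models and --datasets flags. The sketch below assumes a local OpenCompass checkout with dependencies installed; the aliases hf_llama2_7b and mmlu_gen are illustrative config names, not a definitive setup.

import subprocess

# Launch an OpenCompass run: evaluate one Hugging Face model on one
# dataset, both referenced by config alias (substitute your own).
subprocess.run(
    ["python", "run.py", "--models", "hf_llama2_7b", "--datasets", "mmlu_gen"],
    check=True,  # surface a non-zero exit from the evaluation process
)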
https://github.com/open-compass/vlmevalkit
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 40+ benchmarks.
chatgpt claude clip computer-vision evaluation gemini gpt gpt-4v gpt4 large-language-models llava llm multi-modal openai openai-api pytorch qwen vit vqa
Last synced: 08 Nov 2024
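Similarly, VLMEvalKit exposes a run.py entry point that selects benchmarks and models by name via --data and --model flags. Again a minimal sketch under the same assumptions; MMBench_DEV_EN and qwen_chat are example identifiers from its README.

import subprocess

# Evaluate one VLM on one benchmark via VLMEvalKit's run.py;
# dataset and model are selected by registered name.
subprocess.run(
    ["python", "run.py", "--data", "MMBench_DEV_EN", "--model", "qwen_chat"],
    check=True,
)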
https://github.com/open-compass/mixtralkit
A toolkit for inference and evaluation of 'mixtral-8x7b-32kseqlen' from Mistral AI
Last synced: 08 Nov 2024
https://github.com/open-compass/lawbench
Benchmarking Legal Knowledge of Large Language Models
Last synced: 08 Nov 2024
https://github.com/open-compass/t-eval
[ACL 2024] T-Eval: Evaluating Tool Utilization Capability of Large Language Models Step by Step
Last synced: 08 Nov 2024
https://github.com/open-compass/mmbench
Official Repo of "MMBench: Is Your Multi-modal Model an All-around Player?"
Last synced: 08 Nov 2024
https://github.com/open-compass/botchat
Evaluating LLMs' multi-round chatting capability by assessing conversations generated by two LLM instances.
Last synced: 08 Nov 2024
https://github.com/open-compass/mathbench
[ACL 2024 Findings] MathBench: A Comprehensive Multi-Level Difficulty Mathematics Evaluation Dataset
Last synced: 08 Nov 2024
https://github.com/open-compass/devbench
A Comprehensive Benchmark for Software Development.
Last synced: 08 Nov 2024
https://github.com/open-compass/ada-leval
The official implementation of "Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks"
Last synced: 08 Nov 2024
https://github.com/open-compass/criticeval
[NeurIPS 2024] A comprehensive benchmark for evaluating critique ability of LLMs
Last synced: 08 Nov 2024
https://github.com/open-compass/anah
[ACL 2024] ANAH & [NeurIPS 2024] ANAH-v2
acl gpt hallucination-detection llms neurips
Last synced: 08 Nov 2024
https://github.com/open-compass/code-evaluator
A multi-language code evaluation tool.
Last synced: 08 Nov 2024
https://github.com/open-compass/prosa
[EMNLP 2024 Findings] ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
Last synced: 08 Nov 2024
https://github.com/open-compass/gta
Official repository for paper "GTA: A Benchmark for General Tool Agents"
Last synced: 08 Nov 2024
https://github.com/open-compass/cibench
Official Repo of "CIBench: Evaluation of LLMs as Code Interpreter "
Last synced: 08 Nov 2024