Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists by open-compass
A curated list of projects in awesome lists by open-compass.
https://github.com/open-compass/opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, Llama2, Qwen, GLM, Claude, etc.) over 100+ datasets.
benchmark chatgpt evaluation large-language-model llama2 llama3 llm openai
Last synced: 29 Oct 2024
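As a quick illustration, OpenCompass evaluations are typically launched through its documented run.py entry point with --models and --datasets flags. The sketch below assumes a local OpenCompass checkout with dependencies installed; the aliases hf_llama2_7b and mmlu_gen are illustrative config names, not a definitive setup.

import subprocess

# Launch an OpenCompass run: evaluate one Hugging Face model on one
# dataset, both referenced by config alias (substitute your own).
subprocess.run(
    ["python", "run.py", "--models", "hf_llama2_7b", "--datasets", "mmlu_gen"],
    check=True,  # surface a non-zero exit from the evaluation process
)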
https://github.com/open-compass/vlmevalkit
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 40+ benchmarks.
chatgpt claude clip computer-vision evaluation gemini gpt gpt-4v gpt4 large-language-models llava llm multi-modal openai openai-api pytorch qwen vit vqa
Last synced: 08 Nov 2024
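Similarly, VLMEvalKit exposes a run.py entry point that selects benchmarks and models by name via --data and --model flags. Again a minimal sketch under the same assumptions; MMBench_DEV_EN and qwen_chat are example identifiers from its README.

import subprocess

# Evaluate one VLM on one benchmark via VLMEvalKit's run.py;
# dataset and model are selected by registered name.
subprocess.run(
    ["python", "run.py", "--data", "MMBench_DEV_EN", "--model", "qwen_chat"],
    check=True,
)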
https://github.com/open-compass/mixtralkit
A toolkit for inference and evaluation of 'mixtral-8x7b-32kseqlen' from Mistral AI
Last synced: 08 Nov 2024
https://github.com/open-compass/lawbench
Benchmarking Legal Knowledge of Large Language Models
Last synced: 08 Nov 2024
https://github.com/open-compass/t-eval
[ACL 2024] T-Eval: Evaluating Tool Utilization Capability of Large Language Models Step by Step
Last synced: 08 Nov 2024
https://github.com/open-compass/mmbench
Official Repo of "MMBench: Is Your Multi-modal Model an All-around Player?"
Last synced: 08 Nov 2024
https://github.com/open-compass/botchat
Evaluating LLMs' multi-round chatting capability by assessing conversations generated by two LLM instances.
Last synced: 08 Nov 2024
https://github.com/open-compass/mathbench
[ACL 2024 Findings] MathBench: A Comprehensive Multi-Level Difficulty Mathematics Evaluation Dataset
Last synced: 08 Nov 2024
https://github.com/open-compass/devbench
A Comprehensive Benchmark for Software Development.
Last synced: 08 Nov 2024
https://github.com/open-compass/ada-leval
The official implementation of "Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks"
Last synced: 08 Nov 2024
https://github.com/open-compass/criticeval
[NeurIPS 2024] A comprehensive benchmark for evaluating critique ability of LLMs
Last synced: 08 Nov 2024
https://github.com/open-compass/anah
[ACL 2024] ANAH & [NeurIPS 2024] ANAH-v2
acl gpt hallucination-detection llms neurips
Last synced: 08 Nov 2024
https://github.com/open-compass/code-evaluator
A multi-language code evaluation tool.
Last synced: 08 Nov 2024
https://github.com/open-compass/prosa
[EMNLP 2024 Findings] ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
Last synced: 08 Nov 2024
https://github.com/open-compass/gta
Official repository for paper "GTA: A Benchmark for General Tool Agents"
Last synced: 08 Nov 2024
https://github.com/open-compass/cibench
Official Repo of "CIBench: Evaluation of LLMs as Code Interpreter "
Last synced: 08 Nov 2024