https://github.com/fsoft-ai4code/code-llm-evaluator
🏎️ Fast Code LLMs evaluator
- Host: GitHub
- URL: https://github.com/fsoft-ai4code/code-llm-evaluator
- Owner: FSoft-AI4Code
- License: mit
- Created: 2024-03-28T15:21:31.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-05-14T16:55:35.000Z (almost 2 years ago)
- Last Synced: 2025-09-05T13:17:45.136Z (7 months ago)
- Language: Python
- Homepage: https://code-llm-evaluator.readthedocs.io/en/latest/
- Size: 39.1 KB
- Stars: 3
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.rst
- Changelog: HISTORY.rst
- License: LICENSE
- Citation: CITATION.bib
=================
CodeLLM Evaluator
=================
Easy evaluation of CodeLLMs with fast inference settings
Overview
========
`CodeLLM Evaluator` provides fast and efficient evaluation of LLMs on code
generation tasks. Inspired by `lm-evaluation-harness `_ and `bigcode-eval-harness `_,
the framework is designed for multiple use cases and makes it easy to add new
metrics and custom tasks.
**Features:**

* Implemented HumanEval and MBPP benchmarks for coding LLMs.
* Support for models loaded via `transformers `_ and `DeepSpeed `_.
* Support for evaluating adapters (e.g. LoRA) from HuggingFace's `PEFT `_ library.
* Support for distributed inference with native transformers or fast inference with the `vLLM `_ backend.
* Easy support for custom prompts, tasks, and metrics.
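
A custom task typically bundles a prompt template, stop words, and a scoring
rule. The sketch below is only an illustration of that idea under assumed
names (``CustomTask``, ``build_prompt``, ``check``); it is **not** the
library's actual API, so consult ``code_eval.task`` for the real interface.

.. code-block:: python

   from dataclasses import dataclass, field
   from typing import List

   @dataclass
   class CustomTask:
       """Hypothetical sketch of a task definition; the real
       code_eval.task interface may differ."""
       name: str
       prompt_template: str
       stop_words: List[str] = field(default_factory=lambda: ["\nclass ", "\ndef "])

       def build_prompt(self, problem: dict) -> str:
           # Render one benchmark problem into a model prompt.
           return self.prompt_template.format(**problem)

       def check(self, generation: str, reference: str) -> bool:
           # Naive exact-match metric; real code tasks execute unit tests instead.
           return generation.strip() == reference.strip()

   task = CustomTask(name="toy-eval", prompt_template="# {instruction}\n")
   print(task.build_prompt({"instruction": "add two numbers"}))
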
Setup
=====
Install the `code-eval` package from the GitHub repository via `pip`:

.. code-block:: console

   $ git clone https://github.com/FSoft-AI4Code/code-llm-evaluator.git
   $ cd code-llm-evaluator
   $ pip install -e .

Quick-start
===========
To evaluate a supported task in Python, use :py:class:`code_eval.Evaluator` to
generate completions and compute evaluation metrics on the fly.

.. code-block:: python

   from code_eval import Evaluator
   from code_eval.task import HumanEval

   task = HumanEval()
   evaluator = Evaluator(task=task)

   output = evaluator.generate(num_return_sequences=3,
                               batch_size=16,
                               temperature=0.9)
   result = evaluator.evaluate(output)
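
For background, code-generation benchmarks such as HumanEval are usually scored
with pass@k: the probability that at least one of *k* samples, drawn from *n*
generations of which *c* pass the tests, is correct. The snippet below is a
standalone sketch of the unbiased estimator from the HumanEval paper
(Chen et al., 2021), not this library's implementation:

.. code-block:: python

   from math import comb

   def pass_at_k(n: int, c: int, k: int) -> float:
       """Unbiased pass@k: 1 - C(n - c, k) / C(n, k)."""
       if n - c < k:
           # Every size-k sample must contain a correct solution.
           return 1.0
       return 1.0 - comb(n - c, k) / comb(n, k)

   # 3 generations per problem, 1 correct -> pass@1 = 1/3
   print(pass_at_k(3, 1, 1))

With ``num_return_sequences=3`` as above, this is why sampling several
candidates per problem lets the evaluator report pass@1, pass@2, and pass@3
from a single generation run.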
CLI Usage
=========
Inference with Transformers
---------------------------
Load a model and generate answers with native ``transformers``, passing a local
path or a HuggingFace Hub name as ``--model_name``. Transformers is the default
backend, but you can pass ``--backend hf`` to make it explicit:

.. code-block:: console

   $ code-eval --model_name microsoft/phi-1 \
       --task humaneval \
       --batch_size 8 \
       --backend hf

.. tip::

   Load LoRA adapters by adding the ``--peft_model`` argument. ``--model_name``
   must still point to the full base model architecture.

   .. code-block:: console

      $ code-eval --model_name microsoft/phi-1 \
          --peft_model \
          --task humaneval \
          --batch_size 8 \
          --backend hf

Inference with vLLM engine
--------------------------
We recommend the vLLM engine for fast inference.
vLLM supports tensor parallelism, data parallelism, or a combination of both;
refer to the vLLM documentation for details.
To use ``code-eval`` with the vLLM engine, first `install it `_.
.. note::

   You can install vLLM using pip:

   .. code-block:: console

      $ pip install vllm

For models supported by vLLM (see `vLLM supported models `_), run:

.. code-block:: console

   $ code-eval --model_name microsoft/phi-1 \
       --task humaneval \
       --batch_size 8 \
       --backend vllm

.. tip::

   You can use LoRA adapters with the same syntax:

   .. code-block:: console

      $ code-eval --model_name microsoft/phi-1 \
          --peft_model \
          --task humaneval \
          --batch_size 8 \
          --backend vllm

Cite as
=======

.. code-block:: bibtex

   @misc{code-eval,
     author = {Dung Nguyen Manh},
     title = {A framework for easily evaluation code generation model},
     month = 3,
     year = 2024,
     publisher = {github},
     version = {v0.0.1},
     url = {https://github.com/FSoft-AI4Code/code-llm-evaluator}
   }