https://github.com/fsoft-ai4code/code-llm-evaluator
🏎️ Fast Code LLMs evaluator
- Host: GitHub
- URL: https://github.com/fsoft-ai4code/code-llm-evaluator
- Owner: FSoft-AI4Code
- License: mit
- Created: 2024-03-28T15:21:31.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-05-14T16:55:35.000Z (almost 2 years ago)
- Last Synced: 2025-09-05T13:17:45.136Z (7 months ago)
- Language: Python
- Homepage: https://code-llm-evaluator.readthedocs.io/en/latest/
- Size: 39.1 KB
- Stars: 3
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.rst
- Changelog: HISTORY.rst
- License: LICENSE
- Citation: CITATION.bib
=================
CodeLLM Evaluator
=================
Easy evaluation of CodeLLMs with fast inference settings
Overview
========
`CodeLLM Evaluator` provides fast and efficient evaluation of LLMs on code
generation tasks. Inspired by `lm-evaluation-harness `_ and `bigcode-eval-harness `_,
the framework is designed for multiple use cases and makes it easy to add new
metrics and custom tasks.
**Features:**

* Implemented HumanEval and MBPP benchmarks for coding LLMs.
* Support for models loaded via `transformers `_ and `DeepSpeed `_.
* Support for evaluating adapters (e.g. LoRA) from HuggingFace's `PEFT `_ library.
* Support for distributed inference with native transformers or fast inference with the `vLLM `_ backend.
* Easy support for custom prompts, tasks, and metrics.
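
A custom task typically bundles a prompt template, stop words, and a scoring
rule. The sketch below is only an illustration of that idea under assumed
names (``CustomTask``, ``build_prompt``, ``check``); it is **not** the
library's actual API, so consult ``code_eval.task`` for the real interface.

.. code-block:: python

   from dataclasses import dataclass, field
   from typing import List

   @dataclass
   class CustomTask:
       """Hypothetical sketch of a task definition; the real
       code_eval.task interface may differ."""
       name: str
       prompt_template: str
       stop_words: List[str] = field(default_factory=lambda: ["\nclass ", "\ndef "])

       def build_prompt(self, problem: dict) -> str:
           # Render one benchmark problem into a model prompt.
           return self.prompt_template.format(**problem)

       def check(self, generation: str, reference: str) -> bool:
           # Naive exact-match metric; real code tasks execute unit tests instead.
           return generation.strip() == reference.strip()

   task = CustomTask(name="toy-eval", prompt_template="# {instruction}\n")
   print(task.build_prompt({"instruction": "add two numbers"}))
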
Setup
=====
Install the `code-eval` package from the GitHub repository via `pip`:

.. code-block:: console

   $ git clone https://github.com/FSoft-AI4Code/code-llm-evaluator.git
   $ cd code-llm-evaluator
   $ pip install -e .

Quick-start
===========
To evaluate a supported task in Python, use :py:class:`code_eval.Evaluator` to
generate completions and compute evaluation metrics on the fly.

.. code-block:: python

   from code_eval import Evaluator
   from code_eval.task import HumanEval

   task = HumanEval()
   evaluator = Evaluator(task=task)

   output = evaluator.generate(num_return_sequences=3,
                               batch_size=16,
                               temperature=0.9)
   result = evaluator.evaluate(output)
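
For background, code-generation benchmarks such as HumanEval are usually scored
with pass@k: the probability that at least one of *k* samples, drawn from *n*
generations of which *c* pass the tests, is correct. The snippet below is a
standalone sketch of the unbiased estimator from the HumanEval paper
(Chen et al., 2021), not this library's implementation:

.. code-block:: python

   from math import comb

   def pass_at_k(n: int, c: int, k: int) -> float:
       """Unbiased pass@k: 1 - C(n - c, k) / C(n, k)."""
       if n - c < k:
           # Every size-k sample must contain a correct solution.
           return 1.0
       return 1.0 - comb(n - c, k) / comb(n, k)

   # 3 generations per problem, 1 correct -> pass@1 = 1/3
   print(pass_at_k(3, 1, 1))

With ``num_return_sequences=3`` as above, this is why sampling several
candidates per problem lets the evaluator report pass@1, pass@2, and pass@3
from a single generation run.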
CLI Usage
=========
Inference with Transformers
---------------------------
Load a model and generate answers with native ``transformers``, passing a local
path or a HuggingFace Hub name as ``--model_name``. Transformers is the default
backend, but you can pass ``--backend hf`` to make it explicit:

.. code-block:: console

   $ code-eval --model_name microsoft/phi-1 \
       --task humaneval \
       --batch_size 8 \
       --backend hf

.. tip::

   Load LoRA adapters by adding the ``--peft_model`` argument. ``--model_name``
   must still point to the full base model architecture.

   .. code-block:: console

      $ code-eval --model_name microsoft/phi-1 \
          --peft_model \
          --task humaneval \
          --batch_size 8 \
          --backend hf

Inference with vLLM engine
--------------------------
We recommend the vLLM engine for fast inference.
vLLM supports tensor parallelism, data parallelism, or a combination of both;
refer to the vLLM documentation for details.
To use ``code-eval`` with the vLLM engine, first `install it `_.
.. note::

   You can install vLLM using pip:

   .. code-block:: console

      $ pip install vllm

For models supported by vLLM (see `vLLM supported models `_), run:

.. code-block:: console

   $ code-eval --model_name microsoft/phi-1 \
       --task humaneval \
       --batch_size 8 \
       --backend vllm

.. tip::

   You can use LoRA adapters with the same syntax:

   .. code-block:: console

      $ code-eval --model_name microsoft/phi-1 \
          --peft_model \
          --task humaneval \
          --batch_size 8 \
          --backend vllm

Cite as
=======

.. code-block:: bibtex

   @misc{code-eval,
     author = {Dung Nguyen Manh},
     title = {A framework for easily evaluation code generation model},
     month = 3,
     year = 2024,
     publisher = {github},
     version = {v0.0.1},
     url = {https://github.com/FSoft-AI4Code/code-llm-evaluator}
   }