Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/MCEVAL/McEval
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/MCEVAL/McEval
- Owner: MCEVAL
- Created: 2024-06-06T09:58:42.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-06-24T13:48:46.000Z (7 months ago)
- Last Synced: 2024-08-03T02:09:35.780Z (5 months ago)
- Language: Python
- Size: 17.8 MB
- Stars: 16
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE/CODE-LICENSE
Awesome Lists containing this project
- awesome-llm-eval - McEval - Compared with GPT-4, open-source models still show a sizable gap in multilingual programming ability, and most of them cannot even surpass GPT-3.5; the tests also show that open-source models such as Codestral, DeepSeek-Coder, CodeQwen, and several derived models exhibit strong multilingual capability. McEval is a massively multilingual code benchmark covering 40 programming languages with 16K test samples, which substantially pushes the limits of code LLMs in multilingual scenarios. The benchmark contains challenging code completion, understanding, and generation evaluation tasks with the finely curated massively multilingual instruction corpora McEval-Instruct. The McEval leaderboard can be found [here](https://mceval.github.io/). (2024-06-11) | (Datasets-or-Benchmark / Code Capability)
README
# McEval: Massively Multilingual Code Evaluation
Official repository for our paper "McEval: Massively Multilingual Code Evaluation"
🏠 Home Page •
📊 Benchmark Data •
📚 Instruct Data •
🏆 Leaderboard

## Table of contents
- [McEval: Massively Multilingual Code Evaluation](#Introduction)
- [📌 Introduction](#introduction)
- [🏆 Leaderboard](#leaderboard)
- [📋 Task](#task)
- [📚 Data](#data)
- [💻 Usage](#usage)
- [📖 Citation](#citation)

## Introduction
**McEval** is a massively multilingual code benchmark covering **40** programming languages with **16K** test samples, which substantially pushes the limits of code LLMs in multilingual scenarios.
### Task Examples
Furthermore, we curate massively multilingual instruction corpora **McEval-Instruct**.
Refer to our paper for more details.
## Results
Refer to our 🏆 Leaderboard for more results.
## Data
| **Dataset** | **Download** |
| :------------: | :------------: |
| McEval Evaluation Dataset | [🤗 HuggingFace](https://huggingface.co/datasets/Multilingual-Multimodal-NLP/McEval) |
| McEval-Instruct | [🤗 HuggingFace](https://huggingface.co/datasets/Multilingual-Multimodal-NLP/McEval-Instruct) |
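Both datasets can also be loaded programmatically. The snippet below is a minimal sketch using the Hugging Face `datasets` library; the repository IDs come from the table above, while the available configs and splits are assumptions that should be checked against the dataset cards.

```python
# pip install datasets
from datasets import load_dataset

# Repository IDs are taken from the table above; available configs/splits
# should be verified on the Hugging Face dataset cards.
mceval = load_dataset("Multilingual-Multimodal-NLP/McEval")
mceval_instruct = load_dataset("Multilingual-Multimodal-NLP/McEval-Instruct")

print(mceval)            # inspect the available splits and features
print(mceval_instruct)
```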
## Usage

### Environment
Runtime environments for the different programming languages can be found in [Environments](asserts/eval_env.png).
We recommend using Docker for evaluation; we have created a Docker image with all the necessary environments pre-installed.
Pull the image directly from Docker Hub or the Aliyun registry:
```bash
# Docker hub:
docker pull multilingualnlp/mceval

# Aliyun docker hub:
docker pull registry.cn-hangzhou.aliyuncs.com/mceval/mceval:v1

# Start a container from whichever image you pulled (image name here is an example):
docker run -it -d --restart=always --name mceval_dev --workdir / multilingualnlp/mceval /bin/bash
docker attach mceval_dev
```

### Inference
We provide model inference code with both torch and vLLM implementations.

#### Inference with torch
Taking the generation task as an example:
```bash
cd inference
bash scripts/inference_torch.sh
```

#### Inference with vLLM (recommended)
Taking the generation task as an example:
```bash
cd inference
bash scripts/run_generation_vllm.sh
```
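For reference, a generation loop with vLLM might look like the sketch below. It is only an illustration of the idea behind the provided scripts, not the scripts themselves; the model name, file paths, prompt field, and sampling settings are assumptions.

```python
import json

from vllm import LLM, SamplingParams

# Model name, file paths, prompt field, and sampling settings are assumptions.
llm = LLM(model="Qwen/CodeQwen1.5-7B-Chat")
params = SamplingParams(temperature=0.0, max_tokens=1024)

with open("data/Python.jsonl", encoding="utf-8") as f:
    records = [json.loads(line) for line in f]

# Generate one completion per task and store it in the raw_generation field
# expected by the evaluation step (see the data format below).
outputs = llm.generate([r["instruction"] for r in records], params)

with open("results/Python.jsonl", "w", encoding="utf-8") as f:
    for record, output in zip(records, outputs):
        record["raw_generation"] = [output.outputs[0].text]
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```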
### Evaluation

#### Data Format
**🛎️ Please prepare the model's inference results in the following format and use them for the evaluation step.**

(1) Folder Structure

Place the data in the following folder structure, where each file contains the test results for one language.
```bash
\evaluate_model_name
- CPP.jsonl
- Python.jsonl
- Java.jsonl
...
```
You can use the script [split_result.py](inference/split_result.py) to split the inference results by language.
```bash
python split_result.py --split_file <inference_result_file> --save_dir <save_dir>
```

(2) File Format
Each line in a language's result file has the following format; the *raw_generation* field holds the generated code.
More examples can be found in [Evaluate Data Format Examples](examples/evaluate/).
```json
{
    "task_id": "Lang/1",
    "prompt": "",
    "canonical_solution": "",
    "test": "",
    "entry_point": "",
    "signature": "",
    "docstring": "",
    "instruction": "",
    "raw_generation": [""]
}
```
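Before running the evaluation, it can help to sanity-check the result files. The snippet below is an optional sketch (not part of the repository) that verifies each line is valid JSON and carries the fields listed above.

```python
import json

REQUIRED_FIELDS = {
    "task_id", "prompt", "canonical_solution", "test", "entry_point",
    "signature", "docstring", "instruction", "raw_generation",
}

def check_results_file(path: str) -> None:
    """Verify every line is valid JSON and contains the required fields."""
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            record = json.loads(line)
            missing = REQUIRED_FIELDS - record.keys()
            assert not missing, f"{path}:{lineno} is missing fields: {missing}"
            assert isinstance(record["raw_generation"], list), (
                f"{path}:{lineno}: raw_generation must be a list of strings"
            )

check_results_file("evaluate_model_name/Python.jsonl")
```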
#### Evaluate Generation Task

Taking the generation task as an example:
```bash
cd eval
bash scripts/eval_generation.sh
```
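Execution-based code benchmarks of this kind typically report pass@k. As background only (this is not the repository's evaluation code), the standard unbiased estimator from Chen et al. (2021) can be written as:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n generated samples per task, c of which
    pass all tests. Returns the estimated probability that at least one of
    k samples is correct (Chen et al., 2021)."""
    if n - c < k:
        return 1.0
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Example: 10 samples generated, 3 of them passing -> pass@1 estimate
print(pass_at_k(n=10, c=3, k=1))  # 0.3
```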
## Mcoder

We have open-sourced the training code for [Mcoder](Mcoder/), with [CodeQwen1.5](https://github.com/QwenLM/CodeQwen1.5) and [DeepSeek-Coder](https://github.com/deepseek-ai/deepseek-coder) as base models. We will make the Mcoder model weights available for download soon.
## More Examples
More examples can be found in [Examples](docs/Examples.md).

## License
This code repository is licensed under the [MIT License](LICENSE-CODE). The use of McEval data is subject to [CC-BY-SA-4.0](LICENSE-DATA).

## Citation
If you find our work helpful, please use the following citation.
```bibtex
@article{mceval,
title={McEval: Massively Multilingual Code Evaluation},
author={Chai, Linzheng and Liu, Shukai and Yang, Jian and Yin, Yuwei and Jin, Ke and Liu, Jiaheng and Sun, Tao and Zhang, Ge and Ren, Changyu and Guo, Hongcheng and others},
journal={arXiv e-prints},
pages={arXiv--2406},
year={2024}
}
```