{"id":13653137,"url":"https://github.com/MCEVAL/McEval","last_synced_at":"2025-04-23T06:31:18.948Z","repository":{"id":243046337,"uuid":"811287610","full_name":"MCEVAL/McEval","owner":"MCEVAL","description":null,"archived":false,"fork":false,"pushed_at":"2024-12-12T09:34:58.000Z","size":18685,"stargazers_count":29,"open_issues_count":3,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-12-12T10:27:43.890Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MCEVAL.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE/CODE-LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-06T09:58:42.000Z","updated_at":"2024-12-12T09:35:02.000Z","dependencies_parsed_at":"2024-11-10T04:40:44.654Z","dependency_job_id":null,"html_url":"https://github.com/MCEVAL/McEval","commit_stats":null,"previous_names":["mceval/mceval"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MCEVAL%2FMcEval","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MCEVAL%2FMcEval/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MCEVAL%2FMcEval/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MCEVAL%2FMcEval/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MCEVAL","download_url":"https://codeload.github.com/MCEVAL/McEval/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"githu
b","repositories_count":250384893,"owners_count":21421813,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T02:01:06.342Z","updated_at":"2025-04-23T06:31:13.928Z","avatar_url":"https://github.com/MCEVAL.png","language":"Python","readme":"\u003c!-- # MCEVAL: Massively Multilingual Code Evaluation --\u003e\n\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://mceval.github.io/\"\u003e\n    \u003cimg src=\"assets/icon.png\" width=\"25%\" alt=\"McEval\" /\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\u003chr\u003e\n\n\u003cdiv align=\"center\" style=\"line-height: 1;\"\u003e\n  \u003ca href=\"\" style=\"margin: 2px;\"\u003e\n    \u003cimg alt=\"Code License\" src=\"https://img.shields.io/badge/Code_License-MIT-f5de53%3F?color=green\" style=\"display: inline-block; vertical-align: middle;\"/\u003e\n  \u003c/a\u003e\n  \u003ca href=\"\" style=\"margin: 2px;\"\u003e\n    \u003cimg alt=\"Data License\" src=\"https://img.shields.io/badge/Data_License-CC--BY--SA--4.0-f5de53%3F?color=blue\" style=\"display: inline-block; vertical-align: middle;\"/\u003e\n  \u003c/a\u003e\n  \u003c!-- \u003ca href=\"\" style=\"margin: 2px;\"\u003e\n    \u003cimg alt=\"Data License\" src=\"https://img.shields.io/badge/Model_License-Model_Agreement-f5de53?\u0026color=f5de53\" style=\"display: inline-block; vertical-align: middle;\"/\u003e\n  \u003c/a\u003e --\u003e\n\n\u003c/div\u003e\n\n\n# McEval: Massively Multilingual Code Evaluation\nOfficial repository for our paper \"McEval: Massively Multilingual Code Evaluation\"\n\n\n\u003cp align=\"left\"\u003e\n    \u003ca 
href=\"https://mceval.github.io/\"\u003e🏠 Home Page \u003c/a\u003e •\n    \u003ca href=\"https://huggingface.co/datasets/Multilingual-Multimodal-NLP/McEval\"\u003e📊 Benchmark Data \u003c/a\u003e •\n    \u003ca href=\"https://huggingface.co/datasets/Multilingual-Multimodal-NLP/McEval-Instruct\"\u003e📚 Instruct Data \u003c/a\u003e •\n    \u003ca href=\"https://mceval.github.io/leaderboard.html\"\u003e🏆 Leaderboard \u003c/a\u003e\n\u003c/p\u003e\n\n\n## Table of contents\n- [McEval: Massively Multilingual Code Evaluation](#mceval-massively-multilingual-code-evaluation)\n  - [📌 Introduction](#introduction)\n  - [🏆 Leaderboard](#leaderboard)\n  - [📋 Task](#task)\n  - [📚 Data](#data)\n  - [💻 Usage](#usage)\n  - [📖 Citation](#citation)\n\n\n## Introduction\n**McEval** is a massively multilingual code benchmark covering **40** programming languages with **16K** test samples, substantially pushing the limits of code LLMs in multilingual scenarios.\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"assets/intro.png\" width=\"50%\" alt=\"McEval\" /\u003e\n\u003c/p\u003e\n\n\n### Task Examples\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"assets/bench_cases.png\" width=\"80%\" alt=\"McEval\" /\u003e\n\u003c/p\u003e\n\n\u003c!-- ### Languages\n`['AWK','C','CPP','C#','CommonLisp','CoffeeScript','Dart','EmacsLisp','Elixir','Erlang','Fortran','F#','Go','Groovy','Haskell','HTML','Java','JavaScript','JSON','Julia','Kotlin','Lua','Markdown','Pascal','Perl','PHP','PowerShell','Python','R','Racket','Ruby','Rust','Scala','Scheme','Shell','Swift','Tcl','TypeScript','VisualBasic','VimScript']` --\u003e\n\nFurthermore, we curate a massively multilingual instruction corpus, **McEval-Instruct**.\n\nRefer to our paper for more details. 
\n\n## Results \n\u003cp align=\"center\"\u003e\n\u003cimg src=\"assets/gen_result.png\" width=\"100%\" alt=\"McEval\" /\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"assets/cate_result.png\" width=\"100%\" alt=\"McEval\" /\u003e\n\u003c/p\u003e\n\n\nRefer to our \u003ca href=\"https://mceval.github.io/leaderboard.html\"\u003e🏆 Leaderboard \u003c/a\u003e for more results.\n\n\n## Data\n\u003cdiv align=\"center\"\u003e\n\n| **Dataset** | **Download** |\n| :------------: | :------------: |\n| McEval Evaluation Dataset | [🤗 HuggingFace](https://huggingface.co/datasets/Multilingual-Multimodal-NLP/McEval) |\n| McEval-Instruct | [🤗 HuggingFace](https://huggingface.co/datasets/Multilingual-Multimodal-NLP/McEval-Instruct) |\n\n\u003c/div\u003e\n\n\n## Usage\n\n\n### Environment\n\nRuntime environments for the different programming languages can be found in [Environments](asserts/eval_env.png).\n\nWe recommend using Docker for evaluation; we have created a Docker image with all the necessary environments pre-installed.\n\n\u003c!-- Docker images will be released soon. 
--\u003e\nPull the image directly from Docker Hub or the Aliyun Docker Hub:\n\n\n```bash\n# Docker Hub:\ndocker pull multilingualnlp/mceval\n\n# Aliyun Docker Hub:\ndocker pull registry.cn-hangzhou.aliyuncs.com/mceval/mceval:v1\n\ndocker run -it -d --restart=always --name mceval_dev --workdir / \u003cimage-name\u003e /bin/bash\ndocker attach mceval_dev\n```\n\n### Inference\nWe provide model inference code, including torch and vLLM implementations.\n\n#### Inference with torch\nTake the evaluation generation task as an example.\n```bash\ncd inference\nbash scripts/inference_torch.sh\n```\n\n#### Inference with vLLM (recommended)\nTake the evaluation generation task as an example.\n```bash\ncd inference\nbash scripts/run_generation_vllm.sh\n```\n\n### Evaluation\n\n#### Data Format\n**🛎️ Please prepare the model's inference results in the following format and use them for the next evaluation step.**\n\n(1) Folder Structure\nPlace the data in the following folder structure; each file contains the test results for one language.\n```bash\nevaluate_model_name\n  - CPP.jsonl\n  - Python.jsonl\n  - Java.jsonl\n  ...\n```\nYou can use the script [split_result.py](inference/split_result.py) to split the inference results. 
\n```bash\npython split_result.py --split_file \u003cinference_result\u003e --save_dir \u003csave_dir\u003e\n```\n\n(2) File Format\nEach line in a language's result file has the following format; the *raw_generation* field holds the generated code.\nMore examples can be found in [Evaluate Data Format Examples](examples/evaluate/).\n```json\n{\n    \"task_id\": \"Lang/1\",\n    \"prompt\": \"\",\n    \"canonical_solution\": \"\",\n    \"test\": \"\",\n    \"entry_point\": \"\",\n    \"signature\": \"\",\n    \"docstring\": \"\",\n    \"instruction\": \"\",\n    \"raw_generation\": [\"\u003cGenerated Code\u003e\"]\n}\n```\n\n\n#### Evaluate Generation Task\nTake the evaluation generation task as an example.\n```bash\ncd eval\nbash scripts/eval_generation.sh\n```\n\n## Mcoder\nWe have open-sourced the training code for [Mcoder](Mcoder/), with [CodeQwen1.5](https://github.com/QwenLM/CodeQwen1.5) and [DeepSeek-Coder](https://github.com/deepseek-ai/deepseek-coder) as base models.\n\nWe will make the Mcoder model weights available for download soon.\n\n\n## More Examples\nMore examples can be found in [Examples](docs/Examples.md).\n\n## License\nThis code repository is licensed under the [MIT License](LICENSE-CODE). 
The use of McEval data is subject to the [CC-BY-SA-4.0 license](LICENSE-DATA).\n\n## Citation\nIf you find our work helpful, please use the following citation.\n```bibtex\n@article{mceval,\n  title={McEval: Massively Multilingual Code Evaluation},\n  author={Chai, Linzheng and Liu, Shukai and Yang, Jian and Yin, Yuwei and Jin, Ke and Liu, Jiaheng and Sun, Tao and Zhang, Ge and Ren, Changyu and Guo, Hongcheng and others},\n  journal={arXiv e-prints},\n  pages={arXiv--2406},\n  year={2024}\n}\n```\n\n\n\u003c!-- ## Contact  --\u003e\n\n\n\n","funding_links":[],"categories":["Datasets-or-Benchmark","A01_Text Generation_Text Dialogue"],"sub_categories":["Code Capability","Large Language Dialogue Models and Data"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMCEVAL%2FMcEval","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FMCEVAL%2FMcEval","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMCEVAL%2FMcEval/lists"}