{"id":15037370,"url":"https://github.com/thudm/codegeex","last_synced_at":"2025-05-14T02:04:55.880Z","repository":{"id":59844493,"uuid":"537827151","full_name":"THUDM/CodeGeeX","owner":"THUDM","description":"CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)","archived":false,"fork":false,"pushed_at":"2024-08-13T05:59:38.000Z","size":14413,"stargazers_count":8447,"open_issues_count":168,"forks_count":630,"subscribers_count":90,"default_branch":"main","last_synced_at":"2025-04-08T22:19:24.818Z","etag":null,"topics":["code-generation","pretrained-models","tools"],"latest_commit_sha":null,"homepage":"https://codegeex.cn","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/THUDM.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-09-17T14:06:29.000Z","updated_at":"2025-04-08T16:55:00.000Z","dependencies_parsed_at":"2024-10-30T22:30:24.974Z","dependency_job_id":"7c278f56-1ac3-4991-8082-50519c746b89","html_url":"https://github.com/THUDM/CodeGeeX","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/THUDM%2FCodeGeeX","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/THUDM%2FCodeGeeX/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/THUDM%2FCodeGeeX/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/THUDM%2FCodeGeeX/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/THUDM"
,"download_url":"https://codeload.github.com/THUDM/CodeGeeX/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254052692,"owners_count":22006716,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["code-generation","pretrained-models","tools"],"created_at":"2024-09-24T20:34:27.805Z","updated_at":"2025-05-14T02:04:55.860Z","avatar_url":"https://github.com/THUDM.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cimg src=\"resources/logo/codegeex_logo.png\"\u003e\n\n\u003cp align=\"center\"\u003e\n    🏠 \u003ca href=\"https://codegeex.cn\" target=\"_blank\"\u003eHomepage\u003c/a\u003e | 📖 \u003ca href=\"https://models.aminer.cn/codegeex/blog/\" target=\"_blank\"\u003eBlog\u003c/a\u003e | 🪧 \u003ca href=\"https://models.aminer.cn/codegeex/playground\" target=\"_blank\"\u003eDEMO\u003c/a\u003e | 🤖 \u003ca href=\"https://codegeex.cn/download/request\" target=\"_blank\"\u003eDownload Model\u003c/a\u003e | 📄 \u003ca href=\"https://arxiv.org/abs/2303.17568\" target=\"_blank\"\u003ePaper\u003c/a\u003e | 🌐 \u003ca href=\"README_zh.md\" target=\"_blank\"\u003e中文\u003c/a\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n    🛠 \u003ca href=\"https://marketplace.visualstudio.com/items?itemName=aminer.codegeex\" target=\"_blank\"\u003eVS Code\u003c/a\u003e, \u003ca href=\"https://plugins.jetbrains.com/plugin/20587-codegeex\" target=\"_blank\"\u003eJetbrains\u003c/a\u003e, \u003ca href=\"https://plugins.jetbrains.com/plugin/20587-codegeex\" target=\"_blank\"\u003eCloud 
Studio\u003c/a\u003e supported | 👋 Join our \u003ca href=\"https://discord.gg/8gjHdkmAN6\" target=\"_blank\"\u003eDiscord\u003c/a\u003e, \u003ca href=\"https://join.slack.com/t/codegeexworkspace/shared_invite/zt-1s118ffrp-mpKKhQD0tKBmzNZVCyEZLw\" target=\"_blank\"\u003eSlack\u003c/a\u003e, \u003ca href=\"https://t.me/+IipIayJ32B1jOTg1\" target=\"_blank\"\u003eTelegram\u003c/a\u003e, \u003ca href=\"resources/zh/wechat.md\"target=\"_blank\"\u003eWeChat\u003c/a\u003e\n\u003c/p\u003e\n\n\n\n🌟 The newest [CodeGeeX4](https://github.com/THUDM/CodeGeeX4) has been released. | 最新一代 [CodeGeeX4](https://github.com/THUDM/CodeGeeX4) 模型已经正式开源。\n\n- [CodeGeeX: A Multilingual Code Generation Model](#codegeex-a-multilingual-code-generation-model)\n  - [News](#news)\n  - [Getting Started](#getting-started)\n    - [Installation](#installation)\n    - [Model Weights](#model-weights)\n    - [Inference on GPUs](#inference-on-gpus)\n    - [VS Code and Jetbrains Extension Guidance](#vs-code-and-jetbrains-extension-guidance)\n  - [CodeGeeX: Architecture, Code Corpus, and Implementation](#codegeex-architecture-code-corpus-and-implementation)\n  - [HumanEval-X: A new benchmark for Multilingual Program Synthesis](#humaneval-x-a-new-benchmark-for-multilingual-program-synthesis)\n    - [Multilingual Code Generation](#multilingual-code-generation)\n    - [Crosslingual Code Translation](#crosslingual-code-translation)\n    - [How to use HumanEval-X and contribute to it?](#how-to-use-humaneval-x-and-contribute-to-it)\n  - [License](#license)\n  - [Citation](#citation)\n\n# CodeGeeX: A Multilingual Code Generation Model\n\nWe introduce CodeGeeX, a large-scale multilingual code generation model with 13 billion parameters, pre-trained on a large code corpus of more than 20 programming languages. As of **June 22**, 2022, CodeGeeX has been trained on more than 850 billion tokens on a cluster of 1,536 [Ascend 910 AI Processors](https://e.huawei.com/en/products/servers/ascend). 
CodeGeeX has several unique features:\n* **Multilingual Code Generation**: CodeGeeX has good performance for generating executable programs in several mainstream programming languages, including Python, C++, Java, JavaScript, Go, etc. [DEMO](https://models.aminer.cn/codegeex)\n* **Crosslingual Code Translation**: CodeGeeX supports the translation of code snippets between different languages. Simply by one click, CodeGeeX can transform a program into any expected language with a high accuracy. [DEMO](https://models.aminer.cn/codegeex/codeTranslator)\n* **Customizable Programming Assistant**: CodeGeeX is available in the VS Code extension marketplace **for free**. It supports code completion, explanation, summarization and more, which empower users with a better coding experience. [VS Code Extension](https://marketplace.visualstudio.com/items?itemName=aminer.codegeex)\n* **Open-Source and Cross-Platform**: All codes and model weights are publicly available for research purposes. CodeGeeX supports both Ascend and NVIDIA platforms. It supports inference in a single Ascend 910, NVIDIA V100 or A100. [Apply Model Weights](https://models.aminer.cn/codegeex/download/request)\n\n**HumanEval-X for Realistic Multilingual Benchmarking.** To help standardize the evaluation of multilingual code generation and translation, we develop and release the **HumanEval-X** Benchmark. HumanEval-X is a new multilingual benchmark that contains **820 human-crafted** coding problems in **5** programming languages (Python, C++, Java, JavaScript, and Go), each of these problems is associated with tests and solutions. 
[Usage](codegeex/benchmark/README.md)  [🤗 Available in HuggingFace](https://huggingface.co/datasets/THUDM/humaneval-x)\n\n\u003cimg src=\"resources/en/hx_boxplot.png\"\u003e\n\n\u003cp align=\"center\"\u003e\u003ci\u003eCodeGeeX achieves the highest average performance compared with other open-sourced multilingual baselines.\u003c/i\u003e \u003c/p\u003e\n\n## News\n\n* 🌟 **2023-07-24**: [CodeGeeX2](https://github.com/THUDM/CodeGeeX2) has been released. It is more powerful, faster, and more lightweight, and it supports 100+ languages and many new features.\n\n* **2023-05-16**: The CodeGeeX paper has been accepted by [KDD 2023, Long Beach](https://kdd.org/kdd2023/) and will be presented during the conference.\n\n* **2023-03-30**: The CodeGeeX paper is now available on [arXiv](https://arxiv.org/abs/2303.17568).\n\n* **2023-02-14**: CodeGeeX now supports [Cloud Studio](https://cloudstudio.net/), a fantastic web IDE from Tencent. Click on the badge on top of this page to quickly launch an environment to test CodeGeeX.\n\n* **2023-02-13**: Thanks a lot to the [OneFlow](https://github.com/Oneflow-Inc/oneflow) team for adding a OneFlow backend for CodeGeeX's inference (even faster than FasterTransformer under FP16!). Check more details [here](https://github.com/THUDM/CodeGeeX/pull/65).\n\n* **2023-02**: We are hosting the [CodeGeeX \"Coding With AI\" Hackathon](https://dorahacks.io/hackathon/codegeex/). Design cool applications based on CodeGeeX and win prizes (RTX 4090, DJI drone, etc)!\n\n* **2022-12-31**: We release the FasterTransformer version of CodeGeeX in [codegeex-fastertransformer](https://github.com/CodeGeeX/codegeex-fastertransformer). The INT8 accelerated version reaches an average speed of \u003c15ms/token. Happy new year to everyone!\n\n* **2022-12-13**: We release the source code of the CodeGeeX VS Code extension in [codegeex-vscode-extension](https://github.com/CodeGeeX/codegeex-vscode-extension). 
Follow [QuickStart](https://github.com/CodeGeeX/codegeex-vscode-extension/blob/main/doc/quickstart.md) to start development.\n\n* **2022-12-11**: CodeGeeX is now available for Jetbrains IDEs (IntelliJ IDEA, PyCharm, GoLand, CLion, etc). Download it [here](https://plugins.jetbrains.com/plugin/20587-codegeex).\n\n* **2022-12-04**: We release the source code of quantization (requires less GPU RAM: 27GB -\u003e 15GB) and model parallelism (possible to run on multiple GPUs with \u003c8GB RAM).\n \n* **2022-09-30**: We release the cross-platform source code and model weights for both Ascend and NVIDIA platforms.\n\n## Getting Started\n\nCodeGeeX was initially implemented in Mindspore and trained on Ascend 910 AI Processors. We provide a torch-compatible version based on [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) to facilitate usage on GPU platforms.\n### Installation\n\nPython 3.7+ / CUDA 11+ / PyTorch 1.10+ / DeepSpeed 0.6+ are required. Install the ``codegeex`` package via: \n```bash\ngit clone git@github.com:THUDM/CodeGeeX.git\ncd CodeGeeX\npip install -e .\n```\nOr use [CodeGeeX docker](https://hub.docker.com/r/codegeex/codegeex) to quickly set up the environment (with [nvidia-docker](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker) installed):\n```bash\ndocker pull codegeex/codegeex:latest\n# To enable GPU support, specify device ids with --gpus\ndocker run --gpus '\"device=0,1\"' -it --ipc=host --name=codegeex codegeex/codegeex\n```\n\n### Model Weights\n\nApply for and download the model weights through this [link](https://models.aminer.cn/codegeex/download/request). You will receive by email a ```urls.txt``` file that contains temporary download links. 
We recommend using [aria2](https://aria2.github.io/) to download it via the following command (please make sure you have enough disk space to download the checkpoint (~26GB)):\n```bash\naria2c -x 16 -s 16 -j 4 --continue=true -i urls.txt \n```\nRun the following command to get the full model weights:\n```bash\ncat codegeex_13b.tar.gz.* \u003e codegeex_13b.tar.gz\ntar xvf codegeex_13b.tar.gz\n```\n\n### Inference on GPUs\n\nTry generating your first program with CodeGeeX. First, specify the path of the model weights in ``configs/codegeex_13b.sh``. Second, write the prompt (natural language description or code snippet) into a file, e.g., ``tests/test_prompt.txt``, then run the following script:\n```bash\n# On a single GPU (with more than 27GB RAM)\nbash ./scripts/test_inference.sh \u003cGPU_ID\u003e ./tests/test_prompt.txt\n\n# With quantization (with more than 15GB RAM)\nbash ./scripts/test_inference_quantized.sh \u003cGPU_ID\u003e ./tests/test_prompt.txt\n\n# On multiple GPUs (with more than 6GB RAM, need to first convert ckpt to MP_SIZE partitions)\nbash ./scripts/convert_ckpt_parallel.sh \u003cLOAD_CKPT_PATH\u003e \u003cSAVE_CKPT_PATH\u003e \u003cMP_SIZE\u003e\nbash ./scripts/test_inference_parallel.sh \u003cMP_SIZE\u003e ./tests/test_prompt.txt\n```\n\n### VS Code and Jetbrains Extension Guidance\n\nBased on CodeGeeX, we also develop free extensions for VS Code and Jetbrains IDEs, with more to come in the future. \n\nFor VS Code, search \"codegeex\" in Marketplace or install it [here](https://marketplace.visualstudio.com/items?itemName=aminer.codegeex). Detailed instructions can be found in \n[VS Code Extension Guidance](vscode-extension/README.md). 
For developers, we have also released the source code in [codegeex-vscode-extension](https://github.com/CodeGeeX/codegeex-vscode-extension), please follow [QuickStart](https://github.com/CodeGeeX/codegeex-vscode-extension/blob/main/doc/quickstart.md) to start development.\n\nFor Jetbrains IDEs, search \"codegeex\" in Plugins or install it [here](https://plugins.jetbrains.com/plugin/20587-codegeex). \nMake sure your IDE version is 2021.1 or later. CodeGeeX now supports IntelliJ IDEA, PyCharm, GoLand, CLion, Android Studio, AppCode, Aqua, DataSpell, DataGrip, Rider, RubyMine, and WebStorm. \n\n## CodeGeeX: Architecture, Code Corpus, and Implementation\n\n**Architecture**: CodeGeeX is a large-scale pre-trained programming language model based on transformers. It is a left-to-right autoregressive decoder, which takes code and natural language as input and predicts the probability of the next token. CodeGeeX contains 40 transformer layers with a hidden size of 5,120 for self-attention blocks and 20,480 for feed-forward layers, making its size reach 13 billion parameters. It supports a maximum sequence length of 2,048.\n\n\u003cimg src=\"resources/en/codegeex_training.png\"\u003e\n\u003cp align=\"center\"\u003e\u003ci\u003e\u003cb\u003eLeft:\u003c/b\u003e the proportion of programming languages in CodeGeeX's training data. \n  \u003cb\u003eRight:\u003c/b\u003e the plot of training loss against the training steps of CodeGeeX.\u003c/i\u003e\u003c/p\u003e\n\n**Code Corpus**: Our training data contains two parts. The first part is from open-sourced code datasets, [The Pile](https://pile.eleuther.ai/) and [CodeParrot](https://github.com/huggingface/transformers/tree/main/examples/research_projects/codeparrot). The Pile contains a subset of code corpus that collects public repositories with more than 100 stars from GitHub, from which we select codes in 23 popular programming languages. 
The second part is supplementary data scraped directly from public GitHub repositories that do not appear in the previous datasets, covering Python, Java and C++. To obtain data of potentially higher quality, we choose repositories with at least one star and a size smaller than 10MB. A file is filtered out if it 1) has more than 100 characters per line on average, 2) is automatically generated, 3) has a ratio of alphabetic characters below 40%, or 4) is bigger than 100KB or smaller than 1KB. To help the model distinguish different languages, we add a language-specific prefix at the beginning of each segment in the form of ``[Comment sign] language: [LANG]``, e.g., ``# language: Python``. For tokenization, we use the same tokenizer as GPT-2 and process whitespaces as extra tokens, resulting in a vocabulary of 50,400 tokens. In total, the code corpus has 23 programming languages with 158.7B tokens.\n\n**Training**: We implement CodeGeeX in [Mindspore 1.7](https://www.mindspore.cn/) and train it on 1,536 Ascend 910 AI Processors (32GB). The model weights are stored in FP16 format, except that we use FP32 for layer-norm and softmax for higher precision and stability. The entire model consumes about 27GB of memory. To increase the training efficiency, we adopt 8-way model-parallel training together with 192-way data-parallel training, with the ZeRO-2 optimizer enabled. The micro-batch size is 16 and the global batch size reaches 3,072. Moreover, we adopt techniques to further boost the training efficiency, including element-wise operator fusion, fast GELU activation, matrix multiplication dimension optimization, etc. The entire training process took nearly two months, spanning from April 18 to June 22, 2022, during which 850B tokens were passed for training, i.e., 5+ epochs.\n\n## HumanEval-X: A new benchmark for Multilingual Program Synthesis\nTo better evaluate the multilingual ability of code generation models, we propose a new benchmark, HumanEval-X. 
While previous works evaluate multilingual program synthesis under semantic similarity (e.g., [CodeBLEU](https://arxiv.org/abs/2009.10297)) which is often misleading, HumanEval-X evaluates the functional correctness of the generated programs. HumanEval-X consists of 820 high-quality human-crafted data samples (each with test cases) in Python, C++, Java, JavaScript, and Go, and can be used for various tasks.\n\n\u003cimg src=\"resources/en/hx_tasks.png\"\u003e\n\n\u003cp align=\"center\"\u003e\u003ci\u003eAn illustration of tasks supported by \u003cb\u003eHumanEval-X\u003c/b\u003e. Declarations, docstrings, and solutions are marked with red, green, and blue respectively. \u003cb\u003eCode generation\u003c/b\u003e uses declaration and docstring as input, to generate solution. \u003cb\u003eCode translation\u003c/b\u003e uses declaration in both languages and translate the solution in source language to the one in target language.\u003c/i\u003e\u003c/p\u003e\n\nIn HumanEval-X, every sample in each language contains declaration, docstring, and solution, which can be combined in various ways to support different downstream tasks including generation, translation, summarization, etc. We currently focus on two tasks: **code generation** and **code translation**. For code generation, the model uses declaration and docstring as input to generate the solution. For code translation, the model uses declarations in both languages and the solution in the source language as input, to generate solutions in the target language. We remove the description during code translation to prevent the model from directly solving the problem. 
For both tasks, we use the unbiased pass@k metric proposed in [Codex](https://arxiv.org/abs/2107.03374): $\\text{pass}@k:= \\mathbb{E}[1-\\frac{\\tbinom{n-c}{k}}{\\tbinom{n}{k}}]$, with $n=200$ and $k\\in(1,10,100)$.\n\n### Multilingual Code Generation\n\n\u003cimg src=\"resources/en/hx_generattion_radar_horizon.png\"\u003e\n\u003cp align=\"center\"\u003e\u003ci\u003e\u003cb\u003eLeft\u003c/b\u003e: the detailed pass@k (k=1,10,100) performance on code generation task for five languages in HumanEval-X. \u003cb\u003eRight\u003c/b\u003e: the average performance of all languages of each model. CodeGeeX achieves the highest average performance compared with InCoder-6.7B, CodeGen-Multi-6B and CodeGen-Multi-16B.\u003c/i\u003e\u003c/p\u003e\n\n\nWe compare CodeGeeX with two other open-sourced code generation models, [InCoder](https://github.com/dpfried/incoder) (from Meta) and [CodeGen](https://github.com/salesforce/CodeGen) (from Salesforce). Specifically, InCoder-6.7B, CodeGen-Multi-6B and CodeGen-Multi-16B are considered. CodeGeeX significantly outperforms models with smaller scales (by 7.5%~16.3%) and is competitive with CodeGen-Multi-16B with a larger scale (average performance 54.76% vs. 54.39%). CodeGeeX achieves the best average performance across languages.\n\n### Crosslingual Code Translation\n\n\u003cimg src=\"resources/en/hx_translation.png\"\u003e\n\n\u003cp align=\"center\"\u003e\u003ci\u003eResults on HumanEval-X \u003cb\u003ecode translation\u003c/b\u003e task. Best language-wise performance are \u003cb\u003ebolded\u003c/b\u003e.\u003c/i\u003e\u003c/p\u003e\n\nWe also evaluate the performance of translation across different programming languages. We test the zero-shot performance of CodeGeeX, as well as the fine-tuned CodeGeeX-13B-FT (fine-tuned using the training set of code translation tasks in [XLCoST](https://github.com/reddy-lab-code-research/XLCoST); Go is absent in the original set, we thus add a small set to it). 
The results indicate that models have a preference for languages, e.g., CodeGeeX is good at translating other languages to Python and C++, while CodeGen-Multi-16B is better at translating to JavaScript and Go; this is probably due to the difference in language distribution in the training corpus. Among 20 translation pairs, we also observe that the performances of A-to-B and B-to-A translation are always negatively correlated, which might indicate that the current models are still not capable of learning all languages well. \n\n### How to use HumanEval-X and contribute to it?\n\nFor more details on how to use HumanEval-X, please see [usage](codegeex/benchmark/README.md). We warmly welcome community contributions to HumanEval-X, whether adding more problems or extending it to other languages. Please check out the [standard format](codegeex/benchmark/README.md#how-to-use-humaneval-x) of HumanEval-X and open a pull request. \n\nPlease let us know if you have any comments or suggestions via [codegeex@aminer.cn](mailto:codegeex@aminer.cn).\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eExamples of Generation\u003c/b\u003e\u003c/summary\u003e\n\u003cimg src=\"resources/en/hx_examples.png\"\u003e\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eAcknowledgement\u003c/b\u003e\u003c/summary\u003e\n\u003cbr/\u003e\nThis project is supported by the National Science Foundation for Distinguished Young Scholars (No. 61825602). 
\n\n### Lead Contributors\n\nQinkai Zheng ([Tsinghua KEG](http://keg.cs.tsinghua.edu.cn/glm-130b/)), Xiao Xia (Tsinghua KEG), Xu Zou (Tsinghua KEG)\n\n### Contributors\n\nTsinghua KEG---The Knowledge Engineering Group at Tsinghua: Aohan Zeng, Wendi Zheng, Lilong Xue\n\nZhilin Yang's Group at Tsinghua IIIS: Yifeng Liu, Yanru Chen,  Yichen Xu (BUPT, work was done when visiting Tsinghua)\n\nPeng Cheng Laboratory: Qingyu Chen, Zhongqi Li, Gaojun Fan\n\nZhipu\\.AI: Yufei Xue, Shan Wang, Jiecai Shan, Haohan Jiang, Lu Liu, Xuan Xue, Peng Zhang\n\nAscend and Mindspore Team: Yifan Yao, Teng Su, Qihui Deng, Bin Zhou\n\n### Data Annotations\n\nRuijie Cheng (Tsinghua), Peinan Yu (Tsinghua), Jingyao Zhang (Zhipu\\.AI), Bowen Huang (Zhipu\\.AI), Shaoyu Wang (Zhipu\\.AI) \n    \n### Advisors\n\n[Zhilin Yang](https://kimiyoung.github.io/) (Tsinghua IIIS), Yuxiao Dong (Tsinghua KEG), Wenguang Chen (Tsinghua PACMAN), Jie Tang (Tsinghua KEG)\n    \n\n### Computation Sponsors\n\n[Peng Cheng Laboratory](https://www.pcl.ac.cn/index.html)\n\n[Zhipu.AI](https://www.zhipu.ai/)---an AI startup that aims to teach machines to think like humans\n\n### Project Leader \n\n[Jie Tang](http://keg.cs.tsinghua.edu.cn/jietang/) (Tsinghua KEG \u0026 BAAI)\n\u003c/details\u003e\n\n## License\n\nOur code is licensed under the [Apache-2.0 license](LICENSE).\nOur model is licensed under the [license](MODEL_LICENSE).\n\n## Citation\n\nIf you find our work useful, please cite:\n\n```\n@inproceedings{zheng2023codegeex,\n  title={CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X},\n  author={Qinkai Zheng and Xiao Xia and Xu Zou and Yuxiao Dong and Shan Wang and Yufei Xue and Zihan Wang and Lei Shen and Andi Wang and Yang Li and Teng Su and Zhilin Yang and Jie Tang},\n  booktitle={Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},\n  pages={5673--5684},\n  
year={2023}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthudm%2Fcodegeex","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthudm%2Fcodegeex","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthudm%2Fcodegeex/lists"}