https://github.com/bin123apple/autocoder

We introduced a new model designed for the Code generation task. Its test accuracy on the HumanEval base dataset surpasses that of GPT-4 Turbo (April 2024) and GPT-4o.

code-generation code-interpreter humaneval llm nlp nlp-machine-learning text-generation



Host: GitHub
URL: https://github.com/bin123apple/autocoder
Owner: bin123apple
License: apache-2.0
Created: 2024-05-13T03:06:16.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2024-07-06T21:00:06.000Z (about 1 year ago)
Last Synced: 2025-04-05T05:33:26.579Z (3 months ago)
Topics: code-generation, code-interpreter, humaneval, llm, nlp, nlp-machine-learning, text-generation
Language: Python
Homepage: https://arxiv.org/abs/2405.14906
Size: 25.8 MB
Stars: 839
Watchers: 15
Forks: 71
Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

awesome-ChatGPT-repositories - AutoCoder - We introduced a new model designed for the Code generation task. Its test accuracy on the HumanEval base dataset surpasses that of GPT-4 Turbo (April 2024) and GPT-4o. (NLP)
awesome-llm-projects - AutoCoder - developed for the generation task. Its test accuracy on the HumanEval base dataset surpasses that of GPT-4 Turbo (April 2024) and GPT-4o. (Projects / 🦄 LLMs)
StarryDivineSky - bin123apple/autocoder - 4 Turbo (April 2024) and GPT-4o. (A01_Text Generation_Text Dialogue / Large Language Dialogue Models and Data)

README

# AutoCoder

## News :fire:

A new model, [AutoCoder_QW_7B](https://huggingface.co/Bin12345/AutoCoder_QW_7B), has been uploaded. It fixes the earlier problem where the model would only start the code interpreter when asked to *verify* its code.

The base model of AutoCoder_QW_7B is [CodeQwen1.5-7b](https://huggingface.co/Qwen/CodeQwen1.5-7B-Chat).

## Introduction :mega:
We introduce a new model designed for the code generation task. Its test accuracy on the HumanEval base dataset surpasses that of GPT-4 Turbo (April 2024) (**90.9% vs 90.2%**).

Additionally, compared to previous open-source models, AutoCoder offers a new feature: it can **automatically install the required packages** and attempt to run the code until it deems there are no issues, **whenever the user wishes to execute the code**.
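
The install-and-retry behaviour described above can be pictured as a small loop. The following is a minimal sketch under stated assumptions, not AutoCoder's actual implementation: the names `find_missing_module`, `pip_install`, and `run_with_auto_install` are hypothetical, and the sketch assumes the missing import name is also the pip package name, which is not always true.

```python
import re
import subprocess
import sys

# Pattern for the message Python prints when an import cannot be resolved.
MISSING_MODULE = re.compile(r"No module named '([\w.]+)'")


def find_missing_module(stderr):
    """Return the top-level module named in an import error, or None."""
    match = MISSING_MODULE.search(stderr)
    return match.group(1).split(".")[0] if match else None


def pip_install(package):
    """Install a package into the current interpreter's environment."""
    subprocess.run([sys.executable, "-m", "pip", "install", package], check=False)


def run_with_auto_install(code, max_attempts=3, install=pip_install):
    """Run `code`; on a missing-module error, install the module and retry."""
    for _ in range(max_attempts):
        result = subprocess.run(
            [sys.executable, "-c", code], capture_output=True, text=True
        )
        if result.returncode == 0:
            return result.stdout
        missing = find_missing_module(result.stderr)
        if missing is None:
            # Not an import problem: surface the error instead of looping.
            raise RuntimeError(result.stderr)
        install(missing)
    raise RuntimeError(f"still failing after {max_attempts} attempts")
```

Passing a different `install` callable makes it possible to log or restrict installations, which is worth considering before running model-generated code outside a sandbox.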

* Difference between the code interpreter of AutoCoder and that of GPT-4 Turbo:

Below are the video demos for the code interpreter comparison between GPT-4 Turbo and AutoCoder:

GPT-4o cannot access external libraries.

[GPT-4o](https://github.com/bin123apple/AutoCoder/assets/99925255/be47b449-4e8a-4b77-981b-ec79b15970cc)

AutoCoder can automatically install the required packages. This feature broadens what the code interpreter can be used for.

[AutoCoder](https://github.com/bin123apple/AutoCoder/assets/99925255/1893f904-c1f2-4f59-9ec5-45b69efcc26a)

* Difference between the code interpreter of AutoCoder and the current open-source code interpreter [OpenCodeInterpreter](https://opencodeinterpreter.github.io/):

Like GPT-4 Turbo, AutoCoder calls its code interpreter only when the user needs to verify the code, whereas OpenCodeInterpreter runs all generated Python code.

## Model :gift:
The model is available on Hugging Face:

[AutoCoder (33B)](https://huggingface.co/Bin12345/AutoCoder)
[AutoCoder-S (6.7B)](https://huggingface.co/Bin12345/AutoCoder_S_6.7B)

The base models of AutoCoder (33B) and AutoCoder-S (6.7B) are deepseek-coder.

[AutoCoder_QW_7B](https://huggingface.co/Bin12345/AutoCoder_QW_7B)

The base model of AutoCoder_QW_7B is CodeQwen1.5-7b.

## Quick Start :rocket:
1. Create the conda env

```
conda create -n AutoCoder python=3.11
conda activate AutoCoder
pip install -r requirements.txt
```

2. Test on HumanEval (**90.9% on base, 78.0% on base + extra**). Skip to Step 5 if you don't want to test its performance on benchmarks.

```
cd Evaluation
python test_humaneval.py
```
This step produces a file named AutoCoder_HumanEval+.jsonl, which follows the EvalPlus format.

Then evaluate it with the testing framework from the [EvalPlus GitHub](https://github.com/evalplus/evalplus) to see the results.
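
For orientation, EvalPlus sample files are JSON Lines in which each line carries a `task_id` and a `solution`. The sketch below only illustrates that shape. The helper name `write_samples` and the example solution are assumptions made for illustration, not the output of `test_humaneval.py`.

```python
import json


def write_samples(samples, path):
    """Write (task_id, solution) pairs as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for task_id, solution in samples:
            record = {"task_id": task_id, "solution": solution}
            f.write(json.dumps(record) + "\n")


# Illustrative data only: a single hand-written solution.
example = [("HumanEval/0", "def add(a, b):\n    return a + b\n")]
write_samples(example, "example_samples.jsonl")
```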

**NOTE**:
* Don't forget to post-process the code with evalplus's `evalplus.sanitize`.
* If you don't use greedy decoding for code generation (for example, if you set `do_sample=True`), you will probably see different results.
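
Why the decoding mode changes the scores can be shown with a toy next-token distribution. This is a simplified illustration, not the model's generation code: `greedy_pick`, `sample_pick`, and the probabilities are made up for the example.

```python
import random


def greedy_pick(probs):
    """Always choose the most likely token: same input, same output."""
    return max(probs, key=probs.get)


def sample_pick(probs, rng):
    """Draw a token in proportion to its probability."""
    tokens = list(probs)
    weights = [probs[token] for token in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical distribution over the next token.
next_token_probs = {"return": 0.6, "print": 0.3, "pass": 0.1}

print(greedy_pick(next_token_probs))
print([sample_pick(next_token_probs, random.Random(seed)) for seed in range(5)])
```

Greedy decoding yields the same completion on every run, so benchmark numbers are reproducible. Sampling can pick a less likely token at any step, so pass rates vary between runs.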

3. Test on MBPP (**82.5% on base, 70.6% on base + extra**). Skip to Step 5 if you don't want to test its performance on benchmarks.

```
python test_mbpp.py
```

Post-process the output to remove the natural-language text before testing:
```
python postprocess_mbpp.py
```
This step produces an AutoCoder_Mbpp+-sanitized.jsonl file containing all the extracted code blocks.
Then test it directly with [EvalPlus GitHub](https://github.com/evalplus/evalplus) (this time you don't need to post-process the code with evalplus's `evalplus.sanitize`).
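
The idea behind this post-processing step, keeping only the fenced code and dropping the surrounding explanation, can be sketched with a regular expression. This is an illustrative assumption about the approach, not the contents of `postprocess_mbpp.py`; the name `extract_code_blocks` is hypothetical.

```python
import re

# Build the fence marker at runtime so this example contains no literal fence.
FENCE = "`" * 3
CODE_BLOCK = re.compile(FENCE + r"[ \t]*\w*[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(text):
    """Return the body of every fenced code block found in `text`."""
    return [block.strip("\n") for block in CODE_BLOCK.findall(text)]


reply = (
    "Here is the solution:\n"
    + FENCE + "python\n"
    + "def square(x):\n    return x * x\n"
    + FENCE + "\n"
    + "It multiplies the number by itself."
)
print(extract_code_blocks(reply))
```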

4. Test on DS-1000. (Skip to Step 5, if you don't want to test its performance on benchmarks)

```
python test_ds1000.py
```

This step produces a jsonl file containing all the extracted code blocks.
Then test it directly with [DS-1000 GitHub](https://github.com/xlang-ai/DS-1000).

5. Web demo (includes the code interpreter)

Install gradio and run:

```
pip install gradio==3.48.0
cd Web_demo  # relative to the repository root
python chatbot.py
```

## **NOTE** :warning:
* We suggest setting `do_sample = True` (the default setting here) when using the code interpreter.

* It would be preferable to use Linux for deploying everything.

## Contact :email:
If you have any inquiries, please feel free to raise an issue or reach out to [email protected].

## Citation :book:
```
@misc{lei2024autocoder,
title={AutoCoder: Enhancing Code Large Language Model with \textsc{AIEV-Instruct}},
author={Bin Lei and Yuchen Li and Qiuwu Chen},
year={2024},
eprint={2405.14906},
archivePrefix={arXiv},
primaryClass={cs.SE}
}
```

## Acknowledgments :pray:
Thanks to Tianyu Zheng, the first author of the [OpenCodeInterpreter](https://opencodeinterpreter.github.io/), for guidance on some technical details.
