
https://github.com/andreaschandra/code-gen-extended



### Steps for Reproducing Milestone 2 Results
### Fine-tuning GPT-2 on CodeSearchNet for Code Generation

## Setup

From `/cs230/gpt-2-csn`, install the dependencies and download the 117M GPT-2 model:

```
$ pip install -r path/to/requirements.txt
$ python download_model.py 117M
```
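
After the download finishes, it can help to confirm that the checkpoint is where training expects it. The sketch below is not part of this repository. It assumes the upstream openai/gpt-2 layout, in which `download_model.py 117M` writes seven files into `models/117M/`. The helper name `missing_model_files` is hypothetical, and the file list should be adjusted if this repository's script differs.

```python
import os

# Assumption: the file set written by the upstream openai/gpt-2 download_model.py.
# Adjust this tuple if this repository's download script writes something else.
EXPECTED_MODEL_FILES = (
    "checkpoint",
    "encoder.json",
    "hparams.json",
    "model.ckpt.data-00000-of-00001",
    "model.ckpt.index",
    "model.ckpt.meta",
    "vocab.bpe",
)


def missing_model_files(model_name="117M", models_dir="models"):
    """Return the expected checkpoint files that are absent from models_dir/model_name."""
    model_dir = os.path.join(models_dir, model_name)
    return [
        name
        for name in EXPECTED_MODEL_FILES
        if not os.path.isfile(os.path.join(model_dir, name))
    ]


if __name__ == "__main__":
    # Run from /cs230/gpt-2-csn, where download_model.py was invoked.
    missing = missing_model_files()
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All expected 117M model files are present.")
```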

## Data

- The CodeSearchNet Python training data is stored in `/cs230/gpt-2-csn/src/pythonTrain/python_train_all.txt` and tracked with Git LFS to save space.

- A pre-encoded copy of this dataset, which also takes less space, is stored in `/cs230/gpt-2-csn/src/pythonTrainPreprocessed/python_train_all.npz`.

- To encode your own data, run `encode.py` (requires Python 3.7 or later):

  ```
  $ python encode.py trainingData.txt trainingData.npz
  ```
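
Before training, a quick check can catch two common problems: a `.txt` file that is still a Git LFS pointer (for example, when the repository was cloned without Git LFS installed), and an `.npz` file that does not contain the arrays you expect. The stdlib-only sketch below is not part of this repository, and its helper names are hypothetical. It reads each array's dtype and shape from the `.npy` headers inside the `.npz` archive without importing NumPy. It assumes each array holds token IDs, so the element total only approximates the dataset size in tokens.

```python
import ast
import os
import struct
import zipfile

LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """True if the file holds a Git LFS pointer instead of the real contents."""
    with open(path, "rb") as f:
        return f.read(100).startswith(LFS_POINTER_PREFIX)


def npz_array_shapes(path):
    """Map each array in an .npz archive to (dtype, shape), read from its .npy header."""
    shapes = {}
    with zipfile.ZipFile(path) as archive:
        for member in archive.namelist():
            if not member.endswith(".npy"):
                continue
            with archive.open(member) as f:
                if f.read(6) != b"\x93NUMPY":
                    raise ValueError(f"{member} is not a .npy file")
                major, _minor = f.read(2)
                # .npy format 1.0 stores the header length in 2 bytes; 2.0 and 3.0 use 4.
                if major == 1:
                    (header_len,) = struct.unpack("<H", f.read(2))
                else:
                    (header_len,) = struct.unpack("<I", f.read(4))
                header = ast.literal_eval(f.read(header_len).decode("latin1"))
            shapes[member[:-4]] = (header["descr"], tuple(header["shape"]))
    return shapes


def total_elements(shapes):
    """Sum the element counts of all arrays (token count, if arrays hold token IDs)."""
    total = 0
    for _dtype, shape in shapes.values():
        count = 1
        for dim in shape:
            count *= dim
        total += count
    return total


if __name__ == "__main__":
    # Paths relative to /cs230/gpt-2-csn, matching the file locations listed above.
    txt_path = os.path.join("src", "pythonTrain", "python_train_all.txt")
    npz_path = os.path.join("src", "pythonTrainPreprocessed", "python_train_all.npz")
    if os.path.exists(txt_path) and is_lfs_pointer(txt_path):
        print(f"{txt_path} is a Git LFS pointer; run `git lfs pull` to fetch it.")
    if os.path.exists(npz_path) and zipfile.is_zipfile(npz_path):
        shapes = npz_array_shapes(npz_path)
        print(f"{len(shapes)} arrays, {total_elements(shapes)} elements in total")
```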

## Training

From `/cs230/gpt-2-csn`, run:

```
$ python src/train.py --dataset src/pythonTrainPreprocessed/python_train_all.npz --model_name 117M
```

- Samples will automatically be generated every 100 steps.

- Other hyperparameters, such as learning rate and batch size, can be set with command-line flags.
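
To script repeated runs, the training command can be wrapped in Python. This is a hypothetical helper, not part of the repository. It uses only the `--dataset` and `--model_name` flags from the command above, with paths relative to `/cs230/gpt-2-csn` that match the file locations in the Data section. Any additional flags (for example, for learning rate or batch size) are forwarded verbatim, because their exact names are defined by `train.py`.

```python
import subprocess
import sys


def build_train_command(
    dataset="src/pythonTrainPreprocessed/python_train_all.npz",
    model_name="117M",
    extra_args=(),
    script="src/train.py",
):
    """Assemble the training command as an argument list.

    Only --dataset and --model_name come from this README; extra_args is
    appended unchanged, so use the flag names that train.py defines.
    """
    return [sys.executable, script, "--dataset", dataset, "--model_name", model_name, *extra_args]


def run_training(**kwargs):
    """Run train.py from the current directory (assumed to be /cs230/gpt-2-csn)."""
    command = build_train_command(**kwargs)
    print("Running:", " ".join(command))
    # check=True raises CalledProcessError if training exits with a non-zero status.
    subprocess.run(command, check=True)
```

Calling `run_training()` from `/cs230/gpt-2-csn` runs the documented command with the current interpreter; extra hyperparameter flags go in `extra_args` as a list of strings.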