https://github.com/andreaschandra/code-gen-extended
- Host: GitHub
- URL: https://github.com/andreaschandra/code-gen-extended
- Owner: andreaschandra
- Created: 2023-07-18T15:31:12.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-07-18T15:35:19.000Z (over 1 year ago)
- Last Synced: 2024-05-01T16:40:21.228Z (7 months ago)
- Language: Python
- Size: 7.64 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
### Steps for Reproducing Milestone 2 Results
### Fine-tuning GPT-2 on CodeSearchNet for Code Generation
## Setup
From /cs230/gpt-2-csn:
$ pip install -r path/to/requirements.txt
$ python download_model.py 117M
## Data
- The CodeSearchNet Python training data is stored in /cs230/gpt-2-csn/src/pythonTrain/python_train_all.txt and is tracked with Git LFS to save space.
- This dataset has also been encoded and stored in /cs230/gpt-2-csn/src/pythonTrainPreprocessed/python_train_all.npz for space efficiency.
- To encode your own data, run the following script, which requires Python 3.7.x or later: $ python encode.py trainingData.txt trainingData.npz
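For readers unfamiliar with the .npz format, the round trip from text to an encoded archive can be sketched as below. This is only an illustration, not the repository's encode.py: the byte-level `encode_text` function is a hypothetical stand-in for the GPT-2 tokenizer, and the helper names `save_tokens` and `load_tokens` are assumptions introduced here. It relies on NumPy being installed.

```python
import os
import tempfile

import numpy as np


def encode_text(text):
    """Hypothetical stand-in tokenizer: map text to its UTF-8 byte values."""
    return np.frombuffer(text.encode("utf-8"), dtype=np.uint8).astype(np.int32)


def save_tokens(tokens, npz_path):
    """Write the token array to a compressed .npz archive."""
    np.savez_compressed(npz_path, tokens)


def load_tokens(npz_path):
    """Read every array in the archive and concatenate them in order."""
    with np.load(npz_path) as archive:
        return np.concatenate([archive[name] for name in archive.files])


sample = "def add(a, b):\n    return a + b\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    npz_path = os.path.join(tmp_dir, "trainingData.npz")
    save_tokens(encode_text(sample), npz_path)
    restored = load_tokens(npz_path)

# Decode the byte-level tokens back to text to confirm nothing was lost.
decoded = restored.astype(np.uint8).tobytes().decode("utf-8")
```

Storing integer token IDs in a compressed archive is what makes the encoded file smaller and faster to load than re-tokenizing the raw text on every run.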
## Training
$ python /src/train.py --dataset /src/pythonTrainPreprocessed/python_train_all.npz --model_name 117M
- Samples will automatically be generated every 100 steps.
- Additional parameters such as learning rate and batch size can be specified on the command line.
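As a rough sketch of how such options and the 100-step sampling schedule could fit together, consider the parser below. Only `--dataset` and `--model_name` appear in the command above; the flag names `--learning_rate`, `--batch_size`, and `--sample_every`, along with their defaults, are assumptions made for illustration, so check train.py for the actual names.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of a fine-tuning CLI")
    # These two flags appear in the training command in this README.
    parser.add_argument("--dataset", required=True, help="Path to the encoded .npz file")
    parser.add_argument("--model_name", default="117M", help="Pretrained model to start from")
    # Hypothetical flag names and defaults, not confirmed by the README.
    parser.add_argument("--learning_rate", type=float, default=2e-5)
    parser.add_argument("--batch_size", type=int, default=1)
    parser.add_argument("--sample_every", type=int, default=100)
    return parser


def should_sample(step, sample_every):
    """Return True on steps where a sample would be generated."""
    return step > 0 and step % sample_every == 0


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args([
    "--dataset", "/src/pythonTrainPreprocessed/python_train_all.npz",
    "--learning_rate", "1e-4",
    "--batch_size", "4",
])

# Steps in the first 300 iterations on which a sample would be produced.
sample_steps = [step for step in range(1, 301) if should_sample(step, args.sample_every)]
```

With the assumed default of 100, samples would be produced at steps 100, 200, and 300 of the first 300 iterations.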