https://github.com/bklieger/train-student-gpt
Training a small language model from scratch on student lecture commentary
- Host: GitHub
- URL: https://github.com/bklieger/train-student-gpt
- Owner: Bklieger
- License: mit
- Created: 2024-06-22T18:22:32.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-06-22T18:52:50.000Z (over 1 year ago)
- Last Synced: 2024-12-18T00:11:44.450Z (11 months ago)
- Language: Python
- Homepage:
- Size: 48.2 MB
- Stars: 10
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Train Student GPT
## Description
Train a small Generative Pre-trained Transformer (GPT) to generate student lecture commentary, using data from SIGHT ([Wang et al., 2023](https://github.com/rosewang2008/sight/)).
## Getting Started
To train the model, you can run:
~~~
python run.py --mode train
~~~
To use the model, you can run:
~~~
python run.py --mode generate --prompt "### Lec 29 | MIT 18.01 Single Variable Calculus, Fall 2007" --max_new_tokens 300
~~~
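The two commands above use three flags: `--mode`, `--prompt`, and `--max_new_tokens`. As a rough illustration of how a script could accept them, here is a minimal `argparse` sketch. It is not the repository's actual `run.py`; the function name `build_parser` and the default values are assumptions made for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used in the commands above."""
    parser = argparse.ArgumentParser(description="Train or sample from a small GPT.")
    # --mode selects between training and text generation.
    parser.add_argument("--mode", choices=["train", "generate"], required=True)
    # The defaults below are illustrative assumptions, not values taken from the repo.
    parser.add_argument("--prompt", type=str, default="### ")
    parser.add_argument("--max_new_tokens", type=int, default=300)
    return parser


# Parse the same arguments as the generate command shown above.
example_args = build_parser().parse_args(
    [
        "--mode", "generate",
        "--prompt", "### Lec 29 | MIT 18.01 Single Variable Calculus, Fall 2007",
        "--max_new_tokens", "300",
    ]
)
print(example_args.mode, example_args.max_new_tokens)
```

Restricting `--mode` with `choices` makes an unsupported mode fail at parse time rather than partway through a run.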
## Results
### Examples from Training Data
> Processed from SIGHT data in comments.json. The format is "### {Lecture title}\n{Student comment}".
```
### 4. Factorization into A = LU
Thank you for your leasons!
### Lec 2 | MIT 18.01 Single Variable Calculus, Fall 2007
I sure will pay it back hundredfold. Thanks!!!
### 2. Conditioning and Bayes' Rule
amazing explanations
```
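The training format shown above can be produced with a few lines of Python. The sketch below is an assumption about how such preprocessing might look, not the repository's code: the record keys `title` and `comment` are hypothetical (the real layout of comments.json may differ), and collapsing internal whitespace so each comment stays on one line is also an assumption.

```python
import json


def format_example(title: str, comment: str) -> str:
    """Render one record as '### {Lecture title}' followed by the comment."""
    # Assumption: collapse internal newlines so each comment occupies one line.
    one_line_comment = " ".join(comment.split())
    return f"### {title.strip()}\n{one_line_comment}"


def build_corpus(records) -> str:
    """Join all formatted records into a single training string."""
    return "\n".join(format_example(r["title"], r["comment"]) for r in records)


# Hypothetical stand-in for the parsed contents of comments.json.
sample_json = """
[
  {"title": "4. Factorization into A = LU", "comment": "Thank you for your leasons!"},
  {"title": "2. Conditioning and Bayes' Rule", "comment": "amazing explanations"}
]
"""
corpus = build_corpus(json.loads(sample_json))
print(corpus)
```

Keeping the `### ` prefix on every title gives the model a consistent marker for where a new lecture title starts, which is also what makes a title usable as a prompt.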
### Training
The model was trained for 27 minutes on an NVIDIA A100-80GB.

The generations include several comments and titles that appear realistic given the model size.
### Examples from Generated Data
> Prompt was "### Lec 29 | MIT 18.01 Single Variable Calculus, Fall 2007". The following are the best generated video titles and comments chosen from the [model's output](example/generated.md).
```
### Lec 1 | MIT 18.01 Single Variable Calculus, Fall 2007
He the best.
### Lec 24 | MIT 18.01 Single Variable Calculus, Fall 2007
Thanks
### 1. Introduction to Statistics
this course 😂😂
```
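Because the model emits text in the same "### title, then comment" layout, its output can be split back into pairs for inspection. The following is a minimal sketch of one way to do that; `parse_generated` is a hypothetical helper, not part of the repository, and it assumes every title line starts with `### `.

```python
def parse_generated(text: str) -> list[tuple[str, str]]:
    """Split generated text into (title, comment) pairs.

    Lines starting with '### ' open a new entry; all lines up to the next
    title are treated as that entry's comment.
    """
    pairs = []
    title = None
    comment_lines = []
    for line in text.splitlines():
        if line.startswith("### "):
            # Close the previous entry before starting a new one.
            if title is not None:
                pairs.append((title, "\n".join(comment_lines).strip()))
            title = line[4:].strip()
            comment_lines = []
        elif title is not None:
            comment_lines.append(line)
    # Flush the final entry, if any.
    if title is not None:
        pairs.append((title, "\n".join(comment_lines).strip()))
    return pairs


generated = (
    "### Lec 1 | MIT 18.01 Single Variable Calculus, Fall 2007\n"
    "He the best.\n"
    "### 1. Introduction to Statistics\n"
    "this course\n"
)
for lecture_title, student_comment in parse_generated(generated):
    print(f"{lecture_title!r} -> {student_comment!r}")
```

Text that appears before the first title line is ignored, which drops any stray fragment at the start of the output.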
## Credits
- Andrej Karpathy for the model code: [https://www.youtube.com/watch?v=kCc8FmEb1nY](https://www.youtube.com/watch?v=kCc8FmEb1nY)
- Wang et al., 2023 for the data: [https://github.com/rosewang2008/sight/](https://github.com/rosewang2008/sight/)