https://github.com/hesamsheikh/dataset_git_commands
Last synced: 8 days ago
- Host: GitHub
- URL: https://github.com/hesamsheikh/dataset_git_commands
- Owner: hesamsheikh
- Created: 2024-08-01T18:17:29.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-08-05T14:57:12.000Z (5 months ago)
- Last Synced: 2024-08-06T11:30:37.334Z (5 months ago)
- Language: Jupyter Notebook
- Size: 1.71 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Creating a Synthetic Dataset Using Llama 3.1 405B and Nemotron 4
This notebook uses the structure shown below to create a synthetic dataset of instructions and Git commands.
![Structure of the dataset generation pipeline](image.png)
First, we create a set of natural-language instructions describing Git tasks, then we generate a response for each instruction.
The instruction/response pairs are then passed to a reward model, Nemotron 4, which is used to filter out low-quality pairs.
Finally, the filtered dataset is pushed to Hugging Face.
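The generate, score, and filter steps described above can be sketched roughly as follows. This is only an illustration, not the notebook's actual code: `build_dataset`, `fake_generator`, `fake_reward`, the `min_score` threshold, and the `instruction`/`response` field names are all hypothetical, and the two stub functions stand in for the real calls to the generator and reward models.

```python
from typing import Callable, Dict, List


def build_dataset(
    instructions: List[str],
    generate_response: Callable[[str], str],
    score_pair: Callable[[str, str], float],
    min_score: float,
) -> List[Dict[str, str]]:
    """Generate one response per instruction and keep pairs scoring >= min_score."""
    kept = []
    for instruction in instructions:
        response = generate_response(instruction)
        score = score_pair(instruction, response)
        if score >= min_score:
            kept.append({"instruction": instruction, "response": response})
    return kept


# Hypothetical stand-in for the generator model: a tiny lookup table.
def fake_generator(instruction: str) -> str:
    canned = {
        "Show the commit history": "git log",
        "Create a branch named dev": "git branch dev",
    }
    return canned.get(instruction, "")


# Hypothetical stand-in for the reward model: a toy rule, not a real score.
def fake_reward(instruction: str, response: str) -> float:
    return 1.0 if response.startswith("git ") else 0.0


rows = build_dataset(
    ["Show the commit history", "Create a branch named dev", "Undo everything"],
    fake_generator,
    fake_reward,
    min_score=0.5,
)
print(rows)
```

Passing the model calls in as plain functions keeps the filtering logic testable without network access. In a real run, the stubs would be replaced by clients for the two models, and the resulting list of rows could then be converted to a dataset object and uploaded.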
## Dataset
The final dataset can be viewed here:
https://huggingface.co/datasets/hesamsheikh/git-prompt