https://github.com/fredi-python/alpaca2incite-dataset-converter
Python script that converts datasets in the Alpaca-data format to the jsonl format needed to fine-tune the RedPajama-INCITE language model.
https://github.com/fredi-python/alpaca2incite-dataset-converter
Last synced: 5 months ago
JSON representation
Python script that converts datasets in the Alpaca-data format to the jsonl format needed to fine-tune the RedPajama-INCITE language model.
- Host: GitHub
- URL: https://github.com/fredi-python/alpaca2incite-dataset-converter
- Owner: fredi-python
- Created: 2023-05-11T10:18:02.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-05-11T10:50:51.000Z (over 2 years ago)
- Last Synced: 2025-04-08T17:20:00.853Z (8 months ago)
- Language: Python
- Size: 3.91 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Alpaca2INCITE-Dataset-Converter
This Python script converts datasets in the Alpaca-data format to the jsonl format needed to fine-tune the RedPajama-INCITE language model. The RedPajama-INCITE model is a commercially-usable language model developed by Together Computer, with a 3B and a 7B parameter pretrained language model.
## Usage:
```
$ python3 convert.py -h
usage: convert.py [-h] --input_file INPUT_FILE --output_file OUTPUT_FILE [--source SOURCE]
Alpaca2INCITE-Dataset-Converter
options:
-h, --help show this help message and exit
--input_file INPUT_FILE
the input JSON file
--output_file OUTPUT_FILE
the output JSONL file
--source SOURCE the optional source metadata for the JSONL entries
```