Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ryhkml/fine-tune-forge
JSONL generator designed to elevate the fine-tuning process of cutting-edge language models like Google's PaLM 2 and OpenAI's GPT-3.5
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/ryhkml/fine-tune-forge
- Owner: ryhkml
- License: mit
- Created: 2024-01-19T02:56:50.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2024-06-16T10:04:12.000Z (7 months ago)
- Last Synced: 2024-06-16T11:25:16.737Z (7 months ago)
- Topics: gpt-3, image-ocr, jsonl, localhost, openai, text-bison, tools, vertexai
- Language: TypeScript
- Homepage:
- Size: 1.49 MB
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# FineTuneForge
FineTuneForge is a tool for generating JSON Lines (JSONL) files to support fine-tuning AI language models such as Google's PaLM 2 and OpenAI's GPT-3.5. It lets developers convert text data into JSONL, a machine-readable format that stores one JSON object per line.
![Screenshot FineTuneForge Webapp](./Screenshot%20FineTuneForge%20Webapp.png)
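To make the target format concrete, here is a minimal TypeScript sketch that converts instruction/answer pairs into JSONL. The record shapes follow the publicly documented tuning formats: OpenAI's chat format (a `messages` array with `system`, `user`, and `assistant` roles) for GPT-3.5, and Vertex AI's `input_text`/`output_text` format for the PaLM 2 `text-bison` model. The `Example` type and the helper functions are hypothetical and are not FineTuneForge's API; whether the tool emits exactly these fields is an assumption.

```typescript
// Hypothetical types mirroring the public OpenAI chat fine-tuning format
// and the Vertex AI text-bison tuning format.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

interface OpenAIChatRecord {
  messages: ChatMessage[];
}

interface VertexTextRecord {
  input_text: string;
  output_text: string;
}

// One training example before conversion (hypothetical shape).
interface Example {
  instruction: string;
  answer: string;
}

// Build an OpenAI chat record; the system prompt is optional.
function toOpenAIRecord(ex: Example, system?: string): OpenAIChatRecord {
  const messages: ChatMessage[] = [];
  if (system) messages.push({ role: "system", content: system });
  messages.push({ role: "user", content: ex.instruction });
  messages.push({ role: "assistant", content: ex.answer });
  return { messages };
}

// Build a Vertex AI text-bison record.
function toVertexRecord(ex: Example): VertexTextRecord {
  return { input_text: ex.instruction, output_text: ex.answer };
}

// JSONL: one JSON object per line, each line terminated by "\n".
// JSON.stringify escapes embedded newlines, so a record never spans lines.
function toJsonl(records: object[]): string {
  return records.map((r) => JSON.stringify(r) + "\n").join("");
}

const examples: Example[] = [
  { instruction: "What is JSONL?", answer: "One JSON object per line." },
  { instruction: "Translate 'hello' to French.", answer: "Bonjour." },
];

console.log(toJsonl(examples.map((ex) => toOpenAIRecord(ex))));
console.log(toJsonl(examples.map(toVertexRecord)));
```

Printing the Vertex AI records yields lines such as `{"input_text":"What is JSONL?","output_text":"One JSON object per line."}`. Serializing with `JSON.stringify` instead of concatenating strings by hand is what guarantees that quotes and newlines inside the text are escaped, so every record stays on a single line.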
## Getting Started
To get started with FineTuneForge, follow these steps:
### Installation
```sh
git clone https://github.com/ryhkml/fine-tune-forge.git
cd fine-tune-forge
chmod +x ./install.sh
./install.sh
```
### Usage
Build the JSONL generator with the following command:
```sh
npm run build
```
Then start the server:
```sh
npm run serve
```
## Directory Structure
FineTuneForge is organized into several directories, each serving a specific purpose in the workflow of the JSONL generator. Below is an overview of these directories and their intended use:
- `DATADOC_OCR`: This directory serves as temporary storage for OCR (Optical Character Recognition) images
- `DATASET`: This directory stores the completed dataset files. Once the JSONL files have been generated and are ready for fine-tuning the language models, they are placed here
- `DATATMP`: This directory serves as temporary storage for instruction content
- `tls`: This directory is reserved for SSL/TLS certificates
## Configuring SSL/TLS for HTTPS
To enable HTTPS in the application, you need to configure SSL/TLS certificates correctly.
### Required Files
Before you start, ensure you have the following files placed in the `tls` directory:
- `fullchain.pem`: Your certificate file containing the full chain of trust, that is, your own certificate together with any intermediate certificates
- `cert-key.pem`: This file contains your private key and must be kept secure. It is used to establish the encrypted connection
- `ca.crt` (optional): A Certificate Authority (CA) file, needed only if you must specify an external CA

If you use Docker, uncomment the `PROTOCOL_SERVER` environment variable in `docker-compose.yaml`.
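Node.js's built-in `https` module accepts these files through the `cert`, `key`, and `ca` options of `https.createServer`. The TypeScript sketch below shows one way a server could load them from the `tls` directory. It is not FineTuneForge's actual server code; the port (8443) and resolving `tls` relative to the current working directory are assumptions.

```typescript
import { existsSync, readFileSync } from "node:fs";
import { createServer, type ServerOptions } from "node:https";
import { join } from "node:path";

// Assumption: certificates live in ./tls relative to the working directory,
// using the file names listed above.
const TLS_DIR = join(process.cwd(), "tls");

// Read the certificate chain and private key; attach ca.crt only if present.
function loadTlsOptions(dir: string): ServerOptions {
  const options: ServerOptions = {
    cert: readFileSync(join(dir, "fullchain.pem")), // your certificate plus intermediates
    key: readFileSync(join(dir, "cert-key.pem")), // private key, keep it secret
  };
  const caPath = join(dir, "ca.crt");
  if (existsSync(caPath)) {
    options.ca = readFileSync(caPath); // optional external CA
  }
  return options;
}

const required = ["fullchain.pem", "cert-key.pem"].map((f) => join(TLS_DIR, f));

if (required.every((p) => existsSync(p))) {
  const server = createServer(loadTlsOptions(TLS_DIR), (_req, res) => {
    res.writeHead(200, { "Content-Type": "text/plain" });
    res.end("ok\n");
  });
  // 8443 is an arbitrary example port, not FineTuneForge's configured port.
  server.listen(8443, () => console.log("Listening on https://localhost:8443"));
} else {
  console.error(`Missing certificate files in ${TLS_DIR}; HTTPS server not started.`);
}
```

Because `ca.crt` is optional, the sketch attaches it only when the file exists, and it declines to start when either required file is missing rather than silently serving plain HTTP.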
## License
This project is licensed under the MIT License - see the [LICENSE](./LICENSE) file for details.