https://github.com/pytlicek/llm-train-chat-example

LLM Training experiment

# BERT-Based Text Generation

This repository contains code for training a BERT model for masked language modeling and generating text based on prompts using the trained model.

## Installation

Before running the code, make sure Python is installed. Then install PyTorch along with the Hugging Face `transformers` and `tokenizers` libraries:

```
pip install transformers
pip install tokenizers
pip install torch
```

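To confirm that the three packages are importable before running the scripts, a small check like the one below can help. This is only a sketch: the helper name `missing_packages` is hypothetical and is not part of this repository.

```python
import importlib.util

# Packages installed by the pip commands above.
REQUIRED = ("torch", "transformers", "tokenizers")


def missing_packages(names=REQUIRED):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```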
## Files

- `train.py`: This script trains a BERT model on a text dataset for masked language modeling, using the `transformers` library and a custom dataset class.
- `chat.py`: This script demonstrates how to generate text based on prompts using the trained BERT model. Note that BERT is not primarily designed for text generation, so the results might not always be coherent.

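To make "masked language modeling" concrete, here is a minimal plain-Python sketch of the masking step from the original BERT recipe: about 15% of tokens are selected, and of those 80% become `[MASK]`, 10% become a random token, and 10% stay unchanged. Whether `train.py` uses exactly these settings is an assumption; the function name `mask_tokens` and the use of `None` for ignored positions are illustrative choices, not code from this repository.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Return (corrupted, labels) for one masked-language-modeling example.

    labels[i] is the original token where the model must predict it,
    and None where the position is ignored by the loss.
    """
    if rng is None:
        rng = random.Random()
    corrupted, labels = [], []
    for token in tokens:
        if rng.random() >= mask_prob:
            # Not selected: keep the token and exclude it from the loss.
            corrupted.append(token)
            labels.append(None)
            continue
        labels.append(token)
        roll = rng.random()
        if roll < 0.8:
            corrupted.append(MASK_TOKEN)          # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted.append(rng.choice(vocab))   # 10%: replace with a random token
        else:
            corrupted.append(token)               # 10%: leave unchanged
    return corrupted, labels


if __name__ == "__main__":
    words = "the quick brown fox jumps over the lazy dog".split()
    print(mask_tokens(words, vocab=words, rng=random.Random(0)))
```

In the `transformers` library, `DataCollatorForLanguageModeling` applies this kind of masking to batches of token IDs during training.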
## Usage

### Training the Model

Place a text dataset named `dataset.txt` in the same directory as `train.py`, then run the script to train the model:

```
python train.py
```

The trained model and tokenizer will be saved in the `./results` directory.

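The README does not describe the layout of `dataset.txt` or the custom dataset class, so the following is only a sketch under stated assumptions: one training example per non-empty line, and a caller-supplied `encode` function that turns a string into token IDs. The class name `LineDataset` and its parameters are hypothetical. It exposes `__len__` and `__getitem__`, the same two methods a map-style PyTorch dataset provides.

```python
from pathlib import Path


class LineDataset:
    """One example per non-empty line of a text file (assumed layout)."""

    def __init__(self, path, encode, max_length=128):
        text = Path(path).read_text(encoding="utf-8")
        # Drop blank lines and surrounding whitespace.
        self.lines = [line.strip() for line in text.splitlines() if line.strip()]
        self.encode = encode          # callable: str -> list of token IDs
        self.max_length = max_length  # truncate long examples

    def __len__(self):
        return len(self.lines)

    def __getitem__(self, index):
        ids = self.encode(self.lines[index])[: self.max_length]
        return {"input_ids": ids, "attention_mask": [1] * len(ids)}
```

With a real tokenizer, `encode` would map text to vocabulary IDs; any callable with that shape works for trying the class out.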
### Generating Text

Use the `chat.py` script to generate text based on prompts using the trained model:

```
python chat.py
```

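How `chat.py` generates text is not spelled out here. One common way to coax text out of a masked language model is to append a `[MASK]` token to the prompt, let the model fill it, and repeat. The sketch below shows that loop with a toy lookup table standing in for the trained model; `generate`, `toy_fill_mask`, and `NEXT_WORD` are hypothetical names used only for illustration.

```python
MASK_TOKEN = "[MASK]"


def generate(prompt, fill_mask, max_new_tokens=10, end_token="[SEP]"):
    """Extend `prompt` one token at a time by repeatedly filling a trailing mask.

    `fill_mask(text)` must return the predicted token for the single
    MASK_TOKEN in `text`.
    """
    text = prompt.strip()
    for _ in range(max_new_tokens):
        token = fill_mask(f"{text} {MASK_TOKEN}")
        if token == end_token:
            break
        text = f"{text} {token}"
    return text


# Toy stand-in for a trained model: predicts the next word from a fixed
# table keyed by the word just before the mask.
NEXT_WORD = {"the": "cat", "cat": "sat", "sat": "down", "down": "[SEP]"}


def toy_fill_mask(text):
    words = text.split()
    previous = words[-2] if len(words) > 1 else ""
    return NEXT_WORD.get(previous, "[SEP]")


if __name__ == "__main__":
    print(generate("the", toy_fill_mask))  # the cat sat down
```

With the trained model, `fill_mask` could instead wrap a `transformers` fill-mask pipeline loaded from `./results` and return the top candidate's `token_str`. That wiring is an assumption about how the saved model would be used, and subword pieces such as `##ing` would need to be joined to the preceding token.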
## Note

The quality of the generated text will vary: BERT is an encoder trained to fill in masked tokens using context on both sides, not to produce text left to right, so it is better suited to understanding tasks than to generation. Even so, this project serves as a demonstration of custom training and text generation.

Enjoy exploring BERT-based text generation!