Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/sebinsaji007/NSFW-finetuned-on-llm
Here we take Falcon-7B as the LLM and fine-tune it on an NSFW dataset
Last synced: 3 days ago
- Host: GitHub
- URL: https://github.com/sebinsaji007/NSFW-finetuned-on-llm
- Owner: sebinsaji007
- Created: 2023-07-16T16:55:46.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-07-19T08:21:59.000Z (over 1 year ago)
- Last Synced: 2024-04-17T23:55:39.300Z (9 months ago)
- Language: Jupyter Notebook
- Size: 47.9 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome_ai_agents - Nsfw-Finetuned-On-Llm - Here we take Falcon-7B as the LLM and fine-tune it on an NSFW dataset (Building / Datasets)
README
# NSFW-finetuned-on-llm using Google Colab
Here we take Falcon-7B as the LLM and fine-tune it on an NSFW dataset.

# NSFW Classification using Falcon-7b

This repository contains code for training a neural network model to classify NSFW (Not Safe for Work) content using the Falcon-7b model. The model is fine-tuned on an NSFW dataset and uses the Peft library for efficient training.
## Usage
To use the NSFW classification code:
1. Install the required dependencies:
- trl
- transformers
- accelerate
- peft
- datasets
- bitsandbytes
- einops
- wandb
2. Load the NSFW dataset.
3. Load the Falcon-7b model and tokenizer.
4. Configure the Peft library for the LoRA algorithm.
5. Set the training arguments.
6. Train the model using the `SFTTrainer` (see the sketch after this list).

Please refer to the code in the `flacon7b.ipynb` notebook for the detailed implementation.
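Below is a minimal sketch of steps 2–6, assuming the `tiiuae/falcon-7b` checkpoint and the 2023-era `trl`/`peft` APIs; the dataset name, text column, and hyperparameters are placeholders rather than the notebook's actual values:

```
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import SFTTrainer

# Step 2: load the NSFW dataset ("your/nsfw-dataset" is a placeholder).
dataset = load_dataset("your/nsfw-dataset", split="train")

# Step 3: load Falcon-7b 4-bit quantized so it fits on a free Colab GPU.
bnb_config = BitsAndBytesConfig(load_in_4bit=True,
                                bnb_4bit_quant_type="nf4",
                                bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-7b",
                                             quantization_config=bnb_config,
                                             trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")
tokenizer.pad_token = tokenizer.eos_token

# Step 4: configure LoRA via Peft; "query_key_value" is Falcon's
# fused attention projection module.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                         target_modules=["query_key_value"],
                         bias="none", task_type="CAUSAL_LM")

# Step 5: training arguments (illustrative values only).
training_args = TrainingArguments(output_dir="./results",
                                  per_device_train_batch_size=4,
                                  gradient_accumulation_steps=4,
                                  learning_rate=2e-4,
                                  max_steps=500,
                                  logging_steps=10)

# Step 6: fine-tune with TRL's SFTTrainer on the dataset's "text" column.
trainer = SFTTrainer(model=model,
                     train_dataset=dataset,
                     peft_config=peft_config,
                     dataset_text_field="text",
                     max_seq_length=512,
                     tokenizer=tokenizer,
                     args=training_args)
trainer.train()
```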
Note: Running the training code may take time and requires sufficient computational resources (though it can be run on the free Google Colab tier).
## Run the model
```
text = "your input here"
inputs = tokenizer(text, return_tensors="pt").to(model.device)  # keep tensors on the model's device
inputs.pop("token_type_ids", None)  # Falcon's tokenizer can emit this key, which generate() rejects
outputs = model.generate(**inputs, max_length=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
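The snippet above assumes `model` and `tokenizer` are still in memory from training. In a fresh session, one way to restore them is to reload the base model and attach the saved LoRA adapter with Peft's `PeftModel`; the adapter path below is a placeholder:

```
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Reload the base model, then attach the fine-tuned LoRA adapter.
base = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-7b",
                                            trust_remote_code=True)
model = PeftModel.from_pretrained(base, "./results")  # placeholder adapter path
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")
```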