Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/sebinsaji007/NSFW-finetuned-on-llm
Here we take Falcon-7B as the LLM and fine-tune it on an NSFW dataset
Last synced: 3 days ago
- Host: GitHub
- URL: https://github.com/sebinsaji007/NSFW-finetuned-on-llm
- Owner: sebinsaji007
- Created: 2023-07-16T16:55:46.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-07-19T08:21:59.000Z (over 1 year ago)
- Last Synced: 2024-04-17T23:55:39.300Z (9 months ago)
- Language: Jupyter Notebook
- Size: 47.9 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome_ai_agents - Nsfw-Finetuned-On-Llm - Here we take Falcon-7B as the LLM and fine-tune it on an NSFW dataset (Building / Datasets)
README
# NSFW-finetuned-on-llm using Google Colab
Here we take Falcon-7B as the LLM and fine-tune it on an NSFW dataset.

# NSFW Classification using Falcon-7b

This repository contains code for training a neural network model to classify NSFW (Not Safe for Work) content using the Falcon-7b model. The model is fine-tuned on an NSFW dataset and uses the Peft library for efficient training.
## Usage
To use the NSFW classification code:
1. Install the required dependencies:
- trl
- transformers
- accelerate
- peft
- datasets
- bitsandbytes
- einops
- wandb
2. Load the NSFW dataset.
3. Load the Falcon-7b model and tokenizer.
4. Configure the Peft library for the LoRA algorithm.
5. Set the training arguments.
6. Train the model using the `SFTTrainer` (see the sketch after this list).

Please refer to the code in the `flacon7b.ipynb` notebook for the detailed implementation.
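Below is a minimal sketch of steps 2–6, assuming the `tiiuae/falcon-7b` checkpoint and the 2023-era `trl`/`peft` APIs; the dataset name, text column, and hyperparameters are placeholders rather than the notebook's actual values:

```
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import SFTTrainer

# Step 2: load the NSFW dataset ("your/nsfw-dataset" is a placeholder).
dataset = load_dataset("your/nsfw-dataset", split="train")

# Step 3: load Falcon-7b 4-bit quantized so it fits on a free Colab GPU.
bnb_config = BitsAndBytesConfig(load_in_4bit=True,
                                bnb_4bit_quant_type="nf4",
                                bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-7b",
                                             quantization_config=bnb_config,
                                             trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")
tokenizer.pad_token = tokenizer.eos_token

# Step 4: configure LoRA via Peft; "query_key_value" is Falcon's
# fused attention projection module.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                         target_modules=["query_key_value"],
                         bias="none", task_type="CAUSAL_LM")

# Step 5: training arguments (illustrative values only).
training_args = TrainingArguments(output_dir="./results",
                                  per_device_train_batch_size=4,
                                  gradient_accumulation_steps=4,
                                  learning_rate=2e-4,
                                  max_steps=500,
                                  logging_steps=10)

# Step 6: fine-tune with TRL's SFTTrainer on the dataset's "text" column.
trainer = SFTTrainer(model=model,
                     train_dataset=dataset,
                     peft_config=peft_config,
                     dataset_text_field="text",
                     max_seq_length=512,
                     tokenizer=tokenizer,
                     args=training_args)
trainer.train()
```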
Note: Running the training code may take time and requires sufficient computational resources (though it can be run on the free Google Colab tier).
## Run the model
```
text = "your input here"
inputs = tokenizer(text, return_tensors="pt").to(model.device)  # keep tensors on the model's device
inputs.pop("token_type_ids", None)  # Falcon's tokenizer can emit this key, which generate() rejects
outputs = model.generate(**inputs, max_length=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
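The snippet above assumes `model` and `tokenizer` are still in memory from training. In a fresh session, one way to restore them is to reload the base model and attach the saved LoRA adapter with Peft's `PeftModel`; the adapter path below is a placeholder:

```
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Reload the base model, then attach the fine-tuned LoRA adapter.
base = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-7b",
                                            trust_remote_code=True)
model = PeftModel.from_pretrained(base, "./results")  # placeholder adapter path
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")
```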