https://github.com/sevilaymuni/flask-app-roberta-detect-news

Flask Deployment: RoBERTa-Unreliable News Detection App
app-deployment data-preprocess flask-api huggingface-api huggingface-spaces pytorch roberta-model transformers webapp


# Detecting Unreliable News with Fine-Tuned RoBERTa






This project fine-tunes the [RoBERTa](https://arxiv.org/abs/1907.11692) model to detect unreliable news articles. The application classifies each article as **reliable** or **unreliable** using a pre-trained transformer fine-tuned on a labeled dataset. The backend is a Flask app deployed on [Render.com](https://render.com), and the fine-tuned model is stored in a [Hugging Face Space](https://huggingface.co/).

## Features
- **Model**: Fine-tuned [roberta-base](https://huggingface.co/roberta-base) for binary classification.
- **Dataset**: Labeled dataset with news article attributes: title, text, and reliability label.
- **Deployment**: Flask-based API hosted on Render; model stored on Hugging Face for easy access.
- **Evaluation**: Achieves **99.35% accuracy** and a **99.34% F1 score** on the validation set.

## Table of Contents
1. [Installation](#installation)
2. [Usage](#usage)
3. [Project Workflow](#project-workflow)
4. [Model Training](#model-training)
5. [Results](#results)
6. [Deployment](#deployment)
7. [Acknowledgments](#acknowledgments)

---

## Installation
1. Clone the repository:
```bash
git clone https://github.com/sevilaymuni/flask-app-roberta-detect-news.git
cd flask-app-roberta-detect-news
```

2. Install dependencies:
```bash
pip install -r requirements.txt
```

3. Set up Hugging Face authentication:
```bash
export HF_TOKEN=your_huggingface_token
```

4. Configure Flask environment variables:
```bash
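export FLASK_APP=app.py
# FLASK_ENV was deprecated in Flask 2.2 and removed in 2.3; newer versions use FLASK_DEBUG
export FLASK_ENV=development
export FLASK_DEBUG=1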
```

---

## Usage
Run the Flask server:
```bash
flask run
```
Access the API at `http://localhost:5000` or the deployed version on Render.

### API Endpoints
- **`POST /predict`**
  - Input: JSON with `title` and `text`.
  - Output: Predicted label (`0` = Reliable, `1` = Unreliable) and confidence score.
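
As an illustration of calling the endpoint, here is a minimal client sketch using only the Python standard library. The helper names `build_predict_request` and `send_request` are hypothetical, the base URL assumes the local development server, and the response field names are not documented in this README, so the sketch simply returns whatever JSON the server sends back.

```python
import json
import urllib.request


def build_predict_request(title, text, base_url="http://localhost:5000"):
    """Build (but do not send) a POST request for the /predict endpoint."""
    # The endpoint expects a JSON body with `title` and `text`.
    payload = json.dumps({"title": title, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_request(request):
    """Send the request and decode the JSON response (server must be running)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the Flask server running, `send_request(build_predict_request("Some headline", "Some article body."))` should return the predicted label and confidence score.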

---

## Project Workflow
1. **Data Preprocessing**:
   - Cleaned text by removing special characters and URLs.
   - Combined `title` and `text` fields for holistic context.
   - Removed duplicates and balanced classes.

2. **Data Splitting**:
   - 80/20 split for training and validation.
   - Ensured no overlap between sets.

3. **Model Training**:
   - Used Hugging Face's `Trainer` API.
   - Fine-tuned RoBERTa for 2 epochs with a learning rate of `2e-5`.

4. **Evaluation**:
   - Metrics: Accuracy and F1 Score.
   - Plotted confusion matrix and performance graphs.
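
The preprocessing and splitting steps above can be sketched in plain Python. This is an illustrative sketch, not the project's actual code: the function names, the exact set of characters treated as "special", the downsampling strategy used for balancing, and the random seed are all assumptions.

```python
import random
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Assumption: keep word characters, whitespace, and basic punctuation only.
SPECIAL_RE = re.compile(r"[^\w\s.,!?'-]")


def clean_text(raw):
    """Remove URLs and special characters, then collapse whitespace."""
    without_urls = URL_RE.sub(" ", raw)
    without_specials = SPECIAL_RE.sub(" ", without_urls)
    return re.sub(r"\s+", " ", without_specials).strip()


def combine_fields(title, text):
    """Join title and body into one cleaned input string."""
    return clean_text(f"{title} {text}")


def deduplicate(rows):
    """Drop rows whose content was already seen; rows are (content, label)."""
    seen, unique = set(), []
    for content, label in rows:
        if content not in seen:
            seen.add(content)
            unique.append((content, label))
    return unique


def balance_classes(rows, seed=42):
    """Downsample every class to the size of the smallest one (assumed strategy)."""
    by_label = {}
    for row in rows:
        by_label.setdefault(row[1], []).append(row)
    smallest = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, smallest))
    return balanced


def split_train_val(rows, val_fraction=0.2, seed=42):
    """Shuffle once and slice, so no row lands in both sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]
```

Deduplicating before splitting matters here: if the same article appeared in both sets, validation scores would be inflated.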

---

## Model Training
### Hyperparameters:
- **Learning Rate**: `2e-5`
- **Batch Size**: `16`
- **Epochs**: `2`
- **Weight Decay**: `0.01`
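
For reference, the hyperparameters above map onto Hugging Face `TrainingArguments` roughly as follows. This is a sketch under assumptions: the README does not say whether the batch size is per device, and `output_dir` and the helper name `make_training_args` are hypothetical.

```python
# Hyperparameters from the list above, keyed by TrainingArguments field names.
TRAINING_KWARGS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,  # assumption: batch size is per device
    "num_train_epochs": 2,
    "weight_decay": 0.01,
}


def make_training_args(output_dir="./results"):
    """Build TrainingArguments; output_dir is a hypothetical path."""
    # Imported lazily so the constants above can be inspected
    # without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

The resulting object would be passed to `Trainer(model=..., args=make_training_args(), ...)` together with the tokenized training and validation datasets.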

### Performance:
- **Training Loss**: 0.0556
- **Validation Accuracy**: 99.35%
- **Validation F1 Score**: 99.34%

---

## Results
### Confusion Matrix:
| | Predicted Reliable | Predicted Unreliable |
|---------------|--------------------|----------------------|
| **Actual Reliable** | 2055 | 11 |
| **Actual Unreliable** | 15 | 1944 |
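
The reported accuracy and F1 score can be recomputed from this matrix. The sketch below assumes the unreliable class (label `1`) is treated as the positive class; under that assumption the results are consistent with the validation figures reported in this README.

```python
# Counts taken from the confusion matrix above.
# Assumption: "Unreliable" (label 1) is the positive class.
true_negatives = 2055   # actual reliable, predicted reliable
false_positives = 11    # actual reliable, predicted unreliable
false_negatives = 15    # actual unreliable, predicted reliable
true_positives = 1944   # actual unreliable, predicted unreliable

total = true_negatives + false_positives + false_negatives + true_positives
accuracy = (true_positives + true_negatives) / total
precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```

The counts also show that the validation set holds 4,025 articles, of which 26 are misclassified.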

### Visualization:
- Plotted training loss and evaluation accuracy over epochs.
- The high validation accuracy suggests the model generalizes well to held-out articles.

---

## Deployment
### Backend:
- Flask REST API serving predictions via POST requests.
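
One way the backend could turn the classifier's raw output into the label and confidence score returned by `/predict` is a softmax over the two logits. This is a hypothetical sketch: the function names, the response keys, and the use of the softmax probability as the confidence score are assumptions, not the project's confirmed implementation.

```python
import math

# Label mapping as documented for the /predict endpoint.
LABEL_NAMES = {0: "Reliable", 1: "Unreliable"}


def softmax(logits):
    """Convert raw logits to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_prediction(logits):
    """Pick the most probable class and report its probability as confidence."""
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return {
        "label": label,
        "label_name": LABEL_NAMES[label],
        "confidence": probs[label],
    }
```

In a Flask route, the dictionary returned by `to_prediction` could be passed to `jsonify` to form the response body.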

### Hosting:
- Backend deployed on [Render.com](https://render.com).
- Model weights stored in a Hugging Face Space for easy access.

---

## Acknowledgments
- Hugging Face for providing pre-trained RoBERTa and the Trainer API.
- Render.com for hosting the Flask backend.

---