https://github.com/sevilaymuni/flask-app-roberta-detect-news
Flask Deployment: RoBERTa-Unreliable News Detection App
app-deployment data-preprocess flask-api huggingface-api huggingface-spaces pytorch roberta-model transformers webapp
- Host: GitHub
- URL: https://github.com/sevilaymuni/flask-app-roberta-detect-news
- Owner: SevilayMuni
- License: mit
- Created: 2024-12-17T13:10:41.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-12-22T19:22:46.000Z (5 months ago)
- Last Synced: 2025-04-08T04:32:06.786Z (about 2 months ago)
- Topics: app-deployment, data-preprocess, flask-api, huggingface-api, huggingface-spaces, pytorch, roberta-model, transformers, webapp
- Language: Jupyter Notebook
- Homepage: https://roberta-news-detection-app.onrender.com
- Size: 6.62 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Detecting Unreliable News with Fine-Tuned RoBERTa
This project fine-tunes the [RoBERTa](https://arxiv.org/abs/1907.11692) model for detecting unreliable news articles. The application classifies news articles as **reliable** or **unreliable** by leveraging a pre-trained transformer model and a labeled dataset. The backend is implemented using Flask and deployed on [Render.com](https://render.com), with the fine-tuned model stored on [Hugging Face Space](https://huggingface.co/).
## Features
- **Model**: Fine-tuned [roberta-base](https://huggingface.co/roberta-base) for binary classification.
- **Dataset**: Labeled dataset with news article attributes: title, text, and reliability label.
- **Deployment**: Flask-based API hosted on Render; model stored on Hugging Face for easy access.
- **Evaluation**: Achieves over **99% accuracy** and F1 score on the validation dataset.

## Table of Contents
1. [Installation](#installation)
2. [Usage](#usage)
3. [Project Workflow](#project-workflow)
4. [Model Training](#model-training)
5. [Results](#results)
6. [Deployment](#deployment)
7. [Acknowledgments](#acknowledgments)

---
## Installation
1. Clone the repository:
```bash
git clone https://github.com/sevilaymuni/flask-app-roberta-detect-news.git
cd flask-app-roberta-detect-news
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Set up Hugging Face authentication:
```bash
export HF_TOKEN=your_huggingface_token
```
4. Configure Flask environment variables:
```bash
export FLASK_APP=app.py
# FLASK_ENV was removed in Flask 2.3; on newer versions use `export FLASK_DEBUG=1` instead
export FLASK_ENV=development
```

---
## Usage
Run the Flask server:
```bash
flask run
```
Access the API at `http://localhost:5000`, or use the deployed version on Render at https://roberta-news-detection-app.onrender.com.

### API Endpoints
- **`POST /predict`**
  - Input: JSON with `title` and `text`.
  - Output: Predicted label (`0` = Reliable, `1` = Unreliable) and confidence score.

---
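A minimal client sketch for the `POST /predict` endpoint, using only the Python standard library. The request body follows the documented input (`title` and `text`). The helper names `build_request` and `predict` are hypothetical, and the JSON keys of the response are not documented here, so the sketch returns the decoded response as-is rather than assuming field names.

```python
import json
import urllib.request


def build_request(base_url, title, text):
    """Build a JSON POST request for the /predict endpoint."""
    body = json.dumps({"title": title, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url, title, text, timeout=30.0):
    """Send the request and return the decoded JSON response unchanged."""
    request = build_request(base_url, title, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running locally, `predict("http://localhost:5000", "Some headline", "Article body")` would return whatever JSON the API sends back.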
## Project Workflow
1. **Data Preprocessing**:
   - Cleaned text by removing special characters and URLs.
   - Combined `title` and `text` fields for holistic context.
   - Removed duplicates and balanced classes.
2. **Data Splitting**:
   - 80/20 split for training and validation.
   - Ensured no overlap between sets.
3. **Model Training**:
   - Used Hugging Face's `Trainer` API.
   - Fine-tuned RoBERTa for 2 epochs with a learning rate of `2e-5`.
4. **Evaluation**:
   - Metrics: Accuracy and F1 Score.
   - Plotted confusion matrix and performance graphs.

---
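The preprocessing and splitting steps of the workflow could be sketched as follows. This is an illustration under stated assumptions, not the notebook's actual code: "special characters" is taken to mean anything other than letters, digits, and whitespace, duplicates are matched on the cleaned combined text, and the function names, regexes, and seed are hypothetical. Class balancing is not shown because the method used is not described.

```python
import random
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Assumption: "special characters" = anything except letters, digits, whitespace.
NON_ALNUM_RE = re.compile(r"[^A-Za-z0-9\s]")


def clean_text(raw):
    """Strip URLs and special characters, then collapse whitespace."""
    without_urls = URL_RE.sub(" ", raw)
    without_special = NON_ALNUM_RE.sub(" ", without_urls)
    return " ".join(without_special.split())


def prepare(records):
    """Combine title and text, clean them, and drop exact duplicates."""
    seen = set()
    examples = []
    for rec in records:
        combined = clean_text(f"{rec['title']} {rec['text']}")
        if combined and combined not in seen:
            seen.add(combined)
            examples.append((combined, rec["label"]))
    return examples


def split_train_val(examples, val_fraction=0.2, seed=42):
    """Shuffle once, then cut into disjoint training and validation lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_val = int(round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]
```

Deduplicating before the split is what keeps the two sets from sharing an article: each cleaned text appears once, so it can land on only one side of the cut.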
## Model Training
### Hyperparameters:
- **Learning Rate**: `2e-5`
- **Batch Size**: `16`
- **Epochs**: `2`
- **Weight Decay**: `0.01`

### Performance:
- **Training Loss**: 0.0556
- **Validation Accuracy**: 99.35%
- **Validation F1 Score**: 99.34%

---
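As a rough sketch, the hyperparameters listed above map onto `TrainingArguments` keyword names from the `transformers` library as shown below. Two assumptions are made: the batch size of 16 is per device and is used for both training and evaluation, and training runs on a single device without gradient accumulation. The helper names and `output_dir` value are hypothetical; the notebook's actual arguments may differ.

```python
import math

# Keys are TrainingArguments parameter names; values come from the list above.
HYPERPARAMS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,  # assumption: 16 is per device
    "per_device_eval_batch_size": 16,   # assumption: same size for evaluation
    "num_train_epochs": 2,
    "weight_decay": 0.01,
}


def total_optimizer_steps(n_train_examples, params=HYPERPARAMS):
    """Optimizer steps for one device with no gradient accumulation."""
    per_epoch = math.ceil(n_train_examples / params["per_device_train_batch_size"])
    return per_epoch * params["num_train_epochs"]


def make_training_args(output_dir):
    """Build TrainingArguments; transformers is imported only when called."""
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)
```

For a hypothetical training set of 16,000 examples, `total_optimizer_steps(16000)` gives 2,000 steps across the 2 epochs.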
## Results
### Confusion Matrix:
| | Predicted Reliable | Predicted Unreliable |
|---------------|--------------------|----------------------|
| **Actual Reliable** | 2055 | 11 |
| **Actual Unreliable** | 15 | 1944 |

### Visualization:
- Plotted training loss and evaluation accuracy over epochs.
- Highlighted the model's strong generalization capabilities.

---
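The reported validation metrics can be recomputed from the confusion matrix in this section. The sketch below assumes "unreliable" (label `1`) is the positive class and that the F1 score is the binary F1 for that class. Under those assumptions the numbers are consistent with the reported 99.35% accuracy and 99.34% F1.

```python
# Counts taken from the confusion matrix; positive class = unreliable (label 1).
TN, FP = 2055, 11   # actual reliable: predicted reliable / predicted unreliable
FN, TP = 15, 1944   # actual unreliable: predicted reliable / predicted unreliable


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    return tp / (tp + fp)


def recall(tp, fn):
    return tp / (tp + fn)


def f1_score(tp, fp, fn):
    # Harmonic mean of precision and recall, written in count form.
    return 2 * tp / (2 * tp + fp + fn)


print(f"accuracy:  {accuracy(TP, TN, FP, FN):.2%}")
print(f"precision: {precision(TP, FP):.2%}")
print(f"recall:    {recall(TP, FN):.2%}")
print(f"f1:        {f1_score(TP, FP, FN):.2%}")
```

In count form, accuracy is 3999 / 4025 and F1 is 3888 / 3914.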
## Deployment
### Backend:
- Flask REST API serving predictions via POST requests.

### Hosting:
- Backend deployed on [Render.com](https://render.com).
- Model weights stored on Hugging Face Space for easy accessibility.

---
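One plausible way the backend could turn the classifier's two output logits into the label and confidence score returned by `/predict` is a softmax followed by an argmax. This is an assumption about how the confidence is defined, not a description of the deployed code, and the response keys used here (`label`, `label_name`, `confidence`) are hypothetical.

```python
import math

LABELS = {0: "Reliable", 1: "Unreliable"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_prediction(logits):
    """Pick the most probable class and report its probability as confidence."""
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return {
        "label": label,
        "label_name": LABELS[label],
        "confidence": probs[label],
    }
```

A Flask route would pass the model's logits for one article through `to_prediction` and return the resulting dictionary as JSON.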
## Acknowledgments
- Hugging Face for providing pre-trained RoBERTa and the Trainer API.
- Render.com for hosting the Flask backend.

---