https://github.com/md-emon-hasan/informatruth

Fine-tuned roberta-base classifier on the LIAR dataset. Accepts multiple input types (text, URLs, and PDFs) and outputs a prediction with a confidence score. It also uses google/flan-t5-base to generate explanations and LangGraph to orchestrate agents for planning, retrieval, execution, fallback, and reasoning.

ai-webapp confidence-score document-classification end-to-end-ml-workflows fake-news-detection fine-tuning flan-t5 huggingface-transformers machine-learning misinformation-detection natural-language-processing news-analysis news-classification roberta sequence-classification text-analysis text-classification transformers truth-verification url-parser


Host: GitHub
URL: https://github.com/md-emon-hasan/informatruth
Owner: Md-Emon-Hasan
License: mit
Created: 2024-08-27T06:44:18.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-10-03T10:24:47.000Z (4 months ago)
Last Synced: 2025-10-03T12:23:28.965Z (4 months ago)
Topics: ai-webapp, confidence-score, document-classification, end-to-end-ml-workflows, fake-news-detection, fine-tuning, flan-t5, huggingface-transformers, machine-learning, misinformation-detection, natural-language-processing, news-analysis, news-classification, roberta, sequence-classification, text-analysis, text-classification, transformers, truth-verification, url-parser
Language: Jupyter Notebook
Homepage: https://informatruth.onrender.com
Size: 9.61 MB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: news/news.pdf
- License: LICENSE


# 📘 InformaTruth: AI-Driven News Authenticity Analyzer
InformaTruth is an end-to-end AI-powered multi-agent fact-checking system that automatically verifies news articles, PDFs, and web content. It leverages RoBERTa fine-tuning, LangGraph orchestration, RAG pipelines, and fallback retrieval agents to deliver reliable, context-aware verification. The system features a modular multi-agent architecture including Planner, Retriever, Generator, Memory, and Fallback Agents, integrating diverse tools for comprehensive reasoning.

It achieves ~70% accuracy and ~69% F1 on the LIAR dataset, with 95% query coverage and ~60% improved reliability through intelligent tool routing and memory integration. Designed for real-world deployment, InformaTruth includes a Flask-based responsive UI, FastAPI endpoints, Dockerized containers, and a CI/CD pipeline, enabling enterprise-grade automated fact verification at scale.

[![InformaTruth](https://github.com/user-attachments/assets/60bfca60-19bc-404e-9f97-a57ed6f0b5f1)](https://github.com/user-attachments/assets/60bfca60-19bc-404e-9f97-a57ed6f0b5f1)

---

## 🚀 Live Demo

🖥️ **Try it now**: [InformaTruth — Fake News Detection AI App](https://informatruth.onrender.com)

---

## ⚙️ Tech Stack
| **Category** | **Technology/Resource** |
| --- | --- |
| **Core Framework** | PyTorch, Transformers, HuggingFace |
| **Classification Model** | Fine-tuned RoBERTa-base on LIAR Dataset |
| **Explanation Model** | FLAN-T5-base (Zero-shot Prompting) |
| **Training Data** | LIAR Dataset (Political Fact-Checking) |
| **Evaluation Metrics** | Accuracy, Precision, Recall, F1-score |
| **Training Framework** | HuggingFace Trainer |
| **LangGraph Orchestration** | LangGraph (Multi-Agent Directed Acyclic Execution Graph) |
| **Agents Used** | PlannerAgent, InputHandlerAgent, ToolRouterAgent, ExecutorAgent, ExplanationAgent, FallbackSearchAgent |
| **Input Modalities** | Raw Text, Website URLs (via Newspaper3k), PDF Documents (via PyMuPDF) |
| **Tool Augmentation** | DuckDuckGo Search API (Fallback), Wikipedia (Planned), ToolRouter Logic |
| **Web Scraping** | Newspaper3k (HTML → Clean Article) |
| **PDF Parsing** | PyMuPDF |
| **Explainability** | Natural language justification generated using FLAN-T5 |
| **State Management** | Shared State Object (LangGraph-compatible) |
| **Deployment Interface** | Flask (HTML, CSS, JS) |
| **Hosting Platform** | Render (Docker) |
| **Version Control** | Git, GitHub |
| **Logging & Debugging** | Logs, Print Debugs, Custom Logger |
| **Input Support** | Text, URLs, PDF documents |
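
The classifier returns a prediction together with a confidence score. A minimal sketch of how such a score could be derived from raw classifier logits is shown here, assuming a two-class head and a softmax over its outputs. The `LABELS` order and the function names are hypothetical and are not taken from the repository.

```python
import math

# Hypothetical label order; the real mapping would come from the model's config.json.
LABELS = ["fake", "real"]


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_confidence(logits, labels=LABELS):
    """Return the most probable label and its softmax probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Example logits, made up for illustration only.
label, confidence = predict_with_confidence([0.3, 1.2])
print(f"{label} ({confidence:.1%})")
```

The confidence here is simply the probability of the winning class, so values near 50% on a two-class problem indicate an uncertain prediction.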

---

## ✅ Key Features

* **🔄 Multi-Format Input Support**
Accepts raw **text**, **web URLs**, and **PDF documents** with automated preprocessing for each type.

* **🧠 Full NLP Pipeline**
Integrates summarization (optional), **fake news classification** (RoBERTa), and **natural language explanation** (FLAN-T5).

* **🧱 Modular Agent-Based Architecture**
Built using **LangGraph** with modular agents: `Planner`, `Tool Router`, `Executor`, `Explanation Agent`, and `Fallback Agent`.

* **📜 Explanation Generation**
Uses **FLAN-T5** to generate human-readable, zero-shot rationales for model predictions.

* **🧪 Tool-Augmented & Fallback Logic**
Dynamically queries **DuckDuckGo** when local context is insufficient, enabling robust fallback handling.

* **🧼 Clean, Modular Codebase with Logging**
Structured using clean architecture principles, agent separation, and informative logging.

* **🌐 Flask with Web UI**
User-friendly, interactive, and responsive frontend for input, output, and visual explanations.

* **🐳 Dockerized for Deployment**
Fully containerized setup with `Dockerfile` and `requirements.txt` for seamless deployment.

* **⚙️ CI/CD with GitHub Actions**
Automated pipelines for testing, linting, and Docker build validation to ensure code quality and production-readiness.
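
The multi-format input support listed above implies a step that decides whether a submission is raw text, a URL, or a PDF before preprocessing. The sketch below shows one way that decision could look using only the standard library. The function name `detect_input_type` and the rules it applies are assumptions for illustration, not the logic in `agents/input_handler.py`.

```python
from urllib.parse import urlparse


def detect_input_type(value: str) -> str:
    """Classify user input as 'url', 'pdf', or 'text' (hypothetical rules)."""
    candidate = value.strip()
    try:
        parsed = urlparse(candidate)
    except ValueError:
        # Malformed URL-like strings are treated as plain text.
        return "text"
    # An http(s) scheme plus a host is treated as a web page to scrape.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    # A single-line value ending in .pdf is treated as a document path.
    if candidate.lower().endswith(".pdf") and "\n" not in candidate:
        return "pdf"
    return "text"


for sample in (
    "https://example.com/news/story",
    "news/news.pdf",
    "The senator said taxes fell last year.",
):
    print(f"{detect_input_type(sample):4} <- {sample}")
```

In this sketch a URL that points at a PDF is still routed as `url`; a real handler would need to pick a policy for that case.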

---

## 📦 Project File Structure

```bash
InformaTruth/
│
├── .github/                      # GitHub Actions
│   └── workflows/
│       └── main.yml
│
├── agents/                       # Modular agents (planner, executor, etc.)
│   ├── executor.py
│   ├── fallback_search.py
│   ├── input_handler.py
│   ├── planner.py
│   ├── router.py
│   └── __init__.py
│
├── fine_tuned_liar_detector/     # Fine-tuned RoBERTa model directory
│   ├── config.json
│   ├── vocab.json
│   ├── tokenizer_config.json
│   ├── special_tokens_map.json
│   ├── model.safetensors
│   └── merges.txt
│
├── liar_dataset/                 # Dataset used for fine-tuning
│   ├── test.tsv
│   ├── train.tsv
│   └── valid.tsv
│
├── graph/                        # LangGraph state and builder logic
│   ├── builder.py
│   ├── state.py
│   └── __init__.py
│
├── models/                       # Classification + LLM model loader
│   ├── classifier.py
│   ├── loader.py
│   └── __init__.py
│
├── news/                         # Sample news or test input
│   └── news.pdf
│
├── notebook/                     # Jupyter notebooks for experimentation
│   └── Experiments.ipynb
│
├── static/                       # Static files (CSS, JS)
│   ├── css/
│   │   └── style.css
│   └── js/
│       └── script.js
│
├── templates/                    # HTML templates for Flask UI
│   ├── dj_base.html
│   └── dj_index.html
│
├── tests/                        # Unit tests
│   └── test_app.py
│
├── train/                        # Training logic
│   ├── config.py
│   ├── data_loader.py
│   ├── predictor.py
│   ├── run.py
│   ├── trainer.py
│   └── __init__.py
│
├── utils/                        # Utilities like logging, evaluation
│   ├── logger.py
│   ├── results.py
│   └── __init__.py
│
├── __init__.py
├── app.png                       # Demo screenshot
├── demo.webm                     # Demo video
├── app.py                        # Flask app entry point
├── main.py                       # Main script / orchestrator
├── config.py                     # Configuration file
├── setup.py                      # Project setup for pip install
├── render.yaml                   # Render deployment config
├── Dockerfile                    # Docker container spec
├── requirements.txt              # Python dependencies
├── LICENSE                       # License file
├── .gitignore                    # Git ignore rules
├── .gitattributes                # Git LFS rules
└── README.md                     # Readme
```

---

## 🧱 System Architecture
```mermaid
graph TD
A[User Input] --> B{Input Type}
B -->|Text| C[Direct Text Processing]
B -->|URL| D[Newspaper3k Parser]
B -->|PDF| E[PyMuPDF Parser]

C --> F[Text Cleaner]
D --> F
E --> F

F --> G[Context Validator]
G -->|Sufficient Context| H[RoBERTa Classifier]
G -->|Insufficient Context| I[Web Search Agent]

I --> J[Context Aggregator]
J --> H

H --> K[FLAN-T5 Explanation Generator]
K --> L[Output Formatter]

L --> M[Web UI using Flask,HTML,CSS,JS]

style M fill:#e3f2fd,stroke:#90caf9
style G fill:#fff9c4,stroke:#fbc02d
style I fill:#fbe9e7,stroke:#ff8a65
style H fill:#f1f8e9,stroke:#aed581
```
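
The flow in the diagram (clean, validate context, optionally fall back to web search, classify, explain) can be sketched in plain Python with a shared state dictionary. This is an illustrative outline only: the word-count threshold, the function names, and the stub classifier, explainer, and search functions are all assumptions, and the actual project wires these steps together with LangGraph.

```python
MIN_WORDS = 8  # hypothetical threshold for "sufficient context"


def clean_text(raw):
    """Collapse runs of whitespace into single spaces."""
    return " ".join(raw.split())


def has_sufficient_context(text, min_words=MIN_WORDS):
    """Treat very short inputs as needing extra context (assumed rule)."""
    return len(text.split()) >= min_words


def run_pipeline(raw, classify, explain, search):
    """Run the diagram's steps in order, recording results in a shared state."""
    state = {"text": clean_text(raw), "used_fallback": False}
    if not has_sufficient_context(state["text"]):
        # Fallback branch: aggregate search snippets into the context.
        snippets = search(state["text"])
        state["text"] = " ".join([state["text"], *snippets])
        state["used_fallback"] = True
    state["label"], state["confidence"] = classify(state["text"])
    state["explanation"] = explain(state["text"], state["label"])
    return state


# Stand-ins for the RoBERTa classifier, FLAN-T5 explainer, and web search.
def stub_classify(text):
    return "real", 0.71


def stub_explain(text, label):
    return f"Labelled {label} based on {len(text.split())} words."


def stub_search(query):
    return ["Background snippet from a web search."]


result = run_pipeline("Taxes   fell.", stub_classify, stub_explain, stub_search)
print(result["used_fallback"], result["label"], result["explanation"])
```

Passing the model and search functions in as arguments keeps the control flow testable without loading any model weights.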

---

## 📊 Model Performance
| Epoch | Train Loss | Val Loss | Accuracy | F1 | Precision | Recall |
|-------|------------|----------|----------|--------|-----------|---------|
| 1 | 0.6353 | 0.6205 | 0.6557 | 0.6601 | 0.6663 | 0.6557 |
| 2 | 0.6132 | 0.5765 | 0.7032 | 0.6720 | 0.6817 | 0.7032 |
| 3 | 0.5957 | 0.5779 | 0.6970 | 0.6927 | 0.6899 | 0.6970 |
| 4 | 0.5781 | 0.5778 | 0.6978 | 0.6899 | 0.6864 | 0.6978 |
| 5 | 0.5599 | 0.5810 | 0.6954 | 0.6882 | 0.6846 | 0.6954 |

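> Emphasis on **Recall** is intended to help the model catch most fake news cases. In the table above, recall peaks at 0.7032 in epoch 2, while F1 peaks at 0.6927 in epoch 3.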
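
In the table above, Recall matches Accuracy in every epoch, which is what support-weighted recall produces by definition. The sketch below assumes the metrics were computed with weighted averaging (an assumption; the README does not state the averaging mode) and shows the calculation in plain Python on made-up labels.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    total = len(y_true)
    support = Counter(y_true)
    precision = recall = f1 = 0.0
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        actual = support[label]
        p_l = tp / predicted if predicted else 0.0
        r_l = tp / actual if actual else 0.0
        f_l = 2 * p_l * r_l / (p_l + r_l) if (p_l + r_l) else 0.0
        weight = actual / total  # each class counts in proportion to its support
        precision += weight * p_l
        recall += weight * r_l
        f1 += weight * f_l
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / total
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only (1 = real, 0 = fake); not LIAR data.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
metrics = weighted_metrics(y_true, y_pred)
print({name: round(value, 4) for name, value in metrics.items()})
```

Because weighted recall collapses to accuracy, per-class recall for the fake label would be the more direct measure of how many fake items are caught.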

---

## 🐳 Docker Instructions
### Step 1: Build Docker image
```bash
docker build -t informa-truth-app .
```

### Step 2: Run Docker container
```bash
docker run -p 8501:8501 informa-truth-app
```

---

## ⚙️ CI/CD Pipeline (GitHub Actions)
The CI/CD pipeline automates code checks, Docker image building, and application tests.

### Sample Workflow
```yaml
name: CI Pipeline
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v3

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install flake8 pytest

      - name: Run tests
        run: pytest tests/

      - name: Docker build
        run: docker build -t informa-truth-app .

---

## 🌐 Real-World Use Case
* Journalists and media watchdogs
* Educators and students
* Concerned citizens and digital media consumers
* Social media platforms for content moderation

---

## 👤 Author
**Md Emon Hasan**
📧 iconicemon01@gmail.com
🔗 [GitHub](https://github.com/Md-Emon-Hasan)
🔗 [LinkedIn](https://www.linkedin.com/in/md-emon-hasan-695483237/)
🔗 [Facebook](https://www.facebook.com/mdemon.hasan2001/)
🔗 [WhatsApp](https://wa.me/8801834363533)

---
