Fine-tuned roberta-base classifier on the LIAR dataset. Accepts multiple input types (text, URLs, and PDFs) and outputs a prediction with a confidence score. It also leverages google/flan-t5-base to generate explanations and uses an agentic AI workflow built with LangGraph to orchestrate agents for planning, retrieval, execution, fallback, and reasoning.
- Host: GitHub
- URL: https://github.com/md-emon-hasan/informatruth
- Owner: Md-Emon-Hasan
- License: mit
- Created: 2024-08-27T06:44:18.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-10-03T10:24:47.000Z (4 months ago)
- Last Synced: 2025-10-03T12:23:28.965Z (4 months ago)
- Topics: ai-webapp, confidence-score, document-classification, end-to-end-ml-workflows, fake-news-detection, fine-tuning, flan-t5, huggingface-transformers, machine-learning, misinformation-detection, natural-language-processing, news-analysis, news-classification, roberta, sequence-classification, text-analysis, text-classification, transformers, truth-verification, url-parser
- Language: Jupyter Notebook
- Homepage: https://informatruth.onrender.com
- Size: 9.61 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: news/news.pdf
- License: LICENSE
README
# InformaTruth: AI-Driven News Authenticity Analyzer
InformaTruth is an end-to-end AI-powered multi-agent fact-checking system that automatically verifies news articles, PDFs, and web content. It leverages RoBERTa fine-tuning, LangGraph orchestration, RAG pipelines, and fallback retrieval agents to deliver reliable, context-aware verification. The system features a modular multi-agent architecture including Planner, Retriever, Generator, Memory, and Fallback Agents, integrating diverse tools for comprehensive reasoning.
It achieves ~70% accuracy and ~69% F1 on the LIAR dataset, with 95% query coverage and roughly 60% improved reliability through intelligent tool routing and memory integration. Designed for real-world deployment, InformaTruth includes a Flask-based responsive UI, FastAPI endpoints, Dockerized containers, and a CI/CD pipeline, enabling enterprise-grade automated fact verification at scale.
[Demo](https://github.com/user-attachments/assets/60bfca60-19bc-404e-9f97-a57ed6f0b5f1)
---
## Live Demo
**Try it now**: [InformaTruth – Fake News Detection AI App](https://informatruth.onrender.com)
---
## Tech Stack
| **Category** | **Technology/Resource** |
| --------------------------- | ------------------------------------------------------------------------------------------------------ |
| **Core Framework** | PyTorch, Transformers, HuggingFace |
| **Classification Model** | Fine-tuned RoBERTa-base on LIAR Dataset |
| **Explanation Model** | FLAN-T5-base (Zero-shot Prompting) |
| **Training Data** | LIAR Dataset (Political Fact-Checking) |
| **Evaluation Metrics** | Accuracy, Precision, Recall, F1-score |
| **Training Framework** | HuggingFace Trainer |
| **LangGraph Orchestration** | LangGraph (Multi-Agent Directed Acyclic Execution Graph) |
| **Agents Used** | PlannerAgent, InputHandlerAgent, ToolRouterAgent, ExecutorAgent, ExplanationAgent, FallbackSearchAgent |
| **Input Modalities** | Raw Text, Website URLs (via Newspaper3k), PDF Documents (via PyMuPDF) |
| **Tool Augmentation** | DuckDuckGo Search API (Fallback), Wikipedia (Planned), ToolRouter Logic |
| **Web Scraping**            | Newspaper3k (HTML → clean article text)                                                                  |
| **PDF Parsing** | PyMuPDF |
| **Explainability** | Natural language justification generated using FLAN-T5 |
| **State Management** | Shared State Object (LangGraph-compatible) |
| **Deployment Interface**    | Flask (HTML, CSS, JS)                                                                                    |
| **Hosting Platform** | Render (Docker) |
| **Version Control** | Git, GitHub |
| **Logging & Debugging** | Logs, Print Debugs, Custom Logger |
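
The classification and explanation rows above map to a small amount of `transformers` code. The sketch below is illustrative rather than the repository's exact code: the checkpoint path `fine_tuned_liar_detector`, the label names, and the prompt wording are assumptions.

```python
# Minimal sketch: classify a claim with the fine-tuned RoBERTa checkpoint and
# ask google/flan-t5-base for a short zero-shot justification.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

MODEL_DIR = "fine_tuned_liar_detector"  # assumed path to the fine-tuned model

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
classifier = AutoModelForSequenceClassification.from_pretrained(MODEL_DIR)
explainer = pipeline("text2text-generation", model="google/flan-t5-base")

def classify(text: str) -> tuple[str, float]:
    """Return (label, confidence) via a softmax over the classifier logits."""
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        probs = torch.softmax(classifier(**inputs).logits, dim=-1)[0]
    idx = int(probs.argmax())
    label = classifier.config.id2label.get(idx, str(idx))  # e.g. "FAKE" / "REAL" (assumed)
    return label, float(probs[idx])

def explain(text: str, label: str) -> str:
    """Generate a human-readable rationale for the predicted label."""
    prompt = f"The following claim was classified as {label}. Briefly explain why:\n{text}"
    return explainer(prompt, max_new_tokens=96)[0]["generated_text"]

claim = "The unemployment rate doubled last year."
label, confidence = classify(claim)
print(label, round(confidence, 3))
print(explain(claim, label))
```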
---
## Key Features
* **Multi-Format Input Support**
  Accepts raw **text**, **web URLs**, and **PDF documents**, with automated preprocessing for each type.
* **Full NLP Pipeline**
  Integrates optional summarization, **fake news classification** (RoBERTa), and **natural language explanation** (FLAN-T5).
* **Modular Agent-Based Architecture**
  Built with **LangGraph** using modular agents: `Planner`, `Tool Router`, `Executor`, `Explanation Agent`, and `Fallback Agent` (a minimal orchestration sketch follows this list).
* **Explanation Generation**
  Uses **FLAN-T5** to generate human-readable, zero-shot rationales for model predictions.
* **Tool-Augmented & Fallback Logic**
  Dynamically queries **DuckDuckGo** when local context is insufficient, enabling robust fallback handling.
* **Clean, Modular Codebase with Logging**
  Structured around clean-architecture principles, agent separation, and informative logging.
* **Flask Web UI**
  User-friendly, interactive, and responsive frontend for input, output, and visual explanations.
* **Dockerized for Deployment**
  Fully containerized setup with a `Dockerfile` and `requirements.txt` for seamless deployment.
* **CI/CD with GitHub Actions**
  Automated pipelines for testing, linting, and Docker build validation to ensure code quality and production readiness.
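
The agent-based architecture above can be expressed as a LangGraph state graph. The following is a minimal sketch, assuming illustrative node names, a simplified `SharedState`, and a context-length heuristic for the fallback route; the actual graph is defined in `graph/builder.py` and `graph/state.py`.

```python
# Minimal LangGraph sketch of the planner -> input handler -> (fallback?) ->
# executor -> explainer flow. Node bodies are stubs; only the wiring is shown.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class SharedState(TypedDict, total=False):
    raw_input: str     # text, URL, or PDF path
    context: str       # cleaned article text
    label: str         # classifier verdict
    confidence: float
    explanation: str

def planner(state: SharedState) -> dict:
    return {}  # decide which steps to run

def input_handler(state: SharedState) -> dict:
    return {}  # text / URL / PDF extraction

def fallback_search(state: SharedState) -> dict:
    return {}  # DuckDuckGo retrieval

def executor(state: SharedState) -> dict:
    return {}  # RoBERTa classification

def explainer(state: SharedState) -> dict:
    return {}  # FLAN-T5 rationale

def needs_fallback(state: SharedState) -> str:
    # Route to web search when the extracted context is too thin to classify.
    return "fallback_search" if len(state.get("context", "")) < 200 else "executor"

graph = StateGraph(SharedState)
for name, node in [
    ("planner", planner),
    ("input_handler", input_handler),
    ("fallback_search", fallback_search),
    ("executor", executor),
    ("explainer", explainer),
]:
    graph.add_node(name, node)

graph.set_entry_point("planner")
graph.add_edge("planner", "input_handler")
graph.add_conditional_edges("input_handler", needs_fallback)
graph.add_edge("fallback_search", "executor")
graph.add_edge("executor", "explainer")
graph.add_edge("explainer", END)

app = graph.compile()
# app.invoke({"raw_input": "https://example.com/some-article"})
```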
---
## Project File Structure
```bash
InformaTruth/
│
├── .github/                        # GitHub Actions
│   └── workflows/
│       └── main.yml
│
├── agents/                         # Modular agents (planner, executor, etc.)
│   ├── executor.py
│   ├── fallback_search.py
│   ├── input_handler.py
│   ├── planner.py
│   ├── router.py
│   └── __init__.py
│
├── fine_tuned_liar_detector/       # Fine-tuned RoBERTa model directory
│   ├── config.json
│   ├── vocab.json
│   ├── tokenizer_config.json
│   ├── special_tokens_map.json
│   ├── model.safetensors
│   └── merges.txt
│
├── liar_dataset/                   # Dataset for fine-tuning
│   ├── test.tsv
│   ├── train.tsv
│   └── valid.tsv
│
├── graph/                          # LangGraph state and builder logic
│   ├── builder.py
│   ├── state.py
│   └── __init__.py
│
├── models/                         # Classification + LLM model loader
│   ├── classifier.py
│   ├── loader.py
│   └── __init__.py
│
├── news/                           # Sample news or test input
│   └── news.pdf
│
├── notebook/                       # Jupyter notebooks for experimentation
│   └── Experiments.ipynb
│
├── static/                         # Static files (CSS, JS)
│   ├── css/
│   │   └── style.css
│   └── js/
│       └── script.js
│
├── templates/                      # HTML templates for Flask UI
│   ├── dj_base.html
│   └── dj_index.html
│
├── tests/                          # Unit tests
│   └── test_app.py
│
├── train/                          # Training logic
│   ├── config.py
│   ├── data_loader.py
│   ├── predictor.py
│   ├── run.py
│   ├── trainer.py
│   └── __init__.py
│
├── utils/                          # Utilities like logging, evaluation
│   ├── logger.py
│   ├── results.py
│   └── __init__.py
│
├── __init__.py
├── app.png                         # Demo screenshot
├── demo.webm                       # Demo video
├── app.py                          # Flask app entry point
├── main.py                         # Main script / orchestrator
├── config.py                       # Configuration file
├── setup.py                        # Project setup for pip install
├── render.yaml                     # Render deployment config
├── Dockerfile                      # Docker container spec
├── requirements.txt                # Python dependencies
├── LICENSE                         # License file
├── .gitignore                      # Git ignore rules
├── .gitattributes                  # Git LFS rules
└── README.md                       # Readme
```
---
## System Architecture
```mermaid
graph TD
A[User Input] --> B{Input Type}
B -->|Text| C[Direct Text Processing]
B -->|URL| D[Newspaper3k Parser]
B -->|PDF| E[PyMuPDF Parser]
C --> F[Text Cleaner]
D --> F
E --> F
F --> G[Context Validator]
G -->|Sufficient Context| H[RoBERTa Classifier]
G -->|Insufficient Context| I[Web Search Agent]
I --> J[Context Aggregator]
J --> H
H --> K[FLAN-T5 Explanation Generator]
K --> L[Output Formatter]
L --> M[Web UI using Flask, HTML, CSS, JS]
style M fill:#e3f2fd,stroke:#90caf9
style G fill:#fff9c4,stroke:#fbc02d
style I fill:#fbe9e7,stroke:#ff8a65
style H fill:#f1f8e9,stroke:#aed581
```
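
The parser and fallback nodes in the diagram correspond roughly to the helpers sketched below. The function names and the 200-character context threshold are illustrative assumptions (the real logic lives in `agents/input_handler.py` and `agents/fallback_search.py`), and the snippet assumes the `newspaper3k`, `PyMuPDF`, and `duckduckgo_search` packages.

```python
# Illustrative input-extraction and fallback helpers for the pipeline above.
import fitz                        # PyMuPDF
from newspaper import Article      # Newspaper3k
from duckduckgo_search import DDGS

def extract_from_url(url: str) -> str:
    """Download a web page and parse it into clean article text."""
    article = Article(url)
    article.download()
    article.parse()
    return article.text

def extract_from_pdf(path: str) -> str:
    """Concatenate the text of every page in a PDF document."""
    with fitz.open(path) as doc:
        return "".join(page.get_text() for page in doc)

def fallback_search(query: str, max_results: int = 5) -> str:
    """Fetch web snippets via DuckDuckGo when local context is insufficient."""
    with DDGS() as ddgs:
        hits = ddgs.text(query, max_results=max_results)
        return "\n".join(hit["body"] for hit in hits)

def build_context(raw: str) -> str:
    """Dispatch on input type, falling back to web search for thin context."""
    if raw.lower().startswith(("http://", "https://")):
        text = extract_from_url(raw)
    elif raw.lower().endswith(".pdf"):
        text = extract_from_pdf(raw)
    else:
        text = raw
    return text if len(text) >= 200 else text + "\n" + fallback_search(raw)
```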
---
## Model Performance
| Epoch | Train Loss | Val Loss | Accuracy | F1 | Precision | Recall |
|-------|------------|----------|----------|--------|-----------|---------|
| 1 | 0.6353 | 0.6205 | 0.6557 | 0.6601 | 0.6663 | 0.6557 |
| 2 | 0.6132 | 0.5765 | 0.7032 | 0.6720 | 0.6817 | 0.7032 |
| 3 | 0.5957 | 0.5779 | 0.6970 | 0.6927 | 0.6899 | 0.6970 |
| 4 | 0.5781 | 0.5778 | 0.6978 | 0.6899 | 0.6864 | 0.6978 |
| 5 | 0.5599 | 0.5810 | 0.6954 | 0.6882 | 0.6846 | 0.6954 |
> Emphasis on **Recall** ensures the model catches most fake news cases.
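
The metrics above can be produced with a `compute_metrics` callback passed to the Hugging Face `Trainer`. The sketch below assumes weighted averaging (which makes recall equal to accuracy, consistent with the table); the exact implementation in `train/trainer.py` and `utils/results.py` may differ.

```python
# Sketch of a Trainer-compatible metrics callback:
#   Trainer(..., compute_metrics=compute_metrics)
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    preds = np.argmax(eval_pred.predictions, axis=-1)
    labels = eval_pred.label_ids
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted", zero_division=0
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1,
        "precision": precision,
        "recall": recall,
    }
```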
---
## Docker Instructions
### Step 1: Build Docker image
```bash
docker build -t informa-truth-app .
```
### Step 2: Run Docker container
```bash
docker run -p 8501:8501 informa-truth-app
```
---
## CI/CD Pipeline (GitHub Actions)
The CI/CD pipeline automates code checks, test execution, and Docker image build validation for the Flask app.
### Sample Workflow
```yaml
name: CI Pipeline
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v3
      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install flake8 pytest
      - name: Run tests
        run: pytest tests/
      - name: Docker build
        run: docker build -t informa-truth-app .
```
---
## Real-World Use Cases
* Journalists and media watchdogs
* Educators and students
* Concerned citizens and digital media consumers
* Social media platforms for content moderation
---
## Author
**Md Emon Hasan**
- Email: iconicemon01@gmail.com
- [GitHub](https://github.com/Md-Emon-Hasan)
- [LinkedIn](https://www.linkedin.com/in/md-emon-hasan-695483237/)
- [Facebook](https://www.facebook.com/mdemon.hasan2001/)
- [WhatsApp](https://wa.me/8801834363533)
---