https://github.com/md-emon-hasan/informatruth

A roberta-base classifier fine-tuned on the LIAR dataset. It accepts multiple input types (text, URLs, and PDFs) and outputs a prediction with a confidence score. It also uses google/flan-t5-base to generate explanations and agentic AI with LangGraph to orchestrate agents for planning, retrieval, execution, fallback, and reasoning.


# 📘 InformaTruth: AI-Driven News Authenticity Analyzer
InformaTruth is an end-to-end AI-powered multi-agent fact-checking system that automatically verifies news articles, PDFs, and web content. It leverages RoBERTa fine-tuning, LangGraph orchestration, RAG pipelines, and fallback retrieval agents to deliver reliable, context-aware verification. The system features a modular multi-agent architecture including Planner, Retriever, Generator, Memory, and Fallback Agents, integrating diverse tools for comprehensive reasoning.

It achieves ~70% accuracy and F1 ~69% on the LIAR dataset, with 95% query coverage and ~60% improved reliability through intelligent tool routing and memory integration. Designed for real-world deployment, InformaTruth includes a Flask-based responsive UI, FastAPI endpoints, Dockerized containers, and a CI/CD pipeline, enabling enterprise-grade automated fact verification at scale.

[![InformaTruth](https://github.com/user-attachments/assets/60bfca60-19bc-404e-9f97-a57ed6f0b5f1)](https://github.com/user-attachments/assets/60bfca60-19bc-404e-9f97-a57ed6f0b5f1)

---

## 🚀 Live Demo

🖥️ **Try it now**: [InformaTruth - Fake News Detection AI App](https://informatruth.onrender.com)

---

## ⚙️ Tech Stack
| **Category** | **Technology/Resource** |
| --------------------------- | ------------------------------------------------------------------------------------------------------ |
| **Core Framework** | PyTorch, Transformers, HuggingFace |
| **Classification Model** | Fine-tuned RoBERTa-base on LIAR Dataset |
| **Explanation Model** | FLAN-T5-base (Zero-shot Prompting) |
| **Training Data** | LIAR Dataset (Political Fact-Checking) |
| **Evaluation Metrics** | Accuracy, Precision, Recall, F1-score |
| **Training Framework** | HuggingFace Trainer |
| **LangGraph Orchestration** | LangGraph (Multi-Agent Directed Acyclic Execution Graph) |
| **Agents Used** | PlannerAgent, InputHandlerAgent, ToolRouterAgent, ExecutorAgent, ExplanationAgent, FallbackSearchAgent |
| **Input Modalities** | Raw Text, Website URLs (via Newspaper3k), PDF Documents (via PyMuPDF) |
| **Tool Augmentation** | DuckDuckGo Search API (Fallback), Wikipedia (Planned), ToolRouter Logic |
| **Web Scraping** | Newspaper3k (HTML → Clean Article) |
| **PDF Parsing** | PyMuPDF |
| **Explainability** | Natural language justification generated using FLAN-T5 |
| **State Management** | Shared State Object (LangGraph-compatible) |
| **Deployment Interface** | Flask (HTML, CSS, JS) |
| **Hosting Platform** | Render (Docker) |
| **Version Control** | Git, GitHub |
| **Logging & Debugging** | Logs, Print Debugs, Custom Logger |
| **Input Support** | Text, URLs, PDF documents |

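The classifier and explanation rows above can be connected with a small sketch: raw classifier logits become a label plus a confidence score, and that result is placed into a zero-shot prompt for the explanation model. This is a minimal illustration and not the project's code. The label names, the use of softmax probability as the confidence, and the prompt wording are all assumptions, and the logits are made-up numbers standing in for real RoBERTa output.

```python
import math

# Hypothetical label names; the README does not list the classifier's label set.
LABELS = ["FAKE", "REAL"]


def softmax(logits):
    """Convert raw classifier logits into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_confidence(logits):
    """Return the most probable label and its probability (assumed confidence)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


def build_explanation_prompt(text, label, confidence):
    """Assemble a zero-shot instruction prompt for a text-to-text explainer model."""
    return (
        f"The following news text was classified as {label} "
        f"with confidence {confidence:.2f}.\n"
        f"Text: {text}\n"
        "Explain briefly why this classification is plausible."
    )


# Made-up logits standing in for the fine-tuned model's output.
label, confidence = predict_with_confidence([0.2, 1.3])
prompt = build_explanation_prompt("The moon is made of cheese.", label, confidence)
print(label, round(confidence, 2))
print(prompt)
```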
---

## ✅ Key Features

* **🔄 Multi-Format Input Support**
Accepts raw **text**, **web URLs**, and **PDF documents** with automated preprocessing for each type.

* **🧠 Full NLP Pipeline**
Integrates summarization (optional), **fake news classification** (RoBERTa), and **natural language explanation** (FLAN-T5).

* **🧱 Modular Agent-Based Architecture**
Built using **LangGraph** with modular agents: `Planner`, `Tool Router`, `Executor`, `Explanation Agent`, and `Fallback Agent`.

* **📜 Explanation Generation**
Uses **FLAN-T5** to generate human-readable, zero-shot rationales for model predictions.

* **🧪 Tool-Augmented & Fallback Logic**
Dynamically queries **DuckDuckGo** when local context is insufficient, enabling robust fallback handling.

* **🧼 Clean, Modular Codebase with Logging**
Structured using clean architecture principles, agent separation, and informative logging.

* **🌐 Flask with Web UI**
User-friendly, interactive, and responsive frontend for input, output, and visual explanations.

* **🐳 Dockerized for Deployment**
Fully containerized setup with `Dockerfile` and `requirements.txt` for seamless deployment.

* **⚙️ CI/CD with GitHub Actions**
Automated pipelines for testing, linting, and Docker build validation to ensure code quality and production-readiness.

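The multi-format input support listed above implies a step that decides whether a submission is text, a URL, or a PDF before preprocessing. The sketch below shows one way this could work using only the standard library. The function names and the detection rules (an `http`/`https` scheme for URLs, a single token ending in `.pdf` for documents) are assumptions for illustration, not the logic in `agents/input_handler.py`. `route_input` only names the handler; it does not call Newspaper3k or PyMuPDF.

```python
from pathlib import Path
from urllib.parse import urlparse


def detect_input_type(user_input: str) -> str:
    """Classify raw user input as 'url', 'pdf', or 'text' (assumed rules)."""
    candidate = user_input.strip()
    parsed = urlparse(candidate)
    # Treat only well-formed http(s) addresses as URLs.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    # A single token ending in .pdf is treated as a path to a PDF document.
    if " " not in candidate and Path(candidate).suffix.lower() == ".pdf":
        return "pdf"
    return "text"


def route_input(user_input: str) -> str:
    """Name the preprocessing step that would handle this input."""
    handlers = {
        "url": "Newspaper3k parser",
        "pdf": "PyMuPDF parser",
        "text": "direct text processing",
    }
    return handlers[detect_input_type(user_input)]


for sample in (
    "https://example.com/news/article",
    "news/news.pdf",
    "The senator said taxes fell last year.",
):
    print(f"{sample!r} -> {route_input(sample)}")
```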
---

## 📦 Project File Structure

```bash
InformaTruth/
│
├── .github/                        # GitHub Actions
│   └── workflows/
│       └── main.yml
│
├── agents/                         # Modular agents (planner, executor, etc.)
│   ├── executor.py
│   ├── fallback_search.py
│   ├── input_handler.py
│   ├── planner.py
│   ├── router.py
│   └── __init__.py
│
├── fine_tuned_liar_detector/       # Fine-tuned RoBERTa model directory
│   ├── config.json
│   ├── vocab.json
│   ├── tokenizer_config.json
│   ├── special_tokens_map.json
│   ├── model.safetensors
│   └── merges.txt
│
├── liar_dataset/                   # Dataset for fine-tuning
│   ├── test.tsv
│   ├── train.tsv
│   └── valid.tsv
│
├── graph/                          # LangGraph state and builder logic
│   ├── builder.py
│   ├── state.py
│   └── __init__.py
│
├── models/                         # Classification + LLM model loader
│   ├── classifier.py
│   ├── loader.py
│   └── __init__.py
│
├── news/                           # Sample news or test input
│   └── news.pdf
│
├── notebook/                       # Jupyter notebooks for experimentation
│   └── Experiments.ipynb
│
├── static/                         # Static files (CSS, JS)
│   ├── css/
│   │   └── style.css
│   └── js/
│       └── script.js
│
├── templates/                      # HTML templates for Flask UI
│   ├── dj_base.html
│   └── dj_index.html
│
├── tests/                          # Unit tests
│   └── test_app.py
│
├── train/                          # Training logic
│   ├── config.py
│   ├── data_loader.py
│   ├── predictor.py
│   ├── run.py
│   ├── trainer.py
│   └── __init__.py
│
├── utils/                          # Utilities like logging, evaluation
│   ├── logger.py
│   ├── results.py
│   └── __init__.py
│
├── __init__.py
├── app.png                         # Demo screenshot
├── demo.webm                       # Demo video
├── app.py                          # Flask app entry point
├── main.py                         # Main script / orchestrator
├── config.py                       # Configuration file
├── setup.py                        # Project setup for pip install
├── render.yaml                     # Render deployment config
├── Dockerfile                      # Docker container spec
├── requirements.txt                # Python dependencies
├── LICENSE                         # License file
├── .gitignore                      # Git ignore rules
├── .gitattributes                  # Git LFS rules
└── README.md                       # Readme
```

---

## 🧱 System Architecture
```mermaid
graph TD
A[User Input] --> B{Input Type}
B -->|Text| C[Direct Text Processing]
B -->|URL| D[Newspaper3k Parser]
B -->|PDF| E[PyMuPDF Parser]

C --> F[Text Cleaner]
D --> F
E --> F

F --> G[Context Validator]
G -->|Sufficient Context| H[RoBERTa Classifier]
G -->|Insufficient Context| I[Web Search Agent]

I --> J[Context Aggregator]
J --> H

H --> K[FLAN-T5 Explanation Generator]
K --> L[Output Formatter]

L --> M[Web UI using Flask,HTML,CSS,JS]

style M fill:#e3f2fd,stroke:#90caf9
style G fill:#fff9c4,stroke:#fbc02d
style I fill:#fbe9e7,stroke:#ff8a65
style H fill:#f1f8e9,stroke:#aed581
```
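The control flow in the diagram can be sketched in plain Python: clean the text, check whether it carries enough context, fall back to search if not, then classify and explain. This is a rough illustration and not the project's LangGraph graph. The `PipelineState` fields, the 20-word threshold, and the injected `classify`, `explain`, and `search` callables are all hypothetical stand-ins for the real agents and models.

```python
from typing import Callable, List, TypedDict


class PipelineState(TypedDict):
    """Shared state passed between steps (field names are illustrative)."""
    text: str
    context: List[str]
    used_fallback: bool
    label: str
    explanation: str


MIN_WORDS = 20  # assumed threshold; the README does not define "sufficient context"


def has_sufficient_context(text: str, min_words: int = MIN_WORDS) -> bool:
    """Context Validator stand-in: enough words to classify without extra context."""
    return len(text.split()) >= min_words


def run_pipeline(
    text: str,
    classify: Callable[[str], str],
    explain: Callable[[str, str], str],
    search: Callable[[str], List[str]],
) -> PipelineState:
    """Follow the diagram: validate, optionally search, classify, explain."""
    state: PipelineState = {
        "text": " ".join(text.split()),  # Text Cleaner stand-in: normalise whitespace
        "context": [],
        "used_fallback": False,
        "label": "",
        "explanation": "",
    }
    if not has_sufficient_context(state["text"]):
        # Web Search Agent and Context Aggregator branch of the diagram.
        state["context"] = search(state["text"])
        state["used_fallback"] = True
    classifier_input = " ".join([state["text"], *state["context"]])
    state["label"] = classify(classifier_input)
    state["explanation"] = explain(classifier_input, state["label"])
    return state


# Stub callables so the sketch runs without models or network access.
result = run_pipeline(
    "Taxes fell last year.",
    classify=lambda t: "REAL" if "report" in t else "FAKE",
    explain=lambda t, label: f"Labelled {label} from {len(t.split())} words.",
    search=lambda q: ["An official report covers last year's tax receipts."],
)
print(result["used_fallback"], result["label"], result["explanation"])
```

Passing the models and the search tool in as callables keeps the routing logic testable on its own. In the actual project each step would more likely be a LangGraph node reading and writing the shared state.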

---

## 📊 Model Performance
| Epoch | Train Loss | Val Loss | Accuracy | F1 | Precision | Recall |
|-------|------------|----------|----------|--------|-----------|---------|
| 1 | 0.6353 | 0.6205 | 0.6557 | 0.6601 | 0.6663 | 0.6557 |
| 2 | 0.6132 | 0.5765 | 0.7032 | 0.6720 | 0.6817 | 0.7032 |
| 3 | 0.5957 | 0.5779 | 0.6970 | 0.6927 | 0.6899 | 0.6970 |
| 4 | 0.5781 | 0.5778 | 0.6978 | 0.6899 | 0.6864 | 0.6978 |
| 5 | 0.5599 | 0.5810 | 0.6954 | 0.6882 | 0.6846 | 0.6954 |

> Emphasis on **Recall** is intended to help the model catch most fake news cases; the table reports overall recall of roughly 0.70 from epoch 2 onward.

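In every row of the table, Recall is identical to Accuracy. Support-weighted averaging produces exactly that pattern, because weighted recall always equals accuracy. The README does not state the averaging mode, so the sketch below assumes weighted averages and shows on toy labels (not LIAR data) how the overall figure can sit well above the recall of the fake class itself. The function name and the label encoding are illustrative.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Compute accuracy and support-weighted precision, recall and F1."""
    support = Counter(y_true)
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    per_class_recall = {}
    for cls, count in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        predicted = sum(p == cls for p in y_pred)
        p_cls = tp / predicted if predicted else 0.0
        r_cls = tp / count
        f_cls = 2 * p_cls * r_cls / (p_cls + r_cls) if p_cls + r_cls else 0.0
        weight = count / n  # each class contributes in proportion to its support
        precision += weight * p_cls
        recall += weight * r_cls
        f1 += weight * f_cls
        per_class_recall[cls] = r_cls
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "per_class_recall": per_class_recall,
    }


# Toy labels, not LIAR data: 1 = fake, 0 = real.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
scores = weighted_metrics(y_true, y_pred)
print(f"accuracy={scores['accuracy']:.3f} weighted recall={scores['recall']:.3f}")
print(f"recall on the fake class={scores['per_class_recall'][1]:.3f}")
```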
---

## 🐳 Docker Instructions
### Step 1: Build Docker image
```bash
docker build -t informa-truth-app .
```

### Step 2: Run Docker container
```bash
docker run -p 8501:8501 informa-truth-app
```

---

## ⚙️ CI/CD Pipeline (GitHub Actions)
The CI/CD pipeline automates code checks, Docker image building, and validation of the Flask app.

### Sample Workflow
```yaml
name: CI Pipeline
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v3

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install flake8 pytest

      - name: Run tests
        run: pytest tests/

      - name: Docker build
        run: docker build -t informa-truth-app .
```

---

## 🌍 Real-World Use Cases
* Journalists and media watchdogs
* Educators and students
* Concerned citizens and digital media consumers
* Social media platforms for content moderation

---

## 👤 Author
**Md Emon Hasan**
📧 iconicemon01@gmail.com
🔗 [GitHub](https://github.com/Md-Emon-Hasan)
🔗 [LinkedIn](https://www.linkedin.com/in/md-emon-hasan-695483237/)
🔗 [Facebook](https://www.facebook.com/mdemon.hasan2001/)
🔗 [WhatsApp](https://wa.me/8801834363533)

---