{"id":28324172,"url":"https://github.com/md-emon-hasan/informatruth","last_synced_at":"2025-10-11T09:42:35.321Z","repository":{"id":316370657,"uuid":"848113353","full_name":"Md-Emon-Hasan/InformaTruth","owner":"Md-Emon-Hasan","description":"Fine-tuned roberta-base classifier on the LIAR dataset. Accepts multiple input types (text, URLs, and PDFs) and outputs a prediction with a confidence score. It also leverages google/flan-t5-base to generate explanations and uses agentic AI with LangGraph to orchestrate agents for planning, retrieval, execution, fallback, and reasoning.","archived":false,"fork":false,"pushed_at":"2025-10-03T10:24:47.000Z","size":10082,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-03T12:23:28.965Z","etag":null,"topics":["ai-webapp","confidence-score","document-classification","end-to-end-ml-workflows","fake-news-detection","fine-tuning","flan-t5","huggingface-transformers","machine-learning","misinformation-detection","natural-language-processing","news-analysis","news-classification","roberta","sequence-classification","text-analysis","text-classification","transformers","truth-verification","url-parser"],"latest_commit_sha":null,"homepage":"https://informatruth.onrender.com","language":"Jupyter 
Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Md-Emon-Hasan.png","metadata":{"files":{"readme":"README.md","changelog":"news/news.pdf","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-08-27T06:44:18.000Z","updated_at":"2025-10-03T10:24:51.000Z","dependencies_parsed_at":null,"dependency_job_id":"acc10feb-f8aa-4d2f-9cd8-3de1ac484dd1","html_url":"https://github.com/Md-Emon-Hasan/InformaTruth","commit_stats":null,"previous_names":["md-emon-hasan/informatruth"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Md-Emon-Hasan/InformaTruth","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Md-Emon-Hasan%2FInformaTruth","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Md-Emon-Hasan%2FInformaTruth/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Md-Emon-Hasan%2FInformaTruth/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Md-Emon-Hasan%2FInformaTruth/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Md-Emon-Hasan","download_url":"https://codeload.github.com/Md-Emon-Hasan/InformaTruth/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Md-Emon-Hasan%2FInformaTruth/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279006750,"owners_count":26084182,"icon_url":"https://github.com/github.png","version":null,"created_at":"202
2-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-11T02:00:06.511Z","response_time":55,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-webapp","confidence-score","document-classification","end-to-end-ml-workflows","fake-news-detection","fine-tuning","flan-t5","huggingface-transformers","machine-learning","misinformation-detection","natural-language-processing","news-analysis","news-classification","roberta","sequence-classification","text-analysis","text-classification","transformers","truth-verification","url-parser"],"created_at":"2025-05-25T17:10:31.964Z","updated_at":"2025-10-11T09:42:35.316Z","avatar_url":"https://github.com/Md-Emon-Hasan.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 📘 InformaTruth: AI-Driven News Authenticity Analyzer\nInformaTruth is an end-to-end AI-powered multi-agent fact-checking system that automatically verifies news articles, PDFs, and web content. It leverages RoBERTa fine-tuning, LangGraph orchestration, RAG pipelines, and fallback retrieval agents to deliver reliable, context-aware verification. The system features a modular multi-agent architecture including Planner, Retriever, Generator, Memory, and Fallback Agents, integrating diverse tools for comprehensive reasoning.\n\nIt achieves ~70% accuracy and F1 ~69% on the LIAR dataset, with 95% query coverage and ~60% improved reliability through intelligent tool routing and memory integration. 
Designed for real-world deployment, InformaTruth includes a Flask-based responsive UI, FastAPI endpoints, Dockerized containers, and a CI/CD pipeline, enabling enterprise-grade automated fact verification at scale.\n\n[![InformaTruth](https://github.com/user-attachments/assets/60bfca60-19bc-404e-9f97-a57ed6f0b5f1)](https://github.com/user-attachments/assets/60bfca60-19bc-404e-9f97-a57ed6f0b5f1)\n\n---\n\n## 🚀 Live Demo\n\n🖥️ **Try it now**: [InformaTruth — Fake News Detection AI App](https://informatruth.onrender.com)\n\n---\n\n## ⚙️ Tech Stack\n| **Category**                | **Technology/Resource**                                                                                |\n| --------------------------- | ------------------------------------------------------------------------------------------------------ |\n| **Core Framework**          | PyTorch, Transformers, HuggingFace                                                                     |\n| **Classification Model**    | Fine-tuned RoBERTa-base on LIAR Dataset                                                                |\n| **Explanation Model**       | FLAN-T5-base (Zero-shot Prompting)                                                                     |\n| **Training Data**           | LIAR Dataset (Political Fact-Checking)                                                                 |\n| **Evaluation Metrics**      | Accuracy, Precision, Recall, F1-score                                                                  |\n| **Training Framework**      | HuggingFace Trainer                                                                                    |\n| **LangGraph Orchestration** | LangGraph (Multi-Agent Directed Acyclic Execution Graph)                                               |\n| **Agents Used**             | PlannerAgent, InputHandlerAgent, ToolRouterAgent, ExecutorAgent, ExplanationAgent, FallbackSearchAgent |\n| **Input Modalities**        | Raw Text, Website URLs (via 
Newspaper3k), PDF Documents (via PyMuPDF)                                  |\n| **Tool Augmentation**       | DuckDuckGo Search API (Fallback), Wikipedia (Planned), ToolRouter Logic                                |\n| **Web Scraping**            | Newspaper3k (HTML → Clean Article)                                                                     |\n| **PDF Parsing**             | PyMuPDF                                                                                                |\n| **Explainability**          | Natural language justification generated using FLAN-T5                                                 |\n| **State Management**        | Shared State Object (LangGraph-compatible)                                                             |\n| **Deployment Interface**    | Flask (HTML,CSS,JS)                                                                                |\n| **Hosting Platform**        | Render (Docker)                                                                  |\n| **Version Control**         | Git, GitHub                                                                                            |\n| **Logging \u0026 Debugging**     | Logs, Print Debugs, Custom Logger                                                 |\n| **Input Support**         | Text, URLs, PDF documents                                                             |\n\n---\n\n## ✅ Key Features\n\n* **🔄 Multi-Format Input Support**\n  Accepts raw **text**, **web URLs**, and **PDF documents** with automated preprocessing for each type.\n\n* **🧠 Full NLP Pipeline**\n  Integrates summarization (optional), **fake news classification** (RoBERTa), and **natural language explanation** (FLAN-T5).\n\n* **🧱 Modular Agent-Based Architecture**\n  Built using **LangGraph** with modular agents: `Planner`, `Tool Router`, `Executor`, `Explanation Agent`, and `Fallback Agent`.\n\n* **📜 Explanation Generation**\n  Uses **FLAN-T5** to generate human-readable, zero-shot 
rationales for model predictions.\n\n* **🧪 Tool-Augmented \u0026 Fallback Logic**\n  Dynamically queries **DuckDuckGo** when local context is insufficient, enabling robust fallback handling.\n\n* **🧼 Clean, Modular Codebase with Logging**\n  Structured using clean architecture principles, agent separation, and informative logging.\n\n* **🌐 Flask Web UI**\n  User-friendly, interactive, and responsive frontend for input, output, and visual explanations.\n\n* **🐳 Dockerized for Deployment**\n  Fully containerized setup with `Dockerfile` and `requirements.txt` for seamless deployment.\n\n* **⚙️ CI/CD with GitHub Actions**\n  Automated pipelines for testing, linting, and Docker build validation to ensure code quality and production-readiness.\n\n---\n\n## 📦 Project File Structure\n\n```bash\nInformaTruth/\n│\n├── .github/              # GitHub Actions\n│   └── workflows/\n│       └── main.yml \n│\n├── agents/                            # Modular agents (planner, executor, etc.)\n│   ├── executor.py\n│   ├── fallback_search.py\n│   ├── input_handler.py\n│   ├── planner.py\n│   ├── router.py\n│   └── __init__.py\n│\n├── fine_tuned_liar_detector/         # Fine-tuned RoBERTa model directory\n│   ├── config.json\n│   ├── vocab.json\n│   ├── tokenizer_config.json\n│   ├── special_tokens_map.json\n│   ├── model.safetensors\n│   └── merges.txt\n│\n├── liar_dataset/                     # Dataset for fine-tuning\n│   ├── test.tsv\n│   ├── train.tsv\n│   └── valid.tsv\n│\n├── graph/                            # LangGraph state and builder logic\n│   ├── builder.py\n│   ├── state.py\n│   └── __init__.py\n│\n├── models/                           # Classification + LLM model loader\n│   ├── classifier.py\n│   ├── loader.py\n│   └── __init__.py\n│\n├── news/                             # Sample news or test input\n│   └── news.pdf\n│\n├── notebook/                         # Jupyter notebooks for experimentation\n│   └── Experiments.ipynb\n│\n├── static/                           # 
Static files (CSS, JS)\n│   ├── css/\n│   │   └── style.css\n│   └── js/\n│       └── script.js\n│\n├── templates/                        # HTML templates for Flask UI\n│   ├── dj_base.html\n│   └── dj_index.html\n│\n├── tests/                            # Unit tests\n│   └── test_app.py\n│\n├── train/                            # Training logic\n│   ├── config.py\n│   ├── data_loader.py\n│   ├── predictor.py\n│   ├── run.py\n│   ├── trainer.py\n│   └── __init__.py\n│\n├── utils/                            # Utilities like logging, evaluation\n│   ├── logger.py\n│   ├── results.py\n│   └── __init__.py\n│\n├── __init__.py                        \n├── app.png                           # Demo screenshot\n├── demo.webm                         # Demo video\n├── app.py                            # Flask app entry point\n├── main.py                           # Main script / orchestrator\n├── config.py                         # Configuration file\n├── setup.py                          # Project setup for pip install\n├── render.yaml                       # Render deployment configuration\n├── Dockerfile                        # Docker container spec\n├── requirements.txt                  # Python dependencies\n├── LICENSE                           # License file\n├── .gitignore                        # Git ignore rules\n├── .gitattributes                    # Git LFS rules\n└── README.md                         # Readme\n```\n\n---\n\n## 🧱 System Architecture\n```mermaid\ngraph TD\n    A[User Input] --\u003e B{Input Type}\n    B --\u003e|Text| C[Direct Text Processing]\n    B --\u003e|URL| D[Newspaper3k Parser]\n    B --\u003e|PDF| E[PyMuPDF Parser]\n\n    C --\u003e F[Text Cleaner]\n    D --\u003e F\n    E --\u003e F\n\n    F --\u003e G[Context Validator]\n    G --\u003e|Sufficient Context| H[RoBERTa Classifier]\n    G --\u003e|Insufficient Context| I[Web Search Agent]\n    \n    I --\u003e J[Context Aggregator]\n    J --\u003e H\n\n    H --\u003e K[FLAN-T5 Explanation Generator]\n    K 
--\u003e L[Output Formatter]\n    \n    L --\u003e M[Web UI using Flask, HTML, CSS, JS]\n\n    style M fill:#e3f2fd,stroke:#90caf9\n    style G fill:#fff9c4,stroke:#fbc02d\n    style I fill:#fbe9e7,stroke:#ff8a65\n    style H fill:#f1f8e9,stroke:#aed581\n```\n\n---\n\n## 📊 Model Performance\n| Epoch | Train Loss | Val Loss | Accuracy | F1     | Precision | Recall  |\n|-------|------------|----------|----------|--------|-----------|---------|\n| 1     | 0.6353     | 0.6205   | 0.6557   | 0.6601 | 0.6663    | 0.6557  |\n| 2     | 0.6132     | 0.5765   | 0.7032   | 0.6720 | 0.6817    | 0.7032  |\n| 3     | 0.5957     | 0.5779   | 0.6970   | 0.6927 | 0.6899    | 0.6970  |\n| 4     | 0.5781     | 0.5778   | 0.6978   | 0.6899 | 0.6864    | 0.6978  |\n| 5     | 0.5599     | 0.5810   | 0.6954   | 0.6882 | 0.6846    | 0.6954  |\n\n\u003e **Recall** is emphasized so that the model misses as few fake news cases as possible; it peaks at 0.7032 in epoch 2.\n\n---\n\n## 🐳 Docker Instructions\n### Step 1: Build Docker image\n```bash\ndocker build -t informa-truth-app .\n```\n\n### Step 2: Run Docker container\n```bash\ndocker run -p 8501:8501 informa-truth-app\n```\n\n---\n\n## ⚙️ CI/CD Pipeline (GitHub Actions)\nThe CI/CD pipeline automates code checks, Docker image building, and Flask app validation.\n\n### Sample Workflow\n```yaml\nname: CI Pipeline\non: [push]\njobs:\n  build:\n    runs-on: ubuntu-latest\n    steps:\n      - name: Checkout repo\n        uses: actions/checkout@v3\n\n      - name: Setup Python\n        uses: actions/setup-python@v4\n        with:\n          python-version: '3.10'\n\n      - name: Install dependencies\n        run: |\n          pip install -r requirements.txt\n          pip install flake8 pytest\n\n      - name: Run tests\n        run: pytest tests/\n\n      - name: Docker build\n        run: docker build -t informa-truth-app .\n```\n\n---\n\n## 🌐 Real-World Use Cases\n* Journalists and media watchdogs\n* Educators and students\n* Concerned citizens and digital media consumers\n* Social 
media platforms for content moderation\n\n---\n\n## 👤 Author\n**Md Emon Hasan**  \n📧 iconicemon01@gmail.com  \n🔗 [GitHub](https://github.com/Md-Emon-Hasan)  \n🔗 [LinkedIn](https://www.linkedin.com/in/md-emon-hasan-695483237/)  \n🔗 [Facebook](https://www.facebook.com/mdemon.hasan2001/)  \n🔗 [WhatsApp](https://wa.me/8801834363533)\n\n---","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmd-emon-hasan%2Finformatruth","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmd-emon-hasan%2Finformatruth","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmd-emon-hasan%2Finformatruth/lists"}