{"id":50425308,"url":"https://github.com/aman179102/trust-aware","last_synced_at":"2026-05-31T10:02:32.525Z","repository":{"id":334449315,"uuid":"1141406657","full_name":"aman179102/trust-aware","owner":"aman179102","description":"A trust-aware, human-in-the-loop AI decision system that knows when not to trust model confidence.","archived":false,"fork":false,"pushed_at":"2026-02-02T19:00:12.000Z","size":80,"stargazers_count":5,"open_issues_count":0,"forks_count":2,"subscribers_count":3,"default_branch":"main","last_synced_at":"2026-02-03T09:12:53.001Z","etag":null,"topics":["ai-safety","fastapi","human-in-the-loop","machine-learning","nlp","trustworthy-ai"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aman179102.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-01-24T19:50:31.000Z","updated_at":"2026-02-02T19:00:16.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/aman179102/trust-aware","commit_stats":null,"previous_names":["aman179102/trust-aware"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/aman179102/trust-aware","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aman179102%2Ftrust-aware","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aman179102%2Ftrust-aware/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aman179102
%2Ftrust-aware/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aman179102%2Ftrust-aware/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aman179102","download_url":"https://codeload.github.com/aman179102/trust-aware/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aman179102%2Ftrust-aware/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33726719,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-31T02:00:06.040Z","response_time":95,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-safety","fastapi","human-in-the-loop","machine-learning","nlp","trustworthy-ai"],"created_at":"2026-05-31T10:02:31.712Z","updated_at":"2026-05-31T10:02:32.511Z","avatar_url":"https://github.com/aman179102.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Trust-Aware AI Decision System\n\nOverconfident AI can sound sure, pass tests, and still make quietly wrong decisions.  \nMost sentiment APIs return a label and a confidence score, then leave you to guess when it’s unsafe to automate.  
\nThis project shows how to turn that same model into a **trust-aware, human-in-the-loop AI system** that knows when to ask for help.\n\n\u003e **TL;DR**  \n\u003e A local FastAPI service that wraps a Hugging Face sentiment model with a risk-aware decision layer.  \n\u003e Instead of just returning a label, it exposes confidence, margin, and linguistic risk signals and decides whether to auto-accept a prediction or defer to a human reviewer.  \n\u003e Built entirely from free, CPU-friendly tools for privacy-friendly, on-prem style workflows.\n\n## 🌍 Featured \u0026 Recognized\n\nThis project was featured in **AI Pick (Japan)** as one of the\n*Top 5 trending GitHub projects in AI*.\n\nSource:\n[Click Here](https://ai-pick.jp/37183/)\n\n\n\u003cimg width=\"951\" height=\"996\" alt=\"Screenshot_2026-02-03_00-25-11\" src=\"https://github.com/user-attachments/assets/85ace8fe-a601-4098-a52f-e1500acd52ee\" /\u003e\n\n\n## Project Overview\n\nThis repository implements an end-to-end, trust-aware sentiment analysis system:\n\n- Uses a pretrained DistilBERT model from Hugging Face (no training required).\n- Runs fully locally on CPU (no paid APIs, no cloud dependencies).\n- Wraps the model in a **risk engine** that inspects confidence, score margins, and linguistic ambiguity.\n- Makes a **human-in-the-loop decision**: either `accepted` or `needs_human_review`.\n- Returns both **machine-readable signals** and a **human-readable explanation** for every request.\n\n---\n\n## Typical ML APIs vs This Project\n\n| Aspect              | Typical ML API                                   | This project                                                                 |\n| ------------------- | ------------------------------------------------ | ----------------------------------------------------------------------------- |\n| Output              | Label + raw confidence                           | Label, confidence, margin, `risk_score`, `risk_signals`, explanation, scores |\n| Decision 
logic      | Threshold on confidence                          | Multi-signal risk engine with explicit human-review threshold                |\n| Automation          | Optimised for full automation                    | Optimised for safe automation + human in the loop                            |\n| Transparency        | Opaque; little insight into uncertainty          | Structured risk metadata + narrative explanation                             |\n| Deployment model    | Often cloud/SaaS                                 | Fully local, CPU-only                                                        |\n| Governance posture  | Trust and review added later                     | Trust, review, and deferral designed in from day one                         |\n\n---\n\n## Real Example: High Confidence, Unsafe Automation\n\nEven when the model is confident, this system may *still* decide to defer:\n\n```jsonc\n{\n  \"label\": \"POSITIVE\",\n  \"decision\": \"needs_human_review\",\n  \"confidence\": 0.91,\n  \"margin\": 0.06,\n  \"risk_score\": 2,\n  \"risk_signals\": [\"low_margin\", \"ambiguity\"],\n  \"explanation\": \"Although model confidence is high, the score margin is narrow and the text contains contrastive phrasing, so the system defers to human review instead of auto-approving.\",\n  \"model_name\": \"distilbert-base-uncased-finetuned-sst-2-english\",\n  \"scores\": {\n    \"NEGATIVE\": 0.09,\n    \"POSITIVE\": 0.91\n  }\n}\n```\n---\n\n## Why Confidence-Only AI Is Dangerous\n\nMost real-world AI failures are not about getting a single prediction wrong,\nthey are about systems that are **confidently wrong** without any visibility or\nsafeguards.\n\nExamples:\n\n- A support bot confidently gives the wrong policy answer to a customer.\n- A moderation model is only slightly more confident in \"safe\" than in\n  \"hate\" but still auto-approves the content.\n- A risk scoring model silently drifts and no one notices because only the\n  final label is logged.\n\nModern 
industry practice is moving towards systems that:\n\n- expose **confidence scores and raw model outputs**\n- apply **policy rules** on top of the model (e.g. thresholds, fallbacks)\n- produce **simple explanations** that humans can understand\n- **defer to humans** when the model is unsure\n\nThis project is a minimal, end-to-end example of that pattern.\n\n---\n\n## What Makes This System Different\n\nFor a given piece of text, the system:\n\n1. **Runs a Hugging Face text classification model** locally on CPU.\n2. Computes **per-label probabilities** using softmax.\n3. Evaluates several **trust signals** instead of relying on confidence alone:\n   - confidence compared to a configurable threshold (default `0.7`)\n   - the **margin** between the top-1 and top-2 classes\n   - linguistic ambiguity patterns (hedging, questions, contrastive phrasing)\n4. Aggregates these signals into a **risk score**:\n   - each risk factor (low confidence, low margin, ambiguity) increments the score\n   - if `risk_score ≥ 2` → `decision = \"needs_human_review\"`\n   - else → `decision = \"accepted\"`\n5. 
Generates a **human-readable explanation** that describes:\n   - the predicted label and confidence\n   - which trust signals were triggered (confidence, margin, ambiguity, mixed sentiment)\n   - why automation was allowed or blocked\n   - that the system intentionally prefers **safety over blind automation**\n\nEverything is exposed through simple FastAPI endpoints:\n\n- `POST /analyze` – run a trust-aware analysis on text\n- `GET /health` – lightweight health check (model loaded, process ready)\n\nExample `/analyze` response:\n\n```jsonc\n{\n  \"label\": \"POSITIVE\",\n  \"decision\": \"accepted\",\n  \"confidence\": 0.93,\n  \"margin\": 0.90,\n  \"risk_score\": 0,\n  \"risk_signals\": [],\n  \"explanation\": \"...human-readable text explaining why automation was safe...\",\n  \"model_name\": \"distilbert-base-uncased-finetuned-sst-2-english\",\n  \"scores\": {\n    \"NEGATIVE\": 0.07,\n    \"POSITIVE\": 0.93\n  }\n}\n```\n\n---\n\n## Architecture (Model → Risk Engine → Decision)\n\nHigh-level data flow:\n\n```text\nUser Text\n   │\n   ▼\n[app/model.py]        Hugging Face model on CPU\n   │   └─ produces label + per-class probabilities\n   ▼\n[app/decision.py]     Risk \u0026 policy layer\n   │   └─ combines confidence, score margin, and ambiguity signals into a risk score\n   ▼\n[app/explain.py]      Explanation layer\n   │   └─ explains how confidence, margin, ambiguity, and risk score led to the decision\n   ▼\n[app/api.py]          FastAPI endpoint\n   │   └─ validation, error handling, response schema\n   ▼\nClient / UI\n```\n\n## Example Scenarios (clear vs ambiguous inputs)\n\n### Clear, low-risk sentiment\n\nRequest:\n\n```bash\ncurl -X POST \"http://127.0.0.1:8000/analyze\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"text\": \"I absolutely love this product, it works perfectly.\"}'\n```\n\nPossible response:\n\n```jsonc\n{\n  \"label\": \"POSITIVE\",\n  \"decision\": \"accepted\",\n  \"confidence\": 0.97,\n  \"margin\": 0.90,\n  
\"risk_score\": 0,\n  \"risk_signals\": [],\n  \"explanation\": \"...no risk signals fired; confidence and margin are both strong...\",\n  \"model_name\": \"distilbert-base-uncased-finetuned-sst-2-english\",\n  \"scores\": {\n    \"NEGATIVE\": 0.03,\n    \"POSITIVE\": 0.97\n  }\n}\n```\n\n### Ambiguous, high-risk sentiment\n\nRequest:\n\n```bash\ncurl -X POST \"http://127.0.0.1:8000/analyze\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"text\": \"The product is great, but the customer service was terrible.\"}'\n```\n\nPossible response:\n\n```jsonc\n{\n  \"label\": \"POSITIVE\",\n  \"decision\": \"needs_human_review\",\n  \"confidence\": 0.86,\n  \"margin\": 0.18,\n  \"risk_score\": 2,\n  \"risk_signals\": [\"low_margin\", \"ambiguity\"],\n  \"explanation\": \"Although confidence is high, the margin is narrow and the text uses contrastive phrasing, so the system defers to human review.\",\n  \"model_name\": \"distilbert-base-uncased-finetuned-sst-2-english\",\n  \"scores\": {\n    \"NEGATIVE\": 0.14,\n    \"POSITIVE\": 0.86\n  }\n}\n```\n\nThese examples highlight how **the same base model** can behave very\ndifferently once wrapped in a risk-aware decision layer.\n\n### Module Layout\n\n- `app/model.py`\n  - Loads the Hugging Face model\n    (`distilbert-base-uncased-finetuned-sst-2-english`).\n  - Provides a small `TextClassifier` wrapper with a `predict(text)` method.\n  - Returns a `TextClassificationResult` with `label`, `confidence`, and\n    `scores` (per-label probabilities).\n\n- `app/decision.py`\n  - Implements the **trust policy** around the model.\n  - Exposes `DEFAULT_CONFIDENCE_THRESHOLD` (0.7 by default).\n  - Computes a `DecisionDetails` object with:\n    - `decision` → `\"accepted\"` or `\"needs_human_review\"`\n    - `threshold` → the threshold used for this request\n    - `margin` → the probability gap between top-1 and top-2 labels\n\n- `app/explain.py`\n  - Generates a **human-readable explanation** string.\n  - Uses:\n    - the 
model's confidence\n    - the decision threshold and margin\n    - ambiguity and risk score signals from the decision layer\n  - Explanations are **rule-based**, multi-signal, and easy to inspect.\n\n- `app/api.py`\n  - Defines the FastAPI router and the `POST /analyze` endpoint.\n  - Handles input validation, error responses, and output schema.\n  - Delegates work to `model.py`, `decision.py`, and `explain.py`.\n\n- `main.py`\n  - Creates the FastAPI app and includes the router.\n  - Entry point for `uvicorn main:app`.\n\nThis separation mirrors how larger production systems are structured and makes\nit easier to extend or replace individual components later.\n\n---\n\n## How to Run Locally\n\n### 1. Prerequisites\n\n- Python **3.9+** recommended\n- A machine with enough RAM to load a small transformer model (e.g. 2–4 GB)\n- No GPU required; everything runs on CPU.\n\n### 2. Clone and install\n\n```bash\ngit clone https://github.com/aman179102/trust-aware.git\ncd trust-aware\n\npython -m venv .venv\nsource .venv/bin/activate  # On Windows: .venv\\Scripts\\activate\n\npip install --upgrade pip\npip install -r requirements.txt\n```\n\n\u003e **Note on PyTorch**: the `torch` dependency installed via `pip` will use a\n\u003e CPU build by default on most systems. If you need a specific build (e.g.\n\u003e CUDA-enabled), follow the instructions on the official PyTorch website.\n\n### 3. 
Run the API server\n\nFrom the project root (where `main.py` lives):\n\n```bash\nuvicorn main:app --reload\n```\n\nThe API will be available at:\n\n- OpenAPI docs: \u003chttp://127.0.0.1:8000/docs\u003e\n- Raw JSON: `POST http://127.0.0.1:8000/analyze`\n\n---\n\nDemo screenshot:\n\n![showcase](https://github.com/user-attachments/assets/088d8e86-a2b9-4466-b611-a96aa7dd6742)\n\n---\n\n## Intended Use \u0026 Non-Goals\n\n- **Intended use**\n  - As a reference implementation for **trust-aware AI** and\n    human-in-the-loop decision making.\n  - As a local service for teams who want to experiment with\n    **risk-sensitive sentiment analysis** without sending data to the cloud.\n  - As a portfolio / demo project for discussing **AI safety and governance**\n    in interviews or design reviews.\n\n- **Non-goals**\n  - This is **not** a truth engine or a replacement for human judgment.\n  - This is **not** a high-accuracy, domain-tuned sentiment model; it is a\n    small, generic model wrapped in a strong policy layer.\n  - This repository does **not** include automated retraining, calibration, or\n    production observability; those are intentionally called out as future\n    work.\n\n---\n\n## Human-in-the-Loop Philosophy\n\nThis project is intentionally small, but it demonstrates several patterns that\nshow up in real **human-in-the-loop** systems:\n\n- **Model vs. 
policy separation**\n  - The Hugging Face model only knows how to score text.\n  - The system wraps those scores in a separate **decision layer** that\n    enforces business rules (thresholds, human review).\n\n- **Explicit uncertainty handling**\n  - Instead of just returning a label, the API always returns structured\n    signals: the label, confidence, margin, risk score, and which risk\n    signals fired.\n  - Low-confidence or high-risk predictions are never silently treated as\n    high-confidence truths.\n\n- **Simple, inspectable explanations**\n  - Explanations here are deliberately rule-based and lightweight.\n  - In many production environments, this type of explanation is preferred\n    because it is predictable, auditable, and cheap to compute.\n\n- **Humans stay in control**\n  - The system is designed so that humans, not the model, make the final call\n    in ambiguous or high-risk situations.\n  - The `/analyze` output is structured so that downstream tools or reviewers\n    can build dashboards, queues, or escalation workflows on top.\n\n---\n\n## Future Improvements (sarcasm, domain rules, active learning)\n\nIdeas for future work:\n\n- **Sarcasm and nuanced tone**: extend the ambiguity detector to better handle\n  sarcasm, irony, and subtle tone shifts that can confuse sentiment models.\n- **Domain-specific rules**: layer in domain rules (e.g. finance, healthcare,\n  trust \u0026 safety) that can override the model when certain topics or entities\n  are present.\n- **Active learning and feedback**: add mechanisms for reviewers to provide\n  feedback on high-risk cases and use that feedback to retrain or recalibrate\n  models offline.\n- **Model choice**: swap in a different Hugging Face text classifier (e.g.\n  topic classification, toxicity detection) by changing the model name in\n  `app/model.py`.\n- **Richer policies**: add per-label thresholds or business rules\n  (e.g. 
always send certain risk labels to human review regardless of\n  confidence).\n- **Logging and monitoring**: log decisions, confidences, and margins to a\n  local store for later analysis.\n- **UI layer**: build a small web or desktop UI that consumes the `/analyze`\n  endpoint.\n- **More advanced explainability**: integrate additional open-source\n  techniques (e.g. saliency maps or attention visualisation) while still\n  staying local and free.\n\n---\n\n## How to Evaluate This System\n\nWhen reviewing this project, it is important to treat it as a\n**trust-first / safety-first** system rather than an accuracy benchmark.\n\n- **Not accuracy-first**\n  - The underlying DistilBERT model is \"good enough\" for sentiment, but the\n    focus of this repository is on the **policy and risk layer** wrapped\n    around it.\n\n- **Trust-first decision logic**\n  - The system explicitly tracks multiple weak signals (confidence, margin,\n    ambiguity, mixed signals) and aggregates them into a **risk_score**.\n  - Automation is only allowed when the cumulative risk is low; otherwise the\n    system defers to human review.\n\n- **Reading `risk_score` and `risk_signals`**\n  - `risk_signals` lists *which* factors fired, e.g.\n    `['low_margin', 'ambiguity']`.\n  - `risk_score` is the **count** of those signals, and it is directly used to\n    choose between `\"accepted\"` and `\"needs_human_review\"`.\n  - Reviewers can quickly see *why* a prediction was considered risky by\n    inspecting these fields and the explanation text.\n\n- **Why high confidence can still trigger review**\n  - Confidence alone can be miscalibrated or misleading, especially for\n    unfamiliar or ambiguous inputs.\n  - If, for example, the text includes strong contrast (\"but\", \"however\") or\n    explicit questions, the system may still route the case to human review\n    even when confidence is high and the margin is reasonable.\n\n- **Why deferring to humans is safer**\n  - In many production 
environments, the cost of an overconfident automated\n    decision is much higher than the cost of asking a human to review a\n    borderline case.\n  - This project intentionally **optimises for safe behaviour under\n    uncertainty**, making it more suitable as a building block for\n    safety-critical or regulated domains than a typical \"black box\" sentiment\n    API.\n\nTaken together, these properties make the system **auditable and reviewable**:\nan engineer, auditor, or hiring manager can understand in a few minutes how\ndecisions are made, which signals drive those decisions, and where the safe\nfallbacks are.\n\n---\n\n## License and Credits\n\n- Built using only **free and open-source tools**.\n- Hugging Face model:\n  [`distilbert-base-uncased-finetuned-sst-2-english`](https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english)\n- See the repository LICENSE file for full details (add one if publishing\n  this project publicly).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faman179102%2Ftrust-aware","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faman179102%2Ftrust-aware","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faman179102%2Ftrust-aware/lists"}