{"id":49640276,"url":"https://github.com/suranjitpartho/clinical-data-intelligence-system","last_synced_at":"2026-05-22T08:00:50.793Z","repository":{"id":353162953,"uuid":"1206609821","full_name":"suranjitpartho/clinical-data-intelligence-system","owner":"suranjitpartho","description":"An AI Agent for clinical data intelligence. Built with LangGraph, FastAPI, and pgvector to unify structured SQL records and unstructured clinical notes via a self-healing, schema-aware reasoning engine.","archived":false,"fork":false,"pushed_at":"2026-05-18T19:41:47.000Z","size":3322,"stargazers_count":2,"open_issues_count":2,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-05-18T21:51:12.703Z","etag":null,"topics":["ai-agents","clinical-data","fastapi","healthcare-ai","langgraph","llm","pgvector","postgresql","rag","self-healing-ai"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/suranjitpartho.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-10T04:44:54.000Z","updated_at":"2026-05-04T13:13:15.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/suranjitpartho/clinical-data-intelligence-system","commit_stats":null,"previous_names":["suranjitpartho/clinical-data-intelligence-system"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/suranjitpartho/clinical-data-intelligence-system","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/suranjitpartho%2Fclinical-data-intelligence-system","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/suranjitpartho%2Fclinical-data-intelligence-system/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/suranjitpartho%2Fclinical-data-intelligence-system/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/suranjitpartho%2Fclinical-data-intelligence-system/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/suranjitpartho","download_url":"https://codeload.github.com/suranjitpartho/clinical-data-intelligence-system/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/suranjitpartho%2Fclinical-data-intelligence-system/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33334777,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-21T12:23:38.849Z","status":"online","status_checked_at":"2026-05-22T02:00:06.671Z","response_time":265,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agents","clinical-data","fastapi","healthcare-ai","langgraph","llm","pgvector","postgresql","rag","self-healing-ai"],"created_at":"2026-05-05T19:00:27.066Z","updated_at":"2026-05-22T08:00:50.786Z","avatar_url":"https://github.com/suranjitpartho.png","language":"Python","funding_links":[],"categories":["Clinical Software \u0026 EHR"],"sub_categories":[],"readme":"# CLINICAL DATA INTELLIGENCE SYSTEM\n\n*The Clinical Data Intelligence System is an AI platform designed to make clinical information easy to access through simple, natural conversation. It enables doctors and healthcare staff to instantly search patient records, medical notes, and lab results without needing technical database skills. By seamlessly integrating structured database records with unstructured clinical notes, the system automates manual reporting and provides clear insights that help medical teams save time and provide better care for their patients.*\n\n![Release](https://img.shields.io/badge/Release-v1.0-48C784)\n![Platform](https://img.shields.io/badge/Platform-Web-CDDE21)\n![Size](https://img.shields.io/github/repo-size/suranjitpartho/clinical-data-intelligence-system?label=Size\u0026color=E34F79)\n![Last Commit](https://img.shields.io/github/last-commit/suranjitpartho/clinical-data-intelligence-system?label=Last%20Commit\u0026color=F0B960)\n![Top Language](https://img.shields.io/github/languages/top/suranjitpartho/clinical-data-intelligence-system?color=red)\n![Stars](https://img.shields.io/github/stars/suranjitpartho/clinical-data-intelligence-system?label=Stars\u0026style=flat\u0026color=gold)\n![License](https://img.shields.io/github/license/suranjitpartho/clinical-data-intelligence-system?label=License\u0026color=informational)\n\n\u003cbr\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"./frontend/src/assets/screenshot.png\" alt=\"System Demo\" width=\"100%\"\u003e\n\u003c/div\u003e\n\n\u003cbr\u003e\n\n## Case Study: Solving Clinical Data Fragmentation\n\n\u003e ⭐ **SITUATION:** Clinical environments suffer from fragmented data. Quantitative metrics (billing/labs) live in rigid SQL databases, while qualitative insights (clinical notes) are locked in unstructured text. Clinicians lose hours waiting for manual data pulls, delaying patient care and operational decisions.\n\u003e\n\u003e ⭐ **TARGET:** The mission was to build a \"Clinical Intelligence Layer\" that translates natural language into precise database queries.\n\u003e\n\u003e ⭐ **ACTION:** Engineered a deterministic state machine using *LangGraph* to orchestrate a hybrid retrieval system. Implemented a *Semantically Augmented Data Dictionary* to bridge clinical logic with SQL schemas, integrated *pgvector* for narrative medical searches, and built a *Proactive Discovery \u0026 Self-Healing loop* that autonomously corrects database hallucinations in real-time. Developed a custom *Observability Layer* that mirrors Langfuse Cloud data for node-level latency and costing transparency.\n\u003e\n\u003e ⭐ **RESULT:** Reduced clinical data retrieval workflows from hours to near real-time responses. Built a dual-layer *Reasoning Trace UI* that exposes both the internal logic and the financial cost of every decision, improving trust, accountability, and operational predictability.\n\n\u003cbr\u003e\n\n## Core Capabilities\n\n| Feature | Clinical Benefit |\n| :--- | :--- |\n| **Self-Healing SQL** | Eliminates manual query fixes by autonomously correcting syntax errors. |\n| **Proactive Discovery** | Prevents hallucinations by fetching real categorical values before writing SQL. |\n| **Hybrid Retrieval** | Combines exact lab results with semantic insights from clinical notes. |\n| **Observability Trace** | Provides node-level transparency for latency, token density, and financial cost. |\n| **Contextual Rewrite** | Maintains diagnostic accuracy in multi-turn conversations by resolving pronouns. |\n| **Dimensional Enrichment** | Automatically fetches medical reference ranges (e.g., lab thresholds) to ground AI synthesis in clinical truth. |\n\n\u003cbr\u003e\n\n## Technical Architecture\n\nThe system is built on a modular, state-managed architecture designed for high availability and clinical precision.\n\n![Python](https://img.shields.io/badge/Python-3.11+-1C96E8?logo=python\u0026logoColor=white) ![FastAPI](https://img.shields.io/badge/FastAPI-0.110.0-22C982?logo=fastapi\u0026logoColor=white) ![LangGraph](https://img.shields.io/badge/LangGraph-0.0.30-DBD51D?logo=langchain\u0026logoColor=white) ![React](https://img.shields.io/badge/React-19.2-2572CF?logo=react\u0026logoColor=white) ![Tailwind](https://img.shields.io/badge/Tailwind-4.2-1FD1CB?logo=tailwindcss\u0026logoColor=white) ![PostgreSQL](https://img.shields.io/badge/PostgreSQL-16-7078C4?logo=postgresql\u0026logoColor=white)\n\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"./frontend/src/assets/architecture_diagram2.png\" alt=\"System Architecture\" width=\"100%\"\u003e\n\u003c/div\u003e\n\n\u003cbr\u003e\n\n#### End-to-End Request Flow\n\n1.  **Natural Language Input**: User enters a query (e.g., *\"Show abnormal lab results for Patient A\"*).\n2.  **Contextual Rewrite**: The system resolves conversation history and converts ambiguous prompts into standalone, context-rich queries.\n3.  **Intent Routing**: The Orchestrator determines if the request requires *SQL retrieval* (structured labs), *Semantic RAG* (clinical notes), or a *Hybrid response*.\n4.  **Multi-Modal Retrieval**: \n    - **SQL Node**: Queries structured tables using schema-aware logic.\n    - **RAG Node**: Searches clinical notes and protocols using *pgvector*.\n5.  **Validation \u0026 Self-Correction**: Any SQL syntax errors or schema mismatches capture the *PostgreSQL traceback*, triggering an autonomous retry loop for immediate self-correction.\n6.  **Synthesis Layer**: Combines structured data and unstructured evidence into a single, grounded clinical response.\n7.  **Reasoning Trace**: The execution path is exposed to the UI, providing full transparency of the AI’s decision-making process.\n\n\u003cbr\u003e\n\n#### System Stack Overview\n\n| Layer | Component / Tech | Key Responsibility |\n| :--- | :--- | :--- |\n| **Orchestration** | **LangGraph** | Managing state-based clinical reasoning and tool loops. |\n| **Knowledge Layer** | **Data Dictionary** | Mapping natural language to complex clinical business logic. |\n| **Observability** | **Langfuse** | Capturing LLM latency, token usage, and graph execution traces. |\n| **API Backend** | **FastAPI** | Providing high-concurrency, asynchronous API endpoints. |\n| **Knowledge Base** | **pgvector** | Storing medical narratives and protocol embeddings. |\n| **Modern UI** | **React 19** | Delivering a transparent \"Reasoning Trace\" for clinician trust. |\n\n\u003cbr\u003e\n\n## Engineering Deep Dive: Challenges \u0026 Solutions\n\n✴️ **Challenge: Managing Non-Linear Clinical Logic → Solution: State-Machine Orchestration**  \nAt its core, the system utilizes a *LangGraph-driven State Graph* to manage complex reasoning. Unlike basic linear chains, this architecture allows for *directed cycles*, enabling the agent to revisit previous steps if conditions aren't met. This state-managed approach allows the system to generate a *Reasoning Trace*, exposing its internal \"Chain of Thought\" to clinicians for verification before final synthesis.\n\n✴️ **Challenge: Conversational Context Drift → Solution: Recursive Query Transformation**  \nTo support natural, multi-turn dialogue, the system implements an intelligent *Query Rewrite Node*. This node uses LLM-based transformation to turn ambiguous follow-up questions (e.g., *\"What about his labs?\"*) into standalone, context-rich queries (*\"Show laboratory results for Patient X\"*). This prevents \"memory contamination\" and ensures the intent router always receives a clear, precise instruction.\n\n✴️ **Challenge: Fragmented Patient Histories → Solution: Multi-Modal Data Fusion (SQL + RAG)**  \nTo provide a 360-degree patient view, the system implements a *multi-modal retrieval strategy*. It simultaneously pulls quantitative data (billing, labs) via exact-match SQL and qualitative narratives (symptoms, history) via semantic search. By utilizing the *BGE-M3* embedding model and *pgvector*, the system captures subtle medical nuances that traditional keyword search would miss.\n\n✴️ **Challenge: SQL Hallucination \u0026 Syntactic Errors → Solution: Proactive Discovery \u0026 Self-Correction**  \nTo guarantee precision, the system employs *Proactive Schema Discovery* guided by a *schema-aware data dictionary*. Before generating SQL, the agent consults a custom knowledge map that defines complex clinical relationships and business rules (e.g., precise age-calculation logic). It then fetches real-time categorical values from the database to ensure the query is perfectly grounded in live data. If a query fails, an autonomous *Self-Correction Loop* captures the database error and feeds it back to the agent for an immediate, self-healing rewrite.\n\n✴️ **Challenge: Context Loss in Aggregated SQL → Solution: Automated Dimensional Enrichment**  \nAggregating data (e.g., averages) often \"squashes\" clinical context like reference ranges. I engineered a *Metadata-Driven Enrichment Layer* that dynamically injects a separate \"Dimensional Context\" payload into the synthesis layer. This allows the AI to interpret results against clinical ground truth even for highly aggregated queries.\n\n\u003cbr\u003e\n\n## Technical Rationale: Why This Stack?\n\n*   **LangGraph over LangChain**: Unlike standard chains, LangGraph provides the fine-grained control over *cycles and state* required for a non-linear clinical diagnostic flow.\n*   **PostgreSQL + pgvector over Pinecone**: By using pgvector, the system can execute complex SQL joins and semantic vector searches natively within the same database environment. This unified storage ensures data consistency and allows dynamic context passing (e.g., using SQL results to immediately filter vector searches) without relying on external vector databases.\n*   **FastAPI over Django**: Chosen for its high-performance asynchronous capabilities, efficiently orchestrating multiple concurrent LLM calls to deliver near real-time responses critical for medical consultation environments.\n*   **Langfuse over LangSmith**: Chosen for its open-source, self-hostable architecture which is essential for clinical data privacy and HIPAA compliance. Unlike SaaS-only alternatives, Langfuse allows the clinic to maintain full ownership of its telemetry data while providing granular, 5-decimal micro-billing and node-level latency tracking.\n\n\u003cbr\u003e\n\n## Trust \u0026 Transparency\n\n*   **Reasoning Trace**: The system exposes its internal \"Chain of Thought\" to the user, allowing clinicians to verify the logic and node-level execution sequence behind every data retrieval.\n*   **Deep Observability \u0026 System Analytics**: Integrated with *Langfuse Cloud* for real-time telemetry. Features a high-fidelity analytics dashboard that provides sub-second latency tracking, token density analysis, and 5-decimal micro-billing precision for every graph execution.\n*   **Performance \u0026 Financial Audit**: Every reasoning step (Rewrite, Classify, SQL, RAG) is logged with its specific *sub-second latency*, *token density*, and *USD cost*, ensuring transparent audit trails and predictable OPEX for medical departments.\n*   **Deterministic Guardrails**: Using LangGraph, the system enforces a strict state-managed flow, preventing the AI from wandering into \"creative\" or ungrounded responses.\n*   **Clinical Simulation \u0026 Privacy**: To ensure absolute privacy and HIPAA compliance, this system operates on a *proprietary synthetic dataset*. I engineered a custom *Clinical Simulation Engine* that generates high-entropy patient records and longitudinal narratives for rigorous testing.\n\n\u003cbr\u003e\n\n## Project Structure\n\n```text\n├── app/               # FastAPI Backend (Graph logic, Nodes, Models)\n├── frontend/          # React 19 + Tailwind 4 Frontend\n├── migrations/        # SQLAlchemy/Alembic Database Migrations\n├── scripts/           # Data Seeding \u0026 BGE-M3 Embedding Generation\n├── app/services/      # Core Data Dictionary \u0026 AI Prompts\n└── requirements.txt   # Backend dependencies\n```\n\n### System Requirements\n* **RAM:** Minimum 4GB (8GB recommended for local AI execution).\n* **Disk Space:** ~5GB for Docker images and local model storage.\n* **Docker:** Ensure at least 4GB of memory is allocated to Docker.\n\n\u003cbr\u003e\n\n## Installation \u0026 Setup\n\n### Quick Start with Docker (Recommended)\nThis system is fully containerized. Deployment via Docker ensures environment consistency across both the Frontend and Backend services.\n\n\u003e [!NOTE]\n\u003e **Prerequisite:** Ensure your PostgreSQL instance supports the `pgvector` extension. (Standard on Supabase, Neon, and AWS RDS).\n\n**1. Clone the repository**\n```bash\ngit clone https://github.com/suranjitpartho/clinical-data-intelligence-system.git\ncd clinical-data-intelligence-system\n```\n\n**2. Configure environment**  \nPopulate .env with your API keys and DATABASE credentials\n```bash\ncp .env.example .env\n```\n\n**3. Deploy services**\n```bash\ndocker-compose up --build\n```\nOnce initialized, the unified application will be accessible at **http://localhost:8000**.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsuranjitpartho%2Fclinical-data-intelligence-system","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsuranjitpartho%2Fclinical-data-intelligence-system","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsuranjitpartho%2Fclinical-data-intelligence-system/lists"}