{"id":30358643,"url":"https://github.com/pheonix-19/opsai","last_synced_at":"2026-05-03T09:38:46.612Z","repository":{"id":309783270,"uuid":"1025057309","full_name":"pheonix-19/OpsAI","owner":"pheonix-19","description":"OpsAI (Operational AI) is an intelligent IT support automation platform that uses AI to automatically categorize tickets, suggest solutions, and route requests to the right teams. Built with advanced NLP and machine learning technologies, it integrates with Jira, Slack, and Freshdesk to streamline operational workflows and improve response times.","archived":false,"fork":false,"pushed_at":"2025-08-13T19:02:41.000Z","size":2552,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-13T20:51:28.843Z","etag":null,"topics":["docker","fastapi","freshdesk","grafana","helpdesk-automation","huggingface","lora","machine-learning","nlp","peft","prometheus","sentence-transformers","slack-bot","transformers"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pheonix-19.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-23T16:48:23.000Z","updated_at":"2025-08-13T19:02:45.000Z","dependencies_parsed_at":"2025-08-13T20:51:31.197Z","dependency_job_id":"b2c50f79-6d25-4586-a0fb-be5b4f41218c","html_url":"https://github.com/pheonix-19/OpsAI","commit_stats":null,"previous_names":["pheonix-19/opsai"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/pheonix-19/OpsAI"
,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pheonix-19%2FOpsAI","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pheonix-19%2FOpsAI/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pheonix-19%2FOpsAI/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pheonix-19%2FOpsAI/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pheonix-19","download_url":"https://codeload.github.com/pheonix-19/OpsAI/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pheonix-19%2FOpsAI/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271143306,"owners_count":24706340,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-19T02:00:09.176Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker","fastapi","freshdesk","grafana","helpdesk-automation","huggingface","lora","machine-learning","nlp","peft","prometheus","sentence-transformers","slack-bot","transformers"],"created_at":"2025-08-19T11:01:58.634Z","updated_at":"2026-05-03T09:38:46.585Z","avatar_url":"https://github.com/pheonix-19.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🤖 OpsAI: Intelligent IT Support Automation\n\n[![GitHub 
stars](https://img.shields.io/github/stars/pheonix-19/OpsAI.svg?style=social\u0026label=Star)](https://github.com/pheonix-19/OpsAI)\n[![GitHub forks](https://img.shields.io/github/forks/pheonix-19/OpsAI.svg?style=social\u0026label=Fork)](https://github.com/pheonix-19/OpsAI/fork)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)\n[![Docker](https://img.shields.io/badge/docker-%230db7ed.svg?style=flat\u0026logo=docker\u0026logoColor=white)](https://www.docker.com/)\n[![FastAPI](https://img.shields.io/badge/FastAPI-005571?style=flat\u0026logo=fastapi)](https://fastapi.tiangolo.com/)\n\n\u003c!--\n    Author: Ayush\n    GitHub: https://github.com/pheonix-19\n    Project: OpsAI - Intelligent IT Support Automation\n    Copyright (c) 2025 Ayush. All rights reserved.\n--\u003e\n\n\u003e **Transform your IT helpdesk with AI-powered ticket triage and resolution suggestions**\n\nOpsAI is an advanced AI system that revolutionizes IT support operations by automatically categorizing tickets, suggesting solutions, and routing requests to the right teams. 
Using cutting-edge vector embeddings and fine-tuned language models, it learns from historical data to provide instant, contextual support recommendations.\n\n## 📸 **Live Screenshots**\n\n**🖥️ Grafana Dashboard in Action:**\n![Grafana OpsAI Dashboard](asset/grafana.png)\n*Real-time monitoring dashboard showing API metrics, request rates, and system health*\n\n**📊 Prometheus Metrics Collection:**\n![Prometheus Metrics](asset/promethius1.png)\n*Prometheus collecting and displaying OpsAI application metrics*\n\n**⚙️ Prometheus Configuration \u0026 Targets:**\n![Prometheus Targets](asset/promethius2.png)\n*Prometheus monitoring targets and service discovery configuration*\n\n## 📋 **Table of Contents**\n\n- [📸 Live Screenshots](#-live-screenshots)\n- [🏗️ System Architecture](#️-system-architecture--components)\n- [🎯 What Problem Does OpsAI Solve?](#-what-problem-does-opsai-solve)\n- [✨ Core Features](#-core-features)\n- [🚀 Quick Demo](#-quick-demo)\n- [📋 Prerequisites](#-prerequisites)\n- [⚡ Installation \u0026 Setup](#-installation--setup)\n- [🎮 API Endpoints Reference](#-api-endpoints-reference)\n- [📊 Monitoring \u0026 Observability](#-monitoring--observability)\n- [📁 Project Structure](#-project-structure)\n- [🔐 Security \u0026 Secrets Management](#-security--secrets-management)\n- [🐳 Docker \u0026 Deployment](#-docker--deployment)\n- [🔗 Enterprise Integrations](#-enterprise-integrations)\n- [🧪 Testing \u0026 Development](#-testing--development)\n- [🚨 Troubleshooting](#-troubleshooting)\n- [🚀 Quick Start Guide](#-quick-start-guide)\n- [📚 Additional Resources](#-additional-resources)\n- [🤝 Contributing](#-contributing)\n- [📄 License \u0026 Support](#-license--support)\n\n## 🏗️ **System Architecture \u0026 Components**\n\n```\n┌─────────────────────────────────────────────────────────────────────────────────────┐\n│                           🤖 OpsAI System Architecture                              │\n└─────────────────────────────────────────────────────────────────────────────────────┘\n\n           👤 Users                     🔧 IT Teams                   📊 Stakeholders\n             │                            │                            │\n             ▼                            ▼                            ▼\n┌─────────────────────────────────────────────────────────────────────────────────────┐\n│                               🔗 Integration Layer                                  │\n├────────────────┬────────────────┬────────────────┬────────────────────────────────┤\n│   📋 Jira      │  💬 Slack Bot  │ 🎫 Freshdesk   │     🌐 Custom APIs             │\n│   Webhooks     │  Real-time     │ Ticket Sync    │     REST Endpoints             │\n│   Automation   │  Notifications │ Customer Mgmt   │     External Systems           │\n└────────────────┴────────────────┴────────────────┴────────────────────────────────┘\n             │                            │                            │\n             └──────────────┬─────────────────────────┬─────────────────┘\n                            ▼                         ▼\n┌─────────────────────────────────────────────────────────────────────────────────────┐\n│                            🚀 
FastAPI Server (Port 8000)                           │\n├─────────────────────────────────────────────────────────────────────────────────────┤\n│  📍 Endpoints:  /classify  |  /resolve  |  /feedback  |  /metrics  |  /docs        │\n│                      │           │           │           │           │              │\n│                      ▼           ▼           ▼           ▼           ▼              │\n└─────────────────────────────────────────────────────────────────────────────────────┘\n                                        │\n                                        ▼\n┌─────────────────────────────────────────────────────────────────────────────────────┐\n│                              🧠 AI/ML Processing Core                               │\n├──────────────────────┬──────────────────────┬──────────────────────┬──────────────┤\n│   🔍 Vector Search   │   🤖 Language Model   │  🎯 Classification   │ 🔄 Learning  │\n│                      │                      │                      │              │\n│  📊 Embeddings:      │  🧬 Model:           │ 🏷️ Labels:           │ 📈 Training: │\n│  • sentence-trans    │  • GPT-Neo-125M      │ • auth, network      │ • LoRA       │\n│  • all-MiniLM-L6-v2  │  • LoRA Fine-tuned   │ • performance, mail  │ • Adaptation │\n│  • Vector Similarity │  • Context-aware     │ • Team Routing       │ • Feedback   │\n│                      │                      │                      │              │\n│  🗂️ FAISS Index:     │  💭 Generation:      │ 🎯 Mapping:          │ 🔄 Updates:  │\n│  • Fast Search       │  • Solution Suggest  │ • IT Helpdesk        │ • Continuous │\n│  • Metadata Store    │  • Context Tickets   │ • Engineering        │ • Improvement│\n└──────────────────────┴──────────────────────┴──────────────────────┴──────────────┘\n                                        │\n                                        ▼\n┌─────────────────────────────────────────────────────────────────────────────────────┐\n│                           
    💾 Data Storage Layer                                 │\n├──────────────────────┬──────────────────────┬──────────────────────┬──────────────┤\n│    📁 Raw Data       │   ⚙️ Processed       │   🗂️ Vector Index    │ 🎓 Models    │\n│                      │                      │                      │              │\n│  📄 tickets.csv      │  📋 Normalized:      │  🔍 FAISS Database:  │ 🧬 Weights:  │\n│  📄 tickets.json     │  • ticket_0.json     │  • ticket_index      │ • LoRA       │\n│  📊 Historical Data  │  • ticket_1.json     │  • ticket_meta.pkl   │ • Adapters   │\n│  🔄 Continuous Feed  │  • Clean Format      │  • Fast Retrieval    │ • Fine-tuned │\n│                      │  • Standardized      │  • Similarity Search │ • Checkpoint │\n└──────────────────────┴──────────────────────┴──────────────────────┴──────────────┘\n                                        │\n                                        ▼\n┌─────────────────────────────────────────────────────────────────────────────────────┐\n│                            📊 Monitoring \u0026 Observability                           │\n├──────────────────────┬──────────────────────┬──────────────────────┬──────────────┤\n│  📈 Prometheus       │   📊 Grafana         │   🚨 Alerting        │ 📝 Logging   │\n│  (Port 9090)         │   (Port 3000)        │                      │              │\n│                      │                      │                      │              │\n│  📊 Metrics:         │  📋 Dashboards:      │  🚨 Alerts:          │ 🗂️ Logs:     │\n│  • Request Count     │  • Performance       │  • High Error Rate   │ • API Calls  │\n│  • Response Time     │  • Error Rates       │  • Slow Response     │ • Model Inf. 
│\n│  • AI Performance   │  • Business KPIs     │  • System Down       │ • Debug Info │\n│  • System Health    │  • Real-time Charts  │  • Auto-notification │ • Audit Trail│\n└──────────────────────┴──────────────────────┴──────────────────────┴──────────────┘\n                                        │\n                                        ▼\n┌─────────────────────────────────────────────────────────────────────────────────────┐\n│                              🐳 Infrastructure Layer                                │\n├──────────────────────┬──────────────────────┬──────────────────────┬──────────────┤\n│   🐳 Docker Setup    │   🐍 Python Env     │   🔥 Hardware        │ ⚙️ CI/CD     │\n│                      │                      │                      │              │\n│  📋 Services:        │  📦 Dependencies:    │  💻 Requirements:    │ 🔄 Pipeline: │\n│  • API Container     │  • transformers      │  • Python 3.11+     │ • GitHub     │\n│  • Prometheus        │  • fastapi           │  • 8GB+ RAM          │ • Actions    │\n│  • Grafana           │  • torch             │  • CUDA GPU (opt)    │ • Testing    │\n│  • Auto-scaling      │  • faiss-cpu         │  • 4GB Disk          │ • Deploy     │\n└──────────────────────┴──────────────────────┴──────────────────────┴──────────────┘\n\n┌─────────────────────────────────────────────────────────────────────────────────────┐\n│                              📈 Data Flow Direction                                 │\n│                                                                                     │\n│  Tickets → Integration → API → AI Processing → Data Storage → Monitoring            │\n│     ↑                                          ↓                     ↓             │\n│  Feedback ←── Solutions ←── Intelligence ←── Training ←── Analytics ←── Metrics     │\n└─────────────────────────────────────────────────────────────────────────────────────┘\n```\n\n## 🎯 **What Problem Does OpsAI Solve?**\n\n### **Before OpsAI 
(Traditional IT Support):**\n```\nUser reports issue → Manual ticket review → Search past solutions → Assign to team → Resolution\n⏱️ Hours/Days        💰 High cost        🔍 Time-intensive   👥 Manual routing\n```\n\n### **With OpsAI (AI-Powered Support):**\n```\nUser reports issue → AI instant analysis → Auto-suggested solution → Smart team routing → Fast resolution\n⚡ Seconds           💰 Cost efficient   🧠 AI-powered      🎯 Accurate routing\n```\n\n## ✨ **Core Features**\n\n| Feature | Description | Business Impact |\n|---------|-------------|-----------------|\n| 🎯 **Smart Classification** | AI categorizes tickets by type (auth, network, performance) | Automatic team routing |\n| 🧠 **Resolution Suggestions** | Generates solutions based on similar past cases | Faster problem solving |\n| 🔍 **Semantic Search** | Finds relevant tickets using AI understanding, not just keywords | Better context matching |\n| 📊 **Real-time Monitoring** | Prometheus metrics + Grafana dashboards | System health visibility |\n| 🔗 **Enterprise Integration** | Connects with Jira, Slack, Freshdesk | Seamless workflow integration |\n| 🎓 **Continuous Learning** | LoRA fine-tuning adapts to your organization | Improving accuracy over time |\n\n## 🚀 **Quick Demo**\n\n### **Example 1: Ticket Classification**\n```bash\ncurl -X POST \"http://localhost:8000/classify\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"text\": \"Cannot access email, getting authentication errors\"}'\n```\n**Response:**\n```json\n{\n  \"tags\": [\"auth\", \"mail\", \"user\"],\n  \"teams\": [\"IT Helpdesk\"]\n}\n```\n\n### **Example 2: AI Resolution Suggestion**\n```bash\ncurl -X POST \"http://localhost:8000/resolve\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"text\": \"Database connection timeout in production\"}'\n```\n**Response:**\n```json\n{\n  \"suggestion\": \"Check database connection pool settings and increase timeout values...\",\n  \"context_tickets\": [{\"title\": \"Similar DB issue\", 
\"resolution\": \"...\"}]\n}\n```\n\n## 📋 **Prerequisites**\n\n- **Python 3.11+** (tested with 3.12.3)\n- **8GB+ RAM** (for AI model inference)\n- **Docker \u0026 Docker Compose** (for full stack deployment)\n- **CUDA-compatible GPU** (optional, for faster inference)\n- **4GB disk space** (for models and vector index)\n\n## ⚡ **Installation \u0026 Setup**\n\n### **Method 1: Local Development (Recommended for Testing)**\n\n1. **Clone and Setup Environment:**\n```bash\ngit clone https://github.com/pheonix-19/OpsAI.git\ncd OpsAI\n\n# Create virtual environment\npython3 -m venv env\nsource env/bin/activate  # Linux/macOS\n# env\\Scripts\\activate   # Windows\n\n# Install dependencies\npip install -r requirements.txt\npip install -e .\n```\n\n2. **Process Sample Data and Build AI Index:**\n```bash\n# Process the included sample tickets\nPYTHONPATH=. python -m src.ingestion.ingest data/raw data/processed\n\n# Build vector embeddings index for semantic search\nPYTHONPATH=. python -m src.embeddings.build_index --input-dir data/processed --output-dir data/index\n```\n\n3. **Start the API Server:**\n```bash\nPYTHONPATH=. uvicorn src.api.main:app --host 0.0.0.0 --port 8000 --reload\n```\n\n4. 
**Access the System:**\n- **API Documentation**: http://localhost:8000/docs\n- **Metrics Endpoint**: http://localhost:8000/metrics\n\n### **Method 2: Full Production Stack (Docker)**\n\n```bash\n# Start complete monitoring stack\ndocker-compose up --build\n\n# Access services:\n# - OpsAI API: http://localhost:8000\n# - Prometheus: http://localhost:9090  \n# - Grafana: http://localhost:3000 (admin/admin)\n```\n\n## 🎮 **API Endpoints Reference**\n\n| Endpoint | Method | Purpose | Example Use Case |\n|----------|--------|---------|------------------|\n| `/` | GET | Health check | Service monitoring |\n| `/classify` | POST | Categorize tickets | Auto-route to teams |\n| `/resolve` | POST | Get AI suggestions | Provide solutions |\n| `/feedback` | POST | Submit user ratings | Improve AI accuracy |\n| `/metrics` | GET | Prometheus metrics | System monitoring |\n\n### **Detailed API Usage:**\n\n#### **🎯 Classify Tickets**\n```bash\ncurl -X POST \"http://localhost:8000/classify\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"text\": \"Server not responding to ping requests\",\n    \"top_k\": 3\n  }'\n```\n\n#### **🧠 Get AI Resolutions**\n```bash\ncurl -X POST \"http://localhost:8000/resolve\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"text\": \"Application crashes when uploading large files\", \n    \"top_k\": 5\n  }'\n```\n\n#### **📝 Submit Feedback**\n```bash\ncurl -X POST \"http://localhost:8000/feedback\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"ticket\": {\"title\": \"Login issue\", \"description\": \"Cannot access system\"},\n    \"suggestion\": \"Reset password and clear browser cache\",\n    \"rating\": 5,\n    \"comment\": \"Perfect solution, worked immediately!\"\n  }'\n```\n\n## 📊 **Monitoring \u0026 Observability**\n\n### **🔍 Prometheus Metrics**\nOpsAI automatically tracks comprehensive performance metrics:\n\n```bash\n# View current metrics\ncurl http://localhost:8000/metrics | grep opsai\n\n# Example 
metrics output:\nopsai_requests_total{endpoint=\"/classify\",method=\"POST\"} 5.0\nopsai_request_latency_seconds_sum{endpoint=\"/resolve\"} 2.28\n```\n\n**Key Metrics Tracked:**\n- **Request Volume**: API calls per endpoint per second\n- **Response Times**: Latency percentiles (50th, 90th, 99th)\n- **Error Rates**: Failed requests and status codes\n- **AI Performance**: Model inference times\n- **Business KPIs**: Total tickets processed\n\n### **📈 Grafana Dashboards** ✅ **CONFIGURED \u0026 WORKING**\n\n**✅ Active Dashboards:**\n1. **OpsAI Monitoring Dashboard** - Real-time API metrics\n2. **Prometheus 2.0 Stats** - System performance monitoring  \n3. **Prometheus Stats** - Infrastructure metrics\n\n**📊 Access:** http://localhost:3000 (admin/admin)\n\n**Dashboard Features:**\n- 📊 **Total API Requests**: Live request tracking\n- ⏱️ **Request Rate**: Real-time requests per minute\n- 🚨 **HTTP Status Codes**: Success vs Error monitoring\n- 📈 **Endpoint Breakdown**: Usage analytics by endpoint\n- 🥧 **Visual Analytics**: Interactive charts and tables\n\n### **🔍 Prometheus Query Examples**\nEssential queries for monitoring (see `PROMETHEUS_QUERIES.md` for complete reference):\n\n```promql\n# Basic metrics\nsum(opsai_requests_total) by (endpoint)          # Total requests by endpoint\nrate(opsai_requests_total[5m])                   # Request rate per second\n\n# Performance monitoring  \nsum by (endpoint) (rate(opsai_request_latency_seconds_sum[5m])) / sum by (endpoint) (rate(opsai_request_latency_seconds_count[5m])) # Average response time\nhistogram_quantile(0.95, rate(opsai_request_latency_seconds_bucket[5m])) # 95th percentile\n```\n\n## 📁 **Project Structure**\n```\nopsai/\n├── src/                     # Core application code\n│   ├── api/                 # FastAPI endpoints\n│   ├── embeddings/          # Vector search \u0026 FAISS\n│   ├── ingestion/           # Data processing \n│   ├── integrations/        # External APIs (Jira, Slack)\n│   ├── model_training/      # AI model fine-tuning\n│   └── monitoring/          # Prometheus metrics\n├── data/                    # Training data \u0026 indexes\n├── models/                  # LoRA adapters \u0026 weights\n├── tests/                   # Test suite\n└── infra/                   # Docker \u0026 monitoring configs\n```\n\n## 🔐 **Security \u0026 Secrets Management**\n\n### **🚨 Important: Managing Secrets in Public Repositories**\n\n⚠️ **NEVER commit actual secrets to your repository!** This guide shows you how to securely manage environment variables and API keys for both local development and CI/CD.\n\n#### **📋 Required vs Optional Credentials**\n\n| **Credential** | **Required For** | **Default Behavior** |\n|---------------|------------------|---------------------|\n| `DATABASE_URL` | Database connection | ✅ Defaults to local SQLite |\n| `OPENAI_API_KEY` | OpenAI features | ⚠️ Optional - features disabled if missing |\n| `HUGGINGFACE_API_TOKEN` | Model downloads | ⚠️ Optional - uses cached/local models |\n| `JIRA_API_TOKEN` | JIRA integration | ⚠️ Only if using JIRA |\n| `SLACK_BOT_TOKEN` | Slack bot | ⚠️ Only if using Slack |\n| `FRESHDESK_API_KEY` | Freshdesk integration | ⚠️ Only if using Freshdesk |\n| `DOCKERHUB_USER/TOKEN` | CI/CD deployment | ⚠️ Only for Docker Hub publishing |\n\n#### **🛡️ Local Development Setup**\n\n1. **Copy environment template:**\n```bash\ncp .env.example .env\n```\n\n2. 
**Edit `.env` with your actual values (NEVER commit this file):**\n```bash\n# Required only if using specific integrations\nJIRA_URL=\"https://your-company.atlassian.net\"\nJIRA_USER=\"your-email@company.com\"\nJIRA_API_TOKEN=\"your_new_jira_token_here\"\n\nSLACK_BOT_TOKEN=\"xoxb-your-slack-bot-token-here\"\nSLACK_APP_TOKEN=\"xapp-your-slack-app-token-here\"\n\n# Optional - for enhanced AI features\nOPENAI_API_KEY=\"sk-your-openai-key-here\"\nHUGGINGFACE_API_TOKEN=\"hf_your-token-here\"\n```\n\n3. **The `.env` file is automatically ignored by git** (included in `.gitignore`)\n\n#### **🔑 GitHub Secrets for CI/CD**\n\nFor GitHub Actions to work with your secrets:\n\n1. **Go to GitHub Repository Settings**\n2. **Navigate to:** Settings → Secrets and variables → Actions\n3. **Add these secrets** (only the ones you need):\n\n```\n# Docker deployment (required for CI/CD)\nDOCKERHUB_USER=your_dockerhub_username\nDOCKERHUB_TOKEN=your_dockerhub_access_token\n\n# Integration secrets (optional)\nJIRA_API_TOKEN=your_jira_token\nSLACK_BOT_TOKEN=your_slack_token\nFRESHDESK_API_KEY=your_freshdesk_key\n```\n\n#### **✅ Security Best Practices Implemented**\n\n- ✅ **No secrets in source code** - All credentials from environment variables\n- ✅ **Secure config validation** - `src/config.py` handles missing secrets gracefully\n- ✅ **Environment isolation** - Production vs development detection\n- ✅ **CI/CD ready** - GitHub Actions configured with proper secret injection\n- ✅ **Optional integrations** - Core functionality works without external APIs\n\n#### **🔧 Security Configuration Files**\n\n**Key files for security:**\n- `.env.example` - Template with placeholder values (safe to commit)\n- `src/config.py` - Secure configuration management \n- `.gitignore` - Ensures `.env` files are never committed\n- `SECURITY.md` - Complete security guidelines\n\n### **🚨 Token Security Checklist**\n\n- [ ] All real tokens removed from version control\n- [ ] `.env` file exists locally with actual 
values\n- [ ] GitHub secrets configured for CI/CD\n- [ ] Old/exposed tokens revoked and regenerated\n- [ ] Team members trained on security practices\n\n## 🐳 **Docker \u0026 Deployment**\n\n### **🛠️ Fixed Docker Build Issues**\n\n**Common Docker problems and solutions implemented:**\n\n#### **❌ Problem: Package Version Conflicts**\n```\nERROR: Could not find a version that satisfies the requirement tokenizers==0.21.2\nERROR: No matching distribution found for SQLAlchemy==2.0.23\n```\n\n#### **✅ Solution: Flexible Version Ranges**\nUpdated `requirements.txt` to use compatible version ranges instead of pinned versions:\n\n```python\n# Before (problematic)\ntokenizers==0.21.2\nSQLAlchemy==2.0.23\n\n# After (working)\ntokenizers\u003e=0.13.0,\u003c1.0.0\nSQLAlchemy\u003e=1.4.0,\u003c3.0.0\n```\n\n#### **❌ Problem: Network Timeouts During Build**\n```\npip._vendor.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool: Read timed out\n```\n\n#### **✅ Solution: Enhanced Dockerfile**\n```dockerfile\n# Install with increased timeout and retries\nRUN pip install --no-cache-dir \\\n    --timeout 1000 \\\n    --retries 5 \\\n    --default-timeout=1000 \\\n    -r requirements.txt\n```\n\n### **🚀 Deployment Options**\n\n#### **Option 1: Quick Development Setup**\n```bash\n# Minimal setup for development\ncp requirements-minimal.txt requirements.txt\ndocker-compose up --build\n```\n\n#### **Option 2: Full Production Stack**\n```bash\n# Complete setup with all features\ndocker-compose up --build\n```\n\n#### **Option 3: Retry Script (Handles Network Issues)**\n```bash\n# Automated retry with fallback to minimal setup\n./docker-build.sh\n```\n\n### **📦 Docker Services Overview**\n\n| **Service** | **Port** | **Purpose** | **Health Check** |\n|-------------|----------|-------------|------------------|\n| `opsai-api` | 8000 | Main application | `curl localhost:8000/` |\n| `prometheus` | 9090 | Metrics collection | `curl localhost:9090/-/healthy` |\n| `grafana` | 3000 | Monitoring 
dashboards | `curl localhost:3000/api/health` |\n\n### **🔧 Docker Troubleshooting**\n\n**Check service status:**\n```bash\ndocker-compose ps\ndocker-compose logs api\n```\n\n**Restart specific service:**\n```bash\ndocker-compose restart api\ndocker-compose restart prometheus\n```\n\n**Clean rebuild:**\n```bash\ndocker-compose down\ndocker system prune -f\ndocker-compose build --no-cache\ndocker-compose up\n```\n\n## 📊 **Monitoring \u0026 Metrics - Complete Setup Guide**\n\n### **🎯 Prometheus Configuration**\n\n**✅ Working Prometheus Setup:**\n\n```yaml\n# infra/prometheus/prometheus.yml\nglobal:\n  scrape_interval: 15s\n\nscrape_configs:\n  - job_name: 'prometheus'      # Self-monitoring\n    static_configs:\n      - targets: ['localhost:9090']\n    \n  - job_name: 'opsai_api'       # Application monitoring\n    static_configs:\n      - targets: ['api:8000']\n```\n\n### **📈 Grafana Dashboard Setup**\n\n**✅ Auto-configured Grafana features:**\n\n1. **Data Source**: Prometheus auto-configured at `http://prometheus:9090`\n2. **Dashboards**: Pre-built OpsAI monitoring dashboard\n3. 
**Provisioning**: Automatic setup via configuration files\n\n**Access:** http://localhost:3000 (admin/admin)\n\n### **📊 Available Metrics \u0026 Queries**\n\n#### **🚀 OpsAI Application Metrics**\n\n**✅ Confirmed Working Queries:**\n\n```promql\n# Instant metrics (always show data)\nopsai_requests_total                              # Total API requests\nprocess_resident_memory_bytes{job=\"opsai_api\"}   # Memory usage\ntime() - process_start_time_seconds{job=\"opsai_api\"}  # Uptime\nup{job=\"opsai_api\"}                              # Service availability\npython_gc_objects_collected_total{job=\"opsai_api\"}    # Python metrics\n\n# Aggregated metrics\nsum by (endpoint) (opsai_requests_total)         # Requests by endpoint  \nsum by (http_status) (opsai_requests_total)      # Requests by status code\n```\n\n#### **📈 Rate-based Metrics (Need Traffic)**\n\n```promql\n# Generate traffic first: ./generate-traffic.sh\nrate(opsai_requests_total[5m])                   # Request rate\nrate(process_cpu_seconds_total{job=\"opsai_api\"}[5m]) * 100  # CPU usage\nhistogram_quantile(0.95, rate(opsai_request_latency_seconds_bucket[5m]))  # 95th percentile latency\n```\n\n### **🔍 Testing Metrics**\n\n**Generate test traffic:**\n```bash\n# Continuous traffic generation\n./generate-traffic.sh\n\n# Or manual testing\nfor i in {1..20}; do \n  curl -s http://localhost:8000/ \u003e /dev/null\n  curl -s http://localhost:8000/docs \u003e /dev/null\n  sleep 1\ndone\n```\n\n**Verify metrics in Prometheus:**\n```bash\n# Check if metrics are being collected\ncurl -s \"http://localhost:9090/api/v1/query?query=opsai_requests_total\" | jq '.data.result | length'\n\n# Test specific queries\ncurl -s \"http://localhost:9090/api/v1/query?query=up{job=\\\"opsai_api\\\"}\"\n```\n\n### **📋 Grafana Dashboard Features**\n\n**Working dashboard panels:**\n- 📊 **Total API Requests**: Real-time request count\n- ⏱️ **Request Rate**: Requests per minute over time\n- 🥧 **HTTP Status Codes**: Success vs error 
breakdown  \n- 📈 **Request Latency**: Response time percentiles\n- 💾 **Memory Usage**: RAM consumption tracking\n- ⏰ **Service Uptime**: Time since last restart\n\n### **🚨 Monitoring Troubleshooting**\n\n**If Grafana shows \"No Data\":**\n\n1. **Check Prometheus targets:**\n   ```bash\n   curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'\n   ```\n\n2. **Verify data source in Grafana:**\n   - URL should be: `http://prometheus:9090`\n   - Click \"Save \u0026 Test\" - should show green \"Data source is working\"\n\n3. **Test simple queries in Grafana:**\n   - Start with: `opsai_requests_total`\n   - Set time range to \"Last 15 minutes\"\n   - Enable auto-refresh (5s)\n\n4. **Generate traffic if needed:**\n   ```bash\n   ./generate-traffic.sh\n   ```\n\n### **📈 Custom Dashboard Creation**\n\n**Manual dashboard setup:**\n1. Go to Grafana → \"+\" → Dashboard → Add new panel\n2. Enter query: `opsai_requests_total`\n3. Set visualization type (Time series, Stat, etc.)\n4. Configure time range and refresh interval\n5. Save dashboard\n\n### **🔧 Monitoring Best Practices**\n\n- ✅ **Start simple**: Use instant metrics first (`opsai_requests_total`)\n- ✅ **Generate traffic**: Use `./generate-traffic.sh` for rate metrics\n- ✅ **Check time ranges**: Use \"Last 15 minutes\" for recent data\n- ✅ **Verify targets**: Ensure Prometheus is scraping successfully\n- ✅ **Test queries**: Use Prometheus UI to validate queries before Grafana\n\n## 📚 **Step-by-Step Setup Walkthrough**\n\n### **🚀 Complete Setup from Scratch**\n\n#### **1. Repository Setup**\n```bash\ngit clone https://github.com/pheonix-19/OpsAI.git\ncd OpsAI\n```\n\n#### **2. Security Configuration**\n```bash\n# Copy environment template\ncp .env.example .env\n\n# Edit .env with your actual credentials (optional)\nnano .env\n\n# Verify .env is in .gitignore\ngrep -q \"^\\.env$\" .gitignore \u0026\u0026 echo \"✅ .env properly ignored\"\n```\n\n#### **3. 
Docker Build (with retry logic)**\n```bash\n# Method 1: Automated retry script\nchmod +x docker-build.sh\n./docker-build.sh\n\n# Method 2: Manual build\ndocker-compose up --build\n\n# Method 3: Minimal build (if having issues; this overwrites requirements.txt)\ncp requirements-minimal.txt requirements.txt\ndocker-compose up --build\n```\n\n#### **4. Verify Services**\n```bash\n# Check all services are running\ndocker-compose ps\n\n# Test API\ncurl http://localhost:8000/\n\n# Test metrics endpoint\ncurl -s http://localhost:8000/metrics | head -10\n\n# Check Prometheus targets\ncurl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[].health'\n```\n\n#### **5. Set Up Monitoring**\n```bash\n# Generate test traffic in the background\n./generate-traffic.sh \u0026\n\n# Open Grafana (admin/admin); `open` is macOS, use xdg-open on Linux\nopen http://localhost:3000\n\n# Open Prometheus\nopen http://localhost:9090\n```\n\n#### **6. Test AI Features**\n```bash\n# Test classification\ncurl -X POST \"http://localhost:8000/classify\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"text\": \"Cannot login to email account\"}'\n\n# Test resolution suggestions  \ncurl -X POST \"http://localhost:8000/resolve\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"text\": \"Database connection timeout error\"}'\n```\n\n### **🔧 CI/CD Setup**\n\n#### **GitHub Actions Configuration**\n\nThe repository includes automated CI/CD with these workflows:\n\n**`.github/workflows/ci.yml`** - Tests and builds on every push:\n```yaml\n# Automatically runs:\n- Python linting with flake8\n- Test suite with pytest  \n- Docker image build\n- Deployment to Docker Hub (if secrets configured)\n```\n\n**`.github/workflows/retrain.yml`** - Scheduled model retraining:\n```yaml\n# Runs weekly to:\n- Retrain AI models with new data\n- Update LoRA adapters\n- Upload new model artifacts\n```\n\n#### **Required GitHub Secrets for CI/CD**\n\n**Minimal setup (for basic CI/CD):**\n```\nDOCKERHUB_USER=your_dockerhub_username\nDOCKERHUB_TOKEN=your_dockerhub_access_token\n```\n\n**Full 
setup (for all integrations):**\n```\nJIRA_API_TOKEN=your_jira_token\nSLACK_BOT_TOKEN=your_slack_token  \nFRESHDESK_API_KEY=your_freshdesk_key\n```\n\n### **📝 Configuration Files Reference**\n\n| **File** | **Purpose** | **When to Edit** |\n|----------|-------------|------------------|\n| `.env.example` | Template for environment variables | Never (contains placeholders) |\n| `.env` | Your actual secrets (not in git) | Add your real credentials |\n| `src/config.py` | Configuration management | Customize app settings |\n| `requirements.txt` | Python dependencies | Add new packages |\n| `docker-compose.yml` | Service orchestration | Modify ports/volumes |\n| `infra/prometheus/prometheus.yml` | Metrics collection | Add monitoring targets |\n\n### **🎯 Quick Validation Checklist**\n\n- [ ] Services start: `docker-compose ps` shows all running\n- [ ] API responds: `curl http://localhost:8000/` returns JSON\n- [ ] Metrics work: `curl http://localhost:8000/metrics` shows data\n- [ ] Prometheus scraping: Targets page shows \"UP\" status\n- [ ] Grafana connected: Data source test succeeds\n- [ ] Dashboards show data: Generate traffic and verify graphs\n- [ ] AI features work: Classification and resolution endpoints respond\n- [ ] Security configured: No real secrets in git, `.env` properly ignored\n\nThis comprehensive setup ensures your OpsAI deployment is secure, monitored, and production-ready! 
🎉\n\n## 🔗 **Enterprise Integrations**\n\n### **📋 Jira Integration**\n```bash\n# Environment variables for Jira\nJIRA_URL=https://your-domain.atlassian.net\nJIRA_USER=your-email@company.com\nJIRA_API_TOKEN=your-api-token\n\n# Auto-process tickets from Jira webhooks\n# POST /jira/webhook - Receives ticket updates\n```\n\n### **💬 Slack Bot Integration**\n```bash\n# Slack bot configuration\nSLACK_BOT_TOKEN=xoxb-your-bot-token\nSLACK_APP_TOKEN=xapp-your-app-token\n\n# Start the Slack bot\npython src/integrations/slack_bot.py\n```\n\n## 🧪 **Testing \u0026 Development**\n\n### **Run Test Suite:**\n```bash\n# Run all tests\npytest\n\n# Run specific modules\npytest tests/test_api.py        # API endpoint tests\npytest tests/test_embeddings.py # Vector search tests\npytest tests/test_ingestion.py  # Data processing tests\n```\n\n### **Development Workflow:**\n```bash\n# Hot reload during development\nuvicorn src.api.main:app --reload --port 8000\n\n# Process new training data\npython src/ingestion/ingest.py --input data/raw/new_tickets.csv\n\n# Rebuild search index\npython src/embeddings/build_index.py --input-dir data/processed --output-dir data/index\n```\n\n## 🚨 **Troubleshooting**\n\n### **Common Issues \u0026 Solutions**\n\n**🔧 Device Mismatch Error:**\n```\nRuntimeError: Expected all tensors to be on the same device\n```\n**Solution:** ✅ Fixed in latest version - tensors automatically moved to correct device\n\n**🔧 Import Errors:**\n```\nImportError: attempted relative import with no known parent package\n```\n**Solution:** Use `PYTHONPATH=. 
python -m src.module.script`\n\n**🔧 Port Already in Use:**\n```\nOSError: [Errno 98] Address already in use\n```\n**Solution:** Use different port: `--port 8001` or kill existing process\n\n### **Debugging Commands**\n```bash\n# Check API health\ncurl http://localhost:8000/\n\n# View current metrics\ncurl http://localhost:8000/metrics | grep opsai\n\n# Check Docker services\ndocker-compose ps\n```\n\n## 🚀 **Quick Start Guide**\n\n**1. Test Basic Classification:**\n```bash\ncurl -X POST \"http://localhost:8000/classify\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"text\": \"Password reset needed for user account\"}'\n```\n\n**2. Get AI-Powered Solutions:**\n```bash\ncurl -X POST \"http://localhost:8000/resolve\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"text\": \"Email server connection timeout\"}'\n```\n\n**3. Provide Feedback for Learning:**\n```bash\ncurl -X POST \"http://localhost:8000/feedback\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"ticket\": {\"title\": \"Login issue\"},\n    \"suggestion\": \"Reset password\", \n    \"rating\": 5,\n    \"comment\": \"Perfect solution!\"\n  }'\n```\n\n## 📚 **Additional Resources**\n\n### **🔗 Useful Links**\n- **📖 API Documentation**: http://localhost:8000/docs (when running)\n- **📊 Monitoring**: http://localhost:3000 (Grafana dashboards)  \n- **🔍 Metrics**: http://localhost:9090 (Prometheus)\n- **📋 Query Reference**: See `PROMETHEUS_QUERIES.md` for complete monitoring guide\n- **🐛 Issues**: https://github.com/pheonix-19/OpsAI/issues\n- **💬 Discussions**: https://github.com/pheonix-19/OpsAI/discussions\n\n### **📖 Technical Stack**\n- **Vector Embeddings**: `sentence-transformers/all-MiniLM-L6-v2`\n- **Language Model**: `EleutherAI/gpt-neo-125M` with LoRA fine-tuning\n- **Search Index**: FAISS (Facebook AI Similarity Search)\n- **Monitoring**: Prometheus + Grafana stack\n- **API Framework**: FastAPI with automatic OpenAPI docs\n\n## 🤝 **Contributing**\n\nWe welcome 
contributions! Here's how to get started:\n\n### **🛠️ Development Setup**\n```bash\n# Fork the repository on GitHub, then clone your fork\ngit clone https://github.com/your-username/OpsAI.git\ncd OpsAI\n\n# Create a feature branch\ngit checkout -b feature/amazing-improvement\n\n# Make changes and test\npytest\npre-commit run --all-files\n\n# Push the branch, then open a pull request on GitHub\ngit push origin feature/amazing-improvement\n```\n\n### **🎯 Contribution Areas**\n- 🐛 **Bug Fixes**: Fix issues and improve stability\n- ✨ **New Features**: Add integrations, UI improvements, ML enhancements\n- 📚 **Documentation**: Improve guides, examples, and API docs\n- 🧪 **Testing**: Add test coverage and performance benchmarks\n\n## 📄 **License \u0026 Support**\n\n### **📜 License**\nThis project is licensed under the **MIT License** - see the [LICENSE](LICENSE) file for details.\n\n### **🆘 Getting Help**\n\n**For Questions:**\n1. 📖 Check this README and API documentation first\n2. 🔍 Search existing [GitHub issues](https://github.com/pheonix-19/OpsAI/issues)\n3. 💬 Start a [GitHub discussion](https://github.com/pheonix-19/OpsAI/discussions)\n4. 🐛 Create a new issue with detailed information\n\n**For Bugs:**\nInclude in your issue:\n- Python version and OS\n- Complete error message and stack trace  \n- Steps to reproduce the problem\n- Expected vs actual behavior\n\n### **🙏 Acknowledgments**\n\n- **Hugging Face**: For transformer models and libraries\n- **FastAPI**: For the excellent web framework\n- **Prometheus \u0026 Grafana**: For monitoring and observability\n- **FAISS**: For efficient vector similarity search\n- **OpenAI/EleutherAI**: For foundation language models\n\n---\n\n## 🎉 **Ready to Transform Your IT Support?**\n\nOpsAI is production-ready and has been tested with real-world IT scenarios. 
Start with the sample data, then gradually add your organization's historical tickets to improve accuracy.\n\n**Get started in 5 minutes:**\n```bash\ngit clone https://github.com/pheonix-19/OpsAI.git\ncd OpsAI\npython3 -m venv env\nsource env/bin/activate\npip install -r requirements.txt\npip install -e .\nPYTHONPATH=. uvicorn src.api.main:app --reload\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpheonix-19%2Fopsai","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpheonix-19%2Fopsai","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpheonix-19%2Fopsai/lists"}