{"id":28863144,"url":"https://github.com/aaryan04/advanced-text-to-sql-rag","last_synced_at":"2026-04-15T05:31:44.617Z","repository":{"id":299957993,"uuid":"1004709692","full_name":"Aaryan04/advanced-text-to-sql-rag","owner":"Aaryan04","description":"🚀 Advanced Text-to-SQL RAG System with LangChain, LangGraph \u0026 React - Convert natural language to SQL with AI-powered intelligence","archived":false,"fork":false,"pushed_at":"2025-06-19T05:42:32.000Z","size":985,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-06-19T06:27:31.288Z","etag":null,"topics":["ai","fastapi","langchain","langgraph","machine-learning","natural-language-processing","python","rag","react","sql","text-to-sql","typescript"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Aaryan04.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-19T04:13:21.000Z","updated_at":"2025-06-19T05:42:35.000Z","dependencies_parsed_at":"2025-06-19T06:38:35.774Z","dependency_job_id":null,"html_url":"https://github.com/Aaryan04/advanced-text-to-sql-rag","commit_stats":null,"previous_names":["aaryan04/advanced-text-to-sql-rag"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Aaryan04/advanced-text-to-sql-rag","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Aaryan04%2Fadvanced-text-to-sql-rag","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Aaryan04%2Fadvan
ced-text-to-sql-rag/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Aaryan04%2Fadvanced-text-to-sql-rag/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Aaryan04%2Fadvanced-text-to-sql-rag/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Aaryan04","download_url":"https://codeload.github.com/Aaryan04/advanced-text-to-sql-rag/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Aaryan04%2Fadvanced-text-to-sql-rag/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260898761,"owners_count":23079263,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","fastapi","langchain","langgraph","machine-learning","natural-language-processing","python","rag","react","sql","text-to-sql","typescript"],"created_at":"2025-06-20T07:02:01.518Z","updated_at":"2026-04-15T05:31:44.609Z","avatar_url":"https://github.com/Aaryan04.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🧠 Advanced Text-to-SQL RAG System\n\nA sophisticated Text-to-SQL system built with **LangChain**, **LangGraph**, and modern web technologies. 
This system converts natural language questions into SQL queries using Retrieval-Augmented Generation (RAG) with advanced features like self-correction, query optimization, and real-time execution.\n\n![image](https://github.com/user-attachments/assets/ff18dce0-097e-4b40-b7ba-8114e63a1f0e)\n\n## ✨ Features\n\n### 🔍 Advanced RAG System\n- **Vector embeddings** for schema understanding and query examples\n- **Semantic search** for relevant context retrieval\n- **Example-based learning** with curated query patterns\n- **Schema-aware** query generation\n\n### 🚀 LangGraph Workflow\n- **Multi-step workflow** with validation and self-correction\n- **Automatic retry logic** for failed queries\n- **Query optimization** and performance tuning\n- **Real-time progress tracking** via WebSocket\n\n### 🛡️ Security \u0026 Validation\n- **SQL injection prevention** with comprehensive validation\n- **Query complexity analysis** and safety checks\n- **Execution sandboxing** with result limits\n- **Error handling** and user feedback\n\n### 🎨 Modern UI/UX\n- **Dark theme** with Material-UI components\n- **Real-time query execution** with progress indicators\n- **Interactive data visualization** with charts\n- **Monaco Editor** for SQL syntax highlighting\n- **Responsive design** for all devices\n\n### 📊 Analytics \u0026 Monitoring\n- **Query history** tracking and analysis\n- **Performance metrics** and success rates\n- **Error analysis** and debugging tools\n- **Database schema explorer**\n\n## 🏗️ Architecture\n\n```\n├── backend/                 # FastAPI Backend\n│   ├── main.py             # Application entry point\n│   ├── database/           # Database management\n│   ├── rag/               # RAG system implementation\n│   ├── graph/             # LangGraph workflow\n│   └── utils/             # Utilities and validators\n├── frontend/              # React Frontend\n│   ├── src/\n│   │   ├── components/    # Reusable components\n│   │   ├── pages/         # Page components\n│   │  
 ├── hooks/         # Custom hooks\n│   │   └── utils/         # API and utilities\n└── requirements.txt       # Python dependencies\n```\n\n## 🚀 Quick Start\n\n### Prerequisites\n- Python 3.9+\n- Node.js 16+\n- OpenAI API key\n- PostgreSQL (optional, SQLite by default)\n\n### Backend Setup\n\n1. **Install dependencies**:\n```bash\npip install -r requirements.txt\n```\n\n2. **Set environment variables**:\n```bash\ncp .env.example .env\n# Edit .env with your OpenAI API key and database settings\n```\n\n3. **Run the backend**:\n```bash\ncd backend\npython main.py\n```\n\nThe backend will start at `http://localhost:8001`\n\n### Frontend Setup\n\n1. **Install dependencies**:\n```bash\ncd frontend\nnpm install\n```\n\n2. **Start the development server**:\n```bash\nnpm start\n```\n\nThe frontend will start at `http://localhost:3000`\n\n## 🎯 Usage Examples\n\n### Basic Queries\n- \"Show all employees in the engineering department\"\n- \"What is the average salary by department?\"\n- \"List all active projects\"\n\n### Complex Analytics\n- \"Show top 5 highest paid employees with their department info\"\n- \"Find employees working on multiple active projects\"\n- \"Compare sales performance by region for this year\"\n\n### Advanced Patterns\n- \"Which departments have budget exceeding the average?\"\n- \"Show project timeline with employee assignments\"\n- \"Analyze salary distribution across departments\"\n\n## 🔧 Configuration\n\n### Environment Variables\n```bash\nOPENAI_API_KEY=your_openai_api_key_here\nDATABASE_URL=postgresql://user:password@localhost:5432/texttosql_db\nREDIS_URL=redis://localhost:6379\nCHROMA_PERSIST_DIRECTORY=./chroma_db\nLOG_LEVEL=INFO\nMAX_QUERY_COMPLEXITY=10\nQUERY_TIMEOUT_SECONDS=30\nENABLE_QUERY_CACHING=true\nEMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2\n```\n\n### Database Configuration\nThe system supports both SQLite (default) and PostgreSQL:\n\n- **SQLite**: Zero configuration, perfect for development\n- **PostgreSQL**: 
Production-ready with advanced features\n\n## 📊 Sample Data\n\nThe system includes a comprehensive sample database with:\n- **Employees** table with HR data\n- **Departments** with budget information\n- **Projects** with timeline and status\n- **Sales** data with regional breakdown\n- **Relationships** between entities\n\n## 🛠️ Advanced Features\n\n### RAG System\n- **Embedding-based retrieval** using Sentence Transformers\n- **Context-aware** query generation\n- **Few-shot learning** with example queries\n- **Schema documentation** integration\n\n### LangGraph Workflow\n```text\nretrieve_context → generate_sql → validate_sql → optimize_query → execute_query\n                    ↓              ↓             ↓\n                  retry_logic → error_handling → result_processing\n```\n\n### Query Optimization\n- **Automatic LIMIT** addition for performance\n- **Index suggestion** based on query patterns\n- **Redundant condition** removal\n- **Join optimization** hints\n\n### Security Measures\n- **Whitelist-based** SQL validation\n- **Injection attack** prevention\n- **Query complexity** limits\n- **Execution timeout** controls\n\n## 🔍 API Documentation\n\n### Core Endpoints\n- `GET /health` - System health check\n- `POST /query` - Execute natural language query\n- `GET /schema` - Database schema information\n- `GET /tables` - List all tables\n- `GET /query-history` - Query execution history\n- `WS /ws/query` - Real-time query execution\n\n### Example Request\n`POST /query` with a JSON body:\n```json\n{\n  \"question\": \"Show top 5 employees by salary\",\n  \"include_explanation\": true,\n  \"max_results\": 100\n}\n```\n\n### Example Response\n```json\n{\n  \"sql_query\": \"SELECT * FROM employees ORDER BY salary DESC LIMIT 5\",\n  \"results\": [...],\n  \"explanation\": \"This query selects all columns from the employees table...\",\n  \"confidence_score\": 0.95,\n  \"execution_time\": 0.123,\n  \"metadata\": {\n    \"complexity\": \"simple\",\n    \"validation_passed\": true,\n    \"optimization_applied\": true\n  }\n}\n```\n\n## 🎨 UI Components\n\n### Query Interface\n- **Natural language input** with autocomplete\n- **Real-time execution** with progress tracking\n- **Result visualization** with charts and tables\n- **SQL query display** with syntax highlighting\n\n### Schema Explorer\n- **Interactive table browser**\n- **Column details** with types and constraints\n- **Sample data preview**\n- **Relationship visualization**\n\n### Analytics Dashboard\n- **Query performance** metrics\n- **Success rate** tracking\n- **Error analysis** and debugging\n- **Usage patterns** and trends\n\n## 🚀 Advanced Suggestions\n\n### 1. **Custom Schema Integration**\n```python\n# Add your own database schema (run inside an async function, since this uses await)\nawait db_manager.add_custom_schema({\n    \"your_table\": {\n        \"columns\": [...],\n        \"relationships\": [...],\n        \"sample_queries\": [...]\n    }\n})\n```\n\n### 2. **Extend RAG Context**\n```python\n# Add domain-specific examples\nrag_system.add_examples([\n    {\n        \"question\": \"Your domain question\",\n        \"sql\": \"SELECT ...\",\n        \"explanation\": \"Domain-specific explanation\"\n    }\n])\n```\n\n### 3. **Custom Validators**\n```python\n# Add business logic validation\nclass CustomValidator(SQLValidator):\n    def validate_business_rules(self, query):\n        # Your custom validation logic\n        pass\n```\n\n### 4. **Performance Monitoring**\n```python\nimport time\n\n# Add custom metrics (assumes `app` is the backend's FastAPI instance)\n@app.middleware(\"http\")\nasync def add_metrics(request, call_next):\n    start = time.perf_counter()\n    response = await call_next(request)  # the middleware must return this response\n    # Custom monitoring logic, e.g. record request latency\n    response.headers[\"X-Process-Time\"] = f\"{time.perf_counter() - start:.4f}\"\n    return response\n```\n\n## 🤝 Contributing\n\nWe welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.\n\n### Development Setup\n1. Fork the repository\n2. Create a feature branch\n3. Make your changes\n4. Add tests\n5. Submit a pull request\n\n## 📜 License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n## 🙏 Acknowledgments\n\n- **LangChain** for the RAG framework\n- **LangGraph** for workflow orchestration\n- **OpenAI** for language model capabilities\n- **Material-UI** for the beautiful interface\n- **FastAPI** for the high-performance backend\n\n---\n\nBuilt with ❤️ for the AI and Data Science community\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faaryan04%2Fadvanced-text-to-sql-rag","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faaryan04%2Fadvanced-text-to-sql-rag","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faaryan04%2Fadvanced-text-to-sql-rag/lists"}