{"id":29223907,"url":"https://github.com/shubhojit-mitra-dev/ai-data-analytics","last_synced_at":"2025-09-14T20:06:19.231Z","repository":{"id":293887310,"uuid":"985410008","full_name":"shubhojit-mitra-dev/ai-data-analytics","owner":"shubhojit-mitra-dev","description":"PERN based SQL agent (Not exactly 😅), tried something that failed terribly, but this remains a project that I can definitely refer to later for complex SQL based AI applications. Bug Fixes Remain 🐛","archived":false,"fork":false,"pushed_at":"2025-05-17T20:59:25.000Z","size":102,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-28T07:53:03.639Z","etag":null,"topics":["expressjs","gemini","google-cloud","nodejs","postgresql","react","supabase","vertex-ai"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/shubhojit-mitra-dev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-17T17:49:09.000Z","updated_at":"2025-05-17T21:05:11.000Z","dependencies_parsed_at":"2025-05-17T19:22:52.266Z","dependency_job_id":"9e63a179-bbda-4b21-ba78-6be770c24902","html_url":"https://github.com/shubhojit-mitra-dev/ai-data-analytics","commit_stats":null,"previous_names":["shubhojit-mitra-dev/ai-data-analytics"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/shubhojit-mitra-dev/ai-data-analytics","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shubhojit-mitra-dev%2Fai-data-analytics","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shubhojit-mitra-dev%2Fai-data-analytics/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shubhojit-mitra-dev%2Fai-data-analytics/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shubhojit-mitra-dev%2Fai-data-analytics/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/shubhojit-mitra-dev","download_url":"https://codeload.github.com/shubhojit-mitra-dev/ai-data-analytics/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shubhojit-mitra-dev%2Fai-data-analytics/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":275160370,"owners_count":25415767,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-14T02:00:10.474Z","response_time":75,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["expressjs","gemini","google-cloud","nodejs","postgresql","react","supabase","vertex-ai"],"created_at":"2025-07-03T05:07:15.165Z","updated_at":"2025-09-14T20:06:19.204Z","avatar_url":"https://github.com/shubhojit-mitra-dev.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# AI SQL Data Agent for Complex Business Analytics\n\nThis application allows users to query a complex business database using natural language questions, which are then translated to SQL by the Gemini Flash AI model and executed against a Supabase database.\n\n## Features\n\n- Natural language to SQL conversion using Google Gemini Flash\n- Complex database schema with intentionally \"dirty\" elements to challenge AI\n- Visualization of query results\n- Detailed explanations of generated SQL\n- Handling of business analytics questions about sales, users, products, and regions\n\n## Project Structure\n\nThe project consists of two main parts:\n\n- **Client**: React frontend with TypeScript and Tailwind CSS\n- **Server**: Node.js backend with Express and TypeScript\n\n## Getting Started\n\n### Prerequisites\n\n- Node.js (v16 or higher)\n- pnpm (or npm/yarn)\n- Supabase account\n- Google Gemini API key\n\n### Setup\n\n1. **Clone the repository**\n\n```bash\ngit clone \u003crepository-url\u003e\ncd ai-data-analytics\n```\n\n2. **Set up environment variables**\n\nCreate `.env` files in both the client and server directories.\n\nFor server:\n```\nSUPABASE_URL=https://nxyvixarrciicytnkrvs.supabase.co\nSUPABASE_ANON_KEY=your-supabase-anon-key\nGEMINI_API_KEY=your-gemini-api-key\nPORT=5000\n```\n\nFor client:\n```\nVITE_API_URL=http://localhost:5000/api\n```\n\n3. **Install dependencies and start the applications**\n\nFor server:\n```bash\ncd server\npnpm install\npnpm dev\n```\n\nFor client:\n```bash\ncd client\npnpm install\npnpm dev\n```\n\n## Database Schema\n\nThe database schema is intentionally complex with some \"dirty\" elements to challenge the AI:\n\n### Tables\n\n1. **regions**: Geographic regions with cryptic columns\n   - Contains boolean flags Q1-Q4 indicating quarterly activity\n   - Has ambiguous x_factor metric\n\n2. **products**: Product catalog\n   - Contains columns with abbreviated names like prd_desc and prd_code\n   - Includes various product status indicators\n\n3. **users**: Customer information\n   - Contains cryptic tier indicators t1, t2, t3\n   - Stores user preferences in JSON format\n\n4. **sales**: Transaction data\n   - Includes cryptic columns c1 (channel), c2 (campaign code)\n   - Contains discount percentage and satisfaction score\n   - Linked to products, users, and regions\n\n5. **inventory**: Stock information\n   - Tracks inventory by product and region\n   - Contains fields for stock thresholds and flags\n\n## Usage Examples\n\nHere are some example questions you can ask:\n\n1. \"Why did sales drop in Q2 in the southern zone?\"\n   ```sql\n   -- Expected SQL:\n   SELECT \n     r.name AS region_name, \n     p.name AS product_name,\n     SUM(CASE WHEN EXTRACT(QUARTER FROM s.date) = 1 THEN s.amount ELSE 0 END) AS q1_sales,\n     SUM(CASE WHEN EXTRACT(QUARTER FROM s.date) = 2 THEN s.amount ELSE 0 END) AS q2_sales,\n     COUNT(DISTINCT s.id) AS transaction_count,\n     AVG(s.dscnt) AS avg_discount\n   FROM\n     sales s\n   JOIN\n     regions r ON s.region_id = r.id\n   JOIN\n     products p ON s.product_id = p.id\n   WHERE\n     r.zone = 'South'\n     AND EXTRACT(YEAR FROM s.date) = 2024\n   GROUP BY\n     r.name, p.name\n   HAVING\n     SUM(CASE WHEN EXTRACT(QUARTER FROM s.date) = 1 THEN s.amount ELSE 0 END) \u003e\n     SUM(CASE WHEN EXTRACT(QUARTER FROM s.date) = 2 THEN s.amount ELSE 0 END)\n   ORDER BY\n     (SUM(CASE WHEN EXTRACT(QUARTER FROM s.date) = 1 THEN s.amount ELSE 0 END) - \n      SUM(CASE WHEN EXTRACT(QUARTER FROM s.date) = 2 THEN s.amount ELSE 0 END)) DESC\n   ```\n\n2. \"Which products had the highest profit margin in Q4?\"\n   ```sql\n   -- Expected SQL:\n   SELECT \n     p.id, \n     p.name AS product_name,\n     p.category,\n     SUM(s.amount) AS total_sales,\n     SUM(s.qty * p.cost) AS total_cost,\n     SUM(s.amount) - SUM(s.qty * p.cost) AS total_profit,\n     (SUM(s.amount) - SUM(s.qty * p.cost)) / SUM(s.amount) * 100 AS profit_margin_percentage\n   FROM \n     sales s\n   JOIN \n     products p ON s.product_id = p.id\n   JOIN \n     regions r ON s.region_id = r.id\n   WHERE \n     r.\"Q4\" = true\n     OR EXTRACT(QUARTER FROM s.date) = 4\n   GROUP BY \n     p.id, p.name, p.category\n   ORDER BY \n     profit_margin_percentage DESC\n   LIMIT 10\n   ```\n\n3. \"Show me the top 5 selling products in New York in 2024\"\n   ```sql\n   -- Expected SQL:\n   SELECT\n     p.id,\n     p.name AS product_name,\n     p.category,\n     SUM(s.qty) AS total_quantity_sold,\n     SUM(s.amount) AS total_sales_amount\n   FROM\n     sales s\n   JOIN\n     products p ON s.product_id = p.id\n   JOIN\n     regions r ON s.region_id = r.id\n   WHERE\n     r.name = 'New York'\n     AND EXTRACT(YEAR FROM s.date) = 2024\n   GROUP BY\n     p.id, p.name, p.category\n   ORDER BY\n     total_quantity_sold DESC\n   LIMIT 5\n   ```\n\n4. \"Compare online vs in-store sales across different regions\"\n   ```sql\n   -- Expected SQL:\n   SELECT\n     r.name AS region_name,\n     r.country,\n     SUM(CASE WHEN s.c1 = 'online' THEN s.amount ELSE 0 END) AS online_sales,\n     SUM(CASE WHEN s.c1 = 'in-store' THEN s.amount ELSE 0 END) AS in_store_sales,\n     COUNT(CASE WHEN s.c1 = 'online' THEN s.id END) AS online_transactions,\n     COUNT(CASE WHEN s.c1 = 'in-store' THEN s.id END) AS in_store_transactions,\n     SUM(CASE WHEN s.c1 = 'online' THEN s.amount ELSE 0 END) / \n       NULLIF(SUM(CASE WHEN s.c1 = 'in-store' THEN s.amount ELSE 0 END), 0) AS online_to_instore_ratio\n   FROM\n     sales s\n   JOIN\n     regions r ON s.region_id = r.id\n   GROUP BY\n     r.name, r.country\n   ORDER BY\n     r.name\n   ```\n\n5. \"Which regions have the most critical inventory levels?\"\n   ```sql\n   -- Expected SQL:\n   SELECT\n     r.name AS region_name,\n     r.country,\n     p.name AS product_name,\n     p.category,\n     i.qty_available,\n     i.min_stock_level,\n     i.qty_available - i.min_stock_level AS stock_buffer,\n     CASE\n       WHEN i.qty_available \u003c= i.min_stock_level THEN 'Critical'\n       WHEN i.qty_available \u003c= i.min_stock_level * 1.2 THEN 'Warning'\n       ELSE 'Normal'\n     END AS stock_status\n   FROM\n     inventory i\n   JOIN\n     regions r ON i.region_id = r.id\n   JOIN\n     products p ON i.product_id = p.id\n   WHERE\n     i.qty_available \u003c= i.min_stock_level * 1.2\n   ORDER BY\n     stock_buffer ASC,\n     r.name\n   LIMIT 15\n   ```\n\n## Implementation Details\n\n- The application uses Supabase for database hosting\n- Gemini Flash API is used for natural language to SQL conversion\n- The backend includes a service to provide the AI with schema information\n- A custom SQL execution function in Supabase allows dynamic query execution\n\n## Limitations\n\n- The visualization component is a placeholder and would need integration with a charting library in a production environment\n- The Gemini Flash model may sometimes generate incorrect SQL for very complex queries\n- Error handling could be improved in a production version\n\n## License\n\n[MIT License](LICENSE)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshubhojit-mitra-dev%2Fai-data-analytics","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshubhojit-mitra-dev%2Fai-data-analytics","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshubhojit-mitra-dev%2Fai-data-analytics/lists"}