{"id":42273099,"url":"https://github.com/coder7475/text2sql","last_synced_at":"2026-01-27T07:30:42.725Z","repository":{"id":318108446,"uuid":"1070012313","full_name":"coder7475/text2sql","owner":"coder7475","description":"Text2SQL Analytics System","archived":false,"fork":false,"pushed_at":"2025-10-17T01:58:03.000Z","size":1752,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-18T04:51:19.882Z","etag":null,"topics":["fastapi","postgresql","pytest","pytest-cov","python"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/coder7475.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-05T04:36:59.000Z","updated_at":"2025-10-17T01:32:56.000Z","dependencies_parsed_at":"2025-10-05T07:24:38.784Z","dependency_job_id":null,"html_url":"https://github.com/coder7475/text2sql","commit_stats":null,"previous_names":["coder7475/text2sql"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/coder7475/text2sql","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coder7475%2Ftext2sql","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coder7475%2Ftext2sql/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coder7475%2Ftext2sql/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/co
der7475%2Ftext2sql/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/coder7475","download_url":"https://codeload.github.com/coder7475/text2sql/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/coder7475%2Ftext2sql/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28808012,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-27T07:14:39.408Z","status":"ssl_error","status_checked_at":"2026-01-27T07:14:39.098Z","response_time":168,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fastapi","postgresql","pytest","pytest-cov","python"],"created_at":"2026-01-27T07:30:41.885Z","updated_at":"2026-01-27T07:30:42.719Z","avatar_url":"https://github.com/coder7475.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Text2SQL Analytics System\n\n## Project Overview\n\nThe **Text2SQL Analytics System** allows users to interact with a normalized Northwind PostgreSQL database using natural language queries. 
Queries are converted into SQL via the **Gemini API**, validated for safety, executed on the database, and returned as structured outputs (JSON or pandas DataFrame).\n\n**Architecture Diagram:**\n\n```sh\n[User Input (Natural Language)]\n|\nv\n[Text2SQL Engine - Gemini API]\n|\nv\n[Query Sanitizer \u0026 Validator]\n|\nv\n[PostgreSQL DB]\n|\nv\n[Results: pandas DataFrame / JSON]\n```\n\n## Quickstart\n\n1. Create a Python 3.10+ virtual environment and activate it.\n   ```bash\n   python3 -m venv .venv\n   source .venv/bin/activate   # macOS / Linux\n   .venv\\Scripts\\activate    # Windows (PowerShell)\n   ```\n2. Install dependencies:\n   ```bash\n   pip install -r requirements.txt\n   ```\n3. Fill in `.env` based on `.env.example` and start PostgreSQL locally (or via Docker Compose).\n4. Use `scripts/setup_database.py` to prepare the schema and optionally load CSVs.\n5. Use `run_query.py` to try sample queries (currently uses mocked LLM responses).\n\n## Project Structure\n\n```bash\ntext2sql-analytics/\n├── README.md\n├── requirements.txt\n├── docker-compose\n├── .env.example\n├── .gitignore\n├── setup.py\n├── data/\n│   ├── normalized/*\n│   ├── raw/\n│   │   └── northwind.xlsx\n│   └── schema/\n│       └── schema.sql\n├── src/\n│   ├── __init__.py\n│   ├── config.py\n│   ├── data_loader.py\n│   ├── text2sql_engine.py\n│   ├── query_validator.py\n│   └── utils.py\n├── tests/\n│   ├── __init__.py\n│   ├── conftest.py\n│   ├── test_data_loader.py\n│   ├── test_db_connection.py\n│   ├── test_mock_utils_db.py\n│   ├── test_query_validator.py\n│   ├── test_text2sql_engine.py\n│   ├── test_utils.py\n│   └── mocks/\n│       └── mock_gemini_client.py\n├── notebooks/\n│   └── analysis.ipynb\n└── scripts/\n    └── setup_database.py\n\n```\n\n## Database Setup\n\n### Run with Docker Compose\n\n1. **Ensure Docker is installed and running**\n\n   - [Install Docker](https://docs.docker.com/get-docker/)\n   - [Install Docker Compose](https://docs.docker.com/compose/install/)\n\n2. 
**Start PostgreSQL via Docker Compose**\n\n   From the project root directory:\n\n   ```bash\n   docker-compose up -d\n   ```\n\n   This will:\n\n   - Start a PostgreSQL 17 container named `text2sql-db-1`\n   - Expose port `5432`\n   - Create a database `northwind_db` with user `northwind_admin` and password `northwind123`\n   - Persist data inside `data/postgres/`\n\n3. **Verify the container is running**\n\n   ```bash\n   docker ps\n   ```\n\n4. **Connect to the database**\n\n   ```bash\n   docker exec -it text2sql-db-1 psql -U northwind_admin -d northwind_db\n   ```\n\n### Environment Configuration\n\n1. Create a `.env` file in the project root (or copy `.env.example`):\n\n   ```bash\n   cp .env.example .env\n   ```\n\n2. Update it with your local database credentials:\n\n   ```bash\n   # Example environment variables\n   DB_HOST=localhost\n   DB_PORT=5432\n   DB_NAME=northwind_db\n   DB_USER=northwind_admin\n   DB_PASSWORD=northwind123\n   GEMINI_API_KEY=your_gemini_api_key_here\n\n   ```\n\n### Verify Database Connection\n\nTo confirm the connection works:\n\n```bash\npython3 tests/test_db_connection.py\n```\n\nIf you see:\n\n```\nConnected to: northwind_db\n```\n\nYou’re good to go!\n\n## Data Model\n\nER Diagram\n\n![Northwind ER diagram](./Northwind_ER_Diagram.png)\n\n## Data Engineering\n\nLoad the data from the raw Excel workbook:\n\n```bash\npython3 src/data_loader.py --excel data/raw/northwind.xlsx\n```\n\n## Text2SQL engine\n\nRun the engine:\n\n```sh\npython3 src/text2sql_engine.py\n```\n\nOutput:\n\n```\npython3 src/text2sql_engine.py\nGenerated SQL Query: SELECT * FROM cities;\n/home/fahad/text2sql/src/../src/utils.py:141: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. 
Please consider using SQLAlchemy.\n  df = pd.read_sql_query(query, conn)\n\nQuery: Find all unique city names\n\nQuery Results (JSON):\n[\n  {\n    \"city_id\":1,\n    \"city_name\":\"Berlin\",\n    \"region_id\":1,\n    \"created_at\":1759732635038,\n    \"updated_at\":1759732635038\n  },\n  {\n    \"city_id\":2,\n    \"city_name\":\"M\\u00e9xico D.F.\",\n    \"region_id\":2,\n    \"created_at\":1759732635038,\n    \"updated_at\":1759732635038\n  },\n  {\n    \"city_id\":3,\n    \"city_name\":\"London\",\n    \"region_id\":3,\n    \"created_at\":1759732635038,\n    \"updated_at\":1759732635038\n  },\n  {\n    \"city_id\":4,\n    \"city_name\":\"Lule\\u00e5\",\n    \"region_id\":4,\n    \"created_at\":1759732635038,\n    \"updated_at\":1759732635038\n  },\n  {\n    \"city_id\":5,\n    \"city_name\":\"Mannheim\",\n    \"region_id\":1,\n    \"created_at\":1759732635038,\n    \"updated_at\":1759732635038\n  },\n  {\n    \"city_id\":6,\n    \"city_name\":\"Strasbourg\",\n    \"region_id\":5,\n    \"created_at\":1759732635038,\n    \"updated_at\":1759732635038\n  },\n  {\n    \"city_id\":7,\n    \"city_name\":\"Madrid\",\n    \"region_id\":6,\n    \"created_at\":1759732635038,\n    \"updated_at\":1759732635038\n  },\n  {\n    \"city_id\":8,\n    \"city_name\":\"Marseille\",\n    \"region_id\":5,\n    \"created_at\":1759732635038,\n    \"updated_at\":1759732635038\n  },\n  {\n    \"city_id\":9,\n    \"city_name\":\"Tsawassen\",\n    \"region_id\":7,\n    \"created_at\":1759732635038,\n    \"updated_at\":1759732635038\n  },\n  {\n    \"city_id\":10,\n    \"city_name\":\"Buenos Aires\",\n    \"region_id\":8,\n    \"created_at\":1759732635038,\n    \"updated_at\":1759732635038\n  },\n  {\n    \"city_id\":11,\n    \"city_name\":\"Bern\",\n    \"region_id\":9,\n    \"created_at\":1759732635038,\n    \"updated_at\":1759732635038\n  },\n\n ....\n\n  {\n    \"city_id\":95,\n    \"city_name\":\"Annecy\",\n    \"region_id\":5,\n    \"created_at\":1759732635709,\n    
\"updated_at\":1759732635709\n  },\n  {\n    \"city_id\":96,\n    \"city_name\":\"Ste-Hyacinthe\",\n    \"region_id\":25,\n    \"created_at\":1759732635709,\n    \"updated_at\":1759732635709\n  },\n  {\n    \"city_id\":97,\n    \"city_name\":\"Colchester\",\n    \"region_id\":45,\n    \"created_at\":1759732830414,\n    \"updated_at\":1759732830414\n  }\n]\n```\n\n## API Design\n\n### Overview\n\nThe Text2SQL Analytics System exposes a RESTful API built with **FastAPI** that converts natural language queries into SQL and executes them against a PostgreSQL database. The API includes built-in security features, rate limiting, and comprehensive error handling.\n\n### Base URL\n\n```\nhttp://localhost:8000\n```\n\n### Authentication \u0026 Security\n\n- **Rate Limiting**: 5 requests per 10 seconds per IP address\n- **Request Timeout**: Monitored via `X-Process-Time` header\n- **SQL Injection Protection**: Built-in query validation and sanitization\n- **Error Handling**: Structured error responses with appropriate HTTP status codes\n\n### Endpoints\n\n#### 1. Health Check\n\n**GET** `/`\n\nReturns the API health status.\n\n**Response:**\n\n```json\n{\n  \"status\": \"ok\",\n  \"message\": \"Text2SQL API running.\"\n}\n```\n\n#### 2. 
Generate and Execute SQL\n\n**POST** `/generate-sql`\n\nConverts natural language to SQL, validates the query, and executes it against the database.\n\n**Request Body:**\n\n```json\n{\n  \"question\": \"Show all orders shipped in 1997\"\n}\n```\n\n**Response Schema:**\n\n```json\n{\n  \"sql_query\": \"string\", // Raw SQL generated by Gemini\n  \"sanitized_query\": \"string\", // SQL after sanitization\n  \"validate_query\": \"string\", // Final validated SQL\n  \"result_json\": \"string\" // Query results as JSON string\n}\n```\n\n**Example Request:**\n\n```bash\ncurl -X POST \"http://localhost:8000/generate-sql\" \\\n     -H \"Content-Type: application/json\" \\\n     -d '{\"question\": \"Find all customers from Germany\"}'\n```\n\n**Example Response:**\n\n```json\n{\n  \"sql_query\": \"SELECT * FROM customers WHERE country = 'Germany';\",\n  \"sanitized_query\": \"SELECT * FROM customers WHERE country = 'Germany'\",\n  \"validate_query\": \"SELECT * FROM customers WHERE country = 'Germany'\",\n  \"result_json\": \"[{\\\"customer_id\\\":1,\\\"company_name\\\":\\\"Alfreds Futterkiste\\\",\\\"country\\\":\\\"Germany\\\"}]\"\n}\n```\n\n### Error Responses\n\n#### 400 Bad Request\n\n```json\n{\n  \"detail\": \"Validation error: Invalid SQL syntax\"\n}\n```\n\n#### 429 Too Many Requests\n\n```json\n{\n  \"message\": \"Too many requests\"\n}\n```\n\n#### 500 Internal Server Error\n\n```json\n{\n  \"detail\": \"Internal error: Database connection failed\"\n}\n```\n\n### Request/Response Models\n\n**Text2SQLRequest:**\n\n- `question` (string, required): Natural language query\n- Default: \"Show all orders shipped in 1997\"\n\n**SQLResponseModel:**\n\n- `sql_query` (string): Original SQL generated by Gemini API\n- `sanitized_query` (string): SQL after sanitization process\n- `validate_query` (string): Final validated SQL ready for execution\n- `result_json` (string): Query results serialized as JSON\n\n### API Features\n\n#### Middleware Stack\n\n1. 
**Process Time Tracking**: Adds an `X-Process-Time` header to all responses\n2. **Rate Limiting**: IP-based request limiting (5 requests per 10 seconds)\n3. **CORS**: Cross-origin resource sharing support\n\n#### Query Processing Pipeline\n\n1. **Natural Language Input**: The user provides a question in English\n2. **Prompt Engineering**: The question is converted into an optimized prompt\n3. **SQL Generation**: The Gemini API generates a SQL query\n4. **Sanitization**: Dangerous SQL constructs are removed\n5. **Validation**: SQL syntax and safety are checked\n6. **Execution**: The validated query is run against PostgreSQL\n7. **Serialization**: Results are converted to JSON\n\n#### Supported Query Types\n\n- **SELECT queries**: Data retrieval operations\n- **Aggregations**: COUNT, SUM, AVG, MIN, MAX\n- **Joins**: Inner, left, right joins across tables\n- **Filtering**: WHERE clauses with various conditions\n- **Grouping**: GROUP BY with HAVING clauses\n- **Sorting**: ORDER BY operations\n\n#### Blocked Operations\n\n- INSERT, UPDATE, DELETE statements\n- DROP, ALTER, CREATE statements\n- System function calls\n- Subqueries with potential security risks\n\n### Interactive Documentation\n\nFastAPI automatically generates interactive API documentation:\n\n- **Swagger UI**: [http://localhost:8000/docs](http://localhost:8000/docs)\n- **ReDoc**: [http://localhost:8000/redoc](http://localhost:8000/redoc)\n\n## Running the FastAPI Application\n\nTo start the development server with hot-reload, run:\n\n```bash\nuvicorn src.main:app --reload\n```\n\nThe API will be available at [http://localhost:8000](http://localhost:8000).\n\n### Production Deployment\n\nFor production deployment, use:\n\n```bash\n# With specific host and port\nuvicorn src.main:app --host 0.0.0.0 --port 8000\n\n# With multiple workers\nuvicorn src.main:app --host 0.0.0.0 --port 8000 --workers 4\n```\n\n## Testing\n\nFrom the project root, run:\n\n```bash\npytest -v\n```\n\nGenerate an HTML coverage report:\n\n```bash\npytest --cov=src --cov-report=html\n```\n\nTo view the HTML coverage report, 
open `htmlcov/index.html` in a browser (the URL below assumes a local static file server on port 5500):\n\n```\nhttp://localhost:5500/htmlcov/\n```\n\n## References\n\n- [Gemini API quickstart](https://ai.google.dev/gemini-api/docs/quickstart)\n- [Prompt engineering for a better SQL code generation with LLMs](https://medium.com/datamindedbe/prompt-engineering-for-a-better-sql-code-generation-with-llms-263562c0c35d)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcoder7475%2Ftext2sql","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcoder7475%2Ftext2sql","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcoder7475%2Ftext2sql/lists"}