{"id":18408690,"url":"https://github.com/h2oai/sql-sidekick","last_synced_at":"2025-04-07T09:33:02.891Z","repository":{"id":221102492,"uuid":"632699123","full_name":"h2oai/sql-sidekick","owner":"h2oai","description":"Experiment on QnA tabular data using LLMs and SQL","archived":false,"fork":false,"pushed_at":"2024-10-24T16:56:05.000Z","size":7111,"stargazers_count":28,"open_issues_count":20,"forks_count":5,"subscribers_count":40,"default_branch":"main","last_synced_at":"2025-04-07T04:32:48.864Z","etag":null,"topics":["genai","genai-usecase","llm","llm-chain","llm-framework","llm-tools","llm2sql","sql-query","txt2sql","txt2sql-python-cli","wave"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/h2oai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-26T00:28:21.000Z","updated_at":"2025-04-04T04:12:32.000Z","dependencies_parsed_at":"2024-02-15T07:27:35.986Z","dependency_job_id":"c91c73e0-0528-4109-b3d0-8c956b09ebc5","html_url":"https://github.com/h2oai/sql-sidekick","commit_stats":null,"previous_names":["h2oai/sql-sidekick"],"tags_count":21,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/h2oai%2Fsql-sidekick","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/h2oai%2Fsql-sidekick/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/h2oai%2Fsql-sidekick/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/h2oai%2Fs
ql-sidekick/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/h2oai","download_url":"https://codeload.github.com/h2oai/sql-sidekick/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247626533,"owners_count":20969323,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["genai","genai-usecase","llm","llm-chain","llm-framework","llm-tools","llm2sql","sql-query","txt2sql","txt2sql-python-cli","wave"],"created_at":"2024-11-06T03:20:36.650Z","updated_at":"2025-04-07T09:33:02.002Z","avatar_url":"https://github.com/h2oai.png","language":"Python","readme":"# sql-sidekick\nA simple SQL assistant (WIP)\nTurn ★ into ⭐ (top-right corner) if you like the project! 
🙏\n\n## Motivation\n- Historically, data has commonly been stored in databases; making that data easier to query democratizes insight generation.\n- Enable a helpful assistant that writes complex, efficiently executing queries across different database dialects with acceptable execution accuracy (not just exact-match accuracy).\n- Aim for consistent, error-free generation using smaller open-source (OSS) models to save on compute costs.\n- Provide a toolkit for users to mix and match different model sizes to optimize compute cost, e.g., smaller models for generation and larger remote models for syntax or spelling correction …\n- Build a smart search engine for databases/structured data, using Text-to-SQL as a natural language interface (NLI) for data analysis.\n\n## Key Features\n- An interactive UI to capture feedback, along with a Python client and a CLI mode.\n- Automatic DB schema generation for input data using a custom input format.\n- An in-context learning (ICL) pipeline with retrieval-augmented generation (RAG) to control hallucination.\n- Guardrails: checks for SQL injection via SELECT statements, e.g., `SELECT * FROM SleepStudy WHERE user_id = 11 OR 1=1;`\n- Entity mapping/schema linking: builds memory that dynamically maps business context to the data schema. **Note:** currently enabled only via the CLI; other interfaces are WIP.\n- Saves the chat history of query/answer pairs for future reference and improvement.\n- Self-correction loop: validates the syntactic correctness of the generated SQL. **Note:** self-correction is currently enabled for all OpenAI GPT models; WIP for other OSS models.\n- Integration with different database dialects: SQLite, Postgres (_might be temporarily broken_), and Databricks are currently supported; support for DuckDB and others is WIP.\n- Debug mode: evaluate, modify, and validate SQL queries against the configured database via the UI.\n- Recommend sample questions: Often, given a dataset, we are unsure what to ask. 
To get around this problem, sql-sidekick can generate recommendations for possible questions.\n\n# Installation\n\n## Requirements\nThis project requires a Python version within the range \"3.8.1\" to \"3.10.0\". You can check your Python version by running the following command in your terminal:\n\n```\npython --version\n```\nIf your Python version is not within the specified range, you may need to upgrade or downgrade it.\n\n## Dev\n```\n1. git clone git@github.com:h2oai/sql-sidekick.git\n2. cd sql-sidekick\n3. make setup\n4. source ./.sidekickvenv/bin/activate\n5. poetry install (if this fails, run `poetry update` and then `poetry install` again)\n6. python sidekick/prompter.py\n```\n## Usage\n```\nDialect: postgres\n- docker pull postgres (pulls the latest version)\n- docker run --rm --name pgsql-dev -e POSTGRES_PASSWORD=abc -p 5432:5432 postgres\n\nDefault: sqlite\nSteps:\n- Download the .whl --\u003e s3://sql-sidekick/releases/sql_sidekick-0.0.3-py3-none-any.whl\n- python3 -m venv .sidekickvenv\n- source .sidekickvenv/bin/activate\n- python3 -m pip install sql_sidekick-0.0.3-py3-none-any.whl\n```\n## Start\n```\n`sql-sidekick`\n\nWelcome to the SQL Sidekick! I am an AI assistant that helps you with SQL\nqueries. I can help you with the following:\n  0. Generate input schema:\n  `sql-sidekick configure generate_schema --data_path \"./sample_passenger_statisfaction.csv\" --output_path \"./table_config.jsonl\"`\n\n  1. Configure a local database (for schema validation and syntax checking):\n  `sql-sidekick configure db-setup -t \"\u003clocal_dir_path_to_\u003e/table_info.jsonl\"` (e.g., format --\u003e https://github.com/h2oai/sql-sidekick/blob/main/examples/telemetry/table_info.jsonl)\n\n  2. Ask a question: `sql-sidekick query -q \"avg Gpus\" -s \"\u003clocal_dir_path_to_\u003e/samples.csv\"` (e.g., format --\u003e https://github.com/h2oai/sql-sidekick/blob/main/examples/telemetry/samples.csv)\n\n  3. 
Learn contextual query/answer pairs: `sql-sidekick learn add-samples` (optional)\n\n  4. Add context as key/value pairs: `sql-sidekick learn update-context` (optional)\n\nOptions:\n  --version  Show the version and exit.\n  --help     Show this message and exit.\n\nCommands:\n  configure  Helps in configuring local database.\n  learn      Helps in learning and building memory.\n  query      Asks question and returns SQL\n```\n\n## UI\n### Steps to start locally\n1. Download the Wave server (`waved`) [0.26.3](https://github.com/h2oai/wave/releases/tag/v0.26.3)\n2. `tar -xzf wave-0.26.3-linux-amd64.tar.gz`; `cd wave-0.26.3-linux-amd64`; `./waved -max-request-size=\"20M\"`\n3. Download the latest bundle: https://github.com/h2oai/sql-sidekick/releases/latest\n4. unzip `ai.h2o.wave.sql-sidekick.x.x.x.wave`\n5. make setup\n6. source ./.sidekickvenv/bin/activate\n7. make run\n\u003cimg width=\"1670\" alt=\"Screen Shot 2023-11-15 at 6 19 14 PM\" src=\"https://github.com/h2oai/sql-sidekick/assets/1318029/5cf8a3ef-0d36-4416-ae2f-52672024fead\"\u003e\n\n## Blogs\nhttps://medium.com/the-story-within/state-of-text-to-sql-dc3e3e4f8c64\n\n## Citation \u0026 Acknowledgment\nPlease consider citing our project if you find it useful:\n\n```bibtex\n@software{sql-sidekick,\n    title = {{sql-sidekick: A simple SQL assistant}},\n    author = {Pramit Choudhary and Michal Malohlava and Narasimha Durgam and Robin Liu and {h2o.ai Team}},\n    url = {https://github.com/h2oai/sql-sidekick},\n    year = {2024}\n}\n```\nLLM frameworks adopted: [h2ogpt](https://github.com/h2oai/h2ogpt), [h2ogpte](https://pypi.org/project/h2ogpte/), [LangChain](https://github.com/langchain-ai/langchain), [llama_index](https://github.com/run-llama/llama_index), 
[openai](https://openai.com/blog/openai-api)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fh2oai%2Fsql-sidekick","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fh2oai%2Fsql-sidekick","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fh2oai%2Fsql-sidekick/lists"}